[00:01:33] RECOVERY - High average POST latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[00:45:34] !log reset maxmind password
[00:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:39] PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7681275728 and 480 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:46:31] PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 6766379912 and 472 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:46:31] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 3440023648 and 287 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:46:55] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1608248752 and 224 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:25] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1687333320 and 315 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:31] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 18080 and 254 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:47] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7447154800 and 643 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:49:59] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 180184 and 343 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:51:17] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 33688 and 419 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:51:59] RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 6296 and 462 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:53:33] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 328720 and 556 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:54:27] RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 49456 and 610 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:00:51] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 565892264 and 47 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:01:31] PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2348840336 and 173 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:01:51] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 635228952 and 111 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:02:25] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 60552 and 109 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:05] RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 39592 and 149 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:03:17] AaronSchulz: you around
[01:03:21] ?
[01:03:25] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 169 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:04:12] (CR) Bstorm: "Ok, I'm moving this back to review after working out all the problems and changing it fully to what we are proposing on https://phabricato" [puppet] - https://gerrit.wikimedia.org/r/627379 (https://phabricator.wikimedia.org/T260389) (owner: Bstorm)
[03:23:01] (PS1) Andrew Bogott: Keystone: use the keystone-wsgi-admin script installed by the package [puppet] - https://gerrit.wikimedia.org/r/651875 (https://phabricator.wikimedia.org/T261134)
[03:23:03] (PS1) Andrew Bogott: Keystone: update wmtotp plugin to match upstream Password auth code [puppet] - https://gerrit.wikimedia.org/r/651876 (https://phabricator.wikimedia.org/T261134)
[03:25:50] (CR) Andrew Bogott: [C: +2] Keystone: use the keystone-wsgi-admin script installed by the package [puppet] - https://gerrit.wikimedia.org/r/651875 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[03:26:33] (CR) Andrew Bogott: [C: +2] Keystone: update wmtotp plugin to match upstream Password auth code [puppet] - https://gerrit.wikimedia.org/r/651876 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[03:35:28] (PS1) Andrew Bogott: Keystone: remove [cors] sections that produce deprecation warnings [puppet] - https://gerrit.wikimedia.org/r/651877 (https://phabricator.wikimedia.org/T261134)
[03:35:30] (PS1) Andrew Bogott: Keystone: update password_safelist to match upstream code [puppet] - https://gerrit.wikimedia.org/r/651878 (https://phabricator.wikimedia.org/T261134)
[03:36:08] (CR) jerkins-bot: [V: -1] Keystone: update password_safelist to match upstream code [puppet] - https://gerrit.wikimedia.org/r/651878 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[03:50:40] (CR) Andrew Bogott: [C: +2] Keystone: remove [cors] sections that produce deprecation warnings [puppet] - https://gerrit.wikimedia.org/r/651877 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[03:56:53] (PS2) Andrew Bogott: Keystone: update password_safelist to match upstream code [puppet] - https://gerrit.wikimedia.org/r/651878 (https://phabricator.wikimedia.org/T261134)
[03:58:13] (CR) Andrew Bogott: [C: +2] Keystone: update password_safelist to match upstream code [puppet] - https://gerrit.wikimedia.org/r/651878 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[04:02:15] (PS1) Andrew Bogott: Keystone: remove token_formatters-fixed.py backport [puppet] - https://gerrit.wikimedia.org/r/651880 (https://phabricator.wikimedia.org/T261134)
[04:02:44] (CR) Andrew Bogott: [C: +2] Keystone: remove token_formatters-fixed.py backport [puppet] - https://gerrit.wikimedia.org/r/651880 (https://phabricator.wikimedia.org/T261134) (owner: Andrew Bogott)
[06:16:21] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 133 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:18:01] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 5 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:31:04] (PS1) Samwilson: Disable Collection and ElectronPdfService exts for all Wikisources [mediawiki-config] - https://gerrit.wikimedia.org/r/651890 (https://phabricator.wikimedia.org/T255790)
[08:58:36] Puppet, Beta-Cluster-Infrastructure, Developer Productivity, Patch-For-Review: puppetdb on deployment-puppetdb03 keeps getting OOMKilled - https://phabricator.wikimedia.org/T248041 (hashar) Open→Resolved a:Krenair→hashar The instance went from 2G to 4G: {F33971557} And a side e...
[09:44:41] PROBLEM - HP RAID on ms-be1019 is CRITICAL: CRITICAL: Slot 3: Failed: 2I:2:3 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[09:44:43] ACKNOWLEDGEMENT - HP RAID on ms-be1019 is CRITICAL: CRITICAL: Slot 3: Failed: 2I:2:3 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T270806 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[09:44:48] Operations, ops-eqiad: Degraded RAID on ms-be1019 - https://phabricator.wikimedia.org/T270806 (ops-monitoring-bot)
[10:26:38] PROBLEM - Device not healthy -SMART- on ms-be1019 is CRITICAL: cluster=swift device=None instance=ms-be1019 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1019&var-datasource=eqiad+prometheus/ops
[10:34:17] Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1019 - https://phabricator.wikimedia.org/T270806 (Volans) The array M is in a failed status.
[10:50:30] Operations, ops-eqiad, SRE-swift-storage: Degraded RAID on ms-be1019 - https://phabricator.wikimedia.org/T270806 (Volans) p:Triage→Medium
[10:53:32] PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:07:57] apparently the BGP was transient, went back to WARNING within 2 minutes
[11:08:49] !log gerrit2001 (replica) restarting Gerrit server
[11:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:21] !log running on cumin1001: homer asw2-*-eqiad.mgmt.eqiad.wmnet commit "Fix numbering of an-worker hosts - T260445"
[11:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:24] T260445: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445
[11:25:38] hashar: can you dig [X@R5egpAEKIAAzF2hGoAAADB] 2020-12-24 11:20:27: Fatal exception of type "MWException for me?
[11:25:52] unless you ain't logged in etc
[11:27:40] tabbycat: hi
[11:27:55] Bonjour Mr
[11:27:56] tabbycat: give me a few minutes, I am finishing filing a bug report :]
[11:28:07] Mais oui
[11:31:06] (PS6) Jbond: pcc: add more info to the status message [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757)
[11:31:32] (CR) jerkins-bot: [V: -1] pcc: add more info to the status message [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757) (owner: Jbond)
[11:33:59] tabbycat: looking
[11:35:38] tabbycat: CAS update failed on user_touched for user ID 'XXXXXXX' (master read)
[11:35:51] tabbycat: CAS update failed on user_touched. The version of the user to be saved is older than the current version.
[11:36:02] weird
[11:36:40] hashar: it happened to a colleague
[11:36:44] "CAS update failed on user_touched. The version of the user to be saved is older than the current version"
[11:36:46] not me that is
[11:37:00] I guess it is a race condition somewhere in the backend caches :(
[11:37:08] there are a few tasks opened about it
[11:37:57] tabbycat: https://phabricator.wikimedia.org/T249623
[11:39:10] I'll let them know
[11:39:22] tabbycat: it should be a one off error
[11:39:25] (PS7) Jbond: pcc: add more info to the status message [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757)
[11:39:26] another try should work ;)
[11:39:30] yup, it did
[11:39:50] (CR) jerkins-bot: [V: -1] pcc: add more info to the status message [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757) (owner: Jbond)
[11:50:17] (CR) Jbond: pcc: add more info to the status message (1 comment) [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757) (owner: Jbond)
[11:52:46] (PS8) Jbond: pcc: add more info to the status message [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757)
[12:03:00] (PS1) Jbond: pcc: filter out noise [puppet] - https://gerrit.wikimedia.org/r/651911
[12:04:46] (CR) Jbond: "Example output" [puppet] - https://gerrit.wikimedia.org/r/651911 (owner: Jbond)
[12:06:00] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/651911 (owner: Jbond)
[12:07:02] (CR) Jbond: pcc: add more info to the status message (1 comment) [puppet] - https://gerrit.wikimedia.org/r/651788 (https://phabricator.wikimedia.org/T270757) (owner: Jbond)
[12:09:32] (CR) Jbond: varnish: ratelimit vscode-phabricator plugin (1 comment) [puppet] - https://gerrit.wikimedia.org/r/650494 (https://phabricator.wikimedia.org/T270482) (owner: Jbond)
[12:09:51] (CR) Jbond: varnish: migrate abuse_nets acl to abuse_networks hiera block (1 comment) [puppet] - https://gerrit.wikimedia.org/r/651174 (https://phabricator.wikimedia.org/T193762) (owner: Jbond)
[12:22:16] (PS1) Volans: tests: fix dependencies for tests [software/cumin] - https://gerrit.wikimedia.org/r/651913 (https://phabricator.wikimedia.org/T270795)
[12:23:17] !log elukey@cumin1001 START - Cookbook sre.dns.netbox
[12:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:01] (CR) Volans: [C: +2] "Self-merging as tested on a fresh venv with pip upgraded (pip 20.3.3) and affecting only tests:" [software/cumin] - https://gerrit.wikimedia.org/r/651913 (https://phabricator.wikimedia.org/T270795) (owner: Volans)
[12:34:19] !log elukey@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:28] (Merged) jenkins-bot: tests: fix dependencies for tests [software/cumin] - https://gerrit.wikimedia.org/r/651913 (https://phabricator.wikimedia.org/T270795) (owner: Volans)
[12:37:41] !log elukey@cumin1001 START - Cookbook sre.dns.netbox
[12:37:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:12] !log elukey@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:53] (dns records fixed, back to holidays :)
[12:44:43] (PS3) Jbond: nodegen: add cumin support [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288)
[12:48:40] (CR) jerkins-bot: [V: -1] nodegen: add cumin support [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288) (owner: Jbond)
[12:50:09] (CR) Jbond: "recheck" (2 comments) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288) (owner: Jbond)
[12:52:43] (CR) Jbond: nodegen: add cumin support (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288) (owner: Jbond)
[12:54:49] (PS4) Jbond: nodegen: add cumin support [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288)
[12:56:51] (PS1) Jbond: 1.1.1: bump version in setup.py [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651914
[12:57:17] (CR) Jbond: [V: +2 C: +2] 1.1.1: bump version in setup.py [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651914 (owner: Jbond)
[12:59:40] (PS5) Jbond: nodegen: add cumin support [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288)
[13:05:09] (CR) jerkins-bot: [V: -1] nodegen: add cumin support [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288) (owner: Jbond)
[13:23:53] (CR) Jbond: "spoke with riccardo the error seems to be related to https://phabricator.wikimedia.org/T270795 and requires a cumin release so will pause " [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651800 (https://phabricator.wikimedia.org/T245288) (owner: Jbond)
[13:31:37] Operations, puppet-compiler, User-jbond: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section - https://phabricator.wikimedia.org/T224977 (jbond)
[13:31:50] Operations, Packaging, puppet-compiler, User-jbond: PCC always has an ERROR when compiling for servers with profile::redis::slave - https://phabricator.wikimedia.org/T228266 (jbond) Open→Resolved I think this is resolved with the PS from alex
[13:57:36] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 55 probes of 674 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:58:02] (PS1) Jbond: profile::base::puppet: drop puppet/facter_major_version params [puppet] - https://gerrit.wikimedia.org/r/651917 (https://phabricator.wikimedia.org/T211547)
[14:00:37] (CR) Jbond: [C: +2] profile::base::puppet: drop puppet/facter_major_version params [puppet] - https://gerrit.wikimedia.org/r/651917 (https://phabricator.wikimedia.org/T211547) (owner: Jbond)
[14:00:40] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/651918
[14:03:22] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 34 probes of 674 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:03:22] Operations, Puppet, puppet-compiler, Patch-For-Review, User-jbond: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler) - https://phabricator.wikimedia.org/T211547 (jbond)
[14:04:32] Operations, Puppet, puppet-compiler, Patch-For-Review, User-jbond: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler) - https://phabricator.wikimedia.org/T211547 (jbond) Open→Resolved a:jbond
[14:07:22] Operations, Puppet, Release-Engineering-Team-TODO, puppet-compiler, and 2 others: Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (jbond)
[14:17:19] Operations, Puppet, Release-Engineering-Team-TODO, puppet-compiler, Release-Engineering-Team (CI & Testing services): Integrate the puppet compiler in the puppet CI pipeline - https://phabricator.wikimedia.org/T166066 (jbond)
[14:58:58] PROBLEM - puppet last run on an-coord1002 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[15:59:00] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 63 probes of 674 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:08:59] (PS1) Jbond: (DO NOT MERGE) testing CI [puppet] - https://gerrit.wikimedia.org/r/651922
[16:10:18] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 27 probes of 674 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:10:33] (CR) jerkins-bot: [V: -1] (DO NOT MERGE) testing CI [puppet] - https://gerrit.wikimedia.org/r/651922 (owner: Jbond)
[17:16:36] (PS1) Jbond: (WIP) nodegen: add node selections based on commited files [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651925 (https://phabricator.wikimedia.org/T166066)
[17:20:31] (CR) jerkins-bot: [V: -1] (WIP) nodegen: add node selections based on commited files [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651925 (https://phabricator.wikimedia.org/T166066) (owner: Jbond)
[17:24:14] (PS2) Jbond: (WIP) nodegen: add node selections based on commited files [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651925 (https://phabricator.wikimedia.org/T166066)
[17:29:39] (CR) jerkins-bot: [V: -1] (WIP) nodegen: add node selections based on commited files [software/puppet-compiler] - https://gerrit.wikimedia.org/r/651925 (https://phabricator.wikimedia.org/T166066) (owner: Jbond)
[20:31:20] (PS1) Andrew Bogott: profile::wmcs::nfs::primary: include bdsync [puppet] - https://gerrit.wikimedia.org/r/651928
[20:32:42] (CR) Bstorm: [C: +1] profile::wmcs::nfs::primary: include bdsync [puppet] - https://gerrit.wikimedia.org/r/651928 (owner: Andrew Bogott)
[20:33:09] (CR) Andrew Bogott: [C: +2] profile::wmcs::nfs::primary: include bdsync [puppet] - https://gerrit.wikimedia.org/r/651928 (owner: Andrew Bogott)