[00:00:22] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/542251 (owner: 10Filippo Giunchedi)
[00:11:01] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:12:30] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:13:04] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:16:58] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:27:42] <wikibugs>	 (03PS1) 10Hoo man: No longer use --no-cache when dumping Wikibase entities [puppet] - 10https://gerrit.wikimedia.org/r/542278
[00:30:59] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:32:09] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:35:51] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn) codfw db hosts - fixed  ms-be eqiad hosts - These brand new installs from T232367  @Robh @Jclark-ctr could you make sure IPMI over LAN is enabled on these?
[00:38:15] <wikibugs>	 10Operations, 10Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (10Dzahn)
[00:38:18] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:38:20] <wikibugs>	 10Operations, 10observability: Remote IPMI doesn't work for ~2% of the fleet - https://phabricator.wikimedia.org/T150160 (10Dzahn)
[00:38:40] <wikibugs>	 10Operations: IPMI Audit 2018-04 - https://phabricator.wikimedia.org/T193155 (10Dzahn)
[00:38:45] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[00:39:54] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn) a:03Papaul assigning to Papaul per IRC chat (thanks!)
[00:49:57] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:31:10] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Papaul)
[01:32:11] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Papaul) a:05Papaul→03Dzahn
[01:32:34] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (10Papaul)
[01:32:51] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (10Papaul) 05Open→03Resolved Complete
[01:32:53] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10Papaul)
[01:33:10] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2069.codfw.wmnet - https://phabricator.wikimedia.org/T230107 (10Papaul)
[01:40:39] <icinga-wm>	 RECOVERY - Check the Netbox report librenms for fail status. on netbox1001 is OK: librenms.LibreNMS OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[02:03:28] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn)
[02:10:46] <mutante>	 !log gerrit1001 - attempt to manually start replication to github
[02:10:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:13:08] <mutante>	 !log gerrit - restart service to ensure last config change is picked up
[02:13:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:14:32] <mutante>	 !log gerrit - "manually" starting replication via ssh command
[02:14:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:17:21] <wikibugs>	 10Operations, 10DC-Ops: fix IPMI over LAN on certain HP hosts - https://phabricator.wikimedia.org/T235234 (10Dzahn) a:05Dzahn→03None
[02:36:38] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission astatine - https://phabricator.wikimedia.org/T221244 (10Dzahn) The box for production DNS removed is checked but looking at DNS repo it's still there:  templates/wikimedia.org:astatine        1H  IN A    208.80.155.110 templates/155.80.208.i...
[02:37:37] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission astatine - https://phabricator.wikimedia.org/T221244 (10Dzahn)
[02:56:29] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 20673816 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:58:07] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 51248 and 66 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:36:25] <wikibugs>	 (03CR) 10Marostegui: "Now that this is merged, should we remove the IP bans, maybe on Monday?" [puppet] - 10https://gerrit.wikimedia.org/r/542153 (owner: 10CDanis)
[04:48:04] <wikibugs>	 (03CR) 10Dzahn: "@akosiaris @cdanis first i saw service/services.yaml and wanted to add a new service name, "httpd", to it. to not use "apache2" again per " [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[04:54:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
[04:54:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:15] <stashbot>	 T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[05:01:49] <wikibugs>	 10Operations, 10Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (10Dzahn) see https://wikitech.wikimedia.org/wiki/Management_Interfaces
[05:13:27] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[05:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:39] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[05:13:41] <wikibugs>	 (03PS1) 10Marostegui: site.pp: Remove db2056 from puppet [puppet] - 10https://gerrit.wikimedia.org/r/542314 (https://phabricator.wikimedia.org/T230777)
[05:13:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:42] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2056.codfw.wmnet` -  db2056.codfw.wmnet (**PASS**)...
[05:15:06] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Remove db2056 production DNS entries [dns] - 10https://gerrit.wikimedia.org/r/542316 (https://phabricator.wikimedia.org/T230777)
[05:16:10] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] site.pp: Remove db2056 from puppet [puppet] - 10https://gerrit.wikimedia.org/r/542314 (https://phabricator.wikimedia.org/T230777) (owner: 10Marostegui)
[05:16:54] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] wmnet: Remove db2056 production DNS entries [dns] - 10https://gerrit.wikimedia.org/r/542316 (https://phabricator.wikimedia.org/T230777) (owner: 10Marostegui)
[05:18:30] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (10Marostegui) a:05RobH→03Papaul
[05:18:47] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (10Marostegui) Host ready for onsite steps + switch disablement
[05:23:46] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:24:53] <elukey>	 hello cr2
[05:25:40] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Add timing request information to ats-tls log [puppet] - 10https://gerrit.wikimedia.org/r/542317 (https://phabricator.wikimedia.org/T234887)
[05:25:59] <elukey>	 seems Telia transport with eqiad down
[05:26:58] <papaul>	 |1log rebooting an-conf1001 for serial troubleshooting 
[05:27:06] <papaul>	 !1log rebooting an-conf1001 for serial troubleshooting 
[05:27:59] <papaul>	 !log rebooting an-conf1001 for serial troubleshooting 
[05:28:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:30:43] <elukey>	 mmm there was unexpected maintenance but not for that link from what I can read
[05:31:02] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:31:28] <elukey>	 ah ok and here the other side
[05:33:48] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10Marostegui) The last issue we had with bacula host itself was some sort of storage degradation/failure, no? Maybe some sort of OS monitoring to catch potential issues on the...
[05:34:33] <elukey>	 so I don't see any planned maintenance for the circuit, can somebody else triple check? (morning pebcak prevention)
[05:39:04] <elukey>	 ah no I found the maintenance
[05:39:06] <elukey>	 - Maintenance window:
[05:39:06] <elukey>	 Start Date and Time: 2019-Oct-11 04:00 UTC
[05:39:06] <elukey>	 End Date and Time: 2019-Oct-11 11:00 UTC
[05:39:27] <elukey>	 ok so it was expected but not in any calendar afaics
[05:44:18] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:44:19] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:47:16] <icinga-wm>	 RECOVERY - BGP status on cr4-ulsfo is OK: BGP OK - up: 93, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:02:06] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:02:08] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:05:08] <elukey>	 again maintenance --^
[06:08:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
[06:08:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:19] <stashbot>	 T232446: Compress  new Wikibase tables - https://phabricator.wikimedia.org/T232446
[06:13:20] <marostegui>	 !log Compress tables on db2085:3318 - T232446
[06:13:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:24] <stashbot>	 T232446: Compress  new Wikibase tables - https://phabricator.wikimedia.org/T232446
[06:19:50] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:19:52] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:56:36] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] admin: add jkumalah to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542141 (https://phabricator.wikimedia.org/T234433) (owner: 10Herron)
[06:57:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] admin: add dedcode to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542132 (https://phabricator.wikimedia.org/T234473) (owner: 10Herron)
[07:05:49] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (10MGerlach) @MoritzMuehlenhoff opening this again since I cannot access the cluster anymore, e.g. via 'ssh mgerlach@stat1007.eqiad.wmnet' This happened aft...
[07:28:31] <XioNoX>	 !log deactivate HE peering on cr1-eqiad for packet loss
[07:28:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:17] <XioNoX>	 !log deactivate HE peering on cr2-eqord for packet loss
[07:30:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:00] <XioNoX>	 !log rollback two previous HE peering deactivate
[07:32:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:32] <wikibugs>	 10Operations, 10ops-codfw: No microcode updates loaded on puppetmaster2001/2002 after reimage to Buster - https://phabricator.wikimedia.org/T235250 (10MoritzMuehlenhoff)
[07:35:41] <wikibugs>	 10Operations, 10ops-codfw: No microcode updates loaded on puppetmaster2001/2002 after reimage to Buster - https://phabricator.wikimedia.org/T235250 (10MoritzMuehlenhoff) p:05Triage→03Normal
[07:45:22] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch labpuppetmaster spares to Buster for microcode/initrd debugging [puppet] - 10https://gerrit.wikimedia.org/r/542320
[07:45:41] <wikibugs>	 (03PS1) 10Ayounsi: PDUs: add model sentry 4 to eqiad b1 and a2 [puppet] - 10https://gerrit.wikimedia.org/r/542321 (https://phabricator.wikimedia.org/T227536)
[07:48:48] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch labpuppetmaster spares to Buster for microcode/initrd debugging [puppet] - 10https://gerrit.wikimedia.org/r/542320
[07:51:06] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/18852/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/542321 (https://phabricator.wikimedia.org/T227536) (owner: 10Ayounsi)
[07:51:43] <wikibugs>	 (03PS3) 10Muehlenhoff: Switch labpuppetmaster spares to Buster for microcode/initrd debugging [puppet] - 10https://gerrit.wikimedia.org/r/542320
[07:55:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch labpuppetmaster spares to Buster for microcode/initrd debugging [puppet] - 10https://gerrit.wikimedia.org/r/542320 (owner: 10Muehlenhoff)
[07:55:29] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-B-phase-Y on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-B-phase-Y 194 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:55:39] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-B-phase-Z on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-B-phase-Z 318 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:55:47] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-B-phase-X on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-B-phase-X 367 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:55:55] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-A-phase-Y on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-A-phase-Y 248 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:56:03] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-B-phase-X on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-B-phase-X 273 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:57:29] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-B-phase-Z on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-B-phase-Z 226 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:03:13] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-A-phase-X on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-A-phase-X 322 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:13] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-A-phase-X on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-A-phase-X 293 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:13] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-A-phase-Z on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-A-phase-Z 321 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:13] <icinga-wm>	 RECOVERY - ps1-a2-eqiad-infeed-load-tower-B-phase-Y on ps1-a2-eqiad is OK: SNMP OK - ps1-a2-eqiad-infeed-load-tower-B-phase-Y 318 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:13] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-A-phase-Y on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-A-phase-Y 199 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:13] <icinga-wm>	 RECOVERY - ps1-b1-eqiad-infeed-load-tower-A-phase-Z on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-A-phase-Z 346 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:37] <moritzm>	 !log reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
[08:04:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:08] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10ayounsi) Can I suggest a few modifications to the PDU swap checklist of each task? Mostly to clear out the alerting noise Under: "schedule downtime for the entire list of...
[08:18:45] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10wiki_willy) Hi @ayounsi - I talked to a couple other people who had the same concern the other day, and I agree as well...so I started scheduling downtime for the PDU ale...
[08:28:29] <logmsgbot>	 !log jmm@cumin1001 START - Cookbook sre.hosts.downtime
[08:28:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:45] <logmsgbot>	 !log jmm@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[08:30:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:21] <moritzm>	 !log remove kafka1001-1003 from debmonitor DB (T235125)
[08:32:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:24] <stashbot>	 T235125: Move kafka200[123] to logstash202[012] - https://phabricator.wikimedia.org/T235125
[08:34:04] <moritzm>	 !log remove kafka2001-2003 from debmonitor DB (T235125)
[08:34:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:36] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10wiki_willy) I'll dig around a bit and check with Dell to see if we can figure why Com1 and Com2 have to be flipped to get it working.  Talked to Luca and wo...
[08:40:57] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] "see comments inline" (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: 10Mathew.onipe)
[08:55:50] <wikibugs>	 (03CR) 10Muehlenhoff: [V: 03+2 C: 03+2] Move parsing of Cumin alias/query outside of a global option [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/542125 (owner: 10Muehlenhoff)
[09:08:18] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:12:26] <wikibugs>	 (03CR) 10Gehel: [C: 04-1] "a few more comments inline" (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/540153 (https://phabricator.wikimedia.org/T230588) (owner: 10Mathew.onipe)
[09:17:25] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10akosiaris) > The last issue we had with bacula host itself was some sort of storage degradation/failure, no?  Somewhat. A disk in the RAID failed, ending up with the nagios c...
[09:20:23] <wikibugs>	 10Operations, 10serviceops: Increase of varnish-be failed fetches error due to "http format error" - https://phabricator.wikimedia.org/T235254 (10jijiki)
[09:22:45] <wikibugs>	 10Operations, 10observability, 10Availability, 10Goal: Setup bacula backup monitoring - https://phabricator.wikimedia.org/T234900 (10jcrespo) @fgiunchedi I will either start with such brainstorming or maybe some of the technical foundation layers first (script for checking automation), please make sure to fe...
[09:24:04] <wikibugs>	 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff)
[09:25:54] <wikibugs>	 10Operations, 10DBA, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo) @akosiaris We have reached an impasse. We should:  * Run puppet with the new permissions on the current bacula host, fix any issues found. * P...
[09:29:30] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:36:15] <wikibugs>	 (03PS1) 10Muehlenhoff: debdeploy-deploy:  Transitions are optional [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/542331
[09:37:02] <wikibugs>	 (03CR) 10Volans: [V: 03+2 C: 03+2] "LGTM, bypassing CI for the sphinx issue with requests." [software/spicerack] - 10https://gerrit.wikimedia.org/r/542090 (https://phabricator.wikimedia.org/T147074) (owner: 10Jbond)
[09:37:10] <wikibugs>	 (03PS2) 10Volans: ipmi: The change to subprocess.run() failed to capture stdout [software/spicerack] - 10https://gerrit.wikimedia.org/r/542090 (https://phabricator.wikimedia.org/T147074) (owner: 10Jbond)
[09:37:54] <wikibugs>	 (03CR) 10Volans: [V: 03+2 C: 03+2] ipmi: The change to subprocess.run() failed to capture stdout [software/spicerack] - 10https://gerrit.wikimedia.org/r/542090 (https://phabricator.wikimedia.org/T147074) (owner: 10Jbond)
[09:38:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/542153 (owner: 10CDanis)
[09:38:56] <wikibugs>	 (03CR) 10jenkins-bot: ipmi: The change to subprocess.run() failed to capture stdout [software/spicerack] - 10https://gerrit.wikimedia.org/r/542090 (https://phabricator.wikimedia.org/T147074) (owner: 10Jbond)
[09:42:35] <wikibugs>	 (03PS1) 10Elukey: eventlogging::dependencies: add python3 dependencies [puppet] - 10https://gerrit.wikimedia.org/r/542333 (https://phabricator.wikimedia.org/T233231)
[09:43:15] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: aptly: add buster-tools repository [puppet] - 10https://gerrit.wikimedia.org/r/542334 (https://phabricator.wikimedia.org/T235059)
[09:43:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] eventlogging::dependencies: add python3 dependencies [puppet] - 10https://gerrit.wikimedia.org/r/542333 (https://phabricator.wikimedia.org/T233231) (owner: 10Elukey)
[09:47:06] <wikibugs>	 (03PS2) 10Elukey: eventlogging::dependencies: add python3 dependencies [puppet] - 10https://gerrit.wikimedia.org/r/542333 (https://phabricator.wikimedia.org/T233231)
[09:49:19] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: aptly: add buster-tools repository [puppet] - 10https://gerrit.wikimedia.org/r/542334 (https://phabricator.wikimedia.org/T235059) (owner: 10Arturo Borrero Gonzalez)
[09:52:10] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] eventlogging::dependencies: add python3 dependencies [puppet] - 10https://gerrit.wikimedia.org/r/542333 (https://phabricator.wikimedia.org/T233231) (owner: 10Elukey)
[09:52:22] <wikibugs>	 (03PS3) 10Elukey: eventlogging::dependencies: add python3 dependencies [puppet] - 10https://gerrit.wikimedia.org/r/542333 (https://phabricator.wikimedia.org/T233231)
[09:52:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] debdeploy-deploy:  Transitions are optional [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/542331 (owner: 10Muehlenhoff)
[09:56:05] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: aptly: add buster-toolsbeta repository [puppet] - 10https://gerrit.wikimedia.org/r/542348 (https://phabricator.wikimedia.org/T235059)
[09:56:41] <elukey>	 gerrit in trouble?
[09:56:43] <elukey>	 https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?panelId=16&fullscreen&orgId=1
[09:56:47] <elukey>	 super slow for me
[09:57:45] <elukey>	 hashar: --^
[09:58:47] <elukey>	 it doesn't load anymore for me
[09:59:05] <elukey>	 threads are climbing
[09:59:07] <hashar>	 elukey: looking
[10:02:31] <hashar>	 !log gerrit: killed a stalled SendEmail thread that was holding a lock
[10:02:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:34] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: aptly: add buster-toolsbeta repository [puppet] - 10https://gerrit.wikimedia.org/r/542348 (https://phabricator.wikimedia.org/T235059) (owner: 10Arturo Borrero Gonzalez)
[10:04:42] <elukey>	 thanks!
[10:04:42] <hashar>	 fun 
[10:04:48] <hashar>	 the deadlock is gone for new requests
[10:04:56] <hashar>	 but the lock is still held anyway for the other http threads
[10:04:58] <hashar>	 :\
[10:05:14] <elukey>	 should we restart?
[10:06:05] <wikibugs>	 (03PS1) 10Elukey: eventlogging::dependencies: remove python3-pykafka dependency [puppet] - 10https://gerrit.wikimedia.org/r/542401 (https://phabricator.wikimedia.org/T233231)
[10:06:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "It needs to be a string that is referenced in hieradata/common/lvs/configuration.yaml in the corresponding LVS entry under the conftool st" [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[10:06:48] <wikibugs>	 (03PS2) 10Elukey: eventlogging::dependencies: remove python3-pykafka dependency [puppet] - 10https://gerrit.wikimedia.org/r/542401 (https://phabricator.wikimedia.org/T233231)
[10:07:25] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] eventlogging::dependencies: remove python3-pykafka dependency [puppet] - 10https://gerrit.wikimedia.org/r/542401 (https://phabricator.wikimedia.org/T233231) (owner: 10Elukey)
[10:08:46] <elukey>	 hashar: gerrit is still half usable for me :(
[10:08:57] <hashar>	 yes
[10:09:01] <hashar>	 going to restart it
[10:11:15] <hashar>	 !log Restarting Gerrit # T224448
[10:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:19] <stashbot>	 T224448: Gerrit account cache has a faulty reentrant lock causing http/sendemail threads to stall completely - https://phabricator.wikimedia.org/T224448
[10:15:54] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[10:17:26] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[10:17:56] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01156 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[10:18:00] <paladox>	 Oh threads problem again?
[10:18:09] <moritzm>	 !log imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
[10:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:35] <hashar>	 paladox: yes :-\
[10:28:54] <paladox>	 Ok :(
[10:31:27] <paladox>	 hashar: we’ll need to get https://gerrit-review.googlesource.com/c/gerrit/+/239436 deployed!
[10:33:32] <hashar>	 possibly yeah
[10:34:58] <hashar>	 what I am wondering is that maybe the deadlock occurs way above. Eg in the thread pool executor
[10:35:09] <hashar>	 so that potentially two tasks end up locking it for some reason
[10:35:14] <hashar>	 and they end up waiting on each other
[10:35:24] <hashar>	 with the jvm magically not detecting it :\
[10:38:25] <hashar>	 paladox: but maybe we can take a heap dump (that is going to take a while and be large I guess) and then find a way to debug it
[10:45:22] <icinga-wm>	 RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.003613 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[11:01:06] <wikibugs>	 10Operations, 10DBA: Switchover s1 primary database master db1067 -> db1083 - 14th Nov 05:00 - 05:30 UTC - https://phabricator.wikimedia.org/T234800 (10Johan) In that case, we don't need to take the less efficient way of writing in Tech News, better to contact the wiki directly.
[11:08:21] <moritzm>	 !log upgrading debdeploy to 0.0.99.11
[11:08:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:54] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (10MoritzMuehlenhoff) Try running   ` ssh-add ~/.ssh/id_ed25519  ` It will ask you for the passphrase of your SSH key. After doing that, can you retry...
[11:33:23] <wikibugs>	 (03PS8) 10Arturo Borrero Gonzalez: toolforge: introduce new proxy role [puppet] - 10https://gerrit.wikimedia.org/r/508560 (https://phabricator.wikimedia.org/T219362)
[11:33:30] <hauskater>	 Urbanecm: you there?
[11:33:39] <Urbanecm>	 hauskater: yes, how may I help?
[11:34:55] <hauskater>	 Urbanecm: any issues with the job queue? Two global renames refusing to start
[11:34:58] <hauskater>	 both on enwiki
[11:35:15] * Urbanecm is opening logstash
[11:37:36] <Urbanecm>	 Haydenb13's rename seems to be done on enwiki, btw
[11:38:00] <Urbanecm>	 seems to be a temporary issue :)
[11:38:04] <Urbanecm>	 both are done on my end
[11:38:09] <Urbanecm>	 hauskater: 
[11:39:37] * hauskater checks on his
[11:39:55] <hauskater>	 Yup, both seem now to have completed
[11:40:00] <hauskater>	 After ~20 minutes
[11:40:02] <hauskater>	 :)
[11:40:07] <hauskater>	 Busy queue maybe
[11:46:15] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/508560 (https://phabricator.wikimedia.org/T219362) (owner: Arturo Borrero Gonzalez)
[11:51:06] <moritzm>	 !log installing unzip security updates on stretch
[11:51:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:24] <wikibugs>	 (PS1) Muehlenhoff: Add library hint for libcaca [puppet] - https://gerrit.wikimedia.org/r/542413
[11:57:42] <wikibugs>	 Operations, Analytics, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (MGerlach) That solved it. Thanks.
[12:00:43] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add library hint for libcaca [puppet] - https://gerrit.wikimedia.org/r/542413 (owner: Muehlenhoff)
[12:22:35] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "We already have a wmgWikibaseRepoEnableRefTabs setting that’s used to enable ref tabs in beta – it would be better to use that setting for" [mediawiki-config] - https://gerrit.wikimedia.org/r/514461 (https://phabricator.wikimedia.org/T199197) (owner: Mvolz)
[12:24:39] <XioNoX>	 !log push firewall policies to pfw3-codfw - T235074
[12:24:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:51] <wikibugs>	 (PS1) Muehlenhoff: Sort distros in generate-debdeploy-spec [debs/debdeploy] - https://gerrit.wikimedia.org/r/542417
[12:25:37] <wikibugs>	 (PS10) Lucas Werkmeister (WMDE): Enable reftabs on testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/514461 (https://phabricator.wikimedia.org/T199197) (owner: Mvolz)
[12:25:50] <XioNoX>	 !log push firewall policies to pfw3-eqiad - T235074
[12:25:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:02] <moritzm>	 !log installing libcaca security updates on stretch
[12:31:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
[12:32:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:03] <stashbot>	 T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[12:33:38] <moritzm>	 !log installing gsoap security updates on stretch
[12:33:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:48] <moritzm>	 !log installing zsh updates from stretch point release
[12:35:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:59] <moritzm>	 !log installing libxslt security updates
[12:46:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:44] <XioNoX>	 !log disable SIP ALG on pfw3-codfw - T235150
[12:47:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:32] <XioNoX>	 !log disable SIP ALG on pfw3-eqiad - T235150
[12:48:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:23] <wikibugs>	 Operations: Integrate Stretch 9.10/9.11 point updates - https://phabricator.wikimedia.org/T232308 (MoritzMuehlenhoff)
[13:01:37] <moritzm>	 !log installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
[13:01:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:40] <wikibugs>	 (PS9) Arturo Borrero Gonzalez: toolforge: introduce new proxy role [puppet] - https://gerrit.wikimedia.org/r/508560 (https://phabricator.wikimedia.org/T219362)
[13:06:45] <wikibugs>	 (CR) jerkins-bot: [V: -1] toolforge: introduce new proxy role [puppet] - https://gerrit.wikimedia.org/r/508560 (https://phabricator.wikimedia.org/T219362) (owner: Arturo Borrero Gonzalez)
[13:09:21] <wikibugs>	 Operations, LDAP-Access-Requests, SRE-Access-Requests, Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (Halfak)
[13:10:31] <wikibugs>	 Operations, LDAP-Access-Requests, SRE-Access-Requests, Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (Halfak) I've updated the task details with some high-level reasoning for the access.  If it's not evident, I app...
[13:17:05] <wikibugs>	 Operations, LDAP-Access-Requests, SRE-Access-Requests, Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (kevinbazira) Thanks @Halfak,  @herron, I've signed the L3 agreement document, and below is my user information:...
[13:42:40] <wikibugs>	 Operations, serviceops: Increase of varnish-be failed fetches error due to "http format error" - https://phabricator.wikimedia.org/T235254 (jijiki) The varnish error rate is back to normal for now, but we should keep an eye out for a similar issue in the future.
[13:57:02] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[13:57:03] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:57:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:47] <moritzm>	 !log rebooting cloudbackup2001
[13:57:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:06] <wikibugs>	 (PS3) Effie Mouzeli: lvs::monitor_services: increase number of tries before MCS is critical [puppet] - https://gerrit.wikimedia.org/r/541891 (https://phabricator.wikimedia.org/T229286)
[14:17:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] lvs::monitor_services: increase number of tries before MCS is critical [puppet] - https://gerrit.wikimedia.org/r/541891 (https://phabricator.wikimedia.org/T229286) (owner: Effie Mouzeli)
[14:22:08] <wikibugs>	 (PS4) Effie Mouzeli: lvs::monitor_services: increase number of tries before MCS is critical [puppet] - https://gerrit.wikimedia.org/r/541891 (https://phabricator.wikimedia.org/T229286)
[14:25:53] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)
[14:29:05] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] lvs::monitor_services: increase number of tries before MCS is critical [puppet] - https://gerrit.wikimedia.org/r/541891 (https://phabricator.wikimedia.org/T229286) (owner: Effie Mouzeli)
[14:29:33] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [debs/debdeploy] - https://gerrit.wikimedia.org/r/542417 (owner: Muehlenhoff)
[14:31:27] <wikibugs>	 (CR) Muehlenhoff: [V: +2 C: +2] Sort distros in generate-debdeploy-spec [debs/debdeploy] - https://gerrit.wikimedia.org/r/542417 (owner: Muehlenhoff)
[14:38:48] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:39:53] <wikibugs>	 (PS1) Muehlenhoff: Print spec file name in generate-debdeploy-spec [debs/debdeploy] - https://gerrit.wikimedia.org/r/542446
[14:43:18] <wikibugs>	 (CR) Muehlenhoff: [V: +2 C: +2] Print spec file name in generate-debdeploy-spec [debs/debdeploy] - https://gerrit.wikimedia.org/r/542446 (owner: Muehlenhoff)
[15:00:02] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:02:26] <wikibugs>	 (CR) Aaron Schulz: "A noop is fine. It means that I change the MW default without breaking prod." [mediawiki-config] - https://gerrit.wikimedia.org/r/521967 (owner: Aaron Schulz)
[15:34:51] <wikibugs>	 (CR) Cwhite: [C: +2] Update probe endpoint to support path and spec_segment [debs/prometheus-swagger-exporter] - https://gerrit.wikimedia.org/r/541683 (owner: Cwhite)
[15:35:30] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:35:52] <wikibugs>	 (PS1) Jhedden: openstack: update eqiad1 clients to wikimediacloud auth url [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907)
[15:36:26] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [image, tfa, mostread] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:36:26] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [tfa, mostread, image] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:36:32] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - wikifeeds_8889: Servers kubernetes1002.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1006.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:36:36] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [image, mostread, tfa] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:36:42] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:36:42] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:04] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - wikifeeds_8889: Servers kubernetes1003.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1006.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:37:10] <icinga-wm>	 PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[15:37:10] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restba
[15:37:10] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:10] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:30] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restba
[15:37:30] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[15:37:54] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:54] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:37:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:38:04] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on wikifeeds.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:39:44] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:39:44] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:39:48] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:39:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:39:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:39:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:40:46] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[15:40:51] <wikibugs>	 (CR) Jhedden: "PCC results: https://puppet-compiler.wmflabs.org/compiler1001/18858/" [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[15:42:48] <elukey>	 shdubsh: o/ can the above alerts be related to your change?
[15:42:50] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:42:50] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:43:00] <elukey>	 or do we have an outage?
[15:43:23] <shdubsh>	 elukey: not related to anything I'm doing.
[15:43:40] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:43:45] <shdubsh>	 maybe an outage.  checking for effects
[15:43:46] <marostegui>	 what's going on?
[15:43:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [tfa, image, mostread] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:04] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[15:44:11] <elukey>	 not sure, there was a change from cole about the swagger prometheus stuff
[15:44:20] <elukey>	 so I thought it was related
[15:44:23] <elukey>	 https://gerrit.wikimedia.org/r/#/c/operations/debs/prometheus-swagger-exporter/+/541683/
[15:44:24] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:24] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:26] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on wikifeeds.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 945 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:44:28] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:40] <wikibugs>	 (CR) Jhedden: "openstack.<region>.wikimediacloud.org is a CNAME to cloudcontrol1003, but it will restart a lot of services." [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[15:44:42] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - wikifeeds_8889: Servers kubernetes1002.eqiad.wmnet, kubernetes1003.eqiad.wmnet, kubernetes1004.eqiad.wmnet, kubernetes1005.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:44:46] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:46] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:56] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:56] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:44:56] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:45:09] <elukey>	 marostegui: I think we should ping more people
[15:45:14] <wikibugs>	 (CR) Jhedden: [C: +1] "On hold until Tuesday, Oct 15 2019" [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[15:45:18] <elukey>	 mobrovac: --^
[15:45:22] <icinga-wm>	 RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:45:24] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restba
[15:45:34] <wikibugs>	 (CR) Jhedden: [C: -1] "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[15:45:36] <marostegui>	 elukey: done
[15:45:55] <shdubsh>	 entrypoint latencies are multi-minute
[15:46:48] <elukey>	 seems only for one endpoint though
[15:46:58] <wikibugs>	 (PS2) Jhedden: openstack: update eqiad1 clients to wikimediacloud auth url [puppet] - https://gerrit.wikimedia.org/r/542452 (https://phabricator.wikimedia.org/T223907)
[15:47:40] <elukey>	 the /en.wikipedia.org/v1/feed/featured
[15:48:02] <mobrovac>	 yup that's wikifeeds acting up
[15:48:03] <mobrovac>	 akosiaris: ^
[15:48:28] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:48:34] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:48:34] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:48:38] <mobrovac>	 aha!
[15:49:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is WARNING: Test Retrieve aggregated feed content for April 29, 2016 responds with unexpected value at path = Missing keys: [tfa, mostread, image] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:18] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on wikifeeds.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.47 and port 8889: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:49:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restba
[15:49:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restba
[15:49:26] <mobrovac>	 yeah no
[15:49:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:34] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:49:42] <icinga-wm>	 PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) is CRITICAL: Test retrieve the most read articles for January 1, 2016 returned the unexpected status 429 (expecting: 200): /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) is CRITICAL: Test retrieve the most-read articles for January 1, 2016 (with aggregated=true) returned the unexpected status 429 (expecting: 200) https://wikitech.wikimedia.org/wiki/Wikifeeds
[15:49:46] <mobrovac>	 akosiaris: ^ i get conn refused if i try manually wikifeeds from a rb host
[15:49:46] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:49:48] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:30] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:50:34] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:35] <mobrovac>	 ok seems to be back to normal
[15:50:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:50:56] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on wikifeeds.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 945 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[15:51:16] <icinga-wm>	 RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[15:51:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:51:50] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[15:53:59] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to 'analytics-privatedata-users' and 'researchers' for Erin Yener - https://phabricator.wikimedia.org/T234529 (10jrobell) Thanks for your help moving this forward @herron.  would it be possible to get on a call or chat with @EYener and @jkumalah to make su...
[15:54:31] <elukey>	 mobrovac: is wikifeeds on k8s or elsewhere?
[15:54:34] <elukey>	 (super ignorant)
[15:54:44] <mobrovac>	 k8s elukey
[15:54:51] <elukey>	 ah lovely
[16:05:46] <apergos>	 do I understand we were in an outage for a bit? because I see no pages even by email
[16:12:56] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create wikimedia sustainability mailing list - https://phabricator.wikimedia.org/T234999 (10Aklapper) As long as the list description explains, the actual name can be short I guess :)
[16:16:00] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:19:58] <icinga-wm>	 PROBLEM - Host ganeti2009 is DOWN: PING CRITICAL - Packet loss = 100%
[16:26:36] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:32:00] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 58.87 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:32:12] <godog>	 I'll take a look at ganeti2009
[16:32:22] <icinga-wm>	 RECOVERY - Host ganeti2009 is UP: PING OK - Packet loss = 0%, RTA = 36.24 ms
[16:32:41] <godog>	 oh never mind, that's being set up, isn't it?
[16:32:52] <godog>	 papaul: ^ ?
[16:33:36] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 78.3 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:42:18] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (10wiki_willy) @Jclark-ctr - this arrived Thursday via https://www.fedex.com/en-us/home.html.  Just a heads up, this will need to be replaced before the PDU upgrade next Tuesday, to retain redundan...
[16:43:48] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (10Marostegui) I believe we don't have to put the host down for the PSU replacement, do we? However I would like to depool and stop mysql before, as a crash with mysql running could cause data corr...
[16:45:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (10wiki_willy) Yup, it should be a hot swap.  So @Jclark-ctr - please reach out to @Marostegui before replacing it.  Thanks, Willy
[16:48:51] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Papaul) @elukey after working 4 hours on this, the problem ended up not being the Serial configuration in the BIOS but the GRUB settings. On the systems we hav...
[16:54:17] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Papaul) I made the change again on an-conf1001, ran systemctl enable getty@ttyS1, and rebooted the system; now it is working, so you can do the same for t...
[16:55:57] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10wiki_willy) Great job @Papaul in troubleshooting this and tracking it down to the root cause.  Thanks!   ~Willy
[17:06:54] <wikibugs>	 (03PS2) 10Herron: admin: add dedcode to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542132 (https://phabricator.wikimedia.org/T234473)
[17:09:13] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Effeietsanders) An estimated 120 emails have now been unsubscribed. It looks like AOL and Yahoo. Is this also happening for other mailing lists?
[17:09:22] <wikibugs>	 (03PS1) 10Jgreen: add frqueue2001 to icinga nsca_frack.cfg.erb [puppet] - 10https://gerrit.wikimedia.org/r/542464 (https://phabricator.wikimedia.org/T232630)
[17:09:58] <wikibugs>	 (03CR) 10Herron: [C: 03+2] admin: add dedcode to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542132 (https://phabricator.wikimedia.org/T234473) (owner: 10Herron)
[17:10:36] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Paladox) When I sent an email to wikitech-i last night, it failed (seems to be because yahoo blacklisted lists.wikimedia.org).
[17:11:49] <wikibugs>	 (03PS2) 10Herron: admin: add jkumalah to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542141 (https://phabricator.wikimedia.org/T234433)
[17:14:13] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install frban2001.codfw.wmnet - https://phabricator.wikimedia.org/T234069 (10Jgreen)
[17:14:16] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen)
[17:14:19] <wikibugs>	 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup/install frnetmon1001 - https://phabricator.wikimedia.org/T232137 (10Jgreen)
[17:14:22] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: rack/setup/install frqueue2001 - https://phabricator.wikimedia.org/T232630 (10Jgreen)
[17:16:45] <wikibugs>	 (03CR) 10Herron: [C: 03+2] admin: add jkumalah to analytics-privatedata-users, researchers [puppet] - 10https://gerrit.wikimedia.org/r/542141 (https://phabricator.wikimedia.org/T234433) (owner: 10Herron)
[17:18:12] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to 'analytics-privatedata-users' and 'researchers' for Erin Yener - https://phabricator.wikimedia.org/T234529 (10herron)
[17:18:27] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: rack/setup/install frqueue2001 - https://phabricator.wikimedia.org/T232630 (10Jgreen)
[17:27:01] <wikibugs>	 (03PS1) 10Cwhite: profile: added swagger exporter jobs at svc endpoints [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[17:28:10] <wikibugs>	 (03PS2) 10Jgreen: add frqueue2001 to icinga nsca_frack.cfg.erb [puppet] - 10https://gerrit.wikimedia.org/r/542464 (https://phabricator.wikimedia.org/T232630)
[17:29:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile: added swagger exporter jobs at svc endpoints [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[17:29:54] <wikibugs>	 (03CR) 10Jgreen: [C: 03+2] add frqueue2001 to icinga nsca_frack.cfg.erb [puppet] - 10https://gerrit.wikimedia.org/r/542464 (https://phabricator.wikimedia.org/T232630) (owner: 10Jgreen)
[17:31:11] <wikibugs>	 10Operations, 10ops-eqiad, 10media-storage, 10User-fgiunchedi: ms-be1020 - host went down - https://phabricator.wikimedia.org/T234698 (10Dzahn)
[17:31:47] <wikibugs>	 (03PS2) 10Cwhite: profile, prometheus, role: install swagger exporter on prometheus nodes [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870)
[17:31:48] <wikibugs>	 10Operations, 10ops-eqiad, 10media-storage, 10User-fgiunchedi: ms-be1020 - firmware upgrade: (was: host went down) - https://phabricator.wikimedia.org/T234698 (10Dzahn)
[17:32:22] <wikibugs>	 (03CR) 10Cwhite: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[17:33:04] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Lea_Lacroix_WMDE) I'm wondering if this is somehow related to the massive spam attack we had a few months ago on some mailing-lists (hundreds of //fake// AOL email addresses subscri...
[17:35:00] <wikibugs>	 (03PS2) 10Cwhite: profile: added swagger exporter jobs at svc endpoints [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870)
[17:35:03] <wikibugs>	 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: rack/setup/install frqueue2001 - https://phabricator.wikimedia.org/T232630 (10Jgreen) [x] bonded ethernet configuration done [x] redis replication appears to be working now that firewall policy is deployed [x] added to icinga
[17:38:23] <wikibugs>	 (03CR) 10Umherirrender: [C: 04-1] "Yes, it looks good on wiki. Lets work with the messages first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530871 (https://phabricator.wikimedia.org/T78711) (owner: 10Umherirrender)
[17:44:45] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] scap: mediawiki logstash_checker [puppet] - 10https://gerrit.wikimedia.org/r/539881 (https://phabricator.wikimedia.org/T234283) (owner: 10Thcipriani)
[17:49:05] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Papaul) @elukey so the issue is that you used   puppet/modules/install_server/files/dhcpd/linux-host-entries.ttyS0-115200   and not   puppet/modules/install_...
[17:50:07] <wikibugs>	 (03PS16) 10Cwhite: ci: define statsd prometheus exporter mappings [puppet] - 10https://gerrit.wikimedia.org/r/479139 (https://phabricator.wikimedia.org/T233089)
[17:50:50] <wikibugs>	 10Operations, 10Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (10Dzahn) @RobH there is a wikitech page you made back in 2012 about the ipmi_mgmt script at  https://wikitech.wikimedia.org/wiki/Systems_management.   Is that still used?  Would it make sense...
[17:52:03] <wikibugs>	 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10Krinkle) p:05Triage→03High
[17:55:12] <wikibugs>	 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10Krinkle)
[17:57:03] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Aklapper) p:05High→03Normal >>! In T232417#5566947, @Effeietsanders wrote: > An estimated 120 emails have now been unsubscribed. It looks like AOL and Yahoo. Is this also happe...
[17:57:36] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to 'analytics-privatedata-users' and 'researchers' for Jerrie Kumalah - https://phabricator.wikimedia.org/T234433 (10jkumalah) @herron or @Nuria when i ssh into the stat1007 my password does not seem to work. I tried my yubikey as wel...
[17:58:45] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Paladox) Is this normal though? Having lists.wikimedia.org blocked by yahoo in my opinion is pretty high priority.
[17:59:25] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10elukey) Thanks a lot for this work! I was not aware of that mistake, and I also have to admit my ignorance about that part of the dhcp configuration. I think t...
[18:02:07] <wikibugs>	 10Operations, 10MediaWiki-General, 10serviceops, 10CPT Initiatives (PHP7 (TEC4)), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Krinkle)
[18:02:11] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: SRE FY19-20 Q1 goal: complete the transition to PHP7 - https://phabricator.wikimedia.org/T219127 (10Krinkle)
[18:02:19] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10elukey) The info is in https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Preparation_2 so I have clearly missed it, but a reference in the FAQ of platform...
[18:04:10] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Papaul) @elukey no need to feel bad about a mistake; we all make mistakes. Just glad that it is fixed.
[18:07:31] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10greg) p:05Triage→03Normal
[18:08:13] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10aezell) I spoke to a friend who still works in this area and they said that spam detection and management is in freefall at Yahoo/AOL right now. They are rapidly defunding that par...
[18:09:12] <wikibugs>	 (03PS1) 10BryanDavis: wikitech: Update hostnames for OpenStack endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542506 (https://phabricator.wikimedia.org/T223907)
[18:10:04] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10elukey) >>! In T227025#5567184, @Papaul wrote: > @elukey no need to feel bad about a mistake; we all make mistakes. Just glad that it is fixed.   Thanks! What I...
[18:11:48] <wikibugs>	 10Operations, 10Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (10Dzahn) @ema @srodlund   > Wikitech has the following list of IPMI related pages: ..  - https://wikitech.wikimedia.org/wiki/Systems_management  pinged author in comment above  - https://wiki...
[18:12:59] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey: (Need By: August 31) rack/setup/install (3) new zookeeper nodes - https://phabricator.wikimedia.org/T227025 (10Papaul) I have no problem with you expanding the documentation : )
[18:19:16] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10MarcoAurelio) @greg Will the list be public or private, with or without archives? Thanks.
[18:25:17] <wikibugs>	 (03PS1) 10Jforrester: build: Upgrade mediawiki-codesniffer to v28.0.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542522
[18:27:22] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] wikitech: Update hostnames for OpenStack endpoints [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542506 (https://phabricator.wikimedia.org/T223907) (owner: 10BryanDavis)
[18:30:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] build: Upgrade mediawiki-codesniffer to v28.0.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542522 (owner: 10Jforrester)
[18:36:27] <wikibugs>	 10Operations, 10Documentation: Document how to fix IPMI issues on Wikitech - https://phabricator.wikimedia.org/T191956 (10Dzahn) - https://wikitech.wikimedia.org/wiki/Systems_management  redirected to [[https://wikitech.wikimedia.org/wiki/Management_Interfaces | Management Interfaces]]
[18:43:16] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to 'analytics-privatedata-users' and 'researchers' for Jerrie Kumalah - https://phabricator.wikimedia.org/T234433 (10Nuria) Your ssh key is the one that should work, but it should not require a password. The machine's whole name is stat1007.eqiad.wmnet so  > s...
[18:48:52] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10greg) Private, with archives.
[18:58:40] <wikibugs>	 (03PS7) 10Dzahn: conftool/LVS: add new service parsoid-php [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654)
[19:07:28] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10Dzahn) @greg List created. I let it create a random pass, then added the secondary admins and ran a "reset password" command.  So you should have received 2 mails and everybody else...
[19:11:04] <mutante>	 @seen hauskater
[19:11:04] <wm-bot>	 mutante: Last time I saw hauskater they were quitting the network with reason: Quit: hauskater N/A at 10/11/2019 6:38:13 PM (32m51s ago)
[19:11:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 04-1] "The idea LGTM, but we'll have to DRY and base the target discovery on puppet resources for the services themselves." [puppet] - 10https://gerrit.wikimedia.org/r/542472 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[19:12:02] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists, 10User-greg: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10Dzahn) a:03greg
[19:14:30] <wikibugs>	 (03PS1) 10Herron: logstash: add an index for deployment related logs [puppet] - 10https://gerrit.wikimedia.org/r/542557 (https://phabricator.wikimedia.org/T234564)
[19:18:31] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists, 10Wikispore: Creation of Wikispore mailing list - https://phabricator.wikimedia.org/T232961 (10Dzahn) 05Open→03Resolved a:03Dzahn List has been created  list info page: https://lists.wikimedia.org/mailman/listinfo/wikispore admin login: https://lists.wikimedia.o...
[19:19:23] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] logstash: add an index for deployment related logs [puppet] - 10https://gerrit.wikimedia.org/r/542557 (https://phabricator.wikimedia.org/T234564) (owner: 10Herron)
[19:21:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM! See inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[19:21:17] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists, 10User-greg: Please create engprod-mgt@ mailing list - https://phabricator.wikimedia.org/T235291 (10greg) 05Open→03Resolved Thanks! done.
[19:21:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: add an index for deployment related logs [puppet] - 10https://gerrit.wikimedia.org/r/542557 (https://phabricator.wikimedia.org/T234564) (owner: 10Herron)
[19:25:15] <wikibugs>	 (03CR) 10Herron: "that was quick!  awesome.  I'll plan to get this rolled out on tuesday, since we're about to go into a long us holiday weekend." [puppet] - 10https://gerrit.wikimedia.org/r/542557 (https://phabricator.wikimedia.org/T234564) (owner: 10Herron)
[19:25:37] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (10Papaul) ` papaul@asw-d-codfw# show | compare  [edit interfaces interface-range vlan-private1-d-codfw] -    member ge-6/0/4; [edit interfaces interface-range disabled]      mem...
[19:26:04] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2056.codfw.wmnet - https://phabricator.wikimedia.org/T230777 (10Papaul)
[19:27:57] <wikibugs>	 (03PS1) 10Dzahn: add service records for new parsoid-php service [dns] - 10https://gerrit.wikimedia.org/r/542566 (https://phabricator.wikimedia.org/T233654)
[19:28:24] <wikibugs>	 (03CR) 10Dzahn: "DNS: https://gerrit.wikimedia.org/r/c/operations/dns/+/542566" [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[19:29:11] <wikibugs>	 (03CR) 10Dzahn: [C: 04-2] "per comment in "discovery-geo-resources" do NOT merge before separate change to hieradata/common/discovery.yaml" [dns] - 10https://gerrit.wikimedia.org/r/542566 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[19:32:51] <wikibugs>	 (03PS1) 10Dzahn: discovery.yaml: add parsoid-php microservice [puppet] - 10https://gerrit.wikimedia.org/r/542572 (https://phabricator.wikimedia.org/T233654)
[19:34:56] <wikibugs>	 (03CR) 10Dzahn: "so it looks like first i have to do https://gerrit.wikimedia.org/r/c/operations/puppet/+/542572  and then the DNS change above and then i " [puppet] - 10https://gerrit.wikimedia.org/r/541377 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[19:40:55] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2057.codfw.wmnet - https://phabricator.wikimedia.org/T230394 (10Papaul)
[19:42:35] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2063.codfw.wmnet - https://phabricator.wikimedia.org/T230704 (10Papaul)
[19:43:46] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:46:57] <wikibugs>	 (03PS13) 10Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/451821
[19:49:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppetmaster/configmaster: convert from apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/451821 (owner: 10Dzahn)
[19:50:05] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Onboarding Reuven Lazarus - https://phabricator.wikimedia.org/T235215 (10Dzahn) OIT reports E-mail account has been created. We can start now with some of these.
[19:55:32] <wikibugs>	 10Operations, 10ops-codfw: No microcode updates loaded on puppetmaster2001/2002 after reimage to Buster - https://phabricator.wikimedia.org/T235250 (10Papaul) @MoritzMuehlenhoff   The system is running : BIOS version :2.01 /available BIOS version: 2.10 Firmware version: 2.30 /available Firmware version:2.63
[19:56:13] <wikibugs>	 10Operations, 10MediaWiki-General, 10serviceops, 10CPT Initiatives (PHP7 (TEC4)), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Gorobay) Many articles beginning with lowercase letters are redirects to arti...
[19:56:40] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:00:24] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10herron) >! In T232417#5567208, @aezell wrote: > tl:dr; Contacting someone in the abuse department at Yahoo/AOL is probably the best bet to figure this out.  Yes indeed this looks t...
[20:06:09] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for db2057 and db2063 [dns] - 10https://gerrit.wikimedia.org/r/542597
[20:08:33] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to 'analytics-privatedata-users' and 'researchers' for Jerrie Kumalah - https://phabricator.wikimedia.org/T234433 (10jkumalah) {F30630945}  Will follow-up with fr-tech teammates. The attached image is what i get each time.
[20:09:27] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] DNS: Remove mgmt DNS for db2057 and db2063 [dns] - 10https://gerrit.wikimedia.org/r/542597 (owner: 10Papaul)
[20:11:46] <wikibugs>	 (03PS3) 10Cwhite: profile, prometheus, role: install swagger exporter on prometheus nodes [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870)
[20:12:01] <wikibugs>	 (03CR) 10Cwhite: profile, prometheus, role: install swagger exporter on prometheus nodes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[20:13:45] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] profile, prometheus, role: install swagger exporter on prometheus nodes [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[20:18:23] <wikibugs>	 10Operations, 10MediaWiki-General, 10serviceops, 10CPT Initiatives (PHP7 (TEC4)), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Anomie) And in some cases the actual article is at the lowercase-letter title...
[20:18:55] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for db2057 and db2063 [dns] - 10https://gerrit.wikimedia.org/r/542597 (owner: 10Papaul)
[20:22:04] <wikibugs>	 (03PS1) 10Herron: admin: add eyener to researchers, analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/542599 (https://phabricator.wikimedia.org/T234529)
[20:23:34] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to 'analytics-privatedata-users' and 'researchers' for Erin Yener - https://phabricator.wikimedia.org/T234529 (10herron) Hi @Nuria could you please review this group request for approval?
[20:27:16] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2057.codfw.wmnet - https://phabricator.wikimedia.org/T230394 (10Papaul)
[20:27:25] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10Papaul)
[20:27:28] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2057.codfw.wmnet - https://phabricator.wikimedia.org/T230394 (10Papaul) 05Open→03Resolved Complete
[20:28:04] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2063.codfw.wmnet - https://phabricator.wikimedia.org/T230704 (10Papaul)
[20:28:07] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to 'analytics-privatedata-users' and 'researchers' for Erin Yener - https://phabricator.wikimedia.org/T234529 (10herron) Regarding chat I'd encourage them to reach out with any questions via IRC.  Details about available channels and...
[20:28:16] <wikibugs>	 10Operations, 10DBA: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (10Papaul)
[20:28:18] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2063.codfw.wmnet - https://phabricator.wikimedia.org/T230704 (10Papaul) 05Open→03Resolved Complete
[20:29:10] <wikibugs>	 10Operations, 10Research, 10SRE-Access-Requests: Requesting access to analytics cluster for Djellel Difallah - https://phabricator.wikimedia.org/T234473 (10herron) 05Open→03Resolved a:03herron Access has been granted.  Transitioning this to resolved now, but if any follow-up is needed please don't hesi...
[20:29:34] <wikibugs>	 10Operations, 10DNS, 10Toolforge, 10Traffic, 10cloud-services-team (Kanban): Update authoritative nameservers for the toolforge.org domain to point to Designate - https://phabricator.wikimedia.org/T235303 (10Krenair) This will need to be communicated to MarkMonitor who register domains on WMF's behalf.....
[20:32:56] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (10herron) Great, thank you!  @Nuria could you please review/approve for analytics groups?  @greg could you please...
[20:33:19] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (10herron)
[20:38:46] <wikibugs>	 (03PS2) 10Jforrester: build: Upgrade mediawiki-codesniffer to v28.0.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542522
[20:40:30] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) - https://phabricator.wikimedia.org/T226782 (10RobH)
[20:41:36] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) - https://phabricator.wikimedia.org/T227133 (10RobH)
[20:41:47] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) - https://phabricator.wikimedia.org/T227538 (10RobH)
[20:41:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC) - https://phabricator.wikimedia.org/T227540 (10RobH)
[20:42:08] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) - https://phabricator.wikimedia.org/T227542 (10RobH)
[20:42:32] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) - https://phabricator.wikimedia.org/T227543 (10RobH)
[20:46:53] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: mass Yahoo / AOL bounces mailman - https://phabricator.wikimedia.org/T232417 (10Effeietsanders) Thanks @Lea_Lacroix_WMDE - I didn't look thoroughly enough at the set of people being affected to recognize this pattern and wasn't aware of this issue at other lis...
[21:22:51] <wikibugs>	 (03PS1) 10Jhedden: openstack: Allow tools-dns-manager to connect from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/542605 (https://phabricator.wikimedia.org/T235304)
[21:24:38] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+1] openstack: Allow tools-dns-manager to connect from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/542605 (https://phabricator.wikimedia.org/T235304) (owner: 10Jhedden)
[21:26:18] <wikibugs>	 (03CR) 10Jhedden: [C: 03+2] openstack: Allow tools-dns-manager to connect from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/542605 (https://phabricator.wikimedia.org/T235304) (owner: 10Jhedden)
[21:26:23] <wikibugs>	 (03CR) 10Alex Monk: [C: 03+1] openstack: Allow tools-dns-manager to connect from labs networks [puppet] - 10https://gerrit.wikimedia.org/r/542605 (https://phabricator.wikimedia.org/T235304) (owner: 10Jhedden)
[21:52:20] <icinga-wm>	 PROBLEM - novaadmin has roles in every project on cloudcontrol1003 is CRITICAL: In tools, user novaadmin should have roles [user, projectadmin] but has [designateadmin, projectadmin, user, admin] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:55:34] <icinga-wm>	 RECOVERY - novaadmin has roles in every project on cloudcontrol1003 is OK: novaadmin has the correct roles in all projects. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:18:33] <wikibugs>	 (03PS1) 10Bstorm: keystone: change monitoring some details to email rather than paging [puppet] - 10https://gerrit.wikimedia.org/r/542610
[22:28:08] <icinga-wm>	 PROBLEM - IPMI Sensor Status on maps1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[22:31:24] <wikibugs>	 (03CR) 10Filippo Giunchedi: "> Patch Set 3: Verified-1" [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[22:31:35] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-jijiki: Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008 - https://phabricator.wikimedia.org/T209181 (10Jclark-ctr) a:05Cmjohnson→03Jclark-ctr
[22:32:49] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10Wikimedia-Logstash, and 3 others: Decommission old eqiad logstash hardware hosts logstash100[456] - https://phabricator.wikimedia.org/T217556 (10Jclark-ctr) a:05Cmjohnson→03Jclark-ctr
[22:33:27] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: decom californium - https://phabricator.wikimedia.org/T189921 (10Jclark-ctr)
[22:34:19] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove mgmt DNS for phab1002, astatine and production DNS for astatine [dns] - 10https://gerrit.wikimedia.org/r/542613
[22:35:34] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging, 10decommission: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Jclark-ctr)
[22:36:05] <wikibugs>	 (03PS1) 10Groceryheist: update ssh key for nathante [puppet] - 10https://gerrit.wikimedia.org/r/542614
[22:36:07] <wikibugs>	 (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - 10https://gerrit.wikimedia.org/r/542614 (owner: 10Groceryheist)
[22:36:12] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove mgmt DNS for phab1002, astatine and production DNS for astatine [dns] - 10https://gerrit.wikimedia.org/r/542613 (owner: 10Papaul)
[22:37:02] <wikibugs>	 (03Abandoned) 10Groceryheist: update ssh key for nathante [puppet] - 10https://gerrit.wikimedia.org/r/542614 (owner: 10Groceryheist)
[22:39:52] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission phab1002/WMF4727 - https://phabricator.wikimedia.org/T221391 (10Papaul) 05Open→03Resolved Complete
[22:39:55] <wikibugs>	 10Operations, 10serviceops: setup/install WMF7426 as phab1003.eqiad.wmnet - https://phabricator.wikimedia.org/T221389 (10Papaul)
[22:40:00] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019 (10Papaul)
[22:40:04] <wikibugs>	 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727)  as phab1002 - https://phabricator.wikimedia.org/T195623 (10Papaul)
[22:40:09] <wikibugs>	 10Operations, 10Phabricator, 10Release-Engineering-Team-TODO, 10serviceops, 10Release-Engineering-Team (Development services): Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (10Papaul)
[22:40:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission astatine - https://phabricator.wikimedia.org/T221244 (10Papaul)
[22:40:59] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473 (10Papaul)
[22:41:02] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission astatine - https://phabricator.wikimedia.org/T221244 (10Papaul) 05Open→03Resolved Complete
[22:41:18] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decommission restbase10(0[7-9]|1[0-5]) - https://phabricator.wikimedia.org/T226715 (10Jclark-ctr)
[22:42:45] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops, 10decommission: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10Jclark-ctr)
[22:43:33] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission: Decommission labcontrol1001 & labcontrol1002 - https://phabricator.wikimedia.org/T221817 (10Jclark-ctr)
[22:44:51] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-fgiunchedi: Return graphite100[13] to spares pool (or decom) - https://phabricator.wikimedia.org/T209357 (10Jclark-ctr)
[22:47:53] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10decommission, 10cloud-services-team (Kanban): Decommission labsdb1004.eqiad.wmnet and labsdb1005.eqiad.wmnet - https://phabricator.wikimedia.org/T216749 (10Jclark-ctr)
[22:48:32] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS, 10decommission: Decommission iron - https://phabricator.wikimedia.org/T220505 (10Jclark-ctr)
[22:55:21] <wikibugs>	 (03PS1) 10Groceryheist: Update ssh key for nathante (try 2) [puppet] - 10https://gerrit.wikimedia.org/r/542618
[22:58:29] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-jijiki: Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008 - https://phabricator.wikimedia.org/T209181 (10Jclark-ctr)
[22:59:54] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:00:47] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission, 10User-jijiki: Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008 - https://phabricator.wikimedia.org/T209181 (10Jclark-ctr)
[23:03:20] <wikibugs>	 (03Abandoned) 10Groceryheist: Update ssh key for nathante (try 2) [puppet] - 10https://gerrit.wikimedia.org/r/542618 (owner: 10Groceryheist)
[23:06:53] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10Wikimedia-Logstash, and 3 others: Decommission old eqiad logstash hardware hosts logstash100[456] - https://phabricator.wikimedia.org/T217556 (10Jclark-ctr)
[23:08:04] <wikibugs>	 (03PS1) 10Groceryheist: update ssh key for nathante [puppet] - 10https://gerrit.wikimedia.org/r/542621
[23:08:49] <groceryheist>	 I need to update my ssh key
[23:08:50] <groceryheist>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/542621
[23:10:30] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:16:08] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging, 10decommission: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Papaul) ` papaul@asw2-b-eqiad# show | compare  [edit interfaces] -   ge-5/0/12 { -       description dbproxy1004; -   }
[23:17:11] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging, 10decommission: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Papaul)
[23:22:42] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:23:01] <wikibugs>	 (03PS1) 10Papaul: DNS: Remove DNS for dbproxy1004 and dbproxy1009 [dns] - 10https://gerrit.wikimedia.org/r/542623
[23:25:52] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Remove DNS for dbproxy1004 and dbproxy1009 [dns] - 10https://gerrit.wikimedia.org/r/542623 (owner: 10Papaul)
[23:27:36] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Papaul)
[23:29:29] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Papaul) @Jclark-ctr once you add dbproxy1009 to the decom Sheet, you can resolve the task. Thanks
[23:37:28] <wikibugs>	 (03CR) 10Cwhite: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/541619 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[23:42:05] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: labsdb1009 broken PSU - https://phabricator.wikimedia.org/T233273 (10Jclark-ctr) @Marostegui Received PSU. Would like to replace it Monday.
[23:43:56] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:48:57] <wikibugs>	 (03PS2) 10Umherirrender: Switch to wmf specific run mode for $wgDisableQueryPageUpdate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530871 (https://phabricator.wikimedia.org/T78711)
[23:49:18] <wikibugs>	 (03CR) 10Umherirrender: "Rebased" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/530871 (https://phabricator.wikimedia.org/T78711) (owner: 10Umherirrender)
[23:49:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) - https://phabricator.wikimedia.org/T227138 (10Jclark-ctr) 05Open→03Resolved  updated ps2-a2-eqiad and location  set to active.
[23:49:29] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10Jclark-ctr)