[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T0000).
[00:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[00:00:40] <wikibugs>	 Operations, Gerrit: gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (thcipriani) >>! In T243808#5835393, @Dzahn wrote: > @thcipriani ^ This is back to 94% as of right now after ^.  And it's been downtime for a month.  Is the test instance usable with the current size? >  > Als...
[00:01:51] <wikibugs>	 (CR) Papaul: [C: +2] DHCP: Add MAC address entries for mw2310 to mw2334 [puppet] - https://gerrit.wikimedia.org/r/570166 (https://phabricator.wikimedia.org/T241852) (owner: Papaul)
[00:03:04] <wikibugs>	 Operations, DBA, Privacy Engineering, Traffic, and 4 others: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499 (JFishback_WMF)
[00:05:54] <wikibugs>	 (PS1) Legoktm: Remove initial attempt at libraryupgrader puppetization [puppet] - https://gerrit.wikimedia.org/r/570169 (https://phabricator.wikimedia.org/T173478)
[00:47:01] <wikibugs>	 Operations, ops-codfw, serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (Papaul)
[00:57:16] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [debs/debdeploy] - https://gerrit.wikimedia.org/r/570054 (owner: Muehlenhoff)
[01:32:34] <wikibugs>	 Operations, WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Mailing-lists: Delete mailing list "worldcup2018" - https://phabricator.wikimedia.org/T244316 (Aklapper)
[01:40:52] <wikibugs>	 Operations, ops-eqiad, DBA, DC-Ops: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (jcrespo) > > > yes, this seems to be an issue > I hope you understood this was a rant directed towards the machine/vendor only and for background info. I don't think we will get rid of it until we...
[01:52:13] <wikibugs>	 (PS1) Jdlrobson: Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405)
[01:53:19] <wikibugs>	 (CR) jerkins-bot: [V: -1] Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[01:58:02] <wikibugs>	 (CR) Jdlrobson: "composer buildDBLists is not working for me locally so am hoping deployer can help me fix that part." [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[02:03:43] <wikibugs>	 Operations, Traffic, Wikimedia-Blog, HTTPS: Change automatic shortlink in blog theme - https://phabricator.wikimedia.org/T165511 (Varnent) Open→Declined This site has been closed and is no longer being actively developed.
[02:06:03] <wikibugs>	 Operations, Traffic, Wikimedia-Blog, HTTPS: make blog links from wmfwiki front page use HTTPS links - https://phabricator.wikimedia.org/T104728 (Varnent)
[02:12:22] <wikibugs>	 (PS1) Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[02:13:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: Jdlrobson)
[02:19:49] <wikibugs>	 Operations, WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Mailing-lists: Delete mailing list "worldcup2018" - https://phabricator.wikimedia.org/T244316 (Zoranzoki21) Yes, this is right!
[02:38:42] <cdanis>	 !log T243634 ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕤🍺 sudo varnish-frontend-restart
[02:38:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:47] <stashbot>	 T243634: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634
[03:17:54] <wikibugs>	 Operations, ops-codfw, serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (Papaul) |servers|ready for service| |mw2310|yes| |mw2311|yes| |mw2312|yes| |mw2313|yes| |mw2314|yes| |mw2315|yes| |mw2316|yes| |mw2317|yes| |mw2318|yes| |mw2319|yes| |mw2320|yes| |m...
[05:07:56] <wikibugs>	 (PS2) Ammarpad: Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[05:22:00] <wikibugs>	 (CR) Jdlrobson: [C: -1] "Note to deployer: I only want to test this config change on wikimedia debug - I don't want to deploy this right now. I'll sync with deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[05:27:17] <icinga-wm>	 PROBLEM - Maps - OSM synchronization lag - codfw on icinga1001 is CRITICAL: 1.002e+06 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[05:43:59] <icinga-wm>	 PROBLEM - Maps - OSM synchronization lag - eqiad on icinga1001 is CRITICAL: 1.003e+06 ge 2.592e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[06:03:37] <wikibugs>	 (PS1) Marostegui: db2086: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/570192 (https://phabricator.wikimedia.org/T239453)
[06:04:47] <wikibugs>	 (CR) Marostegui: [C: +2] db2086: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/570192 (https://phabricator.wikimedia.org/T239453) (owner: Marostegui)
[06:07:55] <wikibugs>	 (PS1) Marostegui: db1098: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/570194 (https://phabricator.wikimedia.org/T239453)
[06:09:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2085:3311, db2086:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10311 and previous config saved to /var/cache/conftool/dbconfig/20200205-060911-marostegui.json
[06:09:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:16] <stashbot>	 T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:09:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1098:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10312 and previous config saved to /var/cache/conftool/dbconfig/20200205-060942-marostegui.json
[06:09:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:53] <wikibugs>	 (CR) Marostegui: [C: +2] db1098: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/570194 (https://phabricator.wikimedia.org/T239453) (owner: Marostegui)
[06:12:53] <marostegui>	 !log Remove partitions from revision table db1098:3317 - T239453
[06:12:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:38] <icinga-wm>	 PROBLEM - DPKG on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:27:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:27:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:27:42] <icinga-wm>	 PROBLEM - puppet last run on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:27:42] <icinga-wm>	 PROBLEM - puppet last run on ores2007 is CRITICAL: connect to address 10.192.48.88 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:27:42] <icinga-wm>	 PROBLEM - dhclient process on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:44] <icinga-wm>	 PROBLEM - Check systemd state on ores1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:27:44] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:27:44] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:27:46] <icinga-wm>	 PROBLEM - DPKG on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:27:46] <icinga-wm>	 PROBLEM - Check systemd state on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:27:46] <icinga-wm>	 PROBLEM - configured eth on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:27:48] <icinga-wm>	 PROBLEM - dhclient process on ores1002 is CRITICAL: connect to address 10.64.0.52 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:50] <icinga-wm>	 PROBLEM - Check systemd state on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:27:50] <icinga-wm>	 PROBLEM - MD RAID on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:27:50] <icinga-wm>	 PROBLEM - Disk space on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1001&var-datasource=eqiad+prometheus/ops
[06:27:50] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:27:51] <icinga-wm>	 PROBLEM - configured eth on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:27:51] <icinga-wm>	 PROBLEM - puppet last run on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:27:52] <icinga-wm>	 PROBLEM - dhclient process on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:52] <icinga-wm>	 PROBLEM - puppet last run on ores1004 is CRITICAL: connect to address 10.64.16.95 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:27:52] <icinga-wm>	 PROBLEM - DPKG on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:27:53] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:27:54] <icinga-wm>	 PROBLEM - MD RAID on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:27:54] <icinga-wm>	 PROBLEM - Disk space on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2008&var-datasource=codfw+prometheus/ops
[06:27:54] <icinga-wm>	 PROBLEM - dhclient process on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:55] <icinga-wm>	 PROBLEM - puppet last run on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:27:55] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:27:56] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:27:56] <icinga-wm>	 PROBLEM - Check systemd state on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:27:57] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:27:57] <icinga-wm>	 PROBLEM - configured eth on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:27:58] <icinga-wm>	 PROBLEM - dhclient process on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:58] <icinga-wm>	 PROBLEM - dhclient process on ores2007 is CRITICAL: connect to address 10.192.48.88 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:27:59] <icinga-wm>	 PROBLEM - DPKG on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:27:59] <icinga-wm>	 PROBLEM - configured eth on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:00] <icinga-wm>	 PROBLEM - dhclient process on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:28:00] <icinga-wm>	 PROBLEM - MD RAID on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:01] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:01] <icinga-wm>	 PROBLEM - configured eth on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:02] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores1002 is CRITICAL: connect to address 10.64.0.52 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:02] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:03] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:03] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:04] <icinga-wm>	 PROBLEM - Check systemd state on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:04] <icinga-wm>	 PROBLEM - configured eth on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:05] <icinga-wm>	 PROBLEM - dhclient process on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:28:06] <icinga-wm>	 PROBLEM - Disk space on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2001&var-datasource=codfw+prometheus/ops
[06:28:06] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:06] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2007 is CRITICAL: connect to address 10.192.48.88 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:07] <icinga-wm>	 PROBLEM - DPKG on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:08] <icinga-wm>	 PROBLEM - configured eth on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:10] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores1004 is CRITICAL: connect to address 10.64.16.95 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:10] <icinga-wm>	 PROBLEM - DPKG on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:12] <icinga-wm>	 PROBLEM - DPKG on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:12] <icinga-wm>	 PROBLEM - DPKG on ores1002 is CRITICAL: connect to address 10.64.0.52 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:12] <icinga-wm>	 PROBLEM - MD RAID on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:16] <icinga-wm>	 PROBLEM - Disk space on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2005&var-datasource=codfw+prometheus/ops
[06:28:20] <icinga-wm>	 PROBLEM - configured eth on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:20] <icinga-wm>	 PROBLEM - MD RAID on ores1004 is CRITICAL: connect to address 10.64.16.95 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:24] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:24] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:24] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:26] <icinga-wm>	 PROBLEM - Check systemd state on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:30] <icinga-wm>	 PROBLEM - puppet last run on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:28:32] <icinga-wm>	 PROBLEM - dhclient process on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:28:32] <icinga-wm>	 PROBLEM - Disk space on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1003&var-datasource=eqiad+prometheus/ops
[06:28:32] <icinga-wm>	 PROBLEM - Disk space on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1005&var-datasource=eqiad+prometheus/ops
[06:28:32] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:34] <icinga-wm>	 PROBLEM - DPKG on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:40] <icinga-wm>	 PROBLEM - Check systemd state on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:40] <icinga-wm>	 PROBLEM - Check systemd state on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:40] <icinga-wm>	 PROBLEM - configured eth on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:43] <icinga-wm>	 PROBLEM - MD RAID on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:46] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores2005 is CRITICAL: connect to address 10.192.32.173 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:46] <icinga-wm>	 PROBLEM - dhclient process on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:28:50] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:50] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores1002 is CRITICAL: connect to address 10.64.0.52 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:50] <icinga-wm>	 PROBLEM - Disk space on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2009&var-datasource=codfw+prometheus/ops
[06:28:52] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:28:52] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores1002 is CRITICAL: connect to address 10.64.0.52 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:52] <icinga-wm>	 PROBLEM - Check systemd state on ores1004 is CRITICAL: connect to address 10.64.16.95 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:52] <icinga-wm>	 PROBLEM - DPKG on ores2009 is CRITICAL: connect to address 10.192.48.90 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:52] <icinga-wm>	 PROBLEM - MD RAID on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:54] <icinga-wm>	 PROBLEM - DPKG on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:54] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores2002 is CRITICAL: connect to address 10.192.0.18 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:28:56] <icinga-wm>	 PROBLEM - configured eth on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:28:56] <icinga-wm>	 PROBLEM - dhclient process on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:28:56] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:28:56] <icinga-wm>	 PROBLEM - DPKG on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:28:56] <icinga-wm>	 PROBLEM - MD RAID on ores2006 is CRITICAL: connect to address 10.192.32.174 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:28:58] <icinga-wm>	 PROBLEM - Disk space on ores1009 is CRITICAL: connect to address 10.64.48.28 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1009&var-datasource=eqiad+prometheus/ops
[06:28:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:00] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:06] <icinga-wm>	 PROBLEM - dhclient process on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:29:06] <icinga-wm>	 PROBLEM - Check size of conntrack table on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:06] <icinga-wm>	 PROBLEM - Check systemd state on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:18] <icinga-wm>	 RECOVERY - dhclient process on ores2007 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:29:46] <icinga-wm>	 PROBLEM - Check systemd state on ores2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:48] <icinga-wm>	 PROBLEM - puppet last run on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:30:14] <icinga-wm>	 RECOVERY - DPKG on ores2006 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:14] <icinga-wm>	 RECOVERY - MD RAID on ores2006 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:30:24] <icinga-wm>	 RECOVERY - dhclient process on ores2006 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:26] <icinga-wm>	 RECOVERY - Check systemd state on ores1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:28] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on ores1004 is CRITICAL: connect to address 10.64.16.95 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[06:30:48] <icinga-wm>	 RECOVERY - configured eth on ores2006 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:31:06] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2006 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:31:14] <icinga-wm>	 PROBLEM - puppet last run on ores2008 is CRITICAL: connect to address 10.192.48.89 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:31:24] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:31:24] <icinga-wm>	 PROBLEM - Check systemd state on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:24] <icinga-wm>	 PROBLEM - MD RAID on ores2001 is CRITICAL: connect to address 10.192.0.12 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:31:24] <icinga-wm>	 PROBLEM - Check systemd state on ores2004 is CRITICAL: connect to address 10.192.16.64 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:32:22] <icinga-wm>	 PROBLEM - puppet last run on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:22] <icinga-wm>	 PROBLEM - puppet last run on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:30] <elukey>	 !log force a puppet run on ores* hosts
[06:32:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:48] <icinga-wm>	 PROBLEM - puppet last run on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:52] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:32:52] <icinga-wm>	 RECOVERY - dhclient process on ores2001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:32:56] <icinga-wm>	 RECOVERY - Disk space on ores2009 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2009&var-datasource=codfw+prometheus/ops
[06:32:58] <icinga-wm>	 RECOVERY - DPKG on ores2009 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:33:00] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2002 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:00] <icinga-wm>	 RECOVERY - configured eth on ores2001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:33:00] <icinga-wm>	 RECOVERY - dhclient process on ores2004 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:33:08] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2005 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:08] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2009 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:08] <icinga-wm>	 RECOVERY - puppet last run on ores2007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:08] <icinga-wm>	 RECOVERY - puppet last run on ores2005 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:12] <icinga-wm>	 RECOVERY - Check systemd state on ores2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:33:14] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2009 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:16] <icinga-wm>	 RECOVERY - Check systemd state on ores2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:33:16] <icinga-wm>	 RECOVERY - configured eth on ores2002 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:33:16] <icinga-wm>	 RECOVERY - puppet last run on ores2002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:22] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:22] <icinga-wm>	 RECOVERY - Disk space on ores2008 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2008&var-datasource=codfw+prometheus/ops
[06:33:22] <icinga-wm>	 RECOVERY - MD RAID on ores2005 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:33:24] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:26] <icinga-wm>	 RECOVERY - dhclient process on ores2005 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:33:26] <icinga-wm>	 RECOVERY - DPKG on ores2008 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:33:28] <icinga-wm>	 RECOVERY - configured eth on ores2005 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:33:28] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2008 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:30] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores2004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:33:32] <icinga-wm>	 RECOVERY - Check systemd state on ores2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:33:34] <icinga-wm>	 RECOVERY - dhclient process on ores2008 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:33:34] <icinga-wm>	 RECOVERY - Disk space on ores2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2001&var-datasource=codfw+prometheus/ops
[06:33:36] <icinga-wm>	 RECOVERY - DPKG on ores2001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:33:38] <icinga-wm>	 PROBLEM - Disk space on ores1006 is CRITICAL: connect to address 10.64.32.15 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1006&var-datasource=eqiad+prometheus/ops
[06:33:38] <icinga-wm>	 PROBLEM - ores uWSGI web app on ores1003 is CRITICAL: connect to address 10.64.16.94 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:33:38] <icinga-wm>	 RECOVERY - configured eth on ores2008 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:33:40] <icinga-wm>	 RECOVERY - DPKG on ores2002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:33:42] <icinga-wm>	 RECOVERY - DPKG on ores2004 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:33:44] <icinga-wm>	 RECOVERY - MD RAID on ores2008 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:33:48] <icinga-wm>	 RECOVERY - Disk space on ores2005 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores2005&var-datasource=codfw+prometheus/ops
[06:33:54] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2004 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:54] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores2008 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:33:56] <icinga-wm>	 RECOVERY - Check systemd state on ores2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:33:58] <icinga-wm>	 RECOVERY - Check systemd state on ores2008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:34:02] <icinga-wm>	 RECOVERY - dhclient process on ores2009 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:16] <icinga-wm>	 RECOVERY - Check systemd state on ores2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:34:16] <icinga-wm>	 RECOVERY - MD RAID on ores2001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:34:22] <icinga-wm>	 RECOVERY - Check systemd state on ores1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:34:30] <icinga-wm>	 RECOVERY - Disk space on ores1009 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1009&var-datasource=eqiad+prometheus/ops
[06:34:30] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores1003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:34:34] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores1001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:34:34] <icinga-wm>	 RECOVERY - DPKG on ores1005 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:34:40] <icinga-wm>	 RECOVERY - dhclient process on ores1003 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:40] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores1006 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:34:44] <icinga-wm>	 RECOVERY - DPKG on ores1006 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:34:46] <icinga-wm>	 RECOVERY - dhclient process on ores1002 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:48] <icinga-wm>	 RECOVERY - Check systemd state on ores1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:34:48] <icinga-wm>	 RECOVERY - MD RAID on ores1001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:34:50] <icinga-wm>	 RECOVERY - Disk space on ores1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1001&var-datasource=eqiad+prometheus/ops
[06:34:50] <icinga-wm>	 RECOVERY - configured eth on ores1009 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:34:50] <icinga-wm>	 RECOVERY - dhclient process on ores1005 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:50] <icinga-wm>	 RECOVERY - DPKG on ores1003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:34:54] <icinga-wm>	 RECOVERY - dhclient process on ores1001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:54] <icinga-wm>	 RECOVERY - configured eth on ores1003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:34:56] <icinga-wm>	 RECOVERY - configured eth on ores1001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:34:56] <icinga-wm>	 RECOVERY - dhclient process on ores1009 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:34:56] <icinga-wm>	 RECOVERY - MD RAID on ores1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:34:58] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores1002 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:34:58] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores1006 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:10] <icinga-wm>	 RECOVERY - Disk space on ores1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1006&var-datasource=eqiad+prometheus/ops
[06:35:10] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores1004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:14] <icinga-wm>	 RECOVERY - DPKG on ores1002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:35:20] <icinga-wm>	 RECOVERY - puppet last run on ores2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:35:22] <icinga-wm>	 RECOVERY - configured eth on ores1006 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:35:22] <icinga-wm>	 RECOVERY - MD RAID on ores1004 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:35:36] <icinga-wm>	 RECOVERY - Disk space on ores1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1003&var-datasource=eqiad+prometheus/ops
[06:35:36] <icinga-wm>	 RECOVERY - Disk space on ores1005 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1005&var-datasource=eqiad+prometheus/ops
[06:35:36] <icinga-wm>	 RECOVERY - Check size of conntrack table on ores1005 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:35:38] <icinga-wm>	 RECOVERY - DPKG on ores1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:35:42] <icinga-wm>	 RECOVERY - Check systemd state on ores1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:42] <icinga-wm>	 RECOVERY - Check systemd state on ores1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:42] <icinga-wm>	 RECOVERY - configured eth on ores1005 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:35:44] <icinga-wm>	 RECOVERY - MD RAID on ores1009 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:35:46] <icinga-wm>	 RECOVERY - Check systemd state on ores1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:54] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores1005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:54] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores1002 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:54] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ores1009 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:35:56] <icinga-wm>	 RECOVERY - MD RAID on ores1006 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:35:58] <icinga-wm>	 RECOVERY - DPKG on ores1009 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:36:21] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team (Current): Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (10elukey) Happened again today at 06:25UTC when logrotate ran, forced a puppet run on all hosts to recover quickly.
[06:36:46] <icinga-wm>	 RECOVERY - puppet last run on ores2008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:37:54] <icinga-wm>	 RECOVERY - puppet last run on ores1006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:37:54] <icinga-wm>	 RECOVERY - puppet last run on ores1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:38:20] <icinga-wm>	 RECOVERY - puppet last run on ores1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:38:48] <icinga-wm>	 RECOVERY - puppet last run on ores1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:38:50] <icinga-wm>	 RECOVERY - puppet last run on ores1009 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:39:30] <icinga-wm>	 RECOVERY - puppet last run on ores1005 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:40:45] <wikibugs>	 (03PS2) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[06:41:12] <icinga-wm>	 RECOVERY - Check systemd state on ores2006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:41:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[06:42:04] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team (Current): Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (10elukey) The 7 days view shows a nice increase of memory usage:  https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&from=now-7d&to=now&var-datasource=eqiad%20p...
[06:47:18] <wikibugs>	 (03PS3) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[06:48:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[06:51:06] <wikibugs>	 (03PS4) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[06:51:57] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[06:59:44] <wikibugs>	 (03PS5) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[07:00:51] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on ores1004 is OK: OK: synced at Wed 2020-02-05 07:00:50 UTC. https://wikitech.wikimedia.org/wiki/NTP
[07:02:49] <marostegui>	 !log Replay s1 traffic on db1107 (10.4) T242702
[07:02:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:53] <stashbot>	 T242702: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702
[07:15:26] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team (Current): Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (10elukey) There you go: https://tools.wmflabs.org/sal/log/AXAM254BfYQT6VcDATbh  The deployment matches the start of the memory growth, @Halfak do you have any idea what caused...
[07:17:54] <wikibugs>	 10Operations, 10Performance-Team, 10serviceops, 10Wikimedia-production-error: Wiki diffs take over 15s to load - https://phabricator.wikimedia.org/T244058 (10Joe) >>! In T244058#5849290, @aaron wrote: > Links to old (non-current) versions do not use the parser cache. This means that rendering will always...
[07:57:25] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:58:09] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:12:19] <effie>	 marostegui: I have been doing the little clinic work since last week
[08:12:24] <effie>	 can you add me to the topic?
[08:13:24] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Extend Puppet CA Expiry date - https://phabricator.wikimedia.org/T236277 (10Marostegui)
[08:13:27] <marostegui>	 sure
[08:13:29] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team (Current): Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (10akosiaris) T243451 does explain the higher memory usage. It even points out that the higher memory usage is worrisome, however it was deployed anyway.
[08:13:48] <wikibugs>	 (03CR) 10Muehlenhoff: [V: 03+2 C: 03+2] Switch to yaml.safe_load to loading update spec files [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/570054 (owner: 10Muehlenhoff)
[08:14:53] <wikibugs>	 (03PS1) 10Vgutierrez: admin: Add additional SSH key for vgutierrez [puppet] - 10https://gerrit.wikimedia.org/r/570241
[08:32:16] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] admin: Add additional SSH key for vgutierrez [puppet] - 10https://gerrit.wikimedia.org/r/570241 (owner: 10Vgutierrez)
[08:52:17] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 49 probes of 521 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:58:07] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 34 probes of 521 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:05:01] <ema>	 !log add individual FortiGate IPs hitting ulsfo (currently cp4028) to vcl blocked_nets -- trying to identify problematic traffic T243634
[09:05:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:05] <stashbot>	 T243634: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634
[09:15:02] <wikibugs>	 (03PS1) 10Ema: vcl: apply blocked_nets acl before request normalization [puppet] - 10https://gerrit.wikimedia.org/r/570247 (https://phabricator.wikimedia.org/T243634)
[09:20:30] <wikibugs>	 10Operations, 10ops-eqiad, 10Dumps-Generation: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet - https://phabricator.wikimedia.org/T241794 (10ArielGlenn) How does the above ETA look, now that all hands is done and you have a better idea of what's on your plate?
[09:21:07] <wikibugs>	 (03PS1) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[09:22:36] <wikibugs>	 (03PS2) 10Ema: vcl: block requests before Host normalization and switching vcl [puppet] - 10https://gerrit.wikimedia.org/r/570247 (https://phabricator.wikimedia.org/T243634)
[09:25:40] <wikibugs>	 (03PS1) 10Elukey: Add fake passwords for Presto TLS [labs/private] - 10https://gerrit.wikimedia.org/r/570250
[09:25:48] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] vcl: block requests before Host normalization and switching vcl [puppet] - 10https://gerrit.wikimedia.org/r/570247 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema)
[09:26:01] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Add fake passwords for Presto TLS [labs/private] - 10https://gerrit.wikimedia.org/r/570250 (owner: 10Elukey)
[09:26:59] <wikibugs>	 (03CR) 10Ema: [C: 03+2] vcl: block requests before Host normalization and switching vcl [puppet] - 10https://gerrit.wikimedia.org/r/570247 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema)
[09:27:34] <ema>	 elukey: OK to puppet-merge your Presto change?
[09:28:50] <ema>	 elukey: nananananananana?
[09:29:06] <wikibugs>	 (03PS2) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[09:29:52] <elukey>	 ema: ahahah yes!
[09:30:17] <ema>	 elukey: excellent, done :)
[09:31:01] <wikibugs>	 (03CR) 10Muehlenhoff: presto: add kerberos and tls support (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[09:33:00] <wikibugs>	 (03CR) 10Elukey: presto: add kerberos and tls support (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[09:39:53] <wikibugs>	 (03PS1) 10Elukey: Add fake Java keystore/truststore for the Presto test cluster [labs/private] - 10https://gerrit.wikimedia.org/r/570251
[09:40:51] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] Add fake Java keystore/truststore for the Presto test cluster [labs/private] - 10https://gerrit.wikimedia.org/r/570251 (owner: 10Elukey)
[09:41:40] <wikibugs>	 (03PS3) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[09:42:01] <wikibugs>	 (03PS1) 10Vgutierrez: requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570252 (https://phabricator.wikimedia.org/T244236)
[09:44:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570252 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[09:47:15] <wikibugs>	 (03PS2) 10Vgutierrez: requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570252 (https://phabricator.wikimedia.org/T244236)
[09:48:22] <wikibugs>	 (03PS4) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[09:51:21] <effie>	 !log install libmemcached-tools on mc-gp* servers - T240684
[09:51:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:23] <stashbot>	 T240684: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T240684
[09:53:23] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: standard: Add linux-perf to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/570254
[09:57:09] <akosiaris>	 !log upload kubernetes 1.13.12 to apt.wikimedia.org stretch-wikimedia/main T244335
[09:57:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:12] <stashbot>	 T244335: Upgrade production kubernetes clusters to a security supported version - https://phabricator.wikimedia.org/T244335
[09:57:24] <wikibugs>	 (03PS11) 10Giuseppe Lavagetto: Configure forensic logging of Apache requests; enable on beta [puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh)
[09:57:26] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php: raise number of workers on the canaries [puppet] - 10https://gerrit.wikimedia.org/r/570255
[09:57:28] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::php: allow varying the slowlog limit [puppet] - 10https://gerrit.wikimedia.org/r/570256
[09:57:34] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+1] standard: Add linux-perf to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[10:02:04] <wikibugs>	 (03PS1) 10Ema: vcl: temporarily skip Host header normalization for FortiGate [puppet] - 10https://gerrit.wikimedia.org/r/570257 (https://phabricator.wikimedia.org/T243634)
[10:03:41] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] vcl: temporarily skip Host header normalization for FortiGate [puppet] - 10https://gerrit.wikimedia.org/r/570257 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema)
[10:03:47] <wikibugs>	 (03CR) 10Ema: [C: 03+1] requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570252 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[10:04:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 04-1] "I like the idea, but linux-perf doesn't exist on jessie, so we need to make it conditional on stretch and later." [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[10:04:58] <wikibugs>	 (03CR) 10Ema: [C: 03+2] vcl: temporarily skip Host header normalization for FortiGate [puppet] - 10https://gerrit.wikimedia.org/r/570257 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema)
[10:06:05] <ema>	 elukey: can the fake Java keystore/truststore stuff be puppet-merged?
[10:06:14] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570252 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[10:06:21] <elukey>	 ema: yes please :)
[10:06:55] <ema>	 elukey: done!
[10:08:15] <elukey>	 <3
[10:10:44] <Urbanecm>	 !log Run mwscript deleteEqualMessages.php --delete to delete GrowthExperiments' message overrides (cswiki, viwiki, arwiki, kowiki)
[10:10:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:24] <wikibugs>	 (03PS5) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[10:22:58] <wikibugs>	 10Operations, 10serviceops: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (10Joe)
[10:24:08] <effie>	 !log Upload php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 - T236800
[10:24:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:19] <stashbot>	 T236800: Ensure apcu incr/decr are atomic (Upgrade php-apcu) - https://phabricator.wikimedia.org/T236800
[10:24:46] <akosiaris>	 !log T244335 upgrade kubernetes-master on neon.eqiad.wmnet (staging)
[10:24:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:49] <stashbot>	 T244335: Upgrade production kubernetes clusters to a security supported version - https://phabricator.wikimedia.org/T244335
[10:25:02] <wikibugs>	 (03PS1) 10Ema: Revert "vcl: temporarily skip Host header normalization for FortiGate" [puppet] - 10https://gerrit.wikimedia.org/r/570278 (https://phabricator.wikimedia.org/T243634)
[10:26:11] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "vcl: temporarily skip Host header normalization for FortiGate" [puppet] - 10https://gerrit.wikimedia.org/r/570278 (https://phabricator.wikimedia.org/T243634) (owner: 10Ema)
[10:38:05] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 2.451e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[10:38:35] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10akosiaris) I blocked a number of IPs manually on cr3 and cr4 for ulsfo. Command was  `set policy-options prefix-list blackhole4 <prefix>`  for 5 IPs. The prefix list w...
[10:43:24] <ema>	 !log cp4028: varnish-frontend-restart T243634
[10:43:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:27] <stashbot>	 T243634: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634
[10:44:33] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/20616/" [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[10:50:01] <akosiaris>	 !log T244335 upgrade kubernetes-node on kubestage1002.eqiad.wmnet to 1.13.12
[10:50:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:05] <stashbot>	 T244335: Upgrade production kubernetes clusters to a security supported version - https://phabricator.wikimedia.org/T244335
[10:53:08] <wikibugs>	 (03PS16) 10ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[10:53:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) (owner: 10ArielGlenn)
[10:53:30] <wikibugs>	 (03CR) 10Elukey: "Andrew let me know if this makes sense for you. The idea is to use self-signed certs and force HTTPS only to enable kerberos auth. It shou" [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[10:53:37] <akosiaris>	 !log rolling restart of all pods on kubernetes staging cluster to make sure everything is fine after the upgrade
[10:53:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:37] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[11:15:17] <wikibugs>	 (03PS1) 10Matthias Mullie: Re-enable delayed new upload jobs for MachineVision extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570287 (https://phabricator.wikimedia.org/T241072)
[11:16:16] <wikibugs>	 (03CR) 10Cparle: [C: 03+1] Re-enable delayed new upload jobs for MachineVision extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570287 (https://phabricator.wikimedia.org/T241072) (owner: 10Matthias Mullie)
[11:16:18] <wikibugs>	 (03PS1) 10Muehlenhoff: Explicitly add theemin to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570288
[11:19:15] <wikibugs>	 (03PS1) 10Muehlenhoff: Add lvs2009 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570289
[11:20:16] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Revert "Update scaffold template names to use chart name" [deployment-charts] - 10https://gerrit.wikimedia.org/r/570290
[11:20:42] <wikibugs>	 10Operations, 10Traffic: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10ema) Today we've been tackling the "FortiGate" angle (correlation described in T243634#5848297). The host in trouble this morning was cp4028, with 140k FDs at 10:30. In total, 5 different "...
[11:23:56] <wikibugs>	 (03PS2) 10Matthias Mullie: Re-enable delayed new upload jobs for MachineVision extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570287 (https://phabricator.wikimedia.org/T241072)
[11:29:45] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Explicitly add theemin to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570288 (owner: 10Muehlenhoff)
[11:37:48] <wikibugs>	 10Operations, 10Traffic: ulsfo varnish-fe vcache processes overflow on FDs - https://phabricator.wikimedia.org/T243634 (10akosiaris) I just reverted the cr3, cr4 ulsfo change.
[11:41:11] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: hiera: cloud: drop toollabs::external_hostname and toollabs::is_mail_relay [puppet] - 10https://gerrit.wikimedia.org/r/570294 (https://phabricator.wikimedia.org/T244222)
[11:45:14] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "basically a NOOP https://puppet-compiler.wmflabs.org/compiler1001/20617/" [puppet] - 10https://gerrit.wikimedia.org/r/570294 (https://phabricator.wikimedia.org/T244222) (owner: 10Arturo Borrero Gonzalez)
[11:45:37] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:46:33] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 56, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:00:05] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T1200). Please do the needful.
[12:00:05] <jouncebot>	 Jdlrobson and awight: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:26] <Urbanecm>	 I can SWAT today!
[12:00:56] <awight>	 I'd be happy to self-deploy my change, after Jdlrobson's.
[12:01:25] <Urbanecm>	 Jdlrobson: are you around?
[12:01:52] <awight>	 Urbanecm: thanks for taking care of so many of these deployments!  I can also do the other patches, just to simplify the window...
[12:02:27] <Urbanecm>	 awight: hth! Your backport is abandoned in its main version, hope that's okay :)
[12:02:50] <awight>	 ah--nvm, Jdlrobson's patches look non-trivial, so better if a developer is present.  Maybe I should deploy my patch while we wait for Jon?
[12:03:02] <Urbanecm>	 awight: go ahead
[12:03:08] <awight>	 ty
[12:04:23] <Jdlrobson>	 awight: I'm here
[12:04:31] <Jdlrobson>	 Sorry lost track of time
[12:04:36] * Urbanecm waves to Jdlrobson 
[12:04:39] <Jdlrobson>	 hey Urbanecm 
[12:04:48] <Jdlrobson>	 so I actually only want to deploy one of my changes
[12:04:55] <Jdlrobson>	 the other I just need to sync to wikimedia debug
[12:04:56] <Jdlrobson>	 is that possible?
[12:05:01] <Urbanecm>	 Sure
[12:05:03] <awight>	 I'm waiting for Jenkins, so Jdlrobson please go ahead.
[12:05:19] <Jdlrobson>	 the more time I have for testing the better so that suits me :)
[12:05:41] <Jdlrobson>	 https://gerrit.wikimedia.org/r/c/570180/ is the one I just want to test on wikimedia debug but not sync
[12:05:47] <Urbanecm>	 awight: I'll claim mwdebug1002 for Jdlrobson's debug-only change then, so we can leave it there for a longer time
[12:05:53] <Jdlrobson>	 https://gerrit.wikimedia.org/r/c/570186/ should be synced
[12:05:58] <Urbanecm>	 ack
[12:06:00] <Jdlrobson>	 *but I'll need time to test
[12:06:02] <Jdlrobson>	 thanks Urbanecm :)
[12:06:15] <awight>	 Urbanecm: great to hear we can use both hosts again!
[12:06:24] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Switch authdns* to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[12:06:27] <Urbanecm>	 awight: both are broken in the same way :D
[12:06:34] <wikibugs>	 (03PS2) 10Muehlenhoff: Explicitly add theemin to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570288
[12:06:35] <Urbanecm>	 (broken=some log noise)
[12:07:09] <Urbanecm>	 Jdlrobson: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570180 is available at mwdebug1002
[12:07:22] <awight>	 hehe
[12:07:42] <Jdlrobson>	 great! beginning testing!
[12:09:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Explicitly add theemin to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570288 (owner: 10Muehlenhoff)
[12:09:32] <Jdlrobson>	 Urbanecm: are you sure https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/570180/ is working? It doesn't seem to be kicking in
[12:09:37] <Urbanecm>	 looking
[12:09:43] <Jdlrobson>	 (am not sure how dblists work exactly)
[12:10:11] <Urbanecm>	 Jdlrobson: could you try now?
[12:10:23] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/568857 (owner: 10Legoktm)
[12:11:22] <Jdlrobson>	 Urbanecm: nope.. maybe i did something wrong? Looks at config..
[12:12:00] <Urbanecm>	 hmm...
[12:12:49] <Urbanecm>	 it definitely looks applied
[12:12:50] <Jdlrobson>	 Urbanecm: it's possible it is working actually.. just not behaving how i expected
[12:12:52] <Jdlrobson>	 which is fine :)
[12:13:10] <Urbanecm>	 I've now checked, the variable that is controlled by the dblist is definitely changed
[12:13:20] <Jdlrobson>	 great. let me try another test
[12:13:23] <Urbanecm>	 sure
[12:13:40] <Jdlrobson>	 ok yep! thank you
[12:13:41] <Jdlrobson>	 all good!
[12:13:44] <Jdlrobson>	 phew!
[12:13:44] <Urbanecm>	 Jdlrobson: re https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570186, I can't SWAT that. Per https://wikitech.wikimedia.org/wiki/SWAT_deploys, each patch should need only one sync. You need to create several patches, each changing only IS.php, or only CS.php, not both.
[12:14:12] <Urbanecm>	 Jdlrobson: okay. Can I revert that?
[12:14:23] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Add lvs2009 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570289 (owner: 10Muehlenhoff)
[12:15:00] <Jdlrobson>	 yes please Urbanecm 
[12:15:03] <Urbanecm>	 reverting
[12:15:07] <Jdlrobson>	 i'll deploy that next week
[12:15:23] <Urbanecm>	 Jdlrobson: you mean the other patch? 
[12:15:41] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] "Thanks to Urbancm I was able to test this today and can confirm it works as expected. I'll deploy next week once I get the go ahead from e" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: 10Jdlrobson)
[12:15:56] <Jdlrobson>	 Urbanecm: you can revert this one ^
[12:16:10] <Urbanecm>	 Jdlrobson: done
[12:16:22] <Jdlrobson>	 With respect to https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/570186 what do I need to change?
[12:16:29] <Jdlrobson>	 this needs to go out before next deploy or it will cause some issues
[12:16:42] <Jdlrobson>	 so 3 patches?
[12:16:44] <Urbanecm>	 yes
[12:16:56] <wikibugs>	 (03PS1) 10Vgutierrez: Release 0.23 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570303 (https://phabricator.wikimedia.org/T244236)
[12:17:06] <Amir1>	 Urbanecm: once you're done, let me know, I have some backports to deploy
[12:17:20] <Urbanecm>	 we had some issues with deployers syncing stuff in wrong order, so that's why the policy was introduced - dependency tree would make sure patches get merged in correct order.
[12:17:34] <Urbanecm>	 Jdlrobson: let me know once the patches are done
[12:17:36] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1010 is OK: (C)3600 ge (W)1200 ge 1056 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[12:17:39] <Jdlrobson>	 tests/WgConfTestCase.php is just a docs change
[12:17:46] <Jdlrobson>	 do I still need to break it out into a separate commit?
[12:17:54] <Urbanecm>	 Jdlrobson: no - that can be anywhere
[12:17:58] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] "unfortunately this won't work until we enable `rich_data`" [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[12:18:30] <Urbanecm>	 Amir1: ack
[12:19:43] <wikibugs>	 (03PS6) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[12:19:45] <wikibugs>	 (03PS1) 10Jdlrobson: Prepare tests for logo config change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140)
[12:20:58] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Release 0.23 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570303 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:21:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Prepare tests for logo config change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[12:21:38] <wikibugs>	 (03CR) 10Muehlenhoff: "Actually -1ing, role(spare) would introduce base::firewall, while LVSes don't use it and it's non-trivial to get rid of. I'll add a role w" [puppet] - 10https://gerrit.wikimedia.org/r/570289 (owner: 10Muehlenhoff)
[12:21:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 04-1] Add lvs2009 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570289 (owner: 10Muehlenhoff)
[12:22:06] <Urbanecm>	 Jdlrobson: ping me once you're done, please
[12:22:59] <wikibugs>	 (03PS2) 10Jdlrobson: Prepare tests for logo config change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140)
[12:23:22] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: profile::services_proxy: add temporarily entries for k8s services [puppet] - 10https://gerrit.wikimedia.org/r/570306
[12:23:45] <wikibugs>	 (03PS7) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[12:24:30] <Jdlrobson>	 Is https://gerrit.wikimedia.org/r/570304 + https://gerrit.wikimedia.org/r/570186 what you meant? (former updates the tests to pass in both cases). If so I think I'm ready
[12:25:03] <Urbanecm>	 Jdlrobson: i need one commit to change IS.php, and second one CS.php.
[12:25:17] <Jdlrobson>	 but it's just a comment?
[12:25:59] <Urbanecm>	 https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570186 seems to change something outside a comment in both files?
[12:26:11] <Urbanecm>	 IS.php: line 1316, wgLogoHD => wgLogos
[12:26:16] <Urbanecm>	 CS.php: Adds some stuff for back-compat
[12:26:17] <Jdlrobson>	 ohhhhh
[12:26:19] <wikibugs>	 (03PS1) 10Vgutierrez: requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570307 (https://phabricator.wikimedia.org/T244236)
[12:26:21] <wikibugs>	 (03PS1) 10Vgutierrez: Release 0.23 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570308 (https://phabricator.wikimedia.org/T244236)
[12:26:22] <Jdlrobson>	 you are talking about those files not tests
[12:26:23] <wikibugs>	 (03PS1) 10Vgutierrez: debian: Add release 0.23 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570309 (https://phabricator.wikimedia.org/T244236)
[12:26:27] <Urbanecm>	 Jdlrobson: yes
[12:26:34] <Jdlrobson>	 oh my misunderstanding
[12:26:34] <Urbanecm>	 sorry for the confusion
[12:26:36] <Jdlrobson>	 okay that makes more sense
[12:26:55] <Urbanecm>	 awight: I guess you can go ahead as I wait for Jdlrobson
[12:27:16] <awight>	 +1 Urbanecm thanks, my patch is merged so here goes!
[12:27:24] <Urbanecm>	 cool
[12:28:18] <wikibugs>	 10Operations, 10Discovery, 10Traffic, 10Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (10JeanFred)
[12:29:15] <Jdlrobson>	 ok Urbanecm should be 3rd time lucky
[12:29:55] <wikibugs>	 (03PS8) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140)
[12:29:57] <wikibugs>	 (03PS3) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140)
[12:30:18] <wikibugs>	 (03PS4) 10Jdlrobson: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (2/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140)
[12:30:24] <Jdlrobson>	 ok Urbanecm ready when you are
[12:30:55] <wikibugs>	 (03PS1) 10Vgutierrez: install_server: Reimage cp5006 as buster [puppet] - 10https://gerrit.wikimedia.org/r/570310 (https://phabricator.wikimedia.org/T242093)
[12:31:13] <Urbanecm>	 Jdlrobson: cool. Just want to make sure: I hope nothing will break while servers don't see any wgLogoHD for some time?
[12:31:23] <awight>	 my patch works on mwdebug1001, syncing now.
[12:31:32] <Jdlrobson>	 wgLogoHD should be optional so this shouldn't be a problem
[12:31:44] <Urbanecm>	 Ok, makes sense
[12:32:00] <Jdlrobson>	 and temporary removal of wgLogoHD now beats removal of all logos in next week's deploy :)
[12:32:10] <Urbanecm>	 true
[12:32:42] <logmsgbot>	 !log awight@deploy1001 Synchronized php-1.35.0-wmf.18/extensions/Cite: SWAT: [[gerrit:570285|Revert follow standardization (T240858)]] (duration: 01m 13s)
[12:32:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:46] <stashbot>	 T240858: Clean up implementation for "follow" cases - https://phabricator.wikimedia.org/T240858
[12:33:03] <awight>	 Urbanecm: Jdlrobson: I'm all done, thanks!
[12:33:09] <Urbanecm>	 thanks!
[12:33:16] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[12:33:22] <wikibugs>	 (03PS1) 10Ema: Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570311 (https://phabricator.wikimedia.org/T242478)
[12:33:24] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[12:33:28] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570307 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:33:36] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Release 0.23 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570308 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:34:16] <wikibugs>	 (03Merged) 10jenkins-bot: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570186 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[12:34:20] <wikibugs>	 (03Merged) 10jenkins-bot: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (2/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[12:34:44] <wikibugs>	 (03CR) 10Ema: [C: 03+1] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/570310 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[12:34:47] <Urbanecm>	 Jdlrobson: pulled both onto mwdebug1001 in case you want to test it there
[12:35:03] <Jdlrobson>	 yes please
[12:35:10] <Urbanecm>	 lmk if it works correctly
[12:35:46] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: hiera: cloud: tools: drop hiera keys migrated to horizon [puppet] - 10https://gerrit.wikimedia.org/r/570313 (https://phabricator.wikimedia.org/T244222)
[12:36:07] <wikibugs>	 (03Merged) 10jenkins-bot: requests: Use POST-as-GET to fetch the issued certificate [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570307 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:36:38] <wikibugs>	 (03Merged) 10jenkins-bot: Release 0.23 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570308 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:36:49] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hiera: cloud: tools: drop hiera keys migrated to horizon [puppet] - 10https://gerrit.wikimedia.org/r/570313 (https://phabricator.wikimedia.org/T244222) (owner: 10Arturo Borrero Gonzalez)
[12:36:52] <wikibugs>	 (03PS2) 10Vgutierrez: debian: Add release 0.23 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570309 (https://phabricator.wikimedia.org/T244236)
[12:37:35] <Urbanecm>	 Jdlrobson: I see a lot of memcached errors, https://logstash.wikimedia.org/goto/dced2f987d8d2ff9a77fc4948b465114
[12:37:58] <Jdlrobson>	 still looking
[12:38:27] <Urbanecm>	 sure
[12:39:06] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Investigate using the rich_data opsion to support Binary and binary_file for binary data - https://phabricator.wikimedia.org/T236481 (10jbond) for some reason this change is not attached to the ticket https://gerrit.wikimedia.org/r/c/operations/soft...
[12:39:28] <Jdlrobson>	 Urbanecm: are you seeing memcached issues relating to logos?
[12:40:36] <Urbanecm>	 Jdlrobson: no, but I see some related to resourceloader, which is mentioned in your commits
[12:41:10] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: hiera: cloud: tools: drop data for tools-bastion-03 [puppet] - 10https://gerrit.wikimedia.org/r/570314 (https://phabricator.wikimedia.org/T244222)
[12:41:35] <Jdlrobson>	 both changes are on there or just one?
[12:41:39] <Urbanecm>	 both
[12:41:44] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hiera: cloud: tools: drop data for tools-bastion-03 [puppet] - 10https://gerrit.wikimedia.org/r/570314 (https://phabricator.wikimedia.org/T244222) (owner: 10Arturo Borrero Gonzalez)
[12:44:44] <Jdlrobson>	 change LGTM
[12:47:04] <Urbanecm>	 ok
[12:48:54] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s)
[12:48:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:58] <stashbot>	 T232140: Separate out logo handling into square image logos and long text/wordmark banner logos - https://phabricator.wikimedia.org/T232140
[12:48:59] <Urbanecm>	 Jdlrobson: first one synced
[12:49:29] <Urbanecm>	 syncing the second one
[12:50:05] <Jdlrobson>	 Urbanecm: all good so far..
[12:50:32] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: d450288: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s)
[12:50:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:43] <Urbanecm>	 Jdlrobson: and second one is out too
[12:50:46] <Urbanecm>	 thanks for your patience
[12:51:20] <Urbanecm>	 Amir1: air is clear
[12:51:35] <Amir1>	 cool, I'm already testing it in mwdebug1001
[12:51:54] <Jdlrobson>	 Urbanecm: thank you !
[12:51:58] <Urbanecm>	 Happy to help!
[12:55:36] <XioNoX>	 !log disable transit/peering BGP sessions on cr2-eqdfw
[12:55:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:51] <wikibugs>	 (03CR) 10Vgutierrez: "recheck" [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570309 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[12:58:40] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch cescout* to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/570316 (https://phabricator.wikimedia.org/T156955)
[13:00:45] <Jdlrobson>	 Urbanecm:  looks like there may be a problem? https://logstash.wikimedia.org/app/kibana#/discover?_g=h@c8f79bd&_a=h@3dd85cf
[13:00:57] <Amir1>	 !log SWAT needs more time
[13:00:59] <Jdlrobson>	 PHP Notice: Undefined variable: wgLogos in /srv/mediawiki/wmf-config/CommonSettings.php on line 857
[13:00:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:17] <XioNoX>	 !log reboot cr2-eqdfw for software upgrade
[13:01:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:19] <Urbanecm>	 UhOh, the cache issue...
[13:01:30] <Urbanecm>	 Amir1: you fine with me resyncing that?
[13:01:35] <Amir1>	 IS.php?
[13:01:37] <Urbanecm>	 yup
[13:01:51] <Amir1>	 okay
[13:02:02] <Urbanecm>	 submitted
[13:03:06] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 06s)
[13:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:10] <stashbot>	 T232140: Separate out logo handling into square image logos and long text/wordmark banner logos - https://phabricator.wikimedia.org/T232140
[13:03:21] <Urbanecm>	 Jdlrobson: should be fine now
[13:03:39] <wikibugs>	 (03PS3) 10Vgutierrez: debian: Add release 0.23 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570309 (https://phabricator.wikimedia.org/T244236)
[13:04:17] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[13:04:58] <icinga-wm>	 PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:05:32] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Switch cescout* to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/570316 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[13:05:48] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:06:03] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] install_server: Reimage cp5006 as buster [puppet] - 10https://gerrit.wikimedia.org/r/570310 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[13:06:28] <icinga-wm>	 PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:06:59] <XioNoX>	 all of those are expected ^
[13:08:00] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] debian: Add release 0.23 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570309 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[13:08:48] <vgutierrez>	 !log depooling & reimaging cp5006 as buster - T242093
[13:08:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:52] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[13:08:56] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:09:25] <wikibugs>	 (03CR) 10Gilles: "I've created a docker image available on docker hub with all you need, that should be helpful:" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/568646 (https://phabricator.wikimedia.org/T228467) (owner: 10Brion VIBBER)
[13:09:38] <icinga-wm>	 RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:10:37] <XioNoX>	 !log rollback: disable transit/peering BGP sessions on cr2-eqdfw
[13:10:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:06] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5006.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[13:14:00] <wikibugs>	 (03CR) 10Gilles: [C: 04-1] "3 tests are failing with this patch applied, due to visual dissimilarity. For some tests the difference is huge, which might suggest that " [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/568646 (https://phabricator.wikimedia.org/T228467) (owner: 10Brion VIBBER)
[13:15:03] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301|Cache PropertyInfoLookup internally]] (T243955) (duration: 01m 07s)
[13:15:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:06] <stashbot>	 T243955: CachingPropertyInfoLookup doesn't cache lookups internally - https://phabricator.wikimedia.org/T243955
[13:15:37] <XioNoX>	 !log disable transit/peering BGP sessions on cr2-eqord
[13:15:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:57] <wikibugs>	 (03CR) 10Gilles: "As pointed out in the other patch, the docker image I've just made should help you write tests easily: https://wikitech.wikimedia.org/wiki" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/569341 (https://phabricator.wikimedia.org/T166024) (owner: 10Brion VIBBER)
[13:16:12] <vgutierrez>	 !log upload acme-chief 0.23 to apt.wm.o (buster) - T244236
[13:16:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:17] <stashbot>	 T244236: acme-chief is unable to renew certificates against LE staging environment - https://phabricator.wikimedia.org/T244236
[13:17:46] <XioNoX>	 !log increase ospf cost for cr2-eqord links
[13:17:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:43] <wikibugs>	 (03Restored) 10Filippo Giunchedi: cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[13:22:20] <wikibugs>	 (03PS3) 10Filippo Giunchedi: cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585)
[13:22:22] <wikibugs>	 (03PS4) 10Filippo Giunchedi: wip: cassandra logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569564 (https://phabricator.wikimedia.org/T242585)
[13:24:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Added a disclaimer pointing to the ticket after chatting with John, should be good enough to make PCC available again and DTRT in producti" [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[13:24:46] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301|Cache PropertyInfoLookup internally]] (T243955) (duration: 01m 07s)
[13:24:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:50] <stashbot>	 T243955: CachingPropertyInfoLookup doesn't cache lookups internally - https://phabricator.wikimedia.org/T243955
[13:25:16] <XioNoX>	 !log reboot cr2-eqord for software upgrade - yaaaaa
[13:25:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:33] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "minor nit, otherwise LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[13:28:57] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:29:54] <akosiaris>	 !log manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency
[13:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:01] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:31:53] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/6 UP : OSPFv3: 4/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:32:25] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:32:38] <wikibugs>	 (03PS1) 10Jdlrobson: Restore wgLogoHD to wikis without a MinervaCustomLogos defined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570326 (https://phabricator.wikimedia.org/T232140)
[13:33:01] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:33:13] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:33:39] <XioNoX>	 annnnndd it's back!
[13:34:25] <wikibugs>	 (03PS2) 10Ema: Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570311 (https://phabricator.wikimedia.org/T242478)
[13:35:06] <XioNoX>	 !log rollback traffic steering off cr2-eqord
[13:35:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:20] <wikibugs>	 (03PS17) 10ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[13:37:27] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570311 (https://phabricator.wikimedia.org/T242478) (owner: 10Ema)
[13:37:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) (owner: 10ArielGlenn)
[13:38:25] <ema>	 !log cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ T242478
[13:38:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:28] <stashbot>	 T242478: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478
[13:39:17] <Amir1>	 !log EU SWAT is done
[13:39:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:38] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570311 (https://phabricator.wikimedia.org/T242478) (owner: 10Ema)
[13:40:27] <Jdlrobson>	 Amir1: looks like I broke higher dpi logos on quite a few projects with my change.  It's late where I am (am on UTC+8)  but I've asked someone in my team to deploy https://gerrit.wikimedia.org/r/570326 in a later swat window. Letting you know in case anyone raises the issue here.
[13:41:28] <ema>	 !log cp1075: unset Accept-Encoding on origin server requests T242478
[13:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:00] <Amir1>	 Jdlrobson: sure, right now we have another problem though (increased latency) 
[13:42:03] <akosiaris>	 !log undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm
[13:42:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:10] <akosiaris>	 Amir1: wait out a bit 
[13:42:13] <akosiaris>	 I may be to blame
[13:42:36] * Amir1 loves blaming 
[13:42:48] <awight>	 (lol)
[13:43:07] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime
[13:43:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:10] <akosiaris>	 I may also not be to blame ofc. The jump in the memcached gets probably exonerates me, however
[13:43:31] <akosiaris>	 yeah, doesn't look like it's my change
[13:45:29] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:45:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:51] <marostegui>	 !log Decrease buffer pool size on db1107 for testing - T242702
[13:46:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:54] <stashbot>	 T242702: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702
[13:48:27] <wikibugs>	 (03PS18) 10ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[13:50:58] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch ORES to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955)
[13:51:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Switch ORES to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[13:51:30] <wikibugs>	 (03CR) 10Muehlenhoff: Switch ORES to standard partman recipes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[13:54:47] <wikibugs>	 (03PS1) 10Jbond: wmflib::end_with: create String.end_with function [puppet] - 10https://gerrit.wikimedia.org/r/570330 (https://phabricator.wikimedia.org/T244222)
[13:54:49] <wikibugs>	 (03PS1) 10Jbond: realm global: make the realm variable a global in labs [puppet] - 10https://gerrit.wikimedia.org/r/570331 (https://phabricator.wikimedia.org/T244222)
[13:55:02] <wikibugs>	 (03PS3) 10Muehlenhoff: Switch ORES to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955)
[13:55:33] <wikibugs>	 (03PS1) 10Vgutierrez: requests: Fix content-type on fetch_certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570332 (https://phabricator.wikimedia.org/T244236)
[13:56:39] <wikibugs>	 (03PS4) 10Muehlenhoff: Switch ORES to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955)
[13:58:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch ORES to standard partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/563374 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[13:58:43] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Sounds good to me!  Is there a reason not to use the PuppetCA?  Totally fine with self signed, just curious." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[13:59:22] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 3/5 UP : OSPFv3: 3/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:00:20] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:00:57] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5006.eqsin.wmnet'] `  and were **ALL** successful.
[14:01:31] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch authdns* to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955)
[14:01:59] <wikibugs>	 (03CR) 10Ema: [C: 03+1] requests: Fix content-type on fetch_certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570332 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:02:43] <wikibugs>	 10Operations, 10observability: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (10fgiunchedi)
[14:02:51] <Jdlrobson>	 i leave the wikis in your more than capable hands awight Amir1 :)
[14:02:51] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] requests: Fix content-type on fetch_certificate [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570332 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:03:41] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] "please swat asap to restore HD logos to a bunch of wikis." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570326 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[14:03:44] <awight>	 Amir1: I'm also finished, so it's all yours!
[14:04:23] <Amir1>	 I have nothing to do right now, do you want me to deploy things?
[14:04:27] <Amir1>	 I'm slightly confused now
[14:05:47] <wikibugs>	 (03CR) 10Elukey: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[14:06:14] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[14:06:46] <awight>	 Amir1: No, but I heard a rumor earlier that you had your own backports :-)
[14:07:21] <Amir1>	 awight: I backported them one hour and five minutes ago :P
[14:08:12] <awight>	 bahaha.  Until next time, then!
[14:09:40] <wikibugs>	 (03PS1) 10Ema: ATS: temporarily leave AE untouched [puppet] - 10https://gerrit.wikimedia.org/r/570336 (https://phabricator.wikimedia.org/T242478)
[14:12:36] <wikibugs>	 (03CR) 10Ema: [C: 03+2] ATS: temporarily leave AE untouched [puppet] - 10https://gerrit.wikimedia.org/r/570336 (https://phabricator.wikimedia.org/T242478) (owner: 10Ema)
[14:13:51] <ema>	 !log cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues T242478
[14:13:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:54] <stashbot>	 T242478: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478
[14:14:11] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Release 0.24 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570338 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:14:38] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[14:16:42] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Makes sense proceed! TY!" [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[14:17:11] <wikibugs>	 (03Merged) 10jenkins-bot: Release 0.24 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/570338 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:17:14] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:17:14] <icinga-wm>	 RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:18:18] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:19:33] <icinga-wm>	 ACKNOWLEDGEMENT - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 1 Ayounsi T/S with Juniper https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:21:17] <wikibugs>	 (03PS1) 10Vgutierrez: requests: Fix content-type on fetch_certificate [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570340 (https://phabricator.wikimedia.org/T244236)
[14:21:19] <wikibugs>	 (03PS1) 10Vgutierrez: Release 0.24 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570341 (https://phabricator.wikimedia.org/T244236)
[14:21:21] <wikibugs>	 (03PS1) 10Vgutierrez: debian: Add release 0.24 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570342 (https://phabricator.wikimedia.org/T244236)
[14:23:02] <vgutierrez>	 !log pooling cp5006 - T242093
[14:23:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:05] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[14:24:16] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] requests: Fix content-type on fetch_certificate [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570340 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:24:28] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] Release 0.24 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570341 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:26:25] <wikibugs>	 (03PS1) 10Jbond: wmflib::require_domains: add new domain to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[14:26:37] <XioNoX>	 !log push initial flowspec config to all routers
[14:26:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:46] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] debian: Add release 0.24 to changelog [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/570342 (https://phabricator.wikimedia.org/T244236) (owner: 10Vgutierrez)
[14:29:44] <wikibugs>	 (03PS2) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[14:29:52] <wikibugs>	 (03PS2) 10Jbond: realm global: make the realm variable a global in labs [puppet] - 10https://gerrit.wikimedia.org/r/570331 (https://phabricator.wikimedia.org/T244222)
[14:30:19] <wikibugs>	 (03PS3) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[14:30:30] <vgutierrez>	 !log upload acme-chief 0.24 to apt.wm.o (buster) - T244236
[14:30:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:33] <stashbot>	 T244236: acme-chief is unable to renew certificates against LE staging environment - https://phabricator.wikimedia.org/T244236
[14:30:37] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mcrouter: run at nice -19 as php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/570346
[14:32:14] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:32:32] <_joe_>	 !log restarting mcrouter at nice -19 on mw1331 for testing effects of that change
[14:32:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:05] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] mcrouter: run at nice -19 as php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/570346 (owner: 10Giuseppe Lavagetto)
[14:34:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343 (owner: 10Jbond)
[14:34:59] <vgutierrez>	 !log updating acme-chief to version 0.24 - T244236
[14:35:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:24] <wikibugs>	 10Operations, 10SRE-tools: Homer: commit> no causes stacktrace - https://phabricator.wikimedia.org/T244362 (10ayounsi) p:05Triage→03Low
[14:35:52] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mcrouter: run at nice -19 as php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/570346 (owner: 10Giuseppe Lavagetto)
[14:36:13] <_joe_>	 sigh there is an error though
[14:36:35] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:36:58] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: mcrouter: run at nice -19 as php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/570346
[14:37:29] <wikibugs>	 10Operations, 10Acme-chief, 10Traffic, 10Patch-For-Review: acme-chief is unable to renew certificates against LE staging environment - https://phabricator.wikimedia.org/T244236 (10Vgutierrez) 05Open→03Resolved Fixed by backporting https://github.com/certbot/certbot/commit/0b5468e992ab57fa028ddf33ca2351...
[14:37:59] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mcrouter: run at nice -19 as php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/570346 (owner: 10Giuseppe Lavagetto)
[14:38:44] <wikibugs>	 (03PS1) 10Vgutierrez: install_server: Reimage cp5012 as buster [puppet] - 10https://gerrit.wikimedia.org/r/570347 (https://phabricator.wikimedia.org/T242093)
[14:39:19] <wikibugs>	 (03PS1) 10Jbond: wmflib::require_domains: use require_domains instead of require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222)
[14:39:37] <wikibugs>	 (03CR) 10Jbond: "LGTM thanks" [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[14:39:43] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[14:39:56] <wikibugs>	 10Operations, 10SRE-tools: Homer: commit timeout on MX104 and SRXs - https://phabricator.wikimedia.org/T244363 (10ayounsi) p:05Triage→03Normal
[14:40:33] <wikibugs>	 (03PS1) 10Elukey: profile::memcached::instance: add the theads parameter [puppet] - 10https://gerrit.wikimedia.org/r/570349
[14:42:36] <wikibugs>	 (03PS1) 10Jbond: realm: remove realm global variable [puppet] - 10https://gerrit.wikimedia.org/r/570350 (https://phabricator.wikimedia.org/T244222)
[14:43:32] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/20619/ - noop as expected" [puppet] - 10https://gerrit.wikimedia.org/r/570349 (owner: 10Elukey)
[14:46:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] realm: remove realm global variable [puppet] - 10https://gerrit.wikimedia.org/r/570350 (https://phabricator.wikimedia.org/T244222) (owner: 10Jbond)
[14:48:20] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[14:48:34] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[14:49:07] <_joe_>	 jbond42: removing ::realm?
[14:49:18] <_joe_>	 that's... quite complex, why would you want to?
[14:50:58] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] cassandra: use wmflib::secret for binary files [puppet] - 10https://gerrit.wikimedia.org/r/569570 (https://phabricator.wikimedia.org/T242585) (owner: 10Filippo Giunchedi)
[14:52:24] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (10ema) I have applied the change to cp1075 for some minutes, and the effect on network transfer is [[https://grafana.wikime...
[14:53:34] <wikibugs>	 (03PS6) 10Elukey: presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248
[14:53:59] <jbond42>	 _joe_: i fell down a rabbit hole, i started moving it from hiera to a global variable so that wmcs could have a more complex hiera hierarchy and then thought that perhaps this may be a simpler alternative
[14:54:06] <wikibugs>	 (03PS4) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[14:54:40] <jbond42>	 _joe_: also it looks like the actual realm variable is not really used any more apart from in the require_realm function
[14:55:12] <_joe_>	 jbond42: uh?
[14:55:31] <_joe_>	 ~/Code/WMF/operations/puppet (production=)$ git grep ::realm | wc -l
[14:55:32] <_joe_>	 75
[14:57:01] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+1] profile::memcached::instance: add the theads parameter [puppet] - 10https://gerrit.wikimedia.org/r/570349 (owner: 10Elukey)
[14:57:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343 (owner: 10Jbond)
[14:57:36] <jbond42>	 _joe_: thanks, i forgot to qualify it while checking; i'll drop the last two CRs in that set
[14:58:57] <wikibugs>	 (03PS1) 10Ema: Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570352 (https://phabricator.wikimedia.org/T242478)
[14:59:06] <wikibugs>	 (03Abandoned) 10Jbond: realm: remove realm global variable [puppet] - 10https://gerrit.wikimedia.org/r/570350 (https://phabricator.wikimedia.org/T244222) (owner: 10Jbond)
[14:59:14] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8429 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:59:30] <_joe_>	 this is expected more or less ^^
[14:59:44] <_joe_>	 puppet is running and restarting mcrouter across the fleet
[15:01:03] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Revert "cassandra: use wmflib::secret for binary files" [puppet] - 10https://gerrit.wikimedia.org/r/570353
[15:01:06] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 511 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:01:27] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Revert "cassandra: use wmflib::secret for binary files" [puppet] - 10https://gerrit.wikimedia.org/r/570353 (owner: 10Filippo Giunchedi)
[15:04:49] <wikibugs>	 (03CR) 10Ema: [C: 03+1] install_server: Reimage cp5012 as buster [puppet] - 10https://gerrit.wikimedia.org/r/570347 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[15:05:50] <wikibugs>	 (03PS5) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[15:05:52] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570352 (https://phabricator.wikimedia.org/T242478) (owner: 10Ema)
[15:06:27] <wikibugs>	 (03PS2) 10Jbond: wmflib::require_domains: use require_domains instead of require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222)
[15:06:29] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] presto: add kerberos and tls support [puppet] - 10https://gerrit.wikimedia.org/r/570248 (owner: 10Elukey)
[15:08:33] <wikibugs>	 (03PS5) 10Filippo Giunchedi: cassandra: restbase-dev logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/569564 (https://phabricator.wikimedia.org/T242585)
[15:08:40] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "ATS: temporarily leave AE untouched" [puppet] - 10https://gerrit.wikimedia.org/r/570352 (https://phabricator.wikimedia.org/T242478) (owner: 10Ema)
[15:09:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343 (owner: 10Jbond)
[15:12:00] <ema>	 !log cp: unset Accept-Encoding from ats-be requests to applayer T242478
[15:12:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:04] <stashbot>	 T242478: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478
[15:12:59] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] install_server: Reimage cp5012 as buster [puppet] - 10https://gerrit.wikimedia.org/r/570347 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez)
[15:15:19] <vgutierrez>	 !log depooling & reimaging cp5012 as buster - T242093
[15:15:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:21] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[15:17:12] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[15:17:46] <wikibugs>	 (03PS6) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[15:17:57] <wikibugs>	 (03PS3) 10Jbond: wmflib::require_domains: use require_domains instead of require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222)
[15:18:28] <wikibugs>	 10Operations, 10Traffic: traffic_server crash upon Lua reload: attempt to concatenate a table value - https://phabricator.wikimedia.org/T242952 (10ema) This just happened on cp1087:  ` Feb 05 15:14:05 cp1087 systemd[1]: Reloaded Apache Traffic Server is a fast, scalable and extensible caching proxy server.. Fe...
[15:19:23] <icinga-wm>	 ACKNOWLEDGEMENT - traffic_server backend process restarted on cp1087 is CRITICAL: 2 ge 2 Ema Known issue: https://phabricator.wikimedia.org/T242952 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqiad+prometheus/ops&var-instance=cp1087&var-layer=backend
[15:19:23] <icinga-wm>	 ACKNOWLEDGEMENT - traffic_server backend process restarted on cp5010 is CRITICAL: 2 ge 2 Ema Known issue: https://phabricator.wikimedia.org/T242952 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqsin+prometheus/ops&var-instance=cp5010&var-layer=backend
[15:21:35] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5012.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[15:24:25] <effie>	 !log Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - T236800
[15:24:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:33] <stashbot>	 T236800: Ensure apcu incr/decr are atomic (Upgrade php-apcu) - https://phabricator.wikimedia.org/T236800
[15:25:20] <wikibugs>	 (03PS1) 10Jhedden: openstack: update cloudvirt101[56] pool status [puppet] - 10https://gerrit.wikimedia.org/r/570358 (https://phabricator.wikimedia.org/T243327)
[15:26:27] <wikibugs>	 10Operations, 10ops-codfw: (No Need By Date Provided) codfw: rack/setup/install  elastic20{55,56,57,58,59,60}.wikimedia.org - https://phabricator.wikimedia.org/T241337 (10Gehel)
[15:27:29] <wikibugs>	 (03CR) 10Jhedden: [C: 03+2] openstack: update cloudvirt101[56] pool status [puppet] - 10https://gerrit.wikimedia.org/r/570358 (https://phabricator.wikimedia.org/T243327) (owner: 10Jhedden)
[15:27:44] <wikibugs>	 (03PS7) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[15:28:03] <wikibugs>	 (03PS4) 10Jbond: wmflib::require_domains: use require_domains instead of require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222)
[15:29:42] <effie>	 !log restart php-fpm on canaries - T236800
[15:29:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:45] <stashbot>	 T236800: Ensure apcu incr/decr are atomic (Upgrade php-apcu) - https://phabricator.wikimedia.org/T236800
[15:31:51] <wikibugs>	 (03PS1) 10Ottomata: Temporarily allow hadoop test cluster workers to talk to JupyterHub on analytics1030 [puppet] - 10https://gerrit.wikimedia.org/r/570362 (https://phabricator.wikimedia.org/T224658)
[15:35:18] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::memcached::instance: add the theads parameter [puppet] - 10https://gerrit.wikimedia.org/r/570349 (owner: 10Elukey)
[15:36:14] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/20620/analytics1030.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/570362 (https://phabricator.wikimedia.org/T224658) (owner: 10Ottomata)
[15:37:12] <_joe_>	 we're having another problem
[15:38:12] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:39:01] <wikibugs>	 (03PS1) 10Jhedden: openstack: switch cloudvirt101[56] to ceph storage [puppet] - 10https://gerrit.wikimedia.org/r/570363 (https://phabricator.wikimedia.org/T243327)
[15:39:52] <wikibugs>	 (03PS2) 10Jhedden: openstack: switch cloudvirt101[56] to ceph storage [puppet] - 10https://gerrit.wikimedia.org/r/570363 (https://phabricator.wikimedia.org/T243327)
[15:41:26] <wikibugs>	 (03PS1) 10Elukey: role::mediawiki::memcached:gutter: set threads to 16 [puppet] - 10https://gerrit.wikimedia.org/r/570364 (https://phabricator.wikimedia.org/T240684)
[15:41:35] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "Assuming this works and gets set in the right order for VMs, looks good to me!" [puppet] - 10https://gerrit.wikimedia.org/r/570331 (https://phabricator.wikimedia.org/T244222) (owner: 10Jbond)
[15:42:46] <wikibugs>	 (03CR) 10Jforrester: "This isn't deploy-safe. You've just made $wgLogos non-false, but not set the ['1x'] value yet. This blew up Beta Cluster (T244370) and wil" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[15:43:44] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:43:47] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "LGTM." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/570343 (owner: 10Jbond)
[15:43:49] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] Explicitly add theemin to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/570288 (owner: 10Muehlenhoff)
[15:46:14] <wikibugs>	 (03CR) 10Jforrester: "> Patch Set 4:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570304 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[15:47:48] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] role::mediawiki::memcached:gutter: set threads to 16 [puppet] - 10https://gerrit.wikimedia.org/r/570364 (https://phabricator.wikimedia.org/T240684) (owner: 10Elukey)
[15:50:46] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10jijiki) @wiki_willy @Jclark-ctr I understand that eqiad is overloaded, but is there a chance we can raise the priority of this? We have been sufferin...
[15:51:02] <wikibugs>	 (03PS8) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[15:51:38] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment for niedzielski - https://phabricator.wikimedia.org/T243924 (10MarkTraceur) Approved as manager!
[15:51:52] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment for niedzielski - https://phabricator.wikimedia.org/T243924 (10MarkTraceur)
[15:52:31] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.hosts.downtime
[15:52:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:49] <James_F>	 I'm going to do a quick deploy or two.
[15:52:54] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] Restore wgLogoHD to wikis without a MinervaCustomLogos defined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570326 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[15:52:56] <wikibugs>	 (03PS3) 10Jbond: realm global: make the realm variable a global in labs [puppet] - 10https://gerrit.wikimedia.org/r/570331 (https://phabricator.wikimedia.org/T244222)
[15:54:14] <wikibugs>	 (03PS9) 10Jbond: wmflib::require_domains: add new function to to replace require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570343
[15:54:24] <wikibugs>	 (03PS5) 10Jbond: wmflib::require_domains: use require_domains instead of require_realm [puppet] - 10https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222)
[15:54:50] <wikibugs>	 (03CR) 10Jbond: "Thanks updated" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/570343 (owner: 10Jbond)
[15:54:50] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:54:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:26] <wikibugs>	 10Operations, 10ops-codfw: (No Need By Date Provided) codfw: rack/setup/install  elastic20{55,56,57,58,59,60}.wikimedia.org - https://phabricator.wikimedia.org/T241337 (10Papaul) Talked to @Gehel on IRC those servers will be in the private VLAN and not in the public VLAN with Stretch as OS.
[15:59:23] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: realm: make the realm variable a global in labs [puppet] - 10https://gerrit.wikimedia.org/r/570369 (https://phabricator.wikimedia.org/T244222)
[16:00:04] <jouncebot>	 anomie and urandom: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Sessionstore deployment (mediawiki-config) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T1600).
[16:00:04] <jouncebot>	 urandom: A patch you scheduled for Sessionstore deployment (mediawiki-config) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:02:05] <wikibugs>	 (03PS1) 10Elukey: Raise memcached threads to 8 (was: 4) on mc1025 [puppet] - 10https://gerrit.wikimedia.org/r/570370
[16:05:42] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5012.eqsin.wmnet'] `  Of which those **FAILED**: ` ['cp5012.eqsin.wmnet'] `
[16:07:04] <wikibugs>	 (03PS5) 10Eevans: Configure group0 & group1 for kask-transition (multi-write kask/redis) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569678 (https://phabricator.wikimedia.org/T243106)
[16:07:33] <wikibugs>	 10Operations, 10ops-codfw, 10serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) All the servers on the table above are running Buster I had a chat with @MoritzMuehlenhoff and he mentioned that we need to install Stretch on those servers so I have to upd...
[16:07:57] <elukey>	 !log update puppet compiler's facts 
[16:07:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:16] <wikibugs>	 (03CR) 10Anomie: [C: 03+2] Configure group0 & group1 for kask-transition (multi-write kask/redis) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569678 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans)
[16:09:36] <wikibugs>	 (03PS1) 10CDanis: librenms API scrape alert: make critical [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888)
[16:12:38] <wikibugs>	 (03PS2) 10CDanis: librenms API scrape alert: make critical & change name [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888)
[16:14:05] <wikibugs>	 (03PS19) 10ArielGlenn: write out and reuse pagerange info for big page content jobs [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[16:15:32] <wikibugs>	 (03CR) 10Jbond: "In comparing this with https://gerrit.wikimedia.org/r/c/operations/puppet/+/570331 im not sure if they really differ on a functional level" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/570369 (https://phabricator.wikimedia.org/T244222) (owner: 10Arturo Borrero Gonzalez)
[16:16:16] <wikibugs>	 (03CR) 10CDanis: "PCC looks correct https://puppet-compiler.wmflabs.org/compiler1001/20630/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[16:18:25] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Raise memcached threads to 8 (was: 4) on mc1025 [puppet] - 10https://gerrit.wikimedia.org/r/570370 (owner: 10Elukey)
[16:18:27] <wikibugs>	 (03PS1) 10Elukey: presto: set server parameter in local presto exec script [puppet] - 10https://gerrit.wikimedia.org/r/570372
[16:18:31] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/570370 (owner: 10Elukey)
[16:20:16] <wikibugs>	 (03PS3) 10Muehlenhoff: Switch authdns* to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955)
[16:20:48] <wikibugs>	 (03CR) 10Ayounsi: librenms API scrape alert: make critical & change name (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[16:21:20] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[16:21:56] <wikibugs>	 (03CR) 10CDanis: librenms API scrape alert: make critical & change name (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[16:22:24] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[16:22:53] <James_F>	 urandom: BTW, I still have the conch. As soon as I can finish, it's yours.
[16:23:11] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] librenms API scrape alert: make critical & change name [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[16:23:12] <urandom>	 conch?
[16:23:22] <James_F>	 I'm still mid-deploy.
[16:23:31] <James_F>	 Or, rather, I would be, if CI was working.
[16:23:34] <wikibugs>	 (03Merged) 10jenkins-bot: Restore wgLogoHD to wikis without a MinervaCustomLogos defined [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570326 (https://phabricator.wikimedia.org/T232140) (owner: 10Jdlrobson)
[16:23:40] <James_F>	 Finally.
[16:23:57] <bblack>	 urandom: https://en.wikipedia.org/wiki/Conch#Literature_and_the_oral_tradition
[16:24:12] <bblack>	 (the second bullet point about Lord of the Flies)
[16:24:52] <wikibugs>	 (03PS2) 10Elukey: presto: set server parameter in local presto exec script [puppet] - 10https://gerrit.wikimedia.org/r/570372
[16:25:22] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T232140 Restore wgLogoHD to wikis without a MinervaCustomLogos defined (duration: 01m 09s)
[16:25:23] <bblack>	 it's like a virtual human-level mutex lock.  whoever holds the conch is the one acting/speaking at the time.
[16:25:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:26] <stashbot>	 T232140: Separate out logo handling into square image logos and long text/wordmark banner logos - https://phabricator.wikimedia.org/T232140
[16:25:37] <James_F>	 urandom: OK, over to you.
[16:25:38] <wikibugs>	 10Operations, 10Puppet, 10Patch-For-Review, 10User-jbond: Investigate using the rich_data opsion to support Binary and binary_file for binary data - https://phabricator.wikimedia.org/T236481 (10jbond)
[16:25:45] <urandom>	 I actually think I knew this, but I don't know what it means in the context here
[16:26:03] <wikibugs>	 10Operations, 10puppet-compiler: puppet-compiler fails to compile production catalog for restbase2014 - https://phabricator.wikimedia.org/T238053 (10jbond)
[16:26:07] <wikibugs>	 (03Merged) 10jenkins-bot: Configure group0 & group1 for kask-transition (multi-write kask/redis) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569678 (https://phabricator.wikimedia.org/T243106) (owner: 10Eevans)
[16:26:11] <bblack>	 in this context, the human currently deploying holds the conch, so we don't have two deployers stepping on each other.
[16:26:17] <wikibugs>	 10Operations, 10puppet-compiler: puppet-compiler fails to compile production catalog for restbase2014 - https://phabricator.wikimedia.org/T238053 (10jbond)
[16:26:43] <James_F>	 urandom: As in I said in here half an hour ago that I was deploying
[16:26:57] <wikibugs>	 (03CR) 10Cwhite: "Nonblocking reply." [puppet] - 10https://gerrit.wikimedia.org/r/570330 (https://phabricator.wikimedia.org/T244222) (owner: 10Jbond)
[16:27:03] <urandom>	 bblack: gotcha, I didn't realize the previous window was waiting
[16:27:12] <wikibugs>	 10Operations, 10puppet-compiler: puppet-compiler fails to compile production catalog for restbase2014 - https://phabricator.wikimedia.org/T238053 (10jbond)
[16:27:22] <urandom>	 James_F: yeah, I didn't go back that far in the backscroll...my bad
[16:27:26] <elukey>	 there seems to be a big queue for jenkins https://integration.wikimedia.org/zuul/
[16:27:29] <elukey>	 sigh
[16:29:59] <wikibugs>	 10Operations, 10puppet-compiler: puppet-compiler fails to compile production catalog for restbase2014 - https://phabricator.wikimedia.org/T238053 (10jbond)
[16:30:13] <wikibugs>	 10Operations, 10puppet-compiler: puppet-compiler fails to compile production catalog for restbase2014 - https://phabricator.wikimedia.org/T238053 (10jbond)
[16:30:46] <James_F>	 elukey: We're on it.
[16:31:04] <thcipriani>	 elukey: known, waiting for g&s to clear and then Taking Action™
[16:32:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch authdns* to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[16:33:41] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Raise memcached threads to 8 (was: 4) on mc1025 [puppet] - 10https://gerrit.wikimedia.org/r/570370 (owner: 10Elukey)
[16:35:32] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch cescout* to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/570316 (https://phabricator.wikimedia.org/T156955)
[16:37:03] <logmsgbot>	 !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569678]] Config: Enable sessionstore on group0 and 1 T243106 (duration: 01m 08s)
[16:37:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:07] <stashbot>	 T243106: Phased rollout of sessionstore to production fleet - https://phabricator.wikimedia.org/T243106
[16:38:09] <wikibugs>	 (03PS1) 10Filippo Giunchedi: WIP: elasticsearch cirrus logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/570374
[16:40:46] <addshore>	 jouncebot: now
[16:40:46] <jouncebot>	 For the next 0 hour(s) and 19 minute(s): Sessionstore deployment (mediawiki-config) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T1600)
[16:40:52] <addshore>	 jouncebot: next
[16:40:52] <jouncebot>	 In 2 hour(s) and 19 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T1900)
[16:41:49] <wikibugs>	 (03CR) 10Thcipriani: "recheck" [dumps] - 10https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434) (owner: 10ArielGlenn)
[16:42:04] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:43:09] <James_F>	 addshore: Prod is not clear.
[16:43:25] <addshore>	 James_F: indeed :) I was just looking around!
[16:44:57] <thcipriani>	 messaging jouncebot -- the universal "everyone stand back" signal
[16:51:24] <urandom>	 thcipriani: hold my beer
[16:51:57] <thcipriani>	 hahaha
[16:52:41] <wikibugs>	 (03PS2) 10Filippo Giunchedi: WIP: elasticsearch cirrus logs to logging pipeline [puppet] - 10https://gerrit.wikimedia.org/r/570374
[16:53:36] <urandom>	 !log Sessionstore deployment (mediawiki-config) is done
[16:53:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:02] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: standard: Add linux-perf to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/570254
[16:59:12] <wikibugs>	 (03CR) 10Paladox: "recheck" [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/566890 (owner: 10Paladox)
[17:00:22] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] standard: Add linux-perf to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:00:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] standard: Add linux-perf to standard packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:00:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] standard: Add linux-perf to standard packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:02:10] <wikibugs>	 (03PS4) 10Dzahn: add IP addresses for new install servers on buster [dns] - 10https://gerrit.wikimedia.org/r/569679 (https://phabricator.wikimedia.org/T224576)
[17:02:32] <wikibugs>	 10Operations, 10observability: Upgrade Grafana to 6.6 - https://phabricator.wikimedia.org/T244208 (10fgiunchedi)
[17:05:31] <wikibugs>	 (03PS3) 10Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/562363
[17:05:50] <wikibugs>	 (03PS1) 10Jforrester: Set $wgLogos['1x'] (new style access) to $wgLogo (old style access) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570378 (https://phabricator.wikimedia.org/T232140)
[17:05:53] <wikibugs>	 (03PS1) 10Jforrester: Merge $wgLogo into $wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140)
[17:05:57] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] presto: set server parameter in local presto exec script [puppet] - 10https://gerrit.wikimedia.org/r/570372 (owner: 10Elukey)
[17:06:34] <wikibugs>	 (03CR) 10Jforrester: [C: 04-2] "Not until wmf.19 is everywhere and won't regress. Also note that this is a deploy-trap; sync CommonSettings ahead of IS or it'll break the" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140) (owner: 10Jforrester)
[17:07:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Merge $wgLogo into $wgLogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140) (owner: 10Jforrester)
[17:07:26] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:08:48] <wikibugs>	 (03Abandoned) 10Paladox: Bump Bazel version to 2.0.0 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/566890 (owner: 10Paladox)
[17:08:56] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10wiki_willy) Hi @jijiki - @Cmjohnson is currently working on finishing up T236437, which also had a previous need by date of a month ago.  Would the c...
[17:12:42] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Raise memcached threads to 8 (was: 4) on mc1025 [puppet] - 10https://gerrit.wikimedia.org/r/570370 (owner: 10Elukey)
[17:25:58] <wikibugs>	 (03PS1) 10Sbisson: Enable InukaPageView logging on production Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029)
[17:26:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch cescout* to standard Partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/570316 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[17:27:09] <wikibugs>	 (03CR) 10Alexandros Kosiaris: standard: Add linux-perf to standard packages (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:27:22] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: standard: Add linux-perf to standard packages [puppet] - 10https://gerrit.wikimedia.org/r/570254
[17:28:38] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:28:49] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[17:29:37] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "fixed. thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:30:03] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/570254 (owner: 10Alexandros Kosiaris)
[17:30:56] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] librenms API scrape alert: make critical & change name [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[17:31:34] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] librenms API scrape alert: make critical & change name [puppet] - 10https://gerrit.wikimedia.org/r/570371 (https://phabricator.wikimedia.org/T224888) (owner: 10CDanis)
[17:33:04] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.35.0-wmf.18/includes/: T244300 (duration: 01m 14s)
[17:33:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:07] <stashbot>	 T244300: Argument 1 passed to Title::getLanguageConverter() must be an instance of Language, instance of StubUserLang given, called in /srv/mediawiki/php-1.35.0-wmf.18/includes/Title.php on line 207 - https://phabricator.wikimedia.org/T244300
[17:34:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/570159 (https://phabricator.wikimedia.org/T243935) (owner: 10Volans)
[17:34:35] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.35.0-wmf.18/languages/: T244300 (duration: 01m 13s)
[17:34:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:37:17] <Reedy>	 winning
[17:43:28] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] add IP addresses for new install servers on buster [dns] - 10https://gerrit.wikimedia.org/r/569679 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn)
[17:44:08] <icinga-wm>	 PROBLEM - parsoid on scandium is CRITICAL: connect to address 10.64.48.94 and port 8142: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[17:44:27] <mutante>	 ^ eh.. scandium is a test server
[17:44:35] <mutante>	 expired long downtime 
[17:44:48] <mutante>	 not critical at all. downtiming it again
[17:45:47] <icinga-wm>	 ACKNOWLEDGEMENT - parsoid on scandium is CRITICAL: connect to address 10.64.48.94 and port 8142: Connection refused daniel_zahn test server https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[17:47:33] <mutante>	 re-enabling notifications that should not be disabled anymore (for other stuff). often those are forgotten because unlike downtimes they never expire by themselves
[17:48:48] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to Deployment for Clarakosi - https://phabricator.wikimedia.org/T244381 (10Clarakosi)
[17:51:28] <mutante>	 !log ganeti1017 - rebooting (not in use yet)
[17:51:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:58] <icinga-wm>	 PROBLEM - Host ganeti1017 is DOWN: PING CRITICAL - Packet loss = 100%
[17:53:06] <wikibugs>	 10Operations, 10ops-eqsin, 10Traffic: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (10RobH) Update:  I've coordinated with Jin via Google Hangout Messages and he has reviewed the rack and ensured he has all the cabled needed.  I sent in this email to him, but since then...
[17:53:15] <icinga-wm>	 RECOVERY - Host ganeti1017 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[17:53:30] <wikibugs>	 10Operations, 10ops-eqsin, 10Traffic: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (10RobH)
[17:54:09] <icinga-wm>	 RECOVERY - Check whether microcode mitigations for CPU vulnerabilities are applied on ganeti1017 is OK: OK - All expected CPU flags found https://wikitech.wikimedia.org/wiki/Microcode
[17:54:51] <icinga-wm>	 PROBLEM - Logs skipped by trafficserver-tls on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/ATS
[17:54:51] <icinga-wm>	 PROBLEM - check_trafficserver_log_fifo_tls_tls on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:54:53] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 3121: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:54:55] <icinga-wm>	 PROBLEM - check_trafficserver_backend_config_status on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:03] <icinga-wm>	 PROBLEM - traffic-pool service on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:55:03] <icinga-wm>	 PROBLEM - TLS Lua configuration file on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/ATS
[17:55:03] <icinga-wm>	 PROBLEM - Default ATS Lua configuration file on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/ATS
[17:55:09] <icinga-wm>	 PROBLEM - check_trafficserver_log_fifo_notpurge_backend on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:09] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Varnish
[17:55:09] <icinga-wm>	 PROBLEM - Freshness of OCSP Stapling files -ATS-TLS acme-chief- on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[17:55:15] <icinga-wm>	 PROBLEM - Ensure trafficserver_exporter is running for instance backend on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:15] <icinga-wm>	 PROBLEM - check_trafficserver_log_fifo_purge_backend on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:15] <icinga-wm>	 PROBLEM - Webrequests Varnishkafka log producer on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[17:55:15] <icinga-wm>	 PROBLEM - Freshness of OCSP Stapling files -ATS-TLS- on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[17:55:21] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 3124: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:55:28] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to Deployment for Clarakosi - https://phabricator.wikimedia.org/T244381 (10WDoranWMF) As @Clarakosi direct manager I approve this request as it is necessary for her to be able to deploy as part of her work on the Core Platform Team.
[17:55:38] <elukey>	 this is only cp5012 right?
[17:55:43] <icinga-wm>	 PROBLEM - Check systemd state on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:55:43] <icinga-wm>	 PROBLEM - Ensure traffic_manager is running for instance backend on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:49] <icinga-wm>	 PROBLEM - Ensure traffic_manager is running for instance tls on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:49] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3127 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 3127: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:55:51] <icinga-wm>	 PROBLEM - Logs skipped by trafficserver on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/ATS
[17:55:51] <icinga-wm>	 PROBLEM - configured eth on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[17:55:53] <icinga-wm>	 PROBLEM - Ensure traffic_server is running for instance tls on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:55:57] <icinga-wm>	 PROBLEM - Ensure traffic_server is running for instance backend on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:56:01] <icinga-wm>	 PROBLEM - Ensure trafficserver_exporter is running for instance tls on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:56:01] <icinga-wm>	 PROBLEM - confd service on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:56:11] <elukey>	 Cc: vgutierrez, ema, bblack --^
[17:56:17] <icinga-wm>	 PROBLEM - IPMI Sensor Status on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[17:56:23] <elukey>	 are any of you working on cp5012?
[17:56:44] <mutante>	 that looks so bad but it's only a single host at least
[17:56:55] <icinga-wm>	 RECOVERY - parsoid on scandium is OK: HTTP OK: HTTP/1.1 200 OK - 1535 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/parsoid
[17:57:02] <vgutierrez>	 yes
[17:57:09] <mutante>	 new install?
[17:57:10] <vgutierrez>	 cp5012 is depooled
[17:57:11] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 3125: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:57:11] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 3122: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:57:14] <mutante>	 alright
[17:57:14] <vgutierrez>	 new install
[17:57:18] <elukey>	 ah okok
[17:57:20] <vgutierrez>	 hmm let me check it
[17:57:21] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp5012 is CRITICAL: connect to address 10.132.0.112 and port 80: Connection refused https://wikitech.wikimedia.org/wiki/Varnish
[17:58:03] <icinga-wm>	 PROBLEM - MD RAID on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[17:58:03] <icinga-wm>	 PROBLEM - dhclient process on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[17:58:04] <mutante>	 probably icinga just added these alerts a couple seconds ago?
[17:58:15] <icinga-wm>	 PROBLEM - puppet last run on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[17:58:21] <mutante>	 want me to silence them or see them recover?
[17:58:28] <vgutierrez>	 silence it please
[17:59:22] <mutante>	 done. for 4 hours or so
[17:59:26] <mutante>	 can do longer
[17:59:56] <vgutierrez>	 thx
[18:00:01] <wikibugs>	 Operations, Traffic, netops, observability, Patch-For-Review: Network port utilization alerts should be paging - https://phabricator.wikimedia.org/T224888 (CDanis) Open→Resolved
[18:00:07] <wikibugs>	 Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (jijiki) @wiki_willy hopefully it will help, but we generally believe that we will not be able to cope well again when we have sudden request spikes....
[18:02:13] <wikibugs>	 Operations, Citoid, Core Platform Team Workboards (Clinic Duty Team): Citoid is logging all request / response headers as separate fields - https://phabricator.wikimedia.org/T239713 (jijiki) p:Triage→Normal
[18:02:40] <wikibugs>	 Operations, Citoid, serviceops, Core Platform Team Workboards (Clinic Duty Team): Citoid is logging all request / response headers as separate fields - https://phabricator.wikimedia.org/T239713 (jijiki)
[18:08:32] <wikibugs>	 Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (wiki_willy) @jijiki - I'll talk to @Jclark-ctr and see if there's someway to expedite these.  One of the current bottlenecks is getting rid of some o...
[18:09:18] <wikibugs>	 Operations, vm-requests: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (Dzahn) Thanks @Cmjohnson !  I rebooted ganeti1017 one more time because that fixes the [[ https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=ganeti1017&service=Check...
[18:12:05] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.471 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:12:39] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.470 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:13:11] <icinga-wm>	 RECOVERY - configured eth on cp5012 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[18:13:11] <icinga-wm>	 RECOVERY - Logs skipped by trafficserver on cp5012 is OK: OK: no matches found in journal for unit trafficserver https://wikitech.wikimedia.org/wiki/ATS
[18:13:13] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3127 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.511 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:13:15] <icinga-wm>	 RECOVERY - Ensure traffic_server is running for instance tls on cp5012 is OK: PROCS OK: 1 process with args /srv/trafficserver/tls/bin/traffic_server -M --run-root=/srv/trafficserver/tls/runroot.yaml --httpport 443 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:13:17] <icinga-wm>	 RECOVERY - Ensure traffic_server is running for instance backend on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/traffic_server -M --httpport 3128 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:13:23] <icinga-wm>	 RECOVERY - confd service on cp5012 is OK: OK - confd is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:13:25] <icinga-wm>	 RECOVERY - Ensure trafficserver_exporter is running for instance tls on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint https://127.0.0.1:443/_stats --port 9322 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:13:25] <icinga-wm>	 RECOVERY - Logs skipped by trafficserver-tls on cp5012 is OK: OK: no matches found in journal for unit trafficserver-tls https://wikitech.wikimedia.org/wiki/ATS
[18:13:25] <icinga-wm>	 RECOVERY - check_trafficserver_log_fifo_tls_tls on cp5012 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /srv/trafficserver/tls/var/log/tls.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:13:31] <icinga-wm>	 RECOVERY - check_trafficserver_backend_config_status on cp5012 is OK: OK: configuration is current https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:13:33] <vgutierrez>	 sigh... sorry about the noise
[18:13:45] <icinga-wm>	 RECOVERY - Default ATS Lua configuration file on cp5012 is OK: OK https://wikitech.wikimedia.org/wiki/ATS
[18:13:45] <icinga-wm>	 RECOVERY - TLS Lua configuration file on cp5012 is OK: OK https://wikitech.wikimedia.org/wiki/ATS
[18:13:53] <icinga-wm>	 RECOVERY - Freshness of OCSP Stapling files -ATS-TLS acme-chief- on cp5012 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[18:13:53] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5012 is OK: reload-vcl has not been executed yet. https://wikitech.wikimedia.org/wiki/Varnish
[18:13:53] <icinga-wm>	 RECOVERY - check_trafficserver_log_fifo_notpurge_backend on cp5012 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /var/log/trafficserver/notpurge.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:14:01] <icinga-wm>	 RECOVERY - Ensure trafficserver_exporter is running for instance backend on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint http://127.0.0.1:3128/_stats --port 9122 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:14:01] <icinga-wm>	 RECOVERY - Webrequests Varnishkafka log producer on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:14:01] <icinga-wm>	 RECOVERY - Freshness of OCSP Stapling files -ATS-TLS- on cp5012 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[18:14:01] <icinga-wm>	 RECOVERY - check_trafficserver_log_fifo_purge_backend on cp5012 is OK: OK: TS_MAIN writing to and fifo-log-demux reading from /var/log/trafficserver/purge.pipe https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:14:31] <icinga-wm>	 RECOVERY - MD RAID on cp5012 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[18:14:31] <icinga-wm>	 RECOVERY - dhclient process on cp5012 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[18:14:41] <icinga-wm>	 RECOVERY - Ensure traffic_manager is running for instance backend on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:14:43] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.470 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:14:43] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.470 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:14:47] <icinga-wm>	 RECOVERY - Ensure traffic_manager is running for instance tls on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --run-root=/srv/trafficserver/tls --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:14:55] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp5012 is OK: HTTP OK: HTTP/1.1 200 OK - 540 bytes in 0.520 second response time https://wikitech.wikimedia.org/wiki/Varnish
[18:14:55] <icinga-wm>	 RECOVERY - puppet last run on cp5012 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[18:16:12] <wikibugs>	 (PS1) Papaul: DHCP: Update DHCP file so the new mw servers can use Stretch and not Buster [puppet] - https://gerrit.wikimedia.org/r/570388 (https://phabricator.wikimedia.org/T241852)
[18:20:19] <wikibugs>	 (CR) Effie Mouzeli: [C: +1] sre.switchdc.mediawiki: adapt to current status [cookbooks] - https://gerrit.wikimedia.org/r/570131 (https://phabricator.wikimedia.org/T243316) (owner: Volans)
[18:21:14] <wikibugs>	 (CR) Effie Mouzeli: [C: +1] mediawiki: use cumin alias instead of role query [software/spicerack] - https://gerrit.wikimedia.org/r/570159 (https://phabricator.wikimedia.org/T243935) (owner: Volans)
[18:21:18] <wikibugs>	 Operations, Performance-Team, Traffic: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle) ###### Network  | [Dashboard: Cluster overview (eqiad appservers)](https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&...
[18:21:20] <elukey>	 !log restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
[18:21:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:27] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 159703264 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:23:10] <vgutierrez>	 !log rebooting cp5012 - T242093
[18:23:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:13] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[18:23:19] <icinga-wm>	 PROBLEM - Host cp5012 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:21] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:24:35] <icinga-wm>	 RECOVERY - Host cp5012 is UP: PING OK - Packet loss = 0%, RTA = 235.03 ms
[18:25:41] <icinga-wm>	 RECOVERY - traffic-pool service on cp5012 is OK: OK - traffic-pool is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:25:52] <wikibugs>	 (PS1) Dzahn: site: add new ganeti hosts for refresh/expansion with spare role [puppet] - https://gerrit.wikimedia.org/r/570390 (https://phabricator.wikimedia.org/T228924)
[18:26:07] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 44 probes of 525 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:26:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] site: add new ganeti hosts for refresh/expansion with spare role [puppet] - https://gerrit.wikimedia.org/r/570390 (https://phabricator.wikimedia.org/T228924) (owner: Dzahn)
[18:27:00] <mutante>	 disabled notifications for cp5012 (will need manual re-enable though)
[18:27:11] <vgutierrez>	 mutante: I'll reenable it soon
[18:27:42] <mutante>	 vgutierrez: ACK, cool
[18:27:43] <wikibugs>	 (CR) Krinkle: Merge $wgLogo into $wgLogos (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140) (owner: Jforrester)
[18:28:33] <wikibugs>	 (PS2) Dzahn: site: add new ganeti hosts for refresh/expansion with spare role [puppet] - https://gerrit.wikimedia.org/r/570390 (https://phabricator.wikimedia.org/T228924)
[18:29:55] <wikibugs>	 (CR) Reedy: [C: +1] Set $wgLogos['1x'] (new style access) to $wgLogo (old style access) [mediawiki-config] - https://gerrit.wikimedia.org/r/570378 (https://phabricator.wikimedia.org/T232140) (owner: Jforrester)
[18:31:32] <wikibugs>	 (CR) Papaul: [C: +2] DHCP: Update DHCP file so the new mw servers can use Stretch and not Buster [puppet] - https://gerrit.wikimedia.org/r/570388 (https://phabricator.wikimedia.org/T241852) (owner: Papaul)
[18:31:57] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 33 probes of 525 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[18:32:28] <wikibugs>	 (PS1) Dzahn: install_server: add install[12]003 to partman recipe regex [puppet] - https://gerrit.wikimedia.org/r/570392 (https://phabricator.wikimedia.org/T224576)
[18:32:38] <wikibugs>	 (PS2) Dzahn: install_server: add install[12]003 to partman recipe regex [puppet] - https://gerrit.wikimedia.org/r/570392 (https://phabricator.wikimedia.org/T224576)
[18:33:43] <wikibugs>	 (PS1) Ppchelko: Session Store: Switch group0 and group1 to kask-session [mediawiki-config] - https://gerrit.wikimedia.org/r/570393 (https://phabricator.wikimedia.org/T243106)
[18:34:37] <wikibugs>	 (CR) Ppchelko: "to be rolled out on 02/06/2020" [mediawiki-config] - https://gerrit.wikimedia.org/r/570393 (https://phabricator.wikimedia.org/T243106) (owner: Ppchelko)
[18:35:27] <wikibugs>	 (CR) Dzahn: [C: +2] install_server: add install[12]003 to partman recipe regex [puppet] - https://gerrit.wikimedia.org/r/570392 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[18:35:57] <vgutierrez>	 !log pooling cp5012 - T242093
[18:35:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:59] <stashbot>	 T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[18:36:46] <wikibugs>	 (CR) RLazarus: [C: +2] dnsdisc: fix typo in docstring [software/spicerack] - https://gerrit.wikimedia.org/r/570160 (owner: Volans)
[18:39:30] <wikibugs>	 Operations, ops-codfw, serviceops, Patch-For-Review: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2310.codfw.wmnet ` The log can be found in `/var/log...
[18:39:53] <wikibugs>	 Operations, Release-Engineering-Team-TODO, Security-Team, Release-Engineering-Team (Deployment services), User-greg: Determine a core set or a checklist of permissions for deployment purpose - https://phabricator.wikimedia.org/T140270 (greg) Open→Resolved a:greg I think the combin...
[18:41:52] <wikibugs>	 (CR) Jforrester: [C: -2] "a" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140) (owner: Jforrester)
[18:42:00] <wikibugs>	 (PS1) Ppchelko: Session Store: Switch group2 to kask-transition [mediawiki-config] - https://gerrit.wikimedia.org/r/570395 (https://phabricator.wikimedia.org/T243106)
[18:42:02] <wikibugs>	 (PS1) Ppchelko: Session Store: Switch everything to kask-session [mediawiki-config] - https://gerrit.wikimedia.org/r/570396 (https://phabricator.wikimedia.org/T243106)
[18:42:13] <wikibugs>	 Operations, Research, serviceops: Request for a in-memory caching data set for caching research - https://phabricator.wikimedia.org/T240503 (jijiki) p:Triage→Low
[18:42:55] <wikibugs>	 (CR) Ppchelko: "To be deployed on 02/10/2020" [mediawiki-config] - https://gerrit.wikimedia.org/r/570395 (https://phabricator.wikimedia.org/T243106) (owner: Ppchelko)
[18:43:04] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (jijiki) p:Triage→Normal
[18:43:43] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (Dzahn) Open→Resolved a:Dzahn done!  cc: @mobrovac
[18:43:49] <icinga-wm>	 PROBLEM - etherpad_up reduced availability on icinga1001 is CRITICAL: 0 le 0.8 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:43:55] <wikibugs>	 (CR) RLazarus: [C: +1] sre.switchdc.mediawiki: adapt to current status [cookbooks] - https://gerrit.wikimedia.org/r/570131 (https://phabricator.wikimedia.org/T243316) (owner: Volans)
[18:44:16] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (jijiki) @Joe is this list on our end or OIT's ?
[18:44:21] <wikibugs>	 (CR) Krinkle: Restore wgLogoHD to wikis without a MinervaCustomLogos defined (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/570326 (https://phabricator.wikimedia.org/T232140) (owner: Jdlrobson)
[18:44:28] <wikibugs>	 (CR) Ppchelko: "To be deployed on 02/11 if everything is good." [mediawiki-config] - https://gerrit.wikimedia.org/r/570396 (https://phabricator.wikimedia.org/T243106) (owner: Ppchelko)
[18:44:53] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal: Add Itamar Givon to the ldap/wmde group - https://phabricator.wikimedia.org/T244148 (jijiki) p:Triage→Normal a:jijiki
[18:45:32] <wikibugs>	 Operations, LDAP-Access-Requests: Request LDAP access to the WMF group for Edna M - https://phabricator.wikimedia.org/T244176 (jijiki) p:Triage→Normal
[18:45:41] <icinga-wm>	 RECOVERY - etherpad_up reduced availability on icinga1001 is OK: (C)0.8 le (W)0.9 le 1 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:46:45] <wikibugs>	 Operations, WMF-Blog-Social-Team, WMF-Communications, Wikimedia-Mailing-lists: Delete mailing list "worldcup2018" - https://phabricator.wikimedia.org/T244316 (Dzahn) Open→Resolved a:Dzahn done!  [fermium:~] $ sudo rmlist worldcup2018
[18:46:55] <wikibugs>	 Operations, LDAP-Access-Requests: Request LDAP access to the WMF group for Edna M - https://phabricator.wikimedia.org/T244176 (jijiki) @Edna We will need your Wikitech username in order to be able to add you to the WMF group, as well as an approval from your manager. Thank you!
[18:47:24] <wikibugs>	 Operations, observability, vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (Dzahn)
[18:48:03] <wikibugs>	 Operations, Performance-Team, SRE-Access-Requests: Requesting access to deployment for dpifke - https://phabricator.wikimedia.org/T244183 (jijiki) p:Triage→Normal a:jijiki
[18:48:20] <wikibugs>	 Operations, ops-codfw, netops: codfw: Delete cloud interface-range - https://phabricator.wikimedia.org/T244196 (jijiki) p:Triage→Normal
[18:48:39] <wikibugs>	 Operations, observability, vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (Dzahn)
[18:49:06] <wikibugs>	 Operations, observability, vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (Dzahn) added vm-requests tag and pasted vm-request form. please add the missing data above.
[18:49:45] <wikibugs>	 (PS2) Sbisson: Enable InukaPageView logging on production Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029)
[18:50:24] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Allow list admins to train spam filters - https://phabricator.wikimedia.org/T244241 (jijiki) @Aklapper we will have to dig into this a bit, thank you!
[18:50:53] <wikibugs>	 Operations, Wikimedia-Mailing-lists, serviceops: Allow list admins to train spam filters - https://phabricator.wikimedia.org/T244241 (jijiki) p:Triage→Normal
[18:51:02] <wikibugs>	 Operations, serviceops: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (jijiki) p:Triage→Normal
[18:51:48] <wikibugs>	 Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Dzahn)
[18:53:39] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to Deployment for Clarakosi - https://phabricator.wikimedia.org/T244381 (Dzahn) a:Dzahn
[18:53:47] <wikibugs>	 Operations, Wikimedia-Mailing-lists, serviceops: Allow list admins to train spam filters - https://phabricator.wikimedia.org/T244241 (Reedy) https://blogs.gnome.org/ovitters/2008/06/07/using-moderated-messages-to-train-the-bayes-classifier/  >I’ve added a patch to Mailman
[18:55:54] <wikibugs>	 (PS5) Dzahn: define 2 API appservers per row in codfw as canary API appservers [puppet] - https://gerrit.wikimedia.org/r/564175 (https://phabricator.wikimedia.org/T242606)
[18:56:57] <wikibugs>	 (CR) Dzahn: [C: +2] define 2 API appservers per row in codfw as canary API appservers [puppet] - https://gerrit.wikimedia.org/r/564175 (https://phabricator.wikimedia.org/T242606) (owner: Dzahn)
[18:57:39] <wikibugs>	 Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Request for +2 access to mediawiki-config - https://phabricator.wikimedia.org/T244389 (MarcoAurelio) Pinging SRE people as this is normally done during SRE onboarding when getting `deployment` or higher.
[18:58:42] <wikibugs>	 (CR) RLazarus: [C: +1] mediawiki: use cumin alias instead of role query [software/spicerack] - https://gerrit.wikimedia.org/r/570159 (https://phabricator.wikimedia.org/T243935) (owner: Volans)
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T1900).
[19:00:04] <jouncebot>	 Ammarpad and niedzielski: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:55] <wikibugs>	 (CR) Dzahn: "no functional change on the hosts or in puppet by changing this role, it includes the same things. noop" [puppet] - https://gerrit.wikimedia.org/r/564175 (https://phabricator.wikimedia.org/T242606) (owner: Dzahn)
[19:01:48] <wikibugs>	 (PS2) Jforrester: Merge $wgLogo into $wgLogos [mediawiki-config] - https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140)
[19:01:53] <niedzielski>	 o/ It looks like James_F has already merged my patch! Thanks James_F!
[19:03:17] <James_F>	 niedzielski: Happy to help. Jon said it was urgent. :-)
[19:04:09] <James_F>	 Ammarpad isn't around, it seems?
[19:04:15] <niedzielski>	 💯👍
[19:04:15] <James_F>	 stephanebisson: You here?
[19:04:22] <wikibugs>	 (CR) Jforrester: [C: +2] Set $wgLogos['1x'] (new style access) to $wgLogo (old style access) [mediawiki-config] - https://gerrit.wikimedia.org/r/570378 (https://phabricator.wikimedia.org/T232140) (owner: Jforrester)
[19:04:27] <stephanebisson>	 James_F: yep
[19:04:37] <James_F>	 Cool, will deploy yours.
[19:04:47] <stephanebisson>	 James_F: But let's wait for my patch, I need to recheck something first
[19:05:14] <wikibugs>	 (CR) Jforrester: [C: +2] Enable InukaPageView logging on production Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[19:05:16] <wikibugs>	 (Merged) jenkins-bot: Set $wgLogos['1x'] (new style access) to $wgLogo (old style access) [mediawiki-config] - https://gerrit.wikimedia.org/r/570378 (https://phabricator.wikimedia.org/T232140) (owner: Jforrester)
[19:05:29] <James_F>	 Ok, stopping.
[19:05:36] <wikibugs>	 (CR) Jforrester: [C: +1] "Not yet." [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[19:06:59] <wikibugs>	 Operations, serviceops, Patch-For-Review: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (Dzahn) The following are now declared canary API appservers in site.pp:  mw2215, mw2216 (rack A3)  mw2244, mw2245 (rack A4)
[19:08:44] <stephanebisson>	 James_F: please go ahead
[19:09:41] <wikibugs>	 (PS3) Jforrester: Enable InukaPageView logging on production Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[19:09:45] <wikibugs>	 (CR) Jforrester: [C: +2] Enable InukaPageView logging on production Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[19:10:16] <logmsgbot>	 !log jforrester@deploy1001 scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
[19:10:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:10:33] <James_F>	 Hmm. Not good.
[19:10:45] <wikibugs>	 (Merged) jenkins-bot: Enable InukaPageView logging on production Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/570381 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[19:10:50] <James_F>	 Hmm.
[19:12:35] <wikibugs>	 (PS1) Jforrester: Revert "Set $wgLogos['1x'] (new style access) to $wgLogo (old style access)" [mediawiki-config] - https://gerrit.wikimedia.org/r/570402
[19:12:45] <wikibugs>	 (CR) Jforrester: [C: +2] Revert "Set $wgLogos['1x'] (new style access) to $wgLogo (old style access)" [mediawiki-config] - https://gerrit.wikimedia.org/r/570402 (owner: Jforrester)
[19:12:50] * James_F sighs.
[19:13:08] <James_F>	 stephanebisson: Sorry, one moment.
[19:13:46] <wikibugs>	 (Merged) jenkins-bot: Revert "Set $wgLogos['1x'] (new style access) to $wgLogo (old style access)" [mediawiki-config] - https://gerrit.wikimedia.org/r/570402 (owner: Jforrester)
[19:14:34] <James_F>	 stephanebisson: OK, live on mwdebug1001 – can you test?
[19:15:21] <wikibugs>	 Operations, serviceops: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (jijiki) The idea is obviously sensible. I do have some concerns about how this will perform with our loaded mwservers. We could wait to test this afte...
[19:15:28] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Sync back revert of 975b4bbb9 (duration: 01m 06s)
[19:15:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:10] <wikibugs>	 Operations, observability, vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (jijiki) p:Triage→High
[19:16:12] <wikibugs>	 Operations, observability, serviceops, vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (jijiki)
[19:17:09] <stephanebisson>	 James_F: on it
[19:18:09] <stephanebisson>	 James_F: all good
[19:18:17] <James_F>	 OK.
[19:19:41] <wikibugs>	 (PS3) Jforrester: Merge $wgLogo into $wgLogos [mediawiki-config] - https://gerrit.wikimedia.org/r/570379 (https://phabricator.wikimedia.org/T232140)
[19:19:42] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T238029 Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
[19:19:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:45] <stashbot>	 T238029: Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029
[19:19:59] <wikibugs>	 Operations, Core Platform Team, MediaWiki-Parser, serviceops, and 2 others: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (daniel)
[19:20:23] <wikibugs>	 Operations, MediaWiki-Parser, serviceops, Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Incident: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (daniel)
[19:20:26] <James_F>	 OK, SWAT done.
[19:21:01] <stephanebisson>	 James_F: thanks
[19:22:25] <wikibugs>	 Operations, ops-codfw, serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2310.codfw.wmnet'] `  Of which those **FAILED**: ` ['mw2310.codfw.wmnet'] `
[19:23:35] <wikibugs>	 (PS2) Bartosz Dziewoński: Clean up VisualEditor settings [mediawiki-config] - https://gerrit.wikimedia.org/r/550535
[19:23:38] <wikibugs>	 Operations, Traffic, Inuka-Team (Kanban), MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), and 2 others: Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (SBisson) This was enabled in production just now.
[19:23:50] <MatmaRex>	 James_F:
[19:23:53] <MatmaRex>	 bah.
[19:23:56] <MatmaRex>	 James_F:
[19:24:01] <wikibugs>	 (PS3) Bartosz Dziewoński: Clean up VisualEditor settings [mediawiki-config] - https://gerrit.wikimedia.org/r/550535
[19:24:15] <MatmaRex>	 James_F: while you're there, want to do that patch too? ^ i removed the problematic part, i'll do it separately
[19:24:35] <wikibugs>	 Operations, Wikimedia-Incident: Investigate whether we can automatically share incident status docs with WMDE - https://phabricator.wikimedia.org/T244395 (RLazarus)
[19:24:55] <wikibugs>	 (PS1) Dzahn: site: define 2 codfw appservers as canary_appservers [puppet] - https://gerrit.wikimedia.org/r/570405 (https://phabricator.wikimedia.org/T242606)
[19:25:41] <wikibugs>	 Operations, Wikimedia-Incident: Investigate whether we can automatically share incident status docs with WMDE - https://phabricator.wikimedia.org/T244395 (RLazarus) p:Triage→Normal
[19:25:48] <wikibugs>	 (CR) Dzahn: "2 ... or should i do 4 more? That would be 6 in total since we already have 2 (mwdebug2*) and you said "at least 4"." [puppet] - https://gerrit.wikimedia.org/r/570405 (https://phabricator.wikimedia.org/T242606) (owner: Dzahn)
[19:27:52] <wikibugs>	 Operations, serviceops, Patch-For-Review: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (Dzahn)
[19:29:45] <wikibugs>	 (PS1) Bartosz Dziewoński: Fix incorrect spellings of "RESTBase" in config variables (1/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/570409
[19:29:47] <wikibugs>	 (PS1) Bartosz Dziewoński: Fix incorrect spellings of "RESTBase" in config variables (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/570410
[19:30:11] <wikibugs>	 (CR) Bartosz Dziewoński: "@James I removed the problematic part from this commit, doing it separately in https://gerrit.wikimedia.org/r/570409 + https://gerrit.wiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/550535 (owner: Bartosz Dziewoński)
[19:32:12] <wikibugs>	 (CR) Dzahn: [C: +2] "indeed does not seem to be used anywhere, also checked openstackbrowser" [puppet] - https://gerrit.wikimedia.org/r/570169 (https://phabricator.wikimedia.org/T173478) (owner: Legoktm)
[19:33:02] <James_F>	 MatmaRex: Oh, hey. Sorry. Looking now.
[19:33:17] <MatmaRex>	 it's not important if you're working on something else now
[19:34:39] <wikibugs>	 (CR) Jforrester: [C: +2] Clean up VisualEditor settings [mediawiki-config] - https://gerrit.wikimedia.org/r/550535 (owner: Bartosz Dziewoński)
[19:34:54] <James_F>	 MatmaRex: Just the OOUI release and some UBN follow-ups. :-)
[19:35:06] <James_F>	 MatmaRex: Trade for C+2 on https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/570412 ? ;-)
[19:35:11] <wikibugs>	 (PS1) Dzahn: wmcs::toolsdb_secondary: fix a comment about what this class does [puppet] - https://gerrit.wikimedia.org/r/570414
[19:35:26] <wikibugs>	 (PS2) Ammarpad: Add assigment of 'mover' group to bureaucrats on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/566925 (https://phabricator.wikimedia.org/T243503)
[19:35:34] <wikibugs>	 (CR) Brion VIBBER: "Thanks Gilles!" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/568646 (https://phabricator.wikimedia.org/T228467) (owner: Brion VIBBER)
[19:35:40] <wikibugs>	 (Merged) jenkins-bot: Clean up VisualEditor settings [mediawiki-config] - https://gerrit.wikimedia.org/r/550535 (owner: Bartosz Dziewoński)
[19:36:31] <wikibugs>	 (PS2) Dzahn: wmcs::toolsdb_secondary: fix a comment about what this class does [puppet] - https://gerrit.wikimedia.org/r/570414
[19:37:15] <wikibugs>	 (CR) Dzahn: [C: +2] "comments only" [puppet] - https://gerrit.wikimedia.org/r/570414 (owner: Dzahn)
[19:38:04] <wikibugs>	 Operations, vm-requests: VM requests for install_server replacements - https://phabricator.wikimedia.org/T244390 (MoritzMuehlenhoff)
[19:38:42] <ebernhardson>	 !log restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
[19:38:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:38:44] <logmsgbot>	 !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
[19:38:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:06] <wikibugs>	 (PS3) Ammarpad: Add assignment of 'mover' group to bureaucrats on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/566925 (https://phabricator.wikimedia.org/T243503)
[19:39:29] <wikibugs>	 Operations, vm-requests: VM requests for install_server replacements - https://phabricator.wikimedia.org/T244390 (Dzahn) Hmm.. fair enough. That means my DNS change was not correct though, it defined public IPs as before.
[19:39:33] <wikibugs>	 (CR) Muehlenhoff: "I think 4 is fine, in eqiad we currently have 1261-1265 e.g." [puppet] - https://gerrit.wikimedia.org/r/570405 (https://phabricator.wikimedia.org/T242606) (owner: Dzahn)
[19:39:52] <wikibugs>	 Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (jbond) @Krenair i think the package needed is `puppet-terminus-puppetdb` which is provided by the `puppetdb` source package.  I have looked at buil...
[19:41:11] <wikibugs>	 Operations, vm-requests: VM requests for install_server replacements - https://phabricator.wikimedia.org/T244390 (Dzahn)
[19:43:13] <wikibugs>	 (CR) Mholloway: [C: +2] Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/566859 (https://phabricator.wikimedia.org/T241242) (owner: Matthias Mullie)
[19:43:35] <wikibugs>	 (CR) Dzahn: [C: +2] admins: add Sakti Pramudya to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802) (owner: Dzahn)
[19:43:44] <wikibugs>	 (PS2) Dzahn: admins: add Sakti Pramudya to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802)
[19:44:38] <mutante>	 !log LDAP - added spramduya to wmf group (T243802)
[19:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:44:41] <stashbot>	 T243802: Request for LDAP access to the WMF group for Sakti Pramudya - https://phabricator.wikimedia.org/T243802
[19:45:42] <wikibugs>	 (PS6) Ammarpad: Enable lead paragraph in user namespace on nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030)
[19:46:07] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Request for LDAP access to the WMF group for Sakti Pramudya - https://phabricator.wikimedia.org/T243802 (Dzahn) Open→Resolved @SpramudyaDev You have been added to the "wmf" group. You should now be able to login with the same credentials use...
[19:46:35] <wikibugs>	 (CR) Mholloway: [C: +2] "Oh, this is being held up on the wmf.18 backport, but shouldn't be since it's for labs. I'll force-merge." [mediawiki-config] - https://gerrit.wikimedia.org/r/566859 (https://phabricator.wikimedia.org/T241242) (owner: Matthias Mullie)
[19:48:27] <wikibugs>	 Operations, SRE-Access-Requests, serviceops-radar, Core Platform Team Workboards (Clinic Duty Team): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (Dzahn)
[19:49:52] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (Dzahn) @jijiki On our end in private repo. puppetmaster1001:/srv/private/modules/privateexim/files  (already done though)
[19:50:22] <wikibugs>	 (PS3) Ammarpad: Enable new user message for auto-created accounts on zh_classical wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509)
[19:55:52] <wikibugs>	 (PS1) Jforrester: [nlwiki] Enable VisualEditor in the Project namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/570419 (https://phabricator.wikimedia.org/T159711)
[19:56:53] <wikibugs>	 (CR) Muehlenhoff: [C: -1] "The email address is incomplete" [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802) (owner: Dzahn)
[19:58:47] <wikibugs>	 Operations, Gerrit, vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (Dzahn)
[19:58:50] <wikibugs>	 Operations, Gerrit: gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn)
[19:59:53] <wikibugs>	 Operations, Gerrit: gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn) See T243983. I added a second disk to this VM, it's an additional 10GB and mounted on /srv/dbdump.  Hope that does it.
[20:00:05] <jouncebot>	 twentyafterfour and marxarelli: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Mediawiki train - American Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T2000).
[20:00:49] <moritzm>	 !log installing unzip security updates
[20:00:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:08] <wikibugs>	 (PS3) Dzahn: admins: add Sakti Pramudya to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802)
[20:01:26] <wikibugs>	 (CR) Dzahn: "thanks for the catch! fixed." [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802) (owner: Dzahn)
[20:07:20] <wikibugs>	 (PS1) Dzahn: Revert "add IP addresses for new install servers on buster" [dns] - https://gerrit.wikimedia.org/r/570423
[20:09:43] <moritzm>	 !log installing git security updates for jessie
[20:09:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:49] <twentyafterfour>	 !log Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs T233866
[20:09:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:52] <stashbot>	 T233866: 1.35.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T233866
[20:14:41] <wikibugs>	 (CR) Muehlenhoff: [C: +1] admins: add Sakti Pramudya to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802) (owner: Dzahn)
[20:18:24] <icinga-wm>	 PROBLEM - Host mw2311 is DOWN: PING CRITICAL - Packet loss = 100%
[20:21:05] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
[20:21:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:13] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
[20:21:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:17] <mutante>	 !log mw1267 restarting php7.2-fpm
[20:25:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:52] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
[20:25:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:08] <twentyafterfour>	 ok I guess it's safe to go ahead with train for group1? 
[20:29:46] <wikibugs>	 Operations, Core Platform Team, Goal: Decommission the "session redis" cluster - https://phabricator.wikimedia.org/T243520 (daniel) If this is what MediaWiki's MainStash is using, then this is also used by chronology protector. We'd have to move it to something else. Pinging @aaron for that.
[20:30:09] <mutante>	 twentyafterfour: if that was for me, i think so, we just had issues with a single server
[20:30:56] <wikibugs>	 Operations, ops-eqiad: Heating alerts for mw servers in eqiad - https://phabricator.wikimedia.org/T149287 (Dzahn) mw1267 was showing temperature issues today:  https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&from=now-2d&to=now&fullscreen&panelId=25&var-server=mw1267&var-datasou...
[20:32:15] <twentyafterfour>	 mutante: thanks, yeah everything seems stable going ahead with the train
[20:32:59] <wikibugs>	 (PS1) 20after4: group1 wikis to 1.35.0-wmf.18  refs T233866 [mediawiki-config] - https://gerrit.wikimedia.org/r/570429
[20:33:03] <wikibugs>	 (CR) 20after4: [C: +2] group1 wikis to 1.35.0-wmf.18  refs T233866 [mediawiki-config] - https://gerrit.wikimedia.org/r/570429 (owner: 20after4)
[20:34:09] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.18  refs T233866 [mediawiki-config] - https://gerrit.wikimedia.org/r/570429 (owner: 20after4)
[20:34:51] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
[20:34:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:35] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
[20:37:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:52] <wikibugs>	 (CR) Volans: [C: +2] mediawiki: use cumin alias instead of role query [software/spicerack] - https://gerrit.wikimedia.org/r/570159 (https://phabricator.wikimedia.org/T243935) (owner: Volans)
[20:42:23] <wikibugs>	 (Merged) jenkins-bot: mediawiki: use cumin alias instead of role query [software/spicerack] - https://gerrit.wikimedia.org/r/570159 (https://phabricator.wikimedia.org/T243935) (owner: Volans)
[20:42:25] <wikibugs>	 (Merged) jenkins-bot: dnsdisc: fix typo in docstring [software/spicerack] - https://gerrit.wikimedia.org/r/570160 (owner: Volans)
[20:44:12] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs T233866
[20:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:15] <stashbot>	 T233866: 1.35.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T233866
[20:45:19] <logmsgbot>	 !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.18  refs T233866 (duration: 01m 07s)
[20:45:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:46] <wikibugs>	 (CR) Dzahn: [C: +2] admins: add Sakti Pramudya to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/567563 (https://phabricator.wikimedia.org/T243802) (owner: Dzahn)
[20:48:32] <wikibugs>	 (CR) BryanDavis: [C: +1] dynamicproxy: urlproxy: introduce support for domain-based routing [puppet] - https://gerrit.wikimedia.org/r/565556 (https://phabricator.wikimedia.org/T234617) (owner: Arturo Borrero Gonzalez)
[20:48:53] <icinga-wm>	 PROBLEM - Check systemd state on ores1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:50:25] <mutante>	 !log ores1004 - systemctl start celery-ores-worker
[20:50:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:44] <icinga-wm>	 RECOVERY - Check systemd state on ores1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:51:03] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
[20:51:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:18] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
[20:51:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:25] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
[20:51:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:03] <icinga-wm>	 RECOVERY - Disk space on notebook1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1004&var-datasource=eqiad+prometheus/ops
[20:58:55] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: Dear deployers, time to do the Services – Graphoid / Parsoid / Citoid / ORES deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200205T2100).
[21:03:07] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T244407 (CGlenn)
[21:03:07] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to sites from Google Search Console - https://phabricator.wikimedia.org/T244407 (CGlenn)
[21:04:27] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:08:46] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the wmf group for CherRaye Glenn - https://phabricator.wikimedia.org/T244410 (CGlenn)
[21:08:47] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the wmf group for CherRaye Glenn - https://phabricator.wikimedia.org/T244410 (CGlenn)
[21:12:43] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (jijiki) Thank you Daniel!
[21:12:43] <wikibugs>	 Operations: Remove mobrovac@wikimedia.org from techcom@wikimedia.org - https://phabricator.wikimedia.org/T244146 (jijiki) Thank you Daniel!
[21:25:31] <hauskatze>	 hmm
[21:25:44] <hauskatze>	 is wikibugs reporting twice now?
[21:26:32] <paladox|UKInEU>	 apparently so
[21:31:05] <mutante>	 !log killing and restarting wikibugs, it was reporting each update twice
[21:31:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:46] <hauskatze>	 mutante robbed my idea :P
[21:33:55] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3
[21:33:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:26] <mutante>	 hauskatze: except it does not come back and is worse now ?:(
[21:34:42] <hauskatze>	 mutante: yes it comes, when there's something to report
[21:35:05] <hauskatze>	 wikibugs joins -cloud by default, and others when there's activity to report - and then stays
[21:35:32] <hauskatze>	 and given that I don't have bash installed in this PC, I thank you for restarting the thing
[21:37:01] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3 (duration: 03m 07s)
[21:37:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:37:11] <hauskatze>	 it is working mutante 
[21:38:11] <wikibugs>	 Operations, Shinken: Make the Shinken IRC alert and icinga-wm bots use colors - https://phabricator.wikimedia.org/T113785 (Dzahn) test update
[21:38:19] <mutante>	 hauskatze: ^ confirmed :)
[21:38:26] <hauskatze>	 I told you :)
[21:39:47] <wikibugs>	 Operations, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban): Migrate labmon* to Buster - https://phabricator.wikimedia.org/T224585 (bd808) Open→Resolved
[21:39:50] <wikibugs>	 Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (bd808)
[21:40:27] <wikibugs>	 Operations, Performance-Team, serviceops, Wikimedia-production-error: Wiki diffs take over 15s to load - https://phabricator.wikimedia.org/T244058 (jijiki) >>! In T244058#5851362, @Joe wrote: > Instead of caching, we should just rate-limit parsing of old revisions to N concurrent revisions per us...
[21:40:49] <wikibugs>	 (CR) Cwhite: [C: +1] "I don't see anything particularly problematic.  LGTM" (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/557050 (owner: Jbond)
[21:45:45] <wikibugs>	 Operations, Performance-Team, serviceops, Wikimedia-production-error: Wiki diffs take over 15s to load - https://phabricator.wikimedia.org/T244058 (jijiki)
[21:47:06] <wikibugs>	 (CR) Ottomata: [C: +2] Update labstore mediawiki-history readme file [puppet] - https://gerrit.wikimedia.org/r/566822 (https://phabricator.wikimedia.org/T243426) (owner: Joal)
[21:48:02] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: Decommission db1061.eqiad.wmnet - https://phabricator.wikimedia.org/T238624 (Jclark-ctr)
[21:48:07] <wikibugs>	 (CR) Eevans: "We should probably wait until we better understand the 404 rate (https://grafana.wikimedia.org/d/000001590/sessionstore?orgId=1&from=15809" [mediawiki-config] - https://gerrit.wikimedia.org/r/570393 (https://phabricator.wikimedia.org/T243106) (owner: Ppchelko)
[21:50:55] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: Decommission db1062.eqiad.wmnet - https://phabricator.wikimedia.org/T239188 (Jclark-ctr)
[21:56:54] <wikibugs>	 (PS9) Joal: Add profile::analytics::refinery::job::import_wikidata_entites_dumps [puppet] - https://gerrit.wikimedia.org/r/567954 (https://phabricator.wikimedia.org/T209655)
[21:57:18] <wikibugs>	 (CR) BryanDavis: [C: +2] Report error messages on stderr [software/tools-webservice] - https://gerrit.wikimedia.org/r/496565 (owner: BryanDavis)
[21:57:32] <wikibugs>	 (CR) BryanDavis: [C: +2] Remove lighttpd-precise handling [software/tools-webservice] - https://gerrit.wikimedia.org/r/496566 (owner: BryanDavis)
[21:57:46] <wikibugs>	 (CR) Joal: "@elukey: This is ready except for the need to choose whether to use 1st block syntax (variables) or 2nd block syntax (single-line)." [puppet] - https://gerrit.wikimedia.org/r/567954 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[21:57:49] <wikibugs>	 (CR) BryanDavis: [C: +2] Improve support for extra_args [software/tools-webservice] - https://gerrit.wikimedia.org/r/496567 (owner: BryanDavis)
[21:57:59] <wikibugs>	 (Merged) jenkins-bot: Report error messages on stderr [software/tools-webservice] - https://gerrit.wikimedia.org/r/496565 (owner: BryanDavis)
[21:58:11] <wikibugs>	 (Merged) jenkins-bot: Remove lighttpd-precise handling [software/tools-webservice] - https://gerrit.wikimedia.org/r/496566 (owner: BryanDavis)
[21:58:21] <wikibugs>	 (CR) BryanDavis: [C: +2] Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605 (owner: BryanDavis)
[21:58:30] <wikibugs>	 (Merged) jenkins-bot: Improve support for extra_args [software/tools-webservice] - https://gerrit.wikimedia.org/r/496567 (owner: BryanDavis)
[21:58:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605 (owner: BryanDavis)
[22:01:22] <wikibugs>	 (CR) BryanDavis: [C: +2] Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807 (owner: BryanDavis)
[22:01:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605 (owner: BryanDavis)
[22:01:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807 (owner: BryanDavis)
[22:02:32] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (Jclark-ctr)
[22:04:55] <wikibugs>	 Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Request for +2 access to mediawiki-config - https://phabricator.wikimedia.org/T244389 (Dzahn) a:Dzahn
[22:07:26] <mutante>	 !log Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) (T244389)
[22:07:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:29] <stashbot>	 T244389: Request for +2 access to mediawiki-config - https://phabricator.wikimedia.org/T244389
[22:10:31] <wikibugs>	 Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Request for +2 access to mediawiki-config - https://phabricator.wikimedia.org/T244389 (Dzahn) As @MarcoAurelio points out this normally goes together with getting the deployment admin group.  Petr is already a member of that (and various oth...
[22:10:58] <wikibugs>	 Operations, Gerrit-Privilege-Requests, SRE-Access-Requests: Request for +2 access to mediawiki-config - https://phabricator.wikimedia.org/T244389 (Dzahn) Open→Resolved @Pchelolo This should work now.
[22:18:12] <wikibugs>	 (CR) RLazarus: [C: +1] profile::mediawiki::php: raise number of workers on the canaries [puppet] - https://gerrit.wikimedia.org/r/570255 (owner: Giuseppe Lavagetto)
[22:24:32] <wikibugs>	 (CR) Papaul: [C: +1] Revert "add IP addresses for new install servers on buster" [dns] - https://gerrit.wikimedia.org/r/570423 (owner: Dzahn)
[22:26:05] <wikibugs>	 (CR) Dzahn: [C: +2] Revert "add IP addresses for new install servers on buster" [dns] - https://gerrit.wikimedia.org/r/570423 (owner: Dzahn)
[22:32:04] <wikibugs>	 Operations, Patch-For-Review, Wikimedia-Incident: https://www.youtube.com/watch?v=_R47Cnv_cPs - https://phabricator.wikimedia.org/T244278 (elhistorial) a:CDanis→RuyP
[22:33:25] <rlazarus>	 ^ who has the tools to clean up phab vandalism?
[22:33:50] <hauskatze>	 rlazarus: me
[22:34:11] <hauskatze>	 blocked
[22:34:16] <rlazarus>	 hero <3
[22:34:22] <mutante>	 user already disabled
[22:34:55] <wikibugs>	 Operations, Patch-For-Review, Wikimedia-Incident: Tracking task: 2020-02-04 kartotherian outage - https://phabricator.wikimedia.org/T244278 (Reedy) a:RuyP→CDanis
[22:35:05] <mutante>	 hauskatze: but we are not talking about an "undo" button ?
[22:35:26] <hauskatze>	 mutante: Phabricator is not that fancy
[22:35:39] <wikibugs>	 (PS1) CDanis: add cdanis as super-user, also add 'next UID' tracker comment [homer/public] - https://gerrit.wikimedia.org/r/570437
[22:35:48] <hauskatze>	 you'll need to do that by hand
[22:35:52] <wikibugs>	 Operations, Patch-For-Review, Wikimedia-Incident: Tracking task: 2020-02-04 kartotherian outage - https://phabricator.wikimedia.org/T244278 (Reedy)
[22:35:56] <rlazarus>	 oh sorry, if I knew we were reverting by hand I'd've just done it
[22:36:00] <rlazarus>	 thanks Reedy 
[22:36:29] <Reedy>	 Phab spam is on the rise again :/
[22:37:00] <mutante>	 hauskatze: there is more
[22:37:01] <hauskatze>	 Did you revert already? I was tuning spotify to do it :P
[22:39:21] <wikibugs>	 (CR) Ayounsi: [C: +1] add cdanis as super-user, also add 'next UID' tracker comment [homer/public] - https://gerrit.wikimedia.org/r/570437 (owner: CDanis)
[22:43:21] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission db1067.eqiad.wmnet - https://phabricator.wikimedia.org/T238297 (Jclark-ctr)
[22:43:25] <wikibugs>	 (PS2) Clarakosi: Add restbase202[123] to hiera [puppet] - https://gerrit.wikimedia.org/r/570094 (https://phabricator.wikimedia.org/T244178)
[22:50:00] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission auth1001 - https://phabricator.wikimedia.org/T234909 (Jclark-ctr)
[22:57:19] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa
[22:57:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:58:51] <wikibugs>	 (CR) Urbanecm: "Good point, didn't realize that" [mediawiki-config] - https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: Ammarpad)
[23:00:14] <wikibugs>	 (CR) Urbanecm: [C: +1] Enable new user message for auto-created accounts on zh_classical wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/567306 (https://phabricator.wikimedia.org/T243509) (owner: Ammarpad)
[23:03:22] <wikibugs>	 (PS7) BryanDavis: Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605
[23:03:24] <wikibugs>	 (PS4) BryanDavis: Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807
[23:04:37] <wikibugs>	 (CR) BryanDavis: [C: +2] Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605 (owner: BryanDavis)
[23:05:20] <wikibugs>	 (Merged) jenkins-bot: Rename internal "toollabs" package to "toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/563605 (owner: BryanDavis)
[23:05:22] <wikibugs>	 (Merged) jenkins-bot: Deprecate Jessie based Kubernetes types [software/tools-webservice] - https://gerrit.wikimedia.org/r/565807 (owner: BryanDavis)
[23:08:06] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa (duration: 10m 48s)
[23:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:32] <wikibugs>	 Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (Krenair) ugh, ok
[23:15:05] <wikibugs>	 (PS1) Papaul: DHCP: Add wdqs200[7-8] to netboot.cfg and MAC address [puppet] - https://gerrit.wikimedia.org/r/570465 (https://phabricator.wikimedia.org/T242301)
[23:15:16] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 - https://phabricator.wikimedia.org/T229586 (Jclark-ctr)
[23:18:12] <wikibugs>	 (PS1) Dzahn: add private IPs for new install servers [dns] - https://gerrit.wikimedia.org/r/570468 (https://phabricator.wikimedia.org/T224576)
[23:18:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] add private IPs for new install servers [dns] - https://gerrit.wikimedia.org/r/570468 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[23:19:42] <wikibugs>	 (CR) Dzahn: "why not buster?" [puppet] - https://gerrit.wikimedia.org/r/570465 (https://phabricator.wikimedia.org/T242301) (owner: Papaul)
[23:23:00] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission, fundraising-tech-ops: decommission frav1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T222109 (Jclark-ctr)
[23:27:39] <wikibugs>	 Operations, ops-eqiad, decommission: Decommission neodymium - https://phabricator.wikimedia.org/T220503 (Jclark-ctr)
[23:30:22] <ebernhardson>	 !log delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
[23:30:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:31:52] <wikibugs>	 Operations, Beta-Cluster-Infrastructure: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) - https://phabricator.wikimedia.org/T243226 (Krenair) Have put puppetmaster03 back on the old version and created puppetmaster04
[23:32:47] <wikibugs>	 Operations, ops-eqiad, DC-Ops, decommission, Patch-For-Review: Decommission dbproxy1006.eqiad.wmnet - https://phabricator.wikimedia.org/T233207 (Jclark-ctr)
[23:40:17] <wikibugs>	 Operations, Discovery, Traffic, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Dvorapa)
[23:41:06] <wikibugs>	 Operations, Discovery, Traffic, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Dvorapa)
[23:48:49] <wikibugs>	 Operations, ops-eqiad, Dumps-Generation: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet - https://phabricator.wikimedia.org/T241794 (Jclark-ctr)
[23:49:41] <wikibugs>	 Operations, Traffic, Inuka-Team (Kanban), MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (nshahquinn-wmf) Open→Resolved I'm seeing events flowing into the production database, so I...