[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Evening SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T0000).
[00:00:04] <jouncebot>	 mooeypoo: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:02:00] <mooeypoo>	 o/ here
[00:02:06] <mooeypoo>	 anyone deploying?
[00:02:27] <wikibugs>	 (PS2) Cwhite: scb: add graphoid matching rules and deploy statsd exporter to scb cluster [puppet] - https://gerrit.wikimedia.org/r/558732 (https://phabricator.wikimedia.org/T205870)
[00:02:29] <mooeypoo>	 @Niharika ? <3
[00:02:46] <Niharika>	 Yep, I'm here. Let's do it. 
[00:02:50] <mooeypoo>	 \o/
[00:03:13] <wikibugs>	 (PS2) Niharika29: Enable $wgAllowRequiringEmailForResets on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/558341 (https://phabricator.wikimedia.org/T240736) (owner: Samwilson)
[00:03:38] <wikibugs>	 (CR) Niharika29: [C: +2] "SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/558341 (https://phabricator.wikimedia.org/T240736) (owner: Samwilson)
[00:03:47] <ebernhardson>	 \o
[00:04:33] <wikibugs>	 (Merged) jenkins-bot: Enable $wgAllowRequiringEmailForResets on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/558341 (https://phabricator.wikimedia.org/T240736) (owner: Samwilson)
[00:06:01] <Niharika>	 mooeypoo: The patch is on mwdebug1001. 
[00:07:15] <musikanimal>	 Niharika: lookin good!
[00:07:30] <Niharika>	 Cool!
[00:07:40] <mooeypoo>	 looks great!
[00:08:32] <wikibugs>	 (CR) Niharika29: [C: +2] [cirrus] Disable Glent M0 A/B test [mediawiki-config] - https://gerrit.wikimedia.org/r/548750 (https://phabricator.wikimedia.org/T237363) (owner: DCausse)
[00:08:41] <wikibugs>	 (PS4) Niharika29: [cirrus] Disable Glent M0 A/B test [mediawiki-config] - https://gerrit.wikimedia.org/r/548750 (https://phabricator.wikimedia.org/T237363) (owner: DCausse)
[00:09:14] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable  on test wikis - T240736 (duration: 01m 02s)
[00:09:16] <Niharika>	 mooeypoo: musikanimal: Deployed. 
[00:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:22] <stashbot>	 T240736: PRU: Enable PRU Functionality via UI in Test Wiki - https://phabricator.wikimedia.org/T240736
[00:09:26] <mooeypoo>	 Thank you!
[00:09:43] <wikibugs>	 (CR) Niharika29: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/548750 (https://phabricator.wikimedia.org/T237363) (owner: DCausse)
[00:10:20] <wikibugs>	 (PS1) Arlolra: Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806)
[00:11:18] <wikibugs>	 (Merged) jenkins-bot: [cirrus] Disable Glent M0 A/B test [mediawiki-config] - https://gerrit.wikimedia.org/r/548750 (https://phabricator.wikimedia.org/T237363) (owner: DCausse)
[00:11:18] <wikibugs>	 (CR) Krinkle: [C: -1] "Move it closer to where wgLocalisationCacheConf['storeClas is normally set. Otherwise, I think it just gets overwritten again?" [mediawiki-config] - https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683) (owner: Ladsgroup)
[00:12:42] <Niharika>	 ebernhardson: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/548750/ is on mwdebug1001
[00:12:58] <ebernhardson>	 looking
[00:13:39] <ebernhardson>	 Niharika: looks good
[00:13:47] <Niharika>	 Okay.
[00:14:47] <wikibugs>	 (PS4) Niharika29: [cirrus] Enable Glent M0 for dewiki, enwiki and frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/548751 (https://phabricator.wikimedia.org/T237365) (owner: DCausse)
[00:15:28] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Disable Glent M0 A/B test - T237363 (duration: 01m 02s)
[00:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:34] <stashbot>	 T237363: Undeploy Glent M0 A/B test - https://phabricator.wikimedia.org/T237363
[00:15:42] <wikibugs>	 (CR) Niharika29: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/548751 (https://phabricator.wikimedia.org/T237365) (owner: DCausse)
[00:16:38] <wikibugs>	 (Merged) jenkins-bot: [cirrus] Enable Glent M0 for dewiki, enwiki and frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/548751 (https://phabricator.wikimedia.org/T237365) (owner: DCausse)
[00:18:00] <Niharika>	 ebernhardson: Your second patch is on mwdebug1001 too. 
[00:20:52] <ebernhardson>	 Niharika: looks good as well
[00:22:35] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable Glent M0 for dewiki, enwiki and frwiki - T237365 (duration: 01m 02s)
[00:22:39] <Niharika>	 ebernhardson: Both deployed. And that concludes the swat. 
[00:22:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:41] <stashbot>	 T237365: Enable Glent M0 on de, en and fr wikipedias - https://phabricator.wikimedia.org/T237365
[00:56:37] <wikibugs>	 (PS2) Ladsgroup: Add a bit for forcing LC caching backend in cli mode [mediawiki-config] - https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683)
[00:57:57] <wikibugs>	 (CR) Ladsgroup: "> Patch Set 1: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683) (owner: Ladsgroup)
[01:00:04] <wikibugs>	 (CR) Krinkle: [C: +1] Add a bit for forcing LC caching backend in cli mode [mediawiki-config] - https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683) (owner: Ladsgroup)
[01:14:13] <wikibugs>	 (PS1) Krinkle: varnish: Remove duplicate 'Content-Type: text/html' statement [puppet] - https://gerrit.wikimedia.org/r/558752
[01:44:03] <wikibugs>	 (PS1) Krinkle: CommonSettings.php: Remove CLI 'display_errors=stderr' setting [mediawiki-config] - https://gerrit.wikimedia.org/r/558758
[01:50:35] <wikibugs>	 (PS2) Krinkle: CommonSettings.php: Remove CLI 'display_errors=stderr' setting [mediawiki-config] - https://gerrit.wikimedia.org/r/558758
[01:57:58] <wikibugs>	 (PS1) Krinkle: CommonSettings.php: Remove very old 'error_append_string' INI override [mediawiki-config] - https://gerrit.wikimedia.org/r/558761
[01:58:47] <wikibugs>	 (CR) Krinkle: "Without patch:" [mediawiki-config] - https://gerrit.wikimedia.org/r/558761 (owner: Krinkle)
[01:59:48] <wikibugs>	 (CR) Krinkle: "mwdeploy@mwdebug1001:/srv/mediawiki/w$ cat krinkle.php" [mediawiki-config] - https://gerrit.wikimedia.org/r/558761 (owner: Krinkle)
[02:04:06] <wikibugs>	 (PS1) Krinkle: Follows-up 164a3ac1f099 which removed IEUrlExtension from MediaWiki and has been deployed to all wikis since. [mediawiki-config] - https://gerrit.wikimedia.org/r/558763 (https://phabricator.wikimedia.org/T232563)
[02:04:32] <wikibugs>	 (PS2) Krinkle: CommonSettings.php: Remove 'SERVER_SOFTWARE' override [mediawiki-config] - https://gerrit.wikimedia.org/r/558763 (https://phabricator.wikimedia.org/T232563)
[02:14:56] <wikibugs>	 (CR) VolkerE: [C: +1] CommonSettings.php: Remove 'SERVER_SOFTWARE' override [mediawiki-config] - https://gerrit.wikimedia.org/r/558763 (https://phabricator.wikimedia.org/T232563) (owner: Krinkle)
[02:26:00] <wikibugs>	 (PS1) Krinkle: CommonSettings.php: Move core DB/SQL-related config closer together [mediawiki-config] - https://gerrit.wikimedia.org/r/558768
[02:26:02] <wikibugs>	 (PS1) Krinkle: CommonSettings.php: Remove the disabled "temporary" code for T232613 [mediawiki-config] - https://gerrit.wikimedia.org/r/558769 (https://phabricator.wikimedia.org/T232613)
[02:38:24] <wikibugs>	 (CR) Subramanya Sastry: [C: +1] "Joe: bump to 1G ok with you?" [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[02:39:09] <wikibugs>	 (CR) Subramanya Sastry: [C: +1] Bump Parsoid/PHP cluster memory_limit (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[02:41:11] <wikibugs>	 (PS1) Krinkle: etcd: Set globals explicitly in CommonSettings instead of etcd.php [mediawiki-config] - https://gerrit.wikimedia.org/r/558774
[02:41:13] <wikibugs>	 (PS1) Krinkle: etcd: Set $wmfEtcdLastModifiedIndex from CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/558775
[02:41:15] <wikibugs>	 (PS1) Krinkle: etcd: Add $etcdHost parameter to wmfSetupEtcd() [mediawiki-config] - https://gerrit.wikimedia.org/r/558776
[02:41:17] <wikibugs>	 (PS1) Krinkle: etcd: Set wmfSetupEtcd($etcdHost) from CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/558777
[02:42:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] etcd: Add $etcdHost parameter to wmfSetupEtcd() [mediawiki-config] - https://gerrit.wikimedia.org/r/558776 (owner: Krinkle)
[02:42:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] etcd: Set wmfSetupEtcd($etcdHost) from CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/558777 (owner: Krinkle)
[02:43:05] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2019-12-11-144337-production [deployment-charts] - https://gerrit.wikimedia.org/r/558778 (https://phabricator.wikimedia.org/T233405)
[02:55:01] <wikibugs>	 (PS2) C. Scott Ananian: Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[02:55:30] <wikibugs>	 (CR) C. Scott Ananian: [C: +1] Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[02:55:47] <wikibugs>	 (PS3) C. Scott Ananian: Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: Arlolra)
[03:28:47] <wikibugs>	 (PS1) CRusnov: Import various tools from netbox-deploy as part of unification [software/netbox-extras] - https://gerrit.wikimedia.org/r/558791
[03:34:01] <wikibugs>	 (CR) CRusnov: [C: +2] "Self merging because this is a code import." [software/netbox-extras] - https://gerrit.wikimedia.org/r/558791 (owner: CRusnov)
[04:54:40] <XioNoX>	 !log add static routes for cloud's 185.15.57.0/29 on cr1/2-codfw - T239347
[04:54:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:47] <stashbot>	 T239347: create a 'normal' network for codf1dev neutron w/public IPs - https://phabricator.wikimedia.org/T239347
[04:59:42] <XioNoX>	 !log advertise 185.15.57.0/24 from [co|eq]dfw - T239347
[04:59:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:59:48] <stashbot>	 T239347: create a 'normal' network for codf1dev neutron w/public IPs - https://phabricator.wikimedia.org/T239347
[05:11:21] <wikibugs>	 (PS1) Ayounsi: Start advertising 185.15.57.0/24 from codfw/eqdfw [homer/public] - https://gerrit.wikimedia.org/r/558821 (https://phabricator.wikimedia.org/T239347)
[05:12:10] <wikibugs>	 (CR) Ayounsi: [V: +2 C: +2] Start advertising 185.15.57.0/24 from codfw/eqdfw [homer/public] - https://gerrit.wikimedia.org/r/558821 (https://phabricator.wikimedia.org/T239347) (owner: Ayounsi)
[05:31:35] <marostegui>	 !log Deploy schema change on commonswiki.image on s4 primary master (db1138) - T233135
[05:31:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:41] <stashbot>	 T233135: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135
[05:38:50] <wikibugs>	 (PS1) Ammarpad: Add new namespace and aliases for zh.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/558827 (https://phabricator.wikimedia.org/T241023)
[05:40:07] <wikibugs>	 (PS2) Ammarpad: Add new namespace and aliases for zh.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/558827 (https://phabricator.wikimedia.org/T241023)
[05:41:39] <wikibugs>	 (PS3) Ammarpad: Add new namespace and aliases for zh.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/558827 (https://phabricator.wikimedia.org/T241023)
[05:43:04] <wikibugs>	 (PS4) Ammarpad: Add new namespace and aliases for zh.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/558827 (https://phabricator.wikimedia.org/T241023)
[05:46:36] <wikibugs>	 (PS1) Marostegui: db1136: Fix typo [puppet] - https://gerrit.wikimedia.org/r/558829
[05:48:21] <wikibugs>	 (CR) Marostegui: [C: +2] db1136: Fix typo [puppet] - https://gerrit.wikimedia.org/r/558829 (owner: Marostegui)
[05:55:20] <marostegui>	 !log Upgrade db2071 and db2072
[05:55:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:48] <marostegui>	 !log Upgrade db2088, db2092
[05:59:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:35] <marostegui>	 !log Upgrade db2112 db2116 db2130
[06:03:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:05:36] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-m
[06:07:08] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[06:12:32] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[06:12:48] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/558836
[06:12:50] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[06:14:55] <wikibugs>	 (CR) Marostegui: [C: +2] db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/558836 (owner: Marostegui)
[06:15:42] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool pc1007 [mediawiki-config] - https://gerrit.wikimedia.org/r/558836 (owner: Marostegui)
[06:17:25] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 11s)
[06:17:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:10] <wikibugs>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/558838
[06:21:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: pc1 on pc2010 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@pc1007.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Cant connect to MySQL server on pc1007.eqiad.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:22:11] <marostegui>	 ^ expected
[06:22:30] <icinga-wm>	 PROBLEM - MariaDB Slave IO: pc1 on pc2007 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@pc1007.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Cant connect to MySQL server on pc1007.eqiad.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:22:33] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/558838 (owner: Marostegui)
[06:23:27] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool pc1007" [mediawiki-config] - https://gerrit.wikimedia.org/r/558838 (owner: Marostegui)
[06:23:36] <icinga-wm>	 RECOVERY - MariaDB Slave IO: pc1 on pc2010 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:24:16] <icinga-wm>	 RECOVERY - MariaDB Slave IO: pc1 on pc2007 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[06:24:40] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1007 after upgrade (duration: 01m 00s)
[06:24:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312', diff saved to https://phabricator.wikimedia.org/P9922 and previous config saved to /var/cache/conftool/dbconfig/20191218-062759-marostegui.json
[06:28:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:06] <marostegui>	 !log Upgrade db1105
[06:30:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311, db1105:3312', diff saved to https://phabricator.wikimedia.org/P9923 and previous config saved to /var/cache/conftool/dbconfig/20191218-063652-marostegui.json
[06:36:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:57] <moritzm>	 !log upgrading debdeploy-client to 0.2.0 fleet-wide
[06:38:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:13] <moritzm>	 !log upgrading debmonitor-client to 0.2.0 fleet-wide
[06:38:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311, db1105:3312', diff saved to https://phabricator.wikimedia.org/P9924 and previous config saved to /var/cache/conftool/dbconfig/20191218-064510-marostegui.json
[06:45:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:41] <onimisionipe>	 !log running replicate-osm on maps1004 after failed osm sync - T239728
[06:45:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:46] <stashbot>	 T239728: Re-import OSM data at eqiad and codfw to temporarily fix current OSM replication issues. - https://phabricator.wikimedia.org/T239728
[06:48:03] <logmsgbot>	 !log volker-e@deploy1001 Started deploy [design/style-guide@d13b55d]: Deploy design/style-guide:
[06:48:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:10] <logmsgbot>	 !log volker-e@deploy1001 Finished deploy [design/style-guide@d13b55d]: Deploy design/style-guide:  (duration: 00m 07s)
[06:48:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:32] <marostegui>	 !log Upgrade db2135
[06:51:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:50] <marostegui>	 !log Upgrade db2132, db2133, db2134
[06:53:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:16] <wikibugs>	 (PS1) Marostegui: filtered_tables.txt: Remove dropped columns [puppet] - https://gerrit.wikimedia.org/r/558851 (https://phabricator.wikimedia.org/T233135)
[07:08:02] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/558316 (https://phabricator.wikimedia.org/T240732) (owner: Jcrespo)
[07:14:43] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@f77e91b]: Fix for T240979
[07:14:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:49] <stashbot>	 T240979: Unable to create Web Proxy in the "phragile" Cloud VPS project (using Horizon) - https://phabricator.wikimedia.org/T240979
[07:18:07] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@f77e91b]: Fix for T240979 (duration: 03m 24s)
[07:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:44] <wikibugs>	 (CR) WMDE-Fisch: [C: +1] Phragile: Added PHP extensions needed by PHP 7 dependencies [puppet] - https://gerrit.wikimedia.org/r/558476 (https://phabricator.wikimedia.org/T211228) (owner: WMDE-leszek)
[07:30:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1016 for upgrade', diff saved to https://phabricator.wikimedia.org/P9925 and previous config saved to /var/cache/conftool/dbconfig/20191218-073002-marostegui.json
[07:30:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:49] <icinga-wm>	 ACKNOWLEDGEMENT - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 Ayounsi https://phabricator.wikimedia.org/T240659 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:40:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1016', diff saved to https://phabricator.wikimedia.org/P9926 and previous config saved to /var/cache/conftool/dbconfig/20191218-074032-marostegui.json
[07:40:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3311, db1105:3312', diff saved to https://phabricator.wikimedia.org/P9927 and previous config saved to /var/cache/conftool/dbconfig/20191218-074642-marostegui.json
[07:46:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:15] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
[07:53:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:21] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
[07:53:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:48] <akosiaris>	 !log run helmfile sync for all staging deployments T239835
[07:53:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:53] <stashbot>	 T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835
[07:53:55] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
[07:53:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:24] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'echostore' for release 'staging' .
[07:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:48] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' .
[07:54:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:12] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Remove old redundant rbac/ dir [deployment-charts] - https://gerrit.wikimedia.org/r/558704 (owner: Alexandros Kosiaris)
[07:55:18] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[07:55:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:29] <wikibugs>	 (Merged) jenkins-bot: Remove old redundant rbac/ dir [deployment-charts] - https://gerrit.wikimedia.org/r/558704 (owner: Alexandros Kosiaris)
[07:55:53] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] RBAC: Add system:nodes group to system:node [deployment-charts] - https://gerrit.wikimedia.org/r/558705 (https://phabricator.wikimedia.org/T239835) (owner: Alexandros Kosiaris)
[07:56:08] <wikibugs>	 (Merged) jenkins-bot: RBAC: Add system:nodes group to system:node [deployment-charts] - https://gerrit.wikimedia.org/r/558705 (https://phabricator.wikimedia.org/T239835) (owner: Alexandros Kosiaris)
[07:56:24] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'main' .
[07:56:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:53] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
[07:56:54] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
[07:57:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:28] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
[07:57:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:05] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
[07:58:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1105:3311, db1105:3312', diff saved to https://phabricator.wikimedia.org/P9928 and previous config saved to /var/cache/conftool/dbconfig/20191218-075828-marostegui.json
[07:58:31] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
[07:58:32] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[07:58:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:15] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
[07:59:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1016', diff saved to https://phabricator.wikimedia.org/P9929 and previous config saved to /var/cache/conftool/dbconfig/20191218-075919-marostegui.json
[07:59:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:43] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
[07:59:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:19] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:01:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:43] <marostegui>	 !log Upgrade db2109
[08:01:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:53] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:04:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:40] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:12] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:12:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:57] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:12:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1016', diff saved to https://phabricator.wikimedia.org/P9930 and previous config saved to /var/cache/conftool/dbconfig/20191218-081256-marostegui.json
[08:13:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:04] <wikibugs>	 (PS1) Muehlenhoff: Make the images proxy configurable and add boron [puppet] - https://gerrit.wikimedia.org/r/558886
[08:20:05] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:20:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:03] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:21:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:13] <wikibugs>	 (PS2) Jcrespo: admin: Provide access to kzimmerman (kzeta) to production analytics [puppet] - https://gerrit.wikimedia.org/r/558316 (https://phabricator.wikimedia.org/T240732)
[08:21:40] <wikibugs>	 (CR) Jcrespo: admin: Provide access to kzimmerman (kzeta) to production analytics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/558316 (https://phabricator.wikimedia.org/T240732) (owner: Jcrespo)
[08:22:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool es1016', diff saved to https://phabricator.wikimedia.org/P9931 and previous config saved to /var/cache/conftool/dbconfig/20191218-082226-marostegui.json
[08:22:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:29] <wikibugs>	 (CR) Muehlenhoff: [C: +1] admin: Provide access to kzimmerman (kzeta) to production analytics [puppet] - https://gerrit.wikimedia.org/r/558316 (https://phabricator.wikimedia.org/T240732) (owner: Jcrespo)
[08:24:31] <wikibugs>	 (PS2) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[08:26:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[08:29:18] <wikibugs>	 (PS3) Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620
[08:31:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmflib: Introduce a more usable data structure to describe services. [puppet] - https://gerrit.wikimedia.org/r/558620 (owner: Giuseppe Lavagetto)
[08:44:26] <wikibugs>	 (CR) Gehel: [C: +2] elasticsearch: decommission elastic[1018-1031] [dns] - https://gerrit.wikimedia.org/r/558525 (https://phabricator.wikimedia.org/T239821) (owner: Gehel)
[08:44:33] <wikibugs>	 (PS3) Gehel: elasticsearch: decommission elastic[1018-1031] [dns] - https://gerrit.wikimedia.org/r/558525 (https://phabricator.wikimedia.org/T239821)
[08:46:22] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM from a quick look, CC'ing Alex as he's involved with Graphoid IIRC" [puppet] - https://gerrit.wikimedia.org/r/558732 (https://phabricator.wikimedia.org/T205870) (owner: Cwhite)
[08:49:24] <wikibugs>	 (PS2) Muehlenhoff: Make the images proxy configurable and add boron [puppet] - https://gerrit.wikimedia.org/r/558886
[08:51:40] <wikibugs>	 (PS1) Vgutierrez: ATS: Disable debug mode in cp3050 [puppet] - https://gerrit.wikimedia.org/r/558970 (https://phabricator.wikimedia.org/T238494)
[08:53:55] <wikibugs>	 (CR) Ema: [C: +1] ATS: Disable debug mode in cp3050 [puppet] - https://gerrit.wikimedia.org/r/558970 (https://phabricator.wikimedia.org/T238494) (owner: Vgutierrez)
[08:54:05] <wikibugs>	 (CR) Vgutierrez: [C: +2] ATS: Disable debug mode in cp3050 [puppet] - https://gerrit.wikimedia.org/r/558970 (https://phabricator.wikimedia.org/T238494) (owner: Vgutierrez)
[08:58:29] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[08:58:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:13] <vgutierrez>	 !log restarting ats-be on cp3050
[08:59:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:34] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Switch codfw calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558472 (https://phabricator.wikimedia.org/T239835)
[09:01:34] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Switch eqiad calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558473 (https://phabricator.wikimedia.org/T239835)
[09:01:36] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: rbac: Add default metadata to system:node RBAC [deployment-charts] - 10https://gerrit.wikimedia.org/r/558972 (https://phabricator.wikimedia.org/T239835)
[09:03:14] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "graphoid is without a maintainer and scheduled to be undeployed by mid of next quarter. Feel free to proceed with this, but don't spend to" [puppet] - 10https://gerrit.wikimedia.org/r/558732 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite)
[09:04:49] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Make the images proxy configurable and add boron [puppet] - 10https://gerrit.wikimedia.org/r/558886 (owner: 10Muehlenhoff)
[09:05:47] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] rbac: Add default metadata to system:node RBAC [deployment-charts] - 10https://gerrit.wikimedia.org/r/558972 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[09:05:56] <wikibugs>	 (03Merged) 10jenkins-bot: rbac: Add default metadata to system:node RBAC [deployment-charts] - 10https://gerrit.wikimedia.org/r/558972 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[09:10:22] <wikibugs>	 (03PS6) 10Alexandros Kosiaris: Switch codfw calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558472 (https://phabricator.wikimedia.org/T239835)
[09:10:24] <wikibugs>	 (03PS6) 10Alexandros Kosiaris: Switch eqiad calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558473 (https://phabricator.wikimedia.org/T239835)
[09:10:26] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: admin: don't rely on coredns for kube-system tiller [deployment-charts] - 10https://gerrit.wikimedia.org/r/558973 (https://phabricator.wikimedia.org/T239835)
[09:14:30] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: admin: don't rely on coredns for kube-system tiller [deployment-charts] - 10https://gerrit.wikimedia.org/r/558973 (https://phabricator.wikimedia.org/T239835)
[09:14:32] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Switch codfw calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558472 (https://phabricator.wikimedia.org/T239835)
[09:14:34] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Switch eqiad calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558473 (https://phabricator.wikimedia.org/T239835)
[09:17:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "minor pedantic nitpick, otherwise LGTM" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/558220 (https://phabricator.wikimedia.org/T232135) (owner: 10BryanDavis)
[09:18:36] <ema>	 !log repool cp3050 after ats-be restart
[09:18:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:10] <wikibugs>	 (03PS1) 10Volans: CLI: fix typo in help message. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/558979 (https://phabricator.wikimedia.org/T237978)
[09:20:31] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] admin: don't rely on coredns for kube-system tiller [deployment-charts] - 10https://gerrit.wikimedia.org/r/558973 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[09:20:44] <wikibugs>	 (03Merged) 10jenkins-bot: admin: don't rely on coredns for kube-system tiller [deployment-charts] - 10https://gerrit.wikimedia.org/r/558973 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[09:22:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1011 for upgrade', diff saved to https://phabricator.wikimedia.org/P9933 and previous config saved to /var/cache/conftool/dbconfig/20191218-092228-marostegui.json
[09:22:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:53] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Be more verbose about backup1001, backup2001 [dns] - 10https://gerrit.wikimedia.org/r/547537 (owner: 10Alexandros Kosiaris)
[09:23:57] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Be more verbose about backup1001, backup2001 [dns] - 10https://gerrit.wikimedia.org/r/547537
[09:24:12] <wikibugs>	 (03Abandoned) 10Alexandros Kosiaris: Be more verbose about backup1001, backup2001 [dns] - 10https://gerrit.wikimedia.org/r/547537 (owner: 10Alexandros Kosiaris)
[09:24:52] <elukey>	 !log execute 'megacli -LDSetProp WT -LAll -aAll' on analytics1057 - T239045
[09:24:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:57] <stashbot>	 T239045: analytics1057's BBU is faulty - https://phabricator.wikimedia.org/T239045
[09:26:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1011', diff saved to https://phabricator.wikimedia.org/P9934 and previous config saved to /var/cache/conftool/dbconfig/20191218-092625-marostegui.json
[09:26:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:55] <wikibugs>	 (03PS3) 10Muehlenhoff: Make the images proxy configurable and add boron [puppet] - 10https://gerrit.wikimedia.org/r/558886
[09:28:03] <wikibugs>	 (03PS1) 10Elukey: Fix /mnt/hdfs check to use Kerberos on the Hadoop test coordinator [puppet] - 10https://gerrit.wikimedia.org/r/558982
[09:29:23] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Fix /mnt/hdfs check to use Kerberos on the Hadoop test coordinator [puppet] - 10https://gerrit.wikimedia.org/r/558982 (owner: 10Elukey)
[09:30:54] <wikibugs>	 (03PS1) 10Ema: ATS: increase keep_alive_no_activity_timeout_out on ats-be [puppet] - 10https://gerrit.wikimedia.org/r/558984 (https://phabricator.wikimedia.org/T238494)
[09:30:56] <wikibugs>	 (03PS1) 10Ema: mediawiki::webserver: increase TLS termination keepalive_timeout [puppet] - 10https://gerrit.wikimedia.org/r/558985 (https://phabricator.wikimedia.org/T238494)
[09:31:55] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/20048/" [puppet] - 10https://gerrit.wikimedia.org/r/558886 (owner: 10Muehlenhoff)
[09:33:40] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/558886 (owner: 10Muehlenhoff)
[09:34:32] <wikibugs>	 (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler1001/20049/mw1266.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/558985 (https://phabricator.wikimedia.org/T238494) (owner: 10Ema)
[09:37:12] <wikibugs>	 (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler1003/20050/cp3050.esams.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/558984 (https://phabricator.wikimedia.org/T238494) (owner: 10Ema)
[09:37:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1011', diff saved to https://phabricator.wikimedia.org/P9935 and previous config saved to /var/cache/conftool/dbconfig/20191218-093720-marostegui.json
[09:37:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:08] <wikibugs>	 (03CR) 10Vgutierrez: "IMHO the timeout value should be aligned with the rest of the timeouts in the stack (see https://wikitech.wikimedia.org/wiki/HTTP_timeouts" [puppet] - 10https://gerrit.wikimedia.org/r/558985 (https://phabricator.wikimedia.org/T238494) (owner: 10Ema)
[09:45:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es1011', diff saved to https://phabricator.wikimedia.org/P9936 and previous config saved to /var/cache/conftool/dbconfig/20191218-094540-marostegui.json
[09:45:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/558979 (https://phabricator.wikimedia.org/T237978) (owner: 10Volans)
[09:53:57] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CLI: fix typo in help message. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/558979 (https://phabricator.wikimedia.org/T237978) (owner: 10Volans)
[09:55:35] <wikibugs>	 (03PS4) 10Muehlenhoff: Make the images proxy configurable and add boron [puppet] - 10https://gerrit.wikimedia.org/r/558886
[09:56:34] <wikibugs>	 (03Merged) 10jenkins-bot: CLI: fix typo in help message. [software/debmonitor] - 10https://gerrit.wikimedia.org/r/558979 (https://phabricator.wikimedia.org/T237978) (owner: 10Volans)
[09:57:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool es1011', diff saved to https://phabricator.wikimedia.org/P9937 and previous config saved to /var/cache/conftool/dbconfig/20191218-095710-marostegui.json
[09:57:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:07] <akosiaris>	 !log populate new calico stores for codfw T239835
[10:00:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:12] <stashbot>	 T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835
[10:03:27] <wikibugs>	 (03CR) 10DCausse: "async import is currently running to catchup updates on wdqs1010 (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1" [puppet] - 10https://gerrit.wikimedia.org/r/558526 (https://phabricator.wikimedia.org/T238045) (owner: 10DCausse)
[10:04:35] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:04:37] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:04:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:50] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:04:53] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:05] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:05:05] <logmsgbot>	 !log akosiaris@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[10:05:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:10] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:05:11] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:16] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:05:18] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:05:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:31] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[10:06:31] <logmsgbot>	 !log akosiaris@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[10:06:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:42] <wikibugs>	 (03CR) 10DCausse: [cirrus] add elastic mapping for ores drafttopics (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558577 (https://phabricator.wikimedia.org/T240550) (owner: 10DCausse)
[10:09:35] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.10/extensions/GrowthExperiments/includes: T240444 Make PageViewInfo a soft dependency (duration: 01m 04s)
[10:09:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:40] <stashbot>	 T240444: GrowthExperiments homepage requires PageViewInfo even it is declared as soft dependency - https://phabricator.wikimedia.org/T240444
[10:10:44] <marostegui>	 !log Upgrade db2083
[10:10:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Make the images proxy configurable and add boron [puppet] - 10https://gerrit.wikimedia.org/r/558886 (owner: 10Muehlenhoff)
[10:12:35] <akosiaris>	 heads up, kubernetes codfw cluster reinit beginning in a few, I've already downtimed stuff and I am currently depooling all services, but I may have forgotten something
[10:12:49] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: k8s: Migrate codfw to the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/558354 (https://phabricator.wikimedia.org/T239835)
[10:12:50] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: k8s: Migrate eqiad to the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/558355 (https://phabricator.wikimedia.org/T239835)
[10:12:52] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: cache::text: Depool k8s services [puppet] - 10https://gerrit.wikimedia.org/r/559002 (https://phabricator.wikimedia.org/T239835)
[10:12:57] <James_F>	 Good luck.
[10:16:02] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(eventgate.*|mathoid|citoid|restrouter|sessionstore|echostore|zotero|termbox|wikifeeds|cxserver|blubberoid)
[10:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:15] <akosiaris>	 !log depooling (eventgate.*|mathoid|citoid|restrouter|sessionstore|echostore|zotero|termbox|wikifeeds|cxserver|blubberoid) from codfw kubernetes
[10:16:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:45] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM, thanks!" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[10:28:46] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "Also, worth double checking that the ingress admission controller will allow this new setting in the ingress objects:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[10:31:46] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "@ottomata, a change in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/helmfile.d/services/eqiad/even" [puppet] - 10https://gerrit.wikimedia.org/r/558117 (owner: 10Ottomata)
[10:31:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Change eventgate-logging-external TLS port to 4392 [puppet] - 10https://gerrit.wikimedia.org/r/558117 (owner: 10Ottomata)
[10:34:35] <wikibugs>	 10Operations, 10netops, 10cloud-services-team (Kanban): Return traffic to eqiad WMCS triggering FNM - https://phabricator.wikimedia.org/T240789 (10akosiaris) I guess parsing past alerts can help answer this question? If the number of exceptions that need to be defined is small enough it might not be worth it...
[10:35:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Switch codfw calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558472 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[10:35:53] <wikibugs>	 (03Merged) 10jenkins-bot: Switch codfw calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558472 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[10:36:12] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: mathoid: Remove mwapi_req/restbase_req [deployment-charts] - 10https://gerrit.wikimedia.org/r/488800
[10:37:10] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Multiple services have now been moved over, we can now merge this. It should be a noop and will make it to the next mathoid chart version" [deployment-charts] - 10https://gerrit.wikimedia.org/r/488800 (owner: 10Alexandros Kosiaris)
[10:37:28] <wikibugs>	 (03Merged) 10jenkins-bot: mathoid: Remove mwapi_req/restbase_req [deployment-charts] - 10https://gerrit.wikimedia.org/r/488800 (owner: 10Alexandros Kosiaris)
[10:41:54] <wikibugs>	 10Operations, 10netops, 10cloud-services-team (Kanban): Return traffic to eqiad WMCS triggering FNM - https://phabricator.wikimedia.org/T240789 (10ayounsi) Good point! Looking at past alerts only that one was a false positive.  2) would allow us to have stricter thresholds, but I agree that it's outside the...
[10:47:26] <wikibugs>	 (03PS3) 10Andrew Bogott: cloud base images: enable passwordless login on serial0 [puppet] - 10https://gerrit.wikimedia.org/r/558296 (https://phabricator.wikimedia.org/T240660)
[10:47:48] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/VisualEditor/includes/ApiVisualEditor.php: T240961: Fix unchecked array access in ApiVisualEditor (duration: 01m 02s)
[10:47:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:54] <stashbot>	 T240961: VisualEditor throwing "PHP Notice: Undefined index: etag" on officewiki as of wmf.11 - https://phabricator.wikimedia.org/T240961
[10:47:58] <wikibugs>	 (03PS4) 10Andrew Bogott: cloud base images: enable passwordless login on serial0 [puppet] - 10https://gerrit.wikimedia.org/r/558296 (https://phabricator.wikimedia.org/T240660)
[10:48:51] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: Update toolviews.py nginx log parser [puppet] - 10https://gerrit.wikimedia.org/r/558676 (https://phabricator.wikimedia.org/T238641) (owner: 10BryanDavis)
[10:48:59] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloud base images: enable passwordless login on serial0 [puppet] - 10https://gerrit.wikimedia.org/r/558296 (https://phabricator.wikimedia.org/T240660) (owner: 10Andrew Bogott)
[10:49:22] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/558676 (https://phabricator.wikimedia.org/T238641) (owner: 10BryanDavis)
[10:51:31] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[10:51:36] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM. Please collect +1 from Hieu before merging." [puppet] - 10https://gerrit.wikimedia.org/r/558597 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[10:53:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620 (owner: 10Giuseppe Lavagetto)
[10:56:22] <wikibugs>	 (03PS1) 10Andrew Bogott: bootstrap-vz buster: rename puppet-overrides.conf to match the stretch filename [puppet] - 10https://gerrit.wikimedia.org/r/559014 (https://phabricator.wikimedia.org/T240660)
[10:57:16] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] bootstrap-vz buster: rename puppet-overrides.conf to match the stretch filename [puppet] - 10https://gerrit.wikimedia.org/r/559014 (https://phabricator.wikimedia.org/T240660) (owner: 10Andrew Bogott)
[10:59:57] <wikibugs>	 10Operations, 10ops-eqiad, 10serviceops: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (10jijiki) @Jclark-ctr if you feel this will work better, we are happy. Either way, this racking is still better than the original one (30 servers in  D 5). Thank you!
[11:00:21] <effie>	 jouncebot: next 
[11:00:21] <jouncebot>	 In 0 hour(s) and 59 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1200)
[11:01:04] <effie>	 CFisch_WMDE: are you running the upcoming SWAT?
[11:04:39] <CFisch_WMDE>	 effie: If I'm alone with my patch I might take the opportunity to practice my deployment skills, so yes.
[11:04:56] <CFisch_WMDE>	 (I've got backup in the office to accompany me on that)
[11:09:18] <effie>	 I want to swap the scap proxies in eqiad and codfw 
[11:09:33] <effie>	 so ping me when you are about to run scap 
[11:09:45] <wikibugs>	 10Operations, 10User-jbond: Collects metrics for CAS - https://phabricator.wikimedia.org/T233934 (10jbond) 05Open→03Resolved a:03jbond This is completed https://grafana-next.wikimedia.org/d/spring_boot_21/spring-boot-statistics
[11:09:47] <wikibugs>	 10Operations, 10User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (10jbond)
[11:09:52] <effie>	 I can delay merging my patch if needed anyway 
[11:13:50] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[11:14:55] <moritzm>	 !log installing spamassassin security updates on mendelevium/OTRS
[11:14:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:06] <marostegui>	 !log Upgrade db2081, db2082
[11:16:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:54] <wikibugs>	 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui)
[11:23:06] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-m
[11:24:54] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[11:25:55] <wikibugs>	 (03PS1) 10Effie Mouzeli: scap:dsh.yaml switch scap proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559021 (https://phabricator.wikimedia.org/T239054)
[11:26:05] <moritzm>	 !log installing spamassassin security updates on fermium/lists
[11:26:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:28] <wikibugs>	 (03PS6) 10Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[11:27:18] <jbond42>	 !log installing apache update on bastion servers
[11:27:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:51] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me, and all subs in the same rack." [puppet] - 10https://gerrit.wikimedia.org/r/559021 (https://phabricator.wikimedia.org/T239054) (owner: 10Effie Mouzeli)
[11:28:23] <moritzm>	 !log installing ruby2.3 security updates
[11:28:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620 (owner: 10Giuseppe Lavagetto)
[11:31:26] <wikibugs>	 10Operations: Add a second CPU to debmonitor hosts - https://phabricator.wikimedia.org/T241046 (10MoritzMuehlenhoff)
[11:32:50] <wikibugs>	 (03PS7) 10Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[11:34:25] <wikibugs>	 10Operations, 10vm-requests: Add a second CPU to debmonitor hosts - https://phabricator.wikimedia.org/T241046 (10MoritzMuehlenhoff)
[11:34:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620 (owner: 10Giuseppe Lavagetto)
[11:51:26] <wikibugs>	 (03PS8) 10Giuseppe Lavagetto: wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[11:53:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620 (owner: 10Giuseppe Lavagetto)
[11:54:39] <wikibugs>	 (03PS7) 10Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [dns] - 10https://gerrit.wikimedia.org/r/555570 (https://phabricator.wikimedia.org/T224585)
[11:56:03] <wikibugs>	 (03CR) 10Phamhi: [C: 03+1] Switch cloudmetrics to the new unified partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/558597 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[12:00:05] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1200).
[12:00:05] <jouncebot>	 CFisch_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:36] <Amir1>	 I actually have a patch to deploy, will add it :D
[12:00:51] <Amir1>	 CFisch_WMDE: let me know when you're around
[12:00:57] <awight>	 CFisch_WMDE and I were going to deploy some Popups backports, Amir1 feel free to go first!
[12:01:20] <CFisch_WMDE>	 +1
[12:01:34] <Amir1>	 cooool
[12:04:54] <wikibugs>	 (03PS4) 10Muehlenhoff: Switch cloudmetrics to the new unified partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/558597 (https://phabricator.wikimedia.org/T156955)
[12:07:53] <wikibugs>	 (03CR) 10Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/555570 (https://phabricator.wikimedia.org/T224585) (owner: 10Phamhi)
[12:07:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch cloudmetrics to the new unified partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/558597 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[12:10:12] <awight>	 Amir1: wondering if you are deploying?
[12:10:27] <Amir1>	 awight: I'm waiting for you :D
[12:10:35] <Amir1>	 sorry I didn't say it explicitly 
[12:10:48] <awight>	 Amir1: go first, if you don't mind?
[12:11:01] <Amir1>	 sure sure
[12:11:01] <awight>	 Ours might be slower than usual...
[12:11:04] <awight>	 ty!
[12:11:07] <Amir1>	 oh okay
[12:11:27] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683) (owner: 10Ladsgroup)
[12:12:49] <wikibugs>	 (03Merged) 10jenkins-bot: Add a bit for forcing LC caching backend in cli mode [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558239 (https://phabricator.wikimedia.org/T105683) (owner: 10Ladsgroup)
[12:16:50] <James_F>	 effie: ^^^ FYI.
[12:17:15] <effie>	 thanks james!
[12:17:26] <effie>	 I will merge my patch later, no need to delay swat 
[12:19:59] <James_F>	 effie: OK. Is it OK for me to do the train in 40 minutes' time, too?
[12:21:08] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:558239|Add a bit for forcing LC caching backend in cli mode (T105683)]] (duration: 01m 03s)
[12:21:12] <wikibugs>	 (03PS1) 10Effie Mouzeli: common::mcrouter.yaml switch mcrouter proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559033 (https://phabricator.wikimedia.org/T239054)
[12:21:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:14] <stashbot>	 T105683: Add Scap support for static-array format of LCStore - https://phabricator.wikimedia.org/T105683
[12:22:02] <Amir1>	 awight: I'm done
[12:22:37] <effie>	 James_F: yeah, I will wait for a +1 anyway 
[12:22:48] <James_F>	 Kk.
[12:22:54] <effie>	 thank you!
[12:23:00] <awight>	 Amir1: ack
[12:30:41] <wikibugs>	 (03PS6) 10Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - 10https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[12:32:46] <CFisch_WMDE>	 effie: I'm about to scap now.
[12:32:54] <effie>	 scap it 
[12:35:19] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] cache::text: Depool k8s services [puppet] - https://gerrit.wikimedia.org/r/559002 (https://phabricator.wikimedia.org/T239835) (owner: Alexandros Kosiaris)
[12:35:33] <logmsgbot>	 !log wmde-fisch@deploy1001 Synchronized php-1.35.0-wmf.10/extensions/Popups: SWAT: [[gerrit:559010|Fix initial preferences for newly created user accounts (T240947)]] (duration: 01m 03s)
[12:35:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:40] <stashbot>	 T240947: ReferencePreviews accidentally enabled for new users even if in Beta - https://phabricator.wikimedia.org/T240947
[12:35:44] <CFisch_WMDE>	 And there will be another one...
[12:35:51] <wikibugs>	 (PS7) Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[12:43:18] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585) (owner: Phamhi)
[12:43:37] <CFisch_WMDE>	 effie: About to scap again, still good to go? ;-)
[12:43:44] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: codfw1dev: update routing_source_ip [puppet] - https://gerrit.wikimedia.org/r/559036 (https://phabricator.wikimedia.org/T239347)
[12:43:48] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - CRITICAL - wikifeeds_8889: Servers kubernetes2001.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: echostore_8082: Servers kubernetes2002.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: eventgate-logging-external_43192: Servers kubernetes2002.codfw.wmnet, kub
[12:43:48] <icinga-wm>	 .wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: eventgate-analytics_31192: Servers kubernetes2002.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: blubberoid_8748: Servers kubernetes2006.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: blubberoid-https_4666: Servers kubernetes2002.codfw.wmnet, kubernetes2006.codfw.wmnet, k
[12:43:48] <icinga-wm>	 fw.wmnet are marked down but pooled: restrouter_7231: Servers kubernetes2006.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: eventgate-ma https://wikitech.wikimedia.org/wiki/PyBal
[12:43:58] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs2006 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([kubernetes2004.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2002.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet, kubernetes2001.codfw.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[12:44:06] <James_F>	 akosiaris: ^^ Is that your acks expiring?
[12:44:30] <akosiaris>	 nope, I forgot about scheduling downtime for these as well
[12:44:39] <akosiaris>	 they are valid, but not worrisome
[12:44:42] <James_F>	 Ah. :-)
[12:44:46] * akosiaris scheduling downtime for them as well
[12:44:46] <James_F>	 Yeah, codfw.
[12:45:10] <wikibugs>	 (PS8) Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[12:45:10] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - wikifeeds_8889: Servers kubernetes2002.codfw.wmnet, kubernetes2001.codfw.wmnet, kubernetes2006.codfw.wmnet are marked down but pooled: echostore_8082: Servers kubernetes2004.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: eventgate-logging-external_43192: Servers kubernetes2001.codfw.wmnet, kub
[12:45:11] <icinga-wm>	 .wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: eventgate-analytics_31192: Servers kubernetes2002.codfw.wmnet, kubernetes2004.codfw.wmnet, kubernetes2003.codfw.wmnet are marked down but pooled: blubberoid_8748: Servers kubernetes2004.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled: blubberoid-https_4666: Servers kubernetes2004.codfw.wmnet, kubernetes2006.codfw.wmnet, k
[12:45:11] <icinga-wm>	 fw.wmnet are marked down but pooled: restrouter_7231: Servers kubernetes2004.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2003.codfw.wmnet are marked down but pooled: eventgate-ma https://wikitech.wikimedia.org/wiki/PyBal
[12:45:12] <icinga-wm>	 PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[12:45:21] <akosiaris>	 restbase however... it was not expected
[12:45:26] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:45:40] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2015 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:45:44] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:45:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:24] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:26] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2013 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:46:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:47:01] <vgutierrez>	 ouch
[12:47:11] <akosiaris>	 yeah me looking
[12:47:25] <CFisch_WMDE>	 effie: I'll just scap now
[12:47:47] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: codfw1dev: update routing_source_ip [puppet] - https://gerrit.wikimedia.org/r/559036 (https://phabricator.wikimedia.org/T239347) (owner: Arturo Borrero Gonzalez)
[12:48:50] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.179e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[12:48:52] <logmsgbot>	 !log wmde-fisch@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/Popups: SWAT: [[gerrit:559010|Fix initial preferences for newly created user accounts (T240947)]] (duration: 01m 02s)
[12:48:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:58] <stashbot>	 T240947: ReferencePreviews accidentally enabled for new users even if in Beta - https://phabricator.wikimedia.org/T240947
[12:49:16] <wikibugs>	 (PS9) Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[12:49:18] <CFisch_WMDE>	 OK I'm done effie 
[12:49:41] <akosiaris>	 this seems to be just wikifeeds, which should be depooled however
[12:50:44] <wikibugs>	 (PS6) Effie Mouzeli: systemd: fixes in coredump class [puppet] - https://gerrit.wikimedia.org/r/545558 (https://phabricator.wikimedia.org/T236253)
[12:50:54] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(wikifeeds)
[12:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:05] <akosiaris>	 dammit, eqiad was depooled? sigh
[12:51:28] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs2003 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([kubernetes2004.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2002.codfw.wmnet, kubernetes2003.codfw.wmnet, kubernetes2005.codfw.wmnet, kubernetes2001.codfw.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[12:51:54] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:52:09] <akosiaris>	 !log pool wikifeeds eqiad. For some reason it was depooled
[12:52:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:38] <jbond42>	 !log disable puppet fleet wide to restart apache on puppetmasters
[12:52:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:08] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:53:25] <wikibugs>	 (PS10) Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[12:53:34] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:53:36] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:53:36] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:53:46] <effie>	 CFisch_WMDE: thank you!
[12:53:52] <effie>	 jouncebot: nex
[12:53:54] <effie>	 jouncebot: next
[12:53:54] <jouncebot>	 In 0 hour(s) and 6 minute(s): Mediawiki train - European Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1300)
[12:54:01] <effie>	 cool 
[12:54:20] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:54:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:54:41] <James_F>	 OK, train departing to group1 in five minutes. Last call for any blockers.
[12:56:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:56:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:56:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:56:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:56:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2015 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[12:56:11] <icinga-wm>	 RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[12:56:22] <jbond42>	 !log enable puppet fleet wide 
[12:56:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:29] <wikibugs>	 (CR) Phamhi: "> Patch Set 5:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585) (owner: Phamhi)
[12:56:48] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:58:36] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:59:30] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] k8s: Migrate codfw to the new etcd cluster [puppet] - https://gerrit.wikimedia.org/r/558354 (https://phabricator.wikimedia.org/T239835) (owner: Alexandros Kosiaris)
[13:00:04] <jouncebot>	 James_F and longma: Your horoscope predicts another unfortunate Mediawiki train - European Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1300).
[13:00:30] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: wikimediacloud.org: introduce FQDN for routing_source_ip in codfw1dev [dns] - https://gerrit.wikimedia.org/r/559040 (https://phabricator.wikimedia.org/T239347)
[13:00:53] <wikibugs>	 Operations, Data-Services, Discovery-Search, Wikidata, and 2 others: Do not rate limit dumps from internal network - https://phabricator.wikimedia.org/T222349 (Gehel) So it looks like for the foreseeable future, using external dumps mirror will still be the way to go to retrieve full dumps intern...
[13:01:10] <wikibugs>	 (PS1) Jforrester: group1 wikis to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/559041
[13:01:12] <wikibugs>	 (CR) Jforrester: [C: +2] group1 wikis to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/559041 (owner: Jforrester)
[13:02:10] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.11 [mediawiki-config] - https://gerrit.wikimedia.org/r/559041 (owner: Jforrester)
[13:02:49] <moritzm>	 !log installing ruby2.5 security updates
[13:02:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:08] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] wikimediacloud.org: introduce FQDN for routing_source_ip in codfw1dev [dns] - https://gerrit.wikimedia.org/r/559040 (https://phabricator.wikimedia.org/T239347) (owner: Arturo Borrero Gonzalez)
[13:03:16] <wikibugs>	 Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (jbond)
[13:03:23] <wikibugs>	 Operations, Puppet, DBA, Patch-For-Review, User-jbond: Extend Puppet CA Expiry date - https://phabricator.wikimedia.org/T236277 (jbond)
[13:03:34] <logmsgbot>	 !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.11
[13:03:35] <James_F>	 Burst of DB errors, as normal.
[13:03:37] <wikibugs>	 Operations, Puppet, Patch-For-Review, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (jbond) This information has been documented in https://wikitech.wikimedia.org/wiki/User:Jbond/Encryption
[13:03:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:46] <wikibugs>	 Operations, Puppet, Patch-For-Review, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (jbond) Open→Resolved
[13:03:48] <wikibugs>	 Operations, Puppet, DBA, Patch-For-Review, User-jbond: Extend Puppet CA Expiry date - https://phabricator.wikimedia.org/T236277 (jbond)
[13:04:26] <James_F>	 Hmm, almost all on Commons…
[13:04:36] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.11 (duration: 01m 01s)
[13:04:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:42] <wikibugs>	 Operations, Puppet, User-jbond: Clean up SSL configueration - https://phabricator.wikimedia.org/T240941 (jbond)
[13:05:16] <James_F>	 Big burst of "No working replica DB server" errors.
[13:05:28] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:06:48] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[13:06:52] <effie>	 could be the train?
[13:07:07] <effie>	 I will be back in a bit 
[13:07:14] <James_F>	 Yeah, error rate is not falling back.
[13:07:25] <James_F>	 Looks like Wikibase on Commons is unhappy.
[13:09:29] <wikibugs>	 (PS1) Jforrester: train: Rolling Commons back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559043 (https://phabricator.wikimedia.org/T233859)
[13:09:31] <wikibugs>	 (CR) Jforrester: [C: +2] train: Rolling Commons back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559043 (https://phabricator.wikimedia.org/T233859) (owner: Jforrester)
[13:10:29] <wikibugs>	 (Merged) jenkins-bot: train: Rolling Commons back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559043 (https://phabricator.wikimedia.org/T233859) (owner: Jforrester)
[13:11:24] <icinga-wm>	 PROBLEM - Prometheus k8s cache not updating on prometheus2003 is CRITICAL: instance=127.0.0.1:9906 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2003&var-datasource=codfw+prometheus/ops
[13:11:33] <logmsgbot>	 !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: train: Rolling Commons back to 1.35.0-wmf.10 T233859
[13:11:36] <wikibugs>	 Operations, Traffic, User-jbond: Setup a new PKI software as an alternative to the puppet CA for managing services certificates - https://phabricator.wikimedia.org/T194031 (jbond) a:Volans→jbond
[13:11:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:39] <stashbot>	 T233859: 1.35.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T233859
[13:11:44] <icinga-wm>	 PROBLEM - Prometheus k8s cache not updating on prometheus2004 is CRITICAL: instance=127.0.0.1:9906 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2004&var-datasource=codfw+prometheus/ops
[13:11:51] <akosiaris>	 heh, interesting, I did not expect a prometheus alarm... although it makes sense
[13:11:54] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad average message produce rate in last 30m on icinga1001 is CRITICAL: 0 le 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[13:12:38] <icinga-wm>	 PROBLEM - nova instance creation test on cloudcontrol1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[13:12:48] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad average message consume rate in last 30m on icinga1001 is CRITICAL: 0 le 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration%23MirrorMaker https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[13:13:14] <jynus>	 and it's gone now
[13:13:55] <icinga-wm>	 ACKNOWLEDGEMENT - nova instance creation test on cloudcontrol1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack andrew bogott I updated base images -- this will recover shortly. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[13:14:26] <James_F>	 jynus: Yeah, train rolled back for Commons. Filing task now.
[13:14:26] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:14:26] <icinga-wm>	 RECOVERY - nova instance creation test on cloudcontrol1003 is OK: PROCS OK: 1 process with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[13:16:15] <wikibugs>	 Operations, Data-Services, Discovery-Search, Wikidata, and 2 others: Do not rate limit dumps from internal network - https://phabricator.wikimedia.org/T222349 (ArielGlenn) How fast a download do folks want? Can we schedule rsyncs for the specific use cases with a higher bandwidth cap?
[13:16:39] <jynus>	 James_F: looking at queries I've seen a query, unrelated to this, to file a task on, too
[13:20:20] <wikibugs>	 Operations, netops, cloud-services-team (Kanban): Return traffic to eqiad WMCS triggering FNM - https://phabricator.wikimedia.org/T240789 (CDanis) +1 to doing #1 and revisiting if it becomes a problem again.
[13:20:40] <James_F>	 jynus: Different issue or same issue but different query triggering it?
[13:24:04] <jynus>	 James_F: I thought it was connected, but it started a long time ago and continues now, so I learned that it's unrelated
[13:24:07] <andrewbogott>	 James_F: is the rollback complete or still in progress?
[13:24:16] <jynus>	 I am filing it
[13:24:23] <andrewbogott>	 I'm seeing weird behavior on wikitech, maybe related?
[13:25:14] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:25:27] * andrewbogott backs away
[13:25:36] <effie>	 back
[13:25:42] <icinga-wm>	 PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:25:44] <James_F>	 andrewbogott: Rolled back but only on Commons. Might be.
[13:26:02] <andrewbogott>	 the consistent thing I see is crashes in search
[13:26:13] <andrewbogott>	 https://wikitech.wikimedia.org/w/index.php?search=virsh+console&title=Special%3ASearch&go=Go&wprov=acrw1_-1
[13:26:14] <James_F>	 Yeah, looking.
[13:26:17] <andrewbogott>	 thanks
[13:26:40] <James_F>	 Searching looks OK to me, though?
[13:27:02] <andrewbogott>	 does that page I linked above load for you?
[13:27:14] <andrewbogott>	 For me I get "(Cannot access the database: Cannot access the database: Unknown error (10.64.32.72))"
[13:27:23] <andrewbogott>	 that's from running a search for 'virsh console'
[13:27:28] <James_F>	 andrewbogott: Yes, instantly. Hence my confusion.
[13:27:30] <icinga-wm>	 RECOVERY - Check systemd state on labweb1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:27:33] <andrewbogott>	 hm
[13:27:37] <andrewbogott>	 well, I'm in asia…?
[13:27:44] <longma>	 when you search again it breaks for me
[13:27:45] <James_F>	 But there are lots of errors, you're right.
[13:27:51] * andrewbogott tries again with a different browser
[13:27:57] <James_F>	 Oh, huh, on refresh it broke.
[13:28:03] <James_F>	 I'll rollback Wikitech too.
[13:28:46] <andrewbogott>	 search worked in a second browser but then I reloaded in the original browser and am missing static content
[13:28:52] <andrewbogott>	 so it's all over the place :(
[13:28:58] <James_F>	 Yeah.
[13:28:58] <jynus>	 https://phabricator.wikimedia.org/T241058
[13:29:07] <James_F>	 Did something exciting change in core's handling of DB connections?
[13:29:23] <cdanis>	 Yes.
[13:29:25] <wikibugs>	 (PS1) Jforrester: train: Rolling Wikitech back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559052 (https://phabricator.wikimedia.org/T233859)
[13:29:27] <wikibugs>	 (CR) Jforrester: [C: +2] train: Rolling Wikitech back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559052 (https://phabricator.wikimedia.org/T233859) (owner: Jforrester)
[13:29:30] <cdanis>	 I did some dbctl changes the other day.
[13:29:56] <James_F>	 But that would be expected to blow up at the time, not as part of the train only.
[13:30:01] <cdanis>	 True.
[13:30:25] <wikibugs>	 (Merged) jenkins-bot: train: Rolling Wikitech back to 1.35.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/559052 (https://phabricator.wikimedia.org/T233859) (owner: Jforrester)
[13:30:45] <wikibugs>	 (PS1) Ema: ATS: disable compress plugin on ats-be [puppet] - https://gerrit.wikimedia.org/r/559053 (https://phabricator.wikimedia.org/T238494)
[13:30:46] <logmsgbot>	 !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: train: Rolling Wikitech back to 1.35.0-wmf.10 T233859
[13:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:52] <stashbot>	 T233859: 1.35.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T233859
[13:31:42] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] network: data: cleanup unused WMCS ranges [puppet] - https://gerrit.wikimedia.org/r/556994 (https://phabricator.wikimedia.org/T240670) (owner: Arturo Borrero Gonzalez)
[13:32:33] <James_F>	 No changes this week to includes/db or anything else obvious from a quick glance.
[13:32:56] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: networks: cleanup unused WMCS ranges [dns] - https://gerrit.wikimedia.org/r/556995 (https://phabricator.wikimedia.org/T240670)
[13:34:19] <James_F>	 andrewbogott: Hmm, I'm also seeing those errors on wmf.10 on Wikitech now I've rolled it back…
[13:35:37] <James_F>	 Filed as T241059
[13:35:38] <stashbot>	 T241059: Cannot access the database: Unknown error (10.64.32.72) - https://phabricator.wikimedia.org/T241059
[13:36:09] <wikibugs>	 (PS2) Ema: ATS: disable compress plugin on ats-be [puppet] - https://gerrit.wikimedia.org/r/559053 (https://phabricator.wikimedia.org/T238494)
[13:36:28] <andrewbogott>	 James_F: I couldn't tell you when I last did a search on wikitech but probably almost every day… I doubt it's been broken for a week
[13:37:03] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] networks: cleanup unused WMCS ranges [dns] - https://gerrit.wikimedia.org/r/556995 (https://phabricator.wikimedia.org/T240670) (owner: Arturo Borrero Gonzalez)
[13:37:04] <icinga-wm>	 PROBLEM - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:37:28] <wikibugs>	 (CR) Vgutierrez: [C: +1] ATS: disable compress plugin on ats-be [puppet] - https://gerrit.wikimedia.org/r/559053 (https://phabricator.wikimedia.org/T238494) (owner: Ema)
[13:38:27] <andrewbogott>	 oh, now wikitech is down for me entirely.
[13:38:34] <moritzm>	 !log installing dbus security updates for stretch
[13:38:40] <andrewbogott>	 Maybe this is all jynus's ddos and nothing to do with release version
[13:38:50] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): WMCS: cleanup network allocations - https://phabricator.wikimedia.org/T240670 (10aborrero) Patches are merged. I can do the Netbox cleanup (delete all those objects) by myself if you confirm @ayounsi
[13:38:51] <James_F>	 Maybe.
[13:39:17] <cdanis>	 https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1133&var-port=9104
[13:40:06] <icinga-wm>	 PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:40:16] <James_F>	 I can roll back the train everywhere, but theoretically wikitech should be pretty isolated from the rest of the train.
[13:40:24] <cdanis>	 I think it's an actual problem with db1133?
[13:40:38] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1021 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[13:41:02] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - labweb-ssl_7443: Servers labweb1002.wikimedia.org are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:41:04] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
[13:41:06] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
[13:41:14] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - labweb-ssl_7443: Servers labweb1002.wikimedia.org are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:41:30] <James_F>	 Is 10.64.32.72 db1133?
[13:41:34] <cdanis>	 Yes.
[13:41:38] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[13:42:02] <effie>	 that's the labswiki master
[13:42:14] <James_F>	 Right.
[13:42:44] <cdanis>	 At least one of the haproxy alerts there, dbproxy1017, is also re: db1133
[13:43:06] <cdanis>	 Same for dbproxy1021.
[13:43:12] <cdanis>	 Also has db1133 as its backing server.
[13:43:23] <andrewbogott>	 nova uses the same db cluster (M5) and seems to be working OK
[13:45:18] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[13:45:24] <James_F>	 OK, ignoring Wikitech, everything else seems OK.
[13:45:38] <James_F>	 What do we want to do about Wikitech?
[13:46:01] <effie>	 the only thing I can see in tendril about db1133 is many aborted connections, but that does not help much 
[13:46:12] <mark>	 jynus: are you looking at the wikitech / db1133 issue?
[13:46:15] <effie>	 has anyone restarted php there?
[13:46:22] <effie>	 if not, we can start from there
[13:46:25] <andrewbogott>	 I can do that now
[13:46:41] <effie>	 ok
[13:46:42] <mark>	 but why would the dbproxy complain then
[13:46:46] <andrewbogott>	 effie: does 'restarting php' == 'service apache2 reload'?
[13:47:08] <effie>	 andrewbogott: depool ; systemctl restart php7.2-fpm; pool 
[13:47:14] <andrewbogott>	 ok, doing
[13:47:24] <effie>	 although depooling and pooling does not matter much 
[13:47:42] <andrewbogott>	 done on both wikitech servers
[13:48:00] <andrewbogott>	 !log depool ; systemctl restart php7.2-fpm; pool  on labweb1001 and labweb1002
[13:48:10] <cdanis>	 I grabbed a 'show full processlist' and stats output from db1133, although I don't know what to make of it myself: https://phabricator.wikimedia.org/P9939
[13:48:19] <cdanis>	 there are a fair number of long-running queries though
[13:49:26] <andrewbogott>	 that looks pretty normal to me but I'll restart some openstack things and see if that clears up the number of connections
[13:50:02] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:50:14] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:50:24] <wikibugs>	 (03PS11) 10Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - 10https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[13:50:26] <effie>	 ok wikitech is back 
[13:50:37] <effie>	 at least from my end 
[13:50:47] <James_F>	 Confirmed.
[13:50:50] <andrewbogott>	 for me too
[13:50:52] <cdanis>	 search works
[13:50:56] <icinga-wm>	 RECOVERY - Check systemd state on labweb1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:50:56] <andrewbogott>	 anyone know what changed?
[13:51:04] <icinga-wm>	 RECOVERY - Prometheus k8s cache not updating on prometheus2003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2003&var-datasource=codfw+prometheus/ops
[13:51:24] <icinga-wm>	 RECOVERY - Prometheus k8s cache not updating on prometheus2004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23k8s_cache_not_updating https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=prometheus2004&var-datasource=codfw+prometheus/ops
[13:51:30] <icinga-wm>	 RECOVERY - Check systemd state on labweb1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:51:35] <cdanis>	 the monitoring graphs for db1133 still look wonky to me, but what do I know https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1133&var-port=9104
[13:52:12] <James_F>	 andrewbogott: Well, we deployed the train.
[13:52:18] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:52:35] <James_F>	 But it stayed when we rolled back, so unless it was ultra-long-running queries that didn't get killed, somehow…
[13:52:49] <andrewbogott>	 James_F: yeah, I was wondering what fixed it
[13:53:00] <James_F>	 The restart?
[13:53:35] <James_F>	 Presumably the sawtooth load on db1133 is unusual?
[13:53:42] <James_F>	 (Prior to everything blowing up.)
[13:54:05] <James_F>	 Something cron-based every 10 minutes?
[13:54:30] <cdanis>	 it's weirder than just sawtooth -- if you hover over the graphs, you'll see that during a bunch of that interval, there _aren't_ data points -- as in, stats couldn't be scraped for some reason
[13:54:34] <wikibugs>	 (03PS12) 10Phamhi: cloudvps: rename+reimage labmon1001 as cloudmetrics1001 [puppet] - 10https://gerrit.wikimedia.org/r/555565 (https://phabricator.wikimedia.org/T224585)
[13:54:45] <James_F>	 cdanis: Oh, so it's locking up every 10?
[13:55:09] <cdanis>	 no, just from ~13:30 onwards
[13:56:22] <James_F>	 Oh, I see what you mean. During the downtime there were no data points.
[13:57:33] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
[13:57:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:03] <cdanis>	 anyway, it looks like there were also a bunch of the "Cannot access the database: Unknown error" for other wikis in wmf.11?
[13:58:11] <wikibugs>	 (03PS1) 10IAmNetx: Add ng.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/559062 (https://phabricator.wikimedia.org/T240771)
[13:58:11] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
[13:58:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add ng.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/559062 (https://phabricator.wikimedia.org/T240771) (owner: 10IAmNetx)
[13:58:50] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[13:58:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:24] <effie>	 cdanis: do you have a logstash url in hand ?
[13:59:33] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'echostore' for release 'production' .
[13:59:35] <cdanis>	 https://logstash.wikimedia.org/goto/7ff0eceadac6b8f5aaeecafdc3fb0fe2
[13:59:36] <wikibugs>	 (03PS2) 10IAmNetx: Add ng.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/559062 (https://phabricator.wikimedia.org/T240771)
[13:59:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:42] <cdanis>	 mostly commonswiki
[13:59:51] <James_F>	 cdanis: There's always a small spate of those during the train rollout (sadly).
[14:00:12] <James_F>	 cdanis: Yeah, I rolled back on Commons. See T241057. Possibly related, but was isolated to there at the time.
[14:00:12] <stashbot>	 T241057: Cannot access the database: No working replica DB server: Unknown error (10.64.32.113) - https://phabricator.wikimedia.org/T241057
[14:00:15] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' .
[14:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:18] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[14:01:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:54] <Amir1>	 https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/2083/console
[14:02:01] <Amir1>	 Gerrit going down?
[14:02:17] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'main' .
[14:02:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:34] <icinga-wm>	 PROBLEM - DPKG on analytics1055 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[14:02:53] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
[14:02:55] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
[14:02:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:09] <akosiaris>	 Amir1: I don't think so. gerrit works fine with me
[14:03:12] <akosiaris>	 for*
[14:03:22] <James_F>	 Amir1: No, but occasionally it'll go away for CI and you'll need to re-run.
[14:03:22] <akosiaris>	 some rsync failed over there
[14:03:48] <akosiaris>	 what is that rsync doing there? does it have something to do with gerrit?
[14:03:53] <akosiaris>	 do I want to know?
[14:04:02] <Amir1>	 James_F: I'm going to deploy the fix right now, Can you try again after it? Please keep me posted
[14:04:08] <paladox>	 gerrit's up
[14:04:16] <Amir1>	 several jobs failed
[14:04:17] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
[14:04:17] <James_F>	 akosiaris: I can describe in detail, but not when there are production issues.
[14:04:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:23] <Amir1>	 https://integration.wikimedia.org/ci/job/wikibase-repo-docker/9684/console
[14:04:34] <akosiaris>	 James_F: ok, sorry for the interrupt
[14:04:38] <James_F>	 akosiaris: :-)
[14:05:02] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
[14:05:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:08] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2006 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[14:05:25] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
[14:05:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:01] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[14:06:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:31] <logmsgbot>	 !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
[14:06:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:44] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2003 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[14:06:53] <James_F>	 Amir1: How are you planning to test the patch? TestCommons?
[14:07:25] <Amir1>	 James_F: it's not possible, this only happens when they are on different hosts
[14:07:31] <Amir1>	 wikidata and commons
[14:08:24] <Amir1>	 nope, I was able to reproduce it
[14:10:09] <Amir1>	 https://test-commons.wikimedia.org/wiki/File:4050443322980010273.png
[14:10:22] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:10:56] <James_F>	 Amir1: TestCommons and TestWikidata are on different hosts, right?
[14:11:08] <Amir1>	 they should be both on s3?
[14:11:18] <James_F>	 TestCommons is on s4 for this reason.
[14:11:23] <Amir1>	 testwikidawiki is on s3
[14:11:27] <Amir1>	 nice idea
[14:11:30] <James_F>	 :-)
[14:11:32] * James_F bows.
[14:11:44] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:11:51] <James_F>	 (More generally, it's there to be as prod-like as possible.)
[14:17:18] <Amir1>	 Can I force merge it? 😈
[14:17:33] <James_F>	 Amir1: No.
[14:17:40] <James_F>	 Amir1: There's no great rush.
[14:17:54] <Amir1>	 okay :D
[14:22:12] <wikibugs>	 10Operations, 10SRE-tools: Extend debmonitor with image tracking support - https://phabricator.wikimedia.org/T237978 (10MoritzMuehlenhoff) 05Open→03Resolved This is rolled out to production, all further refinements can happen via followup tasks/commits.
[14:22:16] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1001/20056/ makes sense. I'll fix the CI errors and the patch can be considered good to go as " [puppet] - 10https://gerrit.wikimedia.org/r/558620 (owner: 10Giuseppe Lavagetto)
[14:23:26] <wikibugs>	 10Operations: rack/setup/install auth1002 - https://phabricator.wikimedia.org/T196698 (10MoritzMuehlenhoff) 05Open→03Resolved This is installed (as a test system for now), closing the task.
[14:24:38] <wikibugs>	 (03PS1) 10IAmNetx: Add ng.wikimedia.org as chapter site [puppet] - 10https://gerrit.wikimedia.org/r/559073 (https://phabricator.wikimedia.org/T240771)
[14:25:42] <wikibugs>	 10Operations, 10Traffic: /sec-warning page: please add a helpful XML comment explaining why it's being delivered. - https://phabricator.wikimedia.org/T240794 (10DavidBrooks) Suggest: near the top of the /sec-warning file, add an HTML comment:  <!-- If this content has been delivered in response to a MediaWiki...
[14:27:08] <wikibugs>	 (03PS9) 10Giuseppe Lavagetto: 🚧 wmflib: Introduce a more usable data structure to describe services. [puppet] - 10https://gerrit.wikimedia.org/r/558620
[14:27:19] <wikibugs>	 10Operations, 10Traffic: /sec-warning page: please add an HTML comment that is more easily visible to API and transport-level inspection/debugging - https://phabricator.wikimedia.org/T240794 (10DavidBrooks)
[14:27:20] <_joe_>	 vgutierrez: ^^
[14:28:00] <moritzm>	 !log removing pollux from Ganeti (obsoleted by ldap-corp2001 for a while now)
[14:28:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:06] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: Check if a GPU fits in any of the remaining stat or notebook hosts - https://phabricator.wikimedia.org/T220698 (10Jclark-ctr) {F31480680} @EBernhardson   stat1004 and 1007 are 1 u host these will not fit dual-slot card . most on this list are 1u host a...
[14:30:50] <wikibugs>	 (03CR) 10Ottomata: "Yup that's this one https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/558115" [puppet] - 10https://gerrit.wikimedia.org/r/558117 (owner: 10Ottomata)
[14:32:21] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] common::mcrouter.yaml switch mcrouter proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559033 (https://phabricator.wikimedia.org/T239054) (owner: 10Effie Mouzeli)
[14:33:06] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] scap:dsh.yaml switch scap proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559021 (https://phabricator.wikimedia.org/T239054) (owner: 10Effie Mouzeli)
[14:37:06] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[14:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:42] <logmsgbot>	 !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[14:37:44] <wikibugs>	 10Operations: Migrate ldap/corp replicas to Stretch/Buster - https://phabricator.wikimedia.org/T224557 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `pollux.wikimedia.org` -  pollux.wikimedia.org (**FAIL**)   - Downtimed host on Icinga   - No management interface fo...
[14:37:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:59] <jbond42>	 !log restart apache on phab
[14:39:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:07] <wikibugs>	 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff)
[14:39:22] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Create OpenGLAM mailing list - https://phabricator.wikimedia.org/T238759 (10SandraF_WMF)
[14:39:26] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Migrate archives of the OKFN-hosted Open-GLAM mailing list to Wikimedia's mailman - https://phabricator.wikimedia.org/T240929 (10SandraF_WMF)
[14:41:43] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "See the comments below but also:" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/545558 (https://phabricator.wikimedia.org/T236253) (owner: 10Effie Mouzeli)
[14:42:52] <Amir1>	 okay, it fixes the issue
[14:43:00] <Amir1>	 moving forward
[14:43:03] <Amir1>	 James_F: ^
[14:43:14] <James_F>	 Amir1: Cool.
[14:44:20] <wikibugs>	 (03PS1) 10CDanis: Revert "mwdebug1002: add motd warning" [puppet] - 10https://gerrit.wikimedia.org/r/559088 (https://phabricator.wikimedia.org/T214734)
[14:44:26] <wikibugs>	 (03PS1) 10Jforrester: train: Roll Commons forward to 1.35.0-wmf.11 again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559089 (https://phabricator.wikimedia.org/T241057)
[14:44:38] <wikibugs>	 (03PS2) 10Jforrester: train: Roll Commons forward to 1.35.0-wmf.11 again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559089 (https://phabricator.wikimedia.org/T241057)
[14:45:03] <awight>	 I'm hoping to monitor production servers, to see whether a recent change had any impact on MediaWiki app server memory usage.  However, I don't understand whether that's possible.  It seems what I would look for is less frequent PHP garbage collection--is there a metric for that?
[14:45:23] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/Wikibase: [[gerrit:559064|Fix DatabaseEntityInfoBuilder on federated repos (T241057)]] (duration: 01m 07s)
[14:45:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:29] <stashbot>	 T241057: Cannot access the database: No working replica DB server: Unknown error (10.64.32.113) - https://phabricator.wikimedia.org/T241057
[14:45:42] <wikibugs>	 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff)
[14:45:43] <James_F>	 Amir1: Good to roll the train to Commons now?
[14:46:02] <Amir1>	 yes but please keep me in the loop for issues that might arise
[14:46:09] <Amir1>	 a big thing is going live today
[14:46:34] <James_F>	 Of all the weeks, you chose this one? :-)
[14:46:50] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] train: Roll Commons forward to 1.35.0-wmf.11 again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559089 (https://phabricator.wikimedia.org/T241057) (owner: 10Jforrester)
[14:47:05] <cdanis>	 awight: I'm not sure if we have any existing stats about PHP GC, but maybe effie or _joe_ know better?
[14:47:28] <awight>	 Thanks--or any other way to measure PHP memory pressure.
[14:47:31] <Amir1>	 James_F: well, this was supposed to be there six months ago, some people forgot to do it completely
[14:47:50] <wikibugs>	 (03Merged) 10jenkins-bot: train: Roll Commons forward to 1.35.0-wmf.11 again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559089 (https://phabricator.wikimedia.org/T241057) (owner: 10Jforrester)
[14:48:03] <James_F>	 Amir1: Fun.
[14:49:04] <_joe_>	 cdanis: we disable php gc IIRC
[14:49:11] <cdanis>	 👀
[14:49:19] <_joe_>	 well there is the natural gc
[14:49:24] <_joe_>	 at the end of a request
[14:49:39] <James_F>	 Run fast, reap thread.
[14:49:54] <logmsgbot>	 !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: train: Rolling Commons foward to 1.35.0-wmf.11 T233859 T241057
[14:50:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:01] <stashbot>	 T233859: 1.35.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T233859
[14:50:04] <James_F>	 Amir1: Live.
[14:50:10] <_joe_>	 awight: when did your change go live?
[14:50:32] <awight>	 _joe_: Good to know, thanks!  But no metrics about how much memory is allocated over the course of each thread's lifespan?
[14:51:32] <Amir1>	 thanks
[14:51:38] <Amir1>	 let me look at the logs
[14:51:40] <James_F>	 Amir1: LGTM.
[14:51:42] <_joe_>	 no, php-fpm doesn't expose that information, and I don't think we intrument mediawiki in that direction for the average request
[14:51:49] <cdanis>	 we should :)
[14:51:49] <Amir1>	 \o/
[14:51:52] <_joe_>	 *instrument
[14:52:04] <awight>	 _joe_: This change goes live in wmf.11, fwiw
[14:52:12] <James_F>	 Now the question is, is it safe to roll wmf.11 out to wikitech again?
[14:52:18] <_joe_>	 cdanis: from what I remember, and it's been years, resource usage reporting in php is slow and utterly unreliable
[14:52:36] <_joe_>	 as it rarely counts the memory allocated by extensions in C
[14:53:22] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Change eventgate-logging-external TLS port to 4392 [deployment-charts] - 10https://gerrit.wikimedia.org/r/558115 (owner: 10Ottomata)
[14:53:28] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Change eventgate-logging-external TLS port to 4392 [deployment-charts] - 10https://gerrit.wikimedia.org/r/558115 (owner: 10Ottomata)
[14:53:38] <awight>	 interesting.  re: slow, if we decide these are useful metrics, we can always sample e.g. every 0.1% of app server threads gets profiled.
[14:54:22] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Revert "cache::text: Depool k8s services" [puppet] - 10https://gerrit.wikimedia.org/r/559091
[14:55:32] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] Revert "mwdebug1002: add motd warning" [puppet] - 10https://gerrit.wikimedia.org/r/559088 (https://phabricator.wikimedia.org/T214734) (owner: 10CDanis)
[14:55:45] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Everything looks good, reverting" [puppet] - 10https://gerrit.wikimedia.org/r/559091 (owner: 10Alexandros Kosiaris)
[15:00:32] <icinga-wm>	 PROBLEM - Restrouter LVS codfw on restrouter.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[15:01:16] <James_F>	 I should go have some lunch, it having reached 15:00.
[15:02:36] <_joe_>	 akosiaris: ^^ ?
[15:03:09] <akosiaris>	 _joe_: yeah, it's one of the 2 issues I am investigating
[15:03:37] <akosiaris>	 the other one being https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&fullscreen&panelId=11
[15:03:39] <_joe_>	 akosiaris: what the other? do you need help looking into this?
[15:03:46] <akosiaris>	 those metrics disappearing
[15:03:51] <akosiaris>	 that's lower priority though
[15:04:07] <akosiaris>	 it's not impacting anything btw, nothing uses restrouter
[15:04:07] <_joe_>	 oh that's probably that metrics changed with etcd3?
[15:04:10] <_joe_>	 dunno
[15:04:14] <_joe_>	 sure
[15:04:34] <akosiaris>	 I did a diff between old and new and there are changes for sure
[15:04:37] <_joe_>	 that's pretty ok for having transitioned to a new datastore and having recreated everything from scratch
[15:04:46] <akosiaris>	 it might be.. perhaps compareandswap is no longer a thing?
[15:04:48] <_joe_>	 old and new what?
[15:04:54] <_joe_>	 indeed it's not
[15:04:54] <akosiaris>	 codfw vs eqiad
[15:05:07] <_joe_>	 etcd3 uses transactions
[15:05:28] <akosiaris>	 anyway I 'll have a closer look at that tomorrow, it's not impacting anything
[15:05:36] <akosiaris>	 now as to why restrouter is complaining
[15:05:46] <akosiaris>	 it's timeouts and it's currently supposedly reaching eqiad
[15:06:09] <akosiaris>	 I 'll pool wikifeeds just to test that, but if it's true, probably some timeout is too slow
[15:06:13] <akosiaris>	 low*
[15:06:45] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(eventgate.*|mathoid|citoid|restrouter|sessionstore|echostore|zotero|termbox|wikifeeds|cxserver|blubberoid)
[15:06:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:01] <akosiaris>	 !log repool all codfw k8s services. T239835
[15:07:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:07] <stashbot>	 T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835
[15:07:21] <jynus>	 mark: no, I was having lunch
[15:08:08] <akosiaris>	 bstorm_: https://phabricator.wikimedia.org/T214513#5751354 <3
[15:08:27] <bstorm_>	 :)
[15:10:34] <icinga-wm>	 RECOVERY - Restrouter LVS codfw on restrouter.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[15:10:42] <jynus>	 bstorm_: apparently m5 broke
[15:10:53] <akosiaris>	 _joe_: ^
[15:11:03] <akosiaris>	 so indeed some timeout is set way too low
[15:11:03] <wikibugs>	 10Operations, 10Traffic: /sec-warning page: please add an HTML comment that is more easily visible to API and transport-level inspection/debugging - https://phabricator.wikimedia.org/T240794 (10BBlack) Or a patch to template this in.  The problem is it's implemented from a standard template for the top 30-40 l...
[15:11:08] <bstorm_>	 Ouch
[15:11:13] <_joe_>	 akosiaris: what was the problem?
[15:11:26] <_joe_>	 oh just that it was reaching cross-dc
[15:11:27] <akosiaris>	 _joe_: the moment I pooled codfw in the discovery records, it was fixed
[15:11:44] <akosiaris>	 the funny thing is that restbase isn't complaining
[15:11:44] <_joe_>	 well it's worrisome that it doesn't happen to old restbase
[15:11:48] <_joe_>	 they test the same things
[15:11:52] <akosiaris>	 exactly!
[15:11:56] <_joe_>	 ohhh, did you repool restbase too?
[15:12:04] <akosiaris>	 no I never touched restbase
[15:12:06] <_joe_>	 restrouter was calling restbase in eqiad I guess
[15:12:12] <_joe_>	 uhm
[15:12:14] <jynus>	 bstorm_: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1133&var-port=9104&from=1576673804647&to=1576678204871&fullscreen&panelId=10
[15:12:14] <_joe_>	 ok
[15:12:16] <akosiaris>	 logmsgbot: !log akosiaris@cumin1001 conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(eventgate.*|mathoid|citoid|restrouter|sessionstore|echostore|zotero|termbox|wikifeeds|cxserver|blubberoid)
[15:12:26] <_joe_>	 akosiaris: this needs more investigation then
[15:12:47] <jynus>	 I am guessing that is the same issue andrewbogott noticed
[15:12:58] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad average message consume rate in last 30m on icinga1001 is OK: (C)0 le (W)100 le 106.4 https://wikitech.wikimedia.org/wiki/Kafka/Administration%23MirrorMaker https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[15:13:33] <akosiaris>	 yeah I 'll create a task. But in other news. :tada:
[15:14:12] <akosiaris>	 why is there 💯 but not 9000 ?
[15:14:14] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad average message produce rate in last 30m on icinga1001 is OK: (C)0 le (W)100 le 155.1 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[15:15:01] <akosiaris>	 I 'll file a task
[15:15:26] <awight>	 _joe_: FYI, I can't find where we disable garbage collection.  https://codesearch.wmflabs.org/search/?q=enable_gc&i=nope&files=&repos=
[15:16:22] <wikibugs>	 (03CR) 10Bstorm: "> Patch Set 1:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:17:28] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "Looks great! I'd noticed something a bit off once we were in tools (looking at the fourohfour tool), and this is definitely it!" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:17:30] <bblack>	 a task for an over-9000 emjoi?
[15:17:33] <bblack>	 *emoji
[15:17:41] <akosiaris>	 I was waiting for someone to ask that
[15:17:46] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: Check if a GPU fits in any of the remaining stat or notebook hosts - https://phabricator.wikimedia.org/T220698 (10elukey) @RobH I guess that the only way forward would be to order a new host like stat1005? If so we could use part of the GPU budget for...
[15:17:48] <akosiaris>	 😈
[15:17:55] <akosiaris>	 niah, for restrouter
[15:17:55] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "> Patch Set 1: Code-Review+1" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:20:30] <awight>	 Parsoid code does disable the GC: https://codesearch.wmflabs.org/search/?q=gc_disable&i=nope&files=&repos=
[15:21:00] <bblack>	 !log cr[23]-esams: ns2 authdns static routing: route to both of dns300[12] w/ ECMP (was just dns3001)
[15:21:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:00] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "> Patch Set 1:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:22:20] <akosiaris>	 _joe_: https://phabricator.wikimedia.org/T241068
[15:25:19] <wikibugs>	 10Operations, 10serviceops: kubestagetcd1003 alerts daily via email to root@ for 'unexpected non snapshot file' - https://phabricator.wikimedia.org/T240932 (10akosiaris) 05Open→03Declined that host is to be removed pretty soon. The staging and codfw clusters have been migrated to etcd3 and different set of...
[15:25:29] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "> Patch Set 1:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:28:07] <wikibugs>	 (03PS1) 10Muehlenhoff: Track Kerberos principals in data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418)
[15:29:04] <James_F>	 (Back.)
[15:29:12] <wikibugs>	 (03PS3) 10Ottomata: Change eventgate-logging-external TLS port to 4392 [deployment-charts] - 10https://gerrit.wikimedia.org/r/558115
[15:29:52] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Change eventgate-logging-external TLS port to 4392 [deployment-charts] - 10https://gerrit.wikimedia.org/r/558115 (owner: 10Ottomata)
[15:30:18] <wikibugs>	 (03PS2) 10Ottomata: Change eventgate-logging-external TLS port to 4392 [puppet] - 10https://gerrit.wikimedia.org/r/558117
[15:31:08] <James_F>	 OK for me to re-roll the train to wikitech? andrewbogott?
[15:31:44] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: Migrate archives of the OKFN-hosted Open-GLAM mailing list to Wikimedia's mailman - https://phabricator.wikimedia.org/T240929 (10SandraF_WMF) Update: I @SandraF_WMF have access to the mbox file and can send it to whomever can do the archives import. Please ping me when...
[15:31:47] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[15:31:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:55] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] Preserve tool name and path info in k8s ingress rewrite [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/558726 (https://phabricator.wikimedia.org/T241008) (owner: 10BryanDavis)
[15:32:54] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10serviceops: cergen should output unencrypted key file for use with envoyproxy kubernetes sidecars - https://phabricator.wikimedia.org/T240990 (10Ottomata) 05Open→03Declined > Instead of modifying cergen to always output an unencrypted key file, could we...
[15:34:45] <logmsgbot>	 !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[15:34:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:22] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventgate-logging-external_43192: Servers kubernetes2004.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2003.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:39:42] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventgate-logging-external_43192: Servers kubernetes2002.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2005.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:40:29] <Amir1>	 I'm going to mess with mwdebug1002
[15:40:40] <wikibugs>	 (03PS2) 10Jhedden: add forward and reverse for cloudceph.svc.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/558707 (https://phabricator.wikimedia.org/T240715)
[15:40:57] <ottomata>	 ahhh
[15:41:00] <jynus>	 not touching the proxies as they don't have real traffic
[15:41:04] <ottomata>	 i downtimed stuff but apparently not the right ones
[15:41:07] <wikibugs>	 (03PS1) 10Bstorm: Add changelog message for the last change [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105
[15:41:11] <ottomata>	 i'm changing the port for that eventgate service...
[15:41:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add changelog message for the last change [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105 (owner: 10Bstorm)
[15:42:13] <logmsgbot>	 !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
[15:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:08] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Change eventgate-logging-external TLS port to 4392 [puppet] - 10https://gerrit.wikimedia.org/r/558117 (owner: 10Ottomata)
[15:43:33] <Amir1>	 mwdebug1002 is clean now
[15:45:00] <wikibugs>	 (03PS8) 10Alexandros Kosiaris: Switch eqiad calico controller to the new etcd cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/558473 (https://phabricator.wikimedia.org/T239835)
[15:45:02] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Deduplicate cluster-helmfile.sh [deployment-charts] - 10https://gerrit.wikimedia.org/r/559106
[15:45:04] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: cluster-helmfile: Add a simple sleep 1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/559107
[15:47:07] <wikibugs>	 (03PS2) 10Bstorm: Add changelog message for the last change [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105
[15:47:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add changelog message for the last change [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105 (owner: 10Bstorm)
[15:49:36] <andrewbogott>	 James_F: I'm afk but I trust your judgment :)
[15:52:20] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.50:4392]) https://wikitech.wikimedia.org/wiki/PyBal
[15:53:17] <James_F>	 OK, let's try this out.
[15:53:34] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Track Kerberos principals in data.yaml (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418) (owner: 10Muehlenhoff)
[15:53:50] <wikibugs>	 (03CR) 10Bstorm: "Because we just need to get the last change out, I am going to force this merge, however, we have to fix our build process.  I don't think" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105 (owner: 10Bstorm)
[15:53:58] <wikibugs>	 (03CR) 10Bstorm: [V: 03+2 C: 03+2] Add changelog message for the last change [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/559105 (owner: 10Bstorm)
[15:54:07] <wikibugs>	 (03PS1) 10Jhedden: lvs ceph: add cloudceph service and cluster [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715)
[15:54:57] <wikibugs>	 10Operations, 10serviceops: kubestagetcd1003 alerts daily via email to root@ for 'unexpected non snapshot file' - https://phabricator.wikimedia.org/T240932 (10elukey) @akosiaris I'd just remove `0000000000001131-0000000016c32fdb.snap.broken` then to avoid daily cronspam if possible, otherwise no problem :)
[15:55:00] <addshore>	 !log addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=1 --from-id=14546856 # T237984 (will run to completion)
[15:55:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:06] <stashbot>	 T237984: Some property labels are not displayed on Item pages - https://phabricator.wikimedia.org/T237984
[15:55:16] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.50:4392]) https://wikitech.wikimedia.org/wiki/PyBal
[15:56:05] <wikibugs>	 (03PS1) 10Jforrester: Revert "train: Rolling Wikitech back to 1.35.0-wmf.10" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559112 (https://phabricator.wikimedia.org/T233859)
[15:56:10] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] Revert "train: Rolling Wikitech back to 1.35.0-wmf.10" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559112 (https://phabricator.wikimedia.org/T233859) (owner: 10Jforrester)
[15:57:11] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "train: Rolling Wikitech back to 1.35.0-wmf.10" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559112 (https://phabricator.wikimedia.org/T233859) (owner: 10Jforrester)
[15:57:47] <wikibugs>	 (03PS2) 10Muehlenhoff: Track Kerberos principals in data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418)
[15:58:03] <wikibugs>	 (03CR) 10Muehlenhoff: Track Kerberos principals in data.yaml (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559104 (https://phabricator.wikimedia.org/T235418) (owner: 10Muehlenhoff)
[15:58:45] <logmsgbot>	 !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: train: Rolling Wikitech forward to 1.35.0-wmf.11 T233859 T241059
[15:58:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:54] <stashbot>	 T233859: 1.35.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T233859
[15:58:54] <stashbot>	 T241059: Cannot access the database: Unknown error (10.64.32.72) - https://phabricator.wikimedia.org/T241059
[15:59:56] <wikibugs>	 10Operations, 10Traffic: HTTPS/Browser Recommendations page on Wikitech is outdated - https://phabricator.wikimedia.org/T240813 (10BBlack) The wording issues here are actually a bit tricky.  We've done several TLS standards upgrades over time, and there are still a few to go:  Done sometime in the past: * HTTP...
[16:00:00] <wikibugs>	 (03PS6) 10Alexandros Kosiaris: k8s: Migrate eqiad to the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/558355 (https://phabricator.wikimedia.org/T239835)
[16:00:04] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Remove roles from old k8s etcd hosts [puppet] - 10https://gerrit.wikimedia.org/r/559113 (https://phabricator.wikimedia.org/T239835)
[16:00:34] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[16:00:37] <James_F>	 So far, all seems quiet.
[16:01:06] <wikibugs>	 (03PS1) 10Jbond: puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114
[16:01:29] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] puppet-merge: close LOCKFD before checking ownership [puppet] - 10https://gerrit.wikimedia.org/r/558666 (owner: 10CDanis)
[16:02:06] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Remove roles from old k8s etcd hosts [puppet] - 10https://gerrit.wikimedia.org/r/559113 (https://phabricator.wikimedia.org/T239835)
[16:02:08] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: k8s: Migrate eqiad to the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/558355 (https://phabricator.wikimedia.org/T239835)
[16:02:31] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] common::mcrouter.yaml switch mcrouter proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559033 (https://phabricator.wikimedia.org/T239054) (owner: 10Effie Mouzeli)
[16:02:49] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet-merge: close LOCKFD before checking ownership [puppet] - 10https://gerrit.wikimedia.org/r/558666 (owner: 10CDanis)
[16:05:00] <wikibugs>	 (03PS2) 10Jbond: puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114
[16:06:43] <wikibugs>	 (03CR) 10Jhedden: "Looking for review and guidance to setup a new LVS service for the WMCS ceph cluster." [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden)
[16:07:12] <wikibugs>	 (03PS1) 10Ottomata: Enable TLS for eventgate-main and eventgate-analytics [deployment-charts] - 10https://gerrit.wikimedia.org/r/559118 (https://phabricator.wikimedia.org/T241073)
[16:07:38] <wikibugs>	 (03CR) 10CDanis: puppet-merge: add ability to continue with changes to labs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:08:04] <wikibugs>	 10Operations, 10netops, 10cloud-services-team (Kanban): WMCS: cleanup network allocations - https://phabricator.wikimedia.org/T240670 (10ayounsi) For sure! Ping me if you have any issue.
[16:08:20] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[16:10:08] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] scap:dsh.yaml switch scap proxies so to reimage them [puppet] - 10https://gerrit.wikimedia.org/r/559021 (https://phabricator.wikimedia.org/T239054) (owner: 10Effie Mouzeli)
[16:10:28] <effie>	 jouncebot: next
[16:10:28] <jouncebot>	 In 2 hour(s) and 49 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1900)
[16:11:00] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:12:13] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Enable TLS for eventgate-main and eventgate-analytics [deployment-charts] - 10https://gerrit.wikimedia.org/r/559118 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata)
[16:12:49] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Remove roles from old k8s etcd hosts [puppet] - 10https://gerrit.wikimedia.org/r/559113 (https://phabricator.wikimedia.org/T239835) (owner: 10Alexandros Kosiaris)
[16:13:57] <wikibugs>	 (03CR) 10Jbond: "Thanks updated" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:16:50] <wikibugs>	 (03PS3) 10Jbond: puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114
[16:17:35] <jbond42>	 ^^ cdanis updated
[16:17:36] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:18:26] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
[16:18:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:57] <wikibugs>	 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): (No Need By Date Provided) rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T235685 (10Bstorm)
[16:19:29] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/weight=10; selector: name=kubernetes2002.codfw.wmnet,service=echostore
[16:19:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:55] <wikibugs>	 (03CR) 10CDanis: puppet-merge: add ability to continue with changes to labs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:20:32] <James_F>	 effie: Can you ping once you're done with the scap proxies?
[16:21:18] <icinga-wm>	 PROBLEM - Host kubestagetcd1002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:21:41] <akosiaris>	 !log remove kubestagetcd100{1,2,3} from the fleet T239835
[16:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:48] <stashbot>	 T239835: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835
[16:21:54] <logmsgbot>	 !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
[16:21:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:10] <icinga-wm>	 PROBLEM - Host kubestagetcd1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:22:25] <wikibugs>	 10Operations, 10serviceops: kubestagetcd1003 alerts daily via email to root@ for 'unexpected non snapshot file' - https://phabricator.wikimedia.org/T240932 (10akosiaris) vm removed from the fleet. this should be a problem no more :-)
[16:22:30] <icinga-wm>	 PROBLEM - Host kubestagetcd1003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:23:37] <wikibugs>	 (03PS4) 10Jbond: puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114
[16:23:50] <wikibugs>	 (03CR) 10Jbond: puppet-merge: add ability to continue with changes to labs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:25:23] <logmsgbot>	 !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
[16:25:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:33] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:25:40] <cdanis>	 thanks jbond42 lgtm
[16:26:03] <jbond42>	 great thanks cdanis 
[16:26:27] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet-merge: add ability to continue with changes to labs [puppet] - 10https://gerrit.wikimedia.org/r/559114 (owner: 10Jbond)
[16:29:29] <logmsgbot>	 !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
[16:29:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:38] <wikibugs>	 (03PS1) 10Ayounsi: Fastnetmon: add thresholds overrides [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789)
[16:30:27] <logmsgbot>	 !log akosiaris@cumin1001 conftool action : set/weight=10; selector: dc=codfw,service=echostore
[16:30:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:29] <wikibugs>	 10Operations, 10Kubernetes: Migrate etcd cluster for Kubernetes staging cluster to Stretch/Buster - https://phabricator.wikimedia.org/T224568 (10akosiaris) done,done and done
[16:31:40] <wikibugs>	 (03PS1) 10Jbond: puppet-merge: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/559126
[16:31:44] <wikibugs>	 10Operations, 10Kubernetes: Migrate etcd cluster for Kubernetes staging cluster to Stretch/Buster - https://phabricator.wikimedia.org/T224568 (10akosiaris)
[16:32:42] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:33:48] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:34:14] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet-merge: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/559126 (owner: 10Jbond)
[16:34:20] <wikibugs>	 10Operations, 10Machine vision, 10Product-Infrastructure-Team-Backlog, 10Structured-Data-Backlog, and 6 others: Some jobs are not being processed / are processed slowly - https://phabricator.wikimedia.org/T240518 (10Pchelolo) The explanation of what's happened is in T241072#5751597. Shorly, Kafka-based que...
[16:34:40] <wikibugs>	 (03CR) 10CRusnov: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/554526 (owner: 10Jbond)
[16:36:10] <wikibugs>	 10Operations, 10DC-Ops, 10decommission, 10Discovery-Search (Current work): decommission elastic10[18-31].eqiad.wmnet - https://phabricator.wikimedia.org/T239821 (10TJones) 05Open→03Resolved
[16:37:23] <wikibugs>	 10Operations, 10netops, 10cloud-services-team (Kanban): WMCS: cleanup network allocations - https://phabricator.wikimedia.org/T240670 (10aborrero) 05Open→03Resolved Done, I deleted all the objects referred in the task description from Netbox.
[16:44:47] <wikibugs>	 10Operations, 10Cloud-VPS, 10Toolforge, 10cloud-services-team (Kanban): Toolforge's static webserver broken by Puppet changes and stale nginx packages - https://phabricator.wikimedia.org/T175885 (10Bstorm) 05Open→03Declined
[16:46:43] <wikibugs>	 (03PS2) 10Ayounsi: Fastnetmon: add thresholds overrides [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789)
[16:47:11] <wikibugs>	 (03PS1) 10Jbond: puppet-merge: typo [puppet] - 10https://gerrit.wikimedia.org/r/559131
[16:47:39] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[16:48:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: typo [puppet] - 10https://gerrit.wikimedia.org/r/559131 (owner: 10Jbond)
[16:49:32] <jbond42>	 ^^ I'm looking at the unmerged change stuff
[16:49:41] <wikibugs>	 (03PS2) 10Jbond: puppet-merge: typo [puppet] - 10https://gerrit.wikimedia.org/r/559131
[16:49:48] <wikibugs>	 (03PS1) 10Tchanders: Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132
[16:50:54] <wikibugs>	 (03PS2) 10Tchanders: Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300)
[16:50:56] <wikibugs>	 (03PS3) 10Ayounsi: Fastnetmon: add thresholds overrides [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789)
[16:51:14] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet-merge: typo [puppet] - 10https://gerrit.wikimedia.org/r/559131 (owner: 10Jbond)
[16:52:45] <wikibugs>	 (03CR) 10Ayounsi: "See the end result there: https://puppet-compiler.wmflabs.org/compiler1002/20062/netflow1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/559125 (https://phabricator.wikimedia.org/T240789) (owner: 10Ayounsi)
[16:54:20] <wikibugs>	 (03PS1) 10Jbond: puppet-merge: dont quote pass through arguments [puppet] - 10https://gerrit.wikimedia.org/r/559135
[16:55:39] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet-merge: dont quote pass through arguments [puppet] - 10https://gerrit.wikimedia.org/r/559135 (owner: 10Jbond)
[16:56:05] <wikibugs>	 (03PS1) 10CDanis: dbctl: use hostsByName from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559136 (https://phabricator.wikimedia.org/T240991)
[16:57:05] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[16:59:08] <wikibugs>	 (03PS1) 10Jbond: test puppet-merge [labs/private] - 10https://gerrit.wikimedia.org/r/559137
[16:59:49] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] "It seems correct in line of principle, but... deploying everywhere out of the box? Bold!" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559136 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[16:59:51] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] test puppet-merge [labs/private] - 10https://gerrit.wikimedia.org/r/559137 (owner: 10Jbond)
[17:00:07] <cdanis>	 _joe_: tested on mwdebug2001 already, and scap does canarying ;)
[17:00:38] <wikibugs>	 (03CR) 10Anomie: filtered_tables.txt: Remove dropped columns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/558851 (https://phabricator.wikimedia.org/T233135) (owner: 10Marostegui)
[17:00:53] <_joe_>	 for some value of it
[17:01:01] <_joe_>	 but ok, I did give you a +1
[17:01:03] <_joe_>	 :P
[17:01:35] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:01:41] <_joe_>	 cdanis: https://i.imgur.com/q9nz8cJ.mp4
[17:01:46] <wikibugs>	 10Puppet, 10User-jbond: puppet-=merge not working on the private repo - https://phabricator.wikimedia.org/T241075 (10jbond)
[17:01:58] <cdanis>	 _joe_: hey that's how I did externalLoads and all was fine ;)
[17:02:09] <James_F>	 effie, _joe_: Is it OK to scap? I guess you were taking down the scap proxies and rebuilding them?
[17:02:33] <_joe_>	 James_F: ok with me, but wait for effie to confirm
[17:02:48] <_joe_>	 for a couple minutes, I think she's amply done
[17:02:58] * James_F nods.
[17:03:03] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:03:04] <James_F>	 No great rush.
[17:04:46] <wikibugs>	 (03CR) 10Marostegui: filtered_tables.txt: Remove dropped columns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/558851 (https://phabricator.wikimedia.org/T233135) (owner: 10Marostegui)
[17:08:16] <wikibugs>	 (03PS2) 10Marostegui: filtered_tables.txt: Remove dropped columns [puppet] - 10https://gerrit.wikimedia.org/r/558851 (https://phabricator.wikimedia.org/T233135)
[17:11:22] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: 10Arlolra)
[17:17:30] <James_F>	 _joe_: … it's been 15 minutes, can I just go?
[17:17:58] <_joe_>	 James_F: i guess just see the list and ping them?
[17:17:59] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@c9d8ef1]: Parsoid-PHP: mirror 100% of all traffic T229015
[17:18:12] <_joe_>	 lemme try
[17:18:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:14] <stashbot>	 T229015: Tracking: Direct live production traffic at Parsoid/PHP - https://phabricator.wikimedia.org/T229015
[17:18:20] <James_F>	 Sure.
[17:18:53] <wikibugs>	 (03PS3) 10Tchanders: Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300)
[17:19:53] <_joe_>	 James_F: go on
[17:20:11] <logmsgbot>	 !log jforrester@deploy1001 Started scap: Full scap for extra AHT i18n
[17:20:12] <James_F>	 Thanks!
[17:20:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:29:44] <wikibugs>	 10Operations, 10Traffic: /sec-warning page: please add an HTML comment that is more easily visible to API and transport-level inspection/debugging - https://phabricator.wikimedia.org/T240794 (10DavidBrooks) Ah, that makes it more complex. Again, I guess there's no evidence that anyone else is encountering the...
[17:33:59] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@c9d8ef1]: Parsoid-PHP: mirror 100% of all traffic T229015 (duration: 16m 00s)
[17:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:05] <stashbot>	 T229015: Tracking: Direct live production traffic at Parsoid/PHP - https://phabricator.wikimedia.org/T229015
[17:34:20] <logmsgbot>	 !log jforrester@deploy1001 Finished scap: Full scap for extra AHT i18n (duration: 14m 09s)
[17:34:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:35:49] <wikibugs>	 (03CR) 10Ammarpad: [C: 03+1] "Looks good." [puppet] - 10https://gerrit.wikimedia.org/r/559073 (https://phabricator.wikimedia.org/T240771) (owner: 10IAmNetx)
[17:36:19] <James_F>	 All done from my end.
[17:36:19] <wikibugs>	 (03Abandoned) 10CDanis: dbctl: add 'PLACEHOLDER' as possible section master [software/conftool] - 10https://gerrit.wikimedia.org/r/556725 (owner: 10CDanis)
[17:37:31] <wikibugs>	 (03PS3) 10BryanDavis: toolforge: Add CORS header to docker registry [puppet] - 10https://gerrit.wikimedia.org/r/558220 (https://phabricator.wikimedia.org/T232135)
[17:37:55] <wikibugs>	 (03CR) 10BryanDavis: "fixed formatting nits" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/558220 (https://phabricator.wikimedia.org/T232135) (owner: 10BryanDavis)
[17:39:28] <Tchanders>	 James_F Thanks very much from AHT
[17:52:05] <jynus>	 !log reload dbproxy1017 dbproxy1021
[17:52:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:47] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1017 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[17:54:03] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1021 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[17:55:04] <cdanis>	 jouncebot: next
[17:55:04] <jouncebot>	 In 1 hour(s) and 4 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1900)
[17:56:34] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] dbctl: use hostsByName from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559136 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[17:58:58] <logmsgbot>	 !log cdanis@deploy1001 Synchronized wmf-config/CommonSettings.php: use hostsByName from etcd 96df9c004 T229676 T240991 (duration: 01m 01s)
[17:59:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:05] <stashbot>	 T240991: Does an eqiad Mediawiki need to have codfw DB servers in its hostsByName? and vice versa - https://phabricator.wikimedia.org/T240991
[17:59:05] <stashbot>	 T229676: #dbctl: generate hostsByName section as well - https://phabricator.wikimedia.org/T229676
[18:00:35] <logmsgbot>	 !log cdanis@deploy1001 Synchronized wmf-config/etcd.php: use hostsByName from etcd 96df9c004 T229676 T240991 (duration: 01m 01s)
[18:00:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:30] <wikibugs>	 (03PS1) 10CDanis: remove dbctl-obsoleted hostsByName entries 🔧 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559154 (https://phabricator.wikimedia.org/T240991)
[18:07:09] <wikibugs>	 (03PS1) 10Ayounsi: Prepare Puppet to apply netinsights to all sites [puppet] - 10https://gerrit.wikimedia.org/r/559155
[18:10:14] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] remove dbctl-obsoleted hostsByName entries 🔧 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559154 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[18:10:25] <wikibugs>	 10Operations: Add security-team@wikimedia.org as recipient of abuse@wikimedia.org emails - https://phabricator.wikimedia.org/T241078 (10Aklapper) Ah, thanks! Adding the #Operations tag so this task ends up in the SRE basket.
[18:10:57] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] remove dbctl-obsoleted hostsByName entries 🔧 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559154 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[18:12:02] <wikibugs>	 (03Merged) 10jenkins-bot: remove dbctl-obsoleted hostsByName entries 🔧 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559154 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[18:13:58] <logmsgbot>	 !log cdanis@deploy1001 Synchronized wmf-config/db-codfw.php: remove dbctl-obsoleted hostsByName entries 🔧 7d20965f5 T240991 T229676 (duration: 01m 01s)
[18:14:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:05] <stashbot>	 T240991: Does an eqiad Mediawiki need to have codfw DB servers in its hostsByName? and vice versa - https://phabricator.wikimedia.org/T240991
[18:14:05] <stashbot>	 T229676: #dbctl: generate hostsByName section as well - https://phabricator.wikimedia.org/T229676
[18:14:21] <wikibugs>	 (03PS1) 10MSantos: Reduce osmosis maxInterval in half [puppet] - 10https://gerrit.wikimedia.org/r/559158 (https://phabricator.wikimedia.org/T239728)
[18:15:53] <logmsgbot>	 !log cdanis@deploy1001 Synchronized wmf-config/db-eqiad.php: remove dbctl-obsoleted hostsByName entries 🔧 7d20965f5 T240991 T229676 (duration: 01m 01s)
[18:15:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:38] <cdanis>	 🎉 
[18:17:50] <rlazarus>	 🎊
[18:18:50] <jynus>	 !log disable puppet on backup1001, dbprov1001 to test special backup recovery
[18:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:02] <cdanis>	 I am all done messing with production :D
[18:19:31] <wikibugs>	 (03PS1) 10Herron: dns: add forward/reverse ipv4 records for netflow4001.ulsfo.wmnet VM [dns] - 10https://gerrit.wikimedia.org/r/559160
[18:19:38] <cdanis>	 (for now, ofc ;)
[18:21:22] <wikibugs>	 (03CR) 10Herron: [C: 03+1] Prepare Puppet to apply netinsights to all sites [puppet] - 10https://gerrit.wikimedia.org/r/559155 (owner: 10Ayounsi)
[18:21:42] <wikibugs>	 10Operations, 10Documentation: Wikitech: update Bacula article - https://phabricator.wikimedia.org/T100954 (10jcrespo) a:03jcrespo
[18:22:07] <wikibugs>	 10Operations, 10Documentation: Wikitech: update Bacula article - https://phabricator.wikimedia.org/T100954 (10jcrespo)
[18:22:12] <wikibugs>	 10Operations, 10DBA, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo)
[18:22:57] <wikibugs>	 10Operations, 10DBA, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo)
[18:23:11] <wikibugs>	 10Operations, 10Goal: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (10jcrespo)
[18:26:21] <wikibugs>	 (03PS1) 10Ladsgroup: Clean up unused configs in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154)
[18:26:24] <wikibugs>	 (03PS1) 10Ladsgroup: Clean up unused config in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154)
[18:27:05] <wikibugs>	 (03CR) 10Mathew.onipe: [C: 03+1] Reduce osmosis maxInterval in half [puppet] - 10https://gerrit.wikimedia.org/r/559158 (https://phabricator.wikimedia.org/T239728) (owner: 10MSantos)
[18:31:01] <wikibugs>	 (03CR) 10Herron: [C: 03+2] dns: add forward/reverse ipv4 records for netflow4001.ulsfo.wmnet VM [dns] - 10https://gerrit.wikimedia.org/r/559160 (owner: 10Herron)
[18:49:03] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: First version of the debmonitor client [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559165
[18:50:54] <wikibugs>	 (03PS4) 10DCausse: [cirrus] move similarity settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558576
[18:51:10] <wikibugs>	 (03CR) 10Jforrester: "So glad to see this gone. Thank you!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559154 (https://phabricator.wikimedia.org/T240991) (owner: 10CDanis)
[18:51:19] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@20e4c16]: Update glent to spark 2.4.4
[18:51:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:31] <cdanis>	 James_F: <3
[18:51:41] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [cirrus] move similarity settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558576 (owner: 10DCausse)
[18:51:49] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@20e4c16]: Update glent to spark 2.4.4 (duration: 00m 29s)
[18:51:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:55] <wikibugs>	 (03CR) 10DCausse: [cirrus] add elastic mapping for ores drafttopics (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558577 (https://phabricator.wikimedia.org/T240550) (owner: 10DCausse)
[18:52:01] <wikibugs>	 (03Abandoned) 10DCausse: [cirrus] add elastic mapping for ores drafttopics [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558577 (https://phabricator.wikimedia.org/T240550) (owner: 10DCausse)
[18:52:19] <cdanis>	 James_F: I took great joy in merging and deploying it, not least of all because it's IMO pretty hilarious to delete 300 lines that all end with "# Do not delete or comment out" 
[18:56:00] <logmsgbot>	 !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
[18:56:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:09] <wikibugs>	 (03PS1) 10Herron: ganeti: ensure package ganeti-instance-debootstrap installed [puppet] - 10https://gerrit.wikimedia.org/r/559166 (https://phabricator.wikimedia.org/T226444)
[18:57:07] <wikibugs>	 (03PS1) 10Ottomata: Switch eventgate-analytics LVS to use TLS port 4192 [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073)
[18:57:09] <wikibugs>	 (03PS1) 10Ottomata: Switch eventgate-main LVS to use TLS port 4292 [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073)
[18:58:04] <logmsgbot>	 !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
[18:58:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:28] <wikibugs>	 (03CR) 10Anomie: [C: 03+1] "Seems correct to me. The fields being removed here match those removed from production." [puppet] - 10https://gerrit.wikimedia.org/r/558851 (https://phabricator.wikimedia.org/T233135) (owner: 10Marostegui)
[18:58:47] <wikibugs>	 (03CR) 10Dmaza: [C: 03+1] Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders)
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T1900).
[19:00:05] <jouncebot>	 Tchanders, cscott, and arlolra: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:01:33] <wikibugs>	 (03PS2) 10Ottomata: Switch eventgate-main LVS to use TLS port 4292 [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073)
[19:02:31] <wikibugs>	 10Operations, 10ops-codfw: codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (10RobH)
[19:02:49] <wikibugs>	 10Operations, 10ops-codfw: codfw: rack/setup/install puppetmaster2003.codfw.wmnet - https://phabricator.wikimedia.org/T239732 (10RobH)
[19:03:54] <Niharika>	 I can SWAT. 
[19:04:03] <wikibugs>	 (03PS4) 10Niharika29: Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders)
[19:04:12] <Tchanders>	 Niharika: Great, thanks
[19:04:14] <wikibugs>	 (03CR) 10Niharika29: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders)
[19:05:13] <wikibugs>	 (03Merged) 10jenkins-bot: Enable banner on Special:Block for selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559132 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders)
[19:06:19] <Niharika>	 Tchanders: Patch on mwdebug1001. 
[19:06:29] <Tchanders>	 OK, taking a look
[19:06:46] <Niharika>	 cscott: arlolra: Either of you around?
[19:07:44] <wikibugs>	 (03PS1) 10Herron: ganeti: allow ssh between cluster regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559172
[19:07:58] <cscott>	 yup, i'm around
[19:08:06] <cscott>	 pretty sure arlolra is too, somewhere :)
[19:08:13] <Tchanders>	 Niharika: Do you know a wiki without partial blocks enabled off the top of your head (other than enwiki and commons)?
[19:08:56] <wikibugs>	 (03PS2) 10Herron: ganeti: allow ssh between cluster regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559172
[19:08:58] <Niharika>	 Tchanders: wikidata?
[19:09:00] <arlolra>	 somewhere
[19:09:14] <Niharika>	 Tchanders: el.wikipedia.org too probably. 
[19:12:27] <Niharika>	 Tchanders: All good?
[19:13:40] <wikibugs>	 (03PS1) 10Herron: install_server: add dhcp/netboot entries for netflow4001 [puppet] - 10https://gerrit.wikimedia.org/r/559174
[19:13:54] <Tchanders>	 Niharika: elwiki doesn't have partial blocks, but is also in group 2, so no banner yet... Wikidata is showing the banner, but the CSS is wrong. Trying to figure out if it's a local caching problem
[19:15:11] <Tchanders>	 Niharika: Also if you can think of any group 0 or 1 wikis without partial blocks, I could cross-check with one of those
[19:15:29] <Niharika>	 Tchanders: en.wikinews.org should be one.
[19:15:54] <wikibugs>	 (03CR) 10Herron: [C: 03+2] install_server: add dhcp/netboot entries for netflow4001 [puppet] - 10https://gerrit.wikimedia.org/r/559174 (owner: 10Herron)
[19:17:06] <Niharika>	 Tchanders: Appears fine on enwikinews but not on https://en.wikiversity.org/wiki/Special:Block
[19:17:22] <Tchanders>	 Niharika: The CSS is wrong - we need to remove a class
[19:18:05] <Niharika>	 Tchanders: Okay. I will revert this config patch for now and we can re-attempt the swat later today. 
[19:18:14] <Tchanders>	 Niharika: OK
[19:18:30] <wikibugs>	 (03PS1) 10Niharika29: Revert "Enable banner on Special:Block for selected wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559175
[19:18:37] <wikibugs>	 (03CR) 10Niharika29: [C: 03+2] Revert "Enable banner on Special:Block for selected wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559175 (owner: 10Niharika29)
[19:19:37] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Enable banner on Special:Block for selected wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559175 (owner: 10Niharika29)
[19:20:05] <wikibugs>	 (03PS4) 10Niharika29: Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: 10Arlolra)
[19:20:53] <wikibugs>	 (03CR) 10Niharika29: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: 10Arlolra)
[19:21:53] <wikibugs>	 (03Merged) 10jenkins-bot: Bump Parsoid/PHP cluster memory_limit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558737 (https://phabricator.wikimedia.org/T239806) (owner: 10Arlolra)
[19:22:17] <Niharika>	 cscott: arlolra: Ready to test on mwdebug1001. 
[19:22:56] <cscott>	 ok.  probably hard to tell the difference, but i can check that VE/Parsoid works at least.
[19:25:29] <cscott>	 yeah, but it checks that we didn't totally break the cluster config w/ a stupid typo
[19:25:42] <cscott>	 Niharika: mwdebug1001 seems to work fine.
[19:25:44] <subbu>	 true.
[19:26:14] <Niharika>	 Okay, will sync.
[19:27:59] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Bump Parsoid/PHP cluster memory_limit - T239806, T236833 (duration: 01m 01s)
[19:28:02] <Niharika>	 cscott: arlolra: Deployed. 
[19:28:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:06] <stashbot>	 T236833: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833
[19:28:07] <stashbot>	 T239806: Parsoid/PHP errors - https://phabricator.wikimedia.org/T239806
[19:29:51] <wikibugs>	 (03PS1) 10Volans: Images: add support for names with slashes [software/debmonitor] - 10https://gerrit.wikimedia.org/r/559181 (https://phabricator.wikimedia.org/T237978)
[19:30:32] <arlolra>	 Niharika: thanks
[19:32:19] <wikibugs>	 (03PS2) 10Volans: Images: add support for names with slashes [software/debmonitor] - 10https://gerrit.wikimedia.org/r/559181 (https://phabricator.wikimedia.org/T237978)
[19:33:02] <cscott>	 OOMs seem to have decreased, so the swat seems to have been effective.
[19:33:05] <cscott>	 Niharika: thanks!
[19:42:59] <wikibugs>	 10Operations, 10LDAP-Access-Requests: Request for Superset & Turnilo Access - https://phabricator.wikimedia.org/T240988 (10Neil_P._Quinn_WMF) Glad you're exploring these new tools, @ifried!   For this, you need to be in the `wmf` LDAP group, so I'm moving this to the #ldap-access-requests project. The SRE team...
[19:43:53] <wikibugs>	 (03CR) 10Anomie: [C: 03+1] "Go for it when you're ready to deploy." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558769 (https://phabricator.wikimedia.org/T232613) (owner: 10Krinkle)
[19:47:45] <logmsgbot>	 !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@4b174fd]: glent: Explicitly pass previous partition to m2run
[19:47:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:55] <logmsgbot>	 !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@4b174fd]: glent: Explicitly pass previous partition to m2run (duration: 00m 10s)
[19:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:51:53] * Niharika continues swat
[19:53:01] <wikibugs>	 (03PS1) 10Niharika29: Revert "Revert "Enable banner on Special:Block for selected wikis"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559185
[19:57:13] <wikibugs>	 (03PS5) 10Jhedden: openstack: add ceph rbd support for nova-compute [puppet] - 10https://gerrit.wikimedia.org/r/557086 (https://phabricator.wikimedia.org/T239918)
[19:59:05] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/WikimediaMessages/includes/WikimediaMessagesHooks.php: Remove messagebox class from partial block banner - T240300 (duration: 01m 02s)
[19:59:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:10] <stashbot>	 T240300: Introduce a temporary banner on Special:Block to inform users about upcoming partial blocks deploy - https://phabricator.wikimedia.org/T240300
[19:59:20] <wikibugs>	 (03PS2) 10Niharika29: Revert "Revert "Enable banner on Special:Block for selected wikis"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559185
[19:59:22] <wikibugs>	 (03CR) 10Tchanders: [C: 03+1] Revert "Revert "Enable banner on Special:Block for selected wikis"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559185 (owner: 10Niharika29)
[19:59:27] <wikibugs>	 (03CR) 10Niharika29: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559185 (owner: 10Niharika29)
[19:59:47] <wikibugs>	 (03CR) 10Jhedden: [C: 03+2] openstack: add ceph rbd support for nova-compute [puppet] - 10https://gerrit.wikimedia.org/r/557086 (https://phabricator.wikimedia.org/T239918) (owner: 10Jhedden)
[20:00:47] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Revert "Enable banner on Special:Block for selected wikis"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559185 (owner: 10Niharika29)
[20:01:05] <Niharika>	 Tchanders: Ready to test on mwdebug1001. 
[20:01:17] <Tchanders>	 Niharika: On it
[20:01:30] <wikibugs>	 (03PS2) 10Krinkle: CommonSettings.php: Remove the disabled "temporary" code for T232613 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558769 (https://phabricator.wikimedia.org/T232613)
[20:02:43] <Tchanders>	 Niharika: Looks good
[20:02:54] <Niharika>	 Awesome. 
[20:04:41] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable banner on Special:Block for selected wikis - T240300 (duration: 01m 01s)
[20:04:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:47] <stashbot>	 T240300: Introduce a temporary banner on Special:Block to inform users about upcoming partial blocks deploy - https://phabricator.wikimedia.org/T240300
[20:04:57] <Niharika>	 Tchanders: All done. 
[20:05:42] <Tchanders>	 Niharika: Great, thank you
[20:36:47] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [restbase/deploy@6e24349]: Disable all parsoid-php vs parsoid-js special cases T229015
[20:36:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:36:54] <stashbot>	 T229015: Tracking: Direct live production traffic at Parsoid/PHP - https://phabricator.wikimedia.org/T229015
[20:50:44] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [restbase/deploy@6e24349]: Disable all parsoid-php vs parsoid-js special cases T229015 (duration: 13m 56s)
[20:50:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:49] <stashbot>	 T229015: Tracking: Direct live production traffic at Parsoid/PHP - https://phabricator.wikimedia.org/T229015
[20:52:49] <wikibugs>	 10Operations, 10Release-Engineering-Team, 10serviceops, 10Patch-For-Review, and 3 others: All debug hosts give (likely spurious) message: PHP Fatal error:  The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (10Krinkle) I also see the UdpSoc...
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: Your horoscope predicts another unfortunate Services – Graphoid / Parsoid / Citoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191218T2100).
[21:00:19] <logmsgbot>	 !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@7e68510]: Return low_traffic_jobs concurrency to normal after T240518
[21:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:25] <stashbot>	 T240518: Some jobs are not being processed / are processed slowly - https://phabricator.wikimedia.org/T240518
[21:00:31] <logmsgbot>	 !log halfak@deploy1001 Started deploy [ores/deploy@80b1e62]: T240725
[21:00:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:36] <stashbot>	 T240725: ORES deployment mid-Dec. 2019 - https://phabricator.wikimedia.org/T240725
[21:01:22] <logmsgbot>	 !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@7e68510]: Return low_traffic_jobs concurrency to normal after T240518 (duration: 01m 03s)
[21:01:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:35] <logmsgbot>	 !log halfak@deploy1001 Finished deploy [ores/deploy@80b1e62]: T240725 (duration: 22m 05s)
[21:22:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:42] <stashbot>	 T240725: ORES deployment mid-Dec. 2019 - https://phabricator.wikimedia.org/T240725
[21:23:07] <wikibugs>	 (03CR) 10Jhedden: [C: 03+1] cloud: update maintain-views to handle dblists with comments [puppet] - 10https://gerrit.wikimedia.org/r/555740 (https://phabricator.wikimedia.org/T239415) (owner: 10BryanDavis)
[21:25:39] <halfak>	 Success!  Looks good. 
[21:25:42] <wikibugs>	 10Operations, 10DC-Ops, 10hardware-requests: eqiad: three clouvirt-wdqs servers for WDQS testing - https://phabricator.wikimedia.org/T232654 (10bd808)
[21:26:03] <wikibugs>	 (03PS1) 10Ottomata: Use HOSTNAME_COMMAND with KAFKA_SERVICE_HOST for kafka.advertised_port [deployment-charts] - 10https://gerrit.wikimedia.org/r/559209
[21:26:48] <wikibugs>	 10Operations, 10Core Platform Team, 10TechCom, 10User-mobrovac: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825 (10Krinkle) a:03Joe
[21:26:52] <wikibugs>	 10Operations, 10Domains, 10Traffic: nameserver change for wikimedia.sk - https://phabricator.wikimedia.org/T241084 (10RobH) a:03Luky001
[21:26:53] <wikibugs>	 (03PS2) 10Ottomata: Use HOSTNAME_COMMAND with KAFKA_SERVICE_HOST for kafka.advertised_port [deployment-charts] - 10https://gerrit.wikimedia.org/r/559209
[21:27:48] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@9dd8227]: Update mobileapps to cf2bb3b
[21:27:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:53] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] "This allows kafka-dev to work in docker-desktop" [deployment-charts] - 10https://gerrit.wikimedia.org/r/559209 (owner: 10Ottomata)
[21:27:59] <wikibugs>	 (03PS3) 10Ottomata: Use HOSTNAME_COMMAND with KAFKA_SERVICE_HOST for kafka.advertised_port [deployment-charts] - 10https://gerrit.wikimedia.org/r/559209
[21:29:11] <wikibugs>	 10Operations, 10Domains, 10Traffic: nameserver change for wikimedia.sk - https://phabricator.wikimedia.org/T241084 (10RobH)
[21:33:39] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@9dd8227]: Update mobileapps to cf2bb3b (duration: 05m 51s)
[21:33:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:36] <wikibugs>	 10Operations, 10Core Platform Team, 10TechCom, 10User-mobrovac: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825 (10Joe) I think most of the issues described here have been in the meantime solved by the implementation of the [[ https://www.mediawiki.org/wiki/Code_stewardship_re...
[21:40:02] <wikibugs>	 (03PS1) 10Bstorm: toolforge-k8s: add a script to grant "observer" access to a tool [puppet] - 10https://gerrit.wikimedia.org/r/559212 (https://phabricator.wikimedia.org/T233372)
[21:42:26] <wikibugs>	 (03CR) 10Bstorm: toolforge-k8s: add a script to grant "observer" access to a tool (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559212 (https://phabricator.wikimedia.org/T233372) (owner: 10Bstorm)
[21:43:31] <wikibugs>	 (03PS2) 10Bstorm: toolforge-k8s: add a script to grant "observer" access to a tool [puppet] - 10https://gerrit.wikimedia.org/r/559212 (https://phabricator.wikimedia.org/T233372)
[22:12:29] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1123 - https://phabricator.wikimedia.org/T240534 (10Jclark-ctr) Confirmed: Service Request 1007375142 was successfully submitted
[22:19:25] <wikibugs>	 (03PS1) 10IAmNetx: Add initial configuration for ng.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771)
[22:20:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial configuration for ng.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771) (owner: 10IAmNetx)
[22:25:29] <wikibugs>	 (03PS2) 10IAmNetx: Add initial configuration for ng.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771)
[22:26:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add initial configuration for ng.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771) (owner: 10IAmNetx)
[22:29:00] <wikibugs>	 (03PS3) 10IAmNetx: Add initial configuration for ng.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559218 (https://phabricator.wikimedia.org/T240771)
[22:46:44] <wikibugs>	 10Operations, 10SRE-Access-Requests: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10bmansurov)
[22:48:45] <wikibugs>	 (03CR) 10CRusnov: [C: 03+1] "LGTM" [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553735 (owner: 10Hashar)
[22:59:37] <wikibugs>	 (03PS1) 10Krinkle: Disable wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559224 (https://phabricator.wikimedia.org/T240691)
[23:04:38] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Disable wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559224 (https://phabricator.wikimedia.org/T240691) (owner: 10Krinkle)
[23:07:28] <wikibugs>	 (03CR) 10Krinkle: [C: 03+2] CommonSettings.php: Remove the disabled "temporary" code for T232613 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558769 (https://phabricator.wikimedia.org/T232613) (owner: 10Krinkle)
[23:08:10] <wikibugs>	 10Operations, 10SRE-Access-Requests: Restore access for bmansurov - https://phabricator.wikimedia.org/T241089 (10leila) Approved. thanks!
[23:08:22] <wikibugs>	 (03PS2) 10Krinkle: etcd: Add $etcdHost parameter to wmfSetupEtcd() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558776
[23:08:31] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings.php: Remove the disabled "temporary" code for T232613 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558769 (https://phabricator.wikimedia.org/T232613) (owner: 10Krinkle)
[23:09:07] * Krinkle staging on mwdebug1001
[23:17:13] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings.php: If465c0ef cleanup (duration: 01m 01s)
[23:17:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:41] <wikibugs>	 (03CR) 10BryanDavis: "One bash nit inline. The actual logic makes sense. Having the labels to find these things in the cluster makes good sense too." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/559212 (https://phabricator.wikimedia.org/T233372) (owner: 10Bstorm)
[23:27:51] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.11/includes/resourceloader/ResourceLoaderFileModule.php: I3fe9f0a9ddc (duration: 01m 02s)
[23:27:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:35] <wikibugs>	 10Operations: Audit the WMF LDAP group and limit its permissions - https://phabricator.wikimedia.org/T240870 (10colewhite) @jcrespo that sounds bad to me.  Perhaps query monitoring is a great candidate for a more specific and limited group?
[23:44:51] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.11/includes/libs/objectcache/MemcachedPeclBagOStuff.php: Iacbc9ebda681 (duration: 01m 01s)
[23:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log