[00:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T0000).
[00:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[00:01:56] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to deployment for Kosta Harlan - https://phabricator.wikimedia.org/T269731 (thcipriani) >>! In T269731#6679835, @jbond wrote: > @thcipriani are you able to approve adding kostajh to the `deployment:` group  Approved!
[00:03:45] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/27063/wikistats-wild-tiger.wikistats.eqiad.wmflabs/index.html" [puppet] - https://gerrit.wikimedia.org/r/646876 (owner: Dzahn)
[00:03:50] <wikibugs>	 Operations, fundraising-tech-ops, netops: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802 (Dwisehaupt) @jbond Thanks for the pointers. I have started testing this in our VM setup and it looks like getting lldp in place should be easy to do.  I do have a question about th...
[00:04:55] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:22:30] <wikibugs>	 (PS1) Dzahn: wikistats: fix file name of db dump script [puppet] - https://gerrit.wikimedia.org/r/647402
[00:23:20] <wikibugs>	 (CR) Dzahn: [C: +2] wikistats: fix file name of db dump script [puppet] - https://gerrit.wikimedia.org/r/647402 (owner: Dzahn)
[00:23:26] <wikibugs>	 (PS2) Dzahn: wikistats: fix file name of db dump script [puppet] - https://gerrit.wikimedia.org/r/647402
[00:26:20] <robh>	 !log cr2-eqsin bad fan being swapped via T267544
[00:26:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:24] <stashbot>	 T267544: cr2-eqsin: fan failure - https://phabricator.wikimedia.org/T267544
[00:32:17] <icinga-wm>	 PROBLEM - PHP opcache health on mw2243 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[00:33:31] <icinga-wm>	 ACKNOWLEDGEMENT - PHP opcache health on mw2243 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% daniel_zahn reimaged and not pooled https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[00:38:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 6418400728 and 746 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:38:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2431911424 and 138 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:40:40] <wikibugs>	 (PS1) Dzahn: wikistats: redirect output of mysqldump command properly [puppet] - https://gerrit.wikimedia.org/r/647408
[00:41:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] wikistats: redirect output of mysqldump command properly [puppet] - https://gerrit.wikimedia.org/r/647408 (owner: Dzahn)
[00:41:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 292830432 and 20 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:19] <wikibugs>	 (PS2) Dzahn: wikistats: redirect output of mysqldump command properly [puppet] - https://gerrit.wikimedia.org/r/647408
[00:41:47] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 2579873816 and 143 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:55] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 515024360 and 20 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:41:59] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7966246568 and 481 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:42:17] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 75056672 and 16 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:42:23] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 6678317568 and 411 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:42:53] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 11368 and 45 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:43:31] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 83 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:43:46] <wikibugs>	 (CR) Dzahn: [C: +2] wikistats: redirect output of mysqldump command properly [puppet] - https://gerrit.wikimedia.org/r/647408 (owner: Dzahn)
[00:43:53] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 45160 and 105 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:44:31] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 187112 and 142 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:45:01] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 59152 and 174 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:47:15] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 11992 and 306 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:47:45] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2144 and 336 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1040 and 378 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:48:54] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering Roadmap Decision Making, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (nnikkhoui)
[00:50:37] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:52:42] <wikibugs>	 Operations, ops-eqsin, DC-Ops: cr2-eqsin: fan failure - https://phabricator.wikimedia.org/T267544 (RobH) Summary update:  * Jin installed the second replacement fan from Juniper into cr2-eqsin, the red led stayed red (didn't change to green) and software via ssh check by me still showed the fan in a...
[00:57:03] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:00:04] <jouncebot>	 twentyafterfour: (Dis)respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T0100). Please do the needful.
[01:01:35] <wikibugs>	 (PS1) Bstorm: wikireplicas: close all connections [puppet] - https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620)
[01:01:51] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:11:47] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 106.17, 100.34, 97.61 https://wikitech.wikimedia.org/wiki/Swift
[01:18:13] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 101.30, 100.36, 98.38 https://wikitech.wikimedia.org/wiki/Swift
[01:34:03] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:38:55] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:13:05] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 106.54, 102.17, 98.53 https://wikitech.wikimedia.org/wiki/Swift
[02:16:03] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:17:04] <wikibugs>	 (PS1) Catrope: RCFilters: Temporarily fix TagItemWidget remove button size [core] (wmf/1.36.0-wmf.21) - https://gerrit.wikimedia.org/r/647305 (https://phabricator.wikimedia.org/T269477)
[02:20:55] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:24:27] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 101.64, 100.33, 99.40 https://wikitech.wikimedia.org/wiki/Swift
[02:55:19] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be2018 is OK: OK - load average: 77.24, 75.89, 79.82 https://wikitech.wikimedia.org/wiki/Swift
[03:00:25] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1030 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:47:15] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1030 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:53:39] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2018 is CRITICAL: CRITICAL - load average: 106.22, 101.74, 97.43 https://wikitech.wikimedia.org/wiki/Swift
[04:03:31] <wikibugs>	 Operations, Performance-Team, serviceops, Patch-For-Review, User-jijiki: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (jijiki) @Krinkle @aaron do you think we are ready to move this forward?
[04:29:24] <wikibugs>	 Operations, MW-on-K8s, serviceops: Sandbox/limit child processes within a container runtime - https://phabricator.wikimedia.org/T252745 (tstarling)
[04:29:48] <wikibugs>	 Operations, MW-on-K8s, serviceops, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) Resolved→Open Can the task stay open to track implementation? The RFC workboard has "Approved" and "Implemente...
[05:14:51] <wikibugs>	 Operations, MW-on-K8s, serviceops, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Krinkle) Given the title and task description, I assumed it was a dedicated task, but I see it's used as tracking task indeed. So...
[06:21:28] * kart_ upgrading Apertium service. No major changes.
[06:21:58] <wikibugs>	 (CR) KartikMistry: [C: +2] Update apertium to 2020-12-09-115733-production [deployment-charts] - https://gerrit.wikimedia.org/r/647220 (owner: KartikMistry)
[06:23:25] <wikibugs>	 (Merged) jenkins-bot: Update apertium to 2020-12-09-115733-production [deployment-charts] - https://gerrit.wikimedia.org/r/647220 (owner: KartikMistry)
[06:24:37] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be2018 is OK: OK - load average: 62.65, 68.04, 78.51 https://wikitech.wikimedia.org/wiki/Swift
[06:27:00] <logmsgbot>	 !log kartik@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
[06:27:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:47] <logmsgbot>	 !log kartik@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
[06:30:47] <logmsgbot>	 !log kartik@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
[06:30:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:00] <logmsgbot>	 !log kartik@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
[06:35:00] <logmsgbot>	 !log kartik@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
[06:35:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:53] <kart_>	 !log Upgraded Apertium to 2020-12-09-115733-production
[06:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:37] <wikibugs>	 Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (elukey) Thanks a lot for all the work!  To recap:  +2 servers in A2 +2 servers in A4 +2 servers in B2 +2 servers in B4 +1 servers in B7 +2 servers i...
[07:35:28] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] Add a second Hive Metastore on an-coord1002 [puppet] - https://gerrit.wikimedia.org/r/647273 (https://phabricator.wikimedia.org/T268028) (owner: Elukey)
[07:58:53] <wikibugs>	 (CR) David Caro: [C: -1] "I have a question 😊" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: Bstorm)
[08:09:36] <wikibugs>	 (PS1) Elukey: hive: fix wrong kerberos principal for the replicated metastore [puppet] - https://gerrit.wikimedia.org/r/647599
[08:10:12] <wikibugs>	 (CR) Elukey: [C: +2] "And I was wondering why the hive server complained about kerberos auth :D" [puppet] - https://gerrit.wikimedia.org/r/647599 (owner: Elukey)
[08:11:28] <wikibugs>	 (CR) Muehlenhoff: "> Patch Set 7:" [puppet] - https://gerrit.wikimedia.org/r/592712 (https://phabricator.wikimedia.org/T251005) (owner: Reedy)
[08:14:34] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me." [puppet] - https://gerrit.wikimedia.org/r/647369 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[08:16:12] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/645206 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[08:18:04] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:22:44] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:22:55] <godog>	 !log swift codfw-prod: more weight to ms-be20[58-61] - T269337
[08:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:00] <stashbot>	 T269337: Add ms-be20[58-61] to swift - https://phabricator.wikimedia.org/T269337
[08:43:59] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/645206 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[08:45:11] <wikibugs>	 (PS7) Jbond: hiera: install redis on shard16 [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[08:45:42] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[08:51:36] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/647197 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[08:54:36] <wikibugs>	 (PS3) Jbond: icinga: add support for downtimed and notifications_enabled parameters [software/spicerack] - https://gerrit.wikimedia.org/r/647245 (https://phabricator.wikimedia.org/T269672)
[08:54:41] <wikibugs>	 (CR) Jbond: "updated" (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/647245 (https://phabricator.wikimedia.org/T269672) (owner: Jbond)
[08:55:06] <jbond42>	 volans: not sure if i have already missed your spicerack release but ^^^ has been updated now
[08:56:28] <wikibugs>	 (PS2) Jbond: icinga::raid_handler: add support for ssacli [puppet] - https://gerrit.wikimedia.org/r/647281 (https://phabricator.wikimedia.org/T269563)
[08:59:46] <wikibugs>	 (CR) Effie Mouzeli: "PCC for all affected hosts looks ok: https://puppet-compiler.wmflabs.org/compiler1001/27064/" [puppet] - https://gerrit.wikimedia.org/r/647197 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[09:00:00] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] Upgrade to 2.9 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/603876 (https://phabricator.wikimedia.org/T254845) (owner: Gilles)
[09:00:34] <wikibugs>	 Operations, Traffic: Incorrect X-Cache-Status reported by deployment-prep caches - https://phabricator.wikimedia.org/T269825 (ema)
[09:00:40] <wikibugs>	 Operations, Traffic: Incorrect X-Cache-Status reported by deployment-prep caches - https://phabricator.wikimedia.org/T269825 (ema) p:Triage→Lowest
[09:02:20] <wikibugs>	 (CR) Elukey: "To keep archives happy - the change was not submitted, and the WMDE team fixed the schema, so we are good now (no need to revert etc..)." [puppet] - https://gerrit.wikimedia.org/r/647351 (owner: Milimetric)
[09:06:07] <wikibugs>	 (PS3) Jbond: icinga::raid_handler: add support for ssacli [puppet] - https://gerrit.wikimedia.org/r/647281 (https://phabricator.wikimedia.org/T269563)
[09:06:16] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:06:35] <wikibugs>	 Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) Can confirm from os command line:   ` $ free -m Mem:         515690 `  Thank you very much!
[09:08:42] <wikibugs>	 (CR) Jbond: icinga::raid_handler: add support for ssacli (2 comments) [puppet] - https://gerrit.wikimedia.org/r/647281 (https://phabricator.wikimedia.org/T269563) (owner: Jbond)
[09:10:48] <wikibugs>	 (CR) Alexandros Kosiaris: [V: +1 C: +2] "Thanks! I 've missed that part, good catch!" [puppet] - https://gerrit.wikimedia.org/r/647210 (owner: Alexandros Kosiaris)
[09:11:12] <effie>	 !log disable puppet on all hosts running redis - T265643
[09:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:17] <stashbot>	 T265643: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643
[09:12:44] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] redis: define redis version on buster for multidc [puppet] - https://gerrit.wikimedia.org/r/647197 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[09:13:49] <effie>	 akosiaris: I think I have your patch too
[09:13:55] <wikibugs>	 Operations, Traffic: X-Cache-Status: distinguish between fresh and stale hits/misses - https://phabricator.wikimedia.org/T269828 (ema)
[09:14:02] <volans>	 jbond42: no you didn't
[09:14:02] <wikibugs>	 Operations, Traffic: X-Cache-Status: distinguish between fresh and stale hits/misses - https://phabricator.wikimedia.org/T269828 (ema) p:Triage→Medium
[09:14:05] <effie>	 akosiaris: should I proceed?
[09:14:57] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/647245 (https://phabricator.wikimedia.org/T269672) (owner: Jbond)
[09:15:05] <volans>	 jbond42: go ahead and merge it at will
[09:15:31] <akosiaris>	 effie: yup
[09:15:38] <effie>	 smile :D
[09:17:37] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on es1023 - https://phabricator.wikimedia.org/T268796 (jcrespo) This is still rebuilding:   ` root@es1023:~$ megacli -PDRbld -ShowProg -PhysDrv \[32\:5\] -aALL                                       Rebuild Progress on Device at Enclosure 32, Slot 5 Completed...
[09:18:54] <wikibugs>	 (CR) Jbond: [C: +2] icinga::raid_handler: add support for ssacli [puppet] - https://gerrit.wikimedia.org/r/647281 (https://phabricator.wikimedia.org/T269563) (owner: Jbond)
[09:19:15] <wikibugs>	 (CR) Jbond: [C: +2] icinga: add support for downtimed and notifications_enabled parameters [software/spicerack] - https://gerrit.wikimedia.org/r/647245 (https://phabricator.wikimedia.org/T269672) (owner: Jbond)
[09:19:47] <jbond42>	 ack volans, the spicerack one is merged; ping me when you do a release and I'll merge the icinga_status CR, thx
[09:20:11] <volans>	 perfect, will be shortly
[09:20:45] <wikibugs>	 (PS1) Ema: cache: downgrade Varnish on cp3054 to 6.0.0-1wm1 [puppet] - https://gerrit.wikimedia.org/r/647615 (https://phabricator.wikimedia.org/T264398)
[09:20:59] <jbond42>	 ack
[09:22:18] <wikibugs>	 (CR) Ema: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/647615 (https://phabricator.wikimedia.org/T264398) (owner: Ema)
[09:22:51] <wikibugs>	 (Merged) jenkins-bot: icinga: add support for downtimed and notifications_enabled parameters [software/spicerack] - https://gerrit.wikimedia.org/r/647245 (https://phabricator.wikimedia.org/T269672) (owner: Jbond)
[09:24:50] <wikibugs>	 (CR) Ema: [C: +2] cache: downgrade Varnish on cp3054 to 6.0.0-1wm1 [puppet] - https://gerrit.wikimedia.org/r/647615 (https://phabricator.wikimedia.org/T264398) (owner: Ema)
[09:26:31] <effie>	 !log disable puppet on all mw* hosts for 647204 - T265643
[09:26:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:35] <stashbot>	 T265643: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643
[09:28:19] <effie>	 !log disable puppet on all hosts running nutcracker for 647204 - T265643
[09:28:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:28] <wikibugs>	 (PS8) Effie Mouzeli: hiera: install redis on shard16 [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643)
[09:28:46] <ema>	 !log cp3054: downgrade varnish to 6.0.0-1wm1 T264398
[09:28:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:49] <stashbot>	 T264398: 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398
[09:34:40] <icinga-wm>	 PROBLEM - Check systemd state on cp3054 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:37:42] <wikibugs>	 Operations, ops-eqiad, DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) Also no more errors on reboot: ` Installed System Memory: 512 GB, Available System Memory: 512 GB  2 Processor(s) detected, 8 total cores enab...
[09:37:56] <icinga-wm>	 RECOVERY - Check systemd state on cp3054 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:39:15] <wikibugs>	 (PS2) Jcrespo: Revert "mariadb: Reduce memory consumption of mariadb@s6 while hw degraded" [puppet] - https://gerrit.wikimedia.org/r/641498
[09:39:35] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v0.0.46 [software/spicerack] - https://gerrit.wikimedia.org/r/647621
[09:40:00] <wikibugs>	 (CR) JMeybohm: "Thanks for the review." (9 comments) [puppet] - https://gerrit.wikimedia.org/r/645417 (https://phabricator.wikimedia.org/T267653) (owner: JMeybohm)
[09:40:28] <wikibugs>	 (PS8) JMeybohm: calico: Add support for calico 3.x with kubernetes datastore [puppet] - https://gerrit.wikimedia.org/r/645417 (https://phabricator.wikimedia.org/T267653)
[09:40:53] <wikibugs>	 (CR) Jcrespo: [C: +2] Revert "mariadb: Reduce memory consumption of mariadb@s6 while hw degraded" [puppet] - https://gerrit.wikimedia.org/r/641498 (owner: Jcrespo)
[09:41:10] <wikibugs>	 (CR) Kormat: [C: +2] alerting: Disable screen/tmux monitoring on orchestrator hosts [puppet] - https://gerrit.wikimedia.org/r/647319 (https://phabricator.wikimedia.org/T265990) (owner: Jcrespo)
[09:41:52] <jynus>	 ups
[09:41:54] <kormat>	 jynus: is it safe to puppet-merge your change?
[09:41:55] <jynus>	 ok to merge?
[09:41:57] <jynus>	 yeah
[09:41:59] <wikibugs>	 (CR) Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1001/27066/  and https://puppet-compiler.wmflabs.org/compiler1001/27067/" [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[09:42:04] <kormat>	 go for it :)
[09:42:22] <kormat>	 jynus: i hit the wrong button on the screen-monitoring CR, so i decided i'd submit it
[09:42:33] <jynus>	 ha ha
[09:42:52] <jynus>	 so you only merged my suggestion because of an accident :-))))))
[09:43:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] CHANGELOG: add changelogs for release v0.0.46 [software/spicerack] - https://gerrit.wikimedia.org/r/647621 (owner: Volans)
[09:43:17] <kormat>	 jynus: haha. i _meant_ to +1 it, but it's morning :)
[09:43:21] <jynus>	 ah, ok
[09:43:38] <jynus>	 nah, those patches where you have the last call, I am more than cool with you merging
[09:43:49] <jynus>	 as you are more of the owner of them
[09:44:29] <wikibugs>	 (PS2) Volans: CHANGELOG: add changelogs for release v0.0.46 [software/spicerack] - https://gerrit.wikimedia.org/r/647621
[09:44:33] <wikibugs>	 (CR) JMeybohm: "> Patch Set 2: -Verified" [puppet] - https://gerrit.wikimedia.org/r/647011 (https://phabricator.wikimedia.org/T269461) (owner: Alexandros Kosiaris)
[09:46:48] <icinga-wm>	 RECOVERY - MariaDB read only s1 on db1139 is OK: Version 10.1.44-MariaDB, Uptime 56s, read_only: True, event_scheduler: True, 11.78 QPS, connection latency: 0.002330s, query latency: 0.000594s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[09:47:03] <kormat>	 wait, what?
[09:47:09] <kormat>	 ah
[09:47:15] <kormat>	 jynus: that you? ^
[09:47:17] <jynus>	 that is db1139 coming back from the dead
[09:47:20] <kormat>	 grand :)
[09:47:21] <jynus>	 I will disable notifications
[09:47:41] <wikibugs>	 (CR) jerkins-bot: [V: -1] CHANGELOG: add changelogs for release v0.0.46 [software/spicerack] - https://gerrit.wikimedia.org/r/647621 (owner: Volans)
[09:47:45] <jynus>	 it is the typical issue that downtime only suppresses new DOWN notifications, not recoveries
[09:48:40] <wikibugs>	 (PS1) Jcrespo: Revert "database backups: Move s1&s6 snapshots and logical dumps from db1139 to db1140" [puppet] - https://gerrit.wikimedia.org/r/647626
[09:48:51] <wikibugs>	 (PS2) Jcrespo: Revert "database backups: Move s1&s6 snapshots and logical dumps from db1139 to db1140" [puppet] - https://gerrit.wikimedia.org/r/647626
[09:49:08] <wikibugs>	 (CR) Volans: "recheck" [software/spicerack] - https://gerrit.wikimedia.org/r/647621 (owner: Volans)
[09:51:56] <wikibugs>	 (CR) Jcrespo: [C: +2] Revert "database backups: Move s1&s6 snapshots and logical dumps from db1139 to db1140" [puppet] - https://gerrit.wikimedia.org/r/647626 (owner: Jcrespo)
[09:55:47] <wikibugs>	 (CR) Volans: "recheck" [software/spicerack] - https://gerrit.wikimedia.org/r/647621 (owner: Volans)
[09:57:52] <ema>	 !log A:cp rolling ats-{tls,backend}-restart for openssl upgrades (CVE-2020-1971)
[09:57:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:38] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.46 [software/spicerack] - https://gerrit.wikimedia.org/r/647621 (owner: Volans)
[10:00:43] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1030 - https://phabricator.wikimedia.org/T268036 (fgiunchedi) >>! In T268036#6679944, @Cmjohnson wrote: > @fgiunchedi The bbu is on-site, please let me know when I can take this offline?  I can do tomorrow 1500UTC  1500 UTC sounds good to me, please LMK on IRC...
[10:00:47] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to deployment for Kosta Harlan - https://phabricator.wikimedia.org/T269731 (jbond) >>! In T269731#6679919, @marcella wrote: > @jbond I am Kosta's manager and I approve this request.  Thank you! >>! In T269731#6680958, @kaldari wrote: > I approve as well...
[10:02:20] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[10:02:33] <wikibugs>	 (PS1) Volans: Upstream release v0.0.46 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/647649
[10:02:42] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] hiera: install redis on shard16 [puppet] - https://gerrit.wikimedia.org/r/647204 (https://phabricator.wikimedia.org/T265643) (owner: Effie Mouzeli)
[10:07:21] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v0.0.46 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/647649 (owner: Volans)
[10:10:14] <wikibugs>	 (Merged) jenkins-bot: Upstream release v0.0.46 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/647649 (owner: Volans)
[10:16:38] <volans>	 !log uploaded spicerack_0.0.46 to apt.wikimedia.org buster-wikimedia
[10:16:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:43] <wikibugs>	 (PS1) Jbond: admin: add kharlan to deployment group [puppet] - https://gerrit.wikimedia.org/r/647651 (https://phabricator.wikimedia.org/T269731)
[10:17:35] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for Kosta Harlan - https://phabricator.wikimedia.org/T269731 (jbond)
[10:17:36] <icinga-wm>	 RECOVERY - HP RAID on labstore1006 is OK: OK: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK --- Slot 3: OK: 1E:1:1, 1E:1:10, 1E:1:11, 1E:1:12, 1E:1:2, 1E:1:3, 1E:1:4, 1E:1:5, 1E:1:6, 1E:1:7, 1E:1:8, 1E:1:9, 1E:2:1, 1E:2:10, 1E:2:11, 1E:2:12, 1E:2:2, 1E:2:3, 1E:2:4, 1E:2:5, 1E:2:6, 1E:2:7, 1E:2:8, 1E:2:9 - Controll
[10:17:36] <icinga-wm>	 Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:17:46] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] admin: add kharlan to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/647651 (https://phabricator.wikimedia.org/T269731) (owner: 10Jbond)
[10:18:02] <icinga-wm>	 PROBLEM - Aggregate IPsec Tunnel Status codfw on alert1001 is CRITICAL: instance=mc2034 site=codfw tunnel=mc1034_v4 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[10:18:13] <volans>	 jbond42: new spicerack released, we can upgrade it on the cumin hosts whenever you're ready to merge
[10:23:13] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to deployment for Kosta Harlan - https://phabricator.wikimedia.org/T269731 (10jbond) I have now added you to the deployment group.  however there is currently on going work which means it may take a few hours for this change to propog...
[10:23:15] <jbond42>	 volans: i can deploy now
[10:24:14] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to deployment for Kosta Harlan - https://phabricator.wikimedia.org/T269731 (10jbond) 05Open→03Resolved p:05Triage→03Medium
[10:25:12] <wikibugs>	 (03PS4) 10Jbond: icinga_status: add downtimed and notifications_enabled to json [puppet] - 10https://gerrit.wikimedia.org/r/647084
[10:28:05] <kostajh>	 jbond42: thanks for your help!
[10:28:24] <volans>	 jbond42: ack, sorry got disconnected
[10:28:46] <jbond42>	 volans: no problem just ping me when its deployed to cumin then ill deploy my change
[10:28:56] <volans>	 jbond42: {done}
[10:29:14] <volans>	 !log upgraded spicerack to 0.0.46 on cumin[12]001
[10:29:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:40] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] icinga_status: add downtimed and notifications_enabled to json [puppet] - 10https://gerrit.wikimedia.org/r/647084 (owner: 10Jbond)
[10:29:56] <icinga-wm>	 PROBLEM - Check systemd state on cp5007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:31:22] <jbond42>	 volans: ack deployed to icinga1001 and tested locally.  will test the reboot cookbook in a bit
[10:31:34] <volans>	 perfect, thanks a lot!
[10:31:40] <jbond42>	 np thx :)
[10:32:10] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.decommission
[10:32:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:22] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[10:33:23] <logmsgbot>	 !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
[10:33:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:34] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[10:33:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:54] <wikibugs>	 (03PS1) 10Filippo Giunchedi: smokeping: force redirect to https [puppet] - 10https://gerrit.wikimedia.org/r/647654
[10:37:33] <logmsgbot>	 !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
[10:37:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:44] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:49] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: `an-tool1010.eqiad.wmnet` - an-tool1010.eqiad.wmnet (**PASS**)   - Downtim...
[10:38:28] <effie>	 !log uploading prometheus-redis-exporter_0.13-1 in component/redis2 for buster
[10:38:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:38] <wikibugs>	 (03PS1) 10Filippo Giunchedi: alertmanager: set karma poll interval to 10s [puppet] - 10https://gerrit.wikimedia.org/r/647655 (https://phabricator.wikimedia.org/T266017)
[10:41:48] <wikibugs>	 (03PS1) 10Volans: Testing CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657
[10:42:51] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[10:42:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:58] <godog>	 !log swift eqiad-prod: add weight to ms-be106[0-3] - T268435
[10:44:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:05] <stashbot>	 T268435: Add ms-be106[0-3] to swift - https://phabricator.wikimedia.org/T268435
[10:45:08] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Testing CI [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[10:45:59] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[10:46:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:02] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[10:47:16] <wikibugs>	 10Operations: Traceback in icinga-status  'Host' object has no attribute 'downtime' - https://phabricator.wikimedia.org/T269672 (10jbond) A fix has been applied to both spicerack and the icingas_status script.  I have checked things work with the following and all looks good to me.  please re-open if you still s...
[10:47:23] <wikibugs>	 10Operations: Traceback in icinga-status  'Host' object has no attribute 'downtime' - https://phabricator.wikimedia.org/T269672 (10jbond) 05Resolved→03Open
[10:47:42] <jbond42>	 volans: fyi looks like the spicerack release fixed the issue reported thanks ^^
[10:47:50] <volans>	 great! thanks a lot
[10:50:47] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[10:51:07] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[10:52:14] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: WIP: Move monitoring stanzas to shared templates [deployment-charts] - 10https://gerrit.wikimedia.org/r/647660
[10:53:49] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is CRITICAL: 10.34 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:54:51] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[10:56:31] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[10:58:19] <wikibugs>	 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10MoritzMuehlenhoff) >>! In T245757#6680909, @Dzahn wrote: >>>! In T245757#6645352, @jijiki wrote: >> @Dzahn...
[11:00:05] <jouncebot>	 mvolz: Dear deployers, time to do the Services – Citoid /  Zotero deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1100).
[11:00:41] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[11:01:28] <wikibugs>	 (03PS3) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/641725 (owner: 10PipelineBot)
[11:02:19] <icinga-wm>	 RECOVERY - Aggregate IPsec Tunnel Status codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[11:02:41] <wikibugs>	 (03CR) 10Mvolz: [C: 03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/641725 (owner: 10PipelineBot)
[11:03:09] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[11:04:09] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/641725 (owner: 10PipelineBot)
[11:04:23] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[11:04:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:01] <moritzm>	 !log rebooting failoid1001
[11:05:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:42] <volans>	 moritzm: it will fail
[11:05:46] <volans>	 ...oid
[11:06:24] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[11:06:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:28] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.dns.netbox
[11:07:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:57] <wikibugs>	 10Operations: Traceback in icinga-status  'Host' object has no attribute 'downtime' - https://phabricator.wikimedia.org/T269672 (10MoritzMuehlenhoff) Works like a charm now.
[11:08:21] <moritzm>	 it continues to fail, that's how I like it!
[11:10:22] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802 (10jbond) >>! In T268802#6681000, @Dwisehaupt wrote: > I do have a question about the use of facter though. In my testing with lldbctl I see multiple neighbors for an interface.  Alth...
[11:13:32] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10jbond)
[11:13:57] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:13:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:21] <icinga-wm>	 PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:14:33] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` an-tool1010.eqiad.wmnet ` The log can be found in `/var/log/wm...
[11:19:30] <wikibugs>	 (03PS1) 10Jbond: admin: add toan user [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777)
[11:20:02] <wikibugs>	 (03PS7) 10Jbond: Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[11:20:57] <moritzm>	 !log installing apt security updates on buster/stretch
[11:21:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:20] <effie>	 !log upload prometheus-redis-exporter_0.13-1 to buster-wikimedia main
[11:21:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:25] <wikibugs>	 (03PS8) 10Jbond: Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[11:21:54] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10jbond)
[11:22:53] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10jbond) @toan I have created the CR to add your shell account and will merge at the same time as the change to add [[ https://gerrit.wikimedia.org/r/c/opera...
[11:24:59] <icinga-wm>	 RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:39:09] <wikibugs>	 10Operations, 10Growth-Team, 10serviceops, 10Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (10jijiki)
[11:39:19] <wikibugs>	 10Operations, 10Platform Engineering, 10serviceops, 10Patch-For-Review, 10User-jijiki: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 (10jijiki)
[11:39:24] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[11:40:11] <wikibugs>	 10Operations, 10Growth-Team, 10serviceops, 10Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (10jijiki) 05Open→03Resolved a:03jijiki
[11:41:28] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to wmf group for Matt Cleinman - https://phabricator.wikimedia.org/T269696 (10jbond) p:05Triage→03Medium @MattCleinman granting access is not an issue however could you please provide information on the services you require access to for audit purposes @JoeW...
[11:45:15] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: mcrouter: add gutter pool servers in configuration [puppet] - 10https://gerrit.wikimedia.org/r/569541 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli)
[11:45:42] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: mcrouter: enable gutter pool config on mwdebug1001 and mwdebug2001 [puppet] - 10https://gerrit.wikimedia.org/r/574200 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli)
[11:45:47] <logmsgbot>	 !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
[11:51:03] <wikibugs>	 (03PS7) 10MSantos: start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949)
[11:52:17] <wikibugs>	 10Operations, 10Performance-Team, 10Platform Engineering, 10Goal: Decommission the "session redis" cluster - https://phabricator.wikimedia.org/T243520 (10jijiki)
[11:52:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] start using imposm as OSM sync tool [puppet] - 10https://gerrit.wikimedia.org/r/644482 (https://phabricator.wikimedia.org/T260949) (owner: 10MSantos)
[11:53:35] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Fix prev/next links on Special:WhatLinksHere [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647628 (https://phabricator.wikimedia.org/T269830)
[11:54:40] <Lucas_WMDE>	 jouncebot: recheck please
[11:55:08] <Lucas_WMDE>	 uh
[11:55:10] <Lucas_WMDE>	 i’m stupid
[11:55:12] <Lucas_WMDE>	 jouncebot: refresh please
[11:55:13] <jouncebot>	 I refreshed my knowledge about deployments.
[11:55:17] <Lucas_WMDE>	 thanks ^^
[11:56:06] <Lucas_WMDE>	 (I’ll be in a meeting for the first half of the window, so the other config changes can be deployed ahead of that backport)
[11:56:24] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10Mohammed_Sadat_WMDE)
[11:57:03] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European mid-day backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1200).
[12:00:04] <jouncebot>	 Bencemac, matthiasmullie, and Lucas_WMDE: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:54] <Bencemac>	 I'm here, but it's also my first patch, so sorry in advance :)
[12:00:54] <Lucas_WMDE>	 I’m busy for the next 30 mins
[12:00:59] <matthiasmullie>	 I withdraw my patch - not to be deployed today
[12:01:53] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:02:54] <wikibugs>	 (03PS1) 10Ayounsi: Run Homer during the decom cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/647629
[12:03:09] <Urbanecm>	 I can deploy today!
[12:03:27] <Urbanecm>	 so, it's only Bencemac 's patch (and Lucas's once he gets back?)
[12:03:44] <Bencemac>	 it looks like
[12:03:46] <wikibugs>	 (03PS2) 10Ayounsi: Run Homer during the decom cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/647629
[12:04:12] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "B&C" [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647628 (https://phabricator.wikimedia.org/T269830) (owner: 10Lucas Werkmeister (WMDE))
[12:04:12] <Bencemac>	 I have installed the gadget and ready to go
[12:04:14] <matthiasmullie>	 yes, just removed mine from deployments page
[12:04:17] <Urbanecm>	 Bencemac: great
[12:04:31] <Lucas_WMDE>	 Urbanecm: you saw my message that I’m not available yet?
[12:04:37] <Lucas_WMDE>	 I think that +2 is premature…
[12:04:51] <Urbanecm>	 Lucas_WMDE: yes, but also CI takes over 20 minutes to complete
[12:05:03] <Lucas_WMDE>	 yes, but I’ll be unavailable for more than 20 minutes too…
[12:05:15] <Urbanecm>	 I can cancel it if you wish, but that'll mean you'll have to wait 20 minutes instead just coming to a merged patch :)
[12:05:42] <Urbanecm>	 Bencemac: is it intentional to also set wgKartographerEnableMapFrame to true?
[12:06:36] <wikibugs>	 (03PS1) 10Effie Mouzeli: hiera: upgrade mc1032, mc2032 to buster [puppet] - 10https://gerrit.wikimedia.org/r/647672 (https://phabricator.wikimedia.org/T213089)
[12:07:06] <Bencemac>	 Kartographer is not enabled @huwiki, so I'm not sure. But probably yes, because the FR settings affect it
[12:07:37] <Urbanecm>	 Lucas_WMDE: anyway, I'm happy to deploy yours even w/o you, as it's simple enough 🙂
[12:08:05] <Lucas_WMDE>	 or you could just wait?
[12:08:10] <Lucas_WMDE>	 and then I’ll be happy to deploy it…
[12:08:33] <Urbanecm>	 as you wish, +2 removed :)
[12:09:09] <Urbanecm>	 Bencemac: I don't understand it. The comment for wgKartographerEnableMapFrame says "// Disable for FlaggedRevs wikis with $wgFlaggedRevsOverride=true", and you're setting wgFlaggedRevsOverride to true?
[12:09:37] <icinga-wm>	 PROBLEM - DPKG on ganeti2019 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:10:48] <Bencemac>	 well, it's tgr's patch and I'm just here to learn how this works for the future
[12:11:00] <Bencemac>	 I also think that it should be false
[12:11:40] <Urbanecm>	 Bencemac: remove the kartographer thing from the patch please
[12:11:47] <icinga-wm>	 PROBLEM - DPKG on mwdebug2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:12:00] <Urbanecm>	 this will actually enable it, and there's no reason to do it now AFAICS
[12:12:28] <moritzm>	 ^ dpkg error will sort out soon, apt update
[12:13:39] <logmsgbot>	 !log kharlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[12:13:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:01] <wikibugs>	 (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/647657 (owner: 10Volans)
[12:14:12] <Urbanecm>	 Bencemac: just to confirm, did you see my message? :-)
[12:14:27] <Bencemac>	 yes, I am just trying 
[12:14:40] <Urbanecm>	 okay, ask if you have any questions :)
[12:15:15] <wikibugs>	 10Operations, 10Mail: Bounces when sending mail to aliases of a specific WMF email address: 550 Previous (cached) callout verification failure - https://phabricator.wikimedia.org/T269725 (10jbond) I have tried to recreate this and every thing looks fine from an SMTP PoV  ` lines=5 $ telnet mx1001.wikimedia.org...
[12:15:27] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-tool1010.eqiad.wmnet'] `  Of which those **FAILED**: ` ['an-tool1010.eqiad.wmnet'] `
[12:16:07] <icinga-wm>	 PROBLEM - DPKG on puppetboard2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:17:29] <wikibugs>	 (03PS7) 10Bencemac: [huwiki] Set wgFlaggedRevsOverride back to true per community vote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm)
[12:17:37] <Lucas_WMDE>	 o/ meeting over \o/
[12:17:49] <wikibugs>	 (03PS8) 10Urbanecm: [huwiki] Set wgFlaggedRevsOverride back to true per community vote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm)
[12:17:57] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] [huwiki] Set wgFlaggedRevsOverride back to true per community vote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm)
[12:18:01] <Bencemac>	 it's done 
[12:18:04] <Urbanecm>	 thanks Bencemac :)
[12:18:09] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Fix prev/next links on Special:WhatLinksHere [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647628 (https://phabricator.wikimedia.org/T269830) (owner: 10Lucas Werkmeister (WMDE))
[12:18:10] <Urbanecm>	 will ping you once it's ready to be tested
[12:18:39] <Lucas_WMDE>	 readded the +2 to my backport, hopefully that means the old gate-and-submit will still go through
[12:18:46] <Lucas_WMDE>	 but I gather you’re not done yet so I’ll wait with the actual deploy
[12:19:02] <Bencemac>	 sorry, I'm just a bit nervous, I'm not so experienced in this stuff :D
[12:19:07] <wikibugs>	 (03Merged) 10jenkins-bot: [huwiki] Set wgFlaggedRevsOverride back to true per community vote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm)
[12:19:25] <Bencemac>	 will wait here
[12:19:27] <Urbanecm>	 Bencemac: no problem, I'll guide you through it :)
[12:20:01] <Bencemac>	 truly appreciated 
[12:20:56] <Urbanecm>	 Bencemac: I've pulled your change onto mwdebug1001. Can you test, please? Assuming you have the browser extension/gadget installed, you need to only enable it, pick mwdebug1001 there, and ensure indeed the stable version appears.
[12:21:13] <Bencemac>	 doing...
[12:22:19] <wikibugs>	 (03CR) 10Tobias Andersson: admin: add toan user (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777) (owner: 10Jbond)
[12:24:17] <icinga-wm>	 RECOVERY - DPKG on ganeti2019 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:24:17] <icinga-wm>	 RECOVERY - DPKG on mwdebug2002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:24:26] <Bencemac>	 Urbanecm, it works perfectly
[12:24:32] <Urbanecm>	 great, syncing then
[12:26:11] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 042dd034ef5811923106e81dbb4ac129be1f1ba6: [huwiki] Set wgFlaggedRevsOverride back to true per community vote (T210224) (duration: 01m 07s)
[12:26:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:15] <stashbot>	 T210224: Revert FlaggedRevs changes on the Hungarian Wikipedia - https://phabricator.wikimedia.org/T210224
[12:26:20] <Urbanecm>	 Bencemac: done :). Anything else?
[12:26:53] <Bencemac>	 nothing else, thank you very much!
[12:26:59] <Urbanecm>	 no problem
[12:27:07] <Urbanecm>	 Lucas_WMDE: in that case, it's yours :)
[12:27:12] <Lucas_WMDE>	 alright, thanks :)
[12:27:16] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` an-tool1010.eqiad.wmnet ` The log can be found in `/var/log/wm...
[12:27:25] <icinga-wm>	 RECOVERY - DPKG on puppetboard2001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:27:27] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) @aaron In order to change the servers defined in the mediawiki-config (and use other redis instances), apart from roll change them...
[12:27:31] <wikibugs>	 (03Merged) 10jenkins-bot: Fix prev/next links on Special:WhatLinksHere [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647628 (https://phabricator.wikimedia.org/T269830) (owner: 10Lucas Werkmeister (WMDE))
[12:27:41] <Lucas_WMDE>	 aaand right on time \o/
[12:28:01] <icinga-wm>	 PROBLEM - DPKG on db1079 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:28:29] <Lucas_WMDE>	 testing on mwdebug1001
[12:28:49] <icinga-wm>	 PROBLEM - DPKG on ganeti1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:29:16] <Lucas_WMDE>	 seems to work just fine, syncing
[12:30:07] <icinga-wm>	 PROBLEM - DPKG on puppetboard1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:30:39] <icinga-wm>	 PROBLEM - DPKG on elastic2051 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:30:49] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.21/includes/specials/SpecialWhatLinksHere.php: Backport: [[gerrit:647628|Fix prev/next links on Special:WhatLinksHere (T269830)]] (duration: 01m 04s)
[12:30:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:53] <stashbot>	 T269830: Previous/next links on Special:WhatLinksHere are HTML-escaped on 1.36.0-wmf.21 - https://phabricator.wikimedia.org/T269830
[12:31:09] <logmsgbot>	 !log kharlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
[12:31:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:31:19] <icinga-wm>	 PROBLEM - DPKG on analytics1072 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:31:23] <jynus>	 someone doing updates, there seems to be a few dpkg alerts?
[12:31:41] <Urbanecm>	 jynus: I saw this a while ago: 13:12 <moritzm> ^ dpkg error will sort out soon, apt update
[12:31:46] <Urbanecm>	 not sure if it applies to those alerts too
[12:31:55] <jynus>	 thanks, Urbanecm, that explains it
[12:32:03] <jynus>	 I didn't read it, too much scrollback
[12:32:09] <moritzm>	 yeah, that's all going to recover soon and harmless
[12:32:20] <Urbanecm>	 i was deploying at that time, so... 🙂
[12:32:23] <icinga-wm>	 RECOVERY - DPKG on puppetboard1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:32:23] <icinga-wm>	 RECOVERY - DPKG on db1079 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:32:23] <icinga-wm>	 RECOVERY - DPKG on ganeti1015 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:33:32] <Lucas_WMDE>	 any other backport/config changes?
[12:33:55] <icinga-wm>	 RECOVERY - DPKG on analytics1072 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:33:55] <icinga-wm>	 RECOVERY - DPKG on elastic2051 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:33:59] <Urbanecm>	 not from me
[12:33:59] <Lucas_WMDE>	 !log EU backport+config window done
[12:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:45] <icinga-wm>	 PROBLEM - DPKG on ganeti1021 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:36:27] <icinga-wm>	 PROBLEM - DPKG on analytics1073 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:37:27] <icinga-wm>	 PROBLEM - DPKG on db1123 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:40:33] <icinga-wm>	 PROBLEM - DPKG on an-worker1113 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:41:24] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[12:41:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:59] <icinga-wm>	 PROBLEM - DPKG on elastic1054 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:42:03] <icinga-wm>	 RECOVERY - DPKG on db1123 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:42:03] <icinga-wm>	 RECOVERY - DPKG on ganeti1021 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:42:03] <icinga-wm>	 RECOVERY - DPKG on analytics1073 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:43:28] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[12:43:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:11] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1975379752 and 107 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:44:23] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1881107584 and 109 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:44:37] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1811215568 and 118 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:46:41] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 48992 and 142 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:47:15] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 4320 and 176 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:47:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 214776 and 187 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:50:11] <icinga-wm>	 PROBLEM - DPKG on argon is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:50:21] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 294901824 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:51:24] <wikibugs>	 10Operations, 10Mail: Bounces when sending mail to aliases of a specific WMF email address: 550 Previous (cached) callout verification failure - https://phabricator.wikimedia.org/T269725 (10jbond) 05Open→03Resolved a:03jbond >>! In T269725#6680196, @JCabanero wrote: > Hi all, >  > I sent a test email to...
[12:51:59] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1116630440 and 64 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:52:59] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-tool1010.eqiad.wmnet'] `  and were **ALL** successful.
[12:53:27] <wikibugs>	 (03PS2) 10Jbond: admin: add toan user [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777)
[12:53:38] <wikibugs>	 (03CR) 10Jbond: admin: add toan user (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777) (owner: 10Jbond)
[12:53:39] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 90336 and 65 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:54:01] <wikibugs>	 (03PS9) 10Jbond: Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[12:54:11] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 33688 and 99 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:54:21] <icinga-wm>	 RECOVERY - DPKG on argon is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:54:21] <icinga-wm>	 RECOVERY - DPKG on elastic1054 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:54:21] <icinga-wm>	 RECOVERY - DPKG on an-worker1113 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[12:59:55] <icinga-wm>	 PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.466e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[13:01:17] <icinga-wm>	 RECOVERY - Thanos compact has not run on alert1001 is OK: (C)24 ge (W)12 ge 0.01697 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[13:01:35] <icinga-wm>	 PROBLEM - DPKG on elastic1032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[13:05:23] <wikibugs>	 (03PS1) 10Elukey: Add bigtop15 component for Analytics [puppet] - 10https://gerrit.wikimedia.org/r/647697
[13:09:09] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27068/console" [puppet] - 10https://gerrit.wikimedia.org/r/647697 (owner: 10Elukey)
[13:12:08] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10elukey) All right the host is now up and running in the analytics vlan, this is the procedure that I followed:  - ran the decom cookbook for an-tool1010 - manually rem...
[13:12:48] <wikibugs>	 (03CR) 10Elukey: Add bigtop15 component for Analytics [puppet] - 10https://gerrit.wikimedia.org/r/647697 (owner: 10Elukey)
[13:20:15] <wikibugs>	 (03PS20) 10Jbond: puppet-merge: add Repository class [puppet] - 10https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[13:20:37] <wikibugs>	 (03CR) 10Jbond: "updated" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249) (owner: 10Jbond)
[13:20:53] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] puppet-merge: add Repository class [puppet] - 10https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249) (owner: 10Jbond)
[13:21:56] <wikibugs>	 (03PS21) 10Jbond: puppet-merge: add Repository class [puppet] - 10https://gerrit.wikimedia.org/r/544943 (https://phabricator.wikimedia.org/T254249)
[13:26:56] <jbond42>	 !log disable puppet fleet wide to reboot puppet management infrastructure
[13:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:38] <icinga-wm>	 PROBLEM - Host puppetmaster2003 is DOWN: PING CRITICAL - Packet loss = 100%
[13:31:46] <jbond42>	 ^^ me, downtiming now
[13:32:38] <icinga-wm>	 RECOVERY - Host puppetmaster2003 is UP: PING OK - Packet loss = 0%, RTA = 31.89 ms
[13:32:43] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to wmf group for Matt Cleinman - https://phabricator.wikimedia.org/T269696 (10Aklapper) @MattCleinman: See https://phabricator.wikimedia.org/project/profile/1564/ for required info; in case that you followed some onboarding docs you may want them to link to tha...
[13:32:57] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[13:32:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:25] <wikibugs>	 (03PS1) 10Ppchelko: Configure $wgWikimediaApiPortalOAuthMetaApiURL in labs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647708
[13:41:57] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+2] Configure $wgWikimediaApiPortalOAuthMetaApiURL in labs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647708 (owner: 10Ppchelko)
[13:42:55] <wikibugs>	 (03Merged) 10jenkins-bot: Configure $wgWikimediaApiPortalOAuthMetaApiURL in labs. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647708 (owner: 10Ppchelko)
[13:44:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "> Patch Set 11:" (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/640571 (https://phabricator.wikimedia.org/T265526) (owner: 10Mstyles)
[13:45:54] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[13:45:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:00] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "I 've also added you to https://gerrit.wikimedia.org/r/admin/groups/3fdcf8fd0d569e90a3e9b39788a29f2c50d33be9,members you should have +2 ri" [deployment-charts] - 10https://gerrit.wikimedia.org/r/640571 (https://phabricator.wikimedia.org/T265526) (owner: 10Mstyles)
[13:46:45] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[13:46:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:50] <wikibugs>	 10Operations, 10ops-codfw, 10SRE-swift-storage: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (10fgiunchedi)
[13:50:22] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[13:50:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:29] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[13:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:52] <wikibugs>	 10Operations, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10hnowlan)
[14:05:49] <wikibugs>	 (03PS1) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[14:05:54] <wikibugs>	 10Operations, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10hnowlan)
[14:06:06] <wikibugs>	 (03PS2) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[14:06:18] <wikibugs>	 10Operations, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10hnowlan)
[14:07:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[14:07:52] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:07:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:02] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[14:10:02] <logmsgbot>	 !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
[14:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:12] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[14:10:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:12] <wikibugs>	 (03PS3) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[14:14:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[14:16:31] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:16:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:47] <wikibugs>	 (03Abandoned) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/641151 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[14:17:11] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[14:17:11] <wikibugs>	 (03PS4) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[14:17:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:10] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] GeoDNS: Remove old hack for Wikia RES datacenter [dns] - 10https://gerrit.wikimedia.org/r/647253 (owner: 10TK-999)
[14:21:20] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Publish Wikibase tarball releases on releases.wikimedia.org - https://phabricator.wikimedia.org/T268818 (10thcipriani) >>! In T268818#6656828, @ssingh wrote: > Thanks @Dzahn! >  > @thcipriani: Adding you to this task to see if you have any possible con...
[14:22:54] <wikibugs>	 (03CR) 10Jbond: "ready for review" [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[14:28:30] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10jbond) @KFrancis Are you able to confirm NDA status for Tobias, thanks
[14:30:22] <wikibugs>	 (03PS10) 10Jbond: Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[14:30:37] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27069/console" [puppet] - 10https://gerrit.wikimedia.org/r/645417 (https://phabricator.wikimedia.org/T267653) (owner: 10JMeybohm)
[14:30:42] <wikibugs>	 (03PS11) 10Jbond: Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[14:32:50] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] Add group wikibase-releasers & folder [puppet] - 10https://gerrit.wikimedia.org/r/643512 (https://phabricator.wikimedia.org/T268818) (owner: 10Tobias Andersson)
[14:32:52] <icinga-wm>	 RECOVERY - DPKG on elastic1032 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[14:33:44] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:33:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:26] <wikibugs>	 (03PS3) 10Jbond: admin: add toan user and add to wikibase-releasers group [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777)
[14:37:55] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 03+1 C: 03+1] "PCC is happy too, so +1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/645417 (https://phabricator.wikimedia.org/T267653) (owner: 10JMeybohm)
[14:38:25] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Publish Wikibase tarball releases on releases.wikimedia.org - https://phabricator.wikimedia.org/T268818 (10jbond) 05Open→03Resolved a:03jbond The change has now been merged all users listed in the original post should have the required access.  @...
[14:38:34] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[14:38:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:50] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet - https://phabricator.wikimedia.org/T268146 (10Ottomata) Thanks Luca!
[14:40:55] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Publish Wikibase tarball releases on releases.wikimedia.org - https://phabricator.wikimedia.org/T268818 (10toan) >>! In T268818#6682278, @jbond wrote: > The change has now been merged all users listed in the original post should have the required acces...
[14:42:06] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:42:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:10] <jbond42>	 !log re-enable puppet fleet wide
[14:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:28] <icinga-wm>	 PROBLEM - puppet last run on miscweb1002 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:50:08] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[14:50:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:51] <wikibugs>	 (03CR) 10Bstorm: wikireplicas: close all connections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[14:53:04] <icinga-wm>	 RECOVERY - puppet last run on miscweb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[14:54:02] <icinga-wm>	 PROBLEM - Check systemd state on pki2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:54:49] <wikibugs>	 (03CR) 10Bstorm: "> Patch Set 1: Code-Review-1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[14:59:49] <wikibugs>	 (03CR) 10VolkerE: [C: 03+1] "Have just +1 rights here" [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647305 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[15:03:32] <wikibugs>	 (03CR) 10David Caro: wikireplicas: close all connections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[15:03:33] <logmsgbot>	 !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
[15:03:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:54] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single
[15:03:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:02] <jbond42>	 !log reboot  deneb.codfw.wmnet
[15:04:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:20] <moritzm>	 !log restarting slapd on ldap replicas to pick up OpenSSL updates
[15:04:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:50] <icinga-wm>	 RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:04:52] <icinga-wm>	 PROBLEM - Check systemd state on idp1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:07:49] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:07:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:38] <wikibugs>	 (03CR) 10Volans: "LGTM, nit and a suggestion inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[15:11:20] <icinga-wm>	 RECOVERY - Check systemd state on idp1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:12:11] <Majavah>	 hi, is anyone with Logstash access around who could look up the stack trace for X9I6DgpAIC4AAHI1i4kAAAAR T269857? thanks!
[15:12:11] <stashbot>	 T269857: Fatal exception of type "TypeError" when viewing enwiki page "Draft:Richard_L._Greene" - https://phabricator.wikimedia.org/T269857
[15:12:18] <moritzm>	 !log restarting turnilo and hue to pick up OpenSSL security updates
[15:12:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/647697 (owner: 10Elukey)
[15:16:23] <hauskatze>	 Hello. For some reason a specific page on enwiki is displayed to me in a different MW Skin than the one I use. Any ideas?
[15:16:32] <wikibugs>	 (03CR) 10Bstorm: wikireplicas: close all connections (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[15:17:28] <wikibugs>	 10Operations, 10serviceops, 10MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), 10Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (10Pchelolo)
[15:19:29] <wikibugs>	 (03PS5) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[15:19:32] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add bigtop15 component for Analytics [puppet] - 10https://gerrit.wikimedia.org/r/647697 (owner: 10Elukey)
[15:19:42] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "updated thanks" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[15:21:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "Nice. Sorry it took so long to review this. It definitely removes some duplication" [deployment-charts] - 10https://gerrit.wikimedia.org/r/644787 (https://phabricator.wikimedia.org/T268434) (owner: 10JMeybohm)
[15:22:32] <wikibugs>	 (03CR) 10Bstorm: wikireplicas: close all connections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[15:23:34] <icinga-wm>	 PROBLEM - Host ms-be1030 is DOWN: PING CRITICAL - Packet loss = 100%
[15:26:19] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: https://lists.wikimedia.org/pipermail/wikija-l/ has broken encoding - https://phabricator.wikimedia.org/T269301 (10jbond) Do you have any idea when this may have broken?
[15:28:38] <icinga-wm>	 RECOVERY - Host ms-be1030 is UP: PING OK - Packet loss = 0%, RTA = 0.15 ms
[15:30:05] <jouncebot>	 CindyCicaleseWMF and bpirkle: #bothumor I ♥ Unicode. All rise for Core Platform Team Deployment deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1530).
[15:30:05] <jouncebot>	 CindyCicaleseWMF: A patch you scheduled for Core Platform Team Deployment is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[15:31:12] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[15:31:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:07] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10jbond) p:05Triage→03Medium
[15:32:43] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1030 - https://phabricator.wikimedia.org/T268036 (10Cmjohnson) I swapped the bbu with a new one and powered the server up
[15:33:52] <CindyCicaleseWMF>	 bpirkle and I are getting ready to deploy a config change. Let us know if there is anything going on to prevent that now.
[15:34:02] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:34:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:53] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] Add kafka-test1007 virtual machine [puppet] - 10https://gerrit.wikimedia.org/r/647109 (https://phabricator.wikimedia.org/T268202) (owner: 10Razzi)
[15:35:34] <icinga-wm>	 RECOVERY - HP RAID on ms-be1030 is OK: OK: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[15:36:17] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: k8s_infrastructure_users: Amend to support groups, avoid uid conflicts [puppet] - 10https://gerrit.wikimedia.org/r/647011 (https://phabricator.wikimedia.org/T269461)
[15:36:19] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: kubestage2*: Assign role [puppet] - 10https://gerrit.wikimedia.org/r/647728 (https://phabricator.wikimedia.org/T252185)
[15:36:51] <wikibugs>	 (03PS2) 10Cicalese: Configure API Portal permissions for launch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646862 (https://phabricator.wikimedia.org/T267953)
[15:37:07] <wikibugs>	 (03PS4) 10Cicalese: CommonSettings: OAuth 2.0 refresh tokens expire after 1 minute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/645308 (https://phabricator.wikimedia.org/T269152) (owner: 10Vlad.shapik)
[15:37:10] <wikibugs>	 (03CR) 10BPirkle: [C: 03+2] Configure API Portal permissions for launch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646862 (https://phabricator.wikimedia.org/T267953) (owner: 10Cicalese)
[15:38:31] <wikibugs>	 (03Merged) 10jenkins-bot: Configure API Portal permissions for launch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/646862 (https://phabricator.wikimedia.org/T267953) (owner: 10Cicalese)
[15:39:11] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] admin_ng: Generalization, prod values anf fixes [deployment-charts] - 10https://gerrit.wikimedia.org/r/644787 (https://phabricator.wikimedia.org/T268434) (owner: 10JMeybohm)
[15:40:17] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-es-exporter [puppet] - 10https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991)
[15:40:50] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10jbond) @KFrancis are you able to help with processing the NDA for Mohammed  We will also need [[ https://wikitech.wikimedia.org/wiki/SRE_Clinic_Duty#wmde_access | app...
[15:40:56] <wikibugs>	 (03Merged) 10jenkins-bot: admin_ng: Generalization, prod values anf fixes [deployment-charts] - 10https://gerrit.wikimedia.org/r/644787 (https://phabricator.wikimedia.org/T268434) (owner: 10JMeybohm)
[15:41:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Enable base::service_auto_restart for prometheus-es-exporter [puppet] - 10https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:43:22] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-es-exporter [puppet] - 10https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991)
[15:44:05] <wikibugs>	 (03PS3) 10Muehlenhoff: Enable base::service_auto_restart for prometheus-es-exporter [puppet] - 10https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991)
[15:45:01] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] calico: Add support for calico 3.x with kubernetes datastore [puppet] - 10https://gerrit.wikimedia.org/r/645417 (https://phabricator.wikimedia.org/T267653) (owner: 10JMeybohm)
[15:45:57] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[15:45:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:05] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/647730 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:47:39] <wikibugs>	 (03PS1) 10Elukey: aptrepo: add key for bigtop 1.5 [puppet] - 10https://gerrit.wikimedia.org/r/647732
[15:48:59] <wikibugs>	 (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27070/console" [puppet] - 10https://gerrit.wikimedia.org/r/647732 (owner: 10Elukey)
[15:49:52] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:49:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:14] <logmsgbot>	 !log cicalese@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 646862 Configure API Portal permissions for launch (duration: 01m 03s)
[15:50:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:40] <wikibugs>	 (03CR) 10BPirkle: [C: 03+2] CommonSettings: OAuth 2.0 refresh tokens expire after 1 minute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/645308 (https://phabricator.wikimedia.org/T269152) (owner: 10Vlad.shapik)
[15:50:44] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2243.codfw.wmnet
[15:50:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:30] <wikibugs>	 (03Merged) 10jenkins-bot: CommonSettings: OAuth 2.0 refresh tokens expire after 1 minute [mediawiki-config] - 10https://gerrit.wikimedia.org/r/645308 (https://phabricator.wikimedia.org/T269152) (owner: 10Vlad.shapik)
[15:51:59] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] sre.hosts.downtime: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/633484 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans)
[15:53:05] <moritzm>	 !log rebooting planet1002 (planet.wikimedia.org) for kernel update
[15:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:11] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[15:53:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:31] <wikibugs>	 (03PS7) 10Volans: sre.hosts.downtime: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/633484 (https://phabricator.wikimedia.org/T221212)
[15:54:26] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
[15:54:26] <wikibugs>	 10Operations, 10Prod-Kubernetes, 10serviceops, 10Kubernetes, 10Patch-For-Review: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (10JMeybohm) [puppet-private] (487bdca0) (jayme) Add calicoctl and calico-cni kubernetes users
[15:54:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:32] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mwdebug1003.eqiad.wmnet
[15:54:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:43] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mwdebug1003.eqiad.wmnet
[15:54:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:38] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:55:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:42] <logmsgbot>	 !log cicalese@deploy1001 Synchronized wmf-config/CommonSettings.php: 645308 CommonSettings: OAuth 2.0 refresh tokens expire after 1 minute (duration: 01m 02s)
[15:55:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:13] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] hiera: upgrade mc1032, mc2032 to buster [puppet] - 10https://gerrit.wikimedia.org/r/647672 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli)
[15:56:22] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1030 - https://phabricator.wikimedia.org/T268036 (10fgiunchedi) 05Open→03Resolved We're back, thanks @Cmjohnson !
[15:56:35] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: https://lists.wikimedia.org/pipermail/wikija-l/ has broken encoding - https://phabricator.wikimedia.org/T269301 (10Urbanecm) No, I'm sorry, that was my first time seeing the list.
[15:56:38] <wikibugs>	 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10jijiki) >>! In T245757#6681662, @MoritzMuehlenhoff wrote: > ffmpeg -i Wall_of_Death_-_Pitts_Todeswand_2017_...
[15:57:01] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.hosts.downtime: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/633484 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans)
[15:57:27] <wikibugs>	 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10jijiki)
[15:58:06] <wikibugs>	 10Operations, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10MoritzMuehlenhoff) "task md0_resync:20551 blocked for more than 120 seconds" smells like a hw issue. Best to open a DC ops ticket to get the controller and system firmware update and then retry to reimage,
[15:58:41] <mutante>	 !log mw2243 pooled - first jobrunner on buster
[15:58:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:05] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, possible improvement inline" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[15:59:07] <wikibugs>	 (03Merged) 10jenkins-bot: sre.hosts.downtime: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/633484 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans)
[15:59:24] <CindyCicaleseWMF>	 We're all done with our deployments!
[15:59:36] <wikibugs>	 (03CR) 10Elukey: [V: 03+1 C: 03+2] aptrepo: add key for bigtop 1.5 [puppet] - 10https://gerrit.wikimedia.org/r/647732 (owner: 10Elukey)
[15:59:43] <wikibugs>	 10Operations, 10Parsoid, 10serviceops: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10jijiki) @ssastry is there some way we could check that parse2001, which is running on buster now, works as expected?
[16:03:00] <wikibugs>	 (03PS6) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[16:03:12] <wikibugs>	 (03CR) 10Jbond: sre.puppet.renew-cert: convert to class API (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[16:03:17] <wikibugs>	 (03PS7) 10Jbond: sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712
[16:04:14] <Krinkle>	 mutante: a simple redirect data funnel should suffice for that one
[16:04:20] <Krinkle>	 To preserve current behavior
[16:04:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] sre.puppet.renew-cert: convert to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[16:05:00] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM and thanks a lot for jumping so quickly on the new API!" [cookbooks] - 10https://gerrit.wikimedia.org/r/647712 (owner: 10Jbond)
[16:05:04] <Krinkle>	 Oh long back scroll, I meant : https://gerrit.wikimedia.org/r/c/operations/puppet/+/524088
[16:06:49] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1] "here it is again what misled me: compiler on C:role::dnsbox comes up empty https://integration.wikimedia.org/ci/job/operations-puppet-cata" [puppet] - 10https://gerrit.wikimedia.org/r/645206 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn)
[16:07:02] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10jbond) p:05Triage→03Medium
[16:09:10] <wikibugs>	 10Operations, 10DC-Ops, 10RESTBase: restbase2009 reimaging issues - https://phabricator.wikimedia.org/T269853 (10jbond) p:05Triage→03Medium
[16:13:00] <wikibugs>	 10Operations, 10Parsoid, 10serviceops: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10ssastry) Run a few curl commands like these but while using parse2001 as a proxy. Here is the equivalent for scandium itself: ` curl -L -x http://scandium.eqiad.wmnet:80 http://en.wikipedia...
[16:13:18] <wikibugs>	 10Operations: Update tor's apt gpg key - https://phabricator.wikimedia.org/T269861 (10elukey)
[16:13:32] <logmsgbot>	 !log volans@cumin2001 START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
[16:13:32] <logmsgbot>	 !log volans@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
[16:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:37] <volans>	 rzl: ^^^
[16:13:39] <elukey>	 !log add thirdparty/bigtop15 packages to stretch-wikimedia
[16:13:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:46] <rzl>	 :o
[16:13:50] <elukey>	 wwoooooooooowwwwww
[16:14:08] <wikibugs>	 10Operations, 10Maps: Requesting access to maps for mbsantos and jgiannelos - https://phabricator.wikimedia.org/T269357 (10jbond) I'm going to drop the SRE-Access-Requests tag from this task as it doesn't look like there is an access request to action. Please re-add if I have missed something.
[16:14:10] <volans>	 to be tweaked
[16:14:19] <elukey>	 this is really awesome
[16:14:49] <wikibugs>	 (03Abandoned) 10RLazarus: When starting a cookbook, also log the args to IRC. [software/spicerack] - 10https://gerrit.wikimedia.org/r/549879 (owner: 10RLazarus)
[16:14:55] <rzl>	 volans: ^ ;)
[16:15:08] <volans>	 sorry it took soooooo long
[16:15:28] <rzl>	 ahaha it's all good
[16:15:34] <rzl>	 I'm really glad to see it
[16:15:41] <elukey>	 yep it is really great
[16:15:54] <wikibugs>	 10Operations: Update tor's apt gpg key - https://phabricator.wikimedia.org/T269861 (10MoritzMuehlenhoff) It can simply be removed; we no longer import/use the Tor packages. It was probably mis-imported when we migrated from local storage on apt1001 to the current Puppet approach.
[16:15:56] <wikibugs>	 10Operations: Update tor's apt gpg key - https://phabricator.wikimedia.org/T269861 (10Dzahn) Since torrelay1001 has been removed and from a glance at debmonitor.. I don't think we use the tor package anymore and can probably remove this component.
[16:16:09] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters: an-presto1004 shows only the NIC in the boot list - https://phabricator.wikimedia.org/T268951 (10Cmjohnson) This is scheduled for Monday 14Dec
[16:18:11] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/27074/dns1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/645206 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn)
[16:19:30] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 56.67 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:19:44] <icinga-wm>	 PROBLEM - Host ms-be1022 is DOWN: PING CRITICAL - Packet loss = 100%
[16:20:20] <wikibugs>	 10Operations, 10Parsoid, 10serviceops: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (10ssastry) Ah, because parse2001 and parse2002 are codfw, not eqiad. Anyway, here goes: ` ssastry@scandium:~$ curl -L -x http://parse2001.codfw.wmnet:80 http://en.wikipedia.org/w/rest.php/en....
[16:20:35] <wikibugs>	 (03CR) 10Dzahn: "noop confirmed on dns1001, dns3001" [puppet] - 10https://gerrit.wikimedia.org/r/645206 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn)
[16:21:42] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:00] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[16:22:06] <icinga-wm>	 PROBLEM - Host ms-be1022.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:22:15] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: https://lists.wikimedia.org/pipermail/wikija-l/ has broken encoding - https://phabricator.wikimedia.org/T269301 (10jbond) ack thx
[16:22:44] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 77.75 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:23:38] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[16:24:10] <wikibugs>	 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1022 smart storage battery failure; disk sdb possibly bad - https://phabricator.wikimedia.org/T267870 (10Cmjohnson) @fgiunchedi The battery has been replaced.  The SSD looks to be /dev/sda and is an SSD.  What do you want to do about the failed disk?
[16:25:26] <wikibugs>	 (03PS1) 10Dzahn: poolcounter: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647737 (https://phabricator.wikimedia.org/T266479)
[16:25:52] <icinga-wm>	 RECOVERY - Host ms-be1022 is UP: PING OK - Packet loss = 0%, RTA = 240.38 ms
[16:26:05] <wikibugs>	 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1022 smart storage battery failure; disk sdb possibly bad - https://phabricator.wikimedia.org/T267870 (10fgiunchedi) >>! In T267870#6682568, @Cmjohnson wrote: > @fgiunchedi The battery has been replaced.  The SSD looks to be /dev/sda and is an SSD.  What d...
[16:26:55] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Hardware): Move labstore1005 to 10Gbps rack and ethernet - https://phabricator.wikimedia.org/T266199 (10Cmjohnson) @bstorm I am sorry I confused which one was already in a 10G rack. I need to confirm that 1004 is in C2 and can stay and 1005...
[16:27:17] <wikibugs>	 (03PS1) 10Dzahn: otrs: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647738 (https://phabricator.wikimedia.org/T266479)
[16:27:41] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Hardware): Move or recable labstore1004 to 10Gbps rack (if needed) and ethernet - https://phabricator.wikimedia.org/T266202 (10Cmjohnson) This server can stay in C2 and can be converted anytime.
[16:27:46] <icinga-wm>	 RECOVERY - Host ms-be1022.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[16:27:52] <gehel>	 !log depooling wdqs1011, issues with categories endpoint
[16:27:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:24] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10Epic, 10cloud-services-team (Hardware): Move labstore1004 and labstore1005 to 10G Ethernet - https://phabricator.wikimedia.org/T266198 (10Cmjohnson) 05Stalled→03Resolved This is now a duplicate task, we have a few for the same thing. I am resolving this o...
[16:28:28] <godog>	 !log power reset ms-be1022 - stuck after boot - T267870
[16:28:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:31] <stashbot>	 T267870: ms-be1022 smart storage battery failure; disk sdb possibly bad - https://phabricator.wikimedia.org/T267870
[16:28:33] <wikibugs>	 10Operations, 10Gerrit-Privilege-Requests: Offboard Pablo-WMDE from WMF systems - https://phabricator.wikimedia.org/T268946 (10jbond)
[16:28:55] <wikibugs>	 (03PS1) 10Dzahn: wikimania_scholarships: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647739 (https://phabricator.wikimedia.org/T266479)
[16:29:00] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2040 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:29:11] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10WMDE-leszek) I approve this request on behalf of WMDE Engineering Managers. @Kris_Litson_WMDE is formally Mohammed's line manager (other branch in WMDE org chart tha...
[16:29:32] <icinga-wm>	 PROBLEM - Host ms-be1022 is DOWN: PING CRITICAL - Packet loss = 100%
[16:30:29] <wikibugs>	 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10jijiki) @Marostegui @LSobanski Where are we regarding the purchase?  @Gilles @WDoranWMF Given that we are...
[16:31:06] <icinga-wm>	 RECOVERY - Host ms-be1022 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
[16:31:20] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1022 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:31:45] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on ms-be1022 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T269862 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:31:49] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1022 - https://phabricator.wikimedia.org/T269862 (10ops-monitoring-bot)
[16:33:50] <icinga-wm>	 RECOVERY - HP RAID on ms-be1022 is OK: OK: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[16:34:36] <wikibugs>	 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1022 smart storage battery failure; disk sdb possibly bad - https://phabricator.wikimedia.org/T267870 (10Cmjohnson) The disk error did not come back
[16:40:02] <wikibugs>	 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1022 smart storage battery failure; disk sdb possibly bad - https://phabricator.wikimedia.org/T267870 (10Cmjohnson) 05Open→03Resolved Resolving this, if the error returns please re-open
[16:40:04] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on ms-be1022 - https://phabricator.wikimedia.org/T269862 (10Cmjohnson) 05Open→03Resolved a:03Cmjohnson
[16:47:38] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:48:43] <wikibugs>	 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10WDoranWMF) @jijiki Ok, thank you. @Gilles maybe we can chat it through? I'll try to find us a time.
[16:50:40] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[16:50:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:05] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10jbond) >>! In T269843#6682591, @WMDE-leszek wrote: > I approve this request on behalf of WMDE Engineering Managers. @Kris_Litson_WMDE is formally Mohammed's line man...
[16:53:29] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[16:53:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:11] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ml-serve200[1-4] - https://phabricator.wikimedia.org/T267670 (10Papaul)
[16:54:26] <effie>	 !log upgrade mc1032, mc2032 to buster - T213089
[16:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:29] <stashbot>	 T213089: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089
[16:55:23] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] hiera: upgrade mc1032, mc2032 to buster [puppet] - 10https://gerrit.wikimedia.org/r/647672 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli)
[16:57:34] <wikibugs>	 10Operations: Update tor's apt gpg key - https://phabricator.wikimedia.org/T269861 (10jbond) p:05Triage→03Medium
[16:58:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2040 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:58:31] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2032.codfw.wmnet ` The log can be...
[16:59:19] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:59:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:26] <logmsgbot>	 !log pt1979@cumin2001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[16:59:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:04] <jouncebot>	 jbond42 and cdanis: Your horoscope predicts another unfortunate Puppet request window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1700).
[17:01:04] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[17:01:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:28] <icinga-wm>	 PROBLEM - debmonitor.wikimedia.org requires authentication on debmonitor2002 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 400 Bad Request https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[17:01:46] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2019 is CRITICAL: CRITICAL - load average: 151.56, 108.64, 60.74 https://wikitech.wikimedia.org/wiki/Swift
[17:03:12] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2019 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:03:43] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:03:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:04:27] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+2] Update Chart.yaml source references [deployment-charts] - 10https://gerrit.wikimedia.org/r/647354 (owner: 10Ahmon Dancy)
[17:05:02] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be2019 is OK: OK - load average: 28.78, 69.03, 54.24 https://wikitech.wikimedia.org/wiki/Swift
[17:05:43] <wikibugs>	 (03Merged) 10jenkins-bot: Update Chart.yaml source references [deployment-charts] - 10https://gerrit.wikimedia.org/r/647354 (owner: 10Ahmon Dancy)
[17:07:30] <icinga-wm>	 PROBLEM - Aggregate IPsec Tunnel Status eqiad on alert1001 is CRITICAL: instance=mc1032 site=eqiad tunnel=mc2032_v4 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[17:10:06] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Effie Mouzeli Reported on T269693 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:13:57] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2032.codfw.wmnet with reason: REIMAGE
[17:13:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:56] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2032.codfw.wmnet with reason: REIMAGE
[17:15:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:03] <rzl>	 volans: ^ I will think fondly of you every time someone reimages a host from now on <3
[17:17:23] <volans>	 ahahah
[17:18:33] <wikibugs>	 10Operations: slapd fails to restart sometimes - https://phabricator.wikimedia.org/T269394 (10jbond) adding additional logs before they get rotated  ` lines=5 Dec  3 14:20:32 serpens puppet-agent[4040]: Computing checksum on file /etc/acmecerts/ldap/cae12c858fa6417d8d999bfaef1c25ec/ec-prime256v1.ocsp Dec  3 14:2...
[17:24:11] <wikibugs>	 10Operations, 10ops-eqiad: Interface errors on cr1-eqiad:xe-3/2/1 - https://phabricator.wikimedia.org/T267672 (10Cmjohnson) replaced the cable, gave it the same cable number, removed the old fiber.  Cleared the interface statistics on cr1.
[17:24:11] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2032.codfw.wmnet'] `  and were **ALL** successful.
[17:29:38] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] hiera: install redis on mc1032,mc2032 [puppet] - 10https://gerrit.wikimedia.org/r/647750 (owner: 10Effie Mouzeli)
[17:31:27] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission es1016.eqiad.wmnet - https://phabricator.wikimedia.org/T268812 (10Cmjohnson) 05Open→03Resolved removed from rack, updated netbox and ran the script, confirmed network ports were already removed.
[17:33:30] <icinga-wm>	 RECOVERY - Aggregate IPsec Tunnel Status eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[17:37:39] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10Cmjohnson) @elukey This is what I have currently   +2 servers in A2 +2 servers in A4 +2 servers in A7 ** this is new +2 servers in B2 +2 servers in...
[17:40:47] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1032.eqiad.wmnet ` The log can be...
[17:45:28] <Pchelolo>	 would anybody mind if I deploy some little mw config change now?
[17:45:41] <wikibugs>	 (03CR) 10Cicalese: [C: 03+1] "Looks good to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647751 (https://phabricator.wikimedia.org/T269809) (owner: 10Ppchelko)
[17:46:46] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Hardware): Move labstore1005 to 10Gbps rack and ethernet - https://phabricator.wikimedia.org/T266199 (10Cmjohnson) @bstorm I just found a space for labstore1005. Let's schedule a move for Monday if that works for you, It will go to C4  1004...
[17:47:03] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Ryan Kemper https://phabricator.wikimedia.org/T269872 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:48:50] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+2] Enable wgRestAllowCrossOriginCookieAuth for meta in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647751 (https://phabricator.wikimedia.org/T269809) (owner: 10Ppchelko)
[17:49:56] <wikibugs>	 (03Merged) 10jenkins-bot: Enable wgRestAllowCrossOriginCookieAuth for meta in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647751 (https://phabricator.wikimedia.org/T269809) (owner: 10Ppchelko)
[17:50:26] <icinga-wm>	 PROBLEM - Aggregate IPsec Tunnel Status codfw on alert1001 is CRITICAL: instance=mc2032 site=codfw tunnel=mc1032_v4 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[17:53:50] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10Event-Platform, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (10fdans) p:05Triage→03Medium
[17:54:20] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1032.eqiad.wmnet with reason: REIMAGE
[17:54:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:42] <icinga-wm>	 PROBLEM - HP RAID on ms-be1022 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:2 - Failed: 2I:4:1 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[17:54:43] <logmsgbot>	 !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: gerrit:647751 T269809 (duration: 01m 05s)
[17:54:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:45] <stashbot>	 T269809: Clients not displaying in production - https://phabricator.wikimedia.org/T269809
[17:56:20] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1032.eqiad.wmnet with reason: REIMAGE
[17:56:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:50] <icinga-wm>	 PROBLEM - Memcached on mwdebug1002 is CRITICAL: connect to address 10.64.0.46 and port 11210: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[17:58:33] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10WMDE-leszek) Thanks for elaboration @jbond. This process was indeed established with @MoritzMuehlenhoff, and we (WMDE managers) had in mind engineering staff. For le...
[17:59:10] <icinga-wm>	 RECOVERY - Aggregate IPsec Tunnel Status codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[18:00:04] <jouncebot>	 chrisalbon and accraze: (Dis)respected human, time to deploy Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1800). Please do the needful.
[18:11:11] <wikibugs>	 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1032.eqiad.wmnet'] `  and were **ALL** successful.
[18:13:02] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:15:27] <wikibugs>	 10Operations, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Hardware): Move labstore1005 to 10Gbps rack and ethernet - https://phabricator.wikimedia.org/T266199 (10Bstorm) I don't know if the re-image is ready at this time (haven't synced up with @Andrew on that), so today would probably not have wo...
[18:33:02] <wikibugs>	 (03PS2) 10Bstorm: wikireplicas: close all connections [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620)
[18:34:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wikireplicas: close all connections [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[18:37:22] <wikibugs>	 (03PS3) 10Bstorm: wikireplicas: close all connections [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620)
[18:37:36] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] Remove apache config for zero.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/524088 (https://phabricator.wikimedia.org/T187716) (owner: 10MaxSem)
[18:40:50] <wikibugs>	 (03PS1) 10Razzi: kafka: add kafka-test1007 to kafka-test cluster [puppet] - 10https://gerrit.wikimedia.org/r/647758 (https://phabricator.wikimedia.org/T268202)
[18:41:05] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:41:23] <wikibugs>	 (03CR) 10Bstorm: "I think this corrects the confusion and makes it better https://gerrit.wikimedia.org/r/c/operations/puppet/+/647419/1..3/modules/profile/f" [puppet] - 10https://gerrit.wikimedia.org/r/647419 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm)
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T1900).
[19:00:04] <jouncebot>	 RoanKattouw: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:01:00] <RoanKattouw>	 I'll deploy it myself
[19:01:05] <RoanKattouw>	 And add a second one too
[19:01:05] <mutante>	 hashar: what a timing.. here I am 
[19:01:18] <Urbanecm>	 RoanKattouw: assuming you'll deploy, can you deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/645994 as well?
[19:01:25] <RoanKattouw>	 Urbanecm: Will do
[19:01:29] <Urbanecm>	 thank you!
[19:01:32] <hashar>	 mutante: good morning ;)
[19:01:37] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] RCFilters: Temporarily fix TagItemWidget remove button size [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647305 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[19:02:36] <wikibugs>	 (03PS1) 10Catrope: Add banner module to the homepage [extensions/GrowthExperiments] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647635 (https://phabricator.wikimedia.org/T269804)
[19:02:46] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] Add banner module to the homepage [extensions/GrowthExperiments] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647635 (https://phabricator.wikimedia.org/T269804) (owner: 10Catrope)
[19:03:41] <Urbanecm>	 RoanKattouw: oh, that functionality is ready already?
[19:04:07] <RoanKattouw>	 Urbanecm: tgr works fast :)
[19:04:17] <Urbanecm>	 yup :)
[19:04:27] <RoanKattouw>	 I merged it yesterday, I just forgot to create the backport and schedule it
[19:04:43] <Urbanecm>	 cool :)
[19:04:57] <wikibugs>	 (03PS6) 10Dzahn: doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/620368 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[19:05:19] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] Add PoolCounter settings for DPL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/645994 (https://phabricator.wikimedia.org/T263220) (owner: 10Brian Wolff)
[19:05:22] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/620368 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[19:06:29] <wikibugs>	 (03Merged) 10jenkins-bot: Add PoolCounter settings for DPL [mediawiki-config] - 10https://gerrit.wikimedia.org/r/645994 (https://phabricator.wikimedia.org/T263220) (owner: 10Brian Wolff)
[19:07:56] <mutante>	 !log doc1001 - restarted apache after docroot change
[19:07:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:08:20] <RoanKattouw>	 Urbanecm: Ready for testing on mwdebug1002
[19:08:22] <wikibugs>	 (03PS4) 10Hashar: doc: relocate published documents to /srv/doc [puppet] - 10https://gerrit.wikimedia.org/r/625644 (https://phabricator.wikimedia.org/T149924)
[19:09:01] <Urbanecm>	 RoanKattouw: trying to test
[19:09:17] <Urbanecm>	 (I'm not 100% sure it can be tested, but I can at least verify DPL doesn't breek)
[19:09:19] <Urbanecm>	 *break
[19:09:32] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10elukey) I reviewed rack settings for hadoop, this is my proposal:  >>! In T260445#6682850, @Cmjohnson wrote: > @elukey This is what I have currently...
[19:11:50] <Urbanecm>	 RoanKattouw: DynamicPageList still does its work at ruwikinews, please sync
[19:13:59] <RoanKattouw>	 Thanks, syncing
[19:14:23] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add banner module to the homepage [extensions/GrowthExperiments] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647635 (https://phabricator.wikimedia.org/T269804) (owner: 10Catrope)
[19:15:22] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/PoolCounterSettings.php: Add PoolCounter settings for DPL (T263220) (duration: 01m 05s)
[19:15:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:25] <stashbot>	 T263220: Limit concurrency of DPL queries - https://phabricator.wikimedia.org/T263220
[19:17:08] <wikibugs>	 (03PS1) 10Dzahn: Revert "doc: switch to scap DocumentRoot" [puppet] - 10https://gerrit.wikimedia.org/r/647636
[19:17:47] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "doc: switch to scap DocumentRoot" [puppet] - 10https://gerrit.wikimedia.org/r/647636 (owner: 10Dzahn)
[19:18:25] <Urbanecm>	 RoanKattouw: can you revert please? I'm concerned about messages like `Pool key 'nowait:dpl-query:enwikinews' (DPL): Error reading from pool counter server 10.64.0.151.` that just started to appear in the logs
[19:18:51] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Revert "doc: switch to scap DocumentRoot" [puppet] - 10https://gerrit.wikimedia.org/r/647636 (owner: 10Dzahn)
[19:18:51] <RoanKattouw>	 OK, will do
[19:19:48] <wikibugs>	 (03PS1) 10Catrope: Revert "Add PoolCounter settings for DPL" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647637
[19:19:54] <Kizule>	 Hi, why hasn't CI been reconfigured yet?
[19:19:58] <wikibugs>	 (03PS2) 10Catrope: Revert "Add PoolCounter settings for DPL" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647637 (https://phabricator.wikimedia.org/T263220)
[19:20:06] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] Revert "Add PoolCounter settings for DPL" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647637 (https://phabricator.wikimedia.org/T263220) (owner: 10Catrope)
[19:21:00] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Add PoolCounter settings for DPL" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647637 (https://phabricator.wikimedia.org/T263220) (owner: 10Catrope)
[19:21:28] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on wdqs2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Ryan Kemper https://phabricator.wikimedia.org/T269872 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:21:28] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on wdqs2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Ryan Kemper https://phabricator.wikimedia.org/T269872 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:23:16] <wikibugs>	 (03PS5) 10Hashar: doc: relocate published documents to /srv/doc [puppet] - 10https://gerrit.wikimedia.org/r/625644 (https://phabricator.wikimedia.org/T149924)
[19:23:18] <wikibugs>	 (03PS4) 10Hashar: doc: stop backup for old doc directory [puppet] - 10https://gerrit.wikimedia.org/r/625649 (https://phabricator.wikimedia.org/T149924)
[19:23:20] <wikibugs>	 (03PS4) 10Hashar: doc: remove legacy doc directory [puppet] - 10https://gerrit.wikimedia.org/r/625650 (https://phabricator.wikimedia.org/T149924)
[19:23:22] <wikibugs>	 (03PS1) 10Hashar: doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/647763 (https://phabricator.wikimedia.org/T149924)
[19:23:49] <Kizule>	 hashar?
[19:24:12] <wikibugs>	 (03PS1) 10Catrope: Guard more singleton() calls with globalArticleInstance() checks [extensions/FlaggedRevs] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647638 (https://phabricator.wikimedia.org/T269608)
[19:24:39] <wikibugs>	 (03PS2) 10Catrope: Add banner module to the homepage [extensions/GrowthExperiments] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647635 (https://phabricator.wikimedia.org/T269804)
[19:24:43] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/PoolCounterSettings.php: Revert PoolCounter settings for DPL (T263220) (duration: 01m 03s)
[19:24:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:47] <stashbot>	 T263220: Limit concurrency of DPL queries - https://phabricator.wikimedia.org/T263220
[19:25:06] <wikibugs>	 (03PS1) 10Andrew Bogott: nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511)
[19:25:08] <wikibugs>	 (03CR) 10Hashar: "That one is broken somehow, the documentation links were giving a 404 (ex: https://doc.wikimedia.org/mediawiki-core/master/php/ )." [puppet] - 10https://gerrit.wikimedia.org/r/647763 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[19:25:11] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/647763 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[19:25:38] <Urbanecm>	 thanks RoanKattouw 
[19:26:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/647763 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[19:27:44] <wikibugs>	 (03PS2) 10Andrew Bogott: nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511)
[19:29:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott)
[19:30:39] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/27076/otrs1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/647738 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:30:50] <wikibugs>	 (03PS2) 10Dzahn: otrs: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647738 (https://phabricator.wikimedia.org/T266479)
[19:33:15] <wikibugs>	 (03PS5) 10Jeena Huneidi: 0.1.0: Add ENABLE_DEBUG_LOGGING setting [deployment-charts] - 10https://gerrit.wikimedia.org/r/647355 (owner: 10Ahmon Dancy)
[19:33:48] <wikibugs>	 (03Merged) 10jenkins-bot: RCFilters: Temporarily fix TagItemWidget remove button size [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647305 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[19:35:23] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] "Thanks for adding this!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/647355 (owner: 10Ahmon Dancy)
[19:36:51] <wikibugs>	 (03CR) 10Dzahn: "noop on otrs1001" [puppet] - 10https://gerrit.wikimedia.org/r/647738 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:37:05] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/27078/miscweb1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/647739 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:37:10] <wikibugs>	 (03Merged) 10jenkins-bot: 0.1.0: Add ENABLE_DEBUG_LOGGING setting [deployment-charts] - 10https://gerrit.wikimedia.org/r/647355 (owner: 10Ahmon Dancy)
[19:37:14] <wikibugs>	 (03PS2) 10Dzahn: wikimania_scholarships: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647739 (https://phabricator.wikimedia.org/T266479)
[19:38:36] <wikibugs>	 (03PS2) 10Dzahn: poolcounter: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647737 (https://phabricator.wikimedia.org/T266479)
[19:39:15] <wikibugs>	 (03CR) 10Dzahn: "noop on miscweb1002" [puppet] - 10https://gerrit.wikimedia.org/r/647739 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:40:31] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/27079/orespoolcounter2003.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/647737 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:40:58] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/27080/poolcounter1005.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/647737 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:41:54] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.21/resources/src/mediawiki.rcfilters/styles/mw.rcfilters.ui.FilterTagMultiselectWidget.less: Work around OOUI bug breaking RCFilters UI (T269477) (duration: 01m 04s)
[19:41:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:58] <stashbot>	 T269477: [wmf.21-regression] RC/Watchlist -misaligned  close icon in oo-ui-tagMultiselectWidget-group  - https://phabricator.wikimedia.org/T269477
[19:42:37] <wikibugs>	 (03CR) 10Dzahn: "noop on poolcounter1005" [puppet] - 10https://gerrit.wikimedia.org/r/647737 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn)
[19:42:52] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] Guard more singleton() calls with globalArticleInstance() checks [extensions/FlaggedRevs] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647638 (https://phabricator.wikimedia.org/T269608) (owner: 10Catrope)
[19:44:56] <wikibugs>	 (03PS2) 10Dave Pifke: webperf: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647031 (https://phabricator.wikimedia.org/T266479)
[19:47:11] <wikibugs>	 (03CR) 10Dave Pifke: "Thanks for the re: trick!" [puppet] - 10https://gerrit.wikimedia.org/r/647031 (https://phabricator.wikimedia.org/T266479) (owner: 10Dave Pifke)
[19:48:47] <wikibugs>	 (03Merged) 10jenkins-bot: Guard more singleton() calls with globalArticleInstance() checks [extensions/FlaggedRevs] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647638 (https://phabricator.wikimedia.org/T269608) (owner: 10Catrope)
[19:49:52] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/27082/  want it merged now?" [puppet] - 10https://gerrit.wikimedia.org/r/647031 (https://phabricator.wikimedia.org/T266479) (owner: 10Dave Pifke)
[19:51:14] <wikibugs>	 (03PS3) 10Andrew Bogott: nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511)
[19:53:01] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:53:26] <mutante>	 dpifke: wanna get that done with right now?
[19:53:41] <mutante>	 pretty confident it will be noop
[19:54:00] <dpifke>	 I'm confused by "Resources only in the old catalog" in the PCC output.
[19:54:11] <wikibugs>	 (03Merged) 10jenkins-bot: Add banner module to the homepage [extensions/GrowthExperiments] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647635 (https://phabricator.wikimedia.org/T269804) (owner: 10Catrope)
[19:54:32] <dpifke>	 That seems to imply to me that the packages would no longer be installed?
[19:54:52] <mutante>	 dpifke: it's because require_package creates a resource for each package that everything else is dependent on
[19:55:01] <mutante>	 it doesn't mean it will remove the packages
[19:55:15] <mutante>	 I know it looks weird but I just did the same thing for like 3 other places
[19:55:23] <dpifke>	 Right, but will they be added if we ever try to deploy on a new host?
[19:56:16] <dpifke>	 No objection to merging if you're confident it's correct.  Mostly trying to understand for my own edification. :)
[19:57:02] <mutante>	 dpifke: yes, if you go to "change catalog" https://puppet-compiler.wmflabs.org/compiler1001/27082/webperf1001.eqiad.wmnet/change.webperf1001.eqiad.wmnet.pson
[19:57:12] <mutante>	 you can see there is still:
[19:57:13] <mutante>	   "type": "Package",
[19:57:13] <mutante>	       "title": "python3-tz",
[19:57:15] <mutante>	 for example
[19:57:51] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:57:52] <mutante>	 so you could find each of the packages in the full catalog after the change if you wanted to
[19:58:21] <dpifke>	 Ah-ha.  Makes sense.
[19:59:58] <mutante>	 dpifke: the actual explanation is that require_package does this:
[19:59:59] <mutante>	 # Create class scope
[19:59:59] <mutante>	       cls = Puppet::Parser::Resource.new(
[19:59:59] <mutante>	           'class', class_name, :scope => compiler.topscope)
[20:00:04] <jouncebot>	 twentyafterfour and marxarelli: Dear deployers, time to do the Mediawiki train - American Version deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201210T2000).
[20:00:05] <RoanKattouw>	 I'm still doing the last syncs for the backport window, they're delayed due to a CI issue
[20:00:08] <RoanKattouw>	 ( twentyafterfour  )
[20:00:08] <mutante>	 so each package is a separate class
[20:00:22] <mutante>	 but with ensure_packages it's a resource, not its own class for each package
[20:00:30] <twentyafterfour>	 RoanKattouw: ok
[20:02:38] <twentyafterfour>	 ]'
[20:02:42] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] webperf: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/647031 (https://phabricator.wikimedia.org/T266479) (owner: 10Dave Pifke)
[20:03:31] <hauskatze>	 twentyafterfour: is there some cache for phab subproject membership? After leaving a project I still remain as "member" of its subprojects
[20:04:02] <twentyafterfour>	 hauskatze: there shouldn't be 
[20:04:22] <dpifke>	 Thanks!  If you're in a +2 mood, this is ready (and should be safe to merge whenever) as well: https://gerrit.wikimedia.org/r/c/operations/puppet/+/636759 :)
[20:04:30] <twentyafterfour>	 hauskatze: there may be a bug though 
[20:04:30] <hauskatze>	 e.g. I removed myself from #wiki-setup but I still appear as member of its subprojects
[20:04:32] <hauskatze>	 smh
[20:04:44] <mutante>	 dpifke: hope that makes sense ^ and I merged and confirmed on both webperf1001 and webperf1002 nothing changed
[20:04:46] <twentyafterfour>	 hauskatze: I'll test a bit 
[20:04:47] <hauskatze>	 twentyafterfour: random features :)
[20:04:51] <mutante>	 hauskatze: watcher vs member maybe?
[20:04:58] <hauskatze>	 none of them
[20:05:11] <dpifke>	 Makes lots of sense, appreciate the explanation.
[20:05:43] <wikibugs>	 (03CR) 10Dzahn: "confirmed noop on webperf1001 and webperf1002" [puppet] - 10https://gerrit.wikimedia.org/r/647031 (https://phabricator.wikimedia.org/T266479) (owner: 10Dave Pifke)
[20:06:10] <wikibugs>	 (03PS1) 10Ryan Kemper: categories: fix prom exporter's broken namespace [puppet] - 10https://gerrit.wikimedia.org/r/647774 (https://phabricator.wikimedia.org/T269872)
[20:06:24] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.21/extensions/FlaggedRevs/: Guard more singleton() calls with globalArticleInstance() checks (T269608, to unbreak CI in wmf.21) (duration: 01m 04s)
[20:06:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:06:29] <stashbot>	 T269608: Several failing tests in Wikibase CI (CentralAuthApiSessionProviderTest, CentralAuthHeaderSessionProviderTest, EditEntityActionTest, ViewEntityActionTest, HtmlPageLinkRendererEndHookHandlerTest) - https://phabricator.wikimedia.org/T269608
[20:06:57] <RoanKattouw>	 One more, and then I'll be done
[20:07:02] <mutante>	 dpifke: yea, i'll merge that as well. but let's see if that actually removes those modules, I think not without some manual action
[20:07:52] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.21/extensions/GrowthExperiments/: Add banner module to the homepage (T269804) (duration: 01m 03s)
[20:07:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:57] <stashbot>	 T269804: Banner module on the Growth homepage - https://phabricator.wikimedia.org/T269804
[20:09:30] <RoanKattouw>	 twentyafterfour: I'm done, it's all yours
[20:10:12] <wikibugs>	 (03CR) 10Hashar: gerrit: use proper hostname on replica hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar)
[20:10:37] <wikibugs>	 (03PS4) 10Hashar: gerrit: use proper hostname on replica hosts [puppet] - 10https://gerrit.wikimedia.org/r/643919
[20:10:57] <wikibugs>	 (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar)
[20:13:44] <dpifke>	 mutante: I've got to step away for a few minutes to walk the dog and drop off a package, but I can do whatever cleanup is needed (e.g. running a2dismod) after that.
[20:15:27] <twentyafterfour>	 thanks RoanKattouw
[20:15:33] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] arclamp: add CORS header and clean up modules [puppet] - 10https://gerrit.wikimedia.org/r/636759 (owner: 10Dave Pifke)
[20:15:44] <mutante>	 dpifke: sounds good, yes, go ahead! I am making sure nothing breaks and leave the cleanup to you.
[20:16:03] <dpifke>	 Thanks!
[20:16:07] <mutante>	 np
[20:19:14] <wikibugs>	 (03CR) 10Hashar: "https://puppet-compiler.wmflabs.org/compiler1003/654/" [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar)
[20:25:41] <logmsgbot>	 !log hashar@deploy1001 Started deploy [integration/docroot@fdf0917]: (no justification provided)
[20:25:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:48] <logmsgbot>	 !log hashar@deploy1001 Finished deploy [integration/docroot@fdf0917]: (no justification provided) (duration: 00m 06s)
[20:25:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:56] <wikibugs>	 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10KFrancis) >>! In T269843#6682457, @jbond wrote: > @KFrancis are you able to help with processin the NDA for Mohammed >  > We will also need [[ https://wikitech.wikim...
[20:27:29] <wikibugs>	 (03CR) 10Dzahn: "new config snippet was added on webperf2002  and service got refreshed but for example the php7.0 apache module is still enabled. that wil" [puppet] - 10https://gerrit.wikimedia.org/r/636759 (owner: 10Dave Pifke)
[20:31:58] <wikibugs>	 (03PS1) 10Mforns: Migrate Growth schemas from EventLogging to EventGate on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647782 (https://phabricator.wikimedia.org/T267333)
[20:32:47] <marxarelli>	 twentyafterfour: o/ are you holding due to https://phabricator.wikimedia.org/T269477 ?
[20:34:11] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10KFrancis) >>! In T269777#6682253, @jbond wrote: > @KFrancis Are you able to confirm NDA status for Tobias, thanks  @jbond I was not able to find an NDA on r...
[20:36:52] <wikibugs>	 10Operations, 10Domains, 10Traffic: URL to redirect to upcoming Wikipedia Birthday page on wikimediafoundation.org - https://phabricator.wikimedia.org/T264367 (10hdothiduc) Awesome, thank you very much! The redirect (from wikimediafoundation.org/wikipedia20 to wikimediafoundation.org) happens to fast that ba...
[20:39:05] <twentyafterfour>	 marxarelli: yes 
[20:39:56] <twentyafterfour>	 I thought the fix was deployed but Volker_E says it's still a blocker 
[20:40:17] <Volker_E>	 twentyafterfour: marxarelli: I'm on it
[20:40:58] <Volker_E>	 the fix was only catching the most popular instance of the widget, not the several others
[20:41:39] <Volker_E>	 this was the misconception. Lukas Werkmeister captured the other instances last night, when I was already off after delivering the quick-fix
[20:41:57] <Volker_E>	 and I haven't had time for anything more until now
[20:42:08] <wikibugs>	 (03PS4) 10Andrew Bogott: nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511)
[20:44:55] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova-compute/cinder/ceph: add a cinder-specific ceph uuid [puppet] - 10https://gerrit.wikimedia.org/r/647764 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott)
[20:46:26] <ottomata>	 twentyafterfour: marxarelli  i have a config change for testwiki i'd like to deploy; is the train clear?
[20:46:34] <twentyafterfour>	 thanks Volker_E, just let me know when a patch is ready.  
[20:46:40] <twentyafterfour>	 ottomata: train is on hold so go ahead 
[20:46:48] <ottomata>	 k danke
[20:47:09] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Migrate Growth schemas from EventLogging to EventGate on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647782 (https://phabricator.wikimedia.org/T267333) (owner: 10Mforns)
[20:48:01] <wikibugs>	 (03Merged) 10jenkins-bot: Migrate Growth schemas from EventLogging to EventGate on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647782 (https://phabricator.wikimedia.org/T267333) (owner: 10Mforns)
[20:51:00] <wikibugs>	 (03PS1) 10Ottomata: wgEventLoggingSchemas - remove SpecialMuteSubmit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647785 (https://phabricator.wikimedia.org/T268517)
[20:52:18] <wikibugs>	 (03PS1) 10Andrew Bogott: rbd_libvirt: fix installation of the cinder ceph secret [puppet] - 10https://gerrit.wikimedia.org/r/647786 (https://phabricator.wikimedia.org/T269511)
[20:52:26] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] wgEventLoggingSchemas - remove SpecialMuteSubmit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/647785 (https://phabricator.wikimedia.org/T268517) (owner: 10Ottomata)
[20:53:43] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01063 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:54:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] rbd_libvirt: fix installation of the cinder ceph secret [puppet] - 10https://gerrit.wikimedia.org/r/647786 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott)
[20:54:35] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Migrate Growth EventLogging schemas to Event Platform on testwiki - T267333 (duration: 01m 03s)
[20:54:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:39] <stashbot>	 T267333: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333
[20:56:54] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install aqs101[0-5] - https://phabricator.wikimedia.org/T267414 (10wiki_willy) a:03Cmjohnson Hardware arrived Dec 3
[21:08:37] <Volker_E>	 twentyafterfour: 
[21:08:51] <Volker_E>	 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/647790 needs to be merged first
[21:08:59] <RoanKattouw>	 On it
[21:09:53] <RoanKattouw>	 I'm going to take the liberty of cherry-picking that immediately, without waiting for it to merge
[21:10:07] <wikibugs>	 (03PS1) 10Catrope: OOUI: Backport I18799e54ef46232a54d36e86e2b3d08c3ee0a3d5 [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647641 (https://phabricator.wikimedia.org/T269477)
[21:10:15] <wikibugs>	 (03CR) 10Catrope: [C: 03+2] OOUI: Backport I18799e54ef46232a54d36e86e2b3d08c3ee0a3d5 [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647641 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[21:10:54] <RoanKattouw>	 That should speed up unblocking the train, because gate-and-submit for wmf.21 patches in core took ~25 minutes when I did my backport earlier today
[21:19:20] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install aqs101[0-5] - https://phabricator.wikimedia.org/T267414 (10Cmjohnson) @Jclark-ctr where are these?
[21:25:59] <twentyafterfour>	 RoanKattouw: yeah unfortunately our test suite has gotten pretty slow. 
[21:26:22] <twentyafterfour>	 I mean it's always been kinda slow as long as I can remember but it seems to be trending in the direction of slower
[21:27:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 313 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:28:08] <twentyafterfour>	 whoa that's quite a spike in fatals
[21:28:22] <DannyS712>	 fixing https://phabricator.wikimedia.org/T155147 might help with the slow tests
[21:28:59] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 3 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:33:44] <thcipriani>	 seems like that one spikes every now and then. Looks like there's a patch for it on the task: https://phabricator.wikimedia.org/T249745
[21:36:13] <wikibugs>	 (03PS1) 10Andrew Bogott: Cinder: set default quotas to be very low [puppet] - 10https://gerrit.wikimedia.org/r/647795 (https://phabricator.wikimedia.org/T269511)
[21:36:53] <hauskatze>	 twentyafterfour: fwiw I filed T269893 for docs :)
[21:36:53] <stashbot>	 T269893: Phabricator keeps displaying my account as a "shadow" member of milestones after leaving parent project - https://phabricator.wikimedia.org/T269893
[21:37:09] <hauskatze>	 title is crap, sorry; can't find the right words right now
[21:37:27] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Cinder: set default quotas to be very low [puppet] - 10https://gerrit.wikimedia.org/r/647795 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott)
[21:38:18] <twentyafterfour>	 hauskatze: I think this is a bug, I'll look into pushing it upstream or if it's easy I may patch it locally
[21:39:09] <hauskatze>	 twentyafterfour: as you think it's best. I'm really sorry to put yet-another-task in your backlog
[21:39:11] <hauskatze>	 :(
[21:40:23] <twentyafterfour>	 hauskatze: no problem, I think it's a legit bug in upstream phabricator 
[21:42:03] <wikibugs>	 (03Merged) 10jenkins-bot: OOUI: Backport I18799e54ef46232a54d36e86e2b3d08c3ee0a3d5 [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647641 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[21:44:54] <wikibugs>	 (03CR) 10VolkerE: [C: 03+1] OOUI: Backport I18799e54ef46232a54d36e86e2b3d08c3ee0a3d5 [core] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/647641 (https://phabricator.wikimedia.org/T269477) (owner: 10Catrope)
[21:46:04] <DannyS712>	 Pchelolo is it possible to see the parser cache key that is being shown clientside?
[21:46:36] <Pchelolo>	 yes DannyS712. every page's HTML has an HTML comment <!-- Saved in parser cache ...
[21:46:51] <Pchelolo>	 if it's served from the cache
[21:47:44] <DannyS712>	 do we have caching for old revisions yet? Trying to figure out T269860
[21:47:45] <stashbot>	 T269860: Transcluding {{Special:ListFiles/$username}} on any page overrides the skin selected by the viewer in Preferences, makes it Vector-only - https://phabricator.wikimedia.org/T269860
[21:51:48] <hauskatze>	 twentyafterfour: thanks; I saw you comment on 13478 upstream
[21:57:33] <sbassett>	 Hey twentyafterfour: how goes the .21 rollout?  Have an updated security patch for the second core patch currently applied (02-T120883.patch) we'd like to try soon, if possible.
[21:58:23] <twentyafterfour>	 sbassett: train has been blocked, was about to be unblocked but you can update the security patch now if you'd like 
[21:59:03] <sbassett>	 Ok, I can wait if need be.  New patch will definitely require some testing since the previous one blew up a bit last time.
[21:59:59] <twentyafterfour>	 sbassett: I say do it now unless RoanKattouw is deploying?  https://gerrit.wikimedia.org/r/c/mediawiki/core/+/647641 merged but not sure if it got deployed
[22:00:10] <sbassett>	 Ok
[22:00:19] <RoanKattouw>	 Oh whoops sorry for dropping the ball on that
[22:00:33] <twentyafterfour>	 RoanKattouw: it's ok I was about to ask then sbassett showed up ;) 
[22:00:45] <twentyafterfour>	 RoanKattouw: I can deploy that after sbassett tests the security patch 
[22:00:56] <RoanKattouw>	 I figured I might as well start loading the dishwasher and then I forgot about this patch
[22:02:09] <sbassett>	 Ok, DannyS712: you around for testing?
[22:03:34] <logmsgbot>	 !log razzi@cumin1001 START - Cookbook sre.ganeti.makevm
[22:03:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:47] <Pchelolo>	 DannyS712: old revision cache is only enabled for group0 - test wikis, mediawiki.org etc
[22:03:55] <wikibugs>	 10Operations, 10Domains, 10Traffic: URL to redirect to upcoming Wikipedia Birthday page on wikimediafoundation.org - https://phabricator.wikimedia.org/T264367 (10Dzahn) 05Open→03Resolved a:03Dzahn @hdothiduc Great! Thanks for confirming. You are welcome. I will call this resolved and yes, exactly, just...
[22:04:46] <wikibugs>	 (03CR) 10Dave Pifke: "On each affected host, I ran:" [puppet] - 10https://gerrit.wikimedia.org/r/636759 (owner: 10Dave Pifke)
[22:07:51] <wikibugs>	 (03CR) 10Dzahn: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/636759 (owner: 10Dave Pifke)
[22:09:11] <sbassett>	 twentyafterfour RoanKattouw: Ok, testing and (hopefully) deploying new sec patch now just on .21
[22:09:31] <DannyS712>	 is it staged on 1002?
[22:10:46] <sbassett>	 DannyS712: yes, should be now, if you want to test a wiki or two on there...
[22:11:17] <wikibugs>	 (03CR) 10Razzi: [C: 03+2] kafka: add kafka-test1007 to kafka-test cluster [puppet] - 10https://gerrit.wikimedia.org/r/647758 (https://phabricator.wikimedia.org/T268202) (owner: 10Razzi)
[22:13:03] <sbassett>	 DannyS712: I could probably put it on .20 as well if that's easier to test
[22:14:26] <DannyS712>	 oh, yes please - I was testing on enwiki and it wasn't working (I don't know of any accounts to test with on testwiki)
[22:16:13] <sbassett>	 DannyS712: try now?
[22:16:26] <sbassett>	 the one enwiki test page wfm
[22:16:35] <sbassett>	 on debug1002
[22:17:00] <sbassett>	 well, "works", as in doesn't throw a fatal
[22:17:43] <DannyS712>	 yeah, works for me and fixes the issue as intended as far as I can tell - same behavior for wgRelevantUserName and for displaying the notices for both accounts
[22:18:31] <wikibugs>	 (03PS1) 10Bstorm: labstore: change versions to match what's actually installed [puppet] - 10https://gerrit.wikimedia.org/r/647803
[22:18:36] <sbassett>	 Ok, cool.  I can deploy to (the soon to not exist) .20 and .21, I suppose if it looks good to you.
[22:19:32] <DannyS712>	 yes, it does (though I noticed another difference that leaks, but that can be v8 :) - v7 is better than the current v5 and doesn't appear to break anything
[22:20:01] <DannyS712>	 `mw.config.get( [ 'wgHostname', 'wgRelevantUserName' ] )` returns the same for all four pages
[22:21:10] <sbassett>	 Ok, fair enough.  Let's just get v7 out :)
[22:22:29] <sbassett>	 !log Deployed security patch for T120883 (v7) to wmf.20
[22:23:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:22] <wikibugs>	 (03PS1) 10Razzi: yarn: aggregate logs every hour for long-running jobs [puppet] - 10https://gerrit.wikimedia.org/r/647805 (https://phabricator.wikimedia.org/T269616)
[22:24:27] <sbassett>	 !log Deployed security patch for T120883 (v7) to wmf.21
[22:24:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:25:36] <sbassett>	 twentyafterfour RoanKattouw DannyS712: Ok, security patch looks good.  Tested, deployed to .20 and .21 and nothing going crazy in logstash.  I just need to clean up the patches in /srv/patches and we're good.  Thanks.
[22:25:54] <twentyafterfour>	 thanks sbassett
[22:26:07] <DannyS712>	 thanks for the help. Will send patch v8 later, but that can wait until tomorrow to delay
[22:26:11] <DannyS712>	 *deploy
[22:26:26] <twentyafterfour>	 I'll go ahead and deploy oojs-ui-widgets-wikimediaui.css now 
[22:26:42] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "this is a no-op even though it doesn't look like one" [puppet] - https://gerrit.wikimedia.org/r/647803 (owner: Bstorm)
[22:27:13] <wikibugs>	 (CR) Bstorm: [C: +2] labstore: change versions to match what's actually installed [puppet] - https://gerrit.wikimedia.org/r/647803 (owner: Bstorm)
[22:32:31] <logmsgbot>	 !log twentyafterfour@deploy1001 Synchronized php-1.36.0-wmf.21/resources/lib/ooui/oojs-ui-widgets-wikimediaui.css: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/647641 to fix T269477 and unblock T264801 (duration: 01m 04s)
[22:32:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:37] <wikibugs>	 (CR) Ottomata: [C: +1] yarn: aggregate logs every hour for long-running jobs [puppet] - https://gerrit.wikimedia.org/r/647805 (https://phabricator.wikimedia.org/T269616) (owner: Razzi)
[22:32:37] <stashbot>	 T264801: 1.36.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T264801
[22:32:37] <stashbot>	 T269477: [wmf.21-regression] RC/Watchlist -misaligned  close icon in oo-ui-tagMultiselectWidget-group  - https://phabricator.wikimedia.org/T269477
[22:34:37] <twentyafterfour>	 ok do I need to do something more than sync that css file to fix? I still see misalignment on mediawiki.org preferences page 
[22:34:47] <twentyafterfour>	 Volker_E: RoanKattouw: ^
[22:35:11] <RoanKattouw>	 twentyafterfour: It may take a few minutes for CSS changes to propagate
[22:35:16] <RoanKattouw>	 I had a similar issue when testing my own deployment earlier
[22:35:22] <twentyafterfour>	 ok
[22:37:30] <RoanKattouw>	 Also I believe it's working for me (I'm looking at the x icons in the "Muted users" preference)
[22:38:23] <twentyafterfour>	 I'm looking at the same, maybe I don't know what it's supposed to look like 
[22:40:07] <twentyafterfour>	 oh looks good now
[22:40:14] <twentyafterfour>	 I guess it was cached. cool 
[22:40:33] <twentyafterfour>	 ok closing task and unblocking train. Thanks everybody
[22:42:47] <wikibugs>	 Operations, Traffic, Readers-Web-Backlog (Needs Product Owner Decisions): [Bug] iPadOS 13 shows the desktop version of Safari with a broken layout - https://phabricator.wikimedia.org/T229875 (Ckoerner) Adding a note. in iPadOS 14 the behavior appears to be different from what is described in the desc...
[22:48:37] <wikibugs>	 (PS1) Andrew Bogott: Cinder: update policy.yaml [puppet] - https://gerrit.wikimedia.org/r/647810 (https://phabricator.wikimedia.org/T269511)
[22:49:11] <wikibugs>	 (PS1) Mforns: Migrate EventLogging Growth schemas to EventGate on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/647811 (https://phabricator.wikimedia.org/T267333)
[22:49:40] <twentyafterfour>	 I'm seeing a lot of http 415 unsupported media type errors that weren't happening until about 40 minutes ago
[22:49:47] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Cinder: update policy.yaml [puppet] - https://gerrit.wikimedia.org/r/647810 (https://phabricator.wikimedia.org/T269511) (owner: Andrew Bogott)
[22:49:59] <twentyafterfour>	 https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-1h&to=now&refresh=30s
[22:50:15] <wikibugs>	 (Abandoned) Mforns: Migrate EventLogging NewcomerTask to EventGate on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/639539 (https://phabricator.wikimedia.org/T267333) (owner: Mforns)
[22:51:34] <wikibugs>	 (CR) Ottomata: [C: +1] Migrate EventLogging Growth schemas to EventGate on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/647811 (https://phabricator.wikimedia.org/T267333) (owner: Mforns)
[22:51:53] <twentyafterfour>	 there were no deployments at that time so I guess I should just ignore? 
[22:53:11] <twentyafterfour>	 it's like 1.6k requests per second
[22:53:44] <twentyafterfour>	 I'm surprised this isn't alerting
[22:58:45] <wikibugs>	 (CR) Ottomata: [C: +2] Migrate EventLogging Growth schemas to EventGate on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/647811 (https://phabricator.wikimedia.org/T267333) (owner: Mforns)
[23:01:25] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Migrate Growth EventLogging schemas to Event Platform on all wikis - T267333 (duration: 01m 09s)
[23:01:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:28] <stashbot>	 T267333: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333
[23:02:02] <thcipriani>	 twentyafterfour: that is a noticeable spike
[23:02:23] <twentyafterfour>	 for sure, it's pretty big 
[23:02:23] <thcipriani>	 happened right before a deployment so it seems unrelated to deploys
[23:02:32] <wikibugs>	 (PS1) Bstorm: partman: build a recipe to re-image nfs servers [puppet] - https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199)
[23:02:41] <thcipriani>	 it looks like it's all *new* traffic?
[23:03:11] <thcipriani>	 like we're getting an additional 1500 reqps and they're all 415s?
[23:03:23] <wikibugs>	 (CR) Bstorm: "I have really no idea what I'm doing here, so I'm submitting this for help as much as for review." [puppet] - https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) (owner: Bstorm)
[23:04:02] <twentyafterfour>	 yep
[23:04:24] <wikibugs>	 (PS1) Mforns: Refine Growth schemas using eventlogging_legacy job [puppet] - https://gerrit.wikimedia.org/r/647817 (https://phabricator.wikimedia.org/T267333)
[23:06:51] <logmsgbot>	 !log razzi@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[23:06:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:13] <wikibugs>	 (CR) Andrew Bogott: "The only thing I have to add (maybe obvious) is that when you invoke this it will need to be with" [puppet] - https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) (owner: Bstorm)
[23:12:01] <twentyafterfour>	 seems the consensus is that this is a misbehaving bot so I'm going to deploy mediawiki 1.36.0-wmf.21 
[23:12:59] <wikibugs>	 (PS1) 20after4: all wikis to 1.36.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/647823
[23:13:01] <wikibugs>	 (CR) 20after4: [C: +2] all wikis to 1.36.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/647823 (owner: 20after4)
[23:13:47] <wikibugs>	 (Merged) jenkins-bot: all wikis to 1.36.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/647823 (owner: 20after4)
[23:15:04] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.21
[23:15:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:05] <twentyafterfour>	 ok wmf.21 seems stable! 
[23:16:23] <twentyafterfour>	 🎉
[23:18:03] <marxarelli>	 \o/
[23:19:17] <thcipriani>	 twentyafterfour: nice :)
[23:24:37] <icinga-wm>	 PROBLEM - Router interfaces on cr3-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:29:11] <DannyS712>	 sbassett twentyafterfour want to try with v8 if the train is done?
[23:29:24] <twentyafterfour>	 train is done
[23:32:31] <wikibugs>	 (PS1) PipelineBot: blubberoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/647831
[23:35:50] <Urbanecm>	 !log [urbanecm@mwmaint1002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=enwiki --login --ip 'REDACTED' --user 'WP 1.0 bot' # T269898
[23:35:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:54] <stashbot>	 T269898: Bot needs account unlocked - https://phabricator.wikimedia.org/T269898
[23:44:15] <wikibugs>	 (PS1) Ahmon Dancy: Reorganized setup.sh and added db wait loop [deployment-charts] - https://gerrit.wikimedia.org/r/647842
[23:44:17] <wikibugs>	 (PS1) Ahmon Dancy: New utility macros in templates/_mediawiki-common.tpl [deployment-charts] - https://gerrit.wikimedia.org/r/647843
[23:44:19] <wikibugs>	 (PS1) Ahmon Dancy: 0.2.0: Use a Job to set up the database [deployment-charts] - https://gerrit.wikimedia.org/r/647844
[23:44:26] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "one week without complaints so I'm merging." [puppet] - https://gerrit.wikimedia.org/r/645096 (https://phabricator.wikimedia.org/T269252) (owner: Andrew Bogott)
[23:50:45] <icinga-wm>	 RECOVERY - Router interfaces on cr3-knams is OK: OK: host 91.198.174.246, interfaces up: 79, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down