[01:08:20] <icinga-wm>	 PROBLEM - Thanos sidecar is failing to upload blocks on alert1001 is CRITICAL: cluster=prometheus instance=prometheus1004 job=thanos-sidecar prometheus=ops site=eqiad https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[01:27:00] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:30:20] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 234 probes of 567 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:32:04] <icinga-wm>	 PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:36:02] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 52 probes of 567 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:27:42] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:30:32] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 149 probes of 567 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:32:46] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[02:36:18] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 57 probes of 567 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:03:58] <icinga-wm>	 RECOVERY - snapshot of s7 in codfw on alert1001 is OK: Last snapshot for s7 at codfw (db2100.codfw.wmnet:3317) taken on 2020-11-02 01:31:32 (1021 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[03:18:01] <wikibugs>	 10Operations, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Tgr) >>! In T93049#6512485, @Pchelolo wrote: > As soon as this happens again, please add an example here and we will invest...
[03:54:21] <wikibugs>	 (03PS1) 10Hoo man: Do weekly dumps of Wikidata Lexeme [puppet] - 10https://gerrit.wikimedia.org/r/637895 (https://phabricator.wikimedia.org/T264883)
[03:54:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Do weekly dumps of Wikidata Lexeme [puppet] - 10https://gerrit.wikimedia.org/r/637895 (https://phabricator.wikimedia.org/T264883) (owner: 10Hoo man)
[03:57:04] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:57:06] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:02:06] <icinga-wm>	 PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:02:10] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:27:12] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:27:16] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:32:20] <icinga-wm>	 PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:32:24] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:57:02] <DannyS712>	 tgr|away are you around?
[05:10:38] <tgr_>	 DannyS712: o/
[05:32:41] <wikibugs>	 10Operations, 10CheckUser, 10Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (10Ladsgroup) I think this shouldn't go in mw side of things, it should be part of the analytics data lake ([[https://wikitech.wikimedia.org/wiki/Analytics...
[05:42:46] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:03:10] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:04:52] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:09:03] <logmsgbot>	 !log oblivian@cumin1001 START - Cookbook sre.network.cf
[06:09:03] <logmsgbot>	 !log oblivian@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[06:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:14] <logmsgbot>	 !log oblivian@cumin1001 START - Cookbook sre.network.cf
[06:09:16] <logmsgbot>	 !log oblivian@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[06:09:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:19] <wikibugs>	 (03CR) 10ArielGlenn: Do weekly dumps of Wikidata Lexeme (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/637895 (https://phabricator.wikimedia.org/T264883) (owner: 10Hoo man)
[07:16:18] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:24:16] <wikibugs>	 10Operations, 10serviceops, 10Growth-Team (Current Sprint), 10Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (10elukey) >>! In T252391#6592606, @jijiki wrote: > * `mc2036.codfw.wmnet` has been reimaged to buster without redis-server...
[07:31:17] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10Patch-For-Review, 10User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (10elukey) `firmware-bnx2x` installed manually on kafka-jumbo1006, we can retry the switch anytime to see if it works.
[07:34:24] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1091 crashed - https://phabricator.wikimedia.org/T225060 (10wiki_willy) Hi @Marostegui - @Jclark-ctr is in charge of gathering up all the decom'd hardware for recycling, so we can have him check this week for any spare drives lying around.  We should...
[07:52:09] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10wiki_willy)
[07:53:16] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10wiki_willy)
[07:53:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1091 crashed - https://phabricator.wikimedia.org/T225060 (10wiki_willy)
[07:53:32] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1091 crashed - https://phabricator.wikimedia.org/T225060 (10wiki_willy) 05Open→03Resolved
[08:09:09] <wikibugs>	 (03CR) 10Hashar: "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/636084 (https://phabricator.wikimedia.org/T266024) (owner: 10Dzahn)
[08:10:22] <wikibugs>	 (03PS2) 10Ladsgroup: [WIP] varnish: Improve wording of the browser security error a bit [puppet] - 10https://gerrit.wikimedia.org/r/637850 (https://phabricator.wikimedia.org/T241656)
[08:10:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks great, two comments inline." (032 comments) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[08:12:56] <wikibugs>	 (03PS3) 10Ladsgroup: [WIP] varnish: Improve wording of the browser security error a bit [puppet] - 10https://gerrit.wikimedia.org/r/637850 (https://phabricator.wikimedia.org/T241656)
[08:33:08] <wikibugs>	 10Operations, 10SRE-swift-storage, 10Patch-For-Review, 10User-fgiunchedi: Put ms-be2057 (Dell R740xd2) in service - https://phabricator.wikimedia.org/T261633 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Host is fully in service now
[08:39:26] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[08:40:03] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "Will look at deploying it today" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636083 (https://phabricator.wikimedia.org/T266024) (owner: 10Legoktm)
[08:40:44] <godog>	 !log upgrade thanos to 0.16 in codfw/eqiad - T261281
[08:40:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:51] <stashbot>	 T261281: Improve performance of Thanos (+ Prometheus) - https://phabricator.wikimedia.org/T261281
[08:41:06] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[08:41:37] <moritzm>	 !log installing openldap security updates on LDAP replicas
[08:41:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:06] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Add uRPF strict mode to Customers links [homer/public] - 10https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561) (owner: 10Ayounsi)
[08:42:39] <wikibugs>	 (03Merged) 10jenkins-bot: Add uRPF strict mode to Customers links [homer/public] - 10https://gerrit.wikimedia.org/r/636653 (https://phabricator.wikimedia.org/T266561) (owner: 10Ayounsi)
[08:44:50] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=thanos-compact site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:46:16] <XioNoX>	 !log add uRPF strict to ulsfo office links - T266561
[08:46:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:22] <stashbot>	 T266561: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561
[08:46:34] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:46:57] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561 (10ayounsi) Nothing more in the logs.
[08:47:19] <wikibugs>	 10Operations, 10netops, 10Patch-For-Review: Apply uRPF strict mode on Customer links - https://phabricator.wikimedia.org/T266561 (10ayounsi) 05Open→03Resolved
[08:51:37] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Discovery-Search (Current work): Give Trey jones access necessary to support Search Platform Airflow jobs - https://phabricator.wikimedia.org/T266995 (10Gehel)
[08:52:59] <wikibugs>	 (03PS1) 10Gehel: admin: Trey Jones needs access to support Search Platform Airflow jobs [puppet] - 10https://gerrit.wikimedia.org/r/638019 (https://phabricator.wikimedia.org/T266995)
[08:53:10] <icinga-wm>	 PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.456e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[08:54:31] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Give Trey jones access necessary to support Search Platform Airflow jobs - https://phabricator.wikimedia.org/T266995 (10Gehel) a:03RKemper
[08:55:59] <wikibugs>	 (03CR) 10Volans: "If the mapping is 1:1 between desktop and mobile records, I'm wondering if we should instead take advantage of the fact that those are tem" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[08:56:54] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:59:41] <wikibugs>	 (03PS4) 10Nikerabbit: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634224
[08:59:43] <wikibugs>	 (03PS1) 10Nikerabbit: Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638020
[09:01:56] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:04:47] <wikibugs>	 (03CR) 10JMeybohm: [C: 04-1] "Thanks! One issue and a nit 😊" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/637753 (owner: 10Jeena Huneidi)
[09:06:06] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:09:25] <wikibugs>	 10Operations, 10netops, 10cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (10ayounsi) As explained previously on IRC,  `208.80.155.88/29` is part of the eqiad IP space, `185.15.56.240/29` is part of the WMCS IP space.  When a "customer" connects to...
[09:14:58] <icinga-wm>	 RECOVERY - Thanos compact has not run on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[09:15:04] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:17:00] <icinga-wm>	 RECOVERY - Thanos sidecar is failing to upload blocks on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[09:19:30] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:23:38] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, thanks for the fix!" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/637728 (https://phabricator.wikimedia.org/T266767) (owner: 10Ayounsi)
[09:24:06] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/637734 (https://phabricator.wikimedia.org/T265340) (owner: 10Ayounsi)
[09:25:10] <icinga-wm>	 PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.456e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[09:27:12] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:29:42] <wikibugs>	 (03PS2) 10Kormat: Initial (re)packaging [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763)
[09:31:28] <wikibugs>	 (03PS1) 10JMeybohm: Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025
[09:31:58] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:32:04] <wikibugs>	 (03PS2) 10JMeybohm: Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025
[09:32:14] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:33:32] <icinga-wm>	 RECOVERY - Thanos compact has not run on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[09:34:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025 (owner: 10JMeybohm)
[09:36:24] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] "Thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/636905 (owner: 10Kosta Harlan)
[09:36:26] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:39:08] <wikibugs>	 (03Merged) 10jenkins-bot: Define scaffold_version before attempting to use it [deployment-charts] - 10https://gerrit.wikimedia.org/r/636905 (owner: 10Kosta Harlan)
[09:39:30] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] PuppetDB import: don't do empty saves [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/637728 (https://phabricator.wikimedia.org/T266767) (owner: 10Ayounsi)
[09:41:11] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] PuppetDB import, set interface type when renaming ##PRIMARY## [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/637734 (https://phabricator.wikimedia.org/T265340) (owner: 10Ayounsi)
[09:45:16] <wikibugs>	 (03PS3) 10JMeybohm: Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025
[09:46:16] <wikibugs>	 (03CR) 10JMeybohm: "Currently expected to fail, should be fine after Id735ce4bb2619a6814a97968cec3295689cc0050 is merged" [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025 (owner: 10JMeybohm)
[09:47:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Lint the chart _scaffold by creating a dummy chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/638025 (owner: 10JMeybohm)
[09:48:12] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:53:40] <icinga-wm>	 PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.456e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[09:55:42] <icinga-wm>	 PROBLEM - Thanos sidecar is failing to upload blocks on alert1001 is CRITICAL: cluster=prometheus instance=prometheus1004 job=thanos-sidecar prometheus=ops site=eqiad https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[10:23:25] <moritzm>	 !log installing openldap security updates on corp LDAP replicas
[10:23:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:33] <wikibugs>	 (03PS1) 10Itamar Givon: Revert JS parser commits [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671)
[10:27:34] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:27:36] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:28:38] <logmsgbot>	 !log oblivian@cumin1001 START - Cookbook sre.network.cf
[10:28:38] <logmsgbot>	 !log oblivian@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[10:28:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:45] <logmsgbot>	 !log oblivian@cumin1001 START - Cookbook sre.network.cf
[10:28:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:47] <logmsgbot>	 !log oblivian@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[10:28:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[10:32:38] <icinga-wm>	 PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:32:40] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:33:38] <_joe_>	 hnowlan: ^^
[10:33:47] <_joe_>	 cassandra on maps2002 keeps failing
[10:34:07] <wikibugs>	 (03PS3) 10Kormat: Initial (re)packaging [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763)
[10:34:09] <wikibugs>	 (03PS3) 10Kormat: debian: add user/group + systemd service [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763)
[10:36:38] <wikibugs>	 (03PS1) 10Filippo Giunchedi: package_builder: use --no-cowdancer-update when updating chroots [puppet] - 10https://gerrit.wikimedia.org/r/638032
[10:37:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] package_builder: use --no-cowdancer-update when updating chroots [puppet] - 10https://gerrit.wikimedia.org/r/638032 (owner: 10Filippo Giunchedi)
[10:38:44] <wikibugs>	 (03PS2) 10Filippo Giunchedi: package_builder: use --no-cowdancer-update when updating chroots [puppet] - 10https://gerrit.wikimedia.org/r/638032
[10:41:43] <wikibugs>	 (03CR) 10Muehlenhoff: debian: add user/group + systemd service (031 comment) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[10:45:54] <wikibugs>	 (03PS2) 10Filippo Giunchedi: prometheus: re-enable compaction by default [puppet] - 10https://gerrit.wikimedia.org/r/636362 (https://phabricator.wikimedia.org/T261281)
[10:45:56] <wikibugs>	 (03PS1) 10Filippo Giunchedi: thanos: use systemd overrides for query/store/compact [puppet] - 10https://gerrit.wikimedia.org/r/638036 (https://phabricator.wikimedia.org/T261281)
[10:50:19] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[10:50:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:33] <wikibugs>	 (03CR) 10Physikerwelt: "I appreciate this as an intermediate solution for T266673" [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671) (owner: 10Itamar Givon)
[10:57:40] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:57:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:45] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate LDAP replicas to Buster - https://phabricator.wikimedia.org/T264388 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `ldap-replica2001.wikimedia.org` - ldap-replica2001.wikimedia.org (**WARN**)   - **Failed downtime host on I...
[10:58:06] <icinga-wm>	 RECOVERY - Thanos sidecar is failing to upload blocks on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[10:59:53] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.decommission
[10:59:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:28] <hnowlan>	 _joe_: whoa what, that's bad. codfw maps shouldn't be affected by changes at all
[11:00:39] <hnowlan>	 looking 
[11:00:43] <_joe_>	 thanks :)
[11:06:00] <godog>	 !log upgrade thanos to 0.16.0 on prometheus hosts - T261281
[11:06:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:06] <stashbot>	 T261281: Improve performance of Thanos (+ Prometheus) - https://phabricator.wikimedia.org/T261281
[11:07:04] <icinga-wm>	 PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:09:33] <wikibugs>	 (03PS1) 10ArielGlenn: update worker scripts to loop in secondary batch worker mode [dumps] - 10https://gerrit.wikimedia.org/r/638043 (https://phabricator.wikimedia.org/T252396)
[11:12:20] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add Debian directory [debs/kthxbye] - 10https://gerrit.wikimedia.org/r/638044 (https://phabricator.wikimedia.org/T266535)
[11:12:41] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638045 (https://phabricator.wikimedia.org/T128546)
[11:13:40] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[11:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:45] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate LDAP replicas to Buster - https://phabricator.wikimedia.org/T264388 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `ldap-replica2002.wikimedia.org` - ldap-replica2002.wikimedia.org (**WARN**)   - **Failed downtime host on I...
[11:14:28] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove ldap-replica2001/2002 from DNS [dns] - 10https://gerrit.wikimedia.org/r/638067 (https://phabricator.wikimedia.org/T264388)
[11:14:44] <wikibugs>	 (03CR) 10Gilles: "Sounds good!" [puppet] - 10https://gerrit.wikimedia.org/r/636024 (https://phabricator.wikimedia.org/T266155) (owner: 10Gilles)
[11:15:44] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Remove ldap-eqiad-replica0[12] from acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/637500 (https://phabricator.wikimedia.org/T264388) (owner: 10Muehlenhoff)
[11:16:17] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove ldap-replica2001/2002 from DNS [dns] - 10https://gerrit.wikimedia.org/r/638067 (https://phabricator.wikimedia.org/T264388) (owner: 10Muehlenhoff)
[11:18:30] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[11:18:30] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[11:18:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:16] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:23:18] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:24:11] <wikibugs>	 10Operations, 10Puppet, 10observability, 10Patch-For-Review, and 2 others: Puppet: get row/rack info from Netbox - https://phabricator.wikimedia.org/T229397 (10Volans) We discussed it a bit during the Infrastructure Foundation last meeting on Wed. I'll try to summarize the outcome of it, please correct me...
[11:24:48] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove ldap-replica2001/2002 from Puppet [puppet] - 10https://gerrit.wikimedia.org/r/638069
[11:25:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Remove ldap-replica2001/2002 from Puppet [puppet] - 10https://gerrit.wikimedia.org/r/638069 (owner: 10Muehlenhoff)
[11:25:28] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove ldap-replica2001/2002 from Puppet [puppet] - 10https://gerrit.wikimedia.org/r/638069
[11:29:46] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.032 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[11:30:04] <jouncebot>	 jan_drewniak: Dear deployers, time to do the Wikimedia Portals Update deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T1130).
[11:30:59] <wikibugs>	 (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638045 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[11:31:01] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.4 [software/pywmflib] - 10https://gerrit.wikimedia.org/r/638072
[11:31:37] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638045 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[11:32:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove ldap-replica2001/2002 from Puppet [puppet] - 10https://gerrit.wikimedia.org/r/638069 (owner: 10Muehlenhoff)
[11:32:33] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.4 [software/pywmflib] - 10https://gerrit.wikimedia.org/r/638072 (owner: 10Volans)
[11:33:46] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:638045| Bumping portals to master (T128546)]] (duration: 01m 00s)
[11:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:53] <wikibugs>	 (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.4 [software/pywmflib] - 10https://gerrit.wikimedia.org/r/638072 (owner: 10Volans)
[11:33:53] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[11:34:44] <logmsgbot>	 !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:638045| Bumping portals to master (T128546)]] (duration: 00m 58s)
[11:34:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove ldap-eqiad-replica0[12] from acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/637500 (https://phabricator.wikimedia.org/T264388) (owner: 10Muehlenhoff)
[11:42:37] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata Query UI, 10User-Addshore: Move WDQS UI to microsites - https://phabricator.wikimedia.org/T266702 (10Addshore) So I now see that the custom-config used to be in the build repo in the production branch. It was removed in https://gerrit.wikimedia.org/r/c/wikidata/query/gu...
[11:46:08] <wikibugs>	 (03PS1) 10Volans: Upstream release v0.0.4 [software/pywmflib] (debian) - 10https://gerrit.wikimedia.org/r/638073
[11:48:17] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove ldap-replica2001/2002 from acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/638075
[11:48:46] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Remove ldap-replica2001/2002 from acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/638075 (owner: 10Muehlenhoff)
[11:49:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove ldap-replica2001/2002 from acmechief config [puppet] - 10https://gerrit.wikimedia.org/r/638075 (owner: 10Muehlenhoff)
[11:50:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove obsolete Hiera files [puppet] - 10https://gerrit.wikimedia.org/r/638076
[11:50:38] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Upstream release v0.0.4 [software/pywmflib] (debian) - 10https://gerrit.wikimedia.org/r/638073 (owner: 10Volans)
[11:51:17] <effie>	 !log disable thumbor on thumbor1001 and thumbor1002 to test 636024
[11:51:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:32] <effie>	 agr
[11:51:41] <wikibugs>	 (03Merged) 10jenkins-bot: Upstream release v0.0.4 [software/pywmflib] (debian) - 10https://gerrit.wikimedia.org/r/638073 (owner: 10Volans)
[11:51:58] <effie>	 !log disable puppet on thumbor1001 and thumbor1002 to test 636024
[11:52:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:53] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete Hiera files [puppet] - 10https://gerrit.wikimedia.org/r/638076 (owner: 10Muehlenhoff)
[11:53:18] <icinga-wm>	 RECOVERY - Thanos compact has not run on alert1001 is OK: (C)24 ge (W)12 ge 0.03049 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[11:55:37] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate LDAP replicas to Buster - https://phabricator.wikimedia.org/T264388 (10MoritzMuehlenhoff) 05Open→03Resolved ldap-replica1001/1002/2003/2004 are now running Buster, old Stretch instances have been removed.
[12:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: May I have your attention please! European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T1200)
[12:00:05] <jouncebot>	 matthiasmullie, hashar, Nikerabbit, and ItamarWMDE: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:11] <matthiasmullie>	 o/
[12:00:18] <Lucas_WMDE>	 o/
[12:00:23] <wikibugs>	 (03PS2) 10Hnowlan: Enable replication in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/608726 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[12:00:30] <Lucas_WMDE>	 I can deploy today
[12:00:54] <hashar>	 hi. I will do my config change to ExtensionDistributor later ( https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/636083/ )
[12:01:25] <Nikerabbit>	 o7
[12:01:46] <hashar>	 and one probably wants to +2 the pending Wikibase right now
[12:01:48] <hashar>	 ( https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/637801 )
[12:02:11] <volans>	 !log uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
[12:02:14] <itamarWMDE>	 o/
[12:02:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:22] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Revert JS parser commits [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671) (owner: 10Itamar Givon)
[12:02:26] <Lucas_WMDE>	 hashar: good point :)
[12:03:27] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Fix array depth for properties array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637778 (https://phabricator.wikimedia.org/T266835) (owner: 10Matthias Mullie)
[12:03:34] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Fix array depth for properties array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637778 (https://phabricator.wikimedia.org/T266835) (owner: 10Matthias Mullie)
[12:05:13] <wikibugs>	 (03Merged) 10jenkins-bot: Fix array depth for properties array [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637778 (https://phabricator.wikimedia.org/T266835) (owner: 10Matthias Mullie)
[12:05:26] <matthiasmullie>	 Lucas_WMDE: that one doesn't have to go to mwdebug
[12:05:36] <Lucas_WMDE>	 okay
[12:06:17] <hashar>	 lunch &
[12:07:40] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637778|Fix array depth for properties array (T266835)]] (duration: 00m 59s)
[12:07:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:48] <stashbot>	 T266835: [betalabs] MediaSearch - Internal error MediaQueryBuilder.php: Unsupported operand types - https://phabricator.wikimedia.org/T266835
[12:08:35] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638020 (owner: 10Nikerabbit)
[12:08:46] <matthiasmullie>	 Thanks, Lucas_WMDE!
[12:08:48] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638020 (owner: 10Nikerabbit)
[12:09:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:637778|Fix array depth for properties array (T266835)]], Beta part (prod no-op) (duration: 00m 58s)
[12:09:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:17] <Lucas_WMDE>	 Nikerabbit: your changes also look like they can’t really be tested on mwdebug, right?
[12:09:27] <Lucas_WMDE>	 since the wmf.14 code already doesn’t read the config settings anymore
[12:09:46] <Nikerabbit>	 Lucas_WMDE: I can check, but not expecting to see any difference. Only labs wikis will change
[12:09:52] <Lucas_WMDE>	 ok
[12:10:11] <Nikerabbit>	 well, would have changed, but like you said ULS does not read them anymore
[12:10:17] <Nikerabbit>	 so no difference there expected either
[12:11:06] <wikibugs>	 (03Merged) 10jenkins-bot: Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638020 (owner: 10Nikerabbit)
[12:12:23] <Lucas_WMDE>	 Nikerabbit: first change is on mwdebug…
[12:12:25] <Lucas_WMDE>	 wait
[12:12:31] <Lucas_WMDE>	 sorry, I’m still on mwdebug2001
[12:12:43] <Lucas_WMDE>	 but should probably return to mwdebug100* now that the DC switch is over
[12:13:29] <Lucas_WMDE>	 well, I guess you can still target mwdebug2001 from the x-wikimedia-debug extension?
[12:13:43] <Nikerabbit>	 yeah I can
[12:14:21] <Nikerabbit>	 no change visible logged in or logged out
[12:14:31] <Lucas_WMDE>	 ok, syncing
[12:15:11] <wikibugs>	 (03PS5) 10Lucas Werkmeister (WMDE): Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634224 (owner: 10Nikerabbit)
[12:15:29] <volans>	 !log upgraded python3-wmflib to 0.0.4 on cumin[12]001
[12:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:38] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634224 (owner: 10Nikerabbit)
[12:15:58] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:638020|Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]] (duration: 00m 58s)
[12:16:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:32] <wikibugs>	 (03Merged) 10jenkins-bot: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634224 (owner: 10Nikerabbit)
[12:17:10] <Lucas_WMDE>	 Nikerabbit: second change is also on mwdebug2001 now
[12:17:54] <Nikerabbit>	 still looks good. I assume fatal-monitor is quiet on this?
[12:18:04] <Lucas_WMDE>	 yeah, looks like it
[12:18:33] <Nikerabbit>	 good
[12:19:28] <Lucas_WMDE>	 syncing
[12:19:52] <Lucas_WMDE>	 meanwhile, the Wikibase gate-and-submit build had a random composer error in the wikibase-client-docker job, apparently :(
[12:20:22] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634224|Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 1/2 (production) (duration: 01m 02s)
[12:20:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:20:36] <Lucas_WMDE>	 but the deployment calendar is otherwise fairly empty today, so I’d say if we need a second gate-and-submit and overrun the backport window it’s not a big problem
[12:21:44] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:634224|Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 2/2 (Beta) (duration: 00m 57s)
[12:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:22:07] <Lucas_WMDE>	 okay, I think that was all the config changes
[12:22:12] <Lucas_WMDE>	 so now we wait for Wikibase CI :)
[12:22:35] <itamarWMDE>	 :)
[12:22:40] <Lucas_WMDE>	 I think I’ll cancel the four remaining jobs so we can retry immediately and wait a bit less long
[12:23:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Revert JS parser commits [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671) (owner: 10Itamar Givon)
[12:23:40] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "Let’s try again (wikibase-client-docker had a random-looking composer error so I aborted the remaining jobs to save time)." [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671) (owner: 10Itamar Givon)
[12:24:16] * itamarWMDE opens up zuul to watch
[12:26:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Seems fine. If the performance loss is notable we could also switch to a backport of cowbuilder 0.89, but it's probably not needed." [puppet] - 10https://gerrit.wikimedia.org/r/638032 (owner: 10Filippo Giunchedi)
[12:34:00] * Lucas_WMDE whistles the Jeopardy! theme
[12:39:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 610 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:39:56] <Lucas_WMDE>	 eeeeek
[12:40:00] <Lucas_WMDE>	 that’s a large spike
[12:40:06] * Lucas_WMDE saunters over to logstash
[12:40:29] <Lucas_WMDE>	 not seeing anything in logspam-watch yet though
[12:40:43] <Urbanecm>	 [{exception_id}] {exception_url} ErrorException from line 820 of /srv/mediawiki/php-1.36.0-wmf.14/vendor/wikimedia/parsoid/src/Config/Env.php: PHP Notice: Undefined index: mwf1 
[12:40:45] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Add base php cli image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/638095 (https://phabricator.wikimedia.org/T265324)
[12:40:55] <Lucas_WMDE>	 from logstash it looks like it was a temporary job queue issue?
[12:40:56] <Urbanecm>	 that sounds...weird?
[12:40:59] <Lucas_WMDE>	 mainly on commonswiki
[12:41:02] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 11 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:41:06] <Lucas_WMDE>	 several exceptions in JobQueueEventBus.php
[12:41:12] <Lucas_WMDE>	 “Could not enqueue jobs from stream …”
[12:41:29] <Lucas_WMDE>	 with RecordLintJob, wikibase-addUsagesForPage, cirrusSearchElasticaWrite, cirrusSearchLinksUpdate as the main culprits
[12:41:37] <Lucas_WMDE>	 but I suspect those are just the most common jobs overall
[12:42:10] <Lucas_WMDE>	 but it also seems to have recovered already
[12:42:37] <Urbanecm>	 yeah, seems to be fine https://usercontent.irccloud-cdn.com/file/rJf0rFA7/image.png
[12:42:38] <Lucas_WMDE>	 (ok, there was also a large volume of deferred updates that failed to run)
[12:43:17] <Urbanecm>	 Lucas_WMDE: I take it that you're waiting for Wikibase CI?
[12:43:22] <Lucas_WMDE>	 yeah
[12:43:22] <Urbanecm>	 if so, would you mind syncing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/637819 for me?
[12:43:32] * Lucas_WMDE looks
[12:43:56] <Lucas_WMDE>	 'otrs_wikiwiki' might be the worst dbname I’ve encountered yet
[12:44:03] <Urbanecm>	 +
[12:44:09] <Urbanecm>	 +1
[12:44:18] <Urbanecm>	 but we can't change that, so...we'll have to live with that
[12:44:22] <Lucas_WMDE>	 yeah
[12:44:26] <Lucas_WMDE>	 quick grep shows that’s its name indeed
[12:44:32] <Urbanecm>	 (it kinda matches the wiki's URL though, otrs-wiki.wikimedia.org)
[12:44:55] <Lucas_WMDE>	 (I can’t see the discussion linked in the Phab task so I just have to trust you :P )
[12:45:19] <Urbanecm>	 hehe
[12:45:25] <DannyS712>	 Lucas_WMDE OTRS agent here, can confirm community consensus for the change
[12:45:30] <Lucas_WMDE>	 ok thanks
[12:45:42] <Lucas_WMDE>	 lol I can’t even run action=query&siprop=namespaces to see if it’s ns100
[12:45:42] <DannyS712>	 np
[12:46:06] <Lucas_WMDE>	 but I can see that in IS.php
[12:46:11] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Add Response namespace at otrs_wikiwiki to namespaces searched by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637819 (https://phabricator.wikimedia.org/T266917) (owner: 10Urbanecm)
[12:46:17] <wikibugs>	 (03PS2) 10Lucas Werkmeister (WMDE): Add Response namespace at otrs_wikiwiki to namespaces searched by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637819 (https://phabricator.wikimedia.org/T266917) (owner: 10Urbanecm)
[12:46:21] <Urbanecm>	 it is Lucas_WMDE https://usercontent.irccloud-cdn.com/file/xgFPZwuf/image.png
[12:46:42] <Urbanecm>	 though I always look the IDs up via IS.php
[12:46:57] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637819 (https://phabricator.wikimedia.org/T266917) (owner: 10Urbanecm)
[12:47:18] <DannyS712>	 It took me longer since the actual query is action=query&meta=siteinfo&siprop=namespaces but can confirm
[12:47:34] <DannyS712>	 Though I think we can all trust Urbanecm
[12:47:50] <Urbanecm>	 thank you :)
[12:47:53] <wikibugs>	 (03Merged) 10jenkins-bot: Add Response namespace at otrs_wikiwiki to namespaces searched by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/637819 (https://phabricator.wikimedia.org/T266917) (owner: 10Urbanecm)
[12:48:02] <Lucas_WMDE>	 DannyS712: yeah I was just too lazy to type out the full URL ^^
[12:48:27] <Lucas_WMDE>	 Urbanecm: want to quickly test it on mwdebug2001?
[12:48:31] <Urbanecm>	 Lucas_WMDE: sure
[12:48:34] <DannyS712>	 Urbanecm if there is time now, can we do the security patch instead of later?
[12:48:35] <Urbanecm>	 though I'd prefer 1002
[12:48:41] <Urbanecm>	 (or 1001)
[12:48:45] <Urbanecm>	 Lucas_WMDE: we're post-switchover
[12:48:49] <Lucas_WMDE>	 yeah, I hadn’t reset my script to eqiad yet
[12:48:55] <Lucas_WMDE>	 done now but old terminals are still open ^^
[12:49:21] <Urbanecm>	 DannyS712: from my side, sure - though Lucas_WMDE leads this window
[12:49:38] <DannyS712>	 @loc
[12:49:40] <Lucas_WMDE>	 not sure we have the time for that
[12:49:51] <DannyS712>	 woops
[12:49:51] <DannyS712>	 Lucas_WMDE 2001 isn't showing the change for me
[12:49:58] <Lucas_WMDE>	 hm
[12:50:03] <DannyS712>	 oh, now it is
[12:50:10] <DannyS712>	 just had to refresh a dozen times
[12:50:10] <Lucas_WMDE>	 ah okay
[12:51:04] <DannyS712>	 > not sure we have the time for that
[12:51:04] <DannyS712>	 Okay, just wondering since I figured I was around now
[12:51:17] <Urbanecm>	 confirmed it works Lucas_WMDE 
[12:51:21] <Lucas_WMDE>	 okay, syncing
[12:52:21] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637819|Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917)]] (duration: 00m 58s)
[12:52:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:28] <stashbot>	 T266917: Add Response namespace at otrswiki to namespaces searched by default - https://phabricator.wikimedia.org/T266917
[12:52:33] <Urbanecm>	 thank you Lucas_WMDE !
[12:52:37] <Lucas_WMDE>	 np
[12:56:02] <wikibugs>	 (03Merged) 10jenkins-bot: Revert JS parser commits [extensions/Wikibase] (wmf/1.36.0-wmf.14) - 10https://gerrit.wikimedia.org/r/637801 (https://phabricator.wikimedia.org/T266671) (owner: 10Itamar Givon)
[12:57:25] <Urbanecm>	 \o/
[12:57:27] <Lucas_WMDE>	 itamarWMDE: the revert is on mwdebug2001 now
[12:58:24] <Lucas_WMDE>	 huh, test edit says the wiki is in read-only mode?
[12:58:44] <Lucas_WMDE>	 is that because I can’t edit in codfw?
[12:58:51] <wikibugs>	 (03PS4) 10Kormat: Initial (re)packaging [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763)
[12:59:07] <Lucas_WMDE>	 trying mwdebug1001 instead
[12:59:50] <wikibugs>	 (03PS5) 10Kormat: Initial (re)packaging [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763)
[12:59:53] <Lucas_WMDE>	 guess that was it, yeah
[13:00:10] <Lucas_WMDE>	 https://www.wikidata.org/w/index.php?title=Q4115189&diff=1301636953&oldid=1301615072
[13:00:24] <Lucas_WMDE>	 \o/
[13:00:59] <itamarWMDE>	 \o/ Yhanks Lucas_WMDE
[13:01:14] <itamarWMDE>	 With a T even, thanks ;)
[13:01:42] <Lucas_WMDE>	 ok, syncing
[13:02:49] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: [[gerrit:637801|Revert JS parser commits (T266671)]] (duration: 01m 09s)
[13:02:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:56] <stashbot>	 T266671: Revert commit 7f430f142d from `Malformed input error on text which is not malformed` - https://phabricator.wikimedia.org/T266671
[13:02:57] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] Enable replication in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/608726 (https://phabricator.wikimedia.org/T254014) (owner: 10Ryan Kemper)
[13:03:47] <Lucas_WMDE>	 !log EU backport&config window done
[13:03:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:55] <Urbanecm>	 Lucas_WMDE: ftr, yes, codfw servers are read-only to avoid issues :-)
[13:04:07] <Lucas_WMDE>	 yeah, makes sense :)
[13:07:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] package_builder: use --no-cowdancer-update when updating chroots [puppet] - 10https://gerrit.wikimedia.org/r/638032 (owner: 10Filippo Giunchedi)
[13:15:18] <wikibugs>	 (03PS2) 10Hnowlan: maps: add maps(200[5-9]|2010) as maps hosts [puppet] - 10https://gerrit.wikimedia.org/r/637554 (https://phabricator.wikimedia.org/T266820)
[13:22:36] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10Hermann)
[13:22:38] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Give Trey jones access necessary to support Search Platform Airflow jobs - https://phabricator.wikimedia.org/T266995 (10Hermann)
[13:25:56] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:26:19] <wikibugs>	 (03PS4) 10Kormat: debian: add user/group + systemd service [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763)
[13:27:00] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Discovery-Search (Current work), 10Patch-For-Review: Give Trey jones access necessary to support Search Platform Airflow jobs - https://phabricator.wikimedia.org/T266995 (10DannyS712)
[13:27:02] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10DannyS712)
[13:27:28] <wikibugs>	 (03CR) 10Kormat: "Made a few small fixes and done some basic testing of the completed package." (032 comments) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:27:53] <wikibugs>	 (03CR) 10Kormat: debian: add user/group + systemd service (031 comment) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:32:32] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 23586 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:35:24] <wikibugs>	 (03PS1) 10Elukey: dumps::web::html: fix pageview-complete's settings [puppet] - 10https://gerrit.wikimedia.org/r/638102
[13:37:09] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] dumps::web::html: fix pageview-complete's settings [puppet] - 10https://gerrit.wikimedia.org/r/638102 (owner: 10Elukey)
[13:40:03] <wikibugs>	 (03CR) 10Muehlenhoff: "Looks good, two comments inline." (032 comments) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:40:17] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper
[13:40:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:30] <wikibugs>	 10Operations, 10observability: VictorOps ~5min delay from email received to incident paging - https://phabricator.wikimedia.org/T266800 (10fgiunchedi) This is now a case with VO support. They'll be following up with their transactional email provider.
[13:40:51] <elukey>	 !log roll restart zookeeper on an-conf* to pick up new openjdk upgrades
[13:40:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:21] <wikibugs>	 (03PS5) 10Kormat: debian: add user/group + systemd service [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763)
[13:43:22] <wikibugs>	 (03PS2) 10Hashar: Remove $wgExtDistListFile, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636083 (https://phabricator.wikimedia.org/T266024) (owner: 10Legoktm)
[13:44:15] <wikibugs>	 (03CR) 10Kormat: debian: add user/group + systemd service (032 comments) [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:45:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:46:34] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
[13:46:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:44] <wikibugs>	 10Operations, 10Traffic, 10Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (10Gilles) | Host | hit-front rate 2020-09-10 -> 2020-09-23| hit-front rate 2020-10-29 -> 2020-11-02 | | cp4027 | 70.3% | 64.4% | | cp4028 | 72....
[13:49:32] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/mobile-sections/{title} (Get mobile-sections for a test page on enwiki) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[13:50:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Changes between PS3 and PS5 also LGTM" [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:51:10] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[13:51:29] <wikibugs>	 (03CR) 10Kormat: [V: 03+2 C: 03+2] Initial (re)packaging [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637672 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:51:40] <wikibugs>	 (03CR) 10Kormat: [V: 03+2 C: 03+2] debian: add user/group + systemd service [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/637683 (https://phabricator.wikimedia.org/T266763) (owner: 10Kormat)
[13:53:36] <wikibugs>	 (03PS1) 10Fdans: dumps::web::html Change location of pageview complete landing to readme.html [puppet] - 10https://gerrit.wikimedia.org/r/638105
[13:55:19] <wikibugs>	 (03CR) 10ArielGlenn: [C: 03+1] "OK by me, anyone else need to weigh in on this?" [puppet] - 10https://gerrit.wikimedia.org/r/638105 (owner: 10Fdans)
[13:56:54] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] dumps::web::html Change location of pageview complete landing to readme.html [puppet] - 10https://gerrit.wikimedia.org/r/638105 (owner: 10Fdans)
[13:57:04] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] "Configuration cleanup. The file is gone from Gerrit." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636083 (https://phabricator.wikimedia.org/T266024) (owner: 10Legoktm)
[13:57:23] <hashar>	 ^ late config change
[13:57:52] <wikibugs>	 (03Merged) 10jenkins-bot: Remove $wgExtDistListFile, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/636083 (https://phabricator.wikimedia.org/T266024) (owner: 10Legoktm)
[13:58:17] <wikibugs>	 (03PS5) 10Effie Mouzeli: Switch Thumbor haproxy load balancing to IP hash [puppet] - 10https://gerrit.wikimedia.org/r/636024 (https://phabricator.wikimedia.org/T266155) (owner: 10Gilles)
[14:01:20] <logmsgbot>	 !log hashar@deploy1001 Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - T266024 (duration: 00m 58s)
[14:01:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:27] <stashbot>	 T266024: Phase out https://gerrit.wikimedia.org/mediawiki-extensions.txt - https://phabricator.wikimedia.org/T266024
[14:03:20] <wikibugs>	 (03PS1) 10Kormat: debian: Fix release name [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/638108
[14:03:47] <wikibugs>	 (03CR) 10Kormat: [V: 03+2 C: 03+2] debian: Fix release name [debs/orchestrator] - 10https://gerrit.wikimedia.org/r/638108 (owner: 10Kormat)
[14:08:38] <wikibugs>	 10Operations, 10observability: VictorOps ~5min delay from email received to incident paging - https://phabricator.wikimedia.org/T266800 (10Volans) @fgiunchedi should we consider converting our transport from email to API calls at this point? Should give us an immediate feedback that we relayed the alert to VO...
[14:09:28] <wikibugs>	 (03PS6) 10Effie Mouzeli: Switch Thumbor haproxy load balancing to IP hash [puppet] - 10https://gerrit.wikimedia.org/r/636024 (https://phabricator.wikimedia.org/T266155) (owner: 10Gilles)
[14:17:53] <kormat>	 !log uploaded orchestrator 3.2.3-1 to apt
[14:17:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:01] <wikibugs>	 (03PS1) 10Effie Mouzeli: swift: pass the 'X-Client-IP' header to thumbor [puppet] - 10https://gerrit.wikimedia.org/r/638109 (https://phabricator.wikimedia.org/T266155)
[14:19:44] <wikibugs>	 (03CR) 10Effie Mouzeli: "I tried "balance hdr(X-Client-IP)", which didn't work at all! What happened was that we have configured "http-request set-header X-Client-" [puppet] - 10https://gerrit.wikimedia.org/r/636024 (https://phabricator.wikimedia.org/T266155) (owner: 10Gilles)
[14:20:37] <wikibugs>	 (03PS2) 10Kormat: orchestrator: Support running as non-root [puppet] - 10https://gerrit.wikimedia.org/r/637693 (https://phabricator.wikimedia.org/T266763)
[14:20:39] <wikibugs>	 (03PS9) 10Kormat: orchestrator: Support sqlite backend [puppet] - 10https://gerrit.wikimedia.org/r/637684 (https://phabricator.wikimedia.org/T266657)
[14:21:19] <wikibugs>	 10Operations, 10DBA, 10Orchestrator, 10Patch-For-Review: Repackage orchestrator - https://phabricator.wikimedia.org/T266763 (10Kormat) Repackaging is done, now just need https://gerrit.wikimedia.org/r/c/operations/puppet/+/637693 merged so it can be deployed.
[14:21:20] <icinga-wm>	 PROBLEM - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T93886
[14:24:32] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.032 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[14:25:34] <wikibugs>	 (03PS1) 10Filippo Giunchedi: thanos: configure memcached size via hiera [puppet] - 10https://gerrit.wikimedia.org/r/638110 (https://phabricator.wikimedia.org/T261281)
[14:28:40] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1001/26248/thanos-fe1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/638110 (https://phabricator.wikimedia.org/T261281) (owner: 10Filippo Giunchedi)
[14:34:58] <moritzm>	 !log rolling restart of cassandra in restbase-dev to pick up Java security updates
[14:35:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:34] <icinga-wm>	 PROBLEM - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T93886
[14:38:39] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[14:38:40] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:38:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:36] <wikibugs>	 (03CR) 10Gilles: [C: 03+1] swift: pass the 'X-Client-IP' header to thumbor [puppet] - 10https://gerrit.wikimedia.org/r/638109 (https://phabricator.wikimedia.org/T266155) (owner: 10Effie Mouzeli)
[14:42:10] <wikibugs>	 10Operations, 10Puppet: Puppet Proposal to remove require_package - https://phabricator.wikimedia.org/T266479 (10akosiaris) The idea was indeed to just make sure that the packages are installed before anything else in the class happens. These days, if one puts `ensure_packages()` at the top of the manifest, we...
[14:46:38] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.032 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[14:50:28] <wikibugs>	 (03CR) 10Cparle: Generation of json dumps for wikimedia commons (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/629121 (https://phabricator.wikimedia.org/T259067) (owner: 10Cparle)
[15:12:52] <wikibugs>	 (03PS1) 10JMeybohm: Remove kubernetes sources [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/638114
[15:12:54] <wikibugs>	 (03PS1) 10JMeybohm: Package binary kubernetes releases [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/638115
[15:13:56] <wikibugs>	 (03PS2) 10JMeybohm: Package binary kubernetes releases [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/638115 (https://phabricator.wikimedia.org/T266766)
[15:13:58] <wikibugs>	 (03PS1) 10Ottomata: Produce canary events every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/638116 (https://phabricator.wikimedia.org/T266573)
[15:19:10] <wikibugs>	 (03PS2) 10Ottomata: Produce canary events every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/638116 (https://phabricator.wikimedia.org/T266573)
[15:21:12] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Produce canary events every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/638116 (https://phabricator.wikimedia.org/T266573) (owner: 10Ottomata)
[15:22:05] <wikibugs>	 (03PS1) 10Filippo Giunchedi: thanos: add query-frontend [puppet] - 10https://gerrit.wikimedia.org/r/638119 (https://phabricator.wikimedia.org/T261281)
[15:22:08] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: add thanos query-frontend jobs [puppet] - 10https://gerrit.wikimedia.org/r/638120 (https://phabricator.wikimedia.org/T261281)
[15:22:10] <wikibugs>	 (03PS1) 10Filippo Giunchedi: role: add query_frontend to thanos frontend [puppet] - 10https://gerrit.wikimedia.org/r/638121 (https://phabricator.wikimedia.org/T261281)
[15:22:12] <wikibugs>	 (03PS1) 10Filippo Giunchedi: pontoon: use frontends for query_frontend memcache [puppet] - 10https://gerrit.wikimedia.org/r/638122 (https://phabricator.wikimedia.org/T261281)
[15:35:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: "The end result of this patch series is PCC-ed here: https://puppet-compiler.wmflabs.org/compiler1002/26250/thanos-fe1001.eqiad.wmnet/fulld" [puppet] - 10https://gerrit.wikimedia.org/r/638121 (https://phabricator.wikimedia.org/T261281) (owner: 10Filippo Giunchedi)
[15:36:18] <moritzm>	 !log imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
[15:36:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:46] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/637554 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[15:40:08] <wikibugs>	 (03PS4) 10Mforns: Add ::profile::analytics::refinery::network_region_config [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332)
[15:41:30] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add ::profile::analytics::refinery::network_region_config [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332) (owner: 10Mforns)
[15:47:21] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (10RobH) a:05Cmjohnson→03RobH >>! In T260370#6593013, @Marostegui wrote: > es1032 has RAID0 instead of RAID10.  > Can we get that one re-done with RAI...
[15:47:30] <wikibugs>	 (03PS5) 10Mforns: Add ::profile::analytics::refinery::network_region_config [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332)
[15:48:53] <wikibugs>	 (03CR) 10Mforns: Add ::profile::analytics::refinery::network_region_config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332) (owner: 10Mforns)
[15:58:31] <wikibugs>	 (03PS1) 10Hnowlan: maps: add maps100[5-8] and maps1010 [puppet] - 10https://gerrit.wikimedia.org/r/638125
[16:04:15] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] maps: add maps(200[5-9]|2010) as maps hosts [puppet] - 10https://gerrit.wikimedia.org/r/637554 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[16:04:47] <wikibugs>	 (03PS2) 10Hnowlan: maps: add maps100[5-8] and maps1010 [puppet] - 10https://gerrit.wikimedia.org/r/638125
[16:08:55] <wikibugs>	 (03PS3) 10Hnowlan: maps: add maps(200[5-9]|2010) as maps hosts [puppet] - 10https://gerrit.wikimedia.org/r/637554 (https://phabricator.wikimedia.org/T266820)
[16:20:58] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to GLOBAL ROOT for David Caro - https://phabricator.wikimedia.org/T267040 (10dcaro)
[16:23:19] <icinga-wm>	 PROBLEM - Check systemd state on maps2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:28:39] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (10RobH)
[16:28:54] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (10RobH)
[16:29:52] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to GLOBAL ROOT for David Caro - https://phabricator.wikimedia.org/T267040 (10nskaggs) +1 from me
[16:36:03] <icinga-wm>	 PROBLEM - Maps HTTPS on maps2005 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.132 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook
[16:37:00] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:37:00] <logmsgbot>	 !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[16:37:03] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:37:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:06] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:37:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:14] <hnowlan>	 sorry, that's me 
[16:37:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:24] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:37:24] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:37:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:37:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:53] <icinga-wm>	 RECOVERY - Check systemd state on maps2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:44:57] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10RobH)
[16:45:07] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10RobH)
[16:45:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10Cmjohnson) @wiki_willy I do not have any spare SSDs that would match what is in that server now.
[16:45:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10RobH)
[16:45:51] <wikibugs>	 (03CR) 10Meno25: "* Updated Arabic (ar) translation to match the current English source" [puppet] - 10https://gerrit.wikimedia.org/r/638055 (owner: 10Meno25)
[16:46:12] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10RobH)
[16:48:10] <wikibugs>	 10Operations, 10ops-eqsin, 10serviceops: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (10RobH)
[16:50:32] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Spare Drive Onsite for db1091 - https://phabricator.wikimedia.org/T266988 (10wiki_willy) Thanks for checking @Cmjohnson, just a heads the refresh for this server should be onsite towards the end of November via T264336.  @Marostegui - are you ok with still having t...
[16:57:36] <wikibugs>	 (03CR) 10Meno25: "This is the Arabic (ar) translation of the new text:" [puppet] - 10https://gerrit.wikimedia.org/r/637850 (https://phabricator.wikimedia.org/T241656) (owner: 10Ladsgroup)
[17:00:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Audit down ports - https://phabricator.wikimedia.org/T218751 (10ayounsi) Fyi: `lang=diff ayounsi@asw2-b-eqiad# show | compare  [edit interfaces interface-range disabled]      member xe-2/0/21 { ... } +    member ge-5/0/10; +    member ge-5/0/9; +    member ge-8/0/22; +...
[17:02:41] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10Cmjohnson)
[17:07:24] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to GLOBAL ROOT for David Caro - https://phabricator.wikimedia.org/T267040 (10aborrero)
[17:11:01] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to GLOBAL ROOT for David Caro - https://phabricator.wikimedia.org/T267040 (10aborrero) p:05Triage→03High
[17:13:50] <wikibugs>	 (03PS1) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:14:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:14:33] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 127 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:16:07] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 10 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:16:21] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10Cmjohnson) @wiki_willy and @elukey I do not have enough 10G rack space to fit 24 2U servers, Currently, I have 17 2U spaces in 10G racks.  This is a...
[17:21:23] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 57.19 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:24:25] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[17:25:13] <wikibugs>	 (03PS2) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:26:00] <wikibugs>	 10Operations, 10Traffic, 10Upstream: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134 (10Aklapper)
[17:26:33] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 82.89 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:26:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:27:28] <wikibugs>	 10Operations, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Pchelolo) Ok, it did execute the job twice:  Once on 27th:  ` 2020-10-27 19:52:28 [34499b04-8b9a-4cd1-95dd-9229906705c7] mw...
[17:27:53] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[17:29:17] <wikibugs>	 (03PS3) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:30:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:35:38] <wikibugs>	 10Operations, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Pchelolo) Ok, a bit more:  ` 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobque...
[17:36:03] <wikibugs>	 (03PS4) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:37:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:38:14] <wikibugs>	 (03PS1) 10Ahmon Dancy: set cpu_model_extra_flags = vmx,pcid [puppet] - 10https://gerrit.wikimedia.org/r/638146
[17:44:10] <wikibugs>	 (03PS5) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:45:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:46:22] <wikibugs>	 (03PS1) 10Elukey: profile::analytics::database::meta: specify max_connections for mariadb [puppet] - 10https://gerrit.wikimedia.org/r/638148
[17:48:13] <wikibugs>	 (03CR) 10BryanDavis: "andrewbogott: do we have a way to test this in isolation in the codfw cluster?" [puppet] - 10https://gerrit.wikimedia.org/r/638146 (owner: 10Ahmon Dancy)
[17:48:43] <wikibugs>	 10Operations, 10ops-eqiad, 10decommission-hardware, 10Patch-For-Review: decommission ganeti100[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T255553 (10ayounsi) FYI:  `lang=diff [edit interfaces interface-range disabled]      member ge-7/0/11 { ... } +    member ge-4/0/22; +    member ge-4/0/23; +...
[17:50:49] <wikibugs>	 (03PS6) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[17:52:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[17:53:10] <wikibugs>	 10Operations, 10DNS, 10Traffic, 10serviceops, 10Services (watching): nodejs / restbase services (mobileapps, aqs, recommendation-api, etc?) fail persistently after short windows of DNS unavailability - https://phabricator.wikimedia.org/T162818 (10Aklapper) 05Stalled→03Open The previous comments don't...
[17:54:20] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ml-deploy100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH)
[17:54:43] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ml-deploy100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH)
[17:54:50] <wikibugs>	 (03PS1) 10Jgreen: switch payments-listener to codfw for final testing [dns] - 10https://gerrit.wikimedia.org/r/638149 (https://phabricator.wikimedia.org/T265688)
[17:56:35] <wikibugs>	 (03CR) 10Jgreen: [C: 03+2] switch payments-listener to codfw for final testing [dns] - 10https://gerrit.wikimedia.org/r/638149 (https://phabricator.wikimedia.org/T265688) (owner: 10Jgreen)
[18:00:04] <jouncebot>	 ryankemper: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikidata Query Service weekly deploy . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T1800).
[18:02:07] <wikibugs>	 (03PS7) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[18:05:06] <wikibugs>	 10Operations, 10MassMessage, 10WMF-JobQueue, 10Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (10Pchelolo) So, we have found it: the same exact job has been executed twice. I have deployed change-prop for jobqueue right...
[18:10:03] <wikibugs>	 (03PS8) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[18:11:45] <icinga-wm>	 PROBLEM - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/page/mobile-html/{title} (Get mobile-html from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[18:13:25] <icinga-wm>	 RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[18:14:20] <XioNoX>	 !log push new pfw policies - T267051
[18:14:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:30] <bd808>	 !bash <  hnowlan> the cookie licks you
[18:14:30] <stashbot>	 bd808: Stored quip at https://bash.toolforge.org/quip/xlAqinUBpU87LSFJ7Mx-
[18:15:01] <wikibugs>	 (03PS6) 10Ottomata: Add ::profile::analytics::refinery::network_region_config [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332) (owner: 10Mforns)
[18:17:33] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[18:17:35] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[18:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:34] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Add ::profile::analytics::refinery::network_region_config [puppet] - 10https://gerrit.wikimedia.org/r/637559 (https://phabricator.wikimedia.org/T254332) (owner: 10Mforns)
[18:29:54] <wikibugs>	 (03CR) 10Hnowlan: "pcc output:  https://puppet-compiler.wmflabs.org/compiler1001/26262/" [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan)
[18:31:26] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] systemd::timer: fix TODO of adding type definition for timer job [puppet] - 10https://gerrit.wikimedia.org/r/633853 (owner: 10Dzahn)
[18:37:43] <wikibugs>	 (03PS1) 10Jgreen: flip payments-listener back to eqiad [dns] - 10https://gerrit.wikimedia.org/r/638155
[18:40:44] <icinga-wm>	 ACKNOWLEDGEMENT - Disk space on maps2002 is CRITICAL: DISK CRITICAL - free space: /srv 268 MB (0% inode=99%): Hnowlan cassandra issues being investigated https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=maps2002&var-datasource=codfw+prometheus/ops
[18:40:44] <icinga-wm>	 ACKNOWLEDGEMENT - Postgres Replication Lag on maps2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 272514488256 and 367629 seconds Hnowlan cassandra issues being investigated https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:41:22] <wikibugs>	 (03PS1) 10Jgreen: flip payments-listener back to eqiad [dns] - 10https://gerrit.wikimedia.org/r/638156 (https://phabricator.wikimedia.org/T265688)
[18:43:25] <wikibugs>	 (03CR) 10Jgreen: [C: 03+2] flip payments-listener back to eqiad [dns] - 10https://gerrit.wikimedia.org/r/638156 (https://phabricator.wikimedia.org/T265688) (owner: 10Jgreen)
[18:47:34] <wikibugs>	 (03PS1) 10Dzahn: decom testvm1001.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/638159 (https://phabricator.wikimedia.org/T245757)
[18:49:36] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[18:49:38] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[18:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:43] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[18:49:43] <logmsgbot>	 !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[18:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:21] <wikibugs>	 (03PS2) 10Dzahn: decom testvm1001.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/638159 (https://phabricator.wikimedia.org/T245757)
[18:52:15] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] decom testvm1001.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/638159 (https://phabricator.wikimedia.org/T245757) (owner: 10Dzahn)
[18:53:41] <Urbanecm>	 jouncebot: next
[18:53:41] <jouncebot>	 In 0 hour(s) and 6 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T1900)
[18:54:24] <DannyS712>	 here
[18:54:52] <Urbanecm>	 hello DannyS712 
[18:55:00] <Urbanecm>	 I'm currently doing prep work
[18:58:48] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[18:58:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:02] <mutante>	 !log decom'ing testvm1001
[18:59:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:07] <andrewbogott>	 !log added dcaro to ops and wmf ldap groups
[18:59:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T1900).
[19:00:04] <jouncebot>	 DannyS712: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:10] <DannyS712>	 Still here
[19:00:19] <DannyS712>	 and is the sticker cool?
[19:00:22] <Urbanecm>	 I can deploy today!
[19:01:33] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] "Looks like no-op in main cluster.  +1" [puppet] - 10https://gerrit.wikimedia.org/r/637587 (https://phabricator.wikimedia.org/T262660) (owner: 10Razzi)
[19:01:40] <DannyS712>	 let me know when it's ready to test Urbanecm
[19:01:52] <Urbanecm>	 DannyS712: available at mwdebug1002 now
[19:02:57] <DannyS712>	 confirmed to work - page DOM and links reflect desired change
[19:03:03] <Urbanecm>	 great
[19:03:04] <DannyS712>	 s/links/buttons
[19:03:16] <Urbanecm>	 let me try to test it for real (not disclosing further)
[19:04:26] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[19:04:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:06] <Urbanecm>	 DannyS712: confirmed it works, syncing
[19:07:20] <Urbanecm>	 !log Deployed security fix for T205908
[19:07:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:30] <Urbanecm>	 DannyS712: should work w/o mwdebug now
[19:07:48] <DannyS712>	 confirmed
[19:07:51] <Urbanecm>	 great
[19:07:56] <Urbanecm>	 anything else?
[19:08:12] <DannyS712>	 nope
[19:12:25] <mutante>	 volans: I have a case of "decom cookbook Failed to run the sre.dns.netbox cookbook: Cumin execution failed" on a ganeti VM in eqiad. should i worry and/or make a ticket? it happens  after most (or all) other decom steps worked fine
[19:15:46] <mutante>	  [ERROR clustershell.py:431 in _failed_commands_report]
[19:15:57] <mutante>	 from the extended log
[19:17:14] <wikibugs>	 (03PS9) 10Hnowlan: postgres: set max connections in postgres based on replica count [puppet] - 10https://gerrit.wikimedia.org/r/638143 (https://phabricator.wikimedia.org/T266820)
[19:20:03] <icinga-wm>	 PROBLEM - Maps HTTPS on maps2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Maps/RunBook
[19:21:11] <icinga-wm>	 RECOVERY - Maps HTTPS on maps2002 is OK: HTTP OK: HTTP/1.1 200 OK - 1286 bytes in 0.156 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook
[19:25:47] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:26:02] <wikibugs>	 (03PS1) 10Urbanecm: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298)
[19:26:53] <icinga-wm>	 PROBLEM - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: connect to address 10.192.16.179 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[19:27:53] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:27:57] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.034 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[19:30:47] <wikibugs>	 (03PS1) 10Aklapper: phabricator weekly changes email: List stalled task stalled for years [puppet] - 10https://gerrit.wikimedia.org/r/638163 (https://phabricator.wikimedia.org/T252522)
[19:31:50] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LTGM" [puppet] - 10https://gerrit.wikimedia.org/r/638125 (owner: 10Hnowlan)
[19:34:44] <wikibugs>	 (03CR) 10Dzahn: "back in the days there was always the "is MobileFrontend extension enabled or not" to determine if a wiki is "ready for mobile", as far as" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[19:37:40] <wikibugs>	 (03CR) 10Dzahn: "why is "nyc" special in this patch? It seems to work both mobile and not mobile but it's not in the regular place?" [dns] - 10https://gerrit.wikimedia.org/r/637849 (https://phabricator.wikimedia.org/T152882) (owner: 10Ladsgroup)
[19:38:20] <wikibugs>	 (03CR) 10Andrew Bogott: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/638146 (owner: 10Ahmon Dancy)
[19:38:37] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (10JAnstee_WMF) We are working to get our Director onboarded to phabricator and will hopefully be able to add to the card soon for approval!
[19:41:25] <wikibugs>	 (03PS2) 10Dzahn: Update email address for Nuria [puppet] - 10https://gerrit.wikimedia.org/r/636936 (owner: 10Muehlenhoff)
[19:45:12] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (10RobH)
[19:45:27] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Update email address for Nuria [puppet] - 10https://gerrit.wikimedia.org/r/636936 (owner: 10Muehlenhoff)
[19:46:19] <wikibugs>	 10Operations, 10Analytics-Radar, 10SRE-Access-Requests: Nuria's volunteer account - https://phabricator.wikimedia.org/T266086 (10Dzahn) email address changed:  https://gerrit.wikimedia.org/r/c/operations/puppet/+/636936
[19:46:50] <volans>	 mutante: just run the sre.dns.netbox cookbook manually
[19:47:54] <mutante>	 volans: just like   [cumin1001:~] $ sudo -i cookbook sre.dns.netbox 'sync after decom of ganeti VM'
[19:47:58] <mutante>	 ?
[19:48:24] <volans>	 I'd refer to the hostname in the message
[19:48:29] <volans>	 but yes that's the gist
[19:48:37] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.dns.netbox
[19:48:38] <volans>	 the message that the decom would have used is:
[19:48:39] <volans>	 {hosts} decommissioned, removing all IPs except the asset tag one
[19:48:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:50] <mutante>	 'sync after decom of ganeti VM testvm1001'
[19:48:55] <volans>	 that actually for VMs is even inaccurate
[19:48:58] <volans>	 k
[19:49:28] <mutante>	 the clustershell error is probably known to you then
[19:49:32] <mutante>	 it's running
[19:51:16] <volans>	 yeah it's actually a netbox api error, I have a patch to add automatic retry on all netbox api calls
[19:51:32] <volans>	 need to update it to use the new wmflib.requests module that abstracts those bits
[19:52:07] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[19:52:09] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:54:24] <volans>	 mutante: the icinga alert above is yours, but it will recover after the cookbook runs (takes a bit)
[19:55:13] <icinga-wm>	 PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 71%, RTA = 6577.61 ms
[19:55:14] <icinga-wm>	 PROBLEM - Host mr1-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 71%, RTA = 6584.63 ms
[19:55:21] <mutante>	 volans: ACK, i am at the "done" prompt and confirming it now. thank you
[19:56:16] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:56:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:59] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[20:03:06] <volans>	 np anytime
[20:03:23] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1001 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[20:05:25] <icinga-wm>	 PROBLEM - Juniper alarms on mr1-eqsin is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 103.102.166.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[20:07:17] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[20:09:06] <mutante>	 doesn't look good, but since it's the management router it's not UBN and maybe maintenance
[20:11:57] <icinga-wm>	 RECOVERY - Juniper alarms on mr1-eqsin is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[20:12:11] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqsin is OK: OK: host 103.102.166.128, interfaces up: 38, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[20:12:27] <icinga-wm>	 RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 301.22 ms
[20:12:35] <icinga-wm>	 RECOVERY - Host mr1-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 231.04 ms
[20:13:09] <mutante>	 looks like it indeed
[20:17:44] <wikibugs>	 Operations, ops-eqiad, DC-Ops: eqiad: Server Moves to Free up 7x 2u Spaces on 10g Racks - https://phabricator.wikimedia.org/T267065 (wiki_willy)
[20:18:52] <wikibugs>	 Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (wiki_willy)
[20:18:58] <wikibugs>	 Operations, ops-eqiad, DC-Ops: eqiad: Server Moves to Free up 7x 2u Spaces on 10g Racks - https://phabricator.wikimedia.org/T267065 (wiki_willy)
[20:19:03] <wikibugs>	 Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (wiki_willy) Thanks for the heads up @Cmjohnson .  @elukey - do you have any servers on existing servers on 10g switches, that you might be able to d...
[20:29:33] <wikibugs>	 (CR) Ahmon Dancy: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/638146 (owner: Ahmon Dancy)
[20:36:53] <wikibugs>	 (CR) Andrew Bogott: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/638146 (owner: Ahmon Dancy)
[20:43:41] <wikibugs>	 Operations, Pybal, Traffic, Patch-For-Review: Tune systemd journal rate limiting for PyBal - https://phabricator.wikimedia.org/T189290 (Aklapper) @vgutierrez: Is https://gerrit.wikimedia.org/r/c/operations/debs/pybal/+/418866 still wanted? What exactly (task, person?) is this task [stalled](https...
[20:43:47] <wikibugs>	 Operations, ops-eqiad, DC-Ops: eqiad: Server Moves to Free up Space on 10g Racks - https://phabricator.wikimedia.org/T267065 (wiki_willy)
[20:48:52] <wikibugs>	 Operations, ops-eqiad, DC-Ops: eqiad: Server Moves to Free up Space on 10g Racks - https://phabricator.wikimedia.org/T267065 (wiki_willy)
[20:57:14] <wikibugs>	 (CR) Huji: [C: -1] "See phab." [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:00:04] <jouncebot>	 chrisalbon and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T2100).
[21:03:04] <wikibugs>	 (PS4) Jeena Huneidi: linkrecommendation: Add deployment chart [deployment-charts] - https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
[21:04:43] <wikibugs>	 (CR) jerkins-bot: [V: -1] linkrecommendation: Add deployment chart [deployment-charts] - https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
[21:09:07] <wikibugs>	 (PS2) Jeena Huneidi: Scaffold improvements [deployment-charts] - https://gerrit.wikimedia.org/r/637753
[21:10:53] <wikibugs>	 (CR) Huji: "Redacted" [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:11:37] <wikibugs>	 (CR) Urbanecm: "> Patch Set 1: -Code-Review" [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:12:40] <wikibugs>	 (CR) Huji: [C: +1] abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:13:57] <wikibugs>	 (PS3) Jeena Huneidi: Scaffold improvements [deployment-charts] - https://gerrit.wikimedia.org/r/637753
[21:14:54] <wikibugs>	 (CR) Jeena Huneidi: Scaffold improvements (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/637753 (owner: Jeena Huneidi)
[21:19:00] <wikibugs>	 (CR) Huji: [C: +1] "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:19:30] <wikibugs>	 (CR) Urbanecm: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/638162 (https://phabricator.wikimedia.org/T266298) (owner: Urbanecm)
[21:24:37] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle)
[21:24:53] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle) a:Krinkle→nnikkhoui
[21:26:18] <wikibugs>	 (PS5) Jeena Huneidi: linkrecommendation: Add deployment chart [deployment-charts] - https://gerrit.wikimedia.org/r/636916 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
[21:29:09] <icinga-wm>	 RECOVERY - Maps - OSM synchronization lag - eqiad on alert1001 is OK: (C)2.592e+05 ge (W)1.764e+05 ge 1.601e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[21:31:29] <icinga-wm>	 PROBLEM - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T93886
[21:36:21] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.032 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[21:50:51] <icinga-wm>	 PROBLEM - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:51:29] <icinga-wm>	 PROBLEM - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: connect to address 10.192.16.179 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[21:51:39] <icinga-wm>	 PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:52:36] <wikibugs>	 (CR) Ahmon Dancy: "> I probably don't want to flip on a feature on a hypervisor that's running a bunch of different user VMs.  Can you tell me specifically w" [puppet] - https://gerrit.wikimedia.org/r/638146 (owner: Ahmon Dancy)
[21:52:38] <wikibugs>	 Operations, Commons, SRE-swift-storage: Recently more broken files (premature end of file at 5MB size) that were cross-wiki uploaded to Commons - https://phabricator.wikimedia.org/T266903 (Draceane) p:High→Unbreak!
[21:57:33] <icinga-wm>	 RECOVERY - Check systemd state on maps2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:58:11] <icinga-wm>	 RECOVERY - cassandra CQL 10.192.16.179:9042 on maps2002 is OK: TCP OK - 0.032 second response time on 10.192.16.179 port 9042 https://phabricator.wikimedia.org/T93886
[21:58:23] <icinga-wm>	 RECOVERY - cassandra service on maps2002 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:00:04] <jouncebot>	 Reedy and sbassett: May I have your attention please! Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201102T2200)
[22:03:52] <twentyafterfour>	 !log applied 113a244a66 on phab1001 to hotfix T240862
[22:03:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:01] <stashbot>	 T240862: Can't do shallow clone from phabricator - https://phabricator.wikimedia.org/T240862
[22:07:33] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Document remaining database load groups - https://phabricator.wikimedia.org/T267077 (nnikkhoui)
[22:19:22] <twentyafterfour>	 !log restart php7.3-fpm on phab1001
[22:19:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:00] <wikibugs>	 Operations, Performance-Team, Traffic, serviceops, Performance Issue: Very long response time on frwiki main page - https://phabricator.wikimedia.org/T266865 (Dzahn) .
[22:37:13] <wikibugs>	 (CR) Razzi: "PCC diff: https://puppet-compiler.wmflabs.org/compiler1001/26265/" [puppet] - https://gerrit.wikimedia.org/r/637587 (https://phabricator.wikimedia.org/T262660) (owner: Razzi)
[22:37:18] <wikibugs>	 Operations, DBA, Performance-Team, Platform Engineering, User-Kormat: Document remaining database load groups - https://phabricator.wikimedia.org/T267077 (ArielGlenn) I can say something about the "dump" group if someone points me at a location and tells me an appropriate format.
[22:42:56] <wikibugs>	 (PS1) Razzi: nginx: Remove profile::tlsproxy::service [puppet] - https://gerrit.wikimedia.org/r/638185 (https://phabricator.wikimedia.org/T240439)
[22:50:09] <wikibugs>	 Operations, ops-eqiad, Reading Epics (Analytics): an-coord1001 ram upgrade - https://phabricator.wikimedia.org/T266709 (wiki_willy) a:Cmjohnson
[23:03:44] <wikibugs>	 Operations, ops-eqiad, DC-Ops: (Need By: ASAP) rack/setup/install ms-be106[0-3] - https://phabricator.wikimedia.org/T265093 (wiki_willy)