[00:09:31] Operations, ops-codfw, DC-Ops: codfw: Testing Out Sample PDUs - https://phabricator.wikimedia.org/T265435 (wiki_willy) p:Triage→Low
[00:40:10] (PS1) Dzahn: systemd::timer: fix TODO of adding type definition for timer job [puppet] - https://gerrit.wikimedia.org/r/633853
[00:45:44] (PS1) Huji: Add 'spamblacklistlog' as a default right for the CU log user [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288)
[00:48:42] (CR) Huji: "This is very similar to https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/608222" [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: Huji)
[00:51:30] (CR) Ryan Kemper: [C: +2] "Sorry for the delay here. Looks great!" [puppet] - https://gerrit.wikimedia.org/r/633022 (owner: Dzahn)
[00:54:41] (PS5) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[00:55:54] (PS1) Dzahn: gerrit: replace cron jobs with systemd timers [puppet] - https://gerrit.wikimedia.org/r/633857
[00:56:13] (CR) jerkins-bot: [V: -1] gerrit: replace cron jobs with systemd timers [puppet] - https://gerrit.wikimedia.org/r/633857 (owner: Dzahn)
[00:57:04] (PS2) Dzahn: gerrit: replace cron jobs with systemd timers [puppet] - https://gerrit.wikimedia.org/r/633857
[00:57:15] (PS6) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[00:57:27] (CR) jerkins-bot: [V: -1] gerrit: replace cron jobs with systemd timers [puppet] - https://gerrit.wikimedia.org/r/633857 (owner: Dzahn)
[00:59:50] (PS3) Dzahn: gerrit: replace cron jobs with systemd timers [puppet] - https://gerrit.wikimedia.org/r/633857
[01:13:28] (PS7) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[01:14:10] (PS8) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[01:15:20] (CR) jerkins-bot: [V: -1] geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[01:18:25] (PS9) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[01:25:42] (PS10) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[01:29:04] (PS11) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[01:30:08] (CR) jerkins-bot: [V: -1] geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[01:32:16] (CR) Razzi: "Puppet catalog compiler output: https://puppet-compiler.wmflabs.org/compiler1002/25865/vvvvvvv" [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[01:33:06] (CR) Razzi: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[01:52:26] PROBLEM - Check systemd state on elastic1063 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:14:52] RECOVERY - Check systemd state on elastic1063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:11:45] (PS1) Marostegui: db2125: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/633863 (https://phabricator.wikimedia.org/T260670)
[05:42:46] (CR) Marostegui: [C: +2] db2125: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/633863 (https://phabricator.wikimedia.org/T260670) (owner: Marostegui)
[05:44:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json
[05:44:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:28] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[05:48:56] (PS1) Elukey: Decommission analytics1049 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/633864 (https://phabricator.wikimedia.org/T255140)
[05:49:40] (CR) Elukey: [C: +2] sre.hadoop.change-distro-from-cdh: allow to select workers/journal [cookbooks] - https://gerrit.wikimedia.org/r/633550 (owner: Elukey)
[05:49:57] (CR) Elukey: [C: +2] sre.hadoop.reboot-workers: allow to limit workers to reboot [cookbooks] - https://gerrit.wikimedia.org/r/633766 (https://phabricator.wikimedia.org/T255138) (owner: Elukey)
[05:56:13] (CR) Elukey: [C: +2] Decommission analytics1049 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/633864 (https://phabricator.wikimedia.org/T255140) (owner: Elukey)
[05:56:17] (CR) Marostegui: [C: +1] "Just for the record, the socket one was used while we migrated away from /tmp. It took some years to do so, and we had to have both locati" [puppet] - https://gerrit.wikimedia.org/r/633768 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[05:57:23] (CR) Urbanecm: [C: -1] Add 'spamblacklistlog' as a default right for the CU log user (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: Huji)
[05:59:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
[05:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:30] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[06:10:15] (CR) Nikerabbit: [C: -1] "Per https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/586353" [mediawiki-config] - https://gerrit.wikimedia.org/r/633761 (owner: DannyS712)
[06:12:58] !log Change UNIQUE into KEY on enwikivoyage.imagelinks T265445
[06:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:04] T265445: Index uniqueness mismatches in links tables in wikis that were moved from s3 to s5 - https://phabricator.wikimedia.org/T265445
[06:14:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
[06:14:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:33] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[06:17:15] (PS1) Muehlenhoff: Remove access for joewalsh [puppet] - https://gerrit.wikimedia.org/r/633888
[06:19:27] (PS1) Gergő Tisza: GrowthExperiments: Default to variant D on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/633889
[06:22:47] (CR) Urbanecm: "> Patch Set 2: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/633761 (owner: DannyS712)
[06:23:36] (CR) Urbanecm: "We should perhaps just copy it again (meh) and include beta's overrides again." [mediawiki-config] - https://gerrit.wikimedia.org/r/633761 (owner: DannyS712)
[06:29:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
[06:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:36] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[06:44:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
[06:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:42] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[06:53:41] (PS1) Elukey: sre.hadoop.init-hadoop-workers: add comments about Megacli options [cookbooks] - https://gerrit.wikimedia.org/r/633941
[06:54:32] (PS2) Elukey: sre.hadoop.init-hadoop-workers: add comments about Megacli options [cookbooks] - https://gerrit.wikimedia.org/r/633941
[06:56:01] (CR) jerkins-bot: [V: -1] sre.hadoop.init-hadoop-workers: add comments about Megacli options [cookbooks] - https://gerrit.wikimedia.org/r/633941 (owner: Elukey)
[06:58:29] (PS3) Elukey: sre.hadoop.init-hadoop-workers: add comments about Megacli options [cookbooks] - https://gerrit.wikimedia.org/r/633941
[06:59:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
[06:59:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:44] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[07:00:24] (CR) Elukey: [C: +2] sre.hadoop.init-hadoop-workers: add comments about Megacli options [cookbooks] - https://gerrit.wikimedia.org/r/633941 (owner: Elukey)
[07:11:52] (PS1) Elukey: role::analytics_test_cluster::hadoop::worker: remove unnecessary nrpe disk check [puppet] - https://gerrit.wikimedia.org/r/633944 (https://phabricator.wikimedia.org/T255139)
[07:12:34] (CR) Elukey: [C: +2] role::analytics_test_cluster::hadoop::worker: remove unnecessary nrpe disk check [puppet] - https://gerrit.wikimedia.org/r/633944 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[07:13:01] (CR) Muehlenhoff: [C: +2] Remove access for joewalsh [puppet] - https://gerrit.wikimedia.org/r/633888 (owner: Muehlenhoff)
[07:14:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
[07:14:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:46] T260670: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670
[07:18:19] (PS1) Elukey: Add an-test-coord1001's IPs to Kafka Jumbo's ferm rules [puppet] - https://gerrit.wikimedia.org/r/633945 (https://phabricator.wikimedia.org/T255139)
[07:20:31] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/633945 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[07:31:45] Operations, ops-codfw, DBA, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Open→Resolved This host has been fully pooled and notifications were enabled too. Going to close this as fixed for now, we'll see if it crashes...
[07:34:25] (PS1) Elukey: Add IPv6's PTR/AAAA records for the new Hadoop test cluster [dns] - https://gerrit.wikimedia.org/r/633946 (https://phabricator.wikimedia.org/T255139)
[07:48:16] Operations, ops-codfw: Degraded RAID on ms-be2036 - https://phabricator.wikimedia.org/T265208 (fgiunchedi) Thanks @papaul! It looks like the `sdd` disk is in trouble, do you have a spare or can order one? Thank you! ` [45698.678655] sd 0:1:0:6: [sdg] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIV...
[07:49:23] (CR) Elukey: "Riccardo, afaics from https://wikitech.wikimedia.org/wiki/DNS/Netbox eqiad is not migrated but I don't recall if the migration is already " [dns] - https://gerrit.wikimedia.org/r/633946 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[08:05:28] (PS1) Elukey: profile::hadoop::worker: avoid custom disk space checks for the test cluster [puppet] - https://gerrit.wikimedia.org/r/633947 (https://phabricator.wikimedia.org/T255139)
[08:09:17] !log filippo@cumin1001 START - Cookbook sre.hosts.downtime
[08:09:18] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[08:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:10] (PS1) Elukey: Add fake keytabs for new Hadoop test cluster nodes [labs/private] - https://gerrit.wikimedia.org/r/633948 (https://phabricator.wikimedia.org/T255139)
[08:16:34] (CR) Elukey: [V: +2 C: +2] Add fake keytabs for new Hadoop test cluster nodes [labs/private] - https://gerrit.wikimedia.org/r/633948 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[08:17:54] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/25867/" [puppet] - https://gerrit.wikimedia.org/r/633947 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[08:20:58] (CR) Kormat: [C: +2] mariadb: Remove unused hiera lookups [puppet] - https://gerrit.wikimedia.org/r/633768 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[08:22:02] (PS1) JMeybohm: admin: add nnair to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/633950 (https://phabricator.wikimedia.org/T265428)
[08:23:56] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/633950 (https://phabricator.wikimedia.org/T265428) (owner: JMeybohm)
[08:24:16] (CR) JMeybohm: [C: +2] admin: add nnair to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/633950 (https://phabricator.wikimedia.org/T265428) (owner: JMeybohm)
[08:28:18] !log elukey@cumin1001 START - Cookbook sre.dns.netbox
[08:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:25] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Neha Nair (nnair) - https://phabricator.wikimedia.org/T265428 (JMeybohm) Open→Resolved a:JMeybohm Done
[08:34:20] Operations, Data-Persistence-Backup, SRE-tools: Add toil::systemd_scope_cleanup to dbprov hosts - https://phabricator.wikimedia.org/T265323 (Marostegui) p:Triage→Medium
[08:34:29] (CR) Kosta Harlan: Disable wgWMEUnderstandingFirstDay (EditorJourney) logging (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/633514 (https://phabricator.wikimedia.org/T252391) (owner: Kosta Harlan)
[08:34:42] !log elukey@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:10] (CR) Elukey: [C: +2] Add IPv6's PTR/AAAA records for the new Hadoop test cluster [dns] - https://gerrit.wikimedia.org/r/633946 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[08:37:22] (CR) Elukey: [C: +2] Add an-test-coord1001's IPs to Kafka Jumbo's ferm rules [puppet] - https://gerrit.wikimedia.org/r/633945 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[08:39:58] (PS2) Kormat: [WIP] mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972)
[08:41:22] (PS1) Elukey: Revert "Add an-test-coord1001's IPs to Kafka Jumbo's ferm rules" [puppet] - https://gerrit.wikimedia.org/r/633874
[08:41:57] * elukey writes 100 times "I should not send code reviews early in the morning"
[08:42:12] (CR) Elukey: [C: +2] Revert "Add an-test-coord1001's IPs to Kafka Jumbo's ferm rules" [puppet] - https://gerrit.wikimedia.org/r/633874 (owner: Elukey)
[08:47:09] (CR) Jbond: [C: +1] calico: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/633033 (owner: Dzahn)
[08:47:54] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/633026 (owner: Dzahn)
[08:50:53] (CR) Jbond: [C: +1] "LGTM but also get a +1 from brooke just in case theses variables are populated via openstack" [puppet] - https://gerrit.wikimedia.org/r/633838 (owner: Dzahn)
[08:57:54] (PS3) Kormat: [WIP] mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972)
[09:02:00] (PS4) Kormat: mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972)
[09:07:45] (PS1) Kormat: mariadb: make role::mariadb::core_test use mysql_role [puppet] - https://gerrit.wikimedia.org/r/633958
[09:08:10] (PS5) Kormat: mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972)
[09:15:08] (CR) Kormat: "PCC is a no-op: https://puppet-compiler.wmflabs.org/compiler1003/25871/" [puppet] - https://gerrit.wikimedia.org/r/633958 (owner: Kormat)
[09:20:56] (CR) Kormat: "PCC run is a ~no-op: https://puppet-compiler.wmflabs.org/compiler1002/25873/" [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:21:51] (CR) Gehel: cookbook API: add class API (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: Volans)
[09:29:34] (PS1) Kormat: mariadb: Add db sections to motd. [puppet] - https://gerrit.wikimedia.org/r/633962
[09:30:50] (PS1) Elukey: Fix host override selection for Hadoop cookbooks [cookbooks] - https://gerrit.wikimedia.org/r/633963
[09:34:38] (CR) Hashar: "recheck" [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/621291 (https://phabricator.wikimedia.org/T254465) (owner: Hashar)
[09:37:00] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:37:17] (PS1) Filippo Giunchedi: pontoon: add thanos hiera to o11y [puppet] - https://gerrit.wikimedia.org/r/633965
[09:37:19] (PS1) Filippo Giunchedi: pontoon: add new hosts to o11y [puppet] - https://gerrit.wikimedia.org/r/633966
[09:37:21] (PS1) Filippo Giunchedi: pontoon: add monitoring_hosts to o11y [puppet] - https://gerrit.wikimedia.org/r/633967
[09:38:14] (CR) Gergő Tisza: [C: +1] Disable wgWMEUnderstandingFirstDay (EditorJourney) logging (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/633514 (https://phabricator.wikimedia.org/T252391) (owner: Kosta Harlan)
[09:38:40] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:39:39] (CR) Filippo Giunchedi: [C: +2] pontoon: add thanos hiera to o11y [puppet] - https://gerrit.wikimedia.org/r/633965 (owner: Filippo Giunchedi)
[09:39:45] (CR) Filippo Giunchedi: [C: +2] pontoon: add new hosts to o11y [puppet] - https://gerrit.wikimedia.org/r/633966 (owner: Filippo Giunchedi)
[09:39:50] (CR) Filippo Giunchedi: [C: +2] pontoon: add monitoring_hosts to o11y [puppet] - https://gerrit.wikimedia.org/r/633967 (owner: Filippo Giunchedi)
[09:41:48] Operations, Technical-blog-posts, Traffic: Blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T264729 (ema) >>! In T264729#6540401, @srodlund wrote: > @ema are you the sole author I am. > I credited all of the images in the post to you as you...
[09:43:54] (CR) Marostegui: "if we wanted to have a host with master role but belonging to test_core, that's still doable?" [puppet] - https://gerrit.wikimedia.org/r/633958 (owner: Kormat)
[09:46:01] (CR) Kormat: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/633958 (owner: Kormat)
[09:46:22] (CR) Marostegui: [C: +1] mariadb: make role::mariadb::core_test use mysql_role [puppet] - https://gerrit.wikimedia.org/r/633958 (owner: Kormat)
[09:46:46] (CR) Kormat: [C: +2] mariadb: make role::mariadb::core_test use mysql_role [puppet] - https://gerrit.wikimedia.org/r/633958 (owner: Kormat)
[09:50:12] (CR) Marostegui: [C: +1] mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:50:18] (PS1) Hashar: Attempt to build with llvm 8 [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/633968
[09:50:22] (CR) Kormat: [C: +2] mariadb: Convert role::mariadb::core to profile. [puppet] - https://gerrit.wikimedia.org/r/633769 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:50:24] (CR) jerkins-bot: [V: -1] Attempt to build with llvm 8 [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/633968 (owner: Hashar)
[09:53:03] (Abandoned) Hashar: Attempt to build with llvm 8 [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/633968 (owner: Hashar)
[09:53:35] (PS1) Kormat: dbutil: Simplify typing in read_section_ports_list [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/633969
[09:56:52] (CR) Kormat: [C: +2] dbutil: Simplify typing in read_section_ports_list [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/633969 (owner: Kormat)
[09:58:26] (Merged) jenkins-bot: dbutil: Simplify typing in read_section_ports_list [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/633969 (owner: Kormat)
[09:59:41] !log imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 T264991
[09:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:49] T264991: Upgrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991
[10:08:21] (PS2) Kormat: mariadb: Add db sections to motd. [puppet] - https://gerrit.wikimedia.org/r/633962
[10:09:24] (CR) Kormat: "@marostegui Thoughts?" [puppet] - https://gerrit.wikimedia.org/r/633962 (owner: Kormat)
[10:09:58] (CR) Marostegui: [C: +1] "Oh, nice one!" [puppet] - https://gerrit.wikimedia.org/r/633962 (owner: Kormat)
[10:10:31] (CR) Kormat: [C: +2] mariadb: Add db sections to motd. [puppet] - https://gerrit.wikimedia.org/r/633962 (owner: Kormat)
[10:19:51] (CR) Elukey: [C: +2] Fix host override selection for Hadoop cookbooks [cookbooks] - https://gerrit.wikimedia.org/r/633963 (owner: Elukey)
[10:31:56] (CR) Jbond: [C: +1] "I can't think of a cleaner way, have added a comment inline but also lgtm as is" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[10:32:53] (PS1) Filippo Giunchedi: profile: selectively enable Prometheus compaction [puppet] - https://gerrit.wikimedia.org/r/633971 (https://phabricator.wikimedia.org/T261281)
[10:32:55] (PS1) Filippo Giunchedi: hieradata: re-enable compaction for prometheus[12]003 [puppet] - https://gerrit.wikimedia.org/r/633972 (https://phabricator.wikimedia.org/T261281)
[10:44:43] (PS1) Elukey: profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139)
[10:45:39] (PS2) Elukey: profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139)
[10:45:43] (CR) jerkins-bot: [V: -1] profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[10:46:04] aah snap I realized afterwards, sorry for the -1 spam
[10:46:10] (PS3) Elukey: profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139)
[10:47:11] (CR) jerkins-bot: [V: -1] profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[10:48:54] (PS4) Elukey: profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139)
[10:49:36] (PS2) Bartosz Dziewoński: Enable DiscussionTools as a beta feature on 30 more wikis ("phase 2") [mediawiki-config] - https://gerrit.wikimedia.org/r/633568 (https://phabricator.wikimedia.org/T264693)
[10:50:10] (CR) Elukey: [C: +2] profile::hadoop::master: refactor monitor for HDFS space left [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[10:51:38] (PS2) Vgutierrez: vcl: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% [puppet] - https://gerrit.wikimedia.org/r/633739 (https://phabricator.wikimedia.org/T258405)
[10:52:39] (PS3) Bartosz Dziewoński: Enable DiscussionTools as a beta feature on 30 more wikis ("phase 2") [mediawiki-config] - https://gerrit.wikimedia.org/r/633568 (https://phabricator.wikimedia.org/T264693)
[10:54:47] (PS1) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: Dear deployers, time to do the European mid-day backport window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T1100).
[11:00:04] MatmaRex: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:16] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:00:47] hello
[11:01:56] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:05:03] anyone deploying?
[11:09:37] I’m having lunch right now but could deploy in half an hour or so if nobody else is around
[11:09:46] (PS2) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[11:09:49] only one config change, should be doable :)
[11:10:08] thanks. enjoy the lunch
[11:12:43] (PS3) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[11:13:57] (PS4) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[11:16:41] MatmaRex: I can deploy today :)
[11:16:49] (unless Lucas_WMDE really wants to?)
[11:16:51] !log imported php-igbinary, php-apcu-bc to component/icu63 T264991
[11:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:59] T264991: Upgrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991
[11:17:38] Urbanecm: go ahead ^^
[11:17:42] (CR) Urbanecm: [C: +2] Enable DiscussionTools as a beta feature on 30 more wikis ("phase 2") [mediawiki-config] - https://gerrit.wikimedia.org/r/633568 (https://phabricator.wikimedia.org/T264693) (owner: Bartosz Dziewoński)
[11:18:23] (Merged) jenkins-bot: Enable DiscussionTools as a beta feature on 30 more wikis ("phase 2") [mediawiki-config] - https://gerrit.wikimedia.org/r/633568 (https://phabricator.wikimedia.org/T264693) (owner: Bartosz Dziewoński)
[11:19:09] MatmaRex: can you test at mwdebug2001, please?
[11:19:26] thanks. looking
[11:19:49] (CR) Elukey: geoip: move archive timer from stat1007 to an-launcher1002 (4 comments) [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[11:20:15] Urbanecm: looks fine
[11:20:19] thanks, syncing
[11:21:54] Operations, Analytics-Clusters, Traffic, Patch-For-Review: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (elukey) a:klausman→ema Transitioning the task to Ema since he is following up with upstream to patch Varnish :)
[11:22:16] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: c63632de6a20b2f00da91187e5cf416fd39d8c5b: Enable DiscussionTools as a beta feature on 30 more wikis (T264693) (duration: 01m 15s)
[11:22:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:22] T264693: Make config change to enable Reply Tool as Beta Feature at Phase 2 wikis - https://phabricator.wikimedia.org/T264693
[11:22:27] MatmaRex: can you test at mwdebug2001, please?
[11:22:29] grr
[11:22:30] MatmaRex: done
[11:22:39] :D
[11:22:40] thanks Urbanecm
[11:22:41] anything else?
[11:22:49] happy to help :)
[11:23:05] i think it was just me today. and just that one change
[11:23:10] cool
[11:25:34] !log EU B&C window completed
[11:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:47] (PS5) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[11:41:57] (PS1) Hnowlan: apiportal: enable discussion tools [mediawiki-config] - https://gerrit.wikimedia.org/r/633980 (https://phabricator.wikimedia.org/T260624)
[11:43:18] !log imported php-memcached, php-redis to component/icu63 T264991
[11:43:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:43:24] T264991: Upgrade the MediaWiki appservers to debian buster, icu63 - https://phabricator.wikimedia.org/T264991
[11:44:16] (CR) Hnowlan: [C: +2] api-gateway: more instances in production [deployment-charts] - https://gerrit.wikimedia.org/r/633559 (owner: Hnowlan)
[11:46:35] (Merged) jenkins-bot: api-gateway: more instances in production [deployment-charts] - https://gerrit.wikimedia.org/r/633559 (owner: Hnowlan)
[11:48:08] !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[11:48:08] !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
[11:48:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:25] !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
[11:50:25] !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[11:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:54] (CR) Muehlenhoff: [C: +2] Remove dotfiles for banyek, demon, rush [puppet] - https://gerrit.wikimedia.org/r/633473 (owner: Muehlenhoff)
[11:56:36] (PS1) Matthias Mullie: Add another SDC property to search for matching media statements [mediawiki-config] - https://gerrit.wikimedia.org/r/633982 (https://phabricator.wikimedia.org/T264925)
[12:06:36] Operations, Wikimedia-Mailing-lists: Create arbcom-cs mailinglist for Czech Arbitration Committee - https://phabricator.wikimedia.org/T265472 (Urbanecm)
[12:10:48] (PS6) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[12:17:06] (CR) Muehlenhoff: [C: +2] Add an apt proxy config for deb.debian.org [puppet] - https://gerrit.wikimedia.org/r/633172 (https://phabricator.wikimedia.org/T262647) (owner: Muehlenhoff)
[12:27:01] (PS6) Filippo Giunchedi: swift: change ownership depending on mountpoint status [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613)
[12:27:52] (CR) Filippo Giunchedi: swift: change ownership depending on mountpoint status (1 comment) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[12:32:21] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:34:01] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:37:30] (PS7) Filippo Giunchedi: swift: change ownership depending on mountpoint status [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613)
[12:38:55] (CR) Filippo Giunchedi: swift: change ownership depending on mountpoint status (1 comment) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[12:43:15] (PS8) Filippo Giunchedi: swift: change ownership depending on mountpoint status [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613)
[12:43:32] (CR) Filippo Giunchedi: swift: change ownership depending on mountpoint status (1 comment) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[12:44:04] (CR) Filippo Giunchedi: swift: change ownership depending on mountpoint status (1 comment) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[12:45:40] (CR) Vgutierrez: [C: +2] vcl: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% [puppet] - https://gerrit.wikimedia.org/r/633739 (https://phabricator.wikimedia.org/T258405) (owner: Vgutierrez)
[12:46:41] !log Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - T258405
[12:46:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:47] T258405: Deprecate TLSv1.2 weak ciphersuites - https://phabricator.wikimedia.org/T258405
[12:49:00] (PS7) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[12:50:00] (CR) jerkins-bot: [V: -1] lldp: add additional information to lldp fact [puppet] -
https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[12:52:05] (CR) Jbond: swift: change ownership depending on mountpoint status (3 comments) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[12:55:20] (PS8) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[13:02:37] (PS9) Filippo Giunchedi: swift: change ownership depending on mountpoint status [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613)
[13:03:48] (CR) Filippo Giunchedi: swift: change ownership depending on mountpoint status (3 comments) [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[13:10:42] (CR) Jbond: "ready" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[13:15:24] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[13:15:31] (PS4) Muehlenhoff: Switch gerrit to profile::java [puppet] - https://gerrit.wikimedia.org/r/632224 (https://phabricator.wikimedia.org/T264182)
[13:22:14] (PS19) Kormat: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[13:23:59] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:25:37] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:25:50] Operations, Analytics-Clusters, Traffic, Patch-For-Review: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (ema) I've opened https://github.com/varnishcache/varnish-cache/issues/3436 for 6.5/master, https://github.com/varnishcache/varnish-cache/issues/3437 for 6....
[13:29:07] (PS4) Kosta Harlan: labs: Disable EditorJourney (UnderstandingFirstDay) [mediawiki-config] - https://gerrit.wikimedia.org/r/633514 (https://phabricator.wikimedia.org/T252391)
[13:29:08] (PS1) Kosta Harlan: Disable EditorJourney (UnderstandingFirstDay) [mediawiki-config] - https://gerrit.wikimedia.org/r/634012 (https://phabricator.wikimedia.org/T252391)
[13:29:24] (PS5) Kosta Harlan: labs: Disable EditorJourney (UnderstandingFirstDay) [mediawiki-config] - https://gerrit.wikimedia.org/r/633514 (https://phabricator.wikimedia.org/T252391)
[13:29:38] (PS2) Kosta Harlan: Disable EditorJourney (UnderstandingFirstDay) [mediawiki-config] - https://gerrit.wikimedia.org/r/634012 (https://phabricator.wikimedia.org/T252391)
[13:30:54] (CR) Ottomata: [C: +1] "Hm, ok. But 10% in the prod cluster will still be around 300TB, right? I guess that's good to know as a warning."
[puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[13:30:56] (PS20) Kormat: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[13:37:18] (CR) Filippo Giunchedi: [C: +2] swift: change ownership depending on mountpoint status [puppet] - https://gerrit.wikimedia.org/r/516615 (https://phabricator.wikimedia.org/T225613) (owner: Filippo Giunchedi)
[13:38:04] (PS5) Muehlenhoff: Switch gerrit to profile::java [puppet] - https://gerrit.wikimedia.org/r/632224 (https://phabricator.wikimedia.org/T264182)
[13:39:13] (CR) Ayounsi: lldp: add additional information to lldp fact (2 comments) [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[13:39:56] (PS21) Kormat: mariadb: Apply the list of ports to the core::multiinstance class [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[13:41:52] (CR) Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1002/25881/" [puppet] - https://gerrit.wikimedia.org/r/632224 (https://phabricator.wikimedia.org/T264182) (owner: Muehlenhoff)
[13:42:15] (PS1) Ayounsi: Add Z side device/interface/vlan and cable to PuppetDB importer [software/netbox-extras] - https://gerrit.wikimedia.org/r/634017 (https://phabricator.wikimedia.org/T262899)
[13:43:18] (PS1) Ema: 6.0.6-1wm2: clear vut->sighup even if sighup_f is not defined [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634018 (https://phabricator.wikimedia.org/T264074)
[13:44:03] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:44:04] (PS1) Jbond: puppetdb: mount puppetdb1001 with tmpfs [puppet] - https://gerrit.wikimedia.org/r/634019 (https://phabricator.wikimedia.org/T263578)
[13:44:22] (CR) Elukey: "> Patch Set 4: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/633973 (https://phabricator.wikimedia.org/T255139) (owner: Elukey)
[13:45:43] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:45:46] (CR) Jbond: "updated" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[13:46:09] (CR) Jbond: [C: +2] puppetdb: mount puppetdb1001 with tmpfs [puppet] - https://gerrit.wikimedia.org/r/634019 (https://phabricator.wikimedia.org/T263578) (owner: Jbond)
[13:47:28] (CR) Kormat: "PCC is a noop: https://puppet-compiler.wmflabs.org/compiler1001/25882/" [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[13:48:01] !log disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
[13:48:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:05] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:50:47] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:52:44] (CR) Vgutierrez: [C: +1] 6.0.6-1wm2: clear vut->sighup even if sighup_f is not defined [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634018 (https://phabricator.wikimedia.org/T264074) (owner: Ema)
[13:53:26] !log enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
[13:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:45] (CR) Elukey: [C: +1] "Given my understanding of https://github.com/varnishcache/varnish-cache/blob/6.0/lib/libvarnishapi/vut.c, it looks good. I am still not cl" [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634018 (https://phabricator.wikimedia.org/T264074) (owner: Ema)
[13:59:18] Puppet, Patch-For-Review: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond) moving to tmpfs has definitely made a difference on the command processing time https://grafana.wikimedia.org/d/000000477/puppetdb?viewPanel=7&orgId=1&from=now-30d&to=now&var-datasource...
[14:01:29] !log push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T265183
[14:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:36] T265183: In a k8s world: where does MediaWiki code live? - https://phabricator.wikimedia.org/T265183
[14:01:52] !log push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry.
T264209
[14:01:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:59] T264209: Run stress tests on docker images infrastructure - https://phabricator.wikimedia.org/T264209
[14:02:33] (CR) jerkins-bot: [V: -1] 6.0.6-1wm2: clear vut->sighup even if sighup_f is not defined [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634018 (https://phabricator.wikimedia.org/T264074) (owner: Ema)
[14:06:35] (PS9) Jbond: lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456)
[14:07:01] (PS1) Muehlenhoff: Make the mirror to use configurable via Hiera [puppet] - https://gerrit.wikimedia.org/r/634023 (https://phabricator.wikimedia.org/T262647)
[14:07:04] (PS1) CDanis: Map tiles for 3rd parties: block (403) all requests, not just misses [puppet] - https://gerrit.wikimedia.org/r/634024 (https://phabricator.wikimedia.org/T261424)
[14:09:15] (CR) Ayounsi: [C: +1] "LGTM, thx!" [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[14:09:28] (CR) jerkins-bot: [V: -1] Make the mirror to use configurable via Hiera [puppet] - https://gerrit.wikimedia.org/r/634023 (https://phabricator.wikimedia.org/T262647) (owner: Muehlenhoff)
[14:09:31] (CR) Jbond: [C: +2] lldp: add additional information to lldp fact [puppet] - https://gerrit.wikimedia.org/r/633977 (https://phabricator.wikimedia.org/T265456) (owner: Jbond)
[14:09:41] (CR) CDanis: "VTC: 0 tests failed, 0 tests skipped, 16 tests passed" [puppet] - https://gerrit.wikimedia.org/r/634024 (https://phabricator.wikimedia.org/T261424) (owner: CDanis)
[14:09:43] (CR) Mforns: [C: +1] "LGTM! On our side, this can be merged! Thanks!"
[puppet] - https://gerrit.wikimedia.org/r/633737 (https://phabricator.wikimedia.org/T254332) (owner: Ayounsi)
[14:10:13] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (mforns) Yes, please merge when ready, thanks!
[14:11:17] (PS2) Muehlenhoff: Make the mirror to use configurable via Hiera [puppet] - https://gerrit.wikimedia.org/r/634023 (https://phabricator.wikimedia.org/T262647)
[14:12:07] (CR) Huji: Add 'spamblacklistlog' as a default right for the CU log user (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: Huji)
[14:12:18] (CR) Ayounsi: [C: +2] Nfacct: add src_mask + dst_mask [puppet] - https://gerrit.wikimedia.org/r/633737 (https://phabricator.wikimedia.org/T254332) (owner: Ayounsi)
[14:12:44] !log disable-puppet on deploy1001 to test a change in hemlfile puppet on deploy2001 only - T260917
[14:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:52] T260917: Support TLS for service-to-service communication in k8s staging - https://phabricator.wikimedia.org/T260917
[14:13:17] (CR) JMeybohm: [C: +2] deployment_server::helmfile: Allow default secrets per environment [puppet] - https://gerrit.wikimedia.org/r/631720 (https://phabricator.wikimedia.org/T260917) (owner: JMeybohm)
[14:16:05] (PS22) Kormat: mariadb: core::multiinstance - generate sections.
[puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[14:16:13] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:16:58] Operations, Product-Design-Strategy: Deploy design-strategy Gerrit repo to design.wikimedia.org/design-strategy/ - https://phabricator.wikimedia.org/T265486 (Volker_E)
[14:17:08] Operations, ops-codfw: Degraded RAID on ms-be2036 - https://phabricator.wikimedia.org/T265208 (Papaul) @fgiunchedi yes we do have 3 spares. Will replace when back on site tomorrow.
[14:17:22] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/25884/" [puppet] - https://gerrit.wikimedia.org/r/634023 (https://phabricator.wikimedia.org/T262647) (owner: Muehlenhoff)
[14:17:48] (PS23) Kormat: mariadb: core::multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[14:17:59] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:21:39] (CR) BBlack: [C: +1] Map tiles for 3rd parties: block (403) all requests, not just misses [puppet] - https://gerrit.wikimedia.org/r/634024 (https://phabricator.wikimedia.org/T261424) (owner: CDanis)
[14:23:17] (CR) CDanis: [C: +2] Map tiles for 3rd parties: block (403) all requests, not just misses [puppet] - https://gerrit.wikimedia.org/r/634024 (https://phabricator.wikimedia.org/T261424) (owner: CDanis)
[14:24:18] Operations, Maps, Product-Infrastructure-Team-Backlog, Traffic, Patch-For-Review: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (CDanis) Stalled→Resolved This will be taking effect over the next half hour. If you are affected by this, a...
[14:25:17] (PS24) Kormat: mariadb: core::multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/620899 (https://phabricator.wikimedia.org/T257033) (owner: Jcrespo)
[14:26:59] (PS1) Ema: Do not explicitly create varnish-dbg [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634028
[14:27:04] (Abandoned) CDanis: [WIP] maps: block 3rd parties with 403, even hits [puppet] - https://gerrit.wikimedia.org/r/570156 (https://phabricator.wikimedia.org/T244278) (owner: BBlack)
[14:28:09] (PS1) Jbond: puppetdb::app: add abblity to blacklist facts from puppetdb [puppet] - https://gerrit.wikimedia.org/r/634029 (https://phabricator.wikimedia.org/T263578)
[14:31:13] (PS1) BBlack: Remove redundant upload VCL condition [puppet] - https://gerrit.wikimedia.org/r/634031
[14:31:14] (CR) Esanders: "This will only enable DT as a beta feature. I assumed they wanted to enable it by default."
[mediawiki-config] - https://gerrit.wikimedia.org/r/633980 (https://phabricator.wikimedia.org/T260624) (owner: Hnowlan)
[14:32:42] (CR) CDanis: [C: +1] Remove redundant upload VCL condition [puppet] - https://gerrit.wikimedia.org/r/634031 (owner: BBlack)
[14:33:15] (CR) Ema: [C: +1] Remove redundant upload VCL condition [puppet] - https://gerrit.wikimedia.org/r/634031 (owner: BBlack)
[14:33:22] (CR) BBlack: [C: +2] Remove redundant upload VCL condition [puppet] - https://gerrit.wikimedia.org/r/634031 (owner: BBlack)
[14:34:44] (PS2) Jbond: puppetdb::app: add abblity to blacklist facts from puppetdb [puppet] - https://gerrit.wikimedia.org/r/634029 (https://phabricator.wikimedia.org/T263578)
[14:36:01] (PS1) Kormat: mariadb: misc::multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634034 (https://phabricator.wikimedia.org/T256972)
[14:37:05] (PS3) Ayounsi: Add Z side device/interface/vlan and cable to PuppetDB importer [software/netbox-extras] - https://gerrit.wikimedia.org/r/634017 (https://phabricator.wikimedia.org/T262899)
[14:38:39] (PS2) Kormat: mariadb: misc::multiinstance - generate sections.
[puppet] - https://gerrit.wikimedia.org/r/634034 (https://phabricator.wikimedia.org/T256972)
[14:40:57] (PS2) CDanis: geoip VCL: init/free functions are now reusable [puppet] - https://gerrit.wikimedia.org/r/630314 (https://phabricator.wikimedia.org/T263496)
[14:40:59] (PS3) CDanis: geoip VCL: add a 'which' param to get_geo_xcip [puppet] - https://gerrit.wikimedia.org/r/630315 (https://phabricator.wikimedia.org/T263496)
[14:41:01] (PS4) CDanis: VCL: Attach a variety of GeoIP info as bereq headers; test GeoIP [puppet] - https://gerrit.wikimedia.org/r/630316 (https://phabricator.wikimedia.org/T263496)
[14:41:11] (PS2) Huji: Add 'spamblacklistlog' as a default right for the CU log user [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288)
[14:41:57] (CR) Huji: "This should the right place." [mediawiki-config] - https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: Huji)
[14:42:24] (CR) Kormat: "PCC says no-op: https://puppet-compiler.wmflabs.org/compiler1003/25888/" [puppet] - https://gerrit.wikimedia.org/r/634034 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[14:43:30] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (ayounsi) @aborrero when would be a good time to schedule those changes? knowing that there is a short downtime needed. Probably 2x5min, but I'd schedule 30min just in case...
[14:46:12] (PS5) CDanis: VCL: Attach a variety of GeoIP info as bereq headers; test GeoIP [puppet] - https://gerrit.wikimedia.org/r/630316 (https://phabricator.wikimedia.org/T263496)
[14:46:14] (CR) jerkins-bot: [V: -1] Do not explicitly create varnish-dbg [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/634028 (owner: Ema)
[14:55:04] !log elukey@cumin1001 START - Cookbook sre.hadoop.reboot-workers
[14:55:07] Operations, Maps, Product-Infrastructure-Team-Backlog, Traffic, Patch-For-Review: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (AntiCompositeNumber) I've removed Wikimedia Maps from https://wiki.openstreetmap.org/wiki/Tile_servers.
[14:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:06] !log drain + reboot an-worker109[8,9] to pick up GPU settings - T255138
[14:56:08] (CR) Filippo Giunchedi: [C: +1] Make the mirror to use configurable via Hiera [puppet] - https://gerrit.wikimedia.org/r/634023 (https://phabricator.wikimedia.org/T262647) (owner: Muehlenhoff)
[14:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:12] T255138: Put 6 GPU-based Hadoop worker in service - https://phabricator.wikimedia.org/T255138
[14:56:49] (CR) Volans: [C: +1] "I'm not familiar with this puppet configuration but the change looks sane and if the compiler is happy seems a noop for now, just a prepar" [puppet] - https://gerrit.wikimedia.org/r/634029 (https://phabricator.wikimedia.org/T263578) (owner: Jbond)
[14:59:51] (PS12) Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152)
[15:00:14] (PS1) Itamar Givon: Set Wikidata MF to collapse sections by default [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195)
[15:01:36] (PS2) Itamar
Givon: Set Wikidata MF to collapse sections by default [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195)
[15:08:06] (CR) Hnowlan: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/633980 (https://phabricator.wikimedia.org/T260624) (owner: Hnowlan)
[15:13:45] (PS1) Kormat: mariadb: sanitarium_multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634042 (https://phabricator.wikimedia.org/T256972)
[15:14:54] (CR) jerkins-bot: [V: -1] mariadb: sanitarium_multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634042 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[15:15:11] (PS1) Jbond: puppetdb: blacklist partitions and mountpount facts [puppet] - https://gerrit.wikimedia.org/r/634043
[15:17:10] (PS2) Kormat: mariadb: sanitarium_multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634042 (https://phabricator.wikimedia.org/T256972)
[15:17:30] (PS2) Jbond: puppetdb: blacklist partitions and mountpount facts [puppet] - https://gerrit.wikimedia.org/r/634043
[15:19:12] (PS3) Jbond: puppetdb: blacklist partitions and mountpount facts [puppet] - https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578)
[15:19:43] (CR) Kormat: "PCC is a no-op: https://puppet-compiler.wmflabs.org/compiler1003/25894/" [puppet] - https://gerrit.wikimedia.org/r/634042 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[15:20:05] (CR) Jbond: [C: +2] puppetdb::app: add abblity to blacklist facts from puppetdb [puppet] - https://gerrit.wikimedia.org/r/634029 (https://phabricator.wikimedia.org/T263578) (owner: Jbond)
[15:24:10] !log enabled and ran puppet on deploy1001 - T260917
[15:24:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:16] T260917: Support TLS for service-to-service communication in k8s staging -
https://phabricator.wikimedia.org/T260917
[15:26:02] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
[15:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:37] !log elukey@cumin1001 START - Cookbook sre.hadoop.reboot-workers
[15:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:03] !log drain + reboot an-worker110[1,2] to pick up GPU settings - T255138
[15:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:09] T255138: Put 6 GPU-based Hadoop worker in service - https://phabricator.wikimedia.org/T255138
[15:30:07] (PS3) Hnowlan: mtail: convert mediawiki to use a real histogram [puppet] - https://gerrit.wikimedia.org/r/629653 (https://phabricator.wikimedia.org/T263727) (owner: Giuseppe Lavagetto)
[15:31:11] (CR) jerkins-bot: [V: -1] mtail: convert mediawiki to use a real histogram [puppet] - https://gerrit.wikimedia.org/r/629653 (https://phabricator.wikimedia.org/T263727) (owner: Giuseppe Lavagetto)
[15:31:13] (PS1) Kormat: mariadb: dbstore_multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634044 (https://phabricator.wikimedia.org/T256972)
[15:32:22] (CR) jerkins-bot: [V: -1] mariadb: dbstore_multiinstance - generate sections. [puppet] - https://gerrit.wikimedia.org/r/634044 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[15:34:49] (PS2) Kormat: mariadb: dbstore_multiinstance - generate sections.
[puppet] - https://gerrit.wikimedia.org/r/634044 (https://phabricator.wikimedia.org/T256972)
[15:38:52] (CR) Lucas Werkmeister (WMDE): Set Wikidata MF to collapse sections by default (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195) (owner: Itamar Givon)
[15:42:34] (PS4) Hnowlan: mtail: convert mediawiki to use a real histogram [puppet] - https://gerrit.wikimedia.org/r/629653 (https://phabricator.wikimedia.org/T263727) (owner: Giuseppe Lavagetto)
[15:45:00] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2011.codfw.wmnet - https://phabricator.wikimedia.org/T264261 (Papaul) ` papaul@asw-b-codfw# show | compare [edit interfaces interface-range vlan-private1-b-codfw] - member ge-1/0/9; [edit interfaces interface-range disabled]...
[15:45:30] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2011.codfw.wmnet - https://phabricator.wikimedia.org/T264261 (Papaul)
[15:45:34] (CR) Volans: puppetdb: blacklist partitions and mountpount facts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578) (owner: Jbond)
[15:48:06] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2015.codfw.wmnet - https://phabricator.wikimedia.org/T264700 (Papaul) ` papaul@asw-c-codfw# show | compare [edit interfaces interface-range vlan-private1-c-codfw] - member ge-1/0/1; [edit interfaces interface-range disabled]...
[15:48:30] Operations, ops-codfw, DC-Ops, decommission-hardware: decommission es2015.codfw.wmnet - https://phabricator.wikimedia.org/T264700 (Papaul)
[15:48:55] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (aborrero) This change has impacts to Toolforge (NFS, databases, etc). We want to reduce the downtime, i.e, failover things etc.
For this it would be good if we can do this...
[15:50:30] Operations, Maps, Product-Infrastructure-Team-Backlog, Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (Elitre) Thanks to everyone involved.
[15:52:16] Operations, ops-codfw, decommission-hardware: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Papaul) ` papaul@asw-d-codfw# show | compare [edit interfaces interface-range vlan-private1-d-codfw] - member ge-1/0/5; [edit interfaces interface-range disabled] member...
[15:52:37] Operations, ops-codfw, decommission-hardware: decommission es2016.codfw.wmnet - https://phabricator.wikimedia.org/T264156 (Papaul)
[15:54:27] (PS1) Ayounsi: PuppetDB microservice: allow LLDP fact [puppet] - https://gerrit.wikimedia.org/r/634048 (https://phabricator.wikimedia.org/T262899)
[15:54:50] Operations, Maps, Product-Infrastructure-Team-Backlog, Traffic, Epic: Support maps serving for affiliate sites via an allow list - https://phabricator.wikimedia.org/T261694 (CDanis) >>! In T261694#6538860, @MSantos wrote: > @CDanis and @Dzahn as per T261424#6538173, is there anything else to...
[15:55:45] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
[15:55:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:51] PROBLEM - Host an-worker1102 is DOWN: PING CRITICAL - Packet loss = 100%
[15:55:53] RECOVERY - Host an-worker1102 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[15:58:21] (CR) Jbond: "updated" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578) (owner: Jbond)
[15:58:50] !log elukey@cumin1001 START - Cookbook sre.hadoop.reboot-workers
[15:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:02] !log drain + reboot an-worker1100 to pick up GPU settings - T255138
[15:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:08] T255138: Put 6 GPU-based Hadoop worker in service - https://phabricator.wikimedia.org/T255138
[16:01:17] Operations, Thumbor, Wikimedia-SVG-rendering, Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (AntiCompositeNumber)
[16:01:19] (PS1) Arturo Borrero Gonzalez: network: constants: add cloud floating IP ranges [puppet] - https://gerrit.wikimedia.org/r/634050
[16:04:18] (CR) Andrew Bogott: [C: +1] "This looks like something that should have been in here years ago. Nonetheless, let's get buy-in from a few other SRE folks before mergin" [puppet] - https://gerrit.wikimedia.org/r/634050 (owner: Arturo Borrero Gonzalez)
[16:12:31] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
[16:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:19] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.)
is CRITICAL: Test Machine translate an HTML fragment using TestClient, adapt the links to target language wiki. returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/CX [16:15:01] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX [16:26:11] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad CPU usage over 85% - https://phabricator.wikimedia.org/T238036 (10RobH) 05Open→03Resolved a:03RobH If this happens again on any scs device, other than scs-a8-eqiad, it means the firmware update to 4.9.0u1 (fleetwide) doesn't fix the CPU spike issue.... [16:26:13] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad CPU usage over 85% - https://phabricator.wikimedia.org/T238036 (10RobH) a:05RobH→03None [16:27:45] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:29:25] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:34:16] 10Operations, 10Thumbor, 10Wikimedia-SVG-rendering, 10Upstream: SVG fails to render properly due to several issues - https://phabricator.wikimedia.org/T46016 (10AntiCompositeNumber) [16:38:02] (03CR) 10CRusnov: [C: 03+1] "LGTM :)" [puppet] - 10https://gerrit.wikimedia.org/r/634048 (https://phabricator.wikimedia.org/T262899) (owner: 10Ayounsi) [16:39:41] (03PS4) 10Jbond: puppetdb: blacklist dynamicly generated facts [puppet] - 10https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578) [16:39:43] (03CR) 10Ayounsi: [C: 03+2] PuppetDB microservice: allow LLDP fact [puppet] - 10https://gerrit.wikimedia.org/r/634048 (https://phabricator.wikimedia.org/T262899) (owner: 10Ayounsi) [16:42:49] (03CR) 10Volans: [C: 03+1] "LGTM, let's see if they have any impact, for sure will make the /facts page on puppetboard usable again (nowadays times out)" [puppet] - 10https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578) (owner: 10Jbond) [16:43:54] (03PS2) 10Andrew Bogott: exim: add toolforge.org domain [puppet] - 10https://gerrit.wikimedia.org/r/619851 [16:45:37] (03CR) 10Andrew Bogott: "It's been so long since I wrote this that I can't remember what prompted it. Nevertheless it seems like it's probably right!" [puppet] - 10https://gerrit.wikimedia.org/r/619851 (owner: 10Andrew Bogott) [16:50:36] (03CR) 10Muehlenhoff: [C: 03+1] "Sounds good, I'm pretty sure noone uses these and as said in the meeting it's also easy to rollback." [puppet] - 10https://gerrit.wikimedia.org/r/634043 (https://phabricator.wikimedia.org/T263578) (owner: 10Jbond) [16:56:21] 10Operations, 10Technical-blog-posts, 10Traffic: Blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T264729 (10srodlund) @ema This has been published! 
https://techblog.wikimedia.org/2020/10/14/wikimedias-cdn/ Can you take a quick look and let me know... [17:06:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:08:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:11:40] (03PS3) 10DannyS712: Partially revert "[labs] Remove wmgMonologChannels override" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 [17:11:46] (03PS4) 10DannyS712: Partially revert "[labs] Remove wmgMonologChannels override" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 [17:12:25] (03PS5) 10DannyS712: Revert "[labs] Remove wmgMonologChannels override" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 [17:13:06] (03CR) 10DannyS712: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 (owner: 10DannyS712) [17:25:36] (03PS10) 10Volans: cookbook API: add class API [software/spicerack] - 10https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) [17:25:38] (03PS1) 10Volans: Refactoring: rename internal modules [software/spicerack] - 10https://gerrit.wikimedia.org/r/634056 (https://phabricator.wikimedia.org/T221212) [17:28:51] (03CR) 10Volans: "Addressed comments, replies inline" (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/506956 (https://phabricator.wikimedia.org/T221212) (owner: 10Volans) [17:32:37] (03PS4) 10Ayounsi: Add Z side device/interface/vlan and cable to PuppetDB importer [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/634017 (https://phabricator.wikimedia.org/T262899) [17:45:02] (03CR) 10Urbanecm: [C: 04-1] 
"nitpick" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 (owner: 10DannyS712) [17:46:50] (03PS1) 10Zoranzoki21: Add rollbacker right on uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) [17:47:09] (03CR) 10Urbanecm: labs: Disable EditorJourney (UnderstandingFirstDay) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633514 (https://phabricator.wikimedia.org/T252391) (owner: 10Kosta Harlan) [17:48:19] (03CR) 10Urbanecm: Add 'spamblacklistlog' as a default right for the CU log user (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: 10Huji) [17:48:33] (03CR) 10Urbanecm: [C: 03+1] "I can merge this in ~10 minutes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: 10Huji) [17:51:10] (03CR) 10Urbanecm: [C: 04-1] Add rollbacker right on uzwiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) (owner: 10Zoranzoki21) [17:51:17] (03PS6) 10DannyS712: Revert "[labs] Remove wmgMonologChannels override" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 [17:51:20] (03CR) 10DannyS712: Revert "[labs] Remove wmgMonologChannels override" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 (owner: 10DannyS712) [17:52:28] (03CR) 10Urbanecm: [C: 03+2] "no-op for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 (owner: 10DannyS712) [17:53:13] (03Merged) 10jenkins-bot: Revert "[labs] Remove wmgMonologChannels override" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633761 (owner: 10DannyS712) [17:53:49] thanks DannyS712 - should take effect soon [17:54:41] (03CR) 10Urbanecm: [C: 04-1] GrowthExperiments: Default to variant D on testwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633889 (owner: 
10Gergő Tisza) [17:57:26] (03PS2) 10Zoranzoki21: Add rollbacker right on uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) [17:58:47] (03CR) 10Zoranzoki21: Add rollbacker right on uzwiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) (owner: 10Zoranzoki21) [18:00:04] marxarelli and longma: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Train log triage with CPT deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T1800). [18:00:04] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:00:55] (03PS3) 10Zoranzoki21: Add rollbacker right on uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) [18:07:13] (03CR) 10Urbanecm: [C: 03+2] Add 'spamblacklistlog' as a default right for the CU log user [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: 10Huji) [18:08:10] (03Merged) 10jenkins-bot: Add 'spamblacklistlog' as a default right for the CU log user [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633855 (https://phabricator.wikimedia.org/T239288) (owner: 10Huji) [18:08:34] (03CR) 10Dzahn: "thanks for merging this :)" [puppet] - 10https://gerrit.wikimedia.org/r/633022 (owner: 10Dzahn) [18:09:17] (03PS1) 10Dave Pifke: Enable $wgImagePreconnect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582) [18:10:25] !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: 0da89998e4e380f3ebe527a42a47dc66c49ee4d2: Add spamblacklistlog as a default right for the CU log user (T239288) 
(duration: 01m 05s) [18:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:41] (03PS2) 10Dave Pifke: Enable $wgImagePreconnect [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582) [18:11:50] (03CR) 10Urbanecm: [C: 03+2] Add rollbacker right on uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) (owner: 10Zoranzoki21) [18:12:36] (03Merged) 10jenkins-bot: Add rollbacker right on uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634057 (https://phabricator.wikimedia.org/T265509) (owner: 10Zoranzoki21) [18:14:16] (03CR) 10RLazarus: [C: 03+1] mediawiki: replace font package ttf-alee with fonts-alee [puppet] - 10https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991) (owner: 10Dzahn) [18:14:33] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: d6a56bb7fb762c53db5965f2698a93db2433d33d: Add rollbacker right on uzwiki (T265509) (duration: 01m 04s) [18:14:37] (03CR) 10Krinkle: "For prod we'll want to vary this in IS.php by wiki, e.g. group0/group1 at first so we can confirm the perf improvement but also take a few" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582) (owner: 10Dave Pifke) [18:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:38] T265509: Request for upload a rollbacker's right to uzwiki - https://phabricator.wikimedia.org/T265509 [18:14:48] * Urbanecm is done with deploying [18:14:49] Urbanecm: Oh, you've +2'ed my patch. Is checking on mwdebug needed? [18:15:00] Kizule: no, I just did that for you ;) [18:15:04] it's live [18:15:04] (03CR) 10Herron: "Why are we adding this domain?" [puppet] - 10https://gerrit.wikimedia.org/r/619851 (owner: 10Andrew Bogott) [18:15:35] Urbanecm: Okay, it is great. Can you do T265347? 
[18:15:36] T265347: My wikidata watchlist is impossible to be emptied - https://phabricator.wikimedia.org/T265347 [18:16:09] sure, gimme a sec [18:17:52] (03CR) 10BryanDavis: [C: 03+1] "I don't think we actually have a mail server anywhere for toolforge.org right now, but someday™ we will do the work to setup toolforge.org" [puppet] - 10https://gerrit.wikimedia.org/r/619851 (owner: 10Andrew Bogott) [18:23:01] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/25897/lvs5001.eqsin.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/633026 (owner: 10Dzahn) [18:28:22] !log wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # T265347 [18:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:29] T265347: My wikidata watchlist is impossible to be emptied - https://phabricator.wikimedia.org/T265347 [18:28:55] Urbanecm why not use the maintenance script? [18:29:03] because it works only for update queries now [18:29:11] ah [18:29:25] (03CR) 10Dzahn: "noop on lvs4005, lvs1013, lvs5001.." [puppet] - 10https://gerrit.wikimedia.org/r/633026 (owner: 10Dzahn) [18:29:36] actually, since late 2016 [18:29:44] https://github.com/wikimedia/mediawiki/commit/4c8c5c434f1f620e12710d7a81198775e6dd3116, if curious [18:34:05] (03CR) 10Dzahn: [V: 03+1 C: 03+2] calico: hiera->lookup, add data types [puppet] - 10https://gerrit.wikimedia.org/r/633033 (owner: 10Dzahn) [18:34:12] (03PS2) 10Gergő Tisza: GrowthExperiments: On testwiki, enable variant C/D for now users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/633889 [18:34:28] Kizule: done [18:34:37] (03PS3) 10Dzahn: mediawiki: replace font package ttf-alee with fonts-alee [puppet] - 10https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991) [18:35:17] Urbanecm: Great, thanks! 
:) [18:35:22] np [18:38:42] (03PS1) 10Krinkle: Make attribution source logic more defensive [extensions/NavigationTiming] (wmf/1.36.0-wmf.13) - 10https://gerrit.wikimedia.org/r/634002 (https://phabricator.wikimedia.org/T263599) [18:39:35] (03PS2) 10Effie Mouzeli: varnish: check for pageview=0 value in X-Analytics header [puppet] - 10https://gerrit.wikimedia.org/r/629735 (https://phabricator.wikimedia.org/T263683) [18:39:42] (03CR) 10Effie Mouzeli: varnish: check for pageview=0 value in X-Analytics header [puppet] - 10https://gerrit.wikimedia.org/r/629735 (https://phabricator.wikimedia.org/T263683) (owner: 10Effie Mouzeli) [18:40:20] (03PS3) 10Effie Mouzeli: varnish: check for debug=1 value in X-Analytics header [puppet] - 10https://gerrit.wikimedia.org/r/629735 (https://phabricator.wikimedia.org/T263683) [18:49:40] (03PS1) 10Dzahn: mediawiki::fonts: remove support and special case for jessie [puppet] - 10https://gerrit.wikimedia.org/r/634062 [18:50:36] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::fonts: remove support and special case for jessie [puppet] - 10https://gerrit.wikimedia.org/r/634062 (owner: 10Dzahn) [18:50:55] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/25898/" [puppet] - 10https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991) (owner: 10Dzahn) [18:51:21] awww. was +1 and -1 after a simple rebase, nice jerkins :) [18:51:34] haha jerkins [18:53:35] hehe, yea. it changes names based on +1 or -1, but i was wrong, different patch [18:53:52] (03CR) 10Muehlenhoff: [C: 04-1] "Still needed by scb" [puppet] - 10https://gerrit.wikimedia.org/r/634062 (owner: 10Dzahn) [18:55:50] (03CR) 10Dzahn: "arrr. yes, thanks. blaming graphoid !" 
[puppet] - 10https://gerrit.wikimedia.org/r/634062 (owner: 10Dzahn) [18:56:04] (03Abandoned) 10Dzahn: mediawiki::fonts: remove support and special case for jessie [puppet] - 10https://gerrit.wikimedia.org/r/634062 (owner: 10Dzahn) [18:59:52] (03CR) 10Dzahn: "[cumin1001:~] $ sudo cumin 'C:role::netmon' "lsb_release -c"" [puppet] - 10https://gerrit.wikimedia.org/r/633824 (owner: 10Dzahn) [19:00:05] marxarelli and longma: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T1900). [19:01:01] planning on deploying in ~ 20 after a stretch and food break following back-to-back morning meetings [19:04:17] (03CR) 10Dzahn: "new package now installed on testvm1001, noop on prod appservers" [puppet] - 10https://gerrit.wikimedia.org/r/633275 (https://phabricator.wikimedia.org/T264991) (owner: 10Dzahn) [19:12:22] (03PS13) 10Razzi: geoip: move archive timer from stat1007 to an-launcher1002 [puppet] - 10https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) [19:14:08] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet [19:14:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:33] !log depooling 5 of the older parsoid servers in codfw [19:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:17] (03CR) 10Razzi: "https://puppet-compiler.wmflabs.org/compiler1002/25899/" [puppet] - 10https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: 10Razzi) [19:32:43] (03PS1) 10Dduvall: group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634071 [19:32:45] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634071 (owner: 10Dduvall) [19:33:02] !log mx1001/mx2001 - temp. 
disabled puppet, live hacking urgent alias change since private repo needs to be fixed [19:33:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:28] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634071 (owner: 10Dduvall) [19:38:30] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13 [19:38:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:45] RECOVERY - Ensure local MW versions match expected deployment on mw2279 is OK: OKAY: Not alerting due to fresh production wikiversions: 956 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [19:39:34] !log dduvall@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s) [19:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:48] !log 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates (T263179) [19:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:54] T263179: 1.36.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T263179 [19:48:15] (03PS3) 10Aklapper: Adjust CSP header for pdfs & videos & set enforce on testwiki [puppet] - 10https://gerrit.wikimedia.org/r/547929 (https://phabricator.wikimedia.org/T117618) (owner: 10Brian Wolff) [19:56:13] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:57:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:00:04] chrisalbon and accraze: (Dis)respected human, time to deploy Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T2000). Please do the needful. [20:08:33] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/633824 (owner: 10Dzahn) [20:09:41] (03CR) 10BryanDavis: [C: 03+2] webservice: restore setting backend via service.manifest [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/628200 (https://phabricator.wikimedia.org/T263190) (owner: 10BryanDavis) [20:10:39] (03Merged) 10jenkins-bot: webservice: restore setting backend via service.manifest [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/628200 (https://phabricator.wikimedia.org/T263190) (owner: 10BryanDavis) [20:12:45] (03PS1) 10Kosta Harlan: labs: Fix config definition for wgAbuseFilterEmergencyDisableThreshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634074 (https://phabricator.wikimedia.org/T230305) [20:14:08] (03PS1) 10BryanDavis: d/changelog: Prepare for 0.74 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/634075 [20:14:59] (03CR) 10jerkins-bot: [V: 04-1] d/changelog: Prepare for 0.74 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/634075 (owner: 10BryanDavis) [20:17:34] (03PS2) 10BryanDavis: d/changelog: Prepare for 0.74 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/634075 [20:19:56] (03CR) 10BryanDavis: [C: 03+2] d/changelog: Prepare for 0.74 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/634075 (owner: 10BryanDavis) [20:20:45] (03Merged) 10jenkins-bot: d/changelog: Prepare for 0.74 release [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/634075 (owner: 10BryanDavis) [20:32:38] !log rolling back group1 due to malformed html in nav menu 
[20:32:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:53] kaldari: ^
[20:34:05] kaldari: would you mind filing a task while i perform the rollback?
[20:34:06] (PS8) BryanDavis: Make `webservice shell` scriptable [software/tools-webservice] - https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695)
[20:35:35] will do
[20:35:51] (PS1) Dduvall: Revert "group1 wikis to 1.36.0-wmf.13" [mediawiki-config] - https://gerrit.wikimedia.org/r/634079
[20:35:53] (CR) Dduvall: [C: +2] Revert "group1 wikis to 1.36.0-wmf.13" [mediawiki-config] - https://gerrit.wikimedia.org/r/634079 (owner: Dduvall)
[20:36:32] (Merged) jenkins-bot: Revert "group1 wikis to 1.36.0-wmf.13" [mediawiki-config] - https://gerrit.wikimedia.org/r/634079 (owner: Dduvall)
[20:36:47] kaldari: awesome. thank you
[20:37:40] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
[20:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:28] !log group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: T263179)
[20:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:34] T263179: 1.36.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T263179
[20:40:41] https://phabricator.wikimedia.org/T265543
[20:40:42] (PS1) Hashar: Link to static libclang [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/634080 (https://phabricator.wikimedia.org/T255465)
[20:40:50] looks like Jon filed one
[20:45:25] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:47:09] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:47:30] marxarelli: Thanks for your help! I'm talking with the Web engineers about it now.
[20:47:46] bug was filed at https://phabricator.wikimedia.org/T265543
[20:50:10] (PS1) Jdlrobson: Stylesheet needs to be compatible with cached HTML [skins/Vector] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634086 (https://phabricator.wikimedia.org/T265543)
[20:51:17] (CR) Kaldari: [C: +2] Stylesheet needs to be compatible with cached HTML [skins/Vector] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634086 (https://phabricator.wikimedia.org/T265543) (owner: Jdlrobson)
[20:51:32] kaldari: thanks for pinging about this, I think marxarelli had to duck out, but let me know if/when there are backports needed
[20:52:23] thcipriani: will do. I imagine Jon will want to try again after his fix is merged.
[20:52:34] ack
[20:53:32] but no rush
[20:54:37] (PS2) Hashar: Link to static libclang [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/634080 (https://phabricator.wikimedia.org/T254465)
[20:55:39] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:55:46] (CR) Hashar: "The build issue should be solved by https://gerrit.wikimedia.org/r/c/operations/debs/doxygen/+/634080 which I should squash into this chan" [debs/doxygen] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/621291 (https://phabricator.wikimedia.org/T254465) (owner: Hashar)
[20:57:21] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:11:32] (Merged) jenkins-bot: Stylesheet needs to be compatible with cached HTML [skins/Vector] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634086 (https://phabricator.wikimedia.org/T265543) (owner: Jdlrobson)
[21:13:22] (PS3) Dave Pifke: Enable $wgImagePreconnect in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582)
[21:15:13] Jdlrobson: stylesheet change is on mwdebug2001, check please
[21:16:41] (PS4) Dave Pifke: Enable $wgImagePreconnect in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582)
[21:17:13] (CR) Jdlrobson: [C: +1] Make attribution source logic more defensive [extensions/NavigationTiming] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634002 (https://phabricator.wikimedia.org/T263599) (owner: Krinkle)
[21:17:34] (CR) Thcipriani: [C: +2] Make attribution source logic more defensive [extensions/NavigationTiming] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634002 (https://phabricator.wikimedia.org/T263599) (owner: Krinkle)
[21:20:26] Jdlrobson: does mwdebug2001 lgt you?
[21:21:48] (PS5) Dave Pifke: Enable $wgImagePreconnect in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582)
[21:27:08] (CR) Krinkle: [C: +2] Enable $wgImagePreconnect in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582) (owner: Dave Pifke)
[21:27:55] (Merged) jenkins-bot: Enable $wgImagePreconnect in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/634058 (https://phabricator.wikimedia.org/T123582) (owner: Dave Pifke)
[21:29:11] longma: any chance you could check if styles on meta.wikimedia.org still look messed up, and if so check again on mwdebug?
[21:29:22] i just remembered we ran into this during pairing session yesterday.
[21:30:18] thcipriani: sorry, I didnt' realise deployment was still active.
[21:30:22] x.x yes I was just remembering this
[21:30:26] want me to rollback? or avoid syncing that :/
[21:31:04] meta looks alright to me
[21:31:10] brennen ^
[21:31:53] Krinkle: you could sync-file safely probably. I'm waiting on Jdlrobson to check things I've staged for wmf.13 on mwdebug2001.
[21:31:54] thcipriani: looking
[21:31:54] thx. i guess we sort of don't have a repro case for that at the moment then.
[21:32:19] thcipriani: that's ok, dpifke and I will wait.
[21:32:22] thcipriani: i think this should be fine.
[21:32:28] just fyi that there's an unpulled change in wmf-config :)
[21:32:30] Jdlrobson: k, going live
[21:33:13] Krinkle: noted :) I'll ping you when you're clear
[21:33:29] (PS1) Andrew Bogott: Revert "nova fullstack test: run tests every 10 minutes and increase timeouts" [puppet] - https://gerrit.wikimedia.org/r/634084
[21:33:48] !log thcipriani@deploy1001 Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: [[gerrit:634086|Stylesheet needs to be compatible with cached HTML]] T265543 (duration: 01m 07s)
[21:33:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:56] T265543: UI Regression: Personal tools menu is appearing unstyled for anonymous users on cached HTML - https://phabricator.wikimedia.org/T265543
[21:34:00] ^ Jdlrobson live now
[21:34:05] (CR) Andrew Bogott: [C: +2] Revert "nova fullstack test: run tests every 10 minutes and increase timeouts" [puppet] - https://gerrit.wikimedia.org/r/634084 (owner: Andrew Bogott)
[21:34:49] Krinkle: I'm waiting on tests for another few minutes for the other backport if you want to sneak in a deploy
[21:34:51] thcipriani: double checking
[21:35:37] LGTM. no caching issues in about 50 random pages
[21:35:56] \o/
[21:36:23] dpifke: wanna pull and stage on mwdebug2001?
[21:36:36] * Krinkle opens logstash mwdebug dash
[21:36:57] just waiting to see if Uncaught TypeError: Cannot read property 'localName' of null drops from the log as expected in 5 mins when we roll over to new JS
[21:37:37] Sure.
[21:41:01] OK, it's on mwdebug2001.
[21:43:03] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:44:45] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:45:14] dpifke: confirmed on purged mediawiki.org main page
[21:45:31] I'm seeing some errors in logstash for curl timeouts
[21:45:39] that's highly unusual, there are never warnings there
[21:45:50] will be unrelated surely but that's def an issue.
[21:46:02] thcipriani: were those present earlier during your deploys?
[21:46:13] (at mwdebug log dash)
[21:46:30] Krinkle: I hadn't seen those earlier...
[21:48:11] dpifke: anyway, all clear I think?
[21:48:44] I think so.
[21:49:17] scap sync-file'ing now.
[21:49:29] PROBLEM - Ensure local MW versions match expected deployment on mw2279 is CRITICAL: CRITICAL: 956 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[21:51:12] !log dpifke@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 (T123582) (duration: 01m 03s)
[21:51:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:19] T123582: Use "preconnect" resource hint for thumbnail host - https://phabricator.wikimedia.org/T123582
[21:51:31] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:51:50] (Merged) jenkins-bot: Make attribution source logic more defensive [extensions/NavigationTiming] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634002 (https://phabricator.wikimedia.org/T263599) (owner: Krinkle)
[21:52:05] Looks good on this end.
[21:53:20] (CR) Razzi: "Thanks for your comments Elukey. I'm generally in favor of following best practices / conventions even if the end result is the same, whic" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/633032 (https://phabricator.wikimedia.org/T264152) (owner: Razzi)
[21:54:55] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:55:00] Krinkle: dpifke all clear for other deploys?
[21:55:14] I defer to dpifke
[21:55:37] Yes. I think we're going to leave it in group0 for at least a day or so.
[21:55:51] ack, thanks
[21:56:13] Jdlrobson: I staged the navtiming patch on mwdebug2001 if you could check
[21:56:49] thcipriani, longma: i'm back and can take the reins for group1 re-deploy if that works
[21:57:30] looking
[21:57:52] in terms of errors we'll need to sync to confirm
[21:57:56] thcipriani, longma: thanks for jumping in on such short notice!
[21:59:17] thcipriani: i cant confirm it's behaving correctly but I assume performance team did that. I can confirm it's running without error
[21:59:23] syncing should confirm the error is fixed
[22:00:14] Jdlrobson: ack, going live
[22:01:31] !log thcipriani@deploy1001 Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: [[gerrit:634002|Make attribution source logic more defensive]] T263599 (duration: 01m 05s)
[22:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:37] ^ Jdlrobson live
[22:01:37] T263599: Uncaught TypeError: Cannot read property 'localName' of null in emitLayoutShift - https://phabricator.wikimedia.org/T263599
[22:01:57] marxarelli: longma backports for blockers are done, I'll happily return the train to you all now :)
[22:02:18] thx thcipriani!
[22:02:25] thanks!
[22:03:19] ill respond with a nice graph as soon as i can [22:03:34] (03CR) 10DannyS712: [C: 03+1] labs: Fix config definition for wgAbuseFilterEmergencyDisableThreshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634074 (https://phabricator.wikimedia.org/T230305) (owner: 10Kosta Harlan) [22:03:45] I do enjoy a nice graph [22:04:33] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive [22:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:07:26] marxarelli: should I go ahead and do the deploy? [22:08:18] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s) [22:08:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:04] (03PS1) 10Dduvall: group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634085 [22:10:06] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634085 (owner: 10Dduvall) [22:10:13] longma: ^ :) [22:10:25] i'll take it. sorry i had to duck out on short notice [22:10:46] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634085 (owner: 10Dduvall) [22:11:17] 👍 sorry about the miscommunication. I thought you might have lost your internet connection so I was just about to do it but then I saw the message here [22:11:53] ha! 
that's a fair assumption given my debacles with at&t fiber [22:12:15] RECOVERY - Ensure local MW versions match expected deployment on mw2279 is OK: OKAY: Not alerting due to fresh production wikiversions: 956 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [22:12:27] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13 [22:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:31] !log dduvall@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s) [22:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:24:42] (03PS3) 10MichaelSchoenitzer: Add fd and ripgrep to toolforge [puppet] - 10https://gerrit.wikimedia.org/r/633583 (https://phabricator.wikimedia.org/T219501) [22:24:50] (03PS1) 10Ebernhardson: Revert "cirrus: Increase more_like cache from one to three days" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634089 [22:25:43] (03PS4) 10MichaelSchoenitzer: Add fd and ripgrep to toolforge [puppet] - 10https://gerrit.wikimedia.org/r/633583 (https://phabricator.wikimedia.org/T219501) [22:29:15] (03PS1) 10Cwhite: prometheus: ensure new prometheus-rsyslog-exporter version [puppet] - 10https://gerrit.wikimedia.org/r/634112 (https://phabricator.wikimedia.org/T210137) [22:29:18] (03CR) 10MichaelSchoenitzer: "> Patch Set 2: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/633583 (https://phabricator.wikimedia.org/T219501) (owner: 10MichaelSchoenitzer) [22:31:52] (03PS1) 10MichaelSchoenitzer: Add neovim – a modern fork of vim – to toolforge [puppet] - 10https://gerrit.wikimedia.org/r/634113 [22:32:03] (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/633971 (https://phabricator.wikimedia.org/T261281) (owner: 10Filippo Giunchedi) [22:32:33] (03CR) 10Cwhite: [C: 03+1] hieradata: re-enable compaction for prometheus[12]003 [puppet] - 
10https://gerrit.wikimedia.org/r/633972 (https://phabricator.wikimedia.org/T261281) (owner: 10Filippo Giunchedi) [22:32:51] (03CR) 10MichaelSchoenitzer: "The easier part of 633583…" [puppet] - 10https://gerrit.wikimedia.org/r/634113 (owner: 10MichaelSchoenitzer) [22:33:58] (03PS2) 10MichaelSchoenitzer: Add neovim – a modern fork of vim – to toolforge [puppet] - 10https://gerrit.wikimedia.org/r/634113 (https://phabricator.wikimedia.org/T219501) [22:37:19] PROBLEM - ping-offload grafana alert on alert1001 is CRITICAL: CRITICAL: Ping offload ( https://grafana.wikimedia.org/d/000000513/ping-offload ) is alerting: target IP missing on hosts loopback. https://wikitech.wikimedia.org/wiki/Ping_offload%23InAddrErrors_alert https://grafana.wikimedia.org/d/000000513/ [22:38:39] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression [22:38:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:43] PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops [22:38:55] RECOVERY - ping-offload grafana alert on alert1001 is OK: OK: Ping offload ( https://grafana.wikimedia.org/d/000000513/ping-offload ) is not alerting. 
https://wikitech.wikimedia.org/wiki/Ping_offload%23InAddrErrors_alert https://grafana.wikimedia.org/d/000000513/ [22:41:04] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s) [22:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:25] (03PS2) 10Ebernhardson: Revert "cirrus: Increase more_like cache from one to three days" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634089 [22:54:30] (03PS2) 10Urbanecm: labs: Fix config definition for wgAbuseFilterEmergencyDisableThreshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634074 (https://phabricator.wikimedia.org/T230305) (owner: 10Kosta Harlan) [22:54:48] (03CR) 10Urbanecm: [C: 03+2] "thanks Kosta" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634074 (https://phabricator.wikimedia.org/T230305) (owner: 10Kosta Harlan) [22:55:27] (03Merged) 10jenkins-bot: labs: Fix config definition for wgAbuseFilterEmergencyDisableThreshold [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634074 (https://phabricator.wikimedia.org/T230305) (owner: 10Kosta Harlan) [23:00:04] RoanKattouw, Niharika, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201014T2300). [23:00:04] ebernhardson: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:36] \o/ [23:00:54] ebernhardson: i guess you'll self-deploy? :-) [23:02:31] RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops [23:03:12] Urbanecm: yup i'll take care of it [23:03:17] (03PS1) 10CDanis: bump FNM mbps threshold [puppet] - 10https://gerrit.wikimedia.org/r/634115 [23:03:25] (y) [23:03:52] (03PS3) 10Ebernhardson: Revert "cirrus: Increase more_like cache from one to three days" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634089 (https://phabricator.wikimedia.org/T264053) [23:04:06] (03CR) 10Ebernhardson: [C: 03+2] Revert "cirrus: Increase more_like cache from one to three days" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634089 (https://phabricator.wikimedia.org/T264053) (owner: 10Ebernhardson) [23:05:42] (03Merged) 10jenkins-bot: Revert "cirrus: Increase more_like cache from one to three days" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634089 (https://phabricator.wikimedia.org/T264053) (owner: 10Ebernhardson) [23:08:05] (03PS1) 10Catrope: GrowthExperiments: Enable variant C/D for new users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634117 (https://phabricator.wikimedia.org/T265556) [23:08:53] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s) [23:08:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:11:56] !log Syncronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day [23:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:12:21] backports complete [23:13:00] (03PS1) 10Catrope: Enable and configure GrowthExperiments on trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634119 (https://phabricator.wikimedia.org/T243445) [23:24:23] (03PS1) 10Cwhite: profile,prometheus: enable logging level tuning and set @ops to debug [puppet] - 10https://gerrit.wikimedia.org/r/634121 
[23:25:18] ebernhardson: are you still around? [23:25:55] or thcipriani? [23:26:18] there's a deployment blocker that we've just noticed would be great to backport. [23:26:31] PROBLEM - Ensure local MW versions match expected deployment on mw2279 is CRITICAL: CRITICAL: 956 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [23:28:14] What is it? [23:28:30] !log Removing nine files for legal compliance [23:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:52] we broke error logging https://phabricator.wikimedia.org/T256173 [23:31:03] we don't need to roll back so we can always take care of this tomorrow morning [23:31:58] CI is saying no to https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/634120/ [23:32:38] couple of code style issues, and an implicit conversion issue [23:33:35] yeah, I don't think it's ready [23:33:46] it can wait till tomorrow [23:35:58] !log Removing one further file for legal compliance [23:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:48] foks what do you mean by removing? Deleting from the database entirely? [23:38:53] yes [23:39:52] ah, okay. 
Was wondering why nothing was added at https://commons.wikimedia.org/wiki/Commons:Office_actions/DMCA_notices [23:40:36] They're not necessarily DMCA [23:40:47] yeah, they're not DMCAs [23:51:49] (03PS1) 10Dzahn: DHCP: remove wtp2001 through wtp2020 [puppet] - 10https://gerrit.wikimedia.org/r/634125 (https://phabricator.wikimedia.org/T265558) [23:53:05] (03PS1) 10Dzahn: conftool-data: remove wtp2001 through wtp2020 [puppet] - 10https://gerrit.wikimedia.org/r/634126 (https://phabricator.wikimedia.org/T265558) [23:55:08] (03PS1) 10Dzahn: scap/cumin: switch parsoid codfw canaries from wtp2001/2002 to parse2001/2002 [puppet] - 10https://gerrit.wikimedia.org/r/634128 (https://phabricator.wikimedia.org/T265558) [23:56:28] (03PS1) 10Dzahn: remove wtp2001.codfw.wmnet through wtp2020.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/634129 (https://phabricator.wikimedia.org/T265558) [23:59:19] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets