[00:01:40] (PS1) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T260083)
[00:02:29] (PS2) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T260083)
[00:05:53] (PS3) Ryan Kemper: Bring 3 new eqiad wdqs nodes into service [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T260083)
[00:08:05] (PS1) Dzahn: parsoid: stop using nodejs parsoid on scandium [puppet] - https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906)
[00:09:06] Operations, Parsoid, Parsoid-Tests, serviceops, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (ssastry) @Dzahn I was supposed to verify that rt testing works from that new server but haven't done it yet. Lost t...
[00:09:58] (CR) Subramanya Sastry: [C: -1] "Please hold on till I verify and shift all testing over to testreduce1001. I lost track of getting that done." [puppet] - https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906) (owner: Dzahn)
[00:13:41] (PS1) Dzahn: parsoid: add data types [puppet] - https://gerrit.wikimedia.org/r/634385
[00:13:55] (CR) Dzahn: "ACK, thanks. this will be on hold" [puppet] - https://gerrit.wikimedia.org/r/634383 (https://phabricator.wikimedia.org/T257906) (owner: Dzahn)
[00:15:20] Operations, Parsoid, Parsoid-Tests, serviceops, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) Alright, sounds good. Thank you @ssastry
[00:19:05] (CR) Dzahn: [C: +2] "as the jenkins vote shows..
this lint ignored was not ignoring anything (anymore)" [puppet] - https://gerrit.wikimedia.org/r/634373 (owner: Dzahn)
[00:20:03] (CR) Dzahn: [C: +2] "another one that does nothing" [puppet] - https://gerrit.wikimedia.org/r/634370 (owner: Dzahn)
[00:24:17] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/25943/" [puppet] - https://gerrit.wikimedia.org/r/634371 (owner: Dzahn)
[00:26:44] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/25944/archiva1002.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/634364 (owner: Dzahn)
[00:28:08] (CR) Dzahn: "confirmed pure noop on archiva1002" [puppet] - https://gerrit.wikimedia.org/r/634364 (owner: Dzahn)
[00:28:28] (CR) Dzahn: [C: +2] base::environment: remove lint-ignore that ignores nothing [puppet] - https://gerrit.wikimedia.org/r/634374 (owner: Dzahn)
[00:29:27] (PS2) Dzahn: parsoid: add data types [puppet] - https://gerrit.wikimedia.org/r/634385
[00:30:23] (CR) Dzahn: [V: -1] "https://puppet-compiler.wmflabs.org/compiler1001/25945/" [puppet] - https://gerrit.wikimedia.org/r/634385 (owner: Dzahn)
[00:32:13] Operations, Mail, Epic: Move most (all?)
exim personal aliases to WMF ITS - https://phabricator.wikimedia.org/T122144 (Aklapper)
[00:34:34] (PS3) Dzahn: parsoid: add data types [puppet] - https://gerrit.wikimedia.org/r/634385
[00:40:11] (PS1) Dzahn: mariadb::grants: hiera()->lookup() [puppet] - https://gerrit.wikimedia.org/r/634387
[00:43:49] (PS4) Dzahn: parsoid: add data types [puppet] - https://gerrit.wikimedia.org/r/634385
[00:49:12] Operations, Mail: E-mail for people in different OIT LDAP object unit - https://phabricator.wikimedia.org/T159750 (Dzahn)
[00:50:50] Operations, Mail: E-mail for people in different OIT LDAP object unit - https://phabricator.wikimedia.org/T159750 (Dzahn) @Aklapper I see that "WMF-Office-IT" was archived but there does not seem to be a replacement for "ITS". There are tickets though that should still be tagged.. what should we do with...
[00:53:23] Operations, Design-Research, Domains, Traffic: Register wikipersonas.org and redirect URL - https://phabricator.wikimedia.org/T241944 (Dzahn) a:Dendelele
[00:56:20] !log cdanis@cumin1001 START - Cookbook sre.network.cf
[00:56:21] !log cdanis@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0)
[00:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:56:30] Operations, Design-Research: Edit optoutresearch@ mailing list recipients - https://phabricator.wikimedia.org/T100860 (Dzahn) Since we never got a reply here I recently asked ITS (formerly OIT) to move optoutresearch@ over to them. Harry confirmed it was created on their side and then I removed it on the...
[00:56:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:29] Operations, Design-Research: Edit optoutresearch@ mailing list recipients - https://phabricator.wikimedia.org/T100860 (Dzahn) Open→Resolved a:Dzahn
[00:58:30] Operations, LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004 (Dzahn) Open→Stalled
[00:58:32] Operations, LDAP: Make disabled accounts visible in the corp mirror LDAP replica - https://phabricator.wikimedia.org/T160158 (Dzahn)
[00:58:34] Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815 (Dzahn)
[01:01:41] !log Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
[01:01:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:13:38] (PS1) Ryan Kemper: udev_reload missing trailing sudo [puppet] - https://gerrit.wikimedia.org/r/634390
[01:27:11] (CR) Ryan Kemper: "Need to find a more exhaustive list of nodes for PCC but for now I ran it on a couple as a quick sanity check:" [puppet] - https://gerrit.wikimedia.org/r/634390 (owner: Ryan Kemper)
[01:30:56] (CR) Ebernhardson: [C: +1] "Makes sense.
This likely means any instance that has had a udev rule added but has not been rebooted hasn't had the changes applied as exp" [puppet] - https://gerrit.wikimedia.org/r/634390 (owner: Ryan Kemper)
[01:43:33] (CR) Ryan Kemper: "https://puppet-compiler.wmflabs.org/compiler1003/25942/" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T260083) (owner: Ryan Kemper)
[01:50:25] (PS1) Ryan Kemper: Bump shard_size warning/crit thresholds [puppet] - https://gerrit.wikimedia.org/r/634391
[01:58:33] (CR) Ryan Kemper: "https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/25951" [puppet] - https://gerrit.wikimedia.org/r/634391 (owner: Ryan Kemper)
[02:40:41] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:42:23] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:43:37] PROBLEM - Check systemd state on ms-be2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:57:25] RECOVERY - Check systemd state on ms-be2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:58:45] PROBLEM - Check systemd state on ms-be1023 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:03:29] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1023 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[04:07:00] (PS1) DannyS712: Revert "Test authmanager restricter in labs" [mediawiki-config] - https://gerrit.wikimedia.org/r/634410
[04:07:11] (CR) jerkins-bot: [V: -1] Revert "Test authmanager restricter in labs" [mediawiki-config] - https://gerrit.wikimedia.org/r/634410 (owner: DannyS712)
[04:07:20] (PS2) DannyS712: Revert "Test authmanager restricter in labs" [mediawiki-config] - https://gerrit.wikimedia.org/r/634410
[04:07:24] (PS3) DannyS712: Revert "Test authmanager restricter in labs" [mediawiki-config] - https://gerrit.wikimedia.org/r/634410
[04:07:46] (PS4) DannyS712: Revert "Test authmanager restricter in labs" [mediawiki-config] - https://gerrit.wikimedia.org/r/634410
[04:26:07] RECOVERY - Check systemd state on ms-be1023 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:34:11] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1023 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:49:05] (CR) Vgutierrez: [C: +2] ATS: Remove digicert-2019a cert definition [puppet] - https://gerrit.wikimedia.org/r/634202 (https://phabricator.wikimedia.org/T265584) (owner: Vgutierrez)
[06:52:49] (CR) Alexandros Kosiaris: "https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278/6/modules/stdlib/CHANGELOG.md says 6.4.0 but commit message says 6.5.0.
What a" [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[06:57:19] !log enable cr2-eqdfw:xe-0/1/2
[06:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:57] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:59:03] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:59:55] ^ telia planned maintenance
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201016T0700)
[07:15:01] !log dcausse@deploy1001 Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
[07:15:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:23] !log dcausse@deploy1001 Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
[07:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:31] (CR) Gehel: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/634391 (owner: Ryan Kemper)
[07:24:12] Operations, Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)): Migrate Gerrit to profile::java - https://phabricator.wikimedia.org/T264182 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff The Gerrit servers have...
[07:24:14] Operations: Migrate remaining services using Java to profile::java - https://phabricator.wikimedia.org/T264174 (MoritzMuehlenhoff)
[07:28:18] (CR) Gehel: [C: -1] Bring 3 new eqiad wdqs nodes into service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634381 (https://phabricator.wikimedia.org/T260083) (owner: Ryan Kemper)
[07:30:33] RECOVERY - Freshness of OCSP Stapling files -ATS-TLS- on cp2030 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[07:30:53] RECOVERY - Freshness of OCSP Stapling files -ATS-TLS- on cp1079 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[07:32:15] Operations, Traffic: Wipe digicert-2019a from the caching cluster - https://phabricator.wikimedia.org/T265584 (Vgutierrez) Open→Resolved a:Vgutierrez
[07:39:22] (PS1) Ema: varnish: do not install -dbg package [puppet] - https://gerrit.wikimedia.org/r/634461 (https://phabricator.wikimedia.org/T264074)
[07:40:06] (CR) Vgutierrez: [C: +1] varnish: do not install -dbg package [puppet] - https://gerrit.wikimedia.org/r/634461 (https://phabricator.wikimedia.org/T264074) (owner: Ema)
[07:41:44] (CR) Ema: [C: +2] varnish: do not install -dbg package [puppet] - https://gerrit.wikimedia.org/r/634461 (https://phabricator.wikimedia.org/T264074) (owner: Ema)
[07:42:30] Operations, netops, observability, Security, User-jbond: ulog: filter out diffscan from ulog - https://phabricator.wikimedia.org/T265590 (MoritzMuehlenhoff) The servers with a public IP already have lots of noise from random bots/portscans (e.g. on bast3004 40kish log per day), so this doesn'...
[07:46:07] (CR) Muehlenhoff: [C: +1] "Ploticus will be added to apt.wikimedia.org since it's needed for MediaWiki on Buster (T253377), if people request it for stat* hosts it c" [puppet] - https://gerrit.wikimedia.org/r/630578 (https://phabricator.wikimedia.org/T255028) (owner: Elukey)
[07:46:39] (CR) Gehel: "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/634390 (owner: Ryan Kemper)
[07:59:04] !log text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
[07:59:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:59:10] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[08:02:45] (PS1) Filippo Giunchedi: thanos: fix bucket web viewer proxypass [puppet] - https://gerrit.wikimedia.org/r/634463
[08:06:25] (CR) Filippo Giunchedi: [C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1002/25952/thanos-fe1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/634463 (owner: Filippo Giunchedi)
[08:08:59] !log upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
[08:09:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:05] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[08:09:18] !log reboot stat1005/stat1008 to pick up correct GPU settings
[08:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:43] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:09:47] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 54, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:12:27] PROBLEM - Host stat1005 is DOWN: PING
CRITICAL - Packet loss = 100%
[08:13:34] Operations, Traffic: varnish 5.1.3 frontend child restarted - https://phabricator.wikimedia.org/T185968 (ema) Open→Resolved We haven't seen this happening anymore after setting transient storage limits. Closing.
[08:15:06] (CR) Filippo Giunchedi: [C: +1] prometheus::rsyslog_exporter: update collector to listen on primary ip [puppet] - https://gerrit.wikimedia.org/r/634213 (https://phabricator.wikimedia.org/T265587) (owner: Jbond)
[08:15:55] RECOVERY - Host stat1005 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[08:24:18] !log text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
[08:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:25] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[08:25:30] Operations, ops-codfw: codfw: relocate thanos-fe2003 to create space for new ms-be servers - https://phabricator.wikimedia.org/T265647 (fgiunchedi) Proposal sounds good to me @Papaul.
Host can be powered off at any time, and must be depooled first
[08:26:47] PROBLEM - Host stat1008 is DOWN: PING CRITICAL - Packet loss = 100%
[08:28:32] Operations, DBA, Patch-For-Review, User-Kormat, User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (Kormat)
[08:28:51] Operations, DBA, Patch-For-Review, User-Kormat, User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (Kormat)
[08:29:55] (PS2) Arturo Borrero Gonzalez: network: constants: add cloud floating IP ranges [puppet] - https://gerrit.wikimedia.org/r/634050
[08:29:57] !log upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
[08:30:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:03] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[08:30:13] RECOVERY - Host stat1008 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[08:30:31] (CR) Arturo Borrero Gonzalez: network: constants: add cloud floating IP ranges (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634050 (owner: Arturo Borrero Gonzalez)
[08:32:58] (CR) Gehel: [C: -1] udev_reload missing trailing sudo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634390 (owner: Ryan Kemper)
[08:34:07] (CR) Gehel: [C: -1] udev_reload missing trailing sudo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634390 (owner: Ryan Kemper)
[08:34:27] (CR) CRusnov: [C: -1] "I have to rejigger these two patches', so disregard for now." [dns] - https://gerrit.wikimedia.org/r/634302 (https://phabricator.wikimedia.org/T258729) (owner: CRusnov)
[08:34:40] (CR) CRusnov: [C: -1] "Please disregard for now."
[dns] - https://gerrit.wikimedia.org/r/634303 (https://phabricator.wikimedia.org/T258729) (owner: CRusnov)
[08:35:07] (PS1) Arturo Borrero Gonzalez: cloudgw: add routes for internal VM addressing [puppet] - https://gerrit.wikimedia.org/r/634470 (https://phabricator.wikimedia.org/T261724)
[08:36:16] (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: add routes for internal VM addressing [puppet] - https://gerrit.wikimedia.org/r/634470 (https://phabricator.wikimedia.org/T261724) (owner: Arturo Borrero Gonzalez)
[08:36:22] (PS2) Elukey: Add the possibility to set the VSL grouping option [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/633696 (https://phabricator.wikimedia.org/T264074)
[08:47:27] (PS3) Elukey: Add the possibility to set the VSL grouping option [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/633696 (https://phabricator.wikimedia.org/T264074)
[08:48:50] !log text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
[08:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:57] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[08:52:44] !log add BGP_IXP_RS_in to eqsin RS BGP sessions
[08:52:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:50] !log upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
[08:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:08] (PS1) Ayounsi: eqsin: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634473
[08:57:26] (PS1) Elukey: Decommission analytics1051 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/634474 (https://phabricator.wikimedia.org/T255140)
[08:58:22] Operations, Technical-blog-posts, Traffic: Blog post series: the evolution of Wikimedia's Content Delivery Network -
https://phabricator.wikimedia.org/T264729 (ema) >>! In T264729#6547807, @srodlund wrote: > This has been changed, and I announced on Twitter. @srodlund: the first picture, "Data cent...
[08:58:38] (CR) Klausman: [C: +2] Decommission analytics1051 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/634474 (https://phabricator.wikimedia.org/T255140) (owner: Elukey)
[08:59:15] (PS1) Filippo Giunchedi: profile: add alerts for Thanos sidecar not uploading or failing to do so [puppet] - https://gerrit.wikimedia.org/r/634475 (https://phabricator.wikimedia.org/T265632)
[08:59:21] (PS2) Elukey: Decommission analytics1051 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/634474 (https://phabricator.wikimedia.org/T255140)
[09:00:48] (CR) Klausman: [C: +2] Decommission analytics1051 from the Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/634474 (https://phabricator.wikimedia.org/T255140) (owner: Elukey)
[09:01:22] !log text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
[09:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:29] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[09:03:31] !log eqsin, push CR 634473
[09:03:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:42] (CR) Ayounsi: [C: +2] eqsin: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634473 (owner: Ayounsi)
[09:06:42] (Merged) jenkins-bot: eqsin: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634473 (owner: Ayounsi)
[09:07:19] Operations, Mail: E-mail for people in different OIT LDAP object unit - https://phabricator.wikimedia.org/T159750 (Aklapper) >>! In T159750#6549106, @Dzahn wrote: > @Aklapper I see that "WMF-Office-IT" was archived but there does not seem to be a replacement for "ITS".
And there shouldn't be a replaceme...
[09:08:24] !log upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
[09:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:31] T264074: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[09:13:07] Operations, Wikimedia-Mailing-lists: Create arbcom-cs mailinglist for Czech Arbitration Committee - https://phabricator.wikimedia.org/T265472 (JMeybohm) Thanks @Urbanecm Here are the URLs for [[ https://lists.wikimedia.org/mailman/listinfo/arbcom-cs | listinfo ]], [[ https://lists.wikimedia.org/mailman...
[09:13:16] Operations, Wikimedia-Mailing-lists: Create arbcom-cs mailinglist for Czech Arbitration Committee - https://phabricator.wikimedia.org/T265472 (JMeybohm) Open→Resolved
[09:13:42] (PS4) Rosalie Perside (WMDE): Remove unused $wgExtraLanguageNames['qqq'] assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/628773 (https://phabricator.wikimedia.org/T263441) (owner: Lucas Werkmeister (WMDE))
[09:13:44] (PS4) Rosalie Perside (WMDE): Stop using $wmgExtraLanguageNames in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/628774 (https://phabricator.wikimedia.org/T263441) (owner: Lucas Werkmeister (WMDE))
[09:13:46] (PS4) Rosalie Perside (WMDE): Remove $wmgExtraLanguageNames from InitialiseSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/628775 (https://phabricator.wikimedia.org/T263441) (owner: Lucas Werkmeister (WMDE))
[09:13:48] (PS3) Rosalie Perside (WMDE): Set Wikidata MF to collapse sections by default [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195) (owner: Itamar Givon)
[09:19:35] !log upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
[09:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:41] T264074:
varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074
[09:23:04] (PS1) Ayounsi: ulsfo: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634478
[09:23:55] !log text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
[09:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:51] (PS1) JMeybohm: admin: add ladsgroup to ores-admin [puppet] - https://gerrit.wikimedia.org/r/634479 (https://phabricator.wikimedia.org/T265172)
[09:26:53] (CR) Ayounsi: [C: +2] ulsfo: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634478 (owner: Ayounsi)
[09:27:07] Operations, Machine Learning Platform, SRE-Access-Requests, Patch-For-Review: Requesting adding to ores-admin for Ladsgroup - https://phabricator.wikimedia.org/T265172 (JMeybohm) [x] - User has signed the L3 Acknowledgement of Wikimedia Server Access Responsibilities Document. [x] - User has a...
[09:27:21] (Merged) jenkins-bot: ulsfo: don't let HE prefixes get prioritized [homer/public] - https://gerrit.wikimedia.org/r/634478 (owner: Ayounsi)
[09:30:58] Operations, Analytics-Clusters, Traffic, Patch-For-Review: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (ema) Open→Resolved All varnishkafka instances restarted with 6.0.6-1wm2, CPU usage [[https://grafana.wikimedia.org/d/000000253/varnishkafka?viewPan...
[09:31:07] (CR) Jbond: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[09:32:24] Operations, netops, observability, Security, User-jbond: ulog: filter out diffscan from ulog - https://phabricator.wikimedia.org/T265590 (fgiunchedi) If the figure of 7.5M logs/day just from ulog dropping expected internal scans is accurate then it does make a difference (about +10% overall l...
[09:34:30] (PS7) Jbond: stdlib: update to v6.5.0 [puppet] - https://gerrit.wikimedia.org/r/634278
[09:34:43] (CR) Jbond: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[09:35:34] (CR) Ema: [C: +1] Add the possibility to set the VSL grouping option [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/633696 (https://phabricator.wikimedia.org/T264074) (owner: Elukey)
[09:37:16] (CR) Alexandros Kosiaris: "> Patch Set 6:" [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[09:37:34] (CR) Alexandros Kosiaris: "If a cross fleet PCC is happy with this, +1 from me." [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[09:38:18] (CR) Jbond: "> Patch Set 7:" [puppet] - https://gerrit.wikimedia.org/r/634278 (owner: Jbond)
[09:40:51] (CR) Jbond: [C: -1] "See inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634368 (owner: Dzahn)
[09:50:16] (CR) Lucas Werkmeister (WMDE): [C: -2] "Since we added the $wmgExtraLanguageNames back (T264295), this change should not be done at the moment.
(There’s no need to keep rebasing " [mediawiki-config] - https://gerrit.wikimedia.org/r/628774 (https://phabricator.wikimedia.org/T263441) (owner: Lucas Werkmeister (WMDE))
[09:51:15] (PS4) Lucas Werkmeister (WMDE): Set Wikidata MF to collapse sections by default [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195) (owner: Itamar Givon)
[09:52:08] (CR) Jbond: [C: +2] prometheus::rsyslog_exporter: update collector to listen on primary ip [puppet] - https://gerrit.wikimedia.org/r/634213 (https://phabricator.wikimedia.org/T265587) (owner: Jbond)
[09:57:59] (CR) Elukey: [V: +2 C: +2] Add the possibility to set the VSL grouping option [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/633696 (https://phabricator.wikimedia.org/T264074) (owner: Elukey)
[09:59:00] Operations, observability, User-fgiunchedi: rsyslog occasional segfault on centrallog hosts - https://phabricator.wikimedia.org/T259780 (fgiunchedi) Looks like the latest rsyslog version is significantly more stable on centrallog1001! We'll be running 8.2008.0-1~bpo10+1 on centrallog hosts only then,...
[10:00:25] Operations, User-fgiunchedi: rsyslog's in:imtcp thread stuck on recvfrom loop from down/rebooted hosts - https://phabricator.wikimedia.org/T199406 (fgiunchedi) Note that with the latest rsyslog version (cfr {T259780}) this bug might be fixed too, and thus we can remove the bandaid
[10:02:01] (CR) Jbond: [C: +1] "Looks good to me but good to get a +1 from arzhel as well" [puppet] - https://gerrit.wikimedia.org/r/634050 (owner: Arturo Borrero Gonzalez)
[10:03:48] Operations, ops-eqiad: an-scheduler1001 renamed to an-coord1002 - Update Host labelling and Switch ports - https://phabricator.wikimedia.org/T265639 (JMeybohm) p:Triage→Medium
[10:08:22] Operations, LDAP-Access-Requests: Access to the Logstash for John Bolorinos - https://phabricator.wikimedia.org/T264918 (JMeybohm) @spatton we need a confirmation from your side, please.
[10:13:19] (CR) Ayounsi: "First quick look." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/563186 (https://phabricator.wikimedia.org/T229397) (owner: Jbond)
[10:15:05] Operations, observability, User-fgiunchedi: Sync users and permissions from LDAP to Grafana - https://phabricator.wikimedia.org/T265712 (fgiunchedi)
[10:15:59] (CR) Lucas Werkmeister (WMDE): [C: +1] Set Wikidata MF to collapse sections by default [mediawiki-config] - https://gerrit.wikimedia.org/r/634039 (https://phabricator.wikimedia.org/T239195) (owner: Itamar Givon)
[10:28:08] (PS1) Reedy: Revert "Fix escaping in PageContentBuilder::buildDefaultContentForPageTitle" [extensions/ProofreadPage] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634416 (https://phabricator.wikimedia.org/T265571)
[10:28:32] Operations, Traffic: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 (JMeybohm) p:Triage→Medium
[10:29:38] (CR) Reedy: [C: +2] Revert "Fix escaping in PageContentBuilder::buildDefaultContentForPageTitle" [extensions/ProofreadPage]
(wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634416 (https://phabricator.wikimedia.org/T265571) (owner: Reedy)
[10:33:30] (Merged) jenkins-bot: Revert "Fix escaping in PageContentBuilder::buildDefaultContentForPageTitle" [extensions/ProofreadPage] (wmf/1.36.0-wmf.13) - https://gerrit.wikimedia.org/r/634416 (https://phabricator.wikimedia.org/T265571) (owner: Reedy)
[10:35:41] !log reedy@deploy1001 Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping T265571 (duration: 01m 12s)
[10:35:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:49] T265571: MediaWiki 1.36/wmf.13 needlessly HTML encodes ASCII characters in DjVu text layer - https://phabricator.wikimedia.org/T265571
[10:42:03] Operations, Pybal, Traffic: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715 (Vgutierrez) Open→Stalled this hasn't been backported to the 1.15 branch so it's never been deployed in production, I'd keep the task open
[11:35:24] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/634479 (https://phabricator.wikimedia.org/T265172) (owner: JMeybohm)
[11:46:50] (PS1) Arturo Borrero Gonzalez: cloudgw: add support for selecting NICs using hiera [puppet] - https://gerrit.wikimedia.org/r/634490 (https://phabricator.wikimedia.org/T244851)
[11:48:55] (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: add support for selecting NICs using hiera [puppet] - https://gerrit.wikimedia.org/r/634490 (https://phabricator.wikimedia.org/T244851) (owner: Arturo Borrero Gonzalez)
[11:50:07] (PS2) Arturo Borrero Gonzalez: cloudgw: add support for selecting NICs using hiera [puppet] - https://gerrit.wikimedia.org/r/634490 (https://phabricator.wikimedia.org/T261724)
[11:50:35] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288
(10aborrero) [11:54:49] (03PS1) 10Arturo Borrero Gonzalez: hieradata: labtestvirt2003: cloudgw: fix NIC names [puppet] - 10https://gerrit.wikimedia.org/r/634492 (https://phabricator.wikimedia.org/T261724) [11:55:11] 10Operations, 10ops-codfw, 10serviceops: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin2001.codfw.wmnet for hosts: ` mw2279.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202010161152_jiji_2277... [11:55:16] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] hieradata: labtestvirt2003: cloudgw: fix NIC names [puppet] - 10https://gerrit.wikimedia.org/r/634492 (https://phabricator.wikimedia.org/T261724) (owner: 10Arturo Borrero Gonzalez) [12:09:13] !log jiji@cumin2001 START - Cookbook sre.hosts.downtime [12:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:12] !log jiji@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [12:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:00] 10Operations, 10ops-codfw, 10serviceops: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2279.codfw.wmnet'] ` and were **ALL** successful. [12:34:09] 10Operations, 10ops-codfw, 10serviceops: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (10jijiki) >>! In T264698#6547540, @Papaul wrote: > @jijiki disk replaced > Many thanks! 
[12:35:57] RECOVERY - Ensure local MW versions match expected deployment on mw2279 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [12:41:11] RECOVERY - MD RAID on mw2279 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [12:50:14] (03CR) 10JMeybohm: [C: 03+2] admin: add ladsgroup to ores-admin [puppet] - 10https://gerrit.wikimedia.org/r/634479 (https://phabricator.wikimedia.org/T265172) (owner: 10JMeybohm) [12:52:25] 10Operations, 10Machine Learning Platform, 10SRE-Access-Requests, 10Patch-For-Review: Requesting adding to ores-admin for Ladsgroup - https://phabricator.wikimedia.org/T265172 (10JMeybohm) 05Open→03Resolved a:03JMeybohm Done [13:07:12] 10Operations, 10Research, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey - https://phabricator.wikimedia.org/T259979 (10leila) @Aklapper I just saw this after you pinged us via email. sorry about that.... [13:09:41] 10Operations, 10observability, 10User-fgiunchedi: Sync users and permissions from LDAP to Grafana - https://phabricator.wikimedia.org/T265712 (10JMeybohm) p:05Triage→03Medium [13:11:12] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Netbox Error for asw2-d4-eqiad - https://phabricator.wikimedia.org/T265393 (10faidon) From the Netbox changelog ("Changelog" tab on the device) it looks like some changes were made on September 28th by @Cmjohnson and later one change on Oct 6th by @wiki_willy. Spec... [13:17:14] 10Operations, 10Machine Learning Platform, 10SRE-Access-Requests: Requesting adding to ores-admin for Ladsgroup - https://phabricator.wikimedia.org/T265172 (10darthmon_wmde) > [] - sudo requests: All sudo requests must be approved by the user manager. If the sudo permissions also give access to restricted... 
[13:26:06] (03PS6) 10Gehel: Enable cumin EventHandler to disable output. [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) [13:26:14] (03PS1) 10Gehel: Mark get_short_command() as private. [software/cumin] - 10https://gerrit.wikimedia.org/r/634504 (https://phabricator.wikimedia.org/T212783) [13:28:22] (03CR) 10jerkins-bot: [V: 04-1] Enable cumin EventHandler to disable output. [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) (owner: 10Gehel) [13:31:22] (03PS7) 10Gehel: Enable cumin EventHandler to disable output. [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) [13:34:28] (03CR) 10jerkins-bot: [V: 04-1] Enable cumin EventHandler to disable output. [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) (owner: 10Gehel) [13:37:46] (03PS8) 10Gehel: Enable cumin EventHandler to disable output. [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) [13:38:39] (03CR) 10Gehel: Enable cumin EventHandler to disable output. (034 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) (owner: 10Gehel) [13:40:12] 10Operations, 10LDAP-Access-Requests: Access to the Logstash for John Bolorinos - https://phabricator.wikimedia.org/T264918 (10spatton) Thanks for helping with this @JMeybohm, this is approved from WMF online fundraising's side. 
[13:41:37] (03PS3) 10Milimetric: analytics_cluster/turnilo: Configure url shortner [puppet] - 10https://gerrit.wikimedia.org/r/622600 (https://phabricator.wikimedia.org/T233336) [13:41:42] !log pooling mw2279.codfw.wmnet T264698 [13:41:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:50] T264698: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 [13:42:30] 10Operations, 10ops-codfw, 10serviceops: Degraded RAID on mw2279 - https://phabricator.wikimedia.org/T264698 (10jijiki) 05Open→03Resolved [13:49:43] (03PS4) 10Milimetric: analytics_cluster/turnilo: Configure url shortner [puppet] - 10https://gerrit.wikimedia.org/r/622600 (https://phabricator.wikimedia.org/T233336) [13:50:42] (03PS1) 10JMeybohm: admin: add jbol to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/634527 (https://phabricator.wikimedia.org/T264918) [13:51:42] (03PS2) 10JMeybohm: admin: add jbol to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/634527 (https://phabricator.wikimedia.org/T264918) [13:52:43] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/634527 (https://phabricator.wikimedia.org/T264918) (owner: 10JMeybohm) [13:52:56] (03CR) 10JMeybohm: [C: 03+2] admin: add jbol to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/634527 (https://phabricator.wikimedia.org/T264918) (owner: 10JMeybohm) [13:53:13] RECOVERY - mediawiki-installation DSH group on mw2279 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [13:55:10] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Access to the Logstash for John Bolorinos - https://phabricator.wikimedia.org/T264918 (10JMeybohm) 05Open→03Resolved a:03JMeybohm Thanks @spatton, the user has been added to the [[ https://wikitech.wikimedia.org/wiki/LDAP/Groups#wmf_group | wmf... 
[14:06:49] (03CR) 10JMeybohm: [C: 04-1] "PCC does not complain https://puppet-compiler.wmflabs.org/compiler1002/25956/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/633835 (owner: 10Dzahn) [14:10:35] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 50, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [14:10:37] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 130, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [14:10:37] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [14:11:09] 👀 [14:12:19] one of those links had planned maintenance... several hours ago [14:15:44] (03CR) 10Muehlenhoff: sretest: Experiment with preserving docker rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/634192 (owner: 10Alexandros Kosiaris) [14:17:52] let's do the time warp again? [14:22:45] XioNoX: so with those two links down, eqord is looking kind of isolated... should we do anything? (does it automatically stop advertising our prefixes?) [14:23:33] I guess it still has the wave to eqiad and the tunnel to eqiad, maybe it is fine [14:23:40] yeah [14:23:58] and it does stop advertising our prefix if it doesn't receive them anymore from the core DCs [14:24:00] (03CR) 10Alexandros Kosiaris: sretest: Experiment with preserving docker rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/634192 (owner: 10Alexandros Kosiaris) [14:24:10] ok! [14:24:22] thanks, sorry for ping [14:24:42] cdanis: no pb at all! 
I prefer to be aware [14:26:30] I already emailed Telia btw, no reply yet [14:26:50] ah nvm, Telia Carrier Ref: 01226848 [14:29:04] (03CR) 10Muehlenhoff: sretest: Experiment with preserving docker rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/634192 (owner: 10Alexandros Kosiaris) [14:35:14] 10Operations, 10CheckUser, 10Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (10Huji) While I understand how this can be helpful when reporting abusers to ISPs, this use case is narrow and uncommon. [14:36:55] (03PS1) 10JMeybohm: Import new upstream version 2.16.12 [debs/helm] - 10https://gerrit.wikimedia.org/r/634534 (https://phabricator.wikimedia.org/T263616) [14:41:40] 10Operations, 10CheckUser, 10Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (10Huji) While I understand how this can be helpful when reporting abusers to ISPs, this use case is narrow and uncommon. If we decide to add this to CU lo... [14:58:09] 10Operations, 10EasyTimeline, 10Packaging: WMF deployed EasyTimeline extension depends on Ploticus package which is not available in Debian Buster (but available again in Debian Bullseye) - https://phabricator.wikimedia.org/T253377 (10MoritzMuehlenhoff) [15:00:17] 10Operations, 10EasyTimeline, 10Packaging: WMF deployed EasyTimeline extension depends on Ploticus package which is not available in Debian Buster (but available again in Debian Bullseye) - https://phabricator.wikimedia.org/T253377 (10MoritzMuehlenhoff) I've uploaded an NMU (2.42-4.2) for ploticus which corr... 
[15:01:28] !log bblack@cumin1001 START - Cookbook sre.hosts.decommission [15:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:57] (03PS1) 10BBlack: remove refs to decommed cp2 boxes: 3, 9, 15, 21 [puppet] - 10https://gerrit.wikimedia.org/r/634537 (https://phabricator.wikimedia.org/T265729) [15:07:58] (03CR) 10Alexandros Kosiaris: sretest: Experiment with preserving docker rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/634192 (owner: 10Alexandros Kosiaris) [15:08:41] (03PS1) 10BBlack: Remove cp2 decoms: 3, 9, 15, 12 [dns] - 10https://gerrit.wikimedia.org/r/634539 (https://phabricator.wikimedia.org/T265729) [15:11:12] (03CR) 10JMeybohm: [C: 03+2] Import new upstream version 2.16.12 [debs/helm] - 10https://gerrit.wikimedia.org/r/634534 (https://phabricator.wikimedia.org/T263616) (owner: 10JMeybohm) [15:11:55] !log bblack@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [15:11:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:22] 10Operations, 10Technical-blog-posts, 10Traffic: Blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T264729 (10srodlund) Fixed :-) [15:12:57] (03CR) 10BBlack: [C: 03+2] remove refs to decommed cp2 boxes: 3, 9, 15, 21 [puppet] - 10https://gerrit.wikimedia.org/r/634537 (https://phabricator.wikimedia.org/T265729) (owner: 10BBlack) [15:14:14] (03CR) 10BBlack: [C: 03+2] Remove cp2 decoms: 3, 9, 15, 12 [dns] - 10https://gerrit.wikimedia.org/r/634539 (https://phabricator.wikimedia.org/T265729) (owner: 10BBlack) [15:15:44] 10Operations, 10ops-codfw, 10decommission-hardware: decommission cp2003, cp2009, cp2015, cp2021 - https://phabricator.wikimedia.org/T265729 (10BBlack) a:03Papaul [15:22:20] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:25:24] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:29:38] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=gerrit site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:30:42] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:36:05] !log ayounsi@cumin1001 START - Cookbook sre.network.cf [15:36:05] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0) [15:36:09] (03CR) 10Urbanecm: [C: 03+2] "no longer necessary, noop for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634410 (owner: 10DannyS712) [15:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:49] (03Merged) 10jenkins-bot: Revert "Test authmanager restricter in labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634410 (owner: 10DannyS712) [15:39:30] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_wikifeeds_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:40:54] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:45:20] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={cadvisor,purged} site=ulsfo https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:46:52] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:51:59] (03PS1) 10Cicalese: Configuration for user menu and sidebar special pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634543 (https://phabricator.wikimedia.org/T264246) [15:53:26] (03CR) 10Ppchelko: [C: 03+2] "let's try and see if we destroy beta." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634543 (https://phabricator.wikimedia.org/T264246) (owner: 10Cicalese) [15:54:20] (03Merged) 10jenkins-bot: Configuration for user menu and sidebar special pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634543 (https://phabricator.wikimedia.org/T264246) (owner: 10Cicalese) [16:21:33] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors [16:21:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:25:40] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s) [16:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:38] (03PS5) 10Ahmon Dancy: Improve error message if wikiversions.php has wrong format [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622408 [16:30:10] (03CR) 10Ahmon Dancy: Improve error message if wikiversions.php has wrong format (031 comment) [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/622408 (owner: 10Ahmon Dancy) [16:34:36] (03PS1) 10Ahmon Dancy: Add ServiceConfig->getDatacenters() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634550 [16:34:39] (03PS1) 10Ahmon Dancy: Define $wmfDatacenters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634551 [16:34:41] (03PS1) 10Ahmon Dancy: Use new $wmfDatacenters global [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634552 [16:35:47] (03CR) 10jerkins-bot: [V: 04-1] Add ServiceConfig->getDatacenters() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634550 (owner: 10Ahmon Dancy) [16:36:38] (03Abandoned) 10Ahmon Dancy: Factor out datacenters lists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622193 (owner: 10Ahmon Dancy) [16:37:56] (03PS2) 10Ahmon Dancy: Add ServiceConfig->getDatacenters() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634550 [16:37:58] (03PS2) 10Ahmon Dancy: Define $wmfDatacenters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634551 [16:38:00] (03PS2) 10Ahmon Dancy: Use new $wmfDatacenters global [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634552 [16:54:10] (03PS4) 10Dduvall: ci: Install docker-credential-environment credHelper [puppet] - 10https://gerrit.wikimedia.org/r/634316 (https://phabricator.wikimedia.org/T265177) [16:59:44] (03CR) 10Dduvall: [C: 03+1] "Added comments to clarify that `store` and `erase` are no-ops for the credential helper." [puppet] - 10https://gerrit.wikimedia.org/r/634316 (https://phabricator.wikimedia.org/T265177) (owner: 10Dduvall) [17:06:09] (03PS1) 10Jbond: diffscan: add refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634556 [17:08:47] (03PS2) 10Jbond: diffscan: add refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634556 [17:08:55] ack cheers ill ping thempcc [17:09:06] * jbond42 wrong window [17:19:12] 10Operations, 10Mail: E-mail for people in different OIT LDAP object unit - https://phabricator.wikimedia.org/T159750 (10Dzahn) >>! 
In T159750#6549486, @Aklapper wrote: > Tags with names of teams that don't use Phab usually create the wrong expectation that the team would look into tasks in Phab. Shouldn't we... [17:20:15] (03PS3) 10Jbond: iffscan: add refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634556 [17:22:01] (03PS4) 10Jbond: diffscan: add refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634556 [17:23:07] (03PS1) 10Urbanecm: Restore bureaucrat's abilities at uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) [17:24:27] (03CR) 10Ahmon Dancy: [C: 03+1] ci: Install docker-credential-environment credHelper [puppet] - 10https://gerrit.wikimedia.org/r/634316 (https://phabricator.wikimedia.org/T265177) (owner: 10Dduvall) [17:24:50] (03CR) 10RhinosF1: [C: 03+1] "Does Line 12452 need the + as well?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [17:25:55] (03CR) 10RhinosF1: [C: 03+1] "> Patch Set 1: Code-Review+1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [17:27:15] (03PS2) 10Urbanecm: Restore bureaucrat's abilities at uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) [17:28:02] (03CR) 10Urbanecm: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [17:28:45] (03CR) 10Jbond: [C: 03+2] diffscan: add refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634556 (owner: 10Jbond) [17:29:02] (03CR) 10RhinosF1: [C: 03+1] "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [17:39:09] (03CR) 10Jeena Huneidi: [C: 03+1] ci: Install docker-credential-environment credHelper [puppet] - 
10https://gerrit.wikimedia.org/r/634316 (https://phabricator.wikimedia.org/T265177) (owner: 10Dduvall) [17:43:14] !log restarting gerrit due to gc thrashing [17:43:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:18] (03PS1) 10Jbond: diffscan: switch to new refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634566 [17:49:38] (03CR) 10jerkins-bot: [V: 04-1] diffscan: switch to new refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634566 (owner: 10Jbond) [17:52:50] (03PS2) 10Jbond: diffscan: switch to new refactored diffscan [puppet] - 10https://gerrit.wikimedia.org/r/634566 [17:57:14] (03CR) 10Jbond: [C: 03+1] "After working with diffscan and getting a bit frustrated with some of the behaviour i ended up falling down the well and rewriting it. we" [puppet] - 10https://gerrit.wikimedia.org/r/630703 (https://phabricator.wikimedia.org/T247364) (owner: 10CRusnov) [18:01:28] (03CR) 10CRusnov: [C: 03+1] "NGL This looks *substantially* better than the original" [puppet] - 10https://gerrit.wikimedia.org/r/634566 (owner: 10Jbond) [18:18:50] (03PS3) 10Dereckson: Restore bureaucrat's abilities at uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [18:20:03] hi Dereckson, long time no see :) [18:20:43] (03CR) 10Dereckson: [C: 03+1] "Updated commit message to track the previous change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [18:21:02] (03CR) 10Urbanecm: "hi, and thanks 😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634557 (https://phabricator.wikimedia.org/T265746) (owner: 10Urbanecm) [18:25:20] 10Operations, 10Mail: E-mail for people in different OIT LDAP object unit - https://phabricator.wikimedia.org/T159750 (10Aklapper) @dzahn: There are no open tickets to close, as all open tickets with the #office-it tag also have other active tags. 
[18:29:16] (03PS1) 10Jbond: diffscan: pyhotnify [puppet] - 10https://gerrit.wikimedia.org/r/634572 [18:33:35] (03CR) 10Jbond: "I have tried to take a stab in the previous CR at bringing this script a bit more inline with our best practices however i think there are" [puppet] - 10https://gerrit.wikimedia.org/r/634572 (owner: 10Jbond) [18:34:19] (03CR) 10Krinkle: [C: 03+1] Add ServiceConfig->getDatacenters() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634550 (owner: 10Ahmon Dancy) [18:34:30] (03CR) 10Krinkle: [C: 03+1] Define $wmfDatacenters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634551 (owner: 10Ahmon Dancy) [18:35:50] (03CR) 10Krinkle: [C: 03+1] Use new $wmfDatacenters global (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634552 (owner: 10Ahmon Dancy) [18:35:58] (03CR) 10Jbond: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/634572 (owner: 10Jbond) [18:41:20] (03PS4) 10Legoktm: Add buildpack images ("stacks") [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/634349 (https://phabricator.wikimedia.org/T265686) [18:42:51] (03PS2) 10RobH: deploy1002 mac info [puppet] - 10https://gerrit.wikimedia.org/r/634333 (https://phabricator.wikimedia.org/T265653) [18:44:26] (03CR) 10RobH: [C: 03+2] deploy1002 mac info [puppet] - 10https://gerrit.wikimedia.org/r/634333 (https://phabricator.wikimedia.org/T265653) (owner: 10RobH) [18:46:28] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:48:12] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:51:38] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:53:16] (03PS1) 10RobH: deploy1002 prod dns [dns] - 10https://gerrit.wikimedia.org/r/634583 (https://phabricator.wikimedia.org/T265653) [18:53:20] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:54:00] (03CR) 10RobH: [C: 03+2] deploy1002 prod dns [dns] - 10https://gerrit.wikimedia.org/r/634583 (https://phabricator.wikimedia.org/T265653) (owner: 10RobH) [19:24:38] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['deploy1002.eqiad.wmnet'] ` The log can be found in `/var/l... 
[19:29:28] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10RobH) [19:35:49] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['deploy1002.eqiad.wmnet'] ` Of which those **FAILED**: ` ['deploy1002.eqiad.wmnet'] ` [19:37:16] !log robh@cumin1001 START - Cookbook sre.hosts.downtime [19:37:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:05] !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [19:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:10] (03PS1) 10Andrew Bogott: Glance: make rbd the default store in eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/634595 (https://phabricator.wikimedia.org/T263461) [19:58:12] (03PS2) 10Andrew Bogott: Glance: make rbd the default store in eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/634595 (https://phabricator.wikimedia.org/T263461) [19:59:19] (03CR) 10Andrew Bogott: "https://puppet-compiler.wmflabs.org/compiler1001/25963/" [puppet] - 10https://gerrit.wikimedia.org/r/634595 (https://phabricator.wikimedia.org/T263461) (owner: 10Andrew Bogott) [20:09:43] (03PS1) 10RobH: adding deploy1002 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/634598 (https://phabricator.wikimedia.org/T265653) [20:10:48] (03CR) 10RobH: [C: 03+2] adding deploy1002 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/634598 (https://phabricator.wikimedia.org/T265653) (owner: 10RobH) [20:12:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['deploy1002.eqiad.wmnet'] ` The log can be found in `/var/l... 
[20:20:43] (03CR) 10Andrew Bogott: [C: 03+2] Glance: make rbd the default store in eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/634595 (https://phabricator.wikimedia.org/T263461) (owner: 10Andrew Bogott) [20:24:21] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['deploy1002.eqiad.wmnet'] ` Of which those **FAILED**: ` ['deploy1002.eqiad.wmnet'] ` [20:25:44] !log robh@cumin1001 START - Cookbook sre.hosts.downtime [20:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:34] !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [20:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:05] 10Operations, 10Mail, 10fr-donorservices: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (10MBeat33) Thanks @Dzahn We are working this quarter to come up with a strategy for Q3 to mitigate the fundraising@ spam folder issue. There are a few stakeholders involved so Q3 is whe... [20:29:13] 10Operations, 10Mail, 10fr-donorservices: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (10MBeat33) [20:29:58] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) setup/install deploy1002 - https://phabricator.wikimedia.org/T265653 (10RobH) a:05RobH→03Dzahn This fails reimage due to the initial puppet run failing. Not sure if we should apply a different role, or if you want to take over and... 
[20:32:25] (PS1) Andrew Bogott: boostrapvs buster manifest: update package list [puppet] - https://gerrit.wikimedia.org/r/634617
[20:34:00] (CR) Andrew Bogott: [C: +2] boostrapvs buster manifest: update package list [puppet] - https://gerrit.wikimedia.org/r/634617 (owner: Andrew Bogott)
[20:40:40] (CR) Jeena Huneidi: "I just noticed these already exist in the old directory structure 😊" [deployment-charts] - https://gerrit.wikimedia.org/r/634354 (https://phabricator.wikimedia.org/T258572) (owner: Jeena Huneidi)
[20:41:25] (PS4) Cicalese: Configuration for user menu and sidebar special pages. [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246)
[20:43:32] (PS1) Andrew Bogott: boostrapvz buster manifest: fix a misnamed package [puppet] - https://gerrit.wikimedia.org/r/634621
[20:44:03] (CR) Andrew Bogott: [C: +2] boostrapvz buster manifest: fix a misnamed package [puppet] - https://gerrit.wikimedia.org/r/634621 (owner: Andrew Bogott)
[20:48:54] (CR) Jeena Huneidi: [DNM] Experimental King helmfile (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/634354 (https://phabricator.wikimedia.org/T258572) (owner: Jeena Huneidi)
[20:57:09] (CR) Bstorm: "I'd just assumed you'd duplicate things to make it simple, but this seems to look quite sensible (and better than adding a lot of duplicat" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/634349 (https://phabricator.wikimedia.org/T265686) (owner: Legoktm)
[21:02:39] Operations, ops-eqiad: an-scheduler1001 renamed to an-coord1002 - Update Host labelling and Switch ports - https://phabricator.wikimedia.org/T265639 (wiki_willy) a:Cmjohnson
[21:03:12] Operations, ops-eqiad, Discovery-Search: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113 (wiki_willy) a:Cmjohnson
[21:12:34] (CR) Legoktm: "> Patch Set 4:" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/634349 (https://phabricator.wikimedia.org/T265686) (owner: Legoktm)
[21:15:43] (PS5) Cicalese: Configuration for user menu and sidebar special pages. [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246)
[21:17:57] (CR) Ppchelko: [C: +1] "one small suggestion, otherwise I think it is good." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246) (owner: Cicalese)
[21:26:56] (CR) Cicalese: Configuration for user menu and sidebar special pages. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246) (owner: Cicalese)
[21:27:30] (PS6) Cicalese: Configuration for user menu and sidebar special pages. [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246)
[21:32:27] (CR) CRusnov: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/630703 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[21:32:45] (Abandoned) CRusnov: diffscan.py: Port to Python3 [puppet] - https://gerrit.wikimedia.org/r/630703 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[21:35:22] Operations, ops-codfw, decommission-hardware: decommission cp2003, cp2009, cp2015, cp2021 - https://phabricator.wikimedia.org/T265729 (Papaul) ` [edit interfaces interface-range disabled] member ge-1/0/5 { ... } + member xe-2/0/6; [edit interfaces] - xe-2/0/6 { - description cp2003; -...
[21:37:01] (CR) Ppchelko: [C: +1] Configuration for user menu and sidebar special pages. [mediawiki-config] - https://gerrit.wikimedia.org/r/634356 (https://phabricator.wikimedia.org/T264246) (owner: Cicalese)
[21:38:15] Operations, ops-codfw, decommission-hardware: decommission cp2003, cp2009, cp2015, cp2021 - https://phabricator.wikimedia.org/T265729 (Papaul)
[21:41:29] (CR) Bstorm: "> Patch Set 4:" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/634349 (https://phabricator.wikimedia.org/T265686) (owner: Legoktm)
[21:43:23] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[21:43:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:25] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:46:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:47:45] Operations, ops-codfw, decommission-hardware: decommission cp2003, cp2009, cp2015, cp2021 - https://phabricator.wikimedia.org/T265729 (Papaul)
[22:03:49] (CR) Bstorm: [C: +1] "Looks good and seems to work well!" [software/tools-webservice] - https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: BryanDavis)
[22:14:10] (PS1) Fdans: dumps::web::html Add landing page/readme for pageview-complete dumps [puppet] - https://gerrit.wikimedia.org/r/634650 (https://phabricator.wikimedia.org/T251777)
[22:14:22] (CR) jerkins-bot: [V: -1] dumps::web::html Add landing page/readme for pageview-complete dumps [puppet] - https://gerrit.wikimedia.org/r/634650 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[22:40:04] (PS2) Jbond: diffscan: pyhotnify [puppet] - https://gerrit.wikimedia.org/r/634572
[22:45:06] (CR) Cwhite: profile: add alerts for Thanos sidecar not uploading or failing to do so (1 comment) [puppet] - https://gerrit.wikimedia.org/r/634475 (https://phabricator.wikimedia.org/T265632) (owner: Filippo Giunchedi)
[22:51:13] (PS1) Cwhite: icinga: add SMART Phabricator handler script [puppet] - https://gerrit.wikimedia.org/r/634659 (https://phabricator.wikimedia.org/T196994)
[23:08:13] (PS3) Jbond: diffscan: pyhotnify [puppet] - https://gerrit.wikimedia.org/r/634572
[23:10:14] (PS1) Razzi: hue: switch from nginx to envoy for tls [puppet] - https://gerrit.wikimedia.org/r/634660 (https://phabricator.wikimedia.org/T240439)
[23:14:33] (PS1) Razzi: turnilo: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634661 (https://phabricator.wikimedia.org/T240439)
[23:17:45] (PS2) Razzi: turnilo: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634661 (https://phabricator.wikimedia.org/T240439)
[23:26:48] (PS1) Razzi: superset: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634662 (https://phabricator.wikimedia.org/T240439)
[23:42:43] (PS1) Reedy: wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler [mediawiki-config] - https://gerrit.wikimedia.org/r/634663 (https://phabricator.wikimedia.org/T242554)
[23:45:19] (PS1) Razzi: piwik: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634664 (https://phabricator.wikimedia.org/T240439)
[23:52:56] (PS1) Razzi: stats: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634667 (https://phabricator.wikimedia.org/T240439)
[23:56:11] (CR) BryanDavis: [C: +1] "Nice job tracking that down Reedy. :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/634663 (https://phabricator.wikimedia.org/T242554) (owner: Reedy)
[23:56:40] (PS3) Razzi: turnilo: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634661 (https://phabricator.wikimedia.org/T240439)
[23:58:26] (PS2) Razzi: superset: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634662 (https://phabricator.wikimedia.org/T240439)
[23:59:41] (PS2) Razzi: piwik: use envoy instead of nginx for tls [puppet] - https://gerrit.wikimedia.org/r/634664 (https://phabricator.wikimedia.org/T240439)