[00:00:04] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Evening SWAT(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T0000).
[00:00:04] RoanKattouw: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:03:31] (PS1) Dzahn: ci::httpd: add support fur buster PHP 7.3 [puppet] - https://gerrit.wikimedia.org/r/566386 (https://phabricator.wikimedia.org/T224591)
[00:05:06] I'll SWAT
[00:07:26] (PS2) Dzahn: ci::httpd: add support for buster PHP 7.3 [puppet] - https://gerrit.wikimedia.org/r/566386 (https://phabricator.wikimedia.org/T224591)
[00:10:06] (PS1) Dzahn: ci::proxy_jenkins: add data types [puppet] - https://gerrit.wikimedia.org/r/566387
[00:11:03] Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (wiki_willy) ++ @Cmjohnson / @Jclark-ctr - just following up on Effie's previous comment, can you guys decide on a doable turnover date for this...
[00:22:52] (PS2) Catrope: GrowthExperiments: Enable homepage on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565449 (https://phabricator.wikimedia.org/T238320)
[00:26:09] (PS2) Catrope: GrowthExperiments: Enable suggested edits everywhere (except euwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/565398
[00:26:14] (CR) Catrope: [C: +2] GrowthExperiments: Enable suggested edits everywhere (except euwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/565398 (owner: Catrope)
[00:26:53] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.15/extensions/GrowthExperiments/: SWAT for T242811, T242052 (duration: 01m 05s)
[00:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:58] T242052: Newcomer tasks: topic matching instrumentation - https://phabricator.wikimedia.org/T242052
[00:26:58] T242811: [betalabs] Newcomer tasks: mobile - task explanation drawer is not present - https://phabricator.wikimedia.org/T242811
[00:27:06] (Merged) jenkins-bot: GrowthExperiments: Enable suggested edits everywhere (except euwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/565398 (owner: Catrope)
[00:27:19] (PS2) Catrope: GrowthExperiments: Enable homepage on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565450 (https://phabricator.wikimedia.org/T238320)
[00:28:11] (CR) Catrope: "The commit message is wrong, I meant *topics* in suggested edits" [mediawiki-config] - https://gerrit.wikimedia.org/r/565398 (owner: Catrope)
[00:31:00] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable topics in suggested edits on cswiki, kowiki, arwiki, viwiki (duration: 01m 05s)
[00:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:29] Operations, SRE-tools, Traffic, Goal, and 2 others: Integrate automated DNS snippets into CI - https://phabricator.wikimedia.org/T243362 (crusnov)
[00:34:57] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.138e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[00:36:34] Operations, SRE-tools, Traffic, Goal, and 2 others: Integrate automated DNS snippets into CI - https://phabricator.wikimedia.org/T243362 (crusnov) The pulling the git looks straight forward (adding to the Dockerfile), as for actually testing against the pull I suspect we'll have to use the templa...
[00:37:34] (PS2) Catrope: GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319)
[00:38:16] (PS1) Sbisson: Enable InukaPageView instrumentation in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/566391 (https://phabricator.wikimedia.org/T238029)
[00:38:26] (CR) jerkins-bot: [V: -1] GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[00:38:37] (PS2) Catrope: GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319)
[00:39:45] (CR) jerkins-bot: [V: -1] GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[00:39:58] (CR) CRusnov: "Question is this a child of the previous commit, or a new course of development?" [puppet] - https://gerrit.wikimedia.org/r/563186 (https://phabricator.wikimedia.org/T229397) (owner: Jbond)
[00:40:40] RoanKattouw: Any room for a labs config patch in this window (just needs to be +2 and pulled on the deploy box, if I remember right)?
[00:41:07] stephanebisson: Go for it!
[00:41:13] I'm done with my deployments
[00:41:13] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/566391
[00:41:21] Can you +2?
[00:41:24] (CR) Catrope: [C: +2] Enable InukaPageView instrumentation in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/566391 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[00:42:14] (Merged) jenkins-bot: Enable InukaPageView instrumentation in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/566391 (https://phabricator.wikimedia.org/T238029) (owner: Sbisson)
[00:43:04] Thanks. I've pulled it on the deployment box
[00:47:12] Thanks!
[00:51:33] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[01:11:03] (CR) CRusnov: "> Patch Set 1: Code-Review-1" (6 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: CRusnov)
[01:11:16] (CR) Legoktm: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/565793 (https://phabricator.wikimedia.org/T229920) (owner: Legoktm)
[01:11:57] (PS2) CRusnov: rotatedump: Enhance to retain period copies [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512)
[01:12:20] (CR) CRusnov: "To be clear my previous comment was on PS2." [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: CRusnov)
[01:13:30] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: resync, the last sync only took on half the appservers (duration: 01m 05s)
[01:13:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:47] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:31:39] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:34:38] Operations, Traffic, Inuka-Team (Kanban), MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (SBisson) @nshahquinn-wmf This is enabled in [[ https://en.m.wikipedia.beta.wmflabs.org/wiki/ | bet...
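The two Kafka MirrorMaker alerts in this window print their thresholds inline. The PROBLEM line reports "1.138e+05 gt 1e+05", and the RECOVERY line reports "(C)1e+05 gt (W)1e+04 gt 0". This sketch reads that as "critical when max lag is strictly greater than 1e5, warning when it is strictly greater than 1e4". That reading is an interpretation of the alert text, not taken from the check's source, and the function and constant names are hypothetical.

```python
# Hypothetical sketch: classify a MirrorMaker max-lag sample using the
# thresholds printed in the alert text "(C)1e+05 gt (W)1e+04 gt 0".
# These values come from the log, not from the real check's source code.
CRITICAL_LAG = 1e5
WARNING_LAG = 1e4


def lag_status(max_lag: float,
               warning: float = WARNING_LAG,
               critical: float = CRITICAL_LAG) -> str:
    """Return an Icinga-style state; "gt" means strictly greater than."""
    if max_lag > critical:
        return "CRITICAL"
    if max_lag > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # 1.138e+05 is the value reported by the 00:34:57 PROBLEM alert.
    print(lag_status(1.138e5))  # CRITICAL
    # 0 is the value reported by the 00:51:33 RECOVERY alert.
    print(lag_status(0))        # OK
```

Because the comparison is strict, a lag of exactly 1e5 would still classify as WARNING. This matches the "gt" wording in the alert.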
[04:07:33] (CR) Bmansurov: Add recommendation-api chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[04:52:54] (PS1) Legoktm: scap: Restructure tox.ini so it's easier to test against Python 3 in the future [mediawiki-config] - https://gerrit.wikimedia.org/r/566409
[04:52:56] (PS1) Legoktm: scap: Actually pass flake8 [mediawiki-config] - https://gerrit.wikimedia.org/r/566410
[04:52:58] (PS1) Legoktm: scap: Add Python 3 support [mediawiki-config] - https://gerrit.wikimedia.org/r/566411
[04:53:00] (PS1) Legoktm: scap: Clean up unused build configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/566412
[04:54:45] (CR) jerkins-bot: [V: -1] scap: Clean up unused build configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/566412 (owner: Legoktm)
[04:55:08] (CR) Legoktm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/566412 (owner: Legoktm)
[05:11:50] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:59:53] AaronSchulz: yeah, mostly that, we can try to do it after all hands
[06:05:50] Operations, serviceops, Performance-Team (Radar): Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (Marostegui) >>! In T243149#5821443, @Krinkle wrote: > Looks like the main action is to avoid these alarms in the future, askin...
[06:11:04] Operations: 2020 Q3 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (Marostegui) p:Triage→Normal
[06:11:13] Operations: 2020 Q3 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (Marostegui) p:Triage→Normal
[06:11:19] Operations: 2020 Q3 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (Marostegui) p:Triage→Normal
[06:12:18] Operations: 2020 Q3 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (Marostegui)
[06:12:20] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[06:14:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2091:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10241 and previous config saved to /var/cache/conftool/dbconfig/20200122-061429-marostegui.json
[06:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:34] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:15:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1103:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10242 and previous config saved to /var/cache/conftool/dbconfig/20200122-061522-marostegui.json
[06:15:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:19] !log Remove partitions from db1103:3314 - T239453
[06:16:20] (PS1) Marostegui: db1103: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/566415 (https://phabricator.wikimedia.org/T239453)
[06:16:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:08] (CR) Marostegui: [C: +2] db1103: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/566415 (https://phabricator.wikimedia.org/T239453) (owner: Marostegui)
[07:22:00] (PS1) Giuseppe Lavagetto: profile::mediawiki::webserver: use service catalog directly [puppet] - https://gerrit.wikimedia.org/r/566419
[07:22:41] (CR) jerkins-bot: [V: -1] profile::mediawiki::webserver: use service catalog directly [puppet] - https://gerrit.wikimedia.org/r/566419 (owner: Giuseppe Lavagetto)
[07:31:31] (PS2) Giuseppe Lavagetto: profile::mediawiki::webserver: use service catalog directly [puppet] - https://gerrit.wikimedia.org/r/566419
[07:54:19] (PS3) Giuseppe Lavagetto: profile::mediawiki::webserver: use service catalog directly [puppet] - https://gerrit.wikimedia.org/r/566419
[07:54:48] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1002/20500/ this is a noop." [puppet] - https://gerrit.wikimedia.org/r/566419 (owner: Giuseppe Lavagetto)
[07:58:04] (CR) Muehlenhoff: [C: +1] "Looks good to me. This should work on both the existing contint1001/jessie and the upcoming contint2001/buster, the component name is the " [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[07:58:53] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::webserver: use service catalog directly [puppet] - https://gerrit.wikimedia.org/r/566419 (owner: Giuseppe Lavagetto)
[08:03:13] RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:03:16] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: remove useless inclusion of lvs::configuration [puppet] - https://gerrit.wikimedia.org/r/566461
[08:06:25] Operations, Wikimedia-Mailing-lists: aol.com subscriptions not possible for mediawiki-commits - https://phabricator.wikimedia.org/T243375 (siebrand)
[08:06:53] Operations, Wikimedia-Mailing-lists: Block aol.com subscriptions for mediawiki-commits due to spam - https://phabricator.wikimedia.org/T243375 (siebrand) Open→Resolved
[08:19:50] (PS1) Ema: cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417)
[08:20:51] (CR) jerkins-bot: [V: -1] cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[08:21:30] (PS1) Muehlenhoff: Record new contract date for mayakpwiki [puppet] - https://gerrit.wikimedia.org/r/566464
[08:23:58] (PS1) Marostegui: mariadb: Productionize es2023 as es5 codfw master [puppet] - https://gerrit.wikimedia.org/r/566465 (https://phabricator.wikimedia.org/T243052)
[08:24:58] Operations, ops-eqiad, DBA: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (Marostegui) >>! In T241359#5809112, @Marostegui wrote: > You guys think this will be ready by 31st Jan? > Thanks. Any ETA on when we can expect these ho...
[08:25:43] (CR) Marostegui: [C: +2] mariadb: Productionize es2023 as es5 codfw master [puppet] - https://gerrit.wikimedia.org/r/566465 (https://phabricator.wikimedia.org/T243052) (owner: Marostegui)
[08:26:46] (CR) Muehlenhoff: [C: +2] Record new contract date for mayakpwiki [puppet] - https://gerrit.wikimedia.org/r/566464 (owner: Muehlenhoff)
[08:27:56] !log Stop MySQL on es2021 to "clone" es2023 - T243052
[08:27:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:00] T243052: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052
[08:29:10] (CR) Muehlenhoff: [C: +1] ci::httpd: add support for buster PHP 7.3 [puppet] - https://gerrit.wikimedia.org/r/566386 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[08:37:45] (PS1) Muehlenhoff: Rebuild for Buster T224580 [debs/prometheus-etherpad-exporter] - https://gerrit.wikimedia.org/r/566467
[08:39:39] (CR) Muehlenhoff: [C: +2] Rebuild for Buster T224580 [debs/prometheus-etherpad-exporter] - https://gerrit.wikimedia.org/r/566467 (owner: Muehlenhoff)
[08:45:13] !log upload prometheus-etherpad-exporter 0.2 to buster-wikimedia T224580
[08:45:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:16] T224580: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580
[08:53:10] (PS2) Ema: cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417)
[08:59:23] (PS3) Ema: cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417)
[09:01:14] Operations, Wikimedia-Etherpad, serviceops, Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (MoritzMuehlenhoff) >>! In T224580#5820607, @Dzahn wrote: > The following packages are used by the puppet role but so far missing on buster: > > * prometheus-e...
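The BFD check on cr1-eqiad went CRITICAL at 05:11:50 with "CRIT: Down: 1" and recovered at 08:03:13 with "OK: UP: 11 AdminDown: 0 Down: 0". The sketch below shows how such a summary could be derived from per-session states. It assumes the check counts sessions by state and goes critical when any session is down. The state strings "up", "adminDown" and "down" and the function name are hypothetical, and the real check queries the router rather than taking a list.

```python
from collections import Counter
from typing import Iterable


def bfd_summary(session_states: Iterable[str]) -> str:
    """Summarise BFD session states in the format seen in the alerts.

    Hypothetical: assumes the states are the strings "up", "adminDown"
    and "down", and that any "down" session makes the check critical.
    """
    counts = Counter(session_states)  # missing states count as 0
    if counts["down"] > 0:
        return f"CRIT: Down: {counts['down']}"
    return (f"OK: UP: {counts['up']} "
            f"AdminDown: {counts['adminDown']} Down: 0")


if __name__ == "__main__":
    print(bfd_summary(["up"] * 10 + ["down"]))  # CRIT: Down: 1
    print(bfd_summary(["up"] * 11))             # OK: UP: 11 AdminDown: 0 Down: 0
```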
[09:01:17] (CR) Ema: "pcc seems fine https://puppet-compiler.wmflabs.org/compiler1002/20504/" [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[09:05:12] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/566303 (owner: Filippo Giunchedi)
[09:05:19] (CR) Alexandros Kosiaris: [C: -1] Add recommendation-api chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[09:06:06] (PS2) Muehlenhoff: Switch some analytics roles to standard Partman recipes [puppet] - https://gerrit.wikimedia.org/r/566237 (https://phabricator.wikimedia.org/T156955)
[09:12:10] (CR) Muehlenhoff: [C: +2] Switch some analytics roles to standard Partman recipes [puppet] - https://gerrit.wikimedia.org/r/566237 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[09:24:05] (PS1) Muehlenhoff: Fix duplicated entry in netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/566473
[09:25:08] (CR) Muehlenhoff: [C: +2] Fix duplicated entry in netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/566473 (owner: Muehlenhoff)
[09:29:51] (PS1) Muehlenhoff: Switch authdns* to standard Partman recipes [puppet] - https://gerrit.wikimedia.org/r/566476 (https://phabricator.wikimedia.org/T156955)
[09:38:58] Operations: 2020 Q3 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (Marostegui)
[09:45:26] (PS1) Muehlenhoff: Switch role for torrelay1001 to spare [puppet] - https://gerrit.wikimedia.org/r/566488
[09:47:46] (CR) Muehlenhoff: [C: +2] Switch role for torrelay1001 to spare [puppet] - https://gerrit.wikimedia.org/r/566488 (owner: Muehlenhoff)
[09:51:01] (PS1) Marostegui: mariadb: Productionize es2024 as es5 codfw slave [puppet] - https://gerrit.wikimedia.org/r/566489 (https://phabricator.wikimedia.org/T243052)
[09:51:26] (PS2) Marostegui: mariadb: Productionize es2024 as es5 codfw slave [puppet] - https://gerrit.wikimedia.org/r/566489 (https://phabricator.wikimedia.org/T243052)
[09:59:10] Operations, LDAP-Access-Requests: Requesting access to wmf LDAP group for dpifke - https://phabricator.wikimedia.org/T243354 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff @dpifke I've added you to cn=wmf, let me know if you run into any issues.
[10:02:36] Operations, Continuous-Integration-Config: cergen CI fails to run on Debian Stretch because cryptography dependency cannot be built against newer openssl version - https://phabricator.wikimedia.org/T212395 (MoritzMuehlenhoff) a:MoritzMuehlenhoff→None
[10:02:49] (CR) ArielGlenn: "> > Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/565403 (owner: Dzahn)
[10:04:01] !log installing openldap security updates on stretch
[10:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:59] (CR) Marostegui: [C: +2] mariadb: Productionize es2024 as es5 codfw slave [puppet] - https://gerrit.wikimedia.org/r/566489 (https://phabricator.wikimedia.org/T243052) (owner: Marostegui)
[10:09:11] !log Stop MySQL on es2023 to "clone" es2024 - T243052
[10:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:14] T243052: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052
[10:11:32] (PS1) Muehlenhoff: service::node: Switch to apt::package_from_component [puppet] - https://gerrit.wikimedia.org/r/566490
[10:14:24] Operations, Wikispeech-Text-to-Speech, Wikispeech-WMSE, Wikispeech-jobrunner: TTS server deployment strategy - https://phabricator.wikimedia.org/T193072 (Addshore) I think the goal here would be to write a blubber file and have the service then deployed ot labs in a similar way to the wikidata te...
[10:17:12] (CR) Ema: [C: +1] ATS: Trigger update-ocsp-all iff non acme-chief certs are deployed [puppet] - https://gerrit.wikimedia.org/r/553526 (owner: Vgutierrez)
[10:17:39] (PS4) Vgutierrez: ATS: Trigger update-ocsp-all iff non acme-chief certs are deployed [puppet] - https://gerrit.wikimedia.org/r/553526
[10:21:36] (PS1) Legoktm: [WIP] toolforge: Port portgrabber related code to Python 3 [puppet] - https://gerrit.wikimedia.org/r/566491 (https://phabricator.wikimedia.org/T218427)
[10:22:26] (CR) Legoktm: "Surely there's a better way to identify Python versions than what I have in web.py right now." [puppet] - https://gerrit.wikimedia.org/r/566491 (https://phabricator.wikimedia.org/T218427) (owner: Legoktm)
[10:23:09] (PS1) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[10:23:54] (CR) jerkins-bot: [V: -1] [WIP] toolforge: Port portgrabber related code to Python 3 [puppet] - https://gerrit.wikimedia.org/r/566491 (https://phabricator.wikimedia.org/T218427) (owner: Legoktm)
[10:27:40] (CR) Vgutierrez: "pcc looks reasonable: https://puppet-compiler.wmflabs.org/compiler1003/20505/" [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: Vgutierrez)
[10:28:27] (CR) Vgutierrez: [C: +2] ATS: Trigger update-ocsp-all iff non acme-chief certs are deployed [puppet] - https://gerrit.wikimedia.org/r/553526 (owner: Vgutierrez)
[10:28:49] (PS2) Legoktm: [WIP] toolforge: Port portgrabber related code to Python 3 [puppet] - https://gerrit.wikimedia.org/r/566491 (https://phabricator.wikimedia.org/T218427)
[10:30:01] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/563186 (https://phabricator.wikimedia.org/T229397) (owner: Jbond)
[10:30:12] (PS3) Cparle: Re-enable delayed new upload jobs for MachineVision extension [mediawiki-config] - https://gerrit.wikimedia.org/r/565615 (https://phabricator.wikimedia.org/T241072)
[10:31:22] (PS4) Ema: cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417)
[10:33:51] (CR) Ema: "https://puppet-compiler.wmflabs.org/compiler1003/20506/" [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[10:34:36] (PS3) Cparle: Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242)
[10:35:21] (CR) Vgutierrez: [C: +1] cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[10:35:46] (CR) jerkins-bot: [V: -1] Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242) (owner: Cparle)
[10:35:50] Operations: Retire the Tor relay - https://phabricator.wikimedia.org/T243288 (faidon) To your last point: the [[ https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use | WMCS Terms of Use ]] explicitly lists "network proxy" in the "prohibited activities" section -and even names Tor specific...
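Legoktm's 10:22:26 review comment asks for a better way to identify the running Python version in web.py. The contents of web.py are not shown in this log. A common standard-library approach is to compare `sys.version_info` against a plain tuple. The helper name below is hypothetical. The code avoids type annotations and f-strings so that it also parses under Python 2, which is the interpreter it is meant to tell apart from Python 3.

```python
import sys


def running_python3(version_info=None):
    """Return True when the interpreter is Python 3 or newer.

    sys.version_info compares element-wise with plain tuples, so
    comparing against (3, 0) is enough to separate Python 2 from 3.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info) >= (3, 0)


if __name__ == "__main__":
    if running_python3():
        print("Python 3 code path")
    else:
        print("Python 2 code path")
```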
[10:36:06] (PS4) Cparle: Remove handler deleted from the MachineVision extension on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/565987 (https://phabricator.wikimedia.org/T241242)
[10:37:06] (CR) Ema: [C: +2] cache: icinga check for high varnishd_mmap_count [puppet] - https://gerrit.wikimedia.org/r/566463 (https://phabricator.wikimedia.org/T242417) (owner: Ema)
[10:37:17] (PS3) Cparle: Remove handler deleted from the MachineVision extension [mediawiki-config] - https://gerrit.wikimedia.org/r/565614 (https://phabricator.wikimedia.org/T241242)
[10:41:21] RECOVERY - snapshot of s3 in codfw on db1115 is OK: snapshot for s3 at codfw taken less than 4 days ago and larger than 90 GB: Last one 2020-01-22 07:10:12 from db2098.codfw.wmnet:3313 (809 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[10:47:09] (PS2) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[10:51:06] Operations, Scap, serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (jijiki) @thcipriani ping! :)
[10:56:44] (PS3) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[10:57:02] Operations, serviceops, Patch-For-Review: Move debugging symbols and tools to a new class - https://phabricator.wikimedia.org/T236048 (jijiki)
[10:57:14] (Abandoned) Effie Mouzeli: debug: new module to add debug tools and -dbg packages [puppet] - https://gerrit.wikimedia.org/r/550833 (https://phabricator.wikimedia.org/T236048) (owner: Effie Mouzeli)
[10:59:40] (PS1) Ema: cache: consolidate common text/upload hiera [puppet] - https://gerrit.wikimedia.org/r/566495
[11:02:33] akosiaris: minor detail, but in steps 7 and 8 the docs say to run source .hfenv twice, once before diff and once before apply. Is that necessary? https://wikitech.wikimedia.org/wiki/Migrating_from_scap-helm#Code_deployment/configuration_changes
[11:02:48] is anyone able to tell me what would be the update time for "title blacklist" once saved at meta? I know there used to be some lag once it was updated.
[11:04:21] (PS2) Ema: cache: consolidate common text/upload hiera [puppet] - https://gerrit.wikimedia.org/r/566495
[11:04:52] !log restarting mw canaries to pick up openldap update
[11:04:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:34] Operations: Retire the Tor relay - https://phabricator.wikimedia.org/T243288 (Legoktm) >>! In T243288#5822351, @faidon wrote: > To your last point: the [[ https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use | WMCS Terms of Use ]] explicitly lists "network proxy" in the "prohibited activi...
[11:09:13] Operations, DNS, Domains, Traffic: Donate wikiźródła.pl and wikisłownik.pl to the Foundation - https://phabricator.wikimedia.org/T240446 (tomasz) >>! In T240446#5821331, @Dzahn wrote: > @tomasz Is @CRoslof looped into the conversation with Doneva? Not into the message with the AuthInfo codes, ho...
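The 10:41:21 RECOVERY for the s3 snapshot in codfw states its own criteria: the latest snapshot must be less than 4 days old and larger than 90 GB. This sketch expresses those two conditions. The function name is hypothetical. It assumes both timestamps are in the same timezone and that the size is already expressed in GB. The 90 GB floor and 4-day window are copied from the alert text, not from the check's source.

```python
from datetime import datetime, timedelta

# Criteria as stated in the alert text; not taken from the check's source.
MAX_AGE = timedelta(days=4)
MIN_SIZE_GB = 90


def snapshot_ok(taken_at, size_gb, now,
                max_age=MAX_AGE, min_size_gb=MIN_SIZE_GB):
    """True if the snapshot is recent enough and large enough."""
    return (now - taken_at) < max_age and size_gb > min_size_gb


if __name__ == "__main__":
    last = datetime(2020, 1, 22, 7, 10, 12)         # "Last one 2020-01-22 07:10:12"
    check_time = datetime(2020, 1, 22, 10, 41, 21)  # time of the RECOVERY line
    print(snapshot_ok(last, 809, check_time))       # True
```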
[11:09:34] (PS3) Ema: cache: consolidate common text/upload hiera [puppet] - https://gerrit.wikimedia.org/r/566495
[11:13:53] Operations, Tor: Retire the Tor relay - https://phabricator.wikimedia.org/T243288 (Peachey88)
[11:15:05] (PS1) Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226)
[11:15:13] (PS4) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[11:16:54] !log restarting exim on MXes to pick up new openldap
[11:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:06] (CR) jerkins-bot: [V: -1] role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: Jbond)
[11:18:42] (PS4) Ema: cache: consolidate common text/upload hiera [puppet] - https://gerrit.wikimedia.org/r/566495
[11:22:10] (PS2) Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226)
[11:27:09] (CR) Vgutierrez: "pcc looking good: https://puppet-compiler.wmflabs.org/compiler1002/20511/" [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: Vgutierrez)
[11:27:13] (PS5) Ema: cache: consolidate common text/upload hiera [puppet] - https://gerrit.wikimedia.org/r/566495
[11:28:36] mvolz: you are right, it's not necessary.
[11:29:40] (PS5) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[11:34:52] (CR) Ema: [C: +1] "yay!" [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: Vgutierrez)
[11:34:55] (CR) Muehlenhoff: role::puppetmaster::standalone: support multiple puppetdb servers (2 comments) [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: Jbond)
[11:37:56] (PS1) Ladsgroup: Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata"" [mediawiki-config] - https://gerrit.wikimedia.org/r/566506 (https://phabricator.wikimedia.org/T241972)
[11:38:26] !log disabled wikitech 2fa for Cparle
[11:38:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:07] !log restarting apache on puppetboard/graphite/webperf to pick up OpenLDAP update
[11:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:47] (PS1) Arturo Borrero Gonzalez: toolforge: migrate toolviews puppet code to the toolforge namespace [puppet] - https://gerrit.wikimedia.org/r/566508
[11:42:49] (PS1) Arturo Borrero Gonzalez: toolforge: toolviews: don't run it if not in the tools CloudVPS project [puppet] - https://gerrit.wikimedia.org/r/566509
[11:44:33] (PS3) Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226)
[11:45:16] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: migrate toolviews puppet code to the toolforge namespace [puppet] - https://gerrit.wikimedia.org/r/566508 (owner: Arturo Borrero Gonzalez)
[11:46:04] (CR) Jbond: "updated thanks" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: Jbond)
[11:46:13] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: toolviews: don't run it if not in the tools CloudVPS project [puppet] - https://gerrit.wikimedia.org/r/566509 (owner: Arturo Borrero Gonzalez)
[11:46:15] Operations: Integrate Stretch 9.10/9.11 point updates - https://phabricator.wikimedia.org/T232308 (MoritzMuehlenhoff)
[11:48:07] (PS4) Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226)
[11:49:07] (CR) Addshore: [C: +1] Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata"" [mediawiki-config] - https://gerrit.wikimedia.org/r/566506 (https://phabricator.wikimedia.org/T241972) (owner: Ladsgroup)
[11:49:43] (CR) Ema: [C: +1] "Yuhu!" [debs/trafficserver] - https://gerrit.wikimedia.org/r/566309 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[11:52:50] Operations, DC-Ops, decommission: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (MoritzMuehlenhoff)
[11:53:04] Operations, DC-Ops, decommission: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[11:53:51] Operations, DC-Ops, decommission: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (MoritzMuehlenhoff)
[11:54:58] Operations, Traffic: Upgrade ncredir cluster to buster - https://phabricator.wikimedia.org/T243391 (Vgutierrez)
[11:55:11] Operations, Traffic: Upgrade ncredir cluster to buster - https://phabricator.wikimedia.org/T243391 (Vgutierrez) p:Triage→Normal
[11:55:55] (PS5) Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226)
[11:55:57] (PS1) Jbond: role::puppetmaster::standalone: add type checking to autosign [puppet] - https://gerrit.wikimedia.org/r/566512
[11:57:06] (CR) Vgutierrez: [V: +2 C: +2] Release 8.0.5-1wm13 [debs/trafficserver] - https://gerrit.wikimedia.org/r/566309 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T1200).
[12:00:04] No GERRIT patches in the queue for this window AFAICS.
[12:00:21] There's some gerrit patches
[12:00:29] I forgot to refresh jounce bot
[12:00:37] (CR) Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/20514/" [puppet] - https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: Jbond)
[12:01:00] Operations, Traffic, Patch-For-Review: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema) Open→Resolved a:ema Raised `vm.max_map_count` and added an icinga check alerting if the number of memory map areas used by varnish...
[12:04:13] jouncebot: refresh
[12:04:14] I refreshed my knowledge about deployments.
[12:04:18] jouncebot: now
[12:04:18] For the next 0 hour(s) and 55 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T1200)
[12:05:43] T243394
[12:05:44] T243394: Automatically refresh jouncebot just before a deployment window starts - https://phabricator.wikimedia.org/T243394
[12:08:36] * addshore watches
[12:14:42] !log jmm@cumin1001 START - Cookbook sre.hosts.decommission
[12:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:17] !log jmm@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[12:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:21] Operations, DC-Ops, decommission: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: `torrelay1001.wikimedia.org` - torrelay1001.wikimedia.org (**PASS**) - Downtimed host on Icinga
- Dow... [12:17:33] !log Disable puppet on mw* and wtp* to merge 563206 [12:17:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:45] (03PS6) 10Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - 10https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) [12:19:25] (03CR) 10Effie Mouzeli: [C: 03+2] mtail: Remove hhvm handler [puppet] - 10https://gerrit.wikimedia.org/r/563206 (owner: 10Effie Mouzeli) [12:24:14] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: 10Jbond) [12:25:31] (03PS1) 10Muehlenhoff: Remove Puppet references to torrelay1001 [puppet] - 10https://gerrit.wikimedia.org/r/566514 (https://phabricator.wikimedia.org/T243390) [12:26:26] 10Operations, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (10MoritzMuehlenhoff) [12:27:32] (03CR) 10Vgutierrez: "I've fixed some ats-tls configuration on deployment-cache-{text,upload}05, as soon as puppet runs there, it should be able to switch from " [puppet] - 10https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: 10Vgutierrez) [12:27:48] (03CR) 10Ema: [C: 03+1] cache: Remove nginx from text and upload clusters [puppet] - 10https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: 10Vgutierrez) [12:28:41] jenkins is soo slow [12:29:02] (03CR) 10Muehlenhoff: [C: 03+2] Remove Puppet references to torrelay1001 [puppet] - 10https://gerrit.wikimedia.org/r/566514 (https://phabricator.wikimedia.org/T243390) (owner: 10Muehlenhoff) [12:30:40] !log uploaded trafficserver 8.0.5-1wm13 to apt.w.o (buster) - T242093 [12:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:43] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 [12:34:44] (03PS1) 
10Muehlenhoff: Remove DNS emtries for torrelay1001 and the tor-eqiad-1 CNAME [dns] - 10https://gerrit.wikimedia.org/r/566515 [12:35:06] !log enable puppet and restart mtail on mw* and wtp* [12:35:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:28] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:36:56] !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.16/extensions/WikibaseQualityConstraints: [[gerrit:566505|Better dependency injection of base URI in ConstraintParameterParser (T241972)]] (duration: 01m 14s) [12:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:59] T241972: wmgUseEntitySourceBasedFederation true for Wikidata.org - https://phabricator.wikimedia.org/T241972 [12:37:19] (03PS2) 10Muehlenhoff: Remove DNS entries for torrelay1001 and the tor-eqiad-1 CNAME [dns] - 10https://gerrit.wikimedia.org/r/566515 [12:38:18] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:42:53] 10Operations, 10Beta-Cluster-Infrastructure, 10Patch-For-Review: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10jbond) hi alex, I noticed that the postgress databases was missing the uppetdb user, however a simple puppet run on the puppet master fixed the problem. i did re... 
[12:42:55] (03CR) 10Muehlenhoff: [C: 03+2] Remove DNS entries for torrelay1001 and the tor-eqiad-1 CNAME [dns] - 10https://gerrit.wikimedia.org/r/566515 (owner: 10Muehlenhoff) [12:43:31] 10Operations, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission torrelay1001 - https://phabricator.wikimedia.org/T243390 (10MoritzMuehlenhoff) [12:43:37] (03PS2) 10Ladsgroup: Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566506 (https://phabricator.wikimedia.org/T241972) [12:43:45] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566506 (https://phabricator.wikimedia.org/T241972) (owner: 10Ladsgroup) [12:43:46] !log ladsgroup@deploy1001 scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [12:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:51] (03Merged) 10jenkins-bot: Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566506 (https://phabricator.wikimedia.org/T241972) (owner: 10Ladsgroup) [12:46:17] !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.15/extensions/WikibaseQualityConstraints: [[gerrit:566504|Better dependency injection of base URI in ConstraintParameterParser (T241972)]] (duration: 01m 05s) [12:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:21] T241972: wmgUseEntitySourceBasedFederation true for Wikidata.org - https://phabricator.wikimedia.org/T241972 [12:46:34] The error is because the files are not in sync during the deployment, I don't think I should fix it [12:47:47] !log disable puppet fleet-wide - upgrade jdk on puppetdb [12:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:00] PROBLEM -
puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [12:50:08] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:562578|Set useEntitySourceBasedFederation to true for Wikidata (T241972)]] (duration: 01m 06s) [12:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:30] PROBLEM - Check size of conntrack table on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [12:50:38] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [12:50:42] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops [12:50:50] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [12:50:52] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:51:00] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [12:51:55] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:562578|Set useEntitySourceBasedFederation to true for Wikidata (T241972)]] (duration: 01m 05s) [12:51:57] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:58] T241972: wmgUseEntitySourceBasedFederation true for Wikidata.org - https://phabricator.wikimedia.org/T241972 [12:54:37] (03PS1) 10Alexandros Kosiaris: Rebuild for buster [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/566517 (https://phabricator.wikimedia.org/T224580) [12:57:56] !log Updated the Wikidata property suggester with data from the 2020-01-06 JSON dump and applied the T132839 workarounds [12:57:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:00] T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [12:59:08] (03PS1) 10Ladsgroup: Revert "Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566520 [12:59:43] !log restart nrpe on notebook1003 [12:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:53] (03CR) 10Ladsgroup: [C: 03+2] Revert "Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566520 (owner: 10Ladsgroup) [12:59:58] RECOVERY - Check size of conntrack table on notebook1003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [13:00:10] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [13:00:14] RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops [13:00:24] RECOVERY - DPKG on notebook1003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [13:00:26] RECOVERY - Check systemd state on
notebook1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:00:37] (03PS2) 10Alexandros Kosiaris: Rebuild for buster [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/566517 (https://phabricator.wikimedia.org/T224580) [13:00:38] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [13:00:53] (03Merged) 10jenkins-bot: Revert "Revert "Revert "Set useEntitySourceBasedFederation to true for Wikidata""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566520 (owner: 10Ladsgroup) [13:01:33] (03CR) 10Effie Mouzeli: [C: 03+2] role::prometheus::beta: add mcrouter metrics [puppet] - 10https://gerrit.wikimedia.org/r/566255 (owner: 10Effie Mouzeli) [13:02:22] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: [[gerrit:562578|Set useEntitySourceBasedFederation to true for Wikidata (T241972)]] (duration: 01m 05s) [13:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:25] T241972: wmgUseEntitySourceBasedFederation true for Wikidata.org - https://phabricator.wikimedia.org/T241972 [13:03:04] 10Operations, 10DC-Ops, 10decommission, 10Patch-For-Review: Reclaim torrelay1001 to spares - https://phabricator.wikimedia.org/T243390 (10MoritzMuehlenhoff) a:05MoritzMuehlenhoff→03Jclark-ctr [13:03:42] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: [[gerrit:562578|Set useEntitySourceBasedFederation to true for Wikidata (T241972)]] (duration: 01m 05s) [13:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:57] 10Operations, 10Tor: Retire the Tor relay - https://phabricator.wikimedia.org/T243288 (10MoritzMuehlenhoff) torrelay1001 is being reclaimed to the spare pool via https://phabricator.wikimedia.org/T243390 (only pending DC ops steps like disk wipe) [13:07:41] !log EU SWAT is over 
[13:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:16] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:09:30] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [13:10:20] PROBLEM - Check size of conntrack table on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [13:10:34] PROBLEM - puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:10:34] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [13:10:38] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops [13:10:52] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [13:12:38] PROBLEM - Check whether ferm is active by checking the default input chain on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:17:12] RECOVERY - Check size of conntrack table on notebook1003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [13:17:28] 
RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [13:17:32] RECOVERY - Disk space on notebook1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1003&var-datasource=eqiad+prometheus/ops [13:17:46] RECOVERY - DPKG on notebook1003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [13:17:48] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:17:54] RECOVERY - Check whether ferm is active by checking the default input chain on notebook1003 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:18:04] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [13:22:12] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:30:37] (03PS6) 10Jbond: role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - 10https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) [13:32:09] (03PS2) 10Jbond: role::puppetmaster::standalone: add type checking to autosign [puppet] - 10https://gerrit.wikimedia.org/r/566512 [13:33:29] (03CR) 10Jbond: "PCC: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/20515/" [puppet] - 10https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: 10Jbond) [13:49:15] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - 
10https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: 10Jbond) [13:50:59] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work), 10Epic: [Epic] Scaling strategy for Wikidata Query Service - https://phabricator.wikimedia.org/T221938 (10Gehel) a:03dcausse [13:54:58] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two comments inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/566284 (https://phabricator.wikimedia.org/T238820) (owner: 10Arturo Borrero Gonzalez) [13:56:24] (03CR) 10Jbond: [C: 03+2] role::puppetmaster::standalone: support multiple puppetdb servers [puppet] - 10https://gerrit.wikimedia.org/r/566500 (https://phabricator.wikimedia.org/T243226) (owner: 10Jbond) [13:58:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1013 with 10G interfaces - https://phabricator.wikimedia.org/T243414 (10Andrew) [13:59:03] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] mirrors: mirror Debian openstack backports repositories (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/566284 (https://phabricator.wikimedia.org/T238820) (owner: 10Arturo Borrero Gonzalez) [14:00:13] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): Move cloudvirt hosts to 10Gb ethernet - https://phabricator.wikimedia.org/T216195 (10Andrew) [14:04:36] (03PS3) 10Alexandros Kosiaris: Rebuild for buster [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/566517 (https://phabricator.wikimedia.org/T224580) [14:14:14] (03CR) 10Alexandros Kosiaris: [C: 03+2] Rebuild for buster [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/566517 (https://phabricator.wikimedia.org/T224580) (owner: 10Alexandros Kosiaris) [14:14:53] (03PS1) 10Marostegui: check_depooled: Add es4 and es5 [software] - 10https://gerrit.wikimedia.org/r/566528 (https://phabricator.wikimedia.org/T243052) [14:15:00] PROBLEM - Postgres Replication Lag on maps1003 is 
CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 299565752 and 28 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [14:15:58] (03CR) 10Marostegui: [C: 03+2] check_depooled: Add es4 and es5 [software] - 10https://gerrit.wikimedia.org/r/566528 (https://phabricator.wikimedia.org/T243052) (owner: 10Marostegui) [14:16:28] (03Merged) 10jenkins-bot: check_depooled: Add es4 and es5 [software] - 10https://gerrit.wikimedia.org/r/566528 (https://phabricator.wikimedia.org/T243052) (owner: 10Marostegui) [14:16:52] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 58320 and 14 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [14:17:08] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 523499328 and 110 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [14:18:54] !log upload etherpad-lite_1.7.5-3 to apt.wikimedia.org buster-wikimedia/main T224580 [14:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:58] T224580: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 [14:18:58] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 472 and 141 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [14:21:39] 10Operations, 10Wikimedia-Etherpad, 10serviceops, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10akosiaris) >>! In T224580#5820607, @Dzahn wrote: > The following packages are used by the puppet role but so far missing on buster: > > * prometheus-etherpad-... 
[14:22:37] Hello, I need https://gerrit.wikimedia.org/r/#/c/integration/config/+/566519/ deployed asap [14:25:30] 10Operations, 10Beta-Cluster-Infrastructure, 10Patch-For-Review: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (10jbond) Hi alex, I have pushed the change to allow multiple puppetdb's with command_broadcast and updated the node meta data in horiozon (i removed some redundant... [14:25:34] (03PS1) 10Vgutierrez: Release 0.6-1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) [14:25:42] (03CR) 10jerkins-bot: [V: 04-1] Release 0.6-1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [14:30:44] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1013 with 10G interfaces - https://phabricator.wikimedia.org/T243414 (10Andrew) a:05Andrew→03Cmjohnson [14:34:12] (03PS2) 10Vgutierrez: Release 0.6-1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) [14:34:57] (03PS3) 10Vgutierrez: Release 0.6.1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) [14:39:51] !log Stop MySQL on db2085:3311 and db2085:3318 for onsite maintenance - [14:39:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:59] !log Stop MySQL on db2085:3311 and db2085:3318 for onsite maintenance - T243148 [14:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:01] T243148: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 [14:41:22] (03PS1) 10Vgutierrez: vhtcpd (0.1.2-2) buster-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/566532 (https://phabricator.wikimedia.org/T242093) [14:42:58] PROBLEM - Check systemd state on 
netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:45:51] (03PS1) 10Jbond: puppetmaster2003: add puppetmaster2003 as a canary host and server [puppet] - 10https://gerrit.wikimedia.org/r/566534 (https://phabricator.wikimedia.org/T239732) [14:50:46] !log copied python3-file-read-backwards to apt.w.o (buster) - T242093 [14:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:49] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 [14:53:07] !log copied python3-logstash to apt.w.o (buster) - T242093 [14:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:02] !log FW upgrade on db2085 [14:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:36] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:17:55] (03PS1) 10Ema: cache: remove 'enable_geoiplookup' from vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/566536 (https://phabricator.wikimedia.org/T222177) [15:20:20] (03CR) 10Ema: [C: 03+1] vhtcpd (0.1.2-2) buster-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/566532 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [15:20:47] (03CR) 10Vgutierrez: [C: 03+2] vhtcpd (0.1.2-2) buster-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/566532 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [15:22:33] 10Operations, 10ops-codfw, 10DBA: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (10Papaul) a:05Papaul→03Marostegui Before ` BIOS Version 2.9.1 Firmware Version 2.61.60.60 ` After BIOS Version 2.11.0 Firmware Version 2.70.70.70 @Marostegui FW 
upgrade complete [15:23:00] (03PS3) 10Matthias Mullie: Add 3d-patents page to wgForceUIMsgAsContentMsg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416730 [15:24:09] (03CR) 10Ema: [C: 03+1] Release 0.6.1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [15:24:49] (03CR) 10Vgutierrez: [C: 03+2] Release 0.6.1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/566530 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [15:26:44] (03CR) 10Alexandros Kosiaris: [C: 04-1] New eventstreams chart (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [15:29:07] !log uploaded fifo-log-demux 0.6.1 to apt.w.o (buster) - T242093 [15:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:11] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 [15:29:33] (03CR) 10Ema: "pcc: https://puppet-compiler.wmflabs.org/compiler1002/20516/" [puppet] - 10https://gerrit.wikimedia.org/r/566536 (https://phabricator.wikimedia.org/T222177) (owner: 10Ema) [15:29:40] 10Operations, 10ops-codfw, 10DBA: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (10Marostegui) Thanks @Papaul - going to start mysql [15:30:13] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Couple of minor charts, overall I think it's pretty close to being merged" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [15:30:52] (03CR) 10Ottomata: New eventstreams chart (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [15:36:26] (03PS2) 10Ema: cache: remove 'enable_geoiplookup' from vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/566536 
(https://phabricator.wikimedia.org/T222177) [15:36:48] 10Operations, 10ops-codfw, 10DBA: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (10Marostegui) For the record, we are also going to try to contact Dell with the OS logs to see if we can get a new DIMM before A3 goes from correctable to uncorrectable. [15:38:44] !log Compress wikidatawiki.wbt_text wikidatawiki.wbt_text_in_lang on db1124:3318 (this might cause lag on s8 labs) - T232446 [15:38:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:47] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 [15:39:54] (03PS1) 10Vgutierrez: Fix typo on changelog release date [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/566541 [15:40:54] (03CR) 10Vgutierrez: [C: 03+2] Fix typo on changelog release date [software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/566541 (owner: 10Vgutierrez) [15:43:25] (03CR) 10Ottomata: "I'd like to include the service.name and canary stuff we are discussing before merging this." 
(1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: Ottomata)
[15:43:27] !log uploaded vhtcpd 0.1.2-2 to apt.w.o (buster) - T242093
[15:43:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:30] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[15:44:10] (PS16) Ottomata: New eventstreams chart [deployment-charts] - https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658)
[15:44:23] (CR) Vgutierrez: [C: +1] cache: remove 'enable_geoiplookup' from vcl_config [puppet] - https://gerrit.wikimedia.org/r/566536 (https://phabricator.wikimedia.org/T222177) (owner: Ema)
[15:44:28] Operations, ops-eqsin, Traffic: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (faidon) Hey - this was a Q2 task but it hasn't seen an update in a while. What's the status?
[15:46:45] Puppet, Beta-Cluster-Infrastructure, Cloud-Services, Release-Engineering-Team-TODO, and 2 others: Horizon hiera UI: investigate data type handling - https://phabricator.wikimedia.org/T243422 (Andrew)
[15:47:02] Puppet, Beta-Cluster-Infrastructure, Cloud-Services, Release-Engineering-Team-TODO, and 2 others: Horizon hiera UI: investigate data type handling - https://phabricator.wikimedia.org/T243422 (Andrew) a:Andrew
[15:48:51] Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Jgreen)
[15:49:38] (CR) Alexandros Kosiaris: [C: -1] New eventstreams chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: Ottomata)
[15:51:43] (PS1) Vgutierrez: varnish: Set buster python version [puppet] - https://gerrit.wikimedia.org/r/566543 (https://phabricator.wikimedia.org/T242093)
[15:52:25] (PS1) Marostegui: db2085: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/566544 (https://phabricator.wikimedia.org/T243148)
[15:54:24] (CR) Marostegui: [C: +2] db2085: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/566544 (https://phabricator.wikimedia.org/T243148) (owner: Marostegui)
[15:55:15] (CR) Ottomata: New eventstreams chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: Ottomata)
[15:56:01] (PS1) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939)
[15:56:26] (CR) Matthias Mullie: [C: -2] "DB table to be created first" [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:07:57] !log update logging target for pfw3-codfw - T243343
[16:07:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:37] (PS1) Vgutierrez: ATS: Include /var/cache/ocsp on RWPaths iff update-ocsp is used [puppet] - https://gerrit.wikimedia.org/r/566548
[16:08:48] Operations, ops-codfw, DBA, Patch-For-Review: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (Papaul) Create Dispatch: Success You have successfully submitted request SR1011684731.
[16:09:41] Operations, ops-eqsin, Traffic: rack/setup/install ps[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T242250 (RobH) We only got confirmation of delivery of the PDUs yesterday via email. I'll be dispatching directions to Jin after we determine what date works best. @bblack: Do you have a prefe...
[16:10:10] Operations, Scap, serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (thcipriani) >>! In T217924#5822404, @jijiki wrote: > @thcipriani ping! :) pong! >>! In T217924#5806546, @jijiki wrote: > @thcipriani as per our discussion, we can consider merging and testi...
[16:10:14] Operations, ops-codfw, DBA, Patch-For-Review: db2085 crashed - memory issues - https://phabricator.wikimedia.org/T243148 (Marostegui) >>! In T243148#5823376, @Papaul wrote: > > Create Dispatch: Success > You have successfully submitted request SR1011684731. Thank you! Let's see what they say
[16:10:33] (CR) Ema: [C: +2] cache: remove 'enable_geoiplookup' from vcl_config [puppet] - https://gerrit.wikimedia.org/r/566536 (https://phabricator.wikimedia.org/T222177) (owner: Ema)
[16:11:07] (CR) Vgutierrez: "NOOP in production: https://puppet-compiler.wmflabs.org/compiler1003/20519/ but needed on labs" [puppet] - https://gerrit.wikimedia.org/r/566548 (owner: Vgutierrez)
[16:11:12] Operations, Traffic, Wikidata, Wikidata-Query-Service, and 2 others: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006 (Gehel) Open→Resolved
[16:11:19] (CR) Ema: [C: +1] ATS: Include /var/cache/ocsp on RWPaths iff update-ocsp is used [puppet] - https://gerrit.wikimedia.org/r/566548 (owner: Vgutierrez)
[16:11:38] (CR) Vgutierrez: [C: +2] ATS: Include /var/cache/ocsp on RWPaths iff update-ocsp is used [puppet] - https://gerrit.wikimedia.org/r/566548 (owner: Vgutierrez)
[16:11:50] (CR) CRusnov: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/563186 (https://phabricator.wikimedia.org/T229397) (owner: Jbond)
[16:12:35] Operations, Traffic, Patch-For-Review: varnish parent unable to send signals to child - https://phabricator.wikimedia.org/T242411 (ema) Open→Resolved a:ema Capability added, all frontends restarted. Closing.
[16:13:11] !log update logging target for pfw3-eqiad - T243343
[16:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:07] Operations, ops-codfw, fundraising-tech-ops: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (ayounsi)
[16:16:00] (CR) Ema: [C: +1] varnish: Set buster python version [puppet] - https://gerrit.wikimedia.org/r/566543 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[16:16:13] (CR) Vgutierrez: [C: +2] varnish: Set buster python version [puppet] - https://gerrit.wikimedia.org/r/566543 (https://phabricator.wikimedia.org/T242093) (owner: Vgutierrez)
[16:21:43] Operations, ops-eqiad, serviceops: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) @Jclark-ctr @Cmjohnson @wiki_willy please updates of the status of racking these servers and update the task accordingly, thank you!
[16:21:49] !log copied prometheus-trafficserver-exporter from stretch to buster on apt.w.o - T242093
[16:21:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:53] T242093: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093
[16:23:39] (CR) Alexandros Kosiaris: [C: -1] New eventstreams chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: Ottomata)
[16:24:27] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/566534 (https://phabricator.wikimedia.org/T239732) (owner: Jbond)
[16:25:04] Operations, ops-eqiad, serviceops: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (jijiki) @wiki_willy @Cmjohnson @Jclark-ctr Please provide a date that works for you regarding those servers, thank you!
[16:26:22] !log installing tiff security updates for buster
[16:26:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:24] (CR) Lucas Werkmeister (WMDE): "Isn’t $wmgUseWikibaseQuality the config variable you need to set? $wgWBQualityConstraintsEnableConstraintsCheckJobs is something else, I t" [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:31:05] (CR) CRusnov: [C: +1] "Overall LGTM. I note that it would be pretty straight forward to add RAPI rw token, but this seems like the path of least resistance." [software/spicerack] - https://gerrit.wikimedia.org/r/566054 (https://phabricator.wikimedia.org/T231068) (owner: Volans)
[16:32:11] (PS2) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939)
[16:33:07] (PS3) CRusnov: rotatedump: Enhance to retain period copies [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512)
[16:33:09] (CR) Jforrester: [C: +1] Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:33:11] (CR) Vgutierrez: [C: +2] cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120) (owner: Vgutierrez)
[16:33:19] (PS1) Ema: cache: remove backend-specific VCL files [puppet] - https://gerrit.wikimedia.org/r/566554 (https://phabricator.wikimedia.org/T241239)
[16:33:38] (PS7) Vgutierrez: cache: Remove nginx from text and upload clusters [puppet] - https://gerrit.wikimedia.org/r/566492 (https://phabricator.wikimedia.org/T236120)
[16:33:48] (CR) CRusnov: "% PS3 untabify" [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: CRusnov)
[16:34:33] Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Hardware): VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853 (JHedden)
[16:35:08] (CR) Ema: "pcc noop: https://puppet-compiler.wmflabs.org/compiler1001/20520/" [puppet] - https://gerrit.wikimedia.org/r/566554 (https://phabricator.wikimedia.org/T241239) (owner: Ema)
[16:35:42] (CR) Lucas Werkmeister (WMDE): Enable WikibaseQualityConstrains on beta Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:35:45] (CR) Matthias Mullie: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:36:56] (CR) Lucas Werkmeister (WMDE): "> I suspect we'll also want $wgWBQualityConstraintsEnableConstraintsCheckJobs (similar to Wikidata), right?" [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:38:56] (CR) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:39:37] (CR) Lucas Werkmeister (WMDE): Enable WikibaseQualityConstrains on beta Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[16:39:41] (PS3) Ayounsi: Initial flowspec support [homer/public] - https://gerrit.wikimedia.org/r/562505
[16:40:23] !log removing nginx from the caching cluster
[16:40:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:14] (CR) CRusnov: [C: -1] "I'll note that this is primarily waiting on non-management interface import in order to do short->fqdn hostname munging." [software/cumin] - https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: CRusnov)
[16:43:16] (CR) CRusnov: [C: +1] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/566361 (https://phabricator.wikimedia.org/T213843) (owner: Ayounsi)
[16:43:46] Operations, ops-codfw, serviceops: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (jijiki) @Papaul Great, let's rack A, B, D, and we will do C in two batches afterwards. Thank you!
[16:57:31] Operations, ops-eqiad, DC-Ops, Epic, cloud-services-team (Hardware): relocate/reimage cloudvirt1013 with 10G interfaces - https://phabricator.wikimedia.org/T243414 (bd808)
[17:03:05] (PS1) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[17:04:47] (CR) jerkins-bot: [V: -1] hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559 (owner: Jbond)
[17:09:09] Operations, ops-eqiad: Heating alerts for mw servers in eqiad - https://phabricator.wikimedia.org/T149287 (jijiki) Open→Declined We will decom those servers soon, I think we can mark this as declined
[17:09:11] Operations, observability: Monitor hardware thermal issues - https://phabricator.wikimedia.org/T125205 (jijiki)
[17:21:45] (PS1) Urbanecm: Add www.eso.org to the wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/566560 (https://phabricator.wikimedia.org/T243423)
[17:23:56] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=nginx site={codfw,eqiad,eqsin,esams,ulsfo} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:32:18] Operations, Performance-Team, Traffic, Patch-For-Review: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle) a:ema
[17:44:38] ^^ that prometheus alert seems to be triggered by me after removing nginx from the caching cluster
[17:44:46] ack
[17:45:55] (CR) Jbond: [C: +2] puppetmaster2003: add puppetmaster2003 as a canary host and server [puppet] - https://gerrit.wikimedia.org/r/566534 (https://phabricator.wikimedia.org/T239732) (owner: Jbond)
[17:46:12] ACKNOWLEDGEMENT - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=nginx site={codfw,eqiad,eqsin,esams,ulsfo} cole_white fallout from current nginx work https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:46:16] ACKNOWLEDGEMENT - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=nginx site={codfw,eqiad,eqsin,esams,ulsfo} cole_white fallout from current nginx work https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:46:47] !log forcing by hand the first sync on sodium for openstack packages (T238820)
[17:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:51] T238820: CloudVPS: consider mirroring debian repos for openstack packages - https://phabricator.wikimedia.org/T238820
[17:48:59] (PS1) Vgutierrez: prometheus: Remove nginx cache jobs [puppet] - https://gerrit.wikimedia.org/r/566561
[17:50:11] shdubsh: may I get a review of that CR? ^^
[17:50:33] shdubsh: of course after running puppet on the prometheus nodes I'll remove the resources
[17:51:16] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/566561 (owner: Vgutierrez)
[17:51:29] (CR) Vgutierrez: [C: +2] prometheus: Remove nginx cache jobs [puppet] - https://gerrit.wikimedia.org/r/566561 (owner: Vgutierrez)
[17:51:41] (PS1) Jbond: puppetmaster::frontend: add ipv6 address to canary hosts [puppet] - https://gerrit.wikimedia.org/r/566562
[17:52:52] (CR) Jbond: [C: +2] puppetmaster::frontend: add ipv6 address to canary hosts [puppet] - https://gerrit.wikimedia.org/r/566562 (owner: Jbond)
[17:53:36] Operations, Traffic, Inuka-Team (Kanban), MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), Performance-Team (Radar): Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029 (Nuria) Please do search on wikitech where all analytics docs are: https://wikitech.wikimedia.org/w...
[17:57:28] Operations, ops-eqiad, DC-Ops: cloudclastic1006 malformed asset tag - report error - https://phabricator.wikimedia.org/T243433 (RobH)
[17:57:34] (PS1) Vgutierrez: prometheus: Clean nginx cache cluster config [puppet] - https://gerrit.wikimedia.org/r/566564
[17:57:46] (CR) Bmansurov: Add recommendation-api chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[17:58:03] shdubsh: I've triggered a puppet run on prometheus::ops nodes, and that CR is the clean up as promised :)
[17:58:34] awesome, thanks!
[17:59:17] (CR) ArielGlenn: [C: -1] "The problem with this patchset is that it will cheerfully break up some carefully determined job for a nice page range into smaller pieces" [dumps] - https://gerrit.wikimedia.org/r/562995 (https://phabricator.wikimedia.org/T242209) (owner: ArielGlenn)
[18:00:48] (CR) Vgutierrez: [C: +2] prometheus: Clean nginx cache cluster config [puppet] - https://gerrit.wikimedia.org/r/566564 (owner: Vgutierrez)
[18:04:11] hmmm it looks like there are some rests of varnish-backend there
[18:04:39] Operations, ops-codfw: (No Need By Date Provided) codfw: rack/setup/install elastic20{55,56,57,58,59,60}.wikimedia.org - https://phabricator.wikimedia.org/T241337 (Papaul)
[18:05:26] (PS1) Vgutierrez: prometheus: Remove varnish-backend cluster config [puppet] - https://gerrit.wikimedia.org/r/566565
[18:08:44] (PS1) Vgutierrez: prometheus: Clean up varnish-backend cluster config [puppet] - https://gerrit.wikimedia.org/r/566566
[18:08:45] (PS1) Jbond: base::resolve: ensure the server domain name is always first in the search list [puppet] - https://gerrit.wikimedia.org/r/566567
[18:10:06] (CR) jerkins-bot: [V: -1] base::resolve: ensure the server domain name is always first in the search list [puppet] - https://gerrit.wikimedia.org/r/566567 (owner: Jbond)
[18:12:05] Operations, ops-codfw: codfw: rack/setup/install wdqs202[7-8].codfw.wmnet - https://phabricator.wikimedia.org/T242301 (Papaul)
[18:14:48] PROBLEM - LVS HTTP IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:15:16] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[18:16:05] Operations, SRE-Access-Requests: Requesting replacement Production SSH key - https://phabricator.wikimedia.org/T243438 (Capt_Swing)
[18:16:11] <_joe_> hey
[18:16:31] is someone looking at it?
[18:16:34] RECOVERY - LVS HTTP IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:16:44] XioNoX: I 'll have a look
[18:16:45] i guess someone did
[18:16:52] it was already woerking when I tested it being up
[18:16:54] <_joe_> II don't think so
[18:17:26] https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1
[18:17:31] memory usage skyrocketed
[18:17:58] someone is trying to get citations on something
[18:18:07] <_joe_> should we kill one instance?
[18:18:37] let me know if it requires an IR, (or any kind of help)
[18:18:50] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[18:18:59] <_joe_> XioNoX: definitely not
[18:19:04] yeah I 'd say not
[18:19:12] citoid btw is just a dependent service
[18:19:58] * apergos peeks in
[18:20:08] (PS2) Jbond: base::resolve: ensure the server domain name is always first in the search list [puppet] - https://gerrit.wikimedia.org/r/566567
[18:20:09] _joe_: memory usage seems to fall back to normal levels
[18:20:17] <_joe_> yeah
[18:20:25] mvolz: fyi ^
[18:21:23] mathodoid also had a spike on fs activity, not sure if just a coincidence
[18:21:27] *mathoid
[18:21:35] the bad news is that we had to disable logging altogether for zotero, so it's not logging absolutely anything
[18:21:43] jynus: ? got a graph handy?
[18:21:59] <_joe_> also we don't have a significant probe for telling kubernetes if it works
[18:22:18] akosiaris: https://grafana.wikimedia.org/d/000000445/kubernetes-pods?orgId=1&from=1579713159094&to=1579717179319&fullscreen&panelId=37
[18:22:30] akosiaris: maybe that is "normal"
[18:22:40] I am not familiar with the meta-service
[18:23:11] * _joe_ off again
[18:23:33] I see larger spikes before
[18:23:59] (PS3) Jbond: base::resolv: ensure the server domain name is always first in the search list [puppet] - https://gerrit.wikimedia.org/r/566567
[18:24:24] Operations, ops-codfw: codfw: rack/setup/install wdqs202[7-8].codfw.wmnet - https://phabricator.wikimedia.org/T242301 (Papaul) ` papaul@asw-b-codfw# show | compare [edit interfaces interface-range vlan-private1-b-codfw] member ge-1/0/16 { ... } + member ge-8/0/7; [edit interfaces interface-range...
[18:24:41] Operations, ops-codfw: codfw: rack/setup/install wdqs202[7-8].codfw.wmnet - https://phabricator.wikimedia.org/T242301 (Papaul)
[18:25:09] jynus: that's 18 and 17 Bytes of read activity
[18:25:14] note the Bytes
[18:25:19] oh
[18:25:28] (PS4) Jbond: base::resolv: ensure the server domain name is always first in the search list [puppet] - https://gerrit.wikimedia.org/r/566567
[18:25:28] I thought there were Billions lol
[18:25:33] I did too at first
[18:25:37] ignore me then
[18:25:50] 0:-)
[18:26:01] if you go back at 3hours or more you can see the megabytes for other workloads
[18:26:20] I was pretty sure it was billions of bytes at the beginning as well
[18:26:27] Operations, ops-codfw, DC-Ops, decommission: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet - https://phabricator.wikimedia.org/T243329 (Papaul)
[18:26:28] sorry
[18:26:42] no worries, it's good that you brought it up. thanks
[18:26:53] zotero memory seems to be back at normal levels
[18:26:56] Operations, ops-codfw, DC-Ops, decommission, cloud-services-team (Hardware): decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet - https://phabricator.wikimedia.org/T243319 (Papaul)
[18:27:19] akosiaris: indeed https://grafana.wikimedia.org/d/000000445/kubernetes-pods?orgId=1&from=1579716345960&to=1579717452793&fullscreen&panelId=32
[18:28:01] :(
[18:28:05] mvolz: you may want to drill into citoid logs (as zotero logs are inexistent due to being useless and causing issues) to figure out what happened here https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=1579716341745&to=1579717330571&fullscreen&panelId=41. Something caused zotero memory usage to skyrocket
[18:28:36] mvolz: hi! sorry for pinging you. Seems like someone really wanted badly to cite something
[18:28:46] not sure what yet.
[18:28:51] akosiaris: no worries at all, if I'm not available, I'm not available
[18:29:27] we've had memory leaks with zotero in the past, but I believe we thought it was resolved
[18:29:58] at the time it was related to downloading and trying to parse really big resources (i.e. pdfs) which weren't parseable.
[18:30:11] but yeah figuring out what the request was could tell us for sure
[18:31:06] akosiaris: is there a quick tutorial intro on looking at logs anywhere? the last time I looked at logs in prod was circa 2015
[18:32:12] (CR) Ayounsi: [C: +2] "Tested successfully in af-netbox (with mock data)." [software/netbox-extras] - https://gerrit.wikimedia.org/r/566361 (https://phabricator.wikimedia.org/T213843) (owner: Ayounsi)
[18:34:47] mvolz: https://logstash.wikimedia.org/goto/49b68048141bbf008eb07688ea2ecf64
[18:35:26] each entry in the table should contain a log from citoid
[18:36:00] clicking on the arrow should allow you to see more detail. You can change timeframes from the top right
[18:36:48] mvolz: there is also docs at https://wikitech.wikimedia.org/wiki/Logstash#Kibana_quick_intro
[18:37:08] and a link to a tech talk from bryan in that wikitech page
[18:37:51] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/563186 (https://phabricator.wikimedia.org/T229397) (owner: Jbond)
[18:39:06] Operations, hardware-requests: Two test hosts for SREs - https://phabricator.wikimedia.org/T214024 (RobH) a:RobH→faidon Ok, wmf5175 was ordered and can be allocated as the dual cpu spare pool system currently available in eqiad. Current proposal: Allocate these two single cpu misc hosts: [[ h...
[18:40:47] (CR) Alexandros Kosiaris: [C: -1] Add recommendation-api chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[18:42:13] * akosiaris off again. I guess I 'll be paged again if the zotero issues return. At least zotero memory usage returned to normal levels very quickly.
[18:46:05] Operations, hardware-requests: Expand Eqiad Ganeti row_A capacity - https://phabricator.wikimedia.org/T242885 (RobH)
[18:50:28] (CR) Dzahn: [C: +2] ci::httpd: add support for buster PHP 7.3 [puppet] - https://gerrit.wikimedia.org/r/566386 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[18:53:36] akosiaris: hmm - they don't seem to include the initial request like the raw logs do... anyway to get those manually?
[18:54:55] @seen hashar
[18:54:55] mutante: Last time I saw hashar they were quitting the network with reason: Quit: I am a virus. Please copy paste me in your /quit message to help me propagate N/A at 1/22/2020 2:28:55 PM (4h25m59s ago)
[18:56:22] (PS4) Dzahn: contint: use package_from_component, stop using docker class [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591)
[19:00:04] RoanKattouw, Niharika, and Urbanecm: That opportune time is upon us again. Time for a Morning SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T1900).
[19:00:05] Zoranzoki21 and RoanKattouw: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:13] I'll do the SWAT
[19:02:16] RoanKattouw did I miss the window?
[19:02:56] (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/565403 (owner: Dzahn)
[19:03:06] Operations, Citoid: Request took down both zotero and citoid (exceeding memory) - https://phabricator.wikimedia.org/T243444 (Mvolz)
[19:03:46] davidwbarratt: No you can still add stuff
[19:03:59] RoanKattouw oh ok, I put it in the wrong section, ha.
[19:04:04] Or maybe you already did and I just need to refresh the wiki page?
[19:04:13] Oh right yeah that explains
[19:05:49] (CR) CDanis: [C: +1] "LGTM https://puppet-compiler.wmflabs.org/compiler1002/20521/" [puppet] - https://gerrit.wikimedia.org/r/566567 (owner: Jbond)
[19:06:33] RoanKattouw ok, added
[19:07:54] before I randomly poke around prod server, does anyone know where the raw logs for services live? :D
[19:09:20] RoanKattouw would you like the patch cherry-picked onto the current release branches?
[19:09:41] davidwbarratt: yes please
[19:10:28] RoanKattouw kk, will do. sorry it's Niharika's fault. :P
[19:10:59] Haha. I've been thrown under the bus!
[19:11:07] Or under the train?
[19:11:20] Hahaha.
[19:11:49] Operations, Security-Team, Security: Convert security@ to a google collaboration group - https://phabricator.wikimedia.org/T243446 (chasemp) p:Triage→Normal
[19:12:54] davidwbarratt: In all seriousness (Niharika knows this): we deployers can make the cherry-picks ourselves if needed, but we prefer that the requestors do it because 1) conflicts are found ahead of time, 2) they know the repo & how to deal with conflicts
[19:13:11] makes sense. :)
[19:17:00] Operations, DC-Ops, hardware-requests: eqiad: three clouvirt-wdqs servers for WDQS testing - https://phabricator.wikimedia.org/T232654 (RobH) Open→Resolved fulfilled by T235685, resolving task off @hw-request workboard.
[19:18:13] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/20522/contint1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/566387 (owner: Dzahn)
[19:19:26] davidwbarratt: I've +2ed your wmf.15 cherry-pick, but it looks like you submitted the wmf.16 one against master instead
[19:19:50] Oh never mind there's the wmf.16 version
[19:20:04] done: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T1900
[19:20:17] Thanks, I've +2ed both
[19:20:23] oh ha, yeah sorry I was moving them 'round
[19:20:32] I was stalking you through a Gerrit search for owner:dbarratt so I +2ed them as soon as I saw them there :
[19:20:33] :)
[19:21:02] idk if you tried this / know about this, but in case you didn't: Gerrit also has a "cherry pick" button that auto-generates cherry-picks (only works if there aren't any conflicts)
[19:21:29] Zoranzoki21: Are you around for your SWAT patches?
[19:21:53] (PS2) Catrope: Enable UnderstandingFirstDay on ukwiki, huwiki, hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565439 (https://phabricator.wikimedia.org/T238294)
[19:21:57] RoanKattouw I did not know that! thanks!
[19:22:31] (CR) Catrope: [C: +2] Enable UnderstandingFirstDay on ukwiki, huwiki, hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565439 (https://phabricator.wikimedia.org/T238294) (owner: Catrope)
[19:23:25] (Merged) jenkins-bot: Enable UnderstandingFirstDay on ukwiki, huwiki, hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565439 (https://phabricator.wikimedia.org/T238294) (owner: Catrope)
[19:23:36] legoktm: it's now possible to compile puppet changes in codesearch before merge. i synced the missing facts the other day. example: https://puppet-compiler.wmflabs.org/compiler1001/20524/codesearch6.codesearch.eqiad.wmflabs/
[19:24:50] legoktm: if you want to use it in the future. go to https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build?delay=0sec (assuming you can login there) and enter Gerrit change number and FQDN of the cloud instance. then "console output" and at the bottom it links to the result
[19:26:20] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable UnderstandingFirstDay on ukwiki, huwiki, hywiki (T238294) (duration: 01m 06s)
[19:26:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:24] T238294: Deploy EditorJourney to Ukrainian, Hungarian, Armenian Wikipedias - https://phabricator.wikimedia.org/T238294
[19:26:31] (CR) Dzahn: [C: +2] "changes in codesearch6 are now compilable, i synced puppet facts the other day:" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[19:28:13] davidwbarratt: uhhh, you cherry-picked those commits to 1.34.0-wmf.{15,16} instead of 1.35, oops
[19:28:18] I'll cherry-pick them to the right ones
[19:28:35] (PS1) Papaul: DNS: Add mgmt and production DNS for elastic205[5-9] and elastic2060 [dns] - https://gerrit.wikimedia.org/r/566576
[19:28:52] jouncebot: now
[19:28:52] For the next 0 hour(s) and 31 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T1900)
[19:28:56] (CR) jerkins-bot: [V: -1] DNS: Add mgmt and production DNS for elastic205[5-9] and elastic2060 [dns] - https://gerrit.wikimedia.org/r/566576 (owner: Papaul)
[19:29:23] They didn't show up when I git pulled, but they were merged, it took me a minute to figure out why that was
[19:29:54] RoanKattouw GAY!
[19:30:02] GAH!
[19:30:03] lol
[19:30:04] ha I can't even get that right
[19:30:07] damnit
[19:30:26] anyone familiar with MachineVision?
[19:30:52] RoanKattouw thanks!
[19:31:22] RoanKattouw yeah I fetched with git and got the 1.34 versions and I assumed that's what we were on, I should have looked closer at https://tools.wmflabs.org/versions/
[19:33:32] (PS1) ArielGlenn: write out and reuse pagerage info for big page content jobs [dumps] - https://gerrit.wikimedia.org/r/566580 (https://phabricator.wikimedia.org/T243434)
[19:33:48] RoanKattouw ooo! I cherry-picked them correctly, but moved them in gerrit to the wrong branch
[19:33:54] Oh hah
[19:34:09] And apparently that repo is inactive enough that it just merged cleanly anyway, somehow?
[19:34:11] hauskatze: I'd ping mdholloway or annet
[19:34:45] hauskatze: /me waves
[19:35:08] thanks Urbanecm
[19:35:41] annet: hi, would you be able to take https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MachineVision/+/566569/ a look ?
[19:36:01] (CR) Jforrester: "I assume we don't need to pin the docker versions any more and can rely on the registry giving us the right one at install time?" [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[19:36:14] hauskatze: yep, on it!
[19:36:24] (PS1) Reedy: maintain-replicas is no more, it's maintain-views now [puppet] - https://gerrit.wikimedia.org/r/566581
[19:36:32] annet: awesome, thanks. I'm going to make dinner in the meanwhile
[19:36:33] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.16/extensions/WikimediaEvents/: InukaPageView: update schema version (T238029) (duration: 01m 07s)
[19:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:36:36] T238029: Code for InukaPageView instrumentation - https://phabricator.wikimedia.org/T238029
[19:36:37] RoanKattouw right?! I would have expected a conflict, but seems to be fine
[19:36:42] if you see smoke, it's *not* me
[19:36:51] I'm an outstanding chef ;)
[19:36:59] … it's the Pope?
[19:37:18] Bring back Benedictus XVI please
[19:37:32] Why, what do you want to do to Him?
[19:37:40] Latin Mass
[19:38:36] James_F: in the meanwhile https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/565745/ ?? :) :)
[19:38:51] please Sir?
[19:38:57] hauskatze: I'm getting to it, I'm getting to it.
[19:39:00] Oy veh.
[19:39:11] no rush
[19:39:18] bbl
[19:39:20] Also, I'm no Sir, I'm a common-as-muck pleb born into bastardy. :-)
[19:39:33] davidwbarratt: OK your changes are now on mwdebug1001, please test
[19:39:40] Lord Forrester sounds about right James_F
[19:39:44] RoanKattouw you got it!
[19:41:19] RoanKattouw it's perfect!
[19:43:25] hauskatze, annet: While I'm at it, shall I cherry-pick and deploy that commit that adds the missing messages?
[19:43:31] Oh hauskatze left
[19:43:36] !log restart tilerator / kartotherian on maps* servers
[19:43:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:45] (Abandoned) Papaul: DNS: Add mgmt and production DNS for elastic205[5-9] and elastic2060 [dns] - https://gerrit.wikimedia.org/r/566576 (owner: Papaul)
[19:45:28] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.15/extensions/WikimediaMessages/: Remove temporary partial block banner (T240300) (duration: 01m 10s)
[19:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:31] T240300: Introduce a temporary banner on Special:Block to inform users about upcoming partial blocks deploy - https://phabricator.wikimedia.org/T240300
[19:46:35] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.16/extensions/WikimediaMessages/: Remove temporary partial block banner (T240300) (duration: 01m 06s)
[19:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:45] RoanKattouw: the issue is that at https://commons.wikimedia.org/wiki/Special:ListGroupRights, the group is called machinevision-tester which is not pretty. I don't think that warrants a full scap. Your call through
[19:47:50] RoanKattouw thanks for your help!
[19:48:06] Urbanecm: I already need to do a full scap anyway, so it can come along for the ride
[19:48:07] I'll just do it
[19:48:21] then it's probably good to do this too
[20:00:04] brennen and twentyafterfour: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T2000).
[20:00:23] twentyafterfour: SWAT is delayed due to Jenkins, and I need to do a full scap as well. Since you're probably just doing a wikiversions bump, do you want to do that now-ish and then have me go back to SWATting?
[20:00:58] RoanKattouw: i'm handling train this week (mainly), but that sounds reasonable.
[20:01:13] OK cool
[20:01:20] I'll stand back and let you deploy the train then
[20:01:46] k, bumping momentarily
[20:02:38] (PS1) Brennen Bearnes: group1 wikis to 1.35.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/566587
[20:02:41] (CR) Brennen Bearnes: [C: +2] group1 wikis to 1.35.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/566587 (owner: Brennen Bearnes)
[20:03:10] (PS1) Dzahn: ci::docker: add same docker version also for buster [puppet] - https://gerrit.wikimedia.org/r/566588 (https://phabricator.wikimedia.org/T224591)
[20:03:36] (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/566587 (owner: Brennen Bearnes)
[20:04:12] (CR) Dzahn: [C: +2] "https://gerrit.wikimedia.org/r/c/operations/puppet/+/566383 will be better but this is the quick fix to see what issue is next if any" [puppet] - https://gerrit.wikimedia.org/r/566588 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[20:06:20] !log brennen@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.16
[20:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:26] !log brennen@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.16 (duration: 01m 05s)
[20:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:37] RoanKattouw: looking pretty quiet...
[20:16:23] Operations: Audit & update spares part tracking for all sites - https://phabricator.wikimedia.org/T243450 (Aklapper)
[20:16:41] RoanKattouw: over to you i'd say.
[20:17:27] brennen: Thanks! One more Jenkins run....
[20:19:13] (CR) Cwhite: [C: +1] "LGTM assuming traffic team has replacement grafana dashboards in place." [puppet] - https://gerrit.wikimedia.org/r/566565 (owner: Vgutierrez)
[20:35:13] (PS1) Cmjohnson: Adding mac address to dhcp file mc-gp100[1-3] [puppet] - https://gerrit.wikimedia.org/r/566594 (https://phabricator.wikimedia.org/T241795)
[20:36:15] (CR) jerkins-bot: [V: -1] Adding mac address to dhcp file mc-gp100[1-3] [puppet] - https://gerrit.wikimedia.org/r/566594 (https://phabricator.wikimedia.org/T241795) (owner: Cmjohnson)
[20:38:13] (PS2) Cmjohnson: Adding mac address to dhcp file mc-gp100[1-3] [puppet] - https://gerrit.wikimedia.org/r/566594 (https://phabricator.wikimedia.org/T241795)
[20:39:23] (CR) Jforrester: [C: +1] contint: use package_from_component, stop using docker class [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[20:39:34] (CR) Cmjohnson: [C: +2] Adding mac address to dhcp file mc-gp100[1-3] [puppet] - https://gerrit.wikimedia.org/r/566594 (https://phabricator.wikimedia.org/T241795) (owner: Cmjohnson)
[20:39:42] (CR) ArielGlenn: "In each module is fine, people do and should copy paste, that's all." [puppet] - https://gerrit.wikimedia.org/r/565403 (owner: Dzahn)
[20:40:47] Operations, ops-eqiad, serviceops, Patch-For-Review: (Need By: Jan 10) rack/setup/install mc-gp100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T241795 (Cmjohnson)
[20:41:55] hrm, just realized this still needs to go out: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CodeReview/+/566385
[20:47:08] (PS1) Jgreen: adjust nsca_frack.cfg to monitor frlog2001, replacing bellatrix [puppet] - https://gerrit.wikimedia.org/r/566597
[20:48:20] RoanKattouw: Thanks for backporting my patches to the current wmf branch :)
[20:48:46] Still waiting for Jenkins to finish on the backport though :/
[20:48:50] I thought deploy people didn't liked to do it because that means we need to do a full l10n update ?
[20:49:06] (PS2) Jgreen: adjust nsca_frack.cfg to monitor frlog2001, replacing bellatrix [puppet] - https://gerrit.wikimedia.org/r/566597 (https://phabricator.wikimedia.org/T242265)
[20:49:10] Yeah, I see the MV backport is still being tested. Lots of dependencies
[20:50:40] (CR) Jgreen: [C: +2] adjust nsca_frack.cfg to monitor frlog2001, replacing bellatrix [puppet] - https://gerrit.wikimedia.org/r/566597 (https://phabricator.wikimedia.org/T242265) (owner: Jgreen)
[20:51:51] ugh
[20:51:52] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MachineVision/+/566590/
[20:51:57] stuck in ready to submit?
[20:52:01] c'mon jenkins
[20:56:05] RoanKattouw: looks it got stuck?
[20:57:00] James_F: upon renaming a MW core namespace, do we need to run namespacesDupes?
[20:59:28] (CR) Thcipriani: contint: use package_from_component, stop using docker class (1 comment) [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[20:59:41] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Jgreen)
[21:00:04] cscott, arlolra, subbu, halfak, and accraze: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200122T2100).
[21:00:10] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: rack/setup/install frlog2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T242265 (Jgreen) Open→Resolved
[21:06:27] arlolra: i'm here
[21:07:31] ok, I'll get started
[21:10:17] !log arlolra@deploy1001 Started deploy [parsoid/deploy@e8610ff]: Updating Parsoid to 7390988
[21:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:16:27] Operations, ops-eqiad, DC-Ops: cloudclastic1006 malformed asset tag - report error - https://phabricator.wikimedia.org/T243433 (Jclark-ctr) Open→Resolved updated netbox with correct asset tag
[21:18:45] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@e8610ff]: Updating Parsoid to 7390988 (duration: 08m 28s)
[21:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:34] !log Updated Parsoid to 7390988 (T242513, T243008, T241146)
[21:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:47] T241146: Text w/o a context crasher in Parsoid/PHP LanguageConverter - https://phabricator.wikimedia.org/T241146
[21:28:47] T242513: VE: tag should be followed by a carriage return when inserted (as block element format) - https://phabricator.wikimedia.org/T242513
[21:28:49] T243008: PHP Notice: Undefined variable: headers - https://phabricator.wikimedia.org/T243008
[21:32:54] !log catrope@deploy1001 Started scap: i18n changes for SWAT: Special page aliases for GrowthExperiments (T230676); messages for machinevision-tester group (T243440); fix namespace names for atj (T243125)
[21:32:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:00] T230676: Deploy Growth experiments at Armenian Wikipedia - https://phabricator.wikimedia.org/T230676
[21:33:00] T243125: Name space "Template" is misspelled on atj.wp - https://phabricator.wikimedia.org/T243125
[21:33:00] T243440: `machinevision-tester` lacks i18n keys - https://phabricator.wikimedia.org/T243440
[21:33:58] RoanKattouw: mind giving me a heads up when you're finished? i should get a patch out for T243337.
[21:33:59] T243337: Argument 3 passed to CodeRevisionView::__construct() must be an instance of User, string given, called in /srv/mediawiki/php-1.35.0-wmf.16/extensions/CodeReview/includes/ui/SpecialCode.php on line 134 - https://phabricator.wikimedia.org/T243337
[21:34:10] brennen: Will do
[21:34:20] ta.
[21:57:36] (CR) Addshore: [C: +1] Enable WikibaseQualityConstrains on beta Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[22:00:05] mutante: woohoo! I have a bit of final testing to do, but I think everything should be fixed and working now :D
[22:03:33] brennen: scap sync is still running, but I've +2ed the cherry-pick for that bug and Jenkins has merged it. I'll sync it once my scap finishes
[22:04:01] RoanKattouw: cool, thx
[22:13:08] (PS1) Cmjohnson: Adding mac addresses and partman for ganeti1019-1022 [puppet] - https://gerrit.wikimedia.org/r/566606 (https://phabricator.wikimedia.org/T228926)
[22:13:30] (Abandoned) Alex Monk: role::puppetmaster::standalone: Add support for multiple PuppetDB hosts [puppet] - https://gerrit.wikimedia.org/r/566380 (https://phabricator.wikimedia.org/T243226) (owner: Alex Monk)
[22:13:42] !log catrope@deploy1001 Finished scap: i18n changes for SWAT: Special page aliases for GrowthExperiments (T230676); messages for machinevision-tester group (T243440); fix namespace names for atj (T243125) (duration: 40m 48s)
[22:13:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:48] T230676: Deploy Growth experiments at Armenian Wikipedia - https://phabricator.wikimedia.org/T230676
[22:13:48] T243125: Name space "Template" is misspelled on atj.wp - https://phabricator.wikimedia.org/T243125
[22:13:49] T243440: `machinevision-tester` lacks i18n keys - https://phabricator.wikimedia.org/T243440
[22:15:07] (CR) Cmjohnson: [C: +2] Adding mac addresses and partman for ganeti1019-1022 [puppet] - https://gerrit.wikimedia.org/r/566606 (https://phabricator.wikimedia.org/T228926) (owner: Cmjohnson)
[22:19:55] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.16/extensions/CodeReview/: T243337 (duration: 01m 06s)
[22:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:19:59] T243337: Argument 3 passed to CodeRevisionView::__construct() must be an instance of User, string given, called in /srv/mediawiki/php-1.35.0-wmf.16/extensions/CodeReview/includes/ui/SpecialCode.php on line 134 - https://phabricator.wikimedia.org/T243337
[22:20:33] brennen: Done
[22:21:11] (CR) Catrope: [C: +2] GrowthExperiments: Enable help panel on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565442 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:21:16] (PS2) Catrope: GrowthExperiments: Enable help panel on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565442 (https://phabricator.wikimedia.org/T238319)
[22:21:21] (CR) Catrope: [C: +2] GrowthExperiments: Enable help panel on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565442 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:21:44] (PS3) Catrope: GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319)
[22:22:22] (Merged) jenkins-bot: GrowthExperiments: Enable help panel on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565442 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:22:38] (CR) jerkins-bot: [V: -1] GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:23:04] (PS4) Catrope: GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319)
[22:23:20] RoanKattouw: thx.
[22:23:42] (CR) Catrope: [C: +2] GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:24:38] (Merged) jenkins-bot: GrowthExperiments: Enable help panel on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565443 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:25:56] (PS3) Catrope: GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319)
[22:26:34] (PS4) Catrope: GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319)
[22:26:44] (CR) Catrope: [C: +2] GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:27:37] (Merged) jenkins-bot: GrowthExperiments: Enable help panel on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565447 (https://phabricator.wikimedia.org/T238319) (owner: Catrope)
[22:30:56] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable help panel on ukwiki, huwiki, hywiki (T238319, T231720, T230478, T230676) (duration: 01m 04s)
[22:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:02] T230676: Deploy Growth experiments at Armenian Wikipedia - https://phabricator.wikimedia.org/T230676
[22:31:02] T231720: Deploy Growth experiments at Ukrainian Wikipedia - https://phabricator.wikimedia.org/T231720
[22:31:02] T238319: Deploy Help Panel to Ukrainian, Hungarian, Armenian Wikipedias - https://phabricator.wikimedia.org/T238319
[22:31:03] T230478: Get the Growth experiment for the Hungarian Wikipedia - https://phabricator.wikimedia.org/T230478
[22:34:00] (PS2) Catrope: GrowthExperiments: Enable homepage on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565448 (https://phabricator.wikimedia.org/T238320)
[22:35:34] (PS3) Catrope: GrowthExperiments: Enable homepage on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565449 (https://phabricator.wikimedia.org/T238320)
[22:35:38] (CR) Catrope: [C: +2] GrowthExperiments: Enable homepage on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565448 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:35:47] (CR) Catrope: [C: +2] GrowthExperiments: Enable homepage on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565449 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:36:01] (PS3) Catrope: GrowthExperiments: Enable homepage on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565450 (https://phabricator.wikimedia.org/T238320)
[22:36:06] (CR) Catrope: [C: +2] GrowthExperiments: Enable homepage on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565450 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:36:41] (Merged) jenkins-bot: GrowthExperiments: Enable homepage on ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565448 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:36:46] (Merged) jenkins-bot: GrowthExperiments: Enable homepage on huwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565449 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:37:19] (Merged) jenkins-bot: GrowthExperiments: Enable homepage on hywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/565450 (https://phabricator.wikimedia.org/T238320) (owner: Catrope)
[22:39:12] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable homepage on ukwiki, huwiki, hywiki (T238320, T231720, T230478, T230676) (duration: 01m 05s)
[22:39:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:18] T230478: Get the Growth experiment for the Hungarian Wikipedia - https://phabricator.wikimedia.org/T230478
[22:39:18] T230676: Deploy Growth experiments at Armenian Wikipedia - https://phabricator.wikimedia.org/T230676
[22:39:18] T231720: Deploy Growth experiments at Ukrainian Wikipedia - https://phabricator.wikimedia.org/T231720
[22:39:18] T238320: Deploy Newcomer Homepage to Ukrainian, Hungarian, Armenian Wikipedias - https://phabricator.wikimedia.org/T238320
[22:45:13] (PS3) Bmansurov: Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230)
[22:45:48] (CR) jerkins-bot: [V: -1] Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[22:53:51] (PS4) Bmansurov: Add recommendation-api chart [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230)
[23:00:46] (CR) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[23:01:24] (PS3) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939)
[23:02:43] (CR) jerkins-bot: [V: -1] Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[23:03:46] (PS4) Matthias Mullie: Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939)
[23:06:10] !log configure flowspec on cr3-knams
[23:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:02] (CR) Matthias Mullie: [C: +2] Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[23:22:51] (Merged) jenkins-bot: Enable WikibaseQualityConstrains on beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/566546 (https://phabricator.wikimedia.org/T239939) (owner: Matthias Mullie)
[23:27:55] (CR) BryanDavis: [WIP] toolforge: Port portgrabber related code to Python 3 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/566491 (https://phabricator.wikimedia.org/T218427) (owner: Legoktm)
[23:31:41] T236104 happened again, and this time I'm leaving it broken so I can investigate. Please don't use do any MW deployments (use scap) for now
[23:31:51] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104
[23:46:29] !log T236104 happened again, and this time I'm leaving it broken so I can investigate. Please don't use do any MW deployments (use scap) for now
[23:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:34] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104
[23:49:22] (PS1) Papaul: DNS: Add mgmt DNS for elastic205[5-9],elastic2060 [dns] - https://gerrit.wikimedia.org/r/566623