[00:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T0000).
[00:00:05] ebernhardson, davidwbarratt, and tgr: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:01:38] Operations, Mail: Add security-team@wikimedia.org as recipient of any abuse@ emails - https://phabricator.wikimedia.org/T242049 (Dzahn) I added security-team@wikimedia.org to abuse@ for all of the above.
[00:01:51] here!
[00:02:11] Operations, SRE-Access-Requests: Requesting access to stat1004, stat1007, stat1006, notebook1003, notebook1004 for Kate Zimmerman - https://phabricator.wikimedia.org/T240732 (kzimmerman) @jcrespo Thank you! I missed your ping earlier; I'll reopen if I end up running into issues.
[00:06:25] Operations, ops-codfw: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (Papaul)
[00:08:13] o/
[00:08:20] Operations: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (RobH) a:RobH→MoritzMuehlenhoff
[00:08:30] Operations: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (RobH) This is now ready for service implementation.
[00:11:49] Operations, Mail: Add security-team@wikimedia.org as recipient of any abuse@ emails - https://phabricator.wikimedia.org/T242049 (Dzahn) Open→Resolved a:Dzahn This should be it.. unless you are asking to add it even for things that never had working abuse@ in the past and are at best https red...
[00:16:00] I can swat
[00:16:06] YAY! :)
[00:17:35] Operations, ops-codfw: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (Papaul)
[00:17:48] ebernhardson: around for SWAT?
[00:18:19] (PS3) Gergő Tisza: Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309 (owner: EBernhardson)
[00:20:54] (CR) Gergő Tisza: [C: +2] Enable Partial Blocks on every wiki excluding those that have opted-out [mediawiki-config] - https://gerrit.wikimedia.org/r/562359 (https://phabricator.wikimedia.org/T218626) (owner: Dbarratt)
[00:21:56] (Merged) jenkins-bot: Enable Partial Blocks on every wiki excluding those that have opted-out [mediawiki-config] - https://gerrit.wikimedia.org/r/562359 (https://phabricator.wikimedia.org/T218626) (owner: Dbarratt)
[00:23:53] davidwbarratt: you can test on mwdebug1001
[00:24:08] tgr thanks!
[00:25:08] looks perfect!
[00:26:20] Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul)
[00:26:57] (PS2) Gergő Tisza: GrowthExperiments: use local search in production [mediawiki-config] - https://gerrit.wikimedia.org/r/561927 (https://phabricator.wikimedia.org/T235717)
[00:27:33] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:562359|Enable Partial Blocks on every wiki excluding those that have opted-out (T218626)]] (duration: 00m 55s)
[00:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:27:36] T218626: [Epic] Partial block rollout - https://phabricator.wikimedia.org/T218626
[00:28:30] (CR) Gergő Tisza: [C: +2] GrowthExperiments: use local search in production [mediawiki-config] - https://gerrit.wikimedia.org/r/561927 (https://phabricator.wikimedia.org/T235717) (owner: Gergő Tisza)
[00:29:29] (Merged) jenkins-bot: GrowthExperiments: use local search in production [mediawiki-config] - https://gerrit.wikimedia.org/r/561927 (https://phabricator.wikimedia.org/T235717) (owner: Gergő Tisza)
[00:31:39] (CR) Dzahn: [C: +2] Add DNS entries for urldownloader* [dns] - https://gerrit.wikimedia.org/r/562283 (https://phabricator.wikimedia.org/T224551) (owner: Muehlenhoff)
[00:31:44] (PS4) Dzahn: Add DNS entries for urldownloader* [dns] - https://gerrit.wikimedia.org/r/562283 (https://phabricator.wikimedia.org/T224551) (owner: Muehlenhoff)
[00:33:22] tgr FYI: https://phabricator.wikimedia.org/T218626#5780200
[00:34:00] I wasn't sure if this was the channel to report that in - the partial block interface is only showing up intermittently on enwikiversity
[00:34:35] we seem to get that with config changes nowadays
[00:34:47] let me see if I can dig up the task with the workaround
[00:34:58] (CR) Dzahn: "[authdns1001:~] $ host urldownloader1001" [dns] - https://gerrit.wikimedia.org/r/562283 (https://phabricator.wikimedia.org/T224551) (owner: Muehlenhoff)
[00:35:24] Operations, Patch-For-Review: Migrate URL downloaders to Stretch/Buster - https://phabricator.wikimedia.org/T224551 (Dzahn) ` [authdns1001:~] $ host urldownloader1001 urldownloader1001.wikimedia.org has address 208.80.154.29 urldownloader1001.wikimedia.org has IPv6 address 2620:0:861:1:208:80:154:29 [aut...
[00:38:36] Operations, vm-requests: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979 (Dzahn) added to https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Servers
[00:39:21] tgr: i totally spaced on the deploy time...i can deploy if you're done
[00:40:02] davidwbarratt: T236104 is the one I think
[00:40:02] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104
[00:40:09] Operations, Patch-For-Review: Migrate URL downloaders to Stretch/Buster - https://phabricator.wikimedia.org/T224551 (Dzahn) Added urldownloader server name to https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Servers and linked to https://wikitech.wikimedia.org/wiki/Url-downloader. W...
[00:40:18] the next sync touching InitializeSettings should fix it
[00:42:29] tgr sheesh that's weird
[00:44:22] I don't know if this helps, but when the page is correct, the JS is still wrong. :/
[00:45:31] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:561927|GrowthExperiments: use local search in production (T235717)]] (duration: 00m 58s)
[00:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:35] T235717: Newcomer tasks: non-HTTP-based ConfigurationLoader and TaskSuggester - https://phabricator.wikimedia.org/T235717
[00:46:40] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:561927|GrowthExperiments: use local search in production (T235717)]] (duration: 00m 54s)
[00:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:45] davidwbarratt: fixed now?
[00:47:58] (PS1) Dzahn: site: add new URL-downloaders as spare systems [puppet] - https://gerrit.wikimedia.org/r/562392 (https://phabricator.wikimedia.org/T241979)
[00:48:32] (PS6) DannyS712: InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178)
[00:49:20] tgr yep! I had to clear all the site data from my browser, but it looks good now on each refresh. :)
[00:49:30] ebernhardson is it too late to add https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/554392/ to this swat?
[00:50:08] over to you ebernhardson; note you'll have to sync twice due to T236104
[00:50:09] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104
[00:50:21] tgr thank you so much!
[00:50:28] tgr: kk
[00:50:42] (PS4) EBernhardson: Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309
[00:50:44] (CR) EBernhardson: [C: +2] Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309 (owner: EBernhardson)
[00:51:45] DannyS712: we can probably ship it
[00:52:14] (CR) EBernhardson: [C: +2] Revert "Reduce query load on cirrus elastic clusters" [mediawiki-config] - https://gerrit.wikimedia.org/r/562308 (owner: EBernhardson)
[00:52:34] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm
[00:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:53:06] (Merged) jenkins-bot: Revert "Reduce query load on cirrus elastic clusters" [mediawiki-config] - https://gerrit.wikimedia.org/r/562308 (owner: EBernhardson)
[00:53:31] (PS7) DannyS712: InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178)
[00:56:30] !log ebernhardson@deploy1001 sync-file aborted: Revery (duration: 00m 00s)
[00:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:30] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[00:57:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:34] !log ebernhardson@deploy1001 Synchronized wmf-config/CirrusSearch-common.php: Revert "reduce query load on cirrus elastic clusters" (duration: 00m 54s)
[00:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:49] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm
[00:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:53] (PS2) Dzahn: site: add new URL-downloaders as spare systems [puppet] - https://gerrit.wikimedia.org/r/562392 (https://phabricator.wikimedia.org/T241979)
[00:58:47] !log ganeti - creating urldownloader1001.wikimedia.org in eqiad_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
[00:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:58:49] T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979
[00:59:11] (PS5) EBernhardson: Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309
[00:59:16] !log ganeti - creating urldownloader1002.wikimedia.org in eqiad_C with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
[00:59:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:25] (CR) EBernhardson: [C: +2] Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309 (owner: EBernhardson)
[01:00:24] (Merged) jenkins-bot: Revert "cirrus: Shift more_like to codfw cirrus cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/562309 (owner: EBernhardson)
[01:02:46] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[01:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:03:05] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert: "cirrus: Shift more_like to codfw cirrus cluster" (duration: 00m 54s)
[01:03:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:03:43] (CR) EBernhardson: [C: +2] InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[01:03:49] (PS8) EBernhardson: InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[01:03:58] (CR) EBernhardson: [C: +2] InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[01:04:29] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm
[01:04:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:52] !log ganeti - creating urldownloader2001.wikimedia.org in codfw_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
[01:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:54] T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979
[01:04:56] (Merged) jenkins-bot: InitialiseSettings - clean up groupOverrides layout / spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[01:09:23] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[01:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:31] DannyS712: pulled to mwdebug1002, although this looks like it should be a no-op
[01:11:55] Operations, ops-codfw: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (Papaul) ` papaul@asw-a-codfw> show interfaces xe-4/0/21 descriptions Interface Admin Link Description xe-4/0/21 up up mc-gp2001 ` ` papaul@asw-b-cod...
[01:12:02] indeed, it ways meant as a cleanup no-op before T239771 so that there would be a standard format for user rights
[01:12:02] T239771: Review and consolidate when user group rights are assigned - https://phabricator.wikimedia.org/T239771
[01:12:23] !log dzahn@cumin1001 START - Cookbook sre.ganeti.makevm
[01:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:34] Operations, ops-codfw: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T239249 (Papaul)
[01:12:37] !log ganeti - creating urldownloader2002.wikimedia.org in codfw_B with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
[01:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:12:40] T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979
[01:14:28] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (duration: 00m 54s)
[01:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:04] DannyS712: synced, syncing one last time for T236104, and should be all done
[01:15:06] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104
[01:15:30] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (sync again) (duration: 00m 53s)
[01:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:47] should be all set
[01:17:19] !log dzahn@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[01:17:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:19:27] Operations, vm-requests, Patch-For-Review: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979 (Dzahn) 2020-01-07 00:57:29,953 [INFO] instance urldownloader1001.wikimedia.org created with MAC `aa:00:00:d2:7d:c6` 2020-01-07 01:02:46,491 [INFO] instance urldownload...
[01:23:05] (PS1) Dzahn: install_server: add new urldownloaders to DHCP [puppet] - https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979)
[01:26:05] (CR) Dzahn: [C: +2] site: add new URL-downloaders as spare systems [puppet] - https://gerrit.wikimedia.org/r/562392 (https://phabricator.wikimedia.org/T241979) (owner: Dzahn)
[01:27:19] (CR) Dzahn: "so buster or stretch?" [puppet] - https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979) (owner: Dzahn)
[01:29:38] Operations, WMF-Legal, serviceops, Patch-For-Review: Move old transparency report pages to historical URLs and setup redirect - https://phabricator.wikimedia.org/T230638 (JbuattiWMF) Hello friends, and happy new year. Legal was hoping to check back in on #1 and #2 (but #1 in particular). Thanks s...
[01:29:39] Operations: Migrate URL downloaders to Stretch/Buster - https://phabricator.wikimedia.org/T224551 (Dzahn) 4 VMs have been created and the MAC addresses are on https://gerrit.wikimedia.org/r/c/operations/puppet/+/562394 but OS has not been installed yet. Buster or stretch?
[01:45:57] Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul) ` papaul@asw-a-codfw# run show interfaces xe-4/0/20 descriptions Interface Admin Link Description xe-4/0/20 up up...
[01:46:41] Operations, ops-codfw, Wikimedia-Logstash: (No Need By Date Provided) rack/setup/install logstash202[6-9].codfw.wmnet - https://phabricator.wikimedia.org/T240882 (Papaul)
[01:56:35] (PS1) DannyS712: Consolidate user rights assignments, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/562396 (https://phabricator.wikimedia.org/T239771)
[02:03:06] (PS2) DannyS712: Consolidate user rights assignments, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/562396 (https://phabricator.wikimedia.org/T239771)
[02:16:06] Operations, Mail: Add security-team@wikimedia.org as recipient of any abuse@ emails - https://phabricator.wikimedia.org/T242049 (Dsharpe) @Dzahn - Perfect! Thank you!
[02:30:04] (CR) Legoktm: [WIP] Add html webservice type (3 comments) [software/tools-webservice] - https://gerrit.wikimedia.org/r/561753 (https://phabricator.wikimedia.org/T241817) (owner: Legoktm)
[02:44:36] PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1129.03 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:50:57] (CR) BryanDavis: [C: -1] [WIP] Add html webservice type (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/561753 (https://phabricator.wikimedia.org/T241817) (owner: Legoktm)
[03:06:52] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 36 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:12:42] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 35 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:40:46] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 37 probes of 505 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:46:34] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 34 probes of 505 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[04:03:42] RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.28 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[04:43:08] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is CRITICAL: 1.001e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[04:50:18] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on icinga1001 is OK: (C)1e+05 gt (W)1e+04 gt 9687 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[05:10:42] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 89 jobs https://wikitech.wikimedia.org/wiki/Backups%23Monitoring
[05:23:18] (CR) Ayounsi: fastnetmon: add UDP/ICMP bw limits, greatly increase pps limits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/562387 (owner: CDanis)
[05:40:38] (PS1) CRusnov: rotatedump: Enhance to retain period copies [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512)
[05:45:34] (CR) CRusnov: "I have tested this on a scratch copy of the current mess of backup dumps and the output is listed here:" [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: CRusnov)
[05:49:06] PROBLEM - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2020-01-03 05:25:37 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[05:53:19] (CR) Ayounsi: "LGTM but not tested." [puppet] - https://gerrit.wikimedia.org/r/562385 (owner: CDanis)
[06:15:13] (PS4) Giuseppe Lavagetto: Add parsoid-php to the discovery records to switchover [cookbooks] - https://gerrit.wikimedia.org/r/545167
[06:15:19] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Add parsoid-php to the discovery records to switchover [cookbooks] - https://gerrit.wikimedia.org/r/545167 (owner: Giuseppe Lavagetto)
[06:31:12] (PS5) Giuseppe Lavagetto: Add a registry reporter [docker-images/docker-report] - https://gerrit.wikimedia.org/r/559933
[06:34:16] (CR) Giuseppe Lavagetto: [C: +2] Add class to scan a registry for images (1 comment) [docker-images/docker-report] - https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) (owner: Giuseppe Lavagetto)
[06:47:20] Operations, ops-eqiad, DBA: Degraded RAID on db1100 - https://phabricator.wikimedia.org/T241506 (Marostegui) Any ETA on when the new disk will be ordered? I wouldn't like to leave the primary database master for s5 with a broken disk for long. If another disk on the same span fails, the master will g...
[06:48:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10048 and previous config saved to /var/cache/conftool/dbconfig/20200107-064846-marostegui.json
[06:48:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:51] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[06:53:06] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui) >>! In T238305#5774514, @wiki_willy wrote: > In going through all the affected systems in this task, I'd like to treat db2125 and backup2001 separately, since they seem like one-offs an...
[06:56:18] Operations, ops-codfw, DBA: Upgrade BIOS and firmware on db2084 - https://phabricator.wikimedia.org/T241103 (Marostegui) Open→Resolved a:Marostegui→Papaul >>! In T241103#5774374, @jcrespo wrote: > I have started mysql instances back again, and replication, as on codfw there is low loa...
[06:56:20] Operations, DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[06:56:47] Operations, DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[06:59:22] (PS1) Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/562424
[07:03:03] !log Deploy schema change on s8 codfw (this will generate lag on s8 codfw) - T234052
[07:03:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:19] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052
[07:05:04] (CR) Marostegui: [C: +2] dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/562424 (owner: Marostegui)
[07:05:57] !log Depool labsdb1011
[07:05:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10049 and previous config saved to /var/cache/conftool/dbconfig/20200107-070850-marostegui.json
[07:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:55] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[07:09:15] (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/562425
[07:10:41] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[07:11:32] PROBLEM - haproxy failover on dbproxy1018 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[07:11:59] (CR) Marostegui: [C: +2] Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/562425 (owner: Marostegui)
[07:12:59] !log Remove partitions from revision table on s6: db1098 T239453
[07:13:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:16] RECOVERY - haproxy failover on dbproxy1018 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[07:15:05] (PS1) Ladsgroup: labs: Set $wmgUseEntitySourceBasedFederation to true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/562427 (https://phabricator.wikimedia.org/T241974)
[07:15:34] (CR) Ladsgroup: [C: +2] labs: Set $wmgUseEntitySourceBasedFederation to true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/562427 (https://phabricator.wikimedia.org/T241974) (owner: Ladsgroup)
[07:15:36] !log Remove partitions from s5: db2084:3315 T239453
[07:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:43] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[07:16:27] (Merged) jenkins-bot: labs: Set $wmgUseEntitySourceBasedFederation to true everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/562427 (https://phabricator.wikimedia.org/T241974) (owner: Ladsgroup)
[07:20:18] (CR) Ladsgroup: "This is merged but not deployed blocking other deployments, I revert it for now. Please put it in SWAT. This is not how patches of product" [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[07:20:46] (PS1) Ladsgroup: Revert "InitialiseSettings - clean up groupOverrides layout / spacing" [mediawiki-config] - https://gerrit.wikimedia.org/r/562429
[07:20:50] (CR) Ladsgroup: [C: +2] Revert "InitialiseSettings - clean up groupOverrides layout / spacing" [mediawiki-config] - https://gerrit.wikimedia.org/r/562429 (owner: Ladsgroup)
[07:21:44] (Merged) jenkins-bot: Revert "InitialiseSettings - clean up groupOverrides layout / spacing" [mediawiki-config] - https://gerrit.wikimedia.org/r/562429 (owner: Ladsgroup)
[07:27:06] rebased
[07:27:16] (PS1) Elukey: profile::hue: correct typo in kerberos parameters [puppet] - https://gerrit.wikimedia.org/r/562431
[07:29:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1018 for upgrade', diff saved to https://phabricator.wikimedia.org/P10050 and previous config saved to /var/cache/conftool/dbconfig/20200107-072930-marostegui.json
[07:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:54] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[07:35:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool es1018', diff saved to https://phabricator.wikimedia.org/P10051 and previous config saved to /var/cache/conftool/dbconfig/20200107-073508-marostegui.json
[07:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1013 for upgrade', diff saved to https://phabricator.wikimedia.org/P10052 and previous config saved to /var/cache/conftool/dbconfig/20200107-073543-marostegui.json
[07:35:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:43] (PS1) Elukey: Fix name in Hue's fake keytab for analytics-tool1001 [labs/private] - https://gerrit.wikimedia.org/r/562437
[07:38:07] (CR) Elukey: [V: +2 C: +2] Fix name in Hue's fake keytab for analytics-tool1001 [labs/private] - https://gerrit.wikimedia.org/r/562437 (owner: Elukey)
[07:39:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool es1013', diff saved to https://phabricator.wikimedia.org/P10053 and previous config saved to /var/cache/conftool/dbconfig/20200107-073922-marostegui.json
[07:39:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:10] (CR) Elukey: [C: +2] "Andrew: not sure if this is the issue, but there was a typo so it is better to fix it and see if it brings any benefit :) In theory Hue sh" [puppet] - https://gerrit.wikimedia.org/r/562431 (owner: Elukey)
[07:40:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool es1019 for upgrade', diff saved to https://phabricator.wikimedia.org/P10054 and previous config saved to /var/cache/conftool/dbconfig/20200107-074035-marostegui.json
[07:40:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool es1019', diff saved to https://phabricator.wikimedia.org/P10055 and previous config saved to /var/cache/conftool/dbconfig/20200107-074159-marostegui.json
[07:42:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:07] (CR) DannyS712: "https://tools.wmflabs.org/sal/log/AW99kUBdfYQT6VcDf0sJ and https://tools.wmflabs.org/sal/log/AW99ki_sfYQT6VcDf0s- show that it was deploye" [mediawiki-config] - https://gerrit.wikimedia.org/r/562429 (owner: Ladsgroup)
[07:48:28] (CR) Ladsgroup: "> Patch Set 2:" [mediawiki-config] - https://gerrit.wikimedia.org/r/562429 (owner: Ladsgroup)
[07:49:40] (CR) Muehlenhoff: "Nothing new should be set up with Buster (just exceptional cases like xhgui due to mongodb or replacing a server in a stretch cluster), th" [puppet] - https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979) (owner: Dzahn)
[07:51:56] (PS1) Elukey: hue: fix kerberos principal for test and prod [puppet] - https://gerrit.wikimedia.org/r/562465
[07:55:44] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/20226/analytics-tool1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/562465 (owner: Elukey)
[07:57:19] (CR) DCausse: [C: +2] [cirrus] force phrase_suggest fallback profile for all beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/561600 (https://phabricator.wikimedia.org/T241487) (owner: DCausse)
[07:59:22] (PS2) DCausse: [cirrus] force phrase_suggest fallback profile for all beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/561600 (https://phabricator.wikimedia.org/T241487)
[08:00:34] (CR) DCausse: [C: +2] [cirrus] force phrase_suggest fallback profile for all beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/561600 (https://phabricator.wikimedia.org/T241487) (owner: DCausse)
[08:01:16] (CR) DannyS712: "> This is merged but not deployed blocking other deployments, I" [mediawiki-config] - https://gerrit.wikimedia.org/r/554392 (https://phabricator.wikimedia.org/T231178) (owner: DannyS712)
[08:01:23] (Merged) jenkins-bot: [cirrus] force phrase_suggest fallback profile for all beta wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/561600 (https://phabricator.wikimedia.org/T241487) (owner: DCausse)
[08:11:10] !log ayounsi@deploy1001 Started deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962
[08:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:13] T241962: Upgrade LibreNMS to 1.59 - https://phabricator.wikimedia.org/T241962
[08:11:20] !log ayounsi@deploy1001 Finished deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962 (duration: 00m 10s)
[08:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:17] (PS2) Muehlenhoff: install_server: add new urldownloaders to DHCP [puppet] - https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979) (owner: Dzahn)
[08:15:42] Operations, LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (Kris_Litson_WMDE) Thanks all. Can I have the link to the login page? I'd also like to know what username and password I should use. Cheers!
[08:19:21] (PS3) Muehlenhoff: install_server: add new urldownloaders to DHCP [puppet] - https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979) (owner: Dzahn)
[08:22:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1084 for compression', diff saved to https://phabricator.wikimedia.org/P10056 and previous config saved to /var/cache/conftool/dbconfig/20200107-082236-marostegui.json
[08:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:46] Operations, netops: Upgrade LibreNMS to 1.59 - https://phabricator.wikimedia.org/T241962 (ayounsi) Open→Resolved Went smoothly.
[08:23:49] 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [08:25:33] (03CR) 10Gehel: [C: 03+2] [airflow] Move PrivateTmp to correct section of unit [puppet] - 10https://gerrit.wikimedia.org/r/562358 (owner: 10EBernhardson) [08:25:44] (03CR) 10Muehlenhoff: [C: 03+2] install_server: add new urldownloaders to DHCP [puppet] - 10https://gerrit.wikimedia.org/r/562394 (https://phabricator.wikimedia.org/T241979) (owner: 10Dzahn) [08:42:46] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:43:00] PROBLEM - Unmerged changes on repository puppet on puppetmaster1003 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:43:24] PROBLEM - Unmerged changes on repository puppet on puppetmaster2002 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:43:24] PROBLEM - Unmerged changes on repository puppet on puppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:43:46] moritzm: ^ would that be you? [08:45:22] ah, sorry, fixed now [08:46:22] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:46:36] RECOVERY - Unmerged changes on repository puppet on puppetmaster1003 is OK: No changes to merge. 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:47:00] RECOVERY - Unmerged changes on repository puppet on puppetmaster2002 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:47:00] RECOVERY - Unmerged changes on repository puppet on puppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [08:47:34] 10Operations, 10observability: Add RIPE atlas data to Prometheus - https://phabricator.wikimedia.org/T167689 (10ayounsi) Thanks this is really nice! It could be useful to add https://grafana.com/grafana/plugins/grafana-worldmap-panel and show the data on a map as well. I'm interested in adding the traceroute... [08:58:59] (03CR) 10Ayounsi: "I'm not familiar with our APT repo. I guess the end result should be a NOOP?" [puppet] - 10https://gerrit.wikimedia.org/r/562226 (owner: 10Muehlenhoff) [09:07:09] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "recheck" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [09:07:12] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:08:16] (03CR) 10jerkins-bot: [V: 04-1] Add class to scan a registry for images [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [09:09:36] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/554849 (https://phabricator.wikimedia.org/T236080) (owner: 10KartikMistry) [09:13:50] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 90 jobs https://wikitech.wikimedia.org/wiki/Backups%23Monitoring [09:14:51] (03CR) 10Muehlenhoff: "Ack, it'll be a NOP for existing installs, I'll ping you when merging" [puppet] - 10https://gerrit.wikimedia.org/r/562226 (owner: 10Muehlenhoff) [09:19:18] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [09:19:21] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Silvan_WMDE) [09:19:34] (03PS1) 10Muehlenhoff: profile::url_downloader: Add types and switch to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/562472 [09:19:36] (03PS1) 10Muehlenhoff: Adapt auto restart for Buster [puppet] - 10https://gerrit.wikimedia.org/r/562473 [09:22:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1092 for alter and upgrade', diff saved to https://phabricator.wikimedia.org/P10057 and previous config saved to /var/cache/conftool/dbconfig/20200107-092221-marostegui.json [09:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:24:01] (03PS1) 10Elukey: hive: store delegation tokens in the db [puppet] - 10https://gerrit.wikimedia.org/r/562474 (https://phabricator.wikimedia.org/T238560) [09:24:05] 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - 
https://phabricator.wikimedia.org/T241722 (10Aklapper) See https://superset.wikimedia.org/ [09:24:43] (03CR) 10jerkins-bot: [V: 04-1] hive: store delegation tokens in the db [puppet] - 10https://gerrit.wikimedia.org/r/562474 (https://phabricator.wikimedia.org/T238560) (owner: 10Elukey) [09:28:19] (03CR) 10Filippo Giunchedi: [C: 03+1] Switch Gerrit/Phabricator to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/562285 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [09:28:23] (03PS2) 10Elukey: hive: store delegation tokens in the db [puppet] - 10https://gerrit.wikimedia.org/r/562474 (https://phabricator.wikimedia.org/T238560) [09:31:47] (03CR) 10Elukey: "Looks fine to me, let's wait for SRE to confirm." [puppet] - 10https://gerrit.wikimedia.org/r/562354 (owner: 10EBernhardson) [09:33:12] 10Operations: Migrate fermium to Buster - https://phabricator.wikimedia.org/T224586 (10MoritzMuehlenhoff) [09:33:27] 10Operations: Migrate URL downloaders to Buster - https://phabricator.wikimedia.org/T224551 (10MoritzMuehlenhoff) [09:34:06] 10Operations, 10serviceops: Migrate Zookeeper/etcd conf cluster in codfw to Buster - https://phabricator.wikimedia.org/T224560 (10MoritzMuehlenhoff) [09:34:06] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:34:16] 10Operations, 10Wikimedia-Etherpad, 10serviceops: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10MoritzMuehlenhoff) [09:34:23] 10Operations: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (10MoritzMuehlenhoff) [09:35:52] 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [09:42:12] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - 
https://phabricator.wikimedia.org/T242080 (10Aklapper) [09:42:57] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Aklapper) If WMDE has internal docs then please remove #WMF-NDA-Requests from it. Thanks. [09:45:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10058 and previous config saved to /var/cache/conftool/dbconfig/20200107-094506-marostegui.json [09:45:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:20] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: use raid10-6dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:49:22] (03CR) 10Elukey: [C: 03+1] install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:49:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10059 and previous config saved to /var/cache/conftool/dbconfig/20200107-094944-marostegui.json [09:49:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:16] (03PS5) 10Filippo Giunchedi: install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) [09:52:18] (03PS5) 10Filippo Giunchedi: install_server: use raid10-6dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) [09:52:20] (03PS5) 10Filippo Giunchedi: install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) [09:52:51] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on 
namespace 'termbox' for release 'test' . [09:52:51] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [09:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10060 and previous config saved to /var/cache/conftool/dbconfig/20200107-095501-marostegui.json [09:55:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:25] (03CR) 10Filippo Giunchedi: [C: 03+1] install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:55:31] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: use raid10-8dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559548 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:56:23] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: use raid10-6dev standard recipe [puppet] - 10https://gerrit.wikimedia.org/r/559549 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [09:57:08] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: deprecate raid10-gpt-srv-lvm-ext4.cfg [puppet] - 10https://gerrit.wikimedia.org/r/559553 (https://phabricator.wikimedia.org/T156955) (owner: 10Filippo Giunchedi) [10:01:55] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [10:01:56] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1092', diff saved to 
https://phabricator.wikimedia.org/P10061 and previous config saved to /var/cache/conftool/dbconfig/20200107-100157-marostegui.json [10:01:59] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/561675 (https://phabricator.wikimedia.org/T240440) (owner: 10Elukey) [10:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:13] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562294 (owner: 10Herron) [10:03:20] (03PS4) 10Elukey: profile::analytics::client::limits: add cpu limits to Analytics clients [puppet] - 10https://gerrit.wikimedia.org/r/561675 (https://phabricator.wikimedia.org/T240440) [10:05:00] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [10:05:00] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [10:05:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:33] (03CR) 10Addshore: [C: 04-1] "1 thing needs removing, everything else looks good!" 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [10:05:55] (03CR) 10Addshore: [C: 03+1] Clean up unused config in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [10:07:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P10062 and previous config saved to /var/cache/conftool/dbconfig/20200107-100743-marostegui.json [10:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:45] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10WMDE-leszek) As Silvan's line manager, I endorse this request and confirm he is WMDE employee. @RStallman-legalteam mind filing an NDA for Silvan and sending it to him... [10:09:04] (03CR) 10Jbond: [C: 03+2] "LGTM ill merge now" [puppet] - 10https://gerrit.wikimedia.org/r/562354 (owner: 10EBernhardson) [10:09:41] (03PS3) 10Jbond: Add airflow-kerberos unit to airflow mgmt privs [puppet] - 10https://gerrit.wikimedia.org/r/562354 (owner: 10EBernhardson) [10:10:41] (03CR) 10Ladsgroup: Clean up unused configs in Wikibase.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [10:11:50] 10Operations, 10DNS, 10Domains, 10Traffic: Donate wikiźródła.pl and wikisłownik.pl to the Foundation - https://phabricator.wikimedia.org/T240446 (10tomasz) p:05Normal→03Low [10:12:26] (03PS4) 10Giuseppe Lavagetto: Add class to scan a registry for images [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) [10:12:28] (03PS6) 10Giuseppe Lavagetto: Add a registry reporter [docker-images/docker-report] - 
10https://gerrit.wikimedia.org/r/559933 [10:12:30] (03PS1) 10Giuseppe Lavagetto: Fix tox / pep517 issues [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562480 [10:13:33] (03CR) 10jerkins-bot: [V: 04-1] Add a registry reporter [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 (owner: 10Giuseppe Lavagetto) [10:13:42] (03CR) 10Volans: [C: 03+1] "LGTM" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562480 (owner: 10Giuseppe Lavagetto) [10:13:53] 10Operations, 10DNS, 10Domains, 10Traffic: Donate wikiźródła.pl and wikisłownik.pl to the Foundation - https://phabricator.wikimedia.org/T240446 (10tomasz) I passed the AuthInfo codes to Doneva on Saturday 28 December 2019, however she was out-of-office until 2 January 2020, so I'm currently still awaitin... [10:15:44] (03CR) 10Alexandros Kosiaris: [C: 03+1] "+1ed, but this will require a manual rebase it seems" [puppet] - 10https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [10:15:57] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Fix tox / pep517 issues [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562480 (owner: 10Giuseppe Lavagetto) [10:17:59] (03Merged) 10jenkins-bot: Fix tox / pep517 issues [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562480 (owner: 10Giuseppe Lavagetto) [10:18:01] (03Merged) 10jenkins-bot: Add class to scan a registry for images [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559804 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [10:18:39] 10Operations, 10Wikimedia-Mailing-lists: Rename and transfer admin of Malaysia Mailing List (wikimediamy) - https://phabricator.wikimedia.org/T241988 (10Exec8) There are still issues: 1. I checked the [[ https://lists.wikimedia.org/mailman/listinfo | roster of mailing lists ]] and I noticed that this one (wik... 
[10:19:23] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [10:19:23] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [10:19:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:40] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:19:48] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:20:08] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:20:08] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:20:20] PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:21:04] (03CR) 10Addshore: [C: 04-1] Clean up unused configs in Wikibase.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [10:21:08] PROBLEM - IPv4 ping 
to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:21:10] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:22:13] hmmm RIPE Atlas API issues? [10:22:40] PROBLEM - IPv4 ping to esams on ripe-atlas-esams is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:22:57] I think we had patched the script for API failures [10:23:42] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:24:18] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - /usr/lib/nagios/plugins/check_ripe_atlas.py failed with URLError: urlopen error [Errno 111] Connection refused https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:27:49] (03PS1) 10Jbond: test "string" `cat /etc/debian_version` $(cat /etc/debian_version) [puppet] - 10https://gerrit.wikimedia.org/r/562481 [10:28:38] (03CR) 10Jbond: [C: 03+2] test "string" `cat /etc/debian_version` $(cat /etc/debian_version) [puppet] - 10https://gerrit.wikimedia.org/r/562481 (owner: 10Jbond) [10:30:04] (03PS1) 10Jbond: test: normal commit message [puppet] - 10https://gerrit.wikimedia.org/r/562482 [10:33:54] (03PS3) 10Ladsgroup: Clean up unused configs in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 
(https://phabricator.wikimedia.org/T238154) [10:33:56] (03PS4) 10Ladsgroup: Clean up unused config in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154) [10:34:03] 10Operations, 10Analytics, 10Traffic: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (10ema) [10:34:06] (03PS1) 10Muehlenhoff: Deprecate raid1.cfg [puppet] - 10https://gerrit.wikimedia.org/r/562483 (https://phabricator.wikimedia.org/T156955) [10:34:23] (03CR) 10Jbond: [C: 03+2] test: normal commit message [puppet] - 10https://gerrit.wikimedia.org/r/562482 (owner: 10Jbond) [10:34:59] (03CR) 10Addshore: [C: 03+1] Clean up unused configs in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [10:39:32] !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.11/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseTermIdsAcquirer.php: [[gerrit:562477|Temporary add metrics of the need to reinsert in the new term store]] (duration: 00m 57s) [10:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:47] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki::php::admin memory optimisation for lib.php [puppet] - 10https://gerrit.wikimedia.org/r/558158 (https://phabricator.wikimedia.org/T240824) (owner: 10Effie Mouzeli) [10:46:20] (03PS1) 10Ema: varnishkafka: remove incorrect comment about X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/562485 [10:49:12] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 32 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:49:12] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 31 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:49:30] RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 3 probes of 587 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:49:34] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 3 probes of 587 (alerts on 35) - https://atlas.ripe.net/measurements/1791210/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:50:18] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 38 probes of 509 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:50:26] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 3 probes of 587 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:51:27] 10Operations, 10serviceops, 10Patch-For-Review: PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 20480 bytes) in /var/www/php-monitoring/lib.php on line 35 - https://phabricator.wikimedia.org/T240824 (10jijiki) 05Open→03Resolved a:03jijiki There is was bug in the mo... [10:51:38] RECOVERY - IPv4 ping to esams on ripe-atlas-esams is OK: OK - failed 3 probes of 583 (alerts on 35) - https://atlas.ripe.net/measurements/23449935/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:52:38] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 33 probes of 505 (alerts on 35) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:53:16] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . 
[10:53:16] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [10:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:20] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 33 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:54:00] (03PS2) 10Addshore: Update cron with lb and lb-pool params [puppet] - 10https://gerrit.wikimedia.org/r/553097 (https://phabricator.wikimedia.org/T238751) (owner: 10Alaa Sarhan) [10:54:40] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 3 probes of 587 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [10:55:08] (03CR) 10Elukey: [C: 03+1] varnishkafka: remove incorrect comment about X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/562485 (owner: 10Ema) [10:58:33] (03CR) 10Ema: [C: 03+2] varnishkafka: remove incorrect comment about X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/562485 (owner: 10Ema) [11:00:46] (03PS1) 10Ammarpad: Enable lead paragraph in draft namespace on nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030) [11:01:54] (03CR) 10jerkins-bot: [V: 04-1] Enable lead paragraph in draft namespace on nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030) (owner: 10Ammarpad) [11:03:16] (03PS2) 10Ammarpad: Enable lead paragraph in draft namespace on nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030) [11:04:06] (03PS1) 10Ladsgroup: mediawiki: Drop the rebuildItemTerms cron [puppet] - 10https://gerrit.wikimedia.org/r/562487 
(https://phabricator.wikimedia.org/T219123) [11:09:45] (03PS1) 10Arturo Borrero Gonzalez: openstack: eqiad1: depool cloudvirt1009 [puppet] - 10https://gerrit.wikimedia.org/r/562490 [11:10:13] (03PS1) 10Jbond: motd: fix awk command for parsing puppet configuration_version [puppet] - 10https://gerrit.wikimedia.org/r/562491 (https://phabricator.wikimedia.org/T241459) [11:10:17] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [11:10:17] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [11:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:09] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: eqiad1: depool cloudvirt1009 [puppet] - 10https://gerrit.wikimedia.org/r/562490 (owner: 10Arturo Borrero Gonzalez) [11:11:48] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [11:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:05] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [11:12:05] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . 
[11:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:06] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [11:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:11] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw2282 is CRITICAL: Host mw2282 is not in mediawiki-installation dsh group Effie Mouzeli Server is used to test puppet changes, will reimage again https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [11:18:03] 10Operations, 10Traffic: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10Vgutierrez) [11:18:24] (03PS7) 10Giuseppe Lavagetto: Add a registry reporter [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 [11:20:28] (03PS1) 10Vgutierrez: 5.1.3-wm12: Bump version and target buster [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) [11:21:37] (03CR) 10Giuseppe Lavagetto: "I did refine some tests,and added a couple bugfixes, but the patch is still basically the same Moritz +1'd." 
[docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 (owner: 10Giuseppe Lavagetto) [11:21:49] (03PS2) 10Vgutierrez: 5.1.3-1wm12: Bump version and target buster [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) [11:22:27] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Add a registry reporter [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 (owner: 10Giuseppe Lavagetto) [11:22:58] 10Operations, 10netops: mr1-esams RMA (2020 edition) - https://phabricator.wikimedia.org/T242097 (10ayounsi) a:03ayounsi [11:23:33] (03Merged) 10jenkins-bot: Add a registry reporter [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/559933 (owner: 10Giuseppe Lavagetto) [11:26:30] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (10Vgutierrez) p:05Triage→03Normal [11:28:10] (03CR) 10Filippo Giunchedi: [C: 03+1] Deprecate raid1.cfg [puppet] - 10https://gerrit.wikimedia.org/r/562483 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [11:28:53] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "please check that cloudmetrics100X servers are OK with this change when merging." 
[puppet] - 10https://gerrit.wikimedia.org/r/561855 (owner: 10Muehlenhoff) [11:32:48] (03CR) 10Arturo Borrero Gonzalez: toolforge: Monitor local crontabs with Prometheus (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/561412 (https://phabricator.wikimedia.org/T210993) (owner: 10BryanDavis) [11:33:59] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10Nuria) [11:36:37] (03CR) 10jerkins-bot: [V: 04-1] 5.1.3-1wm12: Bump version and target buster [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [11:37:21] (03PS1) 10Effie Mouzeli: mediawiki::php::admin fix for $sma_info global var [puppet] - 10https://gerrit.wikimedia.org/r/562498 [11:39:08] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 36 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [11:41:47] 10Operations, 10Performance-Team, 10observability, 10Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (10Gilles) a:03dpifke [11:42:39] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki::php::admin fix for $sma_info global var [puppet] - 10https://gerrit.wikimedia.org/r/562498 (owner: 10Effie Mouzeli) [11:44:56] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 35 probes of 509 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts [11:50:49] (03CR) 10Ammarpad: [C: 04-1] "Waiting for Jon. 
It appears there's no ns:118 on nlwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562486 (https://phabricator.wikimedia.org/T242030) (owner: 10Ammarpad) [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T1200). [12:00:05] Ammarpad and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:14] o/ [12:00:29] o/ [12:03:28] I can SWAT [12:03:57] I don't see Ammarpad here, so probably start with yours Amir1 ? [12:04:05] sure [12:04:36] (03CR) 10Ladsgroup: [C: 03+2] Clean up unused configs in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [12:05:38] (03Merged) 10jenkins-bot: Clean up unused configs in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559161 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [12:08:57] (03PS2) 10Jbond: motd: fix awk command for parsing puppet configuration_version [puppet] - 10https://gerrit.wikimedia.org/r/562491 (https://phabricator.wikimedia.org/T241459) [12:09:15] I'm getting this in mwdebug1002: PHP Notice: Undefined variable: sma_info in /var/www/php-monitoring/lib.php on line 26 [12:09:25] unrelated though just saying [12:10:49] Synced the wrong file 🤦‍♂️🤦‍♂️🤦‍♂️ [12:11:10] !log ladsgroup@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:559161|Clean up unused configs in Wikibase.php (T238154)]] (duration: 00m 56s) [12:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:13] T238154: Clean removed wikibase config variables from production - https://phabricator.wikimedia.org/T238154 [12:12:04] 
I will do it twice since there's a cache issue [12:12:23] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:559161|Clean up unused configs in Wikibase.php (T238154)]] (duration: 00m 54s) [12:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:48] (03PS5) 10Ladsgroup: Clean up unused config in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154) [12:12:56] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [12:13:22] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:559161|Clean up unused configs in Wikibase.php (T238154)]] (duration: 00m 54s) [12:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:01] (03Merged) 10jenkins-bot: Clean up unused config in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/559162 (https://phabricator.wikimedia.org/T238154) (owner: 10Ladsgroup) [12:15:34] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:559162|Clean up unused configs in InitialiseSettings.php (T238154)]] (duration: 00m 55s) [12:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:33] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:559162|Clean up unused configs in InitialiseSettings.php (T238154)]] (duration: 00m 54s) [12:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:38] T238154: Clean removed wikibase config variables from production - https://phabricator.wikimedia.org/T238154 [12:18:37] I'm done [12:18:48] Anyone for rest of the SWAT? 
[12:19:23] I have a patch for some logos but I forgot to schedule it in the calendar [12:19:40] hauskatze: sure, add it there [12:19:42] let me know [12:20:01] okay, let me see; it's from last year [12:20:25] you should look really hard, it's from the last decade [12:21:04] fortunately owner:self status:open doesn't return many results [12:21:07] it's https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/561365/ [12:21:24] I also have an old chain, clean up Wikidata config, if we’re cleaning those up now [12:21:26] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/502480 [12:21:50] I'm adding the patch on Wikitech [12:22:51] Amir1: do you lead the rest of the SWAT? [12:23:21] Urbanecm: if you want to, you can take over. I have some other things to finish with the wb_terms right now [12:23:46] my patch is now on the Calendar should anyone want to deploy it [12:24:31] ah but of course I put it in the wrong section [12:25:24] fixed [12:25:29] Urbanecm: ? [12:26:08] hauskatze: +2'ing [12:26:15] (03CR) 10Urbanecm: [C: 03+2] Modify ge.wikimedia project logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561365 (https://phabricator.wikimedia.org/T241327) (owner: 10MarcoAurelio) [12:26:32] Thanks [12:26:49] :) [12:27:07] (03Merged) 10jenkins-bot: Modify ge.wikimedia project logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561365 (https://phabricator.wikimedia.org/T241327) (owner: 10MarcoAurelio) [12:28:10] (03CR) 10Alexandros Kosiaris: [C: 04-1] profile::url_downloader: Add types and switch to lookup() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562472 (owner: 10Muehlenhoff) [12:29:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P10063 and previous config saved to /var/cache/conftool/dbconfig/20200107-122914-marostegui.json [12:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:18] pls let me know when it's on 
debug(xxxx) to test [12:29:25] hauskatze: mwdebug1001, please, could you test? [12:29:31] ja [12:30:04] I don't see them. I'm trying to purge the cache [12:30:19] that fixed it [12:30:26] looks good to me in this screen [12:30:55] Urbanecm: looks good to me in my pc [12:30:59] hauskatze: thanks, syncing [12:31:39] Urbanecm: do you think we can do 561355 as well, unless others do have something further to add? [12:31:56] hauskatze: no problem :) [12:32:01] (03PS3) 10Urbanecm: Modify $wgArticleCount to 'any' for ta.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561355 (https://phabricator.wikimedia.org/T241684) (owner: 10MarcoAurelio) [12:32:05] I think Lucas_WMDE wanted to SWAT something as well [12:32:06] (03CR) 10Urbanecm: [C: 03+2] Modify $wgArticleCount to 'any' for ta.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561355 (https://phabricator.wikimedia.org/T241684) (owner: 10MarcoAurelio) [12:32:23] if there’s still time [12:32:23] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: d6ee5fe: Modify ge.wikimedia project logos (T241327) (duration: 00m 57s) [12:32:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:26] T241327: Change WMGE site logo - https://phabricator.wikimedia.org/T241327 [12:33:07] (03Merged) 10jenkins-bot: Modify $wgArticleCount to 'any' for ta.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561355 (https://phabricator.wikimedia.org/T241684) (owner: 10MarcoAurelio) [12:33:14] hauskatze: should be fine now [12:33:58] logo looks good now as well in prod [12:34:07] another happy customer :) [12:34:09] (03PS3) 10Lucas Werkmeister (WMDE): Fix wgImportSources setting for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502479 [12:34:11] (03PS3) 10Lucas Werkmeister (WMDE): Fix WBRepoCanonicalUriProperty setting for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502480 [12:34:47] hauskatze: great! 
[12:34:53] can you test the second patch? [12:35:06] added mine to the calendar [12:35:09] I think so, on mwdebug1001? [12:35:30] hauskatze: now [12:35:37] Lucas_WMDE: I'll ping you once the air will be clear for you [12:35:43] ok thanks! [12:36:25] Urbanecm: I don't see any changes in Special:Statistics [12:36:34] maybe it needs the script run? [12:36:42] I'll try purging there as well, see if that works [12:36:42] ah, that's the thing - I can't regenerate now, since mwmaint has no clue about the change [12:36:44] I'll deploy [12:36:50] aha, ok [12:38:15] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 5be01f0: Modify $wgArticleCount to any for ta.wiktionary (T241684) (duration: 00m 55s) [12:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:20] T241684: set wgArticleCountMethod to any on Tamil wiktionary - https://phabricator.wikimedia.org/T241684 [12:39:10] !log Run mwscript initSiteStats.php --wiki=tawiktionary --update (T241684) [12:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:12] hauskatze: works now? [12:41:07] Urbanecm: Apparently still the same data [12:41:23] hmm, interesting [12:41:25] https://usercontent.irccloud-cdn.com/file/WucE0raU/image.png [12:42:32] looks it worked then [12:44:33] okay :-) [12:44:39] hauskatze: anything else? [12:44:49] nothing else [12:44:51] thanks [12:44:56] lunch time for me [12:46:14] enjoy your lunch! [12:46:18] Lucas_WMDE: the air is clear [12:46:52] (03PS1) 10Giuseppe Lavagetto: Initial debianization [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562503 (https://phabricator.wikimedia.org/T241206) [12:47:11] ok [12:47:20] Urbanecm: do you want to do the deployment or should I? 
[12:47:25] Lucas_WMDE: go ahead [12:48:00] ok [12:49:17] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502479 (owner: 10Lucas Werkmeister (WMDE)) [12:50:12] (03Merged) 10jenkins-bot: Fix wgImportSources setting for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502479 (owner: 10Lucas Werkmeister (WMDE)) [12:50:43] first change is on mwdebug1001, testing [12:50:45] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Initial debianization [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562503 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [12:51:10] looks fine, syncing [12:52:12] (03PS1) 10Ladsgroup: Set wmgUseEntitySourceBasedFederation for test.wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562504 (https://phabricator.wikimedia.org/T241973) [12:52:14] (03PS1) 10Ayounsi: Initial flowspec support [homer/public] - 10https://gerrit.wikimedia.org/r/562505 [12:52:28] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:502479|Fix wgImportSources setting for wikidata dblist]] (duration: 00m 54s) [12:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:58] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502480 (owner: 10Lucas Werkmeister (WMDE)) [12:53:54] (03Merged) 10jenkins-bot: Fix WBRepoCanonicalUriProperty setting for testwikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/502480 (owner: 10Lucas Werkmeister (WMDE)) [12:55:05] also fine on mwdebug1001, syncing [12:56:01] (03PS2) 10Ayounsi: Initial flowspec support [homer/public] - 10https://gerrit.wikimedia.org/r/562505 [12:56:36] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:502480|Fix WBRepoCanonicalUriProperty setting for testwikidatawiki]] (duration: 00m 54s) [12:56:37] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:51] !log EU SWAT done [12:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:02] (03PS1) 10Giuseppe Lavagetto: Release for stretch; add compat level [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562506 [13:01:06] (03PS2) 10Lucas Werkmeister (WMDE): Update Skolt Sami language name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/510875 (https://phabricator.wikimedia.org/T223544) [13:09:02] (03PS1) 10Muehlenhoff: apt::package_from_component: Switch to a per component apt-get exec [puppet] - 10https://gerrit.wikimedia.org/r/562507 [13:09:13] (03PS1) 10Arturo Borrero Gonzalez: toolforge: new k8s: move metrics-server and kube-state-metrics to new namespace [puppet] - 10https://gerrit.wikimedia.org/r/562508 (https://phabricator.wikimedia.org/T241853) [13:11:17] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Release for stretch; add compat level [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562506 (owner: 10Giuseppe Lavagetto) [13:18:24] 10Operations, 10Traffic, 10Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (10Krinkle) 05Open→03Declined Closing this together with several other TLS/HTTP2 related issues as we've switched from Nginx to ATS for t... 
[13:26:14] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20229/" [puppet] - 10https://gerrit.wikimedia.org/r/562507 (owner: 10Muehlenhoff) [13:31:03] (03CR) 10Muehlenhoff: [C: 03+2] apt::package_from_component: Switch to a per component apt-get exec [puppet] - 10https://gerrit.wikimedia.org/r/562507 (owner: 10Muehlenhoff) [13:34:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P10064 and previous config saved to /var/cache/conftool/dbconfig/20200107-133439-marostegui.json [13:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:44] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (10geraki) The issue exists also in elwiki: https://el.wikipedia.org/wiki/%CE%91%CE%B8%CF%8E%CE%B1_%CE%AE_%CE%AD%C... [13:42:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P10065 and previous config saved to /var/cache/conftool/dbconfig/20200107-134251-marostegui.json [13:42:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:25] !log reimaging mw2282 [13:43:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:41] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10Nuria) @SNowick_WMF your LDAP user is the user that you log into Https://wikitech.wikimedia.org with. Can you confirm which one that is? 
[13:45:56] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 4810 MB (3% inode=80%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1004&var-datasource=eqiad+prometheus/ops [13:56:45] 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [13:59:28] PROBLEM - Check size of conntrack table on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [13:59:34] PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [13:59:34] PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:59:40] PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:00:34] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10Nuria) 05Open→03Resolved [14:00:38] PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [14:00:56] PROBLEM - Check whether ferm is active by checking the default input chain on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:01:02] PROBLEM - configured eth on notebook1004 is CRITICAL: 
connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [14:01:31] (03CR) 10Elukey: [C: 03+2] profile::analytics::client::limits: add cpu limits to Analytics clients [puppet] - 10https://gerrit.wikimedia.org/r/561675 (https://phabricator.wikimedia.org/T240440) (owner: 10Elukey) [14:02:54] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [14:02:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P10066 and previous config saved to /var/cache/conftool/dbconfig/20200107-140300-marostegui.json [14:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:12] RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [14:03:12] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:03:18] RECOVERY - DPKG on notebook1004 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:04:16] RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [14:04:25] (03PS2) 10Arturo Borrero Gonzalez: toolforge: new k8s: move metrics-server and kube-state-metrics to new namespace [puppet] - 10https://gerrit.wikimedia.org/r/562508 (https://phabricator.wikimedia.org/T241853) [14:04:34] RECOVERY - Check whether ferm is active by checking the default input chain on notebook1004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:04:38] RECOVERY - configured eth on notebook1004 is OK: OK - interfaces 
up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [14:04:40] very interesting, the oom killer had to intervene on a process that was not in any user cgroup (that is limited) [14:04:54] RECOVERY - Check size of conntrack table on notebook1004 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [14:05:03] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:05:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:27] (03PS1) 10Vgutierrez: 1.7-3: Rebuild for buster [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/562515 (https://phabricator.wikimedia.org/T242093) [14:05:59] (03PS1) 10Gerrit Patch Uploader: Introduce wgSitename for fywiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562517 [14:06:00] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562517 (owner: 10Gerrit Patch Uploader) [14:06:14] (03CR) 10jerkins-bot: [V: 04-1] 1.7-3: Rebuild for buster [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/562515 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [14:08:38] (03CR) 10Ottomata: [C: 03+1] hive: store delegation tokens in the db [puppet] - 10https://gerrit.wikimedia.org/r/562474 (https://phabricator.wikimedia.org/T238560) (owner: 10Elukey) [14:15:48] (03PS1) 10Ema: ATS: set tls::mapping_rules for labs [puppet] - 10https://gerrit.wikimedia.org/r/562518 [14:17:03] PROBLEM - Check whether ferm is active by checking the default input chain on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:17:09] PROBLEM - configured eth on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused 
https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [14:17:31] PROBLEM - Check size of conntrack table on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [14:17:53] PROBLEM - MD RAID on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [14:18:41] (03CR) 10Ema: [C: 03+2] ATS: set tls::mapping_rules for labs [puppet] - 10https://gerrit.wikimedia.org/r/562518 (owner: 10Ema) [14:21:13] PROBLEM - puppet last run on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:21:29] !log Deploy schema change on s2 codfw master, this will generate lag on s2 codfw - T234052 [14:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:32] T234052: Add abuse_filter_log.afl_filter_id and afl_global columns - https://phabricator.wikimedia.org/T234052 [14:22:16] !log Deploy schema change on s7 codfw master, this will generate lag on s7 codfw - T234052 [14:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:58] (03PS2) 10Vgutierrez: 1.7-3: Rebuild for buster [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/562515 (https://phabricator.wikimedia.org/T242093) [14:23:20] (03CR) 10jerkins-bot: [V: 04-1] 1.7-3: Rebuild for buster [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/562515 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [14:25:39] (03PS7) 10Ottomata: Set up cache routing for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) [14:28:21] PROBLEM - Check the NTP synchronisation status of timesyncd on 
notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP [14:29:02] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: new k8s: move metrics-server and kube-state-metrics to new namespace [puppet] - 10https://gerrit.wikimedia.org/r/562508 (https://phabricator.wikimedia.org/T241853) (owner: 10Arturo Borrero Gonzalez) [14:30:28] (03CR) 10Ammarpad: [C: 04-1] Introduce wgSitename for fywiktionary (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562517 (owner: 10Gerrit Patch Uploader) [14:32:04] (03PS2) 10Aklapper: Introduce wgSitename for fywiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562517 (https://phabricator.wikimedia.org/T241883) (owner: 10Gerrit Patch Uploader) [14:32:40] !log Stop MySQL on db2076 for maintenance T241647 [14:32:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:43] T241647: Upgrade BIOS and firmware on db2076 - https://phabricator.wikimedia.org/T241647 [14:33:48] (03PS1) 10Giuseppe Lavagetto: Switch build to buster [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562522 [14:34:00] (03CR) 10Alexandros Kosiaris: [C: 03+1] ganeti: apply ferm regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [14:34:27] (03CR) 10Alexandros Kosiaris: [C: 03+1] profile::ganeti switch from hiera() to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/562294 (owner: 10Herron) [14:34:51] PROBLEM - Check systemd state on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:35:04] (03CR) 10Alexandros Kosiaris: [C: 03+2] Set up cache routing for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [14:35:20] !log Power off db2076 for on-site maintenance T241647 [14:35:22] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:47] 10Operations, 10ops-codfw, 10DBA: Upgrade BIOS and firmware on db2076 - https://phabricator.wikimedia.org/T241647 (10Marostegui) MySQL stopped and host powered off. Ready for @Papaul to act on it. Thank you! [14:36:16] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Switch build to buster [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/562522 (owner: 10Giuseppe Lavagetto) [14:36:59] PROBLEM - dhclient process on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [14:37:11] PROBLEM - DPKG on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:37:52] notebook1004 is sadly a corner case that escapes the cgroups memory limits, since multiple people are basically using all the ram available [14:38:11] (we currently have per user memory limit, not all users) [14:39:04] <_joe_> !log uploading python3-docker-report to {buster,stretch}-wikimedia, T241206 [14:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:07] T241206: Report image metadata to debmonitor - https://phabricator.wikimedia.org/T241206 [14:39:45] RECOVERY - Check size of conntrack table on notebook1004 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [14:39:49] RECOVERY - Check systemd state on notebook1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:40:17] RECOVERY - MD RAID on notebook1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [14:40:19] RECOVERY - dhclient process on notebook1004 is OK: PROCS OK: 0 processes with command name 
dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [14:40:23] RECOVERY - Check whether ferm is active by checking the default input chain on notebook1004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:40:33] RECOVERY - DPKG on notebook1004 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:40:37] RECOVERY - configured eth on notebook1004 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [14:40:55] PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.43:443]) https://wikitech.wikimedia.org/wiki/PyBal [14:40:55] PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.43:443]) https://wikitech.wikimedia.org/wiki/PyBal [14:41:17] PROBLEM - PyBal IPVS diff check on lvs2006 is CRITICAL: CRITICAL: Services in IPVS but unknown to PyBal: set([10.2.1.43:8190]) https://wikitech.wikimedia.org/wiki/PyBal [14:42:39] (03CR) 10Elukey: [C: 03+2] hive: store delegation tokens in the db [puppet] - 10https://gerrit.wikimedia.org/r/562474 (https://phabricator.wikimedia.org/T238560) (owner: 10Elukey) [14:43:11] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [14:43:47] PROBLEM - PyBal IPVS diff check on lvs2003 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.43:443]) https://wikitech.wikimedia.org/wiki/PyBal [14:44:51] those are fine ^ A service is changing ports [14:45:27] (03PS1) 10Muehlenhoff: Readd conditional pinning [puppet] - 10https://gerrit.wikimedia.org/r/562524 [14:45:29] (03CR) 10Herron: [C: 03+2] ganeti: apply ferm regardless of ganeti_cluster fact [puppet] - 10https://gerrit.wikimedia.org/r/559879 (owner: 10Herron) [14:46:10] RECOVERY - PyBal 
IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [14:46:10] RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [14:46:34] RECOVERY - PyBal IPVS diff check on lvs2006 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [14:48:04] (03PS1) 10Giuseppe Lavagetto: profile::docker::reporter: periodically generate reports to debmonitor. [puppet] - 10https://gerrit.wikimedia.org/r/562526 (https://phabricator.wikimedia.org/T241206) [14:48:50] RECOVERY - PyBal IPVS diff check on lvs2003 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [14:50:34] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562491 (https://phabricator.wikimedia.org/T241459) (owner: 10Jbond) [14:51:37] (03PS1) 10Filippo Giunchedi: WIP puppetvagrant role [puppet] - 10https://gerrit.wikimedia.org/r/562527 [14:52:48] (03CR) 10Jbond: [C: 03+2] motd: fix awk command for parsing puppet configuration_version [puppet] - 10https://gerrit.wikimedia.org/r/562491 (https://phabricator.wikimedia.org/T241459) (owner: 10Jbond) [14:53:43] (03CR) 10jerkins-bot: [V: 04-1] WIP puppetvagrant role [puppet] - 10https://gerrit.wikimedia.org/r/562527 (owner: 10Filippo Giunchedi) [14:54:26] (03PS2) 10Giuseppe Lavagetto: profile::docker::reporter: periodically generate reports to debmonitor. 
[puppet] - 10https://gerrit.wikimedia.org/r/562526 (https://phabricator.wikimedia.org/T241206) [14:54:28] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20232/" [puppet] - 10https://gerrit.wikimedia.org/r/562524 (owner: 10Muehlenhoff) [14:56:06] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/20235/" [puppet] - 10https://gerrit.wikimedia.org/r/562294 (owner: 10Herron) [14:56:12] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (10geraki) I was wrong in the last comment. It happens when the semicolon is last character in a **short url**.... [14:56:13] (03CR) 10Herron: [C: 03+2] profile::ganeti switch from hiera() to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/562294 (owner: 10Herron) [14:56:17] (03PS2) 10Herron: profile::ganeti switch from hiera() to lookup() [puppet] - 10https://gerrit.wikimedia.org/r/562294 [14:56:19] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/562524 (owner: 10Muehlenhoff) [14:56:49] (03PS4) 10Ottomata: Switch eventgate-analytics LVS to use TLS port 4192 [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) [14:58:36] RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1004 is OK: OK: synced at Tue 2020-01-07 14:58:35 UTC. https://wikitech.wikimedia.org/wiki/NTP [14:58:40] (03PS4) 10Ottomata: Switch eventgate-main LVS to use TLS port 4292 [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073) [14:59:02] (03CR) 10Muehlenhoff: [C: 03+2] Readd conditional pinning [puppet] - 10https://gerrit.wikimedia.org/r/562524 (owner: 10Muehlenhoff) [15:02:15] (03PS3) 10Giuseppe Lavagetto: profile::docker::reporter: periodically generate reports to debmonitor. 
[puppet] - 10https://gerrit.wikimedia.org/r/562526 (https://phabricator.wikimedia.org/T241206) [15:04:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1088 for upgrade', diff saved to https://phabricator.wikimedia.org/P10067 and previous config saved to /var/cache/conftool/dbconfig/20200107-150440-marostegui.json [15:04:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:48] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/20237/boron.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/562526 (https://phabricator.wikimedia.org/T241206) (owner: 10Giuseppe Lavagetto) [15:07:31] 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb deamons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [15:09:22] !log reimaging mw2282 [15:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:06] 10Operations, 10ops-eqiad, 10Dumps-Generation: (No Need By Date) rack/setup/install snapshot1010.eqiad.wmnet - https://phabricator.wikimedia.org/T241794 (10ArielGlenn) Might I be able to get this by Jan 25? This will allow me to do set-up and have it ready to go by Feb 1st. 
[15:11:16] !log installing urldownloader2001 T241979 [15:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:18] T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979 [15:16:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10068 and previous config saved to /var/cache/conftool/dbconfig/20200107-151633-marostegui.json [15:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:54] (03PS1) 10Alexandros Kosiaris: otrs: Add otrs-admins group [puppet] - 10https://gerrit.wikimedia.org/r/562530 (https://phabricator.wikimedia.org/T242113) [15:20:34] (03CR) 10Alex Monk: "Well it's called otrs-roots in the diff but otherwise LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562530 (https://phabricator.wikimedia.org/T242113) (owner: 10Alexandros Kosiaris) [15:21:19] (03CR) 10Alexandros Kosiaris: [C: 03+2] otrs: Add otrs-admins group [puppet] - 10https://gerrit.wikimedia.org/r/562530 (https://phabricator.wikimedia.org/T242113) (owner: 10Alexandros Kosiaris) [15:22:35] (03PS1) 10Tchanders: Disable banner on Special:Block for partial blocks early-adopters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562531 (https://phabricator.wikimedia.org/T240300) [15:22:54] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:23:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10069 and previous config saved to /var/cache/conftool/dbconfig/20200107-152304-marostegui.json [15:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:57] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [15:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:08] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [15:30:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:44] (03PS1) 10Bstorm: toolforge-k8s: switch extraVolumes to an array [puppet] - 10https://gerrit.wikimedia.org/r/562532 (https://phabricator.wikimedia.org/T242067) [15:39:49] (03PS1) 10Ema: ATS: add webrequest logging for atskafka [puppet] - 10https://gerrit.wikimedia.org/r/562535 (https://phabricator.wikimedia.org/T237993) [15:41:00] !log installing urldownloader2002 T241979 [15:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:03] T241979: eqiad/codfw: 2 VM request for URL downloaders - https://phabricator.wikimedia.org/T241979 [15:44:29] !log shutting down db2076 for FW upgrade [15:44:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10070 and previous config saved to /var/cache/conftool/dbconfig/20200107-154529-marostegui.json [15:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:47] (03CR) 10Vgutierrez: "recheck" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/562493 (https://phabricator.wikimedia.org/T242093) (owner: 10Vgutierrez) [15:47:18] 10Operations, 10Core Platform Team, 10TechCom, 10User-mobrovac: 
Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825 (10Joe) 05Open→03Resolved [15:49:54] (03PS1) 10Elukey: openldap: add Hue note to offboard-user.py [puppet] - 10https://gerrit.wikimedia.org/r/562537 [15:57:37] (03PS1) 10Herron: ganeti: assign ganeti500[123] role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) [15:59:33] (03PS2) 10Herron: ganeti: assign ganeti500[123] role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) [16:01:34] (03CR) 10Muehlenhoff: openldap: add Hue note to offboard-user.py (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562537 (owner: 10Elukey) [16:02:45] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:02:55] 10Operations, 10ops-eqiad, 10Dumps-Generation: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet - https://phabricator.wikimedia.org/T241794 (10wiki_willy) [16:03:45] (03PS1) 10Ottomata: Use https://schema.discovery.wmnet for refine mediawiki_events job [puppet] - 10https://gerrit.wikimedia.org/r/562541 (https://phabricator.wikimedia.org/T233630) [16:04:21] (03PS2) 10Ottomata: Use https://schema.discovery.wmnet for refine mediawiki_events job [puppet] - 10https://gerrit.wikimedia.org/r/562541 (https://phabricator.wikimedia.org/T233630) [16:04:32] (03CR) 10Elukey: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562537 (owner: 10Elukey) [16:06:19] 10Operations, 10Elasticsearch, 10Traffic, 10Discovery-Search (Current work), and 2 others: Sustained periods (2-4h) of bad latency on production-search eqiad - https://phabricator.wikimedia.org/T241421 (10Gehel) >>! In T241421#5773665, @ema wrote: > Clearly, however, this can easily turn in a cat and mouse... 
[16:06:25] (03CR) 10jerkins-bot: [V: 04-1] Use https://schema.discovery.wmnet for refine mediawiki_events job [puppet] - 10https://gerrit.wikimedia.org/r/562541 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [16:07:53] (03CR) 10Muehlenhoff: [C: 03+1] openldap: add Hue note to offboard-user.py (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562537 (owner: 10Elukey) [16:10:08] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10SNowick_WMF) Hi @Nuria, I already have access to stats and notebooks, sorry for the confusion, this ticket has all the info for that completed... [16:10:34] !log cr1/cr2-eqiad: set port 443 (was 8190) for term schema in analytics-in4 [16:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:41] (03PS1) 10Elukey: Set port 443 (was 8190) for term schema in analytics-in4 [homer/public] - 10https://gerrit.wikimedia.org/r/562543 [16:12:03] (03CR) 10Elukey: "Already applied manually on cr1/cr2 eqiad :)" [homer/public] - 10https://gerrit.wikimedia.org/r/562543 (owner: 10Elukey) [16:13:53] (03PS4) 10Jhedden: lvs ceph: add cloudceph service and cluster [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) [16:13:59] (03CR) 10Elukey: [C: 03+2] openldap: add Hue note to offboard-user.py [puppet] - 10https://gerrit.wikimedia.org/r/562537 (owner: 10Elukey) [16:14:19] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10Nuria) @SNowick_WMF Are you just missing access to ldap? can you access https://turnilo.wikimedia.org? 
[16:16:19] (03CR) 10Dbarratt: [C: 03+1] Disable banner on Special:Block for partial blocks early-adopters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562531 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders) [16:16:46] (03PS3) 10Ottomata: Use https://schema.discovery.wmnet for refine mediawiki_events job [puppet] - 10https://gerrit.wikimedia.org/r/562541 (https://phabricator.wikimedia.org/T233630) [16:19:47] 10Operations, 10ops-codfw, 10DBA: Upgrade BIOS and firmware on db2076 - https://phabricator.wikimedia.org/T241647 (10Papaul) a:05Papaul→03Marostegui Before BIOS Version 2.4.3 Firmware Version 2.40.40.40 After BIOS Version 2.11.0 Firmware Version 2.70.70.70 @Marostegui complete [16:20:46] (03CR) 10Ottomata: [C: 03+2] Use https://schema.discovery.wmnet for refine mediawiki_events job [puppet] - 10https://gerrit.wikimedia.org/r/562541 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [16:21:06] (03PS1) 10Muehlenhoff: Inline a variant of apt::pin to package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 [16:21:38] (03PS3) 10Herron: ganeti: assign ganeti500[123] role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) [16:21:49] (03CR) 10jerkins-bot: [V: 04-1] Inline a variant of apt::pin to package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 (owner: 10Muehlenhoff) [16:22:14] (03CR) 10Muehlenhoff: [C: 03+1] ganeti: assign ganeti500[123] role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:23:06] (03PS2) 10Muehlenhoff: Inline a variant of apt::pin to package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 [16:25:05] (03CR) 10jerkins-bot: [V: 04-1] Inline a variant of apt::pin to package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 (owner: 10Muehlenhoff) [16:26:54] (03PS3) 10Muehlenhoff: Inline a variant of apt::pin to 
package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/562544 [16:27:39] (03CR) 10Dmaza: [C: 03+1] Disable banner on Special:Block for partial blocks early-adopters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562531 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders) [16:32:32] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10SNowick_WMF) I'm just not able to login to Hue, I can access Turnilo. [16:32:47] 10Operations, 10SRE-Access-Requests, 10Security: Please grant dsharpe temporary access to mendelevium.eqiad.wmnet - https://phabricator.wikimedia.org/T242113 (10MarcoAurelio) [16:34:38] 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Kris Litson - https://phabricator.wikimedia.org/T241722 (10Dzahn) You should use the same username and password you used on https://wikitech.wikimedia.org when you initially created your user there. 
[16:35:17] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - schema_443: Servers schema1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [16:35:35] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - schema_443: Servers schema1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [16:36:07] ottomata: --^ [16:36:24] yeahhhh [16:36:26] (03CR) 10Vgutierrez: "pcc seems happy on the LVS and in cloudcephmon1001: https://puppet-compiler.wmflabs.org/compiler1003/20241/" [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [16:36:27] my fault [16:36:30] trying to fix something [16:36:32] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10Nuria) What is the login you use to access turnilo? 
[16:36:43] should have downtimed those [16:37:28] ahhh okok nevermind [16:37:40] ACKNOWLEDGEMENT - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - schema_443: Servers schema1001.eqiad.wmnet are marked down but pooled ottomata fixing something https://wikitech.wikimedia.org/wiki/PyBal [16:37:40] ACKNOWLEDGEMENT - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([schema1001.eqiad.wmnet]) ottomata fixing something https://wikitech.wikimedia.org/wiki/PyBal [16:37:40] ACKNOWLEDGEMENT - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - schema_443: Servers schema1001.eqiad.wmnet are marked down but pooled ottomata fixing something https://wikitech.wikimedia.org/wiki/PyBal [16:38:38] 10Operations, 10Analytics, 10Product-Analytics, 10SRE-Access-Requests: Requesting access to analytics infrastructure - https://phabricator.wikimedia.org/T242026 (10SNowick_WMF) Shay Nowick [16:39:14] (03CR) 10Jforrester: [C: 03+2] Disable banner on Special:Block for partial blocks early-adopters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562531 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders) [16:39:49] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:39:59] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (No Need By Date Provided) rack/setup/install frban1001.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen) p:05Normal→03High [16:40:11] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:40:16] (03Merged) 10jenkins-bot: Disable banner on Special:Block for partial blocks early-adopters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562531 (https://phabricator.wikimedia.org/T240300) (owner: 10Tchanders) [16:40:25] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: 
(ASAP) rack/setup/install frban1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T234068 (10Jgreen) [16:40:31] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/20242/" [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:40:42] (03CR) 10Herron: [C: 03+2] ganeti: assign ganeti500[123] role::ganeti [puppet] - 10https://gerrit.wikimedia.org/r/562538 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:43:23] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Disable banner on Special:Block for partial blocks early-adopter wikis T240300 (duration: 00m 57s) [16:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:26] T240300: Introduce a temporary banner on Special:Block to inform users about upcoming partial blocks deploy - https://phabricator.wikimedia.org/T240300 [16:46:00] (03CR) 10BryanDavis: [C: 03+1] toolforge-k8s: switch extraVolumes to an array [puppet] - 10https://gerrit.wikimedia.org/r/562532 (https://phabricator.wikimedia.org/T242067) (owner: 10Bstorm) [16:46:25] (03PS2) 10Bstorm: toolforge-k8s: switch extraVolumes to an array [puppet] - 10https://gerrit.wikimedia.org/r/562532 (https://phabricator.wikimedia.org/T242067) [16:47:15] (03PS1) 10Herron: dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562549 (https://phabricator.wikimedia.org/T228099) [16:47:27] (03CR) 10jerkins-bot: [V: 04-1] dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562549 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:49:00] (03CR) 10Bstorm: [C: 03+2] toolforge-k8s: switch extraVolumes to an array [puppet] - 10https://gerrit.wikimedia.org/r/562532 (https://phabricator.wikimedia.org/T242067) (owner: 10Bstorm) [16:52:31] (03Abandoned) 10Herron: dns: add forward/reverse ipv4 records 
ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562549 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [16:56:07] (03PS1) 10Herron: dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562552 (https://phabricator.wikimedia.org/T228099) [17:00:04] godog and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. [17:04:51] (03CR) 10Vgutierrez: [C: 03+1] lvs ceph: add cloudceph service and cluster [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [17:05:05] (03CR) 10Jhedden: [C: 03+2] lvs ceph: add cloudceph service and cluster [puppet] - 10https://gerrit.wikimedia.org/r/559110 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [17:07:56] (03PS1) 10Milimetric: Enable structured-data report [puppet] - 10https://gerrit.wikimedia.org/r/562555 (https://phabricator.wikimedia.org/T239565) [17:10:19] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1001.wikimedia.org [17:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:28] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1002.wikimedia.org [17:10:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:34] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1003.wikimedia.org [17:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:44] jeh: ^^ :) [17:12:18] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2 [17:12:19] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:36] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 126284432 and 11 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [17:13:22] !log restarting pybal on lvs1016 - T240715 [17:13:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:25] T240715: Configure prometheus monitoring for Ceph - https://phabricator.wikimedia.org/T240715 [17:14:16] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 176 and 37 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [17:18:04] !log restarting pybal on lvs1015 - T240715 [17:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:11] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2 (duration: 05m 53s) [17:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P10071 and previous config saved to /var/cache/conftool/dbconfig/20200107-171955-marostegui.json [17:19:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:18] 10Operations, 10ops-codfw, 10DBA: Upgrade BIOS and firmware on db2076 - https://phabricator.wikimedia.org/T241647 (10Marostegui) 05Open→03Resolved a:05Marostegui→03Papaul Thank you Papaul, I have started MySQL. I will repool the host once it has caught up! 
Thanks again [17:21:20] 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui) [17:21:31] 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui) [17:23:52] !log Remove partitions from dewiki.revision from db2089:3315 T239453 [17:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:55] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453 [17:28:29] 10Operations, 10ops-codfw, 10SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241534 (10Papaul) Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below. Your request is being worked... [17:28:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2076 T241647', diff saved to https://phabricator.wikimedia.org/P10072 and previous config saved to /var/cache/conftool/dbconfig/20200107-172839-marostegui.json [17:28:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:43] T241647: Upgrade BIOS and firmware on db2076 - https://phabricator.wikimedia.org/T241647 [17:29:19] !log Stashing at mwdebug1001 [17:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:32] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [17:31:20] !log Run scap pull at mwdebug1001, test over [17:31:21] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:55] (03CR) 10Alexandros Kosiaris: [C: 04-1] "This will remove the old service (assuming we do the manual cleanup as well) and hence all clients configured to talk to it will fail unti" [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata) [17:32:04] (03CR) 10Alexandros Kosiaris: [C: 04-1] "This will remove the old service (assuming we do the manual cleanup as well) and hence all clients configured to talk to it will fail unti" [puppet] - 10https://gerrit.wikimedia.org/r/559168 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata) [17:32:12] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [17:36:30] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [17:38:02] PROBLEM - LVS HTTP IPv4 on cloudceph.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [17:40:29] ^ downtimed the cloudceph.svc.eqiad.wmnet icinga host and services while I'm working on it [18:00:04] cscott, arlolra, subbu, halfak, and accraze: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / Parsoid / Citoid / ORES. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T1800). [18:00:32] (03PS1) 10Herron: dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) [18:00:57] (03CR) 10jerkins-bot: [V: 04-1] dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [18:02:26] (03PS2) 10Herron: dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) [18:02:56] (03CR) 10jerkins-bot: [V: 04-1] dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [18:04:27] (03PS3) 10Herron: dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) [18:10:03] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 45433096 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [18:10:03] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 54579496 and 8 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [18:12:02] 10Operations, 10Wikimedia-Mailing-lists: Rename and transfer admin of Malaysia Mailing List (wikimediamy) - https://phabricator.wikimedia.org/T241988 (10Dzahn) >>! In T241988#5781080, @Exec8 wrote: > There are still issues: > > 1. I checked the [[ https://lists.wikimedia.org/mailman/listinfo | roster of maili... 
[18:13:17] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1360 and 95 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [18:13:17] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 96 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [18:18:34] Hi, about to schedule https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/559612/ for the evening SWAT tonight [18:20:21] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/562560 (owner: 10Filippo Giunchedi) [18:22:37] Urbanecm - please can you just confirm https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T0000 is correct? [18:23:12] NicholasG04: looks good - you just need to be around at midnight UTC [18:23:34] Will be! Don't expect anything to go wrong though [18:23:58] For a throttle rule? I don't think he really does [18:24:03] It's not like anything can be tested [18:24:12] Exactly haha [18:24:29] As long as it's syntactically valid... And matches the request IP/values/times [18:24:59] Someday™ we should finish and deploy the extension for managing throttle rules [18:25:17] :) [18:25:24] Reedy: maybe something we could pair on after all hands? [18:25:36] I can't remember how much is outstanding with it [18:25:52] If I remember right it is really close to {{done}}. Like just some edit issues left? 
[18:25:54] But might not be a bad idea if we can get it over whatever hurdles are left [18:26:04] (03PS1) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [18:26:05] Yeah, maybe a couple of UI/UX type issues [18:26:22] (03PS1) 10Herron: install_server: add entries for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562575 (https://phabricator.wikimedia.org/T239151) [18:27:32] (03CR) 10jerkins-bot: [V: 04-1] install_server: add entries for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562575 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [18:27:48] Reedy: well he doesn't _need_ to be in this very specific case, but it's nice for a GCI student to see a change deployed? Although some "seeable" config's probably better for that [18:27:59] Indeed [18:28:06] The only change is it really appearing on noc [18:28:07] After the cache expires [18:28:19] Hoping it works when it needs to haha [18:28:31] (03PS2) 10Herron: install_server: add entries for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562575 (https://phabricator.wikimedia.org/T239151) [18:30:16] (03CR) 10Herron: [C: 03+2] dns: move gerrit-test to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/562564 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [18:31:21] (03CR) 10Herron: [C: 03+2] install_server: add entries for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562575 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [18:34:41] (03PS2) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [18:34:55] (03PS2) 10Ottomata: Enable structured-data report [puppet] - 10https://gerrit.wikimedia.org/r/562555 (https://phabricator.wikimedia.org/T239565) (owner: 10Milimetric) [18:35:24] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Enable structured-data 
report [puppet] - 10https://gerrit.wikimedia.org/r/562555 (https://phabricator.wikimedia.org/T239565) (owner: 10Milimetric) [18:37:15] (03PS3) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [18:38:32] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps [18:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:46] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps (duration: 00m 14s) [18:38:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:55] (03PS1) 10Ladsgroup: Set useEntitySourceBasedFederation to true for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562578 (https://phabricator.wikimedia.org/T241971) [18:46:58] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10Aklapper) @WMDE-leszek: I fail to find any mention of the `#WMF-NDA` Phab project tag in the section at https://wikitech.wikimedia.org/wiki/Volunteer_NDA#Privileged_LDA... 
[18:54:06] (03PS2) 10Ladsgroup: Set useEntitySourceBasedFederation to true for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562578 (https://phabricator.wikimedia.org/T241972) [18:59:03] (03PS4) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T1900) [19:02:18] !log 1.35.0-wmf.14 was branched at fb16374c5bdb9d14729f358fb81638fc91640b4f T233862 [19:02:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:21] T233862: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 [19:04:01] (03CR) 10Herron: [C: 03+2] dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562552 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [19:04:05] (03PS2) 10Herron: dns: add forward/reverse ipv4 records ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562552 (https://phabricator.wikimedia.org/T228099) [19:10:04] herron: there is a typo in this patch [19:10:35] volans: thanks [19:11:52] (03PS5) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [19:13:00] (03PS1) 10Herron: dns: fix typo in ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562586 (https://phabricator.wikimedia.org/T228099) [19:13:47] volans: ^ this is what you meant right? [19:15:19] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10WMDE-leszek) Sorry @AKlapper, I was wrong. We have copied the tag over in the template we're using, while it indeed is not needed. I am fixing WMDE template now. Earli... 
[19:15:20] herron: yep [19:15:28] (03CR) 10Volans: [C: 03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/562586 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [19:15:42] great thanks! [19:16:06] (03CR) 10Herron: [C: 03+2] dns: fix typo in ganeti01.svc.eqsin.wmnet [dns] - 10https://gerrit.wikimedia.org/r/562586 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [19:18:03] (03PS1) 10Herron: gerrit: assign host gerrit-test role::gerrit [puppet] - 10https://gerrit.wikimedia.org/r/562587 (https://phabricator.wikimedia.org/T239151) [19:18:50] 10Operations, 10SRE-Access-Requests: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (10Clarakosi) [19:20:58] (03PS6) 10Jhedden: ceph: Update ceph role system::role name [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [19:21:42] (03PS1) 10EBernhardson: Enable hive kerberos connections from search/airflow instances [puppet] - 10https://gerrit.wikimedia.org/r/562589 [19:22:26] (03CR) 10Paladox: [C: 03+1] "Awesome!" [puppet] - 10https://gerrit.wikimedia.org/r/562587 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [19:22:41] 10Operations, 10Gerrit, 10vm-requests, 10Patch-For-Review: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (10herron) gerrit-test.wikimedia.org VM has been created on row_C, and I've uploaded a patch to assign it role::gerrit with https://gerrit.wikimedia.org/r/#/c/operations/p... [19:23:07] 10Operations, 10SRE-Access-Requests: Requesting access to RESTBase for clarakosi - https://phabricator.wikimedia.org/T242152 (10WDoranWMF) Hi, I'm Clara's Manager and I approve this.
[19:24:32] (03PS7) 10Jhedden: ceph: Update ceph role desc and lvs pool map [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) [19:25:30] (03CR) 10EBernhardson: "I'm not sure if this is all that is needed, but it was the only difference I noticed comparing hive config on an-airflow1001 vs stat1007. " [puppet] - 10https://gerrit.wikimedia.org/r/562589 (owner: 10EBernhardson) [19:26:18] (03CR) 10Jhedden: [C: 03+2] "PCC results https://puppet-compiler.wmflabs.org/compiler1001/20250/" [puppet] - 10https://gerrit.wikimedia.org/r/562574 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [19:28:07] RECOVERY - LVS HTTP IPv4 on cloudceph.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 362 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [19:28:53] !log mwscript createAndPromote.php foundationwiki 'Jdforrester (WMF)' --force --custom-groups=interface-admin for T241950 [19:28:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:53] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [19:38:14] 10Operations, 10LDAP-Access-Requests: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (10Moushira) Hi @Dzahn yes, we can also do that. I changed the associated email to my wikimedia .org email. Can't log in to superset though. 
[19:57:05] jouncebot: now [19:57:05] For the next 0 hour(s) and 2 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T1900) [19:58:06] 10Operations, 10ops-codfw, 10SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241534 (10Papaul) Hello Papaul, Thank you for your response and sharing the screen shot of the failed HDD. I have ordered the hard drive (SSD) and it would be shipped to the servers address share... [19:59:31] (03PS1) 10Ottomata: Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 [19:59:41] (03PS1) 10Jeena Huneidi: testwikis wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562595 [19:59:43] (03CR) 10Jeena Huneidi: [C: 03+2] testwikis wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562595 (owner: 10Jeena Huneidi) [20:00:04] longma and liw: (Dis)respected human, time to deploy Mediawiki train - American+European Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200107T2000). Please do the needful. 
[20:00:55] (03Merged) 10jenkins-bot: testwikis wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562595 (owner: 10Jeena Huneidi) [20:01:28] !log jhuneidi@deploy1001 Started scap: testwikis wikis to 1.35.0-wmf.14 refs T233862 [20:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:01:28] T233862: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 [20:01:43] (03CR) 10jerkins-bot: [V: 04-1] Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 (owner: 10Ottomata) [20:03:40] (03PS2) 10Ottomata: Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 [20:05:43] (03CR) 10jerkins-bot: [V: 04-1] Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 (owner: 10Ottomata) [20:07:04] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark [20:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:12] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T242080 (10RStallman-legalteam) @ Silvan_WMDE - I will send you the NDA via Docusign. The link will be emailed to your WMDE account. Thanks! 
[20:07:14] (03PS3) 10Ottomata: Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 [20:08:53] 10Operations, 10Traffic, 10Performance-Team (Radar): Add profiling for Varnish and VCL - https://phabricator.wikimedia.org/T175710 (10Krinkle) [20:09:15] (03CR) 10jerkins-bot: [V: 04-1] Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 (owner: 10Ottomata) [20:10:35] (03PS4) 10Ottomata: Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 [20:11:48] 10Operations, 10ops-codfw, 10Core Platform Team: (No Need By Date Provided) rack/setup/install restbase202[123] - https://phabricator.wikimedia.org/T241790 (10Papaul) [20:12:29] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark (duration: 05m 13s) [20:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:27] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler1002/20253/schema1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/562594 (owner: 10Ottomata) [20:14:29] (03CR) 10Ottomata: [C: 03+2] Use https://schema.wikimedia.org for client side pretty-autoindex js [puppet] - 10https://gerrit.wikimedia.org/r/562594 (owner: 10Ottomata) [20:27:15] (03CR) 10Dzahn: [C: 03+1] caching-proxy: squid vs squid3 paths [puppet] - 10https://gerrit.wikimedia.org/r/562560 (owner: 10Filippo Giunchedi) [20:28:19] (03PS5) 10Ottomata: Set up new LVS service eventgate-analytics-https [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) [20:29:19] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link 
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [20:30:11] !log jhuneidi@deploy1001 Finished scap: testwikis wikis to 1.35.0-wmf.14 refs T233862 (duration: 29m 01s) [20:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:14] T233862: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 [20:30:26] (03CR) 10jerkins-bot: [V: 04-1] Set up new LVS service eventgate-analytics-https [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata) [20:31:47] (03CR) 10Ottomata: Set up new LVS service eventgate-analytics-https (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/559167 (https://phabricator.wikimedia.org/T241073) (owner: 10Ottomata) [20:31:59] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [20:33:26] (03CR) 10Ottomata: New eventstreams chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/551843 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [20:33:46] (03PS1) 10Jdlrobson: Disable beta (for now) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562607 (https://phabricator.wikimedia.org/T237290) [20:33:49] 10Operations, 10Design-Research, 10Domains, 10Traffic: Register a domain and redirect URL - https://phabricator.wikimedia.org/T241944 (10CRoslof) I've registered wikipersonas.org. If it turns out we aren't actually going to use it, though, or if we are going to use a different domain name instead, please l... 
[20:36:01] (03PS2) 10Jforrester: Disable MobileFrontend's beta mode on all wikis (for now) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562607 (https://phabricator.wikimedia.org/T237290) (owner: 10Jdlrobson) [20:36:15] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark [20:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:44] (03PS1) 10Jeena Huneidi: group0 wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562609 [20:36:46] (03CR) 10Jeena Huneidi: [C: 03+2] group0 wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562609 (owner: 10Jeena Huneidi) [20:37:51] (03Merged) 10jenkins-bot: group0 wikis to 1.35.0-wmf.14 refs T233862 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562609 (owner: 10Jeena Huneidi) [20:39:13] (03PS1) 10Jhedden: ceph: update ferm for prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/562610 (https://phabricator.wikimedia.org/T240715) [20:40:32] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.14 refs T233862 [20:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:35] T233862: 1.35.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T233862 [20:41:27] (03CR) 10Jhedden: [C: 03+2] "PCC results" [puppet] - 10https://gerrit.wikimedia.org/r/562610 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [20:41:59] (03CR) 10Aaron Schulz: [C: 03+1] mediawiki: Capture shutdown/destruct backtrace in php7-fatal-error.php [puppet] - 10https://gerrit.wikimedia.org/r/559262 (https://phabricator.wikimedia.org/T241097) (owner: 10Krinkle) [20:42:09] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark (duration: 05m 55s) [20:42:10] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:35] (03PS1) 10Herron: dns: add forward/reverse records for netflow5001 [dns] - 10https://gerrit.wikimedia.org/r/562613 (https://phabricator.wikimedia.org/T228099) [20:43:42] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:44:33] (03PS1) 10Jhedden: Revert "ceph: update ferm for prometheus exporter" [puppet] - 10https://gerrit.wikimedia.org/r/562614 [20:45:01] (03CR) 10Jhedden: [C: 03+2] Revert "ceph: update ferm for prometheus exporter" [puppet] - 10https://gerrit.wikimedia.org/r/562614 (owner: 10Jhedden) [20:45:45] (03CR) 10Herron: [C: 03+2] dns: add forward/reverse records for netflow5001 [dns] - 10https://gerrit.wikimedia.org/r/562613 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [20:45:49] (03CR) 10Jhedden: [V: 03+1 C: 03+2] Revert "ceph: update ferm for prometheus exporter" [puppet] - 10https://gerrit.wikimedia.org/r/562614 (owner: 10Jhedden) [20:45:58] (03CR) 10Jhedden: [V: 03+2 C: 03+2] Revert "ceph: update ferm for prometheus exporter" [puppet] - 10https://gerrit.wikimedia.org/r/562614 (owner: 10Jhedden) [20:51:04] (03CR) 10Dzahn: [C: 04-1] "per IRC, we should wait with this and first disable monitoring and check where in Hiera we need to add it and how we can deal with temp. 3" [puppet] - 10https://gerrit.wikimedia.org/r/562587 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [20:51:06] 10Operations, 10Performance-Team, 10Traffic, 10observability: Ensure graphs used by Performance account for Varnish-to-ATS migration - https://phabricator.wikimedia.org/T233474 (10Krinkle) [20:51:19] (03CR) 10Thcipriani: [C: 04-1] "Awesome! Thanks for this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/562587 (https://phabricator.wikimedia.org/T239151) (owner: 10Herron) [21:00:26] (03PS1) 10Jhedden: ceph: add prometheus servers to ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/562616 (https://phabricator.wikimedia.org/T240715) [21:01:11] (03PS1) 10Herron: install_server: add netflow5001 dhcp entry [puppet] - 10https://gerrit.wikimedia.org/r/562617 (https://phabricator.wikimedia.org/T228099) [21:02:06] (03CR) 10Jhedden: [C: 03+2] "PCC results https://puppet-compiler.wmflabs.org/compiler1001/20255/" [puppet] - 10https://gerrit.wikimedia.org/r/562616 (https://phabricator.wikimedia.org/T240715) (owner: 10Jhedden) [21:02:36] (03CR) 10jerkins-bot: [V: 04-1] install_server: add netflow5001 dhcp entry [puppet] - 10https://gerrit.wikimedia.org/r/562617 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [21:03:45] (03PS2) 10Herron: install_server: add netflow5001 dhcp entry [puppet] - 10https://gerrit.wikimedia.org/r/562617 (https://phabricator.wikimedia.org/T228099) [21:05:38] (03CR) 10Herron: [C: 03+2] install_server: add netflow5001 dhcp entry [puppet] - 10https://gerrit.wikimedia.org/r/562617 (https://phabricator.wikimedia.org/T228099) (owner: 10Herron) [21:05:49] (03PS3) 10Herron: install_server: add netflow5001 dhcp entry [puppet] - 10https://gerrit.wikimedia.org/r/562617 (https://phabricator.wikimedia.org/T228099) [21:07:07] (03CR) 10Thcipriani: [C: 03+1] "+1 based entirely off of commit message -- doesn't sound like it should change much for gerrit." 
[puppet] - 10https://gerrit.wikimedia.org/r/562285 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [21:08:40] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 68456456 and 9 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [21:11:52] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 87744 and 98 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [21:12:31] (03PS1) 10Paladox: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 [21:15:33] (03PS2) 10Paladox: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 [21:18:47] (03PS3) 10Paladox: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 [21:25:15] (03CR) 10Dzahn: profile::url_downloader: Add types and switch to lookup() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562472 (owner: 10Muehlenhoff) [21:30:06] (03PS1) 10Dzahn: base/icinga: disable notifications and some monitoring for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562619 (https://phabricator.wikimedia.org/T239151) [21:32:50] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (10Jgreen) p:05Normal→03High [21:33:04] (03CR) 10Dzahn: "This change is ready for review." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562618 (owner: 10Paladox) [21:33:34] (03PS4) 10Paladox: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 [21:33:48] (03CR) 10Paladox: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562618 (owner: 10Paladox) [21:47:59] (03PS1) 10Dzahn: add IPv6 records for gerrit-test.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/562622 (https://phabricator.wikimedia.org/T239151) [21:48:13] PROBLEM - Disk space on notebook1004 is CRITICAL: DISK CRITICAL - free space: /srv 5080 MB (3% inode=80%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=notebook1004&var-datasource=eqiad+prometheus/ops [21:50:08] (03CR) 10Dzahn: [C: 03+2] add IPv6 records for gerrit-test.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/562622 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [21:50:21] (03PS1) 10Ottomata: Use new schemas/event/{primary,secondary} in staging eventgate services [deployment-charts] - 10https://gerrit.wikimedia.org/r/562623 (https://phabricator.wikimedia.org/T240985) [21:50:26] (03CR) 10Dzahn: "IP is already assigned on eno1" [dns] - 10https://gerrit.wikimedia.org/r/562622 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [21:51:46] (03CR) 10Dzahn: "[authdns1001:~] $ host gerrit-test.wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/562622 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [21:57:24] (03CR) 10Dzahn: [C: 03+2] Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 (owner: 10Paladox) [21:57:35] (03PS5) 10Dzahn: Gerrit: Tweak hiera values for gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562618 (owner: 10Paladox) [21:58:32] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) 
rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (10Jgreen) [21:59:33] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: (ASAP) rack/setup/install frnetmon1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T232137 (10Jgreen) [22:04:45] (03CR) 10Herron: [C: 03+1] base/icinga: disable notifications and some monitoring for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562619 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [22:06:09] (03PS1) 10Ottomata: Use relative nginx redirects for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) [22:06:55] (03CR) 10jerkins-bot: [V: 04-1] Use relative nginx redirects for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [22:07:35] (03PS2) 10Ottomata: Use relative nginx redirects for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) [22:07:53] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler1003/20256/schema1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [22:08:06] (03PS3) 10Ottomata: Use relative nginx redirects for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) [22:08:51] (03PS2) 10Dzahn: base/icinga: disable notifications and some monitoring for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562619 (https://phabricator.wikimedia.org/T239151) [22:09:10] (03CR) 10Ottomata: [C: 03+2] Use relative nginx redirects for schema.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/562624 (https://phabricator.wikimedia.org/T233630) (owner: 10Ottomata) [22:09:34] (03PS1) 10Dzahn: gerrit::proxy: fix typo in "reploca_hosts" [puppet] - 
10https://gerrit.wikimedia.org/r/562627 [22:10:39] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 20636552 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [22:11:42] (03CR) 10Paladox: [C: 03+1] gerrit::proxy: fix typo in "reploca_hosts" [puppet] - 10https://gerrit.wikimedia.org/r/562627 (owner: 10Dzahn) [22:12:27] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 160688 and 42 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [22:12:28] (03CR) 10Paladox: [C: 03+1] base/icinga: disable notifications and some monitoring for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/562619 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [22:17:41] (03CR) 10Dzahn: [C: 03+2] "noop https://puppet-compiler.wmflabs.org/compiler1001/20257/" [puppet] - 10https://gerrit.wikimedia.org/r/562627 (owner: 10Dzahn) [22:19:52] (03CR) 10Dzahn: [C: 03+2] "we will need more follow-up to stop using is_replica for other things but it can't hurt..just affects the host and role isn't applied yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/562619 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [22:24:17] PROBLEM - DPKG on netflow5001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.30: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [22:26:33] PROBLEM - Disk space on netflow5001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.30: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=netflow5001&var-datasource=eqsin+prometheus/ops [22:28:48] ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on netflow5001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.30: Connection reset by peer daniel_zahn https://phabricator.wikimedia.org/T228099 https://wikitech.wikimedia.org/wiki/Microcode [22:28:48] ACKNOWLEDGEMENT - DPKG on netflow5001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.30: Connection reset by peer daniel_zahn https://phabricator.wikimedia.org/T228099 https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [22:28:48] ACKNOWLEDGEMENT - Disk space on netflow5001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.30: Connection reset by peer daniel_zahn https://phabricator.wikimedia.org/T228099 https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=netflow5001&var-datasource=eqsin+prometheus/ops [22:29:30] scheduled 2 hours downtime for that ^ looks WIP [22:31:19] (03CR) 10Alex Monk: [C: 03+1] wmcs: stop the rpcbind service [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [22:31:33] (03CR) 10Jhedden: wmcs: stop the rpcbind service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [22:33:01] (03CR) 10Bstorm: 
wmcs: stop the rpcbind service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [22:33:09] (03PS1) 10BryanDavis: dsh.yaml: Add cloudweb2001-dev as a scap deploy target [puppet] - 10https://gerrit.wikimedia.org/r/562633 (https://phabricator.wikimedia.org/T241251) [22:34:37] RECOVERY - Disk space on netflow5001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=netflow5001&var-datasource=eqsin+prometheus/ops [22:34:47] RECOVERY - DPKG on netflow5001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [22:35:04] mutante:  https://gerrit.wikimedia.org/r/562633 is a trivial merge if you have time. If not I can pester somebody else. [22:35:48] 10Operations: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (10herron) [22:36:28] bd808: that is "add wikitech to dsh groups" great.. yes , i will do that [22:36:49] i was about to comment that it is conftool nowadays.. but it is not for this [22:36:58] i see [22:37:04] *nod* always a snowflake :) [22:37:33] I think we may figure out how to run wikitech and labtestwikitech in the main cluster soon actually [22:37:44] which will be awesome if we can make it happen [22:38:09] and the next step in a 4+ year saga of making that wiki less weird [22:38:09] "-dev" host means "labtest" ? [22:38:48] mutante: yeah cloud2*-dev means it is part of our staging cluster in codfw [22:38:55] 10Operations: rack/setup/install ganeti500[123].eqsin.wmnet - https://phabricator.wikimedia.org/T228099 (10herron) The eqsin ganeti cluster is now up and running, and a first VM `netflow5001` has been created. I'll kick this over to @MoritzMuehlenhoff now to give things a last review and close this out. 
[22:39:00] (03CR) 10Dzahn: [C: 03+2] dsh.yaml: Add cloudweb2001-dev as a scap deploy target [puppet] - 10https://gerrit.wikimedia.org/r/562633 (https://phabricator.wikimedia.org/T241251) (owner: 10BryanDavis) [22:40:35] jouncebot: next [22:40:35] In 1 hour(s) and 19 minute(s): Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200108T0000) [22:40:51] (03PS3) 10RLazarus: Test multiple hosts in parallel. [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 [22:43:21] (03CR) 10RLazarus: "Dusting this off after the break. :) See also https://gerrit.wikimedia.org/r/555515 also pending review, which this change depends on. Tha" (034 comments) [software/httpbb] - 10https://gerrit.wikimedia.org/r/559952 (owner: 10RLazarus) [22:44:19] PROBLEM - Host cp3055 is DOWN: PING CRITICAL - Packet loss = 100% [22:45:02] https://phabricator.wikimedia.org/T240425 ? [22:45:24] looks like it [22:46:02] 10Operations, 10Traffic: cp3055 crashed - https://phabricator.wikimedia.org/T240425 (10Krenair) Down again, 2020-01-07, 22:44:19ish based on icinga IRC message [22:52:34] (03PS1) 10Jhedden: lvs: update cloudceph proxy check url [puppet] - 10https://gerrit.wikimedia.org/r/562637 (https://phabricator.wikimedia.org/T240715) [22:56:21] (03CR) 10Bstorm: wmcs: stop the rpcbind service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [22:58:59] (03CR) 10Jhedden: [C: 03+1] wmcs: stop the rpcbind service [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [23:02:37] !log cp3055.mgmt% racadm serveraction powercycle T240425 [23:02:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:41] T240425: cp3055 crashed - https://phabricator.wikimedia.org/T240425 [23:04:34] Here in case needed for my very small throttle rule on the evening SWAT [23:05:42] RECOVERY - Host cp3055 is UP: PING OK 
- Packet loss = 0%, RTA = 83.37 ms [23:05:55] (03PS1) 10Dzahn: gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) [23:06:42] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 29822600 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [23:06:53] (03CR) 10jerkins-bot: [V: 04-1] gerrit: adjust bacula backup behaviour to deal with multiple hosts [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [23:08:28] (03CR) 10Paladox: gerrit: adjust bacula backup behaviour to deal with multiple hosts (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [23:09:35] 10Operations, 10Traffic: cp3055 crashed - https://phabricator.wikimedia.org/T240425 (10CDanis) Nothing in `racadm getsel` or `racadm lclog view` (latter just has me logging in over SSH). 
[23:09:44] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 44600 and 91 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [23:10:24] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 79599928 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [23:12:00] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 21080 and 77 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [23:14:40] (03PS3) 10Bstorm: wmcs: stop the rpcbind service [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) [23:15:26] (03CR) 10Dzahn: gerrit: adjust bacula backup behaviour to deal with multiple hosts (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [23:19:50] (03CR) 10Dzahn: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/20258/gerrit1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/562639 (https://phabricator.wikimedia.org/T239151) (owner: 10Dzahn) [23:28:34] (03CR) 10Dzahn: [C: 03+2] Switch Gerrit/Phabricator to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/562285 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [23:28:42] (03PS2) 10Dzahn: Switch Gerrit/Phabricator to standard Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/562285 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [23:33:40] (03PS1) 10EBernhardson: spark-env.sh: Allow overriding python version detection [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/562651 [23:36:29] !log [puppetmaster2001:/var/run/confd-template] $ sudo rm .cloudceph*.err [23:36:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:41] (03CR) 
10EBernhardson: "tested by copying the default spark conf dir, and setting SPARK_CONF_DIR to load the desired spark-env.sh. It seems a bit hacky, more idea" [debs/spark2] (debian) - 10https://gerrit.wikimedia.org/r/562651 (owner: 10EBernhardson) [23:37:47] (03CR) 10Bstorm: [C: 03+2] wmcs: stop the rpcbind service [puppet] - 10https://gerrit.wikimedia.org/r/562377 (https://phabricator.wikimedia.org/T241710) (owner: 10Bstorm) [23:38:21] (03CR) 10EBernhardson: "tested by copying the default spark conf dir, copying hive-site.xml from stat1007 to an-airflow1001, and setting SPARK_CONF_DIR to load th" [puppet] - 10https://gerrit.wikimedia.org/r/562589 (owner: 10EBernhardson) [23:40:20] 10Operations, 10DBA: backup2001 crashed 2019-12-08 - https://phabricator.wikimedia.org/T240177 (10Papaul) @jcrespo the last time the system crashed we just upgrade the IDRAC and not the BIOS. The system BIOS version is at 1.37 right now or the new BIOS version for the server is 2.4.8 and according to the Dell... [23:41:05] (03PS1) 10CDanis: confd nrpe: link to new docs section [puppet] - 10https://gerrit.wikimedia.org/r/562654 [23:42:28] (03CR) 10Dzahn: [C: 03+1] confd nrpe: link to new docs section [puppet] - 10https://gerrit.wikimedia.org/r/562654 (owner: 10CDanis) [23:43:09] (03CR) 10CDanis: [C: 03+2] confd nrpe: link to new docs section [puppet] - 10https://gerrit.wikimedia.org/r/562654 (owner: 10CDanis) [23:44:38] RECOVERY - mediawiki-installation DSH group on cloudweb2001-dev is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [23:48:52] (03CR) 10Dzahn: [C: 03+2] "note the "pmtpa" part:) and none of "bugzilla, nfs1, nfs2, sanger or virt1" exist anymore as hosts." [labs/private] - 10https://gerrit.wikimedia.org/r/561909 (owner: 10Dzahn) [23:49:27] Reedy: ^ sanger..
pmtpa :) [23:49:41] lolol [23:50:37] (03CR) 10CDanis: [C: 03+2] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/562385 (owner: 10CDanis) [23:50:39] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10Papaul) backup2001 is at 1.3.7 for BIOS version and the last time we did only the IDRAC upgrade since sometimes when the IDRAC version is not up to date we might not see and log at system crash. so... [23:53:25] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two) [23:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:35] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two) (duration: 00m 10s) [23:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:58:36] SWAT in a couple of minutes??