[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T0000). [00:00:04] stella and odder: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:00:09] Hello! [00:00:34] * odder raises his hand [00:01:06] May everything work well [00:01:11] This is a task for Google Code-in, so how do I deploy the change I made? [00:01:46] Someone will guide you through the process; you just have to wait for now [00:01:55] Alright, thank you! [00:07:04] (03PS3) 10Cwhite: memcached: remove memcached diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469250 (https://phabricator.wikimedia.org/T183454) [00:07:18] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) @cwdent is it new and started at a certain point or has it been like that for a while? I don't see differences in iptables betwee... [00:15:43] !log icinga2001 - killed all nagios processes, restarted nsca service, something is different from icinga1001, service failed when trying to restart (T211641) [00:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:15:47] T211641: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 [00:17:31] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10cwdent) @Dzahn it goes back as far as current syslog, I'll dig back and see when it started. 
Fwiw I can telnet to 208.80.154.84 5667 and... [00:18:57] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) Did anything change now ? I killed and restarted nsca on icinga2001. [00:21:23] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received [00:22:16] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10Dzahn) >>! In T211641#4812507, @cwdent wrote: > @Dzahn it goes back as far as current syslog, I'll dig back and see when it started. Fwi... [00:22:27] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [00:22:41] (03PS1) 10BBlack: authdns config/CI refactor [1/5] [puppet] - 10https://gerrit.wikimedia.org/r/478809 (https://phabricator.wikimedia.org/T205439) [00:22:44] (03PS1) 10BBlack: authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) [00:22:52] (03PS1) 10BBlack: authdns config/CI refactor: [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) [00:22:53] (03PS1) 10BBlack: authdns CI/config refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) [00:22:55] (03PS1) 10BBlack: authdns config/CI refactor: [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) [00:23:24] 10Operations, 10Icinga, 10fundraising-tech-ops: frack / passive icinga checks: Errors connecting to icinga2001.wikimedia.org - https://phabricator.wikimedia.org/T211641 (10cwdent) 05Open>03Resolved @dzahn yep looks good now! 
Thanks :) [00:24:06] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor: [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [00:24:08] (03CR) 10jerkins-bot: [V: 04-1] authdns CI/config refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [00:24:10] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [00:24:12] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor: [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [00:24:43] yes of course CI changes don't pass CI :P [00:36:23] * odder wondering if SWAT window is happening today [00:38:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.246 second response time [00:41:27] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof [00:41:36] James_F [00:41:45] Sorry, I'm in a meeting [00:41:49] I can do it [00:42:19] For odder and ste1la, GCI tasks (today is a deadline) [00:42:22] Thanks Roan. [00:42:35] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:42:38] Thank you! [00:42:42] I'm walking down the street, sorry. [00:43:18] (03PS3) 10Catrope: Add localised logos for the Chewa Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:43:18] * odder ain't doing no GCI tasks, but will take the opportunity anyway [00:43:30] Same, will anyone help me with my deployment? [00:43:33] i can't seem to reach bast4001.wikimedia.org (no ping), is there something wrong with it? 
[00:43:36] (03PS4) 10Catrope: Add localised logos for the Chewa Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:44:12] (03PS4) 10Catrope: Add several HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477960 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:44:54] ste1la: Yes, I'm doing yours first, then odder's [00:45:01] Thank you so much! [00:45:04] (03CR) 10Catrope: [C: 032] Add several HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477960 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:45:41] Sorry for the very long wait, I didn't realize nobody else was around [00:46:15] (03PS3) 10Catrope: Updated InitialiseSettings.php for HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478055 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:46:56] (03Merged) 10jenkins-bot: Add several HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477960 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:46:58] (03CR) 10Catrope: [C: 032] Updated InitialiseSettings.php for HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478055 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:47:02] (03Merged) 10jenkins-bot: Updated InitialiseSettings.php for HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478055 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [00:47:07] HaeB: Apparently it's bast4002 now [00:47:15] nevermind, per https://wikitech.wikimedia.org/wiki/Bastion it's bast4002 now [00:47:18] hah ;) [00:48:09] bast4002 works fine... 
just curious why bast4001 still worked as of this morning - but problem solved [00:50:09] Hmm, looks like mw1272 is down but scap is still trying to deploy to it [00:50:26] !log catrope@deploy1001 Synchronized static/images/project-logos/: Add HD logos for zhwikinews, zhwikivoyage, zhwiktionary (T150618) (duration: 02m 30s) [00:50:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:50:30] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [00:50:50] !log mw1272 is down (does not respond to ping), but scap still tries to deploy to it [00:50:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:52:39] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use new HD logos for zhwiktionary, zhwikivoyage, zhwikinews (T150618) (duration: 01m 16s) [00:52:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:53:17] ACKNOWLEDGEMENT - cassandra-c CQL 10.192.32.139:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.139 and port 9042: Connection refused eevans Decommissioned (T210843) [00:53:17] ACKNOWLEDGEMENT - cassandra-c SSL 10.192.32.139:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused eevans Decommissioned (T210843) [00:54:05] ah: https://phabricator.wikimedia.org/T178592 - robh, maybe i was the last of the mohicans on bast4001 (see above), but maybe not? [00:54:10] And URLs purged [00:54:16] ste1la: Yours should be live now (please verify) [00:54:32] How do I check if it's live? 
[00:55:44] My changes are merged, so would that be it [00:55:54] (03PS2) 10Catrope: Add localised logos for the Chewa Wikipedia (part 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:55:57] (03PS3) 10Catrope: Add localised logos for the Chewa Wikipedia (part 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:55:59] (03CR) 10Catrope: [C: 032] Add localised logos for the Chewa Wikipedia (part 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:56:07] (03PS5) 10Catrope: Add localised logos for the Chewa Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:56:08] (03PS6) 10Catrope: Add localised logos for the Chewa Wikipedia (part 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:56:10] (03CR) 10Catrope: [C: 032] Add localised logos for the Chewa Wikipedia (part 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:56:14] (03Merged) 10jenkins-bot: Add localised logos for the Chewa Wikipedia (part 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:56:18] ste1la: I mean, check that your updated logos actually appear on the Chinese Wikinews, Wikivoyage, etc [00:56:35] Ah, ok. Will do. 
[00:57:11] (03Merged) 10jenkins-bot: Add localised logos for the Chewa Wikipedia (part 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [00:59:26] (robh: re ^ i left a PSA about bast4001 in #wikimedia-analytics, just in case others had still been using it too) [01:00:21] (03CR) 10jenkins-bot: Add several HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477960 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [01:00:23] (03CR) 10jenkins-bot: Updated InitialiseSettings.php for HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478055 (https://phabricator.wikimedia.org/T150618) (owner: 10Stella) [01:00:25] (03CR) 10jenkins-bot: Add localised logos for the Chewa Wikipedia (part 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478628 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [01:00:27] (03CR) 10jenkins-bot: Add localised logos for the Chewa Wikipedia (part 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478632 (https://phabricator.wikimedia.org/T211570) (owner: 10Odder) [01:01:49] !log catrope@deploy1001 Synchronized static/images/project-logos/: Add localised logos for nywiki (T211570) (duration: 01m 00s) [01:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:01:53] T211570: Add a localised logo for the Chewa Wikipedia - https://phabricator.wikimedia.org/T211570 [01:03:25] I think they are live! Thank you! [01:03:50] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Configure localized logos for nywiki (T211570) (duration: 01m 36s) [01:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:04:34] odder: Yours should be live now too, please verify [01:05:03] RoanKattouw: Yup, can confirm they're there [01:05:34] Thanks for your help :) [01:08:09] Alright [01:08:16] That's a wrap then, I see nothing else on the list [01:10:42] Thank you! 
[01:11:49] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1739 MB (3% inode=91%) [01:20:15] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1562 MB (3% inode=90%) [01:21:23] PROBLEM - puppet last run on logstash1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:47:23] RECOVERY - puppet last run on logstash1008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [01:50:23] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1705 MB (3% inode=92%) [02:01:15] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1705 MB (3% inode=92%) [02:06:33] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.532 second response time [02:09:07] 10Operations, 10Sentry, 10vm-requests: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138 (10Tgr) @Dzahn the blocker is that no one is working on it. There is long-term interest from the Web team at least, but unsure when it will be picked up (and setting up Sentry would not be the first... 
[02:10:17] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:10:55] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1482 MB (3% inode=91%) [02:15:45] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1643 MB (3% inode=92%) [02:19:00] 10Operations, 10Core Platform Team (PHP7 (TEC4)), 10Core Platform Team Kanban (Doing), 10HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10CCicalese_WMF) [02:20:35] PROBLEM - Disk space on an-coord1001 is CRITICAL: DISK CRITICAL - free space: / 1478 MB (3% inode=91%) [02:31:57] (03PS4) 10Mathew.onipe: setup: change curator version to 5.3.0 to match our current elasticsearch version [software/spicerack] - 10https://gerrit.wikimedia.org/r/477958 [02:34:28] 10Operations, 10Traffic, 10netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (10ayounsi) >>! In T211079#4812380, @faidon wrote: > - It's been a while, but I believe an import statement in the neighbor block overrides the parent one in its entirety, and does not su... [02:35:47] (03CR) 10Mathew.onipe: setup: change curator version to 5.3.0 to match our current elasticsearch version (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/477958 (owner: 10Mathew.onipe) [02:59:48] (03CR) 10Mathew.onipe: README: update API documentation (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/477565 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [03:07:37] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.554 second response time [03:11:21] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:22] 10Operations, 10Sentry, 10vm-requests: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138 (10Dzahn) Ok, thanks @tgr. I understand, just leaving it stalled. It's fine. 
[03:35:55] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 891.94 seconds [03:46:29] 10Operations, 10Sentry: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138 (10Dzahn) [03:48:28] 10Operations, 10Sentry: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138 (10Dzahn) The reason i kept asking is because of the tag vm-requests and checking that during our clinic duty. So i removed that because currently it's not a request and then you also won't be pinged again by others... [03:52:21] (03CR) 10Mathew.onipe: "Look really nice. Just some few comments." (035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/478030 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [04:06:22] (03CR) 10BearND: [C: 031] admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [04:12:58] (03PS2) 10BBlack: authdns config/CI refactor [1/5] [puppet] - 10https://gerrit.wikimedia.org/r/478809 (https://phabricator.wikimedia.org/T205439) [04:13:00] (03PS2) 10BBlack: authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) [04:13:02] (03PS2) 10BBlack: authdns config/CI refactor [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) [04:13:04] (03PS2) 10BBlack: authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) [04:13:06] (03PS2) 10BBlack: authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) [04:13:56] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [04:13:58] (03CR) 
10jerkins-bot: [V: 04-1] authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [04:25:23] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 221.96 seconds [05:06:51] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.691 second response time [05:10:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:49:19] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.124 second response time [05:53:01] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:57:14] !log Deploy schema change on s4 primary master (db1068) T202167 T86338 [05:57:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:57:20] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [05:57:20] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:01:31] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.003 second response time [06:05:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:06:36] (03PS1) 10Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478828 (https://phabricator.wikimedia.org/T86338) [06:08:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478828 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:09:06] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478828 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:12:01] 10Operations, 10Analytics, 10Security-Team, 10WMF-Legal, 10Software-Licensing: Can exfat be used in WMF production? 
- https://phabricator.wikimedia.org/T210667 (10Legoktm) >>! In T210667#4795435, @JBennett wrote: > Thanks everyone of for their thoughtful consideration. I have no issues nor do I see a co... [06:12:18] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 T86338 T202167 (duration: 02m 55s) [06:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:23] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:12:23] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [06:13:26] !log Deploy schema change on db1109 T202167 T86338 [06:13:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:15:25] (03PS7) 10Robingan7: Add logos. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478498 [06:20:16] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478828 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [06:25:12] 10Operations, 10ops-eqiad, 10DBA, 10User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (10Marostegui) Hey @robh! Thanks for putting up an initial racking plan. It is a bit more complicated than just replacing the masters, as we also have candidate mast... [06:26:43] RECOVERY - Disk space on an-coord1001 is OK: DISK OK [06:28:03] PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[06:35:24] (03PS1) 10Marostegui: mariadb: Install db11[26-38] new DB hosts [puppet] - 10https://gerrit.wikimedia.org/r/478829 (https://phabricator.wikimedia.org/T211613) [06:35:30] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478830 [06:38:38] (03CR) 10Marostegui: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13887/" [puppet] - 10https://gerrit.wikimedia.org/r/478829 (https://phabricator.wikimedia.org/T211613) (owner: 10Marostegui) [06:38:59] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational [06:44:27] 10Operations, 10Release Pipeline: blubber template for nodejs should allow defining configuration files to copy to the container - https://phabricator.wikimedia.org/T211580 (10Joe) >>! In T211580#4810394, @mobrovac wrote: > FYI, we can also have an alternative mount point for the config file, and can tell `ser... [06:45:57] !log Rename flaggedrevs tables on srwikinews on db1078 - T209761 [06:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:01] T209761: Drop FlaggedRevs tables in database for srwikinews - https://phabricator.wikimedia.org/T209761 [06:55:13] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478830 (owner: 10Marostegui) [06:56:17] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478830 (owner: 10Marostegui) [06:58:09] (03PS1) 10Marostegui: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478834 (https://phabricator.wikimedia.org/T86338) [06:58:21] (03PS2) 10Catrope: Enable emails for certain notification types by default on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478799 (https://phabricator.wikimedia.org/T211620) [06:59:18] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478830 (owner: 10Marostegui) [06:59:28] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1109 T86338 T202167 (duration: 02m 52s) [06:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:59:33] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [06:59:34] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:01:31] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478834 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [07:02:40] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478834 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [07:03:46] (03PS2) 10Giuseppe Lavagetto: Hotfix for logging in php-fpm [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478021 (https://phabricator.wikimedia.org/T211184) [07:06:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1104 T86338 T202167 (duration: 02m 51s) [07:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:07] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:06:07] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [07:06:07] !log Deploy schema change on db1104 T202167 T86338 [07:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:25] (03CR) 10Urbanecm: "What happened with this patch? +2'ed, but not merged." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478348 (https://phabricator.wikimedia.org/T211395) (owner: 10Urbanecm) [07:12:29] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478834 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [07:14:31] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.529 second response time [07:18:09] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:20:04] (03CR) 10Giuseppe Lavagetto: [C: 032] Hotfix for logging in php-fpm [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478021 (https://phabricator.wikimedia.org/T211184) (owner: 10Giuseppe Lavagetto) [07:21:06] (03Merged) 10jenkins-bot: Hotfix for logging in php-fpm [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478021 (https://phabricator.wikimedia.org/T211184) (owner: 10Giuseppe Lavagetto) [07:25:26] (03CR) 10jenkins-bot: Hotfix for logging in php-fpm [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478021 (https://phabricator.wikimedia.org/T211184) (owner: 10Giuseppe Lavagetto) [07:28:39] !log oblivian@deploy1001 Synchronized wmf-config/php7.php: Hotfix for logging on php7 (1/2) (duration: 02m 50s) [07:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:29:03] !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mw1272.* [07:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:31:02] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478851 [07:32:33] !log oblivian@deploy1001 Synchronized wmf-config/PhpAutoPrepend.php: Hotfix for logging on php7 (2/2) (duration: 02m 51s) [07:32:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:34:37] (03PS4) 10Giuseppe Lavagetto: shiny_server: change gfortran/g++ dep [puppet] - 
10https://gerrit.wikimedia.org/r/478252 (https://phabricator.wikimedia.org/T168967) (owner: 10Bearloga) [07:45:03] 10Operations, 10Puppet, 10puppet-compiler: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler) - https://phabricator.wikimedia.org/T211547 (10Joe) >>! In T211547#4810535, @herron wrote: > Since we are arguably due for another puppet upgrade, and puppet 5 will be in... [07:45:26] (03CR) 10Giuseppe Lavagetto: [C: 032] shiny_server: change gfortran/g++ dep [puppet] - 10https://gerrit.wikimedia.org/r/478252 (https://phabricator.wikimedia.org/T168967) (owner: 10Bearloga) [07:51:55] (03PS5) 10Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - 10https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) [07:53:57] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478851 (owner: 10Marostegui) [07:55:00] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478851 (owner: 10Marostegui) [07:56:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1104 T86338 T202167 (duration: 00m 46s) [07:56:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:12] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [07:56:12] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:04:03] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478851 (owner: 10Marostegui) [08:10:35] !log decommissioning cassandra-b, restbase2005 -- T210843 [08:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:10:39] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 [08:12:06] (03PS1) 
10Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478861 (https://phabricator.wikimedia.org/T86338) [08:13:17] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478861 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [08:14:20] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478861 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [08:14:55] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.842 second response time [08:15:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1092 T86338 T202167 (duration: 00m 46s) [08:15:29] !log Deploy schema change on db1092 T202167 T86338 [08:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:15:31] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [08:15:31] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [08:15:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:16:45] (03CR) 10Mathew.onipe: [C: 04-1] puppet: add PuppetMaster class (0310 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/477707 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [08:17:08] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478861 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [08:18:39] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:26:36] 10Operations, 10monitoring, 10User-fgiunchedi: Review prometheus_nodes params - https://phabricator.wikimedia.org/T207292 (10Joe) I don't think this is the pattern widely adopted in our codebase. 
What we typically do is: - Create a specific profile for the exporter, so in this case probably `profile::promet... [08:28:09] (03CR) 10Mathew.onipe: [C: 031] "Checked the lvs config against the nodes in each clusters. All seems good!" [puppet] - 10https://gerrit.wikimedia.org/r/475753 (https://phabricator.wikimedia.org/T207195) (owner: 10Gehel) [08:29:22] 10Operations, 10monitoring, 10User-CDanis: puppet-provisioned dashboards not found in Grafana 5 - https://phabricator.wikimedia.org/T211654 (10fgiunchedi) p:05Triage>03Normal [08:31:04] (03CR) 10Filippo Giunchedi: "+traffic folks, I'm not sure we want to split trafficserver from the general "cache proxy" cluster" [puppet] - 10https://gerrit.wikimedia.org/r/478774 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [08:33:56] (03CR) 10Filippo Giunchedi: "> Patch Set 8: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 1020after4) [08:36:10] (03PS3) 10Filippo Giunchedi: conftool: add restbase20[3-8] [puppet] - 10https://gerrit.wikimedia.org/r/478672 (https://phabricator.wikimedia.org/T211416) [08:36:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.661 second response time [08:36:49] (03PS12) 10Giuseppe Lavagetto: httpd::mpm: Also remove mod_php for 7.0 and 7.2 if not prefork [puppet] - 10https://gerrit.wikimedia.org/r/477587 (https://phabricator.wikimedia.org/T208257) (owner: 10Paladox) [08:37:19] (03CR) 10Filippo Giunchedi: [C: 032] conftool: add restbase20[3-8] [puppet] - 10https://gerrit.wikimedia.org/r/478672 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [08:38:43] (03CR) 10Ema: [C: 031] "A few minor comment but the general direction looks sound to me." 
(033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/458115 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:39:38] <_joe_> oh my I hate our ff-only policy so much [08:39:51] (03CR) 10Giuseppe Lavagetto: [C: 032] httpd::mpm: Also remove mod_php for 7.0 and 7.2 if not prefork [puppet] - 10https://gerrit.wikimedia.org/r/477587 (https://phabricator.wikimedia.org/T208257) (owner: 10Paladox) [08:40:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:43] (03PS13) 10Giuseppe Lavagetto: httpd::mpm: Also remove mod_php for 7.0 and 7.2 if not prefork [puppet] - 10https://gerrit.wikimedia.org/r/477587 (https://phabricator.wikimedia.org/T208257) (owner: 10Paladox) [08:40:56] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] httpd::mpm: Also remove mod_php for 7.0 and 7.2 if not prefork [puppet] - 10https://gerrit.wikimedia.org/r/477587 (https://phabricator.wikimedia.org/T208257) (owner: 10Paladox) [08:42:04] mobrovac: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/478673 after merge and puppet run a restbase rolling restart is needed I'm guessing? [08:46:36] (03PS1) 10Filippo Giunchedi: restbase: remove production_ng remnants [puppet] - 10https://gerrit.wikimedia.org/r/478869 (https://phabricator.wikimedia.org/T211416) [08:46:38] (03PS1) 10Filippo Giunchedi: hieradata: replace restbase seeds in codfw [puppet] - 10https://gerrit.wikimedia.org/r/478870 (https://phabricator.wikimedia.org/T211416) [08:47:18] (03CR) 10Ema: "Some comments inline." 
(033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/477707 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [08:47:44] (03Abandoned) 10Urbanecm: Create two extra namespaces on yuewiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/477232 (https://phabricator.wikimedia.org/T205546) (owner: 10Urbanecm) [08:49:11] (03CR) 10Giuseppe Lavagetto: "If any of the files we're mocking are static and available in the puppet repository, we might be better off extracting them from there." [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [08:55:00] (03CR) 10Ema: [C: 031] Add ipmi module (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/478030 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [08:56:24] (03PS1) 10Marostegui: dbproxy1010: Depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/478872 (https://phabricator.wikimedia.org/T86338) [08:58:11] PROBLEM - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 605.52 seconds [08:58:42] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478873 [09:00:23] (03CR) 10Marostegui: [C: 032] dbproxy1010: Depool labsdb1010 [puppet] - 10https://gerrit.wikimedia.org/r/478872 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:00:42] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478873 (owner: 10Marostegui) [09:01:43] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478873 (owner: 10Marostegui) [09:01:48] !log Depool labsdb1010 - T86338 [09:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:01:52] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [09:04:03] !log 
marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1092 T86338 T202167 (duration: 00m 46s) [09:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:08] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [09:05:01] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478875 (https://phabricator.wikimedia.org/T86338) [09:06:13] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478875 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:07:14] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478875 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:08:03] (03PS10) 10Filippo Giunchedi: rsyslog: add UDP localhost compatibility endpoint [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) [09:08:05] (03PS5) 10Filippo Giunchedi: logstash: add new logging kafka consumer [puppet] - 10https://gerrit.wikimedia.org/r/476472 (https://phabricator.wikimedia.org/T205851) [09:08:07] (03PS5) 10Filippo Giunchedi: logstash: copy 'severity' into 'level' where needed [puppet] - 10https://gerrit.wikimedia.org/r/476473 (https://phabricator.wikimedia.org/T205851) [09:08:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1087 T86338 T202167 (duration: 00m 46s) [09:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:19] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [09:08:28] !log Deploy schema change on db1087 with replication (this will generate lag on labsdb:s8) T202167 T86338 [09:08:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:46] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/478873 (owner: 10Marostegui) [09:08:50] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478875 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [09:10:06] (03PS6) 10Ema: cache_text: Vary for PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/478680 (https://phabricator.wikimedia.org/T206339) (owner: 10BBlack) [09:15:10] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) @Anomie very good point, I think it will be very hard for someone to find out about such a whitelist. Things will work for them on... [09:17:14] (03CR) 10Ema: hiera: add trafficserver cluster definition (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/478774 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [09:17:20] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) >>! In T210484#4779199, @TheDJ wrote: > what about ?debug=true ? We already vary on that right ? might as well vary which set of... [09:17:50] 10Operations, 10Traffic, 10Continuous-Integration-Infrastructure (Slipway), 10Patch-For-Review, 10User-ArielGlenn: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (10hashar) CI runs two jobs for `operations/dns`: operations-dns-tabs ==== Does a git shallow... [09:18:00] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.241 second response time [09:19:01] 10Operations, 10Analytics, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) >>! 
In T210484#4794749, @fdans wrote: > Analytics needs x-analytics in every request, not only in debugging ones but we don't need... [09:21:12] (03PS2) 10Ema: hiera: add trafficserver cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/478774 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [09:21:41] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:22:53] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.448 second response time [09:26:29] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:27:02] 10Operations, 10Performance-Team, 10Traffic, 10media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10Gilles) [09:27:10] 10Operations, 10Performance-Team, 10Traffic, 10media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10Gilles) p:05Triage>03Normal [09:27:31] (03PS11) 10Filippo Giunchedi: rsyslog: add UDP localhost compatibility endpoint [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) [09:27:50] mmh, mw1272 has been down for the last 12 hours apparently [09:27:58] (03CR) 10Filippo Giunchedi: [C: 032] rsyslog: add UDP localhost compatibility endpoint [puppet] - 10https://gerrit.wikimedia.org/r/475352 (https://phabricator.wikimedia.org/T205851) (owner: 10Filippo Giunchedi) [09:28:15] RECOVERY - MariaDB Slave Lag: s8 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 221.03 seconds [09:30:02] !log mw1272 down for the past 12h. Nothing in console, power-cycling [09:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:04] <_joe_> ema: don't repool it before doing a scap pull from the server itself [09:31:09] RECOVERY - Host mw1272 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [09:31:18] _joe_: does it auto-repool at boot? 
[09:31:32] <_joe_> nope [09:37:17] PROBLEM - HHVM rendering on mw1348 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:38:21] RECOVERY - HHVM rendering on mw1348 is OK: HTTP OK: HTTP/1.1 200 OK - 74973 bytes in 0.117 second response time [09:42:18] 10Operations, 10HHVM: mw1272 crashed: Bad page map in process hhvm - https://phabricator.wikimedia.org/T211668 (10ema) [09:42:26] 10Operations, 10HHVM: mw1272 crashed: Bad page map in process hhvm - https://phabricator.wikimedia.org/T211668 (10ema) p:05Triage>03Normal [09:42:39] _joe_: FYI ^ [09:43:16] <_joe_> ema: looks like the RAM has issues [09:45:30] 10Operations, 10ops-eqiad, 10HHVM: mw1272 crashed: Bad page map in process hhvm - https://phabricator.wikimedia.org/T211668 (10ema) The problem could be due to bad RAM. @Cmjohnson could you check? [09:45:59] _joe_: yeah maybe. cmjohnson1 pinged on the task! [09:46:53] (03PS1) 10GTirloni: toolforge: Increase shinken 'High iowait' time period to 60min [puppet] - 10https://gerrit.wikimedia.org/r/478880 (https://phabricator.wikimedia.org/T161898) [09:47:42] (03CR) 10GTirloni: [C: 032] toolforge: Increase shinken 'High iowait' time period to 60min [puppet] - 10https://gerrit.wikimedia.org/r/478880 (https://phabricator.wikimedia.org/T161898) (owner: 10GTirloni) [09:49:38] (03PS2) 10ArielGlenn: convert dump scripts to python3 [dumps] - 10https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) [09:57:58] (03CR) 10ArielGlenn: "Well that's interesting. I really expected it to complain about FileNotFoundError. But I'm not complaining." [dumps] - 10https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) (owner: 10ArielGlenn) [10:08:33] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation=list https://grafana.wikimedia.org/dashboard/db/kubernetes-api [10:09:45] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [10:10:46] (03PS2) 10Volans: comments: uniform and add missing Ganeti comments [dns] - 10https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028) [10:10:51] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/data/citation/{format}/{query} (Get citation for Darth Vader) timed out before a response was received [10:11:08] 10Operations, 10Traffic, 10Continuous-Integration-Infrastructure (Slipway), 10Patch-For-Review, 10User-ArielGlenn: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (10BBlack) @hashar - I'm re-working the tools for the linting checks on operations/dns in the c... [10:11:47] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) I checked again tcpdump traffic and the "new"... 
[10:11:57] RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy [10:13:09] (03PS4) 10Volans: validator: complete refactor of the validation [dns] - 10https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028) [10:13:11] (03PS1) 10Volans: validator: improve Ganeti comment check [dns] - 10https://gerrit.wikimedia.org/r/478885 (https://phabricator.wikimedia.org/T182028) [10:13:13] (03PS1) 10Volans: validator: promote clean warnigns to errors [dns] - 10https://gerrit.wikimedia.org/r/478886 (https://phabricator.wikimedia.org/T182028) [10:14:58] (03CR) 10Volans: "I've improved the Ganeti check in I14aaf6db03aea61f1e6d7ea093da453aebd852e4 and cross-checked with the list of existing VMs in Ganeti to f" [dns] - 10https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [10:22:01] PROBLEM - mediawiki-installation DSH group on mw1272 is CRITICAL: Host mw1272 is not in mediawiki-installation dsh group [10:23:02] _joe_: your friend :-P ^^^ [10:23:50] <_joe_> volans: uhm it is ok if the server is still inactive [10:27:52] (03PS3) 10Volans: README: update API documentation [cookbooks] - 10https://gerrit.wikimedia.org/r/477565 (https://phabricator.wikimedia.org/T199079) [10:28:06] (03CR) 10Volans: "reply inline" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/477565 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [10:37:06] (03PS3) 10BBlack: authdns config/CI refactor [1/5] [puppet] - 10https://gerrit.wikimedia.org/r/478809 (https://phabricator.wikimedia.org/T205439) [10:37:08] (03PS3) 10BBlack: authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) [10:37:10] (03PS3) 10BBlack: authdns config/CI refactor [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) [10:37:12] (03PS3) 10BBlack: authdns config/CI refactor [2/5] [dns] - 
10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) [10:37:14] (03PS3) 10BBlack: authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) [10:38:03] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [10:38:05] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [10:38:53] <_joe_> bblack: I think I have a solution for running the dns CI without moving scripts around in a first stage [10:39:50] <_joe_> given I just properly have to copy a couple files over from the puppet repository [10:40:11] that's what you'd think, but it's not true! :) [10:40:35] <_joe_> well, that's unless authdns-gen-zones does something strange :P [10:40:50] <_joe_> which... 
shouldn't work in the ci slaves [10:40:58] anyways, we want to keep being able to refactor these without stepping off into special CI stuff -land [10:41:19] <_joe_> yes, I think your patchset is valid conceptually [10:41:32] <_joe_> I'm just saying it's not a blocker to activate the zone validator [10:41:32] the 5-step patch above leads to an ops/dns repo that's locally testable for CI purposes from any machine, given a few debian-packaged prereqs installed [10:41:37] <_joe_> and move all to stretch [10:41:45] <_joe_> bblack: nod [10:42:25] (03CR) 10Alexandros Kosiaris: [C: 031] validator: improve Ganeti comment check [dns] - 10https://gerrit.wikimedia.org/r/478885 (https://phabricator.wikimedia.org/T182028) (owner: 10Volans) [10:42:35] !log scap pull mw1272 [10:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:37] <_joe_> bblack: the biggest issue is to import the geoip data manually at image build time [10:42:48] the zone validator requires stretch, stretch requires docker, and a clean dockerization of the "authdns-lint" needs the two scripts at least, the rest just cleans up how we mock-test the combination of puppet+dns. [10:43:17] <_joe_> bblack: right now I have [10:43:18] my patchset takes care of that with a 261 byte mock geoip2 database [10:43:21] <_joe_> # Temporary step while we move the scripts off of the puppet repository [10:43:57] <_joe_> cd /tmp && git clone https://gerrit.wikimedia.org/r/operations/puppet && cp /tmp/puppet/modules/authdns/files/authdns-gen-zones.py /usr/local/bin/authdns-gen-zones... 
[10:44:05] (it's a valid geoip2 binary database, but happens to be basically the minimal possible one which is only useful for testing that things work at all, it doesn't really map IPs usefully to anywhere) [10:44:22] <_joe_> makes sense [10:45:59] so the dependencies would be: python3 3.5+ for zone_validator, python2 + jinja2 for authdns-gen-zones (maybe later we just update this to 3 too, but there's other stuff coming eventually), and gdnsd itself. [10:46:28] (03CR) 10Giuseppe Lavagetto: [C: 031] baseimages: Add a default LC_ALL C.UTF-8 locale [puppet] - 10https://gerrit.wikimedia.org/r/478200 (https://phabricator.wikimedia.org/T210260) (owner: 10Alexandros Kosiaris) [10:46:30] (03PS5) 10Alexandros Kosiaris: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [10:47:05] !log pooling mw1272 [10:47:07] there's other "obvious" followups if 1-5 work out that we could do later (e.g. move authdns-local-update into the dns repo as well, and take advantage of the new structure to fix some other long-standing issues with the results of templating, etc).. but just 1-5 and getting over the stretch hurdle is enough for now. [10:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:50] (03CR) 10Alexandros Kosiaris: [C: 032] "I 've removed sc-admins from the groups in the proton role in the interest of keeping things clear and separated now that the service has " [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [10:47:57] (03PS6) 10Alexandros Kosiaris: admins: add to proton-admins: pmiazga, bsitzmann, mholloway, mbsantos, tgr [puppet] - 10https://gerrit.wikimedia.org/r/478776 (https://phabricator.wikimedia.org/T211382) (owner: 10Dzahn) [10:50:19] _joe_: if you have time to review those for stupidity, please do. 
I've run them back over a few times myself and caught a few things, and I can obviously patch-as-I-go by disabling puppet on authdnses and testing the fallout on authdns::testns first, etc.... [10:50:24] (03CR) 10Alexandros Kosiaris: [C: 032] upgrade puppet stdlib from 4.24.0 to 4.25.1 [puppet] - 10https://gerrit.wikimedia.org/r/475261 (owner: 10Dzahn) [10:50:30] (03PS3) 10Alexandros Kosiaris: upgrade puppet stdlib from 4.24.0 to 4.25.1 [puppet] - 10https://gerrit.wikimedia.org/r/475261 (owner: 10Dzahn) [10:50:55] <_joe_> bblack: will take a look in a few, but I will need some guidance [10:50:59] _joe_: the main thing is, I need to be able to force an updated puppet agent run on the current authdns CI slaves at various steps to make this all work, how do I do that? [10:51:30] (03PS1) 10Hashar: (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 [10:52:02] <_joe_> bblack: uhm frankly, with cumin [10:52:05] (I mean as part of the process of conversion while deploying these 5 patches, not as a runtime thing later) [10:52:09] <_joe_> we have cumin in labs AFAIK [10:52:32] CI instances are quite volatile though... why we need to run puppet? 
[10:52:33] <_joe_> I'm not sure how to use it there though, cloud people might help [10:52:35] (03CR) 10jerkins-bot: [V: 04-1] (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [10:52:39] 10Operations, 10Cloud-Services, 10Cloud-VPS, 10IPv6: Enable ipv6 on labs - https://phabricator.wikimedia.org/T37947 (10aborrero) Note for myself: https://docs.openstack.org/mitaka/networking-guide/config-ipv6.html [10:52:44] <_joe_> volans: not for authdns though [10:52:59] (03Abandoned) 10Hashar: (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [10:53:12] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10aborrero) p:05Low>03Normal [10:53:14] volans: because the changes will change how the authdns CI stuff runs as they go, it will break current authdns CI if I can't force the updates between patches [10:53:23] <_joe_> volans: when I was a jenkins admin I could've told you more [10:53:29] eeheh [10:53:36] (this set of changes is also meant to leave the existing jessie CI in a working state, but one that's easily transitioned to stretch+docker) [10:53:37] <_joe_> now we need releng and cloud [10:53:44] <_joe_> we're progressing! 
[10:53:47] * _joe_ bbiab [10:54:26] <_joe_> bblack: empirically they all seem to run on integration-slave-jessie-10\d+ [10:54:40] this is one of those patchsets where the mental energy of figuring out how to review it fully may not be worth it, but any check for stupid obvious typos/errors helps [10:54:42] <_joe_> so if you can use cumin and select those hosts in the CI project [10:54:43] 10Operations, 10Epic, 10cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (10aborrero) [10:54:48] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10aborrero) [10:55:18] _joe_: do you know where I run that from in the CI project? I can go dig on wikitech too [10:55:36] bblack: you can run cumin from the labpuppetmaster* hosts [10:55:44] against any project in cloud [10:55:49] (03CR) 10Alexandros Kosiaris: [C: 032] baseimages: Add a default LC_ALL C.UTF-8 locale [puppet] - 10https://gerrit.wikimedia.org/r/478200 (https://phabricator.wikimedia.org/T210260) (owner: 10Alexandros Kosiaris) [10:55:56] (03PS2) 10Alexandros Kosiaris: baseimages: Add a default LC_ALL C.UTF-8 locale [puppet] - 10https://gerrit.wikimedia.org/r/478200 (https://phabricator.wikimedia.org/T210260) [10:55:58] https://wikitech.wikimedia.org/wiki/Cumin#OpenStack_backend [10:55:59] ok :) [10:56:02] project:name [10:56:08] if they are in a specific project [10:56:15] (no puppetdb there ;) ) [10:56:20] https://gerrit.wikimedia.org/r/q/topic:%22authdns-ci%22 [10:56:33] (03PS1) 10Urbanecm: Upload new logos for cawikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478890 (https://phabricator.wikimedia.org/T198507) [10:56:36] (03PS1) 10Urbanecm: Use HD logos for cawikimedia in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478891 [10:57:09] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - 
https://phabricator.wikimedia.org/T37947 (10aborrero) [10:57:19] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.974 second response time [10:57:20] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10aborrero) [11:00:08] (03PS10) 10DCausse: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) [11:00:10] (03PS10) 10DCausse: [cirrus] Start using replica group settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476272 (https://phabricator.wikimedia.org/T210381) [11:00:12] (03PS12) 10DCausse: [cirrus] Cleanup transitional states [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381) [11:00:18] (03PS1) 10DCausse: [cirrus] Add all three elasticsearch cluster to labs services [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478892 (https://phabricator.wikimedia.org/T211526) [11:00:37] (03CR) 10Mathew.onipe: [C: 031] README: update API documentation [cookbooks] - 10https://gerrit.wikimedia.org/r/477565 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:00:57] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:01:15] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10aborrero) Before we can move forward with this, there are several things to sort out: * what is our ideal IPv6 model for CloudVPS * to which extent we can implement our... [11:01:18] (03PS3) 10DCausse: tests: Assert LabsServices contains all prod keys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478569 (https://phabricator.wikimedia.org/T211526) (owner: 10Krinkle) [11:02:42] still no automatic restart for pdfrender? 
:( [11:03:21] !log restarted pdfrender on scb1003 [last time, we need an automatic restart] [11:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:02] (03CR) 10DCausse: [C: 031] "thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478569 (https://phabricator.wikimedia.org/T211526) (owner: 10Krinkle) [11:04:27] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time [11:04:42] the openstack backend's grammar is not very intuitive heh, even after seeing wikitech! [11:04:47] got it though! [11:05:07] (03PS1) 10Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/478894 [11:05:08] ouch :( how can be improved? [11:05:25] (03PS2) 10Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/478894 [11:06:07] bblack: and if it feels super slow, that's expected due to the prod-cloud proxy, concurrency is set way lower than prod ;) [11:06:23] (03CR) 10Marostegui: [C: 032] Revert "dbproxy1010: Depool labsdb1010" [puppet] - 10https://gerrit.wikimedia.org/r/478894 (owner: 10Marostegui) [11:06:52] "project:X name:Y" as a single selector, when it seems like two selectors, and quoting and the note about combining two selectors with O{} and O{} etc just didn't make sense to me for some reason [11:06:58] !log Repool labsdb1010 - T86338 [11:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:02] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [11:07:10] I just tried dry-run on like 20 possible syntax variations until one stuck lol [11:07:23] failed ones include e.g.: [11:07:25] sudo cumin 'project:integration and name:"integration-slave-jessie-10\d+"' [11:07:59] sudo cumin 'O{project:integration} and O{name:integration-slave-.*}' [11:08:05] I see, yeah one "query" is a single call to openstack and you cannot really do and/or 
[11:08:06] etc, etc [11:08:24] it just took me a while to realize the right thing was literally what it said in the docs, as dumb as that sounds [11:08:34] sudo cumin 'project:integration name:"integration-slave-jessie-10\d+"' [11:08:42] (03PS1) 10Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/478895 (https://phabricator.wikimedia.org/T86338) [11:09:21] my brain said something along the lines of "surely there's a boolean combiner missing in that documented example" heh [11:09:53] ahahah [11:09:54] sorry [11:10:17] (03CR) 10Marostegui: [C: 032] dbproxy1010: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/478895 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [11:10:45] !log Depool labsdb1011 - T86338 [11:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:51] (03Restored) 10Hashar: (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [11:10:55] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [11:11:31] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478892 (https://phabricator.wikimedia.org/T211526) (owner: 10DCausse) [11:11:40] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Nikerabbit) >>! In T203786#4813187, @elukey wrote: >...
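The selector that worked in the exchange above pairs a project with a name pattern in a single query, with no boolean combiner between them. As a rough illustration of which instances a pattern like `integration-slave-jessie-10\d+` would pick out, here is a minimal Python sketch. It assumes the `name:` value is treated as a regular expression matched against the whole instance name, which the log does not confirm; the hostnames are made up for the example, and nothing here calls cumin or OpenStack.

```python
import re

# Name pattern copied from the working selector in the log.
NAME_PATTERN = r"integration-slave-jessie-10\d+"


def select_hosts(hostnames, pattern=NAME_PATTERN):
    """Return hostnames that match the pattern in full.

    Full-string matching is an assumption made for this sketch,
    not a documented behaviour of the cumin OpenStack backend.
    """
    compiled = re.compile(pattern)
    return [host for host in hostnames if compiled.fullmatch(host)]


# Hypothetical instance names, for illustration only.
example_hosts = [
    "integration-slave-jessie-1001",
    "integration-slave-jessie-1002",
    "integration-slave-docker-1021",
    "integration-puppetmaster01",
]

print(select_hosts(example_hosts))
```

Only the two `integration-slave-jessie-10NN` names pass the filter, which mirrors the intent of narrowing the run to the jessie CI slaves inside the `integration` project.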
[11:11:56] (03CR) 10jerkins-bot: [V: 04-1] (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [11:12:53] 10Operations, 10Proton, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Proton for pmiazga, bearND, Mholloway, MSantos, Tgr - https://phabricator.wikimedia.org/T211382 (10akosiaris) 05Open>03Resolved I 've slightly amended the patch to remove the now defunct `sc-admins` group and merged... [11:14:10] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10Krenair) [11:14:22] 10Operations, 10Epic, 10cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (10Krenair) [11:14:30] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10Krenair) [11:14:32] 10Operations, 10Epic, 10cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (10Krenair) [11:14:40] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10Krenair) [11:16:01] (03PS1) 10Urbanecm: Enable extension SandboxLink for nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478896 (https://phabricator.wikimedia.org/T210325) [11:16:14] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) >>! In T203786#4813350, @Nikerabbit wrote: >>... 
[11:18:18] 10Operations, 10Cloud-VPS, 10IPv6, 10cloud-services-team (Kanban): Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947 (10Krenair) [11:18:27] 10Operations, 10Puppet, 10Beta-Cluster-Infrastructure, 10Technical-Debt, 10Tracking: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220 (10Krenair) [11:19:48] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478897 [11:21:48] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478897 (owner: 10Marostegui) [11:22:06] yeah I found the cumin openstack backend to be confusing [11:22:13] RECOVERY - mediawiki-installation DSH group on mw1272 is OK: OK [11:22:20] I normally just use the puppet DB backend anyway since we have nice things like that in beta :) [11:22:51] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478897 (owner: 10Marostegui) [11:23:46] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1087 T86338 T202167 (duration: 00m 46s) [11:23:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:53] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [11:23:53] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [11:24:13] 10Operations, 10MediaWiki-Cache, 10Performance-Team (Radar), 10User-Elukey, 10Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Nikerabbit) I suppose one of those API request could... 
[11:25:42] (03PS1) 10GTirloni: toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) [11:26:19] (03CR) 10jerkins-bot: [V: 04-1] toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) (owner: 10GTirloni) [11:27:10] (03PS4) 10BBlack: authdns config/CI refactor [1/5] [puppet] - 10https://gerrit.wikimedia.org/r/478809 (https://phabricator.wikimedia.org/T205439) [11:27:12] (03PS4) 10BBlack: authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) [11:27:14] (03PS4) 10BBlack: authdns config/CI refactor [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) [11:27:47] (03Abandoned) 10Hashar: (DO NOT SUBMIT) I have a typo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478889 (owner: 10Hashar) [11:28:01] (03PS2) 10GTirloni: toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) [11:28:42] (03CR) 10jerkins-bot: [V: 04-1] toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) (owner: 10GTirloni) [11:31:02] (03PS3) 10GTirloni: toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) [11:31:31] (03PS2) 10Volans: validator: promote clean warnings to errors [dns] - 10https://gerrit.wikimedia.org/r/478886 (https://phabricator.wikimedia.org/T182028) [11:31:53] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478897 (owner: 10Marostegui) [11:38:29] ok, I think I've set up all the windows to try to walk through this [11:39:00] !log puppet disabled on authdnses for attempting https://gerrit.wikimedia.org/r/q/topic:%22authdns-ci%22 [11:39:02] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:17] (03PS1) 10Filippo Giunchedi: deployment-prep: bump logstash heap memory [puppet] - 10https://gerrit.wikimedia.org/r/478902 (https://phabricator.wikimedia.org/T205851) [11:40:55] please hold any DNS-related changes for a few! [11:41:09] gehel: are you around? there’s a bugfix for the WDQS UI that should be deployed ASAP, but I don’t know how to do it [11:41:28] (03CR) 10BBlack: [C: 032] authdns config/CI refactor [1/5] [puppet] - 10https://gerrit.wikimedia.org/r/478809 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [11:42:00] Lucas_WMDE: I'm on paternity leave, try pinging onimisionipe [11:42:02] Lucas_WMDE: better ask onimisionipe, ge.hel is out for a bit ;) [11:42:11] Lucas_WMDE: gehel isn't around and won't be for a few days maybe onimisionipe can help [11:42:15] okay, sorry! [11:42:18] gehel: what are you doing here... go waway :) [11:42:25] will try to remember :) [11:42:37] volans, marostegui : thanks ! [11:42:44] (I don’t know what you’re supposed to say to someone on paternity leave… good luck? :D ) [11:43:07] Lucas_WMDE: try "good sleep" ! [11:43:20] good sleep, then! [11:45:50] Lucas_WMDE: hey! [11:46:05] onimisionipe: hi! [11:46:31] https://phabricator.wikimedia.org/T211629 is triaged “unbreak now” and a fix is available and merged [11:46:39] can you deploy it?
[11:46:50] (cc hoo) [11:47:11] checking [11:48:02] Lucas_WMDE: gimmie a few min to test/check on autodeployment node first [11:48:18] sure [11:49:27] (03PS4) 10BBlack: authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) [11:49:29] (03PS4) 10BBlack: authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) [11:49:36] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [11:49:38] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [11:50:13] (03CR) 10Arturo Borrero Gonzalez: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) (owner: 10GTirloni) [11:53:28] Lucas_WMDE: I will deploy now.. [11:55:49] hello o/ [11:55:55] o/ [11:55:58] (03CR) 10Filippo Giunchedi: [C: 032] deployment-prep: bump logstash heap memory [puppet] - 10https://gerrit.wikimedia.org/r/478902 (https://phabricator.wikimedia.org/T205851) (owner: 10Filippo Giunchedi) [11:56:06] (03PS2) 10Filippo Giunchedi: deployment-prep: bump logstash heap memory [puppet] - 10https://gerrit.wikimedia.org/r/478902 (https://phabricator.wikimedia.org/T205851) [11:57:43] takidelfin and i are gci students [11:57:51] yup [11:57:58] and we're new to deployments, and are willing to learn! [11:58:21] * takidelfin is excited too [11:58:40] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@dcde39f]: GUI update [11:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:00] welcome :) [11:59:20] * Lucas_WMDE peeks at the deployment calendar [11:59:25] yay, HD logos! 
[11:59:35] \o/ [11:59:45] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@dcde39f]: GUI update (duration: 01m 05s) [11:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:05] :D yes [12:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T1200). [12:00:05] shreyasminocha, takidelfin, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:22] stickers! [12:00:28] (03CR) 10BBlack: [V: 032 C: 032] authdns config/CI refactor [2/5] [dns] - 10https://gerrit.wikimedia.org/r/478812 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:00:28] lol [12:00:43] jouncebot has a sense of humour [12:00:55] #bothumor [12:01:22] wait till you get your first -1 from jenkins [12:01:29] I can SWAT today [12:01:40] :O [12:01:45] akosiaris: :D been there [12:01:54] :D [12:02:25] `(Merge Conflict)` [12:02:27] hi zeljkof ! [12:02:30] Urbanecm: o/ [12:02:33] o/ [12:02:49] Lucas_WMDE: can you try now? [12:03:36] onimisionipe: still getting the same error :/ [12:03:51] force-reloading the page doesn’t seem to help [12:03:56] shreyasminocha, takidelfin, and Urbanecm: anybody in a hurry, any urgent commits, or can I deploy in the order they are in calendar? [12:04:07] I think you can deploy them in order [12:04:08] calendar order is fine with me [12:04:11] yup :D [12:04:43] calendar order it is then [12:05:03] please stand by, I'll let you know when a patch is ready for testing at mwdebug1002 [12:05:10] sure [12:05:16] shreyasminocha, takidelfin: do you know how to test at mwdebug1002?
[12:05:28] if not, docs are at https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [12:05:43] revising [12:05:52] ah, extension? [12:05:55] yes [12:06:00] Lucas_WMDE: git log does not show that change after fetching and rebasing [12:06:49] I feel like in a teleportation terminal now ~ `mwdebug1002` [12:06:53] i think i'm erady [12:06:55] (03CR) 10BBlack: [C: 032] authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:07:39] onimisionipe: does this WDQSGuiBuilder thing need to happen first, perhaps? [12:07:44] as in e. g. https://phabricator.wikimedia.org/T207749#4778236 [12:08:08] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:08:32] (03PS5) 10BBlack: authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) [12:08:39] (03CR) 10BBlack: [V: 032 C: 032] authdns config/CI refactor [3/5] [puppet] - 10https://gerrit.wikimedia.org/r/478810 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:09:13] (03Merged) 10jenkins-bot: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:09:28] godog: ok to merge? [12:09:51] Lucas_WMDE: Honestly.. I'm not sure [12:10:08] (03PS1) 10Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/478911 [12:10:17] * onimisionipe is checking WDQS page on wikitech [12:10:33] (03PS2) 10Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/478911 [12:10:39] (03CR) 10jenkins-bot: Add HD logos for 3 projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478034 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:10:50] assuming so!
[12:10:54] bblack: whoops, yes [12:10:56] sorry about that! [12:11:17] (03CR) 10Giuseppe Lavagetto: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:11:20] (03CR) 10Marostegui: [C: 032] Revert "dbproxy1010: Depool labsdb1011" [puppet] - 10https://gerrit.wikimedia.org/r/478911 (owner: 10Marostegui) [12:11:25] (03CR) 10jerkins-bot: [V: 04-1] authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:11:32] Lucas_WMDE: here: https://wikitech.wikimedia.org/wiki/Wikidata_query_service#GUI_deployment [12:11:38] shreyasminocha: 478034 is at mwdebug1002, please test and let me know if I can deploy [12:11:52] sure, testing [12:11:56] * Lucas_WMDE reads [12:11:58] shreyasminocha: let me know if you need help testing [12:12:02] !log Repool labsdb1011 - T86338 [12:12:05] alright, thanks [12:12:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:06] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [12:13:37] (03CR) 10Mobrovac: [C: 031] hieradata: replace restbase seeds in codfw [puppet] - 10https://gerrit.wikimedia.org/r/478870 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [12:13:53] (03CR) 10Mobrovac: [C: 031] hieradata: add restbase20[3-8] to restbase [puppet] - 10https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [12:14:14] (03CR) 10Mobrovac: [C: 031] restbase: remove production_ng remnants [puppet] - 10https://gerrit.wikimedia.org/r/478869 (https://phabricator.wikimedia.org/T211416) (owner: 10Filippo Giunchedi) [12:15:12] Lucas_WMDE: so I assume you have to do the `grunt deploy` and merge it. I only do `git submodule update` in deployment. [12:15:36] okay, I’ll try [12:15:39] zeljkof: tested, lgtm.
you may proceed with deployment [12:16:04] shreyasminocha: ok, deploying [12:16:53] shreyasminocha: gz :D [12:16:58] !log zfilipin@deploy1001 Synchronized static/images/project-logos/: SWAT: [[gerrit:478034|Add HD logos for 3 projects (T150618)]] (duration: 00m 47s) [12:17:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:02] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [12:17:32] shreyasminocha: it's deployed, please test again with the extension disabled, let me know if you have questions or need help [12:17:38] sure [12:18:08] 10Operations, 10Citoid, 10Patch-For-Review, 10Services (done), 10VisualEditor (Current work): Transition citoid to use Zotero's translation-server-v2 - https://phabricator.wikimedia.org/T197242 (10Mvolz) [12:18:38] (03CR) 10BBlack: [V: 032 C: 032] authdns config/CI refactor [4/5] [dns] - 10https://gerrit.wikimedia.org/r/478813 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:18:53] shreyasminocha: there is merge conflict for 478036, could you please rebase the patch? let me know if you need help with that [12:19:03] sure, just a sec [12:19:30] zeljkof: Could I rebase my patch too? [12:19:37] because there is merge conflict too [12:19:47] i'm going to rebase onto master, gerrit warns me that it'll break the relation chain. cause for worry? [12:19:48] takidelfin, please wait when it's about to be merged [12:19:55] shreyasminocha, no, that's okay [12:20:01] 10Operations, 10Wikimedia-Logstash, 10service-runner, 10Core Platform Team Backlog (Next), 10Services (next): Move service-runner to new logging infrastructure - https://phabricator.wikimedia.org/T211125 (10fgiunchedi) >>! In T211125#4806051, @Pchelolo wrote: > There's not that much logging happening in... 
[12:20:02] alright, thanks [12:20:02] Urbanecm: sure [12:20:20] it's because if you do it now, merging other commits might cause another merge conflict [12:20:52] D: [12:21:27] onimisionipe: grunt is making too many assumptions about my SSH setup :/ [12:21:29] trying again [12:21:52] shreyasminocha, takidelfin: before deployment, if gerrit reports merge conflict, I'll do a rebase from gerrit (there is a rebase button in the web interface), but that works only for the trivial conflicts, anything a bit complicated and gerrit gives up, so you have to rebase manually on your machine [12:22:10] am rebasing on my machine atm [12:22:33] gerrit gave up [12:22:35] no problem, merging git conflicts is a sport ( ͡~ ͜ʖ ͡°) [12:22:43] Lucas_WMDE: I hate it when they make assumptions...sorry [12:23:08] Ebe123: o/ [12:23:23] Hello! [12:23:29] Not in -dev? [12:23:32] (03PS5) 10BBlack: authdns config/CI refactor [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) [12:24:02] Lucas_WMDE: onimisionipe: I think it worked for me now: https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478916 [12:24:16] Ebe123, they're working on the get conf change deployed task [12:24:18] hoo: ah, so that’s why I’m seeing “conflicts with” changes… [12:24:22] mine is https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478915 [12:24:33] Cool [12:24:41] and there’s also https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478903 and https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478904 [12:24:45] wat [12:24:53] Great :D [12:25:07] and why does my patch have more changes o_O [12:25:36] (03CR) 10BBlack: [C: 032] authdns config/CI refactor [5/5] [puppet] - 10https://gerrit.wikimedia.org/r/478811 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [12:25:39] I guess it would be best to go with the WDQS builder one?
[12:26:06] probably [12:26:25] but I’m trying another manual one after `npm update; npm install`, just to see if that fixes the extra changes [12:26:39] I’m also not sure how to merge these anyways… I don’t see any activity in Zuul [12:27:26] Probably force merge [12:27:38] yeah https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478667 [12:28:06] oh, so we need to set Verified+2 too? [12:28:14] that’s probably why I’m not seeing a Submit button yet [12:28:19] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:28:41] here it goes :O [12:28:54] takidelfin: I'm merging 478709, while waiting for 478036 [12:29:06] those can be deployed in any order [12:29:12] hoo: so let’s merge https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478904 ? that’s the latest GuiBuilder one [12:29:19] :O [12:29:22] (03Merged) 10jenkins-bot: HD Logos: Add 1.5x and 2x variants of fr and fy wikibooks and fr wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:29:23] Good to know, thanks [12:29:37] zeljkof, ftr, shreyasminocha has some problems with doing rebase, I'm helping him in separate conversation [12:29:42] 10Operations, 10Release Pipeline: blubber template for nodejs should allow defining configuration files to copy to the container - https://phabricator.wikimedia.org/T211580 (10mobrovac) >>! In T211580#4810394, @mobrovac wrote: > FYI, we can also have an alternative mount point for the config file, and can tell... [12:29:56] Urbanecm: thanks!
[12:30:01] yw [12:30:34] takidelfin: 478709 is at mwdebug1002, please test and let me know if I can deploy [12:30:42] takidelfin: let me know if you need help with testing [12:30:46] okay :D [12:30:55] hoo, onimisionipe: https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/478904 merged [12:31:11] Cool… who'll scap it now? [12:32:03] hoo: I'm in a middle of a deployment, can you wait after swat window? or is it urgent? [12:32:39] I guess we can wait, but I'm going for in a moment [12:32:45] for lunch that is [12:33:03] (It's a service deploy, not MW related at all, btw) [12:33:12] (03CR) 10Mobrovac: [C: 04-1] role::beta: introduce docker_services (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/478637 (owner: 10Giuseppe Lavagetto) [12:33:12] hoo: I can let you deploy after the current patch is done, or after swat [12:33:40] whatever you prefer [12:33:41] (03PS1) 10BBlack: Test random good edit [dns] - 10https://gerrit.wikimedia.org/r/478918 [12:33:43] (03PS1) 10BBlack: Test random bad edit [dns] - 10https://gerrit.wikimedia.org/r/478919 [12:33:52] (03CR) 10jerkins-bot: [V: 04-1] Test random bad edit [dns] - 10https://gerrit.wikimedia.org/r/478919 (owner: 10BBlack) [12:33:53] zeljkof: After the current patch would be lovely [12:34:02] hoo: sure, I'll let you know [12:34:40] (03PS6) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [12:34:45] thanks Urbanecm [12:34:57] zeljkof: fixed [12:34:58] yw [12:34:59] (03CR) 10BBlack: [C: 032] Test random good edit [dns] - 10https://gerrit.wikimedia.org/r/478918 (owner: 10BBlack) [12:35:23] (03CR) 10jerkins-bot: [V: 04-1] Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:35:44] zeljkof: I think you can deploy it :D [12:35:47] shreyasminocha: great, hoo has an urgent
deployment, and I'm in a middle of deployment for takidelfin, I'll let you know when your patch is ready [12:35:52] takidelfin: ok, deploying [12:35:52] sure thing [12:36:42] (03CR) 10jenkins-bot: HD Logos: Add 1.5x and 2x variants of fr and fy wikibooks and fr wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478709 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:36:49] (03Abandoned) 10BBlack: Test random bad edit [dns] - 10https://gerrit.wikimedia.org/r/478919 (owner: 10BBlack) [12:36:49] !log zfilipin@deploy1001 Synchronized static/images/project-logos/: SWAT: [[gerrit:478709|HD Logos: Add 1.5x and 2x variants of fr and fy wikibooks and fr wikinews (T150618)]] (duration: 00m 46s) [12:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:53] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [12:37:05] !log updated nodejs nodejs-legacy on aqs1004 (security upgrades) [12:37:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:11] takidelfin: it's deployed, please disable the extension and test [12:37:15] (03PS1) 10BBlack: Revert "Test random good edit" [dns] - 10https://gerrit.wikimedia.org/r/478920 [12:37:25] hoo: I'm done with this patch, go ahead [12:37:30] (03CR) 10BBlack: [C: 032] Revert "Test random good edit" [dns] - 10https://gerrit.wikimedia.org/r/478920 (owner: 10BBlack) [12:37:42] zeljkof: okay [12:37:43] shreyasminocha, jenkins voted -1 [12:37:50] looking [12:38:31] * shreyasminocha facepalms [12:38:39] zeljkof: umm? https://usercontent.irccloud-cdn.com/file/2NBDuShr/screenshot_20181211_133821.png [12:38:39] !log Authdns CI/config refactoring done, all is well, resume normal DNS ops! [12:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:04] zeljkof, can confirm what zeljkof reports [12:39:10] did you purge the cache? 
[12:39:22] *what takidelfin reports [12:39:26] (03PS7) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [12:39:26] :D [12:39:53] !log hoo@deploy1001 Started deploy [wdqs/wdqs@f914415]: Fix WDQS UI embeds (T211629) [12:39:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:56] T211629: WDQS embeds do not display - https://phabricator.wikimedia.org/T211629 [12:40:12] (03CR) 10jerkins-bot: [V: 04-1] Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:40:17] :O [12:40:20] :O [12:40:22] [*] [12:40:33] takidelfin, Urbanecm: cache purge is not needed for new files, but I can do it if there is a problem, that's the only file with problems, or there are more? [12:40:55] zeljkof, https://en.wikipedia.org/static/images/project-logos/enwikinews-2x.png returns 404 on prod, but works on debug [12:41:07] https://wikipedia.org/static/images/project-logos/frwikibooks-2x.png [12:41:17] this works on production though [12:41:45] this is the url format I usually use https://en.wikipedia.org/static/images/project-logos/newikibooks.png [12:41:51] from https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Purging [12:41:54] https://wikipedia.org/static/images/project-logos/frwikibooks-2x.png [12:41:54] and this works too [12:42:47] (03PS8) 10Shreyasminocha: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) [12:42:50] https://en.wikipedia.org/static/images/project-logos/frwikinews-1.5x.png [12:42:53] does not work too [12:43:16] takidelfin: can you send me the list of filenames to purge? or should I just purge them all? 
[12:43:42] enwikinews-2x.png [12:43:42] enwikinews-1.5x.png [12:44:18] `static/images/project-logos/enwikinews-2x.png` [12:44:18] `static/images/project-logos/enwikinews-1.5x.png` [12:44:36] zeljkof ^ [12:44:55] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:45:09] zeljkof, ftr, shreyasminocha's patch should be good to go [12:45:10] takidelfin: purged, try again [12:45:19] standing by [12:45:27] hoo: done? can I continue? [12:45:45] https://en.wikipedia.org/static/images/project-logos/enwikinews-2x.png does not work :( [12:46:02] takidelfin, try it with Ctrl+Shift+R [12:46:03] zeljkof: It's still running [12:46:04] works for me [12:46:09] but I don't see a reason not to continue [12:46:18] Urbanecm: yup, works now [12:46:18] this service is no requirement for anything MediaWiki [12:46:29] hoo: ok, I'll continue then [12:46:32] zeljkof: works now :D [12:46:39] takidelfin: great! [12:47:43] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:48:09] shreyasminocha: you're next, please stand by [12:48:15] ack [12:48:26] takidelfin: I'll let you know when your patch is ready for testing [12:48:44] okie [12:48:50] (03Merged) 10jenkins-bot: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:48:57] looks like WDQS embeds are working again \o/ [12:48:59] thanks hoo [12:49:06] (03PS4) 10GTirloni: toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) [12:49:18] and thanks onimisionipe as well! 
[12:49:27] (03CR) 10jenkins-bot: Update settings to include new HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478036 (https://phabricator.wikimedia.org/T150618) (owner: 10Shreyasminocha) [12:49:55] shreyasminocha: 478036 is at mwdebug1002, please test [12:49:57] (03CR) 10GTirloni: [C: 032] toolforge: Absent BigBrother [puppet] - 10https://gerrit.wikimedia.org/r/478898 (https://phabricator.wikimedia.org/T208357) (owner: 10GTirloni) [12:50:02] alright [12:50:24] !log hoo@deploy1001 Finished deploy [wdqs/wdqs@f914415]: Fix WDQS UI embeds (T211629) (duration: 10m 31s) [12:50:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:27] T211629: WDQS embeds do not display - https://phabricator.wikimedia.org/T211629 [12:51:01] Lucas_WMDE: onimisionipe: Cool :) [12:51:10] zeljkof: tested, works [12:51:17] shreyasminocha: ok, deploying [12:51:39] oh, did I close the task before the scap was complete? sorry :D [12:52:10] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:478036|Update settings to include new HD logos (T150618)]] (duration: 00m 47s) [12:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:14] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [12:52:27] shreyasminocha: it's deployed, please test and thanks for deploying with #releng :) [12:52:48] (03PS4) 10Zfilipin: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:52:49] shreyasminocha, congrats for deploying your first patches with #releng! [12:53:02] testing [12:53:07] my pleasure! 
[12:53:32] confirmed, everything works [12:53:32] shreyasminocha: yay 🎉 [12:54:03] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:54:16] thanks zeljkof and Urbanecm [12:54:22] yw [12:54:32] Lucas_WMDE: seems we good now. yw! [12:54:41] shreyasminocha: I'm glad I could help :) [12:55:07] (03Merged) 10jenkins-bot: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [12:55:23] :) [12:55:53] goodbye shreyasminocha! [12:55:59] oh, i'm too late [12:56:01] takidelfin: 478712 is at mwdebug1002, please test [12:56:06] sure [12:57:08] (03PS1) 10GTirloni: toolforge: Remove BigBrother puppet code [puppet] - 10https://gerrit.wikimedia.org/r/478926 (https://phabricator.wikimedia.org/T208357) [12:58:19] (03CR) 10GTirloni: [C: 032] toolforge: Remove BigBrother puppet code [puppet] - 10https://gerrit.wikimedia.org/r/478926 (https://phabricator.wikimedia.org/T208357) (owner: 10GTirloni) [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T1300) [13:01:01] zeljkof: I think everything is right [13:01:11] takidelfin: ok, deploying [13:02:21] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:478712|HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php (T150618)]] (duration: 00m 46s) [13:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:25] T150618: Provide HD logos for all projects - https://phabricator.wikimedia.org/T150618 [13:02:35] Urbanecm: we ran out of time for your commits, can you move them to another swat?
[13:02:38] (03CR) 10jenkins-bot: HD Logos: Add fr and fy wikibooks and fr wiiknews variants to InitaliseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478712 (https://phabricator.wikimedia.org/T150618) (owner: 10Takidelfin) [13:02:40] sure [13:02:52] takidelfin: it's deployed, please test and thanks for deploying with #releng :) [13:02:54] see you at tomorrow's EU SWAT then! [13:03:03] Urbanecm: sorry for it :( [13:03:07] zeljkof: yay, testing [13:03:08] np [13:03:11] Urbanecm: thanks, I'm running the train in an hour, I have to get ready for it [13:03:25] sure, i totally understand :) [13:03:34] takidelfin: it happens, we had an urgent commit in the middle and a couple of new people, so it took more time than usual [13:04:03] :D [13:04:15] :zeljkof: until that can I deploy? [13:04:34] I mean before the deploy [13:04:52] zeljkof: works for me :D [13:04:55] I have a few pool/depool tasks in my queue [13:04:59] zeljkof, rescheduled for tomorrow [13:05:03] banyek: is it urgent? I need to cut the branch and all that, it's officially "no deployment" (Pre MediaWiki train sanity break) window [13:05:13] no, it is not [13:05:17] that's why I asked [13:05:30] banyek: I would prefer if you could wait until the train [13:05:46] So it was my last task... it is sad... GCI ended :( [13:05:54] sure thing 16:00 utc, right? 
[13:06:04] !log EU SWAT finished [13:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:50] banyek: correct, I might be done with train earlier [13:06:52] takidelfin, submit your task for review, I'll approve it [13:06:55] This is how such big thing like Wikipedia is managed :thinking_face: I thought that every wiki (like en, fr etc) has got their own servers and they do everything at their own [13:06:56] * Imperial March plays in background [13:07:20] Urbanecm: thank you, submitted [13:07:34] nope that'd be crazy inconsistent [13:07:39] {{approved}} [13:07:50] ^-^ [13:07:51] see you during GCI 2019 takidelfin and shreyasminocha ! [13:08:09] if i'm eligible, that is :D [13:08:21] But I don't want to leave ;-; [13:08:23] thanks! [13:08:40] takidelfin, also all the non-wikipedia wikimedia sites run alongside them on the same servers [13:08:50] :O [13:09:13] configured in pretty much the same way, some minor variations but they're mostly kept minimal [13:09:49] how much ram these servers have then :O [13:09:56] (03PS1) 10BBlack: Add tab checker and a run-test.sh CI script [dns] - 10https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) [13:10:02] this all said there are separate database clusters - en.wikipedia.org, commons.wikimedia.org, wikidata.org all have their own database masters etc. [13:10:07] (03CR) 10jerkins-bot: [V: 04-1] Add tab checker and a run-test.sh CI script [dns] - 10https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [13:10:19] wow... 
[13:10:31] whereas most of the rest are located in groups together [13:10:35] (03PS2) 10BBlack: Add tab checker and a run-tests.sh CI script [dns] - 10https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) [13:10:38] that's complicated as they're served by lots of different servers doing different things [13:10:45] (03CR) 10jerkins-bot: [V: 04-1] Add tab checker and a run-tests.sh CI script [dns] - 10https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: 10BBlack) [13:11:35] Krenair: thanks for explaination :D [13:11:49] (03PS3) 10BBlack: Add tab checker and a run-tests.sh CI script [dns] - 10https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) [13:12:15] there's the caching servers, application servers, database servers, a bunch of more complicated things and all the networking and load balancing between it all [13:12:47] 10Operations, 10monitoring, 10User-CDanis: puppet-provisioned dashboards not found in Grafana 5 - https://phabricator.wikimedia.org/T211654 (10CDanis) Ah, I think I know what happened. Hopefully will be an easy fix... [13:12:52] FYI I'm cutting the branch now T206662 [13:12:53] devops = cool [13:12:54] T206662: 1.33.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T206662 [13:13:19] wikimedia devs largely have deployment access but that doesn't extent to touching infrastructure, there's more traditional ops for that [13:13:26] extend* [13:13:42] * while the most complicated thing I did was connecting two VPS... 
[13:16:50] :D
[13:17:24] Operations, monitoring, Goal, cloud-services-team (Kanban): Toolforge: Port sge.py stats to Prometheus - https://phabricator.wikimedia.org/T211684 (GTirloni) p:Triage>High
[13:19:37] (PS2) Volans: puppet: add PuppetMaster class [software/spicerack] - https://gerrit.wikimedia.org/r/477707 (https://phabricator.wikimedia.org/T205884)
[13:19:39] (PS2) Volans: Add ipmi module [software/spicerack] - https://gerrit.wikimedia.org/r/478030 (https://phabricator.wikimedia.org/T205884)
[13:19:41] (PS1) Volans: icinga: fix typo in test docstring [software/spicerack] - https://gerrit.wikimedia.org/r/478931 (https://phabricator.wikimedia.org/T205884)
[13:20:01] (CR) Volans: "comment addressed/replied (see inline)" (13 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/477707 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[13:20:07] (CR) Volans: "comment addressed/replied (see inline)" (8 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/478030 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[13:23:10] Operations, Citoid, Prod-Kubernetes, Core Platform Team Backlog (Watching / External), Services (watching): Citoid automated monitoring times out due to Zotero v2 - https://phabricator.wikimedia.org/T211411 (Mvolz) I can get Zotero to time out locally with 10.1098/rspb.2000.1188 Although if...
[13:27:05] (CR) ArielGlenn: [C: 1] "I had a look at the labstore1006 changes, my only concern would be rsync --delete jobs from stat1007 to the labstore box but those are all" [puppet] - https://gerrit.wikimedia.org/r/478020 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[13:27:09] (CR) Volans: Add tab checker and a run-tests.sh CI script (1 comment) [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: BBlack)
[13:29:22] lol
[13:30:19] (PS1) Banyek: labsdb: depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478932 (https://phabricator.wikimedia.org/T210693)
[13:30:35] (PS3) Elukey: Move remaining stat1005 references to stat1007 [puppet] - https://gerrit.wikimedia.org/r/478020 (https://phabricator.wikimedia.org/T205846)
[13:31:35] (CR) Elukey: [C: 2] Move remaining stat1005 references to stat1007 [puppet] - https://gerrit.wikimedia.org/r/478020 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[13:36:07] (CR) BBlack: Add tab checker and a run-tests.sh CI script (1 comment) [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: BBlack)
[13:36:30] (PS4) BBlack: Add tab checker and a run-tests.sh CI script [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439)
[13:36:41] lol
[13:36:44] (CR) Filippo Giunchedi: [C: 2] hieradata: replace restbase seeds in codfw [puppet] - https://gerrit.wikimedia.org/r/478870 (https://phabricator.wikimedia.org/T211416) (owner: Filippo Giunchedi)
[13:37:23] don't look at the other shellscripts that are already there :)
[13:37:41] (PS3) Filippo Giunchedi: hieradata: add restbase20[3-8] to restbase [puppet] - https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416)
[13:37:43] (PS2) Filippo Giunchedi: restbase: remove production_ng remnants [puppet] -
https://gerrit.wikimedia.org/r/478869 (https://phabricator.wikimedia.org/T211416)
[13:37:45] (PS2) Filippo Giunchedi: hieradata: replace restbase seeds in codfw [puppet] - https://gerrit.wikimedia.org/r/478870 (https://phabricator.wikimedia.org/T211416)
[13:38:09] bblack: lol, deal :)
[13:39:38] (CR) Volans: Add tab checker and a run-tests.sh CI script (1 comment) [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: BBlack)
[13:39:45] sorry, didn't looked at the script before :)
[13:40:33] I guess it depends who merges first!
[13:40:59] :)
[13:41:30] let me cherry-pick down your stuff though, I haven't really even tried it yet
[13:41:34] (CR) Filippo Giunchedi: [C: 2] hieradata: add restbase20[3-8] to restbase [puppet] - https://gerrit.wikimedia.org/r/478673 (https://phabricator.wikimedia.org/T211416) (owner: Filippo Giunchedi)
[13:41:38] (CR) Filippo Giunchedi: [C: 2] restbase: remove production_ng remnants [puppet] - https://gerrit.wikimedia.org/r/478869 (https://phabricator.wikimedia.org/T211416) (owner: Filippo Giunchedi)
[13:41:40] (PS1) Elukey: Apply role::statistics::gpu to stat1005 [puppet] - https://gerrit.wikimedia.org/r/478933 (https://phabricator.wikimedia.org/T205846)
[13:42:18] wow, the new output is really nice!
[13:43:02] (PS2) Elukey: Apply role::statistics::gpu to stat1005 [puppet] - https://gerrit.wikimedia.org/r/478933 (https://phabricator.wikimedia.org/T205846)
[13:44:25] (CR) Elukey: [C: 2] Apply role::statistics::gpu to stat1005 [puppet] - https://gerrit.wikimedia.org/r/478933 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[13:44:30] (PS3) Elukey: Apply role::statistics::gpu to stat1005 [puppet] - https://gerrit.wikimedia.org/r/478933 (https://phabricator.wikimedia.org/T205846)
[13:44:42] thx :)
[13:47:00] (CR) BBlack: [C: 1] "Looks nice, seems to work!"
[dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:47:20] (CR) Vgutierrez: [C: 2] lists: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[13:47:46] volans: did you really vet all those ganetis? it's way more than I would've thought we had!
[13:47:47] (PS6) Vgutierrez: lists: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/476869 (https://phabricator.wikimedia.org/T207050)
[13:48:11] but I don't look around for the list of them, either!
[13:48:21] !log Use certcentral TLS managed certificate in lists.wikimedia.org - T207050
[13:48:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:25] T207050: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050
[13:48:41] I pick the list from gnt-group list on eqiad and codfw
[13:48:46] *picked
[13:49:08] and grepped if any of them was not having at all a ganeti comment
[13:49:18] once one is added the tool told me what to fix :D
[13:49:43] _joe_: thank you!
[13:50:09] <_joe_> bearloga: np, I just saw it was still lying around, and I merged it
[13:50:59] bblack: in this case we could even make it patch the file as it's trivial but yeah, ideas for another time :)
[13:51:08] mobrovac: puppet ran on restbase codfw boxes, could you check too all is well and deploys / rolling restarts work as expected?
[13:51:17] mobrovac: the new hosts are depooled still tho
[13:52:35] kk will check godog, in a meeting now though
[13:54:24] (CR) BBlack: [C: 1] comments: uniform and add missing Ganeti comments [dns] - https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:54:34] (CR) BBlack: [C: 1] validator: improve Ganeti comment check [dns] - https://gerrit.wikimedia.org/r/478885 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:54:41] (CR) BBlack: [C: 1] validator: promote clean warnings to errors [dns] - https://gerrit.wikimedia.org/r/478886 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:54:59] volans: yeah let's merge them up and then switch on -e
[13:55:12] mobrovac: yup, thanks!
[13:55:12] Operations, MediaWiki-Logging, Wikimedia-Logstash, Patch-For-Review: Move mediawiki to new logging infrastructure - https://phabricator.wikimedia.org/T211124 (fgiunchedi) >>! In T211124#4807164, @bd808 wrote: >>>! In T211124#4805389, @fgiunchedi wrote: >> Thanks @bd808 for the context/insight, I...
[13:55:14] sounds like a plan!
proceeding
[13:55:18] thanks for the reviews
[13:55:53] not that I really read the python deeply, but at this point if it works it works, it's not yet blocking CI and we can debug as we go if some fault ever does
[13:56:23] yeah I assumed that
[13:56:31] (PS5) Volans: validator: complete refactor of the validation [dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028)
[13:56:59] (CR) Volans: [C: 2] validator: complete refactor of the validation [dns] - https://gerrit.wikimedia.org/r/478416 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:57:12] of course now my neat little for loop doesn't work because -e :P
[13:57:19] (PS2) Volans: validator: improve Ganeti comment check [dns] - https://gerrit.wikimedia.org/r/478885 (https://phabricator.wikimedia.org/T182028)
[13:57:26] eheheh yeah :D
[13:57:39] blame fai.don, he wanted the default to be just the summary
[13:57:50] (CR) Volans: [C: 2] validator: improve Ganeti comment check [dns] - https://gerrit.wikimedia.org/r/478885 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[13:58:08] (PS3) Volans: validator: promote clean warnings to errors [dns] - https://gerrit.wikimedia.org/r/478886 (https://phabricator.wikimedia.org/T182028)
[13:58:28] (missed a :-P above ofc, it makes sense)
[13:58:39] (CR) Volans: [C: 2] validator: promote clean warnings to errors [dns] - https://gerrit.wikimedia.org/r/478886 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[14:00:04] zeljkof: (Dis)respected human, time to deploy MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T1400). Please do the needful.
[14:00:18] o/
[14:00:24] cutting 1.33.0-wmf.8
[14:00:48] Operations, Analytics, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Very good news, finally stat1005 is ready for experiment with GPU drivers etc.. I am completely ignorant about the subject so if anybody has time/patience pl...
[14:01:15] (PS4) Vgutierrez: lists: Get rid of the old LE puppetization [puppet] - https://gerrit.wikimedia.org/r/476872 (https://phabricator.wikimedia.org/T207050)
[14:01:21] (PS3) Volans: comments: uniform and add missing Ganeti comments [dns] - https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028)
[14:01:46] (CR) Volans: [C: 2] comments: uniform and add missing Ganeti comments [dns] - https://gerrit.wikimedia.org/r/478622 (https://phabricator.wikimedia.org/T182028) (owner: Volans)
[14:02:41] bblack: all merged + authdns-updated
[14:02:47] all yours :)
[14:04:17] Operations, Traffic, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Vgutierrez)
[14:05:40] Operations, Product-Analytics, Patch-For-Review: Upload shiny-server .deb to our Stretch apt repository - https://phabricator.wikimedia.org/T168967 (mpopov) Alright, dependencies for Shiny Server resolved. Now on to the problem of Shiny Server itself: ` Error: Execution of '/usr/bin/apt-get -q -y -o...
[14:08:26] (CR) Vgutierrez: [C: 2] lists: Get rid of the old LE puppetization [puppet] - https://gerrit.wikimedia.org/r/476872 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:08:38] !log mobrovac@deploy1001 Started deploy [restbase/deploy@44e0955]: Bring restbase201[3-8] up to date - T211416
[14:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:43] T211416: Put restbase201[3-8] into conftool and LVS - https://phabricator.wikimedia.org/T211416
[14:09:33] (PS5) BBlack: Add tab checker and a run-tests.sh CI script [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439)
[14:09:49] volans: ^
[14:10:31] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@44e0955]: Bring restbase201[3-8] up to date - T211416 (duration: 01m 53s)
[14:10:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:58] (CR) Volans: [C: 1] "LGTM, of course the zone validator will fails right now as we still have 5 outstanding errors."
[dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: BBlack)
[14:13:19] !log mobrovac@deploy1001 Started deploy [restbase/deploy@44e0955]: Bring restbase201[3-8] up to date
[14:13:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:23] (CR) BBlack: [C: 2] Add tab checker and a run-tests.sh CI script [dns] - https://gerrit.wikimedia.org/r/478929 (https://phabricator.wikimedia.org/T205439) (owner: BBlack)
[14:13:57] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@44e0955]: Bring restbase201[3-8] up to date (duration: 00m 38s)
[14:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:08] now if we just didn't pointlessly reload zones every time a random non-data file is edited in the repo :)
[14:14:19] but one thing at a time
[14:15:13] (PS2) Filippo Giunchedi: logging: introduce cee formatter usage [mediawiki-config] - https://gerrit.wikimedia.org/r/478621 (https://phabricator.wikimedia.org/T211124)
[14:15:14] right :)
[14:15:28] (CR) Herron: [C: 1] logstash: copy 'severity' into 'level' where needed [puppet] - https://gerrit.wikimedia.org/r/476473 (https://phabricator.wikimedia.org/T205851) (owner: Filippo Giunchedi)
[14:16:03] (CR) jerkins-bot: [V: -1] logging: introduce cee formatter usage [mediawiki-config] - https://gerrit.wikimedia.org/r/478621 (https://phabricator.wikimedia.org/T211124) (owner: Filippo Giunchedi)
[14:16:40] (PS1) Hashar: (DO NOT SUBMIT) testing tabulations [dns] - https://gerrit.wikimedia.org/r/478936
[14:17:17] godog: heh, it seems we'll need to pool the hosts manually before we can do anything since conftool refuses to pool/repool them during restarts which makes the deployment fail
[14:17:27] godog: but i see no reason why they can't be pooled
[14:19:01] (PS3) Filippo Giunchedi: logging: introduce cee formatter usage [mediawiki-config] -
https://gerrit.wikimedia.org/r/478621 (https://phabricator.wikimedia.org/T211124)
[14:19:19] (CR) Herron: [C: 1] "Looks good! one minor comment" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/476472 (https://phabricator.wikimedia.org/T205851) (owner: Filippo Giunchedi)
[14:19:59] (PS1) BBlack: Remove pointless ns[012] IPv6 revdns [dns] - https://gerrit.wikimedia.org/r/478938
[14:20:06] mobrovac: ugh curious, what's the error? curious in the sense that conftool should be able to (de)pool now
[14:20:51] (CR) Marostegui: [C: 1] labsdb: depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478932 (https://phabricator.wikimedia.org/T210693) (owner: Banyek)
[14:21:12] (PS1) BBlack: Remove pointless ns[012] IPv6 in ns_addrs [puppet] - https://gerrit.wikimedia.org/r/478939
[14:21:16] godog: the pooler script gives me the exit code 2, which signals that none of the conftool hosts accepts to (de)pool the hosts
[14:22:07] mobrovac: mhh ok, I'll pool restbase2013 first as a test
[14:22:17] k thnx
[14:22:31] i'll try to exec the script after you do to see if that did the trick
[14:23:11] (PS2) BBlack: Remove pointless ns[012] IPv6 in ns_addrs [puppet] - https://gerrit.wikimedia.org/r/478939
[14:23:36] (CR) BBlack: [C: 2] Remove pointless ns[012] IPv6 in ns_addrs [puppet] - https://gerrit.wikimedia.org/r/478939 (owner: BBlack)
[14:23:43] !log filippo@puppetmaster1001 conftool action : set/pooled=yes; selector: name=restbase2013.codfw.wmnet
[14:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:44] godog: hmmmm, no change
[14:24:51] that's fishy
[14:25:46] indeed
[14:27:16] requests are making it to restbase2013 afaics so it is pooled at least
[14:27:44] godog: hmmm so why does the conftool script not succeed?
[14:27:45] hm hm
[14:27:59] godog: i have to leave now, but we can continue the investigation tomorrow
[14:28:08] (PS1) Zfilipin: Group0 to 1.33.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/478941
[14:28:33] mobrovac: ok I'll keep digging
[14:28:42] grazie godog!
[14:31:44] PROBLEM - Host ns0-v6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:31:49] !log removed unused public IPv6 IPs from authdnses manually with "ip -6 addr del ..." - https://gerrit.wikimedia.org/r/c/operations/puppet/+/478939
[14:31:50] PROBLEM - Host ns2-v6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:31:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:59] ignore those alerts
[14:32:23] I actually ran the update on the icinga servers first too, but apparently there are other non-obvious monitoring to deal with
[14:32:26] (PS1) Vgutierrez: certcentral: Provide certificates for mirrors.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/478943 (https://phabricator.wikimedia.org/T207050)
[14:32:28] (PS1) Vgutierrez: mirrors: Deploy certcentral TLS managed certificate [puppet] - https://gerrit.wikimedia.org/r/478944 (https://phabricator.wikimedia.org/T207050)
[14:32:30] (CR) Volans: [C: 1] "LGTM, it would be better if anyone from traffic could have a look too."
[puppet] - https://gerrit.wikimedia.org/r/475753 (https://phabricator.wikimedia.org/T207195) (owner: Gehel)
[14:32:54] !log decommissioning cassandra-c, restbase2005 -- T210843
[14:32:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:01] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[14:33:12] bblack: puppet run on the authdns -> puppet run on icinga1001 -> check disappears ;)
[14:33:14] (CR) Vgutierrez: [C: 2] certcentral: Provide certificates for mirrors.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/478943 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:33:21] (PS2) Vgutierrez: certcentral: Provide certificates for mirrors.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/478943 (https://phabricator.wikimedia.org/T207050)
[14:33:58] I'm assuming those are exported resources, checking
[14:34:01] volans: hmmm I didn't think we were using exported
[14:34:39] apparently we are in modules/authdns/manifests/monitoring/global.pp
[14:34:41] (CR) Alex Monk: [C: 1] mirrors: Deploy certcentral TLS managed certificate [puppet] - https://gerrit.wikimedia.org/r/478944 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:34:44] ah!
[14:34:58] well, re-running
[14:35:44] PROBLEM - Host ns1-v6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:35:47] (PS1) Rafidaslam: Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752)
[14:35:51] (CR) BBlack: [C: 2] Remove pointless ns[012] IPv6 revdns [dns] - https://gerrit.wikimedia.org/r/478938 (owner: BBlack)
[14:36:33] (CR) jerkins-bot: [V: -1] Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752) (owner: Rafidaslam)
[14:37:19] eh, icinga still didn't remove those hosts, and it's been minute since all the authdns ran their agents
[14:37:42] and it run again on icinga too?
[14:37:46] maybe I missed it, yeah
[14:38:01] yeah I mean I didn't see a diff running again on icinga, but maybe I raced with an auto-run that did the diff
[14:38:14] (CR) Vgutierrez: [C: 2] mirrors: Deploy certcentral TLS managed certificate [puppet] - https://gerrit.wikimedia.org/r/478944 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:38:21] (PS2) Vgutierrez: mirrors: Deploy certcentral TLS managed certificate [puppet] - https://gerrit.wikimedia.org/r/478944 (https://phabricator.wikimedia.org/T207050)
[14:38:23] oh hmm that's not it either
[14:39:04] Dec 11 14:26:45 icinga1001 puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_host[ns0-v6]/ensure) removed
[14:39:07] Dec 11 14:26:45 icinga1001 puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_host[ns1-v6]/ensure) removed
[14:39:10] Dec 11 14:26:45 icinga1001 puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_host[ns2-v6]/ensure) removed
[14:39:13] Dec 11 14:26:47 icinga1001 puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_service[icinga1001 ns0-v6]/ensure) removed
[14:39:16] Dec 11 14:26:47 icinga1001 puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_service[icinga1001 ns1-v6]/ensure) removed
[14:39:19] Dec 11 14:26:47 icinga1001
puppet-agent[178506]: (/Stage[main]/Icinga/Nagios_service[icinga1001 ns2-v6]/ensure) removed
[14:39:22] actually it was all removed at :26 earlier, before those alerts
[14:39:37] (PS2) Rafidaslam: Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752)
[14:40:25] (CR) jerkins-bot: [V: -1] Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752) (owner: Rafidaslam)
[14:40:29] mmmh having a look
[14:41:12] oh, that didn't trigger an icinga reload at all
[14:41:26] if you look at /var/log/puppet.log for that :26 recent run
[14:41:52] (PS3) Rafidaslam: Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752)
[14:41:57] config is gone from /etc/nagios/
[14:42:01] while the v4 are there
[14:42:34] !log 'sudo systemctl reload icinga' on icinga1001
[14:42:40] (CR) jerkins-bot: [V: -1] Add `napwikisource` [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752) (owner: Rafidaslam)
[14:42:50] and now they are gone...
[14:42:58] why didn't puppet reload it?
[14:43:02] so I guess puppet maybe didn't notify (reload) icinga?
[14:43:06] I'll check the log
[14:43:18] !log zfilipin@deploy1001 Pruned MediaWiki: 1.33.0-wmf.1 (duration: 11m 40s)
[14:43:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:01] bblack: btw I recall you asked me once that a puppet run from run-puppet-agent -q was "gone", actually it's logged in puppetboard for 1d, but yeah we should log it anyway
[14:45:10] (PS1) Vgutierrez: mirrors: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/478948 (https://phabricator.wikimedia.org/T207050)
[14:45:38] yep no refresh from nagios_service
[14:45:40] https://puppetboard.wikimedia.org/report/icinga1001.wikimedia.org/24387d2b1ed42d2ee4dc5de25b032edc771cce50
[14:45:51] just cam1-a-b-eqiad left now
[14:46:04] (CR) Alex Monk: [C: 1] mirrors: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/478948 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:46:27] that's T207965, out of my control :)
[14:46:28] T207965: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965
[14:46:42] (CR) Hashar: "recheck" [dns] - https://gerrit.wikimedia.org/r/478936 (owner: Hashar)
[14:46:47] !log zfilipin@deploy1001 Pruned MediaWiki: 1.33.0-wmf.2 (duration: 03m 12s)
[14:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:52] (CR) jerkins-bot: [V: -1] (DO NOT SUBMIT) testing tabulations [dns] - https://gerrit.wikimedia.org/r/478936 (owner: Hashar)
[14:47:07] (Abandoned) Hashar: (DO NOT SUBMIT) testing tabulations [dns] - https://gerrit.wikimedia.org/r/478936 (owner: Hashar)
[14:47:19] bblack: I'm tempted to guess that we don't notify Icinga if we just remove stuff, checking the code
[14:47:35] (PS1) Vgutierrez: mirrors: Get rid of the old LE puppetization [puppet] -
https://gerrit.wikimedia.org/r/478950 (https://phabricator.wikimedia.org/T207050)
[14:48:21] (CR) jerkins-bot: [V: -1] mirrors: Get rid of the old LE puppetization [puppet] - https://gerrit.wikimedia.org/r/478950 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[14:48:32] *sigh*
[14:49:17] !log depooling labsdb1010 - T210693
[14:49:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:21] T210693: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693
[14:49:30] (CR) Banyek: [C: 2] labsdb: depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478932 (https://phabricator.wikimedia.org/T210693) (owner: Banyek)
[14:49:42] even the whole mirrors::serve code is there before our lovely linter or somebody got it V:+2 manually
[14:49:49] s/even/either
[14:50:00] Operations, monitoring, User-CDanis: puppet-provisioned dashboards not found in Grafana 5 - https://phabricator.wikimedia.org/T211654 (CDanis) I believe I've fixed this: https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?orgId=1 https://grafana.wikimedia.org/d/myRmf1Pik/varnish-aggregate-client-status-...
[14:50:27] (PS3) Andrew Bogott: Neutron: allow VMs to access the neutron API [puppet] - https://gerrit.wikimedia.org/r/478786 (https://phabricator.wikimedia.org/T211391)
[14:50:29] (PS1) Andrew Bogott: Horizon: move projects to eqiad1-r: maps and wm-bot [puppet] - https://gerrit.wikimedia.org/r/478951 (https://phabricator.wikimedia.org/T204745)
[14:50:34] !log zfilipin@deploy1001 Pruned MediaWiki: 1.33.0-wmf.3 (duration: 02m 51s)
[14:50:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:06] volans: we could comment them with a fixme referencing the ticket maybe
[14:51:27] up to paravoid, he was following it, same for me
[14:51:31] what?
[14:51:39] catch me up?
:)
[14:51:40] (PS2) Banyek: labsdb: depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478932 (https://phabricator.wikimedia.org/T210693)
[14:51:43] (CR) Banyek: [V: 2 C: 2] labsdb: depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/478932 (https://phabricator.wikimedia.org/T210693) (owner: Banyek)
[14:51:50] paravoid: cam DNS entries are the last errors
[14:52:01] should we add an ignore comment?
[14:52:05] referencing the task
[14:52:19] so that's a mess and we'll probably have to redo them from scratch
[14:52:39] but until we do, I'd be hesitant to remove their DNS records, because maybe they'll come up online at some point with their old IPs
[14:53:19] sorry, I didn't mean comment out the entry itself, but add a comment that makes zone_validator ignore it, reffing the ticket
[14:53:49] (PS2) Andrew Bogott: Horizon: move projects to eqiad1-r: maps and wm-bot [puppet] - https://gerrit.wikimedia.org/r/478951 (https://phabricator.wikimedia.org/T204745)
[14:53:52] we can add a comment with 'wmf-zone-validator-ignore=MISSING_OR_WRONG_IP_FOR_NAME_AND_PTR' in it (even lowercase, is the same)
[14:54:23] we can also add an entry "cam-unknown" with an IP of 10.64.0.4 and 2620:0:861:101::4
[14:54:54] as I have no idea which camera is that, these were taken offline during a switch migration
[14:55:00] !log zfilipin@deploy1001 Started scap: testwiki to php-1.33.0-wmf.8 and rebuild l10n cache
[14:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:04] (CR) Andrew Bogott: [C: 2] Horizon: move projects to eqiad1-r: maps and wm-bot [puppet] - https://gerrit.wikimedia.org/r/478951 (https://phabricator.wikimedia.org/T204745) (owner: Andrew Bogott)
[14:57:46] (PS3) ArielGlenn: convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989)
[14:59:16] I'll push a patch, give me a sec
[15:02:38] Operations, Traffic,
Continuous-Integration-Infrastructure (Slipway), Patch-For-Review, User-ArielGlenn: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (BBlack) @hashar - So where we're at now is that we just need our CI switched to a Docker wit...
[15:03:53] (PS1) Faidon Liambotis: Name stray camera IPs as "cam-unknown" [dns] - https://gerrit.wikimedia.org/r/478953
[15:03:58] volans, bblack ^
[15:04:03] (CR) jerkins-bot: [V: -1] Name stray camera IPs as "cam-unknown" [dns] - https://gerrit.wikimedia.org/r/478953 (owner: Faidon Liambotis)
[15:04:09] uhoh
[15:04:46] ah I know
[15:04:50] good catch jenkins :)
[15:04:52] (CR) Vgutierrez: [C: 2] mirrors: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/478948 (https://phabricator.wikimedia.org/T207050) (owner: Vgutierrez)
[15:05:02] (PS2) Vgutierrez: mirrors: Use the certcentral managed TLS certificate [puppet] - https://gerrit.wikimedia.org/r/478948 (https://phabricator.wikimedia.org/T207050)
[15:05:20] also a bug on the validator though!
[15:05:22] sorry :)
[15:05:57] I had IN A \nIN A instead of AAAA, and the validator bought it
[15:06:08] gdnsd didn't :)
[15:06:08] (PS2) Faidon Liambotis: Name stray camera IPs as "cam-unknown" [dns] - https://gerrit.wikimedia.org/r/478953
[15:06:29] the line reported seemed wrong too: Zone wmnet.: Zonefile parse error at line 127: General parse error
[15:06:39] your change was at line 123
[15:06:42] indeed
[15:06:48] but that's gdnsd, not mine ;)
[15:06:49] !log Use certcentral managed TLS certificate in mirrors.wikimedia.org - T207050
[15:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:55] T207050: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050
[15:07:03] yup, I hit bugs in both I guess ;)
[15:07:58] looking at the history...
[15:08:12] ca12fa20 defined all the cam IPs, and the old .4 revdns predates it back to the beginning of git time
[15:08:18] yup
[15:08:29] regardless of recent switch work, I'd say the .4 is just ancient invalid junk to be confirmed
[15:08:35] I think so too
[15:08:55] Operations, Traffic, Patch-For-Review: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Vgutierrez)
[15:09:08] but I'm not removing it because then we may reuse it for something else, and then maybe we have some camera junk that uses it
[15:09:15] and we plug that back in again and cause an issue or something
[15:09:23] so I'd rather keep it there until we re-do all the cameras
[15:10:03] and re: the wrong line, it's quite hard to do much better. the dns zonefile spec is ridiculous. the parser isn't aware its in a bad state until it gets to the next record.
[15:10:09] (in this particular case!)
[15:12:09] (PS4) Rafidaslam: Initial configuration for napwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/478945 (https://phabricator.wikimedia.org/T210752)
[19:48:10] Operations, Patch-For-Review: clean up deprecated TLS certificates from the puppet repo - https://phabricator.wikimedia.org/T211697 (Dzahn) p:Triage>Normal
[19:48:20] Operations, Traffic, Patch-For-Review: clean up deprecated TLS certificates from the puppet repo - https://phabricator.wikimedia.org/T211697 (Dzahn)
[19:52:50] Operations, Operations-Software-Development, Patch-For-Review: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (crusnov) > My preference would be: > - check if upstream has any bug open or closed on the argum...
[19:57:15] !log apply BGP_IXP_RS_in and avoid HE to cr4-ulsfo - T211079
[19:57:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:18] T211079: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079
[20:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181211T2000)
[20:03:08] Operations, Mail: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153 (Dzahn) Hi @egalvezwmf @bcampbell There is no affcom related mail alias or address on our ("ops") side, at least not anymore if there ever was in the past. I don't think we are needed for this. and since we a...
[20:03:15] Operations, Operations-Software-Development, Patch-For-Review: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (crusnov)
[20:04:39] Operations, Mail: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153 (Dzahn) By the way, there is (or has been) also T61104 which is a private mailing list for affcom members.
[20:12:08] Operations: docker-registry.wikimedia.org caches images missing instead of revalidating - https://phabricator.wikimedia.org/T211719 (hashar)
[20:12:16] Operations, MediaWiki-Cache, Performance-Team (Radar), User-Elukey, Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (aaron) >>! In T203786#4807458, @aaron wrote: >>>! In...
[20:21:13] Operations, Traffic, netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (ayounsi) The issue is not present in eqdfw, eqiad, esams, as HE is not sending those routes through the RS. Pushing the "avoid HE prefixes from the RS" change to those sites to ensure...
[20:23:45] Operations: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002 - https://phabricator.wikimedia.org/T211512 (Dzahn) The "sql" command works on maintenance servers, so mwmaint1002. At first i thought it was about them and the switch from terbium -> mwmaint1002 but this is an mwdebug host so like...
[20:26:03] Operations, Traffic, netops: Free up 185.15.59.0/24 - https://phabricator.wikimedia.org/T211254 (ayounsi) Talked a bit over IRC, tldr, the rationale has been added to the beginning of the task's description. Triggering conversation was about removing WMCS 185.15.56.0/23 from prod ACLs.
[20:31:29] (Abandoned) Ottomata: [WIP] Configure cloud-analytics-eqiad Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/468070 (https://phabricator.wikimedia.org/T204951) (owner: Ottomata)
[20:31:43] Operations, Traffic, netops, IPv6: Fix IPv6 autoconf issues once and for all, across the fleet. - https://phabricator.wikimedia.org/T102099 (herron) On a personal level I firmly believe interface config belongs in the OS install phase (as described in option 1) and ideally never modified by Puppe...
[20:31:45] Operations, Traffic, netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (ayounsi) Open>stalled All done, marking the task as stalled until T204281
[20:32:01] Operations, netops, Performance-Team (Radar): Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 (ayounsi)
[20:32:05] Operations, Traffic, netops: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (ayounsi)
[20:34:38] (PS2) Herron: add ip6 mapped interfaces to logstash codfw metal hosts [puppet] - https://gerrit.wikimedia.org/r/479031 (https://phabricator.wikimedia.org/T211065)
[20:35:39] Operations: docker-registry.wikimedia.org caches images missing instead of revalidating - https://phabricator.wikimedia.org/T211719 (hashar) Based on rebuild of https://integration.wikimedia.org/ci/job/commit-message-validator/ the miss seems to be cached for 30 minutes.
[20:36:37] (CR) Herron: [C: 2] add ip6 mapped interfaces to logstash codfw metal hosts [puppet] - https://gerrit.wikimedia.org/r/479031 (https://phabricator.wikimedia.org/T211065) (owner: Herron)
[20:42:00] Operations, ops-codfw, DBA, decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (Papaul) ` papaul@asw-b-codfw> show interfaces ge-5/0/35 descriptions Interface Admin Link Description ge-5/0/35 down do...
[20:42:25] Operations, ops-codfw, DBA, decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (Papaul)
[20:46:40] Operations, ops-codfw, DBA, decommission: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (Papaul) @Marostegui any reason why production DNS is still showing for pc2004? 170 1H IN PTR pc2004.codfw.wmnet.
[20:49:24] Operations, Analytics, Performance-Team, Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (Milimetric) >>! In T210484#4812997, @Gilles wrote: >>>! In T210484#4794749, @fdans wrote: >> Analytics needs x-analytics in every request,...
[20:50:17] (PS1) Andrew Bogott: novaproxy: support user-config for $banned_ips and $blocked_user_agent_regex [puppet] - https://gerrit.wikimedia.org/r/479041 (https://phabricator.wikimedia.org/T211709)
[20:51:50] (PS1) Papaul: DNS: removed mgmt DNS entries for pc200[4-6] [dns] - https://gerrit.wikimedia.org/r/479042 (https://phabricator.wikimedia.org/T209858)
[20:56:07] Operations, ops-codfw, DBA, decommission, Patch-For-Review: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) - https://phabricator.wikimedia.org/T209858 (Papaul)
[21:04:15] (PS4) ArielGlenn: convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989)
[21:04:36] (CR) jerkins-bot: [V: -1] convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) (owner: ArielGlenn)
[21:06:26] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 504 (expecting: 200)
[21:11:07] Operations: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002 - https://phabricator.wikimedia.org/T211512 (Dzahn) "sql" translates to "exec mwscript mysql.php" and that translates to "php /srv/mediawiki/multiversion/MWScript.php mysql.php --wiki=metawiki --wikidb=enwiki" looking at mysq.php...
[21:14:12] Hi, is it normal to see packet loss to gerrit.wikimedia.org?
[21:14:13] 3 packets transmitted, 2 packets received, 33.3% packet loss
[21:14:46] no
[21:15:29] paladox, I'd look at traceroute next
[21:15:33] ok
[21:15:51] i've re-ran ping and it doesn't show packet loss now which is strange.
[21:17:34] (PS1) Herron: network::constants: add codfw logstash kafka brokers [puppet] - https://gerrit.wikimedia.org/r/479048 (https://phabricator.wikimedia.org/T211065)
[21:22:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy
[21:22:44] Operations: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002 - https://phabricator.wikimedia.org/T211512 (Dzahn) `role::mediawiki_maintenance` includes `profile::mariadb::client ` which uses class { 'mariadb::packages_client': That installs "'`wmf-mariadb101-client'`, # mariadb client, c...
[21:23:43] :o
[21:24:19] (PS1) Andrew Bogott: Horizon: move 'wildcat' project to eqiad1-r [puppet] - https://gerrit.wikimedia.org/r/479055
[21:24:52] (PS2) Andrew Bogott: Horizon: move 'wildcat' project to eqiad1-r [puppet] - https://gerrit.wikimedia.org/r/479055 (https://phabricator.wikimedia.org/T204703)
[21:26:14] (CR) Andrew Bogott: [C: 2] Horizon: move 'wildcat' project to eqiad1-r [puppet] - https://gerrit.wikimedia.org/r/479055 (https://phabricator.wikimedia.org/T204703) (owner: Andrew Bogott)
[21:28:41] (CR) Herron: [C: 2] network::constants: add codfw logstash kafka brokers [puppet] - https://gerrit.wikimedia.org/r/479048 (https://phabricator.wikimedia.org/T211065) (owner: Herron)
[21:28:48] (PS2) Herron: network::constants: add codfw logstash kafka brokers [puppet] - https://gerrit.wikimedia.org/r/479048 (https://phabricator.wikimedia.org/T211065)
[21:41:49] Krenair i did a mtr, here's what it looks like:
[21:41:49] https://phabricator.wikimedia.org/P7907
[21:43:11] RECOVERY - Check systemd state on kafkamon2001 is OK: OK - running: The system is fully operational
[21:43:36] Operations, Epic, cloud-services-team (Kanban): CloudVPS: our ideal future model - https://phabricator.wikimedia.org/T209460 (Multichill) Can someone point me to the current network layout? Vlans, ip space in use, what's used to route/filter traffic, etc.? Knowing the current situation is usually a g...
[21:49:20] Operations, netops: Outbound BGP graceful shutdown - https://phabricator.wikimedia.org/T211728 (ayounsi) p:Triage>Normal
[21:59:06] Operations, Mail: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153 (bcampbell) @Dzahn I see in our ticket history that a Google Group was not desired by @egalvezwmf, but rather an additional individual account and email address. The ticket has since been resolved, so I believ...
[22:00:56] Operations, netops: Replace accepted-prefix-limit with prefix-limit - https://phabricator.wikimedia.org/T211730 (ayounsi) p:Triage>Low
[22:01:48] Operations, ops-codfw, Core Platform Team, Services (doing), and 2 others: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843 (Eevans)
[22:02:19] (PS5) ArielGlenn: convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989)
[22:02:39] (CR) jerkins-bot: [V: -1] convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989) (owner: ArielGlenn)
[22:02:50] Operations, Mail: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153 (Dzahn) Open>Resolved a:Dzahn Ah, the best kind of solution, it's already done: ) thanks @bcampbell ! @egalvezwmf ^ In case we missed something here just let us know or reopen the ticket.
[22:04:09] !log decommissioning cassandra-a, restbase2006 -- T210843
[22:04:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:13] T210843: Reshape RESTBase Cassandra cluster for server refresh - https://phabricator.wikimedia.org/T210843
[22:05:35] (PS6) ArielGlenn: convert dump scripts to python3 [dumps] - https://gerrit.wikimedia.org/r/478702 (https://phabricator.wikimedia.org/T210989)
[22:06:46] !log push loopback filter term return-tcp to all routers - T207962
[22:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:35] (PS1) Dzahn: mediawiki: rename the maintenance role to match other roles [puppet] - https://gerrit.wikimedia.org/r/479131
[22:20:59] (PS2) Dzahn: DNS: remove mgmt DNS entries for pc200[4-6] [dns] - https://gerrit.wikimedia.org/r/479042 (https://phabricator.wikimedia.org/T209858) (owner: Papaul)
[22:26:01] Operations, ops-codfw, Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (herron)
[22:26:50] Operations, Wikimedia-Logstash: Procure and provision Logging pipeline hardware in multiple datacenters - https://phabricator.wikimedia.org/T205850 (herron)
[22:26:52] Operations, Wikimedia-Logstash, User-fgiunchedi, User-herron: Logstash hardware expansion - https://phabricator.wikimedia.org/T203169 (herron)
[22:26:57] Operations, ops-codfw, Patch-For-Review: rack/setup/install codfw logstash elasticsearch storage servers - https://phabricator.wikimedia.org/T211065 (herron) Open>Resolved New logging storage hosts are online and puppetized with elasticsearch/kafka roles
[22:27:13] (CR) Dzahn: [C: 2] DNS: remove mgmt DNS entries for pc200[4-6] [dns] - https://gerrit.wikimedia.org/r/479042 (https://phabricator.wikimedia.org/T209858) (owner: Papaul)
[22:30:50] (CR) Dzahn: "@Robh this change removed 2007 instead of 2004 it seems" [dns] - https://gerrit.wikimedia.org/r/477359 (https://phabricator.wikimedia.org/T209858) (owner: RobH)
[22:31:29] mutante: you wanna add it back or shall i?
[22:32:08] robh: you add 2007, i remove 2004 ?
[22:32:15] i just merged the mgmt part for 2004
[22:32:18] ...
[22:32:21] pc2004 is going away
[22:32:25] so im not sure what you mean?
[22:32:44] whoever does it can do it in one changeset
[22:32:47] pc2004 production IP is still in there
[22:32:49] im asking if you want to or if i should?
[22:32:52] right
[22:32:54] please do it then
[22:33:03] cool, wilco now
[22:33:07] thanks
[22:34:10] mutante: someone fixed the pc2007 already
[22:34:13] but ill remove 2004 now
[22:34:24] and i decommissioned pc1007 =P
[22:34:30] goddamn typos and transpositions
[22:35:28] robh: ah:) yea, i only looked at that one diff before
[22:35:57] (PS1) RobH: decom pc2004 dns entry [dns] - https://gerrit.wikimedia.org/r/479135 (https://phabricator.wikimedia.org/T209858)
[22:36:41] (PS2) RobH: decom pc2004 dns entry [dns] - https://gerrit.wikimedia.org/r/479135 (https://phabricator.wikimedia.org/T209858)
[22:37:01] (CR) RobH: [C: 2] decom pc2004 dns entry [dns] - https://gerrit.wikimedia.org/r/479135 (https://phabricator.wikimedia.org/T209858) (owner: RobH)
[22:37:20] still, was a mistake and had to be fixed, thx for finding
[22:39:26] yw, just stumbled upon
[22:46:07] (PS2) Thcipriani: Initial Helm chart for Blubberoid [deployment-charts] - https://gerrit.wikimedia.org/r/479026 (https://phabricator.wikimedia.org/T211708) (owner: Jeena Huneidi)
[22:46:49] (PS1) Herron: logstash: set kafka consumer group id in codfw [puppet] - https://gerrit.wikimedia.org/r/479136 (https://phabricator.wikimedia.org/T205850)
[22:49:57] Operations, ops-ulsfo: ulsfo: install new PDUs in racks / phase out APC loaner PDU use - https://phabricator.wikimedia.org/T209101 (RobH) Put in a ticket with DR techs: comment section of ticket: > Digital Realty Support, > > Our two cabinets, 103.02.22 and 103.02.23 each have redundant APC PDUs cur...
[22:50:45] !log ssh to tar archive data from logstash1006 /mnt (external) to labstore1007
[22:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:53:05] (PS1) Cwhite: ci: define statsd prometheus exporter mappings [puppet] - https://gerrit.wikimedia.org/r/479139 (https://phabricator.wikimedia.org/T205870)
[22:56:00] (CR) Herron: "with this eqiad stays group_id => "logstash" (default) and codfw becomes group_id => "logstash-codfw". would handle eqiad in a separate p" [puppet] - https://gerrit.wikimedia.org/r/479136 (https://phabricator.wikimedia.org/T205850) (owner: Herron)
[23:06:12] (PS1) Herron: assign codfw logging VMs logstash role [puppet] - https://gerrit.wikimedia.org/r/479140 (https://phabricator.wikimedia.org/T205850)
[23:08:42] (CR) Herron: [C: -2] "must go after I9dda6be734c19cbd92a05cc1da4f088fcb07e613" [puppet] - https://gerrit.wikimedia.org/r/479140 (https://phabricator.wikimedia.org/T205850) (owner: Herron)
[23:10:55] (PS1) Dzahn: mediawiki/scap: do not install sql scripts on canary appservers [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512)
[23:12:06] (CR) jerkins-bot: [V: -1] mediawiki/scap: do not install sql scripts on canary appservers [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512) (owner: Dzahn)
[23:12:53] Operations, Patch-For-Review: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002 - https://phabricator.wikimedia.org/T211512 (Dzahn) p:Triage>Normal
[23:15:35] (PS2) Dzahn: mediawiki/scap: do not install sql scripts on canary appservers [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512)
[23:16:26] (CR) jerkins-bot: [V: -1] mediawiki/scap: do not install sql scripts on canary appservers [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512) (owner: Dzahn)
[23:22:27] (PS3) Dzahn: mediawiki/scap: do not install sql scripts on canary appservers [puppet] - https://gerrit.wikimedia.org/r/479142 (https://phabricator.wikimedia.org/T211512)
[23:26:51] (PS1) Paladox: php: Add support for php 7.3 [puppet] - https://gerrit.wikimedia.org/r/479144
[23:27:12] Operations, MediaWiki-Cache, Performance-Team (Radar), User-Elukey, Wikimedia-production-error: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (aaron) I'm not sure why the recache() calls would cau...
[23:28:28] (PS2) Paladox: php: Add support for php 7.3 [puppet] - https://gerrit.wikimedia.org/r/479144
[23:31:21] (PS3) Paladox: php: Add support for php 7.3 [puppet] - https://gerrit.wikimedia.org/r/479144
[23:33:11] Operations, Patch-For-Review: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002 - https://phabricator.wikimedia.org/T211512 (Krinkle) I don't have a preference for whether to remove or keep the `sql` command on mwdebug/canary servers. However, looking at the above patch, I notice somethin...
[23:49:52] PROBLEM - Long running screen/tmux on people1001 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 19202, 1729109s 1728000s).
[23:53:09] ah, that was from me when i synced from rutherfordium. screen is terminating
[23:53:36] worked as it should to remind