[00:23:57] (PS1) Dzahn: smokeping: allow rsync of data from netmon1001 to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361606 (https://phabricator.wikimedia.org/T166180)
[00:24:36] (PS2) Dzahn: smokeping: allow rsync of data from netmon1001 to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361606 (https://phabricator.wikimedia.org/T166180)
[00:26:30] Operations, ops-codfw, monitoring, Patch-For-Review: rack/setup/install netmon2001 - https://phabricator.wikimedia.org/T166180#3287592 (Dzahn) oops, wrong ticket, disregard last comment
[00:27:27] (PS3) Dzahn: smokeping: allow rsync of data from netmon1001 to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361606 (https://phabricator.wikimedia.org/T159756)
[00:28:13] (PS1) Dzahn: cache::misc/smokeping: switch smokeping backend to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361608 (https://phabricator.wikimedia.org/T159756)
[00:28:21] (CR) Dzahn: [C: 2] smokeping: allow rsync of data from netmon1001 to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361606 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[00:29:24] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:32:07] (PS2) Dzahn: cache::misc/smokeping: switch smokeping backend to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361608 (https://phabricator.wikimedia.org/T159756)
[00:39:25] !log netmon1001 - rsyncing smokeping data (/var/lib/smokeping) over to netmon1002
[00:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:34] (PS2) Dzahn: jenkins: lack of access_log produce invalid system unit [puppet] - https://gerrit.wikimedia.org/r/361551 (owner: Hashar)
[00:42:25] (CR) Dzahn: [C: 2] jenkins: lack of access_log produce invalid system unit [puppet] - https://gerrit.wikimedia.org/r/361551 (owner: Hashar)
[00:47:13] (PS2) Dzahn: Limit mmap to indexes (work-around for abnormal page faults) [puppet] - https://gerrit.wikimedia.org/r/361506 (https://phabricator.wikimedia.org/T137419) (owner: Eevans)
[00:49:47] (CR) Dzahn: [C: 2] Limit mmap to indexes (work-around for abnormal page faults) [puppet] - https://gerrit.wikimedia.org/r/361506 (https://phabricator.wikimedia.org/T137419) (owner: Eevans)
[00:50:12] (CR) Dzahn: [C: 2] "" only affect Cassandra 3.x nodes (restbase dev)"" [puppet] - https://gerrit.wikimedia.org/r/361506 (https://phabricator.wikimedia.org/T137419) (owner: Eevans)
[00:57:24] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:59:30] (PS4) Dzahn: contint: PHP ext build dependencies on Nodepool [puppet] - https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: Hashar)
[01:02:24] PROBLEM - Check health of redis instance on 6381 on rdb2003 is CRITICAL: CRITICAL: replication_delay is 1498525336 600 - REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 8481095 keys, up 2 minutes 13 seconds - replication_delay is 1498525336
[01:02:24] PROBLEM - Check health of redis instance on 6380 on rdb2003 is CRITICAL: CRITICAL: replication_delay is 1498525336 600 - REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8574012 keys, up 2 minutes 14 seconds - replication_delay is 1498525336
[01:02:24] PROBLEM - Check health of redis instance on 6379 on rdb2003 is CRITICAL: CRITICAL: replication_delay is 1498525337 600 - REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8571238 keys, up 2 minutes 15 seconds - replication_delay is 1498525337
[01:02:24] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1498525341 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 3867405 keys, up 2 minutes 18 seconds - replication_delay is 1498525341
[01:02:41] (CR) Dzahn: [C: 1] "hmm, looks good but i'll wait until it's earlier in Europe anyways" [puppet] - https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: Hashar)
[01:03:24] RECOVERY - Check health of redis instance on 6381 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6381 has 1 databases (db0) with 8478637 keys, up 3 minutes 13 seconds - replication_delay is 0
[01:03:24] RECOVERY - Check health of redis instance on 6380 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6380 has 1 databases (db0) with 8570126 keys, up 3 minutes 14 seconds - replication_delay is 0
[01:03:24] RECOVERY - Check health of redis instance on 6379 on rdb2003 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6379 has 1 databases (db0) with 8569500 keys, up 3 minutes 14 seconds - replication_delay is 0
[01:03:24] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 3867348 keys, up 3 minutes 17 seconds - replication_delay is 0
[01:51:06] (PS3) Dzahn: cache::misc/smokeping: switch smokeping backend to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361608 (https://phabricator.wikimedia.org/T159756)
[01:59:59] Hello everyone, Huon would like you to join his new channel ##huon
[02:20:44] (CR) Dzahn: [C: 2] cache::misc/smokeping: switch smokeping backend to netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361608 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[02:22:57] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 46s)
[02:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:24] PROBLEM - puppet last run on cp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:29:22] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Jun 27 02:29:22 UTC 2017 (duration 6m 25s)
[02:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:32:14] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:32:14] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:33:44] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:34:34] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:34:41] that is me and i am fixing it
[02:35:24] PROBLEM - puppet last run on cp2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:36:25] (PS1) Dzahn: cache::misc: add backend for netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361614 (https://phabricator.wikimedia.org/T159756)
[02:37:26] (PS2) Dzahn: cache::misc: add backend for netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361614 (https://phabricator.wikimedia.org/T159756)
[02:37:44] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:39:34] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:41:10] (CR) Dzahn: [C: 2] cache::misc: add backend for netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361614 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[02:41:16] (PS3) Dzahn: cache::misc: add backend for netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361614 (https://phabricator.wikimedia.org/T159756)
[02:41:47] (CR) Dzahn: [V: 2 C: 2] cache::misc: add backend for netmon1002 [puppet] - https://gerrit.wikimedia.org/r/361614 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[02:42:44] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:43:44] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:43:45] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[02:44:24] RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[02:45:14] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[02:45:25] (CR) Dzahn: [C: -1] Nrpe: Fix check_ram script to work on stretch [puppet] - https://gerrit.wikimedia.org/r/361361 (owner: Paladox)
[02:45:54] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[02:46:01] (CR) Dzahn: [C: -1] "we need to fix the command for the "b" mode instead, adjust it to the changed output of "free"" [puppet] - https://gerrit.wikimedia.org/r/361361 (owner: Paladox)
[02:49:34] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[02:50:14] RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[02:50:57] Operations, Traffic, Wikimedia-Stream: stream.wikimedia.org - redirect http(s) to docs - https://phabricator.wikimedia.org/T70528#3381109 (Krinkle) Resolved>Open While the certificate issue has been resolved, and RCStream (at `stream.wikimedia.org/rc/`) is indeed being deprecated and replaced...
[02:51:44] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[02:53:44] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[02:54:34] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[02:55:24] RECOVERY - puppet last run on cp2006 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[03:00:11] Operations, Commons, MediaWiki-File-management, Multimedia, and 3 others: ForeignAPIRepo wrongly returns non-protocol-relative URLs for original "thumbs" - https://phabricator.wikimedia.org/T50133#3381113 (Krinkle) stalled>Open
[03:03:19] !log netmon1002 - fixing permissions on /var/lib/smokeping rrd files (rsynced, inconsistent UIDs)
[03:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:28:04] !log netmon1002 - ganglia apache_status.py broken in stretch (?), ganglia deprecated, stopping gmond, aggregator role got removed, was for torrus
[03:28:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:35:06] !log smokeping - stop/rsync/fix permissions/start one more time to minimize gaps in graphs - now fully migrated netmon1001->netmon1002, historic data has been copied (T159756)
[03:35:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:35:15] T159756: setup netmon1002.wikimedia.org - https://phabricator.wikimedia.org/T159756
[03:41:11] (PS1) Dzahn: site: remove smokeping role from netmon1001 [puppet] - https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756)
[03:43:22] !log smokeping on stretch means 2.6.11-3 vs 2.6.9-1 we had before
[03:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:48:01] (CR) Dzahn: "smokeping on stretch means 2.6.11-3 vs 2.6.9-1 we had before" [puppet] - https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[03:53:45] (PS1) Dzahn: site: update TODO roles for netmon1002, no more torrus [puppet] - https://gerrit.wikimedia.org/r/361616
[03:57:57] (CR) Dzahn: "where exactly did torrus actually die? i didn't see an obvious one in git log and there are lots of remnants but it's dead, right?" [puppet] - https://gerrit.wikimedia.org/r/361616 (owner: Dzahn)
[03:59:20] (CR) Dzahn: [C: 2] "so not moving any torrus role to netmon1002 because there isn't one on netmon1001" [puppet] - https://gerrit.wikimedia.org/r/361616 (owner: Dzahn)
[04:04:50] (PS1) Dzahn: varnish/bacula: remove torrus directors [puppet] - https://gerrit.wikimedia.org/r/361617
[04:06:10] (PS2) Dzahn: varnish/bacula: remove torrus directors [puppet] - https://gerrit.wikimedia.org/r/361617
[04:07:54] (PS1) Dzahn: remove torrus.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/361618
[04:10:15] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=326.30 Read Requests/Sec=1509.50 Write Requests/Sec=870.50 KBytes Read/Sec=38557.20 KBytes_Written/Sec=8762.80
[04:11:51] ACKNOWLEDGEMENT - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1413.60 Read Requests/Sec=1760.60 Write Requests/Sec=2.90 KBytes Read/Sec=37760.80 KBytes_Written/Sec=1217.20 daniel_zahn usual spikes
[04:11:53] Operations, MediaWiki-General-or-Unknown, Traffic, HTTPS: Make default interwiki map links protocol-relative - https://phabricator.wikimedia.org/T33327#3381133 (demon) Open>declined
[04:19:24] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=1.60 Read Requests/Sec=0.20 Write Requests/Sec=0.70 KBytes Read/Sec=1.20 KBytes_Written/Sec=74.00
[04:38:32] (CR) Dzahn: [C: -1] "Alexandros and Paladox are both right with their comments. need to amend to do these things. thanks for reviews and test on labs" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/354078 (owner: Dzahn)
[04:39:55] * mutante signs out now. laters
[04:50:16] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381145 (Marostegui) >>! In T168776#3379913, @alanajjar wrote: > @Marostegui can we start now? Sorry - I was not online at the time, as I am based in Eur...
[04:56:05] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3376111 (Marostegui) >>! In T168764#3378717, @Urbanecm wrote: > > I really think that **real** deleting of the wiki database from all db servers (and recreating,...
[04:58:36] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/361621
[04:58:42] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/361621
[05:01:31] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381153 (alanajjar) @Marostegui Don't mind, I'm based in Africa, Egypt (until 10 June). Aha yes, I think we should arrange a time or day! I think it's no...
[05:02:17] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/361621 (owner: Marostegui)
[05:02:19] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381154 (Marostegui) >>! In T168776#3381153, @alanajjar wrote: > @Marostegui Don't mind, I'm based in Africa, Egypt (until 10 June). > > Aha yes, I think...
[05:03:17] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/361621 (owner: Marostegui)
[05:03:19] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381155 (alanajjar) >>! In T168776#3380408, @Luke081515 wrote: > @alanajjar I guess it would make more sense to ping him at IRC, so if there is a problem,...
[05:03:30] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1079" [mediawiki-config] - https://gerrit.wikimedia.org/r/361621 (owner: Marostegui)
[05:04:26] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1079 - T166208 (duration: 00m 43s)
[05:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:04:37] T166208: Convert unique keys into primary keys for some wiki tables on s7 - https://phabricator.wikimedia.org/T166208
[05:07:16] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381161 (alanajjar) >>! In T168776#3381154, @Marostegui wrote: >>>! In T168776#3381153, @alanajjar wrote: >> @Marostegui Don't mind, I'm based in Africa,...
[05:07:40] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381162 (Marostegui) >>! In T168776#3381155, @alanajjar wrote: >>>! In T168776#3380408, @Luke081515 wrote: >> @alanajjar I guess it would make more sense...
[05:08:00] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381163 (Marostegui) >>! In T168776#3381161, @alanajjar wrote: >>>! In T168776#3381154, @Marostegui wrote: >>>>! In T168776#3381153, @alanajjar wrote: >>>...
[05:08:27] !log Global rename of Green Cardamom → GreenC - T168776
[05:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:37] T168776: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776
[05:20:11] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381169 (alanajjar) Finished. Thanks @Marostegui
[05:21:14] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381170 (Marostegui) Great! Thanks for your cooperation! :-)
[05:58:32] !log restored rdb2004 as slave of rdb2003 (end of experiment)
[05:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:03] (no positive results)
[06:18:08] <_joe_> elukey: as expected, right?
[06:18:20] <_joe_> a negative result is still a significant result
[06:18:23] <_joe_> :)
[06:18:45] <_joe_> elukey: next let's try to disable jobchron after having restarted all instances in codfw?
[06:18:55] <_joe_> for a few hours
[06:21:02] Operations, DBA, Wikimedia-Site-requests: Global rename of Green Cardamom → GreenC: supervision needed - https://phabricator.wikimedia.org/T168776#3381184 (alanajjar) Open>Resolved
[06:38:48] (PS1) Marostegui: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/361627 (https://phabricator.wikimedia.org/T166208)
[06:40:46] !log Deploy alter table s7 - dbstore1002 - no_replicate_T166208.sh
[06:40:50] great
[06:40:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:23] Operations, Ops-Access-Requests, Scoring-platform-team-Backlog, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3364601 (ArielGlenn) >>! In T168442#3380801, @RobH wrote: > Addition to the ores-admins is a sudo group, and thus will re...
[06:43:39] fixed that wrong entry
[06:45:34] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/361627 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[06:46:52] (Merged) jenkins-bot: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/361627 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[06:47:01] (CR) jenkins-bot: db-eqiad.php: Depool db1034 [mediawiki-config] - https://gerrit.wikimedia.org/r/361627 (https://phabricator.wikimedia.org/T166208) (owner: Marostegui)
[06:47:49] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1034 - T166208 (duration: 00m 43s)
[06:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:59] T166208: Convert unique keys into primary keys for some wiki tables on s7 - https://phabricator.wikimedia.org/T166208
[06:48:07] !log Deploy alter table s7 on labsdb1001 - T166208
[06:48:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:24] Operations, ops-codfw, Performance-Team, Thumbor, User-fgiunchedi: Rename mw2148 / mw2149 / mw2259 / mw2260 to thumbor200[1234] - https://phabricator.wikimedia.org/T168881#3379407 (Volans) @fgiunchedi the old names need to be cleaned from puppetdb too, I guess we need to add this step in the...
[07:08:01] Operations, Fundraising-Backlog, Technical-Debt: Determine if benefactorevents.wikimedia.org should be hosted on the production cluster or still on Microsoft Azure - https://phabricator.wikimedia.org/T166240#3381245 (ArielGlenn) p:Triage>Normal
[07:11:54] !log Deploy alter table db1034 - T166208
[07:12:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:03] T166208: Convert unique keys into primary keys for some wiki tables on s7 - https://phabricator.wikimedia.org/T166208
[07:16:04] !log Temporarily disable event scheduler on dbstore2001 - T168354
[07:16:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:14] T168354: dbstore2001: s5 thread isn't able to catch up with the master - https://phabricator.wikimedia.org/T168354
[07:26:34] (PS2) Alexandros Kosiaris: Remove the etherpad type from export [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/361487 (https://phabricator.wikimedia.org/T168485)
[07:27:28] (CR) Alexandros Kosiaris: [C: 1] remove torrus.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/361618 (owner: Dzahn)
[07:28:00] (CR) Alexandros Kosiaris: [C: 1] varnish/bacula: remove torrus directors [puppet] - https://gerrit.wikimedia.org/r/361617 (owner: Dzahn)
[07:33:26] (CR) Alexandros Kosiaris: "Just out of curiosity, how many labs projects (aside from paladox themselves) actually use this ? A few years back (2014) matanya tried to " [puppet] - https://gerrit.wikimedia.org/r/361361 (owner: Paladox)
[07:37:11] (CR) Alexandros Kosiaris: [C: 2] tilerator: mock secret() [puppet] - https://gerrit.wikimedia.org/r/361586 (owner: Hashar)
[07:37:15] (PS2) Alexandros Kosiaris: tilerator: mock secret() [puppet] - https://gerrit.wikimedia.org/r/361586 (owner: Hashar)
[07:37:49] (CR) Alexandros Kosiaris: [C: 2] Remove the etherpad type from export [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/361487 (https://phabricator.wikimedia.org/T168485) (owner: Alexandros Kosiaris)
[07:39:04] (CR) Filippo Giunchedi: Serve thumbnails for all public wikis with Thumbor (1 comment) [puppet] - https://gerrit.wikimedia.org/r/361479 (https://phabricator.wikimedia.org/T167796) (owner: Gilles)
[07:42:51] (CR) Gilles: Serve thumbnails for all public wikis with Thumbor (1 comment) [puppet] - https://gerrit.wikimedia.org/r/361479 (https://phabricator.wikimedia.org/T167796) (owner: Gilles)
[07:43:10] Operations, Commons, Traffic, Wikimedia-Site-requests, and 2 others: Allow anonymous users to change interface language on Commons with ULS - https://phabricator.wikimedia.org/T161517#3381283 (Nikerabbit) >>! In T161517#3380185, @BBlack wrote: > This task has gotten a bit confusing. I wrote a pr...
[07:44:36] (PS2) Gilles: Serve thumbnails for all public wikis with Thumbor [puppet] - https://gerrit.wikimedia.org/r/361479 (https://phabricator.wikimedia.org/T167796)
[07:45:39] (CR) Filippo Giunchedi: [C: 1] varnish/bacula: remove torrus directors [puppet] - https://gerrit.wikimedia.org/r/361617 (owner: Dzahn)
[07:46:07] (CR) Alexandros Kosiaris: [C: 1] "This looks fine to me, but I 'd rather Ottomata had a look as well" [puppet] - https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: Ppchelko)
[07:47:33] (CR) Filippo Giunchedi: [C: 1] site: remove smokeping role from netmon1001 [puppet] - https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[07:48:38] _joe_ yes you are right :) Is stopping the jobcrons acceptable? I think that 2/3 hours should be good enough
[07:48:53] (CR) Alexandros Kosiaris: [C: 2] aptrepo: fix spec [puppet] - https://gerrit.wikimedia.org/r/361577 (owner: Hashar)
[07:48:58] (PS2) Alexandros Kosiaris: aptrepo: fix spec [puppet] - https://gerrit.wikimedia.org/r/361577 (owner: Hashar)
[07:49:07] delayed jobs might not be started in time but it shouldn't be a big deal
[07:49:26] (PS2) Alexandros Kosiaris: nrpe: fix spec (depends on stdlib) [puppet] - https://gerrit.wikimedia.org/r/361580 (owner: Hashar)
[07:49:33] (CR) Alexandros Kosiaris: [V: 2 C: 2] aptrepo: fix spec [puppet] - https://gerrit.wikimedia.org/r/361577 (owner: Hashar)
[07:49:43] (CR) Alexandros Kosiaris: [C: 2] nrpe: fix spec (depends on stdlib) [puppet] - https://gerrit.wikimedia.org/r/361580 (owner: Hashar)
[07:49:50] (PS3) Alexandros Kosiaris: nrpe: fix spec (depends on stdlib) [puppet] - https://gerrit.wikimedia.org/r/361580 (owner: Hashar)
[07:49:53] (CR) Alexandros Kosiaris: [V: 2 C: 2] nrpe: fix spec (depends on stdlib) [puppet] - https://gerrit.wikimedia.org/r/361580 (owner: Hashar)
[07:50:10] (CR) Alexandros Kosiaris: [C: 2] jenkins: fix spec [puppet] - https://gerrit.wikimedia.org/r/361578 (owner: Hashar)
[07:50:15] (PS2) Alexandros Kosiaris: jenkins: fix spec [puppet] - https://gerrit.wikimedia.org/r/361578 (owner: Hashar)
[07:50:17] (CR) Alexandros Kosiaris: [V: 2 C: 2] jenkins: fix spec [puppet] - https://gerrit.wikimedia.org/r/361578 (owner: Hashar)
[07:52:54] (PS1) Alexandros Kosiaris: Bump debian revision to reflect latest patch level [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/361630 (https://phabricator.wikimedia.org/T168485)
[07:53:44] <_joe_> elukey: I hereby decree it is
[07:54:59] Operations, ops-eqiad, Analytics: Smartctl errors for one kafka1012 disk - https://phabricator.wikimedia.org/T168927#3381297 (elukey)
[07:57:06] !log reboot maps codfw cluster
[07:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:04] (PS6) Giuseppe Lavagetto: Add build script plus nodejs base images [docker-images/production-images] - https://gerrit.wikimedia.org/r/360813
[08:01:40] (PS3) Filippo Giunchedi: Serve thumbnails for all public wikis with Thumbor [puppet] - https://gerrit.wikimedia.org/r/361479 (https://phabricator.wikimedia.org/T167796) (owner: Gilles)
[08:04:24] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3381334 (Urbanecm) In that case I think the requestor (or somebody else) should delete all pages and import new pages from incubator. I'll create a patch for reope...
[08:04:57] (CR) Filippo Giunchedi: [C: 2] Serve thumbnails for all public wikis with Thumbor [puppet] - https://gerrit.wikimedia.org/r/361479 (https://phabricator.wikimedia.org/T167796) (owner: Gilles)
[08:05:19] (CR) Alexandros Kosiaris: [C: 2] Bump debian revision to reflect latest patch level [debs/etherpad-lite] - https://gerrit.wikimedia.org/r/361630 (https://phabricator.wikimedia.org/T168485) (owner: Alexandros Kosiaris)
[08:08:06] !log roll-restart swift-proxy on ms-fe1* to pick up thumbor changes
[08:08:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:17] (PS4) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:09:40] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[08:10:03] (PS5) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:11:04] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[08:11:42] Operations, ops-codfw: Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3381349 (Marostegui) CC @mobrovac
[08:14:15] !log Re-enable event scheduler on dbstore2001 - T168354
[08:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:25] T168354: dbstore2001: s5 thread isn't able to catch up with the master - https://phabricator.wikimedia.org/T168354
[08:18:08] (PS6) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:18:32] !log stop jobcron/jobrunner on mw116[12] and reboot the hosts for kernel updates
[08:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:56] (PS1) Filippo Giunchedi: Update WMF logo url [puppet] - https://gerrit.wikimedia.org/r/361632
[08:19:09] (PS7) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:19:44] (PS2) Filippo Giunchedi: Update WMF logo url [puppet] - https://gerrit.wikimedia.org/r/361632
[08:20:41] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[08:25:13] (CR) Filippo Giunchedi: [C: 2] Update WMF logo url [puppet] - https://gerrit.wikimedia.org/r/361632 (owner: Filippo Giunchedi)
[08:25:38] !log upload etherpad-lite_1.6.0-3 to apt.wikimedia.org/jessie-wikimedia/main
[08:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:26] (CR) Hashar: [C: 1] nslcd: Remove Labs shell override [puppet] - https://gerrit.wikimedia.org/r/361595 (https://phabricator.wikimedia.org/T86668) (owner: BryanDavis)
[08:26:49] Operations, LDAP-Access-Requests, Labs, Labs-Infrastructure, Patch-For-Review: Make all ldap users have a sane shell (/bin/bash) - https://phabricator.wikimedia.org/T86668#3381412 (hashar) Thank you @bd808 !
[08:31:29] mw116[12] back in service
[08:33:24] !log restart of maps codfw cluster completed
[08:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:38] (PS8) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:42:47] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[08:44:36] !log reboot maps eqiad cluster
[08:44:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:21] (PS9) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:46:32] !log executing alter tables to the log database on db1047 for https://phabricator.wikimedia.org/T167162#3340421
[08:46:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:49] (Manuel is executing them so don't be scared)
[08:46:59] (be even more scared!)
[08:47:16] (worse than Luca doing things? Nah :)
[08:47:25] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[08:54:59] (PS10) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[08:55:04] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table rel13testwiki.searchindex doesnt exist on query. Default database: rel13testwiki. [Query snipped]
[08:55:25] ^ expected and I will fix that
[08:56:04] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[08:59:59] !log stop puppet and eventlogging_sync on db1047
[09:00:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:14] PROBLEM - eventlogging_sync processes on db1047 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /bin/bash /usr/local/bin/eventlogging_sync.sh
[09:02:28] this is me --^
[09:04:49] (CR) Volans: [C: 1] "Disclaimer: I'm not familiar with the Python API of Docker building tools." (7 comments) [docker-images/production-images] - https://gerrit.wikimedia.org/r/360813 (owner: Giuseppe Lavagetto)
[09:10:53] (PS6) Volans: Package metadata and testing tools improvements [software/cumin] - https://gerrit.wikimedia.org/r/338808 (https://phabricator.wikimedia.org/T154588)
[09:10:55] (PS2) Volans: Tests: convert unittest to pytest [software/cumin] - https://gerrit.wikimedia.org/r/361274 (https://phabricator.wikimedia.org/T154588)
[09:15:55] (PS11) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[09:19:03] (PS1) Volans: TODO: remove rejected item [software/cumin] - https://gerrit.wikimedia.org/r/361638
[09:20:36] Operations, Discovery-Analysis: Upgrade pandoc package to at least 1.12.3 - https://phabricator.wikimedia.org/T168683#3372065 (Gehel) pandoc 1.12.4 is already available in Debian Jessie, maybe it is time to migrate discovery dashboards to Jessie?
[09:21:25] (CR) Jcrespo: "I am relatively confident on applying the current patch, and continue refactoring in other patches (this is in no way a final state)." [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[09:21:37] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/6856/" [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[09:24:27] !log restart of maps eqiad cluster completed
[09:24:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:59] (PS12) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[09:31:04] (CR) jerkins-bot: [V: -1] mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[09:33:07] (PS13) Jcrespo: mariadb: refactor option support and move it to hiera [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507)
[09:37:29] (CR) Alexandros Kosiaris: [C: -1] Add build script plus nodejs base images (8 comments) [docker-images/production-images] - https://gerrit.wikimedia.org/r/360813 (owner: Giuseppe Lavagetto)
[09:41:30] (PS2) Hashar: servermon: fix spec [puppet] - https://gerrit.wikimedia.org/r/361590
[09:42:20] (CR) Marostegui: [C: 1] "This looks good to me" [puppet] - https://gerrit.wikimedia.org/r/361456 (https://phabricator.wikimedia.org/T148507) (owner: Jcrespo)
[09:42:27] (CR) Hashar: servermon: fix spec (1 comment) [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[09:43:36] !log bawolff@tin Synchronized php-1.30.0-wmf.6/api.php: Use redirect for api requests with pathinfo (duration: 00m 43s)
[09:43:39] Operations, User-fgiunchedi: Some swift disks wrongly mounted on 5 ms-be hosts - https://phabricator.wikimedia.org/T163673#3381609 (fgiunchedi) The scsi LUN numbers seem to be always correct, I'm assuming because they don't change if the underlying LV isn't changed. afaics `hpssa` will yield the devices...
[09:43:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:54] !log executing alter tables to the log database on dbstore1002 for https://phabricator.wikimedia.org/T167162#3340421
[09:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:06] (CR) Alexandros Kosiaris: servermon: fix spec (1 comment) [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[09:58:58] (CR) Hashar: servermon: fix spec (1 comment) [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[09:59:40] (PS3) Hashar: servermon: fix spec [puppet] - https://gerrit.wikimedia.org/r/361590
[10:00:49] akosiaris: you are magic :]
[10:04:31] (CR) Alexandros Kosiaris: [C: 2] servermon: fix spec [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[10:04:32] (PS4) Alexandros Kosiaris: servermon: fix spec [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[10:04:34] (CR) Alexandros Kosiaris: [V: 2 C: 2] servermon: fix spec [puppet] - https://gerrit.wikimedia.org/r/361590 (owner: Hashar)
[10:07:29] hashar: I actually had to amend my code in various places for this lately, that's why I knew it
[10:21:24] Operations, MobileFrontend, Reading-Web-Backlog, Traffic, and 3 others: Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3381692 (phuedx) @ema: Last week's train ran, so you should be able to merge and deploy {332d2ca8abac7393494b11a96e3b02834f209b4b} now.
[10:23:14] RECOVERY - eventlogging_sync processes on db1047 is OK: PROCS OK: 1 process with UID = 0 (root), args /bin/bash /usr/local/bin/eventlogging_sync.sh
[10:25:02] !log re-enabled puppet and eventlogging_sync on db1047
[10:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:36] (PS1) KartikMistry: apertium-cat: New upstream release [debs/contenttranslation/apertium-cat] - https://gerrit.wikimedia.org/r/361644 (https://phabricator.wikimedia.org/T168857)
[10:29:34] !log stop jobcron/jobrunner on mw116[34] and reboot the hosts for kernel updates
[10:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:10] (PS1) Hashar: elasticsearch: pass $publish_host in specs [puppet] - https://gerrit.wikimedia.org/r/361645
[10:32:38] (PS1) KartikMistry: apertium-spa: New upstream release [debs/contenttranslation/apertium-spa] - https://gerrit.wikimedia.org/r/361646 (https://phabricator.wikimedia.org/T168857)
[10:41:48] powercycling mw1163, seems stuck
[10:43:58] better now :)
[10:48:59] mw116[34] back in service
[10:53:21] (PS1) Filippo Giunchedi: swift: ship udev rules for swift disks [puppet] - https://gerrit.wikimedia.org/r/361647 (https://phabricator.wikimedia.org/T163673)
[10:53:23] (PS1) Filippo Giunchedi: swift: use implicit /dev/swift prefix for swift devices [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673)
[11:02:40] (CR) Gehel: [C: 2] elasticsearch: pass $publish_host in specs [puppet] - https://gerrit.wikimedia.org/r/361645 (owner: Hashar)
[11:09:55] (CR) Filippo Giunchedi: "Not tested in beta yet" [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: Filippo Giunchedi)
[11:13:56] (CR) Alexandros Kosiaris: [C: 2] apertium-spa: New upstream release [debs/contenttranslation/apertium-spa] - https://gerrit.wikimedia.org/r/361646 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:14:06] (CR) Alexandros Kosiaris: [C: 2] apertium-cat: New upstream release [debs/contenttranslation/apertium-cat] - https://gerrit.wikimedia.org/r/361644 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:17:44] (PS4) Lucas Werkmeister (WMDE): Configure WikibaseQualityConstraints extension [mediawiki-config] - https://gerrit.wikimedia.org/r/358553 (https://phabricator.wikimedia.org/T168938)
[11:18:13] (CR) Lucas Werkmeister (WMDE): "Rebased and commit message updated to refer to T168938." [mediawiki-config] - https://gerrit.wikimedia.org/r/358553 (https://phabricator.wikimedia.org/T168938) (owner: Lucas Werkmeister (WMDE))
[11:32:31] (PS1) KartikMistry: apertium-spa-cat: New upstream version [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/361650 (https://phabricator.wikimedia.org/T168857)
[11:33:03] (CR) jerkins-bot: [V: -1] apertium-spa-cat: New upstream version [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/361650 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:34:26] (CR) KartikMistry: "Depends on latest apertium-cat and apertium-spa." [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/361650 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:36:25] !log upload apertium-cat_2.2.0~r79715-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
[11:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:37] !log upload apertium-spa_1.1.0~r79716-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
[11:36:41] kart_: ^
[11:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:53] !log stop jobcron/jobrunner on mw116[56] and reboot the hosts for kernel updates
[11:37:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:07] akosiaris: cool. will recheck for apertium-spa-cat
[11:37:19] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/361650 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:43:11] (CR) Alexandros Kosiaris: [C: 2] apertium-spa-cat: New upstream version [debs/contenttranslation/apertium-spa-cat] - https://gerrit.wikimedia.org/r/361650 (https://phabricator.wikimedia.org/T168857) (owner: KartikMistry)
[11:48:24] !log upload apertium-spa-cat_2.1.0~r79717-1 to apt.wikimedia.org/jessie-wikimedia/main
[11:48:31] kart_: ^
[11:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:48:37] mw116[5,6] back in service
[11:49:24] PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=75%)
[11:50:34] really?
[11:50:46] is neutron up?
[11:51:01] no it is not
[11:51:02] grrr
[11:52:40] akosiaris: thanks!
[11:53:24] RECOVERY - Disk space on labtestnet2001 is OK: DISK OK
[11:53:44] akosiaris: do we need to restart apertium-apy service?
[11:54:56] !log stop nova-spiceproxy and neutron-metadata-agent on labtestnet2001 to avoid root partition to fill up
[11:54:59] andrewbogott: --^
[11:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:14] there are multiple non puppetized things in there that are spamming
[11:55:20] not sure if I got all of them
[11:58:07] elukey: shut it down ;)
[11:58:24] !log Deploy alter table on s6 directly on codfw master (db2028) to let it replicate - T168661
[11:58:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:35] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661
[12:06:21] !log stop jobcron/jobrunner on mw1167 and mw1299 and reboot the hosts for kernel updates
[12:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:23] !log Deploy alter table on s5 directly on codfw master (db2023) to let it replicate - T168661
[12:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:32] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661
[12:20:56] mw1167 and mw1299 back in service
[12:35:30] !log Deploy alter table on s4 directly on codfw master (db2019) to let it replicate - T168661
[12:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:40] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661
[12:39:22] (CR) Mforns: "I think the filter for mediawiki tables could be improved, what do you think?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[12:40:14] (PS1) Volans: Icinga: fix wikitech-static check [puppet] - https://gerrit.wikimedia.org/r/361654 (https://phabricator.wikimedia.org/T128209)
[12:40:17] (Abandoned) Mforns: Change EL purging script to avoid limit/offset [puppet] - https://gerrit.wikimedia.org/r/359938 (https://phabricator.wikimedia.org/T168071) (owner: Mforns)
[12:41:39] (PS2) Volans: Icinga: fix wikitech-static check [puppet] - https://gerrit.wikimedia.org/r/361654 (https://phabricator.wikimedia.org/T128209)
[12:41:57] (Abandoned) Mforns: [WIP] Modify EL purging script to not use limit/offset [puppet] - https://gerrit.wikimedia.org/r/359442 (https://phabricator.wikimedia.org/T168071) (owner: Mforns)
[12:43:24] (CR) Volans: [C: 2] Icinga: fix wikitech-static check [puppet] - https://gerrit.wikimedia.org/r/361654 (https://phabricator.wikimedia.org/T128209) (owner: Volans)
[12:45:55] RECOVERY - are wikitech and wt-static in sync on labtestweb2001 is OK: wikitech-static OK - wikitech and wikitech-static in sync (42964 200000s)
[12:45:55] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (42964 200000s)
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T1300). Please do the needful.
[13:00:05] anomie: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:01:44] no patches
[13:01:55] jouncebot: refresh
[13:02:00] I refreshed my knowledge about deployments.
[13:03:04] hashar: wait, wasn't there one patch for today?
[13:03:34] Operations, ops-eqiad, DBA, Patch-For-Review: x1 master db1031: Faulty BBU - https://phabricator.wikimedia.org/T166108#3382080 (Marostegui) Open>stalled I have changed it to stalled as I don't think we are replacing its BBU anytime soon - it is a master.
[13:03:47] Operations, Graphite, User-fgiunchedi: Something puts many different metrics into graphite, allocating a lot of disk space - https://phabricator.wikimedia.org/T1075#3382083 (fgiunchedi)
[13:03:49] ah https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1763076&oldid=1763020
[13:04:32] Operations, Graphite, User-fgiunchedi: Something puts many different metrics into graphite, allocating a lot of disk space - https://phabricator.wikimedia.org/T1075#18643 (fgiunchedi)
[13:05:19] Operations, Graphite, User-fgiunchedi: Something puts many different metrics into graphite, allocating a lot of disk space - https://phabricator.wikimedia.org/T1075#18643 (fgiunchedi) @ArielGlenn thanks! yeah the eventstreams/rdkafka task is T160644, `instances` is also supposed to move to labmon in...
[13:11:11] (PS1) Jcrespo: Parsercache: Purge rows every day, and reduce TTL to 22 days [puppet] - https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784)
[13:12:03] (CR) Marostegui: [C: 1] Parsercache: Purge rows every day, and reduce TTL to 22 days [puppet] - https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784) (owner: Jcrespo)
[13:12:43] (PS2) Jcrespo: Parsercache: Purge rows every day, and reduce TTL to 22 days [puppet] - https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784)
[13:18:02] _joe_: have a minute for me to pick your brain about a puppet thing?
[13:20:32] (PS1) Jcrespo: Parsercache: Reduce expiration time to 22 days [mediawiki-config] - https://gerrit.wikimedia.org/r/361659 (https://phabricator.wikimedia.org/T167784)
[13:23:57] (PS3) Jcrespo: Parsercache: Purge rows every day, and reduce TTL to 22 days [puppet] - https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784)
[13:24:21] elukey: I am renumbering bohrium as we speak. see https://gerrit.wikimedia.org/r/#/c/361660/1. I'll also reboot it in the process
[13:30:51] away
[13:30:53] argh
[13:31:03] akosiaris: sure, ack, thanks for the ping :)
[13:31:32] (CR) Alexandros Kosiaris: [C: 2] Renumber bohrium.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/361660 (owner: Alexandros Kosiaris)
[13:32:05] RECOVERY - MariaDB Slave Lag: s6 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 89923.33 seconds
[13:33:45] PROBLEM - Host bohrium is DOWN: PING CRITICAL - Packet loss = 100%
[13:35:47] could someone please merge a CI patch for me? https://gerrit.wikimedia.org/r/#/c/342635/ it moves packages around so they are available on Nodepool instances
[13:36:32] Operations, Commons, MediaWiki-File-management, Multimedia, and 3 others: ForeignAPIRepo wrongly returns non-protocol-relative URLs for original "thumbs" - https://phabricator.wikimedia.org/T50133#534558 (Anomie) >>! In T50133#1614931, @Tgr wrote: > The relevant part is `ApiQueryImageInfo.php` [[...
[13:38:00] (PS2) Filippo Giunchedi: swift: ship udev rules for swift disks [puppet] - https://gerrit.wikimedia.org/r/361647 (https://phabricator.wikimedia.org/T163673)
[13:38:02] (PS2) Filippo Giunchedi: swift: use implicit /dev/swift prefix for swift devices [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673)
[13:38:04] (PS1) Filippo Giunchedi: profile: update labs swift storage symlink [puppet] - https://gerrit.wikimedia.org/r/361665 (https://phabricator.wikimedia.org/T163673)
[13:39:10] elukey: done. waiting for puppet to propagate to icinga the IP change now
[13:39:24] <_joe_> urandom: in 10 minutes
[13:39:38] _joe_: sure; let me know
[13:40:54] (CR) Alexandros Kosiaris: "recheck" [dns] - https://gerrit.wikimedia.org/r/361663 (owner: Alexandros Kosiaris)
[13:41:14] RECOVERY - Host bohrium is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[13:41:20] hashar: can you run pcc on the affected hosts?
[13:41:33] elukey: there is none. It is labs only :]
[13:41:59] elukey: though you are right. I should cherry pick it on the puppet master
[13:42:23] super thanks :)
[13:43:07] (CR) Alexandros Kosiaris: [C: 2] Lower TTLs for various host renumberings [dns] - https://gerrit.wikimedia.org/r/361661 (owner: Alexandros Kosiaris)
[13:45:15] elukey: definitely a noop :)
[13:45:28] (PS6) Filippo Giunchedi: swift: lower replication interval for beta [puppet] - https://gerrit.wikimedia.org/r/344387 (https://phabricator.wikimedia.org/T160990) (owner: Hashar)
[13:45:51] hashar: I've rebased it onto current production, was causing a conflict on beta puppetmaster ^
[13:47:46] (PS2) Filippo Giunchedi: profile: update labs swift storage symlink [puppet] - https://gerrit.wikimedia.org/r/361665 (https://phabricator.wikimedia.org/T163673)
[13:48:13] (PS5) Elukey: contint: PHP ext build dependencies on Nodepool [puppet] - https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: Hashar)
[13:49:52] <_joe_> urandom: here I am
[13:50:09] (CR) Elukey: [C: 2] contint: PHP ext build dependencies on Nodepool [puppet] - https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: Hashar)
[13:50:18] elukey: thank you :]
[13:50:35] yw!
[13:50:54] _joe_: cool; i need to make a change to the cassandra module and am coming up short on a satisfying approach
[13:51:08] <_joe_> urandom: what do you need to do?
[13:51:27] _joe_: basically, i need to be able to assign more than 1 data_file_directories, on an instance-by-instance basis
[13:51:47] <_joe_> uhm ok
[13:51:57] * urandom is getting a link
[13:52:00] !log Rename table enwiki.localisation_file_hash on db1089 - T119811
[13:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:11] T119811: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811
[13:52:20] _joe_: https://github.com/wikimedia/puppet/blob/production/modules/cassandra/manifests/instance.pp#L80
[13:52:47] <_joe_> so you want to have a different data_file_directory value per instance or multiple values per instance?
[13:53:08] _joe_: you can see that in a single instance config (above that), that you can assign an array of values, but for a multi-instance config it's hard-coded to one
[13:53:20] yeah, an arbitrary number
[13:53:34] <_joe_> uh ok, this is indeed complex
[13:55:11] _joe_: it's probably safe to assume that all will share a base (ala data_directory_base)
[13:55:36] that is currently also something you can assign to a 'default'/single instance
[13:56:01] <_joe_> urandom: let me think of something
[13:56:04] k
[13:56:14] <_joe_> but it won't be pretty :P
[13:56:24] _joe_: nothing i could think of was
[13:57:35] it's puppet after all ;)
[13:58:27] volans: touche
[14:06:50] (PS3) Filippo Giunchedi: profile: fix labs swift storage [puppet] - https://gerrit.wikimedia.org/r/361665 (https://phabricator.wikimedia.org/T163673)
[14:06:52] (PS3) Filippo Giunchedi: swift: ship udev rules for swift disks [puppet] - https://gerrit.wikimedia.org/r/361647 (https://phabricator.wikimedia.org/T163673)
[14:06:54] (PS3) Filippo Giunchedi: swift: use implicit /dev/swift prefix for swift devices [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673)
[14:07:59] (PS1) Giuseppe Lavagetto: cassandra::instance: allow defining multiple data directories [puppet] - https://gerrit.wikimedia.org/r/361673
[14:08:11] <_joe_> urandom: ^^ sig
[14:08:14] <_joe_> *sigh
[14:08:18] (CR) Filippo Giunchedi: [C: 2] profile: fix labs swift storage [puppet] - https://gerrit.wikimedia.org/r/361665 (https://phabricator.wikimedia.org/T163673) (owner: Filippo Giunchedi)
[14:08:23] (PS4) Filippo Giunchedi: profile: fix labs swift storage [puppet] - https://gerrit.wikimedia.org/r/361665 (https://phabricator.wikimedia.org/T163673)
[14:10:09] _joe_: looking...
[14:11:51] (CR) Filippo Giunchedi: "cherry-picked in beta and hiera adjusted accordingly, it works!" [puppet] - https://gerrit.wikimedia.org/r/361648 (https://phabricator.wikimedia.org/T163673) (owner: Filippo Giunchedi)
[14:14:01] _joe_: so, the semantics here would be that if you are configuring a single instance, then you need to use a fully qualified set of paths, but if you are configuring a multi-instance, that array becomes the subdirectories against a base directory?
[14:15:04] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2011828
[14:15:30] _joe_: it just occurred to me that it is already the case that in either situation, you can specify a data_directory_base, and only in the former does it actually guarantee that your directories are in the base
[14:15:47] errr... the latter, the multi-instance config
[14:15:53] so... it's already a mess
[14:16:03] s/mess/inconsistent/
[14:17:26] <_joe_> sorry, I'm not sure I understand urandom
[14:17:32] <_joe_> so going in order
[14:19:04] if you specify 'data_file_directories' for a single instance config (aka instance_name == 'default'), then the array needs to contain fully qualified paths
[14:19:27] <_joe_> ah right, I can fix that ofc
[14:20:00] <_joe_> if you want to define multiple data directories you can just add the 'data_file_directories' entry here https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/restbase/production.yaml#L65
[14:20:35] (PS4) Filippo Giunchedi: aptrepo: add hp-mcp to stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/357422 (https://phabricator.wikimedia.org/T162609)
[14:21:32] _joe_: yup
[14:21:50] (CR) Gilles: [C: 1] hieradata: add thumbor100[34] to thumbor nutcracker [puppet] - https://gerrit.wikimedia.org/r/361454 (owner: Filippo Giunchedi)
[14:22:07] (PS5) Filippo Giunchedi: aptrepo: add hp-mcp to stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/357422 (https://phabricator.wikimedia.org/T162609)
[14:22:34] !log stop jobcron/jobrunner on mw1300 and mw1301 and reboot the hosts for kernel updates
[14:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:59] (CR) Filippo Giunchedi: "I've minimized the changes to just include mcp updates, we can untangle thirdparty/hwraid later on" [puppet] - https://gerrit.wikimedia.org/r/357422 (https://phabricator.wikimedia.org/T162609) (owner: Filippo Giunchedi)
[14:23:03] Operations, fundraising-tech-ops, netops: BGP session between pfw clusters flapping - https://phabricator.wikimedia.org/T164777#3382359 (ayounsi) Open>Invalid Going to replace the pfw soon, not worth investigating more, unless it's causing visible issues.
[14:23:06] Operations, Cassandra, RESTBase, RESTBase-Cassandra, and 2 others: Option: Consider switching back to leveled compaction (LCS) - https://phabricator.wikimedia.org/T153703#3382361 (GWicke) @eevans, I know you have switched some keyspaces on the dev cluster from TWCS to LCS. What has been the effec...
[14:23:18] jouncebot: next
[14:23:20] In 1 hour(s) and 36 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T1600)
[14:23:52] (PS2) Filippo Giunchedi: hieradata: add thumbor100[34] to thumbor nutcracker [puppet] - https://gerrit.wikimedia.org/r/361454
[14:24:24] Operations, DNS, Traffic, Services (watching): icinga alerts on nodejs services when a recdns server is depooled - https://phabricator.wikimedia.org/T162818#3382365 (GWicke)
[14:26:02] (CR) Filippo Giunchedi: [C: 2] hieradata: add thumbor100[34] to thumbor nutcracker [puppet] - https://gerrit.wikimedia.org/r/361454 (owner: Filippo Giunchedi)
[14:26:12] (PS3) Rush: maintain-views.yaml: Remove unused views [puppet] - https://gerrit.wikimedia.org/r/361034 (https://phabricator.wikimedia.org/T153213) (owner: Marostegui)
[14:26:28] Operations, Services (next): Set up grafana alerting for services - https://phabricator.wikimedia.org/T162765#3382371 (GWicke)
[14:26:33] Operations, monitoring, netops: Setup flow monitoring of *internal* network traffic - https://phabricator.wikimedia.org/T79755#3382373 (ayounsi) Open>Resolved Alerts added to the dashboard (not tied to nagios, but shows up in the "single pane of glass" dashboard in LibreNMS. I think that sati...
[14:27:57] (PS2) Giuseppe Lavagetto: cassandra::instance: allow defining multiple data directories [puppet] - https://gerrit.wikimedia.org/r/361673
[14:28:46] (CR) Gehel: Switch this repo to a deb package (1 comment) [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/352170 (https://phabricator.wikimedia.org/T158560) (owner: DCausse)
[14:29:28] (CR) Rush: [C: 2] maintain-views.yaml: Remove unused views [puppet] - https://gerrit.wikimedia.org/r/361034 (https://phabricator.wikimedia.org/T153213) (owner: Marostegui)
[14:30:06] PROBLEM - puppet last run on labtestpuppetmaster2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[14:32:01] (CR) Ppchelko: "@Alexandros Ottomata is away for 2 weeks, and this is blocking ORES, so I decided to go ahead with it. Also, the ticket proposing the idea" [puppet] - https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: Ppchelko)
[14:34:14] (CR) Eevans: cassandra::instance: allow defining multiple data directories (2 comments) [puppet] - https://gerrit.wikimedia.org/r/361673 (owner: Giuseppe Lavagetto)
[14:35:22] _joe_: i wasn't trying to get you to do this for me btw :)
[14:35:35] _joe_: but i do appreciate your help
[14:37:58] (PS1) Andrew Bogott: Puppetmaster: Remove some config code that is never used. [puppet] - https://gerrit.wikimedia.org/r/361675
[14:38:14] (PS4) Filippo Giunchedi: phabricator: redirect serveraliases homepage to phab_servername [puppet] - https://gerrit.wikimedia.org/r/355769 (https://phabricator.wikimedia.org/T166120)
[14:39:51] (CR) Filippo Giunchedi: [C: 2] phabricator: redirect serveraliases homepage to phab_servername [puppet] - https://gerrit.wikimedia.org/r/355769 (https://phabricator.wikimedia.org/T166120) (owner: Filippo Giunchedi)
[14:43:48] Operations, ops-ulsfo: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#3382455 (ayounsi)
[14:45:03] Operations, Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144#3382463 (Jgreen)
[14:45:06] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#3382460 (Jgreen) Open>Resolved a:Jgreen I submitted this task to techsupport@ so OIT will see it, closing in Phabricator.
[14:46:27] Operations, netops: Filter outgoing BGP announcements on AS regex - https://phabricator.wikimedia.org/T83037#3382464 (ayounsi) Open>Resolved a:ayounsi I believe the previous comment fixes the initial request, other improvements (like community no-export or as-path will be investigated in larg...
[14:46:56] (CR) Andrew Bogott: "the puppet compiler confirms that this is a no-op for puppetmaster1001 and 1002. https://integration.wikimedia.org/ci/job/operations-pupp" [puppet] - https://gerrit.wikimedia.org/r/361675 (owner: Andrew Bogott)
[14:50:25] Operations, ops-codfw, Services (watching): Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3382488 (mobrovac) Any day works, the sooner the better. Some time this week?
[14:50:54] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[14:52:56] Operations, fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3382497 (Jgreen)
[14:56:18] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3382535 (Luke081515) I would prefer, if we do the reopen first, otherwise, if there is a lag between import and reopen, changes at incubator can get missed. We can...
[14:56:48] Operations, Commons, Traffic, Wikimedia-Site-requests, and 2 others: Allow anonymous users to change interface language on Commons with ULS - https://phabricator.wikimedia.org/T161517#3382537 (Nikerabbit) Thanks for the detailed reply which explains why it isn't so simple as I thought. I need to...
[14:59:54] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[15:05:12] (CR) Giuseppe Lavagetto: [C: 1] Puppetmaster: Remove some config code that is never used. [puppet] - https://gerrit.wikimedia.org/r/361675 (owner: Andrew Bogott)
[15:05:31] (CR) Giuseppe Lavagetto: [C: 1] "I would double-check the labs puppetmasters, but it seems safe." [puppet] - https://gerrit.wikimedia.org/r/361675 (owner: Andrew Bogott)
[15:05:53] (CR) DCausse: Switch this repo to a deb package (1 comment) [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/352170 (https://phabricator.wikimedia.org/T158560) (owner: DCausse)
[15:09:24] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2043595
[15:09:24] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 22 probes of 296 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[15:09:37] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3382585 (MF-Warburg) >>! In T168764#3381334, @Urbanecm wrote: > In that case I think the requestor (or somebody else) should delete all pages and import new pages...
[15:10:17] Operations, DBA, Wikimedia-Language-setup, Wikimedia-Site-requests: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3382612 (MF-Warburg) >>! In T168764#3382535, @Luke081515 wrote: > We can add .* to the TBL after removed from closed lists to prevent new pages before import finis...
[15:14:20] 10Operations, 10Labs, 10wikitech.wikimedia.org: wikitech-static sync check shouldn't happen so often - https://phabricator.wikimedia.org/T168962#3382632 (10Andrew) [15:14:24] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 14 probes of 296 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [15:14:43] <_joe_> urandom: the whole 'data_file_directories' thing is quite the clusterfuck [15:15:00] <_joe_> you have two different uses of it [15:19:19] (03CR) 10Giuseppe Lavagetto: cassandra::instance: allow defining multiple data directories (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/361673 (owner: 10Giuseppe Lavagetto) [15:19:24] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 511 [15:19:35] (03PS3) 10Giuseppe Lavagetto: cassandra::instance: allow defining multiple data directories [puppet] - 10https://gerrit.wikimedia.org/r/361673 [15:22:08] (03CR) 10Paladox: "This fails on stretch as php5-tidy is php-tidy (i.e. 
php7.0-tidy.)" [puppet] - 10https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: 10Hashar) [15:30:00] 10Operations, 10Phabricator, 10Traffic, 10Patch-For-Review, 10User-fgiunchedi: phab.wmfusercontent.org "homepage" yields a 500 - https://phabricator.wikimedia.org/T166120#3382702 (10mmodell) 05Open>03Resolved a:03fgiunchedi [15:31:25] (03Draft1) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 [15:31:28] (03PS2) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 [15:35:51] (03PS3) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) [15:36:01] _joe_: yeah :( [15:37:56] 10Operations, 10ops-codfw, 10Services (watching): Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3382751 (10Papaul) @mobrovac what about this Thursday 29th at 10:00AM CDT ? [15:39:59] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361681 (https://phabricator.wikimedia.org/T142582) [15:40:26] (03CR) 10Nuria: "@Pchelko: Can you be more more explicit as to why is this change blocking ORES?" 
[puppet] - 10https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: 10Ppchelko) [15:40:40] (03CR) 10jerkins-bot: [V: 04-1] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361681 (https://phabricator.wikimedia.org/T142582) (owner: 10Jdrewniak) [15:41:13] <_joe_> urandom: so, I moved my trick to the specific part [15:43:52] 10Operations, 10Analytics-Kanban, 10User-Elukey: New analytic hosts with BBU learning cycle enabled - https://phabricator.wikimedia.org/T167809#3382757 (10Nuria) 05Open>03Resolved [15:44:06] (03PS1) 10Papaul: DNS: Remove mgmt DNS entries for ms-be20[0-1[1-9] [dns] - 10https://gerrit.wikimedia.org/r/361682 [15:45:47] (03CR) 10Jdlrobson: [C: 031] "This can be +2ed now. Code has been removed from MobileFrontend and is live in production." [puppet] - 10https://gerrit.wikimedia.org/r/359417 (https://phabricator.wikimedia.org/T168013) (owner: 10Ema) [15:47:34] 10Operations, 10Discovery, 10Discovery-Analysis, 10Discovery-Search (Current work): Upload shiny-server .deb to our Jessie apt repository - https://phabricator.wikimedia.org/T168967#3382767 (10Gehel) [15:47:38] 10Operations, 10Discovery, 10Discovery-Analysis, 10Discovery-Search (Current work): Upload shiny-server .deb to our Jessie apt repository - https://phabricator.wikimedia.org/T168967#3382798 (10Gehel) p:05Triage>03High [15:48:56] 10Operations, 10Discovery-Analysis: Upgrade pandoc package to at least 1.12.3 - https://phabricator.wikimedia.org/T168683#3372065 (10Gehel) The blocker on movign to Jessie was the availability of shiny-server. I opened T168967 to track that part. We'll see if it is easier to provide a newer version of pandoc o... 
[15:50:43] (03CR) 10Gehel: Switch this repo to a deb package (031 comment) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/352170 (https://phabricator.wikimedia.org/T158560) (owner: 10DCausse) [15:53:08] (03CR) 10Ppchelko: "@Nuria ORES has several supported methods for precaching and one of them was using the RCStream, while the other used ChangeProp with the " [puppet] - 10https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: 10Ppchelko) [15:56:38] _joe_: sorry, just got out of a meeting. [15:56:46] <_joe_> urandom: I'm in one in a few [15:57:04] _joe_: so, i guess your siding with maintaining the current inconsistency? [15:57:14] <_joe_> urandom: for now, yes [15:57:14] wrt to single v multi instances? [15:57:24] <_joe_> it's a bigger topic I think [15:57:24] it's least invasive, certainly [15:57:27] 10Operations, 10Ops-Access-Requests, 10Scoring-platform-team-Backlog, 10Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3382836 (10awight) :) @Halfak, I believe that's for you. [15:57:27] right [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T1600). [16:00:05] Pchelolo and Smalyshev: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:08] (03PS3) 10Dzahn: varnish/bacula: remove torrus directors [puppet] - 10https://gerrit.wikimedia.org/r/361617 [16:00:34] I'll take a look at puppet swat [16:00:44] here [16:01:19] SMalyshev: hey, doesn't look like https://gerrit.wikimedia.org/r/#/c/358783/ has an answer from hoo? [16:02:55] godog: yeah, I haven't seen him online for a while, maybe he's on vacation? [16:03:44] SMalyshev: no idea, maybe Lydia_WMDE knows ^ ? 
[16:05:27] SMalyshev: sounds like he should sign off though [16:06:30] 10Operations, 10DBA, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 10User-Urbanecm: Reopen Wikinews Dutch - https://phabricator.wikimedia.org/T168764#3382914 (10Urbanecm) a:03Urbanecm >>! In T168764#3382585, @MF-Warburg wrote: >>>! In T168764#3381334, @Urbanecm wrote: >> In that case I think t... [16:06:35] Pchelolo: looks like https://gerrit.wikimedia.org/r/#/c/361497/ is still undergoing discussion before merge? [16:06:56] (03CR) 10Eevans: [C: 031] "WFM; [PC output](http://puppet-compiler.wmflabs.org/6862/)" [puppet] - 10https://gerrit.wikimedia.org/r/361673 (owner: 10Giuseppe Lavagetto) [16:07:18] (03PS1) 10Herron: Change mailman DEFAULT_DMARC_MODERATION_ACTION to 1 (munge from) [puppet] - 10https://gerrit.wikimedia.org/r/361685 (https://phabricator.wikimedia.org/T168467) [16:07:25] 10Operations, 10Ops-Access-Requests, 10Scoring-platform-team-Backlog, 10Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3382919 (10Halfak) Confirmed! [16:08:26] (03PS1) 10Urbanecm: Reopen nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361686 (https://phabricator.wikimedia.org/T168764) [16:08:28] (03CR) 10Dzahn: [C: 032] varnish/bacula: remove torrus directors [puppet] - 10https://gerrit.wikimedia.org/r/361617 (owner: 10Dzahn) [16:08:34] godog: he said it makes sense... 
I've asked on wikidata channel let's see if anybody knows [16:09:12] (03CR) 10jerkins-bot: [V: 04-1] Reopen nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361686 (https://phabricator.wikimedia.org/T168764) (owner: 10Urbanecm) [16:11:20] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361686 (https://phabricator.wikimedia.org/T168764) (owner: 10Urbanecm) [16:11:53] (03CR) 10jerkins-bot: [V: 04-1] Reopen nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361686 (https://phabricator.wikimedia.org/T168764) (owner: 10Urbanecm) [16:12:49] Hello all, what is wrong with my patch above? Why do I get V-1? [16:13:27] Uncaught exception: Could not open extension /usr/lib/x86_64-linux-gnu/hhvm/extensions/20150212/tidy.so: [16:13:44] I think there was a discussion in #wikimedia-releng about it [16:14:08] So can I just ignore it elukey? [16:14:10] that's a bug in CI itself, not your mistake [16:14:19] Great! [16:14:27] the new tidy package has been added yesterday , afaik [16:15:40] godog: ye let's wait for nuria_ to respond there.. [16:16:01] Is this https://gerrit.wikimedia.org/r/#/c/361297/ the same thing? In the log there is "RuntimeException". [16:16:36] (03PS4) 10Urbanecm: Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) [16:17:00] Urbanecm: no, that seems a test failure [16:17:06] DbListTests::testComputedListsFreshness [16:17:09] Contents of 'nowikidatadescriptiontaglines' must match expansion of 'nowikidatadescriptiontaglines-computed' [16:17:13] Failed asserting that two arrays are equal. 
[16:17:19] (03Abandoned) 10Urbanecm: Remove the botadmin group from mlwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332329 (https://phabricator.wikimedia.org/T152296) (owner: 10Urbanecm) [16:17:26] (03CR) 10jerkins-bot: [V: 04-1] Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) (owner: 10Urbanecm) [16:17:31] someone else reported the "tidy" problem in -releng [16:17:50] 10Operations, 10ops-codfw, 10Services (watching): Troubleshoot scb2005 NICs - https://phabricator.wikimedia.org/T167763#3382983 (10mobrovac) >>! In T167763#3382751, @Papaul wrote: > @mobrovac what about this Thursday 29th at 10:00AM CDT ? Yup, perfect! [16:17:57] (03CR) 10Lydia Pintscher: "If you don't hear anything from Marius in the next 2 days please go ahead as he is currently very busy." [puppet] - 10https://gerrit.wikimedia.org/r/358783 (https://phabricator.wikimedia.org/T164783) (owner: 10Smalyshev) [16:19:01] (03PS1) 10Gilles: Switch back to image scalers [puppet] - 10https://gerrit.wikimedia.org/r/361689 (https://phabricator.wikimedia.org/T168949) [16:19:03] volans: How can be the error fixed? 
[16:19:08] (03PS5) 10Urbanecm: Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) [16:19:43] (03CR) 10jerkins-bot: [V: 04-1] Add two lines to NamespacesAliases for zh_classical [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) (owner: 10Urbanecm) [16:19:51] (03CR) 10jerkins-bot: [V: 04-1] Switch back to image scalers [puppet] - 10https://gerrit.wikimedia.org/r/361689 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [16:21:11] (03PS2) 10Gilles: Switch back to image scalers [puppet] - 10https://gerrit.wikimedia.org/r/361689 (https://phabricator.wikimedia.org/T168949) [16:22:15] Urbanecm: I guess checking the code of the test reported there: tests/dblistTest.php:109 :) [16:22:37] !log restarted apache on iridium, phabricator was running an old version of libphutil [16:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:50] (03CR) 10Filippo Giunchedi: [C: 032] Switch back to image scalers [puppet] - 10https://gerrit.wikimedia.org/r/361689 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [16:23:02] Urbanecm i've filled https://phabricator.wikimedia.org/T168978 [16:23:22] !log revert back to imagescalers for thumbs - T168949 [16:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:34] T168949: Proper thumbnails of portrait photos not being generated; serious display issues - https://phabricator.wikimedia.org/T168949 [16:23:34] I'm not familiar with those tests, just checked Jenkins output [16:24:04] volans: Thank you, I've look at it. 
[16:24:41] Urbanecm: maybe that issue was introduced recently when the latest Wikipedia language was added [16:24:54] it somehow checks the integrity of the "dblists" [16:24:59] and that would have touched them [16:25:04] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 45412 [16:26:21] !log rebooting ganeti instance releases1001 - which is down network-wise but was running [16:26:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:32] ACKNOWLEDGEMENT - Restbase root url on restbase2001 is CRITICAL: connect to address 10.192.16.152 and port 7231: Connection refused Volans mobrovac debugging new cassandra: https://datastax-oss.atlassian.net/browse/NODEJS-371 [16:30:43] mobrovac: ^^^^ I'm disabling the checks too [16:30:46] so it doesn't flap [16:30:55] awesome, grazie Vol [16:30:57] volans: [16:31:28] yw :) [16:31:29] (03CR) 10Halfak: "Great summary. Thanks Ppchelko." [puppet] - 10https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: 10Ppchelko) [16:37:21] !log releases1001 - setting boot parameters to network, rebooting [16:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:32] (03PS3) 10Urbanecm: Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) [16:39:41] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [16:41:17] (03PS4) 10Urbanecm: Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) [16:41:26] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) 
[16:41:43] (03CR) 10Giuseppe Lavagetto: Add build script plus nodejs base images (0315 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/360813 (owner: 10Giuseppe Lavagetto) [16:41:59] (03PS7) 10Giuseppe Lavagetto: Add build script plus nodejs base images [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/360813 [16:44:19] <_joe_> urandom: do you need that cassandra patch merged today? [16:44:24] <_joe_> as I'd rather go offline nw [16:44:27] <_joe_> *now [16:44:34] <_joe_> and merge it tomorrow morning [16:44:51] <_joe_> if it's a noop according to the compiler, just ask someone in the right TZ to assist you [16:46:21] Pchelolo: sorry on meeting back here now cc godog [16:48:12] elukey: looking now at eventstreams change to get full context [16:49:14] nuria_: ack [16:51:52] Pchelolo: are these events being validated through eventbus proxy? or are tehy being produced from kafka [16:55:07] Pchelolo: and how about this other change : https://gerrit.wikimedia.org/r/#/c/357457/ [16:55:13] Pchelolo: is this needed too? [16:58:39] (03PS1) 10Urbanecm: Enable autopatrol flag on ptwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361695 (https://phabricator.wikimedia.org/T168981) [16:58:47] Pchelolo: seems that no, that one is not needed [16:59:22] (03CR) 10jerkins-bot: [V: 04-1] Enable autopatrol flag on ptwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361695 (https://phabricator.wikimedia.org/T168981) (owner: 10Urbanecm) [16:59:27] Pchelolo: The new public event stream needs to be documented , similar to https://www.mediawiki.org/wiki/API:Recent_changes_stream [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T1700). Please do the needful. 
[17:00:10] Pchelolo: would you be able to take care of that [17:00:39] _joe_: no worries; it can wait [17:00:49] _joe_: thanks for your help! [17:07:44] no parsoid deploy [17:08:02] nuria_: oh, sorry, I went on the trip to the office at a really bad time.. [17:08:20] Pchelolo: np [17:08:50] nuria_: since we're exposing the new stream mainly for ORES use right now, we can hold on with public announcement/documentation till Andrew is back [17:09:45] We would still have a lot of room to change stuff there, we've rushed it only because it's blocking ores for 2 weeks [17:10:05] (03PS16) 10Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) [17:10:09] So I think I just add a comment on the ticket for Andrew to know where we are here and then we can wait for him to come back [17:12:03] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10Labs, 10hardware-requests: Eqiad: Hardware request for labstore1006/7, dataset1002/3 - https://phabricator.wikimedia.org/T161311#3383469 (10RobH) 05stalled>03Resolved This request has been fulfilled, and systems are being setup on T16... [17:12:27] 10Operations, 10Labs, 10hardware-requests: eqiad: (1) hardware access request for labnodepool1002 - https://phabricator.wikimedia.org/T161753#3383474 (10RobH) 05stalled>03Resolved This has been ordered and setup is tracked via T168407. 
[17:12:47] 10Operations, 10Labs, 10hardware-requests: Codfw: (1) hardware access request for labtestvirt2003 [region 2] - https://phabricator.wikimedia.org/T161765#3383478 (10RobH) 05Open>03Resolved Ordered and setup via T166564 [17:18:49] Pchelolo: 1st time i hear ORES is blocked on this, sorry [17:19:07] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3383530 (10RobH) a:03RobH [17:19:21] nuria_: I've put a small explanation on the gerrit patch [17:20:21] Pchelolo: ya, i understand the explanation but not why would ORES be blocked , i understand consuming a new public stream that will be supported is abetter way forward [17:20:30] Pchelolo: but not why thsi is blocking ORES per se [17:20:36] * this [17:20:38] 10Operations, 10MobileFrontend, 10Reading-Web-Backlog, 10Traffic, 10Patch-For-Review: Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3383537 (10Jdlrobson) [17:21:07] 10Operations, 10MobileFrontend, 10Reading-Web-Backlog, 10Traffic, 10Patch-For-Review: Remove disableImages handling from VCL - https://phabricator.wikimedia.org/T168013#3352978 (10Jdlrobson) [17:21:28] (03Draft1) 10Paladox: package_builder: Make init.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361698 [17:21:31] (03PS2) 10Paladox: package_builder: Make init.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361698 (https://phabricator.wikimedia.org/T166611) [17:21:42] 10Operations, 10DBA, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3383543 (10madhuvishy) Thanks for the detailed explanation @jcrespo. For labsdb1001 and 1003, I'll check with Chris and schedule the dns switchover and the reboots to happen this week/e... [17:22:01] nuria_: no,no, it's just blocking some refatorings they want to do. 
Also RCStream is supposed to shut down so it's good to get ready for that [17:22:47] Pchelolo: I am just trying to gather context [17:22:55] (03PS17) 10Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) [17:23:31] 10Operations, 10LDAP-Access-Requests, 10Labs, 10Labs-Infrastructure, 10Patch-For-Review: Make all ldap users have a sane shell (/bin/bash) - https://phabricator.wikimedia.org/T86668#3383568 (10bd808) Just for the historical record, here's what I did to edit the `loginShell` entries. Commands were run fro... [17:24:08] Pchelolo: I see, this change is just blocking some refactor the ORES team wants to do but it does not have any operational impact in the immediate term, correct? [17:24:15] 10Operations, 10DBA, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3383577 (10madhuvishy) @Cmjohnson Hi! We are looking at rebooting labsdb1001 and 1003, and it seems like these boxes may not come up automatically on reboot. Jaime recommended that it w... [17:24:30] 10Operations, 10ops-eqiad, 10DC-Ops, 10Discovery-Search (Current work): some elasticsearch servers in eqiad have CPU overheating - https://phabricator.wikimedia.org/T168816#3377656 (10EBernhardson) I have some suspicion this is also related to our latency warning in icinga triggering for cirrussearch resul... [17:24:51] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3383585 (10RobH) Port assignments for later update: xe-1/0/10 cp4019 xe-1/0/11 cp4020 xe-2/0/11 cp4011 xe-2/0/12 cp4012 [17:25:02] (03PS18) 10Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) [17:25:10] nuria_: until the RCStream is shut down - no. 
But RCStream is going away July 7 [17:25:50] Pchelolo: yes, that is the idea [17:25:56] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3383591 (10RobH) [17:26:32] RCStream is going away July 7? [17:26:54] PROBLEM - Host cp4010 is DOWN: PING CRITICAL - Packet loss = 100% [17:27:37] nuria_: so ORES would need the revision create stream before that to get rid of the RCStream. They can switch to recentchange stream in the meantime, but that's just duplicated work, because in the long run they want to use revision-create stream. [17:27:54] PROBLEM - Host lvs4004 is DOWN: PING CRITICAL - Packet loss = 100% [17:28:17] Pchelolo: ya, understood now. [17:28:35] nuria_: Pchelolo: in https://phabricator.wikimedia.org/T168919 I was looking at RCStream's traffic, it's still in the ballpark of ~2-3 reqs/sec averaged over a day (~200-250K reqs/day), mostly from google exit IPs and amazon instances it looks like. [17:29:20] bblack: there was an announcement from Andrew Otto on the ops list and there's a task about that: https://phabricator.wikimedia.org/T156919 [17:30:22] ok. I mean I'm all for killing it, the sooner the better, due to its HTTPS problems. 
[17:30:46] just noting it seems like there's a fair bit of outstanding public traffic and no public announce of that date that I know of, for something that's like 10 days away [17:30:48] Pchelolo: My main concerns here is with the way we have proceed of exposing something publicly w/o proper docs though, even if at this time, the stream is only for our own consumption [17:32:24] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:32:24] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp4010_v6 not-conn: cp4010_v4 [17:32:34] PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:32:46] PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:32:46] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:32:53] (03CR) 10Mforns: [C: 031] "LGTM! I think it's ready to go!" 
[puppet] - 10https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: 10Elukey) [17:32:54] PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:32:54] PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:32:54] PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:33:04] PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:33:04] PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:33:04] PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:33:05] PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp4010_v4, cp4010_v6 [17:33:08] bblack: ya , we have worked with a bunch of clients but some others we have no way to reach them seems like [17:33:14] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:33:14] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:33:16] bblack: this is your main point i take [17:33:22] nuria_: up to you, we can wait for ottomata to pick this up, or we can do it with no docs for now and keep it unannounced [17:33:24] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:33:24] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp4010_v4, cp4010_v6 [17:36:29] ok [17:37:11] bblack: "and no public announce of that date that I know of" [17:37:23] was there one? 
[17:37:44] bblack: we announce it on wikitech-l which seems like it would be the best venue [17:38:00] bblack: yes, we probably need to send it again , let me find our original one [17:38:35] RECOVERY - Host lvs4004 is UP: PING WARNING - Packet loss = 80%, RTA = 78.61 ms [17:38:38] ok [17:38:42] bblack: "EventStreams launch and RCStream deprecation" February 8th [17:38:44] RECOVERY - Host cp4010 is UP: PING OK - Packet loss = 0%, RTA = 78.61 ms [17:38:44] RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 44 ESP OK [17:38:44] RECOVERY - IPsec on cp2023 is OK: Strongswan OK - 56 ESP OK [17:38:54] RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 44 ESP OK [17:38:54] RECOVERY - IPsec on cp2019 is OK: Strongswan OK - 56 ESP OK [17:38:54] RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 44 ESP OK [17:38:58] bblack: with a reminder 5 days ago [17:39:04] RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 44 ESP OK [17:39:05] RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 44 ESP OK [17:39:05] RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 44 ESP OK [17:39:05] RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 44 ESP OK [17:39:14] RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 56 ESP OK [17:39:14] RECOVERY - IPsec on cp2004 is OK: Strongswan OK - 56 ESP OK [17:39:24] RECOVERY - IPsec on cp2010 is OK: Strongswan OK - 56 ESP OK [17:39:24] RECOVERY - IPsec on cp2016 is OK: Strongswan OK - 56 ESP OK [17:39:24] RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 56 ESP OK [17:39:24] RECOVERY - IPsec on cp2013 is OK: Strongswan OK - 56 ESP OK [17:39:33] nuria_: ok thanks :) [17:39:34] RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 44 ESP OK [17:39:39] (03CR) 10Nuria: "I see, this change is just blocking some refactor the ORES team wants to do but it does not have any operational impact today. 
This is jus" [puppet] - 10https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: 10Ppchelko) [17:39:51] bblack: but you know i am going to send this issue to adele [17:40:40] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3383659 (10RobH) Ok, one of those port assignments is bad, since disabling them all brought down cp4010. I'll need to go onsite to determine what ports these actually plug... [17:40:46] PROBLEM - PyBal backends health check on lvs4004 is CRITICAL: PYBAL CRITICAL - uploadlb_80 - Could not depool server cp4007.ulsfo.wmnet because of too many down!: misc_weblb_80 - Could not depool server cp4004.ulsfo.wmnet because of too many down!: misc_weblb6_80 - Could not depool server cp4002.ulsfo.wmnet because of too many down!: misc_weblb_443 - Could not depool server cp4002.ulsfo.wmnet because of too many down!: misc_we [17:40:47] depool server cp4003.ulsfo.wmnet because of too many down!: uploadlb6_443 - Could not depool server cp4005.ulsfo.wmnet because of too many down! [17:41:21] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3383660 (10RobH) [17:41:37] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:46] RECOVERY - PyBal backends health check on lvs4004 is OK: PYBAL OK - All pools are healthy [17:44:01] bblack: sending e-mail to adele, cc-ing you [17:44:11] !log restart pybal on lvs4004 [17:44:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:54] (03CR) 10Nuria: [C: 031] Expose mediawiki.revision-create stream from eventstreams. 
[puppet] - 10https://gerrit.wikimedia.org/r/361497 (https://phabricator.wikimedia.org/T167670) (owner: 10Ppchelko) [17:52:13] (03PS2) 10Andrew Bogott: Puppetmaster: Remove some config code that is never used. [puppet] - 10https://gerrit.wikimedia.org/r/361675 [17:52:38] (03PS1) 10RobH: decom of cp4011, cp4012, cp4019, cp4020 [puppet] - 10https://gerrit.wikimedia.org/r/361702 [17:52:53] !log branching 1.30.0-wmf.7 - T167536 [17:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:03] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [17:54:53] (03CR) 10Andrew Bogott: [C: 032] Puppetmaster: Remove some config code that is never used. [puppet] - 10https://gerrit.wikimedia.org/r/361675 (owner: 10Andrew Bogott) [17:56:26] ACKNOWLEDGEMENT - puppet last run on labtestpuppetmaster2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2] andrew bogott Im in the process of building this system [17:57:21] doh! why does the cards extension have access restricted in gerrit? [17:57:25] https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/Cards,access [17:58:36] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:58:57] twentyafterfour: It's set read-only. [17:59:03] Nothing looks wrong with ACL [17:59:20] It was EOL'd [17:59:51] 10Operations, 10Traffic: stream.wikimedia.org: remove legacy rcstream/socket.io HTTPS redirect hole punches - https://phabricator.wikimedia.org/T168919#3380816 (10BBlack) [18:00:01] RainbowSprinkles: well it's still in make-wmf-branch [18:00:05] Ugh. 
[18:00:24] Quick fix: set to active again [18:00:46] PROBLEM - pdfrender on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 5252: Connection refused [18:01:18] 10Operations, 10Traffic: stream.wikimedia.org: remove legacy rcstream/socket.io HTTPS redirect hole punches - https://phabricator.wikimedia.org/T168919#3380816 (10BBlack) Answering my own timeline question, it looks like it was announced that RCStream goes away July 7th! [18:01:24] (03CR) 10Chad: "Just needs reattempting. I haven't had time." [puppet] - 10https://gerrit.wikimedia.org/r/332531 (https://phabricator.wikimedia.org/T141324) (owner: 10Paladox) [18:01:42] RainbowSprinkles: thanks, that worked [18:01:59] I'll remove it from make-wmf-branch config and re-read-only it later [18:02:42] T167452 [18:02:42] T167452: Undeploy and archive Cards extension - https://phabricator.wikimedia.org/T167452 [18:05:10] !log Some CI jobs are broken with "tidy.so: cannot open shared object file: No such file or directory" see T169004 [18:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:20] T169004: CI job fails with: /usr/lib/x86_64-linux-gnu/hhvm/extensions/20150212/tidy.so: cannot open shared object file: No such file or directory - https://phabricator.wikimedia.org/T169004 [18:07:17] PROBLEM - puppetmaster https on labcontrol1002 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 200 OK [18:07:54] ^ andrewbogott? 
[18:08:16] I'll look… wouldn't've thought it was running there even [18:08:36] PROBLEM - Check HHVM threads for leakage on mw2148 is CRITICAL: NRPE: Command check_check_leaked_hhvm_threads not defined [18:09:13] (03PS1) 10Andrew Bogott: Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 [18:10:17] (03CR) 10jerkins-bot: [V: 04-1] Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 (owner: 10Andrew Bogott) [18:11:33] ACKNOWLEDGEMENT - puppetmaster https on labcontrol1001 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 200 OK andrew bogott I dont know what this is but Im looking. [18:11:34] ACKNOWLEDGEMENT - puppetmaster https on labcontrol1002 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 200 OK andrew bogott I dont know what this is but Im looking. [18:12:27] In what sense is '200 OK' an invalid response? [18:12:54] (03PS1) 10Hashar: Remove libtidy-dev from nodepool instances [puppet] - 10https://gerrit.wikimedia.org/r/361706 (https://phabricator.wikimedia.org/T134381) [18:13:25] lol [18:13:40] it's a puppermaster so it's expected to 500? [18:13:51] 400 according to the code [18:14:30] well that sort of makes sense I guess [18:14:52] it should be access controlled by client cert? [18:15:39] It's not crazy… I need to figure out what changed [18:15:40] yea, check_http has -e parameter, to say which code is expected as OK. this is a special case where 200 isnt the norm [18:16:46] PROBLEM - puppetmaster https on labtestcontrol2001 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 200 OK [18:17:15] (03CR) 10RobH: [C: 032] Remove libtidy-dev from nodepool instances [puppet] - 10https://gerrit.wikimedia.org/r/361706 (https://phabricator.wikimedia.org/T134381) (owner: 10Hashar) [18:17:34] there goes another one... 
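Note on the exchange above (18:12 to 18:15): the check reports '200 OK' as invalid because it is configured to expect a different status. A minimal Python sketch of that logic follows, assuming that check_http's -e option takes a list of strings and passes when at least one occurs in the status line. The helper name check_status_line and its message wording are hypothetical; this is not the real plugin.

```python
def check_status_line(status_line, expected=("400",)):
    """Return a Nagios-style (state, message) pair for an HTTP status line.

    Hypothetical helper, not the real check_http plugin: the status line
    passes if at least one of the expected strings occurs in it.
    """
    if any(token in status_line for token in expected):
        return "OK", "HTTP OK: Status line output matched " + ",".join(expected)
    return "CRITICAL", "HTTP CRITICAL - Invalid HTTP response received: " + status_line


# An unauthenticated probe is expected to be refused with 400, so a
# puppetmaster that answers 200 trips the check.
print(check_status_line("HTTP/1.1 200 OK"))
print(check_status_line("HTTP/1.1 400 Bad Request"))
```

Under this model a 200 is only "invalid" relative to the configured expectation; passing expected=("200",) would turn the same response into OK.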
[18:17:37] that's weird [18:18:06] (03PS1) 10Gilles: Upgrade to 0.1.41 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/361709 [18:19:01] (03PS2) 10Gilles: Upgrade to 0.1.41 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/361709 (https://phabricator.wikimedia.org/T168949) [18:19:03] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 0.1.41 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/361709 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [18:19:15] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Upgrade to 0.1.41 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/361709 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [18:19:22] (03PS2) 10Andrew Bogott: Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 [18:19:24] (03PS1) 10Andrew Bogott: Revert "Puppetmaster: Remove some config code that is never used." [puppet] - 10https://gerrit.wikimedia.org/r/361710 [18:20:50] (03PS1) 10Gilles: Switch Thumbor back on [puppet] - 10https://gerrit.wikimedia.org/r/361711 (https://phabricator.wikimedia.org/T168949) [18:21:44] (03CR) 10jerkins-bot: [V: 04-1] Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 (owner: 10Andrew Bogott) [18:25:30] !log reduce cluster_concurrent_rebalance to 8 and node_concurrent_recoveries to 4 on elasticsearch eqiad [18:25:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:37] (03PS2) 10Andrew Bogott: Partially revert "Puppetmaster: Remove some config code that is never used." 
[puppet] - 10https://gerrit.wikimedia.org/r/361710 [18:28:39] (03PS3) 10Andrew Bogott: Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 [18:29:51] (03CR) 10jerkins-bot: [V: 04-1] Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 (owner: 10Andrew Bogott) [18:30:51] (03PS2) 10Dzahn: site: remove smokeping role from netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756) [18:31:35] (03PS3) 10Andrew Bogott: Partially revert "Puppetmaster: Remove some config code that is never used." [puppet] - 10https://gerrit.wikimedia.org/r/361710 [18:32:49] (03CR) 10Andrew Bogott: [C: 032] Partially revert "Puppetmaster: Remove some config code that is never used." [puppet] - 10https://gerrit.wikimedia.org/r/361710 (owner: 10Andrew Bogott) [18:34:16] RECOVERY - puppetmaster https on labcontrol1002 is OK: HTTP OK: Status line output matched 400 - 333 bytes in 1.093 second response time [18:35:20] !log upgrade thumbor to 0.1.41 [18:35:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:27] (03CR) 10Filippo Giunchedi: [C: 032] Switch Thumbor back on [puppet] - 10https://gerrit.wikimedia.org/r/361711 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [18:37:32] (03PS2) 10Filippo Giunchedi: Switch Thumbor back on [puppet] - 10https://gerrit.wikimedia.org/r/361711 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [18:37:40] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] Switch Thumbor back on [puppet] - 10https://gerrit.wikimedia.org/r/361711 (https://phabricator.wikimedia.org/T168949) (owner: 10Gilles) [18:37:46] (03PS4) 10Andrew Bogott: Add labtestpuppetmaster2001 hiera host defs [puppet] - 10https://gerrit.wikimedia.org/r/361704 [18:38:24] bd808: any comment on https://gerrit.wikimedia.org/r/#/c/361361/ ? [18:40:11] matanya: do you happen to know what role(s) apply that check? 
We have puppet role searching fixed for Cloud instances via https://tools.wmflabs.org/openstack-browser/puppetclass/ [18:41:13] no clue bd808 [18:41:34] !log switch thumbor back on with a fix for T168949 [18:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:44] T168949: Proper thumbnails of portrait photos not being generated; serious display issues - https://phabricator.wikimedia.org/T168949 [18:44:29] (03PS1) 10Herron: Change donate.wikimedia.org SPF to soft fail (~all) [dns] - 10https://gerrit.wikimedia.org/r/361718 (https://phabricator.wikimedia.org/T167704) [18:44:36] PROBLEM - mediawiki-installation DSH group on mw2148 is CRITICAL: Host mw2148 is not in mediawiki-installation dsh group [18:45:46] RECOVERY - puppetmaster https on labtestcontrol2001 is OK: HTTP OK: Status line output matched 400 - 333 bytes in 1.718 second response time [18:46:04] (03CR) 10Hashar: "Eventually caused T169004 :'(" [puppet] - 10https://gerrit.wikimedia.org/r/342635 (https://phabricator.wikimedia.org/T134381) (owner: 10Hashar) [18:48:16] (03CR) 10BryanDavis: Nrpe: Fix check_ram script to work on stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [18:54:31] (03CR) 10BryanDavis: "> Just out of curiosity, how many labs projects (aside from paladox" [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [18:55:21] (03CR) 10Paladox: "> > Just out of curiosity, how many labs projects (aside from paladox" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [18:56:35] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361681 (https://phabricator.wikimedia.org/T142582) (owner: 10Jdrewniak) [18:57:40] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361686 (https://phabricator.wikimedia.org/T168764) (owner: 10Urbanecm) [18:57:46] (03CR) 10Hashar: "recheck" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/360450 (https://phabricator.wikimedia.org/T168422) (owner: 10Urbanecm) [18:58:35] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [18:58:40] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361695 (https://phabricator.wikimedia.org/T168981) (owner: 10Urbanecm) [18:58:43] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T1900). [19:01:11] (03PS2) 10Mobrovac: Recommendation API: Add the beta scap source [puppet] - 10https://gerrit.wikimedia.org/r/360686 (https://phabricator.wikimedia.org/T165760) [19:02:26] (03CR) 10Krinkle: [C: 04-1] Initial configuration for maiwikimedia (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:03:10] (03CR) 10Krinkle: [C: 04-1] "The nowikidatadescriptiontaglines-computed list needs to be updated. Per Jenkins." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:03:28] (03CR) 10Krinkle: [C: 04-1] "(also merge conflict since these files have changed since then)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:08:02] (03PS1) 10EBernhardson: Fix incorrect paste of dworley ssh key [puppet] - 10https://gerrit.wikimedia.org/r/361728 [19:11:04] (03PS1) 10Dzahn: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361729 [19:11:18] (03CR) 10Gehel: [C: 032] Fix incorrect paste of dworley ssh key [puppet] - 10https://gerrit.wikimedia.org/r/361728 (owner: 10EBernhardson) [19:11:35] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361729 (owner: 10Dzahn) [19:12:31] (03CR) 10Dzahn: "when this was reverted the reason was given as "icinga in labs uses it". But that was https://icinga.wmflabs.org by petan which is dead. S" [puppet] - 10https://gerrit.wikimedia.org/r/361729 (owner: 10Dzahn) [19:14:52] 10Operations, 10DBA, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3384109 (10jcrespo) > it would be easier to just reboot the boxes I am ok with that if you are ok with that. Announcement should be done, though- on last upgrade people got upset even... 
[19:15:18] (03CR) 10Jgreen: [C: 031] Change donate.wikimedia.org SPF to soft fail (~all) [dns] - 10https://gerrit.wikimedia.org/r/361718 (https://phabricator.wikimedia.org/T167704) (owner: 10Herron) [19:16:26] it seems db1034 has disk issues [19:16:50] it is depooled and most pages disabled, but wanted to give a heads up to not worry if you se problems there [19:19:31] (03Draft1) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 [19:19:34] (03PS2) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 [19:20:16] (03CR) 10Paladox: "This has syntax errors, fixed by https://gerrit.wikimedia.org/r/#/c/361730/" [puppet] - 10https://gerrit.wikimedia.org/r/361710 (owner: 10Andrew Bogott) [19:21:08] (03PS5) 10Urbanecm: Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) [19:21:17] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:21:36] (03CR) 10Urbanecm: "@Krinkle: Updated how? Can you amend my patch please? Going to care about the conflict now!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:22:25] andrewbogott i fixed it with git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/30/361730/2 && git cherry-pick FETCH_HEAD [19:22:33] https://gerrit.wikimedia.org/r/#/c/361730/2 [19:22:38] Tested [19:22:39] (03CR) 10Aaron Schulz: "The other commit mentioned links to this same commit." 
[puppet] - 10https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784) (owner: 10Jcrespo) [19:23:44] paladox: I don't see how those can be syntax errors since they were present in https://gerrit.wikimedia.org/r/#/c/361675/2/modules/puppetmaster/templates/puppetmaster.erb as of this morning [19:24:15] (03PS6) 10Urbanecm: Initial configuration for maiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) [19:24:17] Hmm you're right, it just got me further and failed at apache2 service. Sorry for the ping. [19:24:43] also deployment-puppetmaster is just fine and has those slashes [19:25:19] (03CR) 10Dzahn: "line 19 seems indeed a syntax error, closes the tag early. and i hear puppet broke in labs and it was a recent change, so yea. but Andrew " [puppet] - 10https://gerrit.wikimedia.org/r/361730 (owner: 10Paladox) [19:25:57] (03CR) 10Krinkle: "Don't worry, it's passing now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361297 (https://phabricator.wikimedia.org/T168782) (owner: 10Urbanecm) [19:26:01] works now [19:26:10] ooh. good! [19:26:13] after doing https://gerrit.wikimedia.org/r/361730 then reverting it [19:26:51] ok, sounds like the local master had not caught up with the change yet? [19:27:38] it's caught up, but seems one puppet master is fixed and another one broken still. [19:28:11] ok, but the change can be abandoned then [19:28:33] yep [19:28:37] will do that now [19:29:03] (03Abandoned) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 (owner: 10Paladox) [19:29:19] so the phab master needs to be synced i suppose [19:29:42] thanks [19:29:48] it's synced already. [19:30:58] but you still see the error or all is good? [19:31:35] error still there on puppet-phabricator, error is gone on puppet-paladox3. [19:35:21] ok, well, i don't know. synced should mean synced.
maybe first wait a little and see if it persists [19:36:50] (03Restored) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 (owner: 10Paladox) [19:37:01] do you usually do something manual to sync it. [19:37:04] heh [19:37:31] (03PS3) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 [19:37:34] (03Abandoned) 10Paladox: puppetmaster: Fix syntax error in puppetmaster.erb [puppet] - 10https://gerrit.wikimedia.org/r/361730 (owner: 10Paladox) [20:03:56] it's fixed now. it wasnt actually in sync [20:04:22] (03PS3) 10Dzahn: site: remove smokeping role from netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756) [20:05:30] mutante yes it was :), it just did not apply the puppet change as puppet was broken. [20:05:41] (03CR) 10Dzahn: [C: 032] "has been running on netmon1002 over night, graphs working." [puppet] - 10https://gerrit.wikimedia.org/r/361615 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [20:07:32] paladox: ok, yea, just calling all that "sync" to sum it up. [20:12:16] PROBLEM - Juniper alarms on asw-ulsfo is CRITICAL: JNX_ALARMS CRITICAL - 1 red alarms, 0 yellow alarms [20:16:13] !log twentyafterfour@tin Started scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536 [20:16:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:24] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [20:28:36] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:28:36] PROBLEM - nutcracker process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:28:46] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:29:36] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [20:29:53] Ok, disabling the newly audited ulsfo network ports for decom systems cp40(1[129]|20) [20:30:06] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:30:12] lets see if any other hosts go down like before (they should not, the network port descriptions were incorrect on all the cp systems in ulsfo) [20:30:26] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:30:26] RECOVERY - nutcracker process on thumbor1001 is OK: PROCS OK: 1 process with UID = 115 (nutcracker), command name nutcracker [20:30:36] PROBLEM - nutcracker process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:30:36] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:31:26] RECOVERY - nutcracker process on thumbor1002 is OK: PROCS OK: 1 process with UID = 115 (nutcracker), command name nutcracker [20:31:36] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:31:56] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [20:32:23] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384329 (10RobH) Ok, I had to audit and fix all the switch ports for cp systems in ulsfo. They all have ALL been audited, and the new port description... 
[20:38:57] (03PS4) 10Jcrespo: Parsercache: Purge rows every day, and reduce TTL to 22 days [puppet] - 10https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784) [20:40:34] (03CR) 10Jcrespo: "As a comment- I would expect to being able to rise the limit to over 22 days- but right now the server availability is threatened, and we " [puppet] - 10https://gerrit.wikimedia.org/r/361656 (https://phabricator.wikimedia.org/T167784) (owner: 10Jcrespo) [20:44:04] (03PS1) 10BBlack: cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) [20:46:57] !log twentyafterfour@tin Finished scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536 (duration: 30m 44s) [20:47:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:07] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [20:52:25] (03CR) 10Dzahn: [C: 031] "lgtm, per Jeff and per http://www.openspf.org/SPF_Record_Syntax and per commit message" [dns] - 10https://gerrit.wikimedia.org/r/361718 (https://phabricator.wikimedia.org/T167704) (owner: 10Herron) [20:54:52] (03PS2) 10Dzahn: remove torrus.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/361618 [20:56:01] (03PS2) 10BBlack: cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) [20:56:05] (03CR) 10Dzahn: [C: 032] "removed from varnish in https://gerrit.wikimedia.org/r/#/c/361617/ - it's dead dead" [dns] - 10https://gerrit.wikimedia.org/r/361618 (owner: 10Dzahn) [20:56:16] (03CR) 10jerkins-bot: [V: 04-1] cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) (owner: 10BBlack) [20:58:40] 10Operations, 10Interactive-Sprint, 10Maps (Maps-data): Monitor PostgreSQL connection slots - 
https://phabricator.wikimedia.org/T168767#3384409 (10Gehel) [20:59:12] (03PS1) 1020after4: group0 wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361778 [20:59:14] (03CR) 1020after4: [C: 032] group0 wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361778 (owner: 1020after4) [20:59:29] (03PS3) 10BBlack: cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) [20:59:58] (03PS1) 10BBlack: cache_misc: remove ulsfo nodes and IPs [puppet] - 10https://gerrit.wikimedia.org/r/361779 (https://phabricator.wikimedia.org/T164610) [21:02:05] (03PS4) 10BBlack: cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) [21:02:22] (03Merged) 10jenkins-bot: group0 wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361778 (owner: 1020after4) [21:02:32] (03CR) 10jenkins-bot: group0 wikis to 1.30.0-wmf.7 refs T167536 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361778 (owner: 1020after4) [21:03:17] (03CR) 10BBlack: [C: 032] cache_misc: take ulsfo IPs out of effective service [dns] - 10https://gerrit.wikimedia.org/r/361777 (https://phabricator.wikimedia.org/T164610) (owner: 10BBlack) [21:03:37] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.7 refs T167536 [21:03:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:03:47] T167536: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536 [21:04:02] uh someone left some outstanding unmerged DNS changes [21:04:19] bblack: yes, we are talking about it [21:04:25] i see which one it is [21:04:54] TTL changes for ganeti https://gerrit.wikimedia.org/r/#/c/361661/1 [21:05:19] ok [21:05:19] the newer one is me and was just showing 
herron about authdns-update [21:05:39] ok, merging it all [21:05:44] of course ran into this whihc usually never happens [21:05:46] heh [21:06:04] just told herron to confirm and do it, either works, +1 [21:06:08] ok :) [21:06:21] sorry :) [21:17:31] when will Ununpentium get renamed to Moscovium? :o [21:18:05] (03PS1) 10Hashar: Restore labnet-users access to nova hosts [puppet] - 10https://gerrit.wikimedia.org/r/361786 (https://phabricator.wikimedia.org/T169018) [21:19:04] 10Operations, 10ops-eqiad, 10DC-Ops, 10Discovery-Search (Current work): some elasticsearch servers in eqiad have CPU overheating - https://phabricator.wikimedia.org/T168816#3384505 (10Cmjohnson) @gehel, I have seen the task and will be on vacation July 3-10. Let's plan on some time after the 10th to do th... [21:19:14] (03CR) 10Hashar: "Broken since April 10th, but apparently I am the sole user relying on it from time to time. So there is no hurry :]" [puppet] - 10https://gerrit.wikimedia.org/r/361786 (https://phabricator.wikimedia.org/T169018) (owner: 10Hashar) [21:20:54] (03PS1) 10Jdlrobson: Cleanup: Remove wgMFContentNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361788 [21:20:56] (03PS1) 10Jdlrobson: Disable logging from MobileFormatter#moveFirstParagraphBeforeInfobox [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361789 (https://phabricator.wikimedia.org/T169001) [21:23:06] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2085908 [21:24:40] !log cp1074: restart backend (mailbox lag) [21:24:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:14] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384530 (10RobH) cp4020 securely wiped using hdparm off a usb boot stick of finnix (debian live lacked hdparm utilities.) 
[21:33:06] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [21:42:20] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384560 (10RobH) cp4019 used hdparm to securely erase ssds [21:43:10] (03PS2) 10RobH: decom of cp4011, cp4012, cp4019, cp4020 [puppet] - 10https://gerrit.wikimedia.org/r/361702 [21:43:57] (03CR) 10RobH: [C: 032] decom of cp4011, cp4012, cp4019, cp4020 [puppet] - 10https://gerrit.wikimedia.org/r/361702 (owner: 10RobH) [21:46:14] (03PS2) 10Alexandros Kosiaris: Renumber releases1001.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/361461 [21:46:16] (03PS2) 10Alexandros Kosiaris: Renumber 4 VMs to public1-c-eqiad [dns] - 10https://gerrit.wikimedia.org/r/361662 [21:46:18] (03PS2) 10Alexandros Kosiaris: Renumber install1002.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/361663 [21:46:20] (03PS2) 10Alexandros Kosiaris: Bump the TTLs again after renumbering [dns] - 10https://gerrit.wikimedia.org/r/361664 [21:47:17] (03CR) 10Alexandros Kosiaris: [C: 032] Renumber releases1001.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/361461 (owner: 10Alexandros Kosiaris) [21:47:33] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384570 (10RobH) [21:47:45] (03PS1) 10RobH: decom of cp40(1[129]|20), removing prod dns [dns] - 10https://gerrit.wikimedia.org/r/361792 [21:48:10] (03PS2) 10BBlack: cache_misc: remove ulsfo nodes and IPs [puppet] - 10https://gerrit.wikimedia.org/r/361779 (https://phabricator.wikimedia.org/T164610) [21:49:10] (03PS2) 10RobH: decom of cp40(1[129]|20), removing prod dns [dns] - 10https://gerrit.wikimedia.org/r/361792 [21:49:30] (03CR) 10RobH: [C: 032] decom of cp40(1[129]|20), removing prod dns [dns] - 10https://gerrit.wikimedia.org/r/361792 (owner: 10RobH) [21:49:39] (03CR) 
10BBlack: [C: 032] cache_misc: remove ulsfo nodes and IPs [puppet] - 10https://gerrit.wikimedia.org/r/361779 (https://phabricator.wikimedia.org/T164610) (owner: 10BBlack) [21:49:54] hrmm [21:50:01] someone got my dns change in their authdns update [21:50:16] !log removing cp4001-4 (cache_misc@ulsfo), except a few minor related alerts from race conditions [21:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:25] also, expect me to have exceptionally bad speling :P [21:53:22] PROBLEM - LVS HTTP IPv4 on misc-web-lb.ulsfo.wikimedia.org is CRITICAL: connect to address 198.35.26.120 and port 80: Connection refused [21:53:27] PROBLEM - LVS HTTP IPv6 on misc-web-lb.ulsfo.wikimedia.org_ipv6 is CRITICAL: connect to address 2620:0:863:ed1a::3:d and port 80: Connection refused [21:53:33] .... [21:53:36] bleh, I should've known to downtime those [21:53:40] ok, that wasnt me [21:53:41] whew [21:53:43] there's little point now [21:53:51] I can ack though [21:53:56] i had the 'oh shit what did i bump' moment [21:54:00] ;] [21:54:03] PROBLEM - LVS HTTPS IPv4 on misc-web-lb.ulsfo.wikimedia.org is CRITICAL: connect to address 198.35.26.120 and port 443: Connection refused [21:54:08] yea, it's actually better to get 2 texts, where the second is the recovery, than just one [21:54:12] PROBLEM - LVS HTTPS IPv6 on misc-web-lb.ulsfo.wikimedia.org_ipv6 is CRITICAL: connect to address 2620:0:863:ed1a::3:d and port 443: Connection refused [21:54:16] then you know it's over [21:54:19] poking my head up, seems a known issue? k :) [21:54:27] nothing to see here! [21:54:31] (sorry for the pages!) 
[21:55:01] * volans heads back debugging a race condition [21:55:40] ok, one last to wipe then im leaving to beat rush hour across the bridge =] [21:56:14] bblack: we'll need to determine which systems to wire up since i can swap new systems into the places of the 4 we just decom'd [21:56:29] robh: there's 4 more decoms coming (will be ready for tomorrow I guess) [21:56:38] yeah, no worries [21:56:50] i only rushed down here since i fucked up the decom proces midway [21:57:02] having shutdown and rmeoved puppet for the 4 hosts but not being able to disable network [21:57:06] mutante: boot_order: network ? [21:57:11] was this you ? [21:57:20] re: releases1001.eqiad.wmnet [21:57:22] akosiaris: it was me [21:57:34] akosiaris: i was debugging earlier, to see if DHCP gives it an IP [21:57:35] ah ... so the VM is now reinstalling itself [21:57:41] the installer started [21:57:42] that is ok too :) [21:57:45] and it's wiping everything [21:57:53] not a big deal :) [21:57:59] it wasn't used yet with actual releases [21:58:05] a... phew [21:58:09] and nothing should have been manual [21:58:14] for a moment I was terrified [21:58:32] heh, nah, also it should be in bacula [21:58:56] earlier i was already thinking "meh, just reinstall it" [21:58:57] !log pybal restarts on lvs4004,lvs4002 for misc@ulsfo [21:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:05] and that didnt work [21:59:08] well that's what's happening now [21:59:11] it's being reinstalled [21:59:16] ok :) i can take it back from there if you want [21:59:33] sure... it did fail the install btw [21:59:40] Installation step failed │ [21:59:40] │ An installation step failed. You can try to run the failing item │ [21:59:40] │ again from the menu, or skip it and choose something else. The │ [21:59:40] │ failing step is: Continue installation remotely using SSH │ [21:59:42] oh.. now that part is new [21:59:46] never seen that before [22:00:02] eh.. yea.. 
did not see that when i originally made it [22:00:28] 10Operations, 10ops-ulsfo, 10hardware-requests, 10Patch-For-Review: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384578 (10RobH) cp4011 and cp4012 securely erased [22:00:33] I am btw cleaning puppet and salt keys/certs [22:00:51] so the icinga stops complaining [22:01:06] ok, great [22:01:17] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp4011, cp4012, cp4019, cp4020 - https://phabricator.wikimedia.org/T167377#3384579 (10RobH) 05Open>03stalled [22:01:25] i'll try the install again and re-add it later [22:01:31] ok [22:01:38] maybe i should delete it and recreate it, using the same name? [22:01:54] was peeking at criticals, anyone know this one's cause? [22:01:55] I doubt that would change anything [22:01:56] CRITICAL: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts is alerting: Difference in JS size mobile authenticated [ALERT] alert. [22:02:05] akosiaris: ok [22:03:09] mutante: I would advise against it now that I think of it. I haven't yet updated the docs, you would have to fiddle with nodegroups and network settings. eqiad is not yet fully ready [22:03:43] funny thing is a pressed enter [22:03:47] and it's now continuing ? [22:03:51] what on earth [22:03:55] akosiaris: ok, good to know! and .. heh [22:04:14] bd808: wise decision, next time remove me too ;-) [22:05:08] maybe the old IP was still somewhere re: "installation remotely using SSH " [22:05:33] (03Abandoned) 10Jdrewniak: Updating wikipedia.org stats [mediawiki-config] - 10https://gerrit.wikimedia.org/r/355632 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [22:06:08] mutante: ssh to the installer wasn't working btw ;-) [22:06:15] RECOVERY - Juniper alarms on asw-ulsfo is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [22:07:27] akosiaris: i see the installer now. on console. want me to take it? 
[22:07:34] yup [22:07:36] I am off [22:07:37] ah, finished [22:07:44] I 've set boot_order=disk already [22:07:52] ok, great. alright, see you later [22:10:05] tries install-console.. heh [22:10:52] elukey: sorry for bothering again but it seems any Apache rewrite rule I write just doesn't work. I cherry-picked this in beta and no success https://gerrit.wikimedia.org/r/#/c/360891/1/modules/mediawiki/files/apache/beta/sites/wikipedia.conf [22:11:27] what is missing/wrong? I read manuals several times and it looks okay [22:12:07] also any help from anyone is appreciated :) [22:12:58] Amir1: the L flag means it's the last rule, but there are actually other rules after it [22:13:14] (03CR) 10Alexandros Kosiaris: "Per https://tools.wmflabs.org/openstack-browser/puppetclass/ it looks like no project actually uses role::icinga, which is the only class " [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [22:13:20] I tried without the flag too, but let me try again [22:14:03] (03CR) 10Dzahn: [C: 04-1] "yes, we already agreed to just delete it :) i was trying to revert the revert but it's an annoying rebase" [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [22:14:51] (03PS1) 10Alexandros Kosiaris: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361795 [22:14:56] ^ ? 
[22:15:03] (03Abandoned) 10Dzahn: Nrpe: Fix check_ram script to work on stretch [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [22:15:20] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361795 (owner: 10Alexandros Kosiaris) [22:15:25] akosiaris: https://gerrit.wikimedia.org/r/#/c/361729/1 [22:15:42] but please take any one that is working :) [22:15:56] (03CR) 10Dzahn: [C: 031] Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361795 (owner: 10Alexandros Kosiaris) [22:15:57] This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. [22:15:59] grrrr [22:16:02] that :) [22:16:08] that is exactly what i had open, hehe [22:16:14] so.. gerrit can revert it, but jenkins can't ? 
[22:16:18] (03PS3) 10Volans: ClusterShell: allow to set a timeout per command [software/cumin] - 10https://gerrit.wikimedia.org/r/359466 (https://phabricator.wikimedia.org/T164838) [22:16:20] (03PS3) 10Volans: CLI: migrate to timeout per command [software/cumin] - 10https://gerrit.wikimedia.org/r/359467 (https://phabricator.wikimedia.org/T164838) [22:16:22] (03PS2) 10Volans: Fix Pylint and other tools reported errors [software/cumin] - 10https://gerrit.wikimedia.org/r/361040 (https://phabricator.wikimedia.org/T154588) [22:16:24] (03PS7) 10Volans: Package metadata and testing tools improvements [software/cumin] - 10https://gerrit.wikimedia.org/r/338808 (https://phabricator.wikimedia.org/T154588) [22:16:26] (03PS3) 10Volans: Tests: convert unittest to pytest [software/cumin] - 10https://gerrit.wikimedia.org/r/361274 (https://phabricator.wikimedia.org/T154588) [22:16:42] mutante: I tried without "L" flag and gives out 404 [22:16:59] mutante: yeah I 'll abandon mine, yours is just fine, it's just not loved by jenkins, same as mine [22:17:02] 10Operations, 10Traffic, 10Patch-For-Review: Unprovision cache_misc @ ulsfo - https://phabricator.wikimedia.org/T164610#3239748 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['cp4001.ulsfo.wmnet', 'cp4002.ulsfo.wmnet', 'cp4003.ulsfo.wmnet', 'cp4... [22:17:04] akosiaris: and all of a sudden, now install-console works too [22:17:08] (03PS2) 10Jdlrobson: Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361789 (https://phabricator.wikimedia.org/T169001) [22:17:16] (03Abandoned) 10Alexandros Kosiaris: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361795 (owner: 10Alexandros Kosiaris) [22:17:27] ok [22:18:20] Amir1: yea, eh, i don't know about the new rule, it was just an observation i made. 
i think it would break the existing rules that follow it [22:18:48] but that is separate from getting the new rule to do what it should [22:19:05] PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [22:19:20] 5 salt minions, heh [22:22:21] that happens every time someone does a wmf-auto-reimage on several hosts, I think? [22:22:45] ah [22:22:52] (at least, the coincidence seems to be that I observe that alert after I start such things) [22:24:09] and the timing of the new minion procs lines up with that theory [22:25:32] maybe somehow the per-node salt-call(s) in wmf-reimage for key status+delete end up creating new minions, I dunno [22:25:59] *nod* [22:31:18] mutante: I tested it with different flags and it seems when I add the R flag it works, otherwise it 404s [22:34:07] (03PS2) 10Ladsgroup: Add /data/ url redirect in beta cluster (Wikipedia only) [puppet] - 10https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) [22:34:43] (03CR) 10Ladsgroup: Make /entity/ redirect internal (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/357985 (https://phabricator.wikimedia.org/T119536) (owner: 10Ladsgroup) [22:36:51] (03PS2) 10Ladsgroup: Add /data/ Redirect for commons [puppet] - 10https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) [22:38:22] Amir1: if you want to rewrite based on query string, you need something like RewriteCond %{QUERY_STRING} [22:39:13] hmm, let me check. Thanks [22:39:24] Amir1: actually, never mind that comment. 
but i am not sure if beta uses the same config template [22:39:47] i saw that wrong, the query parameter is only in the target URL, it looks like [22:40:49] https://gerrit.wikimedia.org/r/#/c/360891/ [22:40:52] https://gerrit.wikimedia.org/r/#/c/360887/ [22:41:01] RECOVERY - Host releases1001 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [22:41:02] their configs are different [22:42:51] Amir1: the changes are also in different virtual hosts? [22:43:02] *.wikipedia.beta vs. commons.wikimedia. [22:43:13] which one is it for [22:43:30] it should be for all of them [22:43:34] all mediawiki wikis [22:43:47] but I want to test on wikipedia in beta and commons in prod first [22:44:01] PROBLEM - Check systemd state on releases1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [22:44:03] !log demon@tin Synchronized README: force co-master sync (duration: 00m 47s) [22:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:16] also commons is the most important one to have it, as the RDF mapping of wikidata currently links to them [22:44:18] that one in remnant.conf would only influence ServerName commons.wikimedia.org it seems [22:44:28] (03PS1) 10Thcipriani: Scap: scap_source correct gid [puppet] - 10https://gerrit.wikimedia.org/r/361796 [22:44:30] ah [22:51:11] RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 4 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [22:52:37] heh I wonder why 4 is an ok number of minions? 
:) [22:53:54] (03PS3) 10Ladsgroup: Add /data/ Redirect for commons [puppet] - 10https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) [22:54:49] (03PS3) 10Ladsgroup: Add /data/ url redirect in beta cluster (Wikipedia only) [puppet] - 10https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) [22:55:01] RECOVERY - Check systemd state on releases1001 is OK: OK - running: The system is fully operational [22:57:50] (03PS2) 10Dzahn: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361729 [22:59:09] bblack: we raised it to 4 in https://gerrit.wikimedia.org/r/#/c/179440/:) [22:59:45] 2014, no recollection :) [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170627T2300). Please do the needful. [23:00:05] bd808, jan_drewniak, and Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:39] o/ [23:01:01] 10Operations, 10ops-ulsfo, 10hardware-requests: Decommission cp400[1-4] - https://phabricator.wikimedia.org/T169020#3384648 (10BBlack) [23:01:02] (03CR) 10Dzahn: "we see this alert for 5 processes - probably caused by wmf-reimage script during reinstalls on multiple hosts at once" [puppet] - 10https://gerrit.wikimedia.org/r/179440 (owner: 10Faidon Liambotis) [23:01:19] o/ [23:02:08] do we have a SWAT master of ceremonies for today? 
[23:02:19] I can do it I guess [23:02:28] bd808 you're first [23:02:42] I *think* its a total no-op for prod [23:02:48] (03PS1) 10Dzahn: base::monitoring: raise crit threshold for salt-minions to 5 [puppet] - 10https://gerrit.wikimedia.org/r/361797 [23:03:10] (03CR) 10Chad: [C: 032] Move ukwikimedia to deleted.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360564 (https://phabricator.wikimedia.org/T168436) (owner: 10BryanDavis) [23:03:22] Yeah, I'm not even going to bother with mwdebug on that [23:03:39] (03CR) 10Dzahn: [C: 04-1] "argg.. the .gitmodules change is not intended. suckmodules" [puppet] - 10https://gerrit.wikimedia.org/r/361729 (owner: 10Dzahn) [23:03:41] (03CR) 10Chad: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361681 (https://phabricator.wikimedia.org/T142582) (owner: 10Jdrewniak) [23:03:49] jan_drewniak: you're second [23:03:58] cool [23:04:11] mutante: I cherry-picked the related change for beta and now https://en.wikipedia.beta.wmflabs.org/data/main/Albert_Einstein works just fine, do you think we can merge these two? https://gerrit.wikimedia.org/r/#/c/360891/ https://gerrit.wikimedia.org/r/#/c/360887/ [23:04:17] (03Merged) 10jenkins-bot: Move ukwikimedia to deleted.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360564 (https://phabricator.wikimedia.org/T168436) (owner: 10BryanDavis) [23:04:38] 10Operations, 10Traffic, 10Patch-For-Review: Unprovision cache_misc @ ulsfo - https://phabricator.wikimedia.org/T164610#3384666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4001.ulsfo.wmnet', 'cp4002.ulsfo.wmnet', 'cp4003.ulsfo.wmnet', 'cp4004.ulsfo.wmnet'] ``` Of which those **FAILED**:... 
[23:04:48] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361681 (https://phabricator.wikimedia.org/T142582) (owner: 10Jdrewniak) [23:04:49] FWIW: main refers to "main" slot of the page see the RFC for the details: https://phabricator.wikimedia.org/T161527 [23:04:51] \o [23:05:02] oh it's SWAT. Sorry [23:05:32] Amir1: sorry, no, that needs other reviewers, like Daniel K for example [23:05:39] !log demon@tin Synchronized dblists/: ukwikimedia swapped from closed to deleted (duration: 00m 46s) [23:05:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:53] bd808: done [23:05:58] (03CR) 10jenkins-bot: Move ukwikimedia to deleted.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/360564 (https://phabricator.wikimedia.org/T168436) (owner: 10BryanDavis) [23:06:11] jan_drewniak: Do we have a way to test portals on mwdebug? [23:06:13] mutante: he is probably asleep. I will tell him tomorrow [23:06:24] RainbowSprinkles: thx much [23:06:52] RainbowSprinkles: yeah, I don't know how it actually gets up on mwdebug though [23:07:23] a scap pull should do the trick [23:08:01] the sync portal script provides scap commands in an ideal order, so sync everything to mwdebug should work [23:08:06] and I think it worked in the past [23:08:06] Ah yeah [23:08:11] I wasn't sure on the caching bit [23:08:17] But I guess mwdebug cache busts [23:10:15] I think it's live on mwdebug1001 now [23:10:20] But dunno what changes I'm looking for [23:12:20] RainbowSprinkles: it's just different numbers, and yup, they're different :) [23:12:29] looks good to me [23:12:35] Ok, syncing everywhere [23:13:32] !log demon@tin Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 47s) [23:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:20] !log demon@tin Synchronized portals: (no justification provided) (duration: 00m 47s) 
[23:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:37] jan_drewniak: Ok, should be live + purged urls everywhere [23:14:59] RainbowSprinkles: yup! looks good [23:15:14] (03PS3) 10Dzahn: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361729 [23:15:24] jdlrobson: About? [23:15:37] RainbowSprinkles: yup [23:15:47] Ok, I wanna do your last 2 first, they seem trivial [23:16:21] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.197 second response time [23:16:22] (03CR) 10Chad: [C: 032] Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361789 (https://phabricator.wikimedia.org/T169001) (owner: 10Jdlrobson) [23:16:33] (03CR) 10Chad: [C: 032] Cleanup: Remove wgMFContentNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361788 (owner: 10Jdlrobson) [23:17:35] (03Merged) 10jenkins-bot: Cleanup: Remove wgMFContentNamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361788 (owner: 10Jdlrobson) [23:18:57] (03PS3) 10Chad: Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361789 (https://phabricator.wikimedia.org/T169001) (owner: 10Jdlrobson) [23:19:16] (03PS1) 10Dzahn: icinga: remove unused check_ram.sh [puppet] - 10https://gerrit.wikimedia.org/r/361798 [23:20:21] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Removing wgMFContentNamespace (duration: 00m 46s) [23:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:20:30] jdlrobson: First one live ^ [23:20:44] RainbowSprinkles: on it. [23:20:48] everywhere right? 
[23:21:01] Yep [23:21:19] RainbowSprinkles: yup that's good to go [23:21:32] or good not to revert i should say :) [23:21:36] Ok second one now [23:21:54] (03PS2) 10Dzahn: base::monitoring: raise crit threshold for salt-minions to 5 [puppet] - 10https://gerrit.wikimedia.org/r/361797 [23:22:17] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox (duration: 00m 46s) [23:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:41] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3384704 (10BBlack) [23:22:44] 10Operations, 10Traffic, 10Patch-For-Review: Unprovision cache_misc @ ulsfo - https://phabricator.wikimedia.org/T164610#3384701 (10BBlack) 05Open>03Resolved a:03BBlack I had to manually fix up salt keys and do final reboots on 4001+4003, all should be sane and consistent now (except for a couple of IPM... [23:23:51] (03CR) 10Dzahn: [C: 032] base::monitoring: raise crit threshold for salt-minions to 5 [puppet] - 10https://gerrit.wikimedia.org/r/361797 (owner: 10Dzahn) [23:24:32] jdlrobson: RelatedArticles' two go together I'm guessing? Can you test bucketing from mwdebug? 
[23:24:39] RainbowSprinkles: they need to go together [23:24:41] (03CR) 10Dzahn: "raised to 5 https://gerrit.wikimedia.org/r/#/c/361797/" [puppet] - 10https://gerrit.wikimedia.org/r/179440 (owner: 10Faidon Liambotis) [23:24:43] and can't really test it [23:24:56] RainbowSprinkles: : Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox looking good so far [23:24:59] (03PS2) 10Dzahn: icinga: remove unused check_ram.sh [puppet] - 10https://gerrit.wikimedia.org/r/361798 [23:26:22] !log demon@tin Synchronized php-1.30.0-wmf.6/extensions/RelatedArticles/: Hygene and stuff (duration: 00m 46s) [23:26:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:32] Ok live ^ [23:26:49] (03CR) 1020after4: [C: 031] Scap: scap_source correct gid [puppet] - 10https://gerrit.wikimedia.org/r/361796 (owner: 10Thcipriani) [23:27:20] RainbowSprinkles: awesome. thank you! [23:27:23] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.338 second response time [23:32:13] (03PS3) 10Dzahn: icinga: remove unused check_ram.sh [puppet] - 10https://gerrit.wikimedia.org/r/361798 [23:36:10] (03PS4) 10Dzahn: icinga: remove unused check_ram.sh [puppet] - 10https://gerrit.wikimedia.org/r/361798 [23:36:26] (03PS5) 10Dzahn: icinga: remove unused check_ram.sh [puppet] - 10https://gerrit.wikimedia.org/r/361798 [23:37:20] 10Operations, 10Wikibase-Internal-Serialization, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? - https://phabricator.wikimedia.org/T169023#3384741 (10Krinkle) [23:37:28] (03CR) 10Dzahn: [C: 032] "ensure => absent first to remove it everywhere... checked with cumin it's still around just unused" [puppet] - 10https://gerrit.wikimedia.org/r/361798 (owner: 10Dzahn) [23:37:45] 10Operations, 10Wikibase-Internal-Serialization, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? 
- https://phabricator.wikimedia.org/T169023#3384753 (10Krinkle) [23:38:12] 10Operations, 10Wikibase-DataModel, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? - https://phabricator.wikimedia.org/T169023#3384741 (10Krinkle) [23:38:39] 10Operations, 10Wikibase-DataModel, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? - https://phabricator.wikimedia.org/T169023#3384741 (10Krinkle) [23:39:14] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/361798/" [puppet] - 10https://gerrit.wikimedia.org/r/361361 (owner: 10Paladox) [23:39:42] (03Abandoned) 10Dzahn: Revert "Revert "icinga: remove check_ram.sh doesn't seem to be used anywhere"" [puppet] - 10https://gerrit.wikimedia.org/r/361729 (owner: 10Dzahn) [23:41:13] 10Operations, 10Wikibase-DataModel, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? - https://phabricator.wikimedia.org/T169023#3384762 (10Krinkle) [23:43:01] (03PS1) 10Dzahn: icinga: final remove of check_ram.sh remnant [puppet] - 10https://gerrit.wikimedia.org/r/361799 [23:44:04] 10Operations, 10Wikibase-DataModel, 10Wikidata: Wikibase ontology url: wikiba.se or wikidata.org? - https://phabricator.wikimedia.org/T169023#3384787 (10Krinkle) Based on {T93207}, I've changed this task to instead be a clean up task to remove the broken wikidata.org alias. It already doesn't work. [23:44:37] 10Operations, 10Wikibase-DataModel, 10Wikidata: Remove left-over alias for wikidata.org/ontology (doesn't work) - https://phabricator.wikimedia.org/T169023#3384790 (10Krinkle) [23:47:37] (03PS1) 10Krinkle: mediawiki: Remove broken wikidata.org/ontology Apache alias [puppet] - 10https://gerrit.wikimedia.org/r/361801 (https://phabricator.wikimedia.org/T169023)