[00:00:05] twentyafterfour: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T0000).
[00:00:05] No GERRIT patches in the queue for this window AFAICS.
[00:04:33] !log catrope@tin Synchronized php-1.31.0-wmf.2/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T176652 (duration: 00m 50s)
[00:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:04:41] T176652: Performance review of RCFilters feature - https://phabricator.wikimedia.org/T176652
[00:06:21] !log catrope@tin Synchronized php-1.31.0-wmf.2/resources/src/mediawiki.rcfilters/mw.rcfilters.init.js: T176652 (duration: 00m 49s)
[00:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:55] (CR) Krinkle: [C: -1] Avoid perl warnings for invalid lines in reverse-stack mode [puppet] - https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249) (owner: Aaron Schulz)
[00:18:23] (PS5) Sowjanyavemuri: Understand the scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/380568 (https://phabricator.wikimedia.org/T176624)
[00:42:16] (PS2) Catrope: Enable structured change filters by default on all wikis except those with FlaggedRevs protection [mediawiki-config] - https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445)
[00:42:19] (PS1) Catrope: Enable structured change filters by default on all remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/382333 (https://phabricator.wikimedia.org/T177444)
[01:12:08] Operations, Performance-Team, Availability (Multiple-active-datacenters), Services (blocked): Consider REST with SSL (HyperSwitch/Cassandra) for session storage - https://phabricator.wikimedia.org/T134811#3659433 (aaron) a:aaron>None
[01:22:36] (PS1) Dzahn: wikistats (labs): resource-like declarations vs include [puppet] - https://gerrit.wikimedia.org/r/382335
[01:24:12] (CR) Dzahn: "was testing the new style check, so this results in +1 and -1 violation" [puppet] - https://gerrit.wikimedia.org/r/382335 (owner: Dzahn)
[01:26:54] (CR) Catrope: [C: -2] "Not yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/382333 (https://phabricator.wikimedia.org/T177444) (owner: Catrope)
[01:32:21] (PS2) Dzahn: wikistats (labs): resource-like declarations vs include [puppet] - https://gerrit.wikimedia.org/r/382335
[01:32:53] (CR) jerkins-bot: [V: -1] wikistats (labs): resource-like declarations vs include [puppet] - https://gerrit.wikimedia.org/r/382335 (owner: Dzahn)
[01:36:02] (PS3) Dzahn: wikistats (labs): profile/role, remove violations [puppet] - https://gerrit.wikimedia.org/r/382335
[01:38:32] (CR) Dzahn: [C: 2] "Resolved violations: 1" [puppet] - https://gerrit.wikimedia.org/r/382335 (owner: Dzahn)
[01:49:26] (PS1) Dzahn: racktables: role/profile, remove style violations [puppet] - https://gerrit.wikimedia.org/r/382336
[01:49:57] (CR) jerkins-bot: [V: -1] racktables: role/profile, remove style violations [puppet] - https://gerrit.wikimedia.org/r/382336 (owner: Dzahn)
[02:00:05] (PS2) Dzahn: racktables: role/profile, remove style violations [puppet] - https://gerrit.wikimedia.org/r/382336
[02:02:29] (CR) Dzahn: "wmf-style: total violations delta -7" [puppet] - https://gerrit.wikimedia.org/r/382336 (owner: Dzahn)
[02:09:03] (PS1) Dzahn: otrs: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382338
[02:10:21] (PS2) Dzahn: otrs: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382338
[02:10:55] (CR) jerkins-bot: [V: -1] otrs: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382338 (owner: Dzahn)
[02:12:14] (PS3) Dzahn: otrs: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382338
[02:18:21] (PS1) Dzahn: phabricator: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382342
[02:18:22] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[02:18:33] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[02:19:40] (PS2) Dzahn: phabricator: apache resources in profile vs include [puppet] - https://gerrit.wikimedia.org/r/382342
[02:23:22] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[02:23:42] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[02:24:50] (PS1) Dzahn: requesttracker: apache resources vs include [puppet] - https://gerrit.wikimedia.org/r/382343
[02:26:32] (PS2) Dzahn: requesttracker: apache resources vs include [puppet] - https://gerrit.wikimedia.org/r/382343
[02:29:04] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.1) (duration: 09m 51s)
[02:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:42] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[02:38:52] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[02:51:56] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.2) (duration: 09m 01s)
[02:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:59:19] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Oct 5 02:59:18 UTC 2017 (duration 7m 23s)
[02:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:34:31] (PS1) Dzahn: annualreport: rm module, merge into profile, fix style [puppet] - https://gerrit.wikimedia.org/r/382351
[03:35:54] (PS2) Dzahn: annualreport: rm module, merge into profile, fix style [puppet] - https://gerrit.wikimedia.org/r/382351
[03:36:51] (CR) Dzahn: "wmf-style: total violations delta -3" [puppet] - https://gerrit.wikimedia.org/r/382351 (owner: Dzahn)
[03:39:00] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[03:39:19] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[04:13:52] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[04:14:41] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[04:15:18] (PS1) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:15:50] (CR) jerkins-bot: [V: -1] start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[04:16:52] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0
[04:17:01] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[04:20:30] (PS2) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:21:00] (CR) jerkins-bot: [V: -1] start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[04:23:06] (PS3) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:23:36] (CR) jerkins-bot: [V: -1] start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[04:24:31] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 605.90 seconds
[04:26:31] (PS4) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:27:02] (CR) jerkins-bot: [V: -1] start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[04:34:41] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 251.16 seconds
[04:38:52] (PS5) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:39:19] (CR) jerkins-bot: [V: -1] start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[04:40:27] (PS6) Dzahn: start profile for wikiba.se web hosting [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531)
[04:41:40] (Draft2) Jayprakash12345: Enable Extension:DynamicPageList to Turkish Witionary [mediawiki-config] - https://gerrit.wikimedia.org/r/382357
[04:42:29] (PS3) Jayprakash12345: Enable Extension:DynamicPageList to Turkish Witionary [mediawiki-config] - https://gerrit.wikimedia.org/r/382357 (https://phabricator.wikimedia.org/T177448)
[04:47:08] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3659506 (Dzahn) We have the repo, and i see your change Amir, thank you! Before i merge the content itself i started with puppet code to in...
[05:14:12] (CR) Chad: start profile for wikiba.se web hosting (1 comment) [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[05:26:11] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[05:26:21] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[05:35:21] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:37:11] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:39:41] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 returned the unexpected status 503 (expecting: 200)
[05:42:05] !log restart varnish backend on cp3030
[05:42:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:21] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[05:42:41] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[05:43:11] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[05:46:04] going to restart cp3040 in bit too, it was responsible for some previous 503s
[05:46:09] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - https://gerrit.wikimedia.org/r/382360
[05:46:11] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - https://gerrit.wikimedia.org/r/382360
[05:47:53] <_joe_> elukey: is it recovering?
[05:48:40] <_joe_> it is indeed
[05:48:43] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - https://gerrit.wikimedia.org/r/382360 (owner: Marostegui)
[05:48:50] yep yep
[05:50:20] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - https://gerrit.wikimedia.org/r/382360 (owner: Marostegui)
[05:50:31] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1097" [mediawiki-config] - https://gerrit.wikimedia.org/r/382360 (owner: Marostegui)
[05:51:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1097 - T174509 (duration: 01m 09s)
[05:51:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:59] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:53:39] (PS1) Marostegui: Revert "db-codfw.php: Depool db2035 and db2056" [mediawiki-config] - https://gerrit.wikimedia.org/r/382364
[05:53:42] (PS2) Marostegui: Revert "db-codfw.php: Depool db2035 and db2056" [mediawiki-config] - https://gerrit.wikimedia.org/r/382364
[05:56:32] (CR) Marostegui: [C: 1] "Looks good, if you want me to merge+deploy just let me know" [dns] - https://gerrit.wikimedia.org/r/382205 (https://phabricator.wikimedia.org/T175685) (owner: Papaul)
[05:56:49] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2035 and db2056" [mediawiki-config] - https://gerrit.wikimedia.org/r/382364 (owner: Marostegui)
[05:57:19] !log restart varnish backend on cp3040
[05:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:07] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2035 and db2056" [mediawiki-config] - https://gerrit.wikimedia.org/r/382364 (owner: Marostegui)
[06:00:21] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2035 and db2056" [mediawiki-config] - https://gerrit.wikimedia.org/r/382364 (owner: Marostegui)
[06:01:28] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2035 and db2056 - T174509 (duration: 00m 52s)
[06:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:36] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:02:56] and now it is the turn of cp3032
[06:03:02] that is throwing 503s
[06:03:11] <_joe_> this is untenable
[06:03:16] <_joe_> I'm depooling esams
[06:03:35] <_joe_> this is, in my opinion, an outage
[06:03:50] <_joe_> 5% of 5xx responses right now
[06:03:50] so the last datapoints seems recovered
[06:04:10] <_joe_> don't trust grafana or graphite
[06:04:18] <_joe_> go on oxygen and tail the feed
[06:04:30] I am checking logstash atm
[06:04:40] <_joe_> no.
[06:04:53] <_joe_> that's by far not the best way to check these things while they happen
[06:05:36] sure, but I can see no more 503s now, going to check oxygen but it seems that the spike is over
[06:05:49] it might just be that this morning we have 3 misbehaving backends rather than 2
[06:06:04] <_joe_> yeah well, this is all unacceptable
[06:13:02] !log Optimize enwiki.templatelinks and enwiki.pagelinks on codfw master db2048 (without replication) - T174509
[06:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:10] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:14:30] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:15:11] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:16:47] the only one left is cp3032, that caused the last spike
[06:16:58] I'd precautionary restart that one as well
[06:17:23] even if it would be 3 cache hosts purged in ~30 min
[06:17:38] !log Optimize s2.templatelinks and enwiki.pagelinks on codfw master db2017 (without replication) - T174509
[06:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:51] <_joe_> elukey: still failed backend fetches?
[06:18:22] _joe_: atm it seems all good, but it caused the previous backend fetch spike
[06:18:58] https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=text&from=now-3h&to=now
[06:20:30] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/382370 (https://phabricator.wikimedia.org/T174509)
[06:22:47] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/382370 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:24:18] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/382370 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:24:56] !log Optimize templatelinks and pagelinks on db1089 - T174509
[06:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:07] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:25:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 - T174509 (duration: 00m 49s)
[06:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:00] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/382370 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:26:49] (PS1) Marostegui: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/382371 (https://phabricator.wikimedia.org/T174509)
[06:28:29] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/382371 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:29:53] (Merged) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/382371 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:30:12] (CR) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/382371 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:30:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1053 - T174509 (duration: 00m 50s)
[06:30:58] !log Optimize templatelinks and pagelinks on db1053 - T174509
[06:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:03] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:14] !log Optimize templatelinks and pagelinks on db1069 - T174509
[06:32:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:00] (PS1) Marostegui: mariadb: Add db1105 to s5 [puppet] - https://gerrit.wikimedia.org/r/382372 (https://phabricator.wikimedia.org/T172679)
[06:36:18] (PS1) Marostegui: s5.hosts: Add db1105 [software] - https://gerrit.wikimedia.org/r/382373 (https://phabricator.wikimedia.org/T172679)
[06:38:29] (CR) Marostegui: [C: 2] s5.hosts: Add db1105 [software] - https://gerrit.wikimedia.org/r/382373 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:38:34] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8191/" [puppet] - https://gerrit.wikimedia.org/r/382372 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:39:35] (Merged) jenkins-bot: s5.hosts: Add db1105 [software] - https://gerrit.wikimedia.org/r/382373 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:40:38] !log Stop MySQL on db1104 to clone db1105 from it - T172679
[06:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:40:45] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[06:41:39] Operations, HHVM, User-Elukey: Migration of mw* servers to stretch - https://phabricator.wikimedia.org/T174431#3561778 (MoritzMuehlenhoff)
[06:41:41] Operations, HHVM, User-Elukey: Missing .deb dependencies for appserver on Stretch - https://phabricator.wikimedia.org/T177443#3659585 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff That's all addressed in https://gerrit.wikimedia.org/r/#/c/380722/ (which needs review, review welcom...
[06:49:06] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:53:41] (PS1) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509)
[06:54:03] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:55:01] (CR) jerkins-bot: [V: -1] db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:55:39] (PS2) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509)
[06:56:40] !log upgrade mw1238-mw1258 (app servers) to HHVM 3.18.5
[06:56:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:44] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[06:58:44] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[06:59:02] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[07:00:32] (Merged) jenkins-bot: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[07:00:42] (CR) jenkins-bot: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/382375 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[07:01:12] !log Optimize templatelinks and pagelinks on db1099 - T174509
[07:01:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:18] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[07:01:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1099 - T174509 (duration: 00m 52s)
[07:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:24] !log update pybal to 1.14.0 on esams secondary LVSs
[07:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:25] !log update pybal to 1.14.0 on esams primary LVSs
[07:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:51] (CR) Ema: "Looks good! Minor nit: you should add yourself to the Uploaders list in debian/control so that lintian does not think this is a Non-Mainta" [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/382176 (owner: Ayounsi)
[07:25:06] (CR) Giuseppe Lavagetto: "LGTM in general, see inline comments though." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/380722 (owner: Muehlenhoff)
[07:26:58] !log running batched query of "DELETE FROM linter WHERE linter_cat=12" on all wikis to clear out mostly bogus html5-misnesting category
[07:27:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:59] (PS1) Marostegui: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883)
[07:44:25] (CR) Marostegui: [C: -2] "Do not merge until Wednesday 11th Oct at 6:00AM UTC" [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: Marostegui)
[07:45:45] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 765.00 seconds
[07:47:32] (PS1) Marostegui: db1068: Update socket path [puppet] - https://gerrit.wikimedia.org/r/382380 (https://phabricator.wikimedia.org/T168661)
[07:49:14] PROBLEM - DPKG on mw1255 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:50:14] RECOVERY - DPKG on mw1255 is OK: All packages OK
[07:51:26] (CR) Marostegui: [C: -2] "Do not merge until https://phabricator.wikimedia.org/T168661#3659659 steps are done" [puppet] - https://gerrit.wikimedia.org/r/382380 (https://phabricator.wikimedia.org/T168661) (owner: Marostegui)
[07:52:04] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[07:54:37] !log upgrade mw1221-mw1235 (API servers) to HHVM 3.18.5
[07:54:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:04] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:57:34] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[07:59:04] (PS2) Gehel: logstash: DSH groups based on conftool [puppet] - https://gerrit.wikimedia.org/r/382184
[07:59:46] (CR) Gehel: [C: 2] logstash: DSH groups based on conftool [puppet] - https://gerrit.wikimedia.org/r/382184 (owner: Gehel)
[08:00:45] PROBLEM - DPKG on mw1222 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:01:00] (PS1) Aude: Cleanup old Wikibase echo test wiki configs [mediawiki-config] - https://gerrit.wikimedia.org/r/382384
[08:01:54] RECOVERY - DPKG on mw1222 is OK: All packages OK
[08:02:36] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:03:03] !log putting labsdb1009 under maintenance
[08:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:48] I will check the other hosts and see if they also complain about the buffer pool
[08:05:23] 10 doesn't, but it doesn't have the load of the others
[08:08:14] (PS1) Giuseppe Lavagetto: Rakefile: add global task for the wmf_style check [puppet] - https://gerrit.wikimedia.org/r/382385
[08:08:42] (CR) jerkins-bot: [V: -1] Rakefile: add global task for the wmf_style check [puppet] - https://gerrit.wikimedia.org/r/382385 (owner: Giuseppe Lavagetto)
[08:12:43] !log Optimize templatelinks and pagelinks on s1 s2 s4 s5 s6 and s7 on labsdb1009 - T174509
[08:12:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:50] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[08:13:16] (CR) Jcrespo: "I would add a link with more information, like last time." [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: Marostegui)
[08:13:31] (PS2) Giuseppe Lavagetto: Rakefile: add global task for the wmf_style check [puppet] - https://gerrit.wikimedia.org/r/382385
[08:14:31] (CR) Giuseppe Lavagetto: [C: 2] Rakefile: add global task for the wmf_style check [puppet] - https://gerrit.wikimedia.org/r/382385 (owner: Giuseppe Lavagetto)
[08:14:52] (PS2) Marostegui: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883)
[08:15:03] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[08:16:50] (CR) MarcoAurelio: [C: -1] "DPL is not being deployed to further wikis due to performance issues." [mediawiki-config] - https://gerrit.wikimedia.org/r/382357 (https://phabricator.wikimedia.org/T177448) (owner: Jayprakash12345)
[08:18:57] (CR) Marostegui: [C: -2] "> I would add a link with more information, like last time." [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: Marostegui)
[08:19:04] PROBLEM - HHVM rendering on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:19:54] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:20:03] RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 73770 bytes in 0.146 second response time
[08:21:39] !log upgrade remaining eqiad image scalers to HHVM 3.18.5
[08:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:33] PROBLEM - HHVM rendering on mw1296 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:26:24] RECOVERY - HHVM rendering on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 73770 bytes in 0.159 second response time
[08:29:45] !log Optimize pagelinks and templatelinks on s5 - db1105 - T174509
[08:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:52] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[08:31:33] PROBLEM - HHVM rendering on mw1297 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:32:23] RECOVERY - HHVM rendering on mw1297 is OK: HTTP OK: HTTP/1.1 200 OK - 73770 bytes in 0.432 second response time
[08:32:30] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db1105 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382388 (https://phabricator.wikimedia.org/T172679)
[08:35:53] PROBLEM - DPKG on mw1298 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:36:53] RECOVERY - DPKG on mw1298 is OK: All packages OK
[08:37:07] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db1105 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382388 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:37:24] PROBLEM - HHVM rendering on mw1298 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:38:13] RECOVERY - HHVM rendering on mw1298 is OK: HTTP OK: HTTP/1.1 200 OK - 73770 bytes in 0.151 second response time
[08:38:44] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db1105 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382388 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:38:54] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db1105 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382388 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:39:49] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db1105 to the config - T172679 (duration: 00m 52s)
[08:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:58] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[08:40:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db1105 to the config - T172679 (duration: 00m 50s)
[08:40:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:08] !log upgrade remaining eqiad job runners to HHVM 3.18.5
[08:41:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:35] (PS1) Marostegui: s5.hosts: Add db1106 to s5 [software] - https://gerrit.wikimedia.org/r/382391 (https://phabricator.wikimedia.org/T172679)
[08:46:18] (PS1) Marostegui: mariadb: Add db1106 to s5 [puppet] - https://gerrit.wikimedia.org/r/382392 (https://phabricator.wikimedia.org/T172679)
[08:47:16] (CR) Marostegui: [C: 2] s5.hosts: Add db1106 to s5 [software] - https://gerrit.wikimedia.org/r/382391 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:47:30] (CR) Ladsgroup: "Thanks :)" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: Dzahn)
[08:47:44] (PS2) Gehel: wdqs: GC tuning [puppet] - https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919)
[08:47:58] (PS3) Gehel: wdqs: GC tuning [puppet] - https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919)
[08:48:04] (Merged) jenkins-bot: s5.hosts: Add db1106 to s5 [software] - https://gerrit.wikimedia.org/r/382391 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:48:26] Operations, Prod-Kubernetes, Kubernetes: Implement authentication/authorization in Kubernetes clusters - https://phabricator.wikimedia.org/T177393#3659784 (Joe)
[08:48:56] Operations, Prod-Kubernetes, Kubernetes, User-Joe: Experiment with a TLS proxy/router for pods - https://phabricator.wikimedia.org/T177394#3659785 (Joe)
[08:49:24] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8192/" [puppet] - https://gerrit.wikimedia.org/r/382392 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[08:49:39] Operations, Prod-Kubernetes, Kubernetes, User-Joe: Improve monitoring of the Kubernetes clusters - https://phabricator.wikimedia.org/T177395#3659786 (Joe)
[08:50:04] _joe_: is it ok to merge your change?
[08:50:17] <_joe_> marostegui: meh sorry, yes [08:50:21] ok will do now [08:50:27] <_joe_> marostegui: that's a change with no effect in production [08:50:30] <_joe_> so I forgot to merge [08:50:32] <_joe_> :/ [08:50:37] no worries - merging now :) [08:51:19] (03CR) 10Gehel: [C: 032] wdqs: GC tuning [puppet] - 10https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919) (owner: 10Gehel) [08:51:27] (03PS4) 10Gehel: wdqs: GC tuning [puppet] - 10https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919) [08:52:31] 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-Joe: Design pod-level monitoring and service-level alerting - https://phabricator.wikimedia.org/T177396#3659827 (10Joe) [08:53:05] 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-Joe: Create scaffolding of services templates for deployment in production/staging - https://phabricator.wikimedia.org/T177397#3659829 (10Joe) [08:53:18] !log rolling restart of wdqs to pick up new GC options - T175919 [08:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:25] T175919: investigate GC times on wikidata query service - https://phabricator.wikimedia.org/T175919 [08:54:09] !log Stop MySQL on db1104 to clone db1106 - T172679 [08:54:12] PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 661.89 seconds [08:54:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:54:17] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679 [08:54:21] 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-Joe: Create scaffolding of services templates for deployment in production/staging - https://phabricator.wikimedia.org/T177397#3657212 (10Joe) [08:58:26] (03PS1) 10Gehel: wdqs: fix typo in GC_LOGS [puppet] - 10https://gerrit.wikimedia.org/r/382397 (https://phabricator.wikimedia.org/T175919) [08:59:00] (03CR) 10Gehel: [C: 032] wdqs: 
fix typo in GC_LOGS [puppet] - 10https://gerrit.wikimedia.org/r/382397 (https://phabricator.wikimedia.org/T175919) (owner: 10Gehel) [09:01:13] RECOVERY - MariaDB Slave Lag: s2 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 293.90 seconds [09:01:54] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3659854 (10jcrespo) > The data in user_groups isn't incorrect, there just happens to be a new column of relevant data.... [09:03:32] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 502 (expecting: 200) [09:03:59] (03PS1) 10Gehel: wdqs: add -XX:+UnlockExperimentalVMOptions [puppet] - 10https://gerrit.wikimedia.org/r/382398 [09:04:33] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [09:04:33] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) is CRITICAL: Test normal source and target returned the unexpected status 502 (expecting: 200) [09:04:41] (03CR) 10Gehel: [C: 032] wdqs: add -XX:+UnlockExperimentalVMOptions [puppet] - 10https://gerrit.wikimedia.org/r/382398 (owner: 10Gehel) [09:05:32] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [09:06:38] (03PS4) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [09:07:09] (03CR) 10Hashar: "Added a spec to ensure the catalog compiles when adding both master and slave classes ( https://gerrit.wikimedia.org/r/#/c/382217/3..4/mod" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) 
[09:15:22] (03PS1) 10Gehel: wdqs: send GC lgos to file [puppet] - 10https://gerrit.wikimedia.org/r/382401 (https://phabricator.wikimedia.org/T175919) [09:16:40] (03PS2) 10Gehel: wdqs: send GC logs to file [puppet] - 10https://gerrit.wikimedia.org/r/382401 (https://phabricator.wikimedia.org/T175919) [09:17:17] (03CR) 10Gehel: [C: 032] wdqs: send GC logs to file [puppet] - 10https://gerrit.wikimedia.org/r/382401 (https://phabricator.wikimedia.org/T175919) (owner: 10Gehel) [09:17:35] yes [09:17:42] oops, wrong channel... [09:23:46] (03PS1) 10Ema: package_builder: add D03wikimedia-experimental hook [puppet] - 10https://gerrit.wikimedia.org/r/382403 [09:26:36] (03PS2) 10Ema: package_builder: add D03wikimedia-experimental hook [puppet] - 10https://gerrit.wikimedia.org/r/382403 [09:28:23] (03CR) 10Mattflaschen: [C: 04-1] "The if statement should be reversed." (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [09:32:13] PROBLEM - MariaDB Slave Lag: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 345174.90 seconds [09:37:29] (03CR) 10Mattflaschen: [C: 04-1] Enable structured change filters by default on all wikis except those with FlaggedRevs protection (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [09:45:43] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382409 [09:45:47] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1099" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382409 [09:46:11] (03CR) 10Filippo Giunchedi: smart: new module (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [09:46:23] (03PS4) 10Filippo Giunchedi: smart: new module [puppet] - 10https://gerrit.wikimedia.org/r/378039 
(https://phabricator.wikimedia.org/T86552) [09:49:29] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382409 (owner: 10Marostegui) [09:49:46] (03CR) 10Alexandros Kosiaris: [C: 04-1] package_builder: add D03wikimedia-experimental hook (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382403 (owner: 10Ema) [09:50:40] 10Operations, 10Performance-Team, 10Thumbor, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10User-fgiunchedi: Remove X-Content-Dimensions for multipage originals - https://phabricator.wikimedia.org/T175689#3659953 (10Gilles) Started running the cleanup script on Terbium, clearing the he... [09:51:47] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382409 (owner: 10Marostegui) [09:52:02] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382409 (owner: 10Marostegui) [09:52:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1099 - T174509 (duration: 00m 50s) [09:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:57] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [09:56:52] (03CR) 10Alexandros Kosiaris: "I am not entirely sure about this one. 
What's left in the module class is also violating our policy (a class inclusion within a module fro" [puppet] - 10https://gerrit.wikimedia.org/r/382338 (owner: 10Dzahn) [09:57:48] (03CR) 10Gehel: jenkins: switch to Java8 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [09:59:28] (03PS1) 10Marostegui: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382410 (https://phabricator.wikimedia.org/T174054) [10:01:32] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382410 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [10:02:16] (03CR) 10Muehlenhoff: jenkins: switch to Java8 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:02:57] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382410 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [10:03:11] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1076 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382410 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [10:03:43] (03PS1) 10Giuseppe Lavagetto: Add tests for a bad defined type [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/382413 [10:03:54] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3659978 (10Marostegui) db1076 has been depooled. @Cmjohnson ping me when you are online so I can start mydumper to simulate reads, a... 
[10:04:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1076 - T174054 (duration: 00m 50s) [10:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:15] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054 [10:04:32] (03PS1) 10Hoo man: Enable Statement usage tracking on kowiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) [10:05:02] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3659995 (10MarcoAurelio) [10:06:12] (03CR) 10Alexandros Kosiaris: [C: 04-1] racktables: role/profile, remove style violations (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [10:07:29] !log addshore@tin Synchronized php-1.31.0-wmf.1/extensions/WikimediaEvents/WikimediaEventsHooks.php: 31-wmf.1 WikimediaEvents: WMDE: update $campaignStartTimestamp (duration: 00m 52s) [10:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:48] (03PS5) 10Gehel: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:09:30] (03CR) 10jerkins-bot: [V: 04-1] jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:09:33] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/WikimediaEvents/WikimediaEventsHooks.php: 31-wmf.2 WikimediaEvents: WMDE: update $campaignStartTimestamp (duration: 00m 50s) [10:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:05] (03PS1) 10Filippo Giunchedi: hieradata: enable syslog over tls for eqiad [puppet] - 10https://gerrit.wikimedia.org/r/382415 
(https://phabricator.wikimedia.org/T136312) [10:10:51] (03PS1) 10Alexandros Kosiaris: scap::dsh: Create the parsoid-canaries group [puppet] - 10https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374) [10:14:02] (03PS6) 10Gehel: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:14:31] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/includes/SpecialConflictTestPage/SpecialConflictTestPage.php: [[gerrit:382404|Fix SimulateTwoColEditConflict when using a page with ns]] (duration: 00m 50s) [10:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:37] (03CR) 10jerkins-bot: [V: 04-1] jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:15:57] gehel: the poor spec file gotta be updated as result :D [10:16:17] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/includes/TwoColConflictHooks.php: [[gerrit:382405|Dont register simulate page when not enabled as a BF]] PT1/3 (duration: 00m 51s) [10:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:16:24] of course! [10:16:26] (03PS7) 10Gehel: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:17:02] ohhh [10:17:25] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/extension.json: [[gerrit:382405|Dont register simulate page when not enabled as a BF]] PT2/3 (duration: 00m 50s) [10:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:59] hashar: lunch time here, but I'll ping you after it to see if it all works out... [10:18:14] gehel: I am going to further amend and run the puppet compiler [10:18:27] hashar: good idea! 
[10:18:29] gehel: but looks your fix is the way to goo [10:18:32] thanks !!! [10:18:36] hmmmmmm [10:18:37] np [10:18:58] hashar: looks like i did something... [10:19:34] (03PS8) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [10:20:10] BadMethodCallException from line 43 of /srv/mediawiki/php-1.31.0-wmf.2/extensions/BetaFeatures/includes/BetaFeaturesUtil.php: Call to a member function getOption() on a non-object (null) [10:20:27] fuck sake [10:20:45] oh [10:20:50] addshore: try with python? [10:21:05] i am kidding. I guess some User/Title is missing ? :( [10:21:40] reverting [10:21:42] meh [10:22:23] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/extension.json: REVERT: [[gerrit:382405|Dont register simulate page when not enabled as a BF]] PT2/3 (duration: 00m 50s) [10:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:58] !log addshore@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/includes/TwoColConflictHooks.php: REVERT: [[gerrit:382405|Dont register simulate page when not enabled as a BF]] PT1/3 (duration: 00m 50s) [10:24:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:52] (03CR) 10Hashar: jenkins: switch to Java8 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:25:05] (03PS9) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [10:25:10] and the mediawiki-errors logstash dash now refuses to load, lovely [10:26:41] !log upgrade remaining eqiad API servers to HHVM 3.18.5 [10:26:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:06] Looks like they only came from load.php and job runs, I'll keep looking at that in a bit. Nothing to worry about now! 
[10:31:01] (03PS2) 10Alexandros Kosiaris: scap::dsh: Create the parsoid-canaries group [puppet] - 10https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374) [10:32:46] (03PS3) 10Alexandros Kosiaris: scap::dsh: Create the parsoid-canaries group [puppet] - 10https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374) [10:34:24] (03CR) 10Hashar: "Moritz wrote:" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:35:01] 10Operations, 10Goal, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3660086 (10fgiunchedi) [10:35:33] !log contint2001: switched java to version 8 as well as all the jre/jdk utilities managed by alternatives - T162828 [10:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:40] T162828: Upgrade jenkins server and jenkins slaves to java 8 - https://phabricator.wikimedia.org/T162828 [10:35:44] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3660089 (10akosiaris) >>! In T177374#3658878, @Arlolra wrote: > #parsing-team could have used a ping here, since our canaries are still hardcoded to the decommissioned nodes. > https:/... [10:36:36] PROBLEM - DPKG on mw1287 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:37:36] RECOVERY - DPKG on mw1287 is OK: All packages OK [10:38:44] hashar: Bugging you once again about this one ;) Could you have a look at https://gerrit.wikimedia.org/r/#/c/380989/ / https://gerrit.wikimedia.org/r/#/c/381639/ ? 
[10:39:54] hashar: I wonder if the canarychecks etc would make sense to run per entry point (index, load and runJobs) [10:39:57] (03CR) 10Hashar: [V: 031 C: 031] "The patch I have made to update-java-alternatives:" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [10:40:21] That would have caught what I just broke [10:40:41] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3660161 (10ema) varnish-modules backported and uploaded to experimental. In my comment above I forgot to mention libvmod-vslp, which conveniently we do not need to care about any longer as i... [10:45:17] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3660177 (10ema) Oh, and `s/return (fetch)/return (miss)/`. [10:56:08] (03PS1) 10Ema: varnish: stop passing session_max [puppet] - 10https://gerrit.wikimedia.org/r/382420 (https://phabricator.wikimedia.org/T168529) [11:29:33] !log rebuilding globalimagelinks table on dbstore1001:s4 [11:29:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:36] 10Operations, 10Contributors-Team, 10MobileFrontend, 10wikidiff2, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3660320 (10Tobi_WMDE_SW) a:03jkroll [11:57:42] !log upgrade remaining eqiad video scalers to HHVM 3.18.5 [11:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:05] Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Creating amwikimedia deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1200). [12:00:05] No GERRIT patches in the queue for this window AFAICS. 
[12:00:43] :))) [12:01:47] (03CR) 10Ladsgroup: [C: 032] Add config for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [12:04:27] (03PS7) 10Ladsgroup: Add amwikimedia to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378403 (https://phabricator.wikimedia.org/T176042) [12:06:24] (03CR) 10Ladsgroup: [C: 032] Add amwikimedia to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378403 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [12:07:57] (03Merged) 10jenkins-bot: Add amwikimedia to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378403 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [12:08:07] (03CR) 10jenkins-bot: Add amwikimedia to wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378403 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [12:08:39] before I start I get lots of "47 Syntax Error (64): Illegal character <29> in hex string" in fatalmonitor [12:08:43] just saying [12:10:53] (03PS4) 10Ladsgroup: Add config for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042) [12:12:49] (03CR) 10jenkins-bot: Add config for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [12:14:13] PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 727.78 seconds [12:15:31] (03PS1) 10Hashar: contint: apt-get update before installing packages [puppet] - 10https://gerrit.wikimedia.org/r/382429 [12:17:30] hashar Dereckson: creating wiki using the maintenance script starts to behave strange: [12:17:35] https://www.irccloud.com/pastebin/PWr8v4nU/ [12:21:46] The title for the main page doesn't seem right [12:22:39] !log installing poppler security updates on 
trusty [12:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:47] it's weird that am.wikimedia.org gets redirected to am.wikipedia.org (which is something totally different) [12:25:09] (03PS1) 10Muehlenhoff: Add library hint for poppler [puppet] - 10https://gerrit.wikimedia.org/r/382430 [12:26:16] (03CR) 10Muehlenhoff: [C: 032] Add library hint for poppler [puppet] - 10https://gerrit.wikimedia.org/r/382430 (owner: 10Muehlenhoff) [12:26:19] RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 221.25 seconds [12:27:09] Amir1: looking [12:28:11] also sync dblists is not working :/ [12:28:17] https://www.irccloud.com/pastebin/jrE28ziW/ [12:28:30] oh I should do it on tin [12:28:42] not terbium, ignore the last comment [12:29:14] Note we don't deploy Flagged Revisions on new wikis. [12:29:27] So it's not really surprising an error occurs there. [12:30:30] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 890.48 seconds [12:31:01] So we'll need to run again the script using --wiki=aawiki instead of --wiki=fawiki, but commenting out the already done tasks [12:32:31] By the way, Phabricator has a pastebin, at https://phabricator.wikimedia.org/paste [12:33:34] The first thing is to know if a database has been created. [12:33:36] Dereckson: https://phabricator.wikimedia.org/P6082 [12:33:41] it has been created [12:33:45] and same for the tables [12:35:27] yeah but why do you asked the script to run with the fa.wikipedia config? 
[12:35:45] because without --wiki option it didn't work at all [12:35:51] so I had to choose a wiki [12:35:56] aawiki is the one to use [12:36:04] !log ladsgroup@tin Synchronized dblists: (no justification provided) (duration: 00m 51s) [12:36:05] it's not documented anywhere [12:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:21] in the page history [12:37:07] so I should delete the whole thing and start again? [12:37:10] No [12:37:17] 12:31:01 < Dereckson> So we'll need to run again the script using --wiki=aawiki instead of --wiki=fawiki, but commenting out the already done tasks [12:37:39] https://wikitech.wikimedia.org/w/index.php?title=Add_a_wiki&type=revision&diff=1772044&oldid=1771423 [12:37:44] hop, the aawiki is restored [12:38:07] Thanks [12:38:07] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382434 [12:38:10] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382434 [12:38:22] now, you can livehack the addWiki.php script on terbium, comment out everything already done [12:38:33] (with sudo -u mwdeploy nano addWiki.php) [12:39:00] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1007, Errmsg: Error Cant create database amwikimedia: database exists on query. Default database: amwikimedia. [Query snipped] [12:39:11] ping me when you're done, I'll check it to determine if it's good. [12:39:22] (you can ignore dbstore errors, they are expected and pop up every time a new wiki is created) [12:39:28] I will fix dbstore1002 [12:39:37] (it's a lack of IF EXISTS statement somewhere) [12:39:44] thanks marostegui [12:40:43] Amir1: fun note about the bug: it wasn't especially easy to think the god script will use the configuration submitted as parameter, and not the aawikimedia configuration to populate a page. 
(and not the amwikimedia) [12:41:24] it's logical afterwards, but not before hitting it [12:42:43] Dereckson: I commented out echo [12:42:47] but now I get this [12:43:03] https://phabricator.wikimedia.org/P6084 [12:43:28] let me comment out the line 81 [12:44:08] RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:46:12] Amir1: you need to edit /srv/mediawiki/php-1.31.0-wmf.1/extensions/WikimediaMaintenance/addWiki.php on Terbium to comment out the code of the tasks already done [12:46:26] I'm already done [12:46:29] and before running the script again, ping me, I'll check if that looks good [12:46:47] practically commented out half of the code [12:47:00] oh sorry the # wasn't easy to see [12:47:21] if ( self::isPrivateOrFishbowl( $dbName ) ) { [12:47:24] $dbw->sourceFile( "$IP/extensions/OATHAuth/sql/mysql/tables.sql" ); [12:47:27] } [12:47:32] if amwikimedia is fishbowl, you can comment that out too [12:47:38] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 861.63 seconds [12:47:53] // Initialise external storage [12:47:55] done too [12:47:57] // Flow External Store (may be the same, so there is an array_unique) [12:48:01] done too [12:48:08] You want to resume here: $title = Title::newFromText( wfMessage( 'mainpage' )->inLanguage( $lang )->useDatabase( false )->plain() ); [12:48:21] yeah [12:48:24] started [12:48:28] now it's doing stuff [12:49:10] 10Operations, 10DBA, 10procurement: Purchase testing backups hosts (2 hosts in total) in eqiad - https://phabricator.wikimedia.org/T177488#3660510 (10Marostegui) [12:49:32] done [12:49:41] we need to clean this up afterwards [12:49:58] !log ladsgroup@tin Synchronized dblists: Create amwikimedia (T176042) (duration: 00m 50s) [12:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:06] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [12:50:56] !log 
ladsgroup@tin rebuilt wikiversions.php and synchronized wikiversions files: Create amwikimedia (T176042) [12:51:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:07] Amir1: clean what? [12:51:19] terbium addWiki.php file [12:51:33] sudo -u mwdeploy rm addWiki.php && scap pull perhaps? [12:51:34] Amir1: When you have a second, can I get a +1 https://gerrit.wikimedia.org/r/382414 [12:51:58] Dereckson: you are a wizard [12:52:02] hoo: sure [12:52:17] (perhaps with a touch on the file on tin) [12:52:25] (03CR) 10Ladsgroup: [C: 031] "Looks sane" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:52:31] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: Create amwikimedia (T176042) (duration: 00m 50s) [12:52:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:48] well I've run this script several times, and at the start, it crashed a lot. [12:53:34] now let's fix the amwikipedia/amwikimedia thing [12:53:46] !log ladsgroup@tin Synchronized static/images/project-logos/: Create amwikimedia (T176042) (duration: 00m 50s) [12:53:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:22] /home/dereckson/dev/mediawiki/operations/mediawiki-config/multiversion (amwikimedia-entry-point) ] git blame MWMultiVersion.php | grep "'am'" [12:56:25] 8b07788b44 multiversion/MWMultiVersion.php (Amir Sarabadani 2017-09-16 19:58:13 +0400 188) 'am', 'ar', 'bd', 'be', [12:56:28] 'br', 'ca', 'cn', 'co', 'dk', 'ec', 'et', 'fi', 'il', 'mai', 'mk', 'mx', [12:56:34] you've already edited multiversion/MWMultiVersion.php, but you need to deploy it too [12:57:06] okay, on it [12:58:01] it's the MediaWiki entry point, so the code responsible to ask to read the right database, etc. 
[12:58:22] it redirects .wikimedia.org to .wikipedia.org if it's not on the list you've edited [12:58:44] !log ladsgroup@tin Synchronized multiversion/MWMultiVersion.php: Create amwikimedia (T176042) (duration: 00m 50s) [12:58:49] Dereckson Amir1 is it ok if I deploy: https://gerrit.wikimedia.org/r/#/c/382434/ ? [12:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:52] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [12:59:06] Dereckson: it was not in the doc too [12:59:10] we should add it [12:59:41] Amir1: yes, edit that part [12:59:47] https://am.wikimedia.org/wiki/Գլխավոր_էջ looks good now :) [12:59:52] 10Operations, 10OTRS: Upgrade OTRS to 5.0.23 - https://phabricator.wikimedia.org/T176221#3660556 (10akosiaris) Date no longer tentative. Oct 11th 09:00 UTC. Motd has been updated accordingly [12:59:53] marostegui: I think so [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate European Mid-day SWAT(Max 8 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1300). [13:00:05] hoo: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:10] marostegui: I'm not using Tin either [13:00:14] I can SWAT today [13:00:18] I will deploy in a sec [13:00:20] Dereckson: two funny things: 1- the logo is not correct [13:00:22] zeljkof: not right now [13:00:22] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382434 (owner: 10Marostegui) [13:00:27] hoo: arround for SWAT? [13:00:31] hoo: zeljkof: we don't have done with the previous add wiki window [13:00:32] Dereckson: something going on? 
[13:00:39] zeljkof: Yes, I am [13:00:44] zeljkof: yes, Amir1 had a window to deploy am.wikimedia.org [13:00:48] but I would like to briefly collect some more pre-deploy data [13:00:52] Dereckson: ok, let me know when you are done, any ETA? [13:00:52] can you hang on a bit? [13:00:57] zeljkof: 40 to 70 minutes [13:01:10] Dereckson: uh oh :) so, probably no SWAT? [13:01:23] (03PS1) 10Fdans: Change Wikistats 2 deployment branch to 'release' [puppet] - 10https://gerrit.wikimedia.org/r/382438 (https://phabricator.wikimedia.org/T177288) [13:01:24] Dereckson: why 40? [13:02:23] you've still changes to deploy, the next one is the interwiki update [13:02:23] hoo: well, looks like the previous deploy window will take the most of SWAT, not sure if there will be SWAT, so stand by [13:02:48] okay, On it [13:02:50] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382434 (owner: 10Marostegui) [13:02:54] (it breaks login code in some cases when not done) [13:03:09] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382434 (owner: 10Marostegui) [13:03:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 - T174509 (duration: 00m 50s) [13:03:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:57] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [13:04:34] (03CR) 10Zfilipin: [C: 031] Enable Statement usage tracking on kowiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [13:06:04] (03PS1) 10Ladsgroup: Add amwikimedia to interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382439 (https://phabricator.wikimedia.org/T176042) [13:06:22] Dereckson: https://gerrit.wikimedia.org/r/#/c/382439/ [13:06:40] zeljkof: Got my data… so once we're 
ready I'm good [13:06:41] 2- I will take care of the logo issue later [13:07:30] (03CR) 10Dereckson: [C: 031] Add amwikimedia to interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382439 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:08:01] ah yes, it was a blank logo, not the one you uploaded, indeed [13:08:18] (03CR) 10Ladsgroup: [C: 032] Add amwikimedia to interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382439 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:08:20] hoo: I'm also ready, waiting for Dereckson and Amir1 to finish :) [13:08:43] hoo: want to deploy yourself, or should I? (not sure if you can deploy, just asking) [13:08:45] we are deploying [13:08:49] (03CR) 10Elukey: [C: 032] Change Wikistats 2 deployment branch to 'release' [puppet] - 10https://gerrit.wikimedia.org/r/382438 (https://phabricator.wikimedia.org/T177288) (owner: 10Fdans) [13:08:54] zeljkof: I can deploy it if you want me to [13:09:16] Amir1: next thing to do: the WikimediaMessages thing, I'm preparing the change, so you'll be able to review it (you have +2 in WikimediaMessages, I think?) 
[13:09:24] Amir1: if you are in the deploy mood, go ahead :) [13:09:31] Dereckson: yes [13:09:36] zeljkof: sure [13:09:46] (03Merged) 10jenkins-bot: Add amwikimedia to interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382439 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:09:56] (03CR) 10jenkins-bot: Add amwikimedia to interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382439 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:10:23] you can also ask a meta sysop at #wikimedia-stewards to update the date of the interwiki map last deployed [13:11:28] yeah, I'll do [13:11:57] !log ladsgroup@tin Synchronized wmf-config/interwiki.php: Create amwikimedia (T176042) (duration: 00m 51s) [13:12:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:07] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [13:13:39] Amir1: change ready at https://gerrit.wikimedia.org/r/382440 [13:14:10] Once merged in master, I'll cherry-pick it for 1.31.0-wmf.2 [13:14:16] Dereckson: awesome [13:15:02] !log test upgrade node-exporter in thumbor/swift/services/puppet3-diffs/monitoring projects - T166561 [13:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:09] T166561: Rollout prometheus-node-exporter 0.14 in labs - https://phabricator.wikimedia.org/T166561 [13:15:29] Dereckson: +2'd [13:17:25] cherry-picked as https://gerrit.wikimedia.org/r/#/c/382441/ [13:18:24] as the train will deploy soon 1.31.0-wmf.2 on Wikipedia, it's probably not worthwhile to cherry-pick for .1 [13:19:34] yeah [13:20:41] the storage stuff is now handled by the add wiki script, so that's done [13:21:27] The interwiki is also done [13:26:19] !log ladsgroup@tin Synchronized php-1.31.0-wmf.2/extensions/WikimediaMessages/i18n/wikimediaprojectnames: (no justification provided) (duration: 00m 49s) [13:26:25] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:32] Dereckson: deployed [13:28:06] For the logo issue, it currently displays the standard wikimedia one it seems. [13:29:02] ah yes you provided the HD ones: /static/images/project-logos/amwikimedia-1.5x.png /static/images/project-logos/amwikimedia-2x.png [13:29:28] (03PS3) 10Ema: package_builder: add support for WIKIMEDIA_EXPERIMENTAL [puppet] - 10https://gerrit.wikimedia.org/r/382403 [13:29:30] yup [13:29:34] I fix it right now [13:29:36] but we also need a traditional entry [13:29:46] yeah [13:31:02] https://gerrit.wikimedia.org/r/382444 [13:31:06] (03PS1) 10Ladsgroup: Add small logo for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382444 (https://phabricator.wikimedia.org/T176042) [13:31:21] (03CR) 10Ladsgroup: [C: 032] Add small logo for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382444 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:31:23] (03CR) 10Volans: [C: 031] "LGTM, thanks for taking care of it!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [13:32:07] (03CR) 10Ema: package_builder: add support for WIKIMEDIA_EXPERIMENTAL (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382403 (owner: 10Ema) [13:33:04] (03Merged) 10jenkins-bot: Add small logo for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382444 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:33:15] (03PS4) 10Ema: package_builder: add support for WIKIMEDIA_EXPERIMENTAL [puppet] - 10https://gerrit.wikimedia.org/r/382403 [13:35:15] that's deployed now [13:35:19] gehel: seems like the change for java 8 on ci hosts is all fine ( https://gerrit.wikimedia.org/r/#/c/382217/ ) [13:35:35] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: Fix amwikimedia logo (T176042) (duration: 00m 50s) [13:35:38] gehel: I am not sure why I havent tried to add the alternative select in some common class :D [13:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:42] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [13:36:05] one thing: for a fishbowl wiki, we don't have the first user to make for other ones, how I can make it? I couldn't find any maintenance script for that. [13:36:13] directly to the database? [13:36:45] hashar: just busy deep in code review. Do you need something else from me on this CR? 
[13:37:36] (03PS1) 10Dereckson: Document task reference for amwikimedia logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382447 (https://phabricator.wikimedia.org/T176042) [13:38:23] Amir1: populateAndCreate [13:38:38] (03CR) 10Dereckson: [C: 032] Document task reference for amwikimedia logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382447 (https://phabricator.wikimedia.org/T176042) (owner: 10Dereckson) [13:39:42] Dereckson: can't find the script [13:40:09] createAndPromote.php sorry [13:40:11] (03Merged) 10jenkins-bot: Document task reference for amwikimedia logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382447 (https://phabricator.wikimedia.org/T176042) (owner: 10Dereckson) [13:40:24] last time I used it: mwscript createAndPromote.php maiwikimedia --bureaucrat "Biplab Anand" `cat pass.txt` [13:41:14] If you want something to generate a pass: date +%s | sha256sum | base64 | head -c 32 > pass.txt [13:41:17] gehel: na I think that can just be merged. [13:41:30] (the base64 warrants you've "easy to copy/paste" characters) [13:41:32] gehel: moving the selects in jenkins::common solves the madness :] [13:41:36] (03CR) 10jenkins-bot: Add small logo for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382444 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [13:41:38] (03CR) 10jenkins-bot: Document task reference for amwikimedia logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382447 (https://phabricator.wikimedia.org/T176042) (owner: 10Dereckson) [13:41:40] (as you need to send the pass by mail manually to the relevant user) [13:41:42] hashar: you'll push it to puppet swat? [13:42:01] Dereckson: that is a terrible password with barely any entropy :] [13:42:17] gehel: I already upgraded the machines :] [13:42:22] hashar: you mean, people are going to keep them? 
[13:42:43] and yes, to generate a pass from the CURRENT time is probably not a good idea [13:42:46] Dereckson: I mean using date as a source of entropy to generate a pass :] [13:42:59] done [13:43:16] Amir1: log it too [13:43:35] already on it [13:43:36] (when you run a maintenance script, there should be a !log entry, so it's in the server admin log) [13:43:51] !log ladsgroup@terbium:~$ mwscript createAndPromote.php --wiki=amwikimedia --createAndPromote=sysop,bureaucrat 'Ladsgroup' "password removed obviously" --force (T176042) [13:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:43:58] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [13:43:59] yeah, I know I do it all the time [13:44:16] (03CR) 10Ladsgroup: [C: 032] Enable Statement usage tracking on kowiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [13:44:27] hoo: merging, and deploying [13:44:31] thanks! [13:44:40] hoo: is it testable in mwdebug1002? [13:44:46] Amir1: can you deploy wmf-config/InitialiseSettings.php before for the task id fix? [13:44:57] Amir1: not really [13:45:12] Dereckson: for what? I don't get it. I did it for the logos [13:45:28] Amir1: I've pulled a follow-up change to add a comment with the task ID on tin, and tested it on mwdebug1002 [13:45:46] it's only for the sake of repo/prod code parity [13:45:52] as a no-op adding a comment. [13:46:12] gehel: so I guess it is all about merging it and rebasing the puppetmaster [13:46:57] hashar: so this is only used in labs and you already applied it? So merging is a noop... 
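Note on the password exchange above: `date +%s | sha256sum | base64 | head -c 32` derives every character from the current Unix time, so anyone who can guess roughly when the command ran can enumerate the candidates. A minimal sketch of an alternative, assuming Python 3 is available on the host. The function name `generate_password`, the alphabet, and the 32-character default are illustrative choices, not something taken from the log.

```python
import secrets
import string

# Letters and digits only, so the result stays easy to copy/paste
# (the property the base64 step in the shell one-liner was meant to give).
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random password drawn from the operating system's CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Prints one freshly generated password; redirect to a file if needed.
    print(generate_password())
```

Each character contributes about 5.95 bits (log2 of 62), so 32 characters carry roughly 190 bits. A timestamp whose day is known has only 86400 possible values, about 16 bits, no matter how it is hashed afterwards.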
[13:47:12] (03Merged) 10jenkins-bot: Enable Statement usage tracking on kowiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [13:47:23] (03CR) 10jenkins-bot: Enable Statement usage tracking on kowiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382414 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [13:49:23] okay [13:49:42] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Document task reference for amwikimedia logo (T176042) (duration: 00m 54s) [13:49:47] gehel: it is also used in prod but I already did the upgrade :) [13:49:48] "13:49:34 sync-file failed: Failed to acquire lock "/var/lock/scap.unknown-but-probably-mediawiki.lock"; owner is "dereckson"; reason is "Document task reference for amwikimedia logo (T176042)"" [13:49:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:49] T176042: Create amwikimedia - https://phabricator.wikimedia.org/T176042 [13:49:55] gehel: so yeah noop :] [13:50:00] Amir1: I'm done [13:50:13] Dereckson: thank you for the great help [13:50:27] Please upgrade MediaWiki to the 1.31.0-wmf.2 version in es.wikipedia.org. [13:50:50] You're welcome Amir1,and don't worry, addWiki.php has virtually always crashed at some point. [13:51:06] (03PS1) 10DCausse: [cirrus] Disable A/B test of MLR on testing on 18 wikis with > 1% of search traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382451 (https://phabricator.wikimedia.org/T177490) [13:51:13] wikipediaes: do you have an emergency issue not fixed in 1.31.0-wmf.1, and fixed in 1.31.0-wmf.2? [13:51:14] Please upgrade MediaWiki to the 1.31.0-wmf.2 version in es.wikipedia.org. [13:51:22] Okay. 
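Note on the `sync-file failed: Failed to acquire lock` message quoted at 13:49:48: that is what an exclusive deploy lock looks like from the losing side, with the holder's name and reason reported back. The sketch below shows the general idea only. It is not scap's implementation; the function names, the JSON lock-file format, and the `LockHeld` exception are all assumptions made for illustration.

```python
import json
import os
import tempfile


class LockHeld(Exception):
    """Raised when another deployer already holds the lock."""


def acquire_lock(path: str, owner: str, reason: str) -> None:
    """Create the lock file atomically, or report who holds it and why."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can win the creation.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        try:
            with open(path) as handle:
                holder = json.load(handle)
        except (OSError, ValueError):
            # The holder may not have finished writing its details yet.
            holder = {"owner": "unknown", "reason": "unknown"}
        raise LockHeld(
            'Failed to acquire lock "%s"; owner is "%s"; reason is "%s"'
            % (path, holder["owner"], holder["reason"])
        )
    with os.fdopen(fd, "w") as handle:
        json.dump({"owner": owner, "reason": reason}, handle)


def release_lock(path: str) -> None:
    """Remove the lock file so the next deployer can proceed."""
    os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock = os.path.join(tmp, "deploy.lock")
        acquire_lock(lock, "alice", "first sync")
        try:
            acquire_lock(lock, "bob", "second sync")
        except LockHeld as err:
            print(err)
        release_lock(lock)
```

Storing the owner and reason in the lock itself is what lets the second deployer see whom to wait for instead of just getting a bare failure.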
[13:51:38] !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: (no justification provided) (duration: 00m 50s) [13:51:44] shoot, I was copy pasting the reason for deploy but it got enter and did it without justification [13:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:47] sorry [13:51:57] Dereckson: Thanks :) [13:52:03] hoo: your patch is deployed, please check [13:52:33] wikipediaes: the upgrade is scheduled for 19:00 CEST [13:52:48] No. Upgrade now. [13:52:57] wikipediaes: if there is a current bug on es.wikipedia fixed in 1.31.0-wmf.2, please create a task on Phabricator, and we can backport the change [13:53:05] to 1.31.0-wmf.1. [13:53:21] Upgrade now! [13:53:49] (03PS1) 10Elukey: druid: add log4j logger to direct metrics to a specific file [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) [13:54:23] I think we need someone to ban [13:55:02] Amir1: hm, not a single statement usage came in yet [13:55:12] Upgrade now! [13:56:11] Amir1: It's not deployed [13:56:18] can you double check? Did you rebase? [13:56:27] Please deploy! [13:56:45] hoo: I think I forgot the rebase [13:56:56] or it wasn't merged when I started [13:57:03] Deploy now! [13:57:05] sorry [13:57:12] hoo: deploying [13:57:24] !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: Enable Statement usage tracking on kowiki and trwiki (T151717) (duration: 00m 50s) [13:57:25] aha, here we go :) [13:57:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:31] T151717: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717 [13:57:47] now usage are coming in… thanks :) [13:58:25] Who deploy it´s end? [14:00:48] Please deploy now! 
[14:01:53] 10Operations: IRC access request for Freenode #wikimedia-operations - https://phabricator.wikimedia.org/T177493#3660733 (10Dereckson) [14:02:17] 10Operations: IRC access request for Freenode #wikimedia-operations - https://phabricator.wikimedia.org/T177493#3660745 (10Dereckson) [14:02:26] 10Operations: IRC access request for Freenode #wikimedia-operations - https://phabricator.wikimedia.org/T177493#3660733 (10Dereckson) p:05Triage>03Low [14:02:34] wikipediaes: please stop [14:02:40] Please deploy now! [14:02:42] Okay [14:03:05] wikipediaes: the wikipedia wiki are upgraded on thursday at 8pm european time :) [14:03:27] wikipediaes: right now is a window to deploy configuration changes and some small fixes. [14:03:31] (03CR) 10Ema: [C: 032] varnish: stop passing session_max [puppet] - 10https://gerrit.wikimedia.org/r/382420 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [14:03:37] (03PS2) 10Ema: varnish: stop passing session_max [puppet] - 10https://gerrit.wikimedia.org/r/382420 (https://phabricator.wikimedia.org/T168529) [14:03:39] (03CR) 10Ema: [V: 032 C: 032] varnish: stop passing session_max [puppet] - 10https://gerrit.wikimedia.org/r/382420 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [14:03:45] wikipediaes: be sure that tonight eswiki will have the new version :]] [14:04:22] wikipediaes: you can book mark https://tools.wmflabs.org/versions/ to follow the deployment. We do group 0 first, then group 1 and last group2 which are the wikipedia projects [14:05:33] the addWiki file is clean now [14:06:16] Amir1: awesome :) [14:08:09] Please get a ssh to deploy! [14:10:27] Get a ssh to deploy! 
[14:14:12] 10Operations, 10HHVM, 10User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#3660812 (10MoritzMuehlenhoff) [14:14:40] (03PS2) 10Elukey: druid: add log4j logger to direct metrics to a specific file [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) [14:17:49] (03CR) 10Dzahn: racktables: role/profile, remove style violations (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [14:18:06] (03CR) 10Elukey: "Tested it live hacking the broker log4j file on druid1006, everything looks working as intended." [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) (owner: 10Elukey) [14:18:15] (03PS3) 10Elukey: druid: add log4j logger to direct metrics to a specific file [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) [14:24:05] hoo: is mid-day swat complete? 
[14:24:13] from my side, yes [14:24:29] thanks, then I'll proceed with upgrading HHVM on app servers [14:24:33] (03PS4) 10Zoranzoki21: Enable Extension:DynamicPageList to Turkish Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382357 (https://phabricator.wikimedia.org/T177448) (owner: 10Jayprakash12345) [14:24:37] yeah, the SWAT is done [14:26:46] (03PS1) 10Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382460 (https://phabricator.wikimedia.org/T174054) [14:26:55] (03PS2) 10Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382460 (https://phabricator.wikimedia.org/T174054) [14:27:22] !log upgrade remaining eqiad video scalers to HHVM 3.18.5 [14:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:26] I am going to restart jenkins soonish [14:34:02] 10Operations, 10Cloud-Services, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3660887 (10Andrew) 05Open>03Resolved These boxes are up and installed and seem ok. Actual service implementation is T168470 [14:35:07] Amir1: are you done with: https://phabricator.wikimedia.org/T176043 ? 
[14:35:20] marostegui: yeah [14:35:27] Amir1: ok, i will sanitize the data on labs then [14:35:28] :) [14:38:57] 10Operations, 10Services (doing): Prometheus cluster attribute for new RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T177501#3660900 (10Eevans) [14:40:29] (03PS1) 10Ladsgroup: Fix amwikimedia site name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382463 (https://phabricator.wikimedia.org/T176042) [14:42:44] 10Operations, 10Services (doing): Prometheus cluster attribute for new RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T177501#3660900 (10elukey) I had to recently do something similar for kafka jumbo and discovered that the `cluster` puppet variable used must also be reflected in `common.yaml =... [14:44:16] (03CR) 10Zoranzoki21: [C: 031] Fix amwikimedia site name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382463 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [14:48:43] !log Restarting Jenkins [14:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:05] (03PS1) 10Ema: varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) [14:50:36] (03CR) 10jerkins-bot: [V: 04-1] varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [14:51:28] RECOVERY - DPKG on labvirt1016 is OK: All packages OK [14:52:09] !log Sanitize db1095 for amwikimedia - T176043 [14:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:18] T176043: Prepare and check storage layer for amwikimedia - https://phabricator.wikimedia.org/T176043 [14:53:00] (03CR) 10Hashar: [V: 031 C: 031] "I cherry picked it on the beta and CI puppet master. All slaves there have been upgraded." 
[puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [14:53:20] !log All Jenkins infra is now on Java 8 - T162828 [14:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:28] T162828: Upgrade jenkins server and jenkins slaves to java 8 - https://phabricator.wikimedia.org/T162828 [14:54:11] (03CR) 10Ottomata: [C: 031] druid: add log4j logger to direct metrics to a specific file [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) (owner: 10Elukey) [14:57:38] PROBLEM - DPKG on mw1277 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:58:38] RECOVERY - DPKG on mw1277 is OK: All packages OK [14:58:41] (03CR) 10Elukey: [C: 032] druid: add log4j logger to direct metrics to a specific file [puppet] - 10https://gerrit.wikimedia.org/r/382452 (https://phabricator.wikimedia.org/T177459) (owner: 10Elukey) [14:58:57] (03PS2) 10BBlack: browsersec: bump to 26% 2017-10-05 [puppet] - 10https://gerrit.wikimedia.org/r/376314 (https://phabricator.wikimedia.org/T163251) [14:59:11] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade jenkins to 2.73.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3660984 (10hashar) For #operations , we would need the Jenkins 2.73.1 Debian package to be made available on apt.wik... [14:59:46] (03CR) 10BBlack: [C: 032] browsersec: bump to 26% 2017-10-05 [puppet] - 10https://gerrit.wikimedia.org/r/376314 (https://phabricator.wikimedia.org/T163251) (owner: 10BBlack) [15:00:56] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade jenkins to 2.73.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3370797 (10MoritzMuehlenhoff) @hashar: I need to finish some other tasks today, but I can help with that tomorrow mo... 
[15:02:10] (03CR) 10WMDE-leszek: "Sorry, looks like I haven't understood what consequences this change has to the app running on the said instance. With this change, the ap" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey) [15:02:33] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3661002 (10Papaul) Disk wipe in progress . [15:06:43] (03CR) 10Elukey: "Ah snap sorry for it, if you want I can revert and the next puppet run will fix it. Then you'll be able to work on a more narrow set of ru" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey) [15:07:28] (03CR) 10WMDE-leszek: "not super urgent, will submit a patch in a short bit!" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey) [15:07:32] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493#3661052 (10elukey) [15:08:41] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade jenkins to 2.73.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3661059 (10hashar) @MoritzMuehlenhoff definitely. I was planning to poke you about it tomorrow :] The aim is just t... [15:15:44] (03PS1) 10WMDE-leszek: phragile: Allow .htaccess needed by the app [puppet] - 10https://gerrit.wikimedia.org/r/382467 [15:16:37] (03CR) 10WMDE-leszek: "I am very noob in apache config, so please feel free to amend the patch if there is some better solution." 
[puppet] - 10https://gerrit.wikimedia.org/r/382467 (owner: 10WMDE-leszek) [15:16:52] !log Pulling out one disk from db1076 - T174054 [15:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:59] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054 [15:17:12] (03CR) 10WMDE-leszek: ""fix" in I5c48bff2cfd10272a581c8db4182d8e88061b5a9" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey) [15:17:59] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 3.46 seconds [15:18:39] 10Operations, 10ops-eqiad, 10fundraising-tech-ops, 10netops: Interface error on fasw-c-eqiad:vcp-255/1/0 - https://phabricator.wikimedia.org/T177333#3661112 (10ayounsi) 05Open>03Resolved a:03ayounsi Chris replaced c1a-0 <-> c1b-1; interface errors are gone. [15:19:53] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382460 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [15:21:39] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382460 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [15:21:49] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382460 (https://phabricator.wikimedia.org/T174054) (owner: 10Marostegui) [15:23:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1078 - T174054 (duration: 00m 51s) [15:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:20] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054 [15:23:52] so no crash? [15:23:56] nope [15:23:59] testing db1078 again [15:24:05] again? 
[15:24:23] yeah, I want to check if the host itself is faulty or if it was a one-off [15:24:31] oh, again like, last time it crashed [15:24:33] yeah [15:24:41] we tested a different host with the same HW (db1076) [15:24:45] and now testing the original crashed host [15:24:46] I understood it as "again, the same thing we just did" [15:24:50] I got you now [15:25:37] !log Pulling out one disk from db1078 - T174054 [15:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:35] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3661198 (10Marostegui) So @Cmjohnson and myself have done the following tests: db1076 (testing host) pulled out one disk while gene... [15:34:03] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382469 [15:36:51] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382469 (owner: 10Marostegui) [15:37:34] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3661211 (10Marostegui) 05Open>03Resolved Let's call it resolved for now then. 
Thanks a lot for your help @Cmjohnson [15:38:23] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382469 (owner: 10Marostegui) [15:39:33] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1078 - T174054 (duration: 00m 50s) [15:39:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:41] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054 [15:40:09] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3661214 (10Papaul) switch port information asw-a2-codfw ge-6/0/9 [15:42:48] 10Operations, 10ops-eqiad, 10DBA: Decommission db1022 (Was: db1022 broke while changing topology on s6- evaluate if to fix or directly decommission) - https://phabricator.wikimedia.org/T163778#3661219 (10jcrespo) This is probably not fully decomissioned yet (dns, puppet, etc.), but I am going to try to remov... 
[15:43:30] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382469 (owner: 10Marostegui) [15:43:40] (03CR) 10Elukey: [C: 032] phragile: Allow .htaccess needed by the app [puppet] - 10https://gerrit.wikimedia.org/r/382467 (owner: 10WMDE-leszek) [15:45:01] (03PS2) 10DCausse: [cirrus] Disable A/B test of MLR on testing on 18 wikis with > 1% of search traffic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382451 (https://phabricator.wikimedia.org/T177490) [15:45:03] (03PS1) 10DCausse: [cirrus] Deploy recall A/B on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) [15:45:05] (03CR) 10Elukey: "Should be fine now, let me know after puppet runs :)" [puppet] - 10https://gerrit.wikimedia.org/r/382467 (owner: 10WMDE-leszek) [15:45:52] elukey: Is there an easy way to see what a certain ldap group gives people access to? Is there somewhere i should be greping? [15:46:41] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Deploy recall A/B on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) (owner: 10DCausse) [15:46:51] jouncebot: next [15:46:52] In 0 hour(s) and 13 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1600) [15:47:04] addshore: there is a page on wikitech iirc moritzm has been keeping current [15:47:52] chasemp: okay, I just checked and the wmde one essentially had nothing there :D [15:47:56] addshore: there is a wiki page, lemme check it [15:48:27] (03PS2) 10Dzahn: DNS:Remove production & mgmt DNS for db2010 [dns] - 10https://gerrit.wikimedia.org/r/382205 (https://phabricator.wikimedia.org/T175685) (owner: 10Papaul) [15:48:37] addshore: https://wikitech.wikimedia.org/wiki/LDAP_Groups [15:48:40] I was wondering if the "Acknowledgement of Wikimedia Server Access Responsibilities" needs to be signed for the wmde ldap group, 
given it doesn't give you access to any machines etc. [15:49:19] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3661245 (10jcrespo) > I noticed that the disk shipped to db1076 to replace the failed one when it happened is bigger than the rest:... [15:50:00] mmmm maybe it is worth following up with Legal on this, but technically since the ldap grants you access to several important UIs I'd say yes [15:50:43] (sorry, just saw it's about the wmde ldap only, but I'd probably say yes anyway) [15:51:27] hmmm, okay, yeah, just wanted to clarify, as I think before no one had to sign L3, but the last time someone got added they did [15:52:02] Unless this all got cleaned up in either the LDAP or NDA sort out [15:53:46] (03CR) 10Dzahn: [C: 032] DNS:Remove production & mgmt DNS for db2010 [dns] - 10https://gerrit.wikimedia.org/r/382205 (https://phabricator.wikimedia.org/T175685) (owner: 10Papaul) [15:54:25] (03PS3) 10Dzahn: Remove production & mgmt DNS for db2010 [dns] - 10https://gerrit.wikimedia.org/r/382205 (https://phabricator.wikimedia.org/T175685) (owner: 10Papaul) [15:55:44] 10Operations, 10MediaWiki-Maintenance-scripts, 10MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)): wikitech-static sync failing - https://phabricator.wikimedia.org/T176090#3661301 (10Andrew) 05Open>03Resolved a:03Andrew [15:55:58] (03CR) 10Arlolra: [C: 031] scap::dsh: Create the parsoid-canaries group [puppet] - 10https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374) (owner: 10Alexandros Kosiaris) [15:56:30] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#3661310 (10fgiunchedi) [15:56:54] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - 
https://phabricator.wikimedia.org/T145659#2637292 (10fgiunchedi) [15:56:55] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port redis statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T148637#3661314 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi Will do as part of T177196 [15:57:24] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2637292 (10fgiunchedi) Resolving as the work will be completed in T177196 by porting the missing Diamond collectors. [15:57:39] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#3661328 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi [15:58:41] elukey: chasemp I might look at getting rid of the ldap group, it looks like it is pretty pointless now, heh! [15:59:02] addshore: i think the requirement just got a bit stricter, LDAP access is now handled more like "real" shell access is as well. For example "ldap_only" admins are now also listed as admins in the admin module in puppet [16:00:04] godog, moritzm, and _joe_: #bothumor My software never has bugs. It just develops random features. Rise for Puppet SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1600). [16:00:04] phedenskog and reedy: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[16:00:23] mutante: yup, I just read https://wikitech.wikimedia.org/wiki/Volunteer_NDA#Volunteer_NDA_for_privileged_LDAP_access_or_shell_access [16:00:24] * Reedy waves [16:00:27] and i would see LDAP groups like flags, you could have "wmde" AND "nda" as a volunteer if you needed it [16:01:12] Reedy: I'll take a look [16:01:22] (03PS28) 10Elukey: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [16:02:03] mutante: ack [16:02:08] (03PS29) 10Elukey: navtiming.py: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [16:02:26] Reedy: how does the generation work btw? talk to mysql and upload to swift sort of thing? [16:02:39] No talking to mysql [16:02:52] Reads a couple of text files [16:02:56] Gets a list of old files from swift [16:03:04] Krinkle: hi there, ok to merge the navtiming changes? (Peter seems not around) [16:03:05] generates the new captchas, saves to local disk [16:03:08] copies captchas to swift [16:03:10] deletes old captchas [16:03:17] I'm here [16:03:19] sorry [16:03:28] ah hi! 
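
Reedy's description above of the FancyCaptcha regeneration job (get the list of old files from Swift, generate new captchas to local disk, copy them to Swift, delete the old ones) can be sketched roughly as follows. This is a minimal illustration, not the actual job: `rotate_captchas`, the dict standing in for a Swift container, and the placeholder file contents are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def rotate_captchas(store, words, workdir):
    """Replace the objects in `store` with freshly generated ones.

    `store` is a plain dict standing in for a Swift container
    (hypothetical; the real job talks to Swift, not to a dict).
    """
    # 1. Record which objects exist before this run.
    old_names = set(store)

    # 2. Generate the new files on local disk first.
    new_paths = []
    for word in words:
        name = hashlib.sha1(word.encode()).hexdigest()[:12] + ".png"
        path = Path(workdir) / name
        path.write_bytes(b"placeholder image for " + word.encode())
        new_paths.append(path)

    # 3. Copy the new files into the store.
    for path in new_paths:
        store[path.name] = path.read_bytes()

    # 4. Delete only objects that pre-date this run.
    new_names = {p.name for p in new_paths}
    for name in old_names - new_names:
        del store[name]
    return sorted(new_names)


if __name__ == "__main__":
    container = {"old-a.png": b"x", "old-b.png": b"y"}
    with tempfile.TemporaryDirectory() as tmp:
        print(rotate_captchas(container, ["alpha", "beta"], tmp))
```

Listing the old objects before uploading means the delete step never touches anything generated in the current run.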
[16:03:29] :) [16:03:38] yes merge, I'm ready to verify that it works :) [16:03:43] super, merging [16:04:05] Reedy: when you *waved*, i was wondering in which of these ways is he waving https://i.pinimg.com/736x/cf/2a/49/cf2a49149dd6aa253d21c9b83c360686--aviation-training-aviation-civile.jpg [16:04:16] (03CR) 10Elukey: [C: 032] navtiming.py: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [16:04:35] 10Operations, 10Analytics-Kanban: LVS for Druid - https://phabricator.wikimedia.org/T177511#3661358 (10Ottomata) [16:04:42] elukey: Yes, it's okay :) [16:05:21] Reedy: ack [16:06:04] (03PS4) 10Filippo Giunchedi: Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) (owner: 10Reedy) [16:06:28] phedenskog: just ran puppet on hafnium, all good [16:06:40] cool [16:06:57] (03CR) 10Filippo Giunchedi: [C: 032] Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) (owner: 10Reedy) [16:07:01] ACKNOWLEDGEMENT - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] daniel_zahn hopefully this is gone in 1 day, since its been 6 days and its a weekly limit [16:08:46] Reedy: puppet ran on terbium [16:09:07] (03PS2) 10DCausse: [cirrus] Deploy recall A/B on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) [16:10:30] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] Deploy recall A/B on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) (owner: 10DCausse) [16:10:47] godog: files look to have changed, so all good [16:10:49] ta [16:11:02] elukey Krinkle: looks good so far. 
got the new metrics and higher eth0 traffic on hafnium [16:11:28] ack! [16:13:02] I'm not seeing any disaster on statsd either [16:13:22] a bunch of creates as the new metrics come in [16:13:47] (03PS1) 10DCausse: [cirrus] Enable loading interwiki configs via API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382474 [16:14:07] godog: https://grafana.wikimedia.org/dashboard/db/graphite-eqiad 'statsite' dashboard, or is there another? [16:14:13] 10Operations, 10Cloud-Services, 10Patch-For-Review: rack/setup/install labweb100[12].wikimedia.org - https://phabricator.wikimedia.org/T167820#3661416 (10bd808) [16:14:27] Krinkle: nope that's the one I'm looking at too [16:16:57] !log failure test for pfw3-codfw [16:17:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:59] (03PS3) 10DCausse: [cirrus] Deploy recall A/B on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) [16:19:48] godog: is the graphite system ssd based btw? or does it have just good memory caching and/or fs cache? [16:20:50] Krinkle: 4x ssd in raid10, and lots of ram yeah [16:21:00] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3661447 (10jcrespo) I would suggest to setup a proxysql instance to move this forward? maybe on terbium it... [16:21:14] I think we were maxing out a db-class machine when I joined three years ago, tungsten [16:21:15] elukey yep good from metrics perspective. really looking forward to create the graphs later Krinkle ! [16:21:32] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: vcp port down on fasw-c-codfw - https://phabricator.wikimedia.org/T177332#3661454 (10ayounsi) 05Open>03Resolved a:03ayounsi Papaul reseated the cable, no more issues. 
[16:21:55] Krinkle: and we already crushed a set of those ssd, back in feb [16:21:55] phedenskog: yeah! [16:22:36] godog: yeah, we were. we (perfteam) are still using that machine (tungsten) for low-frequency writes/reads with a MongoDB (XHGui) [16:23:28] heheh time to get tungsten retired alright [16:23:42] but yeah graphite/whisper is really unforgiving to the disk [16:24:11] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:24:31] 10Operations, 10ops-eqiad, 10DBA: Decommission db1022 (Was: db1022 broke while changing topology on s6- evaluate if to fix or directly decommission) - https://phabricator.wikimedia.org/T163778#3661467 (10jcrespo) Actually, I cannot do all the steps (network changes) without coordinating with DC ops. I do not... [16:27:51] 10Operations, 10ops-eqiad, 10DC-Ops, 10netops: Move eqiad frack to new infra - https://phabricator.wikimedia.org/T174218#3661490 (10ayounsi) 05Open>03Resolved Test in codfw was successful, no packet loss/issue. [16:28:50] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1061, Errmsg: Error Duplicate key name eu_entity_id on query. Default database: amwikimedia. 
[Query snipped] [16:31:34] (03PS3) 10Catrope: Enable structured change filters by default on all wikis except FlaggedRevs wikis using non-protection modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) [16:33:12] (03CR) 10EBernhardson: [C: 031] [cirrus] Enable loading interwiki configs via API [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382474 (owner: 10DCausse) [16:33:56] (03CR) 10Ayounsi: [V: 032 C: 032] Imported Upstream version 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382174 (owner: 10Ayounsi) [16:35:34] (03CR) 10EBernhardson: [C: 031] [cirrus] Deploy recall A/B on enwiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382470 (https://phabricator.wikimedia.org/T177502) (owner: 10DCausse) [16:36:19] (03CR) 10Ayounsi: [V: 032 C: 032] Adapt WMF specific patches for Varnish5 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382175 (owner: 10Ayounsi) [16:36:26] (03CR) 10Ayounsi: [C: 032] Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 (owner: 10Ayounsi) [16:36:44] (03CR) 10Ayounsi: [V: 032 C: 032] Varnish5: Install devicedetect.vcl in the proper path [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382177 (owner: 10Ayounsi) [16:37:20] (03CR) 10Ayounsi: [V: 032 C: 032] Varnish5: Update libvarnishapi1.symbols [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382178 (owner: 10Ayounsi) [16:37:36] (03CR) 10Ayounsi: [V: 032 C: 032] Add lintian-overrides for statically linked libs [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382193 (owner: 10Ayounsi) [16:37:58] (03CR) 10Ayounsi: [V: 032 C: 032] Fix vmod_abi.h version parsing [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382207 (owner: 10Ayounsi) [16:38:09] (03CR) 10Ayounsi: [V: 032 C: 032] Add description to vmod_netmapper.vcc [software/varnish/libvmod-netmapper] 
- 10https://gerrit.wikimedia.org/r/382208 (owner: 10Ayounsi) [16:38:18] (03CR) 10Ayounsi: [V: 032 C: 032] Bump version to 1.5 [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382209 (owner: 10Ayounsi) [16:38:27] (03CR) 10Ayounsi: [V: 032 C: 032] Fix vmod_abi.h version parsing [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382309 (owner: 10Ayounsi) [16:38:39] (03CR) 10Ayounsi: [V: 032 C: 032] Add description to vmod_netmapper.vcc [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382310 (owner: 10Ayounsi) [16:38:46] (03CR) 10Ayounsi: [V: 032 C: 032] Bump version to 1.5 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382311 (owner: 10Ayounsi) [16:38:54] (03CR) 10Ayounsi: [V: 032 C: 032] Update debian changelog for version 1.5 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382312 (owner: 10Ayounsi) [16:39:01] (03CR) 10Ayounsi: [V: 032 C: 032] Bump libvarnishapi-dev dependency to version 5.1.3 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382313 (owner: 10Ayounsi) [16:42:50] RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [16:43:00] (03PS2) 10Ema: varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) [16:43:28] (03CR) 10jerkins-bot: [V: 04-1] varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [16:44:35] (03PS1) 10Ema: cache_canary: disable max_connections [puppet] - 10https://gerrit.wikimedia.org/r/382477 (https://phabricator.wikimedia.org/T175803) [16:48:29] (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler02/8202/" [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [16:49:10] RECOVERY - puppet last run on 
mw1257 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:51:05] 10Operations: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493#3661613 (10Framawiki) [16:51:43] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1092 - https://phabricator.wikimedia.org/T177264#3661624 (10Cmjohnson) Case submitted with HP...Case ID 5323521514 [16:53:11] 10Operations, 10ops-eqiad: adjust flerovium power draw - https://phabricator.wikimedia.org/T177131#3661629 (10Cmjohnson) 05Open>03Resolved Fixed [16:53:13] 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3661631 (10Cmjohnson) [16:53:32] !log upgrade remaining app servers in eqiad to HHVM 3.18.5 [16:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:54] (03CR) 10Dzahn: [C: 031] "@paladox could you please rebase it one more time? then i'll merge, but needs a manual one" [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: 10Paladox) [16:54:08] (03PS1) 10Volans: Docstrings: use Google Style [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) [16:54:11] (03PS1) 10Volans: Documentation: convert Markdown to reStructuredText [software/cumin] - 10https://gerrit.wikimedia.org/r/382480 (https://phabricator.wikimedia.org/T159308) [16:54:13] (03PS1) 10Volans: CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 [16:54:15] (03PS1) 10Volans: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 [16:54:17] (03PS1) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [16:54:19] (03PS1) 10Volans: setup.py: split test dependencies [software/cumin] - 
10https://gerrit.wikimedia.org/r/382484 [16:55:02] (03CR) 10Ema: [C: 032] cache_canary: disable max_connections [puppet] - 10https://gerrit.wikimedia.org/r/382477 (https://phabricator.wikimedia.org/T175803) (owner: 10Ema) [16:56:50] PROBLEM - DPKG on mw1273 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:57:50] RECOVERY - DPKG on mw1273 is OK: All packages OK [16:58:10] RECOVERY - puppet last run on netmon1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:59:37] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1092 - https://phabricator.wikimedia.org/T177264#3661662 (10Marostegui) Thank you! [17:00:03] 10Operations, 10Cloud-Services, 10wikitech.wikimedia.org, 10HHVM: Move wikitech (silver) to HHVM - https://phabricator.wikimedia.org/T98813#1278203 (10Krinkle) ... or not. {nav name=T176370 Migrate to PHP 7 in WMF production, href=https://phabricator.wikimedia.org/T176370} [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: That opportune time is upon us again. Time for a Services – Graphoid / Parsoid / OCG / Citoid / ORES deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1700). [17:00:04] No GERRIT patches in the queue for this window AFAICS. 
[17:00:08] (03CR) 10jerkins-bot: [V: 04-1] setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [17:00:45] (03PS7) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) [17:00:46] (03CR) 10Ayounsi: [C: 031] pybal: BGP MED configuration [puppet] - 10https://gerrit.wikimedia.org/r/380516 (https://phabricator.wikimedia.org/T165584) (owner: 10Ema) [17:00:51] (03PS8) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) [17:01:09] jouncebot: Nothing for ORES [17:01:23] (03CR) 10Ayounsi: [C: 031] Add ferm service for rpc.statd on labstore [puppet] - 10https://gerrit.wikimedia.org/r/354226 (https://phabricator.wikimedia.org/T165136) (owner: 10Muehlenhoff) [17:01:44] (03CR) 10jerkins-bot: [V: 04-1] setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 (owner: 10Volans) [17:01:53] (03PS9) 10Paladox: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) [17:02:40] PROBLEM - puppet last run on mw1270 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg] [17:03:21] (03CR) 10jerkins-bot: [V: 04-1] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [17:06:30] 10Operations, 10Research, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at UMN - https://phabricator.wikimedia.org/T177521#3661687 (10DarTar) [17:08:08] (03CR) 10Dzahn: racktables: role/profile, remove style violations (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [17:08:46] (03PS3) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [17:11:00] PROBLEM - DPKG on mw1275 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:11:51] (03PS4) 10Herron: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) [17:12:01] RECOVERY - DPKG on mw1275 is OK: All packages OK [17:13:35] 10Operations, 10Research, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at UMN - https://phabricator.wikimedia.org/T177521#3661737 (10elukey) Adding a bit of context here after a chat with @MoritzMuehlenhoff: * For Aaron it would be a bit of a problem to use his wo... [17:14:20] PROBLEM - DPKG on mw1319 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:15:20] RECOVERY - DPKG on mw1319 is OK: All packages OK [17:17:41] PROBLEM - DPKG on mw1328 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:18:40] RECOVERY - DPKG on mw1328 is OK: All packages OK [17:19:20] PROBLEM - puppet last run on mw1325 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg] [17:20:26] 10Operations, 10Discovery: Investigate adding memory to elastic10{01...16} to bring more parity between the two types of servers running elasticsearch in eqiad - https://phabricator.wikimedia.org/T117110#3661756 (10debt) [17:24:03] (03PS2) 10Volans: CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 [17:24:05] (03PS2) 10Volans: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 [17:24:07] (03PS2) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [17:24:09] (03PS2) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [17:24:20] (03PS1) 10BBlack: robots.txt: block MJ12bot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382494 [17:24:34] (03PS1) 10Cmjohnson: removing db1023/24 from site.pp for decomm [puppet] - 10https://gerrit.wikimedia.org/r/382495 [17:25:28] (03PS2) 10BBlack: robots.txt: block MJ12bot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382494 [17:26:51] (03PS3) 10Volans: CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 [17:26:53] (03PS3) 10Volans: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 [17:26:55] (03PS3) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [17:26:57] (03PS3) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [17:27:12] !log disable puppet db1024 [17:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:40] RECOVERY - puppet last run on mw1270 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures 
[17:31:45] (03PS4) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [17:34:08] (03CR) 10Herron: [C: 032] Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) (owner: 10Herron) [17:36:28] (03PS1) 10BBlack: Varnish: block MJ12bot [puppet] - 10https://gerrit.wikimedia.org/r/382503 [17:37:25] (03PS5) 10Herron: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) [17:37:44] (03CR) 10BBlack: [C: 032] Varnish: block MJ12bot [puppet] - 10https://gerrit.wikimedia.org/r/382503 (owner: 10BBlack) [17:37:58] 10Operations, 10Services (doing): Prometheus cluster attribute for new RESTBase Cassandra cluster - https://phabricator.wikimedia.org/T177501#3661810 (10Eevans) >>! In T177501#3660916, @elukey wrote: > I had to recently do something similar for kafka jumbo and discovered that the `cluster` puppet variable used... 
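
The log only shows the titles of BBlack's two MJ12bot patches (robots.txt and Varnish), not their contents. As a hedged illustration of what a robots.txt block for a single crawler generally looks like, the sketch below feeds a two-line rule to Python's standard `urllib.robotparser`. The rule text and the example URL are assumptions, not the merged patch.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule text; the real patch content is not shown in the log.
RULES = [
    "User-agent: MJ12bot",
    "Disallow: /",
]


def allowed(user_agent, url, rules=RULES):
    """Return True if `user_agent` may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/wiki/Main_Page"  # placeholder URL
    print(allowed("MJ12bot", page))
    print(allowed("SomeOtherBot", page))
```

robots.txt is advisory only, which may be why a second patch blocking the same bot at the Varnish layer follows a few minutes later.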
[17:43:23] (03PS6) 10Herron: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) [17:49:15] RECOVERY - puppet last run on mw1325 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:55:29] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3661832 (10Cmjohnson) @herron all power supplies are working for those 3 hosts [17:56:22] (03PS1) 10Eevans: cassandra: move machines from restbase to restbase_ng cluster [puppet] - 10https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) [17:59:15] PROBLEM - Disk space on mw1262 is CRITICAL: DISK CRITICAL - free space: / 15990 MB (3% inode=97%) [18:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Morning SWAT (Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1800). [18:00:05] Krinkle, RoanKattouw, davidwbarratt, Amir1, Addshore, and James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:12] here! [18:00:27] co/ [18:00:28] o/ [18:02:18] 0/ [18:02:32] Hey. But addshore's stuff needs to happen first. [18:02:48] ? 
[18:03:20] I can SWAT [18:03:45] (03PS3) 10Thcipriani: Enable jQuery 3 on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379949 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:04:00] Hi [18:04:00] I'm here but addshore's changes need to go before mine [18:04:06] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379949 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:04:10] Oh wait [18:04:14] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1007, Errmsg: Error Cant create database amwikimedia: database exists on query. Default database: amwikimedia. [Query snipped] [18:04:18] addshore: Did you not break preferences completely? Or was that only in master? [18:04:27] That's only master [18:04:41] Oh OK never mind then [18:04:50] (03CR) 10jerkins-bot: [V: 04-1] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [18:04:56] 10Operations, 10Cassandra, 10Epic, 10Goal, and 2 others: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3661858 (10Eevans) [18:04:59] Also, I didnt do the breaking ;) [18:05:00] 10Operations, 10Cassandra, 10Epic, 10Goal, and 2 others: Delete graphite metrics for old CFs - https://phabricator.wikimedia.org/T173436#3661857 (10Eevans) [18:05:13] (03CR) 10jerkins-bot: [V: 04-1] setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [18:08:14] (03Merged) 10jenkins-bot: Enable jQuery 3 on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379949 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:08:24] (03CR) 10jenkins-bot: Enable jQuery 3 on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379949 (https://phabricator.wikimedia.org/T124742) (owner: 
10Krinkle) [18:08:35] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 221.23 seconds [18:10:41] Krinkle: jquery3 change is live on mwdebug1002, check please [18:11:15] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [18:11:27] thcipriani: confirmed, good to go [18:11:35] ok, going live [18:13:02] davidwbarratt: is this the change I need to backport? https://gerrit.wikimedia.org/r/#/c/374871/ is this for wmf.2? [18:13:05] (03CR) 10Dzahn: [C: 031] "addresses Alex' comments, turned some includes into resource-like declarations, moved to profile. now it adds no new issues and: wmf-style" [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [18:13:12] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:379949|Enable jQuery 3 on all wikis]] T124742 (duration: 00m 52s) [18:13:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:19] T124742: Upgrade to jQuery 3 - https://phabricator.wikimedia.org/T124742 [18:13:19] ^ Krinkle live everywhere [18:13:35] (03PS4) 10Thcipriani: Enable structured change filters by default on all wikis except FlaggedRevs wikis using non-protection modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [18:13:37] thcipriani yes [18:14:04] thcipriani: confirmed, thx [18:14:17] davidwbarratt: okie doke, I'll cherry-pick, thanks [18:14:18] thcipriani and I don't know what wmf.2 is [18:14:30] thcipriani great! 
[18:15:08] davidwbarratt: that's the current mediawiki version running on all wikis with the exception of wikipedia wikis (but it will be running on those wikis in the next hour, likely) [18:15:12] (03PS10) 10Dzahn: contint: Make php.pp compatible with stretch [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: 10Paladox) [18:15:13] re: wmf.2 [18:15:25] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3661871 (10herron) Strange. Analytics1035 has cleared while the other two are still in critical state. IPMI sel shows a few recent power events. I wonder what's going on with t... [18:15:26] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [18:15:52] (03CR) 10Dzahn: [C: 032] "lgtm, +1 from hashar, no-op for contint machines: http://puppet-compiler.wmflabs.org/8203/" [puppet] - 10https://gerrit.wikimedia.org/r/361680 (https://phabricator.wikimedia.org/T166611) (owner: 10Paladox) [18:16:21] thcipriani ah, yes the feature is on English Wikipedia (and all other wikis) so this is a bug fix for the feature that is already deployed [18:18:03] (03Merged) 10jenkins-bot: Enable structured change filters by default on all wikis except FlaggedRevs wikis using non-protection modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [18:18:13] (03CR) 10jenkins-bot: Enable structured change filters by default on all wikis except FlaggedRevs wikis using non-protection modes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378265 (https://phabricator.wikimedia.org/T177445) (owner: 10Catrope) [18:18:49] RoanKattouw: your change is live on mwdebug1002, check please [18:20:14] 10Operations, 10Cloud-Services, 10wikitech.wikimedia.org, 10HHVM: Move wikitech (silver) to HHVM 
- https://phabricator.wikimedia.org/T98813#3661882 (10bd808) >>! In T98813#3661663, @Krinkle wrote: > ... or not. {nav name=T176370 Migrate to PHP 7 in WMF production, href=https://phabricator.wikimedia.org/T17... [18:21:38] thcipriani: Looks good [18:23:58] ok going live [18:24:42] (03PS4) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [18:24:44] (03PS4) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [18:26:31] !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:378265|Enable structured change filters by default on all wikis except FlaggedRevs wikis using non-protection modes]] T177445 (duration: 00m 54s) [18:26:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:38] T177445: Graduate New Filters on Recent Changes out of beta on all wikis without "Hide reviewed edits" filter (shown on some FlaggedRevs wikis) - https://phabricator.wikimedia.org/T177445 [18:26:46] ^ RoanKattouw should be live [18:32:45] Thanks [18:32:46] Amir1: didn't see you around, ping for SWAT if you're here [18:33:45] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 138.22 seconds [18:34:25] Eh.. why are all my mediawiki.org pages in user lang 'cy' [18:34:40] Krinkle: Did you change your interface language? [18:34:58] I don't recall doing so, but special:preferences does claim so [18:35:06] Krinkle: Someone from the Welsh independence movement write a default-on gadget that touches your prefs? ;-) [18:35:14] I've changed it back but I hope this isn't a bug. 
[18:35:27] haha [18:38:45] Krinkle: there were issues with the StubUserLang removal, it's been reverted out of master [18:38:48] !log restbase-ng: lowering compactionthroughput to 4 (current/5 jbod devices) [18:38:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:29] (03CR) 10Dzahn: [C: 04-1] "now that this part is solved, we have " Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Class[Apache" [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [18:39:37] thcipriani: Is SWAT continuing? [18:39:54] James_F: waiting on Jenkins, of course :) [18:40:14] Of course. :-) [18:42:57] davidwbarratt: your changes are live on mwdebug1002, could you check please? [18:43:13] thcipriani sure! did you run the script as well? [18:43:55] davidwbarratt: not yet, after I deploy everywhere I can run it [18:44:02] thcipriani kk [18:44:38] so let me know if everything looks fine on mwdebug1002, and I can deploy and run the script [18:46:46] thcipriani looks good to me! added someone to my echo blocklist and queried it with the API and it returned the userid rather than the username [18:47:08] davidwbarratt: ok, I'll sync this live, thanks for checking :) [18:47:17] thcipriani no problem! 
[18:53:18] !log cp3* upgrade nginx to 1.13.5-1+wmf1~jessie1 [18:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:55] (03CR) 10jerkins-bot: [V: 04-1] setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [18:58:44] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/extensions/Echo: SWAT: [[gerrit:382508|Use User Ids instead of User Names for Echo Mute]] T173475 (duration: 00m 56s) [18:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:51] T173475: Echo Notification Mute (Block List) can be bypassed by changing username - https://phabricator.wikimedia.org/T173475 [18:59:12] ummm mw1262.eqiad.wmnet [18:59:20] has no space left on device? [19:00:04] twentyafterfour: Time to snap out of that daydream and deploy MediaWiki train. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T1900). [19:00:04] No GERRIT patches in the queue for this window AFAICS. [19:00:13] except that isn't true? 
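
The Echo change synced above (T173475) stores user ids instead of user names in the mute list, because a name-based list can be bypassed by renaming the account. A toy sketch of that difference, assuming a made-up `Users` table and two hypothetical lookup helpers rather than anything from the Echo codebase:

```python
class Users:
    """Toy user table: ids are permanent, names can change."""

    def __init__(self):
        self._names = {}
        self._next_id = 1

    def create(self, name):
        user_id = self._next_id
        self._names[user_id] = name
        self._next_id += 1
        return user_id

    def rename(self, user_id, new_name):
        self._names[user_id] = new_name

    def name_of(self, user_id):
        return self._names[user_id]


def is_muted_by_name(muted_names, users, sender_id):
    # Name-based check: breaks as soon as the sender is renamed.
    return users.name_of(sender_id) in muted_names


def is_muted_by_id(muted_ids, sender_id):
    # Id-based check: unaffected by renames.
    return sender_id in muted_ids


if __name__ == "__main__":
    users = Users()
    sender = users.create("Mallory")
    muted_names, muted_ids = {"Mallory"}, {sender}
    users.rename(sender, "NotMallory")
    print(is_muted_by_name(muted_names, users, sender))
    print(is_muted_by_id(muted_ids, sender))
```

This matches what davidwbarratt checked on mwdebug1002: the API returned the user id for the muted account rather than the name.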
[19:00:24] man when did jouncebot get so mouthy [19:01:05] hehe [19:01:12] (03CR) 10jerkins-bot: [V: 04-1] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [19:01:43] twentyafterfour: I'm still swatting [19:02:05] ah, mw1262 is out of space, not mw1272 [19:02:11] thcipriani: no problem [19:05:14] !log thcipriani@tin Synchronized php-1.31.0-wmf.1/extensions/Echo: SWAT: [[gerrit:382511|Use User Ids instead of User Names for Echo Mute]] T173475 (duration: 00m 55s) [19:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:20] T173475: Echo Notification Mute (Block List) can be bypassed by changing username - https://phabricator.wikimedia.org/T173475 [19:08:09] legoktm: interesting, what happened (RE: StubUserLang) [19:10:02] uhhh, trying to run the maintenance script for echo, but getting Error: 1146 Table 'testwiki.user_properties' [19:10:14] doesn't exist although it does [19:11:41] ^ davidwbarratt any ideas about that? or MaxSem ? [19:12:41] uhhhh..... [19:12:45] hmmm, seems kind of familiar... which db cluster is it connecting to? [19:13:02] the main wiki shard cluster or x1/extension1? =o [19:13:45] just running maintenance script on https://gerrit.wikimedia.org/r/#/c/382511/1/maintenance/updatePerUserBlacklist.php on terbium [19:14:24] RECOVERY - Disk space on mw1262 is OK: DISK OK [19:14:54] mhhm, it reminded me of https://phabricator.wikimedia.org/T164469 but doesn't seem that relevant [19:16:12] thcipriani do you know if it's the read query DB_REPLICA or the write DB_MASTER ? [19:16:39] davidwbarratt: looks like it's the read query [19:17:22] thcipriani maybe DB_REPLICA doesn't have access to that table? 
[19:19:01] (03PS5) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [19:20:36] hrm, that would be...odd [19:21:13] thcipriani that's the only reason I can think of that would cause a table to be non-existant [19:21:19] (when it really is there) [19:24:21] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3662058 (10Cmjohnson) Odd, the racadm log (Dell's hardware log) shows that the power was restored and the physical connections shows that there is power. root@analytics1037.mgmt... [19:25:57] that doesn't make sense to me. This is what all the maintenance scripts seem to do: read from DB_REPLICA right? [19:26:24] yeh [19:26:28] doesnt make sense to me either [19:26:46] well good 'cuz I copied that part from another one. :) [19:27:29] hmm, i can reproduce in eval.php [19:28:05] https://www.irccloud.com/pastebin/jjlBQBjz/ [19:28:49] is the table name maybe not lower case or something? [19:29:54] user_properties is definitely correct [19:30:07] (03PS1) 10Andrew Bogott: wikitech-static: convert yaml content to wikitext when dumping wikitech [puppet] - 10https://gerrit.wikimedia.org/r/382520 (https://phabricator.wikimedia.org/T177450) [19:30:14] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms] [19:30:24] hrm https://gist.github.com/thcipriani/4480bc8d46b1a3815d30c67b74e4e88a [19:32:27] thcipriani: got it [19:32:32] https://www.irccloud.com/pastebin/kxBmM631/ [19:32:45] they are servers in extension1 cluster, there is no user props table there [19:33:06] the maint script is connecting to the wrong place [19:33:14] https://noc.wikimedia.org/conf/db-eqiad.php.txt [19:34:07] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3662111 (10herron) I wonder if the ipmi issue is related to fan failure on the power supply? [19:34:19] oh! [19:34:50] so uhh... how to fix? [19:34:53] Ah, the script needs to connect to both DB clusters to work? [19:35:17] I guess you dont want to connect to MWEchoDbFactory ? [19:35:24] i havnt looked at what the script is doing [19:35:25] Seems rather scary that sql.php maintenance script instantiates a DatabaseUpdater in production [19:35:30] (03CR) 10Dzahn: start profile for wikiba.se web hosting (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [19:35:37] it's only meant to read state and ignore any actual updates, but still seems wrong [19:35:48] Krinkle: indeed [19:36:00] Krinkle: it happens for every maint script run if i remember [19:36:06] not just sql.php [19:36:11] it's just sql.php [19:36:14] $db->setSchemaVars( $updater->getSchemaVars() ); [19:36:20] ahh! 
[19:36:24] Not sure what those are for [19:36:38] anyway, the actual problem here is that when you specify a differnet cluster in sql.php it still assumes index=DB_MASTER, which triggers this code path [19:36:52] it should only do that for the wiki db [19:37:11] davidwbarratt: yes, if you are only touching the user_properties table you should just get a regular db connection, not use MWEchoDbFactory [19:37:13] Operations, Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3662116 (herron) FWIW this cropped up on mw1262 today. Same symptoms, large /var/cache/hhvm/cli.hhbc.sq3 file causing rapid core dumps that filled the disk. [19:37:26] addshore ah [19:37:36] (PS6) Dzahn: racktables: role/profile, remove style violations [puppet] - https://gerrit.wikimedia.org/r/382336 [19:37:50] do we want to patch this now or just revert and do it some other time? [19:38:09] addshore I think I can patch it now? [19:38:10] I've held up the train for a while now, can we move this to evening swat? [19:38:17] sure [19:38:21] I'll reschedule it at some other time [19:38:25] davidwbarratt: thanks, sorry for all the confusion :( [19:38:31] thcipriani no problem [19:38:39] thcipriani I'm glad we figured it out! [19:39:37] ok, lemme finish what I've merged [19:40:26] kk, just make sure my patch isn't in there. ;) [19:40:26] thcipriani: is that mine? :D [19:40:36] it is :) [19:40:55] wheeee [19:41:22] addshore: you change should be live on mwdebug1002, check please [19:41:26] checking [19:41:27] *your [19:41:57] Krinkle: https://phabricator.wikimedia.org/T177478 [19:42:21] James_F: you need a full swat, is that right? [19:42:30] er full scap [19:42:36] thcipriani: looks good [19:42:36] words. [19:42:51] PROBLEM - HHVM rendering on mw2123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:42:57] thcipriani: No, just touch to extension.json I think. 
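Editor's sketch, not part of the channel log: the 19:32-19:37 diagnosis above (the script was talking to the extension1 cluster, which has no user_properties table) can be reproduced in miniature. This is a rough Python illustration using two in-memory SQLite databases as stand-ins for the two clusters; the cluster names and table lists are assumptions made for the demo, and none of it is MediaWiki's actual connection code.

```python
import sqlite3


def make_cluster(tables):
    """Create an in-memory database that holds only the given tables."""
    conn = sqlite3.connect(":memory:")
    for name in tables:
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
    return conn


# Hypothetical stand-ins: the wiki's own cluster and the shared extension cluster.
clusters = {
    "main": make_cluster(["page", "user_properties"]),
    "extension1": make_cluster(["echo_event", "echo_notification"]),
}


def table_exists_on(cluster, table):
    """Return True if a trivial read of `table` succeeds on `cluster`."""
    try:
        clusters[cluster].execute(f"SELECT 1 FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        # SQLite's "no such table" plays the role of MySQL error 1146 here.
        return False


print(table_exists_on("main", "user_properties"))
print(table_exists_on("extension1", "user_properties"))
```

The table is present the whole time; only the connection is pointed at a cluster that never held it, which is why the error looked impossible from the wiki side.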
[19:43:04] ah, cool :) [19:43:14] thcipriani: It's just shuffling an i18n dependency to the right module. :-) [19:43:22] PROBLEM - MegaRAID on db1046 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough [19:43:41] RECOVERY - HHVM rendering on mw2123 is OK: HTTP OK: HTTP/1.1 200 OK - 73784 bytes in 0.429 second response time [19:45:01] PROBLEM - parsoid on wtp2017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:09] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict: SWAT: [[gerrit:382475|Dont register simulate page when not enabled as a BF]] (duration: 00m 52s) [19:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:16] ^ addshore live now [19:45:21] looks good [19:45:52] RECOVERY - parsoid on wtp2017 is OK: HTTP OK: HTTP/1.1 200 OK - 1051 bytes in 0.082 second response time [19:45:54] (03PS1) 10Andrew Bogott: Move labs shinken to #wikimedia-cloud-feed [puppet] - 10https://gerrit.wikimedia.org/r/382523 (https://phabricator.wikimedia.org/T177427) [19:46:17] James_F: your change should be live on mwdebug1002, check please [19:46:52] thcipriani: Yup, LGTM. [19:47:00] cool, going live [19:48:26] thanks for your work thcipriani :) [19:48:33] +1, thcipriani rocks. [19:48:56] aww...you :) thanks [19:49:20] (03CR) 10Dzahn: "back to the include keywords but in profile. this way it doesn't fail in compiler and still has a "violation delta" of -6. 
that's not all " [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [19:49:28] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/extensions/VisualEditor/extension.json: SWAT: [[gerrit:382490|Move 'parentheses' message to MWSaveDialog's modules]] T177446 (duration: 00m 50s) [19:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:35] T177446: Broken edit summary preview on mobile VE - https://phabricator.wikimedia.org/T177446 [19:49:38] ^ James_F should be live [19:49:59] now just got to get jenkins to merge my reverts, sync out, done. [19:50:00] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/8206/krypton.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [19:50:04] Yup. [19:50:53] (03CR) 10Andrew Bogott: [C: 032] Move labs shinken to #wikimedia-cloud-feed [puppet] - 10https://gerrit.wikimedia.org/r/382523 (https://phabricator.wikimedia.org/T177427) (owner: 10Andrew Bogott) [19:52:07] (03PS5) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [19:52:09] (03PS5) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [19:52:50] (03CR) 10Dzahn: start profile for wikiba.se web hosting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [19:53:04] (03PS7) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [19:55:03] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/extensions/Echo: SWAT: [[gerrit:382522|revert "Use User Ids instead of User Names for Echo Mute"]] (duration: 00m 52s) [19:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:33] !log upgrading wikitech-static to REL1_30 [19:58:38] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:32] !log thcipriani@tin Synchronized php-1.31.0-wmf.1/extensions/Echo: SWAT: [[gerrit:382521|revert "Use User Ids instead of User Names for Echo Mute"]] (duration: 00m 55s) [19:59:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:42] twentyafterfour: all yours, sorry for the delay :( [20:02:01] (03CR) 10Dzahn: start profile for wikiba.se web hosting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [20:02:17] (03CR) 10jerkins-bot: [V: 04-1] setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [20:02:44] (03CR) 10jerkins-bot: [V: 04-1] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [20:03:30] thcipriani: thanks [20:05:16] (03CR) 10Dzahn: start profile for wikiba.se web hosting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [20:07:16] (03PS8) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [20:08:31] (03PS1) 1020after4: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382528 [20:08:35] (03CR) 1020after4: [C: 032] all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382528 (owner: 1020after4) [20:08:37] (03CR) 10Andrew Bogott: [C: 032] wikitech-static: convert yaml content to wikitext when dumping wikitech [puppet] - 10https://gerrit.wikimedia.org/r/382520 (https://phabricator.wikimedia.org/T177450) (owner: 10Andrew Bogott) [20:08:43] (03PS2) 10Andrew Bogott: wikitech-static: convert yaml content to wikitext when dumping wikitech [puppet] - 10https://gerrit.wikimedia.org/r/382520 
(https://phabricator.wikimedia.org/T177450) [20:10:07] (03Merged) 10jenkins-bot: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382528 (owner: 1020after4) [20:10:18] (03CR) 10jenkins-bot: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382528 (owner: 1020after4) [20:10:50] !log Promote all wikis to 1.31.0-wmf.2 refs T174358 (currently there are no blockers and no significant logspam) [20:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:58] T174358: 1.31.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T174358 [20:11:36] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.31.0-wmf.2 refs T174358 [20:11:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:36] wtf [20:13:07] No such wiki 's'. [20:13:17] /srv/mediawiki/php-1.31.0-wmf.2/includes/SiteConfiguration.php:545 [20:14:15] halp! [20:14:32] I guess I'm rolling back there are a bunch of weird 'no such wiki' errors coming from search [20:15:08] search? [20:15:22] /w/index.php?search [20:15:38] ebernhar|lunch: dcausse ^ [20:15:48] (03PS9) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [20:16:04] search doesn't seem to be broken on en.wikipedia [20:16:16] but there are a lot of new errors after promoting all to 1.31.0-wmf.2 [20:16:32] ~250 errors per minute [20:16:43] MWException from line 545 of /srv/mediawiki/php-1.31.0-wmf.2/includes/SiteConfiguration.php: No such wiki 'b'. 
[20:17:39] the letter 'b' could be any of a, d, e, f, i, p, r, v, z [20:18:14] (03PS6) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [20:18:16] (03PS6) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [20:18:26] kinda seems like it might be config related but it only happened after syncing wikiversions [20:18:31] (03CR) 10Dzahn: [C: 031] "cool @ "the whole fleet is on Java 8" and i guess since this is already applied and happened, all there is left to say is +1" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:18:46] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1061, Errmsg: Error Duplicate key name eu_entity_id on query. Default database: amwikimedia. [Query snipped] [20:19:10] oh oh, that database name, that's a brandnew wiki [20:20:13] did it get created today? i know it was added to DNS and config, but dont know about createwiki [20:20:24] Yeah [20:20:29] (03PS7) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [20:20:30] And... It went wrong a few times apparently [20:20:31] (03PS7) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [20:20:45] That was the problem that was there earlier [20:21:20] hmmm [20:21:24] twentyafterfour: hmm, i'll take a look [20:21:29] (after lunhc) [20:21:32] I can't seem to reproduce the problem even though I see it in the logs [20:21:49] 10Operations, 10Ops-Access-Requests: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493#3662280 (10Peachey88) [20:21:51] when I access e.g. 
https://en.wikipedia.org/wiki/Special:Search?search=testing+one+two+three it works [20:22:29] twentyafterfour: do the logs have a search string? Its possible to trigger cross-language searches (although i would think not for a new language) [20:22:51] there is also some code related cross-language search in this deployment, but it's not enabled until we ship a config patch [20:22:56] Where the code is tripping up... Seems to suggest that might be related [20:23:05] } else { // $wiki is a foreign wiki [20:23:14] Then a bit further.. [20:23:14] } elseif ( !in_array( $wiki, $this->wikis ) ) { [20:23:14] throw new MWException( "No such wiki '$wiki'." ); [20:23:20] ebernhar|lunch: apparently it is search-string dependent [20:23:55] e.g. "Sugarbush+Cushman" triggers the error [20:24:12] thanks to thcipriani for figuring that out, btw [20:24:42] mutante: Guten Tag! https://gerrit.wikimedia.org/r/#/c/382217/ yeah I think we can merge it safely :] [20:25:01] mutante: gehel and I were talking about it this morning and kind of forgot about it after the lunch break [20:25:04] (03CR) 10jerkins-bot: [V: 04-1] setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [20:25:08] 10Operations, 10netops: Implement RPKI (Resource Public Key Infrastructure) - https://phabricator.wikimedia.org/T61115#3662281 (10ayounsi) Added the key pair generated for ARIN to the pw repository. Generated a SOA for the v6 ARIN prefix, if no issues after propagation, I'll generate the last two ARIN v4 SOAs.... 
[20:25:16] I guess I need to roll back until ebernhar|lunch can take a look at it [20:25:32] twentyafterfour: yea i'll have to pull it into the debugger, set some break points, and see whats going on [20:25:48] (03CR) 10jerkins-bot: [V: 04-1] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [20:26:07] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] [20:26:13] !log rolling back to 1.31.0-wmf.1 [20:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:41] (03PS1) 1020after4: all wikis to 1.31.0-wmf.1 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382532 [20:26:43] (03CR) 1020after4: [C: 032] all wikis to 1.31.0-wmf.1 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382532 (owner: 1020after4) [20:28:08] (03Merged) 10jenkins-bot: all wikis to 1.31.0-wmf.1 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382532 (owner: 1020after4) [20:28:22] (03CR) 10jenkins-bot: all wikis to 1.31.0-wmf.1 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382532 (owner: 1020after4) [20:28:51] (03PS3) 10Ema: varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) [20:28:54] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.31.0-wmf.1 refs T174358 [20:29:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:03] T174358: 1.31.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T174358 [20:29:09] (03CR) 10Dzahn: [C: 032] jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:29:20] (03CR) 10jerkins-bot: [V: 04-1] varnish: support for version 5 
[puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [20:29:41] hashar: ^ doing it. violations delta: 0 adds one AND fixed one :) [20:29:53] !!!! [20:30:00] (03PS10) 10Dzahn: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:30:16] hashar: now before it's night or Friday [20:30:56] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [20:31:13] ah, nice to see it recovered. that dbstore thing [20:31:28] it recovered naturaly- [20:31:52] the gnomes fix it while you are not looking [20:32:10] jynus: hehee :) nice! so it's just because it is brand-new i assume [20:32:16] when createwiki runs [20:32:26] it is a mediawiki bug [20:32:33] whoever does deployments should fix it [20:32:36] ah [20:32:45] please change add wiki to do CREATE IF NOT EXISTS [20:33:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] [20:33:56] (03PS8) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [20:33:58] (03PS8) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [20:35:14] i see, it's part of an MW extension https://www.mediawiki.org/wiki/Extension:WikimediaMaintenance [20:36:11] i'll file a ticket for that jynus [20:36:20] (03Draft1) 10Paladox: contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 [20:36:23] (03PS2) 10Paladox: contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 [20:36:26] hashar ^^ :) [20:36:39] not sure if including jenkins::common will break anything? 
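Editor's sketch, not part of the channel log: the "CREATE IF NOT EXISTS" request at 20:32:45 is about making the wiki-creation SQL safe to replay. Below is a minimal Python illustration with SQLite standing in for MariaDB. The index name eu_entity_id is taken from the alert above; the table name and columns are hypothetical, and this is not the addWiki.php code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Guarded statements: running them a second time is a no-op.
GUARDED_SETUP = [
    "CREATE TABLE IF NOT EXISTS entity_usage_demo ("
    "eu_row_id INTEGER PRIMARY KEY, eu_entity_id TEXT)",
    "CREATE INDEX IF NOT EXISTS eu_entity_id ON entity_usage_demo (eu_entity_id)",
]

# Unguarded statement: fails once the index already exists.
UNGUARDED_INDEX = "CREATE INDEX eu_entity_id ON entity_usage_demo (eu_entity_id)"


def run_setup(connection):
    """Apply the guarded setup statements in order."""
    for statement in GUARDED_SETUP:
        connection.execute(statement)


def unguarded_rerun_fails(connection):
    """Return True if replaying the unguarded CREATE INDEX raises an error."""
    try:
        connection.execute(UNGUARDED_INDEX)
    except sqlite3.OperationalError:
        return True
    return False


run_setup(conn)
run_setup(conn)  # replaying the guarded setup does not raise

index_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = 'eu_entity_id'"
).fetchone()[0]
print(index_count)
print(unguarded_rerun_fails(conn))
```

A replica that replays a statement it has already applied stops on the unguarded form, which is the shape of the dbstore1001 alert; the guarded form lets the replay pass.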
[20:36:56] (03CR) 10jerkins-bot: [V: 04-1] contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 (owner: 10Paladox) [20:37:11] jynus: mutante: looks like it DOES do create if not exists [20:37:20] goes to contint1001 to watch the java8 change, hashar [20:37:41] oh, there is one create database line that doesn't use IF NOT EXIST [20:38:08] (03PS3) 10Paladox: contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 [20:39:06] (03CR) 10Dzahn: "no-op on contint1001/2001 - everything had already happened, nothing to see for me :)" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:39:53] twentyafterfour: just one line? nice find then [20:41:22] mutante: all of the tables are created by sourcing .sql files so there may be some missing IF NOT EXISTS in there too [20:41:27] (03PS4) 10Ema: varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) [20:41:37] the sql files are spread all over mediawiki and extensions [20:41:54] (03CR) 10jerkins-bot: [V: 04-1] varnish: support for version 5 [puppet] - 10https://gerrit.wikimedia.org/r/382464 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [20:42:06] paladox: i think on the original change moritz said how it's "much simpler to not let puppet mess with alternatives" or so [20:42:16] oh [20:42:21] mutante: did you make a task? 
I'll reference it in the patch [20:42:36] (03PS4) 10Paladox: contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 [20:43:31] RECOVERY - MegaRAID on db1046 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [20:45:54] https://gerrit.wikimedia.org/r/#/c/382572/1/addWiki.php [20:46:59] twentyafterfour: https://phabricator.wikimedia.org/T177537 [20:47:05] !log 1.31.0-wmf.2 is now blocked by T177535 [20:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:13] wonder what the right phab tags are [20:47:14] mutante: thank you to have double checked on the contint hosts ! :) [20:47:14] T177535: MWException from line 545 of SiteConfiguration.php: No such wiki 's'. - https://phabricator.wikimedia.org/T177535 [20:47:20] hashar: :) [20:47:31] twentyafterfour: can you give it a tag ?:) [20:48:20] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade jenkins to 2.73.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3662359 (10hashar) [20:48:26] mutante: that opens the way to upgrade Jenkins \o/ [20:49:25] cool, hashar! and i found a tag, twentyafterfour [20:50:07] ah cool [20:51:34] bbiaw, food [20:54:13] (03CR) 10Zoranzoki21: [C: 031] robots.txt: block MJ12bot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382494 (owner: 10BBlack) [20:55:15] 10Operations, 10Commons, 10Multimedia, 10Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3662372 (10Dispenser) Since this seems to be impossible. Would adding interstitial for Facebook referrers be doable? I imagine that we'... [20:56:02] (03CR) 10Hashar: [C: 031] "https://pypi.python.org/pypi/cumin is a celery monitor. 
the github repo is gone https://github.com/s2krish/cumin/ so maybe it can be take" [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 (owner: 10Volans) [20:57:11] lols: T147737 [20:57:12] T147737: addWiki keeps being broken by people - https://phabricator.wikimedia.org/T147737 [21:00:49] twentyafterfour: I think https://gerrit.wikimedia.org/r/382595 should do the trick. Will see what jenkins thinks [21:06:19] twentyafterfour: also fwiw, it seems the rollback didn't update https://noc.wikimedia.org/conf/ which says everything is on wmf.2, but Special:Version shows wmf.1 [21:07:11] ebernhardson: caching [21:07:22] it sits behind misc-varnish [21:07:39] ahh [21:08:05] maybe scap could purge that page when sync-wikiversions is run [21:09:30] ebernhardson: +2, I'll cherry pick to wmf.2 [21:09:41] jenkins liked the patch. I'm tempted to dig into phan and see if i can make `list(...) = "some string"` emit an issue [21:09:51] i can't imagine that being done intentionally very often [21:10:07] Might be able to do it in phpcs [21:10:21] phpcs wouldn't know its a string, the array came from another functoin but was annotated string[] [21:10:36] so reset(string[]) would return a string, and then list(...) deconstructed the string [21:10:41] ebernhardson: yeah that should raise a notice like undefined indes does [21:11:30] I figured it had to be a string passed to something expecting an array but that's a hard one to track down given that strings are often treated as arrays on purpose [21:11:55] and they literally are arrays at the C level ;) [21:13:20] cherry picked at https://gerrit.wikimedia.org/r/#/c/382605/1 ... 
I'll sync it and promote to wmf.2 once again when this merges [21:13:44] !log Getting the train back on track: moving to wmf.2 as soon as https://gerrit.wikimedia.org/r/#/c/382605/1 merges [21:13:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:38] (03PS1) 10Thcipriani: WIP: docker pusher [puppet] - 10https://gerrit.wikimedia.org/r/382608 [21:20:09] (03CR) 10jerkins-bot: [V: 04-1] WIP: docker pusher [puppet] - 10https://gerrit.wikimedia.org/r/382608 (owner: 10Thcipriani) [21:20:12] (03CR) 10Thcipriani: [C: 04-1] WIP: docker pusher [puppet] - 10https://gerrit.wikimedia.org/r/382608 (owner: 10Thcipriani) [21:30:29] (03CR) 10Dduvall: "See my comments about ditching the python script and favoring a simple wrapper for `docker --config /etc/docker-pusher push` that can be e" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382608 (owner: 10Thcipriani) [21:31:25] BTW: Thanks for the quick fix ebernhardson! [21:32:04] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/CirrusSearch: sync CirrusSearch to deploy https://gerrit.wikimedia.org/r/#/c/382605/ refs T177535 (duration: 01m 06s) [21:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:13] T177535: MWException from line 545 of SiteConfiguration.php: No such wiki 's'. 
- https://phabricator.wikimedia.org/T177535 [21:32:20] (03CR) 10EBernhardson: [C: 031] "Good to deploy, i want to delay deploying this till a bit after the AB test is turned off though so that stragglers due to caching don't h" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382451 (https://phabricator.wikimedia.org/T177490) (owner: 10DCausse) [21:33:09] twentyafterfour: the bugs were a bit silly..should have been caught with better testing :( [21:33:23] (03PS1) 1020after4: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382612 [21:33:25] (03CR) 1020after4: [C: 032] all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382612 (owner: 1020after4) [21:34:47] (03Merged) 10jenkins-bot: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382612 (owner: 1020after4) [21:35:27] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.31.0-wmf.2 refs T174358 [21:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:33] T174358: 1.31.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T174358 [21:36:08] (03CR) 10Hashar: "The class contint::packages::java installs dependencies required for the job to run. We most probably still need the java 7 jdk installed," [puppet] - 10https://gerrit.wikimedia.org/r/382550 (owner: 10Paladox) [21:36:10] (03CR) 10jenkins-bot: all wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382612 (owner: 1020after4) [21:36:10] ebernhardson: now I get " Failed to run getConfiguration.php. " [21:36:15] could be related? [21:36:31] MWException from line 561 of /srv/mediawiki/php-1.31.0-wmf.2/includes/SiteConfiguration.php: Failed to run getConfiguration.php. 
[21:36:31] twentyafterfour: hmm, well yes its same code path :S looking [21:36:51] not nearly as many errors this time though [21:36:56] might be an isolated thing [21:37:43] (CR) Paladox: "> The class contint::packages::java installs dependencies required" [puppet] - https://gerrit.wikimedia.org/r/382550 (owner: Paladox) [21:38:07] (PS2) Thcipriani: WIP: docker pusher [puppet] - https://gerrit.wikimedia.org/r/382608 [21:38:11] (PS5) Paladox: contint: Set java 8 as default [puppet] - https://gerrit.wikimedia.org/r/382550 [21:39:06] (CR) Thcipriani: [C: -1] WIP: docker pusher [puppet] - https://gerrit.wikimedia.org/r/382608 (owner: Thcipriani) [21:39:23] Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3662516 (bd808) @Cmjohnson what's our next step here? Do we have enough info to request additional replacement parts from Dell? This poor box has been a bit of a dud since... [21:43:18] ebernhardson: another one: Unknown fallback profile: mlr-1024rs [21:43:25] causing CirrusSearch\Search\InvalidRescoreProfileException [21:47:05] Heads-up that we’re currently stress testing the new ORES cluster. Shouldn’t affect anything else, though. [21:49:55] twentyafterfour: oh fun. /usr/bin/timeout: the monitored command dumped core [21:50:25] twentyafterfour: we can try though just turning on the new config that stops using this all together, https://gerrit.wikimedia.org/r/#/c/382474/ [21:51:32] (also means the error doesn't reproduce on terbium, because terbium uses zend instead of hhvm ...) 
[21:52:31] T177545 [21:52:32] T177545: Unknown fallback profile: mlr-1024rs - https://phabricator.wikimedia.org/T177545 [21:52:55] (CR) 20after4: [C: 2] [cirrus] Enable loading interwiki configs via API [mediawiki-config] - https://gerrit.wikimedia.org/r/382474 (owner: DCausse) [21:54:19] (Merged) jenkins-bot: [cirrus] Enable loading interwiki configs via API [mediawiki-config] - https://gerrit.wikimedia.org/r/382474 (owner: DCausse) [21:54:36] pull it to mwdebug and i'll double check the new code path is all happy [21:56:09] (CR) jenkins-bot: [cirrus] Enable loading interwiki configs via API [mediawiki-config] - https://gerrit.wikimedia.org/r/382474 (owner: DCausse) [21:57:01] ebernhardson: the patch should be on mwdebug1002 [21:59:34] hmm, language fallback isn't triggering on prod or mwdebug1002. wil nede a moment [22:00:19] that might explain the very low occurence rate of that error? I only saw a couple of instances of it [22:01:19] and none of those are from enwiki [22:02:06] (PS3) Thcipriani: WIP: docker pusher [puppet] - https://gerrit.wikimedia.org/r/382608 (https://phabricator.wikimedia.org/T176896) [22:02:16] actually I take it back, now there are 31 occurrence [22:02:24] still relatively low though [22:03:03] twentyafterfour: will I be in your way if I do a quick scap3 deploy for striker? 
[22:03:06] example uri from enwiki: /w/index.php?search=Evangelische+Kirche+Deutsch+Eylau&title=Special%3ASearch&go=Go [22:03:10] bd808: go for it [22:03:16] thx [22:04:18] !log bd808@tin Started deploy [striker/deploy@fbb9019]: Prevent tools from being named with invalid Kubernetes namespace labels (T176681) [22:04:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:24] T176681: Striker should not allow tool names to include '_' for Kubernetes compatibility - https://phabricator.wikimedia.org/T176681 [22:04:51] !log bd808@tin Finished deploy [striker/deploy@fbb9019]: Prevent tools from being named with invalid Kubernetes namespace labels (T176681) (duration: 00m 33s) [22:04:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:44] (03PS4) 10Thcipriani: WIP: docker pusher [puppet] - 10https://gerrit.wikimedia.org/r/382608 (https://phabricator.wikimedia.org/T176896) [22:06:53] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3662667 (10aaron) We discussed proxies in the last performance meeting and we're OK with that (it would cu... [22:08:24] * bd808 is done with scap3 [22:11:34] (03PS1) 10Ayounsi: Add DNS/IP allocations for ftp-internal [dns] - 10https://gerrit.wikimedia.org/r/382615 [22:11:50] twentyafterfour: so of course the code path we tested, sister search, works fine. Cross language search just doens't get enough love ... 
small patch incoming that i've already tested on mwdebug1002 and we can ship both together [22:12:09] ok :) [22:13:32] !log awight@tin Started deploy [ores/deploy@42c5663]: Cause ORES service restart [22:13:37] (03CR) 10Dzahn: [C: 031] Add DNS/IP allocations for ftp-internal [dns] - 10https://gerrit.wikimedia.org/r/382615 (owner: 10Ayounsi) [22:13:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:50] !log awight@tin Finished deploy [ores/deploy@42c5663]: Cause ORES service restart (duration: 00m 19s) [22:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:33] twentyafterfour: https://gerrit.wikimedia.org/r/382616 [22:14:42] (03CR) 10Ayounsi: [C: 032] Add DNS/IP allocations for ftp-internal [dns] - 10https://gerrit.wikimedia.org/r/382615 (owner: 10Ayounsi) [22:17:21] (03CR) 10Hashar: "> java 8 has to be the default on the jdk and i am not sure if using java 7 as secondary will work." [puppet] - 10https://gerrit.wikimedia.org/r/382550 (owner: 10Paladox) [22:18:00] (03Abandoned) 10Paladox: contint: Set java 8 as default [puppet] - 10https://gerrit.wikimedia.org/r/382550 (owner: 10Paladox) [22:19:33] cherry-picked https://gerrit.wikimedia.org/r/#/c/382618/1 [22:25:17] might also fix the mlr-1024rs problem ... at least i can't seem to reproduce [22:25:24] (on wmf.2) [22:26:07] (03CR) 10Hashar: [C: 031] "Thinking about it twice, most probably wikitech would not need all those fonts packages anyway. I guess they are used when rendering SVG " [puppet] - 10https://gerrit.wikimedia.org/r/380712 (owner: 10Muehlenhoff) [22:29:26] (03CR) 10HaeB: "> > * Except for the skin field (cf. 
below), this purging strategy" [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: 10Bmansurov) [22:33:11] (03PS1) 10Ayounsi: Add DHCP entry for ftp-internal [puppet] - 10https://gerrit.wikimedia.org/r/382619 [22:33:29] PROBLEM - MegaRAID on db1046 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough [22:33:49] (03CR) 10Ayounsi: [C: 032] Add DHCP entry for ftp-internal [puppet] - 10https://gerrit.wikimedia.org/r/382619 (owner: 10Ayounsi) [22:35:13] ebernhardson: ready to sync [22:35:18] (03PS2) 10Volans: Docstrings: use Google Style [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) [22:35:20] (03PS2) 10Volans: Documentation: convert Markdown to reStructuredText [software/cumin] - 10https://gerrit.wikimedia.org/r/382480 (https://phabricator.wikimedia.org/T159308) [22:35:22] (03PS4) 10Volans: CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 [22:35:26] (03PS4) 10Volans: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 [22:35:27] (03PS9) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [22:35:29] (03PS9) 10Volans: setup.py: split test dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [22:37:19] (03PS1) 10Ayounsi: Add partman receipe for ftp-internal [puppet] - 10https://gerrit.wikimedia.org/r/382620 [22:37:41] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/CirrusSearch/: deploy https://gerrit.wikimedia.org/r/#/c/382618/ (duration: 00m 58s) [22:37:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:04] twentyafterfour: cross-language search is back! 
[22:38:38] i'll keep an eye on the mlr-1024rs ones, but i wasn't able to reproduce from any of the logs [22:38:43] (03CR) 10Ayounsi: [C: 032] Add partman receipe for ftp-internal [puppet] - 10https://gerrit.wikimedia.org/r/382620 (owner: 10Ayounsi) [22:38:53] last message was 20 minutes ago [22:39:12] ebernhardson: I'm now syncing the change to CirrusSearch-common.php [22:39:36] can you test again to be sure that doesn't break things? looks like canaries are ok [22:39:38] !log twentyafterfour@tin Synchronized wmf-config/CirrusSearch-common.php: sync https://gerrit.wikimedia.org/r/#/c/382474/ (duration: 00m 47s) [22:39:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:39:53] ebernhardson: thanks again for the fixes! [22:40:29] uh oh [22:40:30] BadMethodCallException from line 140 of /srv/mediawiki/php-1.31.0-wmf.2/extensions/CirrusSearch/includes/BaseInterwikiResolver.php: Call to a member function isLocal() on a non-object (boolean) [22:40:33] ebernhardson: ^ [22:40:47] twentyafterfour: hmm, that's what the second patch should have fixed. Sync the cirrus dir? [22:40:55] ok [22:43:39] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/CirrusSearch: Sync CirrusSearch extension again for good measure (duration: 00m 57s) [22:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:59] ebernhardson: doesn't seem to be resolved. [22:46:56] twentyafterfour: hmm, looking [22:47:09] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/CirrusSearch/includes/: Sync CirrusSearch extension again for good measure (duration: 00m 51s) [22:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:48:39] PROBLEM - ores on ores1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:49:08] ^ stress test, nothing to worry about. 
[22:49:39] RECOVERY - ores on ores1002 is OK: HTTP OK: HTTP/1.0 200 OK - 3666 bytes in 9.438 second response time [22:52:54] stressful [22:53:51] hmm, oddly some wikis the mw core interwiki lookup fails for valid prefixes ... [22:54:41] odd indeed [22:55:46] oh, on this wiki wikisource prefix was renamed from s: to src: but the code didn't pick that up. Gotta figure out where that comes from ... [22:55:49] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.620 second response time [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171005T2300). [23:00:04] ebernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:47] train is still running - there is an ongoing issue with CirrusSearch [23:01:24] (inter-wiki search is currently broken) [23:01:36] I can swat afterward [23:01:54] twentyafterfour: well i can put a temp fix in for this, it should accept the resolver returning null|false anyways as documented. But i'll have to figure out why its detecting the wrong interwiki prefixes for a full fix ... 
[23:02:18] ebernhardson: ok that sounds good to me [23:02:35] and we can SWAT your AB test whenever you'd like [23:03:05] (03PS1) 10Ayounsi: Add ftp-internal to puppet [puppet] - 10https://gerrit.wikimedia.org/r/382623 [23:04:20] (03CR) 10Dzahn: [C: 031] Add ftp-internal to puppet [puppet] - 10https://gerrit.wikimedia.org/r/382623 (owner: 10Ayounsi) [23:04:36] twentyafterfour: https://gerrit.wikimedia.org/r/382624 [23:06:25] (03CR) 10Dzahn: [C: 031] "actually, sorry, use "role::test" instead of spare, that is even more accurate" [puppet] - 10https://gerrit.wikimedia.org/r/382623 (owner: 10Ayounsi) [23:08:19] cherry picked and +2'd https://gerrit.wikimedia.org/r/#/c/382625/ [23:08:41] ebernhardson: you want me to merge your ab test SWAT patch too? [23:08:43] (03PS2) 10Ayounsi: Add ftp-internal to puppet [puppet] - 10https://gerrit.wikimedia.org/r/382623 [23:10:04] twentyafterfour: sure. It's actually just turning off the test, should be much safer :) [23:10:07] (03CR) 10Dzahn: [C: 031] Add ftp-internal to puppet [puppet] - 10https://gerrit.wikimedia.org/r/382623 (owner: 10Ayounsi) [23:10:09] (03CR) 10Ayounsi: [C: 032] Add ftp-internal to puppet [puppet] - 10https://gerrit.wikimedia.org/r/382623 (owner: 10Ayounsi) [23:10:26] btw, most of the badmethodcall errors are from svwiki [23:11:27] twentyafterfour: yea, that's the one that renamed the wikisource prefix from 's' to 'src' [23:11:58] but it looks like $wgSiteMatrixSites, where we get the info from, is just a static array setup in wmf-config and doesn't know about wiki specific customizations [23:12:17] I see [23:12:49] * ebernhardson didn't even know there were wiki specific customizations ... learn new things [23:18:58] (03PS7) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:20:33] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:20:34] PROBLEM - puppet last run on mw1322 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:20:43] PROBLEM - puppet last run on mc1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:21:13] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:21:50] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT https://gerrit.wikimedia.org/r/#/c/382611/ (duration: 00m 47s) [23:21:50] ebernhardson: ab test patch sync'd [23:21:53] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:03] PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:22:03] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:22:03] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:22:29] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3662930 (10awight) Ran a few tests today, and found that the filehandle issue is not solved. The celery service died on several node... 
[23:23:06] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3662933 (10awight) [23:23:06] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), 10User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3662931 (10awight) 05Resolved>03Open Reopening, I saw this error kill the celery worker on a... [23:23:37] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/extensions/CirrusSearch/includes/BaseInterwikiResolver.php: sync https://gerrit.wikimedia.org/r/#/c/382625/ (duration: 00m 47s) [23:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:05] Cool, that did it :) [23:24:33] twentyafterfour: I've got two late additions for swat if that's okay - https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/core+branch:wmf/1.31.0-wmf.2 [23:24:46] Krinkle: you're just in time :) [23:24:47] Minor taming of jquery-migrate which went out to all wikis earlier today [23:25:31] Thanks! [23:25:33] ok, sync them both together I assume? 
[23:25:54] twentyafterfour: yea [23:25:56] heh, one is just a comment deletion so I guess that one's safe [23:27:42] twentyafterfour: yes [23:28:39] (03CR) 10Dzahn: racktables: role/profile, remove style violations (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn) [23:33:22] (03PS8) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:33:57] (03PS9) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:34:20] !log upgrading ps1-d2-eqiad.mgmt.eqiad.wmnet (unused) [23:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:26] Krinkle: syncing [23:37:05] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.2/resources/lib/jquery/jquery.migrate.js: sync https://gerrit.wikimedia.org/r/#/c/382530 and https://gerrit.wikimedia.org/r/#/c/382529 (duration: 00m 47s) [23:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:11] cool, fatalmonitor is all clear now [23:38:23] (03PS10) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:40:03] twentyafterfour: confirmed. [23:40:13] Cool, thanks! [23:40:48] (03PS11) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:41:06] twentyafterfour: As much as that sounds amazing, I assume you don't mean https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor is empty now? [23:41:06] * twentyafterfour declares the end of the line for evening SWAT and the 1.31.0-wmf.2 train. [23:41:15] Because that would produce more worry than joy at this point. 
[23:41:27] Krinkle: it's not empty exactly, [23:41:32] but my filtered view of it is [23:41:52] my filters exclude the warnings and a bunch of common db / redis errors [23:43:05] https://logstash.wikimedia.org/goto/62fbcc67d7315dc9fa2df8a32fed42b5 [23:43:28] Hm. are the default urls also sharable btw? [23:43:29] https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor?_g=h@b74aee6&_a=h@15c1f8f [23:43:32] 3 errors in the past 15 minutes. Much easier to spot problems [23:43:38] Looks like they're shortening by default pretty much now [23:43:50] this is unfiltered last 30 minutes (custom from 15 min) [23:43:59] "unable to completely restore the url, be sure to use the share functionality" [23:44:29] Hm.okay [23:44:36] I guess it's some kind of session store in the browser. [23:44:40] Not sure what the point is in that case :P [23:44:44] I do like that the urls are much shorter now, that was ridiculous before [23:45:00] but yeah it's kinda silly the way they do it [23:45:06] Well, yeah, but now they're user-specific by default. [23:45:15] you can always save as a new name and share that saved dashboard [23:45:15] I'd rather have longer urls that I can bookmark. 
[23:45:52] yeah [23:46:22] it's nice when a url contains the actual state, kibana just has sooo much state to store [23:50:11] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:31] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:50:41] RECOVERY - puppet last run on mw1322 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:50:41] RECOVERY - puppet last run on mc1030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:51:46] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663063 (10Dzahn) Next we should figure out: - who should have Gerrit permissions for +2/merge on the content repo - give them the permissi... [23:51:52] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:02] RECOVERY - puppet last run on etcd1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:52:02] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:52:02] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:53:31] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663069 (10Dzahn) By the way i couldn't just merge Ladsgroup's change in the new content repo as i just have +1 rights there, but not +2. 
[23:56:26] (03PS12) 10Dzahn: racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 [23:56:32] (03CR) 10Dzahn: [C: 032] racktables: role/profile, remove style violations [puppet] - 10https://gerrit.wikimedia.org/r/382336 (owner: 10Dzahn)