[00:12:31] PROBLEM - carbon-frontend-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [00:16:21] RECOVERY - carbon-frontend-relay metric drops on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [04:09:01] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=6036.30 Read Requests/Sec=5043.30 Write Requests/Sec=464.00 KBytes Read/Sec=21840.40 KBytes_Written/Sec=8917.20 [04:15:01] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=2.20 Read Requests/Sec=156.30 Write Requests/Sec=6.50 KBytes Read/Sec=1207.20 KBytes_Written/Sec=356.80 [04:55:51] PROBLEM - Check Varnish expiry mailbox lag on cp2002 is CRITICAL: CRITICAL: expiry mailbox lag is 585257 [05:35:51] RECOVERY - Check Varnish expiry mailbox lag on cp2002 is OK: OK: expiry mailbox lag is 0 [05:42:05] (PS1) KartikMistry: Re-enable ContentTranslation [mediawiki-config] - https://gerrit.wikimedia.org/r/349869 (https://phabricator.wikimedia.org/T163344) [06:06:09] (PS1) Marostegui: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/349870 (https://phabricator.wikimedia.org/T163548) [06:08:36] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/349870 (https://phabricator.wikimedia.org/T163548) (owner: Marostegui) [06:09:50] (Merged) jenkins-bot: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/349870 (https://phabricator.wikimedia.org/T163548) (owner: Marostegui) [06:09:59] (CR) jenkins-bot: db-eqiad.php: Repool db1092, depool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/349870 (https://phabricator.wikimedia.org/T163548) (owner: Marostegui) [06:12:32] !log marostegui@naos Synchronized wmf-config/db-eqiad.php: Repool db1092, depool db1087 - T162539 T163548 (duration: 02m 
19s) [06:12:41] !log Deploy alter table on wikidatawiki.wb_terms on db1087 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548 [06:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:12:44] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539 [06:12:44] T163548: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548 [06:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:23:36] !log Deploy alter table enwiki.revision db1052 (eqiad master) - T132416 [06:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:23:44] T132416: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416 [06:29:11] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [06:30:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:31:31] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [06:34:11] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [06:37:31] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:38:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:43:51] PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[tzdata] [06:48:31] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [06:50:11] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [06:57:20] (PS2) Muehlenhoff: Revert "Create a separate sysctl configuration for setting conntrack settings" [puppet] - https://gerrit.wikimedia.org/r/349396 [06:58:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:58:31] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [07:06:46] (CR) Muehlenhoff: [C: 2] Revert "Create a separate sysctl configuration for setting conntrack settings" [puppet] - https://gerrit.wikimedia.org/r/349396 (owner: Muehlenhoff) [07:10:11] PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:10:11] PROBLEM - puppet last run on mw2149 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:10:11] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:10:12] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:10:31] PROBLEM - puppet last run on mw2147 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:10:31] PROBLEM - puppet last run on mw2222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/conntrack-sysctl.conf] [07:11:33] moritzm: ^ [07:11:44] looking [07:11:51] RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [07:12:22] marostegui: jynus: I want to see https://phabricator.wikimedia.org/P5295 and get to wikidata related ones I already signed NDA (I've access to tendril) Can I see it? [07:12:44] "Access Denied: Restricted Paste" [07:13:01] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [07:14:11] RECOVERY - puppet last run on mw2149 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [07:14:16] Amir1, that paste contains potentially sensitive data (private user's data), it requires an NDA and being part of the NDA-signed phabricator group [07:14:21] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [07:14:40] if you have signed one, ask phab admins to add you to the right group [07:14:55] I do not handle such group [07:15:11] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [07:16:06] Amir1, what queries are you interested in? 
[07:16:15] jynus: wikidata-related ones [07:16:30] let me see [07:16:42] I think I generated a wikidata-only paste [07:16:44] I already signed it (that's why I have access to production cluster) [07:16:53] that'd be great [07:17:39] https://phabricator.wikimedia.org/T163544 [07:18:08] I created a specific wikidata ticket [07:18:55] It seems hoo took care of it [07:19:08] we have a channel for database, nice [07:19:09] you can coordinate with hoo, and ask for NDA later if you need it to the phab admins [07:19:34] Amir1: you can simply file a Phabricator ticket and add the WMF-NDA-Requests project to it [07:19:45] yes, that would be it ^ [07:19:54] you've already signed P2, so the requirements are already fulfilled, only needs someone with admin access to add you [07:20:04] https://phabricator.wikimedia.org/T134651 [07:20:10] Something like this? [07:21:59] that ticket was for your shell access and while it was also tagged WMF-NDA-Requests, that seems to have fallen through the cracks. I suggest you open a new WMF-NDA-Requests ticket only for adding you to "WMF-NDA" in Phabricator [07:22:18] Amir1, that is the ops side of things (cluster access) [07:22:41] there is a separate thing, which is phabricator requests, that normally ops do not handle [07:22:56] okay, sorry for bothering [07:23:01] not bothering [07:23:03] let me make one [07:23:11] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [07:23:21] I am just telling you why we do not want to step into someone else's task [07:23:22] RECOVERY - puppet last run on mw2147 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [07:24:08] probably those should be coordinated, but I do not know the details of how their requests work [07:25:40] Done [07:27:26] I know it is bureaucracy, and I apologize for that [07:27:33] (PS1) Marostegui: db-eqiad.php: Repool db1080 and db1067 [mediawiki-config] - 
https://gerrit.wikimedia.org/r/349874 [07:30:44] No worries, it's Monday and this will be done (probably) soon [07:30:45] Thanks [07:30:52] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1080 and db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/349874 (owner: Marostegui) [07:32:02] (Merged) jenkins-bot: db-eqiad.php: Repool db1080 and db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/349874 (owner: Marostegui) [07:32:14] (CR) jenkins-bot: db-eqiad.php: Repool db1080 and db1067 [mediawiki-config] - https://gerrit.wikimedia.org/r/349874 (owner: Marostegui) [07:34:00] !log marostegui@naos Synchronized wmf-config/db-eqiad.php: Repool db1080 and db1067 (duration: 01m 18s) [07:34:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:17] https://phabricator.wikimedia.org/T163658 [07:49:05] (PS2) Filippo Giunchedi: keyholder: create /run/keyholder at boot [puppet] - https://gerrit.wikimedia.org/r/348760 [07:50:36] (PS3) Filippo Giunchedi: keyholder: create /run/keyholder at boot [puppet] - https://gerrit.wikimedia.org/r/348760 [07:53:36] (CR) Filippo Giunchedi: [C: 2] keyholder: create /run/keyholder at boot [puppet] - https://gerrit.wikimedia.org/r/348760 (owner: Filippo Giunchedi) [07:56:06] Operations, netops, Patch-For-Review: analytics hosts frequently tripping 'port utilization threshold' librenms alerts - https://phabricator.wikimedia.org/T133852#3204919 (ayounsi) Another point, if the servers saturates its uplink, this also means it needs more capacity. In addition to making the al... [07:57:16] Operations: Production Shell access denied - https://phabricator.wikimedia.org/T163568#3202018 (MoritzMuehlenhoff) The SSH key type that was added for your account back in 2014 (ssh-dss) is no longer supported by your version of OpenSSH, see https://www.openssh.com/legacy.html: "OpenSSH 7.0 and greater simi... 
[08:00:04] aude: Respected human, time to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T0800). Please do the needful. [08:02:59] !log Deploy alter table enwiki.revision on db1095 (sanitarium2) - T132416 [08:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:07] T132416: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416 [08:03:42] (CR) Gilles: [C: 1] mwgrep: Add --etitle option [puppet] - https://gerrit.wikimedia.org/r/349352 (owner: Krinkle) [08:11:08] (PS2) Gehel: wdqs - monitor response times for both eqiad and codfw [puppet] - https://gerrit.wikimedia.org/r/349241 [08:12:30] (CR) Gehel: [C: 2] wdqs - monitor response times for both eqiad and codfw [puppet] - https://gerrit.wikimedia.org/r/349241 (owner: Gehel) [08:15:08] (PS1) Addshore: Add Cognate to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/349876 (https://phabricator.wikimedia.org/T150182) [08:16:06] (PS1) Gehel: wdqs - corrected naming of new WDQS alerts [puppet] - https://gerrit.wikimedia.org/r/349877 [08:16:24] morning aude [08:17:11] PROBLEM - puppet last run on tegmen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:17:28] (CR) Gehel: [C: 2] wdqs - corrected naming of new WDQS alerts [puppet] - https://gerrit.wikimedia.org/r/349877 (owner: Gehel) [08:18:01] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:20:05] RECOVERY - puppet last run on tegmen is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [08:21:05] (PS1) Volans: Traffic: add automatic verification of the changes [switchdc] - https://gerrit.wikimedia.org/r/349879 (https://phabricator.wikimedia.org/T163373) [08:21:07] (PS1) Volans: DNS: add removal of confd stale files [switchdc] - https://gerrit.wikimedia.org/r/349880 (https://phabricator.wikimedia.org/T163376) [08:23:40] (PS1) Gehel: wdqs - fixed stupid copy paste mistake [puppet] - https://gerrit.wikimedia.org/r/349881 [08:24:48] (PS1) Marostegui: db-codfw.php: Depool db2043, db2061 [mediawiki-config] - https://gerrit.wikimedia.org/r/349882 (https://phabricator.wikimedia.org/T163339) [08:25:21] (CR) Gehel: [C: 2] wdqs - fixed stupid copy paste mistake [puppet] - https://gerrit.wikimedia.org/r/349881 (owner: Gehel) [08:27:41] hashar: awake yet? :D [08:29:44] addshore: yes and busy sprinting some CI cleanup :D [08:30:06] hashar: what is the current mwdebug and mwlog servers I should be using? ;) [08:30:09] *are the [08:30:25] Dereckson around? [08:30:35] Good morning, yes I'm [08:30:53] let me recheck the backups [08:31:48] let's start opening all logs related to ptwikibooks [08:32:25] last backups are from Feb 27 [08:32:32] I will run them once again [08:32:43] and check nothing was written there [08:33:14] Dereckson, if you have to do something that doesn't touch the database for sure, maybe that can be done now? [08:33:52] I am not sure what is the state of the deployment- if the whole thing has to be done or what is the process? [08:34:36] jynus: there is an Apache change [08:34:57] to suppress the redirect to their site and send it to our cluster [08:35:00] but that may make the database available? [08:35:17] which currently isn't, right? 
[08:35:20] right [08:35:29] ok, then we have to delete before that [08:35:56] let me recreate the backups, to be 100% sure [08:36:51] specially now that the hosts are different, I have to give them a second look [08:37:03] * Dereckson nods [08:38:59] things are a bit cold [08:39:54] data is the same, but user_groups seems to have received an additional schema change [08:42:24] (PS5) Dereckson: Apache configuration for pt.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/270479 (https://phabricator.wikimedia.org/T126832) [08:42:48] (CR) Dereckson: "PS5: rebased against recent chapters change (e.g. +wb.)" [puppet] - https://gerrit.wikimedia.org/r/270479 (https://phabricator.wikimedia.org/T126832) (owner: Dereckson) [08:45:18] Dereckson what are the current mwdebug and mwlog servers I should be using? ;) [08:45:21] tendril is very slow [08:45:23] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [08:46:52] addshore: mwdebug1002 and mwlog1001 still work [08:47:01] addshore: but deployment is from mira, scripts from wasat [08:47:21] ahh, okay :) [08:47:33] ok, backups are ok [08:47:58] ready to drop databases, you will recreate absolutely everything as if it was a new deployment, right? [08:48:08] marostegui, around? [08:48:11] yep [08:48:16] and keeping an eye on you guys :) [08:48:20] just in case help is needed [08:48:27] I may need you if replication goes wrong on s3 or others [08:48:31] Dereckson, addshore: the deployment is not mira (that was decommissioned), is currently naos.codfw.wmnet [08:48:41] yes, I am here, I am following the conversation :) [08:49:03] Dereckson, can we quickly review the process to avoid surprises? 
[08:49:12] (PS1) Filippo Giunchedi: swift: increase swift-dispersion coverage to 6% [puppet] - https://gerrit.wikimedia.org/r/349887 [08:49:14] jynus: ok [08:49:15] volans: ack [08:49:34] as in, what is going to be done (even if it is link X as usual) [08:49:52] I am mostly concerned about puppet or ops-owner patches [08:51:13] jynus: So first, there is the Apache change (DNS etc. are already ok). Then, I'll recreate the "ptwikimedia" database with the current content, then will merge the config and process as usual as if it would be a new wiki. Finally, I'll create the storage bucket, asserted previously not creating any conflict issue with previous installation. [08:51:30] ok [08:51:36] so I start the drops, then? [08:52:13] addshore: could I borrow 20-30 minutes of your window so we can end this? [08:52:19] Dereckson: yup! [08:52:22] jynus: yes [08:52:28] addshore: thanks [08:52:42] ok, I will log every action and wait some minutes after each [08:53:23] (CR) Addshore: wmgUseInterwikiSorting true for wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346523 (https://phabricator.wikimedia.org/T162253) (owner: Addshore) [08:53:25] (CR) Addshore: Deploy Cognate to production wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346524 (https://phabricator.wikimedia.org/T150182) (owner: Addshore) [08:55:08] !log dropping ptwikimedia from s3 T126832 [08:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:55:17] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [08:55:51] Operations: Fix UIDs for deployment server users - https://phabricator.wikimedia.org/T163667#3205144 (fgiunchedi) [08:55:56] (PS1) Urbanecm: Add Draft namespace to zh_classicalwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/349888 (https://phabricator.wikimedia.org/T163655) [08:56:10] Operations, ops-codfw: setup naos/WMF6406 as new codfw deployment server - 
https://phabricator.wikimedia.org/T162900#3205159 (fgiunchedi) Followup for trebuchet/mwdeploy fixed uid/gid: https://phabricator.wikimedia.org/T163667 [08:56:12] (PS3) Dereckson: Respawn ptwikimedia configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) [08:56:14] replication looks good so far, a few seconds of lag on the slower servers [08:56:32] but nothing broken at db side [08:56:35] :) [08:56:46] marostegui, can you check labs [08:56:52] yep [08:56:54] I do not trust this working [08:57:02] due to the filtering [08:57:27] db1069 is broken [08:57:34] what does it say? [08:57:35] database doesn't exist [08:57:38] I can skip it [08:57:43] ah [08:57:44] ok [08:57:47] yeah [08:58:01] it was deleted there earlier [08:58:03] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1008, Errmsg: Error Cant drop database ptwikimedia: database doesnt exist on query. Default database: ptwikimedia. [Query snipped] [08:58:05] apparently [08:58:09] yeah, just skip [08:58:12] (CR) jerkins-bot: [V: -1] Respawn ptwikimedia configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: Dereckson) [08:58:14] done [08:58:20] let me check sanitarium2 [08:58:23] I can do it [08:58:25] ok [08:58:35] I will run drop if exists later [08:58:45] failed too, will fix it [08:59:06] thanks manuel, you are real saviour there [08:59:11] (PS4) Dereckson: Respawn ptwikimedia configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) [08:59:19] done [08:59:24] we will have to recheck filtering once we are done [08:59:25] haha you are doing all the stuff! 
[08:59:40] (CR) Dereckson: "PS3: rebased, PS4: +wikiversion +multiversion +new dblists" [mediawiki-config] - https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: Dereckson) [08:59:59] labs hosts look good [08:59:59] any errors on mediawiki Dereckson or anyone else? I cannot see database errors [09:00:01] checking dbstores [09:00:05] addshore: Dear anthropoid, the time has come. Please deploy Wiktionary InterwikiSorting & Cognate deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T0900). [09:00:21] maybe 1002 broke [09:00:25] fixing 1002 [09:00:28] Dereckson: just give me a ping once I can start :) [09:00:29] it is broken yes [09:00:43] and the delayed ones will break most likely tomorrow [09:00:49] yep [09:00:56] but dbstore2002 worked? [09:01:00] jynus: on fatalmonitor, we've still errors popping, nothing seems related to what you did [09:01:12] yeah, errors are normal [09:01:17] I mean new errors [09:01:26] or related to this wiki [09:01:31] dbstore1002 fixed [09:01:34] dbstore2002 wasn't broken [09:01:44] addshore: ack [09:02:04] RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [09:02:19] ok [09:02:30] I will now go with the other shards [09:04:42] !log dropping ptwikimedia from x1 T126832 [09:04:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:50] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [09:05:05] (PS2) Addshore: Use group0 to reduce lines for WMDE related config settings [mediawiki-config] - https://gerrit.wikimedia.org/r/348307 [09:05:07] (PS2) Addshore: Add InterwikiSortOrders to noc docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/348305 [09:05:09] (PS2) Addshore: Configure InterwikiSorting orders for Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/348306 (https://phabricator.wikimedia.org/T162926) 
[09:05:11] (PS3) Addshore: wmgUseInterwikiSorting true for wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346523 (https://phabricator.wikimedia.org/T162253) [09:05:13] (PS2) Addshore: Add Cognate to extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/349876 (https://phabricator.wikimedia.org/T150182) [09:05:15] (PS2) Addshore: Deploy Cognate to production wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346524 (https://phabricator.wikimedia.org/T150182) [09:05:24] I've run it with if-exists [09:05:29] \o/ [09:05:32] dbstore1002 is good yes [09:05:35] it should work now on dbstores [09:05:38] (PS1) Urbanecm: Add NS aliases for zh_classicalwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/349889 (https://phabricator.wikimedia.org/T162547) [09:05:57] I am not 100% sure mixing main servers and x1 is a good idea [09:06:02] for cases like this [09:06:30] now to the most delicate ones, esX [09:07:06] es2 and es3 are empty, so there should be no issue there [09:08:16] as expected (their wiki was older than ES) [09:08:34] !log dropping ptwikimedia from es2 T126832 [09:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:18] es2 looking good [09:11:01] !log dropping ptwikimedia from es3 T126832 [09:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:10] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [09:11:55] Small erratum about Apache: actually with the new multiversion guard, for *.wikipedia.org it detects auto the db, but for *.wikimedia.org it will print a "Wikimedia is a global movement whose mission is to bring free educational content to the world." welcome page as long as config isn't merged. 
[09:12:18] it is ok, not a big deal [09:12:22] happy that it works like that [09:14:16] !log dropping ptwikimedia from es1012,es1016,es1018,es2011,es2012,es2013, T126832 [09:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:14:24] those contain the main data [09:15:21] !log Stop MYSQL on db1062 to backup its mysql - T163665 [09:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:15:28] T163665: Reclone db1062 from db1041 (s7 master) - https://phabricator.wikimedia.org/T163665 [09:17:03] I will have to restart tendril at some point [09:17:09] it is very slow [09:20:10] ok, things look good at the moment [09:20:22] I would do a general check that it has been deleted on all core dbs [09:21:01] ok [09:21:41] ok, no output except from db1062 [09:21:51] just brought it down :) [09:22:10] so 129 hosts + 1 do not have such a db [09:22:26] now, if there is some extension still doing things that we have not accounted for [09:22:26] root@db1062:/srv/sqldata# ls ptwikimedia [09:22:27] ls: cannot access ptwikimedia: No such file or directory [09:22:42] so db1062 is good too [09:22:43] we cannot say [09:23:00] I think this finishes for us, Dereckson [09:23:11] be vigilant please about errors [09:23:25] (some similar errors were not discovered until weeks later) [09:23:41] ok [09:23:51] but regarding data privacy I am ok with the current state [09:28:27] Dereckson: any eta? [09:28:47] (PS1) Muehlenhoff: Setup rsync server to sync home dirs from terbium to wasat [puppet] - https://gerrit.wikimedia.org/r/349893 [09:31:45] jynus: Could you merge the Apache change? https://gerrit.wikimedia.org/r/#/c/270479/ It then requires to follow https://wikitech.wikimedia.org/wiki/Application_servers#Deploying_config to gracefully restart application servers [09:33:42] for the apache change, I would like to have the help from someone with experience with the application servers. [09:33:55] ^elukey ? 
[09:34:12] checking :) [09:34:39] I suppose we merge on debug and check it works as intended? [09:34:43] and then we roll slowly? [09:35:22] I am not asking you to do it, I am asking for any tip you could give me [09:36:01] addshore: ok, you can proceed, all is in order, and I'll do the config part after [09:36:09] awesome! [09:36:35] (CR) Addshore: [C: 2] Use group0 to reduce lines for WMDE related config settings [mediawiki-config] - https://gerrit.wikimedia.org/r/348307 (owner: Addshore) [09:36:49] jynus: yep definitely, what I'd do is deploy to debug / eqiad first and then codfw [09:37:04] and for the reload? [09:37:15] what do you use and what policy? [09:37:34] or does puppet reload automatically? [09:37:41] the latter [09:37:46] so we need to disable puppet? [09:37:54] (Merged) jenkins-bot: Use group0 to reduce lines for WMDE related config settings [mediawiki-config] - https://gerrit.wikimedia.org/r/348307 (owner: Addshore) [09:38:03] (CR) jenkins-bot: Use group0 to reduce lines for WMDE related config settings [mediawiki-config] - https://gerrit.wikimedia.org/r/348307 (owner: Addshore) [09:38:45] jynus: yep this is usually my preference, but I saw people not doing it for small changes [09:38:55] in this case I'd definitely do it [09:39:14] (I am reading the task to get a bit of context) [09:40:10] there is some guidance on https://wikitech.wikimedia.org/wiki/Application_servers [09:41:24] (CR) Addshore: [C: 2] Add InterwikiSortOrders to noc docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/348305 (owner: Addshore) [09:42:37] (Merged) jenkins-bot: Add InterwikiSortOrders to noc docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/348305 (owner: Addshore) [09:42:49] (CR) jenkins-bot: Add InterwikiSortOrders to noc docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/348305 (owner: Addshore) [09:42:55] !log addshore@naos Synchronized wmf-config/InitialiseSettings.php: 
[[gerrit:348307|Use group0 to reduce lines for WMDE related config settings]] (duration: 01m 18s) [09:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:35] !log addshore@naos Synchronized docroot/noc/conf/InterwikiSortOrders.php.txt: NOOP [[gerrit:348305|Add InterwikiSortOrders to noc docroot]] (docs only) (duration: 01m 00s) [09:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:01] addshore: what? [09:45:19] aude: nothing, I was just wondering if you were doing your deployment or not :P [09:45:44] * aude confused [09:45:45] (CR) Addshore: [C: 2] Configure InterwikiSorting orders for Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/348306 (https://phabricator.wikimedia.org/T162926) (owner: Addshore) [09:45:48] about time zones [09:46:34] * addshore is also confused about timezones, I turned up 1 hour early for my slot [09:46:39] aude: you had a geospatial Wikidata deployment planned at 8h-9h UTC (10h-11h CEST) [09:47:26] (Merged) jenkins-bot: Configure InterwikiSorting orders for Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/348306 (https://phabricator.wikimedia.org/T162926) (owner: Addshore) [09:47:40] (CR) jenkins-bot: Configure InterwikiSorting orders for Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/348306 (https://phabricator.wikimedia.org/T162926) (owner: Addshore) [09:47:54] Dereckson: addshore can i deploy around 12:00 UTC or before if addshore is done earlier [09:48:09] or maybe squeeze before swat [09:48:39] aude: what's your expected duration? [09:48:51] i shouldn't use my whole slot if all goes to plan [09:49:06] 5 min [09:49:38] swat already has 8 things [09:49:43] to squeeze it before SWAT looks good in this case, but perhaps start the SWAT a little earlier as there are 8 patches, so around 12:30 CEST? 
[09:49:44] !log testing mediawiki changes on mwdebug1001 [09:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:11] !log addshore@naos Synchronized wmf-config/InterwikiSortOrders.php: [[gerrit:348306|Configure InterwikiSorting orders for Wiktionaries]] PT 1/2 (duration: 00m 53s) [09:50:15] ok, before swat would be perfect [09:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:23] Dereckson, I have deployed the change to mwdebug1001 [09:50:31] not sure if that is reachable in the current config [09:50:36] * aude to blame for not reserving space for this in swat [09:50:47] to enable geoshapes [09:50:49] jynus: let me fire a test request there [09:50:49] jynus: change looks fine to me [09:50:54] or if I have to do that [09:50:59] on the codfw debug ones [09:51:01] (I mean the change in gerrit, not mwdebug) [09:51:17] (CR) Addshore: [C: 2] wmgUseInterwikiSorting true for wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346523 (https://phabricator.wikimedia.org/T162253) (owner: Addshore) [09:51:19] (CR) Filippo Giunchedi: [C: 1] Setup rsync server to sync home dirs from terbium to wasat [puppet] - https://gerrit.wikimedia.org/r/349893 (owner: Muehlenhoff) [09:51:19] yep looks good [09:51:22] in any case, it can also be tested from inside the cluster [09:51:26] (CR) Filippo Giunchedi: [C: 2] swift: increase swift-dispersion coverage to 6% [puppet] - https://gerrit.wikimedia.org/r/349887 (owner: Filippo Giunchedi) [09:51:27] !log addshore@naos Synchronized wmf-config/InitialiseSettings.php: [[gerrit:348306|Configure InterwikiSorting orders for Wiktionaries]] PT 2/2 (duration: 00m 48s) [09:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:45] (I wonder why it redirects to pt.wikipedia by default, but looks good) [09:51:47] are you sure you are checking mwdebug1001 and not mw2017 ? 
[09:52:06] yes yes I picked mwdebug1001 in the list [09:52:18] and instead of redirecting to www.wikimedia.pt, it's well handled by our multiversion entrypoint [09:52:22] I am not sure if that would work [09:52:28] (Merged) jenkins-bot: wmgUseInterwikiSorting true for wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346523 (https://phabricator.wikimedia.org/T162253) (owner: Addshore) [09:52:40] (CR) jenkins-bot: wmgUseInterwikiSorting true for wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/346523 (https://phabricator.wikimedia.org/T162253) (owner: Addshore) [09:52:50] let me double check on codfw [09:54:45] !log addshore@naos Synchronized wmf-config/InitialiseSettings.php: [[gerrit:346523|wmgUseInterwikiSorting true for wiktionaries]] PT 1/2 (duration: 00m 47s) [09:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:58] jynus: well, what I see: serve pt.wikipedia.org (so "correctly" handled by our multiversion entry point) (new config) / elsewhere: redirect to http://wikimedia.pt/Wikimedia_Portugal (old config) [09:55:41] can you check it doesn't work on mw2017 ? [09:55:43] ok [09:56:07] !log addshore@naos Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:346523|wmgUseInterwikiSorting true for wiktionaries]] PT 2/2 (duration: 00m 46s) [09:56:07] I confirm it doesn't work on mw2017 (ie it still redirects to http://wikimedia.pt/Wikimedia_Portugal) [09:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:29] and now? 
[09:56:48] pt.wikipedia, so ok [09:56:59] aude: you were correct to request a window, new features aren't SWAT candidates [09:57:06] ok [09:57:15] I will now deploy slowly globally [09:57:53] 25% batch size is suggested on https://wikitech.wikimedia.org/wiki/Application_servers [09:57:55] (03CR) 10Addshore: [C: 032] Add Cognate to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349876 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [09:58:06] Dereckson: yeah [09:59:01] (03Merged) 10jenkins-bot: Add Cognate to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349876 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [09:59:09] (03CR) 10jenkins-bot: Add Cognate to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349876 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [10:00:47] !log disabling puppet on app servers for apache config deploy T126832 [10:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:55] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [10:01:26] !log addshore@naos Started scap: [[gerrit:349876|Add Cognate to extension-list]] T150182 [10:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:34] T150182: Deploy Cognate extension to production - https://phabricator.wikimedia.org/T150182 [10:02:58] (03CR) 10Jcrespo: [C: 032] Apache configuration for pt.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/270479 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [10:03:19] (03PS6) 10Jcrespo: Apache configuration for pt.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/270479 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [10:05:56] it is merged now, now deploying on real servers [10:07:54] (03PS1) 10Aude: Enable geo-shape data type on Wikidata [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/349900 (https://phabricator.wikimedia.org/T161543) [10:09:26] you are right, we are getting ptwikpedia [10:09:29] as a redirect [10:09:37] I assume that is on purpose [10:09:56] "Location: http://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal" [10:10:22] not sure what to test, everthing seems ok [10:10:43] Dereckson: i'll be back in ~2 hours and can deploy after you are done [10:10:52] I am going to reenable on eqiad [10:10:58] updated deployment page with link to the the ticket and patch [10:11:18] aude: okay, I'll ping you [10:12:06] jynus: yes that seems ok [10:12:37] also remember i need to do https://gerrit.wikimedia.org/r/#/c/348413/ [10:15:46] enablign puppet on codfw [10:16:53] !log addshore@naos Finished scap: [[gerrit:349876|Add Cognate to extension-list]] T150182 (duration: 15m 26s) [10:17:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:01] T150182: Deploy Cognate extension to production - https://phabricator.wikimedia.org/T150182 [10:17:36] (03CR) 10Addshore: [C: 032] Deploy Cognate to production wiktionaries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346524 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [10:18:35] (03Merged) 10jenkins-bot: Deploy Cognate to production wiktionaries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346524 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [10:18:43] (03CR) 10jenkins-bot: Deploy Cognate to production wiktionaries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346524 (https://phabricator.wikimedia.org/T150182) (owner: 10Addshore) [10:20:16] Dereckson, it is running [10:20:28] I see no errors, but it is not 100% fully done [10:20:42] so it still redirects to the external server [10:20:48] ping me if you see any issue [10:21:40] ok [10:21:55] aparently the redirect is cached on varnish [10:22:03] we'll purge it later [10:22:05] so that may be an issue [10:22:09] yeah 
[10:22:25] I am switching tasks, ping me if you see any problem, or anyone else [10:22:36] ack'ed [10:23:35] (03PS4) 10Dereckson: Initial configuration for dty.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [10:23:54] (03CR) 10Dereckson: "PS4: +wikiversions" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [10:24:19] !log addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php enwiktionary --site-group=wiktionary [10:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:01] !log 172 sites added to cognate_sites [10:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:53] !log addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php zawiktionary [10:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:02] !log 180 rows added to cognate_titles & cognate_pages [10:27:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:06] !log addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist [10:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:37] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Some swift disks wrongly mounted on 5 ms-be hosts - https://phabricator.wikimedia.org/T163673#3205351 (10fgiunchedi) [10:40:16] addshore: you're done with InterwikiSorting / Cognate? [10:40:24] nope [10:40:45] Got to wait for this main script to finish running, then I can sync the last 4 files, then got to run the maint script 1 last time [10:41:14] ok [10:41:37] its running on all wiktionaries and just got to en :) should go a little faster once over this hump! 
[10:46:00] (03PS2) 10Muehlenhoff: Setup rsync server to sync home dirs from terbium to multatuli [puppet] - 10https://gerrit.wikimedia.org/r/349893 [10:47:23] (03PS1) 10ArielGlenn: rotate empty cirrusdump logs too so they get cleared out in 90 days [puppet] - 10https://gerrit.wikimedia.org/r/349922 [10:49:50] (03CR) 10ArielGlenn: [C: 032] rotate empty cirrusdump logs too so they get cleared out in 90 days [puppet] - 10https://gerrit.wikimedia.org/r/349922 (owner: 10ArielGlenn) [10:51:31] (03PS1) 10Muehlenhoff: Make cirrus logrotate config jessie-compatible [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) [10:54:25] 06Operations, 10Traffic, 13Patch-For-Review: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661#3205422 (10ema) @BBlack suggested that the possible underlying issue could be lock contention between the expiry thread and the worker threads. Indeed this seems... [10:55:14] PROBLEM - Check Varnish expiry mailbox lag on cp2002 is CRITICAL: CRITICAL: expiry mailbox lag is 662913 [10:57:50] !log addshore@naos Synchronized wmf-config/CommonSettings.php: [[gerrit:346524|Deploy Cognate to production wiktionaries]] T150182 PT 1/4 (duration: 01m 18s) [10:57:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:59] T150182: Deploy Cognate extension to production - https://phabricator.wikimedia.org/T150182 [10:58:47] The "Notice: Undefined variable: wmgUseCognate in /srv/mediawiki/wmf-config/CommonSettings.php on line 3026" spam in the logs is me [10:59:33] PROBLEM - HP RAID on ms-be1039 is CRITICAL: CHECK_NRPE: Socket timeout after 50 seconds. 
[11:00:13] 06Operations: Fix UIDs for deployment server users - https://phabricator.wikimedia.org/T163667#3205441 (10fgiunchedi) [11:01:22] (03CR) 10Filippo Giunchedi: [C: 031] Setup rsync server to sync home dirs from terbium to multatuli [puppet] - 10https://gerrit.wikimedia.org/r/349893 (owner: 10Muehlenhoff) [11:01:25] !log addshore@naos Synchronized wmf-config/CommonSettings-labs.php: [[gerrit:346524|Deploy Cognate to production wiktionaries]] T150182 PT 2/4 (duration: 01m 01s) [11:01:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:34] !log addshore@naos Synchronized wmf-config/InitialiseSettings.php: [[gerrit:346524|Deploy Cognate to production wiktionaries]] T150182 PT 3/4 (duration: 00m 57s) [11:02:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:52] (03PS1) 10Filippo Giunchedi: graphite: don't fire alarms on carbon-related spikes [puppet] - 10https://gerrit.wikimedia.org/r/349925 [11:07:53] (03PS2) 10Filippo Giunchedi: graphite: don't fire alarms on carbon-related spikes [puppet] - 10https://gerrit.wikimedia.org/r/349925 [11:08:15] Dereckson: still around? [11:09:04] yes [11:09:24] (03CR) 10Filippo Giunchedi: [C: 032] graphite: don't fire alarms on carbon-related spikes [puppet] - 10https://gerrit.wikimedia.org/r/349925 (owner: 10Filippo Giunchedi) [11:10:03] Can you double check this for me before I move forward, I cant spot what I am missing [11:10:31] 3 or the 4 files in https://gerrit.wikimedia.org/r/#/c/346524 have been deployed [11:10:36] Just IS-labs to go [11:10:50] Getting spams of Undefined variable: wmgUseCognate in /srv/mediawiki/wmf-config/CommonSettings.php on line 3026 [11:11:10] But that should always be set.... 
as far as I can see [11:11:28] touch and resync IS [11:11:59] {{doing}} [11:11:59] Sometimes, the missing variable is still in cache after you sync [11:12:09] ack [11:12:47] !log addshore@naos Synchronized wmf-config/InitialiseSettings.php: [[gerrit:346524|Deploy Cognate to production wiktionaries]] T150182 PT 3/4 (touched) (duration: 00m 52s) [11:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:56] T150182: Deploy Cognate extension to production - https://phabricator.wikimedia.org/T150182 [11:13:03] Yup, thats done it, thanks Dereckson! [11:13:32] 06Operations, 10ops-codfw: Swap NIC on mira - https://phabricator.wikimedia.org/T162859#3177514 (10fgiunchedi) naos is online and used, I think we should fix mira's NIC and deprovision / allocate to spare now (or decom altogether) [11:14:11] !log addshore@naos Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:346524|Deploy Cognate to production wiktionaries]] T150182 PT 4/4 (duration: 00m 47s) [11:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:12] !log addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000 [11:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:45] (03PS3) 10Muehlenhoff: Setup rsync server to sync home dirs from terbium to wasat [puppet] - 10https://gerrit.wikimedia.org/r/349893 [11:17:35] Dereckson: right, I have deployed everything I need to, just the run of the maint script to go, but that might take some time [11:17:47] ok [11:19:23] RECOVERY - HP RAID on ms-be1039 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [11:19:28] 06Operations, 10ops-eqiad: ms-be1016 controller cache failure - https://phabricator.wikimedia.org/T150206#3205489 (10fgiunchedi) @Cmjohnson I'm ok to do this today, 
LMK when it is a good time for you [11:19:48] I'm going to deploy cherry-picks of l10n changes in core, Gadgets and Scribunto, not touching mediawiki-config [11:22:06] 06Operations, 10ops-codfw, 15User-fgiunchedi: upgrade memory in prometheus200[34] - https://phabricator.wikimedia.org/T163386#3205495 (10fgiunchedi) @Papaul LMK when you can do this, we can depool one machine at a time for maintenance [11:22:17] 06Operations, 10ops-eqiad, 15User-fgiunchedi: upgrade memory in prometheus100[34] - https://phabricator.wikimedia.org/T163385#3205496 (10fgiunchedi) @Cmjohnson LMK when you can do this, we can depool one machine at a time for maintenance [11:22:47] Dereckson: okay! would also be fine to touch mediawiki-config now too :) [11:23:20] (03CR) 10Muehlenhoff: [C: 032] Setup rsync server to sync home dirs from terbium to wasat [puppet] - 10https://gerrit.wikimedia.org/r/349893 (owner: 10Muehlenhoff) [11:26:05] !log dereckson@naos Synchronized php-1.29.0-wmf.20/extensions/Scribunto/Scribunto.namespaces.php: Localize namespaces in Doteli (T162874) (duration: 00m 46s) [11:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:13] T162874: Namespace localisation in Doteli (dty) - https://phabricator.wikimedia.org/T162874 [11:27:23] !log dereckson@naos Synchronized php-1.29.0-wmf.20/extensions/Gadgets/Gadgets.namespaces.php: Localize namespaces in Doteli (T162873) (duration: 00m 44s) [11:27:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:31] T162873: Namespace localisation in Doteli (dty) - https://phabricator.wikimedia.org/T162873 [11:28:20] !log dereckson@naos Synchronized php-1.29.0-wmf.20/languages/messages/MessagesDty.php: Localize namespaces in Doteli (T162872) (duration: 00m 50s) [11:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:28] T162872: Namespaces localisation in Doteli (dty) - https://phabricator.wikimedia.org/T162872 [11:32:44] (03CR) 10Aleksey 
Bekh-Ivanov (WMDE): [C: 031] Don't let Wikibase instances read/write terms_full_entity_id [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348413 (https://phabricator.wikimedia.org/T159851) (owner: 10Ladsgroup) [11:33:16] (03CR) 10Dereckson: [C: 032] Respawn ptwikimedia configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [11:33:35] okay, let's recreate the fresh and new pt.wikimedia wiki [11:36:32] (03PS5) 10Dereckson: Respawn ptwikimedia configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) [11:37:24] (03CR) 10Dereckson: [C: 032] Respawn ptwikimedia configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [11:38:27] (03Merged) 10jenkins-bot: Respawn ptwikimedia configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [11:38:36] (03CR) 10jenkins-bot: Respawn ptwikimedia configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314792 (https://phabricator.wikimedia.org/T126832) (owner: 10Dereckson) [11:41:02] !log Recreate database for ptwikimedia (T126832) [11:41:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:41:09] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [11:42:38] !log dereckson@naos Synchronized dblists/: Respawn pt.wikimedia configuration (duration: 00m 44s) [11:42:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:55] !log dereckson@naos rebuilt wikiversions.php and synchronized wikiversions files: +pt.wikimedia (T126832) [11:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:21] Fun to see CentralAuth redirect to pt.wikip [11:44:44] but looks good on mwdebug1002 login issue 
excepted [11:45:41] should test on codfw by the way [11:45:53] something complains db is read only in resource loader [11:46:24] live on mw2017.codfw [11:47:01] 11h46min de 24 de abril de 2017 A conta de utilizador Dereckson (Discussão | contribs) foi criada automaticamente [11:47:04] works better [11:47:21] okay syncing [11:48:32] !log dereckson@naos Synchronized wmf-config/InitialiseSettings.php: Initial configuration for pt.wikimedia (T126832) [11:48:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:40] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [11:50:10] !log mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php ptwikimedia --backend=local-multiwrite (T126832) [11:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:52] jynus: ptwikimedia new database has been created, all looks good according the wiki and logs [11:53:36] now I'm syncing multiversion, then purge URL and it will be live [11:55:41] !log dereckson@naos Synchronized multiversion/MWMultiVersion.php: Entry point for pt.wikimedia.org (T126832) (duration: 00m 44s) [11:55:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:50] T126832: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832 [11:59:38] !log Purged https://pt.wikimedia.org/ URL (T126832) [11:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:04] Dereckson: Respected human, time to deploy Wiki creation (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1200). Please do the needful. 
[12:01:37] (03PS5) 10Dereckson: Initial configuration for dty.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [12:01:45] DatGuy: ping [12:01:48] (03CR) 10Dereckson: [C: 032] Initial configuration for dty.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [12:03:00] (03PS1) 10Muehlenhoff: Change synchronisation host for terbium reimage [puppet] - 10https://gerrit.wikimedia.org/r/349941 [12:04:37] (03Merged) 10jenkins-bot: Initial configuration for dty.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [12:05:40] (03CR) 10jenkins-bot: Initial configuration for dty.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [12:07:04] !log dereckson@naos Synchronized static/images/project-logos/: Logo for dty.wikipedia (T161529) (duration: 01m 13s) [12:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:12] T161529: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529 [12:07:57] (03CR) 10Muehlenhoff: [C: 032] Change synchronisation host for terbium reimage [puppet] - 10https://gerrit.wikimedia.org/r/349941 (owner: 10Muehlenhoff) [12:08:04] !log dereckson@naos Synchronized dblists: +dtywiki (duration: 00m 56s) [12:08:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:39] !log Creata dtywiki database (T161529) [12:08:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:13] !log dereckson@naos rebuilt wikiversions.php and synchronized wikiversions files: +dtywiki [12:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:39] १७:५४, २४ अप्रिल २०१७ प्रयोगकर्ता खाता Dereckson (कुरणि • योगदानअन) स्वतः खोलियो 
[12:10:44] looks good too [12:12:06] But a full scap is perhaps required for new namespaces l10n [12:13:29] aude: you need a full scpa? [12:13:45] !log dereckson@naos Synchronized langlist: +dty (T161529) (duration: 00m 50s) [12:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:54] T161529: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529 [12:14:27] !log dereckson@naos Synchronized wmf-config/InitialiseSettings.php: Initial configuration for dty.wikipedia (T161529) (duration: 00m 49s) [12:14:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:55] (03CR) 10Hashar: [C: 031] Enable NewUserMessage on zh_classicalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043) (owner: 10Urbanecm) [12:16:23] (03CR) 10Hashar: [C: 031] Remove all feeds added in T127176 from RSS whitelist for mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348782 (https://phabricator.wikimedia.org/T163217) (owner: 10Urbanecm) [12:16:56] (03CR) 10Hashar: [C: 031] Change the timezone of West Bengal Wikimedians user group wiki to Asia/Kolkata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [12:17:15] (03CR) 10Hashar: [C: 031] Raise requirements for getting autoconfirmed status to 4 days, 10 edits at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [12:17:34] (03CR) 10Hashar: [C: 031] Make sysops able to grant/remove confirmed user group at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348727 (https://phabricator.wikimedia.org/T163206) (owner: 10Urbanecm) [12:18:23] (03PS2) 10Dereckson: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 
10Urbanecm) [12:19:08] (03CR) 10Dereckson: "PS2: shorter commit message to avoid 72 chars truncate on GitHub" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [12:20:17] (03PS3) 10Dereckson: Set timezone to Asia/Kolkata on wb.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [12:21:09] Im here whenever swat starts when your ready for me just ping me [12:22:18] (03CR) 10Hashar: "Are you sure all the dependent code has made it to 1.29.0-wmf.20 ?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 (https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [12:22:33] (03PS1) 10Muehlenhoff: Fix storage path for terbium sync [puppet] - 10https://gerrit.wikimedia.org/r/349942 [12:23:06] Zppix: don't hold your breath, it could take a little while, we've still to deploy follow-up patches for pt.wikimedia/dty.wikipedia new wikis, then aude has to deploy something for Wikidata [12:23:40] hashar: I'll check before SWAT [12:23:58] Dereckson: take your time im just letting swatters know :) [12:24:26] tto: seems most have been merged before wmf.20 got cut beside the one Echo patch I have found [12:24:50] tto: you might want to look at each of the subtasks and their Gerrit patch then check whether they are included in [12:25:03] hashar: cherrypick? [12:25:04] tto: then it is probably not going to cause any major havoc on the infrastructure :] [12:27:06] (03CR) 10Urbanecm: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 (https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [12:27:13] PROBLEM - Varnish HTTP text-backend - port 3128 on cp1008 is CRITICAL: connect to address 208.80.154.42 and port 3128: Connection refused [12:27:43] PROBLEM - Check systemd state on cp1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[12:27:53] (03PS1) 10Dereckson: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349943 [12:28:14] hashar: I can see a couple of the patches didn't get into wmf.20, but they are minor patches in the scheme of things. Not blockers to this SWAT patch IMHO [12:28:19] !log mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php dtywiki --backend=local-multiwrite (T162874) [12:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:27] T162874: Namespace localisation in Doteli (dty) - https://phabricator.wikimedia.org/T162874 [12:28:34] (03CR) 10Dereckson: [C: 032] Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349943 (owner: 10Dereckson) [12:29:27] So long as they will be deployed to infrastructure in the near future, we will survive for a couple of days without them [12:29:43] RECOVERY - Check systemd state on cp1008 is OK: OK - running: The system is fully operational [12:30:02] (03CR) 10Muehlenhoff: [C: 032] Fix storage path for terbium sync [puppet] - 10https://gerrit.wikimedia.org/r/349942 (owner: 10Muehlenhoff) [12:30:05] aude: Dear anthropoid, the time has come. Please deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1230). [12:30:13] RECOVERY - Varnish HTTP text-backend - port 3128 on cp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 176 bytes in 0.076 second response time [12:30:58] Reedy: if you are around, I could use a review for some EducationProgram $wgAddGroup https://gerrit.wikimedia.org/r/#/c/349427/11 :] [12:32:04] (03Merged) 10jenkins-bot: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349943 (owner: 10Dereckson) [12:32:16] (03CR) 10jenkins-bot: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349943 (owner: 10Dereckson) [12:33:30] * aude back [12:33:37] aude: do you need a full scap? 
[12:33:40] no [12:33:42] just config [12:33:54] ok [12:34:09] hashar: something in the SWAT needs a full scap? [12:35:14] Dereckson: swat looks like all config except https://gerrit.wikimedia.org/r/#/c/349863/ [12:35:30] for that, not sure [12:36:16] hashar: i may be a couple mins late for swat i have to switch devices real quick [12:36:39] aude: oh well, I need one for +dty in Gadgets/Scribunto, so I'd vote yes it needs it [12:37:17] yeah, might as well include that [12:37:31] so I suggest as order: interwiki map change / then you / then SWAT / then full scpa [12:37:52] ok [12:37:54] was anyone else bombarded with spam from multiple nicks with links to 'thoughtful blog post by freenode oper "kloeri"' and various attempts at script injection? [12:38:01] ori: yeah [12:38:06] ugh [12:38:52] !log dereckson@naos Synchronized wmf-config/interwiki.php: +dty +wmpt and other fixes (duration: 00m 48s) [12:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:17] ori: script injection the other no [12:41:16] I got messages like ' nM01Ln>G}Y^A<#yN' (with an src= of some presumably malicious js file) [12:41:29] anyways, if it's widespread freenode opers must know [12:41:29] !log pt.wikimedia.org and dty.wikipedia.org wikis creation done [12:41:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:51] aude: you've the floor (naos.codfw.wmnet as deployment server) [12:41:54] ori: it appears to be handled either that or its slowed down ive not recieved any of that since last night [12:42:07] ok [12:42:31] (03CR) 10Aude: [C: 032] Enable geo-shape data type on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349900 (https://phabricator.wikimedia.org/T161543) (owner: 10Aude) [12:43:13] PROBLEM - Check Varnish expiry mailbox lag on cp2017 is CRITICAL: CRITICAL: expiry mailbox lag is 597686 [12:43:58] (03Merged) 10jenkins-bot: Enable geo-shape data type on Wikidata [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/349900 (https://phabricator.wikimedia.org/T161543) (owner: 10Aude) [12:44:51] jouncebot: next [12:44:51] In 0 hour(s) and 15 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1300) [12:45:58] (03PS1) 10Muehlenhoff: Configure terbium for installation with jessie [puppet] - 10https://gerrit.wikimedia.org/r/349945 [12:45:59] (03CR) 10jenkins-bot: Enable geo-shape data type on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349900 (https://phabricator.wikimedia.org/T161543) (owner: 10Aude) [12:46:38] !log aude@naos Synchronized wmf-config/Wikibase-production.php: (no justification provided) (duration: 00m 47s) [12:46:42] checking [12:46:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:58] hashar: at the end of the SWAT, could you perform a full scap for some namespaces localisations change (the one you CR+2 in SWAT, and three for dty this morning)? [12:47:14] PROBLEM - Varnish HTTP text-backend - port 3128 on cp1008 is CRITICAL: connect to address 208.80.154.42 and port 3128: Connection refused [12:47:43] PROBLEM - Check systemd state on cp1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:48:15] this is the one thing i need to login as staff on wikidata... [12:48:42] looks good [12:48:45] have one more thing [12:49:30] hashar: when it comes to testing my swat patch I will probably need someone with sysop perms to confirm they can add Edu program rights to users (or if i could be granted sysop on test.wikipedia.org that would work too) [12:49:37] (03PS2) 10Aude: Don't let Wikibase instances read/write terms_full_entity_id [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348413 (https://phabricator.wikimedia.org/T159851) (owner: 10Ladsgroup) [12:50:02] Dereckson: guess that is doable yes. 
You will have to remember me about it [12:50:03] (03CR) 10Aude: [C: 032] Don't let Wikibase instances read/write terms_full_entity_id [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348413 (https://phabricator.wikimedia.org/T159851) (owner: 10Ladsgroup) [12:50:03] :] [12:50:15] hashar: I'd like to go to eat actually [12:50:24] Dereckson: then head out for lunch! [12:50:26] ;] [12:50:37] (was creating a wiki at lunch time) [12:50:38] Thanks [12:50:51] I took note about running sca [12:50:51] p [12:51:02] almost done [12:51:04] (03Merged) 10jenkins-bot: Don't let Wikibase instances read/write terms_full_entity_id [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348413 (https://phabricator.wikimedia.org/T159851) (owner: 10Ladsgroup) [12:51:06] pffiouuu the Parsoid site matrix hasn't been updated for months. [12:51:13] (03CR) 10jenkins-bot: Don't let Wikibase instances read/write terms_full_entity_id [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348413 (https://phabricator.wikimedia.org/T159851) (owner: 10Ladsgroup) [12:51:40] Who is SWAT'ng next? for EU Midday? [12:52:07] Is there any chance to accomodate 1 patch? https://gerrit.wikimedia.org/r/#/c/349869/ [12:52:52] !log aude@naos Synchronized wmf-config/Wikibase-production.php: Disable use of new column in wb_terms table for now (duration: 00m 48s) [12:52:52] (I've asked Releng team, but it seems timezone matters) [12:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:02] * aude checks [12:53:14] kart_: maybe you can deploy it with jynus? [12:53:30] Sure. Let me check. [12:53:34] jynus: around? 
[12:53:42] 06Operations, 10Traffic: Test.wikipedia,org is reporting bad gateways outside of the main page - https://phabricator.wikimedia.org/T163684#3205783 (10Zppix) [12:53:51] looks good [12:53:54] * aude done [12:54:07] seems CX was the cause of some database overload, so I would rather have a DBA around to confirm that the databases are all fine when ContentTranslation is re enabled [12:54:26] (03PS2) 10Dereckson: Re-enable ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349869 (https://phabricator.wikimedia.org/T163344) (owner: 10KartikMistry) [12:54:39] Sure. There is patch in master, but that won't affect much at moment though. [12:54:48] Dereckson: just wait till DBA is OK :) [12:54:51] aude: congrats :] [12:55:19] kart_: I was only noting a reference to the commit it reverts [12:55:20] * aude eats breakfast but am around for a while [12:55:26] 06Operations, 10Traffic: Test.wikipedia,org is reporting bad gateways outside of the main page - https://phabricator.wikimedia.org/T163684#3205783 (10TTO) Works for me... https://test.wikipedia.org/wiki/Hello_there_apples_and_bananas! (silly URL to bust cache) is fine. [12:55:40] kart_: so, on Phabricator, the two commits will be cross-ref [12:55:43] Zppix: your patch ( https://gerrit.wikimedia.org/r/#/c/349427/ ) , I would prefer Reedy to give it a +1 [12:55:52] Dereckson: thanks. That's better. 
[12:56:00] Zppix: that being said, I guess it can be deployed at anytime (eg outside of a swat window) [12:56:20] (03PS1) 10Elukey: Piwik fake passwords [labs/private] - 10https://gerrit.wikimedia.org/r/349950 (https://phabricator.wikimedia.org/T159136) [12:56:28] (03PS3) 10Hashar: Enable NewUserMessage on zh_classicalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043) (owner: 10Urbanecm) [12:56:30] (03PS4) 10Hashar: Remove all feeds added in T127176 from RSS whitelist for mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348782 (https://phabricator.wikimedia.org/T163217) (owner: 10Urbanecm) [12:56:32] (03PS4) 10Hashar: Set timezone to Asia/Kolkata on wb.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [12:56:34] (03PS3) 10Hashar: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [12:56:36] (03PS2) 10Hashar: Make sysops able to grant/remove confirmed user group at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348727 (https://phabricator.wikimedia.org/T163206) (owner: 10Urbanecm) [12:56:44] hashar: I can't promise my availability outside of EU swat [12:56:59] (03CR) 10jerkins-bot: [V: 04-1] Set timezone to Asia/Kolkata on wb.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [12:57:00] Zppix: then we might just deploy it for you once reedy has give it a shot :] [12:57:05] (03PS2) 10Elukey: Piwik fake passwords [labs/private] - 10https://gerrit.wikimedia.org/r/349950 (https://phabricator.wikimedia.org/T159136) [12:57:39] (03CR) 10Hashar: [C: 031] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 
10Urbanecm) [12:57:41] (03CR) 10Elukey: [V: 032 C: 032] Piwik fake passwords [labs/private] - 10https://gerrit.wikimedia.org/r/349950 (https://phabricator.wikimedia.org/T159136) (owner: 10Elukey) [12:57:43] 06Operations, 10Traffic: Test.wikipedia,org is reporting bad gateways outside of the main page - https://phabricator.wikimedia.org/T163684#3205805 (10Zppix) Correction I get a error 400 bad request... [13:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1300). Please do the needful. [13:00:04] Urbanecm, tto, Zppix, and Dereckson: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:00:09] Zppix: I can't repro too.. how are you accessing test.w.o? Browser? [13:00:11] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043) (owner: 10Urbanecm) [13:00:15] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348782 (https://phabricator.wikimedia.org/T163217) (owner: 10Urbanecm) [13:00:20] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [13:00:22] elukey: chrome [13:00:24] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [13:00:29] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348727 (https://phabricator.wikimedia.org/T163206) (owner: 10Urbanecm) [13:00:52] Am here for SWAT [13:01:17] (03Merged) 10jenkins-bot: Enable NewUserMessage on zh_classicalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348309 
(https://phabricator.wikimedia.org/T163043) (owner: 10Urbanecm) [13:01:29] (03CR) 10jenkins-bot: Enable NewUserMessage on zh_classicalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043) (owner: 10Urbanecm) [13:02:12] (03Merged) 10jenkins-bot: Remove all feeds added in T127176 from RSS whitelist for mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348782 (https://phabricator.wikimedia.org/T163217) (owner: 10Urbanecm) [13:02:14] (03Merged) 10jenkins-bot: Set timezone to Asia/Kolkata on wb.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [13:02:34] (03Merged) 10jenkins-bot: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [13:02:58] I am deploying Urbanecm patches [13:03:24] (03Merged) 10jenkins-bot: Make sysops able to grant/remove confirmed user group at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348727 (https://phabricator.wikimedia.org/T163206) (owner: 10Urbanecm) [13:03:27] (03CR) 10jenkins-bot: Remove all feeds added in T127176 from RSS whitelist for mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348782 (https://phabricator.wikimedia.org/T163217) (owner: 10Urbanecm) [13:03:28] (03CR) 10jenkins-bot: Set timezone to Asia/Kolkata on wb.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348956 (https://phabricator.wikimedia.org/T163322) (owner: 10Urbanecm) [13:03:31] (03CR) 10jenkins-bot: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348725 (https://phabricator.wikimedia.org/T163207) (owner: 10Urbanecm) [13:03:33] (03CR) 10jenkins-bot: Make sysops able to grant/remove confirmed user group at cswiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/348727 (https://phabricator.wikimedia.org/T163206) (owner: 10Urbanecm) [13:03:35] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on zh_classicalwiki - T163043 (duration: 00m 46s) [13:03:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:43] T163043: Request for NewUserMessage extension on zh-classical.wikipedia - https://phabricator.wikimedia.org/T163043 [13:05:04] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Remove all feeds added in T127176 from RSS whitelist for mw.org - T163217 (duration: 00m 45s) [13:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:13] T127176: Include ATOM feed on WMDE Engineering page on mediawiki.org - https://phabricator.wikimedia.org/T127176 [13:05:13] T163217: Remove old ATOM feed which have been used by WMDE Engineering page on mediawiki.org - https://phabricator.wikimedia.org/T163217 [13:05:18] 06Operations, 10ops-eqiad, 10netops, 13Patch-For-Review: Rack and setup new eqiad row D switch stack (EX4300/QFX5100) - https://phabricator.wikimedia.org/T148506#3205842 (10ayounsi) From the feedback I collected here is what I believe the maintenance will look like. please edit this comment or let me know... [13:05:45] tto: have you confirmed patches have landed properly ? [13:05:59] tto: I mean whatever is needed for user group expiry to work? :] [13:06:22] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Set timezone to Asia/Kolkata on wb.wikimedia - T163322 (duration: 00m 44s) [13:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:30] T163322: Change the timezone of West Bengal Wikimedians user group wiki - https://phabricator.wikimedia.org/T163322 [13:06:30] hashar: Yes, a couple minor patches are not landed yet, but we will survive for a couple of days without them. 
They're not critical at all [13:06:38] Just message changes and the like [13:07:02] I thought they would be landed by now, but seems there was no new wmf branch last week [13:07:18] tto: no deployment and stuff was frozen last week [13:07:21] But like I said, we will live without them for a couple of days [13:07:25] tto: yeah we had a deployment freeze. I guess we want to cherry pick them so ? [13:07:28] o [13:07:29] k [13:07:43] If you want to cherry pick I'd not oppose that [13:07:48] But it's not necessary [13:08:15] (03PS3) 10Hashar: Enable user group expiry in production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 (https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [13:08:25] I'm here for SWAT hashar and others. [13:08:38] (03PS1) 10Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349953 (https://phabricator.wikimedia.org/T163548) [13:08:45] (03CR) 10Hashar: [C: 032] "There is a few patches that have not been ported to wmf.20 yet, but TTO clarified they are not strictly needed at this point." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 (https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [13:09:02] hashar: can I deploy: https://gerrit.wikimedia.org/r/#/c/349882/ ? 
it is needed for a DC operation in a bit [13:09:07] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki - T163207 (duration: 01m 09s) [13:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:15] T163207: Raise requirements for getting autoconfirmed status to 4 days, 10 edits at cswiki - https://phabricator.wikimedia.org/T163207 [13:09:26] hashar: actually no, it is in a hour, so we have time [13:09:43] marostegui: :] [13:09:54] messed up a bit the Dallas timezone :) [13:09:57] so no rush [13:10:12] tto: I am going to push your change on mwdebug1001 so it can be tested there [13:10:31] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Make sysops able to grant/remove confirmed user group at cswiki - T163206 (duration: 00m 55s) [13:10:33] hashar: Thanks, will test [13:10:38] when ready [13:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:40] T163206: Make sysops able to grant/remove confirmed user group at cswiki - https://phabricator.wikimedia.org/T163206 [13:10:49] (03PS1) 10Elukey: Correct hiera path for piwik role [labs/private] - 10https://gerrit.wikimedia.org/r/349954 (https://phabricator.wikimedia.org/T159136) [13:11:02] (03Merged) 10jenkins-bot: Enable user group expiry in production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 (https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [13:11:03] !log Deploy alter table on wikidatawiki.wb_terms on db1063 - T162539 https://phabricator.wikimedia.org/T163548 [13:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:11] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539 [13:11:14] (03CR) 10jenkins-bot: Enable user group expiry in production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347545 
(https://phabricator.wikimedia.org/T159416) (owner: 10TTO) [13:11:42] (03CR) 10Elukey: [V: 032 C: 032] Correct hiera path for piwik role [labs/private] - 10https://gerrit.wikimedia.org/r/349954 (https://phabricator.wikimedia.org/T159136) (owner: 10Elukey) [13:11:56] 06Operations, 10ORES, 10Revision-Scoring-As-A-Service-Backlog: Re-enable ORES data in action API - https://phabricator.wikimedia.org/T163687#3205901 (10Tgr) [13:11:56] tto: it is on mwdebug1001 now :] [13:12:02] hashar: Will test [13:12:04] :) [13:12:04] !log Deploy alter table on wikidatawiki.wb_terms on db1082 - T162539 - T163548 [13:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:13] T163548: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548 [13:12:47] Zppix: weird I can't repro, same thing with curl too ? [13:12:47] hashar: I get "Database locked" error on testwiki [13:13:02] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Some swift disks wrongly mounted on 5 ms-be hosts - https://phabricator.wikimedia.org/T163673#3205915 (10fgiunchedi) I've tried rebooting ms-be1036 though that didn't change anything, I think the issue is a combination of these factors: 1. Hardware is first inst... [13:13:07] elukey: Its working now... I still think some investigation is due [13:13:35] same error on test2 [13:13:53] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). [13:13:54] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). [13:14:15] tto: try again ? [13:14:32] tto: what have you tried? 
[13:14:35] !log hashar@naos Synchronized php-1.29.0-wmf.20/extensions/ProofreadPage/ProofreadPage.namespaces.php: Fix language code for Norwegian (duration: 00m 54s) [13:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:53] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [13:14:53] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [13:14:55] Tried to submit Special:UserRights and save an edit on test2wiki from mwdebug1001.eqiad [13:15:00] (03Abandoned) 10Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349953 (https://phabricator.wikimedia.org/T163548) (owner: 10Marostegui) [13:15:06] Does "eqiad" have something to do with it here? [13:15:49] i can edit on test [13:15:49] hashar: ^ [13:15:57] Zppix: sure, but usually 400 bad req indicates some malformed client request, this is why I was asking [13:16:12] (03PS8) 10Elukey: Refactor role::piwik in multiple profiles [puppet] - 10https://gerrit.wikimedia.org/r/348938 (https://phabricator.wikimedia.org/T159136) [13:16:21] !log Remove replication codfw - eqiad on s3 (db2018 codfw master will not be a slave of eqiad master) - https://phabricator.wikimedia.org/T130067 https://phabricator.wikimedia.org/T147166 T162133 [13:16:23] hashar: Can you deploy it on mw2017.codfw or mw2099.codfw? [13:16:23] elukey: all i did was navigate to anypage by clicking on the link from the main page... nothing special [13:16:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:32] T162133: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133 [13:17:33] tto: why those specially ? [13:17:41] oh man we are on codfw! 
[13:17:56] hmm yes, it seems so :) [13:17:56] * Zppix facepalms [13:18:02] 06Operations, 10Traffic, 10netops: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674#3205947 (10ayounsi) [13:18:11] tto: done for both. [13:18:29] Ack, will test from those [13:19:19] (03CR) 10Hashar: "I have spotted a "metawiki: Error: 1146 Table 'dtywiki.linter' doesn't exist (10.192.32.110)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529) (owner: 10DatGuy) [13:21:03] hashar: was the patch for T163206 deployed? [13:21:04] T163206: Make sysops able to grant/remove confirmed user group at cswiki - https://phabricator.wikimedia.org/T163206 [13:21:14] Urbanecm: should have? [13:21:43] RECOVERY - Check systemd state on cp1008 is OK: OK - running: The system is fully operational [13:21:44] hashar: stashbot posted a comment to the task which indicate it was deployed. [13:21:46] Urbanecm: or maybe I screwed it up :( [13:22:04] But I can't grant confirmed status... [13:22:12] Urbanecm: it is the last patch of your serie, os maybe I forgot to rebase before deploying it [13:22:13] RECOVERY - Varnish HTTP text-backend - port 3128 on cp1008 is OK: HTTP OK: HTTP/1.1 200 OK - 177 bytes in 0.078 second response time [13:22:22] hashar: maybe. 
[13:22:23] (03PS12) 10Hashar: Fix EducationProgram user rights so that they can be assigned/removed by sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349427 (https://phabricator.wikimedia.org/T163167) (owner: 10Zppix) [13:22:52] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349427 (https://phabricator.wikimedia.org/T163167) (owner: 10Zppix) [13:24:00] (03Merged) 10jenkins-bot: Fix EducationProgram user rights so that they can be assigned/removed by sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349427 (https://phabricator.wikimedia.org/T163167) (owner: 10Zppix) [13:24:33] Urbanecm: InitialiseSettings.php will be resync again whenever tto has finished testing [13:24:44] Almost done.. [13:24:51] hashar: ack [13:24:59] hashar: Thumbs up! All good. Tests successful [13:25:04] \o/ [13:25:15] https://test.wikipedia.org/wiki/Special:Log/rights [13:25:46] (03CR) 10jenkins-bot: Fix EducationProgram user rights so that they can be assigned/removed by sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349427 (https://phabricator.wikimedia.org/T163167) (owner: 10Zppix) [13:26:02] can i be granted temp sysop for gerrit:349427 on testwiki when you push to test server hashar? [13:26:34] !log hashar@naos Synchronized wmf-config/InitialiseSettings.php: Enable user group expiry in production - T159416 (duration: 00m 49s) [13:27:54] Zppix: I'm a sysop at testwiki, what do you need there? [13:28:17] Urbanecm: i need to confirm that edu rights can be assigned (when my patch is moved to be tested) [13:28:35] Zppix: we can check it via eg https://en.wikinews.org/wiki/Special:ListGroupRights [13:28:56] ah yes i forgot about that hashar [13:29:13] Zppix, I assigned User:Zppix temporary admin rights on testwiki (mainly just to show that user group expiry is now working!) 
[13:29:29] You can probably have permanent rights there if you plan to do this more often [13:29:36] it is on mw2099 now [13:29:41] thanks tto [13:30:12] and on mw2017 [13:30:21] Remove groups: Pseudo-bots, IP block exemptions, Reviewers, Autochecked users, Course online volunteers, Course campus volunteers, Course instructors and Course coordinators [13:30:26] based on https://en.wikinews.org/wiki/Special:ListGroupRights [13:31:03] not on my end [13:31:17] hashar Zppix I can confirm it too, I may add and remove EP rights. [13:31:35] syncing syncing [13:31:52] 06Operations: Logrotate fails on mediawiki maintenance servers on jessie - https://phabricator.wikimedia.org/T163555#3205959 (10ayounsi) [13:31:53] ack i see it now [13:31:55] 06Operations, 13Patch-For-Review, 15User-Elukey: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3205960 (10ayounsi) [13:32:01] !log hashar@naos Synchronized wmf-config/CommonSettings.php: Fix EducationProgram user rights so that they can be assigned/removed by sysops - T163167 (duration: 00m 46s) [13:33:03] so lets check [13:33:11] all 5 Urbanecm patches have been deployed [13:33:17] hashar: thanks! [13:33:21] TTO one to enable User Group Expiry has been pushed and verifie [13:33:35] Zppix Education Program rights are enabled [13:33:51] Yes, indeedy! All working fine. Thanks hashar as always :) [13:34:09] hashar: confirmed thanks [13:34:11] and I have pushed Dereckson Fix language code for Norwegian [13:34:18] well [13:34:21] I am just pushing buttons [13:34:24] dont forget full scap on behalf hashar [13:34:27] you are doing all the actual work [13:34:29] of Dereckson [13:34:30] !log Deploy alter table on s3 etwiki on watchlist table directly on the master (db1075) - T130067 [13:34:34] ah yeah good point [13:34:44] your welcome btw :) [13:35:08] wikitech is dead? 
[13:35:19] WFM [13:35:25] !log reimage analytics1003 to Jessie (Oozie/Hive/Camus not available during this timeframe in the Analytics Hadoop cluster) [13:35:30] yeah wfm marostegui [13:35:31] database error [13:35:42] really? I am getting database errors [13:35:45] maybe silver datbase is dead? [13:35:53] I'm not logged in fwiw [13:35:55] yeah I got the error [13:36:04] labswiki Error: 1054 Unknown column 'ug_expiry' in 'field list' (208.80.154.136) [13:36:05] what was the latest related deployment? [13:36:06] wikitech is fine [13:36:08] no sal? [13:36:20] go XioNoX [13:36:36] elukey: checking [13:36:36] what is ug? [13:36:46] what's up? [13:36:55] user groups? [13:36:59] hashar: https://phabricator.wikimedia.org/T160686 ? [13:37:09] probably someone deployed something that has not been deployed on silver [13:37:25] marostegui: looks like yes [13:38:19] so either revert or deploy [13:38:32] I have pasted the exception trace and reopened that task [13:38:50] replaced ug_user = '8620' with XXXX [13:39:04] jynus: ug_expiry sounds like 08:39 < jynus> data is the same, but user_groups seems to have received an additonal schema change [13:39:09] that task has nothing to do w/ this afaict, that's related to the view on labsdb for enwiki [13:39:13] https://phabricator.wikimedia.org/T155605 [13:39:40] (ug_ for user_groups) [13:39:50] (03CR) 10Gehel: [C: 04-1] Make cirrus logrotate config jessie-compatible (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) (owner: 10Muehlenhoff) [13:40:10] 06Operations, 10ops-eqiad: Degraded RAID on ms-be1039 - https://phabricator.wikimedia.org/T163690#3205981 (10ops-monitoring-bot) [13:40:29] I do not know why silver is part of the cluster- either it is separate or not [13:40:45] I can deploy the alter table there now, but I agree with jynus here [13:41:00] we made it part of the general train deployment [13:41:07] it is on that middle ground and nobody is clear about that 
[13:41:13] its quasi status has been nothing but a headache for as long as I can remember [13:41:16] it was a pain to have silver / wikitech mediawiki code to not be in sync with the rest of the infra [13:41:37] ok, but then either fully maintain it or not [13:41:44] as part of the cluster [13:42:04] I will alter labswiki.user_groups now if no one has any objections [13:42:10] it has separate network [13:42:11] so we can fix the issue [13:42:35] cannot be reached from maintenance scripts [13:42:50] move it away from a VM into production [13:43:10] (03CR) 10Muehlenhoff: Make cirrus logrotate config jessie-compatible (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) (owner: 10Muehlenhoff) [13:43:19] jynus: it's not a VM and it is in prod in labs-support [13:43:36] but it has different firewall rules than the rest of production [13:43:45] I'm not sure what the story would be on maint scripts not being able to maintain it [13:43:59] jynus: by https://phabricator.wikimedia.org/T163344#3204831, CX is not ready to re-enable, if I don't understand wrong. [13:44:00] possible yeah, but probably from host fw [13:44:01] tto: so the group expiry thing fails for wikitech / labswiki [13:44:02] the table is altered [13:44:14] Going to log it backwards [13:44:28] hashar: Hmm, didn't check labswiki [13:44:38] !log Deploy unscheduled alter table on silver (labswiki.user_groups) - T155605 [13:44:44] hashar, what exactly is the problem? [13:44:51] chasemp, search silver or wikitech on phabricator [13:44:53] jynus: there is a patch to fix multiple saving (reduce the query).
https://gerrit.wikimedia.org/r/349214 [13:44:54] tto: Error: 1054 Unknown column 'ug_expiry' in 'field list' (208.80.154.136) [13:44:59] oh I see I should have read above [13:45:02] you will find a thousand open tickets [13:45:26] (03PS1) 10BBlack: [untested] keep CAP_SYS_NICE in varnishd worker child proc [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349956 [13:45:42] I can maintainin it, no problem, but then it should be moved into production. If it cannot, because it is part of labs infrastructure, it should be shared nothing with production [13:45:48] (03CR) 10jerkins-bot: [V: 04-1] [untested] keep CAP_SYS_NICE in varnishd worker child proc [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349956 (owner: 10BBlack) [13:46:13] marostegui: you can !log against T159416 [13:46:26] (03PS2) 10Muehlenhoff: Make cirrus logrotate config jessie-compatible [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) [13:46:35] so… what's up with wikitech? Everything seems fine/normal to me. [13:46:55] hashar: I don't think SAL worked though for that last log I did [13:47:13] !log Deploy unscheduled alter table on silver (labswiki.user_groups) - T159416 [13:47:17] bah [13:47:20] logmsgbot: [13:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:22] logmsgbot: ping [13:47:31] !log reimage analytics1003 to Jessie (Oozie/Hive/Camus not available during this timeframe in the Analytics Hadoop cluster) [13:47:34] https://phabricator.wikimedia.org/T110987 [13:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:43] it did now [13:47:51] marostegui: thank you for the quick hotfix! [13:48:06] marostegui: Thanks for that! Saved us a revert [13:48:17] np! but we need to clear up what happens with silver to avoid issues like this in the future [13:49:14] what about labstestwiki ? [13:49:15] Indeed. 
Immediate action would be to mention at wikitech:Schema_changes [13:49:16] I'm looking at stashbot logs [13:49:27] it was getting 500s [13:49:36] jynus: that was going to be my second question, if there is anything else we need to alter [13:49:53] !log swift eqiad-prod: more weight on ms-be1028 -> ms-be1039 - T160640 [13:50:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:02] T160640: Rack and Setup ms-be1028-ms-1039 - https://phabricator.wikimedia.org/T160640 [13:50:03] PROBLEM - Check Varnish expiry mailbox lag on cp2005 is CRITICAL: CRITICAL: expiry mailbox lag is 697419 [13:50:39] !log Initial run of populateCognatePages.php complete. 27,595,121 rows in cognate_pages & 17,263,411 in cognate_titles [13:50:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:52] who maintains labstestwiki? [13:51:54] jynus: it is no different than the rest of the wikis in my opinion [13:51:57] Wait, are we talking about wikitech or labtestwikitech? [13:52:06] both [13:52:12] um... [13:52:18] I just bet labstestwiki has the same issue [13:52:21] So I totally can't tell if we're having an emergency or just complaining. [13:52:29] Is there an actual outage happening right now? [13:52:32] I am just asking [13:52:35] there was an outage [13:52:42] andrewbogott: there was one [13:52:59] and I am asking not to blame anyone [13:53:09] hashar: how long you think until swat is done? [13:53:10] but because I fear the answer is noone [13:53:19] still have to run a full scap [13:53:23] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [13:53:35] I don't feel blamed at all, I'm just trying to understand whether there's an immediate issue that I need to understand and fix :) [13:53:38] seems like the first question is, why wasn't whatever change done on wikitech along with everything else? 
[13:53:55] I maintain labtestwikitech, and in theory no one outside of wmcs should know/care about if it breaks. [13:54:35] !log hashar@naos Started scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1) [13:54:39] wikitech is part of the normal deployment train, and it gets all normal mw rollouts. It is the responsibility of mw devs/deployers to not break it during roll-outs. [13:54:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:54:43] T161529: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529 [13:54:53] Whereas labtest wikitech is NOT and I upgrade it according to my inscrutable whim. [13:54:54] marostegui: after that full scap ^ swat will be done [13:55:01] chasemp: Because I guess it is not clear whether wikitech (silver)'s database is maintaned by dbas? [13:55:06] Dereckson: full swat in progress [13:55:08] hashar: great! thank you! [13:55:17] andrewbogott: that's my understanding as well but jynus was saying there are issues with scap I think [13:55:18] ema bblack ^ 5xx on upload, could be the mailbox issue again? swift seems fine but I just deployed a rebalance in eqiad [13:55:23] andrewbogott, I think there is a missunderstanding there [13:55:30] (03PS2) 10Marostegui: db-codfw.php: Depool db2043, db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349882 (https://phabricator.wikimedia.org/T163339) [13:55:33] you no longer upgrade wikitech, as I can see [13:55:37] scap does [13:55:40] marostegui: ahhh ok, I thought it was [13:55:44] godog: indeed, it's cp2002 [13:55:55] or maybe I am wrong [13:56:07] jynus: isn't that what I just said? "wikitech is part of the normal deployment train, and it gets all normal mw rollouts." 
[13:56:12] ema: ack, thanks [13:56:27] (03PS1) 10ArielGlenn: add link to json status info to per-dump index.html [puppet] - 10https://gerrit.wikimedia.org/r/349957 [13:56:28] "Whereas labtest wikitech is NOT and I upgrade it according to my inscrutable whim." [13:56:36] I do not understand, this^ [13:56:52] ah [13:56:54] jynus: andrewbogott upgrades the testlabswiki whenever he feels like it [13:56:56] you meant labstest [13:57:01] andrewbogott: jynus: the question is "who should apply patches to the labswiki db?" [13:57:09] yes [13:57:20] andrewbogott: that's not the responsibility of the train to deploy database changes [13:57:25] I have never touched silver in the past [13:57:35] !log ema@neodymium conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be [13:57:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:54] andrewbogott: train is concerned to push the code, looks what pops as error, fix or rollback [13:58:13] Dereckson: so when the train rollouts require a corresponding database change... [13:58:16] how is that handled? [13:58:18] when did it start happening? [13:58:24] because I was never told [13:59:50] Nikerabbit: hey [13:59:56] Nikerabbit: https://wikitech.wikimedia.org/wiki/Incident_documentation/20170419-ContentTranslation is still pretty empty? [13:59:59] I believe releng started including wikitech in the train some time ago and it seems like that didn't come full circle for jynus and marostegui [14:00:05] matt_flaschen: Respected human, time to deploy GuidedTour and RCFilters (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1400). Please do the needful. [14:00:10] for heads up I mean [14:00:11] andrewbogott: devs are encouraged to write code with a feature guard, to avoid the code to immediately require the db change [14:00:21] chasemp, I literally never touched wikitech at all [14:00:33] maybe krenair ran update.php manually? [14:00:40] used to*^?
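The "feature guard" mentioned at 14:00:11 can be sketched roughly as follows. This is only an illustration under stated assumptions, not MediaWiki's code: MediaWiki is PHP on MySQL/MariaDB, while this sketch uses Python's built-in sqlite3 as a stand-in. The helper names `column_exists` and `fetch_user_groups` and the `expiry_enabled` flag are hypothetical; only the `user_groups` table and the `ug_user`, `ug_group`, and `ug_expiry` columns are names taken from the log above.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` has a column named `column` (SQLite PRAGMA lookup)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
    return any(row[1] == column for row in rows)


def fetch_user_groups(conn, user_id, expiry_enabled):
    """Read a user's groups, selecting ug_expiry only when the guard allows it.

    `expiry_enabled` is a hypothetical config flag. While it is off, or while
    the column is still missing on this database, the old query shape is used.
    """
    if expiry_enabled and column_exists(conn, "user_groups", "ug_expiry"):
        sql = "SELECT ug_group, ug_expiry FROM user_groups WHERE ug_user = ?"
    else:
        # Fall back to the pre-schema-change query; the expiry is reported as NULL.
        sql = "SELECT ug_group, NULL FROM user_groups WHERE ug_user = ?"
    return conn.execute(sql, (user_id,)).fetchall()
```

With a guard of this kind, a database that has not yet received the ALTER TABLE keeps working on the old query instead of raising an "Unknown column" error, which is what the optional period in the schema change policy is meant to allow.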
[14:01:01] because I doubt this was not broken before if not [14:01:05] Dereckson: but presumably at some point the database is updated, right? [14:01:10] andrewbogott: https://www.mediawiki.org/wiki/Development_policy#Database_patches the 'Make your schema change optional – All schema changes must go through a period of being optional' part [14:01:40] jynus: sure I understand, I think andrewbogott is trying to focus on what we want to do going forward [14:01:41] I doubt that has worked in the last 2 years [14:01:50] so something was running those [14:03:07] chasemp, I am not complaining, I want to know how far we have to go back to solve the issue [14:04:02] understood man (have to afk for a minute) [14:04:37] jynus: I suspect you're right, that Krenair was quietly doing in the background. I certainly haven't ever manually upgraded things on silver since we got it on the deployment train. [14:04:53] well, that is good news [14:05:00] becaues it means we only have to go a few months [14:05:05] instead of 2 years [14:05:08] So going forward, silver DBs should receive whatever upgrades/maintenance the normal/prod dbs get. [14:05:21] Well, it might be 2 years, I've no idea. I just wouldn't expect it to still work at all if it were 2 years :( [14:06:04] thing is what if you just sync silver right now to what it should be at what would happen? [14:06:28] jynus: but I also sort of feel like this is a deployment issue, like the releng people should be communicating what needs upgrading to you, and they should also be including silver as part of that... [14:06:29] (03PS3) 10Filippo Giunchedi: prometheus: add aggregation rules for apache and hhvm [puppet] - 10https://gerrit.wikimedia.org/r/334662 [14:06:46] (That is what I'm trying to pry out of Dereckson… I still don't understand what the upgrade process is AT ALL outside of scap) [14:06:57] Dereckson: The policy was followed from the developers' end. The issue here is that silver fell between the cracks somehow. 
It received code updates but never received the schema update, even though all other wikis got it [14:07:03] andrewbogott, deployment, as you have just seen, wants to know nothing to do with databases [14:07:26] jynus: so how do find out about schema updates, if not from them? [14:07:29] andrewbogott: first code is deployed through train, afterwards dba apply the schema change [14:07:35] I got them from developers [14:07:42] Developers open a separate task to require the schema change [14:07:43] andrewbogott: the developers create the tickets following some guidelines [14:07:46] andrewbogott, https://phabricator.wikimedia.org/T155605 [14:07:51] whoa, really? [14:07:58] Ok, that's not what I expected! [14:08:02] "all wikis" is not very specific [14:08:18] dblist file is more specific [14:08:25] !log re-pooling cp2002's varnish-be with increased priority for expiry thread T145661 [14:08:27] to be honest for "all wikis" I understand all shards and/or x1 [14:08:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:33] T145661: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661 [14:08:50] marostegui, as I said, I never did a schema change on silver before [14:09:01] and manuel probably doesn't even know what silver is [14:09:01] !log ema@neodymium conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be [14:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:17] jynus: I had to find out las week about it because of some grants issues for the failover XD [14:09:31] But before last week, I had no idea no [14:10:23] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [14:10:41] !log hashar@naos Finished scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1) (duration: 16m 06s) [14:10:49] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:50] T161529: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529 [14:10:50] (03Abandoned) 10Filippo Giunchedi: site: add prometheus200[34] [puppet] - 10https://gerrit.wikimedia.org/r/327555 (https://phabricator.wikimedia.org/T148408) (owner: 10Filippo Giunchedi) [14:11:01] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add aggregation rules for apache and hhvm [puppet] - 10https://gerrit.wikimedia.org/r/334662 (owner: 10Filippo Giunchedi) [14:11:02] So, going forward, is there someplace I can document the existence of Silver? Like, should I add a note to the description of T155605 that says "And don't forget silver!" ? [14:11:03] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [14:11:30] andrewbogott, I think the right way is to enforce a dblist on "where to apply the schema change" [14:11:37] hashar: I will go ahead then [14:11:39] if we do not get a dblist, we declien [14:12:08] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2043, db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349882 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [14:12:47] jynus: ok, in that case, are there docs about how to properly file a schema change request? :) [14:12:48] jynus: agreed, that would be really useful. 
Maybe we can change the schema change request wiki page to include that [14:12:59] I am editing https://wikitech.wikimedia.org/w/index.php?title=Schema_changes&action=edit&section=1 [14:13:04] marostegui: yeah, that sounds like what we want [14:13:04] :) [14:13:23] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2043, db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349882 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [14:13:36] (03CR) 10jenkins-bot: db-codfw.php: Depool db2043, db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349882 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [14:13:45] Thank you jynus! Sorry that nobody got wikitech/silver on your radar before. [14:13:54] not your fault [14:14:01] and probably nobody's [14:14:13] if someone was doing it with no trace [14:14:29] (assuming that is the case) [14:14:30] or not doing it at all... [14:14:37] !log rebooting ms1001 for kernel update to Linux 4.9 [14:14:40] but the issue is labstestwiki is part of all [14:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:58] !log marostegui@naos Synchronized wmf-config/db-codfw.php: Depool db2043 and db2061 - T163339 (duration: 01m 08s) [14:14:59] we should either remove it from there [14:15:04] (03CR) 10Mattflaschen: [C: 032] Enable GuidedTour on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827) (owner: 10Dereckson) [14:15:05] or keep it in sync [14:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:06] T163339: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339 [14:15:14] RECOVERY - Check Varnish expiry mailbox lag on cp2002 is OK: OK: expiry mailbox lag is 0 [14:15:24] jynus: if you would like to/are willing to maintain the dbs on labtestwiki when you do silver, that would be fine with me. (more than fine.)
[14:15:58] But I don't want anyone to regard labtest outages as anything worth missing lunch over. [14:16:00] one more is not a problem [14:16:22] what I need is normalization: either all or none, etc. [14:16:31] not "half maintained, etc." [14:17:02] !log Stop MySQL db2043 and db2061 for maintenance - https://phabricator.wikimedia.org/T163339 [14:17:04] we would need to review labswiki [14:17:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:10] (03PS3) 10Mattflaschen: Enable GuidedTour on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827) (owner: 10Dereckson) [14:17:10] Sure — if it's easy then go ahead and keep the dbs there in sync with silver. If that results in incompatibility with the mw version running there you can either ignore it or just let me know and I'll sync as needed. [14:17:23] because there may be other latent issues of non-executed schema changes [14:17:26] (03CR) 10Mattflaschen: [C: 032] Enable GuidedTour on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827) (owner: 10Dereckson) [14:17:36] (99% of the time there's no reason for the mw version there to drift, but for the 1% of cases it's nice to not have it updated by the train) [14:18:01] maybe remove it from the config, then? [14:18:23] PROBLEM - Check Varnish expiry mailbox lag on cp2024 is CRITICAL: CRITICAL: expiry mailbox lag is 596587 [14:18:25] from dblists and db-eqiad.php, etc. [14:18:31] that would work for me [14:18:37] hm… [14:18:43] am I wrong and it /does/ get train updates? 
[14:18:44] * andrewbogott checks [14:18:50] again, I do not mind maintaining or not [14:18:58] but it has to be very, very clear [14:19:17] can you see that things there are not clear- I think it started getting integrated [14:19:22] (03Merged) 10jenkins-bot: Enable GuidedTour on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827) (owner: 10Dereckson) [14:19:30] (03CR) 10jenkins-bot: Enable GuidedTour on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331986 (https://phabricator.wikimedia.org/T152827) (owner: 10Dereckson) [14:19:32] I do not want to spend time on this [14:19:39] let's decide and act on it [14:19:45] ok, I just changed my mind: Let's just get wikitech and labtestwikitech both on the train, and call them the same problem. I don't care enough about the 1% to have it be an exceptional case. [14:19:58] Is that an ok option for you? [14:20:02] maybe you can "undo" changes? [14:20:08] when you need it [14:20:08] PROBLEM - MariaDB Slave IO: s3 on db2043 is CRITICAL: CRITICAL slave_io_state could not connect [14:20:17] PROBLEM - MariaDB Slave SQL: s3 on db2043 is CRITICAL: CRITICAL slave_sql_state could not connect [14:20:18] I silenced it! [14:20:21] db2043, expected? [14:20:25] With the additional bonus that if you have to just ruin the labtest db state to get it in sync, that's fine. No content there that I care about. [14:20:28] yes [14:20:30] ah [14:20:30] all good right? 
[14:20:30] ok [14:21:00] sorry for the page, I am pretty sure I silenced it along with db2061 [14:21:31] I prefer a fake page at midday than a real one at 4am [14:21:31] Ah, I silenced db2023 instead of 43, lame [14:21:41] haha [14:21:43] what jynus said :) [14:21:48] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206135 (10Marostegui) @Papaul you can do the maintenance on db2043 and db2061 now. They have been depooled. Please let me know when it is done,... [14:22:11] PROBLEM - mysqld processes on db2061 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [14:22:22] expected too [14:24:09] jynus: I created T163694 which I expect bryan can do pretty easily. Do you generally have enough info (and enough decisions) for you to move ahead? [14:24:10] T163694: Get labtestwikitech/californium on the deployment train - https://phabricator.wikimedia.org/T163694 [14:26:12] mutante, deployment.codfw.wmnet ssh fingerprint changed. Hopefully, that is because of the data center switch. New fingerprint is 14:88:ae:23:69:ba:d8:30:a1:bb:32:55:0f:6d:86:f4. 
[14:26:43] matt_flaschen: mira has ben decom, deployment.codfw.wmnet redirects to naos.codfw.wmnet [14:27:15] !log Deploy alter table on s3 etwiki on watchlist table directly on the master (db1075) - T130067 [14:27:16] https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/naos.codfw.wmnet matches yours [14:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:24] T130067: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067 [14:27:38] (03Abandoned) 10BBlack: [untested] keep CAP_SYS_NICE in varnishd worker child proc [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349956 (owner: 10BBlack) [14:27:41] (03PS2) 10ArielGlenn: add link to json status info to per-dump index.html [puppet] - 10https://gerrit.wikimedia.org/r/349957 [14:28:48] (03CR) 10ArielGlenn: [C: 032] add link to json status info to per-dump index.html [puppet] - 10https://gerrit.wikimedia.org/r/349957 (owner: 10ArielGlenn) [14:28:56] Thanks hashar for the full scap, everything works fine [14:29:26] Dereckson: \o/ [14:31:21] hey, just got home. hashar is it ok now? [14:34:05] !log mattflaschen@naos Synchronized wmf-config/InitialiseSettings.php: Enable GuidedTour on all wikis (duration: 00m 59s) [14:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:31] hashar: strange, commit 9fb66f6b99c7a90931090fadff655b8e9698e75a [14:34:34] Date: Fri Apr 14 23:18:31 2017 +0100 [14:34:37] + $dbw->sourceFile( "$IP/extensions/Linter/linter.sql" ); [14:34:40] (in addWiki.php) [14:34:53] (for the linter table) [14:35:46] oh ok [14:35:47] jynus: also I've been partially incoherent this morning and a couple of times said that labtestwikitech is hosted on californium, which is not right. It runs on labtestweb2001. 
[14:35:58] hashar: DatGuy: for the linter, it's part of wmf21, we're still wmf20 [14:36:22] Dereckson: ah nice [14:36:45] (03PS1) 10Ema: varnish: set LimitRTPRIO=infinity in systemd unit file [puppet] - 10https://gerrit.wikimedia.org/r/349961 (https://phabricator.wikimedia.org/T145661) [14:36:47] (03CR) 10Gehel: [C: 04-1] "Another minor comment (sorry, I should have seen that on the first review)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) (owner: 10Muehlenhoff) [14:38:35] !log Created linter table on ptwikimedia and dtywiki [14:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:13] 06Operations, 10ops-eqiad, 10media-storage: Degraded RAID on ms-be1002 - https://phabricator.wikimedia.org/T163209#3206185 (10fgiunchedi) a:05Cmjohnson>03fgiunchedi Indeed megacli doesn't seem happy ``` # megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0 Adapte... [14:39:43] !log Deployment of T152827 ("Enable GuidedTour on all wikis") complete and tested [14:39:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:51] T152827: Enable GuidedTour on all wikis - https://phabricator.wikimedia.org/T152827 [14:39:54] now you can troll my edit: https://wikitech.wikimedia.org/w/index.php?title=Schema_changes&type=revision&diff=1757272&oldid=1756993 [14:40:46] jynus: looks good to me [14:41:19] it puts some burden on the change requesters [14:41:38] that is why I expect people to complain [14:43:50] (03CR) 10Mattflaschen: [C: 032] Add b/c for ORES config format change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [14:43:53] (03CR) 10Mattflaschen: [C: 032] Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:44:11] (03PS2) 10Mattflaschen: Add b/c for ORES config format change 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [14:44:16] (03CR) 10jenkins-bot: [V: 04-1] Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:44:33] (03PS3) 10Mattflaschen: Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:44:50] (03PS10) 10Mattflaschen: Enable RCFilters beta feature on all wikis except wikidatawiki, nlwiki, cswiki, etwiki and hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343439 (https://phabricator.wikimedia.org/T144458) (owner: 10Catrope) [14:45:38] (03CR) 10Mattflaschen: [C: 032] Add b/c for ORES config format change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [14:45:46] (03CR) 10Mattflaschen: [C: 032] Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:45:53] (03CR) 10Mattflaschen: [C: 032] Enable RCFilters beta feature on all wikis except wikidatawiki, nlwiki, cswiki, etwiki and hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343439 (https://phabricator.wikimedia.org/T144458) (owner: 10Catrope) [14:46:44] (03Merged) 10jenkins-bot: Add b/c for ORES config format change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [14:46:50] (03Merged) 10jenkins-bot: Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:46:53] (03CR) 10jenkins-bot: Add b/c for ORES config format change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [14:47:05] (03Merged) 10jenkins-bot: Enable
RCFilters beta feature on all wikis except wikidatawiki, nlwiki, cswiki, etwiki and hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343439 (https://phabricator.wikimedia.org/T144458) (owner: 10Catrope) [14:48:50] (03CR) 10jenkins-bot: Set ORES thresholds for enwiki ahead of RCFilters release [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349014 (owner: 10Catrope) [14:49:33] (03PS3) 10Muehlenhoff: Make cirrus logrotate config jessie-compatible [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) [14:50:09] (03CR) 10BBlack: [C: 031] varnish: set LimitRTPRIO=infinity in systemd unit file [puppet] - 10https://gerrit.wikimedia.org/r/349961 (https://phabricator.wikimedia.org/T145661) (owner: 10Ema) [14:50:31] !log mattflaschen@naos Synchronized wmf-config/: Release RC Filters on more wikis and prep changes for that (duration: 00m 53s) [14:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:12] (03CR) 10Ema: [V: 032 C: 032] varnish: set LimitRTPRIO=infinity in systemd unit file [puppet] - 10https://gerrit.wikimedia.org/r/349961 (https://phabricator.wikimedia.org/T145661) (owner: 10Ema) [14:54:43] !log mattflaschen@naos Synchronized php-1.29.0-wmf.20/extensions/ORES: Make the preference for the "r" flag on the RC page also control highlighting (duration: 00m 48s) [14:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:47] (03PS1) 10Ema: 4.1.5-1wm2: add 0006-exp-thread-realtime.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349965 (https://phabricator.wikimedia.org/T145661) [14:58:09] (03Abandoned) 10Dereckson: Add pa.wikisource new wiki on RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/343552 (https://phabricator.wikimedia.org/T149522) (owner: 10Dereckson) [14:58:13] (03CR) 10BBlack: [C: 031] 4.1.5-1wm2: add 0006-exp-thread-realtime.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349965 
(https://phabricator.wikimedia.org/T145661) (owner: 10Ema) [15:00:43] (03CR) 10Ema: [V: 032 C: 032] 4.1.5-1wm2: add 0006-exp-thread-realtime.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/349965 (https://phabricator.wikimedia.org/T145661) (owner: 10Ema) [15:02:13] mwscript isn't working terbium, using naos instead. [15:04:53] andrewbogott, labstestweb is failing as expected [15:05:11] https://logstash.wikimedia.org/goto/ea985ecfa24b37b33708d3718bd603dd [15:05:22] jynus: yeah, not a shock. [15:06:10] !log Preference updates (for ORES on enwiki) done, using naos instead of terbium [15:06:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:18] Hi all, if logged in to sysop&bureaucrat account this page https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:Posledn%C3%AD_zm%C4%9Bny can't be loaded (I receive blank screen, https://ctrlv.cz/eOdn). Is something in progress which can cause this? [15:07:46] !log varnish 4.1.5-1wm2 uploaded to apt.w.o T145661 [15:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:54] T145661: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661 [15:08:09] matt_flaschen: use wasat for scripts [15:09:30] !log disabling the bgp session between pfw-codfw and cr2 for T163447 [15:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:38] T163447: Interface errors on pfw-codfw:xe-15/0/0 - https://phabricator.wikimedia.org/T163447 [15:10:02] 06Operations: Puppet facts around the primary network interface and IPv4/IPv6 address - https://phabricator.wikimedia.org/T163196#3206314 (10Volans) Comparison beween `ipaddress6` and `ipaddress6_primary`. All the ones where **there is some issue are marked in bold** and have a number in square brakects that is... 
[15:10:13] !log GuidedTour/RCFilters/ORES deployment complete and tested [15:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:20] Dereckson, thanks. Is that documented anywhere? [15:11:27] ores seems to have broken https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:Posledn%C3%AD_zm%C4%9Bny [15:11:43] when I enable it , it returns a blank page on recentchanges [15:12:04] matt_flaschen, ^ [15:12:15] jynus, okay, testing now. en (which has ORES) and es are working, so I thought it was okay. CHecking now. [15:12:53] logged in to cs, enabled it and blank page [15:12:55] matt_flaschen: https://wikitech.wikimedia.org/wiki/Wasat https://wikitech.wikimedia.org/wiki/Terbium https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#SSH_Connections.2FCommands [15:13:46] so I can replicate Urbanecm's report [15:14:40] 250 Undefined index: goodfaith in /srv/mediawiki/wmf-config/CommonSettings.php on line 3338 [15:14:43] 250 Undefined index: bad in /srv/mediawiki/wmf-config/CommonSettings.php on line 3339 [15:15:02] (03PS1) 10Marostegui: db-codfw.php: Repool db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349970 (https://phabricator.wikimedia.org/T163339) [15:15:35] (03CR) 10Marostegui: [C: 04-2] "Wait for maintenance to happen" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349970 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [15:16:19] 06Operations, 10Traffic: Special:RecentChanges in etwiki displays error - https://phabricator.wikimedia.org/T163696#3206368 (10Zppix) p:05Triage>03High [15:16:52] So… all the ORES-not-RCFilters wikis' RC pages are broken if ORES is enabled? [15:17:25] Would it fix them to enable RCFilters there? Ahead of schedule, but better than broken. [15:17:48] James_F, I don't think so. I think I see what-ish the cause is. I'll revert. [15:18:05] Kk. 
[15:18:44] (03PS1) 10Dereckson: Ensure wgOresFiltersThresholds relevant keys exist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349971 [15:18:51] is something known/ongoing with recentchanges -related things? [15:18:52] matt_flaschen: if you don't revert the config change, please consider to merge that ^ [15:19:01] bblack: matt_flaschen ORES change [15:19:03] ok [15:21:25] matt_flaschen: we're currently serving notices, please revert 681ac85cd0758daba405f725f9351e8ef95ce68f or deploy https://gerrit.wikimedia.org/r/349971 [15:21:48] 06Operations, 10Traffic: Special:RecentChanges in etwiki displays error - https://phabricator.wikimedia.org/T163696#3206411 (10matej_suchanek) p:05High>03Unbreak! >>! In T158004#3206258, @matej_suchanek wrote: > https://cs.wikipedia.org/wiki/Speciální:Poslední_změny (cswiki RC) **down for me with beta ORES... [15:21:52] Dereckson, I think your patch is the right solution. Reverting would cause other problems. [15:22:00] (03CR) 10Mattflaschen: [C: 032] Ensure wgOresFiltersThresholds relevant keys exist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349971 (owner: 10Dereckson) [15:22:44] Yes I concur, it makes sense to populate these values when they exist per roan rationale, we only need to ensure there is something to copy there [15:23:48] (03CR) 10Jforrester: "Caused T163696, whoops." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349108 (https://phabricator.wikimedia.org/T162760) (owner: 10Catrope) [15:24:15] (03CR) 10BryanDavis: Designate: Allow labs clients to access the designate API. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/349531 (https://phabricator.wikimedia.org/T45580) (owner: 10Andrew Bogott) [15:24:49] Oh! Oops! 
Why did that fail exactly [15:24:49] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206426 (10Papaul) @Marostegui we are clear for db2061 Tower A Loads: X 11.16 Y 8.61 Z 10.46 Tower B Loads: X 11.03 Y 7.93 Z 10.66 no more w... [15:24:51] ? [15:25:10] (03Merged) 10jenkins-bot: Ensure wgOresFiltersThresholds relevant keys exist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349971 (owner: 10Dereckson) [15:25:31] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206434 (10Marostegui) >>! In T163339#3206426, @Papaul wrote: > @Marostegui we are clear for db2061 > > Tower A Loads: X 11.16 Y 8.61 Z 10.46... [15:26:11] Ooh I see, wikis with ORES installed but no threshold settings [15:26:19] RECOVERY - mysqld processes on db2061 is OK: PROCS OK: 1 process with command name mysqld [15:28:17] !log mattflaschen@naos Synchronized wmf-config/CommonSettings.php: T163696: Only copy filter thresholds if they are set (duration: 01m 10s) [15:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:26] T163696: Special:RecentChanges in wikidata/cs/etwiki displays error due to config screwup - https://phabricator.wikimedia.org/T163696 [15:28:32] matt_flaschen: Confirmed fixed on cswiki for me. [15:29:32] James_F, yeah, sorry about that. [15:35:18] (03PS2) 10Andrew Bogott: Designate: Allow labs clients to access the designate API. 
[puppet] - 10https://gerrit.wikimedia.org/r/349531 (https://phabricator.wikimedia.org/T45580) [15:39:21] 06Operations, 10ops-eqiad, 10media-storage: Degraded RAID on ms-be1002 - https://phabricator.wikimedia.org/T163209#3206519 (10fgiunchedi) 05Open>03Resolved @Cmjohnson the disk in slot 7 was marked as 'foreign config' and it looks like it contained a previous filesystem, maybe from another swift box? Thes... [15:40:53] 06Operations, 10Pybal, 10Traffic, 10netops: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674#3206531 (10BBlack) 05Resolved>03Open hmm, no, it is the HTTPS check, not the IdleConnection one. I wonder why it's RST and not regular close? [15:44:19] RECOVERY - MariaDB Slave IO: s3 on db2043 is OK: OK slave_io_state Slave_IO_Running: Yes [15:44:27] !log poweroff prometheus2003 for memory upgrade - T163386 [15:44:29] RECOVERY - MariaDB Slave SQL: s3 on db2043 is OK: OK slave_sql_state Slave_SQL_Running: Yes [15:44:31] !log stopping all slaves on dbstore1001 for maintenance [15:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:35] T163386: upgrade memory in prometheus200[34] - https://phabricator.wikimedia.org/T163386 [15:44:41] papaul: ^ prometheus2003 should be off shortly [15:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:28] godog: on thanks [15:47:29] (03PS4) 10Gehel: Make cirrus logrotate config jessie-compatible [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) (owner: 10Muehlenhoff) [15:47:48] (03CR) 10Gehel: [C: 031] "LGTM, puppet compiler agrees: https://puppet-compiler.wmflabs.org/6212/" [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) (owner: 10Muehlenhoff) [15:49:37] (03CR) 10Gehel: [C: 032] Make cirrus logrotate config jessie-compatible [puppet] - 10https://gerrit.wikimedia.org/r/349923 (https://phabricator.wikimedia.org/T163555) 
(owner: 10Muehlenhoff) [15:52:28] moritzm: ^ logrotate change merged, looking good [15:53:04] gehel: ok, thanks! I'll double-check cron mails tomorrow morning [15:53:09] yep [16:00:53] godog: system is back up with 96GB [16:01:12] papaul: sweet, thanks! [16:01:17] yw [16:02:30] 06Operations, 07HHVM: Frequent TCP RST on connections between HHVM and Redis - https://phabricator.wikimedia.org/T162354#3206798 (10elukey) 05Open>03Resolved Merged by upstream and included in the last 3.18 hhvm build. [16:10:04] RECOVERY - Check Varnish expiry mailbox lag on cp2005 is OK: OK: expiry mailbox lag is 1670 [16:11:12] !log upgrade cp2017 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661 [16:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:22] T145661: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661 [16:19:46] 06Operations, 10ops-ulsfo, 10fundraising-tech-ops, 13Patch-For-Review: rack/setup frbackup2001 - https://phabricator.wikimedia.org/T162469#3206869 (10Jgreen) 05Open>03Resolved this is done! [16:23:14] RECOVERY - Check Varnish expiry mailbox lag on cp2017 is OK: OK: expiry mailbox lag is 0 [16:39:08] 06Operations, 10Phabricator: Intermittent DB connectivity problem on phabricator, needs investigation - https://phabricator.wikimedia.org/T163507#3206912 (10mmodell) a:05mmodell>03None @faidon I will do whatever I can to help debug it on the phabricator side, however, I will need help from ops and/or DBAs... [16:53:12] 06Operations, 10Phabricator: Intermittent DB connectivity problem on phabricator, needs investigation - https://phabricator.wikimedia.org/T163507#3199829 (10epriestley) I don't remember if we set this up previously, but Phabricator supports a `persistent` flag to enable persistent connections. It is documented... 
[16:54:02] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349970 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [16:54:06] (03PS2) 10Marostegui: db-codfw.php: Repool db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349970 (https://phabricator.wikimedia.org/T163339) [16:56:15] !log poweroff prometheus2004 for memory upgrade - T163386 [16:56:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:23] T163386: upgrade memory in prometheus200[34] - https://phabricator.wikimedia.org/T163386 [16:56:26] papaul: ^ powering off now [16:57:24] PROBLEM - Host prometheus2004 is DOWN: PING CRITICAL - Packet loss = 100% [16:57:39] whoops, forgot to silence [16:58:16] !log marostegui@naos Synchronized wmf-config/db-codfw.php: Repool db2043 and db2061 with less weight - T163339 (duration: 01m 16s) [16:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:24] T163339: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339 [16:58:32] godog: ok [17:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1700). [17:07:04] SMalyshev: deployment of wdqs on test done, tests looking good, moving to prod [17:08:19] !log gehel@naos Started deploy [wdqs/wdqs@481346a]: (no justification provided) [17:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:06] !log gehel@naos Finished deploy [wdqs/wdqs@481346a]: (no justification provided) (duration: 01m 47s) [17:10:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:03] godog: prometheus2004 is now at 96GB [17:11:03] (03CR) 10GWicke: [C: 031] "+1 to lowering the wikiuser timeouts to 60s. 
Note that I did not review the implementation." [software] - 10https://gerrit.wikimedia.org/r/346559 (https://phabricator.wikimedia.org/T160984) (owner: 10Jcrespo) [17:11:19] SMalyshev: deployment completed, tests looking good... [17:11:44] RECOVERY - Host prometheus2004 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [17:12:20] 06Operations, 10ops-codfw, 15User-fgiunchedi: upgrade memory in prometheus200[34] - https://phabricator.wikimedia.org/T163386#3207053 (10Papaul) a:05Papaul>03fgiunchedi memory upgrade complete [17:14:37] (03PS2) 10Gehel: Enable "trailing poller" functionality for production. [puppet] - 10https://gerrit.wikimedia.org/r/347565 (https://phabricator.wikimedia.org/T161342) (owner: 10Smalyshev) [17:16:18] (03CR) 10Gehel: [C: 032] Enable "trailing poller" functionality for production. [puppet] - 10https://gerrit.wikimedia.org/r/347565 (https://phabricator.wikimedia.org/T161342) (owner: 10Smalyshev) [17:17:47] papaul: nice, thanks! [17:18:33] godog: yw [17:19:36] !log restarting wdqs-updater for new configuration [17:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:01] (03PS1) 10Marostegui: db-codfw.php: Increase weight db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349979 [17:26:02] (03CR) 10Marostegui: [C: 032] db-codfw.php: Increase weight db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349979 (owner: 10Marostegui) [17:27:13] (03Merged) 10jenkins-bot: db-codfw.php: Increase weight db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349979 (owner: 10Marostegui) [17:28:34] (03PS1) 10Chad: Write global scap lock file for non-active deployment master [puppet] - 10https://gerrit.wikimedia.org/r/349981 [17:28:36] !log marostegui@naos Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight - T163339 (duration: 00m 58s) [17:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:28:45] T163339: pdu 
phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339 [17:28:50] thcipriani, godog: Well that was easy ^^^ [17:29:06] (03PS1) 10BryanDavis: Move labtestwiki to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349982 (https://phabricator.wikimedia.org/T163694) [17:29:15] RainbowSprinkles: awesome :) [17:29:17] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349983 (https://phabricator.wikimedia.org/T128546) [17:30:59] (03CR) 10Thcipriani: [C: 031] Write global scap lock file for non-active deployment master [puppet] - 10https://gerrit.wikimedia.org/r/349981 (owner: 10Chad) [17:31:11] (03PS1) 10Ladsgroup: Enable echo notification for wikibase clients in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349984 (https://phabricator.wikimedia.org/T142104) [17:32:40] RainbowSprinkles: neat! thanks, I'll merge it [17:32:55] (03CR) 10Filippo Giunchedi: [C: 032] Write global scap lock file for non-active deployment master [puppet] - 10https://gerrit.wikimedia.org/r/349981 (owner: 10Chad) [17:34:08] (03CR) 10Chad: [C: 032] Move labtestwiki to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349982 (https://phabricator.wikimedia.org/T163694) (owner: 10BryanDavis) [17:34:31] {{done}} [17:34:32] paravoid: I have not been able to work much since the incident. 
The phab task has more info though [17:35:08] (03Merged) 10jenkins-bot: Move labtestwiki to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349982 (https://phabricator.wikimedia.org/T163694) (owner: 10BryanDavis) [17:35:15] !log upgrade cp2024 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661 [17:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:23] T145661: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661 [17:36:30] !log demon@naos Synchronized dblists/group0.dblist: moving labstestwiki to group0 (duration: 00m 54s) [17:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:48] bd808: {{done}} ^ [17:37:01] (probably need to fix the group0 dashboard on logstash) [17:37:05] RainbowSprinkles: sweet [17:37:55] RainbowSprinkles: I can do that [17:37:57] (03PS1) 10Marostegui: db-codfw.php: Increase weight db2061 and db2043 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349985 [17:38:24] RECOVERY - Check Varnish expiry mailbox lag on cp2024 is OK: OK: expiry mailbox lag is 0 [17:39:16] RainbowSprinkles: group0 dashboard updated [17:39:16] (03CR) 10Marostegui: [C: 032] db-codfw.php: Increase weight db2061 and db2043 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349985 (owner: 10Marostegui) [17:39:47] Sweetness [17:39:49] Yay teamwork [17:40:23] (03Merged) 10jenkins-bot: db-codfw.php: Increase weight db2061 and db2043 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349985 (owner: 10Marostegui) [17:40:45] looks like our db schema may be a bit stale on labtestweb2001 [17:40:55] Unknown column 'ug_expiry' in 'field list' [17:41:07] bd808: that happened earlier on silver [17:41:10] I can help with that [17:41:21] marostegui: awesome and thanks [17:41:22] !log marostegui@naos Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight (duration: 00m 49s) [17:41:28] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:48] bd808: https://phabricator.wikimedia.org/T155605 [17:42:11] Whaaaa? [17:42:23] thcipriani: I'm seeing the global lock, but tried to start a sync-file on tin but it didn't 'splode [17:42:36] bd808: how can i see that DB? [17:42:52] $ cat /var/lock/scap-global-lock [17:42:52] Not the active deployment server, use naos.codfw.wmnet [17:43:12] marostegui: a fine question. I think it is local on the labtestweb2001 host, but let me double check [17:43:35] !log installing varnish 4.1.5-1wm2 on all cache_upload hosts @ codfw (no restarts) [17:43:38] hrm [17:43:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:23] marostegui: yeah. `sql labtestwiki` from labtestweb2001.wikimedia.org gives me the mysql prompt "(wikiadmin@labtestweb2001) [labtestwiki]>" [17:44:52] thcipriani: Checked on disk, definitely have support for global lock files in our package on tin [17:44:54] * RainbowSprinkles puzzles [17:45:02] labtestweb2001.wikimedia.org should functionally be a clone of silver [17:45:36] bd808: ok, let me quickly fix it [17:46:09] * bd808 dreams of the day when wikitech is just a plain old wiki [17:46:45] we might get there before the end of the calendar year... maybe. [17:46:50] !log Alter table labtestwiki.user_groups on labtestweb2001 - T155605 [17:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:58] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [17:47:13] bd808: it should be fixed now [17:48:31] RainbowSprinkles: hrm, messing with this on beta, seems like Lock.__enter__ isn't getting called here somehow... [17:48:34] thanks again marostegui [17:48:40] you are welcome! [17:48:47] thcipriani: Fucccckkkkk, did I break lockfiles with that refactor? 
[17:49:43] (03PS1) 10Marostegui: db-codfw.php: Increase weight db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349990 [17:51:46] (03CR) 10Marostegui: [C: 032] db-codfw.php: Increase weight db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349990 (owner: 10Marostegui) [17:52:46] (03Merged) 10jenkins-bot: db-codfw.php: Increase weight db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349990 (owner: 10Marostegui) [17:53:46] !log marostegui@naos Synchronized wmf-config/db-codfw.php: Increase db2061 weight (duration: 00m 47s) [17:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:58:25] (03CR) 10Dzahn: "2 more days of Precise support, then we should merge this" [puppet] - 10https://gerrit.wikimedia.org/r/345838 (owner: 10Faidon Liambotis) [18:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T1800). Please do the needful. [18:00:05] matthiasmullie and jan_drewniak: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:12] check [18:03:36] I can SWAT today [18:03:58] (03PS2) 10Thcipriani: Full path to xvfb-run [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348199 (owner: 10Matthias Mullie) [18:04:08] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348199 (owner: 10Matthias Mullie) [18:06:45] anyone wants to deploy a no-op config change for me, while you're at it? https://gerrit.wikimedia.org/r/347412 [18:07:03] (03Merged) 10jenkins-bot: Full path to xvfb-run [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348199 (owner: 10Matthias Mullie) [18:09:40] (03CR) 10Dzahn: "on both phab servers, iridium (trusty) AND phab2001 (jessie) there are already both packages installed in parallel." 
[puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:10:25] (03CR) 10Paladox: "> on both phab servers, iridium (trusty) AND phab2001 (jessie) there" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:10:57] !log thcipriani@naos Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:348199|Full path to xvfb-run]] (beta only change) (duration: 01m 07s) [18:10:59] (03CR) 10Dzahn: "php-apc on jessie is a transitional package, it depends on php5-apcu. installing it works. it will automatically pull in php5-apcu because" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:39] matthiasmullie: ^ this change should go out on beta whenever beta-scap-eqiad runs next. I just realized that it's frozen, so I'll fix it after SWAT and ping you when that's done. [18:11:45] (03CR) 10Paladox: "> php-apc on jessie is a transitional package, it depends on" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:11:47] (03CR) 10Dzahn: "you are right about the part that it will be removed in stretch, after jessie. for jessie itself it doesn't really matter yet. later it wi" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:12:03] (03CR) 10Paladox: "> you are right about the part that it will be removed in stretch," [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:12:10] thcipriani hah, alright thanks :) [18:12:23] MatmaRex: sure I can merge that one for you, could you put it on the wikitech page? [18:12:36] jan_drewniak: ping for swat [18:12:54] thcipriani: i can if you need me to [18:12:56] (03CR) 10Dzahn: "you can do "if > jessie" or "if >= jessie".." 
[puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:13:46] (03PS2) 10Thcipriani: Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347412 (owner: 10Bartosz Dziewoński) [18:13:51] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347412 (owner: 10Bartosz Dziewoński) [18:13:53] MatmaRex: yes please and thank you :) [18:13:59] (03CR) 10Paladox: "> you can do "if > jessie" or "if >= jessie".." [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:15:03] (03Merged) 10jenkins-bot: Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347412 (owner: 10Bartosz Dziewoński) [18:17:24] (03PS3) 10Paladox: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 [18:18:28] !log disabling mysql replication eqiad -> codfw on s[1-7] and x1 shards T155099 [18:18:32] (03CR) 10jerkins-bot: [V: 04-1] Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [18:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:35] T155099: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099 [18:18:53] (03PS4) 10Paladox: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 [18:19:40] !log thcipriani@naos Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:347412|Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test]] (duration: 00m 53s) [18:19:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:57] ^ MatmaRex patch is live, thanks :) [18:20:04] PROBLEM - Check Varnish expiry mailbox lag on cp2022 is CRITICAL: CRITICAL: expiry mailbox lag 
is 691750 [18:20:43] thanks thcipriani [18:21:04] jan_drewniak: ping me when you're around for SWAT. [18:22:05] (03PS3) 10Eevans: WIP: Create a Cassandra 3.7 configuration [puppet] - 10https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) [18:22:51] (03CR) 10jenkins-bot: db-codfw.php: Increase weight db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349990 (owner: 10Marostegui) [18:23:18] (03CR) 10jerkins-bot: [V: 04-1] WIP: Create a Cassandra 3.7 configuration [puppet] - 10https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) (owner: 10Eevans) [18:23:48] (03CR) 10jenkins-bot: Move labtestwiki to group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349982 (https://phabricator.wikimedia.org/T163694) (owner: 10BryanDavis) [18:24:36] (03CR) 10jenkins-bot: Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347412 (owner: 10Bartosz Dziewoński) [18:25:24] (03CR) 10jenkins-bot: db-codfw.php: Increase weight db2061 and db2043 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349985 (owner: 10Marostegui) [18:27:04] (03CR) 10jenkins-bot: Enable RCFilters beta feature on all wikis except wikidatawiki, nlwiki, cswiki, etwiki and hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343439 (https://phabricator.wikimedia.org/T144458) (owner: 10Catrope) [18:28:06] (03CR) 10jenkins-bot: Ensure wgOresFiltersThresholds relevant keys exist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349971 (owner: 10Dereckson) [18:28:59] (03CR) 10jenkins-bot: db-codfw.php: Repool db2043 and db2061 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/349970 (https://phabricator.wikimedia.org/T163339) (owner: 10Marostegui) [18:30:04] RECOVERY - Check Varnish expiry mailbox lag on cp2022 is OK: OK: expiry mailbox lag is 0 [18:30:16] (03CR) 10jenkins-bot: db-codfw.php: Increase weight db2043 and db2061 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/349979 (owner: 10Marostegui) [18:31:12] (03CR) 10jenkins-bot: Full path to xvfb-run [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348199 (owner: 10Matthias Mullie) [18:34:15] matthiasmullie: ^ your change should be live in beta now, FYI [18:34:38] (03PS1) 10Urbanecm: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350004 (https://phabricator.wikimedia.org/T163726) [18:35:40] Hi there, is there some space for my last minute SWAT patch? It is just a throttle rule. [18:36:05] thcipriani: ^ [18:36:39] Urbanecm: yup, sure [18:36:41] * thcipriani looks [18:37:53] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350004 (https://phabricator.wikimedia.org/T163726) (owner: 10Urbanecm) [18:38:00] thcipriani perfect, thanks! [18:38:52] (03Merged) 10jenkins-bot: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350004 (https://phabricator.wikimedia.org/T163726) (owner: 10Urbanecm) [18:39:01] (03PS5) 10Paladox: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 [18:39:04] (03CR) 10jenkins-bot: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350004 (https://phabricator.wikimedia.org/T163726) (owner: 10Urbanecm) [18:39:08] (03PS6) 10Paladox: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 [18:42:50] !log thcipriani@naos Synchronized wmf-config/throttle.php: SWAT: [[gerrit:350004|New throttle rule]] T163726 (duration: 01m 03s) [18:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:59] T163726: temporary lift of IP cap for editathon - https://phabricator.wikimedia.org/T163726 [18:43:11] ^ Urbanecm new throttle rule is live, thanks for the patch! [18:43:24] Thanks for the deploy thcipriani! [18:47:24] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:10:14] !log cp2026: restart to wm2 varnish package [19:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:53] bblack: ema's patch working out already? :) [19:17:24] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [19:18:06] paravoid: I think so :) [19:18:20] awesome [19:18:23] we had another mitigation over the weekend, which was to re-raise keep [19:18:42] but we expected that even if that did help, it would take ~3d to start helping and it's only been 2 [19:19:03] but maybe it starts helping earlier than we think, because the ~6/10 that haven't restarted to the new code also aren't showing as much sign of trouble as we'd expect [19:22:32] thcipriani: shoot! I was distracted with baby stuff during the SWAT window, lost track of time, I'll just reschedule that portal deploy for tomorrow :/ ... [19:23:23] jan_drewniak: np, sounds good :) [19:27:06] thcipriani: good morning. Do you mind if we had asimple hotfix to get rid of a fatal/log spam ? ( https://gerrit.wikimedia.org/r/#/c/349144/ ) [19:27:58] (03PS1) 10Cmjohnson: Adding mgmt dns for netmon1002 T159756 [dns] - 10https://gerrit.wikimedia.org/r/350009 [19:29:23] hashar: no, I don't mind, we're a wee bit over swat, want me to deploy? [19:29:44] oh my hours / timezone maths are crazy [19:29:55] :) [19:30:04] I +2ed a backport https://gerrit.wikimedia.org/r/#/c/350011/1 [19:35:04] !log mattflaschen@naos Started scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change [19:35:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:49] (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns for netmon1002 T159756 [dns] - 10https://gerrit.wikimedia.org/r/350009 (owner: 10Cmjohnson) [19:39:15] hashar, thcipriani, I'm in the middle of a scap per greg-g, will tell you when done. 
[19:39:22] (03Restored) 10Dzahn: add netmon1002 to site [puppet] - 10https://gerrit.wikimedia.org/r/333780 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [19:39:26] matt_flaschen: thanks :) [19:40:05] (03PS5) 10Dzahn: add netmon1002 to site [puppet] - 10https://gerrit.wikimedia.org/r/333780 (https://phabricator.wikimedia.org/T159756) [19:41:19] (03CR) 10Dzahn: [C: 031] "this adds it with "standard", firewall and network constants.. but as you can see the actual puppet roles are commented out on purpose" [puppet] - 10https://gerrit.wikimedia.org/r/333780 (https://phabricator.wikimedia.org/T159756) (owner: 10Dzahn) [19:42:26] 06Operations, 10ops-eqiad: ripe-atlas-eqiad is down - https://phabricator.wikimedia.org/T163243#3207627 (10Cmjohnson) This needs a c14 to Square/Round end Polarized cord. [19:43:58] thcipriani: bah tests have failed :) [19:46:37] (03CR) 10Dzahn: [C: 04-1] "Yes to the "If" but please add "else" to keep the status quo on trusty and not break iridium." [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [19:47:37] (03CR) 10Dzahn: [C: 04-1] "well.. it wouldn't break it because puppet would not remove the installed package and if we reinstall it we want jessie anyways.. but stil" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [19:49:26] (Don't kill me) but can someone disable my wikitech's 2fa? [19:52:11] !log mattflaschen@naos Finished scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change (duration: 17m 06s) [19:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:39] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack and setup boron replacement frpm1001 - https://phabricator.wikimedia.org/T162298#3207704 (10Cmjohnson) frpm1001 is racked in c1 connected pfw1-0/7. idrac and bios are setup. 
[19:53:51] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack and setup boron replacement frpm1001 - https://phabricator.wikimedia.org/T162298#3207705 (10Cmjohnson) p:05Triage>03Normal [19:56:13] Zppix: yeah, new phone? [19:57:04] i think due to the security of it, you may need to file a ops-access-request since its more secure than irc [19:57:12] but otherwise should be able to strip 2fa off your wikitech account [19:57:46] the request is just for audit trail + account verification, ther isnt any kind of waitin gperiod for what you are asking that im aware of [20:00:05] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T2000). Please do the needful. [20:00:15] Nothing today for ORES [20:00:20] no parsoid deploy today [20:09:21] (03PS7) 10Paladox: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 [20:10:19] robh: want me to cc you on it? [20:11:39] nah im on clinic duty so ill see it anyhow [20:11:47] if its in ops-access-requests, which is where i would file it [20:12:07] 06Operations, 10Ops-Access-Requests: 2fa disable on Wikitech Account for Zppix - https://phabricator.wikimedia.org/T163736#3207761 (10Zppix) [20:12:17] hashar, thcipriani, scap complete. Sorry, there was a bug and I was trying to see if it was related to my deploy (no). [20:12:39] Zppix: has anyone given you a hard time about not having your 5 one time use codes? 
;] [20:12:42] matt_flaschen: np, thanks for the ping :) [20:12:44] if not, consider that it ^ [20:13:04] robh: err umm no comment [20:14:01] our password reset procedures were written pre-phabricator adoption [20:14:02] thcipriani: and my backport patch is voted v-1 because CI is wrong :( [20:14:13] i tihnk its fair to consider phab requests as valid id confirmation steps [20:14:15] hashar: I was just looking at that [20:14:18] (much like a home directory use or whatever) [20:14:43] robh: i can ssh into something if you wish [20:14:44] thcipriani: in short: the mediawiki-extensions jobs running on a wmf.xx branch ends up cloning wikidata @ master [20:14:54] Zppix: nah, im just echoing my thought process,heh [20:14:55] thcipriani: because wikidata is not branched [20:15:19] https://wikitech.wikimedia.org/wiki/Password_reset#Reset_two_factor_authentication [20:15:32] hashar: ah, so zuul-cloner is falling back to master since there is no wmf.20 for that repo? [20:15:37] thcipriani: correct [20:15:44] 06Operations, 10ops-eqiad: ocg1001.eqiad.wmnet ipmi error - https://phabricator.wikimedia.org/T155692#3207780 (10Cmjohnson) While this server was down I did confirm that ipmi is enabled. reset racadm. [20:15:56] I am pretty sure there is a task about it already. But really for wmf branches we should clone mediawiki and process submodules [20:16:39] that might take a while :) [20:16:55] https://phabricator.wikimedia.org/T113731 [20:18:14] robh: can you ping me when you're doing whatever it is you do? [20:18:22] done* [20:18:22] just finished [20:18:27] ok thanks [20:18:34] irc bot echo is slow it seems [20:18:49] but you should be able to log in now, you will want to setup your 2FA again [20:18:57] or you'll not have access to the advanced panels that require it [20:19:18] robh: ack thanks (maybe this time i wont be so stupid :P ) [20:20:04] i already knew how to do it cuz ive had to do it for myself.... 
[20:20:25] heh [20:22:14] RECOVERY - Host ocg1001 is UP: PING OK - Packet loss = 0%, RTA = 36.99 ms [20:25:14] PROBLEM - dhclient process on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:14] PROBLEM - Check whether ferm is active by checking the default input chain on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:14] PROBLEM - OCG health on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:14] PROBLEM - Check size of conntrack table on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:14] PROBLEM - Disk space on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:15] PROBLEM - salt-minion processes on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:15] PROBLEM - puppet last run on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:16] PROBLEM - DPKG on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:25:16] PROBLEM - configured eth on ocg1001 is CRITICAL: Return code of 255 is out of bounds [20:29:17] thcipriani: and in short I dont have any idea how to fix it :( [20:31:32] ocg1001 - ACK [20:31:56] So I've just searched wikitech, mediawiki.org and phab for about 15 minutes and am even more confused than whan I started: WTF is "labswiki"? It seems this is some part of beta.wmflabs.org. And then again, it seems this is wikitech. 
o_O [20:32:14] ACKNOWLEDGEMENT - Check size of conntrack table on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - Check whether ferm is active by checking the default input chain on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - DPKG on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - Disk space on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - MD RAID on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - NTP on ocg1001 is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:15] ACKNOWLEDGEMENT - OCG health on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:16] ACKNOWLEDGEMENT - configured eth on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:16] ACKNOWLEDGEMENT - dhclient process on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:17] ACKNOWLEDGEMENT - puppet last run on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:17] ACKNOWLEDGEMENT - salt-minion processes on ocg1001 is CRITICAL: Return code of 255 is out of bounds daniel_zahn https://phabricator.wikimedia.org/T155692 [20:32:38] eddiegp: "labswiki" sounds like wikitech to me [20:32:59] eddiegp: you should now start calling it "cloudwiki" instead i guess :p [20:33:11] (no, j/k, it's wikitechwiki) [20:33:45] eddiegp: in which 
context do you see that term? [20:33:54] thcipriani: anyway I have added https://gerrit.wikimedia.org/r/#/c/350011/ to the evening swat but I will not be around. The patch will need to be force merged :( [20:34:06] Okay thanks. Just wondering, as there were a lot of things refering to "labtestwikitech.wikimedia.org" too. [20:34:07] thcipriani: if that does not land today, I will babysit it tomorrow :] [20:36:23] (03PS1) 10Cmjohnson: Adding public ipv4 dns for netmon1002 T159756 [dns] - 10https://gerrit.wikimedia.org/r/350028 [20:37:23] mutante: I've read hashar stating "labswiki has not been updated" on T160686. I though some quick search should give me a hint what this is, but it didn't. [20:37:24] T160686: ug_expiry column of the user_groups table is not present on Labs - https://phabricator.wikimedia.org/T160686 [20:37:38] (03PS2) 10Cmjohnson: Adding public ipv4 dns for netmon1002 T159756 [dns] - 10https://gerrit.wikimedia.org/r/350028 [20:37:50] (03PS2) 10Jdlrobson: Correctly enforce config for related pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/348472 (https://phabricator.wikimedia.org/T163114) [20:38:02] (03CR) 10Cmjohnson: [C: 032] Adding public ipv4 dns for netmon1002 T159756 [dns] - 10https://gerrit.wikimedia.org/r/350028 (owner: 10Cmjohnson) [20:38:21] eddiegp: ok, so i can confirm that is wikitech. i know from "silver labswiki ". 
silver is the server name [20:38:31] silver runs wikitech [20:38:50] labswiki is the dbname of wikitech wiki [20:40:27] (03PS1) 10Jdlrobson: Disable Page previews beta feature on Wiktionary and Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350038 (https://phabricator.wikimedia.org/T163205) [20:41:22] eddiegp: also: https://wikitech.wikimedia.org/wiki/Labs_labs_labs | https://lists.wikimedia.org/pipermail/wikimedia-l/2017-February/086343.html [20:42:07] mutante: Yeah, I think I just found T72108, that makes clear there was something named "labswiki" on beta.wmflabs.org that was renamed to "deploymentwiki" deployment.wikimedia.beta.wmflabs.org in favor of the wikitech database name. [20:42:08] T72108: Rename labswiki to deploymentwiki - https://phabricator.wikimedia.org/T72108 [20:42:31] (03PS4) 10Eevans: WIP: Create a Cassandra 3.7 configuration [puppet] - 10https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) [20:42:39] Looking at "labs labs labs" was my first try. Going to add it there. [20:42:54] eddiegp: aha! yea, that sounds right.. good idea to add it there. thank you [20:42:56] eddiegp: yup labswiki (wikitech) database missed the "ug_expiry" column [20:43:18] that caused a brief outage when we have enabled the "user group expiry" feature on all wikis [20:43:41] got missed when the schema got updated via https://phabricator.wikimedia.org/T155605 [20:43:42] hashar: I've got all of that, I just didn't get what wiki "labswiki" is. But that's clear by now ;) [20:43:54] ah ok :] [20:44:02] and afaik it runs on a single host: silver [20:44:53] it does. but changes can be tested on labtestweb2001 [20:45:03] well, that is codfw and has the same role of nova manager [20:45:09] 06Operations, 10MediaWiki-General-or-Unknown, 10Traffic, 07HTTPS: Make default interwiki map links protocol-relative - https://phabricator.wikimedia.org/T33327#353861 (10demon) I disagree. 
We should use https for those that support it, and http for those that don't. Protocol-relative URLs were a useful to... [20:45:40] i suppose codfw labs is 'soon come' [20:47:53] mutante: that would probably be a pain considering everything labs runs [20:49:54] 06Operations, 10ops-eqiad, 10netops: Spread eqiad analytics Kafka nodes to multiple racks ans rows - https://phabricator.wikimedia.org/T163002#3207869 (10elukey) @Cmjohnson yep exactly! But it should be done before the 26th, the major goal is to avoid to loose two kafka nodes for extended maintenance at the... [20:50:14] Zppix: oh, i'm sure it's a lot of work, yea. just pointing out i already see a server in codfw with that role [20:50:35] (the nova::manager / webserver, not all of labs) [20:51:00] mutante: you arent apart of the MediaWiki gerrit group are you? [20:51:26] Zppix: no, my gerrit group is just the Ops group [20:51:54] dang i really need this usergroup ext merge and no-one with +2 rights seems to be active [20:52:28] Zppix: a change in an extension? [20:52:35] https://gerrit.wikimedia.org/r/346539 mutante [20:53:48] Zppix: #wikimedia-operations isn't the best place to ask review for MediaWiki code, you could try #mediawiki and #wikimedia-dev, look also in Git history who committed lastly and ask them to review [20:54:07] Zppix: can you check git log to see who changed it last? it's not so much about Gerrit rights, but it's about that it needs deployment [20:54:12] heh, basically all that Dereckson said [20:54:21] mutante: extentsions require deployment? 
[20:54:42] (doesnt do much with extenstions usually) [20:55:00] i assume they do, yea, how is the code going to get on the servers [20:55:05] Dereckson: i only asked mutante because i was already talking with him so i thought hey while we're chatting i would ask [20:55:12] another trick: `git blame theFileYouEdited`, so you'll know who wrote the lines around yours and so who could review [20:55:15] mutante: the same as mw core changes [20:55:40] Zppix: yea, that's deployment [20:56:09] But you don't have to care about that manually if it isn't urgent. The train will pick it up. [20:57:33] thats what i thought [20:58:25] *nod* the train conductors know this better than i do [20:58:42] Zppix: the extension seems unmaintened by the way, https://phabricator.wikimedia.org/p/Withoutaname/ [20:58:58] Dereckson: thats why i am asking for someone thats in mw group [20:59:05] and it's not on the Wikimedia cluster [20:59:17] oh [21:00:04] dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T2100). Please do the needful. [21:00:24] Well, then I haven't said anything, the train won't care about THAT for some reason ;D [21:00:34] heh [21:01:38] choo choo [21:01:47] Did someone say train? 
[21:03:50] RainbowSprinkles: in the context to "how non hotfixes extension code is deployed" [21:03:57] (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/6213/" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [21:04:37] They ride the train :) [21:04:44] Same as skins, vendor, and core [21:05:52] https://en.wikipedia.org/wiki/List_of_PHP_accelerators#Compatibility_chart [21:05:54] woops [21:05:56] wrong place [21:16:16] 06Operations, 06Release-Engineering-Team, 10vm-requests, 07Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3207976 (10demon) [21:23:25] (03PS8) 10Dzahn: Phabricator: Install php5-apcu on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [21:24:33] (03CR) 10Dzahn: [C: 032] "no-op on prod (iridum) per compiler" [puppet] - 10https://gerrit.wikimedia.org/r/349793 (owner: 10Paladox) [21:27:50] 06Operations, 06Release-Engineering-Team, 10vm-requests, 07Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3207976 (10Dzahn) Should this exist in both DCs? one in eqiad one in codfw per default nowadays? [21:30:30] (03PS1) 10Dereckson: Enable WikidataPageBanner on vi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350085 (https://phabricator.wikimedia.org/T163662) [21:50:20] 06Operations, 06Release-Engineering-Team, 10vm-requests, 07Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3208120 (10demon) >>! In T163743#3208114, @demon wrote: >>>! In T163743#3208018, @Dzahn wrote: >> Should this exist in both DCs? one in eqiad o... [21:53:22] !log Updated the sites and site_identifiers tables on all Wikidata clients for dtywiki T161529. 
[21:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:53:29] T161529: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529 [22:14:09] (03PS5) 10Eevans: WIP: Create a Cassandra 3.7 configuration [puppet] - 10https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) [22:15:36] (03PS6) 10Eevans: WIP: Create a Cassandra 3.7 configuration [puppet] - 10https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) [22:29:04] (03CR) 10Ottomata: [C: 031] ":D" [puppet] - 10https://gerrit.wikimedia.org/r/348938 (https://phabricator.wikimedia.org/T159136) (owner: 10Elukey) [22:29:22] is it expected that wikipedia should be having performance issues https://status.wikimedia.org/169667/https-services---wikipedia ? [22:31:30] there seems to be service distruption on https://status.wikimedia.org/155930/Wiki-platform-[[w:en:Special:Random]] [22:35:31] watchmouse isn't the best tool to rely on for status, especially considering there hasn't been any other alerts raised elsewhere [22:35:51] that ^, it's not trustworthy [22:36:27] Oh, if it isent trustyworthy shouldent it be replaced with the source we trusty? [22:36:51] trusty = trust [22:37:32] It's basically "take this all with a grain of salt" -- a minor performance blip on them is probably just noise [22:37:50] it's predominately just for us to have uptime metrics [22:38:01] ok [22:38:06] ^ That. It's best for basic "is it broadly up/down" [22:46:09] !log deploy patch for T155277 [22:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:52:50] (03PS1) 10Kaldari: Enable cookie blocking on all production wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/350090 (https://phabricator.wikimedia.org/T162651) [23:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. 
Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170424T2300). [23:00:05] hashar and jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:21] paladox: you can't really have external automated status that can be 100% due to a large number of factors that can interfere with the data collection [23:00:35] ok [23:01:15] hashar isn't here but I'll be responsible for his patch [23:01:36] actually I'll swat if nobody else wants to [23:01:43] jdlrobson: you around? [23:01:51] twentyafterfour: yup [23:02:42] https://gerrit.wikimedia.org/r/#/c/348472/ good to go? [23:03:03] yup [23:03:04] ! [23:04:30] ok wth... both tin and mira have the "this is not the active deployment server." message [23:04:59] twentyafterfour: naos.codfw [23:05:10] ah [23:06:00] (CR) 20after4: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/348472 (https://phabricator.wikimedia.org/T163114) (owner: Jdlrobson) [23:07:15] (Merged) jenkins-bot: Correctly enforce config for related pages [mediawiki-config] - https://gerrit.wikimedia.org/r/348472 (https://phabricator.wikimedia.org/T163114) (owner: Jdlrobson) [23:07:23] (CR) jenkins-bot: Correctly enforce config for related pages [mediawiki-config] - https://gerrit.wikimedia.org/r/348472 (https://phabricator.wikimedia.org/T163114) (owner: Jdlrobson) [23:10:12] ok what host is mwdebug in codfw? [23:12:08] mwdebug2001 or so [23:12:34] I don't see any such thing in dns or puppet [23:12:53] yeah... I guess we still use 1001? [23:13:01] hrmmm [23:13:21] Operations, Performance-Team, User-fgiunchedi: Backfill restored coal whisper files with current data - https://phabricator.wikimedia.org/T163194#3208300 (Krinkle) Open>Resolved Starting with merging data on graphite2001. ```lines=5,name=Backup current data [22:28 UTC] krinkle at graphite20...
[23:13:30] Also, all canaries are in eqiad. [23:13:37] https://github.com/wikimedia/puppet/blob/228efb11dcb81ce889ccb26f74e82498fc9d81de/manifests/site.pp#L1954 [23:14:17] No codfw canaries is prolly a bad thing [23:14:18] well the patch is on mwdebug1001 [23:14:49] jdlrobson: what's needed to test? [23:14:59] ill test right now [23:15:11] debug extension already supports two canaries in codfw [23:15:26] mw2017 mw2099 [23:15:32] Just named differently [23:16:07] im confused.. where should I be testing? mwdebug1001 doesnt seem to work for me [23:16:23] ok the patch is on mw2017 now [23:16:42] mw2017 is analogous to debug1001, mw2099 to debug1002 [23:16:46] jdlrobson: mwdebug1001 is eqiad we are codfw atm [23:16:46] jdlrobson: does that one work? [23:17:33] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Available_backends [23:17:41] Should probably file a task for ops to rename those [23:17:55] I can rename them right now if you wish Krinkle [23:18:17] Zppix: How? [23:18:29] you mean the page correct? [23:18:33] Zppix: No. [23:18:37] The names on the page are correct. [23:18:41] oh nevermind i misunderstood [23:18:43] But we want the names to be different. [23:19:28] (CR) 20after4: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/350038 (https://phabricator.wikimedia.org/T163205) (owner: Jdlrobson) [23:19:54] twentyafterfour: i can't see it on mw2017 either.. [23:20:12] hmmm... [23:20:35] thcipriani: is there some magic that I'm missing between git pull on the deploy master and scap pull on the debug server?
[23:20:44] (Merged) jenkins-bot: Disable Page previews beta feature on Wiktionary and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/350038 (https://phabricator.wikimedia.org/T163205) (owner: Jdlrobson) [23:20:52] (CR) jenkins-bot: Disable Page previews beta feature on Wiktionary and Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/350038 (https://phabricator.wikimedia.org/T163205) (owner: Jdlrobson) [23:21:07] e.g do I need to do something to sync the mediawiki and mediawiki-staging directories? [23:21:33] twentyafterfour: shouldn't be [23:22:02] twentyafterfour: my patch sucks [23:22:05] i think that's the issue here. [23:22:40] but the patch is definitely not on mwdebug1002 [23:23:04] twentyafterfour: and neither are you: where are you pulling this? [23:23:45] (PS1) Jdlrobson: Follow up to Ia1a6da9d844bd3ca7288a8d4f5f81a955e4061b4 [mediawiki-config] - https://gerrit.wikimedia.org/r/350095 [23:23:55] ah, 2017.../me reads scrollback [23:23:58] https://gerrit.wikimedia.org/r/350095 < twentyafterfour thcipriani [23:24:03] it's missing a `m` [23:24:23] jdlrobson: oh [23:24:56] ah, yeah, that patch is there on mw2017 [23:24:57] sorry :/ [23:25:04] * thcipriani wanders away [23:25:33] jdlrobson: I already jumped the gun and +2'd the other patch, can we deploy the two together or should I do some git wrangling to reorder them?
[23:26:02] we can deploy the two together [23:26:05] ok cool [23:26:10] (CR) 20after4: [C: 2] Follow up to Ia1a6da9d844bd3ca7288a8d4f5f81a955e4061b4 [mediawiki-config] - https://gerrit.wikimedia.org/r/350095 (owner: Jdlrobson) [23:26:15] as long as i can test them together :) [23:26:28] should be able to [23:26:44] we'll just call it integration testing [23:27:04] integration testing: the hard way [23:27:15] (Merged) jenkins-bot: Follow up to Ia1a6da9d844bd3ca7288a8d4f5f81a955e4061b4 [mediawiki-config] - https://gerrit.wikimedia.org/r/350095 (owner: Jdlrobson) [23:27:23] (CR) jenkins-bot: Follow up to Ia1a6da9d844bd3ca7288a8d4f5f81a955e4061b4 [mediawiki-config] - https://gerrit.wikimedia.org/r/350095 (owner: Jdlrobson) [23:28:27] ok jdlrobson, it should be on mw2017.codfw now [23:28:38] all three patches [23:30:46] twentyafterfour: so.. the wiktionary/wikidata one is fine. [23:30:58] twentyafterfour still something wrong with the other two [23:30:59] i dont get it [23:31:32] hmm [23:33:04] twentyafterfour: are you able to see what the value of $wgRelatedArticlesFooterWhitelistedSkins is for ruwiki ? [23:34:10] twentyafterfour, can I deploy security patches after the swat? [23:35:02] MaxSem: sure, I'll let you know as soon as this is resolved [23:35:21] (PS1) Thcipriani: Scap: update version to 3.5.6-1 [puppet] - https://gerrit.wikimedia.org/r/350096 [23:35:24] jdlrobson: I'm looking at the file on disk but not sure how to test in process [23:35:50] I see 'htwiki' => [ 'minerva', 'vector' ], [23:35:54] but nothing about ruwiki [23:36:51] and that's the only mention of wmgRelatedArticlesFooterWhitelistedSkins in InitializeSettings [23:38:02] twentyafterfour: it's using a dblist [23:38:41] im not sure how to proceed.. it doesnt seem to introduce any regressions but it's not helping either. We can deploy the wiktionary/wikidata patch and I can try work this out another time?
[23:39:03] the config is behaving in a very strange way though [23:39:05] ok if it's not going to cause problems we'll just deploy all three then? [23:39:10] it shouldnt do [23:39:16] ill let you know quickly if it does [23:39:40] ok going live then :-o [23:39:41] (PS7) Eevans: Create a Cassandra 3.7 configuration [puppet] - https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) [23:40:14] crosses fingers.. [23:41:04] syncing [23:41:47] hmm we didn't get any log message from scap [23:41:51] !log twentyafterfour@naos Synchronized wmf-config/: deploy https://gerrit.wikimedia.org/r/#/c/348472/ refs T163114 (duration: 01m 05s) [23:41:59] oh there it is [23:41:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:42:00] T163114: Regression: Fix config to disable related pages where it's not wanted - https://phabricator.wikimedia.org/T163114 [23:42:06] jdlrobson: done, test? [23:43:10] twentyafterfour: thanks.. seeing same as in debug mode :/ not sure what's going on with that related pages link [23:43:19] who's the best person to talk to about how our configs work?
[23:44:11] hmm I don't know, I am very much clueless about the mediawiki settings spaghetti [23:44:52] I know that there are circular includes() chains in there and that alone is enough to bother me greatly [23:45:03] s/includes/include/ [23:45:35] ok MaxSem you wanted to deploy security patches [23:45:43] thanks twentyafterfour [23:45:47] I have one more for swat but I can wait until after you're done [23:45:54] since nobody else is waiting to test it [23:45:56] ehm, go ahead then [23:46:10] no you can go :) it's cool, security patches are probably more important [23:46:21] twentyafterfour, dunno how long it will take me so you first [23:46:29] ok then, I'll be quick [23:46:54] assuming jenkins cooperates [23:53:08] jdlrobson: we are now generating a bunch of "Notice: Undefined variable: wmgRelatedArticlesFooterWhitelistedSkins in /srv/mediawiki/wmf-config/CommonSettings.php on line 2878" [23:53:16] wahh [23:53:23] yeah wth I don't get it [23:53:30] WTF [23:55:02] twentyafterfour: i dont know what to suggest [23:55:34] (CR) Eevans: [C: 1] "PC output: http://puppet-compiler.wmflabs.org/6216" [puppet] - https://gerrit.wikimedia.org/r/349668 (https://phabricator.wikimedia.org/T160570) (owner: Eevans) [23:58:35] (PS1) BBlack: block ancient chrome [puppet] - https://gerrit.wikimedia.org/r/350098 [23:58:44] twentyafterfour: i also have to go.. im having a bad migraine and need to put down my laptop [23:59:31] jdlrobson: ok I'll take care of reverting it [23:59:33] twentyafterfour: are you sure the errors are not old? [23:59:47] i'd really like to understand WTF is happening here as it makes no sense to me [23:59:55] the errors are brand new