[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T0000). Please do the needful.
[00:00:04] MaxSem: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:20] I'll do it
[00:00:46] (PS2) MaxSem: Deploy GlobalPreferences in Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/410267 (https://phabricator.wikimedia.org/T184668)
[00:00:52] (CR) MaxSem: [C: 2] Deploy GlobalPreferences in Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/410267 (https://phabricator.wikimedia.org/T184668) (owner: MaxSem)
[00:03:20] (Merged) jenkins-bot: Deploy GlobalPreferences in Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/410267 (https://phabricator.wikimedia.org/T184668) (owner: MaxSem)
[00:03:34] (CR) jenkins-bot: Deploy GlobalPreferences in Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/410267 (https://phabricator.wikimedia.org/T184668) (owner: MaxSem)
[00:09:10] Operations, ops-eqsin, DC-Ops, Traffic: singapore caching center: eqiad staging tracking task - https://phabricator.wikimedia.org/T166179#3287561 (faidon) @RobH, this can be resolved now, right?
[00:09:43] Operations, ops-eqsin, DC-Ops, Traffic: singapore caching center: eqiad staging tracking task - https://phabricator.wikimedia.org/T166179#3974242 (RobH) stalled>Resolved Yep, was just cleaning up the sub-tasks. Nothing left for this!
[00:10:40] RECOVERY - puppet last run on ms-be1038 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[00:23:23] Operations, ops-eqsin, Traffic: dns5002 mgmt console unreachable - https://phabricator.wikimedia.org/T186902#3974268 (BBlack) Sounds about right to me. But let's do the other two in T187158 and T187157 as well and maybe get more value out of the time. cp5006 and cp5010 both have "working" managemen...
[00:39:28] (PS4) Krinkle: webperf: Re-use expected result by reference to simplify fixture [puppet] - https://gerrit.wikimedia.org/r/404045
[00:39:33] (PS2) Krinkle: webperf: Introduce 'templates' in test fixture and use for mwload [puppet] - https://gerrit.wikimedia.org/r/404046
[00:43:07] Operations, Collection, OfflineContentGenerator, Readers-Web-Backlog (Tracking), Services (watching): Remove deprecated features from book creator UI - https://phabricator.wikimedia.org/T150917#3974331 (Jdlrobson)
[00:43:21] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:45:03] MaxSem: did you deploy?
[00:45:51] greg-g: spent a long time trying to figure out why beta is borked. turns out, I created the table on the wrong host
[00:46:01] can push now
[00:46:13] solid
[00:49:56] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/410267/ (duration: 01m 13s)
[00:50:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:52:49] !log maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/410267/ (duration: 01m 14s)
[00:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:55:15] (PS1) 20after4: Phabricator: Increase zend opcache limits in php.ini [puppet] - https://gerrit.wikimedia.org/r/410626
[00:55:43] {{done}}
[00:56:34] mutante: when you have a couple of minutes, I'd like to deploy ^ but it's not urgent
[01:00:04] twentyafterfour: That opportune time is upon us again. Time for a Phabricator update deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T0100).
[01:00:04] No GERRIT patches in the queue for this window AFAICS.
[01:02:03] (CR) Dzahn: [C: 2] Phabricator: Increase zend opcache limits in php.ini [puppet] - https://gerrit.wikimedia.org/r/410626 (owner: 20after4)
[01:03:42] !log the scheduled phabricator upgrade is delayed until 06:00 UTC Thursday because of large database migrations. Doing the upgrade at a time when DBAs are available to assist.
[01:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:03:58] !log using the current phabricator maintenance window to deploy https://gerrit.wikimedia.org/r/#/c/410626/
[01:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:14] mutante: cool thanks!
[01:04:15] twentyafterfour: it's merged on the master
[01:04:17] yw
[01:04:24] I'll run puppet and then bump apache
[01:04:36] it happened to be already running.. cool
[01:13:01] PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:21] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[01:13:51] RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 73885 bytes in 0.148 second response time
[01:16:01] twentyafterfour: we should also raise the php execution timeout from 10 secs to the default of 30?
[01:16:06] Or even 20?
[01:23:20] paladox: I don't know...
[01:23:26] maybe 15?
[01:23:31] Yep
[01:23:42] Would at least possibly fix the timeout issue
[01:23:59] (Task is somewhere under phab)
[01:23:59] 30 is a long time to burn cpu for nothing. Usually if something doesn't complete in 10 seconds it's not going to complete in 20 or 30 either
[01:24:12] Oh
[01:26:56] twentyafterfour: I think 15 or 20 will help one of the tasks
[01:27:20] !log restarting apache2 on phab1001 to free deadlocked php processes.
[01:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:29] twentyafterfour: https://phabricator.wikimedia.org/T125357
[01:30:35] twentyafterfour: I think the increase will help ^^
[01:31:03] Also may help large repos if mw becomes large enough though that’s years away :)
[01:31:55] is optimistic that this was the last irregular restart needed already :)
[01:44:48] mutante: do you know why we lost apache_status on phab1001?
[01:44:56] https://grafana-admin.wikimedia.org/dashboard/db/phabricator?orgId=1&from=now-1h&to=now
[01:45:17] apache metrics at the bottom are gone and running apache-status in the shell doesn't return the proper result
[01:45:28] twentyafterfour: i think i do ...
[01:45:46] these were helpful for watching the pool gradually dwindle
[01:45:48] because it was removed in the httpd module and we switched to it from apache module
[01:45:54] oh
[01:46:13] uhm.. there was a question about it on another unmerged change as well..
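The apache-status metrics that went missing come from Apache's mod_status page, whose scoreboard is also how workers stuck in "Gracefully finishing" (the T182832 symptom) become visible. The following is a minimal sketch of reading that scoreboard, assuming the status page is served in its machine-readable `?auto` form; the local URL and the function names are hypothetical, not taken from the puppet httpd module.

```python
from collections import Counter
from urllib.request import urlopen

# Scoreboard keys as documented for Apache mod_status.
SCOREBOARD_STATES = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def parse_scoreboard(auto_output: str) -> Counter:
    """Count worker slots per state from mod_status '?auto' output."""
    for line in auto_output.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_STATES.get(ch, "Unknown") for ch in board)
    raise ValueError("no Scoreboard line in server-status output")


def fetch_status(url: str = "http://127.0.0.1/server-status?auto") -> str:
    """Fetch the machine-readable status page (hypothetical local URL)."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Usage with a made-up sample instead of a live server:
sample = "Total Accesses: 120\nBusyWorkers: 3\nIdleWorkers: 2\nScoreboard: __WGG.K\n"
counts = parse_scoreboard(sample)
```

Tracking `counts["Gracefully finishing"]` over time would show the same slow leak the Grafana panels were used for.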
[01:46:16] i'll get into it
[01:46:27] i can also revert that first
[01:47:46] https://gerrit.wikimedia.org/r/#/c/400100/
[01:48:19] twentyafterfour: i'll fix it, it's supposed to be there.. if i can have a few minutes though
[01:48:32] or i can revert and fix it tomorrow
[01:49:08] twentyafterfour: https://gerrit.wikimedia.org/r/#/c/408947/
[01:49:33] I like the looks of it :)
[01:49:48] the apache module was intimidating to use
[01:49:49] except that the status page should not have been affected :) *nod*
[01:49:56] lots of implicit behavior in the old code
[01:50:10] the main reason was to stop that everybody gets a jenkins-bot -1 when using apache module
[01:50:18] hahah
[01:50:18] for including from another module
[01:50:44] previously it was hard to have nginx and apache on the same machine, is that still the case?
[01:50:48] * twentyafterfour reads the code
[01:51:26] twentyafterfour: if it’s on a different port
[01:51:41] Or if it is pinned to the ip + port
[01:52:25] I think the problem was something else, they had resource conflicts between them... but actually I think that was fixed a while ago in apache/nginx modules
[01:52:58] twentyafterfour: the phab project uses the httpd module now :)
[01:53:13] So the conflicts should be gone if there was any :)
[01:53:51] i'm not sure about the mixing apache and nginx on one machine yet, haven't had to try it
[01:54:17] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974521 (mmodell)
[02:01:39] !log phab1001 - restarted apache to fix server status page
[02:01:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:02:19] ok, i know how to fix it, now just need to puppetize
[02:02:30] twentyafterfour hi i get this error
[02:02:30] Unable to load libphutil. Put libphutil/ next to phabricator/, or update your PHP 'include_path' to include the parent directory of libphutil/.
[02:02:35] when testing the latest update
[02:02:39] https://phab.wmflabs.org
[02:11:57] (PS1) Dzahn: httpd: add server status page config from apache module [puppet] - https://gerrit.wikimedia.org/r/410630
[02:16:46] compiling and back in few min
[02:18:20] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:18:25] oh i forgot the restart of apache2
[02:18:27] fixed now
[02:21:10] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[02:23:54] (Draft1) Paladox: Phabricator: Raise php max_execution_time to 15 [puppet] - https://gerrit.wikimedia.org/r/410631 (https://phabricator.wikimedia.org/T125357)
[02:23:58] (Draft2) Paladox: Phabricator: Raise php max_execution_time to 15 [puppet] - https://gerrit.wikimedia.org/r/410631 (https://phabricator.wikimedia.org/T125357)
[02:24:02] twentyafterfour mutante ^^
[02:25:43] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.20) (duration: 07m 25s)
[02:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:34:41] (PS2) Dzahn: httpd: add server status page config from apache module [puppet] - https://gerrit.wikimedia.org/r/410630
[02:37:48] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9984/" [puppet] - https://gerrit.wikimedia.org/r/410630 (owner: Dzahn)
[02:41:13] twentyafterfour: fixed!
[phab1001:~] $ apache-status
[02:43:51] (CR) Dzahn: "https://gerrit.wikimedia.org/r/#/c/410630/ fixed the server-status page part" [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn)
[02:45:06] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974546 (mmodell) This looks much better now: ``` _...
[02:46:59] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974547 (Dzahn) I fixed the server status page. it's...
[03:00:37] (PS7) Dzahn: otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462
[03:05:26] (CR) Dzahn: "status page and perl2 module are here: http://puppet-compiler.wmflabs.org/9986/mendelevium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn)
[03:48:40] Krinkle: nice necrobump ;)
[03:53:20] PROBLEM - HP RAID on db2048 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK
[03:53:22] ACKNOWLEDGEMENT - HP RAID on db2048 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T187419
[03:53:25] Operations, ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3974567 (ops-monitoring-bot)
[05:40:28] !log andrew@tin Started deploy [horizon/deploy@c355366]: testing a couple of cherry-picks in horizon
[05:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:16] Operations, ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3974642 (jcrespo)
[05:43:19] Operations, ops-codfw, DBA: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187328#3974644 (jcrespo)
[05:43:33] !log andrew@tin Finished deploy [horizon/deploy@c355366]: testing a couple of cherry-picks in horizon (duration: 03m 06s)
[05:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:00] Operations, ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3974567 (jcrespo) p:Triage>Normal a:Papaul
[05:44:44] Operations, ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3974567 (jcrespo) It finally failed completely.
[05:53:14] (PS4) Jcrespo: [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696)
[05:53:16] (PS1) Jcrespo: mariadb: Switchover dbproxy1008 from db1043 to db1059 [puppet] - https://gerrit.wikimedia.org/r/410640 (https://phabricator.wikimedia.org/T187143)
[05:53:40] (CR) jerkins-bot: [V: -1] [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[05:53:46] (CR) jerkins-bot: [V: -1] mariadb: Switchover dbproxy1008 from db1043 to db1059 [puppet] - https://gerrit.wikimedia.org/r/410640 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[05:54:40] (PS2) Jcrespo: mariadb: Switchover dbproxy1008 from db1043 to db1059 [puppet] - https://gerrit.wikimedia.org/r/410640 (https://phabricator.wikimedia.org/T187143)
[05:55:07] morning
[05:56:47] (CR) Marostegui: [C: 1] mariadb: Switchover dbproxy1008 from db1043 to db1059 [puppet] - https://gerrit.wikimedia.org/r/410640 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[05:57:33] !log restarting dbproxy1008 for kernel upgrade
[05:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:56] (CR) Jcrespo: [C: 2] mariadb: Switchover dbproxy1008 from db1043 to db1059 [puppet] - https://gerrit.wikimedia.org/r/410640 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:06:31] !log Upgrade mysql on db1110
[06:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:38] (PS1) Jcrespo: Point m3-master to dbproxy1008 [dns] - https://gerrit.wikimedia.org/r/410641 (https://phabricator.wikimedia.org/T187143)
[06:09:38] (CR) Marostegui: [C: 1] Point m3-master to dbproxy1008 [dns] - https://gerrit.wikimedia.org/r/410641 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:11:31] (PS1) Marostegui: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410642
[06:12:35] (PS1) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[06:12:46] (PS2) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[06:13:02] (CR) Jcrespo: [C: -1] "Not before dns failover and reimage." [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:14:08] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410642 (owner: Marostegui)
[06:15:36] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410642 (owner: Marostegui)
[06:16:57] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410642 (owner: Marostegui)
[06:17:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 13s)
[06:17:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:39] twentyafterfour, marostegui
[06:27:48] yeah, better here
[06:28:35] !log scheduling downtime for phabricator on phab1001
[06:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:52] you taking care of icinga?
[06:28:54] ok
[06:29:03] I've got the phab code update ready to go
[06:29:08] and I'm logged into phab1001
[06:29:12] Check systemd state is disabled, is that intended?
[06:29:31] it doesn't page, can I remove the disable alerting?
[06:29:43] I think it's intentional but not sure
[06:29:49] mutante did that I believe
[06:30:08] * twentyafterfour looks
[06:30:13] ok, I will ask him later
[06:30:30] so I will now set db1043 as read only
[06:30:33] phd and apache checks need to be downtimed
[06:30:35] +1
[06:30:46] twentyafterfour: which one did I miss?
[06:30:52] I'll do phd
[06:30:58]
[06:31:00] PHD should be running
[06:31:03]
[06:31:05] PHD should be supervising processes
[06:31:08] those go offline when I do the update
[06:31:09] oh, I see
[06:31:13] please do those
[06:31:22] done
[06:31:40] marostegui: aside from looking at monitoring, can you catch the db1043 and db1059 coords when I set read only?
[06:31:46] yep
[06:32:10] let me know when you are ready and I would set phabricator in read only mode, then the database
[06:32:26] twentyafterfour: I will need you to do the phab-level read only
[06:32:31] I am ready
[06:32:31] I will do the db-level
[06:33:40] !log about to set phabricator.wikimedia.org as read only
[06:33:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:36] I see read only on phab now
[06:34:36] !log set cluster.read-only in phabricator
[06:34:39] (not on the db yet)
[06:34:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:56] ok, then setting it on the db
[06:35:32] I see it on db1043 now
[06:35:34] getting coords
[06:35:34] !log set db1043 as read only
[06:35:42] get both securely
[06:35:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:52] when we are sure both are in sync
[06:35:59] we will merge the dns update
[06:36:02] heartbeat writing - but that is expected
[06:36:10] I can kill it
[06:36:15] yeah, let's do that
[06:36:21] but alerting :-/
[06:36:37] let's kill it for a second
[06:36:49] it is stopped now
[06:36:59] get those coords
[06:37:10] done
[06:37:12] you can start it again
[06:37:20] starting it manually on db1059
[06:37:37] ok
[06:37:41] coords are on our usual etherpad
[06:38:07] !log merging dns update for phabricator db
[06:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:28] oops, I didn't merge
[06:38:49] (CR) Jcrespo: [C: 2] Point m3-master to dbproxy1008 [dns] - https://gerrit.wikimedia.org/r/410641 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:39:00] now doing that
[06:39:28] when everybody is happy, I will set db1059 as read write
[06:39:36] ok from my side
[06:39:39] twentyafterfour?
[06:40:19] dns is taking some time to propagate
[06:40:20] good for me
[06:40:55] just need to restart apache and disable read-only when dns propagates, correct?
[06:41:02] yes
[06:41:26] but it is still responding dbproxy1003.eqiad.wmnet most of the time
[06:42:02] phab1001:~$ dig m3-master.eqiad.wmnet
[06:42:17] I may consider killing dbproxy1003
[06:42:20] yeah, I am checking for a few hosts and dbproxy1008 is now coming in more times than 1003 at least :)
[06:42:24] we want to see dbproxy1008?
[06:42:35] yep
[06:43:20] still seeing 1003 almost half of the time
[06:43:38] * twentyafterfour is watching `watch host m3-master.eqiad.wmnet`
[06:43:40] that is the problem of dns-based failover :-)
[06:44:04] seems stable to me
[06:44:09] well I'm not sure where it's cached but 1003 does appear to be mostly gone now
[06:44:24] I see 1008 consistently now
[06:44:24] ok, setting db1059 as read write
[06:44:29] good
[06:44:52] !log set db1059 in read-write
[06:44:57] ok to set phabricator read-write?
[06:45:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:05] yes, and restart threads
[06:45:16] or things will fail for users
[06:45:31] !log restarted apache on phab1001 and reset cluster.read-only to false
[06:45:33] Yeah, I am getting this: Database host "m3-master.eqiad.wmnet" is configured as a master, but is replicating another host. This is dangerous and can mangle or destroy data. Only replicas should be replicating. Stop replication on the host or reconfigure Phabricator.
[06:45:37] from phab
[06:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:44] Database host "m3-master.eqiad.wmnet" is configured as a master, but is replicating another host.
[06:45:59] phabricator error ^
[06:46:00] let's disconnect db1059
[06:46:09] ok
[06:46:13] assuming we have the coords
[06:46:16] Stop replication on the host or reconfigure Phabricator.
[06:46:25] we do, they are on our etherpad
[06:46:43] what about now?
[06:46:52] works
[06:46:56] error gone
[06:46:58] phabricator is pretty smart detecting that
[06:47:05] it is cool
[06:47:09] I was able to write
[06:47:09] and totally was my fault
[06:47:19] can you create a test ticket, etc.
[06:47:21] https://phabricator.wikimedia.org/T187328#3974674
[06:47:27] alter doesn't break stuff, etc
[06:47:32] well sweet looks like it's good to go, so now I can run the code upgrade
[06:47:36] https://phabricator.wikimedia.org/T187425
[06:47:40] let me do some check
[06:47:41] ^ new ticket created
[06:47:47] that there are no threads left
[06:47:50] on the old server
[06:48:03] there are not
[06:48:07] looks like I can operate that ticket fine
[06:48:09] so this new server
[06:48:16] has the alter already
[06:48:26] cool everything looks like it's good
[06:48:29] yep
[06:48:31] I disabled the alter in the migration
[06:48:32] if you skipped that and the update
[06:48:37] so it won't try to run again
[06:48:40] you can continue the rest of the migration
[06:48:43] ok
[06:48:57] do read only if you need, or whatever is the right way
[06:49:03] !log starting phabricator upgrade tagged release/2018-02-15/1
[06:49:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:49:16] phabricator insists on being completely offline (apache shut down) for the upgrade
[06:49:19] if something goes wrong, we will point back to the old server
[06:49:24] that is ok to me
[06:50:03] meanwhile, we can maybe repoint the old server to the new one, without starting replication
[06:50:26] and do the puppet commits
[06:50:38] :)
[06:50:59] !log shutting down apache on phab1001 to deploy update, downtime should be only a couple of minutes
[06:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:32] (PS5) Jcrespo: [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696)
[06:53:34] (PS3) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[06:53:36] (PS1) Jcrespo: mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143)
[06:53:56] (PS4) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[06:53:58] (CR) jerkins-bot: [V: -1] [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[06:54:03] marostegui: https://gerrit.wikimedia.org/r/#/c/410643/
[06:54:11] let me see
[06:54:16] not that one
[06:54:18] (CR) jerkins-bot: [V: -1] mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:54:28] this https://gerrit.wikimedia.org/r/#/c/410651/
[06:54:33] (PS2) Jcrespo: mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143)
[06:54:51] (CR) Marostegui: [C: 1] mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:54:54] this one is not fancy
[06:54:59] (CR) jerkins-bot: [V: -1] mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:55:15] extra space
[06:55:50] (PS1) Marostegui: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410652
[06:56:03] (PS3) Jcrespo: mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143)
[06:56:18] (CR) Marostegui: [C: 1] mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[06:57:14] !log phabricator database migrations applied
[06:57:24] nice
[06:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:53] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410652 (owner: Marostegui)
[06:57:59] !log apache restarted, update appears to be successful
[06:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:58:14] so you run the migration #2 now?
[06:58:18] or later
[06:59:37] twentyafterfour: nice!
[07:00:01] jynus: I'm going to run it now
[07:00:27] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410652 (owner: Marostegui)
[07:00:37] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410652 (owner: Marostegui)
[07:02:41] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1110 (duration: 01m 13s)
[07:02:47] (PS6) Jcrespo: [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696)
[07:02:49] (PS5) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[07:02:51] (PS4) Jcrespo: mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143)
[07:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:03] (PS5) Jcrespo: mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143)
[07:03:11] now with more prometheus
[07:03:12] (CR) jerkins-bot: [V: -1] [WIP]Orchestrate the source of the database backups per datacenter [puppet] - https://gerrit.wikimedia.org/r/410180 (https://phabricator.wikimedia.org/T184696) (owner: Jcrespo)
[07:03:21] (CR) jerkins-bot: [V: -1] mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[07:03:39] (PS6) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[07:03:58] (CR) Jcrespo: [C: 2] mariadb: promote db1059 to be the new m3 master [puppet] - https://gerrit.wikimedia.org/r/410651 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[07:04:21] !log Applying patch "phabricator:20180215.maniphest.02.populate.php" to host "m3-master.eqiad.wmnet"...
[07:04:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:30] I want to now reimage dbproxy1003 if you let me
[07:07:36] go for it!
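The DNS-based failover was confirmed by repeatedly resolving m3-master.eqiad.wmnet (`watch host m3-master.eqiad.wmnet`) until only dbproxy1008 came back. The following is a minimal sketch of that check, assuming m3-master is a CNAME so that `socket.gethostbyname_ex` reports the proxy as the canonical host name. It samples the local resolver, including any caching the resolver does, and tallies the answers. The function names are hypothetical.

```python
import socket
import time
from collections import Counter
from typing import Callable


def canonical_name(name: str) -> str:
    """Resolve `name` and return the canonical host name (follows CNAMEs)."""
    # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyname_ex(name)[0]


def tally_resolutions(
    name: str,
    samples: int = 10,
    interval: float = 1.0,
    resolve: Callable[[str], str] = canonical_name,
) -> Counter:
    """Resolve `name` `samples` times, `interval` seconds apart, and count answers."""
    counts: Counter = Counter()
    for i in range(samples):
        counts[resolve(name)] += 1
        if i < samples - 1:
            time.sleep(interval)
    return counts


def has_settled(counts: Counter, expected: str) -> bool:
    """True only if every sample returned the expected target."""
    return bool(counts) and set(counts) == {expected}
```

Passing a fake `resolve` callable makes the tally testable without network access. Against the real name, `has_settled(tally_resolutions("m3-master.eqiad.wmnet"), "dbproxy1008.eqiad.wmnet")` would correspond to "I see 1008 consistently now".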
[07:08:00] can you double check 1003 is not active in any way
[07:08:14] that I am doing the right one, basically
[07:08:30] let me double check
[07:09:24] all the connections are on 1008
[07:09:30] 1003 is empty :)
[07:09:43] cool
[07:09:58] !log reimage dbproxy1003 to stretch
[07:10:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:28] (PS1) Marostegui: db-eqiad.php: Fully repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410655
[07:11:36] (PS7) Jcrespo: mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143)
[07:11:58] (CR) Jcrespo: [C: 2] mariadb: Repoint dbproxy1003 after reimage [puppet] - https://gerrit.wikimedia.org/r/410643 (https://phabricator.wikimedia.org/T187143) (owner: Jcrespo)
[07:13:39] not sure if too soon but /var/log/apache2/error.log on phab1001 looks clear now
[07:16:15] twentyafterfour: FWIW, while i successfully posted this comment 10 minutes ago https://phabricator.wikimedia.org/T187244#3974712 , it's still not yet showing up in my contributions list: https://phabricator.wikimedia.org/p/Tbayer/
[07:16:56] https://dbtree.wikimedia.org/
[07:17:21] we should connect db1043 at some point
[07:17:57] of course
[07:18:05] it can wait
[07:18:13] yep
[07:18:19] also we have to make the schema change there
[07:18:26] yeah
[07:18:43] I just do not think we should connect AND start replication yet
[07:18:45] and look for a replacement host for db1043
[07:18:51] actually
[07:19:06] maybe we should not connect it, and migrate its data directly
[07:19:33] which was the initial point of all of this, aside from a safeguard
[07:19:57] HaeB: looking into it
[07:20:30] HaeB: oh, it's because phd is still offline
[07:20:37] (CR) Marostegui: [V: 2 C: 2] db-eqiad.php: Fully repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410655 (owner: Marostegui)
[07:20:47] marostegui: do we have a candidate host?
[07:20:50] or can we make one?
[07:21:34] mmm don't remember
[07:21:35] let's see
[07:22:00] initially we said db1053
[07:22:00] I think it would be one of those master switchovers
[07:22:04] let's see where that host is
[07:22:22] db1053 is vslow in s2
[07:22:59] assuming s2 master failed over to db1074
[07:23:28] well, not a blocker
[07:23:31] yeah, but that failover shouldn't block it
[07:23:32] yeah
[07:23:34] where is db1066
[07:23:43] still on s1?
[07:23:49] s1 yeah
[07:24:06] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410655 (owner: Marostegui)
[07:24:15] which gets db1075 from s2
[07:24:18] *76
[07:24:24] maybe we can make it easier
[07:24:45] actually no, it was like that because of size
[07:25:42] db1053 < db1066
[07:26:07] we can simply remove db1053 and place db1060 as vslow, db1074 and db1076 as api in s2
[07:26:14] that
[07:26:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 01m 12s)
[07:26:25] or move db1066 directly, wait for replacements
[07:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:48] as that is without counting db105* going away
[07:26:50] Yeah, but if we do the db1074 and db1076 in s2 we do not have to touch s1 for now
[07:26:55] (CR) jenkins-bot: db-eqiad.php: Fully repool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/410655 (owner: Marostegui)
[07:27:01] it is ok to me
[07:27:14] sure, both are ok, my proposal doesn't touch s1 (yet)
[07:27:19] but I am fine either way
[07:27:35] I think that is easier
[07:27:56] although we will have to do this same process in a few months time
[07:27:59] !log phabricator database migration finished
[07:28:01] but we knew that would happen
[07:28:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:10] yeah, but we'll have the replacements hopefully
[07:28:33] we cannot just ignore hw if it doesn't have a replacement now
[07:29:19] I am not doing so
[07:29:50] I am not saying you are doing so, I am agreeing with you!
[07:29:56] XDD
[07:30:09] let's do that then
[07:30:52] let me check dbproxy1003 first
[07:30:55] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974706 (mmodell)
[07:30:56] !log phabricator upgrade finished. phd is back online.
[07:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:12] Operations, Analytics-Kanban, monitoring, netops, and 2 others: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3974720 (elukey)
[07:31:43] jynus: you won't touch db1066 then? Just asking to see if I can start working with it for the checksumming
[07:32:13] apparently not, or not for this
[07:32:19] cool!
[07:33:10] (PS1) Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410656 (https://phabricator.wikimedia.org/T187089)
[07:33:39] even if it was touched, contents would be copied elsewhere
[07:33:46] not just discarded
[07:33:56] :)
[07:34:40] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974735 (mmodell) Ok so the upstream code is deployed...
[07:34:54] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410656 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:35:11] we should steal db1051 from s5 too
[07:35:21] totally yes
[07:35:22] not needed, and will give us another misc host
[07:35:49] !log installing libvorbis security updates on stretch
[07:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:54] twentyafterfour: so everything looks good?
[07:37:31] (Merged) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410656 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:37:32] (CR) jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410656 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[07:37:34] should we wait a day or so for issues, or it should be as unlikely that backups would be enough?
[07:39:01] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1082 (duration: 01m 13s)
[07:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:25] !log Deploy schema change on db1082 (sanitarium master) with replication, this will generate lag on labs - T187089 T185128 T153182
[07:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:39] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[07:39:39] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[07:39:39] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[07:42:07] twentyafterfour: can we re-enable puppet on phab1001?
[07:44:42] (CR) Giuseppe Lavagetto: Increase test coverage (3 comments) [software/conftool] - https://gerrit.wikimedia.org/r/410224 (owner: Giuseppe Lavagetto)
[07:44:45] elukey: yes, sorry
[07:44:57] my upgrade script should have done that
[07:45:11] done
[07:45:20] jynus: yes everything looks good
[07:45:46] jynus: I think backups should be enough, I don't think we need to keep the old master around that long
[07:46:35] (CR) 20after4: [C: 1] "on a cursory reading of the code, everything looks good" [puppet] - https://gerrit.wikimedia.org/r/392221 (owner: Aaron Schulz)
[07:47:48] dbproxy1003 successfully being reimaged
[07:48:12] (PS1) Marostegui: db-eqiad.php: Depool db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/410660 (https://phabricator.wikimedia.org/T162807)
[07:49:46] twentyafterfour: super thanks! httpd in G state seems good, segfaults still there (but we kinda know they are related to php itself)
[07:50:57] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/410660 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:51:02] (CR) 20after4: [C: -2] Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) (owner: Zoranzoki21)
[07:51:17] (CR) 20after4: [C: -2] "-2 per Qgil's last comment" [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) (owner: Zoranzoki21)
[07:53:15] (Merged) jenkins-bot: db-eqiad.php: Depool db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/410660 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:53:27] (CR) jenkins-bot: db-eqiad.php: Depool db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/410660 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[07:54:36] I think people will appreciate the new 'mute notifications' button on phabricator tasks ;)
[07:54:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 and db1066 - T162807 (duration: 01m 12s)
[07:55:08] twentyafterfour: I was using unsubscribe for that :p
[07:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:14] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[07:55:55] !log Stop replication in sync on db1089 and db1066 - T162807
[07:56:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:46] There are several new features in this phabricator upgrade that I think people will like: https://phabricator.wikimedia.org/phame/post/view/85/phabricator_updates_for_february_2018/
[07:56:54] marostegui: unsubscribe is a perfectly good option
[07:57:24] but mute prevents notifications even if you get re-subscribed
[07:57:26] twentyafterfour: But a mute thing is cleaner I would say, so, nice!
[07:57:29] yeah, exactly
[08:02:25] so we can now prevent any details being disclosed via email according to various herald criteria
[08:02:30] including which space a task is in
[08:03:06] it causes phabricator to send a generic email with links to see what happened
[08:03:28] we could maybe add this to procurement tasks if opsen want that
[08:13:01] (PS1) Elukey: profile::hive: remove jmx trans and java 7 support [puppet] - https://gerrit.wikimedia.org/r/410680 (https://phabricator.wikimedia.org/T166248)
[08:13:43] (CR) Elukey: [C: 2] profile::hive: remove jmx trans and java 7 support [puppet] - https://gerrit.wikimedia.org/r/410680 (https://phabricator.wikimedia.org/T166248) (owner: Elukey)
[08:16:38] (CR) Muehlenhoff: Add support for selective automatic restarts of stateless services (1 comment) [puppet] - https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[08:18:29] !log Upgrade kernel + mariadb on db1082 (sanitarium master in s5)
[08:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:37] (PS1) Marostegui: db-eqiad.php: Slowly repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410690
[08:32:59] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410690 (owner: Marostegui)
[08:38:14] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410690 (owner: Marostegui)
[08:38:25] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410690 (owner: Marostegui)
[08:39:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 (duration: 01m 13s)
[08:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:49] !log installing erlang security updates on labtestcontrol*
[08:55:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:24] (PS1) Marostegui: db-eqiad.php: Increase weight for db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410714
[09:01:20] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410714 (owner: Marostegui)
[09:05:44] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410714 (owner: Marostegui)
[09:06:57] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410714 (owner: Marostegui)
[09:07:15] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1082 (duration: 01m 12s)
[09:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:31] (PS1) Marostegui: db-eqiad.php: Fully repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410732
[09:13:34] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974829 (elukey) >>! In T182832#3974547, @Dzahn wrote...
[09:22:06] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410732 (owner: Marostegui)
[09:28:28] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410732 (owner: Marostegui)
[09:28:38] (CR) jenkins-bot: db-eqiad.php: Fully repool db1082 [mediawiki-config] - https://gerrit.wikimedia.org/r/410732 (owner: Marostegui)
[09:29:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1082 (duration: 01m 12s)
[09:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:27] (PS1) Marostegui: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/410739 (https://phabricator.wikimedia.org/T187089)
[09:33:22] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3974843 (Vgutierrez) I just added myself to root and security aliases. Regarding the ops mailing lists, @Volans told me that there is a typo on my email ad...
[09:33:30] (PS1) Elukey: role::phabricator: add prometheus apache exporter [puppet] - https://gerrit.wikimedia.org/r/410740 (https://phabricator.wikimedia.org/T182832)
[09:34:38] (PS1) Alexandros Kosiaris: Set service-checker pod in the same namespace [deployment-charts] - https://gerrit.wikimedia.org/r/410745
[09:34:40] (PS1) Alexandros Kosiaris: Specify namespace for the service [deployment-charts] - https://gerrit.wikimedia.org/r/410746
[09:36:05] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/410739 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[09:36:29] (CR) 20after4: [C: 1] role::phabricator: add prometheus apache exporter [puppet] - https://gerrit.wikimedia.org/r/410740 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[09:37:29] (CR) Filippo Giunchedi: [C: 1] role::phabricator: add prometheus apache exporter [puppet] - https://gerrit.wikimedia.org/r/410740 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[09:37:31] (Merged) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/410739 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[09:37:35] (CR) Elukey: "PCC looks good: https://puppet-compiler.wmflabs.org/compiler02/9987" [puppet] - https://gerrit.wikimedia.org/r/410740 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[09:37:41] (CR) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/410739 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[09:38:06] (CR) Elukey: [C: 2] role::phabricator: add prometheus apache exporter [puppet] - https://gerrit.wikimedia.org/r/410740 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[09:39:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1097 for s4 and s5 (duration: 01m 12s)
[09:39:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:28] !log Upgrade kernel and mariadb on db1097
[09:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:57] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974852 (mmodell) @elukey: last night, before the upg...
[09:44:23] (PS1) Filippo Giunchedi: mail: switch icinga check to LE variant [puppet] - https://gerrit.wikimedia.org/r/410758 (https://phabricator.wikimedia.org/T181519)
[09:44:25] (PS1) Filippo Giunchedi: icinga: tweak thresholds for LE certs alerting [puppet] - https://gerrit.wikimedia.org/r/410759 (https://phabricator.wikimedia.org/T181519)
[09:46:26] (PS1) Alexandros Kosiaris: WIP: Force jenkins-slave being member of docker [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790)
[09:47:58] !log Deploy schema change on db1097:3315 - T187089 T185128 T153182
[09:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:14] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[09:48:14] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[09:48:14] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[09:49:47] (PS1) Elukey: phabricator: disable opcache.fastshutdow [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832)
[09:50:46] (PS1) Marostegui: db-eqiad.php: Slowly repool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410769
[09:52:16] (PS2) Elukey: phabricator: disable opcache.fastshutdown [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832)
[09:53:03] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410769 (owner: Marostegui)
[09:56:32] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410769 (owner: Marostegui)
[09:56:56] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410769 (owner: Marostegui)
[09:58:00] (CR) 20after4: [C: 1] "it seems that the message 'zend_mm_heap corrupted' is really a bit of misdirection. Fast shutdown causes the zend allocator to cleanup hea" [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[09:58:02] (CR) Reedy: "I think it doesn't actually need any optimisation (though, the db indexes do need fixing), for a day's worth of logs in the table, even for" [puppet] - https://gerrit.wikimedia.org/r/410349 (https://phabricator.wikimedia.org/T187078) (owner: Dzahn)
[09:58:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1097:3314 (duration: 01m 12s)
[09:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:00] (CR) 20after4: [C: 1] "feel free to merge this, at least it will eliminate a possible source of confusion when looking for other causes of segfaulting." [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[10:03:05] (CR) Elukey: "Ack, I'll keep an eye on the httpd error log to see other occurrences of the issue before proceeding, there is no hurry. Thanks for the fe" [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[10:03:08] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974890 (mmodell) I'm now convinced that the problem...
[10:06:02] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3974891 (elukey) >>! In T182832#3974890, @mmodell wro...
[10:09:51] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410786
[10:14:55] (PS7) Giuseppe Lavagetto: Add support for jsonschema-based entities [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080)
[10:14:57] (PS3) Giuseppe Lavagetto: Increase test coverage [software/conftool] - https://gerrit.wikimedia.org/r/410224
[10:14:59] (PS3) Giuseppe Lavagetto: Add simple actions to be exercised only on the basic types. [software/conftool] - https://gerrit.wikimedia.org/r/410225
[10:15:01] (PS3) Giuseppe Lavagetto: Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226
[10:15:06] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410786 (owner: Marostegui)
[10:16:08] (CR) jerkins-bot: [V: -1] Add support for jsonschema-based entities [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[10:16:10] (CR) jerkins-bot: [V: -1] Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226 (owner: Giuseppe Lavagetto)
[10:18:01] hashar: No space left on device ^^^
[10:18:13] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410786 (owner: Marostegui)
[10:18:22] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/410786 (owner: Marostegui)
[10:18:44] that's integration-slave-jessie-1002 AFAICT
[10:19:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1097:3314 (duration: 01m 12s)
[10:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:22] (CR) Filippo Giunchedi: "> Patch Set 12:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[10:25:26] (CR) Giuseppe Lavagetto: [C: 1] Backends: add known hosts files backend [software/cumin] - https://gerrit.wikimedia.org/r/405719 (owner: Volans)
[10:25:57] volans: eek
[10:26:55] (PS6) Jcrespo: mariadb: Depool db2042 fully in preparation for reassignment [mediawiki-config] - https://gerrit.wikimedia.org/r/410236
[10:28:49] !log Upgrade mariadb on db1066
[10:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:09] volans: I have cleaned it
[10:30:25] hashar: do you know why it doesn't clean itself?
[10:30:33] (PS1) Jcrespo: mariadb: Remove db2042 from mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410790 (https://phabricator.wikimedia.org/T183470)
[10:30:33] thanks for the clean ;)
[10:30:52] (CR) Jcrespo: [C: 2] mariadb: Depool db2042 fully in preparation for reassignment [mediawiki-config] - https://gerrit.wikimedia.org/r/410236 (owner: Jcrespo)
[10:33:08] (Merged) jenkins-bot: mariadb: Depool db2042 fully in preparation for reassignment [mediawiki-config] - https://gerrit.wikimedia.org/r/410236 (owner: Jcrespo)
[10:33:23] (CR) jenkins-bot: mariadb: Depool db2042 fully in preparation for reassignment [mediawiki-config] - https://gerrit.wikimedia.org/r/410236 (owner: Jcrespo)
[10:36:49] (PS1) Marostegui: db-eqiad.php: Slowly repool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410792
[10:37:04] (PS2) Alexandros Kosiaris: Force jenkins-slave being member of docker [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790)
[10:38:03] !log jynus@tin Synchronized wmf-config/db-codfw.php: Depool db2042 fully (duration: 01m 12s)
[10:38:15] (CR) Marostegui: [C: 1] hieradata: enable SMART for db in eqiad [puppet] - https://gerrit.wikimedia.org/r/410412 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[10:38:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:42] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410792 (owner: Marostegui)
[10:39:48] (PS2) Filippo Giunchedi: hieradata: enable SMART for db in eqiad [puppet] - https://gerrit.wikimedia.org/r/410412 (https://phabricator.wikimedia.org/T86552)
[10:40:45] (CR) Filippo Giunchedi: [C: 2] hieradata: enable SMART for db in eqiad [puppet] - https://gerrit.wikimedia.org/r/410412 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[10:43:57] volans: those slaves are terrible and contents keep piling up on them :/
[10:44:25] volans: but I am migrating the jobs to new slaves that would have jobs that self collect the mess
[10:44:35] ack
[10:45:31] lunch &!!
[10:48:55] (PS1) Jcrespo: mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470)
[10:49:21] (CR) jerkins-bot: [V: -1] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:50:37] (CR) Jcrespo: "We are going to ignore the -1, because it is a style complaint that eventually will be zeroed by the host we decommission, but that cannot " [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:50:49] (CR) Jcrespo: [V: 1] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:51:57] (PS2) Jcrespo: mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470)
[10:52:03] (CR) Jcrespo: [C: 2] mariadb: Remove db2042 from mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410790 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:52:30] (CR) jerkins-bot: [V: -1] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:54:04] (CR) Volans: [C: 1] "recheck LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[10:54:12] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410792 (owner: Marostegui)
[10:55:19] (CR) Jcrespo: [V: 2] mariadb: Move db2042 from codfw:core:s1 to misc:s3 (phabricator) [puppet] - https://gerrit.wikimedia.org/r/410794 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[10:55:23] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3974987 (elukey) I think that we can proceed in this way: 1) Check in labs what zookeeper version would end up in stretch. On conf100[123] we have...
[10:55:33] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1097:3314 and slowly repool db1097:3315 (duration: 01m 12s)
[10:55:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:49] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410792 (owner: Marostegui)
[10:57:13] (PS2) Jcrespo: mariadb: Remove db2042 from mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410790 (https://phabricator.wikimedia.org/T183470)
[10:57:33] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/410224 (owner: Giuseppe Lavagetto)
[10:57:40] (CR) Giuseppe Lavagetto: "Seems generally good, although the feature would need more work to be fully useful." (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/409980 (https://phabricator.wikimedia.org/T186818) (owner: Volans)
[10:58:11] !log Stop replication in sync db1089 and db1066
[10:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:16] (CR) Giuseppe Lavagetto: Add simple actions to be exercised only on the basic types. (4 comments) [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[11:01:02] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Add support for jsonschema-based entities [software/conftool] - https://gerrit.wikimedia.org/r/408585 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[11:01:06] <_joe_> win 19
[11:04:04] (CR) Volans: [C: 2] Backends: add known hosts files backend [software/cumin] - https://gerrit.wikimedia.org/r/405719 (owner: Volans)
[11:04:16] (CR) Muehlenhoff: Add support for selective automatic restarts of stateless services (12 comments) [puppet] - https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[11:04:28] (PS13) Muehlenhoff: Add support for selective automatic restarts of stateless services [puppet] - https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991)
[11:06:06] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3975006 (MoritzMuehlenhoff) >>! In T166081#3974987, @elukey wrote: > 1) Check in labs what zookeeper version would end up in stretch. On conf100[123...
[11:06:55] (Merged) jenkins-bot: Backends: add known hosts files backend [software/cumin] - https://gerrit.wikimedia.org/r/405719 (owner: Volans)
[11:08:14] (CR) jenkins-bot: Backends: add known hosts files backend [software/cumin] - https://gerrit.wikimedia.org/r/405719 (owner: Volans)
[11:08:16] (CR) Giuseppe Lavagetto: [C: 2] Increase test coverage [software/conftool] - https://gerrit.wikimedia.org/r/410224 (owner: Giuseppe Lavagetto)
[11:08:54] (CR) Volans: [C: -1] "Thanks for the fixes, one typo and we're good to go." (2 comments) [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[11:09:50] (Merged) jenkins-bot: Increase test coverage [software/conftool] - https://gerrit.wikimedia.org/r/410224 (owner: Giuseppe Lavagetto)
[11:14:50] (PS3) Alexandros Kosiaris: Force jenkins-slave being member of docker [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790)
[11:15:32] (CR) Alexandros Kosiaris: [C: 2] Force jenkins-slave being member of docker [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790) (owner: Alexandros Kosiaris)
[11:21:39] Operations, ops-codfw, Analytics, DC-Ops: Decomission eventlog2001 - https://phabricator.wikimedia.org/T182397#3975056 (MoritzMuehlenhoff) >>! In T182397#3974123, @RobH wrote: > Not showing there now, someone did a cleanup. Not quite, that is related to some changes in Puppet 4 and their interac...
[11:24:01] (PS4) Giuseppe Lavagetto: Add simple actions to be exercised only on the basic types. [software/conftool] - https://gerrit.wikimedia.org/r/410225
[11:24:03] (PS4) Giuseppe Lavagetto: Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226
[11:25:05] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[11:25:09] (CR) jerkins-bot: [V: -1] Add simple actions to be exercised only on the basic types. [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[11:25:11] (CR) Giuseppe Lavagetto: Add simple actions to be exercised only on the basic types. (2 comments) [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[11:25:13] (CR) jerkins-bot: [V: -1] Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226 (owner: Giuseppe Lavagetto)
[11:25:37] (CR) Volans: [C: 1] "LGTM" [software/conftool] - https://gerrit.wikimedia.org/r/410226 (owner: Giuseppe Lavagetto)
[11:29:56] Operations, Goal, User-fgiunchedi: Include apache_exporter in puppet module apache - https://phabricator.wikimedia.org/T187434#3975086 (fgiunchedi) p:Triage>Normal
[11:33:25] (CR) Volans: "See inline. Anyway I'd wait for a bit more consensus on the task before going forward with this." (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/409980 (https://phabricator.wikimedia.org/T186818) (owner: Volans)
[11:49:47] !log milimetric@tin Started deploy [analytics/refinery@26d4e50]: Deploying Refinery jobs with new 0.0.58 jars
[11:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:05] (PS2) Deskana: Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (owner: Jforrester)
[11:54:29] (CR) Deskana: [C: 1] Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (owner: Jforrester)
[11:55:23] (PS2) Alexandros Kosiaris: Prepare kubernetes nodes for serving mathoid traffic [puppet] - https://gerrit.wikimedia.org/r/410489 (https://phabricator.wikimedia.org/T184919)
[11:58:44] !log addshore@terbium:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki elwiktionary --batchsize 1000 # T185738
[11:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:56] T185738: Certain entries/pages on Wiktionaries lack their corresponding links to el.wiktionary - https://phabricator.wikimedia.org/T185738
[11:59:20] !log milimetric@tin Finished deploy [analytics/refinery@26d4e50]: Deploying Refinery jobs with new 0.0.58 jars (duration: 09m 33s)
[11:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:47] !log script run for T185738 done
[12:02:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:05] (PS2) Muehlenhoff: uwsgi: Use systemd::tmpfile [puppet] - https://gerrit.wikimedia.org/r/386620
[12:12:56] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3975241 (Paladox) @elukey what about backporting php7...
[12:23:31] (PS7) Ema: icinga: add check_established_connections plugin [puppet] - https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847)
[12:33:35] (CR) MarcoAurelio: [C: 1] "Has the script already finished its first complete run?" [puppet] - https://gerrit.wikimedia.org/r/410349 (https://phabricator.wikimedia.org/T187078) (owner: Dzahn)
[12:40:51] Reedy: that's good (wrt AF script); maybe we could wait until the indexes are fixed then revert the puppet change and restore daily run?
[12:41:50] (PS8) Ema: icinga: add check_established_connections plugin [puppet] - https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847)
[12:44:42] (PS3) Addshore: Switch to extension.json for PropertySuggester [mediawiki-config] - https://gerrit.wikimedia.org/r/395486
[12:44:47] (PS3) Addshore: Switch to extension.json for WikibaseQuality extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/395487
[12:44:50] (PS3) Addshore: Switch to extension.json for Wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/395488
[12:44:59] (PS4) Addshore: Create a LockManager for WikidataDispatch with short TTL [mediawiki-config] - https://gerrit.wikimedia.org/r/395967 (https://phabricator.wikimedia.org/T178652)
[12:45:12] (PS2) Addshore: Use new wikibase dispatch lock manager on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/395969 (https://phabricator.wikimedia.org/T178652)
[12:45:46] (PS2) Addshore: Add 'RevisionStore' to wmgMonologChannels [mediawiki-config] - https://gerrit.wikimedia.org/r/406128
[12:49:00] (CR) Addshore: [C: 1] Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[12:49:36] (PS7) Ema: pybal: check established TCP connections to etcd [puppet] - https://gerrit.wikimedia.org/r/409922 (https://phabricator.wikimedia.org/T170847)
[13:08:00] (PS2) Sbisson: Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670)
[13:11:50] Operations, Datasets-General-or-Unknown, Dumps-Generation, hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#3975318 (ArielGlenn) We have to get this into the budget plan by tomorrow, so I'm going to request: Let's get a box that looks like snapsh...
[13:14:01] (CR) Rush: [C: 1] openstack: install hp-health on labvirt* servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[13:14:05] (PS2) Rush: openstack: install hp-health on labvirt* servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[13:22:04] (CR) Muehlenhoff: "Why is this specific to the labvirt hosts? If this is generally useful on HP servers, we should move it to the list of standard packages?" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[13:23:10] (CR) Rush: [C: 1] "That seems fine too Moritz, yeah" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[13:31:43] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3975345 (Dzahn) >>! In T187035#3974843, @Vgutierrez wrote: > I just added myself to root and security aliases. great! :) > typo on my email address and...
[13:31:45] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3975346 (RazShuty)
[13:33:29] Operations, ops-eqiad, DC-Ops: decommission rcs1001/1002 - https://phabricator.wikimedia.org/T181825#3803505 (faidon) This is a duplicate of T170157. I'll tag that one with #ops-eqiad, and close this as duplicate.
[13:33:43] Operations, ops-eqiad, DC-Ops: decommission rcs1001/1002 - https://phabricator.wikimedia.org/T181825#3975363 (faidon)
[13:33:51] Operations, Analytics, Wikimedia-Stream, hardware-requests, Patch-For-Review: decommission rcs100[12] - https://phabricator.wikimedia.org/T170157#3420970 (faidon)
[13:34:19] Operations, ops-eqiad, Analytics, Wikimedia-Stream, hardware-requests: decommission rcs100[12] - https://phabricator.wikimedia.org/T170157#3420970 (faidon)
[13:34:33] Operations, Goal, User-fgiunchedi: Include apache_exporter in puppet module apache - https://phabricator.wikimedia.org/T187434#3975369 (Dzahn) But we also want to replace the apache module with the httpd module (which doesn't have monitoring.pp anymore because we didn't want the diamond collector). S...
[13:35:34] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3975370 (Vgutierrez)
[13:42:35] Operations, ops-eqiad, DC-Ops, Patch-For-Review, and 2 others: rack/setup/install restbase-dev100[456] - https://phabricator.wikimedia.org/T166181#3975379 (faidon)
[13:42:37] Operations, ops-eqiad, hardware-requests: Decommisson restbase-dev100[1-3] - https://phabricator.wikimedia.org/T171179#3975377 (faidon) Resolved>Open These appear to be still racked in Racktables. Reopening to investigate per IRC conversation.
[13:44:09] jouncebot: next
[13:44:09] In 0 hour(s) and 15 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1400)
[13:44:14] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3975346 (MoritzMuehlenhoff) If this request is only about getting added to the wmde group (which controls some Gerrit settings related to WMDE projects) you don't...
[13:44:20] jynus: the icinga systemd check wasn't supposed to be disabled on phab1001, at least not by me. not sure why it was. only the one on phab2001 where it is known that the service cant start. i just re-enabled notifications on phab1001
[13:46:19] (PS5) MarcoAurelio: Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357)
[13:49:39] Operations, OCG-General, Readers-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#2799774 (Osnard) Can somebody give me a short status description of this project? On "Reading/Web/PDF_Functio...
[13:49:53] (CR) Esanders: [C: +1] Load 3D extension on other wikis, for display only [mediawiki-config] - https://gerrit.wikimedia.org/r/410433 (https://phabricator.wikimedia.org/T187261) (owner: Matthias Mullie)
[13:54:55] (PS2) Ema: wmf-upgrade-varnish: add support for non-interactive upgrades [puppet] - https://gerrit.wikimedia.org/r/410558 (https://phabricator.wikimedia.org/T168529)
[13:55:25] (CR) Ema: wmf-upgrade-varnish: add support for non-interactive upgrades (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410558 (https://phabricator.wikimedia.org/T168529) (owner: Ema)
[13:56:27] (CR) Ottomata: Force jenkins-slave being member of docker (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790) (owner: Alexandros Kosiaris)
[13:56:35] Operations, ops-codfw, DC-Ops: Decommission osm-db200[12] and osm-web200[1234] - https://phabricator.wikimedia.org/T187445#3975431 (faidon) p:Triage>Low
[13:57:31] (PS1) Vgutierrez: Add vgutierrez to icinga sms contactgroup [puppet] - https://gerrit.wikimedia.org/r/410909 (https://phabricator.wikimedia.org/T187035)
[13:58:54] (CR) Dzahn: [C: +1] Add vgutierrez to icinga sms contactgroup [puppet] - https://gerrit.wikimedia.org/r/410909 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[13:59:13] Operations, ops-eqiad, DC-Ops: Decommission xenon, cerium, praseodymium - https://phabricator.wikimedia.org/T187446#3975453 (faidon) p:Triage>Low
[13:59:21] I need someone from the 'staff' global group to be around SWAT to check the working correctness of a patch that will be deployed soon-ish.
[14:00:03] (PS1) Muehlenhoff: Add repository component for tor on stretch [puppet] - https://gerrit.wikimedia.org/r/410910
[14:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1400).
[14:00:05] stephanebisson, Deskana, Addshore, and Hauskatze: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:11] o/
[14:00:14] purr
[14:00:18] I can SWAT today
[14:00:19] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3975464 (elukey) >>! In T166081#3975006, @MoritzMuehlenhoff wrote: >>>! In T166081#3974987, @elukey wrote: >> 1) Check in labs what zookeeper versio...
[14:00:22] hello
[14:00:29] Operations, ops-codfw, DC-Ops: Decommission restbase-test200[123] - https://phabricator.wikimedia.org/T187447#3975465 (faidon) p:Triage>Low
[14:00:42] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3975477 (Addshore) Per comments in T177599 - L2 is only for access to the "WMF-NDA" group on phabricator (for access to restricted tickets) T177599#3674019 - An...
[14:00:55] stephanebisson, Deskana, Addshore, and Hauskatze: do you want to deploy your own patches (if you can)?
[14:01:06] I can't
[14:01:09] * Hauskatze does not have access
[14:01:30] PROBLEM - HHVM rendering on mw2200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:01:35] but if deskana or addshore could also help me test the patch I've requested to deploy it'd be awesome
[14:01:39] I can't, and I wouldn't trust myself to even if I did.
[14:01:42] ;-)
[14:01:51] Deskana: same here, but yet... ;)
[14:01:55] :-D
[14:01:56] (PS2) Vgutierrez: nagios_common: Add vgutierrez to icinga sms contactgroup [puppet] - https://gerrit.wikimedia.org/r/410909 (https://phabricator.wikimedia.org/T187035)
[14:01:59] I trust you more than me!
[14:02:02] I mean, I can, but should I?
[14:02:14] ;)
[14:02:19] zeljkof: I'll let you do mine :)
[14:02:20] RECOVERY - HHVM rendering on mw2200 is OK: HTTP OK: HTTP/1.1 200 OK - 74049 bytes in 0.416 second response time
[14:02:26] unless you dont want to :D
[14:02:42] (CR) Volans: [C: +1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410558 (https://phabricator.wikimedia.org/T168529) (owner: Ema)
[14:03:01] (CR) Alexandros Kosiaris: [C: +2] Force jenkins-slave being member of docker (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410763 (https://phabricator.wikimedia.org/T186790) (owner: Alexandros Kosiaris)
[14:03:11] apergos: https://meta.wikimedia.org/wiki/Special:CentralAuth/AGlenn_(WMF) <-- attach to metawiki ;-)
[14:03:29] addshore: I prefer if developers deploy their commits, but if they can not or prefer I do it, I'll do it! :)
[14:03:52] zeljkof: I don't have shell access, so I can't do that.
[14:03:54] I can do mine at the end if you would like :)
[14:03:56] for a small fee, see terms and conditions*
[14:04:14] addshore: go ahead then, while I review stephanebisson's patches
[14:04:23] okay!
[14:04:26] people that deploy themselves have priority
[14:04:30] Hauskatze: I never edit with that account (as you can see), it was one of those mass-created when someone decided we should all get them
[14:04:40] (CR) Addshore: [C: +2] Add 'RevisionStore' to wmgMonologChannels [mediawiki-config] - https://gerrit.wikimedia.org/r/406128 (owner: Addshore)
[14:04:48] tbh I hardly every edit with any account, so there's that
[14:04:54] heh
[14:04:54] *hardly ever
[14:05:03] developers with deployer status, please start -boarding-, I mean deploying
[14:05:11] (CR) Urbanecm: [C: +1] Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) (owner: MarcoAurelio)
[14:06:38] apergos: fact is that I need someone with an account with 'staff' global rights to test a patch I'm about to deploy; but if you're busy/do not want to that's also fine
[14:06:39] stephanebisson, Deskana, Hauskatze: anything special about your patches? examples: can not be tested at mwdebug1002 (or at all), a script needs to run, testing takes more than a few minutes, files need to be deployed in certain order...
[14:07:01] zeljkof: I like your checklist ;)
[14:07:16] zeljkof: I don't think there's anything special, but I'm far from certain.
[14:07:32] the checklist is the reason I don't have "I broke wikipedia" t-shirt (yet) ;P
[14:07:35] (Merged) jenkins-bot: Add 'RevisionStore' to wmgMonologChannels [mediawiki-config] - https://gerrit.wikimedia.org/r/406128 (owner: Addshore)
[14:07:39] All it's doing is enabling a beta feature, so I don't see why I couldn't test that on mwdebug1002.
[14:07:40] zeljkof: yes; my patch can be tested at mwdebug to check that noone gets the abusefilter-private/abusefilter-private-log locally, but I *need* someone with an account in the global 'staff' group to test the log is working :|
[14:07:45] (CR) jenkins-bot: Add 'RevisionStore' to wmgMonologChannels [mediawiki-config] - https://gerrit.wikimedia.org/r/406128 (owner: Addshore)
[14:07:47] you don't? :O
[14:07:52] so far nobody seems to volunteer
[14:08:02] and I'd rather not add the rights to myself
[14:08:06] addshore: not as far as I know :)
[14:08:36] Hauskatze: how do I check if I can test it?
[14:08:43] Hauskatze: If you tell me exactly where to look, and exactly what I'm looking for, I can tell you whether I think it's working. But I'm not going to be on the hook for some kind of "official WMF verification" or anything.
[14:08:46] zeljkof: Nothing crazy. The config patch should be deployed first. Then the code patch can be tested on mwdebug1002 but the data may not appear right away in logstash (which is the goal of those patches).
[14:08:47] syncing
[14:08:57] (CR) Vgutierrez: [C: +2] nagios_common: Add vgutierrez to icinga sms contactgroup [puppet] - https://gerrit.wikimedia.org/r/410909 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[14:09:07] Hauskatze: If that's okay, then I can help. :-)
[14:09:15] Deskana: I'll let James do the 'formal' verification then
[14:09:55] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:406128|Add RevisionStore to wmgMonologChannels]]: (duration: 01m 13s)
[14:10:03] zeljkof: thats me out of the way!
[14:10:03] thanks!
[14:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:08] otherwise, if you could just verify that you can see the Special:Log/abusefilterprivatedetails that'd be awesome
[14:10:22] Hauskatze: Sure, I can take a look at that.
[14:10:25] addshore: thank you for deploying, see you at our next flight ;)
[14:10:46] Deskana: great, I've just added 'abusefilter-private-log' to 'staff'; you should see the log
[14:10:55] Deskana: all testing during swat is smoke testing, just checking that nothing obvious is on fire :)
[14:10:56] it should be empty though
[14:12:21] once $wgAbuseFilterPrivateLog = true; is deployed, the log should start to collect the checks of private abuse filter log data the same way the checkuser log does
[14:12:49] (CR) Zfilipin: [C: +2] Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:12:57] (CR) Zfilipin: Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:12:59] (PS3) Muehlenhoff: uwsgi: Use systemd::tmpfile [puppet] - https://gerrit.wikimedia.org/r/386620
[14:13:02] (PS3) Zfilipin: Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:13:12] (CR) Zfilipin: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:13:42] (CR) Muehlenhoff: [C: +2] uwsgi: Use systemd::tmpfile [puppet] - https://gerrit.wikimedia.org/r/386620 (owner: Muehlenhoff)
[14:14:18] (Abandoned) Paladox: Phabricator: Raise php max_execution_time to 15 [puppet] - https://gerrit.wikimedia.org/r/410631 (https://phabricator.wikimedia.org/T125357) (owner: Paladox)
[14:14:25] (PS1) Marostegui: db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089)
[14:15:09] jouncebot: next
[14:15:10] In 2 hour(s) and 44 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1700)
[14:16:13] zeljkof: ahh there is a library upgrader bot running and sending hundred of patches
[14:16:17] (PS2) Marostegui: db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089)
[14:16:23] hashar: nice :(
[14:16:32] and the mediawiki-config patches get to wait for operations-mw-config-composer-hhvm-jessie :(
[14:16:36] gotta migrate that one to docker
[14:16:46] hashar: please do, I'm buying beers :)
[14:16:55] :_(
[14:18:09] stephanebisson: can you test 410546 at mwdebug1002?
[14:18:22] (not now, in general, still waiting for CI)
[14:18:32] (CR) jenkins-bot: mariadb: Remove db2042 from mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410790 (https://phabricator.wikimedia.org/T183470) (owner: Jcrespo)
[14:19:06] zeljkof: I can only test both the config and code patch together.
[14:19:24] stephanebisson: ok, so I deploy both to mwdebug1002 and _then_ let you know?
[14:19:37] zeljkof: yep
[14:19:58] will let you know in a few minutes, waiting for CI
[14:20:38] (Merged) jenkins-bot: Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:20:48] !log jynus@tin Synchronized wmf-config/db-codfw.php: Remove db2042 (duration: 01m 12s)
[14:20:55] (CR) Matthias Mullie: [C: -1] Load 3D extension on other wikis, for display only (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/410433 (https://phabricator.wikimedia.org/T187261) (owner: Matthias Mullie)
[14:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:32] (CR) Paladox: "Let me change this to httpd module then." [puppet] - https://gerrit.wikimedia.org/r/407962 (owner: Paladox)
[14:21:51] (CR) jenkins-bot: Enable log channel T184670 [mediawiki-config] - https://gerrit.wikimedia.org/r/410546 (https://phabricator.wikimedia.org/T184670) (owner: Sbisson)
[14:22:08] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Remove db2042 (duration: 01m 11s)
[14:22:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:04] Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3975522 (Marostegui) stalled>Open a:Marostegui>Cmjohnson Hi, this can now proceed with the current hostname (db1115) Thanks!
[14:24:07] Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3975526 (Marostegui) stalled>Open a:Marostegui>Cmjohnson Hi, this can now proceed with the current hostname (db1115) Thanks!
[14:24:29] mmm weird…
[14:24:40] stephanebisson: both patches are at mwdebug1002, please test and let me know if it's ok to deploy
[14:26:19] zeljkof: can you let me know when my patch is going? I'm about to do something else while waiting. Please ping me. Thanks.
[14:26:22] (PS1) Jcrespo: mariadb: Depool db1056 from s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469)
[14:26:27] Hauskatze: sure
[14:26:32] ty
[14:26:49] Hauskatze: will ping you a few minutes before I need you to test it
[14:27:32] Deskana: please stand by, you are next, in a few minutes, stephanebisson's patches should be deployed soon
[14:27:39] Ready sir!
[14:27:47] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install tendril2001 - https://phabricator.wikimedia.org/T186123#3975533 (Marostegui) Hi, After a chat with @jcrespo we have agreed to rename this host to a normal dbXXXX one, so please can we rename it to: db2093 Please make sure that the...
[14:28:28] fine :)
[14:28:42] (PS1) Vgutierrez: icinga: grant vgutierrez permissions to run commands [puppet] - https://gerrit.wikimedia.org/r/410933 (https://phabricator.wikimedia.org/T187035)
[14:28:44] stephanebisson: still around? do you need more time to test?
[14:29:05] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install db2093 (WAS: rack/setup/install tendril2001) - https://phabricator.wikimedia.org/T186123#3975536 (Marostegui)
[14:29:11] my browser crashed... I'm just back, testing now
[14:29:43] Deskana: how come 409091 does not link to a phabricator ticket in the commit message?
[14:30:05] (PS2) Jcrespo: mariadb: Depool db1053 from s2 [mediawiki-config] - https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469)
[14:30:07] stephanebisson: ok
[14:30:25] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3975538 (elukey) Nevermind I am stupid, I confused the long version 3.4.5+cdh5.10.0+104-1.cdh5.10.0.p0.71~jessie-cdh5.10.0 with 3.4.5+dfsg-2+deb8u2,...
[14:30:34] zeljkof: A mistake, I assume. There is a task. Let me find it.
[14:30:53] Deskana: please do, it's a smell when there is no task :)
[14:31:07] and makes it harder for me to see what is going on
[14:31:17] (CR) Jcrespo: [C: +1] "Quick compare.py said it was identical to its master on the main tables." [mediawiki-config] - https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) (owner: Jcrespo)
[14:31:23] zeljkof: Got it. https://phabricator.wikimedia.org/T185708
[14:31:39] (CR) Marostegui: [C: +1] "\o/" [mediawiki-config] - https://gerrit.wikimedia.org/r/410932 (https://phabricator.wikimedia.org/T183469) (owner: Jcrespo)
[14:31:39] zeljkof: all good
[14:31:41] Deskana: ok, adding to the commit message
[14:31:49] Thank you!
[14:32:12] (PS3) Zfilipin: Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[14:32:18] (PS4) Zfilipin: Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[14:32:27] stephanebisson: ok, deploying
[14:32:53] stephanebisson: so, first the config change, then core change? right? (triple checking)
[14:33:24] (PS1) Elukey: linux-host-entries: set stretch for conf100[456] [puppet] - https://gerrit.wikimedia.org/r/410940 (https://phabricator.wikimedia.org/T166081)
[14:34:40] (CR) Elukey: [C: +2] linux-host-entries: set stretch for conf100[456] [puppet] - https://gerrit.wikimedia.org/r/410940 (https://phabricator.wikimedia.org/T166081) (owner: Elukey)
[14:35:42] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:410546|Enable log channel T184670 (T184670)]] (duration: 01m 12s)
[14:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:57] T184670: [wmf.16-regression] Fatal exception of type "Flow\Exception\InvalidDataException" for opting out from "Structured Discussions on user talk" - https://phabricator.wikimedia.org/T184670
[14:36:35] zeljkof: order may not matter, but if it does, yes
[14:37:12] !log zfilipin@tin Synchronized php-1.31.0-wmf.21/includes/Revision.php: SWAT: [[gerrit:410522|Log the reason why revision->getContent() returns null (T184670)]] (duration: 01m 12s)
[14:37:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:30] stephanebisson: deployed both patches, please check and thanks for deploying with #releng ;)
[14:37:42] Deskana: reviewing and merging your patch
[14:37:47] zeljkof: thanks!
[14:38:22] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3975563 (Tobi_WMDE_SW)
[14:38:40] (CR) Zfilipin: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[14:39:04] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T187442#3975346 (Tobi_WMDE_SW) Per @Addshore 's comment, I removed the requirement for L2 from the description.
[14:39:49] Operations, Goal, User-fgiunchedi: Include apache_exporter in puppet module apache - https://phabricator.wikimedia.org/T187434#3975567 (fgiunchedi) Indeed, so the current way we are using apache_exporter in puppet is via `profile::prometheus::apache_exporter` which includes `prometheus::apache_expor...
[14:39:57] Deskana: any minute now, just waiting for CI
[14:41:12] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3284834 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['conf1004.eqiad.wmnet'] ``` The...
[14:43:20] (Merged) jenkins-bot: Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[14:43:33] (CR) jenkins-bot: Enable the visual diff beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/409091 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[14:45:00] Deskana: the patch is at mwdebug, please test and let me know if I can deploy
[14:45:03] (PS2) Odder: Add favicon for right-to-left Wikibooks projects [mediawiki-config] - https://gerrit.wikimedia.org/r/406624 (https://phabricator.wikimedia.org/T185919)
[14:45:13] (PS4) Odder: Update logos for the Urdu Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/410201 (https://phabricator.wikimedia.org/T187182)
[14:45:34] Hauskatze: you are next, please stand by, your patch should be at mwdebug in a few minutes
[14:45:43] purr
[14:47:48] zeljkof: It doesn't appear to be working. I'm not seeing the beta feature.
[14:47:59] (CR) Zfilipin: [C: +1] Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) (owner: MarcoAurelio)
[14:49:29] Deskana: should I revert the commit?
[14:49:42] how to test it? I took a look at https://www.mediawiki.org/wiki/Special:Preferences#mw-prefsection-betafeatures
[14:49:53] did not see any new beta features there, if that's what should happen
[14:51:09] Yeah, as far as I know that's what should happen, but I'm wondering whether this can't be tested on mwdebug1002 or something.
[14:51:32] Deskana: I can deploy, if you think it will not break anything
[14:51:32] (PS3) Odder: Shrink favicon file sizes [mediawiki-config] - https://gerrit.wikimedia.org/r/402618 (https://phabricator.wikimedia.org/T177726)
[14:51:45] we can revert if there is trouble later
[14:51:48] To be honest, I'd prefer we try deploying it fully to see if it works then.
[14:51:57] If it doesn't, immediate revert.
[14:52:02] Deskana: ok, deploying, ready for revert
[14:52:43] (CR) Filippo Giunchedi: [C: +1] cassandra: add instance ID to list of custom logstash fields [puppet] - https://gerrit.wikimedia.org/r/409916 (https://phabricator.wikimedia.org/T130862) (owner: Eevans)
[14:53:42] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:409091|Enable the visual diff beta feature (T185708)]] (duration: 01m 12s)
[14:53:52] Deskana: deployed, please test
[14:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:55] T185708: Make historical VisualDiffs a BetaFeature - https://phabricator.wikimedia.org/T185708
[14:54:17] That also appears to have done nothing.
[14:55:03] The comment above the change does say "Whitelist enablement of individual Beta Features for production; per-wiki configuration should happen below this."
[14:55:06] And there is no change below it.
[14:55:09] I don't see anything strange in logs (so far)
[14:55:25] So I'm thinking the patch is probably missing something.
[14:55:29] ah, maybe it is not enabled for any wiki?
[14:55:44] I am fine with leaving it as-is, or reverting, what ever you think makes sense
[14:55:50] If you have time, I'd like to try changing the patch.
[14:56:02] If I'm right about what's wrong, then it should work even on mwdebug1002.
[14:56:06] we have 5 more minutes, and one more patch to go
[14:57:00] I'll start with the next patch, you take a look at this one, we can probably have a bit longer swat, I don't see anything at 15 utc
[14:57:20] Hauskatze: reviewing your patch, please stand by
[14:57:33] I'm here
[14:57:53] (CR) Zfilipin: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) (owner: MarcoAurelio)
[14:58:44] (PS1) Rush: openstack: set up values for test and n environment [puppet] - https://gerrit.wikimedia.org/r/410943 (https://phabricator.wikimedia.org/T184209)
[14:58:50] !log installing erlang security updates on labcontrol1001
[14:59:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:10] (CR) jerkins-bot: [V: -1] openstack: set up values for test and n environment [puppet] - https://gerrit.wikimedia.org/r/410943 (https://phabricator.wikimedia.org/T184209) (owner: Rush)
[15:00:03] Hauskatze: hard to tell how long it will take, waiting for CI :(
[15:00:27] zeljkof: we should bribe hashar to migrate those tests to docker/making things faster ;-)
[15:00:34] PROBLEM - HHVM rendering on mw2241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:00:57] Hauskatze: I think he is already moving jobs to docker 25 hours a day :)
[15:01:23] RECOVERY - HHVM rendering on mw2241 is OK: HTTP OK: HTTP/1.1 200 OK - 74007 bytes in 0.298 second response time
[15:03:01] (PS2) Rush: openstack: set up values for test and n environment [puppet] - https://gerrit.wikimedia.org/r/410943 (https://phabricator.wikimedia.org/T184209)
[15:03:44] (PS3) Ema: wmf-upgrade-varnish: add support for non-interactive upgrades [puppet] - https://gerrit.wikimedia.org/r/410558 (https://phabricator.wikimedia.org/T168529)
[15:04:15] zeljkof: I'll be honest, if I make a change here I'm mostly guessing as to whether it's the right thing or not.
[15:04:29] Deskana: so, don't do it? ;)
[15:04:36] (CR) Alexandros Kosiaris: [V: +2 C: +2] Specify namespace for the service [deployment-charts] - https://gerrit.wikimedia.org/r/410746 (owner: Alexandros Kosiaris)
[15:04:48] (CR) Alexandros Kosiaris: [V: +2 C: +2] Set service-checker pod in the same namespace [deployment-charts] - https://gerrit.wikimedia.org/r/410745 (owner: Alexandros Kosiaris)
[15:04:59] (CR) Alexandros Kosiaris: [V: +2 C: +2] Correctly name prometheus-statsd image [deployment-charts] - https://gerrit.wikimedia.org/r/410478 (owner: Alexandros Kosiaris)
[15:05:06] Deskana: we can leave it as is, until somebody that knows what should happen takes a look
[15:05:30] Hauskatze: the job is finally running, any minute now
[15:05:34] ok
[15:05:37] jouncebot: now
[15:05:37] No deployments scheduled for the next 1 hour(s) and 54 minute(s)
[15:05:39] (CR) Ema: [C: +2] wmf-upgrade-varnish: add support for non-interactive upgrades [puppet] - https://gerrit.wikimedia.org/r/410558 (https://phabricator.wikimedia.org/T168529) (owner: Ema)
[15:05:45] Deskana: or we can revert
[15:05:54] good, we're not 'invading' the time slot of anyone :)
[15:05:57] (CR) Filippo Giunchedi: [C: +1] "LGTM" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[15:05:58] please ping me when finished, I am waiting to deploy
[15:06:13] CI volente
[15:06:23] jynus: we should be done in a few minutes, will let you know
[15:06:28] database pools and depools happen 24/7
[15:06:32] slow CI is slow :(
[15:06:32] same here :(
[15:07:02] (Merged) jenkins-bot: Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) (owner: MarcoAurelio)
[15:07:10] zeljkof: I think it can left as it is, since the way it is now doesn't actually do anything.
[15:07:12] jynus, marostegui: sorry, looks like CI is under load today, most of SWAT I was waiting for jobs to run :(
[15:07:28] there are a lot of patches
[15:07:33] zeljkof: no worries, it has been like that since early in the morning :(
[15:07:35] Deskana: ok, let's do that, if needed somebody will revert
[15:07:51] Operations, Patch-For-Review: Automated service restarts for common low-level system services - https://phabricator.wikimedia.org/T135991#3975626 (fgiunchedi) While reviewing https://gerrit.wikimedia.org/r/399618 it occurred to me another criteria for service selection, other than stateless, should be so...
[15:07:59] Hauskatze: patch merged, pushing to mwdebug in a minute
[15:08:14] =^o^=
[15:08:43] Hauskatze: the patch is at mwdebug1002
[15:08:47] checking
[15:08:55] Deskana: you said you can test 409445, right?
[15:09:33] (CR) jenkins-bot: Log accessing private abusefilter details [mediawiki-config] - https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) (owner: MarcoAurelio)
[15:09:37] (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/409921 (https://phabricator.wikimedia.org/T170847) (owner: Ema)
[15:10:04] zeljkof: I guess so.
[15:10:16] Hauskatze: Remind me what I'm supposed to do?
[15:10:18] unrelated, this was not in fatalmonitor earlier: `469 data error in /srv/mediawiki/php-1.31.0-wmf.20/extensions/Graph/includes/ApiGraph.php on line 125`
[15:10:24] zeljkof: lgtm, deskana, would you mind checking https://meta.wikimedia.org/wiki/Special:Log/abusefilterprivatedetails
[15:10:28] ^
[15:10:39] (PS1) Alexandros Kosiaris: Inform the user about the available port range on minikube [deployment-charts] - https://gerrit.wikimedia.org/r/410946
[15:10:48] Operations, Patch-For-Review: Automated service restarts for common low-level system services - https://phabricator.wikimedia.org/T135991#3975630 (MoritzMuehlenhoff) Yeah, definitely, this is currently only meant for all many common system services we use across the fleet (nrpe, diamond, systemd-timesync...
[15:11:04] by the way I've just found that MediaWiki:action-abusefilter-private-log should be added; but that's unrelated
[15:11:14] will do that today :)
[15:11:40] https://i.imgur.com/hgewQiO.png
[15:12:00] So I guess it worked.
[15:12:23] Deskana: mind https://meta.wikimedia.org/wiki/Special:AbuseLog/366091 see if there's a box at the bottom?
[15:12:29] Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3975632 (chasemp) >>! In T183167#3967741, @Papaul wrote: > @chasemp > labtestnet2002:eth0 = ge-1/0/13 > labtestvirt2003:eth2 = ge-1/0/14 Thanks @Papaul...
[15:12:38] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3975633 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['conf1004.eqiad.wmnet'] ``` and were **ALL** successful.
[15:12:39] it's an IP entry so no private data will be accessed
[15:13:13] well, that can't be tested until the patch is merged
[15:13:20] zeljkof: I think you can sync
[15:13:28] Hauskatze: deploying
[15:13:28] nothing broke, no smoke signals
[15:13:51] note that abusefilter.php somewhat a subfile of CommonSettings.php
[15:14:05] in case that means you need to sync one or another first
[15:14:30] (PS1) Ema: cache_upload: finalize upgrade to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/410948 (https://phabricator.wikimedia.org/T180433)
[15:14:33] Hauskatze: nothing changes in commonsettings.php, right? so no need to sync it
[15:14:44] zeljkof: no, no changes in CS.php
[15:14:46] !log zfilipin@tin Synchronized wmf-config/abusefilter.php: SWAT: [[gerrit:409445|Log accessing private abusefilter details (T160357)]] (duration: 01m 12s)
[15:14:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:00] T160357: Allow those with CheckUser right to access AbuseLog private information on WMF projects - https://phabricator.wikimedia.org/T160357
[15:15:12] Hauskatze, Deskana: deployed, please test and thanks for deploying with #releng ;)
[15:15:29] thanks CI for working, and thanks zeljkof
[15:15:32] and thanks Deskana
[15:15:43] !log EU SWAT finished
[15:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:02] (PS3) Marostegui: db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089)
[15:16:17] jynus, marostegui: swat finished, apologies for the delay
[15:16:29] no worries, jynus do you mind if I go first?
[15:16:53] I want to leave an alter running and keep warming up a host
[15:18:14] do hosts need warm up? crazy
[15:19:07] Not necessarily in all cases, but if possible it is good to bring data to the buffer pool slowly
[15:19:22] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[15:19:54] please do
[15:20:16] Hauskatze: there is a wormup period
[15:20:18] thanks :-)
[15:20:31] normally they workmup automatically
[15:20:36] I already pushed - where are my manners...
[15:20:40] zeljkof: Thanks. Sorry for wasting your time a little.
[15:21:06] (CR) Alexandros Kosiaris: [V: 2 C: 2] Inform the user about the available port range on minikube [deployment-charts] - https://gerrit.wikimedia.org/r/410946 (owner: Alexandros Kosiaris)
[15:21:16] but in any case, normally not ok to do large sudden changes in any case
[15:21:48] Deskana: no problem at all, that many patches should be done in 30-45 minutes, if CI was not really slow today
[15:23:16] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/410933 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[15:23:36] (Merged) jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[15:23:42] (PS2) Ema: cache_upload: finalize upgrade to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/410948 (https://phabricator.wikimedia.org/T180433)
[15:24:43] jynus: your turn!
[15:25:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1051, fully repool db1097:3314, increase weight for db1097:3315 (duration: 01m 13s)
[15:25:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:54] (PS1) Volans: Icinga: update allowed IPs for external monitoring [puppet] - https://gerrit.wikimedia.org/r/410951 (https://phabricator.wikimedia.org/T162857)
[15:27:03] !log Deploy schema change on db1051 - T187089 T185128 T153182
[15:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:17] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[15:27:18] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
[15:27:18] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
[15:27:43] (PS1) Elukey: profile::zookeeper::server: remove explicit java-7 dependency [puppet] - https://gerrit.wikimedia.org/r/410957 (https://phabricator.wikimedia.org/T166081)
[15:29:24] (PS1) Muehlenhoff: Blacklist v4l2-common [puppet] - https://gerrit.wikimedia.org/r/410958
[15:29:52] (CR) jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/410923 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
[15:30:30] (CR) Ema: [C: 2] cache_upload: finalize upgrade to varnish 5 [puppet] - https://gerrit.wikimedia.org/r/410948 (https://phabricator.wikimedia.org/T180433) (owner: Ema)
[15:34:20] !log upgrade upload @ eqsin to varnish 5
[15:34:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:34] (CR) Elukey: [C: 1] profile::zookeeper::server: remove explicit java-7 dependency [puppet] - https://gerrit.wikimedia.org/r/410957 (https://phabricator.wikimedia.org/T166081) (owner: Elukey)
[15:37:37] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3975752 (elukey) In labs I've extended the one-zookeeper-node analytics project's cluster to three nodes, adding two stretch hosts. Except the puppe...
[15:39:00] (PS1) Alexandros Kosiaris: Add the helm chart for mathoid [deployment-charts] - https://gerrit.wikimedia.org/r/410964 (https://phabricator.wikimedia.org/T184919)
[15:41:09] (CR) Muehlenhoff: [C: 1] Icinga: update allowed IPs for external monitoring [puppet] - https://gerrit.wikimedia.org/r/410951 (https://phabricator.wikimedia.org/T162857) (owner: Volans)
[15:43:30] (PS2) Volans: Icinga: update allowed IPs for external monitoring [puppet] - https://gerrit.wikimedia.org/r/410951 (https://phabricator.wikimedia.org/T162857)
[15:44:04] (CR) Volans: [C: 2] Icinga: update allowed IPs for external monitoring [puppet] - https://gerrit.wikimedia.org/r/410951 (https://phabricator.wikimedia.org/T162857) (owner: Volans)
[15:44:43] (CR) Filippo Giunchedi: [C: 1] Blacklist v4l2-common [puppet] - https://gerrit.wikimedia.org/r/410958 (owner: Muehlenhoff)
[15:45:35] Operations, ops-eqiad, Cloud-Services, DC-Ops: Decommission labstore100[12] - https://phabricator.wikimedia.org/T187456#3975773 (faidon) p:Triage>Low
[15:46:07] Operations, ops-eqiad, Cloud-Services, DC-Ops: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456#3975788 (faidon)
[15:46:10] Operations, ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3975790 (Papaul) a:Papaul>Marostegui Disk replacement complete
[15:53:50] (PS1) Marostegui: db-eqiad.php: Increase traffic db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410981
[15:54:23] (PS2) Muehlenhoff: Blacklist v4l2-common [puppet] - https://gerrit.wikimedia.org/r/410958
[15:55:21] Operations, monitoring, Patch-For-Review: Some Core availability Catchpoint tests might be more expensive than they need to be - https://phabricator.wikimedia.org/T162857#3975827 (Volans)
[15:56:20] (PS1) Alexandros Kosiaris: httpd: Bump mod_status priority to 50 [puppet] - https://gerrit.wikimedia.org/r/410984
[15:56:26] Operations, hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#3884434 (Joe) I initially suggested tin might make use of SSD, being the deployment server, but in fact it seems that given how scap works nowadays the only iops-intensive activity on those ser...
[15:57:25] <_joe_> akosiaris: that priority for the server-status was deliberate
[15:57:34] (CR) Muehlenhoff: [C: 2] Blacklist v4l2-common [puppet] - https://gerrit.wikimedia.org/r/410958 (owner: Muehlenhoff)
[15:57:49] used to be 50 though
[15:58:01] Operations, hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#3975841 (faidon) a:faidon>RobH Approved then.
[15:58:02] we 've change it that much in httpd module ?
[15:58:06] * akosiaris looking better at the code
[15:59:51] (PS3) Bstorm: openstack: install hp-health on labvirt* servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355)
[16:00:26] _joe_: are you sure ? I see no mention for that in https://gerrit.wikimedia.org/r/#/c/410630/
[16:02:31] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3975871 (mmodell) Paladin: how many packages from bus...
[16:03:44] <_joe_> akosiaris: yeah, I mean do merge that
[16:03:51] <_joe_> I'm in a meeting :)
[16:03:55] ok
[16:04:01] thanks
[16:04:04] (PS1) MarcoAurelio: maintain-views: Explicitly exclude 'abusefilterprivatedetails' [puppet] - https://gerrit.wikimedia.org/r/410992 (https://phabricator.wikimedia.org/T187455)
[16:05:05] (CR) BryanDavis: [C: 1] maintain-views: Explicitly exclude 'abusefilterprivatedetails' [puppet] - https://gerrit.wikimedia.org/r/410992 (https://phabricator.wikimedia.org/T187455) (owner: MarcoAurelio)
[16:05:21] (CR) Alexandros Kosiaris: [C: 2] httpd: Bump mod_status priority to 50 [puppet] - https://gerrit.wikimedia.org/r/410984 (owner: Alexandros Kosiaris)
[16:05:23] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410981 (owner: Marostegui)
[16:05:25] (PS2) Alexandros Kosiaris: httpd: Bump mod_status priority to 50 [puppet] - https://gerrit.wikimedia.org/r/410984
[16:05:49] (PS3) Alexandros Kosiaris: httpd: Bump mod_status priority to 50 [puppet] - https://gerrit.wikimedia.org/r/410984
[16:05:58] (CR) Alexandros Kosiaris: [V: 2 C: 2] httpd: Bump mod_status priority to 50 [puppet] - https://gerrit.wikimedia.org/r/410984 (owner: Alexandros Kosiaris)
[16:06:12] twentyafterfour: I guess your okay to do that patch in the train slot rather than me swat it? :)
[16:07:57] (CR) Andrew Bogott: [C: 2] maintain-views: Explicitly exclude 'abusefilterprivatedetails' [puppet] - https://gerrit.wikimedia.org/r/410992 (https://phabricator.wikimedia.org/T187455) (owner: MarcoAurelio)
[16:08:00] (PS2) Andrew Bogott: maintain-views: Explicitly exclude 'abusefilterprivatedetails' [puppet] - https://gerrit.wikimedia.org/r/410992 (https://phabricator.wikimedia.org/T187455) (owner: MarcoAurelio)
[16:08:40] (Merged) jenkins-bot: db-eqiad.php: Increase traffic db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410981 (owner: Marostegui)
[16:08:51] (CR) jenkins-bot: db-eqiad.php: Increase traffic db1097:3315 [mediawiki-config] - https://gerrit.wikimedia.org/r/410981 (owner: Marostegui)
[16:10:15] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1097:3315 (duration: 01m 12s)
[16:10:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:11] hey bd808, are you around?
[16:15:43] dsaez: yup. what's up
[16:18:44] bd808, I'm trying to login here: https://horizon.wikimedia.org/auth/login/ , but my credentials are not working. Not sure why, I remember that were some problems with my LDAP access, but I don't know what is the problem now
[16:19:26] dsaez: can you join in #wikimedia-cloud and we can try to work it out?
[16:19:42] ok! going there! thx
[16:27:10] Operations, ops-codfw, DBA: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T187419#3976006 (Marostegui) Thanks - it is rebuilding ``` logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 30% complete) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding) ```
[16:29:09] !log andrew@tin Started deploy [horizon/deploy@4e7ccc5]: lots of updates
[16:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:06] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3976023 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['conf1005.eqiad.wmnet', 'conf10...
[16:32:22] !log andrew@tin Finished deploy [horizon/deploy@4e7ccc5]: lots of updates (duration: 03m 13s)
[16:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:02] (PS1) Marostegui: db-eqiad.php: Repool db1097:3315,db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/411013
[16:37:59] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1097:3315,db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/411013 (owner: Marostegui)
[16:38:53] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3976058 (ema)
[16:38:58] Operations, Traffic, Patch-For-Review, Performance-Team (Radar): Upgrade cache_upload to Varnish 5 - https://phabricator.wikimedia.org/T180433#3976056 (ema) Open>Resolved a:ema
[16:40:42] (Merged) jenkins-bot: db-eqiad.php: Repool db1097:3315,db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/411013 (owner: Marostegui)
[16:42:14] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1097:3315, db1089, db1066 (duration: 01m 12s)
[16:42:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:42:45] (PS4) Paladox: httpd: support php7 and 71 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962
[16:43:25] (CR) jerkins-bot: [V: -1] httpd: support php7 and 71 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962 (owner: Paladox)
[16:45:07] (CR) Paladox: "Actually this is loaded by the user so won't be needed by phab or is it?" [puppet] - https://gerrit.wikimedia.org/r/407962 (owner: Paladox)
[16:48:24] (CR) jenkins-bot: db-eqiad.php: Repool db1097:3315,db1089,db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/411013 (owner: Marostegui)
[16:48:50] Operations, Page-Previews, RESTBase, Traffic, and 2 others: Cached page previews not shown when refreshed - https://phabricator.wikimedia.org/T184534#3976131 (phuedx) >>! In T184534#3967535, @BBlack wrote: > Do we want to allow stale content in the UA's cache here, for up to 5 minutes past the ex...
[16:50:02] Operations, ops-eqiad: Decommission mw1259-mw1260 - https://phabricator.wikimedia.org/T187466#3976137 (Joe)
[16:51:08] (Abandoned) Paladox: httpd: support php7 and 71 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962 (owner: Paladox)
[16:52:13] (PS2) Dzahn: icinga: grant vgutierrez permissions to run commands [puppet] - https://gerrit.wikimedia.org/r/410933 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[16:53:23] (CR) Dzahn: [C: 2] "yep, we talked about it on IRC earlier. that was the last thing missing for onboarding" [puppet] - https://gerrit.wikimedia.org/r/410933 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[16:53:48] (CR) Dzahn: [C: 2] "and needs to match "cn" in LDAP" [puppet] - https://gerrit.wikimedia.org/r/410933 (https://phabricator.wikimedia.org/T187035) (owner: Vgutierrez)
[16:53:53] Operations: Create 2 VMs in codfw for mwdebug20001 and 2002 - https://phabricator.wikimedia.org/T187468#3976170 (Joe)
[16:54:21] Operations, ops-eqiad: Decommission mw2017 and mw2099 - https://phabricator.wikimedia.org/T187467#3976192 (Joe)
[16:54:32] Operations, Cloud-Services, cloud-services-team (Kanban): rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3976193 (faidon) What is the status of this and is there an ETA? The reason I'm asking is that labcontrol100[12] (very old/replaced by this) are still online and...
[17:00:04] godog, moritzm, and _joe_: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1700).
[17:00:05] tgr and urandom: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:01:16] o/
[17:01:36] looking at the patches!
[17:02:06] (PS1) Marostegui: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/411022
[17:02:23] so https://gerrit.wikimedia.org/r/c/409916/ seems straightforward, I suspect it'll mean reload/restarting logstash
[17:02:37] no I'm misreading it, that'd be cassandra
[17:02:54] (PS2) Filippo Giunchedi: cassandra: add instance ID to list of custom logstash fields [puppet] - https://gerrit.wikimedia.org/r/409916 (https://phabricator.wikimedia.org/T130862) (owner: Eevans)
[17:03:12] godog: oh, i hope not; i don't think so
[17:03:47] yeah I don't think so either, brainfart on my part
[17:03:53] gehel: adding that field (discussed yesterday) doesn't require a logstash/es restart, does it?
[17:03:59] (CR) Filippo Giunchedi: [C: 2] cassandra: add instance ID to list of custom logstash fields [puppet] - https://gerrit.wikimedia.org/r/409916 (https://phabricator.wikimedia.org/T130862) (owner: Eevans)
[17:04:22] godog: the real Q is whether it requires a restart (i don't think it does)
[17:04:59] godog: errr
[17:05:03] a restart of Cassandra
[17:05:05] urandom: Nope, it should *just work* (TM)
[17:05:09] * urandom brainfarted too
[17:05:13] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/411022 (owner: Marostegui)
[17:05:18] Operations, Analytics, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3976232 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['conf1005.eqiad.wmnet', 'conf1006.eqiad.wmnet'] ``` and were **ALL** successful.
[17:06:01] urandom: it should not even require a Cassandra restart, if I remember, auto reload of log config is enabled
[17:06:12] ya
[17:06:15] just ran puppet on restbase1008 btw
[17:06:22] godog: ran it on 1007 :)
[17:06:52] hehe
[17:07:03] urandom: LMK if everything looks like as it should
[17:07:18] godog: kk
[17:08:01] jynus: https://gerrit.wikimedia.org/r/c/409645/ mentions you'd be deploying it during puppet swat, still the case? if not I can do it too
[17:08:13] If you do not mind
[17:08:22] just deploy, it is safe
[17:08:25] it was deployed before
[17:08:29] no I don't, easy enough indeed
[17:08:37] (PS4) Filippo Giunchedi: Re-enable cron job for purging ReadingLists data [puppet] - https://gerrit.wikimedia.org/r/409645 (https://phabricator.wikimedia.org/T181107) (owner: Gergő Tisza)
[17:08:40] I reverteve because log spam
[17:09:43] (I actually tested the script first this time)
[17:10:20] (CR) Filippo Giunchedi: [C: 2] Re-enable cron job for purging ReadingLists data [puppet] - https://gerrit.wikimedia.org/r/409645 (https://phabricator.wikimedia.org/T181107) (owner: Gergő Tisza)
[17:10:47] (Draft1) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245
[17:10:50] (Draft2) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245
[17:10:53] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4030 is CRITICAL: connect to address 10.128.0.130 and port 3128: Connection refused
[17:10:54] (PS3) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245
[17:11:06] tgr: we will see you on cronspam court if not!!!!
[17:11:07] godog: https://goo.gl/drrDQt
[17:11:08] :-D
[17:11:13] Operations, Analytics-Kanban, Patch-For-Review, User-Elukey, User-Joe: rack/setup/install conf1004-conf1006 - https://phabricator.wikimedia.org/T166081#3976253 (elukey)
[17:11:20] godog: TL;DR It Works(tm)
[17:11:25] (CR) jerkins-bot: [V: -1] Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245 (owner: Paladox)
[17:11:34] * godog shakes fist at kibana urls
[17:11:47] urandom: sweet!
[17:11:47] the link didn't work?
[17:11:53] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4030 is OK: HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.157 second response time
[17:11:55] https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.02.15/cassandra?id=AWGacPZKWMYqG9UiFU3o&_g=(refreshInterval:('$$hashKey':'object:2603',display:'10%20seconds',pause:!f,section:1,value:10000),time:(from:now-1h,mode:quick,to:now))
[17:11:59] LOL
[17:12:01] yeah it worked ok
[17:12:04] ok
[17:12:15] tgr: thanks for testing it!
[17:12:22] that looks like `cat /dev/urandom`
[17:12:27] (Merged) jenkins-bot: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/411022 (owner: Marostegui)
[17:12:38] (CR) jenkins-bot: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/411022 (owner: Marostegui)
[17:12:52] mmh, again the icinga varnish backend spam
[17:13:16] cp4030 just went through it's weekly restart
[17:14:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 (duration: 01m 12s)
[17:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:15:26] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3976273 (Dzahn) @Vgutierrez re: Icinga command permissions. should be all done. the ultimate test is if you try a "schedule downtime" or "disable/enable no...
[17:16:52] (PS1) Marostegui: db-eqiad.php: Depool db1067 and db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/411029
[17:17:04] (CR) Marostegui: [C: -1] "wait till tomorrow" [mediawiki-config] - https://gerrit.wikimedia.org/r/411029 (owner: Marostegui)
[17:17:09] (PS4) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245
[17:19:45] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976280 (Paladox) @mmodell hi, i have a list here htt...
[17:20:07] (PS5) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832)
[17:22:17] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976287 (mmodell) Looks like we still have workers in...
[17:22:31] (PS1) Dzahn: Revert "mediawiki: reduce frequency of purge_abusefilter to weekly" [puppet] - https://gerrit.wikimedia.org/r/411031
[17:22:48] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976290 (Paladox) @mmodell full list https://phabrica...
[17:22:53] (CR) Dzahn: "planned revert of a temp thing and per comment from Reedy" [puppet] - https://gerrit.wikimedia.org/r/411031 (owner: Dzahn)
[17:24:26] !log removed 2FA from User:Lea Lacroix (WMDE)
[17:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:40] Operations, ops-eqiad, DC-Ops: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473#3976323 (faidon) p:Triage>Normal
[17:27:29] (PS1) Ema: check_http_varnish: bump check_interval [puppet] - https://gerrit.wikimedia.org/r/411033
[17:27:32] Operations, ops-codfw, DC-Ops: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976338 (faidon) p:Triage>Normal
[17:28:13] Operations, Cloud-Services, cloud-services-team (Kanban): rack/setup/install labcontrol100[34] - https://phabricator.wikimedia.org/T165781#3976363 (chasemp) We have been planning to use them in our deployment with neutron and to cut over things at once since we need a second control plane but there a...
[17:31:58] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976372 (RobH)
[17:32:24] Operations, ops-eqiad, DC-Ops, hardware-requests: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473#3976383 (RobH) a:RobH
[17:32:33] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976338 (RobH) a:RobH
[17:33:07] Operations, ops-eqiad, DC-Ops, hardware-requests: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473#3976389 (faidon)
[17:33:47] (CR) Eevans: [C: 1] cassandra: enable component/cassandra33 where applicable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410252 (https://phabricator.wikimedia.org/T186619) (owner: Eevans)
[17:34:09] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976390 (RobH)
[17:34:37] Operations, ops-eqiad, DC-Ops, hardware-requests: Decommission old and unused/spare servers in eqiad - https://phabricator.wikimedia.org/T187473#3976391 (RobH) Please note that every system on this list will need to be decommission and have the following checklist applied PER HOST: [] - all syst...
[17:34:45] Operations, ops-eqiad, Cloud-Services, DC-Ops: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456#3976393 (chasemp) I believe labstore1003 was a part of the refresh for labstore1006/7, where labstore1004/1005 were the most direct refresh for labstore10...
[17:36:11] (PS4) Rush: openstack: install hp-health on labvirt* servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[17:36:49] (CR) MarcoAurelio: "I'd prefer if we fixed the indexed first. I worry that the script breaks if it takes more than a day to complete again. If the db-indexes " [puppet] - https://gerrit.wikimedia.org/r/411031 (owner: Dzahn)
[17:37:37] CI is slooooow :)
[17:40:43] Operations, ops-eqiad, DC-Ops, Data-Services, cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456#3976424 (bd808)
[17:43:35] Hauskatze: you mean backed up
[17:44:10] I know the end-user difference doesn't mean much, but on the infra side we just have a backlog of tests, not much we can do about that :)
[17:44:31] (PS5) Giuseppe Lavagetto: Add simple actions to be exercised only on the basic types. [software/conftool] - https://gerrit.wikimedia.org/r/410225
[17:46:07] (CR) Giuseppe Lavagetto: [C: 2] Add simple actions to be exercised only on the basic types. [software/conftool] - https://gerrit.wikimedia.org/r/410225 (owner: Giuseppe Lavagetto)
[17:46:16] (PS5) Giuseppe Lavagetto: Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226
[17:47:28] (CR) jerkins-bot: [V: -1] Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226 (owner: Giuseppe Lavagetto)
[17:48:45] (PS6) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832)
[17:49:38] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976446 (RobH)
[17:50:09] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976338 (RobH) a:RobH>Papaul I've done all the remotely accessible steps, now escalating this directly to @papaul to complete the checklist item...
[17:50:39] (PS6) Giuseppe Lavagetto: Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226
[17:51:41] (CR) jerkins-bot: [V: -1] Release new version of conftool [software/conftool] - https://gerrit.wikimedia.org/r/410226 (owner: Giuseppe Lavagetto)
[17:53:49] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976453 (elukey) Got a backtrace of one process in G...
[17:56:46] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976455 (Paladox) Should we change serialize_precisio...
[17:57:01] (CR) Dzahn: [C: 1] phabricator: disable opcache.fastshutdown [puppet] - https://gerrit.wikimedia.org/r/410767 (https://phabricator.wikimedia.org/T182832) (owner: Elukey)
[17:59:59] (PS2) Dzahn: Revert "mediawiki: reduce frequency of purge_abusefilter to weekly" [puppet] - https://gerrit.wikimedia.org/r/411031
[18:00:04] cscott, arlolra, subbu, halfak, and Amir1: How many deployers does it take to do Services – Graphoid / Parsoid / Citoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1800).
[18:00:04] No GERRIT patches in the queue for this window AFAICS.
[18:00:12] Nothing for ORES
[18:00:30] maybe parsoid deploy .. we are discussing.
[18:01:03] (CR) Dzahn: "i'll do what Reedy and Hauskatze agree on :)" [puppet] - https://gerrit.wikimedia.org/r/411031 (owner: Dzahn)
[18:02:05] (CR) Dzahn: "comes down to how long is long in "even for enwiki doesn't take it long to clear them (now that the backlog has been cleared)"" [puppet] - https://gerrit.wikimedia.org/r/411031 (owner: Dzahn)
[18:02:46] (PS7) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832)
[18:06:04] !log bsitzmann@tin Started deploy [mobileapps/deploy@0bfafa9]: Update mobileapps to d219d1b (T187475)
[18:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:18] T187475: Extra square in 'In the news' section - https://phabricator.wikimedia.org/T187475
[18:07:29] Reedy: how long does a run of purge_abusefilter take now
[18:07:31] Hauskatze: ^
[18:08:24] no idea mutante -- I don't have access to production
[18:08:30] thus cannot see logs
[18:10:02] the log doesnt have timestamps inside it and contains multiple wikis
[18:11:32] Hauskatze: i'll wait a bit with the revert as you requested
[18:11:49] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3976530 (Niedzielski) a:phuedx
[18:11:58] !log bsitzmann@tin Finished deploy [mobileapps/deploy@0bfafa9]: Update mobileapps to d219d1b (T187475) (duration: 05m 54s)
[18:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:13] T187475: Extra square in 'In the news' section - https://phabricator.wikimedia.org/T187475
[18:12:21] one of the things I was thinking of was that the script output some sort of time/timestamp
[18:12:42] not sure if that's possible
[18:23:27] RECOVERY - HP RAID on db2048 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK
[18:23:59] (PS3) Andrew Bogott: Move WMCS VMs back to the default environment. [puppet] - https://gerrit.wikimedia.org/r/410069
[18:24:01] (PS1) Andrew Bogott: lab puppetmasters: allow labweb access via ipv6 [puppet] - https://gerrit.wikimedia.org/r/411047
[18:24:06] (PS1) Jforrester: Follow-up 77be427a1: Enable the Beta Feature on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708)
[18:25:40] (PS2) Andrew Bogott: lab puppetmasters: allow labweb access via ipv6 [puppet] - https://gerrit.wikimedia.org/r/411047
[18:25:41] (PS4) Andrew Bogott: Move WMCS VMs back to the default environment. [puppet] - https://gerrit.wikimedia.org/r/410069
[18:26:53] Hauskatze: certainly possible, but cant do it right now. maybe if you could put that idea on the ticket
[18:27:18] (CR) Andrew Bogott: [C: 2] lab puppetmasters: allow labweb access via ipv6 [puppet] - https://gerrit.wikimedia.org/r/411047 (owner: Andrew Bogott)
[18:30:37] PROBLEM - Check systemd state on labpuppetmaster1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:33:59] !log arlolra@tin Started deploy [parsoid/deploy@6da4591]: Updating Parsoid to 0650195
[18:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:33] !log arlolra@tin Finished deploy [parsoid/deploy@6da4591]: Updating Parsoid to 0650195 (duration: 08m 34s)
[18:42:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:39] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976642 (elukey) >>! In T182832#3976287, @mmodell wro...
[18:49:36] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976644 (mmodell) Yeah it's especially odd that it's...
[18:51:04] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976649 (RobH) a:Papaul>RobH
[18:52:14] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976338 (RobH)
[18:56:06] Operations, ops-eqiad, fundraising-tech-ops: rack frpig1001 - https://phabricator.wikimedia.org/T187365#3976660 (RobH)
[18:56:24] Operations, ops-eqiad, fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#3976662 (RobH) p:Lowest>Normal
[18:56:57] (PS1) Andrew Bogott: add ipv6 addresses for labweb1001 and 1002 [dns] - https://gerrit.wikimedia.org/r/411056
[18:57:36] Operations, ops-eqiad, fundraising-tech-ops: rack frpig1001 - https://phabricator.wikimedia.org/T187365#3973216 (RobH)
[18:57:47] Operations, ops-eqiad, fundraising-tech-ops: rack frdata1001 - https://phabricator.wikimedia.org/T187364#3973195 (RobH)
[18:57:59] Operations, ops-eqiad, fundraising-tech-ops: rack frbast1001 - https://phabricator.wikimedia.org/T187363#3976671 (RobH)
[18:58:16] PROBLEM - Check systemd state on labpuppetmaster1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:58:55] Operations, ops-eqiad, fundraising-tech-ops: rack frbast1001 - https://phabricator.wikimedia.org/T187363#3973183 (RobH) p:Lowest>Normal
[18:59:03] Operations, ops-eqiad, fundraising-tech-ops: rack frpig1001 - https://phabricator.wikimedia.org/T187365#3976677 (RobH) p:Lowest>Normal
[19:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Morning SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T1900).
[19:00:04] Smalyshev, razesoldier, twkozlowski, and James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:28] * James_F waves.
[19:00:39] am here
[19:00:44] standby
[19:02:06] (CR) Ayounsi: [C: 1] add ipv6 addresses for labweb1001 and 1002 [dns] - https://gerrit.wikimedia.org/r/411056 (owner: Andrew Bogott)
[19:02:35] (CR) Andrew Bogott: [C: 2] add ipv6 addresses for labweb1001 and 1002 [dns] - https://gerrit.wikimedia.org/r/411056 (owner: Andrew Bogott)
[19:05:33] Operations, ops-codfw, DC-Ops, hardware-requests: Decommission old and unused/spare servers in codfw - https://phabricator.wikimedia.org/T187474#3976698 (faidon)
[19:06:58] I can SWAT
[19:07:19] (PS3) Thcipriani: Set SPARQL endpoint for category search [mediawiki-config] - https://gerrit.wikimedia.org/r/410242 (https://phabricator.wikimedia.org/T184840) (owner: Smalyshev)
[19:07:19] Oh
[19:07:26] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/410242 (https://phabricator.wikimedia.org/T184840) (owner: Smalyshev)
[19:08:18] thcipriani: this is purely config patch which is for master functionality so can't test it yet
[19:08:36] and it doesn't do anything until you search for keyword
[19:08:48] specifically deepcat: keyword
[19:08:54] SMalyshev: ok, I will go ahead and deploy after merge :)
[19:09:05] cool, thanks
[19:10:44] (Merged) jenkins-bot: Set SPARQL endpoint for category search [mediawiki-config] - https://gerrit.wikimedia.org/r/410242 (https://phabricator.wikimedia.org/T184840) (owner: Smalyshev)
[19:10:54] (CR) jenkins-bot: Set SPARQL endpoint for category search [mediawiki-config] - https://gerrit.wikimedia.org/r/410242 (https://phabricator.wikimedia.org/T184840) (owner: Smalyshev)
[19:13:21] (PS3) Thcipriani: Follow-up 0bfc7d8: Set Portal and Portal talk namespace alias of zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410536 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[19:13:33] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/410536 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[19:13:47] RECOVERY - Check systemd state on labpuppetmaster1001 is OK: OK - running: The system is fully operational
[19:13:58] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: [[gerrit:410242|Set SPARQL endpoint for category search]] T184840 (duration: 01m 12s)
[19:14:05] ^ SMalyshev live everywhere
[19:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:12] T184840: Create search keyword for deep category search - https://phabricator.wikimedia.org/T184840
[19:14:29] thcipriani: thanks!
[19:15:55] yw :)
[19:16:01] (Merged) jenkins-bot: Follow-up 0bfc7d8: Set Portal and Portal talk namespace alias of zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410536 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[19:16:16] PROBLEM - Check whether ferm is active by checking the default input chain on labpuppetmaster1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[19:17:43] razesoldier: your change is live on mwdebug1002, check please
[19:18:14] (CR) jenkins-bot: Follow-up 0bfc7d8: Set Portal and Portal talk namespace alias of zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/410536 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[19:18:29] thcipriani: How can I check?
[19:19:15] I wrote a test in zhwiki
[19:19:18] razesoldier: you can use the X-Wikimedia-Debug browser extension https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug to checkout mwdebug1002 and make sure zhwiki looks right there.
[19:23:21] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976742 (mmodell) Interesting: I think I may have fo...
[19:23:31] thcipriani: I'm sorry, I don't understand how to test
[19:26:32] razesoldier: if you have the browser extension installed, you can point your browser to mwdebug1002 then, in this instance, you can go to https://zh.wikipedia.org/wiki/%E4%B8%BB%E9%A2%98:%E7%9B%AE%E5%BD%95 and ensure that you are redirected to the portal namespace.
[19:26:45] razesoldier: it seems to work for me
[19:28:21] razesoldier: I'll go ahead and deploy the change out, just be aware that in order for changes to be SWATed under most circumstances folks who put the patch up for swat have to be able to test it.
[19:28:24] Can I use cli to test it?
[19:28:40] Operations, Commons, Wikimedia-SVG-rendering, media-storage, Patch-For-Review: Install Noto fonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T184664#3976763 (Johan)
[19:30:04] razesoldier: sure, curl should work: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug if you have access to mwdebug1002.eqiad.wmnet (I don't know if you do) you could ensure all works fine there.
[19:33:39] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3976774 (Paladox) @mmodell "This is finally fixed in...
[19:34:40] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:410536|Set Portal and Portal talk namespace alias of zhwiki]] T184866 (duration: 01m 13s)
[19:34:48] razesoldier: ^ live everywhere now
[19:34:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:55] T184866: Set Portal and Portal talk namespace alias of zhwiki - https://phabricator.wikimedia.org/T184866
[19:36:03] (PS2) Thcipriani: Follow-up 77be427a1: Enable the Beta Feature on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[19:36:20] looks good for me
[19:36:30] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[19:36:32] (CR) Catrope: [C: -1] "Which beta feature?" [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[19:37:02] whoops. RoanKattouw James_F looks like there was a mid-air review collision
[19:37:13] nm, just a commit message gripe
[19:37:17] But it looks like the new alias link is still red
[19:37:24] Didn't realize that was about to be deployed
[19:37:30] It's fine, it's just that "the beta feature" is not very specific
[19:37:37] gotcha
[19:37:38] RoanKattouw: Psh.
[19:37:48] RoanKattouw: It also says "follow-up" which gives the context.
[19:40:12] (Merged) jenkins-bot: Follow-up 77be427a1: Enable the Beta Feature on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[19:40:21] (CR) jenkins-bot: Follow-up 77be427a1: Enable the Beta Feature on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/411048 (https://phabricator.wikimedia.org/T185708) (owner: Jforrester)
[19:41:43] James_F: ^ is live on mwdebug1002, check please
[19:41:55] (PS3) Jforrester: Restrict FlaggedRevs to only operated on NS_MAIN on arwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/404620 (https://phabricator.wikimedia.org/T148603) (owner: TerraCodes)
[19:42:16] Hmm.
[19:43:40] not there?
[19:43:45] thcipriani: It's not showing up yet.
[19:45:16] hrm, grepped for it and it should be there. I just retouched IS.php as mwdeploy, lemme know if you see anything different.
[19:46:14] thcipriani: Can you `mwscript eval.php --wiki testwiki var_dump($wgVisualEditorEnableDiffPageBetaFeature);` ?
[19:46:23] * thcipriani does
[19:48:11] James_F: https://gist.github.com/thcipriani/7c3da03b81cb23c9c732c3a8ed60c859
[19:48:21] How odd.
[19:48:31] you're looking at 1002 right?
[19:49:00] Yeah. I also checked 1001 just in case.
[19:49:16] huh
[19:50:21] thcipriani: Well, nothing seems broken. Leave it as is?
[19:50:38] just on 1002? Or send it out?
[19:51:31] Send it out.
[19:51:48] Maybe it's just HHVM caching or something odd.
[19:52:33] Aha. It's now showing up.
[19:52:37] thcipriani: Sync away.
[19:52:47] :)
[19:52:49] sure thing
[19:55:49] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: Follow-up 77be427a1: [[gerrit:411048|Enable the Beta Feature on all wikis]] T185708 (duration: 01m 12s)
[19:55:54] ^ James_F live everywhere
[19:55:59] Thanks.
[19:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:06] T185708: Make historical VisualDiffs a BetaFeature - https://phabricator.wikimedia.org/T185708
[19:56:10] yw :)
[20:00:04] twentyafterfour: #bothumor I � Unicode. All rise for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180215T2000).
[20:00:05] No GERRIT patches in the queue for this window AFAICS.
[20:06:46] PROBLEM - puppet last run on cp4031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:09:42] !log syncing a patch before deploying 1.31.0-wmf.21 to all wikis.
[20:09:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:11:04] !log twentyafterfour@tin Synchronized php-1.31.0-wmf.21/extensions/TwoColConflict/includes/TwoColConflictHooks.php: sync https://gerrit.wikimedia.org/r/#/c/410809/ (duration: 01m 13s)
[20:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:12:12] thcipriani: Thank you for your work
[20:12:31] razesoldier: sure thing! thanks for the patch!
[20:13:49] (PS1) 20after4: all wikis to 1.31.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/411072
[20:13:51] (CR) 20after4: [C: 2] all wikis to 1.31.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/411072 (owner: 20after4)
[20:17:10] (Merged) jenkins-bot: all wikis to 1.31.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/411072 (owner: 20after4)
[20:18:16] !log twentyafterfour@tin rebuilt and synchronized wikiversions files: all wikis to 1.31.0-wmf.21
[20:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:20:12] (CR) jenkins-bot: all wikis to 1.31.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/411072 (owner: 20after4)
[20:22:11] !log 1.31.0-wmf.21 deployed: no apparent change in fatalmonitor error rate. refs T183960
[20:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:25] T183960: 1.31.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T183960
[20:36:46] RECOVERY - puppet last run on cp4031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:53:46] (CR) Imarlier: [C: 1] webperf: Re-use expected result by reference to simplify fixture [puppet] - https://gerrit.wikimedia.org/r/404045 (owner: Krinkle)
[20:54:07] mutante: ^ safe to deploy whenever you have a minute :)
[20:54:22] (or godog )
[21:03:31] Operations, ops-eqiad, DC-Ops, Data-Services, cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456#3977160 (faidon)
[21:05:58] Operations, ops-eqiad, DC-Ops, Data-Services, cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456#3977161 (faidon) My apologies, this is all confusing! I corrected the task description to reflect that labstore100[12] have been r...
[21:11:50] (PS5) Bstorm: servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355)
[21:11:57] (CR) Dzahn: [C: 2] webperf: Re-use expected result by reference to simplify fixture [puppet] - https://gerrit.wikimedia.org/r/404045 (owner: Krinkle)
[21:12:03] (PS5) Dzahn: webperf: Re-use expected result by reference to simplify fixture [puppet] - https://gerrit.wikimedia.org/r/404045 (owner: Krinkle)
[21:12:04] Krinkle: doing now
[21:12:30] (CR) jerkins-bot: [V: -1] servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:14:17] mutante: Thanks
[21:14:28] mutante: Dependant commit https://gerrit.wikimedia.org/r/#/c/404046/ as well (also test-only)
[21:15:56] (CR) Dzahn: [C: 2] webperf: Introduce 'templates' in test fixture and use for mwload [puppet] - https://gerrit.wikimedia.org/r/404046 (owner: Krinkle)
[21:15:58] (PS3) Dzahn: webperf: Introduce 'templates' in test fixture and use for mwload [puppet] - https://gerrit.wikimedia.org/r/404046 (owner: Krinkle)
[21:17:59] Krinkle: it's on the master now. yw
[21:18:35] (CR) Bstorm: "Weird. The tests failed for something that I don't think I touched here:" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:19:09] (PS6) Bstorm: servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355)
[21:19:56] (CR) jerkins-bot: [V: -1] servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:20:10] (CR) Paladox: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:21:24] (CR) Paladox: servers: install hp-health on all HP servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:21:55] (CR) Paladox: "(found the error, it's" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:29:32] (PS7) Bstorm: servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355)
[21:29:56] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3977197 (Dzahn) >>! In T182832#3974829, @elukey wrote...
[21:29:58] (PS8) Bstorm: servers: install hp-health on all HP servers [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355)
[21:30:29] (CR) Bstorm: "Awesome, thanks paladox!" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:30:47] (CR) Paladox: "Your welcome :)" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:44:12] (CR) Bstorm: "Since this is totally different now, I'd love it if folks could give it another check and +1" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[21:52:58] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3977256 (Vgutierrez) >>! In T187035#3976273, @Dzahn wrote: > @Vgutierrez re: Icinga command permissions. should be all done. the ultimate test is if you tr...
[22:06:00] (CR) Rush: [C: 1] "neat" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[22:06:11] (CR) Rush: [C: 1] "@mortiz, seem good?" [puppet] - https://gerrit.wikimedia.org/r/410599 (https://phabricator.wikimedia.org/T187355) (owner: Bstorm)
[22:26:39] Getting this error message whilst trying to preview a MassMessage at meta: "Request from [IP] via cp1065 frontend, Varnish XID 505188761 --- Upstream caches: cp1065 int --- Error: 500, Internal Server Error" keegan is filing a bug now, I'm just noting it here in case it indicates a deeper problem elsewhere.
[22:27:03] Task is https://phabricator.wikimedia.org/T187510
[22:29:06] (PS1) Chad: scap clean: Minor pylint nits [mediawiki-config] - https://gerrit.wikimedia.org/r/411160
[22:32:31] Operations, Analytics, hardware-requests: Refresh or replace oxygen - https://phabricator.wikimedia.org/T181264#3977328 (faidon) a:faidon>RobH OK, let's do this, approved. It's spinning rust which is unfortunate, but with 64GB of RAM we could probably fit most of the dataset in the page cache...
[22:32:49] no_justification: got access to the logs to see which is the error on https://phabricator.wikimedia.org/T187510 ?
[22:42:15] Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Valentín Gutiérrez - https://phabricator.wikimedia.org/T187035#3977347 (ayounsi)
[22:42:18] l.ego found it already. (and my apologies for bringing a non-operations issue up here. :) Let's take to -tech for anything further)
[22:43:00] PHP Fatal Error: Class undefined: MediaWiki\MassMessage\MediaWikiServices
[22:43:51] no_justification: https://gerrit.wikimedia.org/r/411165
[22:44:13] +2
[22:44:16] thanks
[22:44:38] could you backport/deploy it too? I'm about to get on a train, otherwise I can do it in 30-45ish min
[22:44:41] <3
[22:44:59] I was gonna go get my haircut (I've been meaning to do it for the last 3 days)
[22:45:26] Keegan / quiddity: is it ok if I deploy the patch in an hour?
[22:45:30] or do you need it now?
[22:45:31] (fwiw I can wait an hour)
[22:45:36] ok
[22:45:37] Heh :P
[22:45:43] Safe travels, ty
[22:45:58] (PS1) Ayounsi: Rancid: add asw2-a/b/c-eqiad [puppet] - https://gerrit.wikimedia.org/r/411166
[22:47:10] (CR) Ayounsi: [C: 2] Rancid: add asw2-a/b/c-eqiad [puppet] - https://gerrit.wikimedia.org/r/411166 (owner: Ayounsi)
[22:47:29] I'll go ahead and do the cherry-pick, just won't +2
[22:48:11] Ok, got a patch up for wmf.21
[22:48:47] Eh whatever, I'll just deploy now. I got time
[22:54:20] scap deploy :)
[22:54:46] !log demon@tin Synchronized php-1.31.0-wmf.21/extensions/MassMessage/includes/MassMessage.php: fix use statement, T187510 (duration: 00m 57s)
[22:54:55] * Hauskatze tests
[22:54:56] Keegan, legoktm: ^^^ <3
[22:55:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:01] T187510: MassMessage preview is returning an Internal 500 error - https://phabricator.wikimedia.org/T187510
[22:55:12] LGTM! <3 all
[22:55:45] yup, my page is working. Thanks!
[22:56:02] yep, preview is working
[22:56:07] now testing sending
[22:56:24] Hauskatze, I did. Confirmed working
[22:57:01] okay, I guess there's a bit of jobqueue lag for me :)
[22:57:17] https://es.wikibooks.org/w/index.php?title=Usuario_discusi%C3%B3n:MarcoAurelio&diff=347953&oldid=341967 <-- yay
[22:57:21] <3 legoktm and no_justification
[22:59:11] This really was a Unbreak Now, they really unbroke now the thing :D
[23:02:57] (PS8) Paladox: Phabricator: Support php 7.1 under stretch [puppet] - https://gerrit.wikimedia.org/r/410245 (https://phabricator.wikimedia.org/T182832)
[23:03:17] /14/8
[23:13:50] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3977421 (Paladox) @Muehlenhoff hi, what about backpor...
[23:35:37] (CR) Chad: [C: 2] scap clean: Minor pylint nits [mediawiki-config] - https://gerrit.wikimedia.org/r/411160 (owner: Chad)
[23:38:23] (Merged) jenkins-bot: scap clean: Minor pylint nits [mediawiki-config] - https://gerrit.wikimedia.org/r/411160 (owner: Chad)
[23:38:33] (CR) jenkins-bot: scap clean: Minor pylint nits [mediawiki-config] - https://gerrit.wikimedia.org/r/411160 (owner: Chad)
[23:43:52] !log demon@tin Synchronized scap/plugins/clean.py: no-op (duration: 00m 56s)
[23:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log