breaking editing in VE and rendering elsewhere - https://phabricator.wikimedia.org/T165139
[20:49:30] arlolra: tell me when you're done. Thanks
[20:49:41] Amir1: all done
[20:49:48] thanks
[20:50:45] !log ladsgroup@tin Started deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging
[20:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:55] (CR) Dzahn: "pretty sure you can't do "require => ''", that will break and not mean "nothing required". i asked" [puppet] - https://gerrit.wikimedia.org/r/353964 (https://phabricator.wikimedia.org/T165462) (owner: Paladox)
[20:51:38] (CR) Dzahn: ""undef" could work." [puppet] - https://gerrit.wikimedia.org/r/353964 (https://phabricator.wikimedia.org/T165462) (owner: Paladox)
[20:52:11] (PS8) Paladox: HHVM: Fix puppet on trusty [puppet] - https://gerrit.wikimedia.org/r/353964 (https://phabricator.wikimedia.org/T165462)
[20:52:16] (PS9) Paladox: HHVM: Fix puppet on trusty [puppet] - https://gerrit.wikimedia.org/r/353964 (https://phabricator.wikimedia.org/T165462)
[20:55:52] canary died, rolling back
[20:56:09] !log ladsgroup@tin Finished deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging (duration: 05m 23s)
[20:56:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:38] (CR) Dereckson: "Initially scheduled this Monday 18:00 UTC, but not deployed. Please reschedule it in another SWAT window." [mediawiki-config] - https://gerrit.wikimedia.org/r/353173 (owner: Aaron Schulz)
[20:57:41] (CR) Dereckson: "Initially scheduled this Monday 18:00 UTC, but not deployed. Please reschedule it in another SWAT window." [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[21:00:04] dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170522T2100). Please do the needful.
[21:03:36] RECOVERY - Disk space on elastic1023 is OK: DISK OK
[21:08:16] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:08:16] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:08:17] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:08:17] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:08:17] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:08:27] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:10:01] !log BBR: cp1074: reverted back to cubic+pfifo_fast - T147569
[21:10:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:10] T147569: Evaluate/Deploy TCP BBR when available (kernel 4.9+) - https://phabricator.wikimedia.org/T147569
[21:10:56] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:11:01] !log BBR: cp1065: reverted back to cubic+pfifo_fast - T147569
[21:11:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:11:16] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[21:12:06] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[21:12:06] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[21:12:16] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[21:12:16] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[21:12:16] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[21:12:17] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[21:12:26] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[21:14:06] is citoid behavior related to the recent deployment of ores?
[21:19:17] Amir1: it's been doing that all weekend :/
[21:35:07] PROBLEM - Nginx local proxy to apache on mw1218 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.151 second response time
[21:35:07] PROBLEM - HHVM rendering on mw1218 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.074 second response time
[21:36:06] RECOVERY - Nginx local proxy to apache on mw1218 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.186 second response time
[21:36:06] RECOVERY - HHVM rendering on mw1218 is OK: HTTP OK: HTTP/1.1 200 OK - 79458 bytes in 0.316 second response time
[21:39:56] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[22:19:16] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:16] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:26] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:26] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:26] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:26] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:19:36] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[22:21:26] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy
[22:21:49] (CR) Nemo bis: [C: 1] "Per my comment on the task" [mediawiki-config] - https://gerrit.wikimedia.org/r/354549 (https://phabricator.wikimedia.org/T121995) (owner: Dereckson)
[22:22:06] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[22:22:16] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[22:22:16] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy
[22:22:17] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy
[22:22:17] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[22:22:17] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy
[23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170522T2300). Please do the needful.
[23:00:05] Dereckson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:06:47] * AaronSchulz goes
[23:06:57] (PS7) Aaron Schulz: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808
[23:07:04] (CR) Aaron Schulz: [C: 2] Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[23:11:26] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[23:11:27] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[23:11:36] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[23:11:36] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[23:11:36] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received
[23:13:22] (Merged) jenkins-bot: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[23:13:37] (CR) jenkins-bot: Include DB shard in production SPI log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/331808 (owner: Aaron Schulz)
[23:14:26] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[23:14:26] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[23:14:26] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[23:14:27] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[23:14:27] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[23:15:33] !log aaron@tin Synchronized wmf-config/logging.php: Include DB shard in production SPI log entries (duration: 00m 38s)
[23:15:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:02] (CR) Aaron Schulz: [C: 2] Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/353173 (owner: Aaron Schulz)
[23:18:35] (Merged) jenkins-bot: Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/353173 (owner: Aaron Schulz)
[23:18:44] (CR) jenkins-bot: Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/353173 (owner: Aaron Schulz)
[23:19:58] !log aaron@tin Synchronized wmf-config/ProductionServices.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
[23:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:52] !log aaron@tin Synchronized wmf-config/filebackend.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
[23:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:26] (CR) Paladox: "This breaks beta scap https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17268/console" [mediawiki-config] - https://gerrit.wikimedia.org/r/353173 (owner: Aaron Schulz)
[23:27:25] (PS1) Chad: Revert "Move swift auth URL to ProductionServices" [mediawiki-config] - https://gerrit.wikimedia.org/r/355170
[23:27:32] (CR) Chad: [C: 2] Revert "Move swift auth URL to ProductionServices" [mediawiki-config] - https://gerrit.wikimedia.org/r/355170 (owner: Chad)
[23:30:30] (Merged) jenkins-bot: Revert "Move swift auth URL to ProductionServices" [mediawiki-config] - https://gerrit.wikimedia.org/r/355170 (owner: Chad)
[23:30:39] (CR) jenkins-bot: Revert "Move swift auth URL to ProductionServices" [mediawiki-config] - https://gerrit.wikimedia.org/r/355170 (owner: Chad)
[23:33:43] !log demon@tin Synchronized wmf-config/filebackend.php: I4b19b4a8f4f1ff7ad65fc02c0b89da651a883524 (duration: 00m 38s)
[23:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:30] !log demon@tin Synchronized wmf-config/ProductionServices.php: I4b19b4a8f4f1ff7ad65fc02c0b89da651a883524 (duration: 00m 38s)
[23:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:50] (PS1) Aaron Schulz: Set mediaSwift* keys in LabsServices.php [mediawiki-config] - https://gerrit.wikimedia.org/r/355172
[23:35:11] AaronSchulz: I reverted you too
[23:35:20] Didn't know if you were around.
[23:39:45] (PS2) Aaron Schulz: Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/355172
[23:40:33] RainbowSprinkles: t'was patching. I'll squashed into https://gerrit.wikimedia.org/r/#/c/355172/ now.
[23:40:39] K
[23:40:43] *all squashed, heh
[23:41:13] (PS3) Dzahn: contint: role/profile conversion [puppet] - https://gerrit.wikimedia.org/r/355156
[23:41:28] In an ideal world, we'd standardize names and could just foreach the DCs ;-)
[23:41:39] But beta makes up funny names
[23:42:32] (CR) Chad: [C: 1] Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/355172 (owner: Aaron Schulz)
[23:43:07] RainbowSprinkles: is https://gerrit.wikimedia.org/r/#/c/354586/ going on?
[23:43:25] otherwise, I'll do another go since prod looked fine
[23:43:32] I wasn't doing swat
[23:43:52] Dereckson didn't say anything, and can self-deploy if so desired
[23:44:12] I assume it's a regular sync-file deal, I could just do that I suppose
[23:44:51] Yeah, sync-file. I'd do the Services files first, then filebackend
[23:44:54] (CR) Aaron Schulz: [C: 2] Fix hy.wikipedia high resolution logos [mediawiki-config] - https://gerrit.wikimedia.org/r/354586 (https://phabricator.wikimedia.org/T165811) (owner: Dereckson)
[23:44:56] (reverse of what I did)
[23:45:04] Ah, for yours I meant
[23:46:19] (Merged) jenkins-bot: Fix hy.wikipedia high resolution logos [mediawiki-config] - https://gerrit.wikimedia.org/r/354586 (https://phabricator.wikimedia.org/T165811) (owner: Dereckson)
[23:46:29] (CR) jenkins-bot: Fix hy.wikipedia high resolution logos [mediawiki-config] - https://gerrit.wikimedia.org/r/354586 (https://phabricator.wikimedia.org/T165811) (owner: Dereckson)
[23:47:30] (CR) Aaron Schulz: [C: 2] Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/355172 (owner: Aaron Schulz)
[23:48:32] (Merged) jenkins-bot: Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/355172 (owner: Aaron Schulz)
[23:48:42] (CR) jenkins-bot: Move swift auth URL to ProductionServices [mediawiki-config] - https://gerrit.wikimedia.org/r/355172 (owner: Aaron Schulz)
[23:48:54] !log aaron@tin Synchronized static/images/project-logos/hywiki-1.5x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
[23:49:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:43] !log aaron@tin Synchronized static/images/project-logos/hywiki-2x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
[23:49:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:49:56] (CR) Chad: [C: 1] Add techconduct.wikimedia.org for new private wiki [dns] - https://gerrit.wikimedia.org/r/354954 (https://phabricator.wikimedia.org/T165977) (owner: Dereckson)
[23:50:21] (CR) Chad: [C: 1] Set initial configuration for techconduct.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/354985 (https://phabricator.wikimedia.org/T165977) (owner: Dereckson)
[23:51:22] !log aaron@tin Synchronized wmf-config: Move swift auth URL to ProductionServices (duration: 00m 52s)
[23:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:43] (CR) Dzahn: [C: 2] Add techconduct.wikimedia.org for new private wiki [dns] - https://gerrit.wikimedia.org/r/354954 (https://phabricator.wikimedia.org/T165977) (owner: Dereckson)
[23:52:24] AaronSchulz: Heh, you could possibly race on apaches where filebackend lands before *Services.php
[23:52:32] (hence why I suggested ordered sync-file)
[23:52:55] But self-fixes...soon
[23:53:11] in theory, the first time I did PS, then fb.
[23:53:33] It would be three syncs now though.
[23:53:48] Could do sync-file then sync-dir
[23:53:50] But yeah
[23:53:51] afaik we still do the /tmp rename step, so it's a dot of a window
[23:53:53] It's all kind of ugly
[23:54:03] (on the resync level)
[23:56:22] (PS3) Dzahn: graphite: move 'standard' and 'base::firewall' to role [puppet] - https://gerrit.wikimedia.org/r/353364