[00:11:58] RECOVERY - puppet last run on pc2004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[00:28:52] Request from 90.180.83.194 via cp3007 cp3007, Varnish XID 402671165
[00:28:55] Error: 503, Backend fetch failed at Sun, 03 Jul 2016 00:28:36 GMT
[00:36:14] Krenair: no, it didn't work on my home Wi-Fi but it did work on my mobile connection
[00:36:33] I have 4G so I don't think I can test any other mobile connection
[00:36:44] I suppose my ISP just fucked something up
[00:37:27] twkozlowski, okay, can you run a traceroute on your home WiFi?
[01:26:48] PROBLEM - MegaRAID on labstore2001 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[01:27:17] PROBLEM - MD RAID on labstore2001 is CRITICAL: CRITICAL: Active: 11, Working: 11, Failed: 1, Spare: 0
[02:06:47] PROBLEM - Last backup of the tools filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-tools was exit-code
[02:09:39] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures
[02:21:35] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.8) (duration: 09m 13s)
[02:21:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:27:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Jul 3 02:27:13 UTC 2016 (duration 5m 38s)
[02:27:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:30:39] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[02:33:49] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[02:35:09] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:38:48] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[02:40:49] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:47:03] Operations, GlobalRename, MediaWiki-extensions-CentralAuth: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2385660 (Pokefan95) Isn't this a duplicate of T135656?
[02:50:50] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:52:08] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:06:48] PROBLEM - Last backup of the others filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-others was exit-code
[04:05:05] PROBLEM - Last backup of the maps filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-maps was exit-code
[05:56:44] legoktm: regarding T137973, any idea if this will ever be fixed? The backlog is growing and growing.
[05:56:45] T137973: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973
[06:18:54] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:21:13] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[06:31:34] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:04] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:55] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:46] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:55] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 2 failures