[00:01:24] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [00:01:58] Operations, Mail, Wikimedia-General-or-Unknown, Patch-For-Review, Security: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#2824871 (Huji) [[https://gerrit.wikimedia.org/r/#/c/322243/ | r322243 ]] will possibly fix this for Wikimed... [00:02:20] Operations, Mail, MediaWiki-Email, Wikimedia-General-or-Unknown, and 2 others: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#2824872 (Huji) [00:03:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:04:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:05:37] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:06:24] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:06:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [00:06:28] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.353 second response time [00:07:17] Operations, Mail, MediaWiki-Email, Wikimedia-General-or-Unknown, and 2 others: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#2824888 (Huji) Another thought: If an email is tried to be sent a few times and it bounces (as appears to b...
[00:08:45] meh thumbor, not at home now i will take a look later [00:09:00] (PS4) Huji: Set $wgUserEmailUseReplyTo = true; [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [00:09:15] (CR) Huji: "How about this way? I am doing this for the same reasons as we did it in https://gerrit.wikimedia.org/r/#/c/316291/" [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [00:14:25] Operations, Mail, MediaWiki-Email, Wikimedia-General-or-Unknown, and 2 others: Email server's DMARC config prevents users from sending emails via Special:EmailUser - https://phabricator.wikimedia.org/T66795#685159 (Huji) [00:19:29] (PS5) Legoktm: Set $wgUserEmailUseReplyTo = true; everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) [00:19:31] (PS1) Legoktm: Set $wgUserEmailUseReplyTo = true; on group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/323672 (https://phabricator.wikimedia.org/T66795) [00:21:07] (CR) Legoktm: "> How about this way? I am doing this for the same reasons as we did it in https://gerrit.wikimedia.org/r/#/c/316291/" [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [00:32:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:59] (CR) Gergő Tisza: "Uh, sorry, I must have gotten confused."
[puppet] - https://gerrit.wikimedia.org/r/323351 (https://phabricator.wikimedia.org/T136849) (owner: BryanDavis) [00:36:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:36:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [00:39:37] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:40:27] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.010 second response time [00:41:04] Operations, Labs, Tool-Labs, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2824992 (scfc) [00:42:18] heh i guess it is the same issue as the other day, should be able to look in 2h or so [00:59:56] Puppet, Labs, Tool-Labs: Puppetize adding a host to a particular queue - https://phabricator.wikimedia.org/T88713#2825013 (scfc) [01:03:28] (PS6) Huji: Set $wgUserEmailUseReplyTo = true; [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [01:04:03] (CR) Huji: "PS reverts it to what you had originally submitted." [mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [01:04:43] Operations, Labs, Tool-Labs, Traffic, HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#2825020 (scfc) [01:19:14] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [01:40:04] (CR) BryanDavis: "> Don't you need a drop {} for exception-json to avoid double-logging" [puppet] - https://gerrit.wikimedia.org/r/323351 (https://phabricator.wikimedia.org/T136849) (owner: BryanDavis) [01:47:14] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [02:00:05] !log l10nupdate@tin LocalisationUpdate failed: git pull of core failed [02:00:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:01:10] Um, I'm not seeing any numbers where they should be on https://en.wikipedia.org/wiki/House_of_Savoy - # does not generate numbers, and they don't show up next to references in the reflists [02:01:36] and the bullets in the "see also" are not showing up... [02:02:46] is there something wrong with some kind of parser, or with the wikicode... [02:06:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:06:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.825 second response time [02:07:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.749 second response time [02:09:24] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:09:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:10:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:10:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 2.168 second response time [02:10:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:10:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:11:04] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.848 second response time [02:11:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.913 second response time [02:11:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [02:12:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.392 second response time [02:12:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 2.349 second response time [02:12:34] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.337 second response time [02:12:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:12:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:13:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:13:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.591 second response time [02:14:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 7.691 second response time [02:14:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [02:14:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [02:14:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:15:15] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69341 bytes in 0.187 second response time [02:15:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 9.489 second response time [02:15:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [02:15:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.081 second response time [02:15:54] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.188 second response time [02:16:04] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:16:14] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.068 second response time [02:16:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:16:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:16:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:17:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:14] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.939 second response time [02:17:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.108 second response time [02:17:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69340 bytes in 0.098 second response time [02:18:04] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 4.215 second response time [02:18:04] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [02:18:05] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:15] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.987 second response time [02:18:15] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 8.108 second response time [02:18:15] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:15] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69341 bytes in 0.174 second response time [02:18:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:57] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.026 second response time [02:18:57] PROBLEM - HHVM rendering on mw1197 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:18:57] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [02:18:57] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:18:57] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:18:57] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time [02:19:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.168 second response time [02:19:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:19:37] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.043 second response time [02:19:37] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:19:37] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:19:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:19:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:20:04] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.056 second response time [02:20:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 1.947 second response time [02:20:04] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:04] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [02:20:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:20:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [02:21:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.057 second response time [02:21:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:24] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 6.825 second response time [02:21:24] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.707 second response time [02:21:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:24] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:34] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 7.118 second response time [02:21:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:44] PROBLEM 
- restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:21:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:21:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:21:52] Reedy ^^ [02:22:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 1.088 second response time [02:22:04] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 6.542 second response time [02:22:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.717 second response time [02:22:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.126 second response time [02:22:05] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:15] Anything actually down? 
[02:22:21] Not sure [02:22:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.830 second response time [02:22:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 1.140 second response time [02:22:46] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:22:47] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [02:23:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:04] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.698 second response time [02:23:05] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.493 second response time [02:23:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:05] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:14] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 6.301 second response time [02:23:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.216 second response time [02:23:14] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 8.161 second response time 
[02:23:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:44] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.673 second response time [02:23:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:24:05] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:05] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:13] Reedy could this be api? [02:24:14] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 9.006 second response time [02:24:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:34] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.000 second response time [02:24:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:35] Since hhvm going down on mw1* seems like it may affect api. [02:24:51] depends if they're api appservers [02:24:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:24:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.113 second response time [02:24:54] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.173 second response time [02:25:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 8.471 second response time [02:25:04] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.722 second response time [02:25:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.742 second response time [02:25:14] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 1.920 second response time [02:25:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 3.791 second response time [02:25:34] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.105 second response time [02:25:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:25:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:45] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:25:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.452 second response time [02:26:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.488 second response time [02:26:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.007 second response time [02:26:04] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 6.120 second response time [02:26:05] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:05] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:10] Reedy adding things to the watchlist seems to be slow [02:26:13] on en wiki [02:26:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] PROBLEM - Apache HTTP on 
mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 2.480 second response time [02:26:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1196.eqiad.wmnet because of too many down! [02:26:24] but removing pages from the watchlist is fast. [02:26:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.360 second response time [02:26:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:24] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:45] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:26:45] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:46] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:50] LOL [02:26:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:26:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:27:04] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 3.063 second response time [02:27:05] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:14] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 6.366 second response time [02:27:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:15] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 9.231 second response time [02:27:24] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.108 second response time [02:27:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL 
CRITICAL - api_80 - Could not depool server mw1203.eqiad.wmnet because of too many down! [02:27:37] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:37] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:37] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:37] Reedy watchlist is broken [02:27:39] again [02:27:43] ori ^^ [02:27:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:27:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:27:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:27:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:28:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:14] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.568 second response time [02:28:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:34] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 7.586 second response time [02:28:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [02:28:54] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [02:28:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:29:04] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.600 second response time [02:29:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.035 second response time [02:29:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:04] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:05] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:22] yeah api is in trouble, I'm taking a look [02:29:27] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18293 bytes in 0.226 second response time [02:29:27] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:28] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:29:28] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69343 bytes in 8.047 second response time [02:30:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:30:14] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.429 second response time [02:30:24] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 9.756 second response time [02:30:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [02:30:54] RECOVERY - 
restbase endpoints health on restbase1009 is OK: All endpoints are healthy [02:31:04] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.131 second response time [02:31:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.540 second response time [02:31:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [02:31:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:31:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:31:54] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [02:32:04] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] [02:32:15] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.153 second response time [02:32:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.912 second response time [02:32:24] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:32:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [02:32:52] Uh oh [02:32:54] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:33:05] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 9.627 second response time [02:33:14] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.070 second response time [02:33:14] 
PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [02:33:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [02:33:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:33:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [02:34:18] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:19] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:34:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:34:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [02:34:44] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [02:34:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:35:09] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:35:09] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.900 second response time [02:35:14] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69348 bytes in 8.862 second response time [02:35:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:35:34] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [02:35:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [02:35:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [02:35:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [02:35:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [02:35:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:36:23] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [02:36:23] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] [02:37:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.493 second response time [02:37:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69348 bytes in 5.065 second response time [02:37:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [02:37:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:37:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [02:38:04] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 807308 msg (>=800000 warning): ocg_render_job_queue 3130 msg (>=3000 critical) [02:38:14] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 807359 msg (>=800000 warning): ocg_render_job_queue 3169 msg (>=3000 critical) [02:38:24] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 807403 msg (>=800000 warning): ocg_render_job_queue 3211 msg (>=3000 critical) [02:38:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [02:38:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [02:38:45] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:45] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:38:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:39:14] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69347 bytes in 0.242 second response time [02:39:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69347 bytes in 6.003 second response time [02:39:34] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [02:39:44] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.762 second response time [02:39:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [02:39:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [02:39:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [02:39:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [02:39:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [02:39:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [02:39:45] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [02:39:45] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [02:39:46] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [02:39:54] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69347 bytes in 0.209 second response time [02:40:04] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.020 second response time [02:40:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69347 bytes in 0.184 second response time [02:40:34] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [02:40:34] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [02:40:44] RECOVERY - restbase endpoints health on restbase2007 is OK: 
All endpoints are healthy [02:40:44] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [02:41:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:41:05] !log dumping hhvm backtraces and roll-restart on affected api machines [02:41:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:41:24] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:41:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [02:41:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:41:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:42:04] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.073 second response time [02:42:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:14] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.411 second response time [02:42:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [02:42:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:44] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [02:42:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:42:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:42:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:42:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.073 second response time [02:42:54] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:43:04] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.663 second response time [02:43:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:14] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.998 second response time [02:43:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.176 second response time [02:43:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:24] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: 
HTTP/1.1 200 OK - 69339 bytes in 9.705 second response time [02:43:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:43:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [02:43:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [02:43:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:43:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:43:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:44:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:44:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [02:44:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:44:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:44:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [02:45:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 3.891 second response time [02:45:14] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 5.708 second response time [02:45:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:45:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:45:14] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:45:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:45:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:45:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:45:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:45:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:45:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:45:54] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.230 second response time [02:46:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:05] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [02:46:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:24] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) is CRITICAL: Test retrieve images and videos of en.wp Cat page via media route returned the unexpected status 503 (expecting: 200) [02:46:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:34] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [02:46:34] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [02:46:34] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [02:46:34] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [02:46:34] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [02:46:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [02:46:34] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [02:46:35] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [02:46:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [02:46:44] RECOVERY - restbase 
endpoints health on restbase2005 is OK: All endpoints are healthy [02:46:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [02:46:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [02:46:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [02:46:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [02:46:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [02:46:45] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [02:46:45] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [02:46:46] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [02:46:46] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [02:46:47] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [02:46:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [02:47:04] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.356 second response time [02:47:04] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.212 second response time [02:47:14] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time [02:47:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.079 second response time [02:47:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 8.226 second response time [02:47:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.197 second response time [02:47:34] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.519 second 
response time [02:47:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [02:48:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time [02:48:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.284 second response time [02:48:04] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:48:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:48:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [02:49:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.077 second response time [02:49:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 3.158 second response time [02:49:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:49:24] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:49:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:49:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:50:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.016 second response time [02:50:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.073 second response time [02:50:05] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:50:14] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.688 second response time [02:50:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 1.727 second response time [02:50:24] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:50:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:50:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 4.624 second response time [02:50:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:50:34] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [02:50:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [02:50:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:50:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:50:54] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.109 second response time [02:50:54] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.107 second response time [02:51:04] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.083 second response time [02:51:04] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.240 second response time [02:51:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 2.624 second response time [02:51:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:14] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.689 second response time [02:51:14] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.513 second response time [02:51:14] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.586 second response time [02:51:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.355 second response time [02:51:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:51:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:45] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:45] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:46] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:46] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:51:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:51:55] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.178 second response time [02:52:05] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 6.853 second response time [02:52:05] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.815 second response time [02:52:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:52:24] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [02:52:24] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:52:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:52:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:52:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:52:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:52:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [02:52:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:52:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:53:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:34] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [02:53:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [02:53:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:53:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:53:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:53:44] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [02:53:54] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [02:53:54] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.140 second response time [02:54:14] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 9.576 second response time [02:54:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:55:04] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.784 second response time [02:55:04] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.173 second response time [02:55:14] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.100 second response time [02:55:14] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.061 second response time [02:55:34] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [02:55:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [02:55:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [02:55:45] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [02:55:45] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [02:55:46] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [02:55:46] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [02:55:47] RECOVERY - restbase 
endpoints health on restbase2002 is OK: All endpoints are healthy [02:55:47] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [02:55:48] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [02:55:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [02:56:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [02:56:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.672 second response time [02:56:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [02:56:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [02:56:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [02:56:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [02:56:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [02:56:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [02:57:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69345 bytes in 0.174 second response time [02:57:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.088 second response time [02:57:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.033 second response time [02:57:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:58:04] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.503 second response time [02:58:04] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 
200 OK - 69338 bytes in 6.490 second response time [02:58:04] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:58:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:58:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:58:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 7.293 second response time [02:58:34] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.564 second response time [02:58:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:45] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:45] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:58:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [02:58:54] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.154 second response time [02:59:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.156 second response time [02:59:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 5.298 second response time [02:59:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:59:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:59:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:45] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:59:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:59:54] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.222 second response time [03:00:04] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.056 second response time [03:00:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:04] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [03:00:05] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:05] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:24] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.162 second response time [03:00:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:25] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:37] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:37] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [03:00:44] PROBLEM - restbase endpoints health on restbase1011 is 
CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:00:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:00:54] PROBLEM - Apache HTTP on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.003 second response time [03:00:54] PROBLEM - HHVM rendering on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:00:54] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.096 second response time [03:01:04] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:04] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:14] RECOVERY - HHVM rendering on mw1189 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.799 second response time [03:01:24] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.660 second response time [03:01:24] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 5.396 second response time [03:01:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:01:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [03:01:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:01:54] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.020 second response time [03:01:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.062 second response time [03:01:54] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.128 second response time [03:02:04] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.222 second response time [03:02:04] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.571 second response time [03:02:04] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 2.858 second response time [03:02:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.361 second response time [03:02:34] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.449 second response time [03:02:34] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.652 second response time [03:02:36] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18294 bytes in 9.708 second response time [03:02:36] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:02:36] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:02:36] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) [03:02:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [03:02:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [03:03:04] RECOVERY - HHVM rendering on mw1276 is OK: 
HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.826 second response time [03:03:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:03:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:03:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.310 second response time [03:03:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:03:24] PROBLEM - Apache HTTP on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:03:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:03:44] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID returned the unexpected status 503 (expecting: 200) [03:03:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.031 second response time [03:04:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.369 second response time [03:04:04] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [03:04:14] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.165 second response time [03:04:24] PROBLEM - Apache HTTP on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:24] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.034 second response time [03:04:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes 
in 0.032 second response time [03:04:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:34] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [03:04:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [03:04:44] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [03:04:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [03:04:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [03:04:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [03:04:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [03:04:45] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [03:04:45] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [03:04:46] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:04:54] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [03:04:54] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69341 bytes in 0.111 second response time [03:04:54] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.175 second response time [03:05:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [03:05:04] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 7.109 second response time [03:05:14] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.688 second response time [03:05:24] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.716 second response time [03:05:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 3.431 second response time [03:05:34] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 3.065 second response time [03:05:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.201 second response time [03:05:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [03:05:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [03:05:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [03:05:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:05:44] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [03:05:44] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [03:05:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:05:54] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [03:05:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [03:05:54] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.184 second response time [03:06:04] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [03:06:04] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 3.964 second response time [03:06:04] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.795 second response time [03:06:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.568 second response time [03:06:05] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:06:05] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.181 second response time [03:06:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [03:06:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time [03:06:14] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 69340 bytes in 0.073 second response time [03:06:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.252 second response time [03:06:24] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 
bytes in 0.032 second response time [03:06:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [03:06:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [03:06:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 4.209 second response time [03:06:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [03:06:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:06:34] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [03:06:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [03:06:45] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [03:06:45] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [03:06:46] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:06:54] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [03:07:04] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [03:07:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.076 second response time [03:07:14] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.203 second response time [03:07:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.311 second response time [03:07:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [03:07:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [03:07:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [03:07:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [03:07:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [03:07:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:07:54] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.111 second response time [03:07:54] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.272 second response time [03:08:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69341 bytes in 0.141 second response time [03:08:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0] [03:08:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.649 second response time [03:08:14] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.241 second response time [03:08:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:08:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69344 bytes in 2.163 second response time [03:08:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69342 bytes in 0.213 second response time [03:08:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [03:08:54] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.056 second response time [03:09:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.039 second response time [03:09:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69348 bytes in 2.706 second response time [03:09:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [03:09:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:34] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:34] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:09:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [03:10:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.262 second response time [03:10:05] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:10:14] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.695 second response time [03:10:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.136 second response time [03:10:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.265 second response time [03:10:24] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.322 second response time [03:10:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.379 second response time [03:10:24] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.528 second response time [03:10:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.041 second response time [03:10:24] 
RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:10:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:10:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.877 second response time [03:11:04] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.147 second response time [03:11:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:04] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:14] PROBLEM - Apache HTTP on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.799 second response time [03:11:24] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:11:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:12:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.833 second response time [03:12:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:04] PROBLEM - Apache HTTP on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:05] PROBLEM - HHVM rendering on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:14] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.136 second response time [03:12:14] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.047 second response time [03:12:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.865 second response time [03:12:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:24] PROBLEM - HHVM rendering on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1200.eqiad.wmnet because of too many down! [03:12:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:12:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [03:12:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:12:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:45] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:45] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:46] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:12:54] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:13:04] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:14] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.017 second response time [03:13:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [03:13:14] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 8.894 second response time [03:13:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.926 second response time [03:13:24] PROBLEM - Apache HTTP on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1288.eqiad.wmnet because of too many down! 
[03:13:24] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:24] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.104 second response time [03:13:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [03:13:34] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.637 second response time [03:13:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:13:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:45] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:14:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.802 second response time [03:14:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:05] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:05] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:05] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:05] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 9.609 second response time [03:14:06] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.828 second response time [03:14:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.237 second response time [03:14:15] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:15] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [03:14:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:24] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:14:34] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 5.449 second response time [03:14:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:05] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:05] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:14] PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:15:24] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 5.852 second response time [03:15:24] PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) [03:15:24] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 9.781 second response time [03:15:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:15:54] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [03:15:54] RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.735 second response time [03:16:04] RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.810 second response time [03:16:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.946 second response time [03:16:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.846 second response time [03:16:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.209 second response time [03:16:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.152 second response time [03:16:14] RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy [03:16:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:16:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 8.025 second response time [03:17:05] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:14] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.184 second response time [03:17:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:34] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.231 second response time [03:17:34] 
PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [03:17:44] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) [03:18:04] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.772 second response time [03:18:04] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:18:24] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.589 second response time [03:18:34] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 9.688 second response time [03:18:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [03:19:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.225 second response time [03:19:14] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.221 second response time [03:19:14] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 7.954 second response time [03:19:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.742 second response time [03:19:14] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.252 second response time [03:19:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:19:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.755 second response time [03:19:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:19:34] 
RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.259 second response time [03:19:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [03:19:44] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [03:19:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [03:19:54] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [03:20:04] RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1729 bytes in 0.144 second response time [03:20:06] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:20:14] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.529 second response time [03:20:14] RECOVERY - HHVM rendering on mw1189 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.158 second response time [03:20:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [03:20:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [03:20:45] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [03:20:46] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [03:20:54] RECOVERY - restbase endpoints health on praseodymium is 
OK: All endpoints are healthy [03:20:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [03:20:54] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [03:21:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:21:14] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.976 second response time [03:21:15] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.956 second response time [03:21:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:21:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [03:21:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:21:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [03:21:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [03:21:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [03:21:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:21:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [03:22:04] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 2.072 second response time [03:22:04] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:22:04] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.660 second response time [03:22:06] !log roll-restart hhvm across api_appserver [03:22:14] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.790 second response time [03:22:14] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 755.38 seconds [03:22:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.087 second response time [03:22:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.109 second response time [03:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:22:24] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 9.820 second response time [03:22:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:23:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:23:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.275 second response time [03:23:24] PROBLEM - Apache HTTP on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [03:23:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [03:23:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:45] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:23:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.049 second response time [03:24:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.368 second response time [03:24:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.481 second response time [03:24:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.115 second response time [03:24:04] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.263 second response time [03:24:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:04] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.116 second response time [03:24:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:14] PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) [03:24:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:25] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1231.eqiad.wmnet because of too many down! 
[03:24:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:34] PROBLEM - HHVM rendering on mw1207 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:24:34] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:24:44] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [03:24:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:24:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [03:24:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:24:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.045 second response time [03:24:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.123 second response time [03:25:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:04] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:04] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:05] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 0.151 second response time [03:25:14] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:25:15] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:15] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.975 second response time [03:25:16] RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy [03:25:24] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.965 second response time [03:25:24] PROBLEM - HHVM rendering 
on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [03:25:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:24] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:25] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.826 second response time [03:25:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:37] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:37] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:37] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.123 second response time [03:25:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [03:25:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [03:26:04] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:04] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.535 second response time [03:26:04] PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) [03:26:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL 
- Socket timeout after 10 seconds [03:26:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0] [03:26:14] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.494 second response time [03:26:15] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.774 second response time [03:26:26] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18293 bytes in 0.173 second response time [03:26:26] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:27] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:27] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:34] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:45] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [03:26:45] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:26:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [03:27:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time [03:27:04] PROBLEM - Apache HTTP on mw1224 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:27:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:04] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:04] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:04] RECOVERY - graphoid endpoints health on scb1001 is OK: All endpoints are healthy [03:27:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:14] PROBLEM - HHVM rendering on mw1224 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:27:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.642 second response time [03:27:34] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:27:54] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.204 second response time [03:28:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time [03:28:04] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.192 second response time [03:28:05] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [03:28:05] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:05] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 7.926 second response time [03:28:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [1000.0] [03:28:14] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.143 second response time [03:28:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:34] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.501 second response time [03:28:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:34] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.081 second response time [03:28:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [03:28:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) [03:28:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: 
Socket timeout after 10 seconds. [03:29:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:29:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.230 second response time [03:29:34] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.620 second response time [03:29:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [03:29:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:29:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:30:04] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.021 second response time [03:30:04] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 3.666 second response time [03:30:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [03:30:04] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.676 second response time [03:30:05] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.326 second response time [03:30:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.023 second response time [03:30:14] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.245 second response time [03:30:14] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.124 second response time [03:30:24] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.446 second response time [03:30:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [03:30:44] RECOVERY - restbase 
endpoints health on restbase2002 is OK: All endpoints are healthy [03:30:54] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [03:31:04] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/media/image/featured/{yyyy}/{mm}/{dd} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 500 (expecting: 200) [03:31:24] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy [03:31:24] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.291 second response time [03:31:24] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.227 second response time [03:31:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.550 second response time [03:31:34] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [03:31:34] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [03:31:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [03:31:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [03:31:44] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [03:31:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:31:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [03:31:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:32:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.429 second response time [03:32:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.754 second response time [03:32:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 3.999 second response time [03:32:24] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.922 second response time [03:32:34] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:32:34] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [03:32:34] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [03:32:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [03:32:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [03:32:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [03:32:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [03:32:44] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [03:32:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:33:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.026 second response time [03:33:14] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.test],File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz] [03:33:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.189 second response time [03:33:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:33:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [03:33:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [03:33:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [03:33:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:34:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.599 second response time [03:34:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.063 second response time [03:34:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:04] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.053 second response time [03:34:14] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.135 second response time [03:34:24] PROBLEM - Apache HTTP on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 
0.002 second response time [03:34:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [03:34:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:34] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.099 second response time [03:34:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [03:34:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [03:34:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:34:54] PROBLEM - HHVM rendering on mw1223 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:35:04] PROBLEM - Apache HTTP on mw1223 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:35:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [03:35:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All 
endpoints are healthy [03:35:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: /page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 503 (expecting: 303) [03:35:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:35:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:35:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [03:35:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:35:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.021 second response time [03:35:54] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.116 second response time [03:35:54] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.109 second response time [03:35:54] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.129 second response time [03:36:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.032 second response time [03:36:04] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.155 second response time [03:36:04] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.105 second response time [03:36:04] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.724 second response time [03:36:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:36:24] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.361 second response time [03:36:24] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 614 bytes in 7.034 second response time [03:36:24] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [03:36:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [03:36:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [03:36:34] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.196 second response time [03:36:34] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [03:36:34] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [03:36:34] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [03:36:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [03:36:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [03:36:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [03:36:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [03:36:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [03:36:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [03:36:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [03:36:45] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [03:36:45] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [03:36:46] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [03:36:46] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [03:36:47] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:37:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.062 second response time [03:37:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.150 second response time [03:37:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.929 second response time [03:37:34] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 7.300 second response time [03:37:34] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [03:37:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [03:37:54] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [03:37:54] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.224 second response time [03:38:04] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.024 second response time [03:38:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.165 second response time [03:38:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.091 second response time [03:38:14] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.440 second response time [03:38:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.169 second response time [03:38:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:54] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.207 second response time [03:39:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.044 second response time [03:39:04] RECOVERY - Apache HTTP on mw1205 is OK: 
HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time [03:39:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.195 second response time [03:39:04] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.334 second response time [03:39:05] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.133 second response time [03:39:24] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.533 second response time [03:39:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:40:14] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 88.20 seconds [03:40:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.043 second response time [03:40:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69355 bytes in 0.174 second response time [03:40:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:40:54] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.022 second response time [03:40:54] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69354 bytes in 0.108 second response time [03:41:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.717 second response time [03:41:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [03:41:54] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [03:41:54] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.121 second response time [03:42:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [03:42:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:42:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [03:42:14] PROBLEM - HHVM rendering on mw1227 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:42:14] PROBLEM - Apache HTTP on mw1203 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:42:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [03:42:14] PROBLEM - HHVM rendering on mw1203 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:42:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.699 second response time [03:42:24] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [03:42:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:42:54] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:43:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.039 second response time [03:43:14] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.069 second response time [03:43:14] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.260 second response time [03:43:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:43:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:43:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:43:54] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [03:43:54] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.108 second response time [03:44:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:44:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.031 second response time [03:44:54] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.193 second response time [03:45:04] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.374 second response time [03:45:05] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [03:45:05] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:45:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:45:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.157 second response time [03:45:24] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:45:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [03:45:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [03:46:04] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 8.609 second response time [03:46:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:46:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.140 second response time [03:46:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:46:25] PROBLEM - HHVM rendering on mw1229 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:46:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [03:46:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [03:46:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 
301 Moved Permanently - 612 bytes in 0.029 second response time [03:47:14] PROBLEM - Apache HTTP on mw1229 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [03:47:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.123 second response time [03:47:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.258 second response time [03:47:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:48:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.023 second response time [03:48:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.056 second response time [03:48:14] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [03:48:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:48:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:48:14] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:48:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.196 second response time [03:48:24] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.174 second response time [03:48:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:48:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:48:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:49:24] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.949 second response time [03:50:34] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.272 second response time [03:50:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [03:50:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [03:50:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:51:04] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 6.839 second response time [03:51:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:51:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:51:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [03:51:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:52:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:52:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:52:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.415 second response time [03:52:24] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:52:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.520 second response time [03:53:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.128 second response time [03:53:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:24] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:53:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:54] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [03:54:04] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.887 second response time [03:54:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:34] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.464 second response time [03:54:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:54:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.774 second response time [03:55:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.734 second response time [03:55:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.093 second response time [03:55:04] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.970 second response time [03:55:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.857 second response time [03:55:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [03:55:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy 
[03:55:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [03:55:52] Operations, Multimedia, Traffic, Patch-For-Review, and 2 others: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2825066 (Josve05a) Just got another `Failed to load resource: net::ERR_CONNECTION_CLOSED`... [03:56:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:56:14] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.908 second response time [03:56:14] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.915 second response time [03:56:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 8.117 second response time [03:56:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [03:57:04] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:57:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:57:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:58:04] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.383 second response time [03:58:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.263 second response time [03:58:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:05] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.383 second response time [03:58:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:59:04] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.636 second response time [03:59:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:05] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:15] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.059 second response time [03:59:24] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 7.500 second response time [03:59:34] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.450 second response time [03:59:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:59:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [03:59:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [03:59:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [04:00:04] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.744 second response time [04:00:04] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.703 second response time [04:00:04] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 3.829 second response time [04:00:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.061 second response time [04:00:04] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.011 second response time [04:00:04] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:00:05] RECOVERY - HHVM rendering on mw1285 is OK: HTTP 
OK: HTTP/1.1 200 OK - 69358 bytes in 4.824 second response time [04:00:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.115 second response time [04:00:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.211 second response time [04:00:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [04:00:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:00:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:00:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:01:04] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.852 second response time [04:01:04] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 3.316 second response time [04:01:04] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.237 second response time [04:01:04] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:14] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [04:01:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:24] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.765 
second response time [04:01:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:01:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [04:02:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.103 second response time [04:02:04] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:04] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [04:02:14] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.644 second response time [04:02:14] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.999 second response time [04:02:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.148 second response time [04:02:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:02:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:02:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:03:04] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.477 second response time [04:03:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:03:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:03:14] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.657 second response time [04:03:14] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.748 second response time [04:03:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 6.686 second response time [04:03:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:03:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:03:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.142 second response time [04:03:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.114 second response time [04:04:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.077 second response time [04:04:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:04:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:04:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:04:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:04:24] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.929 second response time [04:04:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 3.447 second response time [04:04:25] PROBLEM - HHVM rendering on mw1196 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [04:04:34] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.868 second response time [04:04:34] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.658 second response time [04:04:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:04:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:04:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.002 second response time [04:05:04] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.690 second response time [04:05:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [04:05:04] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 8.961 second response time [04:05:14] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.134 second response time [04:05:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.356 second response time [04:05:34] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.994 second response time [04:05:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:05:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:06:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [04:06:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [04:06:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [04:07:14] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.899 second response time [04:07:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:07:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:08:15] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:08:34] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.284 second response time [04:08:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [04:08:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [04:08:54] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=5150.90 Read Requests/Sec=5702.40 Write Requests/Sec=9.30 KBytes Read/Sec=32254.00 KBytes_Written/Sec=4328.00 [04:09:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [04:09:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [04:09:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:10:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 5.229 second response time [04:10:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [04:10:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:11:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:12:24] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.018 second response time [04:12:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [04:12:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [04:14:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:04] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:14:54] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=21.30 Read Requests/Sec=11.00 Write Requests/Sec=0.50 KBytes Read/Sec=48.40 KBytes_Written/Sec=2.80 [04:15:04] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 7.167 second response time [04:15:04] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.289 second response time [04:15:04] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.295 second response time [04:15:14] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.088 second response time [04:15:15] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:15:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:15:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [04:15:44] PROBLEM - mobileapps endpoints health on scb1003 is 
CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:04] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:16:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:16:14] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.587 second response time [04:16:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.306 second response time [04:16:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [04:16:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:17:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.433 second response time [04:17:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [04:18:04] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 9.195 second response time [04:18:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:18:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:19:04] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 6.579 second response time [04:19:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:19:25] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.253 second response time [04:20:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:20:14] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.792 second response time [04:21:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.167 second response time [04:21:04] 
PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:21:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:22:04] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.404 second response time [04:22:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [04:34:57] (03CR) 10Legoktm: "Yes...I reverted it back in PS5 except in your PS6 you lost my dependency and changed the commit message...?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: 10Legoktm) [04:35:18] (03PS7) 10Legoktm: Set $wgUserEmailUseReplyTo = true; everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) [04:37:24] PROBLEM - puppet last run on labvirt1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:46:44] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:51:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:54] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.194 second response time [04:53:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:54:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:05] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:54:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.049 second response time [04:55:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.977 second response time [04:55:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:14] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.436 second response time [04:55:24] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 6.875 second response time [04:55:24] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:44] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.738 second response time [04:55:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:55:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.130 second response time [04:56:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:05] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.435 second response time [04:56:15] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:24] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [04:56:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are 
healthy [04:56:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [04:56:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:56:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:57:17] !log roll-restart hhvm on api_appcluster on machines with hhvm leaking memory [04:57:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.858 second response time [04:57:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:57:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.422 second response time [04:58:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:05] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.214 second response time [04:58:24] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [04:58:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:58:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 
10 seconds. [04:59:04] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.378 second response time [04:59:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.645 second response time [04:59:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:14] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.115 second response time [04:59:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:34] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.877 second response time [04:59:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:59:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [04:59:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:59:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:00:04] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.612 second response time [05:00:04] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:04] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:05] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:05] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.106 second response time [05:00:14] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.854 second response time [05:00:14] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.116 second response time [05:00:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:00:24] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.083 second response time [05:00:34] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [05:00:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:00:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:00:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:01:04] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.148 second response time [05:01:04] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.019 second response time [05:01:04] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.037 second response time [05:01:14] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.282 second response time [05:01:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:01:24] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:01:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.606 second response time [05:01:34] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:01:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:01:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:01:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:02:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.641 second response time [05:02:04] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.028 second response time [05:02:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:04] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:14] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.257 second response time [05:02:24] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time [05:02:24] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:02:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:02:54] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.353 second response time [05:02:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.462 second response time [05:03:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.282 second response time [05:03:04] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.585 second response time [05:03:05] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 8.458 second response time [05:03:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:15] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.512 second response time [05:03:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1280.eqiad.wmnet because of too many down! [05:03:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1205.eqiad.wmnet because of too many down! 
[05:03:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:03:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:45] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:45] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:46] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:46] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:03:47] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:04:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.613 second response time [05:04:05] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:05] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:05] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.036 second response time [05:04:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:25] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:25] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:36] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:04:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [05:04:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:04:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:04:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:04:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [05:04:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:04:54] PROBLEM - Apache HTTP on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:05:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.215 second response time [05:05:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:05:05] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:05:26] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18293 bytes in 0.171 second response time [05:05:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.670 second response time [05:05:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:05:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [05:05:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:05:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:05:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:05:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:05:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time [05:05:54] PROBLEM - HHVM rendering on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.003 second response time [05:06:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:05] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:06:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.084 second response time [05:06:14] PROBLEM - Apache HTTP on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:06:24] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.817 second response time [05:06:24] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.991 second response time [05:06:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.059 second response time [05:06:24] RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [05:06:34] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID returned the unexpected status 503 (expecting: 200) [05:06:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.194 second response time [05:06:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy 
[05:06:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:07:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:07:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:07:14] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 6.863 second response time [05:07:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.095 second response time [05:07:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [05:07:24] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.577 second response time [05:07:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.776 second response time [05:07:24] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) is CRITICAL: Test retrieve images and videos of en.wp Cat page via media route returned the unexpected status 503 (expecting: 200) [05:07:25] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [05:07:34] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.807 second response time [05:07:34] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [05:07:34] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:07:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [05:07:34] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [05:07:34] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [05:07:34] RECOVERY 
- restbase endpoints health on restbase2010 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [05:07:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:07:45] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [05:07:45] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [05:07:46] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [05:07:46] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [05:07:47] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [05:07:47] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [05:07:48] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [05:07:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:07:54] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [05:07:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [05:07:54] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [05:07:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.183 second response time [05:07:54] RECOVERY - HHVM rendering on mw1290 
is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.137 second response time [05:08:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.104 second response time [05:08:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [05:08:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [05:08:14] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [05:08:15] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.117 second response time [05:08:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.777 second response time [05:08:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 3.856 second response time [05:08:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [05:08:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [05:08:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:08:54] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.256 second response time [05:09:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 1.161 second response time [05:09:04] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.508 second response time [05:09:14] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.990 second response time [05:09:24] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.031 second response time [05:09:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [05:09:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [05:09:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [05:09:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [05:09:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [05:10:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.993 second response time [05:10:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.916 second response time [05:10:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.773 second response time [05:10:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 3.926 second response time [05:10:04] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.068 second response time [05:10:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.113 second response time [05:10:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 612 bytes in 0.029 second response time [05:10:54] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time [05:10:54] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.095 second response time [05:10:54] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.140 second response time [05:10:54] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.165 second response time [05:11:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.238 second response time [05:11:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.130 second response time [05:11:24] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.061 second response time [05:11:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.128 second response time [05:11:24] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:11:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:11:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:11:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:11:44] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.207 second response time [05:12:04] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:12:24] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:12:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:12:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket 
timeout after 10 seconds. [05:12:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:04] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.150 second response time [05:13:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:25] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:13:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:13:54] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [05:13:54] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.126 second response time [05:14:04] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.187 second response time [05:14:14] PROBLEM - Apache HTTP on mw1201 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50405 bytes in 0.003 second response time [05:14:14] PROBLEM - HHVM rendering on mw1280 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:14:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [05:14:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:14:24] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 
seconds [05:14:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:14:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 1.537 second response time [05:15:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.142 second response time [05:15:04] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:05] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [05:15:14] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.180 second response time [05:15:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.152 second response time [05:15:24] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.106 second response time [05:15:24] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 3.598 second response time [05:15:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.014 second response time [05:15:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.126 second response time [05:15:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:24] PROBLEM - 
HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:25] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:25] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.951 second response time [05:15:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:15:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.027 second response time [05:16:04] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:14] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:14] PROBLEM - HHVM rendering on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:16:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1276.eqiad.wmnet because of too many down! 
[05:16:24] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:24] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1282.eqiad.wmnet because of too many down! [05:16:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:16:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:45] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:16:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:17:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.021 second response time [05:17:04] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:14] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.438 second response time [05:17:14] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:24] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.102 second response time [05:17:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:24] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:17:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) [05:17:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:17:47] !log roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory [05:17:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:17:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:04] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.610 second response time [05:18:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:14] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.855 second response time [05:18:24] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.085 second response time [05:18:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:24] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:18:36] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18294 bytes in 9.607 second response time [05:18:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:18:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID returned the unexpected status 503 (expecting: 200) [05:18:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:18:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:18:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.357 second response time [05:19:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.218 second response time [05:19:05] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:19:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.043 second response time [05:19:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.145 second response time [05:19:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.186 second response time [05:19:14] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.208 second response time [05:19:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [05:19:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time [05:19:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [05:19:24] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.189 second response time [05:19:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:19:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:19:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.128 second response time [05:19:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [05:19:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [05:19:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket 
timeout after 10 seconds. [05:19:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:20:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.114 second response time [05:20:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.380 second response time [05:20:04] PROBLEM - Apache HTTP on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:20:05] PROBLEM - HHVM rendering on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:20:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:20:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:20:34] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [05:20:34] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [05:20:34] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [05:20:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [05:20:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [05:20:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [05:20:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [05:20:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 503 (expecting: 303): /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) [05:20:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [05:20:44] RECOVERY - mobileapps endpoints 
health on scb1002 is OK: All endpoints are healthy [05:20:45] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [05:20:45] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [05:20:46] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:20:46] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [05:20:47] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [05:20:54] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [05:20:54] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.195 second response time [05:21:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.466 second response time [05:21:14] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.040 second response time [05:21:14] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.047 second response time [05:21:24] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.020 second response time [05:21:24] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.126 second response time [05:21:24] PROBLEM - Apache HTTP on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:21:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [05:21:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [05:21:34] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.007 second response time [05:21:34] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [05:21:34] RECOVERY - restbase 
endpoints health on restbase2007 is OK: All endpoints are healthy [05:21:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [05:21:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:21:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [05:21:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [05:21:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [05:21:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [05:21:57] !log Live-hacked api.php on mw1290 to die if request user-agent contains 'Parsoid'; restarted HHVM. [05:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:22:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:22:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:22:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.845 second response time [05:22:24] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69366 bytes in 0.217 second response time [05:22:24] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.031 second response time [05:22:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.023 second response time [05:22:34] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.665 second response time [05:22:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [05:22:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [05:22:54] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.055 second 
response time [05:22:54] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.080 second response time [05:23:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.569 second response time [05:23:04] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.495 second response time [05:23:04] RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.491 second response time [05:23:04] RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.643 second response time [05:23:04] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:14] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.393 second response time [05:23:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:34] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.171 second response time [05:23:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:23:54] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.073 second response time [05:24:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:24:14] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.055 second response time [05:24:14] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.308 second response time [05:24:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 2.785 second response time [05:24:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:24:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.282 second response time [05:24:24] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:24:25] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.207 second response time [05:24:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:25:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:25:24] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:25:34] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:25:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:25:54] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.152 second response time [05:26:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [05:26:04] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [05:26:04] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.665 second response time [05:26:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69360 bytes in 2.926 second response time [05:26:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:05] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:05] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:05] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:14] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.544 second response time [05:26:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:24] (CR) Huji: "That was unintentional. Thanks for correcting it." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/322243 (https://phabricator.wikimedia.org/T66795) (owner: Legoktm) [05:26:24] PROBLEM - Apache HTTP on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:34] PROBLEM - HHVM rendering on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:34] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:26:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:26:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:26:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:26:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.811 second response time [05:27:04] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:05] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.182 second response time [05:27:24] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.054 second response time [05:27:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1194.eqiad.wmnet because of too many down! 
[05:27:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [05:27:24] PROBLEM - HHVM rendering on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1192.eqiad.wmnet because of too many down! [05:27:34] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:34] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:34] PROBLEM - Apache HTTP on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:44] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.041 second response time [05:27:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:27:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:27:54] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.050 second response time [05:28:04] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.794 second response time [05:28:04] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:04] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:14] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 8.021 second response time [05:28:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:24] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.379 second response time [05:28:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.000 second response time [05:28:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:34] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.743 second response time [05:28:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:28:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:45] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:45] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:29:04] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.192 second response time [05:29:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.938 second response time [05:29:04] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.953 second response time [05:29:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.098 second response time [05:29:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:04] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 8.092 second response time [05:29:04] PROBLEM - Apache HTTP on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:05] PROBLEM - Apache HTTP on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:05] PROBLEM - HHVM rendering on mw1230 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [05:29:14] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.045 second response time [05:29:14] PROBLEM - Apache HTTP on mw1234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:14] PROBLEM - HHVM rendering on mw1202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:15] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:15] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.447 second response time [05:29:16] RECOVERY - HHVM rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 2.378 second response time [05:29:16] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:17] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:24] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:24] PROBLEM - Apache HTTP on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:25] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:25] RECOVERY - Apache HTTP on mw1221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.675 second response time [05:29:25] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:25] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:34] RECOVERY - HHVM rendering on mw1197 is OK: 
HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.713 second response time [05:29:34] PROBLEM - HHVM rendering on mw1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:29:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:29:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:29:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:29:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:29:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:30:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:05] PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:05] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:05] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:34] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:34] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:30:43] !log roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory [05:30:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [05:30:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:30:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:54] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.309 second response time [05:31:04] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:14] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.404 second response time [05:31:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.359 second response time [05:31:34] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:34] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 503 (expecting: 303): /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) [05:31:44] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.541 second response time [05:32:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.269 second response time [05:32:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.009 second response time [05:32:04] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:05] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [05:32:14] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.494 second response time [05:32:15] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 1.414 second response time [05:32:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.046 second response time [05:32:24] RECOVERY - HHVM rendering on mw1189 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 9.183 second response time [05:32:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:32:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [05:32:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [05:32:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:32:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [05:32:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [05:32:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [05:32:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [05:32:54] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [05:32:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:32:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.024 second response time [05:32:54] RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [05:32:54] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [05:32:54] RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.088 second response time [05:33:04] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 4.522 second response time [05:33:04] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.967 second response time [05:33:05] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [05:33:12] !log With Parsoid requests hacked to fail fast, mw1290 is not showing the kind of aggressive growth in memory usage we're seeing on other API servers [05:33:14] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) [05:33:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.113 second response time [05:33:15] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.309 second response time [05:33:15] RECOVERY - HHVM rendering on mw1204 is OK: HTTP OK: HTTP/1.1 200 OK - 69357 bytes in 0.111 second response time [05:33:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:33:26] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18294 bytes in 1.756 second response time [05:33:44] RECOVERY - restbase endpoints health on restbase2002 is OK: All 
endpoints are healthy [05:33:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:33:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [05:33:44] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [05:33:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:34:04] RECOVERY - Apache HTTP on mw1234 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.059 second response time [05:34:04] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 3.238 second response time [05:34:04] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.503 second response time [05:34:04] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 4.619 second response time [05:34:05] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.840 second response time [05:34:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [05:34:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [05:34:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [05:34:14] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy [05:34:15] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.038 second response time [05:34:24] RECOVERY - HHVM rendering on mw1234 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.127 second response time [05:34:24] RECOVERY - HHVM rendering on mw1208 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.153 second response time [05:34:24] RECOVERY - 
Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.151 second response time [05:34:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.659 second response time [05:35:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 5.613 second response time [05:35:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.684 second response time [05:35:04] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:24] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:25] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [05:35:34] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:35:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:35:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:35:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:35:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:35:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:35:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:35:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:36:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:05] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69359 bytes in 8.934 second response time [05:36:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:24] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:25] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:36:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:36:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:36:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:36:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:36:45] !log Commented-out live-hack from mw1290; if we see memory growth now, Parsoid would be strongly implicated. [05:36:54] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.059 second response time [05:36:54] PROBLEM - HHVM rendering on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:36:54] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.065 second response time [05:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:37:04] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.110 second response time [05:37:04] RECOVERY - HHVM rendering on mw1202 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.193 second response time [05:37:04] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.220 second response time [05:37:05] PROBLEM - Apache HTTP on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:05] PROBLEM - HHVM rendering on mw1288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:14] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 7.868 second response time [05:37:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.173 second response time [05:37:34] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:34] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:37:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:44] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [05:37:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:37:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.169 second response time [05:38:04] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:38:24] PROBLEM - HHVM rendering on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:26] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18294 bytes in 2.806 second response time [05:38:34] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.889 second response time [05:38:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:34] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:34] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [05:38:54] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:54] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.023 second response time [05:38:54] RECOVERY - Apache HTTP on 
mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.325 second response time [05:38:54] PROBLEM - HHVM rendering on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:38:54] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69356 bytes in 0.096 second response time [05:38:54] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 0.178 second response time [05:39:04] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:39:14] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID returned the unexpected status 503 (expecting: 200): /page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) [05:39:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.022 second response time [05:39:24] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.044 second response time [05:39:24] PROBLEM - Apache HTTP on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:24] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.024 second response time [05:39:24] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69358 bytes in 6.797 second response time [05:39:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [05:39:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy 
[05:40:04] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.240 second response time [05:40:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:05] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.060 second response time [05:40:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:14] PROBLEM - Apache HTTP on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:24] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.024 second response time [05:40:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:34] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.026 second response time [05:40:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [05:40:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [05:40:55] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:40:55] RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.021 second response time [05:40:55] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.162 second response time [05:40:55] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.085 second response time [05:40:55] RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.158 second response time [05:41:04] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 503 (expecting: 200) [05:41:14] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.714 second response time [05:41:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0] [05:41:24] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/random/title (retrieve a random article) is CRITICAL: Test retrieve a random article returned the unexpected status 503 (expecting: 200): /{domain}/v1/page/mobile-sections/{title} (retrieve en.wp main page via mobile-sections) is CRITICAL: Test retrieve en.wp main page via mobile-sections returned the unexpected status 503 (expecting: 20 [05:41:24] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.430 second response time [05:41:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [05:41:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [05:41:44] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.220 second response time [05:41:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [05:41:44] 
RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [05:41:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [05:41:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [05:41:45] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [05:41:45] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:41:46] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [05:41:46] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [05:41:47] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:41:47] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [05:41:48] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [05:42:04] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [05:42:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.033 second response time [05:42:05] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy [05:42:14] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.027 second response time [05:42:14] PROBLEM - Apache HTTP on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:42:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0] [05:42:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.166 second response time [05:42:34] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [05:42:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [05:42:44] 
RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:43:04] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.241 second response time [05:43:04] PROBLEM - Apache HTTP on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:04] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:04] PROBLEM - HHVM rendering on mw1286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.045 second response time [05:43:24] PROBLEM - HHVM rendering on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:24] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.371 second response time [05:43:54] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.078 second response time [05:44:04] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.733 second response time [05:44:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.125 second response time [05:44:04] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.542 second response time [05:44:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.329 second response time [05:44:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.526 second response time [05:44:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.086 second response time [05:44:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:44:36] PROBLEM - LVS HTTP 
IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:44:44] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [05:44:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:44:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:44:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:04] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:14] PROBLEM - Apache HTTP on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.090 second response time [05:45:26] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18293 bytes in 0.182 second response time [05:45:26] PROBLEM - HHVM rendering on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:26] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 4.521 second response time [05:45:27] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.574 second response time [05:45:27] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 5.638 second response time [05:45:34] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.425 second response time [05:45:34] PROBLEM - HHVM 
rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:34] PROBLEM - HHVM rendering on mw1196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:34] PROBLEM - Apache HTTP on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.084 second response time [05:45:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [05:45:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:45:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:46:14] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:46:15] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:46:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.483 second response time [05:46:24] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.753 second response time [05:46:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:46:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.977 second response time [05:46:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [05:46:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [05:46:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:46:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:46:54] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:47:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.001 second response time [05:47:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:05] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.084 second response time [05:47:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.253 second response time [05:47:15] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.036 second response time [05:47:25] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 1.944 second response time [05:47:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:34] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.009 second response time [05:47:34] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [05:47:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:47:54] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.084 second response time [05:48:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:48:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.152 second response time [05:48:24] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.491 second response time [05:48:34] PROBLEM - Apache HTTP on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:48:34] PROBLEM - HHVM rendering on mw1198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:48:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [05:48:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:48:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:48:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:49:04] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.352 second response time [05:49:04] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:49:04] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:49:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:49:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [05:49:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:49:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:49:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:49:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:50:04] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:34] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:34] PROBLEM - Apache HTTP on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:50:34] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [05:50:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:50:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [05:50:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:50:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:50:44] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:50:44] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [05:50:55] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.131 second response time [05:50:55] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [05:50:55] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [05:51:04] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.613 second response time [05:51:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.045 second response time [05:51:05] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:51:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:51:24] RECOVERY - HHVM rendering on mw1189 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 5.949 second response time [05:51:24] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:51:34] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:51:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [05:51:34] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [05:51:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:51:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [05:51:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [05:51:44] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [05:51:44] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 
seconds. [05:51:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:51:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:52:04] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:52:14] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.322 second response time [05:52:14] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:52:14] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.360 second response time [05:52:24] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 1.795 second response time [05:52:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.750 second response time [05:52:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:52:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:52:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [05:52:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:53:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.568 second response time [05:53:14] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:53:24] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.010 second response time [05:53:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 2.930 second response time [05:53:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.034 second response time [05:53:24] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.149 second response time [05:53:34] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [05:53:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:53:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:53:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:53:54] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:54:04] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.931 second response time [05:54:04] PROBLEM - Apache HTTP on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:05] PROBLEM - HHVM rendering on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:14] PROBLEM - Apache HTTP on mw1223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:14] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:14] RECOVERY - HHVM rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.091 second response time [05:54:14] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.408 second response time [05:54:15] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:24] RECOVERY - Apache HTTP on mw1221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.039 second response time [05:54:24] PROBLEM - HHVM rendering on mw1189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:34] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:54:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:54:44] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:54:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:54:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [05:55:04] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 4.546 second response time [05:55:04] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.081 second response time [05:55:04] PROBLEM - Apache HTTP on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:04] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.390 second response time [05:55:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:14] PROBLEM - Apache HTTP on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:15] PROBLEM - Apache HTTP on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:24] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.148 second response time [05:55:24] PROBLEM - HHVM rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:34] PROBLEM - HHVM rendering on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:34] PROBLEM - HHVM rendering on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:34] PROBLEM - Apache HTTP on mw1204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:34] PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:34] PROBLEM - Apache HTTP on mw1227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [05:55:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: 
CHECK_NRPE: Socket timeout after 10 seconds. [05:55:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:55:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:56:04] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.881 second response time [05:56:04] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:14] PROBLEM - HHVM rendering on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [05:56:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:34] PROBLEM - HHVM rendering on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:34] PROBLEM - Apache HTTP on mw1192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:44] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [05:56:44] PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:56:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:56:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [05:57:04] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.557 second response time [05:57:14] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.615 second response time [05:57:24] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.040 second response time [05:57:24] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 9.136 second response time [05:57:44] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy [05:57:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [05:57:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:57:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:57:54] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.019 second response time [05:58:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.338 second response time [05:58:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.375 second response time [05:58:04] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.253 second response time [05:58:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 2.228 second response time [05:58:04] PROBLEM - HHVM rendering on mw1276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:58:04] RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.027 second response time [05:58:14] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.439 second response time [05:58:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.381 second response time [05:58:15] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 2.730 second response time [05:58:24] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.296 second response time [05:58:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [05:58:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 2.619 second response time [05:58:24] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.858 second response time [05:58:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [05:58:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:58:34] RECOVERY - restbase endpoints health on 
restbase1007 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [05:58:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [05:58:45] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:59:04] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.648 second response time [05:59:04] RECOVERY - Apache HTTP on mw1196 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time [05:59:04] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.276 second response time [05:59:04] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:59:24] RECOVERY - HHVM rendering on mw1196 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.204 second response time [05:59:34] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:59:34] PROBLEM - HHVM rendering on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:59:34] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [05:59:54] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [05:59:54] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [06:00:04] RECOVERY - HHVM rendering on mw1225 is 
OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 3.631 second response time [06:00:05] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.476 second response time [06:00:05] PROBLEM - Apache HTTP on mw1199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:00:24] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.052 second response time [06:00:34] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.086 second response time [06:00:34] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [06:00:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [06:00:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [06:00:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [06:00:54] RECOVERY - Apache HTTP on mw1189 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [06:00:54] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.103 second response time [06:01:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.770 second response time [06:01:04] PROBLEM - Apache HTTP on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.025 second response time [06:01:05] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:14] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:14] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.500 second response time [06:01:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 
second response time [06:01:14] RECOVERY - HHVM rendering on mw1189 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.310 second response time [06:01:24] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.800 second response time [06:01:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1222.eqiad.wmnet because of too many down! [06:01:24] PROBLEM - Apache HTTP on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:24] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:24] PROBLEM - Apache HTTP on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - api_80 - Could not depool server mw1222.eqiad.wmnet because of too many down! [06:01:34] PROBLEM - HHVM rendering on mw1203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:34] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:34] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:44] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:44] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:44] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:44] PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:01:44] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:44] PROBLEM - restbase endpoints health on xenon is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:54] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:01:54] RECOVERY - Apache HTTP on mw1286 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.481 second response time [06:01:54] RECOVERY - HHVM rendering on mw1223 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.280 second response time [06:01:54] RECOVERY - HHVM rendering on mw1286 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.295 second response time [06:02:04] RECOVERY - Apache HTTP on mw1223 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.063 second response time [06:02:04] PROBLEM - Apache HTTP on mw1190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:04] PROBLEM - HHVM rendering on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:04] PROBLEM - HHVM rendering on mw1278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:14] PROBLEM - Apache HTTP on mw1205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:14] PROBLEM - HHVM rendering on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:14] PROBLEM - Apache HTTP on mw1195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:14] PROBLEM - Apache HTTP on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:14] PROBLEM - HHVM rendering on mw1190 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [06:02:24] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.036 second response time [06:02:24] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.193 second response time [06:02:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.326 second response time [06:02:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.076 second response time [06:02:24] PROBLEM - HHVM rendering on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:24] PROBLEM - HHVM rendering on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:34] PROBLEM - Apache HTTP on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:34] PROBLEM - Apache HTTP on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [06:02:44] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:02:44] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:02:44] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:02:44] RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy [06:02:54] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:03:04] PROBLEM - Apache HTTP on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:05] PROBLEM - HHVM rendering on mw1282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:05] PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:14] PROBLEM - Apache HTTP on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:34] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:44] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: /feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregated feed content for April 29, 2016 returned the unexpected status 503 (expecting: 200) [06:03:44] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:44] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:03:44] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:04:04] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.035 second response time [06:04:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.141 second response time [06:04:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:04:24] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.088 second response time [06:04:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.306 second response time [06:04:24] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.952 second response time [06:04:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:04:24] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 4.635 second response time [06:04:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.988 second response time [06:04:36] PROBLEM - LVS HTTP IPv4 on api.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:04:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [06:04:44] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy [06:04:44] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [06:04:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [06:04:44] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [06:05:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:05:24] PROBLEM - HHVM rendering on mw1192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [06:05:24] RECOVERY - HHVM rendering on mw1204 is OK: HTTP 
OK: HTTP/1.1 200 OK - 69338 bytes in 0.153 second response time [06:05:24] PROBLEM - Apache HTTP on mw1192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time [06:05:24] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.027 second response time [06:05:24] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.053 second response time [06:05:27] RECOVERY - LVS HTTP IPv4 on api.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 18294 bytes in 3.084 second response time [06:05:34] PROBLEM - HHVM rendering on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:05:34] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.200 second response time [06:05:44] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /page/random/{format} (Random title redirect) is CRITICAL: Test Random title redirect returned the unexpected status 503 (expecting: 303) [06:05:54] PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:06:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:06:14] PROBLEM - Apache HTTP on mw1284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:06:14] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.185 second response time [06:06:24] RECOVERY - HHVM rendering on mw1192 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.214 second response time [06:06:24] RECOVERY - Apache HTTP on mw1192 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [06:06:24] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.034 second response time [06:06:34] RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy [06:06:44] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy [06:06:44] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [06:06:44] RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy [06:06:54] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy [06:06:54] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.194 second response time [06:07:04] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.020 second response time [06:07:14] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 7.794 second response time [06:07:14] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 3.296 second response time [06:07:34] PROBLEM - Apache HTTP on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:07:34] PROBLEM - HHVM rendering on mw1280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:07:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[06:07:34] PROBLEM - HHVM rendering on mw1194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:07:34] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy [06:07:44] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy [06:07:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:07:44] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:07:44] PROBLEM - restbase endpoints health on cerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:07:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [06:07:54] PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:07:54] RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.150 second response time [06:08:04] RECOVERY - Apache HTTP on mw1225 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.026 second response time [06:08:24] PROBLEM - HHVM rendering on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:08:34] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.758 second response time [06:08:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy [06:08:44] RECOVERY - restbase endpoints health on 
restbase2004 is OK: All endpoints are healthy [06:08:45] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [06:08:45] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:09:04] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.327 second response time [06:09:34] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy [06:09:35] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [06:09:35] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy [06:09:35] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [06:09:44] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [06:09:44] RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy [06:09:54] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy [06:10:04] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.025 second response time [06:10:04] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.259 second response time [06:10:04] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.042 second response time [06:10:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.823 second response time [06:10:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.253 second response time [06:10:14] RECOVERY - HHVM rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.168 second response time [06:10:14] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:10:24] RECOVERY - HHVM rendering on mw1284 is OK: HTTP 
OK: HTTP/1.1 200 OK - 69336 bytes in 0.097 second response time [06:10:24] RECOVERY - Apache HTTP on mw1221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.060 second response time [06:10:24] PROBLEM - Apache HTTP on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:10:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [06:10:44] RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy [06:10:54] RECOVERY - Apache HTTP on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time [06:10:54] RECOVERY - Apache HTTP on mw1190 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.051 second response time [06:10:54] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.252 second response time [06:11:04] RECOVERY - HHVM rendering on mw1190 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.241 second response time [06:11:14] RECOVERY - Apache HTTP on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.029 second response time [06:11:24] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.123 second response time [06:11:24] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.182 second response time [06:11:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [06:11:24] PROBLEM - Apache HTTP on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:11:24] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.036 second response time [06:11:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [06:11:34] PROBLEM - Apache HTTP on mw1193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:11:34] PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [06:11:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [06:11:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:12:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:04] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:04] PROBLEM - HHVM rendering on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:14] PROBLEM - HHVM rendering on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:14] RECOVERY - Apache HTTP on mw1222 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.047 second response time [06:12:15] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 69336 bytes in 0.090 second response time [06:12:24] RECOVERY - HHVM rendering on mw1280 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.885 second response time [06:12:24] RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.121 second response time [06:12:34] PROBLEM - Apache HTTP on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [06:12:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [06:12:44] PROBLEM - Apache HTTP on mw1281 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:12:54] RECOVERY - Apache HTTP on mw1280 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.028 second response time [06:13:04] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.022 second response time [06:13:04] RECOVERY - HHVM rendering on mw1228 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 6.998 second response time [06:13:14] RECOVERY - Apache HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 612 bytes in 0.047 second response time [06:13:14] RECOVERY - HHVM rendering on mw1193 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.102 second response time [06:13:24] RECOVERY - HHVM rendering on mw1203 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.122 second response time [06:13:24] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.213 second response time [06:13:24] RECOVERY - Apache HTTP on mw1193 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.166 second response time [06:13:24] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.181 second response time [06:13:24] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.476 second response time [06:13:24] RECOVERY - HHVM rendering on mw1199 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 2.090 second response time [06:13:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [06:13:44] RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.946 second response time [06:13:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [06:14:04] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.659 second response time [06:14:04] RECOVERY - Apache HTTP on mw1199 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.575 second response time [06:14:24] RECOVERY - Apache HTTP on mw1194 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.050 second response time [06:14:24] RECOVERY - HHVM rendering on mw1194 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.149 second response time [06:14:34] RECOVERY - Apache HTTP on mw1228 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.132 second response time [06:14:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are 
healthy [06:15:04] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 0.326 second response time [06:15:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:15:16] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:15:16] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.065 second response time [06:15:24] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 3.311 second response time [06:15:37] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:04] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.804 second response time [06:16:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 8.996 second response time [06:16:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 7.008 second response time [06:16:36] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 3.250 second response time [06:16:36] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 7.845 second response time [06:16:44] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:44] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:44] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:44] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:16:44] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:16:54] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:17:05] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 4.962 second response time [06:17:05] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.948 second response time [06:17:14] PROBLEM - HHVM rendering on mw1233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:17:14] PROBLEM - Apache HTTP on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:17:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:17:44] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:18:04] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 1.638 second response time [06:18:34] PROBLEM - HHVM rendering on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:18:44] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:18:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [06:19:04] RECOVERY - HHVM rendering on mw1205 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 4.934 second response time [06:19:04] PROBLEM - Apache HTTP on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:19:05] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 5.554 second response time [06:19:05] PROBLEM - HHVM rendering on mw1287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:19:05] PROBLEM - HHVM rendering on mw1283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:19:14] RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 9.980 second response time [06:19:25] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 69339 bytes in 8.319 second response time [06:20:14] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:20:14] PROBLEM - HHVM rendering on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:20:14] RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 9.224 second response time [06:20:19] !log ori@tin Synchronized php-1.29.0-wmf.3/api.php: Bandaid: make API reqs fail fast if User-Agent ~= Parsoid and Host ~= eu.wikipedia.org (duration: 00m 50s) [06:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:20:44] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [06:20:44] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:21:04] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.865 second response time [06:21:04] RECOVERY - HHVM rendering on mw1206 is OK: HTTP OK: HTTP/1.1 200 OK - 69337 bytes in 0.784 second response time [06:21:44] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [06:22:04] RECOVERY - Apache HTTP on mw1283 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 6.341 second response time [06:22:04] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 8.151 second response time [06:22:34] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [06:22:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [06:23:04] RECOVERY - HHVM rendering on mw1283 is OK: HTTP OK: HTTP/1.1 200 OK - 69338 bytes in 1.722 second response time [06:25:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:26:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [06:28:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:30:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:30:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [06:32:27] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.011 second response time [06:32:40] oh FFS thumbor [06:32:42] looking at that too [06:33:09] https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=thumbor1001&var-network=eth0 [06:33:16] thumbor1001 in very high IOwait [06:33:36] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 3.917 second response time [06:33:48] I am gonna venture a guess and say it's emitting that deep recursion thing for exiftool [06:33:58] indeed, same issue from the other day you saw heh? [06:34:37] yeah, gilles had some live hack in place to figure it out and then the issue subsided [06:34:49] all we did was restart 1/2 thumbor [06:35:08] and shift the traffic around a bit between the 2 hosts in order to help us with debugging [06:35:18] somehow ended up solving the problem [06:35:32] which doesn't make sense, so gilles opened some tasks [06:35:40] and I think left the live hack in place [06:36:03] ok thanks! I'm taking a look too [06:41:14] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Not Available - 531 bytes in 0.021 second response time [06:45:21] (03CR) 10Gergő Tisza: [C: 031] logstash: Add processing rules for MediaWiki's exception channel [puppet] - 10https://gerrit.wikimedia.org/r/323351 (https://phabricator.wikimedia.org/T136849) (owner: 10BryanDavis) [06:52:24] PROBLEM - Check systemd state on thumbor1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[06:54:04] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:05] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:54:54] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [06:54:54] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:56:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - thumbor_8800 - Could not depool server thumbor1002.eqiad.wmnet because of too many down! [06:56:37] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:59] still looking, sorry about the page [06:57:03] I'm going to silence it [06:58:36] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 8.547 second response time [06:59:04] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:59:24] RECOVERY - Check systemd state on thumbor1001 is OK: OK - running: The system is fully operational [06:59:24] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [07:02:14] PROBLEM - puppet last run on kafka1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:02:24] PROBLEM - Check systemd state on thumbor1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:03:14] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.038 second response time [07:03:24] RECOVERY - Check systemd state on thumbor1002 is OK: OK - running: The system is fully operational [07:13:34] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:18:04] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:18:04] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:19:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:20:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:24:24] PROBLEM - Check systemd state on thumbor1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:24:54] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [07:24:54] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [07:25:24] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - thumbor_8800 - Could not depool server thumbor1002.eqiad.wmnet because of too many down! [07:26:04] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [07:26:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [07:26:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - thumbor_8800 - Could not depool server thumbor1002.eqiad.wmnet because of too many down! [07:26:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [07:27:34] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [07:28:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [07:29:14] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:30:14] RECOVERY - puppet last run on kafka1020 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [07:31:24] RECOVERY - Check systemd state on thumbor1001 is OK: OK - running: The system is fully operational [07:34:24] PROBLEM - Check systemd state on thumbor1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:36:24] RECOVERY - Check systemd state on thumbor1001 is OK: OK - running: The system is fully operational [07:42:34] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [07:43:14] PROBLEM - cassandra-b CQL 10.64.32.206:9042 on restbase1013 is CRITICAL: connect to address 10.64.32.206 and port 9042: Connection refused [07:43:24] PROBLEM - cassandra-b SSL 10.64.32.206:7001 on restbase1013 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [07:43:44] PROBLEM - cassandra-b service on restbase1013 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed [07:44:04] PROBLEM - Check systemd state on restbase1013 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:48:04] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:48:04] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:48:54] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient [07:48:54] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [07:57:14] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:58:24] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - thumbor_8800 - Could not depool server thumbor1002.eqiad.wmnet because of too many down! [08:00:14] PROBLEM - puppet last run on elastic1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:00:24] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [08:01:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:01:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:02:24] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:04:44] RECOVERY - cassandra-b service on restbase1013 is OK: OK - cassandra-b is active [08:05:04] RECOVERY - Check systemd state on restbase1013 is OK: OK - running: The system is fully operational [08:05:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:06:24] PROBLEM - Check systemd state on thumbor1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:07:14] RECOVERY - cassandra-b CQL 10.64.32.206:9042 on restbase1013 is OK: TCP OK - 0.000 second response time on 10.64.32.206 port 9042 [08:07:24] RECOVERY - cassandra-b SSL 10.64.32.206:7001 on restbase1013 is OK: SSL OK - Certificate restbase1013-b valid until 2017-09-12 15:34:20 +0000 (expires in 289 days) [08:09:34] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [08:12:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:13:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:13:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [08:14:24] RECOVERY - Check systemd state on thumbor1002 is OK: OK - running: The system is fully operational [08:20:04] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:28:14] RECOVERY - puppet last run on elastic1038 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [08:38:04] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:49:04] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [08:49:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:49:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:53:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:53:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [08:57:24] PROBLEM - salt-minion processes on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:57:34] PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:01:14] RECOVERY - salt-minion processes on thumbor1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:01:24] RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient [09:06:04] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [09:35:08] !log removed all the files not used in /tmp on stat1002 after a follow up with the owner [09:35:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:14] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:38:14] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [13:11:44] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:40:44] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [13:41:04] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:09:04] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [14:50:14] PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:53:01] (PS1) MarcoAurelio: Initial configuration for fi.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) [15:01:24] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 652 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4236356 keys, up 27 days 6 hours - replication_delay is 652 [15:04:33] (PS1) Urbanecm: [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) [15:05:10] (Draft2) MarcoAurelio: RESTBase configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) [15:05:19] (Draft1) MarcoAurelio: RESTBase configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) [15:09:25] (PS2) Urbanecm: [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) [15:12:24] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4217299 keys, up 27 days 6 hours - replication_delay is 19 [15:19:14] RECOVERY - puppet last run on ms-be1021 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:23:01] (Draft2) MarcoAurelio: Labs configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323698 (https://phabricator.wikimedia.org/T151570) [15:23:11] (Draft1) MarcoAurelio: Labs configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323698 (https://phabricator.wikimedia.org/T151570) [16:00:10] Krenair: how to add fi.wikivoyage to Parsoid?
I don't understand quite well https://wikitech.wikimedia.org/wiki/Add_a_wiki#Parsoid [16:01:16] do you have a clone of the repo mentioned there? [16:03:06] Not locally. [16:03:49] well [16:03:51] that would be step 1 [16:04:00] https://phabricator.wikimedia.org/T151570#2825558 <-- done so far [16:04:07] you could clone it from labs or whatever [16:04:07] I'm doing it through gerrit [16:04:22] or I can let other guys do it 9_9 [16:04:32] so I don't have to clone x^n repos [16:09:08] how are you 'doing it through gerrit'? [16:10:36] you said from labs, not on labs [16:10:53] so I presume he means cloning from gerrit... thinking you meant you could clone from labs [16:12:40] * valhallasw`cloud interprets it as 'using the edit button in the gerrit web interface' [16:13:12] but this looks a bit too complicated to do in that way [16:13:26] Can you create a commit from scratch? [16:15:05] Reedy: e.g. https://gerrit.wikimedia.org/r/#/admin/projects/pywikibot/core 'create change' button [16:15:22] Scary [16:15:24] And hidden [16:17:11] Reedy that's not scary [16:17:24] it is the same as git pull and creating the change just doing it from the web ui [16:17:47] i find it more useful since on a mobile like an iphone you can't use git [16:17:55] so doing it from the webui is a benefit [16:18:19] Why do you find a need to make changes from an iphone? [16:18:25] It must be rare that a change is that urgent [16:18:33] Reedy because during the weekdays i am away from my pc [16:18:57] mafk, Reedy: yeah I meant you could clone it from gerrit to a labs node [16:18:58] sorry [16:21:49] I can't imagine it's good UX editing large files on a phone [16:21:52] It's bad enough viewing them [16:34:37] sorry, work calls [16:35:27] Reedy: instead of having to clone a repo locally I don't use too much, I use the create change button as valhallasw`cloud said; which is very useful for me.
[16:35:43] [16:15:22] Scary [16:35:43] [16:15:23] And hidden [16:35:49] I can still see it [16:36:04] so please, do not disable that feature [16:36:12] It's hidden, as in, nowhere obvious [16:39:54] In any case, I do most of my changes on my local machine and upload them afterwards. But in cases like this one, instead of cloning analytics/refinery locally, etc; I just edit it via web [16:41:32] I can't say I've ever bothered trying [16:41:41] Even on my shitty internet connection, I'll just clone locally [16:43:04] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:48:51] Reedy: shitty = 20 GB/s ? ;) [16:52:20] mafk: bonded ADSL to get 5-6 meg [16:54:39] optical fiber [16:55:17] Literally have no other choice [16:55:29] Other than maybe satellite [16:58:31] Reedy what about 4g? [16:58:46] It's not cost effective for large data transfer [16:59:01] Reedy three do unlimited data and speeds can be fast [16:59:09] 3g can be fast in rural areas on 3g with three mobile [16:59:11] It's not unlimited for tethering [16:59:26] Yes it is, since everyone does it on three [16:59:43] three pay as you go, they don't allow tethering but you can tether anyway [16:59:52] i used like 70gb+ in a week [17:00:00] actually three days. [17:01:30] i get like 20mbps on 3g and like the same on 4g but up the street it goes up to 90mbps [17:05:09] https://phabricator.wikimedia.org/project/profile/2342/ <-- lol [17:06:00] lol [17:10:04] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [17:55:21] (PS1) Alex Monk: wikitech cloudadmin: remove right that no longer exists [mediawiki-config] - https://gerrit.wikimedia.org/r/323708 [18:38:58] Hi, cswiki Arbcom wants internal wiki which will be readable and writeable only by members of our ArbCom. How can I request it and do I need approval from somebody?
[18:39:41] Request goes in fab [18:39:43] *phab [18:40:01] And do I need preapproval from somewhere Reedy ? [18:40:07] I'm not sure about approval [18:40:24] Okay. But thank you. [18:40:51] ooold requests include https://phabricator.wikimedia.org/T14962 [18:41:06] https://phabricator.wikimedia.org/T62764 [18:41:27] Urbanecm: File the request, linking to discussions [18:41:34] If approval is actually needed from elsewhere, it'll be asked for [18:42:10] Which discussions? ArbCom discussed it internally through e-mail. So should we discuss it publicly at our wiki and then fill the request? [18:43:21] Reedy, ^ [18:43:32] looks like you have a committee of 4 people? [18:43:49] can probably just have one person open the ticket then the other 3 support in comments [18:43:49] Yes. Me and three others. [18:44:13] Okay, thanks Krenair. [18:51:24] PROBLEM - Check systemd state on labstore1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:51:34] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1005 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [19:09:24] RECOVERY - Check systemd state on labstore1005 is OK: OK - running: The system is fully operational [19:09:34] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1005 is OK: OK - create-dbusers is active [19:43:14] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [19:43:24] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [19:44:24] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4254356 keys, up 27 days 11 hours - replication_delay is 47 [20:11:14] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [20:56:24] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [20:57:24] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4267810 keys, up 27 days 12 hours - replication_delay is 0 [21:14:36] Operations, MediaWiki-Configuration, Performance-Team, Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2758050 (Seb35) I’m interested in this topic, particularly because I try to develop a wiki fa...
[21:32:34] (PS1) Aaron Schulz: Bump parser cache purging batch wait time [puppet] - https://gerrit.wikimedia.org/r/323764 (https://phabricator.wikimedia.org/T150124) [21:47:34] !log created wmf/1.29.0-wmf.3 branch pointing at master for mediawiki/extensions/ElectronPdfService to workaround T151725 [21:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:45] T151725: @RelaseTaggerBot not working since 18 November - https://phabricator.wikimedia.org/T151725 [21:53:34] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [21:54:24] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4277581 keys, up 27 days 13 hours - replication_delay is 0 [21:56:44] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [21:57:23] (CR) Alex Monk: [C: 1] Labs configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323698 (https://phabricator.wikimedia.org/T151570) (owner: MarcoAurelio) [21:58:15] (CR) Alex Monk: [C: 1] RESTBase configuration for fi.wikivoyage.org [puppet] - https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: MarcoAurelio) [22:01:09] (CR) Alex Monk: [C: 1] [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) (owner: Urbanecm) [22:02:56] (CR) Alex Monk: [C: 1] Initial configuration for fi.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) (owner: MarcoAurelio) [22:15:04] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [22:19:14] PROBLEM - puppet last run on elastic1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:43:04] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [22:48:14] RECOVERY - puppet last run on elastic1040 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [23:37:34] PROBLEM - puppet last run on cp3038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues