[00:10:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:22:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.331 seconds
[00:56:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:07:46] New patchset: Helder.wiki; "(bug 22911) Configure Extension:SubpageSortkey for enwikibooks and ptwikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24561
[01:10:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.860 seconds
[01:19:02] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[01:42:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 272 seconds
[01:42:17] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 224 seconds
[01:43:38] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 17 seconds
[01:46:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:46:47] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 7 seconds
[01:58:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.263 seconds
[02:08:05] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[02:32:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:39:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.806 seconds
[03:30:08] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[04:52:45] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[04:52:45] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[05:45:43] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[05:46:01] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[05:47:04] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[06:40:50] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[08:08:33] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 234 seconds
[08:38:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:42:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.436 seconds
[08:50:51] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , eswiki (26116)
[09:03:00] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[09:04:04] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[09:05:15] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , eswiki (32184)
[09:07:12] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[09:16:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:21:27] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[09:21:27] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:21:27] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[09:21:27] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[09:21:27] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[09:21:28] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[09:24:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.638 seconds
[09:28:21] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time
[09:58:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:11:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.520 seconds
[10:46:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:58:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.456 seconds
[11:20:18] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[11:33:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:34:24] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[11:35:36] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[11:47:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds
[12:02:36] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 14 seconds
[12:09:21] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[12:19:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:30:44] PROBLEM - Puppet freshness on dobson is CRITICAL: Puppet has not run in the last 10 hours
[12:34:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.035 seconds
[13:06:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:19:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.517 seconds
[13:31:41] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[13:41:53] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (23498)
[13:43:14] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (19952)
[13:54:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:57:29] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[13:58:50] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[14:08:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds
[14:41:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:52:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.764 seconds
[14:53:53] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[14:53:53] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[15:28:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:42:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[15:46:59] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[16:15:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:28:59] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (28656)
[16:29:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[16:30:20] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiki (24416)
[16:41:53] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[17:02:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:13:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.701 seconds
[17:16:01] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[17:17:13] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[17:29:29] !log powercycled cp1043
[17:29:40] Logged the message, Master
[17:30:52] RECOVERY - Host cp1043 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms
[17:34:46] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[17:35:04] PROBLEM - Varnish HTTP mobile-frontend on cp1043 is CRITICAL: Connection refused
[17:39:34] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:40:01] RECOVERY - Varnish HTTP mobile-frontend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds
[17:47:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:02:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds
[18:35:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:49:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[19:21:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:22:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:37] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:37] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[19:22:37] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:37] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[19:36:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds
[19:46:49] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (39115)
[19:48:01] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (38185)
[19:55:15] orly
[20:08:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:19:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.662 seconds
[20:55:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:09:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.111 seconds
[21:21:46] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[21:42:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:54:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds
[22:09:53] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[22:29:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:31:56] PROBLEM - Puppet freshness on dobson is CRITICAL: Puppet has not run in the last 10 hours
[22:40:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.235 seconds
[23:14:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:18:53] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:53] PROBLEM - Puppet freshness on lvs5 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:53] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:53] PROBLEM - Puppet freshness on search1015 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:53] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:54] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:54] PROBLEM - Puppet freshness on search1014 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:55] PROBLEM - Puppet freshness on search1005 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:55] PROBLEM - Puppet freshness on search1011 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:56] PROBLEM - Puppet freshness on search1017 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:56] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:57] PROBLEM - Puppet freshness on search1018 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:57] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:58] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:58] PROBLEM - Puppet freshness on sq53 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:59] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:59] PROBLEM - Puppet freshness on sq54 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:00] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:00] PROBLEM - Puppet freshness on sq77 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:01] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:01] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:02] PROBLEM - Puppet freshness on sq80 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:02] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours
[23:19:03] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours
[23:24:09] New patchset: Tim Starling; "Reduce wgMaxGeneratedPPNodeCount to 1.5M" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24764
[23:25:16] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24764
[23:26:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.070 seconds
[23:30:26] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[23:33:16] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[23:33:34] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000