[00:22:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [00:52:21] !log demon Started syncing Wikimedia installation... : [00:52:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:52:26] Logged the message, Master [00:53:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [00:58:41] (CR) Dereckson: "Thank you for the feedback, I added you especially so you can comment on the previous withdrawal." [operations/puppet] - https://gerrit.wikimedia.org/r/80760 (owner: Dereckson) [01:01:12] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [01:03:27] !log demon Finished syncing Wikimedia installation... : [01:03:37] Logged the message, Master [01:03:43] (CR) Dereckson: "In my sixth paragraph, I forgot take in consideration your Legal advocacy role, even if it's not really clear if you make an intervention " [operations/puppet] - https://gerrit.wikimedia.org/r/80760 (owner: Dereckson) [01:04:22] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:48] (PS1) Demon: Enable secure login on mw.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80787 [01:21:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [01:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [02:13:20] !log LocalisationUpdate completed (1.22wmf13) at Sun Aug 25 02:13:19 UTC 2013 [02:13:25] Logged the message, Master [02:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [02:25:16] !log LocalisationUpdate completed (1.22wmf14) at Sun Aug 25 02:25:16 UTC 2013 [02:25:22] Logged the message, Master [02:32:41] PROBLEM - Puppet freshness on eeden is CRITICAL: No successful Puppet run in the last 10 hours [02:37:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 25 02:37:03 UTC 2013 [02:37:09] Logged the message, Master [03:21:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:22:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:43:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:44:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [03:45:54] (CR) Jalexander: "Thank you for the response, to be clear except for the very rare occasion when it 
is a legal issue (in which case I am generally more a m" [operations/puppet] - https://gerrit.wikimedia.org/r/80760 (owner: Dereckson) [03:52:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [04:13:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:15:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [04:18:02] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [04:26:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:27:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [04:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [05:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:28:28] PROBLEM - DPKG on search1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:36:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [05:37:39] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [05:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [06:21:45] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:22:35] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [06:26:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:28:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [06:43:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:44:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [06:52:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:53:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [07:01:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:02:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [07:20:59] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [07:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 
seconds [07:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [07:24:09] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.262 second response time [07:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [07:56:55] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:55] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:49] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:24:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [08:26:49] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [08:39:16] PROBLEM - DPKG on search1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [08:56:05] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [08:57:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:58:05] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.152 second response time [09:17:58] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: No successful Puppet run in the last 10 hours [09:18:58] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: No successful Puppet run in the last 10 hours [09:18:58] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [09:20:58] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: No successful Puppet run in the last 10 hours [09:39:13] PROBLEM - MySQL Slave Delay on db1010 is CRITICAL: CRIT replication delay 203 seconds [09:39:23] PROBLEM - MySQL Replication Heartbeat on db1010 is CRITICAL: CRIT replication delay 212 seconds [09:42:13] RECOVERY - MySQL Slave Delay on db1010 is OK: OK replication delay 0 seconds [09:42:23] RECOVERY - MySQL Replication Heartbeat on db1010 is OK: OK replication delay -0 seconds [10:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:23:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.149 second response time [10:27:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:28:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: 
Status line output matched 400 - 336 bytes in 0.124 second response time [10:40:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:41:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [10:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:53:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [11:02:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:27:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:28:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [11:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:53:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [12:13:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:14:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [12:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: 
Status line output matched 400 - 336 bytes in 0.128 second response time [12:32:56] PROBLEM - Puppet freshness on eeden is CRITICAL: No successful Puppet run in the last 10 hours [12:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [12:56:10] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:59:00] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [13:06:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:07:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [13:13:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:14:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [13:22:22] PROBLEM - Disk space on wtp1013 is CRITICAL: DISK CRITICAL - free space: / 328 MB (3% inode=77%): [13:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [13:27:22] PROBLEM - Parsoid on wtp1013 is CRITICAL: Connection refused [13:43:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [13:49:24] RECOVERY - Disk space on wtp1013 is OK: DISK OK [13:49:24] RECOVERY - Parsoid on wtp1013 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.008 second response time [13:50:24] 
PROBLEM - Disk space on wtp1023 is CRITICAL: DISK CRITICAL - free space: / 295 MB (3% inode=77%): [13:52:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [13:55:34] PROBLEM - Parsoid on wtp1023 is CRITICAL: Connection refused [13:55:34] PROBLEM - Disk space on wtp1020 is CRITICAL: DISK CRITICAL - free space: / 331 MB (3% inode=77%): [13:56:34] PROBLEM - Disk space on wtp1019 is CRITICAL: DISK CRITICAL - free space: / 33 MB (0% inode=77%): [13:58:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:59:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [13:59:34] PROBLEM - Disk space on wtp1006 is CRITICAL: DISK CRITICAL - free space: / 6 MB (0% inode=77%): [13:59:34] PROBLEM - Parsoid on wtp1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:00:24] RECOVERY - Disk space on wtp1023 is OK: DISK OK [14:00:34] PROBLEM - Parsoid on wtp1020 is CRITICAL: Connection refused [14:00:34] RECOVERY - Parsoid on wtp1023 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [14:01:24] RECOVERY - Parsoid on wtp1019 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [14:01:35] RECOVERY - Disk space on wtp1019 is OK: DISK OK [14:02:14] PROBLEM - Parsoid on wtp1006 is CRITICAL: Connection refused [14:05:04] PROBLEM - Disk space on wtp1012 is CRITICAL: DISK CRITICAL - free space: / 335 MB (3% inode=77%): [14:06:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:07:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [14:10:17] RECOVERY - MySQL Slave Delay on db1047 is OK: 
OK replication delay 0 seconds [14:10:27] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay -0 seconds [14:10:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:10:37] PROBLEM - Parsoid on wtp1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:11:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:13:40] ^ [14:13:48] Is anyone doing anything with the wtp boxes? [14:16:56] wtp1016 filled its last 4 GB or so of space in the last 15 minutes [14:17:27] PROBLEM - Disk space on wtp1016 is CRITICAL: DISK CRITICAL - free space: / 40 MB (0% inode=77%): [14:18:17] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [14:19:37] PROBLEM - Disk space on wtp1015 is CRITICAL: DISK CRITICAL - free space: / 321 MB (3% inode=77%): [14:20:07] PROBLEM - Parsoid on wtp1016 is CRITICAL: Connection refused [14:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [14:23:27] RECOVERY - Parsoid on wtp1020 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [14:23:27] RECOVERY - Disk space on wtp1006 is OK: DISK OK [14:23:37] RECOVERY - Parsoid on wtp1006 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [14:23:37] RECOVERY - Disk space on wtp1020 is OK: DISK OK [14:24:27] PROBLEM - Disk space on wtp1005 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=77%): [14:24:27] RECOVERY - Disk space on wtp1016 is OK: DISK OK [14:24:57] PROBLEM - Parsoid on wtp1015 is CRITICAL: Connection refused [14:25:07] RECOVERY - Parsoid on wtp1016 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [14:26:07] PROBLEM - Parsoid 
on wtp1005 is CRITICAL: Connection refused [14:27:07] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.006 second response time [14:27:27] RECOVERY - Disk space on wtp1005 is OK: DISK OK [14:28:07] RECOVERY - Disk space on wtp1012 is OK: DISK OK [14:36:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:37:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [14:41:40] RECOVERY - Disk space on wtp1015 is OK: DISK OK [14:41:50] RECOVERY - Parsoid on wtp1015 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.002 second response time [14:44:30] PROBLEM - Disk space on wtp1018 is CRITICAL: DISK CRITICAL - free space: / 206 MB (2% inode=77%): [14:46:00] PROBLEM - Disk space on wtp1007 is CRITICAL: DISK CRITICAL - free space: / 342 MB (3% inode=77%): [14:48:50] PROBLEM - Parsoid on wtp1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:51:10] PROBLEM - Parsoid on wtp1007 is CRITICAL: Connection refused [14:56:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [15:06:20] PROBLEM - Disk space on wtp1022 is CRITICAL: DISK CRITICAL - free space: / 132 MB (1% inode=77%): [15:09:25] PROBLEM - Parsoid on wtp1022 is CRITICAL: Connection refused [15:09:35] RECOVERY - Disk space on wtp1018 is OK: DISK OK [15:09:35] RECOVERY - Parsoid on wtp1018 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.005 second response time [15:11:55] RECOVERY - Disk space on wtp1007 is OK: DISK OK [15:12:25] RECOVERY - Parsoid on wtp1007 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.009 second response time [15:13:25] PROBLEM - Disk space on wtp1013 is CRITICAL: DISK CRITICAL - free space: / 302 MB (3% inode=77%): [15:14:05] RECOVERY - 
Puppet freshness on eeden is OK: puppet ran at Sun Aug 25 15:13:57 UTC 2013 [15:17:05] PROBLEM - Disk space on wtp1010 is CRITICAL: DISK CRITICAL - free space: / 73 MB (0% inode=77%): [15:18:25] PROBLEM - Parsoid on wtp1013 is CRITICAL: Connection refused [15:19:26] RECOVERY - Parsoid on wtp1013 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.008 second response time [15:19:26] RECOVERY - Disk space on wtp1013 is OK: DISK OK [15:19:35] PROBLEM - Parsoid on wtp1010 is CRITICAL: Connection refused [15:22:05] RECOVERY - Disk space on wtp1010 is OK: DISK OK [15:22:35] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.009 second response time [15:22:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [15:30:25] RECOVERY - Disk space on wtp1022 is OK: DISK OK [15:30:25] RECOVERY - Parsoid on wtp1022 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [15:30:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:32:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:38:04] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [16:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:23:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [16:32:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:33:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [16:34:35] Reedy: Hey there [16:34:41] 
Lemme look at what these wtp boxes are doing [16:35:46] What the ... [16:35:59] Did someone like reinstall all of these boxes? I get host key verification failures for all of them [16:37:25] NSA! [16:38:59] !log Investigating low disk space warnings on wtp1001-1024, but it looks like something weird happened to them, I get host key verification warnings or failures for all of them [16:39:04] Logged the message, Mr. Obvious [16:39:08] haha! [16:39:31] Hmph maybe that's just because *everything* is failing [16:39:34] Even fenari from fenari [16:40:16] !log Strike that, my account on fenari is just being weird apparently [16:40:21] Logged the message, Mr. Obvious [16:40:46] Hmm, so wtp1001-1004 have larger, LVMed root partitions, while the others don't [16:41:49] And something is using that space too [16:41:56] wtp1002 has 30GB in use, but Parsoid isn't using any of that space [16:44:45] !log Restarted Parsoid on wtp1012, was unresponsive [16:44:51] Logged the message, Mr. Obvious [16:45:38] RECOVERY - Parsoid on wtp1012 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [16:46:00] YuviPanda: Yeah the Mr Obvious thing is something Domas put in many many years ago [16:46:08] heh, I never noticed [16:46:17] I remember Leslie being called mistress of something [16:46:41] of the network gear, I think [16:46:48] yes, something along those lines [16:47:03] I should do something for grrrit-wm [16:48:37] Hah, no, I'm clearly not awake. That 30G is *free* space [16:49:53] !log Restarted Parsoid on a couple of boxes to clear some disk space. It appears wtp1001-1004 have been set up with larger, LVMed root partitions but I can't find any logs on when this was done, by whom, or why [16:49:58] Logged the message, Mr. 
Obvious [16:52:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:53:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [17:57:35] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: No successful Puppet run in the last 10 hours [17:57:35] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [17:57:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [18:22:02] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [18:27:02] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [18:27:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [18:31:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:33:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [18:45:55] PROBLEM - DPKG on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:56:55] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [18:58:55] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: No successful Puppet run in the last 10 hours [19:18:57] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: No successful Puppet run in the last 10 hours [19:19:57] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: No successful Puppet run in the last 10 hours [19:19:57] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [19:21:57] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: No successful Puppet run in the last 10 hours [19:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:23:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [19:44:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:46:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [19:52:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [20:00:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:01:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [20:22:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [21:22:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [21:23:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [21:58:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:59:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [22:26:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:27:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.146 second response time [23:56:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:57:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time