[00:03:56] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 00:03:52 UTC 2013 [00:04:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:45] RECOVERY - Solr on vanadium is OK: All OK [00:14:45] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 1083.9518 (gt 1000) [00:14:55] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [00:15:55] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [00:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [00:24:56] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 00:24:53 UTC 2013 [00:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [00:28:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 00:28:52 UTC 2013 [00:29:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 00:28:57 UTC 2013 [00:29:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [00:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [00:33:25] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 00:33:17 UTC 2013 [00:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [00:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [00:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 00:54:51 UTC 2013 
[00:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [00:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 00:58:12 UTC 2013 [00:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [00:58:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 00:58:42 UTC 2013 [00:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [01:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [01:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 01:02:42 UTC 2013 [01:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [01:11:45] RECOVERY - Solr on vanadium is OK: All OK [01:14:45] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 1044.5839 (gt 1000) [01:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [01:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 01:24:50 UTC 2013 [01:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [01:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 01:27:39 UTC 2013 [01:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [01:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 01:28:46 UTC 2013 [01:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours 
[01:32:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:32:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 01:32:48 UTC 2013 [01:33:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [01:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [01:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [01:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 01:54:45 UTC 2013 [01:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [01:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 01:57:43 UTC 2013 [01:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [01:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 01:58:50 UTC 2013 [01:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [02:03:15] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 02:03:09 UTC 2013 [02:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [02:07:00] !log LocalisationUpdate completed (1.22wmf10) at Sun Jul 21 02:07:00 UTC 2013 [02:12:25] !log LocalisationUpdate completed (1.22wmf11) at Sun Jul 21 02:12:23 UTC 2013 [02:13:45] RECOVERY - Solr on vanadium is OK: All OK [02:20:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 21 02:20:44 UTC 2013 [02:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:15] RECOVERY - 
Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 02:27:47 UTC 2013 [02:27:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 02:27:53 UTC 2013 [02:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [02:29:01] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [02:29:15] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 02:29:09 UTC 2013 [02:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [02:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 02:32:38 UTC 2013 [02:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [02:38:55] PROBLEM - Puppet freshness on dobson is CRITICAL: No successful Puppet run in the last 10 hours [02:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [02:54:55] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [02:55:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 02:55:46 UTC 2013 [02:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [02:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 02:58:05 UTC 2013 [02:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [02:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 02:58:45 UTC 2013 [02:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful 
Puppet run in the last 10 hours [03:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [03:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 03:02:44 UTC 2013 [03:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [03:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [03:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 03:24:50 UTC 2013 [03:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [03:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 03:27:44 UTC 2013 [03:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [03:29:25] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 03:29:21 UTC 2013 [03:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [03:32:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 03:32:44 UTC 2013 [03:33:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [03:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [03:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status 
line output matched 400 - 336 bytes in 0.144 second response time [03:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 03:54:46 UTC 2013 [03:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [03:57:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 03:57:48 UTC 2013 [03:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [03:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 03:58:50 UTC 2013 [03:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [04:05:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 04:05:46 UTC 2013 [04:06:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:45] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.411 second response time [04:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [04:25:35] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 04:25:34 UTC 2013 [04:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [04:28:04] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 04:27:52 UTC 2013 [04:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [04:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 04:28:48 UTC 2013 [04:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [04:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 04:32:41 UTC 2013 [04:33:45] PROBLEM - 
Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [04:34:55] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [04:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [04:57:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 04:57:40 UTC 2013 [04:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 04:57:40 UTC 2013 [04:57:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [04:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [04:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 04:58:46 UTC 2013 [04:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [05:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [05:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 05:02:37 UTC 2013 [05:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [05:06:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:07:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [05:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 
second response time [05:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 05:24:51 UTC 2013 [05:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [05:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 05:27:45 UTC 2013 [05:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [05:28:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 05:28:51 UTC 2013 [05:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [05:32:45] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [05:32:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 05:32:48 UTC 2013 [05:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [05:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [05:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 05:54:47 UTC 2013 [05:55:56] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [05:57:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 05:57:45 UTC 2013 [05:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [05:58:56] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 05:58:51 UTC 2013 [05:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [06:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 
06:02:36 UTC 2013 [06:02:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [06:17:56] PROBLEM - SSH on cp1043 is CRITICAL: Server answer: [06:18:55] RECOVERY - SSH on cp1043 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [06:21:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [06:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 06:24:45 UTC 2013 [06:24:55] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [06:24:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [06:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 06:27:38 UTC 2013 [06:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [06:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 06:28:49 UTC 2013 [06:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [06:32:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 06:32:49 UTC 2013 [06:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [06:47:56] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [06:49:55] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [06:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 06:54:50 UTC 2013 [06:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [06:57:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran 
at Sun Jul 21 06:57:48 UTC 2013 [06:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [06:59:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 06:58:54 UTC 2013 [06:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [07:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 07:02:37 UTC 2013 [07:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [07:06:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:07:45] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:10:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:11:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [07:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [07:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 07:24:51 UTC 2013 [07:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [07:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 07:27:42 UTC 2013 [07:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [07:28:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 07:28:43 UTC 2013 [07:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [07:33:15] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 07:33:08 UTC 2013 [07:33:45] PROBLEM - Puppet 
freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [07:51:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:52:45] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [07:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:53:55] RECOVERY - search indices - check lucene status page on search1004 is OK: HTTP OK: HTTP/1.1 200 OK - 163 bytes in 0.002 second response time [07:54:05] RECOVERY - search indices - check lucene status page on search1005 is OK: HTTP OK: HTTP/1.1 200 OK - 163 bytes in 0.004 second response time [07:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 07:54:46 UTC 2013 [07:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 07:57:39 UTC 2013 [07:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 07:58:44 UTC 2013 [07:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [08:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 08:02:39 UTC 2013 [08:02:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:56] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [08:10:56] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [08:12:55] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [08:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 08:24:48 UTC 2013 [08:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:28:05] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 08:28:00 UTC 2013 [08:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [08:29:25] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 08:29:21 UTC 2013 [08:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:34:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 08:34:37 UTC 2013 [08:34:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:37:55] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [08:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [08:54:46] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 08:54:43 UTC 2013 [08:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:56:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:57:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second 
response time [08:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 08:58:11 UTC 2013 [08:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [08:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 08:58:48 UTC 2013 [08:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:03:55] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [09:05:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 09:05:40 UTC 2013 [09:06:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [09:07:56] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [09:08:55] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: No successful Puppet run in the last 10 hours [09:22:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [09:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 09:24:51 UTC 2013 [09:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:30:13] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:30:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 09:30:41 UTC 2013 [09:32:15] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [09:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 09:32:43 UTC 2013 [09:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on erzurumi 
is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [09:43:55] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [09:48:55] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: No successful Puppet run in the last 10 hours [09:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [09:54:55] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: No successful Puppet run in the last 10 hours [09:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 09:54:52 UTC 2013 [09:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:05] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 09:58:03 UTC 2013 [09:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [09:58:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 09:58:43 UTC 2013 [09:59:46] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [09:59:55] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: No successful Puppet run in the last 10 hours [10:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 
seconds [10:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [10:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 10:02:42 UTC 2013 [10:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [10:09:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:10:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [10:15:55] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [10:16:55] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [10:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:23:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.352 second response time [10:25:05] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 10:24:56 UTC 2013 [10:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [10:28:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 10:28:36 UTC 2013 [10:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [10:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 10:28:47 UTC 2013 [10:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [10:33:15] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 10:33:12 UTC 2013 [10:33:46] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [10:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 
seconds [10:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [10:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 10:54:48 UTC 2013 [10:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [10:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 10:57:37 UTC 2013 [10:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [10:59:15] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 10:59:08 UTC 2013 [10:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [11:02:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:03:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [11:03:25] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 11:03:16 UTC 2013 [11:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [11:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [11:24:56] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 11:24:54 UTC 2013 [11:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 11:27:46 UTC 2013 [11:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [11:29:35] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 11:29:28 UTC 2013 [11:29:46] 
PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [11:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 11:32:41 UTC 2013 [11:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [11:34:35] PROBLEM - Host mw1173 is DOWN: PING CRITICAL - Packet loss = 100% [11:38:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:40:45] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [11:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [11:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 11:54:39 UTC 2013 [11:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [11:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 11:57:43 UTC 2013 [11:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [11:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 11:58:54 UTC 2013 [11:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:02:35] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 12:02:33 UTC 2013 [12:02:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [12:17:55] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [12:18:56] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 12:24:52 UTC 2013 [12:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet 
run in the last 10 hours [12:28:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 12:28:05 UTC 2013 [12:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [12:30:35] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 12:30:32 UTC 2013 [12:30:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 12:32:43 UTC 2013 [12:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [12:36:17] incase anyone is around and wants to poke before chad gets back: (NEW) Git.wikimedia.org (web interface) be broken again, yo! - https://bugzilla.wikimedia.org/51769 normal; Wikimedia: Git/Gerrit; () [12:39:55] PROBLEM - Puppet freshness on dobson is CRITICAL: No successful Puppet run in the last 10 hours [12:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [12:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 12:54:50 UTC 2013 [12:55:55] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [12:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [12:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 12:58:10 UTC 2013 [12:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [12:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 12:58:46 UTC 2013 [12:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet 
ran at Sun Jul 21 13:02:39 UTC 2013 [13:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [13:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.154 second response time [13:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 13:24:48 UTC 2013 [13:25:56] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [13:28:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 13:28:49 UTC 2013 [13:29:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 13:28:59 UTC 2013 [13:29:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [13:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [13:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 13:32:37 UTC 2013 [13:32:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [13:35:45] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 1007.56824 (gt 1000) [13:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [13:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 13:54:44 UTC 2013 [13:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [13:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 13:57:37 UTC 2013 [13:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [13:58:55] RECOVERY - 
Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 13:58:48 UTC 2013 [13:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 14:02:37 UTC 2013 [14:02:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [14:06:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:07:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [14:14:15] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:22:15] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [14:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [14:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 14:24:47 UTC 2013 [14:24:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [14:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 14:27:44 UTC 2013 [14:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [14:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 14:28:46 UTC 2013 [14:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [14:31:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:32:55] RECOVERY - Puppet freshness on cp1043 is OK: 
puppet ran at Sun Jul 21 14:32:45 UTC 2013 [14:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [14:35:55] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [14:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 14:54:47 UTC 2013 [14:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 14:57:35 UTC 2013 [14:57:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [14:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 14:58:52 UTC 2013 [14:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [15:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 15:02:41 UTC 2013 [15:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [15:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.173 second response time [15:27:25] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 15:27:21 UTC 2013 [15:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 15:27:46 UTC 2013 [15:27:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [15:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 15:28:53 UTC 2013 [15:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [15:32:45] RECOVERY - Puppet freshness on cp1043 
is OK: puppet ran at Sun Jul 21 15:32:42 UTC 2013 [15:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [15:47:55] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [15:49:55] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:52:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [15:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 15:54:47 UTC 2013 [15:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [15:57:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 15:58:05 UTC 2013 [15:58:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [15:59:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 15:59:00 UTC 2013 [15:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [16:01:06] PROBLEM - Disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 511 MB (2% inode=85%): [16:01:45] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [16:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 16:02:40 UTC 2013 [16:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [16:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [16:24:55] RECOVERY - Puppet 
freshness on cp1042 is OK: puppet ran at Sun Jul 21 16:24:49 UTC 2013 [16:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [16:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 16:27:37 UTC 2013 [16:28:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [16:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 16:28:50 UTC 2013 [16:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [16:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 16:32:41 UTC 2013 [16:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [16:39:45] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - 3500 bytes in 0.152 second response time [16:39:55] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1095 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:55] PROBLEM - Apache HTTP on mw1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:56] PROBLEM - Apache HTTP on mw1216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:56] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:57] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM 
- Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:05] PROBLEM - Apache HTTP on mw1185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:41:15] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.837 second response time [16:41:15] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.806 second response time [16:41:15] RECOVERY - Apache HTTP on mw1210 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.895 second response time [16:41:15] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.452 second response time [16:41:15] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.022 second response time [16:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [16:54:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 16:54:46 UTC 2013 [16:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [16:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 16:57:34 UTC 2013 [16:57:45] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 
hours [16:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 16:58:51 UTC 2013 [16:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [17:02:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 17:02:45 UTC 2013 [17:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [17:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:24:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [17:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 17:24:51 UTC 2013 [17:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [17:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 17:27:40 UTC 2013 [17:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [17:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 17:28:53 UTC 2013 [17:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [17:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 17:32:43 UTC 2013 [17:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [17:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [17:55:05] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 17:55:02 UTC 2013 [17:55:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [17:57:45] RECOVERY - Puppet freshness on cp1044 is 
OK: puppet ran at Sun Jul 21 17:57:35 UTC 2013 [17:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:00:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 18:00:50 UTC 2013 [18:01:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [18:03:15] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 18:03:13 UTC 2013 [18:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [18:15:43] morebots is down [18:15:50] PANIC! [18:15:59] where is morebots run anyway? [18:16:04] Urgh. [18:16:04] wikitech-static [18:16:14] which is rackspace, iirc [18:16:28] Not linode? [18:16:32] It used to be linode, I think. [18:16:44] i thought it was linode too but ryan corrected me [18:18:00] https://bugzilla.wikimedia.org/buglist.cgi?query_format=specific&order=relevance%20desc&bug_status=__all__&content=morebots&list_id=219167 is pretty depressing. [18:18:44] i'll probably just rewrite it eventually like i did with logmsgbot [18:19:14] use an obscure language nobody has heard of, ok? [18:19:38] Written in whitespace. [18:19:48] SNOBOL [18:19:53] INTERCAL [18:20:57] https://github.com/atdt/snoflake [18:21:01] i never finished it :( [18:21:29] dat contributions graph [18:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:44] https://bugzilla.wikimedia.org/show_bug.cgi?id=51777 [18:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [18:23:42] just poke tim to restart it when he logs on later [18:24:05] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [18:24:23] Assuming it takes him five minutes, it costs what to have the bot restarted each time? 
[18:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 18:24:53 UTC 2013 [18:25:18] I guess at five minutes, it's not too substantial. [18:25:25] But it probably hurts morale. [18:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [18:26:06] i'm surprised no one else wants to do it [18:26:25] it's a python bot; if you decide to rewrite it it could be written in more or less anything [18:26:45] and it's a useful tool [18:26:45] I don't see why it needs to be rewritten. [18:26:53] It just needs to work properly. [18:27:18] Elsie: i'll be looking for your patch [18:27:49] Give me access to the box it's running on and I'll ensure it stays up. [18:28:03] I don't have access to it [18:28:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 18:28:43 UTC 2013 [18:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 18:28:54 UTC 2013 [18:28:56] I run an IRC bot or two. You have to ping the server every once in a while to make sure it's still connected and have it restart itself if it's not. [18:29:05] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [18:29:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [18:29:18] Elsie: yes, adding that to morebots would probably fix it [18:29:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:29:37] submit a patch or poke the right dev [18:29:43] Without access to the host, I can't tell if the process is dying or if it's something else. 
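The ping-and-restart idea suggested above (check that the server is still talking, and restart the bot if it is not) could be sketched roughly as below. This is only an illustration, not code from morebots or the irc library: the `IdleWatchdog` class, its method names, and the 120-second and 300-second thresholds are all assumptions made for the example.

```python
import time


class IdleWatchdog:
    """Track how long a connection has been silent and suggest an action.

    Hypothetical helper for illustration; not part of morebots or the
    irc library. Thresholds are assumed values, in seconds.
    """

    def __init__(self, ping_after=120.0, restart_after=300.0, clock=time.monotonic):
        if ping_after >= restart_after:
            raise ValueError("ping_after must be shorter than restart_after")
        self.ping_after = ping_after
        self.restart_after = restart_after
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # time of the most recent server data

    def saw_data(self):
        """Call this whenever any data arrives from the server."""
        self._last_seen = self._clock()

    def check(self):
        """Return 'ok', 'ping', or 'restart' based on the silence so far."""
        idle = self._clock() - self._last_seen
        if idle >= self.restart_after:
            return "restart"
        if idle >= self.ping_after:
            return "ping"
        return "ok"
```

A bot's main loop would call `check()` on a short interval, send a PING to the server when it returns `"ping"`, and reconnect (or exit and let a supervisor restart the process) when it returns `"restart"`. Any reply from the server, including the PONG, goes through `saw_data()` and resets the timer.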
[18:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [18:30:13] https://bugzilla.wikimedia.org/show_bug.cgi?id=50485 [18:31:26] Oh, I'm just repeating everything you already know. :-) [18:31:27] Sorry. [18:31:56] I pasted it re: can't tell if the process is dying or if it's something else. [18:32:18] Well, I still can't say for sure, but for some reason I thought that bug was older and had already been resolved. [18:32:54] this is the library that it is using: [18:32:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 18:32:53 UTC 2013 [18:32:57] https://bitbucket.org/jaraco/irc/issue/16/irc-client-ping-timeout-issue [18:33:03] https://bitbucket.org/jaraco/irc/issue/1/library-does-not-detect-that-connection-is [18:33:21] "The reporter doesn't give any indication of how this could be addressed and in fact indicates that it may not be possible to address it. Until such a time as we have a concrete suggestion, I'm marking this as won't fix." [18:33:41] Sounds like a PHP dev. [18:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [18:35:00] changed status to resolved [18:35:00] Sounds like you've found a solution. Good to hear. We'll continue to look into the ping timeout / keepalive support. [18:35:00] https://bitbucket.org/jaraco/irc/commits/88ee7096b3d7 [18:35:20] Looks like keepalive support was merged. [18:35:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:35:29] yes, I noted it in my bug above [18:35:39] I don't trust it, though. [18:35:39] So it's a matter of setting it on our side. [18:35:49] Because logmsgbot doesn't use that, either. [18:36:16] so I think it's just masking some deeper problem in the bot's event loop that the maintainer is too lazy to properly diagnose [18:36:35] Probably, but I don't anyone cares. 
[18:36:40] don't think * [18:37:01] yes, that much is pretty evident [18:37:14] Unless the bot begins restarting all the time, having it self-restart every once in a while to always be up seems like an acceptable price to me. [18:37:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [18:37:31] yeah, I don't trust the set_keepalive to do that properly, either [18:37:36] I'd implement something more crude [18:37:59] just have a 5-minute timer and reset it any time any data comes forward from the server [18:58:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 18:58:08 UTC 2013 [18:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 18:58:49 UTC 2013 [18:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [19:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 19:02:36 UTC 2013 [19:02:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [19:04:55] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: No successful Puppet run in the last 10 hours [19:09:44] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: No successful Puppet run in the last 10 hours [20:16:55] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [20:18:28] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 10 hours [20:22:04] ori-l: hm. 
I thought I had fixed that the other day [20:22:12] re: deployment [20:22:20] * Ryan_Lane checks [20:22:24] this was yesterday, IIRC [20:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:22:35] I think there's a timestamp in the log [20:22:51] maybe my changes are being removed by puppet [20:22:56] though they shouldn't be [20:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [20:24:07] yep [20:24:09] being wiped out [20:24:11] hm [20:24:20] * Ryan_Lane checks puppet [20:24:49] Ryan_Lane: it's also OK to enjoy your sunday :) [20:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 20:24:48 UTC 2013 [20:25:00] this will only take one sec :) [20:25:28] ^ usually, famous last words [20:25:42] I wasn't blocked by it; I just SCP'd stuff instead. I ended up pushing out lots of code earlier in the week and needed to fix things. [20:25:55] * Ryan_Lane nods [20:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [20:26:01] (PS1) Ryan Lane: Use grain match for deployment pillars [operations/puppet] - https://gerrit.wikimedia.org/r/75055 [20:26:07] but parsoid also deploys with it [20:27:23] stupid jenkins :D [20:27:27] takes too long [20:27:40] (CR) jenkins-bot: [V: -1] Use grain match for deployment pillars [operations/puppet] - https://gerrit.wikimedia.org/r/75055 (owner: Ryan Lane) [20:27:49] is the -1 expected? [20:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 20:27:45 UTC 2013 [20:27:56] nope [20:28:06] LOST ? [20:28:06] wtf? 
[20:28:20] haha [20:28:29] well, the change is fine, I'm going to +2/+2 [20:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [20:28:41] (CR) Ryan Lane: [C: 2 V: 2] Use grain match for deployment pillars [operations/puppet] - https://gerrit.wikimedia.org/r/75055 (owner: Ryan Lane) [20:28:45] (Merged) Ryan Lane: Use grain match for deployment pillars [operations/puppet] - https://gerrit.wikimedia.org/r/75055 (owner: Ryan Lane) [20:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 20:28:53 UTC 2013 [20:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [20:31:41] ori-l: should be fixed now [20:32:03] and the fix won't break due to puppet this time :) [20:35:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 20:35:36 UTC 2013 [20:35:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:11] Ryan_Lane: sweet, thanks! [20:39:11] yw [20:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 20:54:39 UTC 2013 [20:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [20:57:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 20:57:42 UTC 2013 [20:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [20:59:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 20:59:44 UTC 2013 [21:00:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [21:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 21:02:37 UTC 2013 [21:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [21:10:13] Ryan_Lane: just tried it; works. 
great [21:10:55] the pillars were missing because I screwed up the top file (didn't specify it was supposed to do a grain match) [21:11:28] i'll nod and pretend like i understood what you just said [21:11:52] i have a hard time keeping the mapping between salt terms and their ordinary meaning straight, probably just a consequence of not having used it very much. [21:12:29] pillars are global variables, basically [21:13:02] and you can specify that they exist on different minions by matching them (like a regex or glob match) [21:13:24] a grain is a fact, in puppet terms [21:13:38] ah, ok [21:13:58] since all targets use the deployment_target grain, I send out the pillars to deployment_target:* [21:14:19] I forgot to specify it was a grain match, so it was doing a glob match (which is against the host name) [21:15:04] yeah, i see that now, looking at the diff [21:15:12] * Ryan_Lane nods [21:15:13] btw, there's something about how the deployment puppet code is structured that could be done better, i think [21:15:24] oh, there's tons :) [21:15:32] most of the config should be optional [21:15:47] what would you change? [21:15:54] nah, something more basic than that: it contains configuration data for all the software components it manages [21:16:20] when it should be structured like a library, and provide puppet resources that the software component could parametrize in its own module to declare how it is deployed [21:17:02] * Ryan_Lane nods [21:18:14] puppet makes this rather difficult [21:18:29] like, parsoid.py is an odd fit for modules/deployment; it's parsoid-specific, not deployment-specific. 
i ran into this when trying to have the same path $variable configure the deployment location and other resources that depend on the files being where they are [21:18:54] i ultimately gave up and just hard-coded a $path var and added a comment saying it should be kept in sync with the value in the deployment manifest [21:19:04] * Ryan_Lane nods [21:19:20] if you want to restructure things, I'm cool with that [21:19:28] just wait till after my next patchset ;) [21:19:59] ok, i might. if you don't mind, add me as a reviewer so i know when it is merged [21:20:07] will do [21:20:26] added. [21:20:43] I have a half-finished patchset 2 I'll likely push in tomorrow [21:20:54] cool, thanks [21:20:59] i won't review unless you explicitly ask [21:21:03] * Ryan_Lane nods [21:21:32] i don't think our code review model works all that well for software that is still in the initial development spurt [21:22:21] like, it's versioned, it's in git, it's ok to deploy it and see how well it works rather than just delay it for a month because someone doesn't like where you put a comma [21:22:24] * YuviPanda pushes things to github all the time, and then imports into gerrit later on :) [21:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:22:40] ori-l: well, I have a labs project for testing it [21:23:05] but in general, yeah, that's kind of a pain in the ass [21:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [21:23:33] I find mediawiki's review process takes a month or so per patchset [21:23:37] which is painful [21:24:08] it makes sense for core, i think. not the delays, but having strict CR standards [21:24:17] yep [21:24:29] btw, have you seen https://github.com/aodn/vagrant-openstack ? 
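The glob-versus-grain mix-up described earlier in the log (a target of deployment_target:* being tested against the host name instead of against a grain) can be illustrated with a small simulation. This is not Salt's matcher, only a simplified model of the two match types: the functions `glob_match` and `grain_match`, the host name, and the grain value are all made up for the example.

```python
from fnmatch import fnmatch


def glob_match(target, minion_id):
    """Glob match: the target pattern is compared with the minion's host name."""
    return fnmatch(minion_id, target)


def grain_match(target, grains):
    """Grain match: the target is 'grain_name:pattern', compared with a grain value."""
    name, _, pattern = target.partition(":")
    value = grains.get(name)
    return value is not None and fnmatch(str(value), pattern)


# Hypothetical minion; the host name and grain value are assumptions.
minion_id = "mw1001.example.org"
grains = {"deployment_target": "parsoid"}
target = "deployment_target:*"

# As a glob, the pattern is tested against the host name and never matches.
print(glob_match(target, minion_id))   # False

# As a grain match, the same string selects every minion carrying the grain.
print(grain_match(target, grains))     # True
```

In this model the same target string silently matches nothing when it is treated as a glob, which is consistent with the pillars going missing until the match type was declared.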
[21:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 21:24:47 UTC 2013 [21:24:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [21:25:10] could be useful for jenkins [21:25:50] since we're already looking at using vagrant there [21:26:50] could be useful for MW development in labs as well, assuming we ever get to the point where we can expose the apis directly [21:27:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 21:27:45 UTC 2013 [21:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [21:29:00] yeah, though the puppet management interfaces on wikitech-l make it unnecessary there; it'd be better to just consolidate some puppet modules across the two repos [21:29:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 21:29:01 UTC 2013 [21:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:55] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 21:32:49 UTC 2013 [21:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [21:35:20] are we always going to have one large puppet repo? [21:35:41] doesn't that make reuse elsewhere sortof harder? [21:36:55] yeah, maybe we should revisit that [21:42:54] ori-l: yeah. 
like it's hard to make labs changes to to -labs, for example [21:52:05] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [21:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:05] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [21:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.139 second response time [21:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 21:54:42 UTC 2013 [21:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [21:58:55] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 21:58:50 UTC 2013 [21:59:05] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 21:58:55 UTC 2013 [21:59:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [21:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [22:02:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 22:02:36 UTC 2013 [22:02:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [22:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 22:24:51 UTC 2013 [22:25:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:45] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 22:27:44 UTC 2013 [22:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:29:15] RECOVERY - Puppet freshness 
on cp1041 is OK: puppet ran at Sun Jul 21 22:29:10 UTC 2013 [22:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [22:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 22:32:40 UTC 2013 [22:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [22:36:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:37:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.141 second response time [22:40:55] PROBLEM - Puppet freshness on dobson is CRITICAL: No successful Puppet run in the last 10 hours [22:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:53:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [22:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 22:54:44 UTC 2013 [22:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [22:56:55] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [22:58:05] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 22:58:04 UTC 2013 [22:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [22:58:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 22:58:50 UTC 2013 [22:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [23:03:15] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 23:03:04 UTC 2013 [23:03:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [23:09:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [23:10:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [23:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [23:24:55] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 23:24:47 UTC 2013 [23:24:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [23:28:15] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 23:28:05 UTC 2013 [23:28:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [23:28:55] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 23:28:51 UTC 2013 [23:29:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours [23:31:05] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [23:31:24] Hi TimStarling. morebots has gone missing. [23:32:20] I've restarted it [23:32:45] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Sun Jul 21 23:32:40 UTC 2013 [23:33:05] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [23:33:45] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [23:42:02] Thanks, Ryan_Lane. 
[23:43:32] yw [23:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [23:54:45] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Sun Jul 21 23:54:40 UTC 2013 [23:54:55] PROBLEM - Puppet freshness on cp1042 is CRITICAL: No successful Puppet run in the last 10 hours [23:58:05] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Sun Jul 21 23:58:03 UTC 2013 [23:58:35] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [23:58:45] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Sun Jul 21 23:58:43 UTC 2013 [23:59:45] PROBLEM - Puppet freshness on cp1041 is CRITICAL: No successful Puppet run in the last 10 hours