[00:02:17] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.993 second response time
[00:12:17] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 327395 bytes in 9.047 second response time
[00:25:27] PROBLEM - RAID on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:25:57] PROBLEM - RAID on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:17] PROBLEM - Apache HTTP on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:17] PROBLEM - SSH on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:17] PROBLEM - RAID on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:27] PROBLEM - twemproxy process on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:27] PROBLEM - SSH on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:37] PROBLEM - DPKG on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:37] PROBLEM - puppet disabled on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:37] PROBLEM - twemproxy process on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:47] PROBLEM - Apache HTTP on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:47] PROBLEM - twemproxy port on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:47] PROBLEM - Disk space on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:57] PROBLEM - DPKG on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:57] PROBLEM - twemproxy port on mw1061 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:07] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time
[00:27:07] PROBLEM - Disk space on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:07] PROBLEM - puppet disabled on mw1099 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:07] PROBLEM - RAID on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:17] RECOVERY - RAID on mw1061 is OK: OK: no RAID installed
[00:27:17] RECOVERY - SSH on mw1061 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:27:27] RECOVERY - DPKG on mw1061 is OK: All packages OK
[00:27:27] RECOVERY - puppet disabled on mw1061 is OK: OK
[00:27:27] RECOVERY - twemproxy process on mw1061 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:27:27] PROBLEM - DPKG on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:27] PROBLEM - SSH on mw1080 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:27:37] PROBLEM - Apache HTTP on mw1080 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:27:38] PROBLEM - Disk space on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:38] RECOVERY - Disk space on mw1061 is OK: DISK OK
[00:27:47] RECOVERY - twemproxy port on mw1061 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:27:47] PROBLEM - RAID on mw1070 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:57] PROBLEM - DPKG on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:57] PROBLEM - RAID on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:57] PROBLEM - RAID on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:57] PROBLEM - DPKG on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:07] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:07] PROBLEM - DPKG on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:07] PROBLEM - puppet disabled on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:07] PROBLEM - DPKG on mw1079 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:07] RECOVERY - Disk space on mw1099 is OK: DISK OK
[00:28:08] PROBLEM - twemproxy process on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:17] PROBLEM - twemproxy process on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:17] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:17] PROBLEM - Apache HTTP on mw1037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:17] PROBLEM - RAID on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:17] PROBLEM - RAID on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:18] PROBLEM - twemproxy port on mw1080 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:18] PROBLEM - RAID on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:19] PROBLEM - twemproxy port on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:19] PROBLEM - puppet disabled on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:20] PROBLEM - SSH on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:20] PROBLEM - twemproxy port on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:21] PROBLEM - puppet disabled on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:27] PROBLEM - RAID on mw1079 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:27] PROBLEM - RAID on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:27] PROBLEM - twemproxy process on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:27] PROBLEM - DPKG on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:27] PROBLEM - RAID on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:28] PROBLEM - Apache HTTP on mw1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:28] PROBLEM - SSH on mw1037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:29] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:29] PROBLEM - RAID on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:37] PROBLEM - twemproxy process on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:37] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:37] PROBLEM - Apache HTTP on mw1072 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:37] PROBLEM - SSH on mw1072 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:37] PROBLEM - SSH on mw1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:38] PROBLEM - Disk space on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:38] PROBLEM - Apache HTTP on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:39] PROBLEM - RAID on mw1029 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:39] PROBLEM - DPKG on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:40] PROBLEM - puppet disabled on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:40] PROBLEM - RAID on mw1072 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:41] PROBLEM - RAID on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:41] PROBLEM - DPKG on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:42] RECOVERY - RAID on mw1070 is OK: OK: no RAID installed
[00:28:42] PROBLEM - RAID on mw1113 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:47] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:47] PROBLEM - Apache HTTP on mw1029 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:47] PROBLEM - SSH on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:47] PROBLEM - Disk space on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:47] PROBLEM - Disk space on mw1097 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:48] PROBLEM - SSH on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:48] PROBLEM - twemproxy port on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:49] PROBLEM - RAID on mw1081 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:49] PROBLEM - RAID on mw1058 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:50] PROBLEM - RAID on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:50] RECOVERY - DPKG on mw1067 is OK: All packages OK
[00:28:51] PROBLEM - SSH on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:57] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:57] PROBLEM - DPKG on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:57] RECOVERY - DPKG on mw1108 is OK: All packages OK
[00:28:57] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:57] PROBLEM - RAID on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:07] RECOVERY - DPKG on mw1079 is OK: All packages OK
[00:29:07] PROBLEM - Apache HTTP on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:07] PROBLEM - SSH on mw1045 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:07] PROBLEM - twemproxy port on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:07] PROBLEM - twemproxy process on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:08] PROBLEM - twemproxy process on mw1097 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:08] PROBLEM - twemproxy port on mw1081 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:09] RECOVERY - twemproxy port on mw1067 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:29:17] PROBLEM - Apache HTTP on mw1105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:17] PROBLEM - twemproxy port on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:17] RECOVERY - RAID on mw1041 is OK: OK: no RAID installed
[00:29:17] RECOVERY - puppet disabled on mw1108 is OK: OK
[00:29:17] PROBLEM - RAID on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:18] RECOVERY - twemproxy port on mw1108 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:29:18] PROBLEM - puppet disabled on mw1081 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:19] PROBLEM - RAID on mw1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:19] PROBLEM - Apache HTTP on mw1093 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:20] PROBLEM - Disk space on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:20] RECOVERY - SSH on mw1108 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:29:21] PROBLEM - SSH on mw1036 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:21] PROBLEM - DPKG on mw1036 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:22] PROBLEM - Apache HTTP on mw1063 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:22] PROBLEM - Apache HTTP on mw1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:27] PROBLEM - DPKG on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:27] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:27] PROBLEM - Apache HTTP on mw1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:27] RECOVERY - SSH on mw1072 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:29:27] RECOVERY - twemproxy process on mw1072 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:29:28] RECOVERY - DPKG on mw1072 is OK: All packages OK
[00:29:28] PROBLEM - DPKG on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:29] PROBLEM - Apache HTTP on mw1112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:29] PROBLEM - RAID on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:30] PROBLEM - twemproxy process on mw1084 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:30] PROBLEM - Disk space on mw1101 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:31] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:31] PROBLEM - DPKG on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:32] PROBLEM - Apache HTTP on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:32] PROBLEM - DPKG on mw1081 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:33] PROBLEM - RAID on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:33] PROBLEM - puppet disabled on mw1101 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:34] PROBLEM - SSH on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:34] PROBLEM - DPKG on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:35] PROBLEM - DPKG on mw1063 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:35] RECOVERY - RAID on mw1072 is OK: OK: no RAID installed
[00:29:36] PROBLEM - puppet disabled on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:36] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:37] RECOVERY - RAID on mw1029 is OK: OK: no RAID installed
[00:29:37] PROBLEM - RAID on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:38] PROBLEM - SSH on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:38] PROBLEM - Apache HTTP on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:39] PROBLEM - SSH on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:39] PROBLEM - twemproxy process on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:40] PROBLEM - twemproxy process on mw1101 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:40] PROBLEM - Apache HTTP on mw1045 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:41] PROBLEM - DPKG on mw1084 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:41] PROBLEM - RAID on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:42] RECOVERY - Disk space on mw1056 is OK: DISK OK
[00:29:42] PROBLEM - twemproxy port on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:43] RECOVERY - twemproxy port on mw1099 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:29:43] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:44] PROBLEM - twemproxy port on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:44] RECOVERY - Disk space on mw1097 is OK: DISK OK
[00:29:47] PROBLEM - Apache HTTP on mw1097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:47] PROBLEM - RAID on mw1036 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:47] PROBLEM - twemproxy process on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:47] RECOVERY - DPKG on mw1099 is OK: All packages OK
[00:29:47] PROBLEM - SSH on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:48] PROBLEM - Disk space on mw1037 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:48] PROBLEM - puppet disabled on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:49] PROBLEM - RAID on mw1084 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:49] RECOVERY - RAID on mw1099 is OK: OK: no RAID installed
[00:29:50] PROBLEM - puppet disabled on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:57] RECOVERY - RAID on mw1020 is OK: OK: no RAID installed
[00:29:57] PROBLEM - Apache HTTP on mw1036 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:57] PROBLEM - Disk space on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:57] RECOVERY - puppet disabled on mw1099 is OK: OK
[00:29:57] PROBLEM - RAID on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:58] PROBLEM - RAID on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:58] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:29:59] PROBLEM - RAID on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:29:59] PROBLEM - DPKG on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:00] RECOVERY - twemproxy process on mw1097 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:00] PROBLEM - SSH on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:01] PROBLEM - RAID on mw1101 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:07] PROBLEM - Apache HTTP on mw1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:07] PROBLEM - puppet disabled on mw1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:07] PROBLEM - Disk space on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:07] PROBLEM - puppet disabled on mw1084 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:07] PROBLEM - SSH on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:08] PROBLEM - RAID on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:08] PROBLEM - Apache HTTP on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:09] PROBLEM - RAID on mw1093 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:09] PROBLEM - DPKG on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:10] RECOVERY - SSH on mw1099 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:10] PROBLEM - DPKG on mw1105 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:11] PROBLEM - RAID on mw1066 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:11] RECOVERY - puppet disabled on mw1020 is OK: OK
[00:30:17] RECOVERY - twemproxy port on mw1035 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:17] PROBLEM - DPKG on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:17] PROBLEM - Apache HTTP on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:17] PROBLEM - RAID on mw1071 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:17] RECOVERY - twemproxy process on mw1099 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:18] RECOVERY - twemproxy process on mw1084 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:18] PROBLEM - RAID on mw1040 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:19] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:19] PROBLEM - twemproxy process on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:20] PROBLEM - twemproxy port on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:20] RECOVERY - twemproxy process on mw1020 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:21] PROBLEM - Apache HTTP on mw1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:21] PROBLEM - Disk space on mw1052 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:22] PROBLEM - DPKG on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:22] RECOVERY - RAID on mw1079 is OK: OK: no RAID installed
[00:30:23] RECOVERY - Disk space on mw1101 is OK: DISK OK
[00:30:23] PROBLEM - RAID on mw1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:24] PROBLEM - RAID on mw1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:24] RECOVERY - puppet disabled on mw1101 is OK: OK
[00:30:27] PROBLEM - twemproxy port on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:27] PROBLEM - DPKG on mw1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:27] RECOVERY - SSH on mw1081 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:27] RECOVERY - DPKG on mw1063 is OK: All packages OK
[00:30:27] RECOVERY - DPKG on mw1104 is OK: All packages OK
[00:30:28] RECOVERY - Disk space on mw1020 is OK: DISK OK
[00:30:28] PROBLEM - RAID on mw1098 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:29] PROBLEM - DPKG on mw1066 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:29] PROBLEM - Apache HTTP on mw1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:30] PROBLEM - SSH on mw1093 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:30] RECOVERY - SSH on mw1104 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:31] RECOVERY - DPKG on mw1081 is OK: All packages OK
[00:30:31] PROBLEM - SSH on mw1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:32] PROBLEM - DPKG on mw1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:32] PROBLEM - puppet disabled on mw1093 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:33] RECOVERY - twemproxy port on mw1026 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:37] RECOVERY - DPKG on mw1065 is OK: All packages OK
[00:30:37] RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.074 second response time
[00:30:37] PROBLEM - DPKG on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:37] PROBLEM - Apache HTTP on mw1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:37] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:38] PROBLEM - twemproxy process on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:38] PROBLEM - Apache HTTP on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:39] PROBLEM - Apache HTTP on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:39] PROBLEM - twemproxy port on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:40] PROBLEM - puppet disabled on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:40] PROBLEM - puppet disabled on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:41] PROBLEM - Disk space on mw1096 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:41] RECOVERY - RAID on mw1036 is OK: OK: no RAID installed
[00:30:42] RECOVERY - SSH on mw1056 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:42] RECOVERY - RAID on mw1081 is OK: OK: no RAID installed
[00:30:43] PROBLEM - SSH on mw1086 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:43] PROBLEM - twemproxy port on mw1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:44] RECOVERY - RAID on mw1056 is OK: OK: no RAID installed
[00:30:44] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:47] PROBLEM - Apache HTTP on mw1086 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:47] RECOVERY - twemproxy port on mw1020 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:47] RECOVERY - RAID on mw1104 is OK: OK: no RAID installed
[00:30:47] PROBLEM - Apache HTTP on mw1076 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:47] PROBLEM - Apache HTTP on mw1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:48] PROBLEM - Apache HTTP on mw1046 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:48] PROBLEM - DPKG on mw1029 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:49] RECOVERY - SSH on mw1055 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:57] PROBLEM - Apache HTTP on mw1094 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:57] PROBLEM - puppet disabled on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:57] RECOVERY - puppet disabled on mw1055 is OK: OK
[00:30:57] PROBLEM - Disk space on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:57] PROBLEM - RAID on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:30:58] RECOVERY - SSH on mw1084 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:30:58] RECOVERY - puppet disabled on mw1084 is OK: OK
[00:30:59] RECOVERY - twemproxy port on mw1081 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:30:59] PROBLEM - twemproxy port on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:00] PROBLEM - RAID on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:00] RECOVERY - RAID on mw1093 is OK: OK: no RAID installed
[00:31:01] PROBLEM - DPKG on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:01] RECOVERY - twemproxy process on mw1035 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:02] PROBLEM - RAID on mw1046 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:02] RECOVERY - DPKG on mw1020 is OK: All packages OK
[00:31:03] PROBLEM - puppet disabled on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:03] PROBLEM - twemproxy process on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:07] PROBLEM - twemproxy port on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:07] RECOVERY - DPKG on mw1105 is OK: All packages OK
[00:31:07] PROBLEM - SSH on mw1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:07] PROBLEM - puppet disabled on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:07] PROBLEM - puppet disabled on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:08] PROBLEM - SSH on mw1062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:08] PROBLEM - Apache HTTP on mw1074 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:09] PROBLEM - Apache HTTP on mw1062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:09] PROBLEM - puppet disabled on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:10] PROBLEM - RAID on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:10] PROBLEM - twemproxy process on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:11] PROBLEM - twemproxy process on mw1086 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:11] PROBLEM - Apache HTTP on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:12] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:12] RECOVERY - puppet disabled on mw1081 is OK: OK
[00:31:17] PROBLEM - twemproxy port on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:17] RECOVERY - Disk space on mw1052 is OK: DISK OK
[00:31:17] RECOVERY - twemproxy process on mw1096 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:17] RECOVERY - twemproxy port on mw1096 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:17] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.088 second response time
[00:31:18] PROBLEM - Disk space on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:18] RECOVERY - DPKG on mw1052 is OK: All packages OK
[00:31:19] PROBLEM - Disk space on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:19] RECOVERY - DPKG on mw1066 is OK: All packages OK
[00:31:20] PROBLEM - RAID on mw1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:20] RECOVERY - SSH on mw1044 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:31:21] RECOVERY - DPKG on mw1044 is OK: All packages OK
[00:31:21] PROBLEM - puppet disabled on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:27] PROBLEM - DPKG on mw1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:27] PROBLEM - Apache HTTP on mw1071 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:27] RECOVERY - puppet disabled on mw1093 is OK: OK
[00:31:27] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.123 second response time
[00:31:27] RECOVERY - SSH on mw1093 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:31:28] RECOVERY - twemproxy process on mw1101 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:28] PROBLEM - RAID on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:29] RECOVERY - twemproxy process on mw1022 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:29] RECOVERY - puppet disabled on mw1022 is OK: OK
[00:31:30] RECOVERY - twemproxy port on mw1022 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:30] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:31] PROBLEM - twemproxy port on mw1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:31] RECOVERY - puppet disabled on mw1052 is OK: OK
[00:31:32] PROBLEM - twemproxy process on mw1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:32] PROBLEM - RAID on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:33] RECOVERY - DPKG on mw1084 is OK: All packages OK
[00:31:37] RECOVERY - Disk space on mw1096 is OK: DISK OK
[00:31:37] RECOVERY - puppet disabled on mw1096 is OK: OK
[00:31:37] PROBLEM - RAID on mw1069 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:37] PROBLEM - RAID on mw1097 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:37] RECOVERY - twemproxy process on mw1052 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[00:31:38] PROBLEM - RAID on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:38] RECOVERY - SSH on mw1096 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:31:39] RECOVERY - Apache HTTP on mw1046 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.081 second response time
[00:31:39] RECOVERY - DPKG on mw1029 is OK: All packages OK
[00:31:40] RECOVERY - RAID on mw1084 is OK: OK: no RAID installed
[00:31:40] RECOVERY - SSH on mw1020 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:31:47] RECOVERY - puppet disabled on mw1026 is OK: OK
[00:31:47] PROBLEM - SSH on mw1112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:47] RECOVERY - SSH on mw1026 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0)
[00:31:47] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:47] PROBLEM - twemproxy process on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:48] RECOVERY - RAID on mw1046 is OK: OK: no RAID installed
[00:31:52] What is this I don't even
[00:31:57] PROBLEM - DPKG on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:57] RECOVERY - Disk space on mw1022 is OK: DISK OK
[00:31:57] PROBLEM - DPKG on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:31:57] PROBLEM - DPKG on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:07] PROBLEM - puppet disabled on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:07] PROBLEM - Disk space on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:07] PROBLEM - RAID on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:07] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:32:07] PROBLEM - DPKG on mw1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:08] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:32:08] PROBLEM - SSH on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:09] PROBLEM - DPKG on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:09] PROBLEM - twemproxy process on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:10] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:10] PROBLEM - puppet disabled on mw1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:11] PROBLEM - RAID on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:11] PROBLEM - twemproxy port on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:12] RECOVERY - RAID on mw1040 is OK: OK: no RAID installed [00:32:12] RECOVERY - RAID on mw1039 is OK: OK: no RAID installed [00:32:13] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.077 second response time [00:32:13] ... [00:32:17] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 2.397 second response time [00:32:17] PROBLEM - twemproxy port on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:17] RECOVERY - Disk space on mw1019 is OK: DISK OK [00:32:17] PROBLEM - Apache HTTP on mw1095 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:17] PROBLEM - puppet disabled on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:18] PROBLEM - RAID on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:18] PROBLEM - RAID on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:19] PROBLEM - twemproxy port on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:19] PROBLEM - DPKG on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:32:20] RECOVERY - RAID on mw1021 is OK: OK: no RAID installed [00:32:20] RECOVERY - twemproxy port on mw1021 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:32:21] RECOVERY - twemproxy process on mw1021 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:32:21] PROBLEM - Disk space on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:22] PROBLEM - puppet disabled on mw1095 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:22] RECOVERY - DPKG on mw1026 is OK: All packages OK [00:32:25] greg-g: is everything okay? [00:32:27] PROBLEM - twemproxy port on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:27] PROBLEM - DPKG on mw1095 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:27] PROBLEM - puppet disabled on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:27] PROBLEM - puppet disabled on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:27] PROBLEM - RAID on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:28] PROBLEM - SSH on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:28] PROBLEM - Disk space on mw1112 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:29] PROBLEM - Apache HTTP on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:29] PROBLEM - twemproxy process on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:37] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.082 second response time [00:32:37] PROBLEM - SSH on mw1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:37] PROBLEM - twemproxy port on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:37] PROBLEM - Disk space on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:32:37] PROBLEM - RAID on mw1095 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:38] PROBLEM - puppet disabled on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:38] PROBLEM - SSH on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:39] PROBLEM - twemproxy process on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:46] Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes. [00:32:47] PROBLEM - SSH on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:47] PROBLEM - twemproxy process on mw1067 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:47] PROBLEM - Disk space on mw1024 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:47] RECOVERY - DPKG on mw1022 is OK: All packages OK [00:32:47] PROBLEM - SSH on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:48] PROBLEM - Disk space on mw1049 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:48] RECOVERY - Disk space on mw1035 is OK: DISK OK [00:32:49] PROBLEM - twemproxy port on mw1113 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:49] RECOVERY - DPKG on mw1035 is OK: All packages OK [00:32:57] PROBLEM - DPKG on mw1098 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:57] PROBLEM - twemproxy port on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:32:57] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:57] RECOVERY - puppet disabled on mw1112 is OK: OK [00:32:57] PROBLEM - SSH on mw1074 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:58] PROBLEM - DPKG on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:32:58] RECOVERY - puppet disabled on mw1021 is OK: OK [00:32:59] RECOVERY - twemproxy port on mw1112 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:32:59] RECOVERY - DPKG on mw1021 is OK: All packages OK [00:33:00] RECOVERY - DPKG on mw1053 is OK: All packages OK [00:33:00] RECOVERY - twemproxy port on mw1045 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:01] PROBLEM - RAID on mw1020 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:01] PROBLEM - Disk space on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:02] PROBLEM - DPKG on mw1108 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:07] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:33:07] PROBLEM - puppet disabled on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:07] PROBLEM - SSH on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:33:07] PROBLEM - twemproxy process on mw1113 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:07] PROBLEM - twemproxy port on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:08] PROBLEM - puppet disabled on mw1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:08] PROBLEM - RAID on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:08] o_O [00:33:09] RECOVERY - RAID on mw1035 is OK: OK: no RAID installed [00:33:09] RECOVERY - twemproxy process on mw1080 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:10] RECOVERY - twemproxy port on mw1080 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:15] getting 503s on enwiki [00:33:17] PROBLEM - twemproxy process on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:33:17] RECOVERY - puppet disabled on mw1095 is OK: OK [00:33:17] RECOVERY - DPKG on mw1080 is OK: All packages OK [00:33:17] RECOVERY - SSH on mw1052 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:33:17] RECOVERY - RAID on mw1052 is OK: OK: no RAID installed [00:33:18] RECOVERY - DPKG on mw1112 is OK: All packages OK [00:33:18] RECOVERY - SSH on mw1080 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:33:19] PROBLEM - Disk space on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:19] RECOVERY - twemproxy port on mw1053 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:20] awjr: yep [00:33:27] RECOVERY - RAID on mw1098 is OK: OK: no RAID installed [00:33:27] PROBLEM - twemproxy port on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:27] RECOVERY - Disk space on mw1080 is OK: DISK OK [00:33:27] RECOVERY - RAID on mw1097 is OK: OK: no RAID installed [00:33:27] RECOVERY - Apache HTTP on mw1080 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.067 second response time [00:33:28] RECOVERY - Apache HTTP on mw1072 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.096 second response time [00:33:28] RECOVERY - DPKG on mw1045 is OK: All packages OK [00:33:29] PROBLEM - puppet disabled on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:29] RECOVERY - RAID on mw1069 is OK: OK: no RAID installed [00:33:29] "Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes." [00:33:30] RECOVERY - RAID on mw1105 is OK: OK: no RAID installed [00:33:30] PROBLEM - twemproxy process on mw1031 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:33:31] PROBLEM - SSH on mw1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:33:31] PROBLEM - twemproxy port on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:32] PROBLEM - DPKG on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:32] PROBLEM - SSH on mw1076 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:33:33] PROBLEM - twemproxy port on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:37] RECOVERY - SSH on mw1019 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:33:37] RECOVERY - puppet disabled on mw1037 is OK: OK [00:33:37] RECOVERY - twemproxy port on mw1037 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:37] PROBLEM - twemproxy process on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:37] PROBLEM - DPKG on mw1076 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:38] PROBLEM - twemproxy process on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:33:38] RECOVERY - Disk space on mw1037 is OK: DISK OK [00:33:39] PROBLEM - DPKG on mw1062 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:33:39] RECOVERY - twemproxy port on mw1113 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:40] RECOVERY - RAID on mw1113 is OK: OK: no RAID installed [00:33:40] RECOVERY - SSH on mw1112 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:33:41] RECOVERY - twemproxy process on mw1112 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:41] RECOVERY - twemproxy process on mw1045 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:41] Reedy [00:33:47] RECOVERY - puppet disabled on mw1045 is OK: OK [00:33:47] RECOVERY - twemproxy port on mw1054 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:47] RECOVERY - puppet disabled on mw1019 is OK: OK [00:33:47] RECOVERY - DPKG on mw1098 is OK: All packages OK [00:33:47] RECOVERY - DPKG on mw1108 is OK: All packages OK [00:33:57] RECOVERY - Disk space on mw1054 is OK: DISK OK [00:33:57] PROBLEM - RAID on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:33:57] RECOVERY - twemproxy process on mw1019 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:57] RECOVERY - twemproxy port on mw1019 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:33:57] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.074 second response time [00:33:58] RECOVERY - SSH on mw1113 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:33:58] RECOVERY - puppet disabled on mw1080 is OK: OK [00:33:59] RECOVERY - RAID on mw1065 is OK: OK: no RAID installed [00:33:59] RECOVERY - twemproxy process on mw1113 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:00] RECOVERY - puppet disabled on mw1082 is OK: OK [00:34:00] RECOVERY - RAID on mw1096 is OK: OK: no RAID installed [00:34:01] RECOVERY - DPKG on mw1096 is OK: All packages OK [00:34:01] RECOVERY - twemproxy process on mw1037 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:02] RECOVERY - RAID on mw1080 is OK: OK: no RAID installed [00:34:02] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.975 second response time [00:34:03] RECOVERY - twemproxy process on mw1054 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:03] RECOVERY - puppet disabled on mw1054 is OK: OK [00:34:04] PROBLEM - SSH on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:04] RECOVERY - SSH on mw1054 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:07] PROBLEM - DPKG on mw1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:07] RECOVERY - RAID on mw1071 is OK: OK: no RAID installed [00:34:07] PROBLEM - puppet disabled on mw1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:07] PROBLEM - RAID on mw1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:34:07] RECOVERY - RAID on mw1022 is OK: OK: no RAID installed [00:34:08] RECOVERY - RAID on mw1112 is OK: OK: no RAID installed [00:34:09] !sal [00:34:13] http://bit.ly/wikisal [00:34:17] RECOVERY - SSH on mw1036 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:17] PROBLEM - DPKG on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:17] RECOVERY - Apache HTTP on mw1071 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time [00:34:17] RECOVERY - Apache HTTP on mw1093 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 6.026 second response time [00:34:17] RECOVERY - Disk space on mw1045 is OK: DISK OK [00:34:18] RECOVERY - DPKG on mw1019 is OK: All packages OK [00:34:18] RECOVERY - Disk space on mw1067 is OK: DISK OK [00:34:19] RECOVERY - puppet disabled on mw1109 is OK: OK [00:34:19] RECOVERY - twemproxy port on mw1067 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:20] PROBLEM - puppet disabled on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:20] PROBLEM - SSH on mw1094 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:21] RECOVERY - puppet disabled on mw1062 is OK: OK [00:34:21] RECOVERY - SSH on mw1037 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:22] RECOVERY - Disk space on mw1112 is OK: DISK OK [00:34:22] PROBLEM - twemproxy process on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:23] PROBLEM - Disk space on mw1074 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:23] PROBLEM - Disk space on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:34:24] RECOVERY - RAID on mw1108 is OK: OK: no RAID installed [00:34:24] RECOVERY - DPKG on mw1095 is OK: All packages OK [00:34:25] 18:37 ori: added jgonera to wmf-deployment; was already a deployer but not in relevant gerrit group [00:34:27] RECOVERY - twemproxy port on mw1082 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:27] RECOVERY - puppet disabled on mw1035 is OK: OK [00:34:27] RECOVERY - SSH on mw1035 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:27] RECOVERY - twemproxy process on mw1062 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:27] RECOVERY - twemproxy process on mw1082 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:28] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 6.830 second response time [00:34:28] RECOVERY - DPKG on mw1037 is OK: All packages OK [00:34:29] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 7.279 second response time [00:34:29] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 3.285 second response time [00:34:30] RECOVERY - RAID on mw1019 is OK: OK: no RAID installed [00:34:30] RECOVERY - SSH on mw1067 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:31] RECOVERY - puppet disabled on mw1067 is OK: OK [00:34:31] RECOVERY - DPKG on mw1062 is OK: All packages OK [00:34:32] RECOVERY - twemproxy port on mw1109 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:32] RECOVERY - Disk space on mw1109 is OK: DISK OK [00:34:32] hm, nothing relevant there [00:34:33] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.077 second response time [00:34:33] RECOVERY - RAID on mw1095 is OK: OK: no RAID installed [00:34:34] RECOVERY - RAID on mw1067 is OK: OK: no RAID installed
[00:34:37] RECOVERY - twemproxy process on mw1067 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:34:37] PROBLEM - RAID on mw1029 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:37] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [00:34:37] RECOVERY - RAID on mw1058 is OK: OK: no RAID installed [00:34:47] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.059 second response time [00:34:47] RECOVERY - SSH on mw1074 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:34:47] RECOVERY - Apache HTTP on mw1066 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 7.628 second response time [00:34:47] RECOVERY - DPKG on mw1067 is OK: All packages OK [00:34:47] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 1.868 second response time [00:34:48] PROBLEM - RAID on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:34:48] PROBLEM - RAID on mw1084 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:34:57] RECOVERY - DPKG on mw1042 is OK: All packages OK [00:34:57] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.084 second response time [00:34:57] RECOVERY - RAID on mw1020 is OK: OK: no RAID installed [00:34:57] RECOVERY - puppet disabled on mw1076 is OK: OK [00:34:57] RECOVERY - RAID on mw1042 is OK: OK: no RAID installed [00:34:58] RECOVERY - twemproxy port on mw1076 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:07] RECOVERY - twemproxy process on mw1076 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:07] RECOVERY - SSH on mw1062 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:35:07] RECOVERY - twemproxy port on mw1062 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:07] RECOVERY - Disk space on mw1062 is OK: DISK OK [00:35:07] RECOVERY - RAID on mw1041 is OK: OK: no RAID installed [00:35:08] RECOVERY - Disk space on mw1074 is OK: DISK OK [00:35:08] RECOVERY - puppet disabled on mw1074 is OK: OK [00:35:09] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.103 second response time [00:35:09] RECOVERY - Disk space on mw1076 is OK: DISK OK [00:35:10] Is anyone here? [00:35:12] ??? [00:35:17] RECOVERY - DPKG on mw1036 is OK: All packages OK [00:35:17] RECOVERY - RAID on mw1037 is OK: OK: no RAID installed [00:35:17] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 9.362 second response time [00:35:17] PROBLEM - DPKG on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:35:17] RECOVERY - SSH on mw1076 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:35:18] RECOVERY - SSH on mw1018 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:35:18] RECOVERY - twemproxy port on mw1074 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:19] RECOVERY - DPKG on mw1074 is OK: All packages OK [00:35:19] PROBLEM - SSH on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:35:20] RECOVERY - DPKG on mw1018 is OK: All packages OK [00:35:21] on Friday night [00:35:27] RECOVERY - DPKG on mw1076 is OK: All packages OK [00:35:27] RECOVERY - Apache HTTP on mw1096 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.075 second response time [00:35:27] PROBLEM - puppet disabled on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:27] PROBLEM - DPKG on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:27] PROBLEM - SSH on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:35:28] (in San Francisco) [00:35:37] RECOVERY - twemproxy process on mw1109 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:37] RECOVERY - Disk space on mw1024 is OK: DISK OK [00:35:37] PROBLEM - Apache HTTP on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:35:37] RECOVERY - SSH on mw1024 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:35:37] PROBLEM - RAID on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:38] PROBLEM - twemproxy port on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:38] PROBLEM - twemproxy port on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:35:39] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 7.898 second response time [00:35:39] RECOVERY - Disk space on mw1049 is OK: DISK OK [00:35:40] RECOVERY - RAID on mw1084 is OK: OK: no RAID installed [00:35:40] PROBLEM - twemproxy process on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:41] RECOVERY - SSH on mw1109 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:35:42] Looks like things are recovering... [00:35:45] sort of [00:35:47] RECOVERY - Apache HTTP on mw1076 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 4.970 second response time [00:35:47] PROBLEM - Disk space on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:47] PROBLEM - puppet disabled on mw1026 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:47] RECOVERY - RAID on mw1074 is OK: OK: no RAID installed [00:35:47] PROBLEM - SSH on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:35:48] RECOVERY - RAID on mw1062 is OK: OK: no RAID installed [00:35:48] RECOVERY - RAID on mw1076 is OK: OK: no RAID installed [00:35:49] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 1.064 second response time [00:35:49] PROBLEM - puppet disabled on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:35:51] Reedy: what's going on? [00:35:57] PROBLEM - DPKG on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:35:57] RECOVERY - RAID on mw1024 is OK: OK: no RAID installed [00:35:57] RECOVERY - Apache HTTP on mw1074 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.106 second response time [00:35:57] RECOVERY - DPKG on mw1054 is OK: All packages OK [00:35:57] RECOVERY - puppet disabled on mw1094 is OK: OK [00:35:58] RECOVERY - RAID on mw1101 is OK: OK: no RAID installed [00:35:58] RECOVERY - twemproxy process on mw1024 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:35:59] RECOVERY - Apache HTTP on mw1062 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.751 second response time [00:35:59] RECOVERY - puppet disabled on mw1024 is OK: OK [00:36:00] RECOVERY - RAID on mw1066 is OK: OK: no RAID installed [00:36:00] PROBLEM - twemproxy process on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:01] RECOVERY - puppet disabled on mw1018 is OK: OK [00:36:01] RECOVERY - RAID on mw1094 is OK: OK: no RAID installed [00:36:02] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 8.344 second response time [00:36:07] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.072 second response time [00:36:07] PROBLEM - twemproxy process on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:07] PROBLEM - SSH on mw1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:07] RECOVERY - DPKG on mw1024 is OK: All packages OK [00:36:07] PROBLEM - Apache HTTP on mw1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:08] PROBLEM - DPKG on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:08] PROBLEM - twemproxy process on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:36:09] RECOVERY - twemproxy port on mw1024 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:09] RECOVERY - Apache HTTP on mw1037 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.063 second response time [00:36:10] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.130 second response time [00:36:10] RECOVERY - SSH on mw1094 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:36:11] RECOVERY - twemproxy port on mw1049 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:11] RECOVERY - puppet disabled on mw1031 is OK: OK [00:36:12] RECOVERY - Disk space on mw1094 is OK: DISK OK [00:36:12] RECOVERY - twemproxy process on mw1094 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:17] RECOVERY - twemproxy port on mw1031 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:17] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time [00:36:17] RECOVERY - SSH on mw1077 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:36:17] PROBLEM - RAID on mw1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:17] PROBLEM - RAID on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:18] RECOVERY - RAID on mw1054 is OK: OK: no RAID installed [00:36:18] PROBLEM - Disk space on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:19] RECOVERY - twemproxy process on mw1031 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:19] PROBLEM - DPKG on mw1056 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:20] PROBLEM - puppet disabled on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:36:20] RECOVERY - DPKG on mw1026 is OK: All packages OK [00:36:21] RECOVERY - puppet disabled on mw1049 is OK: OK [00:36:21] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.089 second response time [00:36:22] RECOVERY - RAID on mw1031 is OK: OK: no RAID installed [00:36:22] PROBLEM - Disk space on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:23] RECOVERY - twemproxy process on mw1049 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:27] PROBLEM - twemproxy port on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:27] RECOVERY - twemproxy port on mw1094 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:27] PROBLEM - puppet disabled on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:27] PROBLEM - DPKG on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:27] RECOVERY - twemproxy port on mw1026 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:28] RECOVERY - DPKG on mw1094 is OK: All packages OK [00:36:28] RECOVERY - twemproxy process on mw1026 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:29] RECOVERY - RAID on mw1026 is OK: OK: no RAID installed [00:36:37] RECOVERY - SSH on mw1049 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:36:37] PROBLEM - SSH on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:36:37] RECOVERY - puppet disabled on mw1026 is OK: OK [00:36:37] PROBLEM - DPKG on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:42] someone tripped over the network cable? 
[00:36:45] Almost looks networky [00:36:47] RECOVERY - SSH on mw1056 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:36:47] RECOVERY - Apache HTTP on mw1036 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 3.747 second response time [00:36:47] RECOVERY - DPKG on mw1031 is OK: All packages OK [00:36:47] RECOVERY - DPKG on mw1077 is OK: All packages OK [00:36:57] RECOVERY - Disk space on mw1031 is OK: DISK OK [00:36:57] PROBLEM - twemproxy process on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:57] RECOVERY - SSH on mw1031 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:36:57] RECOVERY - DPKG on mw1049 is OK: All packages OK [00:36:57] PROBLEM - DPKG on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:58] RECOVERY - twemproxy process on mw1056 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:36:58] RECOVERY - RAID on mw1049 is OK: OK: no RAID installed [00:36:59] PROBLEM - DPKG on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:59] PROBLEM - twemproxy port on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:07] PROBLEM - puppet disabled on mw1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:07] PROBLEM - puppet disabled on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:07] PROBLEM - twemproxy port on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:07] PROBLEM - SSH on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:07] PROBLEM - twemproxy port on mw1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:37:08] PROBLEM - SSH on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:08] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.079 second response time [00:37:09] PROBLEM - Disk space on mw1064 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:09] RECOVERY - RAID on mw1018 is OK: OK: no RAID installed [00:37:17] RECOVERY - DPKG on mw1056 is OK: All packages OK [00:37:17] RECOVERY - Apache HTTP on mw1022 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.056 second response time [00:37:17] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.062 second response time [00:37:17] RECOVERY - RAID on mw1077 is OK: OK: no RAID installed [00:37:17] RECOVERY - puppet disabled on mw1056 is OK: OK [00:37:18] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.143 second response time [00:37:18] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.889 second response time [00:37:19] PROBLEM - puppet disabled on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:27] PROBLEM - puppet disabled on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:27] RECOVERY - twemproxy port on mw1056 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:37:27] PROBLEM - twemproxy process on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:27] PROBLEM - DPKG on mw1104 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:37:37] RECOVERY - Disk space on mw1056 is OK: DISK OK [00:37:37] RECOVERY - RAID on mw1056 is OK: OK: no RAID installed [00:37:47] PROBLEM - twemproxy process on mw1045 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:37:47] RECOVERY - RAID on mw1104 is OK: OK: no RAID installed [00:37:47] RECOVERY - SSH on mw1055 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:37:57] RECOVERY - twemproxy port on mw1104 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:37:57] RECOVERY - Apache HTTP on mw1035 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.109 second response time [00:37:57] RECOVERY - SSH on mw1045 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:37:57] RECOVERY - puppet disabled on mw1082 is OK: OK [00:37:57] RECOVERY - RAID on mw1082 is OK: OK: no RAID installed [00:37:58] RECOVERY - RAID on mw1045 is OK: OK: no RAID installed [00:37:58] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.064 second response time [00:37:59] RECOVERY - SSH on mw1082 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:37:59] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.125 second response time [00:38:00] RECOVERY - twemproxy port on mw1045 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:38:00] RECOVERY - DPKG on mw1025 is OK: All packages OK [00:38:07] RECOVERY - DPKG on mw1082 is OK: All packages OK [00:38:07] RECOVERY - RAID on mw1035 is OK: OK: no RAID installed [00:38:17] RECOVERY - Disk space on mw1027 is OK: DISK OK [00:38:17] PROBLEM - RAID on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:17] RECOVERY - puppet disabled on mw1104 is OK: OK [00:38:17] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.061 second response time [00:38:18] RECOVERY - twemproxy process on mw1104 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:38:18] RECOVERY - DPKG on mw1045 is OK: All packages OK [00:38:19] PiRSquared: What error(s) are/were you getting? 
[00:38:27] RECOVERY - DPKG on mw1104 is OK: All packages OK [00:38:27] PROBLEM - SSH on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:27] RECOVERY - Apache HTTP on mw1049 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.085 second response time [00:38:27] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.080 second response time [00:38:27] PROBLEM - Disk space on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:28] PROBLEM - Disk space on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:37] RECOVERY - SSH on mw1104 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:38:37] RECOVERY - twemproxy process on mw1045 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:38:37] PROBLEM - twemproxy port on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:37] PROBLEM - Disk space on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:37] PROBLEM - twemproxy port on mw1102 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:38] PROBLEM - SSH on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:38] PROBLEM - twemproxy process on mw1109 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:47] RECOVERY - DPKG on mw1109 is OK: All packages OK [00:38:57] RECOVERY - twemproxy process on mw1064 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:38:57] PROBLEM - twemproxy process on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:38:57] RECOVERY - SSH on mw1064 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:38:57] PROBLEM - twemproxy port on mw1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:38:57] RECOVERY - DPKG on mw1064 is OK: All packages OK [00:38:58] RECOVERY - Disk space on mw1064 is OK: DISK OK [00:39:07] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:39:07] PROBLEM - RAID on mw1094 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:39:07] RECOVERY - RAID on mw1109 is OK: OK: no RAID installed [00:39:07] RECOVERY - puppet disabled on mw1064 is OK: OK [00:39:17] RECOVERY - twemproxy port on mw1064 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:17] RECOVERY - puppet disabled on mw1109 is OK: OK [00:39:25] reedy: I got 503 on desktop site (not mdot) GET of a redlink. twice [00:39:27] RECOVERY - Disk space on mw1041 is OK: DISK OK [00:39:27] RECOVERY - twemproxy port on mw1109 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:27] RECOVERY - Disk space on mw1109 is OK: DISK OK [00:39:27] RECOVERY - RAID on mw1029 is OK: OK: no RAID installed [00:39:27] RECOVERY - DPKG on mw1041 is OK: All packages OK [00:39:28] RECOVERY - twemproxy process on mw1109 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:37] RECOVERY - SSH on mw1041 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:39:37] RECOVERY - Apache HTTP on mw1045 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 9.557 second response time [00:39:47] RECOVERY - Apache HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 9.442 second response time [00:39:47] RECOVERY - twemproxy process on mw1041 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:47] RECOVERY - twemproxy process on mw1027 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:47] RECOVERY - twemproxy port on mw1041 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:57] RECOVERY - DPKG on mw1027 is OK: All packages OK 
[00:39:57] RECOVERY - Apache HTTP on mw1082 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.076 second response time [00:39:57] RECOVERY - puppet disabled on mw1027 is OK: OK [00:39:57] RECOVERY - twemproxy port on mw1027 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:39:57] RECOVERY - SSH on mw1027 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:39:58] RECOVERY - Apache HTTP on mw1064 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 4.828 second response time [00:40:07] RECOVERY - RAID on mw1094 is OK: OK: no RAID installed [00:40:07] RECOVERY - RAID on mw1041 is OK: OK: no RAID installed [00:40:07] RECOVERY - RAID on mw1064 is OK: OK: no RAID installed [00:40:07] RECOVERY - RAID on mw1055 is OK: OK: no RAID installed [00:40:16] reedy, both times was cp1053 [00:40:17] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.050 second response time [00:40:27] RECOVERY - Apache HTTP on mw1104 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.060 second response time [00:40:37] RECOVERY - Apache HTTP on mw1029 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.054 second response time [00:40:37] PROBLEM - twemproxy port on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:40:47] RECOVERY - Apache HTTP on mw1094 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.070 second response time [00:40:57] RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.068 second response time [00:40:57] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.073 second response time [00:40:57] RECOVERY - RAID on mw1027 is OK: OK: no RAID installed [00:40:57] PROBLEM - SSH on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:40:57] PROBLEM - DPKG on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:07] PROBLEM - twemproxy process on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:10] FYI: Ops are aware [00:42:47] RECOVERY - SSH on mw1055 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:43:27] RECOVERY - Apache HTTP on mw1055 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.073 second response time [00:44:17] RECOVERY - Disk space on mw1025 is OK: DISK OK [00:45:07] RECOVERY - twemproxy process on mw1025 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:45:27] RECOVERY - puppet disabled on mw1025 is OK: OK [00:45:27] RECOVERY - twemproxy port on mw1025 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:46:17] RECOVERY - SSH on mw1102 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:46:17] RECOVERY - DPKG on mw1102 is OK: All packages OK [00:46:17] RECOVERY - Disk space on mw1102 is OK: DISK OK [00:46:27] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.075 second response time [00:46:27] RECOVERY - RAID on mw1102 is OK: OK: no RAID installed [00:46:27] RECOVERY - twemproxy port on mw1102 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:46:37] RECOVERY - puppet disabled on mw1102 is OK: OK [00:46:57] 
RECOVERY - twemproxy process on mw1102 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:47:41] !log kill updateSpecial MWScript on terbium & pt-kill updateSpecial frwiki query running for 15m [00:47:48] Logged the message, Master [00:48:17] PROBLEM - Disk space on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:49:37] PROBLEM - twemproxy port on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:50:07] PROBLEM - twemproxy process on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:50:27] PROBLEM - puppet disabled on mw1025 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:54:47] RECOVERY - Disk space on mw1086 is OK: DISK OK [00:54:57] RECOVERY - twemproxy port on mw1086 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:54:57] RECOVERY - puppet disabled on mw1086 is OK: OK [00:54:57] RECOVERY - twemproxy process on mw1086 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [00:55:07] RECOVERY - DPKG on mw1086 is OK: All packages OK [00:55:17] RECOVERY - RAID on mw1086 is OK: OK: no RAID installed [00:55:27] RECOVERY - SSH on mw1086 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [00:55:37] RECOVERY - Apache HTTP on mw1086 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.070 second response time [00:59:07] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.068 second response time [01:06:17] PROBLEM - Host mw1025 is DOWN: PING CRITICAL - Packet loss = 100% [01:07:17] RECOVERY - RAID on mw1025 is OK: OK: no RAID installed [01:07:17] RECOVERY - SSH on mw1025 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.2 (protocol 2.0) [01:07:17] RECOVERY - puppet disabled on mw1025 is OK: OK [01:07:27] RECOVERY - twemproxy port on mw1025 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [01:07:27] RECOVERY - Host mw1025 
is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [01:07:47] RECOVERY - DPKG on mw1025 is OK: All packages OK [01:07:57] RECOVERY - twemproxy process on mw1025 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [01:08:07] RECOVERY - Disk space on mw1025 is OK: DISK OK [01:10:47] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.063 second response time [01:45:51] anyone knows what /w/api.php?format=json&action=query&prop=pageimages&titles=&pithumbsize=80&pilimit=50 is? [01:46:15] and by that I mean why is it fetched hundred of times of second by all kinds of mobile UAs? [01:47:34] as well as /w/api.php?format=json&action=query&prop=pageimages&titles= URLs [01:48:16] oh my... [01:49:33] 1811 RxURL c /w/api.php?format=json&action=query&prop=pageimages&titles=&pithumbsize=80&pilimit=50 [01:49:37] 1811 RxHeader c User-Agent: Mozilla/5.0 (Linux; Android 4.4.2; GT-I9500 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.166 Mobile Safari/537.36 [01:49:41] 1811 RxHeader c Referer: http://es.m.wikipedia.org/wiki/Wikipedia:Portada [01:49:50] 1811 RxHeader c X-Requested-With: XMLHttpRequest [01:49:53] wtf [02:02:37] bd808|BUFFER: poke https://gerrit.wikimedia.org/r/#/c/121574/ [02:16:33] !log LocalisationUpdate completed (1.23wmf19) at 2014-03-29 02:16:33+00:00 [02:16:45] Logged the message, Master [02:19:08] w00t, at least l10n worked tonight [02:19:23] (well, the first one, /me crosses fingers) [02:36:39] !log LocalisationUpdate completed (1.23wmf20) at 2014-03-29 02:36:39+00:00 [02:36:47] Logged the message, Master [02:45:44] greg-g: still here? [02:50:39] paravoid: ish, feeling sick [02:50:46] eep [02:50:48] sorry [02:50:55] greg-g: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [02:51:20] the timestamp of the rise is 2014-03-27 between 18:00 and 19:30 [02:51:25] so, depl. 
window [02:51:32] jebede [02:51:42] that's bad [02:52:13] it is :) [02:55:16] * greg-g looks at the next one you filed [02:55:20] it's memcache [03:00:44] paravoid: you stay up however late/early you want, but I'm going to go lay down if you don't mind :) [03:00:53] goodnight :) [03:01:03] g'night sir, and thanks for helping tonight [03:16:27] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Mar 29 03:16:24 UTC 2014 (duration 16m 23s) [03:16:34] Logged the message, Master [04:04:07] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:04:14] ? [04:06:57] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [04:18:37] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:19:27] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [04:41:05] (CR) BryanDavis: [C: 1] "`git submodule status` exits with status 1 and error message "You need to run this command from the toplevel of the working tree." when yo" [operations/puppet] - https://gerrit.wikimedia.org/r/121574 (owner: Ryan Lane) [05:59:05] !log deployed patch for bug63251 for wmf19 and wmf20 [05:59:12] Logged the message, Master [08:42:01] (PS11) Ori.livneh: Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt) [08:42:06] (CR) Ori.livneh: [C: 2] Change home directory of vagrant user [operations/puppet] - https://gerrit.wikimedia.org/r/118053 (owner: Physikerwelt) [08:55:07] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:55:57] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [09:53:17] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:17] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 332660 bytes in 8.857 second response time [12:03:24] paravoid, yt? I have some fixes - do they warrant an emergency deployment? [15:01:40] I need assistance from ops to determine if a DB schema change has already be done: changeset in question is https://gerrit.wikimedia.org/r/110881 [15:05:25] springles comment seems to suggest it has [15:05:27] let me check [15:06:21] | afl_namespace | int(11) | NO | MUL | NULL | | [15:06:34] hmm [15:06:37] seemingly not everywhere [15:07:50] Is there some wiki you need it on urgently? [15:10:30] !log afl_namespace tinyint -> int on mediawikiwiki [15:10:37] Logged the message, Master [15:12:05] * Reedy waits for explains to run [15:17:30] se4598: Ugh [15:17:50] looks like only enwiki was originally done [15:18:45] Reedy: thanks for ping, no there no need. I emailed springles a week ago, but no response. I think there is currently no (important) namespace with a so high id [15:23:37] se4598: Can you create a bug requesting application of the schema change if there isn't one already? 
[15:25:16] will do [15:30:38] MaxSem: no need for emergency deploy, the feature is there for two months :) [15:30:50] MaxSem: and thanks a lot, kudos [15:40:53] Thanks [15:49:52] !log afl_namespace tinyint -> int on all small wikis [15:49:58] Logged the message, Master [15:51:53] That table is empty on most wikis it seems [16:00:00] (PS2) Nknudsen: Bug 34897 - Enable Special:Import on cawikiquote I4bdaa1b4c679356e6355987b31d1dce04ae85bd3 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121834 [16:05:20] Reedy: If you have time, you could run a query to look how many rows have the namespace set to 127 (highest value in tinyint) and so already been affected by the bug. I think they could all be fixed by a maintenance script, if someone writes one :) [16:06:09] !log 800+ connections open on db1021, db1037, db1045 (s5) [16:06:15] Logged the message, Master [16:06:45] probably need to invoked pt-kill or so if this keeps going [16:06:51] hopefully it's installed :P [16:08:48] wikidata fail? [16:08:51] I think it is/should be [16:09:14] stuff is fine again [16:09:23] will have a look at the cause... [16:09:31] might be some cache stampede [16:09:49] users briefly even saw DB errors, so we must have hit 1k [16:10:03] hoo: I saw said errors :p [16:10:40] JohnLewis: No surprise... was pinged be dewikipedians (who are alos on s5) [16:10:41] * Reedy blames JohnLewis [16:10:54] but the long lasting connections were WD ones [16:11:01] Reedy: Evidence? Well besides me admiting it :p [16:11:07] just keep growing, get your own db cluster and gtfo [16:11:07] :D [16:11:25] JohnLewis: secret log cabal is secret [16:11:28] Reedy: WIP :D [16:11:48] Reedy: Why am I not part of the secret log cabal :( [16:12:20] you didn't bribe enough people ;) :D [16:12:36] hoo: Tell me who to bribe and I'll change it :p [16:12:42] How else did you think hoo got shell access? [16:12:49] * Reedy grins [16:12:59] Reedy: Do I bribe you?
:D [16:13:15] I head Ops are very partial to scotch [16:13:26] Now that I have shell, I can even get stressed on a Saturday, great :D [16:13:45] Scotch? Hm. Can easily be arranged. [16:14:23] I can remember myself saying "I'll never do sysadmin stuff ..." after an internship quite some time ago :D [16:14:31] they are very partial to good scotch [16:15:22] Reedy: Tell me who want the scotch and where and I'll see what can be done :p [16:15:28] pt-kill is on terbium at least, that's good to know in case we are getting set on fire again [16:15:30] Depends what you want done ;) [16:15:38] hm I think Ihad a hand in that shell access but I don't see any new scotch on my shelf [16:15:45] * apergos coughs politely  [16:15:57] apergos: All the scotch went to SF, sorry :D [16:16:17] people are travelling here soon. bottles could be sent :-D [16:16:24] Better take some to Zurich at least [16:16:26] apergos: Tell me what you can do and I'll get some off to you asap ;0 [16:16:28] *;) [16:16:33] Then opsen can forward it [16:16:37] Or face apergos' wrath [16:16:44] (that is actually a joke. I have two bottles here barely touched, I do like it but I drink so seldom that it's probably a silly idea) [16:17:21] apergos: Well, fancy a third bottle just in case? [16:17:25] :-D [16:17:35] I cannot say no to a good bottle! [16:18:03] (I wonder ifI could actually, if I had 20 bottles would I say no to a 21st knowing tha I don't even drink a bottle in a year? eh...probably not :-D) [16:18:26] apergos: So, you wont be in Zurich? [16:18:29] * Zürich [16:18:29] no [16:18:29] apergos: Organise abusive shell access and it'll be there before you can say anything ;) [16:18:33] can't travel [16:18:38] common, I even have the ü on my keyboard :P [16:18:56] well apr 1 is coming up [16:19:05] so the time for hijinks would be soon [16:19:15] 3 days! gt yer plans together [16:19:20] hoo: lrn2usespecialcharactersonthekeyboard? [16:19:51] JohnLewis: I do know how to use them... 
it's just that English ruins my German skills (and everything else :D) [16:20:07] Sure, blame the English :p [16:23:02] blame canada [16:25:08] can't find anything obvious in the logs concerning the DB conn. spikes [16:25:52] hoo: Maybe people just liked connecting for that particular moment? [16:26:28] no, these were long sleeping connection (200 or 300 seconds) [16:26:37] Oh. Hm [16:26:39] but they killed themselves before I did [16:28:17] what user were they running as? [16:28:38] wikiuser or wikiadmin, I mean? [16:29:41] wikiuser I think [16:30:17] at some point they were just gone and I then switch to fluorine for investigation [16:30:35] there is a pt kill job that shoots things running as wikiuser for longer than 5 minutes but I'm not sure on which dbs [16:31:09] spri ngle will know the details (and I should find out where those are running too) [16:31:15] could be that those hit 300s, yes [16:33:43] mh, can'T find that in puppet [16:35:22] se4598: 325 wikis to go [16:39:30] hoo@terbium:~$ crontab -u springle -l [16:39:30] must be privileged to use -u [16:39:54] I looked at it but it seems to have to do with a dashboard thing he has going [16:41:59] Sat Mar 29 16:00:54 UTC 2014 mw1201 wikidatawiki Error connecting to 10.64.16.154: :real_connect(): (HY000/1040): Too many connections [16:54:05] !log afl_namespace tinyint -> int on all medium wikis [16:54:12] Logged the message, Master [17:22:21] Reedy: you currently doing on wikimania2014 db? ;D [17:22:49] did it ages ago :P [17:23:26] ptwiki: Query OK, 1595438 row(s) affected [17:23:26] ages... like in the 90s... [17:23:41] mh, then it was only a lagging slave [17:24:02] yeah [17:24:05] s3 has a lot of wikis [17:24:20] Reedy: Do you run them per hand from master? 
[17:25:31] Not for these ones [17:25:31] sudo -u apache foreachwikiindblist ../large.dblist sql.php extensions/AbuseFilter/db_patches/patch-afl-namespace_int.sql [17:25:42] they're pretty cheap and quick to run [17:25:45] enwiki had already been done [17:26:24] se4598: all done [17:26:42] oh thanks [17:26:53] Reedy: If you have time, you could run a query to look how many rows have the namespace set to 127 (highest value in tinyint) and so already been affected by the bug. I think they could all be fixed by a maintenance script, if someone writes one :) [17:27:25] 127 in the affected column you just altered [17:28:01] wow [17:28:07] incubatorwiki is now classed as large [17:28:30] !log reedy updated /a/common to {{Gerrit|I7e9595ce0}}: Prepare GeoData config for Elasticsearch switchover [17:28:34] (PS1) Reedy: Update wiki size dblists [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121972 [17:28:37] Logged the message, Master [17:29:01] (CR) Reedy: [C: 2] Update wiki size dblists [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121972 (owner: Reedy) [17:29:09] (Merged) jenkins-bot: Update wiki size dblists [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/121972 (owner: Reedy) [17:30:12] !log reedy synchronized database lists files: Update size related dblists [17:30:18] Logged the message, Master [17:31:10] !log reedy synchronized wmf-config/InitialiseSettings.php 'touch' [17:31:16] Logged the message, Master [17:32:11] thanks for doing these, btw... I kind of lost track of the AbuseFilter log thing [17:32:27] oh and thanks se4598 for keeping an eye on [17:35:41] hoo: only wanted to get my assigned bug closed :P [17:36:36] after springl.e approved the gerrit change I internally (in my head) filed it as done :P [17:38:55] AbuseFilter has some old legacy things like l10n-Messages-Contructio . I think it's getting to some point rotten. :( [17:39:43] ahhaha, getting rotten? :D It's a single code rot...
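Aside on the afl_namespace change and the "namespace set to 127" question above: a minimal Python sketch of why 127 is the telltale value. It assumes the column was a signed TINYINT and that the server was not in strict SQL mode, in which case MySQL clips out-of-range values to the ends of the range instead of rejecting them. The helper names and the sample namespace ids are hypothetical, made up for illustration, and are not taken from AbuseFilter.

```python
# Sketch only: models how a signed TINYINT column stores out-of-range values
# when MySQL is NOT in strict SQL mode (values are clipped to the range ends).
# Function names and sample data are hypothetical.
TINYINT_MIN, TINYINT_MAX = -128, 127


def store_as_tinyint(value: int) -> int:
    """Return what a non-strict signed TINYINT column would keep for `value`."""
    return max(TINYINT_MIN, min(TINYINT_MAX, value))


def possibly_truncated(stored_namespaces) -> int:
    """Count stored values sitting at the TINYINT ceiling (repair candidates)."""
    return sum(1 for ns in stored_namespaces if ns == TINYINT_MAX)


# Hypothetical sample of namespace ids as the application would have sent them.
sample = [0, 1, 4, 120, 828, 2600]
stored = [store_as_tinyint(ns) for ns in sample]
print(stored)                      # [0, 1, 4, 120, 127, 127]
print(possibly_truncated(stored))  # 2
```

Counting rows equal to 127 only gives an upper bound on damaged rows: the original namespace id is lost once clipped, so any repair script would need another source (for example the logged page title) to recover it.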
[17:42:00] I've written in my wikimania scholarship-appl. that I _maybe_ want to get a AF-maintainer (dunno what thats exactly means but anyway :D). And you're now lost to WMF shell access :P [17:42:14] * 16:06 hoo: {{Gerrit|800}}+ connections open on db1021, db1037, db1045 (s5) [17:42:20] se4598: Want to take that over? [17:42:26] Reedy: yes [17:42:31] se4598: I'd so love that [17:42:38] Someone needs to tell the bot \d{2-3} isn't gerrit [17:42:52] bluehg [17:42:55] 3-4 even [17:43:07] yeah... it does the same for years like 2014 :P [17:43:22] and that if there's a bug in front thats likely NOT a gerrit one [17:44:00] greedy regex on labslogbot [17:44:07] I guess I should propose a fix [17:44:51] se4598: If you want to take it over, just tell me... I'm not eager to keep maintaining that piece of legacy code [17:44:52] I wonder if it should only link to gerrit it if its I[a-zA-Z0-d]{4,} or similar [17:45:37] Reedy: It should at least contain one letter, I guess (ok there are edge cases w/o a letter in the first 8 chars, but who cares) [17:45:53] Anyone know which repo that bot is in? [17:46:35] nope [17:46:47] I think they ahve an own repo, but don't ask me :P [17:47:57] hoo: Not so fast ;) , I haven't nearly enough experience to do that on my own, in general and in terms of AF, but will keep looking at that pieces of code. [17:48:47] don't give up easily, the code is pretty WTF often, but it's like that to all of us [17:49:22] last time I did a huge change I spend half a work day just WTFing at it's program flow :P [17:49:41] Reedy: I think it's https://git.wikimedia.org/tree/operations%2Fdebs%2Fadminbot [17:49:48] anyway, it's Saturday evening, so I'm off... cu [17:49:56] see you in a couple of hours ;) [17:53:07] (PS1) Reedy: Simplify boolean return [operations/debs/adminbot] - https://gerrit.wikimedia.org/r/121973 [17:56:36] Reedy: Am I guessing right that the gerrit-linking was already removed but not deployed?
https://gerrit.wikimedia.org/r/95571 [17:57:08] looks to be the case yup [17:57:24] I'm just filing an RT ticket to request the bot to be updated [22:56:42] (PS1) Mattflaschen: Set a special logo for Commons on Beta Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 [22:57:31] (PS2) Mattflaschen: Set a special logo for Commons on Beta Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 [23:01:20] (CR) Mattflaschen: "This is from https://github.com/wikimedia/operations-mediawiki-config/pull/9 , with a new commit message and an indent fix." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen) [23:11:36] something change in last 20ish mins? reqerror seems to have improved significantly (5xx not 500) [23:12:15] http://ur1.ca/gxwel [23:12:23] (the blue line) [23:17:51] jeremyb: not https://gerrit.wikimedia.org/r/#q,I46335a76096ec800ee8ce5471bacffd41d2dc4f6,n,z unless it was live hacked [23:18:06] but maybe the bot(s) were blocked/stopped
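Aside on the log bot's Gerrit linking discussed earlier (bare numbers such as "800" and years like 2014 being turned into {{Gerrit|...}} links): a minimal Python sketch of the stricter matching floated in the channel. It is not the adminbot code. It assumes a Gerrit Change-Id is a capital "I" followed by hex digits (40 in full, often abbreviated), and it swaps the character class suggested in the channel for hex digits only; the minimum of 4 digits follows the "{4,}" from that suggestion. The function name is hypothetical.

```python
import re

# Assumption: a Change-Id is "I" plus hex digits, possibly abbreviated.
# Requiring the leading "I" keeps bare numbers (800, 2014) from matching.
CHANGE_ID = re.compile(r"\bI[0-9a-f]{4,40}\b")


def link_change_ids(message: str) -> str:
    """Wrap Change-Id-looking tokens in {{Gerrit|...}}; leave plain numbers alone."""
    return CHANGE_ID.sub(lambda m: "{{Gerrit|" + m.group(0) + "}}", message)


print(link_change_ids("800+ connections open on db1021, db1037, db1045 (s5)"))
print(link_change_ids("updated /a/common to I7e9595ce0: Prepare GeoData config"))
```

With this pattern the first message comes back unchanged and only the abbreviated Change-Id in the second is wrapped. A token preceded by "bug" would still need separate handling, as noted in the channel.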