[00:11:38] PROBLEM - Lucene on searchidx1001 is CRITICAL: Connection refused [00:28:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.171 seconds [00:51:57] !log reedy synchronized wmf-config/InitialiseSettings.php 'Add NS to stewardwiki at request of Philippe' [00:52:00] Logged the message, Master [00:54:00] gn8 folks [00:54:16] !log reedy synchronized wmf-config/InitialiseSettings.php 'Add NS to stewardwiki at request of Philippe' [00:54:19] Logged the message, Master [00:54:30] 'night DaB [01:09:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:36] !log LocalisationUpdate completed (1.19) at Fri Mar 16 02:21:35 UTC 2012 [02:21:40] Logged the message, Master [02:36:33] RECOVERY - RAID on cp1035 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [02:36:34] RECOVERY - DPKG on mw1107 is OK: All packages OK [02:36:34] RECOVERY - DPKG on mw1156 is OK: All packages OK [02:36:34] RECOVERY - RAID on mw1146 is OK: OK: no RAID installed [02:36:34] RECOVERY - RAID on mw1011 is OK: OK: no RAID installed [02:36:42] RECOVERY - Disk space on mw1139 is OK: DISK OK [02:36:42] RECOVERY - Disk space on mw1074 is OK: DISK OK [02:36:42] RECOVERY - RAID on cp1012 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [02:36:42] RECOVERY - DPKG on cp1015 is OK: All packages OK [02:36:42] RECOVERY - DPKG on virt3 is OK: All packages OK [02:36:43] RECOVERY - RAID on cp1013 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [02:36:51] RECOVERY - Disk space on mw1127 is OK: DISK OK [02:36:51] RECOVERY - RAID on ms5 is OK: OK: Active: 50, Working: 50, Failed: 0, Spare: 0 [02:36:51] RECOVERY - DPKG on mw1131 is OK: All packages OK [02:36:51] RECOVERY - DPKG on mw1112 is OK: All packages OK [02:36:51] RECOVERY - DPKG on virt4 is OK: All packages OK [02:36:52] RECOVERY - Disk space on mw1015 is OK: DISK OK [02:36:52] RECOVERY - RAID on mw1032 is OK: OK: no RAID installed [02:36:53] RECOVERY - Disk space on mw1106 is OK: DISK OK [02:36:53] RECOVERY - Disk space on mw1112 is OK: DISK OK [02:36:54] RECOVERY - Disk space on mw1100 is OK: DISK OK [02:36:54] RECOVERY - Disk space on cp1035 is OK: DISK OK [02:36:55] RECOVERY - DPKG on mw1067 is OK: All packages OK [02:36:55] RECOVERY - RAID on mw1109 is OK: OK: no RAID installed [02:46:23] yay, nagios-wm gone [02:46:27] ow, it's gone everywhere.. 
[03:05:21] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:05:39] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.148 seconds [03:07:18] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 0.020 seconds [03:18:18] test [03:22:11] 1234 [03:23:30] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [03:28:30] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours [03:34:24] PROBLEM - Puppet freshness on cp1021 is CRITICAL: Puppet has not run in the last 10 hours [03:35:27] RECOVERY - RAID on aluminium is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [03:35:36] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours [03:36:39] PROBLEM - Puppet freshness on cp1041 is CRITICAL: Puppet has not run in the last 10 hours [03:36:48] PROBLEM - swift-object-server on copper is CRITICAL: Connection refused by host [03:36:48] PROBLEM - swift-container-auditor on copper is CRITICAL: Connection refused by host [03:36:57] PROBLEM - swift-container-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:36:57] PROBLEM - swift-container-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:36:57] PROBLEM - swift-object-auditor on magnesium is CRITICAL: Connection refused by host [03:36:57] PROBLEM - swift-account-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:36:57] PROBLEM - swift-object-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:36:57] PROBLEM - swift-account-replicator on magnesium is CRITICAL: Connection refused by host [03:36:58] PROBLEM - swift-account-server on ms3 is CRITICAL: NRPE: Command check_swift_account_server not defined [03:36:58] PROBLEM - swift-object-replicator on ms3 is CRITICAL: NRPE: Command check_swift_object_replicator not defined [03:37:15] PROBLEM - swift-account-reaper on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:37:15] PROBLEM - swift-container-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:37:15] PROBLEM - swift-container-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [03:37:15] PROBLEM - swift-object-server on ms3 is CRITICAL: NRPE: Command check_swift_object_server not defined [03:37:15] PROBLEM - swift-account-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[03:37:15] PROBLEM - swift-container-auditor on ms3 is CRITICAL: NRPE: Command check_swift_container_auditor not defined [03:37:15] PROBLEM - swift-container-replicator on copper is CRITICAL: Connection refused by host [03:37:16] PROBLEM - swift-object-updater on copper is CRITICAL: Connection refused by host [03:37:16] PROBLEM - swift-account-server on zinc is CRITICAL: Connection refused by host [04:24:44] PROBLEM - SSH on sq40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:25:47] PROBLEM - Backend Squid HTTP on sq40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:26:05] PROBLEM - Frontend Squid HTTP on sq40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:48:17] PROBLEM - Host sq40 is DOWN: PING CRITICAL - Packet loss = 100% [05:01:17] PROBLEM - swift-object-server on copper is CRITICAL: Connection refused by host [05:01:17] PROBLEM - swift-container-auditor on copper is CRITICAL: Connection refused by host [05:01:26] PROBLEM - swift-account-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:26] PROBLEM - swift-container-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:26] PROBLEM - swift-account-replicator on magnesium is CRITICAL: Connection refused by host [05:01:26] PROBLEM - swift-object-auditor on magnesium is CRITICAL: Connection refused by host [05:01:26] PROBLEM - swift-container-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:26] PROBLEM - swift-object-replicator on ms3 is CRITICAL: NRPE: Command check_swift_object_replicator not defined [05:01:27] PROBLEM - swift-object-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:27] PROBLEM - swift-account-server on ms3 is CRITICAL: NRPE: Command check_swift_account_server not defined [05:01:35] PROBLEM - swift-container-replicator on copper is CRITICAL: Connection refused by host [05:01:35] PROBLEM - swift-object-updater on copper is CRITICAL: Connection refused by host [05:01:44] PROBLEM - swift-container-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:44] PROBLEM - swift-account-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:44] PROBLEM - swift-container-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:01:44] PROBLEM - swift-account-reaper on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[05:01:44] PROBLEM - swift-container-auditor on ms3 is CRITICAL: NRPE: Command check_swift_container_auditor not defined [05:01:45] PROBLEM - swift-object-server on ms3 is CRITICAL: NRPE: Command check_swift_object_server not defined [05:01:53] PROBLEM - swift-account-server on zinc is CRITICAL: Connection refused by host [05:01:53] PROBLEM - swift-object-replicator on zinc is CRITICAL: Connection refused by host [05:02:02] PROBLEM - swift-account-server on magnesium is CRITICAL: Connection refused by host [05:02:02] PROBLEM - swift-account-auditor on copper is CRITICAL: Connection refused by host [05:02:02] PROBLEM - swift-container-server on copper is CRITICAL: Connection refused by host [05:02:02] PROBLEM - swift-object-server on zinc is CRITICAL: Connection refused by host [05:02:02] PROBLEM - swift-container-auditor on zinc is CRITICAL: Connection refused by host [05:02:11] PROBLEM - swift-container-replicator on ms3 is CRITICAL: NRPE: Command check_swift_container_replicator not defined [05:02:11] PROBLEM - swift-object-updater on ms3 is CRITICAL: NRPE: Command check_swift_object_updater not defined [05:02:11] PROBLEM - swift-object-server on magnesium is CRITICAL: Connection refused by host [05:02:11] PROBLEM - swift-account-reaper on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:11] PROBLEM - swift-account-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:12] PROBLEM - swift-object-replicator on magnesium is CRITICAL: Connection refused by host [05:02:20] PROBLEM - swift-container-updater on copper is CRITICAL: Connection refused by host [05:02:20] PROBLEM - swift-account-reaper on copper is CRITICAL: Connection refused by host [05:02:20] PROBLEM - swift-container-auditor on magnesium is CRITICAL: Connection refused by host [05:02:20] PROBLEM - swift-container-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:20] PROBLEM - swift-object-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:29] PROBLEM - swift-object-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:29] PROBLEM - swift-object-updater on magnesium is CRITICAL: Connection refused by host [05:02:29] PROBLEM - swift-container-replicator on magnesium is CRITICAL: Connection refused by host [05:02:29] PROBLEM - swift-account-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:29] PROBLEM - swift-account-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:02:29] PROBLEM - swift-object-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[05:02:29] PROBLEM - swift-account-auditor on ms3 is CRITICAL: NRPE: Command check_swift_account_auditor not defined [05:02:30] PROBLEM - swift-container-server on ms3 is CRITICAL: NRPE: Command check_swift_container_server not defined [05:02:38] PROBLEM - swift-account-auditor on zinc is CRITICAL: Connection refused by host [05:02:38] PROBLEM - swift-container-server on zinc is CRITICAL: Connection refused by host [05:02:38] PROBLEM - swift-object-updater on zinc is CRITICAL: Connection refused by host [05:02:38] PROBLEM - swift-container-replicator on zinc is CRITICAL: Connection refused by host [05:02:47] PROBLEM - swift-container-server on magnesium is CRITICAL: Connection refused by host [05:02:47] PROBLEM - swift-account-auditor on magnesium is CRITICAL: Connection refused by host [05:02:47] PROBLEM - swift-account-replicator on copper is CRITICAL: Connection refused by host [05:02:56] PROBLEM - swift-object-auditor on copper is CRITICAL: Connection refused by host [05:02:56] PROBLEM - swift-container-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:05] PROBLEM - swift-account-server on copper is CRITICAL: Connection refused by host [05:03:05] PROBLEM - swift-object-replicator on copper is CRITICAL: Connection refused by host [05:03:05] PROBLEM - swift-account-reaper on zinc is CRITICAL: Connection refused by host [05:03:05] PROBLEM - swift-container-updater on ms3 is CRITICAL: NRPE: Command check_swift_container_updater not defined [05:03:05] PROBLEM - swift-container-updater on zinc is CRITICAL: Connection refused by host [05:03:05] PROBLEM - swift-account-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:06] PROBLEM - swift-account-reaper on magnesium is CRITICAL: Connection refused by host [05:03:06] PROBLEM - swift-container-updater on magnesium is CRITICAL: Connection refused by host [05:03:07] PROBLEM - swift-object-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:14] PROBLEM - swift-object-auditor on ms3 is CRITICAL: NRPE: Command check_swift_object_auditor not defined [05:03:14] PROBLEM - swift-object-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:14] PROBLEM - swift-object-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:23] PROBLEM - swift-object-auditor on zinc is CRITICAL: Connection refused by host [05:03:23] PROBLEM - swift-account-replicator on zinc is CRITICAL: Connection refused by host [05:03:23] PROBLEM - swift-account-reaper on ms3 is CRITICAL: NRPE: Command check_swift_account_reaper not defined [05:03:23] PROBLEM - swift-container-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:23] PROBLEM - swift-container-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:23] PROBLEM - swift-object-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [05:03:24] PROBLEM - swift-account-replicator on ms3 is CRITICAL: NRPE: Command check_swift_account_replicator not defined [05:05:07] well, that is quite a few problems [05:06:24] Alpha_Quadrant: how many? 
[05:07:19] jeremyb: no idea, but the bot appears to be listing quite a few [05:07:41] it looks like 19 problems listed above [06:53:19] PROBLEM - swift-container-auditor on copper is CRITICAL: Connection refused by host [06:53:19] PROBLEM - swift-object-server on copper is CRITICAL: Connection refused by host [06:53:28] PROBLEM - swift-object-auditor on magnesium is CRITICAL: Connection refused by host [06:53:28] PROBLEM - swift-account-replicator on magnesium is CRITICAL: Connection refused by host [06:53:28] PROBLEM - swift-container-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:28] PROBLEM - swift-object-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:28] PROBLEM - swift-account-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:29] PROBLEM - swift-container-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:29] PROBLEM - swift-object-replicator on ms3 is CRITICAL: NRPE: Command check_swift_object_replicator not defined [06:53:30] PROBLEM - swift-account-server on ms3 is CRITICAL: NRPE: Command check_swift_account_server not defined [06:53:37] PROBLEM - swift-object-updater on copper is CRITICAL: Connection refused by host [06:53:37] PROBLEM - swift-container-replicator on copper is CRITICAL: Connection refused by host [06:53:46] PROBLEM - swift-container-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:46] PROBLEM - swift-account-reaper on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:46] PROBLEM - swift-container-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:46] PROBLEM - swift-account-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:53:46] PROBLEM - swift-object-server on ms3 is CRITICAL: NRPE: Command check_swift_object_server not defined [06:53:47] PROBLEM - swift-container-auditor on ms3 is CRITICAL: NRPE: Command check_swift_container_auditor not defined [06:53:47] PROBLEM - swift-account-server on zinc is CRITICAL: Connection refused by host [06:53:55] PROBLEM - swift-object-replicator on zinc is CRITICAL: Connection refused by host [06:54:04] PROBLEM - swift-account-auditor on copper is CRITICAL: Connection refused by host [06:54:04] PROBLEM - swift-container-server on copper is CRITICAL: Connection refused by host [06:54:04] PROBLEM - swift-container-auditor on zinc is CRITICAL: Connection refused by host [06:54:04] PROBLEM - swift-object-server on zinc is CRITICAL: Connection refused by host [06:54:04] PROBLEM - swift-object-replicator on magnesium is CRITICAL: Connection refused by host [06:54:13] PROBLEM - swift-container-replicator on ms3 is CRITICAL: NRPE: Command check_swift_container_replicator not defined [06:54:13] PROBLEM - swift-object-updater on ms3 is CRITICAL: NRPE: Command check_swift_object_updater not defined [06:54:13] PROBLEM - swift-account-server on magnesium is CRITICAL: Connection refused by host [06:54:13] PROBLEM - swift-object-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:13] PROBLEM - swift-object-server on magnesium is CRITICAL: Connection refused by host [06:54:13] PROBLEM - swift-account-reaper on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
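For anyone trying to gauge the size of an alert burst like the one above, counting the bot's PROBLEM notices programmatically is more reliable than eyeballing the scrollback. A minimal sketch, assuming the raw log text has been saved to a plain-text file (the filename is illustrative only):

    import re
    from collections import Counter

    def count_problems(log_text):
        """Tally nagios-wm PROBLEM notices per host in a raw IRC log dump."""
        # Each message starts with a [HH:MM:SS] timestamp; split the blob on those.
        messages = re.split(r"\[\d{2}:\d{2}:\d{2}\]", log_text)
        problems = Counter()
        for msg in messages:
            m = re.search(r"PROBLEM - (.+?) on (\S+) is CRITICAL", msg)
            if m:
                problems[m.group(2)] += 1  # count by host: copper, zinc, ms1, ...
        return problems

    counts = count_problems(open("irc-log.txt").read())  # illustrative filename
    print(sum(counts.values()), "PROBLEM notices across", len(counts), "hosts")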
[06:54:14] PROBLEM - swift-container-auditor on magnesium is CRITICAL: Connection refused by host [06:54:22] PROBLEM - swift-account-reaper on copper is CRITICAL: Connection refused by host [06:54:22] PROBLEM - swift-container-updater on copper is CRITICAL: Connection refused by host [06:54:22] PROBLEM - swift-container-replicator on zinc is CRITICAL: Connection refused by host [06:54:22] PROBLEM - swift-object-updater on zinc is CRITICAL: Connection refused by host [06:54:22] PROBLEM - swift-container-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:22] PROBLEM - swift-account-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:31] PROBLEM - swift-object-updater on magnesium is CRITICAL: Connection refused by host [06:54:31] PROBLEM - swift-account-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:31] PROBLEM - swift-container-replicator on magnesium is CRITICAL: Connection refused by host [06:54:31] PROBLEM - swift-object-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:31] PROBLEM - swift-account-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:31] PROBLEM - swift-object-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:31] PROBLEM - swift-account-auditor on ms3 is CRITICAL: NRPE: Command check_swift_account_auditor not defined [06:54:32] PROBLEM - swift-container-server on ms3 is CRITICAL: NRPE: Command check_swift_container_server not defined [06:54:40] PROBLEM - swift-container-server on zinc is CRITICAL: Connection refused by host [06:54:40] PROBLEM - swift-account-auditor on zinc is CRITICAL: Connection refused by host [06:54:49] PROBLEM - swift-object-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:49] PROBLEM - swift-object-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:49] PROBLEM - swift-account-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:54:49] PROBLEM - swift-account-reaper on ms3 is CRITICAL: NRPE: Command check_swift_account_reaper not defined [06:54:49] PROBLEM - swift-account-replicator on copper is CRITICAL: Connection refused by host [06:54:50] PROBLEM - swift-container-updater on ms3 is CRITICAL: NRPE: Command check_swift_container_updater not defined [06:54:58] PROBLEM - swift-object-auditor on copper is CRITICAL: Connection refused by host [06:55:07] PROBLEM - swift-container-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:55:07] PROBLEM - swift-account-auditor on magnesium is CRITICAL: Connection refused by host [06:55:07] PROBLEM - swift-object-replicator on copper is CRITICAL: Connection refused by host [06:55:07] PROBLEM - swift-account-server on copper is CRITICAL: Connection refused by host [06:55:07] PROBLEM - swift-container-updater on zinc is CRITICAL: Connection refused by host [06:55:07] PROBLEM - swift-account-reaper on zinc is CRITICAL: Connection refused by host [06:55:16] PROBLEM - swift-object-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:55:16] PROBLEM - swift-object-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:55:16] PROBLEM - swift-container-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[06:55:16] PROBLEM - swift-container-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [06:55:16] PROBLEM - swift-account-replicator on ms3 is CRITICAL: NRPE: Command check_swift_account_replicator not defined [06:55:16] PROBLEM - swift-object-auditor on ms3 is CRITICAL: NRPE: Command check_swift_object_auditor not defined [06:55:16] PROBLEM - swift-account-reaper on magnesium is CRITICAL: Connection refused by host [06:55:25] PROBLEM - swift-container-server on magnesium is CRITICAL: Connection refused by host [06:55:25] PROBLEM - swift-object-auditor on zinc is CRITICAL: Connection refused by host [06:55:25] PROBLEM - swift-account-replicator on zinc is CRITICAL: Connection refused by host [06:55:25] PROBLEM - swift-container-updater on magnesium is CRITICAL: Connection refused by host [07:55:57] mhm [08:26:01] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 316 seconds [08:26:28] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 343 seconds [08:28:07] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 27 seconds [08:30:40] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [08:43:25] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.138 seconds response time. www.wikipedia.org returns 208.80.154.225 [08:44:55] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 182 seconds [08:45:22] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 188 seconds [08:47:01] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 4 seconds [08:47:28] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [08:57:17] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 208 seconds [08:58:56] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 271 seconds [09:05:14] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds [09:05:41] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [09:27:35] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:30:44] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [09:31:47] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:32:50] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [09:39:44] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [09:39:44] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [09:48:26] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%): /var/lib/ureadahead/debugfs 0 MB (0% inode=61%): [09:50:59] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%): /var/lib/ureadahead/debugfs 0 MB (0% inode=61%): [09:50:59] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%): /var/lib/ureadahead/debugfs 0 MB (0% inode=61%): [09:50:59] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 157 MB (2% inode=61%): /var/lib/ureadahead/debugfs 157 MB (2% inode=61%): [09:50:59] PROBLEM - Disk space on srv224 is CRITICAL: DISK 
CRITICAL - free space: / 176 MB (2% inode=61%): /var/lib/ureadahead/debugfs 176 MB (2% inode=61%): [09:50:59] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%): /var/lib/ureadahead/debugfs 0 MB (0% inode=61%): [09:57:08] RECOVERY - Disk space on srv221 is OK: DISK OK [10:01:20] RECOVERY - Disk space on srv222 is OK: DISK OK [10:01:20] RECOVERY - Disk space on srv219 is OK: DISK OK [10:01:20] RECOVERY - Disk space on srv223 is OK: DISK OK [10:01:20] RECOVERY - Disk space on srv220 is OK: DISK OK [10:01:20] RECOVERY - Disk space on srv224 is OK: DISK OK [10:26:33] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:30:45] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:26:37] PROBLEM - swift-container-auditor on copper is CRITICAL: Connection refused by host [11:26:37] PROBLEM - swift-object-server on copper is CRITICAL: Connection refused by host [11:26:46] PROBLEM - swift-object-auditor on magnesium is CRITICAL: Connection refused by host [11:26:46] PROBLEM - swift-account-replicator on magnesium is CRITICAL: Connection refused by host [11:26:46] PROBLEM - swift-container-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:26:46] PROBLEM - swift-account-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:26:46] PROBLEM - swift-container-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:26:47] PROBLEM - swift-account-server on ms3 is CRITICAL: NRPE: Command check_swift_account_server not defined [11:26:47] PROBLEM - swift-object-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:26:48] PROBLEM - swift-object-replicator on ms3 is CRITICAL: NRPE: Command check_swift_object_replicator not defined [11:27:04] PROBLEM - swift-container-replicator on copper is CRITICAL: Connection refused by host [11:27:04] PROBLEM - swift-object-updater on copper is CRITICAL: Connection refused by host [11:27:04] PROBLEM - swift-object-replicator on zinc is CRITICAL: Connection refused by host [11:27:04] PROBLEM - swift-account-server on zinc is CRITICAL: Connection refused by host [11:27:04] PROBLEM - swift-container-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:04] PROBLEM - swift-account-reaper on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:05] PROBLEM - swift-account-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:05] PROBLEM - swift-object-server on ms3 is CRITICAL: NRPE: Command check_swift_object_server not defined [11:27:06] PROBLEM - swift-container-auditor on ms3 is CRITICAL: NRPE: Command check_swift_container_auditor not defined [11:27:06] PROBLEM - swift-container-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[11:27:22] PROBLEM - swift-account-server on magnesium is CRITICAL: Connection refused by host [11:27:22] PROBLEM - swift-container-server on copper is CRITICAL: Connection refused by host [11:27:22] PROBLEM - swift-account-auditor on copper is CRITICAL: Connection refused by host [11:27:22] PROBLEM - swift-container-auditor on zinc is CRITICAL: Connection refused by host [11:27:22] PROBLEM - swift-object-server on zinc is CRITICAL: Connection refused by host [11:27:31] PROBLEM - swift-object-server on magnesium is CRITICAL: Connection refused by host [11:27:31] PROBLEM - swift-object-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:31] PROBLEM - swift-account-reaper on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:31] PROBLEM - swift-container-auditor on magnesium is CRITICAL: Connection refused by host [11:27:31] PROBLEM - swift-container-updater on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:31] PROBLEM - swift-account-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:31] PROBLEM - swift-container-replicator on ms3 is CRITICAL: NRPE: Command check_swift_container_replicator not defined [11:27:32] PROBLEM - swift-object-updater on ms3 is CRITICAL: NRPE: Command check_swift_object_updater not defined [11:27:32] PROBLEM - swift-object-replicator on magnesium is CRITICAL: Connection refused by host [11:27:40] PROBLEM - swift-account-reaper on copper is CRITICAL: Connection refused by host [11:27:40] PROBLEM - swift-container-updater on copper is CRITICAL: Connection refused by host [11:27:49] PROBLEM - swift-object-updater on magnesium is CRITICAL: Connection refused by host [11:27:49] PROBLEM - swift-account-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:49] PROBLEM - swift-container-replicator on magnesium is CRITICAL: Connection refused by host [11:27:49] PROBLEM - swift-object-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:49] PROBLEM - swift-account-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:50] PROBLEM - swift-container-server on ms3 is CRITICAL: NRPE: Command check_swift_container_server not defined [11:27:50] PROBLEM - swift-object-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:27:51] PROBLEM - swift-account-auditor on ms3 is CRITICAL: NRPE: Command check_swift_account_auditor not defined [11:27:58] PROBLEM - swift-container-replicator on zinc is CRITICAL: Connection refused by host [11:28:07] PROBLEM - swift-object-auditor on copper is CRITICAL: Connection refused by host [11:28:07] PROBLEM - swift-account-replicator on copper is CRITICAL: Connection refused by host [11:28:07] PROBLEM - swift-container-server on zinc is CRITICAL: Connection refused by host [11:28:07] PROBLEM - swift-account-auditor on zinc is CRITICAL: Connection refused by host [11:28:16] PROBLEM - swift-account-auditor on magnesium is CRITICAL: Connection refused by host [11:28:16] PROBLEM - swift-container-server on magnesium is CRITICAL: Connection refused by host [11:28:16] PROBLEM - swift-container-auditor on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:16] PROBLEM - swift-object-server on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. 
[11:28:16] PROBLEM - swift-object-replicator on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:17] PROBLEM - swift-container-updater on ms3 is CRITICAL: NRPE: Command check_swift_container_updater not defined [11:28:17] PROBLEM - swift-account-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:18] PROBLEM - swift-object-updater on zinc is CRITICAL: Connection refused by host [11:28:18] PROBLEM - swift-account-reaper on ms3 is CRITICAL: NRPE: Command check_swift_account_reaper not defined [11:28:25] PROBLEM - swift-account-server on copper is CRITICAL: Connection refused by host [11:28:25] PROBLEM - swift-object-replicator on copper is CRITICAL: Connection refused by host [11:28:25] PROBLEM - swift-account-reaper on zinc is CRITICAL: Connection refused by host [11:28:25] PROBLEM - swift-container-updater on zinc is CRITICAL: Connection refused by host [11:28:34] PROBLEM - swift-container-updater on magnesium is CRITICAL: Connection refused by host [11:28:34] PROBLEM - swift-account-reaper on magnesium is CRITICAL: Connection refused by host [11:28:34] PROBLEM - swift-object-updater on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:34] PROBLEM - swift-container-replicator on ms1 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:34] PROBLEM - swift-object-server on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:35] PROBLEM - swift-container-auditor on ms2 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:28:35] PROBLEM - swift-object-auditor on ms3 is CRITICAL: NRPE: Command check_swift_object_auditor not defined [11:28:36] PROBLEM - swift-account-replicator on ms3 is CRITICAL: NRPE: Command check_swift_account_replicator not defined [11:28:43] PROBLEM - swift-account-replicator on zinc is CRITICAL: Connection refused by host [11:28:43] PROBLEM - swift-object-auditor on zinc is CRITICAL: Connection refused by host [12:23:03] any coder with access to stewardswiki online? [12:39:10] PROBLEM - Puppet freshness on stafford is CRITICAL: Puppet has not run in the last 10 hours [12:45:47] got just some timeouts on deletions: from within function "LocalFile::delete". Database returned error "1205: Lock wait timeout exceeded; try restarting transaction (10.0.6.41)". [12:46:03] (at Wikimedia Commons) [12:46:20] retries were successful [12:50:32] same with me some hours ago! [12:51:03] needed some tries to delete *not only the page* but the file as well ... [12:51:25] i.e. the page has been deleted, but the file was still there and shown! 
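The timeouts reported just above are MySQL error 1205 (lock wait timeout): the deleting transaction waited longer than innodb_lock_wait_timeout for row locks held by another transaction, and as both reporters note, a plain retry succeeded. A minimal sketch of that retry idea, assuming a generic DB-API connection such as pymysql; this is not MediaWiki's actual LocalFile::delete code, just an illustration:

    import time

    LOCK_WAIT_TIMEOUT = 1205  # MySQL error code quoted in the log above

    def run_with_retry(conn, statements, max_attempts=3, backoff=2.0):
        """Run SQL statements in one transaction, retrying on lock wait timeouts.

        `conn` is assumed to be a DB-API connection (e.g. pymysql); the helper
        itself is hypothetical, shown only to illustrate the retry pattern."""
        for attempt in range(1, max_attempts + 1):
            cur = conn.cursor()
            try:
                for sql, args in statements:
                    cur.execute(sql, args)
                conn.commit()
                return
            except Exception as exc:
                conn.rollback()
                code = exc.args[0] if exc.args else None
                if code == LOCK_WAIT_TIMEOUT and attempt < max_attempts:
                    time.sleep(backoff * attempt)  # let the blocking transaction finish
                    continue
                raise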
[13:25:13] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [13:30:10] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours [13:36:11] PROBLEM - Puppet freshness on cp1021 is CRITICAL: Puppet has not run in the last 10 hours [13:37:23] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours [13:38:26] PROBLEM - Puppet freshness on cp1041 is CRITICAL: Puppet has not run in the last 10 hours [13:39:29] PROBLEM - Puppet freshness on cp1027 is CRITICAL: Puppet has not run in the last 10 hours [13:39:29] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours [13:44:26] PROBLEM - Puppet freshness on cp1024 is CRITICAL: Puppet has not run in the last 10 hours [13:46:23] PROBLEM - Puppet freshness on cp1042 is CRITICAL: Puppet has not run in the last 10 hours [13:46:41] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:47:17] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:23] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours [13:50:26] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours [13:54:11] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:57:29] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours [13:57:29] PROBLEM - Puppet freshness on cp1028 is CRITICAL: Puppet has not run in the last 10 hours [14:01:23] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.440 seconds [14:01:59] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.578 seconds [14:04:26] !log restarted swift-container-auditor on ms-be3, it had died for some reason [14:04:29] Logged the message, Master [14:04:41] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:06:11] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
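The swift-container-auditor alert and the restart logged above come from a process check that counts processes whose argument list matches the regex ^/usr/bin/python /usr/bin/swift-container-auditor and goes CRITICAL when none are found (the output format suggests the standard check_procs plugin run over NRPE). A rough Python equivalent of that idea, offered as a sketch only:

    import re
    import subprocess
    import sys

    def check_proc_args(pattern, minimum=1):
        """Nagios-style check: exit 2 (CRITICAL) if fewer than `minimum`
        running processes have an argument list matching `pattern`."""
        out = subprocess.run(["ps", "-eo", "args="],
                             capture_output=True, text=True).stdout
        count = sum(1 for line in out.splitlines() if re.search(pattern, line))
        if count < minimum:
            print(f"PROCS CRITICAL: {count} processes with regex args {pattern}")
            return 2
        print(f"PROCS OK: {count} process(es) with regex args {pattern}")
        return 0

    if __name__ == "__main__":
        sys.exit(check_proc_args(r"^/usr/bin/python /usr/bin/swift-container-auditor"))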
[14:08:08] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:08:08] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [14:10:23] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:32] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.622 seconds [14:22:50] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:47:04] RECOVERY - Puppet freshness on cp1021 is OK: puppet ran at Fri Mar 16 14:46:59 UTC 2012 [14:50:58] RECOVERY - Puppet freshness on cp1041 is OK: puppet ran at Fri Mar 16 14:50:41 UTC 2012 [14:52:01] RECOVERY - Puppet freshness on cp1042 is OK: puppet ran at Fri Mar 16 14:51:46 UTC 2012 [14:53:22] PROBLEM - Varnish traffic logger on cp1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [14:54:52] PROBLEM - Varnish HTTP upload-backend on cp1021 is CRITICAL: Connection refused [14:55:16] !log reedy synchronized php-1.19/extensions/WikimediaMaintenance/cleanupBug31576.php [14:55:19] PROBLEM - DPKG on cp1021 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:55:19] Logged the message, Master [14:57:34] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Fri Mar 16 14:57:02 UTC 2012 [14:58:01] RECOVERY - Puppet freshness on cp1043 is OK: puppet ran at Fri Mar 16 14:57:56 UTC 2012 [14:59:04] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Fri Mar 16 14:58:53 UTC 2012 [14:59:04] RECOVERY - Varnish HTTP upload-backend on cp1021 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.055 seconds [15:04:35] !log root synchronized ufg.sql 'test sync to see if hume is fixed' [15:04:38] Logged the message, Master [15:04:55] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 183 seconds [15:06:34] RECOVERY - Puppet freshness on cp1022 is OK: puppet ran at Fri Mar 16 15:06:14 UTC 2012 [15:06:52] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 202 seconds [15:07:28] PROBLEM - Host cp1021 is DOWN: PING CRITICAL - Packet loss = 100% [15:08:40] PROBLEM - Host cp1022 is DOWN: PING CRITICAL - Packet loss = 100% [15:09:07] RECOVERY - Host cp1021 is UP: PING OK - Packet loss = 0%, RTA = 26.75 ms [15:09:40] !log reedy synchronized stylize.php 'Test for hume' [15:09:43] Logged the message, Master [15:10:37] PROBLEM - Host cp1023 is DOWN: PING CRITICAL - Packet loss = 100% [15:10:46] RECOVERY - Host cp1022 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [15:11:22] RECOVERY - Host cp1023 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms [15:13:46] PROBLEM - MySQL Slave Delay on db1033 is CRITICAL: CRIT replication delay 182 seconds [15:14:13] PROBLEM - Varnish HTTP upload-frontend on cp1021 is CRITICAL: Connection refused [15:15:07] PROBLEM - Varnish traffic logger on cp1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:15:25] PROBLEM - MySQL Replication Heartbeat on db1033 is CRITICAL: CRIT replication delay 203 seconds [15:15:52] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:16:01] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:16:37] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:16:37] PROBLEM - Varnish HTTP upload-frontend on cp1028 is CRITICAL: Connection refused [15:16:55] PROBLEM - Varnish HTTP 
upload-frontend on cp1024 is CRITICAL: Connection refused [15:16:55] PROBLEM - Varnish HTTP upload-frontend on cp1025 is CRITICAL: Connection refused [15:16:55] PROBLEM - Varnish HTTP upload-frontend on cp1022 is CRITICAL: Connection refused [15:17:04] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:17:13] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa [15:17:31] PROBLEM - Varnish HTTP upload-frontend on cp1026 is CRITICAL: Connection refused [15:17:40] PROBLEM - Varnish HTTP upload-frontend on cp1023 is CRITICAL: Connection refused [15:18:34] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.285 seconds [15:19:01] RECOVERY - Varnish HTTP upload-frontend on cp1022 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.053 seconds [15:19:19] RECOVERY - Varnish traffic logger on cp1022 is OK: PROCS OK: 2 processes with command name varnishncsa [15:20:31] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Fri Mar 16 15:20:26 UTC 2012 [15:21:16] RECOVERY - Varnish HTTP upload-frontend on cp1024 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds [15:21:43] RECOVERY - MySQL Replication Heartbeat on db1033 is OK: OK replication delay 0 seconds [15:22:10] RECOVERY - MySQL Slave Delay on db1033 is OK: OK replication delay 0 seconds [15:22:37] RECOVERY - Varnish HTTP upload-frontend on cp1021 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds [15:22:55] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 2 processes with command name varnishncsa [15:23:13] RECOVERY - Varnish traffic logger on cp1021 is OK: PROCS OK: 2 processes with command name varnishncsa [15:24:34] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Fri Mar 16 15:24:05 UTC 2012 [15:24:34] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 2 processes with command name varnishncsa [15:24:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:46] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds [15:25:55] RECOVERY - Varnish HTTP upload-frontend on cp1023 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds [15:26:04] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds [15:28:28] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Fri Mar 16 15:28:03 UTC 2012 [15:29:04] RECOVERY - Varnish HTTP upload-frontend on cp1028 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds [15:29:40] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 2 processes with command name varnishncsa [15:33:07] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Fri Mar 16 15:32:45 UTC 2012 [15:33:34] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.736 seconds [15:34:19] RECOVERY - Varnish HTTP upload-frontend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds [15:34:55] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 2 processes with command name varnishncsa [15:37:55] RECOVERY - Puppet freshness on cp1025 is OK: puppet ran at Fri Mar 16 15:37:34 UTC 2012 [15:38:22] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 2 processes with command name varnishncsa [15:39:34] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.592 seconds [15:39:52] RECOVERY - Varnish HTTP upload-frontend on cp1025 is OK: HTTP OK HTTP/1.1 200 OK - 641 
bytes in 0.053 seconds [15:39:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:47:13] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 237 MB (3% inode=61%): /var/lib/ureadahead/debugfs 237 MB (3% inode=61%): [15:50:04] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.259 seconds [15:50:22] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.104 seconds [15:57:28] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:57:28] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:55] RECOVERY - Disk space on srv221 is OK: DISK OK [16:19:04] RECOVERY - Host cp1019 is UP: PING OK - Packet loss = 0%, RTA = 26.44 ms [16:19:25] someone is complaining that it.wiki pages on google are shown with the sitenotice content [16:19:34] and suggests to add somewhere [16:20:10] How is the problem dealt with on the CentralNotice, and whatever the solution is shouldn't it be applied on the default sitenotice css/whatever? [16:20:27] pgehres, ^? [16:20:34] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.279 seconds [16:20:34] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.272 seconds [16:22:48] !log reedy synchronized php-1.19/extensions/CentralAuth/specials/SpecialCentralAuth.php 'r114021' [16:22:51] Logged the message, Master [16:23:25] PROBLEM - Backend Squid HTTP on cp1019 is CRITICAL: Connection refused [16:24:46] PROBLEM - Frontend Squid HTTP on cp1019 is CRITICAL: Connection refused [16:26:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:36:37] PROBLEM - DPKG on professor is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:39:28] RECOVERY - Frontend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27535 bytes in 0.162 seconds [16:46:19] for what it's worth, which is probably very little, we've just had a troll in another channel claim bits.wikimedia.org is about to be ddosed by a botnet [16:46:44] out of curiosity, what channel? [16:46:46] !ops I'm gonna send 10 gb/sec packets to bits.wikimedia.org... In fact, 10 minutes is enough... [16:46:58] heh. this one, now :D [16:46:59] nm [16:47:01] yeah, you pinged the guys that kick you [16:47:04] A nice UDP flood... With SYN/ACK floods. [16:47:09] *shrug* [16:47:52] RECOVERY - Host cp1017 is UP: PING OK - Packet loss = 0%, RTA = 26.48 ms [16:52:22] PROBLEM - Backend Squid HTTP on cp1017 is CRITICAL: Connection refused [16:54:12] t-Minus 2 minutes. [16:54:20] what? [16:54:27] Rcsprinter, ignore him please [16:54:49] T-Minutes until 10GB/second attack on bits.wikimedia.org [16:55:34] helpful troll is helpful [16:55:49] PROBLEM - Host search1017 is DOWN: PING CRITICAL - Packet loss = 100% [16:56:39] Now booting up cannons... [16:57:35] Targets acquired: 208.80.154.235, 208.80.154.234,208.80.154.233,208.80.154.232, 208.80.154.231 [16:58:04] Loading plasma from Fluffernutter's breath... [16:58:13] PROBLEM - Host search1018 is DOWN: PING CRITICAL - Packet loss = 100% [16:58:31] !ops Now attacking. [16:58:46] Next target: toolserver.wikimedia.org [16:58:56] Baiiii [17:00:14] TBloemink: ? [17:00:29] TBloemink exploded, PiRSquaredAway [17:08:06] No, tgicuze did. 
[17:11:43] RECOVERY - Backend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.160 seconds [17:12:34] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.063 seconds [17:23:22] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.209 seconds [17:26:58] RECOVERY - Backend Squid HTTP on cp1017 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.161 seconds [17:28:19] RECOVERY - Host search1018 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms [17:29:22] RECOVERY - Host search1017 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms [17:33:16] PROBLEM - RAID on search1018 is CRITICAL: Connection refused by host [17:33:34] PROBLEM - SSH on search1018 is CRITICAL: Connection refused [17:34:37] PROBLEM - DPKG on search1018 is CRITICAL: Connection refused by host [17:35:13] PROBLEM - DPKG on search1017 is CRITICAL: Connection refused by host [17:35:22] PROBLEM - RAID on search1017 is CRITICAL: Connection refused by host [17:35:40] PROBLEM - SSH on search1017 is CRITICAL: Connection refused [17:40:55] PROBLEM - Lucene on search1018 is CRITICAL: Connection refused [17:40:55] PROBLEM - Lucene on search1017 is CRITICAL: Connection refused [17:48:16] RECOVERY - SSH on search1017 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:48:34] RECOVERY - SSH on search1018 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:00:25] PROBLEM - NTP on search1017 is CRITICAL: NTP CRITICAL: No response from NTP server [18:04:46] RECOVERY - DPKG on search1017 is OK: All packages OK [18:05:04] RECOVERY - RAID on search1017 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [18:05:40] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:25] RECOVERY - Disk space on search1017 is OK: DISK OK [18:06:43] RECOVERY - NTP on search1017 is OK: NTP OK: Offset 0.1131283045 secs [18:07:19] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.750 seconds [18:10:28] RECOVERY - Lucene on search1017 is OK: TCP OK - 0.027 second response time on port 8123 [18:15:07] RECOVERY - DPKG on search1018 is OK: All packages OK [18:15:07] RECOVERY - RAID on search1018 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [18:16:46] RECOVERY - Disk space on search1018 is OK: DISK OK [18:19:55] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.007 seconds [18:20:58] RECOVERY - Lucene on search1018 is OK: TCP OK - 0.027 second response time on port 8123 [19:15:22] !log Ran namespaceDupes on stewardwiki [19:15:25] Logged the message, Master [19:28:12] 16 19:23:06 < Kai_WMDE> jeremyb: sounds good. maybe you can also tell me if there are copies of the pagecount files from dumps.wikimedia.org located on a server in amsterdam? [19:28:33] who knows about pagecounts locations? apergos ? [19:28:46] me [19:29:00] there's not an esams copy [19:29:45] that's a cacheing center so I guess it's alittle weird to put dumps etc there [19:31:17] apergos: i see. daniel (duesentrieb) said the dumps were also in the data centre in amsterdam [19:31:27] they are? [19:31:38] this is news to me because I don't copy them there [19:31:41] apergos: i don't know :) [19:31:44] toolserver users sometimes pull copies [19:31:50] of the ones they need [19:32:01] yes, that's what i am about to do [19:32:17] you know they go in a central location over there right? 
[19:32:31] Kai_WMDE: did you check the user-store? [19:32:36] all the dumps are shared exactly so we don't get ten people downloading their own copies of en wiki or de wiki or whatever [19:32:39] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [19:34:36] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [19:35:06] jeremyb: yes, i checked that, but didn't find it. there are just db-dumps, it seems. [19:35:24] Kai_WMDE: did you talk to Danny_B|backup ? [19:35:39] jeremyb: yes [19:35:51] Kai_WMDE: also, you should read toolserver-l once in a while ;) [19:35:55] jeremyb: i assume you wanted to send him to DaBPunkt not to me... [19:36:07] Danny_B|backup: nope. Danny_B|backup [19:36:29] Danny_B|backup: you aren't the new user-store dumps czar? [19:36:47] ah... Kai_WMDE was talking to me about something different [19:37:06] jeremyb, Danny_B|backup, apergos: ah, i think i found them [19:37:22] cd stats [19:37:31] how big are these files? [19:37:44] 50 to 100 mb each [19:38:10] in fact they are located in user-store [19:38:13] so, 600GB total maybe [19:38:30] oh. so they probably just download from us [19:38:35] that's fine [19:38:38] yeah, sure [19:39:43] apergos Danny_B|backup jeremyb: thank a lot [19:39:56] *s [19:40:10] sure, glad you found em [19:40:17] gern geschehen [19:41:30] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [19:41:30] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [19:42:53] jeremyb: you know german? i'll keep that in mind! ;) [19:44:03] keine [19:44:31] Kai_WMDE: but i'll be in your city in 11 days [19:46:02] (and I'm very happy to have found some significant money is still left on my unexpired o2.de SIM!) [19:46:22] hehe [19:53:42] !log reedy synchronized php-1.19/extensions/MoodBar/ 'r114030' [19:53:45] Logged the message, Master [19:58:46] Nemo_bis: still have a question about CentralNotice? [19:59:13] pgehres, yes [19:59:38] pgehres, is my question clear enough? [19:59:56] I think the answer to your question is the fact that CentralNotice is loaded via JS and site notice is loaded as part of the page content [20:00:03] but I am not 100% sure of that [20:03:28] !log reedy synchronized wmf-config/InitialiseSettings.php 'Re-enable moodbar on enwiki' [20:03:32] Logged the message, Master [20:04:59] Nemo_bis: does that make sense? I swear there is a bug somewhere for this, but I cannot find it [20:06:43] pgehres, google guys say they consider also JS to index what users actually see [20:07:11] but it would be interesting to know what can be done for local sitenotice, and I don't know who to ask [20:08:54] pgehres has the right answer [20:12:00] !log reedy synchronizing Wikimedia installation... : Rebuild moodbar messages [20:12:03] Logged the message, Master [20:16:04] Nemo_bis: I would also ask Kaldari. He at least knows CN forward and backward and know a lot of its history [20:16:12] he is in Argentina at the moment htough [20:24:30] sync done. [20:33:13] pgehres, ok [20:39:49] dmcmatic! [20:40:04] dmc.dev.it [20:40:40] http://www.dev.it/ exists! 
:o [20:41:19] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:49:34] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:20:52] PROBLEM - MySQL Replication Heartbeat on db1033 is CRITICAL: CRIT replication delay 310 seconds [21:21:20] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 337 seconds [21:21:52] Ryan_Lane: Thanks for killing gerrit-wm in here. :-) [21:21:55] PROBLEM - MySQL Replication Heartbeat on db1017 is CRITICAL: CRIT replication delay 373 seconds [21:22:04] PROBLEM - MySQL Replication Heartbeat on db1043 is CRITICAL: CRIT replication delay 382 seconds [21:22:13] PROBLEM - MySQL Slave Delay on db1017 is CRITICAL: CRIT replication delay 390 seconds [21:22:31] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 408 seconds [21:30:41] Joan: yw [22:19:26] is the API for ptwiki down? [22:22:23] chicocvenancio: no? [22:22:46] I'm getting this in pywikipedia http://bpaste.net/show/25315/ [22:23:18] could be [22:40:40] PROBLEM - Puppet freshness on stafford is CRITICAL: Puppet has not run in the last 10 hours [22:42:25] Test [22:42:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:43:31] ...test what? [22:44:36] I'm PS from CAT Thailand (AS4651, 4652) [22:44:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.387 seconds [22:45:15] PS-CAT: is something wrong? [22:46:25] I'd like to chat with LeslieCarr. [22:48:05] Regarding: many TH clients can't reach Wikimedia. [22:49:31] Daniel Zahn suggested I join and talk with her. [22:51:02] * Damianz joins PS-CAT to Leslie via superglue [22:51:20] over IP? [22:51:32] yes. [22:52:27] source 203.130.139.32/28 and 203.147.39.128/28 [22:53:29] PS-CAT: What does the traceroute say when they try to hit the wmf cluster? [22:54:31] hey PS-CAT [22:54:38] sorry, was out talking [22:55:57] so if you can give me a traceroute, i'll see our routes to you, see which direction the issue is in [22:56:01] Don't mention it. [22:56:44] last hop from us is 61.19.15.150 [22:56:53] when tracerouting to 61.19.15.150 [22:57:48] LeslieCarr: to where? [22:57:53] we're getting your routes from Hurricane Electric, via KORnet (4766) [22:57:56] Can you send me the result of 'sh ip b 208.80.154.225' from your router? [22:58:20] Have you received my e-mail? [22:58:29] did you mean showing the route to 203.130.139.32/28 ? [23:00:12] http://pastebin.com/nHm01jQ1 PS-CAT here's my routes to you [23:00:31] looks like you just prepended a lot? [23:00:53] I'd like to know the return path from Wikimedia to our client. [23:02:16] PS-CAT: http://pastebin.com/9bViLeXj [23:03:22] Can you tell me why you set local pref 280? [23:04:14] we set that for all of our peering [23:04:20] for the routes we get from peering [23:06:01] Please show me the routes from you to source 203.130.158.166 [23:07:26] http://pastebin.com/zpv8ivq6 [23:09:41] the ptwiki API is giving errors... [23:09:55] PS-CAT: do you have a presence in ashburn, VA? [23:10:08] if you did we could just bypass everything by peering [23:11:44] Can you explain the difference between source 203.130.139.32 & 203.130.158.166? [23:14:15] Our customers complain they can access via 203.130.158.166 but not via 203.147.39.129.
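The "local pref 280" answer above is the crux of this exchange: in BGP best-path selection, local preference is compared before AS-path length, so a return route learned over peering (here via Hurricane Electric/KORnet) keeps winning no matter how much the origin prepends its AS path; only changing the preference of that path (the de-preffing Leslie tries below) moves the traffic. A toy sketch of that comparison, with made-up route objects and illustrative AS numbers, not actual router configuration:

    from dataclasses import dataclass, field

    @dataclass
    class Route:
        via: str
        local_pref: int
        as_path: list = field(default_factory=list)

    def best(routes):
        """Pick the winner the way BGP orders these two attributes:
        highest local-pref first, then shortest AS path."""
        return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

    # Illustrative values only: 6939 = HE, 4766 = KORnet, 4651 = CAT; transit path is hypothetical.
    peering = Route("peering via HE/KORnet", local_pref=280, as_path=[6939, 4766, 4651])
    transit = Route("transit", local_pref=100, as_path=[1299, 4651])

    print(best([peering, transit]).via)   # peering wins on local-pref alone
    peering.as_path += [4651] * 5         # prepending by the origin changes nothing
    print(best([peering, transit]).via)
    peering.local_pref = 50               # de-preffing the peering path is what flips it
    print(best([peering, transit]).via)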
[23:14:47] there is no difference in our eyes - it's the same /19 [23:15:24] PROBLEM - Lucene on search1001 is CRITICAL: Connection refused [23:17:08] Did you check some policy with the 2 sources? [23:18:47] I don't know why only 203.130.139.32/28 and 203.147.39.128/28 cannot reach wikimedia [23:19:00] that is so weird, no routing source policies ... [23:19:01] hrm [23:19:12] so i saw another person with a korea telecom issue [23:19:49] let me try de-preffing everything with KT [23:20:35] this will take about 10 minutes [23:21:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:21:25] OK, thank you so much Leslie. [23:22:55] * Damianz gets the network superglue ready [23:23:28] so grabbing some pvc tubing and server kittehs then? [23:24:19] PVC as in Permanent Virtual Circuit, of course! [23:25:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.205 seconds [23:27:06] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [23:39:27] PS-CAT: can you try again and see the results? [23:42:45] that path should have a bit more latency but isn't going via Korea Telecom - though if you have a presence in ashburn, VA we could peer [23:46:00] OK LeslieCarr, I'll have our customer re-check & call you back again, thank you. [23:46:05] thanks [23:46:11] hopefully that fixes it [23:47:22] Thanks again for your help, see you soon, bye. [23:47:52] * Damianz wonders if LeslieCarr broke the internet [23:48:18] hehe [23:48:20] i hope not [23:48:28] well maybe i broke korea telecom's internet [23:48:31] but they deserved it [23:49:14] :D [23:49:33] What, something wrong? [23:49:36] If we broke the whole of China's internet the internet would be a much less spammy place ;) [23:49:51] no, just joking :) [23:50:00] i found south florida to have a huge amount of spam actually [23:51:05] * saper is now doing incident response @friends hit by phishers :/ [23:51:10] Most spam/dos/general abuse issues I see are either china area or us dsl people (which are probably mostly malware-infected pcs). [23:52:26] Damianz: many spammers have software that refreshes IPs - you can reset your modem -- grrr spammers [23:53:19] True, resetting your CHAP session is just annoying though. [23:59:36] My question is regarding the sorting on the "1st National Film Awards" article [23:59:47] one of the sections (Feature films) contains the sorting for the table [23:59:54] but the sorting image is not visible there