[01:07:37] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[01:41:19] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 210 seconds
[01:43:07] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 271 seconds
[01:48:58] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 621s
[01:51:49] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds
[01:53:37] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 4 seconds
[01:54:58] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 1s
[02:07:52] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[02:52:16] RECOVERY - Puppet freshness on mw1016 is OK: puppet ran at Sat Jul 14 02:51:54 UTC 2012
[03:19:52] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[03:30:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[04:57:49] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[05:06:49] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[05:50:11] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[07:04:03] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 212 seconds
[07:04:03] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 213 seconds
[07:26:42] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 23 seconds
[07:26:51] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 25 seconds
[07:39:44] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[07:42:53] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[07:53:14] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 185 seconds
[08:00:44] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds
[08:17:14] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 189 seconds
[08:18:35] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 265 seconds
[08:36:44] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 11 seconds
[08:37:29] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 2 seconds
[08:40:21] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[08:42:18] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 186 seconds
[08:42:54] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 202 seconds
[08:46:48] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 185 seconds
[08:47:24] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 214 seconds
[08:49:57] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 221 seconds
[08:52:03] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 231 seconds
[09:34:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:35:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.052 seconds
[09:46:52] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 29 seconds
[09:47:55] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds
[10:09:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:11:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.352 seconds
[10:13:25] PROBLEM - Host mw1019 is DOWN: PING CRITICAL - Packet loss = 100%
[10:45:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:57:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds
[11:08:34] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[11:27:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:39:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds
[12:08:34] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[12:10:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:19:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.717 seconds
[12:52:07] PROBLEM - Host mw1118 is DOWN: PING CRITICAL - Packet loss = 100%
[12:54:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:02:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.024 seconds
[13:20:55] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[13:32:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[13:36:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:43:43] PROBLEM - Host mw1063 is DOWN: PING CRITICAL - Packet loss = 100%
[13:45:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.064 seconds
[14:19:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:29:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds
[14:58:28] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[15:02:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:07:27] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[15:11:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds
[15:44:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:51:15] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[15:53:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.186 seconds
[16:26:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:36:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds
[17:09:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:18:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.044 seconds
[17:18:59] interesting
[17:19:04] slave lag is crap data nowadays
[17:26:37] PROBLEM - SSH on cp1044 is CRITICAL: Server answer:
[17:28:07] RECOVERY - SSH on cp1044 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[17:35:01] PROBLEM - SSH on potassium is CRITICAL: Server answer:
[17:36:31] RECOVERY - SSH on potassium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[17:41:10] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[17:44:11] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[17:50:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:56:01] PROBLEM - Host mw1111 is DOWN: PING CRITICAL - Packet loss = 100%
[18:00:46] anyone working on mw1111 ?
[18:00:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.016 seconds
[18:01:33] !log rebooting unresponsive mw1111
[18:01:41] Logged the message, Mistress of the network gear.
[18:08:08] RECOVERY - Host mw1111 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms
[18:33:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:40:59] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[18:42:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.725 seconds
[18:46:26] !log powercycling frozen db1029
[18:46:33] Logged the message, Mistress of the network gear.
[18:49:23] PROBLEM - Host db1029 is DOWN: PING CRITICAL - Packet loss = 100%
[18:53:17] RECOVERY - SSH on db1029 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[18:53:26] RECOVERY - Host db1029 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms
[18:54:11] RECOVERY - MySQL disk space on db1029 is OK: DISK OK
[19:01:41] RECOVERY - Puppet freshness on db1029 is OK: puppet ran at Sat Jul 14 19:01:28 UTC 2012
[19:05:29] hmm I wonder if that was an "uptime > 208 days" box
[19:07:08] doesn't look like it
[19:07:09] oh well
[19:08:13] apergos, I added you as CC for new bug 38402, since you were handling those requests in bug 17313
[19:08:39] sure
[19:08:46] I still do them from time to time
[19:09:21] sometime someone should really figure out why we miss a batch (typically 500 iirc)
[19:09:25] *shrug*
[19:09:46] irritating but there are workarounds so...
[19:10:17] I figured it was better to create a tracking bug for that
[19:10:33] yes, I dislike when people reopen the one bug and add their own issue onto it
[19:10:36] thanks for doing that
[19:10:56] oh, it seems we already had a similar one 29757 :S
[19:13:23] woops
[19:13:32] ok well
[19:14:17] off to try to eat a little, the heat is a killer (of appetite if nothing else)
[19:14:19] talk to you later
[19:15:28] heat kills many things
[19:15:41] except boredom :/
[19:16:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:26:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds
[19:39:19] hey ops guys, does "Error creating thumbnail: GPL Ghostscript 8.61" happen often on wikitech http://wikitech.wikimedia.org/view/File:Wikimania_2012_-_The_Wikipedia_Mobile_Experience_%E2%80%94_Where_We%27ve_Been_and_Where_We%27re_Going_-_Tomasz_and_Jon.pdf ?
[19:39:51] RobH: --^ ?
[19:50:38] New patchset: Platonides; "(Bug 38404) Change the names of the automatic user categories of the Babel Extension in es.wikipedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15674
[19:57:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:06:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.066 seconds
[20:12:59] ping?
[20:13:05] labs problem, need ops
[20:13:11] -> #wikimedia-labs
[20:40:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:50:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[21:09:31] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[21:23:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:33:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[22:05:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:09:31] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[22:09:31] PROBLEM - Puppet freshness on mw1112 is CRITICAL: Puppet has not run in the last 10 hours
[22:10:34] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours
[22:10:34] PROBLEM - Puppet freshness on pdf1 is CRITICAL: Puppet has not run in the last 10 hours
[22:11:28] PROBLEM - Puppet freshness on mw1099 is CRITICAL: Puppet has not run in the last 10 hours
[22:12:31] PROBLEM - Puppet freshness on mw1013 is CRITICAL: Puppet has not run in the last 10 hours
[22:12:31] PROBLEM - Puppet freshness on mw1088 is CRITICAL: Puppet has not run in the last 10 hours
[22:12:31] PROBLEM - Puppet freshness on mw1007 is CRITICAL: Puppet has not run in the last 10 hours
[22:15:31] PROBLEM - Puppet freshness on mw1119 is CRITICAL: Puppet has not run in the last 10 hours
[22:15:31] PROBLEM - Puppet freshness on mw1057 is CRITICAL: Puppet has not run in the last 10 hours
[22:15:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.046 seconds
[22:16:34] PROBLEM - Puppet freshness on mw1071 is CRITICAL: Puppet has not run in the last 10 hours
[22:16:35] PROBLEM - Puppet freshness on mw1089 is CRITICAL: Puppet has not run in the last 10 hours
[22:16:35] PROBLEM - Puppet freshness on mw1092 is CRITICAL: Puppet has not run in the last 10 hours
[22:18:31] PROBLEM - Puppet freshness on mw1028 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:34] PROBLEM - Puppet freshness on mw1034 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:34] PROBLEM - Puppet freshness on mw1003 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:34] PROBLEM - Puppet freshness on mw1047 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:34] PROBLEM - Puppet freshness on mw1062 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:34] PROBLEM - Puppet freshness on mw1060 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:35] PROBLEM - Puppet freshness on mw1078 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:35] PROBLEM - Puppet freshness on mw1121 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:36] PROBLEM - Puppet freshness on mw1082 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:36] PROBLEM - Puppet freshness on mw1104 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:37] PROBLEM - Puppet freshness on mw1153 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:37] PROBLEM - Puppet freshness on mw1123 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:38] PROBLEM - Puppet freshness on mw1155 is CRITICAL: Puppet has not run in the last 10 hours
[22:48:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:48:32] PROBLEM - Host mw1010 is DOWN: PING CRITICAL - Packet loss = 100%
[22:55:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.908 seconds
[23:22:26] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[23:30:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:33:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[23:39:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.055 seconds