[00:08:51] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:23] Coren: around? [00:09:52] ori-l: he's around, but maybe hiding? :P [00:10:12] Nope. I exist. [00:10:22] What can I do to you? [00:11:06] that sounds ominous [00:11:13] Coren: wondering if you could upload a package to reprepo; it's packaged & all [00:11:23] it's in tin, /home/olivneh/statsd [00:11:32] i built it myself, it contains the source [00:11:35] Sure can. [00:12:27] aw, that's awesome. *now* it's a weekend. [00:15:19] ori-l: Hm... do you know if there is docs on that somewhere? I'd rather avoid flailing blindly. :-) [00:18:57] nevermind, apt.wm is configured pretty standard. [00:21:44] Coren: https://wikitech.wikimedia.org/wiki/Git-buildpackage#Add_the_reviewed_package_to_apt.wikmedia.org [00:23:32] p statsd - Stats aggregation daemon [00:23:38] {{done}} [00:25:00] Coren: <3 ! thank you so much [00:25:12] weeee [00:26:51] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [01:45:52] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: No successful Puppet run in the last 10 hours [01:45:52] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [02:08:02] !log LocalisationUpdate completed (1.22wmf14) at Sat Aug 31 02:08:01 UTC 2013 [02:08:09] Logged the message, Master [02:09:46] PROBLEM - DPKG on mw1057 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:14:36] PROBLEM - Ceph on ms-fe1004 is CRITICAL: Ceph HEALTH_ERR 1 pgs inconsistent: 2 scrub errors [02:14:36] PROBLEM - Ceph on ms-fe1001 is CRITICAL: Ceph HEALTH_ERR 1 pgs inconsistent: 2 scrub errors [02:15:16] PROBLEM - Ceph on ms-fe1003 is CRITICAL: Ceph HEALTH_ERR 1 pgs inconsistent: 2 scrub errors [02:16:41] !log LocalisationUpdate completed (1.22wmf15) at Sat Aug 31 02:16:41 UTC 2013 [02:16:47] Logged the message, Master [02:26:21] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 31 02:26:20 UTC 2013 [02:26:26] Logged the message, Master [03:17:23] (CR) Spage: [C: 1] "Thanks, seems like the right idea. Since these rewrite rules are broken right now we might as well try the change ASAP." [operations/puppet] - https://gerrit.wikimedia.org/r/82044 (owner: QChris) [04:04:55] PROBLEM - Puppet freshness on sq36 is CRITICAL: No successful Puppet run in the last 10 hours [04:43:04] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:47:44] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [04:48:34] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.62 ms [05:02:44] PROBLEM - NTP on mw31 is CRITICAL: NTP CRITICAL: Offset unknown [05:04:13] (PS1) Springle: table removed from production [operations/puppet] - https://gerrit.wikimedia.org/r/82078 [05:05:25] (CR) Springle: [C: 2 V: 2] table removed from production [operations/puppet] - https://gerrit.wikimedia.org/r/82078 (owner: Springle) [05:07:44] RECOVERY - NTP on mw31 is OK: NTP OK: Offset -0.00274348259 secs [05:25:00] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [05:48:11] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:33:43] I have an odd question. Could someone tell me what percentage of page views are made by logged-in users? 
[06:35:30] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: No successful Puppet run in the last 10 hours [06:38:05] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: No successful Puppet run in the last 10 hours [06:38:05] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: No successful Puppet run in the last 10 hours [06:43:05] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:55:42] Risker: maybe try #wikimedia-analytics? [06:56:58] thank you legoktm. Ironically, it appears almost everyone in that channel is also in this channel. However, I will try there. [06:57:04] :P [06:57:23] That channel has a lot less bot spam so your question won't get missed [06:58:51] well, to be honest, I have little hope of having the question answered on the Friday night of a long weekend. I'd assume most of the staff are off having a real life. :) [07:27:52] PROBLEM - Puppet freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:08] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [08:21:59] PROBLEM - Puppet freshness on ssl1 is CRITICAL: No successful Puppet run in the last 10 hours [08:27:59] PROBLEM - Puppet freshness on ssl1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:34:59] PROBLEM - Puppet freshness on ssl1008 is CRITICAL: No successful Puppet run in the last 10 hours [08:38:53] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [08:47:53] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: No successful Puppet run in the last 10 hours [08:48:53] PROBLEM - Puppet freshness on amssq47 is CRITICAL: No successful Puppet run in the last 10 hours [08:51:53] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: No successful Puppet run in the last 10 hours [08:51:53] PROBLEM - Puppet freshness on ssl1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:51:53] PROBLEM 
- Puppet freshness on ssl4 is CRITICAL: No successful Puppet run in the last 10 hours [08:54:53] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:54:53] PROBLEM - Puppet freshness on ssl1007 is CRITICAL: No successful Puppet run in the last 10 hours [08:57:53] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: No successful Puppet run in the last 10 hours [08:57:53] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: No successful Puppet run in the last 10 hours [09:00:53] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [09:02:53] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: No successful Puppet run in the last 10 hours [09:03:53] PROBLEM - Puppet freshness on ssl1009 is CRITICAL: No successful Puppet run in the last 10 hours [09:04:53] PROBLEM - Puppet freshness on ssl3 is CRITICAL: No successful Puppet run in the last 10 hours [09:04:53] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: No successful Puppet run in the last 10 hours [09:09:41] PROBLEM - Puppet freshness on ssl2 is CRITICAL: No successful Puppet run in the last 10 hours [10:08:55] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [10:21:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:22:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [10:26:55] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [10:41:57] PROBLEM - DPKG on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:46:03] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [11:46:03] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: No successful Puppet run in the last 10 hours [12:21:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:22:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time [12:27:27] PROBLEM - Backend Squid HTTP on sq37 is CRITICAL: Connection timed out [12:28:27] RECOVERY - Backend Squid HTTP on sq37 is OK: HTTP OK: HTTP/1.0 200 OK - 1258 bytes in 0.054 second response time [12:45:15] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:46:05] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds [12:51:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [13:51:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:52:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [14:05:55] PROBLEM - Puppet freshness on sq36 is CRITICAL: No successful Puppet run in the last 10 hours [14:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [14:52:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.886 
second response time [14:59:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:00:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [15:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.165 second response time [15:25:42] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [15:51:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:52:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [16:06:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:07:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.177 second response time [16:21:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:22:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [16:37:13] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: No successful Puppet run in the last 10 hours [16:38:13] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: No successful Puppet run in the last 10 hours [16:38:13] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: No successful Puppet run in the last 10 hours [16:43:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [17:28:23] PROBLEM - Puppet 
freshness on virt0 is CRITICAL: No successful Puppet run in the last 10 hours [17:47:17] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [18:22:14] PROBLEM - Puppet freshness on ssl1 is CRITICAL: No successful Puppet run in the last 10 hours [18:28:14] PROBLEM - Puppet freshness on ssl1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:35:14] PROBLEM - Puppet freshness on ssl1008 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:57] PROBLEM - Puppet freshness on cp1044 is CRITICAL: No successful Puppet run in the last 10 hours [18:47:57] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: No successful Puppet run in the last 10 hours [18:48:57] PROBLEM - Puppet freshness on amssq47 is CRITICAL: No successful Puppet run in the last 10 hours [18:51:57] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: No successful Puppet run in the last 10 hours [18:51:57] PROBLEM - Puppet freshness on ssl1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:51:57] PROBLEM - Puppet freshness on ssl4 is CRITICAL: No successful Puppet run in the last 10 hours [18:54:57] PROBLEM - Puppet freshness on cp1043 is CRITICAL: No successful Puppet run in the last 10 hours [18:54:57] PROBLEM - Puppet freshness on ssl1007 is CRITICAL: No successful Puppet run in the last 10 hours [18:57:57] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: No successful Puppet run in the last 10 hours [18:57:57] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: No successful Puppet run in the last 10 hours [19:00:57] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [19:02:57] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: No successful Puppet run in the last 10 hours [19:03:57] PROBLEM - Puppet freshness on ssl1009 is CRITICAL: No successful Puppet run in the last 10 hours [19:04:57] PROBLEM - Puppet freshness on ssl3 is CRITICAL: No successful Puppet run in the 
last 10 hours [19:04:57] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: No successful Puppet run in the last 10 hours [19:09:56] PROBLEM - Puppet freshness on ssl2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:51] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [20:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [20:27:51] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [20:32:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:33:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [20:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [20:57:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:00:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.147 second response time [21:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:24:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time [21:30:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.883 second response time [21:40:28] PROBLEM - Puppetmaster HTTPS on stafford is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:42:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [21:45:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:15] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: No successful Puppet run in the last 10 hours [21:46:15] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 10 hours [21:47:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [21:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.696 second response time [22:52:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:53:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [23:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.144 second response time [23:38:50] PROBLEM - DPKG on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM - MySQL disk space on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM - RAID on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM - mysqld processes on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:20] PROBLEM - MySQL Slave Delay on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[23:41:30] PROBLEM - MySQL Recent Restart on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:31] PROBLEM - MySQL Replication Heartbeat on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:31] PROBLEM - MySQL Slave Running on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:31] PROBLEM - Full LVS Snapshot on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:40] PROBLEM - Disk space on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:40] PROBLEM - SSH on db1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:41:41] PROBLEM - MySQL Idle Transactions on db1027 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:44:10] (PS1) Springle: depooling db1027 for investigation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/82110 [23:46:09] (CR) Springle: [C: 2 V: 2] depooling db1027 for investigation [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/82110 (owner: Springle) [23:47:46] !log springle synchronized wmf-config/db-eqiad.php 'depooling db1027' [23:47:53] Logged the message, Master [23:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [23:53:21] PROBLEM - NTP on db1027 is CRITICAL: NTP CRITICAL: No response from NTP server