[00:16:56] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3036 [00:19:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:25:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.896 seconds [00:26:19] "is OK: HTTP OK HTTP/1.1 400 Bad Request" eee [00:27:09] Yeah [00:27:16] It's been explained to me that that's actually right [00:27:21] I filed a ticket about it at some point [00:27:38] ah ok [00:35:13] gn8 folks [00:41:32] damn it! ;) Who broke editing via API at Commons? Deletions also do not return a success message to the browser. [00:41:53] https://bugzilla.wikimedia.org/show_bug.cgi?id=34717 is similar [00:58:09] thanks ;) [00:59:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:01:26] PROBLEM - Host ssl3001 is DOWN: PING CRITICAL - Packet loss = 100% [01:01:26] PROBLEM - Host ssl3003 is DOWN: PING CRITICAL - Packet loss = 100% [01:01:35] PROBLEM - Host wikibooks-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [01:01:36] PROBLEM - Host wikinews-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [01:01:36] and now commons is dead completely [01:01:37] PROBLEM - Host wikipedia-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [01:01:37] PROBLEM - Host wikisource-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [01:01:58] 502 Bad Gateway [01:01:58] nginx/0.7.65 [01:02:02] RECOVERY - Host ssl3003 is UP: PING OK - Packet loss = 0%, RTA = 120.02 ms [01:02:02] RECOVERY - Host ssl3001 is UP: PING OK - Packet loss = 0%, RTA = 118.80 ms [01:02:11] RECOVERY - Host wikibooks-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 119.89 ms [01:02:20] RECOVERY - Host wikipedia-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 118.96 ms [01:02:34] back again [01:02:37] messy! 
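A note on the "is OK: HTTP OK HTTP/1.1 400 Bad Request" exchange above: the Puppetmaster HTTPS check only verifies that the puppet master answers over TLS within its 10-second window. A bare GET without a client certificate is not a valid puppet request, so the master legitimately replies 400 Bad Request, and any HTTP response at all counts as "up"; only a socket timeout goes critical. A minimal sketch of that logic, assuming the conventional puppet master port 8140 (the production check is Nagios' check_http; the host and port here are illustrative):

    import http.client
    import ssl

    def puppetmaster_https_up(host="stafford", port=8140, timeout=10.0):
        """Return True if the puppet master answers HTTPS at all, even with a 400."""
        ctx = ssl.create_default_context()
        ctx.check_hostname = False      # the probe does not validate the puppet CA
        ctx.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=ctx)
        try:
            conn.request("GET", "/")    # not a valid puppet URL, so a 400 reply is expected
            conn.getresponse()          # any HTTP status proves the service is up
            return True
        except OSError:                 # socket timeout, connection refused, TLS failure
            return False
        finally:
            conn.close()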
[01:02:38] RECOVERY - Host wikisource-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 118.06 ms [01:02:56] RECOVERY - Host wikinews-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 119.19 ms [01:05:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.889 seconds [01:07:17] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [01:17:20] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [01:18:14] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours [01:18:14] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [01:19:17] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [01:19:17] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [01:19:17] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [01:22:17] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [01:22:17] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [01:22:17] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours [01:22:17] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [01:23:11] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 17.3815615179 (gt 8.0) [01:23:20] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours [01:26:20] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [01:26:20] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [01:31:26] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.371558125 [01:32:02] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.00294721739 (gt 8.0) [01:36:14] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [01:38:02] RECOVERY - Packetloss_Average on emery is OK: OK: packet_loss_average is 0.287232807018 [01:40:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:44:20] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [01:45:14] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [01:46:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.177 seconds [01:55:17] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [01:55:17] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [01:58:20] is DNS down? [01:58:30] Didn't nagios report it coming back up? [01:58:44] okay, this is severe [01:59:16] or is it the cache [02:00:04] why don't you just say what's actually wrong? 
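For context on the Packetloss_Average alerts on locke and emery above: the check averages recent per-interval packet-loss percentages from the log hosts and goes critical when that average exceeds 8.0, as in the 17.38 reading followed by the 0.37 recovery. A minimal sketch of the thresholding, with made-up sample values:

    def packet_loss_state(samples, critical_threshold=8.0):
        """Average recent loss percentages and compare against the alert threshold."""
        average = sum(samples) / len(samples)
        if average > critical_threshold:
            return f"CRITICAL: packet_loss_average is {average} (gt {critical_threshold})"
        return f"OK: packet_loss_average is {average}"

    print(packet_loss_state([17.2, 18.1, 16.8]))  # critical, like the 17.38 reading
    print(packet_loss_state([0.3, 0.4, 0.4]))     # ok, like the 0.37 recovery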
[02:01:17] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [02:01:17] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [02:01:17] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours [02:01:17] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [02:01:17] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [02:01:17] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [02:01:18] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours [02:01:18] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [02:01:19] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [02:01:19] PROBLEM - Puppet freshness on knsq25 is CRITICAL: Puppet has not run in the last 10 hours [02:01:20] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours [02:05:35] nevermind [02:05:37] its back [02:05:45] ... [02:05:50] anyway, the question I should be asking, whats wrong [02:10:17] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [02:10:17] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [02:10:17] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [02:10:17] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Puppet has not run in the last 10 hours [02:10:17] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [02:10:17] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [02:10:18] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Puppet has not run in the last 10 hours [02:10:18] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [02:10:19] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [02:10:19] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [02:10:20] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [02:10:20] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [02:10:21] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [02:10:21] PROBLEM - Puppet freshness on ssl3004 is CRITICAL: Puppet has not run in the last 10 hours [02:10:50] nagios-wm: library voices? 
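The "Puppet freshness" alerts flooding in here are a staleness check rather than a failure of the moment: a host goes critical once its last recorded puppet run is more than 10 hours old. A minimal sketch of the classification, assuming the last-run timestamp has already been collected from the agent:

    import time

    FRESHNESS_LIMIT = 10 * 3600  # "Puppet has not run in the last 10 hours"

    def puppet_freshness(host, last_run_epoch, now=None):
        """Flag a host whose last recorded puppet run is older than the freshness limit."""
        now = time.time() if now is None else now
        if now - last_run_epoch > FRESHNESS_LIMIT:
            return f"PROBLEM - Puppet freshness on {host} is CRITICAL: Puppet has not run in the last 10 hours"
        return f"Puppet freshness on {host} is OK"

    print(puppet_freshness("amssq31", last_run_epoch=time.time() - 11 * 3600))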
[02:11:20] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [02:11:20] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [02:11:20] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [02:11:20] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [02:11:20] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [02:11:20] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [02:14:20] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [02:14:21] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [02:14:21] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [02:14:22] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [02:15:14] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [02:16:17] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [02:16:17] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [02:17:20] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours [02:17:20] PROBLEM - Puppet freshness on amssq61 is CRITICAL: Puppet has not run in the last 10 hours [02:17:20] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [02:18:03] !log LocalisationUpdate completed (1.19) at Tue Mar 13 02:18:03 UTC 2012 [02:18:08] Logged the message, Master [02:18:17] finally a useful bot ;) [02:19:17] PROBLEM - Puppet freshness on knsq20 is CRITICAL: Puppet has not run in the last 10 hours [02:19:32] jeremyb, do you call useful only bots bringing good news? 
[02:19:53] Nemo_bis: ask me in 24 hrs [02:20:03] hmm [02:20:08] * Nemo_bis just goes to bed [02:21:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.789 seconds [03:10:16] good night [06:33:22] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [06:37:16] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [06:41:19] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [06:46:48] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [06:56:51] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [07:40:36] PROBLEM - Puppet freshness on mw53 is CRITICAL: Puppet has not run in the last 10 hours [08:34:15] New patchset: ArielGlenn; "add 10.64.16 to hosts for common/httpdconf sync" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3106 [08:34:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3106 [08:35:47] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3106 [08:35:50] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3106 [09:26:25] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:28:22] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [09:30:32] Morning [09:30:37] Problems this morning people? [09:32:26] Qcoder00: what problem? [09:32:38] Slow downloading of page styles [09:36:37] PROBLEM - Host db1040 is DOWN: PING CRITICAL - Packet loss = 100% [09:38:43] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [09:38:43] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [09:51:37] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:34] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [10:34:11] New patchset: Mark Bergsma; "Do HTCP loss monitoring on the upload eqiad servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3107 [10:34:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3107 [10:34:48] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3107 [10:34:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3107 [10:49:20] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 283 MB (3% inode=61%): /var/lib/ureadahead/debugfs 283 MB (3% inode=61%): [10:49:25] New patchset: Mark Bergsma; "include nagios::configuration so $master_hosts can be referenced" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3108 [10:49:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3108 [10:49:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3108 [10:49:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3108 [10:51:26] RECOVERY - Disk space on srv220 is OK: DISK OK [10:55:22] New patchset: Mark Bergsma; "Fix varnishhtcpd path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3109 [10:55:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3109 [10:55:48] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3109 [10:55:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3109 [11:08:50] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [11:18:53] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [11:19:47] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [11:19:47] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours [11:20:50] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [11:20:50] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [11:20:50] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [11:23:50] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [11:23:50] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [11:23:50] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours [11:23:50] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [11:24:53] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours [11:27:53] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [11:27:53] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [11:32:16] New patchset: Mark Bergsma; "Try a dynamic lookup, global is not working" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3110 [11:32:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3110 [11:32:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3110 [11:32:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3110 [11:37:47] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [11:40:19] New patchset: Mark Bergsma; "Install socat for unicast->multicast relaying" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3111 [11:40:31] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3111 [11:45:53] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [11:46:47] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [11:47:23] New patchset: Mark Bergsma; "Migrate CDN logging to our GLOP multicast address range" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3112 [11:47:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3112 [11:48:05] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3111 [11:48:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3111 [11:48:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3112 [11:48:38] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3112 [11:56:41] New patchset: Mark Bergsma; "Subscribe to upstart job changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3113 [11:56:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3113 [11:57:04] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3113 [11:57:07] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3113 [13:05:03] does anyone still work on WP:XFF, or does anyone still know if it is more-or-less maintained still? [13:31:20] any of the just joiners know if the XFF project is still alive? Im hoping someone can check if Telecom NZ sends proper XFF headers, and if so, if it can be whitelisted [13:31:30] New patchset: Mark Bergsma; "Swift response times are problematic, request only from Squid for now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3114 [13:31:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3114 [13:31:47] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3114 [13:31:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3114 [13:50:05] New patchset: ArielGlenn; "hash of all subnets in network constants" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3115 [13:50:15] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/3115 [14:42:26] New patchset: Hashar; "hash of all subnets in network constants" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3115 [14:42:38] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3115 [15:13:28] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [15:13:28] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [15:14:40] PROBLEM - Host db1020 is DOWN: PING CRITICAL - Packet loss = 100% [15:15:29] New patchset: Mark Bergsma; "Fix LVS setup of payments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3116 [15:15:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3116 [15:17:30] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3116 [15:17:33] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3116 [15:22:37] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, sessions up: 7, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [15:24:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.710 seconds [15:29:04] PROBLEM - Host lvs1005 is DOWN: PING CRITICAL - Packet loss = 100% [15:30:34] RECOVERY - Host lvs1005 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [15:31:01] RECOVERY - BGP status on cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 8, down: 0, shutdown: 0 [15:31:46] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [15:32:40] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.148 seconds response time. www.wikipedia.org returns 208.80.154.225 [15:36:25] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.92723975 (gt 8.0) [15:47:38] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 235 MB (3% inode=61%): /var/lib/ureadahead/debugfs 235 MB (3% inode=61%): [15:50:47] RECOVERY - Disk space on srv219 is OK: DISK OK [15:52:17] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.7209579167 (gt 8.0) [15:54:26] New patchset: Lcarr; "Cleaning up icinga config Moved files from nagios3 directory, notify proper service, etc" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3117 [15:54:38] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3117 [15:58:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:01:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.398 seconds [16:02:41] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3117 [16:02:43] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3117 [16:05:29] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 280 MB (3% inode=61%): /var/lib/ureadahead/debugfs 280 MB (3% inode=61%): [16:16:23] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: NRPE: Unable to read output [16:16:23] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 0 MB (0% inode=0%): [16:16:23] PROBLEM - Disk space on ms1002 is CRITICAL: DISK CRITICAL - free space: /export/upload 62299 MB (0% inode=87%): [16:16:44] PROBLEM - Memcached on marmontel is CRITICAL: Connection refused [16:16:51] PROBLEM - MySQL Replication Heartbeat on db49 is CRITICAL: NRPE: Unable to read output [16:16:59] PROBLEM - Memcached on srv254 is CRITICAL: Connection refused [16:16:59] PROBLEM - mysqld processes on db1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [16:17:08] PROBLEM - RAID on virt1 is CRITICAL: CRITICAL: Degraded [16:17:26] PROBLEM - mysqld processes on db56 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [16:17:26] PROBLEM - MySQL replication status on es1002 is CRITICAL: (Return code of 255 is out of bounds) [16:17:35] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No [16:17:35] PROBLEM - MySQL master status on es1001 is CRITICAL: CRITICAL: Read only: expected OFF, got ON [16:17:44] PROBLEM - mysqld processes on db1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [16:17:44] PROBLEM - Disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 6895 MB (0% inode=99%): [16:17:44] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 403 Forbidden [16:17:44] PROBLEM - MySQL slave status on es1002 is CRITICAL: CRITICAL: Lost connection to MySQL server at reading initial communication packet, system error: 111 [16:17:44] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:17:53] PROBLEM - LDAP on nfs1 is CRITICAL: Connection refused [16:17:53] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:18:02] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 6894 MB (0% inode=99%): [16:18:02] PROBLEM - MySQL Replication Heartbeat on db48 is CRITICAL: NRPE: Unable to read output [16:18:02] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 0 MB (0% inode=0%): [16:18:11] PROBLEM - LDAPS on nfs1 is CRITICAL: Connection refused [16:18:20] PROBLEM - Backend Squid HTTP on knsq25 is CRITICAL: Connection refused [16:19:23] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 80 MB (1% inode=61%): /var/lib/ureadahead/debugfs 80 MB (1% inode=61%): [16:21:29] RECOVERY - Disk space on srv220 is OK: DISK OK [16:22:32] RECOVERY - Disk space on ms1002 is OK: DISK OK [16:23:20] !log stole some free space from the phys volume on ms1002 to give us more time for the rsync to keep 
going til after the move to swift etc [16:23:24] Logged the message, Master [16:24:20] PROBLEM - Lucene on searchidx1001 is CRITICAL: Connection refused [16:36:20] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.6114966667 (gt 8.0) [16:36:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:39:11] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [16:40:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.863 seconds [16:43:14] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [16:46:05] !log preilly synchronized wmf-config/InitialiseSettings.php 'add ZeroRatedMobileAccess extension to mswiki remove from mywiki' [16:46:08] Logged the message, Master [16:52:41] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.79082958333 (gt 8.0) [16:57:04] * apergos looks at the time [16:58:59] !log preilly synchronized wmf-config/InitialiseSettings.php 'add disable images option for mswiki on zero domain' [16:59:02] Logged the message, Master [16:59:36] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [16:59:39] !log preilly synchronized wmf-config/CommonSettings.php 'add disable images option for mswiki on zero domain' [16:59:42] Logged the message, Master [16:59:43] !log add disable images support to mswiki under zero domain [16:59:46] Logged the message, Master [17:00:52] New patchset: Jgreen; "pgehres storage3 shell access per RT 2610" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3118 [17:01:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3118 [17:01:31] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3118 [17:01:33] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3118 [17:05:54] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero' [17:05:57] Logged the message, Master [17:14:59] * Karol007 is away: I'm working or sleeping [17:15:57] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.772 seconds [17:16:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:01] this should be a good time to catch a few people around who are interested in dumps? if there is anyone? [17:20:13] * apergos looks around [17:20:39] sees none of the usual suspects [17:20:42] is disappointed... [17:22:15] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.049 seconds [17:23:53] New patchset: Bhartshorne; "changing lvs and nagios to check for a file in swift directly rather than going through the swift rewrite stuff for thumbnails to protect against the thumbnail getting deleted (second try)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3119 [17:24:05] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3119 [17:24:13] Change abandoned: Bhartshorne; "retried in change 3119" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3036 [17:28:06] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.87275554622 (gt 8.0) [17:30:09] !log reedy synchronized wmf-config/CommonSettings.php 'Bug 35183 - p include extensions/Renameuser/Renameuser.php instead of extensions/Renameuser/SpecialRenameuser.php' [17:30:14] Logged the message, Master [17:32:00] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3119 [17:32:02] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3119 [17:32:36] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.429 seconds [17:32:36] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.614 seconds [17:39:03] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:39:03] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:41:54] PROBLEM - Puppet freshness on mw53 is CRITICAL: Puppet has not run in the last 10 hours [17:47:27] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.710 seconds [17:47:27] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.719 seconds [17:50:26] RECOVERY - Host db1040 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms [17:50:53] PROBLEM - NTP on db1040 is CRITICAL: NTP CRITICAL: Offset unknown [17:51:38] PROBLEM - MySQL Replication Heartbeat on db1040 is CRITICAL: CRIT replication delay 29891 seconds [17:52:02] New patchset: Lcarr; "Reenabling icinga install on neon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3120 [17:52:14] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/3120 [17:53:35] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:44] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:56] RECOVERY - NTP on db1040 is OK: NTP OK: Offset 0.003578186035 secs [17:55:41] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.108 seconds [17:56:17] PROBLEM - MySQL Slave Running on db1040 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [17:56:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:57:46] New patchset: Lcarr; "Reenabling icinga install on neon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3120 [17:57:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3120 [17:58:19] New patchset: RobH; "updated ipmi script to work a bit better, added iron into site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3121 [17:58:32] New patchset: RobH; " left out one tiny change" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3122 [17:58:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3121 [17:58:44] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3122 [17:59:11] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3120 [17:59:14] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3120 [18:00:44] New review: RobH; "easy changes to a server no one is using yet and a script i wrote anyhow" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3121 [18:00:47] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3121 [18:01:33] New review: RobH; "updated in script help prompts" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3122 [18:01:36] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3122 [18:02:08] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:02:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.064 seconds [18:04:23] RECOVERY - MySQL Slave Running on db1040 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [18:09:56] PROBLEM - MySQL Slave Delay on db1040 is CRITICAL: CRIT replication delay 27764 seconds [18:23:08] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.152 seconds [18:23:08] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.149 seconds [18:29:35] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:35] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:40] New patchset: Lcarr; "inserting icinga config file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3123 [18:29:53] New patchset: Lcarr; "adding config files into git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3124 [18:30:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3123 [18:30:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3124 [18:30:17] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3123 [18:30:19] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3123 [18:31:57] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3124 [18:32:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3124 [18:37:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:40:32] RECOVERY - MySQL Replication Heartbeat on db1040 is OK: OK replication delay 0 seconds [18:40:32] RECOVERY - MySQL Slave Delay on db1040 is OK: OK replication delay 0 seconds [18:43:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.905 seconds [18:50:13] New patchset: Pyoungmeister; "using these would be smart!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3125 [18:50:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3125 [18:51:50] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3125 [18:51:52] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3125 [18:54:47] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.765 seconds [18:55:06] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/3115 [19:02:24] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:07:49] More people getting the mobile edition when opening http://blog.wikimedia.org/ in a normal webbrowser? Is this intentional? [19:08:30] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero' [19:08:34] Logged the message, Master [19:08:49] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.i18n.php 'changes for zero' [19:08:52] !log pushing changes for zero to mswiki [19:08:52] Logged the message, Master [19:08:55] Logged the message, Master [19:12:45] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.300 seconds [19:15:17] New patchset: Lcarr; "Putting service definition after files installed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3126 [19:15:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3126 [19:15:43] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3126 [19:15:45] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3126 [19:18:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:19:03] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.376 seconds [19:19:12] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:23:06] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.2599349167 (gt 8.0) [19:24:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.746 seconds [19:25:21] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:25:30] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.031 seconds [19:27:07] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero needed for carrier testing' [19:27:10] Logged the message, Master [19:31:48] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.239 seconds [19:37:30] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.98531258333 (gt 8.0) [19:38:06] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:38:06] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:40:03] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [19:40:48] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 132 MB (1% inode=61%): /var/lib/ureadahead/debugfs 132 MB (1% inode=61%): [19:44:24] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.706 seconds 
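The recurring srv219/srv220 disk-space flaps, like the 132 MB critical above, are a root-filesystem free-space check bouncing around its threshold and recovering a few minutes later. A minimal sketch of such a check; the production plugin is Nagios' check_disk, and the 250 MB floor below is an illustrative assumption rather than the real threshold:

    import shutil

    def disk_space_state(path="/", critical_free_mb=250):
        """Report free space on a filesystem, critical when it falls below a small floor."""
        usage = shutil.disk_usage(path)
        free_mb = usage.free // (1024 * 1024)
        free_pct = 100 * usage.free // usage.total
        state = "DISK CRITICAL" if free_mb < critical_free_mb else "DISK OK"
        return f"{state} - free space: {path} {free_mb} MB ({free_pct}%)"

    print(disk_space_state("/"))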
[19:50:42] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:32] RECOVERY - Disk space on srv219 is OK: DISK OK [20:01:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:05:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.677 seconds [20:07:02] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 2.60560366667 [20:08:59] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.894 seconds [20:09:26] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.022 seconds [20:14:11] New patchset: Lcarr; "Trying to ignore this as a requirement" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3127 [20:14:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3127 [20:15:44] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:17:23] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:26] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.10203858333 (gt 8.0) [20:23:41] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.177 seconds [20:29:59] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:00] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.i18n.php 'changes for zero' [20:30:03] Logged the message, Master [20:34:02] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.934 seconds [20:34:38] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.327 seconds [20:37:47] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.557444 [20:38:26] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3127 [20:38:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3127 [20:39:26] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 131 MB (1% inode=61%): /var/lib/ureadahead/debugfs 131 MB (1% inode=61%): [20:40:20] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:40:56] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:42:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:48:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.373 seconds [20:49:56] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Tue Mar 13 20:49:34 UTC 2012 [20:50:41] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.428 seconds [20:51:53] RECOVERY - Disk space on srv219 is OK: DISK OK [20:52:07] * Karol007 is back (gone 03:37:08) [20:57:08] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:01:48] New patchset: Lcarr; "trying another commenting out" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3128 [21:02:00] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3128 [21:02:38] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3128 [21:02:41] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3128 [21:05:23] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.370 seconds [21:10:05] RECOVERY - Host cp1036 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms [21:11:17] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:17:44] RECOVERY - mysqld processes on db56 is OK: PROCS OK: 1 process with command name mysqld [21:19:50] !log started slaving db56 from db37 [21:19:53] Logged the message, Master [21:20:35] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.972 seconds [21:20:53] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [21:21:47] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.857 seconds [21:22:50] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [21:23:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:35] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 2591 seconds [21:24:21] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 2293 seconds [21:25:50] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [21:25:50] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [21:25:50] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [21:27:02] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:05] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:29:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.035 seconds [21:29:44] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 0 seconds [21:29:53] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [21:29:53] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [21:30:20] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 0 seconds [21:31:33] !log asher synchronized wmf-config/db.php 'replacing db18 with new s7 slave db56' [21:31:36] Logged the message, Master [21:36:25] New patchset: Asher; "making sync_binlog=1 the default for prod dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3130 [21:36:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3130 [21:39:38] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.892 seconds [21:39:47] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [21:40:32] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.567 seconds [21:44:05] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.i18n.php 'changes for zero' [21:44:08] Logged the message, Master [21:44:24] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero' [21:44:27] Logged the message, Master [21:46:59] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:48:02] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:48:47] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [21:48:56] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.665 seconds [21:55:23] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:59:35] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.953 seconds [22:04:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:04:41] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.566 seconds [22:05:53] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:08:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.035 seconds [22:13:14] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:17:27] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.788 seconds [22:23:14] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:23:29] New patchset: Reedy; "Switch foreachwikiindblist to use MWScript.php" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3131 [22:23:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3131 [22:25:38] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.401 seconds [22:27:17] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 4.377 seconds [22:27:23] !log awjrichards synchronized wmf-config/InitialiseSettings.php 'Removing moile URL template for tewtwiki' [22:27:26] Logged the message, Master [22:28:07] Reedy: so you have the push hook working? [22:28:17] not from windows ;) [22:28:51] where is the curl version code? [22:29:47] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/3131 [22:31:52] curl version code? [22:33:05] * AaronSchulz thought there was a version of the hook that used curl [22:35:59] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:40:20] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:41:33] New patchset: Bhartshorne; "removed extra slash from squid purge URLs. purge was generating http://upload...//wikipe... rather than http://upload.../wikipe..., causing the purge to fail (silently)." 
[operations/software] (master) - https://gerrit.wikimedia.org/r/3132 [22:42:24] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3132 [22:42:26] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/3132 [22:42:32] !log awjrichards synchronized php/extensions/MobileFrontend/api/ApiQueryExcerpts.php 'r113774' [22:42:35] Logged the message, Master [22:43:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:43:39] !log awjrichards synchronized php/extensions/MobileFrontend/templates/ApplicationTemplate.php 'r113771' [22:43:42] Logged the message, Master [22:44:04] !log awjrichards synchronized php/extensions/MobileFrontend/stylesheets/common.css 'r113774' [22:44:07] Logged the message, Master [22:44:45] !log awjrichards synchronized php/extensions/MobileFrontend/stylesheets/beta_common.css 'r113774' [22:44:48] Logged the message, Master [22:45:32] New patchset: Bhartshorne; "swiftcleaner calls htcp.php. may as well install it along side swiftcleaner." [operations/software] (master) - https://gerrit.wikimedia.org/r/3133 [22:46:19] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3133 [22:46:21] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/3133 [22:47:31] !log awjrichards synchronized php/extensions/MobileFrontend/templates/ApplicationTemplate.php 'r113779' [22:47:34] Logged the message, Master [22:49:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.817 seconds [22:56:59] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.907 seconds [22:56:59] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.253 seconds [23:03:17] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:03:26] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:09:45] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.806 seconds [23:14:11] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero needed for carrier testing' [23:14:14] Logged the message, Master [23:17:50] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.088 seconds [23:24:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:20] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:31:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.441 seconds [23:34:35] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 4.862 seconds [23:42:10] !log reedy synchronized php-1.19/resources/jquery/jquery.textSelection.js 'r113786' [23:42:13] Logged the message, Master [23:47:38] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:49:27] New patchset: Bhartshorne; "first draft of the swift cleaner stuff. I know this doesn't work but I want to check it in for reviews." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3134 [23:49:40] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3134 [23:53:02] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:57] New patchset: Bhartshorne; "first draft of the swift cleaner stuff. I know this doesn't work but I want to check it in for reviews." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3134 [23:54:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3134
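On the purge fix merged earlier (change 3132, "removed extra slash from squid purge URLs"): caches key on the exact request URL, so a purge sent for http://host//path never matches the object cached under http://host/path, and the purge misses without raising any error. The usual remedy is to normalize the join of base URL and object path. A minimal sketch, with illustrative URLs rather than the exact ones from the change:

    def purge_url(base, path):
        """Join a purge base URL and an object path without producing a double slash."""
        return base.rstrip("/") + "/" + path.lstrip("/")

    # Naive concatenation of a base ending in "/" with a path starting with "/" yields
    # ".../upload//wikipedia/...", which the cache treats as a different object, so the
    # purge silently does nothing.
    print(purge_url("http://upload.example.org/", "/wikipedia/commons/thumb/Example.jpg/120px-Example.jpg"))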