[00:04:46] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 265 seconds
[00:05:04] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 268 seconds
[00:09:16] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds
[00:09:25] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds
[00:28:10] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[00:33:52] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 235 seconds
[00:34:10] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 252 seconds
[00:36:07] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[00:37:01] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds
[00:37:01] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds
[00:37:10] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[00:40:46] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[00:44:40] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:45:07] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time
[00:52:46] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.261 seconds
[01:31:02] preilly: I haven't been to ones by them in specific, but yeah, I'm always interested in that sort of thing
[01:31:15] And since it's at Atlassian, I'd totally be down to go.
[01:31:53] oh. it was today. nm. heh.
[01:36:16] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[01:41:04] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 240 seconds
[01:45:25] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[01:49:10] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds
[01:50:04] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 50s
[01:52:37] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[01:54:55] dschoon: ha ha
[02:10:55] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[02:13:20] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.174 seconds
[02:23:41] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.201 seconds
[02:39:17] RECOVERY - Puppet freshness on bellin is OK: puppet ran at Thu Jun 7 02:38:53 UTC 2012
[03:28:22] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:30:19] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:31:13] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.110 seconds
[03:31:40] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 0.111 seconds
[03:35:34] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[03:37:13] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.179 seconds
[03:53:52] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[04:01:50] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[04:19:23] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.163 seconds
[04:30:59] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[04:37:23] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.175 seconds
[05:31:14] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[05:38:26] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.176 seconds
[05:42:56] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[05:49:30] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds
[06:19:57] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[06:28:48] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[06:40:43] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.187 seconds
[06:53:12] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[06:55:27] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[06:57:42] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[06:59:12] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[07:08:03] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.055 second response time
[07:15:24] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[07:22:09] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.164 seconds
[07:36:16] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:36:16] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[07:36:16] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[07:40:55] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.165 seconds
[08:10:19] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[08:15:34] RECOVERY - Host db1047 is UP: PING OK - Packet loss = 0%, RTA = 26.38 ms
[08:20:14] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 58842 seconds
[08:20:31] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 58832 seconds
[08:23:31] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.179 seconds
[08:39:39] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[08:41:09] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[08:42:30] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.171 seconds
[08:47:54] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay NULL seconds
[08:50:54] PROBLEM - MySQL Slave Running on db1047 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Incorrect key file for table ./enwiki/aft_article_filter_coun
[09:03:03] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[09:23:27] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds
[10:39:29] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[10:43:50] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.177 seconds
[10:45:56] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:24:29] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
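The MySQL Slave Delay, Replication Heartbeat, and Slave Running checks above all read the slave's replication state. A minimal shell sketch of how such a lag probe can work, assuming the standard mysql client; the defaults-file path is hypothetical, and note that Seconds_Behind_Master reads NULL when the SQL thread is stopped, which matches the db1047 "replication delay NULL seconds" line above:

    #!/bin/sh
    # Hypothetical sketch of a replication-lag probe like the
    # "MySQL Slave Delay" check in this log; not the actual check source.
    lag=$(mysql --defaults-file=/etc/mysql/nagios.cnf \
              -e 'SHOW SLAVE STATUS\G' |
          awk '/Seconds_Behind_Master/ {print $2}')
    echo "replication delay ${lag} seconds"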
[11:25:59] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds
[11:36:56] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[11:44:06] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 185 seconds
[11:44:42] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 195 seconds
[11:52:48] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[11:57:09] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.160 seconds
[11:57:27] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:57:45] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 0 seconds
[11:58:21] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[11:58:30] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 0 seconds
[12:00:18] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[12:15:45] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.177 seconds
[12:19:03] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[12:27:36] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.168 seconds
[12:49:34] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[12:56:46] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[12:58:25] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.176 seconds
[13:17:01] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.191 seconds
[13:17:28] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[13:29:01] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds
[13:45:04] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[13:47:16] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.180 seconds
[13:48:37] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[14:00:19] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds
[14:03:10] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[14:13:49] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[14:16:31] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[14:18:19] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.370 seconds
[14:30:46] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.163 seconds
[14:41:52] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[14:45:10] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[14:47:35] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.236 seconds
[14:52:07] hi guys, can someone approve this for me?
[14:52:08] https://gerrit.wikimedia.org/r/#/c/9627/
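The swift-container-auditor alerts in this log come from a process-count check: it goes CRITICAL when no process matches the given argument regex and recovers at exactly one. A minimal sketch of such a probe, assuming the stock check_procs plugin; the plugin path and the 1:1 threshold are assumptions, not taken from the log:

    #!/bin/sh
    # Hypothetical reconstruction of the swift-container-auditor check.
    # check_procs prints the "PROCS CRITICAL: 0 processes with regex args ..."
    # lines seen in this log when the count falls outside the -c range.
    /usr/lib/nagios/plugins/check_procs -c 1:1 \
        --ereg-argument-array='^/usr/bin/python /usr/bin/swift-container-auditor'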
[14:57:31] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:01:52] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.163 seconds
[15:02:48] !b 37089
[15:02:49] https://bugzilla.wikimedia.org/37089
[15:03:11] 05 15:49:54 mark: i think there was also a second block? either squid or mediawiki config or something
[15:03:14] 05 15:50:12 squid common-acls.conf , marked with 20120522 atg and reason I need to go
[15:03:45] someone want to remove that?
[15:06:12] damn you gerrit. why can't I do everything with the keyboard? also your perf kinda sucks
[15:06:50] although it's nice that everything or nearly everything could be done by ssh interface. so i could write my own UI... ;P
[15:08:41] New review: Jeremyb; "good enough for now" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/9627
[15:09:02] ottomata: ^. you need someone else still obviously ;)
[15:09:14] yeah, still waiting on that one :/
[15:09:16] thanks though
[15:09:40] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:10:52] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[15:14:55] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[15:18:04] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.367 seconds
[15:23:33] New patchset: Ryan Lane; "Use version 115488 of the ldap tools." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10571
[15:23:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10571
[15:24:02] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10571
[15:24:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10571
[15:32:28] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.164 seconds
[15:40:08] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[15:42:13] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 24.6341522727 (gt 8.0)
[15:44:01] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[15:44:37] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.0
[15:48:40] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.175 seconds
[16:03:05] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.160 seconds
[16:08:47] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[16:20:20] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.202 seconds
[16:39:14] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[16:39:32] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[16:49:35] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.183 seconds
[16:52:26] !log added ldap automount entries for /public/datasets and /public/keys
[16:52:31] Logged the message, Master
[17:04:20] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds
[17:07:11] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[17:11:09] New patchset: Ryan Lane; "Adding /public" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10582
[17:11:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10582
[17:11:45] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10582
[17:11:48] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10582
[17:20:14] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.170 seconds
[17:37:11] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[17:37:11] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:37:11] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[17:37:47] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[17:50:59] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.178 seconds
[18:06:53] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[18:08:42] Ryan_Lane: Hey remember that node box I talked to you about a few weeks ago?
[18:08:49] yep
[18:08:57] I'm gonna need it soon, do we have a misc box lying around for that?
[18:09:02] I have no clue
[18:09:09] I have cadmium but it's in eqiad
[18:09:12] robh would be the person to talk to about that
[18:09:17] <^demon> Hmm, formey? Nothing important there anymore ;-)
[18:09:21] preilly said I could borrow one of his WPzero boxes
[18:09:27] ^demon: Is formey in Tampa?
[18:09:33] ^demon: formey is a replica of manganese
[18:09:35] <^demon> That it is, but I was joking :p
[18:09:47] it also still hosts svn
[18:10:05] Really I'm just looking for a box to borrow for a few weeks while I go through the formal process of procuring a machine etc
[18:10:43] robh is really the guy for this
[18:10:46] OK
[18:10:55] Is he working this week?
[18:11:02] I think he's on vacation
[18:11:11] I was afraid of that
[18:11:30] Ops Corner is empty, and the remote ops are all on vacation or resting up after working round the clock for IPv6 day
[18:11:53] I also worked IPv6 day, I just didn't rest today
[18:11:57] heh
[18:12:05] You're the only ops person I've seen since Sunday
[18:12:10] Well no that's not true
[18:12:17] Mark and Faidon busted their asses for 6/6
[18:12:22] Ryan is just omnipresent
[18:12:56] That's because mark is awesome, pretty sure you have to be awesome (and mental) to work in ops
[18:13:54] Oh don't get me wrong, I love our ops people
[18:14:04] But it's inconvenient that they're all gone at the same time
[18:14:51] <^demon> RoanKattouw: Could labs work until real hardware exists? I have no clue, but just throwing that out there.
[18:15:29] Do you have any idea how slow labs is right now?
[18:15:46] yeah, please don't do that :)
[18:16:11] <^demon> Just throwing it out there :p
[18:16:35] ok. off to dinner
[18:16:38] Can we pioneer container based datacenters, each project gets a container full of blades? :D
[18:16:39] Enjoy
[18:16:48] I'm probably just gonna use cadmium for now
[18:16:54] <^demon> Ryan_Lane: Enjoy. I want you back on a later timezone than me though :p
[18:17:00] :D
[18:17:01] Even though it's in eqiad
[18:17:09] I like it here. I get so much more work done!
[18:17:37] <^demon> Honestly, I'm going to miss EST, being able to hit Europe & SF with no difficulty.
[18:20:30] Yeah Eastern time must be nice
[18:20:47] I like being able to work at a normal hour but I can't really reach Europeans well any more
[18:21:01] <^demon> RoanKattouw: https://bugzilla.wikimedia.org/show_bug.cgi?id=37083 lol.
[18:21:34] pff
[18:21:49] That's disappointing
[18:21:52] <^demon> jgit is silly
[18:22:05] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.204 seconds
[18:24:58] <^demon> RoanKattouw, AaronSchulz: I'm going to move hetdeploy to git. How does operations/mediawiki-hetdeploy sound? or mediawiki-multiversion? Was thinking similar to mediawiki-config.
[18:25:25] the multiversion thing sounds ok
[18:25:39] <^demon> mediawiki-multiversion, mmk. Any objections anyone else?
[18:26:51] New patchset: SPQRobin; "bug 37391 - Install Translate extension on be.wikimedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10593
[18:26:58] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10593
[18:27:01] <^demon> Ok, svn now r/o for it
[18:35:58] <^demon> Ok, pushed to gerrit.
[18:41:44] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[18:46:15] New review: Thehelpfulone; "Looks okay to me, if you want, the bug for Wikimania 2013 wiki was 36477, it was requested as part o..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10593
[18:58:13] Logged the message, Master
[19:06:20] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[19:09:38] PROBLEM - Lucene disk space on search32 is CRITICAL: Connection refused by host
[19:14:08] PROBLEM - Lucene on search32 is CRITICAL: Connection refused
[19:28:21] PROBLEM - NTP on search32 is CRITICAL: NTP CRITICAL: No response from NTP server
[19:47:33] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[19:50:51] RECOVERY - Lucene disk space on search32 is OK: DISK OK
[19:51:00] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms
[19:51:45] RECOVERY - Lucene on search32 is OK: TCP OK - 0.010 second response time on port 8123
[20:12:27] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[20:25:30] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.200 seconds
[20:26:38] New patchset: Kaldari; "Turning on LastModified and E3Experiments on en.wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10660
[20:26:44] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10660
[20:27:40] New review: Kaldari; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10660
[20:27:42] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10660
[20:47:01] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[21:38:01] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[21:55:06] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[21:57:57] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.196 seconds
[22:02:18] New patchset: Catrope; "Pipe AFT logging data through a demuxer." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10669
[22:02:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10669
[22:24:12] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[22:28:33] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.197 seconds
[22:36:39] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[22:37:33] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[22:39:57] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms
[22:40:24] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.187 seconds
[22:51:58] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[23:00:31] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.196 seconds
[23:02:38] Ryan_Lane: ping
[23:02:42] New patchset: preilly; "add opera ip addresses back to digi block" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10672
[23:02:43] notpeter: ping
[23:02:47] mark: ping
[23:03:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10672
[23:07:37] woosters: ping
[23:08:07] paravoid: ping
[23:08:42] Is there anybody in this channel from operations right now?
[23:11:35] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/10672
[23:13:50] New patchset: Jeremyb; "Bug 27706 - enable RSS extension on uawikimedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10673
[23:13:56] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10673
[23:14:52] jeremyb: do you have root access?
[23:14:52] New review: Jeremyb; "I just said on IRC: I'm confused. Why can mediawikiwiki have rss use unrestricted but uawikimedia ne..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10673
[23:21:18] Tim-away: ping
[23:21:20] preilly: i have essentially no access
[23:21:30] jeremyb: okay thanks for the reply
[23:21:32] preilly: or at least as little as you
[23:21:42] jeremyb: ha ha okay thanks
[23:22:03] i'm happy to try to answer something or dig up a root though
[23:22:42] preilly: it's kinda late in europe and especially greece atm
[23:23:12] jeremyb: yeah I know
[23:23:24] 1:23am Friday
[23:23:24] i think one of the greeks was up ~40 mins past now the other night. but that was a special occasion
[23:23:37] jeremyb: yeah thanks for pointing that out
[23:23:37] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[23:24:12] preilly: it's now 2:23am
[23:24:25] well 2:24 actually ;P
[23:24:30] ha ha ha
[23:25:20] what's up?
[23:25:25] preilly: ^
[23:25:42] New patchset: Dereckson; "bug 37391 - Install Translate extension on be.wikimedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10593
[23:25:48] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10593
[23:25:51] TimStarling: can you approve this change for me? — https://gerrit.wikimedia.org/r/#/c/10672/
[23:26:16] TimStarling: I can't seem to get ahold of anybody else and Amit is in Malaysia right now trying to demo this functionality
[23:26:46] PROBLEM - Host srv206 is DOWN: PING CRITICAL - Packet loss = 100%
[23:26:59] can't he just get a special UA string or some other flag that means it's him?
[23:27:01] ok
[23:27:09] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10672
[23:27:12] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10672
[23:27:23] TimStarling: great thanks!
[23:27:26] I assume you want me to push it into puppet also?
[23:27:39] sounds like a good assumption ;P
[23:28:09] preilly?
[23:28:13] New review: Thehelpfulone; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10593
[23:28:16] it's done already
[23:28:24] k
[23:28:37] preilly: what about getting him another way to identify himself?
[23:28:40] TimStarling: yes please
[23:29:17] preilly: it's done already
[23:29:35] do you know what servers it affects? I can do a forced puppet run
[23:29:51] mobile varnish?
[23:30:49] TimStarling: it's cp1041, cp1042, cp1043, cp1044
[23:31:13] TimStarling: in eqiad
[23:31:55] site.pp agrees
[23:31:56] node /cp104[1-4].wikimedia.org/ {
[23:32:16] !log deploying varnish configuration change https://gerrit.wikimedia.org/r/#/c/10672/ on cp1041, cp1042, cp1043, cp1044
[23:32:19] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.210 seconds
[23:32:20] Logged the message, Master
[23:34:59] TimStarling: so are you doing the forced puppet run now?
[23:35:19] yes, I've done cp1041 and cp1042, now running cp1043
[23:35:27] TimStarling: okay great
[23:35:31] TimStarling: thanks for your help
[23:35:41] TimStarling: I greatly appreciate it as I'm sure Amit does as well
[23:35:50] it's no problem
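The "forced puppet run" above means triggering the agent by hand instead of waiting for its next scheduled run. The exact commands Tim used are not shown in the log, so this is only a sketch under that assumption, using the four hostnames and the site.pp node regex quoted above:

    #!/bin/sh
    # Hypothetical sketch: force an immediate puppet run on the mobile
    # varnish caches matched by node /cp104[1-4].wikimedia.org/ in site.pp.
    for host in cp1041 cp1042 cp1043 cp1044; do
        ssh "${host}.wikimedia.org" 'sudo puppet agent --test'
    done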
[23:36:16] preilly: what about getting him another way to identify himself?
[23:36:20] 07 23:26:58 < jeremyb> can't he just get a special UA string or some other flag that means it's him?
[23:36:41] jeremyb: we don't have anything like that in place
[23:36:52] i'm talking about the future
[23:37:18] jeremyb: we are working on a cookie based solution for the future for testing purposes
[23:37:42] is there a draft of that somewhere?
[23:37:56] jeremyb: no
[23:38:08] jeremyb: who are you?
[23:38:13] jeremyb: I don't think I know you
[23:38:32] sounds likely. i think you're new since last wikimania
[23:38:46] * jeremyb is nobody particularly special
[23:40:01] jeremyb: I was at Wikimania last year
[23:40:31] hrmmm. yeah, on second thought you're not quite as new as i was thinking
[23:40:35] i think we did meet
[23:40:46] i don't really remember for sure
[23:42:05] doesn't help not having a pic on the staff page
[23:43:04] jeremyb: this is me — http://www.flickr.com/photos/tychay/1382180433/
[23:44:00] * jeremyb doesn't know
[23:44:50] * jeremyb definitely did meet kul and i think amit too
[23:45:03] hmm
[23:45:24] TimStarling: thanks again for all of your help
[23:45:45] TimStarling: not having access is problematic when no one from operations is around
[23:46:36] strange that nobody is around, is it a holiday or something?
[23:46:57] TimStarling: everyone is out of town or sleeping
[23:47:00] TimStarling: or both
[23:47:05] TimStarling: it's not a holiday
[23:47:51] TimStarling: there's some people from the hackathon in greece and it's nearly 3am there. and some people (e.g. ma rk) are recovering from the ipv6 deploy. but really it's pretty late for all of europe
[23:48:30] idk where SF is other than being partly in europe and hence covered by my last line ;)
[23:48:32] it's not late in SF
[23:48:53] what is it there, near 10am?
[23:49:01] heh
[23:49:04] yes
[23:49:17] it's lucky that I was online, I asked for today off, but Angela is out with Evelyn so I thought I may as well do some work
[23:49:29] TimStarling: there aren't any operations folks in SF currently
[23:49:49] as in, like none :)
[23:49:51] It's Thu Jun 7 16:49:48 in SF
[23:50:10] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused
[23:50:16] i'm pretty sure leslie is in SF
[23:50:26] jeremyb: NO she is not
[23:50:30] huh
[23:50:52] jeremyb: she is on vacation
[23:51:18] oh. i knew she was going for a run or walk or something. didn't catch the vacation
[23:51:23] where are those GPS implants? ;)
[23:52:05] jeremyb: she is at the Aids LifeCycle
[23:52:31] preilly: for some reason i thought that was during the hackathon
[23:52:43] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused
[23:52:52] jeremyb: it's 7-days, 600 miles.
[23:52:58] whoa
[23:53:12] that certainly explains it
[23:53:25] i was thinking a 1 or 2 day thing
[23:54:06] jeremyb: http://www.aidslifecycle.org/
[23:56:02] * jeremyb is most familiar with http://www.pmc.org/ , went to camp right on top of the route for years