[01:41:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 287 seconds
[01:44:49] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[02:25:10] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[02:31:19] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[02:32:49] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[02:36:34] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[02:50:49] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/orange-ivory-coast.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[02:52:10] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[02:57:53] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.400 second response time
[02:59:22] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active
[04:20:49] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/telenor-montenegro.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[04:46:28] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active
[05:23:49] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:24:12] again?
[05:25:01] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[05:29:40] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out
[05:38:31] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:39:07] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/orange-ivory-coast.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[05:39:34] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[05:50:09] 1016 needs a boot again?
[05:51:25] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[05:51:38] * jeremyb spies a Jamesofur
[05:51:51] i believe you've been sent a request for frisbees ;-P
[05:53:00] A very exhausted Jamesofur lol, I was indeed sent this request :)
[05:56:49] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:02:13] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active
[06:02:49] RECOVERY - Lucene on search1016 is OK: TCP OK - 3.020 second response time on port 8123
[06:11:47] anyone around for a merge?
[06:13:16] change 7917 has been waiting over a day and is not yet urgent but does need to be done in less than 2 days
[06:13:45] * jeremyb finds himself again wanting a way to ask for review from the wind
[06:14:22] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:14:38] also, of course search1016 needs a boot ;)
[06:14:49] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out
[06:15:34] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:17:31] RECOVERY - Lucene on search1016 is OK: TCP OK - 9.025 second response time on port 8123
[06:18:24] btw, why is none of search in pmtpa in ganglia?
[06:18:35] or is all of prod search now in eqiad?
[06:20:04] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:20:53] hrm, apergos is idle. hashar isn't here
[06:22:46] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[06:26:22] PROBLEM - Lucene on search1016 is CRITICAL: Connection timed out
[06:31:22] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123
[06:34:49] * jeremyb tries a hilight on woosters?
[06:39:40] * jeremyb went and looked it up, the 2 that have been flapping (search101[56]) are both commonswiki
[06:51:01] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 0.419502479339
[06:57:55] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error
[08:33:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[10:01:04] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[12:27:01] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[12:54:01] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[14:25:32] New patchset: Demon; "Probably futile attempt to fix L10n update auto-commits." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8037
[14:25:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8037
[15:03:01] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -0.18504601626
[15:09:55] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error
[16:57:01] * jeremyb spies a ^demon...
[16:57:11] * ^demon hides
[16:57:14] can i have a quick, trivial settings merge?
[16:57:25] change 7917
[16:57:27] <^demon> At 1pm on a sunday? :\
[16:57:39] i asked several other times
[16:57:53] i just ask now because you seem to be around
[16:58:10] 20 06:13:44 * jeremyb finds himself again wanting a way to ask for review from the wind
[17:00:05] hrmm, i wonder if search1016 fixed itself?
[17:00:08] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=&c=Search+eqiad&h=search1016.eqiad.wmnet&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS
[17:00:12] New review: Demon; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7917
[17:00:21] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7917
[17:00:29] log shows no indication that it was booted
[17:01:03] ^demon: thanks! now i don't have to worry about it ;)
[17:01:26] well, i guess it still needs sync but maybe that's half done
[17:03:44] <^demon> Done.
[17:04:03] danke!
[17:04:07] <^demon> yw.
[17:17:37] <^demon> jeremyb: prolog is fun ;-)
[17:17:48] *click*
[17:22:16] ^demon: i need some clarification on "fun". well besides that it has "Fun" in the source
[17:22:57] also, not a fan of the overloading .pl from perl
[17:23:06] although idk which came first
[17:23:10] <^demon> Blame prolog, not gerrit ;-)
[17:24:20] and "fun"?
[17:25:03] <^demon> I may have exaggerated ;-)
[17:25:08] aha
[17:26:04] anyway, i'm still thinking maybe we should have an address people can add as reviewer which is either a mailing list or goes to devnull. and then people can trawl the dashboard for that fake user
[17:26:19] or else we get tagging working and people can just tag as wants review or something
[17:26:30] <^demon> If you want review, add someone to review it?
[17:26:59] how do i know who's available? on vacation? in which TZ?
[17:27:10] and if i don't know how will someone less involved?
[17:27:32] the point is i want to be able to ask the wind and i want people to be able to search for requests from the wind
[17:29:19] we never used the review requests system in bugzilla?
[17:29:25] <^demon> Not afaik.
[17:29:56] (it does have a wind option built in)
[17:54:01] hah, "joke"
[18:05:11] New review: Platonides; "I hope it doesn't work, then :)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/8037
[18:35:07] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[18:45:48] New patchset: Jeroen De Dauw; "add irc notifications for the #wikimedia-wikidata and #semantic-mediawiki channels" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8043
[18:46:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8043
[19:27:12] New patchset: Aaron Schulz; "Added profiling calls." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/8045
[19:27:18] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/8045
[19:27:57] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8045
[19:27:59] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/8045
[19:28:45] I rebooted it
[19:28:50] jeremyb: this morning
[19:28:54] ahh
[19:29:05] but I did not stick around: I gt the page, did the dead, saw the recover and left
[19:29:09] (migraine)
[19:29:14] deed*
[19:29:16] ;P
[19:29:28] feel halfway decent now but it's still there
[19:29:47] ;-(
[19:30:08] * jeremyb hands apergos a cup of tea. or whatever greeks drink
[19:30:33] wine?
[19:30:40] more suitable for this time I think
[19:31:18] hey, it's only 15:30
[19:31:57] heh
[19:31:59] 10:30 pm
[20:01:39] New patchset: Jeroen De Dauw; "add irc notifications for the #wikimedia-wikidata and #semantic-mediawiki channels" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8043
[20:01:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8043
[20:03:01] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[20:50:14] Will someone review https://gerrit.wikimedia.org/r/#/c/6578/ please?
[21:18:48] hashar did ;)
[21:31:43] New patchset: Jeremyb; "cleanup/refactor gerrit logging. allow multiple log files per project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8120
[21:32:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8120
[21:41:51] do we have a policy on following PEP 8 or not? (style guide)
[21:48:12] New review: Jeroen De Dauw; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/8120
[21:50:45] New patchset: Jeremyb; "cleanup/refactor gerrit logging. allow multiple log files per project" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8120
[21:51:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8120
[21:51:50] jeremyb, yeah I know, but someone else must approve for it to be merged
[21:52:20] New review: Jeremyb; "did briefly look at the tests now, I think that's all that's needed. haven't run them yet." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/8120
[21:52:27] Krenair: sure
[21:52:49] Krenair: not just that, you need someone to deploy and babysit
[22:07:28] * jeremyb still wants a `ps ax` from emery/locke/oxygen
[22:07:31] bbl
[22:07:43] (any of them, not all of them)
[22:29:01] New review: Jeremyb; "maybe this should wait on change 8120? this will be non-deterministic as is. (git dicts are unordered)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/8043
[22:29:07] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours
[22:29:46] New review: Jeremyb; "gah, gerrit why don't you linkify?" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/8043
[22:30:39] New review: Jeremyb; "I give up." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/8043
[22:38:12] jeremyb: so what were you saying about mailman yesterday?
[22:56:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours