[00:07:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[00:27:38] (CR) Yurik: "recheck" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138049 (owner: Yurik)
[01:00:46] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 22:00:07 UTC
[01:06:28] (PS2) Ori.livneh: role::mediawiki::webserver: set maxclients dynamically, dissolve bits role [operations/puppet] - https://gerrit.wikimedia.org/r/137947
[01:07:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.11
[01:08:16] (CR) Ori.livneh: role::mediawiki::webserver: set maxclients dynamically, dissolve bits role (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137947 (owner: Ori.livneh)
[01:17:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[01:32:17] (PS1) Yurik: Fixing ZeroPortal labs rollout [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126
[01:34:25] (CR) Yurik: [C: 2] "labs-only, prod noop" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126 (owner: Yurik)
[01:34:30] (Merged) jenkins-bot: Fixing ZeroPortal labs rollout [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126 (owner: Yurik)
[01:44:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[02:17:34] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-07 02:16:30+00:00
[02:17:41] Logged the message, Master
[02:30:46] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jun 7 02:30:45 UTC 2014
[02:31:01] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-07 02:29:57+00:00
[02:31:06] Logged the message, Master
[03:24:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 7 03:23:32 UTC 2014 (duration 23m 31s)
[03:24:43] Logged the message, Master
[04:45:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[05:22:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0066889632107
[05:27:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[05:35:10] (CR) Giuseppe Lavagetto: "@gwicke ok, it's a little baffling how we can't find a feature complete repository software :) My -1 is purely on technicalities though." [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[06:03:36] (CR) GWicke: "I guess full-blown dak could do it ;)" [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[06:12:37] <_joe_> hey gwicke :)
[06:13:02] <_joe_> my -1 there was just for the enhancements to the puppet manifests, don't get me wrong
[06:13:04] * gwicke unlurks
[06:13:19] yeah, I know - I only responded to the other part
[06:13:33] <_joe_> I just hate we need two different softwares for doing almost exactly the same thing
[06:13:36] currently reading the aptly source ;)
[06:14:09] <_joe_> well, I don't think there is anything simple enough covering all our needs
[06:14:34] <_joe_> so in the end, we can live with this :)
[06:15:31] yeah, I think so too
[06:16:10] my guess is that mini-dinstall wouldn't work so well to handle a repo the size of full debian
[06:16:31] but I'll be happy when we reach two dozen different packages
[06:18:21] adding a feature to aptly would probably also not be too hard: https://github.com/smira/aptly/blob/c72ef05a2a4d65c7d245aa867840d35f5c886144/deb/reflist.go#L309
[06:27:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0133333333333
[07:20:56] PROBLEM - LighttpdHTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:46] RECOVERY - LighttpdHTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5122 bytes in 0.002 second response time
[07:46:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[09:35:16] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:47:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[12:40:36] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:41:27] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[13:48:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[16:10:23] (PS2) Ori.livneh: beta: fix scap for videoscalers [operations/puppet] - https://gerrit.wikimedia.org/r/137274 (owner: BryanDavis)
[16:11:13] (CR) Ori.livneh: [C: 1] beta: fix scap for videoscalers [operations/puppet] - https://gerrit.wikimedia.org/r/137274 (owner: BryanDavis)
[16:49:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[17:01:45] _joe_ or godog , can you give a look to elastic1017.eqiad.wmnet ?
[17:04:33] Nemo_bis: is it misbehaving?
[17:10:03] indeed ES isn't very keen on talking to me
[17:11:20] Yup, came to report same. Timeouts on all non-default-namespace searches.
[17:12:28] It's been like that since this morning, en.wiki and commons alike
[17:13:34] kk, I'm trying to capture a jstack and then restarting
[17:18:43] Any ops around?
[17:18:57] Hi, we have not received any email through OTRS for about 21 hours. Can someone look into that?
[17:18:58] OTRS has not received any mail for 21 (!) hours!
[17:20:06] ou https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=mchenry+mail+delivery&vl=&x=&n=&hreg[]=mchenry&mreg[]=exim.%2B&gtype=line&glegend=show&aggregate=1
[17:20:52] Timing coincides with the light blue line
[17:22:28] Nemo_bis quiddity bah I've captured a jstack and restarted ES
[17:23:07] looking at mr henry
[17:26:01] or iodine rather, it doesn't look like it is happy to talk to mrchenry
[17:27:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[17:42:46] (PS1) Jgreen: re-quote booleans for exim::roled class call because unquoting seems to break the template [operations/puppet] - https://gerrit.wikimedia.org/r/138182
[17:44:07] (CR) Jgreen: [C: 2 V: 2] re-quote booleans for exim::roled class call because unquoting seems to break the template [operations/puppet] - https://gerrit.wikimedia.org/r/138182 (owner: Jgreen)
[17:49:58] looks like Jeff_Green might be on it
[17:50:25] yeah I think I just fixed it
[17:55:06] godog: did you !log the fix to CirrusSearch which icinga-wm just announced? :)
[17:55:26] Nemo_bis: oops no, sorry
[17:56:05] !log restarted ES on elastic1017.eqiad.wmnet (at 17:22 UTC)
[17:56:12] Logged the message, Master
[18:01:06] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1944 MB (3% inode=89%):
[18:21:46] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sat 07 Jun 2014 15:20:52 UTC
[18:42:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.746666666667
[18:57:13] ori: Do you think we could change mwgrep to also work on Module: ?
[19:09:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[19:09:46] PROBLEM - puppet disabled on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:10:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:10:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:12:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[19:12:46] RECOVERY - puppet disabled on mw1053 is OK: OK
[19:13:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:13:26] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:14:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: NRPE: Call to popen() failed
[19:15:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:15:17] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[19:15:26] RECOVERY - Disk space on mw1053 is OK: DISK OK
[19:16:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[19:18:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[19:18:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[19:18:34] Maybe I'm missing something, but...
[19:18:54] 0 processes with that name, but the check to make sure the process is running succeeds?
[19:18:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[19:23:26] hoo|away: sure; submit a patch
[19:24:07] Krenair: the executable that implements the check takes an argument specifying what is the desired number of processes, and what the thresholds are for warning / alerting
[19:24:25] Krenair: in the case of dhclient, it makes sense for the check to issue an alert if the number of processes exceeds 0
[19:24:40] Krenair: because the idea is that dhclient ought to run very quickly and terminate; if it lingers it's a sign of some issue
[19:24:48] Ah, I see.
[19:25:10] Admittedly I didn't check to find out what dhclient does :p
[19:50:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[20:01:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:02:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:02:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:03:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:05:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[20:07:16] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[20:07:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:08:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:10:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[20:11:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:12:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[20:12:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:13:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:15:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:16:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[20:17:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:17:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:18:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:19:32] cmjohnson1: robh ^^ seems 1053 is berzerk again
[20:20:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[20:20:46] PROBLEM - puppet disabled on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
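The process check ori explains at 19:24 is a count-with-thresholds check: count processes with a given command name and alert if the count falls outside the expected range (for dhclient, the expected count is 0). The sketch below is only an illustration of that logic in Python, not the actual Nagios/NRPE check_procs plugin used in production; the output format mirrors the "PROCS OK: 0 processes with command name dhclient" lines above.

    #!/usr/bin/env python3
    """Illustrative sketch of a process-count check (not the real NRPE plugin)."""
    import subprocess
    import sys

    OK, CRITICAL = 0, 2

    def count_procs(name):
        # pgrep -x lists PIDs whose command name matches exactly, one per line.
        result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
        return len(result.stdout.split())

    def check(name, max_expected=0):
        # Alert when more processes than expected are found; dhclient should
        # obtain a lease and exit, so any lingering process is suspicious.
        n = count_procs(name)
        state = "OK" if n <= max_expected else "CRITICAL"
        print("PROCS %s: %d processes with command name %s" % (state, n, name))
        return OK if state == "OK" else CRITICAL

    if __name__ == "__main__":
        sys.exit(check("dhclient", max_expected=0))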
[20:20:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:20:57] mutante: ^^ 1053 is down, we can deal, but, that stupid machine never wants to stay up
[20:21:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[20:21:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:21:16] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[20:21:40] greg-g: not sure what's going on with it.
[20:21:46] RECOVERY - puppet disabled on mw1053 is OK: OK
[20:22:13] not sure, don't matter much, but if it continually goes down/up, maybe depool/disable icinga on it?
[20:22:54] I agree, let's depool it
[20:25:41] cmjohnson1, greg-g: was this also causing some errors regarding the search function?
[20:27:36] sjoerddebruin: shouldn't, what errors were you seeing?
[20:28:43] greg-g: There was someone complaining in the nlwiki-village pumb. It was "We could not complete your search due to a temporary problem. Please try again later."
[20:29:07] A error returned by the backend of the cirrussearch.
[20:29:10] maybe? /me shrugs
[20:29:14] oh, odd
[20:29:27] is there any info on the backend error?
[20:29:49] No. I tried myself but didn't get any.
[20:29:53] k
[20:30:05] But in the past weeks there seems to be a higher error count. :/
[20:30:09] hrmmm
[20:30:47] how do you mark a change in gerrit as abandoned
[20:31:07] grr, no nik online
[20:31:11] sjoerddebruin: you're right
[20:31:19] :D
[20:32:09] http://imgur.com/8ZUjyjT
[20:32:20] that's "CirrusSearch-failed" events
[20:32:25] * greg-g emails nik
[20:32:55] Yeah, it's a crucial part of the wiki. :/
[20:33:47] yeah, just emailed, sorry
[20:35:17] cmjohnson1: elastic1017 is down
[20:35:26] not sure if that's known, emailing nik that as well
[20:35:26] PROBLEM - Disk space on analytics1013 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/h 100792 MB (5% inode=99%): /var/lib/hadoop/data/j 89442 MB (4% inode=99%): /var/lib/hadoop/data/f 100982 MB (5% inode=99%): /var/lib/hadoop/data/l 72878 MB (3% inode=99%):
[20:38:31] i noticed it early but I assumed nik was working on it
[20:38:34] again elastic1017.eqiad.wmnet acting up
[20:38:36] or ottomata
[20:38:50] Nemo_bis: yeah, just emailed Ops list about it
[20:38:58] hopefully someone's checking email on Saturday
[20:38:58] elastic1017 was added yesterday afternoon
[20:39:20] depool?
[20:39:23] cmjohnson1: it just keeps going up/down
[20:39:26] Oh well, to bed now
[20:39:32] Nemo_bis: not necissarily
[20:39:41] I wouldn't know which ones are masters
[20:39:46] g'night
[20:40:06] elastic isn't like the mw pool, depooling the wrong one (or two) could be bad
[20:40:11] they're databases :)
[20:40:33] I believe the masters are identified
[20:40:39] * cmjohnson1 goes to look
[20:40:43] well, hopefully just one would only cause a short spike as the cluster reshuffles shards, but.. again, I'm wary personally ;)
[20:46:04] $master_eligible = $::hostname ? {
[20:46:05] 'elastic1002' => true,
[20:46:05] 'elastic1007' => true,
[20:46:07] 'elastic1014' => true,
[20:46:37] weird, so just a slave but causing tons of errors
[20:46:43] user facing errors
[20:47:08] what is the offecnding box ?
[20:47:44] 1017 ?
[20:47:46] yeah
[20:49:04] * greg-g just texted nik
[20:50:14] greg-g: hint: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Search+eqiad&h=search1017.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads ?
[20:51:35] on that time line not sure
[20:51:52] https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Search+eqiad&h=search1017.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads
[20:51:55] that's day
[20:52:08] yeah
[20:52:19] some spikes
[20:52:21] yeah
[20:52:38] vs https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Search+eqiad&h=search1011.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads
[20:52:41] ok, nik is got it
[20:52:55] *has
[20:53:12] oh man
[21:10:16] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:13:38] greg-g: hey, sorry you had to call me
[21:17:06] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.011 second response time
[21:22:09] manybubbles: no worries man, sorry you had to come home from maker faire
[21:24:16] manybubbles: hope it's easy/quick :/
[21:31:28] !log elastic1017 is sick - thrashing to death on io - restarting Elasticsearch to see if it recovers unthrashed
[21:31:33] Logged the message, Master
[21:35:26] PROBLEM - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.39
[21:35:31] !log after consulting logs - elastic1017 has had high io wait since it was deployed - I'm taking it out of rotation
[21:35:34] Logged the message, Master
[21:36:25] !log that means I turned off puppet and shut down Elasticsearch on elastic1017 - you can expect the cluster to go yellow for half an hour or so while the other nodes take rebuild the redundency that elastic1017 had
[21:36:30] Logged the message, Master
[21:37:38] thanks manybubbles
[21:37:58] greg-g: no problem - it was stupid to try to add elastic1017 tot he cluster yesterday afternoon
[21:38:20] I shouldn't have done it
[21:38:43] bad dev, no biscuit?
[21:39:43] calling it while the cluster rebuilds
[21:42:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[21:43:39] wee
[22:07:15] greg-g: ok - I'm going to take my laptop over to dinner - it everything looks a-o-better
[22:51:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[23:32:36] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1022 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 15.0
[23:34:36] PROBLEM - Disk space on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:36] PROBLEM - Kafka Broker Server on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:36] PROBLEM - RAID on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:46] PROBLEM - check if dhclient is running on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:56] PROBLEM - puppet disabled on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:06] PROBLEM - jmxtrans on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:16] PROBLEM - check configured eth on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:16] PROBLEM - DPKG on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
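For context on the 21:35-21:36 !log entries above: when a node such as elastic1017 is stopped, Elasticsearch reports the cluster as "yellow" until the replica shards it held are rebuilt on the remaining nodes, then returns to "green". The standard _cluster/health API exposes this; the sketch below is only an illustration of polling it from any live node, and the hostname used is an assumption, not a documented WMF tool.

    #!/usr/bin/env python3
    """Illustrative sketch: watch cluster health return to green after a node is pulled."""
    import json
    import time
    import urllib.request

    # Assumed hostname for illustration; any reachable cluster member works.
    HEALTH_URL = "http://elastic1001.eqiad.wmnet:9200/_cluster/health"

    def cluster_health():
        with urllib.request.urlopen(HEALTH_URL, timeout=10) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        while True:
            health = cluster_health()
            # "yellow" while lost replicas are rebuilt elsewhere,
            # "green" once every shard has its full replica count again.
            print("%s  unassigned=%s initializing=%s" % (
                health["status"],
                health["unassigned_shards"],
                health["initializing_shards"]))
            if health["status"] == "green":
                break
            time.sleep(30)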
[23:37:06] PROBLEM - Varnishkafka Delivery Errors on amssq47 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1.233333
[23:37:06] RECOVERY - DPKG on analytics1021 is OK: All packages OK
[23:37:06] RECOVERY - check configured eth on analytics1021 is OK: NRPE: Unable to read output
[23:37:26] RECOVERY - Disk space on analytics1021 is OK: DISK OK
[23:37:26] RECOVERY - RAID on analytics1021 is OK: OK: no disks configured for RAID
[23:37:36] RECOVERY - check if dhclient is running on analytics1021 is OK: PROCS OK: 0 processes with command name dhclient
[23:37:46] RECOVERY - puppet disabled on analytics1021 is OK: OK
[23:37:56] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, args -jar jmxtrans-all.jar
[23:38:16] PROBLEM - Varnishkafka Delivery Errors on cp1037 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[23:38:36] PROBLEM - Varnishkafka Delivery Errors on cp1038 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[23:39:06] RECOVERY - Varnishkafka Delivery Errors on amssq47 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:39:16] RECOVERY - Varnishkafka Delivery Errors on cp1037 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:39:36] RECOVERY - Varnishkafka Delivery Errors on cp1038 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:41:06] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 95.800003
[23:41:36] PROBLEM - Varnishkafka Delivery Errors on cp1046 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 75.666664
[23:41:46] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 88.366669
[23:42:06] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 201.600006
[23:42:16] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 131.300003
[23:42:16] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:42:16] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 458.033325
[23:42:36] RECOVERY - Varnishkafka Delivery Errors on cp1046 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4008 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 524.299988
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp1059 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 532.299988
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 484.733337
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 579.599976
[23:42:37] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 518.56665
[23:42:37] PROBLEM - Varnishkafka Delivery Errors on cp4017 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 206.166672
[23:42:38] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 655.200012
[23:42:46] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:06] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:06] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:16] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:27] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 638.400024
[23:43:36] PROBLEM - Varnishkafka Delivery Errors on cp4004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 39.433334
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp1059 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4008 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:37] PROBLEM - Varnishkafka Delivery Errors on cp4016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 259.333344
[23:43:37] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 688.799988
[23:43:38] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:38] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:39] RECOVERY - Varnishkafka Delivery Errors on cp4017 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:26] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4004 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:36] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 503.066681
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:37] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 554.400024
[23:44:37] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 493.166656
[23:44:38] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 562.06665
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 596.400024
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 529.200012
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 436.799988
[23:45:16] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 596.400024
[23:46:56] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:06] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 483.0
[23:47:16] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 579.599976
[23:47:16] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 475.166656
[23:47:16] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 510.799988
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:37] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:48:06] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:48:14] !log Fixed four CentralAuth log entries on meta which were logged for WikiSets/0
[23:48:20] Logged the message, Master
[23:48:48] out of interest what were they, hoo?
[23:49:06] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:49:21] oh, it were five actually, doh
[23:49:44] UPDATE logging SET page_title = 'WikiSets/11' WHERE log_id = 1934956;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/11' WHERE log_id = 1934956;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/12' WHERE log_id = 2263003;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/13' WHERE log_id = 3947997;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/14' WHERE log_id = 3948158;
[23:50:06] oh no, it were 4... screwed the first one :P
[23:50:20] Krenair: ^
[23:51:06] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:16] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:16] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:36] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:52:55] | log_id | log_timestamp | log_comment | log_user_text |
[23:52:56] | 1934956 | 20110702185233 | testing | Ruslik0 |
[23:52:56] | 2263003 | 20111108101428 | For New wiki importer group | Ruslik0 |
[23:52:57] | 3947997 | 20121213154417 | | Vituzzu |
[23:52:57] | 3948158 | 20121213160615 | | Vituzzu |
[23:52:58] ok
[23:53:18] interesting. was there a bug about this hoo? any idea why it happened?
[23:53:37] Krenair: Not sure we had a bug, but it has been fixed in LA
[23:53:39] * CA
[23:55:55] ah, ok
[23:56:54] hoo, bug 27031 right?
[23:57:20] yep :)