[00:18:52] (CR) Aklapper: "After repeatedly running checksetup.pl on the fresh labs instance to get rid of the rest of the errors and proposals of Perl modules. All " [operations/puppet] - https://gerrit.wikimedia.org/r/101174 (owner: Dzahn)
[00:22:07] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 06:20:43 PM UTC
[00:49:39] (CR) Aklapper: "Next step: AFTER REMOVING /lib on the fresh Labs instance:" [operations/puppet] - https://gerrit.wikimedia.org/r/101174 (owner: Dzahn)
[00:51:44] (CR) Dzahn: "the Sitemap stuff never really seemed to work anyways, or just partly. afair quite quite some time ago i was involved on a ticket trying t" [operations/puppet] - https://gerrit.wikimedia.org/r/101174 (owner: Dzahn)
[01:03:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[01:04:06] Snaps: what do you mean the stats are json?
[01:04:13] Snaps: what generates the stats?
[01:05:02] (PS6) Dzahn: install various Perl modules needed by Bugzilla [operations/puppet] - https://gerrit.wikimedia.org/r/101174
[01:08:24] (CR) Dzahn: "added packages per comments above and further testing, reduced number of libemail packages needed because they had normal dependencies, st" [operations/puppet] - https://gerrit.wikimedia.org/r/101174 (owner: Dzahn)
[01:15:48] (CR) Aklapper: [C: 1] "LGTM (compared to my findings in prev comments here and list looks complete)" [operations/puppet] - https://gerrit.wikimedia.org/r/101174 (owner: Dzahn)
[01:22:17] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[02:10:35] !log LocalisationUpdate completed (1.23wmf6) at Sat Dec 14 02:10:34 UTC 2013
[02:10:57] Logged the message, Master
[02:19:24] !log LocalisationUpdate completed (1.23wmf7) at Sat Dec 14 02:19:24 UTC 2013
[02:19:40] Logged the message, Master
[02:28:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 14 02:28:38 UTC 2013
[02:28:54] Logged the message, Master
[03:03:37] PROBLEM - Disk space on virt11 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 42683 MB (3% inode=99%):
[03:06:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:08:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:10:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:12:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:14:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:16:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:18:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:20:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:22:19] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 06:20:43 PM UTC
[03:22:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:24:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:26:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:28:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:30:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:32:19] PROBLEM - Puppet freshness on wtp1003 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 03:02:24 AM UTC
[03:32:59] RECOVERY - Puppet freshness on wtp1003 is OK: puppet ran at Sat Dec 14 03:32:54 UTC 2013
[05:22:06] People with multiple BZ accounts *shakes fist at the cloud wildly*
[06:02:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[06:22:25] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Fri 13 Dec 2013 06:20:43 PM UTC
[06:23:55] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[06:24:35] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.41 ms
[07:06:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[07:46:25] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Sat 14 Dec 2013 04:46:01 AM UTC
[08:20:02] paravoid: varnishkafka itself (which is easily changed into something else of course) and librdkafka, to provide an ABI-safe and generic way to push stats
[08:22:18] paravoid: doing a popen("json2astatsframework.py", "we"); in vk and then writing one json object per line seems to me the easiest and most generic solution.
[08:47:35] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Sat Dec 14 08:47:33 UTC 2013
[08:51:45] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Sat Dec 14 08:51:43 UTC 2013
[09:01:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[09:34:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[10:21:53] (CR) Milimetric: "Sorry I missed this again Andrew, I'll try to make some time Monday to help you finish it. But is statsd standard enough that maybe Snaps" (1 comment) [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/101431 (owner: Ottomata)
[11:40:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[12:11:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[12:15:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:16:55] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[13:25:28] (PS1) Hashar: beta: let sysops add/remove gwtoolset group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101488
[13:34:25] (CR) Dan-nl: [C: 1] beta: let sysops add/remove gwtoolset group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101488 (owner: Hashar)
[13:34:56] (CR) Hashar: [C: 2] "deploying on labs :-D" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101488 (owner: Hashar)
[13:35:05] (Merged) jenkins-bot: beta: let sysops add/remove gwtoolset group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101488 (owner: Hashar)
[14:02:25] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[14:35:31] (CR) Edenhill: "There are two types of stats emitted by varnishkafka:" [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/101431 (owner: Ottomata)
[16:50:25] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[18:16:35] PROBLEM - MySQL Recent Restart on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:17:25] RECOVERY - MySQL Recent Restart on db1047 is OK: OK 5712830 seconds since restart
[18:22:55] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:23:55] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[22:23:56] PROBLEM - RAID on db1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[23:35:45] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[23:36:45] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.35 ms
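
Note on the 08:22:18 message: the idea there is to open a pipe to a helper process and write one JSON object per line to it. The following is only a rough Python sketch of that pattern, not varnishkafka's code (which would use C's popen). The consumer script, the `emit_stats` function, and the `tx`/`txerr` field names are all hypothetical stand-ins for illustration.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the stats-forwarding helper mentioned in the chat:
# it reads one JSON object per line from stdin and prints the sorted keys.
CONSUMER = r"""
import json, sys
for line in sys.stdin:
    stats = json.loads(line)
    print(sorted(stats))
"""


def emit_stats(records, command):
    """Pipe each record to `command` as one JSON object per line.

    Returns the helper's exit code and whatever it wrote to stdout.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # json.dumps escapes embedded newlines, so each record stays on one line.
    payload = "".join(json.dumps(record) + "\n" for record in records)
    out, _ = proc.communicate(payload)
    return proc.returncode, out


if __name__ == "__main__":
    # Made-up counters, purely for demonstration.
    records = [{"tx": 10, "txerr": 0}, {"tx": 25, "txerr": 1}]
    code, out = emit_stats(records, [sys.executable, "-c", CONSUMER])
    print(code)
    print(out, end="")
```

Because the producer only writes newline-delimited JSON to a pipe, the helper on the other end can be swapped for any stats backend without touching the producer, which seems to be the "generic" property the message is after.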