[00:05:29] Ryan_Lane: oh yeah, see netvibes.com, that was pretty nice, got reminded by http://web.appstorm.net/roundups/top-10-web-based-rss-readers/
[00:06:50] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[00:08:40] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:08:28 UTC 2013
[00:09:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[00:10:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:10:00 UTC 2013
[00:10:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[00:11:06] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:11:02 UTC 2013
[00:11:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[00:12:06] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:11:59 UTC 2013
[00:12:19] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[00:12:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:12:44 UTC 2013
[00:13:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[00:21:54] New patchset: Ryan Lane; "Fix novaconfig include for labs puppetmaster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53708
[00:22:30] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513
[00:23:26] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53708
[00:23:44] New review: Krinkle; "@Hashar: Moved update of integration site and doc_index.html to I09225307686dcd07" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513
[00:25:16] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Thu Mar 14 00:25:04 UTC 2013
[00:26:04] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513
[00:27:25] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Thu Mar 14 00:27:18 UTC 2013
[00:37:27] New patchset: Ryan Lane; "Deny another beam port and amanda by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53709
[00:37:35] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 187 seconds
[00:37:35] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds
[00:38:23] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53709
[00:41:46] New patchset: Ryan Lane; "Refer to inetd for amanda service" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53710
[00:42:36] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:42:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53710
[00:48:25] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[00:52:47] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:53:07] !log reedy synchronized wmf-config/
[00:53:13] Logged the message, Master
[00:56:47] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:59:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 00:59:18 UTC 2013
[00:59:54] !log reedy synchronized php-1.21wmf11/extensions/Scribunto
[01:00:00] Logged the message, Master
[01:00:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[01:07:36] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:08:43] !log reedy synchronized php-1.21wmf11/cache/l10n/
[01:10:32] Logged the message, Master
[01:14:25] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[01:14:25] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[01:14:26] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[01:14:26] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[01:14:26] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[01:16:48] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:18:33] New patchset: Ryan Lane; "Include the icinga system user in nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53711
[01:21:18] New patchset: Hashar; "contint: xdebug + code coverage directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53531
[01:21:18] New patchset: Hashar; "contint: move apache proxy configuration to module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53425
[01:21:18] New patchset: Hashar; "contint: move tmpfs disk to the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53424
[01:21:19] New patchset: Hashar; "contint: get rid of Sun JDK" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53423
[01:21:47] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:21:54] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53711
[01:22:18] paravoid: I have rebased my contint module to get rid of the only change you have rejected ( https://gerrit.wikimedia.org/r/#/c/53422/ which made contint depend on geoip )
[01:22:19] ;)
[01:22:50] I'll have a look tomorrow
[01:22:51] now sleep
[01:22:59] thanks :-]
[01:23:08] will patch the testswarm one on top of that
[01:23:27] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:23:46] hashar: Hi again, still adjusting time rotation?
[01:24:04] Krinkle: yeah. Went to sleep at 7pm completely wasted
[01:24:19] and had some meal at 1am :/
[01:24:40] I am going to work overnight and have a nap in the morning :-]
[01:24:47] New review: Hashar; "I have rebased the other contint patches to get rid of this dependency." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53422
[01:25:19] Does anyone know what version of Redis we run in production?
[01:27:58] 2.6 something
[01:28:38] Thanks, AaronSchulz.
[01:30:25] New patchset: Hashar; "migrate testswarm module to contint module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[01:30:39] 2.6.3-wmf1
[01:31:22] New review: Krinkle; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53712
[01:32:01] New review: Hashar; "Faidon, here is the patch we talked about this afternoon. I guess we will reinstate the testswarm m..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
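For reference, the Redis version asked about above can be read straight off a server; a minimal sketch, assuming shell access to a host running the production Redis (the host, package name, and shown output are illustrative, not confirmed by this log):

    $ redis-cli INFO server | grep redis_version   # INFO is a standard Redis command
    redis_version:2.6.3
    $ dpkg -l 'redis*'                             # hypothetical package query; a -wmf1 suffix would come from the locally built package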
[01:32:16] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 01:32:06 UTC 2013
[01:32:17] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[01:32:32] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[01:33:45] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:33:51] New patchset: Hashar; "migrate testswarm module to contint module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[01:34:02] New patchset: Krinkle; "contint: xdebug + code coverage directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53531
[01:34:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 01:34:24 UTC 2013
[01:35:17] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[01:35:22] !log running puppet on celsus to investigate puppet freshness reports from icinga - it started parsoid, but also celsus is a Wikimedia DECOMMISSIONED server (base::decommissioned).
[01:35:28] Logged the message, Master
[01:36:15] ACKNOWLEDGEMENT - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn stop channel spam - celsus is a Wikimedia DECOMMISSIONED server (base::decommissioned).
[01:36:25] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:40:47] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:48:35] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:49:45] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:53:26] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours
[01:54:46] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:55:58] Change merged: Ryan Lane; [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53360
[01:56:25] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[01:56:28] Change merged: Ryan Lane; [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53361
[01:56:42] Change merged: Ryan Lane; [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53363
[01:57:28] New patchset: Hashar; "move geoip to a module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714
[02:01:49] tfinc: you know from here, it *almost* looks like Jon is here at first glance
[02:02:10] AaronSchulz: its pretty compelling isn't it?
[02:02:29] i had to add the hat to complete it
[02:02:55] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 02:02:52 UTC 2013
[02:03:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[02:03:25] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[02:06:25] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours
[02:09:46] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:12:26] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[02:13:26] PROBLEM - MySQL Replication Heartbeat on db69 is CRITICAL: CRIT replication delay 204 seconds
[02:13:26] PROBLEM - MySQL Slave Delay on db69 is CRITICAL: CRIT replication delay 207 seconds
[02:13:35] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:14:26] PROBLEM - Puppet freshness on tola is CRITICAL: Puppet has not run in the last 10 hours
[02:14:46] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 1 process with command name varnishncsa
[02:17:00] TimStarling: expect a huge redis queue patch sometime, just saying :)
[02:17:25] Change restored: Hashar; "(no reason)" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371
[02:17:25] PROBLEM - Puppet freshness on capella is CRITICAL: Puppet has not run in the last 10 hours
[02:17:30] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371
[02:18:26] RECOVERY - MySQL Replication Heartbeat on db69 is OK: OK replication delay 0 seconds
[02:18:37] RECOVERY - MySQL Slave Delay on db69 is OK: OK replication delay 0 seconds
[02:18:58] Change abandoned: Hashar; "(no reason)" [operations/debs/ircecho] (master) - https://gerrit.wikimedia.org/r/53371
[02:20:50] !log jenkins: made Zuul block changes on operations/debs/ircecho whenever pep8/pyflakes fails.
[02:20:58] Logged the message, Master
[02:22:46] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:25:20] !log LocalisationUpdate completed (1.21wmf11) at Thu Mar 14 02:25:19 UTC 2013
[02:25:26] Logged the message, Master
[02:27:26] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 188 seconds
[02:27:45] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:27:45] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 213 seconds
[02:28:27] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:29:33] New review: Hashar; "That may be working. We cant really test GeoIP in labs since virt0 does not have the GeoIP files und..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714
[02:33:47] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 02:33:42 UTC 2013
[02:34:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[02:39:27] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:04:46] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 03:04:41 UTC 2013
[03:05:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[03:06:25] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds
[03:06:47] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds
[03:14:47] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[03:25:48] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:34:09] New patchset: Hashar; "migrate testswarm module to contint module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[03:35:27] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 03:35:21 UTC 2013
[03:36:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[03:40:48] New review: Hashar; "I will take care of integrating this change in the puppet 'contint' module whenever it is merged in ..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53513
[03:45:36] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 189 seconds
[03:45:36] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 182 seconds
[03:45:45] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 188 seconds
[03:45:58] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 195 seconds
[03:58:36] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 0 seconds
[03:58:46] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 1 seconds
[04:00:06] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513
[04:03:25] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours
[04:07:15] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 04:07:10 UTC 2013
[04:07:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[04:10:55] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds
[04:11:36] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 188 seconds
[04:20:25] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[04:43:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 04:43:02 UTC 2013
[04:43:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[04:50:53] New patchset: Hashar; "(bug 44061) initial release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[04:53:09] New patchset: Hashar; "(bug 44061) initial release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[04:53:16] New review: Hashar; "Patchset 9 was a mistake" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[04:53:36] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
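The python-voluptuous build attempts around this point would normally be reproduced locally with the stock Debian toolchain; a minimal sketch, assuming the repo carries a debian/ directory and the usual Gerrit SSH clone form (both are assumptions, not confirmed by the log):

    $ git clone ssh://gerrit.wikimedia.org:29418/operations/debs/python-voluptuous
    $ cd python-voluptuous
    $ debuild -us -uc   # unsigned build; the .deb lands in the parent directory on success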
[04:53:40] New review: Hashar; "PS10:" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[04:53:46] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[04:56:22] New review: Hashar; "I could not manage to build the package. I am giving up any attempt to build the package and will le..." [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[05:13:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 05:13:29 UTC 2013
[05:14:17] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[05:24:07] New patchset: Krinkle; "Initial release" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408
[05:44:17] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 05:44:05 UTC 2013
[05:44:17] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[06:07:25] PROBLEM - Puppet freshness on search1019 is CRITICAL: Puppet has not run in the last 10 hours
[06:14:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 06:14:32 UTC 2013
[06:15:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[06:45:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 06:44:57 UTC 2013
[06:45:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[07:15:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 07:15:38 UTC 2013
[07:16:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[07:46:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 07:46:02 UTC 2013
[07:46:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[08:13:25] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[08:16:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 08:16:26 UTC 2013
[08:17:16] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[08:26:26] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds
[08:26:55] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 204 seconds
[08:38:56] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 08:38:46 UTC 2013
[08:39:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[08:45:25] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[08:50:00] mark: apergos: I have noticed that niobium.wikimedia.org (eqiad bits cache) went to swap yesterday. Maybe because of all wikis switching to a new wmf branch
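A runaway logger like the one dealt with just below can be spotted from the shell; a minimal sketch using standard procps flags (only the process name is taken from the log):

    $ ps -C varnishncsa -o pid,rss,etime,args   # rss is resident memory in KB
    # roughly 9 GB resident would show up here as an rss around 9000000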
[08:53:07] but only that one
[09:03:38] !log restarted varnishncsa.vanadium on niobium, it was using 9gb memory or so
[09:03:45] Logged the message, Master
[09:07:41] \O/
[09:09:25] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 09:09:24 UTC 2013
[09:10:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[09:11:42] hrm
[09:11:48] memory leak in varnishncsa :/
[09:14:40] ori-l: it went up in just a few minutes
[09:14:48] probably has another cause
[09:15:02] hm
[09:15:36] anyways, apergos, hashar -- thanks
[09:16:03] sure
[09:16:17] yeah I found a few refs to memory leaks but it was a very sudden jump
[09:16:52] yeah, looking at http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=niobium.wikimedia.org&m=cpu_report&s=descending&mc=2&g=mem_report&c=Bits+caches+eqiad
[09:17:20] it's a boa constrictor digesting an elephant
[09:20:56] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[09:21:55] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[09:28:13] New patchset: Matthias Mullie; "Prepare AFTv5 config for deployment new features" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50744
[09:40:56] \O/
[09:41:06] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 09:40:55 UTC 2013
[09:41:15] New patchset: Hashar; "puppet now manage jenkins ssh authorized_keys" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53736
[09:41:18] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[10:11:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 10:11:31 UTC 2013
[10:12:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[10:13:25] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours
[10:28:59] New review: Faidon; "I don't really see the need for all these classes like geoip::packages and geoip::packages::python (..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53714
[10:30:44] :-D
[10:30:55] paravoid: that's how you end up maintaining the geoip module hehe
[10:31:12] haha
[10:31:21] I said to get ottomata on board
[10:31:41] I might do it myself, but I thought I should give otto the benefit of review and fixes
[10:32:18] that is a great way to level us up
[10:32:44] my idea was to do the grunt work and let otto take care of it :-]
[10:33:26] the other contint changes have been rebased on production (i.e. they no longer depend on the change that has geoip)
[10:33:28] https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:contintrefactor,n,z
[10:33:50] that is where I would love gerrit to support feature branches properly
[10:34:10] or maybe I should have created a branch :-]
[10:42:07] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 10:42:00 UTC 2013
[10:42:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[10:49:25] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[10:56:25] paravoid: this is what happens when you talk about puppet code review on a public channel!
[10:56:36] (i.e. i notice and then add you to my outstanding review requests :P)
[11:00:06] New review: Mark Bergsma; "So because generic-definitions.pp stuff lives outside a module, it shouldn't be used within modules,..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/49710
[11:12:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 11:12:27 UTC 2013
[11:13:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[11:15:25] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:25] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:25] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:25] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[11:15:25] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:25:09] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53423
[11:25:33] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53424
[11:26:08] New review: Faidon; "geoip module is coming soon" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/53422
[11:26:16] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53422
[11:26:37] \O/
[11:27:06] New review: Hashar; "thanks! :-]" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53422
[11:28:02] ah crap
[11:28:14] wasn't a reviewer for 53425
[11:28:17] and it's a dependency
[11:28:21] I have to review it now I guess :)
[11:28:45] oooops
[11:29:13] oh that one is a bit crazy :/
[11:30:04] New review: Faidon; "not terribly excited, but meh" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/53425
[11:30:19] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53425
[11:31:02] https://gerrit.wikimedia.org/r/#/c/53531/ says needs rebase or has dependency
[11:31:09] but the dep is merged
[11:31:09] wtf?
[11:31:23] New patchset: Hashar; "contint: xdebug + code coverage directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53531
[11:31:28] if in doubt
[11:31:30] press [Rebase]
[11:32:07] I know that but I was wondering what happened
[11:32:37] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53531
[11:32:45] New patchset: Hashar; "migrate testswarm module to contint module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[11:32:55] and the last one gets rid of the testswarm module
[11:33:34] waiting for V+1
[11:33:50] yeah there is a bug in Zuul
[11:34:05] that makes it wait for all jobs currently running to complete before reporting
[11:34:12] that is supposedly fixed upstream
[11:34:14] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53712
[11:36:09] all done and merged in sockpuppet
[11:36:32] hashar: I don't mind much but contint might benefit from a different split
[11:36:38] rather than packages that install a bunch of unrelated packages
[11:36:57] do you mean the huge package class?
[11:37:02] yes
[11:37:12] hm
[11:37:14] like a class for testing mediawiki, another for testing mobile apps etc.
[11:37:18] yeah I am not sure what to do with it
[11:37:40] another for testing debian packages (package builder)
[11:37:44] i think something is funky with the permissions on fenari: 'error: unable to unlink old '.gitmodules' (Permission denied)' when git-pull; reedy owns all files
[11:38:12] ori-l: path?
[11:38:15] erm, this is common/php-1.21wmf11
[11:38:20] sorry, it's late
[11:38:27] isn't set-group-write supposed to fix that ? :-D
[11:39:00] ori-l: 3am30 ? :(
[11:39:05] ori-l: you should probably sleep a bit
[11:39:07] 4:39
[11:39:12] ah even worse
[11:39:12] the clock changed
[11:39:28] ori-l: I refuse to fix it for you, go to sleep
[11:39:28] on the plus side, if you wait a couple hours you can prepare breakfast for your family
[11:39:34] ;-)
[11:39:46] O^o
[11:39:56] blah. FINE :P
[11:40:16] good night / morning / whatever the hell it is
[11:40:23] good night ori!
[11:40:44] haha
[11:42:53] paravoid: success. puppet ran properly
[11:42:57] and zuul is still running
[11:42:58] ;-)
[11:43:06] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 11:42:58 UTC 2013
[11:43:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[11:43:34] apachectl configtest
[11:43:35] Syntax OK
[11:43:42] I am so happy
[11:44:48] paravoid: I got another one to publish jenkins ssh public key on servers that have the jenkins user : https://gerrit.wikimedia.org/r/#/c/53736/
[11:45:01] that is to let me set up jenkins slave boxes easily
[11:45:17] ideally the UID should be the same everywhere but I have no idea how to do that
[11:45:27] beside creating an admin user
[11:45:33] sec
[11:54:27] PROBLEM - Puppet freshness on kuo is CRITICAL: Puppet has not run in the last 10 hours
[11:57:25] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[12:04:25] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[12:07:25] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours
[12:13:26] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[12:13:45] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 12:13:36 UTC 2013
[12:14:15] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[12:15:26] PROBLEM - Puppet freshness on tola is CRITICAL: Puppet has not run in the last 10 hours
[12:16:02] New patchset: Hashar; "remove contint web material" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53744
[12:16:51] !log gallium setting up /srv/ as a copy of integration/docroot.git
[12:16:57] Logged the message, Master
[12:18:25] PROBLEM - Puppet freshness on capella is CRITICAL: Puppet has not run in the last 10 hours
[12:21:47] mark: I am getting the files that serve https://integration.mediawiki.org/ out of puppet into a new independent repo. Would you mind merging/sockpuppeting https://gerrit.wikimedia.org/r/#/c/53744/ please ? :-]
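The unlink error ori-l hit above is a shared-checkout group-write problem; one possible manual fix, assuming deployers share a common group and that the set-group-write helper mentioned wraps something similar (the path is the one named later in this log):

    $ chmod -R g+w /home/wikipedia/common/php-1.21wmf11   # give the group write access recursively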
[12:41:05] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 12:41:04 UTC 2013
[12:41:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[13:11:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 13:11:31 UTC 2013
[13:12:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[13:12:30] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53744
[13:12:37] mark: solved by faidon :-]
[13:14:54] :)
[13:15:12] !log Fixed OSPF on cr2-knams:xe-0/0/0 - csw1-esams:e8/2 (earlier)
[13:15:19] Logged the message, Master
[13:15:31] !log Set OSPF cost to 10 on csw1-esams:ve7 to facilitate ip multipath
[13:15:37] Logged the message, Master
[13:21:58] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 17 seconds
[13:22:35] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[13:29:48] New patchset: Matthias Mullie; "Prepare AFTv5 config for deployment new features" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50744
[13:41:56] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 13:41:52 UTC 2013
[13:42:25] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[13:53:39] paravoid: wanna talk about the python-voluptuous packaging ? :-]
[13:56:35] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 182 seconds
[13:56:56] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds
[14:04:25] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours
[14:12:35] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 14:12:25 UTC 2013
[14:13:05] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 30 seconds
[14:13:26] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[14:16:36] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 21 seconds
[14:21:25] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[14:28:26] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.74939815603 (gt 8.0)
[14:42:23] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.73463129496
[14:42:54] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 14:42:50 UTC 2013
[14:43:23] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[15:14:38] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 15:13:16 UTC 2013
[15:14:38] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[15:14:38] hello good people, I know you're busy but to whom should I speak about an RT ticket?
[15:14:39] milimetric: which ticket?
[15:14:39] #4730
[15:14:39] the one you filed yesterday for us
[15:14:56] it's linked to a deliverable for us
[15:15:08] ah this one, yeah
[15:15:26] lemme harass some other opsen about it and see if I can get some feedback
[15:15:50] cool, thanks Jeff
[15:16:01] sure
[15:22:18] New review: Daniel Kinzler; "After some discussion: I fear it won't help much, but it's better then nothing. So let's give it a try." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/52797
[15:36:54] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[15:37:04] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[15:42:32] robla: you've been paged recently?
[15:43:30] jeremyb_: yeah, I have. I was about to describe it, but then I realized that I should double check the assumption that I shouldn't be getting these before getting aggressive about getting unsubscribed
[15:43:56] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 15:43:50 UTC 2013
[15:44:24] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[15:44:26] why doesn't it let me know when you've reopened?
[15:44:33] mysterious software
[15:46:15] hashar: fyi, performance not performances. (for computers at least. performances is e.g. opera or shakespeare)
[15:46:58] unless you mean individual jobs. but that sounds weird
[15:51:37] jeremyb_: where ?
[15:54:29] hashar: 4733
[15:54:38] 42 ;)
[15:54:47] I am out for now, going to get my daughter :-]
[16:02:23] New patchset: Demon; "Don't announce comments on drafts to IRC" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53759
[16:03:09] New review: Demon; "Needs upstream change merged & deployed first: https://gerrit-review.googlesource.com/#/c/43490/" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53759
[16:08:23] PROBLEM - Puppet freshness on search1019 is CRITICAL: Puppet has not run in the last 10 hours
[16:14:13] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 16:14:12 UTC 2013
[16:14:24] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[16:14:24] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 16:14:17 UTC 2013
[16:15:23] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[16:21:26] Change merged: Matthias Mullie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/50744
[16:21:44] seems to be having a problem opening pt.wikipedia pages with the math tag. a user reported to me that opening the page http://pt.wikipedia.org/wiki/Pi for example, gets the error Error: 1048 Column 'math_outputhash' cannot be null (10.64.16.23)
[16:22:08] this was reported from brasil, i'm in portugal, and i can read it perfectly
[16:22:38] New patchset: Mark Bergsma; "Make ganglia-monitor-aggregator not fail if an instance is running" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53761
[16:23:43] PROBLEM - Host mw1041 is DOWN: PING CRITICAL - Packet loss = 100%
[16:24:09] New patchset: Mark Bergsma; "Aggregators should not collect metrics themselves" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53762
[16:24:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53761
[16:25:24] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53762
[16:26:01] !log mlitn synchronized wmf-config 'Prepare AFTv5 config for deployment new features'
[16:26:08] Logged the message, Master
[16:26:46] quit
[16:31:09] New patchset: Mark Bergsma; "Invert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53763
[16:31:13] Alchimista: Known bug
[16:31:36] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53763
[16:32:35] Reedy: kenair is discussing it on tech, thanks
[16:38:31] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 16:38:21 UTC 2013
[16:38:31] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[16:39:40] RECOVERY - Host mw1041 is UP: PING OK - Packet loss = 0%, RTA = 1.22 ms
[16:42:05] New patchset: Mark Bergsma; "Manage aggregator instances through upstart directly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53765
[16:42:33] PROBLEM - Apache HTTP on mw1041 is CRITICAL: Connection refused
[16:43:32] New patchset: Mark Bergsma; "Manage aggregator instances through upstart directly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53765
[16:44:28] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53765
[16:49:36] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 189 seconds
[16:49:40] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 191 seconds
[16:53:58] New patchset: Asher; "pulling db1009 for upgrade" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53766
[16:54:31] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 183 seconds
[16:54:41] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds
[16:56:15] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53766
[16:56:27] New patchset: Asher; "db1009 -> mariadb, max_cons -> 1000" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53767
[16:56:30] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 195 seconds
[16:56:51] I'm attempting to git pull on fenari, but I'm getting "error: unable to unlink old '.gitmodules' (Permission denied)"
[16:56:59] anyone here who does have sufficient permissions to pull it in?
[16:57:41] !log adding mw1041 back to dsh groups
[16:57:47] Logged the message, Master
[16:58:14] mlitn: me, probably...
[16:58:21] binasher: is it okay to move labsdb1002 and 3?
[16:58:39] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53767
[16:58:39] cmjohnson1: yep, at any time
[16:58:49] cool... will do that now
[16:59:07] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1009 for mariadb migration'
[16:59:14] Logged the message, Master
[16:59:19] mlitn: AFTv5 submodule updated, NavigationTiming (no idea whose that is) running --init now
[16:59:39] !log powering down labsdb1002 and labsdb1003
[16:59:45] Logged the message, Master
[17:06:10] PROBLEM - Host labsdb1003 is DOWN: PING CRITICAL - Packet loss = 100%
[17:08:45] New patchset: Jgreen; "puppetizing apache-fast-test qa script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53773
[17:08:51] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 17:08:46 UTC 2013
[17:09:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[17:13:17] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53773
[17:13:37] Anyone know what's happened to wikibugs
[17:13:39] ?
[17:13:57] James_F: yes
[17:14:10] it got kickbanned
[17:14:40] [09:13:54 AM] ^demon sets mode +b wikibugs!*@*
[17:14:40] [09:14:10 AM] ^demon kicked wikibugs from the channel. (wikibugs)
[17:14:45] (CDT)
[17:14:57] New patchset: Aaron Schulz; "Set job queue aggregator to redis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53774
[17:15:09] legoktm: OK. Interesting.
[17:15:22] <^demon> Join #mediawiki-feed :)
[17:15:33] ^demon: Oh, we've switched to that? Yay!
[17:15:41] Finally.
[17:15:45] <^demon> I was tired of bikeshedding.
[17:15:50] <^demon> So I was bold.
[17:16:08] * YuviPanda adds appropriate number of quotes to ^demon
[17:16:09] ^demon: Go you.
[17:16:10] Someone should confiscate your paintbrush
[17:16:11] <^demon> (Plus, we had someone actively complaining that "This obviously isn't the channel to ask questions about MediaWiki" due to the noise from bots.)
[17:16:31] ^demon: Is there a possibility that gerrit-wm will do the same?
[17:16:36] <^demon> Possibility.
[17:16:39] (clap clap)
[17:16:42] Yay!
[17:16:52] <^demon> gerrit-wm isn't nearly as noisy though, since we differentiate by project.
[17:16:58] <^demon> wikibugs was worse since it was a firehose.
[17:17:15] Oh, indeed.
[17:17:45] Though if we could get wikibugs to work like gerrit-wm (so we could get a feed of bugs in the -visualeditor channel like we have of commits) I'd be delirious. :-)
[17:17:47] New patchset: Mark Bergsma; "Revert "Manage aggregator instances through upstart directly"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53775
[17:18:28] ^demon: To be fair, I had tried to differentiate by project, but nobody would deploy wikibugs so the patch never got merged.
[17:18:42] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53774
[17:18:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53775
[17:19:13] <^demon> marktraceur: Because it's only half-puppetized.
[17:19:14] More accurately, nobody wanted to deploy wikibugs in its current state.
[17:19:21] Right, exactly.
[17:20:17] <^demon> wm-bot also does per-project notifs, so it can be introduced to any channels that want selective notification.
[17:22:32] notpeter: Hi, got a question about package installation. On host gallium (integration.wikimedia.org) I need package 'jsduck' (a ruby gem) soon. What is the procedure for this, do we have our own repo? Can we use gem install from puppet? Do we need to fork it?
[17:23:01] It is a package that generates documentation for javascript files, which we'll publish on doc.wikimedia.org (like doxygen for php docs)
[17:23:49] !log shutting down mysql on db1009, upgrading to mariadb
[17:23:55] Logged the message, Master
[17:24:40] MariaDB 5.5.30
[17:25:40] i still need to repackage 5.5.30
[17:27:11] yay for just re-using other people's packages ;)
[17:27:12] !log reedy synchronized php-1.21wmf11/extensions/Math/
[17:27:19] Logged the message, Master
[17:29:31] PROBLEM - mysqld processes on db1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[17:31:00] New patchset: Aaron Schulz; "Set serializer to php instead." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53777
[17:31:22] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53777
[17:32:26] Krinkle: we have our own apt-repo
[17:32:31] Reedy: is that NULL column bug fixed?
[17:32:45] Aaron|home: It should be fixed now..
[17:33:08] but, as far as I know, ruby likes getting gems its own way...
[17:34:22] RobH: do you know if ram for the rdb upgrade arrived at eqiad?
[17:34:43] Aaron|home: serializer to php instead why?
[17:35:06] Krinkle: but just writing a puppet class to install gems the ruby way will mean external software sources, which we don't like
[17:36:39] notpeter: meeting, sorry. I'll get back to you in a few minutes
[17:39:11] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 17:39:07 UTC 2013
[17:39:22] Krinkle: ok
[17:39:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[17:41:32] RECOVERY - mysqld processes on db1009 is OK: PROCS OK: 1 process with command name mysqld
[17:41:55] !log aaron synchronized wmf-config 'Set redis job aggregator on for testwiki.'
[17:42:02] Logged the message, Master
[17:42:52] binasher: the ram arrived in eqiad
[17:43:52] PROBLEM - MySQL Slave Delay on db1009 is CRITICAL: CRIT replication delay 1131 seconds
[17:44:00] cmjohnson1: great, would it be possible to get it installed today?
[17:44:19] yes, i will get it done today
[17:45:34] hah, does icinga really say "Love, Icinga"?
[17:47:19] ottomata: do you want to take 4724?
it does :)
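The "gem install from puppet" option Krinkle asks about above does exist as Puppet's built-in gem package provider; a minimal sketch (the one-off puppet apply invocation is illustrative, and as notpeter notes it pulls from an external source, rubygems.org):

    $ sudo puppet apply -e "package { 'jsduck': ensure => installed, provider => 'gem' }"

The alternative raised in the discussion is building a .deb and shipping it through the local apt repo instead.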
[17:47:48] !log labsdb1001 powering down
[17:47:54] Logged the message, Master
[17:48:32] jeremyb-phone, sure, i will get to it next week
[17:49:05] !log aaron synchronized wmf-config 'Enabled redis job aggregator fully'
[17:49:12] Logged the message, Master
[17:49:20] PROBLEM - Host rdb1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:49:32] ottomata: danke
[17:49:40] PROBLEM - Host rdb1002 is DOWN: PING CRITICAL - Packet loss = 100%
[17:50:21] hm
[17:53:31] binasher: seems to work from all I can tell
[17:53:42] * Aaron|home looks at those host notices
[17:54:57] oh, nvm
[17:55:06] that's something totally different
[17:55:07] aaron|home: labsdb's and rdb's are me
[17:55:33] * Aaron|home thought that was redis for a second ;)
[17:55:51] RECOVERY - MySQL Slave Delay on db1009 is OK: OK replication delay NULL seconds
[17:57:30] PROBLEM - MySQL Slave Running on db1009 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Table ./nlwiki/moodbar_feedback is marked as crashed and sho
[17:59:40] paravoid: No real rush, but would appreciate a glance at this: https://gerrit.wikimedia.org/r/#/c/43886/
[17:59:54] Mostly wondering what should be in the module vs. what should be outside of it
[18:02:35] New patchset: Aaron Schulz; "Enabled $wgEnableAsyncUploads on all wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53780
[18:02:48] Aaron|home: tired of waiting? :)
[18:02:53] paravoid: some error about the igbinary constant not being defined for srv193
[18:03:12] I'm talking about the swift double-get
[18:03:15] and async uploads
[18:04:12] paravoid: ha, you said that right as I was responding to some backscroll
[18:04:39] marked as crashed sounds like MyISAM :/
[18:05:01] paravoid: things probably won't get worse than now, but I want to get this aspect stabilized in the meantime instead of waiting
[18:05:30] RECOVERY - MySQL Slave Running on db1009 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[18:05:43] we still have the 500mb limit to keep things in check
[18:06:31] jeremyb_: yeah, moodbar did that :/
[18:06:43] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53780
[18:07:09] binasher: hrm?
[18:07:23] nothing new
[18:08:01] PROBLEM - MySQL Slave Delay on db1009 is CRITICAL: CRIT replication delay 494 seconds
[18:08:12] I don't think I've heard of that before, just the aft one
[18:08:15] !log cleaned up myisam cruft (unused tables) on s2
[18:08:21] Logged the message, Master
[18:08:37] moodbar was created as myisam on all wikis
[18:09:28] Aaron|home: re: 500mb limit, were you talking about redis on mc1001?
[18:09:40] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 18:09:33 UTC 2013
[18:09:52] RECOVERY - MySQL Slave Delay on db1009 is OK: OK replication delay seconds
[18:09:54] binasher: no, this is about upload file size limits
[18:10:16] oh! ok
[18:10:31] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[18:11:50] PROBLEM - Host db1009 is DOWN: PING CRITICAL - Packet loss = 100%
[18:12:14] !log aaron synchronized wmf-config/InitialiseSettings.php 'Enabled $wgEnableAsyncUploads on all wikis'
[18:12:20] Logged the message, Master
[18:13:30] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[18:14:00] RECOVERY - Host db1009 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[18:14:40] RECOVERY - Full LVS Snapshot on rdb1001 is OK: OK no full LVM snapshot volumes
[18:14:40] RECOVERY - MySQL Slave Delay on rdb1001 is OK: OK replication delay seconds
[18:14:51] RECOVERY - Host rdb1001 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[18:15:01] RECOVERY - MySQL Idle Transactions on rdb1001 is OK: OK longest blocking idle transaction sleeps for seconds
[18:15:01] RECOVERY - MySQL Recent Restart on rdb1001 is OK: OK seconds since restart
[18:15:01] RECOVERY - MySQL Slave Running on rdb1001 is OK: OK replication
[18:15:01] RECOVERY - MySQL disk space on rdb1001 is OK: DISK OK
[18:15:30] RECOVERY - MySQL Replication Heartbeat on rdb1001 is OK: OK replication delay seconds
[18:16:31] PROBLEM - mysqld processes on db1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[18:17:33] RECOVERY - mysqld processes on db1009 is OK: PROCS OK: 1 process with command name mysqld
[18:20:30] RECOVERY - Full LVS Snapshot on rdb1002 is OK: OK no full LVM snapshot volumes
[18:20:31] RECOVERY - MySQL Idle Transactions on rdb1002 is OK: OK longest blocking idle transaction sleeps for seconds
[18:20:31] RECOVERY - MySQL Slave Running on rdb1002 is OK: OK replication
[18:20:38] uh
[18:20:42] RECOVERY - Host rdb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms
[18:20:42] RECOVERY - MySQL Recent Restart on rdb1002 is OK: OK seconds since restart
[18:20:42] RECOVERY - MySQL disk space on rdb1002 is OK: DISK OK
[18:21:00] RECOVERY - MySQL Slave Delay on rdb1002 is OK: OK replication delay seconds
[18:21:00] RECOVERY - MySQL Replication Heartbeat on rdb1002 is OK: OK replication delay seconds
[18:21:55] hah, guess i need to change all "node /db1../" stanzas to "node /^db1../"
[18:22:23] haha
[18:23:08] Can someone fix some file permissions for me please? chmod g+w -R /home/wikipedia/common/php-1.21wmf11/
[18:23:47] New patchset: Asher; "s@node /db@node /^db@g" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53782
[18:24:09] set-group-write?
[18:24:57] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53782
[18:26:30] RECOVERY - Puppet freshness on search1019 is OK: puppet ran at Thu Mar 14 18:26:23 UTC 2013
[18:28:29] binasher: rdb1001/2 upgraded
[18:28:35] New patchset: Asher; "Revert "pulling db1009 for upgrade"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53783
[18:28:39] in case you didn't see all the noise
[18:28:41] cmjohnson1: thanks!
[18:30:40] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53783
[18:31:47] !log asher synchronized wmf-config/db-eqiad.php 'returning db1009 at a low weight for warmup'
[18:31:53] Logged the message, Master
[18:34:15] binasher: sorry about the lack of ^
[18:34:20] little hats for all the nodes!
[18:34:34] New review: Dzahn; "i took a look with Chris Steipp, and while this does not look like a risk of SQL injection, please e..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387
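For context on the rdb1001/rdb1002 check noise: Puppet node regexes are Ruby regexes and match anywhere in the hostname unless anchored, which is what the s@node /db@node /^db@g patch above fixes. A quick illustration (hostnames picked for the demo):

    $ ruby -e 'puts %w[db1009 rdb1001].grep(/db1../).inspect'    # unanchored: rdb1001 matches too
    ["db1009", "rdb1001"]
    $ ruby -e 'puts %w[db1009 rdb1001].grep(/^db1../).inspect'   # anchored with ^ ("little hats")
    ["db1009"]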
[18:35:01] New review: Dzahn; "i took a look with Chris Steipp, and while this does not look like a risk of SQL injection, please e..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53387
[18:40:02] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 18:39:57 UTC 2013
[18:40:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[18:41:07] LeslieCarr: Any idea why it keeps doing that? It says "puppet freshness is OK, puppet run at $NOW" followed by "CRITICAL, puppet hasn't run in the last 10h"
[18:41:30] oh that's due to how naggen runs and then the decommissioned server script
[18:41:36] !log asher synchronized wmf-config/db-eqiad.php 'returning db1009 to full weight'
[18:41:41] and sadly neon hasn't been able to get a puppet run since the mysql manifest change
[18:41:43] Logged the message, Master
[18:41:50] and i keep working on other stuffs instead of fixing that
[18:42:00] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 18:41:59 UTC 2013
[18:42:17] (That was me running puppet)
[18:42:21] Oooooh wait
[18:42:31] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[18:42:34] You're saying it's confusing celsus.pmtpa.wmnet with celsus.wm.o ?
[18:42:42] The new one vs the decommissioned one
[18:42:59] well it hasn't had a puppet run since the change
[18:43:16] What hasn't?
[18:43:20] celsus has, I just ran it
[18:43:42] ^demon: can you cr https://gerrit.wikimedia.org/r/#/c/53785/ ?
[18:44:40] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/ 'Update ArticleFeedbackv5 to master'
[18:44:47] Logged the message, Master
[18:45:31] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[18:46:05] New patchset: Aaron Schulz; "Enable wgEnableAsyncUploads only on testwikis for now." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53786
[18:48:40] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53786
[18:48:44] neon
[18:48:53] Oh right
[18:49:27] !log aaron synchronized wmf-config/InitialiseSettings.php 'Enable wgEnableAsyncUploads only on testwikis for now'
[18:49:35] Logged the message, Master
[18:49:50] PROBLEM - Host rdb1001 is DOWN: PING CRITICAL - Packet loss = 100%
[18:51:42] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 205 seconds
[18:52:02] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 223 seconds
[18:52:40] PROBLEM - Host rdb1002 is DOWN: PING CRITICAL - Packet loss = 100%
[18:55:01] RECOVERY - Host rdb1001 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[18:57:01] PROBLEM - MySQL disk space on rdb1001 is CRITICAL: Connection refused by host
[18:57:01] PROBLEM - MySQL Recent Restart on rdb1001 is CRITICAL: Connection refused by host
[18:57:01] PROBLEM - MySQL Slave Running on rdb1001 is CRITICAL: Connection refused by host
[18:57:01] PROBLEM - MySQL Idle Transactions on rdb1001 is CRITICAL: Connection refused by host
[18:57:31] PROBLEM - MySQL Replication Heartbeat on rdb1001 is CRITICAL: Connection refused by host
[18:57:51] PROBLEM - SSH on rdb1001 is CRITICAL: Connection refused
[18:57:51] PROBLEM - MySQL Slave Delay on rdb1001 is CRITICAL: Connection refused by host
[18:57:51] PROBLEM - Full LVS Snapshot on rdb1001 is CRITICAL: Connection refused by host
[18:57:51] RECOVERY - Host rdb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[19:00:00] PROBLEM - SSH on rdb1002 is CRITICAL: Connection refused
[19:00:01] PROBLEM - MySQL Slave Delay on rdb1002 is CRITICAL: Connection refused by host
[19:00:01] PROBLEM - MySQL Replication Heartbeat on rdb1002 is CRITICAL: Connection refused by host
[19:00:30] PROBLEM - MySQL Slave Running on rdb1002 is CRITICAL: Connection refused by host
[19:00:31] PROBLEM - MySQL Idle Transactions on rdb1002 is CRITICAL: Connection refused by host
[19:00:31] PROBLEM - Full LVS Snapshot on rdb1002 is CRITICAL: Connection refused by host
[19:00:40] PROBLEM - MySQL Recent Restart on rdb1002 is CRITICAL: Connection refused by host
[19:00:41] PROBLEM - MySQL disk space on rdb1002 is CRITICAL: Connection refused by host
[19:09:41] PROBLEM - NTP on rdb1001 is CRITICAL: NTP CRITICAL: No response from NTP server
[19:09:43] RECOVERY - SSH on rdb1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[19:10:00] RECOVERY - SSH on rdb1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[19:10:41] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 19:10:33 UTC 2013
[19:11:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[19:24:02] PROBLEM - NTP on rdb1002 is CRITICAL: NTP CRITICAL: No response from NTP server
[19:32:01] !log aaron synchronized php-1.21wmf11/includes/job/jobs 'deployed 6f76ede163cb114724bce7a1c1b8938e1e30606f '
[19:32:08] Logged the message, Master
[19:32:41] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds
[19:33:01] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds
[19:41:10] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 19:41:04 UTC 2013
[19:41:31] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[19:43:40] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 203 seconds
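A hypothetical spot-check that the testwiki-only sync above took effect, piping into MediaWiki's eval.php REPL via the multiversion mwscript wrapper (the wiki name comes from the log; the shown output is merely the expected value, not observed):

    $ echo 'var_dump( $wgEnableAsyncUploads );' | mwscript eval.php --wiki=testwiki
    bool(true)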
[19:44:04] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 223 seconds
[19:45:32] RECOVERY - NTP on rdb1001 is OK: NTP OK: Offset 0.0008583068848 secs
[20:05:27] !log olivneh synchronized php-1.21wmf11/includes/WikiPage.php 'Adds CategoryAfterPageAdded / CategoryAfterPageRemoved hooks'
[20:05:36] Logged the message, Master
[20:09:43] LeslieCarr: there's a ticket for that :) rt 4727
[20:09:54] * jeremyb_ pastes LeslieCarr into the ticket
[20:11:00] RECOVERY - NTP on rdb1002 is OK: NTP OK: Offset -0.006113052368 secs
[20:11:40] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 20:11:34 UTC 2013
[20:12:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[20:13:30] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours
[20:13:45] binasher: do you want me to set up the raid on the arrays for labsdb100x?
[20:14:39] cmjohnson1: that would be great, raid-10 with no read-ahead
[20:14:47] k
[20:19:23] ah more tickets to look at ?
[20:19:24] noooes
[20:19:33] also i just forgot what i was meaning to work on this afternoon
[20:19:58] !log olivneh synchronized php-1.21wmf11/extensions/EventLogging 'Log ['HTTP_HOST'] as webHost'
[20:20:07] Logged the message, Master
[20:20:48] stupid bash variable expansion
[20:21:12] and stupid double double quotes
[20:22:40] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds
[20:23:01] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds
[20:25:50] mutante: pa.us.wikimedia.org might be a good first target being a closed wiki - https://bugzilla.wikimedia.org/show_bug.cgi?id=38763
[20:25:53] LeslieCarr: no, you don't have to look :) i just pasted you into it
[20:26:13] hehe
[20:26:27] Reedy: then what am i going to use when i need a broken domain to take screenshots of cert errors?
[20:26:36] New patchset: Asher; "initial redisdb role config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53800
[20:26:45] :)
[20:26:54] jeremyb_: https://arbcom.de.wikipedia.org/
[20:26:58] Until we fix that one too ;)
[20:27:05] bad Reedy
[20:27:20] Gotta fix 'em all!
[20:29:09] * Aaron|home grins at binasher
[20:29:51] ori-l: logged the wrong thing?
[20:30:37] !log Added article edit event to EventLogging, which upped events/sec from ~3.5 to ~30. Expect increased load on vanadium and db1047; bits caches unaffected.
[20:30:44] Logged the message, Master
[20:31:05] $ echo $'foo \'$bar\' baz'
[20:31:05] foo '$bar' baz
[20:31:05] jeremyb_: nah, it just swallowed '$_SERVER' (expanded it to empty string) because i was silly
[20:31:05] yeah, i know.
[20:31:31] ori-l: very silly!
[20:34:08] !log installed WikiLove on wikitech
[20:34:15] Logged the message, Master
[20:34:31] lolol
[20:34:49] Reedy: ?
[20:35:12] hah
[20:35:18] wikitech, srsbizness
[20:36:14] :)
[20:36:43] !log DNS update - add zuul as CNAME for gallium
[20:36:48] Logged the message, Master
[20:37:22] hashar: zuul.wikimedia.org. 3600 IN CNAME gallium.wikimedia.org.
[20:37:46] mutante: you are the best. WFM :-]
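What bit ori-l above: in bash, double quotes expand $variables, single quotes do not, and $'...' (ANSI-C quoting) acts like single quotes but allows escaped quotes inside. A minimal demonstration (the variable is invented):

    $ bar=expanded
    $ echo "foo '$bar' baz"       # double quotes: $bar expands
    foo 'expanded' baz
    $ echo 'foo $bar baz'         # single quotes: no expansion
    foo $bar baz
    $ echo $'foo \'$bar\' baz'    # ANSI-C quoting: \' works, $bar stays literal
    foo '$bar' baz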
[20:41:21] * mutante sends wikilove to Ryan for enabling wikilove
[20:42:23] New patchset: Reedy; "Bug 38763 - Our *.wikimedia.org cert doesn't properly cover https://pa.us.wikimedia.org/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53807
[20:43:16] Reedy: omg, you're renaming wikis
[20:43:18] <3
[20:43:55] Reedy: let's start the wiki renaming sprint , heh
[20:44:01] ready now
[20:45:40] !log deleting the jenkins user from ldap
[20:45:46] Logged the message, Master
[20:49:30] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[20:50:04] !log authdns update (labsdb1002-3 vlan)
[20:50:11] Logged the message, Master
[20:50:58] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513
[20:51:30] New patchset: Ryan Lane; "Allow multiple users/groups in access.conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53844
[20:52:09] Anyone know why pa.us.wikimedia is not actually closed?
[20:52:21] :p
[20:52:40] trying #wikimedia-pa
[20:52:47] jeremyb_: ^^
[20:52:51] lol, fail
[20:52:54] 14:01 -!- mutante was kicked from #wikimedia-pa by ChanServ [Invite only channel]
[20:53:15] hah
[20:53:28] mutante: is it not closed?
[20:53:36] casey would be the best person to ask
[20:53:40] http://pa.us.wikimedia.org/w/index.php?title=Special:RecentChanges&days=30&from=
[20:53:49] Minimal activity, userpages and such
[20:53:58] i think casey was one of the board members
[20:54:05] I've just poked him
[20:54:16] Ryan_Lane: re: deleting jenkins user from ldap.. is jenkins down right now?
[20:54:39] nah, hashar created that user a couple of hours ago
[20:54:44] and we agreed it shouldn't be there
[20:54:47] ah
[20:54:58] jeremyb_: Looks like a yes
[20:55:01] jenkins is at least being really slow
[20:55:03] yeah the jenkins user is a local user on the continuous integration box
[20:55:04] yeah
[20:55:18] I would love to have all our users in LDAP one day though :-]
[20:56:41] New patchset: Reedy; "Close pa_uswikimedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53857
[20:57:01] hashar: definitely not for system users
[20:57:15] !log reedy synchronized wmf-config/InitialiseSettings.php
[20:57:15] system users should always be local
[20:57:22] Logged the message, Master
[20:57:25] hashar, wanna repeat that part about packaging zuul and the bug you told me the other day (re: jenkins slowness)
[20:57:30] for binasher
[20:57:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53857
[20:57:40] sure
[20:57:42] so zuul has several bugs
[20:58:08] bleh. jenkins is definitely being really slow :(
[20:58:24] one of them is that the reporting is done after the loop that triggers the jobs, or something similar. So there is some kind of delay before Zuul reports back to Gerrit
[20:58:43] and there is a nasty bug that causes Zuul to sometimes mistake the commits to trigger :-]
[20:58:49] annnd
[20:59:16] there are too many jobs running during peak hours (aka right now, when European volunteers code, Reedy merges, and all the ops do their puppet stuff)
[20:59:20] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 20:59:11 UTC 2013
[20:59:21] New patchset: Asher; "initial redisdb role config patch set 2: ganglia config Change-Id: I819c79ba048fc14538db06a7aff956a9ca7f6460" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53800
[20:59:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[21:00:23] hashar: is this something we can get a second box and load balance (easily at least) ?
[21:01:04] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53800
[21:01:05] jenkins can remote control other boxes
[21:01:16] !log DNS update - add pa-us wiki to replace pa.us
[21:01:22] Logged the message, Master
[21:01:30] LeslieCarr: going to test that using a labs instance, and will probably order you another box. Robh told me there are some misc boxes assigned to platform that we could potentially use.
[21:02:13] New patchset: Reedy; "Bug 38763 - Our *.wikimedia.org cert doesn't properly cover https://pa.us.wikimedia.org/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53807
[21:04:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53844
[21:06:36] New patchset: Asher; "fix class scoping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53858
[21:08:51] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53807
[21:10:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53858
[21:11:01] RECOVERY - Puppet freshness on capella is OK: puppet ran at Thu Mar 14 21:10:55 UTC 2013
[21:11:13] New patchset: Dzahn; "redirect pa.us chapter wiki to pa-us (bug 38763)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53859
[21:12:17] oh, should i redirect to https ?
[21:13:36] New patchset: Krinkle; "Add sudo user "krinkle" on gallium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53861
[21:15:31] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[21:15:31] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[21:15:31] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[21:15:31] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[21:15:31] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[21:16:05] kibble!
[21:16:15] :o !
[21:16:23] Hey Jeremy. :-)
[21:16:37] PA is dead, long live NYC!
[21:16:39] mutante
[21:16:44] Reedy
[21:16:52] Yeaah
[21:16:58] We still need something to test on ;)
[21:17:01] :), already added pa-us to DNS
[21:17:09] apache redirect is pending jenkins
[21:17:14] and it's closed
[21:17:26] gave up on the wikimedia.us idea
[21:18:49] mutante: good :)
[21:18:53] re gave up
[21:20:11] RECOVERY - Puppet freshness on tola is OK: puppet ran at Thu Mar 14 21:20:00 UTC 2013
[21:20:31] RECOVERY - Puppet freshness on hume is OK: puppet ran at Thu Mar 14 21:20:27 UTC 2013
[21:21:35] Aaron|home: rdb1001 is ready for use, with rdb1002 slaving it
[21:23:41] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 12 seconds
[21:24:01] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[21:24:36] binasher: cool, maybe they can be used in wmf12
[21:25:15] oh, wow, renaming wiki tickets :o .. so "als" was once supposed to be Swiss German (Alemannic?), but it actually is Albanian. result: "the most article in the albanian wikipedia are write in Swiss
[21:25:19] German
[21:26:44] Aaron|home: i am now officially excited about the next mediawiki release.
[21:26:45] uhhh, wow
[21:26:51] i'll admit, this is a little weird.
[21:27:03] binasher: point or major?
[21:27:11] point
[21:27:15] binasher: there is some rewriting pending in gerrit
[21:30:09] mutante, basically, but a little more complicated. There was no code for Alemannic (which is kinda like Swiss German, but a slightly wider group of languages of which Swiss German is the biggest, I think), so they went with als. Then the ISO gave ALS to Tosk Albanian, which is a dialect of Albanian, not Albanian proper. So both of the languages are "smallish", but the main issue is that there's a conflict.
[21:30:32] Krinkle, jshint is blocking a submodule update for deployment: https://gerrit.wikimedia.org/r/#/c/53840/
[21:30:37] It's unrelated core code.
[21:30:50] It looks like jshint was non-voting when it was added, but voting here.
[21:31:00] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 21:30:58 UTC 2013
[21:31:30] superm401: no, jshint was always voting
[21:31:30] for a while
[21:31:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[21:31:35] and failed both times
[21:32:07] Krinkle, so is it voting for regular core commits right now?
[21:32:14] for months
[21:32:41] Then how did the failure at https://gerrit.wikimedia.org/r/#/c/53840/ get in?
[21:32:43] someone must've broken it in wmf
[21:32:49] it can be overridden
[21:32:59] I'll see how easy it is to fix.
[21:33:03] and some stubborn people refuse to let jenkins run before merging
[21:33:19] the history is hard to trace where it started breaking, because in those cases it couldn't leave a vote
[21:33:38] superm401: Hopefully it'll be an easy cherry-pick from core
[21:33:45] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53859
[21:33:49] superm401: anyway, jshint in the core repo doesn't run over extensions
[21:33:59] Krinkle, yeah, I know.
[21:34:01] superm401: so you can just ignore it, knowing the other tests succeeded (view the comment)
[21:34:04] binasher: they are both 72gb?
[21:34:14] yup
[21:34:20] superm401: will you find the lint error / cherry-pick?
[21:34:31] using append only logging?
[21:34:41] Krinkle, I'm not seeing it locally.
[21:34:45] or ram only?
[21:34:47] It might be fixed, so I'll rebase.
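binasher's note that rdb1002 is slaving rdb1001, and Aaron's questions about memory and persistence mixed in above, can all be answered from the stock redis-cli; a minimal sketch, assuming the default port:

    # On the master: expect role:master and connected_slaves:1.
    redis-cli -h rdb1001 info replication
    # On the slave: expect role:slave, master_host:rdb1001,
    # and master_link_status:up.
    redis-cli -h rdb1002 info replication
    # The memory cap Aaron asks about next:
    redis-cli -h rdb1001 config get maxmemory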
[21:34:51] Aaron|home: though they currently have maxmemory 58Gb
[21:35:31] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Thu Mar 14 21:35:25 UTC 2013
[21:35:47] superm401: jenkins tests postmerge locally
[21:35:59] it essentially rebases it onto latest master before testing
[21:36:15] that way it ensures that it passes in the state it would be in after the merge
[21:36:27] New patchset: Ori.livneh; "Enable NavigationTiming on test2 w/sampling factor of 10000" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53865
[21:36:34] Aaron|home: rdb snapshot, but not append only logging
[21:36:43] Then why does my jshint pass locally when I pull latest wmf/1.21wmf11?
[21:37:02] binasher: does that "stop the world"?
[21:37:55] and how often does it run?
[21:38:55] Aaron|home: it should happen asynchronously via a background thread, currently set to snapshot every 5 minutes, or every 100 key changes
[21:39:05] would you prefer aof?
[21:39:20] !log Zuul is hanging out not reporting changes :/
[21:39:26] Logged the message, Master
[21:39:32] i didn't want it for the session redis store, but it could be more appropriate here
[21:41:12] Aaron|home: read http://redis.io/topics/persistence
[21:41:31] I was looking at that again
[21:42:55] so yeah, only the fork() stops the world, and the delay is proportional to the amount of ram
[21:43:01] * Aaron|home wonders if that could cause swapping
[21:43:23] well, hopefully we won't have that much data at once
[21:44:23] binasher: anyway, my preference was for the log, but we can try either
[21:45:07] Aaron|home: let's go for the aof with every 1-sec fsyncs
[21:46:30] ok
[21:48:40] !log apparently Gerrit takes ages to submit a change, which in turn causes a hard lock on Zuul, which is waiting for the merge to happen.
[21:48:47] Logged the message, Master
[21:51:17] !log Gerrit took roughly 5 minutes to merge the change https://gerrit.wikimedia.org/r/#/c/53412/ ( mediawiki/core ), which means Zuul does nothing during that time waiting for the merge to happen.
[21:51:24] Logged the message, Master
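The trade-off Aaron and binasher settle here: RDB snapshotting persists via a fork() whose pause grows with resident memory and can lose up to a full snapshot interval of writes, while an append-only file with a once-per-second fsync bounds the loss to roughly one second at a small steady I/O cost. A sketch of the two setups they describe, using runtime CONFIG SET (the equivalent redis.conf directives are save, appendonly, and appendfsync):

    # Snapshotting as binasher describes it: dump after 300 s,
    # or sooner once 100 keys have changed.
    redis-cli config set save "300 100"
    # The mode they pick for the new boxes: append-only file,
    # fsync'd once per second.
    redis-cli config set appendonly yes
    redis-cli config set appendfsync everysec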
[21:52:39] New patchset: Rfaulk; "add. Some new global settings for metrics-api project." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53868
[21:57:30] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[21:57:38] !log olivneh synchronized php-1.21wmf11/extensions/NavigationTiming 'Update of extension to use EventLogging'
[21:57:45] Logged the message, Master
[21:58:33] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53865
[22:01:40] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 22:01:38 UTC 2013
[22:02:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[22:02:36] !log olivneh synchronized wmf-config 'Enable NavigationTiming on test2'
[22:02:42] Logged the message, Master
[22:05:30] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours
[22:05:30] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Puppet has not run in the last 10 hours
[22:07:30] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours
[22:08:21] New patchset: Ori.livneh; "$wmgUseNavigationTiming['default'] = true" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53872
[22:11:32] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53872
[22:13:30] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[22:13:52] !log olivneh synchronized wmf-config/InitialiseSettings.php 'Enabling NavigationTiming at 1:10,000 sampling factor across cluster.'
[22:13:59] Logged the message, Master
[22:15:02] New patchset: Asher; "make redis persistence model configurable, use aof in role::redisdb and rdb for sessions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53873
[22:16:37] http://www.bbc.co.uk/news/world-europe-21793224
[22:16:44] that title is great
[22:19:34] binasher: hey, you guys can use this for analytics! https://wiki.openstack.org/wiki/EHO
[22:19:35] :D
[22:20:45] ori-l: so far, no data for event_connecting or event_dnsLookup, interesting
[22:21:27] well, dns lookup isn't super-surprising because people will most likely have it cached
[22:21:35] but connecting is strange, yeah
[22:21:40] Ryan_Lane: please tell me it replaces hdfs with gluster
[22:21:51] no. even better
[22:21:52] swift
[22:23:12] Ryan_Lane: of course!
[22:23:19] binasher: btw, I realized too that the rate of events will be less than we initially thought, because you need to win the 1:10,000 lottery *and* have a browser w/nav timing support (66% according to caniuse)
[22:23:48] New patchset: Dzahn; "better redirect for pa.us to pa-us chapter wiki (bug 38763)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53874
[22:24:09] we got a DNS look-up, too
[22:24:34] oh yeah, and some connecting now
[22:25:41] i wonder if we should collect urls with this, or if it would be too much noise
[22:26:28] New patchset: Hashar; "jenkins::slave to setup a Jenkins agent" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53875
[22:31:50] New review: Ryan Lane; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53736
[22:32:13] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Thu Mar 14 22:32:06 UTC 2013
[22:32:28] New review: Dzahn; "apache-fast-test pa-us.wikimedia.url mw1044" [operations/apache-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/53874
[22:32:28] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53874
[22:32:30] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[22:33:38] binasher: IIRC the path + query of NavigationTiming events is currently ~400 bytes, so still well under the <1024 limit currently imposed by varnish. But the full request URL can add quite a lot of bloat, especially when you consider that code points above the ascii range get URL-encoded twice
[22:34:28] !log disabling notifications for puppet freshness on celsus
[22:34:34] Logged the message, Master
[22:35:48] ori-l: i don't think it would be helpful, except for a very small set of common pages like en.wikipedia.org/wiki/Main_Page
[22:36:52] binasher: ah, well we could easily log the numeric article ID; it's already available client-side as wgArticleId
[22:38:35] ooh
[22:38:39] yeah!
[22:39:37] ori-l: is wgArticleId page.page_id, and not tied to revision?
[22:40:00] yeah. revision ID is available too -- we could log both
[22:40:42] dzahn is doing a graceful restart of all apaches
[22:41:23] !log dzahn gracefulled all apaches
[22:41:29] Logged the message, Master
[22:41:31] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time
[22:42:31] New patchset: Hashar; "puppet now manage jenkins ssh authorized_keys" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53736
[22:42:32] New patchset: Hashar; "systemuser learned 'managehome' (default true)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53879
[22:42:32] New patchset: Hashar; "create jenkins user with systemuser" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53880
[22:43:00] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53736
[22:43:09] !log pa.us.wikimedia.org now redirects to pa-us.wikimedia.org (Wikimedia Pennsylvania) (no sub.sub domains)
[22:43:15] Logged the message, Master
[22:43:36] New review: Hashar; "Now depends on:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53736
[22:44:46] mutante: I wonder what we can do next..
[22:45:12] ori-l: just page-id would be a good start. revision lifetime might be too short
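ori-l's rate correction at 22:23, made concrete: an event fires only if the client both wins the 1:10,000 sample and runs a Navigation Timing capable browser (about 66% at the time), so the effective rate works out to roughly one event per 15,000 page views:

    # Effective sampling = base rate x browser support share.
    echo '(1/10000) * 0.66' | bc -l        # -> .0000660...
    echo '1 / ((1/10000) * 0.66)' | bc -l  # -> ~15151 page views per event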
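With the apaches gracefully restarted, the redirect mutante logs at 22:43 can be spot-checked from any client; a minimal check with curl (the apache-fast-test invocation in the review above exercises the same thing against one specific backend):

    # Headers only; expect a 301 whose Location points at pa-us.wikimedia.org.
    curl -sI http://pa.us.wikimedia.org/ | grep -iE '^(HTTP|Location)'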
[22:45:27] rfaulkner: you have mail
[22:47:06] Reedy: close "nrm" ? :)
[22:47:15] jeremyb_: thanks, got it
[22:47:17] Needs to go via langcom first
[22:47:29] binasher: right, but suppose you're seeing high event_rendering values for a particular page, but when you check, it renders very quickly for you. It's useful to be able to look at the history and realize, oh, this massive template got removed, etc.
[22:47:30] Though I can't imagine it being controversial with the amount of activity
[22:47:32] quoting DannyB: "Even easier would be to close it, as it is inactive. Only bots, globalmaintenance scripts, vandalisms (and reverts of it), interwiki changes. Nocontent progress."
[22:48:04] Do we have that list of those double-domain wikis? Arbcoms, wg.en.wiki, noboard.chapters.wikimedia
[22:48:28] Shall I go ahead and mark the labs.wikimedia.org wikis as deleted?
[22:48:39] That wikitech redirect went in, right?
[22:48:50] Yup, it has
[22:48:53] any wiki to be closed should first be exported to incubator
[22:49:10] Doesn't really need to be done first
[22:49:13] yes, they redirect.. as here: https://gerrit.wikimedia.org/r/#/c/53478/3/redirects.conf
[22:49:15] If it's closed, it's still there
[22:49:23] ori-l: good point
[22:49:26] New patchset: Reedy; "Kill wikimedialabs wiki configs!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53487
[22:49:34] )(de|en|flaggedrevs|liquidthreads|readerfeedback)
[22:49:47] * Reedy grins
[22:49:49] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53487
[22:52:00] !log reedy synchronized wmf-config/InitialiseSettings.php
[22:52:06] Logged the message, Master
[22:54:11] New patchset: Reedy; "Remove pmtpa from output" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53884
[22:54:33] Reedy: can you look at the fix in https://gerrit.wikimedia.org/r/#/c/53882/ ?
[22:54:50] New patchset: Reedy; "Remove pmtpa from output" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53884
[22:56:24] Reedy: writing to ops list about killing the rest that are sub.sub of labs in DNS but not wikis
[22:59:55] Haha. noboard.chapters.wikimedia.org has a whole 720 revisions
[23:00:04] :D
[23:00:52] In 3 years
[23:00:57] They obviously really needed their own wiki
[23:02:10] well they had no vandalism or spam!
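For context on Reedy's earlier "Close pa_uswikimedia" change: in operations/mediawiki-config, closed wikis are tracked in a dblist that the configuration consults to lock them read-only. A rough sketch of that workflow, with the file name as used in that repository but the exact steps otherwise an assumption here:

    # Mark the wiki closed and keep the list sorted.
    echo 'pa_uswikimedia' >> closed.dblist
    sort -o closed.dblist closed.dblist
    # Then push the configuration out, producing a log line like
    # "reedy synchronized wmf-config/InitialiseSettings.php" above:
    sync-file wmf-config/InitialiseSettings.php 'Close pa_uswikimedia'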
[23:10:43] Reedy: would be nice to deploy that if it's fine
[23:11:27] New patchset: Reedy; "Update wgServer and wgCanonicalServer for multi subdomain wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53885
[23:16:43] New patchset: Reedy; "Remove elwikinews from wgServer and wgCanonicalServer (unneeded)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53886
[23:17:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53886
[23:24:22] New patchset: awjrichards; "Updating MobileFrontend copyright logo path on testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53888
[23:27:20] Change merged: awjrichards; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53888
[23:30:16] !log awjrichards synchronized wmf-config/InitialiseSettings.php 'Updating MF copyright logo for testwiki'
[23:30:22] Logged the message, Master
[23:32:11] New patchset: awjrichards; "Update MF custom logo path for enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53890
[23:41:15] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53892
[23:43:55] !log installing package upgrades on sockpuppet
[23:43:58] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53892
[23:44:00] Logged the message, Master
[23:44:46] Change merged: awjrichards; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53890
[23:46:46] !log awjrichards synchronized php-1.21wmf11/extensions/MobileFrontend/ 'Updating MobileFrontend per https://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments/2013-03-14'
[23:46:47] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53894
[23:46:53] Logged the message, Master
[23:47:20] !log awjrichards synchronized wmf-config/InitialiseSettings.php 'Updating MF copyright logo path for enwiki'
[23:47:26] Logged the message, Master
[23:48:18] !log DNS update - add affcom in prep. to replace chapcom (bug 39482)
[23:48:25] Logged the message, Master
[23:56:34] Started scap for both E3 and mobile (though the latter has no i18n changes)
[23:57:26] Reedy: oooh.. these renaming tickets even lead to stuff like: https://bugzilla.redhat.com/show_bug.cgi?id=677570 hah
[23:57:55] Status: CLOSED RAWHIDE
[23:57:56] o_0
[23:58:12] but also "fixed in upstream"
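The deploy commands behind the !log lines through this hour: scap is the full-cluster deploy and rebuilds localisation caches, which is why it is the tool reached for when i18n changes ship (as in the E3 push above), while sync-dir and sync-file push a single directory or file. The invocation shapes below reflect the 2013-era tooling as best understood and should be treated as approximate; the messages are placeholders:

    # Full deploy, including the l10n cache rebuild:
    scap 'E3 and mobile deployment'
    # Targeted pushes, which produce the 'synchronized ...' entries above:
    sync-dir php-1.21wmf11/extensions/MobileFrontend 'Updating MobileFrontend'
    sync-file wmf-config/InitialiseSettings.php 'Updating MF copyright logo path for enwiki'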