[00:15:04] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[00:28:37] New patchset: Ryan Lane; "Add keystone config import to nova's api-paste" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17030
[00:29:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17030
[00:29:23] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17030
[00:43:25] New patchset: Ryan Lane; "Add keystone support to glance" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17033
[00:44:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17033
[00:44:15] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17033
[00:55:07] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[01:21:03] New review: Catrope; "Can and should be merged without 16966, see comments on that change." [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/16967
[01:24:55] New patchset: Pyoungmeister; "apache refactor: first half of mark's comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17035
[01:25:33] New patchset: Pyoungmeister; "apache refactor: second half of mark's comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17036
[01:26:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17035
[01:26:10] New patchset: Pyoungmeister; "apache refactor: second half of mark's comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17036
[01:26:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17036
[01:41:53] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 245 seconds
[01:41:53] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 246 seconds
[01:48:29] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 642s
[01:53:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 20 seconds
[01:54:02] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 37s
[01:54:38] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 17 seconds
[03:23:17] RECOVERY - Puppet freshness on mw60 is OK: puppet ran at Tue Jul 31 03:22:52 UTC 2012
[04:18:43] New patchset: Ori.livneh; "*UNTESTED* Calls to bits-lb.eqiad/event.gif 204'd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16724
[04:19:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16724
[04:48:40] New patchset: Ori.livneh; "*UNTESTED* Event tracking endpoint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16724
[04:49:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16724
[06:35:54] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[07:19:51] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[07:32:53] New patchset: Ori.livneh; "(RT 3325) olivneh restricted => mortals" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17040
[07:33:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17040
[07:43:37] New patchset: Ori.livneh; "(RT 3325) olivneh restricted => mortals" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17040
[07:44:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17040
[07:44:17] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[07:45:10] New patchset: Jeremyb; "certs.pp: fix cert IDs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17041
[07:45:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17041
[07:52:12] * jeremyb wonders which ops are awake?
[07:59:46] paravoid: you awake?
[08:00:43] preilly: it is 10am ;-)
[08:01:01] jeremyb: not in SF
[08:01:11] wait, no i can't do math. 11am
[08:01:12] jeremyb: it's 1:01:06 AM
[08:01:47] Tuesday (PDT) - Time in San Francisco, CA
[08:01:55] 31 08:01:12 < preilly> jeremyb: it's 1:01:06 AM
[08:02:03] +3 for CEST
[08:02:07] err, EEST
[08:02:13] * jeremyb is obviously half asleep
[08:05:25] !g I9e1b90579fba24
[08:05:25] https://gerrit.wikimedia.org/r/#q,I9e1b90579fba24,n,z
[08:05:47] well if someone wants to sanity check or review or merge that, great, thanks ;-)
[08:50:21] still no ops...
[08:50:26] * jeremyb runs away shortly
[08:53:44] jeremyb: shouldn't you sleep now?
[10:03:45] New review: Hashar; "Ideally we would want to lookup the openssl package version and use that instead of the distribution..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/17041
[10:08:33] jeremyb: did a basic review. I have sent an email to ops about the certs being broken on Precise. Replied to it and mentioned your change.
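[Editor's note: the certs.pp fix reviewed above concerns OpenSSL's hashed-symlink lookup scheme for CA certificates, which also comes up later in the log when Faidon proposes using c_rehash instead of manual symlinks. A minimal sketch of what that scheme looks like, using a throwaway self-signed cert and a local `certs/` directory (not any production path):]

```shell
# OpenSSL finds a CA cert in a directory via a symlink named after the
# subject-name hash, e.g. "a1b2c3d4.0" -> cert file. c_rehash automates
# creating these links; here we do one by hand with a throwaway cert.
mkdir -p certs
openssl req -x509 -nodes -newkey rsa:2048 -days 1 \
    -subj "/CN=example-ca" \
    -keyout certs/ca.key -out certs/ca.pem 2>/dev/null
hash=$(openssl x509 -hash -noout -in certs/ca.pem)  # subject-name hash
ln -sf ca.pem "certs/${hash}.0"                     # what c_rehash would create
ls -l certs/
```

[With many certs, running `c_rehash certs/` creates all such links at once, which is why it is less error-prone than maintaining the symlinks in a manifest by hand.]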
[10:08:44] hopefully will attract someone :)
[10:15:45] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[10:55:57] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[12:19:35] !log Depooled cp1042 for reinstall with Precise
[12:19:43] Logged the message, Master
[12:27:54] PROBLEM - Host cp1042 is DOWN: PING CRITICAL - Packet loss = 100%
[12:28:10] hm? so all of the mobile varnishes are going to be internal?
[12:29:24] New patchset: Faidon; "certs: use c_rehash instead of manually symlinking" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17065
[12:30:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17065
[12:30:54] New review: Faidon; "Thanks Jeremy. While reviewing this I thought of taking a different approach, namely:" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/17041
[12:31:30] mark: ping?
[12:32:15] yes?
[12:32:15] I've been getting multiple mails per day for the new jobrunner stuff (I've reviewed most of the incarnations)
[12:32:36] it has the potential of breaking jobs cluster-wide
[12:32:44] oh noes
[12:32:59] so, what do you suggest? merge it? or you want to have a look first?
[12:33:03] or?
[12:33:07] i'll have a look today
[12:33:38] I think I can handle it, I'm only asking if /you/ want to double-check :)
[12:34:00] Backend host '"srv193.pmtpa.wmnet"' could not be resolved to an IP address:
[12:34:01] Temporary failure in name resolution
[12:34:01] (Sorry if that error message is gibberish.)
[12:34:01] ('input' Line 129 Pos 17)
[12:34:01] .host = "srv193.pmtpa.wmnet";
[12:34:01] ----------------####################-
[12:34:24] !?
[12:34:59] ah nm
[12:35:05] i changed the vlan of that host hehe
[12:35:10] heh
[12:35:18] so, what's the plan with the mobile cp?
[12:35:23] move them internal
[12:35:33] how will that work?
[12:35:39] how will that not work?
[12:35:50] erm? lvs dr?
[12:35:54] yes?
[12:36:13] if the frontends are internal how will they send traffic to clients?
[12:36:21] like they always do
[12:36:25] send to the router, router sends on
[12:36:54] mind you, lvs service ip is out of subnet
[12:37:02] yeah, I thought of that
[12:37:02] whether it's a public ip subnet or a private ip subnet does not matter one bit ;)
[12:37:08] right
[12:37:12] the internal name confused me
[12:37:27] it's not exactly "internal" in the sense I'm used to
[12:38:37] I just had to reinstall git
[12:38:57] and apple says "your security preferences currently forbid installing apps from unknown developers!"
[12:39:17] shame on you! trying to install applications on your operating system!
[12:42:18] New patchset: Mark Bergsma; "Move cp1042 internal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17067
[12:42:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17067
[12:43:09] so are you planning to move all cp* to internal?
[12:43:21] and all !lvs I presume?
[12:43:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17067
[12:51:12] yes
[12:51:24] if it doesn't need to talk to the internet
[13:16:16] !log compiling phpllvm tests on bast1001
[13:16:25] Logged the message, Master
[14:33:37] !log Reinstalled cp1042 with Precise
[14:33:45] Logged the message, Master
[14:33:52] !log Repooled cp1042
[14:34:00] Logged the message, Master
[15:58:52] notpeter: oh? are you in SF?
[16:00:27] paravoid: indeed. I was at defcon last week, so I thought I'd come to sf for a week
[16:00:29] !log beginning swift deploy to make thumbnail requests bypass ms5 and go straight from swift to the rendering cluster
[16:00:37] Logged the message, Master
[16:00:41] and I got here just in time to be sick! wooo
[16:01:16] New patchset: Aaron Schulz; "Set the 404 handlerUrl for the thumb zone." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17078
[16:02:41] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17078
[16:03:11] you can even get human viruses at defcon now?
[16:05:44] so it would seem... luckily, I'm pretty confident that my electronics are unscathed, sadly, there were a lot of NSA/DoD people at defcon (like, *a lot*), so this is probably a mind control virus
[16:08:04] paravoid: Around?
[16:08:10] * RoanKattouw just got up and is making breakfast
[16:09:14] hehe
[16:09:17] i'm looking at your commits now
[16:09:22] it's a bit confusing like this
[16:09:28] several gerrit changes
[16:09:44] i'm thinking, perhaps we should make modules out of them
[16:09:54] then we also don't have to worry (now) about keeping current systems intact
[16:10:47] sure, that's reasonable
[16:10:54] I can also squash them all, if that would help
[16:11:16] ideally it would become a new patchset to the existing change
[16:11:23] then I can more easily see what's changed and what has happened to the comments
[16:11:38] but so far it's looking better anyway ;)
[16:11:53] part of me would also like to switch it to a module later, so as to not hold up deploy.
[16:12:47] yes, I believe that I addressed all of your comments. I used it as a checklist
[16:13:51] mark: I'm still waiting for ack on the ssh module
[16:13:57] and possibly varnish, although that has diverged from now
[16:14:01] and wasn't tested in the first place
[16:14:15] hold up what deploy?
[16:14:34] paravoid: well we need to decide on the tabs/spaces don't we? ;)
[16:15:23] AaronSchulz: about to push https://gerrit.wikimedia.org/r/#/c/16821/
[16:15:49] mark: eqiad?
[16:16:03] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16821
[16:16:13] why would that hold up a deploy?
[16:16:43] you're drastically changing things now with existing systems running off those manifests
[16:16:48] that's risky and painful
[16:16:51] true
[16:16:58] and a trivial change to make something into modules would hold up a deployment? :)
[16:17:05] true
[16:17:14] it would allow you to use the new stuff independently for now
[16:17:25] fair enough
[16:17:37] I am a fan of that
[16:18:15] also
[16:18:26] why do you name files differently from the classes contained within them all the time?
[16:18:34] at least the autoloader will deal with that ;)
[16:19:52] usually due to trying to reconcile naming conventions and not quite getting it right. I will make sure to not do that in the future
[16:20:10] so
[16:20:18] if we make modules we can do this right from the start
[16:20:21] what modules should we have
[16:20:34] a module that maintains a functional mediawiki instance
[16:20:45] not an apache service, but just a mediawiki instance on a host
[16:20:49] yep
[16:20:55] so also used for things like cron jobs, dumps, whatever
[16:21:14] it should be as generic as we can make it with our current crap deployment system
[16:21:25] heh
[16:21:54] and... a module called applicationserver I think
[16:21:58] I hate the word 'apache', too confusing here
[16:21:59] mark: I know we do, I'm just saying that we should probably decide that soon if you want other people to start using modules :)
[16:22:03] sounds reasonable
[16:22:10] I think we need both
[16:22:21] both what?
[16:22:26] an apache module for all the apache related stuff
[16:22:28] AaronSchulz: live on ms-fe1 and 2
[16:22:34] yeah that's possible
[16:22:35] (50% deployed)
[16:22:36] packages, sites-avail/enabled, reload etc.
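[Editor's note: the module plan sketched in the discussion above (a "mediawiki" module and an "applicationserver" module) follows Puppet's standard module layout, where the autoloader maps class names to file paths. A hedged sketch of that skeleton, using the module names from the conversation purely as examples:]

```shell
# Skeleton for the two modules proposed above. Puppet's autoloader expects
# class "mediawiki" in modules/mediawiki/manifests/init.pp, and a subclass
# like "mediawiki::cron" in modules/mediawiki/manifests/cron.pp - which is
# why file names must match class names, as mark points out.
for m in mediawiki applicationserver; do
  mkdir -p "modules/$m/manifests" "modules/$m/files" "modules/$m/templates"
  touch "modules/$m/manifests/init.pp"   # would hold: class $m { ... }
done
ls modules
```

[The actual class contents are out of scope here; the point is only the directory convention that makes the modules independently usable.]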
[16:22:45] but we also have more generic apaches
[16:22:46] that's being reused in the appserver module
[16:22:49] not related to mediawiki
[16:22:55] right, the apache module should be usable by itself
[16:23:01] and I don't really want to do that now
[16:23:10] so I'd be ok with getting an apache module later
[16:23:20] the way I envision it is having multiple layers of parameterized classes ending up in a role class
[16:23:21] our apache config right now is a mess anyway, there's no chance we can integrate that into a new apache module in time
[16:23:26] yeah
[16:23:34] but we can't realistically do it all now
[16:23:46] okay, I haven't looked at that much
[16:23:56] so, I wouldn't know, I'm talking purely theoritical here
[16:24:01] theoretical even
[16:24:06] I fully agree
[16:24:14] great
[16:24:18] just being realistic here now
[16:24:33] so, in the future, our application servers (from the role classes) would use an apache module (also used for other stuff)
[16:24:43] for now, they'll probably handle the apache config themselves
[16:24:51] kk
[16:24:55] since it works very differently from how we handle apache/sites config elsewhere (which ALSO sucks, for different reasons ;)
[16:25:04] hehehe
[16:25:17] so I would be ok with it if the apache handling for app servers would exist in... the applicationserver module, for now
[16:25:21] we also have webserver::apache from what I can see
[16:25:29] that's for non-mediawiki stuff
[16:25:32] right
[16:25:52] let's not use that unless it's something simple like "install the apache packages"
[16:25:58] and it already does exactly what we need
[16:26:05] no config handling
[16:26:10] ok, a mediawiki module and an appserver module (that will currently include apache stuff)
[16:26:13] yeah, I was just saying that an apache module could replace both in the future
[16:26:14] yup
[16:26:23] and then we need to do something with all the special case mediawikis
[16:26:29] I mean, special case servers
[16:26:31] yeah...
[16:26:33] for dumps, cron jobs, etc
[16:26:39] part of that is role classes
[16:26:41] part of it may not be
[16:26:43] I'm not sure yet
[16:26:50] perhaps we should start small and simple
[16:26:53] when we use modules, we have that liberty
[16:26:59] yep :)
[16:27:03] since we don't need to worry about the existing pmtpa cluster
[16:27:14] well, hopefully all of those things will be doable with just the mediawiki module
[16:27:20] we can reuse those existing manifest bits, but only if they're good
[16:27:23] (and not much is ;)
[16:27:49] btw, something else: what I've seen some people do, and you might be interested in
[16:27:59] is having a two-level hierarchy for modules
[16:28:12] so, modulepath=/etc/puppet/modules/base;/etc/puppet/modules/site
[16:28:19] with base/ having only software-related modules
[16:28:24] e.g. apache, squid, varnish etc.
[16:28:47] Change abandoned: Jeremyb; "Thanks Faidon, that WORKSFORME" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17041
[16:28:48] and site having "site-specific" modules, appserver, dbserver, mcserver etc.
[16:29:08] I'm not a big fan
[16:29:10] but some people are
[16:29:16] so I'm just putting it on the table :)
[16:29:18] it's much like our role classes
[16:29:43] well, you can do that (as I suggest) without having a different *directory* hierarchy
[16:29:58] yeah
[16:30:05] btw, one problem we have with our role classes now
[16:30:11] is that they're used for two different things:
[16:30:22] one is to tie different manifests/modules/whatever into one system
[16:30:32] so a system has just one role class, normally
[16:30:48] * paravoid nods
[16:30:50] but now, especially for labs, it's also used to set up a common configuration of some manifest classes
[16:30:54] to fill in parameters
[16:31:05] but then multiple role classes end up on one box
[16:31:10] could you give an example of that?
[16:31:18] some of hashar's work recently does that
[16:31:27] so, some misc manifest gets parameterized
[16:31:42] and to use that tiny service on a box with other services on it
[16:31:49] a role class is created, to fill in the parameters
[16:31:58] and then that role class, amongst others (now or in the future :) is used
[16:32:23] iirc one example was fenari/bast1001
[16:32:25] let's see
[16:32:32] hrm, I'm not sure I fully understand but it doesn't sound very good
[16:32:34] nagios-wm:
[16:32:36] oops
[16:33:02] * aude lurking and learning puppet
[16:33:29] ah
[16:33:30] nfs1/2
[16:33:30] btw, the base/site thing I was telling before, explained in better and more words: http://serialized.net/2009/07/puppet-module-patterns/
[16:33:37] which are not puppetized very well of course
[16:33:38] but still:
[16:33:39] include standard,
[16:33:39] misc::nfs-server::home,
[16:33:39] misc::nfs-server::home::backup,
[16:33:39] misc::nfs-server::home::rsyncd,
[16:33:39] misc::syslog-server,
[16:33:40] ldap::server::wmf-cluster,
[16:33:40] ldap::client::wmf-cluster,
[16:33:41] backup::client
[16:33:41] # don't need udp2log monitoring on nfs hosts
[16:33:42] class { "role::logging::mediawiki": monitor => false }
[16:34:09] okay
[16:34:10] that's not a role class which completely defines one system
[16:34:16] why not?
[16:34:16] and of course, that's hard in this case
[16:34:20] New review: Catrope; "Yes. The l10nupdate script works in its current form but this is much nicer (and makes new extension..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/6905
[16:34:23] it's a production bastion host
[16:34:30] no it's nfs1/nfs2
[16:34:42] oh, right, I was confused
[16:34:53] oh you mean it's NFS + syslog + LDAP?
[16:34:57] yeah
[16:35:02] role classes are easy for clustered systems
[16:35:07] not for misc machines with multiple services on them
[16:35:12] aha
[16:35:17] I think that's fine
[16:35:24] because that system essentially fulfills multiple roles
[16:35:25] it might conflict some day
[16:35:28] yeah I agree
[16:35:33] and we could (and are planning to) split it up
[16:35:34] but many role classes also include "standard" for example
[16:35:38] which sets up their base stuff
[16:35:43] so?
[16:35:48] that won't conflict
[16:35:54] you can include a class as many times as you want
[16:35:58] i know
[16:36:05] but we also need to make that parameterized really
[16:36:28] to allow for different resolvers, or a different syslog server, or stuff like that
[16:36:30] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[16:36:31] and then it gets difficult
[16:36:36] i'm not saying this is a problem -yet-
[16:36:51] I'm just signaling that this is slightly different usage of role classes that could become problematic
[16:37:04] hm
[16:37:18] I need a whiteboard dammit
[16:37:23] hehe yeah
[16:37:26] use your windows
[16:37:28] whiteboard paint?
[16:37:31] if you have whiteboard markers that works
[16:37:52] jeremyb: horribly expensive and dirty overall
[16:37:59] (been there done that)
[16:38:06] i think you can expense a whiteboard if you want one ;)
[16:38:25] hahaha
[16:38:43] hah
[16:39:03] paravoid: otoh, there's darst's whiteboard ;)
[16:39:33] which one?
[16:39:43] paravoid: whiteboard.debian.net
[16:39:48] ah
[16:57:57] New patchset: Bhartshorne; "changing swift pmtpa-prod cluster to use the image scalers directly instead of ms5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17082
[16:58:00] AaronSchulz: ^^^
[16:58:11] \o/
[16:58:24] kill kill kill
[16:58:35] i'll be even happier when you'll kill the solaris boxes
[16:58:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17082
[16:58:56] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17082
[16:59:41] * AaronSchulz yays
[17:05:17] ok, live on both 1 and 2
[17:06:50] seems ok
[17:07:17] running puppet on 3 and 4
[17:08:32] restarting the swift proxy on 3 and 4
[17:08:38] paravoid: did you see you review?
[17:08:44] your review*
[17:14:44] aude: http://www.netways.de/puppetcamp/puppetcamp2012/program/
[17:16:53] AaronSchulz: the swift side of things is done.
[17:17:01] \o/
[17:18:48] PROBLEM - SSH on lvs6 is CRITICAL: Server answer:
[17:19:24] mark: ^ lvs ?
[17:19:57] notpeter: search32...dell is sending another mainboard. I will get it when i get back
[17:20:09] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[17:20:27] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[17:21:21] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[17:22:38] New review: Mark Bergsma; "This does not really belong here. This looks like a "role class" which uses a jobrunner with a speci..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16654
[17:22:49] cmjohnson1: cool!
[17:22:51] thank you
[17:23:13] that will be the 2nd mainboard and countless DIMM...after this..it's going back
[17:23:18] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[17:23:45] it's cursed
[17:23:49] cmjohnson1: i have found a few times that a bad power supply can sometimes keep killing those off
[17:23:55] if it's not delivering the proper voltages
[17:24:06] LeslieCarr: seen lvs6 above?
[17:24:53] lesliecarr: good thought..i will check the power
[17:25:18] sigh
[17:25:21] checking out lvs6
[17:25:29] y u break it ?
[17:26:04] LeslieCarr: it recovered (also above) but i thought maybe it was worth a look anyway
[17:26:09] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[17:26:38] hrm, lvs6 looks happy….
[17:26:42] checking out spence
[17:28:44] heh, someone should have fixed up icinga by now… i wonder who that was… ;)
[17:29:01] * jeremyb blames neon
[17:29:10] yeah, bad neon!
[17:29:15] !log finished swift deploy - image scaling requests now go straight to the rendering cluster
[17:29:23] Logged the message, Master
[17:29:36] cmjohnson1: also, sometimes a lack of being hit with hammers can negatively affect servers. maybe take a look at that
[17:29:56] mark: I was wrong yesterday when I said that ms5 is now completely out of the loop - mediawiki still checks it for images, just not swift.
[17:29:59] notpeter: is there a study?
[17:30:18] maplebed: ok
[17:30:22] maplebed: when is that going to change?
[17:30:38] notpeter: that is a good idea..i did notice that search32 was getting comfy next to dataset1...maybe they worked something out
[17:30:42] mark: when we switch originals, currently planned for next week.
[17:30:51] cool
[17:30:53] jeremyb: http://www.allthingsdistributed.com/images/hammer.JPG
[17:30:55] there is the study
[17:31:04] cmjohnson1: hahaha
[17:31:42] maplebed: or maybe the week after, next week might be a bit close
[17:31:52] notpeter: i don't think it's a reliable source
[17:31:54] :P
[17:32:32] jeremyb: it's on the internet. wikipedia is on the internet. wikipedia is a reliable source. therefore, that "study" is a reliable source
[17:32:44] uhuh
[17:32:54] man, maybe I should be a lawyer. as I understand it, that's the level of understanding of logic that is required...
[17:34:11] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654
[17:34:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654
[17:35:22] New review: J; "thanks for the feedback, moved the class to role::jobrunner::videoscaler" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16654
[17:38:54] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time
[17:45:14] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[17:45:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:49] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484
[17:46:12] New review: Demon; "In PS10: I removed the conditional inclusion of gerrit::account. Really, any server using gerrit::je..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/13484
[17:48:03] ^demon: you know you failed lint?
[17:48:36] <^demon> That was already his hashar's fault.
[17:50:11] RECOVERY - Host cp1042 is UP: PING OK - Packet loss = 0%, RTA = 35.37 ms
[17:50:29] PROBLEM - Varnish traffic logger on cp1042 is CRITICAL: Connection refused by host
[17:50:41] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[17:50:47] PROBLEM - Varnish HTCP daemon on cp1042 is CRITICAL: Connection refused by host
[17:51:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484
[17:51:35] <^demon> jeremyb: Fixed his mistake :)
[17:51:45] i see ;)
[17:53:11] RECOVERY - Varnish traffic logger on cp1042 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:53:29] RECOVERY - Varnish HTCP daemon on cp1042 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[17:54:32] j^: ma_rk is having me move what I'm doing into modules, so what I'm doing will now be independent.
[17:57:10] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16967
[17:57:29] what's the latest on whitespace?
[17:57:35] j^: yep
[18:07:02] jeremyb: it's still white
[18:07:05] More news in an hour
[18:07:24] uhuh
[18:07:43] but tabs and spaces are *both* white!
[18:34:36] PROBLEM - SSH on ms-be1006 is CRITICAL: Connection refused
[18:41:48] RECOVERY - SSH on ms-be1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[18:46:50] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990
[18:47:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990
[18:50:12] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100%
[18:54:17] New review: awjrichards; "Just some trailing whitespace in latest patchset - also were you going to move the update stuff to a..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16990
[18:55:45] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 35.38 ms
[18:56:39] PROBLEM - SSH on ms-be1009 is CRITICAL: Connection refused
[18:58:09] RECOVERY - swift-container-server on ms-be1006 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[18:58:09] RECOVERY - swift-object-server on ms-be1006 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:58:18] RECOVERY - swift-object-updater on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[18:58:27] RECOVERY - swift-account-server on ms-be1006 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[18:58:36] RECOVERY - swift-container-updater on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[18:58:45] RECOVERY - swift-object-auditor on ms-be1006 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[18:58:45] RECOVERY - swift-container-auditor on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:58:54] RECOVERY - swift-account-auditor on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:59:03] RECOVERY - swift-object-replicator on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:59:03] RECOVERY - swift-container-replicator on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[18:59:12] PROBLEM - SSH on ms-be1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:59:22] RECOVERY - swift-account-reaper on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:00:24] RECOVERY - SSH on ms-be1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[19:02:39] RECOVERY - SSH on ms-be1009 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[19:07:12] New patchset: Bhartshorne; "changing ganglia's idea of the swift eqiad cluster from the test cluster to the to-be-prod cluster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17100
[19:07:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17100
[19:09:26] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17100
[19:13:54] RECOVERY - NTP on ms-be1006 is OK: NTP OK: Offset -0.0161318779 secs
[19:21:24] RECOVERY - Host calcium is UP: PING OK - Packet loss = 0%, RTA = 35.47 ms
[19:26:37] * jeremyb pokes binasher
[19:26:37] !log authdns-update for wtp1 info
[19:26:45] Logged the message, RobH
[19:27:42] hey jeremyb
[19:28:01] binasher: < jeremyb> binasher: hey, slight bump on the OTRS RT i sent
[19:28:24] also, hi!
[19:28:58] hrmmmm, OTRS needs a logo
[19:30:12] New review: Hashar; "Roan> since you are in SF, could you poke an ops IRL to have that 3 months old change to be merged i..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/6905
[19:32:01] jeremyb: closed it :) hi
[19:34:19] binasher: oh, cool. should i have gotten an email?
[19:36:59] jeremyb: yeah, from RT
[19:39:04] binasher: just now? or when?
[19:39:28] you opened the ticket right?
[19:39:30] i'm seeing nothing. just had a pass at the spambox too
[19:39:32] yes
[19:40:02] opened: Date: Mon, 23 Jul 2012 16:12:51 -0400
[19:40:03] right?
[19:40:16] well, i resolved the ticket which should trigger mail. mysterious are the ways of rt.
[19:40:31] okey, well thanks!
[19:40:53] its definitely sending some mail.. weird
[19:41:22] so, we're good to do a --single-transaction dump on a slave?
[19:42:29] PROBLEM - NTP on calcium is CRITICAL: NTP CRITICAL: No response from NTP server
[19:43:07] yep, that could be done against just the otrs db on db49. is jeff green going to take care of that part?
[19:43:49] i suppose. i have no access of course
[19:44:02] i'll find him
[19:44:26] 49 is intermediate master?
[19:44:31] * jeremyb wants this in dbtree ;)
[19:44:32] <^demon> binasher: Ah, I was looking for you earlier. I was wondering--is there any way to get db1048 on ishmael?
[19:46:49] ^demon: oh, yeah.. i forgot i'm only doing that on pmtpa dbs. i'll get that done today
[19:47:18] <^demon> Ok, thanks :D
[19:49:50] PROBLEM - Host stat1 is DOWN: CRITICAL - Host Unreachable (208.80.152.146)
[19:49:59] ottomata: ^
[19:50:58] whaaaaaaaaaaaa
[19:51:15] well pooper scoopers
[19:51:17] iunno!
[19:51:39] LeslieCarr, if you got a sec, could you peek at stat1.wikimedia.org
[19:51:39] ?
[19:52:05] it is down
[19:54:52] Ryan_Lane: binasher woosters? ^^^
[19:57:39] RobH: can you raise someone maybe? ^
[20:00:45] i'm taking a look
[20:03:58] sorry, was afk in datacenter
[20:04:05] RECOVERY - Host stat1 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[20:04:51] RobH: that's kinda expected i think ;)
[20:05:03] stats1 is back
[20:05:15] it went oom
[20:05:25] stat1*
[20:05:35] that
[20:05:36] hmmmm, wonder what was happening there
[20:05:39] ottomata: Jul 14 15:46:46 stat1 kernel: [2589787.359676] [ 7742] 602 7742 13459855 7815341 3 0 0 python
[20:06:00] nice, i betcha drdee was running stuff for metrics meeting
[20:06:03] that was the process with the largest rss
[20:06:38] the oom killer killed a perl process before that too
[20:06:38] not guilty
[20:06:40] afaik
[20:06:48] erik z maybe?
[20:06:52] perl's are probably erik z
[20:06:57] aye
[20:07:04] python's could be me or someone else
[20:07:11] i wasn't running anything right now
[20:07:14] ok cool
[20:07:19] thanks binasher_
[20:16:59] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[20:19:57] Ryan_Lane: paravoid: can we do something about 17041 / 17065 ?
[20:20:03] !g 17065
[20:20:03] https://gerrit.wikimedia.org/r/#q,17065,n,z
[20:27:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:29:54] New patchset: Aaron Schulz; "Added mwEmbed and TMH to extension-list." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17112
[20:32:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.568 seconds
[20:42:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17112
[20:57:03] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[20:57:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17065
[21:01:18] New patchset: Ryan Lane; "Follow up to change 17065, adding the rapidssl ca source back in" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17115
[21:01:59] New review: gerrit2; "Lint check passed."
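The kernel line binasher pasted for the stat1 OOM is a row from the OOM killer's process table; the large number before "python" is the process's rss in 4 KiB pages. A quick way to dig into an event like this (log paths are standard Ubuntu locations and an assumption here, and the 7815341 figure is taken from the paste above):

```shell
# Look for recent OOM-killer activity; "|| true" guards against
# no-match exit codes and unreadable logs on other systems.
dmesg | grep -iE 'out of memory|oom-killer|killed process' || true
grep -i 'oom' /var/log/kern.log 2>/dev/null | tail || true

# rss in the OOM table is in 4 KiB pages, so the python process with
# rss = 7815341 pages was using roughly this many GiB:
echo $(( 7815341 * 4 / 1024 / 1024 ))   # prints 29
```

About 29 GiB resident for one python process would comfortably explain stat1 going OOM.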
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17115
[21:02:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17115
[21:06:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:15:45] New review: Jeremyb; "followup in Ie2746894b9d525 to address my review" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17065
[21:17:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.167 seconds
[21:18:03] PROBLEM - Puppet freshness on calcium is CRITICAL: Puppet has not run in the last 10 hours
[21:23:06] preilly: https://wikitech.wikimedia.org/view/How_to_do_a_schema_change
[21:23:27] for creating new tables for a new extension, any deployer can do that
[21:23:35] see the sql.php section
[21:24:55] preilly: http://bit.ly/R0Lrt5
[21:37:42] <^demon> binasher: Looking at db1048 in ganglia, it looks like it's showing almost zero load. Would you have any objections to me raising the query limit in gerrit? It wouldn't apply to normal usage, just people who are doing stats queries and the like.
[21:38:08] ^demon: go for it
[21:38:19] <^demon> Cool beans. We'll keep an eye on it just in case :)
[21:38:30] ^demon: can we publish that repo?
[21:42:51] New patchset: Aaron Schulz; "Enabled TMH for testwikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17128
[21:43:21] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17128
[21:44:44] * Nemo_bis hugs ^demon
[21:45:59] * jeremyb steals Nemo_bis's underscore
[21:46:03] * jeremyb runs away, bbl
[21:47:02] grrrrrrrrrrrrrrr
[21:48:00] btw gerrit doesn't like underscore either
[21:51:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:00:31] New patchset: Aaron Schulz; "Disable ogg handler when TMH is enabled."
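The sql.php step binasher points preilly at is MediaWiki's maintenance script for sourcing an SQL file through the wiki's own database connection (so it picks up the right server, database, and table prefix). A sketch of the kind of invocation meant, assuming the usual mwscript wrapper; the wiki and file names are hypothetical, and the wikitech page linked above is the authoritative procedure:

```shell
# Hypothetical example: create a new extension's tables on testwiki.
# mwscript is the cluster wrapper around MediaWiki maintenance scripts;
# "extensions/MyExtension/tables.sql" is a made-up schema file.
mwscript sql.php --wiki=testwiki extensions/MyExtension/tables.sql
```
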
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17133
[22:01:07] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17133
[22:04:04] !log added memcached_1.4.14-0wmf1_amd64 to precise-wikimedia
[22:04:12] Logged the message, Master
[22:04:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds
[22:09:12] preilly: have you all started yet? AaronSchulz isn't done with the TMH deployment yet
[22:09:49] robla: I can wait
[22:10:45] thanks. trying to make sure that the problem we're seeing is something related to TMH
[22:10:58] test.wikipedia.org is pretty broken right now
[22:17:18] New patchset: Asher; "return an instant http 204 for http://bits.wikimedia.org/event.gif requests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17138
[22:17:21] preilly: can you stash or remove your live wmf-config changes?
[22:17:29] I want to pull a fix to extension-list
[22:17:29] !log authdns-update for ms-be1006 and ms-be1012
[22:17:37] Logged the message, RobH
[22:17:40] AaronSchulz: I just disabled it on testwiki
[22:17:51] AaronSchulz: you can pull over it
[22:18:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17138
[22:18:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17138
[22:21:00] New patchset: Aaron Schulz; "Fixed extension-list entries." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17139
[22:21:14] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17139
[22:33:28] New patchset: Asher; "http 204 responses should not contain a body" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17140
[22:34:08] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17140
[22:36:02] New patchset: Asher; "http 204 responses should not contain a body" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17140
[22:36:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:36:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17140
[22:39:22] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17140
[22:49:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[23:04:33] RECOVERY - Puppet freshness on db63 is OK: puppet ran at Tue Jul 31 23:04:25 UTC 2012
[23:04:47] srv281 being out of disk space is known and is no big deal?
[23:05:10] robla: I think that's true.
[23:06:03] RECOVERY - MySQL disk space on db63 is OK: DISK OK
[23:06:08] okee doke...I guess I won't worry about it then
[23:06:32] robla: the only place it exists in the pybal configs is commented out of the rendering cluster.
[23:06:53] some bits of scap fail because of it being out of space
[23:07:02] but I don't know which bits those are
[23:07:08] I mean, it'd be great to be fixed and all.
[23:07:24] but I don't think it's actively harming any user-facing processes (only dev-facing processes, eg scap).
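Asher's pair of changes, 17138 (return an instant 204 for event.gif) and 17140 (strip the body), follow RFC 2616, which says a 204 response must not include a message body. The effect can be spot-checked from any client; the hostname comes from the change description, and present-day behaviour may of course differ from this 2012 deployment:

```shell
# Fetch event.gif and show the response headers; -s silences the
# progress meter, -i includes headers in the output.
curl -si http://bits.wikimedia.org/event.gif | sed -n '1,10p'
# Once 17138/17140 are deployed, the status line should read
# "HTTP/1.1 204 No Content"; any bytes after the blank header/body
# separator would be exactly the bug change 17140 fixes.
```
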
[23:07:34] got it...ok
[23:12:19] New patchset: Asher; "adding db63 to s1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17144
[23:12:57] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17144
[23:13:09] Change abandoned: Ori.livneh; "Committed by Asher in I190eb906 and I2690406f" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16724
[23:20:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:22:05] New patchset: Bhartshorne; "adding ms-be1007 to dhcp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17146
[23:22:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17146
[23:23:50] j^: proximally, the squids. originally, maybe the image scalers?
[23:24:00] As the headers should say..
[23:24:03] RECOVERY - NTP on db63 is OK: NTP OK: Offset -0.01002800465 secs
[23:24:08] I don't actually remember how swift chooses mimetypes.
[23:24:13] X-Cache: MISS from sq44.wikimedia.org
[23:24:13] X-Cache-Lookup: MISS from sq44.wikimedia.org:3128
[23:24:15] yada yada
[23:25:12] oh wait, that's not a thumbnail.
[23:25:13] nevermind.
[23:25:30] upload squids/apaches
[23:26:16] oh, no, won't be apaches, it'll be ms7?
[23:31:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.457 seconds
[23:38:24] New patchset: J; "Add webm mimetype to apache configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17149
[23:39:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17149
[23:39:24] New review: Spage; "Hi, I'm new to E3. I noticed this rule matches eventXgif as well as event.gif, so maybe prepend a ba..."
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/17138
[23:47:04] Ryan_Lane: our scap scripts are at /trunk/debs/wikimedia-task-appserver right? (not the ones called from fenari, which are in puppet)
[23:48:20] I don't know
[23:48:34] Ryan_Lane: you're in good company :)
[23:48:39] I *think* that's it
[23:49:28] TimStarling would know :)
[23:53:00] yes, wikimedia-task-appserver
[23:53:32] which is still in subversion unlike all the debs that have been updated recently
[23:54:02] they can probably be moved out to puppet now that we have puppet
[23:55:32] TimStarling: I made 2 commits to that svn dir now
[23:55:52] I want the texvc stuff to not automatically be triggered
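Spage's review on change 17138 flags an unescaped dot: in a regex, `.` matches any single character, so a rule written as event.gif also matches eventXgif. The actual rule from the change isn't reproduced here, but grep illustrates the difference:

```shell
# Unescaped dot: "." matches any character in that position.
echo "eventXgif" | grep -c 'event.gif'            # prints 1 (unwanted match)
# Escaped dot: matches only a literal "." (|| true guards grep's
# non-zero exit status when nothing matches).
echo "eventXgif" | grep -c 'event\.gif' || true   # prints 0
echo "event.gif" | grep -c 'event\.gif'           # prints 1
```

Anchoring (prepending a boundary or `^` as the review suggests) plus escaping the dot keeps the rule from firing on lookalike URLs.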