[08:10:11] New patchset: Hashar; "update several live hacks:" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/12841
[08:11:12] New review: Hashar; "This change submit to git several changes which have only be available in the local subversion repos..." [operations/apache-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12841
[08:12:40] New review: Dzahn; "ack, these rules are either live already or have been tested on srv193 by us before" [operations/apache-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12841
[08:12:42] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/12841
[08:43:21] New patchset: Hashar; "unduplicate sync-apache-simulated and sync-apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12842
[08:43:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12842
[09:15:26] New review: Dzahn; "yea, the sync-apache-simulated was a temp. thing to test with rsync -n and not even meant to be here..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12842
[09:15:29] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12842
[09:20:52] !log creating dsh group "testwikipedia" with just srv193, creating sync-apache-test to just sync there...testing sync
[09:20:59] Logged the message, Master
[09:23:14] !log looking good. running sync-apache
[09:23:19] Logged the message, Master
[09:30:46] !log apache-graceful-all to push out needed redirects for education
[09:30:52] Logged the message, Master
[09:38:37] !log so the several redirects for education->outreach requested to work by today look good now. RT-3138
[09:38:43] Logged the message, Master
[10:34:56] Ryan_Lane: when do you want to do the labs network node stuff?
[10:35:23] we can probably do it now, if you'd like
[10:35:41] ok
[10:35:47] i also need to do reviews today ;)
[10:35:51] * Ryan_Lane nods
[10:36:00] labs stuff sounds better!
[10:36:19] hahaha
[10:36:24] not often I hear that
[10:36:32] especially not from me
[10:37:02] I'm trying to set up a swift backend for a test wiki and am failing miserably
[10:37:14] there's absolutely no documentation on this
[10:37:21] it's really annoying
[10:37:24] anyway....
[10:37:28] let's see
[10:37:32] we need to give each node an IP
[10:37:52] I think the routing rules are probably already partially set up
[10:38:21] on what?
[10:38:44] I think we're sending traffic to the public IPs to all compute nodes already
[10:38:57] unless that changed to only be virt2
[10:39:17] i'm pretty sure we don't
[10:39:22] route 208.80.155.0/24 {
[10:39:22] next-hop 10.4.16.3;
[10:39:22] readvertise;
[10:39:22] no-resolve;
[10:39:22] }
[10:39:31] seems we don't then :)
[10:39:51] 155?
[10:40:10] look for 153
[10:40:11] not 155
[10:40:19] 155 is what leslie set up
[10:40:20] oh right
[10:40:29] /* Route virt test server public IPs to the virt gw */
[10:40:29] route 208.80.153.192/26 {
[10:40:29] next-hop 10.4.16.3;
[10:40:29] readvertise;
[10:40:29] no-resolve;
[10:40:30] }
[10:40:34] that comment is a bit misleading
[10:40:34] heh
[10:40:47] btw
[10:40:51] we're out of IPs on 153
[10:41:06] I need to get a reverse proxy going soon
[10:41:08] just continue with 153.257
[10:41:17] 257?
[10:41:24] who knows, perhaps it'll work
[10:41:27] :D
[10:41:36] so which nexthops should I add?
[10:41:47] sec
[10:42:11] 2-9
[10:42:17] 9?!
[10:42:20] oh
[10:42:22] ciscos
[10:42:24] yes
[10:42:34] 8 systems total
[10:42:41] not sure our routers support that many next hops ;)
[10:42:43] we'll see
[10:43:28] are you ready?
[10:43:29] hm.
I'm pretty sure we can do multi-node network node without having them on all nodes
[10:43:31] it may break everything ;)
[10:43:35] heh
[10:43:35] yeah
[10:43:45] route 208.80.153.192/26 {
[10:43:45] next-hop [ 10.4.16.2 10.4.16.3 10.4.16.4 10.4.16.5 10.4.16.6 10.4.16.7 10.4.16.8 10.4.16.9 ];
[10:43:45] readvertise;
[10:43:45] no-resolve;
[10:43:46] }
[10:44:06] ok
[10:44:07] there we go
[10:44:15] still have network
[10:44:19] still?
[10:44:24] no
[10:44:25] not now
[10:44:26] haha
[10:44:29] heh
[10:44:30] rollback?
[10:44:33] yes, please
[10:44:45] done
[10:44:48] back
[10:45:29] let's test with just two boxes first ;)
[10:45:37] sure
[10:45:41] .2 and .3?
[10:45:49] yep
[10:45:54] we need to give it an IP
[10:46:04] you didn't do that yet?
[10:46:28] no. I need to figure out which one we gave to virt2 as well
[10:46:37] what do you mean here?
[10:46:37] users report some DNS issues on -wikitech
[10:46:38] what kind of ip?
[10:46:41] public
[10:47:07] 208.80.153.192
[10:47:14] * Ryan_Lane nods
[10:47:18] that's for outgoing dynamic NAT
[10:47:21] yeah
[10:47:44] each of the servers will need one
[10:47:48] yep
[10:47:52] I'm going to have to deallocate some public IPs
[10:47:58] err. floating ips
[10:48:34] thankfully I saved IPs for just this purpose
[10:48:43] damn it
[10:48:49] I used one in the sequence for bastion
[10:48:54] hehe
[10:49:04] it would be handy if they were all in one range yeah
[10:49:06] for acls
[10:49:22] since that's where labs traffic is coming from, from the pov of the internet
[10:49:31] yeah. would be nice
[10:49:35] so that would be 208.80.153.192/29 then
[10:49:36] going to be hard to change bastion's IP
[10:50:06] hm. maybe not
[10:50:19] it won't throw key errors
[10:50:25] it won't?
[10:50:26] it'll just be unavailable temporarily
[10:50:35] right
[10:50:37] new IPs are fine.
reusing old IPs is a problem
[10:50:38] new ips will just be added
[10:50:39] yeah
[10:50:49] and no one's gonna ssh to the old ip
[10:50:53] yep
[10:51:05] lemme do that really quick
[10:51:19] this is going to be a pain in the ass
[10:52:29] hm via ldap it is
[10:56:27] 1H ttl
[10:56:44] damn
[10:57:06] I'll have to keep the old one allocated at least that long for it
[10:57:11] yes
[10:59:05] stupid nova doesn't tell me which project has some of these ips allocated
[10:59:12] dear mysql
[11:01:28] ah. juju
[11:01:37] is that still active?
[11:01:44] nope
[11:01:47] taking that ip back
[11:02:46] we can use 208.80.153.193 right now
[11:02:49] I already took it back
[11:03:01] I need to remove it from the floating list, so no one else can use it, though
[11:04:10] lemme start fixing bastion now
[11:10:48] well… I caused an nxdomain temporarily
[11:10:53] hopefully no one was doing a lookup
[11:14:19] * Ryan_Lane groans
[11:14:37] of course, the old IP was used for something. it was in testlabs, so no one should notice, but I still need to send an email
[11:14:55] it's in my file, but shouldn't be in anyone else's
[11:17:55] ok....
[11:20:15] I'm really glad I stashed a range of IPs
[11:23:30] New patchset: Ryan Lane; "Moving ip address addition to the nodes, rather than the network service" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12848
[11:23:35] this openstack puppet config really badly needs to be fixed
[11:23:48] maybe after we do this, and I upgrade gluster, I'll rewrite all of this
[11:24:02] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12848
[11:24:33] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12848
[11:24:35] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12848
[11:24:39] New patchset: Matthias Mullie; "Bug 37616 - Article Feedback - Increase Test Sample to 10% of English Wikipedia (using article ID code)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12849
[11:28:01] New patchset: Ryan Lane; "Another temporary fix for network nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12850
[11:28:05] oh yeah, this is all fucked up
[11:28:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12850
[11:28:53] * mark is writing reviews in the mean time
[11:29:13] * Ryan_Lane nods
[11:29:51] New patchset: Ryan Lane; "Another temporary fix for network nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12850
[11:29:53] should be done soon
[11:30:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12850
[11:30:29] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12850
[11:30:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12850
[11:36:55] Stderr: "iptables-restore v1.4.4: host/network `208.80.153.0/22,10.0.0.0' not found\n
[11:36:57] hm
[11:38:23] oh
[11:38:24] right
[11:38:33] nova-network is patched
[11:38:35] let me document that
[11:39:52] actually, let me puppetize that
[11:41:28] that indeed solved that
[11:41:50] * mark thinks there's a cheese cake in the making here
[11:42:00] oh? your gf making it?
[11:42:04] yeah
[11:42:11] getting hungry now
[11:44:23] how long are you still in berlin for?
[11:44:30] till friday
[11:44:46] enjoying it?
[11:44:52] yep. great place
[11:45:02] heya
[11:45:07] howdy
[11:45:09] hi
[11:45:15] we're doing network node per compute node
[11:45:28] great
[11:45:56] I'm adding my patched code into puppet
[11:46:03] I need to do so for the compute code too
[11:46:17] *patched nova-network code
[11:46:54] the router's dropping packets from virt6-8->stafford
[11:46:59] or the other way around
[11:47:29] New patchset: Ryan Lane; "Adding in nova-network patch" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12851
[11:47:39] I really need to upstream this nova-network change
[11:48:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12851
[11:48:04] (I reinstalled all three of them during the weekend)
[11:48:58] New patchset: Ryan Lane; "Adding in nova-network patch" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12851
[11:49:01] cool
[11:49:07] could it be the access list?
[11:49:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12851
[11:49:37] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12851
[11:49:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12851
[11:50:28] I really want to be done with virt6-8 :)
[11:51:19] mark: access list?
[11:51:31] New patchset: Ryan Lane; "Fix typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12852
[11:51:41] paravoid: ssh into them from one of the virt nodes
[11:51:58] we occasionally need to copy new_install off to other machines temporarily
[11:52:03] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12852
[11:52:07] puppet access works
[11:52:15] just not ping or ssh
[11:52:32] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12852
[11:52:34] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12852
[11:52:51] paravoid: what are their ips?
[11:52:56] virt6.pmtpa.wmnet doesn't resolve
[11:53:09] oh nm
[11:53:13] eh?
[11:53:16] getting lunch. back in a little bit
[11:53:19] was on my own colo
[11:53:28] Ryan_Lane: I've ssh'ed fine
[11:53:36] also run the first puppet run with sockpuppet, that worked
[11:53:47] ahhh ok
[11:53:53] and puppetca -s
[11:54:02] term sockpuppet {
[11:54:02] from {
[11:54:02] source-address {
[11:54:02] 10.4.16.0/24;
[11:54:02] }
[11:54:03] destination-address {
[11:54:03] 10.0.0.245/32;
[11:54:04] 10.0.0.24/32;
[11:54:04] }
[11:54:05] protocol tcp;
[11:54:05] destination-port 8140;
[11:54:06] }
[11:54:07] then accept;
[11:54:45] what about on the other way around?
[11:54:54] there's no filter on that
[11:55:03] except usual uRPF
[11:55:15] you're right about the drop vs reject though
[11:55:17] because it does this:
[11:55:46] term default {
[11:55:46] filter deny-private-subnets-in4;
[11:55:46] }
[11:55:57] and that filter does a DROP because it's mostly for external traffic coming in
[11:56:02] where we don't want to do reject per se
[11:56:05] hmm let's see
[11:56:31] hmm, now it works
[11:56:38] i didn't change it
[11:56:40] so, yesterday, I was running puppet off of these 3 nodes
[11:56:52] and it timed out on connect
[11:57:16] handshake worked but then it couldn't pass traffic; even tried with openssl s_client
[11:57:19] that's weird
[11:57:22] mtu issue?
[11:57:51] thought about it but why? it's mtu 1500 as everything else on our network, right?
[11:58:06] yeah
[11:58:14] perhaps some network node issue
[11:58:22] on the networking side it's just another vlan
[11:59:03] Protocol inet, MTU: 9170
[11:59:14] it's jumbo like any other, but no servers are configured for > 1500 mtu
[11:59:57] anyway there's no filtering that should affect virt6-8 specifically
[12:00:06] okay
[12:00:08] sorry for the trouble
[12:00:15] don't know what happened there
[12:00:19] and I just assumed filtering :)
[12:00:20] it is strange
[12:01:53] puppet runs now on all three of them, I'll just be happy about that
[12:02:02] okay
[12:02:05] btw, since you're here, I got pages on Saturday
[12:02:13] so did I
[12:02:14] and from reading irc backlog, it was the same issue again
[12:02:17] apergos handled it
[12:02:20] that's the third time
[12:02:23] yeah
[12:02:32] we should find a more permanent solution…
[12:02:38] yes
[12:02:41] i wonder why it started happening
[12:02:49] that would make people happy on the weekends
[12:03:27] well, one thing is why boxes keep getting killed by load
[12:03:46] but the other architectural issue is how a malfunctioning jobrunner can get the whole cluster down
[12:04:26] more like, why does a single malfunctioning memcached kill the cluster
[12:04:39] but, I haven't looked at it at all yet
[12:04:45] I saw there were like 3 of you doing a good job ;-)
[12:05:19] well, it's just incident response so far
[12:05:31] so, I was thinking that maybe we should split apache/jobrunners from memcache
[12:05:40] that's happening soon anyway
[12:05:55] how soon?
[12:05:59] asher has the boxes
[12:06:28] yeah getting the memcaches onto their own box will be a huge win
[12:06:32] boxes
[12:18:40] I wonder if setting memory pressure per process makes sense for memcached
[12:19:15] hmm, or is that just for ooms
[12:24:52] New patchset: Hashar; "(bug 37699) Change logo on uz.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11977
[12:25:37] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11977
[12:25:39] mutante: are the puppet configs still in SVN?
[12:25:39] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11977
[12:26:40] New review: Hashar; "deployed live" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11977
[12:27:21] rebooting
[12:28:00] Platonides: http://lists.wikimedia.org/pipermail/mediawiki-distributors/2012-June/000018.html
[12:41:21] mark: ok. back
[12:41:28] mark: and I have network nodes on virt1/2
[12:41:32] ok
[12:41:37] so you wanna have those two active now
[12:41:48] yes, please
[12:42:08] done
[12:42:17] still have network
[12:42:38] of course, I doubt anything is routing through virt1
[12:44:11] oh gosh
[12:44:14] i'm ordering cables in this web shop
[12:44:14] ?
[12:44:20] and they have a quick link to your shopping cart
[12:44:23] "to show others"
[12:44:27] so I test that link
[12:44:29] and the way it works
[12:44:30] hm. planet.wmflabs.org is timing out
[12:44:34] they encode all articles in the url
[12:44:38] so I loaded that link
[12:44:46] and it doubled my shopping cart effectively
[12:44:52] because it just added everything again in my shopping cart
[12:45:10] heh
[12:45:11] nice
[12:46:51] mark: can you revert the change for now?
[12:46:56] ok
[12:47:12] done
[12:47:27] I want to make sure it isn't the change that's affecting this public IP
[12:47:34] well, still can't ping it.
[12:49:21] well, the scheduler already reassigned the addresses
[12:49:26] they're on the network node
[12:49:29] the new one
[12:50:50] mark: ok, can you re-add it?
[12:50:55] it isn't working either way
[12:50:56] ok
[12:51:09] but, now that it switched the instances to virt1...
[12:51:10] done
[12:51:21] it actually moved instances?
[12:51:28] only the NAT rules
[12:51:39] because those instances are on virt1
[12:51:43] ah yeah
[12:51:51] but they need to be on all
[12:51:52] I can't ping it, though
[12:51:53] not just virt1
[12:52:48] hm
[12:52:49] lemme add on the other router as well
[12:53:25] done
[12:53:50] keep in mind that we can't scale the number of next-hops indefinitely
[12:53:54] yeah
[12:54:02] I don't know what the limit on juniper MX is, but I wouldn't be surprised if it were no more than 32 or so
[12:54:05] or maybe less
[12:54:08] hm. still isn't pinging
[12:54:25] also, often a number that's not a power of 2 gives odd distribution
[12:54:31] ok, what are you pinging?
[12:54:37] 208.80.153.223
[12:54:58] that's the public ip of an instance?
[12:55:08] yep
[12:55:15] and do both network nodes have NAT rules for it?
[12:55:21] ah
[12:55:22] I see
[12:55:29] why isn't forwarding enabled?
[12:55:36] it's disabled by default
[12:55:52] yeah, but it should be enabled by puppet
[12:56:08] unless puppet is wrong
[12:57:12] weird...
[12:57:17] I'm including it via puppet
[12:57:22] but it isn't there
[12:57:47] if $lsbdistid == "Ubuntu" and versioncmp($lsbdistrelease, "12.04") >= 0 {
[12:57:50] 12.04?
[12:58:01] what is that for?
[12:58:07] class generic::sysctl::advanced-routing($ensure="present") {
[12:58:07] if $lsbdistid == "Ubuntu" and versioncmp($lsbdistrelease, "12.04") >= 0 {
[12:58:12] it wraps the file
[12:58:18] hmm that seems wrong
[12:58:18] why is that being done?
[12:58:24] 10.04 has /etc/sysctl.d/
[12:58:27] yes
[12:58:28] 8.04 doesn't
[12:58:49] fixing
[12:58:54] New patchset: Hashar; "(bug 37852) enable WikimaniaShopLink on labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12863
[13:00:04] apparently, the default max of next-hops on MX is 16, but can be raised
[13:00:10] so we'll be good for a while
[13:00:12] yeah
[13:01:01] New patchset: Ryan Lane; "Fix sysctl.d inclusions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12864
[13:01:05] likely 64 max
[13:01:28] if they don't all have the static rules, then next-hop will occasionally reject traffic, eh?
[13:01:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12864
[13:02:30] Ryan_Lane: so...
[13:02:41] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12864
[13:02:43] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12864
[13:02:53] paravoid: ?
[13:03:02] the nodes are up, puppet has run
[13:03:15] puppet was complaining about nova-common before but I guess that was you
[13:03:23] yes
[13:03:29] now it rightfully complains that it can't find the private key for virt6.pmtpa.wmnet
[13:03:31] I broke something temporarily
[13:03:35] yep
[13:03:38] it needs to be created
[13:03:42] do you have the steps somewhere?
[13:03:55] or should I trial-ask-and-error? :)
[13:03:57] you generate them on sockpuppet, and yep, I do :)
[13:04:11] * paravoid looks
[13:04:14] http://wikitech.wikimedia.org/view/Using_the_local_certificate_authority
[13:04:19] so...
[13:04:32] if you want to figure out a better way for libvirt to auth, that would be great too
[13:04:41] ssh keys aren't a great option
[13:04:50] s/h/l/ :)
[13:04:59] (ssl)
[13:05:04] ssh keys
[13:05:09] ssl keys suck too
[13:05:19] but nova does some actions as root and others as nova
[13:05:24] so, you need to have keys for both
[13:05:35] and we manage authorized_keys for root globally
[13:05:44] so, it's a pain in the ass at minimum
[13:06:03] a shared secret via sasl or something like that would be ideal
[13:06:28] I despise the ssl key way that's being done now
[13:07:19] Ryan_Lane: s/occasionally/constantly/
[13:07:24] * Ryan_Lane nods
[13:07:25] hm
[13:14:27] uuuuggghhhhh
[13:14:34] "The next thing to do is to remove your current network and re-create it"
[13:14:46] because I have a great feeling that's not going to horribly break things
[13:15:16] it's labs
[13:15:17] it's ok
[13:15:29] you shouldn't make it too stable or people will depend on it and put production stuff on it
[13:15:47] -_-
[13:15:49] :(
[13:16:17] (this is probably not making me popular, but that's ok too ;)
[13:16:49] We kinda already run production stuff on it :P
[13:17:00] then let's break it now
[13:17:18] * Damianz eats mark
[13:17:47] * Damianz notes he doesn't taste so good
[13:19:13] mark: can you revert that change again?
[13:21:51] ok
[13:25:57] Could someone update planets? r115556 & r115555
[13:30:11] New patchset: Hashar; "Enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185
[13:31:43] New review: Hashar; "Wahh thanks for the work done on that patch over the week-end!"
[operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12185
[13:32:11] hexmode: maybe mutante
[13:36:03] hexmode: I have opened https://bugzilla.wikimedia.org/show_bug.cgi?id=37929 to migrate /trunk/tools/planet from svn to git
[13:36:09] hexmode: will see what happens
[13:36:22] I think Chad migrates them in batches
[13:37:51] <^demon> hashar: I don't know what the plan is there. Probably should just be in puppet repo for now.
[13:38:03] <^demon> Long-term plan is to replace it with something that doesn't require code changes to add a new blog.
[13:38:30] I think mutante worked on having planet in operations/puppet
[13:38:37] <^demon> Well, someone should finish that work.
[13:38:43] though ops might not want to have all of that in operations/puppet.git
[13:38:49] and would prefer a distinct repo
[13:38:54] <^demon> Doesn't make sense to make a new repo and move the config there if puppet doesn't deploy from it.
[13:39:04] <^demon> I think a separate repo is a waste of time.
[13:39:20] well puppet is probably deploying from svn already
[13:39:37] I am not sure how ops deploy from a git repo
[13:39:55] I think that is just git clone || git checkout
[13:40:08] <^demon> There's a git::clone class or something
[13:41:24] it'd be simple to regen a config.ini from wikitext
[13:42:22] <^demon> Doing it on-wiki is kind of silly. We should just find planet software that doesn't suck.
[13:42:53] as long as we don't have to write and maintain one … I am fine with that plan :-]
[13:43:05] New patchset: Faidon; "Add SSL certificates for virt6-8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12868
[13:43:38] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12868
[13:43:38] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12868
[13:44:12] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12868
[13:44:34] I don't get gerrit
[13:44:47] I can't understand when it does a fast-forward and when it does a merge
[13:46:40] <^demon> paravoid: https://gerrit.wikimedia.org/r/Documentation/project-setup.html#submit_type explains it pretty well.
[13:46:40] ^demon: if there is some out-of-the-box software that does it, sure, but I was just trying to figure out how to fix the current situation with minimal hassle :)
[13:46:51] * hexmode leaves for a bit
[13:46:57] <^demon> paravoid: Specifically, "If the change being submitted is a strict superset of the destination branch, then the branch is fast-forwarded to the change. If not, then a merge commit is automatically created. This is identical to the classical git merge behavior, or git merge --ff."
[13:47:44] ^demon: we have operations/puppet set to "merge if necessary" but I've seen it do merges when a ff would be possible
[13:48:17] <^demon> Keep in mind, this is jgit's behavior, not your traditional commandline c git.
[13:48:22] ^demon: see 9e18e15
[13:48:25] <^demon> So it might have missed a ff and done a merge instead.
[13:49:09] it's also entirely possible that it didn't and I'm missing something :)
[13:49:24] <^demon> jgit does funny things sometime :)
[13:49:53] <^demon> In any case, #gerrit is pretty responsive if you've ever got weird internals questions about gerrit. They don't bite :)
[14:06:36] New patchset: Krinkle; "(bug 37304) Set $wgTranslateUsePreSaveTransform = true;" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12871
[14:08:45] New patchset: Krinkle; "(bug 37304) Set $wgTranslateUsePreSaveTransform = true;" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12871
[14:10:50] mutante: re.
RT 3138, rewrite rules--I was going to mention the test script you found, but I forgot. I usually run that with a broader set of URLs (./urls) before deployment to make sure I haven't broken other redirects, and finally without the 'test' arg after deployment to make sure we're getting the same response from all the servers listed in the pybal config
[14:19:51] New review: Hashar; "Merging lab stuff." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12863
[14:19:53] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12863
[14:26:51] New patchset: Ottomata; "geoip.pp - sshhhhhing stdut of geoipupdate cron" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12874
[14:27:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12874
[14:37:22] New patchset: Demon; "(bug 37553) Rotate log files for gerrit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12876
[14:37:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12876
[14:39:49] hi my favorite ops room!
[14:40:19] someone new in SF would love to wake up to seeing this merged this morning:
[14:40:20] https://gerrit.wikimedia.org/r/#/c/12600/
[14:52:53] New review: Alex Monk; "(no comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/12185
[15:05:19] Can someone get this merged ? https://gerrit.wikimedia.org/r/#/c/5289/
[15:05:22] Change I5ef26635: Gerrit CSS tweak: word-wrap commit summaries.
[15:14:33] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5289
[15:14:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5289
[15:16:36] Krinkle: ^^
[15:16:44] Thx
[15:16:48] force running puppet now
[15:16:58] did you guys test this change?
:)
[15:17:23] because I'm going to start a "users who broke gerrit" stab count
[15:17:32] it's in
[15:18:28] what were srv187 and srv188? apaches?
[15:19:05] Ryan_Lane: Don't you lead mark's users who broke gerrit stab count? :D
[15:20:04] Damianz: no.
[15:20:07] New patchset: Ryan Lane; "Adding patch for nova-compute" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12882
[15:20:17] only if you count the fact that I merge the changes in
[15:20:19] <^demon> Ryan_Lane breaks gerrit when I write bad code and it gets merged :)
[15:20:32] I always ask "was this tested"
[15:20:35] heh
[15:20:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12882
[15:22:57] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12882
[15:23:00] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12882
[15:27:17] what were srv187 and srv188? apaches?
[15:29:59] hashar, Ryan?
[15:30:13] Ryan_Lane: I tested it when I +1'ed it.
[15:30:21] everything names srv or mw are apaches
[15:30:23] *named
[15:30:46] thanks
[15:30:57] Ryan_Lane: Gerrit has a tendency of changing all class names after each "build" since they're auto-generated.
[15:31:19] So whenever we upgrade to even the slightest different build, we'll have to fix this css.
[15:31:32] anyway, we'll cross that bridge when it comes to it.
[15:31:55] <^demon> There's a push to make some classes have permanent names.
[15:32:46] <^demon> Krinkle: http://code.google.com/p/gerrit/issues/detail?id=864#c5
[15:33:46] ottomata1: let's talk about mr. erosen.
[15:49:51] notpeter!
[15:49:53] you rang?
[15:50:15] hey
[15:50:28] I saw a couple of checkins about mr. rosen
[15:50:30] heyaaaa
[15:50:32] ja
[15:50:34] but I think I missed a couple
[15:50:39] hmm, i think there is just one?
[15:50:57] https://gerrit.wikimedia.org/r/#/c/12600/
[15:50:59] lemme look
[15:51:08] oh hmm, there is another
[15:51:13] but it has already been merged
[15:51:15] the one that added him
[15:51:50] Can someone run as root "chown -R l10nupdate:wikidev /home/wikipedia/common/php-1.20wmf6/cache/l10n/" on fenari please?
[15:51:57] I'm apparently not allowed
[15:52:36] Reedy: done
[15:52:58] thanks
[15:53:22] ottomata1: kk
[15:53:28] does anyone know the magic to make purgeList.php purge?
[15:53:30] looks reasonable. I shall merge
[15:53:39] i have notes from several months ago but I'm not having any luck atm
[15:54:21] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12600
[15:54:23] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12600
[15:55:13] echo "https://en.wikipedia.org/wiki/Foo" | mwscript purgeList.php --wiki=aawiki
[15:56:02] I'm doing that within fenari:/home/w/common/php/maintenance (barfs otherwise), and it doesn't appear to work
[15:56:16] says it's purging but the cached copy persists
[15:56:36] ooh unless it's my browser cache. sec
[15:56:38] what're you trying to purge?
[15:56:43] he
[15:56:44] h
[15:56:49] that was it
[15:56:56] http://education.wikimedia.org/milestones
[15:57:11] it should 301 to http://outreach.wikimedia.org/wiki/Education/Case_Studies/milestones
[15:57:55] thanks notpeter
[15:59:37] ottomata: no prob! and now to wail for the "request db access to db42 and db1047 for erosen" ticket ;)
[16:00:10] ha, aye
[16:36:53] New patchset: Demon; "Implement workaround for Gerrit permissions bug." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12894
[16:37:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12894
[17:17:23] who has ops in wm tech?
[17:17:27] need someone booted
[17:17:33] nm
[17:17:49] I thought hou had
[17:20:15] *you
[17:29:34] I don't think so
[17:29:40] I avoid getting irc ops, don't like it
[17:33:44] not that I make much use of op
[17:33:45] maplebed: just read your email about ms-be5
[17:33:48] sounds like fun ...
[17:33:55] but gets useful on these spam cases
[17:49:00] there's always been other folks around when I am, wm tech is well populated
[17:49:36] populated yes, but few people have op afaik
[17:50:51] heh, it turns out anyone connecting from a server would have op
[17:51:05] huh?
[17:51:16] er no, just voice
[17:51:19] *!*@*.wikimedia.org +V
[17:51:24] ah
[18:35:07] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12894
[18:35:10] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12894
[18:39:59] New patchset: Faidon; "openstack: remove gluster from compute nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12903
[18:40:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12903
[18:45:18] * jeremyb waves Ryan_Lane ;)
[18:48:28] Ryan_Lane: sign off https://gerrit.wikimedia.org/r/12903 ?
[18:48:39] lemme see
[18:48:40] I'm fairly sure it won't affect current nodes
[18:48:53] it won't
[18:48:58] but I'd like to have a review just in case I missed something and I'm about to bring labs down :)
[18:49:04] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12903
[18:49:10] bah
[18:49:19] server went down right as I went to do that
[18:49:22] gerrit down
[18:49:28] they pushed a config change
[18:49:40] Gerrit gitweb 503 error?
[18:49:54] see statement right above yours :) [18:50:07] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12903 [18:50:09] the change that fixed it or the change that broke it? [18:50:15] .... [18:50:19] :P [18:50:20] the server restarts on config changes [18:50:25] right [18:50:27] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12903 [18:50:31] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12903 [18:51:18] krinkle-mobile: when you run namespaceDupes.php, how do you make the pages then show up in AllPages? [18:51:37] I.. don't know? [18:51:51] bleh [18:51:56] I hate mediawiki [18:52:03] The broken pages? [18:52:22] dup/clashing names, those are usually renamed to a certain prefix [18:52:25] does a namespace need to be in wgContentNamespaces for them to show up? [18:52:28] no, they all moved properly [18:52:42] Anyway, if they don't show up *anywhere* automatically, that looks like a flaw in the script [18:52:48] they exist [18:52:54] but aren't in AllPages [18:53:02] are you looking at squid cache maybe? [18:53:05] no [18:53:08] this is on a test wiki [18:53:12] I can't imagine what else could cause that [18:53:57] Ryan_Lane: how are eth1.103 interfaces getting to ifstate up? nova? [18:54:02] by hand? [18:54:06] by hand [18:54:14] stupid nova *should* do it [18:54:20] okay, *that's* what you meant [18:54:22] I should really send in a patch for that [18:54:23] fixing it [18:54:36] why would nova care about eth1.103? [18:54:53] oh, nova adds it to the bridge [18:55:02] yes [18:55:10] why can't it just take the bridge as a config and let us deal with its initial members? 
[18:55:13] other than the VMs that is [18:55:23] * Ryan_Lane shrugs [18:55:32] I'd prefer it do all of it [18:55:41] I certainly wouldn't [18:55:48] but I'd prefer it do it properly [18:55:49] as I wouldn't like messing with any other parts of the system's config [18:56:26] (I realize it already does, I'm just saying I'm not too happy about that either) [18:56:43] enterprise vs. unix way of doing things I guess :) [18:57:51] heh [18:58:01] it's difficult enough to configure it [18:58:15] if they made you do every part of it, it would be way harder [18:59:29] it's hard if you have to learn all of its different ways of doing things rather than sticking to things you know [18:59:38] I *know* how to setup a bridge using /e/n/i [18:59:52] anyway [19:00:00] no reason to bitch about it [19:00:07] heh [19:11:28] err: Could not retrieve catalog from remote server: Connection reset by peer - SSL_connect [19:11:31] again [19:11:49] at least I'm not crazy and it did happen to me yesterday too [19:13:34] hello dear guildmembers! [19:13:39] this is leeroy there :-D [19:13:54] the gallium host looks somehow dead [19:13:55] <^demon> So, what's up with the box? How long's it been down? [19:13:59] ssh does not respond though ping does [19:14:04] apache dead too :( [19:14:37] roughly 15 minutes according to nagios [19:14:41] might just be ssh [19:14:41] hashar: hrm [19:14:47] let me try to get in via mgmt [19:15:41] New patchset: Pyoungmeister; "cleanup: removing very old search class definitions from site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12908 [19:15:58] hrm, it's freezing up a bit upon login [19:16:10] maybe some process is wild [19:16:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12908 [19:16:17] nagios bot must be dead to not be spamming us [19:16:28] <^demon> That's not good either. 
[19:16:29] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=gallium.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [19:16:47] yeah the box had a quickly raising load [19:16:53] with at least once CPU at 100% [19:17:22] still unresponsive i'll reboot [19:17:22] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12908 [19:17:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12908 [19:17:27] !log rebooting unresponsive gallium [19:17:32] Logged the message, Mistress of the network gear. [19:17:43] ahh Mistress of the network gear. [19:17:52] <^demon> hashar: Worth discussing later: using the clustering features so the builds can be offloaded and not kill the only jenkins box. [19:18:33] ^demon: we will do that whenever I move Jenkins to labs :-] [19:18:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.767 seconds [19:19:07] LeslieCarr: do you think atop (if installed) would be able to tell which process started doing nuts on the box? [19:19:15] PROBLEM - Host gallium is DOWN: PING CRITICAL - Packet loss = 100% [19:19:24] doesn't the atop information go away when the box reboots ? [19:19:45] I have no idea [19:19:58] was expecting it to be logged in /var/log/atop.log or something [19:20:09] RECOVERY - Host gallium is UP: PING OK - Packet loss = 0%, RTA = 26.44 ms [19:21:07] thanks LeslieCarr , will investigate a bit [19:21:17] atop stuff is logged, you can look at it later [19:21:24] ok cool [19:21:39] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [19:21:41] why am I watcing this channel? [19:21:43] gallium is still starting up … it's fsck'ing [19:21:45] * apergos stops abruptly [19:21:53] afk! [19:23:06] * hashar waves at apergos. Good night! 
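Editorial note on the atop exchange above: since the samples are written to disk, they can be replayed after the reboot. What follows is a minimal sketch, not what was actually run on gallium. The log path is the one guessed in the chat (`/var/log/atop.log`) and the time window is chosen only for illustration; `atop_replay_cmd` is a hypothetical helper name. The `-r`, `-b` and `-e` flags are atop's options for reading a raw log and bounding the replay window. The helper prints the command instead of running it, so the path can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print the atop invocation that replays a raw log
# between two times of day (hh:mm). Nothing is executed here.
atop_replay_cmd() {
    log_file="$1"   # raw atop log; the path is an assumption from the chat
    begin="$2"      # start of the window, hh:mm
    end="$3"        # end of the window, hh:mm
    printf 'atop -r %s -b %s -e %s\n' "$log_file" "$begin" "$end"
}

# Window roughly covering the load spike before the 19:17 reboot.
atop_replay_cmd /var/log/atop.log 19:00 19:20
```

If the log was rotated, the same helper can be pointed at the rotated file instead.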
[19:40:55] woosters: who can i talk to about the sibling project move we spoke about last week https://rt.wikimedia.org/Ticket/Display.html?id=2996 ? [19:50:06] tfinc - i am waiting for robh to be back to work on it [19:50:14] he is unfortunately sick today [19:50:43] New patchset: Faidon; "openstack: set tagged interface up on boot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12912 [19:51:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12912 [19:51:16] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12912 [19:51:18] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12912 [19:51:56] hashar: hrm … gallium is still fsck'ing [19:52:08] LeslieCarr: it has like 1TB of disk IIRC [19:52:11] so might be long indeed [19:52:17] I guess you can have a lunch safely [19:53:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:06] hehe [20:03:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.258 seconds [20:06:36] New patchset: Lcarr; "adding new machines removing lily (decommissioned) some alphabetizing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12914 [20:06:53] hashar: can you confirm the state of svn vs git vs http conf dirs? [20:07:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12914 [20:07:31] is it in fact that we're as we ever were, but a new git repo happens to be receiving occasional updates that have been committed to svn? [20:08:23] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [20:09:36] Jeff_Green: mail me please. 
can not answer tonight sorry :( [20:09:47] Jeff_Green: hashar at free dot fr [20:10:37] i have a request to push out a redirect ASAP, will I break anything if I just do it as I would have pre-git? [20:11:10] andrewbogott: ping? [20:11:23] RECOVERY - SSH on gallium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [20:11:30] andrewbogott: I'm trying to give the virt1006 issue another pair of eyes [20:11:39] but I can't connect to it at all, have you shut it down? [20:16:53] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12725 [20:16:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12725 [20:17:00] Ryan_Lane: any idea why /lib/init/upstart-job is missing there on the new virt systems? [20:17:55] paravoid: I'm here. [20:18:04] You can't contact the mgmt? [20:18:24] I can't contact the system [20:18:30] I'm in the mgmt [20:18:44] Ah, yes -- what you are seeing is the problem I am seeing. [20:18:53] which is: sol hangs. [20:18:55] no, I mean I can't ssh either [20:19:33] You can ssh to admin@virt1006.mgmt.eqiad.wmnet right? [20:19:42] And get responses &c? [20:20:21] yes [20:20:28] I can't ssh to virt1006.eqiad.wmnet [20:20:38] New patchset: Lcarr; "adding new machines removing lily (decommissioned) some alphabetizing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12914 [20:20:52] paravoid: yes, exactly. [20:21:05] Nothing is installed on 1006 because I can't see any evidence that there's a server there at all. [20:21:11] aaah! [20:21:11] I can talk to mgmt, but mgmt can't talk to the server. [20:21:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12914 [20:21:17] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [20:21:23] I thought you installed it and then lost SOL [20:21:36] Oh! Nope, much more serious problem than that. 
[20:21:52] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12914 [20:21:53] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [20:21:54] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12914 [20:21:56] I've done a 'power cycle' and a 'power hard-reset' with out visible effect. [20:22:03] New patchset: Diederik; "Added more extensive logging support, including title of best hit and score of best hit." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/12944 [20:22:53] * andrewbogott envisions virt1006 pierced with a lance, partially underwater, etc. [20:23:03] hahahaha [20:23:47] Oh, except, mgmt works so lance is more likely than water. [20:24:35] so, it's trying to PXE now [20:24:54] brewster is getting the DHCPDISCOVER and replies with a DHCPOFFER [20:24:59] but it just doesn't see it [20:25:22] Oh, so, not completely dead! [20:26:06] nope [20:26:17] but DHCP doesn't work, SOL doesn't work, it doesn't look good [20:26:46] paravoid: no clue [20:26:50] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [20:27:50] For the record, I can accomplish my intended tasks just as well with 7 servers as with 8. But we will want that capacity eventually. [20:27:58] well yes [20:28:09] since we're both working on Ciscos these days [20:28:14] we might just as well fix it [20:29:49] weird, bugzilla is mostly puppetized … except its apache file in puppet isn't actually distributed to the server itself [20:30:47] awesome [20:31:03] paravoid: I assume, perhaps foolishly, that since there is no software on the system yet, the current failure is a 'hardware problem' and something that only Robh can deal with. Of course, I thought that about the drive letters as well, which proved incorrect. 
[20:32:52] andrewbogott: I just reset CIMC again [20:33:03] let's see [20:35:54] yep, nothing [20:36:18] I'm opening a ticket for RobH to look at [20:36:32] sounds good. thanks for looking. [20:37:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:38:19] New patchset: Lcarr; "puppetizing site for bugzilla.wikimedia.org and switching to new cert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12946 [20:38:46] LeslieCarr: gallium is back :) [20:38:48] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/12946 [20:39:23] LeslieCarr: there are rotated atop logs in /var/log :-] [20:41:58] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12871 [20:42:07] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12871 [20:42:10] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12871 [20:42:33] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12787 [20:42:36] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12787 [20:42:49] andrewbogott: https://rt.wikimedia.org/Ticket/Display.html?id=3187 [20:43:17] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12522 [20:43:19] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12522 [20:44:20] andrewbogott: fyi, I resolved 3055 again; RT has the bad habit of reopening tickets when you reply to them :-) [20:44:36] oops, thanks. 
[20:44:54] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12783 [20:44:56] does anyone know how to navigate in atop? [20:44:57] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12783 [20:45:06] can't find out how to move back in time :/ [20:45:21] ah t and T [20:45:23] like time [20:45:53] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12780 [20:45:55] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12780 [20:46:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.475 seconds [20:46:10] andrewbogott: "virtual media" is a feature where you can point the mgmt to an ISO on your system (via your browser) and then let the remote system boot from it [20:46:15] andrewbogott: e.g. a recovery or installation CD [20:46:33] it's a necessary feature if you don't have PXE :) [20:46:46] Yep, I can see why that would be useful. [20:48:52] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12583 [20:49:54] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12533 [20:49:57] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12533 [20:50:15] * Krenair crosses fingers [20:54:29] New patchset: Lcarr; "puppetizing site for bugzilla.wikimedia.org and switching to new cert" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12946 [20:55:04] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12946 [20:55:23] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12946 [20:55:25] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12946 [20:56:23] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [20:57:02] Yay it worked [20:58:37] !log pushing out new redirects.conf adjusted for RT #3138 [20:58:42] Logged the message, Master [21:01:26] !log Triggered several jobs on Jenkins to run tests on change that did not received their blame stick token [21:01:31] Logged the message, Master [21:03:35] New patchset: Lcarr; "removing transcode1 from site.pp (since it's decommissioned)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12947 [21:04:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12947 [21:07:21] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12947 [21:07:23] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12947 [21:07:26] oh sh... how on earth did I not notice that [21:07:41] } elseif ( $wgDBname == 'mediawikiwiki ') { [21:07:43] misplaced space -.- [21:07:50] wow [21:07:58] in any case, what about the abuse filter change? [21:07:58] !log rebooting neon [21:08:04] Logged the message, Mistress of the network gear. [21:08:04] That -was- the abusefilter change [21:08:28] didn't see [21:08:31] It needs the space moved a byte to the right, reviewed, merged and deployed. I'm such an idiot. [21:08:56] apparently so [21:09:44] PROBLEM - SSH on ms1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:11:00] New patchset: Lcarr; "fixing bugzilla site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12948 [21:11:32] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12948 [21:12:47] New patchset: Lcarr; "fixing bugzilla site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12948 [21:13:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12948 [21:13:20] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms [21:13:23] New patchset: Alex Monk; "Really enable AbuseFilter auto-block on MediaWikiWiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12949 [21:13:23] is anyone working on ms1002 ? [21:14:39] hashar: Hi. You triggered https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/2483/ but it seems to have broken [21:15:27] Do I need to rebase the change or is it just a problem with the job? [21:15:38] Krenair: yup rebase on top of latest master [21:16:09] and resubmit [21:16:22] I still need to find out a way to ignore release note files [21:16:51] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12948 [21:16:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12948 [21:16:56] hashar, there are custom mergers [21:17:24] Platonides: well if you know of any clean solution we could use, I will be happy to deploy it on Jenkins [21:18:09] Platonides: there is http://stackoverflow.com/questions/332528/is-it-possible-to-exclude-specific-commits-when-doing-a-git-merge/3970442#3970442 [21:18:26] howerver, if jenkins excludes htem [21:18:35] we probably only want that to happen on Jenkins host though [21:18:38] then it could still fail in gerrit [21:18:49] yup [21:19:45] hashar, http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/git-merge-changelog.c [21:20:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:22:20] Platonides: here is the bug report 
https://bugzilla.wikimedia.org/show_bug.cgi?id=37942 [21:22:45] well the 'ours' merge strategy applied to RELEASE-NOTES* would be fine I guess [21:25:03] Platonides: if you get any idea, feel free to add there. Or submit a change :-] [21:27:20] New patchset: Lcarr; "allowing certs.pp to import passwords.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12950 [21:27:48] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/12950 [21:31:28] New review: Lcarr; "Forcing through - this actually will work as private exists in production" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12950 [21:31:29] PROBLEM - NTP on ms1002 is CRITICAL: NTP CRITICAL: No response from NTP server [21:31:43] !log manual apache restart on srv265, srv277 [21:31:48] Logged the message, Master [21:33:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [21:34:36] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12950 [21:41:15] New patchset: Alex Monk; "Enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185 [21:44:19] New patchset: Lcarr; "Didn't work, need to import classes, bigger change Revert "allowing certs.pp to import passwords.pp"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12953 [21:44:49] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12953 [21:45:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12953 [21:45:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12953 [21:46:15] New patchset: Lcarr; "fixing bugzilla site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12954 [21:46:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12954 [21:47:22] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12954 [21:47:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12954 [21:51:01] New patchset: Lcarr; "adding bugzilla server class to kaulen" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12956 [21:51:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12956 [21:51:58] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12956 [21:52:01] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12956 [21:52:35] New patchset: Faidon; "openstack: pin python-eventlet to PPA" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12957 [21:52:55] I so miss doing a "git push" and have my changes get applied :/ [21:53:06] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12957 [21:53:13] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12957 [21:53:16] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12957 [21:55:30] New patchset: Lcarr; "fixing kaulen (again)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12960 [21:56:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12960 [21:56:53] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12960 [21:56:55] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12960 [21:59:11] New patchset: Faidon; "openstack: fix typo in apt pin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12963 [21:59:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12963 [21:59:45] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12963 [21:59:48] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12963 [22:02:05] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [22:04:29] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms [22:05:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:08:32] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [22:12:08] New patchset: Ryan Lane; "Make mysql role work for 12.04 and clean up syntax" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12967 [22:12:39] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12967 [22:13:04] !gerrit 12949 [22:13:04] https://gerrit.wikimedia.org/ [22:13:22] !gerrit I6e588d13 [22:13:22] https://gerrit.wikimedia.org/ [22:13:23] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12967 [22:13:25] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12967 [22:18:44] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [22:19:01] paravoid: was change 12967 incorrect? [22:19:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [22:36:52] !log powercycling ms1002 - it's unresponsive to ssh and on the console though it does respond to a ping. [22:36:57] Logged the message, Master [22:37:25] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:40:07] RECOVERY - SSH on ms1002 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [22:44:55] RECOVERY - NTP on ms1002 is OK: NTP OK: Offset 0.06067728996 secs [22:50:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:54:58] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [22:59:01] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [22:59:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.049 seconds [23:00:58] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:17:01] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [23:29:03] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); 
V: 1 C: 2; - https://gerrit.wikimedia.org/r/12949 [23:29:06] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12949 [23:33:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:44:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.647 seconds
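Editorial note on change 12949, merged above: the bug it fixes was the stray space inside the quoted string in `$wgDBname == 'mediawikiwiki '`, quoted at 21:07:41. The sketch below is a shell analogue of that comparison, not the PHP config itself. `db_name` and `matches` are hypothetical names standing in for the real variable and condition. It only illustrates why an exact string comparison silently never matches when the literal carries a trailing space.

```shell
#!/bin/sh
# Hypothetical stand-in for the wiki's database name.
db_name="mediawikiwiki"

# Exact string comparison: prints "match" or "no match".
matches() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

matches "$db_name" "mediawikiwiki "   # space inside the quotes, as in the bug
matches "$db_name" "mediawikiwiki"    # space moved outside the quotes
```

The first call reports no match and the second reports a match, which is consistent with the branch never running until the space was moved.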