[00:00:02] technically, I guess all labs systems could [00:00:12] but we don't tell people to change their passwords via instances [00:00:34] honestly we should disable that, if it isn't already [00:00:58] not disabled [00:03:26] labsconsole, gerrit, the CLI tools for making new accounts. what else? [00:03:50] that's it [00:03:55] that's 4 things, though [00:04:09] actually, gerrit doesn't modify ldap [00:04:27] labsconsole, formey, manganese, …. [00:04:33] may just be three things [00:04:39] what's labsconsole's box called? [00:04:43] virt0 [00:05:28] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 226 seconds [00:12:34] o_O I got a "Your message to MediaWiki-CVS awaits moderator approval" bounce back email (for my commit) after doing a commit? … something seems wrong [00:13:03] mutante: ^ [00:16:12] 3am again [00:16:15] bye-bye [00:16:38] paravoid: later [00:16:44] I think that's what chad asked to be added earlier [00:18:49] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds [00:38:05] New review: Jeremyb; "We should not have this set locally in MediaWiki space on some wikis and customize per wiki in site ..." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/11161 [00:56:18] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [01:11:06] hah, had a merge conflict with myself ;) [01:12:20] nb, about to do a large file upload to commons from hume [01:14:12] New patchset: Jeremyb; "make spacing consistent" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5492 [01:14:38] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5492 [01:16:29] New review: Jeremyb; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/5492 [01:40:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds [01:41:54] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 262 seconds [01:43:44] New patchset: Demon; "Allow some logs to supress comment-added notifications" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11176 [01:44:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11176 [01:46:33] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [01:47:45] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 8 seconds [02:59:24] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [03:52:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [04:44:51] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [08:46:38] hello [08:46:52] today i have a network question :) [08:50:00] a user complained about the fact that he is only able to send small portions of data from the toolserver (esams/haarlem) to the wikis via a bot. he said it started on probably the 6th or 7th. [08:51:29] i saw you had some routing changes in esams on i think the 5th because of ipv6 but unfortunatelly i did not measure how the regular throughput to ie washington is. so i cannot test whether there was a significant change in the throughput. [08:52:16] do you have any ideas how much throughput you have regularily? did it change? [08:53:19] then we could exclude the network stuff as a cause [08:53:35] the guy opened a ticket at you site too: https://bugzilla.wikimedia.org/show_bug.cgi?id=37536 [09:01:43] is he making changes/edits? 
or just reading? [09:02:13] he is making edits [09:04:45] nosy: it is possible the IPv6 and IPv4 connectivity are slightly different [09:05:16] so we are talking esams to tampa [09:05:43] I would not expect there to be a difference bth [09:05:45] *tbh [09:05:53] did anyone of you make any scp or something from one destination to the other a while ago? [09:06:08] not me [09:06:14] mutante: poke [09:06:48] I guess the workaround would be to raise the timeout :-] [09:07:25] ohh v4 indeed works [09:07:27] hrhrhr...as long as the other side does not close the connection in this time [09:07:52] hashar: do you know where the edits finally land on the wmf side? [09:07:59] do small edits work over IPv6 ? [09:08:09] then i could also look whether we travel via v6 or v4 [09:08:23] if they use the api, then one of the api servers [09:10:01] from fenari to willow: IPv4 pass through tele2, v6 through hurricane electric [09:10:29] both paths look fine right now, that might be another issue entirely [09:13:08] yay reviews done whew! [09:13:14] hashar: ok thank you for testing this [09:13:51] hashar: how did you test? can you upload from willow to fenari? [09:14:06] if you do, do it to /tmp :-P) [09:14:08] I just did a traceroute from the US to willow [09:14:19] I don't have an access on willow :-] [09:14:24] oh i see [09:14:31] so i will poke dabpunkt for a test [09:14:34] I have no toolserver access either, not for years now [09:14:41] apergos: I will just use my ~ over NFS, seems better than using local disk *grin* [09:14:53] :-P :-P [09:15:42] you know a local /home with sym links to /home/username/nfsdir might keep people from doing silly things (though it's gotten much better) [09:15:45] anyways... [09:15:45] grr http://dumps.wikimedia.org/ is not v6 aware [09:15:56] are we not proxied?
[09:16:20] oh, right, no need, just need to set up lighty [09:16:28] I forget if it's been given an address though [09:16:47] I vaguely think not [09:16:54] * apergos adds it to list ot ask about [09:17:53] nosy: one possible test would be to download some long article using curl [09:18:08] nosy: the curl command can be forced to v4 or v6 (with --ipv4 or --ipv6) [09:18:10] oh golly I hate gerrit already this morning [09:18:14] ahah [09:18:18] use the command line ? [09:18:35] want to add reviewer, don't know their gerrit name, no gneral list of reviewers that's clickable... why not?? [09:18:44] ohh [09:18:52] well that is an issue for Ryan [09:19:10] the LDAP schema is messy, you need to know the Labs login :-( [09:19:15] who are you looking for? [09:19:28] and as promised yesterday I have started a list of issues with that as the first one! [09:19:33] hashar: ok, good idea. do you think the routing path is the same back and forth? [09:19:44] no guarantee about that [09:19:45] i think you said its not [09:19:47] :D [09:19:50] we often have assymetric paths [09:20:00] nosy: I have no idea, from the traceroute it looks like the paths are symmetrics right now [09:20:13] but anyway its probably really sensible to test both ways [09:20:16] but hard to know for sure without having a trace from willow to USA :-] [09:20:21] I don't see how you can tell from the traceroute one way [09:20:28] you want a trace? [09:20:44] apergos: you can if the paths have a different number of routers. 
The TTL will be different [09:21:07] it gets through tele2, then equix-dc2.wmf [09:21:37] nr 9 is fenari in my traceroute [09:22:26] you really want to do a download over v4 and another one over v6 [09:22:33] root@willow:~# traceroute -A inet6 fenari.wikimedia.org [09:22:33] traceroute: getaddrinfo: no address for the specified node name [09:22:38] that would let you know if the bandwidth is an issue [09:22:41] ok i will do so [09:24:27] the times are the same [09:26:29] i asked dabpunkt to test the other way [09:32:44] so that is most probably not a bandwidth issue [09:34:27] well that was painful (finding brion vibber's user name: not actually easy) [09:34:51] brion ? [09:34:53] davibber [09:34:55] vbrion [09:34:57] brionv [09:35:00] well ... [09:35:04] yeah that is cumbersome :-( [09:35:07] no, for the reviewer it is Brion VIBBER [09:35:12] ahahah [09:35:16] yeahright [09:35:24] you have to poke Ryan about it :-D [09:35:26] so a simple list... sure would be nice [09:35:32] I shall, I shall! [09:35:35] need to update the LDAP schema to add a new field [09:35:47] I am ranting about it from time to time [09:35:51] Chad does too [09:35:58] so one day, maybe we will fix that :-] [09:36:25] I would love a little link that would let me choose to see: list of projects. list of user emails. list of reviewer names. list of project owners.
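(note on the 09:17 to 09:32 thread) The test suggested there, downloading the same long article once over v4 and once over v6 and comparing, could be sketched roughly as below. This is only an illustration, not what was actually run on willow: the helper names and the choice of curl output fields are assumptions, and the URL is whatever page you pass on the command line. The `--ipv4` / `--ipv6` flags are the ones mentioned at 09:18:08.

```python
import subprocess
import sys


def curl_command(url, family):
    """Build a curl call that forces one address family ("ipv4" or "ipv6")."""
    if family not in ("ipv4", "ipv6"):
        raise ValueError("family must be 'ipv4' or 'ipv6'")
    return [
        "curl", "--silent", "--output", "/dev/null",
        # total transfer time in seconds, average download speed in bytes/s
        "--write-out", "%{time_total} %{speed_download}",
        "--" + family,
        url,
    ]


def parse_result(output):
    """Turn curl's "<seconds> <bytes_per_second>" output into two floats."""
    seconds, speed = output.split()
    return float(seconds), float(speed)


def relative_gap(v4_speed, v6_speed):
    """Fraction by which the slower family trails the faster one (0.0 = equal)."""
    faster = max(v4_speed, v6_speed)
    slower = min(v4_speed, v6_speed)
    return 0.0 if faster == 0 else (faster - slower) / faster


def measure(url, family, timeout=120):
    """Run one download and return (seconds, bytes_per_second)."""
    result = subprocess.run(curl_command(url, family), capture_output=True,
                            text=True, timeout=timeout, check=True)
    return parse_result(result.stdout)


if __name__ == "__main__":
    url = sys.argv[1]  # hypothetical: any long article URL
    v4 = measure(url, "ipv4")
    v6 = measure(url, "ipv6")
    print("v4: %.2fs at %.0f B/s" % v4)
    print("v6: %.2fs at %.0f B/s" % v6)
    print("relative gap: %.0f%%" % (100 * relative_gap(v4[1], v6[1])))
```

A gap near zero matches the "the times are the same" result at 09:24:27 and points away from bandwidth; it says nothing about the reverse direction, which is why the test was also requested from the other side.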
[09:36:46] (select one from dropdown list on any gerrit page) [09:40:19] New review: ArielGlenn; "and that's "unused thumbs" purge helper, not " [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11202 [09:40:21] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/11202 [09:54:05] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [09:54:06] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [09:54:06] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [09:58:56] maerlant & professor [09:58:58] lovely names [09:59:02] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [10:00:23] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:03:27] New review: Nikerabbit; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/10756 [10:04:48] New review: Nikerabbit; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10707 [10:45:34] New patchset: ArielGlenn; "with --nooverwrite, don't overwrite tarballs at the dojob stage either" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/11276 [10:47:05] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11276 [10:47:07] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/11276 [10:57:03] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [11:21:11] Logged the message, Master [11:21:16] Logged the message, Master [11:21:21] Logged the message, Master [11:29:51] New review: SPQRobin; "@Hashar: The first one was a small change and was already marked OK. Recommitting them separately no..." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/10756 [12:00:13] apergos: about gerrit lacking list of users, projects … We have two bugs open for that : https://bugzilla.wikimedia.org/show_bug.cgi?id=35508 & https://bugzilla.wikimedia.org/show_bug.cgi?id=35510 [12:00:29] checking... [12:01:53] good it's not just me that feels the lack [12:07:43] New review: Hashar; "So I will do it." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/10756 [12:21:54] New patchset: Hashar; "(bug 33809) redirect unknow projects to a meta page" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11277 [12:21:55] New patchset: Hashar; "(bug 36033) unknown wikisource redirected to multinlingual one" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11278 [12:22:02] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11277 [12:22:04] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11278 [12:23:25] New review: Hashar; "I have split this change in two new Gerrit changes:" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/10756 [12:24:29] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11277 [12:24:32] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11277 [12:27:36] Change abandoned: SPQRobin; "Oh, okay, thanks anyway for splitting :)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10756 [12:28:01] New review: SPQRobin; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/11278 [12:32:41] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11278 
[12:33:54] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11278 [12:33:56] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11278 [12:34:00] going to deploy that [12:34:52] best thing about listening to house music [12:35:03] you can stroke your keyboard keys using the same rythm [12:35:11] aka 64 keystrokes per minutes :-] [12:35:17] or maybe just 128 [12:35:18] hmm [12:35:29] * hashar finds out a keystrokes counting software [12:40:22] New review: Hashar; "Applied to live site." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11277 [12:40:24] New review: Hashar; "Applied to live site." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11278 [12:42:18] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [12:43:27] hashar: http://en.wikipedia.org/wiki/Frets_on_Fire [12:43:43] ohh [12:43:47] I have the non-free / for pay version [12:43:57] oh no [12:43:58] gmm [12:44:01] that is guitar hero [12:44:19] hmm [12:44:21] that article is wrong [12:44:29] winner of the Assembly, but yet written in Python [12:47:03] bah http://download.wikimedia.org/mediawiki/1.19/ is redirected to http://dumps.wikimedia.org/mediawiki/1.19/ [12:47:03] ;) [12:52:58] can someone check why there is no http://download.wikimedia.org/mediawiki/1.19/mediawiki-i18n-1.19.1.patch.gz [12:53:13] it was linked on download page before I removed it [12:53:18] the link [12:53:24] file should exist [12:54:39] petan: isn't there a bug report for that? [12:54:48] Sam should be able to tell [13:00:18] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [13:03:21] hashar: I don't know [13:10:21] ottomata: what do you mean might be a problem with /etc/issue on stat1? [13:12:28] Ubuntu 10.04 [13:12:29] right? 
[13:12:32] New patchset: Ottomata; "files/nagios/check_udp2log_log_age - I renamed the Zero filter logs, need to do this here as well." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11281 [13:12:34] that's not precise, is it? [13:12:56] yesterday Leslie took down the machine to reinstall precise [13:12:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11281 [13:13:08] ottomata: precise is 12.xx something [13:13:11] right [13:13:18] oh [13:13:20] I just finished restoring after the supposed upgrade reinstall to precise [13:13:26] there is that website with lot of stuff on it let me find [13:13:28] the link [13:13:48] ottomata: http://tinyurl.com/79blsmo [13:13:58] those guys are great at listing stuff [13:14:14] aye [13:14:14] cool [13:14:19] uhm [13:14:39] does it say anywhere in there that they decided to keep /etc/issue saying 10.04 even if you upgrade? :p [13:14:53] lol [13:17:28] could someone approve this for me? [13:17:28] https://gerrit.wikimedia.org/r/11281 [13:17:48] mutante, maybe could do me a kindness? [13:17:50] :) [13:19:02] New review: Dzahn; "sure, we have done these filter changes for mobile before" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11281 [13:19:04] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11281 [13:25:43] ottomata: i was a bit worried about /a , how did you restore it since it failed in amanda? [13:25:53] i heard about a flash drive? [13:26:05] ottomata: what? leslie reinstalled it with lucid? [13:27:24] you gotta be kidding me [13:27:27] not precise? [13:28:40] yup :) [13:28:47] shall I reinstall it right now? 
[13:28:47] mutante, yeah we had a usb drive plugged in [13:28:50] cp -a [13:28:51] ummmm [13:28:54] yes, should be fine [13:28:59] everything is still on the usb drive [13:29:09] i'm sorry about that [13:30:49] erik and andre are currently logged in [13:30:51] let me email them [13:30:53] but yeah it should be fine [13:33:31] ok then i'll move manutius first... [13:37:52] ottomata: let me know when I can start [13:39:54] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active [13:40:47] mark, go ahead whenever you are ready [13:41:00] ok, will do so now [13:45:44] leslie gets to wear the clowns nose today ;) [13:46:53] PROBLEM - Host stat1 is DOWN: CRITICAL - Host Unreachable (208.80.152.146) [13:47:49] poor leslie... [13:51:05] mark: i have the mc servers here in tampa to rack [13:51:12] but they have the SFP type cards, where did you want htem? [13:51:42] RobH: I honestly don't know [13:51:49] i've lost track with all the purchases and rack assignments [13:51:58] they will need an EX4500 in the same rack, that's for sure [13:52:13] ottomata: would it help to keep the existing data on /a? [13:52:17] I don't need to format that [13:52:26] RECOVERY - Host stat1 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [13:53:38] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:53:48] ottomata: if permissions matter in /a and you just said "cp" you might want to copy them again (with rsync -avp or so) [13:54:20] i did cp -a [13:54:31] huh there's a /dev/sdb which is not the external drive [13:54:33] some raid array [13:54:35] rsync -a for /home [13:54:36] checking what that is before I reinstall [13:54:58] um mark, yeah! 
[13:55:05] that would be easy [13:55:28] dunno why we didn't just think of that before :p [13:55:43] couldn't do that before, the previous install wasn't LVM I think [13:55:49] (or was it) [13:55:53] PROBLEM - SSH on stat1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:56:36] i think it was when i looked at size of /a for backup [13:56:54] dunno [13:59:17] ok [13:59:21] mark: ok, i have empty sdtpa d-3 [13:59:24] so will use that [13:59:29] PROBLEM - Host stat1 is DOWN: CRITICAL - Host Unreachable (208.80.152.146) [13:59:38] according to the raid bios, stat1 has one virtual disk of 10 TB (12 drives raid6) [14:00:20] whoa really [14:00:21] 12 drives? [14:00:31] 12x 1TB [14:01:36] wish we had that on the analyitcs cluster machines :/ [14:02:10] don't you? [14:02:17] New patchset: ArielGlenn; "script to copy the dump deployment files around to the cluster hosts" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/11285 [14:02:18] 8 drives, right? [14:03:20] 8 drives [14:03:24] not 1TB either [14:03:32] yeah, 8i think ~300G? [14:03:35] though they are SAS [14:03:39] das true! [14:03:41] ah! [14:03:47] it makes the usb drive /dev/sda [14:03:49] that's silly [14:03:54] oh when it rebooted? 
[14:03:55] lemme disable that temporarily in bios [14:03:57] yeah [14:03:59] aye [14:04:10] ye, hence Ciscos start with sdc in partman [14:06:46] "User-accessible USB ports: All ports OFF" [14:06:50] I so don't trust chris & rob ;-) [14:09:21] mutante: weird that some do and some don't, though [14:09:53] indeed, the first 10 analytics bunch did not seem to be different though [14:10:11] or no wait, 1 one of them [14:10:18] the first afair [14:10:35] RECOVERY - Host stat1 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [14:10:40] and all from same donation date [14:10:41] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9109 [14:10:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9109 [14:10:50] going to be annoyed if that breaks gerrit ^^ [14:11:58] that said, I can't complain to devs about not reviewing and merging my code if I don't review/merge theirs :) [14:12:28] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10956 [14:12:28] I so can [14:12:33] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10956 [14:12:45] yeah, I'm full of it. I can too [14:12:53] I review for them way more than they review for me [14:17:07] ottomata: who runs less.ly? do you still need that external git? i heard you got a wmf repo to replace it if im not mistaken. in the pending change 11042 would be good to update that before. 
seems like a good time to talk about it now because it is just being blocked by "precise upgrade" which happens momentarily [14:17:47] PROBLEM - Host stat1 is DOWN: CRITICAL - Host Unreachable (208.80.152.146) [14:17:51] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11176 [14:18:11] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11176 [14:18:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11176 [14:18:19] ^^ who wants to bet that's going to break the hooks? [14:18:20] (this is all for "newer nodejs" package) [14:18:33] and installing "reportcard" [14:20:06] ha, well, sort of [14:20:07] i mean [14:20:13] mark really wants us to upgrade [14:20:22] we tried to do this 6 weeks ago because of that, but [14:20:28] and indeed the fucking hooks are broken [14:20:33] at some point gave up because analytics didn't really care if we upgraded [14:20:35] of course there are more reasons to upgrade [14:20:56] RECOVERY - SSH on stat1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [14:21:03] but now it'd be nice so we can have the newer nodejs packages available, without having to add more .debs to the wikimedia apt repo [14:21:05] RECOVERY - Host stat1 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms [14:21:11] PLUS mark wants us to upgrade [14:21:23] if it was just nodejs alone, we probably wouldn't do it [14:21:24] all fine, just wondering about the git repo [14:21:30] oh yeah, that is going to change [14:21:38] that commit is not ready yet, just wanted to put it up [14:21:46] kk [14:21:55] I minus oned it so no one would bother with it yet :) [14:22:04] ok [14:22:31] was going to offer to create a gerrit project for it if needed [14:22:38] but i heard you had one [14:22:45] thanks! I have create powers now! 
[14:22:51] alright:) [14:23:05] so, we are actually hosting the main bit of the reportcard repo on wikimedia's github, but I think there will be a mirror to gerrit [14:23:09] which is where we will clone from [14:23:16] i thiiiink, not sure about that [14:23:34] The framework that david wrote for reportcard is reallly awesome and very generic [14:23:37] so we wanted to open source it [14:24:15] sounds like a good thing for sure [14:26:53] running puppet for the first time [14:28:33] it can't install udp-filter [14:28:37] probably not in the precise apt repo yet [14:29:06] hrmm, i wanna ditch db21-db24 (old suns) and use the reclaimed space for 5 new dbs. [14:29:14] i guess we unpack and stack to chat with asher later [14:29:38] RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Thu Jun 14 14:29:09 UTC 2012 [14:29:41] also mysql-client [14:30:07] !log Reinstalled stat1 with Ubuntu Precise [14:30:08] New review: Ryan Lane; "test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11176 [14:30:12] ugh, this python was really fucked up [14:30:13] Logged the message, Master [14:30:22] ottomata: stat1 is rebooting, when it's back up you can have it [14:30:40] puppet has a few problems, let me know how we can help in getting them resolved [14:30:49] (e.g. by putting a udp-filter package for precise in our apt repo) [14:31:58] ok cool [14:32:11] hmm, that would be good [14:32:14] i'm sure we will need that [14:32:56] who built it last time? [14:33:05] New patchset: Ryan Lane; "Fix gerrit hooks. Follow up to Ieabdf1ae." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11286 [14:33:32] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11286 [14:33:41] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11286 [14:33:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11286 [14:35:00] New review: Ryan Lane; "test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11286 [14:35:08] well, that fixed that [14:36:56] 0 review requests! \o/ [14:43:11] RobH: do you remember specifics for logging into the remote management of the IBMs? [14:43:16] I'm trying to go onto the manutius console [14:43:22] mark@fenari:~$ ssh root@manutius.mgmt [14:43:22] Received disconnect from 10.1.6.51: 2: Too many authentication failures for root [14:45:20] maybe it's another one that uses admin? :) [14:45:29] or maybe it uses the old mgmt password? [14:45:41] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [14:48:55] admin asks for a password at least [14:49:00] but doesn't take the one I give it [14:49:06] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11285 [14:49:08] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/11285 [14:49:27] i'll just reboot the box and change the vlan while it boots [14:49:28] easier [14:50:57] the ibm stuff is on wikitech that i know of [14:51:08] that doesn't say anything about this [14:51:13] except "you must login over ssh" [14:51:16] (well I could guess that part) [14:54:18] gah [14:54:24] I log in on one of our foundry core switches [14:54:30] and can't find any of the tampa servers on there [14:54:38] took me a minute to realize I'm logged into csw1-esams [14:55:47] um, diederik [14:55:51] or maybe me [14:55:53] build udp-filter [14:57:25] RobH: so… bout those ciscos in tampa.... 
[14:59:42] PROBLEM - Host manutius is DOWN: PING CRITICAL - Packet loss = 100% [15:01:14] New patchset: Jeremyb; "Added php-htmlpurifier to go w/ I2ddf3e2ab00584de" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8348 [15:01:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8348 [15:01:56] so mark, curious as to why I can't install mysql-server-5.1 on precise [15:02:28] hmm, precise upgrades to 5.5? [15:02:36] or is 5.1 in the lucid wikimedia repo? [15:02:42] and we need it in precise as well? [15:02:48] yeah we have some special mysql client packages in lucid [15:02:59] does it need to be rebuild or can we just put it in precise? [15:02:59] because otherwise they conflict with our special mysql server build [15:03:05] but you should just use whatever precise offers now [15:03:08] oh ok [15:03:09] 5.5 [15:04:13] New patchset: Jeremyb; "Added php-htmlpurifier to go w/ I2ddf3e2ab00584de" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8348 [15:04:35] RobH: ciscos? tampa? :( [15:04:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8348 [15:04:40] :'( [15:07:36] hmm I thought manutius was precise already... it isn't [15:07:38] let me rectify that [15:07:54] New patchset: Ottomata; "mysql.pp - not specifying mysql version when install generic::mysql::packages classes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11290 [15:08:11] mark or mutante, could one of you approve that please? [15:08:18] no [15:08:21] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11290 [15:08:24] because I don't know if that's the right thing to do for other hosts [15:08:47] I know the current status on lucid systems is messy and I don't want to change it without testing thoroughly [15:08:53] so whatever you do, make sure it only applies to stat1 for now [15:09:08] on precise we have no special mysql packages yet [15:09:17] OR [15:09:22] wrap that class in a test for precise [15:09:26] that's perhaps nicest, for now [15:09:37] so behaviour under lucid remains the same [15:09:43] hmmmmmm ok [15:09:50] i just checked on oxygen, which is lucid [15:10:00] p mysql-server - MySQL database server (metapackage depending on the latest version) [15:10:00] p mysql-server-5.1 - MySQL database server binaries [15:10:12] so i assumed mysql-server would point to the correct one [15:10:22] i was going to parameterize the version [15:10:24] i suppose I could do that [15:10:29] I don't know for sure, and I don't currently have time to check and/or deal with breakage across the cluster [15:12:05] ok [15:14:31] New patchset: Ottomata; "mysql.pp - parameterizing mysql version. statistcs.pp - installing mysql 5.5 on stat1 Change-Id: Ic3ee9cac9f24f5687d28af9f80612d366bcd63db" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11290 [15:14:48] Ryan_Lane: Chris has all of the ciscos racked upstairs in C3-pmtpa. They are not wired yet, we are working to unpack and rack all the pending shit (that we have to receive in to pay for invoicing) today [15:14:59] the first systems to get actually wired are the ciscos for you though [15:15:02] * Ryan_Lane nods [15:15:03] thanks [15:15:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11290 [15:15:29] mark, how's that? 
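(note on the 15:08 to 15:14 thread) The constraint discussed there is: keep lucid hosts on what they have today, let precise hosts take what precise offers, and allow an explicit version to be passed in. The real change is Puppet (r/11290) and its contents are not shown in this log, so the Python below only illustrates that selection logic. The function and table names are made up for the sketch; the package names are the 5.1 one quoted from oxygen and the 5.5 one precise ships.

```python
# Default MySQL server package per Ubuntu release, as discussed in the log:
# lucid stays on 5.1, precise uses the stock 5.5.
MYSQL_SERVER_PACKAGE = {
    "lucid": "mysql-server-5.1",
    "precise": "mysql-server-5.5",
}


def mysql_server_package(codename, version=None):
    """Pick the package name for a host.

    An explicit version wins (the "parameterize the version" option);
    otherwise fall back to the per-release default so that behaviour on
    lucid does not change.
    """
    if version is not None:
        return "mysql-server-" + version
    try:
        return MYSQL_SERVER_PACKAGE[codename]
    except KeyError:
        # Refuse to guess for releases nobody has tested.
        raise ValueError("no default MySQL package known for %r" % codename)
```

Failing loudly on an unknown release, instead of falling back to the unversioned metapackage, follows the concern raised at 15:08:47 about not changing hosts without testing.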
[15:16:39] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11290 [15:16:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11290 [15:17:11] merged [15:17:34] there's something seriously weird with manutius (torrus) and I/O [15:17:37] dpkg is super slow on it [15:17:42] perhaps fsync()... [15:18:34] RobH: have you done anything with transcode1 after I installed it a year ago? [15:18:47] I need to move it to another subnet, but I'd rather just reinstall it when we're gonna use it [15:19:28] danke! [15:20:43] ah crap apparently I can't do class includes in an array like you can with all other resources [15:20:59] mark: messed with vlc on it, its still assigned to that [15:21:03] for the camera feeds [15:21:08] RobH: but can it be reinstalled? [15:21:12] yea. [15:21:13] is there data on it that needs to be preserved [15:21:15] no data [15:21:21] then i'll decommission it, and we'll reinstall it when needed [15:21:23] with precise [15:21:24] ok? [15:21:29] ok [15:22:23] New patchset: Ottomata; "mysql.pp - can't include classes in an array like you can other resource types (?)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11291 [15:22:51] New patchset: Mark Bergsma; "Decommission transcode1 for now, it needs a subnet move and reinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11292 [15:23:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11291 [15:23:17] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11292 [15:23:17] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11292 [15:23:25] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11292 [15:23:28] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11292 [15:23:28] someone - https://bugzilla.wikimedia.org/show_bug.cgi?id=37591 [15:23:40] this thing require shell [15:23:47] or db access at least [15:23:55] mark, this one too? [15:23:55] https://gerrit.wikimedia.org/r/11291 [15:24:08] weird [15:24:26] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11291 [15:24:29] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11291 [15:25:49] thanks [15:26:19] !log added adminbot 1.2 to repo [15:26:24] Logged the message, Master [15:26:30] which repo [15:26:45] wikimedia-lucid [15:26:49] then say that :) [15:26:54] !log specifically wikimedia-lucid repo [15:26:55] ;) [15:26:59] Logged the message, Master [15:27:03] lucid-wikimedia then ;p [15:27:24] !log Ryan needs coffee [15:27:28] which ryan [15:27:29] Logged the message, Master [15:27:44] !log to satisfy mark's pedantry that's lucid-wikimedia [15:27:49] Logged the message, Master [15:27:52] heh [15:28:13] !log Starting dist-upgrade of manutius to Precise [15:28:18] Logged the message, Master [15:30:53] @infobot-ingore+ log [15:30:53] petan: Unknown identifier (log) [15:30:59] @infobot-ignore+ log [15:30:59] petan: Unknown identifier (log) [15:30:59] Item log was inserted to ignore list [15:40:30] mark: so, I'm taking neon to install the servermon thing, right? [15:40:41] what's servermon? [15:40:50] !log silicon dist-upgrade and reboot [15:40:53] the django app over the puppet db [15:40:54] Logged the message, Master [15:41:16] like the puppet dashboard, sans the performance issues :) [15:41:31] oh [15:41:33] why neon? 
[15:41:36] paravoid, mark: seems some people also have a really long ttl for labsconsole [15:41:49] Ryan_Lane: did you see my latest mail? [15:42:03] mark: did you see my reply? [15:42:04] :) [15:42:23] mark: neon because that's what some people told me around here, but that's why I'm asking you again :) [15:42:26] suggestions welcome [15:42:34] indeed. the A record was set to 1H [15:42:35] neon is the nagios server [15:42:38] why would you install puppet stuff on that [15:43:00] sounds wrong to me too [15:43:10] install it on stafford [15:43:14] seems our resolver has labsconsole incorrect as well [15:43:21] stafford? [15:43:27] Ryan_Lane: my local resolver does too [15:43:30] paravoid: yeah, puppetmaster [15:43:38] I'm feeling a bit uneasy exposing a web tool from the puppetmaster [15:43:43] we're not exposing it [15:43:51] stafford is internal [15:44:04] eh? [15:44:06] so we'd socks proxy to it? [15:44:10] yes [15:44:13] works for me [15:44:18] oh [15:44:21] that's how we did dashboard too [15:44:24] although it was on sockpuppet [15:44:29] but I'd like for sockpuppet to go away [15:44:33] or at least get a reinstall sans puppetmaster [15:44:39] I wouldn't mind having this on it then though [15:44:56] ? [15:45:05] do you mean install it on sockpuppet now? [15:45:09] or after reinstall? [15:45:16] or both [15:45:25] you could do a test install now [15:45:33] okay [15:45:36] then some day we clean up the puppet CA situation, reinstall sockpuppet, reuse it for this [15:46:28] dashboard is still installed there, I presume I can go ahead and kill it? 
[15:46:41] yep [15:48:03] as for dns [15:48:16] perhaps the resolvers use the SOA TTL instead of the negative ttl somehow [15:48:28] although virt0 has never not existed [15:48:31] no, if you request it now, everything's good [15:48:31] so that would seem weird too [15:48:40] it's *weird* [15:48:42] yeah [15:48:52] !grosley dist-upgrade & reboot [15:48:57] garg [15:49:01] !log grosley dist-upgrade & reboot [15:49:06] Logged the message, Master [15:51:02] as for this ipv6 problem [15:51:08] I think there's some MTU problem again [15:51:13] New patchset: Ottomata; "/var/run has been moved to /run in Ubuntu Precise. Updating generic::mysql::server accordingly." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11296 [15:51:22] and it might be similar to the tcp receive segmentation offloading problem we had on ipv4 with lvs [15:51:25] ottomata: eh? [15:51:38] ottomata: /var/run hasn't moved to /run [15:51:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11296 [15:51:43] otto@stat1:/a/mysql$ ls -ld /var/run [15:51:43] lrwxrwxrwx 1 root root 4 Jun 14 14:29 /var/run -> /run [15:51:51] oh for crying out loud [15:51:56] /var/run is a symlink to /run [15:52:05] that's an Ubuntu thing [15:52:08] oh? [15:52:11] sorry, nevermind [15:52:19] it wasn't that way on lucid, pretty sure [15:52:19] either way, keep stuff in /var/run for now [15:52:23] that's not the case in other distros, /run wasn't meant for something else than /var/run [15:52:40] well, app armor is annoyed with me if I don't be specific about where the files are [15:52:42] specifically early access to runtime-data, before /var is mounted [15:52:57] ugh apparmor [15:53:07] indeed! [15:53:38] ah, looks like this change happened in Oneric? [15:53:46] Oneiric* [15:54:24] http://askubuntu.com/questions/57297/why-has-var-run-been-migrated-to-run [15:56:04] New patchset: Ottomata; "/var/run has been moved to /run in Ubuntu Precise. 
Updating generic::mysql::server accordingly." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11296 [15:56:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11296 [15:57:31] damnit [15:57:42] I'm seeing tcp packets > 1460 bytes on ipv6 [15:57:51] stupid NIC hardware [15:57:59] RECOVERY - Host manutius is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [15:58:03] paravoid or mark, could one of you review that for me? [15:58:56] mark: that's GSO/GRO [15:59:12] I know [15:59:32] the question is, why is it not enabled on the (esams?) lvs hosts you reinstalled... [16:01:02] because they never needed it before [16:01:08] as their interfaces weren't tagged [16:01:22] I don't understand [16:01:26] !log restarting nova-compute on virt5 [16:01:30] I don't even know what are you debugging [16:01:31] Logged the message, Master [16:01:45] ugh [16:02:04] --rabbit_host=virt0.wikimedia.org <— anyone have any idea why that might be a problem? :) [16:02:28] stupid resolver ttl problem [16:02:40] paravoid: check your email [16:04:06] !log Manually disabled GRO on amslvs3/4 eth0 [16:04:10] Logged the message, Master [16:04:43] and what that has to do with ipv6? [16:04:54] it's probably failing for ipv6 too [16:05:35] i've not confirmed this yet [16:05:43] I've used both .1q + ipv6 with gso in the past [16:05:46] just fine [16:05:56] that's cool [16:05:58] but here it's not fine [16:06:16] 2.6.32 had some bugs iirc [16:06:19] kinda depends on your NIC hardware doesn't it? [16:06:33] but those were with bridging and routing [16:06:34] I have an issue with Precise hosts refusing to install image scaling package. The cause is some if( distro == 'Lucid' ). Can some one please review / submit my change at https://gerrit.wikimedia.org/r/11298 ? :-] [16:06:43] let's first see if the issue is fixed [16:06:47] perhaps it's something else [16:07:46] which problem are you debugbing? 
[16:07:59] people are seeing timeouts on ipv6 [16:08:05] when posting large edits on the API [16:08:14] !log loudon dist-upgrade & reboot [16:08:17] sure sounds like MTU issues to me [16:08:19] Logged the message, Master [16:08:22] heh, yeah [16:08:45] although mtu issues with ipv6 are generally harder to encounter [16:08:54] since people can't just block icmpv6 blindly [16:09:04] yeah [16:09:14] and routers don't fragment packets [16:09:37] right [16:10:01] ha hm [16:10:05] I got the issue too :) [16:10:15] can't ping large packets to esams [16:10:32] and I don't have icmpv6 blocked, that's for sure [16:10:39] well [16:10:44] that may be something else [16:10:49] because the reports are entirely within esams [16:10:55] toolserver -> esams squids [16:10:58] well, esams nginx [16:11:11] what size packets are you trying? [16:12:20] New patchset: Hashar; "enable imagescaler packages on Precise hosts." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11299 [16:12:24] >= 1445 fails [16:12:31] ping6 -s 1445 that is [16:12:32] not packet size [16:12:33] on what route? [16:12:47] I have full 1500 without issues from xs4all [16:12:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11299 [16:12:53] also, what destination? [16:13:12] wikipedia-lb.esams [16:13:32] no issues here [16:13:54] nice, same with www.google.com [16:14:01] but not with other destinations e.g. greek [16:14:05] damn [16:14:07] nevermind then [16:14:22] huh [16:14:27] I can do even > 1500 from my colo server [16:14:33] is there some fragmentation going on still [16:14:42] or gro/gso [16:14:57] of course there's fragmentation [16:15:00] it just happens on your end [16:15:24] annoying [16:15:31] how else would it happen? [16:15:48] I don't see any other way [16:15:50] i'll turn it off... [16:15:57] it's not gro/gso [16:16:17] oh you mean you don't see the fragmented packets on tcpdump? 
[16:16:48] i can't even get root access on my colo anymore [16:16:55] shows how often I type that root pass these days [16:17:04] passwordless sudo ;) [16:17:20] don't have that [16:20:53] gaaah [16:21:00] the stupid -M do/don't tricked me again [16:21:08] I used "don't", but it's "don't prohibit" [16:21:11] that's bitten me a few times already ;) [16:21:50] for v6? [16:21:57] both ipv4 and v6 [16:22:16] -M do prohibits fragmentation, but I always remember "don't" since that's more intuitive [16:22:25] "don't fragment" instead of "don't prohibit" ;) [16:22:26] I always do -M do :) [16:22:33] !log hume dist-upgrade & reboot [16:22:35] but for ping6 it doesn't do much [16:22:38] Logged the message, Master [16:22:42] no DF flag et al [16:22:52] it makes the difference [16:22:58] hashar, I can't approve yours (no powers) [16:23:01] but if you got a sec [16:23:05] check out mine? [16:23:05] https://gerrit.wikimedia.org/r/#/c/11296 [16:23:08] bleh. my vm on rackspacecloud doesn't have ipv6? [16:23:10] oh wait, you don't have approve powers either, do you? [16:23:18] ottomata: will get Ryan to review it :-] I am poking him in -labs [16:23:24] danke [16:23:28] [root@tilia mark]# ping6 2620:0:862:ed1a::1 -s 1453 -M dont -c 1 [16:23:28] PING 2620:0:862:ed1a::1(2620:0:862:ed1a::1) 1453 data bytes [16:23:29] 1461 bytes from 2620:0:862:ed1a::1: icmp_seq=1 ttl=58 time=3.99 ms [16:23:38] [root@tilia mark]# ping6 2620:0:862:ed1a::1 -s 1453 -M do -c 1 [16:23:38] PING 2620:0:862:ed1a::1(2620:0:862:ed1a::1) 1453 data bytes [16:23:38] From 2001:888:2000:8::8 icmp_seq=1 Packet too big: mtu=1500 [16:23:47] yes, but that's all local [16:23:51] I understand [16:23:55] but that doesn't matter [16:24:13] it's not like v4 where you set a flag to see who's going to reject it in the path [16:24:25] indeed [16:24:39] ottomata: the ubuntu people are weird. Is /run/ a new linux file hierarchy version? 
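(Editor's note) The ping6 results pasted above line up with simple header arithmetic. The sketch below assumes the fixed 40-byte IPv6 header and the 8-byte ICMPv6 echo header, with no extension headers; the function names are invented for illustration and are not part of any tool mentioned in the log.

```python
IPV6_HEADER = 40    # fixed IPv6 header; assumes no extension headers
ICMPV6_HEADER = 8   # ICMPv6 echo request/reply header


def max_ping6_payload(mtu: int) -> int:
    """Largest `ping6 -s` value that still fits in one unfragmented packet."""
    return mtu - IPV6_HEADER - ICMPV6_HEADER


def needs_fragmentation(payload: int, mtu: int) -> bool:
    """True if a ping6 payload of this size cannot be sent in a single packet."""
    return payload > max_ping6_payload(mtu)


print(max_ping6_payload(1500))           # 1452
print(needs_fragmentation(1453, 1500))   # True
print(needs_fragmentation(1452, 1500))   # False
```

With a 1500-byte MTU the largest single-packet payload is 1452 bytes, so `-s 1453` is one byte too large: with `-M do` the sender refuses to fragment and reports "Packet too big: mtu=1500", while with `-M dont` the local host fragments and the reply (1453 + 8 = 1461 bytes) comes back.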
[16:24:50] PROBLEM - Host hume is DOWN: CRITICAL - Host Unreachable (208.80.152.190) [16:25:21] its not just ubuntu: [16:25:31] http://lwn.net/Articles/436012/ [16:25:42] yes it is, Fedora started it but it's everywhere nowadays [16:32:29] PROBLEM - Host manutius is DOWN: PING CRITICAL - Packet loss = 100% [16:33:23] so that is the linux file hierarchy evolving :]] [16:33:43] ottomata: well sorry, I need to leave my coworking place :( [16:34:35] yeah [16:34:39] awww ok [16:34:50] mark maybe will review it for me when he gets a sec [16:34:53] sounds like they are busy atm though [16:34:54] RECOVERY - Host manutius is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms [16:34:56] thanks anyway [16:37:10] why do we have a special wmf ganglia-monitor package in precise? [16:38:14] apparently the one in precise is much older [16:39:54] bah. all of the nova services except for virt0 show as down [16:40:24] I'm betting rabbit's queue is somehow tied to the old IP [16:40:27] !log running pagetriage_page schemea changes on enwiki and testwiki via osc (https://gerrit.wikimedia.org/r/#/c/11014/1/sql/PageTriagePagePatch.sql) [16:40:27] I hate rabbit [16:40:31] Logged the message, Master [16:41:26] notpeter: https://gerrit.wikimedia.org/r/11301 [16:41:54] Ryan_Lane: labsconsole not responding wouldn't be the NS record kerflufle, would it? [16:42:05] is this from the office? [16:42:09] yes. [16:42:16] they need to flush the resolver cache for virt0.wikimedia.org [16:42:25] but its labsconsole.wikimedia.org, not .wmflabs.org [16:42:30] yes [16:42:36] that's the IP that actually changed [16:42:41] ok. [16:42:48] * maplebed tunnels instead. [16:42:49] I definitely don't understand how the wmflabs.org addresses got fucked up [16:42:54] just ask OIT [16:43:03] they did it in like 5 minutes last time [16:43:19] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/11301 [16:43:43] eh? did it got the old IP again?! 
[16:43:48] no [16:44:01] apparently they flushed wmflabs.org [16:44:11] so that people could access bastion [16:44:29] though, that would still be weird [16:45:35] no, they restarted the resolver [16:45:40] flushed the entire cache [16:45:47] wow [16:45:49] that's fucked up [16:45:53] what the hell is going on? [16:46:31] New patchset: Mark Bergsma; "Revert "added in row c eqiad" - it's broken" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11302 [16:46:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11302 [16:47:03] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11302 [16:47:06] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11302 [16:47:17] paravoid: I had to delete rabbit's queues [16:47:23] and restart it [16:47:27] why? [16:47:33] queues are non-persistent in nova, thankfully [16:47:49] because it seems the queue is tied to the hostname and the ip in rabbit [16:48:23] it's not like anyone ever changes those things [16:48:27] * Ryan_Lane stabs [16:49:15] preilly: in 10 minutes, yes? 
[16:50:45] notpeter: yes it needs to be live @ 10 [16:51:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11301 [16:51:52] hey mark, about udp-filter [16:52:01] it is built fine for precise [16:52:09] we just need it available to precise installs in apt [16:52:14] not sure what that means [16:52:18] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11301 [16:52:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11301 [16:52:44] New patchset: Jeremyb; "cleanup/refactor gerrit logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8120 [16:52:52] it is here just fine: [16:52:53] http://apt.wikimedia.org/wikimedia/pool/main/u/udp-filter/ [16:53:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/8120 [16:54:13] New review: Jeremyb; "I don't understand the logic in Iab6b67c921f08cc." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/8120 [16:55:56] ok [16:56:39] New patchset: Mark Bergsma; "Correct node-id" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11303 [16:57:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11303 [16:57:14] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11303 [16:57:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11303 [16:57:31] and mark just beat me to it [16:58:01] Ryan_Lane: see 8120. i don't understand the logic you used. seems to be no comments period not just no comments in mediawiki.log [16:59:28] <^demon> Yeah, he's returning anytime the class is CommentAdded. 
It should be "If we're CommentAdded, check the skipped logs" [16:59:35] <^demon> jeremyb: ^ [16:59:58] ^demon: well see what i did ;) [17:00:14] ottomata: do you mean that the build for lucid works just fine in precise? [17:00:37] yes [17:00:42] that's not very clean [17:00:46] but probably true [17:01:07] it's built for an older libc, but it probably works [17:01:17] what do I need to do to make it installable via apt? [17:01:36] i'll just put the lucid version in the precise repository now, as an exception [17:01:46] k danke [17:02:04] and please use a more descriptive name than udp-filter next time ;) [17:02:09] (whoever did that) [17:02:27] for the package? [17:02:43] yes [17:02:48] ok, precise repo should have it now [17:02:54] ooooook, drdee read up! [17:03:13] thank you! [17:03:17] also...https://gerrit.wikimedia.org/r/#/c/11296 :) [17:03:35] !log Copied udp-filter package from lucid-wikimedia to precise-wikimedia (but do as I say and rebuild, not as I do...) [17:03:36] <^demon> jeremyb: Looks good, but I'm a python noob so some of that's beyond me :) [17:03:40] Logged the message, Master [17:03:40] notpeter: just had a last minute addition https://gerrit.wikimedia.org/r/11304 [17:04:12] ^demon: well specifically the part that was a new merge conflict since ryan's change [17:04:19] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/11304 [17:04:25] preilly: ok, will push now [17:04:34] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11304 [17:04:42] <^demon> jeremyb: I think you've got the logic right. [17:04:43] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11304 [17:04:45] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11304 [17:04:46] ^demon: also, i thought you were the original author? 
[17:04:49] ^demon: great [17:04:57] <^demon> Of what? The hooks? [17:05:04] that file [17:05:06] preilly: pushed [17:05:18] maybe i'm imagining things ;P [17:05:23] <^demon> jeremyb: Nope, I've just refactored huge parts of it. Ryan wrote the original hooks. [17:08:01] New review: Mark Bergsma; "I think we should make $run_directory a global variable, set in realm.pp, depending on the distribut..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11296 [17:13:06] wanna see the preliminary version of the tool? [17:13:08] mark, Ryan_Lane? [17:13:10] haven't finished yet [17:14:42] i'm busy [17:14:51] Jeff_Green: payments is no longer using squid, correct? [17:15:02] oh noes, it reinstalled lucid [17:15:08] doh, i know what happened [17:15:15] i didn't run puppet on brewster before kicking off the reinstall [17:15:16] :( [17:15:21] sorry mark/ottomata [17:15:24] mark: sadly it does use squid [17:15:24] :) [17:15:26] it's ok [17:15:32] nginx-->squid-->apache [17:15:34] someone needs to wear a clowns nose in the office every once in a while [17:15:35] :) [17:15:38] hehehe [17:15:53] mark: is that why you work from home? [17:15:56] yes [17:16:14] how did we get on a clown nose thread? [17:16:38] New patchset: Mark Bergsma; "Temporarily disable network link aggregates until I have time to fix it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11305 [17:17:07] New patchset: Mark Bergsma; "Remove payments squids from Torrus" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11306 [17:17:33] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11305 [17:17:33] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11305 [17:17:34] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11305 [17:17:34] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11305 [17:17:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11306 [17:17:47] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11306 [17:17:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11306 [17:20:36] oh [17:20:41] torrus had a new major release which is in precise [17:20:45] no wonder I'm having so much breakage [17:20:50] time to read the changelog ;) [17:22:06] ah :) [17:22:45] perhaps it actually works now :) [17:25:01] New review: Jeremyb; "see also the way I did it:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/11296 [17:25:44] mark: when you have a chance (i guess not now) ^^ [17:26:30] ^demon: ping [17:26:49] New patchset: Mark Bergsma; "Parameter references defined within the same level don't seem to be supported in Torrus 2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11307 [17:27:01] ^demon: see the way that comment was mangled in 11296 [17:27:17] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11307 [17:27:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11307 [17:27:23] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11307 [17:27:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11307 [17:27:47] <^demon> jeremyb: Ouch. 
Let's slap some \b on the commentlink regex. [17:28:39] doing... [17:29:41] why is the match for [commentlink "changeid"] unquoted? [17:30:23] <^demon> Lemme look, I don't have it open in front of me. [17:30:48] it had \\b already on the commit one [17:31:04] i guess ; and / are matches for \b [17:31:08] anyway, have to run [17:31:41] (or ; and =) [17:32:17] hey opsies, who is using stat1001? [17:32:31] statsies? [17:32:41] * jeremyb runssssss away [17:32:44] <^demon> jeremyb: No clue on why no quotes. But it ain't broke so let's not fix it :p [17:33:02] <^demon> Maybe we could just put \\s+ for the link? [17:33:10] <^demon> Rather than \\b [17:33:20] New patchset: Mark Bergsma; "Remove more uses of parameters defined at the same level" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11308 [17:33:24] drdee: you are supposed to use it [17:33:40] i would if i could :D [17:33:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11308 [17:33:52] neither me or ottomata can ssh into that box [17:33:58] domain does not resolve [17:34:02] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11308 [17:34:04] ok [17:34:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11308 [17:34:07] can you put in an rt request? [17:34:11] hiiiiiiiiiii doo deed oooooo [17:34:11] sure [17:34:12] https://gerrit.wikimedia.org/r/#/c/11296  [17:34:15] i love youuuuu [17:34:29] ooop [17:34:31] someone reviewed it! 
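(Editor's note) The `\b` behaviour guessed at above can be reproduced in isolation. This is only a sketch of the regex semantics in Python: Gerrit's commentlink patterns are Java regexes in its own config, the real pattern is not shown in the log, and the hash and URL below are made up. The whitespace lookarounds stand in for the "link only on spaces" idea; they are not the actual patch.

```python
import re

HASH = r"[0-9a-f]{40}"

# A word boundary exists between any non-word character (';', '=', '/')
# and a hex digit, so this also matches hashes embedded in URLs.
word_boundary = re.compile(rf"\b{HASH}\b")

# Only match when the hash is delimited by whitespace or the string edges.
space_delimited = re.compile(rf"(?<!\S){HASH}(?!\S)")

sha = "0123456789abcdef0123456789abcdef01234567"   # made-up hash
in_url = f"https://example.org/gitweb?a=commit;h={sha}"
in_prose = f"fixed in {sha} yesterday"

print(bool(word_boundary.search(in_url)))      # True  (hash inside the URL gets linked)
print(bool(space_delimited.search(in_url)))    # False (URL left alone)
print(bool(word_boundary.search(in_prose)))    # True
print(bool(space_delimited.search(in_prose)))  # True
```

The trade-off is that a whitespace-delimited pattern will also skip hashes followed directly by punctuation, such as a trailing full stop.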
[17:34:37] man I always miss review notices [17:34:57] but perhaps put it in base.pp instead [17:35:03] it's not actually realm dependent [17:35:08] (ottomata) [17:35:12] stat1001 doesn't have any dns, fyi drdee/ottomata [17:35:13] it only depends on the distribution version [17:35:26] yeah, cool [17:35:31] i like both of those suggestions [17:35:33] will do [17:35:48] what is stat1001 [17:35:50] i think it is not real [17:35:53] i think it is mythical [17:36:15] it's racked in rack B4 in eqiad ;) [17:36:32] torrus 2, I hate you already [17:36:39] you are not backwards compatible [17:36:55] if I had realized torrus was doing a major version bump in precise I would not have dist-upgraded this box in a whim like this ;) [17:37:41] mark, LeslieCarr: created RT ticket https://rt.wikimedia.org/Ticket/Display.html?id=3113 [17:38:14] New patchset: Jdlrobson; "Black'B'erry not Black'b'erry (bug 37599)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11309 [17:38:26] drdee: ty, we'll get it installed soon [17:38:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11309 [17:39:26] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/11309 [17:39:49] do you want it precise or lucid ? 
[17:40:19] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11309 [17:40:21] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11309 [17:40:31] precise [17:40:33] like stat1 [17:40:47] it's supposed to do the exact same thing, in eqiad [17:50:21] okay, I have to add an index to the puppet db [17:59:36] New patchset: Pyoungmeister; "removing db43 from db.php for kern upgrade" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11312 [17:59:43] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11312 [18:00:14] !log aluminium/db1008 dist-upgrade & reboot [18:00:19] Logged the message, Master [18:00:21] New patchset: Mark Bergsma; "Make Torrus 2 work" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11313 [18:00:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11313 [18:00:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11313 [18:03:43] PROBLEM - Host db1008 is DOWN: PING CRITICAL - Packet loss = 100% [18:04:55] RECOVERY - Host db1008 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms [18:04:57] Ryan_Lane: weren't you looking at the labsconsole issue? [18:08:49] for some reason the office dns went dumb again [18:08:54] i just switched to 8.8.8.8 [18:09:08] New patchset: Demon; "Link commit hashes only on spaces, rather than word boundaries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11314 [18:09:12] Google's famous DNS [18:09:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11314 [18:10:04] yeah [18:10:12] LeslieCarr, I can't reach labsconsole from a different network either (using a German ISP resolver) [18:10:14] well 4.2.2.2 wasn't as happy from the office [18:10:26] oh now that is interesting , what're the results from dig Eloquence ? [18:10:43] ;; ANSWER SECTION: [18:10:43] labsconsole.wikimedia.org. 11725 IN A 208.80.153.135 [18:11:01] that's far too long of a timeout considering the max is supposed to be 3600.... [18:11:10] hrm... [18:11:17] !log erzurumi dist-upgrade & reboot [up 633 days] [18:11:18] (I can reach labsconsole fine from AT&T) [18:11:23] Logged the message, Master [18:11:44] yay jeff [18:11:56] kill those hardies ;) [18:12:06] are you going to precise now? [18:14:39] not now, no [18:14:53] we're talking about replacing it with a host in the payments rack once that's available [18:16:13] bah [18:16:31] hahaha [18:16:46] why do it twice [18:16:58] I assume the hardware is older than death too [18:17:12] not so much [18:17:25] wtf [18:17:29] labsconsole IN A?! [18:17:39] labsconsole is a CNAME to virt0 [18:18:14] heh, good catch paravoid i was too concerned about the timeout [18:18:17] that is weird... [18:18:33] are there any dns caching technologies that save everything as in a ? [18:18:44] that makes no sense leslie [18:18:53] paravoid: yeah, it's a cname... what about it? [18:19:02] mark: Eloquence's paste above said IN A [18:19:08] and a TTL of 11725 seconds [18:19:14] I saw that [18:19:28] but I don't understand your reaction [18:19:31] the office dns has it as an a record as well [18:19:45] we have CNAME in our auth servers [18:19:51] that response seems fine, except for the weird TTL [18:19:54] and yet a recursor says IN A [18:20:09] oh you mean it should add the CNAME step [18:20:37] is should say "labsconsole $ttl IN CNAME virt0", "virt0 $ttl IN A ..." 
[18:20:43] yes [18:20:59] ssh: connect to host hume.wikimedia.org port 22: Network is unreachable :( [18:21:12] AaronSchulz: looking [18:21:14] and yet it doesn't [18:21:44] http://pastebin.com/WyheLJb0 [18:21:55] office (broken dns) versus working (google) dns [18:22:00] haha [18:22:05] ;; ANSWER SECTION: [18:22:06] wmflabs.org. 86400 IN NS virt0.wikimedia.org. [18:22:06] wmflabs.org. 86400 IN NS labsconsole.wikimedia.org. [18:22:18] so what? [18:22:20] so wmflabs.org has NS records to a cname [18:22:22] going back to virt0 [18:22:31] yeah, I know... [18:22:34] I can see why that would confuse a recursor here and there [18:22:35] remove that [18:22:41] there's absolutely no point in having labsconsole there [18:22:49] LeslieCarr: can you check SOA? [18:22:54] with OIT's? [18:23:14] soa ? [18:23:34] dig @192.168.39.5 labsconsole.wikimedia.org SOA [18:23:45] no need to pastebin it, just paste the answer section [18:23:55] i'm going to remove labsconsole as NS now [18:24:04] labsconsole.wikimedia.org. 3600 IN CNAME virt0.wikimedia.org. [18:24:30] mark: it's wrong but it shouldn't create /that/ effect [18:24:38] no reason to, as far as I can see [18:24:46] interesting - so the authority section is different though [18:24:46] LeslieCarr: and after that? 
[18:25:03] http://pastebin.com/DSRBhCej [18:25:04] right, the authority section I was interested in [18:25:33] okay, this is getting spooky [18:25:38] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11312 [18:25:40] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11312 [18:26:19] ah no, that's Google having a max TTL of 1800 [18:26:29] we do have 86400 for SOA indeed [18:26:40] AaronSchulz: hume is back, it was stuck on fsck [18:27:16] RECOVERY - Host hume is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [18:27:39] my screen were nuked :( [18:27:46] ah well [18:28:46] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 204 seconds [18:28:55] PROBLEM - MySQL Replication Heartbeat on db1040 is CRITICAL: CRIT replication delay 214 seconds [18:29:13] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 232 seconds [18:29:13] according to the RFCs, CNAMEs are not allowed as NS targets [18:29:13] PROBLEM - MySQL Replication Heartbeat on db46 is CRITICAL: CRIT replication delay 232 seconds [18:29:15] AaronSchulz: sorry about that [18:29:26] so this is out of spec and can easily lead to weird stuff like this [18:29:40] PROBLEM - MySQL Replication Heartbeat on db1006 is CRITICAL: CRIT replication delay 258 seconds [18:29:40] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CRIT replication delay 259 seconds [18:29:44] 2012-06-14 18:28:38 no IP address found for host spence (during SMTP connection from spence.wikimedia.org (spence) [208.80.152.161]:42961 I=[208.80.154.6]:25) [18:29:47] fun stuff, i'll remove the labsconsole and see if (in a day) that fixes it ? 
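(Editor's note) The out-of-spec delegation described above (an NS record pointing at a name that is itself a CNAME, which RFC 2181 section 10.3 disallows) is easy to check for mechanically. The sketch below works on a hand-written record list that mirrors what was quoted in the channel; it does not query DNS, the address is a documentation placeholder, and the function name is hypothetical.

```python
# Toy zone data mirroring the records quoted in the log; not pulled from real DNS.
RECORDS = [
    ("wmflabs.org.", "NS", "virt0.wikimedia.org."),
    ("wmflabs.org.", "NS", "labsconsole.wikimedia.org."),
    ("labsconsole.wikimedia.org.", "CNAME", "virt0.wikimedia.org."),
    ("virt0.wikimedia.org.", "A", "192.0.2.10"),  # placeholder address
]


def ns_targets_that_are_aliases(records):
    """Return (zone, target) pairs where an NS target is itself a CNAME."""
    aliases = {name for name, rtype, _ in records if rtype == "CNAME"}
    return sorted(
        (zone, target)
        for zone, rtype, target in records
        if rtype == "NS" and target in aliases
    )


print(ns_targets_that_are_aliases(RECORDS))
# [('wmflabs.org.', 'labsconsole.wikimedia.org.')]
```

Removing the flagged NS record, as proposed in the channel, leaves only the delegation to virt0, which has a plain address record.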
[18:29:53] that's from aluminium's exim log [18:29:58] i'm doing that already leslie [18:30:09] cool [18:30:56] New patchset: Pyoungmeister; "changing s6 master to db47 in mysql.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11323 [18:31:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11323 [18:31:30] Jeff_Green: what about it? [18:31:33] (that's normal) [18:31:38] it's normal? [18:31:42] yes [18:31:48] 'spence' does not exist as a hostname [18:31:59] some client used that as ETRN argument [18:32:04] instead of a fully qualified hostname [18:32:27] !log new master log and pos for s6 MASTER_LOG_FILE='db47-bin.000230', MASTER_LOG_POS=876357616 [18:32:32] Logged the message, notpeter [18:33:57] is there a line I can add to my /etc/hosts file to reach labsconsole? [18:34:05] 208.80.152.32 [18:34:08] mark: isn't that spence that used that hostname? [18:34:16] Jeff_Green: probably [18:34:21] some broken nagios plugin I think [18:34:35] anyways, don't worry about it [18:34:44] mark: okay, another weird thing then [18:34:51] thanks mark [18:34:56] mark: yesterday LeslieCarr + andrew [18:35:07] restarted OIT pdns recursor [18:35:13] flushing its cache in the process [18:35:35] Leslie pasted an output with the old labsconsole IP [18:35:57] the rec_control swipe-cache didn't work …. then the recursor restart fixed the issue… except obviously somehow the bad ip has worked its way back in [18:35:58] which was completely gone from DNS long before the flush [18:36:40] I don't know anything about the OIT setup [18:36:47] does it use other resolvers or root servers directly? 
[18:36:48] !log pushing new dns zone file (minor change) [18:36:52] Logged the message, notpeter [18:40:46] RECOVERY - MySQL Replication Heartbeat on db50 is OK: OK replication delay 0 seconds [18:40:46] RECOVERY - MySQL Replication Heartbeat on db46 is OK: OK replication delay 0 seconds [18:41:13] RECOVERY - MySQL Replication Heartbeat on db1006 is OK: OK replication delay 0 seconds [18:41:13] RECOVERY - MySQL Replication Heartbeat on db1022 is OK: OK replication delay 0 seconds [18:41:49] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [18:41:58] RECOVERY - MySQL Replication Heartbeat on db1040 is OK: OK replication delay 0 seconds [18:43:58] !log db22 relocating [18:44:02] Logged the message, RobH [18:45:06] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11323 [18:45:09] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11323 [18:45:52] PROBLEM - Host db22 is DOWN: PING CRITICAL - Packet loss = 100% [18:46:13] !log db22 relocated, powering up [18:46:18] Logged the message, RobH [18:46:20] binasher: db22 is powering back up now [18:47:22] mark: looks like it's every delivery from spence actually, one every few minutes [18:47:24] that was fast! [18:47:32] i move fast. [18:47:49] damn, i totlaly should have replied with 'thats what she said' [18:47:54] missed opportunity. [18:48:51] haha [18:50:40] RECOVERY - Host db22 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [18:50:55] binasher: So I wanted to confirm before i shut them down and wipe, db21 and db23 are not slated for use by you? [18:51:05] They are not in any actual db rotation that i can see [18:51:22] (i can shut them down and unrack, and not wipe for a week or so just in case) [18:52:09] they can be wiped [18:52:18] cool, thanks! 
[18:52:26] !log unracking and decommissioning db21 and db23 [18:52:31] Logged the message, RobH [18:54:25] PROBLEM - mysqld processes on db22 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [19:02:58] preilly: time to revert those changes? [19:03:49] notpeter: not yet [19:04:20] RECOVERY - mysqld processes on db22 is OK: PROCS OK: 1 process with command name mysqld [19:04:21] ok [19:06:26] PROBLEM - MySQL Replication Heartbeat on db22 is CRITICAL: CRIT replication delay 573 seconds [19:06:35] PROBLEM - MySQL Slave Delay on db22 is CRITICAL: CRIT replication delay 507 seconds [19:07:04] helllloooooooo to the ether that is the ops channel [19:07:06] hellloooooo [19:07:15] who knows about nginx ssl proxying? [19:07:25] does the proxy go to the apaches? or to squid/varnish? [19:09:13] ottomata: squid/varnish [19:09:17] RECOVERY - MySQL Replication Heartbeat on db22 is OK: OK replication delay 0 seconds [19:09:26] RECOVERY - MySQL Slave Delay on db22 is OK: OK replication delay 0 seconds [19:09:31] they're literally only "just" ssl termination boxes [19:09:57] so request -> nginx -> squid -> apache and then back up? [19:10:00] so [19:10:01] if [19:10:04] !log rebooting db43 for kernel upgrading [19:10:08] a request is udp2logged from nginx [19:10:09] Logged the message, notpeter [19:10:15] it will also be logged from squid, correct? [19:10:40] ottomata is worried about double counting [19:11:54] i know i've asked this before [19:11:58] but i just wanted to be sure [19:12:02] because someone took my memory from me [19:12:19] the filters look at the source ip and if it's from a wikimedia network range, they don't count them [19:13:02] PROBLEM - Host db43 is DOWN: PING CRITICAL - Packet loss = 100% [19:13:12] udp2log does that? [19:13:14] or the custom filters? 
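(editor's note) The double-counting concern above comes down to a source-IP check: a request that passes through an internal proxy is logged once with the client's address and once with the proxy's, and a filter that drops internal sources counts it only once. A minimal Python sketch of that idea follows. The network ranges are RFC 5737 documentation blocks used as placeholders, not the real network list, and the function names are made up for illustration; nothing here is taken from webstatscollector or udp-filter.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks). These stand in for
# "our own network ranges" and are NOT the real list.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_internal(source_ip):
    """Return True if source_ip falls inside any placeholder internal range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def count_external(source_ips):
    """Count only log entries whose source IP is outside the internal ranges."""
    return sum(1 for ip in source_ips if not is_internal(ip))


# One client request seen twice: once from the client itself (external),
# once relayed by an internal proxy. Only the external entry is counted.
entries = ["203.0.113.5", "192.0.2.10"]
external_hits = count_external(entries)
```

Whether such a check lives in the log relay or in each downstream filter is a deployment decision; the sketch only shows the check itself.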
[19:14:59] RECOVERY - Host db43 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [19:20:26] ottomata, mark: that only applies to webstatscollector [19:20:38] that we use for the pageview counts [19:20:51] it is not used in udp-filter, and actually we should add that [19:34:53] yup! [19:34:54] thank you! [19:46:02] PROBLEM - Host db23 is DOWN: PING CRITICAL - Packet loss = 100% [19:46:11] PROBLEM - Host db21 is DOWN: PING CRITICAL - Packet loss = 100% [19:48:55] New patchset: Pyoungmeister; "adding db43 back to s6 pool" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11401 [19:49:01] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11401 [19:50:12] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11401 [19:50:14] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11401 [19:50:51] New review: preilly; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/11408 [19:51:01] notpeter: can you merge https://gerrit.wikimedia.org/r/#/c/11408/1,publish [19:52:04] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11408 [19:52:12] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11408 [19:52:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11408 [19:52:45] pushing out now [19:52:51] notpeter: okay cool — thanks! [19:53:19] preilly: puppet runs all done. [19:53:21] no prob! [19:54:26] New patchset: Asher; "setting innodb_file_per_table for es hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11441 [19:54:58] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11441 [19:54:58] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11441 [19:55:00] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11441 [19:55:08] damnit, there go all of our pmtpa mysql ganglia aggregators... [19:55:20] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [19:55:20] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [19:55:20] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [19:55:38] New patchset: Demon; "Adding tracking ability to gerrit changes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11451 [19:56:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11451 [19:56:46] notpeter: are you fixing, or should i? [19:57:47] binasher: changing to 50 and 51 [19:57:54] ok [19:58:06] glad you noticed so quickly [19:58:37] didn't think to see if 21 was doing anything other than running mysql [19:58:48] New patchset: Pyoungmeister; "changing pmtpa mysql ganglia aggregators to db50 and db51" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11474 [19:59:18] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11474 [19:59:41] RECOVERY - Puppet freshness on es1004 is OK: puppet ran at Thu Jun 14 19:59:18 UTC 2012 [20:00:55] New patchset: Pyoungmeister; "changing pmtpa mysql ganglia aggregators to db50 and db51" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11474 [20:01:26] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11474 [20:01:26] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11474 [20:01:42] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11474 [20:03:55] New patchset: Asher; "set default innodb_file_per_table value at the top of the conf class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11510 [20:04:23] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11510 [20:04:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11510 [20:12:51] New patchset: Asher; "moving some es conf variants from base mysql to mysql::conf class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11555 [20:13:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11555 [20:13:29] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11555 [20:13:32] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11555 [20:20:55] New patchset: Asher; "trying this one more time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11560 [20:21:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11560 [20:21:41] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11560 [20:21:41] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11560 [20:57:07] New patchset: Ottomata; "Creating new user Jonathan Morgan, including him on stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11564 [20:57:36] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11564 [20:58:19] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [21:12:17] ^demon: i haven't caught up yet, did anyone do the comment regex? [21:12:40] ^demon: \\s+ isn't so good because it won't match at the beginning or end [21:12:57] <^demon> Grrrr. What about \\s? [21:13:02] same thing [21:13:29] we could do lookahead/lookbehind [21:13:38] <^demon> That'll work :) [21:14:16] but that's trickier and who knows if it's actually supported [21:14:26] you say it's the same as JS but i'm not convinced ;) [21:14:32] ^demon: is there a test instance on labs? [21:14:51] <^demon> Yup, called gerrit. Want in? [21:15:01] sure [21:16:20] !log updating dns for new mgmt ips and move of scs [21:16:24] Logged the message, RobH [21:17:27] <^demon> jeremyb: Done [21:22:29] who can I go to for etherpad being partly broken? [21:23:24] http://etherpad.wikimedia.org/ep/pad/view/* is not working, although I can see the live pad [21:23:50] can we kill etherpad yet? ;) [21:24:42] but, but, it's useful LeslieCarr - we don't get any edit conflicts there! [21:24:55] hehe [21:25:19] um, i'd open up a bugzilla ticket [21:25:42] or finding people to set up the etherpad-lite in labs and get it working enough to move to prod ;) [21:26:19] so bugzilla ticket it is :P [21:26:27] LeslieCarr: i thought lite was done in labs? certainly someone worked on the packaging at least [21:26:49] mutante would know as I believe he was working with someone on it [21:28:16] probably johnduhart [21:28:22] (he did the packaging) [21:28:33] hmm I think it's just because the etherpad I'm looking at is huge LeslieCarr [21:28:37] by that I mean 900 lines [21:28:43] damn dude [21:28:46] etherpad novel ? 
[21:28:53] in fact, it's truncated at 850 [21:28:54] http://etherpad.wikimedia.org/Notes-20and-20discussion-20for-20Advisory-20Group-20Meeting-20 [21:28:59] FDC advisory group :P [21:29:47] hah, reading the chat, "Asaf: It's unbelievable that we're still stuck with this buggy version of Etherpad, after all this time. *sigh*" [21:30:50] blasphemy, labsconsole just accused me of having cookies disabled. [21:31:04] (then i tried again and it worked) [21:43:23] see also: https://rt.wikimedia.org/Ticket/Display.html?id=1720 <- "install etherpadlite" [21:45:33] RECOVERY - Puppet freshness on es1003 is OK: puppet ran at Thu Jun 14 21:45:16 UTC 2012 [22:38:05] New patchset: Catrope; "Give Ori Livneh restricted shell access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11572 [22:38:20] heya [22:38:32] does anyone here know much about puppet monitor_service [22:38:34] nagios stuff? [22:38:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11572 [22:38:39] binasher maybe? [22:38:50] nope [22:38:53] hm [22:39:45] hmm, looks like maybe Ryan_Lane does? [22:40:04] it adds things into nagios's config [22:40:19] ok, I want to change a $name of one of them [22:40:21] it's like 1am here, don't expect me to be too helpful right now ;) [22:40:25] aahhhhh right [22:40:27] well [22:40:28] @@nagios_service { "$hostname $title": [22:40:31] if $title there changes [22:40:39] originally it would have been 'packetloss' [22:40:47] i want to change it to 'udp2log-emery-packetloss' [22:40:50] what happens? [22:40:56] does nagios then get all confused and start a new metric? [22:41:04] i believe nagios starts a new metric [22:41:10] is that bad? [22:41:15] not necessarily [22:41:20] not really at all [22:41:48] ok good, currently the puppet configs have only the ability to monitor a single udp2log instance packet loss [22:41:55] but some machines (emery) run multiple instances [22:42:09] thank you! 
[22:57:59] New patchset: Ottomata; "Refactoring udp2log classes and defines." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11574 [22:58:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11574 [22:58:40] maplebed or someboaaaady, gimme a review of that when you got a chance [22:58:44] don't merge it without me around though [22:58:48] i gotta run for the day [22:58:54] only if you can make my bus show up. [22:58:57] i'd really love some feedback on that one [22:58:59] hmmm [22:59:07] I'll do a bus dance on my way down my stairway [22:59:11] this bus terminal kinda smells. [22:59:25] i will also do a good smell dance [22:59:35] (mostly of exhaust, but also ... other things...) [23:00:16] I like the look of the files you're modifying... [23:00:26] can we merge it tomorrow morning when we're not both about to leave? [23:00:40] maplebed: good luck with the getting back across the bay [23:00:48] yeah, bart's not running yet. [23:01:21] but I have the good fortune of a bus route that *doesn't* go to a bart station going near my house. [23:01:29] so it won't be mobbed by all the bart riders. [23:01:53] yay [23:01:56] sadly it doesn't even start running for another 15 minutes. [23:02:11] and the probability it'll be on time is ... umm... [23:02:16] anyway. [23:03:12] ottomata: the first thing I see - in the logrotate template log_directory is an unscoped variable. [23:03:15] will it be in the right scope? [23:04:14] oh, he probably left. [23:04:20] I'll comment in the review. [23:04:43] LeslieCarr: do you know when something goes in manifests/misc/foo instead of manifests/foo? [23:04:54] (or anybody, really) [23:05:05] not really, i figured smaller things in misc [23:06:36] yeah no merging now [23:06:38] just review and feedback [23:06:53] yeah i'm leaving [23:06:55] tell me in the review [23:06:58] byyyyeeeee [23:07:04] ciao [23:07:06] ty! 
[23:07:11] mark: you might want to review https://gerrit.wikimedia.org/r/#/c/11574/ too, if you have the time. [23:08:09] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11574 [23:08:26] misc stuff tended to be things that ran as smaller services on shared hosts in the past [23:08:35] like a server may run a number of misc entries [23:08:42] but it was mostly from when they were in a single large misc.pp file [23:08:57] then as it grew it was split into misc/ just for ease of use [23:09:22] and existing things stayed there, and other small shit that tended to not warrant a huge entry in the root directory of manifests [23:09:25] do you have an opinion on https://gerrit.wikimedia.org/r/#/c/11574/ and whether udp2log.pp should be in manifests/ or manifests/misc/? [23:09:48] this would pretty much be on all systems? [23:10:02] if it's touching everything i say non-misc [23:10:31] no, it's on only a handful: locke, emery, oxygen, nfs1/2, maybe a few more. [23:10:39] ohh, yea, just those [23:10:41] reading now [23:10:56] hrmmm, misc then [23:11:10] just cuz it seems to then denote 'this may not be something you want to slap on most machines' [23:11:14] unless... does it include the stuff that tells the squids how to send logs? I don't think it does [23:11:27] well, I'll leave a comment - should this be in misc? [23:11:36] maybe other folks will have opinions too. [23:11:38] thanks! [23:11:49] i think misc too [23:11:51] welcome [23:12:02] I kinda think it should... "RobH and I think this should be in misc. do you too?" ;) [23:13:24] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11574 [23:14:36] yep [23:15:29] Ok, time to pack it up and go to the hotel.... where i can work some more but have the joy of drinking something (no food or drink on the datacenter floor!) 
[23:16:36] New patchset: Kaldari; "Turn on LastModified for en.wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11576 [23:16:42] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11576 [23:17:12] New review: Kaldari; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11576 [23:17:14] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11576 [23:29:44] * jeremyb could use a favor from a root when you have a sec... [23:30:01] want the prod BZ regex for linkifying bug in comments [23:30:12] i guess i could just take from upstream BZ [23:50:22] maybe start with `fgrep -rn bug_format_comment .` ? [23:51:25] ahhh, i think maybe i found it. http://mxr.mozilla.org/bugzilla/source/Bugzilla/Template.pm#236 [23:53:08] * jeremyb yawns... where's ^demon when I need him? ;) [23:55:06] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [23:57:56] in hell? [23:58:27] he's east coaster, so might be out ? [23:58:47] !log stopping mysql on es1003 and disabled notifications. going to convert to innodb via hotbackup of es1004 for testing [23:58:51] Logged the message, Master
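(editor's note) On the comment regex from [21:12:40]: a pattern that requires whitespace on both sides of the bug reference cannot match when the reference is the first or last thing in the comment, which is the objection raised to \\s+. The Python sketch below contrasts that with the lookahead/lookbehind variant suggested in the channel. Both patterns are hypothetical stand-ins, since the log never shows the real one, and Python's `re` is used only for illustration; whether the target regex engine supports lookbehind was left unconfirmed in the discussion.

```python
import re

# Hypothetical "naive" pattern: demands whitespace before and after.
NAIVE = re.compile(r"\s+bug\s+(\d+)\s+", re.IGNORECASE)

# Lookaround variant: asserts there is no non-space character on either
# side, which also holds at the very start and end of the string.
LOOKAROUND = re.compile(r"(?<!\S)bug\s+(\d+)(?!\S)", re.IGNORECASE)


def bug_ids(pattern, comment):
    """Return the bug numbers that pattern finds in comment, in order."""
    return [m.group(1) for m in pattern.finditer(comment)]


comment = "bug 123 is a duplicate of bug 456"
naive_hits = bug_ids(NAIVE, comment)            # misses both ends
lookaround_hits = bug_ids(LOOKAROUND, comment)  # finds both references
```

One limitation of this particular sketch: a reference followed directly by punctuation, such as "bug 123," would not match, so a real pattern would likely need a looser trailing condition.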