[00:59:01] !log experimentally setting net.ipv4.tcp_tw_recycle=0 on cp1004 [00:59:04] Logged the message, Master [01:00:15] !log reverted after client-side TIME_WAIT connections rose rapidly from 367 to 9000 [01:00:18] Logged the message, Master [01:16:50] New review: Reedy; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7577 [01:16:52] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7577 [01:24:15] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [01:32:57] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -0.525353360656 [01:34:56] It's gaining packets? [01:37:18] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error [01:37:31] !log on cp1004: trying tcp_tw_reuse=1 instead of tcp_tw_recycle [01:37:35] Logged the message, Master [01:38:48] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds [01:40:46] !log on cp1004: reverted after TIME_WAIT client connections reached 38k with no sign of a plateau [01:40:50] Logged the message, Master [01:44:30] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:52:18] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [02:15:02] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [02:19:19] * jeremyb wonders if these are retroactive or if there's some way to reapply them retroactively. https://gerrit.wikimedia.org/r/4796 (custom linkifications within commit msgs on gerrit) [02:22:26] Read from DB, parse, update, write to DB [02:30:38] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [02:42:21] RECOVERY - Puppet freshness on cp1004 is OK: puppet ran at Thu May 17 02:42:03 UTC 2012 [02:45:02] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [02:47:53] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [03:09:28] how are dns changes made and logged? i'm wondering how it all fits in the new world of gerrit [03:10:08] if i need to add/change/remove a record i assume i can't just add that to a diff. yet. or maybe it depends on the zone [03:11:26] * jeremyb goes digging some [03:17:24] It's in the/a private svn repo [03:17:40] I'm not sure if that's destined to change any time soon [03:20:53] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/6005 [03:22:23] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [03:22:49] Reedy: ;-( [03:24:46] Reedy: hi :) what was your IRC client? [03:24:56] was? [03:25:01] it still is Quassel ;) [03:25:15] yeah past time sorry [03:25:22] heh [03:25:31] I'll forgive you, it's only 05:25 :p [03:25:36] I thought about it in French and then translated word by word (somehow) [03:25:43] hehe [03:25:45] thanks! [03:25:46] i don't even remember where now but I saw someone mention a jobs.wm.o cert mismatch. 
the first step to fixing that would be sending it to the right IP [03:25:53] my night schedules are totally screwed up :-( [03:26:01] which i guess i can't change ;( [03:26:25] j.wm.o redirects to foundation [03:26:29] the first step is to reproduce the issue, take traces and open a bug report :-D [03:26:32] it does so [03:26:40] Oh, I see [03:26:44] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [03:26:59] lol, pointless [03:27:08] hashar: well the place where i saw the problem might be someone else's bug report? [03:27:29] report another [03:27:34] then we can close it as a dupe [03:27:56] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is -0.183695245902 [03:28:55] it was bug 36884 comment 1. but that's not the subject of the bug [03:29:57] ARHGH [03:30:07] Quassel uses french by default :-( [03:30:23] that's a bad thing? [03:30:48] hashar: no, it uses System Default by default [03:32:17] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error [03:34:12] Reedy: I am discarding the client :-D Could not figure out how to setup core to be started automatically nor the user/pass to connect to it :-D [03:34:26] on ubuntu it just starts :p [03:34:34] yeah [03:34:41] I guess it is not Mac friendly yet [03:34:53] I will open bug reports [03:34:57] hashar: you could ask harej about it [03:35:04] honestly? no :-D [03:35:07] i think he might be on mac [03:35:12] I am too lazy to do that kind of stuff nowadays [03:35:32] I just want to click the app, fill my name, click connect then /join my chans ahaha [03:35:42] i assume harej is equally picky [03:36:02] though I can try again later on [03:36:07] aka not at 5:30am [03:36:11] hah [03:36:27] that must has set me in a bad mood [03:36:33] (has/have?) [03:36:45] I exist! [03:36:56] I live an hour away from Ashburn, Virginia! [03:37:04] I was summoned here by a jeremyb summons. [03:37:06] Hello James glad to meet you [03:37:11] is it really that far? [03:37:16] It's probably closer. [03:37:41] the computernets tell me it's more like a half-hour away. [03:38:25] so basically jeremy told me you are using Quassel [03:38:29] the IRC client [03:38:31] on a mac [03:38:38] correct [03:38:56] (sorry 5am, my brain is slow so I can't make long sentences) <-- that one already took me way too long [03:39:14] so I got a QT something client and some core shell script [03:39:25] do you happen to know how to get Core to start automatically? :) [03:39:40] no [03:39:51] so we need to launch core first then the client correct? [03:40:55] I just launch the client. I don't even know what core is. [03:41:33] there's a standalone client [03:41:40] but the core is for running an always on esk proxy [03:42:37] huh, isn't most of the reason to use quassel for the alwaysonness? [03:42:47] I use it because it's the least shitty client. [03:43:34] "Core" looks like a local bouncer [03:43:47] aka you install Core on some server and connect your client to it [03:45:07] that will be for another day [03:45:16] thanks for showing up harej :) [03:49:07] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/7724 [03:50:28] seriously [03:50:37] my daughter is waking up! [03:50:39] at 6am!!!! [03:50:46] she basically prevents me from working :-( [03:50:50] or sleeping [03:50:51] argh [03:51:01] * hashar waits [03:51:21] you could take shifts. you have half the day and you're off half the day. 
;-P [03:51:51] the thing that kill me off is that she has woke up at 3am since I am back from SF [03:52:07] and jet lag made me get to bed late in evening [03:52:19] so basically made me 3am to be fully awake :-D [03:52:25] and having to take care of her [03:52:32] just to find out totally screwed for the rest of the day [03:52:34] damn kid [03:52:36] ;D [03:52:39] see you all later [03:52:46] au revoir [04:19:57] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [04:40:02] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [04:50:05] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [05:11:51] mornin [05:13:42] yo [05:20:35] New patchset: ArielGlenn; "skip verify (instead of whine) if no tarballs for wiki" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7847 [05:22:10] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7847 [05:22:12] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7847 [05:29:49] Hm.. there is a wmgPFEnableStringFunctions section in CommonSettings.php [05:29:52] I wonder.. why ? [05:30:32] O_O - there is a wiki that has it enabled [05:30:33] 'donatewiki' => true, [05:30:34] just donatewiki [05:30:35] pfew [05:30:36] TimStarling: ^_^ [05:33:37] Krinkle-away: i don't see where wgPFEnableStringFunctions is used then? (the thing that's set inside the block) [05:33:59] jeremyb: wmgPFEnableStringFunctions is the cluster conditional [05:34:06] set from InitialiseSettings.php [05:34:16] then in CommonSettings.php, if wmgPFEnableStringFunctions -> wgPFEnableStringFunctions [05:34:23] right... [05:34:29] * jeremyb saw all of that [05:34:34] i guess maybe it's just in core... [05:35:03] it's part of Extension:ParserFunctions [05:44:42] Krinkle-away: do you ever sleep ? :-D [05:44:55] Sure, when other people work [05:50:59] :D [06:49:59] !log WMFLabs dieing out, I/O latency raised constantly over the last 2 hours and eventually lead to situation where system (via ssh) is not usable anymore [06:50:04] Logged the message, Master [06:51:04] hashar: elaborate? [06:51:37] prompt takes age to show up? :-D [06:51:45] and I can't edit files remotely using vim hehe [06:51:45] i could log in fine to bastion (not restricted) [06:51:49] but http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=Virtualization+cluster+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 looks bad [06:51:52] jeremyb: load is at 500 or so? [06:51:53] :) [06:52:02] jeremyb@bastion1:~$ uptime 06:50:54 up 16 days, 4:38, 3 users, load average: 0.49, 0.61, 0.46 [06:52:19] ahhh thanks for the ganglia link [06:52:20] why isn't nagios-wm speaking? 
[06:52:27] for the virt* nodes that is [06:52:30] no idea [06:53:17] hashar: i just typed "ganglia virt" and that was the first hit in my local browser history ;) [06:53:39] virt2 has like 20% time waiting for IO [06:54:44] yeah for bots ;-) [06:54:45] http://ganglia.wmflabs.org/latest/?c=bots&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [06:54:46] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/teahouse.log, have not been written to in 24 hours [06:54:47] it's not just virt2 that's problematic though [06:55:15] hashar: I'm not sure if this is a cause or an effect [06:55:23] most probably an effet [06:55:25] ryan was looking at labs being wonky just last night [06:55:25] effect [06:55:30] lemme check the scrollback [06:55:41] I guess the cause is the NFS / some hard drive array [06:55:48] it shouldn't [06:56:05] one of them had degraded raid recently too. not sure what the resolution of that was [06:56:12] oh and good morning to the Greek ones :-] [06:56:15] nothing yet, I opened a ticket just yesterday [06:57:01] http://ganglia.wmflabs.org/latest/graph_all_periods.php?c=puppet&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1337237804&g=cpu_report&z=large&c=puppet [06:57:09] this is me trying to :wq a simple file [06:58:09] maybe it's not simple? [06:58:11] ;P [06:59:58] http://dpaste.org/DGmWM/ [07:00:07] I have time to write stuff before having the prompt to show up :-] [07:00:47] hashar: tried mosh? [07:01:03] what is that? [07:01:11] mosh rocks [07:01:12] iirc it's mosh.mit.edu [07:01:15] yes [07:02:18] wow (I just saw the ganglia graph) that is no good [07:03:19] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:04:18] hmm it was an nfs instance a couple days ago that was the problem (reading the backlogs) [07:04:40] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:05:14] apergos: it rebooted (idk if anyone knows why) and then came back up broken so then it was rebooted again. eventually it started working [07:05:40] ugh. sounds just peachy [07:09:28] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:10:22] breakfast time [07:12:01] I am out of computer [07:13:12] my phone number is in the contact file on fenari [07:13:13] if needed [07:13:23] ++ [07:17:01] hashar: nothing works anyway :-) [07:18:40] 102400 bytes (102 kB) copied, 30.6156 s, 3.3 kB/s [07:19:55] I am wondering if it can be due to a specific instance trying to do a ton of IO [07:20:03] so this means it's not a good day to try to set up my exim test instance? [07:20:05] or just to one of the virt machine going wild [07:20:48] nope, it's not just one [07:23:34] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:24:52] I did ton of mediawiki configuration change this morning on deployment-prep [07:24:58] maybe one of them caused the issue :( [07:26:37] anyway breakfast for real now [07:27:36] hearing only crickets, I will at least try to prep what I would do, and then see if labs is stable enough to do it today or not [07:29:13] apergos: what for if I may ask? [07:29:26] is it for staging the IT changes that we were discussing at some point? 
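A rough sketch of the kind of spot checks being traded above while labs I/O degraded — these are stock tools rather than the exact commands that were run, and the hostname is illustrative:

    uptime                      # load averages, as pasted above for bastion1
    iostat -x 5 3               # per-device await/%util over a few samples (needs the sysstat package)
    dd if=/dev/zero of=/tmp/io-probe bs=1K count=100 oflag=direct   # crude 100 KB write probe; the 3.3 kB/s figure above came from a test of this shape
    rm -f /tmp/io-probe
    mosh jeremyb@bastion1.pmtpa.wmflabs   # mosh only masks the laggy interactive session; it needs mosh-server on the far end and open UDP ports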
[07:29:30] I guess [07:29:44] I got handed a ticket, I think that's what it is [07:29:51] ah [07:30:07] yeah, I volunteered for that but they wanted me to spend time with hashar instead :-) [07:30:13] :-D [07:30:40] I've spent virtually (pun intended) no time inlabs so this owuld at least get me familiar with it [07:30:53] hahaha [07:31:00] or with it's breakage ;) [07:31:10] already been there [07:31:38] settign up my first instance, I had the worst experience ever... besides the session bug, that is [07:32:09] oh the session bug is ooooooold news [07:41:52] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:01:46] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:15:21] !log WMFLabs seems to have recovered now [08:15:25] Logged the message, Master [08:16:32] so, i just tested jobs.wikimedia.org with the IP for wikimedia-lb... (in my hosts file). the redirect works fine. someone want to just switch the DNS? or you want a bug for it? [08:16:50] (to fix the cert mismatch) [08:17:53] jeremyb: ping apergos / paravoid ^^^ [08:18:06] I can't do DNS stuff nor I know what the procedure is to change DNS entry [08:18:10] ? [08:18:24] hello? [08:18:46] I have no idea what this is about [08:19:22] 17 03:28:55 < jeremyb> it was bug 36884 comment 1. but that's not the subject of the bug [08:19:26] see the end of that comment [08:20:03] hashar: here too? [08:20:10] hmm rt I guess [08:20:30] I don't know what the "right" answer is for this [08:20:32] jeremyb: ahh I can't get op on -operations and -dev :-( [08:20:40] hashar: ;) [08:20:56] he's in at least a dozen other channels [08:21:13] yeah [08:21:18] going to contact freenode staff so [08:22:28] looks like he got klined [08:22:30] oh no [08:23:31] heh [08:30:13] oh [08:30:20] both K-Lined [08:31:15] so folks who use labs... if I want to make some new class appear in the list of "Special:NovaPuppetGroup", what's the trick for that? [08:31:30] manage puppet groups [08:31:36] and add it there [08:31:48] the last link in the list in the side bar [08:31:58] em [08:32:07] I'm at that page. I want in the list of available classes, [08:32:16] to have the exim-related classes from mail.pp [08:32:39] is your project listed on that page? [08:32:52] if not, update the filter in the top corner [Show project filter] to include your project [08:33:02] right [08:33:03] yes it's there [08:33:08] great [08:33:17] and then next to the project there should be an "add group" [08:33:20] yes [08:33:27] click!!! [08:33:36] I'm asking a different q [08:33:36] haha [08:34:05] can I just arbitrarily give the full classname of anything that appears in any puppet file in manifests? [08:34:12] or does it need to be in some special list first? [08:34:19] AFAIK you can enter just whatever you know [08:34:23] even a totally wrong class [08:34:38] but you should put a class which exit in 'test' branch [08:35:02] so yes, any arbitrarily class should work [08:35:06] that was themissing piece of info. I thought it needed to be displayed already in the list below "all projects" [08:35:10] thanks [08:39:02] New patchset: Hashar; "convert hardcoded 10.0.5.8:8420 to $wmfUdp2logDest" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7702 [08:39:25] New review: Hashar; "Patchset 5 is a rebase / solve conflicts." 
[operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7702 [08:39:27] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7702 [08:39:29] going to deploy that [08:39:40] this says "operations" but probably isn't for me to review :-) [08:41:05] !log Deploying https://gerrit.wikimedia.org/r/7702 which abstract out the udp2log destination [08:41:08] Logged the message, Master [08:42:39] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:44:35] !log running scap to apply https://gerrit.wikimedia.org/r/7702 [08:44:38] Logged the message, Master [08:47:07] hashar: did you do it yet? logmsgbot should have said something in #-tech [08:47:19] it is still running [08:47:23] scap is awfully slow nowadays [08:47:31] ahh [08:47:36] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:47:51] it loads all messages from both branches [08:47:58] rebuild all language localization caches [08:48:04] copy them around several time [08:48:05] etc [08:48:14] oh my god [08:48:27] it dumps me a list of the 400 or so server that have synced! [08:48:59] i guess that will teach you to sync just the one file ;-P [08:49:06] next time I will just sync-file the files I need :-] [08:49:17] definitely [08:49:20] well i guess it was 2 [08:49:37] over time `scap` seems to have became a hugeee pile of slow scripts [08:50:12] * hashar watches boxes compiling texvc one at a time [08:50:14] needs a little salt and other flavors [08:50:35] btw, salt's SEO really sucks [08:56:14] what are {news,todo}.dblist? [08:56:42] oh, outage (see #-tech) [09:02:45] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:07:57] New review: Hashar; "That change caused a short outage because $wmfUdp2logDest was not available in wfLogXFF() :-(" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7702 [09:10:24] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:02:18] so given that every lab instance seems tohave exim set up for basic mail sending, I wonder what an exim test instance needs in addition [10:02:23] all the packages should already be there [10:14:47] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:17:03] New patchset: Hashar; "warning message about wmfUdp2logDest format" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7848 [10:17:23] config? [10:17:29] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7848 [10:17:31] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7848 [10:18:46] hashar: whooooops. i changed deployment-prep to use a hostname there. a week or two ago [10:19:27] where ? [10:20:53] jeremyb: ahh I see :) [10:20:57] mutante: is the analytics subnet all working now ? 
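Since the full scap above took ages for what was effectively a one-file config change, the lighter path hashar mentions ("next time I will just sync-file the files I need") looks roughly like this — the commit message is illustrative, and the tcpdump line is the same check quoted above (port 514 on the labs instance; production sends to 8420 per the r7702 commit message):

    sync-file wmf-config/CommonSettings.php 'use $wmfUdp2logDest for the udp2log destination (gerrit 7702)'
    tcpdump -A -n -v -s0 udp port 514 | grep PHP    # on the udp2log box: confirm PHP log lines still arrive after the change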
[10:21:05] sorry , i had to run and do server racking [10:21:05] jeremyb: not a big trouble though ;) [10:21:16] hashar: http://wikitech.wikimedia.org/index.php?title=Server_admin_log&action=historysubmit&diff=46576&oldid=46574 [10:21:20] jeremyb: maybe you can indeed use a hostname there afterall [10:21:31] hashar: i never actually tested that it works [10:21:57] i assumed people didn't care if it was just going to prod anyway [10:22:33] (i told people what i did directly and I'd already logged it verbosely) [10:22:39] lets try again [10:24:14] I will remove the warning from 7848 [10:24:57] * jeremyb doesn't know one way or the other... [10:26:17] New patchset: Hashar; "Revert "warning message about wmfUdp2logDest format"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7849 [10:26:44] New review: Hashar; "It can indeed use a hostname. I have reverted that message with https://gerrit.wikimedia.org/r/#/c/7..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7848 [10:26:52] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7849 [10:26:54] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7849 [10:28:08] hmm [10:29:47] tcpdump -A -n -v -s0 udp port 514 | grep PHP <-- got nothing now :) [10:29:53] daughter duty have fun [10:30:29] (that is on labs btw) [10:33:14] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:36:50] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:48:23] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:55:41] LeslieCarr: well, + DCHP works and gets an IP ,but - i do not get an installer yet, screen stays blank with blinking cursor. looking at DHCP config looks to me i should get a lucid installer. option pxelinux.pathprefix "lucid-installer/"; [11:00:02] ok, i can switch to Ubuntu BusyBox shell from there, so it started somehow, gotta try debug from therehelp [11:12:56] weird [11:15:03] i can see the partman process running, i currently just dont see any installer output.. trying again with the "Legacy OS redirection" option for console [11:19:05] oh these are ciscos right ? [11:19:13] yes [11:19:15] i seem to remember some weird stuff with them and installing - robh would know [11:19:27] i think he was the one dealing with them [11:19:31] anything in wikitech ? [11:19:45] well, he already told me about that legacy option a while ago ... [11:20:09] wikitech, ehm, yea, i am editing on wikitech:) [11:23:27] ah [11:23:52] debconf is running and partman-auto/init_automatically_partition [11:23:56] i should just wait longer first [11:24:40] "0 questions will be asked" "GO" [11:25:35] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [11:53:09] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [12:22:51] New review: Lcarr; "fyi this broke searchidx2 (which should be decommissioned soon i believe bu t is not yet)." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/7126 [12:23:48] New patchset: Lcarr; "removing old classes from searchindexer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7852 [12:24:07] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/7852 [12:45:27] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:51:09] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:54:28] wow, our puppet is pain to work with :/ [12:57:10] hahaha [12:57:17] it's been worse [13:09:44] * paravoid cries [13:14:52] paravoid: you mean like there are no modules, there will be no modules? [13:16:01] it's not just modules [13:16:07] everything's entangled with each other [13:16:13] inheritance and overrides are basically absent [13:16:34] if/then/else and globally scoped variables is the standard way of overriding things [13:16:49] I'm working on something that is especially hurt by this [13:17:01] so it's not the current status quo that makes me cry [13:17:15] it's the hacks *I* am doing to work around things :-) [13:17:51] yeah... [13:21:02] ugh [13:38:23] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active [13:44:01] !log shutting down bellin for troubleshooting [13:44:06] Logged the message, Master [14:03:12] New patchset: Pyoungmeister; "removing old search classes from searchidx2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7855 [14:03:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7855 [14:04:31] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7855 [14:04:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7855 [14:05:37] New patchset: Dzahn; "Ciscos uses com1, Dells use com2 for console, wrong DHCP config file, thanks RobH" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7856 [14:05:57] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7856 [14:06:29] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7856 [14:06:31] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7856 [14:09:08] RECOVERY - Puppet freshness on searchidx2 is OK: puppet ran at Thu May 17 14:08:55 UTC 2012 [14:20:32] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [14:22:09] !log adding gerrit project analytics/udplog parent analytics [14:22:12] Logged the message, Master [14:23:23] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:37:56] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:51:56] New patchset: ArielGlenn; "option to include top level html/txt files in rsync list" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7857 [14:52:16] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7857 [14:52:18] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7857 [15:06:10] notpeter: r u around? [15:07:23] cmjohnson1: so the raid controller shipped, should get there tomorrow [15:07:25] Jeff_Green: ^ [15:08:07] RobHalsell: yayyyyyyy [15:08:09] the hard disks for grosley should also have arrived [15:08:22] ok. we're still waiting for RAM though? [15:08:25] its the 256mb versino of the controller, but oh well [15:08:30] that's fine [15:08:31] yes waiting on the RAM [15:08:43] i think i placed that order, let me confirm [15:08:55] we don't expect much from storage3 even during the fundraiser, performance wise [15:09:20] also most of the reads are heavy--db dumps, giant gz files etc, so the cache is probably not that useful anyway [15:09:21] yea, i ordered the ram upgrade on the 10th [15:09:36] they have to go back to 2008 to source it [15:10:33] cmjohnson1: so the memory order on the 10th shows delivered on the 11th [15:10:42] cmjohnson1: so it should already be there [15:11:35] okay...i got it...thought it was one of the 32 ssd's coming in one at a time [15:12:08] cool, so that memory will go in grosley along with the additional hard disks in the slot 3 and 4 hdd areas [15:12:18] jeff_green: what time do you want to do update grosley [15:12:36] robhalsell: cool [15:13:03] oh everything is here? I think we can do it anytime really, it's the redundant box and aluminium is the active one [15:13:22] cmjohnson1: what works for you? [15:13:24] Jeff_Green: everything should be there, chris can confirm if he has the hard disks to add. [15:13:58] would 11 PST work? [15:14:05] sure [15:16:25] cmjohnson1: what's up? [15:16:44] you decomm'd db15...do you remember why? [15:17:50] are you in eiqad? [15:17:55] eqiad [15:18:33] notpeter: was it memory related? [15:19:27] cmjohnson1: I'm not sure. [15:19:33] I remember it not booting [15:19:51] but I do not know why [15:20:02] ok...cool...thx [15:20:37] sorry could be more helpful :( [15:21:50] cmjohnson1: RT-345 [15:22:35] cmjohnson1: and 526 [15:24:23] mutante: thx...that was exactly what I needed to know [15:25:31] mutante: can you run the interwiki map update script please? 
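The interwiki refresh requested above comes down to two steps — mutante's !log entries a few lines below show them, and the wikitech page linked there has the exact invocation, so this is only the shape of it:

    php dumpInterwiki.php           # regenerate interwiki.cdb from the interwiki map; flags per wikitech: Update_the_interwiki_cache
    sync-common-file interwiki.cdb  # push the fresh cache out to the cluster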
[15:25:46] I requested it twice yesterday in here but doesn't seem to have been done - probably should have stuck it in an RT [15:26:17] would you have a doc link? [15:26:42] cmjohnson1: in which DC are you again ?:p [15:27:13] i am in tampa [15:27:28] ah,ok, nevermind then:) [15:27:40] mutante: sure let me have a look [15:27:53] http://wikitech.wikimedia.org/view/Update_the_interwiki_cache [15:30:28] !log adding DNS records to wikimedia.org for RT #2960 [15:30:32] Logged the message, Master [15:30:40] !log creating fresh interwiki.cdb from dumpInterwiki.php [15:30:43] Logged the message, Master [15:30:50] !log sync-common-file interwiki.cdb [15:30:53] Logged the message, Master [15:31:17] Thehelpfulone: ^ [15:31:28] thanks [15:31:34] hmm [15:31:42] still not working, does it take a few minutes? [15:31:56] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [15:32:53] ah there we go :) [15:32:56] great thanks mutante [15:33:14] ah, cool, i was about to get suspicious over the "cache is tracked in subversion" part [15:47:49] uh oh [15:47:53] power is flaky [15:53:15] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:13:04] hey dzahn [16:35:33] hi guys [16:35:43] could someone approve/merge this? [16:35:44] https://gerrit.wikimedia.org/r/#/c/7285/ [16:35:58] maplebed maybe, since we were talking about it yesterday? [16:36:45] sure. [16:41:20] ottomata: this doesn't do things like set up the data directory, is that ok? [16:41:48] it'll just be the default (/var/lib/mysql?) which won't be on a separate partition or anaything like that. [16:42:24] given / is only 9,2G capacity on stat1, that's maybe not best. [16:47:48] yeah that's fine [16:48:00] i can puppetize whatever is best once I figure it out [16:48:10] ok. [16:48:19] i'll move it to /a/mysql or something [16:48:47] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7285 [16:48:50] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7285 [16:50:04] damn! [16:50:11] my reviewing sucks. [16:51:48] oh? [16:51:56] New patchset: Bhartshorne; "typoed stat class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7864 [16:51:59] ^^^ [16:52:10] doh [16:52:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7864 [16:52:19] my typing sucks [16:53:55] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7864 [16:53:57] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7864 [16:55:49] hmmmm [16:55:49] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Package[mysql-client-5.1] is already defined in file /var/lib/git/operations/puppet/manifests/mysql.pp at line 438; cannot redefine at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:659 on node stat1.wikimedia.org [16:55:53] yeah. [16:56:17] but i'm not including it…maybe puppet doesn't check for conflicts unless somehow both classes are included by someone? [16:57:11] hm. I'm using the generic mysql server class on iron (in site.pp) [16:57:22] and the client in singer [16:57:27] I wonder what's different. [16:58:02] hm [16:58:40] oh! 
[16:58:52] role/statistics.pp incrludes both mysql::client and generic:mysql:client. [16:59:23] (the latter by way of statistics::db) [17:01:34] ottomata: up to you if you'de prefer to remove mysql::client from role::statistics or remove generic::mysql::client from statistics::db. There's no difference between the two classes. [17:14:39] maplebed: https://gerrit.wikimedia.org/r/7867 [17:15:10] this is the same as last time? [17:15:22] maplebed: basically [17:16:02] you want it out now? [17:16:09] maplebed: yes please [17:16:12] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7867 [17:16:17] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7867 [17:16:17] k. on its way. [17:17:26] !log deploying config change to mobile - more zero IP addresses. gerrit r7867 [17:17:30] Logged the message, Master [17:18:46] preilly: as last time, is there no need to purge the cache? [17:19:07] maplebed: actually you probably should for this change [17:19:39] ok. I'll do so as soon as puppet is done. [17:19:45] ... which is now. [17:20:43] !log flushing the mobile cache post-deploy [17:20:46] Logged the message, Master [17:21:30] preilly: can you test and confirm the change worked and things aren't broken? [17:21:58] Change abandoned: Reedy; "https://gerrit.wikimedia.org/r/#/c/7820/ dupes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3815 [17:26:08] maplebed: looks good [17:26:18] sweet. thanks for the check. [17:27:19] New patchset: Ottomata; "role/statistics.pp - don't need to include mysql::client." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7869 [17:27:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7869 [17:27:45] maplebed, let's try that [17:27:45] https://gerrit.wikimedia.org/r/#/c/7869/ [17:28:01] k. [17:28:23] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7869 [17:28:25] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7869 [17:29:29] it's working. [17:30:11] ottomata: do you know if there were other changes queued? I see libcairo-dev getting installed, but that's not part of the mysql stuff. [17:30:38] the error with Misc::Statistics::Mediawiki/Git::Clone is still there too. [17:30:53] but mysql server was installed, so this change at least has happened. [17:31:28] aye, i'm actually waiting for someone to reinstall stat1! [17:31:37] before i bother sleuthing the git clone prob [17:31:45] as for libcairo: no idea [17:32:06] /Stage[main]/Misc::Statistics::Plotting/Package[libcairo-dev]/ensure: ensure changed 'purged' to 'present' [17:32:06] if you have any idea how to push this [17:32:07] https://rt.wikimedia.org/Ticket/Display.html?id=2946 [17:32:10] would be much appreciated [17:32:23] afaik, that should have been installed a while ago [17:32:23] hm [17:32:34] who is in charge of reinstalling the machine? [17:32:39] mark was the one who wanted us to do it [17:32:46] Could someone approve and merge https://gerrit.wikimedia.org/r/#/c/7820/ and https://gerrit.wikimedia.org/r/#/c/7831/ ? Thanks! [17:32:49] ottomata: can we do it ourselves? 
[17:33:09] i don't hitnk so, mark (or someone) said they'd find a place for /a while they reinstall [17:33:18] or at aleast, i have no idea how to do it from here [17:33:41] but it shouldn't be so hard to move stuff temporarily away, or is it? [17:33:51] if you know of a place to put it [17:33:52] then no [17:33:57] but I asked and they just said they would do it [17:34:56] Reedy: I can look, but I lack context. [17:35:32] first comment though, you should add -oConnectTimeout=30 in addition to -oSetupTimeout=30. [17:35:42] (though I always set them to 5 or 10 rather than 30...) [17:35:55] i was just normalising against the other scripts [17:36:08] for 7820 the script wont run on remote hosts without the sudo for non root [17:36:34] should probably go through and add connecttimeout to all the scripts then [17:37:49] while talking about normalizing the scripts... there is a variety in the rsync options too. [17:38:41] specifically one of them has --no-perms and the rest don't. [17:38:55] (ignoring that some have -v and some don't) [17:39:30] lol [17:39:39] sounds about right [17:40:13] ottomata, can we move stat1:/a temporarily to bayes? [17:40:23] or isn't there enough space? [17:40:36] reedy, do you want to patch and then I'll look again? [17:41:10] Patch what, adding the connect timeout? [17:41:14] 205G [17:41:37] Touching the rsync options has the ability to cause some issues [17:41:43] 37G avail on bayes [17:42:28] !log temporarily turning off puppet on brewster for preseed hackz [17:42:31] Logged the message, notpeter [17:44:25] maplebed, I made a pretty awesome mysql_instance define for CouchSurfing [17:44:35] included pretty much anything you could ever want to puppetize in my.cnf [17:45:04] would it be useful to try to adapt it and use it for Wmf? [17:45:19] or should I just commit a stat1.my.cnf file (blagh) to files/ [17:45:19] ? [17:46:00] you should ask binasher that question [17:47:36] I'm not sure how much of the rest of our mysql configs are done via puppet or something else. In principle, that's probably a good idea. [17:48:47] I just uploaded an example of how it worked [17:48:47] https://github.com/ottomata/cs_puppet_mysql [17:48:53] it would need tweaked of course [17:49:15] my setup used supervisor to run multiple instances on one machine (if you wanted) [17:49:39] and binary installs rather than packages, so I could easily switch between versions just by changing a symlink, or by changing the mysqld that supervisor used [17:54:04] this is the relevant mysql.conf.erb file [17:54:04] https://github.com/ottomata/cs_puppet_mysql/blob/master/mysql.conf.erb [17:59:26] jeff_green: are you ready to bring down grosley for memory upgrade and and add hdd's [17:59:42] yep, lemme just check that nobody's on it [18:00:16] ok. I'll shut it down [18:00:19] ok [18:00:40] log: shutting down grosley for disk and RAM upgrades [18:00:42] err [18:00:49] !log shutting down grosley for disk and RAM upgrades [18:00:52] Logged the message, Master [18:01:13] i can't help but read that as not!-log [18:01:23] New patchset: Catrope; "Explicitly pass --wiki= in foreachwiki*" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7877 [18:01:29] cmjohnson1: it's yours once you see it down [18:01:47] cool thx [18:01:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7877 [18:01:50] PROBLEM - Host grosley is DOWN: CRITICAL - Host Unreachable (208.80.152.164) [18:11:15] jeff_green: question...the dimm that rob bought is the wrong type but somewhere along the line I found stored dimm here that will fit [18:11:22] I am going to add it [18:11:22] New patchset: Aaron Schulz; "Changed purge hook to use doQuickOperations()." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7878 [18:11:41] i will look for errors during post [18:11:44] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7878 [18:11:44] ok [18:11:46] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7878 [18:18:38] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:24:29] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:26:08] RECOVERY - Host grosley is UP: PING OK - Packet loss = 0%, RTA = 0.14 ms [18:28:59] PROBLEM - Host grosley is DOWN: CRITICAL - Host Unreachable (208.80.152.164) [18:33:11] RECOVERY - Host grosley is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [18:33:41] jeff_green: added new memory...not hdd...chk to see if everything is working okay [18:33:49] k [18:44:13] !log restarting puppet on brewster [18:44:17] Logged the message, notpeter [18:44:41] binasher: db61/62 should be set up to your specifications. let me know [18:45:11] thank you! [18:46:15] !log stopped replication on es1002 [18:46:18] Logged the message, Master [18:49:05] !log syncing cluster23 tables from es1002 to es1004 [18:49:08] Logged the message, Master [19:00:26] hey maplebed [19:00:33] what would XML Parse Error in a ganglia notice mean? [19:00:47] referring to the PacketLossLogTailer [19:00:49] umm... [19:01:24] the value of the check being 'XML Parse Error' would indicate a faulty check. [19:01:35] the check reports whatever the checking script tells it to. [19:01:48] if the XML parse error is trying to interpret the check, then maybe there's some weird quoting going on [19:02:05] !log running securepoll_votes.vote_ip schema migration on all s7 dbs [19:02:08] Logged the message, Master [19:02:12] where are you seeing that error? [19:02:16] hmm [19:02:35] from a notice about packetloss on oxygen, from a couple of days ago [19:02:54] ***** Nagios  ***** [19:02:55] Notification Type: PROBLEM [19:02:55] Service: Packetloss_Average [19:02:55] Host: oxygen [19:02:56] Address: 208.80.154.15 [19:02:56] State: CRITICAL [19:02:57] Date/Time: Thu May 17 03:32:18 UTC 2012 [19:02:58] Additional Info: [19:02:59] XML parse error [19:03:16] or yesterday I guess [19:03:27] ok, so it's nagios spitting out the error. [19:03:41] you want to find out whether the error is in nagios or in the check. [19:04:04] you might be able to see what the check was (if it's recent enough) by looking at the RRD file for the check. [19:04:14] recent enough is less than an hour, I think. [19:04:16] hm, wait so that has notthing to do with ganglia? [19:04:22] maybe! [19:04:41] hm, ok, i thought it was ganglia only because the PacketLossLogtailer uses GangliaMetricObject in python [19:04:55] does nagios get info from ganglia and then send notices? 
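The RRD and gmond pointers maplebed gives here (the gmond suggestion comes just below) translate to something like the following — gmond's default XML port is 8649 and gmetad usually keeps RRDs under /var/lib/ganglia/rrds/, but neither the host layout nor the path is confirmed in the log:

    nc oxygen.wikimedia.org 8649 | grep -A 2 'packet_loss_average'   # ask gmond directly for the value it is currently reporting
    rrdtool lastupdate /var/lib/ganglia/rrds/<cluster>/oxygen.wikimedia.org/packet_loss_average.rrd   # last value gmetad stored; handy for a transient glitch like the XML parse error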
[19:04:56] the question is whether ganglia or nagios spit out garbage [19:05:03] all you know is there's garbage at the end of the chain. [19:05:09] growl [19:05:23] nagios RRD file......... [19:05:24] hm [19:05:30] ganglia RRD file. [19:05:37] (lets you see what the value was before now()) [19:05:49] if you want the current value you ask gmond instead. [19:06:22] ah monitor_service does it (i have never used nagios or ganglia before, so pardon the ignorance :) ) [19:06:28] hmm. it's also worth while to verify that nagios is actually getting that data from ganglia. It looks that way, but ... [19:06:54] yeah [19:06:55] it is [19:07:00] monitor_service is nagios, right? [19:07:34] monitor_service { "packetloss": description => "Packetloss_Average", check_command => "check_packet_loss_ave!4!8", contact_group => "admins,analytics" } [19:07:48] maybe it is separate! [19:07:51] ottomata: you were asking about my.cnf templates. there is currently one for prod that's at templates/mysql/prod.my.cnf.erb - but it'd rather you not modify that to fit misc instances. you can check in a generic template there for use on stats1 etc. it should probably have more defaults than the couchsurf template in github though. [19:07:53] i don't know what check_packet_loss_ave is [19:07:54] it isn't a file in puppet [19:08:24] binasher: more defaults? the define sets tooons of defaults, no? [19:08:54] i only looked at the template tbh [19:09:17] ah [19:09:17] yeah, the define uses the template [19:09:17] https://github.com/ottomata/cs_puppet_mysql/blob/master/mysql.pp [19:09:17] scroll down [19:09:17] oh cool i can likn to a line :) [19:09:17] https://github.com/ottomata/cs_puppet_mysql/blob/master/mysql.pp#L87 [19:12:31] something like that would be fine [19:12:36] cool! [19:12:51] i'll work on it and tweak it to make it fit wmf, and use most of the defaults from the .deb my.cnf [19:12:59] maplebed, how are nagios_service check_commands created? [19:13:09] what is this? [19:13:09] check_command => "check_packet_loss_ave!4!8" [19:13:19] there's a file that defines it in puppet [19:13:25] OHOHOH yes [19:13:26] found it [19:13:29] thoguht I grepped for that already [19:13:31] k reading... [19:13:33] checkcommands.cfg [19:13:50] check_ganglios_generic_value [19:13:53] it is getting it from ganglia. [19:15:00] ok [19:15:09] can I run that command manually and see result from ganglia? [19:15:10] $USER3$/check_ganglios_generic_value [19:15:12] what's $USER3$? [19:15:23] ahhhh [19:15:28] more grepping answers my qs [19:15:30] that's what's in between the !!s [19:15:31] i should do that before asking :p [19:15:32] ottomata: there's also a desire to use mariadb in some places, so support for adding an array of key/value pairs to the conf might be nice - to pass in options that aren't supported by stock mysql. also make sure the naming of everything is obviously distinct from the production core db defintions, though i don't care what it's actually called [19:15:50] ok cool, i've done that before too [19:15:58] i might make a generic_mysql.pp file? [19:16:07] that sounds good [19:16:31] !log running securepoll_votes.vote_ip schema migration on all s6 dbs [19:16:34] Logged the message, Master [19:17:12] !log running securepoll_votes.vote_ip schema migration on all s5 dbs [19:17:15] Logged the message, Master [19:18:10] hmm, where is this installed? 
[19:18:10] check_ganglios_generic_value [19:18:22] should be here /usr/lib/nagios/plugins/check_ganglios_generic_value [19:18:31] but it isn't on oxygen, so I assume there is a nagios host somewhere that it is running on? [19:19:02] !log running securepoll_votes.vote_ip schema migration on all s4 + s3 dbs [19:19:03] spence! [19:19:05] Logged the message, Master [19:19:09] ottomata: yes. [19:19:22] wahh, no access to spence :( [19:19:23] spence has a cached copy of the gangila data that it uses to feed checks. [19:19:41] ottomata: was the error transient or is it still complaining? [19:19:47] transient, but i've seen it happen before [19:19:53] and robla wants me to make it not happen again [19:21:07] the easiest way to make it not happen is to make the nagios retry intervals such that they're greater than the duration of the transient time. [19:21:52] i got the OK notice about 5 minutes after the PROBLEM notice [19:21:59] hmm... [19:22:13] the ganglia-logtailer runs every 5 minutes [19:22:16] that usually happens when the metric is reported at a time interval that's too long. [19:23:02] ? [19:24:37] if ganglia has a hiccup and misses a metric and it times out you've got a 10m difference there. if nagios's retry is such that it expects it to be ok with a 5m threshold, it'll trigger. [19:25:02] probably best to increase the retry count for the nagios check [19:25:12] i.e. make it check again for >5m before alerting. [19:26:55] cmjohnson1: how's it going with grosley? [19:27:23] hmmm ok [19:27:38] default retries is 3 [19:27:52] bump that to 6 for this check? [19:28:11] I don't know how the newer nagios stuff works (paravoid does though) to know whether that's an easy change to make in puppet for one check. [19:28:25] it's no big deal in nagios [19:28:29] changing retries looks easy [19:28:32] i can pass that as a param [19:28:42] but, tryign to figrue out how long the nagios check interval is... [19:28:50] I think retry interval is 1m [19:29:07] ah so if not fixed in 3 minutes [19:29:07] yeah [19:29:13] and ganglia cron is only running every 5 [19:29:14] ok [19:29:37] jeff_green: the memory has been updated ..you now have 16Gb of RAM...the HDD could not be added the R300 only has 2 HDD slots [19:29:57] ahhhHAHhahHA [19:30:11] ottomata: I gotta bail for lunch. you good for a bit? [19:30:31] I thought grosley was the same flavor of host as aluminium, or so suggested racktables [19:30:37] * Jeff_Green cries, then dies. [19:31:50] yep..apparently racktables was wrong [19:32:13] happens! r300 r310 [19:32:17] very similar [19:32:28] the horror. [19:32:43] do we have a spare r310 around? [19:33:50] not in tampa...as far as I know..but I will look around [19:33:59] ok [19:34:11] Jeff_Green: console does not work on Grosley but I am assuming that is a security setting [19:34:26] no idea, i didn't set it up originally [19:34:33] okay [19:34:50] the RAM is showing up though [19:35:02] http://ganglia.wikimedia.org/latest/graph.php?r=2hr&z=xlarge&h=grosley.wikimedia.org&m=cpu_report&s=descending&mc=2&g=mem_report&c=Miscellaneous+pmtpa [19:35:06] that is a bizarre graph [19:35:07] yep...which is nice [19:35:36] that is bizarre [19:36:18] New patchset: Ottomata; "logging.pp - Upping number of retries for packetloss notices to 6." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7884 [19:36:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7884 [19:36:55] maplebed, would love a review of that when you get a chance [19:37:03] https://gerrit.wikimedia.org/r/#/c/7884/ [19:37:43] i know what is going on...at first I left (2) 2GB sticks and added (2) 4Gb....i decided to remove the 2GB sticks and replace with 4GB each [19:37:59] ah [19:38:05] i was thinking it was a ganglia artifact [19:38:08] so it went from 4 to 12 to 16 [19:38:57] cmjohnson1: what's the host erzurumi? it's in racktables as a R310 [19:39:40] no idea [19:39:56] can you check when you have a chance? [19:40:07] that is a r300 as well [19:40:15] gar stab. [19:40:49] the whole rack is mislabeled r310...gonna fix that now [19:41:00] ok [19:43:26] no 310's at this site [19:45:27] ok. thanks for checking./ [19:46:20] i noted all this on the RT ticket, we can give it more thought next week I guess [19:47:47] ok [19:56:21] New patchset: Aaron Schulz; "Enabling new purge hook on testwikis again." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7886 [20:01:35] !log running securepoll_votes.vote_ip schema migration on all s2 dbs [20:01:39] Logged the message, Master [20:01:49] New patchset: Aaron Schulz; "Enabling new purge hook on testwikis again." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7886 [20:02:13] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7886 [20:02:15] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7886 [20:03:24] !log running securepoll_votes.vote_ip schema migration on s1 [20:03:27] Logged the message, Master [20:04:16] ohhh great [20:04:33] securepoll_vote is totally inconsistent across enwiki dbs [20:04:34] 4003 rows on db36 [20:04:35] 4145 rows on db32 [20:04:36] 4259 rows on db59 [20:04:37] 4032 rows on db12 [20:04:48] New patchset: Ottomata; "generic-definitions.pp - install libmysqlclient-dev with mysql-client-5.1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7887 [20:05:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7887 [20:08:56] New patchset: Aaron Schulz; "Moved mediawikiwiki to new thumb purge hook; enabled concurrent ops." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7888 [20:09:18] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7888 [20:09:20] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7888 [20:09:52] * maplebed looks [20:12:46] oh good, percona tool is lying [20:14:30] !log completed securepoll_votes.vote_ip and all ipv6 schema migration [20:14:33] Logged the message, Master [20:16:32] AaronSchulz: my tests look like it's working ok on testwiki [20:16:39] I take it yours do too? [20:17:04] so far [20:17:21] hmm. with one exception. [20:17:33] it didn't purge the 450px version from squid [20:17:36] (I think) [20:18:57] it did purge it from swift, but not from squid. [20:20:01] sounds like that bug report [20:21:01] yaeh, that would have nothing to do with this change, right? [20:21:16] right [20:21:32] of course I didn't look at ms5 before the purge and see that the 450px version existed there... [20:21:45] anyway, I'll save that for later. [20:21:59] shall we move on to the second change? 
[20:22:06] or do you want to push that to all wikis first? [20:22:06] sure [20:24:32] New patchset: Aaron Schulz; "Enabled new transform hook for testwikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7890 [20:24:46] AaronSchulz: we need to coordinate when it actually goes live, right? [20:24:52] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7890 [20:24:54] I need to turn off writes for those same wikis? [20:24:54] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7890 [20:26:51] AaronSchulz: what's the container name / URL format for thumbnails on mediawiki? [20:26:52] maplebed: sure, testwiki, test2wiki, and mediawikiwiki [20:28:00] I think wikipedia-mediawiki. do you agree? [20:28:06] yup [20:28:50] New patchset: Bhartshorne; "disabling thumbnail writes for test wikis and mediawiki to go with gerrit change r7890" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7891 [20:28:54] AaronSchulz: would you review ^^^ for typos etc.? [20:29:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7891 [20:30:37] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/7891 [20:31:15] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7891 [20:31:18] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7891 [20:32:29] !log deployed parallel thumbnail purging for test, test2, and mediawiki with aaron [20:32:32] Logged the message, Master [20:32:52] AaronSchulz: I'm running puppet now (which will stage it but not enact the change) [20:33:01] ok [20:33:04] at the right time I'll kick the proxies and they'll pick up the change. [20:33:34] so if this works, what do we do next? [20:33:40] we test it! [20:33:43] :P [20:33:51] ok, I'm ready. [20:33:58] well, that's how you know "it works" :) [20:34:01] say when and I'll kick the proxies. [20:34:06] I mean should we do any more wikis? [20:34:13] maplebed: you can push it now [20:34:55] !log deployed change to swift and mediawiki for MW to write thumbnails to swift instead of rewrite.py with aaron [20:34:56] done. [20:34:58] Logged the message, Master [20:35:37] I don't think it's working [20:36:32] I'm loading new thumbnails in test (and I'm seeing them in my browser) but I don't see them in the container. [20:38:07] verified that new objects are still getting written to the commons containers. [20:38:59] heh [20:42:21] hooray!!! [20:42:23] it works! [20:43:01] and it's got a sha1 hash too! [20:43:07] heh [20:43:14] υαυ [20:43:14] er [20:43:14] yay! [20:43:23] maplebed: now when do we schedule commons? [20:43:53] should we do a few larger wikis today and let it sit for the weekend, then do commons on Tuesday? [20:44:17] ok, which wikis? nlwiki? plwiki? [20:44:51] umm... i dunno... how do you usually do tiered deploys like this? [20:46:05] pl and nl only have about 50 objects each. [20:46:19] maybe they only use commons [20:46:33] yeah [20:46:35] we can check the shard list! [20:46:54] I haven't tested my change on a sharded container, though I think it'll work. [20:47:07] but I'm game. [20:47:53] maybe itwiki, ruwiki, and zhwiki? [20:48:01] ok. [20:48:04] let's start with one [20:48:07] so I can test teh sharding stuff [20:48:11] then do the others. 
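For the "do I actually see it in the container" check above, the stock swift client works — the auth URL, credentials and the -local-thumb container suffix here are assumptions about the setup, and the big wikis shard their thumb containers, so the name may need the shard appended:

    swift -A <auth-url> -U <account:user> -K <key> stat wikipedia-mediawiki-local-thumb    # object count should tick up as new thumbs land
    swift -A <auth-url> -U <account:user> -K <key> list wikipedia-mediawiki-local-thumb | grep 'Some_file.jpg'    # or grep for the file just requested (name illustrative)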
[20:48:45] * AaronSchulz wishes pushing code was faster [20:48:48] sigh, ok [20:49:00] itwiki first [20:49:05] ok. [20:49:43] New patchset: Bhartshorne; "disabling thumbnail writes for it wiki (tests passed) to go with gerrit change r7890" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7894 [20:50:05] New patchset: Aaron Schulz; "Made itwiki use new thumb hooks." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7895 [20:50:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7894 [20:50:47] maplebed: I'll wait for you to be ready to press the last ENTER [20:50:51] ok. [20:51:51] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7894 [20:51:53] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7894 [20:54:50] AaronSchulz: go ahead. I've turned off writing in swift for it. [20:55:28] and verified that writes aren't happening. [20:55:28] * chrismcmahon surfs itwiki looking at pictures [20:55:31] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7739 [20:55:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7739 [20:55:37] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7895 [20:55:41] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7895 [20:55:46] chrismcmahon: our change should be totally transparent [20:55:55] the only effect might be that images are a bit slower [20:56:04] but you shouldn't see any actual errors. [20:56:07] please let us know if you do. [20:56:08] :D [20:56:19] I did notice a slightly slow first load, but really minor [20:56:31] first load is always slow; [20:56:44] our change would make second and third load slow (while the change is happening) [20:56:50] I guess I could do this in IE7 for laughs [20:57:21] !log several package updates on payments* and silicon [20:57:24] Logged the message, Master [20:57:29] although I think I lose a little more of my soul every time I use IE7 [20:57:38] New patchset: Ottomata; "files/nagios/check_udp2log_log_age - adding slow_log_files list." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7896 [20:57:58] New patchset: Aaron Schulz; "Revert "Made itwiki use new thumb hooks."" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7897 [20:57:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7896 [20:58:06] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7897 [20:58:08] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7897 [20:58:29] AaronSchulz: you want me to roll back the swift change for itwiki too? [20:58:44] yeah [20:59:07] itwiki isn't on wmf3, and I was calling a wmf3 function there [20:59:21] doh! [20:59:37] New patchset: Bhartshorne; "reverting disabling thumbnail writes for it wiki (tests passed) to go with gerrit change r7890" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7898 [20:59:41] which wiki should we do instead? 
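Each of these per-wiki flips follows the pattern maplebed describes above: merge the puppet change that disables rewrite.py thumbnail writes for the wiki, let the proxies pick it up, then Aaron syncs the MediaWiki config. On one proxy box that looks roughly like this — the service handling is the stock swift-init form and may not match the exact init setup in use here:

    puppet agent --test        # pull the merged change; the new proxy config lands on disk but the running workers keep the old one
    swift-init proxy reload    # "kick the proxy": graceful reload so the workers re-read the middleware config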
[20:59:54] something on wmf3 with some decent file count
[20:59:57] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7898
[20:59:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7898
[20:59:57] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7898
[21:00:16] commonswiki is like the only option, heh
[21:00:30] well, we can wait I guess
[21:00:39] we won't get any bug reports on test.
[21:00:59] yeah...
[21:02:27] well, any reason not to do commons?
[21:02:47] I guess we may as well
[21:03:42] Fee-fi-fo-fum!
[21:03:51] * maplebed preps the commons change.
[21:04:42] New patchset: Aaron Schulz; "Made commonswiki use new thumb hooks." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7900
[21:05:05] New patchset: Bhartshorne; "disabling thumbnail writes for commons wiki (tests passed) to go with gerrit change r7890" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7901
[21:05:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7901
[21:06:42] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7901
[21:06:44] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7901
[21:07:05] so AaronSchulz - I'll push the swift side live, verify writes aren't happening, then tell you to push the MW side live, then we verify writes are happening.
[21:07:07] is that right?
[21:07:37] ok
[21:08:41] AaronSchulz: pushed and verified. go go go!!!
[21:08:43] ;)
[21:08:52] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7900
[21:08:54] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7900
[21:09:41] it's writing again.
[21:10:53] purging works
[21:16:38] AaronSchulz: I think we're good!
[21:16:43] I'll post something to the commons VP.
[21:17:03] PROBLEM - MySQL Idle Transactions on db61 is CRITICAL: Connection refused by host
[21:17:03] PROBLEM - MySQL Slave Running on db61 is CRITICAL: Connection refused by host
[21:17:03] PROBLEM - mysqld processes on db61 is CRITICAL: Connection refused by host
[21:17:21] PROBLEM - MySQL Recent Restart on db61 is CRITICAL: Connection refused by host
[21:17:21] PROBLEM - MySQL disk space on db61 is CRITICAL: Connection refused by host
[21:17:48] PROBLEM - MySQL Replication Heartbeat on db61 is CRITICAL: Connection refused by host
[21:18:06] PROBLEM - MySQL Slave Delay on db61 is CRITICAL: Connection refused by host
[21:18:06] PROBLEM - Full LVS Snapshot on db61 is CRITICAL: Connection refused by host
[21:19:34] @info db61
[21:19:34] jeremyb: Unknown identifier (db61
[21:26:42] AaronSchulz: I'm noticing that there are two HEAD requests to the proxy server for each image generated. Is that expected?
[21:28:32] Maybe he just wants some head.
[21:30:17] * Reedy beats Damianz
[21:30:39] * Damianz returns Reedy a 404
[21:31:47] maplebed: I think that's right
[21:32:13] 19 log lines for one image request.
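[Editor's note: the verification step maplebed describes at 21:07 (swift side off, confirm writes stop; MW side on, confirm writes resume) amounts to watching the container object count across the flip. A minimal sketch with python-swiftclient; endpoint, credentials, and the commons thumbnail container name are placeholders, and a sharded wiki would need this per shard.]

```python
# Sketch: poll a container's object count around the config flip.
# Endpoint, credentials, and container name are placeholder assumptions.
import time
import swiftclient

conn = swiftclient.Connection(
    authurl='http://ms-fe.example.wmnet/auth/v1.0',  # assumed endpoint
    user='mw:thumb', key='SECRET')                   # assumed credentials

CONTAINER = 'wikipedia-commons-local-thumb'          # assumed container name

def object_count():
    headers = conn.head_container(CONTAINER)
    return int(headers['x-container-object-count'])

baseline = object_count()
for _ in range(10):
    time.sleep(30)
    delta = object_count() - baseline
    # Expect roughly flat while only the puppet/rewrite.py side is deployed,
    # then climbing again once the MediaWiki config change is live.
    print('%+d objects since baseline' % delta)
```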
[21:32:14] \o/
[21:32:19] while we are at it, maybe we can look at:
[21:32:21] 2012-05-17 21:10:20 srv222 commonswiki: Could not store thumbnail.Site: `wikipedia` Lang: `commons` src: `/tmp/transform_9d421e-1.jpg` dst: `mwstore://local-swift/local-thumb/4/44/Point_State_Park_in_Fall.jpg/50px-Point_State_Park_in_Fall.jpg`
[21:32:31] these have been around for quite a while
[21:32:45] Reedy: would have been more appropriate to DELETE Damianz...
[21:32:47] mostly in the 4/44 - 4/48 range
[21:33:17] swift-thumb.log
[21:34:52] AaronSchulz: I'm gonna post to the VP first, then we can dive in. sound ok?
[21:35:02] sure
[21:35:03] PROBLEM - NTP on db61 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:35:08] maplebed: I bet I can cut that down to one HEAD btw
[21:35:21] AaronSchulz: just commenting out the code is cheating
[21:35:22] by using doQuickOperations()
[21:39:46] !log resumed replication to es1002
[21:39:49] Logged the message, Master
[21:41:05] binasher: ^^ db61
[21:41:39] RECOVERY - MySQL Slave Running on es1004 is OK: OK replication
[21:41:55] db61's not even on /dbtree/ ?
[21:42:12] @info db36
[21:42:12] jeremyb: [db36: s1] 10.0.6.46
[21:42:19] !log es1004 is replicating again
[21:42:22] Logged the message, Master
[21:43:24] disabled notifications for db61
[21:43:41] that works too
[21:44:06] what was up with db12?
[21:44:41] New patchset: Ryan Lane; "In a slot based system we need to ignore the slots" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7906
[21:44:57] i'm not seeing anything about db12 in my scrollback.. what was up with it?
[21:46:04] binasher: was persistently ~10-60 secs behind according to API for a significant period of time
[21:46:42] around 10:20ish UTC today
[21:47:04] also see http://ganglia.wikimedia.org/latest/?r=day&cs=05%2F15%2F2012+17%3A45+&ce=05%2F17%2F2012+17%3A45+&m=&c=MySQL+pmtpa&h=db12.pmtpa.wmnet&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS
[21:47:14] yeah, db12 is a totally underpowered dumping ground that gets all of the enwiki watchlist and recentchange queries
[21:47:38] there's a lot of graphs that level out or otherwise change dramatically about midway through the interval
[21:47:53] indeed
[21:48:12] shutting down mysql and rebooting to run a new kernel will do that.
[21:48:34] hah
[21:49:32] binasher: and the mysql_slave_running graph on that page?
[21:50:13] maplebed: https://gerrit.wikimedia.org/r/7867
[21:51:33] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7907
[21:51:35] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7907
[21:53:32] AaronSchulz: I'm all done. Wanna look through that bug?
[21:53:55] sure
[21:53:56] !log reverted mobile change from this morning - testing completed.
[21:54:00] Logged the message, Master
[21:54:15] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[21:55:49] jeremyb: i restarted gmond on db12. the mysql module has a pretty big bug in that if it can't connect to mysql when its first started, it doesnt retry
[21:58:08] * jeremyb reloads ganglia
[22:00:39] !log migrated centralauth.global_group to innodb
[22:00:42] Logged the message, Master
[22:01:33] !log migrating centralauth.spoofuser to innodb via osc (13.5mil rows)
[22:01:37] Logged the message, Master
[22:02:28] huh?! why not already innodb?
[22:03:07] binasher: i see more than a few graphs now have data, thanks!
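[Editor's note: binasher's observation at 21:55 is that the gmond mysql module connects once at startup and never retries if that first connect fails. A minimal sketch of the obvious fix, a lazily reconnecting handler written in the style of a gmond Python metric module; the chosen metric, connection parameters, and function names are illustrative assumptions, not the module's actual code.]

```python
# Sketch: (re)connect inside the metric handler instead of only at module
# init, so a MySQL outage at gmond startup does not blank the metric forever.
# Connection parameters are placeholders.
import MySQLdb

_conn = None

def _get_conn():
    """Return a live connection, retrying the connect on every poll if needed."""
    global _conn
    if _conn is None:
        try:
            _conn = MySQLdb.connect(host='localhost', user='ganglia',
                                    passwd='SECRET', connect_timeout=2)
        except MySQLdb.Error:
            _conn = None              # leave unset; the next poll retries
    return _conn

def threads_connected_handler(name):
    """Example gmond-style handler: connected MySQL threads, 0 if MySQL is down."""
    global _conn
    conn = _get_conn()
    if conn is None:
        return 0
    try:
        cur = conn.cursor()
        cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
        row = cur.fetchone()
        return int(row[1]) if row else 0
    except MySQLdb.Error:
        _conn = None                  # drop the broken handle; retry next poll
        return 0
```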
[22:03:31] it might take a few more minutes for them all
[22:03:44] really?
[22:03:45] i should figure out how to fix that :/
[22:03:57] can you make it happen in labs?
[22:07:04] that would probably be a better place to hack on it
[22:07:15] is there a labs ganglia?
[22:10:07] i think so
[22:10:21] well there's a ganglia for labs that is
[22:10:29] amazingly http://ganglia.wmflabs.org/latest/
[22:10:35] is there a labs project for ganglia? i think so too
[22:11:14] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Ganglia
[22:13:43] ooh
[22:13:51] ok, i need to get more immersed in labs
[22:14:13] labs is really crippled by not having per project puppet
[22:14:46] unless ya'll want to give out +2/merge/submit access to the test branch liberally
[22:15:08] but anyway that's being worked on. at least when labs itself is not freaking out
[22:18:26] !log migrated centralauth.wikiset to innodb
[22:18:29] Logged the message, Master
[22:18:49] binasher: 17 22:02:28 < jeremyb> huh?! why not already innodb?
[22:19:32] there's stuff all over that was never moved
[22:19:45] ;-(
[22:19:54] Reedy: there's more? :(
[22:20:09] None that I know of
[22:20:17] i haven't done an exhaustive search, i was pretty sad to find the centralauth tables
[22:20:19] But wouldn't suprise me if there was somewhere
[22:20:38] are the old CA tables still laying around on one of the other clusters?
[22:20:51] that's possible
[22:21:05] I seem to recall finding them somewhere else
[22:21:07] I may have RT'd it
[22:22:19] can't you just `find /a -name '*.MYI'` or something?
[22:22:26] yup
[22:23:08] Nope, you droped them already
[22:23:09] ugh
[22:23:15] I wish RT would notify you of more stuff by email
[22:23:59] there are recent tables that are myisam.. moodbar_feedback
[22:24:19] filejournal
[22:24:31] AaronSchulz: ^^ didn't you just create filejournal? grr
[22:24:38] lol
[22:24:42] yes, it's relatively new
[22:25:05] i never even saw the sql for that
[22:25:28] binasher: how many tables?
[22:25:46] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=maintenance/archives/patch-filejournal.sql;h=114297c6d88f82da2511f2906f6f1635a70e2a52;hb=HEAD
[22:25:51] );
[22:26:00] aft_article_filter_count
[22:26:11] but not other aft tables
[22:26:13] sigh
[22:26:20] it has not table options comment
[22:26:23] *has no
[22:26:46] there are 13 myisam tables in enwiki right now
[22:26:56] archive_old = droppable?
[22:26:58] it's to easy to miss this stuff
[22:27:42] I'd suspect so
[22:28:22] Yeah, it's empty
[22:28:33] an OLD version of archive
[22:29:33] i finally get to run DROP TABLE :D
[22:30:00] user_old seems equally suspect
[22:30:06] but for now..
[22:30:08] binasher: yeah, all those journal table must be myisam then, so they need to be changed to innodb
[22:30:08] * binasher goes afk
[22:30:21] old 310163, "current" 16948031
[22:32:24] And querycache_old is probably good to go..
[22:32:55] There's load of old tables..
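[Editor's note: the MyISAM audit jeremyb suggests with `find /a -name '*.MYI'` can also be done per server from information_schema, which catches the engine directly rather than inferring it from files on disk. A minimal sketch; the host and credentials are placeholders.]

```python
# Sketch: list every MyISAM table on a database host via information_schema.
# Host and credentials are placeholder assumptions.
import MySQLdb

conn = MySQLdb.connect(host='db-host.example.wmnet', user='ops', passwd='SECRET')
cur = conn.cursor()
cur.execute("""
    SELECT table_schema, table_name, table_rows
      FROM information_schema.tables
     WHERE engine = 'MyISAM'
       AND table_schema NOT IN ('mysql', 'information_schema')
     ORDER BY table_schema, table_name
""")
for schema, table, rows in cur.fetchall():
    print('%s.%s (~%s rows)' % (schema, table, rows))
```

Each table this turns up is a candidate for either `ALTER TABLE ... ENGINE=InnoDB` (via an online schema change tool for big ones, as with centralauth.spoofuser above) or `DROP TABLE` if it is an abandoned *_old copy.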
[22:55:26] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 214 seconds
[22:55:26] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 213 seconds
[23:11:02] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[23:11:11] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[23:12:23] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:16:44] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:17:47] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 0.0094112195122
[23:22:08] PROBLEM - Packetloss_Average on oxygen is CRITICAL: XML parse error
[23:47:16] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 196 seconds
[23:47:34] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 201 seconds
[23:50:58] what's the difference between efRaiseThrottle() (CommonSettings.php:1841) and wgRateLimitsExcludedIPs ?
[23:54:19] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 17 seconds
[23:54:55] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds