[00:00:01] New patchset: Jalexander; "Sync git config files with live SVN files (en and fr additions/adjustments and ru deadlink removal )" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18440 [00:00:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18440 [00:04:16] New review: Dzahn; "thanks for syncing" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/18440 [00:04:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18440 [00:11:33] Reedy: since you edited [[singer]] on wikitech and it no longer runs contact.wm.o, where does that run now? [00:12:11] (same for racktables) - I thought the ones you removed were all old services, but they're still up, somewhere [00:12:48] Krinkle: racktables is on hooper [00:13:52] thx [00:14:57] contacts seems to be still singer indeed [00:15:10] per DNS [00:16:06] yeah I know Roan saw it there a couple days ago (contact civi) [00:16:15] we have a class in ./misc/outreach.pp but it's not complete [00:16:25] so not actually applied to a node yet [00:16:37] we should test that one in labs [00:17:14] all it does is system user, webserver and an apache config so far [00:21:14] New patchset: Dzahn; "planet - add webserver to role class, define document root dir" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18442 [00:21:54] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18442 [00:34:04] mutante: I added Nagios to {{Server}} [00:34:28] e.g. http://wikitech.wikimedia.org/view/Singer [00:39:46] Krinkle: nice, i like that [00:40:10] it guesses the host, so even pages with a totally empty {{Server}} have a (probably) working link: http://wikitech.wikimedia.org/view/Pdf1 [00:40:30] Krinkle: actually, how about using this link instead http://nagios.wikimedia.org/nagios/cgi-bin/status.cgi?host=singer [00:40:37] shows all services on the host [00:40:51] I used the link that I got from nagios when clicking a hostname [00:41:05] that one looks better, thanks. I'll use that [00:41:14] this is the one you get when further clicking "View Status Detail For This Host" [00:41:20] ok [00:41:26] ic [00:41:52] done [00:42:04] :) [00:45:19] that's probably taking it to far, but if you had, say a server page for each analytics host, analytics1001, analytics1002 and so on.. and a Category: page on the wiki for them, that would then link to host groups like http://nagios.wikimedia.org/nagios/cgi-bin/status.cgi?hostgroup=analytics-eqiad&style=detail [00:45:38] :) [00:45:42] some other time maybe :D [00:45:46] heh, yeah [00:52:32] New patchset: Dzahn; "planet - add apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18444 [00:53:11] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18444 [00:54:04] New patchset: Dzahn; "planet - add apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18444 [00:54:42] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18444 [00:58:53] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [00:58:53] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [01:39:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 187 seconds [01:40:43] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 228 seconds [01:48:04] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 671s [01:54:31] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 9 seconds [01:58:16] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 12s [01:58:17] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 14 seconds [02:04:53] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [05:16:53] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [05:21:51] mornin [05:26:56] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [05:30:50] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [05:30:50] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [05:30:50] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [05:36:21] yes it is [05:40:48] lies [05:54:15] so am I the only one who didn't get moved over to wikimedia-staff? [06:07:53] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [06:51:37] anyone around who can look at a (simple) dns change? 
/tmp/ms10-atg.diff on sockpuppet [07:42:55] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [08:05:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [08:10:21] PROBLEM - HTTP on kaulen is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:11:33] RECOVERY - HTTP on kaulen is OK: HTTP OK HTTP/1.1 200 OK - 461 bytes in 0.001 seconds [08:45:56] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [09:00:55] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [09:15:55] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [09:30:54] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [10:59:52] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [10:59:52] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [11:37:29] it's quiet around here [11:44:10] puppet has not run [11:44:46] * Damianz starts playing the drums really loudly next to parav [11:44:51] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 182 seconds [11:45:00] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds [11:46:22] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [11:46:30] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [11:46:48] !log Started disk firmware update on nas1001-a [11:46:58] Logged the message, Master [11:49:37] grumble grumble lucid [11:51:54] !log nas1001-a disk firmware upgrade completed. Started disk firmware upgrade on nas1001-b [11:52:02] Logged the message, Master [11:55:18] what's nas1001-a? [11:55:21] ah netapps? [11:55:25] yes [11:55:32] yay! [11:55:33] gonna upgrade them [11:55:45] i was planning that right before ipv6 day ;) [11:55:53] disk firmware upgrade? the ontap upgrade is doing that automatically I think [11:55:54] !log nas1001-b disk fw upgrade completed [11:56:00] not this version [11:56:03] Logged the message, Master [11:56:04] I'm sure that it did so for older versions, not sure for 8.1 [11:56:09] okay [11:56:19] they recommend doing it 'one day before' [11:56:23] but I don't need to do it in the background ;) [11:56:26] much quicker this way [11:56:26] "one day before"?! [11:56:33] then it does one disk at a time [11:56:36] I just did all at once [11:56:45] aha [11:56:55] well, it made sense to do it on the ontap upgrade [11:57:08] but who knows why they changed that [11:57:26] i'll upgrade all other firmwares too while i'm at it [12:05:52] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [12:06:57] the tampa netapp seems pretty fucked up in terms of cabling [12:08:13] how so? [12:08:21] it's throwing errors [12:08:50] one service processor doesn't even let me log in [12:10:48] New patchset: ArielGlenn; "ms10 gets wm.o hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18464 [12:11:29] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18464 [12:11:48] I have to say the blue and white is so much easier on the eyes [12:13:42] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18464 [12:32:06] afk for a little while, going to try to get a tent for camping this weekend [13:01:22] If I have a MW maintenance script running on the cluster, is it possible for this script to obtain a list of all wikis on that node and do db interaction with those wikis? [13:03:13] "node"? [13:03:24] we have cluster db lists [13:05:14] Reedy: with "node" I mean a machine with master database on it, not sure what all the terminology is, or even if I understand the basic setup of the cluster correctly - are there any docs on this? [13:06:39] Reedy: basically what I need to do is update a single table per node that holds some cache and then poke all wikis covered by this node for invalidation and stuff [13:08:29] we call it a cluster [13:08:41] s1-s7 [13:08:55] you could just use the dblists [13:09:20] pop the first dbname off, do the update, then iterate over that wiki and the rest of the list and poke them to invalidate stuff [13:10:10] !log Upgraded SP firmware to 1.3 on nas1001-a and nas1001-b [13:10:19] Logged the message, Master [13:12:11] Reedy: heh, so we got a cluster of clusters? :P [13:12:16] Reedy: so no docs on this? [13:12:42] Reedy: how do I obtain such a dblist from MW code? [13:12:48] You don't [13:12:57] :( [13:13:16] well, you could, but you're just reading a file that's above the directory root [13:13:25] most things like this are spawned from shell [13:21:15] Reedy: oh really - but those then poke directly at the database rather than using the relevant abstraction layers MW provides? [13:21:28] It depends [13:21:47] some do, some don't [13:22:11] mwscript dbname maintenanceScript.php would run that script on the dbname [13:23:14] Oh, that's very useful for us [13:23:23] I guess we'll go with that approach then [13:23:32] At least, if I'm understanding it correctly [13:24:03] Reedy: are there really no high level docs on the cluster stuff or a list of "stuff you ought to know before writing code that will run on it"? [13:24:57] I don't think so [13:25:25] It's not that complex, so adapting to fit shouldn't need much work [13:31:28] Reedy: well, maybe it's not complex, but knowing about mwscript for instance actually does help us [13:32:17] "all" mwscript does is allow people to work wikiversion agnostically [13:32:45] so they don't have to find out if dbname123 is on mw version 1 2 3 or 4 [13:44:17] notpeter: hi [13:45:06] notpeter: what is CT's handle? [13:45:40] woosters usually [13:46:48] thanks [13:55:51] !log Started OnTap upgrade to 8.1 on nas1001-a [13:56:05] Logged the message, Master [14:07:45] shitty netapp [14:07:52] why is it complaining about reservation conflicts all the time [14:11:26] !log Started OnTap upgrade to 8.1 on nas1001-b [14:11:35] Logged the message, Master [14:39:37] !log OnTap upgrade on nas1001-a and -b complete [14:39:46] Logged the message, Master [15:03:46] cmjohnson1: you're not in the dc, right?
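A minimal sketch of the dblist approach Reedy describes above: run the shared-table update against the first wiki in a cluster's dblist, then poke every wiki in the list to invalidate its cache. The dblist path, the script name, and the --update/--invalidate flags are placeholders, and the exact mwscript argument form should be checked against the wrapper itself (the chat quotes it as `mwscript dbname maintenanceScript.php`).

```bash
#!/bin/bash
# Sketch only: DBLIST, SCRIPT and the flags below are hypothetical; verify the
# mwscript invocation syntax on fenari before using this for real.
DBLIST=/usr/local/apache/common/s7.dblist      # assumed dblist location
SCRIPT=extensions/Foo/updateSharedCache.php    # hypothetical maintenance script

# The first wiki in the cluster holds the shared cache table, so update it there...
first=$(head -n 1 "$DBLIST")
mwscript "$SCRIPT" --wiki="$first" --update

# ...then iterate over the whole list (first wiki included) and poke each wiki
# so it invalidates whatever it cached from that table.
while read -r dbname; do
    [ -n "$dbname" ] && mwscript "$SCRIPT" --wiki="$dbname" --invalidate
done < "$DBLIST"
```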
[15:13:07] mark: headed in now...about 25mins [15:13:13] ok [15:13:17] i have some issues with the netapp [15:13:29] okay...i fixed the cabling to match eqiad [15:13:35] yeah it does seem better now [15:13:40] but there are two broken drives [15:13:43] and the system is behaving weirdly [15:13:50] so will need at least those two drives pulled [15:13:59] ok [15:17:52] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [15:27:26] no puppet on spence for ~60 hrs? [15:27:55] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [15:29:15] y u break puppet jeremyb ? [15:29:40] * jeremyb ?! [15:29:59] ;) [15:30:20] it was wikipetan! [15:31:58] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [15:31:58] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [15:31:58] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [15:33:59] LeslieCarr rrrrrrrrrrrr u working today? [15:34:12] not really .... [15:34:16] i'm doing some stuff for jeff [15:34:21] then i need to pack [15:34:22] aye ok [15:34:28] did you get a chance to figure anything out with the dells yesterday? [15:34:37] ask cmjohnson1 or RobH [15:34:58] ok cool, danke [15:35:10] either of you two around? [15:35:40] ottomata: cmjohnson's probably on the road. not here at least [15:36:58] (but back on soon) [15:41:43] aye cool, danke [15:41:59] RobH, we need some help with dells today, if you have some time [15:42:58] ottomata: I can keep trying to troubleshoot those with you, if you'd like [15:43:32] but tbh, it does seem like a networking issue [15:44:03] thanks! does that mean we need Leslie or can RobH figure it out too? [15:44:14] I'm not really sure what to do from this point [15:45:03] what's up? [15:45:30] huh ? [15:45:33] networking issue ? [15:45:40] oh wait [15:45:41] mark: the new analytics dells pxe boot [15:45:53] the thing i was going to check out yesterday but got caught in stupid firewall shit [15:45:54] but then their dhcp requests for install don't hit brewster [15:45:59] which makes me think it's firewalled [15:46:04] ok [15:46:18] looking [15:46:32] and you mentioned that the analytics cluster can't talk to the rest of the cluster. which is why I think it's networking related [15:46:41] it can [15:46:44] i didn't say that [15:46:54] then I misread. sorry [15:47:39] it's not a network issue [15:47:50] you guys didn't add the subnet to dhcpd.conf as I said you should do yesterday :) [15:47:51] ok, cool [15:47:58] ah [15:48:00] hmmmmmm [15:48:19] I did, didn't i? [15:48:23] # analytics1-c-eqiad subnet [15:48:23] subnet 10.64.36.0 netmask 255.255.255.0 { [15:48:38] oh [15:48:40] so you did [15:48:42] my git merge failed [15:49:06] aye locally? [15:49:06] aye [15:49:14] alright [15:49:18] did you tcpdump on brewster? [15:49:20] or stuff like that [15:49:27] I did not…notpeter? [15:49:33] tail of messages [15:49:36] with a grep [15:50:07] tcpdump would have got it, I suppose [15:53:03] but that change didn't make it to brewster [15:53:20] argh [15:53:20] it did [15:53:22] i need coffee [15:53:44] anyway, can someone try to pxe boot one of those servers? [15:53:51] yep [15:55:48] oh I think i see the issue [15:56:10] ospf is not enabled on that interface [15:56:20] * LeslieCarr hides [15:56:21] doh [15:56:23] ah, gotcha [15:56:33] ok, now i go pack and jump on a plane [15:56:36] bye [15:56:45] travel safely!
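The install failure above came down to a dhcpd.conf change that never reached brewster (a failed local git merge) plus a missing OSPF statement on the new interface. Two quick checks on brewster would have confirmed the first half; this is only a sketch, and /etc/dhcp3/dhcpd.conf is an assumed path for a lucid-era dhcp3-server install, so adjust to wherever puppet actually writes the file.

```bash
# On brewster (sketch; config path and interface name are assumptions).

# Did the analytics1-c-eqiad subnet stanza from the merged change actually land here?
grep -A 3 'analytics1-c-eqiad' /etc/dhcp3/dhcpd.conf

# Is the DHCPDISCOVER from the new host reaching this box at all while it PXE boots?
sudo tcpdump -n -i eth0 'port 67 or port 68'
```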
[15:58:32] mark should we do a distinct subdomain for the frack hosts? [15:58:40] i think so [15:58:53] would be clear [15:58:57] frack-eqiad.wmnet ? [15:59:03] or just frack.eqiad.wmnet [15:59:03] frack.eqiad.wmnet? [15:59:10] k i like that [16:00:09] try again guys [16:00:12] ok [16:05:25] cmjohnson1: are you available now? [16:05:33] there are two [16:05:34] first one [16:05:38] I don't know how the shelves are labeled [16:05:42] there are three shelves [16:05:48] the netapp calls them "00", "01" and "10" [16:05:57] high tech [16:06:08] shelf 00 [16:06:19] but i don't know if that reflects your labels [16:06:32] it's drive 11 of that shelf [16:06:37] so drive 00.11 [16:06:43] i [16:06:49] i'll look if i can make it blink or something [16:07:24] yes [16:08:18] that's another one [16:08:29] first do shelf 00 drive 11 [16:08:41] yes it's gone [16:08:44] next: [16:08:51] shelf 10 disk 3 [16:08:51] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours [16:08:52] indeed :) [16:09:21] cool [16:09:27] i'll have to make tickets with netapp to get them replaced [16:09:30] so keep em around [16:09:36] thanks [16:09:40] I -think- the netapp will work fine now [16:10:11] will do [16:15:42] haha oops [16:15:45] I created a case with netapp [16:15:48] and had to select priority [16:15:54] I read "P1: the system is not serving any data" [16:15:57] that seemed appropriate [16:16:00] I kind of misread that ;-) [16:16:03] haha [16:24:35] notpeter did you try the netboot again? [16:24:47] i have an interview and meetings coming up, so i'll be kinda not responding much [16:25:02] ottomata: yeah, I'm trying some junk [16:25:06] I'll just keep at it [16:25:11] ping you with updates as they come [16:25:33] ok cool [16:25:34] danke [16:25:54] ottomata: liking the weather? [16:26:19] yeah man, just walked out to grab some lunch, got back in just in time before it went crazy crazy [16:26:23] but my bike is outside! [16:32:40] !log Started OnTap upgrade to 8.1 on nas1-a and nas1-b [16:32:49] Logged the message, Master [16:43:10] Logged the message, Master [16:48:45] hey mark, still having problems with pxe boot on analytics1011. I'm looking at tcpdumps, and I can confirm that dhcp requests get to brewster when it pxe boots, but not when the ubuntu installer is trying to get a network connection. thoughts? [16:49:23] interesting... [16:49:31] so isolinux loads? [16:49:52] ja [16:53:11] notpeter: more than one ethernet device and uses the wrong one? [16:53:52] mutante: is that a thing that happens? [16:54:00] no [16:54:15] yeah, I see no dhcp traffic coming into brewster [16:54:17] unless you're using multiple NICs [16:58:07] watchmouse says s7 is broken [16:58:14] (ukwiki main page) [16:58:56] <^demon> jeremyb: WFM...might've been transient. I just got an ERR_CANNOT_FORWARD on mw.org, but it went away on refresh. [16:59:05] ^demon: see #-tech [16:59:27] <^demon> Ah ok [17:00:01] why just s7 though... [17:00:12] <^demon> -tech or here? MAKE UP YOUR MIND GOSH!! ;-) [17:00:20] hahah [17:00:27] ^demon: i meant see the people in -tech [17:00:29] idk where [17:03:12] err, that was supposed to be "i don't care where" [17:03:17] fingers too fast ;) [17:07:32] !log temporarily stopping puppet on brewster [17:07:40] Logged the message, notpeter [17:08:27] I'm sure y'all are familiar with OCP, but this article is quite pleasant. http://www.businessinsider.com/inside-facebooks-plan-to-revolutionize-the-entire-hardware-industry-2012-8
[17:08:44] We do time and motion studies on how long does it take a technician to repair a failed component on an open compute server relative to a Tier 1 HP or Dell server. In some areas there's an 8x decrease in the amount of time. That allows us to have one technician supporting up to 15,000 servers [17:08:46] so! [17:09:45] we have, what, ~15 people on ops? that means we can support like 225,000 servers! [17:12:08] dschoon: you get us the servers, we'll make them go :) [17:12:22] * dschoon waves magic wand in the direction of notpeter [17:12:44] * notpeter gives dschoon a pamphlet [17:12:45] ;) [17:12:59] <^demon> dschoon: Vaguely reminds me of my high school, where we had 1 FT guy and 1 PT guy doing IT support for ~2500 laptops :) [17:14:26] ^demon: wait. wait. your high school had how many laptops? [17:15:29] 2 and a half thousand laptops? ... my school had maybe 3 laptops ... [17:15:29] <^demon> All the students were given laptops. [17:15:48] wow [17:15:57] <^demon> That program lasted....the 4 years I was there? Then the county decided giving students laptops was massively expensive for little gain. [17:16:26] nah. just look at you! if you hadn't been given a laptop you wouldn't be in this chatroom now [17:16:29] or something [17:17:06] <^demon> [[w:Matoaca High School]] [17:18:08] oh wow. that's a lot of tech programs [17:21:53] ^demon: your school has no geo template ;( [17:21:59] <^demon> jeremyb: My school sucked. [17:43:57] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [17:59:52] notpeter, baaaackatcha [17:59:53] how goes it? [18:02:01] poorly [18:03:54] I'm having a very hard time figuring out why one round of dhcp requests gets through and succeeds, and the second never even hits the imaging server (according to tcpdump) [18:04:01] dawwwww [18:04:07] imaging server is brewster too, right? [18:04:11] yeah [18:04:29] this didn't matter? [18:04:31] 'ospf is not enabled on that interface'? [18:05:32] notpeter: how many interfaces does brewster have? [18:05:37] maybe? I thought ma_rk verified that that was not the case [18:05:51] jeremyb: not sure, but it should all come into the same one [18:05:55] * jeremyb guesses most machines are static so they don't do dhcp after initial install [18:06:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [18:08:08] also, another really weird thing: [18:08:13] it has 2, right? [18:08:24] one is the mgmt interface [18:08:28] right? [18:08:29] ja? [18:08:58] yeah, but who cares about mgmt? [18:09:13] i unno, you asked :) [18:09:32] notpeter: maybe any other dhcp server answering instead of the imaging server? [18:09:52] the mac in the dhcpd conf says that a1011's mac is 04:7d:7b:a5:e1:b2. when I drop to busybox, it shows the mac as 90:e2:ba:11:82:30. when I try to use dhcp with the latter, no successful pxe boot (this is all for eth0) [18:10:10] T3rminat0r: seems unlikely? I don't think we have any others [18:10:21] the above makes me worry that cabling might be wrong? [18:10:23] dunno [18:10:25] notpeter: I surely don't know if you do ;) [18:11:16] RobH: are you in the DC today? [18:11:32] I just know it cost me half a day to figure out that there was a dhcp relay running somewhere when I set up an imaging server at my university.
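One way to pin down the MAC mismatch notpeter describes (dhcpd.conf and the BIOS say 04:7d:7b:a5:e1:b2, while the installer's busybox shell shows 90:e2:ba:11:82:30) is to dump every NIC's hardware address from the installer and compare them against brewster's dhcpd entry. A sketch, assuming the standard debian-installer busybox environment on analytics1011:

```bash
# From the d-i busybox shell (sketch; the MACs in the comments are the two values
# quoted above). Reading sysfs works even in a minimal busybox:
for nic in /sys/class/net/*; do
    echo "$(basename "$nic"): $(cat "$nic"/address)"
done
# If the interface the installer chose reports 90:e2:ba:11:82:30 while dhcpd.conf
# lists 04:7d:7b:a5:e1:b2, the host is either cabled to a different NIC than
# expected or the dhcpd entry needs updating -- both would fit the symptoms above.
```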
[18:11:35] nope, was all day yesterday [18:11:53] notpeter: you could try the BIOS and on the switch [18:12:21] I got the mac that is in the dhcp configs from bios [18:13:01] ottomata: yeah, it doesn't do anything with the second mac I linked [18:13:24] RobH: cool cool [18:13:33] but i mean, yeah if the dhcp on brewster gets packets, that must be the right MAC [18:13:35] ottomata: I'm pretty out of ideas at this point, tbh [18:13:48] brewster should log something when an unknown mac sends a request i hope... [18:13:56] ottomata: well, it only gets half the packets it should ;) [18:14:13] right [18:14:31] you guys have all these problems on multiple machines in the rack [18:14:32] notpeter: have you tried setting a static IP in busybox and then ping, etc. [18:14:32] right? [18:14:56] RobH: haven't tried any others in the rack [18:15:12] wouldn't it suck if that one bit of hardware had a bad bit? [18:15:18] jeremyb: tried. not sure of gw... [18:15:18] hmmmm good point [18:15:20] RobH: yes.... [18:15:24] i mean, it's prolly NOT the hardware, but just in case. [18:15:25] we can try 1012 [18:15:28] they all should be ready to boot [18:15:54] ottomata: want to give that a shot? I need to step away from the computer for a sec before I go crazy. [18:16:02] yeah i'll try that [18:16:04] see how far it gets [18:16:09] kk [18:19:10] notpeter, what tcpdump command are you using to watch packets there? [18:19:30] i think an12 is working... [18:19:37] it's booting ubuntu installer [18:20:00] oh it did that before too... [18:20:52] tcpdump -i eth0 -vvv -s 1500 '(port 67 or port 68)' [18:20:59] ahhh nope [18:21:05] Network autoconfiguration failed │ [18:21:05] │ Your network is probably not using the DHCP protocol [18:22:29] notpeter, can I just try to configure network manually? [18:22:55] go for it [18:25:13] hrmm, it gets dhcp lease [18:25:16] but does that in the installer? [18:26:06] eh? [18:26:20] it gets the lease from the server but can't configure itself? [18:26:34] i just configured 1012 network manually, but now am hung up on [18:26:47] Download debconf preconfiguration file [18:26:52] can't dl from apt.wikimedia.org [18:26:53] i think [18:28:08] yeah [18:28:08] Failed to retrieve the preconfiguration file │ [18:28:08] │ The file needed for preconfiguration could not be retrieved from │ [18:28:08] │ http://apt.wikimedia.org/autoinstall/preseed.cfg. The installation │ [18:28:08] │ will proceed in non-automated mode. [18:30:02] Jeff_Green: zirconium has a note that it's allocated for payments lvs [18:30:12] but my understanding is you have all the eqiad payments lvs you need now in the payments rack [18:30:21] so i think this is a misallocation from past work [18:30:33] it also appears offline, so i wanna reclaim that, if you can confirm [18:30:47] i think that's true, i've certainly never touched it [18:31:03] awesome, thanks! [18:31:14] np [18:31:29] mutante: zirconium will be taking the place for this. it's a high performance misc server, which is overkill for just planet
[18:31:37] but as the blogs will eventually move onto the same box, it's worth the allocation [18:33:43] RobH: alright, thank you [18:34:17] updated ticket as well, I'm going to drop another ticket in core-ops to document that this host will eventually run blogs [18:34:36] great [18:36:58] robh not sure if you need to update DNS to recommission that host, but fwiw I'm about to roll a few dns changes [18:37:11] well, mutante does ;] [18:37:23] but pretty sure it doesn't have to happen now [18:37:34] since he will want a network admin to set up vlans and the like, but thanks =] [18:37:43] ok well everybody hold off for a few mins KTHXBYE! [18:37:56] :-P [18:38:02] Ryan_Lane: I don't know much about puppet, but I did "files/apache/sites/bugzilla.wikimedia.org: SSLCertificateFile /etc/ssl/certs/star.wikimedia.org.crt [18:38:03] it's only a matter of time before our dns is in gerrit. [18:38:23] someone just has to update the authdns scripts to use git properly instead of svn ;] [18:38:28] RobH: yay, 5 minute outages will become 3 hour ones [18:38:32] the 'just has to' is an understatement [18:39:00] Ryan_Lane: I don't know much about puppet, but I did "git grep SSLCertificateFile" and found out that some sites (like bugzilla.wikimedia.org) have certs like /etc/ssl/certs/star.wikimedia.org.crt that do not appear in files/ssl. Does this mean they are not puppetized and are just installed locally? [18:39:10] Jeff_Green: ahh, but any dns related outages are automatically hours long ;] [18:39:36] only for the 10% of the world with sucky resolvers! [18:39:48] saper: well, it should have install_certificate somewhere in some manifest [18:41:18] saper: it's star.wikimedia.org.pem [18:41:31] vs. *.wikimedia.org.crt [18:42:20] !log adding frack.eqiad.wmnet hosts to DNS (wmnet and 10..in-addr.arpa) [18:42:28] Logged the message, Master [18:43:20] node "kaulen.wikimedia.org" has "install_certificate{ "star.wikimedia.org": }" install_certificate{} seems to install cert in /etc/ssl/certs/${name}.pem -> does this mean that apache config is wrong and points to a stale local file? [18:43:49] SSLCertificateFile /etc/ssl/certs/star.wikimedia.org.crt [18:46:58] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [19:01:58] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [19:06:01] New patchset: Ryan Lane; "Adding domain info in" [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/18495 [19:06:42] Change merged: Ryan Lane; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/18495 [19:16:57] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [19:20:54] maplebed: let me know when you have swift running 1.5 in copper or something [19:21:26] AaronSchulz: I have it on one host in labs and ms-fe1001 [19:23:00] do you want it on copper? [19:23:20] the packages are in ~ben on fenari and these are my notes so far: http://wikitech.wikimedia.org/view/User:Bhartshorne/swift_upgrade_notes_2012-08 [19:25:07] well, let me know. I'm out for lunch. [19:26:21] maplebed: it would be nice [19:28:21] mutante, there's a problem with localisation cache on wikitech: see the bottom of http://wikitech.wikimedia.org/view/Lists.wikimedia.org [19:30:32] MaxSem: like this?
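saper's question above (the bugzilla vhost pointing at star.wikimedia.org.crt while install_certificate{} ships star.wikimedia.org.pem) can be settled on the host itself; a sketch, assuming shell access to kaulen and a standard Debian apache2 layout:

```bash
# On kaulen (sketch; filenames are the ones quoted in the discussion above).
# Is the .crt apache references a real file, and did puppet drop the .pem next to it?
ls -l /etc/ssl/certs/star.wikimedia.org.crt /etc/ssl/certs/star.wikimedia.org.pem

# Which certificate path does the enabled vhost actually load?
grep -rn 'SSLCertificateFile' /etc/apache2/sites-enabled/

# If only the .pem is puppet-managed, the .crt is a locally installed leftover and
# the site file under files/apache/sites/ probably needs updating to match.
```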
[19:30:32] 10 06:14:49 < jeremyb> wikitech is kinda borked [19:30:32] 10 06:14:50 < jeremyb> > (cur | prev) 2012-08-10T00:27:32 Kaldari (Talk | contribs) (26,118 bytes) (<sectionlink>Step 2: get the code on fenari) (undo) [19:31:28] < jeremyb> (see "sectionlink") < jeremyb> seems to link to the right place though [19:31:57] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [19:32:19] jeremyb, yeah Categories: Pages with FIXME on them Mail Services [19:32:59] it needs updating [19:33:29] MaxSem: ? this? [19:33:42] It's a message/l10n cache issue [19:33:51] * AaronSchulz wonders what Ryan_Lane is talking about [19:34:09] Unicorns? [19:34:32] notpeter, just to make sure, we need Leslie to help us troubleshoot this? [19:34:39] so we might as well wait til monday? [19:35:54] She's travelling today, I think [19:36:09] yeah, i know, just making sure that I am helpless without her :) [19:36:55] ottomata: I'm out of ideas at this point [19:37:01] I thik some networking help is needed [19:37:03] ok cool [19:37:04] thanks [19:38:27] AaronSchulz: eh? [19:38:29] og [19:38:39] AaronSchulz: wrong ryan? [19:38:57] !g Ic116ee7d38d01f4d430380842e94c01f670e9901 [19:38:57] https://gerrit.wikimedia.org/r/#q,Ic116ee7d38d01f4d430380842e94c01f670e9901,n,z [19:39:06] Ryan_Lane: totally right Ryan [19:40:11] Is there a completely left Ryan too? [19:41:17] AaronSchulz: what are we talking about? [19:43:46] Ryan_Lane: it sounded like you and patrick were having an interesting discussion :) [19:43:54] ah [19:43:55] heh [19:43:55] * AaronSchulz debugs in eval.php [19:47:59] mutante, yes [21:00:54] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [21:00:54] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [21:26:41] New patchset: Demon; "Adding .gitreview" [operations/debs] (master) - https://gerrit.wikimedia.org/r/18660 [21:26:49] New patchset: Demon; "Adding .gitreview" [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/18661 [21:26:56] New patchset: Demon; "Adding .gitreview" [operations/debs/ircd-ratbox] (master) - https://gerrit.wikimedia.org/r/18662 [21:27:21] New patchset: Demon; "Adding .gitreview" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/18663 [21:27:31] New patchset: Demon; "Adding .gitreview" [operations/debs/nodejs] (master) - https://gerrit.wikimedia.org/r/18664 [21:28:13] New patchset: Demon; "Adding .gitreview" [operations/debs/puppet] (master) - https://gerrit.wikimedia.org/r/18665 [21:28:21] New patchset: Demon; "Adding .gitreview" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/18666 [21:28:29] New patchset: Demon; "Adding .gitreview" [operations/debs/search-qa] (master) - https://gerrit.wikimedia.org/r/18667 [21:28:37] New patchset: Demon; "Adding .gitreview" [operations/debs/testswarm] (master) - https://gerrit.wikimedia.org/r/18668 [21:28:45] New patchset: Demon; "Adding .gitreview" [operations/debs/udp2log-log4j-java] (master) - https://gerrit.wikimedia.org/r/18669 [21:28:53] ^demon: hurry up please ;) [21:29:08] <^demon> I'm just letting my script do its thing ;-) [21:29:10] New patchset: Demon; "Adding .gitreview" [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/18670 [21:29:18] New patchset: Demon; "Adding .gitreview" [operations/debs/wikimedia-base] (master) - https://gerrit.wikimedia.org/r/18671 [21:29:24] so slow [21:29:26] New patchset: Demon; "Adding .gitreview" 
[operations/debs/wikimedia-job-runner] (master) - https://gerrit.wikimedia.org/r/18672 [21:29:34] New patchset: Demon; "Adding .gitreview" [operations/debs/wikimedia-keyring] (master) - https://gerrit.wikimedia.org/r/18673 [21:29:42] New patchset: Demon; "Adding .gitreview" [operations/debs/wikimedia-ldap-tools] (master) - https://gerrit.wikimedia.org/r/18674 [21:29:50] New patchset: Demon; "Adding .gitreview" [operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/18675 [21:29:57] New patchset: Demon; "Adding .gitreview" [operations/debs/wikimedia-search-qa] (master) - https://gerrit.wikimedia.org/r/18676 [21:30:06] New patchset: Demon; "Adding .gitreview" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/18677 [21:30:13] New patchset: Demon; "Adding .gitreview" [operations/deployment] (master) - https://gerrit.wikimedia.org/r/18678 [21:30:21] New patchset: Demon; "Adding .gitreview" [operations/dns] (master) - https://gerrit.wikimedia.org/r/18679 [21:30:31] New patchset: Demon; "Adding .gitreview" [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/18680 [21:30:45] New patchset: Demon; "Adding .gitreview" [operations/software/gitlist] (master) - https://gerrit.wikimedia.org/r/18681 [21:30:55] phh [21:30:59] come on [21:31:04] you can do it [21:37:10] ^demon: oh, i was about to add a .gitreview file for wikistats and then i see you literally JUST did that? nice :) [21:37:23] <^demon> Yeah, I just did it for all repos that didn't have them. [21:37:33] <^demon> Feel free to just approve the changes I pushed :) [21:37:39] i did not look at the channel, and cloned it like an hour ago and lacked it , nice timing :) [21:39:41] hmm, shouldn't i get it though when doing git pull now.. i don't [21:39:54] oh, not merged yet, gotcha [21:40:33] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/18677 [21:43:18] ok ..:) "Your change was committed before the commit hook was installed" [21:44:13] Change merged: Demon; [operations/deployment] (master) - https://gerrit.wikimedia.org/r/18678 [21:44:33] Change merged: Demon; [operations/software/gitlist] (master) - https://gerrit.wikimedia.org/r/18681 [21:45:13] btw, on the labsconsole git page, we removed the part about configuring your email address, but it was still needed for me [21:45:26] and "git commit" -> "git commit -a", right [21:45:48] just noticed because i setup on a new computer yesterday [21:46:45] <^demon> mutante: Configuring your e-mail address should be `git config [--global] user.email "me@you.us"` [21:47:11] yeah, i got that, just saying it was deleted from the wiki page [21:48:02] <^demon> Well we're trying to move most of the git docs onto mw.org, since they need to be there for devs too :) [21:48:02] editing ... [21:48:09] ah, ok [21:54:21] !log live hacked nova on virt2 with parts of nova git commit 4584e552a653904c36cf04cb295a7bf09d2def28 [21:54:26] New patchset: Dzahn; "fix sorting by active users" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/18694 [21:54:29] Logged the message, Master [21:55:16] New review: Dzahn; "as pointed out by Siebrand sorting by activeusers was broken" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/18694 [21:55:16] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/18694 [21:56:13] Ryan_Lane: do we have our own nova mirror? 
[21:56:18] no [21:56:23] which is why I said live hack [21:56:28] and that patch isn't going to work [21:56:33] we're missing some other patch before it [22:06:55] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [22:53:08] Change merged: Demon; [operations/mediawiki-multiversion] (master) - https://gerrit.wikimedia.org/r/18680 [22:56:54] New patchset: Dzahn; "use template for planet config and apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18705 [22:57:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18705 [23:00:25] New patchset: Dzahn; "use template for planet config and apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18705 [23:01:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18705 [23:10:20] New patchset: Dzahn; "use template for planet config and apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18705 [23:10:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18705 [23:16:37] New patchset: Dzahn; "use template for planet config and apache site config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18705 [23:17:26] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18705 [23:38:15] New patchset: Dzahn; "ensure /usr/share/planet-venus/wikimedia and all the subdirs exist" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18710 [23:38:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/18710 [23:38:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/18710