[00:44:31] PROBLEM - Memcached on srv255 is CRITICAL: Connection refused
[01:51:49] PROBLEM - Memcached on srv257 is CRITICAL: Connection refused
[01:59:19] PROBLEM - Memcached on srv254 is CRITICAL: Connection refused
[02:00:19] PROBLEM - Memcached on srv256 is CRITICAL: Connection refused
[02:05:15] New patchset: Diederik; "Project structure and initial commit." [analytics] (master) - https://gerrit.wikimedia.org/r/2044
[02:15:49] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1377s
[02:17:54] New review: Diederik; "(no comment)" [analytics] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2044
[02:18:45] New review: Diederik; "(no comment)" [analytics] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2044
[02:19:05] New review: Diederik; "(no comment)" [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2044
[02:19:06] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2044
[02:22:00] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1747s
[02:34:29] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.78355825688 (gt 8.0)
[02:36:09] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:38:39] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.62220981651 (gt 8.0)
[02:42:09] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:21:21] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:24:42] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:36:42] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:57:12] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.30751299065 (gt 8.0)
[06:25:41] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 86, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/2/1: down - Core: cr1-sdtpa:xe-0/0/1 (Level3/FPL, CV71026) {#2008} [10Gbps wave]
[06:27:21] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0; xe-0/0/1: down - Core: cr1-eqiad:xe-5/2/1 (FPL/GBLX, CV71026) [10Gbps wave]
[08:04:59] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[08:47:43] anyone know if we have an ops meeting today (= monday)?
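The MySQL replication alerts above (storage3, Seconds_Behind_Master at 1377s and 1747s before recovering to 0s) come from a lag check of roughly this shape. A minimal sketch, assuming the third-party pymysql driver; the host, credentials and thresholds are placeholders, not values from the production plugin:

```python
#!/usr/bin/env python3
# Minimal replication-lag check sketch (not the production Nagios plugin).
# Assumes pymysql is installed; host, credentials and thresholds are placeholders.
import sys
import pymysql

WARN_SECONDS = 300   # assumed warning threshold
CRIT_SECONDS = 600   # assumed critical threshold

def main():
    conn = pymysql.connect(host="storage3.example", user="nagios", password="secret")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
    finally:
        conn.close()

    if not row:
        print("UNKNOWN - host is not configured as a replica")
        return 3

    lag = row.get("Seconds_Behind_Master")
    if lag is None:
        # Replication thread not running (the es1004 "Slave running: expected Yes, got No" case).
        print("CRITICAL - replication is not running")
        return 2
    if lag >= CRIT_SECONDS:
        print(f"CRITICAL - Seconds_Behind_Master : {lag}s")
        return 2
    if lag >= WARN_SECONDS:
        print(f"WARNING - Seconds_Behind_Master : {lag}s")
        return 1
    print(f"OK - Seconds_Behind_Master : {lag}s")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The exit codes 0/1/2/3 follow the standard Nagios OK/WARNING/CRITICAL/UNKNOWN convention, which is what turns the script's output into the PROBLEM/RECOVERY lines above.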
[08:52:40] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.9968854902
[08:58:00] RECOVERY - Packetloss_Average on emery is OK: OK: packet_loss_average is 3.0833322
[09:04:30] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[09:12:30] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[09:38:58] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[09:39:58] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 88, down: 0, dormant: 0, excluded: 0, unused: 0
[09:52:18] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 453303 MB (3% inode=99%):
[09:55:38] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 411038 MB (3% inode=99%):
[10:03:08] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 103 MB (1% inode=60%): /var/lib/ureadahead/debugfs 103 MB (1% inode=60%):
[10:13:08] RECOVERY - Disk space on srv222 is OK: DISK OK
[10:51:37] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:29:54] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[12:10:31] Ops ticket: https://bugzilla.wikimedia.org/33897 Bug 33897 - Enable HTTPS for mail.wiki[mp]edia.org and redirect to lists
[12:10:31] ;)
[12:31:35] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[12:46:15] PROBLEM - Puppet freshness on srv199 is CRITICAL: Puppet has not run in the last 10 hours
[13:29:20] !log restarted search1, search3, search4 - not sure why they were dead
[13:29:22] Logged the message, Master
[13:37:46] RECOVERY - Lucene on search3 is OK: TCP OK - 0.002 second response time on port 8123
[16:01:26] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1958
[16:01:26] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1958
[16:02:45] hey it's a mutante
[16:02:52] do you know if there is an ops meeting today?
[16:13:17] apergos: hello ..no i dont:)
[16:13:50] ok
[16:14:02] I know they have the week of ops meetings or something
[16:14:33] apergos: when i saw the mail about the hackathon starting, and no mail about a meeting..
[16:14:44] it made me think its unlikely there is a meeting
[16:19:08] guess we'll see, I thought ops was mostly not the hackathon
[16:20:50] it ended yesterday (er sunday)
[16:21:09] oh..hm.ok
[16:22:32] hi :)
[16:22:42] do ops look at bugzilla bugs tagged with +ops ?
[16:25:14] is there a direct link to that?:)
[16:25:30] like the tagged tickets right away
[16:25:37] checking
[16:26:47] hashar: its more like the bugmeister uses that tag to then create RT and link
[16:31:38] k
[16:31:52] I have tagged some bugs with 'ops' keyword but did not create the RT ticket :/
[16:32:11] hexmode: are you creating a RT ticket each time a bug is tagged with 'ops' keyword?
[16:33:21] hashar: no, but if you tag with ops, please make sure ops is on it
[16:34:01] ops is on it ?
[16:34:40] well I have created the ticket :D
[16:38:12] thanks, +1 for already linking to bz
[16:39:17] :-))
[16:39:28] I am heading out. See you later!
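The DISK CRITICAL alerts above (es1004 at 3% free on /a, srv222 at 1% free on /) are percent-free checks. A rough stand-in for that kind of check using only the standard library; the mount point and thresholds are assumptions, not the values the real check_disk plugin is configured with:

```python
#!/usr/bin/env python3
# Percent-free disk check sketch, in the spirit of the DISK CRITICAL alerts above.
# Not the real check_disk plugin; path and thresholds are placeholders.
import shutil
import sys

def check_disk(path="/a", warn_pct=6.0, crit_pct=3.0):
    usage = shutil.disk_usage(path)
    free_pct = usage.free / usage.total * 100
    free_mb = usage.free // (1024 * 1024)
    detail = f"free space: {path} {free_mb} MB ({free_pct:.0f}%)"
    if free_pct <= crit_pct:
        print(f"DISK CRITICAL - {detail}")
        return 2
    if free_pct <= warn_pct:
        print(f"DISK WARNING - {detail}")
        return 1
    print(f"DISK OK - {detail}")
    return 0

if __name__ == "__main__":
    sys.exit(check_disk())
```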
[17:39:55] cmjohnson1: Heyas, we are around for the locke move later today =]
[17:46:53] cmjohnson1: i see the c series arrived too
[17:46:58] so other than locke, that is a priority
[17:49:39] oh yay
[17:56:11] robh: yes...c series arrived today
[17:56:52] LeslieCarr: regarding Locke did you open the port on asw-d1 (5)
[17:56:56] cmjohnson1: you should have some wifi ap bridges and such
[17:57:06] but those are secondary to locke, then c series server
[17:57:16] the newegg box came as well...i haven't opened it up yet
[17:59:01] cool should be your wireless APs, usb memory, and whatever the hell else i ordered but now forget ;]
[17:59:12] ahh mgmt switch
[17:59:26] cmjohnson1: lemme double check
[18:01:26] cmjohnson1: still need mrj21 cable? https://rt.wikimedia.org/Ticket/Display.html?id=1997
[18:02:42] cmjohnson1: ge1/0/6 is allocated for locke
[18:03:38] robh: yes I don't need them at the moment but if we expand anymore I am going to need them.
[18:04:22] ok, is the order for 2 more, both to go from a1 to c2 sdtpa?
[18:04:22] LeslieCarr: okay but that will leave 5 unused...robh: will this be ok?
[18:04:45] hrmm, better confirm with leslie, she is on call at the moment
[18:04:46] robh: correct
[18:05:08] cmjohnson1: well, i am confused a bit
[18:05:14] so c3 only has 3 cables going to it?
[18:05:18] i'm confused :)
[18:05:28] ok, forget the mrj21 question
[18:05:30] anyways, i had a previous ticket which allocated port 5 for ms-be1
[18:05:37] ahh, that got moved
[18:05:39] i think
[18:05:42] lemme check it out
[18:06:22] i have no idea.
[18:06:36] c2 has 4 trunk cables ( i had to borrow 1 from c3) ...c3 only has 1...if we intend to expand in that rack, i am going to need more cables
[18:06:42] ahh
[18:06:48] so need 3 more for c3
[18:06:54] to make it a total of 4 of them
[18:07:00] yes...
[18:07:02] ok
[18:07:18] I don't remember what I put in ticket...lemme check
[18:07:56] its conflicting from 2 to 4 i read
[18:08:02] i just updated to order 3 per irc with ya
[18:10:36] so, i can move the port to 1/0/5 or keep it at 1/0/6, but first, i get bfast
[18:10:37] :)
[18:12:53] lesliecarr: can you move Locke to port 1/0/5 -ms-be1 to port 1/0/6
[18:13:03] leslie just went to get breakfast
[18:13:07] that will keep the servers in order
[18:13:21] mark: okay
[18:13:26] do you need it now?
[18:13:28] then I can do it
[18:13:45] I need locke moved to 1/0/5 now
[18:13:59] we want to move it in 45 minutes
[18:14:12] thx
[18:14:35] ok
[18:18:03] cmjohnson1: it's done
[18:18:15] thx mark
[18:28:10] hi LeslieCarr, TimStarling and everyone
[18:32:52] hey aude
[18:33:00] cmjohnson1: will do
[18:33:05] oh mark got it
[18:33:45] he did ...i didn't know when you would be back
[18:41:00] LeslieCarr: i remember that Google (and others?) have or had an arrangement to get a feed of recent changes and all from wikipedia
[18:41:14] do you know about that or who does, and is that still the case?
[18:41:26] OAI
[18:41:34] Ryan_Lane knows more, it is still the case
[18:41:39] LeslieCarr: ok
[18:41:47] i believe they look at the recent changes page - but not 100% certain
[18:42:07] LeslieCarr: i imagine they get the irc feed and then query the api unlimited
[18:42:12] I am sure they work off of rc
[18:42:28] cause in discussions with them about crawling they allude to this but I never asked what they use
[18:42:34] apergos: special access?
[18:42:48] I don't think there's any special arrangement
[18:42:49] basically, we won't block them for abuse
[18:42:56] that's about it
[18:43:06] i'm not aware of any special access
[18:43:07] if i make an article now, it will be in google in like 2 seconds
[18:43:13] mark: ok
[18:43:28] for wikimania, we want to talk to google as a possible sponsor
[18:43:40] they obviously benefit from wikipedia, enriching their product
[18:44:03] and wonder if we have any particular point of contact, regarding this...
[18:44:13] or i might try other avenues, but they are challenging
[18:44:21] we have all kinds of contacts within google, for various reasons
[18:44:27] mark: ok
[18:44:29] various cooperations
[18:44:33] yeah, there's stuff like GSOC
[18:44:35] but I'm not involved in them
[18:44:45] mark: who might know?
[18:44:57] the tech side has 0 cooperation/say into the money side, so our tech resources wouldn't be the right place to look
[18:45:02] i've talked to Kul, but not sure he's best to know
[18:45:13] LeslieCarr: sure
[18:45:14] kul would be most obvious
[18:45:38] and does any other company have similar arrangements?
[18:45:43] like for bing?
[18:45:58] i'm not aware of any search engine having any special access
[18:46:00] technically
[18:46:12] okay!
[18:46:25] i remember we used to get some small income from this
[18:46:36] it was in the financial reports
[18:47:05] I have a contact I could ask if you like
[18:47:07] he probably knows
[18:47:12] about the rc stuff
[18:47:15] or what about something like what the toolserver does with replicating the db
[18:47:20] i guess they are the only ones to do that
[18:47:21] about sponsorship I have no idea
[18:47:29] apergos: that would help i think
[18:47:46] to be able to understand, at least, what benefits they get
[18:48:05] there is a google office in DC, so we can try there too
[18:48:26] aude: we're not getting any money from them for this, i think sergei's foundation gave us a good grant this year (unrelated)
[18:48:59] LeslieCarr: i know about the grant
[18:49:07] and there have been other grants from google itself
[18:49:10] the benefit we get is that changes show up in google in 20 min or less
[18:49:14] even on the small projects
[18:49:20] apergos: yep
[18:50:09] and they don't crawl every single page every 20 minutes to figure out the changes
[18:50:10] oomph
[18:50:23] that would hurt us a bit
[18:50:57] the irc RC feed would help them and then query the api for just those pages
[18:51:16] i don't know if they are actually in the irc channels though
[18:51:26] er well
[18:51:34] they do make api requests for the changes, in fact
[18:51:35] they might just ping the api for RC very often
[18:51:49] I believe they use OAI
[18:51:53] they don't hit every one instantly, they batch them up
[18:51:54] like the search indexers do
[18:52:01] but every I forget minute or whatever
[18:52:14] at least, this is what they were doing.
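The crawling pattern described above — "they get rc", "they use the api", "they batch them up" — needs no special access and can be done by anyone against the public API, as mark and Ryan_Lane note. A rough sketch of that batched recent-changes polling using only the standard library; the enwiki endpoint and the poll interval are assumptions for illustration, not what any particular indexer actually runs:

```python
#!/usr/bin/env python3
# Batched recent-changes polling sketch, as discussed above.
# The endpoint parameters are the public MediaWiki API; the interval is an assumption.
import json
import time
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "rc-poll-sketch/0.1 (example; not a production crawler)"}

def fetch_recent_changes(limit=500):
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["query"]["recentchanges"]

def poll(interval=60):
    seen = set()  # grows without bound; fine for a sketch, not for a real crawler
    while True:
        batch = [rc for rc in fetch_recent_changes() if rc["rcid"] not in seen]
        seen.update(rc["rcid"] for rc in batch)
        # A real indexer would now fetch just these titles instead of recrawling every page.
        for rc in batch:
            print(rc["timestamp"], rc["title"])
        time.sleep(interval)

if __name__ == "__main__":
    poll()
```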
[18:52:24] RoanKattouw: ok
[18:52:36] no search engines have special access
[18:52:41] but they don't crawl us normally
[18:52:54] * aude is either crazy or thought they used to
[18:53:45] they crawl our recent changes
[18:53:49] anyone can do this
[18:54:00] Ryan_Lane: yeah
[18:54:12] and they likely do the same for all mediawikis
[18:54:28] but they don't actually talk about what they *really* do
[18:54:33] so, who knows for sure
[18:54:58] Ryan_Lane: makes sense
[18:55:02] well in this case I know they crawl us because I had discussions with one of them about rate limits
[18:55:28] ie I know they get rc, I know they use the api, I know they batch them up and if I dig through my irc logs I can give some numbers about the limits
[18:56:22] apergos: yeah, i am aware of rate limits
[18:56:50] I'm not sure asking them for a grant because we enrich the service is the right way of going about it ;)
[18:57:32] technically, most of our views come from them, so if anything we benefit more from them, than they do from us
[18:57:39] Ryan_Lane: what ideas do you have?
[18:57:54] Ryan_Lane: +1
[18:57:57] * aude thinks it's mutual benefit
[18:58:17] we'd lose a very large amount of traffic if we fell in rankings
[18:58:45] though we offer some benefit to them, it's very lopsided in their direction
[18:59:15] anyway, it's likely better to approach from the "begging" POV
[18:59:26] since that's how non-profits work :D
[18:59:34] Ryan_Lane: sure we can try that
[18:59:46] but I'd say talk to KUl
[18:59:48] Kul
[18:59:52] since he's way better at this stuff
[19:00:13] Ryan_Lane: sure
[19:00:28] and go with whatever feels right for you guys
[19:00:55] Where the hell am i gonna put Ryan_Lane gluster shit
[19:00:56] =/
[19:01:07] tampa is too full.
[19:01:42] if it runs out of someone's home it's more carbon efficient because it heats it
[19:01:48] and we all know you need more heating in tampa
[19:02:54] cmjohnson1: you about?
[19:03:10] i am here
[19:03:15] on the 12th
[19:03:24] how do I fix up wikidiff in labs for https://bugzilla.wikimedia.org/33331 ??
[19:04:13] why not ask labs questions in the labs channel?
[19:04:44] too many dangerous chemicals in there
[19:04:55] yeah, but ok...
[19:05:37] cmjohnson1: ok, prepping the locke move right?
[19:05:50] or is my schedule off
[19:05:57] no, its now, cool
[19:06:08] robh: I am ready to make the move...it is 11 PST so whenever...
[19:06:16] ok, lemme do a quick check with folks
[19:06:19] k
[19:06:36] !log going to shutdown locke now for the move
[19:06:38] Logged the message, RobH
[19:06:48] cmjohnson1: so this should move asap, pulling it up to shutdown now, ready to go on your end?
[19:07:02] yes
[19:07:04] ready to go
[19:07:05] (i see you said you were, but i am about to hit enter and kill it)
[19:07:06] cool
[19:07:09] kill it
[19:07:10] !log locke down
[19:07:12] Logged the message, RobH
[19:07:19] cmjohnson1: when it shuts off, you can move it
[19:07:48] omg what happened to locke ?
[19:07:52] i was running an important query
[19:08:07] tough shit
[19:08:08] ;]
[19:09:50] cmjohnson1: lemme know once its reconnected and i will power on and such from drac =]
[19:12:57] robh: it is reconnected but i powered on
[19:13:06] thats fine
[19:13:25] i was just gonna watch it post, connecting now
[19:13:31] well, trying, it takes a few minutes for drac to fire up
[19:15:17] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[19:16:29] there will prolly be a bit of cronspam
[19:16:39] seems fine
[19:18:12] what's our preferred method of changing the ip/dns name for a machine - do we want to reinstall the machine (idle) or do it manually ?
[19:18:42] definitely reinstall, and preferably don't rename at all
[19:19:25] !log locke seems ok
[19:19:25] well it's sort of renamed (cp1001.wikimedia to .eqiad.wmnet)
[19:19:26] Logged the message, RobH
[19:19:39] cmjohnson1: i think we are good, you will wanna drop a ticket in network to kill the old locke port if you dont mind
[19:19:45] are those installed then?
[19:19:54] but yeah, they will need to be reinstalled if that's the case
[19:19:58] well i just went to a few and they're in a weird state
[19:20:03] robh: okay
[19:20:04] at a ubuntu installation menu
[19:20:07] so crazy
[19:20:10] so they're not installed
[19:20:13] cmjohnson1: also, we had you working on killing all servers below srv187 right?
[19:20:16] just leave them, we'll get to them soon
[19:20:24] so any racks with them i can assume they wont stay
[19:20:26] cool, i'll just update all the relevant bits ..
[19:20:29] yes...but I still need a ticket
[19:20:30] thanks :)
[19:20:40] cmjohnson1: cool, i will make a real ticket for ya now
[19:20:43] especially the mobile caches… ;)
[19:20:56] also...can you get me a ticket for killing db7 and wiping
[19:20:59] mobile caches we should keep external for a while
[19:21:10] I was thinking of only making the uninstalled ones internal (for now)
[19:21:14] cmjohnson1: will do now
[19:21:22] so not cp1041-44
[19:22:07] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:29] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=2318
[19:24:33] cmjohnson1: dropping a ticket for you to kill db5, db7, db8. db9 needs to stay alive for now
[19:25:41] * maplebed checked locke's udp2log processes - they all seem to be running and logs are getting populated
[19:26:13] * apergos checks cronspam
[19:27:56] nothing.
[19:27:58] good
[19:31:13] mark: i was joking :) not gonna kill off the running mobile caches
[19:31:18] don't need to do 2 takedowns in one week
[19:31:39] Ryan_Lane: where's the web browsable git repo again ? (going to put that info in labsconsole so i can just point people to there for everything git)
[19:31:43] "I survived two takedowns in one week and all I got was this lousy t-shirt"?
[19:32:23] anonymous!! ah hah
[19:32:26] the secret is out...
[19:32:35] did we screen her? :-P
[19:33:23] i switched from wired/wireless
[19:34:00] are we supposed to be calling in at some point?
[19:34:42] hrm
[19:38:25] LeslieCarr: https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=summary
[19:38:35] I need to make a rewrite on the server
[19:39:12] for https://gerrit.wikimedia.org/gitweb/operations/puppet -> https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=summary
[19:39:16] and things like it
[19:42:57] cmjohnson1: so lemme know when the c series is all racked up and such
[19:43:16] cuz when it is i have to beat ben to it ;] (mostly need to bash on it and hand it off since he has been patiently waiting)
[19:49:04] robh: it is racked..just need drac set up
[19:50:22] ahh, did i give you that info?
[19:50:35] robh: no
[19:50:41] eww, gettin it now
[19:50:47] asset tag #?
[19:52:39] cmjohnson1: nevermind, i forgot this is called ms-be1
[19:52:42] so settin it up now
[19:53:41] ok...asset is in racktables
[19:54:17] cmjohnson1: updated, ip 10.1.7.21 mgmt for ms-be1
[19:54:41] k..thx
[19:55:13] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=2287
[19:55:23] update that ticket, since the locke server may be in that port
[19:55:26] not sure, confirm its ok though
[19:55:36] not sure why LeslieCarr assigned an ip though ;]
[19:56:07] oh sometimes people want to have ip's set up and i must have been on a roll or something :)
[19:56:17] yea but there is no ip in dns for it ;]
[19:56:27] oh, there is, hrmm
[19:59:18] !log added dns info for ms-be1 but not pushing change until leslie pushes her
[19:59:18] s
[19:59:19] Logged the message, RobH
[19:59:22] bleh, typo on adminlog
[20:02:20] All the apache etc are going to be running 64bit ubuntu now, right?
[20:05:09] apergos: dell got back with your quote for h700
[20:05:16] ordering it now and returning the other one next week
[20:05:24] yayayay
[20:05:47] 7 days til rsync...
[20:08:00] ordered
[20:08:07] should arrive this week
[20:08:12] \o/
[20:10:00] apergos: owa1-3 is being used by ben for swift performance testing and will be freed up over the coming weeks
[20:10:05] for your juniper project thing, fyi
[20:10:08] no rush
[20:10:15] one of those would be ideal i think
[20:10:22] what are the specs?
[20:10:46] (I have my notes on installation, with plenty of detail in case I forget everything by then)
[20:12:05] hmm looks like 8gb ram (plenty) and 90gb or whatever (also fine)
[20:12:35] enough to run a little pile of instances :-)
[20:18:39] binasher: can you read over a commit for me?
[20:20:35] notpeter: maybe… can you build (at least some of) the squids that woosters threw at me earlier in the ops meetings? i'd like to continue doing db stuff thru the week.
[20:21:45] binasher: uh... sure. what kind of squids?
[20:22:12] also, should be a quick thing to read over. just a new set of nrpe checks. and a couple of scripts. just want another set of eyes
[20:22:13] squidly squids
[20:22:37] ah, yes. sir squiddington, esq.
[20:22:50] where's the commit?
[20:23:17] New patchset: Pyoungmeister; "adding check commands for udp2log. if this works correctly, I will add a contact group for consumers of this who have shells" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2049
[20:23:21] there
[20:23:36] binasher: ^
[20:24:23] how many squids are we talking about?
[20:25:18] apergos: sq62-sqOVER9000
[20:27:06] seriously, if it's tons I can take a day to do some
[20:28:32] apergos: yeah... I have no idea. I think something was said in the meeting, but I can't understand most of what's being said
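Going back to the gitweb links at 19:39: the rewrite Ryan_Lane says he needs to add on the server amounts to a URL mapping like the one below. This is only an illustration of the mapping, not the actual webserver rewrite rule, which this log does not include:

```python
def gitweb_rewrite(path):
    """Map /gitweb/<project> to the Gerrit gitweb summary URL, per the 19:39 example."""
    prefix = "/gitweb/"
    if not path.startswith(prefix):
        return None
    project = path[len(prefix):].rstrip("/")
    return f"https://gerrit.wikimedia.org/r/gitweb?p={project}.git;a=summary"

# gitweb_rewrite("/gitweb/operations/puppet")
# -> "https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=summary"
```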
[20:28:42] ok, well let me know
[20:28:47] ask in the chat
[20:29:01] actually binasher should ask in the chat if he doesn't know how many
[20:29:15] binasher: can you point me to a tic for these new sir squiddington, esquires?
[20:29:42] notpeter: no
[20:31:07] notpeter: the check scripts look good, and i tested the log file and filter proc finding regexes on emery
[20:31:21] they both check out on locke and emery
[20:31:36] indeed
[20:31:47] it's just easy to forget one of the 8 moving parts for a successful nrpe check...
[20:32:02] you'll have to manually restart nrpe though
[20:32:18] that's fine
[20:32:21] unless you subscribe nrpe to the new cfg
[20:32:43] eh, I can do that when I add in a contact group
[20:33:01] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2049
[20:33:49] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2049
[20:55:07] !log rebooting db1029 with proprietary binary only huawei kernel module installed, for short term ssd evaluation
[20:55:08] Logged the message, Master
[20:58:23] binasher: tell us how you really feel
[21:01:19] LeslieCarr: i feel like it would be very much against the foundations principles to give a bunch of money to the company that builds and maintains china's internet firewall without even being able to eval competing and likely superior products :)
[21:04:33] see, the smiley face at the end makes that less serious
[21:07:09] New patchset: Diederik; "Initial commit of pipeline.py" [analytics] (master) - https://gerrit.wikimedia.org/r/2050
[21:12:44] New review: Ottomata; "I'm working on documentation for this stuff, and will have lots of comments shortly." [analytics] (master) C: 1; - https://gerrit.wikimedia.org/r/2050
[21:14:11] New review: Diederik; "We are happy." [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2050
[21:14:11] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2050
[21:28:00] hi again
[21:29:09] Reedy: RobH ping
[21:29:32] hi
[21:29:46] Reedy: you do server-side large uploads for commons?
[21:29:59] i know that Roan-meeting does them also
[21:30:25] I can do, yeah
[21:30:52] dmcdevit has one to do tomorrow, early before SF people are in the office
[21:32:10] it's time sensitive
[21:32:34] Reedy: are you in SF now? or not
[21:32:47] noope
[21:32:53] cool
[21:33:08] i'll have dominic ping you when he's ready
[21:38:03] Jeff_Green: you there?
[21:38:09] yessir
[21:50:08] RobH, is the only part of adding a new wiki that needs to be done by ops the dns part?
[21:51:20] reedy: rob stepped out for food
[21:51:29] ah, cheers
[21:54:15] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=60%): /var/lib/ureadahead/debugfs 0 MB (0% inode=60%):
[21:54:42] New patchset: Pyoungmeister; "some cleanup and adding diederik/analytics group to alerts for udp2log" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2051
[21:55:35] pyoungmeister; YAY!
[21:55:46] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2051
[21:55:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2051
[22:05:46] diederik: yes. I'm waiting for puppet to run on our monitoring host
[22:06:16] then I'm going to add in some fake filter (but not restart udp2log so that it won't actually break anything) and see if it sends to you properly
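The udp2log check commands merged above (change 2049, plus the contact-group follow-up in 2051) test two things according to the review chatter: that the filter processes are running and that the log files are still being written. A hedged sketch of that idea; the process pattern, log path, and freshness threshold are placeholders, not the contents of the actual check scripts:

```python
#!/usr/bin/env python3
# Sketch of an NRPE-style udp2log check: filter process present + log file still growing.
# Pattern, log path and max age are assumptions; the real check scripts are not in this log.
import os
import subprocess
import sys
import time

PROC_PATTERN = "udp2log"                           # assumed process match
LOG_FILE = "/var/log/udp2log/sampled-1000.log"     # hypothetical log path
MAX_AGE_SECONDS = 300                              # assumed freshness threshold

def main():
    pgrep = subprocess.run(["pgrep", "-f", PROC_PATTERN], stdout=subprocess.PIPE, text=True)
    procs = [p for p in pgrep.stdout.splitlines() if p.strip()]
    if not procs:
        print(f"CRITICAL: no process matching '{PROC_PATTERN}' found")
        return 2

    try:
        age = time.time() - os.path.getmtime(LOG_FILE)
    except OSError as exc:
        print(f"CRITICAL: cannot stat {LOG_FILE}: {exc}")
        return 2
    if age > MAX_AGE_SECONDS:
        print(f"CRITICAL: {LOG_FILE} not written to for {int(age)}s")
        return 2

    print(f"OK: {len(procs)} matching process(es), log written {int(age)}s ago")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

As noted at 20:32, a new check command only takes effect once nrpe rereads its config — either via a manual restart or, as binasher suggests, by subscribing the nrpe service to the new config file in puppet.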
[22:11:44] Reedy: the only part is indeed, ops have to push dns changes
[22:12:05] they are automated though when you create the wiki and add the langcode to the list
[22:12:10] so ops only has to run an update
[22:12:48] There's 4 outstanding requests for new wikis at the moment
[22:13:00] notpeter: sounds good!
[22:23:58] i forget what the hell 'ocg#' does
[22:24:07] Ryan_Lane: ^ these are yers right?
[22:24:15] RECOVERY - Disk space on srv223 is OK: DISK OK
[22:26:53] apergos: what should we name the juniper test server?
[22:27:05] mobile1 has bad cable needs replacing, otherwise it will become the test server for you
[22:27:56] um
[22:28:00] a name, rats
[22:28:12] see the other channel
[22:55:14] so
[22:55:25] the SF office internet line comes down
[22:55:37] and we did not receive any nagios notification! :-D
[22:55:42] well
[22:55:46] the site can run without it :-P
[22:55:50] that deserves an RT ticket :-)
[22:56:08] why would nagios tell us? we can't fix it :-P
[22:56:34] so remote workers know why the VOIP platform is unreachable 8-)
[22:56:36] <^demon|away> And it hardly affects those of us with real internet connections ;-)
[22:56:51] guess Office IT need their own nagios bot somewhere :)
[22:57:16] <^demon|away> So we can watch the bot complain about network internet down when nobody from OIT is online to see it? Perfect!
[22:57:34] <^demon|away> network internet? what kind of word vomit was that...
[22:57:59] imagine that my internet connection, so crappy that downloading any file means I can't view any web page at the same time...
[22:58:08] is considered now a "real internet connection" :-D
[22:58:26] ^demon|away: na the idea is to show confirmation that SF office is down
[22:58:55] which would explain why the various people there do not answer on IRC since the server keeps them up pending timeout
[22:59:07] <^demon|away> apergos: Well, did your connection stay up just now? :D
[22:59:10] so that is just a helper
[22:59:15] well...
[22:59:17] yes, yes it did :-D
[23:00:11] PROBLEM - Puppet freshness on srv199 is CRITICAL: Puppet has not run in the last 10 hours
[23:00:50] robh: i updated the ticket for ms-be1...let me know the next step
[23:03:51] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Mon Jan 23 23:03:21 UTC 2012
[23:06:08] New patchset: RobH; "more servers to decom" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2052
[23:07:40] New review: RobH; "I'm the best at what I do." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2052
[23:07:41] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2052
[23:07:50] logan quote ftw.
[23:08:47] logged rt #2327 for the dns entries for the new wikis
[23:10:26] !log srv187, srv188, srv189 set to false in pybal for api lvs, old servers that will be decommed soon.
[23:10:27] Logged the message, RobH
[23:14:00] !log dns update for a bunch of things
[23:14:02] Logged the message, RobH
[23:21:31] New patchset: Catrope; "Add .gitreview file" [analytics] (master) - https://gerrit.wikimedia.org/r/2053
[23:22:12] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors -
[23:31:51] !log started slaving db53 from db36 (enwiki)
[23:31:53] Logged the message, Master
[23:43:28] New review: Diederik; "Make repository git-review ready." [analytics] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2053
[23:43:28] Change merged: Diederik; [analytics] (master) - https://gerrit.wikimedia.org/r/2053
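The 23:31 entry "started slaving db53 from db36" corresponds to pointing a fresh replica at its master. A rough sketch of the usual MySQL steps, again via the third-party pymysql driver; the hosts, credentials and binlog coordinates are placeholders and are not taken from this log:

```python
#!/usr/bin/env python3
# Sketch of starting replication on a new replica (cf. "!log started slaving db53 from db36").
# All coordinates and credentials below are placeholders, not values from the log.
import pymysql

replica = pymysql.connect(host="db53.example", user="root", password="secret")
try:
    with replica.cursor() as cur:
        cur.execute(
            """
            CHANGE MASTER TO
              MASTER_HOST = 'db36.example',
              MASTER_USER = 'repl',
              MASTER_PASSWORD = 'placeholder',
              MASTER_LOG_FILE = 'db36-bin.000123',  -- hypothetical binlog coordinates
              MASTER_LOG_POS  = 4
            """
        )
        cur.execute("START SLAVE")
        cur.execute("SHOW SLAVE STATUS")
        # Slave_IO_Running / Slave_SQL_Running should both report Yes once replication is up,
        # which is what the "MySQL slave status ... OK" Nagios checks above look for.
        print(cur.fetchone())
finally:
    replica.close()
```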