[00:00:32] New patchset: Ryan Lane; "Switch from upstart to init for glusterd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52556 [00:01:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52556 [00:05:15] New patchset: Ryan Lane; "Fix reference to gluster's upstart file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52557 [00:06:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52557 [00:08:00] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Mar 7 00:07:53 UTC 2013 [00:08:40] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [00:08:40] RECOVERY - Puppet freshness on constable is OK: puppet ran at Thu Mar 7 00:08:37 UTC 2013 [00:09:11] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Mar 7 00:09:02 UTC 2013 [00:09:36] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52555 [00:09:40] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [00:09:40] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [00:10:00] RECOVERY - Puppet freshness on constable is OK: puppet ran at Thu Mar 7 00:09:52 UTC 2013 [00:10:10] RECOVERY - Puppet freshness on es3 is OK: puppet ran at Thu Mar 7 00:10:02 UTC 2013 [00:10:20] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Mar 7 00:10:14 UTC 2013 [00:10:41] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [00:10:41] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [00:11:01] RECOVERY - Puppet freshness on constable is OK: puppet ran at Thu Mar 7 00:10:53 UTC 2013 [00:11:21] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Mar 7 00:11:12 UTC 2013 [00:11:40] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [00:11:41] PROBLEM - Puppet freshness on constable is CRITICAL: Puppet has not run in the last 10 hours [00:11:50] RECOVERY - Puppet freshness on constable is OK: puppet ran at Thu Mar 7 00:11:47 UTC 2013 [00:12:11] RECOVERY - Puppet freshness on spence is OK: puppet ran at Thu Mar 7 00:12:02 UTC 2013 [00:12:20] RECOVERY - Puppet freshness on amssq38 is OK: puppet ran at Thu Mar 7 00:12:18 UTC 2013 [00:12:30] RECOVERY - Puppet freshness on knsq24 is OK: puppet ran at Thu Mar 7 00:12:19 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Thu Mar 7 00:12:20 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Thu Mar 7 00:12:20 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on amssq46 is OK: puppet ran at Thu Mar 7 00:12:21 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Thu Mar 7 00:12:21 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on amssq34 is OK: puppet ran at Thu Mar 7 00:12:22 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on ssl3001 is OK: puppet ran at Thu Mar 7 00:12:24 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on amssq43 is OK: puppet ran at Thu Mar 7 00:12:24 UTC 2013 [00:12:31] RECOVERY - Puppet freshness on knsq19 is OK: puppet ran at Thu Mar 7 00:12:25 UTC 2013 [00:12:32] RECOVERY - Puppet freshness on amssq32 is OK: puppet ran at Thu Mar 7 00:12:26 UTC 2013 [00:12:32] RECOVERY - Puppet freshness on amssq52 is OK: puppet ran at Thu Mar 7 00:12:26 UTC 2013 [00:12:33] RECOVERY - Puppet 
freshness on knsq22 is OK: puppet ran at Thu Mar 7 00:12:26 UTC 2013 [00:12:33] RECOVERY - Puppet freshness on amssq54 is OK: puppet ran at Thu Mar 7 00:12:27 UTC 2013 [00:12:34] RECOVERY - Puppet freshness on amssq44 is OK: puppet ran at Thu Mar 7 00:12:27 UTC 2013 [00:12:34] RECOVERY - Puppet freshness on knsq23 is OK: puppet ran at Thu Mar 7 00:12:27 UTC 2013 [00:12:35] RECOVERY - Puppet freshness on amssq48 is OK: puppet ran at Thu Mar 7 00:12:27 UTC 2013 [00:12:35] RECOVERY - Puppet freshness on amssq49 is OK: puppet ran at Thu Mar 7 00:12:27 UTC 2013 [00:12:36] RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Thu Mar 7 00:12:28 UTC 2013 [00:12:37] RECOVERY - Puppet freshness on amssq60 is OK: puppet ran at Thu Mar 7 00:12:28 UTC 2013 [00:12:37] RECOVERY - Puppet freshness on knsq27 is OK: puppet ran at Thu Mar 7 00:12:29 UTC 2013 [00:12:37] RECOVERY - Puppet freshness on knsq16 is OK: puppet ran at Thu Mar 7 00:12:29 UTC 2013 [00:12:38] RECOVERY - Puppet freshness on knsq17 is OK: puppet ran at Thu Mar 7 00:12:29 UTC 2013 [00:12:38] RECOVERY - Puppet freshness on knsq28 is OK: puppet ran at Thu Mar 7 00:12:29 UTC 2013 [00:12:39] RECOVERY - Puppet freshness on knsq26 is OK: puppet ran at Thu Mar 7 00:12:29 UTC 2013 [00:12:40] RECOVERY - Puppet freshness on knsq18 is OK: puppet ran at Thu Mar 7 00:12:30 UTC 2013 [00:12:41] RECOVERY - Puppet freshness on amssq40 is OK: puppet ran at Thu Mar 7 00:12:30 UTC 2013 [00:12:41] RECOVERY - Puppet freshness on amssq45 is OK: puppet ran at Thu Mar 7 00:12:30 UTC 2013 [00:12:58] weeee [00:13:17] Will icinga-wm autorejoin? [00:13:22] or do we have to kick it on neon? [00:13:52] nm. [00:14:06] New patchset: Pyoungmeister; "setting 1 node per shard to innodb_file_per_table for conversion" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52559 [00:15:46] !log reedy synchronized wmf-config/InitialiseSettings.php [00:15:51] Logged the message, Master [00:16:42] Reedy: mind if I sync CommonSettings.php? pushing out the addition of a config var that is currently inert (not checked by anything); trying to make deployment simpler. [00:16:53] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52559 [00:17:26] notpeter: "innodb_file_per_table.. This variable is available as of MySQL 4.1.1" - welcome to the future! [00:17:38] ori-l: if fenari wasn't hanging [00:17:38] hahahaha [00:18:10] the dump + load will probably reclaim some disk space too. maybe not a ton but probably at least a few gigs per shard [00:18:56] well, as space/time tradeoffs go, I dunno if it's woth it. [00:19:03] I mean total wall clock time [00:19:06] not execution time [00:19:23] Reedy: oh. heh. [00:19:39] notpeter: that's not why you're doing it though [00:20:11] * Reedy stabs nfs1 [00:21:22] notpeter: the space reclamation is just like earning points on a credit card that screws you with fees [00:22:40] bahahaha [00:22:48] New patchset: Pyoungmeister; "derp. 
need to pass down" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52563 [00:23:43] New patchset: Reedy; "Allow wikimedia blog on foundationwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52564 [00:23:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52563 [00:24:02] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52564 [00:25:59] * RobH deletes more stuff off wikitech while apergos isnt looking [00:29:38] yay, fresh puppet [00:30:24] servermon is far happier now than a fwe hours ago. [00:32:04] Change merged: Tim Starling; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52543 [00:33:49] !log reedy synchronized php-1.21wmf11/extensions/RSS [00:33:54] Logged the message, Master [00:38:00] New review: Tim Starling; "Maybe you could also change search0x to search1000x in manifests/role/lucene.pp to prevent the same ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52547 [00:39:33] !log puppet halted run on cp1003, left in locked state, killed all puppet instances and refired, cleared. [00:39:34] Logged the message, RobH [00:40:40] New patchset: Lcarr; "fixed typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52574 [00:40:47] !log puppet locked on ms1004, killall puppet, refired [00:40:52] Logged the message, RobH [00:41:53] RobH: I don't bother to even log when i do that it happens so much [00:42:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52574 [00:42:26] meh, ms1004 is full [00:42:30] so that wont fix it. [00:42:38] i fear deleting shit on thumb server =P [00:45:02] ahh old log files that are already compressed, why are you still there... [00:45:27] LeslieCarr: this is what i been doing all day, figured i should log one or two to show im alive ;] [00:45:33] hehe [00:49:15] !log restarted pybal on lvs3 [00:49:21] Logged the message, Mistress of the network gear. [00:52:01] LeslieCarr: the celsus icinga alert is false positive from IP change right? [00:52:07] cuz im gonna ack it then. [00:52:29] yeah [00:53:33] huh, it died [00:53:36] icinga, restarting. [00:54:02] !log kicked icinga [00:54:07] Logged the message, RobH [01:02:56] New patchset: Ram; "Bug: 45795 Add explicit property identifying null host." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52547 [01:05:56] RoanKattouw_away: pmtpa parsoidcache is all ready [01:26:59] LeslieCarr: so how long are we keeping the tampa DC? Is there any vague scheduling on that? [01:41:41] Aaron|laptop it all depends on sue/the board and the budget meeting [01:46:38] * Aaron|laptop hands Ryan_Lane a box of Gluster [01:47:00] AaronSchulz: want to manage storage? :) [01:47:03] it's fun [01:47:05] a lot of fun [01:47:35] so would any block storage system work? [01:48:09] as long as it can be shared between clients [01:48:12] would be interesting to have a requirements page [01:48:13] and it isn't a SPOF [01:48:27] though there is little total data right? [01:48:37] right now 10TB [01:48:43] New patchset: Dzahn; "puppetize haproxy (for brewster), the very basics (RT-4660)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52578 [01:48:43] oh [01:48:45] also, it needs to restrict access [01:48:51] and it needs to have read-only support [01:49:01] "restrict access"? 
[01:49:15] export a can only be accessed by client x,y,z [01:49:38] export b can be accessed by anything, but is read only [01:50:36] have you talked to any gluster dev about the problems labs has? [01:50:38] yes [01:50:46] there are no magic nobs I take it? :) [01:50:49] "gluster 3.4 will likely work better for you" [01:50:56] gluster has almost no knobs [01:51:01] better as in "actually work consistently"? [01:51:05] glusterd is single threaded [01:51:21] and every single change you make to any volume requires all bricks to talk to each other [01:51:29] the more volumes you have, the worse things become [01:51:44] gluster 3.4 will have a multi-threaded glusterd service [01:52:11] so in reality it probably will help [01:53:09] there's other things that I very much dislike, though [01:53:11] performance at least [01:53:22] !log installing package upgrades on sodium [01:53:22] that'll only help performance of glusterd [01:53:28] Logged the message, Master [01:53:37] glusterfsd service is what runs for the gluster filesystem itself [01:53:53] and the glusterfs service runs for nfs [01:54:14] * Aaron|laptop lols at the thumbnails errors on http://gluster.org/community/documentation/index.php/GlusterFS_Concepts [01:54:15] the gluster service on the client eats up tons of cpu [01:54:20] heh [01:54:48] the gluster service on the client used to eat up a ton of memory due to memory leaks too, but thankfully those were fixed in the last point release [01:55:18] any major issue with gluster results in an outage, though [01:55:25] and outages result in split-brained files [01:55:37] we're using replica=2, so we can't use the quorum features [01:56:07] you can't relax consistency either? [01:56:21] meaning? [01:56:37] oh. I see what you mean [01:56:43] well, that would result in corruption [01:57:38] though realistically the end-result is the same, I guess [01:58:03] depends on the nature of the "issue" [01:58:09] because it's nearly impossible to fix split-brain issues now, so we just have a bunch of files with input/output errors [01:58:26] and whether blocks can be known to be broken (via some hash or something) or not [01:58:38] * Aaron|laptop has to read up on the design of gluster [01:58:41] well, it's not really a block based filesystem [01:58:48] it works at the file level [01:59:02] so blocks map to local files or something? [01:59:12] files map to files :) [01:59:21] and a client directly writes to two spots [01:59:30] if one is down, then when it comes back up, it's updated from the other [01:59:57] if one goes down, the client switches to the second [02:00:08] if that then goes down, and the other one comes back up [02:00:15] then the client continues to write.... [02:01:34] if using replica=3 and a quorum it'll block writing unless there's a quorum [02:01:38] telling the client its read-only [02:02:44] we could probably switch to replica=3 [02:02:54] but that's a massive waste of space for normal block data [02:03:27] to bad it's not one of those systems that uses solomon-reed codes to avoid 3 copies [02:03:30] *too [02:03:31] and realistically with replica=3 we're approaching netapp prices for hardware [02:03:38] Thehelpfulone: i got the requested list.. RT-4656, but having trouble attaching a 15kb file.. sigh [02:04:50] we're really just using a DFS to avoid a SPOF [02:04:52] Ryan_Lane: so what's the problem if just one spot goes down and then up again? [02:05:05] Aaron|laptop: none [02:05:13] theoretically [02:05:25] so why is stuff going down so much? 
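For reference, the volume-level controls being discussed here look roughly like the following on the gluster command line. This is only a sketch: the volume names, brick paths and client addresses are made up, and the option names are from 3.3-era "gluster volume set help", so they should be checked against the running version.

    # three-way replication is what makes quorum enforcement possible
    gluster volume create projectvol replica 3 server1:/brick server2:/brick server3:/brick
    gluster volume start projectvol
    # restrict which clients may mount this export
    gluster volume set projectvol auth.allow 10.4.0.10,10.4.0.11
    # refuse writes when fewer than a majority of replicas are reachable,
    # instead of letting the survivors diverge into split-brain
    gluster volume set projectvol cluster.quorum-type auto
    # a separate read-only export (ssh keys, xml dumps) would instead get:
    gluster volume set dumpsvol features.read-only on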
[02:05:28] though in practice I've seen some issues with availability [02:05:50] gluster doesn't have a concept of read-only [02:06:01] so, we're using gluster's nfs support for this [02:06:15] glusterfsd services are decoupled from the glusterd service [02:06:19] Aaron|laptop: because gluster's actual implementation is best described by the same acronym as "point of sales" [02:06:24] it seems glusterfs services are not [02:06:32] glusterfs services share nfs [02:07:11] (we need read-only for the ssh keys and for the xml dumps) [02:07:30] so, when the glusterd service died, it would restsrt [02:07:32] *restart [02:08:02] if you were lucky and tried to login when that happened, especially on lucid, it would hang, or deny [02:08:09] lucid would never come back [02:08:16] which is why I just rebuilt bastion1 [02:09:12] so, we haven't had a full gluster outage in a few weeks [02:09:24] we changed how we were interacting with glusterd [02:09:38] previously it was swap-deathing itself [02:09:55] now we're just having the nfs issue [02:12:48] so nfs problems are the reason you want to switch? [02:13:19] or at least you sounded like you wanted to switch to something else this mourning :) [02:14:04] Ryan_Lane: It looks like something is up with the search engine on wikitech (labsconsole). Any pages that were imported don't show up (e.g. looking for "moreb" doesn't suggest "morebots") [02:14:08] known issue? [02:14:15] not known issue [02:14:21] known issue now :) [02:14:27] is there a maintenance script to update the search index? [02:14:48] updateSearchIndex.php [02:15:04] *** Couldn't write to the searchUpdate.labswiki.pos! [02:15:06] o.O [02:15:17] that looks wrong [02:15:34] is labs using the normal search hooks or some OAI thing? [02:15:41] normal mediawiki search [02:16:26] it needs to write to a file? [02:17:01] in the maintenance/ dir by default [02:17:38] well, I tried changing that [02:17:40] *** Couldn't write to the /tmp/searchUpdate.labswiki.pos! [02:17:41] heh [02:18:13] New review: MZMcBride; "Reverted by ." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21322 [02:18:56] you just want to update from some time range probable (-s and -e) [02:19:04] it doesn't really need to write for that case I guess, heh [02:19:58] *probably [02:20:04] ah [02:20:05] yep [02:20:10] I just set the timestamp to 2001 [02:20:35] actually, 2000 :) [02:20:36] well 200101010000 or whatever [02:20:39] Krinkle: should work now [02:21:19] * Aaron|laptop missed a 00 [02:21:19] ... [02:21:22] or not [02:21:49] any other maintenance scripts I should run? [02:21:52] nope [02:22:08] I get a few on Special:Search , but that may be because I edited the pages [02:22:13] I don't see them in prefix search [02:22:14] yeah [02:22:15] indeed [02:22:24] lemme isolate the api request for easier testing [02:22:29] thanks [02:22:36] https://wikitech.wikimedia.org/w/api.php?format=json&action=opensearch&search=moreb&namespace=0&suggest= [02:22:38] I'm sure there's some specific script for this [02:22:48] https://wikitech.wikimedia.org/w/api.php?format=json&action=opensearch&search=User:Mo&namespace=0&suggest= [02:22:56] the latter has been edited and the former has not [02:22:58] neither shows up yet [02:23:48] Ryan_Lane: did the script do anything? [02:23:52] yeah [02:23:59] it said it indexed all the pages [02:24:18] * Aaron|laptop likes how there is search update code in the base Maintenance class... 
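The maintenance run being described above is, roughly, the following. This is a sketch only: the maintenance path is illustrative, and it just uses the -s/-e/-p options mentioned in the conversation.

    # reindex everything touched since 2000 by passing an old start timestamp
    # (-e can bound the other end of the range)
    php /srv/mediawiki/maintenance/updateSearchIndex.php -s 20000101000000
    # the script also records how far it got; -p points that position file
    # somewhere writable, which is what the /tmp path above was for
    php /srv/mediawiki/maintenance/updateSearchIndex.php -s 20000101000000 -p /tmp/searchUpdate.labswiki.pos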
[02:29:09] yeah that script is broken if -p is given too [02:29:17] :D [02:29:19] realpath() returns false [02:29:24] *isn't [02:29:25] it's ok, only third parties use that [02:29:37] ;) [02:29:38] !log LocalisationUpdate completed (1.21wmf11) at Thu Mar 7 02:29:38 UTC 2013 [02:29:44] Logged the message, Master [02:32:13] bleh [02:32:15] I'm going to run maintenance/rebuildall.php [02:32:50] I'm actually not sure why realpath() gives false, even if I use __DIR__ [02:33:11] the dir perms look fine (missing x bit can cause realpath to fail) [02:33:36] doesn't work as root either, wtf [02:33:43] :D [02:37:20] hm. can't seem to make search work properly [02:38:44] this realpath thing must be a php bug, ugh [02:38:50] it's known to have bugs [02:39:41] :D [02:40:03] I guess I'll need to go through the code to see what the search is doing and why it isn't working [02:41:19] Ryan_Lane: does output the list of titles? [02:41:45] yep [02:43:33] though not all of them [02:44:01] but, some that aren't showing up in the search were in the list [02:44:12] I was just about to ask about that [02:44:50] for some reason I remember seeing this in the past [02:47:13] "It will not update the search index for the pages that do not appear in http://www.mediawiki.org/wiki/Special:RecentChanges." [02:47:14] seriously? [02:47:23] no fucking wonder [02:47:49] you can only update the search index for pages in recent changes [02:48:13] Ryan_Lane: did you run rebuildRecentChanges? [02:48:22] yes.... [02:48:26] but think about that ;) [02:53:50] !log LocalisationUpdate completed (1.21wmf10) at Thu Mar 7 02:53:50 UTC 2013 [02:53:56] Logged the message, Master [06:01:52] New review: Nemo bis; "@Reedy: who are you replying to? I meant the password protection without expiry (fixed in the meanwh..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441 [07:58:19] Nikerabbit, now you're there? [08:39:10] New review: Mattflaschen; "Let's use a separate slow-parse file for private wikis:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678 [08:40:40] New review: MaxSem; "Just filtering blacklisted wiki names is dangerous, as people will forget to update such blacklist w..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678 [08:43:02] MaxSem: yes [08:43:35] Nikerabbit, https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=vanadium&service=Solr [08:46:00] MaxSem: and what I'm supposed to get out of it? I only see CRITICAL [08:46:08] ...and its cause [08:46:28] MaxSem: average request time for what? [08:46:46] for searches. since it's your installation, let's decide what to do about it [08:47:02] MaxSem: what exactly is profiling my searches? [08:47:19] ? [08:47:40] Icinga;) [08:47:57] MaxSem: and how does it do that? [08:48:18] gets the stats from solr [08:48:26] oki [08:48:44] so it's a reflection of your real preformance (or a lack of it):) [08:49:08] so... we could raise the threshold for this particular service [08:49:23] but it's friggin slow already [08:49:59] okay so [08:50:23] MaxSem: let's rise the threshold a bit, I will fix bug http://bugzilla.wikimedia.org/43778 and see how it acts after that [08:50:46] MaxSem: also, better profiling than just "average" would be nice [08:51:02] whether there is only few hugely slow or whether it is across the board [08:51:45] do you know a better metric in admin/stats.jsp or whatever? 
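The stats page the check scrapes can be inspected by hand; a rough sketch, in which the host, port and field name are all assumptions (Solr's default admin stats page exposes per-handler averages such as avgTimePerRequest):

    # dump the request-handler statistics the icinga check is built on
    curl -s 'http://vanadium.eqiad.wmnet:8983/solr/admin/stats.jsp' | grep -i avgTimePerRequest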
[08:52:32] MaxSem: haven't studied those [08:52:33] if you just want to know what's going on, use standard MediaWiki profiling with graphite [08:52:45] MaxSem: are these milliseconds: https://noc.wikimedia.org/cgi-bin/report.py?db=all&sort=real&limit=5000&prefix=Solr ? [08:52:50] icinga is just for cases where shit hit fan [08:53:18] not sure [08:54:31] MaxSem: how do I find that in graphite? [08:54:36] Nikerabbit, https://graphite.wikimedia.org/dashboard/temporary-23 [08:55:26] ugh, it looks scary [08:55:34] yeah, even with this few dataplots [08:55:59] or is that just dust in my screen [08:56:25] hmm updates timing out... that is bad [08:56:46] I wonder why is that [08:57:58] MaxSem: I will also be setting up Solr for translatewiki.net production to see how it compares [08:58:45] meanwhile, I'll tweak monitoring [09:32:08] New patchset: awjrichards; "Update X-CS handling to new k/v pair spec" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [10:04:07] New review: Platonides; "The filter can read the blacklist directly from private.dblist" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678 [11:13:49] New review: Aklapper; "Hmm, now that you say I think I remember something. Let's abandon this change, if I find out how to ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52393 [11:14:12] Change abandoned: Aklapper; "Adds more confusion than before, see comment by Nemo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52393 [11:15:38] New review: Aklapper; "34 is the width of a column - needed to extend it so the string isn't cropped" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52400 [12:18:39] New review: Peachey88; "Wouldn't it be easier to do it on wgDebugLogFile compared to wg DebugLogGroups, So all the private l..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52608 [12:46:29] New patchset: Aklapper; "[bug 45770] Update used parameters in statesToRun and resolutionsToRun arrays" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52616 [12:47:23] New review: Aklapper; "Superseded by https://gerrit.wikimedia.org/r/#/c/52616/ - marking as abandoned." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52396 [12:47:43] Change abandoned: Aklapper; "Superseded by https://gerrit.wikimedia.org/r/#/c/52616/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52396 [14:53:49] New review: Platonides; "$wgDebugLogGroups has priority over $wgDebugLogFile. It may make sense to remove most of them for pr..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52608 [15:00:45] !log Restarted gmetad on nickel [15:00:51] Logged the message, Master [15:28:14] !log Reinstalled cp3003 [15:28:20] Logged the message, Master [15:30:08] does udp2log need a config change when adding a new type ? [15:30:50] what do you mean by a type? [15:33:07] see https://gerrit.wikimedia.org/r/52608 [15:33:32] that new privatewiki-slow-parse log [15:34:30] hmm, don't know [15:48:32] paravoid: are you around? [15:48:37] yes [15:49:07] i want to reseat disk4 on ms-be1004. is this ok? 
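The "osdmap e170240: 144 osds: 138 up, 138 in" line quoted above comes from the cluster status; finding which OSDs (and hence which disks) are the missing ones is standard ceph CLI, nothing WMF-specific:

    # overall health, including the osdmap summary
    ceph -s
    # per-OSD view with host placement and up/down state
    ceph osd tree | grep -w down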
[15:49:31] yes [15:50:04] k..cool also found a few failed disk in 1011/1009/1006 [15:50:29] oh really osdmap e170240: 144 osds: 138 up, 138 in [15:51:30] odd [15:52:10] 1002, 1004, 1006, 1008, 1009, 1011 [15:52:14] let me have a look [15:59:20] 1002 sdl, 1004 sde, 1006 sdf, 1008 sdk, 1009 sdh, 1011 sdi [15:59:27] cmjohnson1: these seem to be all broken [15:59:44] I/O errors [16:00:05] note that sda is the VD with the SSDs, so it's off-by-one to get the bay nr. [16:00:30] next hardware issue [16:00:33] bad drives? ;-) [16:01:04] that's quite a lot suddenly [16:01:24] statistically though it might make sense [16:01:36] although I wonder why we haven't had the same ratio in pmtpa/swift [16:02:05] we are not 100% r720s in tampa [16:02:22] this looks like disk-related [16:02:34] but the r720xd replacement includes disks, right? [16:02:58] yes...they all have the same 3TB disks [16:03:11] w/the exception of the flex bays [16:04:13] New patchset: Mark Bergsma; "Revert "make check_http (80) and check_tcp (8080) on install hosts a critical (paging) service"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52623 [16:04:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52623 [16:05:03] mark: I disagreed with that on RT too [16:05:08] i saw [16:06:51] and even if that were a critical service to labs, that needs to become a properly designed HA setup before it can become a critical service [16:07:07] !log reseating disk4 on ms-be1004 [16:07:12] Logged the message, Master [16:21:28] New review: MaxSem; "Thanks!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36222 [16:26:34] cmjohnson1: it still snowing there? [16:26:46] robh: no it stopped last night [16:26:51] it is all melting today [16:27:02] good =] [16:27:09] so the cameras were still in recieving [16:27:31] well, they better not charge us for delivery on upcoming bill [16:27:35] i'll have to check, thx for info [16:28:06] yep...I need some help w/the ticket for equinix...but will ping you about it shortly in chat session w/dell [16:28:13] \o/ [16:28:24] cool, im working from home today, so will be online and available from now onward [16:28:34] no commute \o/ [16:29:20] gotta love that! [16:29:35] New patchset: MaxSem; "WIP: OSM module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36222 [16:29:44] robh: the hpm servers have both connections so all disk drives will work [16:31:59] awesome! [16:32:12] so we'll be taking a hpm ssd model and putting it in frack, ill drop ticket in a few minutes [16:34:29] ok...jeff_green can you put a ticket in the eqiad queue for the frack db's you need moved [16:34:53] cmjohnson1: sure. but we can't start that until April [16:35:16] that's okay..i will stall it but don't want to forget ;-] [16:35:20] k [16:36:55] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=4665 [16:37:15] woot [16:37:31] That is for samarium, jeff's new public log host [16:37:53] Jeff_Green: I put in that ticket to plug it into whatever firewall has the least number of connections [16:38:08] Is that ok, or did it need dual connections? [16:38:17] single is ok, and that's probably fine [16:38:23] cool [16:38:31] although can you let me know which it ends up in? 
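A sketch of how drives like these are usually confirmed and mapped to bays on this hardware. The device name is taken from the list above; the MegaCli binary name and adapter selection vary by install, so treat them as placeholders.

    # kernel-side evidence for one of the suspects (sde on ms-be1004)
    dmesg | grep -i 'i/o error' | grep -w sde
    # controller-side view of slots and drive states; remember sda is the SSD
    # virtual disk, hence the off-by-one when turning sdX into a bay number
    megacli -PDList -aALL | egrep 'Slot Number|Firmware state'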
[16:38:42] I asked in ticket for chris to drop a new network ticket for it [16:38:43] with all the info [16:38:47] k [16:38:59] (i'd link them all together with refers too personally, i like link history of tickets) [16:39:22] cool [16:55:00] robh: need you help w/equinix ticket for cameras [16:55:16] oh the request for mounts? [16:55:45] yes...what type of request would this be? [16:55:53] its a smart hands [16:56:04] request a metal junction box + conduit for each camera [16:56:11] and mark on the pillars where we want it [16:56:17] these PoE right? [16:56:19] yep [16:56:25] k [16:56:37] so once the junction boxes and conduit are in place, you can run each camera to ports0-3 on any switch in row c [16:56:39] that makes it easier [16:56:48] those are the PoE ports on a EX4200 [16:57:01] (may be more than that, but on existing cameras we have not gone above port 3) [16:57:21] well that only leaves 2 switches so 8 or 5 [16:58:01] that works [17:17:37] cmjohnson1: ms-be1004's disk is still borked [17:19:28] ok...thx...working on replacing them all [17:19:34] thanks [17:31:58] robh: external osm servers in eqiad are not in public vlan yet...unless you changed it already [17:32:12] eww, i did not [17:32:16] mind fixing them? [17:32:24] (figured you may want more switch work ;) [17:34:26] cmjohnson1: its not 'omg fix now' paravoid will be at a hackathon this weekend [17:34:32] so as long as they are working by end of day, its fine. [17:34:45] I think we're going to start in labs first anyway [17:34:52] so it probably doesn't matter [17:36:24] !log reedy synchronized wmf-config/InitialiseSettings.php [17:36:25] ok...cool [17:36:28] Logged the message, Master [17:36:39] reedy@fenari:/home/wikipedia/common$ sync-file wmf-config/InitialiseSettings.php [17:36:39] No syntax errors detected in /home/wikipedia/common/wmf-config/InitialiseSettings.php [17:36:39] copying to apaches [17:36:39] reedy@fenari:/home/wikipedia/common$ [17:36:44] That's scarily quiet! [17:37:07] jeff_green..your new shiny server has been moved just needs network port enabled. [17:37:20] cmjohnson1: nice. thanks! [17:37:28] Jeff_Green: And if you end up not needing the SSDs, let us know. [17:37:33] will do [17:39:17] New patchset: Reedy; "Bug 45819 - Fix RSS plugin on wmfwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52631 [17:40:18] robh: wmf4084...can i move this to c7? Not sure if you still need it in a db rack [17:40:46] dude, i dont recall wtf we put it there for [17:40:55] maybe cuz it was only rack with networkign working [17:40:58] and then we didnt need it [17:41:02] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52631 [17:41:02] but i dunno, so it can move. [17:41:36] ok..figured as much but just wanted to confirm [17:41:37] thx [17:49:40] Damn. No peter or asher [17:50:14] Reedy: hm? [17:50:41] Wanting a mariadb host to see if it's cleverer with a query [17:50:49] Oh [17:50:50] 19:02 Ryan_Lane: binasher: shutting down mysql on db1043, converting to mariadb [17:51:24] Reedy: there's a mariadb => true in puppet for the mariadb hosts afaik [17:51:37] SAL was quicker :D [17:55:36] would it be reasonable to use https://github.com/puppetlabs/puppet-postgresql in our repo, or it's too huge? [17:57:24] is it a matter of size? [17:57:43] huge software requires more care:) [17:57:50] frankly, I think you're overreaching a bit [17:57:59] with all this OSM stuff [17:58:08] like? 
[17:58:12] you've asked for an ops person to be present and you got it [17:58:28] I won't attend just for my merge powers [17:58:50] definitely [17:58:51] I'll do the research and pick the suitable tools for this [17:59:13] you're more than welcome to help but I think you're doing far too much work without having discussed anything with me first [18:01:38] I've been given a task to make this happen - I'm doing it [18:01:45] http://osm.wmflabs.org/osm/slippymap.html :) [18:02:01] You still need ops to make it happen [18:02:38] yup [18:06:05] sigh [18:12:50] * Aaron|home wonders what DeleteJob and MoveJob are [18:13:37] they delete and move [18:15:01] /home/wikipedia/common/php-1.21wmf11/includes/limit.sh: line 77: /sys/fs/cgroup/memory/mediawiki/job/32466/tasks: No such file or directory [18:17:17] Aaron|home: translate [18:18:01] Reedy: figured, I don't like how vague the names are [18:18:08] mmm [18:18:19] Nikerabbit, ^^^ [18:18:23] Easily fixed [18:18:27] With a minor workaround [18:18:36] DeleteJob -> TranslateDeleteJob [18:18:47] class DeleteJob extends TranslateDeleteJob {} [18:18:55] to save breaking the current job queue entries [18:19:00] Aaron|home: are they stuck somewhere? [18:19:04] we had some failures [18:19:12] MaxSem: what? [18:19:16] https://bugzilla.wikimedia.org/show_bug.cgi?id=44865 [18:19:17] no, I was just commenting on the naming [18:19:20] Reedy: https://gerrit.wikimedia.org/r/#/c/52588/ [18:19:21] Nikerabbit, ^^^ [18:19:27] ah ok [18:19:37] Nikerabbit: RenderJob, MoveJob and DeleteJob are too generic in their names [18:19:56] patches welcome ;) [18:20:09] I'll do it then? :p [18:20:15] this is sooo open sourcy:P [18:20:41] is there need to keep BC class aliases for a while? [18:21:00] probably wouldn't hurt, to let jobqueue process those jobs [18:21:10] or a b/c jobtypes entry [18:21:34] well, $wgJobClasses [18:21:49] Nikerabbit: class DeleteJob extends TranslateDeleteJob {} [18:21:50] Aaron|home: ah that works too [18:24:44] Reedy: WorkJob! [18:25:36] Could not fetch review information from gerrit [18:25:37] fatal: internal server error [18:25:38] grr [18:25:47] gerrit being really slow today? [18:29:37] Nikerabbit: https://gerrit.wikimedia.org/r/52638 [18:29:52] Reedy: now you cr that redis thing ;) [18:30:20] Aaron|home: https://gerrit.wikimedia.org/r/51675 [18:30:22] gogogogo [18:33:48] heh, I actually have Postgre on this box [18:34:04] (note that calling it "Postgre" is normally a sin) [18:36:23] Reedy: https://www.mediawiki.org/wiki/Extension:Translate#Recent_changes [18:37:05] Nikerabbit: There'll be a few of those [18:37:18] Lets add gerrit [18:37:26] thx [18:39:02] Reedy: wasn't there a patch for that already [18:39:03] !log reedy synchronized wmf-config/InitialiseSettings.php [18:39:09] Logged the message, Master [18:39:23] Not that I saw [18:40:12] well isn't that great, it doesn't like protrel urls [18:40:40] That, or is it exact matching, not partial.. [18:41:23] !log reedy synchronized wmf-config/InitialiseSettings.php [18:41:30] Logged the message, Master [18:43:17] https://gerrit.wikimedia.org/r/#/c/51675/15/maintenance/sqlite/archives/patch-archive-ar_id.sql ugh [18:43:34] Reedy: can you just made the _id columns a generic unique index for all the db types? 
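The limit.sh error quoted above is about the per-job memory cgroup; the mechanism it leans on is plain cgroup-v1 plumbing, roughly the wrapper below. This is a sketch of the technique, not the actual limit.sh code — the group path just mirrors the one in the error message, and the 1 GiB cap is arbitrary.

    #!/bin/bash
    # create a memory group for this job and cap it
    G=/sys/fs/cgroup/memory/mediawiki/job/$$
    mkdir -p "$G"
    echo $((1024*1024*1024)) > "$G/memory.limit_in_bytes"
    # move ourselves into the group; the "No such file or directory" above means
    # this tasks file had disappeared out from under the wrapper
    echo $$ > "$G/tasks"
    # then run the real command inside the capped group
    exec "$@"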
[18:43:57] it would still be the PK for mysql (being the first unique index) and it would simplify the sqlite patches [18:44:10] you add columns + unique indexes with ALTER then [18:44:16] *you could [18:44:27] mutante: hey , could you possibly add me on the CC list for https://rt.wikimedia.org/SelfService/Display.html?id=4676 ? I don't have access to it. That is a request to grant marktraceur shell access on the continuous integration box. [18:47:25] New patchset: Reedy; "Enable translate gitweb rss feed on mediawikiwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52642 [18:47:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52642 [19:17:28] !log pgehres synchronized php-1.21wmf11/extensions/CentralNotice/ 'Updating CEntralNotice to master, resolving bug 45846' [19:17:34] Logged the message, Master [19:19:59] hashar: added you as a requestor, that should give you access [19:20:23] mutante: nice thanks! [19:21:47] New patchset: Lcarr; "fixing up nagios-nrpe-server init on icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52647 [19:23:00] mutante: and it got approved by manager :] [19:24:01] Leslie, icinga died aain today and I restarted it manually.. it left its pid file around again, still don't know what causes it [19:24:09] hrm [19:30:31] New patchset: Bsitu; "Add EventLogging configuration" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52648 [19:32:33] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52648 [19:36:16] paravoid: ms-be1011...i reseated drive 7 and it did not rebuild, most likely foregin cfg issue. status is unconfigured good. [19:37:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52647 [19:37:25] !log bsitu synchronized wmf-config/CommonSettings.php 'Add Eventlogging configuration to Echo' [19:37:30] Logged the message, Master [19:40:30] New review: Hashar; "This should really not be made public for some over reasons too beside the potential disclosures of ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678 [19:40:44] binasher: I can finally use linkedin again ;) [19:41:08] woo! [19:41:12] thanks :) [19:42:14] New review: Platonides; "Which ones?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49678 [19:44:21] slowest "git review" ever [19:44:31] but maybe its office wifi .. [19:46:20] New patchset: Dzahn; "merge ./misc/nagios.pp into icinga.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52651 [19:57:10] New patchset: Lcarr; "icinga nagios cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52653 [19:57:20] mutante: ^^ ? [19:58:30] New patchset: Catrope; "Use the service IP for parsoidcache in Tampa as well" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52654 [19:58:57] partytime!!! [20:03:33] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52654 [20:05:29] !log catrope synchronized wmf-config/CommonSettings.php 'fa0a56a89 - parsoidcache VIP in pmtpa' [20:05:35] Logged the message, Master [20:17:54] LeslieCarr, you want I should merge this icinga patch? [20:20:46] New review: Andrew Bogott; "Looks good, I will merge shortly." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/52026 [20:35:03] oh teh little one ? 
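On the "foreign cfg / unconfigured good" state reported for ms-be1011's drive 7, the usual sequence on these controllers is roughly the following. The MegaCli binary name and adapter number are assumptions, so this is a sketch rather than a recipe.

    # does the reseated drive carry a foreign (previous-array) configuration?
    megacli -CfgForeign -Scan -a0
    # either import it, keeping the old array metadata ...
    megacli -CfgForeign -Import -a0
    # ... or clear it, so the drive drops back to unconfigured-good and can be
    # re-added or left to rebuild
    megacli -CfgForeign -Clear -a0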
[20:36:27] yeah [20:37:06] yes plz [20:37:07] thanks [20:38:09] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52026 [20:42:09] For the E3 deploy, I +2'd the core submodule update https://gerrit.wikimedia.org/r/#/c/52659/ , and jenkins-bot "Starting gate-and-submit jobs" has taken over 8 minutes [20:43:05] hashar's eating or something [20:43:10] http://integration.mediawiki.org/zuul/status seems to indicate it's gone on to test subsequent gerrit commits... ? [20:43:20] jeremyb_ hashar is standing next to me [20:43:29] that counts as or something [20:44:00] :) [20:45:47] binasher: https://gerrit.wikimedia.org/r/#/c/52458/ [20:49:45] FYI zuul/jenkins finally failed with "ERROR: Couldn't find any revision to build". hashar thinks it's a bug [20:50:10] Jeff_Green, I tidied https://wikitech.wikimedia.org/wiki/Manual_for_ops_on_duty up a bit but if you get some time to expand that, that would be good [20:52:12] New patchset: MaxSem; "Set mobile URL template for beta" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52667 [20:53:09] Thehelpfulone: hey there's nothing on there about how to handle documentation requests, so I don't know how to handle it :-P [20:53:29] I'll write a section just for you if you really want ;-) [20:53:32] hahaa [20:54:06] also could someone enable $wgVectorUseIconWatch = true; [20:54:06] on labs? I'm missing my little star to watchlist pages [20:54:06] hahaha, no please don't. i'm busy trying to figure out how to monitor service clusters in icinga. [20:55:47] kaldari, did you have jenkins CI problems with your core submodule updates? Just curious [20:56:01] CI? [20:57:33] kaldari says "yes, the gate-and-submit jobs were really slow", but E2 finally succeeded [21:02:17] We're still waiting on this one to merge: https://gerrit.wikimedia.org/r/#/c/52668/ [21:04:02] kaldari, http://etherpad.wikimedia.org/E3-2013-03-07-deploy has our three extensions. They submodules are updated in core for wmf11, and Jenkins is grinding away for wmf10 https://gerrit.wikimedia.org/r/#/c/52669/ [21:04:21] cool, I'll go ahead and deploy them for wmf11 [21:04:37] kaldari <3 [21:05:59] kaldari your 52669 had the same Jenkins "No candidate revisions . ERROR: Couldn't find any revision to build" failures for PHPUnit tests that we had [21:06:36] lesliecarr: oob link is finished [21:06:44] yay!!! [21:09:12] New review: Hashar; "if you say so :-]" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/52667 [21:09:33] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52667 [21:09:52] So 1) Jenkins CI quite slow, 2) failures to get revision to build so voting tests fail. [21:19:29] spagewmf: I am The Keymaster! [21:20:29] kaldari, hashar: good news, Jenkins CI succeeded on the wmf10 extension submodule update and merged (4 minutes) [21:21:02] kaldari, since scap does both branches, can you update 1.21wmf10 as well before scap? [21:21:41] I could, but I don't have a wmf10 deployment branch, so it'll take me a while [21:22:36] spagewmf: great! 4 minutes is about normal. 
Most of the time is actually spent in the parser wich is usually a bit more than 3 mns [21:22:43] New patchset: Dzahn; "add mholmquist to jenkins admins / sudo to restart jenkins/psql on gallium (RT-4676)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52671 [21:22:44] we weren't deploying anything to wmf10 ourselves [21:22:55] kaldari OK, well while we test test, I'll get started in fenari:/home/wikipedia/common/php-1.21wmf10 [21:23:04] hello [21:23:22] I was directed here, and i was hoping to ask a question about cacheing and the Main Page [21:23:48] we currently have a proposal for a bot, the link is here: http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Joe%27s_Null_Bot_2 [21:24:30] what this would do is purge the Main Page 4 times and hour for when we implement "Today's articles for improvement" on the Main page [21:24:34] New review: Hashar; "Sounds good. Thank you for all the paperwork!" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/52671 [21:24:58] marktraceur: you are going to be granted access on the cont int server with additional privileges :-] [21:25:03] Woo! [21:25:15] https://gerrit.wikimedia.org/r/#/c/52671/ [21:25:19] ping daniel about it :-] [21:25:22] * marktraceur does a continuously integrated jig [21:25:42] will have to give you some trip'n trick beforehand though [21:25:46] Right. [21:25:54] hashar: Anytime you want to chat about it, I'm game [21:26:51] kaldari, there's a glitch, we need to bump an extension. Sorry [21:27:06] ok [21:27:11] not scapping yet [21:27:12] Any WMF input about the server side reprecussions of this proposed bot would be appreciated, please give any relevant information at: http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Joe%27s_Null_Bot_2 [21:30:24] NickPenguin: why ? [21:30:37] 1 refresh per 15 minutes isn't a big deal, but i want ot know why [21:30:56] LeslieCarr: Forces a reshuffle of the list to present them in pseudorandom order [21:31:10] ah [21:31:11] LeslieCarr: otherwise the one order is fixed. [21:31:32] ah gotcha [21:31:35] (for logged out users) [21:31:47] let's not mention our logged in user performance data [21:32:36] drdee: ping [21:38:19] kaldari, the change is GettingStarted 3c981ea Disable tour for everyone , we're waiting for the Jenkins CI jobs to complete [21:41:55] Krinkle, you have a change in 1.21wmf10 f3fb906b "mediawiki.jqueryMsg.test: Fix expected number.". "How to deploy" says "go yell at the culprit" :) Since it's a test I assume it's OK to deploy along with our extensions [21:42:21] Yes [21:42:28] test is not deployed in the cluster [21:42:33] the directory is unaccessible [21:42:55] didn't want to waste a sync for it, go ahead :) [21:43:18] Ryan_Lane: busy ? [21:43:24] want to party with the upgrades ? [21:46:41] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52653 [21:48:11] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52651 [21:50:05] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52185 [21:50:08] hashar: what else did you want me to review? [21:50:22] ohouuuouou :)] [21:51:07] notpeter: there is the huge puppet change at https://gerrit.wikimedia.org/r/#/c/51677/ [21:51:20] latest patchset (#15) is more or less working properly [21:51:43] not sure how bad it is going to be for the production cluster though [21:52:05] lucene seems to be running more or less properly on the beta boxes. 
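For the Main Page purge bot proposed earlier (Joe's Null Bot 2), the request it would send four times an hour is a single API purge. A minimal sketch against the stock MediaWiki API; the real bot would log in with its bot account first, which is omitted here.

    # action=purge must be POSTed; the purge forces a reparse, which reshuffles
    # the pseudorandom TAFI ordering for logged-out readers
    curl -s -d 'action=purge&titles=Main%20Page&format=json' https://en.wikipedia.org/w/api.php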
[21:52:16] kaldari so sorry for the delay, we are set on wmf10 and wmf11, do you want to start the mtoher of all scaps? [21:52:17] might have to move the indexer to another instance that will have more memory [21:52:31] the java -Xm4000m is killing the instance when using the index-updater job [21:52:40] we have one bug we want to fix [21:52:51] hashar: you can make that a param if you want? [21:53:01] http://etherpad.wikimedia.org/Upgrade-now-bitches [21:53:05] notpeter: I probably should [21:53:09] kaldari after me? No, please, after you :) [21:54:25] Best etherpad title in quite a while [21:54:48] hashar: in the change you made to search.pp, did you mean to have a value for $sync_conf_initialisesettings in the labs case? [21:54:59] haha [21:55:03] kaldari Matt (superm401) and I are joining the EventLogging workshop behind you, let us know when you scap. Thx++ [21:55:17] that's fine [21:55:25] I won't do anything in the meantime [21:57:43] https://salt.readthedocs.org/en/latest/ref/modules/all/salt.modules.apt.html [22:02:42] New patchset: Reedy; "Add libjpeg-turbo-progs to imagescalers for image rotation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52707 [22:05:43] notpeter: so for production we only care about InitialiseSettings.php but on beta we need InitialiseSettings.php AND InitialiseSettings-labs.php [22:06:10] !log reboot formey after dist-upgrade via salt [22:06:16] Logged the message, Master [22:06:36] notpeter: so each file is a different variable. On production the -labs.php is set to an empty file to skip it. Maybe most of that sync-conf-from-common should be made a new task in lucene.jobs.sh [22:09:14] hashar: ok, cool [22:10:13] killing that [22:12:00] hashar, FYI https://gerrit.wikimedia.org/r/#/c/52674/ had Jenkins CI failures , both on DatabaseTest::testStoredFunctions "MySQL or Postgres required". It only sets up SQLite AIUI [22:13:09] hashar: ok, I'm gonna deploy [22:13:26] notpeter: I will probably submit some more patches next week :-D [22:13:47] <^demon> About to bring gerrit down for a minute or two for a reboot of manganese. Please don't panic. [22:13:51] spagewmf: they should be skipped whenever running on a sqlite backend. [22:13:57] * LeslieCarr panics!!!!! [22:15:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51677 [22:16:32] LeslieCarr: http://www.officeplayground.com/Stress-Balls-C9.aspx [22:16:56] koosh balls aren't stress balls! [22:17:01] heh [22:17:02] also i can't believe they still exist [22:17:28] Everything still exists. I still have a Tamagotchi. I just call it "FitBit". [22:17:29] moar reboots [22:17:37] hehehe [22:18:04] !log upgrading all analytics machiens [22:18:10] Logged the message, Mistress of the network gear. [22:18:12] hashar agreed, should I file a bug? [22:18:14] New review: Reedy; "This is to go along with https://gerrit.wikimedia.org/r/52707" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52707 [22:18:21] wtg Leslie! [22:18:21] morebots that's awesome [22:18:22] I am a logbot running on wikitech-static. [22:18:22] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [22:18:22] To log a message, type !log . [22:18:36] hehe [22:18:53] spagewmf: I guess the branch is screwed and tests are not running there [22:19:02] morebots, what is the answer ? [22:19:02] I am a logbot running on wikitech-static. [22:19:02] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [22:19:02] To log a message, type !log . 
[22:19:09] morebots that's not a very good answer [22:19:10] I am a logbot running on wikitech-static. [22:19:10] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [22:19:10] To log a message, type !log . [22:19:19] spagewmf: though on that change , that is because Zuul can't find the revision and I have ZERO idea why [22:19:35] * Aaron|home watches morebots troll [22:20:59] !log salt pkg.upgrade on zirconium [22:21:04] Logged the message, Master [22:22:02] RobH: for some reason, i thought the rdb100[12] high perf misc + ssd servers had more that 32GB.. were they supposed to, or am i imagining that? [22:22:46] its what we put on the ticket, same as normal hpm but with additional ssds [22:22:46] for some reason I can't load gerrit at all currently: 503 Service Unavailable [22:22:51] but we can upgrade ram as well. [22:23:03] https://gerrit.wikimedia.org/r/ [22:23:41] kaldari yeah, ^demon: About to bring gerrit down for a minute or two for a reboot of manganese. Please don't panic. [22:23:51] <^demon> It's coming back up now. [22:24:01] Do all the apaches have imagescaler stuff installed? mw1114 is an api box, but has imagemagick etc [22:24:03] paravoid: ^ [22:25:48] !log salt pkg.upgrade on spence [22:25:54] Logged the message, Master [22:28:13] !log rebooting marmontel [22:28:19] Logged the message, Master [22:30:53] New patchset: Pyoungmeister; "correcting hash structure mistake" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52712 [22:30:53] New patchset: Lcarr; "fixing nrpe to be the new standard everywhere. Huzzah!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52713 [22:31:50] !log salt pkg.upgrade on yvon [22:31:56] Logged the message, Master [22:32:13] kaldari, how goes it? gerrit is back up for me [22:32:34] back up now, but my pull seems to be stuck [22:34:32] there it goes, finally [22:34:36] !log installing package upgrades on yvon despite (defunct apt-get procs) [22:34:42] Logged the message, Master [22:36:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52713 [22:37:50] New patchset: Demon; "We don't actually run any slaves, turn formey into a replicationdest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52715 [22:38:20] !log rebooting yvon [22:38:30] Logged the message, Master [22:41:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52712 [22:42:10] LeslieCarr: I'm merging your stuff [22:42:26] oh thank you [22:42:57] notpeter or LeslieCarr I need a small guide to the mess in puppet, who can help me? [22:43:38] not right now, i am smacking down security holes [22:43:45] and writing angered emails about it [22:44:42] maybe RobH can? [22:44:44] * pgehres hopes not to get an email [22:46:13] hehe [22:47:13] haha [22:47:42] RobH: ? [22:47:47] Ryan_Lane: host willams? [22:47:53] ? [22:48:07] pgehres: don't add unauthorized apt sources, especially ones that are unresponsive and prevent security upgrades from being installed [22:48:43] LeslieCarr: I do not have enough rights on any hosts to do that [22:48:52] good [22:48:54] ! [22:49:51] RobH: do you remember what williams was for ? [22:50:35] looks like some secondary bastion/misc fenari like host [22:50:40] but no, i dont recall specifically. [22:50:56] if its out of date, and we cannot confirm folks are using it properly, we should take it down. [22:51:12] oh wait, my screne session is showing me garbled data. 
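The package upgrades being !logged through here are driven from the salt master with the stock pkg module; a rough sketch of the invocations, with the minion names as examples only.

    # a single box
    salt 'zirconium.wikimedia.org' pkg.upgrade
    # glob targeting covers a whole class of minions
    salt 'mc*.pmtpa.wmnet' pkg.upgrade
    # -E switches to regular-expression matching when a glob is not enough
    salt -E '^(spence|yvon)\.wikimedia\.org$' pkg.upgrade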
[22:51:24] i had fenari motd in split terminal window ;_; [22:52:06] I somehow recall williams as being associated with fundraising. [22:52:19] ugh, ... otrs ... at least sometime in the past [22:52:35] oh, perhaps it was the old otrs server. [22:53:19] oh, its the current. [22:53:25] wah :p [22:53:27] mutante: it appears to be the actual otrs server [22:53:29] logsout [22:53:45] funny, yet sad. [22:53:50] * RobH also logs out [22:57:26] yeah don't kill our OTRS server mutante :( [22:59:31] didnt touch it [23:01:58] Ryan_Lane: does salt know regex well enough ? can i do mc?.pmtpa.wmnet [23:02:25] salt -E [23:02:39] -E is for regex [23:03:03] LeslieCarr: mc?.pmtpa.wmnet doesn't do what you want if it's interpreted as a regex ;) [23:03:08] yeah [23:03:09] heh [23:03:13] it does work with globbing, though [23:03:16] which is the default [23:04:02] !log dist-upgrading zhen [23:04:07] Logged the message, Master [23:04:12] !log dist-upgrading pmtpa memcache [23:04:18] Logged the message, Mistress of the network gear. [23:11:40] <^demon> !log gerrit: disabling hooks-its plugin for the time being. It's working (yay) but it's overly spammy until we fix it next week. [23:11:46] Logged the message, Master [23:14:47] !log installing kernel on streber [23:14:52] Logged the message, Master [23:25:18] !log upgrading solr boxes [23:25:24] Logged the message, Mistress of the network gear. [23:30:26] New patchset: Matanya; "The new isn't needed any more." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52720 [23:31:09] LeslieCarr: please review ^ [23:31:24] about to run scap! [23:33:58] New patchset: Lcarr; "ensuring /etc/icinga exists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52721 [23:34:24] thanks matanya :) [23:34:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52720 [23:35:31] np LeslieCarr. I'd like to help more here. anything you need related to nagios/icinga? [23:35:34] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52721 [23:35:53] thanks - so i am trying to slowly clean up all the old nagios bits and making sure they're all named icinga (or at least mostly) [23:36:08] and of course, i'm finding that there's a lot more edge cases and the like than i expected [23:36:28] hopefully inthe next few days everything will be completely out of nagios.pp [23:36:30] shoot me [23:36:34] haha [23:36:35] :) [23:39:36] New review: Apmon; "It makes sense to use the standard postgresql module (on puppetmaster) even if it seems a little com..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36222 [23:47:32] !log bravely rebooting streber - expect short RT and Observium outage [23:47:38] Logged the message, Master [23:47:44] !log doing tmh upgrades [23:47:49] Logged the message, Mistress of the network gear. [23:50:16] New patchset: Matanya; "Fixed the path to icinga cmd file." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52726 [23:52:39] New review: Lcarr; "it's actually manually sent to /var/lib/nagios for compatibility with the old system - we need to fi..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/52726 [23:52:52] one more for LeslieCarr ^ :) [23:53:26] yeah, saw that, we have it sent to the old location sadly … need to fix that [23:53:49] where is that configured? I can fix that, I guess [23:53:58] !log kaldari Started syncing Wikimedia installation... 
: [23:54:04] Logged the message, Master [23:54:42] i think in files/icinga/icinga.conf [23:54:54] icinga.cfg that is [23:55:21] and i want to say nrpe does some actions to there as well [23:55:23] maybe [23:55:27] definitely maybe [23:55:28] :) [23:55:55] I'll try to figure out in all the mess :) [23:58:59] New patchset: Demon; "Fix various noc files to be pep8 compliant" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52729
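On the command-file question the log ends on: the path Icinga actually honours is whatever command_file is set to in icinga.cfg, and anything that submits external commands (the web CGIs, the nrpe-adjacent tooling mentioned above) writes into that named pipe. A sketch for checking and exercising it, assuming a Debian-style /etc/icinga layout; the detail about commands still being sent to the old /var/lib/nagios location is as described by Leslie, not verified here.

    # where is the external command pipe really configured?
    grep '^command_file' /etc/icinga/icinga.cfg
    # submit a harmless external command by hand into whatever that path is
    CMDFILE=$(awk -F= '/^command_file/{print $2}' /etc/icinga/icinga.cfg)
    printf '[%s] ENABLE_NOTIFICATIONS\n' "$(date +%s)" > "$CMDFILE"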