[00:02:19] Welcome to another thrilling round of Backport That Patch [00:03:15] <^d> No whammys, no whammys! [00:06:00] ori-l: yep, thx doing so. i suppose the ldap_ parameters don't change since it uses labs [00:07:21] I'm back [00:07:26] anything anyone wants/needs? [00:07:59] a macchiato [00:08:40] !log mholmquist synchronized php-1.23wmf6/extensions/MultimediaViewer/resources/ext.multimediaViewer/ext.multimediaViewer.js 'Make VE and MMV work together again' [00:08:55] Logged the message, Master [00:09:07] marktraceur: Done? [00:09:14] let him test first ;) [00:09:17] !log mholmquist synchronized php-1.23wmf6/extensions/MultimediaViewer/resources/ext.multimediaViewer/ext.multimediaViewer.lightboxinterface.js 'Make VE and MMV work together again' [00:09:33] Logged the message, Master [00:09:48] Three files needed updating [00:09:53] Probably should have sync-dir'd [00:09:56] But whatever [00:09:58] !log mholmquist synchronized php-1.23wmf6/extensions/MultimediaViewer/tests/qunit/lightboxinterface.test.js 'Make VE and MMV work together again' [00:10:12] No worries, just ping me when you're done [00:10:15] Logged the message, Master [00:10:46] Hm, bug's still there. [00:10:54] Probably RL cache. RoanKattouw, you can go. [00:11:02] do you need to touch some other file? [00:11:09] I doubt it [00:11:34] Yeah that should be fine, it'll take 5 minuutes [00:11:36] I'll go [00:12:00] Oh, it's looking good with debug=true [00:12:14] (03PS5) 10Dzahn: role and module structure for ishmael [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 [00:13:03] marktraceur: good work [00:13:25] * marktraceur notes on bug [00:14:13] (03CR) 10Dzahn: "ori-l: done. that's just site_name,ssl_cert and ssl_ca. the other stuff doesn't need to change between labs and prod" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [00:15:15] so, when do I get to start asking Gage for favors? [00:15:35] only via the RT triage person! [00:15:36] not on his first day? 
[00:15:38] :) [00:15:49] im sure mutante is going to warn him to never take any task via PM or direct msg [00:15:56] :) :) [00:15:58] he should refer everyone to triage person [00:16:02] I take tasks via PM & direct msgs [00:16:15] yes, but its hard for a new person to tell folks no [00:16:20] also, what's gage's primary role going to be? [00:16:24] i rather they not be placed in that situation [00:16:37] don't worry, I won't corner him or anything [00:16:42] the rt triage person and ops mgmt are in better position to say no to things [00:16:48] greg-g: i know you wont, but others totally will ;] [00:17:04] the goal is to say "yes" and do more, not say no all the time :) [00:17:15] "we just got another person, say no more!" [00:17:16] ;) [00:17:23] You should an an RT field for location of bribes [00:17:24] i'd say first he gets gerrit [00:17:24] im not saying we have to say no more [00:17:28] then he makes his own account [00:17:30] but if thats how its being taken i dont care to discuss [00:17:36] I'm kidding! [00:17:37] throw the new dude to the wolves. [00:17:44] and merges it after review [00:17:57] * Nemo_bis prepares a list of maintenance scripts to be run [00:18:03] but, seriously, what's his specialty/area he'll be focusing on, or more of a generalist? [00:18:16] just curious since that wasn't in the announce email [00:18:23] who knows [00:18:34] i didnt know we were hiring someone until the announcement went out [00:18:36] greg-g: RT and mailman [00:18:57] cool [00:18:59] (i'm just kind of joking:) [00:19:06] mutante: so taking stuff off you're plate ;) [00:19:14] you're? 
your [00:19:28] greg-g: generalist, I hope [00:19:36] yea, let him pick from the pool [00:20:23] cool cool [00:21:35] would be nice if getting an IRC cloak is kind of quick [00:21:53] mutante: I still haven't received mine :/ [00:21:56] I need to reping [00:22:07] !gc [00:22:30] i bet James_F knows who to annoy for cloaks [00:22:35] !log catrope synchronized php-1.23wmf6/extensions/VisualEditor/ 'Cherry-picks' [00:22:42] if the online form isnt getting responses that is [00:22:45] OK I'm done [00:22:49] ori-l: You're next up in the LD window [00:22:50] Logged the message, Master [00:22:51] #freenode can get you the unaffiliated cloak, usually takes < 5min [00:22:59] !gc is https://meta.wikimedia.org/wiki/IRC/Group_Contacts [00:22:59] Key was added [00:23:00] longer if you want a fancy cloak :P [00:23:02] yea, but its cool to get the wikimedia one =] [00:23:17] RobH: I actually don't, sadly. [00:23:27] https://spreadsheets.google.com/viewform?hl=en&formkey=dG1FTWV1RnNBVHFOSnExMHF6aUhya2c6MA [00:23:41] so im used to having folks fill out form and they are usually fast, but perhaps those folks are all on wikibreak [00:23:47] https://meta.wikimedia.org/wiki/IRC_channel_cloaks [00:23:58] see link above, there are 3 of them [00:24:13] yep [00:24:23] but i wouldnt contact htem directly until AFTER you submit the form [00:24:26] and dont get a reply [00:24:29] its just more polite. [00:24:45] RoanKattouw: I scheduled the slot optimistically, thinking I'd get https://gerrit.wikimedia.org/r/#/c/100542/ reviewed [00:24:49] the actual answer is, it should be part of on-boarding workflow [00:24:56] I haven't, so I'll return the slot to the pool [00:24:58] that meta.wikimedia.org one wont work, (at least, its not for wmf. Its for wiki editors with >250 edits. 
I know because i asked and they said no :P ) [00:25:04] ori-l: OK [00:25:09] No one else is waiting I think [00:25:14] ebernhardson: if you are paid by wmf [00:25:15] its enough [00:25:20] it helps identify you for that as well [00:25:22] RobH: i am, and i asked, and they said no [00:25:25] what?!?! [00:25:33] then i am not eligible for the cloak [00:25:37] RobH: they said there is a different thing for wmf [00:25:37] i have like 5 edits [00:25:43] no one tell them [00:25:46] i wanna keep it. [00:25:59] ebernhardson: they are nuts [00:26:09] there is nothing for wikimediafoundation on those wikipages at least [00:26:23] staff have traditionally just been granted wikimedia/ cloaks [00:26:36] make them count wikitech edits [00:26:48] oh if they count wikitech edits im a fucking editor for sure. [00:26:50] and office [00:26:52] heh [00:26:53] /msg MemoServ send wmfgc IRC cloak request [00:26:55] gah [00:26:57] i edit the hell out of those! [00:27:34] but when i did cloak stuff [00:27:38] casey was irc contact for it [00:27:39] greg-g: The FSF member cloak is nice and has the benefit of being trusted to mess with wm-bot [00:28:04] huh [00:28:05] but yea, if you have no cloak, id get unaffiliated at minimum [00:28:07] I dunno about globally, but at least in #mediawiki [00:28:13] its way more secure for channel auth than nick [00:28:28] I have one, just, it's kind of not legit any more (I revoked my own Ubuntu membership a while ago) [00:28:32] (unless you tweak your nick with enforcement) [00:28:36] heh [00:29:29] ori-l: so, no go today? [00:29:30] RobH: Nobody should use nicks for channel auth. Accounts are probably better, because it allows for cloak change without loss of status. [00:29:31] i just said that because i want to get him on all channels on day 1 but not add a non-cloak to access list [00:29:51] mutante: Use accounts instead of hostmasks. 
[00:29:57] true [00:30:58] greg-g: can still do it, gwicke merged [00:31:04] ori-l: sweet, do it [00:31:09] I'mma gonna head towards home [00:31:47] kk [00:36:33] what's up with /w/index.php?action=raw&ctype=text/javascript&title=MediaWiki:Common.js/IEFixes.js ? [00:36:37] ori-l, V8 is a method-based jit- which explains why the self-executing function wrapper makes a difference for optimizations [00:37:14] optimizations are only disabled for objects defined at the top level of the evaled source [00:38:40] marktraceur: yes, ack [00:39:07] gwicke: hrm, but there are fewer scopes in which to look up names [00:39:37] but i guess that wouldn't make a difference, since the names will be resolved in the same scope anyway [00:39:42] as they were before, I mean [00:40:03] but ReferenceErrors: foo is undefined will be faster! :D [00:40:35] random eval'ed code is just not a unit of optimization in the JIT [00:40:37] I can't run the test today after all, because I don't have a reference start time for the measurement [00:40:43] there's tons of /w/index.php?title=MediaWiki:RefToolbar.js&action=raw&ctype=text/javascript & /w/index.php?action=raw&ctype=text/javascript&title=MediaWiki:Common.js/IEFixes.js hits [00:40:45] it can just be a sequence of statements [00:40:47] I don't think these were there yesterday [00:41:07] gwicke: yeah, though the point about scope is not tied to v8 [00:41:15] but it's moot anyway [00:41:35] paravoid: looking [00:41:36] a jit can optimize away aliasing [00:41:58] it does not need to walk the frame chain every time as an interpreter would [00:42:42] paravoid: which wikis? 
[00:42:53] 5% of all backend traffic, sigh [00:43:36] RefToolbar.js seems to be eswiki [00:44:17] IEFixes mostly enwiki [00:44:24] neither were updated recently, hrm [00:44:33] maybe I'm wrong about them being recent [00:45:18] let me dig some more [00:48:01] hm, weird, now it doesn't appear at all again [00:48:08] as an outlier [00:49:34] can we switch ganglia-web's default time resolution view to 'day'? [00:49:47] I use hour more than day :) [00:50:08] huh, that's surprises me [00:50:11] * gwicke concurs with paravoid  [00:50:52] depends what I'm investigating [00:51:24] right after a change the hourly view shows issues best [00:52:05] Aaron|home: have you seen http://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Redis+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report ? [00:52:19] oh my [00:53:13] https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Miscellaneous+pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&jr=&js= is also fun [00:54:06] 7 million? [00:54:08] wtf [00:54:19] well, 6, but still [00:54:47] probably an edit to one of the most-used templates [00:54:47] the following wikis have more than 199,999 jobs: , commonswiki (2125013), enwiktionary (3481304), Total (5945186) [00:55:29] or two ;) [00:57:54] those pages are small, so it will drain relatively quickly [01:03:11] everytime I dig in our logs, I just find weird stuff [01:03:19] tons of requests for /w/api.php?action=parse&redirects&prop=text%7Cdisplaytitle&format=xml&page=Orange [01:03:27] 253 TxHeader b User-Agent: Akamai SureRoute [01:03:29] 253 TxHeader b X-Akamai-TestObject: true [01:07:09] gwicke: so, what's the next steps re: nodejs memory leak? 
[01:07:18] what are even [01:07:49] paravoid: we used to have a cron job that restarted hanging test runners in rt testing once per hour [01:08:05] that likely covered up the leak in rt testing [01:08:28] this is now disabled, so over night we'll see whether we see the same leak in rt testing [01:08:44] if we do, then we can start to track it down there [01:08:49] how is the rt testing performed? [01:08:55] is it client http requests? [01:09:08] not yet (patch for that pending) [01:09:16] or are you reusing the parsing parts of parsoid? [01:09:21] right now it is test clients getting jobs and calling the module directly [01:09:25] right [01:09:32] so the leak might be somewhere else then [01:09:37] in the http codepath [01:09:55] if we don't see it, then it is likely connected to the http part [01:10:08] nod [01:10:24] I have been running rashomon under high load for days though using the 0.10 ppa [01:10:28] no leaks [01:11:16] so I suspect that it might be something specific to the backport [01:13:08] ori-l: match that with mc1001 [01:13:23] that's just the aggregator server being switched (see the SAL) [01:13:29] so nothing crazy there [01:17:55] parsoid seems to request the same URLs over and over and over [01:18:12] e.g. 
/w/api.php?action=query&format=json&prop=imageinfo&titles=File%3AGamepad.svg&iiprop=size%7Curl&iiurlwidth=80 [01:18:29] I was going to say; if it's the article on suicide or bomb making we should get worried [01:19:08] anyway, I'm going to stop log digging now [01:19:12] it has questionable results anyway :) [01:19:15] paravoid: IIRC within a single request we are caching those, but not across requests [01:19:54] so if a template using such an image was edited and all those page are re-rendered, it will still produce a ton of requests for the same resource [01:26:16] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [01:47:19] (03Abandoned) 10Jforrester: Add "betar" label to VisualEditor links on eswiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98613 (owner: 10Jforrester) [02:16:17] !log LocalisationUpdate completed (1.23wmf6) at Wed Dec 11 02:16:17 UTC 2013 [02:16:35] Logged the message, Master [02:25:16] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [02:30:45] !log LocalisationUpdate completed (1.23wmf5) at Wed Dec 11 02:30:44 UTC 2013 [02:31:01] Logged the message, Master [02:58:40] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% [03:01:10] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [03:14:00] (03PS1) 10Ori.livneh: Duplicate client-side latency measurements from eswiki to special bin [operations/puppet] - 10https://gerrit.wikimedia.org/r/100736 [03:17:38] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 11 03:17:38 UTC 2013 [03:17:53] Logged the message, Master [03:46:39] (03PS1) 10QChris: Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 [03:47:02] (03CR) 10jenkins-bot: [V: 04-1] Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 (owner: 10QChris) [03:48:07] (03PS2) 
10QChris: Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 [03:49:51] (03CR) 10QChris: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 (owner: 10QChris) [03:50:28] (03PS1) 10Ottomata: analytics1010 is in Row B, need a Gangalia aggregator there [operations/puppet] - 10https://gerrit.wikimedia.org/r/100749 [03:50:41] (03CR) 10QChris: [C: 04-1] "Depends on Ie3b1a4d210a37ab0929b808f27951be52ff8aa26 and $passwords::geowiki" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 (owner: 10QChris) [03:51:45] (03PS2) 10Ottomata: analytics1010 is in Row B, need a Gangalia aggregator there [operations/puppet] - 10https://gerrit.wikimedia.org/r/100749 [03:52:33] (03CR) 10Ottomata: [C: 032 V: 032] analytics1010 is in Row B, need a Gangalia aggregator there [operations/puppet] - 10https://gerrit.wikimedia.org/r/100749 (owner: 10Ottomata) [04:28:44] (03PS3) 10QChris: Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 [04:30:20] (03PS4) 10Ottomata: Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 (owner: 10QChris) [04:30:30] (03CR) 10Ottomata: [C: 032 V: 032] Reconfigure and turn on geowiki monitoring again [operations/puppet] - 10https://gerrit.wikimedia.org/r/100747 (owner: 10QChris) [04:58:17] (03PS1) 10Dzahn: include the bugzilla config in puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 [05:27:26] (03CR) 10Ori.livneh: "> the other stuff doesn't need to change between labs and prod" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [05:59:01] (03CR) 10Ori.livneh: Include redis on logstash servers (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100511 (owner: 10Aaron Schulz) [05:59:22] (03PS3) 10Ori.livneh: Include redis on logstash servers [operations/puppet] - 
10https://gerrit.wikimedia.org/r/100511 (owner: 10Aaron Schulz) [05:59:32] (03PS4) 10Ori.livneh: Include redis on logstash servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/100511 (owner: 10Aaron Schulz) [06:00:39] (03CR) 10Ori.livneh: [C: 032] Include redis on logstash servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/100511 (owner: 10Aaron Schulz) [06:01:22] (03PS2) 10Ori.livneh: Duplicate client-side latency measurements from eswiki to special bin [operations/puppet] - 10https://gerrit.wikimedia.org/r/100736 [06:02:35] (03CR) 10Ori.livneh: [C: 032] Duplicate client-side latency measurements from eswiki to special bin [operations/puppet] - 10https://gerrit.wikimedia.org/r/100736 (owner: 10Ori.livneh) [06:05:44] !log silencing global collect paging (GC is having issues) for 1 hour [06:05:58] Logged the message, Mistress of the network gear. [06:06:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:08:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:09:12] oh, yuck [06:09:13] err: /Stage[main]/Redis/File[/a/redis]/ensure: change from absent to directory failed: Cannot create /a/redis; parent directory /a does not exist [06:10:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:11:34] (03PS1) 10Ori.livneh: logstash: specify /var/run/redis as redis $dir [operations/puppet] - 10https://gerrit.wikimedia.org/r/100755 [06:12:39] (03CR) 10Ori.livneh: [C: 032] logstash: specify /var/run/redis as redis $dir [operations/puppet] - 10https://gerrit.wikimedia.org/r/100755 (owner: 10Ori.livneh) [06:12:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:14:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC 
[06:16:03] (03PS1) 10Ori.livneh: logstash: re-order includes to avoid duplicate def'n [operations/puppet] - 10https://gerrit.wikimedia.org/r/100756 [06:16:21] (03CR) 10Ori.livneh: [C: 032 V: 032] logstash: re-order includes to avoid duplicate def'n [operations/puppet] - 10https://gerrit.wikimedia.org/r/100756 (owner: 10Ori.livneh) [06:16:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:18:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:20:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:22:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:24:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:26:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:28:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:00:22 AM UTC [06:29:16] RECOVERY - Puppet freshness on mw1035 is OK: puppet ran at Wed Dec 11 06:29:12 UTC 2013 [06:30:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:29:12 AM UTC [06:32:46] PROBLEM - Puppet freshness on mw1035 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 06:29:12 AM UTC [06:59:56] RECOVERY - Puppet freshness on mw1035 is OK: puppet ran at Wed Dec 11 06:59:45 UTC 2013 [08:01:35] (03PS1) 10ArielGlenn: on puppetmasters, remove default ubuntu logrotate for apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/100758 [08:03:03] (03CR) 10ArielGlenn: [C: 032] on puppetmasters, remove default ubuntu logrotate for apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/100758 (owner: 10ArielGlenn) [08:18:04] 
(PS1) ArielGlenn: add mw1-16 (pmtpa inactive job runners) to dsh apaches [operations/puppet] - https://gerrit.wikimedia.org/r/100759
[08:38:17] (PS1) Matanya: svn: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/100760
[08:38:50] (CR) jenkins-bot: [V: -1] svn: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/100760 (owner: Matanya)
[08:39:51] (PS2) Matanya: svn: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/100760
[08:45:48] !log manually downgraded nodejs on contint (gallium, lanthanum) from 0.10.22 to 0.8.2 (puppet failed to do so)
[08:46:04] Logged the message, Master
[09:09:00] * apergos eyes robla suspiciously
[09:09:07] isn't it a bit late for you?
[09:13:16] one could look at it as being early
[09:14:07] there's early and there's ridiculous :-P
[09:15:20] and how do you know it's not future time travelling Rob?
[09:16:58] then I would be right to eye him suspiciously
[09:18:20] (PS1) ArielGlenn: access to zirconium for bd808, rt #6448 [operations/puppet] - https://gerrit.wikimedia.org/r/100762
[09:24:02] (CR) ArielGlenn: [C: 2] access to zirconium for bd808, rt #6448 [operations/puppet] - https://gerrit.wikimedia.org/r/100762 (owner: ArielGlenn)
[09:30:40] apergos: please remind me how do i test a change on labs? any doc on wikitech?
[09:31:46] that is a good question (about docs)
[09:32:01] apergos: all i know is: https://wikitech.wikimedia.org/wiki/Puppet_usage#labs_testing
[09:32:24] that's pretty complete actually
[09:32:26] but nothing about how i put my patch in /var/lib/git/operations/puppet
[09:33:07] that's step 4 isn't it?
[09:33:30] apergos: i want to pull it from gerrit. https://gerrit.wikimedia.org/r/#/c/100760/
[09:33:40] git review -D didn't work
[09:33:45] ugh
[09:33:50] I don't use git review
[09:34:01] just pull?
[09:34:40] or cherry pick I guess [09:34:57] depending how current your self hosted puppet branch is [09:38:31] (03PS1) 10ArielGlenn: remove bd808 from restricted, already in mortals [operations/puppet] - 10https://gerrit.wikimedia.org/r/100763 [09:39:01] Hi ops [09:39:30] morning [09:39:38] :) [09:39:41] (03CR) 10ArielGlenn: [C: 032] remove bd808 from restricted, already in mortals [operations/puppet] - 10https://gerrit.wikimedia.org/r/100763 (owner: 10ArielGlenn) [09:39:41] Is this the right place to ask about a Gerrit/jenkins test failure I don't understand? [09:39:46] https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-qunit-mobile/803/console [09:40:41] hashar would be the expert on contint [09:40:50] AndyRussG: i guess hashar would be able to help here [09:41:05] hello [09:41:08] morning [09:41:11] Morning hashar [09:41:23] How's it going? [09:41:49] AndyRussG: so seems some page is timing out in the QUnit javascript tests [09:41:54] AndyRussG: if you go on the job page at https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-qunit-mobile/803/ [09:42:25] AndyRussG: you will find the full debug logs for CLI script and web queries (respectively: mw-debug-cli.log.gz and mw-debug-www.log.gz ) [09:42:32] that corresponds to $wgDebugLogFile [09:43:08] AndyRussG: there is a stack trace :( [09:43:36] [exception] [fa6da801] /jenkins-mwext-MobileFrontend-qunit-mobile-803/index.php?title=Special:JavaScriptTest/qunit&useformat=mobile Exception from line 112 of /srv/ssd/jenkins-slave/workspace/mwext-MobileFrontend-qunit-mobile/includes/WikiPage.php: Invalid or virtual namespace -1 given. 
[09:43:46] AndyRussG: have a look at http://paste.debian.net/70323/
[09:43:50] yw, sorry for long spam :-]
[09:44:53] Wow, thanks
[09:45:06] As far as I can see that has nothing to do with my patch
[09:45:19] Which was just a one-liner
[09:45:25] hashar: while you're here, I downgraded nodejs on gallium/lanthanum, it had been upgraded for parsoid but they rolled that back
[09:45:35] just in case there are any issues, hopefully not
[09:45:41] Ah no, I'm wrong
[09:45:46] apergos: ah thank you for taking care of the downgrade!
[09:45:52] It *does* go through my patch
[09:46:12] apergos: I noticed it happened in production (ensure latest -> present) but hadn't thought about downgrading on jenkins slaves :(
[09:46:30] yes, well you have ensure latest which makes sense
[09:46:48] AndyRussG: something worth mentioning is that we use MediaWiki master branch AND your patch against MobileFrontend is merged with the tip of the MobileFrontend master branch
[09:46:49] OK, I think I got it
[09:46:51] and ordinarily latest will be later :-D
[09:47:18] OK
[09:47:22] AndyRussG: so sometimes it might be an issue in a change introduced in the extension master branch, though that is really unlikely
[09:47:46] Yeah, no, I think I know what it might be, the stack trace is a huge help
[09:47:53] hashar: thanks so much!!!
[09:48:29] who do we think is the nominal manager for fundraising tech? is that jeff?
[09:59:47] katie?
[10:06:34] (PS1) Mark Bergsma: Revert "Take mark off the SMS list" [operations/puppet] - https://gerrit.wikimedia.org/r/100765
[10:06:42] (CR) jenkins-bot: [V: -1] Revert "Take mark off the SMS list" [operations/puppet] - https://gerrit.wikimedia.org/r/100765 (owner: Mark Bergsma)
[10:07:29] (Abandoned) Mark Bergsma: Revert "Take mark off the SMS list" [operations/puppet] - https://gerrit.wikimedia.org/r/100765 (owner: Mark Bergsma)
[10:07:51] (PS1) Mark Bergsma: Put mark back in the SMS list [operations/puppet] - https://gerrit.wikimedia.org/r/100766
[10:08:54] (CR) Mark Bergsma: [C: 2] Put mark back in the SMS list [operations/puppet] - https://gerrit.wikimedia.org/r/100766 (owner: Mark Bergsma)
[10:11:49] mark: welcome back :-]
[10:12:05] thanks :)
[10:12:49] apergos: yup Katie Horn is the lead software dev, happily dispatching fundraising tech issues
[10:13:04] great, thanks
[10:13:18] apergos: I don't think she is actually managing Matthew and Adam, merely doing tech leading
[10:13:22] oh
[10:13:25] hm
[10:13:38] she is most probably part of the scrum of scrums as well
[10:14:01] anyway, you can't go wrong pinging her.
At worst she will route your request to the appropriate person
[10:14:15] good point
[10:26:26] dns katie
[10:29:16] (PS1) Hashar: beta: adapt $wgParsoidCacheServers [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100769
[10:29:31] (CR) Hashar: [C: 2] beta: adapt $wgParsoidCacheServers [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100769 (owner: Hashar)
[10:29:42] (Merged) jenkins-bot: beta: adapt $wgParsoidCacheServers [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100769 (owner: Hashar)
[10:32:06] ah elastic search is paying off
[10:32:18] they added thumbnails of pictures in Special:Search http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Special%3ASearch&profile=default&search=jpg&fulltext=Search
[10:34:15] mark: welcome back!!!
[10:34:26] hey
[10:34:36] did you have a nice time ?
[10:34:41] very :)
[10:35:27] * akosiaris envious
[10:35:32] thanks for keeping wikivoyage & wikipedia up during my travels ;)
[10:35:50] they were rather useful from time to time hehe
[10:36:20] those thanks should mostly go to paravoid. At some point he had a streak of responding to 15 outages in a row.
[10:38:31] so I read
[10:42:03] i guess he's sleeping it off now ;-p
[10:43:51] (PS1) Hashar: beta: get rid of lucene search configuration [operations/puppet] - https://gerrit.wikimedia.org/r/100771
[10:51:26] (CR) Akosiaris: [C: 2] Template ferm's defs.production [operations/puppet] - https://gerrit.wikimedia.org/r/99705 (owner: Akosiaris)
[11:04:12] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100%
[11:07:17] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
[11:08:50] (PS1) Akosiaris: Replace +,- chars with _ in ferm defs [operations/puppet] - https://gerrit.wikimedia.org/r/100776
[11:10:02] akosiaris: i'm testing my svn module and get: err: Failed to apply catalog: Could not find dependent Service[apache2] for File[/etc/apache2/sites-available/svn] at /etc/puppet/modules/svn/manifests/server.pp:36
[11:10:11] how can i debug this?
[11:11:00] the changeset is at https://gerrit.wikimedia.org/r/#/c/100760/
[11:11:03] matanya: the error pretty much says all you need to know. The File resource depends on a Service resource that is not defined
[11:11:45] you need to look carefully and decide whether you really need the dependency (the require parameter probably)
[11:12:59] how are you testing ?
[11:13:13] thanks akosiaris, my question was regarding the need, not the error itself.
[11:13:24] i'm running it in a labs instance
[11:13:50] oh sorry, I did not understand that from the question
[11:14:56] in that case, this requires a more thorough examination of the manifest which I can do right now. I'll look into it in about 30 mins, ok ?
[11:15:03] yeah, sorry, bad word choosing.
[11:15:10] I can not, i meant
[11:15:23] sure 30 mins. thanks
[11:15:38] (CR) Akosiaris: [C: 2] Replace +,- chars with _ in ferm defs [operations/puppet] - https://gerrit.wikimedia.org/r/100776 (owner: Akosiaris)
[11:20:29] hmmm there seems to be an ordering issue here.
Minor but /etc/ferm/conf.d/00_defs changes in every run (lines get rearranged). I wonder if it's puppet's fault or ERB's...
[11:26:52] you need to sort any hashes, right
[11:28:21] meh.. :-(
[11:28:57] ruby hashes are not stable, I should sort by key i suppose
[11:35:30] yep
[11:39:06] heeeeeelloo
[11:39:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[11:39:51] hi
[11:40:39] grumble, that check firing just as soon as I say hello
[11:41:06] I was about to tell you that, then I said.. let him be :P
[11:41:13] too early in the morning for him
[11:41:16] ok, false positive
[11:41:24] POST http://he.wikipedia.org/
[11:41:28] empty POSTs, someone
[11:43:45] (PS1) Akosiaris: Sort ruby hashes before interpolation in ferm defs [operations/puppet] - https://gerrit.wikimedia.org/r/100778
[11:46:28] (CR) Akosiaris: [C: 2] Sort ruby hashes before interpolation in ferm defs [operations/puppet] - https://gerrit.wikimedia.org/r/100778 (owner: Akosiaris)
[12:24:33] matanya: those svn manifests are so weird to begin with, that it is not easy to answer your question. You can a) remove the notify and be done with it (it is a legacy service after all), b) include webserver::apache to satisfy the dependency (hosts in site.pp either do it directly or through some other class), c) refactor the entire thing so much that the dependency no longer makes sense. This class is a hybrid between a role class a
[12:25:53] thanks akosiaris. i think i'll do (a) in this patch and then (c) in a separate patch. what do you think?
[12:27:02] matanya: sounds like it has a good chance of working.
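Note on the ordering issue above: a generated file whose lines get rearranged on every run is usually fixed by walking the mapping in sorted key order before rendering, which is what the "Sort ruby hashes before interpolation" change describes. A minimal Python sketch of that idea follows. It is not the actual ferm ERB template; the function name render_defs, the definition names, and the addresses are hypothetical, and the "@def" output format is only assumed to resemble the real file.

```python
def render_defs(defs):
    """Render one '@def' line per entry, always in sorted key order.

    `defs` maps a definition name to a list of values. Sorting both the
    keys and the values makes the output independent of the order in
    which the mapping happened to be built.
    """
    lines = []
    for name in sorted(defs):
        values = " ".join(sorted(defs[name]))
        lines.append(f"@def ${name} = ({values});")
    return "\n".join(lines) + "\n"


# Two mappings with the same content but a different construction order
# (hypothetical names, documentation-range addresses).
first = {"B_NET": ["192.0.2.0/24", "10.0.0.0/8"], "A_NET": ["198.51.100.0/24"]}
second = {"A_NET": ["198.51.100.0/24"], "B_NET": ["10.0.0.0/8", "192.0.2.0/24"]}

# Both render to the same text, so a config-management run would see no diff.
print(render_defs(first) == render_defs(second))
```

Sorting at render time keeps the fix local to the template, so callers do not have to care how the mapping was assembled.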
[12:27:48] (03CR) 10Akosiaris: [C: 04-1] include the bugzilla config in puppet (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 (owner: 10Dzahn) [12:29:27] (03PS3) 10Matanya: svn: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 [12:36:16] (03PS1) 10Hashar: beta: ensure /usr/local/apache* belong to mwdeploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/100779 [12:36:51] (03PS1) 10ArielGlenn: add analytics1010 (row b aggregator) to gmetad [operations/puppet] - 10https://gerrit.wikimedia.org/r/100780 [12:37:24] apergos: if you are around, got a nasty patch for beta https://gerrit.wikimedia.org/r/100779 [12:37:34] I am, give me about 3 mins [12:37:40] :-D [12:38:23] (03CR) 10ArielGlenn: [C: 032] add analytics1010 (row b aggregator) to gmetad [operations/puppet] - 10https://gerrit.wikimedia.org/r/100780 (owner: 10ArielGlenn) [12:38:39] mark: whey are we keeping the svn server anyway? [12:38:53] people link to it [12:39:15] the last commit is 10 month old [12:39:24] sure [12:39:30] but keeping it around read-only is good [12:39:50] (03CR) 10Hashar: "I have manually fixed the permission issue and retriggered the job on beta cluster. See also https://bugzilla.wikimedia.org/show_bug.cgi?i" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100616 (owner: 10Reedy) [12:39:52] oh the mwdeploy issue [12:39:54] oh joy [12:39:55] (03CR) 10Hashar: "I have manually fixed the permission issue and retriggered the job on beta cluster. See also https://bugzilla.wikimedia.org/show_bug.cgi?i" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [12:40:01] (03CR) 10Hashar: "I have manually fixed the permission issue and retriggered the job on beta cluster. 
See also https://bugzilla.wikimedia.org/show_bug.cgi?i" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100551 (owner: 10Spage) [12:40:06] so all we need actually is the webserver, not the entire configuration [12:40:07] (03CR) 10Hashar: "I have manually fixed the permission issue and retriggered the job on beta cluster. See also https://bugzilla.wikimedia.org/show_bug.cgi?i" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100710 (owner: 10Chad) [12:40:20] (03CR) 10Hashar: "I have manually fixed the permission issue and retriggered the job on beta cluster. See also https://bugzilla.wikimedia.org/show_bug.cgi?i" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100769 (owner: 10Hashar) [12:40:31] apergos: yup that is ugly :( [12:40:40] apergos: I am not sure why we haven't been hit by that issue before :/ [12:40:48] adding a lame exception fix it though [12:41:20] actually we were hit by it in production at some point, there's an old rt ticket about it I just saw the other day :-/ [12:41:42] ouch [12:41:55] at least on beta the root cause is the apache dir is on a shared dir [12:41:58] (unlike production) [12:42:09] so each instance race to apply the UID it knows for mwdeploy [12:42:19] I wish we had a central repository of user/UID. Would solve it for good [12:42:24] what's the path for getting the uid to be the same on all labs instances? or at least, on all instances in deployment? [12:42:32] taking it one step at a time... [12:42:53] we could hardcode the UID in the puppet system_user {} [12:42:57] or use LDAP [12:43:03] each option having drawbacks [12:43:13] let's hear the ldap drawbacks [12:43:20] the LDAP way having strong opposers for whatever reason [12:43:40] one of them probably being if LDAP dies we are screwed :-D [12:44:09] if ldap dies now we are screwed for access, right? 
[12:44:15] so I'm not seeing a big difference there [12:48:00] (03CR) 10ArielGlenn: [C: 032] beta: ensure /usr/local/apache* belong to mwdeploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/100779 (owner: 10Hashar) [12:48:24] let's have or re-have the ldap discussion sometime soon [12:48:37] yup :-D [12:48:42] the above hack is good enough for now :] [12:48:55] where now is very short term [12:49:32] bearing in mind that around here, short term turns into mid-term etc [12:49:32] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [12:57:21] (03PS3) 10QChris: Backup geowiki's data-private bare repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/98499 [12:57:26] paravoid: Any chance I could get you to look at the second try to backup geowiki ^ and see if that's better suited that the first one? [12:58:23] akosiaris is our backup master :) [12:59:14] paravoid: Ok. Thanks. I just thought you'd care as well as you said you did not like the old approach. I'll wait for akosiaris then :-) [13:34:15] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC [14:18:39] (03PS1) 10Manybubbles: Cirrus as secondary for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 [14:20:06] any idea whether we have a cronjob to clean /tmp on application servers? [14:22:00] yes [14:22:24] we toss tifs over an hour old because those are the biggest hogs [14:22:42] nicee [14:23:26] we clean up some stuff left around with php*** too [14:23:31] nothing else there [14:23:39] modules/applicationserver/manifests/cron.pp: command => 'find /tmp -name \*.tif -mmin +60 -delete', [14:23:41] such a pity [14:23:50] oh? [14:24:16] ideally we would have every temp files under /tmp/appserver/ [14:24:28] and dish out anything in /tmp/appserver which is more than 1 hour old [14:25:03] ideally everything in temp ought to work like that [14:25:06] but... 
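Note on the /tmp cleanup idea above (keep every temp file under one directory and drop whatever is more than an hour old): a rough sketch of what that could look like. The real job is the `find /tmp -name \*.tif -mmin +60 -delete` cron entry quoted above; the function name `purge_old_files` and the `/tmp/appserver` layout are hypothetical.

```python
import os
import time


def purge_old_files(root, max_age_minutes=60, now=None):
    """Delete non-directory entries under root older than max_age_minutes.

    Age is judged by modification time. Returns the list of removed paths.
    Directories themselves are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: judge a symlink by its own mtime, not its target's.
                if os.lstat(path).st_mtime < cutoff:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # File vanished between listing and removal; nothing to do.
                pass
    return removed
```

Called as `purge_old_files("/tmp/appserver")` from a cron job, this would cover every temp file in that tree rather than only the tifs, which is the behaviour wished for above.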
[14:25:20] anyways there is an open bug about the tifs so sometime we will not need that job [14:29:20] (03PS1) 10Hashar: lint applicationserver::cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 [14:44:27] hashar: that is my job :P [14:44:43] matanya: keep doing so by reviewing my change :-] [14:47:28] (03PS2) 10Manybubbles: Cirrus as secondary for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 [14:50:16] (03CR) 10Matanya: lint applicationserver::cron (035 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 (owner: 10Hashar) [14:50:26] here you go hashar :) [14:50:36] thx [15:06:56] (03Abandoned) 10Andrew Bogott: Added an additional snmp trap. [operations/puppet] - 10https://gerrit.wikimedia.org/r/99672 (owner: 10Andrew Bogott) [15:11:02] (03CR) 10Hashar: lint applicationserver::cron (035 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 (owner: 10Hashar) [15:11:59] (03PS2) 10Hashar: lint applicationserver::cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 [15:14:32] (03PS1) 10Matanya: mediawiki_singlenode : lint cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/100790 [15:15:29] (03CR) 10Matanya: [C: 031] lint applicationserver::cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/100784 (owner: 10Hashar) [15:28:57] apergos: thanks for checking on that for me! [15:29:10] sure, only wish I could have solved it [15:29:33] yeah, probably some weird consequence of the row reshuffling we did last month, and the analytics network acls [15:30:00] well why would the row a hosts drop off from visibility within the last day though? [15:33:10] oh [15:33:28] that only changed within the last day? 
i've had this trouble for at least a week or more now [15:33:35] well, with ganglia at least [15:33:51] it seems like analytics 1009 used to see the rest of the row a hosts [15:34:04] because ganglia says it had information from them earlier today but not any more [15:34:49] hmm, well, throughout this process I have restarted gmond and gmetad on nickel several times [15:34:53] tryign to figure out what is going on [15:34:59] gmond on analytics nodes [15:50:19] (03PS12) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [15:57:50] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:02:40] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [16:04:49] (03PS1) 10Mark Bergsma: Set storage size to default 300G for role::cache::text [operations/puppet] - 10https://gerrit.wikimedia.org/r/100794 [16:06:14] (03CR) 10Mark Bergsma: [C: 032] Set storage size to default 300G for role::cache::text [operations/puppet] - 10https://gerrit.wikimedia.org/r/100794 (owner: 10Mark Bergsma) [16:12:27] (03PS4) 10Gerrit Patch Uploader: Update internal.ico favicon Change-Id: I93c0f55ec584fe047d2f0818b78b7af0c1dc881a [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 [16:12:28] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [16:13:03] (03PS1) 10Faidon Liambotis: Varnish: remove German version of CentralAutoLogin [operations/puppet] - 10https://gerrit.wikimedia.org/r/100796 [16:13:39] (03CR) 10Expi1: "I've uploaded a new patch, I wasn't sure what you meant by 'white background' as I couldn't see one myself, though I was probably missing " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [16:13:53] (03CR) 10Faidon Liambotis: [C: 032] Varnish: remove German version of CentralAutoLogin [operations/puppet] - 10https://gerrit.wikimedia.org/r/100796 (owner: 10Faidon Liambotis) [16:19:08] (03CR) 10Addshore: Start wikidata puppet module for builder (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [16:20:48] (03PS9) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [16:24:56] !log reedy synchronized php-1.23wmf6/cache/l10n/l10n_cache-ml.cdb 'Forced rebuild' [16:25:12] Logged the message, Master [16:25:51] (03CR) 10Qgil: [C: 04-1] "I will let the review of the icon to Odder. 
About the commit message:" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [16:31:20] (03PS13) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:32:11] !log reedy synchronized php-1.23wmf5/cache/l10n/l10n_cache-ml.cdb 'Forced rebuild' [16:32:21] PROBLEM - Varnish HTTP text-backend on cp4016 is CRITICAL: Connection refused [16:32:26] Logged the message, Master [16:33:21] RECOVERY - Varnish HTTP text-backend on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.154 second response time [16:34:07] (03PS14) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:34:31] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC [16:34:39] (03CR) 10Expi1: "Alright, I'll keep that in mind, should I redo the commit and change the message and submit a new patch? or is it for future reference?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [16:35:49] (03PS15) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:36:56] (03PS16) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:38:11] (03PS17) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:38:46] PROBLEM - Varnish HTTP text-backend on cp4017 is CRITICAL: Connection refused [16:41:46] RECOVERY - Varnish HTTP text-backend on cp4017 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.150 second response time [16:44:04] (03PS18) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:44:18] !log reedy updated /a/common to {{Gerrit|I42c4c5bcd}}: Fix Commons config for Cirrus [16:44:26] PROBLEM - Varnish HTTP text-backend on cp4018 is CRITICAL: Connection refused [16:44:27] (03PS1) 10Reedy: Add REL1_22 to ExtensionDistributor branches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100799 [16:44:35] Logged the message, Master [16:44:59] (03PS2) 10Reedy: Add REL1_22 to ExtensionDistributor branches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100799 [16:45:06] (03CR) 10Reedy: [C: 032] Add REL1_22 to ExtensionDistributor branches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100799 (owner: 10Reedy) [16:45:26] RECOVERY - Varnish HTTP text-backend on cp4018 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.157 second response time [16:46:25] (03Merged) 10jenkins-bot: Add REL1_22 to ExtensionDistributor branches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100799 (owner: 10Reedy) [16:48:06] PROBLEM - 
MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:48:06] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:49:12] (03PS19) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [16:49:45] (03CR) 10Manybubbles: [C: 04-1] "one moment. this would undo plwiktionary's cirrus as a beta feature...." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 (owner: 10Manybubbles) [16:50:06] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [16:50:06] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [16:50:26] PROBLEM - Varnish HTTP text-backend on cp4008 is CRITICAL: Connection refused [16:51:20] !log reedy synchronized wmf-config/CommonSettings.php 'I55dc9cbdf3c777ce7e9f925bb0e5269c05b23dfa' [16:51:37] Logged the message, Master [16:52:26] RECOVERY - Varnish HTTP text-backend on cp4008 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.150 second response time [16:55:02] (03PS3) 10Manybubbles: Cirrus as secondary for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 [16:59:17] (03PS20) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:00:36] PROBLEM - Varnish HTTP text-backend on cp4010 is CRITICAL: Connection refused [17:01:18] (03PS21) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:08:16] (03PS22) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:08:18] ^d: ping for https://gerrit.wikimedia.org/r/#/c/100782/ [17:09:12] (03CR) 
10jenkins-bot: [V: 04-1] [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:09:59] (03PS23) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:15:31] (03PS24) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:17:45] (03CR) 10Chad: [C: 032] Cirrus as secondary for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 (owner: 10Manybubbles) [17:17:58] (03Merged) 10jenkins-bot: Cirrus as secondary for more wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100782 (owner: 10Manybubbles) [17:18:02] thank you ^d! [17:18:17] <^d> yw [17:18:34] greg-g: am I good to go? [17:20:04] (03PS25) 10Ottomata: [not ready for review] Puppetizing Wikimetrics in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [17:21:26] !log manybubbles synchronized wmf-config/CirrusSearch-common.php 'Make wmgUseCirrus override wmgUseCirrusAsAlternative for easier config editing' [17:21:42] Logged the message, Master [17:22:12] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'Deploy Cirrus to new wikis as secondary and let wikidata have it as a beta feature' [17:22:26] Logged the message, Master [17:23:01] greg-g and ^d: done syncing files, verifying [17:27:42] RECOVERY - Varnish HTTP text-backend on cp4010 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.152 second response time [17:27:48] manybubbles: exciting to see cirrus going live in more places :) [17:28:03] Ryan_Lane: Deploy to all the wikis! [17:28:07] \o/ [17:28:15] manybubbles: What's the ETA on that? [17:28:22] manybubbles: 'Cos I'd love it. :-) [17:28:23] all the wikis? 
[17:28:39] hmmm - sometime mid to late Q1 I think now [17:28:45] depends on if I need new hardware [17:28:47] * James_F nods. [17:28:49] and where I'll get it if I do [17:28:49] (Yay.) [17:29:04] also, if I need row D I might have to wait too [17:31:38] mutante: Are you @ the office today? [17:31:52] hoping someone is there to greet Gage [17:31:53] andrewbogott: yes [17:32:00] andrewbogott: will do [17:32:11] cool [17:32:44] greg-g: I believe I'm done syncing for this deploy. everything looks good. starting to build indexes. [17:33:22] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 1800: active_shards: 5392: relocating_shards: 0: initializing_shards: 4: unassigned_shards: 8 [17:33:51] paravoid, hey [17:33:57] hey [17:34:10] the memory leak occured in rt testing too [17:34:22] cool [17:34:25] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 1840: active_shards: 5512: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:34:38] I have now installed the ppa package on the rt client machines [17:34:56] we'll see how that fares in the next run [17:38:35] ^d: it might be a good idea for us to reduce the number of shards on tiny wikis. that would probably speed some things up. [17:38:49] <^d> *nod* We can look at that [17:39:38] yeah, we already use lower than the default (4 instead of 5) but for some I think they should just have 1. all would have the replicas. [17:51:45] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2368: active_shards: 7092: relocating_shards: 2: initializing_shards: 4: unassigned_shards: 12 [17:53:35] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2392: active_shards: 7160: relocating_shards: 2: initializing_shards: 4: unassigned_shards: 16 [17:54:15] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2400: active_shards: 7183: relocating_shards: 2: initializing_shards: 4: unassigned_shards: 17 [17:54:15] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2400: active_shards: 7183: relocating_shards: 2: initializing_shards: 4: unassigned_shards: 17 [17:54:15] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2400: active_shards: 7183: relocating_shards: 2: initializing_shards: 4: unassigned_shards: 17 [17:54:45] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2404: active_shards: 7198: relocating_shards: 1: initializing_shards: 5: unassigned_shards: 13 [17:54:49] uhhh [17:55:03] manybubbles: are we ok? 
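Note on the ElasticSearch alerts above: the health summary in each alert is a run of `key: value` pairs after the word `status`. A small sketch for pulling those fields out of one alert line, for example to follow `unassigned_shards` across a burst like this one. The function name `parse_health` is hypothetical and the format is inferred only from the alert lines shown in this log.

```python
def parse_health(alert):
    """Return the 'key: value' pairs that follow 'status' in an alert line.

    Purely numeric values are converted to int; everything else stays a string.
    Returns an empty dict if the line carries no health summary.
    """
    _, found, tail = alert.partition("status: ")
    if not found:
        return {}
    tokens = [t.strip() for t in ("status: " + tail).split(": ")]
    fields = {}
    # Tokens alternate key, value, key, value, ...
    for key, value in zip(tokens[0::2], tokens[1::2]):
        fields[key] = int(value) if value.isdigit() else value
    return fields


line = (
    "PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: "
    "CRITICAL - elasticsearch (production-search-eqiad) is running. "
    "status: red: timed_out: false: number_of_nodes: 12: "
    "number_of_data_nodes: 12: active_primary_shards: 2404: "
    "active_shards: 7198: relocating_shards: 1: initializing_shards: 5: "
    "unassigned_shards: 13"
)
health = parse_health(line)
assert health["status"] == "red"
assert health["unassigned_shards"] == 13
```

Fed the PROBLEM and RECOVERY lines in order, this makes it easy to see the unassigned count rise while indexes are being created and drop back to 0 afterwards.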
[17:55:15] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2416: active_shards: 7240: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:55:15] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2416: active_shards: 7240: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:55:15] RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2416: active_shards: 7240: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:55:17] let me look [17:55:20] stupid noisy thing [17:55:23] probably [17:55:36] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2428: active_shards: 7276: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:55:45] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2432: active_shards: 7288: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:55:50] well, recoveries :) [17:56:30] yeah [17:56:39] so that happens when I add a lot of shards really really fast [17:56:44] indexes, rather [17:56:45] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 12: number_of_data_nodes: 12: active_primary_shards: 2460: active_shards: 7372: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [17:56:45] * greg-g nods [17:56:51] I just added like 4000 [17:56:58] it took a while [17:57:13] I wish it wouldn't complain when the shards are just throttled [17:58:42] greg-g: you can see how we built up some unassigned shards here: http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Elasticsearch%20cluster%20eqiad&h=elastic1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1386784695&v=0&m=es_unassigned_shards&vl=shards&ti=es_unassigned_shards&z=large [17:58:52] they are being assigned, just es throttles that operation because it is kind heavy [17:59:44] manybubbles: gotcha [17:59:53] it is an open bug [18:02:35] greg-g, any deployment holdups? [18:02:37] getting ready for zero [18:03:59] yurik: nope, should be good [18:17:27] hmm, is there a reason wmf5 is so different from the origin? greg-g ? [18:19:43] (03CR) 10Qgil: "Please improve the commit message of this pach." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [18:20:07] (03CR) 10Reedy: "How does this even work? :/ Unless I'm missing the obvious" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 (owner: 10Aude) [18:23:54] (03PS5) 10Expi1: Update internal.ico favicon Change-Id: I93c0f55ec584fe047d2f0818b78b7af0c1dc881a [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [18:27:59] do we have the REST api enabled for Icinga? / do we have anything that integrates with it? [18:29:40] Reedy: yuri's running into oddness with wmf5 [18:31:23] Git status looks fine [18:31:29] There's apparently some uncomitted extension changes [18:32:49] what the hell are they? 
[18:33:01] # modified: extensions/TimedMediaHandler (new commits) [18:33:01] # modified: extensions/Wikibase (new commits) [18:33:01] # modified: extensions/ZeroRatedMobileAccess (new commits) [18:34:16] (03CR) 10Odder: "Expi1: If you have a look at the favicon in a graphic manipulation program (such as The GIMP or Photoshop), you will notice that it includ" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [18:34:32] LeslieCarr: are you the owner of Icinga? [18:34:51] Reedy: that's it?, should the wikibase ones be commited? [18:35:01] I've just reverted it [18:35:04] * greg-g nods [18:35:06] yurik: ^^ [18:35:24] TMH one is expected [18:35:30] Zero is possibly yuriks own doing [18:35:36] * greg-g nods [18:35:41] that's what makes sense now [18:35:59] git isn't suggesting it's diverged [18:37:28] (03PS1) 10Ori.livneh: Add totalPageLoadTime (navStart to loadEventEnd) to Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/100817 [18:38:10] only zero dir [18:38:30] (03CR) 10Ori.livneh: [C: 032] Add totalPageLoadTime (navStart to loadEventEnd) to Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/100817 (owner: 10Ori.livneh) [18:38:49] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:39:49] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical [18:44:45] (03CR) 10Jeroen De Dauw: "> And on labs not all the dependencies are being obviously loaded.. Where are they already loaded?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 (owner: 10Aude) [18:45:03] !log yurik synchronized php-1.23wmf5/extensions/ZeroRatedMobileAccess/ [18:45:12] !log disabling puppet on cp1046 to test change to ganglia sample interval for varnishkafka stats [18:45:21] Logged the message, Master [18:45:38] i wonder which master? 
[18:45:38] Logged the message, Master [18:45:43] there we go [18:52:31] (03PS6) 10Gerrit Patch Uploader: Updated docroot/bits/favicon/internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 [18:52:32] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [18:52:55] (03CR) 10Expi1: "I've uploaded a new patch with the white box." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [18:53:24] paravoid, ori-l: for the PDF puppet module; my OCD side is blocking on where to actually put it... I'm forging new ground here it seems -- following gwicke's convention I'm going to have a deployment repo at mediawiki/services/BookRender/deploy; therefore it seems like I should have a module something like mediawiki_services::bookrender [18:53:53] but maybe that should be mediawiki_services::extensions::collection [18:55:18] mediawiki_services? [18:55:27] isn't ocg an independent service? [18:55:44] two things there; it's coupled to the collection extension [18:55:50] !log yurik synchronized php-1.23wmf6/extensions/ZeroRatedMobileAccess/ [18:56:04] Logged the message, Master [18:56:14] and ocg is sort of a misnomer because we called it that because we were going to bundle mathoid into it -- but that, in hindsight, now seems incorrect because the only dependency they share is Node.JS [18:56:37] so I was going to have seperate modules for BookRender (or CollectionOCG, or something...) [18:56:42] and mathoid [18:57:11] but; because it seems like we're going to be putting that code in git; under a mediawiki/services namespace [18:57:30] erm... in a grammatical corner... 
that we should have a similar puppet namespace scheme [18:57:52] ocg had nothing to do with mathoid [18:58:01] the term "ocg" was coined way before mathoid existed :) [18:58:13] (math is not offline content either) [18:58:26] ya; another reason not to bundle it under there [18:58:52] I'm not sure I understand what mediawiki/services is [18:59:13] right now; it holds parsoid and rashamon [18:59:37] uhm, okay [18:59:40] I think the idea is that it's like an extension; except not running under mediawiki [18:59:49] they're independent, though [18:59:58] independent; but coupled [19:02:24] ok, if you folks like it, I have no objections [19:02:31] but no, let's not do mediawiki_services on our side :) [19:02:56] "ocg" sounds fine, alternatives are also welcome but let's keep it simple [19:03:24] <^d> mediawiki/services was the result of a fair bit of bikeshedding :p [19:03:38] yeah, I'll stay out of it :P [19:03:48] so... does it make sense to have a tree structure in puppet? or each service should have a root level module? 
[19:04:12] root level is fine [19:05:41] ok -- that's what I'll do then :) [19:05:49] * mwalker moves things around again in his development branch [19:06:33] modules/ocg and role/ocg.pp sound okay to me tbh [19:08:21] I was starting to not like those because presumably we will reach a state where not all ocg things will run on the same servers [19:08:38] and then we'd have to rename everything [19:08:41] and it would be a huge mess [19:08:58] servers have nothing to do with puppet names :) [19:09:15] well they do though; in so far as we apply roles to servers [19:09:29] and maybe a server only wants to run book renderering [19:09:32] different roles under the same hierarchy [19:11:38] but then I ended up with something like ocg::collection; and ocg::math -- which had some sub modules like ocg::collection::prereqs (so that I could stage things in the correct order) [19:11:42] and it was just getting messy [19:12:11] to consider that maybe we might end up with a couple services labeled 'ocg' all in the same module punching each other [19:12:25] (it's also entirely possible I'm over thinking this) [19:12:40] (that's a bit besides the point, but don't use classes for ordering of resources) [19:13:57] oh? so... how does one order resources then? I have to make sure that the source has been trebuched to the server, the upstart conf is in place, and the required packages are installed before I start the service [19:14:22] it seemed like a decent pattern to have ocg::collection start the service and ocg::collection::prereqs do all the other stuff [19:15:12] I guess I don't have to make sure of all that; but it didn't seem immediately safe to not declare the dependencies [19:16:57] some servers for tools seem have disturbances [19:17:36] (03CR) 10Odder: "Are you sure that's the correct file? I cannot see a white box in it..." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [19:29:38] (03PS7) 10Gerrit Patch Uploader: Updated docroot/bits/favicon/internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 [19:29:39] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [19:29:58] (03CR) 10Expi1: "I must have uploaded the wrong one, here it is." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [19:35:22] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC [19:47:03] (03CR) 10Odder: [C: 04-1] "Could you try uploading a version with "inter" in lower-cased letters, and in black instead of gray? The current version of the favicon (03PS8) 10Gerrit Patch Uploader: Updated docroot/bits/favicon/internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 [19:52:00] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [20:00:05] jgage: welcome! [20:00:13] jgage: https://meta.wikimedia.org/wiki/IRC/Cloaks [20:00:30] thank you! just getting my laptop set up. hi everyone :) [20:00:35] *waves* [20:01:18] let me invite you to a bunch of channels just to let you know they exist [20:10:18] !log Reloading zuul to deploy I9c9399473a84 [20:10:35] Logged the message, Master [20:11:38] qchris: eek -- I forgot to update the repository request page -- please don't fulfill my request yet [20:11:51] mwalker: Ha [20:11:56] *it should have a different name based on a discussion I had earlier with paravoi_ [20:11:57] mwalker: I was just creating the group. [20:12:26] mwalker: Ok. 
I'll wait for a bit then :-) [20:14:01] ok! appologies; updated! if you would kindly create mediawiki/services/ocg-collection/deploy [20:14:34] ori-l: if i send data to statsd…how does it get to ganglia? [20:29:54] (03CR) 10Qgil: [C: 04-1] "Remember that we are trying to replicate the original favicon:" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [20:33:04] (03PS1) 10Ottomata: Setting --tmax to 60 for gmetric command [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/100900 [20:33:21] (03CR) 10Ottomata: [C: 032 V: 032] Setting --tmax to 60 for gmetric command [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/100900 (owner: 10Ottomata) [20:34:05] (03PS1) 10Ottomata: Updating varnishkafka module with --tmax monitoring change [operations/puppet] - 10https://gerrit.wikimedia.org/r/100901 [20:34:18] (03CR) 10Ottomata: [C: 032 V: 032] Updating varnishkafka module with --tmax monitoring change [operations/puppet] - 10https://gerrit.wikimedia.org/r/100901 (owner: 10Ottomata) [20:37:03] jgage: welcome! [20:37:33] jgage: When you're ready to think about getting cluster access, I can help with that. (If Daniel hasn't done it already.) [20:37:46] I just did all that for Mike a couple of weeks ago so the process is fresh in my mind :) [20:43:48] PROBLEM - mysqld processes on labsdb1003 is CRITICAL: PROCS CRITICAL: 2 processes with command name mysqld [20:55:21] AaronSchulz: thanks for your sql.php patch for --wikidb (gerrit 100707 (wikidb). To deploy it on terbium do we need to backport it to wmf6? 
[20:55:37] (03PS1) 10Ori.livneh: Disable module storage on eswiki to assess impact [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100903 [20:56:13] (03CR) 10Ori.livneh: [C: 032] Disable module storage on eswiki to assess impact [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100903 (owner: 10Ori.livneh) [20:56:29] !log reenabled puppet on cp1046 [20:56:37] !log ori updated /a/common to {{Gerrit|Id5b930a6b}}: Disable module storage on eswiki to assess impact [20:56:46] Logged the message, Master [20:57:02] Logged the message, Master [20:57:26] !log ori synchronized wmf-config/InitialiseSettings.php 'Id5b930a6b: Disable module storage on eswiki to assess impact' [20:57:42] Logged the message, Master [20:59:14] spagewmf: wmf 6 & 7 [20:59:34] well, 5 and 6 [21:00:10] I assume 7 hasn't been cut yet [21:02:33] Aaron|home: OK I'll prepare backport changes. Why do we need wmf5 as well? `mwscript maintenance/sql.php --wikidb=flowdb --cluster=extension1 --wiki=test2wiki flow.sql` on terbium should create the one set of tables we need. [21:02:54] (03PS9) 10Gerrit Patch Uploader: Updated docroot/bits/favicon/internal.ico [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 [21:02:55] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [21:03:48] (03CR) 10Expi1: "I've added another patch, I've had them both open in photoshop, frantically looking between both until they were as similar as I could get" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [21:13:34] spagewmf: heh, yeah, I guess you only need it on one branch [21:17:36] subbu is just trying to do his first deployment, but does not seem to be able to log into tin from bast1001 [21:17:49] with agent forwarding [21:18:50] could anybody check his access on tin as a deployer? [21:19:25] user name is ssastry [21:21:43] gwicke, subbu, I'll look... [21:22:02] thanks [21:23:03] subbu: You aren't in the 'mortals' list which I think is the same as deployers. I'll add you and we'll see if that helps :) [21:23:41] hm… theoretically there should be an access request ticket for this and such. mutante, any opinion about that? [21:23:58] there was one about giving subbu deploy access [21:24:05] andrewbogott, https://rt.wikimedia.org/Ticket/Display.html?id=5512 [21:25:14] Ah, yeah, otto added you as a parsoid admin but not as a mortal. I'll fix. [21:26:03] thanks andrewbogott! daniel has welcomed me, and i already have >100 emails to go through :) [21:26:44] jgage: yeah, I figured you wouldn't need root immediately :) I'm out tomorrow but back Friday (or another Op can help when you need it.) [21:27:33] (03PS1) 10Andrew Bogott: Added subbu to 'mortals' so that he can actually deploy. [operations/puppet] - 10https://gerrit.wikimedia.org/r/100908 [21:28:08] (03CR) 10Andrew Bogott: [C: 032] Added subbu to 'mortals' so that he can actually deploy. [operations/puppet] - 10https://gerrit.wikimedia.org/r/100908 (owner: 10Andrew Bogott) [21:28:49] ori-l: brain bounce w me! :) [21:30:25] andrewbogott, let me know when that is merged. 
[21:30:48] It's merged but I'd advise you wait ~30mins so that all the hosts know about it. [21:31:05] I'm forcing an update on tin, but presumably a deployment involves… other things. [21:31:12] ah, i see. gwicke ^ .. so do we spill over past our deploy window or should we retry this on monday? [21:31:20] you can deploy today [21:33:43] we still have until 2pm [21:33:53] and the actual deploy typically takes about 1 minute [21:34:01] so plenty of time left ;) [21:34:15] ok. we'll wait for another 20 mins then :) [21:34:26] (03CR) 10Qgil: "Is it just me, or does this favicon look exactly like the previous one?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [21:35:11] paravoid: you there? [21:35:59] kind of :) [21:36:30] I figured that the return of mark would herald a 48-hour nap for paravoid [21:36:41] Eloquence: on the hangout :) [21:38:26] heh [21:38:41] i am having a problem with ganglia metrics [21:38:46] (not the analytics multicast ones) [21:39:03] mainly, it seems that for positive/counter metrics, unless I update the metric at least every 15 seconds [21:39:09] ganglia will write 0s into the rrd files [21:39:14] not sure why [21:39:17] the value in gmond looks good [21:39:30] gmetad polls every 15 seconds [21:39:41] hm [21:39:47] yeah, hm OH. 
[21:39:58] paravoid you are a good giraffe to talk to [21:40:06] or rubber ducky, or whatever [21:40:08] :) [21:40:15] yeah, so the value doesn't change in gmond every 15 seconds [21:40:17] http://noc.wikimedia.org/~faidon/giraffe/ [21:40:18] only once a minute [21:40:19] :P [21:40:30] hah [21:40:41] so, since this is a positive sloped value [21:40:47] ganglia only writes the differences between values [21:40:59] and it polls for new values every 15 seconds [21:41:06] hmmmmmmm [21:41:25] it shouldn't though, i would think because TN (time since last update) is old, [21:41:26] grr [21:41:27] ok well [21:41:34] there are 2 ways to fix [21:41:42] tell gmetad to poll less often [21:41:54] but afaict, I can only set that at a cluster wide level [21:41:57] OR [21:42:02] update the value every 15 seconds [21:42:03] that is better [21:42:05] but... [21:42:12] that means I can't update the value via cron [21:42:25] because I can't run more than once a minute (i can, but have do hacky sleep stuff) [21:42:38] so, paravoid, what would you rather me do? [21:42:48] write a stupid little upstart script to run logster in a loop [21:42:52] or [21:42:58] wait, what? [21:43:01] have a cron job with some sleeps in the command? [21:43:02] hah [21:43:10] option C? none of the above? 
:) [21:43:32] in python plugins, you can tell ganglia how often to poll you iirc [21:43:32] the problem is this [21:43:33] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Mobile+caches+eqiad&h=cp1046.eqiad.wmnet&jr=&js=&event=show&ts=0&v=75792&m=kafka.rdkafka.topics.webrequest-mobile.partitions.7.txmsgs&ti= [21:43:39] hmm [21:44:17] wellllll [21:44:18] yeah [21:44:21] i can do that with gmetric too [21:44:24] (03PS6) 10Bsitu: Enable Flow discussions on a few wikis' test pages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [21:45:21] lemme see if I can find a python module example of this [21:47:09] Aaron|home: https://gerrit.wikimedia.org/r/#/c/100909/ is sql.php backport to wmf6 [21:51:37] (03PS7) 10Bsitu: Enable Flow discussions on a few wikis' test pages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [21:51:38] (03PS1) 10Bsitu: Turn on some Flow occupy page [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100911 [21:56:20] (03PS1) 10Anomie: l10nupdate-1: Log start and end times of rsync [operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 [21:58:23] Is anyone doing anything now? [21:58:58] they say people die from hunger every 3 seconds [21:59:04] Reedy, greg-g OK'd the Flow team continuing its deploy at 2pm [21:59:23] is that OK? [21:59:52] Are you likely scapping? [22:00:13] Reedy: don't think so, https://www.mediawiki.org/wiki/Flow_Portal/2013-12_Devployment#Deploy_steps [22:00:26] the code is out there, so just DB tables and flip the switch. 
[22:00:45] I'll wait for a lul then [22:00:51] And fix mlwikis in a bit [22:00:53] (03CR) 10Expi1: "It shouldn't, I just made another patch with the latest attempt and tried to upload to be told that no changes had been made, I'll triple " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [22:01:22] springle created the actual flowdb on extension1 cluster for us [22:01:28] Ryan_Lane: https://gist.github.com/subbuss/c9a9eb235d4705a042e3 [22:01:52] heh [22:02:11] subbu: you should also do something like !log deployed Parsoid here [22:02:18] ah, ok. [22:02:29] that adds an entry to https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:02:51] !log deployed Parsoid abc4f426 [22:02:52] ^d or greg-g : can you +2 the backport of --wikidb option to wmf6? [22:03:07] Logged the message, Master [22:04:29] gwicke: had you already switched to the upstart? [22:04:32] could anybody restart the parsoid machines for now? [22:04:37] Ryan_Lane: no, not yet [22:04:53] scheduled for Friday [22:05:41] one sec [22:05:44] greg-g: is the window clear now? [22:05:44] (03PS1) 10Ryan Lane: trebuchet: Handle service restarts with no status [operations/puppet] - 10https://gerrit.wikimedia.org/r/100914 [22:06:55] gwicke: are you guys done with parsoid deploy? [22:07:11] shit. I guess I should have staggered that [22:07:13] gwicke: done [22:09:17] stupid jenkins [22:10:07] gwicke: are you still deploying Parsoid? 
[22:10:16] spagewmf: we are done [22:10:28] and are not touching the php cluster in any case ;) [22:10:34] (03CR) 10Ryan Lane: [C: 032] trebuchet: Handle service restarts with no status [operations/puppet] - 10https://gerrit.wikimedia.org/r/100914 (owner: 10Ryan Lane) [22:11:18] gwicke: so, now when you run the service-restart command, it'll at least show when a minion doesn't return a status [22:11:25] thx, we're starting [22:11:26] and we can track down why it's not working for that minion [22:11:44] it's likely a matter of something not sync'd from salt properly right now [22:11:55] Ryan_Lane: k [22:11:56] I just resync'd everything and restarted the salt minion on the nodes [22:12:07] greg-g, just wanted to check-in with you regarding the release of gwtoolset onto production … is that something you wanted to do tomorrow? [22:12:32] in the worst case a dsh -g parsoid service parsoid restart ought to work too [22:13:41] ah, paravoid [22:13:42] i found one [22:13:43] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Mobile%20caches%20eqiad&h=cp1046.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1386799994&event=show&ts=0&v=963387946&m=vhtcpd_inpkts_enqueued&vl=pkts&ti=Packets%20enqueued&z=large [22:13:50] that is from a python module [22:13:57] it is set to run every 30 seconds [22:14:08] Snaps: ^ [22:14:31] also, notice how the larger time span values are all messed up [22:14:32] seems to have the same problem we're trying to fix though [22:14:34] because they are being averaged [22:14:41] ^d: bsitu +2d the backport https://gerrit.wikimedia.org/r/#/c/100909/ , but jenkins isn't merging; how to fix? [22:14:47] with a bunch of 0s in the values [22:14:54] yeah, Snaps, but this particular metric is configured to run every 30 seconds [22:15:06] if we set it to 15 or less, it would be fine [22:15:12] <^d> spagewmf: Be patient? Or ignore jenkins and merge anyway? 
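(Annotation on the Ganglia thread above.) The zeros ottomata describes can be reproduced with a toy model. This is only a sketch under stated assumptions, not Ganglia's actual code: it assumes the poller records the difference between successive readings of a positive-slope counter, and that the counter is republished only every `update_interval` seconds. The function names and the rate of 1 message per second are hypothetical, chosen for illustration.

```python
def counter_value(t, update_interval, rate=1):
    """Counter value visible at time t when it is only republished
    every `update_interval` seconds (hypothetical model)."""
    last_update = (t // update_interval) * update_interval
    return last_update * rate


def polled_deltas(update_interval, poll_interval, duration, rate=1):
    """Differences between successive polls, which is what a
    positive-slope metric would have written to its RRD in this model."""
    polls = range(0, duration + 1, poll_interval)
    values = [counter_value(t, update_interval, rate) for t in polls]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Counter refreshed once a minute (cron), polled every 15 seconds:
print(polled_deltas(60, 15, 120))  # [0, 0, 0, 60, 0, 0, 0, 60]

# Counter refreshed every 15 seconds, polled every 15 seconds:
print(polled_deltas(15, 15, 120))  # [15, 15, 15, 15, 15, 15, 15, 15]
```

In this model three out of every four polls see no change when the value is refreshed once a minute, which matches the description of 0s being written between updates. Refreshing at least as often as the poll interval removes the gaps.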
[22:15:14] i just wanted to verify that this wasn't a gmetric problem [22:15:32] it is any positive slope metric sent to ganglia less often than the gmetad polling interval [22:23:20] !log bsitu synchronized php-1.23wmf6/maintenance/sql.php 'Added --wikidb param to sql.php (backport to wmf6)' [22:23:36] Logged the message, Master [22:24:43] Ryan_Lane: is service-restart likely to work the next time? [22:25:04] yes [22:25:15] and if some hosts don't report success, we'll track down why [22:25:19] great, I'll add it to our deploy docs then [22:25:30] it actually worked this time [22:25:36] but maybe one host failed [22:25:50] dan-nl: is the cron /tmp cleanup merged, any other issues on beta for a while? [22:25:56] and the reporting errored out, so we couldn't tell which one failed [22:26:26] k [22:26:34] greg-g: it doesn't look like a cronjob was created for the /tmp cleanup [22:27:03] i'm not sure how to write it … is there someone there that could? [22:27:46] basically it should clear out any URLxxx files that are older than x time. i'm guessing that 24 hours is reasonable [22:28:54] dan-nl: sorry, multitasking, have an interview in 2 minutes: that's stated on the bug report, right? [22:29:05] (03CR) 10BryanDavis: [C: 031] "This is definitely better than nothing. It might be nice to compute the time delta and push it into graphite so that we could see trends o" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 (owner: 10Anomie) [22:29:40] i don't know if hashar placed it in the bug report … let me see … we can chat tomorrow about it [22:29:55] springle: we ran this command: mwscript sql.php --wiki=testwiki --wikidb=flowdb --cluster=extension1 extensions/Flow/flow.sql, but this created the table with default charset=latin1 rather than binary [22:30:52] yeah, getting late for you, lemme check out the error logs on beta, but, I'm guessing it'll be a 50/50 chance on tomorrow versus Tuesday. 
I just want to have a day or so of time/usage on beta without 'any' issue. [22:30:58] ok, call time [22:33:21] bsitu: did the sql file have $wgDBTableOptions or whatever it's called [22:33:23] ? [22:34:44] Aaron|home: yes it does [22:35:07] Aaron|home: also noticing aft and echo tables on extension1 are created with default charset=latin1 [22:35:15] are the binary columns actually varbinary and such? [22:35:22] springle (or anyone) seems like we only set $wgDBTableOptions = "ENGINE=InnoDB, DEFAULT CHARSET=binary" in wmf-config/db-labs.php [22:35:57] Aaron|home: varchar(n) binary [22:36:16] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC [22:36:22] spagewmf: looking [22:42:04] include/DefaultSettings.php has only $wgDBTableOptions = 'ENGINE=InnoDB' , so I can see why it's not setting it. springle, we could CREATE DATABASE flowdb DEFUALT CHARACTER SET binary ? [22:43:12] spagewmf: certainly. can i drop the existing flowdb? [22:43:19] springle: yup, go for it [22:45:20] jgage: what would you like your 'real name' to be in production? I don't think quote marks are allowed :( [22:47:47] ebernhardson, spagewmf, done [22:48:40] now as to why extension1 was given different defaults to other shards, don't know yet. its been this way for some time it seems [22:48:51] Aaron|home: ^ do you know? [22:50:19] springe: all aft and echo tables in extension1 were created with latin1, I didn't notice this before, we need to migrate them at some point [22:50:28] springle: wmf-config/db*.php don't set DEFAULT CHARSET, except for db-labs.php. [22:50:35] bsitu: when were thos created? [22:50:57] within the last 7 months [22:50:58] spagewmf: yes, right. but i'm wondering if it was done for a reason that other extensions might now depend on [22:53:19] (03CR) 10Anomie: "Note the time delta seems to be pretty noisy, depending on how many languages happened to be updated from translatewiki that day." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 (owner: 10Anomie) [22:53:40] (03PS1) 10Dan-nl: job-delay-config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100922 [22:53:52] (03PS1) 10Andrew Bogott: Add gage as a root [operations/puppet] - 10https://gerrit.wikimedia.org/r/100923 [22:54:04] (03PS8) 10Spage: Enable Flow discussions on a few wikis' test pages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 [22:54:32] (03CR) 10Spage: [C: 032] "PS8 just changed commit message, approving." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [22:54:41] (03Merged) 10jenkins-bot: Enable Flow on a few wikis (but no pages) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [22:54:58] (03CR) 10Andrew Bogott: [C: 04-1] "Pending gage's addition of a pubkey" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100923 (owner: 10Andrew Bogott) [22:55:26] jgage, mutante: OK, an intro task… check out https://gerrit.wikimedia.org/r/#/c/100923/, modify with your new public key, resubmit. [22:55:37] jgage, this should be a different keypair than the one used for labs. [22:55:50] andrewbogott: perfect:) [22:55:57] (03PS2) 10BryanDavis: job-delay-config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100922 (owner: 10Dan-nl) [22:56:04] (03CR) 10BryanDavis: [C: 032] job-delay-config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100922 (owner: 10Dan-nl) [22:56:05] mutante, you want to talk him through the git/gerrit/review wonderland? 
[22:56:23] (03Merged) 10jenkins-bot: job-delay-config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100922 (owner: 10Dan-nl) [22:56:31] !log bsitu updated /a/common to {{Gerrit|I1fba27a98}}: Enable Flow on a few wikis (but no pages) [22:56:33] andrewbogott: yea, he's in a talk right now, once he's available [22:56:45] 'k [22:56:47] Logged the message, Master [22:56:55] is his email jgage@ or jgerard@ or… both? neither? [22:57:23] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Flow on a few wikis (but no pages)' [22:57:39] Logged the message, Master [22:57:53] !log bsitu synchronized wmf-config/CommonSettings.php 'Enable Flow on a few wikis (but no pages)' [22:58:09] Logged the message, Master [22:58:12] springle: not sure, probably an oversight [22:58:29] !log bsitu synchronized wmf-config/InitialiseSettings-labs.php 'Enable Flow on a few wikis (but no pages)' [22:58:45] Logged the message, Master [23:01:33] andrewbogott: ok, will do that now [23:04:04] (03PS2) 10Spage: Turn on some Flow occupy page [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100911 (owner: 10Bsitu) [23:05:50] (03PS3) 10Spage: Turn on some Flow occupy page [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100911 (owner: 10Bsitu) [23:06:05] (03CR) 10Spage: [C: 032] "Let's do this." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100911 (owner: 10Bsitu) [23:08:02] (03CR) 10Spage: [V: 032] "Jenkins too slow" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100911 (owner: 10Bsitu) [23:09:59] no bug number? [23:10:00] :| [23:16:12] (03PS1) 10EBernhardson: Repair flow config on labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100926 [23:17:36] (03CR) 10Spage: [C: 032] "Unclear how to test these but one or the other should work." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100926 (owner: 10EBernhardson) [23:19:04] !log bsitu updated /a/common to {{Gerrit|I6d8a495d0}}: Repair flow config on labs [23:19:20] Logged the message, Master [23:21:19] !log bsitu synchronized wmf-config/InitialiseSettings-labs.php 'Fix the labs config' [23:21:34] Logged the message, Master [23:22:01] (03CR) 10Dzahn: "i think server.pp should maybe just be renamed to init.pp because every module should have an init.pp and that's the main class" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [23:22:29] jgage: sorry, stepped away, back now [23:22:43] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Flow on a few wiki pages' [23:22:58] Logged the message, Master [23:23:16] and https://www.mediawiki.org/wiki/Talk:Flow . Live it, love it, feedback welcome [23:24:03] (03PS2) 10Dzahn: include the bugzilla config in puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 [23:24:07] greg-g, springle, Aaron|home, Reedy, ^d : Thanks [23:24:51] !log bsitu synchronized wmf-config/CommonSettings-labs.php 'job-delay-config' [23:24:59] spagewmf: what happened to the text that was formerly there? [23:25:07] Logged the message, Master [23:25:43] /dev/null [23:25:46] legoktm: Flow takes over all references to the page. You can get to the old wiki text using Special:Export [23:26:01] spagewmf: I thought there was supposed to be a script to convert it.... [23:26:14] legoktm: there will be no conversion from wikitext -> structured discussion [23:26:25] (03CR) 10Qgil: "Does https://gerrit.wikimedia.org/r/#/c/100326/9/docroot/bits/favicon/internal.ico show to you the favicon version you want to contribute?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [23:26:26] wut [23:26:33] I could swear I was told there was a script [23:26:37] that would convert [23:26:41] legoktm: other direction. 
structured discussion -> wikitext [23:26:47] ohohoh [23:26:48] right [23:27:05] hmmm [23:27:08] popups still work [23:27:15] legoktm: the way to do it is move the existing page contents to a subpage, e.g. http://ee-flow.wmflabs.org/wiki/User_talk:Maryana [23:27:50] https://www.mediawiki.org/wiki/Talk:Flow?action=raw :( [23:30:50] spagewmf: we're all good?! [23:30:58] greg-g: No... [23:31:05] greg-g: labs is still complaining [23:31:05] oh [23:31:05] we are f***ing great! [23:31:10] haha [23:31:15] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [23:32:41] (03PS1) 10Jforrester: Enable VisualEditor on OTRS_wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100927 [23:33:03] (03CR) 10Jforrester: [C: 04-1] "Don't merge until OK'ed." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100927 (owner: 10Jforrester) [23:35:03] James_F: errr, what? [23:35:19] in default mode? [23:36:27] legoktm: Why not? [23:36:45] How about "Why?" [23:36:49] legoktm: OTRS wiki has lots of tables that need cell values editing; it's a good candidate. [23:36:51] (03CR) 10Expi1: "On mobile right now, but that looks like the right icon." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [23:36:58] legoktm: And it's out of a request. :-) [23:37:04] James_F: Maybe you should let OTRS volunteers know first? [23:37:18] legoktm: Duh. What do you think my -1 was? [23:37:40] And, I don't think there are any real tables on OTRS wiki that get edited actively [23:37:57] * James_F can think of two. [23:38:01] The only one I know of is just {{/row|info}} which can't use the table editor afaik [23:39:19] (03CR) 10Reedy: "So why do it 2 different ways?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 (owner: 10Aude) [23:41:07] (03PS1) 10BryanDavis: Hack: cron job to clean up files orphaned by UploadFromUrl [operations/puppet] - 10https://gerrit.wikimedia.org/r/100928 [23:44:17] (03PS1) 10Ori.livneh: Disable module storage on all but enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100929 [23:52:52] (03CR) 10Ori.livneh: [C: 032] Disable module storage on all but enwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100929 (owner: 10Ori.livneh) [23:53:44] !log ori updated /a/common to {{Gerrit|I47ad0a80b}}: Disable module storage on all but enwiki [23:53:59] Logged the message, Master [23:54:30] !log ori synchronized wmf-config/InitialiseSettings.php 'I47ad0a80b: Disable module storage on all but enwiki' [23:54:45] Logged the message, Master
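(Annotation on the /tmp cleanup discussed earlier and the "cron job to clean up files orphaned by UploadFromUrl" change above.) A minimal sketch of the cleanup dan-nl describes: remove `URLxxx` files older than some cutoff, with 24 hours as the suggested default. This is not the content of the Gerrit change. The `URL*` glob, the function name, and its parameters are assumptions made for illustration.

```python
import time
from pathlib import Path


def remove_stale_files(directory, pattern="URL*", max_age_seconds=24 * 3600, now=None):
    """Delete regular files in `directory` matching `pattern` whose
    modification time is older than `max_age_seconds`.
    Returns the sorted names of the files that were removed."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        # Skip directories and anything modified within the cutoff.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A scheduled job could call `remove_stale_files("/tmp")` once an hour or so. Files that do not match the pattern are left alone regardless of age.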