[00:11:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:19:11] jdlrobson: Max mentioned that you had reservations about implementing .nomobile in CSS [00:19:31] can you elaborate on that (if true) [00:19:52] kaldari: not really [00:20:06] cool [00:32:35] !log mwalker synchronized php-1.22wmf19/extensions/CentralNotice/ 're pushing centralnotice to master on the CORRECT branch... grumble' [00:32:50] Logged the message, Master [00:34:04] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 00:33:54 UTC 2013 [00:34:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:46:56] (03PS8) 10Physikerwelt: Intial version of puppet script for LaTeXML [operations/puppet] - 10https://gerrit.wikimedia.org/r/61767 [01:13:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:35:12] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 01:35:09 UTC 2013 [01:35:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:42:11] * AaronSchulz gets ceph working locally again :) [01:42:48] * AaronSchulz uses xfs this time [02:04:42] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 02:04:32 UTC 2013 [02:04:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:13:45] !log LocalisationUpdate completed (1.22wmf19) at Tue Oct 1 02:13:45 UTC 2013 [02:13:57] Logged the message, Master [02:25:59] !log LocalisationUpdate completed (1.22wmf18) at Tue Oct 1 02:25:59 UTC 2013 [02:26:11] Logged the message, Master [02:33:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 02:33:48 UTC 2013 [02:34:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:50:55] !log LocalisationUpdate ResourceLoader cache 
refresh completed at Tue Oct 1 02:50:54 UTC 2013 [02:51:10] Logged the message, Master [03:06:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:34:55] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 03:34:48 UTC 2013 [03:35:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:41:05] PROBLEM - Puppet freshness on vanadium is CRITICAL: No successful Puppet run in the last 10 hours [03:52:15] RECOVERY - Puppet freshness on vanadium is OK: puppet ran at Tue Oct 1 03:52:11 UTC 2013 [04:04:05] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 04:04:00 UTC 2013 [04:04:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:35:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 04:34:59 UTC 2013 [04:35:23] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:44:55] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [04:44:55] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: No successful Puppet run in the last 10 hours [04:51:55] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: No successful Puppet run in the last 10 hours [04:51:55] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: No successful Puppet run in the last 10 hours [04:55:50] Elsie: can we change name? *Bot will be surely blocked somewhere [04:56:20] It requires a configuration change. [04:56:29] Yes, so it's easy. :) [04:56:29] I think legoktm wanted to use EdwardsBot eventually. [04:56:36] Where is *Bot blocked? [04:56:41] Surely a stupid wiki we can ignore. 
[04:56:52] On any stupid bureaucratic wiki [04:56:55] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [04:56:55] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: No successful Puppet run in the last 10 hours [04:57:14] I did [04:57:17] Fuck 'em. :-) [04:57:23] Yes, but the sysops' stupidity is generally not the message recipients' fault. [04:57:41] I'm not sure we need to accommodate stupidity. [04:57:48] Couldn't they equally just block the new name? [04:57:55] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: No successful Puppet run in the last 10 hours [04:58:18] It's going to be pretty difficult to come up with a name that makes sense in 280+ languages, and follows every single username policy [04:58:21] It's harder to turn on the block mental automatism [04:58:36] EdwardsBot is probably the best since they're already used to it [04:58:46] Don't have to respect username policy [04:58:55] I won't help paint this bikeshed. :-) [04:59:09] Well, that would be a legitimate reason to use a Bot username [04:59:27] It's not a bikeshed, it's a survival simplification trick [04:59:57] is there a good reason to not just keep using EdwardsBot? [05:00:21] It pings me. [05:00:22] https://meta.wikimedia.org/wiki/Special:CentralAuth/EdwardsBot <-- only one block [05:00:24] :D [05:00:55] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: No successful Puppet run in the last 10 hours [05:02:55] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: No successful Puppet run in the last 10 hours [05:03:24] Nemo_bis: Any edits to https://meta.wikimedia.org/wiki/Global_message_delivery/Spam ? 
[05:03:55] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: No successful Puppet run in the last 10 hours [05:05:25] Elsie: "much better replacement" sounds redundant [05:05:40] also, all this conversation should be in -tech [05:05:55] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: No successful Puppet run in the last 10 hours [05:06:39] ah, I see the extension will also fulfil Billinghurts' request [05:10:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:34:05] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 05:33:56 UTC 2013 [05:34:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:39:14] (03PS1) 10Matanya: Removed pstack package since bug 48025 is resolved. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86812 [06:14:22] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:26:59] apergos: what ubuntu version in on fenari? [06:27:27] *is [06:28:18] might be precise [06:28:28] can you please check? [06:28:44] precise [06:28:51] what's up? [06:28:55] thanks! [06:29:11] i'm playing with some pupet, needed to know the version :) [06:34:42] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 06:34:32 UTC 2013 [06:35:22] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:35:58] ok, well fenari is on its way out sometime [06:36:07] so I wouldn't rely on that for whatever needs [06:40:16] apergos: what is the deploy host? 
[06:40:24] tin [07:08:42] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:19:06] (03PS1) 10Matanya: Remove lucid umask setting [operations/puppet] - 10https://gerrit.wikimedia.org/r/86816 [07:33:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 07:33:42 UTC 2013 [07:34:42] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:40:33] PROBLEM - search indices - check lucene status page on search1022 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 56441 bytes in 0.018 second response time [07:54:11] (03PS1) 10Matanya: Fundrising: removed non-WMF jenkins. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86818 [08:01:25] hashar: are all jenkins gerrit jobs jobs in /srv/ssd/gerrit on jenkins-gallium ? [08:06:28] matanya: that the git repositories [08:06:43] matanya: Gerrit send all repositories modifications to the Jenkins hosts under /srv/ssd/gerrit [08:06:46] as bare repositories [08:07:17] the Jenkins workspaces are in /srv/ssd/jenkins-slave/workspace [08:07:25] thanks, I looked in manifests/role/gerrit.pp [08:07:56] hashar: it seems to have a FIXME that can be applied, wanted to verify i get it correct. Do I ? [08:07:58] (03CR) 10Dzahn: [C: 031] "works for me on analytics machines:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/80055 (owner: 10ArielGlenn) [08:08:43] and BTW hashar I added you as a reviewer on other patch, please review it when you have time [08:10:04] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:28] matanya: let me look at gerrit.pp [08:11:56] matanya: yeah I still have some jobs depending on repositories being in /var/lib/git/ [08:13:49] ok, thanks hashar I won't commit this yet [08:13:52] *push [08:14:17] (03CR) 10Hashar: [C: 031] "Note that this does not actually remove the package. 
It merely move it out of puppet scope ;-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86812 (owner: 10Matanya) [08:15:04] hashar: i may be change to absent if you prefer [08:15:11] na it is ok [08:15:15] I can remove it manually [08:15:39] mutante: mind merging a cleanup change please? https://gerrit.wikimedia.org/r/#/c/86812/ rm a package I have added temporarily on gallium. Will clean up manually. [08:15:43] ok, cool. thanks [08:15:53] mutante: also Guten Tag :-] [08:18:10] (03CR) 10Dzahn: [C: 032] "sure, per comment it was just added for debugging" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86812 (owner: 10Matanya) [08:18:14] \O/ [08:18:46] mutante: danke! [08:18:52] hashar: c'est fait [08:25:24] RECOVERY - Disk space on ms1004 is OK: DISK OK [08:25:37] :) thx apergos [08:26:17] thanks hashar and mutante [08:27:25] mutante: do we have any hardy machines around? [08:27:27] purged on gallium [08:28:25] !log jenkins updating pyflakes jobs to run 'pyflakes .' instead of simply 'pyflakes' {{gerrit|86727}} [08:28:40] matanya: i think just a single one [08:28:41] now [08:28:44] Logged the message, Master [08:29:00] which one mutante ? [08:30:55] matanya: mchenry [08:31:21] the mail relay mutante ? [08:31:28] !log Jenkins: so pyflakes are now actually doing something useful instead of testing nothing. That causes a bunch of jobs to fail [08:31:36] matanya: yes [08:31:41] Logged the message, Master [08:33:44] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 08:33:42 UTC 2013 [08:34:04] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [08:38:31] (03CR) 10Hashar: "I have manually purged `pstack` on gallium." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86812 (owner: 10Matanya) [08:42:57] (03CR) 10Hashar: "Katie, Adam, Matthew : I suddenly discover you have a Jenkins installation :-) The installation I take care of (for Continuous Integra" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86818 (owner: 10Matanya) [08:45:20] hashar: ++ for not having a separate jenkins install from repo [08:45:32] mutante: yeah that part needs to die hard [08:45:49] mutante: I have added FR folks to the change, hopefully they will notice the Gerrit spam mail :-] [08:46:11] hashar: do you want to hire me? :P [08:46:13] (03CR) 10Dzahn: [C: 031] "++ for not having a separate install from repo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86818 (owner: 10Matanya) [08:49:54] so [08:50:01] lets get some work done instead of reviewing [08:56:59] (03PS1) 10Faidon Liambotis: bits: remove /w/ symlink, add a /w/404.php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 [08:57:08] (03CR) 10jenkins-bot: [V: 04-1] bits: remove /w/ symlink, add a /w/404.php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 (owner: 10Faidon Liambotis) [08:57:32] huh [08:57:36] absolute symlinks [08:57:37] fun [08:58:09] Could not open input file: docroot/bits/w/404.php [08:58:18] heh, yea, i was about to say, it's a thing when adding new symlinks [08:58:40] (03CR) 10Faidon Liambotis: [V: 031] "Ignore the Jenkins error, it fails because of the absolute symlink to /apache." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 (owner: 10Faidon Liambotis) [08:58:55] hashar: [09:07:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:11:08] !log killing nrpe on cp1045 and restarting, was running as icinga instead of nagios, fixes "NRPE: Unable to read output" Icinga Unknown's [09:11:22] Logged the message, Master [09:11:51] huh, weird [09:12:17] the gooood old bug hehe [09:15:42] ..but why the same on mw1173 but restarting doesnt fix it [09:18:08] PIDDIR=/var/run/icinga [09:18:30] and /var/run/nagios on others [09:23:20] (03PS1) 10Adamw: minor config file sync [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86822 [09:23:27] !log mw1173 had wrong nrpe init script, synced the one from mw1172 and restarted (as nagios), fixes dpkg check [09:23:30] wth [09:23:38] Logged the message, Master [09:25:24] (03PS1) 10Adamw: gitignore things [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86823 [09:25:52] mark or paravoid, Dan just contacted me because he found some urgent bug in zero, would it be ok if i push https://gerrit.wikimedia.org/r/#/c/86821/ to prod now? [09:26:07] (03PS2) 10Adamw: gitignore things [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86823 [09:26:10] or if greg-g is around...? :) [09:26:23] what kind of bug? [09:27:01] when users navigate to zero.wikipedia.org without having X-CS set, they should NOT be redirected to www.wikipedia.org, but should be shown red banner instead [09:28:51] paravoid ^ [09:29:37] hi yurik [09:29:43] hey mark [09:29:45] what's your plan on properly solving that DfltLanguage issue? [09:30:04] re: the bug, looks very MW-related so I don't personally have an opinion [09:30:11] but I'm sure greg-g would like to see a BZ # [09:30:49] paravoid, i got a hangout msg from dan. Should i tell him you said he should file a bug first? 
:) [09:30:54] (03PS1) 10Adamw: cheap hack to qualify table with prefix [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86824 [09:31:02] yes you should [09:31:18] this isn't (just?) us you need to coordinate with [09:31:33] it is [09:31:47] pretty straightforward really - just a minor logic revert [09:32:21] hm, dan is no longer online, i will put a bug in [09:32:49] I think greg-g would like to track emergency deploys, but you should talk with him and I shouldn't make guesses [09:33:09] as far as the change goes, I don't care about the php logic there, do as you will [09:33:56] i hear you, will follow proper procedure and ask dan to do it. For this case - I will create bz bug, link to it, and deploy [09:34:02] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 09:33:59 UTC 2013 [09:34:06] thanks [09:34:09] so, on the dfltlang stuff [09:34:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:34:15] what's your plan with that? [09:34:20] sec, let me push it out, will spell it out in a sec [09:42:46] mark, before i push it out - do you know that there are ~600 fatal errors in fatalmonitor? [09:43:04] i don't [09:43:10] 552 PHP Fatal error: Invalid host name (docroot=/usr/local/apache/common/docroot/bits/), can't determine language.#012 in /usr/local/apache/common-local/multiversion/MWMultiVersion.php on line 351 [09:43:32] don't know how critical this is [09:43:35] just letting you know [09:43:40] it's not critical to me :) [09:43:42] yurik: https://gerrit.wikimedia.org/r/#/c/86819/1 [09:44:05] mutante, thx [09:44:06] it's a known issue from yesterday [09:44:16] does anyone sleep around here? [09:44:18] basically, something is hitting MW on bits [09:44:19] just curious [09:44:42] bits.wikimedia.org [09:44:46] grrr [09:44:55] it's BZ #54805, I pushed a fix to gerrit [09:45:22] http://bits.wikimedia.org/w/index.php [09:45:33] * MaxSem bites Reedy [09:46:25] ok, syncing. 
[09:46:30] * YuviPanda|away has started using PhpStorm, with Vim keybindings [09:46:42] YuviPanda|away, enjoy proper debugging :) [09:46:42] looks like a nice medium, except for the entire 'too slow to run on my old Air' problem [09:46:54] yurik: AutoComplete is nice, yes. [09:47:06] yurik: I'm on the trial version tho [09:47:10] YuviPanda, omg macs are imperfect? liar! [09:47:16] MaxSem: :P [09:47:27] YuviPanda, soon you will start fixing yellow warnings (they are kinda important, but ppl are lazy) [09:47:28] MaxSem: still trying to find what non-apple laptop to buy. no dice yet :( [09:47:31] also, marktraceur will kill you for betrayal of vim [09:47:45] MaxSem: I switched to Emacs like, a month ago [09:47:49] MaxSem: Emacs with Vim keybindings :P [09:48:03] !log yurik synchronized php-1.22wmf18/extensions/ZeroRatedMobileAccess/ [09:48:15] Logged the message, Master [09:49:13] (03CR) 10MaxSem: [C: 032] "Let's see if it will break all the consecutive tests:)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 (owner: 10Faidon Liambotis) [09:49:19] (03CR) 10jenkins-bot: [V: 04-1] bits: remove /w/ symlink, add a /w/404.php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 (owner: 10Faidon Liambotis) [09:49:52] (03CR) 10MaxSem: [V: 032] bits: remove /w/ symlink, add a /w/404.php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86819 (owner: 10Faidon Liambotis) [09:50:05] heh [09:50:08] max and his big hammer [09:50:16] !log yurik synchronized php-1.22wmf19/extensions/ZeroRatedMobileAccess/ [09:50:31] Logged the message, Master [09:50:32] * MaxSem hammers paravoid [09:51:31] mark, overall idea for dflt lang is to have some (preferably index.php) entry point that will redirect to the proper language wiki based on whatever carrier has defined as default, OR show red banner if its ZERO and no carrier, OR redirect to www.wikipedia.org if its m.wikipedia.org and no carrier [09:52:48] (03PS1) 10MaxSem: 
Quick test to see if tests still work [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86827 [09:52:58] now since i don't exactly know how this is done ( MaxSem showed me some magical script that uses meta page as raw HTML for the main page, but i will have to dig through it), I don't know how soon it will be implemented [09:53:02] paravoid, ^ [09:53:39] take that script apart and grab the interesting pieces? [09:54:10] (03Abandoned) 10MaxSem: Quick test to see if tests still work [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86827 (owner: 10MaxSem) [09:54:58] MaxSem, in order for me to do all that magic I will need to at least load zero config - which might or might not work without the entire framework [09:55:17] probably will but will need some testing [09:55:28] yurik: just make mediawiki handle m.wikipedia.org and zero. ? [09:55:31] and that's the real problem - no "full blown test env" :( [09:55:40] extract2.php has no problems with loading MW [09:56:17] MaxSem, cool [09:56:24] good to know [09:56:47] will try to implement it soon [09:58:18] only slightly unrelated, but where do you put MW code that's completely shared across all languages? [09:58:21] m.wikipedia.org [09:58:24] for example? [09:58:32] (or the proposed global shorturl service) [09:58:38] m.wikipedia.org currently gets redirected to wikipedia.org [09:58:48] which is really ugly and we should fix [09:58:48] !log maxsem synchronized docroot and w [09:58:59] Logged the message, Master [09:59:29] s/redirected/rewritten/ actually [09:59:34] hmm, so does http://bits.wikimedia.org/w/404.php mean that it works or not? :P [09:59:56] yurik: so will you just handle those URLs directly, without a varnish redirect? 
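The default-language entry point outlined at 09:51:31 (redirect to the carrier's default language wiki, show the red banner on zero with no carrier, or send m.wikipedia.org with no carrier to www.wikipedia.org) might be sketched roughly as below. This is only an illustration in Python, not the ZeroRatedMobileAccess code: the function name, the `carrier_defaults` mapping, the return shape, and the target URL pattern are all assumptions made for the example. The only detail taken from the log is that the carrier is identified by X-CS.

```python
from typing import Dict, Optional, Tuple


def route_default_language(
    host: str,
    x_cs: Optional[str],
    carrier_defaults: Dict[str, str],
) -> Tuple[str, str]:
    """Decide what a language-less entry point should do.

    host             -- requested host, e.g. "zero.wikipedia.org"
    x_cs             -- value of the X-CS carrier header, or None if unset
    carrier_defaults -- hypothetical mapping of X-CS value -> default language

    Returns an (action, target) pair; both values are illustrative.
    """
    # An unknown carrier ID is treated the same as a missing one (assumption).
    lang = carrier_defaults.get(x_cs) if x_cs else None

    if lang is not None:
        # Known carrier: go to its default language wiki.
        # The URL pattern here is an assumption for the sketch.
        site = "zero" if host == "zero.wikipedia.org" else "m"
        return ("redirect", "https://%s.%s.wikipedia.org/" % (lang, site))

    if host == "zero.wikipedia.org":
        # Zero without a carrier: show the red banner, do not redirect.
        return ("banner", "red")

    if host == "m.wikipedia.org":
        # Mobile without a carrier: fall back to the portal.
        return ("redirect", "https://www.wikipedia.org/")

    raise ValueError("unexpected host: %r" % host)


if __name__ == "__main__":
    defaults = {"250-99": "ru"}  # made-up carrier ID and language
    print(route_default_language("zero.wikipedia.org", None, defaults))
    print(route_default_language("m.wikipedia.org", "250-99", defaults))
```

Keeping the decision in one pure function like this would make the three cases testable without a full MediaWiki environment, which is the gap mentioned at 09:55:31.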
[10:00:14] yes, that's the general idea :)( [10:00:32] just need to figure out how www works and do the same here [10:00:34] so we can abandon that current patch [10:00:47] which seems like an ugly workaround [10:00:56] i'm talking about https://gerrit.wikimedia.org/r/#/c/86721/1 [10:01:14] mark, i think we can keep it and fix the final redirect target when needed. Dflts vars have to be removed regardless [10:01:24] i mean - don't merge it [10:01:41] i will fix update that patch later once i figure out how to do the whole thing in MW [10:01:45] I think we should fix it properly instead of changing it but not quite fixing it [10:01:56] ok, I'll -2 it now [10:02:01] MaxSem: it works [10:02:03] exactly [10:02:15] (03CR) 10Mark Bergsma: [C: 04-2] "This is just an ugly workaround instead of a real fix. Let's fix it properly." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86721 (owner: 10Yurik) [10:05:11] alright [10:05:13] have a nice flight [10:09:30] thx mark - its in 8 hrs.... need to pack i guess... will sleep on the plane [10:11:35] MaxSem, re weird meta request that timesout for all traffic -- http://meta.m.wikimedia.org/wiki/Special:RecordImpression?result=hide&reason=empty&country=US&uselang=en&project=wikipedia&db=enwiki&bucket=0&anonymous=true&device=unknown [10:11:42] http://meta.m.wikimedia.org/wiki/Special:BannerRandom?uselang=en&sitename=Wikipedia&project=wikipedia&anonymous=true&bucket=0&country=US&device=unknown&slot=15 [10:11:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:12:40] the first link gets accessed twice - once for meta.m.wikimedia.org and once meta.wikimedia.org; m - gets 200, meta gets 302\ [10:13:18] MaxSem, do you see that in your browser when you simulate moblie device UA? [10:13:28] Gotta love bugs in software, eh? 
[10:16:50] mhm, don't see it in exception.log - appears to be handled [10:23:28] MaxSem, i'm just looking at the networking tab of the FF's firebug [10:23:54] it makes the site spend extra 10 seconds loading it seems [10:25:09] ask mwalker|away [10:26:25] heh, was just a heads up. I'm more interested to know why zero doesn't show banner for the new carrier i just created :( [10:30:11] (03PS3) 10Hashar: contint::localvhost easily setup an apache listener [operations/puppet] - 10https://gerrit.wikimedia.org/r/86665 [10:31:32] mark: when you can, I have made Apache 'Listen …' statements to be written in a different conf file https://gerrit.wikimedia.org/r/#/c/86665/ That is really ugly though :( [10:34:25] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 10:34:23 UTC 2013 [10:34:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:48:03] hashar: is it considered ugly to depend on a different module when writting a module? [10:48:15] not necessarily [10:48:55] hashar: i'm working to split the packages in : modules/contint/manifests/packages.pp [10:49:38] but we have the java module, which takes care of most of the java related packages, my question is can I call it? or the packages should be independent? [10:49:48] (03CR) 10Mark Bergsma: "Meh, until we have a decent Apache module it's all ugly." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86665 (owner: 10Hashar) [10:53:51] (03CR) 10Mark Bergsma: [C: 032] "Meh, until we have a decent Apache module it's all ugly." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86665 (owner: 10Hashar) [10:58:47] matanya: yeah you can call that [11:01:52] it needs a rewrite, but it's fine for using it now [11:02:10] ottomata: btw, how's openjdk? 
:-) [11:03:04] (03CR) 10Mark Bergsma: [C: 032] Normalize and Vary on the forceHTTPS cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/86268 (owner: 10Mark Bergsma) [11:04:25] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 11:04:20 UTC 2013 [11:04:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:06:22] (03PS6) 10Hashar: contint: puppet class to setup browsertests slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/85264 [11:11:48] hashar: so using "imort java" and then calling java { 'java-6-openjdk': distribution => 'openjdk' version => 6, alternative => false } java { 'java-7-openjdk': distribution => 'openjdk' version => 7, alternative => false } should work, correct? [11:12:35] * matanya is driving hashar crazy today :| [11:16:16] (03PS1) 10Hashar: contint::localvhost is really a define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86831 [11:16:34] (03CR) 10Hashar: [C: 031] contint::localvhost is really a define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86831 (owner: 10Hashar) [11:16:40] .. 
[11:17:02] mark: I have created a Class when I wanted a define :( https://gerrit.wikimedia.org/r/86831 [11:17:06] Class -> define [11:17:20] had Invalid resource type contint::localvhost at modules/contint/manifests/website.pp:105 [11:20:05] but two weeks haven't passed yet since you submitted this patch, eh [11:20:10] so I can't possibly merge that yet [11:21:03] (03CR) 10Mark Bergsma: [C: 032 V: 032] contint::localvhost is really a define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86831 (owner: 10Hashar) [11:21:10] :D [11:24:38] I should have done that patch differntly [11:25:47] (03PS1) 10Dzahn: add nrpe to node zirconium so we can monitor processes like etherpad [operations/puppet] - 10https://gerrit.wikimedia.org/r/86833 [11:25:48] (03PS1) 10Dzahn: use clientaddrUsesPort and a high port to send out outgoing SNMP requests per man snmp.conf so that non-root users can bind to it and we avoid getting SNMP permission denied errors when icinga runs these and we send the client IP since I9c7b1c11aeef2ff984 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 [11:28:09] ArgumentError: comparison of Hash with Hash failed [11:28:10] niceeee [11:28:20] what? [11:28:21] awwww, Java :P [11:28:24] why didn't you test this in labs? [11:28:41] irb(main):003:0> [ { 'f' => 1 }, { 'b' => 3 } ].sort [11:28:42] ArgumentError: comparison of Hash with Hash failed [11:28:54] awww, Ruby! 
:P [11:28:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [11:29:06] mark: Iwas too lazy to setup fake SSL certificates on a labs instance to get apache to start [11:29:22] mark: and randomly flipping a coin because of GlusterFS is not that fun anymore :-] [11:30:34] I should rewrite that Listen madness anyway and use a basic template such as: [11:30:35] Listen 127.0.0.1:<%= port %-> http [11:30:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 10.262 second response time [11:30:35] Listen [::1]:<%= port %-> http [11:32:07] (03PS2) 10Dzahn: add :0 as port when sending outgoing SNMP requests [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 [11:33:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [11:36:35] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 11:36:28 UTC 2013 [11:36:40] (03PS3) 10Dzahn: add :0 as port when sending outgoing SNMP requests [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 [11:36:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:36:46] (03PS1) 10Hashar: contint: simplify apache listen template [operations/puppet] - 10https://gerrit.wikimedia.org/r/86836 [11:38:00] mark: over simplified my soup with https://gerrit.wikimedia.org/r/86836 :D [11:38:45] (03CR) 10Mark Bergsma: [C: 032] contint: simplify apache listen template [operations/puppet] - 10https://gerrit.wikimedia.org/r/86836 (owner: 10Hashar) [11:38:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 25.592 second response time [11:38:56] sorry :( [11:39:01] was too tired yesterday I guess [11:40:42] (03CR) 10ArielGlenn: [C: 031] "this is due to a bug in the client code (which shuld not try to bind to port 161), e.g. 
https://groups.google.com/forum/#!topic/mailing.un" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 (owner: 10Dzahn) [11:50:38] (03CR) 10Dzahn: [C: 032] add :0 as port when sending outgoing SNMP requests [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 (owner: 10Dzahn) [11:54:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [11:55:19] (03CR) 10Dzahn: "fixed PDU checks . https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?hostgroup=pdus&style=detail&nostatusheader" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86834 (owner: 10Dzahn) [11:56:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 27.050 second response time [11:59:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [12:00:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 28.404 second response time [12:06:45] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 12:06:36 UTC 2013 [12:07:30] (03Abandoned) 10Dzahn: add nrpe to node zirconium so we can monitor processes like etherpad [operations/puppet] - 10https://gerrit.wikimedia.org/r/86833 (owner: 10Dzahn) [12:07:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:07:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [12:08:02] (03CR) 10Hashar: [C: 04-1] "(11 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/61767 (owner: 10Physikerwelt) [12:08:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.746 second response time [12:14:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [12:15:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output 
matched 400 - 336 bytes in 17.612 second response time [12:33:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [12:38:05] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 12:37:59 UTC 2013 [12:38:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:41:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 24.620 second response time [12:45:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [12:46:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 24.083 second response time [12:48:06] Are dzahn, lcarr or rhalsell around? [12:49:26] the first one would mutante [12:49:45] LeslieCarr and RobH respectively, although they're SF-based [12:51:32] got him in pm [13:05:55] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 13:05:52 UTC 2013 [13:06:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [13:10:28] (03PS1) 10Faidon Liambotis: varnish mobile: don't override MW's X-Analytics [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 [13:10:39] mark: ^ [13:11:30] (03PS1) 10Matanya: Nrpe: /usr/lib/nagios/plugins/check_dpkg should be absent everywhere. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86847 [13:12:12] !log start xtrabackup s3 db1035 to db1038 [13:12:23] Logged the message, Master [13:14:51] (03CR) 10Mark Bergsma: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 (owner: 10Faidon Liambotis) [13:15:35] is the TTL 30 days? 
[13:15:55] I think so [13:17:11] hiyalls [13:17:14] https://gerrit.wikimedia.org/r/#/c/86746/ [13:17:20] mmmm need breakfast [13:18:39] (03PS2) 10Faidon Liambotis: varnish mobile: don't override MW's X-Analytics [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 [13:20:14] (03CR) 10Faidon Liambotis: [C: 032] varnish mobile: don't override MW's X-Analytics [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 (owner: 10Faidon Liambotis) [13:21:25] MaxSem: Again, I made a patch to only enable PHP for /w/404.php, not for the whole w dir :p [13:27:20] (03CR) 10Hashar: [C: 031] Shell access for Bryan Davis. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86764 (owner: 10BryanDavis) [13:29:51] (03CR) 10Dzahn: "wait, the check command is "check_dpkg" via NRPE , nrpe_check_dpkg and that is /usr/local/lib/nagios/plugins/check_dpkg. having it in /us" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86847 (owner: 10Matanya) [13:31:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [13:32:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 11.020 second response time [13:33:42] (03CR) 10Matanya: "Seems so. what I understood from it is was in an incorrect path. which should have been absent. looks like it is gone anyway." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86847 (owner: 10Matanya) [13:35:18] (03CR) 10Akosiaris: [C: 032] "Yes that was the original intent. nrpe/nagios plugins shipped by us should not be in /usr/etcetcetc but in /usr/local/etcetcetc. I meant t" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86847 (owner: 10Matanya) [13:36:12] (03CR) 10Mark Bergsma: "text-varnish is text, served by Varnish, as opposed to Squid. We're currently migrating them one project at a time." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 (owner: 10Ottomata) [13:36:15] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 13:36:07 UTC 2013 [13:36:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [13:36:56] great, thanks mark, paravoid and I were talking about that yesterday and weren't sure [13:37:03] I am sure of what it is [13:37:14] I just said it doesn't matter where you'll put them now [13:37:22] (03CR) 10Dzahn: [C: 031] "key like on office page, added by User:BDavis (WMF). good point about the next UID" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86764 (owner: 10BryanDavis) [13:38:00] (03PS1) 10Akosiaris: Adding bacula module rspec tests [operations/puppet] - 10https://gerrit.wikimedia.org/r/86850 [13:40:20] (03CR) 10Akosiaris: [C: 032] Adding bacula module rspec tests [operations/puppet] - 10https://gerrit.wikimedia.org/r/86850 (owner: 10Akosiaris) [13:42:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [13:44:59] (03PS1) 10Akosiaris: Use generic Rakefile in nrpe module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86852 [13:50:26] paravoid, ja i know [13:50:33] but mark has an opinion on that :p [13:50:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.145 second response time [13:51:52] hm, mark, I think I need a change to the analytics network ACL [13:52:19] i'm want to consume that kafka data into hdfs, and then use hive to check for missing sequence numbers [13:52:31] so the hadoop nodes need to be able to talk to the kafka brokers [13:52:55] but since the kafka brokers now have public IPs, that looks like the analytics cluster trying to initiate a connection to something outside of the analytics network [13:53:06] which the ACL doesn't allow [14:00:02] (03CR) 10Andrew Bogott: [C: 032] Remove lucid umask setting 
[operations/puppet] - https://gerrit.wikimedia.org/r/86816 (owner: Matanya) [14:01:53] (PS3) Ottomata: Adding ulsfo as valid site for lvs_services:text-varnish, https, ipv6, bits, upload and mobile. [operations/puppet] - https://gerrit.wikimedia.org/r/86746 [14:04:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [14:07:15] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 14:07:06 UTC 2013 [14:07:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:08:22] (PS1) Matanya: (bug 38946) : Added culmus-fancy font to help render svg [operations/puppet] - https://gerrit.wikimedia.org/r/86855 [14:09:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 28.325 second response time [14:12:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [14:13:02] thanks andrewbogott [14:22:26] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 934 statistics [14:23:26] RECOVERY - MySQL Processlist on db1021 is OK: OK 0 unauthenticated, 0 locked, 3 copy to table, 0 statistics [14:24:02] I got yet another patch for continuous integration to let me setup a slave in labs + a vhost to install mediawiki in :-D https://gerrit.wikimedia.org/r/#/c/85264 [14:24:25] will not impact anything [14:24:45] PROBLEM - MySQL Recent Restart on db1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:24:54] hashar: reading. [14:25:35] RECOVERY - MySQL Recent Restart on db1021 is OK: OK 329 seconds since restart [14:25:48] that is a bunch of glue [14:27:35] Who can/should I talk to about getting added to the beta project in Labs? [14:27:47] i still haven't looked at the deployment-prep-master / deployment-prep-puppetclient you have setup for beta :/ [14:27:55] bd808: what is your labs account?
Will add [14:28:08] hashar: BryanDavis [14:29:33] bd808: should be good now [14:29:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 16.628 second response time [14:29:59] hashar, does class { 'contint::browsertests' actually do anything? [14:30:42] andrewbogott: it just install some dependencies and set up an apache vhost 'localhost' listening on 127.0.0.1:9413 [14:31:01] andrewbogott: Jenkins will put MediaWiki + VisualEditor there then use phantomJS to run some browser tests against localhost:9413 [14:31:05] hashar: Thanks. I got logged into deployment-bastion on the first try. [14:31:16] andrewbogott so contint::browser tests merely provided dependencies [14:31:23] bd808: have sudo ? [14:31:50] hashar: Except… it doesn't? Looks to me like it just requires a file which was already included [14:31:53] hashar: looks like it. `sudo ls /etc` worked [14:32:26] andrewbogott: https://gerrit.wikimedia.org/r/#/c/85264/6/modules/contint/manifests/browsertests.pp,unified ? [14:32:39] andrewbogott: that installs a few packages + the ghost? 
[14:32:42] vhost [14:32:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [14:33:16] hashar, Oh, my mistake, was misunderstanding puppet syntax briefly :( [14:33:34] confused declare w/define [14:33:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 17.729 second response time [14:34:06] (03PS2) 10Akosiaris: Use generic Rakefile in nrpe module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86852 [14:34:15] (03CR) 10Akosiaris: [V: 032] Use generic Rakefile in nrpe module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86852 (owner: 10Akosiaris) [14:34:28] andrewbogott: I keep doing the same mistake :( [14:34:31] (03CR) 10Andrew Bogott: [C: 032] contint: puppet class to setup browsertests slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/85264 (owner: 10Hashar) [14:35:05] andrewbogott: thx! [14:35:15] (03CR) 10Akosiaris: [C: 032] Use generic Rakefile in nrpe module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86852 (owner: 10Akosiaris) [14:35:20] hashar, does this mean that jenkins already has the ability to spin up labs instances and run tests? [14:35:25] akosiaris: yeah that is the other checkbox [14:35:35] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 14:35:30 UTC 2013 [14:35:44] andrewbogott: na not all. I just build a proof of concept instance to write all the needed wrappers [14:35:44] hashar: yeah... i noticed ... [14:35:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:35:51] aw :( [14:36:03] there are times i really hate gerrit... [14:36:06] Reedy, again, not exposing what's not needed is better [14:36:12] ;):P [14:36:57] andrewbogott: so as a first step, the browser tests jobs will run on a single instance. 
Then I will get in touch with Ryan/ labs folks to get a jenkins-slave (or whatever name) project and look at querying the openstack api to bootstrap instances. I haven't filled bugs for that though [14:37:29] akosiaris: I got a shell function to approve :-] [14:38:18] akosiaris: https://github.com/hashar/alix/blob/master/shell_functions#L14 [14:38:26] akosiaris: so I can: gerrit approve --code-review +2 12345,1 [14:39:39] hashar: yeah i do that too every now and then. Maybe i should do it more [14:40:23] could even be an alias +2=... [14:41:25] I am cursed by puppet [14:45:45] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: No successful Puppet run in the last 10 hours [14:45:45] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [14:46:29] (03PS1) 10Hashar: contint: ruby1.9 -> ruby1.9.3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86860 [14:52:45] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: No successful Puppet run in the last 10 hours [14:52:45] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: No successful Puppet run in the last 10 hours [14:53:19] (03PS2) 10Hashar: contint: fix browsertests dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/86860 [14:53:42] andrewbogott: last one to fix up some packages and I should be set for this morning. Sorry! https://gerrit.wikimedia.org/r/86860 [14:54:01] (03CR) 10RobLa: [C: 031] Shell access for Bryan Davis. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86764 (owner: 10BryanDavis) [14:54:36] ping mark? [14:54:43] yes? [14:54:52] ok if i merge that recent change? 
[14:54:57] (03CR) 10Andrew Bogott: [C: 032] contint: fix browsertests dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/86860 (owner: 10Hashar) [14:55:03] i'll check once more [14:55:05] not sure how much self review you want me to do here, its a relatively simple one i think [14:55:08] https://gerrit.wikimedia.org/r/#/c/86746/ [14:55:09] k danke [14:55:27] ah one weird indent, fixing [14:55:41] also [14:55:50] you removed all ips from "text" which is fine [14:55:52] but the structure needs to exist [14:55:56] so an empty hash in this case [14:56:07] (or at least, puppet may barf on that) [14:56:14] hm ok, [14:56:28] oh does every section need an ulsfo key then? [14:57:05] hmm, some don't have esams [14:57:07] (03CR) 10Mark Bergsma: "(2 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 (owner: 10Ottomata) [14:57:35] it may or may not work without, I didn't check in detail [14:57:45] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:45] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: No successful Puppet run in the last 10 hours [14:57:45] but if pybal's config tries to look for that site in that hash, it will currently fail [14:57:51] best to avoid it and just make it empty [14:58:25] ok, but many of the other services are missing some site keys [14:58:27] so it might be ok [14:58:32] (03PS1) 10Manybubbles: Enable CirrusSearch extra splitting for mw.org. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 [14:58:35] then again, I am getting a puppet error right now anyway [14:58:45] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:13] oh, mark, I added a whole ulsfo section for ipv6 [14:59:18] 'ulsfo' => { [14:59:18] 'wikimedialb6' => "2620:0:863:ed1a::0", [14:59:18] ... [14:59:22] should I remove that and make it empty too? 
[14:59:44] yeah [14:59:47] ulsfo is not going to need that [14:59:51] k [14:59:54] ulsfo is gonna be our first site without any squid [14:59:59] so it's a bit different because of that [15:01:18] aye [15:01:23] (03PS4) 10Ottomata: Adding ulsfo as valid site for lvs_services:text-varnish, bits, upload and mobile. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 [15:01:28] ulsfo? is that a cluster like pmtpa, eqiad and esams? [15:01:35] yup [15:01:44] akosiaris: andrewbogott: should we have a puppet/rpsec check in or is there nothing to say? [15:01:45] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: No successful Puppet run in the last 10 hours [15:01:48] what's special about it that it won't have any squids? [15:01:52] shiny new SF caching DC [15:02:02] Squids suck [15:02:12] they will die everywhere [15:02:14] (03PS5) 10Ottomata: Adding ulsfo as valid site for lvs_services:text-varnish, bits, upload and mobile. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 [15:02:14] hashar, I don't have anything new to report, but it sounds like you might :) [15:02:25] not really [15:02:25] so... varnish instead MaxSem? [15:02:33] yup [15:02:42] already live on some wikis [15:02:49] andrewbogott: I haven't looked at rspec at all :/ [15:02:49] oh [15:02:52] (if not reverted again:P) [15:02:55] ottomata: sorry, just realised something [15:02:59] ja? [15:03:03] for ulsfo, in text-varnish, you'll also need to mention the ipv6 ips... [15:03:03] ok, we can skip [15:03:07] which is dfferent from the other sites [15:03:14] andrewbogott: same day/ time next week so? [15:03:14] because there the "ipv6" section is taking care of that (for now) [15:03:18] oh because the same hosts will answer for them [15:03:19] yep [15:03:21] MaxSem, any idea why sfo was chosen as the location? it's not because that's where wmf's offices are is it? 
[15:03:27] but ulsfo text-varnish section should be similar to say, bits [15:03:39] do I look like an ops?:P [15:03:45] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: No successful Puppet run in the last 10 hours [15:03:46] sorry, this is all a bit weird because it's in migration [15:03:46] andrewbogott: I am attending a conference friday with some openstack folks. Will talk a bit about the crazy idea of spawning instances out of an instance. :-) [15:04:07] oh, hm [15:04:11] s'ok hm [15:04:45] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:05:16] so, mark, add something like 'wikimedialb6' => 'ipv6::addy', for each project in text-varnish? [15:05:48] basically, the same configs I removed from the ipv6 section, just put in it text-varnish? [15:06:45] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: No successful Puppet run in the last 10 hours [15:06:46] hmm [15:06:50] not sure that works [15:06:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:06:55] you know what, let's wait with that [15:06:57] we can change it lateer [15:07:01] let's first get this working [15:07:25] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 15:07:15 UTC 2013 [15:07:32] (03CR) 10Chad: [C: 04-1] "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 (owner: 10Manybubbles) [15:07:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:07:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 28.779 second response time [15:08:21] hmmm, ok [15:08:54] so, mark, is it ok as in then? with the ipv6 addies removed alltoegher for now? 
[15:09:05] yes [15:09:15] we'll fix issues as they come up [15:09:23] ok, going to merge, hoping that this change will fix my puppet error [15:09:34] (03PS6) 10Ottomata: Adding ulsfo as valid site for lvs_services:text-varnish, bits, upload and mobile. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 [15:09:49] (03CR) 10Ottomata: [C: 032 V: 032] "Reviewed with Mark in IRC." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86746 (owner: 10Ottomata) [15:10:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:12:27] (03PS2) 10Manybubbles: Enable CirrusSearch extra splitting for mw.org. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 [15:12:48] (03CR) 10Manybubbles: "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 (owner: 10Manybubbles) [15:13:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 22.476 second response time [15:13:53] (03CR) 10Manybubbles: "I figured it was possible, but not stylistically ok." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 (owner: 10Manybubbles) [15:15:20] growl, still puppet error [15:15:33] mark, have you hit this one before? [15:15:34] pybal/pybal.conf.erb: undefined method `include?' for :undef:Symbol at /etc/puppet/modules/pybal/manifests/configuration.pp:10 [15:15:36] (03CR) 10Chad: [C: 032] Enable CirrusSearch extra splitting for mw.org. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 (owner: 10Manybubbles) [15:15:46] (03Merged) 10jenkins-bot: Enable CirrusSearch extra splitting for mw.org. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86862 (owner: 10Manybubbles) [15:15:52] probably ;) [15:15:54] let's see [15:16:31] i was tracing the configs with my mind powers yesterday, everything seemed in place [15:16:47] there's just one line with include? 
in pybal.conf.erb [15:16:49] so it must be one of those [15:16:52] !log demon synchronized wmf-config/InitialiseSettings.php 'Enable CirrusSearch extra splitting for mw.org.' [15:16:54] <^d> manybubbles: ^ [15:17:06] Logged the message, Master [15:17:13] so either the 'sites' or the classes doesn't have something [15:17:28] and the answer is... [15:17:31] classes is undef [15:17:41] $lvs_class_hosts [15:17:42] doesn't handle ulsfo [15:18:03] so you need to add cases for ulsfo, which host handles which "class" [15:18:15] lvs4001 and 4003 will do high-traffic1 [15:18:20] and 4002 and 4004 will do high-traffic4 [15:18:22] similar to esams really [15:18:23] oh oh oh [15:18:23] got it [15:18:31] yes [15:18:34] k [15:18:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:20:32] you probably don't need to worry about the https class [15:20:36] aye [15:20:40] as you're not including ulsfo in its "sites" [15:20:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 24.027 second response time [15:21:04] (03PS1) 10Ottomata: Specifying ulsfo lvs hosts for high-traffic1 and high-traffic2 in $lvs_class_hosts. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86863 [15:21:12] mark ^ [15:22:04] (03CR) 10Mark Bergsma: [C: 031] Specifying ulsfo lvs hosts for high-traffic1 and high-traffic2 in $lvs_class_hosts. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86863 (owner: 10Ottomata) [15:22:05] (03PS1) 10Hashar: contint: Apache could not start without IPv6 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86864 [15:22:59] !log dist-upgrade and reboot loudon [15:23:01] (03CR) 10Ottomata: [C: 032 V: 032] Specifying ulsfo lvs hosts for high-traffic1 and high-traffic2 in $lvs_class_hosts. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86863 (owner: 10Ottomata) [15:23:09] Logged the message, Master [15:23:26] ah whaa [15:23:40] accidentally committed that lb6 stuff i worked on before you said not to add it for now [15:23:41] removing. [15:23:48] i saw that [15:23:53] and thought "what the heck, probably works" ;) [15:23:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:24:29] hahah [15:24:29] (03PS1) 10Ottomata: Removing accidental commit of *lb6 IPs for text-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/86865 [15:24:32] i'll remove it for now [15:24:50] (03CR) 10Ottomata: [C: 032 V: 032] Removing accidental commit of *lb6 IPs for text-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/86865 (owner: 10Ottomata) [15:28:24] (03CR) 10Andrew Bogott: [C: 032] contint: Apache could not start without IPv6 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86864 (owner: 10Hashar) [15:29:14] mark, while I've gotcha: [15:29:15] https://rt.wikimedia.org/Ticket/Display.html?id=5878 [15:29:58] got more specifics though? tcp port nrs? [15:31:37] just 9092 [15:33:12] ticket updated [15:33:22] + term kafka { [15:33:23] + from { [15:33:23] + destination-address { [15:33:23] + 208.80.154.160/31; [15:33:23] + } [15:33:23] + protocol tcp; [15:33:25] + destination-port 9092; [15:33:27] + } [15:33:29] + then accept; [15:33:33] + } [15:34:11] cool [15:34:50] when we start using the prod kafka brokers, we'll have to add their IPs as well [15:34:57] or maybe I'll just move the IPs from the test brokers, [15:35:02] thanks [15:35:08] ag! 
puppet finally ran [15:35:12] still same puppet template error [15:40:16] (03CR) 10Ryan Lane: [C: 032] Add .deploy to a system-wide ignore file [operations/puppet] - 10https://gerrit.wikimedia.org/r/86749 (owner: 10Ryan Lane) [15:41:05] mark, the error reads like the :undef:Symbol is lvs_class_hosts[service['class']] [15:41:42] and service['class'] is each of the classes in $lvs_services [15:42:00] so it would be undef if there was a class in $lvs_services that wasn't in $lvs_class-hosts... [15:42:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 21.601 second response time [15:52:28] (03PS1) 10Ottomata: Fixing lvs4002,4 lvs_balancer_ip assignment [operations/puppet] - 10https://gerrit.wikimedia.org/r/86866 [15:52:43] (03PS2) 10Ottomata: Fixing lvs4002,4 lvs_balancer_ip assignment [operations/puppet] - 10https://gerrit.wikimedia.org/r/86866 [15:52:48] (03CR) 10Ottomata: [C: 032 V: 032] Fixing lvs4002,4 lvs_balancer_ip assignment [operations/puppet] - 10https://gerrit.wikimedia.org/r/86866 (owner: 10Ottomata) [15:58:13] (03CR) 10QChris: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 (owner: 10Faidon Liambotis) [15:59:37] (03CR) 10Faidon Liambotis: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 (owner: 10Faidon Liambotis) [15:59:46] :) [16:02:13] (03CR) 10Ryan Lane: [C: 032] Add sartoris user with optional secondary groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/86756 (owner: 10Ryan Lane) [16:06:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [16:07:25] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 16:07:21 UTC 2013 [16:07:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:08:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 27.345 second response 
time [16:31:47] (CR) Katie Horn: "Hashar: Yes: We absolutely have a critical jenkins server in fundraising. We don't use it for testing, though: Rather, we treat it like cr" [operations/puppet] - https://gerrit.wikimedia.org/r/86818 (owner: Matanya) [16:34:35] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 16:34:25 UTC 2013 [16:34:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:46:49] (CR) Hashar: "The build timeout should be there. We are using it for CI, albeit set at 6 hours per default :-)" [operations/puppet] - https://gerrit.wikimedia.org/r/86818 (owner: Matanya) [17:04:15] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 17:04:10 UTC 2013 [17:04:45] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:10:54] (CR) Physikerwelt: "(9 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/61767 (owner: Physikerwelt) [17:24:28] (PS9) Physikerwelt: Initial version of puppet script for LaTeXML [operations/puppet] - https://gerrit.wikimedia.org/r/61767 [17:24:34] (CR) jenkins-bot: [V: -1] Initial version of puppet script for LaTeXML [operations/puppet] - https://gerrit.wikimedia.org/r/61767 (owner: Physikerwelt) [17:25:03] (PS10) Physikerwelt: Initial version of puppet script for LaTeXML [operations/puppet] - https://gerrit.wikimedia.org/r/61767 [17:25:07] Jeff_Green: hi there, have some time to explain short questions in mail.pp? [17:25:14] sure [17:25:24] physikerwelt: hey! Thanks for the edits on the labsvagrant pages! I answered and cleaned them up a little [17:26:34] (PS11) Physikerwelt: Initial version of puppet script for LaTeXML [operations/puppet] - https://gerrit.wikimedia.org/r/61767 [17:26:56] Jeff_Green: 1 ) so, i'm looking at the file, I see in line 12 you use exec instead of file. why is that?
[17:27:49] and 2) in line 31 you use exec instead of user, saying puppet can't manage groups? didn't get that part [17:28:05] YuviPanda: thank you [17:28:24] matanya: line 12 predates my work on the file. I'm not sure why it is done as an exec [17:28:46] I'll change that then [17:28:51] it may be a hack to enforce precedence [17:29:04] but it is supported afaik [17:29:34] git blame shows andrewbogott added that block--maybe he knows? [17:30:15] Oh now I understand what labs-vagrant is [17:30:31] didn't think of that, thanks. what about 2 Jeff_Green ? [17:31:26] matanya: same, but I've had mixed results with puppet administering existing users so nothing would surprise me there [17:31:48] YuviPanda: BTW, did you have a chance to look at https://gerrit.wikimedia.org/r/#/c/86420/ ? [17:32:18] andrewbogott: ? [17:32:22] looking... [17:33:50] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 17:33:47 UTC 2013 [17:34:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:34:52] physikerwelt: no, not yet. [17:34:55] physikerwelt: let me check [17:35:18] matanya: I don't think I actually wrote those lines, although I did muddy the waters with some messy commits. [17:35:49] I agree that file/user seem like better ways to handle those bits. Mutante may have opinions as well... [17:36:32] matanya, if you want to change those sections and are able to run a valid test to make sure they work… I think that would be welcome.
[17:37:49] ok, I will andrewbogott thanks a lot [17:43:04] (03CR) 10QChris: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86846 (owner: 10Faidon Liambotis) [17:51:48] hey paravoid, I'm looking at building a package for jq: [17:51:49] http://stedolan.github.io/jq/download/ [17:52:11] I just built one from github master, but then noticed that it is actually already in debian, [17:52:12] http://packages.debian.org/search?keywords=jq [17:52:22] yes I was about to say this [17:52:27] should I use mine, or backport (?is that the correct term?) [17:52:32] backport [17:53:50] should I use later version from github, or just keep the package version as is? [17:53:58] I don't know that I need anything from later version [17:54:42] he has a source tarball for jq 1.3 on his download page [17:55:03] i could dl that, add the backported debian/ dir and just build [17:55:18] should I commit this to a repo? or just build and add to apt? (i'm guessing commit to repo?) [17:56:03] sec. [17:58:38] ok, so 1.3 seems to have been released back in May, so I filed a wishlist bug report to Debian [17:58:51] I'd say use the Debian version unless you have a reason not to [17:58:56] but I don't mind either way [17:59:10] ok, that's fine with me, if I do that, do I need to commit to a gerrit repo? [17:59:15] no [17:59:17] if I am just downloading and building? [17:59:31] download, change the version to 1.2-8~precise1 and build [17:59:34] ok so: dl, add debian/ dir, update changelog, build, and add to apt [17:59:35] cool [17:59:39] yep [17:59:44] add debian/ dir? [17:59:49] for 1.3 you mean? [18:00:46] for 1.2-8, just apt-get source jq; cd jq*; dch -v 1.2-8~precise1; dpkg-buildpackage -uc -us -sa should be enough [18:00:53] well, or pbuilder :) [18:01:00] ooooooooo [18:01:31] how do I tell it to get the source of a package for an older debian version? [18:01:39] andrewbogott: I think removing it completely make more sense to me [18:01:39] ? 
[18:02:09] i'm running precise and I can't find jq via the usual apt commands I run, which is why I didin't realize it already existed in debian in the first place [18:02:13] E: Unable to find a source package for jq [18:02:25] oh [18:02:40] matanya, the directory? [18:02:49] echo "deb-src http://ftp.us.debian.org/debian/ sid main" >> /etc/apt/sources.list [18:02:51] the first exec [18:02:53] apt-get update [18:03:06] ah k [18:03:17] matanya, I think we still need to create that dir since it's used as a mount point [18:03:18] andrewbogott: It is called later on, so I think i should have been removed [18:03:43] i agree, i removed the exec and placed the file ensure instaed [18:03:44] Oh, I see what you mean. [18:03:45] Hm. [18:04:07] Make sure you move the 'require' into the place where it's actually created... [18:04:19] and then add dzahn/mutante as a reviewer for that patch, he may know something that we don't. [18:04:26] New version specified (1.2-8~precise1) is less than [18:04:26] the current version number (1.2-8)! [18:04:28] ottomata: alternatively, you can just dget from packages.d.o [18:04:31] that ok or do we want something else? [18:04:32] paravoid? [18:04:34] that's okay [18:04:37] that's on purpose [18:04:44] I did add require and i'll add them, thanks [18:04:53] ah so if they backport 1.2 into precise [18:04:57] so, saucy has also 1.2-8, since it synced from Debian [18:05:00] there's will override [18:05:01] cool [18:05:03] if you were to upgrade a precise box to saucy [18:05:06] dzahn == mutante, I don' t recall which name he uses on gerrit :) [18:05:16] you want it to upgrade to saucy's version [18:05:26] as it might have been built with a newer library or whatever [18:10:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:21:08] paravoid, does this belong in universe or main? [18:21:10] (03PS1) 10Matanya: Repalce exec calls with file and user. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86889 [18:21:12] universe, right? since it is a backport? [18:21:18] oh sorry [18:21:18] yes [18:21:24] reprepro wikitech doc says that right in front of my face [18:21:52] !log backported jq_1.2-8 to precise and imported into apt [18:22:09] Logged the message, Master [18:23:08] ok, pushed. thanks a lot andrewbogott [18:24:23] (03PS1) 10RobH: cleaning out old and reclaimed hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/86891 [18:24:46] that file is still too big with outdated cruft, but tiny changes are safer changes... [18:25:22] (unless we are counting the great caching redirection fiasco of 2008, in which case a tiny change of : rather than ; can crash the site) [18:25:35] (my finest hour) [18:26:05] (03CR) 10RobH: [C: 032] cleaning out old and reclaimed hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/86891 (owner: 10RobH) [18:26:38] RobH: crashing the site is the dream of any op [18:27:07] you aren't a full member of the ops team until you cause significant downtime. [18:27:26] Which, incidentally, really cannot be true any longer I don't think. [18:27:43] As the number of new opsen who have not crashed the site outnumbers the old opsen who have. [18:27:56] (new to us, not new to sysadmin) [18:28:59] interesting, that sounds like unreasonable expectations of your new employees. I'll bring this up with HR... [18:29:15] that makes sense to me [18:29:30] at my work place we have some similar requirement [18:29:34] "Gayle, please have Ops make the site less reliable so more Ops members can make it crash and feel like real Opsen" [18:29:59] if a sysadmin didn't cause any harm, he isn't working hard enough [18:30:01] expectation, not requirement! [18:30:08] though i like matanya's answer. [18:30:17] RobH: second class citizen causing! [18:30:17] progress comes at the cost of uptime! 
;] [18:31:19] I have made one critical mistake at my previous work place, rm -f in /root instead in my home [18:31:45] did su and didn't notice i changed home dirs [18:32:14] but since then, over 2.5 years, nothing worth telling :/ [18:32:20] I should work harder [18:33:21] (03PS1) 10Ottomata: Adding kafka::udp2log::relay define to consume from Kafka and send to udp2log. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86894 [18:34:12] (03CR) 10Ottomata: "This probably shouldn't be a role, but I'm not sure where I should put it. Thoughts?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86894 (owner: 10Ottomata) [18:35:13] (03PS2) 10Ottomata: Adding kafka::udp2log::relay define to consume from Kafka and send to udp2log. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86894 [18:36:19] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 18:36:12 UTC 2013 [18:36:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:51:19] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [18:52:19] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [19:04:30] Where would I look in ops-puppet to find the ip address of the UDP packet relay that sends HTCP purge messages to AMS? [19:05:12] so you wouldn't right now ;) https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging [19:05:26] the relay box itself should be puppetized [19:05:32] thoguh did you mean the actual multicast address ? [19:05:39] the relay transforms it from multicast to unicast [19:06:09] LeslieCarr: Thanks. Ummm... I want to figure out how to send an HTCP datagram that only goes to AMS. [19:06:49] Working on https://bugzilla.wikimedia.org/show_bug.cgi?id=54647 [19:07:03] that's not really how it works… you may be able to just directly send it to hooft on the correct port and see if that makes it into a multicast stream ? 
[19:08:02] i have to run to lunch though … i can check this out when i get back... [19:08:31] hrm [19:08:34] LeslieCarr: Cool. No big rush as patch isn't approved yet (and I have to wait til tomorrow to get shell access). [19:11:20] <^d> LeslieCarr: I can talk lvs after lunch if that works for you. [19:14:29] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 19:14:19 UTC 2013 [19:14:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:34:39] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 19:34:29 UTC 2013 [19:34:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:43:41] Since yesterday I get timeout errors (either Nginx error messages, or the Wikimedia error page) when trying to save even moderately big pages on zuwp. Anyone here who knows anything about what that might be about? [19:48:59] rotsee, looks like some templates are too slow [19:50:19] @MaxSem Yes, there are some heavy templates in use there. Copied from enwp, though, and they do work there... [19:50:57] enwiki has already converted some templates to Lua [19:52:02] true. Guess i should start cleaning up the zulu templates... [19:52:48] (PS1) Matanya: Memcached : remove redundnt hardy check. [operations/puppet] - https://gerrit.wikimedia.org/r/86990 [19:53:16] (CR) jenkins-bot: [V: -1] Memcached : remove redundnt hardy check. [operations/puppet] - https://gerrit.wikimedia.org/r/86990 (owner: Matanya) [19:54:20] Infobox South African town is definitely busted [19:54:47] Thanks MaxSem, will start with that [20:02:15] (CR) MaxSem: [C: 2] Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86011 (owner: Jdlrobson) [20:05:44] (Abandoned) Matanya: Memcached : remove redundnt hardy check.
[operations/puppet] - https://gerrit.wikimedia.org/r/86990 (owner: Matanya) [20:07:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 20:07:42 UTC 2013 [20:08:16] (PS2) Jdlrobson: Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86011 [20:08:44] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:16] (PS1) Matanya: Memcached : remove redundnt hardy check. [operations/puppet] - https://gerrit.wikimedia.org/r/86997 [20:10:13] (CR) MaxSem: [C: 2] Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86011 (owner: Jdlrobson) [20:14:49] (Merged) jenkins-bot: Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86011 (owner: Jdlrobson) [20:17:03] hashar: still around? [20:17:29] the one from this morning is probably sleeping right now [20:17:35] i am the one from hawaii [20:17:42] LOL [20:18:08] mind another question regarding beta.pp ? [20:18:39] @ hashar ^ [20:25:21] full path ? :D [20:26:36] matanya: not sure what beta.pp is [20:26:41] link ?: -D path ? [20:27:00] manifests/misc/beta.pp [20:27:26] hashar: you wrote it :P [20:34:02] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 20:33:53 UTC 2013 [20:34:42] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:40:27] matanya: so what do you want to know ? [20:41:32] hashar: in line 24 and onward, you some weird sudo magic, didn't really get the point of this [20:41:40] *you do [20:43:00] ahh [20:43:19] mwdeploy is used to deploy the code on the beta cluster [20:43:33] the user calls various scripts (such as one used to rebuild the localization cache) [20:43:33] that i got.
[20:43:50] which needs to be run using yet another user (l10nupdate) [20:44:03] so we want mwdeploy to be allowed to run mw-update-l10n as l10nupdate user [20:44:10] scapping [20:45:18] matanya: and ./files/scap/mw-update-l10n (being run as l10nupdate) in turn has a sudo policy to run more commands as apache user.. [20:45:19] hashar: wouldn't it make more sense to split it and let each "user" run his own script? [20:45:23] matanya: yeah that is messy [20:46:22] (PS1) MaxSem: Try enabling mobile host on test2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/87008 [20:48:03] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment [20:48:15] Logged the message, Master [20:50:13] PROBLEM - Host payments2 is DOWN: PING CRITICAL - Packet loss = 100% [20:52:43] PROBLEM - Host payments4 is DOWN: PING CRITICAL - Packet loss = 100% [20:52:43] PROBLEM - Host payments3 is DOWN: PING CRITICAL - Packet loss = 100% [20:53:50] bd808: ^d you get to fight over who gets my time next [20:54:02] winner sends me a pic of the loser's corpse [20:54:36] * bd808 taps out before ^d can attack [20:55:23] RECOVERY - Host payments2 is UP: PING OK - Packet loss = 0%, RTA = 30.97 ms [20:55:53] RECOVERY - Host payments4 is UP: PING OK - Packet loss = 0%, RTA = 31.00 ms [20:55:53] RECOVERY - Host payments3 is UP: PING OK - Packet loss = 0%, RTA = 31.02 ms [20:56:07] LeslieCarr: I /think/ that my needs may be satisfied by sending packets directly to hooft [20:56:40] But I'm not ready to test yet really anyway due to need for code review & shell access grant [21:00:00] bd808: that will happen eventually :-] [21:00:36] * hashar heads to bed [21:01:34] * bd808 waves goodnight to hawaii hashar and waits for taiwan hashar to come on shift [21:01:49] lol [21:02:02] na seriously no shift in taiwan, that is a closed day. [21:02:30] P--------| [21:03:39] !log maxsem Finished syncing Wikimedia installation...
: Weekly mobile deployment [21:03:52] Logged the message, Master [21:04:18] damn, i am never going to get "engineers_in_knife_fights.ly" off the ground, despite my series a funding by kleiner perkins [21:04:51] (CR) MaxSem: [C: 2] Try enabling mobile host on test2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/87008 (owner: MaxSem) [21:06:01] LeslieCarr: I'd need a really long knife to poke ^d from my office [21:09:35] bd808: metric namespacing: . or .? discuss. [21:10:23] (Merged) jenkins-bot: Try enabling mobile host on test2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/87008 (owner: MaxSem) [21:10:31] ori-l: wiki.metric imho [21:10:32] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [21:10:33] ori-l: . [21:10:47] well, [21:11:00] that is intuitive, i agree [21:11:23] what's a real world example of the metric? [21:12:14] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/#/c/87008/' [21:12:26] well, here's one I looked at yesterday: hit/miss rate of ResourceLoader's filter cache (RL uses memcached to avoid minifying JS over and over; there are other such 'filters') [21:12:26] Logged the message, Master [21:12:55] i was interested at the overall rate; i had no reason to suspect (and still don't) that it's wiki-specific [21:13:04] * bd808 looks for old notes on metric naming conventions [21:13:53] graphite makes it easy to aggregate by name components that come later in the name [21:14:53] so the question is which is the more common use-case: give me all metrics for foowiki, or give me this one metric for all wikis [21:15:27] ori-l: So at $DAYJOB-1 we chose: ..... [21:15:38] eg kount.prod.boi.a211.ris.requests.100100.q.avg [21:15:46] being a client, presumably?
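(editor's aside) A rough sketch of the two queries weighed above ("all metrics for foowiki" versus "this one metric for all wikis"), assuming Graphite-style dotted names in which `*` stands for exactly one component. The metric names, the wiki-first ordering, and the helpers `metric_name` and `matches` are hypothetical illustrations, not anything taken from the channel or from Graphite's own code:

```python
from fnmatch import fnmatchcase


def metric_name(*parts):
    """Join name components into a dotted metric name, most general first."""
    return ".".join(parts)


def matches(pattern, name):
    """Per-component glob match: each '*' covers exactly one dotted component."""
    pattern_parts = pattern.split(".")
    name_parts = name.split(".")
    if len(pattern_parts) != len(name_parts):
        return False
    return all(fnmatchcase(n, p) for n, p in zip(name_parts, pattern_parts))


# Hypothetical metric names using a wiki-first ordering.
names = [
    metric_name("enwiki", "resourceloader", "filter_cache_hit"),
    metric_name("enwiki", "resourceloader", "filter_cache_miss"),
    metric_name("zuwiki", "resourceloader", "filter_cache_hit"),
]

# "give me this one metric for all wikis"
one_metric_all_wikis = [
    n for n in names if matches("*.resourceloader.filter_cache_hit", n)
]

# "give me all metrics for one wiki"
all_metrics_one_wiki = [n for n in names if matches("enwiki.*.*", n)]
```

Under these assumptions either query is a single wildcard pattern whichever component comes first, so the ordering mostly decides how the metric tree reads when browsed.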
[21:15:58] ori-l: division of parent [21:16:08] which may be much like wiki here [21:17:51] ori-l: It moves from most general on left to most specific on right [21:18:03] (CR) MaxSem: [C: 2] Remove more removals [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86774 (owner: MaxSem) [21:18:51] bd808: that's a sensible approach, but it doesn't decide the question of which is more specific, the thing that you're measuring or the place it's happening at [21:19:01] The only thing that was sometimes troublesome about this scheme was the lack of discoverability of the services [21:19:09] Hi, is there any progress on getting a dedicated server for tex->mathml conversion in production... if not we could use some labs instance in the meanwhile [21:19:54] physikerwelt, 1) labs can't be used for prod needs, 2) has the architecture discussion concluded? [21:20:05] ori-l: Agreed. In our environment opsen were used to thinking about things from a network -> host -> service order [21:20:28] ori-l: which also fit my brain [21:21:06] 2) who has to discuss and what has to be decided? [21:22:07] There is a mail from gwicke asking if a dedicated math rendering server is wanted or if shell out is preferred [21:22:09] bd808: kk. i'll chew on this a bit :) thanks [21:22:21] all responses were positive on a dedicated server [21:22:46] for now it seems to be wise to fade in mathml support step by step [21:23:18] (PS2) MaxSem: Remove more removals [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86774 [21:23:35] ori-l: okey doke. One thing to think about: how are you going to create the names? That may make one way or the other more obviously easy. [21:24:00] step one was general support for mathml what we showed in amsterdam...
I think you reviewed the commit [21:24:41] (CR) MaxSem: [C: 2] Remove more removals [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86774 (owner: MaxSem) [21:24:42] now further steps have been performed support svg's once they are ready and generate mathml instantly [21:24:49] (Merged) jenkins-bot: Remove more removals [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/86774 (owner: MaxSem) [21:25:34] That MathML is technically possible has been tested. The question is how the users would react to the output [21:26:28] !log maxsem synchronized wmf-config/InitialiseSettings.php [21:26:34] in a first step we want to try out to display mathml to firefox users only, since they do not need enabling technologies for html5 [21:26:39] Logged the message, Master [21:28:06] in a second step, if there is a good way to convert mathml to high quality svg the svg output should replace the svgs [21:28:35] <^d> LeslieCarr: Is now a good time? [21:28:36] at that time we can drop the old mathpng images and there is no need to store files [21:28:47] further questions? [21:28:58] sure [21:29:13] are you here physically or should we just irc it [21:29:22] because i'm not getting off the hammock :) [21:29:37] yes I'm here [21:29:46] <^d> LeslieCarr: I'm home, irc it is. [21:30:22] woot [21:31:28] <^d> So, I basically followed what parsoid did for eqiad. [21:33:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Tue Oct 1 21:33:49 UTC 2013 [21:34:32] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [21:35:30] ^d relink me to the gerrit sets plz ?
[21:35:35] so i can be lazy ;) [21:35:55] <^d> dns: https://gerrit.wikimedia.org/r/#/c/86741/, puppet: https://gerrit.wikimedia.org/r/#/c/86742/ [21:37:38] merci [21:37:54] (PS3) Lcarr: Add LVS setup for new search infrastructure with monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/86742 (owner: Chad) [21:38:49] ^d - any reason for putting with the other internal services instead of with the old search run of ip's ? [21:39:15] <^d> No particular reason, other than I was copy+pasting from parsoid. [21:40:20] not copy+pasting -- round-tripping! [21:41:41] (CR) Lcarr: [C: -1] "(1 comment)" [operations/dns] - https://gerrit.wikimedia.org/r/86741 (owner: Chad) [21:43:09] (CR) Lcarr: [C: 2] Add LVS setup for new search infrastructure with monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/86742 (owner: Chad) [21:43:45] ^d we'll also have to put backends at https://noc.wikimedia.org/pybal/eqiad/ [21:43:57] the normal lvs new vip procedure on wikitech is accurate [21:44:44] (PS2) Chad: Add service IP addresses for new search [operations/dns] - https://gerrit.wikimedia.org/r/86741 [21:46:19] what about the architecture we discussed yesterday? [21:46:32] maybe I should raise it on the procurement ticket :) [21:46:40] (just got back) [21:47:06] * AaronSchulz swiftly asks paravoid how things are going [21:47:27] <^d> paravoid: I brought that up. [21:47:38] <^d> See final comment on https://gerrit.wikimedia.org/r/#/c/86718/ [21:47:51] AaronSchulz: setting up ms-be, tomorrow at the latest [21:48:10] swiftrepl original containers would be the next step [21:48:14] <^d> So lvs would still be all clients (read/write). [21:48:24] <^d> We wouldn't be segmenting read/write. [21:48:42] no, this wasn't about segmenting read/write [21:48:49] it was about isolating routing nodes from data nodes, for starters [21:48:53] paravoid: i can ignore all the icinga alerts for ms-be*, right ? (i have been but just making sure..
;) ) [21:49:08] splitting masters off into a third cluster is an extra step, seems also nice [21:49:27] I'll follow up on the procurement ticket [21:49:33] LeslieCarr: as long as it's ms-be10xx (eqiad) [21:49:43] yes, ignore ms-fe/ms-be eqiad until further notice [21:49:58] cool [21:50:23] <^d> paravoid: We can certainly do that going forward, but the client nodes would still need lvs (while data & master wouldn't, if I'm understanding). [21:50:32] yes [21:52:37] !log maxsem synchronized php-1.22wmf18/extensions/MobileFrontend/ [21:52:48] Logged the message, Master [21:54:22] !log maxsem synchronized php-1.22wmf19/extensions/MobileFrontend/ [21:54:33] Logged the message, Master [21:54:36] <^d> LeslieCarr: I amended that dns change for spaces -> tabs [21:55:16] lol, somebody's trying to hack us: Exception from line 322 of /usr/local/apache/common-local/php-1.22wmf18/includes/content/ContentHandler.php: No handler for model 'wikitext"/>"/>