[00:02:03] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.727 second response time
[00:02:22] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 331.533325
[00:10:02] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 209534 bytes in 7.153 second response time
[00:12:02] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:12:22] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:13:22] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:15:22] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[00:17:52] !log ori synchronized php-1.23wmf16/extensions/CentralNotice 'Update CentralNotice to tip of wmf_deploy for I7d8259fc4'
[00:18:00] Logged the message, Master
[00:19:42] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[00:24:46] !log ori synchronized php-1.23wmf15/extensions/CentralNotice 'Update CentralNotice to tip of wmf_deploy for I7d8259fc4'
[00:24:54] Logged the message, Master
[00:25:05] ^ mwalker, K4-713, greg-g
[00:26:51] greg-g: around?
[00:28:45] ori: Thanks for the update.
[00:33:34] Getting a report of issues with bits
[00:33:45] I can't repro, but it's there
[00:34:19] rdwrer: Looks normal from Europe
[00:34:21] Maybe just residual from the PROBLEMs above?
[00:34:45] Report is from NY, US
[00:34:57] jeremyb: How do things look for you?
[00:36:08] ori: I don't suppose there were issues with your deploy that may be causing this
[00:37:12] with what?
[00:37:19] causing what, rather?
[00:37:31] what 'issues with bits'?
[00:37:33] There's a report about bits issues, trying to confirm now
[00:37:37] JS and images not loading
[00:37:40] i'll look at ganglia
[00:38:06] rdwrer: which page/hostname?
[00:38:18] enwiki, she says she can't log in
[00:38:30] i.e. the login page won't load
[00:39:32] works for me, but i'll continue looking around for an indication that something may be wrong
[00:40:24] Thanks, not sure if it's just her or what
[00:40:44] we had someone with similar issues in here yesterday
[00:40:46] rdwrer: do you know which browser she's using?
[00:40:51] but weren't able to nail down anything
[00:41:10] hoo: Was it Finnegan?
[00:41:27] rdwrer: No, Cyberpower...
[00:41:29] Hm.
[00:43:00] ori: Neither fx nor chrome work well for her
[00:43:14] Something will load, but not all of it
[00:43:23] rdwrer: Can you get a traceroute or ping out of her?
[00:43:36] Can try
[00:44:05] Or do one better
[00:44:20] mh?
[00:45:08] Trying to bring her in
[00:46:55] Finnegan: 2014-03-04 - 16:43:23 rdwrer: Can you get a traceroute or ping out of her?
[00:47:20] i can try if someone can explain how I do that?
[00:47:29] Hm, Windows?
[00:47:34] yep, win7
[00:47:35] Finnegan: actually -- there's something you could try first
[00:47:41] you have Chrome handy, right?
[00:48:05] yes. Also Opera, somewhere in the depths of my drive
[00:48:17] can you browse to the login page that appears broken
[00:48:36] and then use the browser's menu bar to choose View -> Developer -> JavaScript Console
[00:49:32] "[blocked] The page at 'https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents' was loaded over HTTPS, but ran insecure content from 'http://en.wikipedia.org/w/index.php?action=raw&ctype=text/css&title=User:Lupin/navpopdev.css': this content should also be loaded over HTTPS."
[00:50:04] (the login seems to have gone through, and it's trying to take me to ANI, which is where I was going before I couldn't get there)
[00:51:14] mh
[00:51:34] there's a couple "hey, this parameter is deprecated" warnings in there too
[00:51:48] that's normal
[00:52:04] ori: Can the https things make Chrome block other content
[00:52:12] * hoo hasn't touched chrome in months
[00:52:33] I mean will it stop loading like on a JS syntax error
[00:53:19] I'm looking at what I think is the corresponding console in firefox now while trying to load ANI directly. Not seeing the "insecure" warning in the console, but the page is still hung with most of it loaded (only the logo is currently missing).
[00:53:48] * Finnegan apologizes for being the most worthlessly non-technical bug-reporter ever
[00:54:15] Finnegan: not at all! I hope we can help
[00:58:30] Maybe another problem in -tech
[00:58:31] Maybe the same
[01:00:08] Finnegan: is it still hung?
[01:00:33] did you open the console in firefox, or did it come up by itself? is there an error indicated?
[01:01:17] I opened it in firefox myself. It was hung as of about a minute ago, at which point I canceled the pag eload because my fan was starting to make fan-y noises
[01:01:40] no error messages that I can see. Only the yellow "deprecated" messages
[01:02:28] "because my fan was starting to make fan-y noises" sounds a bit like run away javascript
[01:03:33] mh, there's quite some JS there: https://en.wikipedia.org/wiki/User:Fluffernutter/common.js
[01:03:43] * Finnegan blushes
[01:05:51] Finnegan: Tested your JS in my account and it worked ok
[01:07:50] currently "waiting for upload.wikimedia.org" according to ff
[01:08:27] Finnegan: On which page?
[01:08:37] ANI still. Want me to try another?
[01:08:41] Or can you maybe even see the full URL of what it's trying to load
[01:09:53] not that I can see.
That's just the snippet from the browser's status bar
[01:19:44] Finnegan: Leaving in a few minutes, next step would probably be to get a traceroute to upload.wikimedia.org
[01:19:52] rdwrer: Want to guide? ;)
[01:27:42] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[02:01:22] (PS1) Chad: Give Cirrus to all Wikiquotes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919
[02:14:29] (CR) Deskana: [C: 1] "Product manager says yes." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919 (owner: Chad)
[02:16:57] Database error A database query error has occurred. This may indicate a bug in the software. Function: SpecialWhatLinksHere::showIndirectLinks Error: 0
[02:16:58] at https://en.wikipedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Module%3AUnsubst&namespace=10
[02:24:11] 504 for the same url now
[02:27:46] !log LocalisationUpdate completed (1.23wmf15) at 2014-03-05 02:27:46+00:00
[02:27:54] Logged the message, Master
[02:54:01] !log LocalisationUpdate completed (1.23wmf16) at 2014-03-05 02:54:00+00:00
[02:54:08] Logged the message, Master
[02:54:19] I wonder how long that took....
[03:20:42] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[03:37:06] !log messing with innodb compression on db1007
[03:37:14] Logged the message, Master
[03:41:14] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-03-05 03:41:14+00:00
[03:41:23] Logged the message, Master
[03:55:50] ok, starting on the move to trigger
[03:55:53] (CR) BryanDavis: Config changes for Elasticsearch (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[03:58:21] (CR) BryanDavis: [C: 1] Log length of l10nupdate to SAL and Graphite [operations/puppet] - https://gerrit.wikimedia.org/r/116718 (owner: Greg Grossmeier)
[04:03:34] (PS6) Ryan Lane: Deployment module changes for trebuchet-trigger [operations/puppet] - https://gerrit.wikimedia.org/r/110239
[04:08:00] rdwrer: sorry, was afk
[04:08:09] 's okay
[04:08:52] PROBLEM - MySQL Idle Transactions on db1033 is CRITICAL: CRIT longest blocking idle transaction sleeps for 604 seconds
[04:09:13] PROBLEM - MySQL InnoDB on db1033 is CRITICAL: CRIT longest blocking idle transaction sleeps for 626 seconds
[04:17:36] !log switching out git-deploy perl frontend for trebuchet trigger
[04:17:44] Logged the message, Master
[04:18:02] (CR) Ryan Lane: [C: 2] Deployment module changes for trebuchet-trigger [operations/puppet] - https://gerrit.wikimedia.org/r/110239 (owner: Ryan Lane)
[04:21:52] RECOVERY - MySQL Idle Transactions on db1033 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[04:22:13] RECOVERY - MySQL InnoDB on db1033 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[04:28:38] Ryan_Lane: \o/
[04:28:40] that's awesome
[04:28:42] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[04:28:56] (PS1) Ryan Lane: Adding python-gitdb dependency [operations/puppet] -
https://gerrit.wikimedia.org/r/116924
[04:29:00] well, I still need to test ;)
[04:29:23] but it should be in place soon
[04:29:57] i think scap / trebuchet convergence is a lot closer now, btw
[04:29:57] hm. I really need test/testrepo pointing at something
[04:30:04] ori: any idea where I can point it?
[04:30:08] excellent
[04:30:17] what do you mean? which repo or server or?
[04:30:19] I'm looking forward to having some of the code merge together :)
[04:30:21] server
[04:30:27] it used to point at formey, I think
[04:30:41] what are the desirable characteristics for that server?
[04:30:50] i dunno what your criteria are
[04:30:52] it's the most basic of basic repos
[04:30:55] it can be anything
[04:31:16] it can be hafnium, that's an odds and ends server for metrics
[04:31:23] ok, cool
[04:32:07] hm. I wonder if formey is still targetable
[04:32:13] if so, I'll use it for this test
[04:32:22] sure
[04:33:42] -_-
[04:33:49] IMissing the following configuration item: deploy.repo-name
[04:33:54] I forgot I changed that name
[04:34:49] working for test/testrepo
[04:35:02] now I need to update that damn key on all the repos
[04:35:25] I guess I could have kept compatibility, but prefix-name sucks
[04:38:10] (PS1) Ryan Lane: Change sudo format for pillar fetching and service restarts for trigger [operations/puppet] - https://gerrit.wikimedia.org/r/116925
[04:43:25] ori: any repos that we can test a deploy with?
[04:43:48] bleh. umask isn't being checked
[04:44:15] fluoride or eventlogging would be fine. does there need to be an undeployed change?
[04:44:23] i can touch a README or something if you like
[04:44:59] nah, it can be a no-op change
[04:46:38] damn git-python.
I need to switch that out with dulwich
[04:46:45] it's not reading the global config properly
[04:47:09] oh well, required umask is at best only a partially useful feature
[04:48:02] * Ryan_Lane opens a bug
[04:48:15] fluoride would be safest
[04:48:48] ok
[04:57:42] and of course, I'm trying to require the wrong umask. it should be 002
[05:00:47] ah. interesting
[05:01:01] vanadium has a deployment_target grain of eventlogging
[05:01:09] which doesn't match fluroride
[05:01:26] fluoride*
[05:03:08] ori: when was the last time that was deployed?
[05:03:50] I think it's target was removed from puppet
[05:03:53] *its
[05:04:38] (CR) Ryan Lane: [C: 2] Adding python-gitdb dependency [operations/puppet] - https://gerrit.wikimedia.org/r/116924 (owner: Ryan Lane)
[05:04:57] (CR) Ryan Lane: [C: 2] Change sudo format for pillar fetching and service restarts for trigger [operations/puppet] - https://gerrit.wikimedia.org/r/116925 (owner: Ryan Lane)
[05:05:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:05:49] (PS1) Ryan Lane: Change required umask for deployment in trigger to 002 [operations/puppet] - https://gerrit.wikimedia.org/r/116926
[05:07:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:08:42] (PS1) Ryan Lane: fluoride is no longer a target, switch to eventlogging [operations/puppet] - https://gerrit.wikimedia.org/r/116927
[05:09:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:10:15] rawr jenkins
[05:11:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:13:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:15:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed
05 Mar 2014 05:00:17 AM UTC
[05:15:10] (CR) Ryan Lane: [C: 2] Change required umask for deployment in trigger to 002 [operations/puppet] - https://gerrit.wikimedia.org/r/116926 (owner: Ryan Lane)
[05:15:22] (CR) Ryan Lane: [C: 2] fluoride is no longer a target, switch to eventlogging [operations/puppet] - https://gerrit.wikimedia.org/r/116927 (owner: Ryan Lane)
[05:17:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:19:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:20:28] that fixed fluoride, but it's being deployed one more place than before (hafnium.wikimedia.org and vanadium.eqiad.wmnet)
[05:21:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:21:08] service restart is working, report sync is working, sync is working, start is working, abort is working
[05:21:10] success!
[05:21:22] !log finished switch from the perl git-deploy to trigger
[05:21:31] Logged the message, Master
[05:23:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:24:50] \o/
[05:25:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:25:02] no more shitty perl!
[05:25:04] :)
[05:25:20] not that my python is really amazing, but it's nicer than perl
[05:27:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:28:16] I should probably do a no-op deploy to parsoid
[05:29:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:00:17 AM UTC
[05:30:50] RECOVERY - Puppet freshness on mw32 is OK: puppet ran at Wed Mar 5 05:30:45 UTC 2014
[05:31:25] <3 deploys are way faster now that I don't need to request pillar data a bunch of times
[05:33:00] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 05:30:45 AM UTC
[05:47:14] (PS1) Ori.livneh: Enable cookie-based geolocation on all frontend text Varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/116928
[06:00:28] RECOVERY - Puppet freshness on mw32 is OK: puppet ran at Wed Mar 5 06:00:23 UTC 2014
[06:21:28] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[07:29:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[08:16:25] wait
[08:16:29] dickson.freenode.net split?
[08:16:45] How come?
[08:18:13] (CR) Alexandros Kosiaris: [C: 2] contint: do not cache api/json calls [operations/puppet] - https://gerrit.wikimedia.org/r/116748 (owner: Hashar)
[08:19:57] apergos: dickson.freenode.net split?
[08:19:58] oO
[08:20:03] weird..
[08:20:12] hello, yes likely so
[08:20:18] Hmm
[08:20:22] Did RC go down also?
[08:20:32] I see it didn't lol
[08:20:33] rc?
[08:20:39] irc.wikimedia.org
[08:20:40] oh
[08:20:42] Different server?
[08:20:48] different server and system
[08:20:57] :)
[08:21:45] I was trying to troubleshoot my IRC bot and saw "Max SendQ exceeded." on the RC server..
[08:21:46] :S
[08:21:59] With ngrep :D
[08:22:54] nice :-D
[08:27:22] Yeah, but of course it doesn't work on SSL connections ;)
[08:27:38] I'm mainly connected to SSL-enabled IRC servers.
[08:27:43] (Except my bot)
[08:29:53] if dickson splits it's probably freenode doing maintenance on it
[08:30:01] yup, when we cut out (some of) the ability of govts to snoop, we also cut out (some of) our troubleshooting ability
[09:22:28] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[09:49:27] (PS4) Ori.livneh: reprepro: import Facebook's HHVM pkgs, for Labs / Vagrant usage only [operations/puppet] - https://gerrit.wikimedia.org/r/112314
[09:50:31] hey
[09:50:35] I was just amending that
[09:52:14] (PS5) Faidon Liambotis: reprepro: import Facebook's HHVM pkgs, for Labs / Vagrant usage only [operations/puppet] - https://gerrit.wikimedia.org/r/112314 (owner: Ori.livneh)
[09:52:53] (CR) Faidon Liambotis: [C: 2] "Persuaded to merge this, with the condition of never reaching production in this state." [operations/puppet] - https://gerrit.wikimedia.org/r/112314 (owner: Ori.livneh)
[09:53:03] (CR) Faidon Liambotis: [V: 2] reprepro: import Facebook's HHVM pkgs, for Labs / Vagrant usage only [operations/puppet] - https://gerrit.wikimedia.org/r/112314 (owner: Ori.livneh)
[09:54:31] paravoid: hi! Is the hhvm package installable on Precise?
:-]
[09:54:52] yes, the hhvm "package" is installable on precise, only for testing purposes
[09:55:08] awesome
[09:55:26] meh
[09:55:36] This way I will be able to add a Jenkins job to run mw/core test suite :-]
[10:02:17] hashar: it can't / shouldn't run on the production jenkins server
[10:02:39] that was the condition for merging it, as the packages have some rather severe problems
[10:02:45] (PS1) Matanya: sockpuppet: remove leftovers [operations/puppet] - https://gerrit.wikimedia.org/r/116936
[10:03:11] !log Jenkins: gallium overloaded with a few java threads taking 100% CPU starving the box :(
[10:03:19] Logged the message, Master
[10:03:25] ori: will get it running on a labs instance
[10:03:30] * ori nods
[10:03:38] paravoid: thanks for the merge, btw!
[10:03:49] did you see the diff?
[10:03:51] there were two errors
[10:04:00] * ori looks
[10:04:18] first, they were shipping InRelease, but you had GetInRelease: no
[10:04:20] that's minor
[10:04:25] hashar: 3rd time in 24h ...
[10:04:38] the more important one is that you had Suite: stable and they have no such suite in the repo
[10:04:40] matanya: yeah I am well aware of it :/
[10:05:00] matanya: I gotta drop history from Jenkins builds I guess
[10:05:16] paravoid: my mistake, sorry. I wasn't sure how to test it.
[10:05:26] hashar: you can store it elsewhere
[10:05:35] not without engineering something new
[10:05:53] we could put the logs / files in a swift volume
[10:06:02] no, I'm just telling you so that you know for next time :)
[10:06:08] * ori nods
[10:06:19] Suite: NNN maps to http://dl.hhvm.com/ubuntu/dists/NNN
[10:06:41] well, hashar an action should be taken :D
[10:06:51] could we have a 'testing' component that isn't enabled on production?
[10:07:04] that would be nice, to be able to add stuff for vagrant, etc.
[10:07:23] meh, maintaining more components is kind of a pain, but if you have multiple use cases we should consider it
[10:08:02] there are a few things vagrant provisions via PPAs and other distribution channels
[10:08:13] xdebug, phpsh, hhvm (until now)
[10:08:16] maybe one or two more
[10:08:44] phpsh sounds like something that could be useful in prod too?
[10:09:00] no way
[10:09:08] i wish i never started using it
[10:09:21] i now maintain the package in pypi, heh
[10:09:41] i'd replace it with something like boris if i were starting over, but i've been lazy
[10:09:53] it's a really, really awful hack
[10:10:36] see readme: https://github.com/atdt/phpsh
[10:10:47] "If you're not an existing user, stop. There are better options out there. "
[10:11:01] lol
[10:12:12] Downloads (All Versions): 38 downloads in the last day
[10:13:40] paravoid: btw, the geo_cookie stuff is running on cp1055 and cp1066, no issues and no measurable negative impact on request latency afaict
[10:14:10] so bblack and i were going to push it out to the remaining varnishes today/tomorrow
[10:14:23] cool
[10:16:26] (CR) Matanya: [C: 1] "a followup patch: https://gerrit.wikimedia.org/r/#/c/116936/" [operations/puppet] - https://gerrit.wikimedia.org/r/115527 (owner: Dzahn)
[10:18:15] (CR) TTO: "I like section collapsing for Wiktionary. (Obviously I don't like it for pages with only one section, but for more than one it is useful.)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114634 (owner: MaxSem)
[10:29:52] (PS2) Matanya: sockpuppet: remove leftovers [operations/puppet] - https://gerrit.wikimedia.org/r/116936
[10:30:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[10:31:20] (CR) MaxSem: "Looks like consensus on the bug is different."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114634 (owner: MaxSem)
[10:32:20] (PS1) Ori.livneh: Update Schema:Echo revision to 7731316 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116937
[10:34:02] (CR) Ori.livneh: [C: 2] Update Schema:Echo revision to 7731316 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116937 (owner: Ori.livneh)
[10:34:14] hey, do we have any logs that can help us to figure out what's going on with https://bugzilla.wikimedia.org/show_bug.cgi?id=62245 ?
[10:34:16] (Merged) jenkins-bot: Update Schema:Echo revision to 7731316 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116937 (owner: Ori.livneh)
[10:34:51] (PS1) Matanya: dhcpd: remove Sun-X4150 related configs [operations/puppet] - https://gerrit.wikimedia.org/r/116938
[10:36:59] MaxSem: $ curl -Is -A "DoCoMo/2.0 SH03A" http://ja.wikipedia.org/wiki/Main_page | grep ^Location
[10:36:59] Location: http://ja.m.wikipedia.org/wiki/Main_page
[10:38:22] however, phone users see a 502
[10:38:28] wait
[10:39:00] early November - isn't it when we switched text from Squid to Varnish?
[10:40:03] (PS1) Hashar: contint: deny access to Jenkins build history [operations/puppet] - https://gerrit.wikimedia.org/r/116939
[10:40:30] yes, it's mentioned in https://www.mediawiki.org/wiki/Scrum_of_scrums/2013-11-12
[10:41:21] mark & paravoid, ^^
[10:41:34] no DoCoMo matches in our 5xx.logs
[10:41:49] the user-agent strings in the bug report seem incomplete, too
[10:42:32] (CR) Matanya: [C: 1] contint: deny access to Jenkins build history [operations/puppet] - https://gerrit.wikimedia.org/r/116939 (owner: Hashar)
[10:43:09] ori, nope - these a typical docomo UAs, e.g.
http://www.tera-wurfl.com/explore/?action=wurfl_id&id=docomo_sh_03a_ver1
[10:43:45] https://productforums.google.com/forum/#!msg/webmasters/dBD59HzcRWE/YJIGcCrowUMJ
[10:46:08] seems like a pretty weird setup
[10:46:37] https://www.nttdocomo.co.jp/english/service/information/fullbrowser/ says "Part or all of some sites may not be viewable depending on the site."
[10:46:38] hmmmmmm
[10:46:46] they have an emulator on their site
[10:47:32] we see a 502 message in Japanese - apparently, there's some intermediate proxy emitting it?
[10:49:16] yeah, i think it's some sort of proxy that strips content that the phone isn't able to display
[10:49:48] you may be causing it to freak out by using modern html
[10:50:10] nope
[10:50:23] we're not emitting html on the redirecth
[10:51:14] hm, yeah, and they're able to access the site directly
[10:57:38] ori, where did you see the emulator? I've found https://www.nttdocomo.co.jp/english/service/developer/make/content/myface/emulator/index.html but it's not what we want
[10:58:01] MaxSem: https://www.nttdocomo.co.jp/english/service/developer/make/content/browser/html/tool2/download/
[10:58:20] thanks
[11:01:15] akosiaris: can you please merge : https://gerrit.wikimedia.org/r/#/c/116938/ ?
[11:06:32] (CR) Ori.livneh: [C: 2] contint: deny access to Jenkins build history [operations/puppet] - https://gerrit.wikimedia.org/r/116939 (owner: Hashar)
[11:07:17] ori: thx
[11:08:12] sigh
[11:08:13] !log restarting Jenkins which has starving threads
[11:08:21] Logged the message, Master
[11:08:30] and of course, the emulator doesn't exhibit that bug
[11:11:27] !log ori updated /a/common to {{Gerrit|I5277d2451}}: Update Schema:Echo revision to 7731316
[11:11:36] Logged the message, Master
[11:12:28] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[11:12:37] !log jenkins back
[11:12:45] Logged the message, Master
[11:13:28] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.38 ms
[11:19:41] (PS1) Hashar: contint: Icinga now notify on contint related stuff [operations/puppet] - https://gerrit.wikimedia.org/r/116944
[11:21:38] (PS1) Matanya: icinga: update check_mysql-replication to v 0.2.6 [operations/puppet] - https://gerrit.wikimedia.org/r/116945
[11:24:32] paravoid, did Squid emit some HTML on redirects?
[11:30:39] is ken still with the wmf?
[11:31:10] no
[11:36:00] paravoid: any reason I can't see the hhvm package on a labs instance though it got added to reprepro ( https://gerrit.wikimedia.org/r/#/c/112314/ )
[11:38:25] try again
[11:41:24] paravoid: \O/
[11:44:53] (PS1) Hashar: contint: install hhvm on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/116947
[11:45:56] paravoid: would love if you can security review the patch above :)
[11:47:27] (PS1) Matanya: icinga: remove ksnider from contact groups [operations/puppet] - https://gerrit.wikimedia.org/r/116948
[12:23:28] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[12:49:50] (PS1) Matanya: php: remove hardy setting, no more hardy app servers [operations/puppet] - https://gerrit.wikimedia.org/r/116953
[12:56:11] (CR) Faidon Liambotis: [C: 2] icinga: remove ksnider from contact groups [operations/puppet] - https://gerrit.wikimedia.org/r/116948 (owner: Matanya)
[12:56:21] thanks
[12:59:29] paravoid: are you going to be in zurich hackathon?
[12:59:49] dunno yet
[13:01:26] i'm thinking of it. but i'm only intersted of ops will be there
[13:01:33] *if
[13:02:59] akosiaris & mark will be there
[13:03:15] and andrewb/marc-andre
[13:03:56] hmm, the main issue is Saturday
[13:04:05] no i won't be
[13:04:12] but paravoid will be there :P
[13:04:14] you won't?
[13:04:19] no
[13:04:20] i didn't sign up
[13:04:54] btw, you too, i would like to have access to ops-l if possible
[13:04:58] *two
[13:05:32] hm, I'm not sure if we've ever done this before
[13:05:35] do you have an NDA?
[13:05:45] I like the way matanya ask his questions.
[13:05:56] yes paravoid
[13:06:13] ok
[13:06:21] Instead of "Wouldn't it be too much of an inconvenience if I could get access to ops-l, please", he says "I would like to have access to ops-l"
[13:06:35] * odder pats matanya on the back
[13:06:40] I don't mind
[13:06:43] I like people being direct :)
[13:06:51] my point precisely :-)
[13:07:13] i'm not british :)
[13:07:38] do you mind submitting the form at https://lists.wikimedia.org/mailman/listinfo/ops ?
[13:07:44] I can't promise we'll accept it
[13:07:54] but it should at least remind us to have this discussion
[13:08:33] (not about you, about non-staff subscriptions in general, I'm not sure if there's a precedent for this, although I'd be in favor)
[13:10:13] done paravoid thanks for honsty :)
[13:10:31] why all the vowles aren't typed today?
[13:12:06] You might have switched your keyboard to Hebrew by mistake, I think
[13:12:09] :-DD
[13:29:08] (CR) Manybubbles: Config changes for Elasticsearch (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[13:29:11] (PS3) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498
[13:30:44] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[13:39:34] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100%
[13:42:38] :/
[13:47:14] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 35.40 ms
[14:08:16] (CR) Alexandros Kosiaris: [C: -1] "The garbage removal is cool, I would rather we did not delete however the dhcpd.conf stanza, even if the file is empty (it exists btw in t" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/116938 (owner: Matanya)
[14:11:32] (PS2) Matanya: dhcpd: remove Sun-X4150 related configs [operations/puppet] - https://gerrit.wikimedia.org/r/116938
[14:11:42] done akosiaris
[14:12:22] (CR) Alexandros Kosiaris: [C:
2] dhcpd: remove Sun-X4150 related configs [operations/puppet] - https://gerrit.wikimedia.org/r/116938 (owner: Matanya)
[14:12:36] (CR) Alexandros Kosiaris: [V: 2] dhcpd: remove Sun-X4150 related configs [operations/puppet] - https://gerrit.wikimedia.org/r/116938 (owner: Matanya)
[14:13:47] matanya: so my catalog-differ is finally ready. Your @var changes are now officially my guinea pigs, you should see them being merged (hopefully) one by one
[14:14:05] yay! :) thanks
[14:14:16] is the source available ?
[14:15:13] not yet. It will be though, I intend to write an installation procedure and some docs first
[14:20:05] PROBLEM - Disk space on virt10 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 44146 MB (3% inode=99%):
[14:22:45] (PS4) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498
[14:25:38] (CR) Ottomata: "Just looked at this again, but wasn't sure if you were ready for review since some of the comments weren't acted on. Lemme know when!" [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[14:27:24] (PS2) Ottomata: Changing replica_lag_max_messages and replica_lag_time_max_ms [operations/puppet] - https://gerrit.wikimedia.org/r/116645
[14:27:31] (CR) Ottomata: [C: 2 V: 2] Changing replica_lag_max_messages and replica_lag_time_max_ms [operations/puppet] - https://gerrit.wikimedia.org/r/116645 (owner: Ottomata)
[14:29:52] (CR) Manybubbles: "I think we're ok with applying this as is. We'll start looking at rack awareness vs forced awareness next week."
[operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[14:31:01] !log starting controlled shutdown of kafka broker an21 to reload new replica.lag settings
[14:31:11] Logged the message, Master
[14:34:34] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1022 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 10.0
[14:35:05] (PS1) Matanya: ganglia: remove lookupvar and replace with top scope @ var [operations/puppet] - https://gerrit.wikimedia.org/r/116973
[14:36:10] (CR) Ottomata: "Aye, ok. I know my comments were nitpicky, but you wanted to fix some of them, right? strings vs regexes, undefs instead of false, etc?" [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[14:37:32] (CR) Manybubbles: "WTF I thought I fixed them...." [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[14:39:54] !log starting controlled shutdown (and restart) of kafka broker analytics1022 to bring in new replica.lag settings
[14:40:02] Logged the message, Master
[14:40:03] (PS5) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498
[14:40:05] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 212.918162617
[14:41:14] (CR) Manybubbles: "These were sitting staged and waiting for me to amend my commit and push it...."
[operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[14:41:14] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 1683.78747036
[14:41:34] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1022 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0
[14:43:05] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1021 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 20.0
[14:46:25] (PS1) Matanya: ldap: remove lookupvar and replace with top scope @ var [operations/puppet] - https://gerrit.wikimedia.org/r/116975
[14:51:24] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0
[14:53:35] (PS1) Matanya: openstack: remove var and replace with top scope ::var [operations/puppet] - https://gerrit.wikimedia.org/r/116976
[14:53:54] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1021 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0
[14:59:24] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2867.94276071
[15:20:27] akosiaris: if you can please give priority to sudo module and etherpad module, i would love that and prefer those getting a review before the @var stuff
[15:22:50] (CR) Manybubbles: "Sounds good! I'd remove the one offs above, like for itwikiquote."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919 (owner: Chad)
[15:23:44] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[15:38:41] (PS1) Alexandros Kosiaris: Fix ruby stability hash issue in recursor.conf [operations/puppet] - https://gerrit.wikimedia.org/r/116978
[16:04:12] !log shutting down mw1165 swapping DIMM
[16:04:19] Logged the message, Master
[16:06:05] PROBLEM - Host mw1165 is DOWN: PING CRITICAL - Packet loss = 100%
[16:12:20] robh: i am going to swap the disk in carbon now...any objections?
[16:16:24] RECOVERY - Host mw1165 is UP: PING OK - Packet loss = 0%, RTA = 2.23 ms
[16:16:46] !log upgraded php5 packages, php5-wmerrors and libmemcached packages on beta in preparation for full cluster upgrade. This will make puppet unhappy.
[16:16:54] Logged the message, Master
[16:25:07] cmjohnson1: nope! there shoudlnt be anythign on them we need
[16:25:12] (but i'd set them aside just in case maybe)
[16:25:20] thx for workin on it =]
[16:25:25] yep
[16:26:33] (PS1) Hashar: beta: vary deployment-bastion by ::site [operations/puppet] - https://gerrit.wikimedia.org/r/116982
[16:30:32] Coren, join us?
[16:31:03] !log carbon going down
[16:31:06] andrewbogott: In 2 min. Just spilled my tea
[16:31:10] Logged the message, Master
[16:31:44] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[16:32:24] PROBLEM - Host carbon is DOWN: CRITICAL - Host Unreachable (208.80.154.10)
[16:33:25] (PS6) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498
[16:33:29] manybubbles: ok to merge?
[16:33:34] fine by me
[16:33:43] (CR) Ottomata: [C: 2 V: 2] Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498 (owner: Manybubbles)
[16:40:22] Hi Coren, how's it going?
[16:40:34] Busy. :-) What's up?
[16:41:09] Sorry to bother you! Looking for someone to help me get course management rights on Spanish Wikipedia :)
[16:41:43] Not sure who to ask, really
[16:42:25] (Sage, who helped me with this on enwiki, is travelling today, I think.)
[16:43:34] AndyRussG: Don't you need community consensus for that?
[16:43:54] It's not to manage courses. It's to fix a bug
[16:44:05] hoo: ^
[16:44:13] in that case, I guess I can arange that
[16:44:18] Ah cool thanks
[16:44:48] I'm the current maintainer of the EducationProgram extension
[16:44:51] AndyRussG: What's your wiki user name?
[16:44:58] AGreen (WMF)
[16:46:00] damn, can't do that from meta, let me think a second for how to do that w/o making the community cry :/
[16:49:02] hoo: enwiki has this: https://en.wikipedia.org/wiki/Wikipedia:Education_noticeboard, don't know if that's relevant
[16:49:36] AndyRussG: As this is a technical action, no consensus is required
[16:49:46] right
[16:52:13] (PS1) Hashar: beta: on eqiad mount /srv using labs_lvm [operations/puppet] - https://gerrit.wikimedia.org/r/116987
[16:55:10] AndyRussG: Better ask someone else... as I'm also a steward people probably will cry if I act *sigh*
[16:55:53] hoo: Ah OK, will do...! Many thanks in any case
[17:00:18] AndyRussG: Ok, found someone to do it
[17:02:25] hoo: cool, thanks so much
[17:03:44] AndyRussG: Should be done
[17:05:29] hoo: \o/ yay, thanks so much
[17:05:34] all set
[17:06:12] AndyRussG: So can be removed again?
[17:08:14] hoo: Ah hmm, if you think it's important, go ahead, I did just confirm what I wanted to. But it'll be useful to have that right to continue working on this and any other bugs that come up.
This page mentions that I'm the engineer for the Education Project software http://wikimediafoundation.org/wiki/Staff_and_contractors?showall=1
[17:08:32] AndyRussG: It's fine then, I guess
[17:08:51] K
[17:09:17] In any case, I'll create a user page on Spanish WP so that I'm easy to find there :)
[17:09:36] that's always smart ;)
[17:09:41] :)
[17:09:46] If there are any problems it's no problem to remove the right again, in any case, and thanks again!
[17:14:53] (PS1) Cmjohnson: Temporarily changing install server for row a eqiad to brewster [operations/puppet] - https://gerrit.wikimedia.org/r/116990
[17:16:52] (PS2) Chad: Give Cirrus to all Wikiquotes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919
[17:17:09] (CR) Cmjohnson: [C: 2] Temporarily changing install server for row a eqiad to brewster [operations/puppet] - https://gerrit.wikimedia.org/r/116990 (owner: Cmjohnson)
[17:17:43] thanks ^d
[17:18:03] (CR) Chad: [C: 2] Give Cirrus to all Wikiquotes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919 (owner: Chad)
[17:18:11] (Merged) jenkins-bot: Give Cirrus to all Wikiquotes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116919 (owner: Chad)
[17:18:49] ^d, also, unless it wrecks the cluster it really doesn't matter that the interwiki search is ugly, it.wiki users loved it back in 2008-2009 and will be glad to test it however ugly and broken it may be
[17:19:05] Nemo_bis: Mind re code reviewing that AbuseFilter change for sanity?
[17:19:19] <^d> Nemo_bis: Hehe, I'm trying to fix some stuff with it first.
[17:19:24] <^d> Namespaces be wonky :)
[17:20:10] ^d: ah right, namespaces; though nobody will be so offended in Italian wikis, most localised namespaces don't differ much or at all from English I'm afraid
[17:20:11] !log demon synchronized wmf-config/InitialiseSettings.php 'Cirrus for all the wikiquotes'
[17:20:19] Logged the message, Master
[17:20:25] hoo: you're ovestimating how much I cared about my comment, but sure
[17:20:54] Nemo_bis: Well, I just need someone to CR it so taht I can convince csteipp to merge it :P
[17:21:41] or self merge with his approval, who cares :P
[17:22:51] (PS1) Chad: Wikiquotes no longer building [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992
[17:22:55] (CR) jenkins-bot: [V: -1] Wikiquotes no longer building [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992 (owner: Chad)
[17:23:03] (CR) Chad: [C: -2] "Just prepping this now." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992 (owner: Chad)
[17:24:55] hoo: is he in charge of abusefilter config now? there's nothing so special about abusefilter
[17:25:39] Nemo_bis: Well, he's one of the people who care about it plus he has the rights to approve configuration changes ;)
[17:27:42] <^d> hmm, ganglia reporting elastic1007 as missing.
[17:27:48] <^d> Definitely there and working ;-)
[17:27:57] <^d> *1013
[17:30:59] <^d> Go home ganglia, you're drunk.
[17:34:54] RECOVERY - Host carbon is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[17:36:28] (CR) Nemo bis: [C: 1] "Looks sane. Default configs or permissions implied by broader ones, AFAICS."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114656 (owner: Hoo man)
[17:36:45] Nemo_bis: thanks :)
[17:37:05] PROBLEM - Squid on carbon is CRITICAL: Connection refused
[17:37:05] PROBLEM - SSH on carbon is CRITICAL: Connection refused
[17:37:44] PROBLEM - HTTP on carbon is CRITICAL: Connection refused
[17:44:47] (CR) Hashar: "No production impact. Untested but should work :]" [operations/puppet] - https://gerrit.wikimedia.org/r/116787 (owner: Hashar)
[17:48:52] ottomata: elastic1013 dropped off the ganglia map - is there something wrong with rack d there too?
[17:49:05] PROBLEM - NTP on carbon is CRITICAL: NTP CRITICAL: No response from NTP server
[17:49:12] I assume it is from the puppet change setting it as the aggregator
[17:50:16] hm, well possibly, we made it a ganglia aggregator, right
[17:50:16] hm
[17:50:28] hmm, oh there might be more to do for that
[17:51:12] yeah we need to add it to list of aggregators in ganglia.pp
[17:51:16] line 338
[17:51:17] gar
[17:51:17] check it
[17:51:26] (PS1) Faidon Liambotis: hhvm: add hhvm-fastcgi too [operations/puppet] - https://gerrit.wikimedia.org/r/116999
[17:52:07] ottomata: I'll make the hange
[17:52:22] (CR) Faidon Liambotis: [C: 2 V: 2] hhvm: add hhvm-fastcgi too [operations/puppet] - https://gerrit.wikimedia.org/r/116999 (owner: Faidon Liambotis)
[17:53:27] (PS1) Manybubbles: Finish setting up elastic1013 in ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/117000
[17:53:34] paravoid: thanks (a lot) again
[17:53:57] ottomata: sent for review
[17:54:51] (CR) Ottomata: [C: 2 V: 2] Finish setting up elastic1013 in ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/117000 (owner: Manybubbles)
[17:54:55] (CR) Chad: Finish setting up elastic1013 in ganglia (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/117000 (owner: Manybubbles)
[17:55:44] bblack: time for cookies?
[17:56:04] ori: sure
[17:56:21] bblack: shall i go ahead, with you supervising?
[17:56:28] yeah
[17:57:48] (PS2) Ori.livneh: Enable cookie-based geolocation on all frontend text Varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/116928
[17:58:27] before you push that
[17:58:35] let's have a plan for the frontend-restart mess
[17:58:42] (CR) Ori.livneh: [C: 2 V: 2] Enable cookie-based geolocation on all frontend text Varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/116928 (owner: Ori.livneh)
[17:58:47] ori: ^
[18:00:50] I mean, I guess it's just a VCL reload fail, but I'll have to run through all the text frontends manually I guess
[18:02:54] bblack: yeah, i was going to do it one by one
[18:03:08] ok
[18:03:25] just leave a good gap between them so the 5xx rate isn't awful
[18:04:06] well I guess frontends won't be 5xx, they'll be some other issue, but still it's not without impact if you rush through them all
[18:05:42] yep
[18:06:12] load on esams is high: (pre-change)
[18:06:20] are those even varnishes?
[18:07:32] hmm that's right
[18:07:40] we have the old upload amssq* servers that can be converted to text varnish now
[18:07:42] yes, they are varnishes
[18:07:47] so we have 16 more
[18:07:50] just need to be installed
[18:08:01] * mark will ask someone to do that
[18:08:30] sorry, they were text squids
[18:08:36] the current text varnishes in that range were upload squids before that
[18:10:39] * ori nods
[18:12:45] mark: cp301[34] for mobile as well, we never did that
[18:12:49] mark: RT #6360
[18:13:22] we had them reserved in case we needed them for LVS, but I think you told me the 10g cards are arriving RSN
[18:16:11] greg-g, skipping depl today
[18:16:45] still waiting for ops to figure out portal site
[18:18:58] !log Cookie-based geolocation deployed to all text varnishes
[18:19:05] Logged the message, Master
[18:19:10] greg-g: ping
[18:19:35] ori: awesome!
[18:20:10] i'm going to prepare a revert patch, just in case, to have around
[18:21:42] hoo: I'm pretty sure greg-g is traveling today. Deskana is going to be in charge of the LD window
[18:21:53] (PS1) Ori.livneh: [Do not merge unless things break] Disable cookie-based geolocation on text varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/117004
[18:22:09] (CR) Ori.livneh: [C: -1] [Do not merge unless things break] Disable cookie-based geolocation on text varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/117004 (owner: Ori.livneh)
[18:22:33] bd808: he's not here :/ I just want a small window for a Wikidata-Test config. change
[18:23:51] hoo: are you ready now?
[18:24:10] The zero window is blocked out for another 30 mins and yurik passed on using it today
[18:24:11] bd808: Don't want to do that today
[18:24:22] am tired, also aude's not around
[18:24:27] * bd808 nods
[18:24:36] forgot to say that I want it tomorrow
[18:24:44] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC
[18:24:53] I'd say just add yourself to the calendar then and ping greg with an email
[18:24:55] (PS1) Ori.livneh: CentralNotice: set $wgCentralGeoScriptURL to false [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117006
[18:25:03] bd808: will do
[18:26:21] (CR) Ori.livneh: [C: 2] CentralNotice: set $wgCentralGeoScriptURL to false [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117006 (owner: Ori.livneh)
[18:26:29] (Merged) jenkins-bot: CentralNotice: set $wgCentralGeoScriptURL to false [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117006 (owner: Ori.livneh)
[18:26:33] !log ori updated /a/common to {{Gerrit|Ic410bd788}}: CentralNotice: set $wgCentralGeoScriptURL to false
[18:26:41] Logged the message, Master
[18:27:43] !log ori synchronized wmf-config/CommonSettings.php 'Ic410bd788a: CentralNotice: set
$wgCentralGeoScriptURL to false'
[18:27:52] Logged the message, Master
[18:28:48] woot woot woot woot woot
[18:29:12] pages will continue making reqs to bits.wikimedia.org/geoiplookup as long as that html is cached
[18:29:28] but anything that gets regenerated from now on won't, so they'll fall off the face of the earth over the course of the next month
[18:29:46] example: https://en.wikipedia.org/wiki/Equal_Protection_Clause
[18:31:19] (CR) Chad: "recheck" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992 (owner: Chad)
[18:33:27] ori: will you also remove geoiplookup.wm.org entirely at some point?
[18:33:40] and bits.wm.org/geoiplookup I mean
[18:37:29] paravoid: yeah
[18:37:47] awesome
[18:38:24] PROBLEM - Host carbon is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:34] RECOVERY - Host carbon is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[18:50:23] ottomata: that doesn't seem to have healed elastic1013
[18:50:32] ok, will look at it
[18:50:48] i have other things to do for you too, right? oh, 1.0 in apt, right?
[18:50:50] anything else atm?
[18:52:24] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1901.733276
[18:54:24] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:56:51] (PS1) Jgreen: flip pxeboot/tftp from carbon to brewster until carbon is back [operations/puppet] - https://gerrit.wikimedia.org/r/117011
[18:59:01] bblack, you around?
[19:00:08] dr0ptp4kt: yes
[19:00:15] (CR) Jgreen: [C: 2 V: 1] flip pxeboot/tftp from carbon to brewster until carbon is back [operations/puppet] - https://gerrit.wikimedia.org/r/117011 (owner: Jgreen)
[19:00:52] bblack, i was wondering if any luck with the ip addresses thing, as well as with isp
[19:01:10] you mean updates right?
[19:01:24] yep.
we've a review meeting today, and was hoping to provide any pertinent feedback at that time, even if an eta
[19:02:13] I have an interim solution already deployed for you guys to scp json file updates for zero.json and proxies.json, you'll have to manage editing them (by I guess some mechanism other than meta) on your own
[19:02:31] sweet, what's the best way to try it out?
[19:02:44] PROBLEM - Puppet freshness on carbon is CRITICAL: Last successful Puppet run was Wed 05 Mar 2014 04:01:35 PM UTC
[19:09:24] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 294.066681
[19:10:24] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[19:11:18] bblack, thanks for the updates!
[19:11:28] (off channel, that is)
[19:18:50] (PS1) Nemo bis: [gdash] Add some yearly graphs [operations/puppet] - https://gerrit.wikimedia.org/r/117020
[19:23:45] (PS1) Nemo bis: [gdash] Use logscale 10 for reqerror graph, again [operations/puppet] - https://gerrit.wikimedia.org/r/117021
[19:28:39] (CR) Faidon Liambotis: [C: -1] "This broke reqerror last time around and I had to revert it with I55360737d29f7e01f53b23e35c1f89922b9db70f. I don't see anything that addr" [operations/puppet] - https://gerrit.wikimedia.org/r/117021 (owner: Nemo bis)
[19:32:44] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC
[19:42:43] (CR) Ottomata: "Anybody mind if I merge this?"
[operations/puppet] - https://gerrit.wikimedia.org/r/112944 (owner: Ottomata)
[19:44:56] (PS1) Ottomata: [WIP] Adding archiva module [operations/puppet] - https://gerrit.wikimedia.org/r/117024
[19:45:47] (CR) jenkins-bot: [V: -1] [WIP] Adding archiva module [operations/puppet] - https://gerrit.wikimedia.org/r/117024 (owner: Ottomata)
[19:50:39] (PS2) Ottomata: [WIP] Adding archiva module [operations/puppet] - https://gerrit.wikimedia.org/r/117024
[19:54:04] (PS3) Ottomata: [WIP] Adding archiva module [operations/puppet] - https://gerrit.wikimedia.org/r/117024
[19:54:50] (CR) Ottomata: "This depends on:" [operations/puppet] - https://gerrit.wikimedia.org/r/117024 (owner: Ottomata)
[20:01:47] manybubbles: yt?
[20:01:54] yes
[20:02:36] ah bd808 too, good :)
[20:02:45] so, i asked about the logstash package thing
[20:02:52] does the logstash package include elasticsearch, or depend on it?
[20:03:19] ottomata: I think it's independent of it
[20:03:28] ohhh, ok interesting
[20:03:32] (CR) Nemo bis: "Ah. Had you posted a comment on the bug or patch about this followup, I would have been able to do something about it..." [operations/puppet] - https://gerrit.wikimedia.org/r/117021 (owner: Nemo bis)
[20:03:36] so elasticsearch is built into the logstash package?
[20:03:39] But I haven't read the manifest
[20:04:07] ottomata: No. Elasticsearch is one of many possible outputs for logstash
[20:04:56] oh.
[20:04:57] Although the "fat" jar does bundle some version of Elasticsearch I think
[20:05:05] i do not know logstash well, i thought logstash used elasticsearch...
[20:05:24] (CR) Nemo bis: "Removed in I55360737d29f7e01f53b23e35c1f89922b9db70f" [operations/puppet] - https://gerrit.wikimedia.org/r/105614 (owner: Nemo bis)
[20:05:25] It does, but in the same way that MW uses MySQL
[20:05:48] Meaning it's not required
[20:05:53] (CR) Nemo bis: "Revert of I27a8c173825387a8f465180b2d8aac374485bc95" [operations/puppet] - https://gerrit.wikimedia.org/r/107367 (owner: Faidon Liambotis)
[20:06:12] But highly likely to be associated
[20:06:23] right, hm
[20:06:40] ok, i'm looking at logstash.jar, and it looks like all the elasticseach classes are in it
[20:06:56] so ok cool. totally independent of the elasticsearch package
[20:07:11] Yeah, they probably are, but that's not used in our deployment
[20:07:30] our deployment doesn't use es?
[20:07:41] It does, but we run it as a separate service
[20:07:54] ok, so the reason I am asking
[20:07:58] And we use the HTTP connector instead of the java client
[20:08:09] is I don't want us changing the es package in our apt repo to cause you problems
[20:08:19] so, you have your own es cluster for logstash then?
[20:08:31] Yes.
[20:08:40] ok, running 0.9x then I assume?
[20:09:09] !log installing security upgrades on iron
[20:09:09] Yeah. I think the current version from our apt
[20:09:18] Logged the message, Master
[20:09:54] ok so, if we put 1.0 in our apt, i think 0.9 will disappear
[20:10:07] it won't upgrade your cluster, but if you had to add new nodes, your only option would be 1.0
[20:10:21] !log installing php and security upgrades on bast1001
[20:10:28] Logged the message, Master
[20:10:28] ottomata: Understood. I'm running 0.90.11 right now
[20:10:47] ok, so is it ok for us to do this then?
[20:10:48] i guess
[20:10:51] And I have a bug open about upgrading to 1.0.x once Nik has switched
[20:10:58] will you upgrade to 1.0 too?
[20:10:59] ah ok
[20:11:03] cool
[20:11:38] What does "An error has occurred while searching: Pool queue is full" when requesting https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=Swazimac&fulltext=Search&ns0=1&redirs=1&profile=advanced mean??? Are there any known db issues that would cause this?
[20:11:46] alright cool, going to do this then
[20:12:16] Works for me. Make Nik happy :)
[20:13:27] After six tries it went through... guess whatever it was is better now...
[20:13:37] (PS1) Ottomata: reprepro/update elasticsearch version to 1.0 [operations/puppet] - https://gerrit.wikimedia.org/r/117030
[20:17:22] (PS1) Ori.livneh: applicationserver: provision stub HHVM role on Beta Labs [operations/puppet] - https://gerrit.wikimedia.org/r/117035
[20:17:31] (CR) Ottomata: [C: 2 V: 2] reprepro/update elasticsearch version to 1.0 [operations/puppet] - https://gerrit.wikimedia.org/r/117030 (owner: Ottomata)
[20:18:14] (CR) Ori.livneh: [C: 2] contint: install hhvm on labs slaves [operations/puppet] - https://gerrit.wikimedia.org/r/116947 (owner: Hashar)
[20:18:28] ori: what version of graphite are we using?
[20:19:33] Package: graphite-carbon, Version: 0.9.12-1
[20:19:37] ori: thx for hhvm!
[20:19:40] Package: graphite-web, Version: 0.9.12+debian-1
[20:20:00] I have no clue how tu run the tests with hhvm but I am sure it can be figured out somehow
[20:20:02] Nemo_bis: i don't think your patch will work
[20:20:34] hashar: see .travis.yml in the root of core
[20:20:44] mutante: would like to talk about wikibugs when you get a moment
[20:21:00] hashar: https://github.com/wikimedia/mediawiki-core/blob/master/.travis.yml#L54
[20:21:04] bblack, i'm going to try a push to add two ips
[20:21:05] ori: thanks
[20:21:37] ori: sounds easy :]
[20:22:20] matanya: i dunno know anything about wikibugs:)
[20:22:31] that is a good start
[20:22:52] i would like to break it and merge the parts into irc/bugzilla
[20:23:03] mutante does that work for you?
[20:23:03] (PS1) Manybubbles: Update Elasticsearch monitoring for 1.0 [operations/puppet] - https://gerrit.wikimedia.org/r/117037
[20:23:06] manybubbles: http://apt.wikimedia.org/wikimedia/pool/main/e/elasticsearch/
[20:23:06] (CR) Dzahn: [C: 2] contint: Icinga now notify on contint related stuff [operations/puppet] - https://gerrit.wikimedia.org/r/116944 (owner: Hashar)
[20:23:25] (CR) Manybubbles: "Do not merge me until Elasticsearch 1.0 upgrade." [operations/puppet] - https://gerrit.wikimedia.org/r/117037 (owner: Manybubbles)
[20:24:47] matanya: i have no concerns as long as you fix it after breaking it:)
[20:24:57] but that might become hard
[20:25:15] breaking stuff the the core of fixing them
[20:25:20] *is the core
[20:26:00] matanya: i don't know what's broken, so, yea, i abstain
[20:26:23] ok, you can always -2 me :)
[20:26:28] does it not run anymore?
[20:26:31] no, don't add me:)
[20:26:43] there are other reviewers that know way more about that
[20:26:50] yes, it reports all the bugzilla related stuff on irc channels
[20:26:56] so, then why break it
[20:27:15] it isnt running on bugzilla server, see
[20:27:17] because it is in manifests
[20:27:19] that's why i wasnt involved
[20:27:22] it just reads mail
[20:27:32] and i want modules
[20:28:13] hashar: Can I bug you about beta's Varnish setup when you have a few minutes?
[20:28:34] anyway mutante there are enough patches from me you can review :P
[20:29:15] matanya: yes, true!
[20:29:46] please don't count it as bugzilla server, it's not on zirconium
[20:30:11] it will have to move when mail servers move though
[20:30:29] hmm, poor mchenry
[20:30:52] ok, i'll think about it, and find out the best way to deal with it
[20:31:20] (PS1) Jgreen: switch drush on aluminium to the git-installed copy [operations/puppet] - https://gerrit.wikimedia.org/r/117040
[20:32:00] matanya: maybe something like modules ircbots::wikibugs, rather that than bugzilla::, i didnt put it into that module on purpose
[20:32:08] ok, thank you for your help!
[20:32:30] i guessed so, i'll hack it, not sure in what from yet
[20:32:35] *form
[20:33:50] mutante: you can look at https://gerrit.wikimedia.org/r/#/c/116936/ if you have time, i'm moving on.
thank you and good night
[20:35:32] matanya: conflict or half/duplicate of https://gerrit.wikimedia.org/r/#/c/115527/1
[20:35:43] good night
[20:36:14] not really, more like a follow up of thing missing there
[20:37:08] (CR) Dzahn: [C: -1] "the backup files are already removed in I773c188f865880e50fb4b643fb47ae50e4cec03d" [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[20:37:59] (PS2) Jgreen: switch drush on aluminium to the git-installed copy [operations/puppet] - https://gerrit.wikimedia.org/r/117040
[20:38:50] (PS3) Matanya: sockpuppet: remove leftovers [operations/puppet] - https://gerrit.wikimedia.org/r/116936
[20:39:08] matanya: https://gerrit.wikimedia.org/r/#/c/116064/ :) , but you wanted to leave, hehe
[20:39:17] matanya: ok, cool:)
[20:39:41] (CR) Jgreen: [C: 2 V: 1] switch drush on aluminium to the git-installed copy [operations/puppet] - https://gerrit.wikimedia.org/r/117040 (owner: Jgreen)
[20:39:46] that one is a dream coming true. i wanted to leave, but like 7 people poked me :)
[20:40:11] (CR) Dzahn: [C: 1] "yea, heh, i skipped the key, that's true. looks good if it happens as a follow-up" [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[20:40:20] Techman224_away: that is caused by lucenesearch hickuping. we're not sure why but it sometimes does that. is it still having touble?
[20:41:13] <^d> manybubbles: That was Technical_13 who asked, and he left.
[20:41:31] ah, well, it looks like it isn't doing it now
[20:41:34] (CR) Dzahn: "yea, heh, i skipped the key, that's true. looks good if it happens as a follow-up.
but i literally mean "but someone else must approve", t" [operations/puppet] - https://gerrit.wikimedia.org/r/116936 (owner: Matanya)
[20:41:39] I wish I could say more about lsearchd
[20:42:08] <^d> I have lots of things to say about it, most aren't polite ;-)
[20:42:14] matanya: +1 but i want others to confirm that files/deploy/id_rsa.pub is NOT used
[20:42:20] sure
[20:42:27] just that "sockpuppet" is in the key comment.. i don't trust anything there
[20:42:34] it's "deploy" something
[20:43:11] it was used when ryan used sockpuppet to deploy using slat iirc
[20:43:17] *salt
[20:43:22] (CR) Hashar: "Love the idea of finally testing hhvm on beta. I left a few note around :]" (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/117035 (owner: Ori.livneh)
[20:43:52] hey ori, you there?
[20:44:02] (CR) Nemo bis: "I remember reading this bug: https://bugs.launchpad.net/graphite/+bug/925635" [operations/puppet] - https://gerrit.wikimedia.org/r/117021 (owner: Nemo bis)
[20:44:06] (CR) Chad: [C: 2] Wikiquotes no longer building [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992 (owner: Chad)
[20:44:33] (Merged) jenkins-bot: Wikiquotes no longer building [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116992 (owner: Chad)
[20:46:11] !log demon synchronized wmf-config/InitialiseSettings.php 'Wikiquote Cirrus indexes done building, Beta for all!'
[20:46:20] Logged the message, Master
[20:48:47] !log starting NTP on analytics1004
[20:48:55] Logged the message, Master
[20:48:58] ottomata: ^ i did that because it had a critical in icinga
[20:49:15] the "offset unknown" thing
[20:51:54] hm, weird ok
[20:51:56] thanks
[20:52:25] hm, ok , manybubbles, elasticsearch ganglia fixed
[20:52:34] yay!
[20:52:43] so
[20:52:51] ACKNOWLEDGEMENT - Disk space on virt10 is CRITICAL: DISK CRITICAL - free space: /var/lib/nova/instances 43596 MB (3% inode=99%): daniel_zahn checked with Coren.
will die soon anyways and new instance creation is disabled
[20:52:59] i'm not sure if the sysctl puppet module works properly, or maybe there is some chicken/egg issue there
[20:53:13] sorry!
[20:53:16] wikimedia base sysctl is supposed to increase net.core.rmem_max
[20:53:32] but, on most of the es nodes
[20:53:37] it was at the base install level
[20:53:43] i had to notify procps to get it to load the proper settings
[20:53:50] puppet is supposed to do that when it changes them for the first time
[20:54:02] gmond sets a higher udp recv buffer
[20:54:17] and it wasn't starting on es1013 because rmem_max was too low
[20:54:31] ACKNOWLEDGEMENT - RAID on ms-be5 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) daniel_zahn hardware failure with existing ticket. https://rt.wikimedia.org/Ticket/Display.html?id=6555
[20:55:23] hey manybubbles, whatcha think of this? :)
[20:55:24] https://gerrit.wikimedia.org/r/#/c/117024/3/modules/archiva/files/archiva-gitfat-link
[20:55:42] I gotta read it, I gurees
[20:55:51] (CR) Matanya: "see inline comments and questions" (14 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/116064 (owner: Dzahn)
[20:56:10] ACKNOWLEDGEMENT - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC daniel_zahn related to labs migration
[20:56:10] ACKNOWLEDGEMENT - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC daniel_zahn related to labs migration
[20:56:27] mutante: special review for you :)
[20:58:43] (CR) Manybubbles: "cool" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/117024 (owner: Ottomata)
[20:59:01] haha, thanks :)
[21:01:00] manybubbles: are the any other pending pre upgrade todos for me that i've missed?
[21:01:30] ottomata: import the deb is the only one before the window I think
[21:01:43] I'll queue everything for the start of the window up tomorrow morning
[21:01:51] already started with monitoring today
[21:01:57] matanya: wow ! haha, 14 comments, i'll read in a bit, but you should run now
[21:02:08] ok cool
[21:02:12] * addshore wonders who to poke about getting added to the LDAP group for graphite access? :)
[21:02:57] addshore: RT, it's always RT you should poke :)
[21:03:07] just mail it
[21:03:14] and paste the same line:)
[21:03:24] I can poke RT without having access to RT? ;p
[21:03:35] yes, see topic
[21:03:49] ahh :) lovely :P
[21:03:51] addshore: you do have access
[21:03:54] if you have email
[21:04:01] it autocreates a user
[21:04:12] the other question is if you also have permissions to see all the other bugs
[21:04:29] but you can definitely create tickets that will be processed and mail you
[21:04:39] (CR) Ori.livneh: applicationserver: provision stub HHVM role on Beta Labs (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/117035 (owner: Ori.livneh)
[21:04:50] (PS2) Ori.livneh: applicationserver: provision stub HHVM role on Beta Labs [operations/puppet] - https://gerrit.wikimedia.org/r/117035
[21:05:07] lovely mutante
[21:09:08] RECOVERY - NTP on analytics1004 is OK: NTP OK: Offset -0.002113580704 secs
[21:12:56] there, now if we fix those 2 cert warnings, we are down to 0 un-ACKed CRITs again
[21:13:15] and host-wise, just 'rhodium' has been down for a while
[21:13:19] anyone have updates on that?
[21:13:38] notifications are disabled,so can't be that critical, but is it coming back?
[21:15:08] (CR) Hashar: "PS1 server.hdf can be made a template later on."
(1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/117035 (owner: Ori.livneh)
[21:15:13] (PS3) Ori.livneh: applicationserver: provision stub HHVM role on Beta Labs [operations/puppet] - https://gerrit.wikimedia.org/r/117035
[21:18:51] (PS4) Ori.livneh: applicationserver: provision stub HHVM role on Beta Labs [operations/puppet] - https://gerrit.wikimedia.org/r/117035
[21:20:08] (CR) Hashar: [C: 1 V: 1] "Reviewed via IRC. Go for it and thank you for using the beta cluster! =]" [operations/puppet] - https://gerrit.wikimedia.org/r/117035 (owner: Ori.livneh)
[21:21:43] (PS1) Ori.livneh: [Do not merge unless breakage occurs] Revert "CentralNotice: set $wgCentralGeoScriptURL to false" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117086
[21:22:52] (CR) Ori.livneh: [C: 2] applicationserver: provision stub HHVM role on Beta Labs [operations/puppet] - https://gerrit.wikimedia.org/r/117035 (owner: Ori.livneh)
[21:23:36] !log killing rhodium from puppet stored configs and icinga, was already removed from site.pp in change 115638
[21:23:45] Logged the message, Master
[21:39:13] (PS2) BryanDavis: Additional ssh key for Bryan Davis. [operations/puppet] - https://gerrit.wikimedia.org/r/116014
[21:39:32] (Abandoned) BryanDavis: Additional ssh key for Bryan Davis.
[operations/puppet] - https://gerrit.wikimedia.org/r/116014 (owner: BryanDavis) [21:39:45] (PS3) Dzahn: WIP - turn RT from misc/* into puppet module [operations/puppet] - https://gerrit.wikimedia.org/r/116064 [21:40:17] (CR) Dzahn: WIP - turn RT from misc/* into puppet module (12 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/116064 (owner: Dzahn) [21:41:03] (CR) jenkins-bot: [V: -1] WIP - turn RT from misc/* into puppet module [operations/puppet] - https://gerrit.wikimedia.org/r/116064 (owner: Dzahn) [21:45:34] (PS4) Dzahn: WIP - turn RT from misc/* into puppet module [operations/puppet] - https://gerrit.wikimedia.org/r/116064 [21:49:12] (CR) Dzahn: [C: 2] nrpe: remove hard coded disk checks [operations/puppet] - https://gerrit.wikimedia.org/r/110880 (owner: Matanya) [21:53:31] (CR) Dzahn: "is this file really not referenced anywhere? that would be a little odd, then i'd question indeed why it's here in puppet repo" [operations/puppet] - https://gerrit.wikimedia.org/r/116953 (owner: Matanya) [21:54:03] (CR) Dzahn: [C: 1] php: remove hardy setting, no more hardy app servers [operations/puppet] - https://gerrit.wikimedia.org/r/116953 (owner: Matanya) [21:56:57] (CR) Dzahn: [C: 1] "+1, just one minor comment about puppet-doc not finding the comments if you have newlines" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/115133 (owner: Hashar) [22:02:21] (CR) Dzahn: [C: 2] facilities: lint [operations/puppet] - https://gerrit.wikimedia.org/r/110339 (owner: Matanya) [22:07:07] (CR) BryanDavis: "As a follow up to Antoine's comment about using `git init --shared=`, I tested with local git operations and found that a `git init --shar" [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn) [22:08:05] (PS1) Manybubbles: Elasticsearch upgrade starting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117095 
[22:08:21] (CR) Manybubbles: [C: -1] "No merging until the release window." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/117095 (owner: Manybubbles) [22:09:55] (CR) Dzahn: "thanks, still works as before, no changes seen on neon" [operations/puppet] - https://gerrit.wikimedia.org/r/110339 (owner: Matanya) [22:11:27] bd808: i see.. abandon is reasonable ? [22:11:57] per "This fixes a symptom, but the real problem is the behavior/configuration of git:clone. " [22:12:44] mutante: yeah. I don't think that approach is going to fix the problem. And it's spreading. Basically we just did that wrong. [22:13:07] (Abandoned) Dzahn: fix wrong scap/doc permissions.live hack->puppet [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn) [22:13:10] I'm tempted to write a patch to deploy scap using trebuchet [22:13:24] bd808: gotcha, alright [22:13:42] what.. deploy scap using trebuchet? hah!? [22:13:47] :) [22:13:57] Ryan_Lane1: :) [22:14:10] (CR) Hashar: "Thanks Bryan for the test." [operations/puppet] - https://gerrit.wikimedia.org/r/115851 (owner: Dzahn) [22:14:17] bd808: I love git :] [22:14:18] yea, thanks for testing [22:14:53] mutante: git clone comes with a bunch of default behaviors [22:15:14] for scripting one could use git init / git config then add the remote, fetch from it and checkout whatever needed [22:15:27] so you can then finely tune what you want to do by passing relevant options at each stage [22:15:38] in our case git clone is fine and we can tweak settings after [22:15:52] would need to try out locally probably [22:15:56] all i wanted to do was make sure a live hack gets into puppet :) [22:16:05] (PS1) Manybubbles: Remove cirrus jobs from priority list [operations/puppet] - https://gerrit.wikimedia.org/r/117096 [22:16:10] Oh. Yeah I didn't test with forced mask. I should have [22:16:13] and you end up fixing git::clone! 
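Editor's sketch of the "git init / git config, then add the remote, fetch and checkout" sequence described at 22:15:14, so that options can be tuned at each stage. This is only an illustration, not the git::clone puppet code: the function names, the example URL and the choice of `core.sharedRepository` are assumptions. It passes `cwd` to each step instead of using `git -C`, on the assumption that the older git on Precise may not support that flag.

```python
import subprocess
from typing import Dict, List, Optional, Tuple

# One step = (working directory or None, argv list).
Step = Tuple[Optional[str], List[str]]


def manual_clone_steps(remote_url: str, dest: str, ref: str = "master",
                       shared: Optional[str] = None,
                       config: Optional[Dict[str, str]] = None) -> List[Step]:
    """Build the staged equivalent of a single `git clone` (hypothetical helper)."""
    init = ["git", "init"]
    if shared is not None:
        # e.g. shared="group" makes the repository group-writable
        init.append("--shared=" + shared)
    init.append(dest)

    steps: List[Step] = [(None, init)]
    # Per-repository settings are applied before anything is fetched.
    for key, value in (config or {}).items():
        steps.append((dest, ["git", "config", key, value]))
    steps.append((dest, ["git", "remote", "add", "origin", remote_url]))
    steps.append((dest, ["git", "fetch", "origin"]))
    steps.append((dest, ["git", "checkout", ref]))
    return steps


def run_steps(steps: List[Step]) -> None:
    """Run each step in order; stop at the first failing command."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_steps(...) to actually execute it.
    plan = manual_clone_steps("https://example.org/repo.git", "/tmp/repo",
                              shared="group",
                              config={"core.sharedRepository": "group"})
    for cwd, argv in plan:
        print(cwd, " ".join(argv))
```

Building the command list separately from running it keeps the plan easy to inspect or log before anything touches the disk.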
[22:16:18] yea:) [22:16:24] (CR) Manybubbles: [C: -1] "No merging until a few hours before the window." [operations/puppet] - https://gerrit.wikimedia.org/r/117096 (owner: Manybubbles) [22:16:26] also Precise has a rather "old" git version [22:16:35] while your local git might be more recent [22:17:22] (CR) Manybubbles: "This is safe to merge a bit before the window so we can get it on the job runners. It'll cause a bit of a slow down in cirrus jobs but th" [operations/puppet] - https://gerrit.wikimedia.org/r/117096 (owner: Manybubbles) [22:18:10] I'm going to use my testing window to do a "no-op" scap now unless anyone has a strong objection. Greg cleared it yesterday. [22:18:47] ^d: when searching for your name on gerrit i find "Normal User", fwiw [22:21:44] !log bd808 scap failed: AttributeError 'int' object has no attribute 'find' (duration: 00m 00s) [22:21:52] Logged the message, Master [22:22:00] That's quite a test [22:22:32] !log scap broken; working on a fix [22:22:40] Logged the message, Master [22:22:47] heh, sounds like it was a good idea to run the no-op [22:22:54] yeah [22:26:22] (CR) Dzahn: [C: 1] webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - https://gerrit.wikimedia.org/r/112423 (owner: Matanya) [22:28:17] (CR) Dzahn: [C: 1] "yes, but only after Change-Id: I773c188f865880e50fb4b643fb47ae50e4cec03d" [operations/dns] - https://gerrit.wikimedia.org/r/115688 (owner: Matanya) [22:30:45] (CR) Dzahn: [C: -1] "yea, i was about to say this is a ton of stuff to change in one single change, and then what Ottomata said, this should be more than one m" [operations/puppet] - https://gerrit.wikimedia.org/r/113966 (owner: Matanya) [22:31:01] <^d> mutante: I have 3 users in gerrit :p [22:31:06] <^d> 2 of them were for testing [22:31:20] ^d: i figured, ok:) [22:31:46] i added you to some changes for search and solr by matanya, heh [22:31:59] they are not huge but i'm not sure either if we 
want that at this point [22:32:43] oh, and i removed my -1 on the change to remove tampa appservers from dsh [22:33:33] paravoid: ^ are you still opposed to that? [22:33:54] appservers should go towards the end no? [22:35:03] i mean https://gerrit.wikimedia.org/r/#/c/108070/ [22:35:25] (CR) Dzahn: [C: 1] Remove old Tampa srv* and mw* apaches from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/108070 (owner: Chad) [22:36:26] matanya: yes, it's more about the timing [22:37:15] matanya: see the comment on PS4 [22:37:19] "appservers in Tampa still work and serve as a backup" - "scap is hierarchical now, i don't see why it slow down everything" [22:37:31] but then also see comment by Reed [22:37:46] "I also note that the database servers would seem to be mostly decommissioned in tampa" [22:37:56] all those 243 boxes going to be shut down? [22:38:09] the question is when [22:38:18] amazing [22:38:50] and if it makes sense to keep despite the db server comment [22:39:10] so we just pay power bills for idle servers? [22:39:33] i'm not sure if it was metered or not [22:39:41] i'm thinking of my CEO's response for such a case, he would go nutz [22:39:44] but yea, if you have hot standby, that's how it is [22:40:29] if those hot standbys were in eqiad would it make any diff? [22:40:32] see ganglia for their traffic, i think what is left is monitoring itself [22:40:40] and network [22:40:45] matanya: They would be useless, mostly :P [22:41:01] yea, well, part of the point is being able to switch over to different dc, right [22:41:09] but then, we are also going to get rid of the other one [22:41:19] and we're going to get a new one [22:41:24] but the RFP should have ended by now [22:41:25] so it comes down to the timing [22:41:41] yes, but like a day or 2 ago [22:41:49] (CR) MaxSem: [C: 1] Remove old Tampa srv* and mw* apaches from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/108070 (owner: Chad) [22:41:56] wast coast? 
[22:42:14] west [22:42:16] more west than eqiad, yes [22:48:25] (CR) Hoo man: "Will the network latency coming from the writes to the other DC slow down beta (significantly)?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115623 (owner: Hashar) [22:53:34] matanya: The RFP was for "Chicago and points west" [22:53:35] !log bd808 Started scap: no-diff scap to test script changes [22:53:42] Logged the message, Master [22:53:47] yes, i know RoanKattouw [23:05:14] RobH: ping [23:05:28] ? [23:05:36] Yes? [23:05:58] hey, we are considering taking the labs migration as the opportunity to migrate parsoid rt testing to physical hardware [23:06:29] the motivation is a) the ability to actually use performance data, and b) better performance and reliability [23:07:01] I was told that you are the person to talk to about such requests [23:07:25] So I assign the actual hardware from our spares, or I put in the requests for new hardware [23:07:40] i do the assigning, but mgmt signs off on the allocation [23:07:55] ok, so should we send a mail to the ops list detailing our needs? [23:07:59] So, you have a project in labs and you want to move it to bare metal then? [23:08:04] yes [23:08:40] well, basically mark and faidon need to understand what you want to do and why you want bare metal to do it [23:08:50] as those two guys are the ones who will review for approval [23:09:14] ok [23:09:27] then if they approve, you and i can spec out the hardware you need and get quotes for purchase approval [23:09:45] makes sense [23:09:51] I'll mail them directly then [23:09:57] so ops list is prolly a wider audience than you need to go with, but it wouldn't exactly be a bad thing. At minimum you want to put in a procurement ticket in RT and detail it all [23:10:17] so if you put in a ops-request ticket, it would also work, just make sure they are aware of it [23:10:27] (and me, you can cc me on said RT ticket ;) [23:10:40] that sounds like a plan [23:10:44] thanks! 
[23:11:09] quite welcome [23:12:04] the alternative is you simply put in procurement ticket asking for them, then all this stuff i talked about i end up dragging out of you via rt ticket ;] [23:12:22] ;) [23:21:00] !log bd808 Finished scap: no-diff scap to test script changes (duration: 27m 25s) [23:21:09] Logged the message, Master [23:55:41] ori: know about the right LDAP group for graphite access without having to dig for it? [23:56:46] addshore: https://wikitech.wikimedia.org/wiki/Gdash.wikimedia.org is not enough for your purposes i suppose? [23:56:52] mutante: wmf ldap group in wikitech ldap [23:57:14] <^d> What bd808 said. [23:58:19] thanks guys [23:58:26] <^d> yw [23:58:37] mutante: See also https://rt.wikimedia.org/Ticket/Display.html?id=6293 for a quest to open that up to all with NDAs [23:58:52] maybe gdash is also enough [23:59:03] or can have more stuff on it that doesnt need to be behind auth [23:59:20] ok, i'll link that to other ticket [23:59:43] see how gdash is a subset of graphite without needing access [23:59:48] per wikitech