[00:02:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.562 second response time [00:05:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [00:10:00] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 268105 bytes in 7.125 second response time [00:28:01] Attempts to use catscan2 (http://tools.wmflabs.org/catscan2/catscan2.php ...) are throwing error 500 [01:13:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [01:55:30] PROBLEM - Puppet freshness on lanthanum is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 04:37:38 PM UTC [02:20:50] !log LocalisationUpdate completed (1.23wmf15) at 2014-03-03 02:20:49+00:00 [02:21:06] Logged the message, Master [02:23:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:00] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 268117 bytes in 7.385 second response time [02:37:24] !log LocalisationUpdate completed (1.23wmf16) at 2014-03-03 02:37:24+00:00 [02:37:33] Logged the message, Master [02:43:30] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 05:25:59 PM UTC [02:57:25] is jenkins down https://integration.wikimedia.org/ci/ [02:59:26] I get Error: 503, Service Unavailable at Mon, 03 Mar 2014 02:54:17 GMT [03:06:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [03:08:35] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-03-03 03:08:34+00:00 [03:08:43] Logged the message, Master [04:14:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [04:28:24] !log s3 testing semi-synchronous replication 
[04:28:32] Logged the message, Master [04:44:00] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [04:50:40] (CR) Ori.livneh: "Why not load *one* font by default via MediaWiki:Common.css?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115153 (owner: Odder) [04:56:30] PROBLEM - Puppet freshness on lanthanum is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 04:37:38 PM UTC [05:03:06] Looks like jenkins may be sick. I'm seeing test and gate-and-submit jobs backup and I can't pull up the UI. [05:43:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [05:44:30] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 05:25:59 PM UTC [06:07:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [06:27:40] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /a/log/webrequest/packet-loss.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [06:29:40] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active [07:15:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [07:46:54] um, paravoid since you are around, can you please tell me if there is a time frame i will get some reviews? some of my patches are waiting over a month [07:47:02] hi [07:47:23] akosiaris was telling me that he has a catalog diff tool that would help have a faster turnaround [07:47:31] I was hoping that would happen last week, I guess it didn't? 
[07:47:41] it's a holiday here today, so don't expect him today [07:47:48] no, only two got through [07:47:57] :( [07:48:19] only since feb 19 [07:48:58] the last review was feb 19 for the huge work he did on the site.pp lint [07:49:22] since then you did one, and the rest were decomes [07:49:36] I'll I'm sorry [07:49:42] I'll raise it with the team [07:50:35] honstly, if it just adds a bottle neck, i can take a break [07:50:42] no, they're useful [07:57:30] PROBLEM - Puppet freshness on lanthanum is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 04:37:38 PM UTC [08:15:44] (PS1) ArielGlenn: disable dumps rsync cron jobs temporarily [operations/puppet] - https://gerrit.wikimedia.org/r/116463 [08:34:04] (PS17) Matanya: etherpad: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/107567 [08:40:24] (CR) ArielGlenn: [C: 2 V: 2] disable dumps rsync cron jobs temporarily [operations/puppet] - https://gerrit.wikimedia.org/r/116463 (owner: ArielGlenn) [08:44:51] (PS7) Matanya: Torrus: add torrus to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/108314 [08:45:30] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 05:25:59 PM UTC [09:06:34] (PS4) Matanya: sudo: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/111189 [09:08:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [09:52:44] (PS4) Tim Landscheidt: Tools: Install package supybot [operations/puppet] - https://gerrit.wikimedia.org/r/112202 [09:54:13] (CR) Hashar: [C: 1] Tools: Install package supybot [operations/puppet] - https://gerrit.wikimedia.org/r/112202 (owner: Tim Landscheidt) [10:09:08] paravoid: morning :D I got a rather weird issue on the misc varnish caching the 404 pages. I would want to disable that feature but I am not sure what is the best course of action. 
[10:09:13] paravoid: should I just raise that on ops list? [10:16:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [10:58:30] PROBLEM - Puppet freshness on lanthanum is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 04:37:38 PM UTC [11:02:23] (PS1) Ori.livneh: geoip.inc.vcl: don't increment loop counter twice [operations/puppet] - https://gerrit.wikimedia.org/r/116469 [11:46:30] PROBLEM - Puppet freshness on gallium is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 05:25:59 PM UTC [11:47:56] bah [11:51:17] (PS1) Hashar: contint: maven_webproxy is not using autoload layout [operations/puppet] - https://gerrit.wikimedia.org/r/116473 [11:51:40] apergos: mind merging a contint change please? A puppet class is in a file which is wrongly named https://gerrit.wikimedia.org/r/116473 [11:55:59] (CR) Matanya: [C: 1] contint: maven_webproxy is not using autoload layout [operations/puppet] - https://gerrit.wikimedia.org/r/116473 (owner: Hashar) [11:56:18] matanya: sorry missed that in the previous review [11:56:27] it is ok [11:58:56] (CR) ArielGlenn: [C: 2] contint: maven_webproxy is not using autoload layout [operations/puppet] - https://gerrit.wikimedia.org/r/116473 (owner: Hashar) [12:03:46] nice [12:05:01] RECOVERY - Puppet freshness on lanthanum is OK: puppet ran at Mon Mar 3 12:04:59 UTC 2014 [12:08:34] \O/ [12:09:11] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Mon Mar 3 12:09:09 UTC 2014 [12:09:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [12:33:23] apergos: mind merging https://gerrit.wikimedia.org/r/#/c/110880/ ? 
[12:41:16] I don't have the free brain cells to double check it right now; I'll look at it on my next decent break though [12:41:44] thank you [12:59:51] (PS1) Nuria: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [13:07:19] (CR) Matanya: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. (5 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 (owner: Nuria) [13:07:30] nuria: ^ :) [13:17:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [14:08:05] (PS2) Nuria: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [14:13:34] nuria: they are still there [14:14:32] what? ah I know , did not add file changed , i bet [14:15:46] (PS3) Nuria: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [14:16:42] sorry, one more still there nuria [14:25:18] argh, yes, [14:33:49] (CR) Addshore: [C: 1] Redirect all *.wikidata.org subdomains to www.wikidata.org [operations/apache-config] - https://gerrit.wikimedia.org/r/113972 (owner: Thiemo Mättig (WMDE)) [14:39:16] (PS4) Nuria: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [14:39:51] matanya, now it should be ok, sorry about those. 
[14:53:00] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [15:10:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [15:10:40] (PS1) coren: Tool Labs: adapt webservice scripts for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/116488 [15:11:32] (PS1) Ottomata: Setting up rsync job to copy kafkatee generated zero logs from analytics1003 to stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/116489 [15:11:47] (CR) Ottomata: [C: 2 V: 2] Setting up rsync job to copy kafkatee generated zero logs from analytics1003 to stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/116489 (owner: Ottomata) [15:12:06] (PS2) coren: Tool Labs: adapt webservice scripts for eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/116488 [15:13:42] (CR) coren: [C: 2] "I agree with myself." [operations/puppet] - https://gerrit.wikimedia.org/r/116488 (owner: coren) [15:34:45] (CR) Ottomata: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 (owner: Nuria) [15:50:51] (CR) Chad: "Tired of the spam from this change so removing myself from review list. Kindly don't readd me." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115412 (owner: Odder) [15:54:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [16:09:35] !jenkins visualeditor-doitall [16:10:00] !jenkins is https://integration.wikimedia.org/ci/job/$1 [16:10:01] Unable to modify db, access denied, link to database isn't valid [16:10:04] ... [16:18:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [16:28:50] (PS5) Ottomata: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. 
[operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 (owner: Nuria) [16:31:54] ottomata: https://rt.wikimedia.org/Ticket/Display.html?id=6939 any chance we could get that done soonish? [16:32:48] (PS1) ArielGlenn: Revert "disable dumps rsync cron jobs temporarily" [operations/puppet] - https://gerrit.wikimedia.org/r/116494 [16:33:24] huh [16:33:34] proxy error on labs? [16:33:46] toolllabs atleast [16:35:09] Wiki13: talk about it in #wikimedia-labs :-] [16:35:19] oh ok [16:35:22] (PS2) ArielGlenn: Revert "disable dumps rsync cron jobs temporarily" [operations/puppet] - https://gerrit.wikimedia.org/r/116494 [16:35:22] will do [16:37:55] (CR) ArielGlenn: [C: 2] Revert "disable dumps rsync cron jobs temporarily" [operations/puppet] - https://gerrit.wikimedia.org/r/116494 (owner: ArielGlenn) [16:38:29] (CR) Ottomata: [WIP] Changes in puppet to support a testing db for wikimetrics in vagrant. (4 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 (owner: Nuria) [16:42:40] ^d: Has anyone re-added you to that commit before? [16:43:03] Because I'm not sure why you guys keep repeating that (hashar did that a couple of days ago, too) [16:43:30] which commit? [16:43:39] the disable flow on meta drama? :D [16:44:14] Yes [16:44:39] Well, I should've said 'patch' instead of 'commit', but you know what I mean [16:56:15] as nobody seems to take care of my rt ticket: Anybody around who could do a small Wikidata client deployment between 18:30 and 19:00 UTC? [16:56:38] 10:30 - 11:00 in SF [16:57:43] Reedy: hashar: ^d: ^ [16:58:06] (PS1) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498 [16:59:17] (PS2) Manybubbles: Config changes for Elasticsearch [operations/puppet] - https://gerrit.wikimedia.org/r/116498 [17:00:20] (PS6) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. 
[operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:03:12] <^d> hoo: No, I never removed myself, but I removed myself now and don't want to be readded by anyone :) [17:03:42] ^d: Removed, readded.... hwat? [17:03:48] the change's not even up [17:03:54] He meant to address odder, not you [17:03:55] . [17:04:08] oh well :P [17:04:20] <^d> Meh, caffeine time. [17:04:39] ^d: and deploy time later on? [17:05:06] <^d> I have a deploy today? [17:05:26] ^d: No, but it would be great if you could do one for us [17:05:37] aude is on holiday and my mortal access is still stuck in rt [17:06:04] * ^d has been on vacation and is still playing catch-up. [17:06:24] (PS1) Manybubbles: Temporarily disable hot_threads logging [operations/puppet] - https://gerrit.wikimedia.org/r/116500 [17:07:14] ok, will look for someone else, then [17:07:31] What's the change? [17:08:01] Gloria: https://gerrit.wikimedia.org/r/116495 [17:09:39] Daniel can't deploy? [17:09:54] Gloria: how should he, he doesn't even ahve shell [17:10:12] Perhaps that's the bug. [17:10:24] Gloria: I have shell and a ticket for mortal open, but well [17:10:33] slowness on all layers :P *hides* [17:10:48] Measure twice, cut once. [17:15:23] (PS7) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:20:58] greg-g: Who's going to do the deployment starting in 10mins? Any chance I could get something deployed after? [17:21:27] Oh, that's right. AFTv5 dies today, allegedly. [17:22:45] hoo: "Responsible Party: Mark H " == rdwrer [17:23:06] rdwrer: After you're done, would you mind deploying a small Wikidata fix? [17:23:14] greg-g: Thanks :) [17:23:27] hoo: what is it? [17:24:01] apergos: on RT duty today, or is topic old? 
[17:24:09] greg-g: https://gerrit.wikimedia.org/r/#/c/116495/ [17:24:24] greg-g: ottomata is this week, the topic's outdated [17:24:29] ...oh [17:24:31] EFF [17:24:39] greg-g: I knew a 09:30 deploy was a bad idea [17:24:47] It's OK, I can do it [17:24:49] heh [17:24:57] sleepy head [17:24:58] rdwrer: Thanks, will prepare the patches then [17:25:14] to the deployment branch, I mean [17:25:15] greg-g: I got two hours of sleep Sunday morning, then spent the day skiing, yeah I slept [17:25:17] apergos: if you're on RT: https://gerrit.wikimedia.org/r/#/c/116014/ plz [17:25:26] Wait wait wait [17:25:35] rdwrer: ? [17:25:35] greg-g: I admit I am.. not on rt! yay [17:25:42] hoo: Anything complicated for Wikidata, or is it just extension updates? [17:25:44] apergos: :P who is? [17:25:46] ottomata has it I believe [17:25:59] ottomata: https://gerrit.wikimedia.org/r/#/c/116014/ plz :) [17:26:13] ottomata: for bryan doing the deploy tomorrow [17:26:56] rdwrer: It's a small fix to our Scribunto bindings [17:27:25] hoo: Is it an extension update or something else? [17:27:37] I haven't touched Wikidata before [17:27:54] rdwrer: Extension update, you just ahve to sync the extensions/Wikidata dir [17:27:55] rdwrer: only do it if you feel 100% confident, we can move it back [17:28:00] I think I have an open bug report about the indentation in admins.pp being insane. [17:28:06] greg-g: I'm fine [17:28:08] s/indentation/whitespace/ [17:28:14] Gloria: Yeah, we talked about that [17:28:20] rdwrer: k [17:28:29] problem are the change conflicts [17:28:32] rdwrer: (I meant the wikidata one, for the avoidance of doubt) [17:28:36] Right [17:28:36] * hoo still has a patch to it open... [17:28:46] https://bugzilla.wikimedia.org/show_bug.cgi?id=60277 [17:28:51] hoo: I'll happily do it, tell me what patch it is [17:30:36] rdwrer: it's this one https://gerrit.wikimedia.org/r/116495 [17:30:36] bd808: is that right? you created an ssh key on tin? [17:30:41] is that the right thing to do? 
[17:30:45] (PS8) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:31:12] hoo: You get bonus points for using the permalink - I'll do it, and you said you were backporting it to the wmf16 branch? [17:31:20] ottomata: Yes. It was the fix for ssh-agent not being able to keep up with a full fanout dsh suggested by hashar [17:31:22] ottomata: until we figure something else out, it's the only thing that'll let him scap without fail [17:31:35] and chad, and anyone else, really [17:31:48] can you just set your ssh-agent timeout to much larger? [17:32:01] Which is why I wanted a separate key for it so it's easy to revoke [17:32:21] ottomata: Maybe? OS X ssh-agent seems to be mysterious [17:32:52] Mysterious? [17:33:04] re: dsh and agent fail.. .use office wifi [17:33:06] (PS9) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:33:15] no kidding, if the network is a little slower it works better [17:33:41] remembers from sync-apache [17:33:54] Gloria: Mysterious == "started by strange interactions at a distance baked into their ssh client" [17:34:23] so,like , maybe if you limit your bandwidth [17:34:30] ottomata: I think the timeout that happens is from the cluster host side, not my agent [17:35:30] ottomata: See https://gist.github.com/bd808/3ab3912093f03c5a6a9b for a dump os what I see when running a full fanout dsh [17:36:08] hoo: Your thing all set to go? [17:36:15] The `-oSetupTimeout=10` may be the cause, but that's hardcoded into many scap scripts [17:36:23] rdwrer: waiting for jenkins... [17:36:30] hmmm [17:36:58] ottomata: Adding a batch even with a batch size of 80 works [17:37:34] But if there are ideas on how to fix my ssh-agent I'm game to try them [17:37:44] hoo: Oh, and, are you pushing this to wmf16 or wmf15 too? 
[17:37:54] both, as it's a client fix [17:38:10] Ah k [17:38:57] bd808 [17:39:00] what if you do [17:39:32] ssh-add -t 86400 (or whatever) .ssh/id_rsa-whatever [17:39:33] ? [17:41:55] ottomata: `ssh-add -t` is for expiring the private key from the agent. My problem isn't that the key has been dropped from the agent; it's that the agent can't keep up with the inbound request rate from the ssh hosts in the cluster [17:42:15] ottomata: bd808, make the network slow somehow and try again [17:42:30] it's about things going too fast for the agent somehow [17:42:32] and then it dies [17:42:35] oh, you are running dsh from your local? [17:42:44] no from tin [17:43:08] really? weird [17:43:08] ottomata: I run dsh on tin but with my agent forwarded to tin from my laptop [17:43:12] right ok [17:43:25] and this isn't a timeout problem? its a ssh is too fast for agent problem? [17:43:26] my work-around in the past was to go on a slower network for this reason [17:43:31] haha [17:43:40] so wrong :) [17:43:40] if things don't go that fast the agent process survives :P [17:43:45] rdwrer: https://gerrit.wikimedia.org/r/116508 and https://gerrit.wikimedia.org/r/116509 [17:43:53] hmm, ok werid [17:43:58] So this key on tin would allow me to stop doing `ssh -A tin` and instead `ssh-agent; ssh-add` on tin [17:44:08] it's a thing on the local machine [17:44:08] bd808: i'm ok with that, adding a key on tin…but i'm not sure other ops think that is cool [17:44:18] have you heard from anyone on ops aboutthat? [17:44:35] don put private keys on servers [17:44:45] except tim and antoine and ... [17:44:56] <{{Guy}}> Why did don do that? [17:44:56] ottomata: Well I guess that's the question. I have been told that I wouldn't be the first to do this, but that doesn't mean it's blessed [17:45:05] hoo: You can self-merge, I'll make sure the extension updates are sane on tin [17:45:09] consistency is awesome [17:45:14] rdwrer: will do [17:45:35] ... 
and done [17:45:36] I haven't made my core commits yet, so no rush [17:47:43] ottomata: I can write an email about it to ops-l and ask I guess. Or add a bunch of opsen to the patch and hope for discussion there. [17:47:46] yeah, mutante, he has created the key on tin himself [17:47:50] he did not copy the key there [17:48:05] yeah, bd808, i don't think I can approve that one myself [17:48:06] this'll be fun [17:48:16] you'd need more blessing, so ja :/ [17:48:21] * bd808 goes off to poke a hornet's nest [17:48:25] haha [17:48:33] put on your flamewar suit [17:49:05] (PS10) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:49:06] Well it comes down to this: I'm supposed to be deploying "the train" and I can't run half the scripts right now [17:49:27] hoo: You sure? [17:49:37] bd808: Is this a new issue? Or post-scap rewrite? [17:49:53] ottomata: bd808 yeah, this literally blocks bd808 from being ablt to do the work he is assigned to do, notably, update mediawiki [17:49:56] hoo: You need to V+2 and submit too [17:49:58] rdwrer: pretty much... anything to worry? [17:50:00] Gloria: old [17:50:09] (PS11) Nuria: Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 [17:50:12] (CR) Ottomata: [C: 2 V: 2] Changes in puppet to support a testing db for wikimetrics in vagrant. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116477 (owner: Nuria) [17:50:12] Gloria: it's in scripts I haven't rewritten yet [17:50:14] I didn't see them in the log between HEAD and origin/wmf/* [17:50:21] bd808: Poetic. [17:50:33] ...oh [17:50:34] jenkins [17:50:35] Awesome [17:50:38] Gloria: Chad says this issue has plagued him for months [17:50:39] Oh, I read that as s/rewritten/written/, heh. [17:50:50] It was more poetic in my head. 
[17:50:56] That's more better [17:51:03] I wonder what Reedy does. [17:51:11] Gloria: I traveled back in time to report this issue. :) [17:51:33] Gloria: He's not on a mac. That may make all the difference [17:51:44] Gloria: Windows/putty ssh-agent is better than MacOS's? :) [17:52:06] Could be. That'd be pretty sad, though. [17:52:10] indeed [17:52:15] and poetic [17:52:17] :) [17:54:01] (PS1) Ottomata: Updating wikimetrics module to latest [operations/puppet] - https://gerrit.wikimedia.org/r/116513 [17:54:07] !log mholmquist synchronized php-1.23wmf16/extensions/MultimediaViewer/ 'Fix JS errors in latest MMV code' [17:54:14] Logged the message, Master [17:54:15] (PS2) Ottomata: Updating wikimetrics module to latest [operations/puppet] - https://gerrit.wikimedia.org/r/116513 [17:54:20] (CR) Ottomata: [C: 2 V: 2] Updating wikimetrics module to latest [operations/puppet] - https://gerrit.wikimedia.org/r/116513 (owner: Ottomata) [17:56:02] !log mholmquist synchronized php-1.23wmf16/extensions/Wikidata/extensions/Wikibase/client/includes/scribunto/ 'Fix for scribunto/wikibase integration' [17:56:10] Logged the message, Master [17:56:52] ah, already verified on test: Works! [17:57:20] hoo: whew [17:58:01] greg-g: had everything prepared, just needed to purge my testing page ;) [17:58:18] !log mholmquist synchronized php-1.23wmf15/extensions/Wikidata/extensions/Wikibase/client/includes/scribunto/ 'Fix for scribunto/wikibase integration' [17:58:25] Logged the message, Master [17:58:26] hoo: Test on wikipedias? 
[17:58:35] rdwrer: test.wikipedia.org [17:58:59] hoo: I mean, the fix is on all wikis now, you should test somewhere that's not test [17:59:34] we tested that locally and I've just tested it on testwikipedia, so I guess it's good enough [17:59:38] 'kaaaay [17:59:41] (PS1) Ottomata: Don't need to require wikimetrics::database [operations/puppet] - https://gerrit.wikimedia.org/r/116518 [17:59:44] production people will shout anyway, if I test there :P [17:59:44] testwikipedia isn't wmf16 [17:59:48] (PS2) Ottomata: Don't need to require wikimetrics::database [operations/puppet] - https://gerrit.wikimedia.org/r/116518 [17:59:49] test someone that's on wmf16 [17:59:54] (CR) Ottomata: [C: 2 V: 2] Don't need to require wikimetrics::database [operations/puppet] - https://gerrit.wikimedia.org/r/116518 (owner: Ottomata) [17:59:54] greg-g: I'm going to declare the deploy a success...wait [17:59:56] erm, I mean wmf15 [17:59:59] Oh, right [18:00:04] greg-g: You scared me there [18:00:06] numbers, they're hard [18:00:30] greg-g: Unqualified declaration of deploy-success [18:00:36] greg-g: Maybe what I should do instead of the new key is patch sync-wikiversions. It seems to be the only script that isn't using a fork limit on dsh. 
[18:00:36] :) [18:00:39] Now I'm going to go get dressed maybe [18:00:59] bd808: if you think you can in time, sure, I'm all for a 'better' solution [18:01:02] !log performing Cirrus reindex for commons - its need one for a long time and I've been too distracted to give it the attention it deserves [18:01:08] Logged the message, Master [18:01:20] * greg-g migrates back to the homestead from the coffee shop [18:11:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [18:15:37] ottomata: you merge nuria's change with trailing white spaces :/ [18:26:24] if i were to add a new logging type to 2 [18:26:58] err, if i were to add something new to wgDebugLogGroups in prod, what else to i need to do to ensure it gets written out on fluorine and rotated apropriatly? [18:27:17] <^d> Nothing. [18:27:22] awsome [18:27:32] <^d> The file will just start showing up once it gets logs. [18:28:19] matanya: ah foo I did, [18:28:34] will you patch ? [18:43:19] when is that labs migration.. oh maybe we'll hear about that in th emeeting [18:43:29] today or tomorrow? 
[18:51:56] (PS1) Ottomata: Fixing unless on exec for createing wikimetrics_testing database [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116523 [18:56:02] (PS2) Ottomata: Using regex instead of 2 grep commands to match database name [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116523 [18:57:13] (CR) Ottomata: [C: 2 V: 2] Using regex instead of 2 grep commands to match database name [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/116523 (owner: Ottomata) [18:58:30] (PS1) Ottomata: Updating to latest wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/116526 [18:58:37] (CR) Ottomata: [C: 2 V: 2] Updating to latest wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/116526 (owner: Ottomata) [18:58:42] mlitn: Hi Matthias, how are you today? Ready for the removal of Article Feedback v5 on enwiki and frwiki? Both communities were notified and are ready for it. Any questions for me? [19:03:05] (CR) BryanDavis: [C: -1] "I'm going to try to fix this with I86ec2f3 first. If that doesn't work I'll revisit this patch and see if I can get a consensus from ops o" [operations/puppet] - https://gerrit.wikimedia.org/r/116014 (owner: BryanDavis) [19:03:11] (Abandoned) Manybubbles: Temporarily disable hot_threads logging [operations/puppet] - https://gerrit.wikimedia.org/r/116500 (owner: Manybubbles) [19:05:27] fabriceflorin: I'll get started right now [19:05:44] (CR) Matthias Mullie: [C: 2] Remove ArticleFeedbackv5 from Wikimedia wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112639 (owner: MZMcBride) [19:06:13] mlitn: Cool, sounds good. Keep me posted. What's the ETA? I'm taking a few final screenshots on both sites. 
[19:06:35] if all goes well, it should be done in 10" [19:06:44] (Merged) jenkins-bot: Remove ArticleFeedbackv5 from Wikimedia wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/112639 (owner: MZMcBride) [19:07:05] will that leave you enough time for screenshots? [19:09:20] !log mlitn updated /a/common to {{Gerrit|Ia9f91e560}}: Removed unused SwiftCloudFiles extension [19:09:28] Logged the message, Master [19:12:24] fabriceflorin: I'm about to sync the changes, which will effectively remove AFT everywhere - is that ok? [19:12:53] Yes. [19:15:18] mlitn: Go ahead, mlitn -- I've taken all the screenshots I needed. [19:15:43] here goes [19:15:45] !log mlitn synchronized wmf-config/ 'Remove ArticleFeedbackv5 from Wikimedia wikis' [19:15:53] Logged the message, Master [19:17:29] it should be gone now (minus whatever may be in your caches) [19:18:57] It appears to be gone, yes. [19:19:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [19:20:58] mlitn: Yes, it is gone on the English Wikipedia on my Chrome and Safari browsers. [19:21:11] I will check now on the French Wikipedia. [19:24:05] mlitn: I can confirm that AFT5 is no longer available to end-users on either English or French Wikipedia. I will now post on both sites to let the communities know. Are the data tables available in any form on these sites now? How about the logs? [19:24:18] greg-g: I think the deployment window was for another 40 minutes, but we're done here [19:25:33] mlitn: sweet, it was just in case [19:25:40] Thanks mlitn: How long will it take for us to collect the data and get it in usable form? I assume you don't need a deployment window for that, right? I will respond now to Aaron and Dario's question about which data to make public, but my recommendation would be to make public everything that was public before ... 
[19:25:47] fabriceflorin: the tables have not (yet) been removed; I will export them into archive soon (maybe this week - though Dario wanted to review with legal & analytics first) [19:25:58] we won't need any deployment window for that :) [19:26:17] mlitn: Can an end-user access the data tables easily? If so, what's the URL? [19:26:55] they can't be accessed; they'll need to wait until the archives are created [19:27:30] mlitn: OK, what about the logs? (though I guess they don't include the actual comments, if I remember correctly). [19:27:41] I believe the database tables are accessible to Labs uesrs. [19:27:45] users [19:28:43] fabriceflorin, mlitn: http://p.defau.lt/?x1_UyVqnMNcROp9MY0werw [19:28:57] mlitn: It appears the logs are gone too, if I understand it correctly. There are no more entries at this URL: https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=articlefeedbackv5 [19:28:58] The article_feedback_* and aft_* tables are accessible to Labs users. [19:29:18] The logs are probably no longer exposed due to the log type no longer being valid. [19:29:30] The data is in the logging table for those interested. [19:29:40] Gloria: Thanks for the clarification. That was my impression as well. [19:30:16] Gloria: those aft_* tables are not the latest AFT data - that's data is on a separate cluster, table aft_feedback [19:30:26] Fun. [19:30:33] I have no idea if that is accessible to labs users [19:30:43] Probably not. [19:30:44] Gloria: It seems that your wish has been fulfilled. I hope you are comfortable with this outcome. [19:31:11] data in logging should, as Gloria pointed out, indeed be available to labs users in logging table [19:31:16] (PS2) Ori.livneh: geoip.inc.vcl: don't increment loop counter twice [operations/puppet] - https://gerrit.wikimedia.org/r/116469 [19:31:37] fabriceflorin: I think we've learned and grown from the experience. The growing pains are still pains, of course, but such is life. 
:-) [19:33:28] Gloria: I agree that it's been a useful learning experience. Yet I do hope that we find an effective way to give our readers a voice soon. That is still an important unmet requirement for the growth of our movement, IMHO. [19:34:37] We give our readers a voice with an edit button. We give our readers a voice by enabling them to become editors. ;-) [19:35:24] mlitn: Sounds good. It would be great if you could give a high priority to getting the archives ready in coming days, as I'm sure some users will want access to them when they find out all feedback is gone. [19:36:37] fabriceflorin: I'll work on those as soon as we have more details on how to format it and which data to include - shouldn't be too much work ;) [19:37:03] Gloria: Yes, I understand your perspective on inviting readers to edit. We have different views on this approach, as many users are uncomfortable using the current editing tools in the current talk pages, and so they are left out from our movement. Our hope is that Flow will solve that issue over time ... [19:37:24] * Gloria nods. [19:39:34] mlitn: Great. I appreciate your willingness to do the archive soon. I will drive that conversation today, so we can reach a resolution quickly on which feedback data to make public. Thanks again for your fine work today -- it's always a pleasure to collaborate with you :) [19:39:51] How Flow will help with declining editor numbers, I have no idea. [19:40:01] Unless you plan on enabling it in NS 0, that is. [19:40:49] I think the idea is to make talk pages more prominent. [19:40:58] And Flow will be there. [19:41:08] odder: That's a longer conversation than I have time for today, regrettably. There are so many different factors that prevent people from editing that we need a range of nuanced solutions to address the most important ones. Fingers crossed ... [19:42:29] Gloria: Yes, that's my understanding as well. 
In any case, I am glad that we brought AFT to a conclusion, and look forward to our next steps towards welcoming more editors in our movement with other tools. All the best ... [19:43:05] Cheers. :-) [20:09:09] (PS1) Ottomata: Adding hoo to admins::mortals [operations/puppet] - https://gerrit.wikimedia.org/r/116534 [20:10:34] wow, we have a bug about a whitespace change now :P [20:10:43] (CR) Ottomata: [C: 2 V: 2] Adding hoo to admins::mortals [operations/puppet] - https://gerrit.wikimedia.org/r/116534 (owner: Ottomata) [20:11:00] (PS1) coren: Tool Labs: use LVM by default [operations/puppet] - https://gerrit.wikimedia.org/r/116535 [20:11:51] thanks, ottomata :) [20:12:27] yup! running puppet now on tin, was there somewhere else that mortals would have given you the access you need that you want me to run puppet manually on now? [20:13:26] ottomata: No, that should be enough... rdwrer was so nice to take over that deploy for me so that we're fine for today :) [20:13:29] (PS3) ArielGlenn: Kunal Mehta access to terbium/flow database, rt #6895 [operations/puppet] - https://gerrit.wikimedia.org/r/116133 [20:13:33] ok great [20:13:39] (CR) Ottomata: [C: 2 V: 2] Kunal Mehta access to terbium/flow database, rt #6895 [operations/puppet] - https://gerrit.wikimedia.org/r/116133 (owner: ArielGlenn) [20:17:01] apergos: hiyaaa [20:17:09] stat1002 access should not be given for database access [20:18:13] (PS4) Hoo man: Introduce an admins::bastion user group [operations/puppet] - https://gerrit.wikimedia.org/r/116019 [20:18:36] (PS2) coren: Tool Labs: use LVM by default [operations/puppet] - https://gerrit.wikimedia.org/r/116535 [20:19:03] (CR) Hoo man: "rebased" [operations/puppet] - https://gerrit.wikimedia.org/r/116019 (owner: Hoo man) [20:19:55] (CR) coren: [C: 2] Tool Labs: use LVM by default [operations/puppet] - https://gerrit.wikimedia.org/r/116535 (owner: coren) [20:29:46] (PS1) Dzahn: remove 
'yvon' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116537 [20:34:19] ottomata: I'm pretty much gone but tell me in ticket or pm and I will see it tomorrow (sorry but 10:30 pm I'm pretty checked out) [20:35:05] !log upgrading openssl on iron [20:35:13] Logged the message, Master [20:36:08] i updated in ticket [20:36:11] thanks apergos [20:38:00] CUSTOM - Host yvon is UP: PING OK - Packet loss = 0%, RTA = 36.73 ms [20:38:21] heh, that was lacking the actual message, but nvm, was me [20:38:50] !log yvon - disable notifications/schedule downtime for host and all services [20:38:58] Logged the message, Master [20:42:48] !log upgrading to librdkafka 0.8.3-1 on hosts running varnishkafka [20:42:55] Logged the message, Master [20:43:21] (PS2) Dzahn: remove 'yvon' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116537 [20:43:41] Does anybody know where/how I should be setting $::realm with a labs self-hosted puppetmaster? It's defaulting to "production" because of the bit at the top of realm.pp [20:44:59] hm, it shouldn't be defaulting to that [20:45:01] i mean [20:45:05] sorry, it shouldn't be set to that [20:45:16] it should have a case or if for that [20:45:18] but not set it [20:45:31] in the role class for what you're using [20:47:03] hm [20:47:08] yeah you shouldn't have to set it [20:47:15] Well I'm guessing that it is set to "production" due to `if $::realm == undef { $realm = "production" }` at the top of manifests/realm.pp [20:47:28] yeah, but it should already be set [20:47:30] not sure how though [20:47:32] i've never had it not set [20:47:36] bd808: $realm being undefined sounds wrong [20:47:40] what ottomata says [20:47:58] maybe labs issue? [20:48:11] but since you're on ::self you could just hack it? (for right now)? 
[20:48:43] ha, true [20:48:52] Yeah I was going to try that (hack) but was wondering if there was a missing step in the self-hosted docs [20:48:55] or maybe you could hack setting it in the labsconsole interface [20:49:05] naw that sounds messed up, [20:49:09] but i'm not sure how realm gets set in labs [20:49:11] ldap maybe? [20:49:45] i'd ping C.oren if he wasn't extremely busy with migration [20:49:59] but ..maybe related on the other hand..hmmm [20:50:49] I verified that it is being set via the if undef block [20:51:40] That setting ($::realm) is expected to come in via facter then I take it [20:52:48] (CR) Dzahn: [C: 2] remove 'yvon' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116537 (owner: Dzahn) [20:56:20] !log yvon - disabling puppet,stopping services,revoking puppet cert,delete salt key [20:56:27] Logged the message, Master [20:57:48] mutante: Thanks for the clarification on https://bugzilla.wikimedia.org/show_bug.cgi?id=60277 . I updated the bug summary. [20:58:52] mutante, could you help me with the Parsoid deploy? [20:59:26] Gloria: np. i linked more :) https://wikitech.wikimedia.org/wiki/Puppet_coding#tab_character_found_on_line_.. [20:59:29] salt is still broken, so I need a root to run the restart command [20:59:41] gwicke: restarting the odd broken one? which one [21:00:00] but you just needed a single host or so the last times? [21:00:00] mutante, the best would be to leave the restart to you completely [21:00:14] let me push out the code first [21:00:20] what changed [21:00:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=61882 [21:01:57] mutante, please you run the following, as root: 'dsh -g parsoid service parsoid restart' [21:02:20] oh, not even the 10% thing or so? ok [21:02:26] doing it this way avoids the need to fix up salt breakage [21:02:29] ok [21:02:40] dsh defaults to sequential execution [21:02:51] so won't take down all nodes at once [21:03:04] The program 'dsh' is currently not installed. 
:P wrong host [21:03:07] one sec [21:03:15] wait, on fenari? [21:03:26] I'd say tin or bast1001 [21:03:26] tried palladium [21:03:28] ok [21:04:27] well, i'm not supposed to be on bast1001, tin says permission denied [21:04:35] trying [21:04:44] (even though i forward my key) [21:04:51] hm [21:05:13] the alternative would be to call salt as root [21:05:13] mutante: dns too for yvon :) [21:05:32] matanya: see above why i'm not doing that [21:05:42] according to Ryan this should work too when run as root: salt-run deploy.restart 'parsoid/deploy' '10%' [21:05:50] https://bugzilla.wikimedia.org/show_bug.cgi?id=61882#c3 [21:06:32] !log deployed Parsoid 98936e with deploy b070bcc [21:06:40] Logged the message, Master [21:07:22] !log restarting parsoid with salt-run deploy.restart 'parsoid/deploy' '10%' [21:07:31] Logged the message, Master [21:08:08] gwicke: Restart completed [21:08:20] that's all the output i get, but sounds good, doesnt it [21:08:20] is tin the salt master btw? [21:08:27] no, palladium [21:08:31] ah, k [21:08:36] i couldn't use dsh on tin [21:08:44] i kept getting permission denied (pubkey) [21:08:56] I can use it as a normal user [21:08:56] but i'm positive i forwarded like i did a lot of times before [21:09:00] maybe some root restriction [21:09:03] i didnt actually check on the hosts so far [21:09:07] ok [21:09:11] PROBLEM - mysqld processes on labsdb1001 is CRITICAL: PROCS CRITICAL: 2 processes with command name mysqld [21:09:14] I only see two old workers left [21:09:24] so looks very good [21:09:28] good [21:09:40] all restarted now it seems [21:09:42] thanks! [21:10:04] ok, cool, then let's always use that i suppose [21:10:11] RECOVERY - mysqld processes on labsdb1001 is OK: PROCS OK: 1 process with command name mysqld [21:10:15] seems like Ryan fixed the issue [21:10:21] no, this is a work-around only [21:10:29] ? again? [21:10:30] sigh [21:10:36] how many more temp. 
work arounds :p [21:10:52] normally deployers are supposed to be able to do everything that is needed in a deploy [21:10:57] including restarting the service [21:11:27] there is the service-restart command that ryan wrote, but that is still broken [21:11:39] hence 'workaround' [21:12:06] ok, just feels like we're using a different workaround each time [21:12:08] "SHOULD NOT" for "require root on the target hosts (privilege separation/access control)" here: https://www.mediawiki.org/wiki/Deployment_tooling/Notes/Deployment_system_requirements [21:12:12] :) [21:12:30] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 28 Feb 2014 08:45:44 AM UTC [21:13:19] is it really that? or now about a user on the saltmaster [21:14:03] gwicke: would it be solved if you had a user on the saltmaster? [21:17:44] mutante: Isn't being able to run salt commands basically full root access? [21:17:57] mutante, that would be a hack IMO [21:18:06] * bd808 has been trying to understand salt access controls [21:18:42] I'd be happy if we could restart the service on each of the parsoid boxes with sudo [21:19:01] then dsh would just work, as would restarting individual instances [21:19:13] !log restarting varnishkafka everywhere to pick up librdkafka 0.8.3 upgrade [21:19:20] Logged the message, Master [21:20:42] gwicke: that sounds right, since you already have a user on all those boxes [21:21:12] i don't know if you being able to use salt would be considered a ack [21:21:15] hack [21:21:30] neither do i know about the access controls so far [21:21:41] requiring an account on the salt master sounds hackish to me [21:22:17] what's different to dsh requiring an account where dsh is installed [21:22:52] dsh is installed on several (most?) 
machines [21:22:55] but i don't know this, it's been a long discussion already, i'm sure you have your reasons [21:23:21] I don't know what the salt access right design looks like [21:24:02] from a deployer's perspective it would be nice if the service restart worked from the deploy host [21:24:03] gwicke: no, just 2 hosts, fenari and bast1001 [21:24:09] have dsh [21:24:14] If http://docs.saltstack.com/topics/eauth/access_control.html is to be believed it should be possible to grant a user the ability to run deploy.restart on a subset of hosts. [21:24:47] mutante: and tin [21:25:04] (has dsh installed) [21:25:16] i see that, but that looks like it was a manual hack [21:25:19] don't see it in puppet [21:25:20] * gwicke nods [21:25:25] as opposed to the other boxes [21:25:40] that should be fixed too [21:26:14] mutante: An important "hack" since the scap scripts needed to deploy MW rely on it [21:26:51] anyway, once we switch from upstart to the new init script the parsoid restart issue should be moot as we'll be able to use dsh [21:27:05] it's not in puppet, things should be in puppet. what else can i say [21:27:05] the sudo setup is still in place for /etc/init.d/parsoid [21:27:28] I think there are a bunch of things on tin that were done outside of puppet. I can't find it's rsyncd config in puppet either. [21:27:35] bd808: really? that's sad [21:27:46] how is it being tested in labs [21:28:27] holy... [21:28:33] mutante: It's not really yet. See my questions earlier about $::realm [21:28:49] if tin fell over we'd....? 
[21:29:08] … have fun figuring out how to replicate it [21:29:46] :( [21:30:13] greg-g: Recreating the rsyncd config would be the least of our worries [21:30:36] recreating /a/common would be a lot more "fun" I think [21:30:46] you know, sorry, but at this point [21:30:54] i'd like to go back to what i was busy with [21:31:02] trying to get out of Tampa after all [21:31:08] * bd808 has no objections [21:31:19] mutante: please, sorry [21:31:43] thanks, ttyl [21:31:56] gwicke: hey, a suggestiong, mind pinging the on-duty ops person next time? might have to do it in advance so they're available. [21:33:06] who should be in charge of puppetizing tin? :) [21:33:14] bd808: ..and i thought fenari being replaced meant this was done right .. sigh [21:33:17] bbl [21:33:45] yeah, you'd think [21:34:20] I'm confused on how this happened since we migrated to tin no more than a year ago (around the summer of 13, i think) [21:34:26] greg-g: My guess is that there are things that aren't in puppet because scap wasn't supposed to be used in eqiad [21:35:00] And those things were added in a hurry in the middle of a deploy [21:36:19] But that's just a guess [21:37:08] I can't find the bug I thought existed about it [21:47:46] fwiw, adding dsh to a host is a literal one-liner: include misc::dsh [21:51:06] https://git.wikimedia.org/blob/operations%2Fpuppet.git/f0c7254b0f48a22de19379974e63a61ad71482ae/manifests%2Fmisc%2Fdeployment.pp#L8 ??? [21:51:36] hello [21:52:51] mutante: how hard would it be to re-enable bzapi? 
[21:53:02] there are some third-party project management tools for bugzilla that depend on it [21:54:40] ori: i'm not aware of disabling an API during the migration [21:54:45] there were 2 of them [21:54:52] one wasnt working since a long time, afaict [21:55:21] http://www.gossamer-threads.com/lists/wiki/wikitech/353727 [21:55:27] the REST one is the one i'm after [21:56:03] *3:50 *looks like im stuck with the xml-rpc api for now [21:56:03] *3:50 *thanks mutante [21:56:14] ori: i never saw that working [21:56:19] must have been a long time [21:56:31] not like it was enabled on kaulen [21:56:40] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2211.93247756 [21:56:42] ah :( [21:56:49] so, i just dunno :/ [21:57:16] By default Bugzilla has JSON-RPC and XML-RPC APIs. See [21:57:16] http://www.bugzilla.org/docs/4.2/en/html/api/Bugzilla/WebService.html [21:57:20] ori: you got those 2 [21:57:36] yeah, I'm not choosy, but https://github.com/mozilla/kanbanzilla (for example) depends on the REST api [21:57:37] mutante: so, the url I pasted above, that shows dsh *is* puppetized on tin, right? [21:57:53] i've thought of rewriting it to use one of the other APIs, but it'd be pretty involved [21:58:36] ori: "By default Bugzilla has JSON-RPC and XML-RPC APIs. See [21:58:44] oops, wrong paste [21:58:55] ori: "By default Bugzilla does not have a REST API - it needs to get installed first." [21:58:58] andre__: [21:58:58] ^ [21:59:15] err, no? 
[21:59:23] or well, outdated :) [21:59:26] so i'd likely argue the tools have to adjust [21:59:34] JSON-RPC and XML-RPC APIs in 4.4, REST in 5.0 to be released [22:00:21] (PS1) coren: Tool Labs: set labs-db up (real servers this time) [operations/puppet] - https://gerrit.wikimedia.org/r/116636 [22:00:24] greg-g: ah, if that is installed in misc [22:00:37] greg-g: ah, if that is installed in misc::deployment and that is on the node, then yea [22:00:48] * greg-g isn't sure on that part [22:00:49] oh man, i can hardly type on this [22:01:40] greg-g: looks like it is indeed, node tin includes misc::deployment [22:01:55] that is because i just looked at site.pp and the other nodes include it directly [22:02:03] while this one does it via the other misc:: class [22:02:27] we're trying to get rid of misc::, but yea, it is [22:03:01] ah, k k [22:05:45] (CR) Catrope: [C: 2] Enable Parsoid's edit caching on all public wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114100 (owner: Jforrester) [22:05:57] (Merged) jenkins-bot: Enable Parsoid's edit caching on all public wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/114100 (owner: Jforrester) [22:07:43] !log catrope synchronized wmf-config/InitialiseSettings.php 'Add $wmgUseParsoid (true for all non-private wikis)' [22:07:51] Logged the message, Master [22:08:20] !log catrope synchronized wmf-config/CommonSettings.php 'Plumbing for $wmgUseParsoid' [22:08:28] Logged the message, Master [22:11:34] (CR) coren: [C: 2] "What could go wrong?" 
[operations/puppet] - https://gerrit.wikimedia.org/r/116636 (owner: coren) [22:14:09] (PS1) coren: Tool Labs: class mysql expects a boolean log_bin [operations/puppet] - https://gerrit.wikimedia.org/r/116637 [22:16:41] (CR) coren: [C: 2] Tool Labs: class mysql expects a boolean log_bin [operations/puppet] - https://gerrit.wikimedia.org/r/116637 (owner: coren) [22:20:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Last successful Puppet run was Tue 25 Feb 2014 06:33:37 PM UTC [22:21:32] !log yvon - deleted puppet stored configs, removed from icinga, shutdown -h now, kthxbye yvon [22:21:41] Logged the message, Master [22:22:10] matanya: ...now.. maybe we get to that , did you say we have a change? [22:22:27] (PS1) Ottomata: Changing replica_lag_max_messages and replica_lag_time_max_ms [operations/puppet] - https://gerrit.wikimedia.org/r/116645 [22:25:20] * Coren|Busy says very evil things about the upstream mysql class. [22:31:00] (PS1) Dzahn: remove yvon's public IP, already shut down [operations/dns] - https://gerrit.wikimedia.org/r/116646 [22:32:46] (CR) Dzahn: [C: 2] remove yvon's public IP, already shut down [operations/dns] - https://gerrit.wikimedia.org/r/116646 (owner: Dzahn) [22:33:16] !log DNS update - removing yvon [22:33:24] Logged the message, Master [22:40:11] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [22:42:32] greg-g, re Deployment system requirements: are PHP deploys now happy with rolling deploys? [22:46:41] !log powercycling ms-be1003 [22:46:49] Logged the message, Master [22:50:00] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [22:51:22] greg-g, do you think it makes sense to mention the rolling deploy requirement for services vs. the all-at-once requirement for PHP in the deploy system requirements? 
[22:53:11] gwicke: that's why it's a "should" for mw and a "must" for parsoid [22:53:25] * greg-g is in a meeting, multitasking [22:53:39] (PS1) Ottomata: Default for log.flush.interval.ms has changed to 3000 [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/116655 [22:54:10] (CR) Ottomata: [C: 2 V: 2] Default for log.flush.interval.ms has changed to 3000 [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/116655 (owner: Ottomata) [22:55:12] greg-g, I thought it is a 'must not' for PHP [22:55:28] also canary is not the same as rolling deploys [22:57:01] why would it be a must not? [22:57:53] these words need definitions, I'll do that, I think we're talking past each other sometimes with this... /me goes back to meeting [23:00:27] (PS1) coren: Tool Labs: forget puppet for now [operations/puppet] - https://gerrit.wikimedia.org/r/116656 [23:00:36] (PS1) Dzahn: remove "zhen" public IP, decom [operations/dns] - https://gerrit.wikimedia.org/r/116658 [23:02:51] (CR) coren: [C: 2] "The mysql class sucks" [operations/puppet] - https://gerrit.wikimedia.org/r/116656 (owner: coren) [23:03:21] (PS1) Dzahn: remove 'zhen' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116659 [23:03:35] (CR) jenkins-bot: [V: -1] remove 'zhen' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116659 (owner: Dzahn) [23:04:02] (PS1) Catrope: Fix VisualEditor/Parsoid on private wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/116662 [23:05:12] greg-g, there was a requirement for being as close to globally atomic as possible for PHP code [23:05:52] (PS2) Dzahn: remove 'zhen' from site.pp,dsh,dhcpd [operations/puppet] - https://gerrit.wikimedia.org/r/116659 [23:06:44] gwicke: again, I think we're talking past each other, especially with this case.... let me explain [23:07:00] 1) we want atomicity on a few levels. 
[23:07:10] 2) one being per machine (obvs) [23:07:38] 3) two being per cluster of machines that serve a specific version [23:08:05] right now that's everything, but that doesn't need to be the case [23:08:12] greg-g, I'm referring to "Cluster-wide atomicity" in https://etherpad.wikimedia.org/p/DeploymentSystemRequirements [23:08:31] which is your 3) I think [23:08:31] (03CR) 10Jforrester: [C: 031] "To go in the LD this afternoon." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/116662 (owner: 10Catrope) [23:09:01] for Parsoid, that's a 'MUST NOT' [23:09:30] right, there's much more nuance there than meets the eye [23:09:52] also, that's not a requirement, that's a "nice to have" [23:10:16] greg-g, the table currently says that PHP wants rolling deploys instead [23:10:27] as a 'SHOULD' [23:10:38] it'd be nice [23:11:00] sorry, I have to go, I'll explain more in person when I'm back in SF [23:11:10] greg-g, ok [23:11:20] would be good clarify which of the two would be nice [23:14:29] MaxSem: hey, so, i'm about to shutdown zhen, that is one of the former 'vumi' hosts, and it's running redis [23:14:44] MaxSem: Dan Foy confirmed it's not needed though [23:15:39] we don't use that redis for anything else mutante [23:15:57] MaxSem: thanks! [23:16:19] !log zhen - stopping redis-server, preparing for decom [23:16:27] Logged the message, Master [23:16:41] en.wikipedia is slow to load for me. [23:17:02] At last check it was being caused by bits and upload [23:18:40] Anyone here? [23:18:49] morebots is [23:18:49] I am a logbot running on tools-exec-02. [23:18:49] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [23:18:49] To log a message, type !log . [23:19:00] PROBLEM - Redis on zhen is CRITICAL: Connection refused [23:19:07] * Damianz kicks morebots [23:19:29] ACKNOWLEDGEMENT - Redis on zhen is CRITICAL: Connection refused daniel_zahn zhen is going to eternal downtime - decom [23:19:44] Reedy, Damianz: enwiki is loading slowly. 
I think it hangs up when it tries to pull content from bits and upload [23:21:56] css looks broken to me, right hand tabs are over the announcement banner just under the left =\ [23:22:52] metawiki is also a bit on the slow side. [23:24:34] Where in the world are you? [23:24:47] PA, USA [23:25:13] Using Verizio Fios [23:25:34] At 40 Mbits/sec [23:25:44] So, it's definitely not my internet [23:26:34] Well, it could be [23:26:40] Can be midstream transit issues [23:26:49] But Damianz and I are hitting a different datacentre to you [23:26:59] meta.orain.org has no issues loading, and loads quickly. [23:27:49] And? [23:28:08] Reedy, and what? [23:28:14] Cyberpower678: Try the usual network debugging tools first [23:28:20] Wikipedia is being very slow. [23:28:21] Saying some random site is fine isn't helpful [23:28:29] Only Wikimedia sites are being slow [23:29:01] hoo, such as [23:29:19] I don't care, us whatever you want [23:29:31] traceroute would be a likely start [23:30:24] Slowest part of loading enwiki main for me is a hotcat call [23:31:11] On meta it's banner related calls [23:31:34] And a gadget [23:32:45] trace route appears to have stalled at a server. [23:33:38] The last server it hit was 207.88.15.53.ptr.us.xo.net [23:34:06] ipv6 via sixxs gives me 8 more hops. yay [23:34:35] trace route is still working [23:34:40] That'a a transit hop [23:34:40] Reedy: how much is that in listening intelligence agencies? [23:34:52] Try it with that other site too for comparison [23:35:05] hoo: They're already my ISP [23:35:27] I have to first wait before the trace route for en.wikipedia.org is finished. [23:36:11] :P [23:36:49] Multi tasking is hard [23:37:12] Reedy, yes it is. :p [23:37:21] Are you male? [23:37:28] If so, it's a valid reaso [23:37:30] n [23:38:49] Reedy, are you female? [23:39:44] Reedy: shall I take https://bugzilla.wikimedia.org/show_bug.cgi?id=62164 ? 
:P [23:40:27] Cyberpower678: No [23:40:43] hoo: you'll need to wget/curl it on bast1001, then sftp it to tin [23:41:03] Reedy: can't I load it on terbium? [23:41:07] nope [23:41:12] Ok :P [23:41:30] Look at a previous request too... you'll need to sudo -u apache mwscript importImages.php.... [23:42:33] Let's see what happens :P [23:43:48] I always need two tries getting the tar commands right :P [23:45:18] 2nd time lucky? [23:45:24] Is that a German thing? [23:45:31] Reedy: Yeah, always :P [23:50:30] Reedy: ... and done [23:54:47] Reedy, hoo, mutante: http://pastebin.com/z70UitU3 [23:57:02] that looks pretty ok [23:57:24] hoo, except for it doesn't go to the final destination. [23:58:19] it's simply not showing you the final hop(s) [23:58:45] I'm pretty inexperienced with advanced networking. [23:59:07] from that node it's just two hops to the actual lb