[00:11:05] i'm having some trouble loading images atm on en.wikipedia.... odd
[00:25:31] hm that micropeak in network was later http://ganglia.wikimedia.org/latest/?c=Upload%20caches%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[00:53:41] (CR) Andrew Bogott: [C: 1] "Looks good. Once you've confirmed that you've tested and validated both old and new behavior then I'm happy to merge." [operations/puppet] - https://gerrit.wikimedia.org/r/108537 (owner: Ottomata)
[01:08:31] (CR) MZMcBride: "Okayed by whom?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100927 (owner: Jforrester)
[01:09:38] The NSA
[01:09:49] Gloria ^
[01:14:17] Thanks, Bsadowski1.
[01:17:22] Is the comment syntax "$+origin+::" in Puppet defined somewhere?
[01:17:34] (modules/git/manifests/clone.pp)
[01:18:54] Gloria: Or it was Jommy Woles
[01:19:48] All right.
[01:25:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[02:06:43] drdee: sorry for being unresponsive, doing too many things at once this week :) I hope to have another look at your tool yet today.
[02:06:55] no worries
[02:07:13] i am calling it a day anyways :)
[02:07:24] 'k
[02:12:17] akosiaris (if still working) and Coren: Can we discuss https://gerrit.wikimedia.org/r/#/c/108647/ a bit?
[02:12:58] It works fine on labs but when I merged it to virt0 page loads were incredibly slow. I suspect that it was due to multiple calls to fetchOldSchoolGroupInfo() which tried to load a nonexistent ldap record.
[02:15:20] !log LocalisationUpdate completed (1.23wmf10) at 2014-01-21 02:15:20+00:00
[02:15:29] Logged the message, Master
[02:16:14] Wouldn't all the OldSchoolGroupInfo() actually exist?
[02:16:39] Also, wouldn't running your maintenance script that does the sync solve any missing records anyways?
[02:16:58] Oh you're right, I have it backwards
[02:17:12] It's fetchGroupInfo that I suspect
[02:17:34] And, yeah, after the maintenance script things would be present. But I'd like to verify create/delete before messing with the whole db.
[02:17:37] Coren: have you fixed all the cases where sql was using an old version?
[02:17:50] So… hoping for some believable theory about why performance broke before I try to fix it...
[02:18:06] e.g. is there reason to think that searching for a missing record is slower than searching for an existing one?
[02:18:40] andrewbogott: That'd be really surprising, but it's possible.
[02:18:45] Betacommand: AFAIK, yes.
[02:19:41] Coren: Yep, so you see my concern about running the maintenance script before I diagnose the perf problem.
[02:20:00] I don't really understand how indexes work. Do I need to create an index for that new ou before I merge that patch?
[02:22:06] andrewbogott: Honestly, with opends, neither do I. It'd make sense, I know Ryan was careful about indexing all and sundry.
[02:22:43] Do you know e.g. what form indexes take in ldap and/or how to create one?
[02:22:48] If not, I'll look it up :)
[02:24:21] andrewbogott: Are you working on wikitech? https://wikitech.wikimedia.org/wiki/Incident_documentation gives "Database error: A database query error has occurred. This may indicate a bug in the software."
[02:24:40] scfc_de: I am not! But I'll have a look...
[02:24:46] Seeing that elsewhere on wikitech?
[02:24:48] Hmmm. Limited to that page. Other pages work fine.
[02:25:16] Edit and View history work.
[02:25:52] Edit -> Preview gives the database error.
[02:26:16] scfc_de: Ever seen that page work?
[02:26:37] No, but I found the culprit: If I remove the "{#ask ..." tag, it works.
[02:28:07] Error was introduced by greg-g with https://wikitech.wikimedia.org/w/?oldid=73329, and judging by the edit comment and the subsequent edits this must have worked at some point.
[02:28:19] I was about to say that the page works from history up to the point where there was a SMW query added
[02:28:41] !log LocalisationUpdate completed (1.23wmf11) at 2014-01-21 02:28:40+00:00
[02:28:47] Logged the message, Master
[02:29:14] Coren, scfc_de, does that mean I can blame user error?
[02:29:16] * andrewbogott hopes
[02:29:44] No SMW query should lead to a database error. You can blame SMW though. :-)
[02:29:50] !log restarting gitblit using new upstart job def (I3f32dedf1)
[02:29:54] Perhaps an SMW update after that edit?
[02:29:57] Logged the message, Master
[02:29:58] (PS2) Ori.livneh: Add upstart job definition file for Gitblit [operations/puppet] - https://gerrit.wikimedia.org/r/108492
[02:30:03] (CR) Ori.livneh: [C: 2 V: 2] Add upstart job definition file for Gitblit [operations/puppet] - https://gerrit.wikimedia.org/r/108492 (owner: Ori.livneh)
[02:30:56] there will be a couple of alerts for gitblit probably while it is restarting
[02:31:30] scfc_de: I did mess with SMW last week, but only barely.
[02:32:37] or maybe not, I think I was quick enough.
[02:32:53] On the project pages we have "{{#ask:" queries, but not for categories?
[02:34:12] Changing "Category:" to "Anything:" works, but brings up an empty list of course.
[02:35:18] scfc_de: I don't know much about smw syntax, do you think that page queries on a property that was previously present but no longer?
[02:36:28] ("[[Category:Tools access requests]]" doesn't throw an error.) If I read http://semantic-mediawiki.org/wiki/Help:Selecting_pages#Categories_and_property_values correctly, categories are first-class citizens in SMW and don't need to be specially defined.
[02:38:10] "[[Category:Tools Access Requests]]" (the correct link) doesn't throw an error, but displays a list as well. So what's different between Category:Events reports and Category:Tools Access Requests?
[02:39:53] I think it's that "Category:Events reports" has a subcategory.
[02:40:10] scfc_de: When I load that page the error log says "PHP Notice: Uncommitted DB writes (transaction from DatabaseBase::query (WikiPage::pageData)). in /srv/org/wikimedia/controller/wikis/slot0/includes/db/Database.php on line 4104"
[02:40:16] Dunno if that's a red herring.
[02:40:47] My favorite type of herring.
[02:41:08] I'll file a bug with SMW. Thanks, andrewbogott.
[02:41:45] andrewbogott: Do you remember what the previous version of SMW was?
[02:41:54] scfc_de: we're using an ancient version of SMW so don't count on a quick fix :( Do you have an idea about a workaround?
[02:42:26] y'know, I think in theory we reverted and are running the same version as always. But, let me look...
[02:42:51] Looks like we are running 1.8.0.5
[02:45:34] andrewbogott: We could just link to the category page for now instead of presenting a list of its pages (so users will have to click once more). In fact, I'll do that in a moment.
[02:45:49] ok, thank you.
[02:50:25] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-01-21 02:50:24+00:00
[02:50:32] Logged the message, Master
[03:02:49] AND AS SOON AS I SEND THE BUG REPORT IT WORKS?!
[03:03:54] *argl* (Well, working software is very nice actually, but still makes me look like an idiot :-).)
[03:06:35] You don't look like an idiot, I saw the error too :)
[03:09:46] andrewbogott: That makes us two, but not necessarily sane :-).
[03:09:54] true
[03:41:12] I'm going to deploy a change to the TimedMediaHandler extension in a couple of minutes.
[03:46:34] !log ori synchronized php-1.23wmf11/extensions/TimedMediaHandler/TimedMediaHandler.hooks.php 'Update TimedMediaHandler for I7a6da6c62'
[03:46:42] Logged the message, Master
[03:47:45] !log ori synchronized php-1.23wmf10/extensions/TimedMediaHandler/TimedMediaHandler.hooks.php 'Update TimedMediaHandler for I7a6da6c62'
[03:47:53] Logged the message, Master
[03:54:05] Putting the "night" in "lightning." Wait.
[03:54:39] frightening deploys
[04:26:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[05:07:50] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:09:33] (PS2) Ori.livneh: logstash: Updates for udp2log filtering [operations/puppet] - https://gerrit.wikimedia.org/r/108533 (owner: BryanDavis)
[05:09:46] (CR) Ori.livneh: [C: 2 V: 2] logstash: Updates for udp2log filtering [operations/puppet] - https://gerrit.wikimedia.org/r/108533 (owner: BryanDavis)
[05:11:50] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[05:13:00] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:13:50] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[05:15:17] (PS1) Andrew Bogott: Point the managehome chatbot to #wikimedia-labs-icinga [operations/puppet] - https://gerrit.wikimedia.org/r/108654
[05:15:19] (PS1) Andrew Bogott: When complaining about mount failures, include hostname! [operations/puppet] - https://gerrit.wikimedia.org/r/108655
[05:15:29] Ryan_Lane: ^ and ^^
[05:15:36] heh
[05:16:33] Ryan_Lane: while you're here… can you tell me where to find the 'Directory Manager' passwd for ldap?
[05:16:43] fenari says it is the 'root passwd' which doesn't seem to be true.
[07:00:13] (PS11) Physikerwelt: Add Mathoid module (TeX -> MathML / SVG conversion web service) [operations/puppet] - https://gerrit.wikimedia.org/r/90733
[07:08:43] (CR) Nemo bis: "So, what changed?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106629 (owner: MaxSem)
[07:12:57] (CR) Nemo bis: "https://git.wikimedia.org/commit/mediawiki%2Fphp%2Fwikidiff2.git/6dedd2d77dccf8b81911e45cce065059920bf4a0 for mobile only, I guess."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106629 (owner: MaxSem)
[07:27:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[08:19:38] !log For wikitech labs interface: deployed new servicegroup schema and supporting code; ran maintenance/transitionServiceGroupSchema.php
[08:19:45] Logged the message, Master
[09:02:22] (PS1) Hydriz: Adding Extension:Babel configuration for simplewiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108663
[09:34:45] (CR) Dzahn: [C: 1] "looks ok, maybe Ariel can run the scripts that check these against the other places to cleanup on decom?" [operations/puppet] - https://gerrit.wikimedia.org/r/108070 (owner: Chad)
[10:04:57] (CR) Dzahn: "Mark, i suppose all the networking config for torrus, e.g. "$corerouters =" etc that matanya moves here should go into a role class and mo" [operations/puppet] - https://gerrit.wikimedia.org/r/108314 (owner: Matanya)
[10:28:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[10:34:52] mutante: i'm planning on modulizing the planet.pp and i see you are the main author, can you please explain your logic of defining everything?
[10:37:15] matanya: don't, i already have it
[10:37:27] oh, great
[10:37:43] pick another one please, i merged that in the past, there was an issue, it had to be reverted etc
[10:37:52] and i'm going to reupload it today
[10:38:07] oh, i remember something like that now
[10:38:15] can you recommend one?
[10:38:50] anything in manifests/misc ? but maybe you have already done them all by now?
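On matanya's question about why planet.pp defines everything: the classic (pre-4) Puppet language has no general loop, so a common pattern is to wrap the per-instance resources in a defined type and declare it once per key of a hash, for example with Puppet's `create_resources()` function. That lets one manifest serve many languages without repeating itself. The Python sketch below only mirrors that pattern outside Puppet; `planet_site`, the default domain and the config paths are hypothetical stand-ins, not taken from planet.pp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanetSite:
    """One declared instance of a hypothetical `planet_site` define."""
    lang: str
    server_name: str
    config_path: str


def planet_site(title: str, params: dict) -> PlanetSite:
    # Body of the hypothetical define: everything is derived from the title
    # (a language code) plus optional per-language overrides in `params`.
    domain = params.get("domain", "planet.wikimedia.org")
    return PlanetSite(
        lang=title,
        server_name=f"{title}.{domain}",
        config_path=f"/etc/planet/{title}_config.ini",
    )


def create_resources(define, instances: dict) -> list:
    # Mirrors the idea of Puppet's create_resources(): one declaration per
    # hash key, with the key as the title and the value as the parameters.
    return [define(title, params) for title, params in sorted(instances.items())]


languages = {"de": {}, "en": {}, "fr": {}}
sites = create_resources(planet_site, languages)
for site in sites:
    print(site.server_name, site.config_path)
```

Adding a language then becomes a one-line change to the hash rather than another copy of every resource.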
heh, dunno
[10:39:38] the logic of defining everything is to be able to iterate over a hash, btw, the closest thing to a loop
[10:39:52] oh, nice
[10:39:56] because for planet you want the same things for X languages
[10:40:10] and it would otherwise be very repetitive
[10:40:45] I did some misc's, and after reviewing the leftovers, i see they are most likely stuff we will retire
[10:41:16] e.g. pdf.pp and dsh
[10:42:48] i suppose hashar would like zuul to be one
[10:42:55] manifests/zuul.pp
[10:43:38] ok
[11:15:56] going to deploy I71b70d8ee (Add user preference to enable ULS) to wmf10 + 11 in a minute.
[11:24:02] !log ori synchronized php-1.23wmf11/extensions/UniversalLanguageSelector 'I71b70d8ee: Add user preference to enable ULS'
[11:24:09] Logged the message, Master
[11:25:49] !log ori synchronized php-1.23wmf10/extensions/UniversalLanguageSelector 'I71b70d8ee: Add user preference to enable ULS'
[11:25:56] Logged the message, Master
[11:26:26] Hrm, something broke.
[11:28:10] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:28:20] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:28:30] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:29:10] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.937 second response time
[11:29:10] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.232 second response time
[11:29:20] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.048 second response time
[11:29:21] right, bits's 503ed
[11:29:53] tapering off
[11:32:10] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:32:20] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:32:26] !log bits application server overload due to
I71b70d8ee
[11:32:33] Logged the message, Master
[11:32:50] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:32:57] this needs to be reverted, commons is mostly broken
[11:33:20] no, it'll cause static assets to be rebuilt again, making the overload more severe
[11:33:22] i'll page someone
[11:33:50] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.918 second response time
[11:33:58] makes sense
[11:34:10] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.837 second response time
[11:36:30] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:36:31] lately every deploy brings bits down for 20-30 min
[11:36:50] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:37:10] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.443 second response time
[11:37:14] * Nemo_bis proposes to deploy only in the middle of (UTC) night
[11:37:20] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:37:35] Good thinking.
That's the middle of the day in PST :)
[11:38:20] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.056 second response time
[11:38:30] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Bits%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false
[11:38:50] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.442 second response time
[11:39:10] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.328 second response time
[11:41:30] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:41:50] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:42:10] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:42:50] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.547 second response time
[11:43:10] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.899 second response time
[11:43:20] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time
[11:44:17] oh well, second last time they stayed at 100+ load for 20 min, now only a handful :)
[11:44:35] kind of
[11:46:10] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:46:12] hello
[11:47:00] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time
[11:53:51] bits is still not the best
[11:56:36] ori: You should be asleep now.
[11:56:45] ok, now back to normal
[11:57:01] my apologies, everyone
[11:57:04] ori: in your spare time, a post mortem would be nice :)
[11:57:16] yes.
[11:57:54] IT'S ALREADY WAY AFTER 11 AM UTC! GO TO BED!
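The deploy-window exchange above comes down to time-zone arithmetic: a ULS sync logged at 11:24 UTC falls in the small hours in US Pacific time, and a "middle of the UTC night" window falls in the Pacific afternoon and evening of the previous day. A small sketch with Python's standard `datetime`; it uses a fixed UTC-8 offset because January is Pacific *Standard* Time, an assumption that would be wrong for dates under daylight saving time (UTC-7). The 00:00 to 04:00 UTC window is a hypothetical example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for US Pacific Standard Time (UTC-8), correct for January.
# Dates during daylight saving time would need UTC-7 (or a tz database).
PST = timezone(timedelta(hours=-8), "PST")


def utc_to_pst(hour: int, minute: int = 0) -> datetime:
    """Convert a UTC wall-clock time on 2014-01-21 (the day of this log) to PST."""
    ts = datetime(2014, 1, 21, hour, minute, tzinfo=timezone.utc)
    return ts.astimezone(PST)


# The 11:24 UTC ULS sync, and a hypothetical 00:00-04:00 UTC deploy window.
for h, m in [(11, 24), (0, 0), (4, 0)]:
    print(f"{h:02d}:{m:02d} UTC -> {utc_to_pst(h, m):%Y-%m-%d %H:%M %Z}")
```

This prints 03:24 PST on the 21st for the ULS sync, and 16:00 to 20:00 PST on the 20th for the hypothetical window.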
[11:58:00] ori: ;-)
[11:59:26] !log bits application servers back to normal since around 11:46 UTC
[11:59:32] Logged the message, Master
[12:09:14] the network peak is just after the end of the load peak; I suppose all those who had been waiting for a few minutes finally got served?
[12:13:11] (CR) Matanya: "Why do you split it to a separate channel? In production it is in the main -operations channel." [operations/puppet] - https://gerrit.wikimedia.org/r/108654 (owner: Andrew Bogott)
[12:14:23] (CR) Matanya: [C: 1] When complaining about mount failures, include hostname! [operations/puppet] - https://gerrit.wikimedia.org/r/108655 (owner: Andrew Bogott)
[12:17:36] (CR) Matanya: [C: 1] Updating git::clone so that gerrit urls can be assumed by default [operations/puppet] - https://gerrit.wikimedia.org/r/108537 (owner: Ottomata)
[12:18:55] hi, ULS just broke on Wikidata
[12:18:57] https://bugzilla.wikimedia.org/show_bug.cgi?id=60281
[12:19:06] it is just gone
[12:19:54] (CR) Matanya: Add r-base to Hadoop worker machines (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/108633 (owner: OliverKeyes)
[12:22:22] ori: are you still around?
[12:23:46] yes
[12:24:42] looking at your change https://gerrit.wikimedia.org/r/#/c/108484/
[12:25:05] I was wondering, why did you change the stuff in templates/varnish
[12:25:46] while there is a module, and i discovered all the templates weren't moved. is there a reason behind it?
[12:26:11] ori: or should i prepare a patch moving them to the right place?
[12:27:01] you shouldn't move them; they are very critical infrastructure and the surprise / risk of moving them around takes priority over file hierarchy linting
[12:27:44] so they will stay there for good?
[12:49:51] what's the memory limit in production? :)
[12:50:40] memory limit for what?
[12:50:58] heh, good point, php :)
[12:53:13] 220 MB
[12:53:45] cheers :)
[12:54:58] (PS1) Nemo bis: [s23.org wikistats] Throttle updates for big farms, keep updating big wikis' stats [operations/debs/wikistats] - https://gerrit.wikimedia.org/r/108670
[13:07:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[13:10:33] (CR) JanZerebecki: [C: 1] "Looks good (though I didn't actually test it)." [operations/puppet] - https://gerrit.wikimedia.org/r/108488 (owner: Matanya)
[13:27:42] (CR) JanZerebecki: [C: -1] "See inline comment for regex that will never match. Otherwise good." (1 comment) [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (owner: Tim Landscheidt)
[13:29:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[13:34:48] (PS1) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674
[13:36:13] matanya: there you go
[13:44:44] (CR) Alexandros Kosiaris: [C: 2] "I run some catalog compiles for db48 and db63. db48 was picked for having skip_name_resolve set to false and db63 for not. I witnessed no " [operations/puppet] - https://gerrit.wikimedia.org/r/108488 (owner: Matanya)
[13:46:49] thanks mutante
[14:04:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[14:27:00] (CR) Matanya: "layout looks ok. some minor lint stuff and 3 major questions." (19 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/108674 (owner: Dzahn)
[14:27:21] mutante: sorry for the load
[14:30:00] matanya: hah, 19:) thanks, will fix in a little
[14:48:13] mutante: around?
[14:48:21] mutante: any idea if ULS was disabled?
[14:49:03] Nemo_bis: ^
[14:49:10] yuvipanda: yes it was
[14:49:21] Nemo_bis: any idea what happened there?
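The "220 MB" answer refers to PHP's per-request memory limit, which php.ini writes in shorthand such as `220M`: the K, M and G suffixes are case-insensitive powers of 1024, and `-1` means no limit. A minimal parser for that shorthand, as a sketch; it assumes the production value is written as `220M` (the log only says "220 MB") and does not cover every form PHP tolerates.

```python
import re
from typing import Optional

# php.ini shorthand suffixes (case-insensitive), each a power of 1024.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_php_bytes(value: str) -> Optional[int]:
    """Return the byte count for a php.ini size such as '220M', or None for '-1'."""
    value = value.strip()
    if value == "-1":
        return None  # memory_limit = -1 disables the limit
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"unrecognised php.ini size: {value!r}")
    digits, suffix = match.groups()
    return int(digits) * _MULTIPLIERS[suffix.lower()]


print(parse_php_bytes("220M"))  # 230686720
```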
[14:49:24] on all wikis unconditionally https://bugzilla.wikimedia.org/show_bug.cgi?id=46306
[14:49:29] Nemo_bis: it broke translate, so was turned off for now?
[14:49:29] nope, please ask on bug
[14:49:38] Nemo_bis: ah, ok
[14:49:43] no, it was turned off hence it broke translate
[14:49:56] yuvipanda: i just know that ori deployed https://gerrit.wikimedia.org/r/#/q/I71b70d8ee,n,z
[14:50:22] yeah, same here
[14:59:12] yuvipanda, https://bugzilla.wikimedia.org/show_bug.cgi?id=46306 is set to "Immediate" priority and I've sent an email to involved folks. Now they just need to wake up in their timezone. :P
[14:59:37] andre__: yeah, saw that :)
[14:59:56] andre__: just wanted to see if there was an obvious explanation somewhere and I missed it
[15:12:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[15:17:59] (CR) Alexandros Kosiaris: [C: 2] lint 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa [operations/dns] - https://gerrit.wikimedia.org/r/108031 (owner: Alexandros Kosiaris)
[15:34:55] (PS2) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674
[15:35:05] (CR) Dzahn: turn planet into a module (19 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/108674 (owner: Dzahn)
[16:09:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[16:12:54] !log demon started scap: No code updates, just rebuilding all i18n
[16:13:02] Logged the message, Master
[16:30:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[16:38:10] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:38:40] PROBLEM - SSH on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:38:50] PROBLEM - RAID on mw1206 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
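The tungsten alerts flip between CRITICAL and OK around two thresholds visible in the (truncated) check output: warn=250 and crit=500 HTTP 5xx requests per minute. A sketch of how such a check classifies one reading, using the Nagios/Icinga plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Alerting only when a reading is strictly above a threshold follows the simple-threshold form in the Nagios plugin guidelines; whether this particular check uses `>` or `>=` is an assumption.

```python
import math
from enum import IntEnum


class State(IntEnum):
    """Nagios/Icinga plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(rate_per_min: float, warn: float = 250.0, crit: float = 500.0) -> State:
    # Upper thresholds only: a reading alerts when it is strictly above a limit,
    # and the CRITICAL test runs first so the worse state wins.
    if math.isnan(rate_per_min):
        return State.UNKNOWN  # no usable data point
    if rate_per_min > crit:
        return State.CRITICAL
    if rate_per_min > warn:
        return State.WARNING
    return State.OK


for reading in (120.0, 310.0, 640.0):
    print(reading, classify(reading).name)
```

Returning an `IntEnum` keeps the state name for display while the integer value can be passed straight to `sys.exit()` by a plugin wrapper.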
[16:39:00] (PS1) Ottomata: Creating account for Charles Salvia and granting access to analytics and stat nodes [operations/puppet] - https://gerrit.wikimedia.org/r/108688
[16:39:10] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.680 second response time
[16:39:41] (PS1) Alexandros Kosiaris: Pin volatile to frontend puppetmaster [operations/puppet] - https://gerrit.wikimedia.org/r/108689
[16:40:33] (CR) Ottomata: [C: 2 V: 2] Creating account for Charles Salvia and granting access to analytics and stat nodes [operations/puppet] - https://gerrit.wikimedia.org/r/108688 (owner: Ottomata)
[16:42:04] (CR) Alexandros Kosiaris: [C: 2] Pin volatile to frontend puppetmaster [operations/puppet] - https://gerrit.wikimedia.org/r/108689 (owner: Alexandros Kosiaris)
[16:44:29] (PS3) Dzahn: turn planet into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108674
[16:47:10] PROBLEM - puppetmaster backend https on palladium is CRITICAL: Connection refused
[16:47:30] this is me ^. ignore please
[16:48:10] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.019 second response time
[16:48:10] PROBLEM - DPKG on mw1206 is CRITICAL: Timeout while attempting connection
[16:48:10] PROBLEM - Apache HTTP on mw1206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:48:20] PROBLEM - Disk space on mw1206 is CRITICAL: Timeout while attempting connection
[16:48:40] PROBLEM - puppet disabled on mw1206 is CRITICAL: Timeout while attempting connection
[16:48:41] PROBLEM - twemproxy process on mw1206 is CRITICAL: Timeout while attempting connection
[16:49:31] (PS1) John F.
Lewis: Restrict upload on Korean Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108690
[16:50:02] (PS1) Legoktm: Enable ULS by default on Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108691
[16:50:10] PROBLEM - NTP on mw1206 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:50:45] (CR) JanZerebecki: [C: 1] "Looks good, effect should be the same." [operations/apache-config] - https://gerrit.wikimedia.org/r/106108 (owner: Jeremyb)
[16:53:41] (PS3) Chad: Remove old Tampa srv* and mw* apaches from dsh groups [operations/puppet] - https://gerrit.wikimedia.org/r/108070
[16:55:10] RECOVERY - check_mysql on payments1001 is OK: Uptime: 8296580 Threads: 4 Questions: 10154133 Slow queries: 29717 Opens: 364 Flush tables: 1 Open tables: 64 Queries per second avg: 1.223
[16:56:47] (CR) Dzahn: beta: convert into a module (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/108289 (owner: Matanya)
[16:59:22] (PS4) OliverKeyes: Add r-base to Hadoop worker machines [operations/puppet] - https://gerrit.wikimedia.org/r/108633
[17:01:01] (CR) Dzahn: "Chris/Rob, any opinions about the timing of these decoms? like _when_ they should go from dsh files (vs. the other lifecycle steps)" [operations/puppet] - https://gerrit.wikimedia.org/r/108070 (owner: Chad)
[17:01:09] (CR) Chad: [C: 1] WIP:Make Elasticsearch less exciting [operations/puppet] - https://gerrit.wikimedia.org/r/107920 (owner: Manybubbles)
[17:01:20] PROBLEM - Host mw1206 is DOWN: PING CRITICAL - Packet loss = 100%
[17:01:45] (CR) Greg Grossmeier: "I'm ok with enabling ULS for wikidata, but I want Ori to sign off, performance/timing-wise."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108691 (owner: Legoktm)
[17:06:25] !log demon finished scap: No code updates, just rebuilding all i18n (duration: 58m 29s)
[17:06:33] Logged the message, Master
[17:06:47] <^d> 58m29s
[17:06:56] <^d> Wonder how much of that was scapping to unused apaches in tampa.
[17:07:48] <^d> wtf.
[17:07:59] <^d> I got tons of public key denied on that scap.
[17:08:01] <^d> Just noticed.
[17:13:34] (Draft1) Alexandros Kosiaris: Lint puppetmaster.erb [operations/puppet] - https://gerrit.wikimedia.org/r/108694
[17:13:47] (CR) Alexandros Kosiaris: [C: 2] Lint puppetmaster.erb [operations/puppet] - https://gerrit.wikimedia.org/r/108694 (owner: Alexandros Kosiaris)
[17:17:39] (CR) Dzahn: "inline comments and mainly i think this is lacking a role class" (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/108498 (owner: Matanya)
[17:20:38] ^d: on legit servers? or in other words, could some application servers still be without l10n?
[17:20:50] <^d> Prolly.
[17:20:57] O_o
[17:21:03] <^d> Actually, maybe not.
[17:21:10] <^d> They might all be tampas?
[17:22:05] That cluster is beginning a strike
[17:25:30] (CR) Dzahn: [C: -1] "the role class does not actually use the class from init.pp?" (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/107567 (owner: Matanya)
[17:29:03] (CR) Dzahn: [C: -1] torrus: move into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108498 (owner: Matanya)
[17:35:41] could anyone check the LDAP replication between virt1 and virt1000 ?
[17:35:56] it was broken last week and I am not sure how to verify whether it is still broken
[17:38:26] (PS1) Alexandros Kosiaris: Backup /var/lib/puppet/volatile [operations/puppet] - https://gerrit.wikimedia.org/r/108699
[17:39:44] hashar: I remember andrewbogott_afk and akosiaris working on it a few days ago
[17:39:53] I might be wrong
[17:39:55] hashar: fixed
[17:39:59] since yesterday
[17:40:20] and it wasn't strictly broken
[17:40:29] akosiaris: hoo
[17:40:33] for some reason which i failed to discover, it had lost part of the log
[17:40:55] and btw... what do you use virt1000 for ?
[17:41:02] the reason I brought up the subject last week was because the jenkins-bot user was missing the 'mail' field on virt1000
[17:41:12] virt1000 is the primary LDAP server of Gerrit
[17:41:23] I guess it has been made that way to speed up Gerrit
[17:41:31] makes absolute sense
[17:41:43] it's just that i am trying to figure out the various ldap clients
[17:41:48] client apps more like it
[17:42:01] (CR) Alexandros Kosiaris: [C: 2] Backup /var/lib/puppet/volatile [operations/puppet] - https://gerrit.wikimedia.org/r/108699 (owner: Alexandros Kosiaris)
[17:42:08] we should require clients to auth themselves :D
[17:42:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[17:45:29] Is the RT password reset working?
[17:49:48] superm401: not working for me either
[17:50:04] I did it last night and haven't seen an email with it yet
[17:50:20] I tried it again and got, "Only external users can reset their passwords this way."
[17:50:50] Jeff_Green: still on RT duty? see above re RT password reset, plz
[17:50:57] Jeff_Green: if not, please forward appropriately ;)
[17:51:59] Hi everyone!! How's it going? And... does anyone know if I can get a copy of some tables on test2 and/or attach a debugger there?
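One way to check the replication discussed above is to ask each server directly for the same entry and compare the attributes, for example whether `mail` is present for jenkins-bot on both. OpenLDAP's `ldapsearch` takes the server URI with `-H`, so each replica can be targeted explicitly. The sketch below only builds the command lines. The short host names come from the log; the `ldap://` URIs, the base DN `ou=people,dc=wikimedia,dc=org` and matching on `uid` are assumptions to adjust. The filter value is escaped per RFC 4515 so a stray `*` or parenthesis cannot change the query.

```python
import shlex

# RFC 4515 escapes for characters that are special inside an LDAP filter value.
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    # Escape character by character so an inserted backslash is never re-escaped.
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def ldapsearch_argv(server: str, base_dn: str, uid: str, attrs: list) -> list:
    """Build an OpenLDAP ldapsearch command that queries one specific server."""
    return [
        "ldapsearch",
        "-x",                      # simple bind (anonymous, since no -D is given)
        "-LLL",                    # plain LDIF: no comments, no version line
        "-H", f"ldap://{server}",  # target this server explicitly
        "-b", base_dn,
        f"(uid={escape_filter_value(uid)})",
        *attrs,                    # only return the attributes to compare
    ]


BASE_DN = "ou=people,dc=wikimedia,dc=org"  # assumption: adjust to the real tree
for host in ("virt1", "virt1000"):
    print(shlex.join(ldapsearch_argv(host, BASE_DN, "jenkins-bot", ["cn", "mail"])))
```

Running each argv with `subprocess.run(argv, capture_output=True, text=True)` and comparing the two outputs would show whether the replicas agree on that entry.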
[17:54:53] Jeff_Green, the first part of my WMF email is mflaschen
[17:55:49] ah my LDAP sucks :/
[17:56:01] any clue how I could query for the jenkins-bot user information on a specific ldap server ?
[17:56:15] ldaplist on labs does not let us specify the service to query
[17:56:25] and I can't find out the proper parameters to pass to ldapsearch
[17:57:28] hashar: do you want me to move manifests/zuul.pp into a module?
[17:59:03] matanya: nop. That is just some glue for wikimedia
[17:59:17] thought so. thanks
[17:59:32] matanya: the Zuul module is more or less independent. I can't remember the exact reason we ended up with a manifest in between the role and the module
[18:00:01] It should be merged imho
[18:04:59] i suspect we have a vacuum in RT duty. probably shouldn't be me again b/c I'm on a plane tomorrow
[18:08:25] Jeff_Green: it has been neglected for some time now
[18:10:14] (PS2) Matanya: torrus: move into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108498
[18:14:48] Hello all, I'm trying to get my company to switch to MediaWiki, the sticking point is migrating content from a DokuWiki server. I've searched and found plenty of converters from MW -> DokuWiki but not much for the other way. I've looked at http://www.linuxintro.org/wiki/Convert_a_dokuwiki_to_mediawiki but I would need a dev server to 'test' on first. Does anyone have any other pointers or suggestions?
[18:15:35] stljim: you probably want to ask in #mediawiki channel or on the mediawiki-l mailing list
[18:15:51] stljim: I don't think anyone is actively maintaining the conversion scripts though : /
[18:16:26] hashar: Do you know the url where the conversion scripts are? I found them once..
a long time ago and can't find them again
[18:16:57] stljim: no idea sorry :(
[18:17:11] stljim: ah that wiki page refers to importTextFile.php
[18:17:20] stljim: it should be in MediaWiki under the maintenance/ directory
[18:35:16] (CR) Tim Landscheidt: [C: -1] Set up redirects for toolserver.org (1 comment) [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (owner: Tim Landscheidt)
[18:37:34] (PS3) Matanya: beta: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108289
[18:38:52] beta cluster is throwing errors
[18:38:53] PHP fatal error in /data/project/apache/common-local/php-master/extensions/Flow/includes/Data/UserNameBatch.php line 170:
[18:38:53] Class 'Flow\Data\User' not found
[18:40:20] jackmcbarn: copied into -corefeatures, thanks
[18:40:26] uh
[18:47:51] (PS2) Tim Landscheidt: Set up redirects for toolserver.org [operations/apache-config] - https://gerrit.wikimedia.org/r/108465
[18:50:35] (PS3) Tim Landscheidt: Set up redirects for toolserver.org [operations/apache-config] - https://gerrit.wikimedia.org/r/108465
[18:54:24] (PS6) Matanya: etherpad: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/107567
[18:55:17] (CR) Matanya: "I hope it is better and clearer now. I used the generic webserver conf file." [operations/puppet] - https://gerrit.wikimedia.org/r/107567 (owner: Matanya)
[18:56:30] (PS1) Chad: Lower account-related caches from infinity to 1 week [operations/puppet] - https://gerrit.wikimedia.org/r/108715
[18:56:47] greg-g: https://gerrit.wikimedia.org/r/#/c/108709/ looks good. I can sync the wmf11 one first so we can watch load.
[18:56:49] * ^d grumbles something about making cache defaults at "forever" a silly default
[18:58:39] so, legoktm told me that if I get added to the wmf ldap group, I should get +2 automatically
[18:58:49] do I need to do an rt request for that?
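For the DokuWiki migration question, the markup gap is easy to underestimate: DokuWiki's `''text''` means monospace while the same quotes mean italics in MediaWiki, and DokuWiki headings count down (six `=` signs is the top level) where MediaWiki's count up. A deliberately tiny, hypothetical converter for only headings, bold and monospace shows the idea. It is not the script from the linked page, and a real migration would also need lists, tables, italics, namespaced links and media before importing the result (for instance with the importTextFile.php maintenance script mentioned above).

```python
import re

# DokuWiki: "====== Title ======" (six '=') is level 1 ... "== Title ==" is level 5.
_HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
_MONOSPACE = re.compile(r"''(.+?)''")
_BOLD = re.compile(r"\*\*(.+?)\*\*")


def convert_line(line: str) -> str:
    heading = _HEADING.match(line)
    if heading:
        level = 7 - len(heading.group(1))  # six '=' -> level 1, two '=' -> level 5
        marks = "=" * level
        return f"{marks} {heading.group(2)} {marks}"
    # Monospace must be converted before bold: bold output uses ''' quotes,
    # which the monospace pattern would otherwise pick up.
    line = _MONOSPACE.sub(r"<code>\1</code>", line)
    return _BOLD.sub(r"'''\1'''", line)


def dokuwiki_to_mediawiki(text: str) -> str:
    return "\n".join(convert_line(line) for line in text.splitlines())


sample = "====== Setup ======\nRun **make** in ''/srv/app''.\n==== Notes ===="
print(dokuwiki_to_mediawiki(sample))
```

This keeps levels one-to-one, so a DokuWiki top-level heading becomes a MediaWiki `= ... =` (h1) heading; many wikis would rather shift everything down one level, since the page title already occupies h1.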
[18:59:04] gi11es: you don't have to be in wmf group for that
[18:59:21] matanya: but the wmf group grants it
[18:59:34] ^d: ^
[18:59:38] legoktm: also grants it
[18:59:48] i'm not in that group and i get +2
[19:00:04] anywau, gi11es you need rt ticket for ldap change
[19:00:05] We should get gi11es into wmf ldap for other reasons however. Like access to tools that auth against ldap.
[19:00:12] <^d> Ah, I wonder if I can do this now.
[19:00:19] <^d> I couldn't ssh to the right box the other day because tampa.
[19:00:22] bd808: I wonder if any such tool has been recently made available :P
[19:00:30] :)
[19:00:43] I thought everyone was in wmf group
[19:00:50] no
[19:01:06] last time I checked it had hundreds members, IIRC
[19:01:09] only wmf emploees and the like
[19:01:25] obviously I meant everyone at wmf :)
[19:01:42] <^d> gi11es: Done.
[19:01:49] thanks!
[19:01:50] Nikerabbit: I'm going to sync https://gerrit.wikimedia.org/r/#/c/108709/ to wmf11
[19:01:57] yeah, that is way it has so many members :)
[19:02:28] <^d> # ldaplist -l group wmf | grep 'member:' | wc -l
[19:02:28] <^d> 95
[19:02:35] <^d> Not hundreds.
[19:02:40] <^d> Not even *a* hundred yet :)
[19:02:44] 0.95 hundreds!
[19:03:02] bd808: is the ldap stuff public somewhere?
[19:03:15] e.g members, groups etc
[19:03:20] <^d> I doubt it.
[19:03:25] * bd808 shrugs
[19:03:37] <^d> We need a less manual way of managing ldap.
[19:03:41] i didn't think so ^d
[19:03:48] <^d> (Was on my wishlist for Ryan awhile back, but we never got around to it)
[19:04:10] ^d: there is the ldap module at least
[19:04:59] ^d: 0<=x<1 needs plural in Italian :P
[19:05:16] Nemo_bis: English too, I think
[19:05:20] 0.5 meters, not 0.5 meter
[19:05:42] ;)
[19:07:18] scfc_de: still around?
[19:07:35] matanya: Yes.
[19:07:39] !log ori synchronized php-1.23wmf11/extensions/UniversalLanguageSelector/Resources.php 'I05c76e478: Make ext.uls.mediawiki depend upon ext.uls.init'
[19:07:45] Logged the message, Master
[19:07:57] !log sync-file: mw1206: ssh: connect to host mw1206 port 22: No route to host
[19:08:05] Logged the message, Master
[19:08:39] scfc_de: in modules/protoproxy/templates/proxy.erb there is the site var, how can i lookup.var it? it is a global var. what do you suggest?
[19:09:35] (PS3) Manybubbles: Make Elasticsearch less exciting [operations/puppet] - https://gerrit.wikimedia.org/r/107920
[19:10:16] (PS4) Manybubbles: Make Elasticsearch less exciting [operations/puppet] - https://gerrit.wikimedia.org/r/107920
[19:11:19] matanya: Unfortunately, I'm not good at the subtleties of Puppet and 3+, so I can't offer an informed opinion on that :-).
[19:11:50] (PS1) BryanDavis: kibana: Allow /status from all [operations/puppet] - https://gerrit.wikimedia.org/r/108722
[19:11:54] thanks scfc_de i'll ask paravoid :)
[19:12:02] greg-g: I'll do wmf10 next. The JS is not loaded by default on all pages, so pushing an update won't cause a run on the bits app servers.
[19:13:11] ori: :D
[19:13:33] that wasn't meant as a snark, just a dry fact
[19:15:57] ori: great
[19:17:14] !log ori synchronized php-1.23wmf10/extensions/UniversalLanguageSelector/Resources.php 'I05c76e478: Make ext.uls.mediawiki depend upon ext.uls.init'
[19:17:22] Logged the message, Master
[19:21:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[19:25:40] ori: please merge https://gerrit.wikimedia.org/r/#/c/106502/1 sometime
[19:27:30] ori: could you have a look at https://gerrit.wikimedia.org/r/#/c/108691/ please while you're doing uls stuff? currently wikidata users don't get uls by default and are therefore missing quite an important part of it
[19:27:40] thanks!
[19:27:42] Lydia_WMDE: yep, fixing
[19:27:46] \o/
[19:27:48] can you stick arond for a moment to confirm?
[19:27:53] sure
[19:28:05] here for at least another hour
[19:31:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[19:35:47] RobH: https://gerrit.wikimedia.org/r/#/c/107859/ :D
[19:38:11] (PS1) Ori.livneh: Add 'wmgUniversalLanguageSelectorDefault' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108735
[19:38:55] (CR) RobH: [V: 2] lower TTL for contint websites [operations/dns] - https://gerrit.wikimedia.org/r/107859 (owner: Hashar)
[19:39:15] (CR) RobH: [C: 2] lower TTL for contint websites [operations/dns] - https://gerrit.wikimedia.org/r/107859 (owner: Hashar)
[19:39:48] (CR) Hashar: [C: 1] Lower account-related caches from infinity to 1 week [operations/puppet] - https://gerrit.wikimedia.org/r/108715 (owner: Chad)
[19:40:44] (CR) Ori.livneh: [C: 2] Add 'wmgUniversalLanguageSelectorDefault' [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108735 (owner: Ori.livneh)
[19:40:58] !log ori updated /a/common to {{Gerrit|Ie9ff56755}}: Add 'wmgUniversalLanguageSelectorDefault'
[19:41:06] Logged the message, Master
[19:41:31] (Abandoned) Legoktm: Enable ULS by default on Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108691 (owner: Legoktm)
[19:42:05] !log ori synchronized wmf-config/InitialiseSettings.php 'Ie9ff56755: Add wmgUniversalLanguageSelectorDefault'
[19:42:12] Logged the message, Master
[19:42:41] (PS1) Faidon Liambotis: Varnish: disable WAP on mobile frontends [operations/puppet] - https://gerrit.wikimedia.org/r/108738
[19:42:43] !log ori synchronized wmf-config/CommonSettings.php 'Ie9ff56755: Add wmgUniversalLanguageSelectorDefault (2/2)'
[19:42:51] Logged the message, Master
[19:43:15] MaxSem: ^
[19:43:41] Lydia_WMDE: can you confirm?
[19:43:54] ori: looking
[19:44:39] ori: ULS is back \o/
[19:44:43] thanks a bunch
[19:44:55] Lydia_WMDE: thanks for confirming; sorry for the inconvenience
[19:45:11] happens
[19:45:12] gone for me
[19:45:26] Do I still need to mark the preference?
[19:45:26] twkozlowski: on wikidata?
[19:45:32] tick it, I meant
[19:45:39] twkozlowski: the preference is set by default unless you have set it already
[19:45:48] if you have explicitly indicated your preference, it will stick
[19:46:16] OK, my preference already was 'yes please', so :)
[19:46:27] thanks ori
[19:46:47] then you should see it
[19:46:49] no?
[19:47:18] yes
[19:47:28] Lydia_WMDE: yeah, I just thought you didn't have to set the preference
[19:47:32] but it does work nicely
[19:47:33] ok
[19:47:51] twkozlowski: its on by default, so it you uncheck it, it'll go away.
[19:48:18] twkozlowski: users will have it on by default unless they explicitly opt-out
[19:48:47] perhaps something to mention in Tech News
[19:49:18] twkozlowski: i think ori already added it there
[19:49:21] !technews
[19:49:26] bah.
[19:49:27] it is scoped to wikidatawiki, testwikidatawiki, and testwiki
[19:50:16] MatmaRex: oh right, I didn't have the page on my watchlist
[19:50:25] and I think !technews works on -tech
[20:02:03] Is gerrit not working for anyone else?
[20:02:26] Reedy: Seems fine for me.
[20:02:54] <^d> wfm.
[20:16:39] (PS1) Ori.livneh: webperf: factor out local configuration to role class [operations/puppet] - https://gerrit.wikimedia.org/r/108816
[20:18:08] paravoid: one of the apaches that you restarted earlier (mw1206) is unresponsive when running scap / sync. should I remove it from the DSH group?
[20:19:08] (CR) Ori.livneh: [C: 2] webperf: factor out local configuration to role class [operations/puppet] - https://gerrit.wikimedia.org/r/108816 (owner: Ori.livneh)
[20:33:11] (PS1) Ori.livneh: asset-check: report to Graphite rather than Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/108823
[20:35:21] (PS2) Ori.livneh: asset-check: report to Graphite rather than Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/108823
[20:36:28] (CR) Ori.livneh: [C: 2] asset-check: report to Graphite rather than Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/108823 (owner: Ori.livneh)
[20:41:15] (PS1) Aaron Schulz: Adjust linkpurge and renderfile limits [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108824
[20:48:35] (PS1) Ori.livneh: webperf: add dispatch_stat helper to navtiming.py [operations/puppet] - https://gerrit.wikimedia.org/r/108825
[21:11:05] heh, useful https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team/Quarterly_review,_January_2014&diff=next&oldid=887470
[21:13:48] (PS1) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[21:14:34] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[21:14:56] Nemo_bis: yeah, working on it ;)
[21:15:05] as in, yeah, I know the private mailing list link isn't ideal
[21:15:41] I can copy/paste on wiki somewhere, but it isn't really wiki-fied (reads like an email).
[21:23:28] (PS1) Tim Landscheidt: webperf: Remove misleading references to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/108831
[21:27:00] PROBLEM - MySQL Recent Restart Port 3307 on labsdb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:27:50] RECOVERY - MySQL Recent Restart Port 3307 on labsdb1003 is OK: OK 2855792 seconds since restart
[21:29:09] ori: I didn't restart any apaches
[21:41:29] (PS2) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[21:42:02] (PS1) Odder: More rights for 'patroller' user group on hewikibooks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108832
[21:42:14] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[21:42:23] dammit.
[21:44:12] ori: any thoughts on this one?
[21:44:12] https://gerrit.wikimedia.org/r/#/c/108640/
[21:51:14] (CR) Cmcmahon: [C: 1] "I'd like to get this running. Right now monitor_fatals.rb does not exist that I can see in /usr/local/bin/ on the deployment cluster" [operations/puppet] - https://gerrit.wikimedia.org/r/108041 (owner: Hashar)
[22:01:03] * jeremyb reasks: readd me to deployment-prep admins please? or make me a new instance or tell me which other project to use instead. (for protorel redirects)
[22:02:00] jeremyb: hey :D
[22:02:03] or even better deploy the current apache-config master on labs and get it responding properly to wikimediafoundation.org/fundraising and jobs.wikimedia.org :)
[22:02:06] jeremyb: what are you willing to play with ?
[22:02:23] jeremyb: the beta cluster (deployment-prep) uses custom apache configuration
[22:02:36] not the ones from operations/apache-config.git unfortunately :(
[22:02:48] hashar: right. but someone told me to use deployment-prep. so i was going to make a new instance for my tests
[22:03:01] what you wanna test / play with ?
[22:03:03] hashar: i've had bad results with the tests i've tried so far
[22:03:12] hashar: i said: (for protorel redirects)
[22:03:14] installing mediwaiki is hard
[22:03:31] Reedy: i don't need mediawiki.
[22:03:43] apt-get install apache
[22:03:46] is also hard
[22:03:53] is that merely to test out http/https crazy stuff in our apache configs ?
[22:04:16] hashar: yes. but the tests i ran so far disagree with prod
[22:04:20] cause those urls ( /fundraising and jobs.wikimedia.org ) are not going to work on deployment-prep
[22:04:24] ahh
[22:04:36] Reedy: maybe you missed yesterday's conversation
[22:04:36] ?
[22:04:40] might be something in varnish
[22:04:50] it's not varnish for sure
[22:05:08] hashar: as i said, i was going to make a brand new instance
[22:05:31] anyway... i already ran out of ideas for how it was working the way i observed
[22:05:42] so other eyes would be welcome
[22:06:09] might want to document the issue you are observing on some bug / mail ?
[22:08:26] hashar: if i leave the nonexistant line in all.conf then everything is non-existant even if it really does exist later. if i comment out nonexistant then (some) of my tests give better results. but i'm not sure quite what's going on
[22:08:59] hashar: i asked for apache people here yesterday. Reedy seems to think this is very simple, maybe he wants a crack at it :)
[22:09:09] this / it ....
[22:09:22] still no idea what you are investigating :]
[22:09:45] is that jobs.wikimedia.org not properly redirecting when one uses HTTP or HTTPS ?
[22:10:31] (PS3) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[22:10:46] hashar: i deployed 70ec6d15cff8b678f51f469800ed7c6618ff7f55 on a local test apache. i can't get it to give the results i think it should for either of the 2 examples i mentioned
[22:10:53] https://git.wikimedia.org/commit/operations%2Fapache-config/70ec6d15cff8b678f51f469800ed7c6618ff7f55
[22:11:15] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[22:11:21] yes yes, shortest patchset ever
[22:11:22] (PS4) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[22:11:26] !g 70ec6d15cff8b678f51f469800ed7c6618ff7f55
[22:11:27] https://gerrit.wikimedia.org/r/#q,70ec6d15cff8b678f51f469800ed7c6618ff7f55,n,z
[22:11:27] stupid typos
[22:12:06] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[22:13:11] http://jobs.wikimedia.org/ * 301 Moved Permanently http://www.wikipedia.org/
[22:13:17] ahh
[22:13:24] jeremyb: jobs.wikimedia.org is defined in wikimedia.conf
[22:13:25] http://wikimediafoundation.org/fundraising * 301 Moved Permanently http://www.wikipedia.org/fundraising
[22:13:30] hashar: so what?
[22:13:34] jeremyb: of operations/apache-config.git : wikimedia.conf
[22:13:51] and it blindly redirect to HTTP : RewriteRule ^/$ http://wikimediafoundation.org/wiki/Work_with_us
[22:14:00] regardless of X-Forwarded-Proto
[22:14:03] gah
[22:14:10] back up
[22:14:15] this has nothing to do with protorel
[22:14:40] http://jobs.wikimedia.org/ is redirecting to http://www.wikipedia.org/
[22:15:30] $ curl --silent -i http://jobs.wikimedia.org/?foo|fgrep Location
[22:15:31] Location: http://wikimediafoundation.org/wiki/Work_with_us?foo
[22:15:55] no in prod. in my test apache
[22:16:41] not in*
[22:16:43] jeremyb: So you're saying the patch is faulty or your test setup?
[22:17:14] scfc_de: my test setup. i've not gotten to testing the patch yet. first i'm just trying to replicate what's already deployed @ prod
[22:17:33] jeremyb: same issue as yesterday?
[22:17:41] scfc_de: already compared apache package versions. they're exactly the same
[22:17:44] matanya: yes
[22:18:17] jeremyb: Well, debugging *your* setup is kinda hard because we don't know how you transformed the configuration :-).
[22:18:54] In fact, I'm working on a test script for https://bugzilla.wikimedia.org/show_bug.cgi?id=43266 right now, but still a few hours to go.
[22:19:25] scfc_de: i'm just using a bash script to make test cases and apache-fast-test and then diff -u
[22:22:48] scfc_de: re hard to debug because... right. which is the whole point of why i was asking *where* to test in labs. so someone suggested that i should do these tests on deployment-prep. so i looked there a bit and decided the existing apache conf was not a good place to deploy this and hashar seems to agree. so either someone should make me a new instance on deployment-prep (or readd me as project admin there...) or tell me a different proje
[22:23:20] > decided the existing apache conf was not a good place to deploy this and hashar seems to agree. so either someone should make me a new instance on deployment-prep (or readd me as project admin there...) or tell me a different project to use instead or build their own test env which I can then use
[22:23:25] scfc_de: back up to 22:01 UTC here :)
[22:24:00] maybe it's a simple problem but i think i'm stuck...
[22:24:00] (PS1) Matanya: site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/108840
[22:24:05] jeremyb: can add you to the integration project
[22:24:09] (CR) jenkins-bot: [V: -1] site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/108840 (owner: Matanya)
[22:24:21] hashar: idk what that is, but if you think it's the right place to test this then sure
[22:24:37] hashar: who else can log in to those boxes?
[22:24:39] jeremyb: though I think I exhausted the # of instances quota :D
[22:24:45] hah
[22:25:02] and deployment-prep , we removed every admins but wmf staff
[22:26:14] (Abandoned) Matanya: site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/108840 (owner: Matanya)
[22:26:32] hashar: no, the important part is NDA. AFAIK, i can be readded to -prep
[22:26:41] hashar: are you aware of puppet-clean?
[22:26:41] probably
[22:26:49] matanya: not at all
[22:27:19] jeremyb: meanwhile I added you to the integration project
[22:27:23] i see
[22:27:24] jeremyb: what instance name do you want ? :D
[22:27:26] hashar: https://github.com/santana/puppet-cleaner/
[22:27:42] matanya: ohhh
[22:27:50] matanya: must be a good way to clean up our manifests
[22:28:07] it actully works on most of the stuff
[22:28:22] but not on site.pp as demonstared above
[22:28:24] hashar: whatever. i intend to throw it away in 3 days (or else i can start pulling out hairs)
[22:28:48] jeremyb: integration-protrel.pmtpa.wmflabs , being build right now
[22:29:00] jeremyb: I am not sure whether you can add puppet classes on it though
[22:30:11] jeremyb: and you should get sudo right on integration-protrel instance whenever it is complete
[22:31:09] hashar: it's not in dns yet?
[22:31:21] I managed to connect to it
[22:31:47] using bastion2.wmflabs.org as a bastion
[22:31:48] aha, i was looking for protorel
[22:31:51] not protrel
[22:31:52] :D
[22:31:54] :)
[22:31:55] sorry
[22:32:00] np
[22:32:02] want me to recreate it ?
[22:32:22] nah
[22:32:50] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Fri 17 Jan 2014 06:59:52 PM UTC
[22:32:52] got to wait for apt to upgrade package and puppet to finish running
[22:33:00] though you probably have sudo already
[22:33:02] sudo -s
[22:33:03] would tell
[22:33:13] $ sudo -l
[22:33:13] User jeremyb is not allowed to run sudo on integration-protrel.
[22:33:22] ...
[22:33:49] forgot to give access on ALL commands hehe
[22:34:04] jeremyb: try again?
[22:34:39] hashar: in terms of CI, what puppet work is still needed?
[22:34:55] not much
[22:35:09] ideally I would love to have jenkins configuration managed via puppet
[22:35:14] but that is not that easy to handle :(
[22:35:46] hashar: no. let's move to #-labs for figuring out labs :)
[22:36:47] (PS5) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[22:37:32] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[22:42:27] (PS6) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[22:43:24] (CR) jenkins-bot: [V: -1] replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[22:54:03] (PS7) RobH: replace wildcard cert for labs ldap [operations/puppet] - https://gerrit.wikimedia.org/r/108830
[22:58:50] (CR) RobH: "This should successfully change our LDAP systems on virt0 & virt1000 to use individualized certs with their FQDN/Hostname rather than the " [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[23:03:02] (PS1) Manybubbles: Monitor Elasticsearch query stats groups [operations/puppet] - https://gerrit.wikimedia.org/r/108852
[23:04:04] (CR) jenkins-bot: [V: -1] Monitor Elasticsearch query stats groups [operations/puppet] - https://gerrit.wikimedia.org/r/108852 (owner: Manybubbles)
[23:06:24] (PS2) Manybubbles: Monitor Elasticsearch query stats groups [operations/puppet] - https://gerrit.wikimedia.org/r/108852
[23:07:25] (CR) jenkins-bot: [V: -1] Monitor Elasticsearch query stats groups [operations/puppet] - https://gerrit.wikimedia.org/r/108852 (owner: Manybubbles)
[23:09:13] (PS3) Manybubbles: Monitor Elasticsearch query stats groups [operations/puppet] - https://gerrit.wikimedia.org/r/108852
[23:22:37] (CR) coren: [C: 2] "This seems straightforward enough. I don't /think/ any client expects a /specific/ certificate (and if they do, it's a bug that needs fix" [operations/puppet] - https://gerrit.wikimedia.org/r/108830 (owner: RobH)
[23:23:10] Coren: heh, thx dude. I think you are right, but my understanding of LDAP is novice at best.
[23:23:59] I think I may wait and push it live tomorrow my AM, rather than late afternoon here.
[23:24:05] This may be wise.
[23:24:14] Unless you have another source of time pressure.
[23:24:32] At the very least, it means more people to notice if something arcane breaks.
[23:24:44] i think maybe i figured it out
[23:24:48] RobH: are we replacing racktables?
[23:27:28] matanya: sorry, in what way? as a service eventually?
[23:28:02] cuz it already uses its own cert, which is what ive been doing a lot of replacing recently
[23:28:21] ideally we'd stop using racktables for somethign that integrates with servermon or another tool to populate system data dynamically
[23:29:59] Coren: Well, the only time pressure is we want to lock down that cert use
[23:30:22] but there is no key leak currently in effect.
[23:30:38] ie: i'm willing to let it wait till tomorrow, but not a lot more, heh
[23:30:45] Works for me.
[23:30:49] cuz tomorrow i lock down the other use of the wildcard
[23:44:06] ottomata: Reedy: Gloria: turns out that both the default apache conf (on apt-get install) and also what's installed by puppet in labs when installing a basic mediawiki from manifests use an apache conf that doesn't work with . i change them all to and then magically everything works.
[23:44:30] not sure where to find the relevant part of the apache conf in puppet to compare against what i was using
[23:44:31] jeremyb, is this in mediawiki-vagrnt?
[23:44:35] ottomata: no
[23:44:37] hm
[23:44:49] i just submitted this to ori yesterday:
[23:44:50] https://gerrit.wikimedia.org/r/#/c/108640/
[23:45:12] i think it has more to do with NameVirtualHosts
[23:45:17] than Listen
[23:45:21] Interesting.
[23:45:24] anyway, at least i can test now
[23:45:26] *:80 is more common syntax, I think.
[23:45:29] Nice find.
[23:46:29] we need some better way to test this stuff in the future though. and a place to document how to test it. e.g. where do i put a note to my future self telling me how to fix the stanzas?
[23:46:32] (PS2) Ottomata: Puppetizing wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/108643
[23:49:00] Gloria: did you ever review the test cases?
[23:49:21] Which? I had a few on the bug report.
[23:49:29] They're still sitting there, AFAIK.
[23:49:33] ok
[23:50:29] Gloria: here's what i asked you to review yesterday: http://dpaste.com/1561574/plain/
[23:50:50] Oh, that bash noise.
[23:51:00] Yeah, I recommended programmatically comparing output.
[23:51:07] grrrr
[23:51:11] Louder.
[23:51:12] that's input
[23:51:15] not output
[23:51:31] programmatically comparing is fine for output
[23:53:13] Gloria: is that louder?
[23:53:57] * jeremyb reruns away
[23:55:46] !log deployed documentation jenkins jobs for MultimediaViewer
[23:55:54] Logged the message, Master
[23:57:30] jeremyb: I remember that script being attached to a bug or so, but it is not in ?