[00:00:29] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 11:58:44 PM UTC
[00:00:33] mutante: cool, thanks!
[00:00:46] (CR) Ori.livneh: [C: 2] Add logstash100[1-3] to site.pp & add bd808 & aaron as sudo per RT 6366 [operations/puppet] - https://gerrit.wikimedia.org/r/98730 (owner: Ori.livneh)
[00:01:15] wee
[00:01:54] gonna manually run puppet on those nodes
[00:02:04] btw, who was that se4598 person?
[00:02:06] mwalker: ?
[00:02:29] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 11:58:44 PM UTC
[00:02:44] possibly steinsplitter from https://gerrit.wikimedia.org/r/#/c/98073/
[00:03:11] greg-g: http://en.wikipedia.org/wiki/User:Se4598
[00:03:12] ahh, yeah, figured it out now, thanks mwalker
[00:03:30] I'm going to push my config change now
[00:04:05] mwalker: cool
[00:04:20] !log mwalker updated /a/common to {{Gerrit|I9e2f24923}}: Changing banner expiration to 10 months
[00:04:29] oh
[00:04:29] PROBLEM - Puppet freshness on cp4013 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 11:58:44 PM UTC
[00:04:30] that's new
[00:04:35] Logged the message, Master
[00:05:13] !log mwalker synchronized wmf-config/InitialiseSettings.php
[00:05:16] mwalker: yeah, meant to send an email about that
[00:05:28] Logged the message, Master
[00:05:35] :)
[00:05:42] its kinda cool
[00:05:45] !log mwalker synchronized wmf-config/CommonSettings.php
[00:05:46] mwalker: btw, i just ran across rules in AdBlock against Wikipedia banners
[00:05:59] Logged the message, Master
[00:06:07] mutante: if it's just the extended rules; we're aware
[00:07:14] mwalker: https://adblockplus.org/forum/viewtopic.php?f=2&t=19557&p=87593&hilit=wikipedia#p87593
[00:07:36] |http://meta.wikimedia.org/wiki/Special:BannerRandom?*
[00:07:52] it's a request they didnt reply to yet
[00:10:39] PROBLEM - Puppet freshness on sq37 is CRITICAL: Last successful Puppet run was Sun 01 Dec 2013 06:02:06 AM UTC
[00:11:20] well; if they do; we're already in the EasyPrivacy list... so...
[00:11:27] it's not good
[00:11:33] but I don't know what else to do
[00:16:48] mwalker: i wouldn't suggest you try antiblock.org, that just makes people use Disable Anti-Adblock .. and anti-anti-anti-anti :p
[00:17:26] heh; no
[00:17:38] basically; if people want to block us; that's their prerogative
[00:17:51] and in a month I'm going to make it significantly harder to do so on all wikis
[00:17:55] by changing the URLs
[00:18:11] but... even then
[00:18:14] it is technically an ad
[00:18:17] mwalker: more seriously, what you can do is try get on the "accepted ads" list on ABP
[00:18:28] but then, i know how controversial that feature is
[00:18:33] and AdBlockPlus itself
[00:19:02] mwalker: yea, the only way is to follow their guidelines to be accepted as non-intrusive ad
[00:19:31] and _supposedly_ people paid for that .. (and AdBlockPlus != AdBlock or AdBlockEdge, which dont have that feature)
[00:21:18] https://adblockplus.org/forum/viewforum.php?f=17
[00:21:58] http://searchenginewatch.com/article/2280451/Google-Paying-to-Have-Ads-Whitelisted-on-AdBlock-Plus
[00:23:16] (PS1) Ori.livneh: Add accounts for bd808 and aaron on logstash nodes [operations/puppet] - https://gerrit.wikimedia.org/r/98736
[00:26:11] (CR) Ori.livneh: [C: 2] Add accounts for bd808 and aaron on logstash nodes [operations/puppet] - https://gerrit.wikimedia.org/r/98736 (owner: Ori.livneh)
[00:28:59] RECOVERY - Puppet freshness on cp4013 is OK: puppet ran at Tue Dec 3 00:28:57 UTC 2013
[00:29:49] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[00:40:14] ^d: I'm going to go to the store. when I get back I can take over script babysitting
[00:40:33] <^d> Mmk.
[00:40:35] I believe I saw an error building one of the wikis. testwikidata or something. I'll figure it out when I get back
[00:45:07] greg-g: dumbish question; I don't need to do any mystic voodoo beyond a sync-file to get a configuration change applied?
[00:47:54] (PS1) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740
[00:48:49] (CR) jenkins-bot: [V: -1] [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[00:51:03] (PS2) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740
[00:56:21] mwalker: aside from the rain dance you mean?
[00:56:26] mwalker: that should be it
[00:56:35] heh -- ya; it turns out I was looking at cached content
[01:08:24] (CR) BryanDavis: [WIP] Add configuration for Wikimania Scholarships (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[01:13:17] !log rebooting sq37
[01:13:34] Logged the message, Master
[01:16:00] PROBLEM - Host sq37 is DOWN: PING CRITICAL - Packet loss = 100%
[01:23:36] cmjohnson1: fyi.. https://gerrit.wikimedia.org/r/#/c/94115/ & https://gerrit.wikimedia.org/r/#/c/96489/ that's why i didnt make a new one
[01:24:01] not saying it needs merge now. but re: tesla
[01:24:55] mutante: what do you think about the comment?
[01:25:08] regarding the virtualization subnet?
[01:26:51] cmjohnson1: pretty much exactly what ariel commented :)
[01:27:11] (CR) Ori.livneh: [C: -1] "Comments inline. I think the Apache config deserves a lookover from someone more familiar with setting up PHP webapps." (12 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[01:27:12] could make a new patchset that just removes actual tesla
[01:38:52] http://git.wikimedia.org/ is (almost) dead, long live git
[01:47:01] (PS3) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740
[01:49:34] TimStarling: i want to depool es1004. it is as simple as commenting out the IP in $wmgOldExtTemplate ? safe, etc
[01:51:47] <^d> manybubbles|away: Almost done for all wikis but cawiki, enwikisource, frwikisource, nlwiki, itwiki (these 5 are still on pass 1)
[01:53:11] (CR) BryanDavis: [WIP] Add configuration for Wikimania Scholarships (11 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[01:59:42] (CR) Dzahn: "2 minor puppet-lint warnings inline and please retab to use 4 spaces instead of actual tabs. i know we have it both ways, but tabs are the" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[02:02:06] (PS4) Dzahn: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[02:03:03] ^d cool
[02:03:07] thanks.
[02:03:11] im back
[02:03:16] <^d> no problem :)
[02:03:27] everything good?
[02:03:30] <^d> Yep.
[02:03:33] <^d> Humming along dandy.
[02:04:01] kids have hand foot and mouth so i went to get groceries and drugs
[02:05:17] ori-l, you will be happy to know that i began working on JsonConfig ext... will see what i will come up with :)
[02:06:36] <^d> manybubbles|away: Take care of family. I've got this. It's completely uneventful.
[02:08:01] yurik-road: cool, looking forward
[02:08:11] yurik-road: i still owe you a review of the vagrant patch
[02:08:18] ori-l, want to create a repo for me btw?
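The filter rule quoted at 00:07:36 uses Adblock Plus syntax, and mwalker's plan at 00:17:51 is to get around it by changing the banner URLs. As a rough illustration of why that works, the Python sketch below translates a small subset of ABP filter syntax into a regular expression. A leading `|` anchors at the start of the address, `*` is a wildcard, and everything else, including `?`, is matched literally. It deliberately ignores `||` domain anchors, `^` separators and `$` options, so treat it as an approximation, not a faithful ABP matcher. The renamed URL in the example is hypothetical.

```python
import re


def abp_to_regex(rule: str) -> "re.Pattern[str]":
    # Subset only (an assumption for this sketch): leading "|" = start anchor,
    # trailing "|" = end anchor, "*" = wildcard. Real ABP features such as
    # "||" domain anchors, "^" separators and "$options" are NOT handled.
    start = rule.startswith("|")
    end = len(rule) > 1 and rule.endswith("|")
    body = rule[1 if start else 0 : -1 if end else None]
    # Escape each literal chunk (so "?" and "." stay literal), join with ".*".
    pattern = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    if start:
        pattern = "^" + pattern
    if end:
        pattern += "$"
    # ABP filters match case-insensitively unless told otherwise.
    return re.compile(pattern, re.IGNORECASE)


def is_blocked(rule: str, url: str) -> bool:
    # Unanchored filters match anywhere in the URL, hence search(), not match().
    return abp_to_regex(rule).search(url) is not None


RULE = "|http://meta.wikimedia.org/wiki/Special:BannerRandom?*"

print(is_blocked(RULE, "http://meta.wikimedia.org/wiki/Special:BannerRandom?uselang=en"))   # True
print(is_blocked(RULE, "https://meta.wikimedia.org/wiki/Special:BannerRandom?uselang=en"))  # False
```

Because the rule is anchored at `|http://` and spells out the special page name, both an HTTPS request and any renamed endpoint slip past it.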
[02:08:24] i submitted req
[02:08:32] its on the repo request page somewhere
[02:08:40] mediawiki/extensions/JsonConfig?
[02:08:50] yep
[02:09:15] do you know whom i should talk to about db schema deployment? This is not new schema - rather i'm trying to deploy an existing extension (flag revs) to a new site (metawiki)
[02:09:41] do we have a special procedure for that?
[02:10:39] yurik-road: springle i suppose
[02:11:15] (PS7) Dzahn: bugzilla module [operations/puppet] - https://gerrit.wikimedia.org/r/94075
[02:12:44] PROBLEM - Puppet freshness on cp3012 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:10:45 PM UTC
[02:14:08] mutante, thx, will try to get to springle
[02:16:44] PROBLEM - Puppet freshness on cp3011 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:15:19 PM UTC
[02:17:26] (CR) Dzahn: [C: 1] "Alex, even more comments now on PS4, i think i fixed pretty much all, now also unified those 2 scripts into a single define and it runs on" [operations/puppet] - https://gerrit.wikimedia.org/r/94075 (owner: Dzahn)
[02:17:50] !log LocalisationUpdate completed (1.23wmf4) at Tue Dec 3 02:17:50 UTC 2013
[02:17:57] yurik-road: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/JsonConfig
[02:18:00] ^d: k. send me an email about where things are when you are done for the night. I imagine some stuff will run over night now that we're sure it is uneventful.
[02:18:04] can you mark the request as fulfilled? I gotta run
[02:18:08] Logged the message, Master
[02:18:14] <^d> manybubbles|away: Will do, have a good night
[02:18:21] you too
[02:18:28] ori-l, thx!!!
[02:18:44] akosiaris: ^ when you see this later, replied to all the review comments re: BZ, see above
[02:18:56] gerrit 94075
[02:27:01] yurik-road: whats up?
[02:27:20] schema stuff?
[02:27:27] hi springle, i need to deploy flagged revs extension on meta
[02:27:36] wasn't sure whom to bug about db update
[02:27:57] new tables or alterations, or both?
[02:28:22] no idea - the extension is out there on many sites
[02:28:22] like de, etc
[02:28:29] looking...
[02:29:24] https://wikitech.wikimedia.org/wiki/FlaggedRevs_setup has some old docs from 2011
[02:29:33] springle, the config change patch has a link https://gerrit.wikimedia.org/r/#/c/95662/
[02:29:42] http://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/a8723d447344c57a4f40b52eae076c683201a11a/wmf-config%2FInitialiseSettings.php#L10138
[02:30:21] yurik, Reedy is probably a good person to help if you can wait til he's around
[02:31:54] Eloquence, no rush, just need to figure out whom to bug at the end of the day :)
[02:32:10] (and not the literal end of the day)
[02:33:33] yurik-road: generally devs (or yeah, reedy) can deploy new tables for extensions themselves. i want to know about it if schema alterations are happening so they can be done online or have downtime scheduled
[02:34:11] well, that's my point - i didn't want to deploy it without knowing the process first :)
[02:34:19] if data updates occur, the jobs need to be batched, but also ok for reedy-esque devs to run :)
[02:34:43] ppl might get unhappy if meta goes down
[02:35:11] heh, then yeah, ask reedy or someone who's deployed flaggedrevs before. not i, so far (i'd wait for reedy too)
[02:35:49] or aaron
[02:37:13] <^d> Yes, those instructions are pretty much accurate.
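springle's point at 02:34:19 is that bulk data updates should run as batched jobs, not as one giant transaction. The sketch below shows that pattern using Python's built-in sqlite3 module and a hypothetical `revision_demo` table. This is not the FlaggedRevs schema or any MediaWiki maintenance script. The function walks the primary key in fixed-size ranges and commits after each range.

```python
import sqlite3


def make_demo_db(n_rows: int) -> sqlite3.Connection:
    # Hypothetical table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision_demo (id INTEGER PRIMARY KEY, flagged INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO revision_demo (id) VALUES (?)", [(i,) for i in range(1, n_rows + 1)])
    conn.commit()
    return conn


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Set flagged = 1 on every row, one primary-key range per transaction.
    Returns the number of batches committed."""
    batches = 0
    last_id = 0
    while True:
        # The upper bound of the next batch is the batch_size-th id after last_id.
        ids = conn.execute(
            "SELECT id FROM revision_demo WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not ids:
            return batches
        upper = ids[-1][0]
        conn.execute(
            "UPDATE revision_demo SET flagged = 1 WHERE id > ? AND id <= ?",
            (last_id, upper),
        )
        conn.commit()  # commit each batch so no single transaction grows large
        batches += 1
        last_id = upper
        # A production script would pause here until replicas catch up.


conn = make_demo_db(2500)
print(backfill_in_batches(conn, batch_size=1000))  # 3
```

On a replicated MySQL cluster, the wait between batches is what keeps replicas from lagging. Here it appears only as a comment, since sqlite has no replicas.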
[02:37:20] <^d> Obviously s/1.18/something modern/
[02:37:37] !log LocalisationUpdate completed (1.23wmf5) at Tue Dec 3 02:37:37 UTC 2013
[02:37:53] Logged the message, Master
[02:37:56] thx springle
[02:41:31] (PS1) Springle: depool es1004 for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98749
[02:42:23] (CR) Springle: [C: 2] depool es1004 for upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98749 (owner: Springle)
[02:43:26] !log springle synchronized wmf-config/db-eqiad.php 'depool es1004 for upgrade'
[02:43:40] Logged the message, Master
[03:07:06] PROBLEM - check_job_queue on terbium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:07:56] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000
[03:19:58] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 3 03:19:58 UTC 2013
[03:20:06] TimStarling: i want to depool es1004. it is as simple as commenting out the IP in $wmgOldExtTemplate ? safe, etc
[03:20:13] Logged the message, Master
[03:20:25] obviously it's a bit late now since you just did it, but it's reasonably safe, yes
[03:21:17] job queue runners will keep going with the same configuration for a few minutes or so after the configuration change
[03:22:45] which matters a little bit more with external storage than with core DBs, since external storage is slightly more likely to give soft (non-exception) failures
[03:23:33] TimStarling, hi, do you know by any chance why i don't see zero extension in https://graphite.wikimedia.org/dashboard/ ?
[03:24:03] or is there a better place to look at profiling data
[03:24:36] what profiling sections are there?
[03:25:13] TimStarling, there are hundreds
[03:25:40] I see some at http://noc.wikimedia.org/cgi-bin/report.py?db=all&sort=real&limit=5000
[03:25:46] maybe the backslashes confuse graphite
[03:26:00] oh, TimStarling very sorry, i found it under ExtZero...
[03:26:13] not sure why it was defined like that
[03:26:30] and thx for the link
[03:38:47] TimStarling: thanks. i spent a while digging through code to come to that conclusion. es1004 is depooled and waiting for wikiadmin connections to die
[03:39:05] taking longer than a few mins though. sleepers
[03:39:51] are you taking the server down, or doing some other kind of maintenance on it?
[03:40:10] package upgrade including mariadb I hope
[03:41:01] hmm, ExternalStoreDB.php needs some updates for PHP 5
[03:41:15] I guess this code doesn't get looked at very often
[03:41:16] er, LTS upgrade first actually
[03:42:51] so, if you just shut it down, the client will throw an exception
[03:43:14] which is fine, the job will be reattempted
[03:43:34] ah good
[03:45:52] !log stopping es1004 mysqld for precise & mariadb upgrade, plus reboot
[03:46:07] Logged the message, Master
[03:49:28] PROBLEM - mysqld processes on es1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[03:49:31] the S[1-7] wikiadmin connections are usually faster to die off after a depool. ES wikiadmin jobs have more persistent connections
[03:49:39] oh, icinga, forgot about you
[04:02:58] TimStarling: externalLoads db-eqiad.php:315 es1005; are ES cluster masters identified by mediawiki solely by position in that array, just as for sectionLoads ?
[04:04:10] yes
[04:04:23] thanks
[04:12:13] (PS1) Springle: switch es1004 to mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/98756
[04:14:05] TimStarling, maybe you know - (i will ask aaron tmrw otherwise) - have you set up flaggedrevs ext ? does it require complex db schema changes?
[04:14:23] i need to set it up on meta
[04:19:14] just extra tables
[04:20:41] you can use sql.php
[04:21:50] yurik: Did you ask Meta-Wiki if it wanted FlaggedRevs?
[04:22:28] Elsie, no, but i am only enabling it on zero config namespace - hence it won't change anything for anyone
[04:22:39] I'm not sure FlaggedRevs is still supported.
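At 04:04:10 TimStarling confirms that MediaWiki identifies an external-store master purely by its position in the load array, as it does for sectionLoads. The toy Python model below illustrates that convention under loudly stated assumptions. The load array is modelled as an insertion-ordered dict mapping host to read weight, the hostnames are illustrative, and the weights are invented. None of these functions exist in MediaWiki. The model shows why depooling a replica only means removing its entry, while removing the first entry would quietly promote another host.

```python
import random


def master_of(loads: dict) -> str:
    # Position decides: the first key of the ordered mapping is the master.
    return next(iter(loads))


def depool(loads: dict, host: str) -> dict:
    # Depooling a replica just drops its entry (returns a new dict, order kept).
    # Dropping the first entry would silently make the next host the master,
    # so this toy model refuses to do that.
    if host == master_of(loads):
        raise ValueError(f"{host} is the master; removing it would promote another host")
    return {h: w for h, w in loads.items() if h != host}


def pick_reader(loads: dict, rng: random.Random) -> str:
    # Weighted random choice for reads; a weight of 0 keeps a host out of rotation.
    hosts = list(loads)
    return rng.choices(hosts, weights=[loads[h] for h in hosts], k=1)[0]


# Hypothetical cluster: illustrative hostnames, invented weights.
loads = {"es1005": 0, "es1004": 100, "es1006": 100}
after = depool(loads, "es1004")
print(master_of(after), list(after))  # es1005 ['es1005', 'es1006']
```

As TimStarling notes at 03:21:17, a real cluster also has job runners that keep the old configuration for a few minutes. This static model does not capture that.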
[04:22:53] ??? i thought half of the world's wikis run on it
[04:23:04] at least the bigger ones i think
[04:23:06] de, ru
[04:23:09] (PS2) Springle: switch es1004 to mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/98756
[04:23:12] Do the Germans still use it?
[04:23:17] i think so
[04:23:21] Yes.
[04:23:23] and pt
[04:23:23] (CR) Springle: [C: 2] switch es1004 to mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/98756 (owner: Springle)
[04:23:26] en.wiki tried it a few times.
[04:23:40] I don't think anyone ever liked it very much.
[04:23:42] en uses it under the name 'Pending changes'
[04:24:08] It was slow, very heavy on config flags, and created backlogs.
[04:24:30] Who's maintaining it these days?
[04:25:46] https://noc.wikimedia.org/conf/flaggedrevs.dblist
[04:26:53] That's about 50 out of 800 wikis (12.5%?).
[04:52:22] PROBLEM - MySQL Replication Heartbeat on db1046 is CRITICAL: CRIT replication delay 301 seconds
[04:52:42] PROBLEM - MySQL Slave Delay on db1046 is CRITICAL: CRIT replication delay 303 seconds
[05:11:13] ori-l, found something interesting ...
[05:12:15] <^d> 2200 pages/s? Much more reasonable indexing speed.
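Here is a quick check of the share Elsie estimated at 04:26:53, taking the rounded figures of about 50 FlaggedRevs wikis out of about 800 at face value. The ratio comes out to 6.25%. A share of 12.5% would instead correspond to about 100 wikis.

```latex
\frac{50}{800} = 0.0625 = 6.25\,\%, \qquad 0.125 \times 800 = 100
```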
[05:12:24] <^d> Jobqueue was totally the right way to go here :)
[05:13:39] PROBLEM - Puppet freshness on cp3012 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:10:45 PM UTC
[05:17:39] PROBLEM - Puppet freshness on cp3011 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:15:19 PM UTC
[05:21:31] ok, i think i found a techy troll of some sort - they must have a script of some sort to constantly ping the same page (in user namespace), and that page has so many links, it crashes zero
[05:30:12] (PS1) Springle: repool es1004 after upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98758
[05:30:42] (CR) Springle: [C: 2] repool es1004 after upgrade [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98758 (owner: Springle)
[05:32:07] !log springle synchronized wmf-config/db-eqiad.php 'repool es1004 after upgrade, max_connections lowered during warm up'
[05:32:21] Logged the message, Master
[05:41:58] ACKNOWLEDGEMENT - RAID on db47 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle Imminent decommission.
[05:46:04] classy!
[05:56:35] (CR) Dzahn: "since this is going to be on a misc host and SSL has been mentioned, you'll need :443 in the Apache config, and the cert/key setup in pupp" [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[06:00:57] (CR) Ori.livneh: "@Dzahn: Paravoid recommended putting this behind misc-varnish in eqiad, which would get it free SSL termination, too" [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[06:05:04] (CR) Dzahn: "ok, gotcha. well then.. i was just about to comment that of the other roles on this same node, planet has it's own wildcard cert and insta" [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis)
[06:06:32] ori-l: looks like etherpad relies on star.wm just being there
[06:07:04] at some point we wanted to get rid of it altogether and have more separate certs.. i dunno now
[06:07:22] if it's all behind misc varnish ..ok
[06:07:44] ..away
[06:07:45] i don't know much about the benefits and tradeoffs of granular certificates
[06:08:25] first thing, it uses a cert, but it's not puppetized that it should be there
[06:08:30] but having a few closely-monitored entry points is nice; makes the attack surface smaller
[06:08:36] and the other service doesnt care, because it has its own
[06:11:59] and since we got from different vendors we also need to keep this updated or we create broken chains https://gerrit.wikimedia.org/r/#/c/97633/1/manifests/certs.pp
[06:12:07] cya latr.. late.. away for real
[06:14:23] mutante: good night
[06:28:10] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[06:29:10] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active
[06:53:31] hello
[06:57:33] morning, paravoid
[06:59:10] heh, of course ori-l is still here :-)
[06:59:44] i was off for a while, made salad with my son and put him to bed
[07:10:55] (PS1) Faidon Liambotis: protoproxy: set proxy_read_timeout to 180s [operations/puppet] - https://gerrit.wikimedia.org/r/98761
[07:11:26] sad
[07:15:18] (CR) Faidon Liambotis: [C: 2] protoproxy: set proxy_read_timeout to 180s [operations/puppet] - https://gerrit.wikimedia.org/r/98761 (owner: Faidon Liambotis)
[07:15:19] (PS1) Faidon Liambotis: Kill stray nginx configs for thumbs & old mobile [operations/puppet] - https://gerrit.wikimedia.org/r/98762
[07:16:30] (CR) Faidon Liambotis: [C: 2] Kill stray nginx configs for thumbs & old mobile [operations/puppet] - https://gerrit.wikimedia.org/r/98762 (owner: Faidon Liambotis)
[08:14:15] PROBLEM - Puppet freshness on cp3012 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:10:45 PM UTC
[08:16:15] !log nikerabbit synchronized php-1.23wmf5/extensions/Translate/
[08:16:32] Logged the message, Master
[08:18:15] PROBLEM - Puppet freshness on cp3011 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:15:19 PM UTC
[08:27:11] !log nikerabbit synchronized php-1.23wmf4/extensions/Translate/
[08:27:26] Logged the message, Master
[09:12:50] hello
[09:16:05] (PS4) Spage: Enable Flow discussions on a few test wiki pages [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94106
[09:21:11] (CR) Spage: "Comments addressed in PS4, which is rebased with labs improvements." (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94106 (owner: Spage)
[09:21:18] (CR) Ryan Lane: "The only thing that could possibly be in the iptables rules on virt0 is the opendj rules that are added for 389/636. Everything except the" [operations/puppet] - https://gerrit.wikimedia.org/r/98307 (owner: Faidon Liambotis)
[09:23:52] who is the magic guy for vagrant again?
[09:27:32] p858snake|l: ori-l
[09:28:01] he is SF based so 'hopefully' sleeping
[09:35:25] (PS1) Hashar: deployment: integration/slave-scripts has submodules [operations/puppet] - https://gerrit.wikimedia.org/r/98778
[09:43:22] (CR) ArielGlenn: [C: 2] beta: let jenkins-deploy restart Parsoid [operations/puppet] - https://gerrit.wikimedia.org/r/98685 (owner: Hashar)
[09:52:20] (CR) Hashar: "sudo command validated on the beta cluster. Thanks!" [operations/puppet] - https://gerrit.wikimedia.org/r/98685 (owner: Hashar)
[10:04:37] RECOVERY - MySQL Slave Delay on db1046 is OK: OK replication delay 0 seconds
[10:05:08] RECOVERY - MySQL Replication Heartbeat on db1046 is OK: OK replication delay -0 seconds
[10:08:25] (CR) Ryan Lane: [C: 2] deployment: integration/slave-scripts has submodules [operations/puppet] - https://gerrit.wikimedia.org/r/98778 (owner: Hashar)
[10:18:59] back
[10:19:08] Ryan_Lane: do you still have merge rights on the puppetmaster ?
[10:19:31] already merged it there
[10:19:35] \O/
[10:19:43] ehm, s1 is lagging
[10:20:04] MaxSem: db1046 had some lag apparently, might be related ?
[10:20:16] hashar, all slaves ATM
[10:23:39] I wonder if the cause was me with https://en.wikipedia.org/w/index.php?title=User:Rich_Farmbrough/WP_v0.8_full_index&diff=584344722&oldid=491316699 :P
[10:24:09] all normal now
[10:30:15] apache@deployment-jobrunner08:~$ bash /usr/local/bin/jobs-loop.sh
[10:30:16] Creating default and immediate job runner pipelines
[10:30:16] Warning: Invalid
[10:30:19] how helpful is that
[10:30:27] that is equivalent to "ERROR3
[10:42:38] (PS1) Hashar: beta: define $wmgParserCacheDBs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98793
[10:43:34] (CR) Hashar: [C: 2] beta: define $wmgParserCacheDBs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98793 (owner: Hashar)
[10:43:42] (Merged) jenkins-bot: beta: define $wmgParserCacheDBs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98793 (owner: Hashar)
[11:15:05] PROBLEM - Puppet freshness on cp3012 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:10:45 PM UTC
[11:19:05] PROBLEM - Puppet freshness on cp3011 is CRITICAL: Last successful Puppet run was Mon 02 Dec 2013 05:15:19 PM UTC
[11:21:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[11:33:33] sigh
[11:33:41] mobile esams is just cursed
[11:36:20] PROBLEM - Varnish HTTP mobile-backend on cp3012 is CRITICAL: Connection refused
[11:37:02] !log stopping varnish backend on cp3012, crashing every 30s
[11:37:19] Logged the message, Master
[11:37:51] bblack: ping?
[11:38:55] should be a bit early for him?
[11:39:05] yes, but you never know
[11:49:02] (CR) Akosiaris: [C: 2] reset repo by merging in tag 'v0.7.1' from upstream [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/95422 (owner: Hashar)
[11:51:25] (CR) Akosiaris: [C: 2] bump wmf package to 0.7.1-6-gf618f4d [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/95424 (owner: Hashar)
[11:51:49] akosiaris: that one is a bit messy not sure whether it make sense to you
[11:52:07] akosiaris: I have crafted the version in the change log using git-describe
[11:53:46] hashar: yes I noticed
[11:54:00] normally i'd object to -githash
[11:54:19] but the version is good enough to warrant an easy upgrade to v0.7.2
[11:54:26] when it is out
[11:55:30] that was my idea
[11:55:50] hashar: in general you want to avoid git hashes in version
[11:56:10] because the next one might be alphabetically lower than the previous one hence no upgrade
[11:57:08] but git describe list the number of commits since last tag
[11:57:17] so 0.7.1-6-xxx will become 0.7.1-7-yyyy
[11:57:36] I am not sure how to have git build package generate that
[11:57:39] which is why I did not object :-)
[11:57:45] \O/
[12:01:12] today I learned, init scripts need to be run as root ..
[12:01:36] :-)
[12:04:24] (PS1) Hashar: beta: parsoid init script needs root [operations/puppet] - https://gerrit.wikimedia.org/r/98798
[12:10:52] hmm
[12:14:31] bah now it works grmblblbl
[12:16:51] (Abandoned) Hashar: beta: parsoid init script needs root [operations/puppet] - https://gerrit.wikimedia.org/r/98798 (owner: Hashar)
[12:33:58] i hate it
[12:48:14] dare I ask?
[12:48:57] trying to figure out why stderr is stripped out when doing " ssh sudo ..."
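The exchange between 11:55:50 and 11:57:17 turns on how Debian orders version strings. The Python below sketches dpkg's documented comparison rules. It compares the epoch first, then the upstream version, then the Debian revision. Within each part it alternates between non-digit runs (`~` sorts before everything, even end-of-string, and letters sort before other symbols) and numeric runs. This is a re-implementation for illustration, not dpkg's own code. `dpkg --compare-versions` remains the authoritative check. The hash-only version strings in the example are hypothetical.

```python
def _is_digit(c: str) -> bool:
    return len(c) == 1 and "0" <= c <= "9"


def _order(c: str) -> int:
    # Weight of one character in a non-digit run ("" means end of string).
    if c == "" or _is_digit(c):
        return 0
    if "a" <= c <= "z" or "A" <= c <= "Z":
        return ord(c)          # letters sort before other symbols
    if c == "~":
        return -1              # "~" sorts before everything, even the end
    return ord(c) + 256


def _verrevcmp(a: str, b: str) -> int:
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Non-digit run, character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Digit run, numerically: skip leading zeros, longer run wins,
        #    otherwise the first differing digit decides.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and _is_digit(a[i]) and j < len(b) and _is_digit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and _is_digit(a[i]):
            return 1
        if j < len(b) and _is_digit(b[j]):
            return -1
        if first_diff:
            return first_diff
    return 0


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b."""
    def split(v: str):
        epoch, sep, rest = v.partition(":")
        if not sep:
            epoch, rest = "0", v
        upstream, sep, revision = rest.rpartition("-")  # revision follows the LAST "-"
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    ea, ua, ra = split(a)
    eb, ub, rb = split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


# git-describe style: the commit count orders the versions.
print(compare_versions("0.7.1-6-gf618f4d", "0.7.1-7-g0123abc") < 0)  # True
# Hash only: the newer build can sort LOWER, so no upgrade happens.
print(compare_versions("0.7.1+gf618f4d", "0.7.1+g0123abc") > 0)      # True
```

This bears out both points made in the channel. A bare hash suffix gives no ordering guarantee. With the `git describe` scheme the commit count does the ordering, and 0.7.1-6-gf618f4d still sorts below an eventual 0.7.2-1.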
[12:49:04] !log uploaded new jenkins-debian-glue on apt.wikimedia.org, version 0.7.1-6-gf618f4d
[12:49:17] Logged the message, Master
[12:51:02] PROBLEM - Host mw72 is DOWN: PING CRITICAL - Packet loss = 100%
[12:52:22] RECOVERY - Host mw72 is UP: PING OK - Packet loss = 0%, RTA = 35.39 ms
[12:59:46] anyone happen to reboot mw72 a couple minutes ago?
[13:07:02] PROBLEM - NTP on mw72 is CRITICAL: NTP CRITICAL: Offset unknown
[13:11:02] RECOVERY - NTP on mw72 is OK: NTP OK: Offset 0.0008229017258 secs
[13:34:03] RECOVERY - Varnish HTTP mobile-backend on cp3012 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.193 second response time
[13:34:22] RECOVERY - Puppet freshness on cp3012 is OK: puppet ran at Tue Dec 3 13:34:16 UTC 2013
[13:34:22] RECOVERY - Puppet freshness on cp3011 is OK: puppet ran at Tue Dec 3 13:34:21 UTC 2013
[13:39:02] PROBLEM - Varnish HTTP mobile-backend on cp3012 is CRITICAL: Connection refused
[13:42:03] RECOVERY - Varnish HTTP mobile-backend on cp3012 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.194 second response time
[13:58:20] (Restored) Hashar: beta: parsoid init script needs root [operations/puppet] - https://gerrit.wikimedia.org/r/98798 (owner: Hashar)
[13:58:35] apergos: mind merging in change above ? Some sudo policy for beta was wrong
[13:58:51] the part would let jenkins-deploy use the init script as root instead of 'parsoid' user
[13:59:21] i hate the parsoid shell runner … sudo -E -u parsoid nohup node /var/lib/parsoid/Parsoid/js/api/server.js > /dev/null 2>&1 &
[13:59:22] :(
[14:00:43] wunnerful
[14:00:57] yeah I should have thought of that since it's an init script
[14:01:01] anyways
[14:01:18] I could not catch the std err output because the init script just run a shell that goes to background (and fails later on)
[14:01:28] the parsoid team has requested help with writing an init script a number of times
[14:01:36] requested from ops, that is
[14:01:39] (CR) ArielGlenn: [C: 2] beta: parsoid init script needs root [operations/puppet] - https://gerrit.wikimedia.org/r/98798 (owner: Hashar)
[14:01:42] and we haven't responded to that request
[14:01:42] !log depooling cp3012.esams temporarily to fix various ugly issues
[14:01:59] Logged the message, Master
[14:02:19] apergos: feel free if you have any spare cycles ;)
[14:02:39] not yet but I'll add it to the todos
[14:04:33] PROBLEM - Varnish HTTP mobile-frontend on cp3012 is CRITICAL: Connection refused
[14:05:02] PROBLEM - Varnish HTTP mobile-backend on cp3012 is CRITICAL: Connection refused
[14:06:20] close to 1,5 hour without power....
[14:06:38] at some point the laptop's battery is going to die....
[14:06:47] !log rebooting cp3012.esams
[14:07:02] Logged the message, Master
[14:08:40] PROBLEM - Host cp3012 is DOWN: PING CRITICAL - Packet loss = 100%
[14:12:15] apergos: thank you :)
[14:12:21] working?
[14:12:26] (PS2) Hashar: beta: autoupdate should restart parsoid [operations/puppet] - https://gerrit.wikimedia.org/r/98007
[14:12:41] (PS3) Hashar: beta: autoupdate should restart parsoid [operations/puppet] - https://gerrit.wikimedia.org/r/98007
[14:13:22] apergos: yeah that is working :-]
[14:13:34] the way we start parsoid is definitely not nice
[14:13:50] RECOVERY - Host cp3012 is UP: PING OK - Packet loss = 0%, RTA = 96.20 ms
[14:14:01] I update the python script that updates beta : https://gerrit.wikimedia.org/r/#/c/98007/ tested it and it is working fine
[14:14:10] RECOVERY - Varnish HTTP mobile-backend on cp3012 is OK: HTTP OK: HTTP/1.1 200 OK - 188 bytes in 0.193 second response time
[14:14:13] there was a patch around actually for an upstart job
[14:14:24] that would be nie
[14:14:25] nice
[14:14:30] RECOVERY - Varnish HTTP mobile-frontend on cp3012 is OK: HTTP OK: HTTP/1.1 200 OK - 262 bytes in 0.193 second response time
[14:14:51] I need to talk to them about the onlyinclude bug though and make sure it's on someone's radar.... tonight, forgot yesterday
[14:15:45] bah my code is wrong
[14:15:50] !log cp3012 re-pooled
[14:16:06] Logged the message, Master
[14:25:40] (PS4) Hashar: beta: autoupdate should restart parsoid [operations/puppet] - https://gerrit.wikimedia.org/r/98007
[14:27:04] apergos: could use https://gerrit.wikimedia.org/r/#/c/98007/ now :-D
[14:27:06] final call!
[14:27:33] you call as often as you need, that's (part of) why I'm here :-)
[14:27:52] :D
[14:28:05] but yeah upstart for parsoid would be nice
[14:28:13] yep
[14:28:17] or anything more robust than the lame /usr/bin/parsoid shell script wrapper
[14:28:24] with log rotation. seriously.
[14:28:42] well the workers should send their log over syslog
[14:28:49] so we can aggregate them in logstash or whatever
[14:29:19] they can produce a lot of crap when things are broken
[14:29:20] iirc you told me it got reported to them
[14:29:53] yep, but who knows what is on the top of their pile
[14:30:22] no clue
[14:30:43] i have some interactions with parsoid team, VE itself is more or less a blackbox
[14:30:59] (I think that is intentional, with James_F protecting the devs from outside world interruptions)
[14:31:18] sure
[14:34:03] apergos: meanwhile I could use https://gerrit.wikimedia.org/r/#/c/98007/ :D
[14:39:47] so if the status command gives you nonzero you just log it and move on?
[14:40:40] you know you just relinked the same patchset twice right?
[14:40:42] hashar:
[14:40:47] arghgh
[14:40:58] so yeah errors are not blocking, just logged
[14:41:10] just wondering if 5 seconds is long enough
[14:41:27] seems like
[14:41:36] the status script invoke a shell script that basically does ( … ) &
[14:41:39] well you can always adjust later
[14:41:40] so it returns asap
[14:41:54] and start-stop-daemon always returns 0 cause ( .. ) & is always a success
[14:42:18] yep
[14:42:31] all righty
[14:42:47] (CR) ArielGlenn: [C: 2] beta: autoupdate should restart parsoid [operations/puppet] - https://gerrit.wikimedia.org/r/98007 (owner: Hashar)
[14:50:41] 00:00:00.561 fatal: Could not change back to '/home/jenkins-deploy/workspace/beta-code-update': Permission denied
[14:50:43] oh yeah
[14:50:46] I need a new career
[14:51:43] should swipe streets with a broom maybe
[14:52:36] you'd get bored of that pretty quick
[14:52:44] also too much rain and cold for that
[14:56:33] on the other hand I am getting fed up with software bugs
[14:56:50] git log issuing a fatal because it attempts to chdir :/
[15:00:03] it just needs to be able to read, is the directory readable only by jenkins-deploy?
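POSIX `sh` reproduces two of the problems hashar describes. At 14:41:36 to 14:41:54 he explains that a shell doing `( … ) &` always reports success. At 14:01:18 he could not catch stderr from the backgrounded runner. The exit status of an asynchronous list is always zero, and `> /dev/null 2>&1` discards whatever the background job prints. The Python sketch below demonstrates both by shelling out to `sh -c`. It assumes a POSIX `sh` on the PATH and does not model start-stop-daemon or the actual Parsoid wrapper.

```python
import subprocess


def sh_status(script: str) -> int:
    """Run `script` under `sh -c` and return the shell's own exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


# The subshell fails with status 3, but "&" backgrounds it, and POSIX defines
# the exit status of an asynchronous list as 0: the caller sees success.
backgrounded = sh_status("( exit 3 ) &")

# Waiting on the background job's PID ($!) recovers its real exit status.
waited = sh_status("( exit 3 ) & wait $!")

# With "> /dev/null 2>&1" the job's error text never reaches the caller either.
quiet = subprocess.run(
    ["sh", "-c", "( echo 'boom' >&2; exit 3 ) > /dev/null 2>&1 &"],
    capture_output=True,
    text=True,
)

print(backgrounded, waited)                  # 0 3
print(quiet.returncode, repr(quiet.stderr))  # 0 ''
```

`wait $!` surfaces the real status only by blocking until the job exits. That rules it out for a long-running server like the one the Parsoid wrapper starts.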
[15:06:22] (PS1) Faidon Liambotis: protoproxy: add proxy_read_timeout to IPv6 too [operations/puppet] - https://gerrit.wikimedia.org/r/98824
[15:07:02] who handles mingle?
[15:07:53] apergos: yup
[15:07:53] (CR) Faidon Liambotis: [C: 2] protoproxy: add proxy_read_timeout to IPv6 too [operations/puppet] - https://gerrit.wikimedia.org/r/98824 (owner: Faidon Liambotis)
[15:08:08] hashar: is it my idea or is jenkins getting progressively slower?
[15:08:40] apergos: I use git --git-dir, git does a chdir() there then attempt to come back to the previous path with another chdir(). I will cd instead
[15:09:11] paravoid: I can look at the operations-puppet* jobs metrics in statsd
[15:09:37] paravoid: jenkins API is a bit slow for sure :(
[15:13:20] drwx------ 6 jenkins-deploy wikidev 4096 Dec 3 14:21 /home/jenkins-deploy//
[15:13:21] \O/
[15:15:15] !log depool/restart/pool cycle for nginxes
[15:15:28] Logged the message, Master
[15:20:29] aude: define "handles". Server hosting is managed by WMF Office IT technical team
[15:21:23] Erik is aware of the issues btw https://bugzilla.wikimedia.org/show_bug.cgi?id=57829
[15:23:28] andre__: it used to be that i could see stuff, now it asks for login
[15:23:47] it was nice to be aware what the teams were working on
[15:23:58] even if wikidata team is not using it (although we have a section on mingle)
[15:24:38] erik is aware so, okay :)
[15:25:35] Ryan_Lane: hey, what was it with nginx that didn't work with reload?
[15:25:44] aude, meh, I can reproduce the problem without being logged in, yeah
[15:26:09] the guest login works but then some stuff is restricted and not sure it was before
[15:27:20] * aude will struggle to see our scrum board when working remote and wouldn't mind such a tool :)
[15:29:59] scrum ?? :D
[15:30:02] on bugzilla?
[15:30:20] :)
[15:30:54] we do put stuff in bugzilla but it's not necessarily suited to give me an overview of our scrum items
[15:31:50] I found out some software that comes on top of bugzilla to let you use it as a kanban wall
[15:32:49] oooh
[15:34:39] ping: akosiaris
[15:34:50] Steinsplitter: pong
[15:35:39] akosiaris: is there a problem with the job queue? like backlog? on meta fuzzybot is broken and 550% of the Grants: pages broken now
[15:37:39] Steinsplitter: nothing obvious
[15:37:55] very strange... the extension is broken, probably
[15:39:57] Nikerabbit ^^^
[15:43:46] MaxSem: wassup?
[15:44:12] ^^^ fuzzybot on meta ^^^
[15:44:17] MaxSem: what about it?
[15:44:25] Nikerabbit: fuzzybot is broken on meta, the bot does not fix the source
[15:44:37] https://meta.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AWMF+grants+program+documentation&namespace= for example
[15:44:58] but Grants:Index/Submit request is marked for translation... ;/
[15:45:34] I see source updates on https://meta.wikimedia.org/wiki/Special:Contributions/FuzzyBot
[15:45:51] but not all sources are updated
[15:46:00] see my link
[15:46:14] the bot updates only 80%
[15:46:49] 20% missing
[15:47:10] and i cannot edit the pages by hand to fix the template etc.
[15:47:48] https://meta.wikimedia.org/wiki/Special:WhatLinksHere/Template:Project_and_Event_Grants_program_documentation the exactly same problem
[15:47:57] only 40% "replaces"
[15:48:33] Nikerabbit: look here, I added a space https://meta.wikimedia.org/w/index.php?title=Grants%3AIndex%2FSubmit_request%2Ffr&diff=6595908&oldid=5464537
[15:49:02] Steinsplitter: the edits are not lost, just delayed
[15:49:33] it seems the good old https://bugzilla.wikimedia.org/show_bug.cgi?id=46716
[15:50:46] :/
[15:51:00] now i need to made for every page a dummy edit? o_O
[15:51:26] hashar: Not really trying for VE team to be a "blackbox". :-)
[15:52:26] (CR) Akosiaris: [C: -1] "There is one blocking thing (the weekday missing which would create problems. The two other comments are nitpicks but they would be nice. " (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/94075 (owner: Dzahn)
[15:53:42] hashar: BTW, you seen https://bugzilla.wikimedia.org/show_bug.cgi?id=57926 ?
[15:54:01] hashar: Looks like the new beta labs Parsoid is returning the wrong things…
[15:54:19] Steinsplitter: no, just ignore those pages, they'll be updated next time a translator touches them or when someone runs the maintenance script to rebuild translation pages
[15:55:21] for a few minutes there's been some 800-1000 ms replag, allegedly the cause https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=mysql_slave_lag&mreg[]=^mysql_slave_lag%24&hreg[]=db10%2807|24|28|41%29&aggregate=1
[15:55:34] omg, +50 affected. :/ it is possible to run this "maintenance script" ?
[15:55:36] ah, ok
[15:56:06] no, last time it was run it was because of thousands affected pages; the problem is currently rather minor, not worth it
[15:56:16] k
[15:58:34] Steinsplitter: in few weeks I should have environment to debug this issue and fix it... at that point we might run the script again
[15:59:14] Nikerabbit: and what now with the 50~ broken WMF Grants pages?
[15:59:41] Steinsplitter: I'm sure Nemo_bis has some practical suggestions
[15:59:50] Sure! Be zen
[16:00:02] Special:Random, always enough work on Meta
[16:00:26] no need to focus on stuff that it will fix by itself and doesn't need you
[16:00:49] i get killed this evening :P
[16:02:41] (CR) Akosiaris: [C: 2] Adding varnishkafka::monitoring class to send stats to Ganglia. [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/98601 (owner: Ottomata)
[16:04:33] Nemo_bis: do you have practical suggestions?
:-) [15:52:26] (CR) Akosiaris: [C: -1] "There is one blocking thing (the weekday missing which would create problems. The two other comments are nitpicks but they would be nice. " (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/94075 (owner: Dzahn) [15:53:42] hashar: BTW, you seen https://bugzilla.wikimedia.org/show_bug.cgi?id=57926 ? [15:54:01] hashar: Looks like the new beta labs Parsoid is returning the wrong things… [15:54:19] Steinsplitter: no, just ignore those pages, they'll be updated next time a translator touches them or when someone runs the maintenance script to rebuild translation pages [15:55:21] for a few minutes there's been some 800-1000 ms replag, allegedly the cause https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=mysql_slave_lag&mreg[]=^mysql_slave_lag%24&hreg[]=db10%2807|24|28|41%29&aggregate=1 [15:55:34] omg, +50 affected. :/ it is possible to run this "maintenance script" ? [15:55:36] ah, ok [15:56:06] no, last time it was run it was because of thousands affected pages; the problem is currently rather minor, not worth it [15:56:16] k [15:58:34] Steinsplitter: in few weeks I should have environment to debug this issue and fix it... at that point we might run the script again [15:59:14] Nikerabbit: and what now with the 50~ broken WMF Grants pages? [15:59:41] Steinsplitter: I'm sure Nemo_bis has some practical suggestions [15:59:50] Sure! Be zen [16:00:02] Special:Random, always enough work on Meta [16:00:26] no need to focus on stuff that it will fix by itself and doesn't need you [16:00:49] i get killed this evening :P [16:02:41] (CR) Akosiaris: [C: 2] Adding varnishkafka::monitoring class to send stats to Ganglia. [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/98601 (owner: Ottomata) [16:04:33] Nemo_bis: do you have practical suggestions?
[16:04:52] {{tnt|template}} dos not work when redirecting Lua error: expandTemplate: template "Template:Project and Event Grants program documentation/en" does not exist. [16:07:03] James_F: yeah hmm :( [16:07:14] James_F: might be broken so, wondering whether there is a cache in parsoid [16:07:36] ah there is a varnish cache in front [16:08:03] Yes. [16:08:25] * greg-g raises his cup of coffee to the room [16:08:58] <^d> Mornin' greg-g [16:09:04] * hashar discovers we now have a Beta in the preferences [16:09:14] greg-g: wanna do our call right now ? [16:09:42] zeljkof: around ? [16:10:03] (PS5) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 [16:10:28] Steinsplitter: ah well, that's nasty; in those cases you have to make a dummy edit on the source page and remark [16:10:36] hashar: gimme 10? I'm just booting myself up, coffee mug is still more than 3/4 full ;) [16:10:42] greg-g: sure [16:10:44] <^d> BZ is soooooo slow :\ [16:10:46] hashar: I am [16:10:48] trying to debug some parsoid issue on labs :D [16:10:57] <^d> greg-g: You should swap for an SSD. Much faster boot times. [16:11:01] zeljkof: so Parsoid on beta is broken somehow https://bugzilla.wikimedia.org/show_bug.cgi?id=57926 :( [16:11:32] hashar: that was my guess too :) [16:11:49] Nemo_bis: omg, 50 dummy edits haha xD okay [16:11:59] zeljkof: is that a recent breakage? [16:12:22] Steinsplitter: have you marked 50 pages?
[16:12:29] hashar: I have noticed it a few hours ago [16:12:33] Nemo_bis: yes [16:12:34] or rather, 50 templates [16:12:39] (CR) BryanDavis: [WIP] Add configuration for Wikimania Scholarships (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis) [16:12:50] ^d: I haven't booted up sufficiently to make a good joke in reply [16:12:52] pages , pages :P [16:13:22] Steinsplitter: if those pages are not transcluded elsewhere via TNT, don't bother [16:13:23] zeljkof: I have updated parsoid code base earlier :/ [16:13:36] hashar: so you are to blame :) [16:13:44] <^d> greg-g: It's ok. You just owe me a joke now :) [16:14:15] Nemo_bis: jeah, all transcluded via TNT.... well... i do it tomorrow. atm no time. [16:14:23] yep [16:14:50] (CR) BryanDavis: [WIP] Add configuration for Wikimania Scholarships (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis) [16:19:46] hashar: alright, ready when you are [16:19:56] greg-g: moving out to another room [16:22:27] morebots, you ok? [16:22:27] I am a logbot running on tools-exec-04. [16:22:27] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [16:22:27] To log a message, type !log . [16:33:01] oh no, hashar's gone? [16:33:46] (yes, he is, you don't need to answer the obvious question) [16:40:45] (PS1) MaxSem: Fix removal of title coordinates from extracts [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98848 [16:41:11] hey greg-g, when can I deploy ^^^, LD? [16:46:08] (PS6) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 [16:49:54] (CR) Akosiaris: [C: -1] "I had the epiphany that this will block all ssh access to bastion hosts.
We first need to create a ferm::service resource that allows ssh " [operations/puppet] - https://gerrit.wikimedia.org/r/96424 (owner: Dzahn) [16:51:19] (PS1) BryanDavis: Add scholarships.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/98849 [16:51:40] (CR) BryanDavis: [C: -1] "Needs Ie568f268b1" [operations/dns] - https://gerrit.wikimedia.org/r/98849 (owner: BryanDavis) [16:54:51] MaxSem: yeah, LD today looks good, wanna add it to the wiki calendar, please (I'm in between two back to back calls :/ ) [16:58:24] greg-g, thanks - done [17:01:37] Reedy: able to call in to the meeting now? [17:38:58] (PS7) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 [17:40:46] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (213375) [17:57:44] (CR) BryanDavis: [WIP] Add configuration for Wikimania Scholarships (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/98740 (owner: BryanDavis) [18:09:52] (CR) Matthias Mullie: "LGTM" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94106 (owner: Spage) [18:13:07] (CR) BryanDavis: "Dan, amend this changeset and put your settings in the new feature flag section that Mark made in wmf-config/CommonSettings-labs.php." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98684 (owner: MarkTraceur) [18:20:33] * paravoid stabs ottomata for adding trailing dots to the first line of commit messages' [18:20:40] haha [18:20:43] oopsie [18:20:46] i'll get it out of my system eventually! [18:21:26] <^d> Hmm, why are testsearch100[1-3] still in https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Elasticsearch%2520cluster%2520eqiad&tab=m&vn=&hide-hf=false? [18:21:38] <^d> I can't find them mentioned in puppet anymore. [18:28:24] ^d, are they still up and running?
[18:28:37] oh no, down i see [18:28:49] <^d> No, they should've been wiped and recomissioned as logstash100[1-3] [18:29:04] i doubt that puppet does anything to purge stuff from gmetad/rrd [18:29:07] we could do so manaully [18:29:16] orrr, they might disappear eventually on their own? [18:29:17] not sure [18:29:20] <^d> Hmm. [18:34:48] paravoid: is a single trailing dot also a problem? [18:43:31] racadm serveraction powerup [18:45:09] !log restarting gmetad on nickel to clear dead hosts [18:45:25] Logged the message, Master [18:45:28] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [18:50:00] ottomata: ganglia is supposed to clean up if the host doesn't report after a long time [18:50:10] No concerns if labstore1001 bounces a few times in the next couple minutes. [18:50:28] ok cool, thought so but wasn't willing to claim it as fact :) [18:50:58] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:01:47] (PS7) Yurik: for m.wikipedia.org and zero.wikipedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/97115 [19:02:12] (PS1) Ottomata: Fixing JsonParser.get_state to return proper metric type [operations/debs/logster] - https://gerrit.wikimedia.org/r/98862 [19:02:53] (CR) Ottomata: [C: 2 V: 2] Fixing JsonParser.get_state to return proper metric type [operations/debs/logster] - https://gerrit.wikimedia.org/r/98862 (owner: Ottomata) [19:03:45] paravoid, should i mark 17 as done? Also, I would like to lower your load if possible, so I can deploy https://gerrit.wikimedia.org/r/#/c/97115/ myself during lightning today [19:04:04] yurik: you can't, this needs ops powers [19:04:08] but maybe mutante can [19:04:18] if he has spare cycles [19:04:36] and let's set it as done when our side is fixed too [19:05:07] ok, ping me if I can help in any way - that column of "dependent on OPs" is scary [19:05:16] paravoid: thanks a lot for the varnish merge.
[19:05:19] it is :( [19:05:27] (PS1) Reedy: Non Wikipedias to 1.23wmf5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98864 [19:07:56] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [19:09:57] (PS1) Ottomata: Fixing JsonParser.get_state to return proper metric type [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/98865 [19:09:58] (PS1) Ottomata: Updating changelog with recent change from master [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/98866 [19:10:09] (CR) Reedy: [C: 2] Non Wikipedias to 1.23wmf5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98864 (owner: Reedy) [19:10:23] (Merged) jenkins-bot: Non Wikipedias to 1.23wmf5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98864 (owner: Reedy) [19:10:51] \o/ [19:11:49] (Abandoned) Ottomata: Fixing JsonParser.get_state to return proper metric type [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/98865 (owner: Ottomata) [19:11:57] (Abandoned) Ottomata: Updating changelog with recent change from master [operations/debs/logster] (debian) - https://gerrit.wikimedia.org/r/98866 (owner: Ottomata) [19:14:19] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.23wmf5 [19:14:40] Logged the message, Master [19:15:18] <^d> Reedy: /a/common/ seems to be...missing...from arsenic. [19:15:23] (PS2) Yurik: Automatically pull proxies from Wikipedia Zero's config namespace on META. [operations/puppet] - https://gerrit.wikimedia.org/r/97004 (owner: Dr0ptp4kt) [19:15:33] (PS3) Yurik: Automatically pull proxies from Wikipedia Zero's config namespace on META. [operations/puppet] - https://gerrit.wikimedia.org/r/97004 (owner: Dr0ptp4kt) [19:15:44] <^d> Well, /a/common/ is there but nothing is in it ;-) [19:15:55] Run sync-common locally on it?
[19:16:18] mutante: can you deploy https://gerrit.wikimedia.org/r/#/c/97115/ so that yuri can test it today? [19:16:47] (CR) Yurik: "The dependency has been deployed to production" [operations/puppet] - https://gerrit.wikimedia.org/r/97004 (owner: Dr0ptp4kt) [19:16:52] (please pretty please :)) [19:17:05] <^d> Reedy: Dur, thanks. [19:17:09] paravoid: yea, i can do that later, but tell me.. where do i install the new php-luasandbox package first :) [19:17:09] <^d> Wonder why it brokeded. [19:17:21] <^d> Hmm, still empty [19:17:24] what do you mean? [19:17:53] paravoid: which host to use to install php-luasandbox_1.8 [19:17:55] 11:19 < anomie> Can it be put on testwiki? Or else beta labs? [19:18:07] to test you mean? [19:18:16] yea, well, i just built a new version of it [19:18:28] and just pushing it on APT seems risky [19:18:33] betalabs then [19:18:44] ^d: it's not on fenari either [19:18:47] betalabs -> mw1017 (testwiki) -> prod [19:18:53] it was some puppet change a while ago [19:19:08] <^d> Mehhh [19:19:31] paravoid: mw1017, i needed that number:) [19:20:01] how should i go about putting it on betalabs when doing it manually [19:21:24] (PS1) Catrope: Disable VisualEditor in content namespaces on svwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 [19:22:14] paravoid, if the source of a package has changed, the actual version should be bumped, right? not just the … -x deb version? [19:22:39] 0.1-3 -> 0.1-4 [19:22:39] vs [19:22:40] 0.1.3 -> 0.?-1 [19:22:40] ?
[19:23:22] ottomata: i'd say 0.2-1 [19:23:49] aye [19:23:53] in my current example it's 1.7-1 -> 1.8-1 , unless you make several builds [19:24:05] i just bumped deb rev number, and it was complaing about the source .tar.gz file changing [19:24:09] which makes sense, because it has [19:24:13] (reprepro was complaining) [19:25:27] !log reedy synchronized php-1.23wmf5/extensions/ProofreadPage/ [19:25:43] Logged the message, Master [19:29:26] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikisource back to 1.23wmf4 as proofreadpage is broken [19:29:41] Logged the message, Master [19:32:27] !log updated webstatscollector package in apt to 0.2-1 [19:32:43] Logged the message, Master [19:33:42] (CR) Jforrester: Disable VisualEditor in content namespaces on svwiktionary (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 (owner: Catrope) [19:35:26] RobH or cmjohnson1, can one of you help me out with https://rt.wikimedia.org/Ticket/Display.html?id=6390? (Not urgent, but sometime in the next couple of days would be good.) [19:35:35] !log reedy synchronized php-1.23wmf5/extensions/Wikibase [19:36:00] Logged the message, Master [19:38:57] !log updated webstatscollector to 0.2-1 on erbium and gadolinium [19:39:13] Logged the message, Master [19:40:26] andrewbogott: yea, i forgot about this. you just dont need them and labs does [19:40:33] if i dont get to it today, im not sure we will this week [19:40:43] cuz we have datacenter visits for tomorrow onward [19:41:00] and i think chris is attending them [19:41:08] so no one in ashburn for remainder of week [19:41:23] andrewbogott: is it an issue of you just dont want them under analytics anymore or you need for labs immediately?
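The 0.2-1 answer follows the Debian convention that a version string is `upstream_version-debian_revision`, split at the last hyphen. Bumping only the revision (0.1-3 to 0.1-4) claims the upstream tarball is unchanged, which is why reprepro complained when the .tar.gz behind the same upstream version differed. A minimal sketch; `split_version` and `needs_upstream_bump` are hypothetical helpers, and epochs such as `1:` are ignored for brevity:

```python
from typing import Tuple


def split_version(version: str) -> Tuple[str, str]:
    """Split a Debian version without epoch into (upstream_version, debian_revision).

    Debian policy splits at the last hyphen; a version with no hyphen
    (a native package) has an empty revision.
    """
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        return version, ""
    return upstream, revision


def needs_upstream_bump(old: str, new: str, source_changed: bool) -> bool:
    """True if the upstream source changed but the upstream version did not."""
    return source_changed and split_version(old)[0] == split_version(new)[0]


print(split_version("0.1-3"))                                      # ('0.1', '3')
print(needs_upstream_bump("0.1-3", "0.1-4", source_changed=True))  # True
print(needs_upstream_bump("0.1-3", "0.2-1", source_changed=True))  # False
```

Splitting at the last hyphen matters for upstream versions that themselves contain hyphens, for example `1.2-rc1-2`.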
[19:41:24] (CR) Catrope: Disable VisualEditor in content namespaces on svwiktionary (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 (owner: Catrope) [19:41:32] trying to figure out if i need to stop what im working on for this [19:41:37] RobH: OK… it doesn't require any hands-on business does it? Just DNS changes? [19:41:39] (or reshuffle my task list [19:41:44] no, it has hands on label changes as well [19:41:55] but, those can be delayed if the systems are needed to be in service sooner than later [19:42:00] It's not an emergency, but Mike will be setting up ashburn labs sometime soon (next week, maybe?) and right now doesn't have any hardware to do it with. [19:42:05] ok [19:42:14] so worst case we'll do all but labels then [19:42:16] which is remote [19:42:17] (CR) Jforrester: Disable VisualEditor in content namespaces on svwiktionary (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 (owner: Catrope) [19:42:19] except, hrmm [19:42:20] sure. [19:42:24] labsl has bonded network interfaces yes? [19:42:31] so each system will need more network ports added, that is onsite. [19:43:18] I'm not sure. I know that the (as-yet-undesignated) network node needs bonded ports. Not sure if the VM hosts do too. [19:43:25] In theory Ryan knows about all this but he's in contract limbo :( [19:43:37] i asked on ticket for recortd [19:43:41] record even [19:43:54] (PS2) Catrope: Disable VisualEditor in content namespaces on svwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 [19:43:58] so yea, pretty sure we would also just do the base install without the bonded interfaces as well [19:44:04] so it doesnt stop initial deployment, just actual use [19:44:11] * andrewbogott nods [19:44:36] At the moment the 'network node' is a different box/different ticket from 6390.
[19:45:15] well, i dunno what differs on them [19:45:31] so really all we can do is kill old naming and put the new ones in place [19:45:41] but that can happen this week easily, i bookmarked and added to my tasks [19:45:46] ok, thanks [19:46:05] that leaves the actual vlan assignments and OS installs to be done still though, just fyi [19:46:40] (also all of that stuff is wayyyy easier now due to lifecycle doc) [19:47:08] andrewbogott: so since these are all just moving from analytics to labs... i technically dont have to do it if you wanna ;] [19:47:25] I can just attach my approval for the migration and anyone can do the actual decom and migrate [19:47:27] The OS install I'll probably do myself. I know nothing about vlan setup, alas. mhoover, do you have a vision for the network setup we need? [19:47:45] RobH: It's up to you to decide if it's easier for you to just do it, or to tell me how :) [19:48:04] im happy to do it [19:48:07] and im happy to help you do it [19:48:13] i realize the latter takes slightly longer [19:48:19] but then you never need me to do it for you again [19:48:23] (your call ;) [19:48:29] but, brb, gotta swap laundry [19:49:10] RobH: I think that… when you next have time to devote to this, you should talk me through it if I'm around, and just do it if I'm not. 
[19:52:01] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [19:53:00] !log reedy synchronized php-1.23wmf5/extensions/ProofreadPage/ [19:53:16] hrm [19:53:17] Logged the message, Master [19:54:09] (PS8) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740 [19:57:21] andrewbogott: cool, if i dont have time today, i will tomorrow psot RFP meeting and pre DC meeting [19:57:34] i have that open time with nothing scheduled for about an hour or so [19:57:43] so tomorrow, late morning Pacific [19:58:15] so i'll either do this by then, or with you then [19:59:18] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikisources back to 1.23wmf5 [19:59:39] Logged the message, Master [19:59:51] !log built and importing php-luasandbox 1.8 to apt repo [20:00:07] Logged the message, Master [20:01:30] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [20:04:57] 'kthanks [20:07:06] ottomata, join us in wikimedia_security? [20:08:55] *mediawiki [20:09:41] *mikiwedia [20:16:03] PROBLEM - NTP on labstore1001 is CRITICAL: NTP CRITICAL: Offset unknown [20:19:07] (PS2) Ottomata: Adding varnishkafka::monitoring class to send stats to Ganglia. [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/98601 [20:20:11] (CR) Ottomata: [C: 2 V: 2] Adding varnishkafka::monitoring class to send stats to Ganglia. [operations/puppet/varnishkafka] - https://gerrit.wikimedia.org/r/98601 (owner: Ottomata) [20:21:03] RECOVERY - NTP on labstore1001 is OK: NTP OK: Offset 0.0008314847946 secs [20:22:48] paravoid, is there a puppetized varnish cluster in labs I can test varnishkafka puppetization on?
[20:24:32] (PS3) Jforrester: Disable VisualEditor in content namespaces on svwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 (owner: Catrope) [20:24:58] (CR) Jforrester: [C: 1] Disable VisualEditor in content namespaces on svwiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98871 (owner: Catrope) [20:25:41] (CR) Jforrester: "Follow-up in Ie190a5853d5e1 to actually fix this." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/97711 (owner: Jforrester) [20:35:21] (PS1) Ori.livneh: Enable module storage by default [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98900 [20:37:15] (PS1) Manybubbles: Make Cirrus the default on test2wiki again [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98901 [20:37:16] (PS9) Ottomata: Setting up varnishkafka on mobile varnish caches [operations/puppet] - https://gerrit.wikimedia.org/r/94169 [20:37:42] (CR) Ottomata: Setting up varnishkafka on mobile varnish caches (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/94169 (owner: Ottomata) [20:41:06] (CR) Ori.livneh: [C: 2] Enable module storage by default [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98900 (owner: Ori.livneh) [20:41:54] Fun with disks: labstore1001 reaches /dev/sdbh :-) [20:44:13] !log new mediawiki-1.22rc3 is out [20:44:19] greg-g: ^:) [20:44:34] Logged the message, Master [20:44:44] :) [20:45:59] !log ori updated /a/common to {{Gerrit|If02d8506c}}: Enable module storage by default [20:46:06] shhhhh [20:46:13] i'm hunting for wabbits [20:46:15] I didn't see it [20:46:30] Logged the message, Master [20:46:50] Coren: What's that, 61 disks? [20:46:56] 60 [20:47:03] !log ori synchronized wmf-config/InitialiseSettings.php 'If02d8506c: Enable module storage by default' [20:47:19] How much space is that total?
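Coren's answer of 60 checks out. The kernel names SCSI disks sda to sdz, then sdaa to sdaz, then sdba onwards, so the letters after "sd" form a bijective base-26 number (a = 1, ..., z = 26, with no zero digit). A minimal sketch, assuming the standard sdX naming and disks enumerated without gaps; `disk_index` is a hypothetical helper:

```python
def disk_index(device: str) -> int:
    """Return the 1-based position of an sd* disk name (sda -> 1, sdaa -> 27)."""
    name = device.rsplit("/", 1)[-1]          # accept "/dev/sdbh" or "sdbh"
    suffix = name[2:] if name.startswith("sd") else ""
    if not suffix or not all("a" <= ch <= "z" for ch in suffix):
        raise ValueError(f"not a whole-disk sd* name: {device}")
    index = 0
    for ch in suffix:
        # Bijective base 26: 'a' is 1 and 'z' is 26; there is no zero digit.
        index = index * 26 + (ord(ch) - ord("a") + 1)
    return index


print(disk_index("/dev/sdbh"))  # 60
```

For sdbh that works out to 2 x 26 + 8 = 60.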
[20:47:24] Logged the message, Master [20:48:15] ~107T [20:48:50] Obviously, there's going to be some raid going on there though. :-) [20:49:21] Only a tenth of a petabyte? ;) [20:49:58] Dude! I remember being exctatic at having a 10M drive. [20:50:07] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:50:17] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:51:17] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.393 second response time [20:51:30] hrm. [20:51:39] I remember being excited about using 5.25" floppies instead of a cassette tape on my first computer. Not that they worked *well*, but random access! [20:52:02] The first 1G storage system I ever saw was a robotic tape library about the size of 3 household refrigerators. Now I have a 64G thumb drive so small I lose it about once a week. [20:52:26] bd808: how often do you find it? [20:52:36] Funny enough, I never went through the FSK tape phase. Got an odd start with a self built first computer that had an 8" drive. [20:53:14] greg-g: so far just as often as I lose it. I imagine that the upper limit will be N-1 times. [20:53:24] :) [20:53:34] paravoid: around? [20:54:24] now, where's my CSS [20:54:44] load peaked as a result of my deployment [20:54:49] but it seems to be subsiding: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Bits+application+servers+eqiad&m=cpu_report&s=descending&mc=2&g=load_report [20:54:52] if it doesn't i'll rever it [20:54:57] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [20:55:17] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:56:00] ori-l: went back up [20:56:47] i'm watching it; give it a minute or two [20:56:53] Almost seems like a shame to have to raid over this and go back down to two digits. 
:-) [20:57:07] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [20:57:38] Coren: easy fix: just buy more disks [20:58:11] down again ori-l \o/ [20:58:19] give it a couple, it might go back up [20:58:20] ;) [21:00:26] it's back to normal now [21:04:18] * Coren plays raid tetris and attemps to figure out the best reliability/performance/space balance for 48 disk in 4 arrays. [21:05:16] stripe on 4 12-disk raid 6s gives me most space, splits writes, but no spares. [21:25:28] yurik: ping [21:25:36] mutante, yep [21:26:06] yurik: i put the change for zero on test.wikipedia [21:26:20] before: [21:26:28] http://m.wikipedia.org * 302 Found http://incubator.wikimedia.org/wiki/Wp/m?goto=mainpage [21:26:42] checking .... no wonder test doesn't work... [21:26:42] after: [21:26:45] http://m.wikipedia.org * 302 Found http://en.m.wikipedia.org/wiki/Main_Page [21:26:52] before: [21:27:03] http://zero.wikipedia.org * 302 Found http://en.zero.wikipedia.org/wiki/Special:ZeroRatedMobileAccess [21:27:06] after: [21:27:13] http://zero.wikipedia.org * 302 Found http://en.m.wikipedia.org/wiki/Main_Page [21:27:43] yurik: that was just like a minute before [21:27:44] strange... checknig [21:27:52] that i changed test [21:27:53] no no, i haven't looked yet [21:27:57] ok [21:28:19] so the part that it does NOT go to Special:ZeroRatedMobileAccess looks intended [21:28:53] but we don't get mobilelanding.php [21:29:01] mutante, this is not public yet, right? just on a test server? [21:29:10] and it doesn't go through varnish [21:29:20] yurik: just on mw1017, which is test [21:29:35] does it hit varnish on the way in? 
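Coren's "raid tetris" trade-off is simple arithmetic. RAID 6 spends two disks' worth of space on parity per array, so four striped 12-disk RAID 6 sets keep 4 x (12 - 2) = 40 of the 48 disks for data, and each hot spare costs one more disk per array. A sketch for comparing layouts; `raid6_usable` is a hypothetical helper, and the per-disk size is only inferred from the "~107T" for 60 disks quoted in the log:

```python
def raid6_usable(arrays: int, disks_per_array: int, disk_size: float,
                 spares: int = 0) -> float:
    """Usable capacity when `arrays` RAID 6 sets are striped together (RAID 60).

    Every RAID 6 array spends two disks' worth of space on parity, and
    `spares` hot spares per array hold no data at all.
    """
    active = disks_per_array - spares
    if active < 4:
        raise ValueError("a RAID 6 array needs at least 4 active disks")
    return arrays * (active - 2) * disk_size


# Per-disk size inferred from the log's "~107T" for 60 disks (a rough figure).
per_disk = 107 / 60
layouts = {
    "4 x 12, no spares": raid6_usable(4, 12, per_disk),
    "4 x 12, 1 spare each": raid6_usable(4, 12, per_disk, spares=1),
}
for name, usable in layouts.items():
    print(f"{name}: {usable:.1f}T usable out of {48 * per_disk:.1f}T raw")
```

The `spares` parameter makes the "most space, no spares" option directly comparable with a layout that reserves a hot spare in each array.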
[21:29:50] because varnish has some other magic rewriting [21:29:51] i suppose so, yea [21:30:05] but i'm just testing this by asking apaches [21:30:07] ok, could introduce some weird behaviour there - we need to test the apache without varnish [21:30:08] from fenari [21:30:11] oh [21:30:13] using apache-fast-test [21:30:25] what's that? [21:30:27] eh, yea, what i did [21:30:30] learning a new tool [21:31:00] so yes, this is suspiscious [21:31:00] apache-fast-test --help on fenari [21:31:09] it's a script written by jeff [21:31:35] which let's you test apache changes on either a single host ..or the whole cluster [21:31:50] source => "puppet:///files/misc/scripts/apache-fast-test" [21:31:51] or just: curl -H 'host: www.wikipedia.org' mw1001/ [21:32:05] !log Reloading zuul to deploy Iceb6b4016c [21:32:22] Logged the message, Master [21:32:26] thx, let me check for a sec [21:32:41] mutante, so you are sure it doesn't go through varnish, right? [21:33:34] yurik: pretty sure when using this script, yea [21:33:46] it asks mw1017 directly [21:34:02] usage: make a text file with any URLs to test and then [21:34:13] dzahn@fenari:~$ apache-fast-test zero.url mw1017 [21:34:29] where zero.url just has a couple URLs , one per line [21:34:54] you can compare by running the same on mw1018 which isnt changed [21:35:17] mutante, yep, already running. Do you know if its possible to set a manual header with that script? [21:35:39] to simulate varnish's custom headers [21:35:53] i'm not sure .. 
but would be curl then [21:37:06] (CR) Dzahn: [C: -1] "i put this on test.wp (mw1017) and used apache-fast-test on fenari to ask the apache:" [operations/apache-config] - https://gerrit.wikimedia.org/r/97115 (owner: Yurik) [21:37:58] mutante, ok, testing it a bit further - it seems okeish (one minor correction might be needed on the backend, but its not something to hold this over with) [21:38:07] give me a sec to finish testing [21:38:20] yurik: sure, cool [21:38:35] be back after a short break then [21:39:09] ori-l, thanks, didn't notice your answer - was exactly what i was looking for :) [21:41:25] !log Reloading zuul to deploy I889600fc65d8f [21:41:41] Logged the message, Master [21:42:37] (CR) Aaron Schulz: [C: 1] Fix up multiversion to not require dba_* functions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/93622 (owner: Chad) [21:45:12] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [21:46:01] ^demon|lunch: I suppose it's fine then [21:46:10] mutante, yeah, seems to be ok. I was using curl -H 'host: zero.wikipedia.org' -H 'X-CS: 250-99' -D /dev/stdout mw1017 [21:47:07] <^demon|lunch> Aaron|home: Sorry, distracted by something else. Maybe I should get a window to deploy it so nobody yells at us for merging. [21:50:30] I love have ^demon|lunch is talking to Aaron|home. |lunch -> |home. [21:52:26] Aaron, around? [21:53:57] Aaron, i need flaggedRevs deployed on meta, should i just follow the instructions in the http://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/a8723d447344c57a4f40b52eae076c683201a11a/wmf-config%2FInitialiseSettings.php#L10138 [21:55:21] Not sure if that should have been Aaron|home or AaronSchulz :) [21:56:36] Have you got consensus for doing so? [21:57:36] Reedy: it's only for a special NS [21:58:13] "press enter" lol [21:58:49] hrm, should be 'mwscript sql.php ruwikinews extensions/FlaggedRevs/schema/mysql/FlaggedRevs.sql' [21:58:58] Reedy, consensus from whom?
this is for zero namespace only [21:59:14] otherwise that looks up to date [21:59:18] which zero team uses exclusively for configurations [21:59:24] yurik: thanks, confirmed. so that is Location: http://ru.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess [21:59:38] as opposed to Location: http://en.zero.wikipedia.org/wiki/Special:ZeroRatedMobileAccess [22:00:03] mutante, ?? for which request? [22:00:34] yurik: curl -H 'host: zero.wikipedia.org' -H 'X-CS: 250-99' -D /dev/stdout mw1017 [22:00:39] copied your command [22:01:11] oh yes - that is correct [22:01:21] in other words redirect is X-CS and subdomain based [22:01:29] thx [22:02:14] ok, please confirm on gerrit [22:03:28] mutante, done [22:03:28] (CR) Yurik: [C: 1] "Verified using" [operations/apache-config] - https://gerrit.wikimedia.org/r/97115 (owner: Yurik) [22:03:42] damn, i am faster than the tubes! [22:04:00] or at least gerrit... hmm, not much of an achievement :-P [22:04:42] Aaron|home, should i update it in docs, or will you? [22:04:59] (CR) Dzahn: [C: 2 V: 2] "becomes: (f.e) Location: http://ru.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess" [operations/apache-config] - https://gerrit.wikimedia.org/r/97115 (owner: Yurik) [22:05:09] you can :) [22:05:19] yurik: so.. deploying then.. ack [22:05:47] mutante, yep [22:06:05] mutante, until varnish change is deployed (separate patch), this is a noop :) [22:06:21] yurik: noop is great.. on it [22:06:42] PROBLEM - DPKG on labstore1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:08:21] mutante, once virtual roots are done, this is where the fun begins: https://gerrit.wikimedia.org/r/#/c/97122/ [22:09:35] !log sync-apache, graceful-all for zero/m redirect (gerrit 97115) [22:09:51] Logged the message, Master [22:13:11] mutante: how can i push packages to gerrit? [22:14:08] (CR) Dzahn: "sync-apache, graceful-all, done, this is live."
[operations/apache-config] - 10https://gerrit.wikimedia.org/r/97115 (owner: 10Yurik) [22:14:35] matanya: packages? or patches [22:14:43] Reedy: argggg [22:14:48] mutante, awesome!!! :) [22:14:55] mutante: !? [22:14:59] Reedy: it looks sync-apache didnt catch all [22:15:01] ..again [22:15:02] a package. i guess, i took https://rt.wikimedia.org/Ticket/Display.html?id=6344 [22:15:05] :( [22:15:19] what should i do when it is ready? [22:15:29] yurik: saw the "pybal" option of apache-fast-test yet? [22:15:37] we still need to check some servers i'm afraid [22:15:54] running pybal :) [22:16:25] matanya: first check if it exists on apt.wikimedia.org [22:16:43] it does mutante [22:16:59] matanya: then check gerrit for a project like operations/debs/foo [22:17:06] and clone from that repo [22:17:19] and then use git review etc as usual, just on that repo instead of ops/puppet [22:17:26] mutante: https://git.wikimedia.org/summary/operations%2fdebs%2fruby-jsduck.git/HEAD [22:17:57] matanya: yea, then git clone [22:17:57] operations / debs/ruby-jsduck [22:18:15] i gotta check those non-synced servers first [22:18:16] brb [22:18:34] that part i gueesed :) what about the real package build at the end? [22:18:37] mw30.pmtpa.wmnet 302 Found http://incubator.wikimedia.org/wiki/Wp/m?goto=mainpage [22:18:43] mw41.pmtpa.wmnet 302 Found http://incubator.wikimedia.org/wiki/Wp/m?goto=mainpage [22:18:55] mw42.pmtpa.wmnet 302 Found http://en.m.wikipedia.org/wiki/Main_Page [22:19:31] mw82.pmtpa.wmnet 302 Found http://en.m.wikipedia.org/wiki/Main_Page mw83.pmtpa.wmnet 302 Found http://en.zero.wikipedia.org/wiki/Special:ZeroRatedMobileAccess [22:19:35] meeeh [22:20:15] mw83 .. API server [22:22:18] yea, API servers don't get graceful (but we thought we fixed it) [22:22:30] it's that sanity check script [22:26:26] and/or dsh groups out of sync again sigh [22:26:30] yurik: fixed now [22:27:37] !log graceful'ed apaches on mw1078,mw1111,mw1176,mw1194,mw1096,mw1107 (dsh out of sync again? 
sigghh) [22:27:57] Logged the message, Master [22:28:01] (03PS9) 10BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - 10https://gerrit.wikimedia.org/r/98740 [22:28:04] apache-fast-test zero.url pybal | grep eqiad [22:29:25] (but i have no idea about the incubator links left on some pmtpa) [22:29:57] mutante, awesome, thank you! do you want to get the related varnish patch out as well? [22:30:11] and make it non-noop :) [22:30:42] mutante, hmm, incubator links are weird, not sure what that is [22:31:14] yurik: yw. but no, sorry, no varnish patch for me and i haven't even gotten to anything i wanted to do since the morning [22:31:40] yurik: i don't think we have to worry or it would have been a bug report for quite a while [22:32:01] mutante, don't worry about it, and thank you!!! i'm sure paravoid wants to deploy that varnish patch soon :) [22:32:01] pretty sure they are not used and out of groups [22:32:06] cool [22:32:57] (03PS1) 10BryanDavis: Ignore .rbenv-version [operations/puppet] - 10https://gerrit.wikimedia.org/r/98988 [22:34:31] (03PS2) 10Yurik: Removed X-DfltLang & X-DfltPage from zero VCLs [operations/puppet] - 10https://gerrit.wikimedia.org/r/97122 [22:34:37] (03PS3) 10Yurik: Removed X-DfltLang & X-DfltPage from zero VCLs [operations/puppet] - 10https://gerrit.wikimedia.org/r/97122 [22:35:34] (03CR) 10Ori.livneh: [C: 032 V: 032] Ignore .rbenv-version [operations/puppet] - 10https://gerrit.wikimedia.org/r/98988 (owner: 10BryanDavis) [22:38:44] !log graceful'ed a couple more mw*/srv* in pmtpa to make them all pickup the same config [22:38:59] yurik: ^ there, just to make it nicer .. and now i'm out [22:39:00] Logged the message, Master [22:39:18] thx! :) [22:43:27] I'm making the puppet config for the new Wikimania Scholarships and I need to specify the mysql host for the PHP app to connect to. Does anyone know what the host would be to connect to "the misc MySQL shard" that paravoid said I should use? 
[22:43:55] This is a non-wiki product that needs a single host to connect to at the moment.
[22:44:23] bd808: that should be db1001
[22:44:26] .eqiad.wmnet
[22:44:39] mutante: Thanks.
[22:45:07] saying that because it's the replacement for db9, the historic misc db
[22:50:03] (CR) Dzahn: "after graceful'ing several apaches (see SAL), the apache-fast-test with pybal option looks fine. see how it summarizes the results when th" [operations/apache-config] - https://gerrit.wikimedia.org/r/97115 (owner: Yurik)
[22:53:12] matanya: re: real package build, switch to the cloned repo and try debuild -us -uc
[22:53:36] (PS10) BryanDavis: [WIP] Add configuration for Wikimania Scholarships [operations/puppet] - https://gerrit.wikimedia.org/r/98740
[22:54:03] matanya: eh, first decide if you want to use a labs host for building and install the required -dev packages
[22:54:10] must run though.. for now
[23:07:08] (Abandoned) Yuvipanda: toollabs: Add uwsgi to exec_environ [operations/puppet] - https://gerrit.wikimedia.org/r/98126 (owner: Yuvipanda)
[23:08:17] (Abandoned) Yuvipanda: dynamicproxy: Prioritize url routes by length [operations/puppet] - https://gerrit.wikimedia.org/r/97758 (owner: Yuvipanda)
[23:19:27] (CR) coren: "After-the-fact -1: this should be in a new class analogue to webgrid for a uwsgi grid." [operations/puppet] - https://gerrit.wikimedia.org/r/98126 (owner: Yuvipanda)
[23:20:11] (CR) Ori.livneh: "There's also a uWSGI module now" [operations/puppet] - https://gerrit.wikimedia.org/r/98126 (owner: Yuvipanda)
[23:30:54] (Abandoned) GWicke: WIP Bug 56282: Gzip all Parsoid HTML before storing it in Varnish [operations/puppet] - https://gerrit.wikimedia.org/r/97647 (owner: GWicke)
[23:34:24] Will anyone want to second me as mentor for https://bugzilla.wikimedia.org/show_bug.cgi?id=57613
[23:34:40] ori-l: pong
[23:34:46] All of engineering welcome, obviously, although that's a really opsish thing.
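Dzahn's review comment above mentions how apache-fast-test with the pybal option summarizes per-server results, and the lines pasted earlier (`mw30.pmtpa.wmnet 302 Found http://...`) show the raw shape of those results. The sketch below is a hypothetical illustration of that kind of summary, not apache-fast-test's actual code. It assumes each result line is `host status reason... location`, groups hosts by (status, Location), and treats every host outside the largest group as a likely straggler that missed the graceful reload. The host names and results in the example are made up.

```python
from collections import defaultdict


def summarize(lines):
    """Group result lines of the form 'host status reason... location'.

    Assumes every non-blank line ends with the redirect target; returns a dict
    mapping (status, location) -> list of hosts, in input order.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        host, status, location = parts[0], parts[1], parts[-1]
        groups[(status, location)].append(host)
    return dict(groups)


def stragglers(groups):
    """Hosts outside the largest group, e.g. Apaches that missed a graceful.

    Ties for the largest group are broken by first appearance.
    """
    if not groups:
        return []
    majority = max(groups, key=lambda key: len(groups[key]))
    return sorted(
        host for key, hosts in groups.items() if key != majority for host in hosts
    )


# Hypothetical results: three hosts already serve the new redirect, one does not.
results = [
    "mw1001 302 Found http://ru.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess",
    "mw1002 302 Found http://ru.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess",
    "mw1003 302 Found http://ru.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess",
    "mw1078 302 Found http://en.zero.wikipedia.org/wiki/Special:ZeroRatedMobileAccess",
]
for (status, location), hosts in summarize(results).items():
    print(f"{len(hosts):3d} x {status} {location}")
print("needs graceful:", stragglers(summarize(results)))
```

Grouping by the full Location rather than by status code alone matters here: every server in the pasted output answered 302, and only the redirect target showed which Apaches were still serving the old config.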
[23:36:01] paravoid: i was either pinging to give you a heads up about enabling module storage or trying to get you to save me when load went up on the bits app servers
[23:36:25] ori-l: heh, too late :)
[23:36:26] but load went back down after 2 mins and things are looking promising:
[23:36:32] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Bits+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[23:36:46] and: https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Bits+caches+pmtpa&m=cpu_report&s=by+name&mc=2&g=network_report
[23:37:14] and: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Bits+application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[23:37:30] wow
[23:38:27] paravoid: Except that Ryan didn't, you know, suggest a single thing that does the primary necessary function. :-)
[23:39:23] ori-l: cool stuff
[23:39:24] paravoid: The only thing I ever saw that came close is Chronos, and that's a monster that needs Mesos.
[23:39:40] ori-l: you rock
[23:40:06] well, i am still curious why exactly we're seeing the effect that we're seeing. do browsers just not cache or what?
[23:40:26] I find that browser cache is imperfect in a lot of cases
[23:40:34] And yes, ori-l does rock.
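ori-l's question about why enabling module storage cut load on the bits servers can be reasoned through with a toy model. This is a simplified sketch, not mw.loader's real code, and it rests on two assumptions: that a batched module request carries a single version equal to the newest timestamp among its members (the quantity the `Math.max(...)` over `mw.loader.moduleRegistry` probes later in this log), and that the module store keeps each module's source under a `name@version` key. The module timestamps are invented for illustration.

```python
def batch_version(registry, names):
    """Version stamp for one batched request: the newest member timestamp."""
    return max(registry[name] for name in names)


def refetch_without_store(old_registry, new_registry, names):
    """No module store: a new batch version means a new URL, so cached copies
    keyed by the old URL are useless and every module in the batch is fetched."""
    if batch_version(old_registry, names) != batch_version(new_registry, names):
        return sorted(names)
    return []


def refetch_with_store(store, new_registry, names):
    """Module store: only modules whose 'name@version' key is missing are fetched."""
    return sorted(name for name in names if f"{name}@{new_registry[name]}" not in store)


names = ["jquery", "mediawiki.util", "mediawiki.language.data"]
before = {"jquery": 100, "mediawiki.util": 200, "mediawiki.language.data": 300}
# Only the language data module reports a newer timestamp on the next page view.
after = {**before, "mediawiki.language.data": 999}
store = {f"{name}@{version}" for name, version in before.items()}

print(refetch_without_store(before, after, names))
print(refetch_with_store(store, after, names))
```

Under these assumptions, one module whose timestamp keeps moving invalidates the URL of every batch it belongs to, while the store limits the refetch to that single module.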
[23:41:26] :) thanks; murphy's law dictates that someone definitively proves that it had nothing to do with module storage
[23:41:59] ori-l: all the URLs I see have max-age set to 300
[23:42:06] Actually what it dictates is that you broke something that causes this drop
[23:42:10] unless this was introduced today, it might explain why browsers do not cache
[23:42:11] Yeah that's the other thing I was afraid of
[23:42:16] 5-minute-cached modules
[23:42:19] i would expect a large effect if we had a bug like https://bugzilla.wikimedia.org/show_bug.cgi?id=56856 ("ResourceLoaderULSModule::getModifiedTime updates continuously") somewhere
[23:42:24] But I can't think of any that would end up in localStorage
[23:43:23] ori-l: We actually had that bug for anonymous user JS during the RL rollout. Bits apaches CPU declined spectacularly once we fixed that case to return 1 (1970-01-01 00:00:01) instead of 0 (now)
[23:43:42] RoanKattouw: all of the bits URLs I see on enwiki's main page seem to be at 300
[23:43:42] (More generally, for any user who didn't have a user JS page, but anons were the lion's share of that)
[23:43:51] Hmm, really?
[23:44:04] well I checked a bunch of them manually
[23:44:11] Is that just because your client has already fetched all of the 2592000 ones?
[23:44:14] I'm sure there's some firebug trick that does this better
[23:44:19] it could be that
[23:44:26] the startup module is supposed to be 300
[23:44:30] you may have fetched the rest
[23:45:12] you could try localStorage.removeItem('MediaWikiModuleStore:enwiki') in a js console and then reload
[23:45:13] Also, most requests by number are 300
[23:45:13] by the rest you mean fetches initiated from javascript rather than link href/script src?
[23:45:20] Most requests by size are 2592000
[23:45:24] Yes
[23:45:32] All fetches initiated from the HTML source are 300
[23:45:33] by design
[23:45:58] nod
[23:47:47] ori-l: Alternative theory: a substantial number of our users are behind a 3rd-party proxy/cache that strips/munges/changes Cache-Control
[23:48:04] nah, that's far-fetched
[23:49:40] I think it might be a misbehaving module
[23:49:46] Math.max.apply( Math, $.map( mw.loader.moduleRegistry, function( m ) { return m.version }) );
[23:49:54] ^ i got different values across reloads
[23:49:58] ha!
[23:51:01] "mediawiki.language.data"
[23:51:52] hmph
[23:52:11] Is there a bug in getHashMTime then?
[23:53:09] Or serialize( Language::factory( 'en' ) ) isn't stable?
[23:54:02] ugh
[23:54:39] i actually noted this in https://bugzilla.wikimedia.org/show_bug.cgi?id=56856
[23:54:43] i just realized
[23:54:46] 'mediawiki.language.data' appears to likewise update continuously.
[23:54:54] Yeah its version timestamp is now()
[23:55:02] No idea why though
[23:55:08] From reading the code it seems like it should work
[23:55:17] famous words
[23:57:10] Holy crap
[23:57:14] It's being continuously updated in memc
[23:57:34] lol
[23:57:56] why would localstorage help, though?
[23:58:15] it's the same bug
[23:58:17] $this->language = Language::factory( $context->getLanguage() );
[23:58:45] paravoid: because only that one module would be re-retrieved
[23:59:07] oh so you mean that until now everything was being re-retrieved because that one module changed its mtime all the time?
[23:59:22] Oooooooh
[23:59:33] paravoid: Well not everything, but a good chunk, yes
[23:59:41] awesome.
[23:59:53] $this->language = $this->request->getVal( 'lang' );
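The failure mode diagnosed here, a module whose version timestamp is always now() and whose entry is "being continuously updated in memc", is what a hash-keyed modification time does when the hashed input is not stable. The sketch below is a rough Python model, not MediaWiki's getHashMTime. A dict stands in for memcached, `unstable_payload` is a hypothetical stand-in for serializing a whole Language object (it embeds per-instance state), and `stable_payload` follows the direction the last change in the log appears to take, keeping only the language code.

```python
import hashlib
import itertools
import json


class HashMTimeTracker:
    """Rough model of a hash-keyed modification time (not MediaWiki's code).

    The timestamp only advances when the hash of the module's content changes.
    A plain dict stands in for memcached.
    """

    def __init__(self, clock):
        self.clock = clock  # callable returning the current time
        self.cache = {}     # key -> (content_hash, mtime)

    def get_mtime(self, key, payload):
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        cached = self.cache.get(key)
        if cached is not None and cached[0] == digest:
            return cached[1]             # same content: keep the old timestamp
        now = self.clock()
        self.cache[key] = (digest, now)  # new hash: write it back, "modified now"
        return now


_instance_ids = itertools.count()


def unstable_payload(lang_code):
    """Hypothetical stand-in for serializing a whole Language object: it embeds
    per-instance state, so two identical languages serialize differently."""
    return json.dumps({"code": lang_code, "instance": next(_instance_ids)})


def stable_payload(lang_code):
    """Hash only the language code, in the spirit of the getVal( 'lang' ) change."""
    return json.dumps({"code": lang_code})


tracker = HashMTimeTracker(clock=itertools.count(1000).__next__)
print([tracker.get_mtime("language-data:en:a", unstable_payload("en")) for _ in range(3)])  # [1000, 1001, 1002]
print([tracker.get_mtime("language-data:en:b", stable_payload("en")) for _ in range(3)])    # [1003, 1003, 1003]
```

With the unstable payload, every call writes a new cache entry and reports "modified now"; anything that derives a cache-busting version from that timestamp would then get a new URL on every page view, which would be consistent with the bits load pattern discussed above.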