[00:03:01] (CR) Dzahn: [C: 2] lucene: lint [operations/puppet] - https://gerrit.wikimedia.org/r/140665 (owner: Matanya)
[00:05:28] <^demon|away> mutante: thx!
[00:06:32] ^demon|away: yw, i checked a run on searchidx1001
[00:06:36] (CR) Dzahn: "no change to be seen on searchidx1001" [operations/puppet] - https://gerrit.wikimedia.org/r/140665 (owner: Matanya)
[00:07:06] (CR) Tnegrin: [C: 1] Milimetric access to stats user on stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/140051 (owner: Springle)
[00:08:38] jouncebot: next
[00:08:38] In 14 hour(s) and 51 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140624T1500)
[00:09:29] (CR) Dzahn: [C: 2] Apache settings for wikimania 2015 wiki [operations/apache-config] - https://gerrit.wikimedia.org/r/139288 (https://bugzilla.wikimedia.org/66370) (owner: Withoutaname)
[00:10:24] (PS1) Awight: WIP enable FundraisingTranslateWorkflow on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141607
[00:11:09] Reedy: ehmm.. does that not need a redirect anymore/yet ?
[00:11:15] (CR) jenkins-bot: [V: -1] WIP enable FundraisingTranslateWorkflow on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141607 (owner: Awight)
[00:18:24] mutante: ?
[00:19:40] Reedy: that change for wikimania2015, all it does is add the ServerAlias
[00:19:45] but not a redirect rule
[00:20:05] but that might be all that is needed now, if you/somebody changed it
[00:20:18] so that all wikimanias use the same
[00:20:18] I don't think it's needed
[00:20:20] ok
[00:41:24] enwiki beta seems to be server with content-type: application/octet-stream
[00:41:34] *served*
[00:42:32] tgr: apache 2.4 does not set a type anymore by default so they did it in varnish, wait.. finding related change
[00:44:08] tgr: https://gerrit.wikimedia.org/r/#/c/141556/ , https://gerrit.wikimedia.org/r/#/c/141086/ , https://gerrit.wikimedia.org/r/#/c/138891/
[00:47:45] mutante: on a HTML page, the content type should still be set by Apache, I suppose?
[00:49:34] tgr: i suppose so, i just thought the varnish change might have done it
[00:49:43] if (!resp.http.Content-Type) { 373 » » set resp.http.Content-Type = "application/octet-stream";
[00:49:48] if for some reason it wasn't set already
[00:50:20] especially if you say this is a recent change and not the case a few days ago
[00:50:42] not sure, I just noticed
[00:50:54] don't recall using beta in the last few days though
[00:51:44] but it seems like something that would get noticed quickly, basically none of the beta sites work
[00:52:45] let's make a bug
[00:53:02] will do
[00:53:15] thanks, i can comment on that change
[00:54:10] which component is beta? QA?
[00:56:42] mutante: https://bugzilla.wikimedia.org/show_bug.cgi?id=67012
[00:59:47] (CR) Dzahn: "could this have broken beta? basically all beta sites are served as binary" [operations/puppet/varnish] - https://gerrit.wikimedia.org/r/141556 (owner: Ori.livneh)
[01:03:32] tgr: cool
[01:10:42] (CR) Dzahn: "i'll add the DNS tomorrow, had to run" [operations/apache-config] - https://gerrit.wikimedia.org/r/139288 (https://bugzilla.wikimedia.org/66370) (owner: Withoutaname)
[01:19:12] (PS2) Springle: Milimetric access to stats user on stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/140051
[01:20:56] (CR) Springle: [C: 2] Milimetric access to stats user on stat1003 [operations/puppet] - https://gerrit.wikimedia.org/r/140051 (owner: Springle)
[02:06:27] mutante: they work for me
[02:06:48] i did test the change on the beta varnishes but if it did anything it was transient
[02:14:41] !log LocalisationUpdate completed (1.24wmf9) at 2014-06-24 02:13:38+00:00
[02:14:52] Logged the message, Master
[02:26:46] !log LocalisationUpdate completed (1.24wmf10) at 2014-06-24 02:25:43+00:00
[02:26:50] Logged the message, Master
[02:30:38] (PS3) Withoutaname: DNS settings for wikimania 2015 wiki [operations/dns] - https://gerrit.wikimedia.org/r/140186 (https://bugzilla.wikimedia.org/66370)
[02:30:58] (PS4) Withoutaname: Initialize some settings for wikimania 2015 wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/139279 (https://bugzilla.wikimedia.org/66370)
[02:48:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 24 02:47:14 UTC 2014 (duration 47m 13s)
[02:48:26] Logged the message, Master
[03:36:22] (PS8) Withoutaname: Delete ve.wikimedia.org and leave redirect [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/131907 (https://bugzilla.wikimedia.org/55737)
[03:37:20] mutante: thanks for the merge!
[04:07:20] (PS22) KartikMistry: cxserver configuration for beta labs [operations/puppet] - https://gerrit.wikimedia.org/r/139095 (owner: Nikerabbit)
[04:34:09] (PS1) Yuvipanda: Raise account creation limits for tewiki outreach event [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141637 (https://bugzilla.wikimedia.org/67017)
[04:34:36] greg-g: ^
[05:06:03] * YuviPanda|woozy wave at spagewmf
[05:06:03] err
[05:06:05] springle:
[05:06:14] * YuviPanda|woozy waves at springle. thoughts on https://bugzilla.wikimedia.org/show_bug.cgi?id=66786?
[05:07:27] :)
[05:07:48] I can't IRC today, apparently
[05:12:08] Hi guys. I still haven't gotten any emails at mediawiki-commits@lists.wikimedia.org.
[05:12:20] I think something is broken, as I did get emails on other lists.
[05:12:32] I'm trying to confirm with someone else subscribed.
[05:12:45] siebrand: oh I just ifled bug for that
[05:12:58] https://bugzilla.wikimedia.org/67018
[05:13:08] I reported it last night here, but didn't get replies.
[05:13:36] Nikerabbit: Thanks for confirming. I was going down the subscriber list alphabetically.
[05:14:48] probably need to wait for european people to wake up
[05:18:48] YuviPanda|woozy: yeah, makes sense. but i really don't know the full history on the two views
[05:19:06] we should follow up with coren
[05:37:12] (CR) Springle: [C: 1] Kill $wgEnableNewpagesUserFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141100 (https://bugzilla.wikimedia.org/58932) (owner: TTO)
[05:38:27] (PS2) Springle: Kill $wgEnableNewpagesUserFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141100 (https://bugzilla.wikimedia.org/58932) (owner: TTO)
[05:53:07] Nikerabbit: if no gerrit emails are going at all, your bug is a bit misleading :)
[05:53:32] Nemo_bis: fix it then
[05:54:18] will check timestamps
[05:56:08] Look like no email from Gerrit.
[06:58:01] (PS1) Springle: S4 repool db1049, warm up. S1 raise db1071 to normal load. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141650
[06:58:45] (CR) Springle: [C: 2] S4 repool db1049, warm up. S1 raise db1071 to normal load. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141650 (owner: Springle)
[06:58:51] (Merged) jenkins-bot: S4 repool db1049, warm up. S1 raise db1071 to normal load. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141650 (owner: Springle)
[06:59:44] !log springle Synchronized wmf-config/db-eqiad.php: repool db1049 (duration: 00m 08s)
[06:59:49] Logged the message, Master
[07:11:21] (PS1) Springle: Maintenance reports limit incremental increase. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141652
[07:30:54] (PS1) Kmosher: Add missing double qoutes [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/141653
[07:37:04] (PS2) Kmosher: Add missing double qoutes [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/141653
[07:39:08] good morning
[07:41:58] _joe_: hi! is your puppet catalog compiler able to diff catalog between a change and its parent?
[07:42:25] <_joe_> hashar: yes
[07:42:45] <_joe_> hashar: its parent not being production?
[07:42:49] <_joe_> not at the moment
[07:42:56] <_joe_> but I will have to adapt the tool
[07:43:05] <_joe_> now that we're on puppet 3
[07:43:34] I thought about compiling the current production + the change request and output the diff
[07:43:46] with an option to compare with parent change instead of current production
[07:44:15] <_joe_> hashar: no, we don't have that
[07:44:30] <_joe_> not now, but it's officially on the whishlist now.
[07:44:35] \O/
[07:45:24] and vote, please
[07:45:49] and the compilation fails :( http://puppet-compiler.wmflabs.org/101/change/141572/html/gallium.html
[07:46:00] err: 'undef' from left operand of 'in' expression is not a string at /opt/wmf/software/compare-puppet-catalogs/external/puppet/manifests/network.pp:402 on node gallium
[07:46:10] 2nd whishlist : have the compilation pass hehe
[07:46:15] <_joe_> hashar: wrong node name
[07:46:32] <_joe_> hashar: you must use the fqdn as found in site.pp and in the yaml facts
[07:46:45] <_joe_> that error is the compiler not finding the facts :)
[07:47:02] * hashar retries
[07:47:12] could it be made more obvious? :-]
[07:47:23] i.e. attempt to find fact for a node and bails out early if they can't be found?
[07:48:40] <_joe_> hashar: we can add a pre-flight check
[07:48:51] <_joe_> hashar: that is something puppet should do ;)
[07:49:20] (CR) Hashar: "Whole chain compiled on gallium:" [operations/puppet] - https://gerrit.wikimedia.org/r/141572 (owner: Hashar)
[07:49:24] http://puppet-compiler.wmflabs.org/102/change/141572/html/gallium.wikimedia.org.html \O/
[07:49:27] _joe_: thank you
[07:56:42] (PS1) Hashar: zuul: migrate statsd_host to zuul::server [operations/puppet] - https://gerrit.wikimedia.org/r/141657
[07:57:25] hey all
[07:57:33] is db replication on labs totally broken?
[07:57:45] MariaDB [dewiki_p]> select page_id from page where page_namespace=1 and page_title='Financial_Times';
[07:57:45] +---------+
[07:57:45] | page_id |
[07:57:49] +---------+
[07:57:50] | 518427 |
[07:57:50] +---------+
[07:57:50] 1 row in set (0.00 sec)
[07:57:54] MariaDB [dewiki_p]> select * from categorylinks where cl_to=518427;
[07:57:54] Empty set, 65535 warnings (9.52 sec)
[07:58:32] (CR) Hashar: [C: 1 V: 2] "Tested on labs, change is a noop \O/" [operations/puppet] - https://gerrit.wikimedia.org/r/141657 (owner: Hashar)
[08:02:31] <_joe_> JohannesK_WMDE: show warnings tells you something?
[08:02:48] <_joe_> (no idea about labs dbs inner workings, just trying to help )
[08:09:09] _joe_: lol yes. should have used cl_from. :)
[08:13:47] (PS1) Hashar: zuul: migrate git_email git_address under zuul::merger [operations/puppet] - https://gerrit.wikimedia.org/r/141659
[08:16:14] (PS2) Hashar: zuul: migrate git_email git_address under zuul::merger [operations/puppet] - https://gerrit.wikimedia.org/r/141659
[08:23:29] _joe_: you were talking about linting tasks, can i be of a help ?
[08:25:54] (Abandoned) Hashar: zuul: migrate git_email git_address under zuul::merger [operations/puppet] - https://gerrit.wikimedia.org/r/141659 (owner: Hashar)
[08:36:30] <_joe_> matanya: yes, sorry, just seen your message
[08:37:08] _joe_: let me know, what ever i can help :)
[08:37:08] <_joe_> but, I was in deep research mode, now I have to run an errand, then I'll explain you what we need to do, ok?
[08:37:33] sorry to distract you
[08:37:46] <_joe_> matanya: don't worry, I had IRC notifications silenced :)
[08:38:14] <_joe_> I just stopped
[08:38:36] <_joe_> bbiab
[08:45:53] (PS1) Hashar: zuul: patch of doom (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/141663
[08:50:36] * _joe_ back
[08:50:51] (PS2) Hashar: zuul: patch of doom (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/141663
[08:52:04] (PS3) Hashar: zuul: patch of doom (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/141663
[08:54:49] (PS4) Hashar: zuul: patch of doom (WIP) [operations/puppet] - https://gerrit.wikimedia.org/r/141663
[09:12:34] qchris or _joe_, can either of you fix gerrit mails https://bugzilla.wikimedia.org/show_bug.cgi?id=67018 or should we wait for par.avoid mar.k Jeff
[09:12:57] * qchris goes to read the bug.
[09:16:41] Nemo_bis: Looks like nothing on the gerrit side changed. So I guess we'll have to wait for Ops.
[09:19:35] <_joe_> Nemo_bis: sorry, taking a look
[09:20:07] <_joe_> however, paravoid did some mail migration yesterday, so if he's around he may help
[09:24:03] _joe_ Gerrit is using smtp.pmtpa.wmnet as SMTP, but pinging it from ytterbium does not arllow to resolve it.
[09:24:25] Which would be the correct smtp server to use?
[09:24:48] wiki-mail or wiki-mail-eqiad?
[09:25:14] https://gerrit.wikimedia.org/r/#/c/141425/
[09:25:27] <_joe_> qchris: on it
[09:25:51] k
[09:27:48] <_joe_> wiki-mail.wikimedia.org I'd say
[09:28:11] Yup.
[09:28:20] <_joe_> so yes, that is the problem for gerrit
[09:30:46] (PS1) QChris: Switch gerrit to use wiki-mail.wikimedia.org as SMTP server [operations/puppet] - https://gerrit.wikimedia.org/r/141665
[09:31:33] _joe_ ^ (I have no clue about your gerrit handle :-/ )
[09:32:00] <_joe_> don't worry, I can safely look after this.
[09:32:17] <_joe_> qchris: /whois _joe_ may give a hint
[09:32:32] :-)
[09:32:41] (CR) Giuseppe Lavagetto: [C: 2] Switch gerrit to use wiki-mail.wikimedia.org as SMTP server [operations/puppet] - https://gerrit.wikimedia.org/r/141665 (owner: QChris)
[09:35:46] <_joe_> qchris: making puppet run now
[09:35:56] Watching logs :-)
[09:37:54] gerrit restarted without problems.
[09:37:58] <_joe_> yes
[09:38:07] <_joe_> now everything should work
[09:42:04] PROBLEM - Unmerged changes on repository puppet on virt0 is CRITICAL: Fetching origin
[09:51:27] hi. on pathcset-- https://gerrit.wikimedia.org/r/#/c/141287/ we had to remove the contents of file modules/mediawiki/files/php/mail.ini. paravoid tells me to use ensure => absent to make this change happen. can someone help me figure out where this ensure => absent be given ?
[09:52:41] tonythomas: what file needs to be absent? /etc/php5/conf.d/mail.ini ?
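Editor's note: the request above is to have Puppet remove /etc/php5/conf.d/mail.ini via ensure => absent. A minimal sketch of such a declaration follows, assuming the file is managed as a plain file resource; the actual resource in modules/mediawiki/manifests/php.pp may carry further attributes (such as source) that are not shown here.

```puppet
# Sketch only: declares that the file must not exist on the host.
# Puppet deletes it on the next run if it is present.
file { '/etc/php5/conf.d/mail.ini':
    ensure => absent,
}
```

Once the file is gone from every host, a resource like this can be dropped from the manifest in a later change.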
[09:52:47] yup
[09:52:57] ok
[09:53:06] inorder to remove the '-f ' rewrite
[09:53:22] so bring that back and add ensure => absent above source =>
[09:53:25] so, we just delete the file -- or put this absent somewherer ?
[09:53:40] the absent takes care of the removal
[09:54:11] <_joe_> tonythomas: puppet is a declarative language, so in this case you DECLARE that you want the file /etc/php5/conf.d/mail.ini to be absent
[09:54:14] after the file is deleted on all hosts in the cluster, we can remove that resource
[09:54:24] <_joe_> that's the logic
[09:54:44] ok. that explains. so now should find where this sournce=> is
[09:54:53] *source =>
[09:55:07] !log removing old salt master cache on palladium, moved yesterday out of the way
[09:55:12] Logged the message, Master
[09:56:04] PROBLEM - Disk space on ms-be3003 is CRITICAL: DISK CRITICAL - free space: / 2070 MB (3% inode=94%):
[09:56:27] ok got it https://github.com/wikimedia/operations-puppet/blob/95610e7aadcc81ccd3764329ae183e6dc2919409/modules/mediawiki/manifests/php.pp#L60
[09:56:34] just above where it was added right ?
[09:56:55] tonythomas: line 61 in tonythomas modules/mediawiki/manifests/php.pp
[09:57:20] matanya: yup. just found that. Thanks
[09:57:44] tonythomas: basically : git checkout production modules/mediawiki/manifests/php.pp
[09:58:05] add to line 61 ensure => absent
[09:58:09] git add modules/mediawiki/manifests/php.pp
[09:58:13] ok. and add this to line 61. yup
[09:58:17] git commit --amend
[09:58:22] git review -R
[09:58:26] and you are done
[09:58:32] ok. :) I will do that rightaway
[09:58:43] don't forget to put a comma there tonythomas
[09:58:53] i.e ensure => absent,
[09:59:16] oh. ok.
[10:02:54] (PS7) 01tonythomas: Removed exim errors_to to support custom Return-Path [operations/puppet] - https://gerrit.wikimedia.org/r/141287
[10:04:08] tonythomas: tabs are a no-no
[10:04:30] oh ... and the deleted file was not added too :) correcting it now
[10:04:32] (CR) Matanya: Removed exim errors_to to support custom Return-Path (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/141287 (owner: 01tonythomas)
[10:05:09] RECOVERY - Unmerged changes on repository puppet on virt0 is OK: Fetching origin
[10:05:27] matanya: I should still keep that deleted file -- as deleted in the patch ?
[10:05:49] no, that should be a seperate path
[10:06:31] ok. sending in new one now
[10:07:31] (PS8) 01tonythomas: Removed exim errors_to to support custom Return-Path [operations/puppet] - https://gerrit.wikimedia.org/r/141287
[10:08:45] matanya: looks good now ?
[10:08:52] looking
[10:09:11] yes tonythomas
[10:11:49] matanya: yay ! and you would've seen what the patch does, we are trying to set in the envelope sender by giving the 5 th param ( -f ) in UserMailer:: php mail(). again, when the extension is not there -- this can cause mails to get envelope sender set as root@wikimedia.org, which should be wiki@wikimedia.org
[10:13:19] (CR) QChris: "Forgot to link the corresponding to this commit:" [operations/puppet] - https://gerrit.wikimedia.org/r/141665 (owner: QChris)
[10:15:26] yes, i followed your actions tonythomas very nice work. keep up the good work
[10:19:06] matanya: Thanks :). we were on it for some good time. anyway, do you know how I will check whether a hook executed correctly -- the return value always seems to be 1, regardless the hook exist or not :o
[10:19:43] bot really, at least not without checking
[10:19:46] *not
[10:21:18] ok. here is the patch -- https://gerrit.wikimedia.org/r/#/c/138655/25/includes/UserMailer.php line 247
[10:21:36] I get the result always as 1 -- regardless the extension/ the hook is there
[10:33:05] qchris and _joe_, thanks for fixing; there is also: ./modules/snapshot/templates/wikidump.conf.erb:smtpserver=smtp.pmtpa.wmnet
[10:33:08] ./manifests/role/wikimania_scholarships.pp: smtp_host => 'smtp.pmtpa.wmnet'
[10:33:16] Yup.
[10:33:23] I am just preparing patches for them.
[10:33:24] :-)
[10:33:32] :-)
[10:33:35] (CR) QChris: "Since this commit back-fired for gerrit (which was still" [operations/dns] - https://gerrit.wikimedia.org/r/141425 (owner: Faidon Liambotis)
[10:34:25] Not currently active but better not forget
[10:37:18] <_joe_> qchris: thanks a lot!
[10:37:54] (PS1) QChris: Switch Wikimania-scholarships to wiki-mail.wikimedia.org as SMTP server [operations/puppet] - https://gerrit.wikimedia.org/r/141669
[10:38:02] (PS1) QChris: Switch SMTP server for wikidump configuration to wiki-mail [operations/puppet] - https://gerrit.wikimedia.org/r/141670
[10:38:46] (CR) QChris: "Untested." [operations/puppet] - https://gerrit.wikimedia.org/r/141670 (owner: QChris)
[10:39:00] (CR) QChris: "Untested." [operations/puppet] - https://gerrit.wikimedia.org/r/141669 (owner: QChris)
[10:39:34] (CR) QChris: "Corresponding changes are at" [operations/dns] - https://gerrit.wikimedia.org/r/141425 (owner: Faidon Liambotis)
[10:41:49] hey
[10:41:56] <_joe_> paravoid: hi :)
[10:42:25] these are all incorrect
[10:42:36] (and were beforehand)
[10:42:54] they should be switched to $mail_smarthost
[10:43:01] <_joe_> oh, ok, because the change I merged was a no-op basically
[10:43:20] sorry about that, I should have greped
[10:43:23] <_joe_> (the gerrit one)
[10:43:38] it wasn't a noop
[10:43:43] but you switched back to mchenry, that's bad :)
[10:44:24] <_joe_> paravoid: uhm meaning, I restored the preceding situation, I hope that did not generate problems (gerrit emails are coming through for now)
[10:44:58] <_joe_> I assumed there was going to be a better solution, I just wanted to fix it for the moment
[10:45:08] yes, it was fine
[10:45:09] thanks :)
[10:46:29] why people use smtp instead of sendmail is beyond me
[10:46:34] (CR) QChris: [C: -1] "Seems this is wrong." [operations/puppet] - https://gerrit.wikimedia.org/r/141669 (owner: QChris)
[10:46:50] (CR) QChris: [C: -1] "Seems this is wrong." [operations/puppet] - https://gerrit.wikimedia.org/r/141670 (owner: QChris)
[10:48:10] (PS1) Filippo Giunchedi: update index page for wikimedia downloads [operations/puppet] - https://gerrit.wikimedia.org/r/141671
[10:48:53] paravoid: So what would be correct fix for gerrit ... using "$wikimail_smarthost" as smtp server?
[10:49:13] I'm writing a commit to fix them all
[10:49:18] Ok. Thanks.
[10:49:27] thanks for catching it!
[10:49:27] Thanks!
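Editor's note: the suggestion above is to stop hardcoding SMTP hostnames such as 'smtp.pmtpa.wmnet' and reference $mail_smarthost instead. A rough sketch of what that switch could look like is below. The class name example::app is hypothetical and only its smtp_host parameter mirrors the one quoted in the log; the sketch also assumes the variable is defined at top scope (hence the $:: prefix) and holds a value the consuming class accepts, which the log does not confirm.

```puppet
# Hypothetical consumer, for illustration only; not a class from operations/puppet.
class { 'example::app':
    # Before: smtp_host => 'smtp.pmtpa.wmnet',
    # After: take the host from the site-wide variable so a mail migration
    # only needs to change one place. Assumes $::mail_smarthost is set at top scope.
    smtp_host => $::mail_smarthost,
}
```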
[10:49:33] and sorry for the trouble
[10:49:59] mail was unbelievably entangled
[10:50:05] lots of legacy there
[10:50:06] <_joe_> it always is
[10:50:09] <_joe_> ;)
[10:50:16] the old server was installed with ubuntu 7.04 and is unpuppetized
[10:50:25] :-D
[10:50:34] (originally installed with 7.04
[10:50:36] <_joe_> yes I discovered that earlier when I ssh'd into mchenry
[10:51:13] <_joe_> I was pretty sure we had everything managed by puppet
[10:52:36] that is only 7 years ago
[10:52:43] (PS1) Faidon Liambotis: Replace hardcoded references with $mail_smarthost [operations/puppet] - https://gerrit.wikimedia.org/r/141672
[10:53:30] I kept reading that as m chenry, only to discover now it is mc henry
[10:53:48] I'm not seeing anything, did this happen? edsaperia: It's scheduled for 16:00 SF-time today.
[10:53:59] (Abandoned) QChris: Switch SMTP server for wikidump configuration to wiki-mail [operations/puppet] - https://gerrit.wikimedia.org/r/141670 (owner: QChris)
[10:54:13] (CR) Faidon Liambotis: [C: 2] Replace hardcoded references with $mail_smarthost [operations/puppet] - https://gerrit.wikimedia.org/r/141672 (owner: Faidon Liambotis)
[10:54:15] (Abandoned) QChris: Switch Wikimania-scholarships to wiki-mail.wikimedia.org as SMTP server [operations/puppet] - https://gerrit.wikimedia.org/r/141669 (owner: QChris)
[10:54:21] godog: the naming convention for tampa would revel this :)
[10:55:42] (CR) Filippo Giunchedi: "left a comment to explain the idea behind this (get sth off the ground, refactor to happen)" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137803 (owner: Andrew Bogott)
[10:55:52] speaking of which, will all servers be renamed and issued a new passport without even asking their preference, when moving to Texas?
[10:55:56] matanya: hehe true
[10:56:28] Nemo_bis: that's being done when nsa intercepts the parcels I believe
[10:56:57] Ah ok, then I'm confident it will be thorough if nothing else.
[10:57:29] heheh indeed
[10:58:22] (CR) QChris: "Since the commit message suggests to" [operations/dns] - https://gerrit.wikimedia.org/r/141425 (owner: Faidon Liambotis)
[11:00:20] (PS4) Filippo Giunchedi: Add roles for testing swift in labs [operations/puppet] - https://gerrit.wikimedia.org/r/137803 (owner: Andrew Bogott)
[11:01:19] (CR) Filippo Giunchedi: [C: 2 V: 2] Add roles for testing swift in labs [operations/puppet] - https://gerrit.wikimedia.org/r/137803 (owner: Andrew Bogott)
[11:18:52] 12:46:27 why people use smtp instead of sendmail is beyond me
[11:18:53] really?
[11:28:10] yeah, why?
[11:32:07] because smtp is a far superior interface
[11:32:20] less ambiguous, sendmail has been somewhat inconsistent over time
[11:32:28] and smtp has better error handling
[11:32:44] and doesn't need a local, functioning sendmail install per se and can work across the network
[11:34:46] there can also be advantages to not having these queues everywhere across your servers
[11:34:59] you need to have a good handle on your configuration management & procedures for that
[11:35:54] at grnet btw, I had the local exims listening on localhost:25
[11:36:15] so all apps that needed smtp were pointed to localhost:25
[11:36:41] of course you can do that if you manage it well
[11:36:49] we didn't have puppet for all services; that kept the smarthost endpoint on the "system" side
[11:36:54] but you can also make your central smtp submission hosts redundant of course
[11:38:41] akosiaris: do you have plans to ditch manifests/backups.pp ?
[11:40:36] matanya: no, only stuff from line 142 and below. Maybe refactor some parts of the rest but in a later time
[11:41:17] akosiaris: was planning to lint the file, so wanted to verify you are not going to remove it
[11:41:37] matanya: yeah, sure go ahead
[11:42:19] PROBLEM - SSH on virt1000 is CRITICAL: Server answer:
[11:42:29] PROBLEM - SSH on linne is CRITICAL: Server answer:
[11:43:20] RECOVERY - SSH on virt1000 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[11:43:30] RECOVERY - SSH on linne is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7.1 (protocol 2.0)
[11:46:39] (PS2) Filippo Giunchedi: fix swift cache directory permissions [operations/puppet] - https://gerrit.wikimedia.org/r/140900
[11:46:58] (CR) Filippo Giunchedi: [C: 2 V: 2] fix swift cache directory permissions [operations/puppet] - https://gerrit.wikimedia.org/r/140900 (owner: Filippo Giunchedi)
[11:51:21] (PS1) Matanya: backups: lint bacula part [operations/puppet] - https://gerrit.wikimedia.org/r/141675
[12:01:45] (PS1) Filippo Giunchedi: swift: trial icehouse upgrade in esams [operations/puppet] - https://gerrit.wikimedia.org/r/141677
[12:03:21] \o/
[12:06:09] |o/ \o|
[12:06:28] looking like an high five
[13:13:42] (CR) Alexandros Kosiaris: [C: 2] backups: lint bacula part [operations/puppet] - https://gerrit.wikimedia.org/r/141675 (owner: Matanya)
[13:21:45] Hm.. I know too little of the bash colour system to know what could cause this
[13:21:47] https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/10252/console
[13:22:13] From regular command line (bash 4.x) it works fine, but the Jenkins interpreter (emulating xterm) is missing the close marks
[13:22:38] It used to work, must've been caused by a recent upgrade, but there's like a 100 layers of application and abstraction between that and this, no idea where to start.
[13:23:48] I recall some vague stuff about how they're quite hacky and things aren't always compatible. Is there some color type that is perhaps compatible color wise, but not uncolor wise? I'd expect it to either work or not work, not to only work to color and not to to stop the color
[13:24:14] (PS1) Mark Bergsma: Add mr1-esams neighbor blocks and Tilaa OOB subnets [operations/dns] - https://gerrit.wikimedia.org/r/141681
[13:26:15] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0]
[13:32:11] (PS1) Mark Bergsma: Add loopback addresses for mr1-esams, cleanup decommissioned [operations/dns] - https://gerrit.wikimedia.org/r/141683
[13:35:40] (PS2) Mark Bergsma: Add mr1-esams neighbor blocks and Tilaa OOB subnets [operations/dns] - https://gerrit.wikimedia.org/r/141681
[13:35:42] (PS2) Mark Bergsma: Add loopback addresses for mr1-esams, cleanup decommissioned [operations/dns] - https://gerrit.wikimedia.org/r/141683
[13:40:06] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[13:41:39] (PS3) Mark Bergsma: Add mr1-esams neighbor blocks and Tilaa OOB subnets [operations/dns] - https://gerrit.wikimedia.org/r/141681
[13:41:41] (PS3) Mark Bergsma: Add loopback addresses for mr1-esams, cleanup decommissioned [operations/dns] - https://gerrit.wikimedia.org/r/141683
[13:42:44] (PS4) Mark Bergsma: Add mr1-esams neighbor blocks and Tilaa OOB subnets [operations/dns] - https://gerrit.wikimedia.org/r/141681
[13:42:46] (PS4) Mark Bergsma: Add loopback addresses for mr1-esams, cleanup decommissioned [operations/dns] - https://gerrit.wikimedia.org/r/141683
[13:44:23] (CR) Mark Bergsma: [C: 2] Add loopback addresses for mr1-esams, cleanup decommissioned [operations/dns] - https://gerrit.wikimedia.org/r/141683 (owner: Mark Bergsma)
[13:44:54] hiya chasemp, yt?
[13:45:00] lingering Q for you on this RT:
[13:45:01] https://rt.wikimedia.org/Ticket/Display.html?id=7596
[13:46:18] OK will take a look in just a minute, coffee'ing up now
[13:47:01] cool
[13:53:08] _joe_: bah nap took longer than expected :-/ Gotta prepare some meeting so the Zuul puppet patches will have to wait tomorrow
[13:53:40] hmmm, access denied for user on deployment bastion when trying to access the databases (sql)
[13:53:44] bd808|BUFFER: ^
[13:55:46] <_joe_> hashar: no problem for me. Still, this may be an early hint that you're having too many meetings
[13:56:13] <_joe_> :)
[13:56:33] how I only have 3 hours of meeting per week
[13:56:42] the other 40+ hours are what is straining me
[13:57:54] <_joe_> oh ok 3 hours a week is ok
[13:57:55] _joe_: anyway I realized my chain of patches is not ideal and would need some rework. So will do that tonight or tomorrow and ping you back
[13:58:01] <_joe_> ok
[13:58:07] <_joe_> take your time
[13:59:44] !log Build logs in Jenkins incorrectly render ansi color codes since it was upgraded to 0.4.0. Downgrading to 0.3.1 and restarting Jenkins.
[13:59:47] Logged the message, Master
[13:59:55] hashar: ^
[14:00:18] :-]
[14:00:37] hashar: see https://integration.wikimedia.org/ci/job/VisualEditor-npm/1891/console and https://integration.wikimedia.org/ci/job/mwext-VisualEditor-qunit/10252/console
[14:00:40] looks like a circus
[14:00:43] Krinkle: strange the colors seems to work on https://integration.wikimedia.org/ci/job/beta-scap-eqiad/10671/console
[14:01:45] hashar: that one is also timing out
[14:01:59] so I guess it's no longer a transient error, we've got patient #0 and #1
[14:02:23] so Jenkins waits for a job to complete
[14:02:36] the init script is a bit buggy, when you ask it to stop
[14:02:37] Of course, I didn't do a force restart
[14:02:49] it does not wait for jenkins to have fully completed. So you will end up having to kill it manually
[14:02:58] possibly with a sudo -u jenkins kill -9
[14:03:00] No, it's fine. I did the restart from the web gui
[14:03:06] it's going to wait for this job to complete
[14:03:06] ah that one should work
[14:03:08] it should
[14:03:57] yeah it is reloading
[14:04:19] and there is only one java jenkins process running
[14:04:21] how long does that take these dyas?
[14:04:36] for mail, paravoid worked on mail/exim configuration yesterday. Something might have been broken in the process
[14:04:44] what is broken?
[14:04:55] bad paravoid
[14:04:58] can't you ask for a review
[14:05:02] * mark ducks
[14:05:03] krinkle: jenkins should start in a few minutes. tailing /var/log/jenkins/jenkins.log gives some clue :]
[14:05:24] paravoid: mail from Jenkins sometime timeout. Cant remember offhand the smtp host it is using
[14:05:34] where is it configured?
[14:05:53] * hashar needs to puppetize Jenkins conf
[14:06:12] paravoid: on port 80 :P
[14:06:16] mark: do you still thing it's "bad paravoid"? :)
[14:06:24] bad bad hashar
[14:06:39] heh
[14:07:04] yes, and shame on you for not respecting those WLM domains you didn't know about
[14:07:15] hashar: why https://gerrit.wikimedia.org/r/#/c/140059/ isn't merged? any idea?
[14:07:19] Krinkle: Jenkins is now registering its jobs with gearman :-
[14:07:29] kart_: what is it ?
[14:07:35] hashar: if you have manual config and mail isn't broken yet, it will
[14:07:46] hashar: a Debian package for cxserver patch.
[14:07:50] !log Jenkins is back
[14:07:52] will be broken soon, I mean
[14:07:56] Logged the message, Master
[14:08:10] kart_: there is 0 chance I can review a debian package :D I thought you guys were going to use git-deploy?
[14:08:39] paravoid: ah we are using smtp.pmtpa.wmnet
[14:08:43] kart_: I suspect there is no Jenkins pipeline set up for that project and also, you need to give JenkinsBot permissions to merge.
[14:08:50] Did Jenkins-bot run for that repository in the past?
[14:09:06] kart_: It didn't just not merge, I also don't see it running tests for the patch set. [14:09:23] smtp.pmtpa.wmnet / smtp.eqiad.wmnet disappeared from DNS :-/ [14:09:35] smtp.eqiad.wmnet didn't exist [14:09:43] smtp.pmtpa.wmnet disappeared, yes [14:09:52] (03CR) 10Ottomata: [C: 032 V: 032] Reports in prod should be stored on redis 30 days [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/140764 (https://bugzilla.wikimedia.org/63664) (owner: 10Nuria) [14:09:58] I might introduce an smtp.eqiad.wmnet later, but you should really use $::mail_smarthost [14:10:00] For new repos, Jenkins needs to be told what to run and how, file a bug or submit a patch to integration/jenkins-job-builder-config and integaration/zuul-config, see also mediawiki.org/wiki/CI/JJB [14:10:35] paravoid: I don't think HTTP or XML understands puppet variables :P [14:10:42] (03CR) 10Ottomata: Unqualify local variables (031 comment) [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/140860 (owner: 10QChris) [14:10:42] (03PS2) 10Ottomata: Unqualify local variables [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/140860 (owner: 10QChris) [14:10:49] Krinkle: I don't understand? [14:11:14] (03CR) 10Ottomata: [C: 032 V: 032] Unqualify local variables [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/140860 (owner: 10QChris) [14:11:15] paravoid: Jenkins stores its config in a db, stored on disk in XML. The access layer to change it is a web interface over http. [14:11:38] paravoid: I would restore smtp.pmtpa.wmnet I am pretty sure it is hardcoded everywhere around labs since that was the recommended host at one point [14:11:47] I'm not going to do that, no [14:12:52] Perhaps point all old pmtpa hostnames at the same IP for logging/monitioring so we at least what's using it. Considering that it is a host local to the data centre internally, should only result in getting data we need to act on (e.g. 
things misconfigured that are broken)
[14:13:20] a new service name might make sense, but I'd like to see if these hardcoded places can use proper configuration management first
[14:13:24] and I really think they should
[14:14:06] yeah, but that's two separate movements, I don't think we can block migration of smtp on having "all the things" puppetized.
[14:14:17] what is all the things?
[14:14:18] Although we're getting pretty close.
[14:14:25] right now, it's just jenkins
[14:14:29] paravoid: do we have a puppet class to set up a localhost relay for mail? This way I can point jenkins to localhost and the mail relay will be configured with puppet
[14:14:31] Well, you'll find out if you track all connections to pmtpa.wmnet
[14:14:39] (PS3) Ottomata: Parametrize redis' settings needed for Wikimetrics [operations/puppet] - https://gerrit.wikimedia.org/r/141116 (https://bugzilla.wikimedia.org/66911) (owner: QChris)
[14:14:42] jenkins is the first one that rang a bell
[14:14:46] (CR) Ottomata: [C: 2 V: 2] Parametrize redis' settings needed for Wikimetrics [operations/puppet] - https://gerrit.wikimedia.org/r/141116 (https://bugzilla.wikimedia.org/66911) (owner: QChris)
[14:15:06] Krinkle: yeah I used to do that at a previous job. Before phasing out any DNS entry we would look at the last week of logs and contact the remotes / figure out what it can be :-D
[14:15:18] Krinkle: but that is long
[14:15:24] I'd bet it's not the only one. And we can find out early or wait until random stuff breaks.
[14:15:31] ottomata: https://gerrit.wikimedia.org/r/#/c/140655/ ?
[14:15:44] ja i see it matanya
[14:15:45] !log Jenkins set SMTP server to wiki-mail.wikimedia.org; smtp.pmtpa.wmnet got deleted
[14:15:46] Jenkins didn't actually ring any alarm, it's silently ignored. I happened to spot it.
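The "find out early" option discussed here can be approximated offline by searching configuration files for hostnames that are about to disappear. The following is only a rough sketch under stated assumptions: `scan_text`, `scan_tree` and `DEPRECATED_HOSTS` are hypothetical names, and the only hostname taken from the log is smtp.pmtpa.wmnet. It does not replace watching real connection logs, since hosts that no config tree covers would be missed.

```python
import re
from pathlib import Path

# Hostnames slated for removal. Only smtp.pmtpa.wmnet comes from the log;
# extend the tuple as needed (assumption: plain literal hostnames).
DEPRECATED_HOSTS = ("smtp.pmtpa.wmnet",)


def scan_text(name, text, hosts=DEPRECATED_HOSTS):
    """Return (name, line number, hostname) for every deprecated host in text."""
    pattern = re.compile("|".join(re.escape(h) for h in hosts))
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((name, lineno, match.group(0)))
    return hits


def scan_tree(root, hosts=DEPRECATED_HOSTS):
    """Walk a directory tree and collect hits from every regular file."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # errors="replace" keeps binary or oddly encoded files from aborting the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_text(str(path), text, hosts))
    return hits
```

Calling `scan_tree` on a checkout or on an application's config directory would list each file and line that still names a deprecated host, which could then be fixed before the DNS entry is dropped.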
[14:15:50] Logged the message, Master
[14:15:53] (PS1) Ottomata: Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/141687
[14:15:53] thanks ottomata
[14:15:55] hashar: wiki-mail is still wrong
[14:16:00] that will go away soon as well
[14:16:04] lol
[14:16:05] so yeah, feel free to keep hardcoding stuff
[14:16:11] and I'll keep breaking them :)
[14:16:13] (CR) jenkins-bot: [V: -1] Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/141687 (owner: Ottomata)
[14:16:17] and paravoidi did two touchy lint changes i'd like you to review
[14:16:31] (PS2) Ottomata: Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/141687
[14:16:32] we'll be hardcoding IPs after this I guess.
[14:16:32] * paravoid i did
[14:16:40] paravoid: works for me. I am filing a bug for Jenkins to figure out a better long term solution
[14:16:43] Krinkle: these will go away soon as well
[14:16:47] (CR) Ottomata: [C: 2 V: 2] Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/141687 (owner: Ottomata)
[14:16:49] we can do this all day
[14:16:54] https://gerrit.wikimedia.org/r/#/c/140654/ and https://gerrit.wikimedia.org/r/#/c/140678/
[14:17:02] hashar: not Debian review, just Jenkins :)
[14:17:06] and set up a cronjob that executes a bash script that resolves the puppet variable and, when it catches a change, outputs a message to irc.
[14:17:10] so that hashar can update jenins.
[14:17:12] jenkins
[14:18:55] (PS1) Aude: Enable Wikibase property suggester on beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141688
[14:19:09] or you can just do what you were supposed to do on the first place
[14:19:15] in*
[14:20:04] Depends, who's responsible for it? I guess nobody is.
[14:20:09] jenkins?
[14:20:30] If it's going to be puppet, either ops is going to learn how it works, or I can no longer maintain it
[14:20:37] It should be puppet, 100%
[14:20:45] Krinkle: paravoid: logged the needed change for Jenkins SMTP host as https://bugzilla.wikimedia.org/show_bug.cgi?id=67027
[14:20:59] Krinkle: I am never going to puppetize Jenkins configuration.
[14:21:05] just saying, then us maintaining it silently won't happen. It'll be caught in a find/replace across puppet automatically, as it should be.
[14:21:13] Krinkle: unless I move Jenkins to labs :D
[14:21:42] *sigh*
[14:22:27] cause I will no longer have write access to the conf which is a pain in the ass
[14:22:40] that is the sole reason really
[14:22:44] Well, it's fine. Ops will maintain it for us.
[14:22:47] :)
[14:22:57] I know better, and it's not your fault, but it's fun to point out the reality xD
[14:23:45] (PS15) QChris: Add backup role and scripts to wikimetrics [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: Milimetric)
[14:23:53] :-D
[14:23:54] We can't both maintain it ourselves and at the same time be responsible for knowing everything ops knows and perform ops-y types of changes
[14:24:32] in the case of Jenkins SMTP host, we can probably use a local mail relay maintained by puppet. That would solve the issue
[14:24:42] so which is it, you're lacking knowledge and skills, or write access?
[14:24:42] or use a stable DNS entry
[14:24:49] hashar: Considering jenkins config is an xml file, why not puppetise it with an erb template for the mailhost etc.
[14:25:02] Krinkle: i repeat: no puppet.
[14:25:22] Krinkle: or we will end up depending on someone with +2 on ops/puppet
[14:25:51] paravoid: I don't want write access, I don't want to know how to maintain puppet stuff. I'm interested personally and write most patches myself (also to get it done faster), but that's not because I should imho.
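The template idea floated at 14:24:49 (keep the XML config as a template and fill in the mail host from one managed value) can be illustrated outside Puppet. This is a minimal sketch in Python rather than ERB, and everything in it is an assumption: the `<mailer>` and `<smtpHost>` element names, `MAILER_TEMPLATE` and `render_mailer_config` are hypothetical and are not taken from the actual Jenkins config file.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical XML fragment; the real Jenkins mailer config may be laid out differently.
MAILER_TEMPLATE = Template(
    "<mailer>\n"
    "  <smtpHost>$smtp_host</smtpHost>\n"
    "</mailer>\n"
)


def render_mailer_config(smtp_host):
    """Fill the SMTP host placeholder, escaping characters that are special in XML."""
    return MAILER_TEMPLATE.substitute(smtp_host=escape(smtp_host))


if __name__ == "__main__":
    # The value would normally come from configuration management, not a literal.
    print(render_mailer_config("mail.example.org"))
```

The point of the design is that the host name lives in exactly one place, so a rename becomes a one-line change instead of a hunt for hardcoded strings.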
[14:26:00] and the Jenkins master can't yet be migrated to labs where we have our own puppet master that would grant us +2 [14:26:03] Are there system-side blocks of certain IP addresses on WMF sites? We have received a complaint from someone in OTRS who states that they have no access to Wikipedia whatsoever from their IP address and suspect it may be because they are "regularly checking the availability of Wikipedia" from their ip "because it is an important site for our customers" (whatever that means, but sounds like automated checing). [14:26:46] hashar: this is the first I hear about jenkins master possibly moving to labs [14:26:52] that sounds like a bad idea [14:26:52] pajz: very rarely; what does "no access" mean here (timeouts, 403s, etc.)? [14:27:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Woo hoo. A module. Looks quite good, some points here and there but sound overall" (037 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 (owner: 10Nikerabbit) [14:27:42] hashar: it can be configured the way mediawiki is. In git, in a separate repo under operations/* that we have +2 on. [14:27:49] (03CR) 10Alexandros Kosiaris: "I will be adding some unit tests in the module as well." [operations/puppet] - 10https://gerrit.wikimedia.org/r/139095 (owner: 10Nikerabbit) [14:28:00] That way it'll be in git, and operations can maintain it when they make changes that affect it, and we can do the rest. [14:28:21] Krinkle: I've never heard anyone say "we want jenkins to be migrated to ops", so you're sort of right [14:28:34] I don't expect all developers to master puppet or have to learn it [14:28:40] or to have to* [14:29:04] kart_: for the debian building job, please fill a bug about it in Wikimedia > Continuous Integration. There is a bunch of templates in Jenkins Job Builder that might get the job done. Most are in the operations.yaml file of integration/jenkins-job-builder-config.git repo . 
Look for something with -debian-glue or the git log history for some clue :] [14:29:04] but at the same time, if you don't /want/ to, you should say so up the chain [14:29:09] kart_: we can pair that tomorrow :) [14:29:22] working around us like hashar tends to do lately is going to blow up soon [14:29:27] (03CR) 10Ottomata: Add backup role and scripts to wikimetrics (031 comment) [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: 10Milimetric) [14:29:28] eventually anyway [14:29:38] paravoid, don't know, they just say that Wikipedia is "blocked"/"not responding" since Saturday and that they are unable to access any Wikipedia article. [14:29:46] paravoid: the bottom line is, if ops schedules a change to start a migration (or rather, clean up the remainder of pmtpa), I don't think we should've known that Jenkins (like every other service wmf has ever had before eqiad) is still using smtpa.pmtpa.wmnet, that'd never cross my mind. [14:30:02] (03CR) 10Ottomata: Add backup role and scripts to wikimetrics (033 comments) [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: 10Milimetric) [14:30:03] So however it is done, that should've been something ops found somehow and updated accordingly. [14:30:14] paravoid: how I still have everything sent via puppet. It is just that waiting hours/days between patches is a bit tiring. That is why we got beta and CI to labs with local puppetmaster. [14:30:55] Krinkle: are you trying to invent configuration management? 
:) [14:31:11] paravoid: that is the same issue with debian packages in which we don't have write access on apt.wm.o for good reasons [14:31:11] (03CR) 10Ottomata: Add backup role and scripts to wikimetrics (031 comment) [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/139557 (https://bugzilla.wikimedia.org/66119) (owner: 10Milimetric) [14:31:22] paravoid: I mean, mediawiki php settings aren't in puppet either. But those values are updated by opsen when relevant IPs or hostnames change, not by the typical engineer otherwise merging code in ops/mediawiki-config [14:31:38] sure [14:31:43] Would you welcome something like that for jenkins config? [14:31:45] is the jenkins config in an ops/$foo repo? [14:32:02] No, but I'm saying that'd be a possible compromise. [14:32:40] Right now it's unversioned and untracked in Jenkins's local file system. having it in ops/puppet as an erb with placeholders means we can no longer effectively do our jobs (the config is mostly stuff not ops related, changes too often, just like mediawiki). [14:32:42] I think it's acceptable; depends on the amount and type of settings (e.g. purely system, or e.g. list projects?) there and the frequency in which they're updated [14:33:01] unversioned and untracked in a local file system sounds wrong regardless [14:33:06] Yeah [14:33:15] <_joe_> I think we should set expectations, however. Puppet can use an external repo to manage a subtree easily, but it will never be an handy tool for instantaneous config change deployments. [14:33:33] you know the jenkins hosts are *not* backed up, right? [14:33:37] <_joe_> so if your workflow is change-reload-try-reiterate, that won't work with puppet [14:33:44] if the system crashes, this file is gone [14:33:50] <_joe_> (still, please use a VCS regardless) [14:34:05] paravoid: Well, that's what you get with 2 20% engineers doing it because they want to. 
[14:34:28] and little input :) [14:34:29] the expectation is that systems can be reinstalled and with configuration management applied, they can be restored into service [14:34:44] and yet everything seems to depend on it. [14:34:54] (03PS1) 10BBlack: Add "import header" at the top level of our VCL [operations/puppet/varnish] - 10https://gerrit.wikimedia.org/r/141692 [14:35:56] (03CR) 10BBlack: [C: 032 V: 032] Add "import header" at the top level of our VCL [operations/puppet/varnish] - 10https://gerrit.wikimedia.org/r/141692 (owner: 10BBlack) [14:36:42] (03PS1) 10Reedy: Non Wikipedias to 1.24wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141693 [14:37:29] so smtp.pmtpa.wmnet was that placeholder, right [14:37:52] apparently smtp.eqiad.wmnet doesn't exist though [14:40:48] (03PS1) 10BBlack: Fix duplicate header imports in our VCL [operations/puppet] - 10https://gerrit.wikimedia.org/r/141694 [14:41:17] anomie: I'll do swat today unless you're so inclined [14:41:20] we both have patches today [14:41:29] manybubbles: Go for it [14:43:21] paravoid, any idea regarding the "wikipedia is blocked" issue? Could someone check their hypothesis (i.e. wmf is blocking us) based on their ip? [14:43:23] (03CR) 10Manybubbles: [C: 031] Remove references to the Math VisualEditor BetaFeature (now graduated) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141055 (owner: 10Jforrester) [14:43:31] (03PS2) 10BBlack: Fix duplicate header imports in our VCL [operations/puppet] - 10https://gerrit.wikimedia.org/r/141694 [14:43:33] (03CR) 10Manybubbles: [C: 031] Remove whitelist entry for now-graduated MediaViewer BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141054 (owner: 10Jforrester) [14:43:45] aude: Was your "user denied" problem from trying to run the `sql` script on deployment-bastion? 
[14:43:52] (03CR) 10BBlack: [C: 032 V: 032] Fix duplicate header imports in our VCL [operations/puppet] - 10https://gerrit.wikimedia.org/r/141694 (owner: 10BBlack) [14:43:55] (03CR) 10Manybubbles: [C: 031] Enable VisualEditor by default on Wikimania 2014 wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138345 (owner: 10Jforrester) [14:43:59] bd808: it was [14:44:03] (03CR) 10Manybubbles: [C: 031] Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141053 (owner: 10Jforrester) [14:44:16] aude: Yeah. https://bugzilla.wikimedia.org/show_bug.cgi?id=45706 and https://bugzilla.wikimedia.org/show_bug.cgi?id=63803 [14:44:18] it would be nice to verify our new table [14:44:20] (03Abandoned) 10Yuvipanda: Raise account creation limits for tewiki outreach event [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141637 (https://bugzilla.wikimedia.org/67017) (owner: 10Yuvipanda) [14:44:25] although i can do sql.php [14:44:59] aude: ? [14:45:12] hoo: can't access the sql utility [14:45:14] aude: I don't know who to get the 'wikiadmin' user password from to fix it. Probably someone who should be sleeping right now. [14:45:24] it used to work [14:45:30] bd808: That's easy ;) [14:45:37] aude: What's it giving you? On tin? [14:45:46] beta [14:45:48] aude: Really? Those bugs are old. [14:46:04] oh, beta [14:46:05] i'm sure it worked at some point [14:46:13] I lost my root there after the eqiad move :P [14:46:25] probably in tampa [14:46:37] might be that I broke it [14:46:46] I updated the sql script a bit back [14:47:24] bad hoo [14:47:37] hoo: We are just missing a password in the private puppet repo at this point. S.pringle probably knows what it is. [14:47:50] bd808: No, we don't [14:47:54] wikiadmin_pass is correct there [14:48:01] just the user is not wikiadmin, but mw [14:48:15] ah. 
[14:49:48] pajz: we haven't blocked anyone this past week, as far as I'm aware. but we really need more information about what kind of error that person is seeing [14:50:20] springle: let's move it here for the benefit of others, when you got a minute [14:51:02] paravoid: go ahead [14:51:11] springle: so there were some issues yesterday [14:51:35] that bd808's grepping of logs show them happening as far back as the start of the month [14:51:48] 2014-06-02, iirc, which coincides with a wmf8/wmf9 deployment [14:52:05] the issue was a "Too many connections" issue yesterday, that happened a few times [14:52:18] at least at 02:00 UTC, 14:00 UTC, and sometime later, I think 17:00 UTC [14:52:21] it's visible in graphite [14:52:32] a preliminary investigation of logs show it contained in s5 [14:52:35] (03CR) 10Hoo man: [C: 032] "beta-only" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141688 (owner: 10Aude) [14:52:39] but also es1006 at the same time(?) [14:52:42] (03Merged) 10jenkins-bot: Enable Wikibase property suggester on beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141688 (owner: 10Aude) [14:52:46] thanks [14:52:48] (take everything I say with a grain of salt) [14:52:52] (03PS2) 10Withoutaname: New namespace "Carte" for rowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139766 (https://bugzilla.wikimedia.org/66530) [14:53:03] paravoid: yes i've been chasing those. 
did you see ori's email to ops@ [14:53:15] * springle looks for a subject [14:53:31] I don't think I did, no [14:53:38] paravoid: "es1006 hiccupC" [14:53:46] er without the C [14:53:55] !log hoo Synchronized wmf-config/CommonSettings.php: Enable Wikibase property suggester on beta (duration: 00m 07s) [14:53:57] oh, right [14:54:00] Logged the message, Master [14:54:05] however I thought it was around the 12th the issue started [14:54:20] if it stretched further back, it might be manifold [14:54:42] bd808 did some log grepping with logstash, I can't say for sure [14:54:45] paravoid: it is indeed s5 and external storage affected [14:55:05] we do get "too many connections" for other reasons [14:55:10] hashar: Why's beta not having a "wikiadmin" db-user but only "mw"? [14:55:11] I checked tendril at the time of the outage yesterday, I didn't see anything abnormal [14:55:16] but far less frequently than last week or two [14:55:27] I think ishmael is broken btw [14:55:29] hoo: history. Beta got setup with 'wm' originally [14:56:16] as I'm fixing the VCL reload failures on a bunch of caches, VCL changes that were merged into production in the past are now suddently actually being applied to reality. I'm not even sure how far back that goes (probably a few weeks), but it's a little scary because one of those changes could break something (long after the committer has forgotten and thought it merged up and worked ok) [14:56:17] paravoid: https://bugzilla.wikimedia.org/show_bug.cgi?id=64581#c1 [14:56:42] so beware, if odd problems crop up this morning, it's probably related [14:56:45] springle: ilol [14:56:47] lol [14:56:57] perl was going nuts [14:57:01] so i killed it [14:57:19] hence starting that sample engine stuff [14:57:22] you're one step ahead of me for everything aren't you [14:57:34] which isn't useful here yet since only s2 is being sampled [14:57:36] anomie: want to go first? 
[14:57:44] manybubbles: Fine with me [14:57:56] paravoid: no it's great to have input [14:58:06] paravoid: http://graphite.wikimedia.org/render/?target=servers.es1006.cpu.total.user.value&from=00%3A00_20140601&until=23%3A59_20140622&width=600&height=300 [14:58:15] that's the timeframe i suspect [14:58:17] http://ganglia.wikimedia.org/latest/?r=hour&cs=06%2F01%2F2014+00%3A00+&ce=06%2F24%2F2014+00%3A00+&dg=1&tab=m&vn=&hide-hf=false&hreg[]=db1021&mreg[]=^mysql_connections%24 [14:58:23] click on inspect on that [14:58:28] James_F: I'll do your patches next [14:58:29] yep [14:58:38] paravoid, thanks for the info, I've asked for more information. [14:58:43] clearly shows 06-12 [14:58:56] paravoid, springle: here's the logstash dashboard for the too many connection errors as seen on the php side -- https://logstash.wikimedia.org/#/dashboard/elasticsearch/db-too-many-connections [14:59:08] bd808: :D [14:59:24] paravoid: yeah, hence i picked the 12th. but i only guessed with wikidata [14:59:27] saw that yesterday, brief [14:59:33] James_F: do you imagine it'd be ok to do them all at once? if they are all simple, safe things it'd save a few minutes [14:59:39] if not I'll do them one at a time [14:59:39] wikidata (and maybe dewiki?) [14:59:48] manybubbles: yeah [14:59:49] aude: yes, s5 is wikidata/dewiki [14:59:52] yep [15:00:07] manybubbles, anomie: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140624T1500) [15:00:07] manybubbles: There shouldn't be any issue. [15:00:13] i picked wikidata because dewiki would (i guess) be normal and bahave like everything else [15:00:14] maybe wikidata should be moved [15:00:29] sure that's not the issue but eventually might be needed [15:01:33] springle: any outliers in queries?? 
[15:01:57] paravoid: nope, i've found no obvisouly slow new queries for wikidata or dewiki [15:02:30] paravoid: and it's odd to see external storage mimic [15:02:45] perhaps terbium or bot driven traffic [15:03:01] !log manybubbles Synchronized php-1.24wmf10/includes/config/GlobalVarConfig.php: SWAT - GlobalVarConfig should not throw exceptions for null-valued config settings (duration: 00m 05s) [15:03:05] Logged the message, Master [15:03:09] anomie: ^^^ [15:03:10] manybubbles: confirmed [15:03:49] (03CR) 10Manybubbles: [C: 032] Remove references to the Math VisualEditor BetaFeature (now graduated) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141055 (owner: 10Jforrester) [15:03:51] (03CR) 10Manybubbles: [C: 032] Remove whitelist entry for now-graduated MediaViewer BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141054 (owner: 10Jforrester) [15:03:53] (03CR) 10Manybubbles: [C: 032] Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141053 (owner: 10Jforrester) [15:03:55] (03CR) 10Manybubbles: [C: 032] Enable VisualEditor by default on Wikimania 2014 wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138345 (owner: 10Jforrester) [15:03:58] (03Merged) 10jenkins-bot: Remove references to the Math VisualEditor BetaFeature (now graduated) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141055 (owner: 10Jforrester) [15:04:01] (03Merged) 10jenkins-bot: Remove whitelist entry for now-graduated MediaViewer BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141054 (owner: 10Jforrester) [15:04:05] (03Merged) 10jenkins-bot: Follow-up 73ab798a: Also enable MediaWiki.org's Skin namespace in VisualEditor [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141053 (owner: 10Jforrester) [15:04:10] (03Merged) 10jenkins-bot: Enable VisualEditor by default on Wikimania 
2014 wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/138345 (owner: 10Jforrester) [15:05:20] !log manybubbles Synchronized visualeditor.dblist: SWAT - Enable VisualEditor by default on Wikimania 2014 wiki (duration: 00m 06s) [15:05:24] Logged the message, Master [15:05:35] !log manybubbles Synchronized visualeditor-default.dblist: SWAT - Enable VisualEditor by default on Wikimania 2014 wiki (duration: 00m 04s) [15:05:37] Logged the message, Master [15:05:48] springle: https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf8 [15:06:02] all the wikidata commits there have nothing to do with dbs [15:06:10] !log manybubbles Synchronized wmf-config/: SWAT - visual editor config changes and retire some beta features (duration: 00m 04s) [15:06:13] yup [15:06:16] Logged the message, Master [15:06:33] James_F: I've synced you [15:07:06] weird [15:07:11] I miss ishmael [15:07:15] paravoid: my intention is to add an s5 slave simply because that's the easiest option. then try to track backwards. however that doesn't help ES [15:07:39] manybubbles: I see. Thanks! [15:07:40] bd808: the marker for the last deploy on kibana seems to get cleared by another deploy. or something. [15:07:57] https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor only has the last syncs I did - but there have been more in the past hour [15:08:15] manybubbles: And from test all appears fixed. [15:09:03] James_F: wonderful! consider yourself swated [15:09:37] manybubbles: Thank you again. [15:09:46] manybubbles: it's pretty flakey for prod. Strangely it seems to work really well in beta where there are many many more deploys. 
[15:10:03] bd808: funky [15:13:47] (03PS1) 10Jforrester: Enable VisualEditor for opt-in on OutreachWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141702 [15:15:29] !log manybubbles Synchronized php-1.24wmf10/extensions/CirrusSearch/: SWAT Stop Cirrus from breaking RandomRootPage (duration: 00m 04s) [15:15:34] Logged the message, Master [15:15:39] ^demon|away: can you look at ^^^^ [15:17:20] <^demon|away> wmf9 and 10? [15:17:55] * ^demon|away tries to find a wmf10 wiki with it [15:19:03] wmf10 [15:19:15] <^demon|away> I don't think it's enabled on any wmf10 wikis. [15:20:43] ^demon|away: k. doing wmf9 then [15:21:43] !log manybubbles Synchronized php-1.24wmf9/extensions/CirrusSearch/: SWAT Stop Cirrus from breaking RandomRootPage (duration: 00m 06s) [15:21:44] ^demon|away: ^^ [15:21:48] Logged the message, Master [15:21:57] (03CR) 10Mark Bergsma: [C: 032] Add mr1-esams neighbor blocks and Tilaa OOB subnets [operations/dns] - 10https://gerrit.wikimedia.org/r/141681 (owner: 10Mark Bergsma) [15:22:11] <^demon|away> manybubbles: Looks good, thanks! [15:22:23] SWAT complete [15:23:49] greg-g: can you ping me when wmf10 is on group1? I need to run some scripts and the faster I get to them the better [15:25:18] paravoid: actually, i'm an idiot [15:25:25] springle: https://gerrit.wikimedia.org/r/#/c/134049/7 ? [15:25:26] Wikibase\Lib\Store\WikiPageEntityLookup::selectRevisionRow [15:25:46] that query has spiked recently on s5 [15:25:49] aude: ^^ [15:26:00] but would that affect ES too [15:26:10] did you see the commit above? [15:26:24] it's the only one in core that may be vaguely related [15:27:33] manybubbles: yah [15:27:40] greg-g: thanks! 
[15:28:53] looking [15:28:58] (03PS1) 10Mark Bergsma: Add forward entries for mr1-esams [operations/dns] - 10https://gerrit.wikimedia.org/r/141705 [15:29:31] (03CR) 10Mark Bergsma: [C: 032] Add forward entries for mr1-esams [operations/dns] - 10https://gerrit.wikimedia.org/r/141705 (owner: 10Mark Bergsma) [15:30:16] impending 5xx alert is me, it's brief :p [15:30:37] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data exceeded the critical threshold [500.0] [15:31:17] (doing some varnish restarts, some of which don't restart very quickly due to the mmap issue, so ^) [15:31:36] springle: that change looks like it might only hurt masters [15:31:54] it would double the queries I think [15:32:49] manybubbles: is search by size a planed feature? [15:33:13] matanya: hmmm - no but you want it? like, uh, what kinda thing? [15:33:40] find all files over 30mb on commons [15:33:44] boost long articles? cut off based on length? and is length number of words or php's bogo-bytes of character [15:33:52] ah, yeah, thats cool [15:33:58] something I don't have the data for right now [15:34:07] aka: first step in DOS'ing us? [15:34:12] hmmm - ^demon|away - why don't we have metadata like that in there now? [15:34:12] :) [15:34:30] <^demon|away> Ah, file metadata. [15:34:31] greg-g: might DOS the imagescalers [15:34:33] i got that from sql query [15:34:35] yeah [15:34:58] thought search is a better way [15:35:03] <^demon|away> Sooo, I've kind of meant to do this. [15:35:35] find all files over 30mb on commons << can easily be done on labs [15:36:33] akosiaris: thanks for review [15:36:42] hoo was done [15:36:55] !log VCL compilation is now in-sync everywhere but bits caches... [15:36:59] Nikerabbit: you are welcome [15:37:00] Logged the message, Master [15:37:54] Nikerabbit: I might be poking you in the next days about some questions I got about cxserver [15:37:55] <^demon|away> matanya: I've got a bug somewhere about file metadata. 
It's something I still plan to do, just haven't gotten to it. [15:38:12] <^demon|away> Mainly because I hadn't figured out a good data structure for the mapping yet, but also because of load concerns. [15:38:26] (03PS1) 10Alexandros Kosiaris: Upgrade to 1.4.0 upstream version [operations/debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/141707 [15:38:26] <^demon|away> But I can start noodling it again. Commons would love that. [15:38:43] i yes. thanks! [15:39:48] akosiaris: sure no problem. please do ping kart_ as well (I have reduced availability mon,wed,fri) [15:40:12] Nikerabbit: OK cool, good to know [15:42:15] paravoid: aude: my money is tentatively on a spike in WikiPageEntityLookup::selectRevisionRow calls. if that was true, would it also cause a corresponding external storage access? [15:42:38] likely [15:42:44] springle: Yep, all these call are followed by an external storage lookup [15:43:07] that code in wikibase has not changed in a while, afaik but maybe something in core changed that is related [15:43:15] that's causing things to behave differnetly [15:43:32] some change in caching perhaps [15:43:35] could be [15:45:41] it's definitely not additional write activity; master load is unchanged, and binlog shows no obvious bot spike [15:46:09] that code would be accessed when accessing wikidata content from the wikipedias [15:46:21] should hit caching though [15:48:34] that other projects sidebar feature isn't deployed yet, right? [15:48:51] on 'voyage it is [15:49:02] (03PS1) 10Filippo Giunchedi: gdash: fix swift load/network dashboard [operations/puppet] - 10https://gerrit.wikimedia.org/r/141709 [15:49:57] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: fix swift load/network dashboard [operations/puppet] - 10https://gerrit.wikimedia.org/r/141709 (owner: 10Filippo Giunchedi) [15:51:37] hm? I only saw some commits that weren't merged [15:51:43] which other sidebar? 
[15:51:57] https://bugzilla.wikimedia.org/show_bug.cgi?id=66226 [15:52:03] https://www.mediawiki.org/wiki/Wikibase/Beta_Features/Other_projects_sidebar [15:52:14] that [15:52:32] oh heh.. appropriately named [15:52:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [15:52:55] here's the perf review bug, if you have anything: https://bugzilla.wikimedia.org/show_bug.cgi?id=66849 [15:53:00] not beta feature yet [15:53:15] only enabled per project on their request [15:53:19] Yeah [15:53:20] ah! [15:53:32] still could be a factor, but it's been there a while [15:53:37] unless some new project enabled it [15:53:40] so the unmerged commit I saw way earlier was the introduction as *beta* [15:53:46] https://gerrit.wikimedia.org/r/#/c/132606/ [15:53:58] yeah, that commit message should probably mention that it's already deployed [15:54:09] paravoid: Yeah, so users can switch it on/off themselves [15:54:16] rather than we per wiki/ project [15:54:20] right [15:54:28] or addition to that option [15:54:51] it does mention that [15:55:46] what's the setting's name? [15:56:09] otherProjectsLinks [15:56:14] yea [15:56:38] it's on kowiki [15:56:52] don't know how big it is [15:56:59] that's new afaik [15:57:18] so, can we hold off new enablements until those bugs are closed? [15:57:22] * greg-g sighs [15:57:41] which bugs? [15:57:48] https://bugzilla.wikimedia.org/show_bug.cgi?id=66226 and dependent [15:57:56] (security/perf reviews) [15:58:12] kowiki got the other projects thing on june 4 [15:59:08] what is "wikidb" in https://logstash.wikimedia.org/#/dashboard/elasticsearch/db-too-many-connections ? [15:59:36] is it the database that threw the error, or is it the wiki that the user visited? [16:00:29] <^demon|away> Wiki the user visited and threw the error. 
[16:00:50] so, by that it can't be the other projects sidebar
[16:02:05] need to see what changed in core / try to add debug points in my wiki
[16:03:22] paravoid: i think the wiki name reported in dberror for a connection failure can be wrong
[16:03:28] Tue Jun 24 14:07:26 UTC 2014 mw1176 enwiki Error connecting to 10.64.16.10: :real_connect(): (08004/1040): Too many connections
[16:03:32] eg^
[16:03:45] 10.64.16.10 is db1021, which isn't enwiki
[16:04:02] aude: Could also be that some wiki is now using Wikidata in a widely used template or something like that
[16:04:19] <^demon|away> springle: What cluster's db1021 in?
[16:04:23] s5
[16:04:25] that can trigger several item reads in the worst case (label lookups)
[16:04:55] hoo: possible
[16:05:34] <^demon|away> springle: s5 has wikidatawiki, which could be reasonable for enwiki to be connecting to?
[16:06:11] <^demon|away> Just guessing. But having the wrong dbname in the error logging would be news to me.
[16:06:32] ^demon|away: ok, i thought the dbname indicated the target
[16:06:44] <^demon|away> No, dbname is the wiki that logged the error.
[16:07:17] aude: We *could* change some of the client's parser/lua stuff towards only using a dedicated label lookup which relies on wb_terms
[16:07:29] +1
[16:09:43] KEY `tmp1` (`term_language`,`term_type`,`term_entity_type`,`term_search_key`)
[16:09:45] ouh?
[16:11:11] those indices are slightly inconsistent with what we have in Wikibase
[16:11:21] ^demon|away: ah makes sense. so normally target == dbname. what else besides this other toolbar would connect from enwiki to wikidatawiki?
[16:11:35] * springle hopes for a simple list
[16:11:42] springle: See my writing above
[16:12:18] springle: lua, parser function
[16:12:35] should normally access caching though
[16:13:05] aude: Only in memory or am I missing something?
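The confusion above comes from reading the wiki name in a dberror line as the connection target. A small parser that names the fields explicitly makes the distinction visible. This is a sketch only: the pattern is inferred from the single sample line quoted at 16:03:28, so the real log format may vary, and `DBERROR_RE`, `parse_dberror` and the field names are hypothetical.

```python
import re

# Inferred from one sample line; other dberror lines may not match.
DBERROR_RE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}) "
    r"(?P<app_server>\S+) "
    r"(?P<logging_wiki>\S+) "   # the wiki that logged the error, not the target
    r"Error connecting to (?P<target_host>[^:\s]+): "
    r".*\((?P<sqlstate>\w+)/(?P<errno>\d+)\): (?P<message>.*)$"
)


def parse_dberror(line):
    """Return the named fields of a connection-error line, or None if it does not match."""
    match = DBERROR_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Tue Jun 24 14:07:26 UTC 2014 mw1176 enwiki Error connecting to "
        "10.64.16.10: :real_connect(): (08004/1040): Too many connections"
    )
    print(parse_dberror(sample))
```

Grouping parsed lines by `target_host` instead of `logging_wiki` would show which database server is running out of connections, regardless of which wiki happened to report it.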
[16:13:38] memcached [16:13:51] not caching entities [16:14:30] the entity revisions are cached [16:16:59] ah right, there are two layers of CachingEntityRevisionLookup [16:17:16] (03PS1) 10Filippo Giunchedi: gdash: fix swift gdash aliases [operations/puppet] - 10https://gerrit.wikimedia.org/r/141712 [16:17:26] i'm adding debug points to see if there is anything odd [16:18:01] (03PS2) 10Filippo Giunchedi: gdash: fix swift gdash aliases [operations/puppet] - 10https://gerrit.wikimedia.org/r/141712 [16:18:09] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: fix swift gdash aliases [operations/puppet] - 10https://gerrit.wikimedia.org/r/141712 (owner: 10Filippo Giunchedi) [16:20:00] $wgWBRepoSettings['sharedCacheKeyPrefix'] = "$wmgWikibaseCachePrefix/WBL-$wmfVersionNumber"; [16:20:16] that's why we see a spike after the version updates [16:20:20] same for clients, btw [16:20:37] aude: ^ [16:20:38] that's in case serialization changes or such [16:21:28] Sure, but it also means that we throw away most of our caching on a version update [16:22:35] (03CR) 10Hashar: "All good. Thanks a ton for pushing Swift to labs :-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/137803 (owner: 10Andrew Bogott) [16:26:21] aude: Shall I open a bug for evaluation of a label lookup for formatting? [16:26:34] probably is one, if not sure [16:27:53] aude: mh... there's https://bugzilla.wikimedia.org/show_bug.cgi?id=66541 [16:28:02] PROBLEM - Disk space on analytics1013 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/h 125210 MB (6% inode=99%): /var/lib/hadoop/data/j 73674 MB (3% inode=99%): [16:28:05] and https://bugzilla.wikimedia.org/show_bug.cgi?id=66540 [16:28:22] but nothing for formatting [16:28:23] somewhat related but not the issue afaik [16:28:56] Bug 66541 - Use TermIndex for finding labels for linked items in recentchanges and similar lists (edit) [16:29:32] that's a little scary though...
can also lead to a lot of lookups [16:29:39] yeah [16:31:22] EntityIdLabelFormatter uses entity lookup [16:45:55] paravoid: ping [16:46:39] pong [16:47:05] hey! [16:47:08] hi! [16:47:11] :) [16:47:23] already preparing for cyprus? [16:47:32] yeah :) [16:47:45] I don't envy you [16:47:57] gwicke: ! [16:47:59] it could be fun! [16:48:04] after all this time worrying about it, it's kinda liberating :) [16:48:11] (03PS1) 10BBlack: Set cluster_tier for bits caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/141719 [16:48:13] yeah, and hell could freeze over as well [16:48:32] paravoid: at least it's a reasonably short stint [16:48:42] yup [16:49:06] paravoid: ^ wtf :) [16:49:11] do you think they can break paravoid in just 3 months? [16:49:40] nah [16:49:57] bblack: wow [16:49:58] bblack: wtf... [16:51:15] oh man [16:51:18] that commit makes me cry [16:51:40] makes me wanna cry even [16:51:44] best commitdiff length to comment length ratio ever? [16:52:19] hmm let's check [16:52:22] it took me like 3 hours this morning to figure out all of wtf was going on there. varnish module split did not help :P [16:52:37] bblack: holy crap [16:52:58] <^demon|away> bblack: Old quip on that subject. [16:52:59] <^demon|away> Remember, your commit message isn't too long until it causes people's machines to go into swap when they read the commit mailing. [16:54:02] Puppet catalog differ reports no change (apart from the comment). LGTM [16:54:24] honestly... I... [16:55:08] there's been some change since then, I just merged one earlier today for example. I just don't think any were of consequence.
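Going back to the `sharedCacheKeyPrefix` line pasted at 16:20:00: because the prefix embeds the MediaWiki version, every key changes on a version update and the cache effectively starts cold, which is the spike described there. A rough Python sketch of that effect, with a plain dict standing in for memcached; the key layout after the prefix, the prefix value `wikidatawiki`, the version `1.24wmf9` and all function names are assumptions made for illustration only:

```python
# A dict stands in for memcached; nothing here talks to a real cache.
cache = {}
db_loads = []


def shared_cache_key(prefix, version, entity_id):
    # Same shape as "$wmgWikibaseCachePrefix/WBL-$wmfVersionNumber";
    # the ":<entity id>" suffix is a hypothetical key layout.
    return f"{prefix}/WBL-{version}:{entity_id}"


def load_from_db(entity_id):
    # Records every lookup that had to fall through to the database.
    db_loads.append(entity_id)
    return {"id": entity_id}


def get_entity(prefix, version, entity_id):
    key = shared_cache_key(prefix, version, entity_id)
    if key not in cache:
        cache[key] = load_from_db(entity_id)
    return cache[key]


get_entity("wikidatawiki", "1.24wmf9", "Q42")   # miss: first read
get_entity("wikidatawiki", "1.24wmf9", "Q42")   # hit: same version, same key
get_entity("wikidatawiki", "1.24wmf10", "Q42")  # miss: new version, new key
print(len(db_loads))  # 2
```

The trade-off raised at 16:20:38 and 16:21:28 shows up directly: versioned keys protect against stale serialization formats, at the cost of one database read per entity after each deploy.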
[16:55:08] Krinkle's https://gerrit.wikimedia.org/r/#/c/99010/ definitely wins still [16:55:19] lol ^demon|away [16:55:38] (changes that end up affecting one of the VCL files loaded on tier-2 bits caches since Jan 28 2014, I mean) [16:56:27] paravoid: what's your thinking on making node 0.11 (and later 0.12) available for new code that's not going to be in full production until the end of Q1? [16:56:43] (03CR) 10BBlack: [C: 032] Set cluster_tier for bits caches [operations/puppet] - 10https://gerrit.wikimedia.org/r/141719 (owner: 10BBlack) [17:03:19] analytics1013 /var/lib/hadoop/data/j is full [17:03:52] well 200mb left but at the rate it is filling up it is going to be full soon [17:03:59] Nemo_bis: :) [17:06:14] !log restarted hadoop yarn on analytics1013 [17:06:19] Logged the message, Master [17:09:36] ottomata: (07:28:02 μμ) icinga-wm: PROBLEM - Disk space on analytics1013 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/h 125210 MB (6% inode=99%): /var/lib/hadoop/data/j 73674 MB (3% inode=99%): [17:10:25] it is down to 207MB now [17:12:42] shhhhh so many things aahhhh! :) [17:12:45] hm [17:14:08] 207M? [17:14:22] you mean G [17:22:43] i should figure out how to adjust the default disk warnings lower [17:23:11] 6% of 2 or 4TB is still a lot of space [17:28:39] ottomata: as far as I know, icinga doesn't allow doubles there :S just full integers for percents ... so you'd have to write or at least "fix" the build in check ... [17:28:48] *write a new check [17:29:09] well, the problem is that it is a default percentage for all disks [17:29:13] independent of the size of the disk [17:29:23] would be better to use a hardcoded size or something [17:31:30] oh, sdj does only have 207M [17:31:32] that is weird [17:33:01] (03PS1) 10Springle: db1021 is experiencing more "Too many connections" bursts than its siblings in s5. The only obvious difference is it still runs 5.5.34 while the others have already upgraded to 5.5.36. 
Reduce db1021 load until a new slave can be cloned and db1021 depooled [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141729 [17:33:21] ottomata: well, as I said... you'd have to write your own check taking some parameters for the warning levels for each disk in MByte or something [17:34:05] (03CR) 10Springle: [C: 032] db1021 is experiencing more "Too many connections" bursts than its siblings in s5. The only obvious difference is it still runs 5.5.34 while [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141729 (owner: 10Springle) [17:36:20] right, but also a special exception to disable the default percentage checks for these hosts (or disks?) [17:36:40] it'd be better if we just changed the default check to use the free space values rather than percentages [17:36:41] but ja [17:37:04] (03Merged) 10jenkins-bot: db1021 is experiencing more "Too many connections" bursts than its siblings in s5. The only obvious difference is it still runs 5.5.34 while the others have already upgraded to 5.5.36. Reduce db1021 load until a new slave can be cloned and db1021 depooled [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141729 (owner: 10Springle) [17:37:46] !log springle Synchronized wmf-config/db-eqiad.php: reduce db1021 load (duration: 00m 10s) [17:37:51] Logged the message, Master [17:42:39] (03CR) 10Aaron Schulz: [C: 032] Maintenance reports limit incremental increase. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141652 (owner: 10Springle) [17:42:41] springle, what is the difference between table collation and column collation? [17:42:47] (03Merged) 10jenkins-bot: Maintenance reports limit incremental increase. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141652 (owner: 10Springle) [17:43:42] springle, more specifically; why would mysql allow you to set a table collation if it also allows column collation? is it so that new columns in a table will have the table's collation? 
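The disk-alert thread (17:22 to 17:36) suggests warning on absolute free space instead of a percentage, since 6% of a multi-terabyte disk is still a lot. A minimal Python sketch of such a check, not an actual icinga plugin; the threshold values and function names are made up for illustration, and only the standard library's `shutil.disk_usage` is used:

```python
import shutil

MB = 1024 * 1024


def classify_free_space(free_bytes, warn_mb, crit_mb):
    """Map free space to a Nagios-style state using absolute thresholds in MB."""
    free_mb = free_bytes / MB
    if free_mb < crit_mb:
        return "CRITICAL"
    if free_mb < warn_mb:
        return "WARNING"
    return "OK"


def check_path(path, warn_mb, crit_mb):
    """Check the filesystem holding `path` against the given thresholds."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free, warn_mb, crit_mb)


# Figures from the log: 125210 MB free tripped the 6% check at 16:28,
# while 207 MB free is a real problem.
# The 50000 / 20000 MB thresholds are hypothetical.
print(classify_free_space(125210 * MB, warn_mb=50000, crit_mb=20000))
print(classify_free_space(207 * MB, warn_mb=50000, crit_mb=20000))
```

With thresholds in MB, the large Hadoop data disks stop alerting at 6% free while a nearly full partition still goes critical, which is the behaviour asked for at 17:36:40.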
[17:44:29] the reason I'm asking is partly driven by my extreme confusion at an alter I did last night that didn't do what I expected...; I altered the tables collation which took a long time; but the column collations apparently didnt change [17:44:48] !log aaron Synchronized wmf-config/InitialiseSettings.php: Maintenance reports limit incremental increase (duration: 00m 08s) [17:44:54] Logged the message, Master [17:44:56] mwalker: they are just defaults [17:45:33] charset and collation can be set at server, db, table, and col levels. when a column is added it uses the lowest default [17:45:58] if you alter the table's default, it won't push it down to columns that already exist unless you're specific about it [17:47:21] that makes some sense; but I'm not sure why `alter table geonames.altnames collate utf8_unicode_ci` took 20 minutes if it was just changing a column in the information_schema [17:47:24] it was doing something... [17:48:12] it will depend on your mysql/mariadb major version, but in many cases alter table just does a rebuild [17:48:34] so; it changed the table default; rebuilt all the indexes; but didn't actually do anything useful [17:48:40] * mwalker grumbles [17:48:48] quite likely [17:49:10] fair enough; thanks [17:50:16] greg-g: I just noticed I forgot two changes that I'd ment to deploy during SWAT - I had them on the list but didn't do them. Can I sneak them in some time soon? 
Just small config changes [17:51:50] manybubbles: yeah, reedy won't be starting right at the hour anyways [17:51:58] going then [17:52:11] (03CR) 10Manybubbles: [C: 032] Drop all Cirrus content indexes down to 5 shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139163 (owner: 10Manybubbles) [17:52:18] (03CR) 10Manybubbles: [C: 032] Switch all wikis to the experimental highlghter [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/140110 (owner: 10Manybubbles) [17:52:20] (03Merged) 10jenkins-bot: Drop all Cirrus content indexes down to 5 shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/139163 (owner: 10Manybubbles) [17:52:25] (03Merged) 10jenkins-bot: Switch all wikis to the experimental highlghter [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/140110 (owner: 10Manybubbles) [17:52:45] argh [17:52:55] db1021 raid set to ReadAdaptive [17:52:58] !log manybubbles Synchronized wmf-config: Drop Cirrus indexes to five shards on rebuild and switch all wikis to new highlighter (duration: 00m 04s) [17:53:01] (03CR) 10Ottomata: "I prefer quoted, as the possible values are pretty variable and not really puppet keywords." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 (owner: 10Ottomata) [17:53:03] Logged the message, Master [17:53:23] that will be part of why it's been playing up [17:53:49] greg-g: done [17:54:11] manybubbles: ty [17:54:20] (03PS2) 10Ottomata: kafka: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140655 (owner: 10Matanya) [17:54:29] (03CR) 10Ottomata: [C: 032 V: 032] kafka: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140655 (owner: 10Matanya) [17:55:19] (03PS2) 10Ottomata: statistics: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140702 (owner: 10Matanya) [17:57:40] (03PS3) 10Ottomata: Lint fixes for misc/statistics.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 [17:57:42] (03CR) 10Ottomata: [C: 032 V: 032] "Thank you!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/140702 (owner: 10Matanya) [17:59:56] (03CR) 10jenkins-bot: [V: 04-1] Lint fixes for misc/statistics.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 (owner: 10Ottomata) [18:00:04] Reedy, greg-g: The time is nigh to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140624T1800) [18:02:24] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: Fetching readonly [18:03:23] that's me! [18:03:24] fixing shortly [18:03:34] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: Fetching origin [18:03:38] sorry, this cafe internet is making git review take a while [18:03:51] (03PS4) 10Ottomata: Lint fixes for misc/statistics.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 [18:04:24] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: Fetching origin [18:04:51] cmon jenkins... 
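On the 17:42 to 17:49 exchange about table versus column collation: springle's point is that the table-level setting is only a default that new columns copy, so changing it later leaves existing columns alone. A small Python model of that behaviour, purely illustrative (the `Table` class, the column name `alternate_name` and the starting collation `utf8_general_ci` are assumptions; only `geonames.altnames` and `utf8_unicode_ci` come from the log):

```python
class Table:
    """Toy model of a table-level default collation plus per-column collations."""

    def __init__(self, default_collation):
        self.default_collation = default_collation
        self.columns = {}

    def add_column(self, name, collation=None):
        # A new column copies the table default unless a collation is given.
        self.columns[name] = collation or self.default_collation

    def alter_default_collation(self, collation):
        # Models: ALTER TABLE t COLLATE x
        # Only the default for columns added later changes.
        self.default_collation = collation

    def convert_to(self, collation):
        # Models: ALTER TABLE t CONVERT TO CHARACTER SET ... COLLATE x
        # The default changes and existing character columns are converted too.
        self.default_collation = collation
        for name in self.columns:
            self.columns[name] = collation


altnames = Table("utf8_general_ci")
altnames.add_column("alternate_name")

altnames.alter_default_collation("utf8_unicode_ci")
print(altnames.columns["alternate_name"])  # utf8_general_ci

altnames.convert_to("utf8_unicode_ci")
print(altnames.columns["alternate_name"])  # utf8_unicode_ci
```

In real MySQL the second form is `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...`, which rewrites the character columns (CHAR, VARCHAR, TEXT); whether either statement triggers a full table rebuild depends on the server version, as noted at 17:48:12.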
[18:07:57] (03PS5) 10Ottomata: Lint fixes for misc/statistics.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 [18:08:08] (03CR) 10Ottomata: [C: 032 V: 032] Lint fixes for misc/statistics.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/139820 (owner: 10Ottomata) [18:08:34] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data exceeded the critical threshold [500.0] [18:09:24] RECOVERY - Unmerged changes on repository puppet on palladium is OK: Fetching origin [18:09:34] RECOVERY - Unmerged changes on repository puppet on strontium is OK: Fetching origin [18:15:57] (03CR) 10Ottomata: "Alex, do you want us to build a .deb package that just puts the precompiled jar in place? Or do we need to go through the whole (usually " [operations/puppet] - 10https://gerrit.wikimedia.org/r/140677 (owner: 10Gage) [18:16:25] (03CR) 10Ottomata: "Lookin good! Let's wait and see what Alex says about how to deploy the jar over in https://gerrit.wikimedia.org/r/#/c/140677/. We might " (031 comment) [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/140676 (owner: 10Gage) [18:18:39] bblack: soo.. DefaultType? 
:) [18:20:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 685 [18:21:07] (03PS2) 10Reedy: Non Wikipedias to 1.24wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141693 [18:21:26] (03CR) 10Reedy: [C: 032] Non Wikipedias to 1.24wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141693 (owner: 10Reedy) [18:21:39] (03Merged) 10jenkins-bot: Non Wikipedias to 1.24wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/141693 (owner: 10Reedy) [18:21:43] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [18:24:13] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: Fetching readonly [18:25:13] RECOVERY - check_mysql on lutetium is OK: Uptime: 1715363 Threads: 2 Questions: 8988837 Slow queries: 6626 Opens: 11290 Flush tables: 2 Open tables: 58 Queries per second avg: 5.240 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [18:25:33] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf10 [18:25:39] Logged the message, Master [18:27:52] !log reedy Synchronized docroot and w: (no message) (duration: 00m 14s) [18:27:57] Logged the message, Master [18:32:27] (03CR) 10Gage: "I already have it building with Archiva, deb coming soon!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/140677 (owner: 10Gage) [18:32:46] (03PS1) 10Ottomata: Fix for stats .gitconfig path that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141745 [18:32:59] (03PS2) 10Ottomata: Fix for stats .gitconfig path that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141745 [18:34:46] greg-g: what's the procedure for core +2 rights for employees? 
[18:34:58] (03CR) 10Ottomata: [C: 032 V: 032] Fix for stats .gitconfig path that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141745 (owner: 10Ottomata) [18:36:41] gwicke: ask their manager to get them +2 (which then just gets delegated to whoever on IRC has the right gerrit perms) [18:37:11] greg-g: ok, thanks! [18:37:33] (for the record: https://www.mediawiki.org/wiki/+2#Granting ) [18:37:49] ori: let me finish fixing up the varnish mess before we do more [18:38:12] bblack: OK, other than 'not nag', is there anything I can do to help? [18:38:16] I want to get reload failures to be more obvious, and get rid of the submodule. [18:38:27] would you like me to do the latter? [18:38:45] i can just leave you alone too :P [18:39:22] :) [18:39:41] I suspect there will be complications with just removing the submod and replacing it with the contents [18:39:57] (issues for others checking it out or syncing up clones, both human and automated) [18:39:59] would you ops folks prefer an rt ticket for +2 requests? [18:40:51] https://www.mediawiki.org/wiki/Gerrit/Project_ownership#Requests [18:41:00] don't ask me! 
[18:41:04] gwicke: looks like chad does them [18:41:15] SPLIT BRAIN [18:41:26] (re conversation in two channels, but we managed) [18:42:44] greg-g: ;) [18:42:52] (03PS1) 10Ottomata: Fix for .gitconfig file content and apache_site that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141747 [18:43:01] (03PS2) 10Ottomata: Fix for .gitconfig file content and apache_site that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141747 [18:43:10] I'll use mail [18:43:22] (03CR) 10Ottomata: [C: 032 V: 032] Fix for .gitconfig file content and apache_site that was broken in previous lint patch [operations/puppet] - 10https://gerrit.wikimedia.org/r/141747 (owner: 10Ottomata) [18:46:05] (03PS1) 10Ottomata: Add .bash_aliases file for otto [operations/puppet] - 10https://gerrit.wikimedia.org/r/141748 [18:46:24] (03CR) 10Ottomata: [C: 032 V: 032] Add .bash_aliases file for otto [operations/puppet] - 10https://gerrit.wikimedia.org/r/141748 (owner: 10Ottomata) [18:47:02] (03CR) 10Dzahn: Fix for .gitconfig file content and apache_site that was broken in previous lint patch (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/141747 (owner: 10Ottomata) [18:51:35] (03CR) 10Dzahn: [C: 032] "..and if we ever want it again we'd do it via admin.yaml" [operations/puppet] - 10https://gerrit.wikimedia.org/r/141373 (owner: 10Hoo man) [18:53:57] (03CR) 10Ottomata: [C: 032 V: 032] "Thanks!" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/141653 (owner: 10Kmosher) [18:55:02] (03CR) 10Ottomata: [C: 032 V: 032] "Sure! fine with me :)" [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/140898 (owner: 10Kmosher) [18:55:49] bd808|LUNCH: when you get back, how about I finally merge this? 
:) https://gerrit.wikimedia.org/r/#/c/125184/ [18:56:13] lol @ hashar calling one gerrit change the "patch of doom" [18:56:25] mutante: yeah experimenting on labs [18:56:32] hashar: 'k :) [18:56:34] (03PS6) 10Ottomata: Adding $deployable_networks variable in network.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125184 [18:56:41] mutante: you can ignore the whole chain of zuul: changes. Will review / pair with _joe_ tomorrow to polish them up [18:56:50] hashar: ok [18:57:43] (03CR) 10Ottomata: Add CDH5 support, drop CDH4 support (031 comment) [operations/puppet/cdh4] (cdh5) - 10https://gerrit.wikimedia.org/r/135494 (owner: 10Ottomata) [18:58:42] (03CR) 10Dzahn: [C: 031] Add myself to releasers-mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/140634 (owner: 10Catrope) [19:00:20] (03PS2) 10Dzahn: Add myself to releasers-mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/140634 (owner: 10Catrope) [19:05:45] (03CR) 10Hashar: "For pairing with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141572 (owner: 10Hashar) [19:05:46] ottomata: Merge whenever you're ready. I'm trying to remember all the things that patch could touch. [19:06:29] if it doesn't work, then anything that uses those variables! [19:06:31] but mostly deployment [19:06:34] git deploy probably [19:06:47] (03CR) 10Hashar: "Will pair with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141442 (owner: 10Hashar) [19:06:51] i'll deploy and run puppet on tin and try a git deploy [19:06:56] (03CR) 10Hashar: "Will pair with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141486 (owner: 10Hashar) [19:06:57] Yeah. 
Looks like ti should only mess with the vhost for gir-deploy on it [19:07:08] (03CR) 10Ottomata: [C: 032 V: 032] Adding $deployable_networks variable in network.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125184 (owner: 10Ottomata) [19:07:09] s/gir/git/ [19:07:12] (03CR) 10Hashar: "Will pair with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141487 (owner: 10Hashar) [19:07:30] (03CR) 10Hashar: "Will pair with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141488 (owner: 10Hashar) [19:07:35] notes how binasher is still added in gerrit [19:07:40] as reviewer [19:07:43] (03CR) 10Hashar: "Will pair with Giuseppe." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141501 (owner: 10Hashar) [19:08:58] (03CR) 10Hashar: "Split conf between merger and server daemons." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141663 (owner: 10Hashar) [19:09:44] (03CR) 10Dzahn: [C: 032] redisdb: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140667 (owner: 10Matanya) [19:12:02] (03CR) 10Dzahn: "checked on rdb1001: noop" [operations/puppet] - 10https://gerrit.wikimedia.org/r/140667 (owner: 10Matanya) [19:12:44] bd808: looks good i think [19:12:53] can I get a temporary moratorium on operations/puppet changes? [19:13:05] good luck! :) [19:13:09] i will abide. [19:13:11] I'm about to bypass gerrit and merge the varnish module back into the main repo [19:13:22] oh, interesting? [19:13:27] finally got annoyed enough with it? [19:13:30] bblack: i'll stop [19:13:45] (i offered to do that before but was told not to because I already did it...:/) [19:13:56] (already submoduled it*) [19:14:02] keep the history! :D [19:14:06] ottomata: The diff looks clean. subnets are in a different order but all there. [19:14:16] aye ja [19:14:29] bblack: i have changes waiting for review on the submodule... [19:14:41] yeah they're gonna get borked, sorry [19:14:59] that's fine I guess, why move it back though now, but not before? 
[19:15:02] something change? [19:17:06] because it really sucks and I'm tired of fighting it [19:18:00] bblack, i'm all for moving it back, mainly because I agree with you that I did it prematurely, but it isn't that hard to work with (I do it pretty often with other repos) [19:18:14] it is! [19:18:16] what's the trouble you are having? [19:18:37] it's a royal pain in the ass. jumping back and forth between two separate repos for git blame, history logs, git grep, etc [19:18:48] when the actual relevant files are randomly-intermingled between the two [19:19:13] hm, yeah for this module its really weird [19:19:18] some files in templates/ some in the module [19:19:21] that's why the cleanup [19:19:26] help me with my change someday ehhhhh???!? :p [19:21:10] (03PS1) 10Hashar: contint: install Zuul on all CI slaves [operations/puppet] - 10https://gerrit.wikimedia.org/r/141758 [19:22:34] (03CR) 10Hashar: "All the previous patches have been made for this: install Zuul from sources on all CI Jenkins slaves without installing a daemon / config " [operations/puppet] - 10https://gerrit.wikimedia.org/r/141758 (owner: 10Hashar) [19:23:38] I'm done, I think, assuming everything worked [19:23:59] I think git pull in other local clones will bitch about modules/varnish, you may have to rm -rf modules/varnish; git pull [19:24:36] (I fixed that manually on strontium + palladium) [19:25:53] PROBLEM - Unmerged changes on repository puppet on virt1000 is CRITICAL: Fetching origin [19:26:13] error: The following untracked working tree files would be overwritten by checkout: modules/varnish/files/ganglia/.pep8 [19:26:21] ^ basically a spam of that [19:26:33] PROBLEM - Unmerged changes on repository puppet on virt0 is CRITICAL: Fetching origin [19:28:38] (03CR) 10Alexandros Kosiaris: "I would be fine with any approach of those you point out but if Jeff managed to built the jar with Archiva and get a .deb that is probably" [operations/puppet] - 10https://gerrit.wikimedia.org/r/140677 
(owner: 10Gage) [19:29:03] heh [19:29:10] I guess I have to fix the virt masters as well [19:29:13] or someone does [19:31:03] RECOVERY - Puppet freshness on analytics1022 is OK: puppet ran at Tue Jun 24 19:31:01 UTC 2014 [19:31:33] RECOVERY - Puppet freshness on analytics1012 is OK: puppet ran at Tue Jun 24 19:31:26 UTC 2014 [19:34:54] RECOVERY - Unmerged changes on repository puppet on virt1000 is OK: Fetching origin [19:37:33] RECOVERY - Unmerged changes on repository puppet on virt0 is OK: Fetching origin [19:39:44] !log rebuilding search index for group1 wikis after upgrade today [19:39:47] Logged the message, Master [19:45:06] _joe_: could you change your review of https://gerrit.wikimedia.org/r/#/c/141059/ and https://gerrit.wikimedia.org/r/#/c/141062/ ? [19:45:24] <_joe_> ori: yes, sure :) [19:45:37] (03PS1) 10Ottomata: Parameterize jvm_performance_opts in order to modify GC settings [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/141768 [19:45:38] thanks [19:46:05] i tried to deploy from tin yesterday, tested with a single host, [19:46:16] (03CR) 10Giuseppe Lavagetto: "Discussed with ori, this is not optimal but this shouldn't be a blocker." [operations/puppet] - 10https://gerrit.wikimedia.org/r/141059 (owner: 10Ori.livneh) [19:46:21] noticed how it changes permissions on apache configs to root:root [19:46:49] instead of randomuser:wikidev when doing that fromfenari [19:46:57] _joe_: no +1 ? :) [19:46:58] then stopped.. so i would like a sync on all [19:47:14] <_joe_> ori: not at 10 PM sorry [19:47:19] also, the file "static.conf" should likely be deleted from all appservers [19:47:21] because it exists but it not in the repo [19:47:39] <_joe_> I remember the patch was good but I just spent the day looking at ab benchmarks results [19:47:47] and we have a directory called "broken" , owned by jeluf.. fun [19:47:47] _joe_: okay, no problem. could you look at it tomorrow? 
[19:47:48] <_joe_> I don't trust my review abilities now [19:47:55] <_joe_> ori: will do. [19:48:02] thanks (again!) :) [19:48:06] <_joe_> ori: and, tomorrow we can go live with rcstream [19:48:17] _joe_: \o/ wow! [19:48:18] awesome! [19:48:26] _joe_: will you be available tomorrow morning to discuss my puppet Zuul patches ? :D [19:48:43] <_joe_> sadly, we've no way to load test it if I don't write 'ab-for-websockets', apparently [19:48:59] <_joe_> ehmmm [19:49:04] <_joe_> I'll do my best [19:49:20] <_joe_> in fact, I have two patches to write then I can work with you I guess [19:49:50] (03PS2) 10Ottomata: Parameterize jvm_performance_opts in order to modify GC settings [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/141768 [19:49:55] _joe_: you're in demand ;) [19:49:59] * ori grabs one arm, hashar grabs the other [19:50:04] (03CR) 10Giuseppe Lavagetto: "As per discussion with ori, no reason to make this a blocker given how much work avoiding mod_version would require." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/141062 (owner: 10Ori.livneh) [19:50:37] <_joe_> I'm pretty sure I weigh more than the two of you combined :P [19:50:45] <_joe_> I can handle it [19:50:46] (03CR) 10Ottomata: [C: 032 V: 032] Parameterize jvm_performance_opts in order to modify GC settings [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/141768 (owner: 10Ottomata) [19:50:58] quick, somebody grab his legs! [19:51:17] 71kg here [19:51:24] <_joe_> ottomata: oh you're in the wonderful world of jvm optimization. Don't envy you [19:52:06] <_joe_> it's like having to re-learn how to be a sysadmin from scratch, off of oracle manuals. [19:52:17] <_joe_> which also usually lie. [19:53:01] ha, uh, i'm barely learning much [19:53:03] what I"ve done so far [19:53:04] also, let's just do https://gerrit.wikimedia.org/r/#/c/130296/1 ? 
[19:53:09] take recommended settings from LinkedIn [19:53:15] set it on a single broker [19:53:20] compare some metrics with other broker [19:53:28] <_joe_> revert [19:53:30] <_joe_> :P [19:53:41] naw, seems ok! :) i'm going with the recommended settings :) [19:53:47] it is 'highly' recommended [19:53:57] what's nice about kafka at least, is that its mem usage is pretty consistent [19:54:01] most of the work is left up to the filesystem [19:54:18] <_joe_> and that's nice? [19:54:19] <_joe_> :) [19:54:30] yes, because then the JVM doesn't have to do much heap management [19:54:45] os handles disk flushes whenever it feels like it [19:55:01] kafka just does fs writes and then wipes its hands [19:55:27] <_joe_> well, while I completely agree with you, do you get that we're saying "it's java, it's so crappy we'd prefer it to write to disk so that the OS can optimize in its place"? [19:55:35] haha [19:55:36] <_joe_> sigh [19:55:39] yup [19:55:47] <_joe_> java makes me regret php [19:55:48] that's basically what they say in the design doc [19:56:00] <_joe_> yes it's probably "clever" [19:56:24] https://kafka.apache.org/documentation.html#persistence [19:56:38] "Furthermore we are building on top of the JVM, and anyone who has spent any time with Java memory usage knows two things: [19:56:38] 1. The memory overhead of objects is very high, often doubling the size of the data stored (or worse). [19:56:38] 2. Java garbage collection becomes increasingly fiddly and slow as the in-heap data increases. [19:56:38] " [19:56:48] <_joe_> lol we all go to incredible lengths to find lame solutions to the shortcomings of our tools. [19:57:13] <_joe_> ottomata: they are completely right. The next question to them could be: why use java then? [19:57:42] haha [19:57:45] they are using scala! [19:58:43] !log Upgraded Zuul on gallium.wikimedia.org to install the zuul-cloner of doom. 4f9fd51..9839edb Tagged wmf-deploy-20140624-1 in our repo. 
[19:59:29] bd808: I got zuul-cloner installed on gallium :-] Will be able to create a dummy experimental job there for the mediawiki vendor repo \O/ [19:59:40] <_joe_> ottomata: scala is java with hipster clothes ;) [20:00:00] hashar: awesome. thanks :) [20:00:14] ha [20:00:55] bd808: I have slightly refactored the puppet manifest to make it possible to install Zuul on all slaves without installing the daemon / configuration. Giuseppe offered to review / give advice on them. [20:01:10] bd808: I am confident I will have some experimental job by the end of this week. Finally. [20:02:15] (03CR) 10Dzahn: "needs rebase on apache/appserver changes" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130296 (owner: 10ArielGlenn) [20:02:16] !log updated our Jenkins Job Builder copy 416ee7d..e9db73d [20:02:21] Logged the message, Master [20:03:13] Sweet. Hopefully after the tests pass again I can get try to get Tim to lift his -2 so that someone else can take their turn at blocking my logging changes ;) [20:05:35] ooh me me me! [20:05:39] * ori kids. [20:08:49] (03PS1) 10Ottomata: Use G1 garbage collector for Kafka brokers [operations/puppet] - 10https://gerrit.wikimedia.org/r/141813 [20:09:20] (03CR) 10Ottomata: [C: 032 V: 032] Use G1 garbage collector for Kafka brokers [operations/puppet] - 10https://gerrit.wikimedia.org/r/141813 (owner: 10Ottomata) [20:16:27] bout to be some kafka errors here I think... [20:16:54] (03CR) 10Hashar: [C: 031] "Will have to streamline releases in neat Jenkins jobs one day." [operations/puppet] - 10https://gerrit.wikimedia.org/r/140634 (owner: 10Catrope) [20:18:05] (03PS4) 10Withoutaname: DNS settings for wikimania 2015 wiki [operations/dns] - 10https://gerrit.wikimedia.org/r/140186 (https://bugzilla.wikimedia.org/66370) [20:23:50] mutante: thanks for the merges. any other lint offending files? 
[20:24:05] PROBLEM - Kafka Broker Server on analytics1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args kafka.Kafka /etc/kafka/server.properties [20:24:06] PROBLEM - Kafka Broker Server on analytics1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args kafka.Kafka /etc/kafka/server.properties [20:24:23] ottomata: ^ [20:24:39] yup [20:24:41] i know [20:24:44] doing sumpin [20:25:04] !log reinitializing varnish topics with replication factor of 3 [20:25:09] Logged the message, Master [20:25:28] wtf is going on with the Apache cluster [20:25:47] I'm trying to debug why I'm getting empty pages on meta and I'm now seeing the weirdest thing in fatalmonitor [20:25:49] 125 Warning: require() [function.require]: Unable to allocate memory for pool. in /usr/local/apache/common-local/php-1.24wmf10/includ [20:25:51] es/AutoLoader.php on line 1249 [20:27:35] PROBLEM - Kafka Broker Server on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args kafka.Kafka /etc/kafka/server.properties [20:29:03] hmm, cmjohnson, yt? [20:29:03] <^demon|lunch> RoanKattouw: google says that's apc. [20:29:19] !log rebooting analytics1021 [20:29:24] Logged the message, Master [20:29:28] It seems to be mw1220 [20:29:32] <^demon|lunch> http://www.cyberciti.biz/faq/linux-unix-php-warning-unable-to-allocate-memory-for-pool/ [20:29:52] !log Restarting Apache on mw1220, getting lots of "Unable to allocate memory for pool" errors [20:29:57] Logged the message, Mr. Obvious [20:30:32] ottomata: I am [20:31:15] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [20:31:16] i'm r [20:31:24] /dev/sdf wasn't recognized on analytics1021 [20:31:27] i'm rebooting now [20:32:45] <^demon|lunch> RoanKattouw: http://www.php.net/manual/en/apc.configuration.php#ini.apc.shm-segments is interesting. 
We use the default of "1" [20:32:51] ottomata: you have them in there as jbod and it appears that someone removed it [20:33:02] removed it? [20:33:02] <^demon|lunch> I'm curious if messing with that could help us get some headroom in apc without increasing shm_size again. [20:33:29] yeah it went bad so it was the bad was removed...the new one has to be added back to /dev/sdf/ [20:33:36] RECOVERY - Kafka Broker Server on analytics1018 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka /etc/kafka/server.properties [20:34:04] it was gone before I removed the old one...i assumed you had done it [20:34:15] you mean the partition? [20:35:34] disk 5 was /dev/sdf correct? [20:36:06] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms [20:36:32] if numbering starts at 0, then yes? [20:36:37] a,b,c,d,e,f [20:36:38] ... [20:38:06] PROBLEM - check configured eth on analytics1021 is CRITICAL: Connection refused by host [20:38:35] PROBLEM - DPKG on analytics1021 is CRITICAL: Connection refused by host [20:38:45] PROBLEM - SSH on analytics1021 is CRITICAL: Connection refused [20:38:55] PROBLEM - Disk space on analytics1021 is CRITICAL: Connection refused by host [20:38:55] PROBLEM - RAID on analytics1021 is CRITICAL: Connection refused by host [20:38:55] PROBLEM - check if dhclient is running on analytics1021 is CRITICAL: Connection refused by host [20:38:56] PROBLEM - puppet disabled on analytics1021 is CRITICAL: Connection refused by host [20:39:05] PROBLEM - jmxtrans on analytics1021 is CRITICAL: Connection refused by host [20:41:20] still booting back up...I think [20:45:00] actually, dunno what it is doing [20:45:36] it responds to pings, but ssh is not up? [20:46:15] <_joe_> ottomata: checking disks? 
[20:46:19] <_joe_> try via drac [20:47:07] yeah i'm in console, no output [20:47:49] <_joe_> mh, quite strange indeed [20:49:09] RECOVERY - Kafka Broker Server on analytics1022 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka /etc/kafka/server.properties [20:49:09] RECOVERY - Kafka Broker Server on analytics1012 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka /etc/kafka/server.properties [20:49:57] i'm going to reboot it again and watch on console from beginning [20:50:51] PROBLEM - Kafka Broker Messages In on analytics1012 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0 [20:51:16] PROBLEM - Kafka Broker Messages In on analytics1022 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 0.0 [20:52:09] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [20:52:22] ah was waiting on input [20:53:17] !log Jenkins / Zuul deploying experimental pipeline {{gerrit|141827}} [20:53:22] Logged the message, Master [20:54:26] who at the foundation has experience with redis? 
I'm trying to debug an issue on my local machine where I'm sending a command and getting back an empty ack [20:54:55] actually; I'll ask in #redis [20:56:00] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [20:58:49] RECOVERY - SSH on analytics1021 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [20:58:49] RECOVERY - Disk space on analytics1021 is OK: DISK OK [20:58:49] RECOVERY - RAID on analytics1021 is OK: OK: no disks configured for RAID [20:58:59] RECOVERY - check if dhclient is running on analytics1021 is OK: PROCS OK: 0 processes with command name dhclient [20:59:00] RECOVERY - puppet disabled on analytics1021 is OK: OK [20:59:00] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, args -jar jmxtrans-all.jar [20:59:09] RECOVERY - check configured eth on analytics1021 is OK: NRPE: Unable to read output [20:59:29] RECOVERY - DPKG on analytics1021 is OK: All packages OK [20:59:59] RECOVERY - Kafka Broker Server on analytics1021 is OK: PROCS OK: 1 process with command name java, args kafka.Kafka /etc/kafka/server.properties [21:00:05] bsitu: The time is nigh to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140624T2100) [21:07:09] RECOVERY - Kafka Broker Messages In on analytics1022 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 1630.97457407 [21:07:49] RECOVERY - Kafka Broker Messages In on analytics1012 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 3133.38547121 [21:07:59] RECOVERY - Kafka Broker Messages In on analytics1018 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2911.9345035 [21:08:00] I really want jouncebot to say 'the time is nigh to let the deploy Flow' :p [21:09:09] PROBLEM - Varnishkafka log producer on cp3003 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [21:12:06] mwalker: where's 
the source to jouncebot live? [21:12:12] bug tracker? ;) [21:12:24] github! /whois jouncebot [21:12:41] https://github.com/mattofak/jouncebot [21:12:55] oh right, useful information [21:12:59] JohnLewis: ^ [21:13:06] :D [21:13:15] * JohnLewis looks [21:15:09] RECOVERY - Varnishkafka log producer on cp3003 is OK: PROCS OK: 1 process with command name varnishkafka [21:18:39] (PS1) Andrew Bogott: Remove unused upstream Openstack module [operations/puppet] - https://gerrit.wikimedia.org/r/141835 [21:18:41] (PS1) Andrew Bogott: Modify nova role to better support labs uses. [operations/puppet] - https://gerrit.wikimedia.org/r/141836 [21:33:46] Hm. it seems mw1209-1220 experienced a drop in memory usage a few weeks ago [21:33:52] quite significant [21:34:12] http://ganglia.wikimedia.org/latest/?r=month&c=Application+servers+eqiad&h=mw1220.eqiad.wmnet [21:34:22] http://ganglia.wikimedia.org/latest/?r=month&c=Application+servers+eqiad&h=mw1216.eqiad.wmnet [21:34:43] However they also seem to be having a spike right now in logstash with php unable to allocate memory from pool [21:34:53] limited by php of course, the host has enough [21:35:02] http://ganglia.wikimedia.org/latest/?r=month&c=Application+servers+eqiad&h=mw1216.eqiad.wmnet [21:35:06] PHP Warning: include() [function.include]: Unable to allocate memory for pool. in /usr/local/apache/common-local/php-1.24wmf9/extensions/Wikidata/vendor/data-values/javascript/DataValuesJavascript.php on line 33 [21:35:07] etc. [21:35:31] 22k hits in logstash for PHP Unable to allocate memory [21:35:31] Krinkle: all wikidata, or just generally? [21:35:42] generally, coming from languages/ or AutoLoader etc. [21:35:50] * greg-g nods [21:36:00] Not sure what to do with it. Looking in the logs for something else. [21:36:13] "Unable to allocate memory" is APC thrashing [21:36:17] weird that it is limited to a set of machines [21:38:47] bd808: Is that apc throwing internally or do they surface to the front? [21:39:00] e.g.
were those web requests getting 500 errors? [21:39:24] Krinkle: I'm actually not sure if it's fatal to the request or not [21:39:27] that seems bad if we're serving 10,000s of those all the time. [21:39:32] it is [21:39:52] Or rather, it isn't, but it's going to fatal soon after. [21:39:57] require is required after all [21:40:07] PHP Warning: require() [function.require]: Unable to allocate memory for pool. in /usr/local/apache/common-local/php-1.24wmf9/includes/AutoLoader.php on line 1249 [21:42:15] <_joe_> bd808: it is [21:43:15] <_joe_> Krinkle: those servers need an apache restart, but this is a nightmare [21:43:38] <_joe_> APC memory thrashing is very difficult to debug usually [21:43:42] That error is usually only seen for a few minutes following the deploy of a new branch. [21:44:01] <_joe_> bd808: ok so it's just transient usually, which makes sense [21:44:34] If it's happening at random times now we may be hitting a limit for the acp cache size in general. [21:45:14] * bd808 wishes the deploy tags showed reliably in the prod logstash reports [21:45:20] <_joe_> that would be easy to check [21:45:50] <_joe_> if you need ops assistance, though, ping someone else. I'm going to bed now sorry [21:46:09] it's probably worth checking [21:46:29] opsen? [21:46:48] RT maybe [21:46:50] No ottomata? [21:46:51] no otto? [21:46:54] :( [21:48:03] I'm about [21:48:19] bblack: how's the good fight going? [21:49:18] lol, helpful [21:49:23] APC not enabled on cli php? [21:49:32] Warning: apc_cache_info(): No APC info available. Perhaps APC is not enabled? Check apc.enabled in your ini file in php shell code on line 1 [21:49:49] Even if it was it would be a separate cache [21:50:04] Wouldn't it? [21:50:10] https://www.mediawiki.org/wiki/Extension:APC [21:50:36] There's /usr/share/doc/php-apc/apc.php.gz as well [21:51:53] chasemp: Fancy looking at the APC thrashing that's happening on a few boxes? 
[21:52:27] <^demon|lunch> <^demon|lunch> RoanKattouw: http://www.php.net/manual/en/apc.configuration.php#ini.apc.shm-segments is interesting. We use the default of "1" [21:53:50] apc.shm_size=240M [21:54:11] ; 120MB should be enough for two copies of MW [21:54:11] apc.shm_size=240M [21:54:20] bleugh, just wanted the comment the second time [21:55:13] Reedy: was just reading up. Is my summary in the neighborhood that a group of apache hosts are throwing "Unable to allocate memory" randomly, which is a known apc caching thrash issue. [21:55:32] "group" [21:55:37] I'm not sure if there's a pattern to them [21:55:50] was this discovered as a consequence of soemething like a deployment or who-knows-when-it-started [21:55:54] We usually see it when we switch away from a mediawiki version, but it subsides after a short enough period of time [21:56:42] Even if its only after an mw switch, those are 10,000s of production requests though. [21:57:07] * bd808 sees 871 in the last 24 hours [21:57:10] ranging logstash to -30days with type=apache2 and "Unable to allocate memory" gives a spike every few days of 10-50k [21:58:45] * bd808 tries to keep the elasticsearch cluster behind logstash from crashing while a lot of 30 day searches start [21:59:32] :) [21:59:40] "heap_used_percent" : 99 [22:00:35] screenshot of the graph http://i.imgur.com/7Z8oJsq.png [22:01:05] sure but the unable to allocate memory error is fairly generic [22:01:10] seems lots of them are https://issues.apache.org/bugzilla/show_bug.cgi?id=45187 [22:01:19] or a valid path longer than 255 chars, idk [22:01:22] Yeah. 
2014-06-05 == thursday deploy [22:02:19] yeah, those bumps look like thursday deploys [22:03:10] count per 12h makes a lot of things look scary [22:03:14] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.136 [22:03:15] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.137 [22:03:19] lol [22:03:21] And it died [22:03:35] I was trying to keep it purged but to no avail [22:03:44] :( poor logstash [22:04:25] (PS1) QChris: Make dbstore1002 handle s7 analytics queries [operations/dns] - https://gerrit.wikimedia.org/r/141849 (https://bugzilla.wikimedia.org/66068) [22:04:38] this is probably a better representation of the pure unable to allocate for a require [22:04:38] https://logstash.wikimedia.org/#dashboard/temp/jnEvdLZ8TWWBGw1jMSH_NQ [22:05:20] of which there are a few really notable spikes, and one that is 2/3rds of the results [22:08:04] * bd808 crosses fingers and toes for logstash1003 to stay up [22:08:32] (PS3) Ori.livneh: [HAT] text-frontend VCL: set Content-Type if not set [operations/puppet] - https://gerrit.wikimedia.org/r/141086 [22:08:46] (CR) jenkins-bot: [V: -1] [HAT] text-frontend VCL: set Content-Type if not set [operations/puppet] - https://gerrit.wikimedia.org/r/141086 (owner: Ori.livneh) [22:08:48] (Abandoned) Ori.livneh: Reattempt Ia36d0d89f with stricter checks [operations/puppet/varnish] - https://gerrit.wikimedia.org/r/141584 (owner: Ori.livneh) [22:10:02] so we used fpm at my last gig and we had similar output for every sync...our version of deploy [22:10:09] mass cache invalidation is really expensive [22:10:50] is there some way to align our deployments with these jumps?
haven't used logstash in a long time [22:12:42] :( [22:13:07] deployment events are going to the electronic ether and never being seen again [22:13:07] chasemp: In theory "type:scap AND channel.raw:scap.announce" [22:13:40] We seem to lose a lot of the scap events between florine and logstash in practice [22:14:11] and they never make it to graphite, either [22:14:20] but it works in beta cluster [22:14:23] that is a problem [22:14:28] yeah :( [22:14:34] Graphite is a whole separate problem [22:14:51] https://bugzilla.wikimedia.org/show_bug.cgi?id=62667 [22:14:52] (CR) Ori.livneh: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/141086 (owner: Ori.livneh) [22:14:57] fwiw we all know it's holding on for dear life and scaling it out is on the table [22:16:17] (CR) Ori.livneh: "Cherry-picked in beta" [operations/puppet] - https://gerrit.wikimedia.org/r/141086 (owner: Ori.livneh) [22:18:17] ^ bblack: this patch is now the one (since the previous was targeting the submodule which you removed). it's running on the beta cluster varnishes and appears to be doing the right thing. [22:33:39] (CR) Dzahn: [C: 1] Update the redirect target for education.wikimedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/122866 (owner: Ragesoss) [22:34:31] (CR) Dzahn: [C: 1] Apache set up for us-ne [operations/apache-config] - https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: John F. Lewis) [22:34:50] mwalker: btw, I put your name down for the ... wait.. what team are you on officially now? [22:35:21] greg-g, as of next week -- services [22:35:29] tomorrow still Fundraising? [22:35:32] yep [22:35:33] https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140625T1300 [22:35:35] fundraising should be renamed "goods" [22:35:38] who should I put there next week?
[22:35:39] so it's goods & services [22:35:42] hah [22:36:10] greg-g, I'm guessing K4-713 [22:36:17] mwalker: kk [22:36:18] or the-wub [22:36:20] * greg-g nods [22:36:26] buh? [22:36:50] we're blaming you for things k4 :) [22:36:52] K4-713: your nick name will be pinged tomorrow morning at 6am when the fundraising banner test window starts [22:36:52] Who put where next whatnow? [22:37:00] OH GOOD. [22:37:06] :) [22:37:24] (CR) Reedy: [C: -1] "Apparently on hold https://bugzilla.wikimedia.org/show_bug.cgi?id=64557#c60" [operations/apache-config] - https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: John F. Lewis) [22:37:25] ...but nah, that's probably... correct. [22:37:43] :) [22:37:48] (CR) Reedy: [C: -1] "On hold https://bugzilla.wikimedia.org/show_bug.cgi?id=64557#c60" [operations/apache-config] - https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: John F. Lewis) [22:39:34] (PS5) Reedy: Redirect arbcom.*.wikipedia.org -> arbcom-*.wikipedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/134932 (https://bugzilla.wikimedia.org/31335) [22:39:37] (PS4) Reedy: wg.en -> wg-en and noboard.chapters -> noboard-chapters [operations/apache-config] - https://gerrit.wikimedia.org/r/134939 (https://bugzilla.wikimedia.org/64977) [22:41:07] RECOVERY - ElasticSearch health check on logstash1002 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [22:42:07] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [22:48:18] 2014-06-24 09:21:49 mw1004 mediawikiwiki: [7685cd20] [no req] Exception from line 46 of /usr/local/apache/common-local/php-1.24wmf10/extensions/Translate/tag/TranslateRenderJob.php: Oops, this should not happen! [22:48:20] hehe [22:48:24] Nikerabbit: ^ [22:49:03] (CR) Dzahn: [C: 1] Remove wiktionary.wikipedia.org from rewrites as it is not in DNS. [operations/apache-config] - https://gerrit.wikimedia.org/r/92799 (owner: Reedy) [22:50:00] (PS4) Reedy: Update the redirect target for education.wikimedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/122866 (owner: Ragesoss) [22:50:18] (CR) Ori.livneh: [C: 2] Remove wiktionary.wikipedia.org from rewrites as it is not in DNS. [operations/apache-config] - https://gerrit.wikimedia.org/r/92799 (owner: Reedy) [22:51:25] (CR) Ori.livneh: [C: 2] Update the redirect target for education.wikimedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/122866 (owner: Ragesoss) [22:51:33] (CR) Dzahn: [C: 1] Redirect arbcom.*.wikipedia.org -> arbcom-*.wikipedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/134932 (https://bugzilla.wikimedia.org/31335) (owner: Reedy) [22:52:16] (CR) Ori.livneh: [V: 2] Update the redirect target for education.wikimedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/122866 (owner: Ragesoss) [22:52:22] (CR) Ori.livneh: [V: 2] Remove wiktionary.wikipedia.org from rewrites as it is not in DNS.
[operations/apache-config] - https://gerrit.wikimedia.org/r/92799 (owner: Reedy) [22:56:22] (CR) Ori.livneh: [C: 2 V: 2] Redirect arbcom.*.wikipedia.org -> arbcom-*.wikipedia.org [operations/apache-config] - https://gerrit.wikimedia.org/r/134932 (https://bugzilla.wikimedia.org/31335) (owner: Reedy) [22:58:08] (CR) Dzahn: "what is the actual change in wikimedia.conf?" [operations/apache-config] - https://gerrit.wikimedia.org/r/134939 (https://bugzilla.wikimedia.org/64977) (owner: Reedy) [22:58:52] root is doing a graceful restart of all apaches [22:58:58] ^ me [22:59:21] thanks a bunch for doing them [22:59:39] np at all [22:59:43] we had some backlog [22:59:53] well, that didn't actually log :P [23:00:02] !log root gracefulled all apaches [23:00:04] mwalker, ori, MaxSem: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140624T2300) [23:00:12] Logged the message, Master [23:00:15] ah :D [23:00:21] sure [23:00:33] hah [23:00:53] was the actual !log automated? [23:00:59] Yeah [23:01:08] Seen these before :P [23:01:15] should have, moved from fenari to tin though [23:01:25] (the script that sends it is dologmsg) [23:01:56] called by apache-graceful-all [23:02:06] !log apache graceful done by me for I543efda24, I29b34689e, and I1c269433e [23:02:10] tgr, next time please do the core changes for stuff you want deployed [23:02:10] Logged the message, Master [23:03:00] MaxSem: I'll be happy to once someone approves https://www.mediawiki.org/wiki/Gerrit/Project_ownership#wmf-deployment_.2B2_for_tgr [23:03:15] :) [23:03:29] oh heh:) [23:04:45] wmf-deploy isn't open to anyone in the wmf ldap group? [23:04:52] No [23:04:58] but any member can add users [23:05:10] for satan's sake, it should be [23:05:12] so you (or whoever) can fulfill that [23:05:12] gave my supports :) [23:05:18] cause I'm lazy!
[23:05:23] now up to tacotuesday [23:06:07] Apparently ldap/ops can but the rest of us are added one at a time -- https://gerrit.wikimedia.org/r/#/admin/groups/21,members [23:06:15] (CR) Dzahn: [C: 1] "ah, 16:06 noboard.chapters.wikimedia.org to noboard-chapters.wikimedia.org.. yep" [operations/apache-config] - https://gerrit.wikimedia.org/r/134939 (https://bugzilla.wikimedia.org/64977) (owner: Reedy) [23:09:43] (CR) MaxSem: [C: 2] Enable VisualEditor for opt-in on OutreachWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141702 (owner: Jforrester) [23:09:57] (PS8) Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [operations/apache-config] - https://gerrit.wikimedia.org/r/90703 [23:10:36] (Merged) jenkins-bot: Enable VisualEditor for opt-in on OutreachWiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/141702 (owner: Jforrester) [23:11:03] !log maxsem Synchronized php-1.24wmf10/extensions/MultimediaViewer: (no message) (duration: 00m 05s) [23:11:08] Logged the message, Master [23:12:32] !log maxsem Synchronized visualeditor.dblist: https://gerrit.wikimedia.org/r/141702 (duration: 00m 03s) [23:12:37] Logged the message, Master [23:12:58] tgr & James_F, please verify ^^ [23:14:11] (PS5) Reedy: wg.en -> wg-en and noboard.chapters -> noboard-chapters [operations/apache-config] - https://gerrit.wikimedia.org/r/134939 (https://bugzilla.wikimedia.org/64977) [23:14:26] MaxSem: works, thanks! [23:14:43] (CR) Ori.livneh: [C: 2 V: 2] wg.en -> wg-en and noboard.chapters -> noboard-chapters [operations/apache-config] - https://gerrit.wikimedia.org/r/134939 (https://bugzilla.wikimedia.org/64977) (owner: Reedy) [23:27:02] ori is doing a graceful restart of all apaches [23:28:10] !log ori gracefulled all apaches [23:28:14] Logged the message, Master [23:28:48] !log apache-graceful-all was for Ifc9596cc7 [23:28:53] Logged the message, Master