[00:25:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [00:30:31] !log on osmium: stopping job runners in order to fix cgroup permissions issue [00:30:35] Logged the message, Master [00:33:00] (03CR) 10Jeremyb: [C: 04-1] Redirect usne to us-ne (031 comment) [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [00:34:24] (03CR) 10Jeremyb: Add usne DNS (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/133992 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [00:35:58] (03CR) 10Jeremyb: [C: 031] Apache set up for us-ne [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [00:38:20] (03CR) 10Jeremyb: "Looks ok; I'll defer to Reedy, et al. on choice of DB name." (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133982 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [00:39:34] (03CR) 10Jeremyb: [C: 031] Add DNS for us-ne [operations/dns] - 10https://gerrit.wikimedia.org/r/133980 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [02:13:25] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3793 MB (3% inode=99%): [02:14:40] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-19 02:13:37+00:00 [02:14:47] Logged the message, Master [02:21:25] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3434 MB (3% inode=99%): [02:26:54] !log LocalisationUpdate completed (1.24wmf5) at 2014-05-19 02:25:51+00:00 [02:26:59] Logged the message, Master [02:29:49] (03PS3) 10Ori.livneh: start cleaning up role::mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/133987 [02:59:35] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sun May 18 23:59:02 2014 [03:00:25] RECOVERY - Disk space on virt0 is OK: DISK OK [03:02:35] ACKNOWLEDGEMENT - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Sun May 18 23:59:02 2014 ori.livneh disabled puppet to prevent it from clobbering a local hack Im testing [03:10:06] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon May 19 03:09:00 UTC 2014 (duration 8m 59s) [03:10:10] Logged the message, Master [03:13:21] (03PS6) 10Ori.livneh: Send Vary header on http to http redirect [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 (owner: 10BryanDavis) [03:15:27] (03CR) 10Ori.livneh: "Ie16e5de91 gave a variable name an 'RW_' prefix to indicate that its value is changed by Apache, maybe you could turn that into a conventi" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 (owner: 10BryanDavis) [03:26:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [03:28:01] (03PS1) 10Ori.livneh: Remove obsoleted debug code [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134053 [04:32:11] (03PS7) 10BryanDavis: Send Vary header on http to https redirect [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 [04:41:25] (03CR) 
10BryanDavis: "In I83cef4b Antoine simplified the rules by omitting the env guard and setting `Header always merge Vary X-Forwarded-Proto` unconditionall" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 (owner: 10BryanDavis) [05:19:51] (03PS1) 10Springle: revert analytics traffic to dbstore1002 [operations/dns] - 10https://gerrit.wikimedia.org/r/134058 [05:20:31] (03CR) 10Springle: [C: 032] revert analytics traffic to dbstore1002 [operations/dns] - 10https://gerrit.wikimedia.org/r/134058 (owner: 10Springle) [05:45:18] (03PS15) 10Giuseppe Lavagetto: puppet-compiler: module for installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/133449 [05:45:44] (03PS16) 10Giuseppe Lavagetto: puppet-compiler: module for installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/133449 [05:48:45] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet-compiler: module for installation [operations/puppet] - 10https://gerrit.wikimedia.org/r/133449 (owner: 10Giuseppe Lavagetto) [06:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 19 03:00:01 2014 [06:11:14] (03PS1) 10Giuseppe Lavagetto: Remove mw1151 from the bits appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134059 [06:27:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [06:35:14] (03PS1) 10Giuseppe Lavagetto: erbium: fix template variable scoping [operations/puppet] - 10https://gerrit.wikimedia.org/r/134060 [06:51:06] (03PS1) 10Giuseppe Lavagetto: pmactt: correct compilation under puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/134061 [06:55:16] (03CR) 10Giuseppe Lavagetto: [C: 032] pmactt: correct compilation under puppet 3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/134061 (owner: 10Giuseppe Lavagetto) [07:29:03] (03CR) 10Gilles: [C: 031] create account for gtisza (tgr) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133761 (owner: 
10Dzahn) [08:18:43] _joe_: around ? [08:19:56] <_joe_> matanya: yes [08:20:15] every search returns Query error [08:20:22] new and old search [08:20:26] <_joe_> matanya: where? [08:20:33] e.g.: https://he.wikipedia.org/w/index.php?title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&profile=default&search=%D7%9E%D7%AA%D7%A0%D7%99%D7%94&fulltext=Search [08:20:35] <_joe_> he wiki? [08:20:37] <_joe_> ok [08:20:46] i guess i should point this to manybubbles or ^demon|away [08:21:07] <_joe_> matanya: try that in a new browser [08:21:12] i have [08:21:16] and as anon [08:21:26] <_joe_> matanya: works for me :) [08:21:40] some editors and anons complained [08:22:12] so you don't see "Query error" next to every result ? [08:22:17] <_joe_> matanya: ok seen now. [08:22:31] <_joe_> It wasn't there on the first load. [08:22:49] oh [08:23:43] <_joe_> what fails is [08:23:45] <_joe_> https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=%D7%9E%D7%9B%D7%91%D7%99_%D7%AA%D7%9C_%D7%90%D7%91%D7%99%D7%91_(%D7%9B%D7%93%D7%95%D7%A8%D7%92%D7%9C)&callback=jQuery18306179546795319766_1400487787685&_=1400487788755 [08:24:05] <_joe_> I get as a response {"servedby":"mw1135","error":{"code":"no-such-entity","info":"Invalid id: \u05de\u05db\u05d1\u05d9_\u05ea\u05dc_\u05d0\u05d1\u05d9\u05d1_(\u05db\u05d3\u05d5\u05e8\u05d2\u05dc)"}}) [08:24:35] <_joe_> so this is not strictly a search error, but a wikidata error/search page template error [08:24:45] <_joe_> but this is a software bug I'd say [08:25:00] <_joe_> did you open a bugzilla ticket on this? [08:25:03] yeah, i guessed it is related to wikidata, saw that in firebug [08:25:23] i haven't, wanted second opinion :) [08:25:29] doing so now [08:25:36] <_joe_> matanya: it seems like a software bug [08:25:39] hi hashar [08:25:47] <_joe_> it may be due to some ops "problem" [08:26:28] i'll open a bug and let wikidata folks escalate it to you... [08:27:16] thanks _joe_ [08:48:53] matanya: hello. 
Sorry I am doing some accounting this morning [08:48:58] so definitely laggy [08:49:23] no worries, just being friendly :) [08:54:16] anyone happen to know how to add a mediawiki template to the mw instance that runs with the jenkins tests? [09:01:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 19 03:00:01 2014 [09:11:03] dan-nl: do you mean in parser tests ? [09:11:38] (03CR) 10Alexandros Kosiaris: [C: 032] dns recursors: add ferm rules [operations/puppet] - 10https://gerrit.wikimedia.org/r/133513 (owner: 10Matanya) [09:12:17] dan-nl: if you look at mw/core tests/parser/parserTests.txt , at the beginning of the file some pages are created using !! article [09:12:23] hashar: i mean in a unit test i created. maybe i'm doing it wrong, but locally this is what i want and it works https://integration.wikimedia.org/ci/job/mwext-GWToolset-testextensions-master/450/console [09:12:48] ah, k, thought i might have to do that … i'll look at the parser test [09:13:07] good morning [09:13:11] ferm rules for recursors? [09:13:16] probably needs NOTRACK for port 53 [09:13:17] the test errors out because the template:artwork is not available .. . [09:13:25] or bumped nf_conntrack limits [09:13:52] hashar: thanks for the clue … will look into creating the template before the test runs [09:13:55] paravoid: we got that much traffic ? [09:14:25] depends [09:14:29] pybal does a lot of traffic [09:14:43] we used to run local recursors, we kinda want to not do that anymore, maybe :) [09:14:43] dan-nl: that is only for the parser tests though. With a unit test you will have to create the page using the relevant MediaWiki function. There are probably some examples in mw/core unit tests already [09:15:42] thanks [09:16:53] paravoid: any metric on how much traffic we got ? [09:16:58] paravoid: so 700 qps per server... 
doesn't seem like a lot [09:17:07] matanya: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Miscellaneous+eqiad&h=chromium.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS [09:17:24] pdns metric group [09:17:34] still we can always play it safe [09:17:42] notrack won't really hurt anyway there [09:18:22] 700 qps isn't that small [09:19:55] matanya: this looks nice too https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=PDNS&vl=&x=&n=&hreg[]=%28chromium%7Chydrogen%29.wikimedia.org&mreg[]=pdns_questions&gtype=stack&glegend=show&aggregate=1&embed=1&_=1400490962727 [09:20:18] it is really not that high [09:20:38] default ip conntrack timeout for UDP is 30, max entries is 64k [09:20:44] 30s * 700 qps is 21k [09:20:50] and dobson is still getting traffic [09:22:10] not really comparable though https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=PDNS&vl=&x=&n=&hreg[]=%28chromium%7Chydrogen%7Cdobson%29.wikimedia.org&mreg[]=pdns_questions&gtype=stack&glegend=show&aggregate=1&embed=1&_=1400491272858 [09:22:18] something like 1 qps [09:22:27] max 4.6 [09:22:51] ah, so it was switched then [09:23:36] we should have killed dobson already [09:26:16] matanya: we would not have a tampa dns recursor then. Could have worked, but it could have backfired. Plus no reason to have extra latency for every tampa server wanting to do a DNS query [09:27:22] if we keep saying things depend on each other we will [09:27:32] never be out of tampa ... [09:27:47] ahaha. not true [09:27:57] only if circular dependencies exist [09:28:04] and this is not one [09:28:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [09:31:12] anyway, my view was to get rid of every not "must" host. apparently, ops team as a hole disagree :) [09:31:20] whole [09:35:55] it is a "must" host. That is what I am saying. 
I don't think we otherwise disagree [09:40:03] ok :) [09:40:47] <_joe_> and if we do, let's brawl [09:46:41] (03PS3) 10Matanya: dns recurses: add firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/133515 [11:08:14] (03PS1) 10Alexandros Kosiaris: Avoid connection tracking for DNS recursors [operations/puppet] - 10https://gerrit.wikimedia.org/r/134071 [11:09:57] i thought we agreed on not putting ferm on the dns servers? [11:12:02] mark: that was LVS IIRC [11:12:10] and dns and others [11:13:14] hmmm so either I have memory issues and I need to see a doctor ASAP or I was not present at this discussion(probably my fault) [11:13:58] IRC discussion? should I start searching backlog ? [11:14:06] yeah was irc [11:14:14] anyway, what's the point of having a firewall on such boxes? [11:14:53] protecting SSH from the rest of the universe? [11:15:46] just annoying [11:17:47] to protect ssh a simple router acl with some exemptions would be easier [11:19:10] (03PS3) 10Hashar: zuul: compress log daily [operations/puppet] - 10https://gerrit.wikimedia.org/r/127230 (https://bugzilla.wikimedia.org/63935) [11:19:37] well that is bypassed. just hack the i.e. the blog in the past, some other vulnerable app now and in the future [11:19:45] (03CR) 10Hashar: [C: 031] "Looks fine on integration-dev.eqiad.wmflabs.org:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127230 (https://bugzilla.wikimedia.org/63935) (owner: 10Hashar) [11:19:58] granted we got only keys and all that jazz [11:20:39] but still, why have a service that does not need to, exposed ? 
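The conntrack sizing quoted earlier in the log (30 s default UDP timeout, 64k max entries, roughly 700 qps per recursor) can be checked with a quick back-of-the-envelope calculation. This is only a sketch under a pessimistic assumption that is not stated in the log: every query creates its own tracking entry and that entry lives for the full timeout. "64k" is assumed to mean 65536.

```python
# Back-of-the-envelope check of the conntrack numbers quoted in the log.
UDP_TIMEOUT_S = 30        # default conntrack timeout for UDP, as quoted in the log
MAX_ENTRIES = 64 * 1024   # "64k" max entries, assumed here to mean 65536
QPS = 700                 # queries per second per recursor, as quoted in the log


def steady_state_entries(qps: float, timeout_s: float) -> float:
    """Worst case: each query gets its own entry that lives for the full timeout."""
    return qps * timeout_s


entries = steady_state_entries(QPS, UDP_TIMEOUT_S)
fill_qps = MAX_ENTRIES / UDP_TIMEOUT_S  # query rate at which the table would fill

print(f"estimated entries: {entries:.0f}")
print(f"table utilisation: {entries / MAX_ENTRIES:.0%}")
print(f"qps needed to fill the table: {fill_qps:.0f}")
```

Under these assumptions the estimate matches the "21k" figure from the log and stays below the table limit, which fits the view that NOTRACK is a precaution here and not an urgent fix.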
[11:20:50] need to be* [11:24:51] oh whatever [11:25:01] i don't touch those boxes anymore anyway [11:30:35] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon May 19 11:30:29 UTC 2014 [11:32:09] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor comment" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133663 (owner: 10Dzahn) [11:39:15] (03Abandoned) 10Hashar: zuul: compress log daily [operations/puppet] - 10https://gerrit.wikimedia.org/r/127230 (https://bugzilla.wikimedia.org/63935) (owner: 10Hashar) [12:01:11] (03CR) 10Alexandros Kosiaris: [C: 032] "The rsyncd.conf man page says:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133647 (owner: 10Dzahn) [12:01:22] !log Running deleteEqualMessages.php on suwiki (bug 43917) [12:01:27] Logged the message, Master [12:29:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [13:07:47] !log Running deleteEqualMessages.php on zh_yuewiki (bug 43917) [13:07:52] Logged the message, Master [13:17:13] !log Running deleteEqualMessages.php on zh_min_nanwiki (bug 43917) [13:17:16] Logged the message, Master [13:34:06] !log Running deleteEqualMessages.php on mtwiki (bug 43917) [13:34:10] Logged the message, Master [13:49:34] greg-g, manybubbles: Opinion on https://gerrit.wikimedia.org/r/#/c/133873/2 as a SWAT? My first reaction is "why are we SWATting new CSS files and i18n changes when neither has to do with the bug referenced on the Deployments page? Aren't we trying to get away from the 'just update the extension to master' model?". [13:50:49] akosiaris: have you revisited those ruby packages post-hackathon? [13:50:50] anomie: hmm - yeah, I'd much prefer a cherry picked fix to the specific problem. Especially if it doesn't include i18n problems [13:50:57] because then no scap is required [13:51:16] so scope and timing [13:52:16] andrewbogott: yes but only a little bit. I did build some of the dependencies. 
Not yet ready to give you a full thumbs up though. I am working on it as we speak, maybe I'll have something before the meeting [13:52:39] ok, now that I know I'm not off-base... [13:52:44] akosiaris: great! Just wanted to make sure you hadn't surrendered :) [13:52:44] Krinkle, James_F|Away: Why is https://gerrit.wikimedia.org/r/#/c/133873/2 being SWATted (with i18n and other unrelated changes) instead of just https://gerrit.wikimedia.org/r/#/c/133738/? [13:52:48] thanks for working on this [13:53:09] anomie: Because it's a separate library [13:53:41] The dependency tree is complex enough as it is, we're not going to backport individual patches 5 layers down the stream. [13:59:56] Krinkle: I'm not sure I buy that as an excuse to be pulling in unrelated changes for a SWAT, since we stopped allowing that excuse for extensions. And it's not like we have an upstream here that is completely separate from MediaWiki. But we can see what greg-g says. [14:00:56] There are no unrelated changes other than the i18n build. The third commit you see listed in the changelog is not related to the files being pulled in (the script we use to generate those commits doesn't filter out commits that don't touch dist files) [14:01:38] Also, unlike extensions, it is not a submodule. This would require hotpatching the file in a way that is virtually unmaintainable or trackable in a way that doesn't require something not in any documented workflow. [14:02:10] We've always done oojs and jquery ui updates like this so far. I'm not happy with it per se, but it's not new. [14:02:31] Krinkle: Adding a 1172-line CSS file is related but never mentioned on bug 65373? [14:02:43] Not related, Roan messed up the commit generation. That file is completely unreferenced. [14:03:17] The only change is in oojs-ui.js where it changes 3 lines. The rest is i18n and git version. [14:03:29] Not pretty I know [14:16:33] (03PS2) 10John F. 
Lewis: Redirect usne to us-ne [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) [14:21:58] (03PS1) 10Giuseppe Lavagetto: compare-puppet-catalogs: new puppet catalog diff [operations/software] - 10https://gerrit.wikimedia.org/r/134090 [14:22:02] (03CR) 10jenkins-bot: [V: 04-1] compare-puppet-catalogs: new puppet catalog diff [operations/software] - 10https://gerrit.wikimedia.org/r/134090 (owner: 10Giuseppe Lavagetto) [14:22:22] (03CR) 10John F. Lewis: Add usne DNS (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/133992 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [14:31:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 19 11:30:29 2014 [14:32:27] (03PS2) 10Giuseppe Lavagetto: compare-puppet-catalogs: new puppet catalog diff [operations/software] - 10https://gerrit.wikimedia.org/r/134090 [14:35:23] anomie: I've got a deploy that starts five minutes ago - I'm going to do it. Just turns another wiki on with Cirrus as primary. Not too many users. [14:35:31] * manybubbles has the conch [14:35:41] (03CR) 10Manybubbles: [C: 032] Cirrus as default for zh_yuewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133840 (owner: 10Manybubbles) [14:35:53] (03Merged) 10jenkins-bot: Cirrus as default for zh_yuewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133840 (owner: 10Manybubbles) [14:36:02] <_joe_> manybubbles: are you performing a release now? [14:36:18] <_joe_> if so, please ping me once you've finished [14:36:19] <_joe_> I got to check the state of a server [14:36:19] _joe_: sure - you need me to wait? 
[14:36:25] <_joe_> manybubbles: no [14:36:43] <_joe_> but we have one server that is off - mw1151 specifically [14:36:54] <_joe_> so I need to check it stays off after the deployment [14:37:04] <_joe_> (it's a bits server) [14:37:54] _joe_: mw1171: Connection timed out during banner exchange [14:38:01] (03CR) 10Giuseppe Lavagetto: [C: 032] compare-puppet-catalogs: new puppet catalog diff [operations/software] - 10https://gerrit.wikimedia.org/r/134090 (owner: 10Giuseppe Lavagetto) [14:38:37] !log manybubbles synchronized cirrus.dblist 'Switch cirrus to the primary backend for zh-yue wikipedia' [14:38:42] Logged the message, Master [14:39:25] PROBLEM - DPKG on mw1171 is CRITICAL: Timeout while attempting connection [14:39:55] PROBLEM - check if dhclient is running on mw1171 is CRITICAL: Timeout while attempting connection [14:39:55] PROBLEM - RAID on mw1171 is CRITICAL: Timeout while attempting connection [14:39:55] PROBLEM - puppet disabled on mw1171 is CRITICAL: Timeout while attempting connection [14:39:55] PROBLEM - Disk space on mw1171 is CRITICAL: Timeout while attempting connection [14:39:55] PROBLEM - check configured eth on mw1171 is CRITICAL: Timeout while attempting connection [14:40:07] <_joe_> mmmh this does not look good [14:40:15] PROBLEM - twemproxy process on mw1171 is CRITICAL: Timeout while attempting connection [14:40:15] PROBLEM - twemproxy port on mw1171 is CRITICAL: Timeout while attempting connection [14:40:42] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'touched and synced InitializeSettings.php to make update to cirrus.dblist take hold' [14:40:47] Logged the message, Master [14:40:56] <_joe_> looking into it [14:41:21] thanks! [14:41:28] otherwise, I'm done [14:41:58] PROBLEM - SSH on mw1171 is CRITICAL: Server answer: [14:43:46] <_joe_> manybubbles ok - this server has died but is unrelated [14:43:53] <_joe_> 1171 vs 1151 [14:44:12] _joe_: sweet! thanks. It'll need to be depooled and resynced when it comes back to life. 
[14:44:19] I can't depool [14:44:24] I don't think [14:45:44] <_joe_> manybubbles: we need to depool & resync it? [14:45:48] <_joe_> we may do that now [14:46:06] _joe_: well, its out of sync [14:46:06] <_joe_> (I can depool it once it restarts, now I'm about to powercycle it [14:46:24] <_joe_> manybubbles: can you resync it in ~ 3 mins when I ask you? [14:46:44] _joe_: sure. I'll just reperform my deploy [14:47:17] <_joe_> wait here then :) [14:48:25] PROBLEM - Apache HTTP on mw1171 is CRITICAL: Connection timed out [14:48:28] (03CR) 10BBlack: [C: 031] Remove mw1151 from the bits appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134059 (owner: 10Giuseppe Lavagetto) [14:49:45] RECOVERY - Disk space on mw1171 is OK: DISK OK [14:49:45] RECOVERY - check configured eth on mw1171 is OK: NRPE: Unable to read output [14:49:55] RECOVERY - SSH on mw1171 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [14:50:06] RECOVERY - twemproxy process on mw1171 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [14:50:06] RECOVERY - twemproxy port on mw1171 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [14:50:15] RECOVERY - DPKG on mw1171 is OK: All packages OK [14:50:18] welcome back mw1171 [14:50:24] _joe_: why did it die ? 
[14:50:35] <_joe_> manybubbles: go on :) [14:50:45] RECOVERY - check if dhclient is running on mw1171 is OK: PROCS OK: 0 processes with command name dhclient [14:50:45] RECOVERY - RAID on mw1171 is OK: OK: no RAID installed [14:50:45] RECOVERY - puppet disabled on mw1171 is OK: OK [14:50:49] going [14:50:51] on [14:51:50] !log manybubbles synchronized cirrus.dblist 'Switch cirrus to the primary backend for zh-yue wikipedia - resyncing to mw1171' [14:51:55] Logged the message, Master [14:51:58] <_joe_> !log powercycled mw1171, dead and serial console stuck [14:52:02] Logged the message, Master [14:52:15] RECOVERY - Apache HTTP on mw1171 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.059 second response time [14:52:21] thats pretty dead [14:52:26] <_joe_> matanya: no idea, there was a blank console [14:52:33] manybubbles: You going to do the (rest of the?) SWAT today? [14:52:34] <_joe_> looking at syslog [14:52:44] anomie: sure [14:52:50] * matanya guess xfs yet again :P [14:52:50] already did a regular deploy [14:53:14] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'touched and synced InitializeSettings.php to make update to cirrus.dblist take hold - resyncing to mw1171' [14:54:12] manybubbles: Don't forget to run updateCollation.php after deploying 132975 (always good to state the obvious, just in case). [14:54:21] <_joe_> matanya: no. This time it's maybe load related [14:54:23] James_F|Away, twkozlowski, and me: are you around to support your deploy [14:54:28] manybubbles: yes, I'm here [14:54:44] anomie: thanks. I've never actually done that before. any tips or just run the script and stare at it? [14:54:58] <_joe_> nothing got written to syslog so either a) the server was just stuck in some very heavy operations b) a bad kernel panic has happened [14:55:02] <_joe_> I vote b) [14:55:28] manybubbles: I've never either. Presumably just run it via mwscript against cswiki and stare at it. [14:56:08] anomie: sounds good. 
I just did it in vagrant on my dev machine. took it a few seconds to run through the 5084 pages I've imported [14:56:15] no big deal then [14:56:15] _joe_: no apparent load: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Application+servers+eqiad&h=mw1171.eqiad.wmnet&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [14:56:46] <_joe_> matanya: it was not registering data [14:57:00] <_joe_> the server became unresponsive in that phase [14:57:26] hence apparent :) nothing i can see [14:58:50] <_joe_> file under: shit happens [15:02:18] * manybubbles still has the conch. [15:02:31] since no one has responded about supporting their swat deploy, I'll do mine first [15:03:22] anomie: I think I agree with you. that seems worthy of either just waiting for the train or requesting a window (iow: I don't see the immediate need for such a change being deployed *now* without reference to bugs etc) [15:03:48] greg-g: you want me to pull it out of today? deploy then? [15:05:01] am I confused, I don't see https://gerrit.wikimedia.org/r/#/c/133873/2 on today's list? [15:05:07] are you talking about the VE change? [15:05:15] * greg-g just read anomie's ping out of context [15:05:21] * greg-g hasn't read anything else yet [15:05:21] twkozlowski: poke [15:05:27] greg-g: The bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=65373, FYI. [15:05:39] greg-g: its today I believe [15:05:42] good morning robla [15:05:53] howdy [15:06:07] anomie: ah, that helps [15:06:24] greg-g: I just don't like the precedent of going back to "update copy-pasted library mainly developed by WMF people to master to fix a bug", since we're explicitly getting away from that for extensions. [15:06:33] if "New in wmf5, happens when opening any dialog." is right, then sure [15:06:50] * greg-g nods to anomie [15:07:23] yeah, I understand/agree, what would be a better solution for this one? is there a smaller cherrypick that could happen instead? 
[15:07:38] (brb, putting lunch in fridge) [15:08:10] greg-g: There were no other code changes, 1 code change and an i18n update that happened in between. OOjs UI doesn't have wmf branches because it isn't loaded as a submodule because it requires compilation/building into a distribution file. [15:08:20] greg-g: https://gerrit.wikimedia.org/r/#/c/133738/ probably [15:08:28] So the only commit we can cherry-pick from master to the wmf branch of mediawiki is the lib update [15:08:56] We'd have to make a custom build, which then creates a maintenance hazard because you'd have to somehow keep track of which patches were applied. [15:09:16] I'm happy to discuss this further, but I think at the moment this is just snafu. [15:09:46] greg-g: At the moment, I might recommend letting this one go but figuring out a better solution for next time [15:10:06] fair [15:10:32] * anomie doesn't want to leave VE broken [15:10:50] !log manybubbles synchronized php-1.24wmf5/extensions/CirrusSearch/ [15:10:51] Krinkle: this sounds like what wikidata does (ish, pre me drinking much coffee), and they make builds of only small cherrypicks all the time [15:10:54] anomie: right right [15:10:55] Logged the message, Master [15:12:11] Krinkle: just to say: brad expressed the "why is this different than extensions just updating to master" point well, and let's figure out how to make oojs follow suit. [15:12:14] !log SWAT deployed cirrus update for wmf5 and looks good. doing for wmf4 now. [15:12:18] Logged the message, Master [15:13:26] * anomie notes aude might be able to give insight into wikidata's backporting process that greg-g mentioned 6 lines up, if she happens to be around [15:13:38] greg-g: it's not an extension, and doesn't extend mediawiki, and it's not a php library, and it isn't executed server side, it's client-side js only lib, standalone, and requires building/compilation. I don't think that "suit" exists in a way that is applicable or relevant to this library. 
If anything, the few libraries like it that we do have are all updated this way (any jQuery plugin we have, [15:13:38] any grunt plugin or npm package we have, in case of npm we can't even do a patch, all you get is a version number) [15:14:21] so it'd be a long-term infrastructure change that I'm all for, but has very little gain and is unlikely to get done or prioritised. [15:14:36] being pragmatic of course [15:14:44] your first sentence isn't really relevant. just fyi. :) [15:15:25] the rest let's chat about, but for now, fix VE :) [15:15:44] anomie: I assume I'm going to have to scap to get Krinkle's change out there because of the i18n. or can we do it sans-scap and let the i18n ride the next scap? that might be a bit scary [15:15:52] Only needs a sync-file afaics [15:16:01] manybubbles: Don't fear scap. [15:16:16] or sync-dir rather (resources/lib/oojs-ui) [15:16:27] manybubbles: It'd scare me too. I'd say scap, absent someone who knows better like bd808 saying it's not necessary [15:17:08] I don't think it needs scapping [15:17:25] There is the voice of knowledge :) [15:17:48] bd808: I haven't scapped since the new stuff you've been doing, is it still extremely slow? [15:18:05] Krinkle: is sync-file only because the i18n stuff is only exposed via js? [15:18:19] !log manybubbles synchronized php-1.24wmf4/extensions/CirrusSearch/ 'adding url parameter to suppress snippets and one to suggest suggestions to cirrus' [15:18:24] Logged the message, Master [15:18:31] manybubbles: no, sync-dir is enough because the message changes do not overlap or relate in any way. [15:18:53] anomie: It depends on how much l10n is changing. No l10n is fast (<5m) Full l10n rebuild is ~20-25m [15:18:58] the i18n files are read by mw localisation message cache like any other [15:19:23] e.g. they won't be read since we cache it and this wouldn't update it. 
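The point made above, that synced i18n source files are not picked up until the localisation cache is rebuilt, can be illustrated with a toy model. This is only a sketch: `MessageCache`, `sync_dir`, `scap` and the message key are hypothetical stand-ins chosen for illustration, not MediaWiki's or the deployment tooling's actual code.

```python
from typing import Dict, Optional


class MessageCache:
    """Toy stand-in for a prebuilt localisation cache (hypothetical, not MediaWiki code)."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = dict(sources)     # i18n source files "on disk"
        self.cache = dict(self.sources)  # snapshot taken at the last full rebuild

    def get(self, key: str) -> Optional[str]:
        # Reads are served from the cache only; the sources are never consulted.
        return self.cache.get(key)

    def rebuild(self) -> None:
        self.cache = dict(self.sources)


def sync_dir(mc: MessageCache, new_sources: Dict[str, str]) -> None:
    """Model of a plain file sync: sources change, the cache is left alone."""
    mc.sources.update(new_sources)


def scap(mc: MessageCache, new_sources: Dict[str, str]) -> None:
    """Model of a full deploy: sources change and the cache is rebuilt."""
    mc.sources.update(new_sources)
    mc.rebuild()


mc = MessageCache({"example-dialog-close": "Close"})  # hypothetical message key
sync_dir(mc, {"example-dialog-close": "Close dialog"})
after_sync = mc.get("example-dialog-close")           # still the old text
scap(mc, {})
after_scap = mc.get("example-dialog-close")           # new text after the rebuild
print(after_sync, "|", after_scap)
```

In this model the changed message only becomes visible after the next full rebuild, which is the "ride the next scap" situation being weighed in the conversation.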
[15:20:05] Krinkle: my point was that if this update accidentally poisons the next update it'll be harder to find if we just sync-dir it [15:20:10] but its not that big a deal [15:20:15] An l10n rebuild requires a metric ton of file stat calls to build the CDB files and then rsync of many fairly large json files. [15:20:26] since Reedy'll probably do the next scap and he'll know what is up [15:20:33] Having the l10n cache out of date with the deployed i18n sources is the sort of thing I would worry about. But I bow to Reedy's expertise. [15:20:33] Krinkle: merging, btw [15:20:38] k [15:21:03] manybubbles: yeah, I'm familiar with that cascading flow of events. wikidata is infamous for breaking the site in a delayed way that way [15:21:32] twkozlowski: ping again, your time is coming (maybe 10 minutes now) [15:23:58] (03PS1) 10Rush: diamond to remaining lvs servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134100 [15:24:04] "your time is coming" sounds ominous. [15:25:37] Coren: so is yours [15:26:08] Is it hammertime? :-) [15:26:09] (03CR) 10Rush: [C: 032 V: 032] "tabled from monday, I'm around to babysit, where are you jenkins?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134100 (owner: 10Rush) [15:26:29] !log manybubbles synchronized php-1.24wmf5/resources/lib/oojs-ui/ 'fix panellayout' [15:26:33] Logged the message, Master [15:26:59] Krinkle: you are synced [15:27:00] James_F: you too [15:27:04] please make sure all is well [15:27:23] manybubbles: Thanks! [15:27:44] * Coren needs moar contekts! [15:27:52] twkozlowski: your time has arrived [15:27:59] Coren: context is for suckers [15:28:33] manybubbles: You should really say "Your doom is at hand!" That's even better. :-) [15:28:41] what's the problem? [15:28:52] aude: your doom is at hand! 
[15:29:20] aude: anomie had a question about how you backport small changes to wikidata [15:29:41] https://wikitech.wikimedia.org/wiki/How_to_deploy_Wikidata_code#Deploy_a_hotfix [15:29:44] generally [15:30:04] what needs backporting? [15:30:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [15:30:39] aude: Apparently oojs has a problem where they don't want to cherry pick a fix for backporting because it's "too hard" to keep things straight. greg-g pointed out that Wikidata does that already, so I mentioned you as a reference in case the discussion continued. [15:30:52] ok [15:31:00] anomie: Wat? [15:31:35] anomie: Who said that? I'm confused. Given a cherry-pick for OOjs UI just landed all of 6 minutes ago… [15:32:19] James_F: That SWAT deploy included a bunch of unrelated i18n changes and a whole new CSS file (that I'm told is unreferenced from anything else). We'd rather not have all the extra changes riding along on a bugfix backport, we're getting away from that for extensions. [15:32:20] we'll have some jquery fixes later, btw [15:33:12] anomie: Sure, but "it's "too hard" to keep things straight" is just a flat-out lie. [15:33:43] anomie: It's obviously trivial to create the smallest possible back-port if it's needed. [15:34:14] James_F: Perhaps I misrepresented what was said, but Krinkle was saying it's too much work to do a build with just the one cherry-picked change for merging into core. [15:34:30] backports for us are simple (to wikibase), then it's just a matter of making a new build (which we might be able to automate a bit more) [15:35:23] Not so much too much work, but a maintenance hazard. As you'd merge locally, build/compile, sync to mediawiki-core *and* visualeditor and whatever else uses it in wmf environment, and then forget to not override next time you do it. It's like keeping security patches on tin, except without git. 
[15:35:35] (03Abandoned) 10QChris: test gerrit->bz notification [operations/puppet] - 10https://gerrit.wikimedia.org/r/133589 (https://bugzilla.wikimedia.org/65370) (owner: 10Dzahn) [15:36:46] twkozlowski: poke again for https://gerrit.wikimedia.org/r/#/c/132975/ [15:36:48] anomie: Of course, there's the entirely different i18n sync problem, but… [15:37:47] Krinkle: Which is why I thought aude might be able to help by discussing with you how wikidata handles that. [15:38:27] I don't think that it requires any brainstorming to figure out. It's pretty straightforward. Implementation, documentation and adoption is another matter. [15:38:41] You'd start creating wmf branches in upstream libs (and libs we don't control, we'd have to mirror or fork). [15:38:41] manybubbles: Hi. [15:38:47] And compile from that branch [15:38:57] twkozlowski: welcome! I'll sync your change now that I know you are around [15:38:59] Sorry about that, manybubbles [15:39:06] (03CR) 10Manybubbles: [C: 032] Set $wgCategoryCollation to 'uca-cs' on cswiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132975 (https://bugzilla.wikimedia.org/64885) (owner: 10Odder) [15:39:12] manybubbles: There's a script that needs to be run, too [15:39:14] (03PS9) 10Rush: admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 [15:39:21] it sounds like what is holding it back is how dev happens? you mentioned "syncing to mw-core *and* VE and whatever else uses it (eg MV)", but... why should they dev against anything other than what's in master in -core? [15:39:24] (03CR) 10Rush: [C: 032 V: 032] admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [15:39:27] manybubbles: Uhhh [15:39:27] No!
[15:39:30] (03Merged) 10jenkins-bot: Set $wgCategoryCollation to 'uca-cs' on cswiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132975 (https://bugzilla.wikimedia.org/64885) (owner: 10Odder) [15:39:52] Reedy: no? [15:40:10] Hm [15:40:14] greg-g: Libraries aren't mastered inside MediaWiki. We don't complain about jQuery being mastered outside of MediaWiki; [15:40:18] I can revert if bad? [15:40:22] grrrit-wm: James_F: greg-g: Well, if you want to have accurate tests during backports (seems like a very fragile time not to have tests), you'd need the VE branch of VisualEditor to execute its standalone tests against the right oojs-ui version [15:40:39] I'll let James_F handle it :) [15:41:48] greg-g: Process is currently: Library development happens; dev (or normally me) builds a new version of the library's state; it's tested, and once working, merged. [15:42:03] James_F: so, this https://gerrit.wikimedia.org/r/#/c/133873/2 suggests to me that oojs lives in the mw-core repo, maybe there's an upstream somewhere else I don't care about, but I care about that (mwcore). Why should it make it hard to just update that with some cherrypick from upstream? Why do the other teams write against something that's not in mwcore? [15:42:20] greg-g: There's a minor wrinkle that the library has to be imported into two different repos (VisualEditor stand-alone), but that's irrelevant for this. [15:42:30] yeah, I don't care about that :) [15:42:45] (officially. professionally. personally? sure ;) ) [15:43:02] greg-g: It's not hard. It's trivial. Krinkle should have only imported f5e6413 as a stand-alone pull for the cherry-pick. [15:43:14] so then why are we still debating this? [15:43:17] greg-g: But it does mean that code in cherry-pick is different from any of the tested code. [15:43:23] Reedy: waiting on you [15:43:24] greg-g: Which is Not Good™. [15:43:36] manybubbles: so the collation update maintenance script has a habit of upsetting mysql. 
https://bugzilla.wikimedia.org/show_bug.cgi?id=56041 [15:43:41] James_F: that sounds like a testing procedure problem that should be fixed regardless [15:43:47] Reedy: ah. [15:44:14] greg-g: You're welcome to change how the weekly deployment system happens to suit us, but others might not be impressed. :-) [15:44:14] 'backport' to mwcore master, test on beta cluster, THEN backport to wmfX [15:44:22] why wouldn't the above work? [15:44:33] That's not a backport. [15:44:35] That's a forwards-break. [15:44:36] Reedy: cswiki has < 300k articles, does this problem occur on such wikis? [15:44:43] cswiki isn't too large, but springle suggested we disabled the adaptive hash stuff on the specific master whilst running the script [15:44:47] I'm just wondering that [15:45:06] We're talking about master being on version A -> D -> F, and you want to cherry pick an A -> C change. [15:45:15] (03PS9) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 [15:45:17] (03PS1) 10Rush: removing automatic cleanup logic. [operations/puppet] - 10https://gerrit.wikimedia.org/r/134101 [15:45:32] I know this went really wrong with frwiki, but that one is > 1,000,000 [15:46:20] EXPLAIN says 730976 pages, 1248837 categorylinks [15:46:38] Reedy: k - I've got people poking me to go to lunch now. maybe I should revert? [15:46:41] I haven't synced it [15:46:47] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [15:47:09] Might be safer rather than potentially taking out the s2 master :) [15:47:12] (03CR) 10Rush: [C: 032 V: 032] "will be reverted later" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134101 (owner: 10Rush) [15:47:35] Reedy: thanks for catching it. reverting twkozlowski following along with Reedy and my conversation? [15:47:41] James_F: how often is oojs updated in mwcore? [15:47:42] I'm not really sure where to go from here.... 
[15:47:53] would probably need springle to have a look at it [15:47:53] greg-g: 4-6 times a week. [15:48:00] greg-g: Depending on other work. [15:48:04] manybubbles: Scheduling it in conjunction with ops/springle [15:48:11] and there's an "upstream" repo that it is pulled from? [15:48:28] (03PS1) 10Manybubbles: Revert "Set $wgCategoryCollation to 'uca-cs' on cswiki" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134102 [15:48:34] greg-g: Yes. [15:48:35] (03PS2) 10Manybubbles: Revert "Set $wgCategoryCollation to 'uca-cs' on cswiki" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134102 [15:48:48] (03CR) 10Rush: "go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134101 (owner: 10Rush) [15:48:49] Reedy: so is there any 'safe' number of pages that we can run the script on? [15:48:53] * anomie goes to update [[mw:Manual:$wgCategoryCollation]] and [[mw:Manual:UpdateCollation.php]] so we don't have to rely on Reedy happening to notice and stop us. [15:49:15] I don't think there is a magic number [15:49:27] Large wiki might work [15:49:27] It might not [15:49:28] (03CR) 10Manybubbles: [C: 032] Revert "Set $wgCategoryCollation to 'uca-cs' on cswiki" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134102 (owner: 10Manybubbles) [15:49:36] (03Merged) 10jenkins-bot: Revert "Set $wgCategoryCollation to 'uca-cs' on cswiki" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134102 (owner: 10Manybubbles) [15:49:36] James_F: so, the jerk in me says "sounds like you're too tightly coupled to a finicky upstream" ;) but I don't have an answer :) [15:49:45] ^demon|away: yo homeslice, any idea what 'Submitted, Merge Pending' suck in gerrit means? [15:49:51] cswiki is nowhere in large.dblist [15:49:53] stuck in gerrit even [15:50:04] greg-g: I have a similar comment about downstream users. 
:-) [15:50:09] :P [15:50:25] (03PS1) 10Manybubbles: Revert "Revert "Set $wgCategoryCollation to 'uca-cs' on cswiki"" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134103 [15:51:38] twkozlowski: That definition of "large" is one Gloria chose [15:52:47] different wikis vary by different metrics [15:54:22] I see; sorry about the trouble [15:54:38] (03CR) 10Rush: "looked at this with faidon who said and I quote "no back doors here that I see!". this was never meant to merge, just needed space in the" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [15:54:48] (03Abandoned) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [15:55:29] well I've logged off of tin. I'm going to get lunch. I sent an email about the script to sean (cc reedy and twkozlowski) and I've opened up the script so I can review it [15:55:37] twkozlowski: it's fine :) [15:55:45] twkozlowski: for perspective, plwiki had ~3kk categorylinks, while frwiki had ~20kk (and it's only ~50% larger by article count) [15:56:22] (and enwiki has like ten times that) [15:56:28] it depends on community's attitude towards a ton of unnecessary categories ;) [15:56:50] (refs: https://bugzilla.wikimedia.org/show_bug.cgi?id=42413 https://bugzilla.wikimedia.org/show_bug.cgi?id=54680) [15:58:40] cs has 1.3M according to Reedy, so I guess that would take about 12-13 hours if the speed is same as with plwiki [15:59:33] (03PS2) 10Rush: removing automatic cleanup logic. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/134101 [15:59:58] (03CR) 10Rush: [V: 032] "try again" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134101 (owner: 10Rush) [16:00:18] twkozlowski: the script was changed to be a little slower, but a lot less disruptive since then (but i have no idea how slower) – it now processes entire categories together instead of going by what is effectively arbitrary order, so users will rarely see messed up category orderings during the script run [16:00:53] (that change probably doesn't affect the database "health" measurably) [16:01:12] Category:Living persons on enwiki ;) [16:01:18] yeah well [16:01:19] as i said [16:01:27] :D [16:01:28] the performance depends on the wikis not being stupid with their categories [16:01:36] :> [16:09:38] (03CR) 10Rush: "I'm hoping to go on a binge of cleanup which would cover this logic in the end: https://gerrit.wikimedia.org/r/#/c/129501/ may want to ho" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133663 (owner: 10Dzahn) [16:46:43] (03CR) 10MaxSem: "And where will this constant be expanded?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) (owner: 10Withoutaname) [16:51:59] (03PS1) 10Reedy: Move dblists to their own folder [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134117 [16:52:14] (03CR) 10jenkins-bot: [V: 04-1] Move dblists to their own folder [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134117 (owner: 10Reedy) [16:53:20] (03PS1) 10Reedy: Simplify wmf-config listings in createTxtFileSymlinks.sh [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134118 [16:53:40] Reedy: relatedly, did you ever explore have per-phase files representing the wiki deploy groups? similar to eg the largewikis list idea. [16:54:22] when is swat deploy? 
[16:54:44] 23:00 utc is the next [16:54:58] also if you load up https://wikitech.wikimedia.org/wiki/Deployments#Near-term it'll tell you in your local timezone (that your computer reports) [16:55:01] thanks to mwalker :) [16:55:11] * aude timezone challenged [16:55:23] greg-g: Not really.. I created group0.dblist [16:55:35] Maybe I should create a script that concats the group1 dblists on the fly [16:56:36] and/or a wrapper script to updateWikiversions [16:57:13] Reedy: if it makes sense, I just remembered we talked about it a long time ago [16:57:26] (03PS1) 10MaxSem: Redirect tablets to mobile site, currently scheduled for June 17 [operations/puppet] - 10https://gerrit.wikimedia.org/r/134119 [16:59:59] (03PS2) 10Reedy: Move dblists to their own folder [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134117 [17:00:01] (03PS1) 10Reedy: Remove gettingstarted-with-category-suggestions.dblist, seems unused [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134120 [17:00:10] (03CR) 10jenkins-bot: [V: 04-1] Move dblists to their own folder [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134117 (owner: 10Reedy) [17:00:14] (03CR) 10jenkins-bot: [V: 04-1] Remove gettingstarted-with-category-suggestions.dblist, seems unused [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134120 (owner: 10Reedy) [17:06:00] (03CR) 10Anomie: [C: 04-1] "Constants are not expanded inside strings." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) (owner: 10Withoutaname) [17:10:21] (03PS1) 10Filippo Giunchedi: update legal terms for dumps.wm.o [operations/puppet] - 10https://gerrit.wikimedia.org/r/134121 [17:13:21] (03PS1) 10Reedy: Add script to update version of all group1 wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134124 [17:14:10] (03CR) 10Mattflaschen: [C: 04-1] "We use this with foreachwikiindblist and GettingStarted/maintenance/populate_categories.php" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134120 (owner: 10Reedy) [17:14:48] (03Abandoned) 10Reedy: Remove gettingstarted-with-category-suggestions.dblist, seems unused [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134120 (owner: 10Reedy) [17:21:25] (03PS2) 10Giuseppe Lavagetto: Remove mw1151 from the bits appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134059 [17:23:44] (03CR) 10Giuseppe Lavagetto: [C: 032] Remove mw1151 from the bits appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/134059 (owner: 10Giuseppe Lavagetto) [17:30:35] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon May 19 17:30:24 UTC 2014 [17:41:23] (03PS1) 10Gerrit Patch Uploader: Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) [17:41:29] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: 10Gerrit Patch Uploader) [17:46:14] matanya: can I bother you to try search in hewiki on beta? 
[17:47:03] (03CR) 10Mattflaschen: [C: 032] "Merge config patch" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133442 (owner: 10Phuedx) [17:49:48] (03Merged) 10jenkins-bot: Enable the anonymous signup invite experiment [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133442 (owner: 10Phuedx) [17:53:13] (03PS2) 10Gerrit Patch Uploader: Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) [17:53:15] (03CR) 10Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: 10Gerrit Patch Uploader) [17:54:57] (03PS3) 10John F. Lewis: Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: 10Gerrit Patch Uploader) [17:55:25] (03CR) 10John F. Lewis: [C: 031] "Looks good." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: 10Gerrit Patch Uploader) [17:56:31] (03PS4) 10Vogone: Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: 10Gerrit Patch Uploader) [17:57:51] (03PS1) 10Alexandros Kosiaris: puppet:self::master use newer package names [operations/puppet] - 10https://gerrit.wikimedia.org/r/134143 [17:59:33] !log mattflaschen Started scap: Deploy GettingStarted and enable experiment for de, en, fr, and it [17:59:38] Logged the message, Master [17:59:51] sure manybubbles [18:00:14] Sorry, I didn't realize until late that we needed to do a scap. [18:01:27] manybubbles: what am i looking for? 
it looks just ok as it was [18:05:23] (03PS2) 10Bsitu: Enable Flow on 3 mediawiki talk pages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133267 (owner: 10Spage) [18:09:03] The scap was going fine, but then we started getting a huge number of errors. [18:09:07] Let me see if I can do a pastebin. [18:10:53] Paste is https://dpaste.de/fknD [18:12:09] Looks like only 1 failed. [18:12:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 21.43% of data exceeded the critical threshold [500.0] [18:13:51] scap-rebuild-cdbs has failed on 2 already, though. [18:18:27] !log mattflaschen Finished scap: Deploy GettingStarted and enable experiment for de, en, fr, and it (duration: 18m 53s) [18:18:30] Logged the message, Master [18:22:19] Deployment complete. [18:23:48] !log 1 server failed for sync-common. 2 servers failed for sync-rebuild-cdbs [18:23:51] Logged the message, Master [18:26:51] superm401: which ones? [18:27:35] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [18:30:30] greg-g, mw1010.eqiad.wmnet::common for sync-common, not sure for cdb. [18:31:29] (03CR) 10Chad: [C: 031] Update hooks-bugzilla to 5edd392d926daaa58917b1c8bb174cdb022e4c76 [operations/gerrit/plugins] - 10https://gerrit.wikimedia.org/r/133732 (https://bugzilla.wikimedia.org/65370) (owner: 10QChris) [18:31:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [18:32:28] !log mw1010.eqiad.wmnet::common for sync-common, not sure for cdb. (sez superm401) [18:32:33] Logged the message, Master [18:32:34] greg-g, https://dpaste.de/7nEc [18:35:18] matanya: thats good news then. I've switched it from the (somewhat unstable) hebrew analyzer to the more stable unicode one. I'll cut everything over to that soon. I'm traveling the next two week unfortunately. [18:37:49] !log Ran sync-common locally on mw1015 [18:37:52] Logged the message, Master [18:38:33] Reedy: Thanks. 
I was just about to jump in and do that [18:39:12] it looked alright bar php-1.23wmf20 [18:39:53] Reedy: "cannot delete non-empty directory" complaints? [18:39:58] yeah [18:40:14] think it might be worth a dsh rm -rf to fix those [18:40:50] That's the only way to get rid of them unfortunately. [18:41:04] rsync ignores the *.cdb files so they get left behind [18:41:42] (03PS1) 10Nemo bis: Remove plwiki permission for 'autoconfirmed' denied for 'user' [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134155 [18:41:45] hmm, where have all the example dsh commands gone... [18:42:15] `dsh -g mediawiki-installation -M -F 40 -- 'sudo -u mwdeploy -- rm -r /usr/local/apache/common-local/php-1.23wmf20'` should do it [18:43:24] !log rm -rf /usr/local/apache/common-local/php-1.23wmf20 against all apaches [18:43:27] Logged the message, Master [18:43:48] I'll run a scap nooop [18:43:55] * bd808 nods [18:44:13] !log reedy Started scap: nooop to test for errors [18:44:13] Logged the message, Master [18:46:28] Reedy: That dsh command I gave is from https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Remove_left_over_files_from_expired_branches [18:46:46] !log reedy Finished scap: nooop to test for errors (duration: 02m 45s) [18:46:51] Logged the message, Master [18:47:42] (03CR) 10LVilla (WMF): [C: 031] "Have reviewed the text; looks good to me. Did not verify that the CSS/etc. is still accurate, but at least in the version I sent it worked" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134121 (owner: 10Filippo Giunchedi) [18:52:45] Reedy: Looks like we need to run scap-purge-l10n-cache for other branches too: 1.23wmf22 and php-1.24wmf3 [18:53:17] I can do that if nothing else is going on [18:54:24] bd808: should all be clear, not sure if Reedy's doing anything [18:55:43] greg-g: Coolio. I'll give Reedy a few minutes to respond. He's idle on tin at the moment.
[18:56:32] * greg-g nods [18:59:06] greg-g: imma gonna JFDI [18:59:30] !log bd808 Purged l10n cache for 1.23wmf21 [18:59:35] Logged the message, Master [18:59:51] !log bd808 Purged l10n cache for 1.23wmf21 [19:00:13] Second time I actually had my agent forwarded :) [19:00:44] hah [19:00:46] !log bd808 Purged l10n cache for 1.23wmf22 [19:00:50] Logged the message, Master [19:01:20] !log bd808 Purged l10n cache for 1.24wmf3 [19:01:24] Logged the message, Master [19:01:35] greg-g: {{done}} [19:11:52] (03CR) 10Filippo Giunchedi: "yup I verified locally, there's a bullet.gif missing for list items and headbg.jpg for the header, however I don't think they were display" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134121 (owner: 10Filippo Giunchedi) [19:16:11] (03CR) 10Jdlrobson: [C: 04-1] "Not to be merged till June 17" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134119 (owner: 10MaxSem) [19:16:12] !log Added display of exception-json events to fatalmonitor logstash dashboard [19:16:17] Logged the message, Master [19:44:05] Does anyone know if/where on IRC I can find Howie? [19:44:08] And/or Henrique Andrade? [19:44:59] andrewbogott: Howie's bad at IRC; I think HenriqueAndrade shows up sometimes. [19:45:02] andrewbogott, howie is off today [19:45:17] OK, I'll just re-email then. Thanks [19:45:35] Oh [19:45:41] andrewbogott: HenriqueCrang [19:45:52] Last seen 2014-05-07 [19:45:57] marktraceur: cool, I'll keep an eye out [19:48:42] godog: ping [19:48:42] gwicke: You sent me a contentless ping. This is a contentless pong. Please provide a bit of information about what you want and I will respond when I am around. [19:49:09] godog: p i n g ;) [19:55:33] greg-g, can we have a hotfix window? [19:55:35] We are having what seems to be a caching issue (https://bugzilla.wikimedia.org/show_bug.cgi?id=65502), and I suspect https://gerrit.wikimedia.org/r/#/c/134165/1/GettingStarted.php will fix it. [19:55:43] It is a small change, but it requires a i18n update. 
[19:59:15] !log Reloading zuul to deploy I0b8051074da39edcac [19:59:20] Logged the message, Master [20:00:43] !log Running deleteEqualMessages.php on enwikinews (bug 43917) [20:00:46] !log Running deleteEqualMessages.php on fowiki (bug 43917) [20:00:49] Logged the message, Master [20:00:54] Logged the message, Master [20:05:34] superm401: fire at will. [20:05:41] !log fix deployed for bug 65501 [20:05:44] Thanks, greg-g [20:05:45] Logged the message, Master [20:23:58] greg-g, I don't need to put the hotfix on the calendar, do I? [20:24:10] s'alright [20:24:57] Thanks, it's a followup to the prior one. [20:27:13] !log updated Parsoid to 3ac048d7c4b [20:27:16] Logged the message, Master [20:27:22] (03PS1) 10Hashar: contint: install python-requests on all hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/134234 [20:29:09] (03CR) 10Andrew Bogott: [C: 032] dynamicproxy: Don't specify text/html for gzip [operations/puppet] - 10https://gerrit.wikimedia.org/r/133265 (owner: 10Yuvipanda) [20:29:20] (03CR) 10Andrew Bogott: [C: 032] Tools: Install git-svn [operations/puppet] - 10https://gerrit.wikimedia.org/r/133266 (owner: 10Yuvipanda) [20:30:08] (03CR) 10Hashar: [C: 031 V: 032] "Deployed on contint puppetmaster in labs. I am not too worried about this change :-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134234 (owner: 10Hashar) [20:31:23] sync-l10nupdate-1 in progress. [20:34:39] (03PS3) 10Dzahn: create account for gtisza (tgr) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133761 [20:37:33] is somebody of the multimedia team here? [20:37:57] what do you usually do on tin when you "get access to EventLogging data" [20:38:50] tgr_: ---^^ [20:39:34] access to tin isn't a requirement; any host on the cluster is fine. it's just that the event stream and the database that collects it aren't accessible via a public interface, so you need to tunnel or proxy. 
[20:39:40] RoanKattouw: :) he doesn't have that access yet [20:39:58] but i'm trying to give it to him .. [20:40:03] ori: thanks! [20:40:06] mutante: You asked for someone from multimedia... [20:40:24] mutante: What do you mean? We don't do anything on tin, it's all on stat1003 and limn1 on labs [20:40:57] np [20:41:04] RoanKattouw: yes, thanks :) i was just making his account [20:41:14] marktraceur: i was quoting an access request [20:41:26] Well...OK then, weird [20:41:30] which said " the ability to ssh to tin.eqiad.wmnet, [20:41:31] which you can request from ops" [20:41:45] so i was trying to figure out what it really means [20:41:46] Maybe they thought tin was the bastion to use? :/ [20:42:03] i suppose so, yea, i'll suggest bast1001 [20:42:25] we have admins::bastions now [20:42:32] mutante: Access to EL just requires stat1003 access or so; but also I wouldn't mind having tgr_ as an extra deployer [20:44:30] marktraceur: if it's EL that translates to: admins::bastion (bast1001) and stat1003 via "special accounts" [20:44:49] OK [20:44:50] mutante: ori said I needed cluster access, I assumed that's the same thing as shell access [20:44:50] if you ask for software deployer, that is admins::mortals with mooar [20:45:10] there's not a lot of documentation on wikitech about those terms [20:45:28] gi11es said something to that effect too [20:46:00] I don't really have an idea myself how EL access works [20:46:21] tgr_: yea, no worries, i'm figuring out what you need [20:46:33] first i'm merging that change that creates your account now [20:46:58] then i'll give you the bastion access so you have something to ssh to and jump to stat1003 [20:47:39] (03PS1) 10Rush: initial data.yaml for users/groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 [20:48:02] the software deployer thing is a different thing though and a different request [20:50:10] robla: Once again I'm forgetting if I actually sent you an email about deployer rights for tgr_ and gi11es, 
or if I dreamt it [20:50:16] (03CR) 10Dzahn: [C: 032] create account for gtisza (tgr) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133761 (owner: 10Dzahn) [20:50:47] marktraceur: I don't remember if there's an RT ticket for that or not, but that's the next step [20:51:06] Right well...I guess I'll do that if tgr_ is interested [20:52:03] I agree it would be useful [20:52:33] we don't have any immediate need for it but it would increase the truck number of the team [20:53:42] !log sync-l10nupdate-1 1.24wmf4 had one error on mw1218 [20:53:45] Logged the message, Master [20:53:48] ori: ? [20:53:54] Does that mean I should re-run it? [20:54:15] ^ greg-g, ^demon|away, Reedy [20:55:08] Created, mutante. Thanks. [20:55:10] robla: marktraceur it's #7506 and Alex K. asked for an approval on it [20:55:22] marktraceur: ok [20:55:30] #7527 [20:55:41] alright [20:55:42] I can't read 7506 [20:56:21] mutante: that's eventlogging [20:57:11] marktraceur: because you aren't requestor, adding you [20:57:39] robla: yea, we can use the same or link it, either or [20:58:58] just me or is gerrit dead? [20:59:06] wfm [21:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 19 17:59:42 2014 [21:02:57] tin slow because of l10nupdate rsyncing? 
[21:05:11] (03PS1) 10Dzahn: add tgr to admins::bastion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134247 [21:05:48] YuviPanda: ^ new upload to gerrit [21:06:00] yeah, it's back up for me [21:08:18] (03PS2) 10Dzahn: add tgr to admins::bastion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134247 [21:10:51] (03CR) 10Dzahn: [C: 032] add tgr to admins::bastion [operations/puppet] - 10https://gerrit.wikimedia.org/r/134247 (owner: 10Dzahn) [21:14:31] tgr_: watching the puppet run, so first step, in a minute you can try to ssh to "bast1001.wikimedia.org" [21:15:33] Accounts::Tgr/Ssh_authorized_key[gtisza@wikimedia.org]/ensure: created [21:15:57] user is tgr as requested, because we match what you have in labs [21:18:14] (03PS1) 10Dzahn: add tgr to stat1003 special accounts [operations/puppet] - 10https://gerrit.wikimedia.org/r/134251 [21:19:26] (03PS2) 10Dzahn: add tgr to stat1003 special accounts [operations/puppet] - 10https://gerrit.wikimedia.org/r/134251 [21:20:50] mutante: works, thx [21:27:46] tgr: alright, good.
so for stat1003 and deployment access, i'm preparing the changes but that has to sit there for the 3 day rule (access requests need to wait 3 days) and then the person on RT duty will handle them, i'll add you as reviewer so you get mail once it's merged [21:29:37] mutante: thanks [21:29:55] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon May 19 21:29:45 UTC 2014 [21:32:36] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Fri May 16 06:03:33 2014 [21:33:07] !log Running deleteEqualMessages.php on commonswiki (bug 43917) [21:33:20] no-op [21:33:22] oh well [21:33:27] Good for Commons :) [21:34:36] !log Running deleteEqualMessages.php on urwiki (bug 43917) [21:34:40] Logged the message, Master [21:37:46] (03PS1) 10Dzahn: add tgr to admins::mortals (deployers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134258 [21:38:15] Krinkle's in urwiki deletin ur msgs [21:39:42] :D [21:52:10] (03CR) 10Dzahn: [C: 031] "checked all the UIDs, not the keys yet, not sure yet if phab instance is going to have public IP" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [21:52:30] (03CR) 10MarkTraceur: [C: 031] "Seems fine."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/134258 (owner: 10Dzahn) [21:53:12] (03CR) 10RobLa: [C: 031] add tgr to admins::mortals (deployers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134258 (owner: 10Dzahn) [21:56:31] (03CR) 10Dzahn: "checked all the UIDs, not the keys yet, not sure yet if phab instance is going to have public IP" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [21:58:42] (03CR) 10Rush: initial data.yaml for users/groups (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [22:06:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] [22:07:04] ARGGHHH I HATE THOSE [22:07:15] we all just got paged [22:09:35] (03CR) 10Dzahn: [C: 04-1] "one key too many in ariel's section" (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [22:10:55] PROBLEM - LDAP on virt1000 is CRITICAL: Connection refused [22:13:29] well, there are exceptions in prod [22:14:12] 'xception from line 1163 of /usr/local/apache/common-local/php-1.24wmf4/includes/db/Database.php: A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script' [22:14:37] Yeah that page means that 7% of all requests were HTTP 500/502/503 [22:15:02] only 150 Database.php exceptions, so not it [22:15:31] tho something is up with wikidatawiki [22:15:37] ori@fluorine:/a/mw-log$ awk '{ print $8 }' dberror.log | sort | uniq -c | sort -rn | head [22:15:37] 1327 wikidatawiki [22:15:37] 417 zh_min_nanwiki [22:15:39] 342 enwiki [22:15:41] ... 
[22:15:55] RECOVERY - LDAP on virt1000 is OK: TCP OK - 0.000 second response time on port 389 [22:16:53] RoanKattouw: i don't think that's correct, though [22:17:27] it's expressing deviation from an expected value [22:18:22] (03CR) 10Dzahn: "that being said, i actually checked the existing keys and they are right (like in admins.pp)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134242 (owner: 10Rush) [22:18:35] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [22:19:22] !log Coren restarted opendj and I restarted pdns on virt1000. Opendj was refusing connections for unclear reasons [22:19:26] Logged the message, Master [22:24:20] (03Abandoned) 10Dzahn: quoted Booleans in rsync::server::module [operations/puppet] - 10https://gerrit.wikimedia.org/r/133645 (owner: 10Dzahn) [22:27:07] bd808: greg-g: no deployments going on, right? /me wants to merge the style changes to deployment.pp [22:27:36] mutante: not for 33 minutes [22:29:19] ok, will watch it on tin to confirm it's no-op [22:38:45] gerrit timeouts [22:45:09] taking back, more like office network [22:47:44] (03PS2) 10Dzahn: retab misc/deployment.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 [22:49:48] (03CR) 10Dzahn: [C: 032] "yea, just the retab, separate from any other changes" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 (owner: 10Dzahn) [22:57:15] (03CR) 10Dzahn: "NOOP on tin" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 (owner: 10Dzahn) [23:06:20] SWAT time :) [23:07:07] Are you the big boss, RoanKattouw ? [23:07:12] Today I am [23:07:49] swat time! 
[23:13:32] glad you remembered roan :p
[23:13:36] I just did
[23:13:36] (PS2) Dzahn: fix quoting, arrows etc in deployment.pp [operations/puppet] - https://gerrit.wikimedia.org/r/133644
[23:17:47] (CR) Dzahn: "addressed inline comment, not worth it anymore though given chase's change" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/133663 (owner: Dzahn)
[23:18:47] (Abandoned) Dzahn: include admins::roots in base [operations/puppet] - https://gerrit.wikimedia.org/r/133663 (owner: Dzahn)
[23:19:04] mwalker: I only kind of did. Ori pinged me at 3:50 asking if I could take it since he's home sick
[23:19:12] (as in at home while unwell, not desiring to be home)
[23:24:23] !log catrope synchronized php-1.24wmf5/extensions/Wikidata
[23:24:28] Logged the message, Master
[23:24:54] yay
[23:25:07] aude: The API fixes?
[23:25:35] !log catrope synchronized php-1.24wmf5/extensions/MobileFrontend
[23:25:39] Logged the message, Master
[23:26:46] !log catrope synchronized php-1.24wmf5/extensions/VisualEditor
[23:26:51] Logged the message, Master
[23:26:59] hoo: more importantly, jquery fixes
[23:27:09] ah, those
[23:27:17] seems to work, though we may have one more tomorrow
[23:27:41] I tried to review earlier, but well... spent half of the day in bed, kind of sick :/
[23:29:16] it's alright
[23:29:38] i'm sure adrian or henning can review the other thing and not horrible if we deploy with the bug
[23:29:41] as long as we fix soonish
[23:30:37] Ok...
I guess I'm back 100% tomorrow, and hopefully can get the dumps up with Ariel (if he's working tomorrow)
[23:31:31] (CR) Catrope: [C: 2] Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: Gerrit Patch Uploader)
[23:31:42] (Merged) jenkins-bot: Adding autopatrolled user group for dewikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134132 (https://bugzilla.wikimedia.org/65495) (owner: Gerrit Patch Uploader)
[23:31:59] aude: As you can tell from that list you guys aren't the only ones that had jQuery trouble :)
[23:32:07] yeah
[23:32:14] wasn't too terrible
[23:32:34] Ours was
[23:32:50] imagine users might find a bug though, that we missed
[23:33:00] They changed something to do with how .css() works and now if you try to compute CSS properties in hidden iframes in Firefox, you get a nasty error
[23:33:05] To be fair, few people have hidden iframes
[23:33:12] hmmm
[23:33:42] bd808: What's the script that automatically reports when you check out a new version of the code in /a/common ? Cause it hung just now
[23:34:12] It's a git hook calling to neon
[23:34:25] oh, that
[23:34:28] funny thing :P
[23:34:50] I see
[23:34:54] I set NOLOGMSG=1 in my .bashrc to skip it
[23:35:03] It's nice how that hook only fires on a merge, but not on a fast forward
[23:35:12] which is kind of awry
[23:35:30] bd808: Rebase before pulling :D
[23:35:36] The hang is `dologmsg`. Ori was poking various opsen to debug over the weekend
[23:35:42] wtf
[23:35:59] Isn't dologmsg UDP-based or something?
[23:36:01] that netcat can hang at times... had that before
[23:36:09] echo "$*" | nc -q0 neon.wikimedia.org 9200
[23:36:33] Hmm
[23:36:36] We could wrap it in a timeout
[23:36:38] Someone changed that?
[23:36:43] nope
[23:37:06] !log catrope synchronized wmf-config/InitialiseSettings.php 'autopatrolled group on dewikivoyage'
[23:37:09] Well there's nothing listening on port 9200 on neon
[23:37:11] Logged the message, Master
[23:37:51] RoanKattouw: Awesome
[23:38:35] RoanKattouw: Really? How did logmsgbot get your !log message then?
[23:39:25] * bd808 is a mortal without ssh access to neon
[23:39:32] [hoch_m@marius-notebook puppet]$ git diff 2a52451fcba30f1987b6834f43f563de2ee4a752~1 2a52451fcba30f1987b6834f43f563de2ee4a752
[23:39:32] - echo "$*" >> /var/log/logmsg
[23:39:32] + echo "$*" | nc -q0 neon.wikimedia.org 9200
[23:39:36] Weird
[23:40:13] tcp6 0 0 :::9200 :::* LISTEN 996 395650005 9605/python
[23:40:18] but just on tcp6? ehh
[23:40:19] It does in fact use dologmsg
[23:40:25] Maybe it's just slow?
[23:40:30] yes, it's slow
[23:40:42] how slow
[23:40:45] Mixed ipv4/ipv6 sockets will show as ipv6 in netstat
[23:40:47] 60+ seconds
[23:40:50] it's been slow before this weekend
[23:40:51] wtf
[23:40:58] Does it try ipv4 first then fall back to v6 or something?
[23:41:07] I mean why does sending a 4-character message have to take minutes
[23:41:08] tries v6 first, but what bd808 said then
[23:41:36] test
[23:41:37] Ha there we go
[23:41:53] mh
[23:41:55] yea, i wondered the same thing
[23:42:02] then saw it was just slow
[23:42:03] change it to nc -4 ?
[23:42:03] Hello
[23:42:09] Probably
[23:42:18] It's instantaneous if I use the IPv4 IP
[23:42:28] Hello
[23:42:31] -4 works
[23:42:35] Oh. Interesting.
[23:42:50] Maybe ipv6 doesn't really route there?
[23:43:28] hoo@terbium:~$ ping6 neon.wikimedia.org
[23:43:28] PING neon.wikimedia.org(neon.wikimedia.org) 56 data bytes
[23:43:30] works fine
[23:43:35] yeah.
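The two ideas raised above (forcing IPv4 with `nc -4` and wrapping the call in a timeout) could be combined roughly as follows. This is only a sketch: the function name `send_logmsg`, the `LOGMSG_*` variables and the 5-second limit are assumptions for illustration and do not come from the real `dologmsg` script. Only the `echo "$*" | nc -q0 neon.wikimedia.org 9200` pipeline is quoted from the log. It also assumes GNU coreutils `timeout` is available on the host.

```shell
#!/bin/sh
# Sketch only: a bounded, IPv4-only variant of the one-line sender quoted in the log.
# send_logmsg and the LOGMSG_* names are hypothetical, not taken from dologmsg itself.
LOGMSG_HOST="${LOGMSG_HOST:-neon.wikimedia.org}"
LOGMSG_PORT="${LOGMSG_PORT:-9200}"
LOGMSG_TIMEOUT="${LOGMSG_TIMEOUT:-5}"   # seconds; an assumed value

send_logmsg() {
    # -4 forces IPv4, so nc does not first wait for an IPv6 connection to time out.
    # timeout kills nc if it still hangs and then exits with status 124,
    # so a slow or unreachable bot cannot block the caller for a minute or more.
    echo "$*" | timeout "$LOGMSG_TIMEOUT" nc -4 -q0 "$LOGMSG_HOST" "$LOGMSG_PORT"
}
```

A git hook could then call `send_logmsg "some message" || true` so that a failed or timed-out notification never blocks the checkout itself.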
[23:43:48] rtt seems longer for ipv6 than ipv4
[23:44:33] not from terbium
[23:44:34] ping doesn't actually tcp
[23:44:41] that
[23:46:01] (PS1) Hoo man: Use ipv4 when sending log messages to neon [operations/puppet] - https://gerrit.wikimedia.org/r/134277
[23:46:52] `telnet -6 neon.wikimedia.org 9200` seems to hang
[23:46:58] from tin
[23:47:31] "telnet: Unable to connect to remote host: Connection timed out"
[23:47:56] So perhaps nc is trying ipv6, timing out and falling back to ipv4
[23:48:43] just tested nc -4 also doesn't work :/
[23:48:58] (PS1) Catrope: dologmsg: Use IPv4 to connect to neon [operations/puppet] - https://gerrit.wikimedia.org/r/134278
[23:49:05] hoo: nc -4 worked for me
[23:49:12] RoanKattouw: Ok
[23:49:25] RoanKattouw: So we have two similar changes now, yay
[23:49:28] iptables has an exception for 9200
[23:49:30] Look
[23:49:30] hoo
[23:49:31] it
[23:49:31] ip6tables does not
[23:49:32] totally
[23:49:32] works
[23:49:42] urgh sorry
[23:49:58] Does this work?
[23:50:05] OK your parameter order works too
[23:50:07] * hoo jumps to tin
[23:50:08] I'll abandon my change
[23:50:27] (Abandoned) Catrope: dologmsg: Use IPv4 to connect to neon [operations/puppet] - https://gerrit.wikimedia.org/r/134278 (owner: Catrope)
[23:50:30] wtf
[23:50:33] where in puppet is the port 9200 being opened
[23:50:43] From tin it works, just terbium doesn't
[23:51:03] `time echo "ipv6 only message via nc" | nc -6 -q0 neon.wikimedia.org 9200` -- real 1m3.134s
[23:51:27] That's our version of "my code's compiling"
[23:51:40] modules/tcpircbot/README
[23:52:01] 8 By default, it will connect to Freenode using SSL and listen for incoming
[23:52:04] 9 connections on port 9200. If the configuration specifies a CIDR range, only
[23:52:07] 10 clients within that range are allowed to connect....
[23:52:40] oh, so that's probably why I couldn't from terbium
[23:53:18] 61 $cidr = undef,
[23:55:41] (PS1) Dzahn: allow tin and terbium Ipv6 IP to talk to tcpircbot [operations/puppet] - https://gerrit.wikimedia.org/r/134279
[23:57:09] bd808: hoo RoanKattouw ^ that ?
[23:57:45] mutante: looks good... try it
[23:58:17] bot will probably need a restart
[23:58:33] I don't know enough about tcpircbot to say
[23:58:51] also, there must be a ferm or iptables rule somewhere
[23:59:38] here's why it works from tin and not from terbium
[23:59:44] try it from "vanadium" , heh
[23:59:50] ACCEPT tcp -- vanadium.eqiad.wmnet anywhere tcp dpt:9200
[23:59:53] ACCEPT tcp -- tin.eqiad.wmnet anywhere tcp dpt:9200
[23:59:56] ACCEPT tcp -- fenari.wikimedia.org anywhere tcp dpt:9200
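The listing above shows IPv4 ACCEPT rules for port 9200 only, and earlier in the log it was noted that ip6tables has no matching exception. A minimal sketch of what IPv6 counterparts might look like is given here as a dry run that only prints the commands. The source addresses are placeholders from the 2001:db8::/32 documentation range, because the hosts' real IPv6 addresses do not appear in this log, and `print_v6_rules` is a hypothetical helper name. How the real rules are managed (ferm, puppet, or by hand) is not established in the conversation.

```shell
#!/bin/sh
# Dry run only: prints candidate ip6tables commands instead of executing them.
# print_v6_rules is a hypothetical helper; the addresses below are placeholders.
print_v6_rules() {
    for src in "$@"; do
        # Mirrors the IPv4 pattern in the listing: accept TCP to port 9200 from one source.
        echo "ip6tables -A INPUT -p tcp -s $src --dport 9200 -j ACCEPT"
    done
}

# Placeholder sources standing in for tin and terbium.
print_v6_rules 2001:db8::10 2001:db8::11
```

Printing the rules first keeps the sketch safe to run anywhere. Applying them would need root on the target host and a review against whatever tool actually owns the firewall configuration.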