[02:12:03] !log LocalisationUpdate completed (1.23wmf10) at 2014-01-27 02:12:03+00:00
[02:12:13] Logged the message, Master
[02:22:28] !log LocalisationUpdate completed (1.23wmf11) at 2014-01-27 02:22:27+00:00
[02:22:36] Logged the message, Master
[02:44:23] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-01-27 02:44:23+00:00
[02:44:31] Logged the message, Master
[03:09:24] !log I'm about to do a bunch of experimental bullshit in the production puppet repo, because I can't test network topology in labs.
[03:09:32] Logged the message, Master
[03:14:11] (PS1) Springle: repool db1010 depool db1023 for schema changes revert vslow load to db1022 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109612
[03:15:30] (CR) Springle: [C: 2] repool db1010 depool db1023 for schema changes revert vslow load to db1022 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109612 (owner: Springle)
[03:15:38] (Merged) jenkins-bot: repool db1010 depool db1023 for schema changes revert vslow load to db1022 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109612 (owner: Springle)
[03:16:49] !log springle synchronized wmf-config/db-eqiad.php 'repool db1010, depool db1023'
[03:16:58] Logged the message, Master
[03:20:30] springle: G'day! I'm seeing intermittent connect failures with labsdb* ("Could not connect to s4: Lost connection to MySQL server at 'reading authorization packet', system error: 0") with no discernible pattern (not constant, different servers). Do you know what causes this?
[03:24:52] scfc_de: not immediately. many possibilities. definitely all three labsdb boxes?
[03:25:46] all shards too?
[03:30:23] (PS1) Andrew Bogott: Import the stackforge openstack module for testing. [operations/puppet] - https://gerrit.wikimedia.org/r/109613
[03:31:28] springle: s1, s2, s4 and s5 from a cursory look at my logs.
[03:31:38] scfc_de: also, when did they start, approx?
[03:32:08] * andrewbogott counts to ten before self-merging
[03:34:48] springle: It was quiet since Thu, 23 Jan 2014 05:55:09, and started about Sun, 26 Jan 2014 19:41:09. I run a script once per minute, and it had 55 failures in those roughly eight hours.
[03:37:01] (CR) Andrew Bogott: [C: 2] Import the stackforge openstack module for testing. [operations/puppet] - https://gerrit.wikimedia.org/r/109613 (owner: Andrew Bogott)
[03:38:29] scfc_de: thanks. digging...
[03:45:38] scfc_de: no obvious server-side issues but i do see a spike in aborted connections. i'm setting mariadb log_warnings=2 on labsdb1002. will need to watch it for an hour or so
[03:49:39] springle: Thanks.
[03:50:31] (CR) Ori.livneh: "Daniel, just do it -- it shouldn't take a dozen confirmations. I trust your judgment, etc." [operations/puppet] - https://gerrit.wikimedia.org/r/109285 (owner: Dzahn)
[03:56:55] andrewbogott: one second per 860 lines of code?
[03:57:11] Yep!
[03:57:22] Most of those lines are tests :)
[03:58:04] it looks non-awful, at a glance
[03:58:09] godspeed and good luck
[03:58:39] Yeah, that module was endorsed by some folks in #openstack so I have high hopes, soon to be dashed.
[04:10:32] !log xtrabackup clone db1022 to db1023
[04:10:40] Logged the message, Master
[04:11:50] mark, if awake: Is there a separate IP pool for labs in eqiad, or is it the same set that we're using in tampa?
[06:13:43] andrewbogott: one of your focus areas is labs, right? can you help me debug a network issue there?
[06:13:54] maybe, what's up?
[06:14:56] andrewbogott: separate subnet for sure, but I don't remember which one; DNS doesn't have it, right?
[06:15:14] (PS1) Andrew Bogott: Simple Openstack test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109615
[06:15:39] all three labsdb boxes started seeing aborted tcp connections around 2014-01-26 19:30:00 utc. suggests a network issue. but i don't know where to look in librenms
[06:15:50] (CR) jenkins-bot: [V: -1] Simple Openstack test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109615 (owner: Andrew Bogott)
[06:16:11] wow, this openstack module is not that good isn't it
[06:16:38] paravoid: I don't know yet… it looks somewhat promising to me, although I'm not clear if it has actual neutron support yet
[06:16:41] springle: connections between labsdb and what?
[06:16:52] paravoid, springle, I just updated https://wikitech.wikimedia.org/wiki/IP
[06:17:02] paravoid: various clients, including each other
[06:17:06] andrewbogott: the swift stuff is really bad for sure
[06:17:25] i don't see a pattern yet for affected clients, though several of the most frequent are pmtpa
[06:17:38] paravoid: I haven't looked at those bits :( I assume the nova sections get the most attention.
[06:18:37] springle: I believe that all labs traffic passes through virt2. So might start by looking at its icinga page
[06:18:39] * andrewbogott does this now
[06:18:56] springle: production hosts or labs instances?
[06:19:20] paravoid: no production hosts at all yet. only labs instances and each other
[06:19:28] anyone want to look at https://gerrit.wikimedia.org/r/#/c/109074/ ?
[06:19:36] hm, virt2 says 0% loss so that's not it
[06:19:59] springle: default labs instances are heavily firewalled. Could it just be that?
[06:20:19] springle: each other? so between labsdb1001 and, say, labsdb1002?
[06:20:47] * andrewbogott curses autocorrect
[06:20:59] paravoid: correct. though those are infrequent. tools-exec-01.pmtpa.wmflabs is a big offender
[06:21:01] I need to restart so I can stop my client from s/springle/sprinkle/ brb
[06:21:37] springle, what labs project do these boxes live in?
[06:21:56] labsdb are production hosts
[06:22:17] tools-exec-01 is toollabs
[06:22:25] so "tools" I suppose
[06:22:46] Is this something that worked until recently, or never worked?
[06:23:34] mariadb doesn't know why the connection abort. "Unknown error". usually means an external glitch
[06:24:02] I don't see anything wrong with e.g. labsdb1001's port
[06:24:26] andrewbogott: it spiked at 2014-01-26 19:3x:xx on all three labsdb. prectically non-existent before that
[06:24:49] there's a bunch of
[06:24:49] Jan 24 00:02:09 virt2 kernel: [5113381.225710] nf_conntrack: table full, dropping packet.
[06:24:50] so, 11h ago, or so
[06:24:53] Jan 24 00:02:09 virt2 kernel: [5113381.447881] nf_conntrack: table full, dropping packet.
[06:24:59] but none after the 24th
[06:25:05] (on virt2)
[06:25:54] andrewbogott: on an unrelated note since I'm looking at icinga, there's a bunch of unrelated with each other alerts for virt hosts
[06:26:25] paravoid: I'm after the DC folks to get a new drive into virt5.
[06:27:00] disk space is a constant juggle
[06:27:14] And, dammit, I just fussed with the certs on Friday, they must've been changed again.
[06:27:15] * andrewbogott scowls
[06:28:27] springle, 'it spiked' = dropped packets?
[06:28:40] Or by 'spiked' to you mean 'stopped working entirely'?
[06:28:49] What port would mariadb be using?
[06:29:06] andrewbogott: paravoid: http://aerosuidae.net/aborted_connections.png
[06:29:27] sorry that's just an image. i hacked it up from some queries
[06:29:35] wow
[06:30:04] labsdb1001 is enwiki. it will be much larger always
[06:31:00] Is tools-db in the mix here? (I thought that db queries from exec nodes generally happened via tools-db, but I don't know much.)
[06:31:44] * springle doesn't know about tools-db
[06:32:16] So this is ~10 hours ago, yes?
[06:32:23] roughly, yes
[06:32:53] (PS1) Faidon Liambotis: Fix gitblit duplicate definition [operations/puppet] - https://gerrit.wikimedia.org/r/109617
[06:33:09] (CR) Faidon Liambotis: [C: 2] Fix gitblit duplicate definition [operations/puppet] - https://gerrit.wikimedia.org/r/109617 (owner: Faidon Liambotis)
[06:34:51] springle: Does that correspond to… this? http://ganglia.wmflabs.org/latest/graph.php?r=day&z=xlarge&me=wmflabs&m=load_one&s=by+name&mc=2&g=network_report
[06:34:59] RECOVERY - Puppet freshness on antimony is OK: puppet ran at Mon Jan 27 06:34:52 UTC 2014
[06:35:19] PROBLEM - Puppet freshness on antimony is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 06:34:52 AM UTC
[06:35:37] this would explain it, this is saturating a gigabit
[06:35:46] andrewbogott: looks like it
[06:36:19] 08:20 < springle> paravoid: correct. though those are infrequent. tools-exec-01.pmtpa.wmflabs is a big offender
[06:36:21] er
[06:36:25] http://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[06:36:31] this is... broken
[06:36:56] ganglia for virt hosts is broken, which explains why production ganglia looks okay
[06:37:28] springle: (unrelated) db1057 broken LD, I found no RT, just making sure you know :)
[06:37:29] can you find ganglia graphs for tampa virt hosts at all? i sure can't
[06:37:50] andrewbogott: no, that's my point :)
[06:38:18] yep
[06:38:28] Is this a 'recently stopped working' or 'never tracked to begin with'?
[06:38:36] dunno, I wasn't looking
[06:39:22] paravoid: ah yes, thanks. forgot about that one
[06:39:26] And I guess my next question is… is that crazy network graph because the connection failures are causing bad behavior, or does it show the bad behavior that is causing the connect failures?
[06:39:44] the latter, probably
[06:39:48] something is generating traffic
[06:39:59] saturating some gigabit somewhere
[06:40:08] which means lost packets and in turn dropped TCP connections
[06:40:40] surely ganglia can break this down by host
[06:40:46] brb, I'm overdue for a coffee dosage :)
[06:40:50] jetlag and everything :)
[06:42:24] springle: Looks like a misbehaving tool. Still digging...
[06:42:48] andrewbogott: thank you!
[06:43:32] Maybe not, we have two different projs showing that same graph shape. http://ganglia.wmflabs.org/latest/?c=visualeditor&m=load_one&r=day&s=by%20name&hc=4&mc=2 and http://ganglia.wmflabs.org/latest/?c=tools&m=load_one&r=day&s=by%20name&hc=4&mc=2
[06:43:42] So I bet one is cause and one is symptom…
[06:44:47] oh, well, now that I read the axes...
[06:44:53] ACKNOWLEDGEMENT - RAID on db1057 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Sean Pringle RT 6711
[06:45:45] :)
[06:46:59] RECOVERY - Puppet freshness on antimony is OK: puppet ran at Mon Jan 27 06:46:52 UTC 2014
[06:47:19] PROBLEM - Puppet freshness on antimony is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 06:46:52 AM UTC
[06:50:58] grrr, why can't I log into these nodes today?
[06:52:10] springle, can you log into any exec node?
[06:56:57] andrewbogott: no, but then i've never tried before either
[06:59:09] (PS1) Springle: repool db1023 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109620
[06:59:35] (CR) Springle: [C: 2] repool db1023 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109620 (owner: Springle)
[06:59:41] (Merged) jenkins-bot: repool db1023 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109620 (owner: Springle)
[07:00:33] !log springle synchronized wmf-config/db-eqiad.php 'repool db1023'
[07:00:42] Logged the message, Master
[07:01:06] well.
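Two candidate causes of dropped packets come up in this part of the log: the `nf_conntrack: table full, dropping packet` kernel messages on virt2 (06:24), which stopped after the 24th, and a link "saturating a gigabit" (06:35). A rough way to tell them apart is to compare the conntrack table's fill level with its limit, and a measured byte rate with the link's capacity (1 Gbit/s is 125 MB/s). The Python sketch below assumes a Linux host with the `nf_conntrack` module loaded, so that the counters appear under `/proc/sys/net/netfilter/`; the helper names are illustrative and not part of any existing tool.

```python
from pathlib import Path

# Where the conntrack counters live when the nf_conntrack module is loaded
# (assumption: a Linux host with netfilter connection tracking enabled).
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_utilization(count: int, maximum: int) -> float:
    """Fraction of the conntrack table in use; at 1.0 new flows are dropped."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def link_utilization(bytes_per_sec: float, link_bits_per_sec: float = 1e9) -> float:
    """Fraction of a link's capacity used by a measured byte rate (default 1 Gbit/s)."""
    return bytes_per_sec * 8 / link_bits_per_sec


def read_conntrack(base: Path = CONNTRACK_DIR) -> tuple[int, int]:
    """Return (nf_conntrack_count, nf_conntrack_max) as read from procfs."""
    count = int((base / "nf_conntrack_count").read_text())
    maximum = int((base / "nf_conntrack_max").read_text())
    return count, maximum


if __name__ == "__main__":
    try:
        used, limit = read_conntrack()
    except OSError:
        print("nf_conntrack counters not available on this host")
    else:
        print(f"conntrack: {used}/{limit} ({conntrack_utilization(used, limit):.0%})")
```

A conntrack fill level near 1.0 on virt2 would match the 24 January kernel messages; a byte rate near 125 MB/s on a gigabit port would match the ganglia network graph at 06:34. Since the conntrack drops had stopped by the 24th, the traffic-volume explanation is the one the channel pursued.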
[07:01:23] springle, I don't know why my keys don't work on the exec nodes, but it's hard to detect and kill the bad tool without
[07:01:49] wait, spoke too soon
[07:02:08] my root keys don't work but my user does and I have sudo
[07:08:23] springle, I don't know quite what I'm looking at. I've run netstat -t -u and one process has a gigantic 'Send-Q'.
[07:08:30] But I don't know if that corresponds to heavy network use or not.
[07:08:39] Any suggestions how to locate the offending tool?
[07:09:42] ntop? or netstat
[07:11:10] don't have ntop. can you advise how to read netstat?
[07:11:17] for example, ^^^^
[07:13:44] iftop on there?
[07:14:11] nope
[07:15:30] i'd try to identify the port send/recv the most. then use netsta to get a pid
[07:16:36] And how do I know what send/recvs the most?
[07:16:58] iftop is a standard debian pkg. can you just apt-get it?
[07:17:33] RECOVERY - Puppet freshness on antimony is OK: puppet ran at Mon Jan 27 07:17:24 UTC 2014
[07:17:45] Bad admin practice! But, yes :(
[07:18:36] :) maybe paravoid knows a better way
[07:19:38] could tcpdump instead i guess. laborious
[07:20:18] so… netstat ?
[07:21:29] grr, iftop doesn't show me port or process anyway
[07:21:39] err.. something like netstat -p
[07:21:39] just tools-exec-02.pmtpa.wmflabs => 10.0.0.45
[07:21:42] or -tp
[07:21:45] Whcih I knew already
[07:22:15] netstat 949
[07:23:49] iftop can show both source and dest ports. S and D keys iirc
[07:26:08] ok, I continue to think that 949 is the offender but… still no idea as to pid or tool
[07:27:16] talking to 10.0.0.45
[07:27:21] lsof -i tcp | grep 949
[07:28:19] that returns nothing at all
[07:30:20] 949 is a privileged port. i wouldn't have expected something that low
[07:31:35] thanks matanya, now how to find the mchenry-only classes :) https://bugzilla.wikimedia.org/show_bug.cgi?id=57890#c5
[07:31:53] springle: https://dpaste.de/TrWo
[07:32:05] Nemo_bis: look in manifests/site.pp
[07:32:45] matanya: yes but when I did it was a mess; let's see
[07:33:41] (PS1) Matanya: swift: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109625
[07:36:12] mogge guillom
[07:36:34] andrewbogott: netstat -p ?
[07:36:59] matanya: my problem with mchenry is that in site.pp it doesn't have *anything* with mail-sounding names, except privateexim::aliases::private (and that's not it)
[07:38:19] I think I had grepped all the other classes' names too, but couldn't find anything sensible
[07:38:51] Nemo_bis: you will have to ask one that really knows what is going on mchenry, it might even be it is not puppetized
[07:39:14] hmm it's the only one including privateexim::aliases::private
[07:39:26] matanya: you'd think! But...
[07:39:27] tcp 0 0 tools-exec-02.pmtpa:949 10.0.0.45:38467 ESTABLISHED -
[07:39:37] That last '-' is where the pid and program should be
[07:39:43] yeah
[07:40:46] * andrewbogott has /no/ idea what that means
[07:41:25] andrewbogott: did you use sudo ?
[07:41:29] or are you root?
[07:41:40] I'm root
[07:42:02] hmm, under 1024 doesn't show unless your are root, but you are :/
[07:42:45] I'm also not clear on what 10.0.0.45 and why labs can reach it, or thinks it can reach it
[07:42:45] andrewbogott: just for the sake of the test, please do with sudo
[07:43:43] nah, same.
[07:44:31] can you telnet/nmap it?
[07:44:49] ? An outbound port?
[07:45:00] 949
[07:45:15] (PS1) Springle: depool db1015 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109626
[07:45:20] you mean like 'telnet localhost 949'
[07:45:40] yes andrewbogott
[07:45:44] hi mutante
[07:45:47] ah, that's just refused.
[07:46:02] so i guess the port isn't open?
[07:46:45] (CR) Springle: [C: 2] depool db1015 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109626 (owner: Springle)
[07:46:50] and i guess you tried with lsof andrewbogott ?
[07:46:52] (Merged) jenkins-bot: depool db1015 for schema changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109626 (owner: Springle)
[07:46:53] matanya: the private mchenry stuff is in private/modules/privateexim/
[07:46:57] andrewbogott:
[07:47:00] yeah, lsof knows nothing
[07:47:19] Nemo_bis: see mutante's reply
[07:48:34] !log springle synchronized wmf-config/db-eqiad.php 'depool db1015'
[07:48:42] Logged the message, Master
[07:49:31] andrewbogott: found your culprit
[07:49:34] i think
[07:49:41] yeah? How?
[07:49:51] it is nfs, and nfs is provided by the kernel, so no PID
[07:49:56] i grep 10.0.0.45 in dns
[07:50:06] and it is labnfs
[07:50:40] So the network is saturated by something doing file access...
[07:50:44] yes
[07:50:57] now you need to find that
[07:51:01] any way for us to know who?
[07:51:14] either iostat
[07:51:17] or lsof
[07:51:35] (CR) Dzahn: "add some reviewers from mobile team" [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[07:51:58] for a lint change?
[07:52:00] (PS1) Springle: switch db1015 to innodb_file_per_table [operations/puppet] - https://gerrit.wikimedia.org/r/109627
[07:52:05] doesn't lsof require us to know /which/ file?
[07:52:17] no, you can look by rate
[07:52:28] mutante: just review it and merge, mobile has nothing to do with all that
[07:52:56] paravoid: is mobile::vumi even still used?
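The `-` in the netstat output at 07:39, and the empty `lsof` result, follow from how those tools find a socket's owner. On Linux each row of `/proc/net/tcp` carries the hex-encoded local and remote endpoints, the transmit/receive queue sizes (what netstat shows as `Send-Q`/`Recv-Q`) and a socket inode; a PID can only be reported if some `/proc/<pid>/fd/*` entry is a symlink to `socket:[<inode>]`. As noted at 07:49, the NFS client lives in the kernel, so no process holds that socket. A minimal Python sketch of the same lookup, assuming IPv4, a little-endian host, and root access to other processes' `fd` directories; the function names are illustrative only.

```python
import os
import socket
import struct
from pathlib import Path


def decode_addr(field: str) -> tuple[str, int]:
    """Decode an IPv4 'AAAAAAAA:PPPP' field from /proc/net/tcp.

    The address is printed as a 32-bit integer in host byte order
    (assumed little-endian here); the port is ordinary hex.
    """
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_tcp_table(text: str) -> list[dict]:
    """Parse /proc/net/tcp text into endpoints, Send-Q (bytes) and socket inode."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        tx_queue = fields[4].split(":")[0]  # "tx_queue:rx_queue", both hex
        rows.append({
            "local": decode_addr(fields[1]),
            "remote": decode_addr(fields[2]),
            "send_q": int(tx_queue, 16),
            "inode": int(fields[9]),  # decimal inode number
        })
    return rows


def socket_owners(proc: Path = Path("/proc")) -> dict[int, list[int]]:
    """Map socket inode -> PIDs that hold a file descriptor for it."""
    owners: dict[int, list[int]] = {}
    for pid_dir in proc.iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            fds = list((pid_dir / "fd").iterdir())
        except OSError:  # process exited, or no permission
            continue
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue
            if target.startswith("socket:["):
                inode = int(target[len("socket:["):-1])
                owners.setdefault(inode, []).append(int(pid_dir.name))
    return owners


def report(port: int) -> None:
    """Print each IPv4 TCP socket bound locally to `port`, with its owning PIDs."""
    owners = socket_owners()
    for row in parse_tcp_table(Path("/proc/net/tcp").read_text()):
        if row["local"][1] != port:
            continue
        # Inode 0 never maps to a process; otherwise look it up in the fd scan.
        pids = owners.get(row["inode"]) if row["inode"] else None
        who = pids if pids else "- (no owning process, e.g. a kernel socket)"
        print(row["local"], "->", row["remote"], "Send-Q", row["send_q"], who)


if __name__ == "__main__":
    try:
        report(949)  # the local port seen at 07:22 in the log
    except OSError:
        print("no Linux procfs on this host")
```

Run on tools-exec-02, this would presumably have listed the 949 to 10.0.0.45:38467 connection with no PID, just as netstat did. Finding which tool drives the NFS traffic then needs NFS-side statistics (the nfsstat/nfsiostat route tried afterwards) rather than socket ownership.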
[07:53:01] no idea
[07:53:14] (CR) Springle: [C: 2] switch db1015 to innodb_file_per_table [operations/puppet] - https://gerrit.wikimedia.org/r/109627 (owner: Springle)
[07:53:16] that was more what i wanted confirmed by adding them
[07:53:18] but ok
[07:53:31] ok, it wasn't obvious :)
[07:53:53] also all that package version pinning from the manifests seems ugly to me
[07:54:07] yea,i'll review it,no worries, the queues is just really long and this is like the latest of them all
[07:54:12] I'd rather prefer a package { [ 'python-redis', 'python-smpp', ... ]: ... }
[07:54:55] paravoid: i think matanya is splitting all the files and packages
[07:55:00] for style reasons
[07:55:01] i do
[07:55:03] um, matanya, sorry I do not at all see how to do that. The lsof man page is long...
[07:55:07] per style guide
[07:55:33] andrewbogott: lsof /name/of/device
[07:55:34] (CR) Faidon Liambotis: [C: 2] "I don't see any issue with that or why it would be controversial." [operations/puppet] - https://gerrit.wikimedia.org/r/109325 (owner: Ottomata)
[07:57:39] matanya: I mean, where is the rate?
[07:57:46] And, anyway, unclear what the device is in this context...
[07:57:49] mutante: if you can please link rt 6634 to 6684, i don't have permission
[07:58:19] andrewbogott: i meant combining lsof and iostat, sorry i wasn't clear
[07:58:19] !log xtrabackup clone db1022 to db1015
[07:58:26] Logged the message, Master
[07:58:45] the device is the nfs mount
[07:58:56] what ever it is called in df -h
[07:58:57] matanya: done
[07:59:01] thanks
[08:00:35] paravoid: https://gerrit.wikimedia.org/r/#/c/83768/ uncontroversial ?
[08:01:21] the reply to your comment was starting WIP https://gerrit.wikimedia.org/r/#/c/107831/ for discussion how to do it automatically, but that isnt ready like this
[08:01:55] matanya: sorry to be dim, I do not follow at all. I need to iostat each of the 100+ files mounted on each of the 3 nfs volumes? That doesn't narrow it down much
[08:03:57] andrewbogott: ah, so that isn't very helpful mmethod. do you know what nfs is being queried ?
[08:04:23] (CR) Dzahn: "that was more to confirm mobile::vumi is still used and will be" [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[08:04:24] matanya: please pretend like I don't know anything that I haven't already told you :)
[08:04:41] sorry, i will
[08:07:17] ok, andrewbogott better approache, rpcdebug
[08:07:26] man there are some seriously big error logs in progress here
[08:08:07] 2345298511 <- am I counting digits right? 2tb?
[08:09:02] yeah, god
[08:09:13] That's not even the only one, there's another that's 2.9
[08:09:19] I guess those are good candidates for iostat
[08:09:42] (CR) Dzahn: [C: 1] imagescaler: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109502 (owner: Matanya)
[08:09:53] Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
[08:09:54] vda 1.38 1.29 137.38 6313385 673611120
[08:10:33] I guess between the two of them that's… maybe enough traffic to break things, huh?
[08:10:49] guess so
[08:11:51] well, wait, no matter what process I type in I get that same 137.38
[08:11:55] so I guess that's per device rather than per file or process :(
[08:13:52] (CR) Dzahn: search: lint (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/109485 (owner: Matanya)
[08:14:07] andrewbogott: can you log on the nfs server and check there?
[08:14:30] I can… what will I be checking?
[08:15:29] iostat
[08:16:20] (CR) Matanya: search: lint (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/109485 (owner: Matanya)
[08:18:14] iostat of what?
[08:21:51] andrewbogott: correction : nfsstat -rc
[08:22:29] nfsstat -rc ?
[08:22:53] no, just that as a start
[08:23:04] Ah, that's emtpy. Just 0s
[08:23:13] sigh
[08:23:45] I may drop this and leave it to Coren when he wakes up. Wasting lots of time, and I"m not sure springle still cares anyway :)
[08:24:30] one last shot andrewbogott
[08:24:36] 'k
[08:24:38] on the toolnode
[08:24:52] nfsiostat -h -m -t 1 4 > size.out
[08:25:29] that got me a usage statement in a file :)
[08:25:50] what ?
[08:25:58] any broken paramter?
[08:26:37] -m is not mentioned in the usage out. Neither is -t
[08:27:02] ok use: nfsiostat -h
[08:27:48] http://manpages.ubuntu.com/manpages/oneiric/man1/nfsiostat.1.html
[08:27:48] that's the same usage
[08:28:14] the man page clearly stats it is there :/ ok, wasted too much of your time
[08:28:22] From oneiric though
[08:28:49] I'm willing to keep poking if you have more ideas, just frustrated
[08:29:04] And figuring that at any minute the offending process will complete and I'll never learn anything :(
[08:29:05] http://manpages.ubuntu.com/manpages/precise/man1/nfsiostat.1.html ok, here for precise
[08:30:49] (PS2) Matanya: search: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109485
[08:31:21] so no nfsiostat andrewbogott ?
[08:32:03] https://dpaste.de/CpzE
[08:32:57] not the keys :)
[08:32:58] hm, I don't have the version in that last link either, since -h is '-help' in my version
[08:33:33] that explians this, -h should be humen readble
[08:34:11] none of those kb/s look very big to me, must be missing something
[08:35:19] because the network is still crazy busy… http://ganglia.wmflabs.org/latest/?r=day&cs=&ce=&c=tools&h=tools-exec-02&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS
[08:35:57] andrewbogott: let is run for some time. nfsiostat 1 4
[08:38:08] https://dpaste.de/v23t
[08:41:44] this is a bit more suspected
[08:45:23] hello Nemo_bis
[08:48:07] (PS2) Andrew Bogott: Simple Openstack test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109615
[08:49:13] back soon...
[09:04:48] (CR) Ori.livneh: [C: 2] "Thanks!" [operations/puppet] - https://gerrit.wikimedia.org/r/108831 (owner: Tim Landscheidt)
[09:06:21] (PS1) Matanya: dns: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109632
[09:23:52] (CR) Dzahn: [C: 2] "looked at the diff minus tabs one more time, should really just be lint change, can't see anything that would be a functional change" [operations/puppet] - https://gerrit.wikimedia.org/r/109485 (owner: Matanya)
[09:27:37] (CR) Dzahn: "nothing new to see on puppet run on searchidx1001 or elastic1001, that's good" [operations/puppet] - https://gerrit.wikimedia.org/r/109485 (owner: Matanya)
[09:37:14] (PS1) Dzahn: change search monitor_group description to eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/109635
[09:38:33] (CR) Dzahn: "switched to eqiad already, right? but is the description string of the group really all?" [operations/puppet] - https://gerrit.wikimedia.org/r/109635 (owner: Dzahn)
[09:39:49] (CR) Dzahn: search: lint (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/109485 (owner: Matanya)
[09:44:33] (CR) Dzahn: [C: -1] mobile: lint (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[09:48:16] (PS2) Matanya: mobile: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109511
[10:17:13] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[10:17:36] (PS1) TTO: Move VisualEditor to secondary status on eswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109639
[10:18:05] (CR) Dzahn: [C: 1] mobile: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[10:18:38] matanya: will merge that too, last sanity check by yuvi as well
[10:18:47] thanks
[10:20:28] i usually retab the existing thing and then see what other diffs you have, but looks all good to me, noop
[10:20:40] (CR) Yuvipanda: [C: 1] "Seems ok to me" [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[10:20:57] (CR) Dzahn: [C: 2] mobile: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[10:21:32] mutante: seems ok :)
[10:21:48] confirming on silver/zhen, thx YuviPanda
[10:22:09] YuviPanda: we still use vumi?
[10:22:13] yup
[10:22:28] well, I know we still use Vumi
[10:22:34] not sure if it is this instance or some other one :P
[10:22:40] I know that we have partners in Africa who are using it
[10:23:35] (CR) Dzahn: "@silver: notice: Finished catalog run in 33.50 seconds" [operations/puppet] - https://gerrit.wikimedia.org/r/109511 (owner: Matanya)
[10:24:09] oh, not good
[10:24:29] (PS3) Andrew Bogott: Simple Openstack test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109615
[10:24:34] matanya: what?
[10:24:46] 10 seconds between zhen and silver
[10:25:12] andrewbogott: thanks for digging around tools-exec-02 earlier. sorry to frustrate you :)
[10:25:22] eh, i don't think you can expect puppet runs to be the exact same length on 2 hosts
[10:25:37] yeah, but 10 seconds?
[10:25:53] springle: no problem. I'm sure that Coren will solve the problem in 5 minutes, but this way I'll be more likely to pay attention to his explanation :)
[10:27:50] andrewbogott: want to +2 https://gerrit.wikimedia.org/r/#/c/109291/?
[10:27:51] ;)
[10:28:14] YuviPanda: silver and zhen have mobile::vumi in site.pp, that's why
[10:28:39] mutante: i would most appricate if you review site.pp lint, it is moving forward and i'll have hard time rebasing
[10:28:42] matanya: i'm not sure if they are even the same hardware, likely not
[10:28:53] yeah, well.
[10:29:05] YuviPanda: I suppose redis.lua inherits that license from a different dev?
[10:29:12] andrewbogott: yup
[10:29:17] 'k
[10:29:29] andrewbogott: I didn't add headers to the API stuff, because you wrote those
[10:29:37] weird use of -- as comment marker
[10:30:07] andrewbogott: heh, I think it's not as weird as starting array indices from 1
[10:30:13] SQL also uses that
[10:30:23] Does lua count from 1?
[10:30:24] andrewbogott: still better than vimscript's choice of " as comment marker
[10:30:27] andrewbogott: it does!
[10:30:32] well, arrays at least
[10:30:34] hm
[10:30:42] (CR) Andrew Bogott: [C: 2] dyanmicproxy: Add license headers to files [operations/puppet] - https://gerrit.wikimedia.org/r/109291 (owner: Yuvipanda)
[10:31:00] +1091, -746 gaaah, sorrry matanya, merged 2 bigger lint changes, now i'd like to go back to fixing my own planet module first
[10:31:22] andrewbogott: you should add a license header to the files you wrote :D
[10:31:35] np mutante thanks for those two :)
[10:31:40] I'll be sure to use a different license from you just to keep things interesting
[10:31:58] andrewbogott: WTFPL FOR LYF!
[10:32:16] lol, i was about to say Yuvi will license it was WTFPL
[10:32:18] * andrewbogott is old school, likes gpl
[10:32:53] andrewbogott: heh :D I don't mind licensing my stuff as GPL too. I can +1 if you want to make the entire thing GPL
[10:32:59] add licensing@fsf.org as default reviewer on topic branch licensing:) fixed
[10:33:01] andrewbogott: but I guess it has to be v3 or later than v2
[10:33:08] (CR) Andrew Bogott: [C: 2] "This is totally whimsical, shouldn't touch any active machines." [operations/puppet] - https://gerrit.wikimedia.org/r/109615 (owner: Andrew Bogott)
[10:34:35] "shouldn't touch any active machines" <- how you know I'm about to cause an outage on en.wikipedia
[10:37:27] jenkins lint: detected use of "shouldn't" in message ?>:)
[10:38:06] also 'safe' and 'no-op'
[10:38:26] :)
[10:39:02] well, those manifests don't work at all, which means it's probably safe for me to go to dinner.
[10:40:16] andrewbogott: dinner? are you in japan?
[10:40:29] Singapore, utc+8
[10:40:51] all around the globe :)
[10:40:59] andrewbogott: class { 'openstack::compute':
[10:41:06] inside role/openstack.pp
[10:41:10] i think you want
[10:41:25] the class from the module
[10:41:45] but that would look for role::openstack::computer
[10:42:11] Actually it's finding the right classes
[10:42:16] But there's a naming conflict deeper down
[10:42:18] among other things
[10:42:31] (PS1) Springle: repool db1015, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109641
[10:42:32] isnt thtat class { '::openstack::compute'"
[10:42:37] matanya: I usually live in the midwest US, just here waiting out the winter.
[10:42:38] in the role class
[10:42:47] mutante: yeah, :: would be more correct. It happens to work as is
[10:43:04] interesting, because i ran into that issue before and it would not
[10:43:17] just looked in role:: itself.. ok
[10:43:19] (CR) Springle: [C: 2] repool db1015, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109641 (owner: Springle)
[10:43:24] (Merged) jenkins-bot: repool db1015, warm up [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109641 (owner: Springle)
[10:43:33] matanya: whereas mutante is waiting out the California winter in Germany, which seems an odd choice
[10:43:51] lol, it is, you normally would not pick January to be in .de
[10:44:33] !log springle synchronized wmf-config/db-eqiad.php 'repool db1015, warm up'
[10:44:41] Logged the message, Master
[10:45:23] needs to go to Tokyo again
[10:46:49] Right now it is exactly 95 Fahrenheit degrees warmer here than at home.
[10:46:59] wow
[10:47:32] Yep, worth the trip :)
[10:48:00] at least i'm in a city so it's still rain, but more country-side and it's snow
[10:48:07] OK, dinner -- back later.
[10:48:12] enjoy,ttyl
[10:48:14] mutante, d'you know if we have a techops meeting later?
[10:48:22] andrewbogott: i don't :p
[10:48:34] hm, seems like we're due...
[10:48:35] because the last one was skipped
[10:48:48] either it's the normal cycle again
[10:48:54] or it's sooner because we skipped?
[10:49:21] weird.
[10:50:00] it was due to US holiday
[10:50:16] * andrewbogott spams the list
[10:50:25] thx, i wanted to find out,too
[10:53:59] (CR) Dzahn: "the ticket also says though "Diederik van Liere said we need the second one. " is that before or after this patch was created?" [operations/puppet] - https://gerrit.wikimedia.org/r/109516 (owner: Matanya)
[10:56:27] (CR) Matanya: "There are four logs there, Tim approved the one i'm removing here, and Diederik van Liere prevented the removal of the second the other. " [operations/puppet] - https://gerrit.wikimedia.org/r/109516 (owner: Matanya)
[11:01:05] (CR) Dzahn: [C: 1] "ok, gotcha, +1" [operations/puppet] - https://gerrit.wikimedia.org/r/109516 (owner: Matanya)
[11:01:57] (CR) Alexandros Kosiaris: "Approach seems fine to me. My point is of a pedantic nature." [operations/puppet] - https://gerrit.wikimedia.org/r/109316 (owner: Ottomata)
[11:02:23] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[11:03:03] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.72 ms
[11:03:03] PROBLEM - Disk space on elastic1008 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 11208 MB (3% inode=99%):
[11:12:01] (CR) Addshore: Add wikibase permissions to MWOAuthGrantPermissions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109333 (owner: Addshore)
[11:20:13] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[11:42:10] (PS1) Springle: depool db1006 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109644
[11:42:51] (CR) Springle: [C: 2] depool db1006 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109644 (owner: Springle)
[11:42:57] (Merged) jenkins-bot: depool db1006 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109644 (owner: Springle)
[11:45:01] (PS1) Dzahn: send BZ community metrics to communitymetrics@ [operations/puppet] - https://gerrit.wikimedia.org/r/109646
[11:46:13] (CR) Dzahn: "clarification: this _email address_ is used on kaulen, the puppet module we are changing here is not, it's what sets up the new BZ" [operations/puppet] - https://gerrit.wikimedia.org/r/109646 (owner: Dzahn)
[11:54:31] (PS3) Matanya: site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109507
[11:57:40] (CR) Dzahn: "added aklapper to communitymetrics@ alias" [operations/puppet] - https://gerrit.wikimedia.org/r/109646 (owner: Dzahn)
[12:14:07] (PS1) Dzahn: funnel instead of redirect historic Bugzilla URLs [operations/apache-config] - https://gerrit.wikimedia.org/r/109652
[12:14:10] (CR) jenkins-bot: [V: -1] funnel instead of redirect historic Bugzilla URLs [operations/apache-config] - https://gerrit.wikimedia.org/r/109652 (owner: Dzahn)
[12:18:55] (PS1) Matanya: snapshots: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109653
[12:25:01] (CR) Aklapper: "I like." [operations/puppet] - https://gerrit.wikimedia.org/r/109646 (owner: Dzahn)
[12:27:03] (CR) Matanya: [C: 1] send BZ community metrics to communitymetrics@ [operations/puppet] - https://gerrit.wikimedia.org/r/109646 (owner: Dzahn)
[12:28:53] (CR) Dzahn: [C: 2] send BZ community metrics to communitymetrics@ [operations/puppet] - https://gerrit.wikimedia.org/r/109646 (owner: Dzahn)
[13:07:33] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:06:51 AM UTC
[13:08:20] (PS2) Dzahn: funnel instead of redirect historic Bugzilla URLs [operations/apache-config] - https://gerrit.wikimedia.org/r/109652
[13:08:23] (CR) jenkins-bot: [V: -1] funnel instead of redirect historic Bugzilla URLs [operations/apache-config] - https://gerrit.wikimedia.org/r/109652 (owner: Dzahn)
[13:11:29] (PS3) Dzahn: funnel instead of redirect historic Bugzilla URLs [operations/apache-config] - https://gerrit.wikimedia.org/r/109652
[13:15:43] (CR) Dzahn: "should be like I4d00cce7d27657c06023bc5f3b222908d9399e9c wher hashar did this for doc.mediawiki and solve RT #2675" [operations/apache-config] - https://gerrit.wikimedia.org/r/109652 (owner: Dzahn)
[13:20:33] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:19:42 AM UTC
[13:26:53] !log springle synchronized wmf-config/db-eqiad.php 'depool db1006'
[13:27:01] Logged the message, Master
[13:29:30] (PS1) Dzahn: add FIXMEs for erzurumi references [operations/puppet] - https://gerrit.wikimedia.org/r/109655
[13:30:29] (PS2) Dzahn: add FIXMEs for erzurumi references [operations/puppet] - https://gerrit.wikimedia.org/r/109655
[13:31:06] (CR) Dzahn: "Jeff, re: RT #6634, it said erzurumi is already idle. what about the stomp monitoring now?" [operations/puppet] - https://gerrit.wikimedia.org/r/109655 (owner: Dzahn)
[13:34:14] (PS1) Dzahn: remove erzurumi from DNS [operations/dns] - https://gerrit.wikimedia.org/r/109656
[13:35:35] !log xtrabackup clone db1022 to db1006
[13:35:41] Logged the message, Master
[13:36:04] (CR) Dzahn: "puppet/monitoring removal in I4022b1a1f1fc4b819f47d1a2cccb58c1b7179238" [operations/dns] - https://gerrit.wikimedia.org/r/109656 (owner: Dzahn)
[13:37:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:36:32 AM UTC
[15:19:58] (CR) Manybubbles: [C: -1] "Lets go with not merging now: Ifaa1e45b95cb2e7ba7199675d8553b07a8215abc isn't deployed yet. It looks like it'll be deployed on Thursday, " [operations/puppet] - https://gerrit.wikimedia.org/r/108852 (owner: Manybubbles)
[15:20:03] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:21:03] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[15:22:27] (PS1) Hashar: beta: fatal monitor looked at wrong file [operations/puppet] - https://gerrit.wikimedia.org/r/109662
[15:25:50] (CR) Cmcmahon: [C: 1] beta: fatal monitor looked at wrong file [operations/puppet] - https://gerrit.wikimedia.org/r/109662 (owner: Hashar)
[15:54:19] (CR) Alexandros Kosiaris: [C: 2] beta: fatal monitor looked at wrong file [operations/puppet] - https://gerrit.wikimedia.org/r/109662 (owner: Hashar)
[15:56:03] (CR) Milimetric: "> And this surprised me:" [operations/puppet] - https://gerrit.wikimedia.org/r/109316 (owner: Ottomata)
[16:02:52] (PS1) Cmcmahon: Run every 12 hours instead of every minute. [operations/puppet] - https://gerrit.wikimedia.org/r/109673
[16:06:39] (CR) Manybubbles: [C: 1] Run every 12 hours instead of every minute. [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[16:08:33] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:06:51 AM UTC
[16:09:31] (CR) Andrew Bogott: "Hm... it's unclear to me whether or not '12' should be quoted. Quotes seem right to me but other manifests leave it unquoted." [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[16:11:36] (PS2) Cmcmahon: Run every 12 hours instead of every minute. [operations/puppet] - https://gerrit.wikimedia.org/r/109673
[16:12:23] (CR) Cmcmahon: "Andrew, I think you're right, I removed the quotes" [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[16:16:03] (CR) Manybubbles: [C: 1] change search monitor_group description to eqiad [operations/puppet] - https://gerrit.wikimedia.org/r/109635 (owner: Dzahn)
[16:21:33] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:19:42 AM UTC
[16:26:24] (CR) Andrew Bogott: [C: 2] Run every 12 hours instead of every minute. [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[16:38:33] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 10:36:32 AM UTC
[16:43:33] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Mon 27 Jan 2014 01:43:07 PM UTC
[16:43:33] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Mon Jan 27 16:43:26 UTC 2014
[16:48:03] RECOVERY - Disk space on elastic1008 is OK: DISK OK
[16:49:35] (PS1) Andrew Bogott: ::qualify some top-level class references [operations/puppet] - https://gerrit.wikimedia.org/r/109678
[16:51:28] (CR) Andrew Bogott: [C: 2] ::qualify some top-level class references [operations/puppet] - https://gerrit.wikimedia.org/r/109678 (owner: Andrew Bogott)
[16:54:45] (CR) Ottomata: ""I kind of grumble at the idea I need to chase the variables down the file one by one."" [operations/puppet] - https://gerrit.wikimedia.org/r/109316 (owner: Ottomata)
[16:58:29] Reedy: can you merge https://gerrit.wikimedia.org/r/#/c/103355/ or do I need to change the dependency?
[17:11:21] (CR) Nikerabbit: Run every 12 hours instead of every minute. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[17:15:32] (PS1) Ottomata: Access for nuria on analytics cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109684
[17:17:29] (CR) Ottomata: [C: 2 V: 2] Access for nuria on analytics cluster [operations/puppet] - https://gerrit.wikimedia.org/r/109684 (owner: Ottomata)
[17:18:23] (PS2) Ottomata: Including role::analytics::clients on stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/109325
[17:18:27] (CR) Ottomata: [C: 2 V: 2] Including role::analytics::clients on stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/109325 (owner: Ottomata)
[17:23:54] (PS1) Ottomata: Wrapping some package openjdk-7-jdk installs in if !defined [operations/puppet] - https://gerrit.wikimedia.org/r/109686
[17:24:13] (PS2) Ottomata: Wrapping some package openjdk-7-jdk installs in if !defined [operations/puppet] - https://gerrit.wikimedia.org/r/109686
[17:24:19] (CR) Ottomata: [C: 2 V: 2] Wrapping some package openjdk-7-jdk installs in if !defined [operations/puppet] - https://gerrit.wikimedia.org/r/109686 (owner: Ottomata)
[17:26:42] !log csteipp synchronized php-1.23wmf10/extensions/TimedMediaHandler 'bug 56699 refix'
[17:26:50] Logged the message, Master
[17:35:57] (PS1) Odder: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689
[17:39:17] Hi Ryan_Lane!
[17:39:33] i'm starting to look into figuring out how to use git-deploy/sartoris/trebuchet with java stuff
[17:39:40] i'm really confused about what repositories are for what
[17:39:52] (trebuchet is the intended new name for sartoris, is that right?)
[17:40:00] maaaybe :P
[17:44:26] YuviPanda: who can tell?
[17:44:33] I want tos tart hacking on git-deploy
[17:44:38] buti really don't know what to use
[17:44:39] ottomata: Ryan_Lane only, IIRC
[17:44:51] maybe ori, maybe not. They say ori knows everything
[17:44:55] haha
[17:45:01] (CR) Nemo bis: "This patch confuses me." (3 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689 (owner: Odder)
[17:47:09] ottomata: greg-g is probably a good place to start as well, at least to help point you to the right people. :)
[17:47:34] ok thanks!
[17:47:36] greg-g?
[17:47:57] (CR) Odder: Enable per-wiki addition to 'translationadmin' group (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689 (owner: Odder)
[17:50:45] (CR) CSteipp: [C: 1] Add checkuser OAuth group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109308 (owner: Anomie)
[17:50:51] (CR) Nemo bis: Enable per-wiki addition to 'translationadmin' group (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689 (owner: Odder)
[17:54:48] (PS1) Manybubbles: Split some wikis into more shards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109692
[17:55:21] (PS2) Manybubbles: Split some wikis into more shards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109692
[17:56:13] (CR) Manybubbles: [C: -1] "-1 until the next Cirrus release window." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109692 (owner: Manybubbles)
[17:58:08] (PS3) Manybubbles: Split some wikis into more shards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109692
[18:09:49] twkozlowski: Someone needs to (whether it's me)
[18:11:12] Reedy: want to review my minor changes?
[18:11:45] manybubbles, I was thinking... since GeoData will be moving to Elastic anyway, I could give you my Solr cluster which is mostly idling ATM. Elasti and Solr would co-exist then until the lattr dies
[18:12:11] manybubbles: how do ya'll deploy ES?
[18:12:23] MaxSem: I like more servers. they are yummy
[18:12:45] greg-g: puppet for turning on new servers. pretty much. there is a symlink that is made manually for some reason
[18:12:46] deal
[18:12:59] manybubbles: k
[18:13:08] * greg-g is trying to find someone with git-deploy + java experience
[18:13:31] greg-g: I've done a lot of talking about that with ottomata
[18:13:36] * greg-g nods
[18:13:42] he's the one asking for help!
[18:13:44] but none of us have done it so far as I know
[18:14:08] ottomata: you're breaking new ground, tread lightly and purposefully, but don't stop moving.
[18:14:57] greg-g: git-deploy, trebuchet deploy, sartoris, trebuchet trigger, or trebuchet ricochet?
[18:15:15] ori: touche(t)
[18:15:39] manybubbles, then we could proceed as soon as we get approval from opsen
[18:15:55] MaxSem: don't you need to move the data into ES first?
[18:15:58] ottomata: what are you trying to do?
[18:16:15] manybubbles, no - they would run in parallel
[18:16:21] (PS2) Odder: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689
[18:16:54] MaxSem: cool
[18:18:19] (PS3) Odder: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689
[18:19:47] (PS1) Ottomata: Fixing META_MW_REDIRECT_URI in web_config.yaml.erb [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109697
[18:20:00] (CR) Ottomata: [C: 2 V: 2] Fixing META_MW_REDIRECT_URI in web_config.yaml.erb [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109697 (owner: Ottomata)
[18:21:41] (CR) Ori.livneh: [C: 2] logstash: Add support for fatalmonitor dashboard [operations/puppet] - https://gerrit.wikimedia.org/r/109324 (owner: BryanDavis)
[18:22:55] (CR) Odder: Enable per-wiki addition to 'translationadmin' group (3 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689 (owner: Odder)
[18:23:52] bd808: ran puppet on logstash nodes
[18:24:35] * bd808 goes to tail some log files
[18:24:57] ori: Looks like it restarted cleanly
[18:28:09] (PS4) Odder: Enable per-wiki addition to 'translationadmin' group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109689
[18:31:43] PROBLEM - Varnish HTTP text-backend on cp1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:31:53] PROBLEM - Varnish traffic logger on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:32:13] PROBLEM - Varnish HTCP daemon on cp1054 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:34:44] RECOVERY - Varnish traffic logger on cp1054 is OK: PROCS OK: 2 processes with command name varnishncsa
[18:34:50] (PS2) Reedy: Disable global AbuseFilter for private/fishbowl wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109498 (owner: Hoo man)
[18:34:56] (CR) Reedy: [C: 2] Disable global AbuseFilter for private/fishbowl wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109498 (owner: Hoo man)
[18:35:02] (Merged) jenkins-bot: Disable global AbuseFilter for private/fishbowl wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109498 (owner: Hoo man)
[18:35:03] RECOVERY - Varnish HTCP daemon on cp1054 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd
[18:35:33] RECOVERY - Varnish HTTP text-backend on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.001 second response time
[18:35:49] !log varnishd saturating cpu on cp1054
[18:35:56] Logged the message, Master
[18:36:09] ottomata, are you on RT duty?
[18:36:53] yup
[18:37:04] it's our old friend
[18:37:20] !log dmesg on cp1054 [16167206.414548] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[18:37:27] Logged the message, Master
[18:39:02] (PS5) Odder: Added the 48x48 resolution for commons.ico, it was based on the original icon. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:39:38] (PS2) Ottomata: Puppetizing wikimetrics for use in labs [operations/puppet] - https://gerrit.wikimedia.org/r/109316
[18:40:28] chrismcmahon: are you around? reagrding https://gerrit.wikimedia.org/r/#/c/109673/2
[18:40:30] (CR) Odder: [C: 1] "This should get rid of the abandoned dependency, I think." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:40:34] ri wha?
[18:40:35] ori
[18:40:37] Reedy: ^^
[18:40:45] ottomata: cp1054 alerts above
[18:40:51] hi matanya
[18:40:59] just said it recovered, is it telling the truth?
[18:41:36] hi chrismcmahon , i don't understand your change. cron works 0-23, so 12 would never be twice a day, am i missing something
[18:41:51] yes, i think so
[18:42:00] matanya: I did it wrong I think.
[18:42:40] it should be reverted, want to me to push a revert patch, or you would?
[18:42:52] matanya: I mostly wanted the timing not to be '*' which hashar set originally.
[18:43:07] what do you want it to be?
[18:43:23] matanya: twice per day would be best I thinki
[18:43:34] (PS6) Reedy: Added the 48x48 resolution for commons.ico, it was based on the original icon. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:43:36] (PS3) Ottomata: Puppetizing wikimetrics for use in labs [operations/puppet] - https://gerrit.wikimedia.org/r/109316
[18:43:39] (CR) Reedy: [C: 2] Added the 48x48 resolution for commons.ico, it was based on the original icon. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:43:50] chrismcmahon: so you would need somthing like '12', '0'
[18:43:53] (Merged) jenkins-bot: Added the 48x48 resolution for commons.ico, it was based on the original icon. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:43:55] matanya: but I'll still need to make a tweak after that
[18:44:02] RobH: Can you help debugging a redirect loop problem with doc.wikimedia.org? I think one of your changes might have broken it https://github.com/wikimedia/operations-puppet/commit/c137dbe8b91a7a631a1336bffbfee528a7cc238c
[18:44:05] e.g. https://doc.wikimedia.org/mediawiki-core/master/php/ and https://doc.wikimedia.org/mediawiki-core/master/js/
[18:44:06] and it should be quoted
[18:44:09] are both 301 loops
[18:44:14] matanya: there was another setting in that file that was 12, I just copied that
[18:44:19] (CR) Ottomata: [C: 2 V: 2] Puppetizing wikimetrics for use in labs [operations/puppet] - https://gerrit.wikimedia.org/r/109316 (owner: Ottomata)
[18:44:55] yes, chrismcmahon but that runs once a day
[18:45:20] (CR) Reedy: [C: -2] "This really hasn't been discussed, and it's also not a trivial task" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109458 (owner: Tim Landscheidt)
[18:45:22] !log cp1054 appears to have recovered, ops to investigate
[18:45:29] Logged the message, Master
[18:45:59] * matanya points at andrewbogott_afk, maybe i missed something.
[18:46:48] RobH: Note that the SSL sections there where *not* equivalents of the non-SSL sections. doc.wikimedia.org is a https-only domain, the non-ssl sections were redirects, the ssl sections where the only one with the proper document root set up.
[18:48:19] Hm.. I see that isn't true, the document root did get moved. OK, so what's causing it then?
[18:48:30] https://raw.github.com/wikimedia/operations-puppet/production/modules/contint/files/apache/doc.wikimedia.org
[18:49:19] bd808: ^
[18:49:21] https://doc.wikimedia.org/favicon.php works, https://doc.wikimedia.org/favicon.ico is a loop
[18:49:37] https://doc.wikimedia.org/mediawiki-core/master/php/ and https://doc.wikimedia.org/mediawiki-core/master/js/ (not supposed to be redirects at all) are both broken
[18:49:41] Krinkle: the same thing was happening with logstash, which is also on misc-varnish IIRC
[18:49:50] ori: thx
[18:49:56] (PS4) Reedy: wgWhitelistRead: remove wikimedia Israel related. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109591 (owner: Matanya)
[18:50:05] (CR) Reedy: [C: 2] wgWhitelistRead: remove wikimedia Israel related. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109591 (owner: Matanya)
[18:50:12] (Merged) jenkins-bot: wgWhitelistRead: remove wikimedia Israel related. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109591 (owner: Matanya)
[18:50:32] (CR) Matanya: Run every 12 hours instead of every minute. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[18:50:48] (PS3) Reedy: Restrict upload on Korean Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108690 (owner: John F. Lewis)
[18:50:58] (CR) Reedy: [C: 2] Restrict upload on Korean Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108690 (owner: John F. Lewis)
[18:51:04] Krinkle: it doesn't loop for me, can you curl -vvv the url?
[18:51:16] (Merged) jenkins-bot: Restrict upload on Korean Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/108690 (owner: John F. Lewis)
[18:51:32] I've not been able to personally recreate the redirect loop for http->https on logstash but I have had several reports of it from users.
[18:52:04] (PS3) Reedy: hewikinews: tidy NS look [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109593 (owner: Matanya)
[18:52:05] file a bug, assign to faidon
[18:52:22] (CR) Reedy: [C: 2] hewikinews: tidy NS look [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109593 (owner: Matanya)
[18:52:28] ori: strange, doesn't loop for me from curl. It does from Chrome though. Maybe DNS fuckup?
[18:52:28] (Merged) jenkins-bot: hewikinews: tidy NS look [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109593 (owner: Matanya)
[18:52:53] Like, if it still points to the old DNS, it doesn't get routed properly for requests that aren't 404 or file listings.
[18:53:05] because those don't redirect loop for me
[18:53:19] Or maybe just any url I've visited before
[18:53:50] Weird, after I did the curl request the browser is no longer looping on any of those urls
[18:54:00] I've been able to reproduce this for over 10 minutes
[18:54:03] and not anymore
[18:54:05] wtf
[18:56:01] (PS1) Cmcmahon: set time intervals properly [operations/puppet] - https://gerrit.wikimedia.org/r/109701
[18:56:37] matanya: is this OK? https://gerrit.wikimedia.org/r/#/c/109701/
[18:57:10] (CR) jenkins-bot: [V: -1] set time intervals properly [operations/puppet] - https://gerrit.wikimedia.org/r/109701 (owner: Cmcmahon)
[18:57:27] no chrismcmahon :)
[18:58:06] I seem to be recreate the redirect loop against logstash with `curl -v -L http://logstash.wikimedia.org/`
[18:58:50] matanya: argh
[18:58:52] chrismcmahon: [0,12]
[18:59:24] i mean ['0','12']
[18:59:46] (PS2) Cmcmahon: set time intervals properly [operations/puppet] - https://gerrit.wikimedia.org/r/109701
[18:59:56] chrismcmahon: you can usepuppet parser validate
[19:00:01] *`puppet parser validate
[19:00:05] arrg
[19:00:08] puppet parser validate
[19:00:17] (PS3) Cmcmahon: set time intervals properly [operations/puppet] - https://gerrit.wikimedia.org/r/109701
[19:00:33] matanya: I haven't set a cron job in a LONG time, and never with puppet.
[19:00:44] good sign :)
[19:01:08] matanya: thanks, I think is is OK now
[19:01:29] yes, i'll ignore the qoutes
[19:01:57] add some ops so they can merge chrismcmahon
[19:03:05] (CR) Matanya: [C: 1] "This change should replace https://gerrit.wikimedia.org/r/#/c/109673/ which doesn't do what it was intended to do." [operations/puppet] - https://gerrit.wikimedia.org/r/109701 (owner: Cmcmahon)
[19:03:12] Krinkle: https://gerrit.wikimedia.org/r/#/c/109702/
[19:03:41] (CR) Matanya: "should be reverted by https://gerrit.wikimedia.org/r/#/c/109701/" [operations/puppet] - https://gerrit.wikimedia.org/r/109673 (owner: Cmcmahon)
[19:05:25] !log ori synchronized php-1.23wmf11/extensions/WikimediaShopLink 'I21e34fe0f: Update WikimediaShopLink to master for PHP link insert'
[19:05:32] Logged the message, Master
[19:07:41] !log ori synchronized php-1.23wmf10/extensions/WikimediaShopLink 'I21e34fe0f: Update WikimediaShopLink to master for PHP link insert'
[19:07:48] Logged the message, Master
[19:08:13] !log reedy synchronized wmf-config/
[19:08:21] Logged the message, Master
[19:09:29] ksnider: pm?
[19:12:01] !log reedy synchronized docroot/bits/favicon/commons.ico
[19:12:07] Logged the message, Master
[19:13:10] (PS4) Matanya: site: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109507
[19:13:11] Yay! All our favicons are now served in 16px, 32px and 48px versions. Yay!
[19:13:19] brion: ^^ :)
[19:13:31] awesome :D
[19:13:32] thanks
[19:13:41] my retina displays thank you :D
[19:13:55] brion: we need to add High res dpi support for our captchas too!!1
[19:14:09] we need to replace the captchas entirely :D
[19:14:14] I guess it's time for the logos now we have favicons out of the way :)
[19:14:26] ah logos are gonna be fun
[19:14:31] gotta re-render all the text
[19:14:43] brion: heh
[19:14:44] They're cool, except for Wikiquote and, hm, Wikinews
[19:14:46] too bad we don't have a SVG-friendly logo
[19:14:54] brion: also, can you look at PS2 of https://gerrit.wikimedia.org/r/#/c/109311/ and tell me if it is just a NOP?
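matanya's objection at 18:41:36 turns on how cron reads the hour field: it lists the hours of the day (0-23) in which the job may fire, so `12` means "during hour 12 only", not "every 12 hours". As a rough illustration, here is a hypothetical helper (not part of cron or Puppet) that expands an hour field into the hours it matches; it covers only the common syntax (`*`, single values, `a-b` ranges, `/step` on `*` or a range, and comma lists). It assumes that Puppet's `cron` resource writes an array such as `['0','12']` out as the comma-separated field `0,12`.

```python
def expand_cron_field(field, lo=0, hi=23):
    """Return the sorted values matched by one cron field (default bounds: the hour field).

    Hypothetical helper for illustration; it supports only '*', single
    values, 'a-b' ranges, '/step' on '*' or a range, and comma lists.
    """
    matched = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        elif step_text:
            # This sketch only accepts a step after '*' or an 'a-b' range.
            raise ValueError(f"step needs '*' or a range here: {part!r}")
        else:
            start = end = int(base)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        matched.update(range(start, end + 1, step))
    return sorted(matched)


print(expand_cron_field("12"))    # hour => 12: a single hour per day
print(expand_cron_field("0,12"))  # hour => ['0','12'], assuming comma-joining
print(expand_cron_field("*/12"))  # the same two hours, written as a step
```

Under that assumption, `['0','12']` and `*/12` select the same two hours, which is what "twice per day" needs. The hour field alone does not set the frequency, though: if the minute field were `*`, the job would still run every minute within each of those hours.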
[19:14:55] which we could serve direct
[19:15:02] puzzle globe's a bit heavyweight as svg
[19:15:10] brion: looks like that to me, but maybe there's a subtle fuckupiness of PHP arrays that I'm missing
[19:15:14] brion: this might be because the Wikipedia logo was created with Illustrator
[19:15:21] brion: (just came across that from the IRC bot, and was curious)
[19:15:23] (I think)
[19:15:41] ori:
[19:15:42] 44 Warning: call_user_func() expects parameter 1 to be a valid callback, class 'WikimediaShopLinkHooks' does not have a method 'geoSetup' in /usr/local/apache/common-local/php-1.23wmf10/includes/Setup.php on line 594
[19:15:45] Is that you?
[19:16:02] yes, i'll fix
[19:16:14] YuviPanda: doesn't look like a NOP to me
[19:16:20] the before version modifies $params array
[19:16:23] the after version does not
[19:16:33] brion: right, so the patch isn't a NOP but the function is?
[19:16:33] Reedy: err, are you sure?
[19:16:43] It disappeared just after pasting it
[19:16:52] YuviPanda: well the function becomes one
[19:16:54] just lingering I guess
[19:17:01] brion: right. Just wanted to make sure.
[19:17:03] brion: thanks!
[19:17:03] Reedy: yes, it was removed, so scap atomicity thing
[19:17:08] or lack thereof
[19:17:28] Software Transactional Memory along with a bunch of Monads is what Scap really needs...
[19:17:41] Not really
[19:17:54] Error messages linger till more errors push them out of the way
[19:18:11] in that case more C++ with templates?
[19:18:49] i'm going to invent a new language based on mediawiki templates
[19:19:07] {{#curlybracehash}}
[19:20:18] ori, Krinkle: I filed https://bugzilla.wikimedia.org/show_bug.cgi?id=60488 about http redirect loop. Testing via curl shows that it comes and goes so my money is on one of the nodes in the misc-lb cluster misbehaving. Maybe not adding the proper X-Forwarded-Proto header?
[19:21:08] you can curl each of them with -H 'host: ...' to see
[19:21:31] (PS4) coren: Fix dependency libtiff5-alt-dev -> libtiff4-dev [operations/debs/vips] - https://gerrit.wikimedia.org/r/102617
[19:37:30] (PS1) Ottomata: Some fixes for wikimetrics in labs [operations/puppet] - https://gerrit.wikimedia.org/r/109709
[19:38:09] (CR) Ottomata: [C: 2 V: 2] Some fixes for wikimetrics in labs [operations/puppet] - https://gerrit.wikimedia.org/r/109709 (owner: Ottomata)
[19:41:24] (PS1) Ottomata: Not writing out :80 in redirect urls [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109714
[19:41:41] (CR) Ottomata: [C: 2 V: 2] Not writing out :80 in redirect urls [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109714 (owner: Ottomata)
[19:43:30] (PS1) Ottomata: Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/109715
[19:43:39] (CR) Ottomata: [C: 2 V: 2] Updating wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/109715 (owner: Ottomata)
[19:43:50] ^ YuviPanda
[20:03:53] anyone in here have root on the payments cluster?
[20:05:29] mwalker: I signed the special magic NDA thing, but I don't know if I got the magic. Need me to try?
[20:06:08] (I think I only had it given to me "just in case" though and don't remember seeing an access request to match)
[20:06:17] (PS1) Danny B.: skwiki: Configure transwiki import sources. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/109723
[20:06:20] sure; the host I need access to is boron -- I have to apply a security patch to a repo that's ordinarily protected by a deploy script
[20:07:03] * Coren tries to figure out the right bastion to use to reach it.
[20:07:21] tellurium.wikimedia.org
[20:07:50] mwalker: Nope. Not cool enough to reach. Sorry.
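bd808's hypothesis at 19:20:18 is that one misc-lb node forwards HTTPS requests to the backend without an `X-Forwarded-Proto: https` header. The toy model below is a sketch under that assumption, not the actual Varnish or Apache configuration, and it shows why a missing header would produce an endless 301 loop. Behind a TLS-terminating cache the backend only ever sees plain HTTP, so without the header it keeps redirecting the client to the HTTPS URL it just requested. `backend`, `fetch` and the `node_sets_proto` flag are hypothetical names.

```python
from urllib.parse import urlsplit


def backend(url, headers):
    """Toy HTTPS-only backend: serve the page only if the client used HTTPS.

    The backend sits behind a TLS-terminating cache, so it cannot see the
    client's scheme itself and relies on X-Forwarded-Proto.
    """
    if headers.get("X-Forwarded-Proto") == "https":
        return 200, None
    # Otherwise redirect to the HTTPS version of the same URL.
    return 301, urlsplit(url)._replace(scheme="https").geturl()


def fetch(url, node_sets_proto, max_hops=10):
    """Follow redirects (like `curl -L`) through one cache node; return (result, hops)."""
    for hop in range(max_hops):
        client_scheme = urlsplit(url).scheme
        # A well-behaved node tells the backend which scheme the client used;
        # a misbehaving one forwards the request without the header.
        headers = {"X-Forwarded-Proto": client_scheme} if node_sets_proto else {}
        status, location = backend(url, headers)
        if status != 301:
            return status, hop
        url = location
    return "redirect loop", max_hops


print(fetch("http://logstash.wikimedia.org/", node_sets_proto=True))
print(fetch("http://logstash.wikimedia.org/", node_sets_proto=False))
```

If only some nodes dropped the header, only clients routed to those nodes would loop, which would fit the reports that the loop comes and goes. Requesting each node directly with curl and an explicit `Host:` header, as suggested at 19:21:08, is one way to single out such a node.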
[20:08:42] mwalker: try jeff green
[20:08:49] he's at a wedding right now
[20:09:39] otherwise, yes he is my typical PoC
[20:11:04] looks like my other options are mark and paravoid -- you guys around?
[20:12:14] BTW mwalker do you do the PDF stuff?
[20:12:22] yep; that's also me
[20:12:37] is there any RTL site i can test the new system?
[20:13:15] mwalker: I think aharoni asked about such a thing too somewhere
[20:13:32] not that's in labs -- you should be able to import RtL content though to test that
[20:13:51] I can also import an RtL wiki from a dump if you have a suggestion for a small one
[20:14:26] mwalker: hewikisource/news/qoute are rather small
[20:14:35] news being the smallest
[20:16:12] mwalker: http://dumps.wikimedia.org/hewikinews/20140117/
[20:16:17] 1.1 mega
[20:18:12] (PS1) Ottomata: Using correct variables for labsdb database creds [operations/puppet] - https://gerrit.wikimedia.org/r/109726
[20:18:47] (CR) Ottomata: [C: 2 V: 2] Using correct variables for labsdb database creds [operations/puppet] - https://gerrit.wikimedia.org/r/109726 (owner: Ottomata)
[20:35:27] !log aaron synchronized php-1.23wmf11/includes/actions/InfoAction.php '5f94782c2bf09ce6c00b6ad37a68d08fef802fe5'
[20:35:36] Logged the message, Master
[20:40:45] (PS1) Ori.livneh: Provision performance.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/109731
[20:45:49] (CR) Ori.livneh: [C: 2] "Faidon, FYI. On the one hand, serving this from tungsten adds another role to a machine that is possibly already doing too much. On the ot" [operations/puppet] - https://gerrit.wikimedia.org/r/109731 (owner: Ori.livneh)
[20:46:39] ottomata: is it ok to puppet-merge your wikimetrics changes?
[20:47:07] * ori goes with "yes".
[20:48:17] (PS1) Matanya: network: lint [operations/puppet] - https://gerrit.wikimedia.org/r/109732
[20:54:48] !log reload-vcl on cp1043: Backend host '"holmium.eikimrfis.org"' could not be resolved to an IP address: Name or service not known. (culprit: https://gerrit.wikimedia.org/r/#/c/109008/)
[20:55:47] (PS1) Ori.livneh: Revert "setup misc-web-lb to cache for blog.w.o server holmium" [operations/puppet] - https://gerrit.wikimedia.org/r/109733
[20:56:13] (PS2) Ori.livneh: Revert "setup misc-web-lb to cache for blog.w.o server holmium" [operations/puppet] - https://gerrit.wikimedia.org/r/109733
[20:56:27] (CR) Ori.livneh: [C: 2 V: 2] Revert "setup misc-web-lb to cache for blog.w.o server holmium" [operations/puppet] - https://gerrit.wikimedia.org/r/109733 (owner: Ori.livneh)
[20:56:34] RobH, mutante-away ^^
[20:59:40] (PS1) Ori.livneh: Add CNAME performance.wikimedia.org -> misc-web-lb.eqiad [operations/dns] - https://gerrit.wikimedia.org/r/109736
[21:01:28] (CR) Ori.livneh: [C: 2] Add CNAME performance.wikimedia.org -> misc-web-lb.eqiad [operations/dns] - https://gerrit.wikimedia.org/r/109736 (owner: Ori.livneh)
[21:02:09] !log dns update
[21:02:16] Logged the message, Master
[21:07:42] !log csteipp synchronized php-1.23wmf11/extensions/PdfHandler 'bug 60339'
[21:07:49] Logged the message, Master
[21:08:11] !log csteipp synchronized php-1.23wmf10/extensions/PdfHandler 'bug 60339'
[21:08:18] Logged the message, Master
[21:13:24] ori: we are good to merge, thank you
[21:18:13] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[21:21:17] ottomata: did you find out about the udp2log emery questions i asked you?
[21:22:03] (PS1) Ottomata: Don't need to fully qualify local variables [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109743
[21:22:17] hmm, matanya, what else was I supposed to find out?
[21:22:43] ottomata: if we need the API and teahouse logs
[21:23:05] no haven't looked into taht
[21:23:08] not really sure who to ask
[21:23:09] hm
[21:23:32] oo,i can merge this though
[21:23:33] https://gerrit.wikimedia.org/r/#/c/109516/1
[21:23:34] if you would like
[21:23:55] that is my patch
[21:24:20] you added the other two logs, that is why i'm asking you
[21:25:31] ottomata: unless it was just per request, which i can't find where is it logged
[21:26:38] yeah, if I added them, it was probably because drdee asked me to
[21:26:39] not sure
[21:27:06] (PS2) Ottomata: emery: remove one udp2log logger. [operations/puppet] - https://gerrit.wikimedia.org/r/109516 (owner: Matanya)
[21:27:10] (CR) Ottomata: [C: 2 V: 2] emery: remove one udp2log logger. [operations/puppet] - https://gerrit.wikimedia.org/r/109516 (owner: Matanya)
[21:30:02] (CR) Ottomata: [C: 2 V: 2] Don't need to fully qualify local variables [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/109743 (owner: Ottomata)
[21:50:42] (PS1) Ottomata: Fixing default.replication.factor, missing = [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/109762
[21:51:32] (CR) Ottomata: [C: 2 V: 2] Fixing default.replication.factor, missing = [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/109762 (owner: Ottomata)
[21:58:19] !log csteipp synchronized php-1.23wmf11/includes 'bug 60339'
[21:58:27] Logged the message, Master
[22:00:36] ksnider: i need to include a non-WMF employee within RT, is that possible, or i need to ask him seperatly and copy his answer?
[22:01:26] or if any other one here knows the policy ?
[22:02:27] ottomata, jgage: did you guys get around to adding varnishkafka to the bits varnishes?
[22:02:29] !log csteipp synchronized php-1.23wmf10/includes 'bug 60339'
[22:02:37] Logged the message, Master
[22:02:45] matanya: they can send e-mails to RT I think
[22:03:02] matanya: yes, you can cc him / add him, we do that all the time
[22:03:12] ok, thanks
[22:04:42] ori, jgage just sent me an email with his questions a few minutes ago, and i just replied
[22:04:53] he's grokking the whole of the analytics cluster first, apparently :p
[22:04:54] :)
[22:05:26] ETA?
[22:06:27] ottomata: ^?
[22:06:51] jgage: ?
[22:07:12] have to ask him i think, we're using this and other things as ways to get him involved in analytics stuff
[22:08:39] hi
[22:08:49] wasn't it like a one-line patch?
[22:08:52] i think i had it staged at one point
[22:09:01] the sticking point is creating the topic, the tool doesn't work
[22:10:26] sweet, just read your mail ottomata
[22:12:26] ja my script is way better than kafka's default ones anyway :p
[22:12:31] ori / jgage / ottomata: Forwarded the outstanding questions to ops@ so everyone can be on the same page :)
[22:12:34] they should include it :)
[22:12:35] hehe
[22:12:40] great
[22:12:41] thanks
[22:13:07] !yeah, jgage, in general questions and convos like that are good to have on the ops@ list, for any interested parties
[22:14:35] heh ok
[22:16:13] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[22:16:15] ksnider, ottomata, jgage -- thanks!
[22:17:10] good night
[22:21:09] jgage: forgot to answer your question about zookeeper urls, just replied again
[22:21:47] (PS14) Physikerwelt: Add Mathoid module (TeX -> MathML / SVG conversion web service) [operations/puppet] - https://gerrit.wikimedia.org/r/90733
[22:30:26] ottomata: thanks, just finished meeting. checking now.
[23:37:49] Krinkle: are there any requirements regarding polyfils in mediawiki? Specifically we found out our js doesn't work in ie8 due to a missing Function.prototype.bind
[23:37:59] wrong channel ...