[00:03:45] Ah! Didn't notice. The account appears to have been registered just today.
[00:03:58] (Well, now yesterday.)
[00:08:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[00:09:10] (PS1) Hoo man: Remove duplicate users in admin class's data.yaml [operations/puppet] - https://gerrit.wikimedia.org/r/135725
[00:09:26] mutante: ^-- easy-peasy code review
[00:16:58] (PS2) Hoo man: Remove duplicate users in admin class's data.yaml [operations/puppet] - https://gerrit.wikimedia.org/r/135725
[00:38:21] (CR) Rush: [C: 1] "seems good" [operations/puppet] - https://gerrit.wikimedia.org/r/135725 (owner: Hoo man)
[00:40:23] (PS5) Rush: admin yaml for linne.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135633
[00:40:34] (CR) Hoo man: [C: 1] minor changes to InitialiseSettings.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 (owner: Ricordisamoa)
[00:41:38] (CR) Rush: [C: 2 V: 2] admin yaml for linne.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/135633 (owner: Rush)
[00:58:15] (PS5) Ori.livneh: Add rsyslog module and port existing usage [operations/puppet] - https://gerrit.wikimedia.org/r/135447
[01:51:48] (PS3) Springle: The job control approach doesn't work if a script is in non-interactive mode, which seems to be the case for /bin/sh (though not bash?). [operations/puppet] - https://gerrit.wikimedia.org/r/135517
[02:14:05] !log LocalisationUpdate completed (1.24wmf5) at 2014-05-28 02:13:02+00:00
[02:14:18] Logged the message, Master
[02:25:58] !log LocalisationUpdate completed (1.24wmf6) at 2014-05-28 02:24:55+00:00
[02:26:04] Logged the message, Master
[02:44:50] PROBLEM - puppet disabled on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:44:50] PROBLEM - Disk space on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:10] PROBLEM - check configured eth on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:10] PROBLEM - swift-container-auditor on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:10] PROBLEM - check if dhclient is running on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:10] PROBLEM - swift-object-server on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:10] PROBLEM - swift-account-server on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:10] PROBLEM - swift-object-updater on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:19] PROBLEM - swift-container-updater on ms-be1005 is CRITICAL: Timeout while attempting connection
[02:45:19] PROBLEM - swift-container-replicator on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:19] PROBLEM - swift-object-replicator on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:19] PROBLEM - swift-container-server on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:19] PROBLEM - swift-account-replicator on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:19] PROBLEM - swift-account-reaper on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:19] PROBLEM - swift-account-auditor on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:29] PROBLEM - swift-object-auditor on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:29] PROBLEM - DPKG on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:45:50] PROBLEM - RAID on ms-be1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:46:50] O_O
[03:00:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[03:09:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[03:10:29] PROBLEM - SSH on ms-be1005 is CRITICAL: Connection refused
[03:10:35] hmm
[03:13:39] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:09] RECOVERY - check configured eth on ms-be1005 is OK: NRPE: Unable to read output
[03:14:09] RECOVERY - check if dhclient is running on ms-be1005 is OK: PROCS OK: 0 processes with command name dhclient
[03:14:09] RECOVERY - swift-container-replicator on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[03:14:09] RECOVERY - swift-container-auditor on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[03:14:09] RECOVERY - swift-object-updater on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[03:14:10] RECOVERY - swift-object-replicator on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[03:14:10] RECOVERY - swift-account-server on ms-be1005 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[03:14:11] RECOVERY - swift-object-server on ms-be1005 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[03:14:11] RECOVERY - swift-container-updater on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[03:14:12] RECOVERY - swift-account-auditor on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[03:14:12] RECOVERY - swift-account-reaper on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[03:14:13] RECOVERY - swift-account-replicator on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[03:14:13] RECOVERY - swift-container-server on ms-be1005 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[03:14:19] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[03:14:19] RECOVERY - DPKG on ms-be1005 is OK: All packages OK
[03:14:19] RECOVERY - swift-object-auditor on ms-be1005 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[03:14:29] RECOVERY - SSH on ms-be1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[03:14:39] RECOVERY - RAID on ms-be1005 is OK: OK: optimal, 14 logical, 14 physical
[03:14:40] RECOVERY - Disk space on ms-be1005 is OK: DISK OK
[03:14:40] RECOVERY - puppet disabled on ms-be1005 is OK: OK
[03:14:47] wtf was that about
[03:19:11] ms-be1005 uptime 4min.. did anyone restart that on the sly?
[03:20:55] cpu soft lockup message for a few hours
[03:25:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 28 03:24:24 UTC 2014 (duration 24m 23s)
[03:25:36] Logged the message, Master
[03:29:00] PROBLEM - NTP on ms-be1005 is CRITICAL: NTP CRITICAL: Offset unknown
[03:32:59] RECOVERY - NTP on ms-be1005 is OK: NTP OK: Offset -0.01124811172 secs
[05:35:43] (PS1) Springle: raise dbstore connection limit [operations/puppet] - https://gerrit.wikimedia.org/r/135743
[05:36:18] (CR) Springle: [C: 2] raise dbstore connection limit [operations/puppet] - https://gerrit.wikimedia.org/r/135743 (owner: Springle)
[05:36:44] +400 %! :)
[05:40:15] from a very conservative starting point :)
[05:50:50] (CR) Springle: [V: 2] raise dbstore connection limit [operations/puppet] - https://gerrit.wikimedia.org/r/135743 (owner: Springle)
[05:50:58] * springle gives up waiting
[06:01:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[06:02:15] (PS1) Springle: scripts for turning an upstream mariadb static tarball into a deb [operations/software] - https://gerrit.wikimedia.org/r/135744
[06:03:52] (CR) Springle: [C: 2] scripts for turning an upstream mariadb static tarball into a deb [operations/software] - https://gerrit.wikimedia.org/r/135744 (owner: Springle)
[06:10:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[06:40:25] (PS1) Giuseppe Lavagetto: icinga: replace naggen (WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[06:53:51] (CR) Ori.livneh: icinga: replace naggen (WiP) (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[06:56:00] <_joe_> ori: that was fast!
[06:56:16] <_joe_> it's a way to make me feel guilty about your pending CRs? :)
[06:56:25] absolutely
[06:56:37] <_joe_> ehe
[06:56:50] i enjoy reading python :)
[06:56:50] <_joe_> I promise, today I'll dedicate the afternoon to CRs
[06:57:00] <_joe_> well, this is not good python
[06:57:14] <_joe_> I worked under a couple of limitations there
[06:57:37] <_joe_> and it's still missing a few things
[06:58:41] it's a bit of a shame to have all the firepower of sqlalchemy at your finger tips and still have a giant query as a string literal!
[06:59:10] <_joe_> good thing is, in labs with a toy db it took 10 seconds to generate all icinga config files
[06:59:29] i bet that using sqlalchemy declarative can generate that same query from less code
[06:59:45] that you could use .. to generate i mean
[06:59:47] <_joe_> mmmh don't think so
[07:00:09] <_joe_> try to generate a GROUP_CONCAT(CONCAT) of joins between 5 tables
[07:00:27] <_joe_> Also, I wanted control on the query
[07:00:37] * _joe_ not a fan of ORMs
[07:01:43] hmmm! you might be right about group_concat being hard to express
[07:01:47] <_joe_> but if you want to try... I think it will be less clear in the code
[07:01:54] <_joe_> ori: and there is also a subquery
[07:02:12] <_joe_> (I have to be sure I only get enabled resources)
[07:02:44] <_joe_> btw the puppet db schema shows why ORMs should be used post-hoc when designing databases
[07:02:59] <_joe_> this db is modeled upon the activerecord pattern
[07:03:16] <_joe_> so that any useful query on it is a pain in the ass
[07:06:16] there has to be a better way, that query is fractal
[07:06:35] <_joe_> ori: yeah, but it's springle-approved
[07:06:53] he's just curious to see how the database will cope
[07:06:57] <_joe_> ori: and it's faster than doing 8K queries (which is the natural way to go)
[07:07:09] <_joe_> ori: nah, the explain is not that terrible
[07:07:16] <_joe_> also it's a very small database
[07:07:34] <_joe_> and it's not a query that will be run every second multiple thousands of time
[07:07:48] <_joe_> it's not "webscale"
[07:07:56] well, performance isn't the issue, it's just hard to read and maintain
[07:08:02] <_joe_> so, I can allow myself to do complex things with queries
[07:08:08] <_joe_> I can format it :)
[07:08:17] i'm halfway done doing that :P
[07:08:25] <_joe_> SQL ain't for the faint of heart
[07:08:46] <_joe_> it's something from the COBOL era, and it shows
[07:10:21] <_joe_> ori: are you doing anything else on this script?
[07:10:31] <_joe_> I was fixing a few details and adding logging
[07:10:45] <_joe_> just to be sure we don't overlap our efforts
[07:10:53] no no, i'm not that helpful
[07:11:00] <_joe_> :)
[07:11:36] (CR) Springle: icinga: replace naggen (WiP) (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[07:12:03] <_joe_> springle: wow good catch
[07:12:04] springle: what is this sorcery?!
[07:12:16] <_joe_> lol
[07:12:25] heh
[07:12:35] i suppose it is sorcery
[07:12:39] but it is in the manual
[07:12:47] <_joe_> that's the awesome in mysql
[07:12:50] GROUP BY does an implicit sort
[07:12:50] <_joe_> yeah it is
[07:12:57] <_joe_> I always forget that
[07:12:57] ORDER BY NULL skips it
[07:13:23] <_joe_> springle: otoh, we may want resources to be ordered in this case
[07:13:34] <_joe_> so that we produce the same output every time
[07:13:42] ah
[07:13:43] then be explicit
[07:13:49] <_joe_> yes
[07:13:53] DON'T ORDER BY NULL
[07:14:10] heh
[07:14:21] <_joe_> I should put a ORDER BY resources.title ASC
[07:14:46] <_joe_> so that resources from the same host do stay clustered
[07:14:59] +1
[07:15:26] <_joe_> springle: I was also missing the set max_group_concat_len
[07:15:58] <_joe_> group_concat_max_len
[07:23:31] (PS2) Springle: Use m2-master CNAME to make DB rotations neater. This allows a master switch to be a DNS change plus a simple port 3306 tcp redirect with socat until TTL. Should also help if we switch to a haproxy configuration in the future. [operations/puppet] - https://gerrit.wikimedia.org/r/131419
[07:26:46] jenkins bot seems to be slacking off
[07:53:36] ori: still awake ?
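The ordering point above (an explicit ORDER BY on the resource title so that the generated output is the same on every run) can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 as a stand-in for MySQL; the table names, column names and the `|||` separator are hypothetical and not taken from the actual script. The implicit GROUP BY sort, ORDER BY NULL and group_concat_max_len are MySQL behaviours that SQLite does not reproduce, so they appear only in comments.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE params (resource_id INTEGER, name TEXT, value TEXT);
        INSERT INTO resources VALUES (1, 'web2 ssh'), (2, 'db1 raid'), (3, 'web1 ssh');
        INSERT INTO params VALUES
            (1, 'host_name', 'web2'), (1, 'check_command', 'check_ssh'),
            (2, 'host_name', 'db1'),
            (3, 'host_name', 'web1');
    """)
    return conn


def grouped_params(conn):
    """Return (title, concatenated params) rows in an explicit, repeatable order."""
    # On MySQL one would also raise group_concat_max_len for the session before
    # running a query like this; SQLite has no such setting.
    query = """
        SELECT r.title, GROUP_CONCAT(p.name || '=' || p.value, '|||')
        FROM resources AS r
        JOIN params AS p ON p.resource_id = r.id
        GROUP BY r.id, r.title
        ORDER BY r.title ASC  -- explicit ordering instead of relying on GROUP BY
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for title, params in grouped_params(demo_connection()):
        print(title, params.split("|||"))
```

The order of items inside each concatenated string is not pinned down by this query, so a caller that needs stable output there would have to sort the split values as well.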
[07:57:47] (PS1) Nemo bis: Add Krinkle to the English Planet [operations/puppet] - https://gerrit.wikimedia.org/r/135748
[08:05:03] <_joe_> Nemo_bis: if you need to merge this, just tell me
[08:08:07] _joe_: wouldn't harm :) of course it's not urgent
[08:10:44] <_joe_> I'll do this during the morning
[08:11:08] (PS8) Matanya: Move logs to /var/log/mediawiki [operations/puppet] - https://gerrit.wikimedia.org/r/83574 (owner: Reedy)
[08:15:35] (PS9) Matanya: Move logs to /var/log/mediawiki [operations/puppet] - https://gerrit.wikimedia.org/r/83574 (owner: Reedy)
[08:17:06] (PS10) Matanya: Move logs to /var/log/mediawiki [operations/puppet] - https://gerrit.wikimedia.org/r/83574 (owner: Reedy)
[08:17:57] (PS11) Matanya: Move logs to /var/log/mediawiki [operations/puppet] - https://gerrit.wikimedia.org/r/83574 (owner: Reedy)
[08:18:28] <_joe_> matanya: what was that? :)
[08:19:12] morning wake up issues _joe_
[09:02:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[09:03:45] <_joe_> that is 3 minutes ago
[09:03:47] <_joe_> wtf
[09:08:20] (PS2) Giuseppe Lavagetto: icinga: replace naggen (WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[09:11:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[09:15:20] _joe_: a day and 3 minutes
[09:17:59] <_joe_> d'oh
[09:18:01] <_joe_> on it
[09:20:20] <_joe_> matanya: 12 hours 3 minutes
[09:20:20] <_joe_> :)
[09:20:47] yeah, well. :)
[09:23:47] _joe_: thinking out loud: would it be useful to suffix it with "x hours ago" ?
[09:24:18] <_joe_> yeah it would
[09:24:29] doing
[09:26:05] we had that
[09:26:09] it was reverted
[09:27:01] thanks for saving me time akosiaris :)
[09:27:06] better use ISO format
[09:27:12] 09:07:41 PM UTC???
[09:27:24] PM UTC sounds like an oxymoron to me :)
[09:27:32] <_joe_> Nemo_bis: yes
[09:28:01] yes please, no PM/AM
[09:28:02] * godog hides
[09:28:38] <_joe_> the italians don't love AM/PM
[09:28:52] <_joe_> that is quite clear now :)
[09:29:12] I don't understand why anyone would like AM/PM anyway
[09:29:28] i'll change it to 24h
[09:29:44] <_joe_> akosiaris: brought to you from the same people that measure distances in miles, yards, feet and inches
[09:30:02] because 0404 happens twice a day ;) and for some circumstances 24hr time isn't the best to use
[09:30:13] akosiaris: out of curiosity why it was reverted? seems more huma
[09:30:16] human
[09:30:29] we are all machines
[09:30:47] <_joe_> If I could choose, I'd use the unix epoch
[09:30:47] something about it misbehaving. It would say either "0 hours ago" or for some reason the hours since the Epoch
[09:31:15] so is 24h agreed ?
[09:31:29] akosiaris: ah ok! not the idea itself, thanks
[09:31:41] p858snake|l: like ?
[09:32:41] akosiaris: I deal with customers all day, that is why it isn't always the best
[09:33:40] p858snake|l: I was meaning situations but I get the feeling this goes the way of "Whatever the customer wants"
[09:34:16] p858snake|l: I will be contacting you asking the time the klingon way :-)
[09:34:39] <_joe_> akosiaris: eheh
[09:34:43] I could probably just trf you to my boss for that...
[09:36:06] heh, that reminds of me https://xkcd.com/806/
[09:36:21] heh, reminds me of*
[09:37:33] <_joe_> ok, disabling puppet for ~ 5 mins on neon, trying to check my own puppet files
[09:40:21] <_joe_> it works \o/
[09:45:03] did you fix it ? or should i ?
[09:45:09] @ _joe_ ^
[09:46:26] <_joe_> matanya: what?
[09:46:39] did you fix the 24 vs pm/am ?
[09:48:08] <_joe_> no
[09:48:16] <_joe_> I don't care that much about that :)
[09:48:44] k :)
[09:49:15] matanya: if you want to take a stab at it I'll be happy to review though
[09:50:18] it is basically just adding -u to the check command
[09:54:40] <_joe_> argh we ARE evil
[10:26:26] godog: mind running on neon date --date @2458485868 +"Last successful Puppet run was %c" please and letting me know the output ?
[10:28:04] matanya: returns pm/am Last successful Puppet run was Wed 27 Nov 2047 04:44:28 PM UTC
[10:28:12] you probably want LANG=C there
[10:28:51] godog: tested on my machine gave 24h times :/
[10:28:52] or LC_DATE or what have you
[10:29:51] godog: what version of date is there?
[10:30:12] matanya: the version shouldn't matter, the locale does though
[10:30:28] just wondering
[10:30:39] coreutils 8.13 btw
[10:30:54] whatever precise pangolin has on the repos is a good bet usually matanya
[10:31:11] thanks
[10:31:50] since you have lab machines an apt-cache policy should give you the same answer
[10:32:11] and you find the with dpkg -S `which date`
[10:43:51] i tend to forget labs
[11:05:32] (PS1) Matanya: puppet-fail: report in 24h time format [operations/puppet] - https://gerrit.wikimedia.org/r/135757
[11:05:43] godog: ^
[11:08:16] (PS2) Matanya: puppet-fail: report in 24h time format [operations/puppet] - https://gerrit.wikimedia.org/r/135757
[11:08:28] what is up with typos today ?
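The locale dependence of `%c` discussed above can be sidestepped by spelling out the 24-hour fields explicitly. A minimal Python sketch of that idea, not the actual check script and not the change submitted in the review above; the function name is made up for illustration, and the epoch value is the one tested in the conversation.

```python
from datetime import datetime, timezone


def last_run_message(epoch_seconds):
    """Format a Unix timestamp as a 24-hour UTC string, independent of the locale."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # %H is the 24-hour clock; purely numeric fields avoid locale-dependent
    # day and month names as well as AM/PM markers.
    return "Last successful Puppet run was " + stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    print(last_run_message(2458485868))
```

Numeric, ISO-like fields were chosen here because they read the same under any LANG or LC_TIME setting.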
[11:12:07] (CR) Filippo Giunchedi: [C: 1] puppet-fail: report in 24h time format [operations/puppet] - https://gerrit.wikimedia.org/r/135757 (owner: Matanya)
[11:13:14] (PS3) Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[11:14:59] matanya: yup, +1, btw also if you want to get my attention on a code review include me, I'm ignoring grrrit-wm but I read emails :)
[11:15:39] i added you, got a conflict on gerrit, you already reviewed :)
[11:18:52] (CR) Giuseppe Lavagetto: icinga: replace naggen (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[11:20:16] haha nice timing, thanks
[11:23:47] !log restarting elastic1001 to try out niofs (instead of mmapfs) on advice from a lucene developer
[11:23:51] Logged the message, Master
[11:30:41] (PS4) Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[12:03:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[12:12:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[13:03:39] (CR) Faidon Liambotis: [C: -1] "Looks good, kudos! Inline for small comments :)" (7 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[13:05:02] heya mutante!, yt?
[13:05:11] gonna ask about the research group thang
[13:05:20] ottomata: good morning
[13:05:36] are you attending the SoS?
[13:06:11] ja should be, dunno what to report for ops stuff
[13:06:14] but, yes!
[13:23:32] (PS1) Reedy: Add REL1_23 to ExtensionDistributor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135763 (https://bugzilla.wikimedia.org/65852)
[13:24:10] (CR) Reedy: [C: 2] Add REL1_23 to ExtensionDistributor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135763 (https://bugzilla.wikimedia.org/65852) (owner: Reedy)
[13:24:18] (Merged) jenkins-bot: Add REL1_23 to ExtensionDistributor [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135763 (https://bugzilla.wikimedia.org/65852) (owner: Reedy)
[13:24:28] !log restarting elastic1001 to revert it back to mmapfs - niofs wasn't better. worse, even.
[13:24:33] Logged the message, Master
[13:26:18] !log reedy synchronized wmf-config/ 'Enable REL1_23 in ExtensionDistributor'
[13:26:23] Logged the message, Master
[13:35:08] Reedy: hello, can you tell me what to do with groupOverrides2 in https://gerrit.wikimedia.org/r/#/c/134400/ ?
[13:35:49] If you don't know and I have to test it myself that's also good to know. :)
[13:39:11] (PS6) BBlack: Remove more noise. [operations/puppet] - https://gerrit.wikimedia.org/r/134984 (owner: Dr0ptp4kt)
[13:39:15] Nemo_bis: https://github.com/wikimedia/mediawiki-core/blob/master/includes/SiteConfiguration.php
[13:45:22] (CR) BBlack: [C: 2 V: 2] Remove more noise. [operations/puppet] - https://gerrit.wikimedia.org/r/134984 (owner: Dr0ptp4kt)
[13:45:34] (PS2) BBlack: zero update - all langs for 416-03 [operations/puppet] - https://gerrit.wikimedia.org/r/135691 (owner: Yurik)
[13:46:03] (CR) BBlack: [C: 2 V: 2] zero update - all langs for 416-03 [operations/puppet] - https://gerrit.wikimedia.org/r/135691 (owner: Yurik)
[13:49:49] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:03] Reedy: do you mean that I should use groupOverrides and then the + as we do for "private" "closed" etc.? https://github.com/wikimedia/mediawiki-core/blob/master/includes/SiteConfiguration.php#L77
[13:50:22] (PS1) Reedy: ExtensionDistributor messages are now in json! [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135766
[13:50:39] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54220 bytes in 2.369 second response time
[13:51:01] (CR) Reedy: [C: 2] ExtensionDistributor messages are now in json! [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135766 (owner: Reedy)
[13:51:09] (Merged) jenkins-bot: ExtensionDistributor messages are now in json! [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135766 (owner: Reedy)
[13:52:18] Or that I should use that to test locally? :)
[13:52:29] !log reedy synchronized wmf-config/CommonSettings.php
[13:52:33] Logged the message, Master
[14:03:22] (PS1) Manybubbles: Reduce enwiki to 6 Elasticsearch shards [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/135768
[14:07:53] (CR) Giuseppe Lavagetto: icinga: replace naggen (7 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[14:08:32] (PS5) Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[14:33:29] (CR) Giuseppe Lavagetto: "nitpick but LGTM." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132429 (owner: Ori.livneh)
[15:00:12] swat!
[15:03:08] anomie: are you doing swat?
[15:03:11] aude: A last-minute addition to the SWAT today?
[15:03:19] or would it be easier if i do myself
[15:03:22] anomie: yes :)
[15:03:31] * anomie starts SWAT
[15:03:34] ok
[15:04:16] * anomie forgot to start IRC this morning until he went to start SWAT
[15:04:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC
[15:05:44] (CR) Giuseppe Lavagetto: [C: -1] "One issue, easy resolution though." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/134642 (owner: Ori.livneh)
[15:07:48] ottomata: so, I'll miss SoS tonight
[15:07:53] meeting conflict
[15:09:22] !log anomie synchronized php-1.24wmf6/extensions/Wikidata 'SWAT: Fix issue with Wikidata rollback [[gerrit:135767]]'
[15:09:25] aude: ^ Please test
[15:09:25] yay
[15:09:27] Logged the message, Master
[15:09:33] looks good
[15:09:41] thanks
[15:09:45] * anomie is done with SWAT
[15:09:57] :)
[15:13:29] PROBLEM - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC
[15:19:12] (PS9) Giuseppe Lavagetto: dissolve mediawiki::config::* [operations/puppet] - https://gerrit.wikimedia.org/r/134642 (owner: Ori.livneh)
[15:23:54] ah ok paravoid
[15:24:08] anything I shoudl know to relay?
[15:25:58] (PS6) Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[15:27:30] (CR) Giuseppe Lavagetto: [C: 2] jobrunners: set nice to 19, not 20 [operations/puppet] - https://gerrit.wikimedia.org/r/134644 (owner: Ori.livneh)
[15:30:41] (PS6) Nemo bis: Gather all soft-disabled uploads wikis in one config item [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134400
[15:32:10] (CR) Nemo bis: Gather all soft-disabled uploads wikis in one config item (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/134400 (owner: Nemo bis)
[15:33:57] _joe_: here?
[15:34:19] <_joe_> paravoid: yes
[15:34:31] I think you also need to order by param_name
[15:34:37] or else the order could change for no reason
[15:34:48] besides that, why use ||| and @@@ and not \t and \n respectively? )
[15:34:51] <_joe_> the order inside a resource, yes
[15:34:59] a :)
[15:35:07] grr broken kb
[15:35:24] <_joe_> In the first version I used \t and \n in fact :P
[15:35:38] <_joe_> I don't know why I changed it
[15:35:40] hehe
[15:36:02] <_joe_> maybe I wanted to visualize the whole thing in 1 row with mysql?
[15:36:48] <_joe_> well, writing code while people unpack and mount furniture around you can make you forget things :(
[15:39:43] so.... graphite?
[15:40:00] <^d> ya graphite's still screwed up on mw metrics.
[15:40:36] * mark packed and mounted his own stuff for sanity ;-)
[15:41:19] <_joe_> mark: it really wasn't possible, I would have died trying. We moved to an empty flat
[15:45:25] any bugs at the moment where users are using the E-Mail this user function but the e-mails are disappearing ?
[15:46:27] NotASpy: not that I know of
[15:47:10] OK, I've just helped an editor who has been trying to e-mail some other editors and nothing he does results in e-mails being sent (or arriving, depending on perspective)
[15:47:40] NotASpy: can you reproduce it yourself?
[15:47:45] wanna try emailing me?
[15:48:11] sadly not. I've tried it with the user, he can receive my e-mails, and I've had others e-mail me, and everything works as it should.
[15:48:15] (CR) Ori.livneh: [C: 1] dissolve mediawiki::config::* [operations/puppet] - https://gerrit.wikimedia.org/r/134642 (owner: Ori.livneh)
[15:48:22] NotASpy: odd
[15:49:08] he confirmed, with a screenshot, that the user interface thinks he's sent an e-mail https://en.wikipedia.org/wiki/File:Email_003_2014.png
[15:51:07] (CR) Springle: icinga: replace naggen (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[15:51:32] NotASpy: yeah, just confirmed it worked here with another person
[15:51:35] on enwiki
[15:51:43] not to figure out what is special about that user
[15:52:01] s/not/now/
[15:52:29] https://en.wikipedia.org/wiki/User:Pratyya_Ghosh is the user in question.
[15:54:10] (PS7) Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - https://gerrit.wikimedia.org/r/135746
[15:56:59] <_joe_> springle: PRRR
[15:57:07] <_joe_> (for the two-queries comment :P)
[15:57:14] lol
[15:57:28] NotASpy: sooo, you've had them look in their spam folder right?
[15:58:11] yeah, he e-mailed me too, and the e-mail has vanished into the ether.
[15:58:17] yes, I was suggesting a single-query solution indeed
[15:59:28] odd, so when Pratyya emails anyone via the web interface it doesn't appear anywhere?
[15:59:41] <_joe_> paravoid: using CONCAT on the results of a subquery?
[15:59:57] no
[16:00:02] just do the concat in python
[16:00:16] <^d> ori: About? Graphite's still fubar'd for mw stats.
[16:00:30] you have rows that are , etc.
[16:00:31] greg-g: as far as I can tell, yeah. The e-mails just vanish.
[16:00:48] <_joe_> how can you fetch all param_names and param_values with a single query?
[16:00:50] ^d: i know, but i can't troubleshoot it. chasemp, were you looking into that? should i file an rt?
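One way to read the "just do the concat in python" suggestion above is to fetch flat rows, already ordered by resource, and group adjacent rows in Python. The sketch below is an assumption about what that could look like, not code from the review: the row shape (title, param_name, param_value) and the function name are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Yield (title, {param_name: param_value}) for each run of rows sharing a title.

    Assumes rows are (title, param_name, param_value) tuples already sorted by
    title, for example via ORDER BY in the query; groupby only merges rows that
    are adjacent.
    """
    for title, group in groupby(rows, key=itemgetter(0)):
        yield title, {name: value for _, name, value in group}


if __name__ == "__main__":
    sample = [
        ("db1 raid", "host_name", "db1"),
        ("web1 ssh", "check_command", "check_ssh"),
        ("web1 ssh", "host_name", "web1"),
    ]
    for title, params in group_rows(sample):
        print(title, params)
```

Grouping this way needs no separator strings, so parameter values containing arbitrary characters cannot be split in the wrong place.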
[16:00:56] <_joe_> without concatting in mysql, i mean
[16:00:57] NotASpy: that makes no sense
[16:01:06] <_joe_> paravoid: I'm missing something here
[16:01:14] loop on the rows, keep "host" in a variable, append to an array, yield the array when "host" changes
[16:01:19] <_joe_> paravoid: oh, collecting data
[16:01:29] NotASpy: not saying you're wrong, just, it's weeeeird
[16:01:39] greg-g: he's never been blocked or anything either, I did wonder if it was something odd like he had been blocked and had e-mail disabled, but he has a clean block-log.
[16:01:42] <_joe_> yeah I wanted to avoid doing that. Originally, I wanted to get the content of the define directly from mysql
[16:01:52] <_joe_> I used python just to pretty-print it
[16:02:04] you're doing split() on arbitrary separators now though
[16:02:05] <_joe_> we could get rid of that too
[16:02:07] I'm not sure if that's cleaner )
[16:02:10] :)
[16:02:15] dammit, broken ;:
[16:02:56] <_joe_> paravoid: the only reason for splitting in python is I want to reformat results so that lines are prettily aligned in the output file
[16:03:38] <_joe_> springle: is there a way to do something like sprintf(%-30s, value) in SQL?
[16:03:44] anyway, I don't think it's worth to bikeshed over )
[16:04:03] it works, it's fine )
[16:04:05] grrrr
[16:04:09] NotASpy: can you report a bug for me, with the salient details, cc me (greg@wikimedia.org)
[16:04:09] <_joe_> paravoid: you need a new keyboard
[16:04:12] yes
[16:04:35] _joe_: no, don't try to emulate sprintf. you will go mad
[16:04:35] <_joe_> I love my das keyboard
[16:04:41] greg-g: will do.
[16:04:49] <_joe_> springle: ok so I did remember correctly :P
[16:05:23] :)
[16:05:39] ok, paper inside the keyboard worked :P
[16:05:45] <_joe_> ok, tomorrow morning I will merge this. now I already have ori's mediawiki change to merge
[16:06:42] (PS10) Giuseppe Lavagetto: dissolve mediawiki::config::* [operations/puppet] - https://gerrit.wikimedia.org/r/134642 (owner: Ori.livneh)
[16:06:50] NotASpy: thanks much
[16:06:52] (CR) Giuseppe Lavagetto: [C: 2] dissolve mediawiki::config::* [operations/puppet] - https://gerrit.wikimedia.org/r/134642 (owner: Ori.livneh)
[16:07:09] \o/
[16:07:25] you can hear a big chunk of the glacier breaking apart and floating off into the ocean
[16:08:46] <_joe_> waiting for jenkins
[16:11:28] lol
[16:13:07] <_joe_> zuul is ~ down again?
[16:14:14] <_joe_> ok nevermind, only slow
[16:14:52] greg-g: can I add the user in question to the bug report, or would that be bad for privacy reasons ?
[16:15:11] <_joe_> Krinkle: you around?
[16:15:55] NotASpy: only if they are ok with it.
[16:16:48] hmm. This is what they told be before they logged off [16:26:14] Pratyya Okay I'm going now, and do whatever you need to do in order to fix the problem. Thank you.
[16:20:03] _joe_: I am
[16:21:40] 11:39 < greg-g> so.... graphite?
[16:21:40] 11:40 < ^d> ya graphite's still screwed up on mw metrics.
[16:21:55] <_joe_> greg-g: ok I'll take a look at that
[16:22:02] <_joe_> last time, it was mwprof stuck
[16:22:55] _joe_: thanks! sorry for the lazy re-irc ping method
[16:23:03] <_joe_> greg-g: np
[16:23:14] <_joe_> greg-g: I honestly hoped someone else had time for this
[16:25:25] _joe_: :) no idea who would be best to ping about it at this point, hoenstly
[16:25:52] (CR) Ori.livneh: icinga: replace naggen (10 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/135746 (owner: Giuseppe Lavagetto)
[16:26:15] chasemp was looking yesterday
[16:26:38] <_joe_> we had mwprof stuck the other day
[16:26:51] <_joe_> profiler-to-carbon got probably stuck as a consequence
[16:27:08] <_joe_> ori: profiler-to-carbon has no log whatsoever?
[16:27:25] in /var/log/upstart perhaps?
[16:27:39] i didn't write it :P [16:27:55] <_joe_> I looked at the upstart job and has no 'console log' line [16:28:23] it's the default [16:28:34] if it happened twice recently we should probably disable the cgi script [16:28:53] <_joe_> !restarted mwprof/profiler-to-carbon [16:29:00] <_joe_> !log restarted mwprof/profiler-to-carbon [16:29:05] Logged the message, Master [16:29:12] imo restart the whole thing [16:29:13] <_joe_> ori: well, we could replace mwprof, maybe [16:29:15] mwprofctl restart [16:29:26] <_joe_> ori: I used the upstart job [16:29:52] but that doesn't restart the daemon, only the profiler-to-carbon client [16:30:35] what happened to jenkins? [16:31:12] https://integration.wikimedia.org/zuul/ doesn't look too good [16:31:17] oh hey, it's hashar [16:31:19] ^^ [16:31:22] !log Jenkins / Zuul locked. Looking into it [16:31:26] known issue of doom :( [16:31:27] Logged the message, Master [16:31:33] KIoD [16:31:46] * bd808 was just about to ask about zuul [16:32:18] <_joe_> bd808: I don't really know anything about Zuul [16:32:19] <_joe_> sorry [16:33:03] _joe_: Sheesh. What Have you been learning for all these weeks? ;) [16:33:09] <_joe_> bd808: but I can check [16:33:17] I think hashar is on it [16:33:22] yeah I am on it [16:33:24] it is resuming [16:33:25] <_joe_> bd808: I already tried to troubleshoot zuul in the past [16:33:30] <_joe_> hashar: oh you're here :) [16:33:46] for some reasons workers ends up not being registered anymore in Gearman :-( [16:33:51] * hashar blames python [16:34:16] hashar: do you get texted when zuul/jenkins throw fits? [16:34:26] not in that specific case [16:34:27] <_joe_> ^d: I do see metrics collected again [16:34:35] <_joe_> hashar: rewrite zuul in php [16:34:38] I have yet to reproduce the issue / figure out how to monitor it [16:34:39] oh cool [16:34:49] <^d> _joe_: I do too. Thanks much! 
[16:35:02] <_joe_> ^d: he did not do much [16:35:11] <_joe_> s/he/eh/ [16:35:46] <_joe_> just restarted the service, I should inspect it and fix it once and for all [16:36:03] <_joe_> or better, improve mwprof so it does not die so often [16:36:18] i don't think mwprof is the issue, tho? [16:36:23] <_joe_> or better again, stop reinventing the wheel and use a standard profile data collector :) [16:36:26] <_joe_> ori: it was [16:36:51] oh, shoot. i'd like to debug that next time it happens then. [16:36:55] <_joe_> ori: both the cgi app and the graphite feeder got stuck at the same moment [16:37:08] <_joe_> ori: no *now* mwprof was working fine [16:37:35] andrewbogott: overheard on #wikimedia-dev: ^d: Can you help diagnose my gerrit account, or redirect me? There are two annoying things: I've lost +2, and I have two accounts [16:37:49] <_joe_> I concluded that since both downstream services got stuck, the reason is upstream :) [16:37:53] ori: thanks, I'll tune in [16:38:07] ^d: pinged andrew because i think it might be the labs/prod uid consolidation [16:38:45] greg-g: https://bugzilla.wikimedia.org/show_bug.cgi?id=65860 incase you've not had your cc notification. [16:39:25] NotASpy: thank you kindly [16:43:55] <_joe_> hashar: I still have jobs that have not completed [16:44:06] _joe_: yeah it is is processing them [16:44:12] <_joe_> ok thanks [16:44:18] _joe_: we had a bunch of browsertests running in between which slowed things down [16:44:26] _joe_: I need to add a few more instances Iguess [16:44:45] we only have 10 executions slots right now [16:44:52] but I am out of quota on labs ;-( [16:47:39] !log Jenkins/Zuul back. Jobs meant to be run on labs instances ended up not being registered anymore with the Zuul Gearman server. 
That must be a bug in the Jenkins Gearman plugin :-( {{bug|63760}} [16:47:44] Logged the message, Master [16:48:57] greg-g, as part of my depl window in 10 min, i would like to cherrypick https://gerrit.wikimedia.org/r/#/c/132952/ into prod [16:49:39] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] [16:49:55] I thinkJenkins is restored now. It is busy processing the backlog [16:50:51] yurik: uhhh, no? [16:51:01] _joe_: i think it lost the change [16:51:04] the config one i mean [16:51:07] yurik: that isn't even merged yet, thus hasn't spent any time on beta cluster, thus hasn't really been tested [16:51:11] yurik: what's the hurry? [16:51:26] why skip all testing? [16:51:36] <_joe_> ori: ok, I was busy answering you on the naggen one [16:51:53] if you remove jenkins as a reviewer and then re-+2 it'll pick it up [16:52:34] greg-g, lots of annoyed partners - when they do testing, they tend to be logged in already. But you are right, it could wait a bit. [16:52:35] <_joe_> yeah I know [16:52:50] ACKNOWLEDGEMENT - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC cpettet waiting for analytics admin cleanup [16:53:32] ACKNOWLEDGEMENT - Puppet freshness on gadolinium is CRITICAL: Last successful Puppet run was Tue 27 May 2014 09:07:41 PM UTC cpettet waiting on admin cleaup [16:54:58] yurik: thanks man, sorry, it just is bad hygeine and I don't want others to think it's ok :) [16:55:29] greg-g, no worries ,that's why we have you - to be our conscience [16:56:54] yurik, tsk tsk! 
[16:57:30] * yurik hides in shame [16:58:57] :) [16:59:26] _joe_: jenkins +2'd [16:59:43] oh, ori there are two patches waiting for you for vagrant ;) [16:59:54] <_joe_> ori: ahah yeah now you're in the joe's wait queue [16:59:55] <_joe_> :P [16:59:56] i'll review [17:00:02] (03PS1) 10Rush: admin yaml for remaining lvs hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135788 [17:00:04] (03PS1) 10Dzahn: admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 [17:00:10] .!log restarted _joe_, stuck [17:00:29] wow, that is a nice bot! thx whoever made it :) [17:01:17] could we have that deployments link in the topic please? [17:01:30] _joe_: back in 20 [17:01:30] (03PS8) 10Giuseppe Lavagetto: icinga: replace naggen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135746 [17:01:35] (03PS3) 10Hoo man: Remove misc::deployment::scap_scripts from terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 [17:01:39] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0] [17:01:45] can you edit joe config only with joe? [17:01:50] yurik: mwalker|away is the bestest. [17:01:51] haha [17:02:03] <_joe_> ori: I'm merging now [17:02:23] <_joe_> yeah I did this on purpose [17:02:25] <_joe_> :P [17:02:26] yurik: that's a good idea re deploys in topic [17:02:32] opsen disagree? [17:02:53] (03CR) 10Giuseppe Lavagetto: icinga: replace naggen (0310 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135746 (owner: 10Giuseppe Lavagetto) [17:03:09] (03CR) 10Rush: [C: 031] admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 (owner: 10Dzahn) [17:03:10] we really ought to have our own shortening service... 
i know zero could really use it for stuff [17:03:26] yurik: in the works, or something [17:03:35] <_joe_> yurik: an url shortener is not a very simple service to provide [17:03:54] <_joe_> and it will get us all the kind of headaches if it's somewhat open to the public [17:03:54] true :( [17:04:05] yeah, ur1.ca is blocked from flickr due to spam [17:04:10] or was at least [17:04:20] zero could use it for things like wiki over SMS - we can have short links in there [17:04:33] <_joe_> greg-g: if we do have a closed internal url shortener [17:04:50] (03CR) 10Dzahn: [C: 031] admin yaml for remaining lvs hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135788 (owner: 10Rush) [17:04:56] <_joe_> that is publicly usable but not editable [17:05:04] it should be public but links must only point to our own domains [17:05:11] (03PS2) 10Dzahn: admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 [17:05:49] (03PS2) 10Rush: admin yaml for remaining lvs hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135788 [17:05:55] (03CR) 10Rush: [C: 032 V: 032] admin yaml for remaining lvs hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135788 (owner: 10Rush) [17:06:02] (03CR) 10Dzahn: [C: 032] admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 (owner: 10Dzahn) [17:06:11] (03PS3) 10Dzahn: admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 [17:06:24] with some weird structure like /p11111 -- page title, or /r44444 for revision (permanent links) [17:06:36] yurik, _joe_: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener [17:06:48] thx bd808 [17:07:31] (03CR) 10Dzahn: [C: 032] admin yaml for neon (icinga) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135789 (owner: 10Dzahn) [17:08:23] <_joe_> ori: merged and running on the videoscalers [17:08:38] (03PS1) 10Rush: magnesium admin yaml [operations/puppet] - 
10https://gerrit.wikimedia.org/r/135792 [17:08:46] (03PS2) 10Rush: magnesium admin yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135792 [17:09:04] (03CR) 10Rush: [C: 032 V: 032] "just gop" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135792 (owner: 10Rush) [17:10:48] (03PS1) 10Dzahn: admin yaml for nickel [operations/puppet] - 10https://gerrit.wikimedia.org/r/135794 [17:11:33] (03PS1) 10Dzahn: admin yaml for nitrogen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135795 [17:11:36] (03CR) 10Rush: [C: 031] admin yaml for nickel [operations/puppet] - 10https://gerrit.wikimedia.org/r/135794 (owner: 10Dzahn) [17:11:39] jenkins is fully backup so I am disconnecting again [17:12:05] hashar: g'night! [17:12:09] (03PS2) 10Dzahn: admin yaml for nickel [operations/puppet] - 10https://gerrit.wikimedia.org/r/135794 [17:12:31] (03CR) 10Dzahn: [C: 032] admin yaml for nickel [operations/puppet] - 10https://gerrit.wikimedia.org/r/135794 (owner: 10Dzahn) [17:13:00] (03PS2) 10Dzahn: admin yaml for nitrogen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135795 [17:13:29] RECOVERY - NTP on dysprosium is OK: NTP OK: Offset -0.005007743835 secs [17:15:20] (03CR) 10Dzahn: [C: 032] admin yaml for nitrogen [operations/puppet] - 10https://gerrit.wikimedia.org/r/135795 (owner: 10Dzahn) [17:16:25] (03PS1) 10Rush: admin yaml for mchenry [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 [17:17:14] _joe_: back [17:17:30] <_joe_> ori: it seems we're ok [17:17:35] (03CR) 10Dzahn: "warning, this is old, there are all kinds of puppet errors you'll see on a run already" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 (owner: 10Rush) [17:17:50] :) [17:18:52] (03PS2) 10Rush: admin yaml for mchenry/mexia/rubidium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 [17:18:56] (03CR) 10Dzahn: [C: 031] "just saying" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 (owner: 10Rush) [17:19:10] (03PS3) 10Rush: admin yaml 
for mchenry/mexia/rubidium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 [17:19:20] (03CR) 10Ori.livneh: "paravoid: I added rsyslog::logged_daemon to meet the use-case you articulated but did not introduce usage. In the case of ocg and ganglia " [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [17:20:24] (03PS1) 10Dzahn: admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 [17:20:28] (03CR) 10Rush: [C: 032 V: 032] "here we go..." [operations/puppet] - 10https://gerrit.wikimedia.org/r/135798 (owner: 10Rush) [17:21:01] (03PS2) 10Rush: admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 (owner: 10Dzahn) [17:21:08] (03CR) 10Rush: [C: 031] admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 (owner: 10Dzahn) [17:21:10] (03CR) 10Dzahn: [C: 032] admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 (owner: 10Dzahn) [17:22:00] <_joe_> please run your changes through the compiler guys :) so that you get before merging if it fails [17:22:15] <_joe_> If you need assistance with using it, ping me later [17:22:21] <_joe_> or ori can show you maybe [17:23:04] <_joe_> I'm leaving now, not sure if I'll make it here tonight. [17:23:37] bblack thx for reviewing that patch [17:24:11] happy to show it, but who are you talking to _joe_? [17:25:06] if you mean clicking rebuild in jenkins like last time.. ok.. i know how to [17:25:17] just meeting now [17:28:31] (03PS1) 10Rush: commenting out admin logic on mchenry. Accounts are added fine, but it creates a lot of old puppet noise. If we added new opsen we could uncomment and run but since that's rare this seems reasonable. [operations/puppet] - 10https://gerrit.wikimedia.org/r/135803 [17:28:57] (03CR) 10Rush: [C: 032 V: 032] commenting out admin logic on mchenry. Accounts are added fine, but it creates a lot of old puppet noise. 
If we added new opsen we could u [operations/puppet] - 10https://gerrit.wikimedia.org/r/135803 (owner: 10Rush) [17:37:36] !log yurik synchronized php-1.24wmf5/extensions/ZeroRatedMobileAccess/ [17:37:41] Logged the message, Master [17:39:53] greg-g, I would like to give jouncebot the ability to change the topic to reflect a current deployment -- that means it needs +o; is that something you can give (or want to)? [17:40:14] mwalker: I have no authority here, contrary to popular belief ;) [17:40:34] robh: ^^ re topic changing [17:41:01] !log yurik synchronized php-1.24wmf6/extensions/ZeroRatedMobileAccess/ [17:41:04] Logged the message, Master [17:43:27] dr0ptp4kt: np [17:43:30] (03PS3) 10Dzahn: admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 [17:43:30] uhh, mwalker would it append or just replace entirely? [17:43:42] because we keep who is on RT duty there [17:43:49] it would ammend [17:43:53] *sorry; append [17:44:10] cool, i think its cool with me, and i dont mind doing it, and i dont wanna wait 3 days [17:44:21] but can you put in an access request ticket that I'll immediately take control on [17:44:26] and resolve and give the bot the rights? [17:44:39] just so we have an RT record in the proper place for when folks ask why later =] [17:44:50] (alternatively just email ops-requests with it and gimme the rt# ;) [17:44:56] and i move its queue and handle it [17:45:15] (03CR) 10Dzahn: [C: 032] admin yaml for hafnium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135801 (owner: 10Dzahn) [17:45:22] I'm trying to ensure we document everything since now our team is huge =P [17:45:43] mwalker: are you sure you need that right to change the topic? 
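A minimal sketch of the append-rather-than-replace behaviour discussed above: the deployment note is added as one extra segment of the topic and later removed, leaving every other segment (such as the RT duty note) untouched. The `|` separator, the `Deploying:` marker and the function names are assumptions for illustration; this is not jouncebot's actual code.

```python
SEPARATOR = " | "            # assumed separator between topic segments
DEPLOY_MARKER = "Deploying:"  # hypothetical marker for the bot's own segment


def _segments(topic):
    """Split a topic into its non-empty, stripped segments."""
    return [s.strip() for s in topic.split("|") if s.strip()]


def clear_deploy_note(topic):
    """Return the topic without any deployment segment."""
    kept = [s for s in _segments(topic) if not s.startswith(DEPLOY_MARKER)]
    return SEPARATOR.join(kept)


def set_deploy_note(topic, note):
    """Return the topic with exactly one deployment segment appended."""
    kept = _segments(clear_deploy_note(topic))
    kept.append("%s %s" % (DEPLOY_MARKER, note))
    return SEPARATOR.join(kept)
```

Removing any previous deployment segment before appending means repeated updates never stack up, and clearing the note restores the topic that was there before.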
[17:46:01] i thought anyone could change it in here [17:46:08] if channel mode is -o anyone could change topic [17:46:14] wiki style ?:p [17:46:23] sorry, -t [17:46:24] mwalker: so yea [17:46:26] just do it =] [17:46:34] anyone can, we assume good faith in topic setting [17:46:45] and having the bot put deployment info in during deployments sounds awesome to me. [17:46:45] preserving the RT duty bit will be hardest part of it [17:46:57] make jouncebot say who is on duty :p [17:46:58] indeed, as long as we preserve the rt duty bit [17:47:18] mutante: every 15 minutes "Hey $RT_duty person, have you checked the queue lately?" [17:47:38] haha:) [17:48:00] and make it pick random unmerged gerrit changes and beg for reviews [17:48:07] greg-g, btw, zero is done, so if anyone wants to go ahead [17:48:26] oh; cool [17:48:30] yurik: awesome, thanks sir [17:48:51] bd808: fire at will [17:52:12] ori: so is there more to what _joe_ asked for than hitting rebuild a? [17:52:53] triggered a new build #46 and looks OK [17:53:35] (03PS1) 10Dzahn: admin yaml for zinc (solr::ttm) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135806 [17:55:26] (03CR) 10Dzahn: [C: 032] "prototyping box" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135806 (owner: 10Dzahn) [17:55:42] (03PS2) 10Dzahn: admin yaml for zinc (solr::ttm) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135806 [17:55:58] (03CR) 10Rush: [C: 031] admin yaml for zinc (solr::ttm) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135806 (owner: 10Dzahn) [17:56:20] (03CR) 10Dzahn: [C: 032] admin yaml for zinc (solr::ttm) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135806 (owner: 10Dzahn) [17:57:35] (03PS1) 10Rush: admin yaml mobile100[1-4]\.wikimedia\.org/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135808 [17:57:37] (03CR) 10jenkins-bot: [V: 04-1] admin yaml mobile100[1-4]\.wikimedia\.org/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135808 (owner: 10Rush) [17:58:32] 
greg-g: Updating scap now [17:59:05] <^d> !log all job runners halted at 17:39? graphite shows no jobs being run, runJobs on fluorine also has nothing since the timestamp. [17:59:09] Logged the message, Master [18:00:14] yurik: see ^d log [18:00:30] frack. [18:00:37] yurik: what did your change include? not saying it's you, but timing is bad [18:01:10] http://gdash.wikimedia.org/dashboards/jobq [18:01:23] greg-g, i deployed zero only [18:01:28] (03PS1) 10Rush: admin yaml for ms* hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135809 [18:01:33] (03PS2) 10Rush: admin yaml mobile100[1-4]\.wikimedia\.org/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/135808 [18:01:46] (03CR) 10Rush: [C: 032 V: 032] "simple change" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135808 (owner: 10Rush) [18:01:57] (03PS1) 10Dzahn: admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 [18:02:27] Fatal error: Class 'Memcached' not found in /usr/local/apache/common-local/php-1.24wmf5/includes/objectcache/MemcachedPeclBagOStuff.php on line 60 [18:02:47] greg-g, what time did it start? [18:02:47] (03CR) 10Dzahn: [C: 031] admin yaml for ms* hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135809 (owner: 10Rush) [18:03:21] greg-g: Bug 65549 has the permissions in /srv/deployment/scap/scap/.git messed up again. I won't be able to update scap without help from a root to fix permissions on tin. [18:03:24] (03PS2) 10Rush: admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 (owner: 10Dzahn) [18:03:26] yurik: not positive http://gdash.wikimedia.org/dashboards/jobq [18:03:28] (03CR) 10Rush: [C: 031] admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 (owner: 10Dzahn) [18:03:33] bd808|deploy: effing [18:03:54] looks like memcached is not in php [18:03:58] <^d> AaronSchulz: That's suspicious. 
[18:03:59] * AaronSchulz wtfs [18:04:07] (03PS2) 10Rush: admin yaml for ms* hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135809 [18:04:09] php -m :/ [18:04:13] (03CR) 10Rush: [C: 032 V: 032] admin yaml for ms* hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/135809 (owner: 10Rush) [18:04:49] bd808|deploy: I IRL pinged the roots near me [18:05:02] * greg-g is at office today [18:05:53] rob-h's on it [18:06:09] !log Restarted logstash on logstash1001; log event volume suspiciously low for the last ~35 minutes [18:06:11] glad it just seems to be boxes for cli scripts ;) [18:06:14] Logged the message, Master [18:06:34] <^d> AaronSchulz: Why did memcached disappear from php all of a sudden? [18:06:59] bd808|deploy: uhh, /srv/deployment/scap/scap/.git is trebuchet wikidev owned [18:07:36] robh: Subdirectories under .git/objects (maybe more) are missing the g+w bit [18:07:49] robh: drwxr-sr-x 2 root wikidev 4096 May 7 22:49 de/ [18:07:50] what the hell, why is everything going down suspiciously right now [18:08:30] robh: chasemp: so mobile1004 and mobile1005 are in racktables because of an open RT ticket to reclaim them [18:08:35] https://rt.wikimedia.org/Ticket/Display.html?id=6350 [18:08:37] actually all boxes, but not apache module php I guess [18:08:59] still gotta figure out 1001-1003 [18:09:26] <^d> AaronSchulz: looking at some of the recent apache puppet changes maybe? [18:09:52] bd808|deploy: ok, try now pls [18:09:54] possibly related, not sure what else is relevant [18:09:57] #5847: remove label from wmf3407-3409 / mobile1001-1003 [18:10:04] i added group write to everythign in there [18:10:07] and here's the other one, that's resolved [18:10:08] (well, i think i did) [18:10:14] robh: Tahk you. git fetch worked [18:10:24] *Thank you even [18:10:40] I need to work on that Trebuchet bug :( [18:10:48] on the deployment train metaphor [18:10:54] i just tighten a couple bolts every other run [18:11:02] _joe_: are you still around? 
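A minimal sketch for spotting the problem described above, directories under a shared git checkout that lack the group-write bit (as in `drwxr-sr-x`). The function name is hypothetical; it only reports and does not change any permissions.

```python
import os
import stat


def dirs_missing_group_write(root):
    """Return sorted paths of directories under root (inclusive) without g+w."""
    missing = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        # S_IWGRP is the group-write permission bit.
        if not mode & stat.S_IWGRP:
            missing.append(dirpath)
    return sorted(missing)
```

For the case in the conversation this would be called as `dirs_missing_group_write("/srv/deployment/scap/scap/.git")`.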
[18:11:10] i wonder if the change is implicated [18:12:17] AaronSchulz: `dsh -g mediawiki-installation -M -F 80 -- 'php -m|grep memcache'` says that tin, mw1151, snapshot100[1-4] and searchidx1001 are the only nodes with memcached installed in php [18:12:24] greg-g, i just checked - the update's live portion is just https://gerrit.wikimedia.org/r/#/c/135599/ -- see git diff c89fb1..df3da [18:12:36] (03PS1) 10Rush: admin yaml for swift boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135812 [18:12:51] it could be the removal of the igbinary module [18:12:55] everything else is i18n (which was not scaped), and maintenance stuff [18:13:34] that's the only change resulting from the big puppet refactor [18:13:47] yurik: thanks for looking man, I think others are starting to narrow it down to something else [18:13:50] bd808|deploy: apaches don't have memcached [18:13:52] they have twemproxy [18:14:16] ori: So they shouldn't have the php module to talk the memcached protocol? [18:14:31] they still need it [18:14:57] (03PS1) 10Dzahn: remove mobile1001-1004 from site.pp - unused [operations/puppet] - 10https://gerrit.wikimedia.org/r/135813 [18:14:57] the memcached module wasn't removed [18:15:08] i'll restore igbinary, in case it's related [18:15:23] igbinary isn't needed though [18:15:35] cli php just needs memcached back [18:16:17] but on mw1001 (for example) i see: cli/conf.d/memcached.ini:extension=memcached.so [18:16:50] (03PS2) 10Dzahn: remove mobile1001-1004 from site.pp - unused [operations/puppet] - 10https://gerrit.wikimedia.org/r/135813 [18:16:59] I see the ini file on mw1010 as well, but `php -m` doesn't have the extension loaded [18:17:47] maybe it depends on igbinary in some way [18:17:55] when did this start? 
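A minimal sketch of the check being done by hand above: parse the output of `php -m` and report which required extensions are not actually loaded, which is what separates "package installed and ini file present" from "extension loaded". The function names are hypothetical, and the sketch assumes the section headers that `php -m` prints (`[PHP Modules]`, `[Zend Modules]`).

```python
import subprocess


def parse_php_modules(output):
    """Return the lower-cased module names listed in `php -m` output."""
    modules = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if not line or line.startswith("["):
            continue
        modules.add(line.lower())
    return modules


def missing_extensions(required, output=None):
    """Return the names in `required` that `php -m` does not list."""
    if output is None:
        # Runs the local CLI binary; assumes `php` is on PATH.
        output = subprocess.check_output(["php", "-m"], universal_newlines=True)
    loaded = parse_php_modules(output)
    return sorted(name for name in required if name.lower() not in loaded)
```

Passing captured output explicitly makes it possible to check text collected from many hosts (for example via the `dsh` command quoted at 18:12:17) without running PHP locally.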
[18:18:12] <^d> Last logged job was at 17:39 [18:18:40] (03CR) 10Dzahn: [C: 032] "Host mobile1001.wikimedia.org not found: 3(NXDOMAIN)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135813 (owner: 10Dzahn) [18:18:41] ori: are we passing --enable-memcached-igbinary when configuring it? [18:19:48] (03PS1) 10Rush: admin yaml for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135818 [18:19:56] Is this only happening on php-1.24wmf5? [18:20:14] <^d> It's not version-dependent. [18:20:23] (03PS1) 10Ori.livneh: restore /etc/php5/conf.digbinary.ini on app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135819 [18:20:24] <^d> It's broken on both. [18:20:33] mutante: can i pull you in to review this? ^ [18:20:57] (03CR) 10Chad: [C: 031] restore /etc/php5/conf.digbinary.ini on app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135819 (owner: 10Ori.livneh) [18:21:11] (03PS2) 10Chad: restore /etc/php5/conf.d/igbinary.ini on app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135819 (owner: 10Ori.livneh) [18:21:21] <^d> Fixed the commit summary, otherwise lgtm. [18:21:41] mw1010 has both php5-memcached and php5-igbinary installed according to `dpkg -l` [18:21:41] (03CR) 10preilly: [C: 031] restore /etc/php5/conf.d/igbinary.ini on app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135819 (owner: 10Ori.livneh) [18:22:02] bd808|deploy: yes, but it's not loaded [18:22:25] ori: easy enough.. if it fixes and since you got the +1s.. doing [18:22:42] ori: can’t you just modify one host without puppet to confirm the dependancy? 
[18:22:54] no [18:23:11] (03CR) 10Dzahn: [C: 032] restore /etc/php5/conf.d/igbinary.ini on app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135819 (owner: 10Ori.livneh) [18:23:17] Okay I don’t see why not but okay [18:23:40] but now that https://gerrit.wikimedia.org/r/#/c/135819/ is merged it won’t matter ;-) [18:24:34] !log Scap updated to fd7e538; Trebuchet fetch and checkout failed for mw1053.eqiad.wmnet [18:24:39] Logged the message, Master [18:25:59] @AaronSchulz did you ever find out if, “--enable-memcached-igbinary” is being used? [18:26:41] it is [18:26:42] ori: ]/Mediawiki::Php/File[/etc/php5/conf.d/igbinary.ini]/ensure: defined content ... [18:27:15] (03PS5) 10JanZerebecki: Improve nginx TLS/SSL settings. [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) [18:27:18] so is Memcached::OPT_SERIALIZER Memcached::SERIALIZER_IGBINARY right now? [18:27:24] no [18:27:34] okay so it’s Memcached::SERIALIZER_PHP [18:27:34] <^d> !log jobrunners back up now, should slowly catch back up [18:27:39] Logged the message, Master [18:27:53] everything sorted out? [18:27:59] sorry, I took a break [18:28:00] * AaronSchulz didn't see any stuff in puppet about --enable [18:28:00] yes [18:28:33] paravoid: no more breaks! [18:28:41] ori: nice work! [18:28:49] preilly: well, i broke it [18:28:50] * preilly thinks that ori is the man [18:28:57] !log running puppet on jobrunners [18:29:02] Logged the message, Master [18:29:59] mutante: thanks very much [18:38:39] (03CR) 10Dzahn: "they did not have puppet certs or stored configs either.. 
all good to go" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135813 (owner: 10Dzahn) [18:39:16] (03PS3) 10Dzahn: admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 [18:39:18] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 (owner: 10Dzahn) [18:39:21] yeah nice work mutante [18:41:33] (03CR) 10Ori.livneh: Add rsyslog module and port existing usage (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [18:43:32] (03PS4) 10Dzahn: admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 [18:45:22] ori: I keep getting '2014-05-28 18:44:42 Executed 92 periodic queue task(s).' on osmium but not elsewhere [18:45:36] something is wonky there...sounds like the problem on labs [18:45:37] (03PS6) 10Ori.livneh: Add rsyslog module and port existing usage [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 [18:46:20] AaronSchulz: the character encoding thing? [18:47:42] you think unserialize is failing? [18:47:59] AaronSchulz: i'm wondering what you meant by "the problem on labs" [18:48:27] bug 63681 [18:49:07] (03PS1) 10Rush: admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 [18:49:23] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 (owner: 10Rush) [18:50:23] AaronSchulz: hmm. what do they have in common? deployment-jobrunner01 isn't running hhvm [18:51:12] (03PS2) 10Rush: admin yaml for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135818 [18:51:14] (03PS2) 10Rush: admin yaml for swift boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135812 [18:51:16] (03PS2) 10Rush: admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 [18:53:10] greg-g: Scap seems to be mostly sorted out. Now I need to test. 
[18:54:12] !log bd808 Synchronized robots.txt: Testing sync-file in php (duration: 00m 05s) [18:54:17] Logged the message, Master [18:58:59] ori: var_dump() on the key gives an array on terbium and false on osmium... [19:06:17] !log mw1053 reinstalling [19:06:22] Logged the message, Master [19:08:46] !log Symlinks for mergeCdbFileUpdates, mwversionsinuse, refreshCdbJsonFiles, scap-rebuild-cdbs, scap-recompile and sync-common on tin still pointing to /srv/scap/bin instead of /srv/deployment/scap/scap/bin [19:08:50] Logged the message, Master [19:10:16] Is the mediawiki::sync puppet class not applied on tin? [19:11:04] greg-g: This seems to be dragging on. Glad I didn't wedge it right up against another deploy. [19:11:10] i would like to know a puppet command which outputs all applied classes [19:11:31] (03CR) 10Dzahn: [C: 031] admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 (owner: 10Rush) [19:11:33] ori: I don't see a nutcracker process on osmium either [19:11:38] jgage: There's a log file it writes somewhere isn't there? [19:11:45] * jgage looks [19:12:18] AaronSchulz: it doesn't have one provisioned; i couldn't add the app server roles to it because of all the spidering dependencies. [19:12:21] that info is not in puppet.log ... [19:12:33] jgage: /var/lib/puppet/state/classes.txt I think [19:12:35] AaronSchulz: i didn't think of it, i'll add it [19:12:45] I assume it's easier now with your refactoring [19:12:57] oh, wait [19:12:59] i did think of it [19:13:04] it's just that twemproxy isn't packaged for trusty [19:13:10] there's an RT and faidon did most of the work already [19:13:11] bd808: neat, thanks! [19:13:26] bd808|deploy: I haven't been watching, was in a meeting with sir Erik, should I be worried? 
:) [19:13:59] greg-g: cluster is updated, but there are issues in tin; it's pointing at old code [19:14:09] (03CR) 10Dzahn: [C: 031] admin yaml for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135818 (owner: 10Rush) [19:14:15] AaronSchulz: but yes, it's pretty close to being doable now [19:14:19] ahh, see that [19:14:49] At the moment I'm not sure why tin is messed up. I'd need root to see the files that might tell me what's wrong [19:15:09] i'm looking [19:15:13] It looks like the mediawiki::sync puppet class isn't applied [19:15:29] Or somehow failing to update the symlinks in /usr/local/bin [19:15:33] * greg-g leaves you all to it and gets lunch before his next meeting [19:15:35] (03PS5) 10Dzahn: admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 [19:16:23] I'm relatively certain it's applied because that's what is causing trebuchet to run the checkout phase there and mess up the perms on /srv/deployment/scap/scap/.git I believe [19:17:16] (03CR) 10Dzahn: [C: 032] admin yaml for strontium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135810 (owner: 10Dzahn) [19:17:41] bd808|deploy: change 0a8684f826cf7b21f0ff83a4f71c8f7dba4b5ddd replaced 'include mediawiki::sync' in modules/mediawiki/manifests/web.pp with 'include ::mediawiki', but ::mediawiki now includes mediawiki::sync, so i don't see how that could be an issue [19:20:36] bd808|deploy: mergeCdbFileUpdates isn't in that class, fwiw [19:21:00] ori: Yeah it was deleted in scap.git [19:23:37] i don't think it's applied on tin [19:23:42] but i don't think it's related to my change [19:24:18] Hmmm.. how did the symlinks get created in the first place? 
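A minimal sketch for listing the symlinks described above, entries in a bin directory whose targets still point under an old prefix. The function name is hypothetical; it compares the raw link target as stored (relative targets are not resolved) and does not modify anything.

```python
import os


def stale_symlinks(bin_dir, old_prefix):
    """Return {link_name: target} for symlinks in bin_dir pointing under old_prefix."""
    prefix = old_prefix.rstrip("/")
    stale = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.islink(path):
            continue
        target = os.readlink(path)
        # Match the prefix itself or anything below it, but not look-alikes
        # such as "/srv/scap/bin2".
        if target == prefix or target.startswith(prefix + "/"):
            stale[name] = target
    return stale
```

With the paths quoted in the log this would be `stale_symlinks("/usr/local/bin", "/srv/scap/bin")`.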
[19:24:36] Here's what needs to be done to clean it up manually -- https://gist.github.com/bd808/998b8b339b50c7a27b8b [19:24:54] But that seems like a short term fix [19:24:59] (03PS1) 10Ori.livneh: apply mediawiki::sync class on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/135833 [19:25:35] mutante: sorry, can i pull you in again? [19:26:15] <_joe_> ori: you needed me? [19:26:32] yes, see patch above [19:26:35] <_joe_> did we screw something up ? [19:26:55] <_joe_> ori: I'm on a ~ 56k connection now [19:27:06] there was an issue earlier (igbinary.ini was needed after all) but i don't think the current one is related [19:27:52] tin appears to have 'shed' the mediawiki::sync class at some point between last february and today (based on the mtimes) [19:27:54] <_joe_> ok so, I'm not really able to help debugging anything with this connection [19:28:02] Tin's motd says "tin is a Wikimedia application server (wikimedia-task-appserver)." jsut like terbium and terbium has the right symlinks [19:28:47] <_joe_> bd808|deploy: that means nothing [19:28:54] <_joe_> motds can be left around [19:29:12] <_joe_> even if puppet classes have been removed [19:29:23] terbium has role::mediawiki::maintenance [19:29:40] (03CR) 10Hashar: "I could totally use this as well. Added Rush as reviewer since he worked on the admin migration to YAML file." [operations/puppet] - 10https://gerrit.wikimedia.org/r/76678 (owner: 10Tim Starling) [19:29:50] _joe_: are you able to merge that change? [19:30:05] <_joe_> ori: which one? [19:30:12] https://gerrit.wikimedia.org/r/#/c/135833/ [19:30:38] <_joe_> I'd prefer someone in the right TZ and with a decent connection to do that [19:30:46] * ori nods. [19:30:49] do you know who's around? [19:31:00] <_joe_> no idea - just here [19:31:01] sf office is opsen-free atm [19:31:08] i'm here, in the quiet room [19:31:10] what's up? 
[19:31:22] need a merge on https://gerrit.wikimedia.org/r/#/c/135833/ to unbreak tin [19:31:28] k, sec [19:31:42] <_joe_> oh hey jgage :) [19:31:45] <_joe_> thanks man [19:31:56] hi _joe_ :) [19:32:14] (03CR) 10Gage: [C: 032] apply mediawiki::sync class on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/135833 (owner: 10Ori.livneh) [19:32:27] merging.. [19:33:07] thanks very much [19:33:12] could you force a puppet run on tin, too? [19:33:51] sorry phone call one sec [19:36:23] ori: We should probably get /srv/scap removed from the cluster hosts too. What's the best way to ask for that? RT ticket? [19:36:40] yes [19:38:03] re. i +2'd it but puppet-merge thinks there's no changes, let's see what happened.. [19:38:05] hashar: for things like https://gerrit.wikimedia.org/r/#/c/76678/1 [19:38:27] if you look at the README for admin module it should show how to deal w/ that [19:38:51] like if you put modules/admin/files/home/hashar/i_love_turtles [19:39:05] you will get i_love_turtles files at all participating node locations [19:39:06] (03PS2) 10Gage: apply mediawiki::sync class on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/135833 (owner: 10Ori.livneh) [19:39:20] rebasing.. [19:39:22] (03PS3) 10Rush: admin yaml for swift boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135812 [19:39:27] (03CR) 10Rush: [C: 032 V: 032] admin yaml for swift boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135812 (owner: 10Rush) [19:39:57] <_joe_> chasemp: I will be able to log as myself on swift boxes? wow [19:40:07] very soon [19:40:08] chasemp: so that old change can be abandoned right ? 
[19:40:13] and with your setup :) [19:40:21] <_joe_> next move, we close root ssh access [19:40:23] hashar: basically yes [19:40:33] but I was going to link to new way [19:40:38] w/ a comment [19:40:41] as a nicety [19:40:58] <_joe_> uhm strange, I don't see pitchforks after my proposal [19:41:17] gah why is gerrit still showing me a grayed out button for "publish and submit" after i've rebased? [19:41:22] actually I think it's generally the plan [19:41:30] with some tbd aspects [19:41:43] chasemp: would you mind commenting on https://gerrit.wikimedia.org/r/#/c/76678/1 and maybe abandon it ? :) [19:41:46] jgage: sometimes if there is a conflict ui rebase doesn't actually do it [19:41:56] in those cases I use cli and re-push [19:42:01] but no clue if there is a better way [19:42:07] chasemp: also thanks for the admin data.yaml python linter. I get it wrapped in Jenkins and it should now warn whenever someone makes a mistake \O/ [19:42:29] hashar: yes thank you for that, meant to say, saved me once already. on that changeset I will but there are several [19:42:32] and I was going to do them all at once [19:42:43] jgage: you need to re+2 after rebasing [19:43:26] that's what i'm trying to do but it will only let me comment, not "submit" [19:43:30] chasemp: and since the job invokes your linter directly, you can add more lint tests there and Jenkins will happily run them. 
[19:43:36] jgage: you probably have the previous patchset open [19:43:36] and it still says it has a dep or needs rebase after i rebased [19:43:49] reload https://gerrit.wikimedia.org/r/#/c/135833/ [19:44:00] and make sure you have PS2 expanded [19:44:07] (03PS3) 10Ori.livneh: apply mediawiki::sync class on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/135833 [19:44:14] it needed another rebase, so make that PS3 [19:44:27] ah i see [19:44:39] hashar: I was going to wait to ask you, since I think data.yaml will live in a different repo [19:44:43] but it's useful to have for sure [19:44:55] (03CR) 10Gage: [C: 032 V: 032] apply mediawiki::sync class on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/135833 (owner: 10Ori.livneh) [19:45:01] whew ok. merging.. [19:45:13] (03PS3) 10Rush: admin yaml for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135818 [19:45:21] (03CR) 10Rush: [C: 032 V: 032] admin yaml for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135818 (owner: 10Rush) [19:45:25] running puppet on tin... [19:46:13] jgage: much obliged [19:47:48] jgage: puppetd -tv ;) [19:48:01] That seems to have fixed it. Thanks ori and jgage [19:48:19] thanks jgage! [19:48:36] yay finally puppet run done. 
my pleaseure :) [19:48:39] !log bd808 Synchronized robots.txt: Testing sync-file in php (duration: 00m 03s) [19:48:44] Logged the message, Master [19:48:54] Crappy log message is crappy [19:49:37] !log bd808 Synchronized database lists: (no message) (duration: 00m 03s) [19:49:42] Logged the message, Master [19:50:32] (03PS2) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 [19:50:35] (03PS1) 10Rush: remove dupe wikidev logic for lucene role [operations/puppet] - 10https://gerrit.wikimedia.org/r/135873 [19:50:56] (03CR) 10Rush: [C: 032 V: 032] "needed to fix puppet" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135873 (owner: 10Rush) [19:51:29] !log bd808 Started scap: no-op scap deleted.dblist [19:51:33] Logged the message, Master [19:52:12] !log Horrible log message; should be "no-op scap to test code changes" [19:52:16] Logged the message, Master [19:52:29] paste buffer is not my friend today [19:52:41] (03PS1) 10Aaron Schulz: Periodically restart job runners to avoid pipeline shrinking issue [operations/puppet] - 10https://gerrit.wikimedia.org/r/135875 [19:53:14] chasemp: whenever you migrate to a different repo, we can move the Jenkins job to that repo as well :] [19:53:15] csteipp: in https://gerrit.wikimedia.org/r/#/c/132393/1/templates/nginx/nginx.conf.erb you said you pinged analytics. was that in public? what exactly can they measure? 
[19:54:54] (03PS1) 10Rush: adding deployment group to search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135877 [19:55:17] (03CR) 10Rush: [C: 032 V: 032] "try to fix search*" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135877 (owner: 10Rush) [19:58:12] (03PS3) 10Rush: admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 [19:58:17] (03CR) 10Rush: [C: 032 V: 032] admin yaml for silver (ldap) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135825 (owner: 10Rush) [20:00:36] gwicke, subbu: Almost done with my marathon scap deploy [20:00:55] ok ... [20:02:09] !log bd808 Finished scap: no-op scap deleted.dblist (duration: 10m 40s) [20:02:13] Logged the message, Master [20:02:39] greg-g, subbu, gwicke: all done [20:02:48] jouncebot: next [20:02:48] In 2 hour(s) and 57 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140528T2300) [20:04:02] (03PS1) 10Rush: search-roots group for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135880 [20:04:42] bd808: :) thanks man [20:07:22] bd808, thanks! 
[20:07:27] (03CR) 10Dzahn: [C: 031] search-roots group for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135880 (owner: 10Rush) [20:07:49] (03CR) 10Rush: [C: 032 V: 032] search-roots group for search* [operations/puppet] - 10https://gerrit.wikimedia.org/r/135880 (owner: 10Rush) [20:11:11] unsurprisingly, I think I managed to kill Jenkins web interface [20:12:18] !log gallium (Jenkins master) sent to swap somehow :-( [20:12:21] Logged the message, Master [20:12:29] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 05:11:33 PM UTC [20:13:06] (03PS3) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 [20:13:28] !log deployed parsoid a234af8c0 (deploy sha f17506eb) [20:13:29] PROBLEM - check if dhclient is running on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:13:29] PROBLEM - puppet disabled on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:13:30] PROBLEM - check configured eth on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:13:33] Logged the message, Master [20:13:38] I am handling gallium [20:13:39] PROBLEM - RAID on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:14:02] hashar: i was just going to touch it :) [20:14:11] some job went wild there [20:14:14] ok [20:14:36] (03CR) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 (owner: 10Dzahn) [20:14:46] looking at searchidx1001 [20:15:02] this isn't anything I did [20:15:02] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::mediawiki::configuration::php for searchidx1001.eqiad.wmnet at /etc/puppet/manifests/role/lucene.pp:156 on node searchidx1001.eqiad.wmnet [20:15:04] I don't think [20:15:08] anyone touching that today? [20:15:11] idk [20:15:53] ori? 
[20:15:59] since it's mw config? [20:16:10] PROBLEM - SSH on gallium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:16:18] (03PS1) 10Rush: admin yaml analytics1003.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135882 [20:16:19] RECOVERY - puppet disabled on gallium is OK: OK [20:16:19] RECOVERY - check if dhclient is running on gallium is OK: PROCS OK: 0 processes with command name dhclient [20:16:29] RECOVERY - check configured eth on gallium is OK: NRPE: Unable to read output [20:17:29] RECOVERY - RAID on gallium is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [20:17:59] RECOVERY - SSH on gallium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [20:18:07] !log Jenkins: killed all phantomjs process on gallium. They were eating all available memory. All three process were VisualEditor qunit tests. [20:18:12] Logged the message, Master [20:18:13] ori: you about I think maybe errors for puppet on searchidx1001.eqiad.wmnet are from some refactoring there [20:18:16] mutante: gallium should come back now. [20:18:33] hashar: 'k, thanks [20:19:48] hashar: Hmm. :-( [20:20:01] James_F: talking about it with timo in #wikimedia-qa [20:20:01] (03PS4) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 [20:20:08] Kk. 
[20:20:54] (03CR) 10Ottomata: [C: 032] admin yaml analytics1003.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135882 (owner: 10Rush) [20:21:50] (03CR) 10jenkins-bot: [V: 04-1] admin yaml analytics1003.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135882 (owner: 10Rush) [20:22:59] hashar: the -1 above from jenkins is weird [20:23:01] '20:20:30 ERROR: Could not clone repository' [20:23:24] at least I believe there is no syntax error there...it's some internal chicanery for jenkins [20:23:48] (03CR) 10Rush: [C: 032 V: 032] "jenkins your drunk" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135882 (owner: 10Rush) [20:24:10] can't seem to merge it either [20:24:16] haha [20:24:41] i think you can remove jenkins as a reviewer [20:24:43] and then merge [20:24:56] (03PS1) 10Rush: admin yaml for analytics1004.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135884 [20:25:57] ottomata: ah didn't know, thanks will try [20:27:02] chasemp: the server went to swap a few minutes ago. 
You can recheck it I guess [20:27:13] (03CR) 10Hashar: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135882 (owner: 10Rush) [20:27:38] (03CR) 10Ottomata: admin yaml for analytics1004.eqiad.wmnet (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135884 (owner: 10Rush) [20:28:29] (03PS5) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 [20:32:56] (03CR) 10Rush: admin yaml for analytics1004.eqiad.wmnet (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135884 (owner: 10Rush) [20:35:58] (03CR) 10Ottomata: [C: 032] admin yaml for analytics1004.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135884 (owner: 10Rush) [20:36:34] (03CR) 10Rush: [C: 032 V: 032] admin yaml for analytics1004.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135884 (owner: 10Rush) [20:38:19] !log enabling puppet on osmium [20:38:28] Logged the message, Master [20:38:59] RECOVERY - Puppet freshness on osmium is OK: puppet ran at Wed May 28 20:38:58 UTC 2014 [20:42:01] I don't get it [20:42:13] mutante: can you explain please ^ vs https://gerrit.wikimedia.org/r/#/c/135757/ [20:42:57] what's wrong? looks like 24 hour format [20:43:29] jzerebecki: We measure page times for logged in and logged out users, and already flag if it's http or https. Once we know the baseline, we can see if changing the nginx config changes the time for http vs https. [20:44:28] other servers report am/pm mode [20:44:41] date +"%a %d %b %Y %T %Z" [20:44:41] Wed 28 May 2014 13:44:29 PDT [20:44:56] that looks to me like it does what that change changed [20:45:14] see log from this morning in this channel [20:45:32] e.g: [03:00:29] PROBLEM - Puppet freshness on oxygen is CRITICAL: Last successful Puppet run was Tue 27 May 2014 08:59:17 PM UTC [20:46:07] matanya: i guess it's a chicken egg problem? 
(the fix for the monitoring of puppet runs was not applied because puppet did no run) [20:46:31] my change isn't merged [20:46:53] ah, lol, true [20:47:18] so i can't understand why servers don't report the same [20:49:53] matanya: here's a difference.. you are comparing the message for PROBLEM with the message for RECOVERY [20:50:10] see how it says "Last successful Puppet run was" [20:50:29] oh, good catch :) [20:50:30] and then it says "OK: puppet ran at " [20:50:40] there is no "Last succesful" in that one [20:50:58] so my patch is valid and should be merged ... [20:53:23] (03CR) 10Dzahn: [C: 032] "yep, 24 hour format" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135757 (owner: 10Matanya) [20:53:29] matanya: yes [20:53:34] thank you [21:00:03] (03CR) 10Dzahn: Move logs to /var/log/mediawiki (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:00:11] matanya: ^ thanks for that as well, 2 comments [21:04:11] (03CR) 10Dzahn: [C: 032] Add Krinkle to the English Planet [operations/puppet] - 10https://gerrit.wikimedia.org/r/135748 (owner: 10Nemo bis) [21:05:00] ^d: so we just get rid of hhvm hourly build right? [21:05:05] <^d> Yep. [21:05:40] you want to rebase the JJB related change [21:05:51] <^d> I just made a second change. [21:05:51] <^d> https://gerrit.wikimedia.org/r/#/c/135894/ [21:05:56] niceee [21:05:58] <^d> I didn't mean to, but I did. [21:06:09] and you already did the clean up [21:06:16] thanks for freeing up resources! 
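For reference, the mismatch discussed above (PROBLEM lines showing "08:59:17 PM UTC" while `date +"%a %d %b %Y %T %Z"` prints 24-hour time) comes down to which format directives are used. A minimal sketch in Python, assuming the default C locale; the function names are made up for illustration and are not part of the actual Icinga check:

```python
from datetime import datetime

def format_12h(ts):
    # 12-hour clock with AM/PM, as in the older PROBLEM messages
    return ts.strftime("%a %d %b %Y %I:%M:%S %p")

def format_24h(ts):
    # 24-hour clock; %H:%M:%S is what %T expands to in date(1)
    return ts.strftime("%a %d %b %Y %H:%M:%S")

# Timestamp taken from the oxygen example quoted in the channel
ts = datetime(2014, 5, 27, 20, 59, 17)
print(format_12h(ts))
print(format_24h(ts))
```

Both strings describe the same instant; only the rendering differs, which is why comparing a PROBLEM line with a RECOVERY line can look inconsistent.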
[21:06:26] <^d> you're welcome :) [21:06:58] I would like one day to build our extensions with our hhvm build and the upstream branch we are tracking [21:07:03] to catch potential issues [21:08:09] (03PS6) 10Dzahn: migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 [21:08:45] (03PS1) 10BryanDavis: scap: ensure=>absent /usr/local/bin/sync-common-file [operations/puppet] - 10https://gerrit.wikimedia.org/r/135924 [21:08:47] (03PS1) 10BryanDavis: scap: /usr/local/bin/sync-common-file is unused [operations/puppet] - 10https://gerrit.wikimedia.org/r/135925 [21:10:14] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:10:19] (03CR) 10Dzahn: [C: 032] migrate jenkins users/admins/roots to admin.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 (owner: 10Dzahn) [21:11:04] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.014 second response time [21:11:58] (03PS1) 10Dzahn: remove admins::jenkins, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135926 [21:12:50] (03PS12) 10Matanya: Move logs to /var/log/mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:12:57] mutante: ^ :) [21:13:31] hashar: and .. i think i found another thing to enhance the linter [21:13:43] has_key(): expects the first argument to be a hash, got "" which is of type String [21:13:46] hrmm [21:13:54] matanya: thx, in a minute [21:14:09] csteipp: so that would be a measurement to be done after deploying the change? [21:14:51] mutante: I have really looked at the code. Just added sys.exit(1) where relevant :° [21:15:05] springle: do we actually have query caching enabled? 
[21:15:38] (03CR) 10CSteipp: "In practice, ssllabs doesn't show anything negotiating to DHE, so I'm incline to take it out unless there's something to mitigate targeted" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [21:15:49] * AaronSchulz can't imagine it would help much with heavily changing tables [21:16:00] jzerebecki: We need a baseline to make sure we can measure it-- so that's what I'm waiting on right now. [21:16:02] * AaronSchulz is just mulling over https://www.mediawiki.org/w/index.php?title=Performance_guidelines [21:18:37] csteipp: who is doing the work for that? [21:19:32] (03PS1) 10Chad: Include CologneBlue and Modern if they exist as proper skins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135927 [21:19:42] matanya: do you see a difference here? https://gerrit.wikimedia.org/r/#/c/134739/ [21:20:04] I'm not sure who is doing the actual work. Toby forwarded it on to Dario. I'll check on their progress. [21:20:05] matanya: i mean, something that would explain why it breaks unlike the other similar changes [21:20:13] looking [21:20:36] missing space ? [21:20:46] (03PS1) 10BBlack: Move RPS config to site.pp where it belongs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/135928 [21:20:58] matanya: where? [21:21:29] between aaron and chris? might be [21:21:29] line 111 in https://gerrit.wikimedia.org/r/#/c/134739/6/modules/admin/data/data.yaml [21:21:32] yes [21:21:42] thanks, let's see [21:22:13] and same in site.pp [21:23:15] you want line breaks, right, but as opposed to the yaml that should not break [21:23:37] no, space between the group names [21:23:42] (03PS1) 10BBlack: Switch LVS to "performance" cpufreq governor [operations/puppet] - 10https://gerrit.wikimedia.org/r/135929 [21:23:45] i know line breaks don't work [21:23:51] !log Restarting Parsoid Varnishes per gwicke's request [21:23:56] Logged the message, Mr. 
Obvious [21:23:58] (03PS2) 10Ottomata: [WIP] Add CDH5 support, drop CDH4 support [operations/puppet/cdh4] (cdh5) - 10https://gerrit.wikimedia.org/r/135494 [21:24:09] ok [21:25:20] (03PS1) 10Dzahn: add missing whitespace in contint yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135930 [21:25:52] (03CR) 10BBlack: [C: 032 V: 032] Move RPS config to site.pp where it belongs. [operations/puppet] - 10https://gerrit.wikimedia.org/r/135928 (owner: 10BBlack) [21:26:34] (03PS1) 10BBlack: Add optional RSS setup to interface RPS script [operations/puppet] - 10https://gerrit.wikimedia.org/r/135931 [21:26:36] (03PS1) 10BBlack: enable RSS for LVS servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/135932 [21:27:11] (03CR) 10jenkins-bot: [V: 04-1] add missing whitespace in contint yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135930 (owner: 10Dzahn) [21:27:13] (03CR) 10JanZerebecki: "It doesn't in the current version as I have already disabled DHE." [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [21:28:10] (03PS2) 10Dzahn: add missing whitespace in contint yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135930 [21:29:01] (03CR) 10Dzahn: [C: 032] add missing whitespace in contint yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135930 (owner: 10Dzahn) [21:31:27] matanya: thanks, different error :) [21:31:41] still broken ? 
[21:31:48] yea, but in a new way [21:31:53] so that was a fi [21:31:55] fix [21:33:02] looking again [21:33:54] (03PS1) 10Dzahn: remove duplicate admins::roots from contint boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135935 [21:34:02] matanya: ^ should just be that [21:34:41] yes, that should fix it [21:34:48] (03PS1) 10Andrew Bogott: Change the mysql connection address for openstack + labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/135937 [21:35:24] cajoel: hello, did you start rolling the replacement for sanger ? [21:35:42] (03CR) 10Dzahn: [C: 032] remove duplicate admins::roots from contint boxes [operations/puppet] - 10https://gerrit.wikimedia.org/r/135935 (owner: 10Dzahn) [21:35:43] matanya: negative -- Alex owns that [21:36:08] any ticket in rt other than 6163 ? [21:36:30] not that I know of [21:37:46] i'll speak to alex tomorrow. The outcome will probably be. create RT ticket for a new box. thanks cajoel [21:38:08] matanya: fixed it, yea:) [21:38:14] yay, thanks [21:38:19] Eloquence: Reedy http://vimeo.com/96687523 [21:39:37] (03PS1) 10Ori.livneh: Update reference to role::mediawiki::configuration::php [operations/puppet] - 10https://gerrit.wikimedia.org/r/135939 [21:39:40] I am off. 
Delegating to Zuul / Jenkins :-D [21:39:42] chasemp: ^ [21:39:42] * hashar vanishes [21:40:18] hashar: why do you always do that when i already started typing a line for you:) [21:41:00] (03Abandoned) 10Andrew Bogott: Change the mysql connection address for openstack + labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/135937 (owner: 10Andrew Bogott) [21:41:38] (03CR) 10Dzahn: "cat /etc/sudoers.d/hashar" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134739 (owner: 10Dzahn) [21:44:32] (03CR) 10Rush: [C: 031] "seems good to me, reference update" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135939 (owner: 10Ori.livneh) [21:46:00] (03CR) 10Dzahn: "# Instead of skip-networking the default is now to listen only on" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135937 (owner: 10Andrew Bogott) [21:47:41] (03CR) 10Andrew Bogott: "Everything works if I just comment out this line in my.cnf:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135937 (owner: 10Andrew Bogott) [21:47:45] mutante: i'm leaving in a few. if you want any work on the mediawiki log change, post it to the patchset, and i'll try to do it tomorrow am [21:50:54] matanya: ok, directories need to be writable [21:51:03] i can just amend and do it though [21:52:05] you mean 0775 ? [21:52:38] it's currently owned by mwdeploy [21:52:44] that change gives it to wikidev [21:52:51] right [21:52:57] do the crons all run as apache now? [21:53:01] yes [21:53:41] hmmm "wikidev can own mwdeploy but mwdeploy can't own wikidev " [21:53:50] "scripts owned by mwdeploy can only be run by apache " [21:53:54] see huge warning on manifests/misc/maintenance.pp [21:53:59] i know [21:54:33] so shouldn't the logdir be owned by the script user? [21:54:37] not sure [21:54:41] apache ? 
[21:54:46] Reedy: ^ [21:57:06] grep owner maintenance.pp [21:57:14] matanya: ^ try that, it's still mixed [21:57:22] mwdeploy, l10nupdate, root, apache [21:58:11] yes, shouldn't be merged until this is cleared up [21:58:26] ok, let's figure it out tomorrow [21:58:49] sure, night [21:58:56] good night, thanks [22:00:03] (03CR) 10Rush: [C: 031] remove admins::jenkins, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135926 (owner: 10Dzahn) [22:01:01] (03CR) 10Dzahn: [C: 032] remove admins::jenkins, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135926 (owner: 10Dzahn) [22:03:55] (03PS3) 10Dzahn: Remove duplicate users in admin class's data.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [22:06:10] (03CR) 10jenkins-bot: [V: 04-1] Remove duplicate users in admin class's data.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [22:07:10] (03CR) 10Andrew Bogott: [C: 031] "These are clearly duplicates :) I'd be interested in knowing how they got in here before this is merged, though (unless Chase has already" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [22:08:29] (03CR) 10Dzahn: "pretty sure that happened because a script generated these from classic admins.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [22:12:42] (03CR) 10Dzahn: "see the admin linter failing with interestig messages" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135725 (owner: 10Hoo man) [22:17:13] Krinkle: heh: http://socket.io/blog/introducing-socket-io-1-0/ [22:17:18] impeccable timing [22:18:45] but: hah! "If you want to scale out Socket.IO to multiple nodes, it now comes down to two simple steps: 1) Turn on sticky load balancing (for example by origin IP address)., 2. Implement the socket.io-redis adapter." [22:18:48] that's validating! [22:19:58] :D [22:20:28] yay redis! 
[22:20:58] and sticky load balancing [22:21:02] we made the same choices [22:31:57] (03PS1) 10Ori.livneh: Add self (ori) to admin/data.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135952 [22:32:16] (03PS2) 10Ori.livneh: Add self (ori) to admin/data.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135952 [22:35:36] ori: Nice [22:35:56] ori: btw, today I implemented multi-layer object caching [22:36:16] I wanted it for my tools on Toolserver / Tool Labs [22:36:28] Basically like BagOStuff / MultiWrite in MediaWiki [22:37:07] But with a front-end feature (e.g. on set: store in memory and in Redis, on get: try memory, then Redis, if get from Redis, populate memory now so that we have it next time) [22:37:25] Allows for getting rid of a lot of ad-hoc static caching and instead letting the cache class cache itself. [22:37:37] (03CR) 10Dzahn: [C: 032] Add self (ori) to admin/data.yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/135952 (owner: 10Ori.livneh) [22:37:59] https://github.com/Krinkle/toollabs-base/blob/master/src/Cache.php [22:38:17] Not extraordinary, but quite proud of it. [22:38:18] \o/ [22:38:53] for the record, no SWAT patches lined up for today yet [22:39:32] For example, getNamespaces on Toolserver means making a complex sql query or a wmf mw api request. Even if it's cached in Redis, you don't want to call Redis 100 times for the same key within a web request (e.g. from some generic utility that creates a link) [22:39:41] greg-g: When is SWAT? [22:39:56] in 21 minutes [22:40:07] Krinkle: cool, checking it out [22:40:12] Hm. I'll see if I can find something to do for it then greg-g :) [22:41:50] :P [22:43:35] greg-g: I have https://gerrit.wikimedia.org/r/#/c/127443/ ... [22:44:38] doesn't look ready... ? 
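The multi-layer caching idea described above (on set, write to memory and to the shared store; on get, try memory first, then the shared store, and copy a hit back into memory) can be sketched roughly as follows. This is not the code from the linked Cache.php; `DictStore` and `LayeredCache` are hypothetical names, and `DictStore` merely stands in for something like Redis so the example runs without a server:

```python
class DictStore:
    """Stand-in for a slower shared store such as Redis (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # counts how often the slow layer is consulted

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class LayeredCache:
    """Per-process memory layer in front of a shared backend."""

    def __init__(self, backend):
        self.memory = {}
        self.backend = backend

    def set(self, key, value):
        # Write through to both layers
        self.memory[key] = value
        self.backend.set(key, value)

    def get(self, key):
        # Fast path: already seen during this process/request
        if key in self.memory:
            return self.memory[key]
        value = self.backend.get(key)
        if value is not None:
            # Populate memory so the next lookup skips the backend
            self.memory[key] = value
        return value


backend = DictStore()
backend.set("namespaces", {0: "", 1: "Talk"})
cache = LayeredCache(backend)
first = cache.get("namespaces")   # falls through to the backend
second = cache.get("namespaces")  # served from memory
```

With this shape, a utility that asks for the same key many times within one request only touches the backend once, which is the point made about getNamespaces.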
[22:45:48] greg-g: It will be ready when someone merges the CU extension patch /me looks at AaronSchulz and TimStarling :p [22:47:46] ori: For example, if you open https://tools.wmflabs.org/orphantalk/?debug=true and select ab.wikipedia (just any wiki), scroll down and you'll see that it starts with a cache hit in FileSystemCache, and then cache hits from memory. [22:47:53] I think CheckUser is just following convention there [22:48:27] ori: of course, if I wouldn't have such feature, I wouldn't use cache directly in so many places, I'd use additional static and in-object caching to avoid making so many cache calls [22:49:15] TimStarling: Though no default rights (well, creating an entire new group) is possibly better for certain extension. In addition people think extensions should not add default and that so... idk [22:50:50] * greg-g goes [22:52:20] TimStarling: Oh with that point, fair enough. [22:52:43] (03Abandoned) 10John F. Lewis: Add checkuser(-log) permissions by default [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/127443 (owner: 10John F. 
Lewis) [23:00:30] there's nothing on the calendar, so nothing to deploy [23:00:59] * YuviPanda deployes a parachute [23:02:00] (03PS1) 10Dzahn: admin yaml for tridge (backups) [operations/puppet] - 10https://gerrit.wikimedia.org/r/135957 [23:02:02] (03PS1) 10Dzahn: admin yaml for palladium [operations/puppet] - 10https://gerrit.wikimedia.org/r/135958 [23:02:04] (03PS1) 10Dzahn: admin yaml for iodine [operations/puppet] - 10https://gerrit.wikimedia.org/r/135959 [23:07:46] Krinkle: i like it, nice and clean [23:10:38] ori: I also found out hilariously that my gadgets are used 3x more often than I thought [23:10:47] I have this hacky habit of using GlobalUsage to track user scripts [23:10:54] But my query was limited to 500 entries [23:11:08] it didn't say "500" though, as I'm requesting for multiple files [23:11:19] it was like 312, but is actually closer to 800 [23:11:26] https://tools.wmflabs.org/usage/?action=usage&group=Krinkle [23:11:45] You must all hate me for this wonderful exploit [23:11:47] I would :P [23:12:08] https://commons.wikimedia.org/w/index.php?title=Special:GlobalUsage/Krinkle_RTRC.js [23:12:24] >// [[File:Krinkle_RTRC.js]] [23:12:36] People know how to copy paste things very well [23:12:54] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Wed 28 May 2014 17:11:33 UTC [23:13:15] In porting the old tool from toolserver to labs I rewrote it and noticed it was hitting the 500 limit [23:13:17] :D [23:36:33] <^d> Who's swatting today? 
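The undercount mentioned above (a query silently capped at 500 entries) is the usual symptom of reading only the first page of a paginated result. A minimal sketch of following continuation tokens until the source is exhausted; `fetch_page` and `make_fake_source` are hypothetical helpers for illustration and do not reflect the real GlobalUsage API parameters:

```python
def fetch_all(fetch_page):
    """Collect items from every page.

    fetch_page(token) must return (items, next_token), where next_token
    is None on the last page. Both are assumptions of this sketch.
    """
    items = []
    token = None
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if token is None:
            return items


def make_fake_source(total, limit=500):
    """Build an in-memory paginated source that caps each page at `limit`."""
    data = list(range(total))

    def fetch_page(token):
        start = token or 0
        end = start + limit
        next_token = end if end < total else None
        return data[start:end], next_token

    return fetch_page


source = make_fake_source(800)
first_page, _ = source(None)   # what a single capped request returns
everything = fetch_all(source)  # what following continuation returns
```

Here a single request sees 500 rows while the full walk sees all 800, mirroring the "closer to 800" observation.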
[23:37:34] ^d: i think they had nothing to swat [23:37:45] <^d> I have something I'm going to swat then :) [23:37:53] heh, ok [23:37:59] * greg-g looks in [23:38:05] (03CR) 10Chad: [C: 032] Include CologneBlue and Modern if they exist as proper skins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135927 (owner: 10Chad) [23:38:09] ah [23:38:16] (03Merged) 10jenkins-bot: Include CologneBlue and Modern if they exist as proper skins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135927 (owner: 10Chad) [23:38:23] <^d> That, from earlier. So things don't explode on deploy tomorrow. [23:38:31] (03CR) 10Ori.livneh: [C: 032] Update reference to role::mediawiki::configuration::php [operations/puppet] - 10https://gerrit.wikimedia.org/r/135939 (owner: 10Ori.livneh) [23:39:22] !log demon Synchronized wmf-config/CommonSettings.php: Including external CologneBlue/Modern skins, if they exist (duration: 00m 07s) [23:39:27] Logged the message, Master [23:39:41] (03PS2) 10Ori.livneh: jobrunners: set nice to 19, not 20 [operations/puppet] - 10https://gerrit.wikimedia.org/r/134644 [23:40:13] <^d> Aw no MatmaRex. [23:42:24] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Wed May 28 23:42:18 UTC 2014 [23:43:00] (03CR) 10Ori.livneh: [C: 032] "+2'd by Giuseppe earlier" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134644 (owner: 10Ori.livneh) [23:44:09] (03CR) 10Ori.livneh: [C: 031] Periodically restart job runners to avoid pipeline shrinking issue [operations/puppet] - 10https://gerrit.wikimedia.org/r/135875 (owner: 10Aaron Schulz) [23:52:10] (03PS4) 10Ori.livneh: Remove misc::deployment::scap_scripts from terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man) [23:54:00] (03CR) 10Ori.livneh: [C: 032] "+1'd by Dzahn earlier, so merging. I'll clean up the files." [operations/puppet] - 10https://gerrit.wikimedia.org/r/134282 (owner: 10Hoo man)