[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T0000). Please do the needful.
[00:02:36] if nobody's interested, I guess I can
[00:03:05] * MaxSem pokes aude, gi11es, marktraceur
[00:04:01] ?
[00:04:28] we will have something but it will take a bit longer, so might have to do it ourselves in a bit
[00:05:08] eh, looked at the wrong row:P
[00:05:18] heh
[00:05:31] in such case, ping superm401, ebernhardson, yurikR
[00:05:34] * aude is keeping jenkins busy :/
[00:07:05] MaxSem, sup?
[00:07:30] going to deploy your stuff
[00:07:34] kul :)
[00:07:52] (CR) MaxSem: [C: 2] Don't collapse sections on mobile WD [mediawiki-config] - https://gerrit.wikimedia.org/r/179513 (owner: MaxSem)
[00:08:07] is Kul even working for us?
[00:08:36] kul is always with us
[00:09:47] eh, zuul queue is fubar
[00:10:17] meh https://performance.wikimedia.org/xenon/theme/style.css
[00:10:35] @import url('//fonts.googleapis.com/css?family=Open+Sans');
[00:10:49] third party inclusion -> bad
[00:11:09] (Merged) jenkins-bot: Don't collapse sections on mobile WD [mediawiki-config] - https://gerrit.wikimedia.org/r/179513 (owner: MaxSem)
[00:12:00] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179513 (duration: 00m 06s)
[00:12:06] Logged the message, Master
[00:13:28] (PS1) BryanDavis: Convert from wfErrorLog to MWLogger logging [mediawiki-config] - https://gerrit.wikimedia.org/r/180082
[00:14:41] (PS1) Hoo man: Don't load fonts from google on performance.wikimedia.org/xenon/ [puppet] - https://gerrit.wikimedia.org/r/180083
[00:14:57] ori: ^
[00:21:16] okayyy, ebernhardson and superm401 - since you're not around, I'm not deploying your changes
[00:21:30] MaxSem, I'm here.
[00:21:35] aha:)
[00:21:44] MY SCARE TACTICS WORKS
[00:22:06] Sorry, I forgot ACKing was required for SWAT.
[00:22:10] I saw your earlier post.
[00:22:31] MaxSem: it only pings me if it starts with ebernhardson :P
[00:22:49] yup, I need someone to verify your changes work and don't break stuff
[00:23:50] (CR) MaxSem: [C: 2] Enable EventLogging for Flow [mediawiki-config] - https://gerrit.wikimedia.org/r/179981 (owner: Mattflaschen)
[00:24:24] MaxSem: hmm, probably fine but i think i have to wait ~5min for resource loader
[00:25:20] ebernhardson, what's fine?
[00:26:08] MaxSem: the patch you merged to wmf12 for me, https://gerrit.wikimedia.org/r/180043
[00:26:29] oh i suppose i can debug=1 it
[00:26:33] well, I haven't deployed it yet because you weren't responding;)
[00:26:34] but not quite the same :)
[00:26:36] oh :P
[00:28:23] (Merged) jenkins-bot: Enable EventLogging for Flow [mediawiki-config] - https://gerrit.wikimedia.org/r/179981 (owner: Mattflaschen)
[00:28:51] pfft zuul
[00:29:52] !log maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/179981 (duration: 00m 07s)
[00:29:56] Logged the message, Master
[00:29:57] superm401, ^
[00:30:14] Thanks, MaxSem. Should be fine, but I'll do a quick sanity check.
[00:31:49] MaxSem,yeah, we're good.
[00:32:43] MaxSem, there is another patch for zeroportal, but that one we should just bring it to master (or i could do it right after swat)
[00:33:02] yep, too late to add stuff to swat
[00:34:52] yurikR, my checkout of ZeroBanner is broken - could you make submodule updates?
[00:35:29] MaxSem, on tin?
[00:35:51] MaxSem, actually it still hasn't been merged by jenkins (
[00:35:52] in git
[00:35:59] oh shit
[00:36:07] just merged
[00:36:09] sup?
[00:36:44] aha, a bunch of wikidata jobs is gone
[00:36:50] works again:)
[00:37:55] !log maxsem Synchronized php-1.25wmf12/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180043/ (duration: 00m 08s)
[00:37:58] Logged the message, Master
[00:38:01] ebernhardson, ^
[00:38:49] MaxSem, ideally i would love for https://gerrit.wikimedia.org/r/#/c/180029/ to also be deployed
[00:38:57] but i could do it myself (it will be merge to master)
[00:39:24] eh, looks too large for a swat
[00:40:38] (CR) Yurik: [C: 1] Update outbound X-CS behavior in light of unified [puppet] - https://gerrit.wikimedia.org/r/179571 (owner: Dr0ptp4kt)
[00:40:50] bblack, when you have a sec ^
[00:41:43] jouncebot, next
[00:41:44] In 15 hour(s) and 18 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T1600)
[00:42:15] greg-g, MaxSem, aude, i would like to go right after swat for zeroportal patch depl
[00:44:23] MaxSem: looks good, works as expected. thanks
[00:48:06] grmbl
[00:48:19] yurikR, can you deploy ZB too then?
[00:48:28] up to you )
[00:48:42] MaxSem, are you done?
[00:49:05] need to do a scap
[00:49:16] !log maxsem Started scap: (no message)
[00:49:22] Logged the message, Master
[00:53:07] bleh, that will be forever
[00:53:36] (PS1) Yuvipanda: Create column objects as well [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180098
[00:53:51] oh noes, scap?
[00:54:07] what we would like to deploy needs a scap
[00:59:00] (PS2) Yuvipanda: Create column objects as well [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180098
[00:59:39] aude, when are you deploying? i added myself to the depl schedule
[01:00:13] (and i don't need scap)
[01:01:23] yurikR: soon as there is an opportunity
[01:01:31] (PS3) Yuvipanda: Create column objects as well [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180098
[01:01:33] aude, can you go right after me?
[01:01:39] sure
[01:01:42] i have two exts to push
[01:01:59] ok
[01:10:55] * yurikR is waiting for scap... oh max max, why oh why...
[01:15:29] because sikrit
[01:15:52] !log maxsem Finished scap: (no message) (duration: 26m 35s)
[01:15:59] yurikR, aude ^^
[01:16:04] Logged the message, Master
[01:16:05] :)
[01:16:25] (CR) Yuvipanda: [C: 2 V: 2] Create column objects as well [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180098 (owner: Yuvipanda)
[01:16:45] eyi!!!
[01:16:48] here i go
[01:17:11] (PS1) Yuvipanda: Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104
[01:20:21] !log yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 06s)
[01:20:30] Logged the message, Master
[01:21:19] (PS2) Yuvipanda: Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104
[01:30:05] !log yurik Synchronized php-1.25wmf12/extensions/ZeroPortal: (no message) (duration: 00m 07s)
[01:30:11] Logged the message, Master
[01:34:49] !log yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 09s)
[01:34:56] Logged the message, Master
[01:35:28] (PS3) Yuvipanda: Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104
[01:36:09] !log yurik Synchronized php-1.25wmf12/extensions/ZeroBanner: (no message) (duration: 00m 05s)
[01:36:13] Logged the message, Master
[01:36:24] aude, done
[01:36:42] ok
[01:37:23] (PS4) Yuvipanda: Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104
[01:42:14] (PS2) Ori.livneh: Don't load fonts from google on performance.wikimedia.org/xenon/ [puppet] - https://gerrit.wikimedia.org/r/180083 (owner: Hoo man)
[01:42:21] hoo: thanks
[01:42:40] (CR) Ori.livneh: [C: 2 V: 2] Don't load fonts from google on performance.wikimedia.org/xenon/ [puppet] - https://gerrit.wikimedia.org/r/180083 (owner: Hoo man)
[01:43:18] going to deploy now (and scap)
[01:47:33] (PS5) Yuvipanda: Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104
[01:47:43] !log aude Started scap: Update test.wikidata
[01:47:51] Logged the message, Master
[01:49:39] (CR) Yuvipanda: [C: 2 V: 2] Read whitelist/greylist from files and compare to db [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180104 (owner: Yuvipanda)
[01:53:03] (PS1) Ori.livneh: xenon: Tweak the text of the footer [puppet] - https://gerrit.wikimedia.org/r/180109
[01:53:24] (CR) Ori.livneh: [C: 2 V: 2] xenon: Tweak the text of the footer [puppet] - https://gerrit.wikimedia.org/r/180109 (owner: Ori.livneh)
[01:55:56] (PS1) Yuvipanda: Add feature to ignore databases with a particular regex [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180110
[01:57:34] (PS2) Ori.livneh: hhvm: fix include_path [puppet] - https://gerrit.wikimedia.org/r/179974 (owner: Hashar)
[01:58:29] (CR) Yuvipanda: [C: 2 V: 2] Add feature to ignore databases with a particular regex [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180110 (owner: Yuvipanda)
[02:05:11] (PS1) Yuvipanda: Move commandline arguments to config file [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180112
[02:06:21] !log aude Finished scap: Update test.wikidata (duration: 18m 38s)
[02:06:31] Logged the message, Master
[02:12:16] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
[02:12:21] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-16 02:12:20+00:00
[02:12:24] Logged the message, Master
[02:12:29] Logged the message, Master
[02:12:48] (PS2) Yuvipanda: Move commandline arguments to config file [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180112
[02:14:37] (PS3) Yuvipanda: Move commandline arguments to config file [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180112
[02:16:48] (PS4) Yuvipanda: Move commandline arguments to config file [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180112
[02:19:02] (CR) Yuvipanda: [C: 2 V: 2] Move commandline arguments to config file [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180112 (owner: Yuvipanda)
[02:23:53] !log l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 02s)
[02:23:58] !log LocalisationUpdate completed (1.25wmf12) at 2014-12-16 02:23:58+00:00
[02:24:00] Logged the message, Master
[02:24:06] Logged the message, Master
[02:31:44] (CR) Ori.livneh: [C: 2] hhvm: fix include_path [puppet] - https://gerrit.wikimedia.org/r/179974 (owner: Hashar)
[02:32:55] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[02:33:35] this alert always goes off when the spike has subsided
[02:37:45] !log upgrade db1055 trusty
[02:37:51] Logged the message, Master
[02:42:11] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[02:54:22] (PS1) Springle: upgrade db1055 to trusty and mariadb 10 [puppet] - https://gerrit.wikimedia.org/r/180117
[02:55:24] (CR) Springle: [C: 2] upgrade db1055 to trusty and mariadb 10 [puppet] - https://gerrit.wikimedia.org/r/180117 (owner: Springle)
[03:46:48] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 16 03:46:47 UTC 2014 (duration 46m 46s)
[03:46:53] Logged the message, Master
[04:38:51] (PS3) 20after4: Move the top level variables in phabricator role in a class [puppet] - https://gerrit.wikimedia.org/r/179930 (owner: Alexandros Kosiaris)
[04:40:14] (CR) 20after4: [C: 1] Move the top level variables in phabricator role in a class [puppet] - https://gerrit.wikimedia.org/r/179930 (owner: Alexandros Kosiaris)
[04:40:22] hi
[04:40:34] Just got this: Request: POST http://phabricator.wikimedia.org/auth/login/mediawiki:mediawiki/, from 10.64.0.172 via cp1044 cp1044 ([10.64.0.172]:80), Varnish XID 1597571183
[04:40:35] Forwarded for: 198.73.209.1, 10.64.0.172
[04:40:37] Error: 503, Service Unavailable at Tue, 16 Dec 2014 04:32:53 GMT
[04:41:21] (CR) 20after4: [C: 1] phabricator: strip Ubuntu 12.04 (precise) support [puppet] - https://gerrit.wikimedia.org/r/179882 (owner: Faidon Liambotis)
[04:48:13] anyone around?
[04:48:42] abartov: hi
[04:49:22] is there somewhere else I need to report this?
[04:49:35] how did you land on that page?
[04:49:38] Or is the message inviting me to report to an Ops person itself outdated, because it's all captured automagically these days?
[04:50:11] 1. I was looking at a Phab item, and clicked Subscribe.
[04:50:41] if you try again, does the problem recur?
[04:50:50] 2. I was told my session is something-something, and to re-login.
[04:50:50] 3. I clicked the "via mediawiki" button
[04:50:50] 4. got this message.
[04:51:12] yes, when just refreshing the page. Will now try to retrace the flow from the ticket.
[04:51:58] yup, happened again.
[04:51:58] the message inviting you to report the error is not outdated -- you should report it
[04:52:04] Request: POST http://phabricator.wikimedia.org/auth/login/mediawiki:mediawiki/, from 10.64.0.172 via cp1044 cp1044 ([10.64.0.172]:80), Varnish XID 1597576274
[04:52:12] Forwarded for: 198.73.209.1, 10.64.0.172
[04:52:12] Error: 503, Service Unavailable at Tue, 16 Dec 2014 04:51:39 GMT
[04:52:31] twentyafterfour: around?
[04:52:42] ori: yes
[04:52:56] does abartov's problem ring a bell?
[04:53:26] reading...
[04:55:50] ori: hmm... familiar yes, but not something I am currently able to reproduce or debug effectively..
[04:56:23] abartov: I just tried the same and it's working for me. Can you log in ok via ldap username/password?
[04:58:46] (CR) Gage: [C: 2] logstash: port udp2log rules to monolog input [puppet] - https://gerrit.wikimedia.org/r/179758 (owner: BryanDavis)
[04:59:17] twentyafterfour: seeing this in iridium:/var/log/apache2/phabricator_error.log : https://dpaste.de/yC2O/raw
[05:00:33] ori: looking into it
[05:00:41] cool, thanks
[05:00:43] I think we might have inherited an upstream bug somewhere
[05:00:54] thanks for the logs, btw. That's helpful
[05:01:01] np
[05:02:43] twentyafterfour: er, what username is that? My Wikimedia user?
[05:08:02] abartov: it would be your wikitech username (if you've got an account there)
[05:08:17] ah, right.
[05:08:19] let me try
[05:09:34] well, it lets me log in with ldap, and invites me to create an account. I don't actually want to go ahead, though, because I don't want two accounts, and do want to use my main WM identity.
[05:12:22] abartov: right
[05:13:19] abartov: can you try creating a 'private window' or 'incognito window' and see if mediawiki oauth login will work with a clean slate? ( without any of your usual cookies / session state )
[05:17:40] twentyafterfour: sure, hang on
[05:19:11] twentyafterfour: same fail!
[05:19:29] using a brand-new Firefox private session.
[05:21:25] abartov: and your phabricator username is abartov?
[05:22:34] (PS1) Springle: repool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/180119
[05:23:01] abartov: I don't see that username in phab. so what's your wiki username / phab username?
[05:23:19] (CR) Springle: [C: 2] repool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/180119 (owner: Springle)
[05:23:24] (Merged) jenkins-bot: repool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/180119 (owner: Springle)
[05:24:19] yes
[05:24:33] !log springle Synchronized wmf-config/db-eqiad.php: repool db1055, warm up (duration: 00m 07s)
[05:24:33] that is, the wikitech/ldap username
[05:24:56] my SUL usernames are "Ijon" (volunteer account) and "Asaf (WMF)" (staff account)
[05:25:21] twentyafterfour: sorry, keep forgetting to flag your name.
[05:27:06] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[05:27:26] abartov: so which is your phab account name that you can't access?
[05:27:38] the SUL
[05:28:11] twentyafterfour: I doubt it has anything to do with the specific account, though.
[05:28:36] abartov: It must be something, my account is working fine. I'm just trying to find logs for you
[05:28:47] which phabricator username is it? I can't find you
[05:28:52] twentyafterfour: given a private session failed just by my clicking on the "via mediawiki" button, without grabbing credentials from a cookie and without getting to the OAuth dialogue...
[05:29:17] abartov: that's odd it works when I do the exact same thing
[05:29:52] twentyafterfour: the fact it works for you does not necessarily mean it's to do with my account. I'm guessing it could also be a cluster issue, with us on different ones?
[05:30:31] abartov: that could be. still I'm trying to understand what's happening and I'm not getting anywhere
[05:30:39] abartov: it doesn't seem to be an issue with phabricator at all
[05:30:41] twentyafterfour: there's an actual error message here. Isn't that enough to find it in the logs?
[05:30:58] abartov: I don't have any kind of access
[05:31:09] all I can touch is phabricator
[05:31:33] twentyafterfour: the precise failing scenario is:
[05:32:00] twentyafterfour: 1. clicking the "via mediawiki" button sends the browser to https://phabricator.wikimedia.org/auth/start/?next=%2FT37489
[05:32:28] twentyafterfour: 2. which redirects to https://phabricator.wikimedia.org/auth/login/mediawiki:mediawiki/, which yields this error 503.
[05:33:05] twentyafterfour: presumably, whatever is being contacted to issue the oauth request is failing.
[05:33:18] twentyafterfour: so who does have access?
[05:45:16] abartov: members of ops team. I'm still trying to tell if the problem is strictly with phab or something else, you never told me your phab username (not your SUL name but the name you used in phabricator)
[05:46:36] twentyafterfour: I don't remember picking one. I did log in with my SUL credentials before. But if it isn't 'abartov', it should be 'Ijon'.
[05:47:06] abartov: ah ha! I do see Ijon
[05:51:21] abartov, ori: I can't find anything related to phabricator or that account which would cause this to happen... ori: any way to tell if those log messages correspond with abartov's login attempts? I can't tell for sure ...
[05:54:36] twentyafterfour: thanks for looking into it!
[05:56:18] ori: I was able to reproduce the errors you saw in the logs on iridium but I don't think they are related to the problem with abartov's login ... what I don't understand is why it's getting a 503 error from varnish...usually if phab is directly throwing an exception it will report that to the client rather than throwing up the varnish error page.
[06:05:03] PROBLEM - Mediawiki Apple Dictionary Bridge on terbium is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string Wikimedia_Foundation not found on https://search.wikimedia.org:443https://search.wikimedia.org/?lang=en&site=wikipedia&search=Wikimedia_Foundation&limit=1 - 3389 bytes in 0.039 second response time
[06:29:26] (PS1) Springle: depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/180120
[06:29:58] (CR) Springle: [C: 2] depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/180120 (owner: Springle)
[06:30:03] (Merged) jenkins-bot: depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/180120 (owner: Springle)
[06:31:40] !log springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 06s)
[06:33:16] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:15] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:17] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:47] !log sync-common on mw1043 after sync-file fail
[06:34:51] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:40] useless morebots
[06:36:11] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:33] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:23] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:49] !log <+logmsgbot> !log springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 06s)
[06:38:55] Logged the message, Master
[06:38:59] !log 06:32 < springle> !log sync-common on mw1043 after sync-file fail
[06:39:01] Logged the message, Master
[06:43:36] (PS1) KartikMistry: Beta: Fixed comment [puppet] - https://gerrit.wikimedia.org/r/180121
[06:48:28] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:48:31] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:05:01] !log upgrade db1073 trusty
[07:05:06] Logged the message, Master
[07:06:04] <_joe_> morning
[07:06:07] <_joe_> hi sean
[07:06:17] hey _joe_
[07:06:27] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[07:06:44] oh _joe_ is here.. must be time for my afternoon coffee :)
[07:06:58] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[07:07:02] <_joe_> eheh
[07:07:02] _joe_ o'clock
[07:07:07] <_joe_> I just had 2
[07:07:11] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:07:21] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:07:26] <_joe_> one large moka cup and one espresso at a bar
[07:07:57] <_joe_> I got enough caffeine to wake up an elephant, probably
[07:08:07] espresso at bars is quite a popular italian thing, right
[07:08:22] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:09:07] <_joe_> springle: our "bars" usually have little or no sitting space, you can get espresso, cappuccino, alcoholics and some quick food
[07:09:59] <_joe_> that's one of the reasons it's hard for me to go work from a cafe'
[07:10:40] ah gotcha
[07:11:01] yeah, mostly coffee -> cafe, alcohol -> bar, here
[07:11:05] at least in my area
[07:11:26] <_joe_> yeah we don't really have that distinction
[07:11:35] and cafe -> gunmen this week
[07:12:06] <_joe_> in the north, where the winter is colder, people have "caffe' corretto", which is espresso with grappa or another liquor poured in
[07:12:12] <_joe_> yeah I've seen :/
[07:13:03] <_joe_> (I won't note how some christian madman has shot and killed 6 people in the USA yesterday, but Italian newspapers only care about OMFG THE MUSLIM TERROR)
[07:13:28] yeah similar here, though there are voices of reason around
[07:14:03] <_joe_> but well, I can imagine your case has more relevance locally
[07:14:19] <_joe_> also, the terrifying part is that all it takes is one madman with a gun
[07:14:34] i think it surprised people, mostly. we're so far from all that stuff
[07:14:43] <_joe_> yes
[07:14:52] <_joe_> thanks dick cheney
[07:15:09] yet our bombs are participating
[07:15:21] <_joe_> in 2001, maybe .1% of the middle east saw Europe and Australia as enemies
[07:15:26] <_joe_> now, maybe 50%
[07:15:38] <_joe_> and for a good reason too.
[07:15:48] <_joe_> nah
[07:15:54] au government has just pulled more funding from foreign aid to put into defence spending :|
[07:15:58] <_joe_> I vote to call that yuvipanda-wm
[07:16:06] <_joe_> springle: sigh
[07:23:00] (PS2) Giuseppe Lavagetto: mediawiki: enhancements to hhvm_cleanup_cache [puppet] - https://gerrit.wikimedia.org/r/179102
[07:23:10] (PS8) Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - https://gerrit.wikimedia.org/r/179765
[07:24:28] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: enhancements to hhvm_cleanup_cache [puppet] - https://gerrit.wikimedia.org/r/179102 (owner: Giuseppe Lavagetto)
[07:27:10] <_joe_> did I say I hate git submodules?
[07:27:23] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[07:28:30] _joe_: it bears repeating
[07:28:52] <_joe_> now to fix 3 lines I need two commits
[07:28:57] <_joe_> and 2 reviews
[07:28:59] <_joe_> MEH
[07:29:51] (PS1) Giuseppe Lavagetto: cdh: fix templates causing warnings [puppet/cdh] - https://gerrit.wikimedia.org/r/180124
[07:41:32] (PS1) KartikMistry: WIP: Add cxserver role for production [puppet] - https://gerrit.wikimedia.org/r/180125
[07:42:51] (PS2) Giuseppe Lavagetto: Add some experimental settings to one server in each pool [puppet] - https://gerrit.wikimedia.org/r/178478
[07:45:06] (PS2) KartikMistry: WIP: Add cxserver role for production [puppet] - https://gerrit.wikimedia.org/r/180125
[07:47:05] (CR) Giuseppe Lavagetto: [C: 2] Add some experimental settings to one server in each pool [puppet] - https://gerrit.wikimedia.org/r/178478 (owner: Giuseppe Lavagetto)
[07:48:59] (Abandoned) Giuseppe Lavagetto: hhvm: define more jit configurations [puppet] - https://gerrit.wikimedia.org/r/175412 (owner: Giuseppe Lavagetto)
[07:55:28] (PS1) Giuseppe Lavagetto: deployment: make the keyholder key path configurable [puppet] - https://gerrit.wikimedia.org/r/180126
[07:57:08] (CR) Giuseppe Lavagetto: "Please see how to do this next time here:" [puppet] - https://gerrit.wikimedia.org/r/179875 (owner: Hashar)
[08:02:23] * yuvipanda-wm notifies _joe_ of all email to root@
[08:02:34] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.003 second response time
[08:03:08] <_joe_> lol
[08:03:36] Yeah I bet that happens :)
[08:03:42] Err
[08:03:48] I bet that's useful?
[08:03:55] <_joe_> sure
[08:03:59] * yuvipanda-wm just woke up, should probably head back to sleep
[08:04:29] <_joe_> yuvipanda-wm: me too
[08:05:06] <_joe_> yuvipanda-wm: garage punk usually helps waking me up
[08:05:33] Aah
[08:05:39] It is really hard to wake me up
[08:05:51] A skill acquired from years of sleeping in college
[08:06:40] (We had mandatory attendance)
[08:07:10] Also what is a garage punk?
[08:07:31] I presumed it was some kid playing loud things from a garage
[08:07:38] <_joe_> it's a particular current of punk rock
[08:07:40] But could also be a music genre
[08:07:42] Aaah
[08:07:46] <_joe_> the most raw and rock'n rolly
[08:07:47] Heh
[08:07:56] I see
[08:09:05] <_joe_> https://en.wikipedia.org/wiki/Garage_punk
[08:09:12] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.020 second response time
[08:09:49] greetings
[08:09:53] <_joe_> ciao godog
[08:10:02] ciao _joe_
[08:10:09] I'll play the song now anyway
[08:10:25] <_joe_> ehehe
[08:10:30] <_joe_> which version?
[08:10:54] https://www.youtube.com/watch?v=DpkDdLZGg30 this
[08:11:23] <_joe_> I love the Willy DeVille mariachi version :P
[08:11:39] the right hand guitar played left handed always looks funny
[08:12:13] I'll give it a try
[08:12:26] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: puppet fail
[08:12:33] <_joe_> mh
[08:13:54] (CR) Giuseppe Lavagetto: [C: 2] deployment: make the keyholder key path configurable [puppet] - https://gerrit.wikimedia.org/r/180126 (owner: Giuseppe Lavagetto)
[08:16:45] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: Puppet has 2 failures
[08:17:41] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[08:17:52] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:17:59] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 2 failures
[08:18:20] <_joe_> wha?
[08:18:31] <_joe_> esams link troubles?
[08:19:13] <_joe_> Could not evaluate: Connection timed out on the hosts
[08:19:33] <_joe_> yeah we had a link hiccup clearly
[08:23:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[08:27:01] (PS1) Giuseppe Lavagetto: hhvm: fix yaml quotation [puppet] - https://gerrit.wikimedia.org/r/180129
[08:29:32] (CR) Giuseppe Lavagetto: [C: 2] hhvm: fix yaml quotation [puppet] - https://gerrit.wikimedia.org/r/180129 (owner: Giuseppe Lavagetto)
[08:29:57] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:30:16] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:30:22] (PS1) KartikMistry: WIP: Add apertium role for production [puppet] - https://gerrit.wikimedia.org/r/180130
[08:32:17] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:33:57] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:42:37] !log upgrading trusty/codfw to diamond 3.5-3
[08:42:43] Logged the message, Master
[09:03:26] godog: hmm, do you know how hard / easy it is to set up a per-project reprepo (or similar debian package local hosting solution that allows us separate trusty and precise packages)?
[09:03:38] tools has custom packages, a few of them, and we also have a mix of distros now
[09:08:38] is that "labsdebrepo" ?
[09:09:45] yuvipanda-wm: manifests/misc/labsdebrepo.pp:class misc::labsdebrepo { ?
[09:10:04] mutante: yes, but that doesn’t really support trusty/precise, AFAIK
[09:10:09] of course I could be totally wrong as well :)
[09:10:24] and you don’t need anything extra other than muddling with the packages as well
[09:10:37] hmm, does it need to support a specific distro?
[09:10:51] it just sets up sources.list , dpkg-scanpackages etc
[09:11:12] i dont see anything specific to a distro name
[09:16:39] yuvipanda-wm: good question, you might be able to use the reprepro module as-is and get it setup on the respective project puppetmaster perhaps with some puppet glue, what's the use case?
[09:16:52] the trusty/precise part would be apt::pin
[09:16:55] ?
[09:17:24] godog: toollabs has a few custom packages, and some have different versions for precise/trusty
[09:19:07] mutante: not sure what you mean?
[09:19:18] yuvipanda-wm: I see, where are the packages stored now?
[09:19:41] godog: with labsdebrepo, which is just on NFS with an exec for dpkg-scasnpackages
[09:20:28] ah ok, well yeah since the instance storage is ephemeral anyways
[09:21:10] godog: yeah
[09:22:56] you want to use different packages per release, right
[09:23:01] that would be APT pinning
[09:23:08] and we have a puppet define for it
[09:24:08] oh ok, so there's a single directory with dpkg-scanpackages and not per-distro directories?
[09:24:58] yea, /data/project/repo
[09:26:47] I see, yeah makes sense if packages don't diverge much I think
[09:27:49] godog: yeah, one directory for everything
[09:28:00] godog: oh wait, I could just have different directories and branchi n puppet
[09:28:53] and change the sources.list accordingly, indeed, are there many different packages?
[09:29:34] !log upgrade diamond in trusty/esams
[09:29:41] Logged the message, Master
[09:29:56] not yet :) but we have at least one where we ‘fixed’ it by just not installing it on trusty
[09:30:52] haha wat
[09:31:02] nobody’s complained yet...
[09:31:53] !log upgrade diamond in trusty/eqiad
[09:31:55] Logged the message, Master
[09:32:12] paravoid: did you dig up anything interesting looking at the virt servers yesterday?
[09:52:10] andrewbogott: have one more small lint fix for virt1000, just adding {}
[09:54:00] PROBLEM - DPKG on db1027 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:54:49] ^ iU diamond
[09:55:53] and also python-diamong
[09:55:55] diamond
[09:56:03] PROBLEM - DPKG on db1049 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:35] godog: ^ looks related to diamond upgrade
[09:56:54] RECOVERY - DPKG on db1027 is OK: All packages OK
[09:57:01] ah :) nvm
[09:59:17] RECOVERY - DPKG on db1049 is OK: All packages OK
[10:00:37] mutante: hehe salt and puppet racing
[10:00:46] (now imagine that as a gif)
[10:00:47] yep :)
[10:00:52] haha
[10:02:19] puppet forge has a puppet module to "manage a salt installation"
[10:25:05] (PS9) Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - https://gerrit.wikimedia.org/r/179765
[10:34:41] (PS12) Adrian Lang: [WIP] contint: Apply contint::qunit_localhost to labs slaves [puppet] - https://gerrit.wikimedia.org/r/168631 (https://bugzilla.wikimedia.org/72063) (owner: Krinkle)
[10:42:25] (CR) Daniel Kinzler: "This should be enabled, so we start accurately tracking the content model of every page (and every revision). However, changing a page's c" [mediawiki-config] - https://gerrit.wikimedia.org/r/170129 (https://bugzilla.wikimedia.org/49193) (owner: Spage)
[10:44:47] (PS1) Gilles: Disable thumbnail chaining [mediawiki-config] - https://gerrit.wikimedia.org/r/180138
[10:52:15] (CR) Hashar: "Thanks Alexandros. I address all issues in next patchset. Will then rebase it on top of https://gerrit.wikimedia.org/r/179930 which drops" (7 comments) [puppet] - https://gerrit.wikimedia.org/r/178810 (owner: Hashar)
[10:52:32] (PS7) Hashar: Basic rspec setup [puppet] - https://gerrit.wikimedia.org/r/178810
[11:25:35] (CR) Alexandros Kosiaris: [C: 2] "Thanks for the reviews guys!" [puppet] - https://gerrit.wikimedia.org/r/179930 (owner: Alexandros Kosiaris)
[11:30:11] (CR) Dzahn: "could the hardcoded values be moved out of the script and into puppet role? like for example these:" [puppet] - https://gerrit.wikimedia.org/r/179765 (owner: Andrew Bogott)
[11:32:28] (CR) Dzahn: [C: 2] openstack-database-server: enclose variables [puppet] - https://gerrit.wikimedia.org/r/179452 (owner: Dzahn)
[11:33:00] (CR) Alexandros Kosiaris: [C: 2] Beta: Fixed comment [puppet] - https://gerrit.wikimedia.org/r/180121 (owner: KartikMistry)
[11:55:39] (PS8) Hashar: Basic rspec setup [puppet] - https://gerrit.wikimedia.org/r/178810
[11:56:58] (CR) Hashar: "Rebased. I have droped /spec/spec_helper.rb since some new spec have been added to modules/wmflibs and that creates required fixtures pat" [puppet] - https://gerrit.wikimedia.org/r/178810 (owner: Hashar)
[11:58:03] (PS3) KartikMistry: WIP: Add cxserver role for production [puppet] - https://gerrit.wikimedia.org/r/180125
[12:23:19] (PS1) Yuvipanda: Update ignore regex [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180151
[12:25:58] (CR) Yuvipanda: [C: 2 V: 2] Update ignore regex [software/labsdb-auditor] - https://gerrit.wikimedia.org/r/180151 (owner: Yuvipanda)
[12:27:23] (PS4) KartikMistry: Add cxserver role for production [puppet] - https://gerrit.wikimedia.org/r/180125
[12:31:42] (PS5) KartikMistry: Add cxserver role for production [puppet] - https://gerrit.wikimedia.org/r/180125
[12:41:35] (PS2) KartikMistry: Add apertium role for production [puppet] - https://gerrit.wikimedia.org/r/180130
[12:48:20] (PS2) KartikMistry: Beta: Fix spacing and indentation [mediawiki-config] - https://gerrit.wikimedia.org/r/178772
[13:03:52] (CR) Andrew Bogott: "thanks!" [puppet] - https://gerrit.wikimedia.org/r/179452 (owner: Dzahn)
[13:04:56] (PS1) Giuseppe Lavagetto: mediawiki: furl support for passing headers [puppet] - https://gerrit.wikimedia.org/r/180155
[13:07:38] (PS1) Hashar: Refactor rake entry point for specs [puppet] - https://gerrit.wikimedia.org/r/180162
[13:14:51] PROBLEM - mailman list info on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:15:06] PROBLEM - puppet last run on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:15:07] PROBLEM - HTTP on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:15:29] PROBLEM - salt-minion processes on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:15:51] PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6
[13:17:00] PROBLEM - mailman archives on sodium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:19:13] PROBLEM - dhclient process on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:19:52] * _joe_ lunch [13:20:42] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 2 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [13:20:51] !log started lighttpd on sodium [13:21:01] Logged the message, Master [13:21:12] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 16 processes with UID = 38 (list), regex args /mailman/bin/qrunner [13:21:15] RECOVERY - mailman list info on sodium is OK: HTTP OK: HTTP/1.1 200 OK - 15129 bytes in 0.256 second response time [13:21:15] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:21:33] RECOVERY - HTTP on sodium is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 174 bytes in 0.007 second response time [13:21:50] RECOVERY - salt-minion processes on sodium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:22:00] RECOVERY - HTTPS on sodium is OK: SSL_CERT OK - X.509 certificate for lists.wikimedia.org from RapidSSL CA valid until Jan 31 02:58:36 2016 GMT (expires in 411 days) [13:22:10] RECOVERY - dhclient process on sodium is OK: PROCS OK: 0 processes with command name dhclient [13:23:30] RECOVERY - mailman archives on sodium is OK: HTTP OK: HTTP/1.1 200 OK - 56003 bytes in 0.050 second response time [13:25:23] mutante: what was that? [13:28:23] paravoid: I'm now logged into busybox on virt1011 during the install, as you suggested. And indeed, wget is either failing or extremely slow (hard to tell which) [13:28:54] (oops, and now it's time for a call) [13:28:57] I'll have a look right after our call [13:29:12] thanks [13:29:32] virt1010 is stalled at the 'no swap partition' message, so that's worth investigating as well. 
[13:38:39] PROBLEM - nutcracker port on mw1205 is CRITICAL: Cannot assign requested address [13:41:49] RECOVERY - nutcracker port on mw1205 is OK: TCP OK - 0.000 second response time on port 11212 [13:43:38] (03PS1) 10RobH: setting mgmt ip for tmh2001-2002 [dns] - 10https://gerrit.wikimedia.org/r/180170 [13:44:10] (03CR) 10RobH: [C: 032] setting mgmt ip for tmh2001-2002 [dns] - 10https://gerrit.wikimedia.org/r/180170 (owner: 10RobH) [13:46:52] (03CR) 10Hashar: Refactor rake entry point for specs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180162 (owner: 10Hashar) [14:02:37] (03PS8) 10Aklapper: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:09:05] paravoid: do I need to log out of busybox for you to connect? [14:09:18] no [14:09:24] just logout from the VSP [14:09:48] ok, done [14:09:50] (03CR) 10Aklapper: "Patchset 8: Output of some SQL queries included not only the integer but in some cases also part of the SELECT parameter so we failed usin" [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:10:12] virt1011? [14:10:19] yep [14:10:23] I'm logged in to the VSP but I see nothing [14:10:31] That's because it's updating so slowly... [14:10:43] if you wait 15 minutes you'll see a little sliver of progress bar draw :( [14:10:44] didn't you say you had a shell? [14:10:48] yeah [14:10:52] (03PS9) 10Aklapper: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:10:54] oh? that was via the network console? [14:11:11] paravoid: ssh with the new_install key [14:11:16] right [14:11:19] it works! Once the instance is past the partitioning stage I guess. [14:11:46] it doesn't even ping for me now [14:12:27] I can still connect. 
# ssh -i /root/.ssh/new_install root@virt1011.eqiad.wmnet <- works [14:12:33] oh, sorry -- [14:12:35] you need to do this from iron [14:12:41] because of the labs subnet, not reachable from palladium [14:12:49] oh dammit [14:12:51] (03CR) 10Aklapper: "Patchset 9: SQL syntax error only in the script when using backticks in the Median age of open task queries. Works as expected also withou" [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:12:54] thanks, I didn't think of that [14:18:52] (03CR) 10Dzahn: [C: 032] "not currently used. fixed anyways" [puppet] - 10https://gerrit.wikimedia.org/r/179188 (owner: 10Dzahn) [14:19:52] (03PS2) 10Hoo man: Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 [14:20:06] ^ easy code review is easy [14:23:34] ok, something's misconfigured with ipv6 [14:26:53] it's broken for existing hosts too [14:30:38] andrewbogott: fixed [14:30:50] great! [14:30:59] Want to sort out the swap partition thing while you're at it? :) [14:33:42] (03PS10) 10Aklapper: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:34:54] (03CR) 10Aklapper: "Patchset 10 also adds "Median age in days of open tasks by priority". 
Fixed the float vs integer issue by just cutting the .0000 or .5000 " [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [14:50:16] (03PS2) 10BBlack: Update outbound X-CS behavior in light of unified [puppet] - 10https://gerrit.wikimedia.org/r/179571 (owner: 10Dr0ptp4kt) [14:51:37] (03CR) 10BBlack: [C: 032] Update outbound X-CS behavior in light of unified [puppet] - 10https://gerrit.wikimedia.org/r/179571 (owner: 10Dr0ptp4kt) [15:00:26] <_joe_> !log depooling part of the apache appserver pool to assess current load [15:00:32] Logged the message, Master [15:07:59] (03PS11) 10Dzahn: phabricator: community metrics stats mail [puppet] - 10https://gerrit.wikimedia.org/r/177792 [15:08:30] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Concepts are fine, inline comments" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/180130 (owner: 10KartikMistry) [15:19:24] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/558/change/177792/html/iridium.eqiad.wmnet.html" [puppet] - 10https://gerrit.wikimedia.org/r/177792 (owner: 10Dzahn) [15:22:56] RECOVERY - Mediawiki Apple Dictionary Bridge on terbium is OK: HTTP OK: HTTP/1.1 200 OK - 748 bytes in 0.229 second response time [15:23:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Concepts look OK, comments inline" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/180125 (owner: 10KartikMistry) [15:25:36] Wow, the Apple Dictionary thing is a separate thing? :O [15:27:43] sjoerddebruin: yes :p [15:28:47] And Spotlight is also given a special treatment... The birth and death dates are in the title... [15:30:44] mutante: That one line change is almost a month old now: https://gerrit.wikimedia.org/r/174951 [15:31:26] * hoo|lecture hates stuck changes [15:32:13] Need a code review, hoo|lecture? 
[15:32:29] (03CR) 10Mark Bergsma: [C: 031] Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 (owner: 10Hoo man) [15:32:54] sjoerddebruin: Always :D [15:33:07] (03CR) 10Sjoerddebruin: [C: 031] Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 (owner: 10Hoo man) [15:33:18] (And yes, I've checked) [15:33:18] Not like an important change, but still having those stuck only leads to rebase problems etc. [15:33:48] (03CR) 10Gage: [C: 032] cdh: fix templates causing warnings [puppet/cdh] - 10https://gerrit.wikimedia.org/r/180124 (owner: 10Giuseppe Lavagetto) [15:34:09] (03PS3) 10Giuseppe Lavagetto: Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 (owner: 10Hoo man) [15:34:27] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 (owner: 10Hoo man) [15:34:39] Jeeh, rebase... :/ [15:34:52] <_joe_> sjoerddebruin: ? [15:35:27] E-mails about that. :/ [15:35:55] <_joe_> hoo|lecture: merged [15:36:12] hm did someone just puppet-merge the change i just +2'd? [15:36:25] _joe_? [15:36:27] <_joe_> jgage: I don't think so [15:36:30] _joe_: Thanks :) [15:36:33] weird. it's not appearing. [15:36:36] doesn't need rebase [15:36:40] <_joe_> jgage: it's in a submodule [15:36:47] ach [15:36:48] ok [15:36:49] <_joe_> I thought the submodule was your doing [15:36:50] <_joe_> :) [15:36:55] andrewbogott: so everything works, right? 
[15:37:08] Only have 3 changes on /puppet left [15:37:16] (03PS3) 10Faidon Liambotis: Add README, RSpecs and tests for squid3 module [puppet] - 10https://gerrit.wikimedia.org/r/179493 (owner: 10Alexandros Kosiaris) [15:37:16] and one is stalled because I'm too lazy to fix it [15:37:28] (03CR) 10Faidon Liambotis: [C: 032] Add README, RSpecs and tests for squid3 module [puppet] - 10https://gerrit.wikimedia.org/r/179493 (owner: 10Alexandros Kosiaris) [15:37:51] paravoid: I'm going to restart the install on 1011 now -- we'll see if partitioning still complains [15:44:26] (03PS6) 10KartikMistry: Add cxserver role for production [puppet] - 10https://gerrit.wikimedia.org/r/180125 [15:45:03] (03PS1) 10Dzahn: phab metrics: fix scope for @mysql_host variable [puppet] - 10https://gerrit.wikimedia.org/r/180193 [15:45:03] paravoid: it's still complaining about a lack of swap. But it got there much faster this time! [15:45:20] I didn't attempt to address the swap issue at all [15:46:13] paravoid: I know -- just thought you might be interested :) I'll tinker with partman more tomorrow [15:46:24] okay :) [15:46:27] Thank you for fixing the network issue! Will be much less painful to debug now. [15:46:28] there's a way to silence this [15:46:40] but I'm guessing you don't want to [15:46:46] because you actually want to have swap :) [15:46:56] Yeah, it's probably a real error, somehow. [15:46:58] (03CR) 10Hashar: [C: 04-1] "This patch has been cherry picked on the integration puppetmaster. 
It causes instances to no more be able to boot:" [puppet] - 10https://gerrit.wikimedia.org/r/173512 (https://bugzilla.wikimedia.org/72063) (owner: 10Krinkle) [15:47:16] (03CR) 10Hashar: "https://phabricator.wikimedia.org/T76250" [puppet] - 10https://gerrit.wikimedia.org/r/173512 (https://bugzilla.wikimedia.org/72063) (owner: 10Krinkle) [15:47:35] partman can do very weird things depending on the values you choose for the priorities field of the various partitions [15:48:18] (03CR) 10Dzahn: [C: 032] phab metrics: fix scope for @mysql_host variable [puppet] - 10https://gerrit.wikimedia.org/r/180193 (owner: 10Dzahn) [15:49:33] (03CR) 10Dzahn: "declare sql_host='m3-master.eqiad.wmnet'" [puppet] - 10https://gerrit.wikimedia.org/r/180193 (owner: 10Dzahn) [15:51:28] I'll SWAT today. [15:51:42] aude: Ping for SWAT in about 8 minutes. [15:52:18] here [15:57:31] phab: Number of open and stalled tasks in total: 17463 [15:58:29] <_joe_> !log load test done, the apache appserver pool can work flawlessly with 110 servers in the pool [15:58:33] Logged the message, Master [16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T1600). 
[16:00:16] * anomie begins SWAT [16:00:20] anomie: have fun [16:00:21] ok [16:00:24] anomie: Your config change first [16:00:30] :) [16:00:35] (03PS2) 10Anomie: Enable $wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179168 [16:00:44] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179168 (owner: 10Anomie) [16:01:29] (03Merged) 10jenkins-bot: Enable $wgExtractsExtendOpenSearchXml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/179168 (owner: 10Anomie) [16:01:47] !log anomie Synchronized wmf-config: SWAT: Enable $wgExtractsExtendOpenSearchXml [[gerrit:179168]] (duration: 00m 07s) [16:01:48] anomie: ^ Test please [16:01:52] Logged the message, Master [16:02:04] anomie: Seems to work [16:02:08] aude: You're up [16:02:15] k [16:03:54] (03PS3) 10KartikMistry: Add apertium role for production [puppet] - 10https://gerrit.wikimedia.org/r/180130 [16:04:31] (03CR) 10KartikMistry: Add apertium role for production (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/180130 (owner: 10KartikMistry) [16:05:35] (03PS1) 10Gage: merge submodule change: cdh: fix templates causing warnings [puppet] - 10https://gerrit.wikimedia.org/r/180197 [16:06:07] (03CR) 10Gage: [C: 032] merge submodule change: cdh: fix templates causing warnings [puppet] - 10https://gerrit.wikimedia.org/r/180197 (owner: 10Gage) [16:07:49] submodules + gerrit = pain [16:08:45] <^d> submodules = pain [16:08:52] <^d> gerrit = pain [16:09:01] <^d> Therefore: submodules + gerrit = 2pain [16:09:53] (03PS1) 10KartikMistry: Beta: Drop log_dir for Apertium [puppet] - 10https://gerrit.wikimedia.org/r/180199 [16:09:55] i think there's a missing synergy coefficient [16:10:15] <^d> Hmmm. Quite possibly. 
[16:11:40] 3pain at least [16:11:41] i find gerrit bearable, though the number of changes that we self-+2 is kinda suboptimal [16:12:22] (03CR) 10Alexandros Kosiaris: [C: 032] Add apertium role for production [puppet] - 10https://gerrit.wikimedia.org/r/180130 (owner: 10KartikMistry) [16:12:28] that seems more on us than on gerrit [16:12:29] * godog runs [16:12:34] yup [16:13:25] (03PS2) 10Alexandros Kosiaris: Beta: Drop log_dir for Apertium [puppet] - 10https://gerrit.wikimedia.org/r/180199 (owner: 10KartikMistry) [16:13:41] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/180199 (owner: 10KartikMistry) [16:14:09] FWIW I don't mind gerrit, I don't think there's anything else comparable and open source [16:14:49] a few seemingly minor improvements would help a lot [16:15:02] aude: Why does wikidata have a "wmf/1.25wmf12c" branch, that's currently checked out on tin? [16:15:22] I don't think it is [16:15:24] it is a replacement for wmf12 [16:15:38] I don't think all changes are appropriate for code review [16:15:39] which we decided has things we think are not ready yet [16:15:58] and the fact that we can't choose per changeset is what forces us to go the self+2 route [16:16:04] jgage: I'm merging the cdh submodule merge as well on palladium [16:16:06] wmf12b didn't work out how i liked [16:16:18] we can probably make a new wmf13 branch [16:16:22] aude: The patch you put up for SWAT is against 1.25wmf12, not 1.25wmf12c, so I can't exactly deploy it. [16:16:27] anomie: is the SWAT window still open for new patches? [16:16:39] for core, wmf12 is ok [16:16:49] Nikerabbit: Sure [16:16:55] true, and imo +2 is not the worst thing in the world. we still have a lot of visibility on changes [16:17:12] aude: So I should revert whatever changes are currently deployed on your wmf12c branch? 
[16:17:21] * aude can take a look [16:17:35] my issues with gerrit are mostly related to searching and viewing [16:18:04] we want wmf12 core + wmf12c wikidata [16:18:45] i find gerrit cumbersome when I want a quick look at activity [16:19:13] yeah for me is hard to follow through general/code comments between PSes and get an overview, reviewboard does a better job at that [16:20:54] anomie: ok, i see (my fault) [16:21:10] i think I may have reached some kind of a wits end with ubuntu trusty and preseed [16:21:13] i think it's a problem that the way to get code review is having to ask on IRC [16:21:18] aude: So is 9d03a1df13ede425673da9ce57c440b59e867aa6 the right commit of Wikidata to deploy? [16:21:37] yes [16:21:42] looks good on tin [16:21:51] we build this whole infra to be able to work decentralized, around the world in different timezones.. then at the very end we add a need to realtime [16:22:11] * aude goes to make wmf13 branch for tomorrow [16:22:33] !log anomie Synchronized php-1.25wmf12/extensions/Wikidata/: SWAT: extensions/Wikidata to 9d03a1df13ede425673da9ce57c440b59e867aa6 [[gerrit:180184]] (duration: 00m 21s) [16:22:34] aude: ^ Test please [16:22:34] having to beg on IRC doesn't scale [16:22:37] ok [16:22:38] mutante: Does your team not have somebody who is responsible for making sure code review happens? [16:22:39] Logged the message, Master [16:22:50] I like IRC actually [16:23:06] anomie: so the thing I want is https://gerrit.wikimedia.org/r/#/c/180198/ ... am I expected to create a backport for it? [16:23:07] Our team has a tech lead (usually me except not this week) who does most of the CR and pokes others to do CR as appropriate [16:23:10] _joe_, bd808: scap error: "mw1249 returned [255]: Error reading response length from authentication socket." and "Permission denied (publickey)." 
[16:23:11] looks good [16:23:13] thanks [16:23:33] And is also aware of branch cuts and stuff like that (which is less relevant for ops I suppose, but the deployment train rules our lives) [16:23:34] <_joe_> anomie: mh, gonna look into it in a second [16:23:35] i understand what you're saying, and I agree that its weird to have half the process in IRC and half in IRC, but IRC is quick [16:23:53] RoanKattouw: i don't think so. i would like to be able to add people in the web ui and get replies [16:24:04] anomie: *nod* we've been seeing that occasionally. I think it's a bug with the shared ssh-agent getting overloaded. I was hoping ori would look into it. [16:24:06] To be fair I don't scale either [16:24:13] async communication can be really cumbersome and also sterile [16:24:19] and they don't have to be _right now_ [16:24:30] if i ping somebody on IRC i'm asking them to drop what they are doing [16:24:34] Yeah [16:24:36] <_joe_> bd808: oh you are using the shared ssh key already? cool [16:24:58] Nikerabbit: Yes, you'd normally backport it to the appropriate deployment branches in your extension, merge, and the create commits like https://gerrit.wikimedia.org/r/#/c/180187/1 to update core. For today, if you merge the backports for the extension I'll do the core bit. [16:25:17] _joe_: Yeah. It works *most* of the time [16:25:26] anomie: ok [16:25:39] Hi all! [16:25:47] I’m looking at how you guys deal with deployment [16:25:53] in https://git.wikimedia.org/tree/operations%2Fmediawiki-config.git [16:26:35] _joe_: Except it breaks the nightly l10nupdate process (l10nupdate user doesn't have access to the socket) and it occasionally throws that "Error reading response length from authentication socket." 
message [16:27:05] If I get it correctly this repo deals with all submodules so you clone that one and then something (couldn’t find that yet) gets all dependencies and guesses based on the hostname in the vhost which config to use [16:27:05] <_joe_> anomie: should I run sync-common on the host? [16:27:10] hi bd808 [16:27:24] <_joe_> bd808: do you remember what user is it connecting as? [16:27:36] * renoirb oh, found https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment [16:27:38] _joe_: Probably. Or at least sync php-1.25wmf12/extensions/Wikidata/ [16:28:00] RoanKattouw: i think it's more about good mail filters. so that people still get the gerrit mail when it's because they have been added as reviewer but not ALL others [16:28:12] _joe_ need to troubleshoot mw1192 cpu error...disabling in pybal [16:28:14] _joe_: mwdeploy [16:28:25] mutante: Yeah that would be nice [16:28:29] RoanKattouw: or.. people looking at web ui incoming queue on a regular basis .. [16:28:34] I haven't looked at my Gerrit mail in years [16:28:36] whichever is preferred [16:28:41] But yeah people should totally be looking at their incoming queue [16:28:45] <_joe_> cmjohnson: oh yeah sorry [16:28:46] To be honest, I don't [16:28:56] But instead I look at the project dashboard at least five times a day [16:28:56] <_joe_> I reverted my changes in git [16:29:04] <_joe_> and got that back with me [16:29:21] Gerrit has very configurable dashboards, it's pretty nice once you figure out how to set it up [16:29:22] https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/VisualEditor,dashboards/custom:custom [16:30:02] <^d> RoanKattouw: You know better than to use nice and gerrit in the same sentence. [16:30:03] <^d> tsk tsk [16:30:12] renoirb: We make branches in git that have all of our extensions as submodules. The mediawiki-config repo is what we call "hetdeploy" or multiversion and it manages configuration for our wiki farm. 
[16:30:25] ok [16:30:27] got it [16:30:37] i’m reading tools/scap at the moment. [16:31:18] !log removing mw1192 from pybal and disabling puppet for hardware troubleshooting [16:31:24] Logged the message, Master [16:31:42] renoirb: you won't want or need scap I'm pretty sure. It works (mostly) for us but is not really a great deployment system [16:32:00] ok bd808, i won't use it. It feels like it's only for syncing files around. [16:32:23] I'd like to be able to have a base git repo from which it pulls the right MediaWiki version I want [16:32:46] instead of having to fork and append like I do here https://github.com/webplatform/mediawiki-core/tree/1.24wmf16-wpd [16:33:50] ffs, someone has added new phpcs checks? [16:34:02] So I guess I should follow a bit what you guys do in https://git.wikimedia.org/summary/operations%2Fmediawiki-config.git [16:34:06] anomie: do you mind if I override jenkins at https://gerrit.wikimedia.org/r/#/c/180201/ ? [16:34:26] Nikerabbit: I don't mind. [16:34:28] renoirb: Something like this might be what you are looking for -- https://github.com/wikimedia/mediawiki-tools-release/tree/master/make-wmf-branch [16:35:57] anomie: oki, it is in 1.25wmf12 [16:37:14] (03PS1) 10Manybubbles: Update wikimedia extra plugin [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/180204 [16:37:40] (03CR) 10Manybubbles: [C: 04-1] "Lets deploy to beta." [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/180204 (owner: 10Manybubbles) [16:38:17] is each of the extensions in https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-wmf-branch/default.conf (e.g. Parsoid) pulled from Gerrit? (e.g. 'ssh://gerrit.wikimedia.org:29418/mediawiki' . 'Parsoid' . '.git') [16:40:18] renoirb: /mediawiki/extensions/Parsoid.git but yes [16:40:33] sorry, I wrote too quickly. Basically "https://gerrit.wikimedia.org/extensions/{$name}.git", "extensions/$name" and so on. 
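The URL-to-path mapping being worked out above can be written down as a small sketch. It combines the ssh base quoted by renoirb with the /mediawiki/extensions/<name>.git path that bd808 corrected it to. The function and type names are made up for illustration, and the real make-wmf-branch script may well build these values differently.

```python
from typing import NamedTuple

# Base URL as quoted in the chat; treated here as a plain constant.
GERRIT_BASE = "ssh://gerrit.wikimedia.org:29418"


class Submodule(NamedTuple):
    url: str   # where to clone from
    path: str  # where it lands inside the core checkout


def extension_submodule(name: str, base: str = GERRIT_BASE) -> Submodule:
    """Build the clone URL and target path for one extension (hypothetical helper)."""
    if not name or "/" in name:
        raise ValueError(f"not a plain extension name: {name!r}")
    return Submodule(
        url=f"{base}/mediawiki/extensions/{name}.git",
        path=f"extensions/{name}",
    )


if __name__ == "__main__":
    sub = extension_submodule("Parsoid")
    # The pair could then be handed to something like: git submodule add <url> <path>
    print(sub.url, sub.path)
```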
[16:40:42] got it [16:44:04] PROBLEM - Host mw1192 is DOWN: PING CRITICAL - Packet loss = 100% [16:44:21] !log anomie Synchronized php-1.25wmf12/extensions/Translate: SWAT: Translate: Revert "Request csrf tokens in JS when supported" [[gerrit:180201]] (duration: 01m 06s) [16:44:22] Nikerabbit: ^ Test please [16:44:26] Logged the message, Master [16:44:42] _joe_, bd808: Different scap error this time: mw1192 returned [255]: ssh: connect to host mw1192 port 22: Connection timed out [16:45:07] <_joe_> that is related to cmjohnson operating on it [16:45:13] <_joe_> I guess [16:45:23] <_joe_> as he was about to do that earlier [16:45:29] it has to be removed from dsh groups if there is maintenance [16:45:44] <_joe_> cmjohnson: whenever you have to repool mw1192, please run sync-common on it first [16:45:50] <_joe_> mutante: seriously? [16:46:04] <_joe_> for every powercycle one has to do a puppet review? [16:46:12] okay...rebooting now [16:46:16] if it's down during deployment, yea [16:46:33] <_joe_> mutante: no, sorry, I don't agree _at_all_ [16:46:39] <_joe_> not until the config is in puppet [16:47:10] then deployers will keep asking here [16:47:30] anomie: testing... [16:48:16] _joe_: i dont' think it would get reviews, it would be self-merge [16:48:57] anomie: works [16:49:00] on mediawiki.org [16:49:05] * anomie is done with SWAT! [16:49:15] PROBLEM - DPKG on heze is CRITICAL: Connection refused by host [16:49:16] <_joe_> mutante: still, it's like 20 operations [16:49:24] <_joe_> for something that should be atomical [16:49:33] PROBLEM - Disk space on heze is CRITICAL: Connection refused by host [16:49:34] _joe_: ? it's one line in one file [16:49:47] i think the cost is higher having to explain it on IRC all the time [16:49:53] PROBLEM - RAID on heze is CRITICAL: Connection refused by host [16:49:55] and how are deployers skipping hosts [16:50:01] can they just hit enter? 
[16:50:22] PROBLEM - configured eth on heze is CRITICAL: Connection refused by host [16:50:35] anomie: thank you! /me goes back to vacation [16:50:46] PROBLEM - dhclient process on heze is CRITICAL: Connection refused by host [16:50:51] <_joe_> mutante: so you would have to git pull --rebase, create a branch, edit the file, git commit, git review, go on gerrit, go on palladium and puppet-merge [16:50:56] what is heze? [16:51:00] <_joe_> it's too much effort [16:51:07] PROBLEM - puppet last run on heze is CRITICAL: Connection refused by host [16:51:10] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Puppet has 1 failures [16:51:24] PROBLEM - salt-minion processes on heze is CRITICAL: Connection refused by host [16:51:24] RECOVERY - Host mw1192 is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [16:52:49] A: heze is a bacula server in codfw [16:56:55] _joe_ all done with mw1192 for now...will see if the error appears again. [16:57:09] <_joe_> cmjohnson: ok thanks a bunch [17:01:17] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [17:05:37] (03PS1) 10Chad: Make `es-tool ban-node` handle both IP addressses and hostnames [puppet] - 10https://gerrit.wikimedia.org/r/180210 [17:07:01] <^d> manybubbles: early christmas gift for you ^ you'd asked for that awhile back [17:07:13] ooho! [17:08:18] 5xx was transient https://gdash.wikimedia.org/dashboards/reqerror/ [17:09:36] (03CR) 10Manybubbles: [C: 031] Make `es-tool ban-node` handle both IP addressses and hostnames (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [17:10:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [17:12:12] (03CR) 10Chad: Make `es-tool ban-node` handle both IP addressses and hostnames (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [17:14:24] Hey guys, I could use some sysadmin to look at an issue for me. 
I have a user who can't log in because he gets an error. Someone around to help? :) [17:18:37] <_joe_> Barras: I think it might be https://phabricator.wikimedia.org/T75462 [17:18:47] <_joe_> Barras: is the user on IRC right now? [17:18:56] Nope, he mailed me. [17:19:24] <_joe_> legoktm: ^^ [17:19:41] <_joe_> Barras: ask him to comment on the bug then if this fits [17:19:52] _joe_: "I cannot enter Simple wiki [using the account “Macdonald-ross”] because when I try to access I get a message < Unexpected non-MediaWiki exception encountered, of type "Exception" > This does not affect my access to any other wiki, including English Wikipedia." [17:20:14] <_joe_> mh, no it seems different [17:20:28] uhh [17:20:33] let me look in the logs [17:20:41] <_joe_> ok thanks :) [17:20:56] <_joe_> you're certainly able to debug this better than me [17:21:25] <_joe_> also, I'm off :) [17:21:35] Whoever of you fixes the problem gets a virtual cookie from me ;-) [17:24:25] Barras: do you know around what time this was at? [17:25:37] Nope, but I can ask him if you want. Looking at his edit history I'd say it must've happened after 3rd Dec [17:25:48] nah, I figured it out [17:25:52] ok :) [17:25:54] somehow he has an invalid timezone set [17:26:21] hmm, timezones are private info right? [17:26:24] (03CR) 10Matanya: "what happens if it an ipv6? it will go to host?" [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [17:27:29] legoktm: No idea, but I assume in his case it is not that secret [17:28:01] 2014-12-16 14:31:46 mw1090 simplewiki: [3482e11d] /wiki/Adaptation Exception from line 2064 of /srv/mediawiki/php-1.25wmf11/languages/Language.php: DateTimeZone::__construct(): Unknown or bad timezone ([redacted]) [17:28:52] legoktm: He even posts his local time on his userpage...
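The exception quoted above comes from building a timezone object straight from a stored preference. As a rough illustration of the defensive alternative, here is a sketch that validates the stored name and falls back to UTC instead of raising. It uses Python's zoneinfo (3.9+) as a stand-in for PHP's DateTimeZone; the function name and the UTC fallback are assumptions and not what MediaWiki does.

```python
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_timezone(name: str, default: tzinfo = timezone.utc) -> tzinfo:
    """Return the named zone, or `default` if the stored name is unusable."""
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        # Unknown key, malformed key, or unreadable zone data:
        # degrade to the default rather than failing the whole request.
        return default


if __name__ == "__main__":
    # A bogus stored preference no longer takes the page down.
    print(load_timezone("Definitely/NotAZone"))
```

The point of the fallback is that one bad preference row should cost the user a wrong clock, not access to the wiki.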
[17:28:56] oh [17:30:02] !log deleted apparently invalid timecorrection preference for user_id=68157 on simplewiki [17:30:09] Logged the message, Master [17:30:12] Barras: ask them to try logging in again? [17:30:18] should be fixed now [17:30:34] legoktm: Amazing, will do! [17:30:57] * Barras sends legoktm a virtual cookie :) [17:31:00] :) [17:33:41] * legoktm files a bug [17:36:04] legoktm: What silly timezone did he have set? [17:36:09] UTC-48 [17:37:14] (03CR) 10Chad: "Yes, but we don't use IPv6 here :)" [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [17:51:19] marktraceur: https://phabricator.wikimedia.org/T78689 [17:52:09] Funky. [17:52:11] Upstream bug? [17:52:17] Bugginville. [17:52:39] maybe, I'm not sure how timezone stuff works. I learned a lot in the past 15 minutes from debian's docs :P [17:54:24] Barras: it looks like a bug on our end, not anything the user did wrong: https://phabricator.wikimedia.org/T78689 [17:56:51] (03CR) 10Krinkle: [WIP] contint: Add tmpfs mount in jenkins-deploy homedir for labs slaves (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/173512 (https://bugzilla.wikimedia.org/72063) (owner: 10Krinkle) [17:57:51] legoktm: I sent him that link :) He didn't reply yet [18:00:05] maxsem, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T1800). [18:01:33] (03PS1) 10Hashar: Use RSpec::Core::Runner.run() in rakefile [puppet] - 10https://gerrit.wikimedia.org/r/180215 [18:02:02] (03CR) 10Hashar: "Got an example using RSpec::Core::Runner.run(['.']) as https://gerrit.wikimedia.org/r/180215 but it fails terribly :-(" [puppet] - 10https://gerrit.wikimedia.org/r/180162 (owner: 10Hashar) [18:03:21] legoktm: The problem is fixed, he can log in now. Thanks a lot for your help! 
:) [18:07:15] woot [18:09:40] (03PS10) 10Andrew Bogott: Support bootstrap-vz for building labs debian images [puppet] - 10https://gerrit.wikimedia.org/r/179765 [18:15:44] (03CR) 10Andrew Bogott: [C: 032] "This needs refining, but I'm merging now since it's in a nearly-working state and I want to keep track of where I go from here." [puppet] - 10https://gerrit.wikimedia.org/r/179765 (owner: 10Andrew Bogott) [18:31:41] (03CR) 10Filippo Giunchedi: "what happens if there a mixture of hostname/ip for the same host being banned/unbanned? if es behaviour is undefined I think it'd make mor" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/180210 (owner: 10Chad) [18:33:22] (03PS1) 10Cscott: Give parsoid admins the ability to update/restart the RT testing service. [puppet] - 10https://gerrit.wikimedia.org/r/180221 [18:34:33] jouncebot: next [18:34:33] In 0 hour(s) and 25 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T1900) [18:46:22] (03PS1) 10Nuria: Adding 'research' read only user to wikimetrics db [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/180222 [18:53:35] twentyafterfour: I have a patch that needs to be synced before you do the deployment train, is it OK if I delay you for like 5 minutes while I do that? [18:54:27] RoanKattouw: certainly, I'm still getting my bearings [18:55:43] Hold on, have you even done this before? 
:P [18:55:54] RoanKattouw: no [18:56:07] tagged along with reedy a couple of times [18:56:39] and ^d is my backup should I get lost [18:57:05] Alright, good luck out there [18:57:16] * ^d waves [18:57:17] And bd808 and myself are around too, should you need help with the tools [18:57:34] Me being the person that wrote the book on deploying before bd808 went and rewrote all of the tooling [18:57:44] :) [18:57:53] So naturally I don't know what's what any more nowadays :D [18:58:31] RoanKattouw: I bet you could figure it out pretty quickly if you had the need ;) [18:58:56] * ^d is the one to blame for submodules [18:58:59] <^d> So, yeah there's that [18:59:05] bd808: I got lost in Python code for an hour yesterday [18:59:15] I've spent like two hours of my life writing Python [18:59:26] (The scap Python code to be exact; was trying to fix beta) [19:00:00] RoanKattouw: yuck. Problems with the ssh keys or something nastier? [19:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141216T1900). [19:01:19] bd808: No, HHVM segfaults [19:01:32] And the beta-scap-eqiad Jenkins job had failed so we were investigating that [19:01:35] But it was a red herring [19:01:51] no Reedy today? [19:01:53] BTW, a 30-minute timeout for that job is too low, it times out almost every time there are i18n changes [19:01:58] twentyafterfour: you are deploying? [19:02:00] *nod* I saw some of that on irc as was glad somebody else was debugging it [19:02:40] aude: We give Reedy 3-4 days off a year :) [19:02:48] :) [19:02:50] The extra time due to i18n is like 19 mins [19:03:19] aude: yesh [19:03:29] twentyafterfour: ok [19:03:30] RoanKattouw: Yeah. That seems to have gotten worse over time. Someday we'll make l10n builds faster... someday. 
[19:04:08] * aude is here in case there are any problems with the new code going on wikidata (and commons, wikivoyage, etc) [19:04:29] that is unlikely [19:04:47] bd808: Can we bump the timeout in JEnkins perhaps? [19:05:08] RoanKattouw: Yeah. It should be an easy change [19:05:19] OK. because right now it marks them as failed [19:05:24] RoanKattouw: tell me when your patch is done and I can theoretically start [19:05:30] will probably have to ask questions [19:05:31] I don't think it's actually bad but it's definitely a big red herring for investigators [19:05:36] twentyafterfour: Waiting on Jenkins, sorry [19:05:51] no rush I'm still re-reading the docs [19:08:07] <^d> twentyafterfour: Luckily there's no code deploy today, it's just moving a batch of wikis from one version to another. [19:08:23] <^d> You could go ahead and prep that patch for wikiversions.json as soon as you're ready [19:08:31] ok [19:08:53] we will have a config change for wikidata [19:09:05] that is easy though to deploy [19:09:27] (03PS2) 10Nuria: Adding 'research' read only user to wikimetrics db [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/180222 [19:11:42] twentyafterfour: OK Jenkins is *finally* satisfied I didn't break any of our thousands of slow unit tests, now lemme deploy this change real quick [19:13:02] (03CR) 10Ottomata: "The motivation for this change was realized to be one-off, and did not need to be puppetized." 
[puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/180222 (owner: 10Nuria) [19:15:00] (03PS1) 10Aude: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180234 [19:15:40] !log catrope Synchronized php-1.25wmf12/skins/Vector/: Revert watch star change (duration: 00m 05s) [19:15:46] Logged the message, Master [19:15:47] twentyafterfour: All done, go bonkers [19:15:54] Sorry it took 15 minutes instead of 5 :( [19:18:31] where do I find a definition of "group1" [19:18:54] Good question [19:19:03] twentyafterfour: i think reedy changes everything to the new version wmf12 [19:19:04] I know roughly what it means but I imagine this must be scripted [19:19:08] aude: No [19:19:09] then changes wikipedia group back [19:19:11] Like, NOOOO [19:19:15] Oh, I see [19:19:20] Maybe [19:19:22] not actually deploying that but with the script [19:19:25] bd808: Do you know? [19:19:40] think that's what he said last week [19:19:53] !log Reloading Zuul to deploy Id2cfcdfd56220 [19:19:58] Logged the message, Master [19:20:55] twentyafterfour: group1 is the all.dblist minus the wikipedia.dblist [19:21:06] wikiversions doesn't actually mention "wikimedia" ... how do I tell which are specifically wikipedia sites? 
this is kinda dumb (note: I'm supposed to find the dumb parts and automate them, that's why I'm doing this deploy instead of someone who knows how ;) [19:21:19] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Promote_branch_.22Tuesday.22_deploy [19:21:25] bd808: thank you [19:21:36] updateWikiversions all php-VERSION [19:21:36] updateWikiversions wikipedia php-VERSION-1 [19:21:43] <^d> twentyafterfour: For historical reasons, wikipedias are just {langcode}wiki [19:21:50] <^d> en.wikipedia -> enwiki [19:21:52] <^d> etc, etc [19:21:54] right [19:22:03] That goofy dance leaves group0 and group1 at N and the pedias at N-1 [19:22:42] This was one of the first WTFs I learned of in our system :) [19:22:50] lol [19:22:52] There were many more [19:23:49] <^d> Yeah yeah. Only difference in our warts and other companies' warts is we're transparent about it :) [19:24:30] ^d: But certain non-Wikipedia wikis are also foowiki [19:24:47] ^d: :) And our warts are sometimes really old [19:24:49] Like commonswiki and metawiki [19:24:53] Worst of all, sourceswiki [19:25:01] <^d> Bah, yes. [19:25:40] * aude loves is too much in progress and incomplete. [19:25:48] ah [19:25:50] * aude loves is too much in progress and incomplete. [19:25:58] aaah wtf [19:26:00] I like to think of db names as opaque strings. Sometimes they tell you something other than where to look for the db, but much of the time they don't. [19:26:02] the multiversions stuff [19:26:34] how one mediawiki automatically becomes many wikis [19:28:33] (03CR) 10Legoktm: [C: 031] Convert from wfErrorLog to MWLogger logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180082 (owner: 10BryanDavis) [19:33:32] (03PS1) 1020after4: Group1 wikis to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180236 [19:35:04] https://gerrit.wikimedia.org/r/#/c/180236/ looks ok? [19:35:44] seems like it [19:35:59] twentyafterfour: looks right to me [19:36:08] I don't have +2 in gerrit so ...
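The two-step "goofy dance" described above (set every wiki to the new branch, then set the Wikipedias back to the previous one) can be sketched as plain set and dictionary operations. This is an illustrative model only, not the real updateWikiversions script: the function name `assign_versions` and the short wiki lists are hypothetical stand-ins for all.dblist and wikipedia.dblist, while the two branch names are the ones mentioned in the log.

```python
def assign_versions(all_wikis, wikipedias, new_version, old_version):
    """Model the two-pass update: everything to new, Wikipedias back to old."""
    # Pass 1: comparable to `updateWikiversions all php-VERSION`.
    versions = {db: new_version for db in all_wikis}
    # Pass 2: comparable to `updateWikiversions wikipedia php-VERSION-1`.
    for db in wikipedias:
        if db in versions:
            versions[db] = old_version
    return versions


# Hypothetical, heavily shortened dblists.
all_wikis = ["enwiki", "dewiki", "commonswiki", "wikidatawiki", "enwikisource"]
wikipedias = ["enwiki", "dewiki"]

# group1 as described in the channel: all.dblist minus wikipedia.dblist.
group1 = sorted(set(all_wikis) - set(wikipedias))

versions = assign_versions(all_wikis, wikipedias,
                           "php-1.25wmf12", "php-1.25wmf11")
```

Note that membership comes from the lists, not from the database name: commonswiki ends in "wiki" but is not in the Wikipedia list, so it moves to the new branch.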
[19:36:09] enwiki still on wmf11 and lots of things moving to wmf12 [19:36:13] what! [19:36:25] ? [19:36:32] I don't like self-reviewing anyway [19:36:34] fun thing is to +2 your own changes [19:36:41] yeah but I can't [19:36:43] happens all the time :P [19:36:47] ^d: plx add twentyafterfour to the wfm ldap group [19:37:05] twentyafterfour: ready for a +2 then? [19:37:22] yep [19:37:29] (03CR) 10BryanDavis: [C: 032] Group1 wikis to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180236 (owner: 1020after4) [19:37:35] <^d> sec. [19:37:38] bombs away [19:37:44] (03Merged) 10jenkins-bot: Group1 wikis to 1.25wmf12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180236 (owner: 1020after4) [19:38:29] I guess it's wmf-deployment that you need to be in, but I bet ^d knows that [19:38:30] he needs to be in the wmf-deploy gerrit group [19:38:34] :P [19:39:11] <^d> twentyafterfour is already a member of the group, skipping. [19:39:12] <^d> yeah, gerrit [19:39:26] <^d> {{done}} [19:39:48] ok rebasing on tin [19:39:59] thanks ^d [19:40:28] I'm gonna do sync-wikiversions. hopefully I didn't break anything ;) [19:41:18] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf12 [19:41:25] Logged the message, Master [19:42:11] did one bot really just tell the other bot to log a message? lols [19:42:35] (03PS2) 10Aude: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180234 [19:42:43] twentyafterfour: ^ [19:42:49] <^d> twentyafterfour: Yep. morebots takes !log entries from humans too [19:42:58] people are superfluous ( I couldn't even spell superfluous without the help of a bot) [19:43:01] <^d> logmsgbot is just stuff from tin, not always !logged. [19:44:18] aude: this cache epoch bump is something I should have done? 
[19:44:43] usually do this after deploy [19:45:04] <^d> we should automate that then ;-) [19:45:10] it's better not to change this normally [19:45:23] but as we are redesigning the ui and changing stuff all the time, it's often necessary [19:45:24] * twentyafterfour likes automating things [19:45:35] heh [19:46:40] actually not totally sure it is needed this time [19:46:44] so that's it? nothing more? [19:47:29] my patch and then i think that's it [19:47:43] you do want to monitor logstash and/or fluorine [19:47:45] (03CR) 1020after4: [C: 032] Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180234 (owner: 10Aude) [19:47:51] in case of any problems [19:47:57] (03Merged) 10jenkins-bot: Bump cache epoch for Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180234 (owner: 10Aude) [19:48:02] do I need to sync your patch now? [19:48:17] * aude has http://j.mp/wmfatal for this [19:48:21] twentyafterfour: yes [19:48:32] you know how to do that? [19:48:47] no ... [19:49:09] one sec [19:49:10] sync-common [19:49:12] ? [19:49:13] <^d> From that same directory of /srv/mediawiki-staging/, fetch the change in, etc. [19:49:15] <^d> Then it's just `sync-file wmf-config/` [19:49:19] you have to fetch [19:49:24] ok [19:49:24] rebase [19:49:31] sync file [19:49:33] since it's just one [19:49:40] <^d> twentyafterfour: sync-common also works but it's overkill here :) [19:49:40] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_2:_get_the_code_on_tin [19:50:03] and take care to follow the steps where it talks about security patches [19:50:13] though unlikely security patches in common [19:50:37] !log twentyafterfour Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 05s) [19:50:42] thanks :) [19:50:42] Logged the message, Master [19:51:50] <^d> twentyafterfour: So on version bump days like today that's pretty much it. Just spend awhile keeping an eye on things for any fallout.
[19:57:25] the only fatals I see were in 1.25wmf11...so I guess that's a good thing [20:00:31] twentyafterfour: well, hopefully not too many fatals, since it's on big-ish wikis like Commons [20:00:38] 2 [20:00:39] (good job, btw) [20:00:48] (03PS1) 10coren: bootstrapvz: Tweak to firstboot.sh [puppet] - 10https://gerrit.wikimedia.org/r/180243 [20:02:21] greg-g: thanks [20:04:00] (03PS1) 10Ottomata: Add $ldap_create_users_on_login parmater to cdh::hue [puppet/cdh] - 10https://gerrit.wikimedia.org/r/180244 [20:04:29] twentyafterfour: unfortunately, you didn't break the site, so you don't get your badge yet. [20:04:35] :) [20:04:47] heh it's ok I have a bunch of those badges from deviantART [20:05:09] twentyafterfour: :) I didn't even know they had a down teh site badge here [20:05:37] greg-g: We have to widen our criteria a bit - otherwise I'm not real ops despite having been here a while. Let's add "break all of labs" to the qualifications, okay? :-) [20:06:08] Coren: fair :) [20:07:11] http://20after4.deviantart.com/badges/ [20:07:24] greg-g: We need "I broke the English Wikipedia" shirts... $serious reasons [20:07:51] hoo: s/English // [20:07:52] :) [20:08:19] (03PS2) 10Ottomata: Add $ldap_create_users_on_login parmater to cdh::hue [puppet/cdh] - 10https://gerrit.wikimedia.org/r/180244 [20:08:29] I didn't yet earn mine... but I guess that could be arranged :D [20:09:00] (03PS3) 10Ottomata: Add $ldap_create_users_on_login parmater to cdh::hue [puppet/cdh] - 10https://gerrit.wikimedia.org/r/180244 [20:09:22] hoo: purposefully breaking it doesn't count :) [20:09:27] RECOVERY - RAID on ms-be2009 is OK: OK: optimal, 13 logical, 13 physical [20:09:45] (03CR) 10Ottomata: [C: 032] Add $ldap_create_users_on_login parmater to cdh::hue [puppet/cdh] - 10https://gerrit.wikimedia.org/r/180244 (owner: 10Ottomata) [20:10:01] hoo: bd808|LUNCH has an etherpad with those things [20:10:28] Ok, so we certainly need "I accidentally ... 
Wikipedia" shirts [20:10:38] I accidentally edited Wikipedia? [20:10:46] I accidentally stood for ArbCom? ;) [20:10:52] (^ that happened) [20:11:09] http://knowyourmeme.com/memes/i-accidentally [20:12:04] heh [20:12:46] I accidentally all of Wikimedia production. [20:12:46] :D [20:13:14] I accidentally haven’t anything directly [20:13:18] broke testwiki once [20:14:55] (03PS1) 10Ottomata: Disable automatic Hue user creation in production [puppet] - 10https://gerrit.wikimedia.org/r/180246 [20:16:29] YuviPanda, wut, no t-shirt for you? [20:16:49] (03CR) 10Ottomata: [C: 032] Disable automatic Hue user creation in production [puppet] - 10https://gerrit.wikimedia.org/r/180246 (owner: 10Ottomata) [20:17:16] MaxSem: well, there was the one time I broke bits by forgetting to tell mhurd to not use debug=true in production builds of the iOS apps... [20:17:35] I’ve broken wikitech twice but that doesn’t really count [20:17:44] well, that's still not a result of you doing something with it [20:17:50] indeed [20:19:07] oh whee [20:19:31] the PhpStorm puppet plugin even has autosuggest [20:21:14] nice...haven't used phpstorm in a while. i liked it though when I did [20:23:05] (03PS1) 10Yuvipanda: maintain_replicas: Remove redundant 'as' for ipb_deleted [software] - 10https://gerrit.wikimedia.org/r/180247 [20:23:08] Coren: ^ [20:23:15] hoo: I have shirts prepared -- https://pbs.twimg.com/media/BxnJxSICAAEibU8.jpg [20:23:34] I'll be handing them out at all-hands and tech days next month [20:23:44] bd808: Nice. [20:23:58] (03CR) 10coren: [C: 032] "That should be a noop in practice." [software] - 10https://gerrit.wikimedia.org/r/180247 (owner: 10Yuvipanda) [20:24:10] bd808: damn, I don’t think I qualify yet [20:24:29] bd808: Can I has? [20:24:34] * haz [20:24:35] Hm. Means I need to deploy a breaking change by then! :-) [20:25:09] * Coren thinks bd808 will singlehandedly cause the destruction of uptime stats for the year. 
:-) [20:25:17] Coren: we can accidentally restart all the virt* machines [20:25:44] "then I fixed it" is the important part here folks [20:25:47] YuviPanda: Well, if breaking Labs count, I already qualify. So don't you dare! :-P [20:25:49] nice, bd808 [20:25:54] print some in size medium :D [20:26:12] Coren: heh :) [20:26:20] I think my bits breakage should count. [20:26:37] also I’ve broken the Wikipedia app a few times... [20:26:39] Yeah, breaking bits counts. :-) [20:26:43] damn, I didn’t do that actually. [20:27:20] Coren: well, I ‘broke’ it by forgetting to tell our iOS dev to not use debug=True on production builds, and that melted bits... [20:27:24] that took a while to dig up [20:27:42] http://knowyourmeme.com/photos/190279-i-accidentally [20:27:56] er http://i0.kym-cdn.com/photos/images/newsfeed/000/190/279/ss.jpg [20:27:58] (03PS1) 10Ottomata: Set up hue.wikimedia.org backend on misc-web-lb [puppet] - 10https://gerrit.wikimedia.org/r/180248 [20:29:39] probably, let's be honest, also I emailed Jimmy last time.... should I still be doing that? [20:31:25] twentyafterfour: also: https://twitter.com/wikimediatech/status/544940376248049664 [20:32:19] for some reason I keep reading "logmsgbot" as "logspambot" [20:32:26] either/or [20:32:39] my version is more amusing ... [20:33:03] I accidentally the whole log [20:34:32] twentyafterfour: reading is hard. [20:35:15] hahaha [20:35:24] !log spam [20:35:30] Logged the message, Master [20:36:05] https://twitter.com/wikimediatech/status/544953987821498368 [20:37:03] dologmsg from terbium is still broken [20:37:17] * hoo is trying that for research purposes [20:37:37] "research" [20:43:08] !log inserted decryption key for English Wikipedia Arbitration Committee Election (2014) [20:43:12] Logged the message, Master [20:45:16] "dolo" doesn't sound so good in Italian [20:46:49] Nemo_bis: 'Nemo' doesn't sound good in Python. [20:47:29] Noted. 
[21:31:52] (03PS1) 10RobH: updated uni.wikimedia.org cert file [puppet] - 10https://gerrit.wikimedia.org/r/180308 [21:32:55] Jamesofur: around ? [21:33:03] matanya: yes but busy :) [21:33:21] if you want to leave a PM I'll answer when I can [21:33:28] any ETA for tally ? [21:33:38] that ^ is all [21:34:11] as soon as possible :) [21:34:14] hence thebusy [21:35:12] Erm did you just change something? [21:35:16] https://en.wikisource.org/wiki/Wikisource:Scriptorium#Index:The_Botanical_Magazine.2C_Volume_2_.281788.29.djvu [21:35:21] Geetting some VERY odd behaviour? [21:36:06] (03CR) 10BBlack: [C: 032] updated uni.wikimedia.org cert file [puppet] - 10https://gerrit.wikimedia.org/r/180308 (owner: 10RobH) [21:36:11] Such as INDEX pages that won't preview and Page:'s inconsistently showing as a different status from the one I KNOW they have [21:36:31] If something was changed recently I would very strongly suggest changing back quickly [21:37:57] bd808: hiyaaaa [21:38:08] Qcoder00: a new version of MediaWiki was deployed to Wikisources and other non-Wikipedias a few hours ago [21:38:15] Well it broke [21:38:31] ottomata: howdy. what's up? [21:38:34] what exactly broke? i don't see the issue as described. [21:38:38] Or more precisely it broke ProofRead page [21:39:11] MatmaRex: When I try to do a show Preview from an Edit of an Index: page at Wikisource , no preview appears [21:39:45] well, it does for me. [21:39:48] Ah [21:39:55] Must be a cacheing issue? [21:40:45] np idea. maybe something broke some gadgets you use. [21:40:47] Because when I am logged in as ShakespeareFan00 - NO preview is generated [21:40:49] looks like this for me: http://i.imgur.com/aTEKw5E.png [21:41:04] I get that if I am NOT logged in [21:41:49] do you use any funny things related to editing? wiked, live preview, or such? 
[21:42:01] Not that I know of [21:42:11] Attempting the same test with no Gadgets [21:42:31] bd808: i am talking with yuvi about RCstream [21:42:36] and he mentioned you are working on it? [21:42:45] just curious what you are up to [21:43:01] All gadgets disabled - Still NO preview generated [21:43:17] I'm using Firefox [21:43:23] ottomata: For the group0 wikis, yeah. I made a config change that caused it to break. The fix is up for SWAT in a few hours [21:43:42] So I'm puzzled... [21:43:44] are you just fixing stuff or actually re-working rcstream internals? [21:44:26] ottomata: Just fixing something that was broken by my PSR-3 logging changes -- https://gerrit.wikimedia.org/r/#/c/180214/ [21:44:31] MatmaRex: I suggest rolling back the update [21:44:44] Until the fault disappears [21:44:50] or someone suggests a better idea [21:45:41] Qcoder00: if you're the only affected person, then the problem, whatever it is, might be on your side :( [21:45:43] I'm using the Standard settings at Wikisource [21:45:54] MatmaRex: It doesn't occur when I'm not logged in [21:46:05] I don't have any gadgets running [21:46:08] and I don't use scripts [21:46:10] Qcoder00: do you have this enabled? http://i.imgur.com/YE419i0.png [21:46:14] This suggests it's your side [21:46:48] (enabling it indeed seems to break the preview) [21:46:49] MatmaRex: Yes ... Did that change in the update? [21:47:01] Thanks [21:47:20] yeah [21:47:25] hmph. grumble [21:47:26] hm bd808, do you happen to know how mw logs to udp2log right now? [21:47:43] Well Live-preview DOES seem to be what broke [21:47:45] :) [21:47:48] Qcoder00: is the fancy form thing on wikisource pages implemented locally, or in ProofreadPage, or where? [21:47:52] (do you know?) [21:47:59] I don't know [21:48:11] Just a humble user praying to the dev gods [21:48:12] :( [21:48:40] My understanding was that the tweaked edit form for Index and Page was part of the extension [21:49:12] ottomata: I know all about it.
:) MWLoggerLegacyLogger is currently used everywhere except beta and the group0 wikis which have been switched to MWLoggerMonologSpi and the classes it creates. [21:49:19] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 417.233337 [21:49:27] indeed it seems to be [21:50:11] ottomata: MWLoggerLegacyLogger is the code I extracted from various parts of GlobalFunctions.php when I implemented the PSR-3 logging interface. [21:51:37] ottomata: So today, MWLoggerLegacyLogger::emit() is what ultimately sends log events as udp packets that udp2log grabs and writes to disk [21:51:55] Qcoder00: https://phabricator.wikimedia.org/T78709 you might want to subscribe to it. [21:52:38] Well if it's already logged, good [21:52:49] It's a minor inconvenience [21:52:57] ok, interesting, thanks bd808. [21:53:02] Not an OMG You really broke it [21:53:08] ottomata: The config in beta and group0 does that as well, but also sends log events to a redis queue that logstash reads. The difference is that it does this using the Monolog logging library and that changes the behavior of the wfErrorLog() method [21:53:35] cool, got it [21:54:11] This change in behavior broke rcstream because I forgot that it used that method to send udp packets to different places [21:54:39] I had been warned months ago but ... yeah [21:56:44] MatmaRex: I like the "allegedly" bit. :) [21:58:11] bd808: rcstream shouldn't be affected...it uses redis pubsub. Just the IRC feed. [21:59:12] legoktm: oh yeah.
rcfeed !== rcstream [22:00:07] (03PS1) 10BBlack: Remove space from start of cert [puppet] - 10https://gerrit.wikimedia.org/r/180314 [22:04:39] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [22:05:11] (03CR) 10BBlack: [C: 032] Remove space from start of cert [puppet] - 10https://gerrit.wikimedia.org/r/180314 (owner: 10BBlack) [22:12:20] legoktm: bd808 oh wait, mediawiki just directly puts things into redis for rcstream, right? I forgot... [22:12:41] yes [22:12:48] right [22:13:34] legoktm: bd808 is rcstream considered a ‘reliable’ service now? [22:13:42] no [22:13:43] I don’t know if I ever saw an announcement email [22:14:08] * bd808 knows nothing about any of this really [22:14:11] it hasn't been officially launched yet [22:14:31] legoktm: do you know why? [22:16:06] I don't remember, ori/Krinkle probably do [22:20:44] PROBLEM - RAID on mw1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:22:22] (03PS1) 10BBlack: Switch public SSL terminators to new unified cert [puppet] - 10https://gerrit.wikimedia.org/r/180325 [22:23:00] PROBLEM - Disk space on mw1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:23:11] PROBLEM - DPKG on mw1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:23:50] RECOVERY - RAID on mw1016 is OK: OK: no RAID installed [22:26:00] RECOVERY - Disk space on mw1016 is OK: DISK OK [22:26:10] RECOVERY - DPKG on mw1016 is OK: All packages OK [22:33:19] YuviPanda, legoktm: it's considered reliable [22:33:27] it was waiting on the developer hub for docs, btu that's on ice at the moment [22:33:30] but it's public and dependable [22:34:20] !log Updated the Wikidata property suggester with data from Monday's JSON dump [22:34:24] Logged the message, Master [22:35:17] (03CR) 10Hoo man: "I just updated the entity suggester data, so this change should get pushed now." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/179469 (owner: 10Hoo man) [22:35:18] ori: aaah, cool! [22:37:17] ori: congrats on the performance flames!! [22:37:24] hashar: :) [22:37:37] bonus point for having the code to be less than 4Kbytes [22:39:11] PROBLEM - puppet last run on mw1012 is CRITICAL: CRITICAL: Puppet has 8 failures [22:39:24] if you have some spare bits available, you might consider adding some chip tune (random one from razor0x777 : https://www.youtube.com/watch?v=svyQ9D3UQQY [22:39:52] ori: more seriously, you might want to craft an appropriate landing page on http://performance.wikimedia.org that would list the various tools available there [22:41:32] hashar: yes, that's a TODO [22:41:42] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 2 failures [22:41:44] ori: the chip tune ? :D [22:42:41] (03PS1) 10Yuvipanda: Ignore messages that start with / [software/ircyall] - 10https://gerrit.wikimedia.org/r/180331 [22:42:51] hashar, patches are welcome! [22:43:02] ori: meanwhile I got jenkins jobs for extensions, and some are only failing under HHVM :-( [22:43:07] (03CR) 10Yuvipanda: [C: 032 V: 032] Ignore messages that start with / [software/ircyall] - 10https://gerrit.wikimedia.org/r/180331 (owner: 10Yuvipanda) [22:43:35] MaxSem: easy http://deskjet.github.io/chiptune.js/ ! :D [22:44:15] (03CR) 10Hashar: "-1 where are the Jenkins jobs and tests !!!!" [software/ircyall] - 10https://gerrit.wikimedia.org/r/180331 (owner: 10Yuvipanda) [22:44:28] hashar: haven’t added ‘em... [22:44:31] * YuviPanda looks away sheepishly [22:45:19] (zuul) how do you dare writing code before tests! [22:46:45] (03PS1) 10Hashar: tox.ini to easily run flake8 [software/ircyall] - 10https://gerrit.wikimedia.org/r/180333 [22:46:53] PROBLEM - puppet last run on mw1016 is CRITICAL: CRITICAL: Puppet has 6 failures [22:48:07] YuviPanda: is that python3? 
[22:48:13] hashar: yup [22:48:36] (03PS1) 10Yuvipanda: pep8 fixes for web2redis [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [22:49:14] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 10 failures [22:49:16] (03PS2) 10Hashar: tox.ini to easily run flake8 [software/ircyall] - 10https://gerrit.wikimedia.org/r/180333 [22:49:17] YuviPanda: ^^^ [22:49:28] wooo [22:49:35] creating jobs [22:50:14] (03CR) 10Yuvipanda: [C: 032 V: 032] tox.ini to easily run flake8 [software/ircyall] - 10https://gerrit.wikimedia.org/r/180333 (owner: 10Hashar) [22:50:24] hashar: wooo you are awesome! :) [22:50:54] (03PS2) 10Yuvipanda: pep8 fixes for web2redis [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [22:51:10] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [22:53:25] https://gerrit.wikimedia.org/r/180335 :D [22:55:49] (03PS3) 10Yuvipanda: pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [22:55:51] (03PS1) 10Yuvipanda: Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 [22:57:19] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:57:50] PROBLEM - RAID on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:58:00] YuviPanda: do you know who is responsible for the phabricator IRC bots? [22:58:11] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures [22:58:23] gwicke: legoktm! [22:58:40] PROBLEM - SSH on mw1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:58:50] PROBLEM - Disk space on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:58:51] (03CR) 10Hashar: "recheck" [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [22:58:53] PROBLEM - DPKG on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
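The two ircyall patch titles above ("Ignore messages that start with /" and "Ignore everything after first newline in message") describe simple input filtering for a relay bot. The sketch below shows one way such rules could look. It is not the actual ircyall code, which is not quoted in this log; the function name `clean_message` and its return conventions are hypothetical.

```python
def clean_message(message):
    """Apply the two filtering rules named in the patch titles.

    Returns None when the message should be dropped entirely,
    otherwise the single line that may be relayed.
    """
    # Rule 1: drop anything that begins with a slash.
    if message.startswith("/"):
        return None
    # Rule 2: keep only the text before the first line break.
    lines = message.splitlines()
    return lines[0] if lines else ""


dropped = clean_message("/quit")
kept = clean_message("deploy finished\nsecond line is discarded")
empty = clean_message("")
```

Using `splitlines()` rather than splitting on "\n" alone means a carriage return also ends the relayed text; that is a choice made in this sketch, not something the patch titles specify.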
[22:58:59] PROBLEM - nutcracker process on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:59:01] (03CR) 10Hashar: "recheck" [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 (owner: 10Yuvipanda) [22:59:12] (03CR) 10jenkins-bot: [V: 04-1] pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [22:59:17] (03CR) 10jenkins-bot: [V: 04-1] Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 (owner: 10Yuvipanda) [22:59:26] PROBLEM - salt-minion processes on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:59:31] hashar: \o/ you are awsome [22:59:45] YuviPanda: up to you to ignore the flake8 errors at https://integration.wikimedia.org/ci/job/operations-software-ircyall-tox-flake8-trusty/1/console [22:59:48] can be done in tox.ini [22:59:52] [flake8] [23:00:01] ignore = W292,E501 [23:00:03] hashar: yeah, I might make it ignore the 80col restriction [23:00:07] but woohoo [23:00:11] though having a newline at the end of the file is kind of nice to have [23:00:20] the x col restriction can be softened [23:00:36] [flake8] [23:00:39] max-line-length = 120 [23:00:42] ori: What about upgrading of protocol to 1.0? [23:00:58] ori: I'd like that figured out before we go 'stable'. [23:01:08] But it seems the python lib we use is no longer maintained. [23:01:15] and is unlikely to see 1.0 integrated. [23:01:15] which lib ? [23:01:17] PROBLEM - nutcracker port on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:20] PROBLEM - dhclient process on mw1014 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
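The flake8 settings suggested above arrive split across several timestamped messages. Put together, they would form a single `[flake8]` section in tox.ini. The sketch below embeds that section as a string and reads it back with Python's standard configparser, only to confirm the fragment is well-formed INI. Combining both options in one section is an assumption; the log shows them as two separate suggestions, and flake8's own option handling is not reproduced here.

```python
import configparser

# The two suggestions from the channel, combined into one section.
TOX_INI_FRAGMENT = """\
[flake8]
max-line-length = 120
ignore = W292,E501
"""

parser = configparser.ConfigParser()
parser.read_string(TOX_INI_FRAGMENT)

max_line_length = parser.getint("flake8", "max-line-length")
ignored_codes = [code.strip() for code in parser.get("flake8", "ignore").split(",")]
```

If the line-length limit is raised to 120 as suggested, keeping E501 in the ignore list would switch the length check off altogether, so in practice a project would likely pick one of the two approaches.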
[23:01:30] gevent socket.io [23:01:51] (03PS2) 10Yuvipanda: Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 [23:01:57] hashar: https://github.com/wikimedia/mediawiki-services-rcstream [23:02:02] (03CR) 10jenkins-bot: [V: 04-1] Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 (owner: 10Yuvipanda) [23:02:04] ahthat project [23:02:12] RECOVERY - SSH on mw1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [23:02:12] RECOVERY - Disk space on mw1014 is OK: DISK OK [23:02:13] RECOVERY - DPKG on mw1014 is OK: All packages OK [23:02:14] I am of no help there :( [23:02:29] RECOVERY - nutcracker process on mw1014 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [23:02:41] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures [23:02:44] RECOVERY - salt-minion processes on mw1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [23:02:49] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 11 failures [23:03:05] (03PS4) 10Yuvipanda: pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [23:03:07] (03PS3) 10Yuvipanda: Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 [23:03:09] (03PS1) 10Yuvipanda: Add a sane line length limit [software/ircyall] - 10https://gerrit.wikimedia.org/r/180341 [23:03:20] (03CR) 10jenkins-bot: [V: 04-1] Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 (owner: 10Yuvipanda) [23:03:22] (03CR) 10jenkins-bot: [V: 04-1] pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:03:24] (03CR) 10Yuvipanda: [C: 032 V: 032] Add a sane line length limit [software/ircyall] - 10https://gerrit.wikimedia.org/r/180341 (owner: 10Yuvipanda) [23:03:26] 
(03CR) 10jenkins-bot: [V: 04-1] Add a sane line length limit [software/ircyall] - 10https://gerrit.wikimedia.org/r/180341 (owner: 10Yuvipanda) [23:04:33] RECOVERY - nutcracker port on mw1014 is OK: TCP OK - 0.000 second response time on port 11212 [23:04:40] RECOVERY - dhclient process on mw1014 is OK: PROCS OK: 0 processes with command name dhclient [23:04:40] RECOVERY - RAID on mw1014 is OK: OK: no RAID installed [23:07:04] YuviPanda: thx! [23:07:51] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:15:50] (03CR) 10Yuvipanda: "recheck" [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:16:06] PROBLEM - RAID on mw1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:16:30] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 4 failures [23:16:48] (03PS1) 10Legoktm: Temporarily disable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180346 [23:17:14] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: Puppet has 7 failures [23:17:21] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 1 failures [23:17:55] (03CR) 10Ori.livneh: [C: 031] Temporarily disable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180346 (owner: 10Legoktm) [23:19:05] RECOVERY - RAID on mw1010 is OK: OK: no RAID installed [23:19:44] greg-g: ^ I'm going to deploy that now [23:20:17] (03CR) 10Yuvipanda: [C: 032 V: 032] pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:20:29] (03CR) 10jenkins-bot: [V: 04-1] pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:20:53] legoktm: ... [23:21:12] ? [23:21:29] what's the hold up? 
[23:21:44] oh, I was finding my deploy notes/instructions [23:21:47] (03CR) 10Legoktm: [C: 032] Temporarily disable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180346 (owner: 10Legoktm) [23:21:51] oh, sorry. [23:21:59] (03Merged) 10jenkins-bot: Temporarily disable $wgCentralAuthAutoMigrate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/180346 (owner: 10Legoktm) [23:22:00] fetch config, looksgood ;) [23:22:06] (03PS5) 10Yuvipanda: pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [23:22:08] (03PS4) 10Yuvipanda: Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 [23:22:20] (03CR) 10jenkins-bot: [V: 04-1] pep8 fixes [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:23:01] PROBLEM - nutcracker port on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:23:55] PROBLEM - nutcracker process on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:23:56] PROBLEM - puppet last run on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:24:03] PROBLEM - salt-minion processes on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:24:30] hmm [23:24:45] PROBLEM - Disk space on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:25:03] 2 apaches left... [23:25:11] PROBLEM - DPKG on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:25:22] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: Puppet has 24 failures [23:25:34] PROBLEM - SSH on mw1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:25:43] uh [23:25:45] PROBLEM - RAID on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:25:53] legoktm: don't worry about mw1015; it's not related. [23:26:01] PROBLEM - configured eth on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[23:26:20] PROBLEM - dhclient process on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:26:25] ok, there are still 2 left... [23:28:06] PROBLEM - RAID on mw1008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:28:49] (03PS6) 10Yuvipanda: pep8 fixes + another commit rolled in because jenkins [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 [23:29:21] notcsteipp: amazing disguise :) [23:29:44] (03CR) 10Yuvipanda: [C: 032] pep8 fixes + another commit rolled in because jenkins [software/ircyall] - 10https://gerrit.wikimedia.org/r/180334 (owner: 10Yuvipanda) [23:29:50] YuviPanda: I try :) [23:29:55] (03Abandoned) 10Yuvipanda: Ignore everything after first newline in message [software/ircyall] - 10https://gerrit.wikimedia.org/r/180337 (owner: 10Yuvipanda) [23:29:58] notcsteipp: :) [23:30:43] legoktm: ctrl+break, try again. [23:30:55] there's only one left now [23:30:57] !log disabled puppet on tin and removed mw1015 from mediawiki-installation dsh group [23:31:00] oh, ok [23:31:03] Logged the message, Master [23:31:08] RECOVERY - RAID on mw1008 is OK: OK: no RAID installed [23:32:24] !log legoktm Synchronized wmf-config/CommonSettings.php: Temporarily disable wgCentralAuthAutoMigrate (duration: 01m 17s) [23:32:28] Logged the message, Master [23:33:03] PROBLEM - puppet last run on mw1010 is CRITICAL: CRITICAL: Puppet has 1 failures [23:34:15] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [23:36:34] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:37:31] RECOVERY - Disk space on mw1015 is OK: DISK OK [23:37:51] RECOVERY - DPKG on mw1015 is OK: All packages OK [23:38:02] RECOVERY - SSH on mw1015 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [23:38:51] RECOVERY - configured eth on mw1015 is OK: NRPE: Unable to read output [23:39:05] RECOVERY - dhclient process on mw1015 is 
OK: PROCS OK: 0 processes with command name dhclient [23:39:14] RECOVERY - nutcracker port on mw1015 is OK: TCP OK - 0.000 second response time on port 11212 [23:39:42] RECOVERY - nutcracker process on mw1015 is OK: PROCS OK: 1 process with UID = 112 (nutcracker), command name nutcracker [23:40:07] RECOVERY - salt-minion processes on mw1015 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [23:40:07] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures [23:41:45] RECOVERY - RAID on mw1015 is OK: OK: no RAID installed [23:47:27] (03PS1) 10Yuvipanda: Output report as single YAML file [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/180352 [23:47:44] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 6 failures [23:47:51] !log tstarling Synchronized wmf-config/CommonSettings.php: SecurePoll debugging (duration: 01m 01s) [23:47:58] Logged the message, Master [23:48:43] (03PS2) 10Yuvipanda: Output report as single YAML file [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/180352 [23:48:48] RECOVERY - puppet last run on mw1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:51:03] (03PS3) 10Yuvipanda: Output report as single YAML file [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/180352 [23:51:19] (03CR) 10Yuvipanda: [C: 032 V: 032] Output report as single YAML file [software/labsdb-auditor] - 10https://gerrit.wikimedia.org/r/180352 (owner: 10Yuvipanda) [23:55:28] !log fixed MW cgroup on tin [23:55:37] Logged the message, Master [23:55:41] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: puppet fail