[00:00:04] RoanKattouw, ^d, marktraceur, MaxSem, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141121T0000). Please do the needful. [00:00:21] Alrighty, config changes ffirst [00:00:28] legoktm, ^d, I definitely don't know what's going on, so can I leave you to file any necessary bugs? [00:00:32] Jamesofur: You around for your SWAT? [00:00:56] mhm [00:01:11] (03PS1) 10Kaldari: Adding wmgMFCustomLogos copyright-width and copyright-height [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174862 [00:01:24] kaldari|2: Do you need that ---^^ for the SWAT? [00:01:45] RoanKattouw: Yeah, I'm adding to deployment schedule now. [00:01:51] Awesome [00:01:55] RoanKattouw: Sorry for last minute addition :P [00:02:00] No worries [00:02:34] kaldari|2: Did you mean to remove wgThumbnailBuckets? [00:02:47] (See diff @ https://gerrit.wikimedia.org/r/#/c/174862/1/wmf-config/InitialiseSettings.php ) [00:03:06] (03CR) 10Catrope: [C: 032] Revert "Enable SecurePoll error detail for debugging" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174829 (owner: 10Jalexander) [00:03:14] (03Merged) 10jenkins-bot: Revert "Enable SecurePoll error detail for debugging" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174829 (owner: 10Jalexander) [00:03:18] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [00:03:48] ori: https://raw.githubusercontent.com/filbertkm/sites/master/sites.tsv [00:03:55] not json [00:04:02] though [00:04:07] RoanKattouw: patches added to deploy schedule. I have 2 actually. [00:04:19] RoanKattouw: Both are config changes. One is already merged [00:05:04] kaldari|2: I don't see it. Did you add the right window? (The Friday one, because it's already Friday in UTC) [00:05:09] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [00:06:03] kaldari|2: Oh I see you added it for next week [00:06:25] (03CR) 10Catrope: [C: 04-1] Adding wmgMFCustomLogos copyright-width and copyright-height (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174862 (owner: 10Kaldari) [00:06:41] kaldari|2: I -1ed one of your changes ---^^ [00:06:55] Going to deploy the other one that was already merged, and Jamesofur's one [00:07:02] thank ye [00:07:09] !log catrope Synchronized wmf-config/: SWAT (duration: 00m 08s) [00:07:11] Logged the message, Master [00:08:50] Jamesofur: Yours is done [00:09:02] WTF: [00:09:04] 00:07:03 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/***', 'mw1010.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet'] on osmium returned [255]: Permission denied (publickey). [00:09:28] osmium [00:09:45] What about it? [00:09:54] It's a funny box [00:09:57] hhvm testing [00:10:28] I can log in there... [00:11:35] (03PS2) 10Kaldari: Adding wmgMFCustomLogos copyright-width and copyright-height [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174862 [00:11:43] RoanKattouw: puppet may be off there. Did anything change about your keys recently? 
[00:11:53] I gave up root [00:12:02] (03CR) 10Catrope: [C: 032] Adding wmgMFCustomLogos copyright-width and copyright-height [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174862 (owner: 10Kaldari) [00:12:22] (03Merged) 10jenkins-bot: Adding wmgMFCustomLogos copyright-width and copyright-height [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174862 (owner: 10Kaldari) [00:12:29] That might be the problem. ori? Can you see if you can make osmium like RoanKattouw again? [00:12:52] !log catrope Synchronized wmf-config/: SWAT (again, forwmgMFCustomLogos) (duration: 00m 05s) [00:12:56] i'll fix sec [00:12:58] Logged the message, Master [00:13:15] It didn't fail this time [00:13:33] quiddity, ^d: filed https://bugzilla.wikimedia.org/show_bug.cgi?id=73680 [00:14:01] legoktm: {{sofixit}} [00:14:08] kaldari: OK that's deployed too now [00:14:19] bd808: I don't know what's wrong :/ [00:14:31] https [00:14:37] https doesn't work in beta [00:14:55] what's forcing the https call? [00:15:17] ohhhh [00:15:20] it doesn't?? [00:15:28] https://bugzilla.wikimedia.org/show_bug.cgi?id=68387 [00:15:57] RoanKattouw: thanks [00:16:44] (03PS1) 10Legoktm: Use http for GlobalUserPage on beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174870 [00:16:44] legoktm: see also https://bugzilla.wikimedia.org/show_bug.cgi?id=48501 (we have no certs for beta) [00:18:00] RoanKattouw: if you're syncing stuff...wanna do https://gerrit.wikimedia.org/r/174870 ? it's beta only :) [00:18:22] (03CR) 10Catrope: [C: 032] Use http for GlobalUserPage on beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174870 (owner: 10Legoktm) [00:18:43] bd808: I blame csteip.p, who specifically asked me to encourage usage of https in the extension defaults >.< [00:18:44] (03Merged) 10jenkins-bot: Use http for GlobalUserPage on beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174870 (owner: 10Legoktm) [00:19:18] hehe. Maybe he should encourage the purchase of beta certs. [00:19:41] yay! [00:19:44] RoanKattouw: thanks [00:19:46] and also encourage figuring out how secure said certs [00:21:01] http://en.wikipedia.beta.wmflabs.org/wiki/User:Legoktm [00:21:07] !!! [00:21:09] quiddity: ^ :D [00:21:54] Woo! [00:23:08] ori: https://raw.githubusercontent.com/filbertkm/sites/master/sites.json [00:26:22] (03PS1) 10QChris: Remove hooks-bugzilla plugin, as Bugzilla goes into read-only mode [gerrit/plugins] - 10https://gerrit.wikimedia.org/r/174875 [00:26:23] legoktm: That banner should go on top for consistency for cross-cluster pulling of media files. [00:26:32] legoktm: But woo. :-) [00:26:37] James_F: there are a few bugs open for that [00:27:02] James_F: https://bugzilla.wikimedia.org/show_bug.cgi?id=73634 [00:27:18] James_F: comment there please! [00:27:30] legoktm: You want me to WONTFIX? [00:27:40] no [00:27:43] just comment :P [00:27:44] legoktm: 'Cos my request is orthogonal to that bug. [00:27:55] legoktm: I want the shed blue. Max wants it blown up. [00:28:06] well, that bug is basically "figure out what to do with the footer notice" [00:28:07] Commenting about wanting it blue on the request to blow it up isn't fruitful. :-) [00:28:11] an option is "move it to the top" [00:28:18] That's not what it says in the title. :-) [00:28:18] another option is "move it to ?action=edit only" [00:28:34] meh [00:28:39] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [00:28:46] legoktm: Will re-title. 
[00:29:39] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [00:38:59] (03CR) 10Rush: [C: 032 V: 032] bugzilla handles characters that are invalid for api [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/173998 (owner: 10Rush) [00:41:38] gi11es, ebernhardson: You guys around? [00:41:47] RoanKattouw: yes [00:41:53] Cool [00:42:08] Sorry for the delay, the main SWAT deploy will happen soon now [00:44:53] !log catrope Synchronized php-1.25wmf8/extensions/VisualEditor: SWAT (duration: 00m 04s) [00:44:57] Logged the message, Master [00:44:58] !log catrope Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT (duration: 00m 04s) [00:45:00] Logged the message, Master [00:45:03] !log catrope Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s) [00:45:05] Logged the message, Master [00:45:18] !log catrope Synchronized php-1.25wmf8/resources/lib/oojs-ui/: SWAT (duration: 00m 03s) [00:45:20] Logged the message, Master [00:45:35] RoanKattouw: yup, here [00:45:40] Cool [00:45:44] I've just done everything in wmf8 [00:46:16] RoanKattouw: testing... [00:46:17] !log catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 04s) [00:46:19] Logged the message, Master [00:46:21] !log catrope Synchronized php-1.25wmf9/extensions/MultimediaViewer: SWAT (duration: 00m 03s) [00:46:21] And here goes wmf9 [00:46:23] Logged the message, Master [00:46:26] !log catrope Synchronized php-1.25wmf9/extensions/Flow: SWAT (duration: 00m 05s) [00:46:29] Logged the message, Master [00:46:40] !log catrope Synchronized php-1.25wmf9/resources/lib/oojs-ui/: SWAT (duration: 00m 03s) [00:46:42] Logged the message, Master [00:46:46] That's it, all done [00:48:45] looks to work here, thanks [00:50:47] (03PS1) 10Ori.livneh: Remove osmium from mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/174881 [00:50:59] (03CR) 10Ori.livneh: [C: 032 V: 032] Remove osmium from mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/174881 (owner: 10Ori.livneh) [00:51:29] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 322 seconds [00:51:58] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 348 seconds [00:52:19] RoanKattouw: all good for mine, thanks for the swat [00:52:29] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [00:52:30] Awesome [00:52:35] Thanks for verifying guys [00:52:58] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds [00:57:04] !log disabled gerrit's hooks-bugzilla plugin (See T210) [00:57:06] Logged the message, Master [00:58:30] greg-g: can i sneak one more into this window? we patched wmf9 for an office wiki bug, but obviously office wiki is on wmf8 [00:58:54] ebernhardson: do it [01:01:14] <^d> qchris: thanks, I had forgotten about that. [01:01:22] (03PS1) 10Ori.livneh: Add site-list.json for Wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174885 [01:02:08] !log Updated scap to I5782e8cbe: Make the SSH user and authentication socket configurable [01:02:10] Logged the message, Master [01:02:25] cool. [01:03:02] bd808: so, what can we kill now? :) [01:03:10] l10nupdate was somehow related, no? 
[01:04:09] l10nupdate runs as the l10nupdate user and uses an ssh keypair that lives on tin [01:04:29] So the l10nupdate user would need to have access to the auth socket [01:06:16] ori: I just touched that script today actually -- https://gerrit.wikimedia.org/r/#/c/174784/2/files/misc/l10nupdate/l10nupdate-1,unified [01:10:33] bd808: should we tell deployers to stop forwarding their agent to tin? [01:10:44] bd808: i guess they still need it if they want to ssh into an individual app server to poke around [01:11:29] I think we should tell them and maybe even add something to profile.d that yells at you if you do [01:11:47] well, you still need it for trebuchet deployments [01:11:55] or, actually, i guess not [01:11:56] do you? [01:12:02] * ^demon|away will probably be forwarding his agent for a few more years [01:12:02] yeah, scratch that [01:12:03] forwarding should be unnecessary, with *.eqiad.wmnet setup in ssh with ProxyCommand(some wikipage says to do it that way) you can ssh to app servers directly through bastion without forwarding [01:12:10] <^demon|away> force of habit typing -A all the time now [01:12:18] bd808: Do I still need to forward my gerrit ssh key? [01:12:24] ebernhardson: right, fair point [01:12:31] hoo: yeah :( [01:12:48] bd808: we should just create an mwdeploy account on gerrit [01:12:54] ori: a gerrit key will still be required to pull [01:13:00] that would work [01:13:07] for everybody but Sam [01:13:15] Sam will adjust [01:13:15] <^demon|away> How would one push? [01:13:18] and he could have a key on tin just for that really [01:13:27] <^demon|away> I push from tin all the freaking time. [01:13:35] Sometimes it's needed [01:13:38] eg. updating CDBs [01:13:43] *nod* [01:13:43] forward your key in those cases, I guess [01:13:49] you really shouldn't, IMO [01:13:53] Or make a key on tin just for that [01:14:02] you can have zillions of keys on gerrit [01:14:13] you'll get yelled at [01:14:17] ops don't like private keys on the cluster [01:14:20] In the past I did the stuff on tin, rsynced the changes out, and then uploaded the patch from here [01:14:30] but that's not *very* nice [01:14:42] also rsync in the cluster needs forward agent :S [01:14:46] * bd808 knows there are private keys on the cluster [01:14:58] o O ( AllowAgentForwarding=no ) [01:15:02] on tin [01:15:17] Sam really has to be abel to push from tin [01:15:19] guess one could do that on terbium, then put it into public_html, download it to my local machine, and then create the change [01:15:21] that would work [01:15:30] besides ProxyCommand, why would you login to an appserver from tin instead of bastion anyway? [01:15:35] for cdb updates [01:15:39] The train deploy process has to make things happen in tin and then save to gerrit [01:15:42] paravoid: right, that's true [01:16:03] Also people use agent forwarding to rsync files [01:16:09] hoo: that's no longer needed [01:16:23] * ^demon|away can't get out of the habit of typing `ssh -A` instead of just `ssh` [01:16:28] <^demon|away> And I've had ProxyCommand setup for ages. [01:16:37] ^demon|away: AllowAgentForwarding=no will cure you of that pretty quickly [01:16:55] ori: How will it work without that? 
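The ProxyCommand setup mentioned above (reaching *.eqiad.wmnet app servers through a bastion instead of forwarding your agent with ssh -A) comes down to a couple of lines of ssh client config. A minimal sketch, assuming bast1001.wikimedia.org as the bastion and the same shell username everywhere (both assumptions; the wikitech page ebernhardson refers to has the real recipe):

    # ~/.ssh/config -- hypothetical sketch, not the canonical wikitech recipe
    Host *.eqiad.wmnet
        # Tunnel through the bastion; no agent forwarding (-A) needed
        ProxyCommand ssh -W %h:%p bast1001.wikimedia.org
        User your-shell-name

With that in place, a plain "ssh mw1010.eqiad.wmnet" (and scp/rsync to the same hosts) goes via the bastion while the private key never leaves your laptop, which is the whole point of dropping the -A habit.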
[01:16:57] and really, get out of the habit [01:17:06] it's a very bad habit :) [01:17:18] That's why I only use an ssh agent, if I really need it [01:17:26] I need to start it by hand [01:17:28] hoo: there's a shared SSH agent on tin that gets armed by ops with the mwdeploy identity [01:17:30] <^demon|away> paravoid: We all have our vices :D [01:17:37] hoo: and scap now knows to use it [01:17:59] So even if you do forward your key, scap will no longer use it [01:18:02] ori: But... if I want to shell upload a video, I need to pull it to the bastion and then rsync it to some host with MediaWiki (terbium or tin) [01:18:22] ori: you should be able to scp directly with ProxyCommand? [01:18:23] well, guess I could use curl with a proxy by using the URL downloader squid thingy [01:18:24] you can use proxycommand [01:18:25] s/ori/ho/ [01:18:32] what ebernhardson said [01:18:44] yeah scp and proxycommand get along just fine [01:19:02] actually rsync should too [01:19:06] I'm not uploading the files from one of my machines [01:19:08] yes it would [01:19:17] so I don't have them locally at all [01:19:41] but you have them somewhere you could reach bastion from? [01:19:58] If so, from there you can reach past bastion with proxycommand [01:20:26] bd808: I pull them onto the cluster by URL [01:20:31] usually to a bastion [01:20:36] nod [01:20:47] unrelatedly, there is an hn article today about bastions, and a bunch of people slagging off the idea as a half-solution. don't know enough to have an opinion personally [01:20:56] then I am evil, use ssh forwarding... go to the bastion and do rsync ... terbium:~/tmp or so [01:21:05] not sure how that would work without ssh forwarding [01:21:13] and w/o making stuff much more complicated [01:21:31] scp bastion:/path/to/file terbium:/path/to/save [01:21:55] nah that wouldn't work [01:22:08] not unless -3, but then it defeats his whole point [01:22:13] Yeah... I can't use my connection to do that [01:22:18] slowwwwneeesss [01:23:01] just fetch the file from tin then? [01:23:22] paravoid: Doubt that works... at least on terbium it doesn't... for good reasons [01:24:01] nope, doesn't [01:24:03] of course it does [01:24:08] huh? [01:24:13] http_proxy=http://webproxy.eqiad.wmnet:8080 wget [01:24:27] ah [01:24:33] <^demon|away> beer/dinner/tv time. [01:24:36] <^demon|away> later folks. [01:24:36] !log ebernhardson Synchronized php-1.25wmf8/extensions/Flow/includes/Parsoid/: Bump flow submodule in 1.25wmf8 (duration: 00m 04s) [01:24:40] does that thing have time outs or size limits [01:24:40] Logged the message, Master [01:24:41] paravoid: ^ [01:24:59] don't remember, check puppet :) [01:25:16] I'll find out by the time I'll try to pull in 6GiB... :P [01:25:30] what you're doing is kinda scary from a security perspective [01:25:44] but I guess we have bigger problems than that... [01:25:44] I know... I hate ssh agent [01:25:53] no, I meant downloading files from random sources like that [01:26:06] I can imagine all kinds of attack scenarios where there's an 0day on wget or squid [01:26:15] and someone persuades you to wget it :) [01:26:25] but I suppose it's not a whole lot different than doing it on your computer [01:26:43] and compromising your own machine and hence your access [01:26:59] Stop using computers... [01:27:25] just think of all the 348237612e54day exploits on the human brain.
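Two concrete forms of what is being discussed above, written out as sketches (the URLs, paths and host names are placeholders, not the actual upload): fetching a file straight onto a cluster host through the URL-downloader proxy, and the remote-to-remote scp variant.

    # On terbium/tin: pull the file through the webproxy instead of via your own machine
    http_proxy=http://webproxy.eqiad.wmnet:8080 wget -O ~/tmp/video.ogv 'http://example.org/video.ogv'

    # From a laptop with ProxyCommand configured: remote-to-remote copy.
    # Plain "scp hostA:src hostB:dst" tells hostA to connect to hostB itself;
    # -3 relays the data through the local machine instead, which is exactly
    # the slowness hoo objects to.
    scp -3 bast1001.wikimedia.org:/home/me/video.ogv terbium.eqiad.wmnet:/home/me/tmp/

As paravoid notes right after, check puppet for the proxy's timeout and size limits before trusting it with a 6GiB pull.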
what would it take to get hoo to type what I want him to type :) [01:27:49] bblack: Try beer, it's very effective :D [01:28:01] I've seen this happen in the wild [01:28:04] it wasn't wget, it was VLC [01:28:07] guess I could use sudo -u nobody curl to minimize the risk :D [01:28:11] curl http://bd808.com/beer | hoo [01:28:50] someone found a way to exploit "strings" recently [01:28:55] paravoid: That's not really comparable... VLC is a monster with a billion features and supported formats... while curl is hopefully rather simple [01:28:57] it was particularly amusing :) [01:28:59] Yeah, saw that [01:29:38] the sudo curl thing brings up an interesting point though. for many simple tools in many simple operational modes, there's no good reason they shouldn't be able to dump all privs before processing the remote data [01:30:18] hoo: curl (+ openssl or gnutls) "rather simple"? [01:30:22] couldn't disagree more :) [01:30:27] ah, ture [01:30:28] http://curl.haxx.se/docs/security.html [01:30:29] e.g. if commandline says fetch x from y and put it in file z, the command should open file z, connect to y, then dump all remaining privs and operate on those fds [01:30:36] the ssl part to it [01:30:48] I can't even sudo -u nobody :P [01:31:02] it doesn't completely solve the problem, but it certainly mitigates a lot of risks [01:31:06] So I should just use telnet and plain http, right [01:31:10] * hoo runs and hides [01:31:30] yeah [01:31:37] it's basically what bblack said [01:31:50] we should probably wget these with very few privileges :) [01:32:01] I can't even sudo nobody :P [01:32:03] hehe [01:32:12] bblack: the extreme of what you described is Qubes [01:32:24] the last thing I'm ever gonna do is use curl on a url like http://curl.haxx.se/docs/security.html ... that's just asking for it [01:33:16] running each app into a separate Xen domain [01:33:26] or well, each "environment" [01:33:48] well sure, but it's kind of a different thing than what I'm describing. [01:33:56] then showing them on the same screen, but with a window manager that annotates these windows [01:34:59] it's a common pattern that within the code of one utility doing a single action and exiting, the need for privileges decreases over the time domain, and the risk of breach increases. taking a strategy in the code of "drop each privilege as soon as we're sure we don't need it anymore to complete this action and exit" is a win. [01:35:13] nod [01:37:55] There's quite some options to it... you can also use selinux sandbox or so... [01:38:03] but most feasible for now would be sudo -u nobody [01:38:13] now I kind of wonder if there's already a kernel API for limiting your view of the local filesystem tree that doesn't muck with paths like chroot does. A way to say "from this point forward, I'm only ever going to operate within these 3 separate directories" [01:38:39] bblack: You can restrict syscalls quite a lot [01:39:14] I guess having open fds on the dirs would help, and then you could kill access to a lot of related syscalls. but that sounds error-prone and workaroundable. [01:39:33] Well, openssh does that AFAIK [01:39:35] using seccomp [01:39:50] yeah seccomp is a whole other ball of wax, but it's kinda awesome :) [01:42:36] Should I put a puppet patch to allow me to sudo nobody [01:42:56] or does that have implications I don't think of right now [01:53:31] hrm [01:53:55] (03PS1) 10Hoo man: Allow wikidev to sudo -u nobody [puppet] - 10https://gerrit.wikimedia.org/r/174896 [01:54:02] this channel is sooo depressing. 
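For the "sudo -u nobody curl" idea above, a sketch of what the reduced-privilege fetch would look like. It presumes the sudoers change hoo just uploaded (Gerrit 174896, letting wikidev sudo -u nobody) actually lands, which at this point it has not, and the URL and paths are placeholders:

    # Fetch as the unprivileged 'nobody' user so an exploited curl cannot read
    # your own files or keys; 'nobody' must be able to write the target, hence /tmp
    sudo -u nobody -H curl -o /tmp/donation.ogv 'http://example.org/donation.ogv'
    # Copy the result somewhere you own once the untrusted fetch has finished

This only shrinks the blast radius of a curl/TLS 0-day; it is a far cry from the seccomp-style "drop each privilege as soon as it is no longer needed" approach bblack describes, which has to be built into the tool itself.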
next you're gonna tell us that piping curl into bash is something bad [01:55:20] MaxSem: Duh... didn't you read about shellshock? Use dash or sh for that, duh [01:55:22] :'D [01:58:49] (03CR) 10Hoo man: "I (offhand) can't think of any negative implications this would have, so it seemed like a good idea to me." [puppet] - 10https://gerrit.wikimedia.org/r/174896 (owner: 10Hoo man) [02:06:20] ori: hoo https://gerrit.wikimedia.org/r/#/c/174874/ (WIP, though welcome feedback now) [02:06:35] sure it can be made much nicer [02:08:57] aude: Nice [02:09:02] will have a proper look tomorrow [02:09:16] going to bed now [02:09:35] hoo: ok [02:09:57] aude: \o/ [02:10:00] :) [02:10:01] that's great, you did that fast! [02:10:24] i'm not sure it all works correctly yet, so want to provide tests and take another look [02:10:37] seems to work for me though :) [02:13:57] <^demon|away> Aww, Max left. Was gonna say, re: curl/bash [02:13:59] <^demon|away> https://twitter.com/OpenSourceLaws/status/530803482937556992 [02:15:16] !log l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s) [02:15:19] !log LocalisationUpdate completed (1.25wmf8) at 2014-11-21 02:15:19+00:00 [02:15:23] Logged the message, Master [02:15:25] Logged the message, Master [02:26:26] !log l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s) [02:26:29] Logged the message, Master [02:26:31] !log LocalisationUpdate completed (1.25wmf9) at 2014-11-21 02:26:31+00:00 [02:26:35] Logged the message, Master [02:59:10] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [03:00:00] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [03:00:29] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 1 failures [03:12:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:12:28] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [03:17:59] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [04:06:08] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [04:06:38] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.137:9200/_cluster/health error while fetching: Request timed out. 
[04:07:28] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 41, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 122, initializing_shards: 0, number_of_data_nodes: 3 [04:19:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 21 04:19:30 UTC 2014 (duration 19m 29s) [04:19:34] Logged the message, Master [04:25:29] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [04:35:36] (03PS1) 10Rush: Revert "bugzilla handles characters that are invalid for api" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/174905 [04:36:48] (03CR) 10Rush: [C: 032 V: 032] Revert "bugzilla handles characters that are invalid for api" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/174905 (owner: 10Rush) [05:33:38] PROBLEM - Disk space on dataset1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:28:38] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:19] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:30] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:49] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:49] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:52] (03CR) 10Chad: "If we cherry pick I266ed820 to 1.25wmf8 (it's already in master & wmf9) we could go ahead with Aaron's config change in I42742431." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174372 (owner: 10Tim Starling) [06:38:20] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:18] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:45:19] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:45:48] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:46:38] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:47:09] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:54:18] (03CR) 10TTO: [C: 04-1] Enable VisualEditor Beta Feature on other wikis too (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174793 (owner: 10Jforrester) [06:56:58] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [07:15:37] (03PS1) 10Giuseppe Lavagetto: HAT: start the jemalloc disabler _after_ apache is started and configured [puppet] - 10https://gerrit.wikimedia.org/r/174918 [07:16:46] (03CR) 10Giuseppe Lavagetto: [C: 032] HAT: start the jemalloc disabler _after_ apache is started and configured [puppet] - 10https://gerrit.wikimedia.org/r/174918 (owner: 10Giuseppe Lavagetto) [07:19:13] (03PS1) 10Giuseppe Lavagetto: mediawiki: include admin on the new appservers [puppet] - 10https://gerrit.wikimedia.org/r/174919 [07:19:40] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: include admin on the new appservers [puppet] - 10https://gerrit.wikimedia.org/r/174919 (owner: 
10Giuseppe Lavagetto) [07:26:48] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [07:26:49] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [07:28:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [07:28:48] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [07:35:29] RECOVERY - Host mw1227 is UP: PING OK - Packet loss = 0%, RTA = 1.67 ms [07:37:29] PROBLEM - check if dhclient is running on mw1227 is CRITICAL: Connection refused by host [07:37:59] PROBLEM - Disk space on mw1227 is CRITICAL: Connection refused by host [07:37:59] PROBLEM - check configured eth on mw1227 is CRITICAL: Connection refused by host [07:37:59] PROBLEM - nutcracker port on mw1227 is CRITICAL: Connection refused by host [07:37:59] PROBLEM - SSH on mw1227 is CRITICAL: Connection refused [07:37:59] PROBLEM - HHVM processes on mw1227 is CRITICAL: Connection refused by host [07:37:59] PROBLEM - check if salt-minion is running on mw1227 is CRITICAL: Connection refused by host [07:38:00] PROBLEM - RAID on mw1227 is CRITICAL: Connection refused by host [07:38:00] PROBLEM - DPKG on mw1227 is CRITICAL: Connection refused by host [07:38:01] PROBLEM - Apache HTTP on mw1227 is CRITICAL: Connection refused [07:38:01] PROBLEM - nutcracker process on mw1227 is CRITICAL: Connection refused by host [07:38:02] PROBLEM - HHVM rendering on mw1227 is CRITICAL: Connection refused [07:41:58] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [07:41:58] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[08:02:45] PROBLEM - puppet last run on mw1227 is CRITICAL: Connection refused by host [08:02:46] PROBLEM - HHVM processes on mw1227 is CRITICAL: Connection refused by host [08:03:16] PROBLEM - HHVM rendering on mw1227 is CRITICAL: Connection refused [08:03:27] <_joe_> that's me doing imaging, I should just ack those sorry [08:03:34] PROBLEM - RAID on mw1227 is CRITICAL: Connection refused by host [08:03:39] <_joe_> I am still battling with a circular dependency in puppet [08:04:04] PROBLEM - check configured eth on mw1227 is CRITICAL: Connection refused by host [08:04:14] PROBLEM - check if dhclient is running on mw1227 is CRITICAL: Connection refused by host [08:04:25] PROBLEM - Apache HTTP on mw1227 is CRITICAL: Connection refused [08:04:25] PROBLEM - check if salt-minion is running on mw1227 is CRITICAL: Connection refused by host [08:04:25] PROBLEM - DPKG on mw1227 is CRITICAL: Connection refused by host [08:04:25] PROBLEM - mediawiki-installation DSH group on mw1227 is CRITICAL: Host mw1227 is not in mediawiki-installation dsh group [08:15:30] (03PS1) 10Giuseppe Lavagetto: HAT: convert exec to cron to avoid chicken-and-egg dependency [puppet] - 10https://gerrit.wikimedia.org/r/174925 [08:24:16] (03CR) 10Faidon Liambotis: [C: 04-1] "As said on IRC, I slightly prefer the original way :)" [puppet] - 10https://gerrit.wikimedia.org/r/174925 (owner: 10Giuseppe Lavagetto) [08:33:20] (03PS1) 10Giuseppe Lavagetto: HAT: create mediawiki::hhvm::housekeeping [puppet] - 10https://gerrit.wikimedia.org/r/174926 [08:36:46] (03CR) 10Giuseppe Lavagetto: [C: 032] HAT: create mediawiki::hhvm::housekeeping [puppet] - 10https://gerrit.wikimedia.org/r/174926 (owner: 10Giuseppe Lavagetto) [08:36:50] <_joe_> eeew [08:43:05] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.007 second response time [08:47:35] RECOVERY - check configured eth on mw1227 is OK: NRPE: Unable to read output [08:48:05] RECOVERY - check if dhclient is running on mw1227 is OK: PROCS OK: 0 processes with command name dhclient [08:48:14] RECOVERY - check if salt-minion is running on mw1227 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [08:48:14] RECOVERY - DPKG on mw1227 is OK: All packages OK [08:48:15] RECOVERY - RAID on mw1227 is OK: OK: no RAID installed [08:48:35] RECOVERY - HHVM processes on mw1227 is OK: PROCS OK: 1 process with command name hhvm [08:48:35] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 31 minutes ago with 0 failures [09:13:25] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 3.903 second response time [09:19:05] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [09:19:15] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [09:23:03] PROBLEM - mediawiki-installation DSH group on mw1231 is CRITICAL: Host mw1231 is not in mediawiki-installation dsh group [09:23:45] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Puppet has 112 failures [09:24:04] PROBLEM - mediawiki-installation DSH group on mw1232 is CRITICAL: Host mw1232 is not in mediawiki-installation dsh group [09:24:33] PROBLEM - mediawiki-installation DSH group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group [09:25:04] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: Puppet has 112 failures [09:25:04] PROBLEM - DPKG on mw1232 is CRITICAL: DPKG CRITICAL dpkg reports broken packages 
[09:27:54] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 112 failures [09:28:02] (03CR) 10Ebrahim: "Any update on internal bug tracker on this?" [debs/librsvg] - 10https://gerrit.wikimedia.org/r/173639 (owner: 10Ebrahim) [09:28:14] RECOVERY - DPKG on mw1232 is OK: All packages OK [09:30:54] (03CR) 10Alexandros Kosiaris: "What the ..? The catalog should not even compile with that syntax error. And it did! Successfully, multiple times. Both on puppet compiler" [puppet] - 10https://gerrit.wikimedia.org/r/174783 (owner: 10Yuvipanda) [09:31:53] <_joe_> akosiaris: really fucking weird, ain't it? [09:32:24] PROBLEM - HHVM rendering on mw1231 is CRITICAL: Connection refused [09:32:24] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [09:34:03] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [09:34:24] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.920 second response time [09:35:07] _joe_: yup.... [09:35:17] frustrating tbh [09:35:45] Relying on a compiler to make sure your change is ok and then it not being ok is not cool... [09:35:49] not cool at all :-( [09:35:56] <_joe_> I know [09:36:07] <_joe_> btw it wasn't *always* failing [09:36:15] <_joe_> which sounds super-weird to me [09:36:29] I know. I tried debugging the one you pointed out to me yesterday [09:36:35] es200X something.. [09:36:41] <_joe_> 2008 I guess [09:36:42] I ran puppet 3-4 times and 0 errors [09:36:53] PROBLEM - Host mw1231 is DOWN: PING CRITICAL - Packet loss = 100% [09:37:07] <_joe_> that's me dist-upgrading and rebooting ^^ [09:37:23] PROBLEM - HHVM rendering on mw1232 is CRITICAL: Connection refused [09:37:34] RECOVERY - Host mw1231 is UP: PING OK - Packet loss = 0%, RTA = 1.22 ms [09:38:24] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.968 second response time [09:40:17] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [09:40:21] <_joe_> akosiaris: bwt wmf-reimage is now really really useful [09:43:25] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 112 failures [09:43:44] PROBLEM - mediawiki-installation DSH group on mw1234 is CRITICAL: Host mw1234 is not in mediawiki-installation dsh group [09:44:29] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 8 failures [09:44:34] PROBLEM - Host mw1232 is DOWN: PING CRITICAL - Packet loss = 100% [09:45:04] RECOVERY - Host mw1232 is UP: PING WARNING - Packet loss = 61%, RTA = 0.71 ms [09:45:05] PROBLEM - mediawiki-installation DSH group on mw1233 is CRITICAL: Host mw1233 is not in mediawiki-installation dsh group [09:52:54] PROBLEM - HHVM rendering on mw1233 is CRITICAL: Connection refused [09:54:04] PROBLEM - Apache HTTP on mw1233 is CRITICAL: Connection refused [09:55:55] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 4.557 second response time [09:56:04] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.068 second response time [09:59:45] PROBLEM - Host mw1233 is DOWN: PING CRITICAL - Packet loss = 100% [10:00:06] RECOVERY - Host mw1233 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms [10:00:47] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:01:46] RECOVERY - puppet last run on 
mw1233 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [10:03:26] PROBLEM - mediawiki-installation DSH group on mw1235 is CRITICAL: Host mw1235 is not in mediawiki-installation dsh group [10:06:22] (03CR) 10Alexandros Kosiaris: [C: 032] syslog: deprecate /home/wikipedia/syslog [puppet] - 10https://gerrit.wikimedia.org/r/174673 (owner: 10Filippo Giunchedi) [10:07:27] PROBLEM - DPKG on mw1235 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:07:57] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 112 failures [10:08:20] (03PS2) 10Filippo Giunchedi: syslog: deprecate /home/wikipedia/syslog [puppet] - 10https://gerrit.wikimedia.org/r/174673 [10:08:27] RECOVERY - DPKG on mw1235 is OK: All packages OK [10:08:29] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] syslog: deprecate /home/wikipedia/syslog [puppet] - 10https://gerrit.wikimedia.org/r/174673 (owner: 10Filippo Giunchedi) [10:08:48] (03PS1) 10Giuseppe Lavagetto: dsh: add 7 new appservers to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/174929 [10:09:13] (03PS2) 10Giuseppe Lavagetto: dsh: add 7 new appservers to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/174929 [10:09:29] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] dsh: add 7 new appservers to mediawiki-installation [puppet] - 10https://gerrit.wikimedia.org/r/174929 (owner: 10Giuseppe Lavagetto) [10:10:09] <_joe_> godog: can I merge? [10:10:20] haha yeah, was going to write the same question [10:10:23] <_joe_> done [10:10:28] thanks [10:10:33] ooh new servers? [10:10:34] fancy? [10:10:52] <_joe_> paravoid: yep [10:11:10] :) [10:11:13] <_joe_> I'm adding a few to api ASAP [10:11:18] nice [10:11:20] hhvm I assume :) [10:11:23] <_joe_> hhvm yes [10:11:51] <_joe_> we're a few days away from removing pool distinction, I see no point in creating zend appservers anymore [10:11:52] godog: \o/ [10:12:00] die /h/w/syslog die [10:12:05] <_joe_> eheheh [10:13:02] <_joe_> it's really incredible how 2 small silly scripts can ease your life; I'm imaging 15 servers this morning, and it's really really easy to do [10:13:22] <_joe_> while it was a true pain when I did the first ones [10:13:47] paravoid: haha yes! 
much more painless than I had imagined, luckily [10:17:27] PROBLEM - HHVM rendering on mw1235 is CRITICAL: Connection refused [10:19:16] <_joe_> godog: graphite1001 has a full root filesystem [10:19:45] <_joe_> sorry, it's /var/lib/carbon [10:19:58] <_joe_> anag is hard to read sometimes [10:21:31] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.631 second response time [10:22:08] _joe_: thanks, I'm taking a look [10:22:40] <_joe_> I'm looking at icinga to downtime the hosts so that I don't pollute the channel [10:23:02] RECOVERY - mediawiki-installation DSH group on mw1231 is OK: OK [10:23:20] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:24:01] RECOVERY - mediawiki-installation DSH group on mw1232 is OK: OK [10:24:30] RECOVERY - mediawiki-installation DSH group on mw1230 is OK: OK [10:29:07] <_joe_> !log pooled mw1227 in the api pool [10:29:11] Logged the message, Master [10:31:51] RECOVERY - mediawiki-installation DSH group on mw1227 is OK: OK [10:32:36] (03PS1) 10Steinsplitter: adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174930 [10:32:40] (03CR) 10jenkins-bot: [V: 04-1] adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174930 (owner: 10Steinsplitter) [10:33:28] <_joe_> !log pooled mw1230 in the api pool [10:33:31] Logged the message, Master [10:37:34] (03Abandoned) 10Steinsplitter: adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174930 (owner: 10Steinsplitter) [10:40:53] <_joe_> !log pooled mw1231,mw1232,mw1233 in the api pool [10:40:56] Logged the message, Master [10:41:46] (03PS1) 10Steinsplitter: adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174931 [10:43:06] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Puppet has 112 failures [10:43:26] PROBLEM - mediawiki-installation DSH group on mw1242 is CRITICAL: Host mw1242 is not in mediawiki-installation dsh group [10:43:27] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 112 failures [10:43:29] <_joe_> gee, not fast enough [10:43:45] RECOVERY - mediawiki-installation DSH group on mw1234 is OK: OK [10:44:05] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Puppet has 112 failures [10:45:16] RECOVERY - mediawiki-installation DSH group on mw1233 is OK: OK [10:45:17] PROBLEM - mediawiki-installation DSH group on mw1238 is CRITICAL: Host mw1238 is not in mediawiki-installation dsh group [10:47:25] (03CR) 10Alexandros Kosiaris: [C: 031] "LGTM. For the interested, this will help move the top level txstatsd metrics created by each of the swift machines down the hierarchy" [puppet] - 10https://gerrit.wikimedia.org/r/174675 (owner: 10Filippo Giunchedi) [10:52:52] Hi, I need a patch to be merged: https://gerrit.wikimedia.org/r/#/c/174931/ [10:52:56] who can do that? [10:55:05] Romaine: I don't think that will happen before next Monday. [10:55:16] owww [10:55:26] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:55:46] the subject is that I am currently at a museum on location, the museum has announced a big GWT upload and needs a domain whitelisted [10:55:54] This week's deployments are done. 
[10:56:36] Well, perhaps you could get that deployed if you are able to convince someone that it's an "emergency deployment". :P [10:56:41] <_joe_> also, big GWT uploads need to be coordinated with ops :) [10:57:12] <_joe_> I'm almost sure there is a big warning somewhere [10:57:49] the upload is 342 items, and is announced as big donation of the museum collaborating with Wikimedia [10:58:13] <_joe_> what are those 342 items? [10:58:16] <_joe_> images? [10:58:36] <_joe_> and if so, are they very large (~ 100 MB or more)? [10:58:42] images yes [10:59:01] > 1 MB [10:59:16] sorry [10:59:21] smaller than 1 MB each [10:59:25] <_joe_> oh ok [10:59:32] <_joe_> then you have a green light from ops :) [11:00:13] thank you [11:00:42] _joe_: please +1 on the change then :) [11:01:11] <_joe_> matanya: in a minute, I'm building 10 servers ATM [11:01:26] seen, enjoy and thanks! [11:02:23] (03CR) 10Giuseppe Lavagetto: [C: 031] "Romaine said this is for a special-event GWT upload; it's going to be 342 images, all less than a MB, so it shouldn't be a problem for the" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174931 (owner: 10Steinsplitter) [11:03:25] RECOVERY - mediawiki-installation DSH group on mw1235 is OK: OK [11:05:16] <_joe_> !log pooled mw1234,mw1235 in the api pool [11:05:20] Logged the message, Master [11:05:36] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [11:06:44] (03CR) 10Gage: "Hmm, tin:/etc/dsh/apaches is currently a human-edited list of all mw1xxx nodes except mw1053 and mw1163. It does not include other nodes w" [puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [11:06:56] thank you all for the work and help [11:07:51] (03PS1) 10Filippo Giunchedi: install-server: add swift cache partition [puppet] - 10https://gerrit.wikimedia.org/r/174932 [11:09:16] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:09:42] (03CR) 10Giuseppe Lavagetto: "/etc/dsh/group/apaches should not be used btw; we should use /etc/dsh/group/mediawiki-installation I guess;" [puppet] - 10https://gerrit.wikimedia.org/r/160953 (owner: 10Alexandros Kosiaris) [11:10:37] <_joe_> Romaine: I'll get back to you about merging the change when I have some more info [11:10:56] <_joe_> Romaine: is this time-constrained in some way? [11:11:28] <_joe_> meaning you need it before some hour today? [11:12:27] I am sitting here at the museum, if it takes 5 hours to have it merged it can be difficult, but if it takes 1 hour it is no problem [11:13:28] <_joe_> Romaine: I can technically do that, but I don't want to step on the toes of people doing releases normally. 
Releases on friday are a big no-no as far as I know [11:13:42] <_joe_> and there are good reasons for that [11:14:31] the museum has announced it big, also in collaboration with Europeana [11:16:04] <_joe_> I understand [11:16:18] <_joe_> on the other hand, there are rules, which are there for a reason [11:16:31] <_joe_> and I don't feel like breaking rules not imposed by my team [11:17:26] I understand [11:18:17] the museum is the first here and also very good PR for WMF [11:18:59] <_joe_> Romaine: if I didn't understand this is relevant, I wouldn't have stopped other things I am doing trying to reach someone else :) [11:19:10] great :) [11:35:23] (03CR) 10Mark Bergsma: [C: 032] adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174931 (owner: 10Steinsplitter) [11:35:31] (03Merged) 10jenkins-bot: adding openfashion.momu.be to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174931 (owner: 10Steinsplitter) [11:36:55] (03PS2) 10Filippo Giunchedi: txstatsd: gather runtime self metrics under statsd [puppet] - 10https://gerrit.wikimedia.org/r/174675 [11:37:07] you are free to go Romaine :) [11:37:09] (03PS3) 10Filippo Giunchedi: txstatsd: gather runtime self metrics under statsd [puppet] - 10https://gerrit.wikimedia.org/r/174675 [11:37:19] as it seems mark merged it [11:38:04] (lunch time) [11:38:13] Romaine: _joe_ : hello [11:38:58] _joe_ : my hangout was not connected on my desktop, I received the message on my mobile which was out of reach :( [11:39:08] !log mark Synchronized wmf-config/InitialiseSettings.php: add openfashion.momu.be to wgCopyUploadsDomains (duration: 00m 06s) [11:39:10] Logged the message, Master [11:39:14] <_joe_> hashar: np :) [11:39:28] <_joe_> hashar: but mark already did this [11:40:02] https://meta.wikimedia.org/wiki/Talk:Global_AbuseFilter#Can.E2.80.99t_log_in Could someone look into this? [11:40:02] the most helpful people i have ever seen, i admire you guys [11:40:03] awesome! [11:40:05] thank you both [11:40:16] Glam toolset is quite an awesome project. 
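For the record, a quick way to double-check that a wmf-config change like this one really went out after the sync is to grep the staged copy on tin (the same /srv/mediawiki-staging tree the icinga "unmerged changes" check watches); the domain string here is just the one from the change:

    # On tin: confirm the whitelist entry is present in the staged, synced config
    grep -n 'openfashion.momu.be' /srv/mediawiki-staging/wmf-config/InitialiseSettings.php

The read-only copies published on noc.wikimedia.org show the same files for anyone without shell access to tin.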
[11:41:25] I am heading back to hack a PCB with a bunch of surface mounted component [11:41:29] poke me again if needed [11:47:23] Lousy internet here [12:08:01] (03PS1) 10Giuseppe Lavagetto: dsh: add new servers in a few groups [puppet] - 10https://gerrit.wikimedia.org/r/174935 [12:08:08] (03CR) 10jenkins-bot: [V: 04-1] dsh: add new servers in a few groups [puppet] - 10https://gerrit.wikimedia.org/r/174935 (owner: 10Giuseppe Lavagetto) [12:09:20] (03PS2) 10Giuseppe Lavagetto: dsh: add new servers in a few groups [puppet] - 10https://gerrit.wikimedia.org/r/174935 [12:23:34] PROBLEM - mediawiki-installation DSH group on mw1236 is CRITICAL: Host mw1236 is not in mediawiki-installation dsh group [12:25:14] (03CR) 10Giuseppe Lavagetto: [C: 032] dsh: add new servers in a few groups [puppet] - 10https://gerrit.wikimedia.org/r/174935 (owner: 10Giuseppe Lavagetto) [12:25:14] PROBLEM - mediawiki-installation DSH group on mw1246 is CRITICAL: Host mw1246 is not in mediawiki-installation dsh group [12:25:14] PROBLEM - DPKG on mw1246 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:29:16] RECOVERY - DPKG on mw1246 is OK: All packages OK [12:29:35] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Puppet has 111 failures [12:34:25] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [12:34:35] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [12:43:34] RECOVERY - mediawiki-installation DSH group on mw1242 is OK: OK [12:45:24] RECOVERY - mediawiki-installation DSH group on mw1238 is OK: OK [12:48:54] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:48:55] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [12:55:34] (03CR) 10Alexandros Kosiaris: [C: 031] install-server: add swift cache partition [puppet] - 10https://gerrit.wikimedia.org/r/174932 (owner: 10Filippo Giunchedi) [13:08:20] !log fresh dump db1046 to db2011 [13:08:24] Logged the message, Master [13:12:05] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [13:23:25] RECOVERY - mediawiki-installation DSH group on mw1236 is OK: OK [13:25:14] RECOVERY - mediawiki-installation DSH group on mw1246 is OK: OK [13:29:24] (03PS6) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [13:31:25] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [13:32:07] (03PS7) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [13:36:53] (03PS1) 10Yuvipanda: shinken: Don't specify any contacts for services [puppet] - 10https://gerrit.wikimedia.org/r/174944 [13:37:13] (03CR) 10Yuvipanda: [C: 032] shinken: Don't specify any contacts for services [puppet] - 10https://gerrit.wikimedia.org/r/174944 (owner: 10Yuvipanda) [13:43:02] (03PS1) 10Cmjohnson: Adding dns entries for mw1247-58 [dns] - 10https://gerrit.wikimedia.org/r/174945 [13:44:44] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for mw1247-58 [dns] - 10https://gerrit.wikimedia.org/r/174945 (owner: 10Cmjohnson) [13:45:01] <_joe_> cmjohnson: we have a couple of errors right now [13:45:34] <_joe_> cmjohnson: mw1243 is unreachable via ipmi; also, mw1228.mgmt.eqiad.wmnet points to mw1227 [13:45:37] _joe_ recognized a couple of things ysterday...mw1227=1229 didn't install 
correctly..i set them with wrong ip's in idrack [13:45:54] it's fixed now but will need re-install [13:46:06] <_joe_> cmjohnson: not an issue [13:46:08] i also fixed all so HT is enabled [13:46:16] <_joe_> and the idrac of mw1228? [13:46:26] i need to fix the dhcp file [13:46:44] (03PS1) 10Yuvipanda: shinken: Fix email commands to work properly [puppet] - 10https://gerrit.wikimedia.org/r/174946 [13:47:01] <_joe_> cmjohnson: mw1227 is in prod right now, so you may want me to depool it if we need to reinstall it [13:47:08] I will make that change when I add 1247-58 [13:47:17] oh...please depool it [13:47:32] <_joe_> ok [13:48:17] <_joe_> !log depooled mw1227 [13:48:23] Logged the message, Master [13:48:32] <_joe_> so once mw1227-9 are ok, let me know [13:48:43] <_joe_> I want to have those in the api pool this weekend [13:49:02] <_joe_> and thanks a bunch for HT, I've seen you did that everywhere already [13:49:33] <_joe_> (btw, I'm slightly confused, I thought we bought the 10-core processors, we've got 8 cores instead) [13:51:02] _joe_ mw1243 ipmi should be enabled now [13:51:15] <_joe_> cmjohnson: thanks [13:51:22] _joe_ you will have all of 1246-1258 shortly [13:51:43] sorry 1221-1258 [13:51:43] <_joe_> cmjohnson: wow [13:51:53] <_joe_> oh the first ones too? [13:51:58] <_joe_> that's great [13:52:13] <_joe_> I'd finish installing the ones for the api pool [13:52:23] they'll be next week. once you have all of these going I will want to take down mw1201-3 and 1208-10 [13:52:50] I have room in the rack for 2 of those ^ any of them more important than the others? [13:53:06] <_joe_> yes the ones between 21 and 29 [13:54:03] oh..i meant out of the 6 that need to be moved (mw1201-203 or 1208-1210) [13:54:12] <_joe_> oh sorry :) [13:54:23] <_joe_> lemme take a look [13:55:46] <_joe_> cmjohnson: 01-03 are more important (they're API appservers) [13:56:10] <_joe_> but I don't think we're ready to turn those off now [13:56:29] <_joe_> how long a downtime would you expect? 
[13:58:57] i can keep 2 of them in the rack so about 2 mins [13:59:12] the 3rd not sure I need to find a home [13:59:43] prolly going to put in same row though so no dns changes if that's the case will be about 5 mins [14:04:06] if you'd move the older servers within the same row [14:04:11] just keep them in the current rack [14:04:21] not worth the effort at this time, we could do it later if needed too [14:04:38] okay [14:05:00] put the new ones there instead then [14:09:01] (03PS2) 10Yuvipanda: shinken: Fix email commands to work properly [puppet] - 10https://gerrit.wikimedia.org/r/174946 [14:09:47] (03PS3) 10Yuvipanda: shinken: Fix email commands to work properly [puppet] - 10https://gerrit.wikimedia.org/r/174946 [14:19:02] (03CR) 10Yuvipanda: [C: 032] shinken: Fix email commands to work properly [puppet] - 10https://gerrit.wikimedia.org/r/174946 (owner: 10Yuvipanda) [14:19:24] RECOVERY - Disk space on graphite1001 is OK: DISK OK [14:20:09] (03PS1) 10Cmjohnson: Adding and correcting new app servers dhcpd file [puppet] - 10https://gerrit.wikimedia.org/r/174949 [14:21:14] (03PS8) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [14:23:23] (03PS9) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [14:28:42] (03CR) 10Cmjohnson: [C: 032] Adding and correcting new app servers dhcpd file [puppet] - 10https://gerrit.wikimedia.org/r/174949 (owner: 10Cmjohnson) [14:28:52] (03PS1) 10Hoo man: Also whitelist IPv6 ips for bouncehandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174950 [14:29:02] Jeff_Green: tonythomas ^ [14:29:09] if you're ok with that, I can push it in a minute [14:29:17] looking [14:29:20] hope I c&p the right thing [14:29:48] * tonythomas liked the commit message though :) [14:30:08] hoo remove the last IP, I don't think it's valid [14:30:25] (my mistake, I just sent tony everything that looked like ipv6 on eth0) [14:30:31] 2620:0:861:3:ca1f:66ff:febf:8dd6 ? [14:30:39] yes [14:31:29] Jeff_Green: mh... that one is pingable [14:31:38] oh, ok, leave it then [14:31:43] but it doesn't have reverse record, so I'm not sure where it ends up [14:31:47] :P [14:31:49] * Jeff_Green thinks we should just allow all wmf subnets by CIDR [14:31:57] Shared secret? :P [14:32:03] We can do both, actually [14:32:18] allowing external requests, even if they know the secret sounds wrong [14:32:26] hoo: seems like overkill, we already have plenty of security via the VERP hash [14:32:46] mh [14:33:04] I guess I should read up on VERP more so that I can actually assess that [14:33:07] the IP filter's job imo is to short-circuit the more heavy hash check, for external subnets [14:33:08] if that's the case, nice [14:33:25] (03CR) 10Hoo man: [C: 032] "Trivial change is trivial" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174950 (owner: 10Hoo man) [14:33:32] (03Merged) 10jenkins-bot: Also whitelist IPv6 ips for bouncehandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174950 (owner: 10Hoo man) [14:33:37] hoo we encrypt a string (with HMAC? i forget now...) and check it on the return [14:34:14] Jeff_Green: Who has that string? Only we or also the mail recipient? 
[14:34:32] it's embedded in the envelope recipient when the message leaves mediawiki [14:34:39] !log hoo Synchronized wmf-config/CommonSettings.php: Also whitelist IPv6 ips for bouncehandler (duration: 00m 08s) [14:34:43] wiki-wikiname-{biguglystring}@wm.o [14:34:44] Logged the message, Master [14:34:56] when the message comes back, it's the envelope recipient [14:34:57] my first deploy without ssh-agent (except for my gerrit key :S) [14:35:02] ooh fun [14:35:11] quoting csteipp on hmac hash we use " * The generated hash is cut down to 12 ( 96 bits ) instead of the full 120 bits. An attacker would be able to brute force the signature by sending an average of 2^95 emails to us. We would (hopefully) notice that. " [14:35:53] that sounds reasonably safe :P [14:36:02] so the mailserver authentication should be very lightweight [14:36:10] true :) [14:36:20] it's really just to protect mw from having to do a whole lot of decrypting [14:36:29] let me send the fake email again - or should I wait some time to see the changes ? [14:36:41] tonythomas: No, that is live now [14:37:14] PROBLEM - Apache HTTP on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50748 bytes in 0.009 second response time [14:37:15] PROBLEM - HHVM rendering on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50748 bytes in 0.023 second response time [14:38:00] (03PS1) 10Hoo man: Bouncehandler: Use the API pool for API requests [puppet] - 10https://gerrit.wikimedia.org/r/174951 [14:38:37] hoo: we missed that ! omg [14:38:46] missed what? [14:38:57] api.svc.${::mw_primary}.wmnet/w/api.php ? [14:39:06] Jeff_Green: saw this ? https://gerrit.wikimedia.org/r/#/c/174951/1/manifests/role/mail.pp [14:39:08] Well, it's not really a problem [14:39:12] okey :) [14:39:21] let me send the new email [14:39:23] just that for sanity reasons API requests should go to that pool [14:39:39] send ! [14:39:44] Jeff_Green: exim logs look fine ? [14:40:23] looking [14:40:58] k :) [14:41:14] * tonythomas hope something will show up this time in bounce_records table [14:41:22] or maybe in wgDebugLogs ! :P [14:41:55] tonythomas: yeah, the last delivery looked ok [14:41:59] cool [14:42:05] hoo: db lookup ? [14:42:08] it hit the mwverpbounceprocessor and competed [14:42:11] completed [14:42:15] PROBLEM - Host mw1230 is DOWN: PING CRITICAL - Packet loss = 100% [14:42:25] tonythomas: If you tell me the name of the log group, we can add it to wgDebugLogGroups [14:42:54] PROBLEM - Host mw1231 is DOWN: PING CRITICAL - Packet loss = 100% [14:43:04] wfDebugLog( 'BounceHandler', "POST received " ); [14:43:05] PROBLEM - Host mw1232 is DOWN: PING CRITICAL - Packet loss = 100% [14:43:05] Table is still empty [14:43:08] BounceHandler ? [14:43:09] _joe_ [14:43:16] bah [14:43:19] tonythomas: Ok, I can add that [14:43:31] so - it will log only from the next one ? [14:43:37] should i send the same again ?
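A rough sketch of the VERP check Jeff_Green and the csteipp quote describe above: the bounce address carries a truncated HMAC over some payload, and the bounce processor recomputes it before doing anything expensive. Everything concrete below (payload layout, SHA-1, the example secret, the address suffix) is an illustration, not the BounceHandler extension's actual format:

    # Outgoing: sign a payload and keep the first 24 hex chars (96 bits) of the HMAC
    secret='not-the-real-secret'
    payload='wikiname-12345-20141121'
    sig=$(printf '%s' "$payload" | openssl dgst -sha1 -hmac "$secret" | awk '{print $NF}' | cut -c1-24)
    echo "wiki-${payload}-${sig}@wikimedia.org"
    # Incoming: recompute the HMAC from the payload found in the envelope recipient
    # and compare it with the embedded signature; reject cheaply on mismatch

The IP whitelist that was just extended to IPv6 sits in front of this purely so that arbitrary internet hosts cannot make MediaWiki do even that much work.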
[14:43:46] tonythomas: Wait a bit [14:43:51] that needs a configuration change [14:43:51] okey :) [14:43:55] PROBLEM - Host mw1233 is DOWN: CRITICAL - Plugin timed out after 15 seconds [14:44:04] RECOVERY - Host mw1230 is UP: PING WARNING - Packet loss = 58%, RTA = 2.03 ms [14:44:54] RECOVERY - Host mw1231 is UP: PING OK - Packet loss = 0%, RTA = 3.68 ms [14:44:55] RECOVERY - Host mw1232 is UP: PING OK - Packet loss = 0%, RTA = 2.73 ms [14:45:04] (03PS1) 10Hoo man: Add 'BounceHandler' to wgDebugLogGroups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174952 [14:45:16] RECOVERY - Host mw1233 is UP: PING WARNING - Packet loss = 37%, RTA = 0.43 ms [14:45:25] tonythomas: You only log on API requests and not in code paths that are hit extremely often, I guess? [14:45:32] Like hook handlers for or so [14:45:58] anyway - we have a couple of them in here https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/ApiBounceHandler.php [14:46:02] hope one of them will log [14:46:04] PROBLEM - HHVM processes on mw1230 is CRITICAL: Connection refused by host [14:46:05] PROBLEM - check if dhclient is running on mw1230 is CRITICAL: Connection refused by host [14:46:05] PROBLEM - SSH on mw1230 is CRITICAL: Connection refused [14:46:14] PROBLEM - puppet last run on mw1230 is CRITICAL: Connection refused by host [14:46:15] PROBLEM - DPKG on mw1230 is CRITICAL: Connection refused by host [14:46:15] PROBLEM - nutcracker port on mw1230 is CRITICAL: Connection refused by host [14:46:15] PROBLEM - Disk space on mw1230 is CRITICAL: Connection refused by host [14:46:25] PROBLEM - RAID on mw1230 is CRITICAL: Connection refused by host [14:46:25] PROBLEM - nutcracker process on mw1230 is CRITICAL: Connection refused by host [14:46:25] PROBLEM - check configured eth on mw1230 is CRITICAL: Connection refused by host [14:46:25] PROBLEM - check if salt-minion is running on mw1230 is CRITICAL: Connection refused by host [14:46:25] PROBLEM - HHVM rendering on mw1230 is CRITICAL: Connection refused [14:46:34] grep-ing your extension for that looks good at a glance [14:46:34] PROBLEM - Apache HTTP on mw1230 is CRITICAL: Connection refused [14:46:35] let's do that [14:46:50] okey :) [14:46:55] PROBLEM - HHVM processes on mw1231 is CRITICAL: Connection refused by host [14:47:04] PROBLEM - HHVM rendering on mw1231 is CRITICAL: Connection refused [14:47:05] PROBLEM - Disk space on mw1232 is CRITICAL: Connection refused by host [14:47:14] PROBLEM - Apache HTTP on mw1232 is CRITICAL: Connection refused [14:47:15] PROBLEM - nutcracker process on mw1232 is CRITICAL: Connection refused by host [14:47:15] PROBLEM - puppet last run on mw1231 is CRITICAL: Connection refused by host [14:47:15] PROBLEM - HHVM processes on mw1232 is CRITICAL: Connection refused by host [14:47:15] PROBLEM - SSH on mw1232 is CRITICAL: Connection refused [14:47:15] PROBLEM - RAID on mw1231 is CRITICAL: Connection refused by host [14:47:15] PROBLEM - puppet last run on mw1232 is CRITICAL: Connection refused by host [14:47:24] PROBLEM - RAID on mw1232 is CRITICAL: Connection refused by host [14:47:29] PROBLEM - RAID on mw1233 is CRITICAL: Connection refused by host [14:47:29] PROBLEM - nutcracker process on mw1233 is CRITICAL: Connection refused by host [14:47:30] !log mw1230-1233 down --reinstalling [14:47:34] PROBLEM - Apache HTTP on mw1231 is CRITICAL: Connection refused [14:47:34] PROBLEM - check if dhclient is running on mw1231 is CRITICAL: Connection refused by host [14:47:34] PROBLEM - nutcracker process on mw1231 is 
CRITICAL: Connection refused by host [14:47:34] PROBLEM - puppet last run on mw1233 is CRITICAL: Connection refused by host [14:47:34] PROBLEM - SSH on mw1231 is CRITICAL: Connection refused [14:47:35] PROBLEM - Disk space on mw1231 is CRITICAL: Connection refused by host [14:47:35] PROBLEM - check if salt-minion is running on mw1231 is CRITICAL: Connection refused by host [14:47:36] PROBLEM - check configured eth on mw1232 is CRITICAL: Connection refused by host [14:47:36] Logged the message, Master [14:47:42] cmjohnson: Are those in dsh? [14:47:45] PROBLEM - check if salt-minion is running on mw1232 is CRITICAL: Connection refused by host [14:47:45] PROBLEM - nutcracker port on mw1233 is CRITICAL: Connection refused by host [14:47:45] PROBLEM - DPKG on mw1231 is CRITICAL: Connection refused by host [14:47:45] PROBLEM - HHVM rendering on mw1232 is CRITICAL: Connection refused [14:47:45] PROBLEM - Disk space on mw1233 is CRITICAL: Connection refused by host [14:47:45] PROBLEM - check if salt-minion is running on mw1233 is CRITICAL: Connection refused by host [14:47:45] PROBLEM - nutcracker port on mw1231 is CRITICAL: Connection refused by host [14:47:46] PROBLEM - Apache HTTP on mw1233 is CRITICAL: Connection refused [14:47:46] PROBLEM - check configured eth on mw1231 is CRITICAL: Connection refused by host [14:47:47] yep...it was accidental [14:47:54] PROBLEM - DPKG on mw1232 is CRITICAL: Connection refused by host [14:47:54] PROBLEM - check if dhclient is running on mw1232 is CRITICAL: Connection refused by host [14:47:55] PROBLEM - check configured eth on mw1233 is CRITICAL: Connection refused by host [14:47:55] PROBLEM - nutcracker port on mw1232 is CRITICAL: Connection refused by host [14:48:01] they have to come out [14:48:14] PROBLEM - SSH on mw1233 is CRITICAL: Connection refused [14:48:15] PROBLEM - HHVM rendering on mw1233 is CRITICAL: Connection refused [14:48:15] PROBLEM - check if dhclient is running on mw1233 is CRITICAL: Connection refused by host [14:48:24] PROBLEM - DPKG on mw1233 is CRITICAL: Connection refused by host [14:48:24] PROBLEM - HHVM processes on mw1233 is CRITICAL: Connection refused by host [14:48:28] yep [14:48:35] tonythomas: Wont deploy until that is done [14:49:13] hoo: until the above issues gets solved ? [14:49:54] tonythomas: Until those appservers haven been removed from dsh [14:50:07] well, I could do it anyway, stuff will just fail for them [14:50:16] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: puppet fail [14:50:34] hoo: dsh ? [14:50:45] Hm.. mw1227 is not 1230-1233 [14:51:17] Krinkle: That is a puppet fail [14:51:27] probably apt or so [14:51:41] can't look into puppet logs anymore, because ops are evil :P [14:51:46] tch tch :) [14:51:55] (03PS1) 10Cmjohnson: removing mw1227,1230-33 from dsh groups...reinstalling [puppet] - 10https://gerrit.wikimedia.org/r/174953 [14:51:56] mw1227 is new HHVM appserver, I think. _joe_ was imaging them [14:52:06] that also makes sense [14:52:12] mw1227 was installed wrong [14:52:48] I thought _joe_ took it out already but guess not...sorry if I slowed you down...will have back up shortly [14:52:52] mw1227 is also not responding to ssh. 
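The log-group change prepared above (Gerrit 174952) and merged a little further down gives the extension's wfDebugLog( 'BounceHandler', ... ) calls their own destination. In plain MediaWiki terms it is one line of configuration; the sketch below uses a placeholder log path, not whatever wmf-config actually points the group at:

// Core MediaWiki setting; the path is a placeholder, not the production value.
$wgDebugLogGroups['BounceHandler'] = '/var/log/mediawiki/BounceHandler.log';

// The extension then writes into that group, e.g. (verbatim from the discussion above):
wfDebugLog( 'BounceHandler', "POST received " );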
[14:53:05] PROBLEM - HHVM rendering on mw1227 is CRITICAL: Connection refused [14:53:05] PROBLEM - DPKG on mw1227 is CRITICAL: Connection refused by host [14:53:14] PROBLEM - nutcracker process on mw1227 is CRITICAL: Connection refused by host [14:53:18] (03CR) 10Cmjohnson: [C: 032] removing mw1227,1230-33 from dsh groups...reinstalling [puppet] - 10https://gerrit.wikimedia.org/r/174953 (owner: 10Cmjohnson) [14:53:24] It's never "just' puppet :P [14:53:24] PROBLEM - RAID on mw1227 is CRITICAL: Connection refused by host [14:56:24] RECOVERY - SSH on mw1230 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [14:56:25] RECOVERY - SSH on mw1232 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [14:56:26] PROBLEM - Apache HTTP on mw1227 is CRITICAL: Connection refused [14:56:34] PROBLEM - nutcracker port on mw1227 is CRITICAL: Connection refused by host [14:56:35] PROBLEM - check if salt-minion is running on mw1227 is CRITICAL: Connection refused by host [14:56:35] PROBLEM - check if dhclient is running on mw1227 is CRITICAL: Connection refused by host [14:56:35] PROBLEM - SSH on mw1227 is CRITICAL: Connection refused [14:56:41] hoo: time to go ? :) [14:56:45] PROBLEM - Disk space on mw1227 is CRITICAL: Connection refused by host [14:57:00] tonythomas: yeah [14:57:04] PROBLEM - HHVM processes on mw1227 is CRITICAL: Connection refused by host [14:57:04] (03CR) 10Hoo man: [C: 032] Add 'BounceHandler' to wgDebugLogGroups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174952 (owner: 10Hoo man) [14:57:05] PROBLEM - check configured eth on mw1227 is CRITICAL: Connection refused by host [14:57:15] (03Merged) 10jenkins-bot: Add 'BounceHandler' to wgDebugLogGroups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174952 (owner: 10Hoo man) [14:57:20] yay :) [14:57:27] time to send email again [14:57:39] Krinkle: sometimes it's just puppet :) https://gerrit.wikimedia.org/r/#/c/174783/ [14:57:45] RECOVERY - SSH on mw1231 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [14:57:45] !log hoo Synchronized wmf-config/InitialiseSettings.php: Add 'BounceHandler' to wgDebugLogGroups (duration: 00m 05s) [14:57:48] Logged the message, Master [14:58:20] well, I still saw errors, as puppet didn't yet run on tin after the push [14:58:21] but whatever [14:58:25] RECOVERY - SSH on mw1233 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [14:58:31] YuviPanda: meh, depends on your definition of puppet. An error in our puppet manifest vs. puppet being puppet. [14:58:39] :) [14:58:53] tonythomas: So... you can go ahead now [14:59:04] PROBLEM - NTP on mw1230 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:17] hoo: sent ! [14:59:24] PROBLEM - NTP on mw1231 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:32] * tonythomas hopes something show up in debugLog [14:59:35] PROBLEM - NTP on mw1232 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:55] PROBLEM - NTP on mw1233 is CRITICAL: NTP CRITICAL: No response from NTP server [15:00:44] RECOVERY - SSH on mw1227 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [15:01:23] tonythomas: Nothing yet [15:01:31] hoo: thats strage ! [15:01:40] we are looking at test2wiki, right ? [15:02:00] I haven't configured it [15:02:03] and the runJobs.php has ran, right ? [15:02:55] tonythomas: How is the job queue involved? [15:03:47] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Let's not mirror codfw yet. 
Leaving the change as is for now while we still discuss how to implement networking in codfw" [dns] - 10https://gerrit.wikimedia.org/r/174732 (owner: 10Alexandros Kosiaris) [15:03:55] the API request schedules the bounce process job in the job queue [15:04:13] https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/includes/ApiBounceHandler.php#L36 [15:04:31] I see [15:04:42] but there's no logging anywhere (except for failures) [15:04:45] so nothing to see [15:05:28] I can look into the runJobs log, though [15:05:30] hoo: can manually running a php runJobs.php shed some light ? [15:06:05] No [15:06:10] but job runs are logged [15:06:18] okey. anything there ? [15:06:24] looking [15:06:27] :) [15:07:45] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail [15:08:11] tonythomas: Ok, seems like the job is not getting run [15:08:33] hoo: like runJobs.php is not getting run automatically ? [15:08:37] no [15:08:45] your job has not been run [15:09:00] hmm .. interesting again [15:09:30] tonythomas: Are you using test or test2 for testing? [15:09:34] PROBLEM - NTP on mw1227 is CRITICAL: NTP CRITICAL: No response from NTP server [15:09:37] hoo: test2wiki [15:09:47] sending via https://test2.wikipedia.org/wiki/Special:Version [15:09:52] sorry https://test2.wikipedia.org/wiki/Special:EmailUser [15:10:01] hoo@tin:~$ sudo -u apache mwscript showJobs.php --wiki test2wiki --group [15:10:01] cirrusSearchIncomingLinkCount: 3 queued; 0 claimed (0 active, 0 abandoned); 0 delayed [15:10:08] so nothing waiting in the queue [15:10:17] okey [15:11:27] let me send a mail again - and can you quickly run showJobs.php --wiki test2wiki --group ? [15:11:42] to see something getting scheduled ( in case we missed any ) [15:11:52] I guess stuff fails before the job is submitted [15:12:09] well, actually I'm pretty sure about that ;) [15:12:45] RECOVERY - Apache HTTP on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.002 second response time [15:12:45] yep [15:12:55] hoo: and that won't be an IP check fail, right ? [15:13:02] nope [15:13:06] as we have if ( !$inRangeIP ) { wfDebugLog( " " ) } [15:13:08] tonythomas: Got the exception [15:13:13] wow [15:13:16] what's that ? [15:13:23] http://fpaste.org/152840/41658279/ [15:13:28] your using the wrong DB connection [15:13:41] * you're [15:14:05] Table 'wikishared.user' doesn't exist (10.64.16.18) [15:14:08] yep [15:14:20] of course you need to use a DB connection to your local wiki for that [15:15:01] $dbr = self::getBounceRecordDB( DB_SLAVE, $wikiId ); [15:15:03] not that [15:15:11] but just wfGetDB( DB_SLAVE ); [15:15:25] RECOVERY - NTP on mw1227 is OK: NTP OK: Offset -0.08804678917 secs [15:15:35] but we would want to use $wikiId too right / [15:15:37] ? [15:15:56] when it comes to the full cluster depending on a single extension installed on meta/login wiki ? [15:16:03] tonythomas: Is that not guarenteed to be on the local Wiki? [15:16:08] * guaranteed [15:16:15] hoo: in this case, yes [15:16:32] Well, then get a connection to that wiki id [15:16:40] but we are installing it later only on meta - and other wikis' bounces would come there too, right ?
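The Table 'wikishared.user' error in that paste is the symptom: the user lookup apparently went to the shared 'wikishared' database (where the bounce records live) rather than the wiki the user is actually on. The fix hoo sketches here, and spells out just below, is either a plain local-wiki connection or, when an arbitrary wiki id is involved, that wiki's load balancer with the connection released after use. A hedged sketch against MediaWiki's database API of the time (wfGetDB(), wfGetLB(), getConnection() and reuseConnection() are core functions; the selected table/fields and the $email / $wikiId variables are purely illustrative):

// Local-wiki lookup, as suggested above ("just wfGetDB( DB_SLAVE )"):
$dbr = wfGetDB( DB_SLAVE );
$row = $dbr->selectRow( 'user', 'user_id', array( 'user_email' => $email ), __METHOD__ );

// Lookup against an arbitrary wiki id, via its load balancer, released after use:
$lb = wfGetLB( $wikiId );
$dbr = $lb->getConnection( DB_SLAVE, array(), $wikiId );
$row = $dbr->selectRow( 'user', 'user_id', array( 'user_email' => $email ), __METHOD__ );
$lb->reuseConnection( $dbr );

The shared bounce table itself would still go through the dedicated shared-cluster connection; only the per-user data has to come from the user's home wiki, which is the distinction hoo draws below.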
[15:16:40] (but make sure to release it after use [15:16:41] ) [15:16:55] RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.006 second response time [15:17:02] tonythomas: Not sure what Aaron wanted [15:17:05] RECOVERY - Disk space on mw1227 is OK: DISK OK [15:17:15] RECOVERY - HHVM processes on mw1227 is OK: PROCS OK: 1 process with command name hhvm [15:17:16] just use the given wiki id, get a LB, and use that [15:17:20] (and release after) [15:17:23] LB -> load balancer [15:17:24] RECOVERY - check configured eth on mw1227 is OK: NRPE: Unable to read output [15:17:34] RECOVERY - nutcracker process on mw1227 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:17:35] RECOVERY - RAID on mw1227 is OK: OK: no RAID installed [15:17:45] RECOVERY - nutcracker port on mw1227 is OK: TCP OK - 0.000 second response time on port 11212 [15:17:55] RECOVERY - check if salt-minion is running on mw1227 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:17:56] RECOVERY - check if dhclient is running on mw1227 is OK: PROCS OK: 0 processes with command name dhclient [15:19:51] (03PS1) 10Alexandros Kosiaris: codfw row B labs network [dns] - 10https://gerrit.wikimedia.org/r/174955 [15:20:05] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.030 second response time [15:20:30] Going to have (late) lunch... but I'll be back later [15:20:33] hoo : $lb = $wgBounceHandlerCluster ? wfGetLBFactory()->getExternalLB( $wgBounceHandlerCluster ): wfGetLB( $wiki ); needs to be changed to $wgBounceHandlerCluster ? wfGetLBFactory()->getExternalLB( $wiki ) ? [15:20:35] RECOVERY - DPKG on mw1227 is OK: All packages OK [15:20:45] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.004 second response time [15:20:57] tonythomas: I think you got that wrong [15:21:07] you need to distinguish where you want to access the local wiki's data [15:21:16] and where you want to access your shared table [15:21:18] away now [15:21:35] RECOVERY - HHVM processes on mw1230 is OK: PROCS OK: 1 process with command name hhvm [15:21:35] RECOVERY - check if dhclient is running on mw1230 is OK: PROCS OK: 0 processes with command name dhclient [15:21:44] okey :) my journey too starts in a few mins.
will email aaron and lego on this [15:21:56] RECOVERY - DPKG on mw1230 is OK: All packages OK [15:21:57] RECOVERY - nutcracker port on mw1230 is OK: TCP OK - 0.000 second response time on port 11212 [15:21:57] RECOVERY - Disk space on mw1230 is OK: DISK OK [15:21:58] RECOVERY - RAID on mw1230 is OK: OK: no RAID installed [15:22:04] RECOVERY - nutcracker process on mw1230 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:22:04] RECOVERY - check configured eth on mw1230 is OK: NRPE: Unable to read output [15:22:04] RECOVERY - check if salt-minion is running on mw1230 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:22:24] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 11783 bytes in 0.007 second response time [15:22:45] RECOVERY - NTP on mw1231 is OK: NTP OK: Offset -0.1696637869 secs [15:24:14] RECOVERY - check if dhclient is running on mw1231 is OK: PROCS OK: 0 processes with command name dhclient [15:24:14] RECOVERY - nutcracker process on mw1231 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:24:14] RECOVERY - NTP on mw1232 is OK: NTP OK: Offset -0.05772507191 secs [15:24:14] RECOVERY - Disk space on mw1231 is OK: DISK OK [15:24:15] RECOVERY - check if salt-minion is running on mw1231 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:24:24] RECOVERY - nutcracker port on mw1231 is OK: TCP OK - 0.000 second response time on port 11212 [15:24:25] RECOVERY - check configured eth on mw1231 is OK: NRPE: Unable to read output [15:24:34] RECOVERY - HHVM processes on mw1231 is OK: PROCS OK: 1 process with command name hhvm [15:24:35] PROBLEM - DPKG on mw1227 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:24:38] <_joe_> cmjohnson: /win 22 [15:24:54] RECOVERY - RAID on mw1231 is OK: OK: no RAID installed [15:24:54] PROBLEM - DPKG on mw1230 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:25:05] PROBLEM - mediawiki-installation DSH group on mw1231 is CRITICAL: Host mw1231 is not in mediawiki-installation dsh group [15:25:07] <_joe_> cmjohnson: have you done anything with those servers? 
[15:25:20] _joe_ so i ran a script to do mass installs and ended up reinstalling mw1230-33 [15:25:24] RECOVERY - check configured eth on mw1232 is OK: NRPE: Unable to read output [15:25:24] RECOVERY - check if salt-minion is running on mw1232 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:25:24] RECOVERY - DPKG on mw1231 is OK: All packages OK [15:25:25] RECOVERY - NTP on mw1233 is OK: NTP OK: Offset 0.0005931854248 secs [15:25:28] initial puppet run is going on now [15:25:34] RECOVERY - check if dhclient is running on mw1232 is OK: PROCS OK: 0 processes with command name dhclient [15:25:35] RECOVERY - nutcracker port on mw1232 is OK: TCP OK - 0.000 second response time on port 11212 [15:25:40] mw1227 just finished [15:25:47] RECOVERY - DPKG on mw1227 is OK: All packages OK [15:25:47] RECOVERY - Disk space on mw1232 is OK: DISK OK [15:25:54] RECOVERY - HHVM processes on mw1232 is OK: PROCS OK: 1 process with command name hhvm [15:25:54] RECOVERY - nutcracker process on mw1232 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:25:54] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: Puppet has 112 failures [15:25:55] RECOVERY - RAID on mw1232 is OK: OK: no RAID installed [15:26:05] PROBLEM - mediawiki-installation DSH group on mw1232 is CRITICAL: Host mw1232 is not in mediawiki-installation dsh group [15:26:06] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:26:35] PROBLEM - mediawiki-installation DSH group on mw1230 is CRITICAL: Host mw1230 is not in mediawiki-installation dsh group [15:26:39] <_joe_> cmjohnson: you need to restart apache (hard restart) afterwards [15:26:50] okay [15:27:16] <_joe_> cmjohnson: did you depool them from the api cluster btw? 
[15:27:30] yes but after the fact [15:27:48] <_joe_> ok sorry [15:27:58] oh...no mw1230-33 is my fault [15:28:09] just happened to notice 1227 [15:28:24] PROBLEM - DPKG on mw1231 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:28:34] RECOVERY - DPKG on mw1232 is OK: All packages OK [15:28:54] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: Puppet has 112 failures [15:28:55] RECOVERY - DPKG on mw1230 is OK: All packages OK [15:30:35] <_joe_> cmjohnson: I'm depooling mw1230-1233 then [15:31:34] PROBLEM - DPKG on mw1232 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:32:24] RECOVERY - DPKG on mw1231 is OK: All packages OK [15:32:55] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Puppet has 112 failures [15:33:04] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 112 failures [15:33:35] RECOVERY - DPKG on mw1232 is OK: All packages OK [15:33:55] PROBLEM - mediawiki-installation DSH group on mw1227 is CRITICAL: Host mw1227 is not in mediawiki-installation dsh group [15:35:05] RECOVERY - nutcracker process on mw1233 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:35:05] RECOVERY - RAID on mw1233 is OK: OK: no RAID installed [15:35:25] RECOVERY - nutcracker port on mw1233 is OK: TCP OK - 0.000 second response time on port 11212 [15:35:25] RECOVERY - Disk space on mw1233 is OK: DISK OK [15:35:26] RECOVERY - check if salt-minion is running on mw1233 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:35:34] RECOVERY - check configured eth on mw1233 is OK: NRPE: Unable to read output [15:35:35] RECOVERY - NTP on mw1230 is OK: NTP OK: Offset 0.002781391144 secs [15:35:55] RECOVERY - check if dhclient is running on mw1233 is OK: PROCS OK: 0 processes with command name dhclient [15:36:00] RECOVERY - HHVM processes on mw1233 is OK: PROCS OK: 1 process with command name hhvm [15:36:00] RECOVERY - DPKG on mw1233 is OK: All packages OK [15:36:45] RECOVERY - HHVM rendering on mw1227 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.895 second response time [15:36:55] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [15:37:41] (03PS1) 10Giuseppe Lavagetto: Revert "removing mw1227,1230-33 from dsh groups...reinstalling" [puppet] - 10https://gerrit.wikimedia.org/r/174957 [15:38:01] hmm, where's ircecho code? 
[15:38:03] * YuviPanda searches [15:39:04] PROBLEM - DPKG on mw1233 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:39:57] (03CR) 10Giuseppe Lavagetto: [C: 032] Revert "removing mw1227,1230-33 from dsh groups...reinstalling" [puppet] - 10https://gerrit.wikimedia.org/r/174957 (owner: 10Giuseppe Lavagetto) [15:43:05] RECOVERY - DPKG on mw1233 is OK: All packages OK [15:43:25] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 111 failures [15:44:25] PROBLEM - HHVM rendering on mw1230 is CRITICAL: Connection refused [15:44:36] PROBLEM - HHVM rendering on mw1232 is CRITICAL: Connection refused [15:44:57] PROBLEM - HHVM rendering on mw1231 is CRITICAL: Connection refused [15:45:30] <_joe_> !log repooling mw1227 [15:45:35] Logged the message, Master [15:46:05] PROBLEM - Apache HTTP on mw1232 is CRITICAL: Connection refused [15:46:25] PROBLEM - Apache HTTP on mw1231 is CRITICAL: Connection refused [15:46:25] PROBLEM - Apache HTTP on mw1230 is CRITICAL: Connection refused [15:47:14] PROBLEM - mediawiki-installation DSH group on mw1233 is CRITICAL: Host mw1233 is not in mediawiki-installation dsh group [15:51:34] RECOVERY - Apache HTTP on mw1230 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.064 second response time [15:52:05] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [15:52:24] RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 72716 bytes in 0.250 second response time [15:52:48] <_joe_> !log repooling mw1230 [15:52:50] Logged the message, Master [15:54:05] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 4.345 second response time [15:54:15] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:54:35] RECOVERY - Apache HTTP on mw1231 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.060 second response time [15:56:02] <_joe_> !log repooling mw1231 [15:56:04] Logged the message, Master [15:57:34] (03CR) 10Mark Bergsma: "A few comments, but I agree with it in principle. 
Perhaps put some RESERVED comments in the file for now as well so we won't use these ran" (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/174955 (owner: 10Alexandros Kosiaris) [15:59:24] RECOVERY - Apache HTTP on mw1232 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.492 second response time [15:59:25] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:59:45] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.842 second response time [16:00:13] (03Abandoned) 10Giuseppe Lavagetto: HAT: convert exec to cron to avoid chicken-and-egg dependency [puppet] - 10https://gerrit.wikimedia.org/r/174925 (owner: 10Giuseppe Lavagetto) [16:03:42] PROBLEM - HHVM rendering on mw1233 is CRITICAL: Connection refused [16:04:02] PROBLEM - Apache HTTP on mw1233 is CRITICAL: Connection refused [16:05:32] PROBLEM - mediawiki-installation DSH group on mw1229 is CRITICAL: Host mw1229 is not in mediawiki-installation dsh group [16:05:32] PROBLEM - DPKG on mw1229 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:05:42] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Puppet has 112 failures [16:06:32] RECOVERY - DPKG on mw1229 is OK: All packages OK [16:07:21] PROBLEM - HHVM rendering on mw1229 is CRITICAL: Connection timed out [16:11:21] PROBLEM - Apache HTTP on mw1229 is CRITICAL: Connection refused [16:12:05] (03PS2) 10Alexandros Kosiaris: codfw row B labs network [dns] - 10https://gerrit.wikimedia.org/r/174955 [16:14:51] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:17:06] !log upload carbonate 0.2.2-1 to trusty-wikimedia [16:17:08] Logged the message, Master [16:19:22] RECOVERY - Apache HTTP on mw1229 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.094 second response time [16:19:22] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 4.178 second response time [16:22:09] godog: \o/ ! [16:22:12] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:22:22] RECOVERY - Apache HTTP on mw1233 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.589 second response time [16:23:02] RECOVERY - HHVM rendering on mw1233 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 1.877 second response time [16:23:39] <_joe_> !log repooling mw1232-3 [16:23:41] Logged the message, Master [16:24:22] PROBLEM - mediawiki-installation DSH group on mw1228 is CRITICAL: Host mw1228 is not in mediawiki-installation dsh group [16:24:53] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 112 failures [16:25:03] RECOVERY - mediawiki-installation DSH group on mw1231 is OK: OK [16:25:53] PROBLEM - HHVM rendering on mw1228 is CRITICAL: Connection refused [16:26:05] RECOVERY - mediawiki-installation DSH group on mw1232 is OK: OK [16:26:33] RECOVERY - mediawiki-installation DSH group on mw1230 is OK: OK [16:28:06] (03PS1) 10Alexandros Kosiaris: Add vim modelines all over the repo [dns] - 10https://gerrit.wikimedia.org/r/174967 [16:28:12] RECOVERY - HHVM rendering on mw1228 is OK: HTTP OK: HTTP/1.1 200 OK - 72717 bytes in 9.625 second response time [16:28:49] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Not needed for emacs" [dns] - 10https://gerrit.wikimedia.org/r/174967 (owner: 10Alexandros Kosiaris) [16:28:58] <_joe_> sorry I couldn't resist :P [16:29:23] akosiaris: \o/ indeed! 
I'll probably sponsor the debian upload, the package is already done essentially [16:30:02] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:30:53] (03CR) 10Yuvipanda: [C: 031] "Emacs not needed :)" [dns] - 10https://gerrit.wikimedia.org/r/174967 (owner: 10Alexandros Kosiaris) [16:31:36] (03CR) 10Chad: "Do I need these if I use notepad.exe?!?" [dns] - 10https://gerrit.wikimedia.org/r/174967 (owner: 10Alexandros Kosiaris) [16:31:49] _joe_: :) [16:31:54] <_joe_> that's on mde [16:32:13] Hmm took a long time to find _ on my phone [16:32:39] <_joe_> akosiaris: sorry for hijacking a serious (and worthy) commit with a pseudo editor war [16:33:18] <_joe_> !log pooling mw1236 (HHVM) into the main apache pool [16:33:23] Logged the message, Master [16:33:52] RECOVERY - mediawiki-installation DSH group on mw1227 is OK: OK [16:34:48] you mean there are people who think emacs is still in the race for the editor war? ;) [16:34:54] <_joe_> ahah [16:35:01] _joe_: YuviPanda: P [16:35:04] :P [16:35:04] <_joe_> no, it's in the race for best OS [16:35:13] true that [16:35:33] Emacs evil mode is also next vim [16:35:55] <_joe_> first hhvm server in the general appserver pool, btw [16:36:06] _joe_: sweet. [16:36:18] (03PS16) 10Andrew Bogott: Add class and role for Openstack Horizon [puppet] - 10https://gerrit.wikimedia.org/r/170340 [16:36:20] (03PS1) 10Andrew Bogott: Use the ubuntu cloud archive for all Ubuntu versions [puppet] - 10https://gerrit.wikimedia.org/r/174971 [16:36:38] <_joe_> bd808: it's utterly bored AFAICS [16:36:56] That's how we like our app servers [16:37:06] (03CR) 10Mark Bergsma: [C: 031] codfw row B labs network (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/174955 (owner: 10Alexandros Kosiaris) [16:38:14] (03PS1) 10Giuseppe Lavagetto: dsh: add mw1228 and mw1229 [puppet] - 10https://gerrit.wikimedia.org/r/174972 [16:39:43] (03CR) 10Alexandros Kosiaris: [C: 032] "Thanks" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/174955 (owner: 10Alexandros Kosiaris) [16:39:53] (03CR) 10Giuseppe Lavagetto: [C: 032] dsh: add mw1228 and mw1229 [puppet] - 10https://gerrit.wikimedia.org/r/174972 (owner: 10Giuseppe Lavagetto) [16:41:38] bblack: "warning: The global option 'udp_threads' was reduced from the configured value of 8 to 1 for lack of SO_REUSEPORT support". Nice! the so_reuseport support part, not the warning [16:41:52] well nice for the warning as well now that I think about it [16:43:05] <_joe_> !log pooled mw1228-9 [16:43:07] Logged the message, Master [16:44:55] (03CR) 10Hashar: [C: 04-1] "It is probably going to cause issue with Trusty hosts due to 'precise-updates' being hardcoded :-)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/174971 (owner: 10Andrew Bogott) [16:47:11] (03CR) 10Andrew Bogott: "You're right! 
I will work on this :)" [puppet] - 10https://gerrit.wikimedia.org/r/174971 (owner: 10Andrew Bogott) [16:47:13] RECOVERY - mediawiki-installation DSH group on mw1233 is OK: OK [16:52:53] (03CR) 10Jforrester: Enable VisualEditor Beta Feature on other wikis too (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174793 (owner: 10Jforrester) [16:52:58] (03PS2) 10Jforrester: Enable VisualEditor Beta Feature on other wikis too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174793 [16:53:13] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.058 second response time [16:53:23] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 72716 bytes in 0.235 second response time [16:59:21] <_joe_> !log restarted hhvm on mw1025, TC cache exhausted [16:59:24] Logged the message, Master [17:01:02] RECOVERY - RAID on ms-be2007 is OK: OK: optimal, 13 logical, 13 physical [17:01:15] (03PS1) 10Alexandros Kosiaris: IPv6 allocations for codfw row B for labs [dns] - 10https://gerrit.wikimedia.org/r/174980 [17:01:17] (03PS1) 10Alexandros Kosiaris: Fix interface in cbff169 [dns] - 10https://gerrit.wikimedia.org/r/174981 [17:04:13] (03PS2) 10Alexandros Kosiaris: Fix interface in cbff169 [dns] - 10https://gerrit.wikimedia.org/r/174981 [17:04:15] (03PS2) 10Alexandros Kosiaris: IPv6 allocations for codfw row B for labs [dns] - 10https://gerrit.wikimedia.org/r/174980 [17:04:20] (03CR) 10Alexandros Kosiaris: [C: 032] Add vim modelines all over the repo [dns] - 10https://gerrit.wikimedia.org/r/174967 (owner: 10Alexandros Kosiaris) [17:05:32] RECOVERY - mediawiki-installation DSH group on mw1229 is OK: OK [17:06:54] (03CR) 10Alexandros Kosiaris: [C: 032] IPv6 allocations for codfw row B for labs [dns] - 10https://gerrit.wikimedia.org/r/174980 (owner: 10Alexandros Kosiaris) [17:07:17] (03CR) 10Alexandros Kosiaris: [C: 032] Fix interface in cbff169 [dns] - 10https://gerrit.wikimedia.org/r/174981 (owner: 10Alexandros Kosiaris) [17:11:32] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:15:52] PROBLEM - puppetmaster backend https on strontium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error [17:16:03] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: Puppet has 29 failures [17:16:03] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: Puppet has 17 failures [17:16:12] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: Puppet has 54 failures [17:16:13] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Puppet has 22 failures [17:16:13] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 11 failures [17:16:13] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Puppet has 18 failures [17:16:22] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Puppet has 58 failures [17:16:23] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: puppet fail [17:16:23] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Puppet has 20 failures [17:16:23] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Puppet has 13 failures [17:16:32] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail [17:16:32] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: Puppet has 3 failures [17:16:32] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 18 failures [17:16:32] PROBLEM - 
puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail [17:16:42] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail [17:16:42] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: puppet fail [17:16:42] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: puppet fail [17:16:43] PROBLEM - puppet last run on search1004 is CRITICAL: CRITICAL: Puppet has 46 failures [17:16:43] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet has 17 failures [17:16:43] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Puppet has 63 failures [17:16:43] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: puppet fail [17:16:52] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: puppet fail [17:16:52] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 19 failures [17:16:53] PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: puppet fail [17:17:00] <^d> ruh roh [17:17:02] PROBLEM - puppet last run on search1012 is CRITICAL: CRITICAL: Puppet has 53 failures [17:17:02] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on analytics1018 is CRITICAL: CRITICAL: Puppet has 20 failures [17:17:03] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: puppet fail [17:17:03] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: Puppet has 55 failures [17:17:03] PROBLEM - puppet last run on mw1236 is CRITICAL: CRITICAL: Puppet has 74 failures [17:17:04] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 24 failures [17:17:04] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Puppet has 19 failures [17:17:05] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 23 failures [17:17:05] PROBLEM - puppet last run on osm-cp1001 is CRITICAL: CRITICAL: puppet fail [17:17:06] PROBLEM - puppet last run on rbf1001 is CRITICAL: CRITICAL: Puppet has 21 failures [17:17:06] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: puppet fail [17:17:07] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: puppet fail [17:17:07] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: puppet fail [17:17:08] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 25 failures [17:17:12] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: puppet fail [17:17:12] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: puppet fail [17:17:12] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: puppet fail [17:17:12] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: puppet fail [17:17:13] PROBLEM - puppet last run on elastic1003 is CRITICAL: CRITICAL: Puppet has 25 failures [17:17:13] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: puppet fail [17:17:13] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 68 failures [17:17:13] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: Puppet has 63 failures [17:17:14] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Puppet has 24 failures [17:17:22] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 63 failures [17:17:23] PROBLEM - puppet last run on mw1071 is CRITICAL: CRITICAL: Puppet has 62 failures [17:17:23] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 79 failures [17:17:23] PROBLEM - puppet last run on mw1090 is CRITICAL: CRITICAL: Puppet has 69 failures 
[17:17:23] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: puppet fail [17:17:24] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: Puppet has 20 failures [17:17:24] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: puppet fail [17:17:24] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 107 failures [17:17:32] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: puppet fail [17:17:32] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 67 failures [17:17:32] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 25 failures [17:17:32] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 20 failures [17:17:33] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail [17:17:33] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: puppet fail [17:17:33] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: puppet fail [17:17:42] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Puppet has 29 failures [17:17:42] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: puppet fail [17:17:42] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Puppet has 21 failures [17:17:52] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Puppet has 22 failures [17:17:52] PROBLEM - puppet last run on mw1113 is CRITICAL: CRITICAL: Puppet has 69 failures [17:17:52] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 21 failures [17:17:52] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 67 failures [17:17:53] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Puppet has 22 failures [17:17:53] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: puppet fail [17:17:53] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: Puppet has 59 failures [17:17:54] PROBLEM - puppet last run on mw1086 is CRITICAL: CRITICAL: Puppet has 64 failures [17:17:54] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 29 failures [17:17:55] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: puppet fail [17:17:55] PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Puppet has 23 failures [17:17:56] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 27 failures [17:17:56] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 79 failures [17:18:02] PROBLEM - puppet last run on mw1193 is CRITICAL: CRITICAL: Puppet has 56 failures [17:18:03] PROBLEM - puppet last run on mc1015 is CRITICAL: CRITICAL: puppet fail [17:18:03] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: Puppet has 16 failures [17:18:12] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail [17:18:12] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: puppet fail [17:18:12] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: Puppet has 17 failures [17:18:13] PROBLEM - puppet last run on analytics1012 is CRITICAL: CRITICAL: Puppet has 25 failures [17:18:13] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: Puppet has 74 failures [17:18:13] PROBLEM - puppet last run on mw1037 is CRITICAL: CRITICAL: Puppet has 68 failures [17:18:13] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: puppet fail [17:18:22] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Puppet has 23 failures [17:18:25] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: puppet fail [17:18:25] PROBLEM - puppet last run on 
search1003 is CRITICAL: CRITICAL: puppet fail [17:18:25] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 33 failures [17:18:25] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [17:18:25] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 20 failures [17:18:26] PROBLEM - puppet last run on virt1008 is CRITICAL: CRITICAL: Puppet has 26 failures [17:18:32] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 20 failures [17:18:32] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Puppet has 20 failures [17:18:42] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: Puppet has 73 failures [17:18:43] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 21 failures [17:18:43] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [17:18:43] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Puppet has 24 failures [17:18:43] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: puppet fail [17:18:44] PROBLEM - puppet last run on zinc is CRITICAL: CRITICAL: puppet fail [17:18:44] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: puppet fail [17:18:52] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Puppet has 28 failures [17:18:52] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Puppet has 18 failures [17:18:53] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: puppet fail [17:18:53] PROBLEM - puppet last run on ssl1003 is CRITICAL: CRITICAL: Puppet has 22 failures [17:18:53] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.998 second response time [17:18:53] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: Puppet has 19 failures [17:18:53] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: puppet fail [17:19:03] <_joe_> !log apache hard restart on strontium [17:19:03] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: puppet fail [17:19:05] Logged the message, Master [17:19:08] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 80 failures [17:19:08] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail [17:19:08] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [17:19:08] PROBLEM - puppet last run on es2006 is CRITICAL: CRITICAL: Puppet has 26 failures [17:19:08] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 72 failures [17:19:08] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: puppet fail [17:19:12] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 17 failures [17:19:12] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 63 failures [17:19:13] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: Puppet has 72 failures [17:19:13] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 18 failures [17:19:22] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: puppet fail [17:19:22] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: puppet fail [17:19:22] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: puppet fail [17:19:23] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: puppet fail [17:19:23] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: puppet fail [17:19:23] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: puppet fail [17:19:23] PROBLEM - puppet last run on mw1058 is 
CRITICAL: CRITICAL: puppet fail [17:19:23] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 69 failures [17:19:32] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: puppet fail [17:19:32] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: Puppet has 18 failures [17:19:33] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: puppet fail [17:19:33] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: puppet fail [17:19:33] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: puppet fail [17:19:33] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: puppet fail [17:19:43] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: puppet fail [17:19:45] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 25 failures [17:19:45] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: puppet fail [17:19:46] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: puppet fail [17:19:46] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 22 failures [17:19:46] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [17:19:52] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 25 failures [17:19:52] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 24 failures [17:19:52] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: puppet fail [17:20:02] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: puppet fail [17:20:04] PROBLEM - puppet last run on labsdb1002 is CRITICAL: CRITICAL: Puppet has 18 failures [17:20:12] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 26 failures [17:20:12] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 27 failures [17:20:12] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [17:20:13] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: puppet fail [17:20:13] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: puppet fail [17:20:13] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Puppet has 19 failures [17:20:22] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: puppet fail [17:20:22] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 29 failures [17:20:23] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 64 failures [17:20:23] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 69 failures [17:20:23] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: Puppet has 19 failures [17:20:23] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: Puppet has 47 failures [17:20:33] PROBLEM - puppet last run on ms-be2009 is CRITICAL: CRITICAL: Puppet has 23 failures [17:20:33] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 79 failures [17:20:33] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: puppet fail [17:20:34] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 74 failures [17:20:34] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: puppet fail [17:20:42] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: puppet fail [17:20:45] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: puppet fail [17:20:46] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet has 20 failures [17:20:46] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: puppet fail [17:20:46] PROBLEM - puppet last run on wtp1014 is 
CRITICAL: CRITICAL: Puppet has 24 failures [17:20:46] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 68 failures [17:20:46] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 65 failures [17:20:47] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: puppet fail [17:20:47] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Puppet has 25 failures [17:20:52] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 28 failures [17:20:53] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: Puppet has 56 failures [17:20:53] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Puppet has 21 failures [17:20:53] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 22 failures [17:20:54] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 25 failures [17:20:54] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: puppet fail [17:21:03] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: puppet fail [17:21:03] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: Puppet has 24 failures [17:21:03] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 60 failures [17:21:12] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 63 failures [17:21:12] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 4 failures [17:21:12] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: puppet fail [17:21:13] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: puppet fail [17:21:13] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail [17:21:13] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 64 failures [17:21:13] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: puppet fail [17:21:13] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: puppet fail [17:21:14] will it be like this all day? 
:) :) [17:21:22] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: puppet fail [17:21:24] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 21 failures [17:21:26] <_joe_> greg-g: no [17:21:33] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 69 failures [17:21:43] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 21 failures [17:21:43] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 19 failures [17:21:43] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: puppet fail [17:21:52] _joe_: :) [17:22:03] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 21 failures [17:22:03] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 6 failures [17:24:22] RECOVERY - mediawiki-installation DSH group on mw1228 is OK: OK [17:26:16] !log Disabled login for dewiki account "K" [17:26:19] Logged the message, Master [17:31:53] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:33:43] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:34:12] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:34:22] RECOVERY - puppet last run on analytics1018 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:34:23] RECOVERY - puppet last run on rbf1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:34:32] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:34:32] RECOVERY - puppet last run on analytics1004 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:34:33] RECOVERY - puppet last run on elastic1003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:34:33] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:34:42] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:34:42] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:34:42] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:34:42] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:34:53] RECOVERY - puppet last run on mw1071 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:34:53] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:34:53] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:34:54] RECOVERY - puppet last run on cp1040 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:34:55] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:34:55] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:34:55] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:35:02] 
RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:35:02] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:35:03] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:03] RECOVERY - puppet last run on search1004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:35:12] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:35:12] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:35:12] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:35:12] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:35:13] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:13] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:35:13] RECOVERY - puppet last run on search1012 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:35:22] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:22] RECOVERY - puppet last run on mw1236 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:35:28] RECOVERY - puppet last run on mw1193 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:35:28] RECOVERY - puppet last run on osm-cp1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:35:28] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:35:32] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:35:32] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:35:32] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:35:42] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:35:42] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:35:42] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:35:42] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:35:43] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:35:43] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:35:43] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently 
enabled, last run 19 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on mw1090 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:35:52] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:35:53] RECOVERY - puppet last run on virt1008 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:35:53] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:35:54] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:35:54] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:36:02] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:36:02] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:36:03] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:36:03] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:36:03] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:36:13] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:36:13] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:36:22] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:36:22] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [17:36:22] RECOVERY - puppet last run on ssl1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:36:22] RECOVERY - puppet last run on ms-fe1002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:36:23] RECOVERY - puppet last run on mw1086 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:36:23] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:23] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:24] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:36:24] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:36:25] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:36:25] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:36:26] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:36:26] RECOVERY - puppet last run 
on mw1154 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [17:36:27] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:36:27] RECOVERY - puppet last run on es2006 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:36:32] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on es1004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:36:43] RECOVERY - puppet last run on mw1037 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:36:44] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:36:52] RECOVERY - puppet last run on search1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:37:00] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:37:00] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:37:00] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:37:00] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:37:02] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:37:02] RECOVERY - puppet last run on tmh1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:37:02] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:37:02] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:37:12] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:37:12] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:12] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:37:12] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:37:12] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:37:12] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:37:13] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:37:13] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:37:13] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 1 minute 
ago with 0 failures [17:37:14] RECOVERY - puppet last run on virt1005 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:37:14] RECOVERY - puppet last run on zinc is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:37:22] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:37:24] RECOVERY - puppet last run on mw1113 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:37:24] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:24] RECOVERY - puppet last run on dysprosium is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:37:24] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:37:24] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:37:25] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:37:25] RECOVERY - puppet last run on labsdb1002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:37:25] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:37:26] RECOVERY - puppet last run on radon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:32] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:37:32] RECOVERY - puppet last run on db1005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:37:32] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:37:33] RECOVERY - puppet last run on elastic1009 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [17:37:33] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [17:37:33] RECOVERY - puppet last run on mc1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:37:33] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:37:42] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [17:37:45] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:37:45] RECOVERY - puppet last run on search1019 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:37:46] RECOVERY - puppet last run on mw1179 is OK: OK: 
Puppet is currently enabled, last run 3 seconds ago with 0 failures [17:37:46] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:37:47] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:37:47] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:37:52] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:37:52] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [17:37:52] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:37:53] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:37:53] RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:38:02] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:38:03] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:38:12] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [17:38:12] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:38:13] RECOVERY - puppet last run on search1009 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:38:13] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:38:13] RECOVERY - puppet last run on elastic1010 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:38:13] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:38:13] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:38:14] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:38:14] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:38:15] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:38:23] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [17:38:25] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [17:38:25] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:38:25] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:38:32] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:38:33] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:38:43] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures 
[17:38:43] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:38:52] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:38:53] RECOVERY - puppet last run on mw1070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:38:53] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [17:38:53] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:38:53] RECOVERY - puppet last run on search1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:38:53] RECOVERY - puppet last run on virt1009 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:38:53] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [17:39:02] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on ssl1004 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:39:03] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [17:39:04] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:39:04] RECOVERY - puppet last run on es1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [17:39:05] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:39:05] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [17:39:13] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:39:14] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:39:22] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [17:39:22] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:39:23] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:39:23] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [17:39:33] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:39:33] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:39:33] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is 
currently enabled, last run 27 seconds ago with 0 failures [17:39:43] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [17:39:43] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:39:43] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:39:44] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:39:44] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:39:52] RECOVERY - puppet last run on analytics1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:39:53] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [17:39:53] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:39:53] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:40:03] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:40:12] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:40:13] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [17:40:22] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:46:43] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [17:49:24] <^d> mark: by the way, robla mentioned to me the other day that you'd noticed an increase in avg. search latency earlier this week. That lines up with our deployment of Translation memory to Elastic from Solr on Tuesday. Those queries are way slower than our normal search traffic. [17:49:38] <^d> The graphite stats don't distinguish between the two, so yeah, avg latency ^ [17:50:28] bd808: hashar: Got a second to look at beta labs? Seems like beta is unable to send password forgotten emails [17:53:37] hoo: hop sorry, about to leave [17:53:57] well, it's Friday evening ;) [17:54:29] (03PS1) 10BryanDavis: Use Monolog provider for beta logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 [17:54:58] (03CR) 10BryanDavis: [C: 04-1] "Waiting on MediaWiki updates" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 (owner: 10BryanDavis) [17:58:11] (03PS2) 10BryanDavis: Use Monolog provider for beta logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 [17:58:23] (03CR) 10BryanDavis: [C: 04-1] Use Monolog provider for beta logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 (owner: 10BryanDavis) [18:00:03] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:00:41] (03CR) 10BryanDavis: Use Monolog provider for beta logging (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 (owner: 10BryanDavis) [18:14:50] (03PS1) 10BryanDavis: Revert "Add 'BounceHandler' to wgDebugLogGroups" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174991 [18:15:14] I need to reboot zirconium [18:15:18] any objections? 
[18:16:03] !rebooting zirconium because bugzilla [18:16:07] !log rebooting zirconium because bugzilla [18:16:11] Logged the message, Master [18:16:45] (03CR) 10Hoo man: [C: 031] "Good catch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174991 (owner: 10BryanDavis) [18:17:23] PROBLEM - Host zirconium is DOWN: CRITICAL - Host Unreachable (208.80.154.41) [18:19:13] RECOVERY - Host zirconium is UP: PING OK - Packet loss = 0%, RTA = 1.82 ms [18:21:32] (03CR) 10Cscott: "bd808: https://github.com/cscott/trigger/tree/auto-mode" [puppet] - 10https://gerrit.wikimedia.org/r/170130 (owner: 10Cscott) [18:21:47] bd808: not fully tested yet [18:23:56] cscott: Interesting. I'll give it a read [18:24:57] it's pretty simple, but it should do the trick [18:25:58] git deploy start ; git reset --hard (or whatever), if git deploy sync --auto ; then git deploy service restart ; else git deploy abort ; exit 1 ; fi [18:27:59] ew a line continuation char [18:31:33] ew, a language which needs them [18:32:00] i guess i could have used an open-paren [18:32:05] well only if you write run on statements [18:32:26] What's wrong with a real if block [18:33:43] PROBLEM - NTP on zirconium is CRITICAL: NTP CRITICAL: Offset unknown [18:34:05] cscott: So it will wait 30 times and then return 5 times? [18:34:15] s/return/retry/ [18:34:49] oh not it will wait 30 times * 5 retries [18:36:42] hmmm... it would be nicer if the detailed report were exposed and used so you could know the difference between "host A failed with status N" and "we have only heard back from some hosts" [18:36:57] bd808: it gives the detailed report at the end [18:37:15] yeah, but it didn't use the detailed report to change behavior [18:37:27] Still better than today for sure [18:37:36] there's no difference between the detailed and concise report except in what info gets printed out [18:37:43] RECOVERY - NTP on zirconium is OK: NTP OK: Offset -0.0006675720215 secs [18:37:46] the actual internal info is the same [18:38:26] Sure. But I guess I meant that the detailed report gives status codes. [18:38:29] and i didn't really want to spam the full report 30 times on console [18:38:46] i can dump the detailed report before each retry, if that helps? [18:38:51] the n/m thing waits for all to be successful and you may already know that there is a hard failure [18:38:55] (and it does dump the detailed report before it fails) [18:39:15] yeah, that's for you to fix by changing the info in redis, for now at least [18:39:34] i don't think i want to add a --ignore 1 or something option, although you could imagine doing that in the future [18:39:57] i think really you'd want to remove the permanent failure from redis (it's not hard) and then re-add it when it comes back up [18:40:42] I wasn't hoping for ignore, just failure without further waiting if all hosts have reported and one or more have errors [18:41:26] It's a pain to manage though with the async stuff I think [18:41:38] I really don't like that part of salt [18:42:02] I want to give a list of hosts and know what happened on each one (success or failure) [18:42:15] for an automated deploy i don't really mind if it takes a little longer to report an error, it's supposed to always succeed [18:42:56] Like I said, better than today. 
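The exchange above sketches cscott's auto-mode for trigger/git-deploy only as chat pseudocode. The following is a rough shell sketch of that flow, assuming git-deploy style subcommands (start / sync --auto / service restart / abort) as quoted above; the repository path and branch are placeholders, not the real auto-mode implementation.

```bash
#!/bin/bash
# Sketch of an unattended deploy loop, following the pseudocode above.
# Path and branch are hypothetical; this is not the actual trigger code.
set -e

cd /srv/deployment/myrepo/myrepo      # hypothetical deploy target

git deploy start                      # open a deployment window
git fetch origin
git reset --hard origin/master        # "git reset --hard (or whatever)"

# sync --auto is expected to wait/retry on the minion reports itself and
# return non-zero if they never converge; on failure, abort instead of
# leaving the deployment half-finished.
if git deploy sync --auto; then
    git deploy service restart
else
    git deploy abort
    exit 1
fi
```

As discussed above, the loop trades speed for simplicity: an automated deploy can afford to wait out the retries as long as the final report makes hard failures visible.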
:) [18:57:07] (03CR) 10Chad: [C: 032 V: 032] Remove hooks-bugzilla plugin, as Bugzilla goes into read-only mode [gerrit/plugins] - 10https://gerrit.wikimedia.org/r/174875 (owner: 10QChris) [18:58:21] bd808: updated https://github.com/cscott/trigger/tree/auto-mode slightly, now gives detailed report before each retry, doesn't use line continuation characters, etc. [19:08:03] (03PS1) 10RobH: adding the globalsign certificates to the repo [puppet] - 10https://gerrit.wikimedia.org/r/174997 [19:08:59] (03PS1) 10Yuvipanda: shinken: Re-notify about service problems only once a day [puppet] - 10https://gerrit.wikimedia.org/r/174998 [19:09:01] (03PS1) 10Yuvipanda: shinken: Add twentyafterfour to deployment-prep alerts [puppet] - 10https://gerrit.wikimedia.org/r/174999 [19:09:30] (03CR) 10RobH: [C: 032] "pushing these in, since they aren't live and are entirely new" [puppet] - 10https://gerrit.wikimedia.org/r/174997 (owner: 10RobH) [19:09:57] (03CR) 10Yuvipanda: [C: 032] shinken: Re-notify about service problems only once a day [puppet] - 10https://gerrit.wikimedia.org/r/174998 (owner: 10Yuvipanda) [19:10:14] robh: ok to merge? [19:10:18] * YuviPanda just hit puppet merge [19:10:23] im merging now actually [19:10:25] ok [19:10:30] * YuviPanda cancels his [19:10:33] (03CR) 10Greg Grossmeier: [C: 031] ":)" [puppet] - 10https://gerrit.wikimedia.org/r/174999 (owner: 10Yuvipanda) [19:10:36] it was already merging when you pinged me =] [19:10:40] oh [19:10:45] puppet merge race condition! [19:10:56] (03CR) 10Yuvipanda: [C: 032] shinken: Add twentyafterfour to deployment-prep alerts [puppet] - 10https://gerrit.wikimedia.org/r/174999 (owner: 10Yuvipanda) [19:13:39] (03PS1) 10GWicke: Fix two bugs in cassandra module defaults [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 [19:14:24] (03PS1) 10Yuvipanda: shinken: Add nuria to contactgroup for analytics [puppet] - 10https://gerrit.wikimedia.org/r/175002 [19:14:25] nuria__: ^ [19:14:28] nuria__: can you +1? [19:27:12] TIL how to match projects in gerrit using regex: status:open project:^operations/.* [19:27:15] https://gerrit.wikimedia.org/r/#/q/status:open+project:%255Eoperations/.*,n,z [19:27:18] >225 changes open [19:34:40] (03PS1) 10Ori.livneh: Move *.dblist to dblists/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175007 [19:34:50] (03CR) 10jenkins-bot: [V: 04-1] Move *.dblist to dblists/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175007 (owner: 10Ori.livneh) [19:35:34] (03CR) 10Yuvipanda: [C: 04-1] Fix two bugs in cassandra module defaults (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [19:36:59] (03PS2) 10Ori.livneh: Move *.dblist to dblists/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175007 [19:37:08] (03CR) 10jenkins-bot: [V: 04-1] Move *.dblist to dblists/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/175007 (owner: 10Ori.livneh) [19:39:16] (03CR) 10Yuvipanda: [C: 032] shinken: Add nuria to contactgroup for analytics [puppet] - 10https://gerrit.wikimedia.org/r/175002 (owner: 10Yuvipanda) [19:50:54] (03PS1) 10Plucas: Increase kill timeout on kafka shutdown [debs/kafka] - 10https://gerrit.wikimedia.org/r/175011 [19:53:23] (03CR) 10Plucas: "The specific scenario we have seen is when the ISR is changing when TERM is received: in that case, it waits for the ISR changes to comple" [debs/kafka] - 10https://gerrit.wikimedia.org/r/175011 (owner: 10Plucas) [19:56:01] ^d: Hmm.. parsing of T in gerrit commit message needs a proper boundary. 
The old bug we had with other patterns in the past (sha1 hashes) is back and breaking stuff [19:56:04] https://gerrit.wikimedia.org/r/#/c/151127/ [19:56:08] See the card at http://fab.wmflabs.org/T517 [19:56:58] Is https://github.com/wikimedia/operations-puppet/blob/production/modules/gerrit/templates/gerrit.config.erb the one we use? [20:02:07] (03Abandoned) 10Yuvipanda: Adding user to monitoring of analytics projects [puppet] - 10https://gerrit.wikimedia.org/r/174776 (owner: 10Nuria) [20:29:50] (03CR) 10Aaron Schulz: [C: 032] Revert "Add 'BounceHandler' to wgDebugLogGroups" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174991 (owner: 10BryanDavis) [20:30:02] (03Merged) 10jenkins-bot: Revert "Add 'BounceHandler' to wgDebugLogGroups" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174991 (owner: 10BryanDavis) [20:30:12] (03CR) 1020after4: "let's leave bug-attachment pointed at bugzilla, a sort of last minute decision was made to let bugzilla continue serving attachments at th" [dns] - 10https://gerrit.wikimedia.org/r/172469 (owner: 10Dzahn) [20:31:00] !log aaron Synchronized wmf-config/InitialiseSettings.php: Removed duplicated BounceHandler log entry (duration: 00m 05s) [20:31:04] Logged the message, Master [20:31:44] (03CR) 1020after4: switch bugzilla names over to misc-web (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/172469 (owner: 10Dzahn) [20:44:36] (03PS10) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [20:46:25] (03PS11) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [20:47:04] (03PS7) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [20:48:18] (03CR) 1020after4: [C: 031] "updated to redirect attachment.cgi?* to old-bugzilla.wikimedia.org/attachment.cgi?*" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [20:49:42] (03PS12) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [20:51:30] (03CR) 10Aklapper: "As discussed on IRC: Still allow users in old-bugzilla.wm.org to access attachments of a ticket. Which requires BOTH //bugzilla.wikimedia." [dns] - 10https://gerrit.wikimedia.org/r/172469 (owner: 10Dzahn) [20:51:32] (03PS8) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [20:51:47] (03CR) 10Aklapper: "As discussed on IRC: Still allow users in old-bugzilla.wm.org to access attachments of a ticket. Which requires BOTH //bugzilla.wikimedia." [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [20:51:52] (03CR) 10Aklapper: "As discussed on IRC: Still allow users in old-bugzilla.wm.org to access attachments of a ticket. Which requires BOTH //bugzilla.wikimedia." [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [20:55:49] (03CR) 10BryanDavis: "Dependencies are merged. I'll put this up for SWAT monday unless there is an objection." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 (owner: 10BryanDavis) [21:00:05] AndyRussG, ejegg: Dear anthropoid, the time has come. Please deploy CentralNotice (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141121T2100). [21:00:20] ? [21:01:14] what could go wrong? 
¯\_(ツ)_/¯ [21:02:49] (03PS13) 10Yuvipanda: shinken: Setup IRC notification for shinken [puppet] - 10https://gerrit.wikimedia.org/r/173080 [21:03:27] (03PS1) 10BBlack: use GlobalSign CA for all sni.* cert names [puppet] - 10https://gerrit.wikimedia.org/r/175040 [21:03:29] (03PS1) 10BBlack: Switch r::c::ssl::sni to new certs [puppet] - 10https://gerrit.wikimedia.org/r/175041 [21:03:32] (03CR) 10GWicke: Fix two bugs in cassandra module defaults (031 comment) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [21:04:55] (03PS2) 10GWicke: Fix two bugs in cassandra module defaults [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 [21:08:29] (03CR) 10Hashar: [C: 031] "I am all for it. I haven't closely followed the monolog effort nor can I assert it is going to work, but I definitely welcome this move." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/174988 (owner: 10BryanDavis) [21:08:32] (03CR) 10BBlack: [C: 032] use GlobalSign CA for all sni.* cert names [puppet] - 10https://gerrit.wikimedia.org/r/175040 (owner: 10BBlack) [21:08:45] (03CR) 10BBlack: [C: 032] Switch r::c::ssl::sni to new certs [puppet] - 10https://gerrit.wikimedia.org/r/175041 (owner: 10BBlack) [21:10:17] !log ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: (no message) (duration: 00m 07s) [21:10:19] Logged the message, Master [21:11:54] bd808: :) [21:14:27] !log anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s) [21:14:29] Logged the message, Master [21:14:49] wow, that didn't work at all. [21:15:41] ori: I think something blew up with your new scap stuff, every host said "permission denied" [21:16:00] anomie: hm, lemme see. pastebin the log? [21:17:06] (03PS1) 10BBlack: Add missing newlines to ends of new certs [puppet] - 10https://gerrit.wikimedia.org/r/175103 [21:17:08] ori: I can only get the end of it, but if you need the whole thing I can run it again. http://pastebin.com/RSZyxu6M [21:17:14] (03CR) 10BBlack: [C: 032] Add missing newlines to ends of new certs [puppet] - 10https://gerrit.wikimedia.org/r/175103 (owner: 10BBlack) [21:17:22] (03CR) 10BBlack: [V: 032] Add missing newlines to ends of new certs [puppet] - 10https://gerrit.wikimedia.org/r/175103 (owner: 10BBlack) [21:18:31] !log ori Synchronized php-1.25wmf9/extensions/SecurePoll: Backport SecurePoll bug fixes (duration: 00m 06s) [21:18:36] Logged the message, Master [21:18:38] * ori boggles [21:19:51] anomie: WFM. do you have SSH_AUTH_SOCK defined in your env? [21:20:09] ori: Yes [21:20:40] anomie: it shouldn't make a difference, but could you try unsetting it? [21:21:15] ori: I logged out of tin and back in without -A, and it's now unset. So let's try it that way. [21:21:47] oops, have to do -A for gerrit to work. [21:22:12] !log anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s) [21:22:15] Logged the message, Master [21:22:20] ori: Nope, still all failed. [21:22:49] anomie: do you have a ~/.ssh/config? [21:23:02] ori: Yes [21:23:20] can you rename it and try again? i bet that that's it [21:23:33] !log anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 05s) [21:23:38] we should pass a command-line parameter to SSH invocations from scap that instruct it to disregard ~/.ssh/config [21:23:39] Logged the message, Master [21:24:04] ori: Mostly. mw1169 didn't work ("Error reading response length from authentication socket"), but the other 245 did. 
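The failure above came down to anomie's ~/.ssh/config "Host *" entry interfering with the ssh connections scap fans out, and ori notes scap should tell those invocations to disregard the user's config. A minimal hedged sketch of that idea follows; host and login name are placeholders, not the actual scap change.

```bash
# Hypothetical illustration: make per-host ssh calls ignore ~/.ssh/config so
# a stray "Host *" block cannot override the intended options.
TARGET=mw-target.eqiad.wmnet      # placeholder host
LOGIN=mwdeploy                    # assumed deploy user; not stated in the log

# With a broad "Host *" entry in ~/.ssh/config this can pick up the wrong
# identity and fail with "permission denied" on every host:
ssh -l "$LOGIN" "$TARGET" true

# Forcing the system-wide config instead (the -F approach suggested below):
ssh -F /etc/ssh/ssh_config -l "$LOGIN" "$TARGET" true
# or ignore any config file entirely:
ssh -F /dev/null -l "$LOGIN" "$TARGET" true
```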
[21:24:04] it has a higher priority than -l, apparently [21:24:35] (03CR) 10He7d3r: Set up redirects for bugzilla urls to redirect to phabricator. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [21:24:57] (03PS1) 10Yuvipanda: tools: Add python3-dev to dev_environ [puppet] - 10https://gerrit.wikimedia.org/r/175105 [21:25:15] ori: ssh -F /etc/ssh/ssh_config ? [21:25:17] !log anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 06s) [21:25:20] Logged the message, Master [21:25:29] (03PS2) 10Yuvipanda: tools: Add python3-dev to dev_environ [puppet] - 10https://gerrit.wikimedia.org/r/175105 [21:25:47] ori: It also works if I just comment out the entry for '*' from my ~/.ssh/config, leaving the one for gerrit. [21:26:10] jamesofur: SecurePoll bugfixes should be deployed now, BTW [21:26:39] anomie: heh, was just reading the back scroll, thank you very much! [21:26:49] * jamesofur goes to make polls [21:26:55] (03CR) 10Yuvipanda: [C: 032] tools: Add python3-dev to dev_environ [puppet] - 10https://gerrit.wikimedia.org/r/175105 (owner: 10Yuvipanda) [21:31:21] anomie: can you try one more time, and this time with your ~/.ssh/config as it were before, when it was failing? [21:32:06] !log anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 05s) [21:32:07] ori: Works [21:32:10] Logged the message, Master [21:32:27] anomie: cool, thanks for reporting / testing [21:32:34] np [21:44:36] !log anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s) [21:44:37] Logged the message, Master [21:45:01] !log anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s) [21:45:04] Logged the message, Master [22:01:08] (03PS1) 10Dzahn: misc varnish: do not handle bz-attachment URLs [puppet] - 10https://gerrit.wikimedia.org/r/175128 [22:03:22] (03PS2) 10Dzahn: misc varnish: do not handle bz-attachment URLs [puppet] - 10https://gerrit.wikimedia.org/r/175128 [22:04:59] (03CR) 10Dzahn: "in reply to andre_: https://gerrit.wikimedia.org/r/#/c/175128/2" [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [22:07:56] (03PS4) 10Dzahn: switch bugzilla names over to misc-web [dns] - 10https://gerrit.wikimedia.org/r/172469 [22:08:42] (03CR) 10Dzahn: switch bugzilla names over to misc-web (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/172469 (owner: 10Dzahn) [22:09:14] (03CR) 10Dzahn: "also: https://gerrit.wikimedia.org/r/#/c/172469/4" [puppet] - 10https://gerrit.wikimedia.org/r/172471 (owner: 10Dzahn) [22:09:16] (03CR) 10Aklapper: "Thanks. LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/175128 (owner: 10Dzahn) [22:11:27] YuviPanda: you still around? [22:11:33] yup, but not for long [22:11:34] 'sup? [22:12:08] I fixed https://gerrit.wikimedia.org/r/#/c/175001/ [22:12:26] * YuviPanda looks [22:12:41] saw my chance to get that merged perhaps if you are still around ;) [22:13:45] gwicke: ugh, are you using pick there just to get defaults that can be overriden on a per-node basis? [22:14:04] YuviPanda: don't ask me, not my code [22:14:07] oh [22:14:09] ok [22:14:34] it's more compact than the match / switch syntax [22:14:55] but fails in unexpected and unhelpful ways with certain default values [22:15:15] well, should just specify the default and then let it be overriden in hiera [22:15:18] than global variables [22:15:24] but that's for later. 
[22:15:40] (03CR) 10Yuvipanda: [C: 032] "Ok, all these pick()s look like they should be replaced with default values and then overriden with hiera, but for later. This fixes broke" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [22:15:46] (03PS1) 10Dzahn: add old-bugzilla.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/175133 [22:16:03] gwicke: I'd file a bug, but nowhere to do so :) [22:16:10] gwicke: whose code is it, btw? [22:16:22] YuviPanda: you might want to chat with ori about pick usage -- he proposed using it [22:16:29] gwicke: hmm, ok [22:16:59] ottomata wrote the cassandra module [22:17:09] !log ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: (no message) (duration: 00m 05s) [22:17:13] Logged the message, Master [22:17:29] gwicke: alright, I'll follow up after the weekend :) [22:17:34] I'll go sleep now [22:17:57] YuviPanda: thank you, and have a great weekend! [22:18:58] gwicke: you too [22:22:22] (03PS1) 10Dzahn: bugzilla: delete bugs.wikipedia.org vhost [puppet] - 10https://gerrit.wikimedia.org/r/175136 [22:24:16] (03CR) 10Nemo bis: [C: 04-1] "query.cgi, duplicates.cgi, reports.cgi, quips.cgi, buglist.cgi, votes.cgi and showdependencytree.cgi must be redirected to old-bugzilla to" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [22:31:11] (03CR) 10Qgil: [C: 031] add old-bugzilla.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/175133 (owner: 10Dzahn) [22:32:43] is anybody around who could triage https://rt.wikimedia.org/Ticket/Display.html?id=8530 ? [22:33:15] it turns out that we still don't have a cassandra package in our trusty mirror; only precise has it [22:33:58] /cc andrewbogott ori bblack [22:38:17] (03PS1) 10Dzahn: bugzilla: delete bugzilla.wikiPedia.org [puppet] - 10https://gerrit.wikimedia.org/r/175139 [22:42:09] (03CR) 10John F. Lewis: [C: 031] bugzilla: delete bugs.wikipedia.org vhost [puppet] - 10https://gerrit.wikimedia.org/r/175136 (owner: 10Dzahn) [22:43:11] gwicke: so, both update and add to trusty, right? [22:43:22] Is there any reason to think that the package would be different between the two? [22:43:25] andrewbogott: I care mostly about trusty to be honest [22:43:39] (03CR) 10John F. Lewis: [C: 031] misc varnish: do not handle bz-attachment URLs [puppet] - 10https://gerrit.wikimedia.org/r/175128 (owner: 10Dzahn) [22:43:39] Do you have a source for the package, or is it built in-house? [22:43:41] we are not going to be using precise for anything cassandra any more [22:43:53] andrewbogott: it's an apache package [22:44:00] http://wiki.apache.org/cassandra/DebianPackaging [22:44:22] we already imported the same to precise [22:44:29] from the 20x branch [22:44:35] wait, the same? I thought you needed an upgrade... [22:44:40] deb http://www.apache.org/dist/cassandra/debian 20x main [22:44:53] My question is -- do we need a new package downloaded, or just the precise package duplicated on trusty? [22:44:53] right now there is no version at all in trusty [22:45:10] latest stable version is 2.0.11, so it would be great to import that to trusty [22:45:37] if that's really hard then 2.0.9 would help as well [22:45:59] It's not hard, just… I of course haven't reviewed any of these packages. Presumably 2.0.9 is already approved by someone [22:46:11] (03CR) 10John F. Lewis: [C: 031] "Looks good. What Matanya said seems like a valid comment but keeping eqiad as a line just to scream 'this is eqiad' is probably better." 
[puppet] - 10https://gerrit.wikimedia.org/r/173476 (owner: 10Dzahn) [22:46:13] yes, Faidon imported it [22:46:28] he also contributed some fixes to the package [22:46:42] (03CR) 10John F. Lewis: [C: 031] ganglia: remove pmtpa varnish stanza [puppet] - 10https://gerrit.wikimedia.org/r/174205 (owner: 10Dzahn) [22:47:28] gwicke: apparently 2.0.11 is already available for precise. [22:47:46] okay, then that's even easier [22:47:58] maybe ottomata updated it, but didn't upload for trusty [22:48:14] The same package is now available for trusty. It may or may not work :) [22:49:31] thanks! testing on the second node after working around it manually on the first [22:49:35] (03CR) 10Aklapper: [C: 04-1] "Thanks, He7d3r! A match for (bugs|bugzilla).wikimedia.org/attachment.cgi\\?(.*) should only change the subdomain to old-bugzilla.wikimedia" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [22:52:04] (03CR) 10Legoktm: "Unfortunately T40 isn't accessible right now....what are the costs of leaving those endpoints up?" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [22:54:25] !log Disabled login for dewiki account "C" [22:54:32] Logged the message, Master [22:56:44] (03CR) 10GWicke: [C: 031] "This repo doesn't actually submit by default. Could somebody please give it a little push?" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [22:57:02] andrewbogott: another patch ^^ [22:57:21] yuvi already +2ed it, but didn't submit it for merge [22:57:34] (03PS9) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [22:57:45] (03CR) 10Andrew Bogott: [C: 032] Fix two bugs in cassandra module defaults [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [22:57:56] andrewbogott: thanks! [22:58:27] there will be a follow-up to update the submodule in puppet [22:58:29] hm, that's in a submodule… should there by CI for that? [22:58:47] Or have y'all been verifying by hand? [22:58:52] I guess so, but am not familiar with how that's set up with Jenkins [22:59:06] Andrew Otto set it up [22:59:09] Can you link me to another patch in that repo that's already merged? [22:59:37] andrewbogott: https://gerrit.wikimedia.org/r/#/c/166888/ [22:59:47] (03CR) 10Aklapper: "legoktm: Those endpoints will still be available when going to old-bugzilla.wm.o instead." [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [23:00:26] andrewbogott: it't not used in production yet, I'm currently testing in labs [23:00:35] Looks like no jenkins, otto is just verifying by hand [23:00:36] So. [23:00:51] (03CR) 10Andrew Bogott: [V: 032] "No CI suite for this, apparently :(" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/175001 (owner: 10GWicke) [23:02:01] (03PS1) 10GWicke: Bump cassandra module to d6182892c5f [puppet] - 10https://gerrit.wikimedia.org/r/175141 [23:02:08] andrewbogott: ^^ [23:02:49] hopefully the last one for a bit [23:04:30] (03CR) 10Andrew Bogott: [C: 032] Bump cassandra module to d6182892c5f [puppet] - 10https://gerrit.wikimedia.org/r/175141 (owner: 10GWicke) [23:04:49] andrewbogott: gracias! [23:07:06] (03PS10) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [23:10:13] (03CR) 10BryanDavis: [C: 031] "Not tested, but the php and json config bits look ok to me. 
Thanks for switching to a prepared statement so that csteipp doesn't come afte" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [23:11:22] (03PS1) 10Dzahn: bugzilla: switch svc_name to old-bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/175144 [23:11:31] (03CR) 1020after4: [C: 031] "Patch 9 addresses the attachment.cgi/buglist.cgi mixup (copypasta mistake on my part)" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [23:13:27] (03PS2) 10Dzahn: bugzilla: switch svc_name to old-bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/175144 [23:14:53] (03CR) 10Dzahn: "note how the "-attachments" URL is not changed and how svc_name is used in ./templates/apache/bugzilla.wikimedia.org.erb" [puppet] - 10https://gerrit.wikimedia.org/r/175144 (owner: 10Dzahn) [23:15:35] mehh.. betalabs is broken again [23:16:05] (03CR) 1020after4: "Although there wasn't really a way to exploit the code by sql injection, I agreed with Brian that using prepared statements eliminates the" [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [23:16:15] (03PS11) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [23:16:26] "Exiting; no certificate found and waitforcert is disabled" [23:16:42] that's the new ssh deploy goodness? [23:17:14] gwicke: The new deploy stuff broke puppet in beta labs, Antoine complained about that on the ops list [23:17:33] okay [23:17:37] is it being fixed? [23:17:37] (03CR) 10John F. Lewis: [C: 04-1] "In apache.pp; it uses 'template("bugzilla/apache/${svc_name}.erb"),'" [puppet] - 10https://gerrit.wikimedia.org/r/175144 (owner: 10Dzahn) [23:17:43] fwiw: Files /root/private/files/nagios/nsca.cfg and /root/private/files/icinga/nsca.cfg differ [23:18:07] /cc ori [23:18:56] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [23:19:49] (03CR) 10Chad: Set up redirects for bugzilla urls to redirect to phabricator. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/174335 (owner: 1020after4) [23:27:08] gwicke: if an instance is using a non-standard puppetmaster, that master has to sign the puppet cert. [23:27:12] I can fix, just a minute... [23:27:52] gwicke: is that better? [23:28:09] !log Disabled login for dewiki account "@" [23:28:13] *sigh* [23:28:14] Logged the message, Master [23:28:24] hoo: so much abuse [23:30:34] andrewbogott: yes, that looks better! [23:31:06] I wish there was a document that congruently described setting up a new betalabs node [23:31:59] gwicke: all of these steps (puppetwise) are in that page that I linked you to earlier. Here it is again :) https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Set_up_a_multi-instance_self_hosted_puppetmaster [23:32:19] But I think you're done with that now… but It'll be a repeat for the second node. [23:32:48] andrewbogott: I mean a page that focuses on the common case of setting up a regular node in betalabs [23:32:55] including all the salt etc stuff [23:33:51] should try to repeat the process for a third node & keep notes [23:34:20] <^d> gwicke: Did the docs I linked yesterday not work? It should just be "normal setup procedure for a node" + "this weird puppet/salt shit" [23:35:11] I'm out… good weekend all. [23:35:13] ^d: the problem is that bits and pieces are spread around all of wikitech [23:35:59] andrewbogott_afk: good weekend to you as well! 
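The "no certificate found and waitforcert is disabled" error above is the situation andrewbogott fixed by hand: an instance pointed at a non-standard (self-hosted) puppetmaster needs its certificate request signed on that master before puppet runs cleanly. A brief sketch under that assumption, with placeholder hostnames and the stock Puppet 3 CLI rather than a transcript of what was actually run:

```bash
# On the self-hosted puppetmaster (hostname is a placeholder):
sudo puppet cert list                          # show pending signing requests
sudo puppet cert sign new-node.eqiad.wmflabs   # sign the new instance's request

# Back on the instance that logged
# "Exiting; no certificate found and waitforcert is disabled":
sudo puppet agent -tv                          # should now fetch and apply a catalog
```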
[23:36:04] <^d> Yeah less than ideal :\ [23:38:27] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [23:43:06] (03PS3) 10Dzahn: bugzilla: switch svc_name to old-bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/175144 [23:44:53] (03CR) 10John F. Lewis: [C: 031] "That works too I guess. Untested but looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/175144 (owner: 10Dzahn) [23:46:52] (03PS12) 1020after4: Set up redirects for bugzilla urls to redirect to phabricator. [puppet] - 10https://gerrit.wikimedia.org/r/174335 [23:48:17] what is the trick to fix submodule checkouts with trebuchet again? [23:51:37] (03PS1) 10Ori.livneh: Add hiera lookup tool [puppet] - 10https://gerrit.wikimedia.org/r/175153 [23:51:41] andrewbogott_afk: ^ [23:52:33] ori: how is checkout_submodules handled with the package provider? [23:52:53] gwicke: can't remember off the top of my head [23:53:24] there is no mention of it in https://wikitech.wikimedia.org/wiki/Trebuchet [23:56:59] gwicke: IIRC, the trebuchet package provider just delegates to the python module, which handles the checkout_submodules as well (or as poorly) as it did before [23:57:58] figured it out, you need to add a line to .git/config in the repo on deployment-bastion [23:58:07] got to love it [23:58:49] remember, i proposed replacing it wholesale; i'm just trying to make it not suck.
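gwicke's submodule workaround above ("add a line to .git/config in the repo on deployment-bastion") is left vague in the log; the exact line is not recorded. A purely hypothetical illustration of the kind of edit that can mean, using plain git config — the repository path, submodule name, and URL below are examples, not the actual fix:

```bash
# Hypothetical: give a submodule a fetchable URL in the staging clone's
# .git/config, then update it so trebuchet can check it out.
cd /srv/deployment/myservice/deploy      # assumed path on deployment-bastion
git config -f .git/config \
    submodule.src.url https://gerrit.wikimedia.org/r/p/myservice/src
git submodule update --init --recursive src
```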