[00:00:44] (CR) Dzahn: ";; ANSWER SECTION:" [operations/dns] - https://gerrit.wikimedia.org/r/156212 (owner: Dzahn)
[00:01:04] RECOVERY - Unmerged changes on repository puppet on virt0 is OK: No changes to merge.
[00:01:32] chasemp: let's wait 1H now :)
[00:04:12] (CR) Chad: public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:04:28] <^d> Krinkle: It doesn't need that much css anyway :p
[00:04:38] <^d> Just something on the

and

so they don't look totally crap.
[00:05:22] (CR) Krinkle: [C: -1] public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:05:32] ^d: Ah, yeah, I'll add the subset then
[00:07:05] (PS4) Ori.livneh: mediawiki: use 'udplog' service alias instead of hard-coding fluorine [operations/puppet] - https://gerrit.wikimedia.org/r/154710
[00:08:49] (PS4) Krinkle: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:09:58] (CR) Krinkle: "* Removed css ref that was causing a 404 error." [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:19:52] (PS1) RobH: assigning row b sandbox vlan ip address for atlas-eqiad [operations/dns] - https://gerrit.wikimedia.org/r/156221
[00:20:28] (CR) RobH: [C: 2] assigning row b sandbox vlan ip address for atlas-eqiad [operations/dns] - https://gerrit.wikimedia.org/r/156221 (owner: RobH)
[00:22:11] (PS1) RobH: testing dns repo if it can merge without +2 [operations/dns] - https://gerrit.wikimedia.org/r/156222
[00:23:20] (Abandoned) RobH: testing dns repo if it can merge without +2 [operations/dns] - https://gerrit.wikimedia.org/r/156222 (owner: RobH)
[00:26:28] (PS1) Ori.livneh: mediawiki::monitoring::webserver: listen on $::fqdn:80 [operations/puppet] - https://gerrit.wikimedia.org/r/156223
[00:31:14] (PS1) Ori.livneh: beta: specify weight param in nutcracker config [operations/puppet] - https://gerrit.wikimedia.org/r/156224
[00:32:07] (CR) Ori.livneh: "cherry-picked in beta" [operations/puppet] - https://gerrit.wikimedia.org/r/156224 (owner: Ori.livneh)
[00:33:25] (PS5) Chad: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:34:08] (CR) jenkins-bot: [V: -1] public_html directory
service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:34:34] (CR) Chad: "PS5:" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:36:57] (CR) Dzahn: public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:39:24] PROBLEM - puppet last run on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:39:27] (PS6) Dzahn: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:40:13] (CR) Dzahn: "PS6: drop require's when using apache::site" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:41:26] (PS7) Dzahn: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:41:59] (PS2) Ori.livneh: mediawiki::monitoring::webserver: don't declare a vhost [operations/puppet] - https://gerrit.wikimedia.org/r/156223
[00:42:30] (CR) Dzahn: "PS7: absent'ing default Apache site also not needed anymore with new Apache module" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:43:19] <^d> mutante: What about libapache2-mod-php5?
[00:44:08] needs to be declared
[00:44:10] add "include ::apache::mod::php5"
[00:44:13] if it is needed, i mean
[00:44:39] <^d> We've already got php and already have apache on there.
[00:44:43] <^d> Might as well add the final bit :p
[00:44:55] (CR) Dzahn: "which node is it going to run on?" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:45:18] <^d> terbium according to the varnish change.
[00:45:30] <^d> Via misc-web-lb.
[00:45:30] oh, of course
[00:45:32] <^d> :)
[00:45:38] <^d> What we'd decided originally iirc.
[00:45:48] yea, as long as terbium doesnt already have another backend :)
[00:46:49] (PS8) Dzahn: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:47:38] (PS3) Ori.livneh: mediawiki::monitoring::webserver: don't declare a vhost [operations/puppet] - https://gerrit.wikimedia.org/r/156223
[00:51:25] puppet compiler is out of disk again
[00:51:41] (CR) Chad: [C: 1] "lgtm" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:53:06] <^d> mutante: You know, even after we move everything off of fenari that we think is there, we'll probably have to wait another week or so.
[00:53:10] <^d> *something* will show up.
[00:53:43] ^d: we might want to have pybal config, yes :)
[00:54:02] <^d> pybal config is pretty easy, methinks.
[00:54:21] <^d> Well, maybe not.
[00:54:22] <^d> Who knows.
[00:55:43] (CR) Dzahn: "PS8: also load mod php5" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[00:55:54] (CR) Chad: [C: 1] add people.wm.org -> misc varnish, public_html's [operations/dns] - https://gerrit.wikimedia.org/r/156214 (owner: Dzahn)
[01:00:56] (CR) Ori.livneh: "server admin is also part of the default apache module; it'd be better to remove that parameter/directive from the manifest and template" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[01:01:24] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:02:14] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[01:02:41] (CR) BryanDavis: [C: 1] "Matches prod config style in manifests/role/mediawiki.pp" [operations/puppet] - https://gerrit.wikimedia.org/r/156224 (owner: Ori.livneh)
[01:03:36] ori: https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor has fatals again. Thanks for fixing.
[01:03:54] bd808: nutcracker is alive on the beta apaches and mw1 is repooled
[01:04:03] w00t
[01:04:14] (CR) Ori.livneh: [C: -1] "XXX: latest patchset just comments out the resource; needs a better fix" [operations/puppet] - https://gerrit.wikimedia.org/r/156223 (owner: Ori.livneh)
[01:04:21] bd808: and puppet is enabled :)
[01:04:54] (CR) Dzahn: [C: 1] beta: specify weight param in nutcracker config [operations/puppet] - https://gerrit.wikimedia.org/r/156224 (owner: Ori.livneh)
[01:07:11] (CR) BryanDavis: "Can we just rename /usr/local/apache/conf/all.conf to /usr/local/apache/conf/00-all.conf as a temporary fix?" [operations/puppet] - https://gerrit.wikimedia.org/r/156223 (owner: Ori.livneh)
[01:12:38] (PS1) Dzahn: phab.wmfusercontent - add to misc varnish config [operations/puppet] - https://gerrit.wikimedia.org/r/156226
[01:14:03] (CR) MZMcBride: "Just out of curiosity as I can't see the relevant RT ticket, is people.wikimedia.org intended to replace noc.wikimedia.org/~user/?" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[01:16:01] (CR) Dzahn: "MZ, yes, that's what it is, we want to shutdown fenari because Tampa" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[01:17:35] (CR) Dzahn: "are we going to add redirects for some noc.wm user URLs per http://www.w3.org/Provider/Style/URI.html ?"
[operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[01:24:40] (CR) Dzahn: updating install-server module for new codfw rows and install2001 params second ps: fixing two ip address mistakes/typos from bblack's revie (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156210 (owner: RobH)
[01:32:15] (PS1) Dzahn: public1-a-codfw - also use install2001 not carbon [operations/puppet] - https://gerrit.wikimedia.org/r/156228
[01:33:36] (PS2) Dzahn: public1-a-codfw - also use install2001 not carbon [operations/puppet] - https://gerrit.wikimedia.org/r/156228
[01:39:31] (CR) Chad: "I'd say let's do it. Redirects are cheap." [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[01:40:35] (CR) RobH: [C: -2] "its intentionally set to carbon to install install2001, I plan to fix it after install is complete" [operations/puppet] - https://gerrit.wikimedia.org/r/156228 (owner: Dzahn)
[01:46:22] (CR) RobH: "i plan to fix it, by merging this patch, since Daniel already did all the work" [operations/puppet] - https://gerrit.wikimedia.org/r/156228 (owner: Dzahn)
[01:58:21] (CR) Ori.livneh: [C: 2] beta: specify weight param in nutcracker config [operations/puppet] - https://gerrit.wikimedia.org/r/156224 (owner: Ori.livneh)
[02:10:04] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:10:25] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:10:55] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:11:24] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:21:35] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:30:25]
PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:30:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:31:34] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[02:31:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:34:03] !log LocalisationUpdate completed (1.24wmf17) at 2014-08-26 02:33:00+00:00
[02:34:11] Logged the message, Master
[02:39:35] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[02:50:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:51:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:54:24] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:56:14] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[03:01:14] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:05:21] !log LocalisationUpdate completed (1.24wmf18) at 2014-08-26 03:04:18+00:00
[03:05:27] Logged the message, Master
[03:11:09] !log filesystem issues on labsdb1002. stopped mysqld
[03:11:15] Logged the message, Master
[03:19:14] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[03:23:14] <^d> !log restarting elasticsearch on elastic1001, elastic1003 and elastic1008. icinga may complain briefly.
[03:23:20] Logged the message, Master
[03:52:49] (PS1) Revi: Set wgNameSpaceProtection on kowikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156235 (https://bugzilla.wikimedia.org/70022)
[04:07:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 26 04:06:34 UTC 2014 (duration 6m 33s)
[04:07:47] Logged the message, Master
[04:17:08] Anyone has idea, http://es.wikipedia.beta.wmflabs.org/ is down for good reason? :)
[04:21:02] Might want to try #wikimedia-labs or #wikimedia-qa
[04:21:06] Though it's late.
[04:22:15] (PS1) Chad: Make permenant the recovery concurrent stream throttle [operations/puppet] - https://gerrit.wikimedia.org/r/156238
[04:23:51] <^d> One of these days I'll remember where we put the stupid logs on beta.
[04:23:57] permanent
[04:24:29] <^d> Ah yes.
[04:24:49] I've given up on human memory. I just write shit down.
[04:24:54] Or it gets forgotten.
[04:25:19] <^d> Aug 26 03:25:49 10.68.17.96 apache2: PHP Fatal error: Cannot redeclare wmfLabsOverrideSettings() (previously declared in /srv/common-local/wmf-config/InitialiseSettings-labs.php:19) in /srv/common-local/wmf-config/InitialiseSettings-labs.php on line 19
[04:25:29] <^d> That's probably why labs is broken.
[04:25:35] Carmela: seems I forgot to read email.
[04:26:00] <^d> Sounds like we've got an inclusion loop.
[04:27:21] !log labsdb1002 back up
[04:27:27] Logged the message, Master
[04:32:05] (CR) Jeremyb: [C: -1] "needs consensus (and I haven't reviewed the actual change)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156235 (https://bugzilla.wikimedia.org/70022) (owner: Revi)
[04:36:21] (PS1) Chad: Try only requiring InitialiseSettings-labs.php once [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156239
[04:37:16] (CR) Chad: [C: 2] Try only requiring InitialiseSettings-labs.php once [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156239 (owner: Chad)
[04:37:20] (Merged) jenkins-bot: Try only requiring InitialiseSettings-labs.php once [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156239 (owner: Chad)
[04:41:09] kart_: seems to be only the document root; http://es.wikipedia.beta.wmflabs.org/wiki/Main_Page works
[04:41:54] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100%
[04:42:12] <^d> ori: apache.log isn't updating on deployment-bastion :\
[04:42:54] <^d> Since 03:25:49
[04:44:18] it's updating; the location is /data/project/logs
[04:44:28] <^d> I'm looking there.
[04:44:50] <^d> /data/project/logs/syslog/apache.log?
[04:44:50] -rw-r--r-- 1 root root 32794187 Aug 26 04:44 apache-access.log
[04:44:57] and
[04:44:57] -rw-r--r-- 1 root root 3134225 Aug 26 04:42 apache-error.log
[04:45:10] "hysterical raisins"
[04:46:00] <^d> ew, those
[04:46:07] you know what the funniest thing about labs is?
[04:46:14] <^d> what?
[04:46:27] it's the little differences. a lotta the same shit we got here, they got there, but there they're a little different.
[04:47:04] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/).
[04:47:47] (PS1) Chad: Revert "Try only requiring InitialiseSettings-labs.php once" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156242
[04:47:56] (CR) Chad: [C: 2 V: 2] Revert "Try only requiring InitialiseSettings-labs.php once" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156242 (owner: Chad)
[04:48:33] !log demon Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 06s)
[04:48:36] <^d> ^ noop
[04:48:40] Logged the message, Master
[04:49:04] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[05:00:31] (CR) Hoo man: [C: -1] "This has two significant flaws right now:" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/155753 (owner: 01tonythomas)
[05:01:20] tonythomas: ^
[05:01:25] I hope you get what I mean
[05:02:23] hoo: got it. let me try that same indeployment-mediawiki02 terminal
[05:02:26] will update in a min
[05:04:58] hoo: $ curl -H 'deployment-cache-text02' http://deployment.wikimedia.beta.wmflabs.org/w/api.php action="bouncehandler" --data-urlencode "email@asdfasdf" works in deployement-mediawiki02
[05:06:00] mh
[05:06:05] that's not documented than
[05:06:07] * then
[05:06:18] ajh
[05:06:21] * ah
[05:06:23] I see why
[05:06:25] it will work
[05:06:31] but not in the exact way we want it
[05:06:41] do curl -vvv and you'll see that it screws stuff
[05:07:18] hoo: curl -ww ?
[05:08:34] strange, but when I give curl -H 'Host:foo' bar, or even 'Host:www.foo' its not working
[05:09:03] ok
[05:09:04] mh
[05:09:19] You need to have a better understanding of http to get this, I guess
[05:09:33] but for now it should be good enough if you drop the screwed -H entirely
[05:09:43] also for production probably
[05:09:51] hoo: true that
[05:10:21] use -ww instead of -H right ?
[05:10:29] tonythomas: -vvv will show you the headers you send and the headers you receive from the servers
[05:10:53] no
[05:10:59] I don't know about -ww
[05:11:10] I doubt curl actually supports that, but I might be wrong
[05:11:19] it has a *lot of* params
[05:11:45] hoo: yup and http://www.cyberciti.biz/faq/linux-unix-appleosx-bsd-curl-sending-http-host-header/ shows our -H was roughly correct
[05:17:05] tonythomas: Yeah, but after all it might not be needed
[05:17:23] for beta it's certainly good enough to just hit the URL you'd also hit from outside
[05:17:34] PROBLEM - Puppet freshness on labsdb1002 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 03:17:20 UTC
[05:18:13] hoo: ok. in that case, remove the -H completely and $ curl http://deployment.wikimedia.beta.wmflabs.org/w/api.php action="bouncehandler" --data-urlencode "email@asdfasdf" right ?
[05:18:28] yep
[05:18:31] that's one part
[05:18:47] the other is that you need to keep the old behaviour for production
[05:19:48] hoo: the -H in prod right ?
[05:19:57] but that can make the configs a bit messy, I think
[05:20:11] tonythomas: Don't think so
[05:20:27] I'm just used to using it when wanting to hit a specific app server
[05:20:40] (PS1) Springle: reassign db1057 to s2 [operations/puppet] - https://gerrit.wikimedia.org/r/156245
[05:20:43] but if we don't care, we can just hit the full URI, I guess
[05:20:58] (We do that a lot in other places, so shouldn't be an issue)
[05:22:38] hoo: I think mark was telling yesterday, we might want to use something like www.appservers.svc."${::mw_primary}".net for prod
[05:23:00] (PS2) Springle: reassign db1054 to s2 [operations/puppet] - https://gerrit.wikimedia.org/r/156245
[05:23:18] tonythomas: without the www.
that's also going to work
[05:24:08] I'm to tired to tell, but it could be that the "external" URIs like en.wikipedia.org don't work from the servers that will run your curl
[05:24:15] I doubt that, though
[05:24:35] both ways will probably work and end up in the same server pool
[05:25:03] (CR) Springle: [C: 2] reassign db1054 to s2 [operations/puppet] - https://gerrit.wikimedia.org/r/156245 (owner: Springle)
[05:25:26] hoo: I think that was marks point. let me try to get some chat logs
[05:26:44] ah ok
[05:26:59] in that case, use the internal uri ending on eqiad.wmnet
[05:27:19] or rather use the data center switch and then just .wmnet
[05:28:00] !log upgrade & restart db1054, fs check
[05:28:06] Logged the message, Master
[05:28:20] hoo: I might want some help there, I think. Do www.appservers.svc."${::mw_primary}" does the same ?
[05:28:30] you want
[05:29:02] appservers.svc."${::mw_primary}".wmnet
[05:29:14] okey. for the prod right ? and beta ?
[05:29:39] tonythomas: Yep, for prod
[05:30:14] for beta you probably then also want to use the solution with -H 'Host: ...'
[05:30:36] the Host: value should be clear deployment.wikimedia.beta.wmflabs.org
[05:31:16] not sure about the address for the appserver load balancer there actually
[05:31:38] hoo: when I POST with -H 'Host: www.deployment-cache-text02' ?
[05:31:41] if you can't find one, just using deployment-mediawiki01/w/api.php?action=... will work
[05:31:43] no
[05:31:56] the host field is the virtual host which you use in your browser
[05:32:03] and the actual URL is the internal thing
[05:32:14] sorry, I'm bad at explaining this, I think
[05:33:51] hoo: might be due to HTTP. anyway, so it would be like $ curl -H 'Host:deployment.wikimedia.beta.wmflabs.org/' http://deployment.wikimedia.beta.wmflabs.org/w/api.php action="bouncehandler" --data-urlencode "email@asdfasdf"
[05:34:24] tonythomas: almost... drop the / in the -H and we should be good
[05:34:43] ah.
that worked :)
[05:34:52] also it's convention to place the URL at the end of the comment, but it should work this way also
[05:35:22] hoo: I have written like that for the exim config though
[05:35:31] let me change the PS
[05:36:54] RECOVERY - Puppet freshness on labsdb1002 is OK: puppet ran at Tue Aug 26 05:36:47 UTC 2014
[05:38:04] (PS12) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[05:38:35] hoo: done. https://gerrit.wikimedia.org/r/#/c/155753
[05:38:36] :)
[05:38:44] having a quick look
[05:39:41] !log springle Synchronized wmf-config/db-eqiad.php: depool db1036 while cloning (duration: 00m 06s)
[05:39:43] tonythomas: "action=bouncehandler"
[05:39:51] Logged the message, Master
[05:39:51] that string looks very displaced
[05:39:56] it should be the the very end
[05:40:26] like: <%= @verp_bounce_post_url %>?action=...
[05:41:15] and still this should be changed so that it doesn't change behaviour for production at all, yet
[05:41:18] !log xtrabackup clone db1036 to db1054
[05:41:25] Logged the message, Master
[05:41:28] you can still keep the production URLs and stuff, but commented out or so
[05:41:42] just built in some kind of switch based on realm
[05:45:38] tonythomas: Have to leave now and get some sleep...
I hope that stuff is clear to you know :)
[05:45:56] If you have further questions just look for someone else in this channel
[06:23:05] (CR) Hashar: "This change is just to remove the Math dependency packages in favor of using puppet https://gerrit.wikimedia.org/r/115133 To solve bug 6" [operations/debs/wikimedia-task-appserver] - https://gerrit.wikimedia.org/r/115135 (https://bugzilla.wikimedia.org/61090) (owner: Hashar)
[06:27:35] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 2 failures
[06:28:15] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%):
[06:28:34] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:34] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:35] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:55] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:14] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:44] (PS13) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[06:37:34] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:37:34] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
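A minimal sketch of the point made in the 05:31 to 05:34 exchange above: the address a client connects to and the `Host:` header it sends are independent, so a request can go to one specific backend while still naming the public virtual host. The local echo server, the helper name `fetch_with_host`, and `demo` are stand-ins invented for illustration and are not part of the beta or production setup; only the virtual host name and the `/w/api.php` path come from the log.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHostHandler(BaseHTTPRequestHandler):
    """Hypothetical backend: replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet; the default handler logs every request to stderr.
        pass


def fetch_with_host(address, port, virtual_host, path="/"):
    """Connect to address:port but present virtual_host in the Host header."""
    conn = http.client.HTTPConnection(address, port, timeout=5)
    try:
        # Passing an explicit Host header stops http.client from deriving
        # one from the connection address.
        conn.request("GET", path, headers={"Host": virtual_host})
        return conn.getresponse().read().decode("ascii")
    finally:
        conn.close()


def demo():
    # Port 0 asks the OS for any free port; this stands in for an internal
    # app server address.
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_with_host(
            "127.0.0.1",
            server.server_address[1],
            "deployment.wikimedia.beta.wmflabs.org",  # no trailing slash
            "/w/api.php",
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The backend sees the virtual host name even though the connection went to 127.0.0.1, which is the same split as `curl -H 'Host: ...' <internal URL>`. A trailing `/` in the header value would be sent verbatim as part of the host name, which is why it had to be dropped.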
[06:38:34] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 9.502 second response time
[06:38:44] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[06:39:14] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[06:39:25] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[06:42:01] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:42:11] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:43:29] (PS14) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[06:45:06] (PS15) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [operations/puppet] - https://gerrit.wikimedia.org/r/155753
[06:45:32] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:32] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:45:41] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:45:52] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:45:52] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:31] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[06:46:31] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:46:32] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:49:31] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[06:53:31] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[06:58:01] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:59:11] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:07:48] <_joe_> good morning
[07:12:07] afternoon _joe_
[07:22:24] <_joe_> ChrisJ: :)
[07:31:29] (CR) Giuseppe Lavagetto: [C: -2] "We need to take the virtualhosts for beta in puppet and provision those in beta instead of the production ones." [operations/puppet] - https://gerrit.wikimedia.org/r/156223 (owner: Ori.livneh)
[07:35:46] (CR) Ori.livneh: "Yeah, you're right."
[operations/puppet] - https://gerrit.wikimedia.org/r/156223 (owner: Ori.livneh)
[07:35:51] (Abandoned) Ori.livneh: mediawiki::monitoring::webserver: don't declare a vhost [operations/puppet] - https://gerrit.wikimedia.org/r/156223 (owner: Ori.livneh)
[07:36:38] i'm not here
[07:37:32] <_joe_> :P
[07:37:45] <_joe_> hey
[07:38:07] <_joe_> sleep well :)
[07:45:58] (PS2) Giuseppe Lavagetto: Allocate IP addresses for HHVM app servers [operations/dns] - https://gerrit.wikimedia.org/r/152904 (owner: Mark Bergsma)
[07:50:46] (Abandoned) Giuseppe Lavagetto: Allocate IP addresses for HHVM app servers [operations/dns] - https://gerrit.wikimedia.org/r/152904 (owner: Mark Bergsma)
[07:59:45] (PS2) Revi: Set wgNameSpaceProtection on kowikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156235 (https://bugzilla.wikimedia.org/70022)
[08:19:21] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 06:18:16 UTC
[08:28:32] !log springle Synchronized wmf-config/db-eqiad.php: repool db1036 (duration: 00m 06s)
[08:28:37] Logged the message, Master
[08:37:11] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:38:15] (PS2) Giuseppe Lavagetto: puppet: hiera backend for the WMF [operations/puppet] - https://gerrit.wikimedia.org/r/151869
[08:41:45] <_joe_> ^^ this is mostly it, if someone cares to review it :)
[08:42:43] (CR) Giuseppe Lavagetto: "I tested the backend locally and it worked; I'm going to cherry-pick this patch on a local puppetmaster in labs before merging anyway." [operations/puppet] - https://gerrit.wikimedia.org/r/151869 (owner: Giuseppe Lavagetto)
[08:44:31] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[08:55:11] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[09:04:52] !log reboot ms-be1011, unresponse on network and console
[09:04:58] Logged the message, Master
[09:05:42] <_joe_> godog: again?
[09:05:45] <_joe_> :/
[09:06:04] _joe_: yeah, not sure what's up, yesterday wasn't locked up that hard
[09:07:51] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[09:17:23] (CR) Filippo Giunchedi: [C: 2 V: 2] Make permenant the recovery concurrent stream throttle [operations/puppet] - https://gerrit.wikimedia.org/r/156238 (owner: Chad)
[09:22:41] (CR) Filippo Giunchedi: apache: when sourcing env-enabled/*, redirect stdout to stderr (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156060 (owner: Ori.livneh)
[09:38:43] (PS1) Filippo Giunchedi: force builder files to be ignored in diffs [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/156251
[09:39:00] (CR) Filippo Giunchedi: [C: 2 V: 2] force builder files to be ignored in diffs [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/156251 (owner: Filippo Giunchedi)
[09:41:07] (PS1) Filippo Giunchedi: weight ms-be1013/14/15 to 2800 [operations/software/swift-ring] - https://gerrit.wikimedia.org/r/156252
[09:51:15] (PS1) Giuseppe Lavagetto: hadoop: grant hive access to Bernd Sitzmann [operations/puppet] - https://gerrit.wikimedia.org/r/156253
[09:51:17] (PS1) Giuseppe Lavagetto: hadoop: grant access to Dimitry Brant [operations/puppet] - https://gerrit.wikimedia.org/r/156254
[10:00:40] (Abandoned) Filippo Giunchedi: add mini-dinstall to releases.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[10:18:41] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet
last ran 14406 seconds ago, expected 14400
[10:20:21] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 06:18:16 UTC
[10:50:04] (CR) Filippo Giunchedi: [C: 2 V: 2] elasticsearch: add percent-based shard check [operations/puppet] - https://gerrit.wikimedia.org/r/154786 (owner: Filippo Giunchedi)
[10:54:20] (PS1) Aude: Enable otherProjectsLinksBeta for Wikibase clients [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156257
[10:54:22] (PS1) Aude: Use new Wikibase serialization format on Wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156258
[10:54:42] (CR) Aude: [C: -2] "not until later today" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156257 (owner: Aude)
[11:03:36] <_joe_> lunch, bbl
[11:05:22] http://dumps.wikimedia.org/ is 403 error!
[11:05:25] apergos: !
[11:06:33] works when i proxy via US
[11:07:43] (CR) Alexandros Kosiaris: [C: 2] wikimedia.community: Add Google webmasters tools verification [operations/dns] - https://gerrit.wikimedia.org/r/152269 (owner: Alexandros Kosiaris)
[11:09:12] (PS1) coren: Tool Labs: add melt [operations/puppet] - https://gerrit.wikimedia.org/r/156259 (https://bugzilla.wikimedia.org/69365)
[11:13:32] (PS1) Filippo Giunchedi: elasticsearch: deploy shard percentage check [operations/puppet] - https://gerrit.wikimedia.org/r/156260
[11:17:05] it works for me aude
[11:17:23] hmmm
[11:17:48] says forbidden
[11:20:46] http://dumps.wikimedia.org/ works for me to (/me in Cental Europe)
[11:25:43] (PS2) Filippo Giunchedi: elasticsearch: deploy shard percentage check [operations/puppet] - https://gerrit.wikimedia.org/r/156260
[11:25:43] must be dns caching
[11:27:27] although traceroute goes same place
[11:29:23] works on my mobile
[11:35:02] aude: bad browser cache ?
[11:35:10] maybe [11:36:15] * aude try proxy in chrome and it works [11:43:13] (PS1) Matanya: torrus: quailfy vars [operations/puppet] - https://gerrit.wikimedia.org/r/156261 [11:48:05] (PS1) Matanya: exceptionmonitor: qualify var [operations/puppet] - https://gerrit.wikimedia.org/r/156262 [11:49:54] (PS1) Springle: pool db1054 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156263 [11:51:57] (CR) Springle: [C: 2] pool db1054 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156263 (owner: Springle) [11:52:01] (Merged) jenkins-bot: pool db1054 in s2 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156263 (owner: Springle) [11:52:26] (CR) Matanya: [C: 1] remove blog.wikimedia.org related things (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153117 (owner: Dzahn) [11:53:15] !log springle Synchronized wmf-config/db-eqiad.php: pool db1054, warm up (duration: 00m 08s) [11:53:21] Logged the message, Master [11:57:00] (PS1) Matanya: logging-relay: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/156264 [12:00:23] (Abandoned) Revi: Set wgNameSpaceProtection on kowikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156235 (https://bugzilla.wikimedia.org/70022) (owner: Revi) [12:03:47] !log disable puppet on labsdb1006 for planet osm import [12:03:52] Logged the message, Master [12:05:00] (PS1) Matanya: mysql-config-research: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/156267 [12:05:43] akosiaris: i'm just making sure you won't be bored after your vacation :D [12:06:12] matanya: how kind of you... 
thanks ;-) [12:06:42] good afternoon [12:06:50] hi hashar [12:07:02] hi hashar [12:07:40] matanya: I am so sorry you had to abandon the ferm rule rework for Zuul [12:07:44] matanya: bad timing :-/ [12:07:50] no worries [12:08:02] matanya: I am not planning to change the zuul / role::zuul manifests anytime soon so i guess you can redo it if still willing [12:08:23] I am not sure from where the ferm::rule should be applied though. [12:08:43] not at site.pp level though, cause we need them applied on labs instance whenever including a role there [12:09:55] hashar: mostly ferm rules go on roles, with very few exceptions [12:10:04] sounds good :] [12:11:08] i'll try to redo it, if i find some time longer than the few seconds i have for those puppet3 patches [12:12:31] !log Jenkins migrating mediawiki-core-qunit to use Zuul cloner {{gerrit|156268}} [12:12:36] Logged the message, Master [12:15:54] aude, are you using multiple connections at the same time? we do cap that [12:16:10] sorry for the delay, I don't always see pings in this window [12:21:21] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 06:18:16 UTC [12:22:07] (CR) Krinkle: [C: 1] public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn) [12:33:43] !log Jenkins reverted mediawiki-core-qunit to use Zuul cloner {{gerrit|156268}}. Gotta play with it on a new job name since it does not work out of the box as expected. [12:33:47] Logged the message, Master [12:39:32] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:43:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [12:48:09] !log springle Synchronized wmf-config/db-eqiad.php: db1054 to normal load (duration: 00m 06s) [12:48:15] Logged the message, Master [12:54:37] <_joe_> legoktm: ping [12:56:32] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:57:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [13:00:32] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:01:22] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [13:07:01] (CR) Manybubbles: [C: 1] "Cool!" [operations/puppet] - https://gerrit.wikimedia.org/r/156260 (owner: Filippo Giunchedi) [13:07:23] (CR) Giuseppe Lavagetto: mediawiki: HAT appserver should turn off mod_php (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/153577 (owner: Giuseppe Lavagetto) [13:07:59] (PS5) Giuseppe Lavagetto: mediawiki: HAT appserver should turn off mod_php [operations/puppet] - https://gerrit.wikimedia.org/r/153577 [13:12:05] (PS6) Giuseppe Lavagetto: mediawiki: HAT appserver should turn off mod_php [operations/puppet] - https://gerrit.wikimedia.org/r/153577 [13:12:19] (CR) Giuseppe Lavagetto: [C: 2 V: 2] mediawiki: HAT appserver should turn off mod_php [operations/puppet] - https://gerrit.wikimedia.org/r/153577 (owner: Giuseppe Lavagetto) [13:12:50] <_joe_> !log disabling puppet on all appservers while deploying an apache change [13:12:56] Logged the message, Master [13:15:48] apergos: no [13:16:10] fixed it by proxy and now it works w/o proxy [13:24:18] (PS1) Giuseppe Lavagetto: Revert "mediawiki: HAT appserver should turn off mod_php" [operations/puppet] - https://gerrit.wikimedia.org/r/156276 [13:24:35] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Revert "mediawiki: HAT appserver should turn off mod_php" 
[operations/puppet] - https://gerrit.wikimedia.org/r/156276 (owner: Giuseppe Lavagetto) [13:26:57] (PS2) Giuseppe Lavagetto: Apache config for advisorywiki using mod_proxy_fcgi [operations/puppet] - https://gerrit.wikimedia.org/r/147463 (owner: Reedy) [13:28:12] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Epic puppet fail [13:29:42] <_joe_> !log re-enabling puppet, change aborted as not all sites are served via hhvm on the hhvm appservers (true story). Will re-do once all configs are in their place [13:29:47] Logged the message, Master [13:30:11] RECOVERY - Puppet freshness on mw1053 is OK: puppet ran at Tue Aug 26 13:30:06 UTC 2014 [13:31:11] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:32:56] !log Jenkins mediawiki-core-qunit job has been switched to Zuul cloner and pass! :-D [13:33:02] Logged the message, Master [13:34:11] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [13:35:12] <_joe_> ? 
[13:38:30] (PS3) Giuseppe Lavagetto: Apache config for advisorywiki using mod_proxy_fcgi [operations/puppet] - https://gerrit.wikimedia.org/r/147463 (owner: Reedy) [13:39:28] <_joe_> mmmh I'm gonna take another approach I guess [13:46:38] (CR) Manybubbles: [C: 1] Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [13:47:12] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [13:49:21] PROBLEM - Puppet freshness on labsdb1006 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 11:48:50 UTC [14:11:20] (CR) Milimetric: [C: 1] Configure wikimetrics' replication lag checking [operations/puppet] - https://gerrit.wikimedia.org/r/156202 (owner: QChris) [14:11:24] (PS1) Phuedx: Enable the Task Recommendations experiment v1 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156282 [14:12:17] (CR) Phuedx: [C: -1] "-1 until Growth are ready to release the experiment." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156282 (owner: Phuedx) [14:15:17] (PS9) Nemo bis: public_html directory service, see RT #6862 [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn) [14:16:56] (CR) Nemo bis: public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn) [14:19:58] (CR) Chad: public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn) [14:22:21] PROBLEM - Puppet freshness on elastic1016 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 06:18:16 UTC [14:25:21] RECOVERY - Disk space on elastic1016 is OK: DISK OK [14:26:41] RECOVERY - Puppet freshness on elastic1016 is OK: puppet ran at Tue Aug 26 14:26:38 UTC 2014 [14:27:41] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [14:28:33] (CR) Chad: "Already live via in-process config. This will just make it permanent." [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [14:29:23] (PS3) Chad: Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [14:31:28] seeing an "{u'servedby': u'mw1117', u'error': {u'info': u'The search parameter must be set', u'code': u'srnosearch'}}" error [14:31:35] has anything on the search api changed? [14:34:03] <^d> The API hasn't, no. [14:35:08] <^d> What query are you trying to execute? [14:36:36] I'll investigate, perhaps somewhere an empty query managed to slip through. [14:37:10] (PS1) Chad: Disable darkconsole on legalpad.wm.o [operations/puppet] - https://gerrit.wikimedia.org/r/156291 [14:46:10] _joe_: pong [14:46:40] <_joe_> legoktm: it seems you don't have a home on iron.wikimedia.org - can you log into it? 
[14:47:02] I thought iron was roots only? [14:47:30] (CR) coren: [C: 2] "Trivial package addition." [operations/puppet] - https://gerrit.wikimedia.org/r/156259 (https://bugzilla.wikimedia.org/69365) (owner: coren) [14:47:32] it is [14:50:50] manybubbles: I'll SWAT today, unless you really want to [14:51:00] meh [14:51:04] have fun! [14:51:10] legoktm: Ping for SWAT in 10 minutes [14:51:13] o/ [14:55:54] (CR) 20after4: [C: 1] "It's something that can be enabled/disabled per user, and it could be useful to debug issues if they occur. It's probably not important to" [operations/puppet] - https://gerrit.wikimedia.org/r/156291 (owner: Chad) [14:57:11] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [14:57:45] (PS1) coren: Tool Labs: make crontab paranoid about empty files [operations/puppet] - https://gerrit.wikimedia.org/r/156294 (https://bugzilla.wikimedia.org/69355) [14:59:32] cmjohnson1: hey Chris, re: RT #7728 when would it be a good time for you to replace the cards in swift frontend? [15:00:06] (PS4) Anomie: Enable GlobalCssJs on all CentralAuth wikis minus loginwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154432 (https://bugzilla.wikimedia.org/13953) (owner: Legoktm) [15:00:11] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [15:00:12] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [15:01:00] (CR) Filippo Giunchedi: [C: 2 V: 2] Turn on elasticsearch row awareness for shard allocation [operations/puppet] - https://gerrit.wikimedia.org/r/153805 (owner: Ottomata) [15:01:07] godog wanna do today? 
[15:01:25] actually...strike that...lemme check where they're going to go [15:02:54] cmjohnson1: tomorrow UTC evening would do, there's some time ranges with less traffic in the ticket where it'd be safer to do too [15:03:22] plus wiki loves monuments is coming up starting september so that might put some more swift traffic in [15:03:55] (CR) Anomie: [C: 2] "SWAT" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154432 (https://bugzilla.wikimedia.org/13953) (owner: Legoktm) [15:03:57] godog: can we do Thursday evening instead? [15:04:09] (Merged) jenkins-bot: Enable GlobalCssJs on all CentralAuth wikis minus loginwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/154432 (https://bugzilla.wikimedia.org/13953) (owner: Legoktm) [15:04:38] I will have to move them to different racks as well. only ms-fe1003 will move rows [15:05:15] legoktm: Here we go [15:05:23] :DDD [15:05:57] !log anomie Synchronized wmf-config: SWAT: Enable GlobalCssJs on all CentralAuth wikis minus loginwiki [[gerrit:154432]] (duration: 00m 09s) [15:05:58] legoktm: ^ Test please [15:06:03] Logged the message, Master [15:06:24] legoktm: btw, thanks for shepherding that very well. Even in my absence last week, nothing was seen as amiss when it was reviewed (which isn't always the case for less crazy sounding extensions :) ). [15:06:32] mw.loader.state({"ext.globalCssJs.site":"ready","ext.globalCssJs.user":"loading", [15:06:40] [15:06:56] greg-g: :) [15:07:04] anomie: looks good on enwp, I'll test it on a few more wikis [15:08:28] grrrit-wm1: :-) [15:08:36] cmjohnson1: sure, like 21.30/22 UTC? how long would it take to do one machine? [15:08:56] Bah. [15:08:58] James_F: for me or the bot? [15:09:05] greg-g: You. [15:09:07] :) [15:09:11] Blasted nick-squatters. [15:09:15] anomie: Seems to be working for me too. [15:09:48] James_F: sorry, I really should just prepend WMF to my nick anyways to make tab completion more explicit. 
WMF-greg-g [15:10:05] Yeah, that will /definitely/ help reduce clashes. :-P [15:10:11] godog: fe1001/2 and 4 should take 10-15mins each. fe1003 a little longer...will require a new ip address [15:10:17] 'course, would the bot also need that WMF designation? [15:10:22] anomie: yup, everything looks good on the random wikis I tested on [15:10:31] Good! [15:10:36] thanks! [15:10:36] * anomie is done with SWAT [15:10:37] thanks anomie [15:10:40] greg-g: Surely it should be MW-bot-gerrit [15:10:42] <_joe_> no one will force me to change my IRC nick. [15:10:53] greg-g: And MW-bot-jenkins and … [15:11:00] <_joe_> I've had this nick for like 15 years now [15:11:10] no no no... WMF-MW-bot-jenkins sound way better [15:11:17] legoktm: seems to be working on meta and enwikibooks [15:11:24] <_joe_> MW-bot-jenkins(WMF) [15:11:38] (WMF)MW-bot-jenkins(WMF) [15:11:52] _joe_: the time when I'm on IRC drinking my first cup of coffee is known as "Maximum Sarcasm Greg" or MSG time. Not to be confused with the migraine inducing chinese food additive. [15:12:04] MWMF-bot-jenkins! [15:12:18] greg-g: You also induce migraines, though. [15:12:22] helderwiki: \o/ [15:12:38] James_F: :( [15:12:49] cmjohnson1: sounds good to me! [15:12:49] now... see what you did? I am thinking of writing a program getting all the impossible combinations of infinite WMFs and MW and bot-jenkins... [15:14:08] <_joe_> lol [15:14:15] akosiaris: :-D [15:14:36] <_joe_> akosiaris: seed it with int random=4; [15:15:26] nine sounds better _joe_ ... as scott adams points out in http://cequs.com/wp-content/uploads/2013/09/dilbert.jpg [15:16:04] obviously 9 is more random than 4 [15:19:27] (PS2) Rush: Disable darkconsole on legalpad.wm.o [operations/puppet] - https://gerrit.wikimedia.org/r/156291 (owner: Chad) [15:19:45] (CR) Rush: "thanks chad, I'll merge this today hopefully" [operations/puppet] - https://gerrit.wikimedia.org/r/156291 (owner: Chad) [15:37:25] (CR) Hashar: gerrit: allow . 
in Jenkins jobs names (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/156103 (owner: Hashar) [15:38:41] (PS2) Hashar: gerrit: allow . in Jenkins jobs names [operations/puppet] - https://gerrit.wikimedia.org/r/156103 [15:45:15] (PS1) Manybubbles: Fix turning on querying with all fields [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156302 [15:45:39] (CR) Manybubbles: "Apparently the old way didn't work!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156302 (owner: Manybubbles) [15:50:21] PROBLEM - Puppet freshness on labsdb1006 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 11:48:50 UTC [15:55:34] (PS1) Gage: remove duplicated line [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/156305 [15:56:32] (CR) Gage: [C: 2] remove duplicated line [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/156305 (owner: Gage) [16:02:49] (CR) Chad: [C: 2] Fix turning on querying with all fields [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156302 (owner: Manybubbles) [16:02:54] (Merged) jenkins-bot: Fix turning on querying with all fields [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156302 (owner: Manybubbles) [16:02:56] Where did jouncebot go? [16:03:16] * bd808|DEPLOY is going to deploy latest scap code now [16:04:48] bd808|DEPLOY: it's been dead for nearly a week now [16:06:34] It was alive enough to respond to the die command. Hopefully the job grid will restart it [16:07:14] !log Updated scap to 116027f (Make sync-common update l10n cdb files by default) [16:07:19] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s) [16:07:20] Logged the message, Master [16:07:26] Logged the message, Master [16:09:17] mediawiki.org is very borked for me right now [16:09:20] Any objections to a noop full scap so I can test the new code? greg-g Reedy ? 
[16:09:41] .me carefully scheduled this during the releng weekly meeting [16:10:09] robla: works for me? URL? [16:10:17] logged out: http://www.mediawiki.org/wiki/Extension:CentralNotice [16:10:51] ah....hhvm [16:10:54] works for me in incognito mode. Are you running the hhvm cookie? [16:11:20] yeah, forgot I put the hhvm cookie on mw.org as well. [16:11:44] bd808|DEPLOY: Clear for me [16:11:57] sweet. [16:12:09] * bd808|DEPLOY scaps because it's fun [16:12:21] Wheeeeee [16:12:40] !log bd808 Started scap: no-op scap to test scap code update [16:12:46] Logged the message, Master [16:13:19] That's a yo dawg sul message [16:14:44] Looks like some scap scat [16:14:50] Scoobydoop bop wop [16:15:53] Reedy: "sync-common: 99% (ok: 226; fail: 0; left: 1)" [16:15:59] (PS2) Rush: phab.wmfusercontent - add to misc varnish config [operations/puppet] - https://gerrit.wikimedia.org/r/156226 (owner: Dzahn) [16:16:09] Reedy: No more 100% with 1 left [16:16:10] (CR) Rush: [C: 1] "yup" [operations/puppet] - https://gerrit.wikimedia.org/r/156226 (owner: Dzahn) [16:16:39] wheeee [16:18:20] Does it floor instead of round? [16:18:32] marktraceur: yup [16:18:45] gj [16:20:20] !log Rsync sloooow to fenari "16:18:52 fenari INFO - Finished rsync common (duration: 04m 38s)" [16:20:26] Logged the message, Master [16:20:53] In other news, why are we syncing to fenari? [16:21:16] noc? [16:21:45] hhvm cookie? :o [16:22:22] noc [16:23:44] MatmaRex: Probably to be announced more widely soonish. There are a couple hhvm nodes running that you can choose to use for some/all wikis via special cookie. [16:24:16] It needs a bit of dogfooding before we unleash on the general public [16:24:32] ha. 
magic [16:25:26] MatmaRex: You may enjoy the skin hashes shown at https://en.wikipedia.org/wiki/Special:Version now [16:25:37] Does the cookie have the same name as on translatewiki.net [16:25:46] bd808|DEPLOY: :D [16:26:03] <_joe_> bd808|DEPLOY: don't advertise it please [16:26:05] <_joe_> not yet [16:26:12] !log bd808 Finished scap: no-op scap to test scap code update (duration: 13m 31s) [16:26:19] Logged the message, Master [16:26:19] _joe_: agreed [16:26:26] * Nemo_bis shuts up [16:26:27] <_joe_> I just found out a couple of problems with how we manage requests there [16:26:36] <_joe_> and I need to fix those up [16:26:51] <_joe_> or we'll have people using the zend php on the hhvm servers [16:26:56] <_joe_> which is *bad* [16:29:38] !log demon Synchronized wmf-config/InitialiseSettings.php: Again, with feeling (duration: 00m 04s) [16:29:44] Logged the message, Master [16:29:53] (CR) Tim Landscheidt: [C: 1] "That should solve the bug, so in general I would prefer to mimic crontab's behaviour (i. e. either one (readable :-)) file or STDIN) and n" [operations/puppet] - https://gerrit.wikimedia.org/r/156294 (https://bugzilla.wikimedia.org/69355) (owner: coren) [16:31:03] greg-g, Reedy: all done. [16:36:18] !log running removeOldManualUserPages.php (GlobalCssJs) for users who requested it [16:36:25] Logged the message, Master [16:37:19] (PS1) Rush: prod phab settings update [operations/puppet] - https://gerrit.wikimedia.org/r/156308 [16:38:02] (PS1) Gage: merge cdh module change: https://gerrit.wikimedia.org/r/#/c/156305/ [operations/puppet] - https://gerrit.wikimedia.org/r/156309 [16:38:40] (CR) Rush: [C: 2] prod phab settings update [operations/puppet] - https://gerrit.wikimedia.org/r/156308 (owner: Rush) [16:41:57] legoktm: Are you breaking the universe? [16:41:59] :) [16:42:11] no, I don't !log when I do that :) [16:42:34] !log Ran sync-common on osmium to verify that it now rebuilds l10n cache by default (and it does!) 
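The floor-versus-round exchange earlier in the log ("sync-common: 99% (ok: 226; fail: 0; left: 1)", "No more 100% with 1 left") can be illustrated with a small sketch. This is not scap's actual code; `percent_done` is a hypothetical helper, and treating an empty job list as complete is an assumption made only for the example.

```python
def percent_done(done, total):
    """Return completion as a whole percent, rounded down.

    Hypothetical helper for illustration only. Flooring means 100 is
    reported only when every host has finished.
    """
    if total <= 0:
        # Assumption: nothing to do counts as complete.
        return 100
    # Integer floor division avoids floating-point rounding entirely.
    return (100 * done) // total


# 226 of 227 hosts finished, as in the log line quoted above.
floored = percent_done(226, 227)
rounded = round(100 * 226 / 227)
print(floored, rounded)
```

With rounding, the same counts would display 100% while one host is still outstanding, which is the behaviour the chat says was removed.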
[16:42:38] ori: ^ [16:42:39] Logged the message, Master [16:42:57] Right. [16:45:47] <_joe_> bd808: \o_ [16:45:57] <_joe_> |o/ [16:46:12] <_joe_> (I'm a terrible dancer, even in ASCII) [16:46:34] It was an easy fix. Should have happened a long time ago [16:47:35] (PS2) Gage: merge cdh module change: https://gerrit.wikimedia.org/r/#/c/156305/ [operations/puppet] - https://gerrit.wikimedia.org/r/156309 [16:49:06] _joe_: Here's the console output that shows it working -- http://paste.debian.net/117681/ [16:49:44] `sync-common --no-update-l10n` gives the old behavior and is used in the scap and sync-* commands [16:50:02] (CR) Gage: [C: 2] merge cdh module change: https://gerrit.wikimedia.org/r/#/c/156305/ [operations/puppet] - https://gerrit.wikimedia.org/r/156309 (owner: Gage) [16:55:45] bd808: I'd like to schedule a window to try out wikitech deployment. Is it realistic to think that we'll be ready next week? And if so are you willing to block out a couple of hours to bang on it with me? [16:55:59] Um, oh, sorry, are you in the middle of a deploy? We can talk about this later [16:56:28] andrewbogott: I'm all done. And yeah I can give you some time. [16:56:46] * ^d wants to help with this too! [16:57:12] How about Tuesday, 9AM PDT? [16:57:21] Or, um, 10AM if ^d wants to come :) [16:57:28] so in like 3 minutes? [16:57:29] That would be the same time as right now. [16:57:35] <^d> andrewbogott: 9am is fine, I'll be up and that's my wfh day. [16:57:40] ok. [16:57:43] legoktm: next week [16:57:48] :P [16:58:13] andrewbogott: block out 2 hours on the deploy calendar "just in case" [16:58:20] bd808: the only caveat here is that I'm off some the following week. So I won't be around to watch the next time an automatic deploy happens… when would that happen? [16:58:37] I'm out the 5th through the 9th. [16:58:49] <^d> Later that day tuesday, thursday. [16:58:55] andrewbogott: That depends on what group we put wikitech in. [16:59:05] ok... 
[16:59:13] <^d> I'd put it with group1 -- general non-wp pool [16:59:20] ^d: Do you think wikitech would be group0 or group1? [16:59:26] * bd808 thinks group1 [16:59:29] If one deployment happens on Thursday (when I'm still around) then that's pretty safe, I just don't want the FIRST auto-update to happen while I'm away [16:59:30] (PS3) Rush: phab.wmfusercontent - add to misc varnish config [operations/puppet] - https://gerrit.wikimedia.org/r/156226 (owner: Dzahn) [16:59:31] <^d> I think group1. group0 we break a little too often. [16:59:47] (CR) Rush: "hope it's cool I'm merging this, thanks again daniel" [operations/puppet] - https://gerrit.wikimedia.org/r/156226 (owner: Dzahn) [17:00:14] (CR) Rush: [C: 2 V: 2] phab.wmfusercontent - add to misc varnish config [operations/puppet] - https://gerrit.wikimedia.org/r/156226 (owner: Dzahn) [17:00:18] How are we going to bring in the MSW extension? Do we want to make a local gerrit repo that builds it from composer? [17:00:27] SMW [17:01:01] bd808: right now wikitech is running an old branch that doesn't require composer. [17:01:13] I'm pretty sure that issue is decoupled from this one, we can still just import the submodule of that old branch [17:01:18] <^d> bd808: make-wmf-branch could do the work and just check it all in. [17:01:28] <^d> (ugly, but would work) [17:01:41] Well, I don't especially /want/ to upgrade that now, better one thing at a time [17:01:57] We could have a jenkins job for it too. wikidata builds their gerrit repo via jenkins [17:01:57] Am I misunderstanding the problem? 
[17:02:12] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [17:02:22] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [17:02:32] (PS1) Rush: phab security.alternate-file-domain needs http:// [operations/puppet] - https://gerrit.wikimedia.org/r/156317 [17:02:48] (CR) Rush: [C: 2 V: 2] phab security.alternate-file-domain needs http:// [operations/puppet] - https://gerrit.wikimedia.org/r/156317 (owner: Rush) [17:03:20] andrewbogott: The branch you need is in https://git.wikimedia.org/log/mediawiki%2Fextensions%2FSemanticMediaWiki/ somewhere? [17:03:33] lemme look [17:03:40] springle: how urgent is the password thing? Should I get it deployed now or is waiting 6 hours until the next SWAT window ok? [17:04:17] We need a repo in gerrit to use for the git submodule (or we need to make changes to make-wmf-branch) [17:04:28] legoktm: 6 hours will be ok [17:05:03] ok [17:05:05] (CR) QChris: [C: 1] gerrit: allow . in Jenkins jobs names [operations/puppet] - https://gerrit.wikimedia.org/r/156103 (owner: Hashar) [17:05:19] qchris: sorry about that lame regex mistake :-( [17:05:22] !log swapping failed disk labsdb1003 slot 1 [17:05:28] Logged the message, Master [17:06:04] hasharConfCall: No worries. That's what we do code review for ;-) [17:06:37] bd808: The current SMW on wikitech has origin "url = https://gerrit.wikimedia.org/r/p/mediawiki/extensions/SemanticMediaWiki" merge = refs/heads/1.8.x [17:06:54] So I feel like that's… a totally normal submodule, no need to change anything until we want to upgrade? [17:07:03] andrewbogott: Cool. 
We should be able to make that work I think [17:07:17] I predict that that branch is already committed to the standard wmf mw release branch [17:09:07] andrewbogott: Looks like it is -- https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-wmf-branch/default.conf#L181 [17:09:10] Reedy: do you think you could deploy the backports for https://gerrit.wikimedia.org/r/#/q/I3a955526f1bc613d4eb79c5003012eda0a221fa1,n,z when you do your deploy? It's causing https://bugzilla.wikimedia.org/show_bug.cgi?id=70038 [17:09:33] Oh, but not Forms [17:10:07] andrewbogott: Looks like we branch from the master for forms. Line 160 in that file [17:10:41] Ah, you're right. [17:10:45] Ok, that's all set then, cool. [17:11:56] !log demon Synchronized wmf-config/PrivateSettings.php: adjust swift auth url for cirrus (duration: 00m 04s) [17:12:02] Logged the message, Master [17:12:02] bd808: is there a way to do a dry run of a deploy and see what we get? [17:12:11] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [17:12:22] andrewbogott: That makes things much easier than I'd feared. Current branches on tin all have SMW [17:12:47] yep. Getting to modern versions of SMW will be a drag, but we don't have to think about that now. [17:14:02] andrewbogott: We can send the config out to the cluster without attaching an apache vhost to it. Then we can check things using eval.php to make sure the config all looks good. [17:14:19] I didn't say anything crazy there did I ^d ? [17:14:40] !log demon Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 04s) [17:14:45] <^d> Reading scrollback, sec. [17:14:47] Logged the message, Master [17:15:50] ^d: Did you know that sync-* automatically touches InitialiseSettings.php now on each sync? [17:16:05] <^d> Yes, but it wasn't working with something I put in private settings. 
[17:16:11] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [17:16:13] oh. boo [17:16:15] So the weird extensions we need are… DynamicSidebar, LdapAuthentication, OATHAuth, OpenStackManager, SemanticForms, SemanticMediaWiki, SemanticResultFormats, Validator [17:16:18] I think that's the complete list [17:16:22] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:16:27] (PS1) Gage: Hadoop: use Rolling File Appender alongside GELF [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/156321 [17:17:04] <^d> bd808: Oh yeah, duh. We started doing SMW in wmf branches ages ago so that's set. Just branch anything else we need and pin it to whatever makes sense for wikitech. [17:17:14] <^d> And yeah, we can deploy config, poke about with eval [17:17:16] andrewbogott: Those all seem to be setup in the branch [17:17:16] <^d> Make sure things work. [17:17:44] awesome [17:17:55] (CR) Gage: [C: 2] Hadoop: use Rolling File Appender alongside GELF [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/156321 (owner: Gage) [17:18:08] (PS1) Alexandros Kosiaris: WIP: openldap module [operations/puppet] - https://gerrit.wikimedia.org/r/156322 [17:18:43] Uhh.. mw.o is spitting out php when an exception happens? [17:19:04] <^d> bd808: I didn't see any failures on sync. Privatesettings isn't sync'd to terbium though. [17:19:28] weird. [17:20:32] ^d: I'll run sync-common there manually and see if I can spot a problem [17:21:34] ^d: Strangely sync-common did update private/PrivateSettings.php and several other things [17:22:24] (PS1) Gage: Hadoop: use Rolling File Appender alongside GELF [operations/puppet] - https://gerrit.wikimedia.org/r/156323 [17:23:10] I wonder if mw1161 is out of sync somehow? 
[17:23:43] The logs show that as the host that terbium was pulling from as the closest mirror [17:24:54] (CR) Gage: [C: 2] Hadoop: use Rolling File Appender alongside GELF [operations/puppet] - https://gerrit.wikimedia.org/r/156323 (owner: Gage) [17:25:26] !log sync-* not updating terbium properly; sync-common from terbium manually got several config changes; maybe a problem with mw1161.eqiad.wmnet rsync mirror [17:25:33] Logged the message, Master [17:26:11] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [17:26:28] !log /usr/local/apache/common-local out of date on mw1161.eqiad.wmnet; updated via sync-common [17:26:34] Logged the message, Master [17:26:58] (PS1) Yuvipanda: stats: Setup rsync for stat1002 as well [operations/puppet] - https://gerrit.wikimedia.org/r/156324 [17:28:15] bd808, ^d: So, a lot of the private settings that wikitech requires are shared with ldap and OpenStack. So they're already managed by the puppet private repo. I don't much like the idea of having them in two places. How do you feel about me having mw config include a dangling file like '/etc/wikitech/privateconf.php' and then have puppet populate that? [17:29:18] andrewbogott: Can puppet just populate it on tin in /a/common/private and then let it get synced normally? [17:29:42] (CR) Ottomata: [C: 2 V: 2] Add replication lag settings [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/156201 (owner: QChris) [17:29:57] <^d> bd808: terbium looks good now, thx [17:30:01] bd808: Sure.. in that case that file would be sitting in an otherwise-git-managed folder but in .gitignore. Is that better? [17:30:24] andrewbogott: Well then we only need to force a puppet run in one place to update it [17:30:32] true. [17:30:40] Ok, I'll have a go. 
[17:31:06] ^d: Still looking at logs to see if I can figure out why terbium and mw1161 were out of sync with tin [17:31:43] All hosts that synced with mw1161 last are out of date too because of that [17:31:50] (CR) QChris: "recheck" [operations/puppet] - https://gerrit.wikimedia.org/r/156202 (owner: QChris) [17:32:02] And I updated scap a bit ago so... probably my fault [17:36:43] !log mw1010.eqiad.wmnet was out of sync too. I suspect there is something wrong with the fanout update step in scap [17:36:49] Logged the message, Master [17:38:40] ^d: Did you do a sync-dir when you updated those files that didn't end up on terbium? Or a sync-file? [17:38:48] <^d> sync-file [17:39:37] * bd808 looks closely at the sync-file logic [17:43:13] oh man... it's the symlink [17:43:38] ^d: You synced the symlink, not the real file [17:43:48] which is a stupid trap [17:44:23] <^d> Gah! [17:44:28] <^d> Whoops [17:44:34] This is the actual rsync command that ran -- `sudo -u mwdeploy -n -- /usr/bin/rsync --archive --delete-delay --delay-updates --compress --delete --exclude=**/.svn/lock --exclude=**/.git/objects --exclude=**/.git/**/objects --exclude=**/cache/l10n/*.cdb --no-perms --include=/wmf-config --include=/wmf-config/PrivateSettings.php --exclude=* tin.eqiad.wmnet::common /usr/local/apache/common-local` [17:44:52] And /wmf-config/PrivateSettings.php is the symlink, not the file [17:45:23] I don't think this is your fault. It's a tricky bug in the current logic and repo layout [17:47:01] !log bd808 Synchronized private/PrivateSettings.php: Syncing file rather than symlink (duration: 00m 04s) [17:47:08] Logged the message, Master [17:48:02] So what can we do about this? Should scap check to see if the file being synced is a symlink and if so also sync the target? 
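The idea floated in the last message above (when the path being synced is a symlink, sync its target as well) could look roughly like the sketch below. It is not scap's implementation: `paths_to_sync` and `demo` are hypothetical names, and the relative link from `wmf-config/PrivateSettings.php` to `private/PrivateSettings.php` is an assumed layout based on the two paths mentioned in the log.

```python
import os
import tempfile


def paths_to_sync(root, rel_path):
    """Return rel_path plus, if it is a symlink, the path it resolves to.

    Hypothetical helper, not part of scap. All paths are relative to root.
    """
    paths = [rel_path]
    full = os.path.join(root, rel_path)
    if os.path.islink(full):
        real_root = os.path.realpath(root)
        target = os.path.realpath(full)
        # Only add targets that stay inside the tree being synced.
        if os.path.commonpath([real_root, target]) == real_root:
            paths.append(os.path.relpath(target, real_root))
    return paths


def demo():
    """Build a throwaway tree with the assumed layout and query it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "private"))
        os.makedirs(os.path.join(root, "wmf-config"))
        real_file = os.path.join(root, "private", "PrivateSettings.php")
        with open(real_file, "w") as handle:
            handle.write("<?php\n")
        # Assumed layout: the wmf-config entry is a relative symlink.
        os.symlink(
            os.path.join("..", "private", "PrivateSettings.php"),
            os.path.join(root, "wmf-config", "PrivateSettings.php"),
        )
        linked = paths_to_sync(root, "wmf-config/PrivateSettings.php")
        plain = paths_to_sync(root, "private/PrivateSettings.php")
        return linked, plain


if __name__ == "__main__":
    print(demo())
```

A caller could turn each returned path into its own rsync include rule, so the link and the real file travel together. The sketch assumes a platform where creating symlinks needs no special privileges.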
[17:48:12] It's a bug for sure and will trip up others [17:51:21] PROBLEM - Puppet freshness on labsdb1006 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 11:48:50 UTC [17:51:34] <^d> Was someone running an mwgrep or otherwise really complex query? [17:51:38] <^d> Against ES. [17:52:42] ACKNOWLEDGEMENT - Puppet freshness on labsdb1006 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 11:48:50 UTC alexandros kosiaris planet osm sync [17:56:50] ^d: sync-file issue filed as https://bugzilla.wikimedia.org/show_bug.cgi?id=70054 [17:56:56] <^d> ty [17:58:33] who is deploying today? [18:00:47] ^d ottomata manybubbles ggrrr laptop battery died [18:00:51] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1673 MB (2% inode=84%): [18:01:56] ^d ottomata manybubbles anyways I guess hangout is over by now? [18:02:07] godog: yeah - just about [18:02:19] ack [18:03:53] bd808: Is everything in /a/common/private copied? Or do I need to specify someplace that I added a new file? And, will that new file get copied out everywhere? [18:05:21] andrewbogott: a full scap copies everything in /a/common [18:05:32] So you're good to go [18:05:35] ok [18:05:42] http://en.wikipedia.beta.wmflabs.org/ seems down. [18:06:09] chrismcmahon: ^ [18:07:00] oh noes [18:07:02] StevenW: Try again? [18:07:19] StevenW: Apparently varnish had cached a crash [18:07:34] ?action=purge fixed it at least for me [18:07:41] http://en.wikipedia.beta.wmflabs.org still blank [18:07:44] * aude has caching [18:07:48] http://en.wikipedia.beta.wmflabs.org/? good [18:08:14] purge action worked. Hard refresh and incognito browser did not. [18:08:27] purge works [18:08:31] ah cmjohnson, i see you are back (and busy!) 
ping me about elasticsearch nodes when you get some time [18:08:37] I think varnish had the empty page cached [18:08:39] bd808: Reedy ^d who is deployign today [18:08:42] http://en.wikipedia.beta.wmflabs.org/?action=purge restores the page [18:08:43] greg-g: ^ [18:08:54] still empty for me [18:08:55] http://en.wikipedia.beta.wmflabs.org/ [18:08:57] aude: not me [18:09:04] bd808: ok [18:09:15] (03PS1) 10Reedy: Non wikipedias to 1.24wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156328 [18:09:20] :) [18:09:21] <^d> I'm guessing Reedy ^ [18:09:36] we have some config changes to do after and small submodule update [18:10:45] Fuck you chrome [18:10:51] :( [18:10:55] ori: Strangely varnish is returning a 304 status for that page when I hit it with an incognito browser. Like it cached a 304. [18:11:08] Why is gerrit tab dying because flash crashed [18:11:26] it uses flash for the copy to clipboard option [18:11:30] you can turn that off in the options [18:11:32] * aude still wonders about https://test.wikidata.org/wiki/Special:Version and why no logs [18:12:16] need to fix my f key [18:13:17] fffffffixed [18:16:08] <^d> manybubbles: cpu storm quieted down. [18:19:22] !log Failover from analytics1010-eqiad-wmnet to analytics1004-eqiad-wmnet successful [18:19:24] ori: I can't even get into the settings to disable it [18:19:27] Logged the message, Master [18:19:45] aude: i think i figured it out, i'll give you an update in a few [18:21:40] sigh [18:21:45] gerrit is ccompletely unuseable [18:21:50] ty Chrome [18:22:50] ori: \o/ [18:22:55] <^d> try opera [18:22:57] <^d> oh wait [18:23:03] <^d> its worse [18:24:05] And Chromes Report an Issue doesn't seem to do anything [18:24:24] firefox! 
[18:24:50] I've not got my 2 factor key so can't get into my password vault to get my gerrit/wikitech password [18:25:43] ottomata: ping [18:26:23] yoyo [18:27:14] so ja, just talked with elasticsearch folks [18:27:30] we want to do the ssd testing as soon as we can, so we can put in the procurement order for more nodes [18:27:41] PROBLEM - Hadoop NameNode Primary Is Active on analytics1010 is CRITICAL: Hadoop.NameNode.FSNamesystem.tag_HAState CRITICAL: standby [18:27:46] shhh, its ok! [18:28:03] jgage is doing a failover restart of stuff [18:28:10] we should have scheduled maintenance [18:28:11] acking... [18:28:12] Does someone want to do my merging for me? :P [18:28:21] so ja, cmjohnson [18:28:32] oh, actually, CLI? [18:28:35] in order to do testing, rob was suggesting to put the new ssds in an active prod node [18:28:39] one with a good raid controller [18:28:54] but, es folks would rather have more nodes online before we take some 'down' for testing [18:28:59] as we are pushing the limits right now [18:29:02] so, is it psosible to [18:29:07] get 1018 fixed (it just needs a new ssd, i think) [18:29:20] and, get 1019 back online with old ssds in it? [18:29:35] once those are back up, we can take down 1016 and put the new ssds in that for testing [18:29:36] thoughts? [18:29:46] I don't think that will be a problem [18:30:04] ACKNOWLEDGEMENT - Hadoop NameNode Primary Is Active on analytics1010 is CRITICAL: Hadoop.NameNode.FSNamesystem.tag_HAState CRITICAL: standby ottomata This is a controlled failover start, analytics1004 is currently active. [18:30:08] I can get those disk swapped now [18:30:12] well within 30 mins [18:30:14] ok, awesome [18:30:26] hopefully 1018 and 1019 will just come back easily then [18:30:32] ottomata: i pushed some qualify vars patches that touch logging and stuff, would if you find a moment to review. 
Thanks [18:30:36] they should [18:30:43] but ya never know [18:32:06] matanya: i'll do the logging-relay one now, i need to check out some stuff on gadolinium anyway [18:32:18] thanks [18:32:51] (03PS2) 10Ottomata: logging-relay: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/156264 (owner: 10Matanya) [18:33:05] (03PS1) 10Ori.livneh: mediawiki::syslog: restore ':omfile' prefix to output channel filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/156332 [18:33:07] (03CR) 10Ottomata: [C: 032 V: 032] logging-relay: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/156264 (owner: 10Matanya) [18:33:44] (03CR) 10Ori.livneh: [C: 032 V: 032] "Trivial and tested." [operations/puppet] - 10https://gerrit.wikimedia.org/r/156332 (owner: 10Ori.livneh) [18:34:40] (03PS1) 10Gage: Hadoop: Yarn etc: explicitly use RFA + GELF [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/156333 [18:35:18] aude: the rsyslog config file for mediawiki used syntax that was not compatible with the newer version of syslog on trusty [18:35:42] i see [18:36:49] (03CR) 10Gage: [C: 032] Hadoop: Yarn etc: explicitly use RFA + GELF [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/156333 (owner: 10Gage) [18:38:11] (03PS1) 10Gage: Hadoop: Yarn etc: explicitly use RFA + GELF [operations/puppet] - 10https://gerrit.wikimedia.org/r/156335 [18:38:37] (03PS2) 10Gage: Hadoop: Yarn etc: explicitly use RFA + GELF [operations/puppet] - 10https://gerrit.wikimedia.org/r/156335 [18:38:48] Reedy: thanks! [18:39:09] php dippy-bird.php --username=reedy --server=gerrit.wikimedia.org --port=29418 --query=156319 --action=submit --review=+2 --verbose [18:39:12] * Reedy hugs awjr [18:39:33] (03CR) 10Gage: [C: 032] Hadoop: Yarn etc: explicitly use RFA + GELF [operations/puppet] - 10https://gerrit.wikimedia.org/r/156335 (owner: 10Gage) [18:39:47] Reedy: Wat. 
[18:40:12] :D [18:40:17] i totally forgot about that [18:40:22] marktraceurWMF: For when the web interface won't work [18:40:27] Heh. [18:40:40] i think i wrote that right after the gerrit migration [18:40:57] * MaxSem pokes apergos about https://gerrit.wikimedia.org/r/#/c/155080/ [18:40:58] Reedy: also https://github.com/pandemicsyn/fgerrit [18:41:02] i am glad it is doing good things for you Reedy :) [18:41:12] Ugh [18:41:17] Gerrit crashes in FF too [18:41:22] This laptop aint happy [18:42:44] apergos, from the looks of it, the easiest way to dump a table is to create it everywhere. and it's already being added to every new wiki regardless of GeoData status on it - do you think I should just go ahead with it? [18:43:59] (03PS1) 10Andrew Bogott: Add private wikitech settings to /a/common/private on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/156339 [18:44:18] bd808|LUNCH, ^d: ^ [18:44:44] (03CR) 10jenkins-bot: [V: 04-1] Add private wikitech settings to /a/common/private on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/156339 (owner: 10Andrew Bogott) [18:45:25] aude: what's up? [18:46:17] greg-g: the beta feature and other things [18:46:39] (badges, serialization change) [18:47:02] no new code, except tiny amount of css [18:47:06] (03PS2) 10Andrew Bogott: Add private wikitech settings to /a/common/private on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/156339 [18:47:55] (03PS2) 10Ori.livneh: Apache config for testwikidatawiki using mod_proxy_fcgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/147437 (owner: 10Reedy) [18:48:00] aude: sorry, is there a question in there? 
I'm having trouble parsing the scrollback :/ [18:48:07] no question [18:48:15] well, was wondering who was deploying [18:48:21] resolved :) [18:48:24] cool :) [18:48:26] aude: facepalm [18:48:34] aude: test.wikidata never had its hhvm configs merged [18:48:37] omg [18:48:47] it's because they were uploaded by reedy, and we don't trust him [18:48:53] I don't trust me [18:49:09] (03CR) 10Ori.livneh: [C: 032] Apache config for testwikidatawiki using mod_proxy_fcgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/147437 (owner: 10Reedy) [18:49:23] happy to have these :) [18:49:42] of course if we could read Special:Version we'd notice that instantly [18:49:48] but it was borked for PHP5 reasons! [18:49:59] yep [18:50:09] (03CR) 10Dzahn: [C: 031] "so, we are removing x01-x08 x0b,x0c and x0f-x1f, which means the entire control character range [1], besides keeping x00, x09, x0a, x0d an" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: 10Aklapper) [18:50:12] very much could be something we changed, but have no idea what [18:51:10] (03CR) 10Reedy: [C: 032 V: 031] Non wikipedias to 1.24wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156328 (owner: 10Reedy) [18:51:41] RECOVERY - Hadoop NameNode Primary Is Active on analytics1010 is OK: Hadoop.NameNode.FSNamesystem.tag_HAState OKAY: active [18:52:55] !log reedy Synchronized php-1.24wmf17/extensions/MassMessage: (no message) (duration: 00m 16s) [18:53:01] Logged the message, Master [18:53:16] !log reedy Synchronized php-1.24wmf18/extensions/MassMessage: (no message) (duration: 00m 14s) [18:53:22] Logged the message, Master [18:54:54] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf18 [18:55:00] Logged the message, Master [18:55:47] Fatal error: Call to protected method EchoNotificationController::isWhitelistedByUser() from context '' in 
/usr/local/apache/common-local/php-1.24wmf18/extensions/Echo/controller/NotificationController.php on line 276 [18:56:02] (03CR) 10Aude: Enable otherProjectsLinksBeta for Wikibase clients [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156257 (owner: 10Aude) [18:56:31] aude: https://test.wikidata.org/wiki/Special:Version [18:56:51] hurray! [19:00:09] Reedy: if no other issues, then would like to do config changes [19:00:13] which i cna do myself or you can [19:00:15] can* [19:00:56] Looks like that Echo one isn't frequent, but a valid issue [19:01:01] yep [19:01:41] aude: I think you're good to go [19:01:51] ok [19:02:45] (03CR) 10Aude: [C: 032] Enable otherProjectsLinksBeta for Wikibase clients [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156257 (owner: 10Aude) [19:03:03] (03Merged) 10jenkins-bot: Enable otherProjectsLinksBeta for Wikibase clients [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156257 (owner: 10Aude) [19:05:11] !log aude Synchronized wmf-config/Wikibase.php: enable otherprojects sidebar beta feature (duration: 00m 15s) [19:05:17] Logged the message, Master [19:05:35] * aude verifies [19:07:36] * Reedy waits for Jenkins [19:09:49] ok [19:10:26] we have another config change and subomodule update [19:10:29] submodule* [19:10:43] !log reedy Synchronized php-1.24wmf18/extensions/Echo/: (no message) (duration: 00m 14s) [19:10:44] but want few minutes for trying first thing [19:10:50] Logged the message, Master [19:10:51] Should be good to do your next one now [19:19:53] ottomata: an1019...getting same error now [19:20:15] an1018...is not coming back either...okay to try a a reinstall with it? [19:21:35] ja, 1018 and 1019 are not in prod [19:22:25] what error is that? 
[19:22:40] next thing [19:23:00] (03CR) 10Aude: [C: 032] Use new Wikibase serialization format on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156258 (owner: 10Aude) [19:23:07] (03Merged) 10jenkins-bot: Use new Wikibase serialization format on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156258 (owner: 10Aude) [19:23:07] oh jgage [19:23:07] sorry [19:23:10] cmjohnson1: means es [19:23:11] jgage [19:23:13] elastic1018 and 1019 [19:23:14] Acquiring DHCP parameters for interface (90:B1:1C:23:36:1D) ... Succeeded [19:23:14] [iBoot-09]: No Target information. [19:23:19] aha ok :) [19:23:36] aye, yeah [19:23:39] i don't totally love having several hosts named *1018 [19:23:43] hah [19:23:49] yeah, especially when i work on them both [19:24:01] i quite often mean to log into one of the other and end up in the wrong spot [19:24:36] one of these days i'm going to be responding to some very carefully placed rm -rf / ( har har) RT ticket and will run it on the wrong node [19:25:26] (03CR) 10BryanDavis: [C: 031] "Someone should add this new file to the .gitignore in /a/common/private on tin." [operations/puppet] - 10https://gerrit.wikimedia.org/r/156339 (owner: 10Andrew Bogott) [19:25:47] still, this is probably less annoying than enforcing unique hostname numbers facility-wide [19:25:56] !log aude Synchronized wmf-config/Wikibase.php: enable new serialization format for wikidata (duration: 00m 08s) [19:26:02] Logged the message, Master [19:26:02] aye [19:26:04] indeed [19:26:29] * jgage -> lunch [19:26:42] Ahm, did someone change the mailman settings for bounces or something? 
[19:27:10] (03CR) 10Andrew Bogott: [C: 032] Add private wikitech settings to /a/common/private on tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/156339 (owner: 10Andrew Bogott) [19:27:19] Brad just posted something to mediawiki-api-announce, and because I am on mediawiki-api-announce-owner I'm now getting hella bounce notifications [19:27:41] Hmm it seems to have stopped after 18 messages [19:28:09] * aude update submodule [19:30:33] ottomata: I don't think the problem is the server itself...cuz I am getting the same error on elastic1018 and nothing has changed. [19:31:51] bd808: Error: /Stage[main]/Mediawiki::Sync/Package[scap]/ensure: change from 116027fe7150233dca289235bd991df72fa27634 to latest failed: Could not get latest version: undefined method `strip' for nil:NilClass [19:31:54] Did I cause that, somehow? [19:32:33] RoanKattouw: that would be a per-list setting, Bounce processing, so you'd have to ask the other list admins [19:32:52] hm, cmjohnson1, yeah, 1019 did dhcp just fine [19:32:57] when i tried yesterday [19:33:03] it just wouldn't netboot [19:33:04] andrewbogott: Probably not. Looks like something related to ori's trebuchet provider changes [19:33:06] RoanKattouw: you can change the behaviour yourself with the admin pass [19:33:15] (03PS1) 10Aude: add wikibase badge css class names setting [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156352 [19:33:15] yeah it's hitting dhcpd but not get an image [19:33:30] " [19:33:36] Should Mailman send you, the list owner, any bounce messages that failed to be detected by the bounce processor? " and a bunch of related ones [19:33:51] andrewbogott: i can take a look; where are you seeing that? 
[19:33:57] ori: tin [19:33:57] ori: andrewbogott is seeing "Error: /Stage[main]/Mediawiki::Sync/Package[scap]/ensure: change from 116027fe7150233dca289235bd991df72fa27634 to latest failed: Could not get latest version: undefined method `strip' for nil:NilClass" from puppet (on tin I'm assuming) [19:34:22] mutante: OK I'll look at that [19:34:58] RoanKattouw: https://lists.wikimedia.org/mailman/admin/mediawiki-api-announce/bounce [19:35:18] bd808, andrewbogott: on it [19:35:59] (03CR) 10Ottomata: "Hm, ok. I don't think I realized that there was a public-datasets on stat1002." [operations/puppet] - 10https://gerrit.wikimedia.org/r/156324 (owner: 10Yuvipanda) [19:36:24] cmjohnson1: who has helped us with this in the past? rob? [19:36:39] ? [19:37:14] can i run scap? (almost hesitate, it's for one new message on special version) [19:37:17] i worked with robh on this ast wikimania ...we came up with zero..we concluded it was the disk but nothing new has been installed since so I am wondering [19:37:36] bd808: So… I'm also about to add this to PrivateSettings.php. Look right to you? https://dpaste.de/bE19 [19:37:37] aude: I ran it earlier today. Shouldn't be a problem. [19:37:41] (no code review for that :/ ) [19:37:42] (03CR) 10Aude: [C: 032] add wikibase badge css class names setting [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156352 (owner: 10Aude) [19:37:44] ok [19:37:46] (03Merged) 10jenkins-bot: add wikibase badge css class names setting [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156352 (owner: 10Aude) [19:37:51] i'll do the setting change and then scap [19:39:36] !log aude Synchronized wmf-config/Wikibase.php: add Wikibase badges css setting (duration: 00m 10s) [19:39:42] Logged the message, Master [19:39:56] andrewbogott: Looks good to me, but you may want to wait until we are ready to push out the config. RoanKattouw will track you down and beat you if you change tin and don't immediately sync to the cluster. 
[19:39:57] (03PS2) 10Dzahn: torrus: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/156261 (owner: 10Matanya) [19:40:12] bd808: ok [19:40:29] Although in a sense I have already changed tin [19:40:31] mutante: OK I've found and disabled the spammy bounce options. Thanks! [19:41:21] !log aude Started scap: Update new messages for Wikibase [19:41:28] Logged the message, Master [19:41:42] Well, ok then, RoanKattouw, there is a local change to /a/common/private/PrivateSettings.php that isn't checked in. [19:41:54] I didn't do it! [19:41:58] cirrusAuthUrl [19:41:59] RoanKattouw: cool, yw [19:42:18] If you are ready to sync (and when aude is done), `sync-file private/WikitechPrivateSettings.php "new settings for labswiki"`; change PrivateSettings.php; `sync-file private/PrivateSettings.php "Include WikitechPrivateSettings.php for labswiki"` [19:42:32] i'm done when scap is done [19:42:47] possible might be in swat if users report any isues [19:42:50] (03CR) 10Dzahn: [C: 032] torrus: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/156261 (owner: 10Matanya) [19:42:52] ok so far [19:43:54] _joe_: can you put RT8214 in the access request queue? [19:44:12] (03PS1) 10Jkrauska: Add oit to lead and polonium RT8214 [operations/puppet] - 10https://gerrit.wikimedia.org/r/156354 [19:44:21] cmjohnson1: if 1018 is doing it, i doubt it is disk [19:44:30] i reinstalled 1018 as it was with those disks not long ago [19:44:31] (03PS1) 10Andrew Bogott: Added a 'managed by puppet' warning [operations/puppet] - 10https://gerrit.wikimedia.org/r/156355 [19:44:33] a few montsh ago [19:44:35] i suppose [19:45:04] yeah..we did 1019 as well..without an issue. I am wondering if a firewall change was made recently [19:45:09] hmm [19:45:26] mark: any idea? 
[19:45:46] (03PS2) 10Andrew Bogott: Added a 'managed by puppet' warning [operations/puppet] - 10https://gerrit.wikimedia.org/r/156355 [19:46:57] (03CR) 10Andrew Bogott: [C: 032] Added a 'managed by puppet' warning [operations/puppet] - 10https://gerrit.wikimedia.org/r/156355 (owner: 10Andrew Bogott) [19:48:38] !log aude Finished scap: Update new messages for Wikibase (duration: 07m 16s) [19:48:44] Logged the message, Master [19:49:14] got an error for mw1010.eqiad.wmnet::common [19:49:16] bd808: [19:49:20] otherwise ok [19:49:36] sync-common failed: (03PS2) 10Ottomata: Configure wikimetrics' replication lag checking [operations/puppet] - 10https://gerrit.wikimedia.org/r/156202 (owner: 10QChris) [19:49:49] (03CR) 10Ottomata: [C: 032 V: 032] Configure wikimetrics' replication lag checking [operations/puppet] - 10https://gerrit.wikimedia.org/r/156202 (owner: 10QChris) [19:50:27] andrewbogott: i'm puppet-merging a change of yours, s'ok? [19:50:35] ottomata: yes thanks [19:50:40] basicall seems time out [19:50:42] y [19:50:59] (03PS1) 10Andrew Bogott: Added a bunch of semicolons [operations/puppet] - 10https://gerrit.wikimedia.org/r/156409 [19:51:32] aude: That error happened on mw1053.eqiad.wmnet actually. I need to make the error logs make the orgin host more clear [19:51:38] Theres a bug for that somewhere [19:51:52] anything i need to do or can ignor? [19:51:54] e [19:52:06] i don't know what is on mw1010 [19:52:08] (03CR) 10Andrew Bogott: [C: 032] Added a bunch of semicolons [operations/puppet] - 10https://gerrit.wikimedia.org/r/156409 (owner: 10Andrew Bogott) [19:53:48] (03PS1) 10Ori.livneh: trebuchet provider: don't crash when the remote tag is missing [operations/puppet] - 10https://gerrit.wikimedia.org/r/156430 [19:53:52] ^ andrewbogott, bd808 [19:54:07] aude: I'll run sync-common there to fix (mw1053.eqiad.wmnet) [19:54:33] ok [19:56:46] ottomata: semi good news..you will have 1018 back in a few...raid is rebuilding [19:56:55] oh! [19:56:57] ok... 
[19:57:03] doesn't fix our problem but you will have something [19:57:12] wait, rebuilding? like its just rebooting the old system? [19:57:14] i mean, that's fine... [19:57:35] (03CR) 10Ori.livneh: [C: 032] "trivial" [operations/puppet] - 10https://gerrit.wikimedia.org/r/156430 (owner: 10Ori.livneh) [19:57:43] yeah..i wanted to reinstall to see if it worked but since it didn't I added disk [19:57:51] yes rebuilding old system [19:57:54] cool [19:57:59] fiiiine w me [19:58:08] i can reinitialize the elasticsearch parititons if it comes up [19:58:19] the os partition is all it should be rebuilding [19:58:32] yeah..you can follow along cat /proc/mdstat on 1018 or wait for icinga alerts [19:58:34] !log Ran sync-common on mw1053.eqiad.wmnet to recover from failure during last scap [19:58:39] Logged the message, Master [19:59:37] aude: ^ should be fixed now. That host is a jobrunner [19:59:42] ok [19:59:51] and is related to mw1010? [20:00:36] oh, i ssee [20:00:37] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [20:00:39] * aude reads again [20:01:45] andrewbogott: fixed (puppet on tin) [20:01:54] so I see! thanks [20:05:04] (03PS13) 10Andrew Bogott: Add wikitech config. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 [20:06:04] oh really [20:06:18] andrewbogott: wikitech is joining the rest of the wiki? That is awesome! [20:06:30] hashar: …maybe! 
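[editor's note] For the RAID rebuild on elastic1018 that cmjohnson1 suggests following via `cat /proc/mdstat`, a small sketch of pulling the rebuild percentage out of that file's text. The function name `rebuild_progress` is hypothetical, and the sample text in the usage note is illustrative, not output captured from the host.

```python
import re

# Matches the progress part of an mdstat line such as "recovery = 12.6% (...)".
_PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")
_DEVICE = re.compile(r"^(md\d+)\s*:")


def rebuild_progress(mdstat_text):
    """Map each md device with a rebuild in progress to (action, percent)."""
    progress = {}
    device = None
    for line in mdstat_text.splitlines():
        dev = _DEVICE.match(line)
        if dev:
            device = dev.group(1)
            continue
        found = _PROGRESS.search(line)
        if found and device:
            progress[device] = (found.group(1), float(found.group(2)))
    return progress
```

On a live host this could be fed with `open("/proc/mdstat").read()`. Arrays that are not rebuilding have no progress line and are simply absent from the result.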
[20:06:51] I wonder whether we will scap from tin to virt1000 or whatever host wikitech :D [20:08:14] hashar: Reviews welcome [20:08:58] <_joe_> cajoel: I will in a few [20:09:11] _joe_: thx [20:09:57] <_joe_> cajoel: I'm in UTC+2, so after 17:00Z expect me not to be around so often :) [20:10:08] no worries [20:11:22] <_joe_> cajoel: btw, it needs manager approval :) [20:11:33] indeed, added mark to the cc in rt [20:11:52] I already have similar access to sanger [20:12:02] if there's concern about precedance [20:12:13] not sure if that should be mentioned in the rt [20:12:17] <_joe_> no concern [20:12:35] <_joe_> and no need for that [20:13:13] <_joe_> If ma.rk doesn't answer before, I'll ping him tomorrow [20:13:25] MaxSem: add it to the ariel branch [20:14:11] (03PS3) 10Ori.livneh: mediawiki: avoid installing php packages on HAT servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:16:58] Reedy: do your apache config patches include the fix for the unmatched parens in the regex? [20:19:18] (03PS4) 10Ori.livneh: Add mediawiki::packages::php5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:19:24] (03CR) 10Dzahn: [C: 032] exceptionmonitor: qualify var [operations/puppet] - 10https://gerrit.wikimedia.org/r/156262 (owner: 10Matanya) [20:20:39] (03CR) 10Dzahn: "nop" [operations/puppet] - 10https://gerrit.wikimedia.org/r/156261 (owner: 10Matanya) [20:21:17] (03PS1) 10MaxSem: Dump GeoData information [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/156450 (https://bugzilla.wikimedia.org/51225) [20:21:27] apergos, ^ [20:21:52] (03CR) 10Hashar: [C: 04-1] "A bunch of random comments, some can probably be discard since it is late on this side of the earth." 
(036 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:22:06] andrewbogott: did my review, but it is a bit late to take it seriously sorry: ( [20:24:07] (03PS5) 10Ori.livneh: Add mediawiki::packages::php5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:26:18] (03PS6) 10Ori.livneh: Add mediawiki::packages::php5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:26:46] (03CR) 10BryanDavis: "> maybe 'wikitechwiki' would be a better database name for wikitech." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:28:27] (03CR) 10Ori.livneh: [C: 031] "cherry-picked in labs; does the right thing" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:31:38] (03CR) 10Dzahn: Add wikitech config. (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:33:49] (03CR) 10John F. Lewis: Add wikitech config. (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:34:02] (03CR) 10Dzahn: Add wikitech config. (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:38:04] (03CR) 10Dzahn: Add wikitech config. (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [20:38:39] (03CR) 10Dzahn: "checked on fluorine, no change" [operations/puppet] - 10https://gerrit.wikimedia.org/r/156262 (owner: 10Matanya) [20:39:27] (03CR) 10Hashar: "Please note the contint Trusty slaves need all the mediawiki::packages. 
So you can get rid of them solely based on the Ubuntu version :-/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/153772 (owner: 10Giuseppe Lavagetto) [20:42:56] ottomata: where do you set statistics::base::working_path [20:43:30] it's used in classes on stat1003, but what is the actual path [20:46:33] oof, you are asking a question i do not like the answer to [20:46:45] this has to do with the historical /a /srv descrepency [20:46:57] but [20:47:01] see misc/statistics.pp line 37 [20:47:26] mutante: ^ [20:48:07] ottomata: thanks , couldn't just grep it [20:48:39] that helps, just double checking a change doesnt break stuff [20:55:17] jgage: hive is angry on stat1002. log4j stuff is weird... [20:55:19] looking into it... [20:55:22] not sureyet [20:59:28] (03CR) 10Dzahn: [C: 032] mysql-config-research: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/156267 (owner: 10Matanya) [21:02:23] ottomata: hmm, i will look too. [21:02:25] (03CR) 10Dzahn: "checked on stat1003 - no change" [operations/puppet] - 10https://gerrit.wikimedia.org/r/156267 (owner: 10Matanya) [21:02:41] jgage, looks like a couple of problems [21:03:13] the $hadoop.log.dir/hadoop.log.file does not exist or is not writeable by regular users [21:03:19] although, i'm not entirely sure what that is for [21:03:22] hadoop.log.dir=. [21:03:22] hadoop.log.file=hadoop.log [21:03:23] also [21:03:44] hrm [21:04:04] !log Updating our Jenkins Job Builder fork 0268581..e5c0c61 . Will let us define variables in 'default' section and override them when invoking a job template ( https://review.openstack.org/#/c/100020/ ) [21:04:06] "No default-logstash-fields.properties resource present, using defaults" in present in any hive or hadoop command [21:04:10] Logged the message, Master [21:04:14] e.g. 
[21:04:15] hadoop version [21:04:21] prints that out at as the first line [21:04:27] yeah i saw that when i ran the failover, i don't get it [21:04:36] its causing weirdness, dunno [21:04:43] try just doing [21:04:44] i will dig around. for a quick fix, we can set gelf_logging_enabled=false if you want [21:04:45] hdfs dfs -ls / [21:05:02] ja guess it won't hurt, now that the cluster has been restarted, eh? [21:05:12] yeah [21:05:20] i'll turn it off until i can track this down [21:05:20] ok, gonna do that [21:05:27] heh or you can [21:07:07] (03PS1) 10Ottomata: Disable gelf logging temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/156457 [21:07:22] (03CR) 10Ottomata: [C: 032 V: 032] Disable gelf logging temporarily [operations/puppet] - 10https://gerrit.wikimedia.org/r/156457 (owner: 10Ottomata) [21:07:37] thanks [21:07:41] yeah, this is true on any node, [21:07:46] you can test on a worker node if you like [21:08:30] if you comment out your earlier hadoop-env changes, stuff works [21:08:48] but still prints out that default-logstash-fields.properties error [21:09:31] hmm [21:10:30] (03CR) 10Andrew Bogott: "I don't love the name 'labswiki' but it is the name of the db on the current wikitech; I'm reluctant to rename it as part of this change." (0311 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:11:56] (03CR) 10Dzahn: "renaming wikis (db's) has been a problem mostly because of external storage.. dba once said it's possible but needs scripting" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:15:01] (03CR) 10Chad: "It'd be easier to rename the beta one than the production one. 
That and labswiki is a terrible dbname for the one in beta and could use re" (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:17:16] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [21:17:56] (03CR) 10Chad: Add wikitech config. (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:18:16] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [21:24:22] (03PS1) 10Aude: Revert wikibase badges css setting [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/156458 [21:24:30] MaxSem: is the name of the table geo_tags? If so I'll fix the entry (it's wrong) and put it around tomorrow, I have to deploy it for it to take effect [21:25:10] I'm gonna sleep now but I'll read your answer in the scrollback [21:28:19] apergos, yes [21:28:38] (03CR) 10Dzahn: "true, renaming beta would be easier. some inline comments about the config itself; protocol-relative links? upload logo to commons where o" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:29:28] (03CR) 10Andrew Bogott: "Daniel, I've tried to address most of your comments inline." (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:34:45] (03CR) 10Dzahn: Add wikitech config. (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:39:11] (03CR) 10Dzahn: "Andrew, thanks, yea, replied inline. 
you guys are right that protocol-relative doesn't do things when we already enforce HTTPS and even en" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 (owner: 10Andrew Bogott) [21:45:22] (03CR) 10Dzahn: "i'm also a bit confused about the LDAP module - it doesn't have an init.pp and there is a ./role/ directory within the module. let's try t" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [21:49:59] (03PS14) 10Andrew Bogott: Add wikitech config. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/155789 [21:51:52] (03CR) 10Andrew Bogott: [C: 031] "Heh, I guess none of these files are exactly human-readable... gerrit times out trying to show the diff on the first one." [operations/software/swift-ring] - 10https://gerrit.wikimedia.org/r/156252 (owner: 10Filippo Giunchedi) [21:53:16] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:54:17] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [22:07:16] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:09:17] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 703 seconds ago with 0 failures [22:15:16] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:15:17] PROBLEM - puppet last run on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:16:16] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed [22:25:17] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:26:27] RECOVERY - RAID on labsdb1003 is OK: OK: optimal, 1 logical, 2 physical [22:26:58] (03CR) 10Dzahn: "matanya: silver only has LDAP client tools, basically that's where ops go to make changes to LDAP groups (it installs scripts such as modi" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [22:29:56] <^d> bah, elastic boxes are still precise. 
[22:31:30] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[22:35:04] (CR) Dzahn: "so actually using role::server classes:" [operations/puppet] - https://gerrit.wikimedia.org/r/117698 (owner: Matanya)
[22:45:10] (CR) Dzahn: [C: -1] "all that being said, this class does not seem to be used indeed, while i do see iptables rules on virt0 and virt1000 allowing ldap(s) from" [operations/puppet] - https://gerrit.wikimedia.org/r/117698 (owner: Matanya)
[22:47:15] (PS1) MaxSem: Remove old stuff [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156468
[22:54:04] (CR) Dzahn: "also see https://rt.wikimedia.org/Ticket/Display.html?id=4346 :p" [operations/puppet] - https://gerrit.wikimedia.org/r/117698 (owner: Matanya)
[22:54:08] (PS1) Yuvipanda: quarry: Add python-unicodecsv module [operations/puppet] - https://gerrit.wikimedia.org/r/156470
[22:54:25] andrewbogott: can you merge ^? trivial
[22:55:18] (CR) Kaldari: [C: 2] Remove old stuff [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156468 (owner: MaxSem)
[22:55:22] (Merged) jenkins-bot: Remove old stuff [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156468 (owner: MaxSem)
[22:55:49] eh, I'll need to SWAT it:)
[22:59:18] swat!
[22:59:19] !log maxsem Synchronized wmf-config/: SWAT: 204c867507fa5daf2bb06d93311ee0cc400e1dd8 (duration: 00m 10s)
[22:59:44] can https://gerrit.wikimedia.org/r/#/c/156458/ be deployed? or else can i do it?
[23:00:19] (CR) Dzahn: [C: 2] "support unicode :) https://pypi.python.org/pypi/unicodecsv" [operations/puppet] - https://gerrit.wikimedia.org/r/156470 (owner: Yuvipanda)
[23:01:14] aude, I can - meanwhile, please add it to the list:)
[23:01:33] done
[23:01:46] * aude was too eager today
[23:01:55] (CR) MaxSem: [C: 2] Revert wikibase badges css setting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156458 (owner: Aude)
[23:02:01] (Merged) jenkins-bot: Revert wikibase badges css setting [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/156458 (owner: Aude)
[23:02:07] thanks
[23:03:04] !log maxsem Synchronized wmf-config/Wikibase.php: 66d40797251e7eecfd78f2710c881c81169940ef (duration: 00m 05s)
[23:03:09] aude, ^
[23:03:11] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/254/change/156213/diff/ytterbium.wikimedia.org.diff.formatted" [operations/puppet] - https://gerrit.wikimedia.org/r/156213 (owner: Chad)
[23:03:28] looks ok
[23:03:33] thanks
[23:05:00] (PS4) Dzahn: gerrit - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153849
[23:05:24] mutante_: ty
[23:05:26] PROBLEM - Disk space on elastic1009 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20169 MB (3% inode=99%):
[23:08:55] <^d> #gettingold
[23:11:47] <^d> Hmm, 18 got repooled.
[23:12:02] (CR) Dzahn: [C: -1] "eh wait, in the past when you edited Apache config since Tim wrote the generator, you had to edit redirects.dat, then generate redirects.c" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[23:12:17] <^d> Not in pybal tho.
[23:13:18] (CR) Jeremyb: "you also had to be a "trusted" user. or it would just skip the job entirely (and never comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[23:13:42] <^d> Hmm, 18 is spinning pretty hard on system though, not doing anything else interesting.
[23:14:13] * jeremyb pokes mutante_
[23:14:48] <^d> Hmm.
[23:14:49] <^d> Aug 26 23:03:01 elastic1018 mdadm[1402]: NewArray event detected on md device /dev/md2
[23:15:09] jeremyb: new bug to add checking from apache-config repo to ops/puppet repo?
[23:15:49] also, now it turned from puppet change to "also apache deployment"
[23:17:16] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:17:25] these are the disadvantages of moving it all into the mediawiki module when the redirects are not even related to mediawiki
[23:17:26] <^d> Hmm, looks like /var/lib/elasticsearch isn't mounted? maybe fstab is wrong because new disk?
[23:18:21] mutante, you fixed your client :)
[23:18:42] ^d: #8091: elastic1018 - broken SSD
[23:18:53] <^d> Yes, has new ssd and is online again.
[23:18:59] wait, when did maxsem become OuKB ? :)
[23:19:01] <^d> (I saw it joined the ES cluster)
[23:19:05] no update about that on ticket
[23:19:07] ^d, can't always trust /etc/mtab; check cat /proc/mounts
[23:19:12] besides the asking about the status
[23:19:16] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[23:19:19] greg-g, welcome back
[23:19:22] <^d> jeremyb: I was.
[23:19:26] jeremyb: thanks
[23:19:29] ^d, k
[23:19:35] jeremyb: what did i fix?
[23:19:53] greg-g, there was an impersonator...
[23:20:01] mutante, logged in
[23:20:04] jeremyb: of me or max?
[23:20:12] greg-g, you
[23:20:24] mutante, but still missing channel(s)
[23:20:30] jeremyb: did you kill them?
[23:20:39] jeremyb: uhhh.. indeed 15:43 -!- Irssi: Removed reconnection to server 2 port 6667
[23:20:41] <^d> cmjohnson1: Was elastic1018 supposed to come up again?
[23:20:44] greg-g, no!
[23:23:40] ^d checking
[23:25:55] ^d raid is fixed but needs to be added to pool
[23:26:09] <^d> Mounts are wrong too
[23:26:58] <^d> syslog is filling up with stuff from mdadm, /proc/mounts doesn't show the /var/log/elasticsearch mount.
[23:27:10] <^d> I think /etc/fstab is pointing to the old drive uuid.
[23:27:56] was that mounted on /dev/sdb1 ?
[23:28:36] I copied /dev/sda to /dev/sdb
[23:29:19] <^d> /dev/md2: UUID="9aa4632b-bf42-4801-9813-f484cd5813b3" TYPE="ext4"
[23:29:22] <^d> /dev/md2 /var/lib/elasticsearch ext4 rw,noatime,user_xattr,barrier=1,stripe=256,data=ordered 0 0
[23:29:25] <^d> From elastic1001
[23:30:17] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:30:19] <^d> md2, from what i can tell.
[23:33:55] ^d we can't reinstall and it appears that mount point was added then...do you wanna manually add it?
[23:34:12] <^d> Yeah, we can do that.
[23:35:26] PROBLEM - Disk space on elastic1009 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19237 MB (3% inode=99%):
[23:36:17] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:37:43] (CR) Jeremyb: [C: -1] public_html directory service, see RT #6862 (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[23:37:55] ^ that one is also disabled in pybal but active in icinga ?
[23:38:00] mw1053
[23:38:16] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[23:39:35] (CR) Dzahn: "filed Bug 70068 for not having jenkins check Apache config anymore (and this could have broken things)" [operations/puppet] - https://gerrit.wikimedia.org/r/149890 (owner: ArielGlenn)
[23:40:50] <^d> cmjohnson1: Something's wrong with raid still I think. the mdadm spam is key, methinks.
[23:43:21] (CR) Dzahn: [C: 1] "has manager approval and ticket now" [operations/puppet] - https://gerrit.wikimedia.org/r/155137 (owner: Yuvipanda)
[23:43:26] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:43:59] <^d> mdadm.conf probably needs fixing with the new drive uuid.
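(Annotation, not part of the log.) The check being discussed above, whether /etc/fstab still names a UUID that no current device carries and whether the mount point actually shows up in /proc/mounts, can be sketched roughly as follows. This is only an illustrative sketch: the function names, the sample fstab text and the all-zero "old" UUID are hypothetical, and the actual contents of elastic1018's fstab are not shown in the log.

```python
def stale_fstab_uuids(fstab_text, present_uuids):
    """Return (uuid, mount_point) pairs whose UUID= spec matches no present device."""
    stale = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a usable fstab entry
        spec, mount_point = fields[0], fields[1]
        if spec.startswith("UUID="):
            uuid = spec[len("UUID="):].strip('"')
            if uuid not in present_uuids:
                stale.append((uuid, mount_point))
    return stale


def is_mounted(proc_mounts_text, mount_point):
    """True if mount_point appears as the second field of any /proc/mounts line."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Hypothetical sample data; on a real host these would come from
# /etc/fstab, the output of blkid, and /proc/mounts.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=00000000-0000-0000-0000-000000000000 /var/lib/elasticsearch ext4 noatime 0 2
"""
SAMPLE_PRESENT = {"9aa4632b-bf42-4801-9813-f484cd5813b3"}
SAMPLE_MOUNTS = "/dev/md0 / ext4 rw 0 0\n"

if __name__ == "__main__":
    print(stale_fstab_uuids(SAMPLE_FSTAB, SAMPLE_PRESENT))
    print(is_mounted(SAMPLE_MOUNTS, "/var/lib/elasticsearch"))
```

Reading /proc/mounts rather than /etc/mtab follows the advice given in the channel at 23:19:07.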
[23:46:16] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[23:46:23] (CR) Dzahn: "YuviPanda: check out https://forge.puppetlabs.com/puppetlabs/stdlib#ensure_packages" [operations/puppet] - https://gerrit.wikimedia.org/r/153600 (owner: Yuvipanda)
[23:47:58] mutante, i think nothing would have broken. you don't get any change in redirects.conf if redirects.dat is syntax error. and anyway, it only would have updated redirects.conf if someone manually ran it and so deploy would be a noop even if the .dat change was right
[23:48:16] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[23:48:35] (CR) Dzahn: [C: 1] Fix-ups for I3d002968c [operations/puppet] - https://gerrit.wikimedia.org/r/154027 (owner: Ori.livneh)
[23:49:26] jeremyb: people would not run the generator command on deploy though
[23:49:37] before you would upload both files to gerrit
[23:50:02] so one file would have changed, the other becoming out of sync
[23:50:32] but even if.. we once had tests, now we have no tests, right
[23:51:03] mutante, right
[23:51:52] mutante, just saying it couldn't take down the cluster as currently submitted to gerrit
[23:52:24] if that's the case, yea, good
[23:54:42] i don't understand the on caching...
[23:54:46] the no caching*
[23:54:56] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Tue 26 Aug 2014 21:53:54 UTC
[23:54:58] shouldn't it be mostly static?
[23:55:04] ?
[23:55:22] mutante, varnish part of same changeset
[23:55:49] jeremyb: oh, i think apergos says that because there could be large files
[23:55:54] and per "no caching of large files"
[23:56:04] so then don't cache large files
[23:56:05] :)
[23:56:15] heh, yea
[23:56:26] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:57:13] (PS1) Yuvipanda: quarry: Add python-translitcodec package [operations/puppet] - https://gerrit.wikimedia.org/r/156483
[23:57:16] mutante: ^ as well?
[23:57:37] mutante: also I don't know if ensure_packages works well when the conflicting other package is defined as a package{} rather than using ensure_package
[23:58:16] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[23:58:56] YuviPanda|zzz: as far as i understand it should work as long as the package{} comes first.. can change both?
[23:59:15] mutante: the other is a big list in dev_environ, I'll be happy to change it :)
[23:59:25] mutante: and 'comes first' makes it ordering dependent :(