[00:03:45] (03PS8) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [00:05:58] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [00:14:41] (03PS9) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [00:15:18] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [00:16:54] ori: getting 503s trying to save a big page on beta labs. [00:16:59] I personally blame you. [00:17:04] Please come repent. [00:17:32] StevenW: I could hook you up with a 500 or a 404 if you like [00:17:47] Oh please do. [00:17:50] I'd love that. [00:17:55] it's probably just timing out [00:18:14] beta has gotten remarkably prod-like, no? [00:21:15] heh [00:21:17] yes [01:10:28] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [01:23:43] (03PS10) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:26:14] (03PS11) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:28:25] (03PS12) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:32:28] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [01:34:19] (03PS13) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:47:39] !log ip addr del for cp4017's ip6_mapped addr on cp4018 (no idea why it was there...) 
[01:47:47] Logged the message, Master [01:48:28] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [01:49:34] (03PS4) 10BBlack: Add explicit mmap addrs for varnish persistent storage [operations/puppet] - 10https://gerrit.wikimedia.org/r/149068 [01:51:10] (03CR) 10BBlack: [C: 032] "All prod varnishes and all betalabs varnishes upgraded to 3.0.5-plus~x-wm7, moving forward with this." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149068 (owner: 10BBlack) [01:52:09] (03PS14) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:52:38] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [01:55:11] (03PS15) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [01:59:44] (03PS16) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:00:13] (03PS17) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:15:38] (03PS18) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:23:50] (03PS19) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:27:48] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:31:43] (03PS20) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:31:48] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[02:36:51] (03PS1) 10Chad: Remove searchidx dsh group [operations/puppet] - 10https://gerrit.wikimedia.org/r/150474 [02:45:44] (03PS21) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [02:51:17] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-30 02:50:14+00:00 [02:51:25] Logged the message, Master [03:39:32] !log LocalisationUpdate completed (1.24wmf15) at 2014-07-30 03:38:28+00:00 [03:39:38] Logged the message, Master [03:45:23] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [04:00:23] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [04:21:58] (03PS1) 10BryanDavis: logstash: Don't pin package version [operations/puppet] - 10https://gerrit.wikimedia.org/r/150480 [04:22:22] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [04:29:02] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 30 04:27:56 UTC 2014 (duration 27m 55s) [04:29:07] Logged the message, Master [04:41:22] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:31:44] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:02] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:41:26] !log labsdb1001 work in progress; it may misbehave. 
see labs-l for updates [06:41:33] Logged the message, Master [06:46:52] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:52:03] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [07:11:26] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [07:30:26] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [07:35:25] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I think it would be better to provide directly a new file, as this envvars file doesn't do nothing today - on trusty hosts it defines HHVM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147514 (owner: 10Ori.livneh) [07:36:37] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0133333333333 [08:21:49] neon unhappy? [08:24:48] (03PS2) 10Giuseppe Lavagetto: mediawiki::web: get rid of envvars.appserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/147514 (owner: 10Ori.livneh) [08:24:51] am on a dogy 3g connection, but can't get icinga web or neon ssh [08:24:52] <_joe_> springle: how so? [08:24:54] _joe_: ^ [08:25:01] same for you? [08:25:04] or just me [08:25:13] <_joe_> lemme check [08:25:28] <_joe_> I guess you're right [08:25:58] <_joe_> it pings [08:27:28] <_joe_> getting into console now [08:28:11] <_joe_> console is dark, how surprising [08:28:26] heh [08:30:04] <_joe_> !log powercycling neon, doesn't respond to requests, ssh hangs, console dark [08:30:10] Logged the message, Master [08:43:45] (03PS1) 10Giuseppe Lavagetto: mediawiki: get rid of envvars files in puppet. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/150492 [08:47:19] (03PS5) 10Giuseppe Lavagetto: mediawiki: use mods-enabled, prepare for HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 [08:50:59] (03PS2) 10Matanya: spamassassin: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/149993 [08:58:28] <_joe_> !log stopping puppet on the appservers, in preparation for releasing change 148099 [08:58:32] Logged the message, Master [08:59:03] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: use mods-enabled, prepare for HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 (owner: 10Giuseppe Lavagetto) [09:01:56] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Epic puppet fail [09:03:22] <_joe_> I know [09:03:26] <_joe_> :) [09:07:38] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667 [09:08:09] (03PS1) 10Giuseppe Lavagetto: apache: fix duplicate declaration [operations/puppet] - 10https://gerrit.wikimedia.org/r/150496 [09:08:24] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] apache: fix duplicate declaration [operations/puppet] - 10https://gerrit.wikimedia.org/r/150496 (owner: 10Giuseppe Lavagetto) [09:12:21] (03PS1) 10Giuseppe Lavagetto: apache: fix another merge fail [operations/puppet] - 10https://gerrit.wikimedia.org/r/150498 [09:12:36] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] apache: fix another merge fail [operations/puppet] - 10https://gerrit.wikimedia.org/r/150498 (owner: 10Giuseppe Lavagetto) [09:12:57] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Wed Jul 30 09:12:53 UTC 2014 [09:17:57] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [09:20:38] hello [09:20:55] has anyone changed something in labs / LDAP / ssh / puppet config around 6:35am UTC this morning ? 
[09:21:07] I have some ssh permission denied on beta cluster [09:24:36] <_joe_> hashar oh that is why I cannot ssh on the puppet compiler probably as well [09:24:54] <_joe_> hashar: not that I know of, but I'm involved in something else now [09:25:01] !log set weight for ms-be1014 and ms-be1015 to 2300 [09:25:07] Logged the message, Master [09:25:45] /etc/nslcd.conf has been changed on one of the host [09:29:47] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix issues with moving to mods-enabled [operations/puppet] - 10https://gerrit.wikimedia.org/r/150499 [09:30:36] <_joe_> hashar: revert it [09:30:45] <_joe_> hashar: labs is basically broken by now [09:30:52] I am not sure how it got changed though [09:30:57] nothing suspicious in puppet [09:31:05] <_joe_> see if someone is going to be able to help you [09:31:17] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: fix issues with moving to mods-enabled [operations/puppet] - 10https://gerrit.wikimedia.org/r/150499 (owner: 10Giuseppe Lavagetto) [09:38:39] (03PS1) 10Hashar: deployment: wrap packages with ensure_packages() [operations/puppet] - 10https://gerrit.wikimedia.org/r/150501 [09:40:32] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/150503 [09:40:50] springle: hey! around? got a small question [09:41:51] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: fix typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/150503 (owner: 10Giuseppe Lavagetto) [09:41:53] springle: specifically, if I want to self-govern long running queries from my tool on labsdb, what kinda timeout should I put on it? 
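Editorial sketch for the timeout question above: one way a tool could self-govern its long-running queries is a client-side watchdog that cancels a query once it passes a hard limit. This is a minimal sketch, not the tool's actual code. `run_with_deadline`, `run` and `cancel` are hypothetical names that do not appear in the log. In a real tool, `cancel` might for example issue a kill for the query from a second database connection, and the limit is whatever the tool author and the DBA agree on.

```python
import threading


def run_with_deadline(run, cancel, limit_seconds):
    """Call run(); if it is still running after limit_seconds, call cancel().

    `run` and `cancel` are hypothetical callables supplied by the tool.
    Returns (result, timed_out). If the deadline fires just as run()
    finishes, cancel() may still be called once, so it should be harmless
    to call on an already-finished query.
    """
    timed_out = threading.Event()

    def on_deadline():
        # Runs on the timer thread once the limit is exceeded.
        timed_out.set()
        cancel()

    timer = threading.Timer(limit_seconds, on_deadline)
    timer.start()
    try:
        result = run()
    finally:
        # Stop the watchdog if the query finished in time.
        timer.cancel()
    return result, timed_out.is_set()
```

The watchdog only asks for cancellation; it relies on `run` returning (or raising) once `cancel` has taken effect, so the caller still decides how to report a cancelled query.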
[09:44:00] <_joe_> how dumb am I [09:44:02] (03PS1) 10Giuseppe Lavagetto: mediawiki: fix typo (again) [operations/puppet] - 10https://gerrit.wikimedia.org/r/150504 [09:44:21] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: fix typo (again) [operations/puppet] - 10https://gerrit.wikimedia.org/r/150504 (owner: 10Giuseppe Lavagetto) [09:44:31] (03CR) 10Giuseppe Lavagetto: [V: 032] mediawiki: fix typo (again) [operations/puppet] - 10https://gerrit.wikimedia.org/r/150504 (owner: 10Giuseppe Lavagetto) [09:45:25] (03PS2) 10Hashar: deployment: wrap packages with ensure_packages() [operations/puppet] - 10https://gerrit.wikimedia.org/r/150501 [09:47:13] (03CR) 10Hashar: "ps2 drops git-core package" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150501 (owner: 10Hashar) [09:47:49] (03CR) 10Filippo Giunchedi: [C: 031] logstash: Don't pin package version [operations/puppet] - 10https://gerrit.wikimedia.org/r/150480 (owner: 10BryanDavis) [09:52:48] (03PS1) 10Matanya: swift:lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/150505 [09:53:14] (03Abandoned) 10Matanya: swift: lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/140654 (owner: 10Matanya) [09:59:35] (03PS22) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [10:00:14] (03CR) 10jenkins-bot: [V: 04-1] [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 (owner: 10Yuvipanda) [10:01:04] (03PS23) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [10:08:09] (03PS1) 10Giuseppe Lavagetto: mediawiki fix service name [operations/puppet] - 10https://gerrit.wikimedia.org/r/150510 [10:09:51] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki fix service name [operations/puppet] - 10https://gerrit.wikimedia.org/r/150510 (owner: 10Giuseppe Lavagetto) [10:19:31] godog: https://gerrit.wikimedia.org/r/150505 
[10:21:39] (03PS24) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [10:23:30] (03CR) 10Hoo man: [C: 04-1] "No community consensus... and no, we can't fully replace this extension with Wikibase functionality just yet :S" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150301 (https://bugzilla.wikimedia.org/68815) (owner: 10Reedy) [10:26:25] (03PS25) 10Yuvipanda: [Heavy WIP] Role + Module for Quarry Labs tool [operations/puppet] - 10https://gerrit.wikimedia.org/r/150425 [10:38:26] YuviPanda: no simple answer. shorter transactions always better. how long are you talking? [10:38:35] springle: 10m? [10:38:43] as a hard limit [10:38:48] most queries won't go there, of course [10:39:23] if it's a 10m full table scan or massive bulk insert.. that's a problem. a 10m slow select no big deal [10:40:31] springle: right, only selects allowed. [10:41:23] matanya: ack, looking [10:43:37] (03PS1) 10Giuseppe Lavagetto: mediawiki: syntax typo fix, remove unused modules [operations/puppet] - 10https://gerrit.wikimedia.org/r/150516 [10:44:08] (03CR) 10Filippo Giunchedi: [C: 031] "will merge tomorrow" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150505 (owner: 10Matanya) [10:45:01] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: syntax typo fix, remove unused modules [operations/puppet] - 10https://gerrit.wikimedia.org/r/150516 (owner: 10Giuseppe Lavagetto) [10:45:24] (03CR) 10Filippo Giunchedi: [C: 031] deployment: wrap packages with ensure_packages() [operations/puppet] - 10https://gerrit.wikimedia.org/r/150501 (owner: 10Hashar) [10:45:52] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] logstash: Don't pin package version [operations/puppet] - 10https://gerrit.wikimedia.org/r/150480 (owner: 10BryanDavis) [10:48:19] (03PS1) 10Giuseppe Lavagetto: fix duplicate inclusion of resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/150518 [10:48:24] (03CR) 10Filippo Giunchedi: [C: 
031] mediawiki::web: get rid of envvars.appserver [operations/puppet] - 10https://gerrit.wikimedia.org/r/147514 (owner: 10Ori.livneh) [10:48:32] (03PS2) 10Giuseppe Lavagetto: fix duplicate inclusion of resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/150518 [10:48:39] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] fix duplicate inclusion of resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/150518 (owner: 10Giuseppe Lavagetto) [10:53:19] (03PS1) 10Giuseppe Lavagetto: mediawiki: re-enabling mod_filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/150520 [10:53:43] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: re-enabling mod_filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/150520 (owner: 10Giuseppe Lavagetto) [10:54:35] (03PS1) 10Matanya: access : add cscott to pdf admin [operations/puppet] - 10https://gerrit.wikimedia.org/r/150521 [10:56:13] (03PS2) 10Matanya: access : add cscott to pdf admin [operations/puppet] - 10https://gerrit.wikimedia.org/r/150521 [10:58:46] <_joe_> !log re-enabling puppet on mw1018, testwiki upgraded to the new config and looks fine [10:58:51] Logged the message, Master [11:09:46] (03PS1) 10Giuseppe Lavagetto: mediawiki: do not load mod_filter twice [operations/puppet] - 10https://gerrit.wikimedia.org/r/150526 [11:11:10] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: do not load mod_filter twice [operations/puppet] - 10https://gerrit.wikimedia.org/r/150526 (owner: 10Giuseppe Lavagetto) [11:15:51] springle: also, which database should I use for the tool, atleast during Wikimania? currently hitting s5 [11:15:53] <_joe_> !log re-enabling puppet on mw1019, last bunch of tests, then re-enabling globally [11:15:58] Logged the message, Master [11:16:57] (tool isn't publicized yet, so no queries) [11:17:40] <_joe_> !log enabling puppet on all mw* servers [11:17:46] Logged the message, Master [11:18:08] YuviPanda: 25 is fine. 
s1 will also upgrade tomorrow [11:18:11] s5* [11:18:25] springle: cool. In the future, should I round robin it? when everything has everything... [11:20:35] YuviPanda: Don't think we can answer that yet. It might turn out that it's better to isolate the damage to a specific backend ;) I'll wait until we see how well your tool can defend against crazy queries! [11:20:46] springle: :D ok. [11:21:04] springle: some real world usage might start tomorrow, planning on announcing to the research lists [11:21:27] <_joe_> coordinate with ops so that we're here for the outage :P [11:21:35] _joe_: :D [11:21:42] <_joe_> joking of course [11:21:53] YuviPanda: Friday would be nicer. Tomorrow we're already messing with s1... [11:22:12] oh wait.. friday. how is that ever better [11:22:17] springle: sure. I'll get a bunch of folks to closet test it tomorrow, and then mail out on friday [11:22:18] haha ;) [11:22:26] ok [11:22:46] springle: worst case, I can just turn the tool off temporarily, or limit the query runners to not run more than a small number of queries concurrently [11:23:07] springle: this isn't on tools - it's on the quarry project on labs, and I've built in a lot of configurability to ensure we can tune as necessary [11:23:27] great [11:26:18] (CR) Filippo Giunchedi: hhvm: lintian fixes (2 comments) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 (owner: Giuseppe Lavagetto) [11:29:17] (CR) Giuseppe Lavagetto: hhvm: lintian fixes (2 comments) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 (owner: Giuseppe Lavagetto) [11:30:53] _joe_: pretty nice speed-up in your benchmarks [11:31:22] _joe_: is the user time for 10 parses, or a single one? 
[11:31:39] <_joe_> gwicke: for the 10 of them [11:31:55] ah, makes sense [11:32:10] <_joe_> I got it wrapping the obama test with 'time -v' [11:32:29] *nod* [11:32:38] I often use /usr/bin/time instead of the shell builtin [11:32:45] reports memory too [11:33:11] on which machine did you run this? [11:33:14] <_joe_> oh yes I used the binary [11:33:18] would be nice to compare this to parsoid [11:33:22] <_joe_> gwicke: mw1053 is the hhvm jobrunner [11:33:34] <_joe_> and mw1017 is testwiki, so php [11:33:45] <_joe_> I stopped the JR on mw1053 before testing of course [11:33:52] k [11:34:07] on my laptop Obama takes 10s with Parsoid [11:34:11] so pretty much the same [11:34:27] <_joe_> gwicke: and our code is not optimized for HHVM probably [11:35:02] <_joe_> I expect our speeds to improve with time when more code gets converted and we can for instance run in RepoAuthoritative mode [11:35:25] yeah, PHP should really be much faster as it's not doing any DOM building etc [11:35:52] <_joe_> well, the php code is also older [11:36:47] <_joe_> but well, hhvm includes async primitives, and I really liked the fact you can convert your code to hack step by step [11:37:16] <_joe_> but I don't think we'll go down that route anytime soon [11:37:25] <_joe_> given standard zend php doesn't support hack [11:38:40] yup [11:39:05] we'll have a lot of other options once we decide to ditch shared hosting support [11:43:38] <_joe_> btw, off for lunch. 
bbiab [12:18:42] (CR) TTO: [C: -1] Add export-0.9.xsd (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686) (owner: Reedy) [12:26:31] (CR) Alexandros Kosiaris: [C: 2] solr: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/149960 (owner: Matanya) [12:27:48] (CR) Alexandros Kosiaris: [C: 2] gmond - outdated variable syntax in erb [operations/puppet] - https://gerrit.wikimedia.org/r/147221 (owner: Dzahn) [12:28:50] (CR) Alexandros Kosiaris: [C: 2] sudo privileges should be in array [operations/puppet] - https://gerrit.wikimedia.org/r/150160 (owner: Alexandros Kosiaris) [12:31:45] (CR) Alexandros Kosiaris: [C: 2] deployment: wrap packages with ensure_packages() [operations/puppet] - https://gerrit.wikimedia.org/r/150501 (owner: Hashar) [12:44:11] (CR) Reedy: Add export-0.9.xsd (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686) (owner: Reedy) [13:00:05] K4-713: Dear anthropoid, the time has come. Please deploy Fundraising (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T1300). [13:16:55] !log rebuilding Cirrus index for commons to pick up weighted all field [13:17:02] Logged the message, Master [13:30:22] (CR) Filippo Giunchedi: hhvm: lintian fixes (2 comments) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 (owner: Giuseppe Lavagetto) [13:35:53] (CR) Qgil: "What is the status of this very old changeset? Should it be abandoned? Or at least given a -1? Or is it work in progress?" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/63782 (owner: Sanja pavlovic) [13:36:25] akosiaris: how's it lookin? 
https://gerrit.wikimedia.org/r/#/c/149889/ [13:38:51] (03PS3) 10Qgil: Extend maximum allowed mediawiki version to 1.24 [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/139413 (https://bugzilla.wikimedia.org/66663) (owner: 10Wpmirrordev) [13:39:03] (03PS2) 10Ottomata: Check assumptions, fix heap overflow [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/127804 (owner: 10CSteipp) [13:39:10] (03CR) 10Ottomata: [C: 032 V: 032] Check assumptions, fix heap overflow [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/127804 (owner: 10CSteipp) [13:44:22] (03CR) 10Qgil: [C: 04-1] "This is a bit confusing. There are three open changesets with the same subject and no description, apparently related to each other. Ideal" [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/113124 (owner: 10Wpmirrordev) [13:44:42] (03CR) 10Qgil: [C: 04-1] "This is a bit confusing. There are three open changesets with the same subject and no description, apparently related to each other. Ideal" [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/113103 (owner: 10Wpmirrordev) [13:44:55] (03CR) 10Qgil: [C: 04-1] "This is a bit confusing. There are three open changesets with the same subject and no description, apparently related to each other. Ideal" [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/111728 (owner: 10Wpmirrordev) [13:45:46] (03CR) 10Qgil: "Does this patch still have a possibility to be merged, or should it be abandoned?" [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/63390 (https://bugzilla.wikimedia.org/48012) (owner: 10Sanja pavlovic) [13:49:07] (03CR) 10Alexandros Kosiaris: [C: 032] Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/149889 (owner: 10Ottomata) [13:50:04] !log neon read-only fs. fsck + reboot [13:50:07] Logged the message, Master [13:50:28] thanks! 
[13:52:15] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet last ran 16356 seconds ago, expected 14400 [13:52:54] RECOVERY - Puppet freshness on mw1024 is OK: puppet ran at Wed Jul 30 13:52:48 UTC 2014 [13:52:54] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Wed Jul 30 13:52:48 UTC 2014 [13:52:54] RECOVERY - Puppet freshness on mw1201 is OK: puppet ran at Wed Jul 30 13:52:48 UTC 2014 [13:52:54] RECOVERY - Puppet freshness on mw1186 is OK: puppet ran at Wed Jul 30 13:52:48 UTC 2014 [13:52:54] RECOVERY - Puppet freshness on rdb1003 is OK: puppet ran at Wed Jul 30 13:52:48 UTC 2014 [13:52:54] RECOVERY - Puppet freshness on db1044 is OK: puppet ran at Wed Jul 30 13:52:49 UTC 2014 [13:53:38] lol [13:53:53] welcome back icinga-wm [13:53:57] or not [13:54:09] RECOVERY - Puppet freshness on lvs3003 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:09] RECOVERY - Puppet freshness on mw1143 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:10] RECOVERY - Puppet freshness on cp1040 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:10] RECOVERY - Puppet freshness on mw1220 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:10] RECOVERY - Puppet freshness on cp4002 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:11] RECOVERY - Puppet freshness on amssq33 is OK: puppet ran at Wed Jul 30 13:54:00 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on analytics1018 is OK: puppet ran at Wed Jul 30 13:54:05 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on ms-be1002 is OK: puppet ran at Wed Jul 30 13:54:05 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on search1004 is OK: puppet ran at Wed Jul 30 13:54:05 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on strontium is OK: puppet ran at Wed Jul 30 13:54:10 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on mw1071 is OK: puppet ran at Wed Jul 30 13:54:10 UTC 2014 [13:54:17] RECOVERY - Puppet freshness on amssq39 is OK: puppet ran at Wed Jul 30 13:54:10 UTC 2014 
[13:54:18] RECOVERY - Puppet freshness on elastic1003 is OK: puppet ran at Wed Jul 30 13:54:15 UTC 2014 [13:54:27] RECOVERY - Puppet freshness on mw1086 is OK: puppet ran at Wed Jul 30 13:54:20 UTC 2014 [13:54:27] RECOVERY - Puppet freshness on db1072 is OK: puppet ran at Wed Jul 30 13:54:25 UTC 2014 [13:54:27] RECOVERY - Puppet freshness on cp1067 is OK: puppet ran at Wed Jul 30 13:54:25 UTC 2014 [13:54:38] RECOVERY - Puppet freshness on mw1203 is OK: puppet ran at Wed Jul 30 13:54:30 UTC 2014 [13:54:47] RECOVERY - Puppet freshness on mw1027 is OK: puppet ran at Wed Jul 30 13:54:40 UTC 2014 [13:54:47] RECOVERY - Puppet freshness on mw1066 is OK: puppet ran at Wed Jul 30 13:54:40 UTC 2014 [13:54:47] RECOVERY - Puppet freshness on copper is OK: puppet ran at Wed Jul 30 13:54:45 UTC 2014 [13:54:47] RECOVERY - Puppet freshness on mw1215 is OK: puppet ran at Wed Jul 30 13:54:45 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on search1012 is OK: puppet ran at Wed Jul 30 13:54:50 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on wtp1008 is OK: puppet ran at Wed Jul 30 13:54:50 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on cp3005 is OK: puppet ran at Wed Jul 30 13:54:50 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on mw1193 is OK: puppet ran at Wed Jul 30 13:54:55 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on cp4009 is OK: puppet ran at Wed Jul 30 13:54:55 UTC 2014 [13:54:57] RECOVERY - Puppet freshness on mw1110 is OK: puppet ran at Wed Jul 30 13:54:55 UTC 2014 [13:54:58] RECOVERY - Puppet freshness on mw1204 is OK: puppet ran at Wed Jul 30 13:54:55 UTC 2014 [13:54:58] RECOVERY - Puppet freshness on db1056 is OK: puppet ran at Wed Jul 30 13:54:55 UTC 2014 [13:54:59] RECOVERY - Puppet freshness on cp1044 is OK: puppet ran at Wed Jul 30 13:54:56 UTC 2014 [13:54:59] RECOVERY - Puppet freshness on db71 is OK: puppet ran at Wed Jul 30 13:54:56 UTC 2014 [13:55:00] RECOVERY - Puppet freshness on wtp1001 is OK: puppet ran at Wed Jul 30 13:54:56 UTC 2014 [13:55:00] 
RECOVERY - Puppet freshness on cp1053 is OK: puppet ran at Wed Jul 30 13:54:56 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on db60 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on mw1107 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on db1035 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on mw1135 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on cp4012 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:07] RECOVERY - Puppet freshness on mw1064 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:08] RECOVERY - Puppet freshness on db1045 is OK: puppet ran at Wed Jul 30 13:55:01 UTC 2014 [13:55:09] RECOVERY - Puppet freshness on mw1090 is OK: puppet ran at Wed Jul 30 13:55:06 UTC 2014 [13:55:09] RECOVERY - Puppet freshness on terbium is OK: puppet ran at Wed Jul 30 13:55:06 UTC 2014 [13:55:10] RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Wed Jul 30 13:55:06 UTC 2014 [13:55:10] RECOVERY - Puppet freshness on stat1001 is OK: puppet ran at Wed Jul 30 13:55:06 UTC 2014 [13:55:14] (not) [13:55:21] RECOVERY - Puppet freshness on mw1112 is OK: puppet ran at Wed Jul 30 13:55:11 UTC 2014 [13:55:21] RECOVERY - Puppet freshness on db1037 is OK: puppet ran at Wed Jul 30 13:55:11 UTC 2014 [13:55:21] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Wed Jul 30 13:55:11 UTC 2014 [13:55:21] RECOVERY - Puppet freshness on cp1054 is OK: puppet ran at Wed Jul 30 13:55:11 UTC 2014 [13:55:21] RECOVERY - Puppet freshness on mw1113 is OK: puppet ran at Wed Jul 30 13:55:16 UTC 2014 [13:55:22] RECOVERY - Puppet freshness on linne is OK: puppet ran at Wed Jul 30 13:55:16 UTC 2014 [13:55:22] RECOVERY - Puppet freshness on chromium is OK: puppet ran at Wed Jul 30 13:55:16 UTC 2014 [13:55:22] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Wed Jul 30 13:55:16 UTC 2014 
[13:55:23] RECOVERY - Puppet freshness on osm-cp1001 is OK: puppet ran at Wed Jul 30 13:55:16 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on mw1131 is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on cp3011 is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on lvs4004 is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on ms-fe1002 is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:27] RECOVERY - Puppet freshness on mw1158 is OK: puppet ran at Wed Jul 30 13:55:21 UTC 2014 [13:55:28] RECOVERY - Puppet freshness on ms-be1015 is OK: puppet ran at Wed Jul 30 13:55:26 UTC 2014 [13:55:28] RECOVERY - Puppet freshness on cp1068 is OK: puppet ran at Wed Jul 30 13:55:26 UTC 2014 [13:55:29] RECOVERY - Puppet freshness on eeden is OK: puppet ran at Wed Jul 30 13:55:26 UTC 2014 [13:55:30] RECOVERY - Puppet freshness on cp1045 is OK: puppet ran at Wed Jul 30 13:55:26 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on es1004 is OK: puppet ran at Wed Jul 30 13:55:31 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on mw1021 is OK: puppet ran at Wed Jul 30 13:55:31 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on mw1037 is OK: puppet ran at Wed Jul 30 13:55:31 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on mw1104 is OK: puppet ran at Wed Jul 30 13:55:31 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on cp1066 is OK: puppet ran at Wed Jul 30 13:55:36 UTC 2014 [13:55:39] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Wed Jul 30 13:55:36 UTC 2014 [13:55:47] RECOVERY - Puppet freshness on mw1155 is OK: puppet ran at Wed Jul 30 13:55:41 UTC 2014 [13:55:47] RECOVERY - Puppet freshness on amssq37 is OK: puppet ran at Wed Jul 30 13:55:42 UTC 2014 [13:55:57] RECOVERY - Puppet freshness on labsdb1005 is OK: puppet ran at Wed Jul 30 13:55:47 UTC 2014 
[13:55:57] RECOVERY - Puppet freshness on ssl1003 is OK: puppet ran at Wed Jul 30 13:55:47 UTC 2014 [13:55:57] RECOVERY - Puppet freshness on mw1154 is OK: puppet ran at Wed Jul 30 13:55:48 UTC 2014 [13:55:57] RECOVERY - Puppet freshness on virt1008 is OK: puppet ran at Wed Jul 30 13:55:54 UTC 2014 [13:55:57] RECOVERY - Puppet freshness on mw1207 is OK: puppet ran at Wed Jul 30 13:55:54 UTC 2014 [13:55:58] RECOVERY - Puppet freshness on mw1128 is OK: puppet ran at Wed Jul 30 13:55:54 UTC 2014 [13:55:58] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Wed Jul 30 13:55:54 UTC 2014 [13:56:07] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Last successful Puppet run was Wed 30 Jul 2014 07:06:24 UTC [13:56:07] RECOVERY - Puppet freshness on mw1137 is OK: puppet ran at Wed Jul 30 13:55:59 UTC 2014 [13:56:07] RECOVERY - Puppet freshness on db1027 is OK: puppet ran at Wed Jul 30 13:56:04 UTC 2014 [13:56:08] RECOVERY - Puppet freshness on search1003 is OK: puppet ran at Wed Jul 30 13:56:04 UTC 2014 [13:56:17] RECOVERY - Puppet freshness on cp3019 is OK: puppet ran at Wed Jul 30 13:56:09 UTC 2014 [13:56:17] RECOVERY - Puppet freshness on mw1047 is OK: puppet ran at Wed Jul 30 13:56:09 UTC 2014 [13:56:18] RECOVERY - Puppet freshness on cp4016 is OK: puppet ran at Wed Jul 30 13:56:14 UTC 2014 [13:56:18] RECOVERY - Puppet freshness on analytics1012 is OK: puppet ran at Wed Jul 30 13:56:14 UTC 2014 [13:56:25] irssi users: /ignore -time 30m -regexp -pattern "puppet freshness" icinga-wm [13:56:27] RECOVERY - Puppet freshness on mw1199 is OK: puppet ran at Wed Jul 30 13:56:19 UTC 2014 [13:56:27] RECOVERY - Puppet freshness on tmh1001 is OK: puppet ran at Wed Jul 30 13:56:19 UTC 2014 [13:56:37] RECOVERY - Puppet freshness on cp4010 is OK: puppet ran at Wed Jul 30 13:56:25 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on palladium is OK: puppet ran at Wed Jul 30 13:56:31 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on analytics1031 is OK: puppet ran at Wed 
Jul 30 13:56:31 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on mc1015 is OK: puppet ran at Wed Jul 30 13:56:31 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on mw1194 is OK: puppet ran at Wed Jul 30 13:56:31 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on mw1073 is OK: puppet ran at Wed Jul 30 13:56:31 UTC 2014 [13:56:39] RECOVERY - Puppet freshness on mw1018 is OK: puppet ran at Wed Jul 30 13:56:31 UTC 2014 [13:56:47] RECOVERY - Puppet freshness on cp3021 is OK: puppet ran at Wed Jul 30 13:56:36 UTC 2014 [13:56:47] RECOVERY - Puppet freshness on amssq50 is OK: puppet ran at Wed Jul 30 13:56:36 UTC 2014 [13:56:47] RECOVERY - Puppet freshness on ms-be1010 is OK: puppet ran at Wed Jul 30 13:56:36 UTC 2014 [13:56:59] (CR) Alexandros Kosiaris: [C: -1] mobile: replace iptables with ferm rule (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/117673 (owner: Matanya) [14:08:50] (PS1) coren: labmon1001: switch to raid1-lvm [operations/puppet] - https://gerrit.wikimedia.org/r/150554 [14:10:42] andrewbogott: quick +2 to ^^ when you wake? [14:11:10] !log reinstalling labmon1001 -> change disk partitioning scheme [14:11:16] Logged the message, Master [14:12:46] (PS26) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [14:13:07] (CR) Andrew Bogott: [C: 2] labmon1001: switch to raid1-lvm [operations/puppet] - https://gerrit.wikimedia.org/r/150554 (owner: coren) [14:13:20] Danke. [14:17:01] !log rebooting neon again, trying to fix the disk situation [14:17:06] Logged the message, Master [14:17:44] (PS27) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [14:19:29] <^demon|brb> godog: We finished rolling out the swift plugin :) Want to get that user/key in place today and give a dump a shot? [14:20:31] ^demon|brb: sure!
[14:28:55] ^demon|brb: after the credentials I believe https://gerrit.wikimedia.org/r/#/c/130760 is left and that's it? [14:29:51] <^demon|brb> Yep, I'm getting the credentials into PrivateSettings now [14:29:55] <^demon|brb> Then I'll amend that one last time. [14:30:46] !log rolling restart of ms-fe* to pick up search backup user [14:30:52] Logged the message, Master [14:32:15] ^demon|brb: cool [14:35:05] !log demon Synchronized wmf-config/PrivateSettings.php: Swift config for Cirrus (duration: 00m 08s) [14:35:11] Logged the message, Master [14:38:26] (CR) Alexandros Kosiaris: "Minor comment" (1 comment) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 (owner: Giuseppe Lavagetto) [14:41:57] (PS7) Chad: Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130760 [14:42:13] <^demon|brb> Alrighty, PS7 should be the golden ticket. [14:43:35] <^demon|brb> Except I copy+pasted the username instead of the key for the username. [14:43:58] (PS8) Chad: Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130760 [14:44:10] <^demon|brb> PS8 or bust.
[14:44:32] *rubberstamp* [14:44:48] (CR) Filippo Giunchedi: [C: 1] Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130760 (owner: Chad) [14:45:21] (CR) Chad: [C: 2] Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130760 (owner: Chad) [14:45:26] (Merged) jenkins-bot: Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130760 (owner: Chad) [14:46:20] !log demon Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s) [14:46:24] Logged the message, Master [14:46:47] <^demon|brb> Woo, now it's all ready for trying out :) [14:47:00] <^demon|brb> Good time to grab a drink, bathroom break, etc before continuing. [14:47:12] ^demon|brb: kk, let me know when you start [15:00:04] manybubbles, Reedy, yurik: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T1500). Please do the needful. [15:00:47] yurikMskRu: I'm in a meeting now so I'm not going to be able to SWAT for another half hour [15:02:28] (CR) Giuseppe Lavagetto: mediawiki::web: compatability fixes for apache2.conf (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/147994 (owner: Ori.livneh) [15:02:54] ottomata: can you take a stab at responding to https://rt.wikimedia.org/Ticket/Display.html?id=8003 ? [15:05:14] <^demon|brb> godog: No dice. I'm missing another java dependency :( [15:05:40] oh, i thought there was an analytics-admins [15:05:41] hmm [15:05:49] yeah, actually, there isn't yet [15:05:49] manybubbles, thx, i still don't know if it is needed - it was breaking on my machine for mobile master, but i would have to test it with wmf15 for mobile to see if it would work [15:05:53] i think we need one andrewbogott [15:06:01] just one?
[15:06:10] (one group :p) [15:06:14] <^demon|brb> manybubbles: You're gonna hate me. We're missing another dependency in swift-plugin. org.apache.http.conn.ClientConnectionManager [15:06:16] the reason I didn't do it myself, was because I wasn't sure if I was allowed to give qchris sudo access [15:06:26] manybubbles, it might have been that mobile team had support for both in 15, and than they removed it from master [15:07:20] !log shutting down neon [15:07:24] (PS5) Giuseppe Lavagetto: mediawiki::web: compatability fixes for apache2.conf [operations/puppet] - https://gerrit.wikimedia.org/r/147994 (owner: Ori.livneh) [15:07:26] Logged the message, Master [15:07:53] ottomata: so currently he has login access but not sudo on those boxes? Is that right? [15:08:03] (CR) Ori.livneh: [C: 1] "(thanks)" [operations/puppet] - https://gerrit.wikimedia.org/r/147994 (owner: Ori.livneh) [15:08:11] (CR) Giuseppe Lavagetto: "I have rebased and translated this patch on top of the current state of the apache2 config across the cluster." [operations/puppet] - https://gerrit.wikimedia.org/r/147994 (owner: Ori.livneh) [15:08:16] yurikMskRu: meeting over. [15:08:27] _joe_: +1, thanks! [15:08:33] so, I dont' really want to push it if it you aren't sure its right.... [15:08:40] but I get whwere you are coming from [15:08:45] hehe, neither do i :) [15:08:48] ottomata: so, you say "but if that is not allowed, I think we should at least get him shell access to all of the analytics10* boxes" [15:09:05] I think that a) its not allowed and b) he already has shell access doesn't he? [15:09:26] ^demon|brb: :sad: can it wait until after wikimania now? We'll time it with the 1.3 upgrade? [15:09:30] actually, no.
analytics-users gets you access to only boxes that users of the analytics cluster would need to have [15:09:36] mainly, analytics1010 (and analytics1004) [15:09:40] those are the hadoop namenodes [15:09:45] manybubbles, but if mobile broke it, it will go kaboom once we switch enwiki to 15 [15:09:50] there is no reason a regular hadoop user woul dneed access to all analytics nodes [15:09:57] e.g. kafka brokers, datanodes, etc. [15:09:59] <^demon|brb> manybubbles: Yeahhh, probably best. I don't want to do the whole thing again today. [15:10:06] <^demon|brb> I *just* finished last night. [15:10:13] yurikMskRu: yeah.... can you test with something on wmf15 today? [15:10:22] <^demon|brb> godog: I think we're on hold. We're not gonna restart the cluster again to pick up the missing dependency. [15:10:32] <^demon|brb> Until after Wikimania. [15:10:54] qchris is doing some admin work on the cluster, (modifying oozie jobs, checking out logs for troubleshooting, etc.) [15:11:00] so it would be good if he had permissions to sudo [15:11:10] actually, he really only needs sudo to hdfs, hive and oozie users [15:11:10] manybubbles, will try to do it later today [15:11:15] ^demon|brb: no problem! we can chat about it at wikimania too [15:11:15] i don't think he needs root [15:11:23] <^demon|brb> godog: Sounds good. Thanks! [15:11:34] andrewbogott: I'll submit a patch [15:11:39] we'll see if we can get someone to approve [15:17:00] !log upgrading php5 on jenkins slaves [15:17:07] Logged the message, Master [15:19:13] (CR) BryanDavis: [C: -1] "Conflicting with production HEAD now. I'm pulling the cherry-pick from beta for now." [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4) [15:21:38] !log hoo Synchronized php-1.24wmf15/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.js: touch (duration: 00m 20s) [15:21:43] Logged the message, Master [15:21:55] (CR) BryanDavis: "Cherry-picked patch set #5 into beta's puppet master."
[operations/puppet] - https://gerrit.wikimedia.org/r/147994 (owner: Ori.livneh) [15:22:11] (PS1) Ottomata: Create analytics-admins group with otto and qchris as members [operations/puppet] - https://gerrit.wikimedia.org/r/150560 [15:51:16] (PS1) Ottomata: Add support for hive.variable.substitute.depth [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/150566 [15:51:47] (CR) Ottomata: [C: 2 V: 2] Add support for hive.variable.substitute.depth [operations/puppet/cdh] - https://gerrit.wikimedia.org/r/150566 (owner: Ottomata) [15:54:05] (PS1) Ottomata: Increase hive.variable.substitute.depth to 10000 [operations/puppet] - https://gerrit.wikimedia.org/r/150567 [15:56:25] (CR) Mattflaschen: [C: 1] "Matches my correct new production public key" [operations/puppet] - https://gerrit.wikimedia.org/r/150175 (owner: Matanya) [15:58:05] (PS2) Ottomata: Increase hive.variable.substitute.depth to 10000 [operations/puppet] - https://gerrit.wikimedia.org/r/150567 [15:59:45] (CR) Ottomata: [C: 2 V: 2] Increase hive.variable.substitute.depth to 10000 [operations/puppet] - https://gerrit.wikimedia.org/r/150567 (owner: Ottomata) [16:00:04] bd808: Respected human, time to deploy Scap update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T1600). Please do the needful. [16:00:54] (PS3) Andrew Bogott: access: Matt Flaschen new key [operations/puppet] - https://gerrit.wikimedia.org/r/150175 (owner: Matanya) [16:02:06] (CR) Andrew Bogott: [C: 2] access: Matt Flaschen new key [operations/puppet] - https://gerrit.wikimedia.org/r/150175 (owner: Matanya) [16:03:20] (PS2) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 [16:04:07] !log Updated scap to 4871208 (rely on $PATH for scap scripts) [16:04:12] Logged the message, Master [16:04:26] <^demon|brb> hashar: What version of php did you upgrade to?
[16:06:11] (PS3) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 [16:06:41] !log scap announce failed -- timeout connecting to tcpircbot on neon.wikimedia.org [16:06:47] Logged the message, Master [16:07:01] !log Synchronized touch: no-op sync to test scap update (duration: 00m 05s) [16:07:07] Logged the message, Master [16:10:52] andrewbogott, cmjohnson1, bblack: Looks like the tcpircbot that is supposed to be listening on port 9200 on neon is not alive. This is what scap and dologmsg use to announce shell actions from tin. [16:11:26] * bd808|deploy sees a lot of sickness for neon in backscroll [16:11:53] bd808: reinstalling neon...hopefully back up shortly [16:12:17] cmjohnson1: col. thanks for the update [16:12:24] *cool [16:12:28] yeah I should've logged it [16:12:42] (PS4) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 [16:13:24] !log scap and dologmsg from tin won't work until neon is back up and running tcpircbot [16:13:30] Logged the message, Master [16:14:35] I'm {{done}} with the scap update incase anyone was waiting to jump in with something adhoc [16:14:47] bd808: \o/ [16:14:56] ori: :) [16:15:46] <^demon|brb> Heh, http://ganglia.wikimedia.org/latest/graph_all_periods.php?me=Wikimedia&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&g=network_report&z=large is such a weird graph collection. [16:15:53] <^demon|brb> Total cluster network. [16:17:09] (PS5) Ori.livneh: Add HHVM module [operations/puppet] - https://gerrit.wikimedia.org/r/150506 [16:19:09] (CR) Ori.livneh: "(Not totally done, but early feedback appreciated.)" [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh) [16:19:14] ^demon|brb: analytics and virt* servers are the ones usually having tens PB/s network peaks :D [16:19:37] <^demon|brb> Yeah. It just drowns out anything else :p [16:20:41] <^demon|brb> cmjohnson1: I see the new ram in 17-19, looks good thx!
[16:20:48] yw [16:21:50] <^demon|brb> manybubbles: Want to start repooling them? [16:21:55] (PS1) Ori.livneh: Remove deployment-apache{01,02} from beta cluster scap targets [operations/puppet] - https://gerrit.wikimedia.org/r/150570 [16:22:14] <^demon|brb> (I can do it, just wondering if you have any reason not to) [16:22:46] (PS2) Ori.livneh: Remove deployment-apache{01,02} from beta cluster scap targets [operations/puppet] - https://gerrit.wikimedia.org/r/150570 [16:23:40] (CR) Ori.livneh: [C: 2] "trivial and uncontroversial" [operations/puppet] - https://gerrit.wikimedia.org/r/150570 (owner: Ori.livneh) [16:43:56] (PS1) BryanDavis: beta: Remove require of Ferm::Rule['bastion-ssh'] [operations/puppet] - https://gerrit.wikimedia.org/r/150576 [16:44:13] (PS2) BryanDavis: beta: Remove require of Ferm::Rule['bastion-ssh'] [operations/puppet] - https://gerrit.wikimedia.org/r/150576 [16:45:49] (CR) BryanDavis: "Cherry-picked to beta and applied." [operations/puppet] - https://gerrit.wikimedia.org/r/150576 (owner: BryanDavis) [16:53:33] !log elastic1017 repooled, shards allocating [16:53:37] Logged the message, Master [16:53:41] manybubbles: ^ [16:56:14] (PS1) Ottomata: Add debian/kafka-common.dirs [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150579 [16:56:26] (CR) Ottomata: [C: 2 V: 2] Add debian/kafka-common.dirs [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150579 (owner: Ottomata) [17:01:42] !log labmon1001 rebooting (partitioning changes on primary disks) [17:01:48] Logged the message, Master [17:08:15] !log working on bringing up new neon install (first puppet run, etc) [17:08:20] Logged the message, Master [17:10:52] if there's a huge icinga spam in this channel in the near future, it's probably just that icinga is coming up for the first time... [17:11:56] https://gerrit.wikimedia.org/r/#/c/150474/ is super easy if anyone's got a second.
[17:13:01] (PS2) BBlack: Remove searchidx dsh group [operations/puppet] - https://gerrit.wikimedia.org/r/150474 (owner: Chad) [17:13:08] (CR) BBlack: [C: 2] Remove searchidx dsh group [operations/puppet] - https://gerrit.wikimedia.org/r/150474 (owner: Chad) [17:13:14] (CR) BBlack: [V: 2] Remove searchidx dsh group [operations/puppet] - https://gerrit.wikimedia.org/r/150474 (owner: Chad) [17:13:23] bblack: Thx [17:13:28] np [17:21:36] !log labmon1001 rebooting (final check for proper raid+lvm autodetection) [17:21:42] Logged the message, Master [17:22:12] (PS1) Chad: Raise recovery throttle to 100mb/s [operations/puppet] - https://gerrit.wikimedia.org/r/150586 [17:25:52] !log repooled elastic1018 and elastic1019 as well [17:25:57] Logged the message, Master [17:31:17] (PS1) Legoktm: Switch ExtensionDistributor to serve tarballs from extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150589 (https://bugzilla.wikimedia.org/68122) [17:32:30] (CR) Legoktm: "This must be deployed before 1.24wmf16 hits mediawiki.org otherwise ED will break. It's safe to deploy beforehand though." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150589 (https://bugzilla.wikimedia.org/68122) (owner: Legoktm) [17:35:49] manybubbles: Hey, you have a moment to discuss the search server specs? [17:36:00] The idea of a 4 disk raid0 scares me.
[17:37:03] RobH: yeah - 4 disk raid0 feels like we're asking for trouble - twice the ask as 2 disk raid0 [17:37:24] I don't mind moving to 4 disks at all, i just dislike that particular idea [17:37:27] *but* if it gets us twice the throughput I think its worth thinking of [17:37:59] I suppose it is 16 servers [17:38:10] so the trouble with the 4 disk raid0, I think, isn't that we're likely to lose data - we're not particularly likely with the 3x redundancy we run with [17:38:12] so the idea of our search infrastructure then will be to do redundancy by footprint [17:38:15] not by server disk config [17:38:24] any ssd failure is going to be a few days to fix (possibly) [17:38:33] but it could be a pain to deal with fix them [17:38:33] yeah [17:38:36] we should plan to buy some and keep spares though [17:38:49] which mark needs to be ok with, since it may add overhead to the onsite role [17:38:51] if we go with 4 disk raid0 I'd advise buying a couple as spare [17:38:55] yeah [17:39:02] btw, one of the things we'll almost certainly have lost is our current downtime settings in icinga [17:39:17] but if its soemthing where you guys plan that any one server can die for ahile [17:39:20] once things stabilize we'll need to re-downtime things that should be in downtime or ack things that should be acked [17:39:22] similar to how we do mw servers. [17:39:37] yeah - any one server can go down for quite a while [17:39:41] if any single mw server dies, there is no urgency for someone to go onsite. it is merely handled in next business hours [17:39:54] if we have 35 nodes in the cluster then it'll just reduce our capacity by a few percentage points [17:40:13] Cool, I think we are on the same page [17:40:13] like 3% [17:40:27] So if that is how this is planned, while raid0 of 4 SSDs seems iffy [17:40:33] we'll certainly find out the SSD failure rate. 
[17:40:39] (and it will be quite noticable) [17:40:39] :) [17:41:00] i say all this [17:41:08] but this still makes me cringe on a fundamental level [17:41:09] heh [17:41:15] raid0 of 4 disks just sounds wrong [17:41:17] yeah - if we hit a time where a couple of the servers goes down right next to each other we'll be sad [17:41:22] it does! [17:41:37] Do we need raid0 x 4disk? [17:41:41] so can search service somehow still get performance and split across two partitions for work? [17:41:50] * tacotuesday doesn't know these things [17:41:53] another option would be if one disk goes to rebuild the raid0 with two disks [17:41:56] sorry, three [17:41:58] has anyone actually seen an ssd failure on a server? i believe it happens, but i've never seen one. [17:42:00] just the remaining ones [17:42:08] jgage: yea, weve seen for caching servers [17:42:14] not a ton, but caching writes a LOT [17:42:21] and in the past we used consumer level ssds [17:42:30] RobH: it _can_ use two partitions but its decidedly non-optimal for a bunch of reasons [17:42:32] im not sure of our failure rate of hte S3500s, they are fairly new [17:42:35] does the device just totally stop functioning, or does it degrade? [17:42:39] the code that handles it is a bit simplistic [17:42:40] stop function [17:42:42] whee [17:42:58] * tacotuesday raises hand for question [17:43:21] so - if we have a disk failure on any of the elasticsearch servers - the plan is to put the server back online without the disk and let the cluster rebuild around it [17:43:22] if possible [17:43:25] tacotuesday: yes? [17:43:27] If we're going to look at using noatime consistently and possibly noop for scheduling, how much are we concerned about disk performance? 
[17:43:53] tacotuesday: so - its still a limiting factor *right now* [17:44:12] it might not be in the future, but we're looking at about 1k extra to switch from 2x300gb to 4x160gb [17:44:19] which feels like it'd be worth it [17:44:34] Jeff_Green: d'you know adam wight's IRC handle? [17:44:39] hm just got paged but no icinga alert here? [17:44:43] analytics1021 [17:44:44] Hmm. [17:44:47] ottomata? [17:44:52] but if the failure rate is too high then it isn't worth it for any money [17:45:01] oo! [17:45:05] you got paged?! [17:45:07] yeah [17:45:08] why did I not get paged? [17:45:20] (in SoS meeting right now..) [17:45:20] Jeff_Green: awight [17:45:23] oh god I hope it doesn't spam all the pages [17:45:24] "the command defined for service kafka broker server does not exist" [17:45:29] Bah, andrewbogott ^ [17:45:31] on 1021 [17:45:32] Not Jeff_Green, sorry. [17:45:34] icinga is just coming back online and the configuration still isn't great [17:45:37] um, icinga looks like it is having problems [17:45:40] please ignore icinga for the time being [17:45:40] aye [17:45:41] oh [17:45:42] ok [17:45:43] cool [17:45:44] oho [17:45:48] <_joe_> bblack: ok [17:45:56] clearly we need redundant monitoring systems [17:45:59] <_joe_> so I should disregard pages from now on? [17:46:15] <_joe_> jgage: we would be well off with an HA cluster [17:46:23] for those that aren't up to date on the past several hours: the host (neon) that hosts icinga had disk issues, it's being rebuilt from scratch (puppet) [17:46:24] manybubbles: Well I'll let you guys decide. I dunno what's best :p [17:46:26] <_joe_> I hd set up a very stupid one [17:46:31] and lots of stuff apparently isn't fixed by puppet [17:46:32] <_joe_> and that could be enough [17:46:35] oh wow, ok [17:46:39] bblack: that makes me sad. [17:46:51] I'm keeping a log of my manual fixups as I go [17:46:58] <_joe_> bblack: do you need assistance? 
[17:47:15] so far I'm ok, I may have some random questions about how things used to be as I go :) [17:47:16] <_joe_> if so, please tell me [17:47:19] tacotuesday and RobH: I don't have a super strong opinion but I think its worth talking about. if 4x160 is just too icky I'm happy with that as an argument to not do it. [17:47:46] <_joe_> ok for that you may seek someone else's expertise :) [17:49:55] anyone know much about Apache rewrite rules? [17:50:28] well raid0 is scary in general but it seems appropriate for this situation. presumably you need the IO. [17:50:53] only real alternative is raid10 at twice the price [17:50:59] _joe_: do you know what's involved in that switch from check_ganglia to check_graphite you mentioned before? is it trivial? [17:51:11] it might reduce load enough to make the rest of things easier [17:51:58] <_joe_> bblack: not sure honestly, but I thought I committed some fixes there [17:53:06] hmm, there was a lot of binging of my nick and my IRC client failed to raise an alert. did everything get worked out? 
[17:53:11] <_joe_> lemme check [17:55:31] (CR) Reedy: [C: 2] Switch ExtensionDistributor to serve tarballs from extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150589 (https://bugzilla.wikimedia.org/68122) (owner: Legoktm) [17:55:32] (Merged) jenkins-bot: Switch ExtensionDistributor to serve tarballs from extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150589 (https://bugzilla.wikimedia.org/68122) (owner: Legoktm) [17:55:50] !log stopping icinga service for now while working out other details [17:55:57] Logged the message, Master [17:56:49] I'm going to leave icinga offline at least long enough for the mirrored disks to finish their initial resync, it's really hurting i/o speed [17:57:37] (PS1) BryanDavis: beta + hhvm: Add bt-hhvm dump script [operations/puppet] - https://gerrit.wikimedia.org/r/150593 [17:58:34] (CR) Rush: [C: -1] "otto, you shouldn't need to be in this group as part of ops" [operations/puppet] - https://gerrit.wikimedia.org/r/150560 (owner: Ottomata) [17:58:36] <_joe_> bblack: no sorry, I already fixed what was fixable [17:58:45] <_joe_> https://github.com/wikimedia/operations-puppet/commit/8ddd290b03507b7cad1dce05158a930d381f976c#diff-aeb875e19ff4295aafbac2205474690d [17:58:53] Reedy: ty! [18:00:04] yurik: Dear anthropoid, the time has come. Please deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T1800). [18:00:38] RobH: You know anything about Apache rewrite rules (or know who does) [18:00:40] ? [18:00:51] _joe_: ok [18:00:56] Need to find someone to take a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=62289 [18:01:16] kaldari: Also, Apache 2.0!? [18:01:35] Reedy: is that bad?
[18:02:08] just pretty old [18:02:17] Reedy: the bug was filed many months ago FWIW [18:02:25] It was still old then ;) [18:02:33] (CR) Rush: "for sanity we should not use pip as I'm sure we won't in production" [operations/puppet] - https://gerrit.wikimedia.org/r/150425 (owner: Yuvipanda) [18:03:51] kaldari: If it's 500-ing.. Anything in the apache logs? [18:04:19] Reedy: Good idea, I'll check [18:05:10] kaldari: Also, it seems to be mac specific (based on the 2 reports). Asking someone like brion might be a good idea [18:07:47] (PS1) BBlack: fix tcpircbot group setup? [operations/puppet] - https://gerrit.wikimedia.org/r/150598 [18:10:46] chasemp: this if purely on labs, as a volunteer project. Usually others who do this just get it setup manually :( [18:11:10] chasemp: and if it ever makes it to prod, I'll be sure to package things as necessary [18:11:18] YuviPanda: I didn't -1 or anything just a thought [18:11:18] (Unlikely to make it to prod anytime soon) [18:11:38] I don't like doing prod things with virtualenv or pip but I also don't know the trajectory of these efforts [18:13:39] chasemp: yeah, if this were in prod I'd just package these things [18:13:44] chasemp: this is just a personal project [18:14:12] YuviPanda: where is the actual invocation of pip/virtualenv? [18:14:17] i see you including them but not using them [18:14:41] ori: fabric [18:14:49] ori: deployment is via fabric. [18:14:59] ori: I tried with puppet, then recoiled in horror, and didn't do it [18:15:14] ori: that's not in the repo yet, but will be soon [18:15:32] why not trebuchet? [18:15:44] ori: I wanted this done before Wikimania [18:15:49] ori: research hackathon [18:16:02] ori: I'll probably switch it over after. plus it's only 2 machines now [18:17:36] YuviPanda|Bus: if this module isn't a candidate for production in its current form, and will only be used in one place… maybe it's a good candidate for a separate repo?
[18:18:15] andrewbogott: it might be used in prod in the future, and yeah, puppetception is a good idea, but I want to use the modules from prod in this (nginx, uwsgi, diamond) [18:18:18] Since you just now implemented a path for that… [18:18:25] andrewbogott: I haven't figured out a way for that cleanly yet. [18:18:28] Ah, I see. [18:19:05] andrewbogott: this is the work I did puppetception first for, though. [18:19:14] hm [18:19:25] I'm wary about letting pip 'leak' into our puppet repo, even if you have good intentions in the short run [18:19:42] andrewbogott: I can move it over later if you think that'll be preferable, but I do hope for this to run in prod at some point (For access to EL data + slaves, similar to ori's ipython notebooks of long ago) [18:19:50] andrewbogott: true, but this only installs the pip package and nothing else [18:19:50] (CR) BBlack: [C: 2] fix tcpircbot group setup? [operations/puppet] - https://gerrit.wikimedia.org/r/150598 (owner: BBlack) [18:20:36] YuviPanda|Bus: I generally discourage pip use even in labs. Because you never know what you're going to get… the whole point of puppet is reproducibility, that goes hand-in-hand with proper packaging. [18:20:57] andrewbogott: pip uses requirements.txt, and so this is fully reproducible [18:21:05] andrewbogott: it's similar to the way we deploy mediawiki [18:21:07] If you trust the repo [18:21:15] andrewbogott: with fabric instead of scap [18:21:16] which you can't unless you host it yourself [18:22:28] andrewbogott: sure, but then 1. on labs, most projects are unpuppetized, 2. toollabs everything is pip, mostly. I don't see why this is different.
[18:22:54] andrewbogott: If this were to ever get into prod I know for sure there'll be debs made at that time, but there are no concrete plans for this to be in prod [18:23:39] also I'm on a bus, might drop out at some point [18:23:53] "I'm wary about letting pip 'leak' into our puppet repo" +1 [18:23:53] YuviPanda|Bus: let's think about how to make this work with puppetception then. Currently we don't have any system in place to merge something into production puppet that is 'safe for labs but not for prod' [18:24:00] not trying to be a pain about it [18:25:03] andrewbogott: I could put a check at top that fatals if it's in prod :) [18:25:23] bblack: neon / icinga is dead right? :-] [18:25:29] yes [18:25:36] RIP [18:25:40] well, I'm in the process of bringing it back online, but it's taking a while [18:25:45] luckily it is puppetized :-] [18:25:50] lol [18:26:02] chasemp: fwiw, all it's doing is installing the pip package, which is there in toollabs too [18:26:20] i had a bunch of emails sent at 17:43 stating The command defined for service jenkins_service_running does not exist [18:26:21] I'm keeping a log of the manual steps that were missing from puppet. it's still less than a screen-ful, so that's something! :) [18:26:25] don't think it is going to help you [18:26:37] chasemp: andrewbogott and if it goes into a separate repo for puppetception, that means it's going to get a *lot* less review [18:26:59] chasemp: andrewbogott wikimetrics, which we already have in ops/puppet, is labs only and also uses pip [18:27:03] bblack: if you are up for it one day, we could attempt to setup an icinga for beta cluster ;] [18:27:08] can we not put it up for review otherwise? [18:27:10] YuviPanda|Bus: right, but it will be re-reviewed if/when it is moved into prod. [18:27:11] good luck on fixing it [18:27:18] that is bad but doesn't not imply more badness :) ? 
[18:27:22] chasemp: andrewbogott in pretty much the same way as I am, except they have manual installation as the last step [18:27:29] hm… that makes me want to break wikimetrics :) [18:27:32] * andrewbogott looks [18:27:40] honestly imo yeah that shouldn't be [18:27:53] if it's serious enough to get into the prod puppet repo [18:28:02] it should be serious enough to package as if it can be used in prod [18:28:16] it's intentionally in labs because they also use only labsdb [18:28:22] quarry is just a more generic version of wikimetrics [18:28:28] re: wikimetrics - we asked ops a while back and they recommended labs [18:28:43] there's a thread I think last October or so [18:28:56] I think not the thought that it's in labs is up for discussion [18:28:58] the logic was mostly because it uses labsdb [18:29:08] but just what the standard is for having the config in the prod puppet repo [18:29:47] right, wikimetrics keeps all config in the prod puppet repo and does a self-hosted puppet master to import the secret repo [18:30:10] I just mentioned that because andrewbogott seemed not happy with the wikimetrics setup [18:30:25] ah [18:30:31] milimetric: does it install code using pip from a non-WMF-hosted code repo? 
[18:30:35] my only thought is here pip in prod puppet is bad [18:30:51] but it's weird because mostly prod is primary use case and labs is ancillary / testing [18:30:57] where labs is a primary use case for things in prod puppet [18:31:00] that's....and odd case [18:31:19] for which I would say it should provide the same consistency guidelines being in prod puppet [18:31:22] yes andrewbogott, it does use pip install to get set up in labs [18:31:22] and use appropriate packaging [18:31:23] I think labs as just a testing ground for prod 'only' has not been the case for a while [18:31:31] Then I'm not happy :( [18:31:39] YuviPanda|Bus: sure which is cool I think [18:31:42] sure, that's what we thought too [18:31:50] http://ganglia.wikimedia.org/latest/?c=API%20application%20servers%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [18:31:58] but that doesn't mean things in the main puppet setup should use pip [18:31:58] why do half of those use 2x CPU? [18:32:02] so if you look up that old thread, it starts with: hey since we have to go to prod, what do you advise [18:32:18] and the response after some debate was: don't go to prod because accessing labsdb is much harder from prod [18:32:30] (CR) 20after4: [C: 1] beta: Remove require of Ferm::Rule['bastion-ssh'] [operations/puppet] - https://gerrit.wikimedia.org/r/150576 (owner: BryanDavis) [18:32:30] it's like saying we should ignore lint errors, or style things if the module is only labs bound to me [18:32:39] it only ends badly because that's not a classification that can be guaranteed [18:32:44] hmmm [18:32:51] (PS1) coren: labstore1003: set to Trusty [operations/puppet] - https://gerrit.wikimedia.org/r/150621 [18:32:51] maybe we could host a pip mirror locally [18:32:55] since that's what we do for apt [18:32:57] no, so, nothing is ignored in terms of quality in the puppetization, and I don't think Yuvi is proposing to ignore anything either [18:33:03] andrewbogott: quickie?
^^ [18:33:05] but then we have to maintain two packaging standards [18:33:08] I don't think we individually verify everything upstream from debian either [18:33:44] chasemp: it also bugs me a *lot* that this standard isn't applied to node [18:33:46] chasemp: and only to python [18:33:53] it just packages things into the repo and is all good [18:34:04] I could potentially just checkin my venv and theoretically that should work too [18:34:14] (CR) 20after4: "I'll look into it and attempt this fix again later. for now I'm just gonna abandon this experimental patch.." [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4) [18:34:24] chasemp: and that's stuff running in prod, with tons of library code that's not been verified by us [18:34:28] sure but that's a 'he did it !' argument which doesn't really make sense [18:34:34] chasemp: it's the inconsistency that bugs me. [18:34:39] that's fair [18:34:46] but that badness does not indicate this badness as good [18:34:48] (Abandoned) 20after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4) [18:34:55] top -c suggests half as many workers [18:35:04] chasemp: imagine if I had done this in node. we wouldn't be having this conversation at all, despite it's the same thing [18:35:06] this is why I spoke up. The main point here is that all of ops saw the thread and the conclusion was that it's ok to do this in wikimetrics's case [18:35:16] not as an exception but because it makes a lot more sense than any alternative [18:35:22] Coren: You don't want to hear this, but… whitespace?
[18:35:28] milimetric: is this pip or is this a module that is only labs or [18:35:30] so I think Yuvi's project is no different [18:35:31] I'm not sure who's talking to who [18:35:53] wikimetrics is a module that's used from both vagrant and operations-puppet [18:36:00] chasemp: milimetric's wikimetrics also uses pretty much the same method of deployment I am. [18:36:05] in that module, to install, it uses pip install [18:36:18] andrewbogott: Oh! That thing has *tabs*? [18:36:28] * Coren fixies. [18:36:39] I'm not sure what the argument is, that because we've done it poorly we should continue to do it poorly? [18:36:45] I didn't -1 because of pip in labs [18:36:51] but I also can't agree it's good because it's not good [18:36:53] chasemp: no, I think it's ok in labs, and not in prod. [18:36:57] ah, those are all the API boxen...I guess that makes sense [18:37:16] yes but you are making modules in production puppet repo [18:37:18] for use only in labs [18:37:23] (PS2) coren: labstore1003: set to Trusty [operations/puppet] - https://gerrit.wikimedia.org/r/150621 [18:37:25] should we split the repos to prod / labs? [18:37:28] which means you are blurring the lines of labs and prod [18:38:12] hmm, I think, for the moment, I'll just run it from the puppetmasters until wikimania, and switch it to puppetception after, and then learn packaging and package the dependencies at some point [18:38:18] YuviPanda|Bus: Maybe, I don't know. My preference would be to have one repo, and hold labs to the standard of production. That makes working in labs harder though :/ [18:38:27] I still think this is *very* inconsistent, and that feels very wrong. [18:38:28] YuviPanda|Bus: FWIW, tools has its own repo for things like that. [18:38:40] Coren: true, but they're debs, not pip.
[18:38:40] it only makes it harder if you want to put it in the prod puppet repo [18:38:55] idk what the best thing to do for labs is honestly [18:38:56] chasemp: labs' default puppet repo is operations/puppet.git as well [18:39:09] andrewbogott: chasemp I guess that's the underlying problem :) [18:39:15] sure but does that mean labs testing should be done via prod puppet? [18:39:20] seems antithetical [18:39:25] chasemp: 'labs testing'? [18:39:38] well you said this isn't for prod but for labs only and for testing pre-wikimania [18:39:47] yet you want to merge into prod puppet? [18:39:59] no, not for 'testing' pre-wikimania (I have that with the self-hosted puppetmasters) [18:40:00] maybe I'm confused tho [18:40:01] chasemp: Labs doesn't have lower standards than prod, just a different SLA. [18:40:17] chasemp: yeah, the 'pre-wikimania' is for trebuchet vs fabric that ori asked about, not for pip [18:40:20] agreed thus packaging should be consistent [18:40:34] but I only care if it's referenced in the production puppet [18:40:37] chasemp: The objective should always be to support "that thing from labs should move to prod" scenario. [18:40:43] agreed [18:40:47] thus no pip? [18:41:01] IMO, thus no pip. [18:41:03] Coren: I think we are saying the same thing [18:41:20] YuviPanda|Bus: I /do/ make debs out of pips at need. [18:41:34] viz. oursql [18:41:36] Coren: it's something I intend to learn hopefully in a couple of weeks :) [18:42:08] andrewbogott: whitespacez fix't [18:42:33] chasemp: also, putting roles in operations/puppet.git is also the only way to consistently apply them from wikitech to labs boxes [18:42:36] (CR) Andrew Bogott: [C: 2] labstore1003: set to Trusty [operations/puppet] - https://gerrit.wikimedia.org/r/150621 (owner: coren) [18:42:52] YuviPanda|Bus: that isn't an argument for not packaging things correctly? [18:43:15] chasemp: are you volunteering to package them for me?
:) [18:43:26] no but saying we should do it wrong so we can do it quickly [18:43:32] is not the way to my heart for production stuff [18:44:12] chasemp: andrewbogott Coren anyway, moving forward, I guess 1. figure out how to use ops/puppet modules in puppetception, 2. move this to puppetception, 3. slowly package all the dependencies into debs, 4. remove venvs, 5. bring this back into ops/puppet [18:44:31] YuviPanda|Bus: man py2dsc [18:44:40] :-) [18:44:54] I think that is the right course of action? [18:45:02] And yes, this can go back to ops/puppet once packaged right IMO [18:45:09] not sure what puppetception is, just self hosted master? [18:45:12] YuviPanda|Bus: sounds right. I think that 1 should be pretty easy, just use prod puppet as a submodule and then tack an additional dir onto the module path [18:45:20] * andrewbogott declares it 'easy' without trying it [18:45:30] andrewbogott: heh, I thought so too. except.... [18:45:40] andrewbogott: It's certainly 'easy' to state out loud. :-P [18:45:48] chasemp: puppetception is a self-hosted master + tied into a third-party git repo containing the puppet code. [18:45:57] andrewbogott: recursive submodules, and our modules aren't entirely self contained [18:46:10] andrewbogott: chasemp 3rd party in this case is just another gerrit repo hosted on our gerrit [18:46:16] ahhhh ok [18:46:18] !log temporarily hard-disabling email/sms from icinga via 'mv /usr/bin/mail /usr/bin/mail-disabled' on neon to prevent icinga spam on next startup attempt [18:46:24] Logged the message, Master [18:46:43] mv'ing mail, that'll disable it heh [18:46:45] fwiw i just noticed the debian package python-stdeb which claims to make debs from pip [18:46:56] well anything else I'd do would get undone by puppet attempts, etc :) [18:47:01] jgage: It does, that's why py2dsc lives.
:-) [18:47:02] I know how to use that...but I think it's heresy :) [18:47:04] YuviPanda|Bus: yeah, as I typed that a little voice in the back of my head whispered "and convert everything in puppet/manifests to modules" [18:47:18] andrewbogott: :D [18:47:32] YuviPanda|Bus: http://chasemp.github.io/2012/08/08/simple-deb-from-github-python/ [18:47:33] andrewbogott: yeah, and that. I can't easily use anything in puppet/manifests [18:47:35] s/why/where/ [18:47:39] if you want to at least test with debs that is the easy way [18:47:45] Which I should really get back to, I did a lot of that a year ago but never finished :/ [18:47:51] hm, I don't think this is right. Instead of debianizing everything I think a much more fitting solution is a local blessed pip repository [18:48:05] andrewbogott: the 'clean' way to do puppetception is to just import whatever modules make sense from wherever, and not do ops/puppet [18:48:05] we did the same thing with archiva, and the node ecosystem does roughly the same thing [18:48:14] chasemp: nice! [18:48:21] here is the problem with blessed pip when I go to search for a package I know how to search for things twice, and deal with duplicate rollback logic [18:48:23] milimetric: ~1 [18:48:27] and deal with pip weirdness [18:48:30] to mandate that everything be debianized seems illogical [18:48:36] which is why you should use virtualenv [18:48:51] why? we use ubuntu which uses debs, and it's not everything, just things that are packaged [18:48:54] chasemp: virtualenv + pip is pretty much the 'blessed' way to do things in python land, and we're fighting against it every step [18:48:57] duplicated package managers is a mess of an idea [18:49:06] I think everyone in ops has probably done it [18:49:12] it just leads to solving all problems twice [18:49:37] We have this same discussion about every six months with a different language :) [18:49:41] YuviPanda|Bus: So is using CPAN for perl stuff; doesn't mean it's a good idea to do it in prod.
[18:49:44] virtualenv is not a production thing [18:49:47] Today milimetric is like the driver who says to the judge "But I robbed lots of banks before and no one said anything…" :D [18:49:56] this is totally inaccurate [18:50:12] and slightly in ignorance of the history I'm mentioning [18:50:25] if we use pip should we also use pear and cpan? [18:50:31] it just gets unmanageable so quickly [18:50:34] (PS2) Matanya: mobile: replace iptables with ferm rule [operations/puppet] - https://gerrit.wikimedia.org/r/117673 [18:50:36] and none of those is more special than the other [18:50:45] chasemp: No pear because pear is dead. [18:50:48] you pick a packaging mechanism and you live and die by it [18:50:49] we should be careful what ecosystems we support [18:50:51] pecl then [18:50:52] milimetric: Sorry, I don't think you're actually behaving badly, I'm just cringing in anticipation of an Ops pigpile :) [18:51:02] * YuviPanda|Bus murmurs npm [18:51:11] but when we support an ecosystem we shouldn't beat it to death until it looks like a totally different ecosystem [18:51:12] I agree using npm is bad [18:51:17] (CR) Aaron Schulz: [C: 1] "Seems roughly fine. I'm not quite sure what is going in modules/hhvm/files/hstr though." [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh) [18:51:23] YuviPanda|Bus: any comment on that patch ? [18:51:30] debs handle python packaging just fine though [18:51:47] that's not any kind of stretch, it's done all over including officially by debian [18:52:03] except it adds weeks of work for devs to debianize down chains of impossible dependencies [18:52:05] chasemp: I think it's the selective enforcement that bugs me. I think the easiest solution atm is for me to just learn proper deb packaging, which I shall do shortly :) [18:52:37] chasemp: flask-mwoauth, for example, is fine from pip. debianizing it would require me to debianize about 6 other packages, I think.
[18:52:44] if debianizing everything was, in fact, easy, then node would be less than 4 major versions behind, official npm packages wouldn't be broken, official celery packages wouldn't be broken, etc. [18:52:58] I never said easy, I said right [18:53:17] chasemp: unnecessarily hard, I think. [18:53:18] you said it wasn't a stretch [18:53:29] and i mean to say that very hard == stretch [18:53:49] are you saying YuviPanda|Bus packages are very hard? [18:54:30] I think it's not packaging that's hard, but dependency hell that is hard. [18:54:37] flattening out recursive dependencies into flat debian packages is hard, yes [18:54:40] chasemp: what andrewbogott said. [18:54:45] yes but not relevant for this case I think? [18:54:52] chasemp: my current app has 7 deps, but they each have a big tree [18:54:53] which is the only case being discussed now [18:54:56] Pretty much exactly what I fear about pip (and composer!) is that they make those dependency issues vanish like magic [18:55:14] saying it's dependency hell is not a good argument for making it with multiple package managers [18:55:20] where 'like magic' is generally a euphemism for 'without you actually knowing what you're installing' [18:55:27] recursive dependencies is the only reason anyone would ever want to use pip or npm, if it weren't for that i'd totally agree with you that a single package management system would be overall better [18:55:28] funny, what i fear is that without those dependencies i have to NIH everything :P [18:55:45] but as things are, a single packaging system is better (arguably) for ops and much worse for everyone else [18:55:52] also what ebernhardson said.
[18:55:54] no better for production [18:56:03] saying better for ops is disingenuous to us [18:56:20] i didn't say that, it was mentioned above [18:56:57] I thought you just said it :) [18:57:31] arguments of levels of difficulty are not arguments against doing it per se, operations is hard, minutia filled stuff [18:57:31] i am just remixing points from above and my point of view, sorry if lines got blurred [18:57:56] the thing is the overall give and take is on the side of consistent packaging [18:58:01] So… actually, is anyone advocating for use of pip in prod? Or just in labs? [18:58:14] just labs :) [18:58:17] just labs [18:58:21] the upfront cost of doing it in a way that can be managed consistently is far overshadowed by the advantages for idempotency and environment maturity [18:58:58] for labs but used in modules in production puppet...so turtles all the way down [18:59:03] but maybe i'm missing something. Are there separate modules for everything for labs / prod? [18:59:12] ok :) So in that case I think this holy war is mostly needless, since this comes down to pragmatics. Pip isn't banned in labs, it's just banned from the Prod puppet repo. [18:59:16] !log icinga coming back up again for the first time, expect random strangeness to be ignored [18:59:19] And that, mostly for superstitious reasons. [18:59:21] Logged the message, Master [18:59:23] heh [18:59:33] agreed on the first but not the second which doesn't matter :) [18:59:39] So, there are a few situations where you should build packages: [19:00:11] but, is there a different repo where we can put puppet modules and have them not be second class citizens? As in, they can be used in self-hosted puppetmasters easily, etc? [19:00:14] a) It's good practice, in general, to make use of WMF best practices. Since one of the purposes of labs is to engender good systems skills, people are encouraged to follow production practices in labs.
[19:00:32] milimetric: I think they said a 3rd repo that rebases the main is common practice? [19:00:41] b) If work in labs might someday trickle into production (or semi-production, which is to say its use becomes vital even if running in labs) then all of the arguments for production security still apply [19:01:07] if neither a or b apply, then we just come down to 'no pip in the puppet repo' which is not necessarily the same as no pip in labs. [19:01:36] If there are things currently in the prod puppet repo that don't follow production practices, we should make an effort to move them elsewhere. [19:01:39] e.g. via puppetception [19:02:02] this is my thought as well [19:02:34] So the answer isn't "You must not use pip!" It's, rather, "Yuvi is working on new technology which will allow an ops-approved use of pip" [19:02:51] I'm a little lost and confused. From my point of view, the policy seems to change as ops dictated that we should keep wikimetrics in labs and have it install via pip, but have its module live in operations/puppet [19:03:15] like, I want to stress that that was not something we fought for [19:03:19] we were literally told to do it that way [19:03:30] I don't think anyone is thinking that [19:03:41] I mean that someone pushed for pip or something [19:03:46] milimetric: That's pretty much what I meant by superstitious. I think the current setup is not actually wrong, it just breaks rules which are generally good rules.
[19:04:02] !log csteipp Synchronized php-1.24wmf15/includes/: (no message) (duration: 00m 07s) [19:04:06] all modules in production puppet should be ready to go to production [19:04:07] Logged the message, Master [19:04:10] that is my position [19:04:17] and we don't use pip in prod at all for good reasons [19:04:26] I think that sums up my thought really [19:04:53] so, from my point of view, so I don't have to keep re-visiting this, it would be useful if either: a) the rule changed to include allowing what wikimetrics is doing or b) enforce the rule and reach out to all violators and have them fix up their code [19:05:10] I'd be ok with either way, but I think the lack of a way causes confusion like this [19:05:15] for both me and projects like quarry [19:05:25] makes sense [19:05:32] (CR) BryanDavis: "Causes an erb template error on deployment-mediawiki01:" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/148099 (owner: Giuseppe Lavagetto) [19:06:06] !log csteipp Synchronized php-1.24wmf14/includes/: (no message) (duration: 00m 05s) [19:06:11] Logged the message, Master [19:06:28] milimetric: That's perfectly reasonable. There will always be inconsistencies depending on which Ops you ask, but you should be able to expect clear guidelines. [19:06:38] fair enough :) [19:06:49] * milimetric is happy that inconsistencies are consistently viewed as bad [19:06:57] milimetric: do you know who you collaborated with when setting up the current framework? If it wasn't me, I'll try to check in and build some agreement. [19:07:04] * andrewbogott hopes it wasn't him [19:07:11] I only see zuul/contint and wikimetrics using pip [19:07:17] i'll look up the thread on the ops list, one sec [19:07:22] and maybe the hashar case is needed for some reason i don't know about [19:08:14] YuviPanda|Bus: could puppetception include a manifest that looks roughly like the top of site.pp? Just include every damn manifest?
[19:08:16] anyway yeah YuviPanda|Bus sorry this turned into a round table, hopefully it doesn't hurt your efforts too much [19:08:26] I guess maybe that would /require/ the prod repo, hm... [19:09:15] andrewbogott: the thread was "Labs DB access from production cluster" (notice we were making an assumption that we'd have to live in prod) [19:09:32] I started it on 2013-11-26 [19:09:34] maybe add another module path, make your module and add your host in site.pp? [19:09:39] and Ryan Lane and Coren chimed in [19:09:53] milimetric: before my time sir! I'm off the hook here ;) [19:09:56] (PS1) RobH: seting the scs-[a|c]1-codfw.mgmt.codfw.wmnet entries [operations/dns] - https://gerrit.wikimedia.org/r/150629 [19:10:52] (PS10) 01tonythomas: Removed exim errors_to to support custom Return-Path [operations/puppet] - https://gerrit.wikimedia.org/r/141287 [19:10:57] there are separate discussions about pip [19:11:16] (CR) EBernhardson: [C: 1] "lets get this deployed and tested in prod before the new code starts using the watchlist to collect users to notify" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/146849 (owner: Bsitu) [19:11:34] (CR) BryanDavis: "Error: Failed to apply catalog: Could not find dependent Package[apache2-mpm-worker] for Apache::Mod_conf[mpm_worker] at /etc/puppet/modul" [operations/puppet] - https://gerrit.wikimedia.org/r/148099 (owner: Giuseppe Lavagetto) [19:12:08] greg-g: can deploy a mediawiki-config change? It enables the job queue for delivering echo notifications on testwiki and test2wiki: https://gerrit.wikimedia.org/r/#/c/146849/1/wmf-config/InitialiseSettings.php [19:12:15] s/can/can I/ [19:12:16] I see [19:12:22] they had you put it in the role under labs [19:12:25] # Run the wikimetrics/scripts/install script [19:12:25] # in order to pip install proper dependencies.
[19:12:25] # Note: This is not in the wikimetrics puppet module [19:12:27] # because it is an improper way to do things in [19:12:29] # WMF production. [19:12:40] anyway, the main concern was the labsdb issue. So andrewbogott: the place to start would be: "is it ok to access labsdb from production"? If not, where would someone define a puppet module that needed access to labsdb? If it's in operations/puppet, would the extra debianization effort really save ops as much effort as it causes devs? [19:12:41] that was the compromise I imagine [19:13:04] (CR) RobH: [C: 1] "This matches the setup for the scs mgmt IPs for EQIAD, in that its in the network mgmt allocated subnet." [operations/dns] - https://gerrit.wikimedia.org/r/150629 (owner: RobH) [19:13:52] right, not sure about that chasemp, I guess that's true [19:14:01] andrewbogott: Coren assuming similar train of thought YuviPanda|Bus could put the pip stuff in the role for labs only [19:14:07] the module is still operations/puppet/wikimetrics [19:14:13] I'm kinda meh on that but it seems to be previous idea [19:14:22] that as long as the heresy was contained to the labs realm in the role [19:14:32] ugh, sorry guys, in a bus, lost connection [19:14:34] * YuviPanda|Bus reads logs [19:14:41] Yeah, I wonder if it's safe enough to just have a segregated module/collection of roles in prod puppet that are for labs only? [19:14:41] but that doesn't really work because the module would be broken if you tried to move it to production [19:14:56] right, but that's ok - if it moved to prod it would need to be debianized [19:15:08] Mostly we've never come up with a very good policy for labs vs. puppet other than me saying "I encourage you…" a lot. [19:15:11] i don't think anyone's arguing with that [19:15:18] chasemp: I need pip for zuul to install it [19:15:28] chasemp: though the dependencies are fulfilled by Debian packages. [19:15:42] chasemp: I use something like: HTTP_PROXY=. HTTPS_PROXY=. pip install .
[19:16:08] hashar: there's no .deb for zuul? [19:16:22] hehe [19:16:23] :D [19:16:24] andrewbogott: I leave it to you to decide if the role pip use case for labs only is ok [19:16:45] I could see that going either way, I don't love it :) but it does mean we can scrutinize the module to prod level [19:16:51] and keeps things relatively sane-ish [19:17:03] I can't decide :) Maybe a bit of directory organization + some explicit fail-on-prod directives is enough. [19:17:16] whoo for explicit fail on prod [19:17:25] and the roles are in manifests/roles/labs? [19:17:31] But... [19:17:44] Jeff_Green, I don't know if this is actually a problem with my MX setup; but mailman and gerrit are both sending mail to the canonical name of my MX (e.g. to mwalker@sylph.khaosdev.com instead of mwalker@khaosdev.com) [19:17:49] * YuviPanda|Bus will debianize his stuff anyway, but milimetric probably doesn't have the time nor inclination [19:17:53] I dunno, nicest yet would be to have some sort of cascading repos. [19:18:19] andrewbogott: wikitech also deals only with operations/puppet (OpenStackManager), so can't apply roles from puppetception there [19:18:20] milimetric: good talk though, i appreciate your view point and hopefully this isn't a pain for all [19:18:42] not at all, i'm glad i have a view point that might inform better policies [19:18:46] whichever way they go, really [19:18:54] YuviPanda|Bus: I would like that to be true someday, but it isn't right now -- those class and var names are just arbitrary strings. [19:18:59] chasemp: just saw the logs, I'll still be able to demo at wikimania since these are already running on self hosted puppetmasters. [19:19:09] So at the moment I bet it would work just fine with puppetception [19:19:25] andrewbogott: oh, but I guess puppetception will have to somehow reach into ldap to get a list of roles? [19:19:47] YuviPanda|Bus: is it running a different puppetmaster from the one that runs on self-hosted instances? 
[19:19:55] 'cause that one looks in ldap already... [19:19:56] andrewbogott: there's no puppetmaster [19:20:05] oh, it's just 'apply'? [19:20:10] andrewbogott: ya [19:20:10] Then, I don't know how that'll work [19:20:13] mwalker: you don't have an mx record so exim would use the fqdn itself, and for you that's a cname, so exim resolves the cname to the A record [19:20:15] YuviPanda|Bus: you can set where it looks for modules [19:20:22] I'm not sure what the lacking functionality is? [19:20:50] !log icinga is now substantially back online. email/sms still disabled for now, and downtimes/acks need to be re-added for known issues [19:20:55] Logged the message, Master [19:21:47] chasemp: mostly personal ickiness from recursive submodules, and also the potential that you'll get a puppetception module that's *faaar* behind ops/puppet since it never updated the submodule [19:22:05] chasemp: puppetception is ok for things that'll never be in prod, though [19:22:12] (CR) MarkTraceur: [C: 1] "Yessiree bub." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/146951 (owner: Gergő Tisza) [19:22:21] chasemp: since then they don't have to use ops/puppet at all, and can just use whatever they want [19:22:34] (CR) BryanDavis: mediawiki: use mods-enabled, prepare for HAT (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/148099 (owner: Giuseppe Lavagetto) [19:22:39] Reedy: I'd say 146951 is safe to go out tomorrow. [19:22:42] I meant for your case at the moment, what are you having a hard time doing? [19:23:01] chasemp: oh, mostly I haven't tried :) I didn't realize pip on labs is going to be A Big Deal, [19:23:33] err [19:23:44] pip on ops/puppet [19:23:44] I knew it was banned on prod, and also knew it was not on labs, so assumed it's ok to write a labs role that used pip [19:24:24] ah fair, andrewbogott did you make a call on the labs role and pip-ification?
[19:24:55] or maybe you and coren can come up with a consistent ideology, or maybe just no way is the answer idk [19:25:35] indeed. if 1. pip is ok on labs, 2. puppet from ops/puppet is the way to puppetize things on labs, but 3. pip isn't allowed on puppet [19:25:44] logically inconsistent statements :) [19:25:45] no I meant [19:25:49] pip in the ROLE not the module [19:25:53] that was the compromise before it seems [19:25:56] chasemp: oh? I missed that [19:26:00] I need to think about this more and/or discuss with Coren before I can declare a new policy. But debianizing will be fun for Yuvi in the meantime anyway :) [19:26:02] chasemp: I can modify mine to be that [19:26:13] andrewbogott: true :) [19:26:42] I wonder if it's possible to have labs just automatically draw from two different git repos? then we could have a prod repo and a supplemental volunteer-reviewed labs repo. [19:26:57] so yes and no-ish [19:26:59] And if things were promoted, they'd be subject to the full force of Ops review [19:27:04] we do this with secrets now kind of [19:27:11] you can pull "modules" from multiple paths [19:27:12] Hm, that's true. [19:27:15] So it should be possible. [19:27:20] so you could say there is a repo with labs only modules [19:27:23] and that would be fairly easy to do [19:27:39] yeah, that'll be nice too. and we can move things from one to the other later on [19:27:43] if they move to prod [19:27:52] chasemp: this will also mean we can write consistent guidelines for each [19:28:01] hm, so maybe that's a good way to move forward.
[19:28:17] I kind of like that idea, but unsure how it fits into their use case overall [19:28:20] puppetception getting nicer and more usable is also another option [19:28:31] yeah [19:28:37] it's a good way to both sandbox new things not ready, and maintain pristine prod [19:29:06] although, 'review everything in one go before it moves to prod repo' reminds me of good old days of mediawiki 1.19 [19:29:21] where 3 months were spent reviewing all the code that was already merged before it could be released/deployed [19:29:24] (svn days) [19:29:32] you could review this other labs repo tho? [19:29:41] any repo can be up for review [19:29:53] hopefully an incoming icinga alert about cp4011 puppet disable soon as a test [19:29:53] true, with appropriate coding standards and strict enough review that won't happen [19:31:21] andrewbogott: anyway, I'll wait on you and Coren to decide this (perhaps email ops@ too?). I'll work on puppetception *and* debianization [19:31:54] YuviPanda|Bus: sounds good, thanks for your patience. [19:32:30] andrewbogott: :) I'm going to let quarry keep running from its self hosted puppetmaster for now, tho [19:32:42] YuviPanda|Bus: Presumably due to wikimania, the @ops list isn't getting much attention this week :) But I'll keep the idea in mind. [19:33:13] andrewbogott: chasemp recommendations on things to read about deb packaging? I've never done any packaging before [19:33:30] oh right those take a while to alert... [19:33:32] YuviPanda|Bus: I just watched a good fosdem talk about that, let me find the link... [19:33:55] YuviPanda|Bus: i used to follow https://www.debian.org/doc/manuals/maint-guide/dreq.en.html but there might be something much much more concise :) [19:33:55] YuviPanda|Bus: in the real world, though, you'll be using some kind of automated generator which depends on what kind of software you're building from.
[19:33:58] andrewbogott: chasemp also I'll bug you guys while I learn to package things :) [19:34:25] for every question you ask me we'll generate 3 [19:34:27] it will be a good time [19:34:28] trying cp1037 varnishd stop (not in prod) [19:34:34] chasemp: heh, best way to learn :) [19:34:53] YuviPanda|Bus: I watched this on Monday and found it to be pretty useful. Also short: https://archive.fosdem.org/2012/schedule/event/debian_packaging.html [19:35:10] andrewbogott: true, but I need to understand the underlying system too. Don't even know what's in a deb file, for example. [19:35:19] ebernhardson: ah, that's useful [19:36:41] YuviPanda|Bus: judging from that video, building a deb approximately involves doing a 'make install' into a fake root, and then rolling up the result. [19:36:46] But there's also a ton of metadata and such. [19:36:50] right [19:37:01] andrewbogott: can't watch the video now (in a bus), but will do tomorrow! [19:37:03] And I still haven't properly learned about how diffs are handled, which was what I was actually trying to learn about. [19:37:47] I know folks here love buildpackage [19:37:54] and so I've spent some time with it at least [19:38:28] hmm right [19:38:57] so it's just a way of putting files in places and then potentially running scripts at various points (pre/post/etc), and handling dependencies [19:39:31] debs or buildpackage? [19:39:41] debs [19:39:54] well, a deb file, rather [19:40:42] consistent format binary distribution mechanism :) [19:40:50] right :) [19:40:52] that does the things you mentioned I guess [19:40:57] yeah [19:41:08] does the things I mentioned consistently and reproducibly [19:41:36] andrewbogott: there is no .deb for zuul.
Though I will eventually have to get one done :=] [19:41:43] 'install' is just another word for 'put these files here, those files there, and when this happens, run this, and also, I need you to (recursively do the same thing) for package A, B, C' [19:41:48] andrewbogott: there is some debian packaging going on but upstream just use sources [19:41:59] The deb package for gerrit is massively out of date. [19:42:04] Can't wait to kill that crap package. [19:42:21] you talk like ^d but are named of the clan taco [19:42:22] yeah we should get rid of it entirely [19:42:40] chasemp: I go by many names :) [19:42:42] tacotuesday: ping ^d about it he will be quite happy to eliminate the package [19:42:48] it installs an obsolete version of gerrit anyway [19:42:51] tacotuesday: it's wednesday! [19:42:53] and they build it from source [19:43:00] hashar: yeah, building our own deb shouldn't be too big of a deal. If you want to dump it on ops then open an RT ticket and someone might do it... [19:43:00] hashar: tacotuesday is ^d [19:43:02] YuviPanda|Bus: I missed tacos yesterday :( [19:43:36] andrewbogott: but then I will depend on ops to push new packages in apt.wm.o whereas now I can push code & run the installer :] [19:43:56] andrewbogott: though maybe I could rebuild the package and install it manually while waiting for upload :-D [19:44:01] andrewbogott: ^ another issue for others as well :) [19:44:05] hashar: are you really upgrading zuul every week? [19:45:04] andrewbogott: I do. Though this year I have been blocked each time a new dependency is added in [19:45:16] andrewbogott: gotta figure out a nice solution to get rid of dependencies entirely [19:45:19] wow. Ok. [19:45:35] zuul moves quickly I guess [19:45:42] andrewbogott: I run an outdated version with a bunch of patches cherry picked.
Currently blocked by python-six [19:45:45] andrewbogott: this is the kinda inconsistency that bugs me :) [19:45:52] will probably just chip the python-six as a tarball [19:46:07] chasemp: not that quick. But I liked to be close to upstream [19:46:09] ok, I'll stop whining now :) [19:46:21] and [19:46:27] I am lame at building deb packages :-] They scare me [19:46:40] +1 [19:46:46] hashar and I have both been down that road. [19:46:46] testing 123 [19:46:51] There be dragons. [19:47:47] testing 123 [19:49:17] ah icinga talks again [19:49:18] andrewbogott: is our convention to import the external library into our gerrit before building packages for them? [19:49:20] testing 123 [19:49:20] chasemp: ^ [19:49:23] now it will! [19:49:28] andrewbogott: chasemp: so yeah having Zuul packaged is on the roadmap somewhere [19:49:42] YuviPanda|Bus: yes, usually. [19:49:59] YuviPanda|Bus: but, mostly, it's clear that we need written policies about all of this :) [19:50:04] testing real event via cp1037 again now [19:50:14] andrewbogott: +infinity [19:52:20] if I build a local package I usually do it from a local repo since buildpackage requires a debian branch [19:52:21] etc [19:52:41] hmm, but wouldn't that be non-reproducible later on? [19:52:50] hmm, I should probably RTFM before asking questions :) [19:52:55] how so? I make that branch in gerrit as well [19:53:44] chasemp: oh, I think I had a different meaning of the word 'local' [19:53:58] as in 'local only' [19:54:00] local to wmf :) [19:54:24] bad parlance on my part [19:54:25] heh [19:54:42] PROBLEM - Varnish HTTP text-backend on cp1037 is CRITICAL: Connection refused [19:54:48] woot [19:54:56] chasemp: I'm reading https://www.debian.org/doc/debian-policy/policy.pdf for background, and then pick up tutorials as such. 
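(Editorial aside on the "what's in a deb file" question above: a .deb is an `ar` archive that normally holds three members, `debian-binary`, a control tarball with the metadata and maintainer scripts, and a data tarball with the files to install. The sketch below is only an illustration, not how dpkg or buildpackage work internally. It builds a tiny fake archive in memory and lists its members; the helper names `ar_member` and `list_ar_members` and the placeholder payloads are made up for this example.)

```python
# Minimal sketch: a .deb is an `ar` archive. We fabricate one in memory
# (payloads are placeholders, not real tarballs) and list its members.

AR_MAGIC = b"!<arch>\n"   # global header of every ar archive
HEADER_LEN = 60           # fixed size of each member header


def ar_member(name, payload):
    """Return one ar member: 60-byte header, payload, optional pad byte."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) + "`\n"
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(payload)
    ).encode("ascii")
    padding = b"\n" if len(payload) % 2 else b""  # data is 2-byte aligned
    return header + payload + padding


def list_ar_members(blob):
    """Return [(member_name, size_in_bytes), ...] for an ar archive."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_LEN <= len(blob):
        header = blob[offset:offset + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        offset += HEADER_LEN + size + (size % 2)  # skip payload and padding
    return members


# Hypothetical stand-in for a real package file.
fake_deb = (
    AR_MAGIC
    + ar_member("debian-binary", b"2.0\n")
    + ar_member("control.tar.gz", b"ctl")
    + ar_member("data.tar.gz", b"data")
)

print(list_ar_members(fake_deb))
```

(On a real package the same listing can be had with `ar t some.deb` or `dpkg-deb --contents some.deb`; the control member is where the dependency list and pre/post scripts mentioned above live.)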
[19:55:27] sweet then you can answer my questions soon [19:55:54] hehe [19:56:42] RECOVERY - Varnish HTTP text-backend on cp1037 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.002 second response time [19:57:17] chasemp: the last time I did something that was terrible ('sudo apt-get install python-virtualenv' on stat1001) I got lectured, and then started learning puppet (that was quite some time back). Maybe I'll end up getting obsessed with packaging now :) [19:57:55] !log shutting off icinga to make some optimizations [19:58:01] Logged the message, Master [20:00:05] gwicke, subbu, cscott: Dear anthropoid, the time has come. Please deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T2000). [20:02:57] ahem, is the locking issue known? Error: 1213 Deadlock found when trying to get lock; try restarting transaction [20:03:06] https://logstash.wikimedia.org/#/dashboard/elasticsearch/exceptionmonitor [20:05:57] yurikMskRu: Most (all?) seem to be from db1040.eqiad.wmnet [20:06:35] 195 in the last hour anyway (searched "10.64.16.29") [20:06:46] (PS1) Jgreen: add redirect for blog.wikimedia.org for RT 8039, also clean up sloppy formatting [operations/puppet] - https://gerrit.wikimedia.org/r/150688 [20:07:55] connection dying. 
cya [20:08:19] (PS2) Reedy: Add export-0.9.xsd [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149643 (https://bugzilla.wikimedia.org/68686) [20:11:56] (CR) Jgreen: [C: 2 V: 2] add redirect for blog.wikimedia.org for RT 8039, also clean up sloppy formatting [operations/puppet] - https://gerrit.wikimedia.org/r/150688 (owner: Jgreen) [20:24:19] !log icinga back online again [20:24:24] Logged the message, Master [20:30:36] (CR) Peachey88: "iirc doesn't WP allow you to do custom post dates and urls to fix this in WP compared to having something we need to maintain in server fi" [operations/puppet] - https://gerrit.wikimedia.org/r/150688 (owner: Jgreen) [20:33:01] (PS1) BBlack: slow down default check_ganglia interval from 1 to 3 mins [operations/puppet] - https://gerrit.wikimedia.org/r/150694 [20:33:03] (PS1) BBlack: move some icinga temporaries to tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/150695 [20:34:18] (CR) Giuseppe Lavagetto: [C: 1] slow down default check_ganglia interval from 1 to 3 mins [operations/puppet] - https://gerrit.wikimedia.org/r/150694 (owner: BBlack) [20:34:38] (CR) BBlack: [C: 2] slow down default check_ganglia interval from 1 to 3 mins [operations/puppet] - https://gerrit.wikimedia.org/r/150694 (owner: BBlack) [20:34:51] (CR) BBlack: [C: 2] move some icinga temporaries to tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/150695 (owner: BBlack) [20:38:16] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Epic puppet fail [20:38:20] (CR) Giuseppe Lavagetto: "Please see my comments" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150695 (owner: BBlack) [20:38:35] <_joe_> bblack: ^^ [20:38:39] yeah that's me :) [20:38:52] <_joe_> no I was saying look at my comments [20:39:48] (PS3) Andrew Bogott: restrict access to puppet logs to root users [operations/puppet] - https://gerrit.wikimedia.org/r/150273 
(owner: Dzahn) [20:40:40] (CR) Andrew Bogott: "Less racy like this? The logfile should be created once the service starts..." [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn) [20:40:44] (CR) BBlack: "No, it's not defined anywhere yet. This is just mirroring my local changes to icinga.cfg on neon back into puppet for now, so that puppet" [operations/puppet] - https://gerrit.wikimedia.org/r/150695 (owner: BBlack) [20:46:19] (CR) Hashar: "Thanks for the puppetization. I will definitely use the script :-]" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150593 (owner: BryanDavis) [20:49:38] mutante: can you look at https://rt.wikimedia.org/Ticket/Display.html?id=8043 ? [20:49:51] maybe even hack that in on mw1053 first :) [20:53:23] AaronSchulz: He might not be around for a week or so... He said he was travelling to Wikimania (via Germany) [20:54:41] maybe RobH can look instead [20:56:34] mutante is traveling today [20:56:49] AaronSchulz: so, carrying the ops conversation into here for redis bloom [20:56:56] i think i'll rename them and get you both. so lets see [20:57:13] the redis servers are rdb, but this isnt the same right? [20:57:30] yeah, it's single purpose mostly [20:57:50] rbf1001-1002 (redis bloom filter) [20:57:59] i imagine these wont be sharing space with other stuff [20:58:02] yeah, I was going to suggest that ;) [20:58:03] in particular on these hosts [20:58:09] ok, i'm stealing back actinium [20:58:11] and renaming it [20:58:15] and reinstalling [20:58:32] AaronSchulz: I'll get the other one allocated and have both to you again tomorrow [20:58:34] sorry about that! 
[21:02:50] !log turned icinga email/sms back on [21:02:55] Logged the message, Master [21:04:02] !log Started populateBacklinkNamespace.php on wikidata and commons [21:04:08] Logged the message, Master [21:11:15] <_joe_> and now, we should ack all bogus alarm, or eliminate them [21:19:50] yeah we're still missing all our old downtimes/acks [21:20:29] (CR) Bsimmers: "Looks good, just a few comments." (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150506 (owner: Ori.livneh) [21:21:43] (PS1) RobH: seting rbf1001-1002 dns info [operations/dns] - https://gerrit.wikimedia.org/r/150701 [21:22:54] (PS1) BBlack: add more icinga tmpfs io offload [operations/puppet] - https://gerrit.wikimedia.org/r/150702 [21:24:21] bblack: https://rt.wikimedia.org/Ticket/Display.html?id=8043 [21:24:33] (Abandoned) RobH: seting rbf1001-1002 dns info [operations/dns] - https://gerrit.wikimedia.org/r/150701 (owner: RobH) [21:25:17] (PS1) RobH: seting dns for rbf1001/1002 [operations/dns] - https://gerrit.wikimedia.org/r/150703 [21:26:42] (CR) RobH: [C: 2] seting dns for rbf1001/1002 [operations/dns] - https://gerrit.wikimedia.org/r/150703 (owner: RobH) [21:27:27] (CR) BBlack: [C: 2 V: 2] add more icinga tmpfs io offload [operations/puppet] - https://gerrit.wikimedia.org/r/150702 (owner: BBlack) [21:30:54] (CR) coren: [C: 2] "Is correct." [operations/puppet] - https://gerrit.wikimedia.org/r/149316 (https://bugzilla.wikimedia.org/68545) (owner: Tim Landscheidt) [21:31:28] (PS1) RobH: rbf1001-1002 install params [operations/puppet] - https://gerrit.wikimedia.org/r/150706 [21:31:38] James_F: I'm going to be afk during this afternoon's SWAT, just FYI. [21:32:11] AaronSchulz: O [21:32:53] greg-g: No worries, I'll man the barricades. 
[21:33:06] James_F: :) [21:33:13] AaronSchulz: I'd say on a first look at it all, I'm ok with ptrace_scope=0, but it's probably the kind of thing we need to kick around in ops and make sure nobody objects first [21:33:33] AaronSchulz: https://www.kernel.org/doc/Documentation/security/Yama.txt <- good primer [21:34:16] (and we currently have ptrace_scope=1 because it's the Ubuntu default which we copied into our puppet sysctl stuff) [21:34:57] (O, will you SWAT with me? Will you scap the promised code? Will we build a broken server and deploy that muck today?) [21:35:05] Etc. [21:37:25] (CR) RobH: [C: 2] rbf1001-1002 install params [operations/puppet] - https://gerrit.wikimedia.org/r/150706 (owner: RobH) [21:40:37] (PS1) Aaron Schulz: Fixed "Undefined index: HTTP_X_FORWARDED_FOR" warning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150707 [21:40:40] (CR) Bsimmers: beta + hhvm: Add bt-hhvm dump script (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150593 (owner: BryanDavis) [21:41:03] bblack: maybe you can add that link to RT [21:44:16] (CR) BryanDavis: "ori: *poke* re Brett's comment." 
(1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150593 (owner: BryanDavis) [21:44:28] discussion of https://www.mediawiki.org/wiki/Requests_for_comment/CentralNotice_Caching_Overhaul_-_Frontend_Proxy in #wikimedia-office in 15 min [21:46:23] (PS1) Legoktm: Use https for extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150709 [21:48:01] (PS4) Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/149889 [21:48:23] (Abandoned) Ottomata: Add debian/kafka-common.dirs [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150579 (owner: Ottomata) [21:55:21] (PS5) Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/149889 [21:56:03] mutante, I think maybe your problem is your local commit hook is somehow out of sync with the canonical version? (extra line in commit footers) [21:58:31] jeremyb: he is out [21:58:49] traveling [21:59:04] matanya, so he'll get it eventually. maybe. or else i'll repeat it. :-) [21:59:12] but thanks [22:01:25] (PS6) Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/149889 [22:02:08] (CR) Ottomata: "Thanks! 
I believe I have resolved these issues over in:" [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/148287 (owner: Plucas) [22:04:26] (PS1) Spage: Enable job queue to process notifications on beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150710 [22:04:50] discussion of https://www.mediawiki.org/wiki/Requests_for_comment/CentralNotice_Caching_Overhaul_-_Frontend_Proxy in #wikimedia-office now [22:06:12] bblack: ^ may be interesting to you [22:06:36] ori: thanks [22:07:02] I've been looking at varnish stats and seeing how those banners affect us, it's definitely needed (the look/overhaul) [22:09:41] (PS1) Hedonil: wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:10:17] (PS1) Ottomata: Fix for debian/README.Debian [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150713 [22:10:55] (CR) Ottomata: "Alex, this is right, isn't it? debian branch is now 'master'?" 
[operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/150713 (owner: Ottomata) [22:11:38] (CR) jenkins-bot: [V: -1] wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil) [22:19:01] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures [22:22:34] (PS2) Hedonil: wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:23:06] (CR) Bsitu: [C: 2] Enable job queue to process notifications on beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150710 (owner: Spage) [22:23:14] (Merged) jenkins-bot: Enable job queue to process notifications on beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150710 (owner: Spage) [22:23:16] (CR) jenkins-bot: [V: -1] wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil) [22:25:45] (CR) Tim Landscheidt: "Also, "we*b*node.pp"." 
[operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil) [22:31:03] (PS3) Hedonil: wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:31:23] (PS28) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [22:31:40] (CR) jenkins-bot: [V: -1] wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil) [22:32:40] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/). [22:37:00] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:41:50] (PS4) Hedonil: wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:42:31] (CR) jenkins-bot: [V: -1] wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) (owner: Hedonil) [22:46:06] (PS5) Hedonil: wenode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:46:57] (PS29) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [22:51:12] (PS30) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [22:51:22] (PS6) Hedonil: webode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - 
https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [22:54:23] mutante: is it possible to change the admin of a list from sodium? [22:54:34] I'm dealing with a list that was run by mel, no longer at WMF [22:54:51] he's travelling :) [22:55:28] oops [22:55:38] andrewbogott, doesn't it involve making a new list administration password? [22:55:44] probably! How do I do that? [22:55:56] (PS31) Yuvipanda: Role + Module for Quarry Labs tool [operations/puppet] - https://gerrit.wikimedia.org/r/150425 [22:56:01] andrewbogott: chasemp ^ no more pip, I made debian packages for the things I needed :) [22:56:02] I am poring over https://wikitech.wikimedia.org/wiki/Mailman but don't see anything about that [22:56:46] YuviPanda|Bus: that was fast [22:56:49] andrewbogott: chasemp verified working as well [22:57:00] andrewbogott: heh :) Couldn't sleep in the bus because everyone else was snoring [22:57:01] Also, HOW LONG is your bus ride? [22:57:05] Are you traveling cross-country? [22:57:07] andrewbogott: about 9h [22:57:10] || how slow is the bus [22:57:11] eek [22:57:13] andrewbogott: no, next city. buses here are slow [22:57:14] andrewbogott, I guess it's supposed to be done by someone with the site password, rather than someone on the server? [22:57:18] about 350km [22:57:23] YuviPanda|Bus: the train is worse? [22:57:27] hmm, it's not 9h, only 7h [22:57:38] andrewbogott: no, but to book trains i've to deal with the govt's train booking website. [22:57:46] heh, ok :) [22:57:46] andrewbogott: I'd rather use IE6 for the rest of my life [22:58:01] You can't pay a travel agent $3 to do that for you? 
[22:58:10] andrewbogott: I can, but then I've to deal with a travel agent [22:58:25] andrewbogott: bus ticketing is much more painless :) [22:58:34] andrewbogott: also trains are only like an hour shorter [22:58:47] Maybe you can declare it to be work-related and get Leslie to do it :) [22:58:49] andrewbogott: and you usually have to book a week or two in advance [22:58:59] Oh, that's a minus. [22:59:18] Also, I'm envisioning Euro or North American trains which are nice to ride on. No idea if that's true for your route. [22:59:23] andrewbogott: heh, this is prep for wikimania :) flight is from the city I'm going to [22:59:43] andrewbogott: not at all, no. they're the worst experiences, at least in 'second class' (which is the normal thing) [22:59:44] Well, actually, Chinese, Malaysian, Thai, Zimbabwean… come to think of it I've never been on a train I didn't like. [22:59:55] heh [23:00:00] Clearly time for me to come to India and learn my lesson [23:00:04] andrewbogott: Indian trains are good if you travel first class or better [23:00:05] RoanKattouw, mwalker, ori, MaxSem, hoo: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140730T2300). [23:00:09] andrewbogott: indeed, highly reccomended [23:00:24] andrewbogott: also, CR? :D Clarity on pip is still needed, but for now... [23:00:37] why did it pick my name up? :D [23:00:44] I'll do it [23:00:50] but I can deploy my stuff myself, if needed [23:00:51] or that [23:00:55] YuviPanda|Bus: I'm on RT duty this week so have some access things to deal with, I'll try to CR sometime soon [23:01:02] andrewbogott: cool [23:01:08] andrewbogott, have you spoken to James Alexander about changing the list admin password? [23:01:12] andrewbogott: Oh yeah I should tell you what I did to wikitech yesterday [23:01:35] Krenair: um… no? Should I have? [23:01:39] chasemp: Coren just a poke about https://gerrit.wikimedia.org/r/#/c/150425/ again. 
I've made debian packages for the things I needed, and also gotten rid of the virtualenv :) [23:01:53] andrewbogott, https://bugzilla.wikimedia.org/show_bug.cgi?id=68680#c8 implies he can do it [23:02:07] andrewbogott: The VisualEditor checkout (especially the lib/ve submodule inside of it) was screwed up somehow, so I destroyed it and set it back up, but I had to fight it a bit before it worked. I think the submodule works properly now but I'm not sure [23:02:20] hoo, if you have dpeloyment access why https://gerrit.wikimedia.org/r/#/c/150597/ does not have accompanying submodule updates? [23:02:24] andrewbogott: Also, slot0/cache/l10n was all owned by root:root which was spamming the apache logs with warnings and errors [23:02:31] RoanKattouw: Ok, that's about what I got from the backscroll :) Mostly a result of my not doing --recursive [23:02:34] MaxSem: whoops, not linked [23:02:41] https://gerrit.wikimedia.org/r/#/c/150720/1 [23:02:45] So I chowned the entire dir and its contents to mwdeploy:mwdeploy (I think) [23:02:46] https://gerrit.wikimedia.org/r/#/c/150721/1 [23:02:48] RoanKattouw: oh, that's probably because the localization cache needed rebuilding [23:02:50] Whatever the equivalents in slot1 were owned by [23:03:01] andrewbogott: Yeah when you rebuild those, sudo -u mwdeploy [23:03:08] YuviPanda|Bus: I have to take off for a bit here sorry, if it's still around tomorrow I will try to look [23:03:36] chasemp: yeah, 'tis ok :) just a fyi. And I take back what I said about debs being hard :) [23:03:55] chasemp: thanks for the feedback earlier! :) [23:05:42] legoktm, yt? 
[23:05:47] yes [23:05:51] (CR) MaxSem: [C: 2] Use https for extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150709 (owner: Legoktm) [23:05:57] (Merged) jenkins-bot: Use https for extdist.wmflabs.org [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150709 (owner: Legoktm) [23:10:15] bsitu, by merging https://gerrit.wikimedia.org/r/#/c/150710/ you meant you wanted it deployed? [23:10:26] Is https://lists.wikimedia.org/mailman/admin supposed to have a self-signed cert? [23:10:29] MaxSem: yeah, go ahead [23:10:42] Because adding a security exception in my browser and then immediately typing the admin password… that doesn't feel great [23:10:50] andrewbogott: it doesnt [23:11:02] shows rapidssl geotrust to me. [23:11:04] bsitu, next time please don't merge yourself unless you're going to deploy it immediately [23:11:35] andrewbogott: i think something is funky with your browser man [23:11:39] RobH: Sorry, I don't know what that is -- something in my browser or a commandline thing? [23:11:43] Ok, lemme try in chrome [23:11:49] so in browser [23:11:57] (CR) MaxSem: [C: 2] Enable job queue to process notification in test/test2 wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/146849 (owner: Bsitu) [23:11:59] Hm, yeah, no complaints in chrome [23:12:01] you can click the little lock next to url to see cert info [23:12:13] MaxSem: sorry, I will have the deployer merge it next time [23:12:40] (Merged) jenkins-bot: Enable job queue to process notification in test/test2 wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/146849 (owner: Bsitu) [23:13:15] Hm, it looks right now. 
And I can't get it to complain again, of course [23:13:34] !log maxsem Synchronized wmf-config: (no message) (duration: 00m 05s) [23:13:37] * andrewbogott forges ahead [23:13:39] Logged the message, Master [23:13:49] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [23:13:55] bsitu and legoktm, please test your changes [23:14:21] MaxSem: the rest of my code isn't live yet, it's going out with wmf16. I just wanted the config ready beforehand [23:14:22] MaxSem: thx [23:14:54] legoktm, to make Reedy's life miserable when he does het deploy? XD [23:15:57] :P [23:17:06] legoktm: is this extdist? [23:17:16] YuviPanda|Bus: yeah, going live tomorrow :) [23:17:18] legoktm: if so, we should probably add a puppet patch that hides the IPs from the logs [23:17:27] James_F: Is CiteThisPage to go out with wmfNEXT tomorrow? [23:17:27] Wikimedia Platform operations, serious stuff | Log: http://bit.ly/wikisal | Channel logs: http://ur1.ca/edq22 | MediaWiki error counts: http://ur1.ca/edq1f | Requests: ops-requests@rt.wikimedia.org | On RT duty: andrewbogott | On Product duty: James_F [23:17:31] blarg [23:17:59] YuviPanda|Bus: oh, the proxy doesn't automatically do that? [23:18:07] legoktm: it sends XFF [23:19:34] YuviPanda|Bus: oh. submit a patch for it then? :) [23:19:45] legoktm: heh, will do when not on bus :P [23:19:50] legoktm: you can too! It's just nginx conf :P [23:20:02] Reedy: I think it's ready, but I'll be on a 'plane… wait 'til next time? [23:20:50] YuviPanda|Bus: oh, what do I need to change? [23:22:11] legoktm: google for 'remove nginx ip address log', and add those to the nginx template file in the extdist module? [23:22:32] ok [23:22:50] MaxSem: uh, did you want me to do the submodule bump for the GeSHi change or were you going to do it? [23:23:02] already did [23:23:14] wmf15 only? 
[23:23:16] yeah [23:23:29] I didn't feel like doing wmf14 since it's going away tomorrow [23:23:57] !log maxsem Synchronized php-1.24wmf14/extensions/Wikidata: (no message) (duration: 00m 11s) [23:24:01] Logged the message, Master [23:24:03] hoo, part 1 ^^^ [23:24:08] I scripted doing submodule bumps ;) [23:24:09] :) [23:24:14] is wmf14 broken too? [23:24:42] Reedy: GeSHI? yeah [23:26:56] !log maxsem Synchronized php-1.24wmf15/extensions/SyntaxHighlight_GeSHi/: (no message) (duration: 00m 05s) [23:27:02] Logged the message, Master [23:27:15] !log maxsem Synchronized php-1.24wmf15/extensions/Wikidata/: (no message) (duration: 00m 08s) [23:27:19] Logged the message, Master [23:27:31] !log maxsem Synchronized php-1.24wmf15/extensions/MwEmbedSupport/: (no message) (duration: 00m 03s) [23:27:38] Logged the message, Master [23:27:51] legoktm, hoo, bawolff ^^^ [23:27:55] please verify [23:28:35] MaxSem: works, thanks! [23:30:09] (PS7) Hedonil: webnode.pp: Install LUA support for lighttpd (lighttpd-mod-magnet) [operations/puppet] - https://gerrit.wikimedia.org/r/150712 (https://bugzilla.wikimedia.org/68614) [23:30:32] omfg, why labs uses lighty? :P [23:30:36] Hmm, old css still on site, but maybe it just takes a little while for RL cache to expire [23:30:57] It works fine when ?debug=true [23:31:21] bawolff, 5 minutes hasn't passed [23:31:44] also waiting for startup module update... [23:31:50] Oh right [23:32:48] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 00:00:00 UTC [23:33:26] should be now... [23:35:06] bawolff, and now? [23:35:19] ["wikibase.sites","1406762846"] [23:35:25] that's what I wanted to see :) [23:35:27] all fine [23:35:44] not sure why it doesn't actually purge away the old version, but that's another thing [23:35:52] MaxSem: Works. Thanks [23:36:01] whee:) [23:37:29] Ok, change seems to have also hit my browser now! 
All fine now :) [23:37:32] thanks [23:43:42] (CR) BBlack: [C: 1] "I noticed this during the natfix removal stuff, but never followed up on it, sorry! LGTM." [operations/puppet] - https://gerrit.wikimedia.org/r/150576 (owner: BryanDavis)