[00:15:07] !log imported jouncebot from github - https://gerrit.wikimedia.org/r/#/q/project:wikimedia/bots/jouncebot,n,z
[00:15:11] Logged the message, Master
[00:20:08] mwalker: https://gerrit.wikimedia.org/r/#/c/149201/1
[00:43:32] (PS4) Dzahn: wikimedia.ee - own zonefile and set external MX [operations/dns] - https://gerrit.wikimedia.org/r/148762
[00:50:10] mutante, if you have the permissions; we should probably add a 'Jouncebot' component to the Wikimedia bugzilla product
[00:52:24] (PS4) Dzahn: add index.html pages for various directories on dataset hosts [operations/puppet] - https://gerrit.wikimedia.org/r/144640 (owner: ArielGlenn)
[00:55:53] mwalker: i don't, andre__ wants tickets for component changes, but https://bugzilla.wikimedia.org/show_bug.cgi?id=68549
[00:56:14] hahah! https://bugzilla.wikimedia.org/show_bug.cgi?id=68548
[00:56:18] jinx!
[01:13:44] (PS5) Jeremyb: wikimedia.ee - own zonefile and set external MX [operations/dns] - https://gerrit.wikimedia.org/r/148762 (owner: Dzahn)
[01:17:56] (PS1) Ori.livneh: Disable LuaSandbox's profiling feature, to isolate bug 68413 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149211
[01:18:36] (CR) Ori.livneh: [C: 2] Disable LuaSandbox's profiling feature, to isolate bug 68413 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149211 (owner: Ori.livneh)
[01:18:40] (Merged) jenkins-bot: Disable LuaSandbox's profiling feature, to isolate bug 68413 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149211 (owner: Ori.livneh)
[01:20:22] the commit message subject is misleading -- the change above is scoped to labs
[01:21:57] !log ori Synchronized wmf-config/CommonSettings.php: Ic29ae11fa: On Labs, disable LuaSandbox's profiling feature to isolate bug 68413 (duration: 00m 04s)
[01:22:04] Logged the message, Master
[01:33:28] (CR) Ori.livneh: "..but it's okay, because destBuffer doesn't get used, in that case. Removing my -1" [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[01:35:41] (PS6) Jeremyb: wikimedia.ee - own zonefile and set external MX [operations/dns] - https://gerrit.wikimedia.org/r/148762 (owner: Dzahn)
[01:36:36] (CR) Ori.livneh: [C: 1] "Tested this by extracting the code into a self-standing program (https://gist.github.com/atdt/3296446ed6e854a44f08) and it appears to work" [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[01:37:14] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[01:53:08] (CR) Jeremyb: [C: 1] "the source that you had copied from (wikimedia.com) had been retabbed in the meantime. update to be closer to the new version." [operations/dns] - https://gerrit.wikimedia.org/r/148762 (owner: Dzahn)
[01:59:14] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:10:58] (PS1) Springle: Make config changes applied on labsdb1002 with SET GLOBAL permanent. [operations/puppet] - https://gerrit.wikimedia.org/r/149215
[02:14:09] (CR) Springle: [C: 2] Make config changes applied on labsdb1002 with SET GLOBAL permanent. [operations/puppet] - https://gerrit.wikimedia.org/r/149215 (owner: Springle)
[02:16:14] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[02:33:12] (PS1) Springle: Remove mysql_multi_instance from labsdb1002 after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/149218
[02:34:05] (CR) Springle: [C: 2] Remove mysql_multi_instance from labsdb1002 after migration. [operations/puppet] - https://gerrit.wikimedia.org/r/149218 (owner: Springle)
[02:48:20] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-25 02:47:17+00:00
[02:48:28] Logged the message, Master
[03:23:52] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:31:37] !log LocalisationUpdate completed (1.24wmf15) at 2014-07-25 03:30:33+00:00
[03:31:42] Logged the message, Master
[03:38:12] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[03:41:52] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:19:51] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 25 04:18:45 UTC 2014 (duration 18m 44s)
[04:19:57] Logged the message, Master
[05:02:55] PROBLEM - RAID on virt1009 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 2, Spare: 0
[05:13:33] (PS2) Chmarkine: Make lists.wikimedia.org HTTPS only [operations/puppet] - https://gerrit.wikimedia.org/r/145616 (https://bugzilla.wikimedia.org/68553) (owner: JanZerebecki)
[05:38:55] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[05:47:18] bblack, around? Will you have time in the next 10+ hrs to try switching a large partner to unified?
[05:47:29] i'm in russia, will be testing on the ground
[05:48:32] I'm somewhat around if you wanna ping me for easy reviews and merges, but not for deep stuff
[05:50:19] bblack, nothing in-depth, just need to be able to roll back quickly if it fails :)
[05:50:23] (PS1) Ori.livneh: $wgImageMagickTempDir: /a/magick-tmp => /tmp/magick-tmp [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149253
[05:50:46] when would be a good time? I'm boarding a plane right now, should be with a proper sim card in about 3-4 hours
[05:51:05] (CR) Ori.livneh: [C: 2] $wgImageMagickTempDir: /a/magick-tmp => /tmp/magick-tmp [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149253 (owner: Ori.livneh)
[05:51:09] (Merged) jenkins-bot: $wgImageMagickTempDir: /a/magick-tmp => /tmp/magick-tmp [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149253 (owner: Ori.livneh)
[05:51:32] then i will push a varnish patch and start testing
[05:52:03] if it fails, might have to roll back, since i can't do a php depl today
[05:52:07] bblack, ^
[05:52:23] yeah ok
[05:52:41] well, ideal would be like 8 hours from now
[05:52:49] I may or may not sleep sometime between now and then
[06:28:55] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:15] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:15] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:25] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:25] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:35] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:35] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:45] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:55] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:55] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:55] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:59] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:45:28] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:45:29] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:45:38] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:45:38] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:59] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:46:59] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:47:29] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:47:38] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:52:59] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:09:59] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:11:27] (Abandoned) Ori.livneh: wmflib: add safe_filename() [operations/puppet] - https://gerrit.wikimedia.org/r/148723 (owner: Ori.livneh)
[07:12:38] (Abandoned) Ori.livneh: wmflib: add apt_version() [operations/puppet] - https://gerrit.wikimedia.org/r/148512 (owner: Ori.livneh)
[07:20:54] anyone awake in here?
[07:21:39] akosiaris, are you around?
[07:21:58] I'm looking for a root to shutdown tantalum for me
[07:27:04] apergos, ?
[07:28:31] mwalker: shutdown or restart?
[07:28:48] just shut it down -- it's fighting with the "real" ocg servers
[07:28:53] ok
[07:29:01] thanks! :)
[07:29:23] !log shutdown tantalum per mwalker request
[07:29:29] Logged the message, Master
[07:29:53] :'( its been a fun ride tantalum, but the big boys have replaced you
[07:30:01] :)
[07:30:48] PROBLEM - Host tantalum is DOWN: PING CRITICAL - Packet loss = 100%
[07:36:33] https://integration.wikimedia.org/ci/job/cxserver-npm/235/console - no space? needs hashar :)
[07:39:48] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[07:53:44] hi :)
[07:53:58] mwalker: you know LVS is still pointed at tantalum and not ocg100[123] right?
[07:54:30] switch it over?
[07:54:38] yes please :)
[07:55:05] traffic should start moving shortly
[07:55:09] (it's not urgent though; nothing is actively using that fqdn unless I poke it with a script)
[07:59:28] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[08:00:22] (CR) Hashar: [C: 1] "Hopefully the scap target will still work :-]" [operations/puppet] - https://gerrit.wikimedia.org/r/146091 (owner: BBlack)
[08:02:39] (PS5) BBlack: beta::natfix removal step 2 [operations/puppet] - https://gerrit.wikimedia.org/r/146091
[08:03:36] (CR) jenkins-bot: [V: -1] beta::natfix removal step 2 [operations/puppet] - https://gerrit.wikimedia.org/r/146091 (owner: BBlack)
[08:04:34] (PS6) BBlack: beta::natfix removal step 2 [operations/puppet] - https://gerrit.wikimedia.org/r/146091
[08:04:46] yeah yeah I know jenkins :P
[08:05:36] (CR) BBlack: [C: 2] beta::natfix removal step 2 [operations/puppet] - https://gerrit.wikimedia.org/r/146091 (owner: BBlack)
[08:25:57] (PS1) Chmarkine: icinga-admin -- update cipher suite list to support PFS [operations/puppet] - https://gerrit.wikimedia.org/r/149267 (https://bugzilla.wikimedia.org/53259)
[08:31:28] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[08:53:31] (PS1) Mwalker: Allow world read/write + sticky on ocg tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/149274
[08:54:29] bblack, want to review and merge a puppet change for me? :D
[08:55:43] or maybe springle?
[08:56:07] * mwalker debates if it might not be better to just go to bed
[09:02:19] (PS1) Filippo Giunchedi: add swift account for search backups [operations/puppet] - https://gerrit.wikimedia.org/r/149276
[09:05:15] <_joe_> mwalker: if you need assistance, I'm here
[09:05:45] _joe_, I have a patch :) it fixes some rendering permissions issues: https://gerrit.wikimedia.org/r/149274
[09:06:39] <_joe_> ok I need to wrap my head around the whole problem
[09:06:48] <_joe_> what is the problem you're trying to solve?
[09:06:58] <_joe_> a tmpfs only writable by root?
[09:07:43] uh no; I have a process that runs in the ocg group that needs to write to the tmpfs
[09:08:00] right now the tmpfs is writeable only by root
[09:08:02] <_joe_> so, why not changing the dir permission to match the group?
[09:08:17] <_joe_> and why you need it to be 1777?
[09:08:34] <_joe_> 777 is a security liability
[09:08:39] <_joe_> most of the times
[09:08:57] 1777 is the default for /tmp -- I wasn't sure if I could have the tmpfs mount on a folder that wasn't owned by root so I didn't change the owner/group
[09:09:02] <_joe_> sorry to ask so many questions
[09:09:11] <_joe_> but I've never looked at this subsystem
[09:09:14] (CR) Filippo Giunchedi: [C: 2 V: 2] add swift account for search backups [operations/puppet] - https://gerrit.wikimedia.org/r/149276 (owner: Filippo Giunchedi)
[09:09:34] <_joe_> ok is it incredibly late in your TZ?
[09:09:43] heh; yep :) but that's not a problem
[09:09:50] <_joe_> if this is the case, I'll take a look at the patch, I may modify it and merge it
[09:09:58] <_joe_> I understood what's your need
[09:10:15] <_joe_> let me understand if there is a better way to do this - I suspect there is
[09:10:35] possibly!
[09:11:01] (maybe mounting it 1440 with ocg/ocg is possible -- and probably preferred)
[09:11:17] I can test in labs; I just took a shortcut
[09:11:20] * mwalker looks innocent
[09:11:36] <_joe_> :)
[09:11:40] <_joe_> don't worry
[09:11:53] <_joe_> I'll test this, the requirement is clear
[09:12:11] *er, that would need to be 1660 (I need read/write)
[09:12:14] <_joe_> btw
[09:12:14] awesome; thanks :)
[09:12:26] <_joe_> you can specify the mount permissions
[09:12:32] <_joe_> when mounting a tmpfs
[09:12:45] <_joe_> (reading mount(1) manual and related resources)
[09:15:44] <_joe_> (Also, I don't get why you need the sticky bit there, but it adds security so it's good in some respect)
[09:16:51] <_joe_> and, you have to set the owner of the tmpfs in mounting it
[09:17:26] (sticky bit was mostly if it was open to other users/groups)
[09:18:09] (PS1) QChris: Document database pool size parameters [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/149277
[09:18:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[09:19:19] (CR) QChris: [C: 1] "Looks good to me [1]." [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/149126 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[09:20:23] <_joe_> mwalker: btw, default mount mode for tmpfs is 1777
[09:20:40] <_joe_> so - where are you having problems at the moment?
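
The octal modes traded back and forth above (1777, 777, 1440, 1660) are easier to compare as ls-style strings. A minimal sketch, assuming only Python's standard stat module; the helper name describe_dir_mode is made up for illustration and is not part of any tool mentioned in the log:

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with this octal mode."""
    # OR in S_IFDIR so the leading character is 'd', as it would be for a mountpoint.
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    # The four modes mentioned in the conversation.
    for mode in (0o1777, 0o777, 0o1660, 0o1440):
        print(f"{mode:04o} -> {describe_dir_mode(mode)}")
```

In the output a lowercase t means sticky plus world-search and an uppercase T means sticky without world-search. Note that 1660 and 1440, taken literally, carry no search (x) bit at all, so a directory with either mode could not be entered by non-root users.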
[09:21:06] <_joe_> tmpfs is by default owned by root and in mode 1777
[09:21:28] I suspect puppet is changing the permissions due to the $file {tmpfs_mountpoint statement
[09:21:28] <_joe_> if you don't pass it -o mode=something
[09:21:36] but; if you want to see this on a real host; try ocg1001
[09:21:40] .eqiad.wmnet
[09:21:43] <_joe_> yes thanks
[09:22:18] <_joe_> the permissions of the mountpoint before mounting are irrelevant btw
[09:22:50] but can puppet change the permissions back after it's been mounted
[09:23:02] (CR) Filippo Giunchedi: [C: 1] contint: setup localhost.qunit vhost on lanthanum [operations/puppet] - https://gerrit.wikimedia.org/r/149105 (https://bugzilla.wikimedia.org/68529) (owner: Hashar)
[09:23:20] godog: we can get it merged in :-)
[09:23:31] since I am around
[09:24:09] <_joe_> mwalker: mmmh *very* strange
[09:24:22] <_joe_> ok, puppet is braindead
[09:24:57] hashar_: sure
[09:25:26] godog: I will force run puppet on the server and verify apache works fine
[09:25:52] <_joe_> mwalker: I refuse to think there is no way around this stupidity :|
[09:25:59] hah
[09:26:21] <_joe_> basically puppet is so idiotic it does not see the dir has become a mountpoint
[09:26:23] can you create a file in puppet without specifying the user/group/permissions (so therefore they're implicit and puppet doesn't change them every 20 minutes)
[09:26:35] <_joe_> mwalker: AFAIR no
[09:26:41] *facedesk*
[09:26:43] <_joe_> but lemme check
[09:27:22] <_joe_> http://docs.puppetlabs.com/references/latest/type.html#file-attribute-mode
[09:27:50] <_joe_> ok obviously this is not clear
[09:27:56] * _joe_ testing
[09:28:44] <_joe_> the real gold nugget is "When setting numeric permissions for directories, Puppet sets the search permission wherever the read permission is set"
[09:28:59] <_joe_> which is them assuming puppet users are morons
[09:29:47] (PS1) QChris: Enable Apache's headers module [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/149279
[09:30:36] hashar_: kk, ping me here when done
[09:30:57] godog: just need the patch to be merged in Gerrit and on palladium :]
[09:31:15] (CR) QChris: "In case you find this change when searching how to add" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/144761 (owner: Milimetric)
[09:32:10] (CR) Filippo Giunchedi: [C: 2 V: 2] contint: setup localhost.qunit vhost on lanthanum [operations/puppet] - https://gerrit.wikimedia.org/r/149105 (https://bugzilla.wikimedia.org/68529) (owner: Hashar)
[09:32:34] hashar_: we'll need another +1 from a human, _joe_ by any chance? :)
[09:32:59] <_joe_> mwalker: ok removing the mode line and fixing it by hand on the servers should be enough
[09:33:04] <_joe_> I did some tests
[09:33:09] shiny
[09:33:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[09:33:42] its ocg100[1-3] that will need to be scrubbed
[09:34:30] <_joe_> mwalker: we would just need to remount the tmpfs
[09:34:33] godog: I am sure it is harmless. Apache is not used on that server, it just runs tests :]
[09:34:39] <_joe_> mwalker: are those in production already?
[09:34:46] nope
[09:34:55] <_joe_> godog: 1 minute please :)
[09:34:57] godog: the change is there to enable the vhost we are using for testing. just be bold!
[09:35:48] (CR) QChris: [C: -1] "In addition to the inline comments, the wikimetrics" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[09:36:00] (CR) Giuseppe Lavagetto: [C: -1] "Your change will accomplish what you need but in the wrong way. Just remove the line assigning explicit permissions. Mount will then mount" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149274 (owner: Mwalker)
[09:36:12] <_joe_> mwalker: If you want to rest, I can take care of this
[09:36:38] <_joe_> godog: which change needs my attention? I have 50 of them :P
[09:36:38] my goal tonight is to get the servers rendering documents again! I shall not rest!
[09:36:46] <_joe_> ok ok
[09:36:51] <_joe_> I know the deal
[09:37:00] <_joe_> so, go on remove that line
[09:37:13] <_joe_> and in ~20 minutes they'll be up and running
[09:38:06] _joe_: https://gerrit.wikimedia.org/r/#/c/149105/ :D
[09:38:26] <_joe_> hashar_: taking a look - you're just moving things around basically
[09:38:29] adds some Apache vhost on lanthanum, the second jenkins slave in prod. Apache is unused right now, that will let us use it
[09:38:30] <_joe_> and changing permissions
[09:38:49] and puppet catalog compiler detects no change on gallium (the other server)
[09:38:53] I am sure it is going to work :-]
[09:38:57] <_joe_> not even permissions
[09:39:01] else I will write somewhere I owe you both a beer
[09:39:03] <_joe_> it's good for me
[09:39:10] (PS2) Mwalker: Allow world read/write + sticky on ocg tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/149274
[09:39:29] (CR) Giuseppe Lavagetto: [C: 1] "LGTM" [operations/puppet] - https://gerrit.wikimedia.org/r/149105 (https://bugzilla.wikimedia.org/68529) (owner: Hashar)
[09:39:49] (PS3) Giuseppe Lavagetto: Allow world read/write + sticky on ocg tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/149274 (owner: Mwalker)
[09:39:58] (CR) Giuseppe Lavagetto: [C: 2] Allow world read/write + sticky on ocg tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/149274 (owner: Mwalker)
[09:40:14] (CR) Giuseppe Lavagetto: [V: 2] Allow world read/write + sticky on ocg tmpfs [operations/puppet] - https://gerrit.wikimedia.org/r/149274 (owner: Mwalker)
[09:40:25] <_joe_> I hate you jenkins
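
The sentence _joe_ quotes from the Puppet file type reference ("Puppet sets the search permission wherever the read permission is set") describes a simple bit rule for directories. A minimal sketch of that rule as stated in the quote, not Puppet's actual implementation; the function name with_search_bits is hypothetical:

```python
def with_search_bits(mode: int) -> int:
    """Set the x (search) bit for user/group/other wherever the r bit is set.

    Illustrates the rule quoted from the Puppet docs for numeric directory
    modes; special bits such as sticky (0o1000) pass through unchanged.
    """
    for read_bit, search_bit in ((0o400, 0o100), (0o040, 0o010), (0o004, 0o001)):
        if mode & read_bit:
            mode |= search_bit
    return mode


if __name__ == "__main__":
    for mode in (0o1660, 0o640, 0o644):
        print(f"{mode:04o} -> {with_search_bits(mode):04o}")
```

Under this rule a directory declared with mode 1660 would end up as 1770, which is the kind of implicit adjustment being grumbled about above.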
[09:40:48] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[09:41:23] heh; I typically imagine jenkins as a crotchety old man hobbling around on a walker
[09:43:01] (PS3) Filippo Giunchedi: contint: setup localhost.qunit vhost on lanthanum [operations/puppet] - https://gerrit.wikimedia.org/r/149105 (https://bugzilla.wikimedia.org/68529) (owner: Hashar)
[09:43:13] (CR) Filippo Giunchedi: [C: 2 V: 2] contint: setup localhost.qunit vhost on lanthanum [operations/puppet] - https://gerrit.wikimedia.org/r/149105 (https://bugzilla.wikimedia.org/68529) (owner: Hashar)
[09:43:51] oops, _joe_ good to puppet-merge ?
[09:45:40] <_joe_> yes
[09:45:52] <_joe_> sorry
[09:46:50] ugh... "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/var/log/syslog] is already declared in file /etc/puppet/modules/ocg/manifests/init.pp:195; cannot redeclare at /etc/puppet/modules/base/manifests/init.pp:106 on node i-00000396.eqiad.wmflabs"
[09:46:55] I think I will just have to go to bed
[09:47:09] <_joe_> mwalker: ugh
[09:47:10] _joe_: kk merged both
[09:47:45] <_joe_> mwalker: I'm confirming how puppet works, I'm applying the change on all nodes
[09:47:50] <_joe_> 5 mins and we're ok
[09:48:11] *crossing fingers* things work in prod better than in beta
[09:48:31] godog: thanks :-)
[09:48:48] though I broke puppet now haha
[09:48:49] <_joe_> wow ocg1001 is not in salt, :(
[09:48:57] hashar: np
[09:49:16] _joe_, should be... I deploy OCG using trebuchet
[09:49:43] <_joe_> mwalker: ok it gave no output for test.ping
[09:50:10] <_joe_> mwalker: running puppet to confirm
[09:50:59] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 1 failures
[09:52:00] <_joe_> mwalker: it is fixed now
[09:52:02] (PS1) Hashar: contint: /srv/localhost should belong to jenkins-slave [operations/puppet] - https://gerrit.wikimedia.org/r/149281
[09:52:04] yaaaay
[09:52:07] thanks so much
[09:52:10] \O/
[09:52:16] * mwalker proceeds to torture test OCG some more
[09:52:35] <_joe_> glad to help
[09:53:37] godog: _joe_ : and some lame follow up to my previous patch https://gerrit.wikimedia.org/r/#/c/149281/ 'jenkins' user does not exist everywhere :-D
[09:54:12] (CR) Filippo Giunchedi: [C: 2 V: 2] contint: /srv/localhost should belong to jenkins-slave [operations/puppet] - https://gerrit.wikimedia.org/r/149281 (owner: Hashar)
[09:54:15] hashar: hahah okay
[09:54:31] done
[09:54:59] <_joe_> ok good
[09:55:03] <_joe_> back to hiera
[09:56:07] sorry :-(
[09:56:29] _joe_: does hiera bring support for puppet environments or is that unrelated?
[09:56:30] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[09:56:47] <_joe_> mmmh checking tin
[09:57:05] <_joe_> hashar: what do you mean with puppet environments?
[09:57:42] _joe_: apparently you can share nodes in different 'envs', much like the $::realm we are using
[09:57:48] and each env can come with its own set of manifests
[09:58:29] not hiera :-) forget me
[09:58:41] <_joe_> oh ok, yes, I thought we used that
[09:58:41] some basic doc at http://docs.puppetlabs.com/puppet/latest/reference/environments.html but that will divert you from working on hiera
[09:58:54] <_joe_> realm is basically our version of environment
[09:58:58] RECOVERY - puppet last run on lanthanum is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[09:59:03] <_joe_> we do that by ourselves :)
[09:59:19] <_joe_> hashar: I know what environments are
[09:59:45] <_joe_> directory environments, never used them
[10:00:15] I am wondering whether it could be used instead of puppetmaster::self for labs :D
[10:00:29] and have each labs project use its own git repo in addition of operations/puppet.
[10:00:32] but that is digressing
[10:00:46] (PS2) QChris: Use wikimetrics' new configurable pool sizes [operations/puppet] - https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[10:00:48] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:02:20] (CR) QChris: Use wikimetrics' new configurable pool sizes (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[10:03:56] (CR) QChris: "Since it seems we want to press this hard into production," (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[10:04:16] (PS1) Hashar: contint: qunit local vhost needs mod_rewrite [operations/puppet] - https://gerrit.wikimedia.org/r/149283
[10:06:38] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 2 MB (0% inode=99%):
[10:08:42] (CR) QChris: [C: -1] "Bumping the wikimetrics module is still missing." [operations/puppet] - https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: Milimetric)
[10:09:38] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53278 bytes in 0.452 second response time
[10:09:42] (CR) Hashar: "Puppet compilation: http://puppet-compiler.wmflabs.org/184/change/149283/html/" [operations/puppet] - https://gerrit.wikimedia.org/r/149283 (owner: Hashar)
[10:09:58] PROBLEM - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 64 MB (1% inode=99%):
[10:09:58] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 35 MB (0% inode=99%):
[10:13:35] (CR) Filippo Giunchedi: [C: 2 V: 2] contint: qunit local vhost needs mod_rewrite [operations/puppet] - https://gerrit.wikimedia.org/r/149283 (owner: Hashar)
[10:13:46] sorry that is messy :(
[10:14:20] hashar: not to worry :)
[10:14:52] the contint manifests are not very robust
[10:15:00] and there is a bunch of leftover tech debt :(
[10:21:44] as long as we are decreasing the debt I think we're fine :))
[10:26:47] yeah
[10:26:53] apache all happy on both servers \O/
[10:27:11] some paperwork / lunch and will be back later this afternoon
[10:33:09] I am off
[10:37:28] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[10:41:06] hashar: How do you trigger puppet-compiler in labs? And how does it know for which nodes to try identical/diff ?
[10:41:16] e.g. yours only ran for gallium and lanthanum
[10:41:38] <_joe_> Krinkle: what do you mean with 'in labs'?
[10:41:59] <_joe_> Krinkle: you want to run it against labs hosts? I guess that's not supported at the moment
[10:44:46] grrrit-wm: (CR) Hashar: "Puppet compilation: http://puppet-compiler.wmflabs.org/184/change/149283/html/" [operations/puppet] - https://gerrit.wikimedia.org/r/149283 (owner: Hashar)
[10:44:48] _joe_:
[10:45:13] That puppet-compiler thing runs in labs, it doesn't just run willy nilly for every puppet change simulating every production node we have
[10:45:14] it's selective
[10:46:57] !log integration-slave1001.eqiad.wmflabs is out of disk space ( / /dev/vda1)
[10:47:02] Logged the message, Master
[10:47:57] <_joe_> Krinkle: oh ok you want to run it
[10:47:59] <_joe_> sorry :)
[10:48:13] <_joe_> Krinkle: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/
[10:48:20] <_joe_> ->> build with params
[10:48:29] <_joe_> or look at the parameters of the last runs
[10:48:38] Well, no, I just want to know how it works, and perhaps tell someone to document it because neither individual runs nor the root dir http://puppet-compiler.wmflabs.org/ show any kind of information
[10:48:45] <_joe_> when you trigger the build, look at the console output
[10:49:13] <_joe_> Krinkle: I should be the one documenting it
[10:49:22] <_joe_> I was pretty sure I had updated wikitech
[10:49:38] Is it for detecting changes between puppet 2 and 3, or also for the effective diff within one puppet version before/after this change?
[10:50:00] s/this change/the given change/
[10:50:07] <_joe_> It is born for puppet 2 -> puppet 3, now we use it just for diffs between prod and a change
[10:50:13] cool
[10:50:30] <_joe_> it's not documented in that "website" as that is just a static collection of outputs :P
[10:50:47] <_joe_> It's intended only to be run through jenkins nowadays
[10:50:56] !log contint: manually cleared /tmp on the 3 labs jenkins slaves.
[10:51:00] <_joe_> or in vagrant if you prefer but that is untested
[10:51:01] Logged the message, Master
[10:51:14] Sure, but could make an index.html with a link to https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/ and/or a link to the instance project page on wikitech.
[10:52:07] there isn't a puppet-compiler wmflabs project, I guess it is puppet3-diff?
[10:52:07] https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs
[10:52:15] <_joe_> it is
[10:52:23] <_joe_> I may change the name of the project
[10:52:30] <_joe_> and I have to document it a little there
[10:52:55] <_joe_> but - I should just write a page on wikitech about how to use the compiler to test your changes
[10:53:12] * _joe_ adds to the endless queue of very important things to do
[10:53:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[10:56:44] _joe_: I added an initial description
[10:57:26] _joe_: Could you add me as member? One or more of its hosts are connected to Jenkins, and in order to perform general maintenance it's probably useful to be able to connect to the instance myself as well.
[10:57:44] <_joe_> Krinkle: of course I can
[10:57:49] <_joe_> that host needs some love btw
[10:57:53] <_joe_> I should work on it
[10:58:43] The only thing I expect to do at this point is dump my dotfiles template and be able in the future to possibly stop/start jenkins-slave connection if needed.
[11:00:34] hashar: Hm.. as of today all links on pages like https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/ have an underline. Makes the text a bit unreadable. Do you know anything about that and/or other updates?
[11:01:21] no clue
[11:01:55] Krinkle: jenkins has a plugin that loads https://integration.wikimedia.org/jenkins.css
[11:02:04] should be in integration/docroot.git
[11:02:11] as for why the underline suddenly appeared I have no idea
[11:02:59] I am off for lunch
[11:22:35] (CR) Giuseppe Lavagetto: "nitpick but apart from that, LGTM" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149066 (owner: Ori.livneh)
[11:25:09] <_joe_> lunch. bbl.
[11:25:46] (PS1) Reedy: Add export-0.8 to index.html [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149289 (https://bugzilla.wikimedia.org/68561)
[11:26:14] (CR) Reedy: [C: 2] Add export-0.8 to index.html [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149289 (https://bugzilla.wikimedia.org/68561) (owner: Reedy)
[11:26:18] (Merged) jenkins-bot: Add export-0.8 to index.html [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149289 (https://bugzilla.wikimedia.org/68561) (owner: Reedy)
[11:26:45] !log reedy Synchronized docroot and w: (no message) (duration: 00m 13s)
[11:26:51] Logged the message, Master
[11:36:58] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.014 second response time
[11:39:58] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.359 second response time
[11:41:48] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC
[11:59:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 2 failures
[12:11:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[12:51:54] bblack, around?
[12:54:00] (03PS3) 10Giuseppe Lavagetto: mediawiki/apache: use ports.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148098 [12:57:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [13:03:25] <_joe_> !log stopping puppet on all appservers - will reactivate after testing [13:03:31] Logged the message, Master [13:04:15] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki/apache: use ports.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/148098 (owner: 10Giuseppe Lavagetto) [13:08:59] <_joe_> !log re-enabling puppet, test run on the test host was fine. [13:09:04] Logged the message, Master [13:12:28] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:16:43] _joe_ Coren: you do know that puppet does not merge complex hierarchies on hiera ? [13:17:06] akosiaris: Yeah, I know it's first found. [13:17:49] Coren: kind of worse. IIRC it will not drill down to a complex hierarchy [13:18:07] complex == multilevel [13:18:13] for what I am saying [13:18:33] I stumbled across that with a friend of mine while playing with it [13:18:51] let me see if I can find the exact case [13:19:09] <_joe_> akosiaris: I know [13:19:38] <_joe_> akosiaris: the only way to create complex hierarchies it to create them in the backend directly, which kinda sucks [13:26:32] (03PS1) 10Ottomata: Install hive on hadoop workers to get hive-hcatalog packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/149308 [13:26:47] (03CR) 10Ottomata: [C: 032 V: 032] Install hive on hadoop workers to get hive-hcatalog packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/149308 (owner: 10Ottomata) [13:28:21] _joe_: (think of RUBY 1.8 scanning and parsing yaml files for every puppet compilation on the master, then cry). <-- priceless [13:29:34] oh hey, akosiaris, yt? [13:30:05] ottomata: yes [13:30:24] _joe_: matanya that got me thinking. 
Our new puppetmaster packages do not depend on ruby1.8 anymore [13:30:28] so, there was a use case a couple of weeks ago where it would have been nice to have a kafka-client package [13:30:38] which is really just the same package [13:30:43] but without the init.d script [13:30:45] we could switch to 1.9 ruby (I hope) [13:30:59] dan wanted to check the kafka logs for something [13:31:09] and i couldn't get him an easy way to do it [13:31:17] logs ? [13:31:18] without either giving him shell access to a broker [13:31:19] soryr [13:31:21] data [13:31:24] data in kafka [13:31:27] he wanted to consume on the cli [13:31:32] akosiaris: we can't we still have some .rb thing that where removed from 1.9 iirc [13:31:33] ok [13:31:40] so yeah, without giving him shell acecss to broker, or installing the kafka package on stat1002 [13:31:43] which is fine, iguess [13:31:48] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:31:55] but then there'd be the init script and default left at KAFKA_START=no [13:32:17] i think I tried to make a kafka client package before, a while ago [13:32:24] ottomata: sorry looking at the alert, gimme 5 [13:32:27] np [13:32:38] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.025 second response time [13:32:47] and you and faidon didn't like it...mainly because I made 'kafka' package have everythign and 'kafka-server' was just the init scripts [13:35:00] akosiaris: e.g modules/stdlib/spec/unit/puppet/parser/functions/range_spec.rb has to_a which was removed from 1.9 [13:35:33] matanya: we should anyway upgrade that module [13:35:39] not that it is going to be easy [13:35:44] agreed [13:35:57] rsync module has the same issue [13:37:18] and to all ops here: Happy sysadmin day! [13:38:08] ottomata: so I figured out nothing obvious from the alert ... sigh... 
[13:38:34] (03PS1) 10Yurik: Moved 250-99 to unified config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149310 [13:38:55] bblack, ^, but ping me when you merge it so that i can test it right away [13:38:58] anyway from what I see now the kafka package has everything inside (client/server/common code - about 99% of it) [13:39:35] If you do want it can be split up the to 3 binary packages (server, client, common) [13:39:38] aye [13:39:40] the source package will be one again [13:39:58] yes, source package one, could we just have two packages? [13:40:05] kafka-client kafka-server? [13:40:09] or, just kafka and kafka-server? [13:40:19] kafka would be evertying but init scripts [13:40:26] I don't think it would be wise [13:40:35] it would work but not clear [13:40:42] so common should have all the jars [13:40:43] ok, what would be in client and common? [13:40:44] ah [13:40:52] client the script you wrote [13:40:57] and server the init scripts [13:41:13] so basically 99% of the package will go into common [13:41:17] hm, ok, that's fine with me, just curious though, why is that more clear? [13:41:36] because my way kafka-client would have stuff that coudl be used for server too [13:41:37] ? [13:41:44] yes and vice versa [13:41:54] hm, ok sure [13:41:55] which one would depend on which ? [13:42:06] you answer that easily but still [13:42:25] plus it is already been done by other packages out there so let's follow the stream [13:42:35] but does it really add something ? [13:42:43] doing it like that I mean [13:42:48] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC [13:43:09] I get dan's problem and use case but he can just install the kafka package, can't he ? [13:43:24] it is not like he will get a daemon running [13:43:45] my point being - I don't think I have the time to do that now really [13:43:59] so feel free [13:44:14] ah! 
I forgot to turn puppet back on an27, that's my fault [13:44:16] (03PS1) 10Chmarkine: planet.wikimedia.org -- fix https redirects to http [operations/puppet] - 10https://gerrit.wikimedia.org/r/149311 (https://bugzilla.wikimedia.org/68554) [13:44:27] well, he can't, I can though [13:44:36] but it feels weird to puppetize it like that, [13:44:37] ottomata: btw I finished today the kafka reorganization I was telling you about [13:44:39] yeah, akosiaris i would do it [13:44:43] oh ok awesome [13:44:58] its kind of a big difference though [13:45:09] not codewise but I changed branch names etc [13:45:17] to conform to gbp's stuff [13:45:34] cool [13:45:47] ah, did you adopt the gbp convention of using master as debian branch? [13:45:51] yes [13:45:53] dawwww [13:45:56] i don't like that convention :/ [13:46:02] it works for kafka [13:46:07] neither did I [13:46:08] because their master is 'trunk [13:46:09] ' [13:46:28] but it conflicts with most other repos if you want to turn them in to gbp repos [13:46:46] yes because that is the wrong way to do it [13:47:07] howso? [13:47:15] turns out the best way is to not have local branches of the remotes [13:47:22] and have upstream for the upstream code [13:47:39] and tags? [13:47:49] upstream/ [13:47:55] debian/ [13:47:58] works nicely [13:48:00] hm, but then you have to maintain it all yourself! [13:48:11] you anyway have to [13:48:17] and it is not difficult [13:48:25] I have a readme ready [13:48:30] hm ok... [13:48:42] oh and debian/ is maintained by gbp [13:49:00] yes, i meant you have to maintain the upstream branch/tag mappings yourself [13:49:14] rathern than just git pull and git checkout......hmm or do you have to do that anyway too [13:49:27] i dunno, i mean, ooook!, it isn't a big deal either way [13:49:40] wait, so [13:49:41] master [13:49:43] has debian/ [13:49:44] only [13:49:48] and debian/ [13:49:51] not anymore [13:49:52] has tag + debian/ [13:49:53] ? 
[13:50:10] i mean, master has debian/ dir only [13:50:11] right? [13:50:13] no [13:50:15] no? [13:50:15] oh [13:50:17] it has everything [13:50:25] so you mostly just do [13:50:36] git tag upstream/ [13:50:45] git merge upstream/ [13:50:50] git-buildpackage -uc -us [13:50:58] (merge into master?) [13:51:01] yes [13:51:18] and you don't mess at all with the master of the upstream repo [13:51:29] hm, ok, so it builds from master, and auto creates the debian/ tag [13:51:30] right? [13:51:33] yes [13:51:35] is it a tag or branch? [13:51:47] whatever suits you [13:51:51] hm ok [13:51:52] but tag is best [13:52:03] s/best/better/ [13:52:13] aye, do you push those tags? [13:52:16] or just keep them locally? [13:52:16] yes [13:52:19] k [13:52:27] in the end in the repo there will only be [13:52:32] master branch and the tags [13:52:48] <^demon|gone> `git push --tags` is useful [13:53:00] ACKNOWLEDGEMENT - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Thu 24 Jul 2014 15:33:13 UTC ottomata Still messing with this, will turn it back on today. [13:53:08] and the master branch will always just have the previous code plus upstream/ [13:53:22] hmm, ok [13:54:06] (03CR) 10Physikerwelt: "Thank you Bryan. I'll try that." 
(033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [13:54:11] so the debian branch goes away plus all the upstream maintained branches like 0.7, 0.8.0-beta-candidate-1-oh-no,mysuperduperfeature etc [13:54:23] which is way clearer [13:54:24] aye [13:56:14] (03PS5) 10Physikerwelt: WIP: Draft for Mathoid role [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [13:57:54] (03CR) 10Alexandros Kosiaris: [C: 032] mysql_wmf: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148356 (owner: 10Matanya) [13:58:30] (03CR) 10Alexandros Kosiaris: [C: 032] memcached: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/148348 (owner: 10Matanya) [13:58:59] (03PS2) 10Ottomata: Set authorization_service_authorization_enabled for Oozie. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149112 [13:59:09] (03CR) 10Ottomata: [C: 032 V: 032] Set authorization_service_authorization_enabled for Oozie. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149112 (owner: 10Ottomata) [14:00:29] PROBLEM - puppet last run on analytics1027 is CRITICAL: CRITICAL: Puppet last ran 80803 seconds ago, expected 14400 [14:00:38] RECOVERY - Puppet freshness on analytics1027 is OK: puppet ran at Fri Jul 25 14:00:32 UTC 2014 [14:02:29] RECOVERY - puppet last run on analytics1027 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [14:04:28] for the mathoid developments I'm trying to test my puppet role. the manual on https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster says that I should use a precise instance. is that still up to date for new developments? 
[14:05:09] (03PS1) 10Ottomata: Uncomment resourcemanager_api_url in hue.ini [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/149312 [14:05:24] (03CR) 10Ottomata: [C: 032 V: 032] Uncomment resourcemanager_api_url in hue.ini [operations/puppet/cdh] - 10https://gerrit.wikimedia.org/r/149312 (owner: 10Ottomata) [14:06:22] (03PS1) 10Ottomata: Update for cdh module [operations/puppet] - 10https://gerrit.wikimedia.org/r/149313 [14:06:58] (03PS2) 10Ottomata: Update for cdh module [operations/puppet] - 10https://gerrit.wikimedia.org/r/149313 [14:07:06] (03CR) 10Ottomata: [C: 032 V: 032] Update for cdh module [operations/puppet] - 10https://gerrit.wikimedia.org/r/149313 (owner: 10Ottomata) [14:12:01] (03CR) 10Ottomata: [C: 032 V: 032] Document database pool size parameters [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/149277 (owner: 10QChris) [14:12:26] (03CR) 10Ottomata: [C: 032 V: 032] Make pool size configurable [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/149126 (https://bugzilla.wikimedia.org/68534) (owner: 10Milimetric) [14:12:50] (03PS2) 10Ottomata: Enable Apache's headers module [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/149279 (owner: 10QChris) [14:12:55] (03CR) 10Ottomata: [C: 032 V: 032] Enable Apache's headers module [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/149279 (owner: 10QChris) [14:14:30] (03PS3) 10Ottomata: Use wikimetrics' new configurable pool sizes [operations/puppet] - 10https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: 10Milimetric) [14:17:03] ottomata: done [14:17:14] there are btw 3-4 outstanding reviews for debian [14:17:27] (03PS4) 10Ottomata: Use wikimetrics' new configurable pool sizes [operations/puppet] - 10https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: 10Milimetric) [14:17:34] looking at them now. 
Will probably need to be merge back to master (those that make it [14:17:37] (03CR) 10Ottomata: [C: 032 V: 032] Use wikimetrics' new configurable pool sizes [operations/puppet] - 10https://gerrit.wikimedia.org/r/149127 (https://bugzilla.wikimedia.org/68534) (owner: 10Milimetric) [14:17:41] (03PS2) 10Giuseppe Lavagetto: apache: cherry-pick mods added in Ia46312071 [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 (owner: 10Ori.livneh) [14:17:43] (03PS3) 10Giuseppe Lavagetto: mediawiki: use mods-enabled, prepare for HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 [14:18:15] akosiaris: cool, did you push your readme change? [14:18:17] ottomata: btw building now is as easy as git clone and following the readme [14:18:27] yes. I updated it even more. It is clearer now [14:19:06] cool [14:19:13] <_joe_> akosiaris: although I feel like I'm mastering gerrit... how can I use it so that it plays well with git-buildpackage? [14:19:22] <_joe_> I guess you have all the answers [14:19:24] <_joe_> :) [14:19:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [14:20:06] <_joe_> this ^ is salt timing out [14:20:31] _joe_: have your changes contained in master and just a single tag [14:20:40] for every new version that is [14:21:10] and before sending the debian/* change for review you can bypass gerrit for the upstream patches [14:21:28] git push gerrit master:refs/heads/master [14:21:44] cause obviously those don't need review [14:22:13] +1 sounds like a good approach to me [14:22:47] godog: now kafka conforms to gbp's defaults :-) [14:23:10] so master branch and upstream/, debian/ tags [14:23:27] <_joe_> oh just tags and not branches [14:23:39] <_joe_> I wanted to use git import-orig [14:23:44] yes, it was an obscure advice from the author of gpb [14:23:55] cause the upstream is already on git [14:25:24] _joe_: gbp import-orig imports .tar.gz [14:25:41] it actually works better than already having an upstream git repo 
[14:25:51] <_joe_> and that works with tags named upstream? [14:26:02] <_joe_> I thought I needed an upstream branch [14:26:40] <_joe_> maybe I'm missing something [14:26:44] yes in that approach you do [14:26:55] <_joe_> ok [14:27:10] akosiaris: nice! thanks :)) [14:27:32] depends on the upstream-tree=tag or upstream-tree=branch settings _joe_ [14:27:41] <_joe_> oh ok [14:27:59] I also updated http://git.wikimedia.org/blob/operations%2Fdebs%2Fkafka.git/7e99141927ef7ce8d6cc2d9200eaa57d3d239ce9/debian%2FREADME.Debian [14:28:12] <_joe_> because I was about to move the git repo for HHVM to GH [14:28:19] <_joe_> out of frustration with gerrit [14:28:30] ahaha. yeah I get the problem [14:28:47] <_joe_> also, I really need to recreate it probably [14:29:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [14:29:49] <_joe_> also, won't using import-orig mess with the branches history? [14:29:59] <_joe_> ok I'll experiment [14:30:03] <_joe_> sigh [14:30:33] _joe_: will you base you work on tarballs from upstream or their repo ? [14:30:39] or have you* [14:30:46] that is the first question to answer [14:31:04] <_joe_> well, their repo [14:31:09] and the second one is "stable versions - tags" or "latest and greatest - branches ?" [14:31:19] with - are the answers :-) [14:31:22] <_joe_> the latter [14:31:50] <_joe_> we are at least one year away from using hhvm releases I guess [14:31:51] <_joe_> :) [14:31:55] ahahaha [14:32:42] <_joe_> ok, doing some more apache work then I may bug you shortly [14:32:58] ok, something tells me this is going to cost us some time :-) [14:33:01] akosiaris: mmhh ok so that catch about branches if upstream isn't git and tags if it is [14:39:20] (03CR) 10Alexandros Kosiaris: [C: 032] "Indeed. Thanks!" 
[operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148219 (owner: 10Plucas) [14:39:43] (03CR) 10Alexandros Kosiaris: [V: 032] Use log4j-1.2 instead of log4j-1.2.16 [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148219 (owner: 10Plucas) [14:40:19] <_joe_> let's assume I have tarball snapshots of their code [14:40:29] <_joe_> which is what I do with import-orig [14:42:45] (03PS1) 10Tim Landscheidt: Tools: Fix exim configuration for non-local addresses [operations/puppet] - 10https://gerrit.wikimedia.org/r/149316 (https://bugzilla.wikimedia.org/68545) [14:43:31] yurikMskRu: ping [14:43:41] bblack, hi [14:43:46] you ready? [14:43:48] go for it :) [14:44:04] but please be around in the next half an hour :) [14:44:17] just in case [14:44:26] (03PS2) 10BBlack: Moved 250-99 to unified config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149310 (owner: 10Yurik) [14:44:32] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Thanks!. Just a note for future patches. If you don't manually add reviewers there is quite a good chance we won't see your patches. Feel " [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148240 (owner: 10Plucas) [14:44:45] bblack, can you force it now so that we know exactly when it starts? [14:44:52] yeah [14:44:56] thx [14:45:13] (03CR) 10BBlack: [C: 032] Moved 250-99 to unified config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149310 (owner: 10Yurik) [14:49:10] bblack, in sync? [14:49:14] almost [14:49:24] salt is being a PITA [14:49:47] hehe [14:49:57] switching to tethering. sec. 
[14:50:41] yeah it's all deployed now [14:51:47] thx, checking [14:52:40] (03CR) 10Alexandros Kosiaris: [C: 032] "I will merge this for now but I will investigate if there is a better way to detect the correct JAVA_HOME instead of hardcoding it" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148227 (owner: 10Plucas) [14:52:48] (03CR) 10Alexandros Kosiaris: [V: 032] "I will merge this for now but I will investigate if there is a better way to detect the correct JAVA_HOME instead of hardcoding it" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148227 (owner: 10Plucas) [14:53:01] <_joe_> "detect java home" [14:53:12] <_joe_> I think the higgs boson is easier to detect [14:53:27] java's home is the dustbin of failed languages from the 20th century [14:53:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [14:54:00] (it shares a cardboard box with COBOL and ADA) [14:54:18] pascal [14:54:18] <_joe_> don't forget FORTRAN [14:54:39] <_joe_> mark: pascal does not have an all-capital name [14:57:07] bblack, seems to be ok, will be checking for a bit [14:57:07] thx [14:57:18] yurikR: when you get a chance can you screenshot what the russian carrier banner looks like in gif/js form? I'm curious but I don't want to go add myself to the net list. (or is there any east way with just fake headers?) [14:57:31] s/east/easy/ [14:59:06] bblack, http://ru.m.wikipedia.org/w/index.php?title=%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:ZeroRatedMobileAccess&zcmd=img-banner&zfile=1 [14:59:19] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I 'd rather we avoided bash and solve the problem in a better way." 
(038 comments) [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148287 (owner: 10Plucas) [14:59:29] ottomata: https://gerrit.wikimedia.org/r/#/c/148287/ [15:00:10] bblack, http://ru.m.wikipedia.org/w/index.php?title=%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:ZeroRatedMobileAccess&zcmd=img-banner&zfile=1&X-CS=250-99 [15:00:58] * marktraceur waves [15:01:06] Oh, it's Friday, there's no SWAT. [15:01:08] * marktraceur leaves [15:01:34] yurikR2: neat, thanks :) [15:02:34] marktraceur: gj [15:03:12] greg-g: I blame coffee [15:08:07] (03CR) 10Alexandros Kosiaris: "As a side note, we are in the process of changing the way we build the package to conform better to the gbp defaults and make it easier to" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/148287 (owner: 10Plucas) [15:13:25] (03PS4) 10Giuseppe Lavagetto: mediawiki: use mods-enabled, prepare for HAT [operations/puppet] - 10https://gerrit.wikimedia.org/r/148099 [15:13:32] mwalker|away: ocg100x are all running out of room on /mnt/tmpfs. is this normal and it manages its own space, or? [15:13:46] Jeff_Green: ^ [15:14:12] <_joe_> bblack: well, tmpfs mimicks a FS, which is not that promising... [15:14:55] _joe_: this is an explicit separate tmpfs for the app, that has a fixed size [15:15:18] <_joe_> I know [15:15:20] bblack: it's supposed to be self-managing [15:15:28] <_joe_> I helped mwalker this morning [15:15:30] afaik there's a cleanup job that runs [15:15:58] hmmm we might want to disable the icinga disk check for it then [15:16:21] well [15:16:28] if it's paging yes [15:16:58] otoh 5GB was a sort of trial size, we probably either need to increase that or make cleanup more aggressive [15:17:04] <_joe_> uhm, if the cleanup job can't keep up, what will happen then? [15:17:17] <_joe_> 5 GB of tmpfs is not enough? [15:17:28] what will happen? I'm guessing wildy: fail :-) [15:17:38] seems like it should be plenty doesn't it? [15:17:41] <_joe_> and btw. 
can't we be better off using the kernels cache? [15:18:07] _joe_: using it how exactly? just relying on kernel disk caching? or something more specific? [15:18:24] <_joe_> for nothing more specific [15:18:29] (03CR) 10Tim Landscheidt: "I think the commit message is a bit misleading. AFAICS, at the moment https for all of planet.wikimedia.org is redirected to http, and th" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149311 (https://bugzilla.wikimedia.org/68554) (owner: 10Chmarkine) [15:18:41] I think the tmpfs will still be faster since it's writing then reading these [15:18:55] we tested and it was dramatically faster [15:19:01] <_joe_> Or even, If you need an in-ram cache, why reinvent the wheel instead of using memcached or redis? [15:19:08] granted we didn't make any effort to tweak kernel cache behavior [15:19:27] arguably tmpfs is the wheel those things are reinventing :P [15:19:42] <_joe_> Jeff_Green: or I did misinterpret what this tmpfs is for [15:19:43] _joe_: afaik it's a matter of teaching third-party software not to use disk [15:19:59] <_joe_> oh so it's /tmp [15:20:02] right [15:20:09] <_joe_> which on trusty is in a tmpfs already [15:20:18] _joe_: I think it's for writing out the rendered pdf as it's created, then serving it out to the requestor [15:20:33] but in this case it's a separate tmpfs on /mnt/tmp [15:20:33] _joe_: i don't think that is the case [15:20:38] <_joe_> ok, I was guessing why we would need a separate tmpfs [15:20:41] although trusty does have a stock tmpfs partition [15:20:54] trusty only uses it for /run [15:21:10] <_joe_> seriously? 
[15:21:12] right, didn't think it was a good idea to overload that [15:21:30] so we carved out a specific block of RAM for just this service [15:21:43] easily configurable from the node definition though [15:21:47] way back in the 90s it seems like most distros had a ram-based /tmp, I donno what happened in the interim when everyone stopped doing that for a while [15:21:47] <_joe_> debian uses tmpfs for tmp since one centory [15:21:57] but I wish they would all get with the program [15:22:17] <_joe_> but - I see a serious problem with such a limitation [15:22:49] <_joe_> we should maybe use a real swap space on machines using tmpfs [15:24:11] if /tmp is tmpfs, yeah, maybe [15:24:15] _joe_: what do you mean? ramfs? [15:24:21] (and if it's unrestricted in size) [15:24:39] we could also just limit /tmp to a reasonable fraction of total RAM [15:24:54] <_joe_> bblack: I mean on ocg, we should make the tmpfs larger, and allow for it to swap in case [15:25:04] oh, yeah, we could [15:25:05] <_joe_> but that may backfire badly [15:25:08] but for now they have tons of headroom [15:25:11] _joe_: I think larger yes, but swap no [15:25:12] <_joe_> no sorry discard that [15:25:22] they're self-limiting to 5GB for now, and there's like another 50GB of unused ram on that box [15:25:30] <_joe_> Jeff_Green: I agree, I've just foreseen a negative-feedback effect [15:25:39] <_joe_> bblack: ok, [15:25:57] <_joe_> my questions were due to not knowing the use-case of that tmpfs [15:26:01] bblack: exactly. we want to understand the high water mark before we throw arbitrary amounts of RAM at it [15:26:11] <_joe_> we're going to use tmpfs for hhvm as well, I'm a fan of it [15:26:48] also I think we should understand what ocg does as it hits a filesystem limit [15:27:47] <_joe_> Jeff_Green: what do the third-party programs do when they exhaust disk space? 
[15:28:01] I'm sure they explode [15:28:09] i'm not entirely sure what we're using [15:28:11] <_joe_> (we should fallback to writing to disk upon failure) [15:28:21] ok, so we'll keep the icinga check then, just we need to size it right so that the cleanup job keeps up with normal rates [15:28:22] <_joe_> we should take a deeper look [15:28:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:28:39] it's all queue-consumer driven, I think it might just 'be ok' [15:28:42] <_joe_> the whole design seems thought for speed and not reliability tbh [15:28:47] (the cleanup job runs once a day and clears files older than 3-5 days depending on type) [15:28:55] <_joe_> EEW [15:29:04] <_joe_> "what could possibly go wrong" [15:29:23] _joe_: 'speed not reliability' imo we're premature to conclude that [15:29:27] well, lots in theory, but probably not much in practice if the numbers are tuned right [15:29:28] <_joe_> I think we need to followup on this with devs [15:29:33] maybe speed up the cron job to run more often though [15:30:04] <_joe_> bblack: the reason of my "EEW" was the fact it's not cleaning orphaned files, but only based on date [15:30:05] there's an awful lot from today in /mnt/tmpfs, so I think we've got our projections wrong [15:30:26] <_joe_> Jeff_Green: I think mwalker stress-tested the servers tbh [15:30:33] _joe_: I'm not sure what orphaned means in this context, but I suspect it's like a cache [15:30:43] <_joe_> but - that is what stress-testing should show [15:30:44] _joe_: he did, yes, he replayed a day's traffic against them afaik [15:30:45] the longer they hang around, they can be re-served for another pull of the same URL [15:31:00] <_joe_> oh so it's a cache in tmpfs? [15:31:08] <_joe_> local to each node? [15:31:12] (at least, that's what I hope and how I would've done it, I haven't looked!) 
[15:31:20] <_joe_> I thought that was used for scratch files during conversion [15:31:29] we shoudl move this discussion to pdfhack where the devs lurk [15:31:36] <_joe_> ok [15:31:41] well really we should just read their code first [15:31:44] <_joe_> what's the name of the chan? [15:31:50] and then we won't have 10,000 stupid questions :) [15:32:01] <_joe_> bblack: right! [15:32:26] (03CR) 10Dzahn: [C: 032 V: 032] icinga-admin -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/149267 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [15:32:48] <_joe_> .win 25 [15:34:23] (back on the earlier topic though: assuming it acts as a cache, we could leave in the date-based cleanup and then also run one on a faster job that does something like "If free_space < 10%, delete oldest files until it gets back to 15%"_ [15:34:42] i'm not sure it's intended as a cache [15:35:15] finished pdfs end up in /srv/deployment/ocg/output [15:35:20] ^ If it's not, why doesn't the job clean up for itself immediately, and why would we leave files for 3-5 days? [15:35:54] anyways, what was the pdfhack channel [15:36:18] my guess is #1 there's a bug that's resulting in job cleanup and #2 the cron job was added as a failsafe for this kind of bug [15:36:29] oh, I got confused, the 3-5 day cleanups *are* on /srv/deployment/ocg/... [15:36:31] and #3 the partition is undersized [15:36:43] not on the tmpfs [15:36:54] pretty sure there's something for the tmpfs partition too [15:37:32] nope [15:37:49] and the one for postmorten is using $output_dir, should probably be $postmortem_dir [15:41:03] oh heh [15:41:13] I was on an old review branch, switching... [15:42:06] (03PS1) 10BBlack: fix postmortem cleanup cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/149322 [15:43:56] (03CR) 10Qgil: "Are you still waiting on reviews here, or is there something else that needs to happen before this patch can be merged? 
Just curious about" [operations/debs/wikimedia-task-appserver] - 10https://gerrit.wikimedia.org/r/115135 (https://bugzilla.wikimedia.org/61090) (owner: 10Hashar) [15:45:23] ACKNOWLEDGEMENT - RAID on virt1009 is CRITICAL: CRITICAL: Active: 14, Working: 14, Failed: 2, Spare: 0 daniel_zahn created RT #8004 [15:46:16] Does someone know how to restart wm-bot? It's got a funny name in #mediawiki. [15:49:43] the page on wikitech has instructions iirc [15:51:03] marktraceur: forwarded to petan for now (we did restart it yesterday and the user it runs as changed apparently, but unsure how that gives it a funny name) [15:54:45] ACKNOWLEDGEMENT - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 35 MB (0% inode=99%): daniel_zahn RT #7969 and ongoing discussion [15:54:45] ACKNOWLEDGEMENT - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 88 MB (1% inode=99%): daniel_zahn RT #7969 and ongoing discussion [15:54:45] ACKNOWLEDGEMENT - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 3 MB (0% inode=99%): daniel_zahn RT #7969 and ongoing discussion [15:55:45] ^ I think the hosts were in downtime, but the fs got added after the downtime starts so those services weren't. sorry! [15:56:00] (icinga should inherit that down in those cases imho) [15:56:27] bblack: gonna blow your mind [15:56:54] ? [15:57:10] tidy { $postmortem_dir: age => 3m, type => mtime, recurse => true } [15:57:16] bam! [15:57:24] http://docs.puppetlabs.com/references/latest/type.html#tidy [15:58:04] yeah but that runs on every puppet run right? also the mtime check in that cron is 3d not 3m :) [15:58:15] details [15:58:26] not my forte, as you know [15:58:33] the last thing I want is one more reason for puppet to run slow :p [15:58:47] it already reminds me of autoconf [15:59:00] tidy { $postmortem_dir: age => 3d, type => mtime, recurse => true, schedule => daily } [15:59:09] bam! 
[15:59:24] http://docs.puppetlabs.com/references/latest/metaparameter.html#schedule [15:59:26] channeling Emeril today? :) [15:59:42] https://www.varnish-cache.org/docs/trunk/phk/autocrap.html [16:00:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 2 failures [16:01:36] have theo de raadt and phk had a public row? [16:01:42] that would be very entertaining [16:01:48] i imagine they must have crossed swords at some point [16:01:56] they must have [16:02:32] theo is actually a nice guy in person I hear. I used to work with a guy up in canada who hung out with him. [16:03:51] (03PS2) 10Chmarkine: planet.wikimedia.org -- fix https redirects to http [operations/puppet] - 10https://gerrit.wikimedia.org/r/149311 (https://bugzilla.wikimedia.org/68554) [16:07:10] (03PS1) 10Giuseppe Lavagetto: access: grant access to the mediawiki cluster for Brett Simmers [operations/puppet] - 10https://gerrit.wikimedia.org/r/149331 [16:09:35] something happening with toollabs? [16:10:09] hmm, seems just pathoschild's scripts are hanging [16:12:12] (03PS6) 10Physikerwelt: WIP: Draft for Mathoid role [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [16:14:01] (03PS7) 10Physikerwelt: WIP: Draft for Mathoid role [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [16:14:17] (03PS8) 10Physikerwelt: WIP: Draft for Mathoid role [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [16:14:31] (03PS1) 10BryanDavis: Revert "beta::natfix removal step 2" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149333 [16:16:40] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Ori is ok to merge this once he added the ssh key for brett." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149331 (owner: 10Giuseppe Lavagetto) [16:17:31] bblack, i think we should revert, i don't think opera mini is working right [16:17:42] ok, shall I? 
[16:18:07] yeah, go ahead [16:18:33] (03PS1) 10BBlack: Revert "Moved 250-99 to unified config" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149335 [16:18:43] (03PS2) 10BBlack: Revert "Moved 250-99 to unified config" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149335 [16:18:53] (03CR) 10BBlack: [C: 032 V: 032] Revert "Moved 250-99 to unified config" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149335 (owner: 10BBlack) [16:19:17] bblack, it seems opera mini ignores the "no script" command, nor does it execute javascript [16:19:23] [16:20:31] i will try to experiment with it over weekend to see if i can get it working some other way [16:22:08] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [16:22:19] yurikMskRu: ok [16:23:45] (03PS3) 10coren: Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [16:24:12] (03CR) 10coren: "cherry-picked and merged back onto HEAD" [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [16:24:22] (03CR) 10jenkins-bot: [V: 04-1] Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [16:28:18] !log Updating PageImages data for mainspace on Commons from terbium [16:28:24] Logged the message, Master [16:40:08] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:44:15] (03Abandoned) 10BryanDavis: Revert "beta::natfix removal step 2" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149333 (owner: 10BryanDavis) [16:50:48] PROBLEM - SSH on ms-be1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:50] ^d what plugin are you trying to deploy? [16:50:53] sorry [16:50:54] what repo? [16:50:58] the elasticsearchplugins repo? [16:51:08] PROBLEM - puppet disabled on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:51:08] PROBLEM - swift-container-updater on ms-be1010 is CRITICAL: Timeout while attempting connection [16:51:08] PROBLEM - swift-container-auditor on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:08] PROBLEM - swift-container-server on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:08] PROBLEM - swift-object-updater on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:09] PROBLEM - DPKG on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:09] PROBLEM - swift-container-replicator on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:18] PROBLEM - swift-account-auditor on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:18] PROBLEM - check if dhclient is running on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:28] PROBLEM - swift-account-reaper on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:38] PROBLEM - swift-object-auditor on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:38] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:48] PROBLEM - swift-object-replicator on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:48] PROBLEM - swift-account-replicator on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:48] PROBLEM - swift-account-server on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:48] PROBLEM - swift-object-server on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:48] PROBLEM - RAID on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:52:38] PROBLEM - Disk space on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:53:08] PROBLEM - check configured eth on ms-be1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:53:52] * marktraceur looks up [16:53:54] Expected? [16:59:06] !log powercycled ms-be1010 - unresponsive to ssh, nothing on mgmt [16:59:12] Logged the message, Master [16:59:21] marktraceur: dont think so [17:01:59] RECOVERY - swift-container-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:01:59] RECOVERY - swift-container-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [17:01:59] RECOVERY - check configured eth on ms-be1010 is OK: NRPE: Unable to read output [17:01:59] RECOVERY - swift-object-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [17:01:59] RECOVERY - swift-container-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [17:01:59] RECOVERY - DPKG on ms-be1010 is OK: All packages OK [17:01:59] RECOVERY - swift-container-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [17:02:08] RECOVERY - swift-account-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [17:02:08] RECOVERY - check if dhclient is running on ms-be1010 is OK: PROCS OK: 0 processes with command name dhclient [17:02:18] RECOVERY - swift-account-reaper on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [17:02:28] RECOVERY - Disk space on ms-be1010 is OK: DISK OK [17:02:28] RECOVERY - swift-object-auditor on ms-be1010 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [17:02:29] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 1550 seconds ago with 0 failures 
[17:02:38] RECOVERY - swift-account-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [17:02:38] RECOVERY - swift-object-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [17:02:38] RECOVERY - swift-object-server on ms-be1010 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [17:02:38] RECOVERY - swift-account-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [17:02:38] RECOVERY - RAID on ms-be1010 is OK: OK: optimal, 14 logical, 14 physical [17:02:39] RECOVERY - SSH on ms-be1010 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [17:08:37] (03CR) 10Ori.livneh: apache: cherry-pick mods added in Ia46312071 (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 (owner: 10Ori.livneh) [17:09:12] (03PS3) 10Ori.livneh: apache: cherry-pick mods added in Ia46312071 [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 [17:09:22] (03PS2) 10EBernhardson: Enable sandbox page for wikimania user testing [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149077 [17:11:43] (03CR) 10Giuseppe Lavagetto: apache: cherry-pick mods added in Ia46312071 (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 (owner: 10Ori.livneh) [17:12:37] (03CR) 10Giuseppe Lavagetto: [C: 031] "But please wait for me to merge this and the next change on monday - this is potentially terribly dangerous and we should not do this on f" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 (owner: 10Ori.livneh) [17:12:46] _joe_: np [17:12:58] <_joe_> ori: it's sysadmin day [17:13:07] i'm going to rename the config test resource to something clearer, like "pre_restart_config_test" or something [17:13:08] <_joe_> no dangerous releases on sysadmin day [17:13:10] <_joe_> :) [17:13:12] :) 
[17:13:14] happy sysadmin day [17:13:57] *and* it's a friday. Double reason. [17:14:01] _joe_: Friday and... yeah [17:14:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:16:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [17:18:01] i didn't even have to merge [17:18:01] (03PS1) 10Ori.livneh: apache2: rename apache2_test_config => apache2_test_config_and_restart [operations/puppet] - 10https://gerrit.wikimedia.org/r/149344 [17:18:10] i can cause crashes with my MIND [17:18:26] kidding. i don't know what the 5xx is. [17:19:03] there's a steady trickle of exceptions: https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&x=0.5&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception&gtype=stack&glegend=show&aggregate=1&embed=1 [17:22:10] (03PS4) 10Ori.livneh: apache: cherry-pick mods added in Ia46312071 [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 [17:22:38] (03CR) 10Ori.livneh: "(rebased on top of the apache2_test_config => apache2_test_config_and_restart rename)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149066 (owner: 10Ori.livneh) [17:29:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [17:40:45] (03CR) 10Dzahn: [C: 04-1] "root@sodium:~# ./check_mailman_queue" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146756 (owner: 10Matanya) [17:56:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [18:00:19] ^d: happy to help with your plugin deploy anytime [18:00:22] can show you how too [18:00:35] <^d> Oh cool, yeah we need to get that out [18:01:08] want to do now? [18:01:19] (unless it is risky?) [18:01:23] (it is friday afternoon here, btw) [18:01:54] <^d> Hmm, don't we have to restart the cluster to pick up the plugin? 
[18:02:43] probably, i could at least get you to the point where you could do the deploy on monday [18:02:51] git fat, etc. [18:03:20] <^d> Yeah, we can do that, it's low-risk. [18:03:27] <^d> Then we can do the restart when Nik's back Monday [18:04:01] k [18:04:32] ok so [18:04:36] clone this if you don't already ahve it [18:04:40] oh, you need git fat installed locally [18:05:01] https://wikitech.wikimedia.org/wiki/Archiva#Setting_up_git-fat_for_your_project [18:05:06] you only need to do the first line there [18:05:23] you can jsut grab the python script from here [18:05:23] https://github.com/wikimedia/operations-debs-git-fat [18:05:27] or install from .deb [18:05:28] http://apt.wikimedia.org/wikimedia/pool/main/g/git-fat/ [18:05:34] (if you are debian) [18:05:56] but ja [18:05:57] clone this [18:05:58] https://gerrit.wikimedia.org/r/#/admin/projects/operations/software/elasticsearch/plugins [18:06:57] ^d, is the .jar already in archiva.wikimedia.org? [18:07:05] <^d> Should be, let me double check [18:07:14] yup! [18:07:15] http://archiva.wikimedia.org/#artifact/org.wikimedia.elasticsearch.swift/swift-repository-plugin/0.4 [18:07:16] cool [18:07:51] <^d> Yep, 0.4 [18:07:54] ok cool [18:08:09] so, download that .jar from archiva into your clone of the elasticsearchplugins repo [18:08:16] i'm not sure how Nik organizes that repo [18:08:19] but maybe just make a directory [18:08:24] swift/ [18:08:30] swift-repository-plugin/ [18:08:31] dunno [18:08:31] whatever [18:08:36] put the .jar there [18:08:39] then [18:08:58] RECOVERY - Disk space on ocg1001 is OK: DISK OK [18:09:30] hmmm, i *think* git-fat needs to be on your path [18:09:37] hm, i'm not sure how this part works, let's find out! [18:09:38] (03PS1) 10Jgreen: increase /mnt/tmpfs to 32GB on ocg100* [operations/puppet] - 10https://gerrit.wikimedia.org/r/149357 [18:09:48] i've never done this with a repo that is already set up, i've only done it for myself before [18:09:49] <^d> Uh oh. [18:09:52] <^d> 404'd. 
[18:09:53] <^d> http://127.0.0.1:8080/repository/releases/org/wikimedia/elasticsearch/swift/swift-repository-plugin/0.4/swift-repository-plugin-0.4.jar [18:09:53] i think if git-fat is on your path [18:09:59] and you git-fat add the .jar... [18:10:01] hm [18:10:04] oh why 127?? [18:10:06] <^d> Oh 127.0.0.1 [18:10:09] <^d> stupid url is broken [18:10:46] HMM, ok that is a todo for fixing for me [18:11:06] <^d> Anyway, got it now [18:12:39] ok you got the .jar? [18:13:02] (03PS1) 10Chad: Adding swift-repository-plugin-0.4.jar via git-fat [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149358 [18:13:04] <^d> Wheee [18:13:12] ah ok, cool [18:13:19] <^d> Wait no. [18:13:21] <^d> That's a jar [18:13:28] ah, didn't work [18:13:28] yup [18:13:37] ok, so, this is the part i'm not sure about [18:13:42] let's figure it out so I can fix documentation [18:13:49] you have git-fat on your path? [18:14:18] <^d> Yeah, I think I see what I did wrong, lemme try again [18:14:21] ok.. [18:14:32] it should work if you git add anything-that-ends-in.jar [18:14:48] it should know to use git-fat, and add it based on file checksum [18:14:48] (03CR) 10Jgreen: [C: 032 V: 031] increase /mnt/tmpfs to 32GB on ocg100* [operations/puppet] - 10https://gerrit.wikimedia.org/r/149357 (owner: 10Jgreen) [18:14:51] hello [18:15:33] ah! [18:15:34] Now, initialize git-fat for your repository. This needs to be done for every clone of your project: [18:15:37] duh [18:15:37] git-fat init [18:15:38] ok [18:15:40] you do need to run that [18:15:44] ^d [18:15:50] <^d> Got it, yep. [18:15:53] <^d> That's what I forgot. [18:16:42] (03PS2) 10Chad: Adding swift-repository-plugin-0.4.jar via git-fat [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149358 [18:17:31] (03PS1) 10Aaron Schulz: [WIP] Added JSON version of jobrunner config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149362 [18:18:38] that looks good! 
[18:18:58] RECOVERY - Disk space on ocg1002 is OK: DISK OK [18:20:33] cool, ^d, and that is exactly the sha that is symlinked to that .jar in the git-fat rsync module [18:20:33] root@titanium:/var/lib/archiva/git-fat# ls -l 18472477e6dede4dbea26978f660af8f4d2783e4 [18:20:34] lrwxrwxrwx 1 archiva archiva 118 May 29 00:15 18472477e6dede4dbea26978f660af8f4d2783e4 -> ../repositories/releases/org/wikimedia/elasticsearch/swift/swift-repository-plugin/0.4/swift-repository-plugin-0.4.ja [18:20:49] so, when you log into tin and git-deploy that repository [18:20:57] that .jar should appear on all the deploy targets [18:23:21] <^d> Ok, I remember how to git-deploy. [18:23:50] (03CR) 10Chad: [C: 032 V: 032] Adding swift-repository-plugin-0.4.jar via git-fat [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149358 (owner: 10Chad) [18:23:52] (03PS1) 1020after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149364 [18:24:38] RECOVERY - Disk space on ocg1003 is OK: DISK OK [18:27:39] <^d> ottomata: It says some of the minions didn't complete the fetch. Is that ok? 
[18:27:50] <^d> 21/26 minions completed fetch [18:28:52] check which ones [18:28:56] i think salt doesn't forget old minions [18:29:09] you can get detailed status [18:30:18] <^d> elastic1009 is on the list [18:30:31] <^d> elastic1009.eqiad.wmnet: [18:30:31] <^d> fetch status: 0 [started: 0 mins ago, last-return: 19 mins ago] [18:30:31] <^d> mw133.pmtpa.wmnet: [18:30:33] <^d> fetch status: 0 [started: 324 mins ago, last-return: 324 mins ago] [18:30:35] <^d> testsearch1001.eqiad.wmnet: [18:30:37] <^d> fetch status: 0 [started: 324 mins ago, last-return: 324 mins ago] [18:30:39] <^d> testsearch1003.eqiad.wmnet: [18:30:41] <^d> fetch status: 0 [started: 324 mins ago, last-return: 324 mins ago] [18:30:43] <^d> mw134.pmtpa.wmnet: [18:30:45] <^d> fetch status: 0 [started: 324 mins ago, last-return: 324 mins ago] [18:31:15] (03CR) 10Hashar: [C: 04-1] "Unfortunately I don't think collection works on labs. Ops would confirm." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149364 (owner: 1020after4) [18:32:08] ^d, retry? [18:32:13] sometimes they take awhile? [18:32:19] all the others i think are old [18:32:34] <^d> Ah better, elastic1009 is ok. [18:32:52] <^d> Two in pmpta are silly and testsearch* were decom'd already [18:33:30] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:33:41] <^d> All done :) [18:37:28] cool! [18:37:41] check one of the boxes and make sure that the jar is there and is actually a jar [18:37:45] (and not a small .txt file) [18:42:10] (03PS2) 1020after4: collect host keys on deployment-bastion for beta environment. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/149364 [18:43:18] (03PS1) 10Ori.livneh: mediawiki::multimedia: stop managing /a/magick-tmp; provision fontconfig-config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149368 [18:48:32] (03PS2) 10Ori.livneh: access: grant access to the mediawiki cluster for Brett Simmers [operations/puppet] - 10https://gerrit.wikimedia.org/r/149331 (owner: 10Giuseppe Lavagetto) [18:48:43] (03PS3) 1020after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149364 [18:49:03] (03CR) 10Ori.livneh: [C: 032 V: 032] "Merging per explicit ok from Giuseppe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149331 (owner: 10Giuseppe Lavagetto) [18:49:25] (03CR) 10jenkins-bot: [V: 04-1] collect host keys on deployment-bastion for beta environment. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149364 (owner: 1020after4) [18:50:16] <^d> ottomata: Aww, I've got a text file. [18:51:13] grr [18:51:41] >:( [18:51:45] hm [18:51:50] that is weird [18:51:52] checking... [18:53:52] (03PS1) 10Mwalker: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 [18:55:04] ^d, hm [18:55:05] dunno [18:55:08] i just did another deploy [18:55:10] this time i think it worked... [18:55:13] i didn't do anything special [18:55:30] <^d> Yep, seems ok this time. [18:55:33] <^d> Weird. [18:55:38] <^d> Thx! [18:56:07] (03CR) 10Dzahn: "which other role causes the conflict in labs?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [18:56:30] (03PS1) 10Greg Grossmeier: Add my ssh key for my x230 laptop [operations/puppet] - 10https://gerrit.wikimedia.org/r/149378 [18:57:07] (03CR) 10Dzahn: "ah, is it time to get x220's replaced ?:) want one" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149378 (owner: 10Greg Grossmeier) [18:57:20] (03CR) 10Greg Grossmeier: "Followed the syntax from Tim and domas's." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149378 (owner: 10Greg Grossmeier) [18:57:28] greg-g, I love my x230 [18:57:32] I am sad that I have to give it up [18:57:50] mwalker: I love my x200s better, but it's dang slow and can't even run vagrant [18:58:04] (03CR) 10Mwalker: "I don't actually know... but if you look at the deployment-pdf01.eqiad.wmflabs instance; it complains that:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [18:58:41] hm [18:58:48] sorry for the ping [18:58:50] :) [18:58:52] =) [18:59:11] (03PS2) 10Mwalker: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 [18:59:16] there were two ways of doing it in that yaml file, I chose to copy you, domas :) [18:59:28] I don't think I ever saw that file! [18:59:33] I got grandfathered in [18:59:42] yeah, so it must be right then :) [19:00:04] greg-g: got a couple minutes ? :-D [19:00:07] sure [19:00:10] private [19:00:12] kk [19:01:38] (03CR) 10Dzahn: "aha, so that is because manifests/role/labs.pp uses class { 'base::syslogs'. i suggest we work around this by putting it back in the role " [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [19:04:13] mwalker: oh wait, you have a docking station for that thing? [19:04:18] mwalker: is it WMF's? [19:04:20] :) [19:04:22] greg-g, yes it is [19:04:25] going to steal it? 
[19:04:35] I think I might :) [19:04:39] good call [19:04:44] I love my three monitor setup [19:04:48] I don't know of anyone else who has an x230 [19:06:51] (03PS1) 10Mwalker: New packages and apparmor entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/149380 [19:07:47] (03PS1) 10Ori.livneh: osmium: include admin w/group => deployment, per discussion with joe [operations/puppet] - 10https://gerrit.wikimedia.org/r/149381 [19:08:03] (03Abandoned) 10Mwalker: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [19:08:14] (03PS2) 10Ori.livneh: osmium: include admin w/group => deployment, per discussion with joe [operations/puppet] - 10https://gerrit.wikimedia.org/r/149381 [19:08:18] mwalker: mind if i amend ? [19:08:27] arr, abandoned.? [19:08:30] (03CR) 10Ori.livneh: [C: 032 V: 032] osmium: include admin w/group => deployment, per discussion with joe [operations/puppet] - 10https://gerrit.wikimedia.org/r/149381 (owner: 10Ori.livneh) [19:08:34] mutante, ooops :p [19:08:56] (03Restored) 10Dzahn: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [19:09:17] mutante, all yours :) [19:09:18] more restores:) [19:09:24] mwalker: email sent with subject "The poaching of Matt Walker's x230 docking station" [19:10:22] mwalker: (not to you, to techsupport@) [19:10:31] *thumbs up* [19:10:49] even with the docking station, I'm still going to miss you [19:11:18] this isn't a one way street; I'm going to miss all y'all too [19:11:18] (03PS3) 10Dzahn: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [19:11:21] fun times have been had [19:13:29] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [19:23:56] **** *** Jenkinsi! 
[19:24:03] !log Jenkins stalled again yeahhhhh [19:24:08] Logged the message, Master [19:25:05] !log zuul@gallium:/etc/zuul/wikimedia$ echo status|nc -q 3 localhost 4730|wc -l ... Yields: 0 . Which mean jobs are no more registered for some reason. [19:25:10] Logged the message, Master [19:28:18] !log Jenkins : disabling gearman plugin and reenabling it (just uncheck/save/check a box in https://integration.wikimedia.org/ci/configure ) [19:28:24] Logged the message, Master [19:30:51] (03PS2) 10Hashar: New packages and apparmor entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/149380 (owner: 10Mwalker) [19:31:11] (03PS4) 10Dzahn: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [19:32:03] (03CR) 10Andrew Bogott: swift: monitor object/container availability (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149019 (owner: 10Filippo Giunchedi) [19:43:59] mutante: I don't need an RT for that ssh key add, do I? [19:48:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:54:02] greg-g: i .. 
don't know :) can you do one of the "put on office wiki, gpg sign it" things [19:55:09] <_joe_> greg-g: you may edit the commit yourself I guess [19:59:16] mutante: I submitted the patch, I thought that was good enough :P [20:00:42] (03PS2) 10Greg Grossmeier: Add my ssh key for my x230 laptop [operations/puppet] - 10https://gerrit.wikimedia.org/r/149378 [20:05:20] (03PS2) 10Ori.livneh: Increased basic job runners by 1 per server [operations/puppet] - 10https://gerrit.wikimedia.org/r/149179 (owner: 10Aaron Schulz) [20:05:31] (03CR) 10Ori.livneh: [C: 032 V: 032] Increased basic job runners by 1 per server [operations/puppet] - 10https://gerrit.wikimedia.org/r/149179 (owner: 10Aaron Schulz) [20:13:32] Jeff_Green, can you take a look at https://gerrit.wikimedia.org/r/#/c/149380/ please :) [20:13:51] looking [20:14:56] (03CR) 10Jgreen: [C: 032 V: 031] New packages and apparmor entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/149380 (owner: 10Mwalker) [20:15:13] akosiaris: still around? [20:15:27] q about your kafka makefiles [20:15:30] mwalker: deployed [20:15:33] not sure how the target jars actually get installed.... [20:15:55] Jeff_Green, awesome thanks [20:16:06] cmjohnson1: around? wanna approve https://gerrit.wikimedia.org/r/149378 ? 
:) [20:16:11] RT duty person :) [20:16:17] sure give me a sec [20:16:29] no worries [20:18:27] (03CR) 10Dzahn: [C: 032] Add my ssh key for my x230 laptop [operations/puppet] - 10https://gerrit.wikimedia.org/r/149378 (owner: 10Greg Grossmeier) [20:19:06] cmjohnson1: nvm, mutante got it :) [20:24:49] greg-g: ...added on bast1001 just now [20:24:56] thanks sir [20:26:35] (03PS5) 10Dzahn: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [20:31:04] mutante: confirmed working [20:31:14] greg-g: +1 [20:32:10] mutante: thanks was on the phone [20:46:35] (03PS1) 10Dzahn: add parameter for logfile names to base::syslogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149396 [20:47:45] (03CR) 10Mwalker: [C: 031] add parameter for logfile names to base::syslogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149396 (owner: 10Dzahn) [20:49:08] and I am off [20:49:22] enjoy your friday evening folks and have a good week-end [20:50:52] hashar: enjoy yours too [20:51:37] hashar: g'night [20:52:40] * YuviPanda goes off to enjoy his weekend [21:02:19] (03PS1) 10Jalexander: Allow crats to add/remove petitiondata group on foundationWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149398 (https://bugzilla.wikimedia.org/68587) [21:03:23] (03CR) 10John F. Lewis: [C: 031] Allow crats to add/remove petitiondata group on foundationWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149398 (https://bugzilla.wikimedia.org/68587) (owner: 10Jalexander) [21:08:58] It seems the certificate on https://status.wikimedia.org/ is invalid. [21:10:18] <_joe_> Rastus_Vernon: that's because it's externally hosted - but they shouldn't misconfigure https anyways [21:11:00] Seems it's only valid for status.io.watchmouse.com, www.status.io.watchmouse.com, api.io.watchmouse.com and proxy.io.watchmouse.com. [21:13:30] Rastus_Vernon: yea, we can't really fix it. 
status.wikimedia.org is an alias for status.watchmouse.com. [21:14:19] not sure if the "io" part is new [21:58:08] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [22:03:42] (03PS1) 10Andrew Bogott: Add a wikitech cron to send echo emails. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149459 [22:03:58] legoktm: ^ [22:05:24] (03CR) 10Dzahn: [C: 032] add parameter for logfile names to base::syslogs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149396 (owner: 10Dzahn) [22:08:04] (03CR) 10Legoktm: [C: 031] Add a wikitech cron to send echo emails. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149459 (owner: 10Andrew Bogott) [22:09:15] (03PS6) 10Dzahn: All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [22:10:16] (03CR) 10Dzahn: [C: 031] "since Change-Id: I5e4efa223b6caf235 it can now take a parameter to make ocg.log readable" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [22:13:19] (03CR) 10Mwalker: [C: 031] All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [22:15:03] (03CR) 10Dzahn: [C: 032] All OCG servers should have syslog readable [operations/puppet] - 10https://gerrit.wikimedia.org/r/149376 (owner: 10Mwalker) [22:16:10] mwalker: dang! [22:16:19] didn't work? [22:16:25] Duplicate declaration: File[/var/log/ocg] is already declared... [22:16:47] reasons I hate puppet ^ right there [22:16:53] ocg/manifests/init.pp:190 vs. 
base/manifests/init.pp:107 [22:17:20] (03PS1) 10Ori.livneh: Add additional SSH key for Brett Simmers, per his request [operations/puppet] - 10https://gerrit.wikimedia.org/r/149462 [22:19:20] mwalker: so, the ocg module defines /var/log/ocg as a directory [22:19:38] but now we wanted /var/log/ocg to be a file [22:19:42] the logfile itself [22:20:12] the syslogs class expects all files to be just inside /var/log/ [22:20:22] right; forgot that bit [22:20:45] also, i forgot the .log suffix ?! [22:23:14] (03PS1) 10Mwalker: Log rotate the correct ocg file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149463 [22:23:24] mutante, you'll also need ^ [22:24:06] (03CR) 10Andrew Bogott: [C: 032] Add a wikitech cron to send echo emails. [operations/puppet] - 10https://gerrit.wikimedia.org/r/149459 (owner: 10Andrew Bogott) [22:24:51] (03PS1) 10Dzahn: ocg - fix logfile name and remove logdir [operations/puppet] - 10https://gerrit.wikimedia.org/r/149464 [22:24:59] mwalker: and that ^ [22:28:03] (03PS2) 10Mwalker: ocg - fix logfile name and remove logdir [operations/puppet] - 10https://gerrit.wikimedia.org/r/149464 (owner: 10Dzahn) [22:29:18] (03CR) 10Dzahn: [C: 032] ocg - fix logfile name and remove logdir [operations/puppet] - 10https://gerrit.wikimedia.org/r/149464 (owner: 10Dzahn) [22:31:53] mwalker: success [22:32:02] Notice: /Stage[main]/Base::Syslogs/Base::Syslogs::Syslogs::Readable[ocg.log]/File[/var/log/ocg.log]/mode: mode changed '0640' to '0644' [22:32:22] ile[/etc/rsyslog.d/20-ocg.conf]: Scheduling refresh of Service[rsyslog] [22:33:59] (03CR) 10Rush: [C: 032 V: 032] "special case but confirmed by brett putting his key in his home dir, owned by him, on bast1001 that matches this." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149462 (owner: 10Ori.livneh) [22:41:49] !log ocg - deleted old log dirs [22:41:55] Logged the message, Master [22:42:08] mutante, if you're bored; I have some additional ocg puppet work... 
[22:43:36] actually not bored, have to get one more thing done before weekend [22:43:43] in gerrit you mean? [22:46:25] (03PS9) 10Mwalker: WIP: Draft for Mathoid role [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [22:47:41] mutante, nah; it's for puppet randomness (like why ganglia isn't working) [22:47:45] so go enjoy your weekend :) [22:52:27] !log Bugzilla - upgraded to 4.4.5 [22:52:34] Logged the message, Master [22:52:49] mwalker: ^ that :) [22:53:02] woot! [22:53:04] *high five* [22:53:58] mwalker: will there be two links or how will the new pdf renderer be invoked? [22:54:40] greg-g, http://en.wikipedia.beta.wmflabs.org/wiki/Special:Book [22:54:47] in the download area there is a dropdown [22:55:07] kk, I'll reply to the -ambassadors thread [22:55:17] they'll have to select the "e-book (PDF, ocg latex renderer)" option [22:55:27] * greg-g nods [22:56:38] oh, but will the two links on the left be on eg Wikipedia? [22:56:38] Download as PDF [22:56:39] Download as WMF PDF [22:56:44] mwalker: ^ [22:57:30] hmm... 
I can add a link like Downlaod as WMF PDF (Beta) [22:57:37] but I wasn't initially planning on it [22:57:39] oh, just curious [22:57:45] *nods* [22:57:49] no, don't do it, Nemo would complain :) [22:57:55] heh [22:58:06] especially because it isn't very localizable [22:58:14] :) [23:00:08] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [23:03:30] (03CR) 10Mwalker: WIP: Draft for Mathoid role (035 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [23:06:39] (03PS1) 10Ori.livneh: Change bsimmer's shell to zsh [operations/puppet] - 10https://gerrit.wikimedia.org/r/149471 [23:06:49] hipsters [23:07:21] (03CR) 10Ori.livneh: [C: 032 V: 032] "(trivial)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149471 (owner: 10Ori.livneh) [23:36:08] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [23:40:48] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 25 Jul 2014 21:40:06 UTC