[00:21:19] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:39:19] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[02:00:29] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 00:00:02 UTC
[02:16:04] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-28 02:15:00+00:00
[02:16:14] Logged the message, Master
[02:26:38] !log LocalisationUpdate completed (1.24wmf15) at 2014-07-28 02:25:34+00:00
[02:26:44] Logged the message, Master
[02:40:09] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon Jul 28 02:40:04 UTC 2014
[03:00:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 28 02:59:35 UTC 2014 (duration 59m 34s)
[03:00:47] Logged the message, Master
[03:26:10] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:26:59] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.004 second response time
[03:32:04] (CR) Vogone: "Undelete only covers viewing deleted page revisions through Special:Undelete. What it does not cover are actions through Special:RevisionD" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149637 (https://bugzilla.wikimedia.org/68612) (owner: Withoutaname)
[03:58:29] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 01:57:28 UTC
[04:26:56] (PS2) Ori.livneh: wmflib: add ordered_yaml() [operations/puppet] - https://gerrit.wikimedia.org/r/149775
[05:16:49] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jul 28 05:16:43 UTC 2014
[05:40:35] (PS2) Ori.livneh: wmflib: add ensure_service() [operations/puppet] - https://gerrit.wikimedia.org/r/149778
[05:40:37] (PS1) Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - https://gerrit.wikimedia.org/r/149800
[06:06:23] <_joe_> morning
[06:06:49] <_joe_> ori: in stdlib there is ensure_resource
[06:07:10] <_joe_> oh ok nevermind
[06:07:12] <_joe_> :)
[06:07:49] hey, morning
[06:08:09] i'm just reading over aaron's patch to dispatch jobs via http requests
[06:08:24] he sort of cheated :P
[06:08:38] instead of shelling out to php...... it shells out to curl
[06:08:40] <_joe_> I love cheating as long as it yields results
[06:08:45] <_joe_> ewww
[06:09:02] yeah, i'm not a fan
[06:09:03] <_joe_> I thought he was using fastcgi
[06:09:07] <_joe_> and not http
[06:09:09] well "he is"
[06:09:17] <_joe_> :)
[06:09:18] because http -> apache -> fastcgi
[06:09:22] <_joe_> yes, ok :)
[06:09:29] but yeah, not the way to go imo
[06:09:48] https://gist.github.com/wofeiwo/3720207 looks nice
[06:09:52] <_joe_> what change are you talking about?
[06:09:55] (fastcgi client library for python)
[06:10:03] https://gerrit.wikimedia.org/r/#/c/149216
[06:10:55] the one thing i do like about it is that it provides an extremely easy way to compare the two approaches
[06:11:09] <_joe_> ori: OTOH the good thing here would be
[06:11:19] <_joe_> the jobrunners become services
[06:11:30] <_joe_> and you can curl them from wherever you want
[06:11:58] <_joe_> my fear is that very short jobs will lose time in shelling out curl, maybe
[06:12:00] yes, i'm in favor of that part, but that's in different patches (going into mediawiki core)
[06:12:33] <_joe_> not sure it's better than the current approach
[06:12:43] yeah, i think the reason he did it is that it lets him keep the current implementation more or less in tact
[06:12:50] <_joe_> (we're pulling jobs from the queue, which is smart and reliable)
[06:13:00] because the assumption that you're spawning a subprocess and then waiting on it holds true
[06:13:02] <_joe_> ori: which is a +1 from me for now
[06:13:16] so it's a very cheap way to test the approach
[06:13:45] yes, but a +1 with an ewww, as you said :)
[06:13:45] <_joe_> couldn't we use curl from within php there? it's 10 lines of code more :)
[06:13:58] but then it's not a subprocess
[06:14:45] <_joe_> oh ok so you can't compare the relative speed of hhvm
[06:15:02] <_joe_> but it could be faster just because of not shelling out
[06:15:15] but it is shelling out
[06:15:22] <_joe_> spawning a process is exensive in general, from php in particular
[06:15:29] <_joe_> so I agree
[06:15:54] <_joe_> this is the right approach to see how large are the benefits due to using fastcgi
[06:16:05] yeah, but not something to keep
[06:16:26] <_joe_> ori: when I have the new packages, I'll try them on mw1053 for a while
[06:16:30] <_joe_> to see if it crashes
[06:16:37] <_joe_> crash dumps are in /tmp right?
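The concern raised above (06:11:58 to 06:15:22) is that spawning a process per job costs time that an in-process request would not. A minimal sketch of how that overhead could be measured, assuming a no-op child process as a stand-in for `curl` and a trivial function as a stand-in for an in-process client call; both stand-ins are hypothetical and neither is the actual jobrunner code:

```python
import subprocess
import sys
import time


def time_spawn(runs=5):
    """Average wall-clock seconds to spawn and wait on a no-op child process."""
    start = time.perf_counter()
    for _ in range(runs):
        # Hypothetical stand-in for `curl <job url>`: a child that exits at once.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_in_process(runs=5):
    """Average wall-clock seconds for a trivial call made without a subprocess."""
    def noop():
        # Hypothetical stand-in for an in-process HTTP or FastCGI client call.
        return None

    start = time.perf_counter()
    for _ in range(runs):
        noop()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print("spawn:      %.6f s per call" % time_spawn())
    print("in-process: %.6f s per call" % time_in_process())
```

This only isolates the fixed spawn cost; it says nothing about request latency, which would dominate for longer jobs.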
[06:16:54] yep
[06:17:28] this is the only unmerged PR we need: https://github.com/facebook/hhvm/pull/3249
[06:18:23] brett updated it ~14h ago, it'll probably get merged within the next 12-24h
[06:19:10] <_joe_> brett works on weekends as well
[06:19:42] <_joe_> I guess he likes this work :)
[06:20:05] <_joe_> I've just had an almost-computer-less we, so I'm fresh :P
[06:20:27] <_joe_> I hope to merge one or two of the apache patches today
[06:26:01] ok cool, hope the weekend was fun
[06:27:02] the nutcracker patch doesn't do anything we need urgently btw
[06:27:51] so feel free to ignore that one
[06:33:49] PROBLEM - puppet last run on es1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:29] <_joe_> ori: it seems nice and goes in the right direction IMO
[06:37:25] ah thanks, i'm glad you think so
[06:39:57] <_joe_> ok, coffee, then hhvm
[06:44:09] PROBLEM - puppet last run on es1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:44] * Nemo_bis throws some sugar into _joe_'s hhvm cup
[06:51:49] RECOVERY - puppet last run on es1002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[07:01:42] ori: will you be at wikimania?
[07:03:09] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[07:07:01] YuviPanda: he will
[07:07:05] legoktm: ah, cool
[08:10:39] YuviPanda: (yes)
[08:12:22] good morning
[08:15:48] morning hashar
[08:18:48] (PS3) Ori.livneh: wmflib: add ensure_service() [operations/puppet] - https://gerrit.wikimedia.org/r/149778
[08:20:27] <_joe_> ciao hashar
[08:22:30] * hashar ori: sleep sleep! :-D
[08:23:02] _joe_: all the Zuul puppet patches I had got reviewed/merged/deployed last week with Alexandros :-]  Thank you for the preliminary reviews
[08:23:16] we got everything done in like half an hour. Gotta love well prepared patches
[08:30:40] <_joe_> hashar: :)
[08:34:23] (PS1) Giuseppe Lavagetto: [wikimedia] Add patches by Tim [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149808
[08:34:25] (PS1) Giuseppe Lavagetto: [wikimedia] update changelog [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149809
[08:34:27] (PS1) Giuseppe Lavagetto: [wikimedia] Add init scripts, bump changelog [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149810
[08:34:29] (PS1) Giuseppe Lavagetto: Imported Upstream version 3.1+20140723 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149811
[08:34:31] (PS1) Giuseppe Lavagetto: [wikimedia] Remove patches integrated in the tree, add PR #3121 and PR #3249 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149812
[08:34:33] (PS1) Giuseppe Lavagetto: [wikimedia] Remove merged patches [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149813
[08:34:48] <_joe_> eww
[08:34:54] <_joe_> ok, some fake merging to do
[08:34:55] <_joe_> :)
[08:35:24] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [wikimedia] Add patches by Tim [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149808 (owner: Giuseppe Lavagetto)
[08:35:40] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [wikimedia] update changelog [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149809 (owner: Giuseppe Lavagetto)
[08:36:40] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [wikimedia] Add init scripts, bump changelog [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149810 (owner: Giuseppe Lavagetto)
[08:37:07] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Imported Upstream version 3.1+20140723 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149811 (owner: Giuseppe Lavagetto)
[08:37:43] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [wikimedia] Remove patches integrated in the tree, add PR #3121 and PR #3249 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149812 (owner: Giuseppe Lavagetto)
[08:38:31] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [wikimedia] Remove merged patches [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149813 (owner: Giuseppe Lavagetto)
[08:40:10] labs-vagrant was already using 3.1+20140723-1+wmf1, nothing to do right?
[08:40:59] <_joe_> Nemo_bis: nevermind these commits
[08:41:12] <_joe_> it's me getting back the hhvm repo into track
[08:41:19] <_joe_> with my work offline
[08:41:28] :)
[08:42:10] <_joe_> well, really with my tiny slice of work on top of paravoid's one
[08:42:29] what's the most useful bit to mention when stating what version I'm using? 3.1+20140723-1+wmf1, 3.3.0-dev, heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
[08:42:36] , ce469da81c1d8ec23f3a4aa889afadad8df5a759
[08:42:52] <_joe_> the version of the package
[08:43:01] <_joe_> dpkg -l hhvm
[08:43:19] <_joe_> the next package will be using 3.3.0-dev
[08:44:04] ok
[08:44:11] <_joe_> the next package will be out in ~ 3-4 hours
[08:57:29] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 06:57:12 UTC
[08:57:29] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jul 28 08:57:25 UTC 2014
[08:57:54] (PS18) Legoktm: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: Yuvipanda)
[08:58:18] (CR) Legoktm: "PS18: Pass --all to nightly.py" [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: Yuvipanda)
[09:18:29] legoktm: doesn't need flask either, no
[09:18:47] yeah :D
[09:19:13] no dependencies now
[09:19:32] legoktm: can you change config file format to be json or yaml?
[09:19:37] oh
[09:19:37] sure
[09:19:40] json
[09:19:43] legoktm: ok
[09:20:29] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 07:19:45 UTC
[09:20:43] YuviPanda: ummmm, how do I know where conf.json is? :P
[09:21:29] legoktm: ah, so you can check in /etc/extdist then dirname(__file__)
[09:21:59] /etc/extdist/conf.json and then dirname(__file__)/conf.json?
[09:22:51] legoktm: no, /etc/extdist.conf and then dirname(__file__)/conf.json?
[09:22:59] gotcha
[09:23:03] legoktm: make the log file path be configurable from the config as well
[09:23:09] and I'll setup logging on /var/log/extdist
[09:23:11] uhh,
[09:23:20] right now it just uses whatever python's logging is defaulting to?
[09:23:28] I haven't set a log dir anywhere
[09:23:39] legoktm: right, I'm asking you to add code that sets that also ::P
[09:23:45] oh ok :P
[09:25:56] YuviPanda: er, what's the python version of __FILE__?
[09:26:08] legoktm: __file__
[09:26:23] :P
[09:26:43] oh, it's not set in interactive
[09:26:49] oh, yeah
[09:40:19] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon Jul 28 09:40:17 UTC 2014
[09:57:16] (PS19) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[09:57:54] (CR) jenkins-bot: [V: -1] [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: Yuvipanda)
[10:00:44] (PS20) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:04:20] <_joe_> eww I screwed up badly it seems with my git commits
[10:04:38] <_joe_> ok let's do that again, from scratch
[10:06:25] (PS21) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:07:41] (CR) Alexandros Kosiaris: [C: 2] wmflib: add ensure_service() [operations/puppet] - https://gerrit.wikimedia.org/r/149778 (owner: Ori.livneh)
[10:25:20] (PS22) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:27:23] (PS23) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:33:15] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: qualify var [operations/puppet] - https://gerrit.wikimedia.org/r/149003 (owner: Matanya)
[10:33:27] thanks godog
[10:34:34] matanya: thank you!
[10:38:07] (PS1) Giuseppe Lavagetto: Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] (upstream) - https://gerrit.wikimedia.org/r/149826
[10:41:34] (PS24) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:41:57] (PS25) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[10:43:19] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[10:45:29] <_joe_> godog: ^^ happened again
[10:46:53] fascinating, git on strontium didn't know who I was
[10:46:58] *** Please tell me who you are.
[10:47:18] (I think it was on strontium)
[10:47:53] http://paste.debian.net/hidden/4a58ee06/
[10:48:57] _joe_: what's the quickest fix?
[10:49:23] <_joe_> godog: do git-merge on strontium
[10:49:28] <_joe_> but we have to solve thisd
[10:49:41] <_joe_> it's some issue local to strontium
[10:54:19] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[10:56:30] mark: ?
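The config lookup agreed on earlier (09:21 to 09:23) checks `/etc/extdist.conf` first, falls back to `conf.json` next to the script, and takes the log file path from the config. A minimal sketch under those assumptions; the function names and the `log_file` key are hypothetical and are not taken from the extdist patch itself:

```python
import json
import logging
import os


def find_config(candidates):
    """Return the first path in candidates that exists as a file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def load_config(candidates=None):
    """Load the JSON config from the first candidate path that exists."""
    if candidates is None:
        # __file__ is only read here, so importing this module still works
        # in contexts where __file__ is unset (e.g. an interactive session).
        here = os.path.dirname(os.path.abspath(__file__))
        candidates = ["/etc/extdist.conf", os.path.join(here, "conf.json")]
    path = find_config(candidates)
    if path is None:
        raise IOError("no config file found in: %s" % ", ".join(candidates))
    with open(path) as f:
        return json.load(f)


def configure_logging(conf):
    """Send log output to the path named in the config, if any."""
    # "log_file" is a hypothetical key; with no key set, logging falls back
    # to Python's default behaviour of writing to stderr.
    log_file = conf.get("log_file")
    logging.basicConfig(filename=log_file, level=logging.INFO)
    return log_file
```

Passing the candidate list as a parameter keeps the lookup order easy to exercise in tests without touching `/etc`.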
[10:56:41] (Abandoned) Giuseppe Lavagetto: Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] (upstream) - https://gerrit.wikimedia.org/r/149826 (owner: Giuseppe Lavagetto)
[11:05:55] (PS1) Giuseppe Lavagetto: Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] (upstream) - https://gerrit.wikimedia.org/r/149837
[11:07:45] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] (upstream) - https://gerrit.wikimedia.org/r/149837 (owner: Giuseppe Lavagetto)
[11:32:47] bah what puzzles me about the failure on strontium is that it looks like it is attempting to commit and fail, indeed we do a forced command of git pull && git submodule update --init, the git pull should probably be --ff-only too
[11:32:55] akosiaris: ^
[11:41:59] (CR) Filippo Giunchedi: [C: -1] "given how short ordered_json I'm more inclined to duplicate it rather than introducing a possibly-confusing dependency (confusing e.g. whe" [operations/puppet] - https://gerrit.wikimedia.org/r/149775 (owner: Ori.livneh)
[11:52:25] (CR) Filippo Giunchedi: [C: 1] "LGTM, would be nice to have a link to the catalog compiler too" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/148099 (owner: Giuseppe Lavagetto)
[11:53:50] (CR) Filippo Giunchedi: [C: 2 V: 2] apache2: rename apache2_test_config => apache2_test_config_and_restart [operations/puppet] - https://gerrit.wikimedia.org/r/149344 (owner: Ori.livneh)
[11:55:01] and now it failed like this: http://paste.debian.net/hidden/fa6fb0f7/
[11:57:55] (CR) Filippo Giunchedi: [C: 2 V: 2] wmflib: add ensure_service() [operations/puppet] - https://gerrit.wikimedia.org/r/149778 (owner: Ori.livneh)
[12:00:16] mhh on https://gerrit.wikimedia.org/r/#/c/149778/ I'm hitting "submit ps3" gerrit thinks for a while and then the button becomes enabled again and nothing happened, have you seen this before?
[12:01:18] godog: no, but gerrit is not exactly known for stability
[12:01:30] that would also explain the just failed puppet-merge
[12:01:38] (CR) Filippo Giunchedi: [C: 1] mediawiki::multimedia: stop managing /a/magick-tmp; provision fontconfig-config [operations/puppet] - https://gerrit.wikimedia.org/r/149368 (owner: Ori.livneh)
[12:02:04] true
[12:02:29] godog: the patch has an unmerged dependency
[12:02:44] https://gerrit.wikimedia.org/r/#/c/149775/2
[12:03:02] (why would gerrit not tell you that, i have no idea)
[12:03:12] Status Submitted, Merge Pending
[12:03:21] MatmaRex: ahah!
[12:03:35] yeah, you afterward kind of guess why
[12:04:10] bah, in retrospect it makes sense but no feedback is a bit meh
[12:04:36] akosiaris: anyways, ideas on the merge vs ff above?
[12:04:41] (tahnks btw)
[12:18:12] (PS26) Yuvipanda: [WIP] Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[12:28:15] !log Upgrading our Jenkins Job Builder fork ( d833015..666e953 )
[12:28:20] Logged the message, Master
[12:31:51] (CR) Filippo Giunchedi: swift: monitor object/container availability (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149019 (owner: Filippo Giunchedi)
[12:38:29] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 10:37:40 UTC
[12:56:20] apergos: Just for your interest: The new json dump is fine, despite your worrying about echo :)
[13:06:01] (PS1) Giuseppe Lavagetto: Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149850
[13:06:03] (PS1) Giuseppe Lavagetto: Update the patchsets to apply cleanly [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149851
[13:06:05] (PS1) Giuseppe Lavagetto: version bump; add postrm hook [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149852
[13:06:41] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Imported Upstream version 3.3-dev+20140728 [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149850 (owner: Giuseppe Lavagetto)
[13:06:59] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[13:07:20] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Update the patchsets to apply cleanly [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149851 (owner: Giuseppe Lavagetto)
[13:07:36] (CR) Giuseppe Lavagetto: [C: 2 V: 2] version bump; add postrm hook [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/149852 (owner: Giuseppe Lavagetto)
[13:08:09] (PS3) Giuseppe Lavagetto: Enable hhvm hotprofiler [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/148850 (owner: EBernhardson)
[13:08:34] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Enable hhvm hotprofiler [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/148850 (owner: EBernhardson)
[13:10:29] hoo: good to know! congrats
[13:10:41] Thanks :)
[13:15:59] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.009 second response time
[13:36:59] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jul 28 13:36:55 UTC 2014
[13:37:34] (PS11) Physikerwelt: WIP: Draft for Mathoid role [operations/puppet] - https://gerrit.wikimedia.org/r/148836
[14:08:09] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:19:13] (PS1) Manybubbles: Beta builds Cirrus speed up field [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149859
[14:20:23] anyone mind if merge a beta-only change?
[14:21:02] manybubbles: Of course, change is evil!!1 :D
[14:21:23] (CR) Manybubbles: [C: 2] Beta builds Cirrus speed up field [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149859 (owner: Manybubbles)
[14:21:28] (Merged) jenkins-bot: Beta builds Cirrus speed up field [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149859 (owner: Manybubbles)
[14:21:37] hoo: with that attitude we'll get everything done
[14:21:47] I'll do swat today any way
[14:22:28] * hoo got things in for todays swat... so better stop trolling :P
[14:22:37] :P
[14:25:04] <^d> manybubbles is back :)
[14:25:38] ^d: I'm so back!
[14:26:57] <^d> And so you're back...from outer space....
[14:27:09] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[14:27:17] hello manybubbles
[14:27:23] Nemo_bis: hi!
[14:28:01] Krenair: would you mind building the submodule updates for your swat changes today?
[14:28:04] it saves me some time:)
[14:28:18] I'll +2 the backports ifyou'd like
[14:29:17] (CR) Manybubbles: [C: 1] Add import sources for bhwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149530 (https://bugzilla.wikimedia.org/68616) (owner: Hoo man)
[14:30:50] (CR) Manybubbles: [C: 1] Add 'abusefilter-log-detail' to 'rollbacker' and 'patroller' group at eswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149533 (https://bugzilla.wikimedia.org/68319) (owner: Vogone)
[14:31:19] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /a/common/).
[14:32:31] manybubbles, hey
[14:32:51] manybubbles, is there some page on wikitech explaining how to do that?
[14:33:05] probably
[14:33:16] Krenair: sure! https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Updating_the_submodule
[14:33:23] ah yes
[14:33:26] what manybubbles says
[14:34:15] Krenair: I just +2ed on the release branches - you can follow the instructions to get them into the core release branch
[14:34:20] that'd be super useful for me
[14:34:36] I can totally do it but its nice for swat when that is already done
[14:37:08] manybubbles, hm. this might take a while
[14:37:39] Krenair: oh? If you are super stuck I can get it
[14:38:41] It's just going to take a while to download everything
[14:40:48] manybubbles, maybe I should have done this on a machine in labs?
[14:41:29] Krenair: hmmm - I'm pretty sure we're averse to labs having your ssh private key to propose changes
[14:41:34] Yeah, because sticking your private key to labs is a good idea
[14:41:37] wait
[14:41:58] I believe the issue is that too many people have root on labs
[14:42:12] I don't need to put my private key on labs to upload stuff to gerrit, I don't think...
[14:42:14] we're not happy that we do key forwarding on tin for deployment and that is just a temporary thing
[14:42:30] Krenair: How do you plan to upload it then?
[14:42:43] Key forwarding via ssh-agent is also not a good idea if a lot of people have root
[14:43:08] Doesn't it allow a temporary password to upload via http?
[14:43:27] don't think so
[14:43:41] I have an extra core clone for backports
[14:43:57] I'm away to eat for a moment... but will be back in time for my changes
[14:44:05] so no canceling please :P
[14:48:03] hoo|away: no canceling:) - unless your away for like an hour
[14:48:43] akosiaris: yt? i'm trying to convert the kafka .deb into a multi binary package like we talked about
[14:48:46] got a q...
[14:50:41] Krenair: I can build it for you
[14:51:36] ottomata: yes
[14:52:17] (CR) Helder.wiki: "Isn't "[[wikipedia:]]" supposed to work on en.wikipedia.org?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/144264 (https://bugzilla.wikimedia.org/954) (owner: TTO)
[14:54:35] manybubbles, is https://gerrit.wikimedia.org/r/#/c/149868/ right?
[14:54:47] for https://gerrit.wikimedia.org/r/#/c/149329/
[14:54:50] ok akosiaris
[14:54:51] so
[14:54:58] Mostly everything is working
[14:54:59] except
[14:55:11] i think the make install step installs everything into debian/tmp/...
[14:55:16] and, for a multi binary package
[14:55:35] the files should be installed into directories named by package
[14:55:36] like
[14:55:44] Krenair: looks fine, yeah
[14:55:48] debian/kafka-common/usr/...
[14:55:49] or whatever
[14:55:59] manybubbles, okay, doing 1.24wmf15 submodule update as well then
[14:56:35] is it possible to get the install scripts to do that? should I just DESTDIR to debian/kafka-common in rules or something?
[14:56:38] Krenair: thanks
[14:56:39] set*
[14:56:50] ottomata: https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.html#multiple-binary
[14:57:04] manybubbles, https://gerrit.wikimedia.org/r/149869 for https://gerrit.wikimedia.org/r/#/c/149328/
[14:57:26] so, just move the files around in the temporary trees like you already described :-)
[14:57:53] using DESTDIR?
[14:58:01] Krenair: got it: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=121583&oldid=121552
[14:58:10] or using package-name.install?
[14:58:16] manybubbles, turns out I don't need to do the whole "git submodule update --init --recursive" just to send those commits
[14:59:02] plain install or cp calls in an new install: target in debian/rules
[14:59:03] Krenair: you just need your submodule, I imagine
[14:59:08] yep.
[14:59:34] akosiaris: ?
[14:59:38] ottomata: cause we don't see to already have one, using dh $@ --with javahelper for everything
[14:59:41] manybubbles, okay. I can't merge on the wmf branches so I have to get a deployer to merge them before I can do the submodule stuff
[14:59:51] so override dh_install
[15:00:04] manybubbles, Reedy, awight: Sir, Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140728T1500), the time has come. At your service
[15:00:36] oh, for each file, use install or cp for each file? in override_dh_install?
[15:00:39] hm
[15:00:41] Krenair: I thought I did do the merging on the submodule branches for you? but, yeah, that is a bit weird
[15:01:02] manybubbles, yeah, you did.
[15:01:10] ottomata: not that many files anyway.. right ?
[15:01:30] the bin shell script for kafka-client and init.d script for kafka-server ?
[15:01:33] anything I miss ?
[15:01:42] manybubbles, But I mean I have to put the cherry-pick-to-extension-branch gerrit changes on the deployment calendar
[15:01:44] awight: hey - you merged your submodule update before the swat deployment so I have to do you first
[15:01:46] are you ready?
[15:02:01] Krenair: ah! because you can't have built the submodule update yourself
[15:02:05] manybubbles: yep, thank you
[15:02:07] akosiaris: no, those work, via the .install files
[15:02:10] yeah
[15:02:12] kafka-server.install, etc.
[15:02:18] then ?
[15:02:19] yeah, the merge was during my attempted deployment last week... oops!
[15:02:20] its the files that your Makefiles install
[15:02:24] the compiled .jars
[15:02:27] ah
[15:02:27] for kafka-common
[15:02:31] that go to debian/tmp/...
[15:02:45] Krenair: I'll bother folks about that - it is certainly easier on me if you can +2 in the deployment branches and make the submoudle updates
[15:03:03] manybubbles, well I can't be a deployer, so...
[15:03:19] Krenair: I can help CR, what's the patch?
[15:03:29] awight, we sorted it for this one, it's fine
[15:03:32] kk
[15:03:36] Krenair: I'm more arguing that merging to the deployment branch might be something we let everyone do
[15:03:46] ah, yeah
[15:04:05] (PS27) Yuvipanda: Add extdist module + role for labs [operations/puppet] - https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609)
[15:06:36] awight: just checking on your change - it looks like it was merged five days ago
[15:06:43] manybubbles: exactly.
[15:06:47] are you sure you meant it to go in swat today?
[15:06:53] manybubbles: I failed to do the submodule update...
[15:06:58] because when you merged it into the deployment branch.....
[15:06:59] ah
[15:07:19] hehe only realized what I'd done wrong on Friday.
[15:07:50] awight: so the change you actually want deployed is this: 95e0324951bb0ebaf48be7a0871e897799d58688 ?
[15:07:59] looking...
[15:08:13] Hey greg-g.
[15:08:19] because it *looks to me* like it was all deployed
[15:08:32] manybubbles: yes, that's the underlying change
[15:08:32] I checked the version both before and after I did the submodule update and it all looked clean
[15:08:43] manybubbles: ok I'll verify
[15:08:51] cool - if not I can sync it again
[15:08:58] ottomata: I don't have the answer ready. Will need to research it a bit more
[15:09:03] manybubbles: it's the 1.24wmf14 branch that was botched
[15:09:43] awight: that is what I was looking at. I'll just sync the files again to be sure - but it'd be helpful if you could verify that it all worked
[15:10:08] ottomata: got a change I should use ?
[15:10:20] !log manybubbles Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow/: SWAT update fundraising to fix botched deploy
[15:10:25] Logged the message, Master
[15:11:19] Krenair: I'm going to start the process on your wmf15 updates
[15:11:31] manybubbles, okay
[15:11:48] ok cool, akosiaris i am trying DESTDIR...
[15:12:17] manybubbles: confirmed, the change has gone out and the bug is even fixed. Thank you!
[15:12:24] awight: wee!
[15:12:27] (PS12) Physikerwelt: Mathoid configuration for beta labs [operations/puppet] - https://gerrit.wikimedia.org/r/148836
[15:13:41] (CR) Manybubbles: [C: 2] Add import sources for bhwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149530 (https://bugzilla.wikimedia.org/68616) (owner: Hoo man)
[15:13:51] hoo: I'll do the import source while I wait for jenkins
[15:14:00] (Merged) jenkins-bot: Add import sources for bhwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149530 (https://bugzilla.wikimedia.org/68616) (owner: Hoo man)
[15:14:24] yeah, they're trivial enough
[15:14:53] hoo: going out now
[15:14:55] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - add import sources to bhwiki (duration: 00m 08s)
[15:15:01] Logged the message, Master
[15:15:19] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[15:15:23] ^d: there? :) I created the swift account so we can move forward
[15:15:28] it seems that there is a problem with http://ganglia.wmflabs.org/latest/
[15:15:56] confiremd
[15:15:58] <^d> godog: Yep, I'm here. So I realized Thurs/Fri that we never deployed the swift plugin for ES.
[15:16:04] * confirmed
[15:16:11] <^d> We did the git-fat dance and got it live, but we still need a rolling cluster restart to pick it up.
[15:16:19] <^d> Which I was waiting for manybubbles to come back before doing.
[15:16:24] <^d> (And not on a friday)
[15:17:00] !log manybubbles Synchronized php-1.24wmf15/extensions/Echo/: SWAT - fix incorrect variable name (duration: 00m 08s)
[15:17:07] Logged the message, Master
[15:17:18] (PS1) Hashar: Load Mantle before MobileFrontend [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704)
[15:17:25] ^d: you want to do the rolling restart dance? its all kinds of fun!
[15:17:29] ^d: hehe makes sense
[15:17:34] physikerwelt: looking
[15:17:40] you could even upgrade to 1.3
[15:18:02] but we'll need to do another one after wikimania to pick up the highlighter changes I started on friday morning before we drove 16 horus
[15:18:19] <^d> I think I understand how to do the rolling restart dance.
[15:18:25] Krenair: incorrect variable fix deployed to wmf15 - please verify
[15:18:25] <^d> Upgrading to 1.3 is scarrrryyyyyyy :p
[15:18:34] ^d: its pretty much the same:)
[15:18:39] I can do the 1.3 upgrade later
[15:18:41] that makes more sense
[15:19:02] godog: I have also a problem to list instances at https://wikitech.wikimedia.org/wiki/Special:NovaInstance So it might be related to my user-account / browser
[15:19:07] I'll have time to update all the other extensions
[15:19:08] <^d> (Plus I also haven't tested the plugin with 1.3 yet which shouldn't be a big deal but I need to first)
[15:19:16] yeah
[15:20:02] manybubbles: Busy SWAT today. :-(
[15:20:07] Krenair: ping on verify?
[15:20:26] <^d> godog: So, we can get the key in privatesettings and such, but I don't think we'll be ready to test today yet.
[15:20:28] James_F: yeah!
[15:20:29] manybubbles, looks fine to me [15:20:50] Krenair: cool - starting the merge for wmf14 then [15:21:21] (03CR) 10Florianschmidtwelzow: [C: 031] Load Mantle before MobileFrontend [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:21:29] Vogone: I can do your eswiki permissions changes if you are ready to verify them [15:21:54] k [15:22:07] (03CR) 10Manybubbles: [C: 031] Load Mantle before MobileFrontend [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:22:20] (03CR) 10Manybubbles: [C: 032] Add 'abusefilter-log-detail' to 'rollbacker' and 'patroller' group at eswiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149533 (https://bugzilla.wikimedia.org/68319) (owner: 10Vogone) [15:22:26] (03Merged) 10jenkins-bot: Add 'abusefilter-log-detail' to 'rollbacker' and 'patroller' group at eswiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149533 (https://bugzilla.wikimedia.org/68319) (owner: 10Vogone) [15:22:42] (03CR) 10Chad: [C: 031] Load Mantle before MobileFrontend [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:22:43] godog: I have logged out and logged in again the problem listing the instances disappeared but the ganglia problem still remains [15:23:17] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - update some permissions on eswiki (duration: 00m 08s) [15:23:21] Vogone: deployed [15:23:23] Logged the message, Master [15:23:49] physikerwelt: yeah I'm currently looking at that (ganglia) [15:24:09] ^d: ok! 
let me know when it'd be good to go on your side [15:25:19] manybubbles, am ready to verify for the echo wmf14 change, by the way [15:25:53] Krenair: jenkins hated it - forcing it to retry [15:25:57] (03CR) 10Jforrester: [C: 031] Load Mantle before MobileFrontend [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:25:58] :/ [15:26:04] there is a good chance it was spurious [15:26:12] its pretty rare that it does that, but it happens [15:26:19] godog: Thank you. I was testing that change https://gerrit.wikimedia.org/r/#/c/148836/ and I got the message from puppet that the ganglia monitoring was set up correctly. I hope that is unrelated to the global ganglia problem [15:26:38] Krenair: you can watch it some: https://integration.wikimedia.org/zuul/ [15:27:01] Oh it's that damn qunit issue [15:27:01] looks like the qunit tests passed this time - must have been something silly [15:27:21] Yeah this is a PHP fix and shouldn't be messing with qunit [15:27:36] I've seen it do that before [15:27:45] (03CR) 10Greg Grossmeier: [C: 031] "Please unbreak beta cluster." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:27:48] physikerwelt: ah nevermind I misread the url, don't know much about ganglia in labs though :( [15:27:49] (03CR) 10Manybubbles: "Ok - so many +1s so fast.... Should I tack this onto the end of SWAT if there is time?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:27:58] (03CR) 10Andrew Bogott: [C: 031] "Is this a service that's currently running on production but unpuppetized? Or a new service? (The bug seems to suggest the former, but t" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [15:27:59] manybubbles: Yes. 
[15:28:19] manybubbles: yes please [15:28:25] James_F: hmmm - in use but not breaking without that change - I guess not in use for mobile frontend? [15:28:29] whatever - looks pretty safe [15:28:30] James_F: greg-g comment on gerirt! [15:28:36] I did [15:28:42] hashar: :-P [15:28:45] andrewbogott: online? [15:28:53] greg-g: I am, what's up? [15:28:57] manybubbles: It's in use and broken in MF too. [15:29:01] manybubbles: works for me :) [15:29:03] andrewbogott: mind merging https://gerrit.wikimedia.org/r/149873 [15:29:18] plenty o' +1s ;) [15:29:20] manybubbles: Also has broken Beta Labs scap, DB update etc. [15:29:21] (03CR) 10Hashar: "I guess one can +2 it, verify beta is fine and then swat deploy it :-D" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:29:37] (03CR) 10Andrew Bogott: [C: 032] Load Mantle before MobileFrontend [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:29:45] (03CR) 10Florianschmidtwelzow: "> Question: is mantle in use in production yet?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:29:45] now andrew has to deploy it hehe [15:29:52] Ha. [15:30:05] ah [15:30:11] greg-g: I'll deploy it.... [15:30:19] thanks manybubbles [15:30:24] manybubbles: thanks [15:30:33] operations is a bit misleading since that is both platform+ops duties and the ops team [15:30:41] (03PS2) 10BBlack: Add explicit mmap addrs for varnish persistent storage [operations/puppet] - 10https://gerrit.wikimedia.org/r/149068 [15:30:42] :) [15:31:42] Krenair: here you go [15:31:45] !log manybubbles Synchronized php-1.24wmf14/extensions/Echo/: SWAT fix bad variable name in echo (duration: 00m 08s) [15:31:51] Logged the message, Master [15:31:59] (03CR) 10Rush: [C: 04-1] "I don't think this makes sense. 
true/false _already_ work as directives for service state, and absent/present don't make sense. IMO ther" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [15:32:08] manybubbles, looks good! [15:32:20] thanks [15:32:21] Krenair: consider yourself SWATed [15:33:00] !log manybubbles Synchronized wmf-config/CommonSettings.php: SWAT load Mantle before MobileFrontent (duration: 00m 07s) [15:33:05] (03CR) 10Manybubbles: "deployed to prod" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149873 (https://bugzilla.wikimedia.org/68704) (owner: 10Hashar) [15:33:06] Logged the message, Master [15:33:17] Nik saved us again [15:33:37] ALL HAIL [15:33:49] hoo: your turn again! [15:34:10] :) [15:34:13] hashar: thanks for actually fixing it [15:34:18] shall I +2 or will you? [15:34:20] also - now beta works [15:34:22] I think I just did [15:34:23] Wait, nevermind [15:34:44] !deploy unlock [15:35:01] James_F: sorry for having you last - was just going down the list and grabbing the submodule updates in order [15:35:14] hashar: wuzzat? [15:35:54] manybubbles: wrong copy paste sorry [15:36:02] hashar: thanks! [15:36:45] manybubbles: No problem at all. 
:-) [15:37:29] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 13:36:55 UTC [15:38:32] (03PS1) 10BBlack: fix mmap-fail printf formatting [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149878 [15:38:34] (03PS1) 10BBlack: Added new patch for explicit persistent mmap addr [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149879 [15:38:36] (03PS1) 10BBlack: varnish (3.0.5plus~x-wm7) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149880 [15:39:27] !log manybubbles Synchronized php-1.24wmf15/includes/specials/SpecialRevisiondelete.php: SWAT - fix fatal on revision delete (duration: 00m 08s) [15:39:32] hoo: revisiondelete is live [15:39:32] Logged the message, Master [15:39:55] Whee. [15:39:58] James_F: +2ed your wmf15 change [15:39:58] manybubbles: Confirmed! [15:40:07] hoo: sweet. closing your window:) [15:40:19] PROBLEM - Host gold is DOWN: PING CRITICAL - Packet loss = 100% [15:40:40] manybubbles: Deployed yet? [15:40:59] Vogone: just read my scrollback - glad it works. marking you off or my list:) [15:41:17] James_F: waiting on jenkins - he takes his time [15:41:37] !log Removed all right holders from closed and inaccessible ukwikimedia (bug 68737) [15:41:38] manybubbles: That is sadly true. 
[15:41:43] Logged the message, Master [15:42:01] James_F: he's nice though - he gets the job done [15:43:09] RECOVERY - Host gold is UP: PING OK - Packet loss = 0%, RTA = 3.66 ms [15:43:36] (03CR) 10BBlack: [C: 032] fix mmap-fail printf formatting [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149878 (owner: 10BBlack) [15:43:48] (03CR) 10BBlack: [C: 032] Added new patch for explicit persistent mmap addr [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149879 (owner: 10BBlack) [15:44:02] (03CR) 10BBlack: [C: 032] varnish (3.0.5plus~x-wm7) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/149880 (owner: 10BBlack) [15:46:53] James_F: deploying [15:46:55] !log manybubbles Synchronized php-1.24wmf15/extensions/VisualEditor/: SWAT - fix visual editor bug - Changes made after reviewing changes are not sent (when caching is enabled) (duration: 00m 08s) [15:47:01] Logged the message, Master [15:47:39] (03PS3) 10BBlack: Add explicit mmap addrs for varnish persistent storage [operations/puppet] - 10https://gerrit.wikimedia.org/r/149068 [15:48:06] manybubbles: Confirmed fixed in wmf15. [15:48:17] manybubbles: Please go ahead with the wmf14 patch. :-) [15:48:23] James_F: +2ed [15:48:48] RobH: rt-wise… I'm still not clear on what exactly to do with these maintenance/service announcements from our network providers. Is there a calendar mark, or do you just file the tickets and merge the related ones? [15:49:00] (03CR) 10BBlack: [C: 04-1] "Updated the 3.0.5 package with the necessary changes as well, since the upstream timeline for 3.0.6 release is unknown. Still requires th" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149068 (owner: 10BBlack) [15:55:54] <^d> manybubbles: I can start the dance for the boxes. I guess we'll do 17-19 first since they've already got no load. 
[15:56:07] ^d: makes no difference to me [15:57:04] <^d> So it's just `service restart` on the box? [15:57:44] !log manybubbles Synchronized php-1.24wmf14/extensions/VisualEditor/: SWAT - fix visual editor bug - Changes made after reviewing changes are not sent (when caching is enabled) (duration: 00m 07s) [15:57:50] Logged the message, Master [15:57:57] James_F: wmf14 is done [15:58:40] manybubbles: And confirmed fixed. Thanks! [15:58:48] ^d: https://wikitech.wikimedia.org/wiki/Search#Rolling_restarts [15:59:17] ^d: oh! [15:59:33] that script would actually reenable assigning indexes to the nodes that we've currently banned [15:59:41] wait - no it won't [15:59:46] its safe [16:00:04] andrewbogott: Sir, Please deploy Wikitech update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140728T1600), the time has come. At your service [16:00:04] James_F: sweet. thanks [16:00:13] <^d> manybubbles: You should move those bash scripts to the Elastic boxes. [16:00:15] !log deone with SWAT [16:00:21] Logged the message, Master [16:00:21] thanks much manybubbles [16:00:23] <^d> Would be super super useful. [16:00:25] it was a big one today [16:02:00] ^d: I actually run them from my laptop [16:02:05] because it can ssh into everything in turn [16:02:19] and I send up hacking on the scripts from time to time [16:02:28] greg-g: all in the line of duty [16:03:10] dammit [16:03:32] ^d: compare: [16:04:01] <^d> Not gonna work on elastic1017 [16:04:25] wikitech broken? 
[16:04:25] <^d> ExecutionError[java.lang.NoClassDefFoundError: org/javaswift/joss/exception/CommandException [16:04:42] Nikerabbit: I'm upgrading it, it'll be flaky for a bit [16:04:49] ok [16:05:39] PROBLEM - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.39 [16:05:40] !log andrewbogott> Nikerabbit: I'm upgrading it [wikitech wiki], it'll be flaky for a bit [16:05:45] Logged the message, Master [16:06:39] PROBLEM - check configured eth on gold is CRITICAL: eth1 reporting no carrier. [16:06:53] <^d> manybubbles: Did I compile things wrong? [16:07:43] thanks Nemo_bis [16:08:09] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:13] greg-g (or whoever) I have a staged upgrade of the wikitech mediawiki code, but when I switch over to it wikitech throws exceptions. Care to help me debug? [16:08:34] not me :) [16:08:38] <^d> Pastebin the exception? [16:08:39] I ran update.php and, mercifully, when I switched back to the old version it seems to still work [16:08:46] Reedy: ^^ [16:08:56] ^d: Where should I look for the full exception? Apache log? [16:09:14] <^d> Probably. Not sure how Wikitech is configured. [16:09:28] heh [16:09:49] But yeah, start with the apache logs [16:09:55] ^d: I'm not sure that looks like you deployed your jar but not the joss jar - checking [16:10:06] (03PS2) 10Ori.livneh: mediawiki::multimedia: stop managing /a/magick-tmp; provision fontconfig-config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149368 [16:10:08] <^d> Oh I thought it was bundled. [16:10:17] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki::multimedia: stop managing /a/magick-tmp; provision fontconfig-config [operations/puppet] - 10https://gerrit.wikimedia.org/r/149368 (owner: 10Ori.livneh) [16:10:17] <^d> I don't understand this java thing sometimes. [16:11:58] andrewbogott: I was relocating into the office, sorry for the delay. 
So for the maint stuff [16:13:35] and seems yer fixing something so ignore my reply until yer done [16:13:56] But yes, for maint-announce, I tend to modify the subject to have the date range of the window in it [16:14:01] ie: rt 7776 [16:14:04] (03PS1) 10Chad: Adding swift dependency of joss [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149885 [16:14:17] and I merge all related updates to the window into the same ticket, and keep its subject line relevant [16:14:33] so if there is any network outage, folks can glance at the queue and see if any of the windows are active. [16:15:18] <^d> ottomata, manybubbles: https://gerrit.wikimedia.org/r/#/c/149885/ [16:15:45] (03CR) 10Manybubbles: [C: 031] "Merge when ready" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149885 (owner: 10Chad) [16:16:58] (03CR) 10Chad: [C: 032 V: 032] Adding swift dependency of joss [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/149885 (owner: 10Chad) [16:18:21] ^d: error appears to be PHP Warning: dba_open(/srv/org/wikimedia/controller/wikis/slot0/cache/l10n_cache-en.cdb.tmp.128406164) [function.dba-open]: failed to open stream: Permission denied in /srv/org/wikimedia/controller/wikis/slot0/includes/utils/CdbDBA.php on line 53 [16:18:28] Does that mean anthing specifc to you? [16:18:46] andrewbogott: l10n cache needs rebuilding... [16:18:47] <^d> andrewbogott: Permissions on cache/ directory in MW. [16:18:53] <^d> Plus what Reedy said. [16:19:01] Reedy: cool, how do I do that? [16:19:15] scap? :) [16:19:16] ^d: I don't think it's permissions; that file doesn't exist. [16:19:32] greg-g: wikitech :) :( [16:19:35] the .tmp seems to suggest it's trying to write... [16:19:43] YuviPanda: yeah, can we fix that? 
:) [16:19:56] (03PS1) 10Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/149889 [16:19:59] andrewbogott: php /srv/org/wikimedia/controller/wikis/slot/maintenance/rebuildLocalisationCache.php [16:20:11] (03PS2) 10Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/149889 [16:20:30] Reedy: ok, checking... [16:20:33] <^d> manybubbles: That was it, elastic1017 is back up. [16:20:39] RECOVERY - ElasticSearch health check on elastic1017 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 6098: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [16:20:47] ^d: figured [16:21:38] akosiaris: https://gerrit.wikimedia.org/r/#/c/149889/2 [16:21:38] seems to work [16:21:42] not sure if it is the best way to do things [16:22:07] DESTDIR didn't work. I used cp in dh_install override like you suggested. [16:22:15] but I think it woul dbe better if DESTDIR worked, not sure... [16:23:06] ottomata: looking [16:24:42] ^d, greg-g, Reedy, seems better now, thanks. [16:25:10] Also, I'm maybe going to stab the next person who suggests that we switch wikitech to scap, since every time I ask for help with that the room gets strangely silent :) [16:25:16] (03PS1) 10ArielGlenn: public_html directory service, see RT #6862 [operations/puppet] - 10https://gerrit.wikimedia.org/r/149890 [16:26:09] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:26:18] andrewbogott: I'd be glad to talk with you about it. I don't know the wikitech setup, but I'm pretty familiar with what scap does. [16:26:35] I've heard that SMW may be the sticking point [16:27:08] bd808: it may be, although we already have SMW pegged to an old branch. 
So I don't know why that would bother scap. [16:27:32] (03PS1) 10Subramanya Sastry: Add ruthenium backends for visual diff tests and visual diff service [operations/puppet] - 10https://gerrit.wikimedia.org/r/149891 [16:28:13] bd808: I'm not sure how much exactly scap would help... [16:28:19] scap is just the "last mile" of the cluster deploy. We don't have SMW in the wmf/* branches [16:28:20] It's staged and run from 1 machine [16:28:21] (03CR) 10Subramanya Sastry: "wip to get feedback." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149891 (owner: 10Subramanya Sastry) [16:28:27] bd808: We do [16:28:41] Reedy: Oh? I guess I assumed wrongly [16:28:52] We just branch it with the rest of the stuff for "ease" essentially [16:28:53] Yeah, I just did a 'submodule update' and I got the right (ancient) version of SMW in my tree. [16:28:56] So I think that's fine. [16:29:18] Unless I misunderstand how this works [16:29:19] Reedy: do you remember which special page to use to approve OAuth requests? [16:29:51] * bd808 reads what greg-g wrote about wikitech and scap in https://bugzilla.wikimedia.org/show_bug.cgi?id=68751 [16:30:14] YuviPanda: https://www.mediawiki.org/wiki/Special:OAuthListConsumers [16:30:21] YuviPanda: https://www.mediawiki.org/wiki/Special:OAuthManageConsumers [16:30:34] !log updated wikitech to 1.24wmf15; turned on OAuth [16:30:40] Logged the message, Master [16:30:48] Yay :) [16:31:39] bd808: I'm on support rotation this week and on vacation next, but I'll hit you up for help with this when the dust settles. [16:33:21] RobH: OK, I'm back :) What do you think about the idea of putting stuff like that on the calendar? It almost makes sense for it to land on the deployment calendar, since that's the first place to look for 'why is this broken?' [16:33:38] I cannot get folks to update the actual queue [16:33:44] I wasn't going to add updating something else [16:33:46] ;] [16:34:09] fair [16:34:14] andrewbogott: Sure. 
I'll be at wikimania and then out for another week after. Maybe we can figure something out towards the end of August. Sam would know how to do all of the setup too. (faster/better than I do actually) [16:34:16] i like the idea though [16:34:26] just means someone has to keep it up to date [16:35:09] Yeah. And it would have to be private, because otherwise it would amount to a when/how ddos cheat guide. [16:35:38] hrmmm [16:35:50] i guess we need to make that eventual phab queue private as well [16:35:56] yeah [16:36:15] since my procurement queue will also have that condition, its not a new requirement [16:36:30] (so im not pinging all the phab related folks) [16:37:09] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jul 28 16:37:01 UTC 2014 [16:39:45] (03CR) 10Subramanya Sastry: "reconsidering approach. thinking of using same domain with paths to direct to different services via nginx." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149891 (owner: 10Subramanya Sastry) [16:44:29] (03PS1) 10Andrew Bogott: Allow icinga status checks on wikitech [operations/puppet] - 10https://gerrit.wikimedia.org/r/149895 [16:47:33] (03CR) 10Nikerabbit: Mathoid configuration for beta labs (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [16:48:15] (03PS2) 10Krinkle: diamond: Enable for 'cvn' project in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/148689 (https://bugzilla.wikimedia.org/68444) [16:48:21] YuviPanda: ^ [16:48:58] (03CR) 10Yuvipanda: [C: 031] diamond: Enable for 'cvn' project in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/148689 (https://bugzilla.wikimedia.org/68444) (owner: 10Krinkle) [16:49:02] Krinkle: don't have +2, sadly [16:49:27] YuviPanda: But you did help set up that infrastructure right? [16:49:36] Who in ops helped with that? [16:49:39] Krinkle: indeed. [16:49:55] Krinkle: andrewbogo.tt but he's currently debugging wiktech. 
I'll poke him when that's done [16:50:12] chasemp: ^ +2 would be nice :) [16:50:19] YuviPanda: k, I can wait hours even. Just ideally not weeks this time. I really need this monitoring. [16:50:24] Krinkle: :) [16:50:43] Krinkle: although, we'll get a new box shortly (in a few days), and then all projects will just send there by default. the whitelist will be removed [16:50:46] Once too many times this past few months I find out 2 days after the fact that something's been completely down in 'cvn' affecting lots of users. [16:50:57] YuviPanda: I know, that's what I heard last week when I submitted the patch. [16:51:09] Krinkle: heh, yeah. we're at our third machine, apparently our hardware doesn't like trusty that much [16:51:25] I've got the same problem with Jenkins. [16:51:28] We'll get there. [16:52:12] (03PS4) 10coren: Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [16:52:26] Krinkle: fwiw, I'm also not fully sure if the current machine can handle any more metrics. but we'll see [16:52:53] (03CR) 10jenkins-bot: [V: 04-1] Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [16:55:00] Krinkle: can you check if diamond processes are running on your hosts? [16:55:17] YuviPanda: They aren't, haven't been for several weeks [16:55:20] see the bug report [16:55:21] Krinkle: ah, cool [16:55:28] puppet will start them, yeah? [16:55:31] Krinkle: no, just wanted to make sure :) [16:55:33] Krinkle: yeah, should [16:55:34] cool [16:55:59] Krinkle: there's also a whitelist on the graphite box itself (since it took a while to get the whitelist merged), so just modified that to allow cvn through [16:56:52] (03PS2) 10coren: Allow icinga status checks on wikitech [operations/puppet] - 10https://gerrit.wikimedia.org/r/149895 (owner: 10Andrew Bogott) [16:57:12] andrewbogott: ^^ this is the way to also actually /do/ server-status. 
:-) [16:57:57] Ah, ok, I wasn't clear on if it was already 'on' and just inaccessible. [16:57:58] thanks [16:58:06] (Also note location vs directory) [16:58:36] Ah, of course [16:58:38] makes sense [16:58:44] (03CR) 10coren: [C: 031] "Icinga likes it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149895 (owner: 10Andrew Bogott) [16:59:15] (03CR) 10Andrew Bogott: [C: 032] Allow icinga status checks on wikitech [operations/puppet] - 10https://gerrit.wikimedia.org/r/149895 (owner: 10Andrew Bogott) [17:00:52] Pushing that will remove 2 errors from the logs every minute. :-) [17:01:20] (03PS1) 10Ori.livneh: mediawiki: add tidy resource for temp multimedia files [operations/puppet] - 10https://gerrit.wikimedia.org/r/149896 [17:03:33] Coren: merge https://gerrit.wikimedia.org/r/#/c/148689/ while you're at it? :) for Krinkle [17:07:16] (03CR) 10coren: [C: 032] "Godspeed, cvn. :-)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148689 (https://bugzilla.wikimedia.org/68444) (owner: 10Krinkle) [17:10:27] (03PS1) 10Ori.livneh: mediawiki: restore Eval.PerfPidMap = false to HDF file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149900 [17:15:04] (03PS2) 10Ori.livneh: jobrunner/hhvm: disable perf maps, turn off jit [operations/puppet] - 10https://gerrit.wikimedia.org/r/149900 [17:15:37] (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner/hhvm: disable perf maps, turn off jit [operations/puppet] - 10https://gerrit.wikimedia.org/r/149900 (owner: 10Ori.livneh) [17:15:47] thanks giuseppe [17:18:20] (03CR) 10Jgreen: [C: 032 V: 031] Log rotate the correct ocg file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149463 (owner: 10Mwalker) [17:21:46] (03PS3) 10Giuseppe Lavagetto: jobrunner/hhvm: disable perf maps, turn off jit [operations/puppet] - 10https://gerrit.wikimedia.org/r/149900 (owner: 10Ori.livneh) [17:25:49] mark: yt? 
[17:27:37] (03PS1) 10Manybubbles: Cirrus all field rollout stage two [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149906 [17:28:37] (03CR) 10Mwalker: [C: 031] fix postmortem cleanup cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/149322 (owner: 10BBlack) [17:31:58] (03PS13) 10Physikerwelt: Mathoid configuration for beta labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [17:32:03] (03CR) 10Manybubbles: "Scheduled this to go out with tomorrow morning's SWAT if it looks good." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149906 (owner: 10Manybubbles) [17:32:38] (03PS2) 10Manybubbles: Cirrus all field rollout stage two [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149906 [17:34:12] ottomata: so /etc/kafka is handled by kafka-common... is that correct ? [17:34:23] (03CR) 10Physikerwelt: Mathoid configuration for beta labs (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 (owner: 10Physikerwelt) [17:34:33] I see file like /etc/kafka/server.properties, I would expect this to be in kafka-server [17:35:16] ottomata: btw the rest of the file seems cool. I wanted to try a slightly different approach but this obviously works [17:35:29] ottomata: s/file/change/ [17:36:13] hmm, yes mmm, that file gets installed via make install too, in config/ dir, right? [17:36:14] hm [17:36:23] ok, i'd have to do copying of specific files to specific packages then [17:36:53] akosiaris: what if kafka-common also contained usr/sbin/kafka [17:36:59] and we skipped the kafka-client package all together? [17:37:05] just had two [17:37:08] kafka-common and kafka-server? [17:37:27] better or worse? [17:38:21] kind of worse semantically. Does the split to 3 packages cause greater maintenance burden ? 
[17:38:50] !log aaron Synchronized php-1.24wmf15/includes/jobqueue/JobQueueFederated.php: 12ce1dc1ec46b06d1160e142ddfaf8dcb1c9f131 (duration: 00m 04s) [17:38:56] Logged the message, Master [17:40:11] nope, doesn't matter to me [17:41:09] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [17:41:38] (03CR) 10Legoktm: "It's a new service that is intended to be run just in labs for now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [17:41:48] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Really minor comment" (034 comments) [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/149889 (owner: 10Ottomata) [17:42:07] heya bblack, do you know if there been changes to amssq31-amssq46 sorta recently? [17:42:11] like, within the last month or so? [17:42:25] e.g., I didn't know that amssq* hosts had esams.wmnet domains [17:42:32] The mailman server (sodium) is supposed to be upgraded at some point. Its OS and the mailman installation itself, maybe? [17:42:33] thought they were all .esams.wikimedia.org domains [17:42:50] i'm seeing lots of varnishkafka errors from those hosts [17:42:53] Probably related to https://rt.wikimedia.org/Ticket/Display.html?id=5420 [17:43:00] Does anyone know what the status of that is? [17:43:32] as well as a couple of the mobile caches (cp3013 and cp3014) [17:43:43] Carmela: the way I see it stalled/forgotten [17:44:10] :-( [17:44:15] Any way to un-stall it? [17:44:19] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=amssq31%7Camssq32%7Camssq33%7Camssq34%7Camssq35%7Camssq36%7Camssq37%7Camssq38%7Camssq39%7Camssq40%7Camssq41%7Camssq42%7Camssq43%7Camssq44%7Camssq45%7Camssq46%7Ccp3013%7Ccp3014&mreg%5B%5D=kafka.varnishkafka.kafka_drerr.per_second&gtype=line&glegend=show&aggregate=1 [17:44:32] It's allegedly blocking re-skinning/re-theming mailman.
[17:44:45] Carmela: allegedly ? [17:44:59] akosiaris: https://bugzilla.wikimedia.org/show_bug.cgi?id=61283#c7 [17:45:26] (03CR) 10Ori.livneh: "@chasemp: The idea is not that you'd pass ensure => present / absent to a Service resource just because; it's so that you can have a class" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [17:45:29] Carmela: https://lists.wikimedia.org/mailman/listinfo/accounts-enwiki-l [17:45:34] I personally don't think it's a real blocker, but eh. [17:45:36] isn't this page skinned/themed ? [17:45:50] akosiaris: Yeah, but that ticket is about the default. [17:46:00] Otherwise, we have to have individual list admins do it and that's infeasible. [17:46:12] As many list admins are inactive or don't know HTML. [17:46:22] !log aaron Synchronized php-1.24wmf14/includes/jobqueue/JobQueueFederated.php: 87e7bfceb795d065d6157ac8ce3381a7814000b5 (duration: 00m 03s) [17:46:28] Logged the message, Master [17:47:04] Carmela: I see [17:48:14] Carmela: I can say that is MUST be done anyway for other reasons as well, but I have no estimation currently about the effort required [17:49:46] _joe_: Filepath: /tmp/catalog-differ/akosiaris/146756/change/src/modules/varnish/templates/vcl/wikimedia.vcl.erb, Line: 147, Detail: undefined method `each' for "10.2.2.22":String ... sigh [17:49:58] _joe_: just tried ruby 1.9.1 [17:50:19] just for the heck of it... obviously we need to fix stuff [17:50:21] akosiaris: All right, thanks for taking a look. 
[17:50:37] Carmela: thanks as well [17:50:48] <_joe_> akosiaris: gee [17:50:56] (03PS2) 10BBlack: fix postmortem cleanup cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/149322 [17:51:10] (03CR) 10BBlack: [C: 032 V: 032] fix postmortem cleanup cron [operations/puppet] - 10https://gerrit.wikimedia.org/r/149322 (owner: 10BBlack) [17:52:20] (03PS1) 10Ori.livneh: add mw1053 to mediawiki-installation dsh group [operations/puppet] - 10https://gerrit.wikimedia.org/r/149913 [17:52:52] ^ _joe_ [17:53:07] heh [17:54:17] (03CR) 10Giuseppe Lavagetto: [C: 032] add mw1053 to mediawiki-installation dsh group [operations/puppet] - 10https://gerrit.wikimedia.org/r/149913 (owner: 10Ori.livneh) [17:55:01] <_joe_> ori: merged. [17:55:08] <_joe_> now off to the meeting [17:56:34] ottomata: those machines were reinstalled about 45 days ago, renamed from .esams.wm.o to .esams.wmnet, moved to private IP space, and re-provisioned as live text varnishes [17:56:58] ok, assumed as much from the uptime of 45 days [17:57:11] they are in a rack with a few other amssq hosts too [17:57:15] but those ones aren't erroring [17:57:25] i'd like to figure out what is making just these hosts error [17:57:56] just 33-46 are erroring? 
[17:58:24] 31-46 [17:58:28] and cp3013 and cp3014 [17:58:33] those two are in a different rack [17:58:48] and have more errors [17:58:57] (03CR) 10Legoktm: Add extdist module + role for labs (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [18:01:05] (03PS28) 10Legoktm: Add extdist module + role for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [18:20:44] (03PS1) 10Mwalker: Declare OCG ganglia monitor group [operations/puppet] - 10https://gerrit.wikimedia.org/r/149917 [18:21:57] (03PS1) 10Bene: Add good and featured badges to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149918 (https://bugzilla.wikimedia.org/40810) [18:22:38] ottomata: an interesting comparison point will be amssq47.esams.wmnet - this one was reinstalled like 33-46, but at a later date, and doesn't seem to have the issue. [18:23:34] s/33/31/ as you noted above [18:23:39] aye, i checked that one too, because its in the same rack. ja [18:24:04] if it weren't for the exception of amssq47, I'd say maybe related to being on the private subnet [18:24:53] yea... [18:25:01] and why cp3013 and cp3014 too? [18:25:07] questions to answer! :) [18:26:41] Hm.. can someone tell me what it means when a reply textbox on RT is red? [18:27:21] It never bothered me before, I'm just curious [18:27:37] also, should be concerned who to Bcc? there's a huge list of checkboxes. Is there a convention there? [18:29:00] (03CR) 10Helder.wiki: Add good and featured badges to config (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149918 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [18:33:25] (03CR) 10Rush: "yup totally understand man but I think the abstraction in this case is harmful. This kind of thing is allowed for this reason." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [18:34:12] (03CR) 10Ori.livneh: "Why is it *harmful*?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [18:35:55] (03CR) 10Jgreen: [C: 032 V: 031] Declare OCG ganglia monitor group [operations/puppet] - 10https://gerrit.wikimedia.org/r/149917 (owner: 10Mwalker) [18:38:48] (03CR) 10Rush: "we are passing through existing functionality (boolean case) in order to allow for extension (present/absent) in a way that can be cleanly" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [18:42:03] (03PS29) 10Legoktm: Add extdist module + role for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [18:49:05] Videoconf test: http://openmeetings.wmflabs.org:5080/openmeetings/#room/9 [18:49:12] (03CR) 10Aaron Schulz: [WIP] Added JSON version of jobrunner config file (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149362 (owner: 10Aaron Schulz) [18:50:22] (03CR) 10Rush: "harmfu is inhereted shorthand from an old job for encourages non-obvious behavior / wraps up things that don't need it / better done simpl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [18:51:09] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [18:52:13] matanya: I'm talking but you can't hear me :) [18:52:38] unless you are trolling me in which case well played [18:58:41] (03Abandoned) 10Subramanya Sastry: Add ruthenium backends for visual diff tests and visual diff service [operations/puppet] - 10https://gerrit.wikimedia.org/r/149891 (owner: 10Subramanya Sastry) [18:59:09] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures [19:00:39] PROBLEM - puppet last run on nickel is CRITICAL: CRITICAL: Epic puppet fail [19:01:19] PROBLEM - puppet last run on 
es4 is CRITICAL: CRITICAL: Epic puppet fail [19:01:47] _joe_: http://ganglia.wikimedia.org/latest/?c=Jobrunners%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 I wonder why mw1001 shows up as having less cpu available essentially [19:01:53] it's always been like that [19:02:43] (03CR) 10Bene: Add good and featured badges to config (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149918 (https://bugzilla.wikimedia.org/40810) (owner: 10Bene) [19:03:49] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Epic puppet fail [19:03:49] PROBLEM - puppet last run on ms1001 is CRITICAL: CRITICAL: Epic puppet fail [19:05:19] PROBLEM - puppet last run on tridge is CRITICAL: CRITICAL: Epic puppet fail [19:07:09] PROBLEM - puppet last run on ms1004 is CRITICAL: CRITICAL: Epic puppet fail [19:08:16] !log restarting varnishkafka on cp3013 [19:08:22] Logged the message, Master [19:08:59] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail [19:09:42] (03PS1) 10Bene: Add good and featured article badges to testwikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149928 (https://bugzilla.wikimedia.org/40810) [19:10:09] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Epic puppet fail [19:10:19] PROBLEM - puppet last run on nfs1 is CRITICAL: CRITICAL: Epic puppet fail [19:12:54] greg-g: recent regression https://bugzilla.wikimedia.org/show_bug.cgi?id=68757 not sure whether it is last deployment or before last. [19:13:14] !log restarting varnishkafka on amssq31 [19:13:15] Several tools the community uses for tracking usage, as well as for automatically linting javascript code are no longer working due to this. 
[19:13:20] Logged the message, Master [19:14:39] PROBLEM - puppet last run on sanger is CRITICAL: CRITICAL: Epic puppet fail [19:17:09] PROBLEM - puppet last run on linne is CRITICAL: CRITICAL: Epic puppet fail [19:18:09] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:18:27] bblack, i restarted varnishkafka on cp3014 and amssq31, we'll see how it does after a day I guess. i suspect it will just start erroring again eventually [19:18:51] ok [19:18:58] what's the nature of the errors? [19:18:59] but, curious to note, as far as I can tell, these nodes have been erroring for a while, and we have 0 data in Kafka (and subsequently in HDFS) for them [19:19:28] they are varnishkafka delivery errors, meaning the varnishkafka queue on the node filled up, and vk just started dropping packets [19:19:40] this has happened in the past from esams hosts [19:19:48] but i hadn't noticed it happening in a long while [19:20:03] apparently its been happening on these hosts for as long as the newly installed kafka has been up [19:20:07] (few weeks) [19:20:29] before, this happened because produce times to eqiad kafka brokers from esams was too long, [19:20:34] things would timeout, queue would fill, etc. [19:21:04] fwiw epic puppet fail message comes from this change: https://gerrit.wikimedia.org/r/#/c/148394/ [19:21:06] amssq46 and amssq47 should be identical, right? 
[19:21:21] the message text, not the cause of those alerts [19:22:48] ottomata: yes, basically, for vk purposes [19:23:11] (03PS1) 10Matanya: solr: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/149960 [19:23:23] it could be that in all of these cases, vk was unable to connect when the machine was first booted/installed, and due to some bug the error is persistent unless you restart vk [19:23:40] Monitor_group[pdf_eqiad] is already declared in file /etc/puppet/manifests/role/ocg.pp:113; cannot redeclare [19:23:42] yeah, i'm wondering about that [19:23:43] although, on amssq33 I can see TIME_WAIT sockets to the remote port 9092 from it [19:23:50] ^ that's the actual puppet fail [19:23:52] err CLOSE_WAIT [19:23:54] seems unlikely, but that's why I restarted [19:23:56] looks like just on older hosts [19:24:29] just fixed the opposite (pdf_eqiad missing) before weekend [19:24:59] hmmmmm, yeah bblack, some of the brokers have the same addresses as before the kafka cluster reinstall a few weeks ago [19:25:15] i know I at least attempted to manually restart varnishkafka everywhere when I did that [19:25:16] via salt [19:25:19] ottomata: oh, now that I think/read, CLOSE_WAIT is probably indicative of a vk bug (that is perhaps triggered by conditions in these cases, but still) [19:25:19] mwalker: Jeff_Green https://gerrit.wikimedia.org/r/#/c/149917/ [19:25:22] but, maybe some nodes didn't make it? [19:25:25] that breaks on a couple hosts [19:25:38] (03PS1) 10Mwalker: Revert "Declare OCG ganglia monitor group" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149983 [19:25:47] mutante, it just needs to be reverted ^ [19:25:50] the CLOSE_WAIT sockets are persistent in the broken vk processes, the connection state is probably stuck in vk [19:25:52] it didn't do what it was supposed to [19:26:11] hm, is it possible to tell how long those sockets have been that way?
[19:26:17] not really [19:26:25] mwalker: yea, i already added the same in the old PDF rule a couple days ago.. ok [19:26:37] mutante: yeah we were just talking about that [19:26:38] (03PS1) 10Matanya: spamassassin: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/149993 [19:26:38] ottomata: if you look at "root@amssq33:~# lsof -np 932" [19:26:40] (03CR) 10Dzahn: [C: 032] "yep, this should fix: Error 400 on SERVER: Duplicate declaration: Monitor_group[pdf_eqiad] is already declared in file /etc/puppet/manifes" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149983 (owner: 10Mwalker) [19:26:46] those same sockets have been there as long as I've been looking today [19:27:34] my money is on a state-machine bug in vk's tcp connection handling which happens to get triggered in some of these cases (either on fresh install and having connection issues, or when connectivity to esams is intermittent) [19:28:08] interesting, yeah, that looks very fishy indeed [19:28:45] (03CR) 10Dzahn: [V: 032] Revert "Declare OCG ganglia monitor group" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149983 (owner: 10Mwalker) [19:28:55] HM [19:29:00] yeah most explanations of hung CLOSE_WAIT for other software indicate that it's usually a software bug. It means the app never close()'d the socket even though the peer has already sent its FIN from TCP's perspective. [19:29:05] bblack, that process has been running since june 16 [19:29:19] which is probably when the machine was first installed+booted [19:30:16] yeah, and i'm pretty sure I reinstalled kafka cluster after that... [19:30:17] checking... [19:30:36] waits for recoveries.. runs puppet on sodium,linne,nickel. ..
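
A minimal Python sketch of the CLOSE_WAIT point made above, not varnishkafka's actual code: when recv() returns an empty byte string the peer has finished sending, and a TCP socket stays in CLOSE_WAIT until the application calls close() on its side. The helper name read_until_peer_closes is hypothetical, chosen only for illustration.

```python
import socket


def read_until_peer_closes(conn: socket.socket, bufsize: int = 4096) -> bytes:
    """Read from conn until the peer shuts down, then close our side too.

    Hypothetical helper for illustration only.
    """
    chunks = []
    try:
        while True:
            data = conn.recv(bufsize)
            if not data:
                # An empty read means end of stream: on TCP, the peer sent FIN.
                break
            chunks.append(data)
    finally:
        # Skipping this close() is what leaves a TCP socket stuck in
        # CLOSE_WAIT for as long as the process keeps running.
        conn.close()
    return b"".join(chunks)
```

A long-running client that notices the empty read but never closes the descriptor would keep showing the same socket in lsof, which matches the symptom described for the old vk processes.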
[19:31:01] it puts the puppet on the nodes or it gets the icinga spam again [19:31:09] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:31:10] RECOVERY - puppet last run on linne is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [19:31:39] RECOVERY - puppet last run on nickel is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [19:32:54] yeah, bblack, june 24th [19:33:13] shoudl be when the newly formatted kafka cluster came back online [19:33:39] RECOVERY - puppet last run on sanger is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:34:19] RECOVERY - puppet last run on tridge is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:34:19] RECOVERY - puppet last run on nfs1 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [19:34:49] RECOVERY - puppet last run on ms1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:34:59] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:35:49] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:36:09] RECOVERY - puppet last run on ms1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [19:36:19] RECOVERY - puppet last run on es4 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [19:37:16] ACKNOWLEDGEMENT - Host platinum is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn openstack - RT #8013 (Coren) waits for Cross-connect to gold [19:37:45] (03PS2) 10Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - 10https://gerrit.wikimedia.org/r/149800 [19:38:22] ACKNOWLEDGEMENT - Host tantalum is DOWN: PING CRITICAL - Packet loss = 100% 
daniel_zahn replaced by ocg - has ticket to be reclaimed - RT #7947 [19:38:29] (03PS3) 10Ori.livneh: Nutcracker: move declaration to role::mediawiki; parametrize [operations/puppet] - 10https://gerrit.wikimedia.org/r/149800 [19:38:30] yup yup, bblack, I'm pretty sure that's it. those hosts just have stale kafka cluster broker data because they weren't restarted [19:38:52] o_O? [19:38:52] (03CR) 10Ori.livneh: "Rebased to remove dependency on wmflib changes. I can integrate them later if/when they are merged." [operations/puppet] - 10https://gerrit.wikimedia.org/r/149800 (owner: 10Ori.livneh) [19:38:55] probably somehow my salt cmd to restart vk everywhere didn't get these ones for some reason [19:39:08] ACKNOWLEDGEMENT - check configured eth on gold is CRITICAL: br0 reporting no carrier. daniel_zahn 8013 Cross-connect gold to platinum [19:39:13] mind if I restart varnishkafka on all these hosts? [19:39:42] would mess with the control for my restart experiment on the two I already restarted, but i'm pretty sure this is the problem [19:39:58] yeah go ahead [19:41:36] yeah, this makes a lot of sense [19:41:40] no data from these hosts at all [19:41:51] would mean that something is configured wrong for sure [19:41:59] we've never had a case where esams hosts couldn't send any data ever [19:42:05] only cases where esams was lossy [19:42:13] springle: did you see diskspace on db1017 yet? 
[19:42:57] ACKNOWLEDGEMENT - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 daniel_zahn 7924 adjust CirrusSearch monitoring [19:42:58] !log restarted varnishkafka on some esams hosts that seem old misconfigured vk processes [19:43:02] Logged the message, Master [19:44:01] !log Jenkins: updated qunit jobs to roam on both gallium and lanthanum (were previously tied to run only on gallium) [19:44:07] Logged the message, Master [19:46:33] (03CR) 10Dzahn: [C: 032] "so because the general reactions have been positive i'm merging now so the code is here but i won't actually put it on a node in site.pp, " [operations/puppet] - 10https://gerrit.wikimedia.org/r/140948 (owner: 10Dzahn) [19:47:21] hashar: Jenkins seems clogged? [19:47:42] !log Jenkins/Zuul lost connection somehow. Disabled/Reenabled gearman client in Jenkins [19:47:44] Krinkle: yeah again [19:47:47] Logged the message, Master [19:47:57] Krinkle: it somehow lost all its jobs for some reason [19:48:11] Krinkle: I mean, Zuul suddenly does not know about Jenkins jobs. So it keeps pilling them up [19:49:28] Krinkle: suggestion re that bug you pinged me on? [19:49:28] mutante: db1017 isn't a db anymore. gave it to the graphite effort [19:49:42] hopefully to be renamed someday, like tungsten was [19:49:58] Krinkle: did the cvn patch get merged? 
[19:50:01] springle: aaha, oh yea, let's rename, that can be confusing :) [19:50:11] thx, was just going through icinga [19:50:13] +1 :) [19:50:32] YuviPanda: Coren: Yep, http://graphite.wmflabs.org/render/?width=578&height=289&from=00%3A00_20140723&until=23%3A45_20140729&hideLegend=false&target=cvn.*.cpu.total.user.value [19:50:39] There's fluctuation at last [19:50:41] mutante: godog was playing with cassandra there afaik [19:50:45] Krinkle: ah, cool [19:50:50] http://graphite.wmflabs.org/render/?width=578&height=289&from=00%3A00_20140728&until=23%3A45_20140729&hideLegend=false&target=cvn.*.cpu.total.user.value [19:50:52] ah yeah my fault for not renaming it :( [19:50:59] It does look a bit suspicious though [19:51:01] mutante: ^ [19:51:02] Krinkle: fwiw, I don't consider it 'production' atm, mostly because it can still drop points (due to excess load) [19:51:05] Looks like it's alternating [19:51:05] Krinkle: but should mostly work fine [19:51:18] !log Zuul: stopped / started process to clear up obsoletes changes stuck in queue [19:51:24] Logged the message, Master [19:51:30] though it's not alternating to a fixed point (like real/fake/real/fake/real/fake) [19:51:34] so it's probably fine [19:51:43] Krinkle: oh yeah, dropped points show up as missing points [19:51:57] godog: gotcha, thanks, let me just ACK it [19:52:10] Krinkle: I had a dashboard for toollabs: http://tools.wmflabs.org/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1h, will put it on git soon (paused my efforts while the new box was being provisioned) [19:52:15] http://graphite.wmflabs.org/render/?width=600&height=300&hideLegend=false&target=cvn.*.cpu.total.user.value&from=-4h [19:52:29] ACKNOWLEDGEMENT - Disk space on db1017 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=81%): daniel_zahn not a db, used for graphite testing (godog) [19:52:33] Krinkle: once it's on git people with other projects can easily add things to add a tab there, and this will probably live on somewhere 
not-toollabs [19:52:44] let me fix it too, poor db1017 [19:52:48] thanks mutante [19:52:57] :) yep, ran out of space [19:52:59] yw [19:53:28] YuviPanda: What's that orange theme? I saw another analytics bootstrap-like page use that as well [19:53:33] is that some kind of theme? [19:53:43] Krinkle: oh, probably? This is just the default giraffe theme [19:54:01] giraffe isn't the name you came up with as graphite pun? [19:54:03] k [19:54:19] Krinkle: no :) it's a pre-existing dashboard [19:54:25] Krinkle: I don't think analytics is using the same thing [19:54:30] Krinkle: but yeah. Bootstrap everywhere [19:55:17] YuviPanda: the orange hurts my eyes though [19:55:29] Krinkle: heh, yeah. I'll probably switch the theme to something more somber [19:55:32] (03PS1) 10Ottomata: Re-enable varnishkafka delivery error alerts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150010 [19:55:36] YuviPanda: So do we plan on having some kind of ganglia-like dashboard where we show the data from graphite in a more usable way? [19:55:38] http://graphite.wmflabs.org/render/?width=800&height=2900&hideLegend=false&target=cvn.*.*.*.*&from=-4h [19:55:42] That's not very useful :) [19:55:59] Krinkle: yeah, giraffe is my target. currently prod has gdash (gdash.wikimedia.org) which I think is suboptimal [19:56:01] the usual disk/memory/cpu graphs etc. [19:56:16] with certain things combined into one graph, but not too much. [19:56:23] (03PS1) 10Mwalker: Tell gmond about ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150012 [19:56:26] Should be pretty straightforward, but is harder to do manually [19:56:39] I don't know what all the various fields mean [19:56:42] Krinkle: yeah, I'll probably programmatically construct them a 'default' that folks can customize [19:56:46] do you have a draft I could use? [19:56:49] RECOVERY - Disk space on db1017 is OK: DISK OK [19:56:54] Krinkle: oh yeah, it's on toollabs. I can give you access.
[19:57:08] Krinkle: I've added that link table regression to the MW Core team meeting for today, thoughts on it helpful if you can/have time, otherwise I'll try to get a core member on it [19:57:25] YuviPanda: I'm thinking of something like http://codepen.io/Krinkle/full/zyodJ/ where it just takes a js array and consturs graphite in a loop [19:57:42] grrrit-wm: !restart [19:57:44] i Just need to know which data points to grab and what to call them. [19:57:54] (03PS2) 10Mwalker: Tell gmond about ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150012 [19:58:16] Krinkle: well, giraffe also just takes a JSON structure, but auto updates, etc. I don't like the default graphite styles. [19:58:19] (03CR) 10Ottomata: [C: 032 V: 032] Re-enable varnishkafka delivery error alerts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150010 (owner: 10Ottomata) [20:00:05] gwicke, subbu, cscott: Sir, Please deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140728T2000), the time has come. At your service [20:01:06] Krinkle: I added you to the giraffe project on toollabs. You can see the dashboard definition on ~/giraffe/dashboards.js [20:01:24] Krinkle: https://github.com/kenhub/giraffe/blob/master/dashboards.js for a full on template [20:01:57] Hm.. not sure I understand. There's logic already that produces the graphs we want from graphite? [20:02:06] Krinkle: yes? [20:02:21] http://tools.wmflabs.org/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1h doesn't show that though [20:02:29] other than disk space [20:02:49] Krinkle: yeah, because in the dashboards.js file, I had customized it to show only the toollabs disk space + the webproxy traffic levels [20:03:40] YuviPanda: k, I'll play with it a bit later. Thx [20:03:48] Krinkle: yw [20:03:50] That github link contains what I was looking for. [20:03:53] Krinkle: :) [20:03:56] e.g. 
how to use the mem.user value [20:04:12] especially memory has a ton of different metrics, and it isn't obvious how to use them [20:04:26] basically I just want the graphs I know from ganglia [20:05:01] Krinkle: right. I'll probably set those up with the new graphite box for all machines [20:05:07] and then provide them for all projects we track in diamond (eventually all) without needing specific config (other than for custom metrics) [20:05:15] Krinkle: yup, that's my plan [20:05:18] cool [20:05:24] (03CR) 10Dzahn: [C: 031] "test - grrritwm has been restarted, is output ending up in ops channel now?" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [20:05:24] Krinkle: provide a minimal but usable default, and then let people configure [20:05:28] (03PS2) 10Ori.livneh: mediawiki::jobrunner: Added JSON version of jobrunner config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149362 (owner: 10Aaron Schulz) [20:05:47] YuviPanda: yeah, prolly let the custom config add instead of replace, so that new metrics we add globally will show up everywhere [20:05:57] end to end [20:06:03] Krinkle: yep [20:06:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [20:06:53] _joe_, godog -- if either of you are around, could you look at / +1 https://gerrit.wikimedia.org/r/#/c/149362/ ?
[20:07:05] (03PS14) 10Physikerwelt: Mathoid configuration for beta labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/148836 [20:07:15] (03CR) 10Jgreen: [C: 032 V: 031] Tell gmond about ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150012 (owner: 10Mwalker) [20:07:19] ori: poke when you have a moment, want to show you something potentially at least a little cool [20:08:39] !log intermittent 5xx are most likely varnish restarts off and on rest of today [20:08:44] Logged the message, Master [20:08:50] ori: looks good, but it is tabs all the way down :( [20:09:02] i'll retab [20:09:57] thanks! [20:10:17] (03PS3) 10Ori.livneh: mediawiki::jobrunner: Added JSON version of jobrunner config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149362 (owner: 10Aaron Schulz) [20:11:19] (03PS1) 10Mwalker: Give OCG admins the ability to run maintenance scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150016 [20:11:56] (03CR) 10jenkins-bot: [V: 04-1] Give OCG admins the ability to run maintenance scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150016 (owner: 10Mwalker) [20:13:07] (03PS2) 10Mwalker: Give OCG admins the ability to run maintenance scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150016 [20:17:54] godog: better? [20:20:02] ori: yup, LGTM, I'll merge [20:20:09] thank you very much [20:20:35] * matanya smiles at no tabs [20:21:05] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] mediawiki::jobrunner: Added JSON version of jobrunner config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/149362 (owner: 10Aaron Schulz) [20:21:07] (03CR) 10Dzahn: "test - grrritwm has been restarted, is output ending up in ops channel now?" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [20:21:09] hashar: what about jenkins vote -1 on literal tabs? where does that stand ? 
[20:23:50] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [20:25:21] matanya: in manifests ? [20:25:33] yes, at first [20:25:41] erbs are later [20:25:56] matanya: we could go with puppet-lint, but would have to explicitly ignore all warnings bug the tabs [20:26:07] works for me [20:26:14] that is ugly :D [20:26:22] right [20:26:27] but works :) [20:26:40] we will add the rest one by one as we fix them [20:26:59] hashar: you can add another test only for tabs [20:27:13] yeah in .pp files? [20:27:19] yes [20:31:04] grep --recursive -e '^\t' --exclude-dir=.git --include='*.pp' . [20:31:06] might just work [20:32:50] godog: got time for one more? :) https://gerrit.wikimedia.org/r/#/c/149896/ if so [20:33:21] ori: depends, does it have tabs? [20:33:23] * godog runs [20:33:37] godog: no tabs, but windows line breaks [20:33:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [20:34:01] ori: haha no vertical tabs? [20:34:34] heh [20:34:44] godog: actually, i should amend it to ensure => absent the cronjob [20:34:58] ori: sure [20:36:54] (03PS2) 10Ori.livneh: mediawiki: add tidy resource for temp multimedia files [operations/puppet] - 10https://gerrit.wikimedia.org/r/149896 [20:37:00] (03PS1) 10Mwalker: Let gmond talk to ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150022 [20:37:17] godog: updated [20:37:44] (03PS2) 10Mwalker: Let gmond talk to ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150022 [20:38:22] (03CR) 10jenkins-bot: [V: 04-1] Let gmond talk to ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150022 (owner: 10Mwalker) [20:39:02] matanya: did you have a bug report for tabs in manifests ? 
just wondering, don't bother filling one [20:39:11] no, haven't [20:39:27] (03PS3) 10Mwalker: Let gmond talk to ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150022 [20:39:33] (03PS1) 10Aaron Schulz: Switched to JSON-based jobrunner.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/150024 [20:40:37] (03CR) 10Jgreen: [C: 032 V: 031] Let gmond talk to ocg1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150022 (owner: 10Mwalker) [20:40:46] matanya: https://gerrit.wikimedia.org/r/150023 :D [20:40:57] (03CR) 10Aaron Schulz: [C: 04-1] Switched to JSON-based jobrunner.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/150024 (owner: 10Aaron Schulz) [20:41:48] thanks a lot hashar! [20:41:50] matanya: that will only check operations/puppet , not the submodules puppet repos [20:41:55] https://integration.wikimedia.org/ci/job/operations-puppet-tabs/1/console [20:41:56] pass [20:42:00] good enough [20:42:04] maybe [20:42:30] fyi: there won't be any parsoid deploy today [20:42:50] likely also none on Wednesday [20:43:08] gwicke: you might want to tell jouncebot [20:43:20] i.e edit deployments [20:43:53] (03CR) 10Filippo Giunchedi: mediawiki: add tidy resource for temp multimedia files (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149896 (owner: 10Ori.livneh) [20:44:08] matanya: it's a recurring deployment window, so I'd have to mark it up as an exception [20:44:22] ah, thanks [20:46:20] matanya: https://integration.wikimedia.org/ci/job/operations-puppet-tabs/4/console :) [20:46:49] mwalker: jenkins doesnt like something in jouncebot (or we dont want the tests active) [20:46:49] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [20:47:07] hashar: now we need someone to submit with tabs ... 
:D [20:47:20] mutante: .pep8 [20:47:24] matanya: https://integration.wikimedia.org/ci/job/wikimedia-bots-jouncebot-pyflakes/2/console [20:47:27] mutante, *nods* I am fixing it [20:47:49] you can override it by adding .pep8 files to the root of the repo [20:47:52] matanya: wasnt sure which of the 2 checks that failed is the important one [20:47:55] mwalker: cool!:) [20:48:03] yeah, fixing is better [20:48:07] +1 mwalker [20:48:24] Coren, springle: For the engineering report I just wrote "The Tool Labs replica Databases are in the process of conversion from Mysql to MariaDB. This should reduce replag and improve performance and reliability." <- is that basically true, or is the replag thing wishful thinking? [20:49:14] (03PS1) 10Mwalker: Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 [20:49:26] (03CR) 10jenkins-bot: [V: 04-1] Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [20:50:08] catch22 [20:50:13] !log operations/puppet.git manifests should no more have leading tabulations {{gerritI69ddc72f5a072ac7dc4f67622b65f36a70d3c021}} [20:50:19] Logged the message, Master [20:50:29] matanya: can you post an announcement to ops list ? Quoting https://gerrit.wikimedia.org/r/#/q/I69ddc72f5a072ac7dc4f67622b65f36a70d3c021,n,z :-) [20:50:57] yes, though i think mutante should have the honor :) [20:50:57] (03CR) 10Ori.livneh: mediawiki: add tidy resource for temp multimedia files (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/149896 (owner: 10Ori.livneh) [20:52:08] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] mediawiki: add tidy resource for temp multimedia files [operations/puppet] - 10https://gerrit.wikimedia.org/r/149896 (owner: 10Ori.livneh) [20:52:10] (03PS1) 10Hashar: tabs in manifests are frowned upon [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 [20:52:14] ori: kk, I'll merge! [20:52:27] godog: thanks very much! 
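
A rough Python sketch of the leading-tab check discussed above, offered as an illustration rather than the actual Jenkins job: it looks for lines in *.pp files that start with a tab and skips .git, mirroring the intent of the grep command quoted earlier. Doing the match in Python avoids depending on how a given grep build treats `\t`. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def leading_tab_lines(text: str) -> List[int]:
    """Return 1-based numbers of the lines in text that start with a tab."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith("\t")
    ]


def find_tabbed_manifests(root: str) -> Dict[str, List[int]]:
    """Map each *.pp file under root to its tab-indented line numbers.

    Files without leading tabs are omitted; anything under .git is skipped.
    """
    base = Path(root)
    hits: Dict[str, List[int]] = {}
    for path in sorted(base.rglob("*.pp")):
        if ".git" in path.relative_to(base).parts:
            continue
        lines = leading_tab_lines(path.read_text(errors="replace"))
        if lines:
            hits[str(path)] = lines
    return hits
```

A CI wrapper could call find_tabbed_manifests(".") and exit non-zero when the result is non-empty, which is roughly what a -1 vote on tabs would need.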
[20:52:50] andrewbogott: hopefully true, but not yet proven. also, they've been mariadb for over a year; this is an upgrade [20:52:52] no worries [20:53:04] springle: oh, can you tell me updating from what to what? [20:53:17] matanya: my patch is broken :-/ [20:53:18] failed hashar [20:53:31] edit conflict :) [20:53:44] andrewbogott: mariadb 5.5 to 10.0 (which despite the silly numbering, is merely the equivalent of one major version) [20:53:53] matanya: https://gerrit.wikimedia.org/r/#/c/150036/1/tabulator.pp definitely has tabulations [20:54:07] matanya: but https://integration.wikimedia.org/ci/job/operations-puppet-tabs/5/console : SUCCESS in 29s [20:54:33] (03CR) 10Dzahn: [C: 031] "works!? 20:52:51 ./tabulator.pp:1 ERROR tab character found (hard_tabs)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [20:54:33] springle: got it, thank you! [20:54:38] hashar: but.. it worked? [20:54:48] (03PS2) 10Mwalker: Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 [20:54:48] stupid me [20:54:51] (03CR) 10jenkins-bot: [V: 04-1] Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [20:54:54] I have been using Mac OS grep version [20:54:58] instead of GNU one [20:55:56] Mac ... when will apple cope with the rest of the world ? [20:56:27] use ack-grep anyways ;) [20:56:29] andrewbogott: for labsdb the reason was simple: multisource replication to pull all previously-federated shards in as proper local tables on each db instance. 
other improvements are nice, but purely incidental :) [20:56:47] (03PS3) 10Mwalker: Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 [20:56:49] (03CR) 10jenkins-bot: [V: 04-1] Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [20:57:24] (03PS2) 10Dzahn: Wikitech -- use "Header set" instead of "Header append" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149626 (owner: 10Chmarkine) [20:58:41] (03CR) 10Dzahn: [C: 032] Wikitech -- use "Header set" instead of "Header append" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149626 (owner: 10Chmarkine) [20:59:51] (03PS4) 10Mwalker: Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 [21:00:20] (03CR) 10Mwalker: [C: 032] Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [21:01:20] hashar, https://gerrit.wikimedia.org/r/#/c/150035/ passed its gate-and-submit tests; but it didn't actually submit [21:01:23] any thoughts? [21:01:48] (03PS3) 10Ottomata: Split kafka package into 3 separate packages [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/149889 [21:01:50] mwalker: try doing +2 again now [21:01:55] mwalker: might still be in zuul queue at https://integration.wikimedia.org/zuul/ [21:01:56] mwalker: after jenkins did [21:02:05] else remove +2 and revote +2 :-D [21:02:21] (03CR) 10Mwalker: [C: 032] Style fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [21:02:39] mmm [21:02:43] mutante: matanya: I can't figure out how to match tabs with grep :D [21:04:26] hashar, I removed my CR+2 and re-added it; Jenkins re-did its gate-and-submit; and it tells me "Gate pipeline build succeeded."
but still no merge [21:04:40] hashar: grep "\t" [21:04:52] oh really [21:05:03] at beginning of strings [21:05:09] grep -P "^\t" will do [21:05:13] or https://stackoverflow.com/questions/1825552/grep-a-tab-in-unix [21:05:13] perl ftw [21:05:30] but, doesnt it already vote? [21:05:34] what are you adding? [21:05:35] if you really didn't google :P [21:06:17] i also dont see why it doesnt merge the change on jouncebot [21:06:24] maybe because the parent is wikimedia/bots [21:06:29] and it inherited settings from there? [21:06:30] BTW grep "^V" foo.txt with the literal tab key would work too [21:06:48] (03CR) 10Dzahn: [C: 032] "test" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150035 (owner: 10Mwalker) [21:06:53] mwalker: merged .. [21:07:01] i guess the difference must be permissions [21:07:05] and gerrit group memberships [21:07:17] agh failure https://integration.wikimedia.org/ci/job/operations-puppet-tabs/7/console [21:07:22] hmm; or maybe it just doesn't like self reviewed code :p [21:07:32] (03Abandoned) 10Hashar: tabs in manifests are frowned upon [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [21:07:37] (03Restored) 10Hashar: tabs in manifests are frowned upon [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [21:07:41] (03CR) 10Hashar: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [21:08:28] (03CR) 10jenkins-bot: [V: 04-1] tabs in manifests are frowned upon [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [21:08:43] (03Abandoned) 10Hashar: tabs in manifests are frowned upon [operations/puppet] - 10https://gerrit.wikimedia.org/r/150036 (owner: 10Hashar) [21:09:08] (03Restored) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [21:09:18] (03CR) 10Hashar: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 
10Hashar) [21:09:23] validation validation [21:09:35] works! [21:09:46] mwalker: hahhh jouncebot [21:09:55] mwalker: I guess jenkins bot is not allowed to submit patches on that repo [21:10:05] it inherits from the parent project [21:10:10] that i specified on creation [21:10:14] which is wikimedia/bots [21:11:12] mwalker: I have added JenkinsBot https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/bots/jouncebot,access [21:11:25] I am not sure what kinds of bots are under wikimedia/bots [21:11:40] maybe we can allow JenkinsBot to submit on all wikimedia/bots repos [21:11:48] does jouncebot run on labs? [21:11:54] YuviPanda, yepyep [21:11:56] LabsAntiSpamBot and WMIB [21:12:04] are the other bots in there [21:12:56] !log Gerrit: allowed JenkinsBot to submit patches on wikimedia/bots (and thus on all child repositories) [21:13:01] mutante: heh, could've had wikibugs and grrrit-wm at some point [21:13:01] Logged the message, Master [21:13:01] mwalker: should be good now. [21:13:26] thanks :) [21:13:35] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/89002 (owner: 10Hashar) [21:13:51] YuviPanda, to add to the confusion, there is also a mediawiki/bots repo root [21:14:02] YuviPanda: i don't know, they are likely in something not called bots :) [21:14:49] <^d> mutante: Last comment on rt 7575 was from you asking for a racktables update. Can you see if that's done? If so I think we can close the ticket. [21:16:09] ^d: ottomata's changes have been abandoned it seems .. hmm [21:16:16] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/150042 [21:16:24] well, partly [21:17:25] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet] - 10https://gerrit.wikimedia.org/r/150042 (owner: 10Hashar) [21:17:31] ^d: well, in racktables they redirect to the new host names, looks done.. 
closing [21:17:47] <^d> Hehe, they were all reallocated and renamed. [21:17:50] matanya: tabs detector seems all good :) you can announce it if not already done [21:17:54] <^d> They're not pooled but that's a separate issue. [21:18:38] hashar: yay, cool [21:18:46] that took us some time, heh [21:18:54] next step : pass puppet-lenient [21:19:29] yea, and the tab thing in DNS [21:19:45] ah now I have to refactor the job hehe [21:20:00] mutante: can you bug fill that to bugzilla under Wikimedia > Continuous Integration please ? [21:20:04] I am in meeting right now [21:20:09] and will definitely forget [21:23:40] hashar: which one? i just meant it's our next goal to also replace them all in that repo [21:25:12] mutante: oh i meant removing tabs from operations/dns.git [21:26:47] eh.. ok [21:26:51] thx [21:28:21] (03PS2) 10Dzahn: wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 [21:28:23] (03CR) 10jenkins-bot: [V: 04-1] wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 (owner: 10Dzahn) [21:30:01] (03PS3) 10Dzahn: wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 (https://bugzilla.wikimedia.org/68769) [21:30:03] (03CR) 10jenkins-bot: [V: 04-1] wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [21:30:05] :-( [21:30:19] (03PS5) 10Dzahn: Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: 10Scottlee) [21:30:21] (03CR) 10jenkins-bot: [V: 04-1] Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: 10Scottlee) [21:30:24] bah [21:30:28] repo borked [21:30:49] are you sure it's not just because they need rebasing? [21:30:59] it changes all the time and they have been in there [21:31:06] Any chance I can get someone to install gcc on stat1003. 
I'm blocked by it not being available. [21:31:09] ah yeah Gerrit says "Can Merge: No" [21:31:14] (03PS3) 10Mwalker: Give OCG admins the ability to run maintenance scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/150016 [21:31:15] stat1003.eqiad.wmnet that is [21:31:16] I am just missing the rebase button apparently [21:31:20] halfak: I can submit a patch [21:31:28] <3 YuviPanda thanks. [21:31:38] halfak: you'll need someone else to merge, though [21:31:42] I didn't even know you could have a unix machine without gcc. [21:31:44] halfak: also list all packages you want? just gcc? [21:31:51] halfak: or autotools, etc? [21:32:13] I'm pretty sure I just need gcc, but I might need python3-dev or the equivalent. [21:32:32] halfak: alright, I'll get gcc and python3-dev then [21:32:41] halfak: adding new packages is easy enoguh, so just poke me whenever [21:33:00] Thanks! [21:33:37] ori, could you help me get a simple patch merged? It'll add gcc and python3-dev to stat1003.eqiad.wmnet. [21:33:47] see YuviPanda's comments above. [21:34:44] (03PS1) 10Yuvipanda: statistics: Add packages for halfak [operations/puppet] - 10https://gerrit.wikimedia.org/r/150045 [21:34:46] halfak: ^ is the patch [21:34:49] (03PS1) 10Springle: Make persistent MariaDB config changes on labsdb1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150046 [21:35:01] halfak: can you edit the commit message to mention why you want them? that's usually useful [21:35:18] YuviPanda, not sure how to modify a commit message. [21:35:32] there's a pencil icon [21:35:33] hashar: not my call, unfortunately, but i second YuviPanda's comment, and i can also review the patch for correctness and -1/+1 as appropriate [21:35:34] halfak: there should be an edit pencil icon above it. 
[21:35:57] (03CR) 10Springle: [C: 032] Make persistent MariaDB config changes on labsdb1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150046 (owner: 10Springle) [21:35:59] halfak: or you can just tell me and I'll include it in the message [21:36:16] halfak: what ori said above :-) [21:36:30] ori: is trivial, https://gerrit.wikimedia.org/r/150045 +1 would be nice :) [21:37:02] Thanks anyway ori. :) [21:37:03] (03PS1) 10Ori.livneh: Remove resource ensure => absent'ed by Ie064cadf2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/150047 [21:37:12] halfak: usually, you can poke ottomata for analytics related puppet stuff, but he's not around atm :( [21:37:17] * halfak is trying to remember gerrit login. [21:37:26] Yeah. Went otto hunting. [21:37:28] YuviPanda: not really trivial; both of these packages are highly likely to be declared by some other role [21:37:36] YuviPanda: use the puppet compiler to check [21:37:43] halfak: same as labs login [21:37:55] mutante, thanks. [21:37:57] ori: oh, gah, forgot :| I could, however, just check on the stats machines (I have access) [21:38:06] * halfak forgets which are email address and which are not. [21:38:15] YuviPanda: that'll tell you if they're installed, not if they're declared [21:38:18] (03PS4) 10Mwalker: Give OCG admins the ability to run things as OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/150016 [21:38:33] ori: hmm, wouldn't they be declared otherwise *and* be in the same machine only if they're applied there? [21:38:41] err, *and* conflict only if they're applied there? [21:39:00] (03PS2) 10Halfak: statistics: Add packages for halfak Allows the compilation of statistics (SciPy) and parsing (mwparserfromhell) python modules in python3. [operations/puppet] - 10https://gerrit.wikimedia.org/r/150045 (owner: 10Yuvipanda) [21:39:54] ori: aha, python3-dev would've conflicted. 
[21:39:59] * YuviPanda fixes, moves into appropriate place [21:41:22] (03PS3) 10Yuvipanda: statistics: Add packages for halfak [operations/puppet] - 10https://gerrit.wikimedia.org/r/150045 [21:41:37] ori: ^ this should be better. [21:42:20] nothing should conflict unless we end up putting mwprof on stat1003, which I guess is unlikely [21:42:36] (03PS2) 10Dzahn: gender-neutral language, Dear anthropoid.. [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 [21:42:39] (03CR) 10jenkins-bot: [V: 04-1] gender-neutral language, Dear anthropoid.. [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [21:43:33] line too long? pfff [21:44:06] mutante: is that python? [21:45:14] (03PS3) 10Dzahn: gender-neutral language, Dear anthropoid.. [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 [21:45:25] yes, but it's right, the message is too long kind of :) [21:45:40] (03CR) 10Ori.livneh: [C: 032] "(trivial)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150047 (owner: 10Ori.livneh) [21:45:42] (03CR) 10TTO: "Wikipedia is a namespace name on enwiki :)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/144264 (https://bugzilla.wikimedia.org/954) (owner: 10TTO) [21:45:55] removes the "at your service" part [21:46:02] (03CR) 10Dzahn: [C: 032] gender-neutral language, Dear anthropoid.. [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [21:56:41] (03PS4) 10Dzahn: wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 (https://bugzilla.wikimedia.org/68769) [21:56:59] (03CR) 10Andrew Bogott: "Is 'Jackass' gender-neutral? I prefer my chatbots to be as rude as possible." [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [21:59:32] andrewbogott: i think jackass is gender neutral. [22:00:59] Oops, no, apparently the 'jack' in 'jackass' means male. A female donkey is a… ginnyass? 
[22:02:52] (03CR) 10Dzahn: "this was in response to https://github.com/mattofak/jouncebot/pull/4 and because we imported it from github , also fine with me to discuss" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/149202 (owner: 10Dzahn) [22:05:06] (03PS1) 10Yuvipanda: Add tox.ini and set max line length to 120 [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150056 [22:05:08] (03PS1) 10Yuvipanda: PEP8 fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150057 [22:05:10] (03CR) 10jenkins-bot: [V: 04-1] Add tox.ini and set max line length to 120 [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150056 (owner: 10Yuvipanda) [22:05:12] (03CR) 10jenkins-bot: [V: 04-1] PEP8 fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150057 (owner: 10Yuvipanda) [22:05:40] hashar: can we change flake8 to tox again here ^ [22:07:10] (03CR) 10Dzahn: [C: 032] wikimedia.org - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/148437 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [22:07:39] YuviPanda: ahhh [22:08:00] YuviPanda: it is bed time for me [22:08:07] YuviPanda: we can pair it together tomorrow :-] [22:08:07] RobH: Can you please respond to this ticket? https://rt.wikimedia.org/Ticket/Display.html?id=7983 [22:08:12] hashar: ah, ok. I'll submit the patches now, you can merge and deploy when you're up again [22:08:14] hashar: sure! [22:08:18] hashar: have a good night! :) [22:08:37] YuviPanda: I will give you a Jenkins job builder + Zuul crash course :] [22:08:40] if I haven't already [22:08:49] hashar: you had, but I had forgotten in the meantime :) [22:08:53] hashar: I will welcome it again tomorrow [22:08:54] +716,-716 on wikimedia.org zone .. 
gop [22:09:03] YuviPanda: yeah :-] [22:09:04] hashar: step 1 :p [22:09:15] mutante: \O/ [22:09:33] mutante: one day we will need some kind of integration test to ensure everything works fine [22:09:52] I though about booting gdnsd with the proposed config and spamming dig request on it asserting the result is proper [22:09:56] i had to look at that like 5 times before daring to :p [22:10:16] but if it sits longer, it's only getting worse to rebase [22:10:25] now ScottLee can try again :) [22:10:45] mutante: whenever the repo is ready, poke the bug [22:11:06] I will probably deploy the tab checker tomorrow as a non voting test [22:11:09] hashar: yep, trying to add the correct Bug line to each change [22:11:14] so switching to Jenkins whining will be quite easy to handle [22:11:15] so the bot updates it [22:11:25] cool, thx [22:11:29] (03PS1) 10Yuvipanda: Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 [22:11:31] (03CR) 10jenkins-bot: [V: 04-1] Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 (owner: 10Yuvipanda) [22:11:31] not sure you need to add the bug: field for every retab [22:11:38] andrewbogott: mutante ^ should allow us all to have messages :) [22:11:58] oh dear [22:12:38] mwalker: :) [22:12:39] (03CR) 10Dzahn: [C: 031] "hehe, +1 for the idea to randomize" [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 (owner: 10Yuvipanda) [22:14:03] YuviPanda, master of jouncebot already should be pep8 compliant [22:14:11] I merged a change earlier today for that [22:14:13] mwalker: yeah, except for the 80char thing [22:14:31] I changed it to 120 chars -- because I find that bit of pep8 stupid [22:14:46] it is not stupid :-} [22:14:46] mwalker: where did you change it? 
[22:14:57] in master; see tox.ini [22:14:57] in tox.ini [22:15:03] pep8 knows how to read from it [22:15:09] mwalker: it is usually changed in tox.ini, so I just added one and set it to 120, but then pyflake isn't picking it up [22:15:23] mwalker: uh, in master where? I don't see a tox... [22:15:27] mwalker: is it on github, and not on gerrit? [22:15:59] YuviPanda, https://git.wikimedia.org/tree/wikimedia%2Fbots%2Fjouncebot [22:16:07] mwalker: I don't see a tox.ini in the master there [22:16:49] YuviPanda: added by https://gerrit.wikimedia.org/r/#/c/150035/ [22:17:08] wat, that's just... weird [22:17:10] I'm so confused now [22:17:26] oh [22:17:28] I'm an idiot [22:17:30] nevermind me [22:17:33] git remote update && git log --all --oneline --decorate --color [22:17:37] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon 28 Jul 2014 20:16:45 UTC [22:17:50] (03PS2) 10Yuvipanda: PEP8 fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150057 [22:17:52] (03PS2) 10Yuvipanda: Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 [22:17:58] hashar: no, I didn't check for it first :) Now jenkinsbot should be happy [22:18:16] \O/ [22:18:17] mutante: mwalker ^ want to merge? [22:18:40] also write tests!!!!!!!!!! 
:D [22:19:04] (03CR) 10Mwalker: [C: 032] PEP8 fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150057 (owner: 10Yuvipanda) [22:19:06] (03Merged) 10jenkins-bot: PEP8 fixes [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150057 (owner: 10Yuvipanda) [22:20:51] (03CR) 10Mwalker: [C: 04-1] Randomly pick between multiple configurable messages (031 comment) [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 (owner: 10Yuvipanda) [22:21:34] mwalker: updated [22:21:35] (03PS3) 10Yuvipanda: Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 [22:23:49] (03CR) 10Mwalker: [C: 032] Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 (owner: 10Yuvipanda) [22:23:51] (03Merged) 10jenkins-bot: Randomly pick between multiple configurable messages [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150059 (owner: 10Yuvipanda) [22:23:54] mwalker: w00t [22:25:50] jouncebot, die [22:26:05] * mwalker needs a jouncebot restart command [22:26:09] mwalker: also some other minor code cleanup coming up [22:27:11] (03PS1) 10Yuvipanda: Use os.path.join instead of string concat [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150063 [22:27:26] now please i18n jouncebot :p [22:27:47] xD [22:27:55] And have it announce in rand() lang? [22:28:07] heh [22:28:11] no, config hash, deploy -> native_language :p [22:28:15] deployer [22:28:42] so if maxsem and roan and bd80.8 are deploying, which one will it choose? [22:28:47] (03CR) 10Mwalker: [C: 032] Use os.path.join instead of string concat [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150063 (owner: 10Yuvipanda) [22:28:47] engrusch? 
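
Note on the "Use os.path.join instead of string concat" change mentioned above: the patch itself is not quoted in this log, so the following is only a minimal sketch of the general idea, not the actual jouncebot code. The function name `config_path`, the directory and the file name are all hypothetical.

```python
import os.path


def config_path(base_dir, file_name):
    """Build a path with os.path.join rather than manual string concatenation."""
    # os.path.join inserts a separator only when one is needed, so a
    # trailing slash on base_dir does not produce a doubled separator.
    return os.path.join(base_dir, file_name)


# Manual concatenation (the pattern being replaced) doubles the slash
# whenever the directory already ends with one.
concatenated = "/srv/jouncebot/" + "/" + "config.yaml"  # hypothetical path

# The joined version handles both spellings of the directory.
joined_a = config_path("/srv/jouncebot/", "config.yaml")
joined_b = config_path("/srv/jouncebot", "config.yaml")
```

The design point is small: callers no longer have to remember whether a configured directory ends with a separator.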
[22:28:49] (03Merged) 10jenkins-bot: Use os.path.join instead of string concat [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150063 (owner: 10Yuvipanda) [22:31:03] mwalker: ty for the merge :) [22:31:09] merges [22:34:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [22:35:49] (03PS1) 10Dzahn: mediawiki.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150067 (https://bugzilla.wikimedia.org/68769) [22:35:54] (03PS1) 10Dzahn: 0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150068 (https://bugzilla.wikimedia.org/68769) [22:35:59] (03PS1) 10Dzahn: 152.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150069 [22:36:01] (03PS1) 10Dzahn: 155.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 [22:36:03] (03PS1) 10Dzahn: wikidata.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 [22:36:05] (03PS1) 10Dzahn: wikivoyage-old.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150072 [22:37:17] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jul 28 22:37:14 UTC 2014 [22:40:25] (03CR) 10Dzahn: [C: 032] contacts.wm - http->https redirect [operations/puppet] - 10https://gerrit.wikimedia.org/r/148748 (owner: 10Dzahn) [22:43:36] (03CR) 10Dzahn: "enforces https now" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148748 (owner: 10Dzahn) [22:47:23] (03PS2) 10Dzahn: mediawiki.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150067 (https://bugzilla.wikimedia.org/68769) [22:47:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [22:48:52] (03CR) 10Dzahn: [C: 032] mediawiki.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150067 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [22:52:49] (03PS2) 10Dzahn: 0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab 
[operations/dns] - 10https://gerrit.wikimedia.org/r/150068 (https://bugzilla.wikimedia.org/68769) [22:55:11] (03PS3) 10Dzahn: 0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150068 (https://bugzilla.wikimedia.org/68769) [22:58:57] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: Puppet has 29 failures [22:58:58] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 7 failures [22:59:07] PROBLEM - puppet last run on ssl1003 is CRITICAL: CRITICAL: Puppet has 12 failures [22:59:07] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Puppet has 24 failures [22:59:07] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: Puppet has 3 failures [22:59:08] PROBLEM - puppet last run on es1004 is CRITICAL: CRITICAL: Puppet has 15 failures [22:59:08] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Puppet has 9 failures [22:59:08] PROBLEM - puppet last run on db1005 is CRITICAL: CRITICAL: Puppet has 21 failures [22:59:08] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 58 failures [22:59:09] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 30 failures [22:59:09] PROBLEM - puppet last run on tmh1001 is CRITICAL: CRITICAL: Puppet has 24 failures [22:59:17] PROBLEM - puppet last run on mw1095 is CRITICAL: CRITICAL: Puppet has 62 failures [22:59:17] PROBLEM - puppet last run on fenari is CRITICAL: CRITICAL: Puppet has 18 failures [22:59:18] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 9 failures [22:59:37] PROBLEM - puppet last run on zinc is CRITICAL: CRITICAL: Puppet has 18 failures [22:59:37] PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Puppet has 22 failures [22:59:37] PROBLEM - puppet last run on virt1005 is CRITICAL: CRITICAL: Puppet has 20 failures [22:59:37] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 28 failures [22:59:37] PROBLEM - puppet last run on analytics1029 
is CRITICAL: CRITICAL: Puppet has 15 failures [22:59:38] PROBLEM - puppet last run on elastic1016 is CRITICAL: CRITICAL: Puppet has 22 failures [22:59:38] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Puppet has 21 failures [22:59:39] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 54 failures [22:59:39] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: Puppet has 54 failures [22:59:47] PROBLEM - puppetmaster backend https on palladium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error [22:59:47] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 16 failures [22:59:47] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: Puppet has 40 failures [22:59:57] PROBLEM - puppet last run on mw1073 is CRITICAL: CRITICAL: Puppet has 59 failures [22:59:57] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: Puppet has 60 failures [22:59:57] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 71 failures [22:59:57] PROBLEM - puppet last run on elastic1013 is CRITICAL: CRITICAL: Puppet has 19 failures [22:59:57] PROBLEM - puppet last run on elastic1009 is CRITICAL: CRITICAL: Puppet has 19 failures [22:59:58] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 20 failures [22:59:58] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 21 failures [23:00:07] PROBLEM - puppet last run on analytics1024 is CRITICAL: CRITICAL: Puppet has 20 failures [23:00:07] PROBLEM - puppet last run on virt1009 is CRITICAL: CRITICAL: Puppet has 20 failures [23:00:08] PROBLEM - puppet last run on mw1169 is CRITICAL: CRITICAL: Puppet has 52 failures [23:00:08] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 54 failures [23:00:08] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Puppet has 22 failures [23:00:08] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: 
Puppet has 20 failures [23:00:08] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: Puppet has 56 failures [23:00:09] PROBLEM - puppet last run on lanthanum is CRITICAL: CRITICAL: Puppet has 34 failures [23:00:09] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet has 65 failures [23:00:10] PROBLEM - puppet last run on mw1070 is CRITICAL: CRITICAL: Puppet has 63 failures [23:00:10] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: Puppet has 44 failures [23:00:11] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 20 failures [23:00:11] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 60 failures [23:00:11] (03PS30) 10Yuvipanda: Add extdist module + role for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) [23:00:12] PROBLEM - puppet last run on search1019 is CRITICAL: CRITICAL: Puppet has 40 failures [23:00:12] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 24 failures [23:00:17] PROBLEM - puppet last run on db1009 is CRITICAL: CRITICAL: Puppet has 24 failures [23:00:17] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Puppet has 21 failures [23:00:17] PROBLEM - puppet last run on elastic1010 is CRITICAL: CRITICAL: Puppet has 25 failures [23:00:17] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 55 failures [23:00:18] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet has 19 failures [23:00:18] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 18 failures [23:00:18] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 22 failures [23:00:19] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 14 failures [23:00:36] * jamesofur kicks icinga-wm [23:00:37] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 17 failures [23:00:37] PROBLEM - puppet last run on db1058 is CRITICAL: CRITICAL: 
Puppet has 19 failures [23:00:37] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: Puppet has 57 failures [23:00:37] PROBLEM - puppet last run on search1008 is CRITICAL: CRITICAL: Puppet has 46 failures [23:00:37] PROBLEM - puppet last run on es1003 is CRITICAL: CRITICAL: Puppet has 25 failures [23:00:38] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 62 failures [23:00:38] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 48 failures [23:00:39] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 30 failures [23:00:48] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: Puppet has 21 failures [23:00:57] PROBLEM - puppet last run on search1009 is CRITICAL: CRITICAL: Puppet has 42 failures [23:00:57] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 69 failures [23:00:57] PROBLEM - puppet last run on analytics1019 is CRITICAL: CRITICAL: Puppet has 17 failures [23:00:57] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Puppet has 18 failures [23:00:58] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 15 failures [23:00:58] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 16 failures [23:01:07] PROBLEM - puppet last run on labsdb1002 is CRITICAL: CRITICAL: Puppet has 18 failures [23:01:07] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 61 failures [23:01:07] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Puppet has 59 failures [23:01:08] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Puppet has 54 failures [23:01:08] PROBLEM - puppet last run on cp1057 is CRITICAL: CRITICAL: Puppet has 21 failures [23:01:08] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 36 failures [23:01:08] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: Puppet has 20 failures [23:01:09] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Puppet has 16 
failures [23:01:09] PROBLEM - puppet last run on mw1094 is CRITICAL: CRITICAL: Puppet has 55 failures [23:01:10] PROBLEM - puppet last run on mw1083 is CRITICAL: CRITICAL: Puppet has 59 failures [23:01:10] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 20 failures [23:01:11] PROBLEM - puppet last run on es1005 is CRITICAL: CRITICAL: Puppet has 18 failures [23:01:17] PROBLEM - puppet last run on search1021 is CRITICAL: CRITICAL: Puppet has 51 failures [23:01:17] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 65 failures [23:01:18] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 53 failures [23:01:18] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 20 failures [23:01:37] PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Puppet has 22 failures [23:01:38] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 50 failures [23:01:38] PROBLEM - puppet last run on mw1035 is CRITICAL: CRITICAL: Puppet has 45 failures [23:01:38] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Puppet has 23 failures [23:01:57] PROBLEM - puppet last run on db1007 is CRITICAL: CRITICAL: Puppet has 23 failures [23:01:57] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 18 failures [23:01:57] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet has 23 failures [23:01:58] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: Puppet has 18 failures [23:02:07] PROBLEM - puppet last run on wtp1021 is CRITICAL: CRITICAL: Puppet has 25 failures [23:02:07] grrr [23:02:07] PROBLEM - puppet last run on virt1002 is CRITICAL: CRITICAL: Puppet has 23 failures [23:02:07] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 22 failures [23:02:07] PROBLEM - puppet last run on mc1008 is CRITICAL: CRITICAL: Puppet has 24 failures [23:02:08] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 23 failures [23:02:08] 
PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 57 failures [23:02:08] PROBLEM - puppet last run on search1014 is CRITICAL: CRITICAL: Puppet has 52 failures [23:02:09] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 58 failures [23:02:17] PROBLEM - puppet last run on mw1096 is CRITICAL: CRITICAL: Puppet has 62 failures [23:02:17] PROBLEM - puppet last run on zirconium is CRITICAL: CRITICAL: Puppet has 43 failures [23:02:17] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 63 failures [23:02:17] PROBLEM - puppet last run on es10 is CRITICAL: CRITICAL: Puppet has 18 failures [23:02:18] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 24 failures [23:02:18] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 18 failures [23:02:32] * jamesofur is guessing that puppet didn't run correct anywhere at 4pm [23:02:37] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: Puppet has 19 failures [23:02:37] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: Puppet has 31 failures [23:02:37] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 55 failures [23:02:37] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 21 failures [23:02:37] PROBLEM - puppet last run on cp1069 is CRITICAL: CRITICAL: Puppet has 18 failures [23:02:38] PROBLEM - puppet last run on es1009 is CRITICAL: CRITICAL: Puppet has 13 failures [23:02:38] PROBLEM - puppet last run on es1006 is CRITICAL: CRITICAL: Puppet has 21 failures [23:02:39] PROBLEM - puppet last run on mw1036 is CRITICAL: CRITICAL: Puppet has 61 failures [23:02:39] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: Puppet has 55 failures [23:02:40] PROBLEM - puppet last run on db1010 is CRITICAL: CRITICAL: Puppet has 18 failures [23:02:40] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 17 failures [23:02:43] s/correct/correctly [23:02:47] 
PROBLEM - puppet last run on tarin is CRITICAL: CRITICAL: Puppet has 16 failures [23:02:47] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Puppet has 17 failures [23:02:57] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet has 59 failures [23:02:57] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 65 failures [23:02:57] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 52 failures [23:02:58] PROBLEM - puppet last run on es7 is CRITICAL: CRITICAL: Puppet has 19 failures [23:02:58] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 25 failures [23:03:07] PROBLEM - puppet last run on mc1010 is CRITICAL: CRITICAL: Puppet has 20 failures [23:03:08] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Puppet has 17 failures [23:03:08] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 56 failures [23:03:08] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: Puppet has 62 failures [23:03:08] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 67 failures [23:03:08] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 71 failures [23:03:08] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Puppet has 19 failures [23:03:09] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 58 failures [23:03:18] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 23 failures [23:03:18] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: Puppet has 17 failures [23:03:37] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: Puppet has 18 failures [23:03:37] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 20 failures [23:03:37] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: Puppet has 63 failures [23:03:38] PROBLEM - puppet last run on search1020 is CRITICAL: CRITICAL: Puppet has 44 failures [23:03:38] PROBLEM - puppet last run on 
mw1028 is CRITICAL: CRITICAL: Puppet has 51 failures [23:03:38] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 55 failures [23:03:48] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet has 22 failures [23:03:48] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: Puppet has 20 failures [23:03:57] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Puppet has 20 failures [23:03:57] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 60 failures [23:03:57] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 23 failures [23:03:57] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: Puppet has 22 failures [23:03:57] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Puppet has 24 failures [23:03:58] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Puppet has 23 failures [23:03:58] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 19 failures [23:03:59] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 25 failures [23:04:07] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Puppet has 23 failures [23:04:07] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Puppet has 19 failures [23:04:07] PROBLEM - puppet last run on pc1001 is CRITICAL: CRITICAL: Puppet has 31 failures [23:04:07] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 51 failures [23:04:07] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Puppet has 23 failures [23:04:08] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: Puppet has 19 failures [23:04:08] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Puppet has 21 failures [23:04:09] PROBLEM - puppet last run on es1001 is CRITICAL: CRITICAL: Puppet has 19 failures [23:04:09] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Puppet has 20 failures [23:04:10] PROBLEM - puppet last run on mw1031 is 
CRITICAL: CRITICAL: Puppet has 66 failures [23:04:10] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: Puppet has 52 failures [23:04:10] * ebernhardson wonders if a 'message repeated' helper might be usefull ;) [23:04:11] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: Puppet has 49 failures [23:04:32] well, it's not repeated, different number of fails :p [23:04:36] :P [23:04:39] looks like the puppetmaster again [23:04:50] but also like somebody already fixed [23:05:13] arguably puppet should fail differently if the master is completely failed, and we should then pick up on that difference and not report it as a client machine problem [23:05:18] if only we lived in such a world [23:05:47] no, the master still failed [23:05:50] but different from last time [23:05:56] (03CR) 10Legoktm: [C: 031] "Working great on extdist-test.wmflabs.org" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [23:06:22] !log graceful apache on palladium [23:06:28] now it should work again.. yep [23:06:28] Logged the message, Master [23:06:33] just saying, there's no point in the puppet client reporting "49 failures" where there's really one failure. it should stop and report "the puppetmaster is dead" [23:06:49] (and then we could filter that on the client checks) [23:06:51] bblack: yea, true [23:07:33] the actual message we wanted is in there [23:07:47] 16:01 <+icinga-wm> PROBLEM - puppetmaster backend https on palladium is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8141: HTTP/1.1 500 Internal Server Error [23:08:01] but need to get rid of the other ones in that case.. 
somehow [23:08:46] ~ http://docs.icinga.org/latest/en/dependencies.html [23:10:11] (03CR) 10Andrew Bogott: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [23:10:31] (03CR) 10Andrew Bogott: [C: 032] Add extdist module + role for labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/149486 (https://bugzilla.wikimedia.org/68609) (owner: 10Yuvipanda) [23:12:28] !log temp. disabled puppet on neon and ircecho [23:12:36] Logged the message, Master [23:15:26] greg-g: Who's doing SWAT deployment today? [23:15:38] ^ [23:15:52] jouncebot.. where art thou [23:16:01] looks like either Reedy or manybubbles [23:18:57] mutante: jouncebot never came back after mwalker|away asked it to die :( [23:19:06] I guess one of the patches broke it [23:19:36] (03CR) 10Scottlee: [C: 031] 152.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150069 (owner: 10Dzahn) [23:20:31] mutante: fixing [23:20:54] actually, I was looking at the wrong time slot. Looks like it's either mwalker|away or ori (Max is out) [23:21:10] mutante: there we go [23:21:10] i'll do it [23:21:15] kaldari: sorry for the delay [23:21:17] jouncebot: hi [23:21:22] ori: NP :) [23:21:42] !log Updated /srv/jobrunner to 0bb0ad62dd9240e0f67b2ded4519f125de13dfbc [23:21:47] Logged the message, Master [23:21:50] jouncebot: next [23:21:50] In 15 hour(s) and 38 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140729T1500) [23:22:05] jouncebot: last [23:22:13] wishful thinking [23:22:31] (03PS1) 10Yuvipanda: Put quotes around strings in YAML that start with % [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150081 [23:22:42] mutante: ^ wanna merge? [23:22:42] bd808: something like "and powering down the last server. End of an era. Godspeed Google Knowledge." 
[23:23:09] greg-g: sad but funny [23:23:13] (03PS2) 10Ori.livneh: Allow crats to add/remove petitiondata group on foundationWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149398 (https://bugzilla.wikimedia.org/68587) (owner: 10Jalexander) [23:23:43] (03CR) 10Ori.livneh: [C: 032] Allow crats to add/remove petitiondata group on foundationWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149398 (https://bugzilla.wikimedia.org/68587) (owner: 10Jalexander) [23:23:47] (03Merged) 10jenkins-bot: Allow crats to add/remove petitiondata group on foundationWiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/149398 (https://bugzilla.wikimedia.org/68587) (owner: 10Jalexander) [23:23:54] (03CR) 10Scottlee: [C: 04-1] "Please check line 32." (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 (owner: 10Dzahn) [23:23:56] (03PS2) 10Ori.livneh: Switched to JSON-based jobrunner.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/150024 (owner: 10Aaron Schulz) [23:24:35] (03PS1) 10Yuvipanda: Implement last command (per greg-g) [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150082 [23:24:48] (03CR) 10Ori.livneh: [C: 032 V: 032] "(straightforward)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150024 (owner: 10Aaron Schulz) [23:25:15] Thanks ori [23:25:29] bd808: greg-g you can +1 https://gerrit.wikimedia.org/r/150082 if you want that kinda 'last' response :) [23:25:35] !log ori Synchronized wmf-config/InitialiseSettings.php: I369dbad6e: Allow crats to add/remove petitiondata group on foundationWiki (duration: 00m 04s) [23:25:40] YuviPanda: :( [23:25:41] Logged the message, Master [23:26:10] YuviPanda: I think what I really wanted was a "previous" command [23:26:15] greg-g: :) I can probably implement 'last' by looking up the deployments page for it, but only after a few days :) [23:26:17] bd808: :) [23:27:26] ugh. python is not json. 
All hash items should end with a trailing comma [23:27:55] bd808: +1 [23:28:05] NOOOOOOOOOOOOO [23:28:06] but fine [23:28:35] it feels like ending a sentence with a comma and then a fullstop to me [23:28:35] YuviPanda: prevents dumb problems for sorting the source code and dirty diffs like that one [23:28:47] (03CR) 10Scottlee: "Missed line 36." (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 (owner: 10Dzahn) [23:28:50] bd808: true, but.. but... [23:29:00] bd808: in that way, we shouldn't be aligning arrows inpuppet :) [23:29:09] bd808: far larger dirty diffs than just one extra line [23:29:22] agreed. I didn't draft that standard [23:29:44] heh :) [23:30:13] icinga back to 5 CRITs, letting the bot back in [23:30:22] (03CR) 10Scottlee: [C: 031] 0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150068 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [23:30:32] (03PS1) 10Ori.livneh: Fix-up for I2508a0cce: correct path to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150084 [23:30:41] (03PS2) 10Ori.livneh: Fix-up for I2508a0cce: correct path to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150084 [23:30:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [23:31:04] (03CR) 10Dzahn: wikidata.org - retab (031 comment) [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 (owner: 10Dzahn) [23:31:17] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 63 failures [23:32:17] (03CR) 10Scottlee: "Lots of alignment issues." 
(033 comments) [operations/dns] - 10https://gerrit.wikimedia.org/r/150072 (owner: 10Dzahn) [23:32:31] (03PS2) 10BBlack: remove pmtpa section from Round Robin LVS records [operations/dns] - 10https://gerrit.wikimedia.org/r/143201 (owner: 10Dzahn) [23:32:35] (03CR) 10Dzahn: [C: 031] "root@mw1001:/etc/jobrunner# ls" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150084 (owner: 10Ori.livneh) [23:33:14] !log ori Synchronized php-1.24wmf15/extensions/VisualEditor: Update VisualEditor to I944f8fbfa (duration: 00m 04s) [23:33:20] mutante: thanks [23:33:21] Logged the message, Master [23:34:02] James_F: looks good? [23:34:55] (03CR) 10Dzahn: "actually it was on purpose as mentioned on the wikimedia.org change, but fine, i _can_ also do them in one" [operations/dns] - 10https://gerrit.wikimedia.org/r/150072 (owner: 10Dzahn) [23:35:00] (03PS3) 10Ori.livneh: Fix-up for I2508a0cce: correct path to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150084 [23:35:02] mwalker|away: btw, there's a live hack in the jouncebot code to make it come back. I added a patch to fix it instead [23:35:19] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for I2508a0cce: correct path to config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150084 (owner: 10Ori.livneh) [23:37:25] (03CR) 10BBlack: "I've rebased this and reviewed it thoroughly. None of these addresses are responsive, and none of these names are referenced elsewhere in" [operations/dns] - 10https://gerrit.wikimedia.org/r/143201 (owner: 10Dzahn) [23:38:36] (03CR) 10Dzahn: "oh, thanks very much Brandon, i think the FR question would be one for JeffGreen" [operations/dns] - 10https://gerrit.wikimedia.org/r/143201 (owner: 10Dzahn) [23:39:26] (03PS2) 10BBlack: 152.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150069 (owner: 10Dzahn) [23:40:28] (03CR) 10BBlack: [C: 031] "Confirmed whitespace-only (does not change meaning)." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/150069 (owner: 10Dzahn) [23:40:39] (03PS2) 10BBlack: 155.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 (owner: 10Dzahn) [23:41:15] !log ori Started scap: I42c07b64: Update MobileFrontend [23:41:20] Logged the message, Master [23:42:49] ori: Hey, sorry. [23:43:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [23:44:19] (03PS3) 10Dzahn: 155.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 [23:44:20] ori: Works great, thanks. [23:44:41] cool [23:45:19] (03PS5) 10Tim Landscheidt: Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 [23:46:34] (03Abandoned) 10Ori.livneh: wmflib: add ensure_service() [operations/puppet] - 10https://gerrit.wikimedia.org/r/149778 (owner: 10Ori.livneh) [23:48:12] (03PS6) 10Tim Landscheidt: Tools: Puppetize toolwatcher [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 [23:48:56] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only, merging." [operations/dns] - 10https://gerrit.wikimedia.org/r/150068 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [23:48:58] (03CR) 10Scottlee: [C: 031] 155.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 (owner: 10Dzahn) [23:49:28] (03CR) 10BBlack: [C: 032] 152.80.208.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150069 (owner: 10Dzahn) [23:49:54] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150070 (owner: 10Dzahn) [23:50:30] (03CR) 10Ori.livneh: Tools: Puppetize toolwatcher (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [23:51:22] kaldari: the log message is stalled, but scap is done [23:51:28] kaldari: can you confirm everything's ok? [23:51:32] ori: Tahnks!!1 [23:51:42] yes, checking now... 
[23:51:42] np [23:53:29] ori: everything looks good! [23:55:34] * ^d thwacks elasticsearch with a 2x4 [23:55:36] <^d> Stupid. [23:55:57] <^d> Server joins the cluster. Sends 20 shards of enwiki to it. [23:58:52] !log ori Finished scap: I42c07b64: Update MobileFrontend (duration: 17m 37s) [23:58:58] Logged the message, Master