[00:18:29] (Restored) Ori.livneh: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - https://gerrit.wikimedia.org/r/150992 (owner: Ori.livneh)
[00:19:03] ^ TimStarling
[00:19:13] I abandoned it because I thought I was getting a little ahead of myself
[00:54:06] (CR) Ori.livneh: "Another approach would be to have a PHP file that exercises hot code paths, and to set it as the StartupDocument runtime option. See PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:28:51] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[02:06:53] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3612 MB (3% inode=99%):
[02:16:20] !log LocalisationUpdate completed (1.24wmf20) at 2014-09-22 02:16:20+00:00
[02:16:35] Logged the message, Master
[02:29:09] !log LocalisationUpdate completed (1.24wmf21) at 2014-09-22 02:29:09+00:00
[02:29:16] Logged the message, Master
[02:29:32] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Epic puppet fail
[02:41:29] !log LocalisationUpdate completed (1.24wmf22) at 2014-09-22 02:41:29+00:00
[02:41:36] Logged the message, Master
[02:48:42] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[03:00:34] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:36:40] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 22 03:36:40 UTC 2014 (duration 36m 39s)
[03:36:47] Logged the message, Master
[05:38:32] (PS1) Yurik: Enable Graph ext on mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/161908
[06:28:35] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Epic puppet fail
[06:29:37] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:14] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:24] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:25] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:44] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:44] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:54] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:06] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:06] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:14] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:16] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:56] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:39:08] (PS1) Tim Starling: Fix profiling error CommonSettings.php-skin-include1 [mediawiki-config] - https://gerrit.wikimedia.org/r/161911
[06:41:18] (CR) Ori.livneh: [C: 1] Fix profiling error CommonSettings.php-skin-include1 [mediawiki-config] - https://gerrit.wikimedia.org/r/161911 (owner: Tim Starling)
[06:45:24] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:45:44] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:45:44] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:45:55] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:46:28] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:46:30] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:47:02] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:47:24] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:48:54] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:54:14] RECOVERY - puppet last run on db1038 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:07:15] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[07:11:25] (PS2) Ori.livneh: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - https://gerrit.wikimedia.org/r/150992
[07:19:34] (CR) Qgil: "I guess we are abandoning this patch, and the other ones related with Bugzilla and still open?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/107410 (owner: Aklapper)
[07:22:24] <_joe_> ori: never noticed that patch; it's something I thought we should do in fact
[07:22:47] <_joe_> not sure if we'd need to clean up the bytecode repo as well
[07:25:16] that would defeat the purpose of a persistent repo
[07:25:33] anything that requires cleaning up the repo to fix is a serious bug and should be fixed upstream promptly, imo
[07:25:49] <_joe_> I tend to agree
[07:26:00] don't merge the patch above tho, i want to tweak it a little
[07:26:01] <_joe_> btw we're using a quasi-persistent storage at the moment
[07:26:16] <_joe_> meaning that the repo code is in-memory using tmpfs
[07:30:00] <_joe_> ori: I was intending to tweak that as well :)
[07:30:33] <_joe_> I have a couple of things more to figure out as well
[07:30:44] <_joe_> like, does hhvm resolve symlinks?
[07:34:16] <_joe_> (we have uncountable links to the same files in different docroots)
[07:34:32] it does
[07:34:43] <_joe_> I was almost sure
[07:34:50] let me know what you'd like me to fix and i'll amend the patch
[07:35:18] <_joe_> I think the way you call curl could be improved
[07:36:08] <_joe_> also, I think there is a way to pass curl more than one url at a time
[07:36:56] (CR) Filippo Giunchedi: [C: 1] wmflib: add to_milliseconds() / to_seconds() [puppet] - https://gerrit.wikimedia.org/r/159692 (owner: Ori.livneh)
[07:37:00] greetings
[07:37:00] <_joe_> but I'd have to check the manual
[07:37:10] <_joe_> ciao :)
[07:37:36] ciao _joe_
[07:38:48] ori: I'll try to merge https://gerrit.wikimedia.org/r/#/c/161332/ and possibly https://gerrit.wikimedia.org/r/#/c/133274/ today, worst case tomorrow
[07:39:05] <_joe_> \o/
[07:39:19] <_joe_> (grafana makes me happy)
[07:39:52] hehe
[07:40:37] <_joe_> we need to get rid of ganglia; we won't be able to do that if we don't have a decent frontend for graphite
[07:40:39] godog: morning! awesome, thanks!
[07:40:45] <_joe_> and a better graphite install as well
[07:44:25] ori: np!
[07:44:52] <_joe_> it is unbearably hot here
[07:45:34] heh, after looking again at the whisper code if we want to keep it SSDs might be worth it, the end result is basically ~1 seek for each metric received which isn't very nice
[07:48:38] "if we want to keep it" -- what else would we do? switch to megacarbon, opentsdb, or influxdb?
[07:50:52] <_joe_> opentsdb is sexy
[07:51:26] <_joe_> influxdb last I checked was utterly immature and full of bugs
[07:51:32] <_joe_> but it was like march
[07:51:38] <_joe_> before joining the foundation
[07:52:01] <_joe_> wow it's almost 6 months I'm here
[07:58:44] (PS1) Giuseppe Lavagetto: Add public alias for config-master. [dns] - https://gerrit.wikimedia.org/r/161913
[07:58:51] ori: not much available right now indeed, I wanted to try backing a graphite-api with opentsdb but haven't found much time to experiment
[07:59:28] <_joe_> godog: If I understand correctly, we'd need to write the "glue" by ourselves, right?
[08:01:26] _joe_: yep, it could work 'out of the box' but what's needed I think is having the data pre-aggregated too otherwise there's no chance it'll work
[08:04:02] _joe_: happy half year birthday!
[08:06:48] (PS1) Giuseppe Lavagetto: config-master: serve through misc-web-lb [puppet] - https://gerrit.wikimedia.org/r/161915
[08:09:28] (PS2) Giuseppe Lavagetto: Remove the last references to pybal on fenari [puppet] - https://gerrit.wikimedia.org/r/160467
[08:14:17] (PS3) Ori.livneh: HHVM: warm up the JIT by making web requests in Upstart post-start [puppet] - https://gerrit.wikimedia.org/r/150992
[08:56:59] _joe_: interested in hhvm feedback ?
[09:34:17] <_joe_> matanya: of course I am
[09:35:00] _joe_: users report on slower networks css is not loading on time, but only on hhvm, disabling hhvm css loads ok
[09:36:56] <_joe_> matanya: reason can be they're not cached (still)
[09:37:02] in the user words: works as advertised, hhvm is faster, but maybe due to not loading css :)
[09:37:03] <_joe_> but I'd need URLs
[09:37:11] give a amsec
[09:37:27] * me a sec
[09:37:31] <_joe_> also, file a bug :)
[09:37:39] !log Jenkins: deleting old mediawiki extensions jobs (rm -fR /var/lib/jenkins/jobs/*testextensions-master). They are no more triggered and superseded by the *-testextension jobs.
[09:37:44] Logged the message, Master
[09:38:30] will do, thanks _joe_
[09:43:00] uselessd ftw
[09:55:43] !log restarting jenkins
[09:55:48] Logged the message, Master
[09:58:26] _joe_: fwiw https://he.wikipedia.org/wiki/%D7%99%D7%95%D7%95%D7%9F_%D7%94%D7%90%D7%A8%D7%9B%D7%90%D7%99%D7%AA is an example
[10:00:29] <_joe_> matanya: I see that perfectly
[10:00:39] the bug ?
[10:00:46] or the page?
[10:00:48] <_joe_> no this url
[10:00:51] <_joe_> this page
[10:01:05] me too, but i'm on a very fast network
[10:01:17] he said it happens only on slow networks
[10:01:44] <_joe_> Maybe he has some cookies so that the urls are called with debug=true?
[10:02:28] i doubt it, but i can ask
[10:02:43] <_joe_> btw, full page rendering in an incognito window so without hhvm took me a fat 2 s more
[10:03:09] <_joe_> (yes I cleared the cache before doing the measurement)
[10:03:47] (CR) Aklapper: "I'd do that once BZ is entirely gone" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/107410 (owner: Aklapper)
[10:04:11] yes, hhvm is way faster for me too
[10:04:13] !log Jenkins back and fully operational
[10:04:19] Logged the message, Master
[10:08:48] _joe_: breadcrumbs :D https://lists.wikimedia.org/pipermail/wikimedia-l/2014-September/074713.html
[10:09:31] would be nice to have a list of pages which currently give 504 on save and see how well hhvm performs there
[10:09:56] <_joe_> 504?
[10:10:06] <_joe_> I'd have thought 503
[10:10:46] iirc that page used to give me 504
[10:13:16] I wish there was a way to enable beta features globally, going to 700~ sites to enable hhvm is annoying
[10:13:58] <_joe_> matanya: it is
[10:15:29] matanya: poke platform to have user preference added to CentralLogin or something similar
[10:16:04] hashar: i'm still working on SUL, can't poke them before i finish my part :)
[10:16:31] :D
[10:17:09] but yeah, the interface looks different on every site i visit due to old css/js i once istalled/gadgets/beta features
[10:17:37] very confusing, but don't have the time to clean up, wish is was global
[10:17:54] * it
[10:19:38] <_joe_> matanya: I kinda agree; I have to say that multi-wiki always-logged-in editors are probably a small subset even of the editors community
[10:19:55] <_joe_> so probably not that important :)
[10:20:02] i know i'm a unicorn :D
[10:20:10] <_joe_> and I also see some implementation issues
[10:20:36] <_joe_> given not every mediawiki install we have is at the same version, has the same config, etc etc
[10:21:05] <_joe_> matanya: most WMF staff is constantly looking at a lot of wikipedias, so you're not alone :)
[10:21:07] yeah, i know that, just general start of the week ranting
[10:21:22] <_joe_> eheh I'm as annoyed as you
[10:21:55] btw, _joe_ that is a good point, most WMF staff is constantly looking at a lot of wikipedias <-- what about other projects?
[10:22:11] (PS1) Nemo bis: [Planet Wikimedia] Add Maria Sefidari to English [puppet] - https://gerrit.wikimedia.org/r/161921
[10:22:38] yay, git is broken on this machine :/
[10:22:41] <_joe_> I'm using https://meta.wikimedia.org/wiki/User:GLavagetto_(WMF)/global.js to trick the system
[10:22:52] i'll clone a new clone
[10:23:06] <_joe_> matanya: other projects as well
[10:23:47] not doable, but thanks :) https://he.wikipedia.org/wiki/%D7%9E%D7%A9%D7%AA%D7%9E%D7%A9:Matanya/common.js
[10:24:19] RTL will break big time on sites like en,commons or meta
[10:24:35] <_joe_> yep
[10:25:09] code
[10:25:14] i'll go fix some scoping instead of ranting, more productive
[10:25:17] heading lunch
[10:25:27] <_joe_> :))
[10:29:23] * YuviPanda|away waves at _joe_
[10:32:47] * _joe_ hides
[10:33:19] <_joe_> YuviPanda: just joking... I'll take a look
[10:33:26] _joe_: :)
[10:34:06] (CR) Giuseppe Lavagetto: [C: 2] Add public alias for config-master. [dns] - https://gerrit.wikimedia.org/r/161913 (owner: Giuseppe Lavagetto)
[10:35:48] (CR) Giuseppe Lavagetto: [C: 2] config-master: serve through misc-web-lb [puppet] - https://gerrit.wikimedia.org/r/161915 (owner: Giuseppe Lavagetto)
[10:35:59] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[10:36:31] <_joe_> mmmh a 404 spike as well
[10:36:53] <_joe_> someone doing some very lame crawling maybe
[10:42:18] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:42:51] <_joe_> deh
[10:42:51] (PS1) Giuseppe Lavagetto: misc-varnish: add palladium [puppet] - https://gerrit.wikimedia.org/r/161922
[10:42:53] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:43:09] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:43:26] (CR) Giuseppe Lavagetto: [C: 2 V: 2] misc-varnish: add palladium [puppet] - https://gerrit.wikimedia.org/r/161922 (owner: Giuseppe Lavagetto)
[10:45:18] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[10:48:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[10:51:13] _joe_: me again, templates/varnish/upload-frontend.inc.vcl.erb doesn't seem to be called from anywhere, but i'm sure it is. can you please enlight me ?
[10:52:19] <_joe_> mmh I need to look at it
[10:52:40] <_joe_> but I'd say from the varnish class via the role/cache.pp file
[10:52:45] <_joe_> but lemme check
[10:54:47] <_joe_> varnish::instance { 'upload-frontend': in role/cache.pp
[10:55:39] but where is the template being called from ?
[10:56:05] vcl => ?
[10:56:45] (PS9) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[10:57:46] <_joe_> yes
[10:57:56] <_joe_> YuviPanda: give me anouther 10 minutes
[10:58:10] _joe_: sure! I just added the shinken bits, testing now
[11:00:29] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[11:00:55] (PS3) Giuseppe Lavagetto: Remove the last references to pybal on fenari [puppet] - https://gerrit.wikimedia.org/r/160467
[11:01:09] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[11:01:33] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Remove the last references to pybal on fenari [puppet] - https://gerrit.wikimedia.org/r/160467 (owner: Giuseppe Lavagetto)
[11:02:35] (PS10) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[11:11:23] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[11:11:38] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[11:12:18] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[11:12:19] <_joe_> ^^ that was the dumbest part of me
[11:12:48] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[11:13:03] (CR) Giuseppe Lavagetto: [C: -1] "LGTM in general; just a couple of comments;" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/161478 (owner: Yuvipanda)
[11:13:56] _joe_: bah, I've git set to strip all trailing spaces before commit, so I presumed that would pick it up. apparently it doesn't pick it up if you do a git add on a new file
[11:14:57] <_joe_> YuviPanda: eheh that was a lame correction honestly
[11:15:07] <_joe_> the other two remarks are more interesting
[11:15:19] I find them useufl anyway :)
[11:15:24] <_joe_> YuviPanda: I'm the one usually spreading whitespaces around, just because I'm lazy
[11:16:45] _joe_: I'm wary of collecting things around, since collecting is a bit bogus on labs. It'll probably work here, but I prefer this method (more explicit and undistributed)
[11:16:59] <_joe_> ok that's ok
[11:17:06] <_joe_> I was doubtful anyway
[11:17:08] <_joe_> :)
[11:17:24] <_joe_> oh, move check_ganglia there as well then :)
[11:17:42] <_joe_> (and maybe while we're at it, let's kill check_ganglia ASAP)
[11:17:47] (PS11) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[11:18:02] _joe_: I'm going to move them all (we have a number of custom checks) in followup patches
[11:18:44] <_joe_> ok! thanks!
[11:19:02] _joe_: wondering if I should do one patch per check, or just move 'em all in one go
[11:19:21] hmm, doing it in a couple of patches (not one per patch, but not all in one go) might make sense
[11:20:01] <_joe_> one per patch is advisable
[11:20:06] hmm, ok
[11:20:11] _joe_: want to merge the current ones?
[11:20:12] <_joe_> I've got no issues merging
[11:20:15] <_joe_> yep
[11:20:18] whee
[11:20:21] <_joe_> give me 5 mins
[11:20:23] sure
[11:20:27] <_joe_> I'm patching hhvm atm
[11:21:57] cool
[11:22:30] _joe_: actually, hold off a moment, it seems to have messed up shinken
[11:22:48] <_joe_> ok
[11:23:04] <_joe_> it will take me more than 5 minutes it seems
[11:23:15] heh
[11:27:41] (PS1) Matanya: udp2log: qualify vars [puppet] - https://gerrit.wikimedia.org/r/161926
[11:27:44] (PS12) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[11:27:46] (PS5) Yuvipanda: nagios_common: Extract user_definitions from icinga [puppet] - https://gerrit.wikimedia.org/r/161446
[11:30:46] _joe_: ok, fixed
[11:40:14] (PS1) Gerrit Patch Uploader: Set wgUploadNavigationUrl for eowiki to localized version of UploadWizard [mediawiki-config] - https://gerrit.wikimedia.org/r/161928 (https://bugzilla.wikimedia.org/69055)
[11:40:19] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/161928 (https://bugzilla.wikimedia.org/69055) (owner: Gerrit Patch Uploader)
[11:43:44] <_joe_> YuviPanda: at this point, let's wait after lunch
[11:43:51] _joe_: sure
[11:55:50] (PS2) Steinsplitter: Set wgUploadNavigationUrl for eowiki to localized version of UploadWizard [mediawiki-config] - https://gerrit.wikimedia.org/r/161928 (https://bugzilla.wikimedia.org/69055) (owner: Gerrit Patch Uploader)
[12:10:02] !log shutdown of db1050 to install trusty
[12:10:07] Logged the message, Master
[12:12:49] PROBLEM - Host db1050 is DOWN: PING CRITICAL - Packet loss = 100%
[12:18:08] RECOVERY - Host db1050 is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms
[12:39:13] (CR) Alexandros Kosiaris: [C: 2] Fix typo in mathoid LVS group description [puppet] - https://gerrit.wikimedia.org/r/161609 (owner: Catrope)
[12:42:17] (PS13) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[12:42:19] (PS1) Yuvipanda: nagios_common: Move check_dsh_groups into module [puppet] - https://gerrit.wikimedia.org/r/161934
[13:11:26] (PS1) Giuseppe Lavagetto: backport of https://github.com/facebook/hhvm/pull/3811/ [debs/hhvm] - https://gerrit.wikimedia.org/r/161936
[13:11:44] <_joe_> Is there some proficient c++ coder around?
[13:11:52] <_joe_> if so, please take a look at ^^
[13:12:20] <_joe_> I think I backported tim's PR correctly, but given I'm basically an illiterate, I'd use a review
[13:12:46] <_joe_> s/proficient/not completely ignorant as myself/
[13:16:19] * YuviPanda looked, but then remembered he hadn't written any real C++ ever, so refrains
[13:19:06] ( pasting from the labs channel ) where can I find the apparmor logs in my labs instance ?
[13:19:37] tonythomas: https://wiki.ubuntu.com/DebuggingApparmor
[13:19:59] _joe_: I'll take a look
[13:21:00] <_joe_> Coren: so, I'm working on the 3.3.0 branch and tim worked on HEAD, which was heavily changed
[13:22:41] * Coren loves the comment in the original code -- Nobody will ever do *that* is _always_ false.
[13:23:13] YuviPanda: ok. found the log in /var/log/kern.log. Thanks
[13:23:18] cool
[13:23:39] Coren: I've written such comments on Android which are true!
[13:24:08] mostly to catch a checked exception which says 'I do not know any encoding called utf-8', which can never happen (Android guarantees that)
[13:24:26] also for a 'package not found' exception, when the name I'm passing it is the name of the currently running package
[13:27:12] (CR) coren: [C: 1] "This should be functionally equivalent to Tim's patch on github, provided the JIT classes have no significant semantic differences between" [debs/hhvm] - https://gerrit.wikimedia.org/r/161936 (owner: Giuseppe Lavagetto)
[13:27:35] (PS2) Yuvipanda: nagios_common: Move check_dsh_groups into module [puppet] - https://gerrit.wikimedia.org/r/161934
[13:27:57] _joe_: Insofar as I can tell, your patch does the right thing to match Tim's; but we presume their class that actually constructs the code is semantically the same.
[13:28:29] And I have no way to tell how stable that api has been between the versions.
[13:29:29] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail
[13:29:39] _joe_: I've to go for a bit in about 30min, think you will have time to merge the nagios_common stuff before that?
[13:30:00] <_joe_> YuviPanda: I'll take a look
[13:30:04] cool
[13:30:14] <_joe_> Coren: it's 5 days of difference, really
[13:30:14] I also presume that unlikelyIfThenElse is equivalent to unlikelyCond (which seems reasonable)
[13:30:19] <_joe_> yep
[13:30:34] <_joe_> they changed the name, but AFAICT not the signature of the function
[13:31:06] Yeah; that's what I figured too. The name just sucked. :-)
[13:33:06] <_joe_> actually, I was wrong
[13:33:30] <_joe_> https://github.com/facebook/hhvm/blob/master/hphp/runtime/vm/jit/code-gen-x64.cpp#L195 vs https://github.com/facebook/hhvm/blob/HHVM-3.3/hphp/runtime/vm/jit/code-gen-x64.cpp#L159
[13:34:22] <_joe_> so, I need to add it
[13:34:25] <_joe_> meh
[13:35:16] You mean because of the return value?
[13:35:34] <_joe_> yep
[13:36:11] Tim's not /using/ the rv in his patch.
[13:37:00] Oh! But there's an extra Vreg.
[13:37:13] <_joe_> yep :)
[13:38:16] <_joe_> re-working it
[13:38:20] Bleh; so the functions are only /mostly/ equivalent.
[13:39:04] * Coren tries to figure out why the hell the function would return one of its arguments.
[13:39:06] <_joe_> I got confused because they converted themselves some unlikelyIfThenElse to unlikelyCond
[13:39:24] <_joe_> because it modifies it internally AFAICS
[13:39:47] Sure, but it's passed by ref.
[13:44:31] <_joe_> anyway, correcting that
[13:47:48] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[13:51:36] (PS1) Yuvipanda: nagios_common: Move wikidata check to a check_command define [puppet] - https://gerrit.wikimedia.org/r/161939
[13:52:28] (PS2) Yuvipanda: nagios_common: Move wikidata check to a check_command define [puppet] - https://gerrit.wikimedia.org/r/161939
[13:53:37] <_joe_> YuviPanda: I'll get to you eventually
[13:53:51] _joe_: :) I'm just moving check commands one by one.
[13:57:05] (PS2) Giuseppe Lavagetto: backport of https://github.com/facebook/hhvm/pull/3811/ [debs/hhvm] - https://gerrit.wikimedia.org/r/161936
[13:57:16] <_joe_> Coren: ^^
[13:57:24] (PS3) Yuvipanda: nagios_common: Move wikidata check into module [puppet] - https://gerrit.wikimedia.org/r/161939
[13:57:26] (PS1) Yuvipanda: nagios_common: Move check_cert into module [puppet] - https://gerrit.wikimedia.org/r/161942
[13:58:34] Ah, you went with bringing in the templated function in rather than rework to use unlikelyIfThenElse?
[13:58:54] <_joe_> Coren: yep :)
[13:59:01] <_joe_> can't harm us I guess
[14:02:48] (CR) coren: [C: 1] "With the same caveat that this relies on the JIT semantics not having changed too much." [debs/hhvm] - https://gerrit.wikimedia.org/r/161936 (owner: Giuseppe Lavagetto)
[14:03:48] <_joe_> Coren: I guess if that's wrong I'
[14:03:54] <_joe_> ll notice runnign tests
[14:04:03] Presumably. :-)
[14:04:36] (PS3) Giuseppe Lavagetto: backport of https://github.com/facebook/hhvm/pull/3811/ [debs/hhvm] - https://gerrit.wikimedia.org/r/161936
[14:04:54] <_joe_> I also added the test as a quick one, so that we'll run it when building the package
[14:05:11] Better than the one in the original patch too. :-)
[14:05:13] (PS1) Yuvipanda: nagios_common: Move check_solr into module [puppet] - https://gerrit.wikimedia.org/r/161943
[14:08:44] * YuviPanda goes afk for a bit, will be back soon
[14:10:41] mark__: Can you help me out with a networking issue? I need access between labcontrol2001 and virt1000 on ports 4444 and 8989.
[14:10:56] Those ports are defined in iptables, but I don't see them actually used anyplace...
[14:12:50] !log Jenkins deleted job mediawiki-core-lint , replaced by mediawiki-core-phplint
[14:12:56] Logged the message, Master
[14:13:02] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478 (owner: Yuvipanda)
[14:13:45] _joe_: yay. it is dependent on another patch tho
[14:13:57] https://gerrit.wikimedia.org/r/#/c/161446/5
[14:14:16] <_joe_> ugh YuviPanda
[14:14:22] <_joe_> I hate dependent patches :P
[14:14:37] _joe_: :D I quite like them! all the patches are in a series of dependent patches :)
[14:14:50] I stopped using git review because it sucks with dependent patches
[14:15:22] _joe_: we should just merge the patch it depends on, run on neon and see if something breaks
[14:15:28] (PS6) Giuseppe Lavagetto: nagios_common: Extract user_definitions from icinga [puppet] - https://gerrit.wikimedia.org/r/161446 (owner: Yuvipanda)
[14:16:57] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Extract user_definitions from icinga [puppet] - https://gerrit.wikimedia.org/r/161446 (owner: Yuvipanda)
[14:17:22] (PS14) Yuvipanda: nagios_common: Refactor custom command definitions [puppet] - https://gerrit.wikimedia.org/r/161478
[14:17:27] I might have killed jenkins
[14:17:37] _joe_: ^ needs another +2 (because of the rebase)
[14:17:41] <_joe_> hashar: about time!
[14:17:49] I'll rebase all the patches that depend on it locally once that gets merged
[14:17:53] <_joe_> YuviPanda: yeah lemme verify the preceding one first
[14:17:59] _joe_: ah cool
[14:18:04] should be a no-op
[14:18:04] _joe_: I have more or less a way to replace jenkins entirely :-D
[14:18:13] _joe_: zuul -> gearman -> ansible :-D
[14:18:28] ansible?
[14:18:33] <_joe_> not sure that's a good idea
[14:18:46] mmm, so we'll have salt, puppet and ansible? :D
[14:19:10] (PS6) Alexandros Kosiaris: module/role for url-downloader [puppet] - https://gerrit.wikimedia.org/r/159738
[14:19:10] <_joe_> YuviPanda: not sure we'll have salt forever
[14:19:14] the Jenkins jobs are defined by a yaml schema in JJB
[14:19:17] yeah, I saw the email
[14:19:27] and the DSL is very very close to Ansible tasks definition
[14:19:29] I don't know if we use it for anything other than as a dsh replacement.
[14:19:43] so it is not too much work to have JJB describe ansible tasks instead of Jenkins jobs
[14:19:45] <_joe_> YuviPanda: I'm pretty agreeing with ori
[14:20:00] <_joe_> and - facter is lame but still in some ways better
[14:20:26] _joe_: yeah. salt, at my first glance, seemed to be in a bit of an identity crisis as to what it is
[14:20:38] (PS7) Alexandros Kosiaris: module/role for url-downloader [puppet] - https://gerrit.wikimedia.org/r/159738
[14:22:17] (CR) Alexandros Kosiaris: module/role for url-downloader (3 comments) [puppet] - https://gerrit.wikimedia.org/r/159738 (owner: Alexandros Kosiaris)
[14:25:11] <_joe_> btw yuvi I like the direction your changes are going to
[14:25:15] <_joe_> more modularization
[14:25:16] _joe_: :D
[14:25:32] <_joe_> else, I'd have seen -2s :)
[14:25:55] _joe_: yeah! I think having manifests/misc/icinga.pp not exist would be a nice end goal of the first set of refactors
[14:26:00] <_joe_> ok so, please rebase the other changes
[14:26:09] YuviPanda: ah yuvi! were can I fill bug for the new monitoring system you are setting up ? :D
[14:26:12] _joe_: yeah!
[14:26:20] YuviPanda: I have a feature request
[14:26:23] hashar: ah, good question. nowhere yet, but put it in the general labs component
[14:26:32] <_joe_> YuviPanda: not having a 2k lines long commangs.cfg file
[14:26:39] YuviPanda: will
[14:26:43] _joe_: yeah, that as well :D
[14:26:56] <_joe_> hashar: opsens use that wonderful bug tracking system called "/dev/null"
[14:26:59] _joe_: yeah, doing
[14:27:09] <_joe_> YuviPanda: we have a lot to teach you I see
[14:27:22] (PS3) Yuvipanda: nagios_common: Move check_dsh_groups into module [puppet] - https://gerrit.wikimedia.org/r/161934
[14:27:24] (PS2) Yuvipanda: nagios_common: Move check_solr into module [puppet] - https://gerrit.wikimedia.org/r/161943
[14:27:26] (PS2) Yuvipanda: nagios_common: Move check_cert into module [puppet] - https://gerrit.wikimedia.org/r/161942
[14:27:28] (PS4) Yuvipanda: nagios_common: Move wikidata check into module [puppet] - https://gerrit.wikimedia.org/r/161939
[14:27:28] _joe_: ho I feel them in RT for prod and Wikimedia > labs infrastructure for labs
[14:27:32] _joe_: rebased :)
[14:27:35] _joe_: works well nowadays!
[14:27:36] _joe_: haha :D
[14:27:55] (PS1) Alexandros Kosiaris: Remove tridge [puppet] - https://gerrit.wikimedia.org/r/161948
[14:28:53] _joe_: did you force a run on neon?
[14:29:23] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Move check_dsh_groups into module [puppet] - https://gerrit.wikimedia.org/r/161934 (owner: Yuvipanda)
[14:29:38] <_joe_> YuviPanda: with each change ;)
[14:29:45] _joe_: w00t
[14:29:47] <_joe_> well, the next few ones are easy
[14:29:49] things seem unbroken so far
[14:29:52] <_joe_> so, just merging
[14:29:53] yeah
[14:29:54] <_joe_> they are
[14:29:56] coool :)
[14:30:36] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Move wikidata check into module [puppet] - https://gerrit.wikimedia.org/r/161939 (owner: Yuvipanda)
[14:31:05] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Move check_cert into module [puppet] - https://gerrit.wikimedia.org/r/161942 (owner: Yuvipanda)
[14:31:21] (CR) Giuseppe Lavagetto: [C: 2] nagios_common: Move check_solr into module [puppet] - https://gerrit.wikimedia.org/r/161943 (owner: Yuvipanda)
[14:35:31] <_joe_> YuviPanda: all seems good, but please give a look at the checks
[14:36:34] _joe_: yeah, looking
[14:39:25] _joe_: I looked at check_graphite, seems ok
[14:39:47] (PS4) Giuseppe Lavagetto: backport of https://github.com/facebook/hhvm/pull/3811/ [debs/hhvm] - https://gerrit.wikimedia.org/r/161936
[14:40:00] * YuviPanda checks wikidata
[14:40:17] _joe_: others (at least the ones I've permission to submit) seem ok
[14:42:05] _joe_: thanks for merging/CRing! I'll move the other checks later today / tomorrow
[14:45:52] (CR) Ori.livneh: [C: 1] "Does the test pass? +2 if so." [debs/hhvm] - https://gerrit.wikimedia.org/r/161936 (owner: Giuseppe Lavagetto)
[14:50:48] FlorianSW, Glaisher: Ping for SWAT in 10 minutes
[14:50:56] <_joe_> ori: the test doesn't work when moved to quick, so I mobved it back to slow
[14:51:04] <_joe_> gonna run the slow ones now
[14:51:15] anomie: present
[14:52:58] FlorianSW: Are we still wanting to do your change in light of https://lists.wikimedia.org/pipermail/wikitech-l/2014-September/078731.html ?
[14:54:00] anomie: i was thinking about this, too. But that's no matter. If the extension itself doesn't have a REL1_24, it will not be displayed. And the conservation of the thread goes to the direction to don't rebranch the REL1_24 :) [14:54:09] anomie: so i would say: yes :) [14:54:16] ok [14:55:25] * YuviPanda goes afk for a bit [15:00:17] FlorianSW: I'll do yours first [15:00:21] (03PS2) 10Anomie: Add REL1_24 as branch in ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [15:00:28] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [15:00:29] anomie: ok :) thx :P [15:00:33] (03Merged) 10jenkins-bot: Add REL1_24 as branch in ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [15:01:11] !log anomie Synchronized wmf-config/CommonSettings.php: SWAT: Add REL1_24 as branch in ExtensionDistributor [[gerrit:161666]] (duration: 00m 10s) [15:01:12] FlorianSW: ^ test please [15:01:17] Logged the message, Master [15:01:29] I'll do mine next. [15:01:32] * FlorianSW is testing [15:01:51] anomie: great, it's working :) Thanks! 
[15:01:55] (03PS2) 10Anomie: 'securepoll-create-poll' for sysop on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161653 [15:02:02] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161653 (owner: 10Anomie) [15:02:06] (03Merged) 10jenkins-bot: 'securepoll-create-poll' for sysop on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161653 (owner: 10Anomie) [15:02:25] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add securepoll-create-poll right to sysop on testwiki [[gerrit:161653]] (duration: 00m 09s) [15:02:27] anomie: ^ test please [15:02:31] Logged the message, Master [15:02:47] anomie: Seems to work [15:03:11] Glaisher: If you want your SWAT patches deployed, please respond. [15:03:25] * anomie doesn't even see yurik online [15:05:25] anomie: graph extension is still enabled on some wikis, and i know a demo page on mw.org, maybe do this without yurik? :/ [15:05:34] Reedy: Regarding Gerrit 161684 (removing some namespace aliases so the interwiki prefixes can work), is there a maintenance script to run? Or just let the wiki's users null edit pages? [15:05:57] FlorianSW: If you want to be responsible for testing it and such, that's fine with me. 
[15:06:17] anomie: yes, just see, if it works, if there is an error, i can't do anything :) [15:06:31] (03PS2) 10Anomie: Enable Graph ext on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161908 (owner: 10Yurik) [15:06:37] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161908 (owner: 10Yurik) [15:06:42] (03Merged) 10jenkins-bot: Enable Graph ext on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161908 (owner: 10Yurik) [15:07:00] !log anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Graph extension on mediawiki.org [[gerrit:161908]] (duration: 00m 09s) [15:07:01] FlorianSW: ^ test please [15:07:09] * FlorianSW testing [15:08:49] anomie: working, nothing obviously failing [15:09:27] anomie: I like how you keep the patten even for your own patches :) [15:09:30] (03PS1) 10Alexandros Kosiaris: url_downloader: assign to chromium [puppet] - 10https://gerrit.wikimedia.org/r/161951 [15:09:48] greg-g: Me too (: [15:10:18] <_joe_> anomie: at first I thought you were speaking to yourself :P [15:10:59] (03PS1) 10Yuvipanda: nagios_common: Move check_ssl_cert into module [puppet] - 10https://gerrit.wikimedia.org/r/161952 [15:11:34] Glaisher: Ping me when you're ready to test your SWAT patches, if it's before 15:50 UTC. [15:11:52] (03PS2) 10Giuseppe Lavagetto: puppet: introduce hiera for production [puppet] - 10https://gerrit.wikimedia.org/r/160924 [15:15:45] (03PS1) 10Andrew Bogott: Remove a bunch of iptables classes that haven't been used for ages. [puppet] - 10https://gerrit.wikimedia.org/r/161954 [15:15:47] (03PS1) 10Andrew Bogott: Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 [15:16:26] Can I get a review/advice from someone who has experience with ferm? 
https://gerrit.wikimedia.org/r/#/c/161955/1 [15:17:48] (03PS1) 10Alexandros Kosiaris: Change url-downloader to point to new IP [dns] - 10https://gerrit.wikimedia.org/r/161956 [15:18:25] YuviPanda: what about our really fun naming bikeshed/! [15:18:27] we merged already!? [15:18:34] ottomata: apparently :) [15:18:39] ottomata: we can still change he name in the future [15:18:53] ottomata: but I'm inclined to just move the git merge stuff out of monitoring module into nagios_common [15:19:20] either way, but I don't like the nagios_common name! [15:19:23] so I still want to bikeshed! [15:19:34] ottomata: :D reply on the thread! :D [15:19:44] no one has responded to my ideas there! [15:19:45] yet [15:20:05] ok, i'm really busy with some analytics reviews and meetings and kafkatee troubleshooting today, so i probably won't be able to do think about it much today [15:20:20] sooo, ok ok, keep working, this isn't being used in prod yet, right? [15:20:27] let's make sure we get that stuff worked out before we put it into prod [15:20:28] ottomata: it is :) [15:20:30] !!! [15:20:40] ottomata: were merged into neon earlier [15:20:48] but I see no reason we can't rename later if we need to [15:20:51] yeah guess not [15:20:57] ok... [15:21:15] hey _joe_, i think it is fine [15:21:29] but there was still outstanding conversation on that change...you sure it was ready to be merged? [15:22:44] ottomata: almost all were addressed, no? 
[15:22:53] except the naming bikeshed, which I don't think should block merges [15:23:34] ottomata: I had to remove support for templating (content => ) because of a puppet bug, so I'll have to find some way of factoring that in when I port the checks that use templates for config [15:27:55] * YuviPanda steps afk [15:29:12] <_joe_> ottomata: I know you don't like it, and I've been the usual bully, sorry :) [15:29:33] <_joe_> I figured we could continue discussing naming for now as it's not going to be complicated to change [15:29:43] <_joe_> while we don't block yuvi [15:29:59] <_joe_> whose changes are in all other respects very good [15:30:35] haha, you just wait til faidon comes back and makes your modules go through weeks of review to get them perfect...:) [15:33:53] <_joe_> I don't think choosing a name is part of "getting them perfect" [15:33:59] <_joe_> but, that's just me [15:39:32] Jeff_Green: ^ around ? [15:43:04] tonythomas: yup [15:43:28] (03PS1) 10Manybubbles: Lower throttle on two cirrus jobs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161960 [15:44:23] ah. great. I found out we had little bug in our exim regex - fixed in here https://gerrit.wikimedia.org/r/#/c/161679/ -- and in our labs instance -- exim is not able to do the curl request due to some reason ( some firewall blocking or so ? ) [15:45:05] Jeff_Green: I think its defineetly due to some permission issue out there [15:45:05] ok [15:45:17] did the regex fix get merged? [15:46:17] Jeff_Green: not yet. [15:46:39] I found symbol '+' turning up in the VERP addresses [15:46:55] is it in every field, or just the last field? [15:47:05] only in the last field [15:47:15] odd [15:47:19] I have pasted a sample one in the commit message [15:47:26] true that. [15:47:34] ya, i saw but I was wondering why that field was different [15:47:52] true. 
Let me recheck our hashing algorithms [15:48:28] (03PS1) 10Nikerabbit: Update CX config for changes in master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161961 [15:49:22] (03CR) 10Andrew Bogott: "My only real concern with this is that invoking ferm on a previously un-ferm'd system will spontaneously close a bunch of other ports that" [puppet] - 10https://gerrit.wikimedia.org/r/161955 (owner: 10Andrew Bogott) [15:49:27] hm… akosiaris, ferm review? https://gerrit.wikimedia.org/r/#/c/161955/ [15:52:24] !log reboot ms-be2001 into PXE to test a re-install [15:52:28] Logged the message, Master [15:53:58] (03CR) 10Jsahleen: [C: 031] "lgtm" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161961 (owner: 10Nikerabbit) [15:55:46] (03CR) 10KartikMistry: [C: 031] Update CX config for changes in master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161961 (owner: 10Nikerabbit) [15:56:49] (03CR) 10Nikerabbit: [C: 032] "+1+1 is +2, right?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161961 (owner: 10Nikerabbit) [15:56:54] (03Merged) 10jenkins-bot: Update CX config for changes in master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161961 (owner: 10Nikerabbit) [15:58:50] Jeff_Green: any idea where the '+' came in between ? [15:59:50] no [16:01:06] (03CR) 10Alexandros Kosiaris: [C: 04-1] Set up firewall rules for ldap servers. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/161955 (owner: 10Andrew Bogott) [16:01:14] andrewbogott: done ^ [16:02:58] Jeff_Green: we have a hash_hmac which should never(?) return a '+' which is base_64 encoded too [16:02:59] (03CR) 10Jgreen: [C: 04-1] Corrected the exim regex expression (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/161679 (owner: 1001tonythomas) [16:04:01] tonythomas: i wondered the same thing--also aren't you using hmac for each field? [16:04:38] tonythomas: note, assuming the + turns out to be sane your regex needs a minor fix [16:05:59] yup. 
we have the hmac snipped, and later base64 encoded [16:06:10] Jeff_Green: I am trying out the regex fix [16:06:14] will update in a while [16:06:21] k. [16:06:23] base64 encoded can contain + [16:06:30] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [16:06:31] oh! yay ! [16:06:41] it's chars, numbers, + and / [16:06:43] I almost doubted the php hmacs [16:07:10] oh / too [16:07:11] so we should include the possibility of / too ? [16:07:13] https://en.wikipedia.org/wiki/Base64#Implementations_and_history [16:07:17] yeah ;) [16:07:57] ok. so something like \N^wiki-\w+-\w+-\w+-[+/\w]+$\N [16:07:58] ? [16:08:16] yeah [16:08:39] ok. let me test in the labs instance [16:09:50] ok. works as expected. now pushing in [16:12:30] (03PS2) 1001tonythomas: Corrected the exim regex expression [puppet] - 10https://gerrit.wikimedia.org/r/161679 [16:13:17] (03PS2) 10Andrew Bogott: Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 [16:13:46] Jeff_Green, hoo : done. now to our second issue. Just copy pasting something from labs earlier. I am able to get this command https://dpaste.de/cUK6 executed while the exim delivery is happening ( asking this shell to execute when the exim recieves a bounce, in the pipe transport ), but when I remove the echo " "> and make something like this [16:13:46] https://dpaste.de/yQSu -- exim shows a child process killed error message. [16:14:08] (03CR) 10jenkins-bot: [V: 04-1] Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 (owner: 10Andrew Bogott) [16:15:31] people in #exim told me that this can cause due to apparmor configs too [16:15:55] I tried killing all the configs and service of apparmor - but this problem seems to exist [16:16:15] what's exactly the problem? [16:16:20] tonythomas: curl is writing the output to stdout? 
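[note] A minimal sketch of the VERP regex point discussed above, assuming the last field is a truncated HMAC digest that is then base64 encoded. The key, field values, 12-byte truncation, and the "old" pattern are all made up for illustration and are not taken from BounceHandler or the puppet change. It only shows that standard base64 output can contain '+' and '/', which `\w` alone does not match.

```python
import base64
import hashlib
import hmac
import re

# Widened pattern from the chat: the last field may also hold '+' and '/'.
VERP_RE = re.compile(r"^wiki-\w+-\w+-\w+-[+/\w]+$")
# Assumed shape of the earlier, narrower pattern (not copied from the commit).
OLD_RE = re.compile(r"^wiki-\w+-\w+-\w+-\w+$")


def make_local_part(key: bytes, fields: tuple) -> str:
    """Hypothetical VERP local part: three fields plus a base64 HMAC snippet."""
    digest = hmac.new(key, "-".join(fields).encode(), hashlib.sha256).digest()
    # 12 bytes is a multiple of 3, so the base64 output carries no '=' padding.
    token = base64.b64encode(digest[:12]).decode("ascii")
    return "wiki-" + "-".join(fields) + "-" + token


# Standard base64 maps the 6-bit values 62 and 63 to '+' and '/'.
sample_plus = base64.b64encode(b"\xfb\xef\xbe").decode("ascii")
sample_slash = base64.b64encode(b"\xff\xff\xff").decode("ascii")

address = make_local_part(b"hypothetical-key", ("enwiki", "42", "abc"))
```

If the snippet length were not a multiple of 3 bytes, the encoded token could also end in '=', and the character class would need to allow that as well.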
[16:16:23] curl exiting with a non-zero exit code? [16:16:35] (03PS3) 10Andrew Bogott: Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 [16:16:47] if you can't make curl run silently, you could redirect stdout to /dev/null instead of to a file [16:16:58] hoo: yup. curl is exiting with a non zero exit code. it looks like that. [16:16:58] curl can run silently [16:17:05] -s [16:17:12] (03CR) 10jenkins-bot: [V: 04-1] Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 (owner: 10Andrew Bogott) [16:17:13] tonythomas: That's weird... it usually only does that if you specify -f [16:17:15] "email@-" <-- the '@' means read from file [16:17:19] and '-' means read from stdin [16:17:23] so if you don't pipe anything to it.. [16:17:40] (03PS1) 10Nikerabbit: Use correct variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161968 [16:17:49] ori: Exim does that... I guess [16:17:53] (03PS4) 10Andrew Bogott: Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 [16:18:30] Jeff_Green: I tried running the curl command in the shell and I got the output printed to the file, but my intention was not to get the output -- but to make curl do the real post request, which is not happening [16:18:38] akosiaris: How about this? https://gerrit.wikimedia.org/r/#/c/161955/4 [16:19:11] ori: email@- usually got my bounce email PSOTed to the API under the name 'email' [16:19:25] but it doesnt seem to happen here in labs [16:20:28] tonythomas: do you pipe anything to that curl [16:20:32] like: [16:20:36] echo foo | curl ... [16:20:59] hoo: no - but I am having the bounce email there i nthe stdin, right ? 
[16:21:05] *in the [16:21:22] it seemed to work in my localhost though [16:21:29] tonythomas: exim probably does that (haven't checked), but if you run the curl per hand, you of course have to give it someting as stdin [16:21:30] (03CR) 10KartikMistry: [C: 031] "Now, tested with http://cx.wmflabs.org/ :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161968 (owner: 10Nikerabbit) [16:21:41] * something [16:23:24] hoo: in that case, I removed the "email@-" completely from that line, and still I get the same result [16:23:35] which is exactly? [16:24:34] hoo: I gave something like command = /usr/bin/curl -H 'Host: mediawiki-verp.wmflabs.org' mediawiki-verp.wmflabs.org/w/api.php -d "action=bouncehandler" [16:25:29] andrewbogott: kind of unsure about "@resolve(#{x})". I think ruby will try to reference the @resolve (valid ruby syntax for instance variables) ? [16:25:42] hoo: and I get the same error https://dpaste.de/qCeo [16:25:42] not sure though... [16:25:49] (03CR) 10Jsahleen: [C: 031] "lgtm" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161968 (owner: 10Nikerabbit) [16:25:52] tonythomas: Looks good at a sight, although if that's in the exim config. you might want to add the email@- part again [16:26:04] I 'll test it locallly [16:26:31] (03CR) 10Nikerabbit: [C: 032] Use correct variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161968 (owner: 10Nikerabbit) [16:26:36] (03Merged) 10jenkins-bot: Use correct variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161968 (owner: 10Nikerabbit) [16:26:47] akosiaris: I cribbed that from manifests/role/rt.pp, faidon's code. [16:26:48] hoo: but the curl request would execute correctly without that one too right ? 
- which is not happening [16:26:55] Although maybe I'm misunderstanding what it does there [16:26:59] tonythomas: Yeah [16:27:04] makes me think something is there wrong with exim doing out curl requests [16:27:19] let me remove the user=nobody and group=nogroup [16:27:25] tonythomas: yeah, try that [16:28:05] andrewbogott: yeah it is correct, my internal ruby parser was wrong [16:28:21] hoo: thats removed though. - and still causing the same issue. [16:28:38] will curl request need some other user and group permission ? [16:28:49] like apache or ? [16:29:09] (03CR) 10Alexandros Kosiaris: [C: 032] Set up firewall rules for ldap servers. [puppet] - 10https://gerrit.wikimedia.org/r/161955 (owner: 10Andrew Bogott) [16:29:17] akosiaris: thank you! [16:29:21] tonythomas: Usually not, no [16:29:23] that should just work [16:29:34] look into the corresponding logs? [16:29:58] /var/log/exim4/mainlog still have the same message again [16:30:28] (03CR) 10Andrew Bogott: [C: 032] Remove a bunch of iptables classes that haven't been used for ages. [puppet] - 10https://gerrit.wikimedia.org/r/161954 (owner: 10Andrew Bogott) [16:30:28] it says error ignored - -is there a way to show some more detailed error report ? [16:30:35] hmm. I'm packaging few packages under operation/debs/... [16:30:45] Whom should I poke to get such rights? [16:31:17] tonythomas: try -o /dev/null [16:31:44] Jeff_Green: okey. on the same line after action=bouncehandler ? [16:31:54] sure [16:32:06] it's just a curl flag to write to /dev/null instead of stdout [16:32:11] the 'email@-' can be removed right ? [16:32:14] oke [16:32:18] no [16:32:21] (03PS1) 10Ori.livneh: HHVM: Update the extension dir path for 3.3 changes [puppet] - 10https://gerrit.wikimedia.org/r/161971 [16:32:27] okey then [16:32:33] isn't email@- is an exim variable? [16:32:43] i.e. doesn't exim replace that with the envelope recipient address? 
[16:33:13] i don't recall how you're fetching that address on the receiving end [16:34:23] Jeff_Green: actually email is our POST variable, its "email@-" right ? and we get the entire message in that 'email' variable, and fetch it from our API [16:34:59] oic [16:35:08] yeah so you can't get rid of that :-) [16:35:14] Jeff_Green: now the POST completed perfectly ! [16:35:18] no errors in mainlog [16:35:21] checking /dev/null [16:35:24] ha [16:35:30] nothing there :o [16:35:33] its null right [16:35:37] yes /dev/null is the bit bucket [16:35:46] let me check if something reached in the API [16:36:02] there may be other ways to get curl not to spew to stdout or a file too [16:37:04] I think I have got API requests from external IPs blocked by our variable https://github.com/wikimedia/mediawiki-extensions-BounceHandler/blob/master/BounceHandler.php#L66 [16:37:09] let me open that up [16:37:49] tonythomas: Look into the apache access logs and see whether the requests made it through [16:38:02] (assuming you use apache)= [16:38:28] its somewhere in /var/ ? [16:38:52] in /var/log/httpd or apache2 or something [16:39:12] depends on the distro you use [16:40:22] (03PS2) 10Manybubbles: Push Cirrus' non-content enwiki shards apart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161031 [16:40:27] away for now [16:42:07] no :( nothing there in the access logs [16:42:16] looks like something is still wrong somewhere [16:42:29] (03PS1) 10Andrew Bogott: Confusingly, ldap::role::server::labs is the server for labs, not a server in labs. [puppet] - 10https://gerrit.wikimedia.org/r/161975 [16:42:55] Jeff_Green: so what actually did -o /dev/null did ? [16:43:02] killed the error silently ? 
[16:43:19] no [16:43:35] curl is fetching a web page and printing it to stdout [16:43:50] exim and exim doesn't expect output from the command [16:44:02] (guessing) [16:44:17] so we just made exim ignore stdout [16:44:37] if you get output on stderr it should still explode [16:44:47] (03CR) 10Andrew Bogott: [C: 032] Confusingly, ldap::role::server::labs is the server for labs, not a server in labs. [puppet] - 10https://gerrit.wikimedia.org/r/161975 (owner: 10Andrew Bogott) [16:45:41] hey Jeff_Green ! [16:45:47] I think there is indeed some exim logs [16:45:51] : ) [16:45:59] 10.68.16.65 - - [22/Sep/2014:16:45:24 +0000] "POST /w/api.php HTTP/1.1" 200 211714 "-" "curl/7.35.0" [16:46:21] ok [16:46:24] I thought something like action='bouncehandler' and all will come, but this seems to come up everytime I send a bounce-able email [16:46:46] let me try with an email send which should never bounce [16:47:51] great. nothing like that this time. looks like our bounce got posted somehow [16:47:52] :D [16:48:37] our bounce_records is empty still though. any idea where to get some API access logs ? [16:49:32] I'm not familiar with that are of labs, but i guess in /var/log/apache2/ on the API server [16:50:08] but that too gives only there was a request ! I wanted to see what happened next [16:50:28] I think we have wgDebugLog messages in our extension. still thinking how to view that too :D [16:50:44] oic, yeah I think you're on the right track there [16:52:17] Jeff_Green: and, I tried the -o /dev/null in my local exim configuration and it seems to push things into the API find -- Ican see it in my local db [16:52:24] so push that into gerrit ? [16:52:56] i think you should test a bit more [16:53:21] i.e. make sure exim handles various API failure modes sanely [16:53:33] true. [16:53:51] now I need to try and find out where the $wgDebugLogs logs to [16:54:35] at least test: 'can't reach API server' and 'HTTP error response from API server' cases [16:55:23] okey. 
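[note] A rough sketch of the idea behind the `-o /dev/null` fix above: discard the child's stdout while still seeing its exit code and stderr. Jeff_Green's explanation of exim's behaviour was explicitly a guess, and this does not model exim itself. The Python interpreter stands in for curl so the example runs without a network; `run_quietly` and both child commands are hypothetical.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run a command, discard its stdout, and return (exit_code, stderr_text)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # comparable to sending the response body to /dev/null
        stderr=subprocess.PIPE,     # error output stays visible to the caller
        text=True,
    )
    return proc.returncode, proc.stderr


# Stand-in children replacing the real curl call (assumptions for illustration).
ok_child = [sys.executable, "-c", "print('response body')"]
bad_child = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(7)"]

ok_result = run_quietly(ok_child)
bad_result = run_quietly(bad_child)
```

For the failure cases suggested above ('can't reach API server' and 'HTTP error response from API server'), curl's `-s` (silent) and `-f` (fail on HTTP errors), both mentioned earlier in the chat, are the flags that decide what the caller gets to see.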
once I get hold of these logs -- I still dont get why the API never commited the user into the table [16:58:19] (03CR) 10Chad: [C: 032] Push Cirrus' non-content enwiki shards apart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161031 (owner: 10Manybubbles) [16:58:25] (03Merged) 10jenkins-bot: Push Cirrus' non-content enwiki shards apart [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161031 (owner: 10Manybubbles) [16:58:45] ^d: did you mean to merge that now? [16:58:50] its a config change [16:58:52] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [16:59:04] <^d> Yeah, I was going to sync since it's mostly a no-op and there's no deployments going on. [16:59:16] <^d> (you already applied it live) [17:00:14] !log demon Synchronized wmf-config/InitialiseSettings.php: Push Cirrus' non-content enwiki shards apart (no-op) (duration: 00m 04s) [17:00:20] Logged the message, Master [17:02:12] <^d> manybubbles: Rather than taking up space in the swat from someone who needs a *real* change deployed :) [17:10:40] (03PS1) 10Alexandros Kosiaris: Setup EQIAD NTP servers [puppet] - 10https://gerrit.wikimedia.org/r/161984 [17:13:12] Jeff_Green: yay ! finally, the bounce got into the db [17:13:18] I removed the internal IP checks [17:13:21] nice [17:13:29] and now its there [17:13:40] * tonythomas exhales [17:13:45] phew [17:16:58] Jeff_Green: yay. and after 4 successful bounce - the user is now unconfirmed [17:17:16] http://mediawiki-verp.wmflabs.org/ [17:17:41] neat! [17:17:59] got to quit lab. 
brb [17:19:41] /away afk [17:22:45] !log updated HHVM on beta cluster to HHVM to 3.3.0-20140918+wmf1 [17:22:50] Logged the message, Master [17:31:17] (03PS2) 10Ori.livneh: HHVM: Update the extension dir path for 3.3 changes [puppet] - 10https://gerrit.wikimedia.org/r/161971 [17:31:23] (03CR) 10Ori.livneh: [C: 032 V: 032] HHVM: Update the extension dir path for 3.3 changes [puppet] - 10https://gerrit.wikimedia.org/r/161971 (owner: 10Ori.livneh) [17:35:12] if someone is on db1050 bsides me please let me know right now [17:35:25] 'w'? :) [17:35:33] lots of 'root' [17:35:36] not so helpful [17:35:37] doh [17:35:42] eggzactly [17:35:43] wall :) [17:35:45] 'last' [17:35:53] i thought we're not supposed to login as root anymore [17:35:54] apergos: folks shouldnt be on there as root! [17:35:54] hostnames are helpful [17:36:00] jgage: exactly! [17:36:11] apergos: take a screen shot and lets guilt trip folks in meeting =] [17:36:18] of course they should [17:36:30] hm i realize i don't know of a unix way to force-logout another user. is there one? [17:36:30] who's ewulczyn? [17:36:43] well, they should if they were getting on before first puppet run, as I did [17:36:55] kill their shell [17:36:57] jgage: kill his shell process [17:37:07] too my words out [17:37:09] *k [17:37:11] heh yeah [17:38:04] if you want a command: pkill -KILL -u UserID :D [17:38:21] heheh i suppose that works [17:38:24] the Big Hammer [17:38:37] (03PS1) 10Filippo Giunchedi: codfw-prod: initial ring [software/swift-ring] - 10https://gerrit.wikimedia.org/r/161992 [17:38:44] am I running some rogue process? [17:41:07] greg-g: do you think I might be able to do a deploy for a cirrus config change? I want to see if this change helps prevent some issues we've been having [17:43:52] (03PS2) 10Ori.livneh: misc::maintenance: clean-up [puppet] - 10https://gerrit.wikimedia.org/r/160232 [17:46:39] are newly-provisioned codfw hosts not specifically added to puppet supposed to let me ssh in? 
[17:47:12] not specifically added to any puppet manifest that is, the cert on the puppet master has been accepted [17:49:54] i think, perhaps there's a firewall rule not updated yet? [17:50:55] mark__: yeah ssh itself is fine, it doesn't recognize my key yet but puppet hasn't run yet it seems [17:51:10] the new_install key should work though [17:51:37] yep that's probably it [17:54:23] and... nik is gone [17:54:38] ^d: I assume you're aware of what nik wanted above at :41 ? [17:57:09] (03CR) 10Dzahn: [C: 032] "https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/Board_elections/2013/Candidates#Mar.C3.ADa_Sefidari_.28Raystorm.29" [puppet] - 10https://gerrit.wikimedia.org/r/161921 (owner: 10Nemo bis) [18:07:38] (03PS2) 10Jforrester: Enable Flow on [[mw:User talk:Jdforrester (WMF)]] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161172 [18:07:40] (03PS2) 10Jforrester: Enable Flow on [[mw:Talk:MediaWiki 1.25]] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158237 [18:07:42] (03PS1) 10Jforrester: Enable Flow on [[mw:Talk:HHVM/About]] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162002 (https://bugzilla.wikimedia.org/71048) [18:14:00] PROBLEM - CI tmpfs disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 26 MB (5% inode=99%): [18:17:40] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 13 MB (2% inode=99%): [18:18:49] Krinkle: ^ ? [18:19:10] andrewbogott is also looking [18:20:00] greg-g: Yep, that happens every few days. 
Used to happen like once a month, now almost every few days due to hhvm leaving files behind in /tmp [18:20:04] and /tmp being very small there [18:20:16] (03PS1) 10Legoktm: Revert "Add REL1_24 as branch in ExtensionDistributor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162006 [18:20:39] https://wikitech.wikimedia.org/wiki/Server_Admin_Log#September_17 16:00 [18:20:43] James_F, prepare for your talk page to break a few times a week :D [18:20:58] Eloquence: Eh. It's very low traffic anyway. [18:21:11] Krinkle: greg-g : i once thought this is for it https://gerrit.wikimedia.org/r/#/c/157294/ [18:21:15] Eloquence: And rumours of Flow breakage are over-stated, IME, unless someone's playing silly-buggers. :-) [18:21:30] yes, lots of people are playing that particular game right now. [18:22:15] Krinkle: do you know, is it just rm -rf * in that dir, or is there some way to detect which files are stale? [18:22:54] df -h [18:22:55] tmpfs 512M 505M 7.8M 99% /var/lib/jenkins-slave/tmpfs [18:23:19] https://bugzilla.wikimedia.org/show_bug.cgi?id=69979 [18:23:40] RECOVERY - Disk space on lanthanum is OK: DISK OK [18:23:57] Krinkle: yes… [18:24:01] RECOVERY - CI tmpfs disk space on lanthanum is OK: DISK OK [18:24:17] andrewbogott: I tend to rm -rf *@* [18:24:20] and then maybe mwext-* [18:24:34] ok -- someone just cleaned up before I got there :) Was that you? [18:24:42] @{2,3,4,5,6} are extra work spaces [18:24:43] Yes [18:24:46] great! [18:25:01] Filing bug now [18:25:12] not entirely related to hhvm [18:25:41] (03CR) 10Greg Grossmeier: "The same problem is happening again on Lanthanum (prod Jenkins) which isn't in Labs." [puppet] - 10https://gerrit.wikimedia.org/r/157294 (https://bugzilla.wikimedia.org/69979) (owner: 10Dzahn) [18:25:52] Krinkle: andrewbogott ^^ [18:27:11] <^d> mutante: rt 8393 is in the rfp queue so I can't close it. Can you? 
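[note] A small sketch of the kind of free-space check behind the tmpfs alerts above. The warning and critical thresholds are assumptions chosen for illustration, not the values configured in the real monitoring; `free_percent`, `classify`, and `check_path` are hypothetical helper names.

```python
import shutil


def free_percent(total: float, free: float) -> float:
    """Percentage of the filesystem that is still free."""
    return 100.0 * free / total if total else 0.0


def classify(pct_free: float, warn: float = 10.0, crit: float = 4.0) -> str:
    """Map a free-space percentage to a status (thresholds are assumed)."""
    if pct_free <= crit:
        return "CRITICAL"
    if pct_free <= warn:
        return "WARNING"
    return "OK"


def check_path(path: str) -> str:
    """Classify a mounted path using the operating system's own numbers."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify(free_percent(usage.total, usage.free))


# Figures from the df output in the chat: 512M total, 7.8M available.
status_from_log = classify(free_percent(512, 7.8))
```

Pointing `check_path` at `/var/lib/jenkins-slave/tmpfs` would apply the same classification to the mount discussed here.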
[18:27:19] (03Abandoned) 10Legoktm: Revert "Add REL1_24 as branch in ExtensionDistributor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162006 (owner: 10Legoktm) [18:28:12] ^d: [18:28:13] Ticket 8393: Status changed from 'new' to 'rejected' [18:28:17] <^d> thx [18:28:25] thanks for clean up [18:28:43] <^d> yw [18:29:02] (03CR) 10Legoktm: "This was probably a bit premature, but so was my proposal to revert it. I don't think this should have been done until the extension branc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [18:30:04] (03CR) 10Legoktm: "(Also, in the future please add me as a reviewer to ExtensionDistributor-related things!)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [18:32:11] !log lanthanum tmpfs filled up again, purged manually (bug 71128) [18:32:17] Logged the message, Master [18:32:30] greg-g: andrewbogott: https://bugzilla.wikimedia.org/show_bug.cgi?id=71128 [18:34:53] (03CR) 10Ori.livneh: "This class is only applied on terbium. Puppet Compiler result: (03CR) 10Hoo man: [C: 04-1] misc::maintenance: clean-up (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/160232 (owner: 10Ori.livneh) [18:42:53] hoo: good catch. thanks. [18:42:58] ;) [18:44:19] (03PS3) 10Ori.livneh: misc::maintenance: clean-up [puppet] - 10https://gerrit.wikimedia.org/r/160232 [18:49:35] (03CR) 10Florianschmidtwelzow: "> This was probably a bit premature, but so was my proposal to revert it. 
I don't think this should have been done until the extension bra" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161666 (owner: 10Florianschmidtwelzow) [19:02:44] (03PS3) 1001tonythomas: Corrected the exim regex expression and POST url [puppet] - 10https://gerrit.wikimedia.org/r/161679 [19:25:11] ori, https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box has gone AWOL [19:26:10] it's called trusty, not trustworthy [19:26:15] also, wtf. [19:26:16] ugh. [19:28:15] MaxSem: it is back :) [19:28:39] bad timing? I would have hoped for atomic directories [19:28:56] was this way at least for 5 minutes [19:29:11] https://cloud-images.ubuntu.com/vagrant/trusty/20140922/ [19:29:13] It's just appeared [19:29:30] Looks like they updated the symlinks before the files were staged [19:31:00] !log Jenkins is broken for extensions patches proposed against the wmf branches {{bug|71133}} [19:31:05] "oh that 5 minutes? nobody will notice" [19:31:06] Logged the message, Master [19:43:30] (03PS1) 10Reedy: Up image scaller wgMaxShellFileSize to 512MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162038 [20:24:50] !log deployed Parsoid ff9476f9 [20:24:55] Logged the message, Master [20:41:38] (03CR) 10Ottomata: [C: 032 V: 032] udp2log: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/161926 (owner: 10Matanya) [20:42:19] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.137 [20:42:19] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.138 [20:44:10] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 32 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 32, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 71, 
uinitializing_shards: 0, unumber_of_data_nodes: 2} [20:44:49] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 32 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 32, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 71, uinitializing_shards: 0, unumber_of_data_nodes: 2} [20:45:30] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 38 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 35, utimed_out: False, uactive_primary_shards: 36, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 65, uinitializing_shards: 3, unumber_of_data_nodes: 2} [20:45:50] PROBLEM - ElasticSearch health check on logstash1003 is CRITICAL: CRITICAL - elasticsearch (production-logstash-eqiad) is running. status: red: timed_out: false: number_of_nodes: 2: number_of_data_nodes: 2: active_primary_shards: 35: active_shards: 70: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 33 [20:46:10] PROBLEM - ElasticSearch health check on logstash1001 is CRITICAL: CRITICAL - elasticsearch (production-logstash-eqiad) is running. status: red: timed_out: false: number_of_nodes: 2: number_of_data_nodes: 2: active_primary_shards: 35: active_shards: 70: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 33 [20:46:52] logstash is sad apparently. I'm in a meeting but will look at it soon I guess [20:49:31] (03PS1) 10Spage: Enable Flow on mw.org Talk:HHVM/About [mediawiki-config] - 10https://gerrit.wikimedia.org/r/162124 [20:49:59] ops: can anyone else help bd808 out with the logstash ES cluster? [20:50:25] * bd808 is out of meeting and looking now in 10min before next meeting [20:52:03] !log logstash1002 went split brain from rest of logstash elastic search cluster. 
restarting [20:52:10] Logged the message, Master [20:52:23] robh: all of my labs instances have /etc/ssl/certs/Equifax_Secure_CA.pem -- it doesn't seem to have come from puppet. Any idea where it came from? I ask because my instances do /not/ have /etc/ssl/certs/GlobalSign_CA.pem which presumably I need for the new ldap certs to work [20:52:44] uh, onsite @ ulsfo [20:52:52] so i dunno if i can spare cycles to hunting it down [20:52:57] at this moment [20:52:59] 'k [20:53:00] * greg-g nods [20:53:06] oh [20:53:09] PROBLEM - ElasticSearch health check on logstash1002 is CRITICAL: CRITICAL - elasticsearch (production-logstash-eqiad) is running. status: red: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 35: active_shards: 70: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 33 [20:53:22] robh: that's sort of encouraging, suggests this is not a dumb question :) [20:54:39] !log split brain on logstash1002 preceded by java OOM for elasticsearch [20:54:45] Logged the message, Master [20:59:58] (03PS1) 10Reedy: Add reedy to logstash-roots [puppet] - 10https://gerrit.wikimedia.org/r/162128 [21:00:23] Does that need an RT ticket to go with it?
[21:03:30] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3 [21:03:49] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3 [21:03:59] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 7, timed_out: False, active_primary_shards: 36, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 93, initializing_shards: 3, number_of_data_nodes: 3 [21:04:42] !log production-logstash-eqiad healed by restarting elasticsearch on logstash1002 after OOM + split brain [21:04:48] Logged the message, Master [21:30:15] (03PS1) 10Andrew Bogott: Switch ldap servers. 
[puppet] - 10https://gerrit.wikimedia.org/r/162139 [21:32:09] (03CR) 10Andrew Bogott: [C: 04-2] "This doesn't work yet" [puppet] - 10https://gerrit.wikimedia.org/r/162139 (owner: 10Andrew Bogott) [21:35:10] PROBLEM - SSH on virt0 is CRITICAL: Server answer: [21:36:10] RECOVERY - SSH on virt0 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [21:36:40] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer: [21:39:00] PROBLEM - SSH on lvs1003 is CRITICAL: Server answer: [21:39:39] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [21:40:59] RECOVERY - SSH on lvs1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [21:43:53] (03PS4) 10Greg Grossmeier: Use sync-dir to copy out l10n json files, build cdbs on hosts [puppet] - 10https://gerrit.wikimedia.org/r/158623 (https://bugzilla.wikimedia.org/70443) (owner: 10Reedy) [21:49:49] RECOVERY - ElasticSearch health check on logstash1002 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [21:49:49] RECOVERY - ElasticSearch health check on logstash1001 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [21:50:30] RECOVERY - ElasticSearch health check on logstash1003 is OK: OK - elasticsearch (production-logstash-eqiad) is running. status: green: timed_out: false: number_of_nodes: 3: number_of_data_nodes: 3: active_primary_shards: 36: active_shards: 103: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [21:57:32] (03CR) 10Qgil: "I'm happy to fight for it if you help me identifying reviewers. 
According to http://korma.wmflabs.org/browser/gerrit_review_queue.html , o" [debs/wikimedia-task-appserver] - 10https://gerrit.wikimedia.org/r/115135 (https://bugzilla.wikimedia.org/61090) (owner: 10Hashar) [22:12:22] godog: salt/grafana? :) [22:12:54] (03PS3) 10Ori.livneh: wmflib: add to_milliseconds() / to_seconds() [puppet] - 10https://gerrit.wikimedia.org/r/159692 [22:27:42] (03PS1) 10Plucas: Support extra java opts from $JAVA_OPTS [debs/kafka] - 10https://gerrit.wikimedia.org/r/162150 [22:39:46] (03PS1) 10Plucas: Add java_opts option [puppet/kafka] - 10https://gerrit.wikimedia.org/r/162152 [22:40:52] (03CR) 10Plucas: "This adds support in the puppet module for this change in operations/debs/kafka: https://gerrit.wikimedia.org/r/#/c/162150/" [puppet/kafka] - 10https://gerrit.wikimedia.org/r/162152 (owner: 10Plucas) [22:58:17] Looks like RoanKattouw has claimed the SWAT again? [22:59:01] Have I? [22:59:12] That's what the page says. [22:59:21] Maybe it's just carried over from when you claimed it for training [22:59:37] Yeah I think that's someone blindly copypasting the table from last week [22:59:58] I would like not to do it today if possible [23:00:58] Anyone else willing to do today's SWAT? [23:01:22] with that showing of hands i suppose i can :) [23:01:40] Thanks man [23:01:57] I just had so many meetings and other things going on today, I'd like to be able to get some work done [23:02:12] By which I mean reviewing other people's code of course, not getting any of my own work done :| but still [23:02:58] i wish other people reviewed my code, i have 29 patches in 'outgoing reviews', could i send you a few?
;-) [23:04:06] I am basically the person responsible for reviewing everything in VE land unless I manage to make it somebody else's problem [23:04:14] Which means I don't even use my personal dashboard, I use https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/VisualEditor,dashboards/custom:custom [23:04:25] lol [23:04:49] Which, if you look at that, that's basically my review load (minus all the -1ed / -2ed patches but still) [23:05:10] o/ [23:06:35] (03CR) 10EBernhardson: [C: 032] Set wgUploadNavigationUrl for eowiki to localized version of UploadWizard [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161928 (https://bugzilla.wikimedia.org/69055) (owner: 10Gerrit Patch Uploader) [23:06:37] (03PS1) 10Dzahn: create shell user for Dan Duvall [puppet] - 10https://gerrit.wikimedia.org/r/162156 [23:08:17] (03Merged) 10jenkins-bot: Set wgUploadNavigationUrl for eowiki to localized version of UploadWizard [mediawiki-config] - 10https://gerrit.wikimedia.org/r/161928 (https://bugzilla.wikimedia.org/69055) (owner: 10Gerrit Patch Uploader) [23:11:08] so apparently there is something i don't understand about the change from /a/common -> /srv/mediawiki. When i run `git status` in /srv/mediawiki on tin.eqiad.wmnet it says `fatal: Not a git repository (or any of the parent directories): .git` [23:11:20] although there is a .git dir there [23:11:30] (03PS2) 10Reedy: Add reedy to logstash-roots [puppet] - 10https://gerrit.wikimedia.org/r/162128 [23:12:06] `GIT_DIR=/srv/mediawiki/.git git status` doesn't do it either, something on tin doesn't like that git dir [23:13:00] ori: ^^ any idea? not sure if it's directly related to /a/common -> /srv/mediawiki, but that's the only obvious change i know of [23:14:06] i can look [23:14:09] RoanKattouw: want me to do SWAT?
[23:14:16] It doesn't want to escape the disk boundary [23:14:24] ori: i was doing swat, the problem is /srv/mediawiki is refusing to be a git dir :) [23:14:30] git hardcodes paths in submodules [23:14:39] cant be trivially moved [23:14:39] ebernhardson: https://gerrit.wikimedia.org/r/#/c/162003/ <-- the jenkins failure there looks to be totally bogus [23:14:54] Krinkle: i updated stuff [23:15:04] Krinkle: this isn't the first time anyone has deployed anything since the move [23:15:12] legoktm: Antoine broke *-testextensions for wmf branches. New zuul system doesn't account for wmf branches containing extensions/ subdir [23:15:30] ori: ok, but I've been unable to use tin properly since the move also. [23:15:35] ok. [23:15:43] ori: e.g. /srv/mediawiki is unusable, mediawiki-staging works fine though [23:15:44] ebernhardson: ^ [23:15:50] maybe use that instead? [23:15:59] you should only use mediawiki-staging [23:16:07] Krinkle: i'm not aware of the difference between the two, but if its safe to sync out changes from mediawiki-staging i can [23:16:07] that's the equivalent to /a/common [23:16:11] ahha [23:16:22] ebernhardson: Yeah mediawiki-staging [23:16:27] I missed the announcement about that too [23:16:30] (if there was one) [23:16:42] ebernhardson: syncing occurs from mediawiki-staging, not from mediawiki. mediawiki is the sync target. Just like /a/common and /usr/local/apache used to be [23:16:46] works much better now :) in retrospect that should be semi-obvious [23:17:38] Just unfortunate that they're both on /srv/ and tab complete picks mediawiki first [23:17:44] maybe we can fix that?
[23:17:49] !log ebernhardson Synchronized wmf-config/InitialiseSettings.php: Set wgUploadNavigationUrl for eowiki (duration: 00m 05s) [23:17:54] Logged the message, Master [23:18:39] legoktm: so looks safe to V+2, doing now [23:19:49] (03PS1) 10BryanDavis: Stop changing git::clone shared permissions unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/162160 [23:19:52] Krinkle: we can get rid of -staging iirc [23:20:03] there isn't a lot that depends on it [23:20:12] bd808 has a bug somewhere [23:20:14] (03CR) 10BryanDavis: [C: 031] Add reedy to logstash-roots [puppet] - 10https://gerrit.wikimedia.org/r/162128 (owner: 10Reedy) [23:21:21] There is something in the scap process that relies on two different versions on the staging server... I just have to remember what it is [23:21:56] I think it has something to do with l10n cache but that may just be fingerpointing [23:24:18] (03PS1) 10Chad: T458: Rename ext_ref description and hide it from users [puppet] - 10https://gerrit.wikimedia.org/r/162161 [23:24:56] ori, Krinkle: one "easy" fix would be to rename /srv/mediawiki-staging to /srv/stage-mediawiki or something like that. [23:25:10] bd808: surely you jest [23:25:19] sort of [23:25:56] tab completion is part of the api for something like this and Krinkle is right that the current stat is not optimal [23:26:13] *state [23:28:21] (03CR) 10Dzahn: [C: 031] "estimated time of merge: 09-25" [puppet] - 10https://gerrit.wikimedia.org/r/162128 (owner: 10Reedy) [23:30:10] (03CR) 10BryanDavis: "This is actively breaking labs_vagrant instances so it would be nice to test and merge soon. 
I'll setup a self-hosted puppet on a test ins" [puppet] - 10https://gerrit.wikimedia.org/r/162160 (owner: 10BryanDavis) [23:30:17] !log ebernhardson Synchronized php-1.24wmf21/extensions/Flow/: Bump flow submodule in php-1.24wmf21 (duration: 00m 06s) [23:30:22] Logged the message, Master [23:30:35] !log ebernhardson Synchronized php-1.24wmf21/extensions/UploadWizard/: Bump UploadWizard submodule in php-1.24wmf21 (duration: 00m 04s) [23:30:42] Logged the message, Master [23:30:58] legoktm: your patch to 1.24wmf21 is being a pain and doesn't want to merge [23:31:12] PROBLEM - puppet last run on db69 is CRITICAL: CRITICAL: Epic puppet fail [23:31:27] gr [23:31:49] ebernhardson: remove the jenkins vote and then hit submit? [23:31:59] !log ebernhardson Synchronized php-1.24wmf21/includes/rcfeed/MachineReadableRCFeedFormatter.php: Use safe attribute accessor for RecentChange (duration: 00m 04s) [23:32:06] Logged the message, Master [23:32:54] Glaisher: Your change to upload wizard url for eowiki is deployed [23:33:08] tgr: Your UploadWizard change to 1.24wmf21 is deployed, please check [23:34:26] (03PS2) 10BryanDavis: Stop changing git::clone shared permissions unconditionally [puppet] - 10https://gerrit.wikimedia.org/r/162160 (https://bugzilla.wikimedia.org/70959) [23:35:57] !log ebernhardson Synchronized php-1.24wmf21/extensions/UploadWizard/: sync UploadWizard in 1.24wmf21 (duration: 00m 07s) [23:36:03] Logged the message, Master [23:36:16] ebernhardson: works, thanks [23:36:45] although it was only intended for wmf22, but won't do any harm in 21 either [23:37:25] tgr: oops, well at least it didn't hurt :) wmf22 in a minute(still fighting an lqt patch) [23:45:08] (03PS1) 10Dzahn: add dduvall to group 'statistics-users' [puppet] - 10https://gerrit.wikimedia.org/r/162166 [23:45:55] (03CR) 10Dzahn: [C: 031] "this should not need bastion access because stat1003 has public IP" [puppet] - 10https://gerrit.wikimedia.org/r/162166 (owner: 10Dzahn) [23:46:03] ori: Can you give 
https://gerrit.wikimedia.org/r/162160 and see if you have any conceptual objections to the change I'm proposing there? I'm setting up to test it to verify that it fixes my problem now. [23:46:56] !log ebernhardson Synchronized php-1.24wmf21/extensions/LiquidThreads/: Bump LQT submodule in 1.24wmf21 (duration: 00m 04s) [23:47:02] Logged the message, Master [23:48:37] !log ebernhardson Synchronized php-1.24wmf22/extensions/UploadWizard/: Bump UploadWizard submodule in 1.24wmf22 (duration: 00m 04s) [23:48:39] tgr: wmf22 is now out as well [23:48:43] Logged the message, Master [23:49:05] !log ebernhardson Synchronized php-1.24wmf22/extensions/LiquidThreads/: Bump LiquidThreads submodule in 1.24wmf22 (duration: 00m 06s) [23:49:11] Logged the message, Master [23:49:16] legoktm: lqt now updated in 1.24wmf2[12] [23:49:34] ebernhardson: thanks, I'll test it in a few minutes [23:50:21] RECOVERY - puppet last run on db69 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [23:50:49] ebernhardson: works, thanks [23:53:09] bd808: sure, looking [23:54:43] bd808: why does Git::Clone['vagrant'] need to be shared again? [23:55:37] ori: well... because that was what we were doing with the horrible recursive permissions management before [23:55:48] (03CR) 10ArielGlenn: "I'd prefer not to have another config file. 
A related approach is this:" [puppet] - 10https://gerrit.wikimedia.org/r/161332 (owner: 10Ori.livneh) [23:56:10] apergos: that's exactly what i did in PS1 [23:56:21] ori: and conceptually people seem to want to have "vagrant" behavior where they don't need to sudo to operate in there [23:56:29] apergos: but godog suggested having another config file, because that's more likely to be stable than the API [23:56:34] ugh [23:56:41] i think he's right, tbh [23:56:43] I didn't see patchset one, sorry [23:57:16] I don't like the idea of multiple confs, I'd rather do it this way, and then clean it up after the upgrade [23:57:32] which one is our NTP server now [23:57:37] which two [23:59:36] (03PS1) 10Dzahn: decom linne [puppet] - 10https://gerrit.wikimedia.org/r/162171 [23:59:55] apergos: would you merge it if i reverted to PS1?