[00:00:31] AaronSchulz, okay, I'm reading the docs (https://wikitech.wikimedia.org/wiki/Trebuchet#Deploying), but I'm fine if he does it. [00:01:56] I think I know how to do it though. cd /srv/deployment/jobrunner/jobrunner, git deploy start; git pull; git deploy sync; [00:01:58] Then https://wikitech.wikimedia.org/wiki/Jobrunner#Deployment [00:04:33] matt_flaschen, yep, but restarting the service is annoying without root [00:08:59] We should move evening SWAT an hour earlier so I stop hitting that error. [00:09:02] https://integration.wikimedia.org/ci/job/mediawiki-extensions-zend/19190/consoleFull [00:09:06] Or, you know, implement mockable time. [00:09:43] It only happens at exact midnight UTC. [00:10:09] operations, Patch-For-Review, database: Enabling automatic buffer pool dumping on start/stop (puppet) for all servers - https://phabricator.wikimedia.org/T101009#1332459 (Springle) Nice :-) I see it can be easily aborted too, which was my only (vague) concern. [00:12:19] (CR) Springle: [C: 1] Enable buffer pool load at start and dump at stop [puppet] - https://gerrit.wikimedia.org/r/215320 (https://phabricator.wikimedia.org/T101009) (owner: Jcrespo) [00:12:32] hahah ouch [00:13:05] matt_flaschen: https://stackoverflow.com/questions/2371854/can-i-mock-time-in-phpunit is interesting [00:14:48] legoktm, yeah, I've seen that. It's brilliant in a "Look out, I know PHP" [swing in on rope] way, but I think it would only work if you have all code in one namespace. [00:15:03] yeah, it seems like a bad idea [00:15:36] there's a comment about having a time abstraction layer that you can mock which would do the trick [00:15:57] legoktm, it's actually kind of surprising you can't override global functions in PHP (at least without runkit). You can do all sorts of other crazy stuff. [00:16:07] Yeah, I think an abstraction layer is the way to go.
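The "time abstraction layer that you can mock" idea from the exchange above can be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, and every name in it (`SystemClock`, `FixedClock`, `utc_date_stamp`) is hypothetical, not taken from MediaWiki or the failing test. The point is only that code which asks an injected clock for the time, instead of calling the global time function, can be tested right at the midnight UTC boundary.

```python
import time
from datetime import datetime, timezone


class SystemClock:
    """Production clock: delegates to the real system time."""

    def now(self) -> float:
        return time.time()


class FixedClock:
    """Test clock (hypothetical): returns whatever instant the test sets."""

    def __init__(self, start: float) -> None:
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


def utc_date_stamp(clock) -> str:
    """Example code under test: formats the clock's current UTC date."""
    moment = datetime.fromtimestamp(clock.now(), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


# Start one second before midnight UTC, then step across the boundary.
start = datetime(2015, 6, 2, 23, 59, 59, tzinfo=timezone.utc).timestamp()
clock = FixedClock(start)
before = utc_date_stamp(clock)
clock.advance(2)
after = utc_date_stamp(clock)
```

In production the same function would receive `SystemClock()`, so the midnight case stops depending on when the test suite happens to run.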
[00:18:48] 6operations, 10wikitech.wikimedia.org: wikitech instances list is blank - https://phabricator.wikimedia.org/T89808#1332494 (10Krinkle) [00:20:48] !log mattflaschen Synchronized php-1.26wmf8/includes/User.php: Fixed $flags bit operation precedence fail in User::loadFromDatabase() (duration: 00m 14s) [00:20:54] Logged the message, Master [00:21:15] AaronSchulz, you're done, guessing it's not really super-feasible to test, so moving on. [00:21:50] Oh, we need someone to do https://gerrit.wikimedia.org/r/#/c/215263/ . ori, you around? [00:24:14] AaronSchulz, okay, I'm moving on. Job queue one not started yet. [00:32:25] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [00:35:09] !log mattflaschen Synchronized php-1.26wmf8/extensions/Calendar/: Sync Calendar 1.26wmf8 for module position (duration: 00m 12s) [00:35:16] Logged the message, Master [00:35:26] gilles, done. [00:35:32] matt_flaschen: thanks! [00:35:33] SWAT done except for job queue patch. [00:59:48] 6operations: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1332611 (10RobH) 3NEW a:3RobH [01:10:49] 6operations: rack & on-site setup of cp2001-cp2026 - https://phabricator.wikimedia.org/T101206#1332645 (10RobH) 3NEW a:3RobH [01:11:24] 6operations: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1332654 (10RobH) [01:11:48] 6operations: rack & on-site setup of cp2001-cp2026 - https://phabricator.wikimedia.org/T101206#1332645 (10RobH) [01:37:14] !log start sync m4 eventlogging to codfw dbstore2002 [01:37:22] Logged the message, Master [01:43:27] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332686 (10Dzahn) >>! In T101170#1332265, @dpatrick wrote: > There's a typo there. In the "Name:" field, it lists "dpatrck" instead of "dpatrick". That has been fixed in pa... 
[01:45:06] 6operations: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1332688 (10RobH) [01:46:28] (03PS1) 10Springle: m4 replication rule for analytics-store [puppet] - 10https://gerrit.wikimedia.org/r/215566 [01:57:08] !log replicate m3 to codfw dbstore2001 [01:57:15] Logged the message, Master [02:00:54] (03PS1) 10Springle: s5 pager slave partitioning [software] - 10https://gerrit.wikimedia.org/r/215567 [02:02:40] (03CR) 10Springle: "Possible these should be deployed by puppet, same as the grant scripts. But, for now..." [software] - 10https://gerrit.wikimedia.org/r/215567 (owner: 10Springle) [02:17:14] (03PS1) 10Springle: depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215569 [02:20:22] (03CR) 10Springle: [C: 032] depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215569 (owner: 10Springle) [02:21:07] (03Merged) 10jenkins-bot: depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215569 (owner: 10Springle) [02:23:40] !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 07m 13s) [02:23:54] Logged the message, Master [02:25:36] !log springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 12s) [02:25:43] Logged the message, Master [02:28:41] !log LocalisationUpdate completed (1.26wmf7) at 2015-06-03 02:27:38+00:00 [02:28:48] Logged the message, Master [02:45:32] !log l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 06m 37s) [02:45:47] Logged the message, Master [02:49:58] !log LocalisationUpdate completed (1.26wmf8) at 2015-06-03 02:48:55+00:00 [02:50:05] Logged the message, Master [02:58:40] 6operations, 6Release-Engineering: Try out hack (>! In T91590#1332329, @Legoktm wrote: > Also HHVM's linter is significantly slower than PHP5: https://github.com/JakubOnderka/PHP-Parallel-Lint/issues/47 This is a general... 
[03:03:09] operations: foreachwikiexceptdblist to run scripts on all but a blacklist of wikis - https://phabricator.wikimedia.org/T101213#1332769 (Mattflaschen) NEW [03:04:53] operations, Release-Engineering: Try out hack (>! In T91590#1332763, @bd808 wrote: >>>! In T91590#1332329, @Legoktm wrote: >> Also HHVM's linter is significantly slower than PHP5: https://github.com/JakubOnderka/PHP-Para... [03:52:53] does anybody know which varnish version we're using in production? [03:58:31] SMalyshev: https://wikitech.wikimedia.org/wiki/Varnish#Configuration says minimum 2.1.2 [03:59:04] legoktm: so we run many different versions? [04:00:40] I don't think so [04:01:19] SMalyshev: I *think* it's the version packaged for jessie, because I remember bblack upgrading the varnish servers to jessie to get a newer varnish [04:01:53] legoktm: jessie has 3.0 by default (which is unsupported btw). Will 4.x work too? [04:02:14] I have no idea :P [04:02:27] I see [04:03:00] if I install stuff via puppet, would it be possible to get varnish 4? [04:03:20] 3 and 4 have incompatible configs, so I have to choose one...
[04:08:48] well it would have to be packaged [04:09:00] and ops would have to approve using varnish 4 [04:12:23] so I conclude right now it's not :) [04:35:46] (PS1) KartikMistry: Fix tabs and spacing [mediawiki-config] - https://gerrit.wikimedia.org/r/215572 [05:00:16] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (102196s 100000s) [05:05:26] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (15580 100000s) [05:42:35] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 3 05:41:31 UTC 2015 (duration 41m 30s) [05:42:41] Logged the message, Master [05:51:17] SMalyshev: we run a custom build of 3.0.6 [05:51:39] ori: ok, thanks, I'll test with 3.0.6 then to be compatible [05:55:16] SMalyshev: what are you developing? (if you don't mind me asking) [05:56:05] ori: frontend for wikidata query service (like http://wdqs-beta.wmflabs.org/) [05:56:21] mainly for logging [05:56:24] <_joe_> SMalyshev: why do you need specifics of which version of varnish we use? [05:56:27] at least for now [05:56:42] <_joe_> (just curious) [05:56:52] _joe_: because varnish 3 and 4 have different configs... and I want it to eventually use puppet recipes if possible [05:57:15] so I want the config I write now to be compatible with the varnish that puppet recipes can install [05:57:21] * ori nods [05:58:01] ori: so right now it runs on nginx but I was told varnish is better since we have a procedure for collecting logs from it [05:58:18] <_joe_> SMalyshev: uhmmm [05:58:27] <_joe_> that doesn't sound right [05:58:34] <_joe_> I mean that's for analytics? [05:58:48] _joe_: yes, mainly. i.e.
the logs are [05:58:54] <_joe_> anyway, I think you should work with someone in ops on this [05:59:11] <_joe_> SMalyshev: I can see if I find someone to assist you on this [05:59:16] _joe_: I talked to ottomata for now [05:59:52] <_joe_> SMalyshev: oh ok, great [05:59:54] but if you have better address I'd be glad to [06:00:21] <_joe_> SMalyshev: we're talking about blazegraph, right? [06:00:25] I know very little about how the system works, but basically I want to hook that thing to a) analytics b) monitoring [06:01:00] <_joe_> SMalyshev: if you want to take something to prod you'll need some ops help [06:01:13] _joe_: there are 3 things there: 1. query frontend 2. blazegraph 3. update tool. So for 1. we need analytics, e.g. like web analytics, maybe with option for deeper digging into sparql [06:01:18] <_joe_> I'll report you need some assistance [06:01:42] _joe_: yes I know... I'm trying to slowly start on that by at least getting some logging/monitoring [06:01:53] _joe_: thanks! [06:01:55] <_joe_> SMalyshev: do you have puppet manifests for any of those? [06:02:27] _joe_: nope... that would be another big task which I currently don't know how to approach... I know very little about puppet right now [06:02:52] but I wanted to at least figure out how to do logging on existing setup [06:03:07] which people already using [06:03:23] and then go and try to capture it all by puppet [06:04:07] <_joe_> SMalyshev: the current setup is in labs, right? [06:04:17] <_joe_> so, we can't log from labs to prod [06:04:28] <_joe_> for labs monitoring, ask YuviPanda :) [06:04:44] _joe_: so for 2 and 3 (who are java) we'd need some monitoring - like perf things and alive/dead - but not much analytics probably [06:04:56] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 906.057326136 [06:04:59] _joe_: yes, in labs. 
probably going to stay there for a while now [06:05:27] but I want to start with right setup so when we'd want to move it to prod I won't have to redo everything [06:05:32] are request logs adequate for capturing the activity you want to analyze? [06:05:38] _joe_: thanks I'll ping him [06:06:23] ori: yes, they should be enough. All queries are in the URL, so what I want basically is how many, how long they took, where they come from maybe, etc. [06:06:37] for that req logs should be enough... [06:07:02] maybe then look deeper into sparql contained in those requests but that should be doable too I imagine? [06:07:16] we can do some processing on URLs we log, do we? [06:08:03] SMalyshev: you can follow this example: https://github.com/wikimedia/operations-puppet/blob/production/modules/role/manifests/cache/kafka/statsv.pp#L16-30 [06:08:38] this part: varnish_opts => { 'm' => 'RxURL:^/beacon/statsv\?', } specifies a URL pattern you are interested in (it means, basically, only log requests where the URL matches that regex) [06:08:58] ori: not sure how to hook this to varnish instance [06:09:13] should I install some plugin? [06:09:25] SMalyshev: nah. Let me explain [06:09:59] varnish's approach to logging is interesting: instead of dealing with log files on disk (and potentially blocking writes), it writes to a shared memory segment, which it uses as a circular buffer [06:10:12] ahh brb [06:10:14] ori: ah, yes, I read about it in the docs [06:10:23] so you need to pull it out of it [06:11:43] (03PS13) 10Paladox: Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 [06:14:16] SMalyshev: yes exactly. varnish ships with a few cli tools for that: varnishncsa, for example, tails the log buffer and writes apache log format (or ncsa log format, hence the name) to stdout [06:14:51] so I imagine that's not what we want... so what we use instead? 
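The logging approach described above (the server writes records into a fixed-size circular buffer, and a separate tool tails it, keeping only URLs that match a pattern) can be illustrated with a small sketch. This is not how varnish or varnishncsa is implemented; the class and function names are hypothetical, and a Python `deque` stands in for the shared memory segment. Only the regex `^/beacon/statsv\?` comes from the linked statsv example.

```python
import re
from collections import deque


class SharedLogBuffer:
    """Toy stand-in for a shared-memory log: a fixed-size circular buffer.

    Writes never block; once the buffer is full, the oldest record is dropped.
    """

    def __init__(self, capacity: int) -> None:
        self._records = deque(maxlen=capacity)
        self._written = 0  # total number of records ever written

    def write(self, url: str) -> None:
        self._records.append((self._written, url))
        self._written += 1

    def read_since(self, seq: int):
        """Return the (sequence, url) pairs still in the buffer from seq onward."""
        return [(s, u) for s, u in self._records if s >= seq]


def tail_matching(buffer, pattern, next_seq):
    """Read new records and keep only URLs matching the pattern.

    Returns the matching URLs and the sequence number to resume from.
    """
    records = buffer.read_since(next_seq)
    matched = [url for _, url in records if pattern.search(url)]
    resume_at = records[-1][0] + 1 if records else next_seq
    return matched, resume_at


# Pattern taken from the statsv example quoted in the conversation.
STATSV = re.compile(r"^/beacon/statsv\?")

log = SharedLogBuffer(capacity=3)
for url in ["/wiki/Main_Page", "/beacon/statsv?a=1", "/w/index.php", "/beacon/statsv?b=2"]:
    log.write(url)

matched, resume_at = tail_matching(log, STATSV, next_seq=0)
```

With a capacity of 3 and four writes, the first record has already been overwritten by the time the reader runs, which is the trade-off of this design: the writer is never slowed down, but a reader that falls behind loses records.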
[06:14:54] we have an internally-developed tool called varnishkafka which tails the logs, reading them into kafka [06:15:17] <_joe_> ori: that's for production, though [06:15:30] yeah, i don't think we have kafka set up in labs [06:15:32] <_joe_> I don't think we have kafka in labs [06:15:37] heh [06:15:39] <_joe_> :P [06:16:17] at some point mark patched varnishncsa to add a UDP output option [06:16:28] which we have used (and IIRC still use for EventLogging) [06:16:55] it is deprecated, but the upshot is that UDP is simple and available in labs, and there is a clear and simple migration path from varnishncsa + udp to varnishkafka [06:18:12] we have a puppet abstraction for that, too: https://github.com/wikimedia/operations-puppet/blob/production/modules/role/manifests/cache/logging/eventlistener.pp#L7-14 [06:19:02] similar idea: '-m RxURL:^/(beacon/)?event(\.gif)?\?. -D' specifies the subset of requests we're interested in, and listener_address / port specify which udp host and port the log records get sent to [06:19:10] well, I don't have puppet for wdqs now so puppet ones for now probably won't help... [06:19:35] but I can look into what varnish::logging is doing [06:19:47] the actual command-line that that resource gets mapped to is: [06:20:08] /usr/bin/varnishncsa -n frontend -w 10.64.32.167:8422 -m RxURL:^/(beacon/)?event(\.gif)?\?. -D -P /var/run/varnishncsa/varnishncsa-eventlogging.pid -F %q\t%l\t%n\t%t\t%h\t%{User-agent}i [06:20:55] but i have to say, i suspect that logging at the varnish layer may not be what you want [06:21:23] ori: so what I think should be instead? [06:22:00] e.g., I don't care how the logs get where they are supposed to get... 
as long as we can analyze them [06:22:03] well, we are constrained to look at varnish logs because the logs on the backend servers don't include all the requests that were served from the cache, which in our case is the bulk of our traffic [06:22:30] but from glancing at the URL you pasted above it looks like this is the kind of situation where every request gets handled by the application server [06:22:38] for my case, the interesting parts probably won't be cached [06:22:42] right [06:23:07] yes, except for asking for gui files themselves (which is also interesting as "who looked at the gui" but less) [06:23:43] yep [06:23:46] but yeah most of the requests won't be cached I guess.. unless somebody asks the same query within very short time, then we may want to cache [06:23:55] for the last few days i had several times that after save, my page was not up to date with my changes [06:23:57] not decided on that one yet [06:24:03] en.wp [06:24:11] anyone else experience that ? [06:24:32] ori: so, if there's a way to get there from nginx logs I'd be fine with it too [06:24:32] (03PS2) 10Jcrespo: Enable buffer pool load at start and dump at stop [puppet] - 10https://gerrit.wikimedia.org/r/215320 (https://phabricator.wikimedia.org/T101009) [06:24:35] thedj: any kind of inconsistency like that is very worrisome even if it's rare. can you file a bug? [06:24:51] ori: yeah i'll file a report. [06:24:53] SMalyshev: get where? [06:25:30] ori: from access.log to some place where logs are aggregated, stored and analyzed (whatever it is, assuming we have such place now) [06:25:31] in other words: what do you need that isn't satisfied by a simple request log file on disk on a web server? 
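For the questions raised earlier (how many requests, how long they took, where they came from), a plain request log file may already be enough. The sketch below is an assumption-laden illustration: the log line layout (client address, quoted request line, status, request time in seconds) and the `/sparql` path are hypothetical and not taken from the actual wdqs nginx configuration, and `summarize` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical line layout: <client> "<METHOD> <url> <protocol>" <status> <seconds>
LINE = re.compile(
    r'^(?P<ip>\S+) "(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<seconds>\d+\.\d+)$'
)


def summarize(lines):
    """Count requests, average their duration, and rank clients by request count."""
    count = 0
    total_seconds = 0.0
    clients = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        count += 1
        total_seconds += float(match.group("seconds"))
        clients[match.group("ip")] += 1
    mean_seconds = total_seconds / count if count else 0.0
    return {
        "requests": count,
        "mean_seconds": mean_seconds,
        "top_clients": clients.most_common(3),
    }


sample = [
    '192.0.2.1 "GET /sparql?query=SELECT HTTP/1.1" 200 0.250',
    '192.0.2.2 "GET /sparql?query=ASK HTTP/1.1" 200 0.750',
    '192.0.2.1 "GET /index.html HTTP/1.1" 200 0.000',
    "garbage line",
]
summary = summarize(sample)
```

Because the queries are in the URL, the captured `url` group could later be parsed further to look at the SPARQL itself, which this sketch does not attempt.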
[06:25:39] ah right [06:25:44] we do, there's logstash [06:25:51] (CR) Jcrespo: [C: 2] Enable buffer pool load at start and dump at stop [puppet] - https://gerrit.wikimedia.org/r/215320 (https://phabricator.wikimedia.org/T101009) (owner: Jcrespo) [06:26:25] http://www.bravo-kernel.com/2014/12/setting-up-logstash-1-4-2-to-forward-nginx-logs-to-elasticsearch/ [06:26:27] ori: cool, does that include analytics (and is it accessible from labs)? [06:27:26] <_joe_> thedj: do you /later/ see the page with the changes? [06:27:28] depends on what you mean by analytics. it comes with a slow but capable front-end webapp for graphing, slicing, and dicing data. see https://logstash.wikimedia.org/ [06:27:32] <_joe_> or they're just lost forever? [06:27:44] _joe_: i get it later. [06:28:03] refresh, or purge and i see my new changes [06:28:04] <_joe_> thedj: ok so it's probably a caching problem [06:28:35] ori: that one doesn't let me in [06:28:42] https://phabricator.wikimedia.org/T101224 [06:28:50] SMalyshev: are you using your wikimedia labs credentials? [06:29:15] ori: no. it said use wiki not shell, so I used WMF one :) [06:29:34] wikitech / ldap [06:29:42] it's not a very clear prompt [06:29:51] ahh, ok, now I see it [06:29:54] I agree [06:30:16] PROBLEM - puppet last run on db1067 is CRITICAL puppet fail [06:30:39] SMalyshev: we send apache logs there, for example. bd808 and jgage are the people to talk to about getting nginx logs there. it should be pretty straightforward. [06:30:48] ori: ok, but this one is production one, right? [06:31:15] so I'm not sure if it's ok to get labs wdqs logs in here.
Like, I don't mind but they might :) [06:31:16] yes but there is an instance in labs as well: https://logstash-beta.wmflabs.org/ [06:31:26] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 1 failures [06:31:27] PROBLEM - puppet last run on cp3042 is CRITICAL Puppet has 1 failures [06:32:06] PROBLEM - puppet last run on cp3010 is CRITICAL Puppet has 1 failures [06:33:35] PROBLEM - puppet last run on cp4008 is CRITICAL Puppet has 1 failures [06:33:36] RECOVERY - puppet last run on db1067 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:33:40] ori: that one looks more restricted, doesn't allow me with wikitech pwd either [06:34:00] from the message I assume one has to be a deployer? [06:34:16] PROBLEM - puppet last run on rhodium is CRITICAL Puppet has 1 failures [06:34:57] PROBLEM - puppet last run on mw1166 is CRITICAL Puppet has 1 failures [06:35:17] PROBLEM - puppet last run on mw1176 is CRITICAL Puppet has 1 failures [06:35:26] SMalyshev: you have to be in the beta cluster project, which i can add you to if you're not in it already [06:35:35] PROBLEM - puppet last run on mw1060 is CRITICAL Puppet has 1 failures [06:35:35] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 1 failures [06:35:37] PROBLEM - puppet last run on mw1046 is CRITICAL Puppet has 1 failures [06:35:46] PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 1 failures [06:35:46] ori: looks like I'm not [06:35:55] PROBLEM - puppet last run on mw2017 is CRITICAL Puppet has 1 failures [06:35:55] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 1 failures [06:36:16] PROBLEM - puppet last run on mw1205 is CRITICAL Puppet has 1 failures [06:36:16] PROBLEM - puppet last run on mw1120 is CRITICAL Puppet has 1 failures [06:36:37] PROBLEM - puppet last run on mw1164 is CRITICAL Puppet has 1 failures [06:36:37] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 2 failures [06:36:37] PROBLEM - puppet last run on mw1092 is CRITICAL Puppet 
has 1 failures [06:36:45] PROBLEM - puppet last run on mw2134 is CRITICAL Puppet has 1 failures [06:36:45] PROBLEM - puppet last run on mw2013 is CRITICAL Puppet has 1 failures [06:36:46] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 1 failures [06:36:46] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 1 failures [06:36:46] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 1 failures [06:36:46] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures [06:36:46] PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 2 failures [06:36:55] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 1 failures [06:37:16] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures [06:37:22] SMalyshev: added you just now. I can't remember if it takes effect immediately or if it takes a puppet run on all the relevant hosts (20 minutes) [06:39:01] not yet... I'll wait [06:39:19] is that one run by bd808 too? [06:39:49] i think so [06:40:54] ok, thanks, so I'll talk to him about it [06:41:19] so what's the difference between logstash and where-ever the varnish logs go? or they go to logstash too? [06:41:53] we have two sorts of logs coming out of varnish [06:42:16] the "all request" log, which is too voluminous to be handled by logstash [06:42:43] that goes into hadoop for ETL [06:44:11] volume shouldn't be a super-big deal at least for a while [06:44:14] then there is the URL-specific one which i linked to above. that's a little more specialized. we basically use varnish's scalability to have /beacon endpoints we can push arbitrary data to from client-side code (encoded as query strings) [06:44:43] aha, I see [06:44:45] that goes into a database and also to graphite [06:45:16] RECOVERY - puppet last run on mw1164 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:45:35] different database from logstash? 
[06:45:35] RECOVERY - puppet last run on mw1176 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:45:37] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:46] RECOVERY - puppet last run on mw1060 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:45:47] yeah [06:45:54] though arguably that should go into logstash as well [06:45:55] RECOVERY - puppet last run on mw1046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:07] RECOVERY - puppet last run on rhodium is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:46:16] the thing about varnish that sucks, especially for someone in your position, is that (a) our setup is complicated and not easy to reproduce, in labs or locally. (the varnishes in labs are actually still running ubuntu precise, whereas in production we've migrated them to jessie) [06:46:35] RECOVERY - puppet last run on mw1205 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:35] RECOVERY - puppet last run on mw1120 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:45] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:46:56] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:46:56] RECOVERY - puppet last run on mw1092 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:46:56] RECOVERY - puppet last run on mw2134 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw2013 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw2023 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run 
on mw2206 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:47:05] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:47:05] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:47:06] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:47:06] RECOVERY - puppet last run on cp4008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:07] (b) the varnishes are the workhoses of our infrastructure so ops are understandably a little reluctant to let people poke around or to run experimental software on them [06:47:26] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:27] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:41] well, it will be separate endpoint probably, not part of existing varnish setups [06:47:46] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:47] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:47] RECOVERY - puppet last run on mw2017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:12] nod [06:48:17] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:36] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:45] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:50] so I was looking into varnish just because Otto told me it may be a good solution. 
but if shoving logs from nginx to logstash works I'd be happy with that too [06:49:05] doesn't solve the monitoring but that is another can of worms :) [06:49:07] <_joe_> SMalyshev: otto told you that about production [06:49:30] _joe_: aha, I see. So if we move to production logstash won't work? [06:49:34] <_joe_> he probably missed the "this is in labs and I'm not yet planning to move it to prod" part [06:49:52] aha, ok [06:49:54] <_joe_> SMalyshev: I won't use our poor logstash infrastructure for access logs, no [06:50:08] <_joe_> access logs go into hadoop [06:50:21] access logs for a specialized service? [06:50:27] i think it's fine for that to go into logstash [06:50:32] <_joe_> also remember analytics people have their own tools, which I suppose don't work on logstash [06:50:41] _joe_: well, those won't be like humongous wikipedia.org access logs... it'd be for query service which would be several orders of magnitude less traffic [06:50:50] <_joe_> ori: I don't, given the reason why SMalyshev wants to log those [06:50:59] <_joe_> SMalyshev: I know, see my next remark [06:51:11] <_joe_> anyways, our logstash infrastructure is overbooked as it is [06:51:17] <_joe_> I mean the prod one [06:51:22] _joe_: so for these tools, what would we need then? [06:51:45] <_joe_> SMalyshev: in labs, I have no idea. I don't think we have analytics in labs [06:51:49] namely, which log source would work - nginx logs, varnish logs, anything else? [06:51:51] (CR) GWicke: "@Filippo, do you have more detail on why it broke? I could look into moving it to its own class, but would still need to fix the actual al" [puppet] - https://gerrit.wikimedia.org/r/215557 (owner: Filippo Giunchedi) [06:52:13] _joe_: yeah ok, but beyond that? I just want to know so I won't paint myself into a corner [06:52:43] right now I checked and varnish works fine. So if we put varnish on it in prod would it be ok?
[06:52:52] <_joe_> in prod, you would maybe find a way to feed your access logs to hadoop [06:53:14] <_joe_> SMalyshev: in prod your service will probably be behind the misc varnishes [06:53:33] <_joe_> (and nginx in front of those as an ssl terminator) [06:53:46] <_joe_> I can't say I'm a fan of having varnish in front of random services [06:54:16] _joe_: ok, I think that's good for me for now. We'd have to have _something_ in front of it, since it needs to limit access. [06:54:27] <_joe_> in your case, I'd use nginx alone and find a way to pipe access logs into hadoop from there, but I think most other ops would not agree with me [06:54:45] blazgeraph itself is basically "I don't care, you can do anything you want, delete whole db, fine with me" [06:55:03] <_joe_> SMalyshev: I really don't have the time to help you thoroughly on this, but I'll ask someone in ops to help you [06:55:07] so right now it has a frontend [06:55:31] _joe_: sure, the rpduction part is not urgent, I just wanted to get some visibility on labs one [06:55:35] <_joe_> which is your "application", right? [06:55:40] *production [06:56:16] _joe_: yes, application but the frontend also blocks all requests except safe GETs so you can't mess up the DB [06:56:32] <_joe_> SMalyshev: exactly, perfect [06:56:41] blazegraph surprisingly does implement it correctly in making GET read-only [06:57:43] <_joe_> SMalyshev: can you believe I ask "what is important NOT to do with a GET in a RESTFUL service?" in interviews [06:57:56] ok, so I think I have for now a vision of how to proceed [06:58:02] <_joe_> and just one person answered correctly until now? [06:58:31] _joe_: yeah :) I was asking this one on interviews in my last job too. Many people ignore that thing... [06:58:36] _joe_: really? 
wow [06:58:56] even if I might violate that rule I thought everyone at least was aware [06:59:05] or go to opposite extreme and do *everything* with POST, even retrievals [06:59:40] which also sucks as now one has to allow POSTs for read-only workloads [06:59:54] <_joe_> (note that GETs need to be idempotent, strictly speaking, not necessarily "read-only") [07:00:16] _joe_: yes, but in case of blazegraph it's also read-only [07:00:34] which is convenient for my purposes [07:00:38] $ curl -sI "https://en.wikipedia.org/w/api.php?bogus=request" | head -1 [07:00:39] HTTP/1.1 200 OK [07:00:40] ;) [07:00:42] <_joe_> SMalyshev: yes, and yes :) [07:01:24] <_joe_> ori: oh I know, you should go to our prox^H^H^H^H rest api instead, right? [07:01:34] anyway, that was very helpful, thanks! [07:01:49] I'll come back to bother you again in a couple of days about monitoring [07:01:56] SMalyshev: good luck! [07:02:03] thanks :) [07:02:18] _joe_: :D [07:02:48] <_joe_> ori: you troll me, I troll someone else [07:02:57] <_joe_> in the hope they'll troll you in return [07:03:06] it's the circle of troll [07:03:14] <_joe_> ricochet trolling [07:07:50] 6operations, 5Patch-For-Review, 7database: Enabling automatic buffer pool dumping on start/stop (puppet) for all servers - https://phabricator.wikimedia.org/T101009#1333019 (10jcrespo) 5Open>3Resolved Documented here: https://wikitech.wikimedia.org/wiki/MariaDB/buffer_pool_dump [07:09:20] what would be a better prompt for labs credentials? [07:09:23] i think it used to say labs [07:12:17] (03CR) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [07:13:05] 6operations, 7database: investigate performance_schema for wmf prod - https://phabricator.wikimedia.org/T99485#1333022 (10jcrespo) In order to check how it p_s affect us, I want to run https://wikitech.wikimedia.org/wiki/MariaDB/query_performance on some test machines for a week. 
For example, db1018. [07:17:11] (03PS7) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 [07:18:05] (03PS8) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 [07:18:16] (03PS9) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 [07:18:24] (03CR) 10Ori.livneh: [C: 032 V: 032] Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [07:23:24] 6operations, 7database: Permission problem on parsercache db servers - https://phabricator.wikimedia.org/T101182#1333028 (10jcrespo) a:5Springle>3None [07:26:26] 6operations, 7database: Permission problem on parsercache db servers - https://phabricator.wikimedia.org/T101182#1333036 (10jcrespo) The access errors seems to have been fixed with the grants above: https://logstash.wikimedia.org/#dashboard/temp/7HE5B25RQvOUPWgn5nuy7g [07:42:48] (03PS1) 10Jcrespo: depool es1008 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215576 [07:44:18] (03CR) 10Jcrespo: [C: 032] depool es1008 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215576 (owner: 10Jcrespo) [07:47:46] !log jynus Synchronized wmf-config/db-eqiad.php: depool es1008 (duration: 00m 14s) [07:47:51] Logged the message, Master [07:55:20] (03PS1) 10Muehlenhoff: Fix remote DoS [debs/linux] - 10https://gerrit.wikimedia.org/r/215577 [07:59:27] (03CR) 10Muehlenhoff: [C: 032] Fix remote DoS [debs/linux] - 10https://gerrit.wikimedia.org/r/215577 (owner: 10Muehlenhoff) [07:59:41] (03CR) 10Muehlenhoff: [V: 032] Fix remote DoS [debs/linux] - 10https://gerrit.wikimedia.org/r/215577 (owner: 10Muehlenhoff) [08:45:46] (03PS3) 10Alexandros Kosiaris: install-server: Accomodate virtualization [puppet] - 10https://gerrit.wikimedia.org/r/214377 [08:46:59] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] 
install-server: Accomodate virtualization [puppet] - 10https://gerrit.wikimedia.org/r/214377 (owner: 10Alexandros Kosiaris) [08:51:29] (03PS1) 10Jcrespo: depool es2010 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215581 [08:51:43] ori: _joe_ SMalyshev to throw more things into the ring about usage - you can use logster + graphite as well. use logster to tail your current logs and put them into graphite.wmflabs.org [08:51:51] !log removed fuse/ntfs-3g from wtp* [08:51:57] Logged the message, Master [08:52:15] SMalyshev: _joe_ ori however, if this is still running in labs, I don't think you will get any usable measurements - you're going to max out resources and bring stuff down on any amount of usage :) [08:54:33] SMalyshev: _joe_ ori so IMO getting it set up in prod in some fashion should be higher priority than setting up a labs specific way to measure access usage [08:54:56] YuviPanda: I don't think it's that much used... [08:55:13] not yet at least [08:55:13] SMalyshev: it kept dying during the hackathon :) [08:55:23] still, I think any usage testing shouldn't be on labs [08:55:33] YuviPanda: that was for different reason (which since fixed I think( [08:56:58] (03CR) 10Jcrespo: [C: 032] depool es2010 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215581 (owner: 10Jcrespo) [08:58:52] YuviPanda: so we have about 10k queries logged so far [08:59:11] I don't think it's overwhelming... 
[08:59:48] wait, actually that includes static content, if we remove it it's like 3k [09:00:10] !log jynus Synchronized wmf-config/db-codfw.php: depool es2010 (duration: 00m 13s) [09:00:16] Logged the message, Master [09:00:29] SMalyshev: fair enough, but idk - the point at which labs-*specific* stats solution are useless probably approaches faster :) [09:01:03] (03CR) 10Alexandros Kosiaris: [C: 04-1] CX: Log to logstash (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) (owner: 10KartikMistry) [09:01:03] YuviPanda: if we get there, that would be the point where we can ask to accelerate move to production :) [09:01:32] ok, time to go to bed now... [09:02:04] YuviPanda: I'd also like to talk about logster later, too late now [09:02:11] cunning plan, SMalyshev :) [09:02:13] have early meetings tomorrow [09:02:19] SMalyshev: yeah, if you already have nginx / logs logster might be easiest way [09:02:26] SMalyshev: \o/ cool. file bugs, etc :) [09:02:26] cya [09:02:47] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [09:03:45] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [09:06:05] (03PS1) 10Addshore: rsync wikidata json dumps to labs /public/dumps [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) [09:08:08] addshore: cool :) now poke apergos to merge :) [09:08:28] all those are goig to get done this week [09:08:41] now we have lots of space again [09:09:25] PROBLEM - High load average on ms-be1005 is CRITICAL - load average: 283.08, 170.14, 80.18 [09:10:26] PROBLEM - puppet last run on cp3014 is CRITICAL puppet fail [09:25:48] <_joe_> akosiaris: I guess you have an unmerged patch on puppet? [09:26:31] _joe_: yes... I also have network problems... 
[09:26:47] <_joe_> akosiaris: that sucks, need any assistance? [09:26:57] <_joe_> I can puppet-merge it if needed [09:27:17] niah, I managed to merge it [09:27:26] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [09:27:36] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [09:27:43] let's see what I can do about fixing it [09:28:16] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [09:33:00] (03PS1) 10Jcrespo: Repool es1008 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215594 [09:38:26] (03CR) 10Jcrespo: [C: 032] Repool es1008 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215594 (owner: 10Jcrespo) [09:40:27] apergos: awesome! :) [09:43:40] (03PS2) 10Faidon Liambotis: sslcert: include ::chainedcert from ::certificate [puppet] - 10https://gerrit.wikimedia.org/r/215350 [09:43:42] (03PS2) 10Faidon Liambotis: sslcert: remove ::certificate's $content parameter [puppet] - 10https://gerrit.wikimedia.org/r/215351 [09:43:44] (03PS2) 10Faidon Liambotis: certs: inline certificate:: classes to ::base [puppet] - 10https://gerrit.wikimedia.org/r/215348 [09:43:46] (03PS2) 10Faidon Liambotis: base: certificates::base -> base::certificates [puppet] - 10https://gerrit.wikimedia.org/r/215349 [09:43:48] (03PS2) 10Faidon Liambotis: certs: remove random certificates::* includes [puppet] - 10https://gerrit.wikimedia.org/r/215346 [09:43:50] (03PS2) 10Faidon Liambotis: certs: kill a bunch of Labs classes [puppet] - 10https://gerrit.wikimedia.org/r/215347 [09:43:52] (03PS3) 10Faidon Liambotis: certs: replace require by collector ordering [puppet] - 10https://gerrit.wikimedia.org/r/215352 [09:43:54] (03PS3) 10Faidon Liambotis: sslcert: automatically regenerate chained cert on changes [puppet] - 10https://gerrit.wikimedia.org/r/215353 [09:43:56] (03PS1) 10Faidon Liambotis: base::certs: rename CA 
filenames to their CNs [puppet] - 10https://gerrit.wikimedia.org/r/215596 [09:43:58] (03PS1) 10Faidon Liambotis: base::certs: remove backwards-compat ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215597 [09:48:05] !log jynus Synchronized wmf-config/db-eqiad.php: repool es1008 (duration: 00m 13s) [09:48:10] Logged the message, Master [09:53:44] 6operations, 10ops-codfw: degraded RAID / disk fail on es2010 - https://phabricator.wikimedia.org/T98982#1333212 (10jcrespo) I rebooted recently this node, saw no problem with disks. I will remove the note here https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=es2010&service=RAID unless you... [09:57:38] (03PS1) 10Muehlenhoff: Install perf by default (Bug: T100216) [debs/linux-meta] - 10https://gerrit.wikimedia.org/r/215598 [09:58:48] (03CR) 10Muehlenhoff: [C: 032 V: 032] Install perf by default (Bug: T100216) [debs/linux-meta] - 10https://gerrit.wikimedia.org/r/215598 (owner: 10Muehlenhoff) [10:03:11] (03PS1) 10Jcrespo: Repool es2010 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215600 [10:06:56] (03CR) 10Jcrespo: [C: 032] Repool es2010 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215600 (owner: 10Jcrespo) [10:09:30] !log Jenkins: refreshing all jobs to get rid of an obsolete http notification to Zuul {{bug|T93321}} [10:09:35] Logged the message, Master [10:18:46] suffering from some lock_wait_timouts en s2-master: nothing we can do [10:20:08] (03PS17) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [10:24:06] !log added linux-meta 1.2 for jessie-wikimedia on carbon.wikimedia.org [10:24:12] Logged the message, Master [10:26:06] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 8.33% of data above the critical threshold [500.0] [10:27:04] 6operations: Switch to Linux 3.19 by default on jessie hosts - https://phabricator.wikimedia.org/T100773#1333261 
(10MoritzMuehlenhoff) [10:27:05] 6operations, 5Patch-For-Review: Backport and include linux-tools-3.19 to our jessie repository - https://phabricator.wikimedia.org/T100216#1333260 (10MoritzMuehlenhoff) 5Open>3Resolved [10:28:53] !log jynus Synchronized wmf-config/db-codfw.php: repool es2010 (duration: 00m 14s) [10:28:59] Logged the message, Master [10:34:26] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [10:34:30] (03PS1) 10Jcrespo: Depool es2009 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215601 [10:35:21] (03CR) 10Alexandros Kosiaris: [C: 031] "+1 for now while some testing on beta takes place, will +2 it later on." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) (owner: 10KartikMistry) [10:36:20] (03PS3) 10Yuvipanda: [WIP] postgres: Provision credentials for all users / services [puppet] - 10https://gerrit.wikimedia.org/r/211091 [10:36:59] (03CR) 10jenkins-bot: [V: 04-1] [WIP] postgres: Provision credentials for all users / services [puppet] - 10https://gerrit.wikimedia.org/r/211091 (owner: 10Yuvipanda) [10:43:09] !log powercycling ms-be1005 [10:43:15] Logged the message, Master [10:43:36] an1021 kafka alert, anyone? 
:) [10:45:26] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [10:45:55] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 4.14 ms [10:47:15] RECOVERY - High load average on ms-be1005 is OK - load average: 34.48, 11.74, 4.20 [10:52:49] (03CR) 10Jcrespo: [C: 032] Depool es2009 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215601 (owner: 10Jcrespo) [10:53:14] (03CR) 10Faidon Liambotis: [C: 04-1] Use 3.19 on jessie by default (Bug: T97411) (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/211688 (owner: 10Muehlenhoff) [10:55:28] !log jynus Synchronized wmf-config/db-codfw.php: Depool es2009 (duration: 00m 13s) [10:55:34] Logged the message, Master [10:56:06] (03CR) 10JanZerebecki: [C: 04-1] rsync wikidata json dumps to labs /public/dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) (owner: 10Addshore) [11:04:40] !log kafka preferred-replica-election on an1021 [11:04:47] Logged the message, Master [11:05:06] PROBLEM - puppet last run on eventlog1001 is CRITICAL puppet fail [11:06:55] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 4284.35784467 [11:22:23] (03CR) 10Addshore: rsync wikidata json dumps to labs /public/dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) (owner: 10Addshore) [11:22:53] (03PS2) 10Addshore: rsync wikidata json dumps to labs /public/dumps [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) [11:23:28] akosiaris: who can give me access to deployment-logstash? 
[11:23:54] kart_: define access [11:23:55] RECOVERY - puppet last run on eventlog1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:24:07] akosiaris: login there, check cxserver logs :) [11:24:15] /var/log/cxserver etc [11:25:14] kart_: /var/log/cxserver ? on logstash ? [11:25:31] it's an elastic search cluster [11:25:52] kart_: http://logstash-beta.wmflabs.org/ [11:26:17] PROBLEM - puppet last run on mc1009 is CRITICAL Puppet has 1 failures [11:27:18] /data/project/cxserver/log - isn't that we define? [11:27:45] akosiaris: that's also! [11:31:24] kart_: if you're putting logs on NFS, please get them off. [11:31:59] kart_: ok, yes, there are logs on NFS. please get them off. [11:33:12] YuviPanda: https://gerrit.wikimedia.org/r/#/c/213840/17/modules/cxserver/templates/config.erb - this. [11:33:36] cool. I also filed https://phabricator.wikimedia.org/T101240?workflow=create to keep track [11:35:48] YuviPanda: thanks! Should I just use /var/log/cxserver instead of /data/.. [11:35:56] kart_: yup [11:36:14] Thanks. fixing. [11:38:40] (03PS18) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [11:39:22] (03PS19) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [11:41:03] (03PS20) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [11:41:32] sigh. 
commit msg :/ [11:41:36] RECOVERY - puppet last run on mc1009 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:49:03] (03PS1) 10Giuseppe Lavagetto: Conftool: initial commit [software/conftool] - 10https://gerrit.wikimedia.org/r/215604 [11:49:49] (03PS5) 10Muehlenhoff: Use 3.19 on jessie by default (Bug: T97411, Bug: T100773) [puppet] - 10https://gerrit.wikimedia.org/r/211688 [11:51:25] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "This is a WiP for now, I just wanted others to be able to see my code." [software/conftool] - 10https://gerrit.wikimedia.org/r/215604 (owner: 10Giuseppe Lavagetto) [11:54:55] _joe_: ^ -2, not in nodejs or go or rust or haskell. [11:55:18] <_joe_> YuviPanda: EHIPSTER [11:55:39] _joe_: pffft, all those languages are way too mainstream :P if I were hipster I'd be talking about elm or nim. [12:00:04] aude: Respected human, time to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T1200). Please do the needful. [12:01:10] deploy! [12:02:33] 6operations, 6Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333509 (10faidon) What's the status of this? Is it blocked on someone outside the Labs team? [12:07:02] (03PS21) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [12:09:13] 6operations, 6Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333520 (10yuvipanda) Ugh, this fell through the cracks :| Ideally, someone will investigate ways to get this machine booting up on a kernel that's new enough to not have the memory issues that @bblack pointed out - an... 
[12:10:19] (03PS1) 10Faidon Liambotis: Add A/AAAA/PTR for radon (public1-c-eqiad) [dns] - 10https://gerrit.wikimedia.org/r/215607 [12:10:52] (03PS1) 10Faidon Liambotis: site: Set up radon (move to public, jessie) [puppet] - 10https://gerrit.wikimedia.org/r/215608 [12:11:06] (03CR) 10Faidon Liambotis: [C: 032] Add A/AAAA/PTR for radon (public1-c-eqiad) [dns] - 10https://gerrit.wikimedia.org/r/215607 (owner: 10Faidon Liambotis) [12:11:47] jenkins lagging behind and/or dead? [12:12:38] hashar: ^ ? [12:12:40] it was slow for me too before [12:15:41] (03CR) 10Faidon Liambotis: [C: 032] site: Set up radon (move to public, jessie) [puppet] - 10https://gerrit.wikimedia.org/r/215608 (owner: 10Faidon Liambotis) [12:15:52] YuviPanda: ok, can you bring it back up then? [12:16:15] paravoid: yes, let me do an ops@ thread and also bring it up on next ops meeting [12:16:42] paravoid: yeah still refreshing all the Jenkins jobs. So that adds some overhead when triggering the jobs :/ [12:17:15] 6operations, 6Labs, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1333536 (10yuvipanda) T100030 is related [12:21:10] paravoid: I've ressurected thread [12:21:15] thanks for the poke [12:21:24] awesome, thanks [12:21:31] 6operations, 6Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333543 (10yuvipanda) I've asked for help in the ops@ list again. 
[12:21:39] (03CR) 10JanZerebecki: rsync wikidata json dumps to labs /public/dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) (owner: 10Addshore) [12:21:47] (03CR) 10JanZerebecki: [C: 031] rsync wikidata json dumps to labs /public/dumps [puppet] - 10https://gerrit.wikimedia.org/r/215585 (https://phabricator.wikimedia.org/T100885) (owner: 10Addshore) [12:22:03] (03PS3) 10Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [12:22:38] (03PS1) 10Aude: Enable Wikibase usage tracking on arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215609 (https://phabricator.wikimedia.org/T100659) [12:28:25] 6operations, 6Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333554 (10yuvipanda) @andrew says that similar issues had cropped up in another machine before, and a rollback to an older kernel fixed it. 
[12:30:45] (03PS1) 10Faidon Liambotis: site: add authdns::server role to radon [puppet] - 10https://gerrit.wikimedia.org/r/215610 [12:30:47] (03PS1) 10Faidon Liambotis: site: remove authdns::server from rubidium [puppet] - 10https://gerrit.wikimedia.org/r/215611 [12:32:26] (03CR) 10Faidon Liambotis: [C: 032] site: add authdns::server role to radon [puppet] - 10https://gerrit.wikimedia.org/r/215610 (owner: 10Faidon Liambotis) [12:32:53] (03PS1) 10Faidon Liambotis: Add AAAA/IPv6 PTR to baham [dns] - 10https://gerrit.wikimedia.org/r/215612 [12:41:20] (03PS1) 10Matanya: statistics: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215614 [12:51:33] (03PS1) 10Jcrespo: Repool es2009 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215616 [12:52:15] (03CR) 10Jcrespo: [C: 032] Repool es2009 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215616 (owner: 10Jcrespo) [12:53:17] (03PS2) 10Faidon Liambotis: site: remove authdns::server from rubidium [puppet] - 10https://gerrit.wikimedia.org/r/215611 [12:53:33] !log jynus Synchronized wmf-config/db-codfw.php: Repool es2009 (duration: 00m 15s) [12:53:39] Logged the message, Master [12:56:04] !log permanently switching ns0 to radon instead of rubidium [12:56:10] Logged the message, Master [12:56:56] (03PS2) 10Aude: Enable Wikibase usage tracking on arwiki and cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215609 (https://phabricator.wikimedia.org/T100659) [12:57:49] PROBLEM - Host ns0-v6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:ed1a::e [12:58:07] hmm [12:58:32] it's a real problem but you can safely ignore that [13:00:00] RECOVERY - Host ns0-v6 is UP: PING OK - Packet loss = 0%, RTA = 3.06 ms [13:00:49] (03CR) 10Aude: [C: 032] Enable Wikibase usage tracking on arwiki and cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215609 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [13:00:56] (03Merged) 10jenkins-bot: Enable Wikibase usage 
tracking on arwiki and cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215609 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [13:02:13] !log aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on arwiki and cawiki (duration: 00m 15s) [13:02:17] Logged the message, Master [13:02:38] :) [13:06:45] (03PS1) 10Matanya: apache: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215617 [13:09:38] (03CR) 10Faidon Liambotis: [C: 032] site: remove authdns::server from rubidium [puppet] - 10https://gerrit.wikimedia.org/r/215611 (owner: 10Faidon Liambotis) [13:12:45] !log reimaging rubidium with trusty, as spare [13:12:51] Logged the message, Master [13:14:13] 10Ops-Access-Requests, 6operations, 6Search-and-Discovery, 3Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1333636 (10Deskana) [13:14:23] 10Ops-Access-Requests, 6operations, 6Search-and-Discovery, 3Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1331491 (10Deskana) [13:15:30] PROBLEM - Host rubidium is DOWN: PING CRITICAL - Packet loss = 100% [13:16:46] 6operations, 10hardware-requests: Replace rubidium with radon for authdns (allocate radon, deallocate rubidium) - https://phabricator.wikimedia.org/T101256#1333643 (10faidon) 3NEW [13:17:51] RECOVERY - Host rubidium is UP: PING OK - Packet loss = 0%, RTA = 1.94 ms [13:18:58] 6operations, 10Traffic: Upgrade prod DNS daemons to gdnsd 2.2.0 - https://phabricator.wikimedia.org/T98003#1333652 (10faidon) rubidium was just replaced by radon — radon runs jessie now. IOW, all 3 NSes run jessie/2.1.2. 
[13:20:54] 6operations, 6Labs, 10Labs-Infrastructure, 3Labs-Sprint-100: Make a block-level copy of the codfw mirror of labstore1001 to eqiad - https://phabricator.wikimedia.org/T101010#1333661 (10coren) The copy is progressing nicely (if not as fast as hoped); the bottleneck appears to be the ssh channel window size... [13:28:52] (03PS1) 10Jcrespo: Depool es2008 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215622 [13:29:38] (03PS22) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [13:34:39] (03CR) 10Jcrespo: [C: 032] Depool es2008 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215622 (owner: 10Jcrespo) [13:34:44] 6operations, 10hardware-requests: Replace rubidium with radon for authdns (allocate radon, deallocate rubidium) - https://phabricator.wikimedia.org/T101256#1333682 (10faidon) [13:38:23] !log jynus Synchronized wmf-config/db-codfw.php: Depool es2008, es2009 and es2010 (duration: 00m 14s) [13:38:29] Logged the message, Master [13:42:49] 10Ops-Access-Requests, 6operations, 6Search-and-Discovery, 3Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1333692 (10chasemp) p:5Triage>3Normal [13:45:28] (03PS23) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [13:48:31] 6operations: rack & on-site setup of cp2001-cp2026 - https://phabricator.wikimedia.org/T101206#1333702 (10BBlack) Seems ok to me for physical layout. BIOS settings are standard, but note that we really really care about having Logical Processor and IPMI enabled, Virtualization disabled, and Power Mgmt to Perfor... 
[13:50:31] 6operations, 10Traffic: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1333704 (10BBlack) [13:51:36] 6operations, 10Traffic: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1332611 (10BBlack) Note install-server for this stuff was already set up some time ago (generically for all cp*), and wmf-reimage can take care of the full install process once site.pp entries exist (rather t... [13:51:58] 6operations, 10Traffic: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1333707 (10BBlack) [13:52:49] 6operations, 10Traffic: rack & on-site setup of cp2001-cp2026 - https://phabricator.wikimedia.org/T101206#1333708 (10BBlack) [13:53:11] (03PS1) 10Matanya: ganglia: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215623 [13:53:26] 6operations, 10Traffic: rack & on-site setup of cp2001-cp2026 - https://phabricator.wikimedia.org/T101206#1333712 (10BBlack) p:5Triage>3Normal [13:53:38] 6operations, 10Traffic: setup/install/deploy cp2001-cp2026 - https://phabricator.wikimedia.org/T101204#1333713 (10BBlack) p:5Triage>3Normal [13:55:27] (03PS1) 10BBlack: Add legacy bits.wm.o support to text-lb VCL [puppet] - 10https://gerrit.wikimedia.org/r/215624 (https://phabricator.wikimedia.org/T95448) [13:58:25] (03CR) 10BBlack: [C: 04-1] "Not yet complete, just the VCL parts so far (needs $cluster_options stuff for bits hostname, etc)." 
[puppet] - 10https://gerrit.wikimedia.org/r/215624 (https://phabricator.wikimedia.org/T95448) (owner: 10BBlack) [13:59:24] (03PS1) 10Matanya: nove: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215625 [14:04:47] (03PS2) 10BBlack: Add legacy bits.wm.o support to text-lb VCL [puppet] - 10https://gerrit.wikimedia.org/r/215624 (https://phabricator.wikimedia.org/T95448) [14:09:02] (03PS1) 10Aude: Enable Wikibase usage tracking on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215627 (https://phabricator.wikimedia.org/T100659) [14:09:40] (03CR) 10Aude: [C: 032] Enable Wikibase usage tracking on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215627 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [14:09:46] (03Merged) 10jenkins-bot: Enable Wikibase usage tracking on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215627 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [14:10:28] !log aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on eswiki (duration: 00m 13s) [14:10:34] Logged the message, Master [14:14:56] PROBLEM - Apache HTTP on mw1195 is CRITICAL - Socket timeout after 10 seconds [14:15:16] PROBLEM - HHVM rendering on mw1195 is CRITICAL - Socket timeout after 10 seconds [14:15:56] 6operations, 6Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1333783 (10BBlack) The memory issues were just a random guess, not real evidence. I do think getting on newer kernels is probably a win in general, though. The alerts about not finding disks.... is this generic to all... [14:20:16] PROBLEM - HHVM queue size on mw1195 is CRITICAL 100.00% of data above the critical threshold [80.0] [14:20:53] is someone working on mw1195? or does it need hhvm restart? 
[14:21:06] PROBLEM - HHVM busy threads on mw1195 is CRITICAL 100.00% of data above the critical threshold [115.2] [14:23:14] (03PS1) 10Aude: Enable Wikibase usage tracking on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215629 (https://phabricator.wikimedia.org/T100659) [14:23:21] (03PS1) 10Jcrespo: Repool es2008, es2009 and es2010 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215630 [14:24:26] (03CR) 10Jcrespo: [C: 032] Repool es2008, es2009 and es2010 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215630 (owner: 10Jcrespo) [14:29:45] !log jynus Synchronized wmf-config/db-codfw.php: Repool es2008, es2009 and es2010 (duration: 00m 14s) [14:29:51] Logged the message, Master [14:31:25] PROBLEM - puppet last run on ms-be3001 is CRITICAL puppet fail [14:31:47] (03CR) 10Aude: [C: 032] Enable Wikibase usage tracking on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215629 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [14:31:54] (03Merged) 10jenkins-bot: Enable Wikibase usage tracking on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215629 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [14:32:56] !log aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on huwiki (duration: 00m 12s) [14:33:02] Logged the message, Master [14:39:09] (03CR) 10Jcrespo: [C: 031] "Thank you for publishing the scripts!" 
[software] - 10https://gerrit.wikimedia.org/r/215567 (owner: 10Springle) [14:40:02] (03PS24) 10KartikMistry: CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) [14:46:44] !log restarted hhvm on mw1195, seems to be a case of https://phabricator.wikimedia.org/T89912 [14:46:50] Logged the message, Master [14:47:15] RECOVERY - Apache HTTP on mw1195 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.064 second response time [14:47:35] RECOVERY - HHVM rendering on mw1195 is OK: HTTP OK: HTTP/1.1 200 OK - 66168 bytes in 0.172 second response time [14:48:38] (03CR) 10Andrew Bogott: [C: 032] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/215625 (owner: 10Matanya) [14:50:06] RECOVERY - puppet last run on ms-be3001 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [14:52:13] we had some insert spikes on s5, probably cause or consequence of the above, gone now [14:53:15] RECOVERY - HHVM busy threads on mw1195 is OK Less than 30.00% above the threshold [76.8] [14:53:29] (03CR) 10Andrew Bogott: [C: 032] statistics: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215614 (owner: 10Matanya) [14:54:15] RECOVERY - HHVM queue size on mw1195 is OK Less than 30.00% above the threshold [10.0] [14:56:51] (03CR) 10Andrew Bogott: [C: 032] ganglia: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215623 (owner: 10Matanya) [14:59:19] _joe_: if you have a bit of time this evening, I could still use a hand sorting out ganglia. [15:00:05] manybubbles, anomie, ^d, thcipriani, marktraceur: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T1500). Please do the needful. [15:00:17] !log added linux 3.19.3-5 for jessie-wikimedia on apt.wikimedia.org [15:00:23] Logged the message, Master [15:00:35] Looks like there's nothing to SWAT this morning. 
[15:04:25] thcipriani: I'm running late and have 2 patches, but I can do it myself [15:04:35] 6operations, 7database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1333945 (10jcrespo) Tomorrow, after a bit more of investigation, I will perform a master-master failover of es1009 (with es1008), a bit more delicate as it is a major version upgrade. That is the last ser... [15:04:59] legoktm: kk [15:05:00] (03PS1) 10Matanya: strongswan: fqdn is a fact, qualify [puppet] - 10https://gerrit.wikimedia.org/r/215635 [15:07:10] !log depooling ns1->baham DNS traffic for kernel update [15:07:16] Logged the message, Master [15:07:19] (03CR) 10Legoktm: [C: 032] Revert "Revert "Change default extension distributor branch to REL1_25"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/213572 (owner: 10Legoktm) [15:07:37] (03CR) 10Legoktm: [C: 032] Remove references to $wgEchoCohortInterval [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215270 (https://phabricator.wikimedia.org/T101047) (owner: 10Legoktm) [15:12:09] (03CR) 10Jcrespo: [C: 031] "I will keep an eye on the lag." [puppet] - 10https://gerrit.wikimedia.org/r/215566 (owner: 10Springle) [15:12:27] .win 15 [15:12:30] :P [15:12:53] (03CR) 10Andrew Bogott: [C: 031] "This is clearly right, but I'd feel better if Gage did the merging." [puppet] - 10https://gerrit.wikimedia.org/r/215635 (owner: 10Matanya) [15:14:24] (03Merged) 10jenkins-bot: Revert "Revert "Change default extension distributor branch to REL1_25"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/213572 (owner: 10Legoktm) [15:14:27] (03Merged) 10jenkins-bot: Remove references to $wgEchoCohortInterval [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215270 (https://phabricator.wikimedia.org/T101047) (owner: 10Legoktm) [15:14:51] thcipriani: I’m planning to attend the airing of greivances in 45 minutes; we can work on breaking beta any time after that. 
[15:15:28] !log repooling ns1->baham DNS traffic [15:15:34] Logged the message, Master [15:16:12] !log legoktm Synchronized wmf-config/CommonSettings.php: Change default extension distributor branch to REL1_25 (duration: 00m 15s) [15:16:18] Logged the message, Master [15:16:18] andrewbogott: heh, sure, I'd like to take a look at it before this afternoon, so how about right afterwards? 10:15 in -labs? [15:16:36] works for me [15:16:39] !log legoktm Synchronized wmf-config/InitialiseSettings.php: Remove references to $wgEchoCohortInterval (duration: 00m 12s) [15:16:45] Logged the message, Master [15:17:00] 6operations, 10Wikimedia-Site-requests: Run "refreshLinks.php --dfn-only" on all wikis periodically - https://phabricator.wikimedia.org/T18112#1334018 (10PleaseStand) [15:17:02] 6operations, 10Wikimedia-Site-requests: refreshLinks.php --dfn-only cron jobs do not seem to be running - https://phabricator.wikimedia.org/T97926#1334015 (10PleaseStand) 5Open>3Resolved a:3PleaseStand Script appears to have run on s3, as evident on mediawikiwiki ([[http://quarry.wmflabs.org/query/3870|b... [15:17:22] 6operations, 10Wikimedia-Site-requests: refreshLinks.php --dfn-only cron jobs do not seem to be running - https://phabricator.wikimedia.org/T97926#1334019 (10PleaseStand) a:5PleaseStand>3None [15:18:36] (03PS14) 10Paladox: Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 [15:19:31] (03CR) 10Paladox: "@Dzahn please do V+2 please since it didn't merge yet." [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [15:33:08] 6operations, 6Search-and-Discovery, 7Elasticsearch: Setup backups of elasticsearch indicies - https://phabricator.wikimedia.org/T91404#1334072 (10Manybubbles) [15:34:06] manybubbles: poke? [15:34:14] :D [15:37:14] 10Ops-Access-Requests, 6operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1334089 (10akosiaris) Removed all the sitemap.wikimedia.org entries. 
We are now down to 636 entries. [15:39:47] (03CR) 10Alexandros Kosiaris: [C: 031] CX: Log to logstash [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) (owner: 10KartikMistry) [15:48:51] (03PS1) 10ArielGlenn: dumps config: wiki html dir can now be specified per project [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215641 [15:50:21] (03PS1) 10Alexandros Kosiaris: install-server: Force VMs to power off after installation [puppet] - 10https://gerrit.wikimedia.org/r/215642 [15:50:36] (03CR) 10ArielGlenn: [C: 032] dumps config: wiki html dir can now be specified per project [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215641 (owner: 10ArielGlenn) [15:52:14] (03PS1) 10Filippo Giunchedi: move d-i-test to row C [dns] - 10https://gerrit.wikimedia.org/r/215643 (https://phabricator.wikimedia.org/T100636) [15:52:40] godog: I wouldn't use d-i-test for a real machine [15:52:44] d-i-test was a VM back then [15:53:09] for a real machine, a real server name should be assigned (some of them are named already; for the rest, robh does the honors) [15:54:21] 6operations, 10ContentTranslation-cxserver, 6Services, 10service-template-node, 7service-runner: Standardise CXServer deployment - https://phabricator.wikimedia.org/T101272#1334128 (10mobrovac) 3NEW [15:54:35] paravoid: yeah it isn't obvious from the commit message but it'll be a ganeti vm (hence the row move) but I'm open to options [16:09:04] (03CR) 10Alexandros Kosiaris: [C: 032] install-server: Force VMs to power off after installation [puppet] - 10https://gerrit.wikimedia.org/r/215642 (owner: 10Alexandros Kosiaris) [16:10:23] (03PS2) 10Filippo Giunchedi: move d-i-test to row C [dns] - 10https://gerrit.wikimedia.org/r/215643 (https://phabricator.wikimedia.org/T100636) [16:10:30] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] move d-i-test to row C [dns] - 10https://gerrit.wikimedia.org/r/215643 (https://phabricator.wikimedia.org/T100636) (owner: 10Filippo Giunchedi) [16:11:28] 
(03PS1) 10ArielGlenn: dumps: add 'date' option to worker (wrapper script) [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215645 [16:12:40] (03CR) 10Mobrovac: "There are the service-runner and service-template-node projects which standardise a lot of things for Node.JS services and ease the deploy" [puppet] - 10https://gerrit.wikimedia.org/r/213840 (https://phabricator.wikimedia.org/T89265) (owner: 10KartikMistry) [16:13:15] mobrovac: any idea why https://gerrit.wikimedia.org/r/#/c/213840/ doesn't actually show anything at logstash-beta.wmflabs.org? [16:13:27] mobrovac: do we need to do anything else there? [16:13:38] * mobrovac goes to look [16:14:03] mobrovac: and thanks for service-runner headsup [16:14:49] kart_: np, integrating it in CXServer would take away a lot of headaches and lost time [16:15:31] mobrovac: yep! [16:15:57] damn, can't log into logstash-beta UI [16:16:23] mobrovac: there is an instruction for the password :) [16:17:25] ah it's the wikitech pass [16:17:33] mobrovac: it's a static password [16:17:34] <_joe_> andrewbogott: not really, I'm just off an interview [16:17:43] ************ [16:17:44] <_joe_> and I am kind of tired [16:18:11] <_joe_> hi ori [16:18:21] bd808: " use the same username/password as for this wiki (wikitech) to log in" - so not my pass? [16:18:30] _joe_: ok, no worries… I believe there’s a phab task assigned to you already; can I encourage you to have a look sometime in the next few days? You probably don’t need me to be awake :) [16:18:32] <_joe_> I am happy you're working on your communication skills :) [16:18:35] mobrovac: beta says that? 
[16:18:45] _joe_: heheh [16:18:46] <_joe_> andrewbogott: sigh, I'll do my best [16:18:49] (03CR) 10ArielGlenn: [C: 032] dumps: add 'date' option to worker (wrapper script) [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215645 (owner: 10ArielGlenn) [16:18:56] bd808: https://wikitech.wikimedia.org/wiki/Logstash#Prototype_.28Beta.29_Logstash says that [16:19:15] _joe_: thanks :) Just pointing me in the right direction is good as well. Pretty much no one knows where to start w/ganglia debugging. [16:19:44] mobrovac: that's a lie :) I'll fix the wiki [16:19:45] <_joe_> andrewbogott: if my brain doesn't shut down too much, I'll look into it next [16:19:55] thank you! [16:21:22] 6operations, 10Continuous-Integration-Infrastructure: Build a new version of php-luasandbox and hhvm-luasandbox, and deploy to integration hosts - https://phabricator.wikimedia.org/T101275#1334197 (10Anomie) 3NEW [16:22:56] mobrovac: https://wikitech.wikimedia.org/wiki/Logstash#GELF_transport - can be issue? [16:23:54] no. not really. [16:26:50] (03PS1) 10ArielGlenn: dumps: add 'job' option to worker wrapper script [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215647 [16:28:22] (03CR) 10Ori.livneh: [C: 032] apache: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215617 (owner: 10Matanya) [16:31:48] (03PS2) 10Ori.livneh: interface: access template variable via '@' [puppet] - 10https://gerrit.wikimedia.org/r/209961 [16:32:25] (03CR) 10Ori.livneh: [C: 032 V: 032] interface: access template variable via '@' [puppet] - 10https://gerrit.wikimedia.org/r/209961 (owner: 10Ori.livneh) [16:32:31] kart_: that is indeed rather strange, i can see some citoid logs, but not restbase, e.g. 
[16:32:57] kart_: this warrants further investigation, might not be even related to your cxserver config [16:33:45] (03PS2) 10Gage: strongswan: fqdn is a fact, qualify [puppet] - 10https://gerrit.wikimedia.org/r/215635 (owner: 10Matanya) [16:33:57] (03PS4) 10Ori.livneh: Make comment-stripping in MWWikiversions::readDbListFile simpler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/214823 [16:34:27] (03CR) 10ArielGlenn: [C: 032] dumps: add 'job' option to worker wrapper script [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215647 (owner: 10ArielGlenn) [16:34:46] (03CR) 10Gage: "Thanks, Matanya!" [puppet] - 10https://gerrit.wikimedia.org/r/215635 (owner: 10Matanya) [16:35:17] (03CR) 10Gage: [C: 032] "Thanks, Matanya!" [puppet] - 10https://gerrit.wikimedia.org/r/215635 (owner: 10Matanya) [16:35:35] mobrovac: ok! [16:35:42] mobrovac: should I file bug? [16:36:21] nah, not yet, have to figure out why is this happening in the first place and whether it pertains to cxserver only or not [16:36:40] kart_: where is cxserver being deployed in deployment-prep? [16:36:44] i.e. 
which host [16:37:49] mobrovac: deployment-cxserver03 [16:37:54] k [16:39:25] 6operations, 7Monitoring: Upgrade to newer version of gdash - https://phabricator.wikimedia.org/T98134#1334297 (10fgiunchedi) see also upstream discussion on gdash maintneance https://github.com/ripienaar/gdash/issues/110#issuecomment-101288298 [16:40:47] (03PS1) 10ArielGlenn: dumps: don't require a wiki name for specifying --job [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215649 [16:43:47] (03PS1) 10Gergő Tisza: Disable PHP error logging in the Sentry extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) [16:44:31] (03CR) 10ArielGlenn: [C: 032] dumps: don't require a wiki name for specifying --job [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215649 (owner: 10ArielGlenn) [16:45:58] PROBLEM - puppet last run on eventlog1001 is CRITICAL puppet fail [16:48:28] (03PS1) 10Ori.livneh: memkeys-snapshot: forward to fluorine for aggregation [puppet] - 10https://gerrit.wikimedia.org/r/215651 [16:49:19] (03PS1) 10ArielGlenn: dumps: don't treat "waiting" dump steps as failures in reporting [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215652 [16:50:38] PROBLEM - RAID on eventlog1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:50:45] um [16:50:48] what's going on with eventlog1001? 
[16:51:43] (03CR) 10ArielGlenn: [C: 032] dumps: don't treat "waiting" dump steps as failures in reporting [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215652 (owner: 10ArielGlenn) [16:52:17] RECOVERY - RAID on eventlog1001 is OK no disks configured for RAID [16:52:51] (03PS1) 10ArielGlenn: dumps: add 'exclusive' option to worker script and wrapper [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215653 [16:52:56] (03PS2) 10Ori.livneh: memkeys-snapshot: forward to fluorine for aggregation [puppet] - 10https://gerrit.wikimedia.org/r/215651 [16:53:15] (03PS3) 10Ori.livneh: memkeys-snapshot: forward to fluorine for aggregation [puppet] - 10https://gerrit.wikimedia.org/r/215651 [16:53:25] (03CR) 10Ori.livneh: [C: 032 V: 032] memkeys-snapshot: forward to fluorine for aggregation [puppet] - 10https://gerrit.wikimedia.org/r/215651 (owner: 10Ori.livneh) [16:55:04] (03CR) 10ArielGlenn: [C: 032] dumps: add 'exclusive' option to worker script and wrapper [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215653 (owner: 10ArielGlenn) [16:56:25] (03PS1) 10Giuseppe Lavagetto: Conftool: initial commit [software/conftool] - 10https://gerrit.wikimedia.org/r/215654 [16:56:35] <_joe_> mmmh how annoying [16:57:21] <_joe_> I can't really use git review for this [16:57:27] mobrovac: let me know or file a bug if you find something in cxserver. I have to go now. [16:57:37] akosiaris, hi, i updated task to clarify priorities. Any thoughts of how long it might take? Thanks! 
https://phabricator.wikimedia.org/T101233 [16:58:05] kart_: ok, will do [16:59:19] (03PS1) 10Ori.livneh: Fix-up for I7bc734b58: remove spurious '--file' arg to logger [puppet] - 10https://gerrit.wikimedia.org/r/215656 [17:00:10] (03PS1) 10ArielGlenn: dumps: add 'skipdone' to worker script and its wrapper [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215657 [17:00:58] (03CR) 10Greg Grossmeier: [C: 031] "+1 This tool decreases mistakes when dealing with security patches on the WMF cluster and makes the train deployer's life much more sane, " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [17:01:28] RECOVERY - puppet last run on eventlog1001 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:02:46] (03CR) 10Ori.livneh: [C: 032] Fix-up for I7bc734b58: remove spurious '--file' arg to logger [puppet] - 10https://gerrit.wikimedia.org/r/215656 (owner: 10Ori.livneh) [17:02:50] yurik_: no, not really [17:03:35] yurik_: it will probably be very easy to upgrade, but we might have to reimport the db which is obviously time consuming [17:03:43] I am hoping that it will not though [17:04:43] (03CR) 10ArielGlenn: [C: 032] dumps: add 'skipdone' to worker script and its wrapper [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215657 (owner: 10ArielGlenn) [17:06:48] MaxSem, ^ [17:10:55] akosiaris, understood, let's try to upgrade as soon as you can, MaxSem is running into many db issues on it, and let's hope no update is needed... 
:D [17:11:17] i meant - upgrade to 2.1, hoping no DB upgrade is needed [17:14:16] (03PS1) 10ArielGlenn: dumps: allow 'last' to be specified as date of run [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215661 [17:15:32] (03CR) 10ArielGlenn: [C: 032] dumps: allow 'last' to be specified as date of run [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215661 (owner: 10ArielGlenn) [17:16:09] 6operations, 10ops-eqiad: rubidium - wipe and reclaim to spares - investigate hdd issue - https://phabricator.wikimedia.org/T101279#1334412 (10RobH) 3NEW a:3Cmjohnson [17:16:34] 6operations, 10ops-eqiad: rubidium - wipe and reclaim to spares - investigate hdd issue - https://phabricator.wikimedia.org/T101279#1334424 (10RobH) [17:16:36] 6operations, 10hardware-requests: Replace rubidium with radon for authdns (allocate radon, deallocate rubidium) - https://phabricator.wikimedia.org/T101256#1333643 (10RobH) [17:17:47] robh: awesome, thanks [17:18:02] 10Ops-Access-Requests, 6operations, 6Search-and-Discovery, 3Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1334427 (10chasemp) Hopefully @dzahn can help clarify but my understanding is pr... 
[17:19:38] (03Abandoned) 10RobH: git.wikimedia.org.crt sha1 to sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214673 (https://phabricator.wikimedia.org/T100827) (owner: 10RobH) [17:20:10] (03PS1) 10ArielGlenn: dumps: add onepass to worker script and wrapper, fix cutoff option [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215662 [17:22:21] (03PS1) 10RobH: git.w.o cert deletion - exists behind misc-web [puppet] - 10https://gerrit.wikimedia.org/r/215663 [17:24:53] (03CR) 10ArielGlenn: [C: 032] dumps: add onepass to worker script and wrapper, fix cutoff option [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215662 (owner: 10ArielGlenn) [17:26:00] (03PS2) 10RobH: git.w.o cert deletion - exists behind misc-web [puppet] - 10https://gerrit.wikimedia.org/r/215663 [17:26:04] 6operations: foreachwikiexceptdblist to run scripts on all but a blacklist of wikis - https://phabricator.wikimedia.org/T101213#1334452 (10chasemp) p:5Triage>3Normal [17:26:10] (03CR) 10RobH: [C: 032 V: 032] git.w.o cert deletion - exists behind misc-web [puppet] - 10https://gerrit.wikimedia.org/r/215663 (owner: 10RobH) [17:30:32] (03PS1) 10Dzahn: mailman monitoring: adjusting thresholds [puppet] - 10https://gerrit.wikimedia.org/r/215665 [17:31:15] (03CR) 10Dzahn: [C: 032] "https://icinga.wikimedia.org/cgi-bin/icinga/history.cgi?host=sodium&service=mailman+I%2FO+stats" [puppet] - 10https://gerrit.wikimedia.org/r/215665 (owner: 10Dzahn) [17:31:18] PROBLEM - puppet last run on antimony is CRITICAL Puppet has 2 failures [17:31:29] (03PS1) 10ArielGlenn: dumps: do xml stubs via streaming [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215666 [17:34:00] (03PS1) 10Dzahn: gitblit: don't install ssl cert anymore [puppet] - 10https://gerrit.wikimedia.org/r/215667 [17:34:04] robh: ^ [17:34:43] (03PS2) 10Dzahn: gitblit: don't install ssl cert anymore [puppet] - 10https://gerrit.wikimedia.org/r/215667 (https://phabricator.wikimedia.org/T100827) [17:36:19] ACKNOWLEDGEMENT - puppet last run on antimony 
is CRITICAL Puppet has 2 failures daniel_zahn https://gerrit.wikimedia.org/r/#/c/215667/2 [17:39:30] (03CR) 10ArielGlenn: [C: 032] dumps: do xml stubs via streaming [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215666 (owner: 10ArielGlenn) [17:40:37] (03PS1) 10ArielGlenn: dumps: do xml page logs via streaming [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215671 [17:41:37] (03CR) 10BBlack: [C: 04-1] "Patch on hold, pending more traffic decom from bits first" [puppet] - 10https://gerrit.wikimedia.org/r/215624 (https://phabricator.wikimedia.org/T95448) (owner: 10BBlack) [17:41:53] mutante: shit did my delete break that? [17:42:06] sorry dude [17:42:26] (03CR) 10RobH: [C: 032] gitblit: don't install ssl cert anymore [puppet] - 10https://gerrit.wikimedia.org/r/215667 (https://phabricator.wikimedia.org/T100827) (owner: 10Dzahn) [17:43:15] thx for fix, running puppet on antimony now with the update [17:43:25] i merged then went on coffee break =P [17:43:28] robh: just the puppet run, not the service or anything, no problem [17:44:31] puppet runs on antimony now [17:45:05] :) [17:45:17] RECOVERY - puppet last run on antimony is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [17:45:22] 7Blocked-on-Operations, 6Labs: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1334561 (10Yurik) [17:45:24] (03PS1) 10Yuvipanda: dnsrecursor: Consistently order aliases [puppet] - 10https://gerrit.wikimedia.org/r/215673 (https://phabricator.wikimedia.org/T101281) [17:45:32] 7Blocked-on-Operations, 6Labs, 10Maps: Upgrade postgres on labsdb1004 / 1005 to 9.4, and PostGis 2.1 - https://phabricator.wikimedia.org/T101233#1333262 (10Yurik) [17:45:33] andrewbogott: ^ is 'fix' [17:45:35] 6operations, 7HTTPS, 5Patch-For-Review: replace git's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100827#1334574 (10RobH) 5Open>3Resolved I revoked and deleted the git.wikimedia.org key and certificate, and 
Daniel's patchset stops the system from installing the now delete (and also unused)... [17:45:36] 6operations, 7HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1334576 (10RobH) [17:45:51] andrewbogott: long term fix is to fix https://phabricator.wikimedia.org/T100990 [17:46:11] akosiaris: I can babysit that one through if you want [17:46:22] YuviPanda heh, fair enough. I will merge, I think there might be a second issue. [17:47:06] andrewbogott: ok :) https://phabricator.wikimedia.org/T100990 should be interesting to solve :) [17:47:15] yeah [17:47:51] 10Ops-Access-Requests, 6operations, 6Search-and-Discovery, 3Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1334593 (10Tfinc) Approved on my end [17:48:23] robh/mutante: re certs for services moved to misc, it would great if we could audit/wipe those everywhere somehow. it's a total PITA when testing sslcert refactors to worry about debugging cases that are completely nonfunctional anyways, but kinda hard to tell at a glance from salt which certs are real and which aren't. [17:48:59] (03CR) 10ArielGlenn: [C: 032] dumps: do xml page logs via streaming [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215671 (owner: 10ArielGlenn) [17:49:05] bblack: we did not _that_ long ago, the one for git. was an oversight [17:49:20] well maybe except labs* [17:49:27] YuviPanda: which one are you referring to ? [17:49:34] ok [17:49:48] akosiaris: bah, I was pointing at andrewbogott and missed. 
[17:49:49] sorry [17:50:09] bblack: also we are in process of auditing the repo for sha1 and unused certs again, its how it turned up now [17:50:17] but indeed, this is a second sweep for this very thing [17:52:21] 6operations: (old) SSL certificates cleanup - https://phabricator.wikimedia.org/T84536#1334625 (10Dzahn) [17:52:27] ok, cool :) [17:52:59] 6operations: Document new platform specific doc for Dell Poweredge RN30 systems - https://phabricator.wikimedia.org/T101288#1334636 (10RobH) 3NEW a:3Cmjohnson [17:54:51] bblack: robh: reference T84536 for cleanup. also, there is T82319 back from RT which is open and about using SSLCertificateChainFile (or SSLCACertificatePath) [17:55:26] i have this one pending: https://gerrit.wikimedia.org/r/#/c/215508/ to use ChainFile for ganglia [17:55:41] (03PS1) 10ArielGlenn: dumps: only show one list of dumps running [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215674 [17:55:57] that would go with rob's https://gerrit.wikimedia.org/r/#/c/214670/ [17:56:52] (03CR) 10ArielGlenn: [C: 032] dumps: only show one list of dumps running [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215674 (owner: 10ArielGlenn) [17:59:30] 6operations, 10Datasets-General-or-Unknown: snaphot1004 running dumps very slowly, investigate - https://phabricator.wikimedia.org/T98585#1334676 (10ArielGlenn) 5Open>3Resolved I've run a full round of both stubs and logs on snapshot1004 and memory usage was nice and low. closing. Ah for future reference... [18:00:03] (03CR) 10BBlack: [C: 04-1] "Seems like it should be templated (or the other related parameters not-templated? either way)." [puppet] - 10https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) (owner: 10Dzahn) [18:00:04] twentyafterfour, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T1800). [18:00:15] No new branch today, per T97553. 
I will be deploying wmf8 everywhere, assuming there isn't any problem... [18:05:15] (03PS2) 10Faidon Liambotis: Add AAAA/IPv6 PTR to baham [dns] - 10https://gerrit.wikimedia.org/r/215612 [18:05:20] (03CR) 10Faidon Liambotis: [C: 032] Add AAAA/IPv6 PTR to baham [dns] - 10https://gerrit.wikimedia.org/r/215612 (owner: 10Faidon Liambotis) [18:11:25] 6operations: Upgrade sodium to jessie - https://phabricator.wikimedia.org/T82698#1334764 (10akosiaris) Robh, yes it does. I 've updated the above link and added https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM in wikitech. Please do request the VM so we can run this process for the very first time. Thanks! [18:12:09] (03PS1) 10Yuvipanda: designate: Use ferm's resolve function than puppet ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/215677 (https://phabricator.wikimedia.org/T101281) [18:12:11] andrewbogott: ^ [18:12:18] 6operations, 7Documentation: Create documentation on the requesting/allocation of virtual machines in the misc cluster - https://phabricator.wikimedia.org/T97072#1334781 (10akosiaris) 5Open>3Resolved I 've updated https://wikitech.wikimedia.org/wiki/Operations_requests#Virtual_Machine_Requests_.28Productio... [18:13:23] bblack, http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Post_not_showing_up_immediately any known purge problems? [18:13:36] YuviPanda: is that the only place $controller is used? Can it be removed? 
[18:13:46] (03PS1) 1020after4: Delete stale branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215678 [18:13:48] (03PS1) 1020after4: All wikis to 1.26wmf8, no new branch today per T97553 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215679 [18:14:01] (03PS2) 10Yuvipanda: designate: Use ferm's resolve function than puppet ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/215677 (https://phabricator.wikimedia.org/T101281) [18:14:02] andrewbogott: yup [18:14:03] done [18:14:49] (03CR) 10Andrew Bogott: [C: 032] designate: Use ferm's resolve function than puppet ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/215677 (https://phabricator.wikimedia.org/T101281) (owner: 10Yuvipanda) [18:14:56] (03PS2) 10Dzahn: ganglia: use SSLCertificateChainFile [puppet] - 10https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) [18:15:20] (03PS3) 10Dzahn: ganglia: use SSLCertificateChainFile [puppet] - 10https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) [18:15:43] (03PS1) 10Matanya: openstack : qualify designateconfig [puppet] - 10https://gerrit.wikimedia.org/r/215680 [18:15:52] (03CR) 10Dzahn: "@bblack done" [puppet] - 10https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) (owner: 10Dzahn) [18:16:01] andrewbogott: if you already on designate ... ^^ :) [18:16:30] 10Ops-Access-Requests, 6operations, 6Release-Engineering, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334834 (10Aklapper) 5stalled>3Open Went on vacations. Got back. Found people asking for stuff. Created key. Signed L3. wikitech username: ak... 
[18:16:54] 10Ops-Access-Requests, 6operations, 6Release-Engineering, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334838 (10Aklapper) a:5Aklapper>3None [18:17:05] (03CR) 10Andrew Bogott: [C: 031] "I will merge as soon as we aren't otherwise messing with that box." [puppet] - 10https://gerrit.wikimedia.org/r/215680 (owner: 10Matanya) [18:17:15] mutante: how can I find a uid ? [18:17:15] matanya: thank you for hunting .erb warnings [18:17:21] (03PS1) 10Yuvipanda: designate: Just use IP address directly for ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/215681 [18:17:21] sure andrewbogott :) [18:17:23] andrewbogott: ^ for now [18:17:35] AaronSchulz: I don't know of a cause, no. [18:17:36] matanya: i do on terbium [18:17:47] (03CR) 10Andrew Bogott: [C: 032] designate: Just use IP address directly for ferm rule [puppet] - 10https://gerrit.wikimedia.org/r/215681 (owner: 10Yuvipanda) [18:17:54] matanya: which user? [18:17:56] note, mutante how can "I" [18:18:06] aklapper [18:18:11] matanya: i think access request to terbium [18:18:12] matanya: in general uids in prod correspond to those in ldap on labs. 
Also they’re in puppet in the admins module [18:18:19] um… perhaps this is not what you meant [18:18:22] matanya: 2377 [18:18:26] thanks mutante [18:18:35] this is not what i meant andrewbogott , but thanks [18:18:54] andrewbogott: he would need to be able to run "ldaplist" somewhere [18:19:16] yeah, anyone can run ldaplist -l passwd on labs [18:19:21] (03PS5) 10Yuvipanda: Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 (owner: 10Thcipriani) [18:19:23] matanya: ^ [18:19:38] thanks [18:20:04] 10Ops-Access-Requests, 6operations, 6Release-Engineering, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334847 (10Matanya) a:3Matanya [18:20:13] (03CR) 10Yuvipanda: [C: 032] Allow for new labs domain schema in ENC [puppet] - 10https://gerrit.wikimedia.org/r/202790 (owner: 10Thcipriani) [18:21:41] AaronSchulz: I don't see anything significant in vhtcpd trends either, which would catch any big failure to purge at the varnish level, or big dropoff in received multicast purges, I think [18:21:45] AaronSchulz: http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp%5B0-9%5D%7B4%7D&mreg%5B%5D=vhtcpd_inpkts_dequeued>ype=line&glegend=show&aggregate=1 [18:23:26] 10Ops-Access-Requests, 6operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1334857 (10dr0ptp4kt) @akosiaris did you happen to find an automated way to do that, for example with https://developers.google.com/site-verification/v1/ and https://developers.google.com/webmaster-... [18:23:45] AaronSchulz: oh I take that back. It does seem to drop off when you look back far enough. 
[18:24:26] (03PS2) 10Dzahn: admin: create user for aklapper [puppet] - 10https://gerrit.wikimedia.org/r/208802 (https://phabricator.wikimedia.org/T97642) [18:25:37] (03PS1) 10Yuvipanda: wmflib: Make ipresolve throw an error if it can't resolve [puppet] - 10https://gerrit.wikimedia.org/r/215682 (https://phabricator.wikimedia.org/T99833) [18:25:40] AaronSchulz: nevermind, that was just ganglia averaging getting confused by spikes. Overall, still nothing there I can see to worry about yet [18:25:48] bblack: ^ wmflib erroring ch ange [18:26:17] (03PS1) 10Matanya: access: grant aklapper phab-admins [puppet] - 10https://gerrit.wikimedia.org/r/215683 [18:26:35] (03PS1) 10ArielGlenn: dumps: fix up job name of wb property info dump [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215685 [18:26:57] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1334880 (10RobH) [18:27:04] (03CR) 1020after4: [C: 032] Delete stale branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215678 (owner: 1020after4) [18:27:09] YuviPanda: btw, should I even ask if this ruby resolver thing actually fetches/honors TTLs, or is just using that default 300 ? [18:27:10] (03Merged) 10jenkins-bot: Delete stale branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215678 (owner: 1020after4) [18:27:12] (03CR) 1020after4: [C: 032] All wikis to 1.26wmf8, no new branch today per T97553 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215679 (owner: 1020after4) [18:27:18] (03Merged) 10jenkins-bot: All wikis to 1.26wmf8, no new branch today per T97553 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215679 (owner: 1020after4) [18:27:25] (03CR) 10Andrew Bogott: [C: 031] "yes!" 
[puppet] - 10https://gerrit.wikimedia.org/r/215682 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [18:27:27] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1334882 (10RobH) 5Open>3Resolved [18:27:32] bblack: it's _joe_'s code :) also, it seems to honor TTLs [18:27:33] matanya: https://gerrit.wikimedia.org/r/#/c/208802/2/modules/admin/data/data.yaml [18:27:39] cool [18:28:07] what about it mutante ? [18:28:21] ah, dammit [18:28:23] sorry [18:28:26] (03CR) 10ArielGlenn: [C: 032] dumps: fix up job name of wb property info dump [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/215685 (owner: 10ArielGlenn) [18:29:43] mutante: do me a favor, since i seem to have problem reading, please claim tasks you took, sorry for the dup [18:29:53] (03Abandoned) 10Matanya: access: grant aklapper phab-admins [puppet] - 10https://gerrit.wikimedia.org/r/215683 (owner: 10Matanya) [18:30:27] (03PS1) 10Yuvipanda: Revert "tools: Puppetize database aliases as host resources" [puppet] - 10https://gerrit.wikimedia.org/r/215686 [18:30:32] 10Ops-Access-Requests, 6operations, 6Release-Engineering, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334910 (10Matanya) a:5Matanya>3Dzahn [18:30:33] So I don't even need to run scap today, huh? just sync-wikiversions should do the tricl [18:30:34] (03CR) 10jenkins-bot: [V: 04-1] Revert "tools: Puppetize database aliases as host resources" [puppet] - 10https://gerrit.wikimedia.org/r/215686 (owner: 10Yuvipanda) [18:30:35] trick [18:30:35] (03CR) 10BBlack: wmflib: Make ipresolve throw an error if it can't resolve (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215682 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [18:30:48] matanya: it's actually 2 different patches, one that makes the user and one that gives him access. 
alright [18:31:00] yes, that too [18:31:02] well I guess it won't hurt to scap to sync the deleted symlinks [18:31:22] * matanya goes back to .erb fun [18:31:45] matanya: it was assigned to andre because we needed his key, the idea is that he then gives it back. and .. done [18:32:00] whatever :) [18:32:12] 6operations: Google Webmaster Tools - 1000 domain limit - https://phabricator.wikimedia.org/T99132#1334919 (10dr0ptp4kt) I've thought about this some more. I'm not too keen on creation of another @wikimedia.org account to hold more sites in it. I have asked the contact at Google if a feature request could be pu... [18:33:26] (03CR) 10Yuvipanda: wmflib: Make ipresolve throw an error if it can't resolve (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215682 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [18:34:24] (03CR) 10Dzahn: [C: 032] admin: create user for aklapper [puppet] - 10https://gerrit.wikimedia.org/r/208802 (https://phabricator.wikimedia.org/T97642) (owner: 10Dzahn) [18:34:42] (03CR) 10BBlack: [C: 031] wmflib: Make ipresolve throw an error if it can't resolve [puppet] - 10https://gerrit.wikimedia.org/r/215682 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [18:35:07] !log twentyafterfour Started scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2) [18:35:08] (03PS6) 10Faidon Liambotis: autoinstall: install linux-meta (3.19) on jessie [puppet] - 10https://gerrit.wikimedia.org/r/211688 (https://phabricator.wikimedia.org/T100773) (owner: 10Muehlenhoff) [18:35:12] Logged the message, Master [18:36:05] (03CR) 10BBlack: [C: 031] autoinstall: install linux-meta (3.19) on jessie [puppet] - 10https://gerrit.wikimedia.org/r/211688 (https://phabricator.wikimedia.org/T100773) (owner: 10Muehlenhoff) [18:37:41] (03PS2) 10Yuvipanda: Revert "tools: Puppetize database aliases as host resources" [puppet] - 10https://gerrit.wikimedia.org/r/215686 [18:38:34] (03PS1) 10Matanya: glance: qualify vars [puppet] - 
10https://gerrit.wikimedia.org/r/215687 [18:39:23] (03PS2) 10Dzahn: admin: add aklapper to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) [18:40:07] (03PS3) 10Dzahn: admin: add aklapper to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) [18:40:39] chasemp: ^ feel like confirming that? [18:40:49] user is created, it's just 1 line now [18:41:46] (03CR) 10Rush: [C: 031] "By the nonpowers invested in me I hereby declare andre and phabricator married." [puppet] - 10https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) (owner: 10Dzahn) [18:42:01] thanks:) [18:42:21] !log twentyafterfour Finished scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2) (duration: 07m 14s) [18:42:25] (03CR) 10Dzahn: [C: 032] admin: add aklapper to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) (owner: 10Dzahn) [18:42:27] Logged the message, Master [18:43:04] Deployment train is really flying today, fastest wednesday scap evar [18:43:19] (because: no new branch :-) [18:43:36] scap release "shinkansen" [18:44:02] chasemp: puppet disabled by admin on phab server? [18:44:12] ah me yes hang on [18:44:41] gtg [18:44:56] works, thanks [18:45:12] (03CR) 10Aklapper: "I see yet another expensive divorce coming up on some future day when a hip thing shows up that deprecates Phabricator..." 
[puppet] - https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) (owner: Dzahn) [18:46:08] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: All wikis to 1.26wmf8, no new branch until next Tuesday, June 9th [18:46:14] Logged the message, Master [18:46:45] (CR) Dzahn: "for now, enjoy your new home in the suburbs" [puppet] - https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) (owner: Dzahn) [18:46:52] (CR) Faidon Liambotis: "Note that we could preseed base-installer/kernel/image and avoid installing Debian's default kernel entirely and most of the shell logic (" [puppet] - https://gerrit.wikimedia.org/r/211688 (https://phabricator.wikimedia.org/T100773) (owner: Muehlenhoff) [18:47:07] (CR) Filippo Giunchedi: [C: 1] autoinstall: install linux-meta (3.19) on jessie [puppet] - https://gerrit.wikimedia.org/r/211688 (https://phabricator.wikimedia.org/T100773) (owner: Muehlenhoff) [18:47:12] (PS3) BBlack: certs: remove random certificates::* includes [puppet] - https://gerrit.wikimedia.org/r/215346 (owner: Faidon Liambotis) [18:48:15] (CR) BBlack: [C: 1] "+1, with caveat that I think there's risk of breakage for some labs/ldap-ish things unless you merge this together with at least the next " [puppet] - https://gerrit.wikimedia.org/r/215346 (owner: Faidon Liambotis) [18:49:06] (PS3) BBlack: certs: kill a bunch of Labs classes [puppet] - https://gerrit.wikimedia.org/r/215347 (owner: Faidon Liambotis) [18:50:22] (PS3) Yuvipanda: Revert "tools: Puppetize database aliases as host resources" [puppet] - https://gerrit.wikimedia.org/r/215686 [18:51:46] (CR) BBlack: [C: -1] "`git grep star.wmflabs` seems to turn up hits that are likely referencing these files as installed by these removed stanzas." 
[puppet] - https://gerrit.wikimedia.org/r/215347 (owner: Faidon Liambotis) [18:51:54] (CR) coren: [C: 2] "Less ugly than the alternative. Let's avoid poking the sleeping lion until we can replace it for good." [puppet] - https://gerrit.wikimedia.org/r/215686 (owner: Yuvipanda) [18:52:24] (CR) BBlack: [C: 1] certs: inline certificate:: classes to ::base [puppet] - https://gerrit.wikimedia.org/r/215348 (owner: Faidon Liambotis) [18:52:58] (PS1) Dzahn: admin: add aklapper to bastion-only group [puppet] - https://gerrit.wikimedia.org/r/215689 (https://phabricator.wikimedia.org/T97642) [18:53:03] (CR) BBlack: [C: 1] base: certificates::base -> base::certificates [puppet] - https://gerrit.wikimedia.org/r/215349 (owner: Faidon Liambotis) [18:53:21] (CR) Yuvipanda: "Aren't they just using install_certificate directly than using these classes?" [puppet] - https://gerrit.wikimedia.org/r/215347 (owner: Faidon Liambotis) [18:54:18] (CR) BBlack: [C: 1] sslcert: include ::chainedcert from ::certificate [puppet] - https://gerrit.wikimedia.org/r/215350 (owner: Faidon Liambotis) [18:54:52] (CR) BBlack: [C: 1] sslcert: remove ::certificate's $content parameter [puppet] - https://gerrit.wikimedia.org/r/215351 (owner: Faidon Liambotis) [18:55:00] (PS2) Dzahn: admin: add aklapper to bastion-only group [puppet] - https://gerrit.wikimedia.org/r/215689 (https://phabricator.wikimedia.org/T97642) [18:55:24] (CR) Dzahn: [C: 2] admin: add aklapper to bastion-only group [puppet] - https://gerrit.wikimedia.org/r/215689 (https://phabricator.wikimedia.org/T97642) (owner: Dzahn) [18:55:45] (CR) BBlack: [C: -1] "Should sub to the script itself as well, so that if we fix chain generation bugs in the code they get rebuilt?" [puppet] - https://gerrit.wikimedia.org/r/215353 (owner: Faidon Liambotis) [18:56:19] So what's the deal with the db connection errors in fatalmonitor. 
I was told it's a known issue but is it just something I need to continue to ignore for the time being? Anyone know which ticket is tracking it? [18:56:33] bblack: updated https://gerrit.wikimedia.org/r/#/c/215508/ [18:57:11] (CR) BBlack: [C: 1] base::certs: rename CA filenames to their CNs [puppet] - https://gerrit.wikimedia.org/r/215596 (owner: Faidon Liambotis) [18:57:59] (CR) BBlack: [C: 1] base::certs: remove backwards-compat ensure => absents [puppet] - https://gerrit.wikimedia.org/r/215597 (owner: Faidon Liambotis) [18:58:03] (CR) BBlack: [C: 1] certs: replace require by collector ordering [puppet] - https://gerrit.wikimedia.org/r/215352 (owner: Faidon Liambotis) [18:58:16] (PS1) Yuvipanda: tools: Don't include killed labsdb client /etc/hosts class [puppet] - https://gerrit.wikimedia.org/r/215691 [18:58:22] Coren: ^ [18:58:53] (CR) coren: [C: 2] "Indeed." [puppet] - https://gerrit.wikimedia.org/r/215691 (owner: Yuvipanda) [18:59:08] (PS2) coren: tools: Don't include killed labsdb client /etc/hosts class [puppet] - https://gerrit.wikimedia.org/r/215691 (owner: Yuvipanda) [18:59:18] Ah, needs a rebase (silly git) [18:59:31] (CR) BBlack: [C: 1] ganglia: use SSLCertificateChainFile [puppet] - https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) (owner: Dzahn) [19:00:24] Ops-Access-Requests, operations, Release-Engineering, Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1335010 (Dzahn) Open>Resolved user has been created on bast1001 and on iridium. linked andre__ to example for ProxyCommand setup. let us k... [19:01:14] Ops-Access-Requests, operations, Release-Engineering, Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1335012 (Dzahn) ``` [iridium:/etc/sudoers.d] $ sudo cat phabricator-admin # This file is managed by Puppet! 
%phabricator-admin ALL = NOPASSWD:... [19:02:42] Coren: I usually don't wait for jenkins on trivial rebases :) Just C+2 and V+2 [19:03:44] can we disable gerrit automatic merging of patches that need rebasing on a per-project basis? [19:03:58] Hm, there are two issues atm. /etc/exim4/exim4.conf is regenerated every run (the order of the route_list entires flaps) [unrelated to the current issue] [19:04:10] But also: Could not retrieve information from environment production source(s) puppet:///modules/toollabs/hosts [19:04:32] bblack: you can configure the repo to be fast-forward only, all the CI repos are set up that way [19:05:00] we should do that for ops/puppet [19:05:13] * bblack goes digging for where to config it [19:05:28] (PS1) Yuvipanda: toollabs: Source hosts file as template, not file [puppet] - https://gerrit.wikimedia.org/r/215692 [19:05:34] Coren: ^ [19:05:43] YuviPanda: Oh. You changed it to a template, but still use source [19:05:50] Yeah, what I said. :-) [19:06:09] (PS2) coren: toollabs: Source hosts file as template, not file [puppet] - https://gerrit.wikimedia.org/r/215692 (owner: Yuvipanda) [19:06:52] bblack: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core [19:06:52] (PS1) Matanya: glance: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215693 [19:07:07] Project Options --> Submit Type [19:07:35] (CR) coren: [C: 2] "Now 17% more cool." [puppet] - https://gerrit.wikimedia.org/r/215692 (owner: Yuvipanda) [19:07:56] uhm... 14 error: Invalid operand type was used: cannot perform this operation with arrays in /srv/mediawiki/php-1.26wmf6/languages/Language.php on line 481 [19:08:04] !log changed ops/puppet repo to ff-only in gerrit config, feel free to scream/revert if necc! [19:08:06] How is 1.26wmf6 still running? 
[19:08:09] Logged the message, Master [19:08:19] (PS2) Matanya: glance: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215687 [19:08:44] (PS2) Matanya: glance: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215693 [19:08:50] YuviPanda: hey [19:08:53] hi paravoid [19:09:04] so, certificates::star_wmflabs_org & certificates::star_wmflabs are being included via LDAP indeed [19:09:11] oh [19:09:28] they probably don't work? if they do I'll be kind of sad. [19:09:32] paravoid: do you know which hosts? [19:09:43] basictestsugar.sugarcrm.eqiad.wmflabs officetools.sugarcrm.eqiad.wmflabs embed-sandbox.embed-sandbox.eqiad.wmflabs language-dev.language.eqiad.wmflabs [19:09:58] (new FQDNs ftw) [19:10:27] certificates::star_wmflabs is just language-dev.language.eqiad.wmflabs [19:10:34] the rest are all _org [19:10:57] could you (or someone else) shed some light on those? [19:12:50] (PS1) Matanya: glance: qualify var [puppet] - https://gerrit.wikimedia.org/r/215694 [19:13:52] paravoid: I don't think they ever had access to the actual private cert file, so I think they were all just attempts to get the certificate that ended with 'why is this not working? bah!' [19:14:09] paravoid: so I'm going to just remove the role and be ok with that. thoughts? [19:14:31] can we confirm that they did not have access to it? [19:14:38] Ops-Access-Requests, operations, Search-and-Discovery, Search-and-Discovery-Research-and-Data-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1335075 (Dzahn) @chasemp Yes, there are only 2 options, we prefer to delegate... [19:14:48] paravoid: hmm, by asking them? or looking through the instances? [19:14:56] either :) [19:15:04] paravoid: I think the first two is Jamesofur|cloud or Jamesofur [19:15:13] Jamesofur|cloud: Jamesofur are you around? :) [19:15:15] hmmm? 
[19:15:31] (PS1) Matanya: ganglia: qualify var [puppet] - https://gerrit.wikimedia.org/r/215695 [19:15:36] Jamesofur: were you the person who setup / played around with sugarcrm on labs? [19:15:51] I actively run sugarcrm on labs [19:15:54] sugar.corp.wikimedia.org [19:15:56] is on labs [19:15:59] YuviPanda: i bet it was lila :) [19:16:01] wait wat. [19:16:06] corp.wikimedia.org is on labs? [19:16:19] that particular site is [19:16:23] because Erik demanded we move it there [19:16:39] I... see. [19:16:41] (CR) Paladox: "Hi I have tested on my local machine and only way to do it is if we remove the bugzilla link for gitblit. Which would be breaking since we" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:16:53] though to be honest the instability has led me to thinking about rolling back to OIT :) [19:16:56] (PS15) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [19:17:08] (not because I know you guys aren't trying, just because I could hit myself there when I screwed up) [19:17:18] though we're also looking at options outside of sugar [19:17:25] so we may not need it at all in the next 6 months or so [19:17:29] Jamesofur: yes, and I don't know if anyone in ops knew this as well, and I'd have probably asked Erik to 'booo!' if he suggested moving it to labs [19:17:31] Ops-Access-Requests, operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1335102 (Dzahn) And meanwhile we have yet another request to get access here for @Ironholds T101157 [19:17:31] but for now it's quite active [19:17:48] Jamesofur: anyway - did you have access to / use the star.wmflabs.org ssl certificate at any point? 
[19:17:49] YuviPanda: at some level I tried, in the end though it wasn't worth the fight [19:17:52] (PS1) Ori.livneh: Fix-up for I7bc734b58: use rsyslog 5 syntax for declaring template [puppet] - https://gerrit.wikimedia.org/r/215696 [19:17:57] not that I know of no [19:18:22] Jamesofur: alright. because that 'role' was checked in the class list for a couple of instances in the sugarcrm project. [19:18:32] (PS2) Ori.livneh: Fix-up for I7bc734b58: use rsyslog 5 syntax for declaring template [puppet] - https://gerrit.wikimedia.org/r/215696 [19:18:33] I think that was me thinking that would be a good idea [19:18:38] but then I couldn't get the real private cert [19:18:38] (CR) Ori.livneh: [C: 2 V: 2] Fix-up for I7bc734b58: use rsyslog 5 syntax for declaring template [puppet] - https://gerrit.wikimedia.org/r/215696 (owner: Ori.livneh) [19:18:39] right :) [19:18:41] and so I gave up :) [19:18:43] paravoid: ^^ [19:18:46] (CR) Paladox: "But since we moved to phabricator and no longer use bugzilla accept from redirecting the bug to correct task in phabricator. And currently" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:18:48] so that's two down. [19:19:03] leaves us with embed-sandbox.embed-sandbox.eqiad.wmflabs language-dev.language.eqiad.wmflabs [19:19:09] kart_: around? [19:19:16] Jamesofur: thank you :) [19:19:21] anytime :) [19:19:25] Jamesofur: and yes, you *should* move it back to OIT [19:19:31] :) [19:19:35] :) [19:19:51] YuviPanda: well if it happened for two of them, sounds reasonable this was the case for the rest [19:19:52] I still run my most sensitive tools there, so I might ;) [19:20:11] Jamesofur: does it store all donor information ever, in the world? :) [19:20:28] paravoid: I'm ok with that assumption. 
[19:20:34] no, but it has child porn go through the server on the way to reports [19:20:48] (03CR) 10Dzahn: [C: 031] Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [19:21:12] * YuviPanda makes joke about commons [19:21:16] heh [19:21:29] YuviPanda: you actually have to make the jokes? :p [19:21:41] (03PS3) 10Andrew Bogott: glance: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215693 (owner: 10Matanya) [19:22:12] Jamesofur: YuviPanda: re: "moving to labs" was from a personal laptop [19:22:20] so that is an improvement [19:22:25] liez [19:22:32] it was from a server on the 3rd floor [19:22:43] then there was another step in between [19:22:44] (03CR) 10Andrew Bogott: [C: 032] glance: qualify vars [puppet] - 10https://gerrit.wikimedia.org/r/215693 (owner: 10Matanya) [19:22:57] I doubt they'd let Jamesofur have a laptop powerful enough! [19:23:03] OIT rack sounds reasoanble, actually [19:23:08] paravoid: want me to unpick them from ldap? [19:23:22] actually I have one of the best laptops in the office I bet, Philippe never says no to me when I ask and I still hurt it's resources [19:23:27] Jamesofur: as long as you know what is there ... [19:23:28] (03CR) 10Paladox: "Thanks I will upload the patch in just a minute." 
[puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:24:22] Jamesofur: ask Philippe to stretch the budget and get a laptop with like 64GB of RAM and 24TB of disks ;) [19:24:29] It'll be funny to see that quote being reasoned [19:24:48] (PS3) Andrew Bogott: glance: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215687 (owner: Matanya) [19:24:48] or the weight of that laptop [19:24:59] Jamesofur: I got a pretty good one (maxed out MBP), was very useful as Android dev, painful as ops [19:25:11] yeah, that's what I have [19:25:12] (PS16) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [19:25:33] yeah, OIT rack was actually somewhat perfect. It would have been even better to have it on production hardware at some level but Ops never wanted to do that (good reasoning to be fair) and so Erik wanted all production used tools off of OIT hardware [19:25:36] hence the labs [19:25:46] (CR) Andrew Bogott: [C: 2] glance: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215687 (owner: Matanya) [19:25:50] (PS1) BBlack: site.pp for cp20xx [puppet] - https://gerrit.wikimedia.org/r/215699 (https://phabricator.wikimedia.org/T101204) [19:25:56] Jamesofur: those specs? Geez. You have a laptop better than most servers in production [19:26:15] :) [19:26:26] and it still gets slow sometimes [19:26:36] Jamesofur: :) I just want a thinner, less powerful one [19:26:38] (PS2) Andrew Bogott: openstack : qualify designateconfig [puppet] - https://gerrit.wikimedia.org/r/215680 (owner: Matanya) [19:26:40] between my monitors and how much I keep open/running at a given time it still cries sometimes [19:26:40] ssh doesn't need all this weight [19:26:49] YuviPanda: the new retinas are nice [19:26:56] Jamesofur: the 12"/ [19:26:58] ? 
[19:27:12] (PS2) BBlack: site.pp for cp20xx [puppet] - https://gerrit.wikimedia.org/r/215699 (https://phabricator.wikimedia.org/T101204) [19:27:14] and 15 [19:27:30] (CR) Andrew Bogott: [C: 2] openstack : qualify designateconfig [puppet] - https://gerrit.wikimedia.org/r/215680 (owner: Matanya) [19:27:34] (CR) BBlack: [C: 2 V: 2] site.pp for cp20xx [puppet] - https://gerrit.wikimedia.org/r/215699 (https://phabricator.wikimedia.org/T101204) (owner: BBlack) [19:27:50] (PS3) BBlack: site.pp for cp20xx [puppet] - https://gerrit.wikimedia.org/r/215699 (https://phabricator.wikimedia.org/T101204) [19:27:55] (PS1) Matanya: graphite: hotname is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215700 [19:27:57] (CR) BBlack: [V: 2] site.pp for cp20xx [puppet] - https://gerrit.wikimedia.org/r/215699 (https://phabricator.wikimedia.org/T101204) (owner: BBlack) [19:28:30] (PS2) Matanya: ganglia: qualify var [puppet] - https://gerrit.wikimedia.org/r/215695 [19:28:55] (PS2) Matanya: glance: qualify var [puppet] - https://gerrit.wikimedia.org/r/215694 [19:30:04] Coren: what is the status of https://phabricator.wikimedia.org/T87870 ? [19:30:42] matanya: Reviewing the current -2 a submiting a new changeset is on this week's todo. [19:30:47] (CR) Andrew Bogott: [C: 2] ganglia: qualify var [puppet] - https://gerrit.wikimedia.org/r/215695 (owner: Matanya) [19:31:07] matanya: I think atm the primary problem is unclear documentation of the why and how of the change. [19:31:19] Coren: that would be awsome, and would unblock me too. 
[19:31:41] never had such a highlighted channel, my entire screen is orange :) [19:32:26] (PS2) Matanya: graphite: hotname is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215700 [19:32:37] (PS3) Matanya: glance: qualify var [puppet] - https://gerrit.wikimedia.org/r/215694 [19:33:23] matanya: hostname * :) [19:33:30] (CR) Andrew Bogott: [C: 2] glance: qualify var [puppet] - https://gerrit.wikimedia.org/r/215694 (owner: Matanya) [19:33:39] heh Krinkle :D [19:34:56] (PS3) Matanya: graphite: hotname is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215700 [19:35:05] matanya: all no-ops so far :) [19:35:10] Krenair: please assume I'm not clueless and there's a reason I tagged that task as Labs [19:35:22] very glad andrewbogott , touch wood [19:35:32] hi paravoid [19:35:36] hi matanya [19:35:44] matanya: How does that block you? Maybe I can do a subset that'll unblock you faster? [19:36:07] Coren: not blocking me directly: https://gerrit.wikimedia.org/r/#/c/160628/1 [19:36:37] matanya: Ah, indeed. 
[19:36:59] (PS4) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [19:38:58] (PS5) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [19:39:10] (PS1) Matanya: strongswan: qualify var [puppet] - https://gerrit.wikimedia.org/r/215704 [19:41:59] paravoid, I don't think you're clueless, it just doesn't seem like a labs thing [19:42:23] the role class is called "ldap::role::server::labs" [19:42:29] (CR) Andrew Bogott: [C: 2] graphite: hotname is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215700 (owner: Matanya) [19:42:46] and it has been traditionally a Labs thing, although its use has expanded since [19:43:04] it's not exclusively a Labs thing, which is why it's tagged as both #ops and #labs [19:43:08] (PS6) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [19:43:14] andrewbogott: i hope it is ok with you i broke all the changes to mini-changes, and not bulked them [19:43:27] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 748.200115412 [19:45:31] (PS1) Matanya: nove: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215705 [19:45:57] (PS2) Matanya: strongswan: qualify var [puppet] - https://gerrit.wikimedia.org/r/215704 [19:45:59] (CR) Paladox: "Seems to not work if you use T then the number but if number on own it works. Do you know why and how I can change this." [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:47:59] (CR) Brian Wolff: [C: -1] "The regex should include a T. 
Bug numbers without a T have a different numbering scheme when referenced in phabricator, and either need to" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:48:07] (PS7) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [19:48:23] godog: https://phabricator.wikimedia.org/T101141#1335208 [19:48:53] (PS1) Matanya: ganalia_new: qualify var [puppet] - https://gerrit.wikimedia.org/r/215707 [19:49:34] matanya: lots ot little patches is better. [19:49:40] good [19:51:14] (CR) Paladox: "Ok and it seems that it will only work if the number is small." [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:51:31] (CR) Paladox: "And where is regex." [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:51:48] (CR) Faidon Liambotis: "git grep's results do not yield any usage of the classes, unless I'm missing something (the certificates aren't touched)." 
[puppet] - https://gerrit.wikimedia.org/r/215347 (owner: Faidon Liambotis) [19:51:50] (PS1) Odder: Add extra namespace aliases for Italian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/215708 (https://phabricator.wikimedia.org/T101274) [19:53:40] (CR) Jdlrobson: [C: 1] Enable MediaWiki logo on mobile login page [mediawiki-config] - https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: Jdlrobson) [19:57:25] (PS1) Matanya: keystongconfig: qualify var [puppet] - https://gerrit.wikimedia.org/r/215709 [19:58:03] (CR) Andrew Bogott: [C: 1] strongswan: qualify var [puppet] - https://gerrit.wikimedia.org/r/215704 (owner: Matanya) [19:58:19] (CR) Andrew Bogott: [C: 2] ganalia_new: qualify var [puppet] - https://gerrit.wikimedia.org/r/215707 (owner: Matanya) [19:58:55] (CR) Paladox: "Thanks I found it and works I am testing now to make sure that removing bugzilla in favour of phabricator. and make sure that I could try " [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [19:59:05] (PS2) Andrew Bogott: nove: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215705 (owner: Matanya) [20:00:00] (CR) Andrew Bogott: [C: 2] nove: qualify vars [puppet] - https://gerrit.wikimedia.org/r/215705 (owner: Matanya) [20:00:05] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T2000). [20:01:16] (CR) Paladox: "Uploading now all tested and both bugzilla and phabricator will work now in gitblit. 
And also includes a full highlight link for gerrit fo" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [20:01:39] (PS8) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:04:56] (PS17) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [20:06:01] (CR) Paladox: "@Dzahn please review and merge. and also V+2 since jenkings doesent seem to do it here. And this has been tested by me and works. I didn't" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [20:06:05] (PS1) Matanya: ldap-groups: qualify var [puppet] - https://gerrit.wikimedia.org/r/215712 [20:06:10] (PS18) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [20:06:20] (PS2) Matanya: keystongconfig: qualify var [puppet] - https://gerrit.wikimedia.org/r/215709 [20:07:14] (CR) Andrew Bogott: [C: 2] keystongconfig: qualify var [puppet] - https://gerrit.wikimedia.org/r/215709 (owner: Matanya) [20:07:25] (PS9) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:09:14] (PS19) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [20:09:22] Jamesofur|cloud: still there? [20:09:33] (PS2) Andrew Bogott: ldap-groups: qualify var [puppet] - https://gerrit.wikimedia.org/r/215712 (owner: Matanya) [20:10:18] ganalia_new & keystong [20:10:36] hotname, designateconfig [20:10:42] (PS10) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:10:42] typos! [20:11:26] paravoid: me, i guess? 
[20:11:39] you & your reviewers :) [20:11:46] "nove" [20:11:48] paravoid: he’s on a roll, I didn’t want to interrupt [20:11:54] :) [20:12:01] paravoid: yeah, sorry, i'll slow down [20:12:27] paravoid: any puppet 4 planned in far future ? [20:13:22] (CR) Hashar: "Maybe it should be applied to production as well? Or we will end up invoking Raven there too in case sentry is turned on." [mediawiki-config] - https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) (owner: Gergő Tisza) [20:13:56] ori: inline templates also should be qualified ? [20:14:20] yes [20:14:47] thanks [20:15:14] (PS1) Matanya: memorysize is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215763 [20:15:35] ori: was referring to this one ^ [20:16:00] (PS11) Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:16:14] (PS3) Faidon Liambotis: sslcert: include ::chainedcert from ::certificate [puppet] - https://gerrit.wikimedia.org/r/215350 [20:16:16] (PS3) Faidon Liambotis: sslcert: remove ::certificate's $content parameter [puppet] - https://gerrit.wikimedia.org/r/215351 [20:16:18] matanya is a fact, qualify [20:16:18] (PS3) Faidon Liambotis: certs: inline certificate:: classes to ::base [puppet] - https://gerrit.wikimedia.org/r/215348 [20:16:20] (PS3) Faidon Liambotis: base: certificates::base -> base::certificates [puppet] - https://gerrit.wikimedia.org/r/215349 [20:16:22] (PS4) Faidon Liambotis: certs: remove random certificates::* includes [puppet] - https://gerrit.wikimedia.org/r/215346 [20:16:24] (PS4) Faidon Liambotis: certs: kill a bunch of Labs classes [puppet] - https://gerrit.wikimedia.org/r/215347 [20:16:26] (PS2) Faidon Liambotis: base::certs: remove backwards-compat ensure => absents [puppet] - https://gerrit.wikimedia.org/r/215597 [20:16:28] (PS2) Faidon Liambotis: base::certs: 
rename CA filenames to their CNs [puppet] - https://gerrit.wikimedia.org/r/215596 [20:16:30] (PS4) Faidon Liambotis: certs: replace require by collector ordering [puppet] - https://gerrit.wikimedia.org/r/215352 [20:16:32] (PS4) Faidon Liambotis: sslcert: automatically regenerate chained cert on changes [puppet] - https://gerrit.wikimedia.org/r/215353 [20:16:34] (PS1) Faidon Liambotis: sslcert: whitespace & comment cleanups [puppet] - https://gerrit.wikimedia.org/r/215765 [20:16:44] ok paravoid you win :D [20:16:49] * hashar looks at jenkins [20:17:30] Faidon has an objective to DoS jenkins clearly [20:17:52] oh Zuul would just queue the jobs :-} [20:18:13] (CR) Brian Wolff: "To anyone who merges this, it should be noted that messing with trackingids (Not sure why we are even doing that, especially given that th" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [20:18:56] (CR) Faidon Liambotis: [C: 2] certs: remove random certificates::* includes [puppet] - https://gerrit.wikimedia.org/r/215346 (owner: Faidon Liambotis) [20:19:11] (CR) Faidon Liambotis: [C: 2] certs: kill a bunch of Labs classes [puppet] - https://gerrit.wikimedia.org/r/215347 (owner: Faidon Liambotis) [20:19:18] !log deployed parsoid sha ab675400 [20:19:23] Logged the message, Master [20:19:25] (CR) Faidon Liambotis: [C: 2] certs: inline certificate:: classes to ::base [puppet] - https://gerrit.wikimedia.org/r/215348 (owner: Faidon Liambotis) [20:20:11] (CR) Faidon Liambotis: [C: 2] base: certificates::base -> base::certificates [puppet] - https://gerrit.wikimedia.org/r/215349 (owner: Faidon Liambotis) [20:20:13] (PS2) Andrew Bogott: memorysize is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215763 (owner: Matanya) [20:21:09] (PS3) Andrew Bogott: ldap-groups: qualify var [puppet] - https://gerrit.wikimedia.org/r/215712 (owner: Matanya) [20:21:29] (CR) Gergő Tisza: "If Sentry is 
enabled on production, the whole code block needs to be moved over anyway. (IIRC I put it into CommonSettings.php first, but " [mediawiki-config] - https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) (owner: Gergő Tisza) [20:21:58] (CR) Andrew Bogott: [C: 2] ldap-groups: qualify var [puppet] - https://gerrit.wikimedia.org/r/215712 (owner: Matanya) [20:22:06] (PS1) Matanya: multicast: qualify var [puppet] - https://gerrit.wikimedia.org/r/215767 [20:23:12] (CR) Andrew Bogott: [C: -1] memorysize is a fact, qualify (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215763 (owner: Matanya) [20:23:20] !log ori Synchronized php-1.26wmf8/resources/Resources.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (duration: 00m 13s) [20:23:25] Logged the message, Master [20:24:16] (PS3) Matanya: memorysize is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215763 [20:24:18] (PS4) Faidon Liambotis: sslcert: include ::chainedcert from ::certificate [puppet] - https://gerrit.wikimedia.org/r/215350 [20:24:31] (CR) Faidon Liambotis: [C: 2 V: 2] sslcert: include ::chainedcert from ::certificate [puppet] - https://gerrit.wikimedia.org/r/215350 (owner: Faidon Liambotis) [20:24:51] (PS4) Faidon Liambotis: sslcert: remove ::certificate's $content parameter [puppet] - https://gerrit.wikimedia.org/r/215351 [20:24:58] (CR) Faidon Liambotis: [C: 2 V: 2] sslcert: remove ::certificate's $content parameter [puppet] - https://gerrit.wikimedia.org/r/215351 (owner: Faidon Liambotis) [20:25:08] PROBLEM - puppet last run on mw1240 is CRITICAL Puppet has 3 failures [20:25:27] PROBLEM - puppet last run on mc1009 is CRITICAL Puppet has 3 failures [20:25:28] PROBLEM - puppet last run on mw1040 is CRITICAL Puppet has 3 failures [20:25:28] PROBLEM - puppet last run on ms-fe1003 is CRITICAL Puppet has 3 failures [20:25:38] PROBLEM - puppet last run on mw1252 is CRITICAL 
Puppet has 3 failures [20:26:49] (CR) Andrew Bogott: [C: 2] memorysize is a fact, qualify [puppet] - https://gerrit.wikimedia.org/r/215763 (owner: Matanya) [20:28:13] I'm stopping icinga-wm [20:28:22] it will flood us for spurious puppet errors [20:28:46] !log Restarting Jenkins to release a deadlock [20:28:47] jgage: ping? [20:28:52] Logged the message, Master [20:29:22] !log kafka preferred-replica-election on an1021 [20:29:28] Logged the message, Master [20:29:51] marxarelli: there? [20:30:51] ori: HHVM TC_SPACE WARNINGs 83-85% on multiple servers [20:30:55] I hate this check so much [20:31:48] YuviPanda, andrewbogott, Coren: labnet1001 connection tracking saturation has been firing off all day [20:32:05] [1433363348] SERVICE ALERT: labnet1001;Connection tracking saturation;UNKNOWN;SOFT;1;UNKNOWN: More than half of the datapoints are undefined [20:32:08] [1433363457] SERVICE ALERT: labnet1001;Connection tracking saturation;OK;SOFT;2;OK: Less than 1.00% above the threshold [241664.0] [20:32:10] (PS20) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [20:32:12] paravoid: in theory won't the recent cache purging fixup for hhvm make that less common / more meaningful? [20:32:13] root@neon:~# grep -c labnet1001 /var/log/icinga/icinga.log [20:32:13] 230 [20:32:16] I think, or thought, yes [20:32:22] (PS21) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [20:32:36] (CR) Paladox: [C: 1] Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [20:34:17] (PS1) Matanya: novaconfig: qualify var [puppet] - https://gerrit.wikimedia.org/r/215769 [20:34:48] paravoid: I’m not sure why we do connection tracking at all on that box… generally it’s disabled on dns servers isn’t it? 
[20:34:59] yes, it is [20:35:00] But also I don’t entirely know what that is or what it’s for :/ [20:37:37] godog: I inadvertently killed your carbon changes [20:37:40] andrewbogott: https://www.linuxquestions.org/questions/linux-server-73/ip_conntrack-table-full-581142/ [20:39:38] paravoid: np I was done but didn't reenable puppet yet [20:40:42] sorry andrewbogott meant to link to: https://kb.isc.org/article/AA-01183/0/Linux-connection-tracking-and-DNS.html [20:41:09] RECOVERY - puppet last run on mw1040 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [20:41:18] RECOVERY - puppet last run on mw1252 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [20:41:25] paravoid: hi [20:41:28] RECOVERY - puppet last run on ms-be1017 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [20:41:40] (03CR) 10Dzahn: [C: 032] Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [20:41:48] RECOVERY - puppet last run on analytics1036 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:41:48] RECOVERY - puppet last run on mw1134 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [20:41:49] RECOVERY - puppet last run on cp1059 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [20:41:58] RECOVERY - puppet last run on mw2103 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [20:41:59] RECOVERY - puppet last run on ms-be1013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:41:59] RECOVERY - puppet last run on mw1233 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [20:41:59] RECOVERY - puppet last run on nescio is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:42:10] RECOVERY - puppet last run on mw1192 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:42:14] 
jgage: an1021 kafka was critical twice today, I ran a preferred-replica-election and it fixed itself but it looks recurring [20:42:18] RECOVERY - puppet last run on mw2102 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:42:18] RECOVERY - puppet last run on mw2099 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:42:24] (03CR) 10Dzahn: [V: 032] "manual +V2 needed because Paladox is not in the trusted user regex" [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [20:42:41] what's all the recover there? ^ [20:42:47] RECOVERY - puppet last run on mw1072 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [20:42:48] nothing [20:42:48] RECOVERY - puppet last run on mc1009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:42:53] ok! :) [20:42:55] paravoid: thanks, ok. i'll take a look. [20:42:57] RECOVERY - puppet last run on mw1067 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:42:58] RECOVERY - puppet last run on ms-fe1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:43:05] :P [20:43:07] RECOVERY - puppet last run on mw2122 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [20:43:18] RECOVERY - puppet last run on mw2169 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:43:18] RECOVERY - puppet last run on mw2183 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [20:43:18] RECOVERY - puppet last run on mw1178 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:43:18] RECOVERY - puppet last run on mw1089 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:43:27] RECOVERY - puppet last run on ms-be2013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:43:41] (03PS1) 10Matanya: puppet_certname: qualify var [puppet] - 
10https://gerrit.wikimedia.org/r/215772 [20:43:49] RECOVERY - puppet last run on mw1048 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:43:49] RECOVERY - puppet last run on mw1031 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:44:25] (03PS5) 10Faidon Liambotis: sslcert: automatically regenerate chained cert on changes [puppet] - 10https://gerrit.wikimedia.org/r/215353 [20:44:32] (03CR) 10Faidon Liambotis: [C: 032 V: 032] sslcert: automatically regenerate chained cert on changes [puppet] - 10https://gerrit.wikimedia.org/r/215353 (owner: 10Faidon Liambotis) [20:45:54] uh oh [20:45:55] gerrit died [20:46:08] jap :( [20:46:09] cause of the SSL certs ? [20:46:20] paravoid: I'm off for the day, I'll try to keep it that way and look at the labnet changes tomorrow [20:46:44] looks unrelated [20:46:48] paravoid: hashar: no, that would be what i merged [20:46:51] /var/lib/gerrit2/review_site/etc/gerrit.config [20:46:53] it included a gerrit config change [20:46:57] and there goes my roll [20:47:00] which will make it restart [20:47:11] it looks back to me [20:47:18] indeed [20:47:46] in the kingdom of nouns, restarts take a while :) [20:47:58] PROBLEM - puppet last run on es2001 is CRITICAL puppet fail [20:49:03] it was about having links to phab in gitblit [20:50:01] grrrit-wm died though [20:50:07] !log restarted zuul entirely to remove some stalled jobs [20:50:12] Logged the message, Master [20:50:38] PROBLEM - puppet last run on wtp2018 is CRITICAL Puppet has 1 failures [20:52:06] mutante: can you fix grrrit-wm? [20:52:36] paravoid: i hope so, i'm already looking where it is [20:52:45] fwiw paladox said "Thanks for merging tested and all works. No errors are showing yet." [20:53:22] "managed by Bigbrother, so manual restarts should not be necessary " hmm [20:53:34] ah, but i have done this before.. on it [20:53:45] hmm, so I seem to be having possibly space issues on terbium? 
(As I decrypt securePoll results) it's been taking ages but at some point spat out an error that sorta sounds like that (and I'm not sure if it actually continued after that error or if it just stopped and is pretending to continue) [20:53:50] mkdir: cannot create directory `/sys/fs/cgroup/memory/mediawiki/job/26335': File exists [20:53:50] limit.sh: failed to create the cgroup. [20:54:12] Jamesofur: df -h [20:54:24] it's not space issues, that's not a real filesystem [20:54:38] that's mediawiki trying to create a cgroup to enforce memory limits [20:54:39] ah, sys/fs :) [20:54:40] Jamesofur: I've seen that before, likely the mw installation is missing cgroup-bin, searching for the related ticket [20:54:47] and some bug there, probably [20:54:48] cgroups 16G 0 16G 0% /sys/fs/cgroup [20:54:53] * matanya should read until the end [20:55:20] !log depooling ns0 -> radon AuthDNS (rebooting for kernel update) [20:55:26] Logged the message, Master [20:55:29] Jamesofur: T92712 [20:55:33] dzahn@tools-bastion-01:~$ become lolrrit-wm [20:55:33] You are not a member of the group tools.lolrrit-wm. [20:55:33] Any existing member of the tool's group can add you to that. [20:55:43] pretty sure i was in the past [20:55:49] or i was root.. [20:56:41] Pushed rescheduling of job 750059 on host tools-exec-1406.eqiad.wmflabs [20:58:34] looks like the code change did cause this.. sigh..ok [20:58:39] !log repooling ns0 -> radon AuthDNS [20:58:45] Logged the message, Master [21:00:04] aude: Respected human, time to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T2100). Please do the needful. [21:01:35] one more gerrit config change coming up [21:01:37] paravoid: any idea on the cgroup issue? I can file another task like godog linked ( https://phabricator.wikimedia.org/T92712 ) for terbium but at the moment it's either a bit of an unbreaknow or I need to find somewhere else we can run the results tally. 
Even on my local machine I'm having to break the tally into multiple files which is.... very not ideal (proving that you didn't remove any votes by accident or on purpose is tough and hard for others to replicate). [21:01:54] in order to fix the bot [21:02:13] will cause a restart of gerrit [21:03:10] back [21:04:03] Jamesofur: I'll take a look [21:04:10] appreciate it godog [21:04:36] apparently my script IS still running... because it just gave the error a 2nd time [21:04:48] (03CR) 10Dzahn: "." [puppet] - 10https://gerrit.wikimedia.org/r/215779 (owner: 10Paladox) [21:04:50] (03CR) 10Paladox: "Oh." [puppet] - 10https://gerrit.wikimedia.org/r/215779 (owner: 10Paladox) [21:04:57] so I guess that's good, though I'm a bit worried that the errors represent a missed vote (I don't know if it will retry automatically) [21:05:05] is someone trying to debug labvirt1005? [21:05:13] andrewbogott maybe? [21:05:13] though, for the record I guess [21:05:19] analytics1021 problems today seem related to disk io / latency. the second problem coincides with the start of consistency check on /dev/md3, and the first one matches a disk io spike seen in ganglia that has no corresponding syslog events. (except the launch of cron.hourly, but it's a no-op.) that host has been problematic for a long time, but i'll keep a close eye on it today. [21:05:25] !log decryption key for Board Election insert into voteWiki [21:05:26] paravoid: there it is again. btw, i saw cert related dependency errors in the puppet run on gerrit server [21:05:30] paravoid: moritz was [21:05:31] Logged the message, Master [21:05:33] (he pm'd) [21:05:35] paravoid: I am. [21:05:39] Do I keep kicking you off? [21:05:40] ahhhhhh [21:05:43] and vice-versa! 
[21:05:45] Because someone keeps kicking me off :) [21:05:45] (03PS1) 10Aude: Enable Wikibase usage tracking on kowiki and rowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215782 (https://phabricator.wikimedia.org/T100659) [21:05:46] but that was much earlier [21:05:47] ah [21:05:48] hahaha [21:05:49] haha [21:05:50] I'm so sorry [21:05:56] so my thing was completely irrelevant too [21:05:57] I've reset iLO twice now [21:05:58] * YuviPanda goes afk for real [21:06:00] hah [21:06:03] thinking there's some bug [21:06:09] that keeps kicking me off the VSP [21:06:14] (03CR) 10Paladox: "All I added was regex.global.phabricator = \\b([bB][uU][gG]\\:?\\s+#?(T\\d+))\\b!!! me too [21:06:27] I guess we should’ve considered the obvious possibility :) [21:06:30] Jamesofur: mmhh if it looks like it is running I'll hold off changes for now [21:06:47] (03CR) 10Dzahn: "the bot works again now. had to restart it too on toollabs and wait a little bit" [puppet] - 10https://gerrit.wikimedia.org/r/215779 (owner: 10Paladox) [21:06:49] paravoid: I have it at a grub prompt now but I think it doesn’t get keypresses [21:07:03] oh, I take that back! It finally took one [21:07:18] andrewbogott: while gerrit-wm was away, i pushed some more stuff, at your spare time ... [21:07:42] it’s just super slow [21:08:08] godog: kk, if it just did the error again, if it goes for too much longer we may want to stop it and try to look at the issues anyway. The speed right now is realllllly slow much slower then it usually is [21:08:10] (03CR) 10Paladox: "Ok thanks." [puppet] - 10https://gerrit.wikimedia.org/r/215779 (owner: 10Paladox) [21:08:24] though to be fair it has a large number of choices and a large number of votes [21:08:27] more then it's had to deal with before [21:10:27] paravoid: ok, now it’s booted into a 3.13 kernel and accepting ssh logins. Which was the extent of my goal (as requested by Moritz) so I’m standing clear now. 
[21:10:34] (03CR) 10Paladox: "Could that be the reason it wasent working with the other patch before this one. Because bot had to be restarted manualy too." [puppet] - 10https://gerrit.wikimedia.org/r/215779 (owner: 10Paladox) [21:11:04] (03PS1) 10Matanya: rra_sizes: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215784 [21:12:14] (03CR) 10Aude: [C: 032] Enable Wikibase usage tracking on kowiki and rowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215782 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [21:14:20] (03CR) 10Hashar: [C: 031] "Ah that makes more sense now :-] Thanks for the clarification, feel free to +2 anytime!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) (owner: 10Gergő Tisza) [21:14:32] (03PS1) 10Matanya: salt_reactor_options: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215785 [21:16:51] * aude wonders if zuul is stuck [21:17:05] (03Merged) 10jenkins-bot: Enable Wikibase usage tracking on kowiki and rowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215782 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [21:17:23] there :) [21:17:43] (03PS3) 10Gage: strongswan: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215704 (owner: 10Matanya) [21:18:25] !log aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on kowiki and rowiki (duration: 00m 13s) [21:18:50] !log restarting opendj on neptunium [21:19:26] (03PS1) 10Matanya: site: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215788 [21:20:39] RECOVERY - puppet last run on labsdb1001 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:20:39] RECOVERY - puppet last run on mw1214 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:20:40] RECOVERY - puppet last run on suhail is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:20:40] RECOVERY - puppet last run on 
mw1169 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:20:41] RECOVERY - puppet last run on mw1013 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:20:47] !log restarting pdns on virt1000 and labcontrol1001 [21:20:53] Logged the message, Master [21:24:52] (03CR) 10Paladox: "@Dzahn should this be abandoned because it is working now or should it be reverted." [puppet] - 10https://gerrit.wikimedia.org/r/215780 (owner: 10Dzahn) [21:27:39] (03CR) 10Gage: [C: 032] strongswan: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215704 (owner: 10Matanya) [21:28:46] (03PS1) 10Aude: Enable Wikibase usage tracking on ukwiki and viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215791 (https://phabricator.wikimedia.org/T100659) [21:29:49] (03PS1) 10Matanya: ssl: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215792 [21:31:32] Jamesofur: btw doesn't look related to that bug as I first thought, the mw cgroup init files seem to be there [21:31:38] ok [21:31:56] yeah, with a lot of space if df -h is right [21:32:41] (03PS1) 10Matanya: title: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215793 [21:34:19] (03CR) 10Dzahn: "the bot works again, so it can be abandonded. unless you see any other issues" [puppet] - 10https://gerrit.wikimedia.org/r/215780 (owner: 10Dzahn) [21:36:02] (03PS1) 10Matanya: varnish_instances: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215794 [21:37:57] (03PS1) 10Matanya: zookeeper_hosts: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215795 [21:39:11] godog: ok, my script is done if you have any other ideas, if not any idea of who I could poke to look? 
[21:41:18] andrewbogott: call it a night [21:41:35] see you, and thanks for all the fish [21:42:57] Jamesofur: given that it isn't related to the bug I thought it was I don't have many ideas no :( [21:43:03] * Jamesofur nods [21:44:00] ok, less priority I think [21:44:03] the number of votes is right [21:44:09] I appreciate you looking godog :) [21:45:44] Jamesofur: nice, so looks like it has worked [21:46:00] yeah, looks like it handled itself even if it had some bumps on the road [21:49:57] (03Abandoned) 10Dzahn: Revert "Add link in gitblit for phabricator" [puppet] - 10https://gerrit.wikimedia.org/r/215780 (owner: 10Dzahn) [21:50:10] (03PS4) 10Faidon Liambotis: base::certs: remove backwards-compat ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215597 [21:50:12] (03PS6) 10Faidon Liambotis: certs: replace require by collector ordering [puppet] - 10https://gerrit.wikimedia.org/r/215352 [21:50:14] (03PS1) 10Faidon Liambotis: ldap: point TLS_CACERTFILE to ca-certificates.crt [puppet] - 10https://gerrit.wikimedia.org/r/215800 [21:50:29] (03PS2) 10Faidon Liambotis: ldap: point TLS_CACERTFILE to ca-certificates.crt [puppet] - 10https://gerrit.wikimedia.org/r/215800 [21:50:31] (03PS5) 10Faidon Liambotis: base::certs: remove backwards-compat ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215597 [21:50:33] (03PS7) 10Faidon Liambotis: certs: replace require by collector ordering [puppet] - 10https://gerrit.wikimedia.org/r/215352 [21:50:50] (03PS3) 10Faidon Liambotis: ldap: point TLS_CACERTFILE to ca-certificates.crt [puppet] - 10https://gerrit.wikimedia.org/r/215800 [21:50:56] (03CR) 10Faidon Liambotis: [C: 032 V: 032] ldap: point TLS_CACERTFILE to ca-certificates.crt [puppet] - 10https://gerrit.wikimedia.org/r/215800 (owner: 10Faidon Liambotis) [21:52:08] (03CR) 10BryanDavis: "Conceptually a great idea. Those with anti-php bias for scripting can start working on MediaWiki 2.0 in python or node or whatever." 
(039 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [21:53:18] (03PS1) 10Krinkle: coal: Fix regression in loading permalink of default period [puppet] - 10https://gerrit.wikimedia.org/r/215801 [21:53:20] ori: ^ [21:53:34] !log ori Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoader.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (did not sync correct file previously) (duration: 00m 12s) [21:53:40] Logged the message, Master [21:54:10] (03PS2) 10Ori.livneh: coal: Fix regression in loading permalink of default period [puppet] - 10https://gerrit.wikimedia.org/r/215801 (owner: 10Krinkle) [21:54:29] (03CR) 10Ori.livneh: [C: 032 V: 032] "Nice, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/215801 (owner: 10Krinkle) [21:58:47] (03PS1) 10Filippo Giunchedi: install-server: provision d-i-test as ganeti VM [puppet] - 10https://gerrit.wikimedia.org/r/215802 (https://phabricator.wikimedia.org/T100636) [21:58:51] !log restarted gitblit [21:58:56] Logged the message, Master [21:59:01] (03PS1) 10Ori.livneh: memkeys-snapshot: exclude log lines and CSV header from output [puppet] - 10https://gerrit.wikimedia.org/r/215803 [21:59:10] (03Abandoned) 10Filippo Giunchedi: install-server: add WMF5842 back as d-i-test [puppet] - 10https://gerrit.wikimedia.org/r/214608 (https://phabricator.wikimedia.org/T100636) (owner: 10Filippo Giunchedi) [21:59:23] (03CR) 10Ori.livneh: [C: 032 V: 032] memkeys-snapshot: exclude log lines and CSV header from output [puppet] - 10https://gerrit.wikimedia.org/r/215803 (owner: 10Ori.livneh) [21:59:56] (03PS2) 10Filippo Giunchedi: install-server: provision d-i-test as ganeti VM [puppet] - 10https://gerrit.wikimedia.org/r/215802 (https://phabricator.wikimedia.org/T100636) [21:59:58] (03CR) 1020after4: "That's what baffles me the most. This is a php project. I'm better equipped to write PHP than other languages. 
And most of all, the reposi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202665 (https://phabricator.wikimedia.org/T95375) (owner: 1020after4) [22:00:04] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install-server: provision d-i-test as ganeti VM [puppet] - 10https://gerrit.wikimedia.org/r/215802 (https://phabricator.wikimedia.org/T100636) (owner: 10Filippo Giunchedi) [22:01:30] (03CR) 10Aude: [C: 032] Enable Wikibase usage tracking on ukwiki and viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215791 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [22:01:37] (03Merged) 10jenkins-bot: Enable Wikibase usage tracking on ukwiki and viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215791 (https://phabricator.wikimedia.org/T100659) (owner: 10Aude) [22:01:47] PROBLEM - puppet last run on neptunium is CRITICAL puppet fail [22:02:26] (03PS6) 10Faidon Liambotis: base::certs: remove backwards-compat ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215597 [22:02:31] !log aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on ukwiki and viwiki (duration: 00m 15s) [22:02:37] Logged the message, Master [22:02:47] (03CR) 10Faidon Liambotis: [C: 032 V: 032] base::certs: remove backwards-compat ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215597 (owner: 10Faidon Liambotis) [22:03:28] RECOVERY - puppet last run on neptunium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [22:05:28] (03CR) 10Paladox: "Nope all working." [puppet] - 10https://gerrit.wikimedia.org/r/215780 (owner: 10Dzahn) [22:07:16] (03PS1) 10QChris: Drop gerrit's misleading phabricator tracking id [puppet] - 10https://gerrit.wikimedia.org/r/215804 [22:12:03] (03CR) 10QChris: "This change's modifications of gerrit.config has been partially reverted in I43087d26985b82eed787ae6fbdd9078e8857eff9." 
[puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [22:12:53] (03CR) 10QChris: "The reverting of this change's remaining modifications of gerrit.config can be found in I5f150af0a47d2ac3b9b89f4aa23c1dfe9284ab8c." [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [22:21:44] (03PS2) 10Dzahn: ganglia's sha1 cert to sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214670 (https://phabricator.wikimedia.org/T100825) (owner: 10RobH) [22:22:20] James_F: thanks for adding collaboration team to a bunch of the -log-errors bugs [22:22:33] tasks. tasks. /me chants [22:22:56] (03PS2) 10Dzahn: Drop gerrit's misleading phabricator tracking id [puppet] - 10https://gerrit.wikimedia.org/r/215804 (owner: 10QChris) [22:24:46] (03PS1) 10Filippo Giunchedi: install-server: create placeholder LV to work around partman-lvm bug [puppet] - 10https://gerrit.wikimedia.org/r/215806 (https://phabricator.wikimedia.org/T100636) [22:26:17] (03CR) 10Dzahn: [C: 032] Drop gerrit's misleading phabricator tracking id [puppet] - 10https://gerrit.wikimedia.org/r/215804 (owner: 10QChris) [22:26:31] sorry, one more gerrit config change [22:31:30] (03PS3) 10Dzahn: ganglia's sha1 cert to sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214670 (https://phabricator.wikimedia.org/T100825) (owner: 10RobH) [22:35:28] PROBLEM - puppet last run on antimony is CRITICAL puppet fail [22:36:10] arrrr [22:36:41] icinga-wm: i don't believe you [22:37:08] RECOVERY - puppet last run on antimony is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:37:13] better :p [22:52:17] grrrit-wm [22:52:26] mutante: broken again [22:53:02] paravoid: :/ fixing [22:54:15] greg-g: Always. [22:58:53] (03CR) 10Dzahn: "grrrrit-wm come on come on come on grrrrit-wm come on come on come ongrrrrit-wm come on come on come on" [puppet] - 10https://gerrit.wikimedia.org/r/214670 (https://phabricator.wikimedia.org/T100825) (owner: 10RobH) [22:59:37] mutante: lol on your commit msg. 
[22:59:52] (03CR) 10Alex Monk: "It doesn't look like you've put these through optipng?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215811 (owner: 10BryanDavis) [23:00:05] RoanKattouw, ^d, tgr: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150603T2300). Please do the needful. [23:00:39] (03CR) 10BryanDavis: "Nope, I just pulled them down from File:Wiki.png on the various beta cluster hosts." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215811 (owner: 10BryanDavis) [23:01:08] robh: i needed a random one to test if the bot is back :) [23:01:22] robh: on that actual change though, the order is different than first expected [23:01:49] delete cert, merge cert-only change, puppet, merge config change, puppet again [23:02:02] (03CR) 10Paladox: "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [23:02:13] (03PS2) 10BryanDavis: Beta cluster wiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215811 [23:03:05] (03CR) 10Faidon Liambotis: "Do we have to format too (as ext4)? Can't we just configure it to just create a dummy LV and not format it?" [puppet] - 10https://gerrit.wikimedia.org/r/215806 (https://phabricator.wikimedia.org/T100636) (owner: 10Filippo Giunchedi) [23:03:11] (03CR) 10Filippo Giunchedi: "@gwicke I think it is related to exported resources and the time it takes to fully converge, haven't had the time yet to look in depth tho" [puppet] - 10https://gerrit.wikimedia.org/r/215557 (owner: 10Filippo Giunchedi) [23:03:26] (03CR) 10BryanDavis: "Ran beta*.png through optipng (default settings)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215811 (owner: 10BryanDavis) [23:10:34] phabbot gone again, whatever its name was? [23:11:01] wikibugs [23:11:50] wikibugs: hi! [23:14:32] Anybody doing SWAT deploy today? 
[23:14:34] it probably cant resolve things [23:16:10] RoanKattouw, ^d, tgr: I can do the SWAT deployment today if no one else wants to do it [23:16:43] That would be nice [23:17:15] "If you have to restart the IRC bot (you really shouldn't)" [23:18:04] RoanKattouw: NP [23:19:06] Thanks man [23:19:07] (03CR) 10Kaldari: [C: 032] Enable MediaWiki logo on mobile login page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: 10Jdlrobson) [23:19:14] I usually don't mind doing it, just... not today, you know [23:19:41] kaldari: I'm only mentioned there as a the patch owner [23:20:29] trg: ah, yep, I see. I'll do yours right after Jon's [23:20:45] (it would be a lot less confusing IMO of jouncebot sent two separate messages) [23:21:17] tgr: file a bug? [23:21:25] (03Merged) 10jenkins-bot: Enable MediaWiki logo on mobile login page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: 10Jdlrobson) [23:21:43] bd808: which project is that? [23:21:53] I don't think it has one [23:22:29] tgr: deployment-systems for lack of a better place and cc me [23:22:59] code is in gerrit and I have rights on the tools account [23:23:41] \o sorry [23:23:42] jdlrobson2: Howdy, about to merge your config change... [23:23:48] thank you [23:24:56] kaldari: it's merged? i'm not seeing the right behaviour for some reason [23:25:24] (03CR) 10Filippo Giunchedi: "by the way this resulted in this change:" [puppet] - 10https://gerrit.wikimedia.org/r/215700 (owner: 10Matanya) [23:25:27] !log kaldari Synchronized images/mobile/mediawiki.png: syncing mediawiki logo for mobile (duration: 00m 12s) [23:25:33] Logged the message, Master [23:25:54] matanya andrewbogott_afk ^ qualifying hostname in puppet changes it from unqualified to qualified (!) 
[23:26:13] !log kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile (duration: 00m 12s) [23:26:13] the first qualified is "puppet qualified" the second is "dns qualified" [23:26:18] Logged the message, Master [23:26:25] jdlrobson2: You should be able to see the change now [23:26:28] I'm shocked at how unpredictable puppet is [23:27:00] jdlrobson2: lemme know if it looks good to you [23:27:05] not working kaldari... [23:27:08] https://m.mediawiki.org/wiki/Special:UserLogin [23:27:11] should show mediawiki logo [23:27:15] when anon [23:27:40] kaldari: i assume not a caching problem? [23:28:01] (03CR) 10Ottomata: multicast: qualify var (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215767 (owner: 10Matanya) [23:28:37] (03CR) 10Ottomata: [C: 032] zookeeper_hosts: qualify var [puppet] - 10https://gerrit.wikimedia.org/r/215795 (owner: 10Matanya) [23:29:12] jdlrobson2: I don't see one on en.wiki either [23:29:43] bd808: https://phabricator.wikimedia.org/T101329 [23:29:57] I can try my hand at it if you point me to the right repo [23:30:11] works fine kaldari for enwiki [23:30:13] must be logged out [23:30:25] ah [23:30:31] (03PS1) 10Filippo Giunchedi: Revert "graphite: hotname is a fact, qualify" [puppet] - 10https://gerrit.wikimedia.org/r/215814 (https://phabricator.wikimedia.org/T97251) [23:30:50] tgr: https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/bots/jouncebot [23:31:06] kaldari: ahh it's a bad patch [23:31:09] it's mediawikiwiki not mediawiki [23:31:25] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Revert "graphite: hotname is a fact, qualify" [puppet] - 10https://gerrit.wikimedia.org/r/215814 (https://phabricator.wikimedia.org/T97251) (owner: 10Filippo Giunchedi) [23:31:56] jdlrobson2: ah, shit. If you wanna do another pass, I'll wait. 
[23:31:58] (03PS5) 10Faidon Liambotis: wikitech.wikimedia.org certificate sha1 to sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214666 (https://phabricator.wikimedia.org/T92709) (owner: 10RobH) [23:32:25] kaldari: coming up [23:32:25] (03PS1) 10Jdlrobson: Correct the key for mediawiki.org mobile login logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215818 [23:32:29] ^ kaldari [23:32:31] (03PS6) 10Faidon Liambotis: certs: wikitech.wm.org certificate SHA1 to SHA2 [puppet] - 10https://gerrit.wikimedia.org/r/214666 (https://phabricator.wikimedia.org/T92709) (owner: 10RobH) [23:32:34] (03CR) 10Filippo Giunchedi: "afaict that field accepts a parted fs name but partman won't touch the partition unless filesystem{ } use_filesystem{ } and format{ } is a" [puppet] - 10https://gerrit.wikimedia.org/r/215806 (https://phabricator.wikimedia.org/T100636) (owner: 10Filippo Giunchedi) [23:32:56] (03CR) 10Kaldari: [C: 032] Correct the key for mediawiki.org mobile login logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215818 (owner: 10Jdlrobson) [23:33:03] (03Merged) 10jenkins-bot: Correct the key for mediawiki.org mobile login logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215818 (owner: 10Jdlrobson) [23:33:58] (03PS1) 10Faidon Liambotis: sslcert: cleanup for ::ca ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215820 [23:34:00] (03PS1) 10Faidon Liambotis: base: kill the wmf-ca CA [puppet] - 10https://gerrit.wikimedia.org/r/215821 [23:34:41] !log kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile, take 2 (duration: 00m 12s) [23:34:46] Logged the message, Master [23:34:56] jdlrobson2: Looks good now: https://m.mediawiki.org/wiki/Special:UserLogin [23:35:12] (03CR) 10Faidon Liambotis: [C: 032] sslcert: cleanup for ::ca ensure => absents [puppet] - 10https://gerrit.wikimedia.org/r/215820 (owner: 10Faidon Liambotis) [23:35:32] (03CR) 10Kaldari: [C: 032] Disable PHP error logging in 
the Sentry extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) (owner: 10Gergő Tisza) [23:35:47] perfect thanks kaldari! [23:36:02] (03Merged) 10jenkins-bot: Disable PHP error logging in the Sentry extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215650 (https://phabricator.wikimedia.org/T85188) (owner: 10Gergő Tisza) [23:37:01] tgr: Your noop is finished [23:37:08] PROBLEM - Translation cache space on mw1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:38:38] RECOVERY - Translation cache space on mw1018 is OK: HHVM_TC_SPACE OK TC sizes are OK [23:39:03] kaldari: thanks! [23:39:23] ori: I'm seeing "Lost parent, LightProcess exiting" errors from HHVM. Not a lot, but just started a few minutes ago [23:39:31] FYI [23:39:37] I have no idea what that means :) [23:40:47] (03CR) 10Kaldari: [C: 032] Adding Chinese Wikipedia to to list of ImportSources for meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215810 (https://phabricator.wikimedia.org/T101324) (owner: 10Kaldari) [23:41:11] (03Merged) 10jenkins-bot: Adding Chinese Wikipedia to to list of ImportSources for meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215810 (https://phabricator.wikimedia.org/T101324) (owner: 10Kaldari) [23:42:33] !log kaldari Synchronized wmf-config/InitialiseSettings.php: syncing ImportSource change for meta (duration: 00m 13s) [23:42:39] Logged the message, Master [23:42:58] (03PS2) 10Faidon Liambotis: base: kill the wmf-ca CA [puppet] - 10https://gerrit.wikimedia.org/r/215821 [23:42:58] OK, SWAT is done [23:43:00] (03PS1) 10Faidon Liambotis: ldap: kill ldap::role::config::production/corp roles [puppet] - 10https://gerrit.wikimedia.org/r/215824 [23:43:02] (03PS1) 10Faidon Liambotis: ldap: remove wmf-ca addition from the truststores [puppet] - 10https://gerrit.wikimedia.org/r/215825 [23:46:15] (03CR) 10Faidon Liambotis: [C: 032] ldap: kill ldap::role::config::production/corp roles 
[puppet] - https://gerrit.wikimedia.org/r/215824 (owner: Faidon Liambotis) [23:46:47] (CR) Faidon Liambotis: [C: 2] ldap: remove wmf-ca addition from the truststores [puppet] - https://gerrit.wikimedia.org/r/215825 (owner: Faidon Liambotis) [23:46:54] (PS2) Ottomata: zookeeper_hosts: qualify var [puppet] - https://gerrit.wikimedia.org/r/215795 (owner: Matanya) [23:47:20] ottomata: hey [23:47:29] ottomata: did jgage gave you an update re: an1021? [23:47:55] yep, we just talked about it [23:48:02] because he is here in teh office woo [23:50:31] oh awesome [23:51:05] (CR) Ottomata: Make it possible to install multiple custom diamond collectors that use the same source (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215056 (owner: Ottomata) [23:51:27] (PS1) Ori.livneh: varnishstatsd: notify service when code is touched; touch code [puppet] - https://gerrit.wikimedia.org/r/215828 [23:51:31] godog: ^ [23:52:18] (PS3) Ottomata: Make it possible to install multiple custom diamond collectors that use the same source [puppet] - https://gerrit.wikimedia.org/r/215056 [23:54:00] (CR) Filippo Giunchedi: [C: 1] varnishstatsd: notify service when code is touched; touch code [puppet] - https://gerrit.wikimedia.org/r/215828 (owner: Ori.livneh) [23:54:16] (PS2) Ori.livneh: varnishstatsd: notify service when code is touched; touch code [puppet] - https://gerrit.wikimedia.org/r/215828 [23:54:32] (PS1) Faidon Liambotis: base: remove wmf-ca ensure => absent [puppet] - https://gerrit.wikimedia.org/r/215829 [23:55:53] (CR) Ori.livneh: [C: 2 V: 2] varnishstatsd: notify service when code is touched; touch code [puppet] - https://gerrit.wikimedia.org/r/215828 (owner: Ori.livneh) [23:56:38] (CR) Filippo Giunchedi: [C: 1] Make it possible to install multiple custom diamond collectors that use the same source [puppet] - https://gerrit.wikimedia.org/r/215056 (owner: Ottomata) [23:57:07] thank you, i am 
going to merge that, and hopefully not break labs puppet again [23:57:22] or maybe i shoudl merge it tomorrow when i don't feel so groggy [23:57:40] I'd recommend that yes [23:57:48] haha [23:58:06] (CR) Ori.livneh: [C: -1] Make it possible to install multiple custom diamond collectors that use the same source (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215056 (owner: Ottomata) [23:58:16] :P [23:59:55] (CR) Ottomata: Make it possible to install multiple custom diamond collectors that use the same source (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215056 (owner: Ottomata) [23:59:57] (PS4) Ottomata: Make it possible to install multiple custom diamond collectors that use the same source [puppet] - https://gerrit.wikimedia.org/r/215056