[00:00:22] because you're not the first, at least 3 people have popped up reporting strange DNS issues since
[00:01:50] i am getting some packet loss
[00:02:55] ?
[00:04:24] MaxSem: thanks
[00:04:43] (CR) Ricordisamoa: "Fully rebased now!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129464 (owner: Ricordisamoa)
[00:12:21] I'm going to revert the first of the two, since I can still see (very rare) queries coming through for the bad hostnames, inexplicably.
[00:13:07] (PS1) BBlack: Revert "Kill $project-lb.$site.wikimedia.org and free IPs" [operations/dns] - https://gerrit.wikimedia.org/r/146990
[00:13:48] (CR) BBlack: [C: 2] Revert "Kill $project-lb.$site.wikimedia.org and free IPs" [operations/dns] - https://gerrit.wikimedia.org/r/146990 (owner: BBlack)
[00:15:10] AFAICS, as Faidon's original comments allude to, those specific hostnames have been out of our config for months and months and months now. There's no way anyone should have them cached.
[00:15:40] but reverting is probably best at this point since I can't find a rational explanation for the rare strangeness out there, either.
[00:16:56] if it were just some issues in labs tools or companies that take a special interest in us, I could understand that perhaps they configured things using hostnames they shouldn't have.
[00:17:16] but generic caches having this issue is pretty mysterious/dubious
[00:18:35] (unless perhaps regional ISPs have gone and put in specific hacks for us? e.g. rewriting all our domains to whatever-lb.esams.wikimedia.org because it's closest to them and they were having geolocation problems with the auto-mapping sending their users elsewhere?)
[00:31:17] bblack, if we're lucky, they didn't even hardcode our IPs :P
[00:32:07] I can actually see them looking up the hostnames. Hostnames that should have left all public usage months ago.
[00:32:22] but, yeah, I guess there's no accounting for stupid behavior on the internet.
[00:32:37] the percentage is very small, but it's there
[00:39:20] (PS5) BBlack: Kill $project-lb(.$site)?.wikimedia.org IPs [operations/dns] - https://gerrit.wikimedia.org/r/140136 (owner: Faidon Liambotis)
[00:40:55] Report of failure to load enwiki in #wikipedia; from a Toronto IP
[00:41:11] 2014-07-16 - 17:33:20 its been off and on the past hour
[00:41:31] I wonder if that relates to the post to meta
[00:41:35] Might be.
[00:41:43] https://meta.wikimedia.org/w/index.php?diff=9220354&oldid=9219953&rcid=5435971
[00:41:43] bblack: Any chance your DNS changes are still causing trouble?
[00:42:15] 19:13 < grrrit-wm> (CR) BBlack: [C: 2] Revert "Kill $project-lb.$site.wikimedia.org and free IPs" [operations/dns] - https://gerrit.wikimedia.org/r/146990 (owner: BBlack)
[00:42:24] Fair enough
[00:42:42] marktraceur: ^ that change from nearly 30 minutes ago should have fixed any remaining issue (which is still inexplicable, and still seemed to only affect a small portion of people)
[00:43:07] what I really don't get is the "off and on" part
[00:43:23] Seems unlikely to be your problem then
[00:43:47] No other deploys recently though
[00:43:53] where is the NaturalRX report coming from? can we get a person who observes (or observed) it to provide some reliable dig data?
[00:44:04] He's having connection issues
[00:44:25] I'll ask when he comes back
[00:44:38] marktraceur: it's definitely "my problem", but I believe the revert mentioned above has fixed it (unless ISPs with bad caching are also delaying the fix taking effect for some)
[00:45:25] (as opposed to being caused by some other change/deploy; I still think the DNS changes themselves are correct and something else is configured poorly out there in the world and making a mess of it)
[00:45:53] Ah.
[00:50:27] i haven
[00:50:36] 't had any issues in the last few minutes
[00:50:49] No, it all seems intermittent at worst
[00:52:48] (CR) BBlack: [C: -1] "This is mostly just to record the issue and the desired changes, pending more investigation or a better approach. We might have to leave " [operations/dns] - https://gerrit.wikimedia.org/r/140136 (owner: Faidon Liambotis)
[00:53:17] rschen7754: did you ever get a dig output while having the issue?
[00:53:28] unfortunately no :(
[00:53:32] (CR) Chmarkine: [C: 1] Make lists.wikimedia.org HTTPS only [operations/puppet] - https://gerrit.wikimedia.org/r/145616 (owner: JanZerebecki)
[01:22:28] (PS1) Kmosher: Fix kafka::server comment spacing [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147005
[01:22:30] (PS1) Kmosher: Add $java_home option to kafka::server [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147006
[01:22:32] (PS1) Kmosher: Add $num_partitions option to kafka::server [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147007
[01:22:34] (PS1) Kmosher: Use new-style variables in template [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147008
[01:27:41] (PS1) Kmosher: Small fixups [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147009
[01:27:43] (PS1) Kmosher: Add $java_home and $num_partitions parameters [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147010
[01:28:23] (Abandoned) Kmosher: Use new-style variables in template [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147008 (owner: Kmosher)
[01:28:27] (Abandoned) Kmosher: Add $num_partitions option to kafka::server [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147007 (owner: Kmosher)
[01:28:29] (Abandoned) Kmosher: Add $java_home option to kafka::server [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147006 (owner: Kmosher)
[01:28:33] (Abandoned) Kmosher: Fix kafka::server comment spacing [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/147005 (owner: Kmosher)
[02:03:09] there's a bunch of people in -en saying they can't connect to any wmf wikis
[02:13:32] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 00:12:42 UTC
[02:14:36] (PS1) Chad: Undeploy CommunityVoice extension [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147012
[02:15:02] <^demon|away> RoanKattouw: You were a usability initiative guy. Mind if I do that ^?
[02:16:00] <^demon|away> Oh, idle for 229 hours.
[02:16:03] <^demon|away> Might not answer.
[02:18:21] (PS2) Chad: Undeploy CommunityVoice/ClientSide extensions [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147012
[02:24:11] !log LocalisationUpdate completed (1.24wmf12) at 2014-07-17 02:23:08+00:00
[02:24:19] Logged the message, Master
[02:33:49] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 17 02:33:38 UTC 2014
[02:47:27] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-17 02:46:24+00:00
[02:47:32] Logged the message, Master
[03:11:01] (CR) PleaseStand: "I've been looking forward to this because it would get rid of some uses of the deprecated Xml::escapeJsString." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147012 (owner: Chad)
[03:25:15] (PS1) Ori.livneh: update puppetlabs-stdlib to 4.3.2 [operations/puppet] - https://gerrit.wikimedia.org/r/147016
[03:33:31] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 17 03:32:25 UTC 2014 (duration 32m 24s)
[03:33:37] Logged the message, Master
[03:34:49] (PS2) Ori.livneh: backport floor() func from puppetlabs-stdlib v4.3.2 [operations/puppet] - https://gerrit.wikimedia.org/r/147016
[03:40:42] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 01:40:24 UTC
[03:40:44] (CR) Ori.livneh: "Well, it's not really sensible to define the directory in three different manifests. Is there any reason you can't just include ::mediawik" [operations/puppet] - https://gerrit.wikimedia.org/r/144599 (owner: BryanDavis)
[03:48:37] (CR) BryanDavis: "Including ::mediawiki everywhere seems like it pulls in way more things than are needed. I was looking around though and found what might " [operations/puppet] - https://gerrit.wikimedia.org/r/144599 (owner: BryanDavis)
[03:49:07] ori: ^ what about that?
[04:00:18] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jul 17 04:00:11 UTC 2014
[04:31:56] Hi. I can't connect to Wikipedia for some reason.
[04:34:16] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 02:33:38 UTC [04:54:47] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet has 9 failures [05:02:55] (03PS1) 10KartikMistry: Enable wgUseInstantCommons for es/ca betawikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 [05:04:19] (03CR) 10Amire80: [C: 031] Enable wgUseInstantCommons for es/ca betawikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 (owner: 10KartikMistry) [05:18:25] (03CR) 10BryanDavis: Enable wgUseInstantCommons for es/ca betawikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 (owner: 10KartikMistry) [05:28:41] (03PS2) 10KartikMistry: Enable wgUseInstantCommons for all betawikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 [05:30:23] (03PS1) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:31:08] (03PS2) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:31:48] (03PS3) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:32:17] (03Abandoned) 10Mwalker: Start OCG from a single location / script [operations/puppet] - 10https://gerrit.wikimedia.org/r/146934 (owner: 10Mwalker) [05:33:25] (03CR) 10jenkins-bot: [V: 04-1] AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [05:34:58] (03PS4) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:35:53] (03CR) 10jenkins-bot: [V: 04-1] AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [05:36:56] (03PS5) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:37:54] (03CR) 10jenkins-bot: [V: 04-1] AppArmor Profile for 
OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [05:47:15] (03PS6) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:48:10] (03CR) 10jenkins-bot: [V: 04-1] AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [05:51:19] (03PS7) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [05:58:30] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:59:20] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.006 second response time [06:06:21] (03PS8) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [06:07:18] (03CR) 10jenkins-bot: [V: 04-1] AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [06:08:44] (03PS9) 10Mwalker: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [06:22:23] (03PS10) 10Mwalker: WIP: AppArmor Profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [06:28:55] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:24] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:44] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:44] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:54] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 17 06:32:51 UTC 2014 [06:34:34] (03PS1) 10Springle: Labsdb MariaDB 10 config starting with s5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/147029 [06:37:27] (03CR) 10Springle: [C: 032] Labsdb MariaDB 10 config starting with s5 [operations/puppet] - 10https://gerrit.wikimedia.org/r/147029 (owner: 10Springle) 
[06:38:54] PROBLEM - puppet last run on ssl3001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:43:12] (03PS1) 10Springle: Skip packages_wmf which clashes with mysql_multi_instance, until after migration is done. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147030 [06:44:37] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:45:15] (03CR) 10Springle: [C: 032] Skip packages_wmf which clashes with mysql_multi_instance, until after migration is done. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147030 (owner: 10Springle) [06:45:57] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:46:27] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:47:37] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [06:53:57] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:54:57] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [06:55:57] RECOVERY - puppet last run on ssl3001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:59:18] (03PS1) 10Springle: Duplicate some of mariadb::config for role::mariadb::labs during migration, as the conflicts with mysql_multi_instance are various. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147035 [07:00:37] (03CR) 10Springle: [C: 032] Duplicate some of mariadb::config for role::mariadb::labs during migration, as the conflicts with mysql_multi_instance are various. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147035 (owner: 10Springle) [07:01:00] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[07:11:00] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [07:12:50] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [07:27:41] !log mariadb 10 on labsdb1002:3309 cloning s5 from sanitarium db1054:3308 [07:27:46] Logged the message, Master [07:51:25] (03PS14) 10Giuseppe Lavagetto: mediawiki: manage single configs via apache::site (one server)) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146082 [07:53:03] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: manage single configs via apache::site (one server)) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146082 (owner: 10Giuseppe Lavagetto) [07:53:14] <_joe_> hey ho, let's go [07:55:19] (03PS1) 10Giuseppe Lavagetto: Revert "mediawiki: manage single configs via apache::site (one server))" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147046 [07:55:30] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Revert "mediawiki: manage single configs via apache::site (one server))" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147046 (owner: 10Giuseppe Lavagetto) [07:56:05] <_joe_> ehm, circular dependency [07:56:18] <_joe_> those things, the compiler does not catch them [07:56:33] (03PS1) 10Giuseppe Lavagetto: Revert "Revert "mediawiki: manage single configs via apache::site (one server))"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147049 [08:04:24] (03PS2) 10Giuseppe Lavagetto: mediawiki: manage sites with apache::sites [operations/puppet] - 10https://gerrit.wikimedia.org/r/147049 [08:07:56] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: manage sites with apache::sites [operations/puppet] - 10https://gerrit.wikimedia.org/r/147049 (owner: 10Giuseppe Lavagetto) [09:00:50] (03PS1) 10Giuseppe Lavagetto: mediawiki: use sites-available everywhere. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/147066 [09:08:35] (03PS1) 10Giuseppe Lavagetto: jobrunner: use a more standard config file location [operations/puppet] - 10https://gerrit.wikimedia.org/r/147067 [09:12:05] (03PS1) 10Aaron Schulz: Moved more runners to the new job loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147068 [09:16:47] <_joe_> AaronSchulz: hey :) [09:16:57] <_joe_> didn't think you'll be around at this time [09:36:11] (03CR) 10Filippo Giunchedi: [C: 031] Moved more runners to the new job loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147068 (owner: 10Aaron Schulz) [09:37:06] (03CR) 10Alexandros Kosiaris: [C: 032] gitblit: fully qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/145894 (owner: 10Matanya) [10:03:01] (03CR) 10Alexandros Kosiaris: [C: 032] ferm: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/145250 (owner: 10Matanya) [10:07:08] _joe_: I lost track of what happened with https://gerrit.wikimedia.org/r/#/c/146082, it is applied and it will be eventually used everywhere? 
[10:20:21] (PS3) Matanya: mailman: monitor queue size [operations/puppet] - https://gerrit.wikimedia.org/r/146756
[10:25:32] (PS1) Matanya: mysql_multi_instance: qualify port [operations/puppet] - https://gerrit.wikimedia.org/r/147076
[10:25:53] <_joe_> godog: it is applied
[10:26:12] <_joe_> godog: it had a circular dependency, so I reverted it to avoid icinga spam
[10:26:25] <_joe_> while I fixed that
[10:26:40] <_joe_> so, mw1017 (the staging host) is running with the new apache config
[10:26:46] <_joe_> and it runs fine I'd say
[10:27:35] (PS2) Matanya: mysql_multi_instance: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/147076
[10:29:07] _joe_: ok so only 1017, btw I think I'm still -1 on letting puppet reload apache "uncontrolled" but would like to hear more opinions
[10:30:16] <_joe_> godog: it's not 'uncontrolled'
[10:30:34] <_joe_> also, if the syntax checks do not pass, we're not reloading
[10:31:07] <_joe_> so it's just doing what we did before with apache-graceful-all after apache config changes
[10:31:21] <_joe_> but- we can override this by changing the apache classes a bit
[10:32:13] it is uncontrolled in the sense that it is hard to stop once it has started
[10:32:26] <_joe_> ?
[10:32:36] <_joe_> once puppet is running?
[10:32:39] <_joe_> eh.
[10:33:10] once it has been merged and puppet starts running, yes, like the apache2 upgrade the other day
[10:33:49] it might be fine though, I don't know how often we change the config for example
[10:33:56] <_joe_> well, this is a general problem using config management systems that run unattended
[10:34:00] <_joe_> godog: rarely
[10:34:11] <_joe_> godog: and we can think of ways to have staging areas
[10:34:16] <_joe_> or 'service tiers'
[10:34:49] if it doesn't change often then it might be alright
[10:35:03] <_joe_> godog: see the commit history of apache-config
[10:35:14] <_joe_> it's once per month, more or less
[10:35:20] <_joe_> and it's usually adding redirects
[10:36:09] <_joe_> now we'll be editing it _a_lot_ as we're moving to HHVM, but that will be a one-off
[10:36:31] <_joe_> I'll write to ops@ for consensus on this anyway
[10:49:38] (CR) Alexandros Kosiaris: [C: -1] "The approach is sound, minor comments inline" (22 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/146826 (owner: Filippo Giunchedi)
[10:50:19] (PS3) Matanya: mysql_multi_instance: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/147076
[10:53:07] (CR) Filippo Giunchedi: "> This can be hooked up with dput. The problem I see though is that it can lead to concurrent > > runs, which a cron job would avoid. Not " [operations/puppet] - https://gerrit.wikimedia.org/r/146826 (owner: Filippo Giunchedi)
[10:54:20] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 08:53:45 UTC
[10:54:45] (CR) Matanya: "compiled using puppet compiler:" [operations/puppet] - https://gerrit.wikimedia.org/r/147076 (owner: Matanya)
[10:55:51] (CR) Alexandros Kosiaris: [C: 2] "LGTM. Btw, that 10k line diff would have caused a minor regression on this change" [operations/puppet] - https://gerrit.wikimedia.org/r/147016 (owner: Ori.livneh)
[10:56:10] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:57:01] PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:57:01] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:58:00] seems like hosts can not apt-get update ?
[10:59:02] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:59:50] PROBLEM - puppet last run on cp1037 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:59:50] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:00:01] PROBLEM - puppet last run on mw1152 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:00:01] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:00:10] PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:00:10] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[11:00:15] change from notrun to 0 100 failed: Command exceeded timeout
[11:00:26] something transient ?
[11:00:42] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:01:11] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:01:31] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:02:18] <_joe_> no sure
[11:02:23] <_joe_> not sure
[11:02:39] <_joe_> akosiaris: some remote apt source timing out?
[11:02:45] looks like it
[11:02:50] but not anymore
[11:02:55] at least on a couple of these hosts
[11:03:59] <_joe_> ok well, better this way
[11:06:17] does the check alarm on the first failure right now I think? it might make sense to notify if it fails twice in a row perhaps
[11:06:28] <_joe_> godog: well, it depends
[11:06:47] <_joe_> if we did like you said, the other night we would have gone down probably
[11:07:39] it is 3 failures
[11:07:41] but!
[11:08:00] icinga-wm is not bound to the icinga notification mechanisms
[11:08:17] it just reads the icinga status file and does what it thinks best
[11:09:02] so 3 checks before we get a page, but 1 check for icinga-wm to complain IIRC
[11:09:18] <_joe_> akosiaris: oh ok
[11:10:13] _joe_: you mean the mpm upgrade? surely we want another alarm if apache somehow starts failing
[11:10:42] <_joe_> godog: yes we did in fact
[11:10:56] <_joe_> that was nasty btw
[11:12:02] RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[11:12:06] I stand corrected. Icinga populates a specific log file for icinga-wm and ircecho reads that
[11:12:12] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[11:12:19] so it is bound to icinga notification mechanisms
[11:12:45] it is that other jabber/xmpp bot that reads the icinga.status file
[11:12:51] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[11:13:01] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[11:13:51] RECOVERY - puppet last run on cp1037 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[11:14:02] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[11:14:11] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[11:14:12] <_joe_> oh god
[11:14:31] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[11:14:38] <_joe_> how many bots do we have?
[11:14:41] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[11:14:41] <_joe_> :P
[11:15:00] we got bots that log to bots
[11:15:11] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[11:15:11] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[11:15:15] it is a step before skynet :P
[11:17:19] i need help figuring out puppet weirdness
[11:17:46] it is bots all the way down
[11:19:28] in modules/mysql_multi_instance/templates/my.conf.cnf.erb line 2 the var settings is called from modules/mysql_multi_instance/manifests/config.pp but when changing settings to @setting, puppet complains.
[11:19:44] what am i missing?
[11:21:32] for starters variables are not called
[11:21:40] they are referenced or passed
[11:21:45] functions are called
[11:21:58] now, let's see
[11:22:03] tahnls
[11:22:20] *thanks for the correction
[11:23:25] np, what does puppet complain about ?
[11:23:29] got a gerrit change ?
[11:24:04] e.g http://puppet-compiler.wmflabs.org/162/change/147076/html/db1054.eqiad.wmnet.html akosiaris
[11:24:44] patch here https://gerrit.wikimedia.org/r/#/c/147076/2/modules/mysql_multi_instance/templates/my.conf.cnf.erb akosiaris
[11:25:10] dammit
[11:25:14] i found the issue
[11:25:18] (PS1) Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners (WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/147086
[11:25:20] sorry for the noise
[11:26:06] matanya: :-)
[11:26:42] (PS4) Matanya: mysql_multi_instance: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/147076
[11:28:15] lunch, bbl
[11:29:01] <_joe_> I'll follow alex
[11:29:28] <_joe_> I'll probably be back quite late, and be here longer
[11:31:06] apergos: Could you have a look at https://gerrit.wikimedia.org/r/146470 please?
[11:31:18] I've really kept it as simple as possible
[11:33:41] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 17 11:33:40 UTC 2014
[11:39:44] yep sorry, was working on something else
[11:40:42] Not an issue, it's not super urgent. It would just be nice to get this in sometime :)
[11:41:40] hoo: can you add a system role for this cron job? the reason is that it won't live on all snapshots, just on one
[11:41:57] so whichever one has it enabled will have the system role show up in the motd, it's handy
[11:42:18] that would go in https://gerrit.wikimedia.org/r/#/c/146470/1/modules/snapshot/manifests/wikidatajsondumps.pp I guess
[11:43:59] how long does the job take to run these days?
[11:46:48] the targetDir is public/other/wikidata right? if I read your script right it will write into public/wikidata instead
[11:47:04] apergos: I haven't run it manually since the hackathon
[11:47:17] how long did it take then?
[11:48:19] I think they ran in something less than 8h
[11:48:29] but I can't really remember
[11:49:08] hm ok we'll need to keep an eye on it when it runs as it will overlap a bunch of other jobs
[11:49:35] Ok, it mostly just "hammers" the es storage
[11:49:39] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:49:52] so this shouldn't run during an es storage peak load, if that's possible
[11:49:59] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:49:59] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:49:59] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:50:29] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:50:29] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:50:29] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:50:29] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL:
Puppet has 1 failures [11:50:29] PROBLEM - puppet last run on elastic1012 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:29] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:30] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:39] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:39] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:39] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:40] it should (the php code) respect db lag, then you should be ok, I assume the stores use that param just like anything else [11:50:49] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:50] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:59] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:59] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:59] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [11:50:59] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: 
Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:09] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:19] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:29] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:29] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:29] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:30] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:30] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:39] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:39] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:39] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:39] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:40] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:40] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:49] PROBLEM - puppet last run on 
analytics1010 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:49] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:50] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:50] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:50] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:53] apergos: We're not waiting for slaves there [11:51:58] frankly [11:51:59] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:59] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:59] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:59] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:59] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [11:51:59] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:09] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:09] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:09] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:09] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:17] I thought that wouldn't really have an advantage on read-only operations [11:52:19] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:29] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:29] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:29] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:39] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:49] PROBLEM - 
puppet last run on db1051 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:49] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:49] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:49] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:49] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:50] PROBLEM - puppet last run on search1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:50] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:50] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:50] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:51] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:52] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:59] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:59] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:59] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:59] PROBLEM - puppet last run on es1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:52:59] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:00] every script should have that built in [11:53:09] PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:09] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:09] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:09] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:19] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:21] (CR) 
Matthias Mullie: [C: 1] "LGTM" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/146849 (owner: Bsitu) [11:53:29] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:29] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:29] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:29] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:29] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:29] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:30] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:39] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:39] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:39] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:40] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:40] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:40] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:40] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:40] PROBLEM - puppet last run on labstore1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:49] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] 
PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:50] PROBLEM - puppet last run on analytics1013 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:51] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:52] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:52] PROBLEM - puppet last run on ssl1005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:52] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:53] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:59] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:59] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:59] PROBLEM - puppet last run on analytics1022 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:59] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:53:59] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:05] apergos: Does that even wait for the external storage? 
[11:54:09] I doubt it [11:54:09] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:09] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:09] PROBLEM - puppet last run on search1005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:10] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:19] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:19] PROBLEM - puppet last run on mw1014 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:29] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:29] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:29] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:35] and the heat on the other DBs is negligible [11:54:39] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:39] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:40] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:40] PROBLEM - puppet last run on mw1051 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:40] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:49] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:50] PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:50] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:50] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:54:50] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:09] PROBLEM - puppet last run on mw1180 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:21] I need to check how the maxlag parameter 
works again [11:55:29] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:29] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:29] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:29] PROBLEM - puppet last run on virt1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:30] PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:39] PROBLEM - puppet last run on mc1014 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:39] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:39] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:39] PROBLEM - puppet last run on rubidium is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:40] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:40] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:40] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:40] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:49] PROBLEM - puppet last run on analytics1032 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:49] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:49] PROBLEM - puppet last run on analytics1023 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:49] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:49] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:50] PROBLEM - puppet last run on mw1050 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:50] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:50] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:50] PROBLEM - puppet 
last run on ssl1009 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:51] PROBLEM - puppet last run on search1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:51] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:52] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:52] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:53] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:54] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:54] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: Puppet has 1 failures [11:55:55] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:00] PROBLEM - puppet last run on db1004 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:00] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:00] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:00] PROBLEM - puppet last run on mw1079 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:00] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:09] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:09] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:09] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:09] PROBLEM - puppet last run on db69 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:09] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:19] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:19] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:27] $lb = wfGetLB(); [11:56:29] PROBLEM - puppet last run on mw1183 
is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:29] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:29] PROBLEM - puppet last run on ssl3002 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:30] $lb->getMaxLag(); [11:56:38] that's what we do in some of our maint. scripts [11:56:39] PROBLEM - puppet last run on wtp1023 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:46] (in all that write data to the DB actually) [11:56:50] PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:19] I'll come back to that later when I've looked at the code again [11:57:27] can you double-check the targetDir please? [11:57:29] PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:29] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:29] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:49] PROBLEM - puppet last run on search1024 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:59] apergos: I can, yes [11:58:24] I guess I'll walk through these variables step by step, although I copied most of them from the centralauth dump thing [11:58:26] * hoo is lazy [11:58:29] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures [11:59:17] yes please (it's fine for you to be lazy as long as it doesn't mean more work for me :-P) [11:59:33] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Puppet has 1 failures [11:59:34] :D :) [11:59:35] did you actually want [ ] at the beginning and end of the concatenated gzip content? [12:00:43] apergos: Yeah, that way it becomes valid JSON [12:01:04] okey dokey.. no commas or anything needed? 
well that's your business anyways [12:01:15] oh, I think I forgot something there [12:01:17] :P [12:02:51] We changed the PHP script to be less smart about content at some point, but I didn't update the bash script [12:03:54] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [12:04:03] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [12:04:23] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [12:04:23] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [12:04:43] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [12:04:43] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [12:04:43] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:04:53] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [12:04:54] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:05:03] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:05:13] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [12:05:23] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [12:05:23] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [12:05:23] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 33 seconds ago 
with 0 failures [12:05:23] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:05:23] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [12:05:33] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:05:34] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [12:05:43] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [12:05:43] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [12:05:43] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [12:05:43] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [12:05:44] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:05:44] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [12:05:53] RECOVERY - puppet last run on 
search1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:05:54] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:05:55] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [12:05:55] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [12:05:56] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [12:06:03] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [12:06:03] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [12:06:03] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [12:06:03] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [12:06:04] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [12:06:13] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 60 seconds ago 
with 0 failures [12:06:13] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [12:06:13] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:06:13] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [12:06:23] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [12:06:23] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [12:06:33] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:06:33] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:06:33] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:06:33] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [12:06:43] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:06:43] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [12:06:44] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:06:44] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [12:06:44] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [12:06:44] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [12:06:53] RECOVERY - puppet last run on mw1042 is OK: OK: 
Puppet is currently enabled, last run 55 seconds ago with 0 failures [12:06:53] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [12:06:54] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:06:54] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [12:06:54] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:06:54] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:07:03] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [12:07:03] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:07:23] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:07:23] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [12:07:23] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:07:33] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [12:07:33] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [12:07:33] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:07:33] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:07:43] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures 
[12:07:43] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [12:07:43] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [12:07:44] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:07:44] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [12:07:44] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [12:07:44] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:07:44] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [12:07:53] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [12:07:54] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [12:07:54] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [12:07:54] RECOVERY - puppet last run on 
cp3010 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:07:55] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:07:56] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [12:07:56] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:08:03] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [12:08:03] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [12:08:03] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [12:08:03] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [12:08:03] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:08:04] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:08:04] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:08:04] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [12:08:05] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [12:08:05] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:08:05] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:08:13] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago 
with 0 failures [12:08:13] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:08:13] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:08:13] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:08:13] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [12:08:23] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [12:08:23] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [12:08:33] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:08:43] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:08:43] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [12:08:44] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:08:44] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:08:53] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [12:09:23] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:09:23] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [12:09:33] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [12:09:33] RECOVERY - puppet last run on rhenium 
is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:09:33] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:09:33] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:09:43] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [12:09:43] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:09:43] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:09:43] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:09:44] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [12:09:44] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [12:09:44] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 48 
seconds ago with 0 failures [12:09:53] RECOVERY - puppet last run on ssl1005 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [12:09:54] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [12:09:55] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:09:55] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [12:09:56] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [12:09:56] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [12:09:56] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [12:10:03] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:10:03] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:10:03] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [12:10:03] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [12:10:13] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [12:10:23] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [12:10:23] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [12:10:23] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:10:33] RECOVERY - puppet last run 
on tungsten is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [12:10:33] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:10:33] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [12:10:33] RECOVERY - puppet last run on ssl3002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [12:10:33] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:10:43] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:10:43] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:10:43] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:10:43] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [12:10:44] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [12:10:44] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:10:44] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:10:44] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:10:44] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [12:10:45] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [12:10:46] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 
56 seconds ago with 0 failures [12:10:46] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on ssl1009 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [12:10:53] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:10:54] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [12:10:54] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [12:10:55] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [12:10:55] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [12:10:56] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [12:10:56] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:10:56] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:11:03] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:11:03] RECOVERY - puppet last run 
on db69 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:11:03] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:11:33] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:11:33] RECOVERY - puppet last run on virt1007 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [12:11:34] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [12:11:43] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [12:11:54] RECOVERY - puppet last run on search1024 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [12:12:03] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [12:12:23] RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [12:12:23] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:12:33] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:22:16] apergos: Should it only set the system::role if enabled? 
[12:32:27] hoo: I use ensure => $ensure, to set it when it's needed [12:32:40] Yeah, got that [12:35:14] (03PS1) 10BBlack: Revert 79d934f8 CNAME changes, but leave in the new text-lb RR [operations/dns] - 10https://gerrit.wikimedia.org/r/147095 [12:38:19] (03CR) 10Mark Bergsma: [C: 031] "As no negative caches should exist for wikipedia-lb.wikimedia.org, wikimedia-lb.wikimedia.org etc, this should be a workaround for some of" [operations/dns] - 10https://gerrit.wikimedia.org/r/147095 (owner: 10BBlack) [12:40:59] (03CR) 10BBlack: [C: 032] Revert 79d934f8 CNAME changes, but leave in the new text-lb RR [operations/dns] - 10https://gerrit.wikimedia.org/r/147095 (owner: 10BBlack) [12:57:33] (03PS2) 10Hoo man: Introduce snapshot::wikidatajsondump [operations/puppet] - 10https://gerrit.wikimedia.org/r/146470 [12:58:12] (03CR) 10Hoo man: "Added system::role, also fixed the script." [operations/puppet] - 10https://gerrit.wikimedia.org/r/146470 (owner: 10Hoo man) [12:58:53] apergos: I've now tested the script on snapshot1003 with testwikidata (and manipulated paths so that stuff goes into my home) [12:59:11] it works well now and produces valid JSON (php is able to parse it) [13:05:12] ok great [13:05:40] I'll have another look at it this evening [13:06:50] apergos: Awesome :) [13:09:14] (03PS3) 10Hoo man: Introduce snapshot::wikidatajsondump [operations/puppet] - 10https://gerrit.wikimedia.org/r/146470 [13:09:59] (03CR) 10Hoo man: "No need for the comment to be there twice :P" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146470 (owner: 10Hoo man) [13:46:39] (03PS1) 10Vogone: Added namespace aliases for thwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147102 (https://bugzilla.wikimedia.org/68108) [13:54:37] (03CR) 10Matanya: "ran under puppet compiler:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147076 (owner: 10Matanya) [13:58:16] (03CR) 10Nullzero: [C: 031] Added namespace aliases for thwiki [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/147102 (https://bugzilla.wikimedia.org/68108) (owner: 10Vogone) [14:14:56] (03PS1) 10Ottomata: Ensure hadoop directories exist in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/147109 [14:26:30] (03PS1) 10Chmarkine: update SSL ciphers for Ganglia to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147110 (https://bugzilla.wikimedia.org/53259) [14:32:42] akosiaris: are revision bumps supposed to go on the debian branch? [14:32:59] i thought we just used debian branch for actual debian/ changes, and version based branches for changelog [14:34:00] (03CR) 10Ottomata: [C: 032 V: 032] "Thanks!" [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/147009 (owner: 10Kmosher) [14:35:21] (03CR) 10Ottomata: [C: 032 V: 032] "LGTM, thank you!" [operations/puppet/kafka] - 10https://gerrit.wikimedia.org/r/147010 (owner: 10Kmosher) [14:36:02] debian branch is pretty much for everything under debian/ [14:36:12] that includes debian/changelog ottomata [14:36:35] so yes, revision bumps on debian branch too [14:36:48] (03PS2) 10Giuseppe Lavagetto: jobrunner: use a more standard config file location [operations/puppet] - 10https://gerrit.wikimedia.org/r/147067 [14:37:20] in fact, having a new version should be as easy as: git checkout -b debian-0.X.Y 0.X.Y ; git merge debian ; git-buildpackage -uc -us [14:37:43] well some editing of patches and debian/gbp.conf is needed somewhere along those steps but you get the idea [14:38:48] (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner: use a more standard config file location [operations/puppet] - 10https://gerrit.wikimedia.org/r/147067 (owner: 10Giuseppe Lavagetto) [14:39:39] PROBLEM - puppet last run on fenari is CRITICAL: CRITICAL: Puppet has 1 failures [14:42:34] (03CR) 10BryanDavis: [C: 031] Enable wgUseInstantCommons for all betawikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 (owner: 10KartikMistry) [14:43:29] hmm [14:45:31] jouncebot: 
refresh [14:45:33] I refreshed my knowledge about deployments. [14:45:48] (03PS2) 10Giuseppe Lavagetto: Moved more runners to the new job loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147068 (owner: 10Aaron Schulz) [14:45:57] (03CR) 10Giuseppe Lavagetto: [C: 032] Moved more runners to the new job loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147068 (owner: 10Aaron Schulz) [14:46:08] hmm, ok akosiaris, sorry about that, i guess that makes sense [14:46:20] (03CR) 10Ottomata: [C: 032 V: 032] 0.8.0-1 debianization release [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/146918 (owner: 10Alexandros Kosiaris) [14:46:30] (03CR) 10Ottomata: [C: 032 V: 032] Upping revision to 0.8.0-2 with support for setting ulimit open files [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/146919 (owner: 10Alexandros Kosiaris) [14:46:49] (03CR) 10Giuseppe Lavagetto: [V: 032] Moved more runners to the new job loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147068 (owner: 10Aaron Schulz) [14:47:02] sigh... fenari, fenari, fenari [14:47:09] ottomata: its ok [14:47:25] we need to kill fenari ASAP... [14:47:25] <_joe_> akosiaris: I've seen packages not being installed [14:47:32] yes [14:47:34] <_joe_> akosiaris: what has happened there? 
[14:47:36] missing from the repos [14:47:48] Err http://security.ubuntu.com/ubuntu/ precise-security/main mysql-client-5.5 amd64 5.5.38-0ubuntu0.12.04.1 [14:47:48] 404 Not Found [IP: 2620:0:861:1:208:80:154:10 8080] [14:48:24] <_joe_> btw, we've almost completely killed job-loop.sh [14:48:39] <_joe_> on the jobrunners at least [14:49:39] nice [14:49:59] <_joe_> next step is, include ::mediawiki::jobrunner::hhvm [14:50:15] hmm seems like security.ubuntu.com is to blame [14:50:25] publishing a deb that does not exist [14:50:41] <_joe_> akosiaris: they had issues this morning as well [14:50:46] (03PS2) 10Filippo Giunchedi: releases: add reprepro repository [operations/puppet] - 10https://gerrit.wikimedia.org/r/146826 [14:50:48] (03CR) 10Filippo Giunchedi: "thanks for the feedback Alex!" (0322 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146826 (owner: 10Filippo Giunchedi) [14:51:15] ok, let's wait it out then [14:59:19] gi11es: around to support you SWAT deploy? [14:59:33] manybubbles_: yep [14:59:52] * bd808 waves for his prod no-op merge [15:00:05] manybubbles, anomie, gi11es: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140717T1500) [15:00:55] bd808: sure sure sure [15:01:03] jouncebot: be polite [15:02:34] matanya: I actually like the message because it's easy to set a mention watch on without false positiives. [15:03:11] !log manybubbles Synchronized wmf-config/CommonSettings-labs.php: SWAT deploy to keep us synced, but this is a noop in prod. only anything in beta. (duration: 00m 05s) [15:03:16] Logged the message, Master [15:03:18] bd808: you are done [15:03:29] thanks manybubbles_ [15:03:47] bd808: yeah, well, he can say please. 
[15:04:28] matanya: put a pull request in at https://github.com/mattofak/jouncebot :) [15:04:38] i will [15:05:37] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Puppet has 1 failures [15:05:56] !log manybubbles Synchronized php-1.24wmf13/extensions/MultimediaViewer/: SWAT - Moving repo icon back to the right-hand side in Media Viewer (duration: 00m 05s) [15:06:00] Logged the message, Master [15:06:20] gi11es: that is you [15:07:06] * manybubbles_ all done with SWAT for the day [15:07:59] manybubbles_: everything looks fine on commons, thanks [15:12:53] (03CR) 10BryanDavis: "Tested at http://es.wikipedia.beta.wmflabs.org/wiki/Archivo:NOFX2.jpg" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147022 (owner: 10KartikMistry) [15:17:37] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:27:58] (03CR) 10Alexandros Kosiaris: releases: add reprepro repository (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146826 (owner: 10Filippo Giunchedi) [15:33:30] (03PS1) 10Andrew Bogott: Don't try to stop dnsmasq if we've already removed the init script [operations/puppet] - 10https://gerrit.wikimedia.org/r/147121 [15:34:54] (03PS2) 10Chmarkine: update SSL ciphers for Ganglia to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147110 (https://bugzilla.wikimedia.org/53259) [15:34:56] (03PS1) 10Chmarkine: update SSL ciphers for noc.wikimedia.org to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147123 (https://bugzilla.wikimedia.org/53259) [15:35:06] (03CR) 10Andrew Bogott: [C: 032] Don't try to stop dnsmasq if we've already removed the init script [operations/puppet] - 10https://gerrit.wikimedia.org/r/147121 (owner: 10Andrew Bogott) [15:35:59] (03PS1) 10Filippo Giunchedi: swift-account-stats: fetch containers when needed [operations/puppet] - 10https://gerrit.wikimedia.org/r/147124 [15:36:10] ACKNOWLEDGEMENT - HTTP on 
francium is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 734 bytes in 0.034 second response time rhalsell its not live [15:36:10] ACKNOWLEDGEMENT - Memcached on francium is CRITICAL: Connection refused rhalsell its not live [15:36:45] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift-account-stats: fetch containers when needed [operations/puppet] - 10https://gerrit.wikimedia.org/r/147124 (owner: 10Filippo Giunchedi) [15:40:05] (03PS1) 10Andrew Bogott: Revert "Don't try to stop dnsmasq if we've already removed the init script" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147126 [15:40:08] (03PS1) 10Andrew Bogott: Revert "Remove /etc/init.d/dnsmasq on labs network nodes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/147127 [15:41:42] (03CR) 10Andrew Bogott: [C: 032] Revert "Don't try to stop dnsmasq if we've already removed the init script" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147126 (owner: 10Andrew Bogott) [15:41:54] (03CR) 10Andrew Bogott: [C: 032] Revert "Remove /etc/init.d/dnsmasq on labs network nodes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/147127 (owner: 10Andrew Bogott) [15:42:03] (03PS2) 10Andrew Bogott: Revert "Don't try to stop dnsmasq if we've already removed the init script" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147126 [15:42:10] (03PS2) 10Andrew Bogott: Revert "Remove /etc/init.d/dnsmasq on labs network nodes." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/147127 [15:45:44] (03PS1) 10Hoo man: Set Wikibase client's allowArbitraryDataAccess to false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147129 [15:46:49] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:47:09] (03CR) 10Aude: [C: 031] Set Wikibase client's allowArbitraryDataAccess to false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147129 (owner: 10Hoo man) [15:47:21] (03CR) 10Chmarkine: "This patch does not depend on Ifacd5e4a3a3fdb5b832afec947c2c213797429d9. That's a mistake." [operations/puppet] - 10https://gerrit.wikimedia.org/r/147123 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [15:53:31] (03CR) 10Dzahn: [C: 032] update SSL ciphers for Ganglia to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147110 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [15:54:04] (03CR) 10Dzahn: [V: 032] update SSL ciphers for Ganglia to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147110 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [15:55:36] (03PS1) 10Dzahn: retab ganglia apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147131 [15:57:00] (03PS2) 10Dzahn: retab ganglia apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147131 [15:58:33] (03CR) 10Chmarkine: [C: 031] retab ganglia apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147131 (owner: 10Dzahn) [15:59:01] (03PS1) 10Dzahn: ganglia apache template: qualify vars [operations/puppet] - 10https://gerrit.wikimedia.org/r/147132 [15:59:22] (03CR) 10Dzahn: [C: 032] retab ganglia apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147131 (owner: 10Dzahn) [16:02:58] (03PS1) 10Ori.livneh: mediawiki: use native min/floor funcs rather than inline_template [operations/puppet] - 
https://gerrit.wikimedia.org/r/147135 [16:07:05] (PS1) Ori.livneh: mediawiki: remove inline_template in favor of min/floor [operations/puppet] - https://gerrit.wikimedia.org/r/147136 [16:12:32] <_joe_> ori: \o/ [16:15:51] akosiaris: do you have the 0.8.1.1 debian commit up in gerrit yet? [16:16:03] i have a change for the kafka shell script to match 0.8.1.1 changes to bin/*.sh scripts [16:16:40] * RD pokes Nemo_bis - otrs-wiki (private) needs local uploads re-enabled but they're down now for some reason...any idea? [16:16:46] ottomata: https://gerrit.wikimedia.org/r/#/c/146920/ [16:17:51] (CR) Dzahn: [C: 2] ganglia apache template: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/147132 (owner: Dzahn) [16:18:04] RD: did you try putting something in MediaWiki:Licenses? [16:18:24] Nemo_bis: No [16:18:28] Why? [16:19:00] Because it's now required (not my fault) [16:19:39] https://gerrit.wikimedia.org/r/#/c/136520/ [16:21:06] thanks Nemo_bis - I'll correct this soon/now. What happens then? [16:21:12] which license is it going to be? [16:21:13] It will work automatically? [16:21:28] I have no idea. It's a very weird wiki. [16:21:45] I'll likely ask LCA what we should do ;-) [16:22:01] as long as it doesn't just lead to content "foo" on that page, heh :) [16:22:16] ;-) I actually just said that a minute ago [16:22:24] hehe [16:22:26] I was kidding [16:22:26] 07/17/14 [12:20:52] Oh, so can I just add some foobar to the page? [16:22:26] "foo" is ok as content [16:22:42] hah, and i had _not_ read that [16:23:02] don't use WTFPL :) [16:23:46] Anyways, what is the next step? [16:24:00] akosiaris: why debian-0.8.1 instead of debian-0.8.1.1? [16:24:00] After I get that straightened out [16:24:20] i guess we plan on just merging minor-minor upstream version changes onto that branch? [16:24:30] e.g. 0.8.1.2? 
[16:26:29] (PS2) Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners [operations/puppet] - https://gerrit.wikimedia.org/r/147086 [16:31:06] (CR) Dzahn: "why are we still getting a B from Qualis now? ehm..." [operations/puppet] - https://gerrit.wikimedia.org/r/147132 (owner: Dzahn) [16:31:45] (CR) Dzahn: "eh..why are we still getting a B from Qualis?" [operations/puppet] - https://gerrit.wikimedia.org/r/147110 (https://bugzilla.wikimedia.org/53259) (owner: Chmarkine) [16:32:24] (PS1) Giuseppe Lavagetto: add alternatives installation for /usr/bin/php [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/147141 [16:32:53] ottomata: the name of the branch ? [16:33:03] it is unimportant... it can be anything you want [16:33:22] but you are correct, I should have been more consistent [16:33:45] it is however there just to make git-buildpackage happy [16:34:13] in fact, I think a README should be added in the debian branch on what is the best way to build the package [16:34:41] <_joe_> debuild --pray [16:34:49] ahahaha [16:34:54] if only that would work [16:35:04] <_joe_> works in Rome [16:35:11] <_joe_> true story [16:35:41] the soiled must be consecrated or something ? [16:35:52] s/soiled/soil [16:37:17] akosiaris: i'll do a readme, including how we want to manage branches [16:37:23] <_joe_> no, the nearer you are to the holy see, the better it works [16:37:37] i'm ok with keeping the branch at 0.8.1, but that means that we'd have to merge in future minor minor tags, right? [16:37:40] if that's normal, that's fine [16:37:48] otherwise we can make a debian-0.8.1.1 branch [16:37:52] which do you think is better? [16:38:36] (CR) BryanDavis: jobrunner: create hhvm-only jobrunners (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/147086 (owner: Giuseppe Lavagetto) [16:38:41] ottomata: so, the branch is essentially temporary. 
No need to really keep it around (which is why I said it is unimportant) [16:38:58] it is only used by git-buildpackage and more or less nothing else [16:39:25] let's just point out that it will be debian-THE_TAG_TO_BUILD_AGAINST [16:39:54] and it should be recreated on every new version we build against and we should be ok [16:40:12] (CR) Dzahn: [C: 2] gerrit - remove DHE ciphers [operations/puppet] - https://gerrit.wikimedia.org/r/146464 (owner: Dzahn) [16:40:13] I am already on the README pointing this out [16:40:44] ok cool [16:40:47] sounds good akosiaris [16:41:42] akosiaris: I'd be interested in that too (what we want to standardize on for operations/debs) [16:42:28] i think what we have going in kafka is pretty good, but only when upstream maintains a git repository and tags [16:42:37] the problem is, that that is not often the case [16:42:38] but we do: [16:42:53] upstream's usual branches and tags, whatever they are (master, 0.8.1.1, etc.) [16:43:04] godog: ok. I suppose it will have some if/then/cases depending on how the upstream provides sources, as ottomata has already pointed out [16:43:12] branch: debian - contains debian stuff only, no upstream code [16:43:40] branch: debian-X.Y.Z - branched off of debian branch, then merge in upstream tag X.Y.Z [16:44:10] oh and, i guess: actual debian/ changes should only be made on the 'debian' branch [16:44:17] including gbp.conf and changelog updates [16:44:24] before you create your debian-X.Y.Z branch [16:44:31] (I learned this today, I like it.) 
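[note] The branch layout summarized above (a `debian` branch holding only packaging, plus a throwaway `debian-X.Y.Z` branch that merges the upstream tag) can be sketched as a command sequence. This is only an illustration, not the README mentioned in the channel: the helper name `build_commands` is hypothetical, the `-uc -us` flags are taken from the one-liner quoted earlier in the log, and the sketch assumes `debian/gbp.conf` has already been updated on the `debian` branch to name the versioned branch. `git checkout -B` is used because the versioned branch is meant to be recreated for every build.

```python
import re


def build_commands(upstream_tag, build_flags=("-uc", "-us")):
    """Return the commands for building a package against one upstream tag.

    Hypothetical helper: it only builds the command list and runs nothing.
    """
    # Accept dotted numeric tags such as 0.8.1.1; reject anything else.
    if not re.fullmatch(r"\d+(\.\d+)*", upstream_tag):
        raise ValueError(f"unexpected upstream tag: {upstream_tag!r}")
    branch = f"debian-{upstream_tag}"
    return [
        # Packaging changes (changelog, gbp.conf) live on 'debian' only.
        ["git", "checkout", "debian"],
        # Create the temporary build branch, resetting it if it exists.
        ["git", "checkout", "-B", branch],
        # Bring in the upstream sources for this version.
        ["git", "merge", upstream_tag],
        # Build without signing the source package or the changes file.
        ["git-buildpackage", *build_flags],
    ]


if __name__ == "__main__":
    for cmd in build_commands("0.8.1.1"):
        print(" ".join(cmd))
```

Returning the commands instead of executing them keeps the sketch safe to run anywhere; feeding each list to `subprocess.run(cmd, check=True)` inside a real checkout would be the obvious next step.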
[16:44:42] btw ottomata, now that kafka finally releases .tar.gz and we don't have to maintain a prerelease version package, we could move back to using their .tar.gz and not git [16:44:45] that way you only need to create the versioned debian branch and merge in upstream, and you should be good to go [16:44:55] akosiaris: i like git-buildpackage [16:44:57] does somebody understand why this: http://puppet-compiler.wmflabs.org/139/change/145266/html/nickel.wikimedia.org.html would be caused by this: https://gerrit.wikimedia.org/r/#/c/145266/1/modules/ganglia_new/templates/gmond.conf.erb [16:45:08] haha, but only because that is the only way I know to build debs really! [16:45:10] let's not change! [16:45:14] (i think it just changes on every puppet run ?!) [16:45:45] mutante: I have seen this too, it does really have something to do with your change [16:45:50] it does not* [16:45:52] <_joe_> mutante: probably does [16:45:54] yeah [16:45:55] i think i know the problem [16:45:58] i've seen that too [16:46:04] probably a hash [16:46:05] the ganglia view template needs to be sorted before it's rendered [16:46:05] i think [16:46:07] <_joe_> some non sorted array [16:46:18] <_joe_> s/array/hash/ [16:47:12] it's like when you run puppet manually on nickel, you can also see some similar ones [16:47:13] ottomata: well git-buildpackage is all nice and well, but it has the downside it does not work with pbuilder/cowbuilder [16:47:15] https://github.com/wikimedia/operations-puppet/blob/production/templates/ganglia/ganglia_view.json.erb [16:47:29] i have never used pbuilder or cowbuilder! [16:47:34] and i barely even know what those are [16:47:38] cowbuilder never heard of :D [16:47:47] but this shows my ignorance in the deb package building world [16:47:55] so both basically create a "pristine" environment every time they build a package [16:47:56] i have only done it here at WMF, and we use git-buildpackage [16:48:09] including dependencies, etc? 
[16:48:12] yes [16:48:20] aye, that's the big downside of git-buildpackage then [16:48:31] gbp can do pristine checkout or whatever [16:48:39] but it requires build deps to be installed :/ [16:48:55] yeah, pbuilder installs them on the fly during the build [16:48:58] mutante: it's probably this [16:48:59] "items":<%= JSON.pretty_generate(items) %> [16:49:13] hmm, maybe if [16:49:17] this [16:49:18] graphs.each do |graph| [16:49:19] was changed to [16:49:32] and after it is done it just drops the build environments and you just keep the packages [16:49:39] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [16:49:40] hmm, no that's an array [16:49:58] icinga-wm: nope. No changes to merge. [16:49:59] (CR) Andrew Bogott: Fixed base module Puppet 3 lint issues. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/146675 (owner: Scottlee) [16:51:00] who runs sudo puppet-merge vs sudo -i ; puppet-merge ? [16:51:33] (PS2) Ottomata: Update to 0.8.1.1 [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/146920 (owner: Alexandros Kosiaris) [16:51:52] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147144 [16:51:53] hmmm [16:51:54] (PS1) Reedy: testwiki to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147145 [16:51:56] (PS1) Reedy: Wikipedias to 1.24wmf13 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147146 [16:51:58] (PS1) Reedy: group0 to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147147 [16:52:29] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147144 (owner: Reedy) [16:52:39] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [16:53:12] http://p.defau.lt/?VoFOQoBQ7pm0MxDXAPh_4g [16:53:14] weird.... 
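[note] On the ganglia view template that seems to change on every puppet run (16:44:57 to 16:49:40): the channel's guess is a hash rendered in unstable order, though the root cause is not confirmed in this log. The general remedy, sorting keys before serializing, can be shown with a small Python sketch. It stands in for the ERB/Ruby template rather than reproducing it, and the names `render_items`, `cpu_report` and the host values are made up for the example.

```python
import json


def render_items(items):
    """Serialize items with sorted keys so equal data gives equal text."""
    return json.dumps(items, sort_keys=True, indent=2)


# The same data built in two different key orders, as could happen when
# a hash is populated in a different order from one run to the next.
first_run = [{"graph": "cpu_report", "hosts": {"nickel": 1, "neon": 2}}]
second_run = [{"hosts": {"neon": 2, "nickel": 1}, "graph": "cpu_report"}]

# Unsorted output differs between the two runs even though the data is
# equal; sorted output is identical, so a config-management tool that
# diffs the rendered file would see no change.
unsorted_differs = json.dumps(first_run) != json.dumps(second_run)
sorted_matches = render_items(first_run) == render_items(second_run)
```

Note that sorting keys only helps for hashes; the outer list keeps its order, which matches the "hmm, no that's an array" observation about the top-level `items`.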
[16:53:20] akosiaris: looks like it's just the delay between palladium and strontium? [16:53:26] really really weird [16:53:32] mutante: there is no delay [16:53:35] oh [16:53:36] I fixed it manually [16:53:47] but this was really really weird ... [16:53:50] git bug ? [16:54:37] (CR) Reedy: [C: 2] testwiki to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147145 (owner: Reedy) [16:54:42] why did id fail the first time ? [16:54:46] did it* [16:57:18] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147144 (owner: Reedy) [16:57:23] akosiaris: ... really weird indeed for it to happen just a single time [16:58:10] https://git.wiki.kernel.org/index.php/Git_FAQ#Git_commit_is_dying_telling_me_.22fatal:_empty_ident_.3Cuser.40myhost.3E_not_allowed.22.2C_what.27s_wrong.3F [16:59:02] mutante: yes but this is not a commit [16:59:33] it is a pull... a ff pull on top of that (not that it matters) [17:00:04] true, just makes it weirder? [17:00:04] (Merged) jenkins-bot: testwiki to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/147145 (owner: Reedy) [17:01:58] !log reedy Started scap: testwiki to 1.24wmf14 [17:02:01] PROBLEM - Host payments1004 is DOWN: PING CRITICAL - Packet loss = 100% [17:02:03] Logged the message, Master [17:02:42] hi. payments alert - intentional? [17:02:48] yea i just pinged him too, heh [17:02:55] it's known [17:02:57] cool ok [17:03:04] !log payments4 is kernel updating (per jgreen) [17:03:08] Logged the message, RobH [17:03:11] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [17:03:15] pheew.. we still get paid [17:03:18] hahaha [17:03:20] hehe [17:03:21] ok [17:03:32] ouf [17:03:39] I'm going to cycle through all of them, silencing now. [17:04:22] ottomata: I take the previous statement back. git-buildpackage can work with pbuilder. 
Will try it and update the README [17:04:30] (03CR) 10Dzahn: [C: 032] update SSL ciphers for noc.wikimedia.org to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147123 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [17:04:45] !log reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.VsoJsYY6Q2" ' returned non-zero exit status 1 (duration: 02m 46s) [17:04:49] Logged the message, Master [17:05:02] (03CR) 10Dzahn: [V: 032] update SSL ciphers for noc.wikimedia.org to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147123 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [17:05:13] Oh, duh [17:05:21] RECOVERY - Host payments1004 is UP: PING OK - Packet loss = 0%, RTA = 1.29 ms [17:06:50] (03PS1) 10Reedy: Move ClientSide and CommunityVoice [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147149 [17:07:03] (03CR) 10Reedy: [C: 032] Move ClientSide and CommunityVoice [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147149 (owner: 10Reedy) [17:07:10] (03Merged) 10jenkins-bot: Move ClientSide and CommunityVoice [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147149 (owner: 10Reedy) [17:07:21] Is there a plan/ETA for moving production to Trusty? (Especially icinga.wikimedia.org.) 
[17:07:22] (03CR) 10CSteipp: WIP: AppArmor Profile for OCG (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [17:07:29] !log reedy Started scap: testwiki to 1.24wmf14 take 2 [17:07:34] Logged the message, Master [17:08:47] (03PS1) 10Andrew Bogott: Set /etc/mailname to contain $::fqdn [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 [17:08:49] (03CR) 10Alexandros Kosiaris: [C: 031] "Seems fine to me" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/146920 (owner: 10Alexandros Kosiaris) [17:08:55] (03CR) 10Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [17:13:59] !log dist-upgrade and reboot payments1003 [17:14:04] Logged the message, Master [17:15:17] (03PS2) 10Tim Landscheidt: Set /etc/mailname to contain $::fqdn [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 (https://bugzilla.wikimedia.org/64962) (owner: 10Andrew Bogott) [17:16:12] PROBLEM - Host payments1003 is DOWN: PING CRITICAL - Packet loss = 100% [17:16:32] Jeff_Green: its still paging for them, heh [17:16:46] icinga is stupider than a rock [17:16:48] sigh. [17:17:06] do a scheduled downtime and it should not send notifications [17:18:07] mutante: ok, trying that. 
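Note on the scheduled-downtime suggestion above: on an Icinga 1.x or Nagios-style setup, downtime can be scheduled without the web UI by writing a SCHEDULE_HOST_DOWNTIME external command to the command pipe. The sketch below only builds that command line. It assumes the classic external-command interface is enabled; the author name, the comment text and the command-file path mentioned in the comments are placeholders, not values taken from this log.

```python
import time


def schedule_host_downtime(host, duration_s, author, comment, now=None):
    """Build a SCHEDULE_HOST_DOWNTIME line for an Icinga 1.x / Nagios command pipe.

    Field order after the command name:
    host;start;end;fixed;trigger_id;duration;author;comment
    """
    if ";" in host or ";" in author:
        # Semicolons separate fields, so they cannot appear in these values.
        raise ValueError("host and author must not contain ';'")
    start = int(time.time()) if now is None else int(now)
    end = start + int(duration_s)
    # fixed=1: the downtime runs exactly from start to end.
    # trigger_id=0: not triggered by another downtime entry.
    return (
        f"[{start}] SCHEDULE_HOST_DOWNTIME;"
        f"{host};{start};{end};1;0;{int(duration_s)};{author};{comment}"
    )


if __name__ == "__main__":
    line = schedule_host_downtime(
        "payments1003", 30 * 60, "ops", "dist-upgrade and reboot"
    )
    print(line)
    # To apply it, the line (plus a newline) would be written to the
    # command file, whose location depends on the installation
    # (assumption: something like /var/lib/icinga/rw/icinga.cmd).
```

While a host is inside a scheduled downtime window, notifications for it are suppressed, which is the effect mutante describes.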
[17:19:09] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:19:53] (03CR) 10Filippo Giunchedi: [C: 04-1] jobrunner: create hhvm-only jobrunners (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [17:20:20] RECOVERY - Host payments1003 is UP: PING OK - Packet loss = 0%, RTA = 0.74 ms [17:21:13] !log nickel (ganglia) apt-get upgrading packages [17:21:18] Logged the message, Master [17:25:46] (03CR) 10Giuseppe Lavagetto: jobrunner: create hhvm-only jobrunners (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [17:25:55] <_joe_> godog: thanks! [17:26:25] can we salt * remove smbclient ?:) [17:27:00] (03PS7) 10Rush: Packaging for debian using pkg-php-tools/dh_php5. [operations/debs/php-mailparse] (review) - 10https://gerrit.wikimedia.org/r/142751 (owner: 1020after4) [17:27:24] (03CR) 10Rush: [C: 032 V: 032] "seems everyone has reached accommodation? going to merge and try it out then thanks!" [operations/debs/php-mailparse] (review) - 10https://gerrit.wikimedia.org/r/142751 (owner: 1020after4) [17:30:24] (03CR) 10Bsimmers: jobrunner: create hhvm-only jobrunners (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [17:30:47] !log payments1002 dist upgrade & reboot [17:30:53] Logged the message, Master [17:32:59] PROBLEM - Host payments1002 is DOWN: PING CRITICAL - Packet loss = 100% [17:33:29] PROBLEM - MySQL Slave Delay on db1021 is CRITICAL: CRIT replication delay 318 seconds [17:33:31] PROBLEM - MySQL Replication Heartbeat on db1021 is CRITICAL: CRIT replication delay 318 seconds [17:35:19] RECOVERY - Host payments1002 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms [17:35:28] (03PS1) 10Scottlee: Fixing variable. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147155 [17:36:51] Can someone have a look at db1021? 
The lag keeps rising https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= [17:40:32] !log reedy Finished scap: testwiki to 1.24wmf14 take 2 (duration: 33m 02s) [17:40:37] Logged the message, Master [17:41:25] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=MySQL+eqiad&h=db1021.eqiad.wmnet&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [17:41:51] (03PS1) 10Scottlee: Added entry for datasets cname. [operations/dns] - 10https://gerrit.wikimedia.org/r/147156 [17:42:17] (03CR) 10BryanDavis: jobrunner: create hhvm-only jobrunners (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto) [17:43:07] (03PS1) 10Dzahn: retab noc.wm.org apache config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/147157 [17:45:26] Reedy: Doesn't look very healthy to me [17:46:09] 17:17 is exactly when my bot started complaining about lag :P [17:47:39] (03PS2) 10Scottlee: RT 7858: Added entry for datasets cname. [operations/dns] - 10https://gerrit.wikimedia.org/r/147156 [17:50:15] mwalker: https://github.com/mattofak/jouncebot/pull/3 [17:50:21] bd808: ^ FYI [17:51:00] lolz [17:51:23] ahaha [17:51:25] * greg-g walks to a coffee shop, will be online shortly after 11, probably [17:52:34] jouncebot: reload [17:52:40] jouncebot: refresh [17:52:41] I refreshed my knowledge about deployments. [17:53:06] thanks for the merge mwalker [17:54:10] matanya: cute [17:54:43] maybe now James_F will agree to host him in a nice british castle [17:54:53] noc.wm has this: Redirect permanent /cgi-bin/report.py https://performance.wikimedia.org/profiler/report [17:54:54] Ha. [17:55:03] but that is 404 [17:55:10] multichill: Looking at ganglia: The wait on IO on that machine peaked at 17:20... I guess a disk broke or for some other reason IO got slow there [17:55:22] matanya: It needs a gender macro though. jouncebot shouldn't call aude "sir". 
[17:55:29] can't back up these claims though... from that I can see on the server it's almost idle [17:56:44] bd808: i didn't want to put logic for the entire british order hierarchy [17:57:10] (03CR) 10Dzahn: [C: 032] retab noc.wm.org apache config file [operations/puppet] - 10https://gerrit.wikimedia.org/r/147157 (owner: 10Dzahn) [17:58:16] Did anyone look at db1021? I can see that someone from iron is on it, but no idea who [17:58:51] bd808: does "your majesty" work for both genders ? [17:59:02] or your higness [18:00:04] The time is nigh to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140717T1800) [18:00:34] jouncebot: log out and come back again, you are not acting nicely [18:00:59] jouncebot: die [18:01:33] matanya: by the Grace of God, of Great Britain, Ireland and the British Dominions beyond the Seas, Queen, Defender of the Faith [18:01:42] https://en.wikipedia.org/wiki/List_of_titles_and_honours_of_Queen_Elizabeth_II#Royal_titles_and_styles [18:02:22] mutante: i'll let aude know her new title. thanks! [18:02:39] springle: db1021 is getting connection errors and has lots of lag atm :( [18:02:53] AaronSchulz: It's 4am for him :P [18:02:54] hoo: AaronSchulz : yea, he knows already afaik [18:03:05] "traffic sampling"? is that tendril? [18:03:12] mutante: Who's he? Sean? [18:03:15] yes [18:03:25] Reedy: I was up at 2:30am last night ;) [18:03:29] Ok, cause otherwise I would suggest to depool that DB [18:03:36] bd808: does he respawn himself? [18:03:42] but yeah 4 is rough [18:03:47] greg-g: I thought so... [18:04:17] 894 seconds [18:05:43] (03PS1) 10Dzahn: remove broken redirects to performance.wm.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/147162 [18:06:09] hark! jouncebot! return! [18:06:40] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[18:06:56] (03PS1) 10Reedy: depool db1021 due to replag [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147163 [18:07:12] Reedy: if Sean is not around that's the way to go, yes [18:07:25] Aye [18:07:30] I hadn't seen him speak [18:08:10] mwalker: I killed jouncebot and he didn't respawn :( [18:08:13] mwalker: btw, where does jouncebot live? [18:08:36] jouncebot is a tool in toollabs [18:08:45] can have in gerrit? [18:08:58] (03CR) 10Reedy: [C: 032] depool db1021 due to replag [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147163 (owner: 10Reedy) [18:09:02] https://wikitech.wikimedia.org/w/index.php?search=jouncebot&title=Special%3ASearch&go=Go :( [18:09:04] (03Merged) 10jenkins-bot: depool db1021 due to replag [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147163 (owner: 10Reedy) [18:09:26] man; now y'all are wanting to get all formal on me :p [18:09:30] gerrit; page on wikitech [18:09:36] next up unicorn logo [18:09:49] !log reedy Synchronized wmf-config/: De-pool db1021 due to increasing replag (duration: 00m 14s) [18:09:54] Logged the message, Master [18:09:57] ascii art on IRC is about the least you could do, really [18:10:21] mwalker: but, wanna add me to the tool/project whatever? 
[18:10:23] (03PS2) 10Reedy: Wikipedias to 1.24wmf13 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147146 [18:10:32] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf13 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147146 (owner: 10Reedy) [18:10:39] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf13 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147146 (owner: 10Reedy) [18:11:51] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf13 [18:11:56] Logged the message, Master [18:12:11] icinga-wm: recover strontium [18:12:30] RECOVERY - MySQL Replication Heartbeat on db1021 is OK: OK replication delay -0 seconds [18:12:34] greg-g, once I figure out how to do that; sure... [18:12:40] toollabs is confusing to me [18:12:40] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [18:12:45] icinga-wm: thx [18:12:50] RECOVERY - MySQL Slave Delay on db1021 is OK: OK replication delay 0 seconds [18:13:14] (03PS1) 10Dzahn: Revert "depool db1021 due to replag" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147166 [18:13:18] User:Reinheitsgebot :P (I guess you need to speak German to understand that... or be a Beer enthusiast) [18:13:24] lol [18:13:59] Reedy: too early to revert? [18:14:26] mutante: What did you do to it? [18:14:27] greg-g, do you have a toollabs account? [18:14:32] hoo: nothing [18:14:53] ok [18:14:57] !log db1021 disabled sync_binlog, thread tied up on fsync [18:15:00] oh [18:15:02] Logged the message, Master [18:15:02] ah [18:15:03] you depooled it [18:15:06] ok [18:15:34] if springle didn't restart mysql you can repool, if he did you better warm up first :P [18:15:38] springle: https://gerrit.wikimedia.org/r/#/c/147166/ [18:15:44] he didn't apparently [18:16:17] mutante: no it's fine. 
leave it [18:16:28] ok [18:16:35] mwalker: gjg [18:16:45] mwalker: er, "Greg Grossmeier" on wikitech [18:16:51] gjg is my shell [18:17:13] (03CR) 10Dzahn: [C: 032] remove broken redirects to performance.wm.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/147162 (owner: 10Dzahn) [18:18:13] greg-g, ah; you need to be part of the toollabs project: https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request (or bug coren) [18:18:32] oh, I suppose so, I've only cared about deployment-prep so far... [18:19:19] oh coren.... https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Greg_Grossmeier [18:19:52] (03PS1) 10Ori.livneh: mediawiki: profile.d script to add scap to $PATH [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 [18:20:50] (03CR) 10Ori.livneh: [C: 032] mediawiki: use native min/floor funcs rather than inline_template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147135 (owner: 10Ori.livneh) [18:21:52] (03PS1) 10Scottlee: Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 [18:23:26] (03PS1) 10Ori.livneh: Revert "mediawiki: use native min/floor funcs rather than inline_template" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147169 [18:23:40] (03PS2) 10Ori.livneh: Revert "mediawiki: use native min/floor funcs rather than inline_template" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147169 [18:23:46] (03CR) 10Ori.livneh: [C: 032 V: 032] Revert "mediawiki: use native min/floor funcs rather than inline_template" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147169 (owner: 10Ori.livneh) [18:24:44] (03CR) 10Dzahn: "can you remove the literal tabs? like here f.e. https://gerrit.wikimedia.org/r/#/c/143212/" [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 (owner: 10Scottlee) [18:25:04] (03CR) 10Scottlee: "I fixed this but it seems to have created a separate patch? 
https://gerrit.wikimedia.org/r/#/c/147155/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146675 (owner: 10Scottlee) [18:26:46] (03PS2) 10Reedy: group0 to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147147 [18:27:07] cmjohnson1: hey Chris can you please open a ticket for me with instructions on connecting the ms array in B8 and B7? thank you [18:27:29] B8 and B6 [18:27:39] papaul: sure I will have to you shortly [18:27:53] thank you [18:28:42] RobH: did you get my question before lunch? [18:29:11] (03CR) 10Reedy: [C: 032] group0 to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147147 (owner: 10Reedy) [18:29:17] (03Merged) 10jenkins-bot: group0 to 1.24wmf14 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147147 (owner: 10Reedy) [18:30:02] !log db1021 raid write-cache failure, BBU at 9% [18:30:06] Logged the message, Master [18:31:46] ? [18:31:49] mwalker: I now am a part of the elite group known as Toollabs project member [18:31:51] papaul: nope, sorry [18:33:18] was asking when are we going to start testing the serials connections and mgt connections for row A and B [18:34:58] meh, cpan is down http://search.cpan.org/ 503 [18:35:09] PROBLEM - check_mysql on payments1003 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:35:10] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:35:11] PROBLEM - check_mysql on payments1002 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:35:13] greg-g, ok; you are added as a member; and also see https://wikitech.wikimedia.org/wiki/Jouncebot [18:35:46] mwalker: ty sir [18:36:09] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf14 [18:36:14] Logged the message, Master [18:40:09] PROBLEM - check_mysql on payments1003 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:40:10] 
PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:40:18] PROBLEM - check_mysql on payments1002 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:40:23] i love our alerts [18:40:30] Yes Slave SQL: No Seconds Behind Master: (null) [18:42:22] (03PS2) 10Rush: route incoming mail for phabricator.wm.org to iridium [operations/puppet] - 10https://gerrit.wikimedia.org/r/146859 [18:42:38] (03CR) 10Rush: [C: 032 V: 032] "no original thinking here just mimicking rt" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146859 (owner: 10Rush) [18:42:39] there should be a comma or something between Yes and Slave [18:43:06] Yes something Slave SQL: [18:43:28] marktraceur: helpful [18:43:32] I try [18:43:36] greg-g: Don't think it should talk to us like that, we're not the slaves here (yet?) [18:43:41] imo it should all just be "an error has occured with the system" [18:43:59] and leave out the hostname too, who really needs that anyway. [18:44:06] Jeff_Green: maybe: "Error: an error has occurred. Please fix the error and try again." [18:44:07] Check Engine [18:44:22] ori: perfect. [18:45:09] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:45:09] PROBLEM - check_mysql on payments1003 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:45:16] (03CR) 10Ottomata: [C: 031 V: 032] "Me too!" [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/146920 (owner: 10Alexandros Kosiaris) [18:45:18] RECOVERY - check_mysql on payments1002 is OK: Uptime: 4309 Threads: 3 Questions: 12560 Slow queries: 66 Opens: 659 Flush tables: 1 Open tables: 45 Queries per second avg: 2.914 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [18:45:23] (03CR) 10Ottomata: [C: 032] "Me too!" 
[operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/146920 (owner: 10Alexandros Kosiaris) [18:45:45] akosiaris: let's do it! want to add .deb to apt? [18:46:40] (03CR) 10Ottomata: [C: 032 V: 032] RT 7858: Added entry for datasets cname. [operations/dns] - 10https://gerrit.wikimedia.org/r/147156 (owner: 10Scottlee) [18:50:09] PROBLEM - check_mysql on payments1003 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1280 [18:50:11] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:55:09] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [18:55:09] RECOVERY - check_mysql on payments1003 is OK: Uptime: 5909 Threads: 2 Questions: 16611 Slow queries: 173 Opens: 665 Flush tables: 1 Open tables: 45 Queries per second avg: 2.811 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [18:55:10] (03CR) 10Mwalker: WIP: AppArmor Profile for OCG (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [18:55:25] (03PS11) 10Mwalker: WIP: AppArmor module and profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [19:00:12] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [19:03:57] (03PS1) 10Dzahn: racktables - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147184 [19:05:12] PROBLEM - check_mysql on payments1004 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) [19:05:45] (03CR) 10Ori.livneh: "Python scoping is a bit of a mess when it comes to nested functions definitions. 
You don't really need to segregate the imperative part of" (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 (owner: 10Rush) [19:06:10] (03PS1) 10Dzahn: racktables - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147185 [19:07:06] (03PS1) 10Dzahn: racktables - retab apache config [operations/puppet] - 10https://gerrit.wikimedia.org/r/147186 [19:08:24] (03PS1) 10Aaron Schulz: Moved remaning job runner slots to the new loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147188 [19:08:31] (03CR) 10BBlack: [C: 031] "Looks decent to me. Personally, I still like programs to have a main() always, but I'd still pull the nested functions out of it." [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 (owner: 10Rush) [19:09:22] (03PS2) 10Dzahn: racktables - retab apache config [operations/puppet] - 10https://gerrit.wikimedia.org/r/147186 [19:09:59] papaul: So on testing serial connections, I've had it in my head that it would be once we have the basic connections up for us remotes to hit the serial console servers, but... 
[19:10:12] RECOVERY - check_mysql on payments1004 is OK: Uptime: 7630 Threads: 4 Questions: 8040 Slow queries: 66 Opens: 782 Flush tables: 1 Open tables: 60 Queries per second avg: 1.053 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [19:10:13] i suppose you could start testing them manually [19:10:24] (03PS12) 10Mwalker: AppArmor module and profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [19:10:29] papaul: I'll detail a ticket on how to potentially start manual testing so you can see if your crimps work ahead of time [19:10:45] and how to configure and label the ports in the opengear software [19:10:48] so you can handle that as well [19:11:15] i'll also include how to use serial to output the chassis serial # of the network switch [19:11:24] so you can compare to our racktables entries to audit your own serial connections [19:11:28] and ensure they go where you think [19:11:33] (that part is a huge deal) [19:13:01] ok [19:13:41] thx for checking though, it had just slipped my mind to detail that [19:14:17] (03PS1) 10Dzahn: smokeping - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147191 [19:16:25] (03PS1) 10Dzahn: smokeping - retab Apache config [operations/puppet] - 10https://gerrit.wikimedia.org/r/147192 [19:16:27] (03CR) 10Jgreen: [C: 031] AppArmor module and profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [19:17:47] (03PS2) 10Ori.livneh: Moved remaning job runner slots to the new loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147188 (owner: 10Aaron Schulz) [19:19:50] (03PS1) 10Dzahn: smokeping - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147196 [19:19:57] (03CR) 10jenkins-bot: [V: 04-1] Moved remaning job runner slots to the new loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147188 (owner: 10Aaron Schulz) [19:21:17] (03PS13) 10Mwalker: AppArmor module and profile for OCG 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 [19:21:26] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: Puppet has 1 failures [19:21:51] csteipp, per our discussion in -dev, I allow only writes to *.pdf, *.txt, *.tex, and *.zip now [19:21:51] (03PS3) 10Ori.livneh: Moved remaning job runner slots to the new loop [operations/puppet] - 10https://gerrit.wikimedia.org/r/147188 (owner: 10Aaron Schulz) [19:22:01] if you would give it a +1 if it all looks good Jeff_Green can merge it [19:22:17] (03PS1) 10Dzahn: etherpad - retab Apache config [operations/puppet] - 10https://gerrit.wikimedia.org/r/147197 [19:24:10] (03PS1) 10Dzahn: etherpad - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147198 [19:24:23] (03CR) 10CSteipp: [C: 031] "Profile looks sane. I'll let someone else approve the puppet parts." [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [19:24:25] (03CR) 10Ori.livneh: [C: 032] "OK'd by giuseppe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147188 (owner: 10Aaron Schulz) [19:25:42] (03PS1) 10Dzahn: etherpad - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147199 [19:27:52] (03PS1) 10Dzahn: icinga - retab Apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147201 [19:32:47] (03PS1) 10Dzahn: icinga - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147204 [19:35:51] (03PS1) 10Dzahn: icinga - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147207 [19:37:35] (03PS3) 10Rush: phabricator inbound email handler [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 [19:40:00] (03CR) 10Rush: "got a +1, thanks! 
but still addressed some points" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 (owner: 10Rush) [19:40:28] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:41:00] (03PS1) 10Dzahn: generic_vhost (webserver) - update SSL ciphers [operations/puppet] - 10https://gerrit.wikimedia.org/r/147208 [19:41:10] (03PS2) 10Scottlee: Fixed base module Puppet 3 lint issues. [operations/puppet] - 10https://gerrit.wikimedia.org/r/146675 [19:41:26] (03CR) 10Dzahn: "unless we're not using this anymore?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147208 (owner: 10Dzahn) [19:42:41] (03PS1) 10Dzahn: generic_vhost - retab Apache template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147209 [19:42:56] (03PS2) 10Ottomata: Ensure hadoop directories exist in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/147109 [19:43:05] (03CR) 10Ottomata: [C: 032] Ensure hadoop directories exist in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/147109 (owner: 10Ottomata) [19:43:11] (03CR) 10Ottomata: [V: 032] Ensure hadoop directories exist in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/147109 (owner: 10Ottomata) [19:44:21] hey ^d [19:44:26] are there gerrit replication logs somewhere? [19:44:29] there are, rigth? [19:44:37] a repo isn't replicating to github properly [19:44:39] want to see why [19:44:39] <^d> Ummm, not really. Logs on failure. [19:44:47] (03PS1) 10Dzahn: generic_vhost - oudated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147210 [19:44:47] <^d> Oh, you might see a failure then [19:44:49] ottomata: I think there are a few [19:44:56] where do they go? 
[19:44:56] I noticed one yesterday, python-diamond [19:45:04] but I dunno if they are all meant to [19:45:11] (03CR) 10Ori.livneh: [C: 031] phabricator inbound email handler [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 (owner: 10Rush) [19:45:17] <^d> ottomata: ytterbium:/var/lib/gerrit2/review_site/logs/error_log [19:45:22] ah [19:45:27] we shoudl symlink that into /var/log/ or something [19:45:51] (03CR) 10jenkins-bot: [V: 04-1] generic_vhost - oudated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147210 (owner: 10Dzahn) [19:46:03] (03Abandoned) 10Ori.livneh: mediawiki: remove inline_template in favor of min/floor [operations/puppet] - 10https://gerrit.wikimedia.org/r/147136 (owner: 10Ori.livneh) [19:46:07] (03PS1) 10Ori.livneh: decom mediawiki::jobqueue [operations/puppet] - 10https://gerrit.wikimedia.org/r/147211 [19:46:18] (03CR) 10Andrew Bogott: [C: 032] Fixed base module Puppet 3 lint issues. [operations/puppet] - 10https://gerrit.wikimedia.org/r/146675 (owner: 10Scottlee) [19:47:27] ottomata: want I should merge these hadoop changes? [19:47:31] (03PS2) 10Dzahn: generic_vhost - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147210 [19:47:42] oh! 
[19:47:44] oops, sorry [19:47:52] yes andrewbogott, thanks [19:49:08] (03PS4) 10Rush: phabricator inbound email handler [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 [19:49:17] (03CR) 10Andrew Bogott: "retest" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146675 (owner: 10Scottlee) [19:49:19] (03CR) 10Rush: [C: 032 V: 032] phabricator inbound email handler [operations/puppet] - 10https://gerrit.wikimedia.org/r/146932 (owner: 10Rush) [19:49:35] (03CR) 10Andrew Bogott: "recheck" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146675 (owner: 10Scottlee) [19:51:21] (03CR) 10Aaron Schulz: [C: 031] decom mediawiki::jobqueue [operations/puppet] - 10https://gerrit.wikimedia.org/r/147211 (owner: 10Ori.livneh) [19:51:32] (03PS1) 10Dzahn: metrics - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 [19:52:29] (03PS1) 10Dzahn: metrics - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147215 [19:53:46] hm, ^d, i'm not sure what's up, there's a missing commit on some repo and it's being weird [19:53:55] if I delete the github one, will gerrit just push everything it has? [19:54:35] <^d> delete + recreate. [19:54:37] <^d> Yeah [19:56:19] (03PS1) 10Dzahn: stats - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147216 [19:56:53] (03CR) 10Ori.livneh: [C: 032] decom mediawiki::jobqueue [operations/puppet] - 10https://gerrit.wikimedia.org/r/147211 (owner: 10Ori.livneh) [19:57:28] (03PS2) 10Dzahn: stats - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147216 [19:58:14] (03PS1) 10Dzahn: stats - retab Apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147217 [19:58:37] do I have to recreate the repo on github? [19:58:41] or will gerrit do that? [19:58:45] ^d? [19:59:06] <^d> Yes. Sorry, had tabbed away :) [19:59:12] <^d> Have to recreate. 
[19:59:23] (03PS1) 10Dzahn: stats - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147218 [19:59:25] on github, hm, ok.... [19:59:27] do I create it empty then? [19:59:44] <^d> Yeah the only option is like empty or with a README, do empty. [20:00:00] ok cool [20:01:23] PROBLEM - puppet last run on mw1015 is CRITICAL: CRITICAL: Puppet has 1 failures [20:01:33] PROBLEM - puppet last run on tmh1002 is CRITICAL: CRITICAL: Puppet has 1 failures [20:01:35] interesting, it's also not pushing to gallium, ^d [20:01:36] Cannot replicate to gerritslave@gallium.wikimedia.org:/srv/ssd/gerrit/operations/debs/kafka.git [20:01:47] (03PS2) 10Dzahn: metrics - update SSL cipher list [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 [20:02:22] i guess I should do the same there? [20:02:30] rm -r ... && git init? [20:03:18] hi [20:03:30] I have problems with receiving emails from mailing lists again... [20:03:43] just as in March this year it's probably due to my email provider: Hotmail [20:03:50] any idea what I can do to solve this forever? [20:04:33] Change e-mail provider? [20:06:01] (03PS1) 10Aaron Schulz: Set "daemonized" flag for the redis job queue [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147220 [20:07:15] (03PS1) 10Dzahn: gmond - outdated variable syntax in erb [operations/puppet] - 10https://gerrit.wikimedia.org/r/147221 [20:07:22] (03CR) 10Ottomata: [C: 032 V: 032] stats - retab Apache config template [operations/puppet] - 10https://gerrit.wikimedia.org/r/147217 (owner: 10Dzahn) [20:07:41] Trijnstel: what happened last time? [20:07:53] did a ticket get resolved related to this? [20:08:08] oops, mutante, i missed the dependency on that commit [20:08:16] the tab recommit is +2ed [20:08:30] ottomata: that's ok, it can merge with the other one [20:09:12] ottomata: thanks, maybe i should have done that one the other way around [20:11:09] (03CR) 10Dzahn: [C: 031] ""Warning: Variable access via 'cname' is deprecated. 
Use '@cname' instead. " etc" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147221 (owner: 10Dzahn) [20:11:17] <^d> ottomata: Bleh, you can. `git init --bare` I think. [20:11:18] ja, np [20:11:20] ok [20:12:00] (03CR) 10Jgreen: [C: 031 V: 031] AppArmor module and profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [20:12:42] (03CR) 10Jgreen: [C: 032] AppArmor module and profile for OCG [operations/puppet] - 10https://gerrit.wikimedia.org/r/147027 (owner: 10Mwalker) [20:13:23] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 18:12:52 UTC [20:13:24] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jul 17 20:13:22 UTC 2014 [20:15:46] mutante: hmh, not sure [20:15:54] there is a ticket though [20:15:57] I'll look [20:16:05] Trijnstel: if you can find it.. just reopen it [20:16:06] Someone just pointed me to this: http://descrier.co.uk/technology/wikipedia-goes-many-users-around-world/ I couldn't find an incident report. Does anyone have a brief description of what was/is happening? [20:17:22] If anything. [20:18:19] mutante: found it -> https://bugzilla.wikimedia.org/show_bug.cgi?id=62838 [20:18:23] I'll reopen [20:18:23] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [20:18:33] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:19:56] Trijnstel: cool! it might even need MS to put our _new_ mailserver IP on the whitelist [20:20:20] no, ignore that, that's lists.wm and it's still the same..hmm [20:20:36] zellfaze: Yes, we have some idea what happened. However, I think that article is a gross mischaracterization of the scope of the issue. The impact has been isolated and tiny as far as I can tell from e.g. 
traffic graphs [20:20:53] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Thu 17 Jul 2014 18:20:00 UTC [20:21:52] there isn't an incident report yet as I'm still waiting for a response from an ISP, but we could at least publish some sort of overview of the situation. [20:22:12] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=LVS+loadbalancers+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [20:22:13] bblack: that'd be nice, if you have something worthwhile [20:22:13] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jul 17 20:22:09 UTC 2014 [20:22:54] ^ but in general, if we had a real "outage" of DNS on a large scale, you'd see it there [20:23:43] Aye. Channel seemed surprisingly calm for an outage. [20:24:35] The super-brief version of what we know is: Yesterday I pushed some DNS changes. The way they were deployed had a tactical error in it, which would give remote caches perhaps a ~1/1000 chance of being unable to resolve us, but even if they were struck by those odds, the situation should have self-corrected for them within 10 minutes. [20:25:08] But for some subset of that subset, the problem didn't go away after 10 minutes, because those ISPs are horribly violating internet standards. [20:25:25] Several hours later (this morning US time) we reverted the change anyways to try to remedy the situation for those ISPs [20:26:02] Excellent. Thank you. :) [20:26:10] mutante: it's pretty annoying anyway [20:26:14] I'll pass that on. [20:26:26] and I don't want to switch my email provider when it's not really necessary tbh [20:26:39] Trijnstel: sounds like they blocked us again :/ [20:27:07] ^d, ok, i think this one is working, i'm not sure what's up on gallium, but it's not erroring anymore [20:27:09] i got one more [20:27:20] https://github.com/wikimedia/operations-puppet-cdh [20:27:20] vs [20:27:20] https://github.com/wikimedia/puppet-cdh [20:27:22] mutante: really? :S [20:27:23] why, for god's sake? 
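Note on the DNS summary above: the "self-corrected within 10 minutes" expectation comes from negative caching. Under RFC 2308, a resolver that receives a negative answer may cache it only for the smaller of the SOA record's TTL and the SOA MINIMUM field. The sketch below is a toy cache showing that rule, not the software any ISP actually runs; the hostname, the 86400-second SOA TTL and the class name are hypothetical, and only the 600-second (10-minute) figure comes from the discussion.

```python
import time


class NegativeCache:
    """Toy negative-answer cache that honours the RFC 2308 TTL rule."""

    def __init__(self, clock=time.monotonic):
        # clock is injectable so the expiry logic can be exercised in tests.
        self._clock = clock
        self._expiry = {}

    def remember_failure(self, name, soa_ttl, soa_minimum):
        # RFC 2308: negative TTL = min(TTL of the SOA record, SOA MINIMUM).
        ttl = min(soa_ttl, soa_minimum)
        self._expiry[name.lower()] = self._clock() + ttl

    def is_known_bad(self, name):
        key = name.lower()
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry has aged out: forget it so the next lookup asks the
            # authoritative servers again.
            del self._expiry[key]
            return False
        return True


if __name__ == "__main__":
    now = [0.0]
    cache = NegativeCache(clock=lambda: now[0])
    # Hypothetical name and SOA TTL; 600 s is the 10-minute negative TTL.
    cache.remember_failure("example-lb.example.org", soa_ttl=86400, soa_minimum=600)
    now[0] = 599.0
    print(cache.is_known_bad("example-lb.example.org"))
    now[0] = 600.0
    print(cache.is_known_bad("example-lb.example.org"))
```

A cache that keeps answering "known bad" after the expiry time, as the affected ISPs appear to have done, is the behaviour bblack describes as violating the standard.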
[20:27:26] and i don't see any errors for puppet-cdh [20:28:32] Trijnstel: "because [20:28:33] too many people have reported your mail as unwanted" [20:29:01] Trijnstel: i think this is likely "subscribers not being able to work out how to unsubscribe, and then just calling it spam to deal with the issue." [20:29:46] !log updated jobrunner to 71d84ea18d and restarted service [20:29:48] ^ AaronSchulz [20:29:52] Logged the message, Master [20:33:06] (To elaborate a bit further here on the DNS thing: the "violating internet standards" bit is not honoring our 10-minute negative time-to-live on our wikimedia.org zone, and caching negative responses far longer than that. I've managed to get in contact with one affected ISP through one of their users that reported to us, and I'm trying to at least find out from them what software they're using and why it behaves that way) [20:33:53] oh, nice [20:34:09] that negative ttl issue was problematic during outages in past. [20:35:08] (03CR) 10Andrew Bogott: [C: 032] racktables - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147184 (owner: 10Dzahn) [20:35:14] yeah, and sadly googling doesn't turn up a lot of hard data on any software that even has a setting to do something dumb like that. at best you get mailing list / blog posts of two natures: (1) No software should ever do that!, and (2) Sometimes caches do this and it sucks, be careful [20:35:53] the only hint of a hint I've seen was someone indicting a Microsoft DNS Server for being able to do it, but then I haven't found any docs about that behavior from MS's own technical info. [20:40:38] (03PS1) 10Scottlee: RT 7858: datasets Apache and Puppet edits. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 [20:42:07] mutante: yeah, I know, that's what they said in March [20:42:28] but since we told them they aren't unwanted...
I would assume they wouldn't blacklist us again [20:43:00] never assume sensibility from hotmail :) [20:44:53] (03CR) 10Ottomata: RT 7858: datasets Apache and Puppet edits. (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 (owner: 10Scottlee) [20:54:08] (03PS1) 10Dzahn: fix RT Apache config setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 [20:55:43] (03CR) 10Dzahn: "it should have never put files straight into sites-enabled, but use apache_site to make the symink" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 (owner: 10Dzahn) [20:59:15] (03CR) 10Tim Landscheidt: [C: 031] Set /etc/mailname to contain $::fqdn (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 (https://bugzilla.wikimedia.org/64962) (owner: 10Andrew Bogott) [21:01:20] (03PS1) 10Rush: phabricator moving shared settings to defaults [operations/puppet] - 10https://gerrit.wikimedia.org/r/147304 [21:02:15] !log deployed fix for bug68187 [21:02:19] Logged the message, Master [21:03:18] (03CR) 10Rush: [C: 032] phabricator moving shared settings to defaults [operations/puppet] - 10https://gerrit.wikimedia.org/r/147304 (owner: 10Rush) [21:03:33] (03PS1) 10Dzahn: fix racktables Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147307 [21:04:10] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:01] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.014 second response time [21:05:49] (03PS2) 10Dzahn: fix RT Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 [21:09:39] please stop filtering gerrit mail:) [21:10:08] (03PS2) 10Rush: mysql @var template linting [operations/puppet] - 10https://gerrit.wikimedia.org/r/146927 [21:10:13] (03CR) 10Rush: [C: 032] mysql @var template linting [operations/puppet] - 10https://gerrit.wikimedia.org/r/146927 (owner: 10Rush) 
[21:10:19] (03CR) 10Rush: [V: 032] mysql @var template linting [operations/puppet] - 10https://gerrit.wikimedia.org/r/146927 (owner: 10Rush) [21:10:49] (03PS3) 10Ori.livneh: fix RT Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 (owner: 10Dzahn) [21:11:31] (03Abandoned) 10Dzahn: remove admins::restricted from terbium,fluorine [operations/puppet] - 10https://gerrit.wikimedia.org/r/126941 (owner: 10Dzahn) [21:11:32] mutante: changed ';' to ',' [21:11:40] (03CR) 10Ori.livneh: [C: 031] fix RT Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 (owner: 10Dzahn) [21:12:22] ori: thank you [21:14:27] (03CR) 10Dzahn: "hmm, prerequisite had been merged a while ago,, need to look again if we can take it now" [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 (owner: 10Dzahn) [21:15:44] (03CR) 10Rush: [C: 04-1] "chasemp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 (owner: 10Ori.livneh) [21:16:46] (03CR) 10Dzahn: [C: 032] fix RT Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147230 (owner: 10Dzahn) [21:16:48] (03PS2) 10Dzahn: fix racktables Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147307 [21:18:09] (03PS2) 10Ori.livneh: mediawiki: profile.d script to add scap to $PATH [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 [21:19:21] (03CR) 10Dzahn: [C: 032] fix racktables Apache setup [operations/puppet] - 10https://gerrit.wikimedia.org/r/147307 (owner: 10Dzahn) [21:19:23] (03CR) 10Ori.livneh: [C: 04-1] fix racktables Apache setup (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147307 (owner: 10Dzahn) [21:19:28] ar [21:19:29] oops [21:19:30] (03PS3) 10Rush: mediawiki: profile.d script to add scap to $PATH [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 (owner: 10Ori.livneh) [21:19:31] doesn't matter [21:19:37] (03CR) 10Rush: [C: 031] "ok sounds good" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 
(owner: 10Ori.livneh) [21:19:43] chasemp: thanks! [21:19:50] ori: i'll follow-up, bad timing :) [21:20:01] (03PS2) 10Scottlee: RT 7858: datasets Apache and Puppet edits. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 [21:21:51] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki: profile.d script to add scap to $PATH [operations/puppet] - 10https://gerrit.wikimedia.org/r/147167 (owner: 10Ori.livneh) [21:23:21] (03PS1) 10Dzahn: racktables - use apache::conf instead of file{} [operations/puppet] - 10https://gerrit.wikimedia.org/r/147310 [21:24:25] (03CR) 10QChris: metrics - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 (owner: 10Dzahn) [21:25:51] (03CR) 10Dzahn: fix racktables Apache setup (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147307 (owner: 10Dzahn) [21:26:15] (03PS2) 10Scottlee: Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 [21:27:14] (03CR) 10QChris: stats - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147216 (owner: 10Dzahn) [21:28:08] (03CR) 10Ori.livneh: [C: 031] racktables - use apache::conf instead of file{} [operations/puppet] - 10https://gerrit.wikimedia.org/r/147310 (owner: 10Dzahn) [21:28:36] (03CR) 10Dzahn: metrics - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 (owner: 10Dzahn) [21:28:55] (03CR) 10Dzahn: [C: 032] racktables - use apache::conf instead of file{} [operations/puppet] - 10https://gerrit.wikimedia.org/r/147310 (owner: 10Dzahn) [21:32:25] (03PS1) 10Andrew Bogott: Set /etc/mailname on first boot [operations/puppet] - 10https://gerrit.wikimedia.org/r/147314 [21:34:18] (03CR) 10QChris: [C: 031] metrics - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 (owner: 10Dzahn) [21:34:31] (03PS3) 10Scottlee: RT 7858: datasets Apache and Puppet edits. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/147226 [21:34:42] (03PS1) 10Dzahn: wikitech - remove DHE ciphers [operations/puppet] - 10https://gerrit.wikimedia.org/r/147315 [21:35:15] (03CR) 10QChris: [C: 031] stats - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147216 (owner: 10Dzahn) [21:36:51] (03PS1) 10Dzahn: OTRS - remove DHE ciphers [operations/puppet] - 10https://gerrit.wikimedia.org/r/147316 [21:37:23] (03CR) 10QChris: [C: 031] wikitech - remove DHE ciphers [operations/puppet] - 10https://gerrit.wikimedia.org/r/147315 (owner: 10Dzahn) [21:37:26] (03CR) 10Dzahn: metrics - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147214 (owner: 10Dzahn) [21:39:52] (03CR) 10Dzahn: [C: 032] stats - update SSL cipher list (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147216 (owner: 10Dzahn) [21:40:39] (03CR) 10Dzahn: [C: 032] stats - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147218 (owner: 10Dzahn) [21:43:34] (03CR) 10Dzahn: [C: 04-1] "still has a bunch of literal tabs, following the "A", see the red marks in gerrit" [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 (owner: 10Scottlee) [21:46:03] (03PS1) 10Dzahn: Revert "stats - outdated variable syntax" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147318 [21:46:29] (03CR) 10Dzahn: [C: 032] Revert "stats - outdated variable syntax" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147318 (owner: 10Dzahn) [21:46:43] (03CR) 10Dzahn: [V: 032] Revert "stats - outdated variable syntax" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147318 (owner: 10Dzahn) [21:47:19] (03PS1) 10Ori.livneh: Get rid of symlinks to scap scripts [operations/puppet] - 10https://gerrit.wikimedia.org/r/147321 [21:48:32] qchris: sorry about that, that broke stats.wm for a short time. 
it's back [21:48:35] (03PS2) 10Mwalker: Enable Petition Extension on Labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146925 [21:57:57] (03PS2) 10QChris: metrics - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147215 (owner: 10Dzahn) [21:58:34] (03CR) 10QChris: [C: 031] metrics - outdated variable syntax [operations/puppet] - 10https://gerrit.wikimedia.org/r/147215 (owner: 10Dzahn) [21:59:31] (03CR) 10BryanDavis: [C: 04-1] "modules/beta/files/wmf-beta-scap will need to be updated as well." (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147321 (owner: 10Ori.livneh) [22:02:57] (03PS1) 10Rush: phab settings comment [operations/puppet] - 10https://gerrit.wikimedia.org/r/147326 [22:03:44] (03PS3) 10Scottlee: Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 [22:03:46] (03CR) 10BryanDavis: "mediawiki::sync isn't applied on all beta hosts that will need it apparently either. deployment-bastion and deployment-rsync don't have it" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147321 (owner: 10Ori.livneh) [22:14:21] (03PS4) 10Ori.livneh: Set the statsd server for the jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/146837 (owner: 10Aaron Schulz) [22:15:05] AaronSchulz: ^ review for sanity? [22:16:34] grr, stil missing the param [22:16:55] right [22:17:03] (03PS5) 10Ori.livneh: Set the statsd server for the jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/146837 (owner: 10Aaron Schulz) [22:18:25] (03CR) 10Aaron Schulz: [C: 031] Set the statsd server for the jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/146837 (owner: 10Aaron Schulz) [22:18:46] chasemp: got a cycle to spare for CR? if so, see aaron's patch immediately above ^ [22:19:10] (03PS4) 10Scottlee: Fixed spacing. [operations/dns] - 10https://gerrit.wikimedia.org/r/147168 [22:20:25] ori: do I need to merge? 
[22:20:36] i can if you +1 [22:20:42] or you can if you like, either way [22:20:57] (03CR) 10Rush: [C: 031] "can't confirm config file syntax but otherwise seems good" [operations/puppet] - 10https://gerrit.wikimedia.org/r/146837 (owner: 10Aaron Schulz) [22:21:04] thanks very much [22:21:17] (03CR) 10Ori.livneh: [C: 032] Set the statsd server for the jobrunners [operations/puppet] - 10https://gerrit.wikimedia.org/r/146837 (owner: 10Aaron Schulz) [22:36:57] (03PS1) 10Kmosher: Check KAFKA_START before JAVA_HOME [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/147338 [22:39:00] database locked on enwp? [22:39:11] apparently [22:39:24] just when I try to make an edit of course [22:40:35] 40-60seconds for s1, s2-s7 are all good though [22:40:52] https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= lies [22:41:43] db1071 according to dbtree [22:43:32] and it's gone [23:00:04] Sir, Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140717T2300), the time has come. At your service [23:01:28] hmmm nick pings fell out of the new message [23:02:16] still there in the source... [23:02:23] jouncebot: refresh [23:02:25] I refreshed my knowledge about deployments. [23:02:30] jouncebot: next [23:02:30] In 87 hour(s) and 57 minute(s): SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140721T1500) [23:02:58] Hah. [23:03:13] Never seen it spell out just how long our "weekends" are. [23:03:23] 88 hours out of 168. [23:03:43] jouncebot: help [23:03:55] Hmm, nothing for "current" or "last"? [23:03:57] Oh well. [23:04:02] nope; that would be too fancy [23:04:09] I have plans [23:04:18] Clearly I should write such a module if I want to see it? :-) [23:04:19] but they are just plans [23:04:34] yep [23:04:35] or wait [23:04:43] Sure. :-) [23:04:49] bd808: Are you doing the SWAT, then? [23:05:15] Or mwalker? [23:05:16] nope. 
I'm not on the tactical team [23:05:23] And I have to go pick up my wife from work [23:05:24] Kk. :-) [23:06:04] hmm... guess that means I draw the short straw [23:07:00] phuedx, why did you -1 your proposed change? https://gerrit.wikimedia.org/r/#/c/146651/ [23:07:10] (03CR) 10Mwalker: [C: 032] Added namespace aliases for thwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147102 (https://bugzilla.wikimedia.org/68108) (owner: 10Vogone) [23:07:17] (03CR) 10Mwalker: [C: 032] Enable Petition Extension on Labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146925 (owner: 10Mwalker) [23:07:19] (03Merged) 10jenkins-bot: Added namespace aliases for thwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147102 (https://bugzilla.wikimedia.org/68108) (owner: 10Vogone) [23:07:21] (03PS7) 10BryanDavis: beta: Apply ::mediawiki::sync on all host participating in scap [operations/puppet] - 10https://gerrit.wikimedia.org/r/144599 [23:07:26] (03Merged) 10jenkins-bot: Enable Petition Extension on Labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146925 (owner: 10Mwalker) [23:15:14] Has SWAT started? [23:15:19] yep [23:15:25] do you need something to go out StevenW ? [23:15:40] phuedx, if you dont respond soon your change will not go out [23:16:01] yeah, https://gerrit.wikimedia.org/r/#/c/146651/ [23:16:05] (03CR) 10Ori.livneh: [C: 032] beta: Apply ::mediawiki::sync on all host participating in scap [operations/puppet] - 10https://gerrit.wikimedia.org/r/144599 (owner: 10BryanDavis) [23:16:13] ah; that's the one I'm confused about [23:16:15] it has a -1 [23:16:45] mwalker: it was -1'd just so it doesn't go out too soon [23:16:52] ah... 
[23:16:53] mwalker: i've removed the -1, thanks [23:17:04] (03CR) 10Mwalker: [C: 032] Disable the anonymous signup invite experiment [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146651 (owner: 10Phuedx) [23:17:13] (03Merged) 10jenkins-bot: Disable the anonymous signup invite experiment [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146651 (owner: 10Phuedx) [23:17:14] If it's in mediawiki-config and hasn't got comments/-1s... it's fair game to deploy ;) [23:18:03] ;) [23:18:21] * phuedx took note the last time that happened [23:18:26] :) [23:19:35] !log mwalker Started scap: SWAT for {{gerrit|146651}}, {{gerrit|147102}}, {{gerrit|146925}}, {{gerrit|147331}}, {{gerrit|147332}}, and {{gerrit|147206}} [23:19:39] Logged the message, Master [23:19:42] Yay. [23:21:05] Thanks mwalker [23:21:12] (03CR) 10Andrew Bogott: [C: 032] Set /etc/mailname on first boot [operations/puppet] - 10https://gerrit.wikimedia.org/r/147314 (owner: 10Andrew Bogott) [23:27:19] (03PS3) 10Andrew Bogott: Set /etc/mailname to contain $::fqdn [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 (https://bugzilla.wikimedia.org/64962) [23:29:40] (03PS4) 10Andrew Bogott: Set /etc/mailname to contain $::fqdn [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 (https://bugzilla.wikimedia.org/64962) [23:30:58] (03CR) 10Andrew Bogott: [C: 032] Set /etc/mailname to contain $::fqdn [operations/puppet] - 10https://gerrit.wikimedia.org/r/147150 (https://bugzilla.wikimedia.org/64962) (owner: 10Andrew Bogott) [23:43:20] StevenW, James_F; scap has just finished sync common; so... your stuff should be live on the cluster now [23:43:27] phuedx, ^ [23:43:28] mwalker: Ta! [23:43:37] thanks mwalker [23:43:41] /cc StevenW [23:44:12] mwalker: Yeah, working great. Thanks!@ [23:45:35] greg-g, ... the beta cluster is dead; thoughts on the best way to troubleshoot if it was because I pushed out the petition extension? 
[23:46:17] I dont think it was my change though, because the error given is "Domain not configured" [23:46:32] i'll take a look [23:48:09] mwalker: i don't think it's related to your change; the puppet log shows the apache config changing and the service refreshing [23:48:16] andrewbogott should i just abandon this change: https://gerrit.wikimedia.org/r/#/c/147155/? [23:48:39] dogeydogey: yep [23:48:47] I think if you rebase it it will disappear anyway [23:48:48] ori: thanks [23:48:55] (03Abandoned) 10Scottlee: Fixing variable. [operations/puppet] - 10https://gerrit.wikimedia.org/r/147155 (owner: 10Scottlee) [23:49:23] who is dzahn? [23:49:36] !log mwalker Finished scap: SWAT for {{gerrit|146651}}, {{gerrit|147102}}, {{gerrit|146925}}, {{gerrit|147331}}, {{gerrit|147332}}, and {{gerrit|147206}} [23:49:42] Logged the message, Master [23:50:06] dogeydogey: mutante [23:50:22] mutante https://gerrit.wikimedia.org/r/#/c/147168/ i fixed all the white spaces, please check when you get a chance [23:54:57] greg-g: back up [23:58:16] (03CR) 10Ori.livneh: "I'd prefer not to have a jobrunner-specific HHVM class but instead to have a generic HHVM class that is provisioned on all Trusty app serv" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147086 (owner: 10Giuseppe Lavagetto)