[00:19:33] (PS2) Ori.livneh: Add abacist module & role; provision on stat1001 [puppet] - https://gerrit.wikimedia.org/r/181110
[00:21:48] (CR) Ori.livneh: [C: 2] Add abacist module & role; provision on stat1001 [puppet] - https://gerrit.wikimedia.org/r/181110 (owner: Ori.livneh)
[00:26:56] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57863 bytes in 0.084 second response time
[00:34:36] ori: here?
[00:34:52] Can I run sync-common on osmium without having to worry about the world ending?
[00:35:13] I need a hhvm production host to test a two lines of code
[00:36:25] hoo: sure, go for it
[00:52:32] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: Puppet has 2 failures
[00:52:43] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:55:54] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:59:56] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 3 failures
[01:00:24] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:01:03] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[01:01:24] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:02:03] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 4 failures
[01:03:23] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[01:06:13] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[01:06:45] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[01:11:12] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[01:13:34] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:13:43] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:14:44] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:15:32] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[01:17:37] !log Ran mysql:wikiadmin@db1033 [centralauth]> DELETE FROM bug_54847_password_resets WHERE r_username = 'Stilfehler';
[01:17:47] Logged the message, Master
[01:38:19] (PS1) Hoo man: Fix Bug54847.php for broken hashes [mediawiki-config] - https://gerrit.wikimedia.org/r/181710
[04:54:14] (CR) Legoktm: [C: 1] Fix Bug54847.php for broken hashes [mediawiki-config] - https://gerrit.wikimedia.org/r/181710 (owner: Hoo man)
[05:21:17] (CR) Legoktm: [C: 1] beta: honor log sampling for logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/181349 (owner: BryanDavis)
[06:33:46] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:57] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:30] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:33] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:35:34] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:36:06] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:17] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:40] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:45:40] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:47:06] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:49] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:47:55] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:48:07] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:49:16] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:15:17] matanya: did you investigate how to upload larger files to mw?
[08:15:37] godog: nit really, i just split them
[08:15:40] *not
[08:15:52] the upper limit is 5GB
[08:15:57] and done server side
[08:16:10] client side is limited to 1GB
[08:18:12] matanya: ack, and the videos reenconded to smaller size too?
[08:18:26] no
[08:18:42] but i can do that as well, though it is a pain
[08:20:59] matanya: if you don't mind being a lab rat, I can create a big image for your video project on one of the new servers.
[08:21:01] yeah I was wondering if it makes sense to have videos raw on commons as opposed to just reencoded and raw on e.g. archive
[08:21:02] What specs do you want?
[08:21:48] andrewbogott: feel free to experiment on me 16 or more CPU, 16GB of ram
[08:22:09] godog: i can just copy from youtube then, and no need for reecoding
[08:23:21] matanya: that'll do too yeah
[08:23:39] so that is the new approche ?
[08:27:58] matanya: disk space?
[08:28:07] matanya: merely a suggestion, perhaps somebody from multimedia team?
[08:28:11] andrewbogott: 80 GB ?
[08:28:54] godog: the raw isn't higher quality from a 720p file
[08:29:36] https://commons.wikimedia.org/wiki/File:Evaluation_I_Metrics_p1-001.webm
[08:30:05] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:30:21] vs https://www.youtube.com/watch?v=0Z6Hv91pIEs godog
[08:30:30] matanya: want to create an instance of type 'm1.gigantic' and make sure it gets scheduled on virt1010 or virt1011?
[08:30:41] yes
[08:31:38] matanya: nice, looks good
[08:32:17] godog: so the one on commos in 10mbps and the one on youtube is approx 2mbps
[08:32:22] and no real diff
[08:33:18] andrewbogott: Failed to create instance.
[08:37:51] matanya: hm, I will look
[08:39:06] oh, quota
[08:40:07] matanya: try again?
[08:40:41] yes, now it is being created
[08:44:03] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[08:44:35] <_joe_> good morning
[08:44:53] <_joe_> I'm on/off IRC this morning (on a train) but I'm working
[08:46:08] matanya: if that new instance holds up for a few days then please delete the old smaller one if you can spare it.
[08:46:14] <_joe_> so if you need me, drop me an email
[08:46:39] ok, andrewbogott i'll need to install some tools
[08:46:42] matanya: the only bad behavior we've seen on virt1011 so far is that an instance turned itself off at one point. No idea if that's a hosting problem or was just bad behavior within the instance.
[08:46:57] i can live with that
[08:47:46] btw, andrewbogott can you automaticlly install all packages on one instance on another one ?
[08:48:17] there's certainly no automated way to do it, although dpkg --list and a bit of sed work might get you a full list to jam into apt-get install
[08:48:48] i used dpkg --get-selection
[08:49:01] but that was 100% by hand
[08:49:38] <_joe_> andrewbogott: will you find time to fix the openstack templates anytime soon?
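A minimal sketch of the package-list idea discussed at 08:47-08:49, assuming the output of `dpkg --get-selections` has been saved as text on the old instance. The function name and the sample data are hypothetical; the resulting names would still have to be passed to `apt-get install` by hand on the new instance.

```python
def installed_packages(selections_text):
    """Return package names marked 'install' in `dpkg --get-selections` output."""
    packages = []
    for line in selections_text.splitlines():
        fields = line.split()
        # Each line is "<package> <state>"; skip blanks and non-install states.
        if len(fields) == 2 and fields[1] == "install":
            packages.append(fields[0])
    return packages


# Hypothetical sample, not taken from any real instance.
sample = "bash\t\t\tinstall\nffmpeg\t\t\tinstall\nold-tool\t\t\tdeinstall\n"
print(" ".join(installed_packages(sample)))
```

Filtering on the state column keeps packages marked `deinstall` from being reinstalled on the new host.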
[08:49:54] <_joe_> they're the last big offender when it comes to variable access
[08:50:16] <_joe_> (the puppet templates I mean)
[08:50:28] _joe_: sure, I just now put it on my list-of-things-to-do-soon
[08:50:42] but I'm out for the next couple of weeks, so it won't be before mid-January.
[08:52:12] <_joe_> eheh me too
[08:52:21] <_joe_> whoever does that first I'd say
[08:52:40] <_joe_> this is utterly-low-priority of course, it's just my OCD
[08:59:12] andrewbogott: just FYI, puppet is failing on this host
[08:59:33] matanya: I'll look
[08:59:41] nfs issues
[08:59:49] running it again to verify
[09:00:00] oh, in that case it probably just needs a reboot.
[09:00:11] I thought that race was fixed, but… I saw the same problem earlier today
[09:04:17] rebooting
[09:14:44] thanks andrewbogott fixed
[09:14:52] cool. Hope it holds up
[09:18:15] (PS2) KartikMistry: WIP: Content Translation configuration for Production [mediawiki-config] - https://gerrit.wikimedia.org/r/181546
[09:37:16] (CR) Alexandros Kosiaris: [C: -1] "Minor inline comments, rest LGTM" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/181080 (owner: Filippo Giunchedi)
[09:53:20] <_joe_> can someone name me a good reason why we don't include admin in standard?
[09:53:40] <_joe_> apart from the fact we didn't have hiera at the time
[09:55:52] (PS1) Andrew Bogott: Fix image_commands in the jessie manifest. [puppet] - https://gerrit.wikimedia.org/r/181722
[09:57:13] (CR) Andrew Bogott: [C: 2] Fix image_commands in the jessie manifest. [puppet] - https://gerrit.wikimedia.org/r/181722 (owner: Andrew Bogott)
[09:57:47] _joe_: we could not override per host the parameters, which pretty much boils down to we did not have hiera
[09:57:57] which means we now can :-)
[09:58:10] there is my useless answer for the year ;-)
[09:58:54] <_joe_> not useless
[09:59:01] <_joe_> you just confirmed I was right
[09:59:09] <_joe_> :)
[10:06:06] <_joe_> but well, I won't work on that today anyways
[10:06:45] <_joe_> gosh I almost have 2 weeks without working, I'm not used to this
[10:07:02] _joe_: whenever I’ve had that I’ve never managed to not work
[10:07:20] * YuviPanda wishes _joe_ good luck
[10:07:24] _joe_: you can volunteer :)
[10:07:42] <_joe_> matanya: I already do, I'm paid for a 40 hour week
[10:08:07] hehe
[10:08:16] * matanya grants _joe_ the honor of all wikimedians
[10:08:42] the line is somewhat blurry. All my ‘side projects’ have somehow been very related to my ‘get paid for this’ stuff
[10:09:00] <_joe_> also, I have other projects I'm a bit negleticting lately I should really give some love to
[10:09:59] <_joe_> but well, I'll mostly eat and meet some old friends for a few days
[10:10:25] <_joe_> so I'll probably manage not to work.
[10:11:33] nice
[10:22:30] akosiaris: for the HP devices, you also redirect console output ?
[10:22:49] yeah
[10:23:34] thanks
[11:51:27] (PS1) Alexandros Kosiaris: backups: Create an offsite pool [puppet] - https://gerrit.wikimedia.org/r/181725
[11:51:29] (PS1) Alexandros Kosiaris: backups: Support next pool directive [puppet] - https://gerrit.wikimedia.org/r/181726
[11:51:31] (PS1) Alexandros Kosiaris: backups: define next_pool for the production pool [puppet] - https://gerrit.wikimedia.org/r/181727
[11:51:33] (PS1) Alexandros Kosiaris: backups: add offsite job template [puppet] - https://gerrit.wikimedia.org/r/181728
[12:31:37] (PS1) Yuvipanda: shinken: Provide ldap information for shinkengen [puppet] - https://gerrit.wikimedia.org/r/181732
[13:16:02] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[13:29:42] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:57:16] (CR) Alexandros Kosiaris: [C: 2] backups: add offsite job template [puppet] - https://gerrit.wikimedia.org/r/181728 (owner: Alexandros Kosiaris)
[13:58:05] (CR) Alexandros Kosiaris: [C: 2] backups: define next_pool for the production pool [puppet] - https://gerrit.wikimedia.org/r/181727 (owner: Alexandros Kosiaris)
[13:59:38] (CR) Alexandros Kosiaris: [C: 2] backups: Support next pool directive [puppet] - https://gerrit.wikimedia.org/r/181726 (owner: Alexandros Kosiaris)
[13:59:51] !log package updates and reboots for several fundraising servers...
[13:59:57] Logged the message, Master
[13:59:57] (CR) Alexandros Kosiaris: [C: 2] backups: Create an offsite pool [puppet] - https://gerrit.wikimedia.org/r/181725 (owner: Alexandros Kosiaris)
[14:17:44] (PS1) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:18:46] (PS2) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:27:09] (PS3) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:27:24] (PS4) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:29:26] YuviPanda: isn't there any openstack API you could query instead?
[14:30:58] paravoid: not for puppet classes, no.
[14:31:06] err
[14:31:12] puppet roles paplied to each instance via wikitech
[14:31:13] that’s in LDAP
[14:34:16] <_joe_> we need puppetdb!
[14:36:36] (CR) Faidon Liambotis: [C: -1] "I was about to comment something about perhaps using the metadata service to encode the tenant/project ID instead of the separate LDAP que" [puppet] - https://gerrit.wikimedia.org/r/181535 (owner: Andrew Bogott)
[14:38:21] (CR) Faidon Liambotis: [C: 2] "I've killed a bunch of provider => upstart all over the tree as well. Older versions of puppet were buggy and did not do the right thing w" [puppet] - https://gerrit.wikimedia.org/r/181540 (owner: Andrew Bogott)
[14:40:22] (CR) Faidon Liambotis: [C: 1] "The only thing that base::instance-upstarts does is to set up a getty under ttyS0 (= serial console). This isn't needed with systemd as it" [puppet] - https://gerrit.wikimedia.org/r/181541 (owner: Andrew Bogott)
[14:42:01] (PS1) RobH: setting mgmt info for stat2001 [dns] - https://gerrit.wikimedia.org/r/181741
[14:42:36] (PS2) RobH: setting mgmt info for stat2001 [dns] - https://gerrit.wikimedia.org/r/181741
[14:44:57] (CR) Faidon Liambotis: [C: -1] "Clearly the case. I'm wondering what we're using it for, though. I don't think it's needed for unattended upgrades. If it is, how do they " [puppet] - https://gerrit.wikimedia.org/r/181539 (owner: Andrew Bogott)
[14:45:43] (CR) RobH: [C: 2] setting mgmt info for stat2001 [dns] - https://gerrit.wikimedia.org/r/181741 (owner: RobH)
[14:50:16] (PS5) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:51:48] _joe_: puppetdb wouldn't help much here
[14:52:06] YuviPanda: it's kind of a pity that we call out to LDAP as often we are :/
[14:52:29] considering we have this infrastructure that is supposed to be built on well-defined restful APIs :)
[14:52:33] <_joe_> paravoid: I will use the tag next time :)
[14:52:52] heh
[14:52:58] we need moar clojure!!!
[14:53:04] paravoid: add an LDAP api to wikitech! :-p
[14:53:05] we have infrastructure that’s built on well defined restful APIs?!
[14:53:06] where?
[14:53:21] also I don't think the mediawiki API classified as RESTful :-p
[14:53:31] valhallasw`cloud: he was referring to OpenStack, I think
[14:53:34] I was refering to openstack
[14:53:41] ah, fair enough
[14:54:03] I think we can get puppet roles into OS’s metadata service somehow, but that somehow feels like a much bigger project than hitting LDAP :D
[14:54:05] <_joe_> valhallasw`cloud: the mediawiki api is everything but restful
[14:54:06] maybe along with Horizon.
[14:54:23] _joe_: spiteful. that’s what it is.
[14:54:26] I was researching this just now for a different reason
[14:54:31] (https://gerrit.wikimedia.org/r/181535)
[14:54:44] so Horizon is that new shiny beatiful thing that will solve all of our problems and bring world peace along as well ?
[14:54:50] but the metadata service seems to only be about an instance querying its metadata
[14:55:00] why I am so ready to be disappointed ? :-/
[14:55:01] not a host querying all instances
[14:55:01] (PS6) Yuvipanda: Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737
[14:55:32] akosiaris: mostly switching to horizon will require so much work we hope there is enough time and energy to do that properly instead of like the last time around
[14:55:44] akosiaris: !xkcd standards
[14:55:52] https://xkcd.com/927/
[14:55:57] hehe
[14:56:15] paravoid: hmm, puppet failure status is tracked through the metadata service (and is completely useless also), so I’m guessing we can do something like that..
[14:56:22] put a JSON blob in there or something
[14:57:28] !log disabled puppet on helium while testing copy jobs
[14:57:32] Logged the message, Master
[14:58:22] also, err
[14:58:29] home NFS on labs is 94% full?
[14:58:41] CorenAFK: ^
[14:58:57] at least something’s fuller than /var
[15:00:21] YuviPanda: puppet master's api is a restful API
[15:00:40] I'm not sure if you can query for included classes
[15:00:48] and how long would that take (probably a full catalog generation)
[15:01:02] paravoid: that, plus self hosted puppetmasters
[15:01:12] grumble
[15:01:15] I also don’t know if it can query for the classes that are explicitly included
[15:01:30] right
[15:02:03] in some ways this might be somewhat like role() that _joe_ has for prod :D
[15:02:07] hmm
[15:02:13] (or not, I have only vague ides what role() does)
[15:02:47] (PS2) Yuvipanda: shinken: Provide ldap information for shinkengen [puppet] - https://gerrit.wikimedia.org/r/181732
[15:02:49] hmm, except now shinken throws up the equivalent of a NullPointerException
[15:02:52] * YuviPanda shakes fist at shinken
[15:02:57] if it can query the classes that are explicitly included (i.e. query the ldap terminus), then it wouldn't need a full catalog compilation and it would work for selfs
[15:04:39] ‘it’?
[15:04:56] and yeah, just hitting ldap works for selfs and also doesn’t involve any compilation
[15:05:58] what the fuck, shinken
[15:06:04] if hg is None:
[15:06:07] hg.configuration_errors.append(err)
[15:06:09] ...
[15:06:47] <_joe_> lol
[15:09:28] <_joe_> oh man, I'm trying to apply role() in site.pp, and I keep finding edge cases
[15:09:37] <_joe_> that don't really need to be there btw
[15:10:40] Hm. I think I need to look at storage and see where it's going away.
[15:11:06] It's probably one or two outliers consuming most of this.
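For readers puzzled by the snippet quoted at 15:06, here is a small sketch of why that pattern fails: the attribute access sits inside the branch where the object is known to be `None`. All names below (`HostGroup`, `record_error_buggy`, `record_error_guarded`) are hypothetical stand-ins, not Shinken's actual code, and the guarded variant is only one possible repair.

```python
class HostGroup:
    """Hypothetical stand-in for an object that collects configuration errors."""

    def __init__(self):
        self.configuration_errors = []


def record_error_buggy(hg, err):
    # Mirrors the quoted pattern: the attribute access happens only when
    # hg is None, so this branch always raises AttributeError.
    if hg is None:
        hg.configuration_errors.append(err)


def record_error_guarded(hg, err, fallback_errors):
    # One possible repair: report the error somewhere that exists.
    if hg is None:
        fallback_errors.append(err)
    else:
        hg.configuration_errors.append(err)


try:
    record_error_buggy(None, "unknown hostgroup")
    raised = False
except AttributeError:
    raised = True

fallback = []
record_error_guarded(None, "unknown hostgroup", fallback)
```

`AttributeError` on `None` is Python's closest analogue of the NullPointerException mentioned at 15:02:49.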
[15:11:20] YuviPanda: root@virt1000:~# curl -k -H "Accept: pson" https://virt1000.wikimedia.org:8140/production/node/i-00000615.eqiad.wmflabs | jq '.data.classes'
[15:11:52] it needed a auth.conf stanza to allow virt1000 to query for this
[15:12:09] (plus an apache2 allow from, to allow virt1000 to query itself)
[15:12:26] but you can easily make this better
[15:13:32] <_joe_> paravoid: it think that implies a full catalog compilation
[15:13:40] no it doesn't
[15:13:41] <_joe_> *I think it
[15:13:54] <_joe_> paravoid: it uses cached catalogs?
[15:14:07] no
[15:14:18] this is just the node information
[15:14:35] most of it is facts
[15:14:37] from the database
[15:14:42] <_joe_> oh and .data.classes includes information from ldap?
[15:14:42] but I think classes is from the ldap terminus
[15:14:54] <_joe_> nice, didn't know that
[15:14:56] we can clear the database for an instance and check, though
[15:15:20] <_joe_> even if you get those info from the db, it's still good enough
[15:17:34] hmm, this plus hitting nova / wikitech would work as well...
[15:17:57] I can also make the wikitech API hit the puppetmaster, but that sounds horrible + there’s no way to test any changes to wikitech atm...
[15:18:56] wikitech wants to also write and this isn't suitable for this
[15:19:39] you could write a new API for getting/setting classes, though, and use an ENC instead of the LDAP terminus
[15:19:44] and abstract LDAP from puppet altogether
[15:22:10] That may be counterproductive since the intent is to get rid of wikitech.
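As a rough illustration of the ENC idea raised at 15:19:39: a Puppet external node classifier is an executable that receives the node name as its argument and prints YAML containing a `classes` key. The sketch below assumes role assignments live in a plain dictionary; the node name, the role names, and `render_enc` are hypothetical, and a real ENC would read from whatever store the new API writes to.

```python
import sys

# Hypothetical role assignments, hard-coded only for illustration.
ROLES_BY_NODE = {
    "example-node.eqiad.wmflabs": ["role::example::webproxy", "role::example::base"],
}


def render_enc(node_name, roles_by_node):
    """Return ENC-style YAML listing the classes assigned to node_name."""
    classes = sorted(roles_by_node.get(node_name, []))
    if not classes:
        # Unknown nodes get an explicit empty list rather than a YAML null.
        return "---\nclasses: []\n"
    lines = ["---", "classes:"]
    lines.extend(f"  - {name}" for name in classes)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Puppet passes the node name as the first argument; fall back to the
    # sample node so the sketch also runs without arguments.
    node = sys.argv[1] if len(sys.argv) > 1 else "example-node.eqiad.wmflabs"
    sys.stdout.write(render_enc(node, ROLES_BY_NODE))
```

Because the classifier only reads a mapping, the same data could also feed monitoring tools that want to know which roles a node has.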
[15:22:40] you need something like this anyway
[15:22:50] yeah, if we didn’t want to get rid of wikitech, setting up a YAML backed page with a UI for both puppetclasses + puppetvars would work with an ENC
[15:22:54] since you need a UI that allows people to pick puppet classes they want applied to hosts
[15:23:10] whether that's wikitech, a horizon module or a third-party app
[15:23:14] hmm
[15:23:32] PROBLEM - Host thulium is DOWN: PING CRITICAL - Packet loss = 100%
[15:24:22] paravoid: that actually sounds like a nice solution, and will also give us ways to define roles / puppetvars for project-wide application
[15:24:30] <_joe_> why do I get paged for machines I don't have access to?
[15:24:33] hmm, and we can probably get rid of puppetvars, and just put those in hiera instead.
[15:24:48] <_joe_> YuviPanda: we should, in fact
[15:24:52] Jeff_Green: ping
[15:24:53] paravoid: the only problem with all this is that we don’t have a way to test any changes to wikitech atm.
[15:25:35] thulium is frack right?
[15:25:39] yes
[15:25:43] <_joe_> right
[15:26:09] YuviPanda: Andrew's test wikitech instance is broken?
[15:26:18] Coren: has been for a while.
[15:26:24] paravoid: hi
[15:26:37] Jeff_Green: we are getting paged about thulium being down
[15:27:01] yeah, i updated the kernel and rebooted, and the root partition is corrupt
[15:27:23] did you set downtime?
[15:27:26] i already marked it down for maintenance in icinga, not sure how to stop it paging
[15:27:33] hm
[15:27:54] it should not page anymore then
[15:28:07] I've been wondering why the maintenance mode doesn't work in icinga, could it be because it's all passive checks?
[15:28:27] <_joe_> it works, I have used it extensively
[15:29:07] did I do the wrong host or something?
[15:30:10] nope, thulium
[15:30:24] i selected all services and scheduled downtime for all of them
[15:31:05] RECOVERY - Host thulium is UP: PING OK - Packet loss = 0%, RTA = 2.76 ms
[15:31:07] should be good for another 45 min, yet it's still paging
[15:32:00] but not the server, right ?
[15:32:19] yea thats gotten me in past
[15:32:23] gotta do both services and server =P
[15:32:24] akosiaris: where do you do the server?
[15:32:37] host summary view iirc
[15:33:07] if you click on the server name it has Host Commands on the right and a "Schedule downtime for this host and all services"
[15:33:19] paravoid: Coren filed https://phabricator.wikimedia.org/T85279?workflow=create
[15:33:46] I'm probably a buffoon, but I don't see anything to that effect
[15:33:57] I did it
[15:34:11] it now added the ballon and the grey/more grey thingy
[15:34:18] next to the server name
[15:34:25] ok but I'd like to know for future reference
[15:34:44] i find the server summary by search, then click the server name
[15:35:03] yes, not the services, the service name explicitly
[15:35:06] from there I see a lot of "View..." but nothing about Schedule
[15:35:22] oh there I see it
[15:35:27] ok
[15:35:34] ok great, I've been doing it wrong. this explains a lot :-)
[15:35:37] thx
[15:35:46] don't mention it
[15:36:10] now back to figuring out how to repair the root partition :-)
[15:39:21] (PS3) Yuvipanda: shinken: Provide ldap information for shinkengen [puppet] - https://gerrit.wikimedia.org/r/181732
[15:44:06] (PS1) Gage: strongswan: puppet module [puppet] - https://gerrit.wikimedia.org/r/181742
[15:45:46] paravoid: I think I’m going to just hit LDAP for now. shouldn’t cause *that* much additional load, I think. Will move it over to the ENC one when we do move to using the ENC.
[15:46:00] it's not about of load at all
[15:46:06] s/of//
[15:46:25] true, but I don’t want to switch to one thing now (nova + puppetmaster) and then switch again.
[15:46:39] hmm
[15:46:43] it won’t be *that* much of a switch
[15:47:00] the *real* reason, of course, is I’m lazy, and wanted this so I can start writing checks against roles :)
[15:47:14] :-)
[15:47:14] and don’t want to start yakshaving now.
[15:47:24] no opinion
[15:47:27] I don't care all that much :P
[15:47:50] ah, heh :)
[15:48:30] "Yak shaving"?
[15:52:18] <_joe_> YuviPanda: yak shaving meaning?
[15:52:49] https://en.wiktionary.org/wiki/yak_shaving
[15:53:02] uh
[15:53:04] that’s a terrible page
[15:53:22] http://www.urbandictionary.com/define.php?term=yak%20shaving
[15:53:25] > Any seemingly pointless activity which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you're working on.
[15:53:42] original story I remember hearing involves someone deciding to knit a sweater
[15:53:55] and then 4 days later finds themselves in the freezing himalayas trying to shave a yak
[15:54:27] Coren: _joe_ ^
[15:54:46] Ah.
[15:55:57] Coren: want to +1? https://gerrit.wikimedia.org/r/#/c/181732/
[15:56:36] this creates hostgroups with all the puppet roles ever applied directly to a node, making it easy to target nodes with particular set of roles for monitoring
[15:56:45] all tools proxies should have a check_http, for example
[15:57:38] hmm, my power is going to die. brb.
[15:57:57] (CR) coren: [C: 1] "That looks (a) sane, and (b) like a good idea" [puppet] - https://gerrit.wikimedia.org/r/181732 (owner: Yuvipanda)
[15:58:32] <_joe_> lol
[16:08:18] Gah! Someone changed something on the puppetmaster config that breaks everything.
[16:08:36] * Coren hunts for the changeset.
[16:33:15] _joe_: Puppet fails in labs because " Error 400 on SERVER: 'find' is already in the '/certificate_request' ACL" but the only reference to an ACL of that name I can find in puppet is a change you made some time ago to /modules/puppet/templates/auth-self.conf.erb which shouldn't matter to virt1000. Any insight?
[16:45:39] (PS1) BBlack: Install mcelog and intel-microcode everywhere [puppet] - https://gerrit.wikimedia.org/r/181743
[16:47:08] sometimes yak shaving has negative connotations, too
[16:47:50] like we when you start looking for a real bug, but end up correcting trivial nits all over the codebase as you go, generating lots of pointless commit traffic that does nothing to solve the real problem :)
[16:56:33] bblack: that sounds more like yak nailpainting :)
[16:56:55] Also I'm locked out of the house.
[16:57:05] gj
[16:57:11] YuviPanda|brb: That seems like a bad idea. Why did you do that?
[16:58:04] Coren: looks like everyone went to a Christmas party without telling me and took the keys
[16:58:26] So much fun
[16:58:38] http://www.capricorn.org/~akira/home/lockpick/mitlg-a4.pdf
[16:58:51] Oh, if they're at a party I'm sure they /are/ having so much fun. :-)
[16:59:23] bblack i got in trouble in college for slipping printed copies of that under the door of everyone in my dorm ;)
[16:59:52] I built my tools by grinding down tiny allen keys and such when I read that doc in highschool :)
[17:00:06] street sweeper bristles!
[17:00:17] Heh
[17:00:54] So I get to sit outside and IRC from my.phone until that runs out
[17:09:21] (PS1) Sn1per: Add version number 3.0 to CC license Phab footer [puppet] - https://gerrit.wikimedia.org/r/181744
[17:12:23] Yuvi|LockedOut: At least, you are someplace where exposure is not an issue.
[17:25:58] Coren: true except for the mosquitoes. Lots. And lots of mosquitoes
[17:29:03] I hear if you hop on one foot mosquitoes won't bite you
[17:29:09] just a thought in case you wanted to try it :)
[17:29:44] I have this toy on my desk to remind me about yak shaving -- www.amazon.com/Palisades-Ren-Stimpy-Shaven-Action/dp/B000302CQM/ -- I tend to use the term in the negative connotation. Why shave a yak when you can just buy yarn?
[17:41:27] (PS1) RobH: setting mac info for codfw mw servers [puppet] - https://gerrit.wikimedia.org/r/181746
[17:42:22] (CR) RobH: [C: 2] setting mac info for codfw mw servers [puppet] - https://gerrit.wikimedia.org/r/181746 (owner: RobH)
[18:13:08] chasemp: heh. I usually just let them bite. Being zen about it is probably the only thing that's going to work
[18:13:22] Also when these guys come back I'm going to murder them one by one slowly
[18:14:43] (PS2) Rush: Add version number 3.0 to CC license Phab footer [puppet] - https://gerrit.wikimedia.org/r/181744 (owner: Sn1per)
[18:15:04] (CR) Rush: [C: 2] "per https://phabricator.wikimedia.org/T644#13818 this should be gtg" [puppet] - https://gerrit.wikimedia.org/r/181744 (owner: Sn1per)
[18:15:28] (CR) Rush: [V: 2] "per https://phabricator.wikimedia.org/T644#13818 this should be gtg" [puppet] - https://gerrit.wikimedia.org/r/181744 (owner: Sn1per)
[18:42:11] !log manually running debmirror on carbon to sync over the holidays; "pkill -f debmirror" should suffice if there is a problem
[18:42:15] Logged the message, Master
[18:50:25] paravoid: Syncing Jessie?
[19:10:20] /last greg-g
[19:36:04] any deploy gods around? PageTriage is still pretty badly broken on enwiki. It's an easy 1-line fix https://gerrit.wikimedia.org/r/#/c/181753 , but no greg-g around.
[19:45:58] YuviPanda|futex my browsers get locked out all the time
[19:46:11] spagewmf: heh :D
[20:04:10] spagewmf: just do it
[20:05:28] ori: OK
[23:15:12] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 28847 seconds ago, expected 28800
[23:20:16] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 29148 seconds ago, expected 28800
[23:25:05] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 29448 seconds ago, expected 28800
[23:30:06] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 29748 seconds ago, expected 28800
[23:35:10] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 30047 seconds ago, expected 28800
[23:40:09] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 30348 seconds ago, expected 28800
[23:43:10] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:43:10] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Puppet has 2 failures
[23:44:40] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[23:44:40] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 2 failures
[23:45:10] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 30648 seconds ago, expected 28800
[23:48:22] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:48:44] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:49:28] (CR) Yuvipanda: [C: 2 V: 2] Use LDAP as source for instance and project info [software/shinkengen] - https://gerrit.wikimedia.org/r/181737 (owner: Yuvipanda)
[23:49:42] (PS4) Yuvipanda: shinken: Provide ldap information for shinkengen [puppet] - https://gerrit.wikimedia.org/r/181732
[23:50:11] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 30948 seconds ago, expected 28800
[23:50:47] (CR) Yuvipanda: [C: 2] shinken: Provide ldap information for shinkengen [puppet] - https://gerrit.wikimedia.org/r/181732 (owner: Yuvipanda)
[23:54:37] greg-g: you're not around, but Collaboration team is going to deploy a 2-line JS fix to PageTriage. I'll send e-mail
[23:55:17] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet last ran 31247 seconds ago, expected 28800
[23:55:20] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[23:57:08] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:57:08] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:58:48] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:59:18] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures