[00:00:21] Ryan_Lane: is this the final hour? :(
[00:01:03] https://www.youtube.com/watch?v=9jK-NcRmVcw
[00:02:05] well, I'm technically going to be in the office monday and tuesday too :D
[00:02:31] Ryan_Lane: Are you staying on as a volunteer sysadmin?
[00:02:34] greg-g: I was thinking of that before I clicked the link
[00:02:40] (Congrats, BTW.)
[00:02:44] ah, seems my info leaked out :D
[00:02:59] Elsie: I'll be on contract for a while
[00:03:05] and after that, yeah, volunteer
[00:03:09] Nice. :-)
[00:03:21] I should probably send a notice to labs-l
[00:03:39] Wouldn't you be more surprised if I didn't know? ;-)
[00:03:55] I'm surprised anyone outside of the foundation knows ;)
[00:03:57] (such a rocking song ;) )
[00:04:37] bd808: so, I realized... even if I add an openid provider on wikitech there's still no way to limit it to specific groups
[00:04:43] I run Wikimedia.
[00:04:47] kibana, that is
[00:05:05] which I'm sure you'd want to do (limit to wmf group)
[00:05:25] so I started thinking of ways to handle that :D
[00:05:45] nginx + openid + lua may be doable
[00:05:53] where lua uses http://www.keplerproject.org/lualdap/
[00:06:02] then caches group info into redis
[00:06:08] Ryan_Lane: Ah. Yeah we'd need some sort of SSO data that Apache can process to limit
[00:06:20] then we could add openid + ldap group support to yuviproxy
[00:06:32] http://luaopenid.luaforge.net/
[00:06:34] there's that too
[00:06:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[00:06:51] lua openid + lua ldap would work
[00:07:23] bleh. it's only a server implementation of openid
[00:07:53] What we really need is some flavor of SAML I guess
[00:08:08] ewww
[00:08:23] a simple openid client in lua would suffice :)
[00:09:24] it would honestly be nice if kibana3 had at least a small server shim :)
[00:09:33] So wikitech would be an OpenID provider, send creds to Apache and then ??? How do we get group info?
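The "look up LDAP groups, then cache group info into redis" idea floated above could be sketched roughly as follows. This is only an illustration in Python rather than the lua the chat proposes: `GroupCache`, `lookup`, and the TTL value are hypothetical names and choices, the in-memory dict stands in for redis, and `lookup` stands in for a real LDAP query.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class GroupCache:
    """Caches per-user group sets for a fixed time-to-live.

    Assumptions (not from the chat): `lookup` is a stand-in for an LDAP
    query, and the dict below is a stand-in for a redis instance.
    """

    def __init__(self,
                 lookup: Callable[[str], Iterable[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # user -> (expiry time, groups)
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def groups_for(self, user: str) -> FrozenSet[str]:
        """Return the user's groups, hitting the backend only on a cache miss."""
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and cached[0] > now:
            return cached[1]
        groups = frozenset(self._lookup(user))
        self._entries[user] = (now + self._ttl, groups)
        return groups

    def is_member(self, user: str, group: str) -> bool:
        """True if `user` is currently listed in `group`."""
        return group in self.groups_for(user)
```

A proxy would call `is_member(user, "wmf")` after the OpenID login succeeds. The TTL bounds how long a removed group membership keeps working, so a short value trades more LDAP traffic for faster revocation.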
[00:09:33] so it could do the auth
[00:09:44] group info would come from lualdap
[00:09:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[00:09:58] another way we could do this.....
[00:10:13] is to have wikitech add group info to the openid attributes
[00:11:09] we could add a hook to the openid provider and have LdapAuthentication implement the hook. it could pull the group info and add it to the openid info
[00:11:27] we could use http://openid.net/specs/openid-attribute-exchange-1_0.html
[00:11:40] (hah, all of the recommended next videos after Final Countdown are equally 80s awesome)
[00:12:04] ah, seems my info leaked out :D < Elsie lives in the dumpsters behind the building
[00:12:11] http://findingscience.com/mod_auth_openid/
[00:12:24] Ryan_Lane: you know OpenID Attribute Exchange is SAML right? :)
[00:12:44] p858snake|l: I'm still trying to figure out which WMFer is actually leading a double life ;)
[00:12:44] ^^ mod_auth_openid can check attributes
[00:12:59] bd808: SAML is an entire protocol ;)
[00:13:12] * paravoid hears SAML and runs
[00:13:16] paravoid: indeed
[00:13:38] AuthOpenIDAXRequire
[00:13:52] can check the group info via attribute exchange :)
[00:14:00] greg-g: if you are talking about the karaoke bar performances, i hear Reedy does a pretty mean rendition
[00:14:31] p858snake|l: hah
[00:14:45] mod_auth_openid looks surprisingly nice
[00:14:47] Ryan_Lane: I really really really don't want to go near SAML ever again in my life
[00:14:53] paravoid: I don't either
[00:14:57] I've had enough for a lifetime
[00:14:57] and I'm not suggesting it :)
[00:14:59] greg-g: I witnessed at least one leak on a logged irc channel
[00:15:02] heh
[00:15:35] Running a SAML provider is a PITA, but hacking up a consumer is pretty easy.
[00:15:50] SAML is a little overkill for what we need
[00:16:06] we just need an openid provider (with attribute exchange) and forced consumers
[00:16:29] bd808: of ryan's departure?
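For the "provider adds group info via attribute exchange, consumer checks it" approach discussed above, a consumer-side check might look something like this sketch. It follows the key layout of an OpenID Attribute Exchange 1.0 fetch response (`type.<alias>`, `count.<alias>`, `value.<alias>.<n>`), but `GROUPS_TYPE` is a made-up attribute URI (the chat never names one), the function names are hypothetical, and the response is assumed to have had its signature verified already.

```python
from typing import Dict, List

AX_NS = "http://openid.net/srv/ax/1.0"
# Hypothetical attribute type URI for group membership; not from the chat.
GROUPS_TYPE = "https://example.org/ax/ldap-groups"


def ax_values(params: Dict[str, str], type_uri: str) -> List[str]:
    """Return every value sent for `type_uri` in an AX fetch response.

    `params` is the already-verified set of openid.* response parameters.
    """
    # Find which namespace alias the provider used for AX (often "ax").
    ns = next((key[len("openid.ns."):] for key, value in params.items()
               if key.startswith("openid.ns.") and value == AX_NS), None)
    if ns is None:
        return []
    prefix = f"openid.{ns}."
    type_prefix = prefix + "type."
    # Find the attribute alias that maps to the requested type URI.
    alias = next((key[len(type_prefix):] for key, value in params.items()
                  if key.startswith(type_prefix) and value == type_uri), None)
    if alias is None:
        return []
    count = params.get(f"{prefix}count.{alias}")
    if count is None:
        # Single-valued form: openid.<ns>.value.<alias>
        single = params.get(f"{prefix}value.{alias}")
        return [] if single is None else [single]
    # Multi-valued form: openid.<ns>.value.<alias>.<n>, numbered from 1.
    found = (params.get(f"{prefix}value.{alias}.{i}")
             for i in range(1, int(count) + 1))
    return [value for value in found if value is not None]


def allowed(params: Dict[str, str], required_group: str) -> bool:
    """True if the response lists `required_group` among the user's groups."""
    return required_group in ax_values(params, GROUPS_TYPE)
```

Keeping wiki groups and LDAP groups as two separate attribute types would only mean calling `ax_values` once per type URI.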
[00:16:39] Ryan_Lane: Sure. I'm not advocating.
[00:16:45] greg-g: Yeah.
[00:16:47] bd808: it's easy. until it's not.
[00:16:51] indeed
[00:16:56] actually, I disagree
[00:16:59] it's always hard
[00:17:57] we'd probably want two attributes
[00:18:00] wiki groups and ldap groups
[00:18:22] The hardest part of maintaining the last SAML integration I wrote was getting the client to actually send their signing cert *before* it expired.
[00:18:28] we could have the openid extension add wiki groups and LdapAuthentication can add the ldap groups
[00:19:04] And explaining to their "professional security staff" that the public cert was indeed safe to send out of their network.
[00:19:06] alternatively we could do ldap group sync and just have openid do it
[00:19:20] I'm not sure I want to do group sync
[00:21:31] Having safe group based auth available to labs sounds like a pretty nice feature, but I can live without it for now. I used the highly secure "static password published on officewiki" protocol.
[00:21:45] * Ryan_Lane nods
[00:21:51] well, I want this for other things ;)
[00:22:04] bd808: it's still possible to do openid
[00:22:10] but openid + a locally managed group file
[00:22:28] That would work for now too.
[00:22:39] either way, it seems now is a good time to add the provider :)
[00:22:46] then you can decide what you want to do. heh
[00:23:02] :)
[00:23:06] I really want to be able to drop LDAP auth for everything we do
[00:23:17] and have all of our services use openid + groups
[00:23:23] typing passwords sucks
[00:24:02] it also adds two-factor auth for all the services too
[00:24:46] I was sort of surprised that you folks don't do ssl client certs for stuff like this.
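The interim "openid + a locally managed group file" option mentioned above could be as small as the following sketch. It assumes a `group: member member ...` layout modelled on Apache's AuthGroupFile format, with OpenID identity URLs as members; the file layout, function names, and example URLs are all illustrative assumptions, not anything decided in the chat.

```python
from typing import Dict, Set


def parse_group_file(text: str) -> Dict[str, Set[str]]:
    """Parse 'group: member member ...' lines into a group -> members map.

    Assumed layout (hypothetical): one group per line, members separated
    by whitespace, blank lines and '#' comments ignored.
    """
    groups: Dict[str, Set[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, since member URLs contain colons.
        name, sep, members = line.partition(":")
        if not sep:
            continue  # not a 'group: ...' line; skip it
        groups.setdefault(name.strip(), set()).update(members.split())
    return groups


def in_group(groups: Dict[str, Set[str]], identity: str, group: str) -> bool:
    """True if the OpenID identity is listed under `group`."""
    return identity in groups.get(group, set())
```

The obvious cost is that the file has to be edited by hand whenever membership changes, which is why it only makes sense as a stopgap until group info can come from the provider.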
[00:25:02] it's such a pain in the ass to manage that
[00:25:08] The CA is for sure
[00:25:13] all of it is
[00:25:26] because then it's attached to the user's computer completely
[00:25:39] or you need to have the keypair all over the place
[00:25:43] and it's not two factor
[00:26:04] I guess it is if the user puts a password on the key, but no one does that
[00:26:30] too hard :)
[00:27:40] * bd808 needs to finish writing an interview eval and find a cold beer
[00:28:18] damn, too many things I want to do. now I need to figure out how to add AX to the openid provider :D
[00:36:28] this actually looks really easy to support
[00:47:23] (CR) Manybubbles: "Fixing spelling." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/95720 (owner: Manybubbles)
[00:52:58] (PS2) Manybubbles: Puppet configuration for new elasticsearch servers [operations/puppet] - https://gerrit.wikimedia.org/r/95720
[00:53:58] (PS3) Manybubbles: Puppet configuration for new elasticsearch servers [operations/puppet] - https://gerrit.wikimedia.org/r/95720
[00:56:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[01:00:02] (PS4) Manybubbles: Puppet configuration for new elasticsearch servers [operations/puppet] - https://gerrit.wikimedia.org/r/95720
[01:01:26] (CR) Manybubbles: [C: -1] "Still a work in progress despite the spelling corrections. Still haven't linted properly." [operations/puppet] - https://gerrit.wikimedia.org/r/95720 (owner: Manybubbles)
[01:37:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[01:38:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[02:07:16] !log LocalisationUpdate completed (1.23wmf3) at Sat Nov 16 02:07:15 UTC 2013
[02:07:36] Logged the message, Master
[02:12:51] !log LocalisationUpdate completed (1.23wmf4) at Sat Nov 16 02:12:51 UTC 2013
[02:13:07] Logged the message, Master
[02:33:06] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[02:33:35] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 16 02:33:35 UTC 2013
[02:33:51] Logged the message, Master
[02:34:07] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[03:07:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[03:10:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[03:57:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[04:38:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[04:39:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[05:25:36] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:26:27] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[05:50:26] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 200,000
[05:53:36] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:08:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[06:11:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[06:58:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[07:39:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[07:40:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[07:49:46] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: No successful Puppet run in the last 3 hours
[09:09:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[09:12:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[09:44:06] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Sat Nov 16 09:43:58 UTC 2013
[09:59:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[10:40:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[10:41:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[12:10:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[12:13:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[13:00:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[13:41:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[13:42:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[15:11:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[15:14:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[15:26:06] PROBLEM - Disk space on cp1045 is CRITICAL: DISK CRITICAL - free space: /srv/sda3 12523 MB (4% inode=99%): /srv/sdb3 12434 MB (3% inode=99%):
[16:01:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[16:42:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[16:43:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[17:51:36] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:54:36] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[18:12:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[18:15:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[19:02:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[19:05:37] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:06:36] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[19:07:21] (PS1) Odder: (bug 44629) Clean up $wgSitename, $wgMetaNamespace etc. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95796
[19:41:33] (CR) Matanya: "Regarding the module structure: I would combine reporter.pp, communitymetrics.pp, crons.pp and auditlog.pp into a bigger reporting.pp that" [operations/puppet] - https://gerrit.wikimedia.org/r/94075 (owner: Dzahn)
[19:43:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[19:44:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[20:06:06] (CR) Matanya: "nitpick: align arrows." [operations/puppet] - https://gerrit.wikimedia.org/r/94407 (owner: Dzahn)
[20:39:22] (CR) Matanya: "check out https://github.com/pyr/check-graphite and https://github.com/obfuscurity/nagios-scripts they seem to go the same direction you a" [operations/puppet] - https://gerrit.wikimedia.org/r/77366 (owner: Pyoungmeister)
[21:13:46] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 3 hours
[21:16:46] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 3 hours
[22:03:46] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 3 hours
[22:05:24] (Abandoned) Hashar: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366 (owner: Pyoungmeister)
[22:05:38] matanya: Peter is no more part of wmf
[22:06:16] hashar: I know, does that mean he doesn't push any code ever?
[22:06:26] matanya: and I must agree there is no point in reinventing the wheel :] so I just abandoned the change
[22:07:04] hashar: oh, ok. i might push those by myself the coming days if i find time
[22:07:31] matanya: talk to Ori-l about it as well
[22:07:41] he is becoming the performance engineer nowadays :]
[22:07:45] hashar: you may review my recent patch if you wish :)
[22:07:47] taking over graphite / ganglia
[22:08:13] Ah cool! I have a lot i'd like to contribute in those aspects
[22:08:41] and there is a loooooot to do
[22:09:44] matanya: what might be nice is to integrate https://github.com/etsy/skyline
[22:09:55] detects anomaly in graphite metrics
[22:10:13] hashar: I thought about beginning with graphios
[22:10:40] yeah that too
[22:11:07] I wish we could phase out nagios honestly
[22:11:35] I would prefer not to :) it works and it is very stable
[22:16:52] Reedy: why did you need unzip on tin?
[22:42:42] matanya: to upload his hacks compressed locally with WinZip :D
[22:43:05] I am off
[22:43:08] hashar: LOL !
[22:43:09] have a nice Sunday
[22:43:13] :-D
[22:43:22] night, you too, see you maybe ? :)
[22:44:46] PROBLEM - Puppet freshness on fenari is CRITICAL: No successful Puppet run in the last 3 hours
[22:45:46] PROBLEM - Puppet freshness on bast1001 is CRITICAL: No successful Puppet run in the last 3 hours
[22:49:45] off and sleeping