[00:00:52] we were supposed to get mounting hardware on wednesday [00:01:07] I haven't checked in with robh or cmj to see if they arrived [00:01:25] I need to do the first one with cmj because I'm not sure how the host will come back up with two more drives [00:01:49] (i.e. will it shift around existing drives? will it move the OS? will it even boot? will they just be sdm and sdn?) [00:02:23] anyway, it's now past 2am. my bed time. [00:02:24] g'night. [00:02:30] see you [00:03:07] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [00:19:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:19:28] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [00:20:31] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [00:27:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [00:39:07] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [01:01:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:08:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.392 seconds [01:20:32] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [01:20:32] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [01:20:32] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [01:42:35] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 321 seconds [01:42:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:45:17] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:50:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336
bytes in 9.874 seconds [02:02:32] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [02:24:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:32] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [02:31:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.688 seconds [02:37:56] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Fri Jun 1 02:37:44 UTC 2012 [03:09:35] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave]BR [03:46:02] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [03:47:05] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms [04:46:12] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [04:50:42] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [04:55:30] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [05:01:31] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [05:05:24] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [05:12:48] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [05:12:48] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [05:12:48] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [05:22:42] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [06:06:24] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [06:26:12] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet 
loss = 100% [06:27:15] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [06:45:40] !log reboot dataset2, kernel update and security updates [06:45:44] Logged the message, Master [06:47:38] PROBLEM - Host dataset2 is DOWN: CRITICAL - Host Unreachable (208.80.152.185) [06:49:03] yes, it certainly is. 300 days without fsck so that's how it is [06:53:11] RECOVERY - Host dataset2 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [06:53:29] it's not really up, it's still fscking [06:53:40] and there goes the rest of the boot [08:17:10] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 [08:20:01] New patchset: ArielGlenn; "basedir arg for starting python script; formatting; proper arg handling" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/9606 [08:20:47] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9606 [08:20:49] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/9606 [08:36:05] New patchset: ArielGlenn; "turn off mod_compress on dataset2 (http://redmine.lighttpd.net/issues/2391 ?)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9607 [08:36:25] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9607 [08:37:11] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9607 [08:37:14] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9607 [09:00:52] PROBLEM - Puppet freshness on srv300 is CRITICAL: Puppet has not run in the last 10 hours [09:00:52] PROBLEM - Puppet freshness on srv298 is CRITICAL: Puppet has not run in the last 10 hours [09:03:52] PROBLEM - Puppet freshness on srv254 is CRITICAL: Puppet has not run in the last 10 hours [09:07:55] PROBLEM - Puppet freshness on srv228 is CRITICAL: Puppet has not run in the last 10 hours [09:07:55] PROBLEM - Puppet freshness on srv248 is CRITICAL: Puppet has not run in the last 10 hours [09:07:55] PROBLEM - Puppet freshness on srv284 is CRITICAL: Puppet has not run in the last 10 hours [09:07:55] PROBLEM - Puppet freshness on srv292 is CRITICAL: Puppet has not run in the last 10 hours [09:07:55] PROBLEM - Puppet freshness on srv297 is CRITICAL: Puppet has not run in the last 10 hours [09:09:52] PROBLEM - Puppet freshness on srv229 is CRITICAL: Puppet has not run in the last 10 hours [09:10:55] PROBLEM - Puppet freshness on srv267 is CRITICAL: Puppet has not run in the last 10 hours [09:10:55] PROBLEM - Puppet freshness on srv208 is CRITICAL: Puppet has not run in the last 10 hours [09:12:53] PROBLEM - Puppet freshness on srv259 is CRITICAL: Puppet has not run in the last 10 hours [09:20:05] PROBLEM - Puppet freshness on srv241 is CRITICAL: Puppet has not run in the last 10 hours [09:21:17] PROBLEM - Puppet freshness on srv299 is CRITICAL: Puppet has not run in the last 10 hours [09:22:20] PROBLEM - Puppet freshness on srv231 is CRITICAL: Puppet has not run in the last 10 hours [09:22:20] PROBLEM - Puppet freshness on srv295 is CRITICAL: Puppet has not run in the last 10 hours [09:25:20] PROBLEM - Puppet freshness on srv226 is CRITICAL: 
Puppet has not run in the last 10 hours [09:25:20] PROBLEM - Puppet freshness on srv237 is CRITICAL: Puppet has not run in the last 10 hours [10:03:01] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/8454 [10:03:04] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8454 [10:04:47] RECOVERY - Puppet freshness on sodium is OK: puppet ran at Fri Jun 1 10:04:19 UTC 2012 [11:01:52] New patchset: Pyoungmeister; "adding db1001 and 1020 back into db-secondary.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9615 [11:01:58] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9615 [11:02:54] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [11:03:13] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9615 [11:03:15] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9615 [11:05:27] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [11:22:34] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [11:22:34] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [11:22:34] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [11:22:38] New patchset: Reedy; "Only load checkuser if cluster is PTMPA" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9617 [11:22:38] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9617 [12:03:14] heyaa mutante [12:03:16] you there? 
[12:03:52] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [12:04:28] I'm not sure he is [12:04:38] Not seen him about today, and he was going out with some of the rest of ops... [12:05:30] ay [12:05:31] e [12:05:31] k [12:27:33] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [13:19:13] New patchset: Ottomata; "Puppetizing gerrit-stats on stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9627 [13:19:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9627 [13:22:11] New patchset: Ottomata; "site.pp - Giving access on oxygen to Ryan Faulkner. RT3063" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9628 [13:22:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9628 [13:31:06] anybody around to approve those? [13:31:10] or is berlin too much fun? [13:35:15] ryan_lane? [13:35:35] maplebed? [13:35:54] heh ottomata [13:36:07] hey jeremyb [13:36:09] can you approve https://gerrit.wikimedia.org/r/#/c/9628/ [13:36:18] * jeremyb has no special rights [13:36:27] :( [13:36:42] i did open them though and i can read. and maybe +1 [13:37:36] New patchset: Pyoungmeister; "putting db1017 back as the secondary master" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9630 [13:37:42] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9630 [13:37:53] can anyone purge Varnish caches?
I know it's lame but we're working on making it not needed [13:39:20] MaxSem: sec [13:39:54] New patchset: Pyoungmeister; "putting db1017 back as the secondary master" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9630 [13:39:59] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9630 [13:41:26] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9630 [13:41:29] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9630 [13:41:59] MaxSem: done [13:43:29] Ryan_Lane: thanks [13:48:58] drdee: so, why not host this repo on gerrit? [13:49:05] cmjohnson1: how'd mw31 go? [13:49:19] also, moin ;) [13:49:21] jeremyb: because........... [13:49:33] the git / gerrit system is broken [13:49:46] we were trying to get a repo setup for three days [13:49:48] no success [13:49:57] ie, nobody had time to do it [13:49:58] hmmmmmm [13:50:04] so we do it ourselves [13:50:05] s [13:50:10] we cannot afford to wait all the time [13:50:34] * jeremyb hasn't noticed you try. but i don't see everything... [13:50:40] demon's not in berlin iirc [13:51:13] cmjohnson1: i'll move it back to spares then? [13:52:37] oohhh a cmjohnson1 [13:52:45] only 1! [13:52:47] jeremyb [13:52:55] everyone needs the ability to create repos whenever they want [13:53:02] ottomata: agreed! 
[13:53:13] until that happens, gerrit is really hard to use for stuff like this [13:53:28] ottomata: although approximately as important is creating branches whenever they want [13:53:37] we are running out of patience [13:53:41] true [13:53:43] both are good [13:53:52] i can create branches on analytics gerrit projects [13:53:54] but only because I am an admin [13:53:57] it's possible to create branches [13:54:00] for that group [13:54:04] Ryan_Lane: for mortals [13:54:15] the fact that gerrit does not support customizable rights is really a pain [13:54:16] yes, that's possible [13:54:25] ottomata: although there may be some req that creating either by unprivileged people be limited to some namespace? but that's ok IMO [13:54:37] repos should be free to create [13:54:42] as long as it is configurable, that's fine [13:54:42] there is absolute no harm [13:54:43] aananyyyyyway, its no biggie [13:54:55] we can move this stuff to gerrit when gerrit is ready i think [13:55:03] i put the origin url in variable! [13:55:06] easy to change later :) [13:55:14] really there should just be a gitorious instance. and some repos can be configured to mirror to gerrit and some don't have to [13:55:34] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [13:55:37] until gerrit get's it's act together ;) [13:56:13] I think this version of gerrit allows us to give access to let anyone create a repo [13:56:14] also, you can maintain access to your own repo [13:56:28] ottomata: well idk where this will run but there may be some security implications. you're fetching a cleartext gerrit URL and have no hardcoded hashes [13:56:40] Ryan_Lane: demon says no [13:56:40] so, yeah, I don't think your points are valid. waiting for a repo does suck, though [13:57:10] jeremyb: says no on what? 
[13:57:11] ottomata: s/gerrit/git/ of course [13:57:22] Ryan_Lane: creators must have full admin rights [13:57:28] Ryan_Lane: no middle ground right to grant [13:57:37] gitorious is like a *really* shitty github [13:57:42] ah cmjohnson1 I was wondering whether the second eth port on dataset1 has been cabled up or not [13:57:48] if not I want to schedule it (but not today) [13:57:48] hah [13:58:16] jeremyb: . :a cleartext gerrit URL and have no hardcoded hashes" ??? [13:58:25] the repo is public readable [13:59:08] er [13:59:10] dataset2 [13:59:10] sorry [13:59:14] ottomata: the repo itself is not controlled by ops. it's not served over a secure channel. there's no verification of what's been cloned other than against what came over the secure channel [13:59:19] dataset1 is deaddeaddead and thank goodness [13:59:19] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused [13:59:33] ottomata: over the insecure channel* [13:59:44] cmjohnson1: mw31? [13:59:48] (nagios ^^) [13:59:50] hm [13:59:51] ok [13:59:58] i guess https wouldn't be enough then? [14:00:53] cmjohnson1: prolly puppet didn't get to it yet [14:01:03] cmjohnson1: apache's not started on boot. it's started by puppet [14:01:40] it is possible. it will run by itself within 30 mins i think [14:01:44] i can't [14:01:58] ok I will note that and add to my list [14:01:59] thanks [14:02:03] ottomata: https would be a minor improvement. the state of PKI is pretty bad ;-( [14:02:08] ottomata: IMO [14:02:56] ottomata: maybe some actual ops person can voice an opinion [14:02:59] or, what if I cloned via ssh using diederik's key on stat1 [14:03:01] would that be ok? 
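On the "no hardcoded hashes" concern raised above: one way to get some verification of what was cloned over an untrusted channel is to pin a full commit hash and compare it after the clone. The following is only a sketch under assumptions: `PINNED_COMMIT`, `matches_pin`, `head_commit` and `verify_clone` are hypothetical names, the hash is a placeholder, and it assumes a `git` binary new enough to support `-C`. It is not what the puppet class under review does, and it only helps as far as the pinned hash itself is trusted.

```python
import re
import subprocess

# Placeholder value, NOT a real commit; a deployment would hardcode a reviewed hash.
PINNED_COMMIT = "0123456789abcdef0123456789abcdef01234567"

_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def matches_pin(actual, pinned):
    """Return True only if both values are full 40-hex-digit hashes and are equal."""
    actual = actual.strip().lower()
    pinned = pinned.strip().lower()
    # Refuse abbreviated hashes: a short prefix is much easier to collide with.
    if not _FULL_SHA1.fullmatch(actual) or not _FULL_SHA1.fullmatch(pinned):
        return False
    return actual == pinned


def head_commit(repo_dir):
    """Ask git for the commit currently checked out in repo_dir."""
    out = subprocess.check_output(["git", "-C", repo_dir, "rev-parse", "HEAD"])
    return out.decode("ascii").strip()


def verify_clone(repo_dir, pinned=PINNED_COMMIT):
    """Raise if the checked-out commit is not the pinned one."""
    if not matches_pin(head_commit(repo_dir), pinned):
        raise RuntimeError("clone at %s does not match pinned commit" % repo_dir)
```

Calling `verify_clone("/path/to/checkout")` after the clone step would fail loudly on a mismatch; the path here is illustrative only.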
[14:03:23] i would say ideally generate a new key just for this clone operation [14:03:26] david schoonover is hosting this repo right now, and he has to manually put keys in place for ssh access [14:03:47] well, in order to run the python script, I had to generate a key for a real user (diederik) for gerrit [14:03:55] this script gets stats from gerrit via ssh [14:04:00] so, i might as well use the same key there [14:04:09] hrmmm [14:04:39] gerrit already has a dummy user on manganese that ssh's in and runs commands [14:05:04] can we make another role account in the gerrit DB? (or ldap or wherever) [14:05:14] for? [14:05:34] oh yeah, that'd be cool [14:05:42] for what? [14:05:50] wouldn't the private key have to be on stat1 then though? [14:06:32] New review: Jeremyb; "looks good, key exists and is enabled in admins.pp" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/9628 [14:06:48] ottomata: if that's where this class is used [14:06:54] Ryan_Lane: for stats queries [14:07:02] eh? [14:07:15] queries for what? [14:07:17] gerrit? [14:07:18] Ryan_Lane: ssh newrole@gerrit 'gerrit query' [14:07:27] ah [14:08:18] um, yes [14:08:24] how do private keys get put in place though? [14:08:25] puppet? [14:08:33] are they stored in the private repo? [14:08:35] yes [14:08:39] aaahh, mmk [14:09:26] or you could just generate it and if the box dies generate a new one [14:10:09] not really... [14:10:26] because we only put a password on role accounts temporarily [14:10:35] just long enough to set their key in gerrit [14:11:28] PROBLEM - Host es4 is DOWN: PING CRITICAL - Packet loss = 100% [14:11:31] but it's possible to force reset a role acct passwd? [14:12:11] boxes with keys should be rare so boxes with keys dying should be no more than ~twice a year? [14:12:29] anyway, i don't really care which way. 
just curious [14:13:16] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [14:16:08] puppet version? 2.6.x ? [14:18:32] Ryan_lane: you said "Ryan_Lane: I think this version of gerrit allows us to give access to let anyone create a repo" [14:18:47] drdee: i contested ;) [14:18:50] could you please give me, David Schoonover and Andrew Otto the right to create repo's [14:19:07] jeremyb, Ryan_Lane: so what should I do about the gerrit-stats clone? [14:19:21] 2 questions remain to be solved: [14:19:43] ottomata: either one is magically created in the next ~5 mins (not counting on it) or just clone by ssh? [14:19:48] 1. how should I clone (https? ssh + some users's key?) [14:19:48] 2. how should I run the script (needs access to run ssh gerrit queries) [14:19:55] drdee: I didn't say we decided to do that yet :) [14:20:30] but we really need to make the process less painful because we are wasting so much time with workarounds, and discussions [14:20:32] Ryan_Lane: i'm pretty damn certain demon said it was unpossible [14:20:37] it is impossible [14:20:40] i am pretty sure as well [14:20:48] there is only 'admin' in gerrit [14:20:53] and that means you can do anything [14:20:54] that's not true [14:21:06] well that's essentially what he said [14:21:11] when? [14:21:13] how long ago? [14:21:24] less than 10 days? [14:21:31] maybe 20 [14:21:33] idk [14:21:33] ryan_lane: ops can give us repo create rights but is not giving them? [14:21:43] Ryan_Lane: anyway, can you make the role user? [14:21:57] jeremyb: I'm in the middle of a sprint [14:22:04] is this for a hackathon sprint? [14:22:23] drdee: no. we need to figure out the process for stuff [14:22:42] Ryan_Lane: well then in 2 hrs? sometime? [14:22:44] it's a brand new feature. it's also possible that it requires other rights [14:22:44] what stuff? [14:22:54] jeremyb: again, is this for a sprint? [14:23:00] idk [14:23:04] then no [14:23:04] drdee? 
[14:23:08] put in a ticket [14:23:21] ottomata: drdee: ^^ [14:23:40] oh come on [14:24:22] fyi: https://gerrit.wikimedia.org/r/#/admin/groups/119,members [14:24:36] drdee: eh? [14:24:47] "These users can create new projects in Gerrit, but are not Gerrit administrators." [14:24:57] so. there you go. [14:25:12] how odd [14:25:23] so why can't the analytics team get this right? [14:25:40] I'm not saying you can't [14:25:50] I'd like to talk to robla's team before that [14:25:59] and ^demon [14:26:14] but we are robla's team :) [14:27:38] does new project == new repository? [14:29:02] must be [14:29:28] did you see /r/#/admin/projects/ ? [14:30:09] you guys know the var naming in this script is really bad? [14:30:19] not all wikimedians are staff! [14:30:21] ;-) [14:30:56] if i push it up somewhere will you pull my changes? [14:31:05] drdee? [14:31:10] yo [14:31:14] ^^ [14:31:25] i'll fix that [14:31:32] * jeremyb has other things to change [14:32:38] haha, we'd be happy to host this repo from gerrit if someone can just create the repo for us right now [14:32:44] then jeremyb can push whatever he likes [14:32:59] heh [14:33:05] or, hm, i dunno, actually [14:33:14] i was going to just push to e.g. github [14:33:18] and you could pull from there [14:33:24] or mail a git format-patch [14:33:28] yeah, drdee, we could host this from wikimedia github [14:33:38] was there a reason we did less.ly over that? [14:34:04] i mean it could still be at less.ly. just a simple pull from an extra remote [14:34:08] or a `git am` [14:34:18] we are not admin's on github that is tomasz [14:34:33] ah, we can't create repos there? [14:34:54] i have the option to [14:35:00] to set the owner to wikimedia [14:35:08] jeremyb, drdee [14:35:13] whatever changes you want to make to the script, cool [14:35:14] but [14:35:16] as for my puppetization [14:35:20] aaahhhh i just want to do it [14:35:25] instead of spending > 1h talking about it! [14:35:28] so. 
[14:35:33] heh [14:35:44] i will just use diederik's user to ssh clone from less.ly, and I will use diederik's user to run the script [14:35:48] we can change later if we need to [14:39:11] New review: Jeremyb; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/9627 [14:41:45] New review: Ottomata; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/9627 [14:41:56] jeremyb, responded to comments inline [14:42:46] yah [14:43:47] ottomata: docs are kinda lacking [14:43:51] ottomata: > Where the cron job should be stored. For crontab-style entries this is the same as the user and defaults that way. Other providers default accordingly. [14:43:57] that's for "target" [14:44:43] also all of the examples in the docs do specify user. even though they're all root [14:45:45] if I did target => root, user => diederik [14:45:52] how would it know to run the command as diederik? [14:46:01] idk [14:46:04] the job would then be in root's crontab [14:46:16] maybe. idk what puppet does now [14:46:43] does it touch root's crontab or does it touch /etc/crontab (which has user to run as specified explicitly) or both? [14:46:51] as i said, docs are lacking [14:47:29] haha [14:47:31] anyway, not a show stopper [14:47:39] hm, yeah [14:47:44] welp, that's how i've always done it [14:48:00] keep all crons in root's crontab, sudo -u to whatever user should run it [14:50:19] ewww, evil: query = query.split(' ') [14:56:42] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [14:58:32] is that your # for the race? 
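On the `query = query.split(' ')` complaint above: splitting on a single literal space produces empty strings when spaces repeat and cuts quoted arguments in half. A minimal sketch of the difference, assuming the query is a shell-style string; the sample query and the helper names are made up for illustration and are not taken from the script being discussed.

```python
import shlex


def split_naive(query):
    """Split on every single space, as in the line quoted in the log."""
    return query.split(' ')


def split_shell(query):
    """Split the way a POSIX shell would, honouring quotes and runs of spaces."""
    return shlex.split(query)


# Hypothetical query with a doubled space and a quoted value.
sample = 'status:open  message:"fix typo"'

print(split_naive(sample))  # ['status:open', '', 'message:"fix', 'typo"']
print(split_shell(sample))  # ['status:open', 'message:fix typo']
```

If the pieces are only ever passed to `subprocess` as an argument list, building the list directly avoids the split altogether.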
[14:59:16] no, it's AS # [14:59:33] oops [14:59:39] wrong network to do that in [14:59:42] you should make them match [14:59:46] ;) [15:00:04] doh [15:00:05] hehe [15:00:31] technically i should be leslie-14907-43821 [15:00:35] but i felt like that was long [15:01:01] you could use hex [15:01:05] or base64 even [15:01:27] but then adding on the 0x makes it long again [15:01:41] LeslieCarr: mw31 looks back. adding back to spares? [15:01:55] :) [15:02:13] and people wouldn't be able to figure it out as quickly [15:02:13] cmjohnson1: so how is the patient (mw31 ) ? [15:02:30] he said mem came back good [15:02:42] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [15:03:09] cool [15:03:09] :) [15:03:34] motrin?! [15:04:38] eep [15:05:01] poor delivery guy and poor you ! [15:05:16] rain?! i guess it's just 5 mins because it's FL? [15:05:27] cmjohnson1: i know. just what does motrin look like? [15:06:45] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [15:06:54] https://healthy.kaiserpermanente.org/static/drugency/images/MCN04631.JPG [15:06:57] New patchset: Jeremyb; "mc.php: mw31 DOWN -> SPARES" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9637 [15:07:23] LeslieCarr: want to double check that there's a memcached running there? [15:07:31] puppet started apache so it should be [15:07:57] so where do you put the tablets? in the fans? [15:08:45] on the motherboard, then you wash it down with a glass of water [15:08:53] ohhh, ok [15:09:24] * jeremyb notes some people run boxes submerged in oil 24/7 [15:09:39] petan|wk: in berlin or not? [15:09:46] i think he is [15:10:40] hrm, for some reason fenari is misbehaving for me [15:10:52] ssh is just holding after key exchange... [15:10:55] it needs motrin [15:11:03] :) [15:11:18] weird, working happily from home going direct.. 
[15:12:34] jeremyb: grrr i can't check it right now, having home tube problems, going to debug now :( [15:13:10] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9637 [15:13:11] home tube problems means it works fine from home? how backwards [15:13:20] jenkins is slow [15:13:39] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [15:13:39] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [15:13:39] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [15:14:24] ok, back … let's see if this is working happily now :) [15:14:38] < jeremyb> home tube problems means it works fine from home? how backwards [15:14:51] hehehe [15:22:20] New review: Lcarr; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9637 [15:22:22] Change merged: Lcarr; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9637 [15:23:42] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [15:24:23] brb again [15:26:23] danke [15:29:18] LeslieCarr: i guess sync doesn't matter but should at least pull to fenari? [15:29:40] yeah [16:03:53] New patchset: Jdlrobson; "match varnish config with DeviceDetection.php" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9640 [16:04:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9640 [16:14:21] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [16:15:15] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [16:16:29] drdee: what's ws mean? 
gerritstats/stats.py:118 [16:17:28] jeremby: *w*hiteli*s*t haha it was a bit late when i had to come up with that abbreviation :D [16:20:56] bbiab [16:59:15] New patchset: Ottomata; "Puppetizing gerrit-stats on stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9627 [16:59:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9627 [17:06:01] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7348 [17:06:03] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7348 [17:44:24] cmjohnson1: hey [17:44:34] can you configure the wireless router to not hand out any ip's at all [17:44:44] and then plug it into fe-0/0/3.0 on mr1-pmtpa [17:44:58] give it the ip of 10.3.1.2 / 24 [17:45:03] default gateway 10.3.1.1 [17:57:21] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.083 second response time [18:08:45] RECOVERY - Puppet freshness on srv237 is OK: puppet ran at Fri Jun 1 18:08:34 UTC 2012 [18:08:47] Change abandoned: MarkAHershberger; "already done." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/5433 [18:09:17] Change abandoned: MarkAHershberger; "minor trivial" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4734 [18:09:48] RECOVERY - Puppet freshness on srv299 is OK: puppet ran at Fri Jun 1 18:09:27 UTC 2012 [18:09:57] Change abandoned: MarkAHershberger; "trivial and leaving for ops" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5431 [18:11:45] RECOVERY - Puppet freshness on srv259 is OK: puppet ran at Fri Jun 1 18:11:35 UTC 2012 [18:13:42] RECOVERY - Puppet freshness on srv267 is OK: puppet ran at Fri Jun 1 18:13:26 UTC 2012 [18:13:42] RECOVERY - Puppet freshness on srv228 is OK: puppet ran at Fri Jun 1 18:13:39 UTC 2012 [18:15:48] RECOVERY - Puppet freshness on srv208 is OK: puppet ran at Fri Jun 1 18:15:33 UTC 2012 [18:17:45] RECOVERY - Puppet freshness on srv229 is OK: puppet ran at Fri Jun 1 18:17:19 UTC 2012 [18:19:15] RECOVERY - Puppet freshness on srv241 is OK: puppet ran at Fri Jun 1 18:19:07 UTC 2012 [18:19:42] RECOVERY - Puppet freshness on srv292 is OK: puppet ran at Fri Jun 1 18:19:20 UTC 2012 [18:22:42] RECOVERY - Puppet freshness on srv226 is OK: puppet ran at Fri Jun 1 18:22:32 UTC 2012 [18:23:45] RECOVERY - Puppet freshness on srv297 is OK: puppet ran at Fri Jun 1 18:23:40 UTC 2012 [18:28:15] RECOVERY - Puppet freshness on srv248 is OK: puppet ran at Fri Jun 1 18:27:56 UTC 2012 [18:28:42] RECOVERY - Puppet freshness on srv295 is OK: puppet ran at Fri Jun 1 18:28:38 UTC 2012 [18:29:18] RECOVERY - Puppet freshness on srv284 is OK: puppet ran at Fri Jun 1 18:28:55 UTC 2012 [18:32:45] RECOVERY - Puppet freshness on srv254 is OK: puppet ran at Fri Jun 1 18:32:37 UTC 2012 [18:35:45] RECOVERY - Puppet freshness on srv300 is OK: puppet ran at Fri Jun 1 18:35:41 UTC 2012 [18:36:48] RECOVERY - Puppet freshness on srv231 is OK: puppet ran at Fri Jun 1 18:36:25 UTC 2012 [18:37:42] RECOVERY - Puppet freshness on srv298 is OK: puppet ran at Fri Jun 
1 18:37:22 UTC 2012 [18:39:08] * jeremyb stabs Reedy... 7348's commit summary too long and rest of msg needs wrapping. ;-( [18:39:17] #care [18:39:39] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [18:39:48] IT's merged now [18:40:09] i know, i can't do anything about something that's already merged [18:40:15] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [18:40:36] * Reedy dances [18:43:15] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [18:55:46] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [19:51:11] ottomata, I'm having another apparmor problem with your puppet class. Can you assist? [19:51:26] yup! [19:51:28] what's up? [19:51:31] tell me the name of the node again? [19:51:41] log into instance 'mwreview' and have a loot at the end of the syslog. [19:51:49] *look [19:52:22] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [19:53:28] looks like the datadir isn't getting picked up where it needs to be [19:53:44] Connection closed by UNKNOWN [19:54:28] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [19:54:32] how mysterious [19:54:50] ottomata: You mean, when accessing that instance? [19:55:04] yeah [19:55:41] my mistake, just a second... [19:56:46] ottomata, try now [19:57:52] hmm, same [19:58:04] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [19:58:14] well, dammit. I must not understand how keys actually get handed out [20:03:37] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [20:40:50] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [20:41:31] andrewbogott: are you about? [20:41:42] I am! [20:41:52] can you rename a user for me on labsconsole? [20:42:01] I think the space is causing gerrit to get somewhat upset [20:42:22] I have a space in my username... 
[20:42:34] what username do you use to login to gerrit? [20:43:03] 'Andrew Bogott' [20:43:10] ffs [20:43:21] I think the password is the same as the labsconsole password. [20:43:26] yeah [20:45:40] andrewbogott: [20:45:42] User names cannot contain spaces. Currently they are restricted to be [20:45:42] strings that match the following regular expression: [20:45:42] ^[a-zA-Z][a-zA-Z0-9._-]*[a-zA-Z0-9]$ [20:46:29] Sure the spaces aren't stripped to _ as per wiki stuff? [20:46:41] Just tried _ on gerrit, doesn't play [20:46:42] link? [20:46:44] but that's talking to ldap [20:46:52] https://groups.google.com/group/repo-discuss/browse_thread/thread/a14cfffddbd4e11f/2d03b54f4c3ed945?pli=1 [20:47:29] https://www.mediawiki.org/wiki/Commit_access#10gible [20:48:26] He can log into labsconsole, I take it? [20:48:35] Yeah, labs console is fine [20:48:42] hm... [20:48:46] silly gerrit... [20:48:48] gerrit gives [20:48:49] "Cannot assign username "10gible" to account 311; name does not conform" [20:48:57] <^demon> Spaces are evil in filenames and usernames. Curse MediaWiki...never should've dropped CamelCase. [20:49:23] ^demon, learn to quote the names [20:49:42] Bobby tables we call him here... [20:51:04] Oh, that's probably because his shell name starts with a number. [20:51:14] Dummy that I am, I didn't even notice that when he requested an account. [20:51:38] double inconforming [20:51:52] * ^demon puts andrewbogott in the village stocks for the rest of the day :) [20:53:06] Well, hm. [20:54:18] 10gible is a valid shell name... so we're hitting some gerrit corner case. [20:54:39] Reedy, can you ask him to pick a shell name that starts with a letter? I suspect that'll fix it, although I'm not positive. [20:55:37] tangible [20:55:39] ^ [20:56:34] ok, with any luck he can reset his password again... [20:56:46] on labsconsole? [20:56:49] yeah [20:59:46] Gives incorrect username or password on gerrit [21:00:18] <^demon> Prolly have to drop the entry in account_external_ids. 
[21:00:33] <^demon> And then try again. [21:01:27] in gerrit? [21:01:57] though, if gerrit doesn't allow spaces in usernames, it's not going to allow a username with a space to login? OR does ldap let that be overridden? [21:02:18] but works for andrewbogott? [21:02:36] <^demon> Reedy: *nod* in gerrit. [21:02:54] <^demon> We can change the mapping potentially. But really, people should just avoid spaces. [21:02:59] lol [21:03:02] It sounds to me like gerrit is having trouble associating the labsconsole name with the shell name. [21:03:09] mmm [21:03:15] And that it was validating the shell name during that process, using an overly sensitive validator. [21:03:17] we can just create a set of new accounts from scratch [21:03:19] might be easier [21:03:29] That's what I just did, I think. [21:03:48] But maybe now he has two different shell names associated with one username... crap. [21:04:10] delete it [21:04:14] DELETE THEM ALL [21:04:16] ssmollett, do you know how to delete accounts so we can start from scratch? [21:04:27] I sort of know how, but fear to cause unwanted side-effects. [21:04:30] in mediawiki world, we don't delete stuff [21:04:43] there is a delete-ldap-user script [21:04:49] oh, well then! [21:04:55] If he hasn't logged into gerrit, it's easy - if he has then ROFL [21:05:35] he can't log into gerrit [21:05:38] that's the problem [21:06:08] ok, now: [21:06:20] shell name 'tangible' username 'TanmayShah' [21:06:43] formey is still trying to be clever and reuse his previous home directory, so things /might/ still be broken. [21:06:45] But, we will see. [21:07:39] lol [21:08:21] that username doesn't work in gerrit.. [21:10:06] hey guys, how can I make it so that I can log in using my ssh key [21:10:07] which? 
[21:10:08] oops [21:10:13] didn't mean to hit enter there [21:10:17] so better phrased question: [21:10:26] I've got 10 analytics machines [21:10:28] I cannot ssh between them [21:10:34] only from my local machine (which has my public key) [21:10:35] ssh forward? [21:10:42] I have tried ssh -A [21:10:51] and adding it to the .ssh/config proxy rule [21:11:20] that's interesting ... [21:11:27] it actually says that it is offering my key [21:11:29] in ssh -v [21:11:39] but you can ssh directly to them ? [21:11:40] andrewbogott: yes, without space works! [21:11:42] wheeee [21:11:42] ottomata: Are you doing ssh -A to bastion and also /from/ bastion to the analytics machines? (Dumb question, but that's the mistake I always make) [21:11:45] otto@analytics1002:~$ ssh -v analytics1003 [21:11:45] … [21:11:46] debug1: Offering public key: /home/otto/.ssh/id_rsa [21:12:00] ok [21:12:10] ProxyCommand ssh -A -e none fenari.wikimedia.org exec nc -w 3600 %h %p [21:12:51] Preferred git username - will also be your Git commit author name, so your full name or wiki username would be reasonable (this will be permanent so choose wisely!). Letters, numbers, and spaces are okay. [21:12:58] ^demon: ^ /facepalm [21:13:04] [~]$ ssh -A analytics1002.eqiad.wmnet [21:13:04] otto@analytics1002:~$ ssh -v -A analytics1003.eqiad.wmnet [21:13:04] … [21:13:04] debug1: Offering public key: /home/otto/.ssh/id_rsa [21:13:04] … [21:13:05] debug1: Next authentication method: password [21:13:05] otto@analytics1003.eqiad.wmnet's password: [21:13:07] is that where your key is on your home dir ? [21:13:19] yes [21:13:21] on your laptop you're at /home/otto ? 
[21:13:22] hrm [21:13:46] oh no [21:13:48] hmm [21:13:48] no [21:13:50] because that has the double issue of possibly trying to just use the key on that machine [21:13:54] yeah [21:13:55] ah yeah [21:13:56] hmmm ok one sec [21:13:57] lemme check that [21:14:09] ok [21:14:20] i do have an ssh key on analytics1002 [21:14:23] Reedy: So things are working now? Despite us having been wrong about every possible diagnosis? [21:14:25] so i just tried 1005 -> 1006 [21:14:27] same thing though [21:14:36] debug1: Trying private key: /home/otto/.ssh/id_dsa [21:14:41] debug1: Trying private key: /home/otto/.ssh/id_rsa [21:14:42] etc. [21:14:44] but those files do not exist [21:14:45] so [21:14:47] debug1: Next authentication method: password [21:15:23] hrm so… i'm guessing the agent isn't getting forwarded [21:15:36] yeah [21:18:28] andrewbogott: no spaces wins [21:18:37] Reedy: does tangible also need a bastion account so he can use a labs project? [21:18:54] I don't think so, at the moment at least [21:19:23] https://www.mediawiki.org/w/index.php?title=Developer_access%2FInstructions_to_post_your_request_below&diff=545571&oldid=538141 [21:21:19] I still don't think that spaces are actually bad. [21:21:36] But I'm too distracted to run the necessary experiments right now... 
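(Editor's aside on the 21:15 guess that the agent isn't getting forwarded: a minimal sketch of a check one could run on the intermediate host, assuming OpenSSH's usual behaviour of exporting SSH_AUTH_SOCK when forwarding is active. The function name and return strings are hypothetical, not anything used in the channel.)

```python
import os
import stat


def agent_socket_status(env=None):
    """Classify the forwarded-agent state of a session.

    Returns one of the hypothetical labels:
    "unset", "missing", "not-a-socket", "ok".
    """
    env = os.environ if env is None else env
    sock_path = env.get("SSH_AUTH_SOCK")
    if not sock_path:
        # No variable at all: sshd did not set up agent forwarding here.
        return "unset"
    try:
        mode = os.stat(sock_path).st_mode
    except OSError:
        # Variable is set but points at nothing (stale session, tmux, etc.).
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    return "ok"


if __name__ == "__main__":
    print(agent_socket_status())
```

If this prints "unset" on analytics1002 after logging in with ssh -A, the next hop has no agent to ask, and ssh would fall back to local key files and then to password, which would be consistent with the debug output pasted above.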
[21:22:46] indeed [21:22:59] I'm not familiar with how stuff is set up either [21:23:07] mw would replace spaces with _ [21:23:09] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [21:23:09] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [21:23:09] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [21:23:09] but we did try that [21:23:52] i don't think linux would have issues with spaces in username [21:24:11] Gerrit explicitly doesn't like them [21:24:33] https://groups.google.com/group/repo-discuss/browse_thread/thread/a14cfffddbd4e11f/2d03b54f4c3ed945 [21:24:41] User names cannot contain spaces. Currently they are restricted to be [21:24:41] strings that match the following regular expression: [21:24:41] ^[a-zA-Z][a-zA-Z0-9._-]*[a-zA-Z0-9]$ [21:24:50] there's so many different things going on... [21:25:56] I wonder why ._- are forbidden as the last char [21:25:58] CamelCase ftw [22:05:16] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [22:28:13] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [22:39:59] New patchset: Victor Vasiliev; "(bug 33273) Fix Ukrainian Wikipedia FlaggedRevs configuration." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9717 [22:40:06] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9717
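(Editor's aside on the username pattern quoted at 20:45 and 21:24: a small sketch that runs the names from this conversation through that regular expression. Only the pattern comes from the log; the function name is hypothetical, and this checks the pattern alone, not whatever Gerrit or LDAP do around it.)

```python
import re

# Pattern exactly as quoted in the channel from the repo-discuss thread.
GERRIT_USERNAME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9._-]*[a-zA-Z0-9]$")


def matches_quoted_pattern(name):
    """Return True if the whole name matches the quoted pattern."""
    return GERRIT_USERNAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    candidates = [
        "Andrew Bogott",  # contains a space
        "10gible",        # starts with a digit
        "tangible",
        "TanmayShah",
        "tangible-",      # ends with one of . _ -
    ]
    for name in candidates:
        print(name, matches_quoted_pattern(name))
```

Read this way, both of the day's suspects fail independently: the space rules out 'Andrew Bogott' and the leading digit rules out '10gible'. The pattern also requires at least two characters, since the first and last character classes are both mandatory.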