[00:49:28] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[03:56:30] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[06:10:06] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[09:38:32] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[09:38:32] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[09:38:32] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:56:42] RECOVERY - MySQL slave status on es1004 is OK: OK:
[10:09:19] PROBLEM - Disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10743 MB (3% inode=99%):
[11:24:44] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[11:27:10] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1618
[11:27:10] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1618
[11:31:06] New review: Dzahn; "what about "misc::jenkins"?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1611
[11:33:18] New review: Dzahn; "sure, if the dir is missing anyways..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1620
[11:33:18] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1620
[11:36:42] New review: Hashar; "misc::jenkins already exist and is a different project not related to mediawiki continuous integrati..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1611
[11:37:07] mutante: I have answered on change 1611 (misc::jenkins already exist and is a different project)
[11:37:10] oh and hello :-)
[11:37:31] hello hashar
[11:37:46] I have made some other changes friday related to android. They should be easy changes :D
[11:38:27] ok, so in 1611 all you did was move code around, you did not change other things in the same patch set, right?
[11:40:48] just moved stuff around
[11:41:01] so that should be fine
[11:41:12] mutante: Could you also review&merge 1617 and 1619? They fix the puppet errors on fenari
[11:41:23] I am not sure what the second patch set is about. I think I git pushed my repo somehow
[11:41:38] and that triggered new patch set in gerrit for everything that was not merged
[11:42:00] hello RoanKattouw :)
[11:48:39] New review: Dzahn; "yep, we want to move stuff into separate files..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1611
[11:48:40] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1611
[11:50:20] mutante: thanks. https://gerrit.wikimedia.org/r/#change,1572 should be straight forward too.
[11:50:41] it is messing up with the file headers :)
[11:50:59] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1617
[11:50:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1617
[11:52:59] RoanKattouw: so you also want "mysql-client-5.1" removed from "misc::survey"?
[11:54:05] Which rev is this again?
[11:54:09] 1619?
[11:54:12] !g 1619
[11:54:13] https://gerrit.wikimedia.org/r/1619
[11:54:47] yes
[11:55:03] mutante: I'm removing mysql-client-5.1 and adding mysql::client which adds mysql-client-5.1
[11:55:09] So the package should continue to be installed
[11:55:44] ok, just saw that
[11:55:45] (Also note that this is a cherry-pick (of 1548 IIRC))
[11:56:13] 1048
[11:57:01] you are adding it to misc::fundraising , i was just wondering "why does it say bastion host then"
[11:57:04] ok
[11:57:09] Oh
[11:57:13] So here's the story
[11:57:21] 1048 added it to a few things, including misc::bastion
[11:57:30] Then Chad went and split out misc::bastion into its own file
[11:57:43] Then that change made it into production, but my change introducing mysql::client didn't
[11:58:01] So puppet on fenari was failing because misc::bastion required the non-existent class mysql::client
[11:58:10] ..ah i see
[11:58:19] Cherry-picking my change (sans the bastion part because that stuff was moved and erroneously applied already) fixes that
[11:59:13] New review: Dzahn; "ok, convinced :)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1619
[11:59:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1619
[12:05:27] New review: Dzahn; "just a tiny thing: "ensure=>installed;" would be nicer as "ensure => installed;"" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1612
[12:06:46] New review: Dzahn; "Should "Nighties" be "Nightlies"? (nightly)?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1615
[12:09:36] mutante: yeah I would have said nightlies too but my Mac converts it to nighties (without L)
[12:09:40] need to verify that
[12:10:26] http://en.wiktionary.org/wiki/nightly#Noun
[12:10:53] http://www.indianshaadi.org/tag/nighties/ ( Not safe for work :p )
[12:11:01] heh
[12:11:23] (informal) A woman's nightgown or nightdress; a dress-like garment worn to bed.
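Stepping back to the mysql::client story above: change 1619 was recovered by cherry-picking the still-unmerged fix (minus the part that had already landed) onto production. That is a standard git workflow; here is a minimal self-contained sketch using throwaway branch and file names, not the real operations/puppet layout or Gerrit remote.

```shell
#!/bin/sh
# Hypothetical local demo of cherry-picking one commit onto another branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name demo
echo 'class misc::survey {}' > survey.pp
git add survey.pp
git commit -qm 'initial manifest'
# A feature branch adds the missing class (stand-in for the 1048 change).
git checkout -qb feature
echo 'class mysql::client {}' > mysql.pp
git add mysql.pp
git commit -qm 'introduce mysql::client'
fix=$(git rev-parse HEAD)
# "production" never got that commit; cherry-pick just the fix onto it.
git checkout -qb production "$fix~1"
git cherry-pick "$fix" > /dev/null
test -f mysql.pp && echo 'mysql.pp present on production'
```

In the real incident the cherry-pick also dropped the misc::bastion hunk, which `git cherry-pick -n` plus a manual edit before committing would handle.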
[12:11:33] http://en.wiktionary.org/wiki/nightie#Noun
[12:13:26] check_job_queue still fails .. sigh
[12:17:48] hashar: off-topic, have you been here before? catacombes-de-paris.fr
[12:18:17] yeah several times
[12:18:23] this is the official site for tourists
[12:18:29] this / that
[12:18:49] that is also the safe way to visit the catacombs
[12:19:03] and yeah, you can actually see skeletons :-))
[12:19:39] during the 18th century, the graveyard in Paris center was closed
[12:19:52] and all human remains moved to the Catacombes
[12:20:22] not sure how old the graveyard was
[12:21:26] i was actually there on the weekend, next time i shall tell you before;)
[12:21:44] yea, skeletons, creepy:)
[12:21:59] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Mon Dec 19 12:21:46 UTC 2011
[12:22:37] mutante: I used to visit them the illegal way
[12:22:54] i.e. finding a way which is not really open to public
[12:23:01] and NOT safe at all
[12:23:09] hashar: some guy told use people get drunk and then dont find their way out anymore :P
[12:23:21] I am sure it happened a few time
[12:24:10] we did that using "un Fil d'Ariane"
[12:24:19] http://de.wikipedia.org/wiki/Tauchseil
[12:24:31] it was kind of funny how they search your bag when you exit the catacombs.. to check for stolen bones and skulls :p
[12:25:00] the people I explored with had maps of the places, we had water and food for several days and light (ton of lights)
[12:25:23] yeah I am sure tourist try to steal bones for "souvenirs"
[12:25:25] wow. i didnt know about "Tauchseil"
[12:25:48] I am not sure Tauchseil is what it is meant to be. I just followed the interwiki links 8-)
[12:25:52] well, i understand, but not that it is named after "Ariadne" / Arian"
[12:26:00] yep
[12:26:45] I kinda like nighties ;-)
[12:26:50] apergos: :)))
[12:26:51] :)
[12:27:14] so there's a complaint about the yaml file for puppet on sn4, it's what prevents it from running.
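The snapshot4 yaml complaint above is resolved later in this log by deleting the puppetmaster's cached node/fact YAML so the next agent run regenerates it. A simulated sketch of that fix, using a temp directory in place of the real /var/lib/puppet/yaml on the masters; note the real cache files are named by the client's FQDN:

```shell
#!/bin/sh
# Simulate clearing a puppetmaster's cached node/fact YAML for one client.
# Real path from the log: /var/lib/puppet/yaml/{node,facts} on sockpuppet/stafford.
set -e
yamldir=$(mktemp -d)                       # stands in for /var/lib/puppet/yaml
mkdir -p "$yamldir/node" "$yamldir/facts"
host=snapshot4                             # real files use the full FQDN
touch "$yamldir/node/$host.yaml" "$yamldir/facts/$host.yaml"
# The actual fix: remove the stale cached copies; a manual agent run
# on the client then recreates them cleanly.
rm -f "$yamldir/node/$host.yaml" "$yamldir/facts/$host.yaml"
echo "cleared cached yaml for $host; next agent run recreates it"
```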
[12:27:18] we could keep it as an easter egg :D
[12:27:19] no idea what the fix is for that though.
[12:27:28] +1 to keep!
[12:27:38] food in 5 minutes!
[12:27:56] lunching too
[12:28:11] syntax error on line 90, col 5: `'
[12:28:12] hmmm
[12:28:24] * hashar hides
[12:29:59] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Mon Dec 19 12:29:31 UTC 2011
[12:36:40] apergos: fixed it!
[12:36:52] quote: "Need to remove the .yaml files in /var/puppet/yaml/node and /var/puppet/yaml/facts (or where ever you may have these files, ... Then the client will run and the files will get rebuilt. "
[12:36:59] RECOVERY - Puppet freshness on snapshot4 is OK: puppet ran at Mon Dec 19 12:36:41 UTC 2011
[12:38:26] !log deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4
[12:38:34] Logged the message, Master
[12:39:55] ah you just deleted it? great
[12:40:05] yep, and after a manual run they exist again
[12:40:09] but dont cause the error anymore
[12:40:20] cause they are rebuilt
[12:40:23] yes
[12:40:28] sweet
[12:41:01] :) enjoy lunch ,bbiaw
[12:42:19] oh my
[12:42:55] nighties renamed : https://gerrit.wikimedia.org/r/#change,1615
[12:43:04] and I managed to amend a change :-)))
[12:43:05] \o/
[12:43:10] eating
[12:47:52] New patchset: Hashar; "basic class to install Android SDK prerequisites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[12:47:52] New patchset: Hashar; "directory to host WikipediaMobile nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[12:47:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1615
[12:51:41] New patchset: Hashar; "basic class to install Android SDK prerequisites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[12:51:53] New patchset: Hashar; "directory to host WikipediaMobile nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[12:52:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1612
[12:52:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1615
[12:52:13] bah
[12:52:33] mutante_: i have amended changes 1612 & 1615 :-)
[12:56:45] [12:13:26] check_job_queue still fails .. sigh
[12:56:48] mutante_: What's the error message?
[12:57:16] Also
[12:57:18] [12:29:59] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Mon Dec 19 12:29:31 UTC 2011
[12:57:20] Yay!
[13:00:39] New review: Dzahn; "i think you want the 2 links in their own • tag, so they are listed below each other. and how wil..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1615
[13:01:08] RoanKattouw: it still gets "return code 127"
[13:01:23] RoanKattouw: i think it just takes too long (longer than timeout configured in Nagios)
[13:01:50] Hmm
[13:02:02] No descriptive error message? Have you tried running the command manually on spence?
[13:02:20] Hmm, also
[13:02:21] yes, not today, but after our changes, and it worked (but took quite a while)
[13:02:36] It could probably be rewritten to be more efficient
[13:02:40] and i saw the timeout is less
[13:02:52] Right now, it runs checkJobs.php or whatever it's called separately for every wiki
[13:03:03] So that's 800+ PHP initializations and 800+ MySQL connection setups and teardowns
[13:03:40] If we write one script to check all wikis (already exists in some form for jobs-loop.sh I believe), we'd have 1 PHP initialization and 7 MySQL conns
[13:04:10] that sure sounds better
[13:04:34] or we could have a central job database :)
[13:04:34] time /usr/local/nagios/libexec/check_job_queue
[13:04:34] JOBQUEUE OK - plwiki has the 3rd biggest job queue, 0 entries
[13:04:44] real 0m21.203s
[13:04:46] 1 PHP instant / 1 conn / 1 query ?
[13:04:52] heh, that is already a LOT faster than it used to be
[13:04:53] WTf
[13:05:04] it was like 5 minutes or so before
[13:05:04] 21 seconds just to check plwiki which has 0 entries?
[13:05:06] That's insane
[13:05:17] This is a check for *one wiki* , right?
[13:05:36] for a in commonswiki enwiktionary dewikinews itwiki ptwiki dewiki plwiki frwiki eswiki jawiki svwiki ; do
[13:06:34] Ohg
[13:06:36] "arning: Return code of 127 for check of service 'check_job_queue' on host 'spence' was out of bounds. Make sure the plugin you're trying to run actually exists."
[13:06:42] So 11 wikis
[13:06:43] hmmm.. or the path is just wrong? :p
[13:06:50] and we changed the wrong file.. checking
[13:06:52] Anyawy
[13:07:05] Rewriting that check sounds like a fun project
[13:07:12] I'll take a stab at that later today
[13:07:16] cool
[13:08:26] ooh.. "duplicate definition"
[13:10:09] yeah, now that it's been completely puppetized, we need to get rid of the old manual config
[13:11:50] uuh.. there is also "makeService" in "conf.php"
[13:14:13] !log commented check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been pupptized
[13:14:21] Logged the message, Master
[13:18:53] !log truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence
[13:19:01] Logged the message, Master
[13:19:16] stupid apt pinning
[13:19:27] bac
[13:19:28] k
[13:19:36] look at this: :)
[13:19:38] /etc/init.d/nagios reload
[13:19:38] Purging decommissioned resources...Running configuration check...done.
[13:19:42] Reloading nagios configuration...done
[13:19:49] no "WARNING"s at all anymore:)
[13:19:58] yay
[13:20:02] well done :-)
[13:20:12] eh, nevermind
[13:20:22] none for spence, but a lot for other hosts
[13:21:06] they way services are define in puppet_checks.d creates duplicates
[13:21:08] the
[13:22:15] it's like stuff is being appended to those files, but not removed
[13:25:52] and also interesting is that first you get tons of "Warning:" lines when checking config, but then it still claims "Total Warnings: 0" afterwards
[13:27:04] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1612
[13:27:04] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[13:29:06] hashar: you are now including "misc::contint::android::sdk" on gallium, i ran puppet..applied config .. now. "Misc::Contint::Android::Sdk/Package[ant1.8]/ensure: ensure changed 'purged' to 'present'"
[13:29:25] Finished catalog run
[13:29:35] great!
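Stepping back to the check_job_queue rewrite discussed above (one pass over all queue lengths instead of one PHP start-up per wiki): the output side of such a check maps directly onto Nagios plugin exit-code conventions. A sketch, with hard-coded sample queue lengths standing in for the real output of the proposed getJobQueueLengths.php:

```shell
#!/bin/sh
# One-pass job queue check: read "wiki length" pairs once, flag any queue
# over the threshold. The sample data below is made up for illustration.
THRESHOLD=10000
queue_lengths='commonswiki 1234
enwiktionary 0
plwiki 0
frwiki 2048'
worst_wiki=none
worst_len=0
while read -r wiki len; do
  [ -n "$wiki" ] || continue
  if [ "$len" -gt "$worst_len" ]; then
    worst_wiki=$wiki
    worst_len=$len
  fi
done <<EOF
$queue_lengths
EOF
if [ "$worst_len" -ge "$THRESHOLD" ]; then
  echo "JOBQUEUE CRITICAL - $worst_wiki has $worst_len jobs"
  status=2          # Nagios CRITICAL
else
  echo "JOBQUEUE OK - all job queues below $THRESHOLD"
  status=0          # Nagios OK
fi
```

The "return code 127" in the log is a different failure mode entirely: Nagios could not find the plugin at the configured path, which is why fixing the path (change 1626) was part of the cleanup.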
[13:29:56] I am trying to find out how to amend https://gerrit.wikimedia.org/r/#change,1572 :p
[13:30:08] ii ant1.8
[13:30:45] hashar: let me show you what i meant..
[13:32:34] mutante: https://gerrit.wikimedia.org/r/#change,1572 amended :)
[13:33:39] New patchset: Dzahn; "logrotate: add "managed by puppet" headers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1621
[13:33:47] arrrg..no
[13:34:00] that was supposed to be 1572
[13:34:19] Change abandoned: Dzahn; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1621
[13:34:54] ohh
[13:35:11] yeah pushing to production instead of test is fun :)
[13:36:13] mutante: have you uninstalled ant1.8 from gallium?
[13:37:00] hashar: no, it has been installed by your puppet change
[13:37:45] well it is no more installed :D
[13:38:01] 14:30 < mutante> ii ant1.8
[13:38:28] yeah and I confirm I had access to a 1.8 version of ant just after you pasted that
[13:38:41] but now: un ant1.8
[13:38:46] hehe
[13:38:58] do you want to remove "ant"?
[13:39:06] to keep "ant1.8" ?
[13:39:17] ant = 1.7.1
[13:39:42] ahh could they be in conflict?
[13:39:53] I mean installing "ant" would automatically remove ant1.8?
[13:40:05] I was expecting to have both in parralels
[13:40:10] yes, they are in conflict
[13:40:13] Conflicts: ant, ant-doc (<= 1.6.5-1), ant1.7, libant1.6-java
[13:40:25] need to ensure "ant" is gone..
[13:41:06] I should have looked at the Conflicts: field
[13:41:14] was expecting it to behave like the gcc packages
[13:42:14] did this break something that is already used ?
[13:42:37] want me to fix it manually while you fix puppet? or can it wait
[13:42:45] it can wait
[13:42:53] ant 1.7.1 is still around for jenkins to use
[13:42:58] ok, and 1572 ok for you like it is now?
[13:44:04] yeah 1572 is fine you can merge it
[13:44:15] I have cut the ### lines after 70 chars
[13:45:34] done. feel free to review my pending changes as well:)
[13:46:34] :p
[13:49:53] mutante: I think I will add a new class for ant1.8 :)
[13:50:17] I am just wondering if I should "require" it or just "include" it
[13:50:31] it will be needed by the android::sdk one as well as jenkins
[13:53:08] if you use require it will apply any changes to required objects BEFORE the rest
[13:53:25] then there is also the opposite, called .. "before" :p
[13:53:51] require: "used purely for guaranteeing that changes to required objects happen before the dependent object"
[13:53:55] before: "the opposite of require — it guarantees that the specified object is applied later than the specifying object"
[13:54:11] oh
[13:54:14] so I can just include
[13:54:22] sounds a bit confusing .. applying it LATER needs BEFORE :p
[13:54:57] class x { before y }
[13:55:07] So x says "apply me before y"
[13:55:29] ok, yeah
[13:57:56] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[13:58:09] oh.. and "Note that Puppet will autorequire everything that it can"
[13:58:26] here is the new ant class ^^^ (1622)
[13:58:28] "exec resources will autorequire their CWD (if it is specified) plus any fully qualified paths that appear in the command. "
[13:58:54] which does not lint of course
[13:59:54] "line 142 in file /var/lib/gerrit2/review_site/tmp/Ic7fd6a0d094c2ffe6b6cc99d5e590b7587f5798e/manifests/dns.pp"
[14:00:02] where is the relation to "dns.pp"?
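Returning to the ant/ant1.8 surprise above: the automatic removal is exactly what the package's Conflicts field declares, and checking it up front would have avoided the confusion. A sketch of extracting that field; the control stanza is hard-coded here from the line quoted in the log, since the real command (`apt-cache show ant1.8`) needs a live apt index:

```shell
#!/bin/sh
# Extract the Conflicts field from a Debian package control stanza.
# Sample stanza (the Conflicts line is the one quoted in the log):
stanza='Package: ant1.8
Conflicts: ant, ant-doc (<= 1.6.5-1), ant1.7, libant1.6-java'
conflicts=$(printf '%s\n' "$stanza" | sed -n 's/^Conflicts: //p')
echo "ant1.8 conflicts with: $conflicts"
```

So installing ant1.8 forces dpkg to remove ant (1.7.1), and vice versa; unlike the gcc packages, the two cannot be co-installed.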
[14:00:31] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:00:43] yeah it is confusing :(
[14:01:00] dns.pp is followed by some ascii sequence
[14:01:14] I think a newline character is stripped there
[14:01:35] now it is "Syntax error at end of file" in contint.pp
[14:01:42] doh
[14:02:56] New review: Dzahn; "one more } on line 64" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1622
[14:02:58] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:03:37] this time it looks fine
[14:03:50] still need to install puppet locally so I can do basic lint checks
[14:04:36] ideally the lint should be made in a pre commit hook to show us the nice colored message in our cli :)
[14:09:34] New review: Hashar; "Having all blogs under the same class looks nicer." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1607
[14:11:01] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1606
[14:15:18] New review: Hashar; "Both links are part of the same sentence so they deserve to be in the same • block. " [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1615
[14:17:01] New patchset: Dzahn; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:19:16] New patchset: Dzahn; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:20:53] New review: Dzahn; "How do you like this? Putting it in generic::packages , so it can be used in other places too." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1622
[14:22:01] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1615
[14:22:01] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[14:23:47] New review: Dzahn; "yes, not using new classes anywhere initially, only once we are sure nothing is missing." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1607
[14:23:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1607
[14:28:44] moving to dentist and then downtown
[14:28:49] should be back in roughly 2 hours
[14:28:57] thanks mutante for your reviews / merges :-)
[14:29:34] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[14:29:39] see you
[14:29:48] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:29:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1623
[15:10:34] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[15:23:06] mutante: you there?
[15:27:05] hi peter
[15:28:16] hey
[15:28:26] so about the search index ticket
[15:28:40] I'll look into it some, as I'm trying to figure out this whole search thing
[15:28:47] cool!
[15:28:57] but the thing about "start proc as rainman"
[15:29:04] is that he has access to our search boxes
[15:29:09] and is the one who actually set it all up
[15:29:14] so if we start procs as root and not him
[15:29:22] then he can come and do stuff if he has time
[15:29:49] ok, his comments sounds like he does not have root though
[15:30:27] no he doesn't
[15:30:29] (have root)
[15:31:02] New review: Dzahn; "Yea, thanks. I was planning to do just this, too." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1623
[15:41:17] notpeter: if you are getting into search stuff anyways.. there is one other ticket .. RT #187 , we (maplebed and me) never found out what is actually the difference that makes some servers show less statistics
[15:41:21] ;)
[15:43:33] New patchset: Catrope; "Rewrite Nagios job queue check using new getJobQueueLengths.php script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1624
[15:46:40] robh: good morning
[15:48:41] !log restarting indexer on searchidx2
[15:48:49] Logged the message, and now dispaching a T1000 to your position to terminate you.
[15:52:35] !log dataset1 reinstalled and has had puppet run. Now to see if it can keep time
[15:52:38] apergos: ^
[15:52:43] Logged the message, RobH
[15:53:08] yay
[15:53:11] thank you very much
[15:54:02] I'm watching the syslog
[15:54:16] anybody want to place any bets?
[15:54:25] i bet it works longenough for us to think its good
[15:54:30] we close ticket, and it dies
[15:54:31] hahahaha
[15:54:59] so you think it will in fact keep time. for at least a while.
[15:55:14] yea
[15:55:32] we'll see
[15:55:42] so, when the ticket closes, the warranty is expired.
its three years old
[15:55:45] in an hour or so if it still looks ok I'm gonna crank up an rsync
[15:55:50] next time it really dies, it stays dead.
[15:55:51] shitballs realls?
[15:56:00] are these 1tb drives?
[15:56:05] dunno
[15:56:05] i dont recall...
[15:56:17] you would think i would, but its that shitty raid controller
[15:56:56] new eq portal is nice.
[15:57:06] i can actually see, and read all my open and resolved tickets
[15:57:11] amazin.
[15:57:41] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1624
[15:57:42] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1624
[15:59:30] um
[15:59:35] where's the raid?
[16:02:13] shouldn't there be... an lvm or something? on the raid array?
[16:02:22] mounted someplace?
[16:03:55] ahhh
[16:04:31] apergos: oh, its borked
[16:04:35] there's sda b c and d (apparently), claming to be 4T each or something
[16:04:37] =/
[16:04:42] ah bleep bleepers
[16:04:53] yea, the logical volumbe group is there
[16:05:02] but the logival volume doesnt appear to be
[16:05:16] nice
[16:06:15] ds1 is broken, news at 11
[16:07:06] apergos: ok run vgdislplay
[16:07:09] so you see what i see
[16:07:12] =[
[16:07:22] see how its bitching about uuids?
[16:07:33] yes I do
[16:07:51] * apergos thinks about stabbing roan
[16:08:08] apergos: so i am not sure how to fix this one, i think we may just want to wipe out all that stuff
[16:08:10] and redo the lvm
[16:08:15] sure
[16:08:20] may be easier than tryin gto fix the existing one
[16:08:24] worksforme
[16:08:44] * RoanKattouw thinks he's reached his quota for the next while re antagonizing people
[16:09:47] hahaha
[16:09:48] Cannot change VG vg0 while PVs are missing!
[16:09:54] i want to remove it stupid computer!~
[16:10:22] RobH: that might be dangerous...
[16:10:31] how dangerous can it be
[16:10:38] notpeter: we are trying to lose the data.
[16:10:40] at worst he just reinstalls the whole bloomin thing again
[16:10:49] except reinstall needs someone to hit F12.
[16:10:50] well, I mean, dangerous enough that there might be a warning and it won't let you
[16:10:55] oh yeah
[16:10:55] usually that means it's pretty dangerous
[16:10:58] and we have no one huh
[16:11:06] notpeter: yea, i want it to remove all data ;]
[16:11:12] but it doesnt like that
[16:11:15] its not mounted though.
[16:16:06] hmm.. something like "dd if=/dev/zero of=/dev/..." to kill meta data?
[16:16:43] meh, i just going to use parted to kill the lvm partitions entirely
[16:16:56] is parted ok for gpt partitions?
[16:17:44] that worked
[16:17:51] fdisk recommended parted for it
[16:17:52] heh
[16:17:55] yea, just fdisk isnt
[16:17:58] indeed
[16:18:11] ok great
[16:18:15] its working, and sees them all, and now they are gone
[16:18:21] so the space is now unclaimed, which is a good start.
[16:20:28] RoanKattouw: works on spence: JOBQUEUE OK - all job queues below 10,000
[16:20:40] yay
[16:25:22] sweet
[16:32:11] ok, lvm is kind of a pain in the ass
[16:32:22] apergos: want me to bore you with the details?
[16:33:30] yes I do
[16:33:34] if you don't mind that is
[16:33:48] the way we create stuff in /etc/nagios/puppet_checks.d/ is broken
[16:33:55] spence:/etc/nagios/puppet_checks.d# wc -l *.cfg | sort -rn| less
[16:34:04] f.e. "spence.cfg" has over 1000 lines :p
[16:34:23] um
[16:34:24] keeps appending stuff
[16:34:36] so i wiped out the partitions, but the pv data was still there
[16:34:40] so i did work i didnt need do
[16:34:51] what i really needed to do is 'pvremove /dev/partition -ff
[16:34:58] which forces them to wipe out regardless
[16:35:27] So I did that just now, and i have no physical volumes
[16:35:31] ok...
[16:35:49] with lvm, you create physical volumes first, then volume group
[16:35:52] then logical volume
[16:35:54] right
[16:35:59] so now I am doing pvcreate on each new partitoin
[16:36:40] apergos: still on ds1? go ahead and run pvdisplay
[16:36:48] the output no longer is filled iwth bad UUID info
[16:36:52] as they are brand new
[16:37:20] well vgdisplay (which I ran earlier) is quite short now :-D
[16:37:49] there are no volume groups, since we wiped out the physical volume under it
[16:37:51] pvdisplay shows me four volumes
[16:37:59] yep, each of our 4 raid6 arrays
[16:38:03] two per chassis
[16:38:26] (though part of array 1 and two are os and swap)
[16:38:38] so we're going to wind up with how much available at the end of this?
[16:39:11] never mind I'll wait til we get there
[16:39:18] then we'll know for sure
[16:39:20] apergos: now do vgdisplay
[16:39:33] i did vgcreate vg0 /dev/disk /dev/disk
[16:39:36] for all the partitions
[16:39:37] uh huh 14 + change
[16:40:00] ok
[16:41:36] so now i am making the actual logical volume
[16:41:39] and do lvdisplay
[16:41:55] uh huh
[16:41:56] did lvcreate -L 14.45T -n lv0 vg0
[16:42:10] so now i need to make the filesystem, and mount is all
[16:42:15] great
[16:43:38] off-topic: my cat has learned that rattling noices from a small plastic container mean cat treats. except they don't. they mean cashews. she's been very disappointed lately.
[16:43:39] apergos: ok, done, and mounted. now we just have to update fstab
[16:43:56] xfs, okey dokey
[16:45:16] !log dataset1 new data partition ready and setup to automount
[16:45:23] sweet
[16:45:24] Logged the message, RobH
[16:45:25] thanks
[16:45:27] apergos: Ok, dataset1 is a clean install all for you now =]
[16:45:34] glad to do it, i was rusty on that stuff
[16:45:37] was good to poke at it again
[16:45:47] guess I'll give it more than an hour
[16:45:56] if the drift is going to be noticeable enough ....
[16:46:00] lets move data =]
[16:46:11] I will right after that
[16:46:14] cool
[16:46:25] I cna't in fact just do an rsync of /data
[16:46:33] cause guess what this disk is much smaller than ds2
[16:46:44] sucks eh?
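The dataset1 rebuild walked through above follows the standard LVM layering: force-wipe the stale physical volumes, then recreate PV, VG, and LV, then a filesystem. Summarized below as a dry-run that only prints the commands, since the real ones are destructive, need root, and need the actual device paths (the partition names and mount point here are hypothetical; the sizes and command forms come from the log):

```shell
#!/bin/sh
# Dry-run of the LVM rebuild from the log. DISKS are placeholders for the
# four RAID6 array partitions on dataset1; nothing here touches real devices.
DISKS='/dev/sda4 /dev/sdb4 /dev/sdc1 /dev/sdd1'
plan=''
run() { plan="$plan $1"; echo "would run: $*"; }  # change body to "$@" to execute
for d in $DISKS; do
  run pvremove -ff "$d"   # force-wipe stale PV metadata (the bad UUIDs)
  run pvcreate "$d"       # fresh physical volume
done
run vgcreate vg0 $DISKS               # one volume group over all arrays
run lvcreate -L 14.45T -n lv0 vg0     # the logical volume from the log
run mkfs.xfs /dev/vg0/lv0             # dataset1 used xfs
run mount /dev/vg0/lv0 /data          # hypothetical mount point
```

The "Cannot change VG vg0 while PVs are missing!" error earlier in the log is why the sequence starts with `pvremove -ff`: deleting the partitions with parted was not enough, because the PV metadata survived.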
but I can start with the pagestats files
[16:46:48] yes it does
[16:46:54] so is this a useful server?
[16:47:00] let's wait and see
[16:47:04] if it turns out stable that is
[16:47:30] even with smaller size, it can be where active data is stored while historic data and public serving is from dataset2 with its larger disk?
[16:47:33] (just wondering aloud)
[16:47:38] if its stable of course.
[16:47:51] well my main concern is that I want a second copy of the data on ds2
[16:48:09] (plus it owul dbe nice if it dies to have another host so we can keep the runs going)
[16:48:20] so I can get part of that done. :-/
[16:48:47] did the ds3 order go in before the price went up?
[16:53:20] i put it in this morning
[16:53:26] and the quote said old prive
[16:53:31] price but dunno
[16:53:34] ok
[16:53:37] guess we'll see
[16:53:50] it will cost what it costs at this point
[16:55:03] yep, either way its ordered
[16:55:07] and your snapshot hosts are too
[16:55:24] so by end of next month you will have another cluster for this =]
[16:58:00] sweeeeet!
[17:01:46] New patchset: Dzahn; "make service retry and check intervals configurable - still missing a line" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1625
[17:02:13] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1625
[17:02:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1625
[17:06:58] New patchset: Dzahn; "fix path to check_job_queue" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1626
[17:07:23] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1626
[17:07:24] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1626
[17:24:03] top thing I hate about winter: Xmas
[17:24:14] second.
[17:24:42] Dr 1 hour became Mr 2 hours thanks to traffic jam [17:25:05] looks like suddenly anyone 50 kilometers around wanted to be in downtown :-) [17:26:22] top thing i hate: too damned cold. [17:30:19] boo hoo, your virginia is so cold =P [17:30:40] my place is near the atlantic coast and is not that cold [17:30:59] we usually have 3 weeks of freezing (below 0°C) weather [17:33:57] the majority of the winter here is below freezing [17:34:16] well, on three year average. [17:34:25] but that includes a crazy winter before i got here ;] [17:34:33] its in the 40s now, its been there all week [17:34:55] i cannot contemplate Celsius for outside temps. [17:35:00] horrible american education [17:35:11] for server temps its fine, but its just odd disconnect in my brain [17:36:42] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=2160 -- worse than before? [17:39:37] RobH: notpeter: top thing about Xmas: they are airing the good old James Bond movies :) [17:39:50] a pen that kills people! [17:39:52] yay! [17:40:03] * RobH has seen them all and would watch them again [17:40:13] hexmode: I restarted the indexer. I would not be surprised if it's just starting to rebuild the index and is kinda fucked up currently [17:40:35] notpeter: k, wasn't sure how long to wait [17:40:48] hexmode: I'm not sure either :) [17:40:59] \o/ confusion! [17:41:00] trying to figure search out... [17:41:03] hahahahah [17:41:07] RobH: i have most 5 (or at least 10) degree C increments memorized in F for realistic weather range. so, that helps but then i started getting used to it. my phone is set to C. i lived in the land of C for over a month straight [17:41:31] so, 20=68,25=77,30=86 [17:42:07] notpeter: oo! rainman updated the bz ticket [17:42:14] ah what's going on? [17:42:59] rainman-sr: you have RT access? [17:43:19] nop [17:44:02] I am not sure any volunteers have access to RT. 
[17:44:06] hexmode: ^
[17:44:11] y
[17:44:23] Because it has a lot of security issues listed as well as pricing information
[17:44:26] hexmode: what's the BZ number?
[17:44:51] I think the end plan is to have something that is more customizable, but its a large scale project.
[17:44:54] notpeter: rainman-sr just updated another one, 1s
[17:45:05] kk
[17:45:28] notpeter: https://bugzilla.wikimedia.org/32634 points to both, I think
[17:45:39] so basically
[17:45:53] someone just needs to reset the dates to before the gender stuff was introduced
[17:46:00] there are files on searchidx2
[17:46:09] like: /a/search/index/status/dewiki
[17:46:20] which just have a date of last incremental update
[17:46:24] this needs to be changed
[17:46:32] and no restarting is needed
[17:46:57] once the wiki is up to be updated the next time, this file will be read and the update started from that date
[17:47:54] gotcha
[17:47:56] RobH: does domas really not have RT?
[17:47:58] it's fairly easy, a bit laborious, but what we lack at this point is the exact date to which this needs to be reset
[17:48:06] that makes sense
[17:48:29] domas has root
[17:48:36] i know :)
[17:48:40] not giving root folks access is an empty gesture
[17:48:50] so they are not normal cases, they are root first
[17:48:52] =]
[17:49:29] so domas and jens have access
[17:49:54] rainman-sr: what will be different this time the index is rebuilt? I'm just curious
[17:49:55] RobH: anyway, bugzilla has groups which are usually used for security. does RT not have such a thing?
[17:50:06] will it be the new InitialiseSettings.php?
[17:50:21] (I'm trying to learn as much as I can about our search setup)
[17:50:24] jeremyb: you can set queues to security, but then we would have to accommodate that, and we also have closed registration
[17:50:26] RobH: or does RT have federation? i.e.
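The reset rainman-sr describes could be sketched as below. The path layout (`/a/search/index/status/<wiki>`) is from the log, but the one-line file format, the date format, and the date itself are all assumptions — check a real status file on searchidx2 first:

```shell
# Hedged sketch only -- not a tested runbook.
STATUS_DIR=$(mktemp -d)          # stand-in for /a/search/index/status on searchidx2
RESET_DATE='2011-11-01'          # hypothetical date from before the gender change landed
for wiki in dewiki frwiki; do    # placeholder wiki list; the real one is on the bug
  printf '%s\n' "$RESET_DATE" > "$STATUS_DIR/$wiki"
done
cat "$STATUS_DIR/dewiki"
```

No restart needed, per the log: the indexer reads the file on the next incremental update and starts from that date.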
i'm wondering what the labs parallel is for RT
[17:50:33] notpeter: There was a bug in the XML export thingy
[17:50:40] RoanKattouw: gotcha
[17:50:43] So the data we output was corrupted for a while
[17:50:44] notpeter, previously the xml dumps were broken
[17:50:49] RobH: i meant security at an individual ticket level
[17:50:49] kki
[17:51:03] thanks!
[17:51:04] RobH: or even comment level
[17:51:08] jeremyb: you cant in our install, perhaps with addons
[17:51:23] its basically there because our bz setup isnt good for super sensitive info
[17:51:30] so its not there to work for the public, its for internal stuff.
[17:51:39] hence there isnt really a push to make it adjustable for that
[17:52:02] its on our roadmap to improve it and make it more transparent where possible, but it is not a high priority compared to other projects
[17:52:13] yeah
[17:52:35] in particular on that map is making it so user creation is tied to something more easily updatable
[17:52:48] is it not ldap?
[17:52:51] it can be
[17:52:53] but it isnt.
[17:53:06] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[17:53:08] RoanKattouw rainman-sr: which indexes will this be required for? everything of the form [a-z][a-z]wiki? or all of the indexes?
[17:53:18] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[17:55:20] i was just thinking (and until a few mins ago i thought this might already be the case) that a level in the labs -> full root path for new vols might be RT.
and i may have assumed (15 mins ago) that ppl that predate labs like rainman already had it
[17:55:25] RobH: ^
[17:55:45] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[17:55:55] notpeter, only those that have gender stuff, as noted by siebrand on that bug
[17:55:57] New patchset: Hashar; "generic::packages::ant18 to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[17:56:09] * hashar spamming gerrit since november 2011
[17:56:21] jeremyb: ahh, you mean if you are admin of a labs project, you thought it would include RT?
[17:56:25] kk, thanks
[17:56:43] I think it would be awesome if a labs project automatically generated its own RT queue that all project members can see, and the admin can admin
[17:56:57] but thats a tie in from openstack to rt
[17:57:00] RobH: no. part of the labs concept is that people eventually get full prod root. and there's other "levels" before you get that far
[17:57:06] ohh, no.
[17:57:14] the point of labs is folks dont need full production root
[17:57:26] i know
[17:57:45] now, those folks may transition into ops roles, but I think the idea is we will honestly have fewer roots over time
[17:57:56] notpeter, i think siebrand's list is correct, with the exception of enwiki which shouldnt be there
[17:57:59] and root will not go out to volunteers
[17:58:06] it will go out to even fewer employees
[17:58:20] as we are trying to restructure how we run the cluster so root isnt needed for daily tasking
[17:58:20] /query mutante
[17:58:24] RobH: i thought giving prod root to vols was an explicit goal of labs...
[17:58:36] i may be wrong, ryan is the person to ask
[17:58:46] but i dont see why we would give out root to folks on cluster
[17:58:55] * jeremyb suddenly realizes ryan isn't here
[17:58:59] +1 robh
[17:59:00] not really, there's the idea that people can have root on labs, do some puppet changes, then push them to production
[17:59:00] so they won't need to have root at prod
[17:59:12] yes, LeslieCarr is saying what i said poorly
[17:59:25] jeremyb: you are meaning root as in change the cluster
[17:59:27] i think
[17:59:44] i mean root as in root user, and how we are moving away from requiring it for anything but holy shitballs moments
[18:00:14] and we are working to break up and better seed differing levels of deployment access
[18:00:27] that will tie into labs for trusted users to push changes related to their access area
[18:00:36] but they wont be root.
[18:00:46] RobH: i mean root as in "has at least as much access to 99% of boxen as all other humans"
[18:00:57] yea, we wont do that
[18:01:09] ideally they have access to just their limited subset of services and puppet manifests
[18:01:21] and someone in a code review role for ops, ryan, myself, mark, whoever
[18:01:30] those folks can read your change, and approve, which will push it live
[18:01:40] rainman-sr: ok!
[18:01:42] much like brion and tim do code review for mw before it goes out the door
[18:02:12] at some point, for something this large, it has to fall to someone who has signed agreements and has a responsibility to ensure the change doesn't take down the entire site
[18:02:36] so if it does, they are going to be there to fix whatever it is, until its fixed, no matter what
[18:02:57] as it is in ops, we shouldnt push our own changes
[18:03:16] when notpeter writes a puppet change for a production server, he has asked others to review it
[18:03:30] in fact, most of them are better at doing this than I am ;P
[18:03:33] RobH: i think i understand what you're saying but it's just not what i understood (having seen the presentation in person and talked about it beyond that). maybe i need to reread the presentation. regardless should probably chat with ryan to see what he thought/remembers when he's around
[18:03:50] indeed, i need to as well, so ping me if I am around
[18:04:13] and i dont want to be telling folks the wrong thing
[18:04:16] =]
[18:04:22] RobH: and re push your own changes... maybe it's best if you *do* push your own? just wait to push until someone else has reviewed
[18:04:46] jeremyb: it's not necessarily better, it's better that you are around
[18:04:47] indeed, i generically say push when I should say approve and commit
[18:05:00] our gerrit install actually doesnt push the changes automatically
[18:05:11] a root level ops person has to login to our puppet masters and update them
[18:05:17] its an intentional disconnect.
[18:05:50] we need to have all that workflow written down somewhere :-)
[18:05:58] i remember the growing pains with people doing the merge fine gerrit side but then messing up the pull (on sockpuppet i guess)
[18:05:59] Yeah so essentially the steps are these
[18:06:02] its on labs
[18:06:18] but it needs a clearer picture, indeed
[18:06:41] we are also toying with migrating wikitech into labs, and renaming the labswiki to be all inclusive docs for labs projects and the actual cluster (wikitech)
[18:06:43] 1) Write a change, 2) submit it in Gerrit, 3) someone else reviews, approves, and tells Gerrit to merge, 4) puppet master is updated from the repo
[18:06:47] maybe Jorm can do a nice infographic
[18:06:54] that way all labs users can update how their things work on cluster
[18:07:23] i say toying, when its really 'it will happen when we get to it, no one dislikes the idea'
[18:07:25] heh
[18:07:55] cuz we need to merge it then setup a static copy of it off cluster for outage procedures and docs
[18:09:04] initial version: static dump to git repo. commit and push (fast forward) somewhere that's then mirrored in a few places
[18:09:38] evil version: make mediawiki export to git or store in git or export something that can be git-fast-import'ed (is that the right name?)
[18:11:59] look who's here :)
[18:12:28] huh, i am hungry, and its not 3pm
[18:12:36] i am remembering to eat lunch almost on time! \o/
[18:12:40] Ryan_Lane: some labs questions. want the executive summary or a log or both?
[18:12:42] apergos: have you eaten anything yet today?
[18:12:55] summary
[18:12:56] RobH: is he in SF or greece?
[18:12:58] yes.
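RoanKattouw's four steps can be walked end to end with a local bare repository standing in for Gerrit. This is a toy: a real change is submitted with `git push origin HEAD:refs/for/production` and merged through the Gerrit UI, and step 4 is a root running the pull on the puppet master by hand; all names and paths below are illustrative:

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/gerrit.git"                   # stand-in for the hosted repo
git clone -q "$work/gerrit.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git config user.email dev@example.org
git config user.name Dev
echo 'class demo {}' > demo.pp                          # 1) write a change
git add demo.pp
git commit -qm 'add demo class'                         # 2) would go up for review
git branch -M master
git push -q origin master                               # 3) pretend the reviewer merged it
git clone -q "$work/gerrit.git" "$work/puppetmaster" 2>/dev/null  # 4) the pull step
test -f "$work/puppetmaster/demo.pp" && echo synced
```

The "growing pains" mentioned above were exactly step 4 going wrong: the merge succeeds in Gerrit but the pull onto the puppet master is forgotten or botched.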
I ate breakfast, a small one; then I ate a decent lunch
[18:13:08] and soon I can heat up some lentil soup from yesterday for dinner
[18:13:13] location doesnt matter, both ariel and i constantly forget meals
[18:13:17] usually i forget lunch
[18:13:21] thank you for asking
[18:13:29] ariel dinner, so late dinner there and late lunch for me usually =]
[18:13:33] I'm in Greece, it will be mealtime in about another 45 mins
[18:13:45] can someone possibly review & merge https://gerrit.wikimedia.org/r/1622 please? :-)
[18:14:16] Ryan_Lane: so, as i understood it labs was supposed to have various levels of access. you start somewhere and as you show what you know/can do and ppl trust you then you move on to more and more access
[18:14:21] it is a lot of patch sets, but in the end it is just a class to install the ant1.8 package :)
[18:15:13] Ryan_Lane: i thought the end of the path which some may graduate to was full prod root. maybe after 6 months or 3 years or whenever it feels right. and not for everyone. but i thought that was a goal at the end of the path
[18:15:18] Ryan_Lane: is that wrong?
[18:15:22] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1622
[18:15:24] that's correct
[18:15:55] Ryan_Lane: you mean as in, 'here is our root password'?
[18:15:56] is the assertion correct, or is it correct that it is wrong? :-)))
[18:16:10] RobH: or people will have a root key
[18:16:12] because the point is that we are doing things in the restructure to make it so folks do not need root to do their jobs
[18:16:21] thats entirely the opposite of what i thought labs was for
[18:16:38] RoanKattouw: thanks 8-)
[18:16:41] i thought it was to make it so they can write puppet and services that can be checked in and a final ops person can approve
[18:16:43] that's, of course, for people who have been working with us long enough for us to trust them
[18:17:00] i dont see handing root out to folks as a solution
[18:17:08] ever?
[18:17:11] we need more accountability
[18:17:26] I'm not saying we're giving out root like candy
[18:17:27] for final code review, someone who collects a paycheck signs off right?
[18:17:36] so they are personally responsible
[18:17:37] Ryan_Lane: cough syrup?
[18:17:40] we have a bunch of volunteer roots
[18:17:54] we have not given any volunteer roots out since before i started
[18:17:55] I don't see the problem with having more, if we trust what they are doing
[18:18:07] because we have no method of them gaining trust
[18:18:24] right, but the point of labs shouldnt be explained as do labs, eventually get root
[18:18:30] i think that is setting those folks up for failure
[18:18:50] its merely an area they can prove themselves, and write things that root users can push live after code review
[18:18:51] I think it should be described as "it's possible to eventually get root one day"
[18:18:57] not "you'll get root"
[18:19:17] because the latter is not true
[18:19:21] RobH: even if it's explicit that root will be rare and there's many steps along the way? it's not just committer and then root
[18:19:26] as long as its stressed that its a very remote thing and rare
[18:19:38] and ideally we need it less and less
[18:19:51] I feel that the way things are set up, it isn't actually necessary to *need* root much
[18:19:53] the future roadmap is for even us ops folks to not have to use root for our daily stuff
[18:20:08] since it harms accountability
[18:20:44] back in a bit
[18:20:48] breakfast
[18:20:50] hence i would phrase it like administrative powers but thats me being odd.
[18:21:03] LeslieCarr: you may want to look up when you have a chance
[18:21:19] RobH: heh
[18:21:33] cuz ideally none of us should have to drop to root for normal things, they should all be setup with security to specific groups, and individual users have access and assignment to those groups on various servers
[18:21:49] its just a huge pain in the ass.
[18:22:36] sure. but sometimes you e.g.
change puppet's sudo config and lock everyone out
[18:22:47] indeed
[18:22:57] but ideally we would catch that, since it would be tested in labs =]
[18:23:03] right
[18:23:10] but we dont have a full virtualized cluster yet
[18:23:13] so yea, it can happen
[18:23:17] google's desktop (for macs) puppet has multiple environments
[18:23:24] the user can choose which env to use
[18:23:39] and they don't have root on their own machine. but they mostly all have sudo
[18:24:04] yep, we also need to setup sudo for users on the ops level, which we do not really do now.
[18:24:07] right now we just use root.
[18:24:11] the most minimal manifest that all the environments have is: fix sudo after someone manually locks themself out
[18:24:41] so we also need to configure proper sudo policies across the cluster so basic operations tasks on the root level can be done via individual users with sudo
[18:24:43] they also do the weird masterless thing :)
[18:25:00] its funny cuz we all know we needed to do this
[18:25:08] but up until this year its been like 4 people.
[18:25:16] .5 :P
[18:25:25] well, maybe more than a year
[18:25:53] not counting the volunteer roots, who are in and out based on their real workloads, but they do contribute
[18:26:01] but i count who gets paged when there is an outage
[18:26:04] and thus who comes running.
[18:26:27] but yea, now we are large enough we need and are starting to put in those policies
[18:27:05] jeremyb: To me giving someone sudo with proper enforcement to audit actions is a lot easier to hand off
[18:27:18] rather than 'here is root, go nuts' ;]
[18:27:41] i digress, i need to go get lunch before i forget its time for lunch, back in about 15
[18:29:06] RobH: sure. i don't think go nuts was on the roadmap. there can even be multiple levels of sudo. but aiui, one of the explicit goals of labs was to make it possible to get more stuff done by vols at all levels
[18:29:11] enjoy
[18:29:18] * jeremyb needs to get breakfast!
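Group-scoped policies like the ones RobH describes usually end up as sudoers entries. A purely illustrative fragment — the group name and command list are invented, not from any Wikimedia config:

```
# /etc/sudoers.d/ops -- illustrative only; lets members of an "ops" group
# run a narrow set of privileged commands without the root password,
# while every invocation is logged for the audit trail RobH mentions
%ops ALL = (root) NOPASSWD: /usr/sbin/service *, /usr/bin/tcpdump
```

The point of the per-user, per-group shape is accountability: actions show up in the sudo log under the individual's name instead of an anonymous root session.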
[18:29:20] to get stuff done.
[18:29:27] yep, we hope the labs will help a lot with that
[18:29:39] it already has
[18:29:42] quite a bit
[18:29:52] and its still in closed beta.
[18:30:39] jeremyb: indeed, i think we are on the same page, i just have a strong reservation about the attainability of root on cluster
[18:30:43] anyway, if RT is the place to throw lots of stuff that doesn't need to be secret (or at least not topsecret) and many of those things can be done by labs vols then how do you let them know
[18:31:04] i think its possible, and labs gives us the space to have someone gain that trust in an environment where we can easily see what they are doing
[18:31:11] so it makes it easier for anyone to eventually get root
[18:31:23] but its still really really difficult to get root.
[18:31:26] =]
[18:31:44] who wants to open a candy shop?
[18:31:49] i dunno about the rt thing
[18:31:58] Ryan_Lane: so is the plan for labs to do ticketing in RT or something?
[18:31:59] * jeremyb runs away
[18:32:34] ok, i read backlog when back, im walking out the door to pickup foodz =]
[18:33:08] enjoy
[18:53:52] jeremyb: i was having foods, i'll read back
[18:56:58] RobH: I dunno what the plan is for labs regarding ticketing
[18:57:05] I think we want to make RT use ldap auth
[18:57:12] and make an open queue for labs
[18:57:19] that was what i recalled
[18:57:23] but i didnt wanna quote anyone
[18:57:28] * Ryan_Lane nods
[18:57:42] i recall you saying something about rt ldap goddamn mess?
[18:57:45] ;]
[18:58:01] heh
[18:58:50] well, poorly documented, for sure
[19:06:14] so, shell bugs... who is a good person to ask for help so I'm not going to Reedy all the time?
[19:06:36] is there a volunteer you guys could recommend?
[19:06:37] hah
[19:07:20] hexmode: i assume you mean stuff that's not labsable
[19:07:55] * jeremyb grumbles at noc.wm.o/conf still (afaik) not being in git/labs workflow
[19:08:07] jeremyb: these are typically changes to projects...
like an extension enabled or some such
[19:08:16] hexmode: right
[19:08:36] jeremyb: you're doing labs stuff?
[19:10:08] hexmode: maybe soon. i've read a handful of the puppet files so far
[19:10:24] hexmode: and actively playing with puppet outside WM
[19:10:40] jeremyb: are you a wmf employee?
[19:10:46] no
[19:11:01] k, didn't think so, but wanted to check
[19:11:40] jeremyb: you could start putting noc.w.o/conf in git!
[19:14:51] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[19:15:00] hexmode: noc.wm.o/conf is already in non-public version control. which contains non-public files. i think there needs to be a discussion before i just scrape the HTTP and check in to git. how would i get people to use the git instead of svn? what about the non-public files in use which are siblings of the public files i move to git? if i don't get ppl to use git then what? cron job scraping and committing? then you have no commit messages.
[19:16:22] jeremyb: so, my understanding of how we're going to use git (which could be completely wrong)
[19:16:58] hexmode: anyway, in the past, one answer to this question was that noc.wm.o/conf is a dev thing not ops so the labs crew wasn't going to force them to move
[19:17:39] jeremyb: is that we'll have some gate from gerrit -> git trunk
[19:18:02] hexmode: gerrit is the gate
[19:18:09] jeremyb: if we did that, then wouldn't it be ok to just stuff the public stuff into git?
[19:18:59] sure. that's not a hard question
[19:19:14] jeremyb: doesn't someone have to move it into the main branch in gerrit, though?
[19:19:26] hexmode: so, in the puppet config, i can tell you everyone who can push shell requests
[19:19:28] would that help?
[19:19:31] so gerrit isn't the gate, but it is where gating happens
[19:19:46] RobH: that would!
[19:20:00] hexmode: ok, its a public file, so its cool i just put it in here
[19:20:07] :)
[19:20:08] a mailing list request for ops :) https://bugzilla.wikimedia.org/show_bug.cgi?id=33255
[19:20:17] so we have our normal ops folks, who can do shell requests, but are not usually the ideal folks
[19:20:39] right, I'm trying to avoid ops for things that are simply shell
[19:20:53] never heard of that language
[19:21:00] and I think I've been relying on Reedy too much
[19:21:15] then in dev we have the following: andrew, arthur, aaron, nimish, ryan faulkner, ^demon, hashar_, reedy, patrick, robla, and neil
[19:21:34] arthur=awjr?
[19:21:38] so any of those folks are capable of pushing extensions and project requests
[19:21:40] yep
[19:21:52] that doesnt mean its their job though, you would have to check with them
[19:21:58] hexmode: I can push shell bugs yes
[19:22:04] i know for a fact you arent going to get patrick to push that stuff
[19:22:11] mobile is his job, etc..
[19:22:55] sure, I'm seeing a couple nicks that I wasn't really using before, though. Mostly just hashar ;)
[19:23:17] and *maybe* ^demon
[19:23:31] + aaron possibly
[19:23:56] right... just good to keep this in mind for later
[19:24:00] tyvm robh
[19:24:08] hexmode: some ppl are governed by exam schedules
[19:24:11] welcome
[19:24:14] neil is probably just for the upload wizards
[19:24:32] the rest.. I don't know about them nor their projects
[19:25:13] RobH: who is faulkner?
[19:25:23] ryan faulkner, dev
[19:25:29] i dont think he would be doing shell requests
[19:25:38] hexmode: he does fundraising analysis
[19:25:38] kk
[19:25:49] or did... maybe it changed
[19:26:12] hexmode: oh, and of course RoanKattouw but i didnt include him cuz he is root
[19:26:26] but his root is limited to info gathering
[19:26:43] but he can do shell requests, he has that access as well, just again not sure if its his thing
[19:26:45] same with tim, right?
[19:26:58] I don't routinely do shell reqs
[19:26:59] Tim does not have any access anymore
[19:26:59] probably not his thing
[19:27:01] i dont know if tim still does root
[19:27:05] But I often do shell-like things
[19:27:10] I'm pretty sure Tim still has root
[19:27:11] heh, brion doesnt have shell
[19:27:14] he loves it
[19:27:44] hah
[19:27:52] to be more precise, Tim is overbusy. So he just pretends he does not have any access :D
[19:27:53] RoanKattouw: i think so, but i have not worked an outage with tim in awhile
[19:28:11] tim has whatever he wants, and is too busy to help with normal shell requests
[19:28:13] =]
[19:28:20] He helped me with tcpdump a while back
[19:28:25] tcpdump requires root :)
[19:28:29] indeed
[19:28:43] so yea, it used to be outside of ops, the folks with root were brion, tim, domas, jens, river
[19:28:50] brion doesnt even have shell anymore
[19:28:57] tim, domas and jens have root still
[19:29:02] What about river?
[19:29:06] and i think river has it
[19:29:17] but may not be around when the next password change happens
[19:29:27] though if river came back and got involved would more than likely get it again
[19:29:35] And I'm on that list now, although as you say there are socially-enforced restrictions on my access
[19:29:41] indeed
[19:29:54] And I also don't have the password, just key access
[19:29:55] restrictions?
[19:30:06] such as you have root access but are not allowed to use it? :)
[19:30:10] Sort of :)
[19:30:17] roan doesnt do root actions, he has root so he can gather data on performance and execution of things
[19:30:28] I am allowed to look around, but I am not allowed to touch things without approval
[19:30:36] can't that be done with root access / using sudo?
[19:30:45] we dont have proper sudo policies in place
[19:30:50] its an open project =P
[19:31:12] can roan boot an unresponsive box? i.e.
pull power remotely
[19:31:15] No
[19:31:21] I don't have serial access either
[19:31:24] thats mgmt access
[19:31:27] Or maybe I do but I don't know how to use it?
[19:31:31] all serial and mgmt interfaces, you dont ;]
[19:31:37] totally different logins
[19:31:47] Anyway -- I occasionally do things like ownership fixes if another root approves them
[19:31:53] look at /root/mgmt-passwords.txt
[19:31:57] * binasher stabs robh repeatedly for leaving db10 without a replica
[19:32:06] s/10/9
[19:32:09] our high level access is at least split up between OS, networking, and mgmt
[19:32:10] binasher: ?
[19:32:18] what's db10 doing then?
[19:32:18] dude, i just know it ran out of space
[19:32:22] did our pulling binlogs break replication?
[19:32:42] I am a server monkey, I need a db-runs-out-of-space checklist.
[19:32:43] yeah, you can't delete the binlog that is actually being read from
[19:32:56] hehehe
[19:32:58] i was afraid we were going to do that
[19:33:03] but i also wasnt sure how to ensure I didnt
[19:33:11] and the services were down, so i just acted
[19:33:25] it was that, or call you on your day off ;]
[19:33:52] binasher: but yea, if we can document on a basic level the steps to run for normal instances of this kind of thing, that would be awesome =]
[19:33:57] i'm going to reload the pair and upgrade mysql.. the new build i packaged has a neat patch that actually shows what binlog position each slave is on when you "show processlist" on the master
[19:34:06] nice
[19:34:14] i also allocated you replacements
[19:34:17] on the one RT ticket
[19:34:20] for db9/10
[19:34:35] so you can just migrate to those in your rebuild if you want
[19:34:37] RobH: I thought you only deleted binlogs >24hrs old.
[19:34:44] maplebed: indeed.
[19:34:47] was there a replica that was >24hrs behind in reading the binlog?
[19:34:57] thats all i meant to do, perhaps i or someone else cut too close?
[19:35:12] may have been me, i didnt check any positions
[19:35:31] I deleted a few today and it was only a few.
[19:35:36] even when hosts are behind in replication, they're usually up to date in binlogs, unless slaving is stopped.
[19:35:36] are we in trouble?
[19:35:44] the replication broke is all
[19:35:47] grrr
[19:35:48] so its bad, but binasher is fixing
[19:35:49] binasher is fixing it. with him around, there can't be trouble.
[19:35:50] :)
[19:36:24] "there are no problems, there will be no problems"
[19:36:29] that one's for faw :)
[19:36:42] hahaha
[19:36:49] it would have had to be 24 hours + out of date I think, I remember looking at the timestamps
[19:36:50] Next year, there'll be no shoes :)
[19:36:54] (of the ones I deleted)
[19:37:09] faw++
[19:37:17] i'm also going to reload db9 at some point and free up a couple hundred gigs of actual disk space, when it's ok to have 30min downtime.. maybe i'll arbitrarily make up that time in the middle of the night one day
[19:37:38] binasher: traditionally thats done on saturday morning ;P
[19:37:41] hehe cool :)
[19:37:48] saturday is the slowest day for most services and traffic
[19:37:51] then go get mimosas? :)
[19:37:58] sunday tends to pick up due to timezones, the EU being in monday
[19:38:08] what happened is the most recent binlog was corrupted when db9 ran out of space, slaving on db10 would have had to have been moved to the next position after restart, then after time passed, that log was also deleted
[19:38:26] ah.
[19:38:29] ahhh, so we didnt delete the active log, we just didnt fix slaving at the time of the outage
[19:38:29] db9 ran out completely?
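For the checklist RobH asks for, the usual safeguard before deleting binlogs on a master is to check every slave's read position first. A hedged sketch of the by-hand console session (server and file names are placeholders, not taken from the incident):

```sql
-- On each slave (e.g. db10): which binlog file is it still reading?
SHOW SLAVE STATUS\G
-- look at Master_Log_File / Relay_Master_Log_File in the output

-- On the master (e.g. db9): purge only up to, NOT including, the oldest
-- file any slave still needs; 'db9-bin.000123' is a placeholder name
PURGE BINARY LOGS TO 'db9-bin.000123';

-- After a break like this one, replication is resumed on the slave with:
START SLAVE;
```

`PURGE BINARY LOGS` refuses to remove the file currently in use, but it cannot know about a slave that is stopped or pointed at a corrupted log, which is roughly what happened here.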
[19:38:37] oh yea, db9 was full, caused initial outage
[19:38:41] oh
[19:38:46] yeah that's a problem
[19:38:48] everything fell over and there was much sadness
[19:39:01] hexmode: told me bz was broke and i called him a liar ;]
[19:39:12] (not really, just funnier)
[19:39:14] RobH: "start slave;" @ db10 should have done the trick after fixing db9
[19:39:20] and I cried
[19:39:33] jeremyb: binasher said we would have had to point it at the noncorrupt binlog
[19:39:41] so would have had to tell it that somehow right?
[19:40:27] RobH: i've never had a binlog corrupted. but i always used to run with binlogs on their own filesystem so if the ibdata system filled then binlog still had some room
[19:40:30] food! rats
[19:40:39] ahh, these share with data
[19:40:46] and bring them to their knees
[19:42:07] New patchset: Asher; "install new mysql pkgs on db9/10" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1627
[19:42:17] RobH: idk how i prefer it. but i'm no expert either. certainly would be nice if mysql could e.g. push binlogs to swift. pull the current position # and current log from master but transparently pull from swift if you need older logs
[19:42:37] then you need swift as a point of failure in the mysql cluster
[19:42:47] how so?
[19:42:54] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1627
[19:42:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1627
[19:42:56] what if swift is offline, now replication doesnt work?
[19:43:09] the *current* logs (or 5 most recent) would be on the master and work same as now
[19:43:10] its adding in a service dependency
[19:43:18] but old logs would be on swift
[19:43:31] ahh, i see what you mean
[19:43:48] we dont run into these issues as much for wiki dbs
[19:43:53] !log powercycling maerlant
[19:43:55] because those are honestly split up and handled a lot more sanely
[19:43:56] and swift is shared nothing
[19:44:02] Logged the message, and now dispatching a T1000 to your position to terminate you.
[19:44:10] only since binasher has started have we looked at fixing the misc db issues
[19:44:18] !log dropping several db's from db9 which have already been migrated to the fundraisingdb cluster
[19:44:22] notpeter: waited that long for a shell? wow
[19:44:26] Logged the message, Master
[19:44:28] otherwise it was just there, and we threw more hardware at it as needed
[19:44:45] jeremyb: eventually it timed out. but I asked Ryan if it was doing anything. he said just reboot
[19:44:50] backing up databases + binlogs is a legit practice which we should embrace.. i was kind of hoping we could use the netapps for that once they're online
[19:45:45] binasher: mark doesn't want to use them for that
[19:46:12] i don't care. give me something comparable to use for that.
[19:46:13] Ryan_Lane: does maerlant do anything? came up a few times in #wikimedia-tech and no one had a clue
[19:46:34] no
[19:47:47] New patchset: Asher; "Revert "install new mysql pkgs on db9/10"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1628
[19:48:33] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1628
[19:48:33] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1628
[20:01:48] i am breaking things using db9 for a minute or two..
[20:02:54] <^demon|away> Everything uses db9 :p
[20:03:11] muhaha
[20:03:21] I mean, really just puppet, the blogs, bz, rt, etherpad, gallium, and a couple of other things
[20:03:24] You'll break wikipedia
[20:06:24] sometimes you have to break shit to fix shit
[20:06:35] omelettes. eggs. etc
[20:06:55] we need to move labs to db9/10 at some point :)
[20:07:24] * binasher is running out of people to stab
[20:07:48] as we said in college: you can't cook an omelette without breaking a couple of whiskey bottles.
[20:07:58] :-D
[20:08:04] holy shit i was sleeping under a rock this weekend - I didn't know kim jong-il died
[20:08:27] it seemed to hit facebook ~10 hours ago based on my friends
[20:08:34] lcarr: neither did anyone else until last night
[20:08:39] yeah
[20:08:48] who knows when he actually died...
[20:08:49] http://static.pokato.net/2010-09-26-22-14-381037156564.jpg
[20:08:51] good point
[20:09:08] oh man, talk of the nation just said something like that -
[20:09:14] Reedy: and there was much loling...
[20:09:17] "kim jong il became kim jong very ill"
[20:09:26] notpeter, I hear he's not so ronry now
[20:09:34] I wonder if they thought about propping him up in his chair and not telling anybody...
[20:11:09] apergos, what do you think they've been doing for the last three years since his stroke
[20:11:20] good point!
[20:12:37] weekend at bernie's: n korea edition
[20:13:03] so anyway, hoping for NOT a nuke war on the korean peninsula as a distraction
[20:13:21] his cabinet had a vested interest in doing so.... as they will all be killed when his son takes over
[20:13:23] LeslieCarr: so did we ever figure out how to fix passwords in racktables?
[20:13:27] cuz i just locked myself out
[20:13:39] it seems racktables didnt like one of the characters in keepassx random generation
[20:13:47] and i think it just trimmed it someplace in the string
[20:13:49] doh
[20:14:01] RECOVERY - Puppet freshness on maerlant is OK: puppet ran at Mon Dec 19 20:13:35 UTC 2011
[20:14:14] food! doing it now.
[20:14:27] i did something - i googled recovering racktables admin password or something like that
[20:14:30] Dude, it's past 10pm
[20:14:41] RECOVERY - SSH on maerlant is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[20:14:44] LeslieCarr: but you then set an admin pass right?
[20:15:00] oh, you put it in docs
[20:15:07] i am just not paying attention =P
[20:15:21] hehe
[20:15:22] ok
[20:25:49] lentil soup with tomatoes, carrots and onions, a dash of curry flavor. day 2: even better than day 1
[20:25:59] and forgot, also has beet greens!
[20:26:55] apergos: sounds awesome
[20:27:08] I'm loving it
[20:28:06] hrmm, where the hell do you set passwords for others
[20:31:39] bleh, i think i need to do it via mysql console, racktables fail
[20:32:11] are they not magically salted/hashed?
[20:37:43] RobH: hey - on rack b3, does the machine slotted in the bottom (RU1) go in port 0 or does the one on the top go in port 0?
[20:39:47] bottom u goes in lowest port on switch
[20:39:54] so U1 bottom of rack to port0
[20:40:53] awesome
[20:40:54] thanks
[20:41:16] so it loooks like dataset1 is keeping time
[20:41:26] (yes that extra o is deliberate. )
[20:42:53] ...
i cannot find anyplace to do this in docs [20:42:55] this sucks [20:43:10] old racktables was also horrible, but it let anyone set anyone else's password, heh [20:44:04] ahh, found it [20:44:08] this does too =P [20:46:36] hahaha [20:46:38] holy shit [20:46:39] http://wikitech.wikimedia.org/view/User:RobH/pmtpa_rackspace [20:46:42] apergos: lookit that [20:46:47] LeslieCarr: that was before rackspace, that link [20:47:01] :) [20:47:02] gaze upon what our pain was and despair [20:47:07] wikitables for racks =P [20:47:20] you needed an extension [20:47:26] and one was written... and then never finished [20:47:43] heh, no metered power, took sampled readings [20:47:47] horrible. [20:54:01] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [20:55:05] ERE_ACCESS_DENIED?? [20:55:14] err, I mean ERR [20:55:23] https://wikisource.org/w/index.php?title=Wikisource%3AScriptorium&action=historysubmit&diff=323108&oldid=323079 [20:56:11] oh, feh [20:56:13] nm [20:56:39] maplebed: is es1002 down you ? [20:56:48] yes. [20:57:00] cool [21:04:48] i've got a question for peeps - nagios.wikimedia.org/nagios/cgi-bin/extinfo.cgi?type=2&host=cp1041&service=Varnish+HTTP+mobile-backend - that is showing as down, however a manual get of the proper ports shows as up -- anyone know what's wrong with the nagios check ? 
[21:07:34] hey binasher your change on the 12th requiring xtrabackup also requires mysql-client as a dependency, i'll update puppet [21:07:51] (if anyone couldn't tell, i'm going over nagios checks and trying to fix them ) [21:09:31] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:09:48] oh actually updating this changelist, ignore that one [21:11:43] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:11:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1629 [21:12:35] LeslieCarr: requiring mysql-client to be latest may not be good [21:12:42] and conflicts with mysql::package [21:13:31] ah okay, i'll switch it to present [21:13:55] LeslieCarr: the conflict is only on the few servers that are in between states between old and new, with some by hand pre-puppet stuff [21:14:18] so.. not doing anything might be best [21:16:41] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:17:20] i switched it to require => present so it won't conflict [21:17:38] no [21:17:45] it's only on db13 and db17, yes [21:17:46] oh ? 
[21:17:50] that will break too [21:18:19] the actual package that gets installed in the new build is mysql-client-5.1 which has a "provides: mysql-client" in the pkg manifest [21:18:49] that rule will fix puppet on those 3 dbs and break all others [21:19:33] the real solution here is just to yell at me to fix those 3 [21:21:01] ah [21:21:06] * LeslieCarr yells  [21:21:22] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:22:46] aagh and one of the prod db's for fr / ru / ja wikipedias died over the weekend [21:25:08] RobH: are any of the other new db's in pmtpa from the last order imaged? [21:25:31] hrmm, don't think so, but not sure [21:25:52] binasher: unless there is a ticket to say install the os, nope [21:26:09] though we can get it done if needed [21:26:21] is there any chance you could help me out by getting a couple of them up? s6 is down to a single slave which is also the snapshot host so it's kind of an emergency :( [21:26:35] just want the basic OS with no puppet run yet right? [21:26:40] yup [21:27:04] yea I will go ahead and get them done for you, it's just a two-step process, attended raid setup, then unattended os install [21:27:14] binasher: so you have not installed any of the new ones right? [21:27:20] they are all not in service [21:27:38] db48-db58? [21:27:39] you imaged two which are now in use for otrs [21:27:45] ahh, that's right [21:27:48] 48/49 i think [21:28:00] yep [22:13:54] binasher: so we really do have all of those new dbs allocated for your use right? 
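The abandoned change 1629 discussed above can be sketched roughly as follows. Only the package names and the conflict come from the log; the class layout, ordering, and everything else here are assumptions for illustration:

```puppet
# Hypothetical sketch of abandoned change 1629 -- class structure assumed.
class xtrabackup {
    package { 'xtrabackup':
        ensure => present,
    }

    # Managing mysql-client here conflicts with mysql::package on hosts
    # that already declare that package, and "ensure => latest" is risky
    # besides: on new builds the installed package is actually
    # mysql-client-5.1, which satisfies "mysql-client" only through a
    # "Provides:" in its package manifest. As binasher notes, a rule like
    # this would fix puppet on the 3 stuck dbs and break all the others.
    package { 'mysql-client':
        ensure => present,  # was "latest" in the first patch set
    }
}
```

As the log shows, the change was abandoned and the apt state on db13 and db17 was fixed by hand instead.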
[22:14:12] so they should all be raid10 since they all will do database level stuff for cluster and such [22:14:34] if that's the case, i can do the os install on all of them and leave them sitting, but if they aren't used, it's kinda bad to do that [22:14:46] so i prefer to just setup the raid10 on the rest for you, but not do the OS [22:15:09] so when you do need them, it's just a netboot and confirming yes to the dialogs in the installer for the lvm stuff [22:15:25] as the dns and dhcp stuff is already done, and the raid will be ready [22:15:55] !log deployed two new cron jobs on hume via /etc/cron.d/mw-fundraising-stats, temporary, will puppetize once we see that the script is working properly [22:16:04] Logged the message, Master [22:20:48] !log db50/db51 online for asher to deploy into s6 [22:20:56] Logged the message, RobH [22:20:57] binasher: db50/51 are ready for you [22:24:59] can any op have a look at the very simple https://gerrit.wikimedia.org/r/#change,1622 please ? :-) [22:25:20] it is to force "ant1.8" package on gallium, the continuous integration server [22:26:25] hashar: i'll check it out [22:28:09] hashar: why are you specifying 1.18 instead of just latest ? [22:32:31] RobH: thanks! [22:34:45] hashar: you back ? 
[22:34:54] LeslieCarr: yes sorry [22:34:59] np [22:35:13] question - why specify 1.18 instead of just "latest" [22:35:30] ubuntu provides three ant packages [22:35:32] *1.8 [22:35:41] a meta one "ant" which actually installs ant1.17 [22:35:45] ant1.17 [22:35:48] and then ant1.18 [22:35:55] okay [22:36:03] installing the latest version of "ant" will bring you ant1.17 :D [22:36:19] android does require 1.18, so I have to explicitly require it [22:36:34] and of course both ant1.18 and ant1.17 conflict with each other :-( [22:37:46] err not seventeen / eighteen but one dot seven and one dot eight [22:37:53] ant is not MediaWiki :-))) [22:38:13] gotcha - could you maybe put a one line comment to the effect of "when specifying latest this will install ant 1.7 instead of ant 1.8, so specifically calling it out" ? [22:38:15] hehehe [22:38:34] :-D [22:39:09] hehe [22:39:13] if I manage to remember how I rebased this afternoon :D [22:39:40] it's like when kernel write kprintf in userland programs [22:41:38] New patchset: Asher; "new pmtpa dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1636 [22:42:53] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1636 [22:42:53] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1636 [22:50:27] LeslieCarr: still wondering how to merge my change :-)) [22:52:32] hehe oops [22:52:43] change 1607 closed :D [22:52:45] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1622 [22:52:46] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622 [22:52:57] so looks like we need to have it merged and I will send a new change [22:53:07] another new change ? [22:53:26] !log running a hot xtrabackup of db47 to db50 [22:53:34] Logged the message, Master [22:54:00] ahh I forgot to "git rebase" ! 
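Per hashar's explanation, change 1622 has to pin the versioned package rather than the meta-package. A minimal sketch of what that resource plausibly looks like (its placement in the gallium/contint classes is an assumption):

```puppet
# On the Ubuntu release in question, "ant" is a meta-package that pulls
# in ant1.7, so "package { 'ant': ensure => latest }" would install ant
# 1.7 rather than 1.8 -- and since the ant1.7 and ant1.8 packages
# conflict with each other, the 1.8 package must be named explicitly.
package { 'ant1.8':
    ensure => installed,
}
```

This is exactly the one-line comment LeslieCarr asks for, which lands as change 1637 below.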
[22:54:27] basically i would ask binasher if he's already doing a merge :) [22:57:39] New patchset: Hashar; "comment about ant => latest not being latest ant!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1637 [22:57:49] LeslieCarr: here is the comment (hopefully) [22:57:51] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623 [22:58:38] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1637 [22:58:38] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1637 [22:58:44] \o/ [22:58:46] cool :) [22:59:01] you are my evening "mutante" [22:59:04] :D [22:59:19] I have worked a lot with Daniel discovering git / puppet etc... [22:59:30] so he ends up merging most of my stuff [23:00:01] !log resolved apt issues on db13,17 [23:00:10] Logged the message, Master [23:00:53] hashar: what's the contint box again ? [23:01:02] gallium [23:05:51] now to see if it works… :) [23:06:16] looks good :) [23:06:26] puppet hasn't run yet … :) [23:06:44] so there is some kind of race condition installing 1.7 / 1.8 :) [23:06:58] the change should make it clear to puppet that 1.8 is desired [23:07:08] let's hope :) [23:07:15] RECOVERY - DPKG on db17 is OK: All packages OK [23:07:23] however the puppet server is currently hugely loaded up … grrrr [23:07:37] wasn't stafford supposed to give us a pony ? [23:09:01] <^demon> I was promised unicorns, ponies and rainbows. [23:09:13] <^demon> Somebody isn't fulfilling their SLA. [23:11:24] have you signed the agreement? :-) [23:13:14] wait [23:13:19] I got dibs on the unicorn buddy [23:13:25] you only get the ponies and rainbows [23:14:38] http://insaneboer.deviantart.com/art/Alphred-Rainbow-Pooping-Pony-143268599 [23:15:13] <^demon> How'd you find my wallpaper?!? [23:15:27] google. [23:15:29] they have everything. 
[23:15:32] * apergos goes to bed.  [23:15:37] <^demon> Night apergos. [23:15:48] have a good rest of the day folks [23:15:59] and wish the house to get everything done so they leave before wednesday [23:16:06] otherwise they'll be back doing SOPA, dang it. [23:16:10] hashar: interesting --- err: /Stage[main]/Misc::Contint::Test::Jenkins/File[/srv/org/mediawiki/integration/WikipediaMobile/nightly]/ensure: change from absent to directory failed: Cannot create /srv/org/mediawiki/integration/WikipediaMobile/nightly; parent directory /srv/org/mediawiki/integration/WikipediaMobile does not exist [23:16:11] tah! [23:16:22] grr [23:16:30] <^demon> LeslieCarr, hashar: You have to create every directory in a tree. [23:16:33] I wish puppet could create parent directories [23:16:36] <^demon> file{} isn't recursive. [23:16:41] that is so lame ! [23:16:49] it is super lame. [23:16:49] let me amend [23:16:55] it's annoyed me many times in the past [23:17:02] pretty much every time i puppetize a service. [23:17:24] hashar: also… ant installed 3 versions of itself [23:18:54] New patchset: Hashar; "Create WikipediaMobile/nightly parent directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:19:13] LeslieCarr: that is common among ants. They try to reproduce / spread quickly. 
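Since puppet's file type doesn't create missing parent directories, the fix in change 1640 has to declare the parent as its own resource. Roughly (paths are taken from the error message above; ownership, mode, and surrounding class are omitted/assumed):

```puppet
# file{} is not recursive for creation: each directory level that might
# be absent needs its own resource, ordered parent-before-child.
file { '/srv/org/mediawiki/integration/WikipediaMobile':
    ensure => directory,
}

file { '/srv/org/mediawiki/integration/WikipediaMobile/nightly':
    ensure  => directory,
    require => File['/srv/org/mediawiki/integration/WikipediaMobile'],
}
```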
[23:19:19] ah… that's it :) [23:19:28] I am pretty sure I killed their queen though [23:19:50] damn ;) [23:20:06] so 1640 might/should/dontknow create the parent directory [23:20:15] but then it depends on another change :-/ [23:21:26] New review: Lcarr; "LGTM" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1640 [23:21:29] looks good to me [23:21:34] let's try it out :) [23:21:48] but you need https://gerrit.wikimedia.org/r/#change,1623 [23:22:02] I still don't know how to change the parent of a change [23:22:19] that should be possible by branching / rebasing but I have not tried it yet :\ [23:23:11] New patchset: Hashar; "Create WikipediaMobile/nightly parent directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:23:29] here we go [23:23:33] a new clean patch set :-) [23:23:55] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1640 [23:23:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:24:01] I did a new branch, cherry picked the change and push-for-review-production [23:25:18] * hashar daughter duty [23:37:56] hashar: the new files are working fine, however ant is not [23:40:14] <^demon> Wait, was there a reason we had to reinstall ant on gallium? [23:40:21] <^demon> We should've already had ant for jenkins [23:40:41] they need ant 1.18 [23:43:19] also ^demon - it looks like ant isn't directly installed for jenkins… maybe indirectly installed via one of its dependencies ? [23:43:33] <^demon> Perhaps. [23:44:00] <^demon> Or maybe jenkins uses a bundled ant. [23:45:33] ants sneak around in everything ! [23:46:00] LeslieCarr: sorry my daughter does not want to sleep [23:46:11] no problem hashar - she's obviously the boss :) [23:46:12] I guess I will keep it busy a bit on my knees :D [23:46:55] so ant is broken. Maybe there is another package requesting ant [23:51:37] yeah... 
[23:54:41] I have ant 1.8.0 on gallium right now :D [23:54:56] it's there, there's just also the 1.7 [23:54:57] eep [23:55:30] dpkg -l 'ant*' [23:55:35] give me uninstalled status [23:55:43] for ant and ant1.7