[00:49:28] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[03:56:30] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[06:10:06] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[09:38:32] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[09:38:32] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[09:38:32] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:56:42] RECOVERY - MySQL slave status on es1004 is OK: OK:
[10:09:19] PROBLEM - Disk space on db9 is CRITICAL: DISK CRITICAL - free space: /a 10743 MB (3% inode=99%):
[11:24:44] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[11:27:10] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1618
[11:27:10] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1618
[11:31:06] New review: Dzahn; "what about "misc::jenkins"?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1611
[11:33:18] New review: Dzahn; "sure, if the dir is missing anyways..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1620
[11:33:18] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1620
[11:36:42] New review: Hashar; "misc::jenkins already exist and is a different project not related to mediawiki continuous integrati..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1611
[11:37:07] mutante: I have answered on change 1611 (misc::jenkins already exist and is a different project)
[11:37:10] oh and hello :-)
[11:37:31] hello hashar
[11:37:46] I have made some other changes friday related to android. They should be easy changes :D
[11:38:27] ok, so in 1611 all you did was move code around, you did not change other things in the same patch set, right?
[11:40:48] just moved stuff around
[11:41:01] so that should be fine
[11:41:12] mutante: Could you also review&merge 1617 and 1619? They fix the puppet errors on fenari
[11:41:23] I am not sure what the second patch set is about. I think I git pushed my repo somehow
[11:41:38] and that triggered new patch set in gerrit for everything that was not merged
[11:42:00] hello RoanKattouw :)
[11:48:39] New review: Dzahn; "yep, we want to move stuff into separate files..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1611
[11:48:40] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1611
[11:50:20] mutante: thanks. https://gerrit.wikimedia.org/r/#change,1572 should be straight forward too.
[11:50:41] it is messing up with the file headers :)
[11:50:59] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1617
[11:50:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1617
[11:52:59] RoanKattouw: so you also want "mysql-client-5.1" removed from "misc::survey"?
[11:54:05] Which rev is this again?
[11:54:09] 1619?
[11:54:12] !g 1619
[11:54:13] https://gerrit.wikimedia.org/r/1619
[11:54:47] yes
[11:55:03] mutante: I'm removing mysql-client-5.1 and adding mysql::client which adds mysql-client-5.1
[11:55:09] So the package should continue to be installed
[11:55:44] ok, just saw that
[11:55:45] (Also note that this is a cherry-pick (of 1548 IIRC))
[11:56:13] 1048
[11:57:01] you are adding it to misc::fundraising , i was just wondering "why does it say bastion host then"
[11:57:04] ok
[11:57:09] Oh
[11:57:13] So here's the story
[11:57:21] 1048 added it to a few things, including misc::bastion
[11:57:30] Then Chad went and split out misc::bastion into its own file
[11:57:43] Then that change made it into production, but my change introducing mysql::client didn't
[11:58:01] So puppet on fenari was failing because misc::bastion required the non-existent class mysql::client
[11:58:10] ..ah i see
[11:58:19] Cherry-picking my change (sans the bastion part because that stuff was moved and erroneously applied already) fixes that
[11:59:13] New review: Dzahn; "ok, convinced :)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1619
[11:59:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1619
[12:05:27] New review: Dzahn; "just a tiny thing: "ensure=>installed;" would be nicer as "ensure => installed;"" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1612
[12:06:46] New review: Dzahn; "Should "Nighties" be "Nightlies"? (nightly)?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1615
[12:09:36] mutante: yeah I would have said nightlies too but my Mac converts it to nighties (without L)
[12:09:40] need to verify that
[12:10:26] http://en.wiktionary.org/wiki/nightly#Noun
[12:10:53] http://www.indianshaadi.org/tag/nighties/ ( Not safe for work :p )
[12:11:01] heh
[12:11:23] (informal) A woman's nightgown or nightdress; a dress-like garment worn to bed.
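Stepping back to the mysql::client story above: change 1619 was recovered by cherry-picking the still-unmerged fix (minus the part that had already landed) onto production. That is a standard git workflow; here is a minimal self-contained sketch using throwaway branch and file names, not the real operations/puppet layout or Gerrit remote.

```shell
#!/bin/sh
# Hypothetical local demo of cherry-picking one commit onto another branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name demo
echo 'class misc::survey {}' > survey.pp
git add survey.pp
git commit -qm 'initial manifest'
# A feature branch adds the missing class (stand-in for the 1048 change).
git checkout -qb feature
echo 'class mysql::client {}' > mysql.pp
git add mysql.pp
git commit -qm 'introduce mysql::client'
fix=$(git rev-parse HEAD)
# "production" never got that commit; cherry-pick just the fix onto it.
git checkout -qb production "$fix~1"
git cherry-pick "$fix" > /dev/null
test -f mysql.pp && echo 'mysql.pp present on production'
```

In the real incident the cherry-pick also dropped the misc::bastion hunk, which `git cherry-pick -n` plus a manual edit before committing would handle.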
[12:11:33] http://en.wiktionary.org/wiki/nightie#Noun
[12:13:26] check_job_queue still fails .. sigh
[12:17:48] hashar: off-topic, have you been here before? catacombes-de-paris.fr
[12:18:17] yeah several times
[12:18:23] this is the official site for tourists
[12:18:29] this / that
[12:18:49] that is also the safe way to visit the catacombs
[12:19:03] and yeah, you can actually see skeletons :-))
[12:19:39] during the 18th century, the graveyard in Paris center was closed
[12:19:52] and all human remains moved to the Catacombes
[12:20:22] not sure how old the graveyard was
[12:21:26] i was actually there on the weekend, next time i shall tell you before;)
[12:21:44] yea, skeletons, creepy:)
[12:21:59] RECOVERY - Puppet freshness on bast1001 is OK: puppet ran at Mon Dec 19 12:21:46 UTC 2011
[12:22:37] mutante: I used to visit them the illegal way
[12:22:54] i.e. finding a way which is not really open to public
[12:23:01] and NOT safe at all
[12:23:09] hashar: some guy told use people get drunk and then dont find their way out anymore :P
[12:23:21] I am sure it happened a few time
[12:24:10] we did that using "un Fil d'Ariane"
[12:24:19] http://de.wikipedia.org/wiki/Tauchseil
[12:24:31] it was kind of funny how they search your bag when you exit the catacombs.. to check for stolen bones and skulls :p
[12:25:00] the people I explored with had maps of the places, we had water and food for several days and light (ton of lights)
[12:25:23] yeah I am sure tourist try to steal bones for "souvenirs"
[12:25:25] wow. i didnt know about "Tauchseil"
[12:25:48] I am not sure Tauchseil is what it is meant to be. I just followed the interwiki links 8-)
[12:25:52] well, i understand, but not that it is named after "Ariadne" / Arian"
[12:26:00] yep
[12:26:45] I kinda like nighties ;-)
[12:26:50] apergos: :)))
[12:26:51] :)
[12:27:14] so there's a complaint about the yaml file for puppet on sn4, it's what prevents it from running.
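The snapshot4 yaml complaint above is resolved later in this log by deleting the puppetmaster's cached node/fact YAML so the next agent run regenerates it. A simulated sketch of that fix, using a temp directory in place of the real /var/lib/puppet/yaml on the masters; note the real cache files are named by the client's FQDN:

```shell
#!/bin/sh
# Simulate clearing a puppetmaster's cached node/fact YAML for one client.
# Real path from the log: /var/lib/puppet/yaml/{node,facts} on sockpuppet/stafford.
set -e
yamldir=$(mktemp -d)                       # stands in for /var/lib/puppet/yaml
mkdir -p "$yamldir/node" "$yamldir/facts"
host=snapshot4                             # real files use the full FQDN
touch "$yamldir/node/$host.yaml" "$yamldir/facts/$host.yaml"
# The actual fix: remove the stale cached copies; a manual agent run
# on the client then recreates them cleanly.
rm -f "$yamldir/node/$host.yaml" "$yamldir/facts/$host.yaml"
echo "cleared cached yaml for $host; next agent run recreates it"
```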
[12:27:18] we could keep it as an easter egg :D
[12:27:19] no idea what the fix is for that though.
[12:27:28] +1 to keep!
[12:27:38] food in 5 minutes!
[12:27:56] lunching too
[12:28:11] syntax error on line 90, col 5: `'
[12:28:12] hmmm
[12:28:24] * hashar hides
[12:29:59] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Mon Dec 19 12:29:31 UTC 2011
[12:36:40] apergos: fixed it!
[12:36:52] quote: "Need to remove the .yaml files in /var/puppet/yaml/node and /var/puppet/yaml/facts (or where ever you may have these files, ... Then the client will run and the files will get rebuilt. "
[12:36:59] RECOVERY - Puppet freshness on snapshot4 is OK: puppet ran at Mon Dec 19 12:36:41 UTC 2011
[12:38:26] !log deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4
[12:38:34] Logged the message, Master
[12:39:55] ah you just deleted it? great
[12:40:05] yep, and after a manual run they exist again
[12:40:09] but dont cause the error anymore
[12:40:20] cause they are rebuilt
[12:40:23] yes
[12:40:28] sweet
[12:41:01] :) enjoy lunch ,bbiaw
[12:42:19] oh my
[12:42:55] nighties renamed : https://gerrit.wikimedia.org/r/#change,1615
[12:43:04] and I managed to amend a change :-)))
[12:43:05] \o/
[12:43:10] eating
[12:47:52] New patchset: Hashar; "basic class to install Android SDK prerequisites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[12:47:52] New patchset: Hashar; "directory to host WikipediaMobile nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[12:47:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1615
[12:51:41] New patchset: Hashar; "basic class to install Android SDK prerequisites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[12:51:53] New patchset: Hashar; "directory to host WikipediaMobile nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[12:52:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1612
[12:52:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1615
[12:52:13] bah
[12:52:33] mutante_: i have amended changes 1612 & 1615 :-)
[12:56:45] [12:13:26] check_job_queue still fails .. sigh
[12:56:48] mutante_: What's the error message?
[12:57:16] Also
[12:57:18] [12:29:59] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Mon Dec 19 12:29:31 UTC 2011
[12:57:20] Yay!
[13:00:39] New review: Dzahn; "i think you want the 2 links in their own • tag, so they are listed below each other. and how wil..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1615
[13:01:08] RoanKattouw: it still gets "return code 127"
[13:01:23] RoanKattouw: i think it just takes too long (longer than timeout configured in Nagios)
[13:01:50] Hmm
[13:02:02] No descriptive error message? Have you tried running the command manually on spence?
[13:02:20] Hmm, also
[13:02:21] yes, not today, but after our changes, and it worked (but took quite a while)
[13:02:36] It could probably be rewritten to be more efficient
[13:02:40] and i saw the timeout is less
[13:02:52] Right now, it runs checkJobs.php or whatever it's called separately for every wiki
[13:03:03] So that's 800+ PHP initializations and 800+ MySQL connection setups and teardowns
[13:03:40] If we write one script to check all wikis (already exists in some form for jobs-loop.sh I believe), we'd have 1 PHP initialization and 7 MySQL conns
[13:04:10] that sure sounds better
[13:04:34] or we could have a central job database :)
[13:04:34] time /usr/local/nagios/libexec/check_job_queue
[13:04:34] JOBQUEUE OK - plwiki has the 3rd biggest job queue, 0 entries
[13:04:44] real 0m21.203s
[13:04:46] 1 PHP instant / 1 conn / 1 query ?
[13:04:52] heh, that is already a LOT faster than it used to be
[13:04:53] WTf
[13:05:04] it was like 5 minutes or so before
[13:05:04] 21 seconds just to check plwiki which has 0 entries?
[13:05:06] That's insane
[13:05:17] This is a check for *one wiki* , right?
[13:05:36] for a in commonswiki enwiktionary dewikinews itwiki ptwiki dewiki plwiki frwiki eswiki jawiki svwiki ; do
[13:06:34] Ohg
[13:06:36] "arning: Return code of 127 for check of service 'check_job_queue' on host 'spence' was out of bounds. Make sure the plugin you're trying to run actually exists."
[13:06:42] So 11 wikis
[13:06:43] hmmm.. or the path is just wrong? :p
[13:06:50] and we changed the wrong file.. checking
[13:06:52] Anyawy
[13:07:05] Rewriting that check sounds like a fun project
[13:07:12] I'll take a stab at that later today
[13:07:16] cool
[13:08:26] ooh.. "duplicate definition"
[13:10:09] yeah, now that it's been completely puppetized, we need to get rid of the old manual config
[13:11:50] uuh.. there is also "makeService" in "conf.php"
[13:14:13] !log commented check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been pupptized
[13:14:21] Logged the message, Master
[13:18:53] !log truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence
[13:19:01] Logged the message, Master
[13:19:16] stupid apt pinning
[13:19:27] bac
[13:19:28] k
[13:19:36] look at this: :)
[13:19:38] /etc/init.d/nagios reload
[13:19:38] Purging decommissioned resources...Running configuration check...done.
[13:19:42] Reloading nagios configuration...done
[13:19:49] no "WARNING"s at all anymore:)
[13:19:58] yay
[13:20:02] well done :-)
[13:20:12] eh, nevermind
[13:20:22] none for spence, but a lot for other hosts
[13:21:06] they way services are define in puppet_checks.d creates duplicates
[13:21:08] the
[13:22:15] it's like stuff is being appended to those files, but not removed
[13:25:52] and also interesting is that first you get tons of "Warning:" lines when checking config, but then it still claims "Total Warnings: 0" afterwards
[13:27:04] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1612
[13:27:04] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1612
[13:29:06] hashar: you are now including "misc::contint::android::sdk" on gallium, i ran puppet..applied config .. now. "Misc::Contint::Android::Sdk/Package[ant1.8]/ensure: ensure changed 'purged' to 'present'"
[13:29:25] Finished catalog run
[13:29:35] great!
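Stepping back to the check_job_queue rewrite discussed above (one pass over all queue lengths instead of one PHP start-up per wiki): the output side of such a check maps directly onto Nagios plugin exit-code conventions. A sketch, with hard-coded sample queue lengths standing in for the real output of the proposed getJobQueueLengths.php:

```shell
#!/bin/sh
# One-pass job queue check: read "wiki length" pairs once, flag any queue
# over the threshold. The sample data below is made up for illustration.
THRESHOLD=10000
queue_lengths='commonswiki 1234
enwiktionary 0
plwiki 0
frwiki 2048'
worst_wiki=none
worst_len=0
while read -r wiki len; do
  [ -n "$wiki" ] || continue
  if [ "$len" -gt "$worst_len" ]; then
    worst_wiki=$wiki
    worst_len=$len
  fi
done <<EOF
$queue_lengths
EOF
if [ "$worst_len" -ge "$THRESHOLD" ]; then
  echo "JOBQUEUE CRITICAL - $worst_wiki has $worst_len jobs"
  status=2          # Nagios CRITICAL
else
  echo "JOBQUEUE OK - all job queues below $THRESHOLD"
  status=0          # Nagios OK
fi
```

The "return code 127" in the log is a different failure mode entirely: Nagios could not find the plugin at the configured path, which is why fixing the path (change 1626) was part of the cleanup.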
[13:29:56] I am trying to find out how to amend https://gerrit.wikimedia.org/r/#change,1572 :p
[13:30:08] ii ant1.8
[13:30:45] hashar: let me show you what i meant..
[13:32:34] mutante: https://gerrit.wikimedia.org/r/#change,1572 amended :)
[13:33:39] New patchset: Dzahn; "logrotate: add "managed by puppet" headers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1621
[13:33:47] arrrg..no
[13:34:00] that was supposed to be 1572
[13:34:19] Change abandoned: Dzahn; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1621
[13:34:54] ohh
[13:35:11] yeah pushing to production instead of test is fun :)
[13:36:13] mutante: have you uninstalled ant1.8 from gallium?
[13:37:00] hashar: no, it has been installed by your puppet change
[13:37:45] well it is no more installed :D
[13:38:01] 14:30 < mutante> ii ant1.8
[13:38:28] yeah and I confirm I had access to a 1.8 version of ant just after you pasted that
[13:38:41] but now: un ant1.8
[13:38:46] hehe
[13:38:58] do you want to remove "ant"?
[13:39:06] to keep "ant1.8" ?
[13:39:17] ant = 1.7.1
[13:39:42] ahh could they be in conflict?
[13:39:53] I mean installing "ant" would automatically remove ant1.8?
[13:40:05] I was expecting to have both in parralels
[13:40:10] yes, they are in conflict
[13:40:13] Conflicts: ant, ant-doc (<= 1.6.5-1), ant1.7, libant1.6-java
[13:40:25] need to ensure "ant" is gone..
[13:41:06] I should have looked at the Conflicts: field
[13:41:14] was expecting it to behave like the gcc packages
[13:42:14] did this break something that is already used ?
[13:42:37] want me to fix it manually while you fix puppet? or can it wait
[13:42:45] it can wait
[13:42:53] ant 1.7.1 is still around for jenkins to use
[13:42:58] ok, and 1572 ok for you like it is now?
[13:44:04] yeah 1572 is fine you can merge it
[13:44:15] I have cut the ### lines after 70 chars
[13:45:34] done. feel free to review my pending changes as well:)
[13:46:34] :p
[13:49:53] mutante: I think I will add a new class for ant1.8 :)
[13:50:17] I am just wondering if I should "require" it or just "include" it
[13:50:31] it will be needed by the android::sdk one as well as jenkins
[13:53:08] if you use require it will apply any changes to required objects BEFORE the rest
[13:53:25] then there is also the opposite, called .. "before" :p
[13:53:51] require: "used purely for guaranteeing that changes to required objects happen before the dependent object"
[13:53:55] before: "the opposite of require — it guarantees that the specified object is applied later than the specifying object"
[13:54:11] oh
[13:54:14] so I can just include
[13:54:22] sounds a bit confusing .. applying it LATER needs BEFORE :p
[13:54:57] class x { before y }
[13:55:07] So x says "apply me before y"
[13:55:29] ok, yeah
[13:57:56] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[13:58:09] oh.. and "Note that Puppet will autorequire everything that it can"
[13:58:26] here is the new ant class ^^^ (1622)
[13:58:28] "exec resources will autorequire their CWD (if it is specified) plus any fully qualified paths that appear in the command. "
[13:58:54] which does not lint of course
[13:59:54] "line 142 in file /var/lib/gerrit2/review_site/tmp/Ic7fd6a0d094c2ffe6b6cc99d5e590b7587f5798e/manifests/dns.pp"
[14:00:02] where is the relation to "dns.pp"?
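Returning to the ant/ant1.8 surprise above: the automatic removal is exactly what the package's Conflicts field declares, and checking it up front would have avoided the confusion. A sketch of extracting that field; the control stanza is hard-coded here from the line quoted in the log, since the real command (`apt-cache show ant1.8`) needs a live apt index:

```shell
#!/bin/sh
# Extract the Conflicts field from a Debian package control stanza.
# Sample stanza (the Conflicts line is the one quoted in the log):
stanza='Package: ant1.8
Conflicts: ant, ant-doc (<= 1.6.5-1), ant1.7, libant1.6-java'
conflicts=$(printf '%s\n' "$stanza" | sed -n 's/^Conflicts: //p')
echo "ant1.8 conflicts with: $conflicts"
```

So installing ant1.8 forces dpkg to remove ant (1.7.1), and vice versa; unlike the gcc packages, the two cannot be co-installed.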
[14:00:31] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:00:43] yeah it is confusing :(
[14:01:00] dns.pp is followed by some ascii sequence
[14:01:14] I think a newline character is stripped there
[14:01:35] now it is "Syntax error at end of file" in contint.pp
[14:01:42] doh
[14:02:56] New review: Dzahn; "one more } on line 64" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1622
[14:02:58] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:03:37] this time it looks fine
[14:03:50] still need to install puppet locally so I can do basic lint checks
[14:04:36] ideally the lint should be made in a pre commit hook to show us the nice colored message in our cli :)
[14:09:34] New review: Hashar; "Having all blogs under the same class looks nicer." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1607
[14:11:01] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1606
[14:15:18] New review: Hashar; "Both links are part of the same sentence so they deserve to be in the same • block. " [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1615
[14:17:01] New patchset: Dzahn; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:19:16] New patchset: Dzahn; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:20:53] New review: Dzahn; "How do you like this? Putting it in generic::packages , so it can be used in other places too." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1622
[14:22:01] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1615
[14:22:01] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1615
[14:23:47] New review: Dzahn; "yes, not using new classes anywhere initially, only once we are sure nothing is missing." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1607
[14:23:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1607
[14:28:44] moving to dentist and then downtown
[14:28:49] should be back in roughly 2 hours
[14:28:57] thanks mutante for your reviews / merges :-)
[14:29:34] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[14:29:39] see you
[14:29:48] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[14:29:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1623
[15:10:34] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[15:23:06] mutante: you there?
[15:27:05] hi peter
[15:28:16] hey
[15:28:26] so about the search index ticket
[15:28:40] I'll look into it some, as I'm trying to figure out this whole search thing
[15:28:47] cool!
[15:28:57] but the thing about "start proc as rainman"
[15:29:04] is that he has access to our search boxes
[15:29:09] and is the one who actually set it all up
[15:29:14] so if we start procs as root and not him
[15:29:22] then he can come and do stuff if he has time
[15:29:49] ok, his comments sounds like he does not have root though
[15:30:27] no he doesn't
[15:30:29] (have root)
[15:31:02] New review: Dzahn; "Yea, thanks. I was planning to do just this, too." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1623
[15:41:17] notpeter: if you are getting into search stuff anyways.. there is one other ticket .. RT #187 , we (maplebed and me) never found out what is actually the difference that makes some servers show less statistics
[15:41:21] ;)
[15:43:33] New patchset: Catrope; "Rewrite Nagios job queue check using new getJobQueueLengths.php script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1624
[15:46:40] robh: good morning
[15:48:41] !log restarting indexer on searchidx2
[15:48:49] Logged the message, and now dispaching a T1000 to your position to terminate you.
[15:52:35] !log dataset1 reinstalled and has had puppet run. Now to see if it can keep time
[15:52:38] apergos: ^
[15:52:43] Logged the message, RobH
[15:53:08] yay
[15:53:11] thank you very much
[15:54:02] I'm watching the syslog
[15:54:16] anybody want to place any bets?
[15:54:25] i bet it works longenough for us to think its good
[15:54:30] we close ticket, and it dies
[15:54:31] hahahaha
[15:54:59] so you think it will in fact keep time. for at least a while.
[15:55:14] yea
[15:55:32] we'll see
[15:55:42] so, when the ticket closes, the warranty is expired.
its three years old
[15:55:45] in an hour or so if it still looks ok I'm gonna crank up an rsync
[15:55:50] next time it really dies, it stays dead.
[15:55:51] shitballs realls?
[15:56:00] are these 1tb drives?
[15:56:05] dunno
[15:56:05] i dont recall...
[15:56:17] you would think i would, but its that shitty raid controller
[15:56:56] new eq portal is nice.
[15:57:06] i can actually see, and read all my open and resolved tickets
[15:57:11] amazin.
[15:57:41] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1624
[15:57:42] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1624
[15:59:30] um
[15:59:35] where's the raid?
[16:02:13] shouldn't there be... an lvm or something? on the raid array?
[16:02:22] mounted someplace?
[16:03:55] ahhh
[16:04:31] apergos: oh, its borked
[16:04:35] there's sda b c and d (apparently), claming to be 4T each or something
[16:04:37] =/
[16:04:42] ah bleep bleepers
[16:04:53] yea, the logical volumbe group is there
[16:05:02] but the logival volume doesnt appear to be
[16:05:16] nice
[16:06:15] ds1 is broken, news at 11
[16:07:06] apergos: ok run vgdislplay
[16:07:09] so you see what i see
[16:07:12] =[
[16:07:22] see how its bitching about uuids?
[16:07:33] yes I do
[16:07:51] * apergos thinks about stabbing roan
[16:08:08] apergos: so i am not sure how to fix this one, i think we may just want to wipe out all that stuff
[16:08:10] and redo the lvm
[16:08:15] sure
[16:08:20] may be easier than tryin gto fix the existing one
[16:08:24] worksforme
[16:08:44] * RoanKattouw thinks he's reached his quota for the next while re antagonizing people
[16:09:47] hahaha
[16:09:48] Cannot change VG vg0 while PVs are missing!
[16:09:54] i want to remove it stupid computer!~
[16:10:22] RobH: that might be dangerous...
[16:10:31] how dangerous can it be
[16:10:38] notpeter: we are trying to lose the data.
[16:10:40] at worst he just reinstalls the whole bloomin thing again
[16:10:49] except reinstall needs someone to hit F12.
[16:10:50] well, I mean, dangerous enough that there might be a warning and it won't let you
[16:10:55] oh yeah
[16:10:55] usually that means it's pretty dangerous
[16:10:58] and we have no one huh
[16:11:06] notpeter: yea, i want it to remove all data ;]
[16:11:12] but it doesnt like that
[16:11:15] its not mounted though.
[16:16:06] hmm.. something like "dd if=/dev/zero of=/dev/..." to kill meta data?
[16:16:43] meh, i just going to use parted to kill the lvm partitions entirely
[16:16:56] is parted ok for gpt partitions?
[16:17:44] that worked
[16:17:51] fdisk recommended parted for it
[16:17:52] heh
[16:17:55] yea, just fdisk isnt
[16:17:58] indeed
[16:18:11] ok great
[16:18:15] its working, and sees them all, and now they are gone
[16:18:21] so the space is now unclaimed, which is a good start.
[16:20:28] RoanKattouw: works on spence: JOBQUEUE OK - all job queues below 10,000
[16:20:40] yay
[16:25:22] sweet
[16:32:11] ok, lvm is kind of a pain in the ass
[16:32:22] apergos: want me to bore you with the details?
[16:33:30] yes I do
[16:33:34] if you don't mind that is
[16:33:48] the way we create stuff in /etc/nagios/puppet_checks.d/ is broken
[16:33:55] spence:/etc/nagios/puppet_checks.d# wc -l *.cfg | sort -rn| less
[16:34:04] f.e. "spence.cfg" has over 1000 lines :p
[16:34:23] um
[16:34:24] keeps appending stuff
[16:34:36] so i wiped out the partitions, but the pv data was still there
[16:34:40] so i did work i didnt need do
[16:34:51] what i really needed to do is 'pvremove /dev/partition -ff
[16:34:58] which forces them to wipe out regardless
[16:35:27] So I did that just now, and i have no physical volumes
[16:35:31] ok...
[16:35:49] with lvm, you create physical volumes first, then volume group
[16:35:52] then logical volume
[16:35:54] right
[16:35:59] so now I am doing pvcreate on each new partitoin
[16:36:40] apergos: still on ds1? go ahead and run pvdisplay
[16:36:48] the output no longer is filled iwth bad UUID info
[16:36:52] as they are brand new
[16:37:20] well vgdisplay (which I ran earlier) is quite short now :-D
[16:37:49] there are no volume groups, since we wiped out the physical volume under it
[16:37:51] pvdisplay shows me four volumes
[16:37:59] yep, each of our 4 raid6 arrays
[16:38:03] two per chassis
[16:38:26] (though part of array 1 and two are os and swap)
[16:38:38] so we're going to wind up with how much available at the end of this?
[16:39:11] never mind I'll wait til we get there
[16:39:18] then we'll know for sure
[16:39:20] apergos: now do vgdisplay
[16:39:33] i did vgcreate vg0 /dev/disk /dev/disk
[16:39:36] for all the partitions
[16:39:37] uh huh 14 + change
[16:40:00] ok
[16:41:36] so now i am making the actual logical volume
[16:41:39] and do lvdisplay
[16:41:55] uh huh
[16:41:56] did lvcreate -L 14.45T -n lv0 vg0
[16:42:10] so now i need to make the filesystem, and mount is all
[16:42:15] great
[16:43:38] off-topic: my cat has learned that rattling noices from a small plastic container mean cat treats. except they don't. they mean cashews. she's been very disappointed lately.
[16:43:39] apergos: ok, done, and mounted. now we just have to update fstab
[16:43:56] xfs, okey dokey
[16:45:16] !log dataset1 new data partition ready and setup to automount
[16:45:23] sweet
[16:45:24] Logged the message, RobH
[16:45:25] thanks
[16:45:27] apergos: Ok, dataset1 is a clean install all for you now =]
[16:45:34] glad to do it, i was rusty on that stuff
[16:45:37] was good to poke at it again
[16:45:47] guess I'll give it more than an hour
[16:45:56] if the drift is going to be noticeable enough ....
[16:46:00] lets move data =]
[16:46:11] I will right after that
[16:46:14] cool
[16:46:25] I cna't in fact just do an rsync of /data
[16:46:33] cause guess what this disk is much smaller than ds2
[16:46:44] sucks eh?
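The dataset1 rebuild walked through above follows the standard LVM layering: force-wipe the stale physical volumes, then recreate PV, VG, and LV, then a filesystem. Summarized below as a dry-run that only prints the commands, since the real ones are destructive, need root, and need the actual device paths (the partition names and mount point here are hypothetical; the sizes and command forms come from the log):

```shell
#!/bin/sh
# Dry-run of the LVM rebuild from the log. DISKS are placeholders for the
# four RAID6 array partitions on dataset1; nothing here touches real devices.
DISKS='/dev/sda4 /dev/sdb4 /dev/sdc1 /dev/sdd1'
plan=''
run() { plan="$plan $1"; echo "would run: $*"; }  # change body to "$@" to execute
for d in $DISKS; do
  run pvremove -ff "$d"   # force-wipe stale PV metadata (the bad UUIDs)
  run pvcreate "$d"       # fresh physical volume
done
run vgcreate vg0 $DISKS               # one volume group over all arrays
run lvcreate -L 14.45T -n lv0 vg0     # the logical volume from the log
run mkfs.xfs /dev/vg0/lv0             # dataset1 used xfs
run mount /dev/vg0/lv0 /data          # hypothetical mount point
```

The "Cannot change VG vg0 while PVs are missing!" error earlier in the log is why the sequence starts with `pvremove -ff`: deleting the partitions with parted was not enough, because the PV metadata survived.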
but I can start with the pagestats files
[16:46:48] yes it does
[16:46:54] so is this a useful server?
[16:47:00] let's wait and see
[16:47:04] if it turns out stable that is
[16:47:30] even with smaller size, it can be where active data is stored while historic data and public serving is from dataset2 with its larger disk?
[16:47:33] (just wondering aloud)
[16:47:38] if its stable of course.
[16:47:51] well my main concern is that I want a second copy of the data on ds2
[16:48:09] (plus it owul dbe nice if it dies to have another host so we can keep the runs going)
[16:48:20] so I can get part of that done. :-/
[16:48:47] did the ds3 order go in before the price went up?
[16:53:20] i put it in this morning
[16:53:26] and the quote said old prive
[16:53:31] price but dunno
[16:53:34] ok
[16:53:37] guess we'll see
[16:53:50] it will cost what it costs at this point
[16:55:03] yep, either way its ordered
[16:55:07] and your snapshot hosts are too
[16:55:24] so by end of next month you will have another cluster for this =]
[16:58:00] sweeeeet!
[17:01:46] New patchset: Dzahn; "make service retry and check intervals configurable - still missing a line" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1625
[17:02:13] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1625
[17:02:14] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1625
[17:06:58] New patchset: Dzahn; "fix path to check_job_queue" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1626
[17:07:23] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1626
[17:07:24] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1626
[17:24:03] top thing I hate about winter: Xmas
[17:24:14] second.
[17:24:42] Dr 1 hour became Mr 2 hours thanks to traffic jam [17:25:05] looks like suddenly anyone 50 kilometers around wanted to be in downtown :-) [17:26:22] top thing i hate: too damned cold. [17:30:19] boo hoo, your virginia is so cold =P [17:30:40] my place is near the atlantic coast and is not that cold [17:30:59] we usually have 3 weeks of freezing (below 0°C) weather [17:33:57] the majority of the winter here is below freezing [17:34:16] well, on three year average. [17:34:25] but that includes a crazy winter before i got here ;] [17:34:33] its in the 40s now, its been there all week [17:34:55] i cannot contemplate Celsius for outside temps. [17:35:00] horrible american education [17:35:11] for server temps its fine, but its just odd disconnect in my brain [17:36:42] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=2160 -- worse than before? [17:39:37] RobH: notpeter: top thing about Xmas: they are airing the good old James Bond movies :) [17:39:50] a pen that kills people! [17:39:52] yay! [17:40:03] * RobH has seen them all and would watch them again [17:40:13] hexmode: I restarted the indexer. I would not be surprised if it's just starting to rebuild the index and is kinda fucked up currently [17:40:35] notpeter: k, wasn't sure how long to wait [17:40:48] hexmode: I'm not sure either :) [17:40:59] \o/ confusion! [17:41:00] trying to figure search out... [17:41:03] hahahahah [17:41:07] RobH: i have most 5 (or at least 10) degree C increments memorized in F for realistic weather range. so, that helps but then i started getting used to it. my phone is set to C. i lived in the land of C for over a month straight [17:41:31] so, 20=68,25=77,30=86 [17:42:07] notpeter: oo! rainman updated the bz ticket [17:42:14] ah what's going on? [17:42:59] rainman-sr: you have RT access? [17:43:19] nop [17:44:02] I am not sure any volunteers have access to RT. 
[17:44:06] hexmode: ^
[17:44:11] y
[17:44:23] Because it has a lot of security issues listed as well as pricing information
[17:44:26] hexmode: what's the BZ number?
[17:44:51] I think the end plan is to have something that is more customizable, but its a large scale project.
[17:44:54] notpeter: rainman-sr just updated another one, 1s
[17:45:05] kk
[17:45:28] notpeter: https://bugzilla.wikimedia.org/32634 points to both, I think
[17:45:39] so basically
[17:45:53] someone just needs to reset the dates to before the gender stuff was introduced
[17:46:00] there are files on searchidx2
[17:46:09] like: /a/search/index/status/dewiki
[17:46:20] which just have a date of last incremental update
[17:46:24] this needs to be changed
[17:46:32] and no restarting is needed
[17:46:57] once the wiki is up to be updated the next time, this file will be read and the update started from that date
[17:47:54] gotcha
[17:47:56] RobH: does domas really not have RT?
[17:47:58] it's fairly easy, a bit laborious, but what we lack at this point is the exact date to which this needs to be reset
[17:48:06] that makes sense
[17:48:29] domas has root
[17:48:36] i know :)
[17:48:40] not giving root folks access is an empty gesture
[17:48:50] so they are not normal cases, they are root first
[17:48:52] =]
[17:49:29] so domas and jens have access
[17:49:54] rainman-sr: what will be different this time the index is rebuilt? I'm just curious
[17:49:55] RobH: anyway, bugzilla has groups which are usually used for security. does RT not have such a thing?
[17:50:06] will it be the new InitialiseSettings.php?
[17:50:21] (I'm trying to learn as much as I can about our search setup)
[17:50:24] jeremyb: you can set queues to security, but then we would have to accommodate that, and we also have closed registration
[17:50:26] RobH: or does RT have federation? i.e.
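The reset rainman-sr describes could be sketched as below. The path layout (`/a/search/index/status/<wiki>`) is from the log, but the one-line file format, the date format, and the date itself are all assumptions — check a real status file on searchidx2 first:

```shell
# Hedged sketch only -- not a tested runbook.
STATUS_DIR=$(mktemp -d)          # stand-in for /a/search/index/status on searchidx2
RESET_DATE='2011-11-01'          # hypothetical date from before the gender change landed
for wiki in dewiki frwiki; do    # placeholder wiki list; the real one is on the bug
  printf '%s\n' "$RESET_DATE" > "$STATUS_DIR/$wiki"
done
cat "$STATUS_DIR/dewiki"
```

No restart needed, per the log: the indexer reads the file on the next incremental update and starts from that date.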
i'm wondering what the labs parallel is for RT
[17:50:33] notpeter: There was a bug in the XML export thingy
[17:50:40] RoanKattouw: gotcha
[17:50:43] So the data we output was corrupted for a while
[17:50:44] notpeter, previously the xml dumps were broken
[17:50:49] RobH: i meant security at an individual ticket level
[17:50:49] kki
[17:51:03] thanks!
[17:51:04] RobH: or even comment level
[17:51:08] jeremyb: you cant in our install, perhaps with addons
[17:51:23] its basically there because our bz setup isnt good for super sensitive info
[17:51:30] so its not there to work for the public, its for internal stuff.
[17:51:39] hence there isnt really a push to make it adjustable for that
[17:52:02] its on our roadmap to improve it and make it more transparent where possible, but it is not a high priority compared to other projects
[17:52:13] yeah
[17:52:35] in particular on that map is making it so user creation is tied to something more easily updatable
[17:52:48] is it not ldap?
[17:52:51] it can be
[17:52:53] but it isnt.
[17:53:06] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[17:53:08] RoanKattouw rainman-sr: which indexes will this be required for? everything of the form [a-z][a-z]wiki? or all of the indexes?
[17:53:18] New patchset: Hashar; "misc::contint::ant to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[17:55:20] i was just thinking (and until a few mins ago i thought this might already be the case) that a level in the labs -> full root path for new vols might be RT.
and i may have assumed (15 mins ago) that ppl that predate labs like rainman already had it
[17:55:25] RobH: ^
[17:55:45] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623
[17:55:55] notpeter, only those that have gender stuff, as noted by siebrand on that bug
[17:55:57] New patchset: Hashar; "generic::packages::ant18 to ensure we have ant1.8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622
[17:56:09] * hashar spamming gerrit since november 2011
[17:56:21] jeremyb: ahh, you mean if you are admin of a labs project, you thought it would include RT?
[17:56:25] kk, thanks
[17:56:43] I think it would be awesome if a labs project automatically generated its own RT queue that all project members can see, and the admin can admin
[17:56:57] but thats a tie in from openstack to rt
[17:57:00] RobH: no. part of the labs concept is that people eventually get full prod root. and there's other "levels" before you get that far
[17:57:06] ohh, no.
[17:57:14] the point of labs is folks dont need full production root
[17:57:26] i know
[17:57:45] now, those folks may transition into ops roles, but I think the idea is we will honestly have fewer roots over time
[17:57:56] notpeter, i think siebrand's list is correct, with the exception of enwiki which shouldnt be there
[17:57:59] and root will not go out to volunteers
[17:58:06] it will go out to even fewer employees
[17:58:20] as we are trying to restructure how we run the cluster so root isnt needed for daily tasking
[17:58:20] /query mutante
[17:58:24] RobH: i thought giving prod root to vols was an explicit goal of labs...
[17:58:36] i may be wrong, ryan is the person to ask
[17:58:46] but i dont see why we would give out root to folks on cluster
[17:58:55] * jeremyb suddenly realizes ryan isn't here
[17:58:59] +1 robh
[17:59:00] not really, there's the idea that people can have root on labs, do some puppet changes, then push them to production
[17:59:00] so they won't need to have root at prod
[17:59:12] yes, LeslieCarr is saying what i said poorly
[17:59:25] jeremyb: you are meaning root as in change the cluster
[17:59:27] i think
[17:59:44] i mean root as in root user, and how we are moving away from requiring it for anything but holy shitballs moments
[18:00:14] and we are working to break up and better seed differing levels of deployment access
[18:00:27] that will tie into labs for trusted users to push changes related to their access area
[18:00:36] but they wont be root.
[18:00:46] RobH: i mean root as in "has at least as much access to 99% of boxen as all other humans"
[18:00:57] yea, we wont do that
[18:01:09] ideally they have access to just their limited subset of services and puppet manifests
[18:01:21] and someone in a code review role for ops, ryan, myself, mark, whoever
[18:01:30] those folks can read your change, and approve, which will push it live
[18:01:40] rainman-sr: ok!
[18:01:42] much like brion and tim do code review for mw before it goes out the door
[18:02:12] at some point, for something this large, it has to fall to someone who has signed agreements and has a responsibility to ensure the change doesn't take down the entire site
[18:02:36] so if it does, they are going to be there to fix whatever it is, until its fixed, no matter what
[18:02:57] as it is in ops, we shouldnt push our own changes
[18:03:16] when notpeter writes a puppet change for a production server, he has asked others to review it
[18:03:30] in fact, most of them are better at doing this than I am ;P
[18:03:33] RobH: i think i understand what you're saying but it's just not what i understood (having seen the presentation in person and talked about it beyond that). maybe i need to reread the presentation. regardless should probably chat with ryan to see what he thought/remembers when he's around
[18:03:50] indeed, i need to as well, so ping me if I am around
[18:04:13] and i dont want to be telling folks the wrong thing
[18:04:16] =]
[18:04:22] RobH: and re push your own changes... maybe it's best if you *do* push your own? just wait to push until someone else has reviewed
[18:04:46] jeremyb: it's not necessarily better, it's better that you are around
[18:04:47] indeed, i generically say push when I should say approve and commit
[18:05:00] our gerrit install actually doesnt push the changes automatically
[18:05:11] a root level ops person has to login to our puppet masters and update them
[18:05:17] its an intentional disconnect.
[18:05:50] we need to have all that workflow written down somewhere :-)
[18:05:58] i remember the growing pains with people doing the merge fine gerrit side but then messing up the pull (on sockpuppet i guess)
[18:05:59] Yeah so essentially the steps are these
[18:06:02] its on labs
[18:06:18] but it needs a clearer picture, indeed
[18:06:41] we are also toying with migrating wikitech into labs, and renaming the labswiki to be all inclusive docs for labs projects and the actual cluster (wikitech)
[18:06:43] 1) Write a change, 2) submit it in Gerrit, 3) someone else reviews, approves, and tells Gerrit to merge, 4) puppet master is updated from the repo
[18:06:47] maybe Jorm can do a nice infographic
[18:06:54] that way all labs users can update how their things work on cluster
[18:07:23] i say toying, when its really 'it will happen when we get to it, no one dislikes the idea'
[18:07:25] heh
[18:07:55] cuz we need to merge it then setup a static copy of it off cluster for outage procedures and docs
[18:09:04] initial version: static dump to git repo. commit and push (fast forward) somewhere that's then mirrored in a few places
[18:09:38] evil version: make mediawiki export to git or store in git or export something that can be git-fast-import'ed (is that the right name?)
[18:11:59] look who's here :)
[18:12:28] huh, i am hungry, and its not 3pm
[18:12:36] i am remembering to eat lunch almost on time! \o/
[18:12:40] Ryan_Lane: some labs questions. want the executive summary or a log or both?
[18:12:42] apergos: have you eaten anything yet today?
[18:12:55] summary
[18:12:56] RobH: is he in SF or greece?
[18:12:58] yes.
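RoanKattouw's four steps can be walked end to end with a local bare repository standing in for Gerrit. This is a toy: a real change is submitted with `git push origin HEAD:refs/for/production` and merged through the Gerrit UI, and step 4 is a root running the pull on the puppet master by hand; all names and paths below are illustrative:

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/gerrit.git"                   # stand-in for the hosted repo
git clone -q "$work/gerrit.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git config user.email dev@example.org
git config user.name Dev
echo 'class demo {}' > demo.pp                          # 1) write a change
git add demo.pp
git commit -qm 'add demo class'                         # 2) would go up for review
git branch -M master
git push -q origin master                               # 3) pretend the reviewer merged it
git clone -q "$work/gerrit.git" "$work/puppetmaster" 2>/dev/null  # 4) the pull step
test -f "$work/puppetmaster/demo.pp" && echo synced
```

The "growing pains" mentioned above were exactly step 4 going wrong: the merge succeeds in Gerrit but the pull onto the puppet master is forgotten or botched.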
I ate breakfast, a small one; then I ate a decent lunch
[18:13:08] and soon I can heat up some lentil soup from yesterday for dinner
[18:13:13] location doesnt matter, both ariel and i constantly forget meals
[18:13:17] usually i forget lunch
[18:13:21] thank you for asking
[18:13:29] ariel dinner, so late dinner there and late lunch for me usually =]
[18:13:33] I'm in Greece, it will be mealtime in about another 45 mins
[18:13:45] can someone possibly review & merge https://gerrit.wikimedia.org/r/1622 please? :-)
[18:14:16] Ryan_Lane: so, as i understood it labs was supposed to have various levels of access. you start somewhere and as you show what you know/can do and ppl trust you then you move on to more and more access
[18:14:21] it is a lot of patch sets, but in the end it is just a class to install the ant1.8 package :)
[18:15:13] Ryan_Lane: i thought the end of the path which some may graduate to was full prod root. maybe after 6 months or 3 years or whenever it feels right. and not for everyone. but i thought that was a goal at the end of the path
[18:15:18] Ryan_Lane: is that wrong?
[18:15:22] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1622
[18:15:24] that's correct
[18:15:55] Ryan_Lane: you mean as in, 'here is our root password'?
[18:15:56] is the assertion correct, or is it correct that it is wrong? :-)))
[18:16:10] RobH: or people will have a root key
[18:16:12] because the point is that we are doing things in the restructure to make it so folks do not need root to do their jobs
[18:16:21] thats entirely the opposite of what i thought labs was for
[18:16:38] RoanKattouw: thanks 8-)
[18:16:41] i thought it was to make it so they can write puppet and services that can be checked in and a final ops person can approve
[18:16:43] that's, of course, for people who have been working with us long enough for us to trust them
[18:17:00] i dont see handing root out to folks as a solution
[18:17:08] ever?
[18:17:11] we need more accountability
[18:17:26] I'm not saying we're giving out root like candy
[18:17:27] for final code review, someone who collects a paycheck signs off right?
[18:17:36] so they are personally responsible
[18:17:37] Ryan_Lane: cough syrup?
[18:17:40] we have a bunch of volunteer roots
[18:17:54] we have not given any volunteer roots out since before i started
[18:17:55] I don't see the problem with having more, if we trust what they are doing
[18:18:07] because we have no method of them gaining trust
[18:18:24] right, but the point of labs shouldnt be explained as do labs, eventually get root
[18:18:30] i think that is setting those folks up for failure
[18:18:50] its merely an area they can prove themselves, and write things that root users can push live after code review
[18:18:51] I think it should be described as "it's possible to eventually get root one day"
[18:18:57] not "you'll get root"
[18:19:17] because the latter is not true
[18:19:21] RobH: even if it's explicit that root will be rare and there's many steps along the way? it's not just committer and then root
[18:19:26] as long as its stressed that its a very remote thing and rare
[18:19:38] and ideally we need it less and less
[18:19:51] I feel that the way things are set up, it isn't actually necessary to *need* root much
[18:19:53] the future roadmap is for even us ops folks to not have to use root for our daily stuff
[18:20:08] since it harms accountability
[18:20:44] back in a bit
[18:20:48] breakfast
[18:20:50] hence i would phrase it like administrative powers but thats me being odd.
[18:21:03] LeslieCarr: you may want to look up when you have a chance
[18:21:19] RobH: heh
[18:21:33] cuz ideally none of us should have to drop to root for normal things, they should all be setup with security to specific groups, and individual users have access and assignment to those groups on various servers
[18:21:49] its just a huge pain in the ass.
[18:22:36] sure. but sometimes you e.g.
change puppet's sudo config and lock everyone out
[18:22:47] indeed
[18:22:57] but ideally we would catch that, since it would be tested in labs =]
[18:23:03] right
[18:23:10] but we dont have a full virtualized cluster yet
[18:23:13] so yea, it can happen
[18:23:17] google's desktop (for macs) puppet has multiple environments
[18:23:24] the user can choose which env to use
[18:23:39] and they don't have root on their own machine. but they mostly all have sudo
[18:24:04] yep, we also need to setup sudo for users on the ops level, which we do not really do now.
[18:24:07] right now we just use root.
[18:24:11] the most minimal manifest that all the environments have is: fix sudo after someone manually locks themself out
[18:24:41] so we also need to configure proper sudo policies across the cluster so basic operations tasks on the root level can be done via individual users with sudo
[18:24:43] they also do the weird masterless thing :)
[18:25:00] its funny cuz we all know we needed to do this
[18:25:08] but up until this year its been like 4 people.
[18:25:16] .5 :P
[18:25:25] well, maybe more than a year
[18:25:53] not counting the volunteer roots, who are in and out based on their real workloads, but they do contribute
[18:26:01] but i count who gets paged when there is an outage
[18:26:04] and thus who comes running.
[18:26:27] but yea, now we are large enough we need and are starting to put in those policies
[18:27:05] jeremyb: To me giving someone sudo with proper enforcement to audit actions is a lot easier to hand off
[18:27:18] rather than 'here is root, go nuts' ;]
[18:27:41] i digress, i need to go get lunch before i forget its time for lunch, back in about 15
[18:29:06] RobH: sure. i don't think go nuts was on the roadmap. there can even be multiple levels of sudo. but aiui, one of the explicit goals of labs was to make it possible to get more stuff done by vols at all levels
[18:29:11] enjoy
[18:29:18] * jeremyb needs to get breakfast!
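Group-scoped policies like the ones RobH describes usually end up as sudoers entries. A purely illustrative fragment — the group name and command list are invented, not from any Wikimedia config:

```
# /etc/sudoers.d/ops -- illustrative only; lets members of an "ops" group
# run a narrow set of privileged commands without the root password,
# while every invocation is logged for the audit trail RobH mentions
%ops ALL = (root) NOPASSWD: /usr/sbin/service *, /usr/bin/tcpdump
```

The point of the per-user, per-group shape is accountability: actions show up in the sudo log under the individual's name instead of an anonymous root session.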
[18:29:20] to get stuff done.
[18:29:27] yep, we hope the labs will help a lot with that
[18:29:39] it already has
[18:29:42] quite a bit
[18:29:52] and its still in closed beta.
[18:30:39] jeremyb: indeed, i think we are on the same page, i just have a strong reservation about the attainability of root on cluster
[18:30:43] anyway, if RT is the place to throw lots of stuff that doesn't need to be secret (or at least not topsecret) and many of those things can be done by labs vols then how do you let them know
[18:31:04] i think its possible, and labs gives us the space to have someone gain that trust in an environment where we can easily see what they are doing
[18:31:11] so it makes it easier for anyone to eventually get root
[18:31:23] but its still really really difficult to get root.
[18:31:26] =]
[18:31:44] who wants to open a candy shop?
[18:31:49] i dunno about the rt thing
[18:31:58] Ryan_Lane: so is the plan for labs to do ticketing in RT or something?
[18:31:59] * jeremyb runs away
[18:32:34] ok, i read backlog when back, im walking out the door to pickup foodz =]
[18:33:08] enjoy
[18:53:52] jeremyb: i was having foods, i'll read back
[18:56:58] RobH: I dunno what the plan is for labs regarding ticketing
[18:57:05] I think we want to make RT use ldap auth
[18:57:12] and make an open queue for labs
[18:57:19] that was what i recalled
[18:57:23] but i didnt wanna quote anyone
[18:57:28] * Ryan_Lane nods
[18:57:42] i recall you saying something about rt ldap goddamn mess?
[18:57:45] ;]
[18:58:01] heh
[18:58:50] well, poorly documented, for sure
[19:06:14] so, shell bugs... who is a good person to ask for help so I'm not going to Reedy all the time?
[19:06:36] is there a volunteer you guys could recommend?
[19:06:37] hah
[19:07:20] hexmode: i assume you mean stuff that's not labsable
[19:07:55] * jeremyb grumbles at noc.wm.o/conf still (afaik) not being in git/labs workflow
[19:08:07] jeremyb: these are typically changes to projects...
like an extension enabled or some such
[19:08:16] hexmode: right
[19:08:36] jeremyb: you're doing labs stuff?
[19:10:08] hexmode: maybe soon. i've read a handful of the puppet files so far
[19:10:24] hexmode: and actively playing with puppet outside WM
[19:10:40] jeremyb: are you a wmf employee?
[19:10:46] no
[19:11:01] k, didn't think so, but wanted to check
[19:11:40] jeremyb: you could start putting noc.w.o/conf in git!
[19:14:51] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[19:15:00] hexmode: noc.wm.o/conf is already in non-public version control. which contains non-public files. i think there needs to be a discussion before i just scrape the HTTP and check in to git. how would i get people to use the git instead of svn? what about the non-public files in use which are siblings of the public files i move to git? if i don't get ppl to use git then what? cron job scraping and committing? then you have no commit messages.
[19:16:22] jeremyb: so, my understanding of how we're going to use git (which could be completely wrong)
[19:16:58] hexmode: anyway, in the past, one answer to this question was that noc.wm.o/conf is a dev thing not ops so the labs crew wasn't going to force them to move
[19:17:39] jeremyb: is that we'll have some gate from gerrit -> git trunk
[19:18:02] hexmode: gerrit is the gate
[19:18:09] jeremyb: if we did that, then wouldn't it be ok to just stuff the public stuff into git?
[19:18:59] sure. that's not a hard question
[19:19:14] jeremyb: doesn't someone have to move it into the main branch in gerrit, though?
[19:19:26] hexmode: so, in the puppet config, i can tell you everyone who can push shell requests
[19:19:28] would that help?
[19:19:31] so gerrit isn't the gate, but it is where gating happens
[19:19:46] RobH: that would!
[19:20:00] hexmode: ok, its a public file, so its cool i just put it in here
[19:20:07] :)
[19:20:08] a mailing list request for ops :) https://bugzilla.wikimedia.org/show_bug.cgi?id=33255
[19:20:17] so we have our normal ops folks, who can do shell requests, but are not usually the ideal folks
[19:20:39] right, I'm trying to avoid ops for things that are simply shell
[19:20:53] never heard of that language
[19:21:00] and I think I've been relying on Reedy too much
[19:21:15] then in dev we have the following: andrew, arthur, aaron, nimish, ryan faulkner, ^demon, hashar_, reedy, patrick, robla, and neil
[19:21:34] arthur=awjr?
[19:21:38] so any of those folks are capable of pushing extensions and project requests
[19:21:40] yep
[19:21:52] that doesnt mean its their job though, you would have to check with them
[19:21:58] hexmode: I can push shell bugs yes
[19:22:04] i know for a fact you arent going to get patrick to push that stuff
[19:22:11] mobile is his job, etc..
[19:22:55] sure, I'm seeing a couple nicks that I wasn't really using before, though. Mostly just hashar ;)
[19:23:17] and *maybe* ^demon
[19:23:31] + aaron possibly
[19:23:56] right... just good to keep this in mind for later
[19:24:00] tyvm robh
[19:24:08] hexmode: some ppl are governed by exam schedules
[19:24:11] welcome
[19:24:14] neil is probably just for the upload wizards
[19:24:32] the rest.. I don't know about them nor their projects
[19:25:13] RobH: who is faulkner?
[19:25:23] ryan faulkner, dev
[19:25:29] i dont think he would be doing shell requests
[19:25:38] hexmode: he does fundraising analysis
[19:25:38] kk
[19:25:49] or did... maybe it changed
[19:26:12] hexmode: oh, and of course RoanKattouw but i didnt include him cuz he is root
[19:26:26] but his root is limited to info gathering
[19:26:43] but he can do shell requests, he has that access as well, just again not sure if its his thing
[19:26:45] same with tim, right?
[19:26:58] I don't routinely do shell reqs
[19:26:59] Tim does not have any access anymore
[19:26:59] probably not his thing
[19:27:01] i dont know if tim still does root
[19:27:05] But I often do shell-like things
[19:27:10] I'm pretty sure Tim still has root
[19:27:11] heh, brion doesnt have shell
[19:27:14] he loves it
[19:27:44] hah
[19:27:52] to be more precise, Tim is overbusy. So he just pretends he does not have any access :D
[19:27:53] RoanKattouw: i think so, but i have not worked an outage with tim in awhile
[19:28:11] tim has whatever he wants, and is too busy to help with normal shell requests
[19:28:13] =]
[19:28:20] He helped me with tcpdump a while back
[19:28:25] tcpdump requires root :)
[19:28:29] indeed
[19:28:43] so yea, it used to be outside of ops, the folks with root were brion, tim, domas, jens, river
[19:28:50] brion doesnt even have shell anymore
[19:28:57] tim, domas and jens have root still
[19:29:02] What about river?
[19:29:06] and i think river has it
[19:29:17] but may not be around when the next password change happens
[19:29:27] though if river came back and got involved would more than likely get it again
[19:29:35] And I'm on that list now, although as you say there are socially-enforced restrictions on my access
[19:29:41] indeed
[19:29:54] And I also don't have the password, just key access
[19:29:55] restrictions?
[19:30:06] such as you have root access but are not allowed to use it? :)
[19:30:10] Sort of :)
[19:30:17] roan doesnt do root actions, he has root so he can gather data on performance and execution of things
[19:30:28] I am allowed to look around, but I am not allowed to touch things without approval
[19:30:36] can't that be done with root access / using sudo?
[19:30:45] we dont have proper sudo policies in place
[19:30:50] its an open project =P
[19:31:12] can roan boot an unresponsive box? i.e.
pull power remotely
[19:31:15] No
[19:31:21] I don't have serial access either
[19:31:24] thats mgmt access
[19:31:27] Or maybe I do but I don't know how to use it?
[19:31:31] all serial and mgmt interfaces, you dont ;]
[19:31:37] totally different logins
[19:31:47] Anyway -- I occasionally do things like ownership fixes if another root approves them
[19:31:53] look at /root/mgmt-passwords.txt
[19:31:57] * binasher stabs robh repeatedly for leaving db10 without a replica
[19:32:06] s/10/9
[19:32:09] our high level access is at least split up between OS, networking, and mgmt
[19:32:10] binasher: ?
[19:32:18] what's db10 doing then?
[19:32:18] dude, i just know it ran out of space
[19:32:22] did our pulling binlogs break replication?
[19:32:42] I am a server monkey, I need a db-runs-out-of-space checklist.
[19:32:43] yeah, you can't delete the binlog that is actually being read from
[19:32:56] hehehe
[19:32:58] i was afraid we were going to do that
[19:33:03] but i also wasnt sure how to ensure I didnt
[19:33:11] and the services were down, so i just acted
[19:33:25] it was that, or call you on your day off ;]
[19:33:52] binasher: but yea, if we can document on a basic level the steps to run for normal instances of this kind of thing, that would be awesome =]
[19:33:57] i'm going to reload the pair and upgrade mysql.. the new build i packaged has a neat patch that actually shows what binlog position each slave is on when you "show processlist" on the master
[19:34:06] nice
[19:34:14] i also allocated you replacements
[19:34:17] on the one RT ticket
[19:34:20] for db9/10
[19:34:35] so you can just migrate to those in your rebuild if you want
[19:34:37] RobH: I thought you only deleted binlogs >24hrs old.
[19:34:44] maplebed: indeed.
[19:34:47] was there a replica that was >24hrs behind in reading the binlog?
[19:34:57] thats all i meant to do, perhaps i or someone else cut too close?
[19:35:12] may have been me, i didnt check any positions
[19:35:31] I deleted a few today and it was only a few.
[19:35:36] even when hosts are behind in replication, they're usually up to date in binlogs, unless slaving is stopped.
[19:35:36] are we in trouble?
[19:35:44] the replication broke is all
[19:35:47] grrr
[19:35:48] so its bad, but binasher is fixing
[19:35:49] binasher is fixing it. with him around, there can't be trouble.
[19:35:50] :)
[19:36:24] "there are no problems, there will be no problems"
[19:36:29] that one's for faw :)
[19:36:42] hahaha
[19:36:49] it would have had to be 24 hours + out of date I think, I remember looking at the timestamps
[19:36:50] Next year, there'll be no shoes :)
[19:36:54] (of the ones I deleted)
[19:37:09] faw++
[19:37:17] i'm also going to reload db9 at some point and free up a couple hundred gigs of actual disk space, when it's ok to have 30min downtime.. maybe i'll arbitrarily make up that time in the middle of the night one day
[19:37:38] binasher: traditionally thats done on saturday morning ;P
[19:37:41] hehe cool :)
[19:37:48] saturday is the slowest day for most services and traffic
[19:37:51] then go get mimosas? :)
[19:37:58] sunday tends to pick up due to timezones, the EU being in monday
[19:38:08] what happened is the most recent binlog was corrupted when db9 ran out of space, slaving on db10 would have had to have been moved to the next position after restart, then after time passed, that log was also deleted
[19:38:26] ah.
[19:38:29] ahhh, so we didnt delete the active log, we just didnt fix slaving at the time of the outage
[19:38:29] db9 ran out completely?
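For the checklist RobH asks for, the usual safeguard before deleting binlogs on a master is to check every slave's read position first. A hedged sketch of the by-hand console session (server and file names are placeholders, not taken from the incident):

```sql
-- On each slave (e.g. db10): which binlog file is it still reading?
SHOW SLAVE STATUS\G
-- look at Master_Log_File / Relay_Master_Log_File in the output

-- On the master (e.g. db9): purge only up to, NOT including, the oldest
-- file any slave still needs; 'db9-bin.000123' is a placeholder name
PURGE BINARY LOGS TO 'db9-bin.000123';

-- After a break like this one, replication is resumed on the slave with:
START SLAVE;
```

`PURGE BINARY LOGS` refuses to remove the file currently in use, but it cannot know about a slave that is stopped or pointed at a corrupted log, which is roughly what happened here.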
[19:38:37] oh yea, db9 was full, caused initial outage
[19:38:41] oh
[19:38:46] yeah that's a problem
[19:38:48] everything fell over and there was much sadness
[19:39:01] hexmode: told me bz was broke and i called him a liar ;]
[19:39:12] (not really, just funnier)
[19:39:14] RobH: "start slave;" @ db10 should have done the trick after fixing db9
[19:39:20] and I cried
[19:39:33] jeremyb: binasher said we would have had to point it at the noncorrupt binlog
[19:39:41] so would have had to tell it that somehow right?
[19:40:27] RobH: i've never had a binlog corrupted. but i always used to run with binlogs on their own filesystem so if the ibdata system filled then binlog still had some room
[19:40:30] food! rats
[19:40:39] ahh, these share with data
[19:40:46] and bring them to their knees
[19:42:07] New patchset: Asher; "install new mysql pkgs on db9/10" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1627
[19:42:17] RobH: idk how i prefer it. but i'm no expert either. certainly would be nice if mysql could e.g. push binlogs to swift. pull the current position # and current log from master but transparently pull from swift if you need older logs
[19:42:37] then you need swift as a point of failure in the mysql cluster
[19:42:47] how so?
[19:42:54] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1627
[19:42:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1627
[19:42:56] what if swift is offline, now replication doesnt work?
[19:43:09] the *current* logs (or 5 most recent) would be on the master and work same as now
[19:43:10] its adding in a service dependency
[19:43:18] but old logs would be on swift
[19:43:31] ahh, i see what you mean
[19:43:48] we dont run into these issues as much for wiki dbs
[19:43:53] !log powercycling maerlant
[19:43:55] because those are honestly split up and handled a lot more sanely
[19:43:56] and swift is shared nothing
[19:44:02] Logged the message, and now dispatching a T1000 to your position to terminate you.
[19:44:10] only since binasher has started have we looked at fixing the misc db issues
[19:44:18] !log dropping several db's from db9 which have already been migrated to the fundraisingdb cluster
[19:44:22] notpeter: waited that long for a shell? wow
[19:44:26] Logged the message, Master
[19:44:28] otherwise it was just there, and we threw more hardware at it as needed
[19:44:45] jeremyb: eventually it timed out. but I asked Ryan if it was doing anything. he said just reboot
[19:44:50] backing up databases + binlogs is a legit practice which we should embrace.. i was kind of hoping we could use the netapps for that once they're online
[19:45:45] binasher: mark doesn't want to use them for that
[19:46:12] i don't care. give me something comparable to use for that.
[19:46:13] Ryan_Lane: does maerlant do anything? came up a few times in #wikimedia-tech and no one had a clue
[19:46:34] no
[19:47:47] New patchset: Asher; "Revert "install new mysql pkgs on db9/10"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1628
[19:48:33] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1628
[19:48:33] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1628
[20:01:48] i am breaking things using db9 for a minute or two..
[20:02:54] <^demon|away> Everything uses db9 :p
[20:03:11] muhaha
[20:03:21] I mean, really just puppet, the blogs, bz, rt, etherpad, gallium, and a couple of other things
[20:03:24] You'll break wikipedia
[20:06:24] sometimes you have to break shit to fix shit
[20:06:35] omelettes. eggs. etc
[20:06:55] we need to move labs to db9/10 at some point :)
[20:07:24] * binasher is running out of people to stab
[20:07:48] as we said in college: you can't cook an omelette without breaking a couple of whiskey bottles.
[20:07:58] :-D
[20:08:04] holy shit i was sleeping under a rock this weekend - I didn't know kim jong-il died
[20:08:27] it seemed to hit facebook ~10 hours ago based on my friends
[20:08:34] lcarr: neither did anyone else until last night
[20:08:39] yeah
[20:08:48] who knows when he actually died...
[20:08:49] http://static.pokato.net/2010-09-26-22-14-381037156564.jpg
[20:08:51] good point
[20:09:08] oh man, talk of the nation just said something like that -
[20:09:14] Reedy: and there was much loling...
[20:09:17] "kim jong il became kim jong very ill"
[20:09:26] notpeter, I hear he's not so ronry now
[20:09:34] I wonder if they thought about propping him up in his chair and not telling anybody...
[20:11:09] apergos, what do you think they've been doing for the last three years since his stroke
[20:11:20] good point!
[20:12:37] weekend at bernie's: n korea edition
[20:13:03] so anyway, hoping for NOT a nuke war on the korean peninsula as a distraction
[20:13:21] his cabinet had a vested interest in doing so.... as they will all be killed when his son takes over
[20:13:23] LeslieCarr: so did we ever figure out how to fix passwords in racktables?
[20:13:27] cuz i just locked myself out
[20:13:39] it seems racktables didnt like one of the characters in keepassx random generation
[20:13:47] and i think it just trimmed it someplace in the string
[20:13:49] doh
[20:14:01] RECOVERY - Puppet freshness on maerlant is OK: puppet ran at Mon Dec 19 20:13:35 UTC 2011
[20:14:14] food! doing it now.
[20:14:27] i did something - i googled recovering racktables admin password or something like that
[20:14:30] Dude, it's past 10pm
[20:14:41] RECOVERY - SSH on maerlant is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[20:14:44] LeslieCarr: but you then set an admin pass right?
[20:15:00] oh, you put it in docs
[20:15:07] i am just not paying attention =P
[20:15:21] hehe
[20:15:22] ok
[20:25:49] lentil soup with tomatoes, carrots and onions, a dash of curry flavor. day 2: even better than day 1
[20:25:59] and forgot, also has beet greens!
[20:26:55] apergos: sounds awesome
[20:27:08] I'm loving it
[20:28:06] hrmm, where the hell do you set passwords for others
[20:31:39] bleh, i think i need to do it via mysql console, racktables fail
[20:32:11] are they not magically salted/hashed?
[20:37:43] RobH: hey - on rack b3, does the machine slotted in the bottom (RU1) go in port 0 or does the one on the top go in port 0?
[20:39:47] bottom u goes in lowest port on switch
[20:39:54] so U1 bottom of rack to port0
[20:40:53] awesome
[20:40:54] thanks
[20:41:16] so it loooks like dataset1 is keeping time
[20:41:26] (yes that extra o is deliberate. )
[20:42:53] ...
i cannot find anyplace to do this in docs [20:42:55] this sucks [20:43:10] old racktables was also horrible, but it let anyone set anyone else's password, heh [20:44:04] ahh, found it [20:44:08] this does too =P [20:46:36] hahaha [20:46:38] holy shit [20:46:39] http://wikitech.wikimedia.org/view/User:RobH/pmtpa_rackspace [20:46:42] apergos: lookit that [20:46:47] LeslieCarr: that was before rackspace, that link [20:47:01] :) [20:47:02] gaze upon what our pain was and despair [20:47:07] wikitables for racks =P [20:47:20] you needed an extension [20:47:26] and one was written... and then never finished [20:47:43] heh, no metered power, took sampled readings [20:47:47] horrible. [20:54:01] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours [20:55:05] ERE_ACCESS_DENIED?? [20:55:14] err, I mean ERR [20:55:23] https://wikisource.org/w/index.php?title=Wikisource%3AScriptorium&action=historysubmit&diff=323108&oldid=323079 [20:56:11] oh, feh [20:56:13] nm [20:56:39] maplebed: is es1002 down you ? [20:56:48] yes. [20:57:00] cool [21:04:48] i've got a question for peeps - nagios.wikimedia.org/nagios/cgi-bin/extinfo.cgi?type=2&host=cp1041&service=Varnish+HTTP+mobile-backend - that is showing as down, however a manual get of the proper ports shows as up -- anyone know what's wrong with the nagios check ? 
[21:07:34] hey binasher your change on the 12th requiring xtrabackup also requires mysql-client as a dependency, i'll update puppet [21:07:51] (if anyone couldn't tell, i'm going over nagios checks and trying to fix them ) [21:09:31] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:09:48] oh actually updating this changelist, ignore that one [21:11:43] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:11:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1629 [21:12:35] LeslieCarr: requiring mysql-client to be latest may not be good [21:12:42] and conflicts with mysql::package [21:13:31] ah okay, i'll switch it to present [21:13:55] LeslieCarr: the conflict is only on the few servers that are in between states between old and new, with some by hand pre-puppet stuff [21:14:18] so.. not doing anything might be best [21:16:41] New patchset: Lcarr; "Adding in mysql-client to requirements for xtrabackup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:17:20] i switched it to require => present so it won't conflict [21:17:38] no [21:17:45] it's only on db13 and db17, yes [21:17:46] oh ? 
[21:17:50] that will break too [21:18:19] the actual package that gets installed in the new build is mysql-client-5.1 which has a "provides: mysql-client" in the pkg manifest [21:18:49] that rule will fix puppet on those 3 dbs and break all others [21:19:33] the real solution here is just to yell at me to fix those 3 [21:21:01] ah [21:21:06] * LeslieCarr yells  [21:21:22] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1629 [21:22:46] aagh and one of the prod db's for fr / ru / ja wikipedias died over the weekend [21:25:08] RobH: are any of the other new db's in pmtpa from the last order imaged? [21:25:31] hrmm, don't think so, but not sure [21:25:52] binasher: unless there is a ticket to say install the os, nope [21:26:09] though we can get it done if needed [21:26:21] is there any chance you could help me out by getting a couple of them up? s6 is down to a single slave which is also the snapshot host so it's kind of an emergency :( [21:26:35] just want the basic OS with no puppet run yet right? [21:26:40] yup [21:27:04] yea I will go ahead and get them done for you, it's just a two-step process, attended raid setup, then unattended os install [21:27:14] binasher: so you have not installed any of the new ones right? [21:27:20] they are all not in service [21:27:38] db48-db58? [21:27:39] you imaged two which are now in use for otrs [21:27:45] ahh, that's right [21:27:48] 48/49 i think [21:28:00] yep [22:13:54] binasher: so we really do have all of those new dbs allocated for your use right? 
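The abandoned change 1629 discussed above can be sketched roughly as follows. Only the package names and the conflict come from the log; the class layout, ordering, and everything else here are assumptions for illustration:

```puppet
# Hypothetical sketch of abandoned change 1629 -- class structure assumed.
class xtrabackup {
    package { 'xtrabackup':
        ensure => present,
    }

    # Managing mysql-client here conflicts with mysql::package on hosts
    # that already declare that package, and "ensure => latest" is risky
    # besides: on new builds the installed package is actually
    # mysql-client-5.1, which satisfies "mysql-client" only through a
    # "Provides:" in its package manifest. As binasher notes, a rule like
    # this would fix puppet on the 3 stuck dbs and break all the others.
    package { 'mysql-client':
        ensure => present,  # was "latest" in the first patch set
    }
}
```

As the log shows, the change was abandoned and the apt state on db13 and db17 was fixed by hand instead.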
[22:14:12] so they should all be raid10 since they all will do database level stuff for cluster and such [22:14:34] if that's the case, i can do the os install on all of them and leave them sitting, but if they aren't used, it's kinda bad to do that [22:14:46] so i prefer to just setup the raid10 on the rest for you, but not do the OS [22:15:09] so when you do need them, it's just a netboot and confirming yes to the dialogs in the installer for the lvm stuff [22:15:25] as the dns and dhcp stuff is already done, and the raid will be ready [22:15:55] !log deployed two new cron jobs on hume via /etc/cron.d/mw-fundraising-stats, temporary, will puppetize once we see that the script is working properly [22:16:04] Logged the message, Master [22:20:48] !log db50/db51 online for asher to deploy into s6 [22:20:56] Logged the message, RobH [22:20:57] binasher: db50/51 are ready for you [22:24:59] can any op have a look at the very simple https://gerrit.wikimedia.org/r/#change,1622 please ? :-) [22:25:20] it is to force "ant1.8" package on gallium, the continuous integration server [22:26:25] hashar: i'll check it out [22:28:09] hashar: why are you specifying 1.18 instead of just latest ? [22:32:31] RobH: thanks! [22:34:45] hashar: you back ? 
[22:34:54] LeslieCarr: yes sorry [22:34:59] np [22:35:13] question - why specify 1.18 instead of just "latest" [22:35:30] ubuntu provides three ant packages [22:35:32] *1.8 [22:35:41] a meta one "ant" which actually installs ant1.17 [22:35:45] ant1.17 [22:35:48] and then ant1.18 [22:35:55] okay [22:36:03] installing the latest version of "ant" will bring you ant1.17 :D [22:36:19] android does require 1.18, so I have to explicitly require it [22:36:34] and of course both ant1.18 and ant1.17 conflict with each other :-( [22:37:46] err not seventeen / eighteen but one dot seven and one dot eight [22:37:53] ant is not MediaWiki :-))) [22:38:13] gotcha - could you maybe put a one line comment to the effect of "when specifying latest this will install ant 1.7 instead of ant 1.8, so specifically calling it out" ? [22:38:15] hehehe [22:38:34] :-D [22:39:09] hehe [22:39:13] if I manage to remember how I rebased this afternoon :D [22:39:40] it's like when kernel write kprintf in userland programs [22:41:38] New patchset: Asher; "new pmtpa dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1636 [22:42:53] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1636 [22:42:53] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1636 [22:50:27] LeslieCarr: still wondering how to merge my change :-)) [22:52:32] hehe oops [22:52:43] change 1607 closed :D [22:52:45] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1622 [22:52:46] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1622 [22:52:57] so looks like we need to have it merged and I will send a new change [22:53:07] another new change ? [22:53:26] !log running a hot xtrabackup of db47 to db50 [22:53:34] Logged the message, Master [22:54:00] ahh I forgot to "git rebase" ! 
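Per hashar's explanation, change 1622 has to pin the versioned package rather than the meta-package. A minimal sketch of what that resource plausibly looks like (its placement in the gallium/contint classes is an assumption):

```puppet
# On the Ubuntu release in question, "ant" is a meta-package that pulls
# in ant1.7, so "package { 'ant': ensure => latest }" would install ant
# 1.7 rather than 1.8 -- and since the ant1.7 and ant1.8 packages
# conflict with each other, the 1.8 package must be named explicitly.
package { 'ant1.8':
    ensure => installed,
}
```

This is exactly the one-line comment LeslieCarr asks for, which lands as change 1637 below.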
[22:54:27] basically i would ask binasher if he's already doing a merge :) [22:57:39] New patchset: Hashar; "comment about ant => latest not being latest ant!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1637 [22:57:49] LeslieCarr: here is the comment (hopefully) [22:57:51] New patchset: Hashar; "move misc::bugzilla::crons to misc/bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1623 [22:58:38] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1637 [22:58:38] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1637 [22:58:44] \o/ [22:58:46] cool :) [22:59:01] you are my evening "mutante" [22:59:04] :D [22:59:19] I have worked a lot with Daniel discovering git / puppet etc... [22:59:30] so he ends up merging most of my stuff [23:00:01] !log resolved apt issues on db13,17 [23:00:10] Logged the message, Master [23:00:53] hashar: what's the contint box again ? [23:01:02] gallium [23:05:51] now to see if it works… :) [23:06:16] looks good :) [23:06:26] puppet hasn't run yet … :) [23:06:44] so there is some kind of race condition installing 1.7 / 1.8 :) [23:06:58] the change should make it clear to puppet that 1.8 is desired [23:07:08] let's hope :) [23:07:15] RECOVERY - DPKG on db17 is OK: All packages OK [23:07:23] however the puppet server is currently hugely loaded up … grrrr [23:07:37] wasn't stafford supposed to give us a pony ? [23:09:01] <^demon> I was promised unicorns, ponies and rainbows. [23:09:13] <^demon> Somebody isn't fulfilling their SLA. [23:11:24] have you signed the agreement? :-) [23:13:14] wait [23:13:19] I got dibs on the unicorn buddy [23:13:25] you only get the ponies and rainbows [23:14:38] http://insaneboer.deviantart.com/art/Alphred-Rainbow-Pooping-Pony-143268599 [23:15:13] <^demon> How'd you find my wallpaper?!? [23:15:27] google. [23:15:29] they have everything. 
[23:15:32] * apergos goes to bed.  [23:15:37] <^demon> Night apergos. [23:15:48] have a good rest of the day folks [23:15:59] and wish the house to get everything done so they leave before wednesday [23:16:06] otherwise they'll be back doing SOPA, dang it. [23:16:10] hashar: interesting --- err: /Stage[main]/Misc::Contint::Test::Jenkins/File[/srv/org/mediawiki/integration/WikipediaMobile/nightly]/ensure: change from absent to directory failed: Cannot create /srv/org/mediawiki/integration/WikipediaMobile/nightly; parent directory /srv/org/mediawiki/integration/WikipediaMobile does not exist [23:16:11] tah! [23:16:22] grr [23:16:30] <^demon> LeslieCarr, hashar: You have to create every directory in a tree. [23:16:33] I wish puppet could create parent directories [23:16:36] <^demon> file{} isn't recursive. [23:16:41] that is so lame ! [23:16:49] it is super lame. [23:16:49] let me amend [23:16:55] it's annoyed me many times in the past [23:17:02] pretty much every time i puppetize a service. [23:17:24] hashar: also… ant installed 3 versions of itself [23:18:54] New patchset: Hashar; "Create WikipediaMobile/nightly parent directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:19:13] LeslieCarr: that is common among ants. They try to reproduce / spread quickly. 
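Since puppet's file type doesn't create missing parent directories, the fix in change 1640 has to declare the parent as its own resource. Roughly (paths are taken from the error message above; ownership, mode, and surrounding class are omitted/assumed):

```puppet
# file{} is not recursive for creation: each directory level that might
# be absent needs its own resource, ordered parent-before-child.
file { '/srv/org/mediawiki/integration/WikipediaMobile':
    ensure => directory,
}

file { '/srv/org/mediawiki/integration/WikipediaMobile/nightly':
    ensure  => directory,
    require => File['/srv/org/mediawiki/integration/WikipediaMobile'],
}
```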
[23:19:19] ah… that's it :) [23:19:28] I am pretty sure I killed their queen though [23:19:50] damn ;) [23:20:06] so 1640 might/should/dontknow create the parent directory [23:20:15] but then it depends on another change :-/ [23:21:26] New review: Lcarr; "LGTM" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1640 [23:21:29] looks good to me [23:21:34] let's try it out :) [23:21:48] but you need https://gerrit.wikimedia.org/r/#change,1623 [23:22:02] I still don't know how to change the parent of a change [23:22:19] that should be possible by branching / rebasing but I have not tried it yet :\ [23:23:11] New patchset: Hashar; "Create WikipediaMobile/nightly parent directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:23:29] here we go [23:23:33] a new clean patch set :-) [23:23:55] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1640 [23:23:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1640 [23:24:01] I did a new branch, cherry picked the change and push-for-review-production [23:25:18] * hashar daughter duty [23:37:56] hashar: the new files are working fine, however ant is not [23:40:14] <^demon> Wait, was there a reason we had to reinstall ant on gallium? [23:40:21] <^demon> We should've already had ant for jenkins [23:40:41] they need ant 1.18 [23:43:19] also ^demon - it looks like ant isn't directly installed for jenkins… maybe indirectly installed via one of its dependencies ? [23:43:33] <^demon> Perhaps. [23:44:00] <^demon> Or maybe jenkins uses a bundled ant. [23:45:33] ants sneak around in everything ! [23:46:00] LeslieCarr: sorry my daughter does not want to sleep [23:46:11] no problem hashar - she's obviously the boss :) [23:46:12] I guess I will keep it busy a bit on my knees :D [23:46:55] so ant is broken. Maybe there is another package requesting ant [23:51:37] yeah... 
[23:54:41] I have ant 1.8.0 on gallium right now :D [23:54:56] it's there, there's just also the 1.7 [23:54:57] eep [23:55:30] dpkg -l 'ant*' [23:55:35] give me uninstalled status [23:55:43] for ant and ant1.7