[00:18:29] !log updating OpenStackManager to r114754 on virt0
[00:18:31] Logged the message, Master
[00:45:13] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:01] New patchset: Bhartshorne; "teaching swiftcleaner to save the files it deletes for later inspection" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4382
[00:50:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4382
[00:50:49] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4382
[00:50:52] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4382
[00:55:35] New patchset: Bhartshorne; "enabling saving deleted files in the swift cleaner" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4383
[00:55:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4383
[00:55:52] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4383
[00:55:55] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4383
[01:07:39] New patchset: Bhartshorne; "grumble. overlapped option letters." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4384
[01:07:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4384
[01:07:55] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4384
[01:07:58] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4384
[01:21:43] !log updating OpenStackManager to r114757 on virt0
[01:21:47] Logged the message, Master
[05:26:04] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[05:40:46] RECOVERY - Squid on brewster is OK: TCP OK - 0.001 second response time on port 8080
[05:47:04] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[06:32:22] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 202 seconds
[06:33:43] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 247 seconds
[06:33:52] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 210 seconds
[06:34:01] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 214 seconds
[06:51:21] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[06:51:21] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[07:06:21] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[07:06:21] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[07:12:57] RECOVERY - Squid on brewster is OK: TCP OK - 0.004 second response time on port 8080
[08:24:21] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours
[08:24:48] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:28:51] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[08:39:21] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours
[08:45:48] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:47:45] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[09:11:09] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:13:06] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[09:19:33] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 196 seconds
[09:19:51] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 203 seconds
[09:51:21] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours
[09:55:24] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds
[09:55:33] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 0 seconds
[10:46:24] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[11:06:52] apergos: wanna review my "change master" mysql query, re: reslaving es-* ?
[11:07:14] oookkkkk
[11:07:37] (that's me answerign with some skepticism about review abilities, but yeah I'll look at it)
[11:08:34] so i followed the steps on External_storage#Making_a_new_slave_using_snapshots on es1004
[11:08:42] rsyncing from es1001
[11:08:45] and that finished now
[11:08:51] right
[11:09:12] so the next step is cat /a/slave_status_YYYY-MM-DD.txt
[11:09:20] uh huh
[11:09:34] es1004: /a/slave_status_2011-04-04.txt
[11:09:48] this file has been created by leslie on es1002 though
[11:10:01] and we have been rsyncing from the same snapshot
[11:10:12] yeah I thought that was the case
[11:10:22] since ther were two rsyncs going and only the one snap mounted
[11:10:33] mutante: jenkins still alive :-D
[11:10:34] now I of course have a separate snap mounted :-P anyways
[11:10:41] so i need to relace the values for master_host, master_user, password , log_file, log_pos
[11:10:51] uh huh
[11:11:23] the one i would execute is : cat /root/query.sql
[11:12:02] the password 'xxxx' i replaced with the one i found as "repl-password" on fenari
[11:12:39] note though, the master_host IP resolves to es3 , not es1001
[11:12:48] is that really right?
[11:14:11] so lemme do what I do (which is make sure I undrstand how these commands are supposed to work)
[11:14:18] then I'll chime in in a bit
[11:14:18] es1001 has that "intermediate master" status
[11:14:29] uh huh
[11:14:39] ok
[11:14:43] thanks
[11:14:46] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 8 seconds
[11:15:21] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[11:20:54] RECOVERY - MySQL replication status on es1004 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : s
[11:39:57] !log restarting lsearchd on search3... again...
[11:40:00] Logged the message, notpeter
[11:51:17] !lop es1004 - rsync was finished, deleted all binlogs from old host, mysqld_safe& , but did not "change master.." and "start slave" (see mail)
[11:51:23] !log es1004 - rsync was finished, deleted all binlogs from old host, mysqld_safe& , but did not "change master.." and "start slave" (see mail)
[11:51:25] Logged the message, Master
[12:01:19] New review: Dzahn; "looks good." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/4380
[12:10:38] New review: Dzahn; "should fix RT #2777 and BZ #35709. "SSL certificate problem, verify that the CA cert is OK." on git ..." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/4334
[12:11:44] we should make the bot take popular tyops
[12:11:47] typos!
[12:12:12] hehe, that always happens
[12:14:58] imho we should set SSLCACertificateFile as well on all SSL hosts, besides SSLCertificateFile SSLCertificateKeyFile
[12:35:27] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[12:37:18] checking spence
[12:58:45] New patchset: Hashar; "testwarm: set innodb buffer pool size to 256M" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4395
[12:59:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4395
[12:59:19] that one is for you mutante :-)
[13:00:48] RECOVERY - Puppet freshness on spence is OK: puppet ran at Fri Apr 6 13:00:34 UTC 2012
[13:02:59] New review: Hashar; "Some context:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/4395
[13:14:50] New patchset: Mark Bergsma; "Set SO_REUSEADDR in varnishhtcpd, to fix conflict with ganglia HTCP monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4396
[13:15:05] New patchset: Mark Bergsma; "Use Upstart for varnishhtcpd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4397
[13:15:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4396
[13:15:19] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4397
[13:16:26] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4396
[13:16:28] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4396
[13:17:35] New patchset: Mark Bergsma; "Use Upstart for varnishhtcpd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4397
[13:17:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4397
[13:18:00] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4397
[13:18:03] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4397
[13:20:14] New patchset: Mark Bergsma; "upstart_job will overwrite the init.d file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4398
[13:20:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4398
[13:20:38] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4398
[13:20:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4398
[13:22:53] New patchset: Mark Bergsma; "install => true doesn't work" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4399
[13:23:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4399
[13:23:17] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4399
[13:23:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4399
[13:26:46] New patchset: Hashar; "testswarm: log slow queries (bug 35028)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4400
[13:27:02] New patchset: Hashar; "testwarm: set innodb buffer pool size to 256M" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4395
[13:27:15] OH NO
[13:27:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4400
[13:27:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4395
[13:27:47] stupid git-review rebased my change :-/
[13:28:17] New review: Hashar; "second patchset is just a rebase done automatically by git-review :-/" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/4395
[13:30:03] <^demon> hashar: `git config --global alias.r 'review -R'`
[13:30:12] <^demon> I keep forgetting the -R, so I aliased it :)
[13:30:30] I use git-review
[13:30:36] from their master
[13:31:34] <^demon> ah
[13:31:37] which means I can use defaultrebase=0 in .gitreview!!!!
[13:32:03] <^demon> git-review doesn't explode on unknown items in .gitreview does it?
[13:32:22] <^demon> If not, I'd say go ahead and start preemptively adding it.
[13:33:51] here is the change set : https://review.openstack.org/#change,5784
[13:34:09] I don't think It will cause any trouble with previous versions of git-review
[13:36:13] <^demon> Just tried it out, doesn't seem to break anything.
[13:36:20] \O/
[13:36:54] <^demon> So between disabling this behavior and the coming "automatic rebase" in gerrit, we've almost fix the annoying rebases for simple cases.
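The two git-review workarounds discussed above (^demon's alias to skip the automatic rebase, and hashar's `defaultrebase = 0` knob in `.gitreview`) can be sketched in a few shell commands; the repository path below is a hypothetical stand-in, not an actual Wikimedia checkout:

```shell
# Sketch of the workarounds discussed above (demo-repo is hypothetical).
# 1) ^demon's alias, so `git r` runs `git review -R` (no automatic rebase):
#      git config --global alias.r 'review -R'
# 2) hashar's approach: preemptively add defaultrebase = 0 to .gitreview,
#    which newer git-review honors and older versions simply ignore.
mkdir -p demo-repo
printf '[gerrit]\nhost=gerrit.wikimedia.org\nproject=operations/puppet.git\n' > demo-repo/.gitreview
echo 'defaultrebase = 0' >> demo-repo/.gitreview
cat demo-repo/.gitreview
```

The unknown-key tolerance is what ^demon verifies a few lines up ("doesn't seem to break anything"), which is why adding the key early is safe.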
[13:37:11] for i in extensions/*/.gitreview; do echo defaultrebase = 0 >> $i; done;
[13:37:47] the last thing we will have to fix are the annoying merge conflicts in RELEASE-NOTES
[13:38:04] I think we should use the RELEASE-NOTES to explain stuff
[13:38:25] and have the list of bugs fixed automatically generated on release and added to something like a ChangeLog file
[13:40:50] <^demon> Yeah, we can bikeshed over that a bit more on the list and come up with something.
[13:41:06] <^demon> I think we all agree that release-notes as we do them now is going to become a rebase nightmare.
[13:44:08] I would not say it is a nightmare
[13:44:12] but that is surely annoying
[13:45:37] you could make it not nightmare by having each entry in it's own file and then a build step or jenkins job to aggregate them into one file
[13:46:02] that would be fun
[13:46:07] but I prefer enforcing firstline standards and generating automatically
[13:46:08] there is no way my changes that eliminate that many lines can be right mutante
[13:46:11] i must be doing it wrong
[13:46:13] =P
[13:46:19] <^demon> jeremyb: We've also discussed using something like "Release-Notes: " in the commit message footer and parse those out.
[13:46:37] ^demon: for something that has to be more than one line?
[13:46:44] i haven't watched the bikeshedding
[13:47:24] <^demon> Well the standard for git commit footers is to prefix your info with something like "Change-Id:" or "Signed-Off-By:"
[13:47:41] <^demon> We could do something similar for Release-Notes:
[13:47:56] right. but in what case could we not just use the first line of the msg?
[13:48:10] <^demon> Well not all commits deserve a release notes entry.
[13:48:20] <^demon> "Coding style fixes -- braces and indentation" probably doesn't.
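The one-entry-per-file idea floated above (each release-notes entry in its own file, aggregated by a build step so parallel changes never edit the same file and cannot conflict on merge) might look like this minimal sketch; the `release-notes.d/` layout and file names are invented for illustration:

```shell
# Hypothetical sketch of per-entry release notes: each change adds its own
# small file, and a build/Jenkins step concatenates them into RELEASE-NOTES.
# Two commits touching different files under release-notes.d/ can never
# produce a merge conflict in RELEASE-NOTES itself.
mkdir -p release-notes.d
echo '* (bug 35028) testswarm: log slow queries' > release-notes.d/bug35028.txt
echo '* Use Upstart for varnishhtcpd'            > release-notes.d/varnishhtcpd.txt
cat release-notes.d/*.txt > RELEASE-NOTES
cat RELEASE-NOTES
```

The footer-based alternative ^demon mentions (`Release-Notes:` parsed out of commit messages, like `Change-Id:`) trades this extra file per change for a parsing step at release time.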
[13:48:39] hah
[13:48:56] so, we could have a minor flag of some kind
[13:48:59] either in footer
[13:49:17] of course, git doesnt wanna take my review for productoin
[13:49:17] Tim argument is that the commit summary is intended to other developers where as the release notes are for end users
[13:49:18] or "[m] Coding style fixes -- braces and indentation"
[13:49:19] which make sense
[13:49:20] ;_;
[13:49:39] <^demon> I think training people to "include the Release-Notes: if you want them" is easier than saying "Mark changes as minor when they shouldn't clutter release notes."
[13:49:52] there is proposal for an alternate git merge algorithm that could potentially merge release notes :-D
[13:49:58] New patchset: RobH; "endless corrections of tabulation and spacing, updating dhcp related files for install server Change-Id: I0a41915af9982819d19766df1747daff6f9f82bd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374
[13:50:10] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4374
[13:50:11] the fact i corrected a git error just now on my own frightens me.
[13:50:14] RobH: still on that one?!
[13:50:24] i completely redid the file mgmt to use recursion
[13:50:33] nicely took out like 30 lines of shite
[13:50:50] <^demon> RobH: Would you mind approving a puppet change for me? It's just a capitalization fix.
[13:51:03] hrmm, now i have an entirely different git error
[13:51:42] ^demon is it already in gerrit?
[13:51:46] <^demon> Yep. https://gerrit.wikimedia.org/r/#change,4357
[13:52:14] ^demon: i think no matter what the solution is people will forget to flag for release notes (or forget to flag for exclusion). and we can't change commit msgs post merge. at least not without lots of headache. so idk what to do
[13:52:18] heh, that is quite the change!
[13:52:37] <^demon> Yeah. Took me longer to diagnose than to fix ;-)
[13:52:45] <^demon> By a factor of about 10:1.
[13:52:46] New review: RobH; "yes, capitalization matters" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4357
[13:52:49] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4357
[13:53:28] i ahve crazy erorrs, i imagine i need to rebase or something since my checkout was old?
[13:53:34] cuz lint is bitching about things that i didnt touch.
[13:53:44] https://gerrit.wikimedia.org/r/#change,4374
[13:54:17] (anyone can feel free to advise, as everyone prolly knows more about git/gerrit than me ;)
[13:54:37] <^demon> Files you didn't touch? Eww
[13:54:42] RobH: source => "puppet:///files/dhcpd".
[13:54:46] I think that's your problem
[13:54:48] New patchset: Mark Bergsma; "Actually drop privileges; remove old unused code" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4408
[13:54:56] shouldn't be .
[13:54:59] I think should be ;
[13:55:03] well shit.
[13:55:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4408
[13:55:24] check that in with a --amend
[13:55:25] notpeter: or ,
[13:55:27] and it should work
[13:55:34] heh, that patch set has 4 ammends
[13:55:38] yeah, although ; is proper
[13:55:43] on that i am clear how to do, heh, oh is it?
[13:55:48] i should do it properly.
[13:55:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4408
[13:55:53] <^demon> RobH: We've got a couple in mediawiki with 8 or so patchsets.
[13:55:53] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4408
[13:56:19] well, all my patchsets except one are due to typos and tabulation
[13:56:20] RobH: the gerrit errors are hella misleading. usually you only need to pay attention to the last line
[13:56:26] bad tabbing makes mark sad
[13:56:47] Syntax error at '.'; expected '}' at ./manifests/misc/install-server.pp:185 err: Try 'puppet help parser validate' for usage
[13:56:50] New patchset: RobH; "endless corrections of tabulation and spacing, updating dhcp related files for install server Change-Id: I0a41915af9982819d19766df1747daff6f9f82bd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374
[13:57:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374
[13:57:12] notpeter: good to know!
[13:57:18] and indeed, it is much happier now
[13:57:21] ja
[13:57:26] says it passed lint now
[13:57:29] mutante: what apache file did you want me to peek at?
[13:57:40] i am happy to do so
[13:57:44] ah, i didnt even scroll down:)
[13:57:48] cool
[13:58:08] !change 4334 | RobH
[13:58:08] RobH: https://gerrit.wikimedia.org/r/4334
[13:58:19] oh, this is apache in gerrit, i assumed you meant in plain old httpd heh
[13:58:42] it just changes a line in the apache config file though
[13:58:55] plain file in file/apache/sites
[13:59:16] so the pem file is copied down by an unrelated manifest that i dont need to hunt down and confirm exists?
[13:59:25] i assume yes, but figured it would ask ;]
[13:59:29] right
[13:59:33] coolness
[13:59:35] i checked it is on the server
[13:59:45] I wouldn't merge that rob
[14:00:02] you mean my change or the apache one for daniel?
[14:00:03] it looks full of file conflicts and potential for data loss
[14:00:07] your dhcp stuff
[14:00:17] ok, can you notate the change?
[14:00:35] no i'm busy
[14:00:53] but you're doing a recursive file type on a directory you're also putting files in?
[14:01:10] why purge true?
[14:01:14] yes, palcing the files via the recurse type
[14:01:22] the purge ditches any local cruft folks put in they shouldnt
[14:01:30] is that really necessary?
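The lint failure notpeter diagnoses above (`Syntax error at '.'; expected '}'`) comes from a stray `.` after an attribute value, where Puppet expects `,` (or `;`) as the separator. A small shell sketch of the fix; the resource body is an illustrative reconstruction modeled on the snippet quoted in channel, not RobH's actual manifest:

```shell
# Illustrative reconstruction of the parse error discussed above: a '.'
# terminating an attribute, where Puppet wants ',' (or ';').
cat > install-server-fixed.pp <<'EOF'
file { "/etc/dhcp3":
    # was:  source => "puppet:///files/dhcpd".
    # that trailing '.' triggered: Syntax error at '.'; expected '}'
    source  => "puppet:///files/dhcpd",
    recurse => true,
}
EOF
# Re-running `puppet parser validate install-server-fixed.pp` (as the error
# message itself suggests) would now pass; here we just show the fixed line.
grep 'source' install-server-fixed.pp
```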
[14:01:45] it seems like a good practice since folks may not be used to editing this stuff in puppet
[14:01:55] but if the purge is gone, would that alliviate the issue?
[14:02:23] you don't want to mention all the files individually for subscribe
[14:02:27] mutante: im not sure of our proper protocol here, would i do public and submit, or just publish?
[14:02:38] (for your patch)
[14:03:06] mark: just file "directory" and subscribe "directory" ?
[14:03:17] yes that should work
[14:03:30] so i will change that and dump the purge
[14:03:54] though i like the purge, its bold ;]
[14:03:55] i'll check again when you've done that
[14:04:05] RobH: if you want to change the "Verified" and "Code Review" values, "Publish and Submit". If you want _only_ text comments, ""Publish Comments"
[14:04:31] mutante: yea but thats my question, which is what i should do. i assume publish and submit, as it then saves you from self-submit
[14:04:36] which i imagine is the point of the code review?
[14:05:06] but then its goign to be pending a push on sockpuppet, which then you take care of right?
[14:05:23] this is Ops, right? https://rt.wikimedia.org/Ticket/Display.html?id=2783
[14:05:35] i wouldnt't mind a merge, yeah:). so if it looks ok, +2 and Publish and Submit. then one of use, i guess by protocol the one who clicked +2 , should also merge on sockpuppet
[14:05:53] New review: RobH; "simple enough change, looks good to me." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4334
[14:05:55] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4334
[14:05:59] ok, well, this is me passing that buck back to you
[14:06:05] RobH: i'm happy to do the sockpuppet part
[14:06:05] merge your own change on sockpuppet ;]
[14:06:09] heh
[14:06:11] hexmode: yes
[14:06:13] thx
[14:06:35] RobH: tyvm :)
[14:06:41] hexmode: those redirects live in an ops only area in /home/wikipedia/conf/httpd
[14:06:48] you can of course read them on NOC
[14:06:51] but not edit
[14:07:06] someday those will be in gerrit, but not until someone takes a ton of time to clean them up
[14:07:18] messing with the cluster apache files is like playing jenga.
[14:07:38] sounds scary
[14:07:59] RobH: so, I have a primative bz4 package... I need to spend some time this w/e working on puppetizing it
[14:08:07] <^demon> Now I have an image of RobH shouting "jenga" when he breaks the site.
[14:08:21] i have not had to yell that in years.
[14:08:27] i hope that doesn't jinx me.
[14:08:42] I'm really surprised that no one has packaged bz4 yet
[14:08:45] <^demon> Oh man, you said that on a Friday too.
[14:08:47] we could have a live, in person jenga game
[14:09:02] <^demon> jeremyb: I'd hate to be the person on the bottom of the tower.
[14:09:40] hexmode: so I want to help, but not sure how useful i will be, as when i am doing puppet stuff like now I am totally letting a shitton of other things slide
[14:09:40] ^demon: so... 06 13:52:13 < jeremyb> ^demon: i think no matter what the solution is people will forget to flag for release notes (or forget to flag for exclusion). and we can't change commit msgs post merge. at least not without lots of headache. so idk what to do
[14:09:46] like every single eqiad ticket =P
[14:10:03] ^demon: don't forget about all the people in the basement!
[14:10:24] <^demon> jeremyb: I don't know either. I'm willing to let everyone else bikeshed over it :)
[14:10:25] RobH: just giving you an update, not asking for help :) I'll just randomly bang on people here and in -labs ;)
[14:10:48] awesomesauce
[14:10:54] jeremyb: you are more than welcome to comment on wikitech-l :-)))
[14:10:59] jeremyb: that is a fun place
[14:11:09] hashar: thread too long...
[14:11:18] too many lists
[14:11:48] same issue there
[14:11:54] I have so many mail that I end up using two different clients
[14:12:21] wikitech-l, mediawiki-l, wikidata-l. plus at least a few private lists. at least no foundation-l
[14:13:18] hashar: seen notmuch?
[14:13:18] notmuch ?
[14:13:18] sorry I don't understand
[14:13:36] http://packages.debian.org/notmuch
[14:13:54] !log manganese (gerrit) now sends SSL CA certificate on https, (curl -vvv says verify ok), should resolve [[RT:2777]] and [[BZ:35709]]
[14:13:56] Logged the message, Master
[14:14:05] RobH: <- looks good live
[14:14:23] jeremyb: well I have them filtered nicely
[14:14:25] hexmode: ^
[14:14:34] * jeremyb waits for a slow identi.ca
[14:14:39] yay
[14:14:41] jeremyb: I am just missing a way to dynamically create sub folders based on rules like X-Gerrit-Change-ID
[14:15:18] mutante: tyvm
[14:15:40] oh, i know someone that's done dynamic creation by list name. should be similar i guess
[14:16:09] I might end up writing my own thunderbird extension to do that
[14:17:23] hashar: http://identi.ca/notice/92346721
[14:17:40] oh yeah mutt
[14:18:26] would have to learn that one day
[14:18:55] but then people will be staring at me
[14:20:06] paravoid won't stare ;)
[14:22:00] * RobH stares at hashar
[14:22:22] * hashar hands RobH a few gerrit changes to review :D
[14:22:31] cant, busy staring
[14:22:45] New patchset: RobH; "endless corrections of tabulation and spacing, updating dhcp related files for install server Change-Id: I0a41915af9982819d19766df1747daff6f9f82bd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374
[14:22:49] who mentioned 8 patch sets on a change?
[14:22:52] im closing in on it.
[14:22:54] <^demon> Me :)
[14:23:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374
[14:23:31] <^demon> RobH: https://gerrit.wikimedia.org/r/#change,3363
[14:24:06] i am sure there will be a mistake in my work
[14:24:09] and i shall also hit 8
[14:24:35] http://upload.wikimedia.org/wikipedia/commons/6/64/Time_100_Jimmy_Wales_stares_and_grins.jpg \o/
[14:24:38] coffee break
[14:24:41] brback
[14:25:08] <^demon> Upgrading to 2.3 is going to be cool. It makes the "Drafts" easier to use.
[14:25:17] <^demon> That'll be a nice user-facing feature.
[14:26:10] mark: whenever you have a moment, https://gerrit.wikimedia.org/r/#change,4374
[14:26:57] hashar: http://www.flickr.com/photos/williambrawley/3277223827/
[14:27:04] the past three days have had more puppet style guidelines pushed into my eyeballs than the past three months.
[14:27:18] will check in a bit
[14:27:33] ok, im gonna run down the street, insulin pickup time at the Rx.
[14:27:36] back shortly.
[14:27:42] <^demon> RobH: gerrit has made the trailing whitespace in mediawiki painfully obvious.
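mark's advice earlier in the hour (subscribe to the managed directory rather than listing every file, and drop `purge => true` to avoid deleting unmanaged local files) might look like the following sketch; the class body, service name, and paths are invented stand-ins, not the actual contents of change 4374:

```shell
# Hypothetical sketch of the recursive-directory pattern discussed above.
# One file resource manages the whole directory; one subscribe on the
# directory (not one per file) restarts the service on any change.
cat > dhcp-sketch.pp <<'EOF'
file { "/etc/dhcp3":
    source  => "puppet:///files/dhcpd",
    recurse => true,
    # purge => true would also delete unmanaged local files in /etc/dhcp3;
    # mark flagged that as potential data loss, so it is left out here.
}

service { "dhcp3-server":
    ensure    => running,
    subscribe => File["/etc/dhcp3"],
}
EOF
grep -c 'subscribe' dhcp-sketch.pp
```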
[14:28:02] it does that to everything, i know understand why mark is hard on excess tabs and whitespace
[14:28:22] i even had to PM ben/maplebed to change my view on vim whitespace markers.
[14:28:24] heh
[14:28:43] now know why even =P
[14:30:40] why is every nrpe check_procs a separate check_command?
[14:31:55] because we cant pass arguments why the nrpe, it is disabled for security reasons, so it can just take hardcoded stuff
[14:32:03] via
[14:32:17] to NRPE
[14:32:21] but you can to checkcommands, right?
[14:32:26] like all other checkcommands?
[14:34:11] eh, yeah, there need to be matching checkcommands it the nrpe.local.cfg
[14:34:19] sure
[14:34:31] but why does there ALSO need to be a matching checkcommand in the nagios config?
[14:34:36] that seems pointless
[14:34:39] unless I'm missing something
[14:36:56] New patchset: Mark Bergsma; "Setup process monitoring for varnishhtcpd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4412
[14:36:57] mutante: ^^ why wouldn't this work?
[14:37:08] assuming I make a generic nrpe_check_procs checkcommand, once
[14:37:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4412
[14:37:12] mark -- the matching checkcommand is in the server side nagios config?
[14:37:22] not yet but it will be
[14:37:39] and the nrpe.local.cfg is client side?
[14:37:42] yes
[14:37:54] note that I also used /etc/nagios/nrpe.d instead of nrpe_local
[14:37:58] sounds like the latter is a janky ACL?
[14:38:06] no need to have every possible NRPE command on every server
[14:41:38] New patchset: Mark Bergsma; "Add one generic NRPE check_procs command to rule^replace them all..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4413
[14:41:41] there
[14:41:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4413
[14:41:58] speak now, or forever be more productive.
[14:42:30] sweet indifference :-P
[14:42:43] i like simpler, sounds like that's what you've done, so I will upvote.
[14:42:57] and I would like to make it even nicer
[14:43:00] is there any reason to not setup a seperate root crontab list for labs?
[14:43:09] by having a nice definition for installing a separate NRPE file for each NRPE service we want to monitor
[14:43:24] notpeter: raising that b/c of the cronspam topic of the AM?
[14:43:43] yeah
[14:43:58] so you can say nrpe::check { "check_varnishhtcpd": command => "/usr/lib/nagios/plugins/check_procs -c 1:1 -u varnishhtcpd -a 'varnishhtcpd worker'" }
[14:44:04] for what I've now done with a manual file in the above
[14:44:08] or just have the mail go to people who are on the project
[14:44:25] notpeter: that one was just insane, and I finally got annoyed enough to fix it
[14:44:43] thing is, it wasn't project-specific--it was ldap stuff on the nfs server
[14:44:58] hurray....
[14:45:17] wel, I gues I meant "only have it go to people who have access to that instance"
[14:45:25] that way there's some ownership/accountability
[14:45:36] i agree re. projects
[14:45:52] mark: that looks nice! i was looking for a literal problem in the varnishhtcpd check. i can use that to replace others
[14:45:53] but in this case I think it was labs infrastructure
[14:46:35] i'm making it even nicer now
[14:46:39] gimme a few mins
[14:47:12] Jeff_Green: yeah, but the by-project thing might reduce the "somebody else's problem" effect
[14:47:20] even on infrastructure
[14:47:47] yeah
[14:47:54] i'm not by any means disagreeing with you
[14:48:02] i'm just not sure in this case whose project it would be?
[14:48:03] WHY ARE YOU DISAGREEING WITH ME?!?!?!
[14:48:17] anyone who has access?
[14:48:19] SEND ME MORE COFFEE DAMMIT.
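The per-service NRPE file mark describes (one fragment under `/etc/nagios/nrpe.d/`, with the full command hardcoded because NRPE argument passing is disabled for security) can be sketched as below; the command string is the one quoted in mark's `nrpe::check` example, and writing to a local `nrpe.d/` directory here is just for demonstration:

```shell
# Sketch of the client-side fragment mark's nrpe::check define would install:
# one file per monitored service, full command hardcoded (no remote args).
# On a real host this would live in /etc/nagios/nrpe.d/; we use ./nrpe.d here.
mkdir -p nrpe.d
cat > nrpe.d/check_varnishhtcpd.cfg <<'EOF'
command[check_varnishhtcpd]=/usr/lib/nagios/plugins/check_procs -c 1:1 -u varnishhtcpd -a 'varnishhtcpd worker'
EOF
cat nrpe.d/check_varnishhtcpd.cfg
```

Server-side, a single generic `nrpe_check_procs` checkcommand (change 4413) invokes these by name, which is what removes the need for one Nagios checkcommand per check.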
[14:48:23] MOAR
[14:48:38] * hashar sends coffee to everyone
[14:48:40] I'm at a sweet hipster coffee shop, actually
[14:48:48] I rode my fixxie here :)
[14:48:56] another approach would be to revert insane cronjobs
[14:49:09] yeah
[14:49:26] or feed them to a script that does a git history and pummels the last commiter
[14:49:29] I mean, the real problem is that it's a tragedy of the commons
[14:49:40] a tragicomedy, if you will
[14:49:56] tragicommondy
[14:52:24] any op willing to review / merge my pending changes in puppet ? :D
[14:52:33] most of them are trivial ones. List at https://gerrit.wikimedia.org/r/#q,owner:hashar+project:operations/puppet+status:open+branch:production,n,z
[14:56:28] Jeff_Green: http://upload.wikimedia.org/wikipedia/commons/6/60/Turkishcoffee....jpg
[14:56:34] New patchset: Demon; "Squashing 11 local commit for svn2git rules and such." [operations/software] (master) - https://gerrit.wikimedia.org/r/4415
[14:56:54] apergos: ooh, that looks tasty
[14:57:17] apergos: I konw what I'm having with lunch!
[14:57:44] New review: Dzahn; "well, like it says. minimmal git cli config, error description makes sense" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4295
[14:57:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4295
[15:00:23] frappes are big around here but meh
[15:00:34] New patchset: Mark Bergsma; "Two new definitions for dealing with NRPE checks in a nice way, monitor varnishhtcpd with it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4416
[15:00:43] mutante: check change 4416, how do you like that?
[15:00:44] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4416
[15:00:57] after I fix syntax errors ;)
[15:01:21] heh,ok
[15:01:51] I also have two changes to testswarm mysql configuration (on gallium) https://gerrit.wikimedia.org/r/4395 https://gerrit.wikimedia.org/r/4400
[15:02:01] diederik: /j #wikimedia-tech ?
[15:02:07] one just describe in puppet what is already in production, the other allow slow query logs :-)
[15:06:26] why are there two gerrit bots...
[15:06:35] New patchset: Mark Bergsma; "Two new definitions for dealing with NRPE checks in a nice way, monitor varnishhtcpd with it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4416
[15:06:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4416
[15:07:08] mutante: check now
[15:08:40] mark: i was already looking. it looks really cool. one comment on this " Command run by NRPE, e.g. "/usr/lib/nagios/plugins/check_procs -c 1:1 -C varnishtcpd""
[15:08:54] sometimes we wanted to use -c and sometimes -a with check_procs
[15:09:02] I know
[15:09:05] so do use that
[15:09:10] look how I did it in varnish.pp
[15:09:12] it's not like this example
[15:11:08] Jeff_Green: to battle the cron spam from labs, we're gonna setup a special labs mail relay
[15:11:15] which sends it to the instance owners instead of to us :P
[15:12:27] i see :) yes. i shall use that to replace other checks
[15:12:30] notpeter: have a look at https://gerrit.wikimedia.org/r/#change,4416,patchset=2 as well if you want
[15:12:45] mark: On adding the tftp pxelinux.cfg files. Would you suggest appending them to the misc::install-server::tftp-server class, or as a subclass of that since they are technically distro specific?
[15:12:54] i assume the latter, but i wanted to get your take.
[15:12:55] a subclass [15:13:02] cool [15:13:28] will do misc::install-server::tftp-server::distroname [15:14:04] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4416 [15:14:08] New review: Dzahn; "looks cool, elegant way to reduce the number of check command definitions and i also like how we don..." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/4416 [15:14:09] distroname? [15:14:17] lucid. [15:14:19] just do 'ubuntu-pxeboot' or something like that [15:14:24] with all distros in one class [15:14:32] or no [15:14:34] the boot.txt is populated automatically? [15:14:37] just make one big tftpboot directory [15:14:40] and use one recursive dir [15:14:49] much easier [15:14:57] mark: are you going to have a scoping issue with [15:14:57] https://gerrit.wikimedia.org/r/#change,4416,patchset=2 [15:14:59] er [15:15:08] monitor_service{ $title: [15:15:21] yeah I might [15:15:26] in that case I'll have to make that ::monitor_service [15:15:29] yeah [15:15:37] but I mean, do you like the general idea of this? [15:15:48] I never understood why you all were adding nrpe checkcommands all the time [15:15:52] the tftp directory has each distro listed then subdirectories for the pxelinux.cfg [15:16:11] not sure how to best accomplish what you are saying since those individual distro directories have files that differ [15:16:14] RobH: just replicate that exactly under files/tftpboot/ [15:16:40] the /tftpboot has edgy-installer, lucid-installer, etc... [15:16:43] mark: yep! [15:16:48] RobH: yes [15:16:49] so? [15:17:04] remove all the distros we no longer use, then put what remains exactly in git [15:17:08] so then puppet will have to replicate each of those directories in the file structure? 
[15:17:10] and add one file resource to put that there [15:17:11] mark: is a very good bit of cleanup [15:17:13] yes [15:17:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4412 [15:17:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4412 [15:17:53] ok, on removing old distros, i imagine i should actually remove all the stuff, not just the tftp directories yes? [15:18:04] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4413 [15:18:07] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4416 [15:18:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4413 [15:18:13] what do you mean by all the stuff? [15:18:38] i mean if we no longer use karmic anywhere, and wont be installing it, should I take the time to remove all its data from elsewhere on brewster? [15:18:51] what data? [15:18:54] like in /srv/wikimedia/conf/distributions [15:18:57] no [15:18:59] don't touch that [15:19:17] so just remove the install data, dont bother with apt data located elsewhere [15:19:18] that's the apt repositories, they are unrelated [15:19:23] indeed [15:19:25] we might need that some day [15:19:31] ok [15:19:40] we only install hardy and lucid presently right? [15:19:43] yes [15:19:50] good enough, thanks [15:20:01] mark: eh? it says i merged it, but i didnt [15:20:11] I did [15:20:33] ok. it had my name next to it in Gerrit all of a sudden [15:21:16] New patchset: Mark Bergsma; "I'm proud of you Peter." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4421 [15:21:31] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4421 [15:21:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4421 [15:21:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4421 [15:22:31] mark: re. labs cronspam relay, yay! [15:23:45] New patchset: Mark Bergsma; "Need to qualify one more reference" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4423 [15:24:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4423 [15:24:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4423 [15:24:13] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4423 [15:24:50] :) [15:26:24] New patchset: Mark Bergsma; "Fix erroneous }" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4425 [15:26:36] fwiw gmail is considerably snappier now that i've purged ~100K messages [15:26:39] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4425 [15:26:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4425 [15:26:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4425 [15:28:39] mark: on brewster ubuntu-installer is linked to hardy-installer, shouldnt that be lucid to force lucid to be default install at this time? [15:28:56] im puppetizing that link, so i ask. [15:30:48] New patchset: Mark Bergsma; "Add NRPE dependencies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4426 [15:30:58] nevermind ubuntu-installer [15:31:03] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4426 [15:31:03] that's done explicitly these days [15:31:17] but that is what sets the default install right? [15:31:18] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4426 [15:31:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4426 [15:31:25] I don't believe so [15:31:33] oh, then wtf does it do =/ [15:31:45] dhcpd.conf sets that [15:32:18] ahh, indeed it does [15:32:21] i can see it now. [15:32:31] So that still begs the question, what exactly is that soft link for? [15:32:41] nothing anymore [15:32:43] old cruft [15:32:50] ahh, so i can yank it out right? [15:32:53] yes [15:32:57] cool, thank you [15:39:38] mark: i have a couple .php files, imported from an existing small webapp, mw statistics stuff, i want them in gerrit to let volunteers enhance them, and put them on a labs instance. there would also be labs users with access to the instance it is running on. since there arent too many files, i would just like to put them in puppet or manage dir recursive. ok to merge these in operations/puppet, when it stays in test branch? 
if not, should i [15:40:11] no, that's really bad practice [15:40:32] you should probably make a separate git repository for it [15:40:38] or even better, a deb package [15:40:44] a repo which can also make a deb package [15:41:07] yea, option b) was "tell puppet to clone from another repo" and option c) .deb package [15:41:10] we should not bloat the puppet repository with each and every app we're too lazy to make packages for [15:41:58] so how would i go for creating a new repo that is a "project" in gerrit's terms [15:42:26] it's not really operations, so I wouldn't put it under operations/ [15:42:39] although [15:42:47] * Jeff_Green gets interested [15:42:49] just make operations/debs/packagename [15:43:05] you can import your php files in there, and also add the debian package dir in the same repo [15:43:11] PROBLEM - RAID on db40 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:43:13] that's easiest now I guess [15:43:20] you can do that by ssh'ing into gerrit [15:43:23] the gerrit port [15:43:38] ssh dzahn@gerrit.wikimedia.org:29whatever gerrit --help [15:43:48] there's a command to make a new repo [15:44:09] cool, thanks! [15:44:11] hrmm, so the changes i am making now are dependent on the changes I already made. when i got to commit, i just keep working in my local non production branch and do the rest normally right? [15:44:26] push for review, etc... they just cannot merge until my other changes merge. [15:44:29] RobH: yes [15:44:41] thought so but wanted to ask before i ran into potential issues [15:44:46] thank ya [15:44:49] <^demon> mark, others: `ssh -p 29418 gerrit.wikimedia.org gerrit create-project --name=foo/bar.git --parent=foo.git` [15:45:02] yeah that [15:45:14] you can probably modify the description in the web interface [15:45:43] <^demon> Yep. 
[15:46:11] so use --name=operations/debs/appname.git --parent=operations/debs here [15:47:23] RECOVERY - RAID on db40 is OK: OK: 1 logical device(s) checked [15:47:24] thanks again mark and ^demon [15:51:16] Change abandoned: Hashar; "After discussion with Ryan Lane this week, it seems we want to separate the shell login and the web ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4166 [15:52:20] New patchset: RobH; "tftp directory with remote recursive set, including all the serial configuration files for tftpboot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4429 [15:52:32] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4429 [15:53:08] robh: ssd's and adapters arrived and resolved [15:53:18] RobH: cool, but where are the tftpboot images? [15:53:44] you mean the stuff in ubuntu-installer under each distro dir? [15:53:48] yes [15:53:53] is that what you meant? [15:53:56] I assumed those were auto populated when you added them, i misunderstood [15:54:08] hmm [15:54:09] i thought only the manual add was the serial configurations [15:54:14] they might make the repo quite big though [15:54:31] how big are those files in total? [15:54:33] i suppose they are custom images, but are generated on brewster right? [15:54:47] they're generated by ubuntu [15:54:49] cmjohnson1: uh, i need to follow up with you on that, as im unclear which orders, but will shortly [15:55:04] rt 2720 and 2728 [15:55:05] my brain can only do gerrit or procurement, not both at same time ;] [15:55:24] mark/^demon perhaps you can humor the git noobs here--does operations/debs already exist? 
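The `gerrit create-project` invocation quoted above (SSH to Gerrit's command port 29418, then the server-side `create-project` with `--name` and `--parent`) could be scripted. A minimal sketch; the helper function is hypothetical, but the port and flags are the ones from the log:

```python
# Hypothetical helper (not part of any real ops tooling) that assembles the
# argv for `gerrit create-project` as run over SSH, matching the command
# ^demon pasted: ssh -p 29418 gerrit.wikimedia.org gerrit create-project ...
def gerrit_create_project_cmd(name, parent,
                              host="gerrit.wikimedia.org", port=29418):
    """Build the argv for creating a new Gerrit project over SSH."""
    return [
        "ssh", "-p", str(port), host,
        "gerrit", "create-project",
        "--name=%s.git" % name,
        "--parent=%s.git" % parent,
    ]

# The wikistats repo from the log would be created with:
cmd = gerrit_create_project_cmd("operations/debs/wikistats", "operations/debs")
```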
[15:55:26] mark: since they are generated, i would assume we didnt want to put in puppet and bloat the repo [15:55:27] !log used gerrit create-project to create operations/debs/wikistats.git [15:55:29] Logged the message, Master [15:55:41] unless their generation is an ordeal? [15:55:45] RobH: they are NOT generated [15:55:46] <^demon> Jeff_Green: Yes [15:55:49] oh, sorry [15:55:52] not by us [15:55:57] right, but by ubuntu [15:56:13] i just do not understand how they are generated, hence my confusion, sorry =/ [15:56:16] then you need another method to get those files installed [15:56:24] ^demon can you help me figure out how to fetch a checkout? as a git noob I did what I thought made sense and failed [15:56:48] its only 14m [15:56:55] so tossing in puppet isnt exactly horrible. [15:57:06] i just dont get how they are made/placed [15:57:17] manually, from ubuntu ftp servers [15:57:34] 14mb is not small either [15:57:37] ahh, so either have to toss files in puppet, or have some kind of puppet check to ensure they are there [15:57:40] and if not pull via ftp [15:57:46] the latter prolly best right? [15:57:59] yeah that would be best [15:58:06] puppet can perhaps do that too [15:58:13] you have any notes on the ftp pull info? [15:58:21] or point me in right direction? [15:59:03] meh [15:59:14] puppet can't pull from arbitrary http servers [15:59:32] RobH: use the volatile file module [16:00:30] put the files in /var/lib/puppet/volatile/tftpboot/ [16:00:47] and have puppet pull them from puppet:///volatile/tftpboot/ [16:00:50] the install image files that is [16:00:50] then they're not in git [16:00:52] not the other ones. [16:00:53] but they are in puppet [16:00:53] yes [16:01:19] and store those files on sockpuppet, or stafford /var/lib? 
[16:01:24] stafford [16:01:36] ok, spiffy i get to figure out a new module ;] [16:01:41] * RobH is already googling away [16:01:48] squid configs are installed that way too [16:01:59] so i can check there for reference, gtk [16:02:08] no need to add that module, it already exists [16:02:12] it's just a subdir under the volatile module [16:02:25] you only need to copy the files there [16:02:51] noted [16:08:32] PROBLEM - Varnish HTCP daemon on cp1043 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:08:55] (i'm investigating that atm) [16:08:59] PROBLEM - Varnish HTCP daemon on cp1041 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:08:59] PROBLEM - Varnish HTCP daemon on cp1027 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:08] PROBLEM - Varnish HTCP daemon on cp1023 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:08] PROBLEM - Varnish HTCP daemon on cp1025 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:08] PROBLEM - Varnish HTCP daemon on cp1021 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:17] PROBLEM - Varnish HTCP daemon on cp1042 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:17] PROBLEM - Varnish HTCP daemon on cp1044 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:35] PROBLEM - Varnish HTCP daemon on cp1024 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:35] PROBLEM - Varnish HTCP daemon on cp1022 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:35] PROBLEM - Varnish HTCP daemon on cp1028 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:09:35] PROBLEM - Varnish HTCP daemon on cp1026 is CRITICAL: NRPE: Command check_varnishhtcpd not defined [16:11:04] ok, time to take puppet break and order all the osm stuff [16:11:41] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 68144 MB (3% inode=99%): [16:11:53] ah, that looks like it just caught the revision 
before "erroneous {" [16:11:59] RECOVERY - Varnish HTCP daemon on cp1021 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:12:00] New review: RobH; "it has issues, bad syntax, plus after discussion with mark i will be adding in support for the insta..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/4429 [16:12:24] ah [16:12:27] files need to be named .cfg [16:12:34] New patchset: Demon; "Revert "Temporary hack to disable extension list cron for the next 24 hours so I can unbreak Translatewiki" like I promised I would." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4430 [16:12:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4430 [16:14:14] New patchset: Mark Bergsma; "NRPE include_dir configs need to be named .cfg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4431 [16:14:29] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4431 [16:14:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4431 [16:14:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4431 [16:16:47] RECOVERY - Varnish HTCP daemon on cp1022 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:18:08] RECOVERY - Varnish HTCP daemon on cp1026 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:18:08] RECOVERY - Varnish HTCP daemon on cp1044 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:21:53] RECOVERY - Varnish HTCP daemon on cp1023 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:22:11] RECOVERY - Varnish HTCP daemon on cp1042 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:25:12] morning! [16:25:12] ping mutante [16:25:24] hi maplebed [16:25:27] morning! [16:25:29] morning [16:25:34] err... evening? [16:25:44] true, yes [16:25:49] thanks for stopping where you did on the es slaving! [16:26:05] you're quite right that we want to slave from es1001, not es3. [16:26:09] and that section differs from the docs. [16:26:20] do you have 20 minutes now for us to finish? [16:26:22] ok, good [16:26:24] yea [16:26:30] sweet. [16:26:36] LeslieCarr: ^^^ [16:26:58] the crucial difference - we want to pull the new slave position from the output of 'show master status' that we captured rather than the 'show slave status'. [16:27:07] so the mysql process running in this state didnt make a difference to it being stopped ,right [16:27:16] I think that file only exists on es1001; IIRC we captured it after taking the snapshot not before. [16:27:20] mutante: correct. [16:27:57] apergos: notpeter ^ [16:28:08] mm? 
[16:28:20] PROBLEM - MySQL Slave Running on db1007 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Incorrect key file for table ./centralauth/spoofuser.MYI: tr [16:28:23] right now mysql is running but since you didn't start slaving, it's not doing anything except sitting there ready. [16:28:24] uh huh, the show master status on the master, [16:28:26] what you said about "show master status" vs. "slave status" [16:28:37] cause otherwise you've copied over a snap with writes in it that [16:28:45] likely will be duplicated [16:28:56] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 242 seconds [16:28:56] apergos: no, that's not it. [16:28:57] cool, nice to know I understood right [16:29:01] no? [16:29:24] if you're duplicating a slave and setting up the new one to slave off the same master as the old one you use the output of show slave status. [16:29:36] if you're setting up a slave and you want it to slave from the one you're copying from, you use show master status. [16:29:42] let me anchor that in an example. [16:29:53] pull up http://noc.wikimedia.org/dbtree/ [16:29:53] yes, that's what I'm saying [16:30:00] we're taking a copy from the master [16:30:12] oh, ok. I wasn't sure what you meant by stuff would be duplicated. [16:30:22] well if we use the master snap [16:30:35] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [16:30:35] and then tell the slave to start at whatever positions from the slave status [16:30:46] that we recorded (who knows what those are) [16:30:55] presumably those point to stuff that's a bit behind [16:31:18] so you might get dups [16:31:22] nope. [16:31:31] ok, please tell me what you would get [16:31:37] both are valid paths. [16:31:40] this means I don't understand well [16:31:50] if you look at dbtree, imagine we're taking the snapshot from db1017. [16:31:50] em [16:31:55] how is it valid in this case? 
[16:31:56] RECOVERY - Varnish HTCP daemon on cp1025 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:32:05] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [16:32:25] if we take the snap from db1017 and use the output of 'show slave status' to start slaving on the new host, it will be in the same position as, say, db53. [16:32:32] RECOVERY - Varnish HTCP daemon on cp1024 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:32:40] if we take the snap from db1017 and use the output of 'show master status' to start slaving, then it will be in the same position as db1033. [16:32:52] the only difference is who it treats as a master and therefore where it lives in the replication tree. [16:33:33] yes, but if you use the one master ip and the other log info [16:33:37] what does that get you? [16:34:13] true, you cannot mix and match masters and log position. [16:34:26] you can't use db1017's log position to slave off of db53, for example. [16:34:33] but that's not what we were doing here... [16:34:50] I still don't get it I guess [16:34:51] sorry [16:35:19] I think it'd be easier by voice. 
[16:35:23] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [16:35:27] so first is, we want to use a different master_host= IP, the one of es1001, right [16:35:33] can we talk in a few, after we get the es slaves running? [16:35:41] why not get them going [16:35:48] mutante: yes; we want to use es1001 as the master host [16:35:49] I'll keep looking at what you wrote [16:35:56] see if I can make it make any sense [16:35:59] RECOVERY - Varnish HTCP daemon on cp1027 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:36:08] RECOVERY - Varnish HTCP daemon on cp1041 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:36:44] RECOVERY - Varnish HTCP daemon on cp1028 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:36:56] mutante: and then the master log and position come from the output of master status that we captured on es1001; es1001-bin.000548 and 797575763 [16:38:11] and was i right about the password? [16:38:18] yes. [16:38:35] we are so bad :( [16:39:48] ok, im executing that query and starting slave [16:40:02] RECOVERY - Varnish HTCP daemon on cp1043 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker [16:41:04] mutante: rock on. [16:41:15] error in sql syntax, looking [16:41:27] can you copy/paste all but the password here? [16:42:03] mysql> change master to master_host='10.64.0.25', master='reply', master_password='xxxxxxx', master_log_file='es1001-bin.000548', master_log_pos=797575763; [16:42:15] repl .arg [16:42:20] what's the 'master=reply' bit? [16:43:11] RECOVERY - MySQL replication status on es1002 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : s [16:43:25] yea, stupid copy/paste mistake, all good [16:43:39] it should be master_user='repl'; right? [16:43:42] !log changed master and started slave on es1004 [16:43:44] Logged the message, Master [16:43:45] yes [16:43:58] rock on. 
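The syntax error above came from typing `master='reply'` where `master_user='repl'` was intended; `master` is simply not a valid CHANGE MASTER TO option name. A small illustrative sketch of the kind of sanity check that catches this (the option set is a deliberately incomplete subset, and the checker itself is hypothetical, not a real MySQL client):

```python
# A few CHANGE MASTER TO option names (deliberately incomplete subset).
VALID_OPTIONS = {"master_host", "master_user", "master_password",
                 "master_log_file", "master_log_pos"}

def unknown_options(options):
    """Return the option names MySQL would reject as unknown."""
    return set(options) - VALID_OPTIONS

# As originally typed in the log: 'master' is not a valid option name.
typed = {"master_host": "10.64.0.25", "master": "reply",
         "master_password": "xxxxxxx",
         "master_log_file": "es1001-bin.000548",
         "master_log_pos": 797575763}

# The fix discussed in the channel: replace master='reply' with master_user='repl'.
fixed = {k: v for k, v in typed.items() if k != "master"}
fixed["master_user"] = "repl"
```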
[16:44:21] if you look at the output of 'show slave status' on es1004 now, you'll see that it's got a huge seconds_behind_master [16:44:24] but that it's catching up. [16:44:59] RECOVERY - MySQL slave status on es1004 is OK: OK: [16:45:08] LeslieCarr: http://wikitech.wikimedia.org/view/Setting_up_a_MySQL_replica [16:46:31] maplebed: confirmed, i see it going down [16:47:14] PROBLEM - MySQL replication status on es1004 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 116580s [16:47:30] well, nagios tells us [16:50:59] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [16:51:15] 09:30 < maplebed> true, you cannot mix and match masters and log position. [16:51:18] 09:30 < maplebed> you can't use db1017's log position to slave off of db53, for example. [16:51:38] fwiw, if you use mk-slave-move you can do operations like --slave-of-sibling pretty trivially [16:51:46] * rcoli says the lurker-who-happens-to-be-a-DBA [16:52:02] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [16:52:02] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [16:53:20] huzzah! [16:53:27] LeslieCarr: mutante congrats! [16:53:45] :) [16:54:17] RECOVERY - MySQL slave status on es1002 is OK: OK: [16:56:41] PROBLEM - MySQL replication status on es1002 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 122535s [16:57:35] RECOVERY - MySQL replication status on es1004 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [16:58:39] done catching up [17:00:22] LeslieCarr: you want to umount and lvremove on 1001 ? then we are through the steps [17:01:51] oh no, wait, apergos is still rsyncing from it [17:01:55] right [17:02:00] from a different snap [17:02:04] not from yours [17:02:04] ok [17:02:13] LeslieCarr just took off for the gym. 
[17:02:33] but yes, that's the last step and ok to run now that both slaves are good. [17:03:19] i see, "apersnap" i wont touch:) [17:03:50] hey, that name was not my idea :-P [17:04:10] you should totally take credit for it because it is such an AWESOME name. [17:04:17] er [17:04:19] noooooooo [17:04:44] ok, umount /mnt/snap , lvremove /dev/es1001/snap [17:04:48] done [17:06:17] apergos: http://flickr.com/gp/maplebed/FL747N [17:06:36] that might help us talk through which slave / master status to use where and what would happen. [17:07:02] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [17:07:02] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [17:07:05] see here's my question [17:07:36] so as I understand it we are taking a full copy of the master db with whatever logs it writes at the time of the rsync [17:07:47] (stop me as soon as I say something wrong) [17:08:09] correct (though slaving is stopped when we take the snap so it's not modifying any of the logs) [17:08:23] RECOVERY - MySQL replication status on es1002 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [17:08:28] ok [17:08:50] hmm basically I guess I don't understand that mechanism either [17:09:22] do you want me to talk through the various logs in that picture? [17:09:23] it might still be making changes to the tables [17:09:37] not yet [17:09:43] Is today a WMF holiday? and/or is Monday? [17:09:46] it won't be making any changes to the tables. [17:09:55] andrewbogott: today isn't (at least for me) [17:09:59] I don't think monday is either, but will check. [17:10:24] no, monday is not. [17:10:32] (according to the HR holiday list) [17:10:33] Huh. OK. [17:10:35] monday is for me [17:10:40] I didn't have plans, anyway :) [17:11:01] sorry, mark, I was only looking at the US holiday list. [17:11:13] yeah... 
I just realized it is for me ;) [17:11:22] holiday on monday over here [17:11:27] I realized last night I'm flying to Berlin during a holiday. [17:11:35] hehe [17:11:37] what prevents some process from writing to the master in between the time we note the master status and when we take the snapshot? [17:12:01] apergos: us saying 'stop slave io_thread' and the fact that we're not running on the actual master (the tip of the tree). [17:12:03] sorry to be so slow. but if I don't understand this then all I'm doing is guessing about how it works, or sort of doing things by rote, and I don't like either [17:12:11] +1 asking questions. [17:12:20] oh, because it's not the actual master [17:12:22] that's the piece [17:12:37] so if we ever need to take a copy from the actual master (the one to which random clients are connecting) we also have to run 'flush tables with read lock' [17:12:44] uh huh [17:12:49] that statement locks all the tables so that nobody can write to them. [17:12:58] I figured [17:13:12] since we were copying from a slave (even a slave that's master to other hosts), it's set to 'read_only=true' [17:13:21] ok [17:13:25] and nobody can write to the tables (except the replication thread, which is exempt from the read_only setting) [17:13:45] and just running 'stop slave' is sufficient to stop all changes to the content (both tables and logs) of the db. [17:13:54] sure [17:14:33] ok now show slave status, run from the master [17:14:52] shows what exactly? I mean it shows a log position and log name of... which boxes exactly? [17:15:38] the names 'master' and 'slave' are only useful when describing the relationship between two hosts; it becomes vague when we have tree replication. [17:15:41] so looking at http://www.flickr.com/photos/maplebed/6905042356/lightbox/ [17:15:53] 'show slave status' on the real master (host a) is empty. [17:15:57] !log Power cycled down host lvs5 [17:15:57] it has no slave status. 
[17:15:59] Logged the message, Master [17:16:02] uh huh, wait is that the same image? [17:16:07] cause if it is I won't reload it [17:16:17] don't reload. [17:16:19] ok [17:16:23] it's the same, just in the fullscreen mode. [17:16:29] (easier to read) [17:16:33] ok, I have the original large size loaded up here [17:16:36] maplebed: that's normal... a master shouldn't have slave status [17:16:55] (assuming this is mysqld?) [17:16:56] jeremyb: pull up that flickr link for the tree we're talking about. [17:17:12] i think it's asking me to login to yahoo? [17:17:29] so show slave status run on box X should show where it is in reading some binlog in order to load up transactions it received from elsewhere [17:17:30] sorry, try this one: http://flickr.com/gp/maplebed/FL747N [17:17:33] is that correct? [17:17:57] I have that. I have a good copy of the image. it's all good. really. [17:18:08] (that was for jeremyb) [17:18:11] oh :-D [17:18:14] show slave status shows two logs [17:18:17] RECOVERY - Host lvs5 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [17:18:31] the binary log, which is the position and log file on the master from which it is replicating, [17:18:35] RECOVERY - BGP status on cr1-sdtpa is OK: OK: host 208.80.152.196, sessions up: 8, down: 0, shutdown: 0 [17:18:43] and the relay log, where mysql queues up the statements before executing them. [17:18:47] ok [17:19:07] so host c's show slave status shows the same log (for the master bin log) as the output of 'show master status' on host a. [17:19:26] lemme stare at that for a minute [17:19:58] ok [17:20:02] so far so good [17:20:11] maplebed: is this still about slaving a new es? [17:20:21] then what does show master show over there? 
[17:20:44] host b and host c have the same values in the output of 'show slave status' for the binary log (a-bin.005) but different values for the relay log (because it's only used internally on the host) [17:20:51] looks like i have a lot of scrollback to catch up on [17:20:56] uh huh [17:21:05] jeremyb: yeah, though this is more just a general 'how mysql slaving works' at this point. [17:21:20] you will likely know all this already jeremyb [17:22:02] the process that host c goes through when replicating a query: [17:22:17] * read it from the binary log on host a, write it to the relay log on host c [17:22:25] (that's the io_thread) [17:22:29] ok [17:22:38] * the sql_thread reads it from the relay log on host c and executes it. [17:22:55] * mysql then says "I ran a query" and so writes it to its own binary log. [17:23:06] ok now let me stop you [17:23:09] k. [17:23:09] * jeremyb gave up and plugged in ethernet [17:23:13] the in logs from a get to c by..? [17:23:17] *bin logs [17:23:38] the io_thread logs into host a as the repl user and reads them. [17:23:57] ok [17:24:03] if you look at 'show processlist;' on host a, you'll see two users logged in as 'repl' - host b and c. [17:24:51] I've seen that in the processlist but not known what it was except that obviously it had to do with making the binlog available someway [17:24:53] b and c both maintain a persistent mysql connection (same as a regular mysql client) waiting for new content to read from the binlog. [17:25:53] ok so now, if I take a copy of c at this point, stuff it on, say, e, and then give it c as its master but a 's bin log and position, how can that work? [17:25:58] here is where I just [17:26:01] dont... get it [17:26:31] won't it likely either miss transactions or have dups, or something other than the right thing? [17:26:33] it don't work [17:26:41] ok, so when you take a copy of c (and record both the output of slave and master status) then copy it all over to host e [17:26:46] you have a choice. 
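The io_thread/sql_thread pipeline maplebed walks through above (io_thread copies events from the master's binlog into the slave's relay log; sql_thread executes them, after which they land in the slave's own binlog) can be sketched as a toy model. This is pure illustration with no real MySQL involved; the class and method names are invented for the sketch:

```python
# Toy model of MySQL statement-based replication as described in the log.
class Host:
    def __init__(self, name):
        self.name = name
        self.binlog = []      # statements this host has executed
        self.relay_log = []   # statements queued for execution

    def execute(self, stmt):
        # executing a statement also writes it to this host's own binlog
        self.binlog.append(stmt)

    def io_thread(self, master):
        # copy new events from the master's binlog into our relay log
        already_seen = len(self.binlog) + len(self.relay_log)
        self.relay_log.extend(master.binlog[already_seen:])

    def sql_thread(self):
        # execute everything queued in the relay log, in order
        while self.relay_log:
            self.execute(self.relay_log.pop(0))

a, c = Host("a"), Host("c")
a.execute("INSERT 1")
a.execute("INSERT 2")
c.io_thread(a)   # events land in c's relay log
c.sql_thread()   # ...then get executed and appear in c's own binlog
```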
[17:26:50] give it a's log+pos+host or c's log+pos+host [17:26:54] don't mix+match [17:26:58] ok [17:27:20] host e can be placed in the tree alongside b and c or it can be a new child of c (next to d) [17:27:34] (guys, if you use "start slave until" you can guarantee the slaves stop at the same position in the master binlog) [17:27:38] so what I was asking about all this time was how we could use es1001 slave show status info (log name and position) and yet have the new slave using es1001 as its master [17:27:44] (also, this is a very good reference : http://www.mysqlperformanceblog.com/2008/07/07/how-show-slave-status-relates-to-change-master-to/) [17:27:53] let's take the first case - e wants to be a new host at the same level as b and c. [17:27:54] apergos: that link answers your question [17:27:54] and what I thought you were telling me is that this was possible [17:28:49] rcoli: wait a little bit please [17:29:02] we would want to use the output of 'show slave status' and get the binlog position for a-bin.005. [17:29:18] b and c and e would all then have the same a-bin.005 in the output of 'show slave status' and be siblings. [17:29:21] apergos: for sure, just an outsider (who happens to be a DBA in RL) trying to be of assistance. [17:29:31] you're answering a different question, I think maplebed [17:29:32] that's the "normal" way that the replication doc we were following suggests. [17:29:35] I am. [17:29:40] Now on to the second case. [17:29:51] where we want e to be a child of c (sibling to d) [17:29:55] apergos: you don't use es1001 show slave status+make es1001 the master. use show master status and make es1001 the master or use show slave status and use the master in that output as the master. 
(so es1001's master) [17:30:02] jeremyb: please wait [17:30:17] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 199 seconds [17:30:17] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 199 seconds [17:30:24] in that case, we need the output of 'show master status' in order to tell e where to start replicating from. [17:30:40] you're correct that the output of 'show slave status' is useless when we want e to be a child of c. [17:30:58] * jeremyb wonders if apergos is typing a question [17:31:20] when replication is running again, e and d will have the same output of 'show slave status' - they'll both be at c-bin.204 [17:31:38] no jeremyb, I just have three people all telling me different things when in the end I have one very particular question and that's all [17:33:00] apergos: does that make more sense yet? [17:33:08] !log Sending Japanese upload traffic to varnish in eqiad [17:33:10] Logged the message, Master [17:34:15] maplebed: case 1, e wants to be a sibling of b and c, we want the output of show slave status from... where? [17:34:28] not from a surely, as it's empty [17:34:36] so I guess I am stuck there [17:34:47] you want show master status from a [17:34:48] from whichever host we're using as the source of our copy, in this case c [17:34:54] ok, that's fine [17:35:07] 100% works for me and that's how I thought that worked. [17:35:43] both of those cases are how I thought they worked at the beginning of this discussion [17:35:49] case 1 is the case for which the ES repl doc was written. [17:35:50] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 185 seconds [17:36:13] uh huh [17:36:31] it got confusing because we wanted case 2 but were following the same docs. 
[17:36:32] :( [17:37:00] Jeff_Green: funny enough, that cronspam was legitimate [17:37:02] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 226 seconds [17:37:08] Jeff_Green: and not spam [17:37:17] Ryan_Lane: thousands per day? [17:37:18] if that script is outputting errors, it's a problem [17:37:29] well, I should have noticed it [17:37:31] it's been going on for at least weeks, if not months [17:37:36] I don't read cronspam often enough [17:37:36] :-P [17:37:46] apergos: so you're clear on all of it? [17:37:56] (repeating from the beginning of this conversation) well if we use the master snap [to put on the new slave] and then tell the slave to start at whatever positions from the slave status [taken from the master] [17:38:01] this can't possibly work right? [17:38:07] correct. [17:38:25] ok, that is what I was saying earlier when you weren't here so yay I actually had a clue [17:38:26] thanks [17:38:27] wait. [17:38:29] hang on. [17:38:41] * apergos waits [17:38:48] it does work, so long as you take *all* the data from the 'show slave status'. [17:38:51] that's case 1. [17:39:07] no, I meant: log position and name from show slave status [17:39:10] you take the snap from the master (host c), copy it to host e, and use the output of 'show slave status' on host c to start up replication [17:39:17] then it'll be a sibling of c and replicating from a. [17:39:28] but not the master ip [17:39:31] log name / log position / master host is a triple that can't be broken. [17:39:41] I guess you read that differently than I wrote it [17:40:04] but the whole discussion was: using es1001 for master, but the log name and position from show slave status on es1001 [17:40:24] which I could not see how that could work [17:40:55] I agree that we're saying the same thing; so long as you keep the host/log/position triple fixed, you'll be fine. so 'show slave status' gives a-bin and needs host a; 'show master status' gives c-bin and is fixed to host c. 
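The two valid triples from this exchange can be sketched as follows. The status values are hypothetical, but the field names match what SHOW SLAVE STATUS and SHOW MASTER STATUS actually report:

```python
# Sketch of the host/log/position rule discussed above, for cloning host c
# (a slave of a, and a master to d) onto a new host e. Either triple below
# is valid on its own -- but they must never be mixed.

slave_status_on_c = {            # hypothetical SHOW SLAVE STATUS output on c
    "Master_Host": "a",
    "Relay_Master_Log_File": "a-bin.005",
    "Exec_Master_Log_Pos": 120,
}
master_status_on_c = {           # hypothetical SHOW MASTER STATUS output on c
    "File": "c-bin.204",
    "Position": 98,
}

def change_master_triple(placement):
    """Return the (master host, log file, position) triple to give host e."""
    if placement == "sibling_of_c":     # case 1: e replicates from a, next to b and c
        s = slave_status_on_c
        return (s["Master_Host"], s["Relay_Master_Log_File"], s["Exec_Master_Log_Pos"])
    if placement == "child_of_c":       # case 2: e replicates from c, next to d
        m = master_status_on_c
        return ("c", m["File"], m["Position"])
    raise ValueError("unknown placement")

print(change_master_triple("sibling_of_c"))  # ('a', 'a-bin.005', 120)
print(change_master_triple("child_of_c"))    # ('c', 'c-bin.204', 98)
```

Mixing the two (host c with a-bin.005 and its position) is exactly the broken triple the conversation warns about: replication would either fail outright or apply the wrong events.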
[17:40:59] New patchset: Mark Bergsma; "New version 3.0.2-2wm2" [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/4435 [17:41:03] uh huh [17:41:21] you're right that you can't take a-bin/pos and host c and use that triple to start up replication; it'll fail. [17:41:25] ok [17:41:27] Jeff_Green: heh. fixed it [17:41:34] Jeff_Green: we should revert that change [17:41:41] cool. jeremyb rcoli - wanna chime in? [17:41:43] thanks for waiting. [17:41:43] New review: Mark Bergsma; "(no comment)" [operations/debs/varnish] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4435 [17:41:45] Change merged: Mark Bergsma; [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/4435 [17:41:46] I do actually need to see if that's broken [17:41:47] we should set up pt-heartbeat on the es cluster [17:41:53] binasher: yes we should! [17:41:53] I'll pay more attention, I promise :) [17:41:56] :) [17:42:28] although i'd much rather the current es cluster be 100% read only archival storage where replication doesn't matter :) [17:42:32] ok so I think I (mostly) just wasted a lot of your time [17:42:37] for which I apologized [17:42:41] *apologize [17:44:38] apergos: no waste at all. [17:45:54] do you actually have daisy chained replication anywhere? [17:46:16] if nothing else, we just spent time thinking about replication and reaffirmed our understanding! ;) [17:46:20] jeremyb: we do; [17:46:37] most of our clusters that replicate cross-colo have one cross-colo link and the rest of the hosts in the other colo replicate from that host. [17:46:37] i've read about some of the relevant options once upon a time but never heard of anyone actually using it [17:46:46] jeremyb: take a look at http://noc.wikimedia.org/dbtree/ [17:46:48] ahhh [17:47:01] and what about the TS slaves? [17:47:07] TS? [17:47:15] toolserver [17:47:22] does anyone actually understand how trainwreck works? [17:47:26] heh... [17:47:46] barely. I prefer not to think about it. 
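The daisy-chaining just described (one cross-colo link, with the remaining hosts in the other colo replicating from that host) can be sketched as a parent map. Host names here are made up; the point is that a tiered tree leaves far fewer direct children hanging off the top master:

```python
# Illustrative comparison of a flat vs a tiered (daisy-chained) replication
# tree. Only the *direct* children of a crashed master need to be repointed
# at a new master, so tiering localizes the damage.

def direct_children(tree, host):
    """All hosts whose replication parent is `host`."""
    return [child for child, parent in tree.items() if parent == host]

# 30 slaves replicating straight off the master:
flat = {f"db{i}": "master" for i in range(1, 31)}

# Same 30 hosts, but 3 relay slaves fan out to 27 leaves:
tiered = {"relay1": "master", "relay2": "master", "relay3": "master"}
tiered.update({f"db{i}": f"relay{i % 3 + 1}" for i in range(1, 28)})

print(len(direct_children(flat, "master")))    # 30 hosts to move on a crash
print(len(direct_children(tiered, "master")))  # only 3
```

This matches the advantage raised right after in the channel: with tiering, a master rotation touches a handful of relay hosts instead of every slave, and each relay's children are guaranteed consistent with each other.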
[17:47:51] hah [17:48:07] the other advantage of using tiered replication [17:48:11] * jeremyb wonders how this chart is related to the new org chart for employees/contractors [17:48:26] is that when you need to do a master rotation there are fewer slaves to move. [17:48:56] so if you have, say, 30 slaves (which we don't), tiered replication lets you move 3-4 hosts in the case of a crash instead of all 30. [17:49:35] at the same time it guarantees consistency among the children of a specific middle slave (whereas the various children of a crashed master might have slightly different content in their binlogs) [17:50:33] jeremyb: same software builds both charts is why it looks familiar. [17:51:29] so, what's the deal with db42? [17:51:48] it failed to do the 1.19 schema migration [17:52:06] it's got a bunch of additional databases (that don't exist on its master) for the researchers [17:52:17] I think it ran out of space trying to do the schema change. [17:52:22] ahh [17:52:33] IIRC binasher managed to coax extra space out of its users and it's running the schema change again tonight. [17:52:34] lag is way off the charts and its name indicates sdtpa but the chart says it's replicating off eqiad [17:52:46] re: chained replication on the core db's, we can "select * from heartbeat.heartbeat" on for example the slave of a slave and see: [17:52:48] in a little while I'll look at the other links that were tossed into the channel [17:52:51] | 2012-04-06T17:51:28.000750 | 101623 | db1034-bin.000144 | 803062442 | db13-bin.000337 | 555969791 | [17:52:51] | 2012-04-06T17:51:28.000830 | 1006 | db1002-bin.000137 | 534054918 | db1034-bin.000144 | 803062442 | [17:52:52] | 2012-04-06T17:51:28.001190 | 10623 | db13-bin.000337 | 555970128 | NULL | NULL | [17:52:54] see if I can get any new details out of em [17:53:06] jeremyb: yes, it's in pmtpa and replicating off of eqiad [17:54:04] binasher: what are the 2nd, 4th, and 6th columns? 
[17:54:35] also maplebed thanks for taking the time to draw up the diagram [17:54:40] or just all columns ;) [17:54:44] shouldn't it go on commons though? :-P [17:55:00] I'm sure others have more "professional" diagrams that have the same content... [17:55:17] 2nd col = serverid, 4th = master position (the master or slave's own binlogs), 6th = exec master position (null = the actual master) [17:55:31] cool. [17:55:59] why are there two rows with the same server ID? [17:56:27] i think you're misreading 101623 vs 10623 [17:56:36] oh so I am. [17:57:05] from 10.64.16.23 vs. 10.0.6.23 [18:01:55] binasher: did you see MHA? [18:02:13] binasher: http://code.google.com/p/mysql-master-ha/ - we just hired the author [18:03:24] jeremyb: re: db42, i don't want non-prod slaves that get trashed and super behind replicating directly off prod masters.. it can cause issues with switching the master [18:04:02] domas: oooh [18:05:31] mha does look pretty. [18:05:49] domas: i still think fb should fix all the limitations in google's global txn id patch [18:06:45] hehe [18:08:13] I just deployed my consistent hashing director in the eqiad varnish upload cluster [18:08:17] it's serving japanese traffic [18:08:24] it would make the world a better place! [18:08:40] consistent hashing in varnish does too [18:09:04] 400 purges a second do not :( [18:09:24] how do we get that many purges on upload? 
[18:09:33] it's one stream for both wikis and upload [18:09:40] ah [18:09:40] we should probably fix that at some point [18:10:38] i think when the frontend varnish instances weren't getting purges and instead just gave everything a <= 5min ttl (eventual consistency!), their cpu utilization was a lot lower [18:10:48] yes [18:11:18] but I think user experience is our main goal, not low cpu ;) [18:11:29] it sucks indeed [18:11:31] i think the low ttl also helped reduce lru work [18:11:58] we can play with that [18:12:09] there's not a lot of point in keeping objects > 5 min anyway [18:13:25] i don't know if taking up to a few min for an image purge to propagate globally (and that assumes the image to be purged is staying in the small frontend ram cache) is that bad of an experience [18:14:41] for images that's fine I think [18:14:43] for text it's not [18:16:05] New patchset: Bhartshorne; "moving the save-object stuff so we only save stuff that's different from ms5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4473 [18:16:07] that's fair, although logged in users altering the text would only see a previously cached version if they're checking it from a different browser within that window [18:16:20] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4473 [18:16:30] but i guess the faster we can clear vandalism, the better [18:18:25] oooh, MHA is nice ;) [18:19:28] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4473 [18:19:31] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4473 [18:25:02] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [18:27:20] !log updating OpenStackManager to r114758 on virt0 [18:27:21] Logged the message, Master [18:29:14] domas: he has a bit of a spam comment problem: http://yoshinorimatsunobu.blogspot.com/2012/01/mha-for-mysql-053-released.html [18:30:10] i blame google [18:31:09] they probably put the blogger team in shackles and forced them to work on nothing but google+ [18:31:10] I blame google too [18:31:33] blogging isn't social enough ;) [18:31:36] haha binasher [18:31:45] when did fb hire yoshinori? [18:31:57] look at his latest blog ;) [18:32:28] oh, duh [18:32:36] binasher: which mysql build do you use? do your own cherry picking? [18:34:16] facebook's patch set here, percona binaries elsewhere  [18:36:01] thanks [18:40:02] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [19:05:15] New patchset: Lcarr; "fixing icinga nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4479 [19:05:30] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4479 [19:05:46] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4479 [19:05:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4479 [19:44:51] Ryan_Lane: can I name it vumi.pp [19:45:03] you should call it mobile.pp [19:45:13] if you want to call the subclass vumi, that's fine [19:45:17] mobile::vumi [19:45:29] no one will know what it's for if you call it vumi.pp [19:47:38] Ryan_Lane: I guess I'll just ask maplebed to help me [19:48:16] heh [19:48:57] when you say help, you really mean "do this for me" [19:52:01] Ryan_Lane: NO I DO NOT [19:52:02] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [19:52:15] :D [19:53:36] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4374 [19:53:38] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374 [20:14:14] New patchset: preilly; "Add new mobile puppet manifest with simple vumi class * provides ussd application server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4483 [20:14:30] New patchset: preilly; "Add X-Carrier to response from Varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4032 [20:14:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4483 [20:14:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4032 [20:15:48] New patchset: preilly; "Add new mobile puppet manifest with simple vumi class * provides ussd application server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4483 [20:16:03] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4483 [20:17:24] New patchset: preilly; "Add new mobile puppet manifest with simple vumi class * provides ussd application server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4483 [20:17:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4483 [20:31:56] New patchset: preilly; "Add new mobile puppet manifest with simple vumi class * provides ussd application server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4483 [20:32:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4483 [20:39:03] New patchset: preilly; "Add new mobile puppet manifest with simple vumi class * provides ussd application server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4483 [20:39:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4483 [20:45:05] maplebed: fixed things in a way that you might notice if you look really carefully ;) [20:45:17] heh... [20:45:46] thanks. I didn't know we had a template for channels. [20:46:12] no problem, I think it might be newish [20:46:56] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [20:48:18] oh actually i edited yours on meta - yeah it's been around for a while [20:48:25] we also have {{irc| for the same effect IIRC [20:49:58] the reason for removing my ssh key - I've got commit access now so it's unnecessary? [21:13:38] PROBLEM - Squid on brewster is CRITICAL: Connection refused [21:14:15] so, hoping someone has some ideas … i've got a lot of squids on neon (new nagios) with 403 errors … but i haven't seen any configurations in puppet that would be locking out everything but spence … anyone know what might be going on ? 
actually i believe it is all squids [21:18:44] User-Agent header? [21:20:11] +1 [21:20:55] hrm, good thought, let me see if spence is doing something special [21:25:20] New patchset: Lcarr; "adding in ubuntu logo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4488 [21:25:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4488 [21:25:36] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4488 [21:25:38] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4488 [21:32:08] !log restarted squid on brewster [21:32:10] Logged the message, Mistress of the network gear. [21:33:41] oh, that's why it's borked [21:33:48] brewster is 100% full [21:33:54] again huh? [21:34:44] hrm [21:34:48] who put jenkins on root home ? [21:34:53] that's pretty big [21:35:28] !log moved jenkins_1.458_all.deb to /srv/wikimedia/incoming/ on brewster [21:35:30] Logged the message, Mistress of the network gear. [21:36:12] LeslieCarr: re: the backend squid 403's [21:36:22] it's based on the tiertwo acl in squid.conf [21:36:39] it contains 208.80.152.0/24 for pmtpa but only 10.64.0.0/22 for eqiad [21:36:56] ah [21:37:04] so it needs to be a tiertwo ? [21:37:06] only hosts matching the tiertwo acl get to talk to backend squids [21:37:16] which is why the monitoring is ok for the frontend instances [21:37:33] okay :) [21:37:37] thanks binasher ! [21:38:25] i wonder if we should move nagios (and most everything else) to hosts only on internal ip's [21:38:56] and let lvs or a software proxy expose 80/443 to the world [21:39:12] hah i was just saying nagios was a bit harder because it's public but we could do a port forwarding ;) [21:40:56] maybe an ops nginx cluster that did ssl termination and forwarded to private hosts.. 
we could have all services behind one ip address and force ssl while we're at it [21:48:40] RECOVERY - Squid on brewster is OK: TCP OK - 0.003 second response time on port 8080 [21:55:35] !log restarted puppet on spence [21:55:37] Logged the message, Mistress of the network gear. [22:00:03] binasher: so how do we generate our squid config files ? [22:00:15] it's not a git file ... [22:00:53] nope, see /home/w/conf/squid on fenari [22:01:38] mark reworked the deployment process around a month ago, there might be a wiki page about it, but try searching email [22:02:13] but generate.php creates the configs [22:02:24] and the deploy script in there should have --help options [22:02:40] after using generate [22:03:04] do a recursive diff of generated/ against deployed/ [22:03:56] hrm, any idea what the email could be titled ? though the wikitech page actually looks sort of in date :) so let's see if that works or if i kill the site [22:04:40] wow we have some old old acl's in there [22:04:40] binasher / LeslieCarr when I was doing the swift stuff, it was 'make' rather than 'generate' to create the configs. [22:04:41] 2005 [22:04:42] hehe [22:05:43] the makefile contains nothing but - all: php -n generate.php [22:06:36] so that works too [22:06:41] New patchset: Ryan Lane; "Disabling access log for install server, it keeps filling up the disk" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4490 [22:06:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4490 [22:06:58] Ryan_Lane: you saw my answer about the cisco install right? [22:07:10] LeslieCarr: did you find http://wikitech.wikimedia.org/view/Squid#Configuration [22:07:12] the one puppet change is live, so you are able to actually install cisco's now. 
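The deploy workflow just described (run generate.php, then recursively diff generated/ against deployed/ before pushing) could be scripted with something like the sketch below. The function name and directory layout are hypothetical, not the actual tooling on fenari:

```python
# Hypothetical helper for the squid config review step: recursively compare
# the generated/ tree against deployed/ and report files that differ or
# exist on only one side, so changes can be reviewed before deploying.
import filecmp
import os

def changed_files(generated, deployed):
    """Return sorted relative paths that differ between the two trees."""
    diffs = []

    def walk(cmp, prefix=""):
        # files whose contents differ between the two trees
        diffs.extend(os.path.join(prefix, f) for f in cmp.diff_files)
        # files present on only one side
        diffs.extend(os.path.join(prefix, f) for f in cmp.left_only + cmp.right_only)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(generated, deployed))
    return sorted(diffs)
```

An empty result would mean generated/ and deployed/ agree and there is nothing to push; anything listed is worth eyeballing before deploying to a single host (e.g. amssq35) and only then to 'all'.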
[22:07:23] gah [22:07:31] i went to "squids" which is linked off the main page [22:07:39] just need to add whatever cisco it is to the linuxsoandso.ttyS0-115200 file via puppet [22:07:47] same page, I think. [22:07:48] oh same thing [22:07:49] whew [22:08:16] the thing that page doesn't mention is that you can deploy to groups other than 'all' [22:08:33] such as individual servers (eg sq85) or specific groups (eg 'upload') [22:08:36] ahha i see the issue [22:08:44] we have tiertwo marked by which subnet the squids are in [22:08:54] in pmtpa that happens to be the same subnet as spence [22:08:58] and in eqiad it's not [22:09:02] :) [22:09:09] and it sneakily got around my greps [22:09:11] very sneaky [22:09:40] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4490 [22:09:43] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4490 [22:10:58] so will deploying to all kill anything ? [22:12:22] LeslieCarr: only if there's an error in your config. [22:12:29] :) [22:13:03] LeslieCarr: I strongly suggest deploying to one host and verifying there first. [22:13:31] ok [22:13:37] !log deploying new squid config to amssq35 [22:13:39] Logged the message, Mistress of the network gear. [22:14:11] !log added neon into tiertwo of squid allowed hosts [22:14:13] Logged the message, Mistress of the network gear. [22:18:19] yay looks like it's working … and icinga is happy [22:18:36] does that mean you resolved all the duplication issues?!?! [22:19:18] no, they magically solved themselves [22:19:20] somehow [22:19:26] eerie. [22:19:27] i did sacrifice a chicken [22:21:22] New patchset: Ryan Lane; "Move virt5 into correct file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4493 [22:21:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4493 [22:22:37] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4493 [22:22:40] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4493 [22:23:01] !log deploying new squid config to all squids [22:23:03] Logged the message, Mistress of the network gear. [22:48:46] so… binasher you may know this, is there a config file where we have granted mysql permissions to login or would i have to go to each database server (it appears that spence has permission to login but neon does not) [22:50:21] it's stored internally per db, although it should be the same for all dbs within a shard [22:51:05] the best way to see all of the permissions is to run pt-show-grants [22:52:39] i think rather than changing that, mysql checks that aren't run via nrpe should be [22:52:49] so the mysql login is from localhost [22:54:50] root@es1001:~# pt-show-grants | grep -v 10.0 | grep -v 10.64 | grep -v localhost [23:15:03] Ryan_Lane: quick question, what's the puppet cmd to force a puppet run? [23:15:14] puppetd -tv [23:15:20] thx! [23:15:21] or puppetd --test [23:25:45] python-iso8601_0.1.4-0_all.deb [23:25:46] python-redis_2.4.5-1_all.deb [23:25:47] python-smpp_0.1-0_all.deb [23:25:49] python-ssmi_0.0.4-0_all.deb [23:25:50] python-txamqp_0.6.1-0_all.deb [23:25:52] redis-doc_2.4.10-ubuntu1~lucid_all.deb [23:25:53] redis-server_2.4.10-ubuntu1~lucid_amd64.deb [23:25:54] vumi_0.4.0~a+git2012040612-0_all.deb [23:25:55] vumi-wikipedia_0.1~a+git2012040614-0_all.deb [23:25:56] Do those all look like sane names for us? [23:26:01] New patchset: Lcarr; "removing old mysql checks that are overtaken by nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4495 [23:26:05] binasher: can you check this out ? ^^ [23:26:19] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4495 [23:29:09] LeslieCarr: it looks like the ES hosts do use those [23:29:15] and nothing else [23:29:47] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds [23:29:56] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 0 seconds [23:30:31] oh really ? hrm, i thought it was included in line 111 on mysql.pp ? [23:30:43] (which i could be wrong on) [23:33:50] those don't currently get included on the es hosts [23:35:06] and once we have another es cluster, we wouldn't want a lot of them on the es hosts [23:35:48] oh, db_cluster [23:35:49] sigh [23:36:14] so, would including these manually be not desired then ? [23:36:16] so i think it makes sense for the es monitoring to be defined separately from the core dbs [23:36:25] * LeslieCarr hates nagios so much [23:36:29] it's pretty different. [23:37:07] (the es monitoring is pretty different from the main cluster monitoring; I'd go along with separating it in puppet/nagios) [23:38:39] ok [23:38:53] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4495 [23:39:06] the es is a temporally sharded key value store that just happens to be stored in mysql.. things like long running transaction monitors aren't at all applicable [23:40:16] New review: Reedy; "Eh?" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/3815 [23:41:39] maybe i'll move the ES to mongodb.. that could be lulz [23:41:59] es should totally live in a non-mysql-based persistent key/value store. [23:42:29] yes [23:43:29] mongo might actually be a very good match [23:44:27] the actual migration would suck [23:45:12] but might not be as difficult as swift [23:47:47] New patchset: Lcarr; "moving es monitoring to nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4498 [23:48:02] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4498 [23:48:31] binasher: check it out this time ? (though not going to merge it until monday) [23:51:17] in mysql.pp, delete line 292 (heartbeat monitor would fail) and maybe lvs, since they don't run snapshots [23:52:03] and one of the hosts is always going to deliberately be very behind on replication, so we'll have to figure out monitoring for that [23:52:20] New patchset: Lcarr; "moving es monitoring to nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4498 [23:52:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4498 [23:52:37] how about that ? [23:55:35] the $crit = $master (line 285) isn't going to work in that context [23:55:58] it looks like the es master includes db::es::master [23:56:22] i wonder if you can test in puppet if a class has been included or not [23:56:58] well we can test the variable [23:57:05] so if mysql_role = master ? [23:57:23] maybe the db::es::master and db::es::slave classes can just be removed [23:57:50] yeah, set crit to true if mysql_role = master [23:58:09] actually yeah, it's a bit silly [23:58:16] we should get paged if that host blows up [23:58:40] but $mysql_role was mainly being used for the monitoring bits that you're removing [23:58:46] so i think it could be cleaned up some
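The "set crit to true if mysql_role = master" idea at the end, plus the deliberately-delayed slave that must not alert, could look roughly like the sketch below. Python is used purely for illustration here; the real logic would live in the puppet manifests and nagios/nrpe checks, and all names and thresholds are hypothetical:

```python
# Hypothetical replication-lag alerting rule reflecting the discussion above:
# only a master-role host should page (CRITICAL), an ordinary slave should
# just warn, and an intentionally delayed slave should not alert at all.

def lag_check(mysql_role, lag_seconds, warn=120, delayed_slave=False):
    """Return the nagios-style state for a replication-lag check."""
    if delayed_slave:
        return "OK"          # deliberately far behind; alerting is noise
    if lag_seconds < warn:
        return "OK"
    return "CRITICAL" if mysql_role == "master" else "WARNING"

print(lag_check("master", 300))                      # CRITICAL -> pages
print(lag_check("slave", 300))                       # WARNING
print(lag_check("slave", 300, delayed_slave=True))   # OK
```

This mirrors the point that the paging decision can key off a simple role variable rather than the db::es::master / db::es::slave classes, while the delayed host still needs its own separate monitoring story.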