[07:48:36] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [07:48:36] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [08:41:22] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [10:06:20] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [10:06:20] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [10:08:17] PROBLEM - Puppet freshness on ms-be4 is CRITICAL: Puppet has not run in the last 10 hours [10:13:56] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [10:51:53] PROBLEM - Puppet freshness on ms-be12 is CRITICAL: Puppet has not run in the last 10 hours [10:52:47] PROBLEM - Puppet freshness on ms-be11 is CRITICAL: Puppet has not run in the last 10 hours [11:34:26] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [11:46:18] labs-nfs1:/export/home/varnish/mark 18G 17G 256K 100% /home/mark [11:51:32] PROBLEM - Puppet freshness on ms-fe3 is CRITICAL: Puppet has not run in the last 10 hours [12:01:26] PROBLEM - Puppet freshness on ms-fe4 is CRITICAL: Puppet has not run in the last 10 hours [12:05:20] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [12:06:50] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [12:11:20] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 194 seconds [12:11:56] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 203 seconds [12:12:05] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 218 seconds [12:12:05] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 209 seconds [12:13:26] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 0 seconds [12:13:35] RECOVERY - MySQL Slave Delay on db33 is OK: OK 
replication delay 13 seconds [12:13:35] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 0 seconds [12:14:20] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [12:17:15] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [12:17:15] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [12:17:15] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [12:17:15] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [12:17:15] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [12:17:16] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: Puppet has not run in the last 10 hours [12:17:16] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [12:17:17] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [12:17:17] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [12:17:18] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [12:17:18] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [12:17:19] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [12:17:19] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [12:29:30] !log Built varnish 3.0.3plus~rc1 packages and deployed them on cp1041 (mobile) [12:29:40] Logged the message, Master [12:30:25] jetlagged? [12:30:31] :-) [12:30:41] ya ..woke up at 2.30am [12:30:44] aw [12:30:48] 14:29:30 <@mark> !log Built varnish 3.0.3plus~rc1 packages and deployed them on cp1041 (mobile) [12:30:55] woot!! [12:31:32] how is it performing? 
[12:31:44] same as before [12:31:48] we're not using the streaming code on mobile [12:31:57] but i'm just running it for a few mins now [12:32:05] too early to tell [12:33:29] going to try it on the cp1021 - cp1036 today? [12:33:34] no [12:33:53] end of this week or next week perhaps [12:33:55] wait & see? [12:33:57] ok [12:33:58] but i'm out on friday and also this evening [12:34:13] ah ok [12:34:20] btw, this monday is a holiday here [12:34:24] labor day weekend [12:34:25] oh ok [12:37:10] we need to upgrade cp1021-1036 to precise as well [12:38:07] wanna get PY to work on it? [12:38:12] or someone else [12:38:16] perhaps someone can do that on friday [12:38:22] but let me first see if anything else needs to be done [12:39:20] will ask py later this morning [12:39:27] if not it is daniel [12:39:31] ok [12:43:44] u think the image scalers will not be showing load problem with nfs gone? [12:43:51] so far, they look good [12:43:57] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [12:44:13] if everything is as it should be, they should have 0 additional load [12:44:22] that's why I had some concerns last week when that proved not to be the case [12:44:50] but I think there's some additional development work that needs to be done on the swift client in mediawiki [12:44:53] looking at the traffic, it is going up but the load is constant [12:45:13] ya, those head requests [12:45:19] also better timeout control [12:45:49] so i'll be bringing that up in SF if not before that [12:46:16] this transition surfaced some of those issues ;-P [12:50:23] yes [12:55:57] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [13:23:20] hmm I see varnishd ran out of memory on cp1041 earlier [13:23:22] the previous build [13:23:42] and cp1042 [13:25:34] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [13:29:17] I'm jetlagged too by all of these deploys :P [13:31:38] mark: 
still around? [13:32:32] yes [13:32:41] heya :) [13:33:30] hi [13:33:35] I haven't fixed varnish upload yet [13:34:41] not a problem, i'll do that before deploy next week [13:35:47] we're still haven't completely finished up with ms7 [13:36:19] unfortunately... [13:36:43] so, question time :-) [13:36:50] why do we even store thumbs in swift? [13:37:08] why not just put a squid/varnish with persistent storage in front of rendering [13:37:26] and let LRU expire unused thumbs [13:37:27] we've considered that several times [13:37:35] and some people were uneasy about it [13:37:37] oh have you? didn't know that [13:37:38] but sure, we can try that some day [13:37:52] with varnish not having reliable persistent storage and stuff [13:37:56] but now it's getting there again... [13:38:03] heh yeah [13:38:09] right now we're quite wasteful in terms of disk and memory [13:38:30] yes [13:38:39] we keep a thumb in three copies, plus pagecache, plus memory on the proxy servers, plus squids backend/frontend memory & disk [13:38:42] we can try that at some point [13:38:45] I know ;) [13:38:51] I know you do :) [13:38:55] so I think the general consensus was, let's first get us off the current arch [13:38:57] and then we can optimize [13:39:05] so later we can probably remove thumbs [13:39:08] can't disagree with that! [13:39:20] given that the current architecture was ready to kick the bucket at any time, yes [13:39:45] the other reason was [13:39:52] and I guess that's not solved ;) [13:39:57] I would love it if we could generate on the fly (with appropriate caching as opposed to "save on disk forever and guess at which things to throw away once every year or so"_ [13:40:02] mediawiki needs to be able to obtain a list of generated thumbs [13:40:08] in order to know which ones to purge from squid/varnish [13:40:41] arguably it's not entirely needed since varnish supports bans as well, based on regexps [13:40:44] hmm? [13:40:45] right! 
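(Editor's sketch of the point above about purging by list versus banning by regexp. This is a toy in-memory cache, not Varnish: real Varnish keeps a ban list and tests cached objects against it, while this version simply scans eagerly. The URL layout `/thumb/<filename>/<width>px-<filename>` and the names `purge_by_list`, `ban_by_regex` and `thumb_ban_pattern` are assumptions made for illustration only.)

```python
import re


def purge_by_list(cache, urls):
    """Remove exactly the listed URLs; return how many were actually cached."""
    removed = 0
    for url in urls:
        if cache.pop(url, None) is not None:
            removed += 1
    return removed


def ban_by_regex(cache, pattern):
    """Remove every cached URL matching the regex; return how many were removed."""
    regex = re.compile(pattern)
    doomed = [url for url in cache if regex.search(url)]
    for url in doomed:
        del cache[url]
    return len(doomed)


def thumb_ban_pattern(filename):
    """Build a regex for all thumbnail sizes of one file (hypothetical URL layout)."""
    return r"^/thumb/" + re.escape(filename) + r"/\d+px-"


if __name__ == "__main__":
    cache = {
        "/thumb/Foo.jpg/120px-Foo.jpg": b"small",
        "/thumb/Foo.jpg/800px-Foo.jpg": b"large",
        "/thumb/Bar.png/120px-Bar.png": b"other",
    }
    # No list of generated thumbs is needed: one pattern covers every size.
    print(ban_by_regex(cache, thumb_ban_pattern("Foo.jpg")), "objects banned")
    print(sorted(cache))
```

(The list-based purge needs someone to know every generated thumbnail URL, which is the requirement mentioned above; the regex version trades that bookkeeping for a scan over cached objects.)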
[13:40:48] but with the rate of purging we do, that would be rather inefficient [13:40:50] I was about to say exactly that [13:40:58] so we'd need to find a solution for that too [13:41:42] if we had to keep a list of thumbs someplace that would still be a lot cheaper than the thumbs themselves [13:43:23] okay [13:43:34] we can play with thumbs in esams [13:43:58] since we're still looking for a solution there [13:44:05] and a swift cluster over there would be an overkill imho [13:44:28] or over here. dammit, I'm assimilated already [13:44:42] first goes the time, now I say "there" [13:45:43] apergos: thanks for the move to wikitech [13:45:47] sure [13:45:53] apergos: I think we should mail those pages to ops@ too [13:46:02] no reason to keep this between ourselves, let's engage the rest of the team [13:46:12] sure, it can go with the next update [13:46:47] resistance is futile [13:46:50] you will be assimilated [13:47:22] I guess it's only fair, we assimilated you :P [13:47:36] I was assimilated before showing up :-P (much more efficient that way) [13:47:58] paravoid: swift cluster hardware has already been racked in esams [13:48:03] different servers [13:48:10] dell R720XDs [13:48:14] oh [13:48:20] but that's ok [13:48:25] if we ever decide to no longer need it [13:48:28] we can use it for other storage [13:48:31] I hope these are less of a PITA than the c2100s [13:48:31] backups or so :P [13:48:40] i think they will be [13:48:47] we have 4 of them [13:48:59] that wil be a cute little cluster [13:49:13] how much storage do they have? 
[13:49:20] don't remember [13:49:25] anyway, esams doesn't need storage [13:49:30] esams never had storage until about a year ago [13:49:32] when I set up ms6 there [13:49:34] just for fun [13:49:38] before it was purely squid [13:49:55] i figured ms6 could help reduce latency a bit, and it was a nice test case for btrfs [13:50:01] we had the hardware doing nothing [13:51:15] yep [13:51:29] hehehe [13:52:04] the C2100s are really crappy [13:52:06] * apergos goes to look at btrfs status again (it's been too soon since the las time but wth) [13:52:15] I was playing with ms-be6 a bit yesterday night [13:52:30] besides not being able to see half of the disks [13:52:44] even the ones that it does see, sda/b, part of md0, i.e. / [13:52:46] oh yeah, see my updates on the ms be ticket for that [13:52:56] New patchset: Demon; "Adding link to new gerrit nightlies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21725 [13:53:03] (= we should bug the vendor) [13:53:13] and it couldn't unpack a .deb [13:53:20] it hanged [13:53:23] nice [13:53:40] yeah, i thought of installing swift 1.5, since it's the last one missing it [13:53:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21725 [13:53:50] (it was down when I upgraded the rest) [13:54:00] and it couldn't even install packages [13:54:23] it will be a waste of time [13:54:45] sadly [13:56:13] hi friends, I'm looking for documentation on how to change a DNS A record [13:56:17] particularly stats.wikimedia.org [13:56:25] I'm moving it off of spence to stat1001 [13:56:42] its ready to go, I just need to change DNS [13:57:04] i've looked at wikitech DNS and greped sockpuppet:/root/pdns-templates [13:57:09] ottomata: this is what we have http://wikitech.wikimedia.org/view/DNS [13:57:34] and it's all you need [13:57:39] ok reading harder then [13:57:54] thanks! 
[13:57:56] see "Changing records in a zonefile" [13:58:01] but let us review that [13:58:07] one small typo and everything can go down [13:58:26] certainly [13:58:36] i'm not changing anything til I know how it works and you guys say OK [13:58:44] note--there should be a giant blinking red warning that pdns faceplants occasionally on update, so you need to keep an eye on all three nameservers after the authdns-update [13:58:50] just wanted to learn as much as I can before bothering, will read this then ask more qs [13:58:56] grumble grumble, we need to get hardy off that box [13:58:58] ok [14:00:14] ahhh, i see, currently stats is a CNAME to spence of couuuurse [14:00:17] k need to CNAME it to [14:00:19] to stat1001 [14:00:53] hm, this is curious: [14:00:54] t@sockpuppet:~/pdns-templates# grep stats wikimedia.org [14:00:54] stats 1H IN CNAME spence [14:00:55] stats 1H IN CNAME zwinger [14:02:29] should stats have to CNAMEs? i could be reading this file wrong... [14:02:39] i don't see any sections though, it seems like just a list of records [14:03:01] New patchset: Demon; "Setup new Gerrit nightly builds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21725 [14:03:10] what do you mean by sections? [14:03:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21725 [14:03:49] i dunno what I mean, trying to understand why there would be two CNAMES for stats [14:04:14] doesn't seem like there should be [14:04:17] right? [14:04:43] ok, do you have experience with DNS and bind style zonefiles [14:04:44] ? 
[14:04:50] nope [14:05:06] not more than adding new entries for the analytics servers in the last couple of weeks [14:05:09] ok then i'm really not comfortable with you doing DNS changes [14:05:14] i'm fine with that [14:05:16] i'm not changing anything [14:05:21] so there are sections [14:05:26] they are introduced with $ORIGIN [14:05:41] but you should probably read a good introduction about DNS at some point [14:05:52] I recommend O'Reilly's DNS & Bind, even though we don't sue bind [14:05:53] use [14:06:30] ok, i mean, i understand DNS from a high level, but not nitty gritty details [14:06:34] this is a powerdns file? [14:06:51] ah! that is the 'p' in pdns :) [14:07:01] it's a powerdns bind-style zone file [14:07:16] unfortunately there are very many nitty gritty details with DNS [14:07:28] aye [14:07:42] yeah i'm totally fine with not messing with this [14:08:01] but i'd like to know what you guys do so I can start to understand [14:08:02] i can change it for you now, or you can change it and ask for a review before deploying [14:08:05] and how the systems work [14:08:09] (so, no svn commit until someone has reviewed) [14:08:13] certainly not [14:08:18] the how to is in the documentation [14:08:22] but it assumes dns knowledge as well [14:08:34] aye [14:08:40] i understand the instructions [14:08:54] my change is simple enough, what I didn't get was why there were multiple CNAMEs, but you showed me the sections [14:09:13] without looking, I assume they are under different $ORIGINs [14:09:21] yeah you're right [14:09:34] you can also have the same cname multiple times though [14:09:47] then it would simply point at both, however that's not supported for CNAMEs [14:09:49] so it would be broken [14:09:56] aye [14:10:11] with A records that would be fine, but you would expect them to be adjacent anyway, for clarity [14:10:23] aye, yeah [14:10:31] so i would change this [14:10:31] stats 1H IN CNAME spence [14:10:32] to [14:10:35] stats 1H IN CNAME stat1001 
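(Editor's sketch of the check discussed above: the same name may appear under different $ORIGIN sections, but two CNAMEs for one name under the same origin would be broken. It assumes a simplified bind-style zone file where each record is written as `name TTL IN TYPE target` on one line; the function name `find_duplicate_cnames` and the sample origins are hypothetical and are not taken from the real pdns-templates.)

```python
from collections import defaultdict


def find_duplicate_cnames(zone_text):
    """Return {(origin, name): [targets]} for names with more than one CNAME."""
    origin = "."
    targets = defaultdict(list)
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if fields[0].upper() == "$ORIGIN" and len(fields) >= 2:
            origin = fields[1]  # start of a new section
            continue
        upper = [f.upper() for f in fields]
        # The type must sit between the owner name and the target.
        if "CNAME" in upper[1:-1]:
            idx = upper.index("CNAME", 1)
            targets[(origin, fields[0])].append(fields[idx + 1])
    return {key: value for key, value in targets.items() if len(value) > 1}


if __name__ == "__main__":
    # Hypothetical origins; the record lines mirror the grep output above.
    sample = (
        "$ORIGIN wikimedia.org.\n"
        "stats 1H IN CNAME spence\n"
        "$ORIGIN other.example.\n"
        "stats 1H IN CNAME zwinger\n"
    )
    print(find_duplicate_cnames(sample))  # different origins, so nothing is flagged
```

(A check like this could be run on the edited zone file before asking for review; it does not replace a proper zone syntax check.)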
[14:10:37] yes [14:10:57] should I do that and let you double check, or do you want to do it? [14:11:02] you can do that [14:11:40] cat /tmp/dns.diff [14:11:42] if you wanna review [14:11:54] it looks fine [14:11:59] i'll commit and deploy it now [14:12:12] ok, deploying is just what is in the instructions, right? [14:12:17] yes [14:12:22] coo [14:12:23] l [14:12:26] ok! [14:12:36] it's going live now [14:13:09] great, thanks mark! [14:13:20] it will take up to one hour for the change to take full effect [14:14:19] aye [14:23:05] mark, while you are here :) [14:23:11] i'd love to clean up some of the rsyncd stuff in puppet [14:23:25] particularly, it is difficult to puppetize multiple rsyncd modules [14:23:32] right now we do it per server/funciton [14:23:37] so we have tons of rsyncd .conf files [14:23:40] each with a 'name' [14:23:49] it would be much cleaner if I could do [14:24:13] rsync_module { "/a" … } [14:24:13] rsync_module { "/var/log/udp2log" … } [14:24:14] or whatever [14:24:23] i just found this: [14:24:23] https://github.com/puppetlabs/puppetlabs-rsync [14:24:28] which looks great [14:24:38] its all nice and modularlized [14:25:30] we haven't really discussed about how to import third-party modules [14:25:33] but I'd say go ahead [14:25:34] aye [14:25:41] and let's review this like we review anything else [14:25:54] ok cool, as long as I have at least one "hmm, ok!", then ok! [14:25:55] (modulo indentantion/style :) [14:32:40] good morning maplebed [14:32:50] hi! [14:33:00] last day heh? :) [14:33:26] startin bright and early! [14:33:27] :D [14:33:31] gotta make it count. [14:43:25] hehe [14:43:54] now, though, off to the office. cya in an hour. [14:44:02] see ya [14:51:16] New review: Hashar; "The HEADER.html / README.html need to be defined as HeaderName / ReaderName directives in Apache con..." 
[operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/21725 [14:54:10] http://docs.openstack.org/trunk/openstack-object-storage/admin/content/analyzing-log-files-with-swift-cli.html [14:54:14] seriously, wtf [14:54:16] wtf [14:54:53] this is such a PR bullshit [14:55:03] and/or completely stupid [14:56:02] "we have 4 log files in a directory that we want to run awk on. only way to do it: upload them to swift, then download them and pipe them to awk." [14:56:13] along with a statement that says: "The swift utility is simple, scalable, flexible and provides useful solutions all of which are core principles of cloud computing; with the -o output option being just one of its many features." [14:56:23] what do you know, unix pipelines are cloud! [15:03:02] PROBLEM - Puppet freshness on ms-be8 is CRITICAL: Puppet has not run in the last 10 hours [15:07:30] wow, that's pretty priceless (I wasn't going to click through but curiosity got the best of me) [15:13:13] New patchset: Ottomata; "Adding rsync and xinetd submodules from https://github.com/puppetlabs/puppetlabs-rsync and https://github.com/puppetlabs/puppetlabs-xinetd." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21744 [15:13:59] New patchset: Ottomata; "Setting up rsync module on statistics servers." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21745 [15:14:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21744 [15:14:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21745 [15:14:47] mark, paravoid, would love some serious comments on those [15:15:11] I added those modules via git submodules, as described on http://sysadminsjourney.com/content/using-git-submodules-dynamic-puppet-environments/ [15:15:23] not sure if this is how you'd like, its def up for discussion [15:15:39] erm? 
and where's the repository going to be hosted? [15:15:50] did you make a gerrit mirror? [15:16:06] nope, could do that if you like, i just added them as submodules from github [15:16:12] I don't think we should use submodules for this [15:16:14] but even if we did [15:16:31] we *definitely* shouldn't use submodules to random repositories out there [15:16:33] hm, ok, would you rather I just clone/export and add to our own? [15:16:42] can you imagine the security implications? [15:16:52] puppet runs on our whole fleet and has root [15:17:13] that's cool, we can change [15:17:28] which way would you rather have it? [15:17:43] gerrit mirror or just a commit code to operations/puppet? [15:20:12] paravoid, I gotta run, battery low and no outlets at this cafe, if you got a sec, would you leave me a scathing review on that commit? :) and add your suggestions? [15:20:25] will do [15:20:30] danke! [15:22:39] New review: Faidon; "We shouldn't use submodules to third-parties as we are effectively giving them root to our infrastru..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/21744 [15:26:08] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: Puppet has not run in the last 10 hours [15:37:30] PROBLEM - Puppet freshness on ms-be2 is CRITICAL: Puppet has not run in the last 10 hours [15:44:36] hmm I think I'm gone for the day [15:44:53] <^demon> Have a good evening apergos [15:45:19] thanks. I'll check in before I go to sleep of course [15:49:30] PROBLEM - Puppet freshness on srv193 is CRITICAL: Puppet has not run in the last 10 hours [15:53:33] PROBLEM - Puppet freshness on srv194 is CRITICAL: Puppet has not run in the last 10 hours [15:55:49] maplebed: quick q: where are the package sources for the swift debs? [15:56:34] ppa:swift-core/release? [15:57:04] paravoid: I'd go with github/openstack/swift [15:57:13] package sources, not sources [15:57:22] who made the .debs? [15:57:25] oh... I'm not sure. 
[15:57:44] swiftstack made them for us, since they included the statsd stuff before it got mainlined. [16:00:08] ugh [16:00:52] it is now integrated though (in the official 1.5 release) so we can probably use the public packages [16:01:17] there are none for 1.5 [16:01:18] just 1.6 [16:01:32] huh. ok. [16:02:04] I wonder if I should upgrade to 1.6 [16:02:42] *cough* how hard can it be? *cough* :-D [16:02:49] I wouldn't bother until there's something in it we want. [16:03:15] well if it means we can use standard packages, might be worth it [16:03:20] the upgrade was easily the simplest thing we've done these past three weeks. [16:05:12] ok, I really got to get going. back in a few hours. [16:06:49] New patchset: Ottomata; "Manually adding modules/{xinetd,rsync} for managing rsyncd modules." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21749 [16:07:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21749 [16:08:09] paravoid ^ :) better? I will abandon the other changes (since I had committed 2 and one depended on the other) [16:15:16] !log added payments100[1-4] & pay-lvs100[12] to nagios nsca config, restarted nagios on spence [16:15:27] Logged the message, Master [16:16:36] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours [16:16:54] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:17:36] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [16:17:48] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [16:17:48] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [16:17:48] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [16:17:48] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [16:20:12] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:20:12] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:20:12] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:20:21] PROBLEM - Host pay-lvs1002 is DOWN: PING CRITICAL - Packet loss = 100% [16:20:21] PROBLEM - Host pay-lvs1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:20:40] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:40] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:40] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:23:48] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [16:24:31] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:25:09] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:25:09] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:25:09] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using 
password: YES) [16:25:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [16:25:36] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:36] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:36] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:25:44] garghabargha . . . pay* nagios alerts are false positive. fixing . . . [16:30:06] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:30:06] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:30:06] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:30:42] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:42] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:42] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:35:12] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:35:12] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:35:12] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:35:39] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:39] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:39] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied 
for user nagios@localhost (using password: YES) [16:40:22] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:40:22] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:40:22] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:40:58] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:58] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:58] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:45:28] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:45:28] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:45:28] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:45:51] heya mark, or paravoid, would love some help reviewing that bit sooner rather than later if either of you are still around and working. 
I'm doing this so I can set up rsync deployment instead of NFS for erik zachte and stats.wikimedia.org [16:45:59] https://gerrit.wikimedia.org/r/#/c/21749/ [16:46:31] RECOVERY - check_minfraud_primary on payments1001 is OK: OK [16:46:31] RECOVERY - check_minfraud_secondary on payments1001 is OK: OK [16:46:31] RECOVERY - check_mysql on payments1001 is OK: OK [16:50:25] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:50:25] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:50:25] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:52:31] PROBLEM - Puppet freshness on ms-fe2 is CRITICAL: Puppet has not run in the last 10 hours [16:53:47] Change abandoned: Ottomata; "Going to do as Faidon suggests and manually add third party modules in another commit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21744 [16:55:12] Change abandoned: Ottomata; "I will recommit this change once https://gerrit.wikimedia.org/r/#/c/21749/ is approved." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/21745 [16:55:22] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:55:22] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [16:55:22] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:00:28] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:00:28] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:00:28] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:00:31] New patchset: MaxSem; "Log API errors caused by the WLM app in a separate file" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21753 [17:00:37] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:00:37] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:00:37] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:02:34] PROBLEM - Puppet freshness on ms-be9 is CRITICAL: Puppet has not run in the last 10 hours [17:02:45] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:03:28] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [17:04:59] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:05:25] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:05:25] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:05:25] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:05:34] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:34] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:34] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:05:40] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [17:06:12] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:06:55] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [17:10:22] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:10:22] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:10:22] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:10:58] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:58] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:58] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:15:28] PROBLEM - check_mysql on payments1003 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:15:28] PROBLEM - check_mysql on payments1004 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:15:28] PROBLEM - check_mysql on payments1002 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:15:37] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:37] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:37] PROBLEM - check_mysql on payments1001 is CRITICAL: Access denied for user nagios@localhost (using password: YES) [17:18:55] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:18:55] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:18:55] RECOVERY - check_mysql on payments1001 is OK: Uptime: 71955 Threads: 4 Questions: 2085 Slow queries: 11 Opens: 220 Flush tables: 4 Open tables: 24 Queries per second avg: 0.028 Slave IO: Yes Slave SQL: Yes Seconds 
Behind Master: 0 [17:20:25] RECOVERY - check_mysql on payments1003 is OK: Uptime: 48485 Threads: 1 Questions: 111 Slow queries: 0 Opens: 43 Flush tables: 2 Open tables: 8 Queries per second avg: 0.002 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:20:25] RECOVERY - check_mysql on payments1004 is OK: Uptime: 48372 Threads: 1 Questions: 111 Slow queries: 0 Opens: 43 Flush tables: 2 Open tables: 8 Queries per second avg: 0.002 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:20:25] RECOVERY - check_mysql on payments1002 is OK: Uptime: 48634 Threads: 1 Questions: 108 Slow queries: 0 Opens: 43 Flush tables: 2 Open tables: 8 Queries per second avg: 0.002 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [17:20:34] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:34] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:58] PROBLEM - check_minfraud_primary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:58] PROBLEM - check_minfraud_secondary on payments1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:22] RECOVERY - check_minfraud_primary on payments1001 is OK: HTTP OK: HTTP/1.1 302 Found - 128 bytes in 0.019 second response time [17:28:22] RECOVERY - check_minfraud_secondary on payments1001 is OK: HTTP OK: HTTP/1.1 302 Found - 128 bytes in 0.412 second response time [17:36:00] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:36:42] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [17:37:47] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:38:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [17:41:55] !log include debs for swift 1.5 and friends to apt (lucid/precise) [17:42:06] Logged the message, Master [17:44:38] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [17:45:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [17:50:13] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [17:50:13] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [17:50:47] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17503 [18:03:58] maplebed: how long does it take to add a second swift user? One that is better named. [18:05:03] the act of adding the user is very simple. [18:05:22] making sure it has appropriate access and testing everything, a bit longer. [18:05:42] http://wikitech.wikimedia.org/view/Swift/How_To#Create_a_user_.2F_account [18:08:09] maplebed: I guess the name could be decided at https://bugzilla.wikimedia.org/show_bug.cgi?id=34814 [18:08:35] also... [18:08:38] <-- last day. [18:08:49] probably better to ping paravoid. [18:08:51] :D [18:09:00] New patchset: Dzahn; "do not quote booleans or they become strings, the string "false" can actually be true." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21759 [18:09:46] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21759 [18:13:57] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21570 [18:36:26] New patchset: Dzahn; "do not quote booleans or they become strings, the string "false" can actually be true." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21759 [18:37:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21759 [18:42:25] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [18:52:28] !log rebooting cp1021 into PXE [18:52:38] Logged the message, Master [18:55:10] PROBLEM - Host cp1021 is DOWN: PING CRITICAL - Packet loss = 100% [18:56:13] ACKNOWLEDGEMENT - Host cp1021 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn reinstall [19:00:19] RECOVERY - Host cp1021 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms [19:00:37] PROBLEM - NTP on cp1021 is CRITICAL: NTP CRITICAL: No response from NTP server [19:00:55] PROBLEM - SSH on cp1021 is CRITICAL: Connection refused [19:00:55] PROBLEM - Varnish traffic logger on cp1021 is CRITICAL: Connection refused by host [19:04:49] PROBLEM - Varnish HTCP daemon on cp1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:05:07] PROBLEM - Varnish HTTP upload-frontend on cp1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:07:13] RECOVERY - SSH on cp1021 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:25:36] is it in fact impossible to do a https gerrit checkout? [19:25:56] i now get "error: RPC failed; result=22, HTTP code = 502" every time now [19:52:19] Jeff_Green: hahhh [19:52:24] Jeff_Green: I got that error earlier [19:52:39] i finally got a checkout after about 5 tries [19:52:48] mentioned it to ^demon|away and opened a bug for it https://bugzilla.wikimedia.org/show_bug.cgi?id=39737 [19:53:18] ah, good! 
[19:53:41] feel free to add your information to that bug report (aka which repo you tried to clone, the URL and the date/time of your attempts) [19:53:54] that would give some data for upstream to investigate [19:54:05] k [19:54:06] chad supposes it is a bug in the git implementation [19:56:56] added [20:04:36] New patchset: DamianZaremba; "Adding check_ram into nrpe::packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21822 [20:05:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21822 [20:07:04] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [20:07:04] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [20:07:15] New patchset: DamianZaremba; "Adding check_ram into nrpe::packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21822 [20:07:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21822 [20:09:10] PROBLEM - Puppet freshness on ms-be4 is CRITICAL: Puppet has not run in the last 10 hours [20:13:03] New patchset: Catrope; "Set the PCRE backtrack limit to 1e6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21824 [20:13:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21824 [20:15:01] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [20:31:16] New review: awjrichards; "Can you limit this just to the affected project(s)? I think it's just commons, right?" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/21753 [20:35:32] New review: Dzahn; "what about the testswarm user then?
require misc::contint::test::testswarm::systemuser" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/21675 [20:36:32] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21541 [20:41:36] eh, so ar.wp is Arabic, but ar.wikimedia is Argentina .. confusing? [20:42:19] Yeah [20:42:33] Also, uk.wikipedia.org is Ukrainian, but uk.wikimedia.org is the United Kingdom chapter [20:42:50] heh [20:42:57] merging namespace change for Argentina chapter [20:43:03] I was confused by that as well yesterday :/ [20:43:18] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21586 [20:43:29] the way to remember it is that projects are in a given language [20:43:35] and a chapter covers a country [20:43:59] ..or a city in the US [20:44:02] but yeah, we should have used something else for wm.org like united-kingdom.wikimedia.org / argentina.wikimedia.org or something [20:44:06] Yeah but then uk.wikimedia.org could be uk for Ukraine [20:44:06] ohh [20:44:23] sanfrancisco.california.united-states-of-america.wikimedia.org ;-D [20:44:35] Yeah there are also a few that aren't even language codes [20:44:41] anyway, I am out for real now [20:44:44] Like br.wikimedia.org (Brazil) or nyc.wikimedia.org (New York City) [20:44:51] sf.ca.us.wiki !:) [20:44:56] get that TLD [20:45:12] hashar: cu [20:53:08] PROBLEM - Puppet freshness on ms-be12 is CRITICAL: Puppet has not run in the last 10 hours [20:54:20] PROBLEM - Puppet freshness on ms-be11 is CRITICAL: Puppet has not run in the last 10 hours [20:57:12] AaronSchulz: I wanted to mention that I see requests on ms7 coming in for stuff like /wikipedia/(langcode)/math/... so it's not just the top level math directory that gets referenced.
[20:57:37] See http://wikitech.wikimedia.org/view/Swift/Open_Issues_Aug_-_Sept_2012/Cruft_on_ms7 the section on math, if you want specific examples I can put a few up tomorrow [20:57:44] (tonight I'm pretty done) [20:59:11] !log git pull and sync-file InitialiseSettings.php to cluster to push namespace fix on ar.wikimedia [20:59:22] Logged the message, Master [21:00:01] New review: Dzahn; "synced to cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21586 [21:05:59] New patchset: DamianZaremba; "Correcting nagios hostname for labs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21831 [21:06:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21831 [21:19:09] whee commits on my last day! [21:19:14] New patchset: Bhartshorne; "adding 50th percentile to average in gathered swift statistics." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21835 [21:20:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21835 [21:21:35] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21835 [21:22:42] RECOVERY - Puppet freshness on ms-fe1 is OK: puppet ran at Tue Aug 28 21:22:29 UTC 2012 [21:27:38] RECOVERY - Puppet freshness on ms-fe3 is OK: puppet ran at Tue Aug 28 21:27:13 UTC 2012 [21:35:17] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [21:42:58] New patchset: Hashar; "realname for hashar unix account" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21839 [21:43:44] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21839 [21:47:31] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21839 [21:51:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21831 [21:58:11] AaronSchulz: howdy [22:00:48] AaronSchulz: how are we going to proceed with https://wikitech.wikimedia.org/view/Swift/Open_Issues_Aug_-_Sept_2012/Cruft_on_ms7 ? [22:01:31] do you want to track them outside of this page (e.g. bz?) [22:02:17] PROBLEM - Puppet freshness on ms-fe4 is CRITICAL: Puppet has not run in the last 10 hours [22:13:53] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Tue Aug 28 22:13:34 UTC 2012 [22:14:43] Anyone around that's familiar with production nagios puppet stuff? [22:18:32] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [22:18:32] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [22:18:32] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [22:18:32] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [22:18:32] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: Puppet has not run in the last 10 hours [22:18:33] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [22:18:33] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [22:18:34] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [22:18:34] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [22:18:35] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [22:18:35] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [22:18:36] PROBLEM - Puppet freshness on virt1004 is 
CRITICAL: Puppet has not run in the last 10 hours [22:18:36] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [22:52:06] New patchset: DamianZaremba; "Adding check_ram into nrpe::packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21822 [22:52:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21822 [22:58:24] New patchset: DamianZaremba; "Adding check_ram into nrpe::packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21822 [22:59:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21822 [23:03:39] New patchset: Bhartshorne; "closing my account" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21843 [23:04:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21843 [23:05:16] New patchset: Bhartshorne; "closing my account" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21843 [23:06:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21843 [23:06:39] Damianz: yeah, what's up [23:07:52] I was wondering if the free_ram check was used in prod - I've since found out no and my change shouldn't break anything. If you want to review https://gerrit.wikimedia.org/r/#/c/21822/ it would be awesome though. 
[23:08:47] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21843 [23:09:23] New patchset: preilly; "Revert "closing my account"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21844 [23:10:12] Change abandoned: preilly; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21844 [23:11:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21822 [23:11:43] Awesome, thanks LeslieCarr [23:12:16] and merged on sockpuppet [23:12:27] including maplebed's no more accounts :( [23:12:32] * LeslieCarr quietly sobs  [23:13:10] * Damianz hands maplebed an ice lolly [23:17:01] !log fixed scripts for management home directories and gluster shares for labs by changing the project search filter to match keystone's LDAP DIT [23:17:12] Logged the message, Master [23:17:12] where's that bot? [23:17:15] morebots: -_- [23:19:45] New patchset: Ryan Lane; "Fix scripts for management of gluster and home directories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21846 [23:20:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21846 [23:21:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21846 [23:26:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [23:27:25] Not important but http://noc.wikimedia.org/cgi-bin/ng/report.py seems broken with a db error. [23:31:24] New patchset: Demon; "Don't explicitly set jvm path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21848 [23:32:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/21848 [23:32:51] New review: Demon; "I've tested this in labs, and can't see any reason it won't work in production too." 
[operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/21848 [23:33:41] Damianz: hmm, is it this ? http://noc.wikimedia.org/cgi-bin/report.py [23:33:48] or another report.py [23:34:53] Probably that, link on wikitech homepage to "Profilling web interface" links to /ng/ though [23:45:24] !log there are some network issues with one of our ipv6 carriers, causing some european ipv6 users to have trouble reaching our US sites. [23:45:34] Logged the message, Mistress of the network gear.