[00:06:00] is Adam Miller still employed by the foundation? [00:07:47] i don't know who that is (to be fair, i don't know a lot of employees) [00:08:42] Usability Initiative developer [00:08:48] http://www.mediawiki.org/wiki/User:Adammiller [00:08:55] doesn't look like he's ever edited [00:09:52] I don't see him at http://wikimediafoundation.org/wiki/Staff_and_contractors [00:10:01] hehe that's wher ei was checking [00:10:42] some staff aren't on there, but I don't think he's on any staff lists elsewhere [00:10:58] I think they _should_ appear there [00:11:16] there's also https://meta.wikimedia.org/wiki/Wikimedia_Foundation_contractors [00:12:11] maybe you should just send an email "Are you still employed by WMF?" to the wikimedia.org address and see if it bounces :P [00:12:19] oh i didn't know that existed [00:12:49] that page in meta was started because http://wikimediafoundation.org/wiki/Staff didn't list contractors [00:12:57] which are an important part of wmf employees [00:13:29] http://wikimediafoundation.org/wiki/Staff was then (years later) changed to http://wikimediafoundation.org/wiki/Staff_and_contractors [00:13:39] so I'm not sure there's a reason to keep it, though [00:21:45] No he isn't [00:23:14] No, Adam isn't with WMF any more [00:23:17] Hasn't been for a while [00:30:09] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [00:35:42] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms [00:37:27] !log restarting exim4 on mchenry with split_spool_directory = true [00:37:32] Logged the message, Mistress of the network gear. [00:48:37] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [00:54:09] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [00:54:58] damn, i know why that's not reducing the backlog [00:55:01] it only matters for new messages [01:03:59] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 18.6763888889 (gt 8.0) [01:13:35] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.2422868 [01:41:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 280 seconds [01:42:23] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 244 seconds [01:48:59] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 639s [01:49:17] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [01:51:50] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 4s [01:52:35] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 18 seconds [02:12:59] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [04:36:05] New patchset: Faidon; "partman: fix boot for Ciscos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12565 [04:36:38] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12565 [04:36:38] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12565 [04:37:41] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12565 [04:45:06] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [04:46:09] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [04:49:09] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [05:57:32] up at 7:30 am? I mean, so was I... but that was after sleeping all night [06:02:40] aw [06:03:02] sleep is good. [06:22:04] New review: Dereckson; "I still need to add aliases." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/12556 [06:24:18] PROBLEM - Host lvs1001 is DOWN: PING CRITICAL - Packet loss = 100% [06:24:27] PROBLEM - BGP status on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, sessions up: 9, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [07:35:56] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [07:44:02] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [07:44:13] $#$%$@!%$!$%@!$% PARTMAN [07:44:23] what happened with lvs1001? [07:44:28] anyone investigating? [07:45:17] eh? [07:45:31] crap it did not page me and I was busy typing (code) [07:45:53] that really can'tbe true or we would have a huuuuge numebr of complaints [07:47:09] we assume that the cr1 whine is the real issue I guess? [07:56:40] going to powercycle it I guess [07:57:16] seems like the cr1 whine is a symptom not a cause [07:59:01] !log powercycled lvs1001, not pingable, nothing good from mgmt console, etc. [07:59:08] Logged the message, Master [08:01:50] yes, cr1 is complaining that it lost a bgp session [08:01:52] with lvs1001 [08:02:16] host is up again [08:02:21] also, we didn't get any other alerts, so either 1001 was already secondary or it wa primary and cr1 fell back to its backup [08:02:28] wasn't worried much :) [08:02:47] RECOVERY - Host lvs1001 is UP: PING OK - Packet loss = 0%, RTA = 26.36 ms [08:03:09] yeah, I guess that there was a failover (after looking at the config) [08:03:23] RECOVERY - BGP status on cr1-eqiad is OK: OK: host 208.80.154.196, sessions up: 10, down: 0, shutdown: 0 [08:03:47] and there we go [08:10:29] :) [08:10:30] thanks [08:10:34] I'm so deep in partman shit right now [08:10:44] going through logs & source [08:12:14] source? 
sounds bad already [08:15:59] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [08:25:42] New patchset: Hashar; "labs use the same wgCentralDBname on all wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12566 [08:25:49] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/12566 [08:26:31] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12566 [08:26:34] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12566 [08:30:51] New review: Hashar; "Deployed on live site" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12566 [08:39:14] New patchset: Hashar; "Disable wgNoticeInfrastructure on 'beta' cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12568 [08:39:20] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/12568 [08:39:33] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12568 [08:39:35] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12568 [08:40:34] New review: Hashar; "Deployed on live site." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12568 [08:48:06] New patchset: Hashar; "Load transcode conf on -e /etc/wikimedia-transcoding" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12569 [08:48:12] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/12569 [08:48:25] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12569 [08:48:31] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12569 [08:49:20] New review: Hashar; "deployed on live site." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12569 [08:53:03] New patchset: Faidon; "partman: more fixes for Cisco" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12570 [08:53:36] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12570 [08:53:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12570 [08:53:45] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12570 [09:15:16] mark: ping? [09:15:54] pong? [09:15:56] hi [09:15:58] i'm about to leave to the datacenter [09:16:27] so, I've spent like an hour or two thinking that I had a third bug with the d-i for virt[678] [09:16:35] because it didn't ping after installing [09:16:50] I just learned that we're dropping that traffic from prod [09:17:05] I'm fine with that, but could we make it to reply icmp prohibited instead? [09:17:21] silently dropping traffic is unintuitive [09:17:49] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [09:18:57] we don't drop silently [09:19:07] unless someone changed it [09:19:12] I set it up as admin prohibited [09:19:55] i'll look at it later [09:19:58] and also arrange your access ;) [09:20:06] :-) [09:20:08] leaving now or it'll get very late [09:20:09] bye [09:20:10] ! 
[09:20:18] bye & thanks [09:30:43] New patchset: Faidon; "partman: do not warn for not having a swap" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12572 [09:31:05] Ryan_Lane: okay, gerrit question for you [09:31:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12572 [09:31:56] ? [09:32:14] commit 9e18e152096641ba79c5fba26d587a74941f1e0d ebcc2ddc112be4e83b1e81f341e745cc45d0a175 728f1f1dc47d1524dbb933a07b08e76f2148a5b8 [09:32:17] commit 728f1f1dc47d1524dbb933a07b08e76f2148a5b8 ebd1c8e20750f455498ef0c179d08a6596b8261b [09:32:21] commit ebcc2ddc112be4e83b1e81f341e745cc45d0a175 2e3d4d2e0ce79e0a185aebc7e30d986434dfb1c9 ebd1c8e20750f455498ef0c179d08a6596b8261b [09:32:24] why the hell did it do a merge? [09:32:30] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12572 [09:32:33] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12572 [09:32:48] I commited 728f1f1d right on top ebd1c8e [09:32:58] submitted to gerrit which did a merge of those two [09:33:03] that's just crazy. [09:34:17] hmmmm [09:34:25] it didn't do it with the latest commit [09:34:45] the only difference was that last time I did a +2, waited for gerrit2 to v: +1 and hit "submit" [09:34:53] so review & submit is different from review + submit? [09:36:49] so you had the old item as +2/+1 when you review/submitted the new commit? [09:38:14] New review: Jens Ohlig; "I find the functional style readable and elegant, but it really makes things a lot harder to debug f..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/8344 [09:40:21] I don't understaand the question [09:41:26] that means I don't understand your question :-D [09:42:48] you had two commits that you pushed for review, and then what exactly? [09:49:00] they were separate commits that were merged independently [09:49:12] the git log on the first is a mess [09:49:20] anyway, got to run to catch the bank open [09:49:23] ok [09:49:26] see ya later [10:31:56] !log installing security upgrades on formey (gerrit) [10:32:01] Logged the message, Master [10:41:48] !log installing security upgrades on fenari [10:41:53] Logged the message, Master [10:42:48] !log fenari upgrade - this included replace wikimedia-lvs-realserver 0.04 (using .../wikimedia-lvs-realserver_0.08 [10:42:53] Logged the message, Master [10:46:34] !log installing security upgrades and kernel on bast1001 (still needs reboot, but dont break user sessions) [10:46:39] Logged the message, Master [11:33:30] New patchset: Hashar; "detect cluster with /etc/wikimedia-realm" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12583 [11:33:36] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/12583 [11:44:33] !log installing upgrades and kernel on pdf1, can reboot? (also needs puppetizing and precise reinstall) [11:44:38] Logged the message, Master [12:04:57] New patchset: Hashar; "(bug 37700) update stewardwiki logo & favicon" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11943 [12:05:06] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11943 [12:05:41] New review: Hashar; "Updated commit message and rebased." 
[operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11943 [12:06:02] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11943 [12:06:05] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11943 [12:07:02] New review: Hashar; "deployed on live site" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11943 [12:14:03] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [12:23:47] New patchset: Hashar; "enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185 [12:23:53] New review: jenkins-bot; "Build Failed " [operations/mediawiki-config] (master); V: -1 C: 0; - https://gerrit.wikimedia.org/r/12185 [12:24:25] New review: Hashar; "Patchset 2 fix issues mentioned in the inline diff of patchset 1." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185 [12:46:14] mark: so, we just tested the ircd package I made [12:46:17] it works [12:47:48] on a side note, we're officially out of public IPs in labs [12:58:10] New patchset: Hashar; "enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185 [12:58:16] New review: jenkins-bot; "Build Failed " [operations/mediawiki-config] (master); V: -1 C: 0; - https://gerrit.wikimedia.org/r/12185 [13:01:30] New review: Hashar; "Patchset3 just rewrite most of the original patch and code :-D" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12185 [13:04:02] New patchset: Hashar; "enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185 [13:04:08] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/12185 [13:04:51] New review: Hashar; "Patchset 4 is a rebase to latest master. We had a change to raise throttle on enwiki : https://gerr..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12185 [13:55:52] New patchset: Ryan Lane; "Ensure the apparmor profile is added before mysql is reconfigured." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12590 [13:56:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12590 [14:02:06] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12590 [14:02:08] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12590 [14:27:53] hey guys, nrpe question da room [14:28:08] if a new check file is created in /etc/nagios/nrpe.d [14:28:17] does nagios-nrpe-server need to be reloaded in order to see it? [14:28:21] (I assume yes) [14:28:26] I ask, because currently puppet does not do this [14:28:29] should it? [14:29:31] ottomata: yes it needs [14:29:43] reload :o [14:29:51] ok, i will make it so in puppet, subscribing the service [14:30:00] that would be cool [14:30:08] because right now puppet has troubles with this, on labs [14:30:19] sometimes I need to reload nrpe by hand [14:30:34] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12387 [14:30:37] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12387 [14:31:41] oh i take it back! 
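A minimal sketch of the file-notifies-service pattern being discussed here (as it turns out just below, the manifests are already supposed to do this via notify => Service["nagios-nrpe-server"]). The check content is the lucene one quoted a bit further down; the resource layout is illustrative, not the actual manifest:

    # Hypothetical puppetisation of an nrpe.d check file: nrpe only rereads
    # its config on restart/reload, so the file notifies the service.
    file { "/etc/nagios/nrpe.d/check_udp2log_log_age-lucene.cfg":
        owner   => "root",
        group   => "root",
        mode    => 0444,
        content => "command[check_udp2log_log_age-lucene]=/usr/lib/nagios/plugins/check_udp2log_log_age lucene\n",
        notify  => Service["nagios-nrpe-server"],
    }

    # normally declared once in the nrpe class; repeated here only to keep
    # the sketch self-contained
    service { "nagios-nrpe-server":
        ensure => running,
    }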
[14:31:44] it is supposed to already do this [14:31:44] notify => Service["nagios-nrpe-server"] [14:31:45] hmm [14:32:59] is there a way I can ask nrpe what a check is currently returning? [14:34:52] ottomata: there's a 'returns' resource (proper term?) which you can act on [14:35:04] hmm [14:35:06] i mean manually [14:35:10] oh ha [14:35:12] something like [14:35:27] nagios-check —name check_udp2log_log_age-lucene [14:35:32] and have nrpe run it [14:35:43] rather than me run the command manually [14:35:45] that I don't know [14:35:47] ok [14:35:49] well hmm [14:35:50] so [14:35:58] yesterday notpeter added a new udp2log instance [14:36:05] and it has a check to make sure the log files aren't old [14:36:05] http://nagios.wikimedia.org/nagios/cgi-bin/extinfo.cgi?type=2&host=oxygen&service=udp2log+log+age+for+lucene [14:36:15] trying to understand why it says this check is not defined [14:36:30] it exists in the /etc/nagios/nrpe.d directory [14:36:43] otto@oxygen:/etc/nagios/nrpe.d$ cat check_udp2log_log_age-lucene.cfg [14:36:43] command[check_udp2log_log_age-lucene]=/usr/lib/nagios/plugins/check_udp2log_log_age lucene [14:36:56] and running that command manually works [14:37:36] New patchset: Ryan Lane; "Allow the labs mysql role to be more configurable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12592 [14:38:10] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/12592 [14:38:38] hey Ryan_Lane, why not pass the $mysql_datadir as a class parameter [14:38:49] instead of resolving the variable in global scope? [14:39:19] because I can't call parameterized classes from labs [14:39:23] ohhhhhh [14:39:25] righto [14:39:26] cool [14:41:10] can I seriously not do: if !$::mysql_datadir { [14:41:10] ? [14:41:19] but if !$mysql_datadir { is allowed? [14:42:13] New patchset: Ryan Lane; "Allow the labs mysql role to be more configurable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12592 [14:42:20] if that's true, I'm going to be really annoyed [14:42:40] spoken like a true puppet user [14:42:44] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/12592 [14:42:48] maybe not [14:43:10] oh [14:43:11] I'm dumb [14:43:37] New patchset: Ryan Lane; "Allow the labs mysql role to be more configurable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12592 [14:44:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12592 [14:44:33] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12592 [14:44:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12592 [14:45:50] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [14:46:44] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [14:47:12] Ryan_Lane, not that it really matters, but it might look cleaner to use conditional assignment instead of an if/else block for those [14:47:40] datadir => $::mysql_datadir ? { [14:47:40] false => "/mnt/mysql", [14:47:40] default => $::mysql_datadir [14:47:40] } [14:47:51] that works? [14:47:55] yeah [14:48:02] ah. 
cool [14:48:03] you can even do that in the class inclusion, instead of setting a temp variable [14:48:07] was going to do that priginally [14:48:51] something like that [14:48:52] https://gist.github.com/2973201 [14:48:52] <^demon> drdee: Ping. [14:48:54] but yeah, doesn't really mater [14:49:04] ^demon: pong [14:49:24] <^demon> Hey :) I was trying to setup webstatscollector yesterday, but I couldn't find the existing code in SVN. [14:49:27] Jeff_Green: ping! [14:49:28] hehe [14:49:35] ^demon: hold on [14:49:44] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [14:50:03] http://svn.mediawiki.org/viewvc/mediawiki/trunk/webstatscollector/ [14:50:19] so I was giving access to evan rosen, a new global dev to some db machines [14:50:23] <^demon> Ah, I was looking in trunk/tools. Thanks :) [14:50:31] notpeter just informed me that the default wikidev group is not available on all machines [14:50:44] and he suggested I ask you (Jeff_Green) what to do about it [14:52:03] ottomata: hm. if this is being included via the node config, it's global, right? [14:52:21] because my change didn't work when I ran it. heh [14:52:36] oh. hell, it's not set [14:52:39] the var? [14:52:45] I didn't set the var [14:52:54] aye [14:53:00] and actually, i don't know much about the $:: syntax [14:53:01] oh. something's broken [14:53:04] i've only used that here [14:53:08] i think if you define it like this in your node [14:53:12] $mysql_datadir = '...' [14:53:14] project puppet groups are broken [14:53:19] then it is local? [14:53:38] maybe $:: only qualifies things that are declared outside of a class? [14:53:39] (not sure) [14:53:42] sorry I was afk for a minute there [14:53:46] np [14:54:08] well, it doesn't matter, because it's something broken in labs anyway [14:54:11] hehe, aye [14:54:35] does this make any sense (nowadays or ever): puppet: systemuser ... groups => [ 'project-foo' ] } ? [14:54:42] ooohhhh [14:54:45] ottomata: I don't know the history of the wikidev group, my impression is that it's a legacy and a throwawy in terms of security [14:55:00] it seems setting 0 doesn't work? [14:55:04] aye, i don't really care much I think, but admins.pp sets it as the default group for new users [14:55:16] I wonder how my logic is fucked up for that one [14:55:25] and we do use wikidev on stat servers, to allow for all of us to access data we are working on in /a (and other) places [14:55:58] is it missing from one of the "class admins::..." blocks? [14:56:30] i see it all there [14:56:41] but this is a new users, and maybe not in one of the admins groups? [14:56:49] also, this is on one of the db servers [14:56:50] umm [14:56:52] which server? [14:57:04] db42 and db1047 [14:57:16] not sure which puppet is complaining on, notpeter knows [14:57:24] garg puppet is blinding me [14:57:30] hehe [14:57:33] ottomata: once you have time: there is a new issue in RT-3180 (cant find libcairo), and 3119 (shell for Evan) in RT-3119. fyi and no worries [14:57:54] ottomata: save me some pain--how is the user being added to db1047 for example? [14:58:01] hm, i Know about libcairo, was waiting til someone cared to fix it. that is a problem with precise [14:58:04] can you use true/false in the config, rather than 1/0? [14:58:05] # RT 3119 [14:58:05] if $hostname == "db1047" { [14:58:05] include accounts::erosen [14:58:05] } [14:58:14] hmmm [14:58:14] o_0 [14:58:20] ottomata: but that did not work on db1047 and db42. 
Could not find dependency Group[500] for User[erosen] [14:58:21] no sure Ryan_Lane, would have to try it with mysql [14:58:25] not sure if my.cnf likes that [14:58:35] could make the my.cnf.erb file smarter though [14:58:48] holy cow there have been a ton of changes since my last git pull [14:58:51] <%= innodb_file_per_table ? 1 : 0 %> [14:59:09] file_per_table is my doing [14:59:10] mutante, right, I am talking to Jeff_Green about that right now [14:59:20] Jeff_Green: because I did a merge from test to production [14:59:23] Jeff_Green: see above, dependency group and RT-3119 ottomata: just saw backlog :) [14:59:29] :) [15:00:32] i'm the wrong person to ask about this because I hate our entire user creation scheme so much I wrote a new one [15:00:37] which is sitting on the shelf [15:01:22] where did /var/log/daemon.log go btw? moved in precise? [15:01:28] i guess my suggestion would be to create a new flavor of admin:: class in admins.pp, and apply that to db1047 instead of doing individual accounts [15:02:13] that seems to be sort of the standard, and imo that's the logical place to make sure the group is added as well--sticks with the standard as awful as it is [15:02:21] is this a case for virutal resources? [15:02:23] http://docs.puppetlabs.com/guides/virtual_resources.html [15:02:50] or um, can we just put wikidev group everywhere? :) [15:02:54] New review: MaxSem; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/12185 [15:02:57] those seem more about preventing conflicts [15:03:19] my issue is that we sprinkle user creation all over the place and it makes it very hard to control what's going on for a particular host [15:03:41] wikidev is a good example--you want to use it as a point of control for access to directory [15:04:02] and you know what users you're adding to that group within the classes you deal with [15:04:37] but suddenly you discover some other class somewhere got sucked onto the box and granted some other user that group [15:05:13] that's totally unacceptable in the FR context where I've been working--I have to know exactly who is going to have what access and why [15:05:23] (FR==?) [15:05:27] fundraising [15:05:28] aye [15:05:29] hm [15:05:39] ottomata: / Jeff: well, in class erosen there is gid => $gid, but its not set to 500 or something else, and you dont have group 500, so use another gid? shrug [15:06:02] even better [15:06:10] that's true [15:06:21] create a new group specific to your purpose, and hope nobody else touches it :-) [15:06:25] can't I just not set it? [15:06:29] and it will be the user's group? [15:06:30] erosen? [15:06:45] ah, but I do want him to be in wikidev on stat machines [15:06:53] ok so this is the other problem [15:07:02] well not *the* but *an* [15:07:03] ^demon: yeah not sure why webstatscollector is not in trunk/tools anyways [15:07:09] right [15:07:13] we use wikidev as the login group all over the place [15:07:18] yeah, that is wrong [15:07:26] default group should be individual, i think, no? [15:07:32] and then they shoudl be in this other 'wikidev' group [15:07:32] <^demon> drdee: It's all good--it'll be in analytics/webstatscollector in just a bit :) [15:07:34] if needed [15:07:34] I agree yeah [15:07:38] if I did that for erosen [15:07:38] exactly [15:07:41] puppet might not be angry [15:07:43] ^demon: Thx! 
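A rough sketch of the "individual primary group, wikidev only as a secondary group" layout being argued for here, using the erosen uid (602) quoted later in the discussion; the home and shell values are illustrative, and this is a sketch of the idea rather than the change that was eventually merged:

    # per-user primary group, named and numbered after the user
    group { "erosen":
        ensure => present,
        gid    => 602,
    }

    user { "erosen":
        ensure     => present,
        uid        => 602,
        gid        => "erosen",        # individual primary group, not wikidev
        groups     => [ "wikidev" ],   # shared access group only as a secondary group
        home       => "/home/erosen",  # illustrative
        shell      => "/bin/bash",     # illustrative
        managehome => true,
        require    => Group["erosen"],
        # the secondary group still has to exist on the node (usermod fails
        # otherwise, as the 'nonya' test further down shows), so wikidev must
        # be declared or included somewhere for this host
    }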
[15:07:50] i think it might be ok with a non-existent secondary group [15:07:55] ottomata: you're going to find that that is forced in several places [15:08:12] grep "$gid = 500" in manifests [15:08:35] oh actually it looks a little better than last time I looked [15:08:37] heh [15:08:38] ottomata: one of the issues is that you can't add existing users to existing groups using the puppet linux provider afaik and you would have to use an Exec to add a user to additional groups after the creation [15:08:52] oh no, grep "$gid=500" [15:09:07] that's the one that made me cry [15:09:24] so you either accept wikidev as the default group or redo half the world [15:11:04] mutante, really? [15:11:05] das crazy [15:11:22] is this worth a bigger discussion? [15:11:38] (and then redo half the world?) [15:11:49] probably not, i mean, yes [15:12:00] it would be amazing to make our user management much smoother and better [15:12:02] i'd be happy to put my new fully parameterized approach out there again for bashing [15:12:02] so many ways to do that [15:12:05] buuuuuuuut [15:12:11] it would probably break tons of stuff [15:12:12] it allows totally granular control, but it's a total redo [15:12:15] yep [15:12:15] and people would be pretty unhappy [15:12:19] https://groups.google.com/forum/?fromgroups#!topic/puppet-users/gRjXoaukopE [15:12:22] me and Jeff would be happy! [15:12:33] that's why i said first a bigger discussion [15:12:43] because the problem (not treated) will only get worse [15:12:49] with additional hires in the coming year [15:13:27] hey Jeff_Green, what about this: [15:13:28] https://gist.github.com/2973365 [15:13:29] would that work? [15:13:41] "By default most Linux distributions will use the ‘groupadd’ provider, which doesn’t allow you to manage group members, so you’ll have to do it on the user resources instead." http://www.puppetcookbook.com/posts/add-a-unix-group.html [15:14:20] this is getting to complicated for IRC but . . . [15:14:31] ottomata, for adding yes, for limiting it gets tricky [15:14:50] mutante: i found puppet and linux to work fine together for full control of both flavors of group [15:15:08] it's just that our class is not well designed [15:15:12] limiting it? [15:15:17] so [15:15:33] puppet treats groups as inclusive|minimum [15:15:41] Jeff_Green: cool, i always ran into the problem when wanting to add a user to a group after both existed already. [15:16:03] Jeff_Green: even had an Exec for usermod somewhere in puppet [15:16:29] huh, yeah I haven't had trouble with it [15:16:36] but I was working on this a month ago so I forget a bit [15:16:54] eh yeah, its been a while here as well and there have been changes [15:17:19] looking forward to the new stuff then [15:18:07] for fundraising I gave up short term and I've been administering the additional groups by hand, it's only two boxes [15:18:25] i'm leaning toward moving aluminium/grosley to the payments puppet instance though [15:19:32] I'll post the stuff I did on fenari so you can look at it and laugh [15:19:33] https://groups.google.com/forum/?fromgroups#!topic/puppet-users/-ZDnT4aO3uw [15:19:47] "Re: [Puppet Users] Workaround to "Provider groupadd does not support features manages_members" ?" [15:20:09] so um, not sure what you mean by limiting still, [15:20:15] oh sorry [15:20:32] i think my suggested change will work, but only in that i think puppet won't complain about the non existent group in the secondary groups [15:20:43] not sure about that though... [15:20:47] i.e. 
will puppet make the list of additional groups the complete list? or will it just add them to the list [15:21:03] i.e. will puppet remove a user from groups you have not specifically applied in manifests [15:21:30] Jeff_Green: https://github.com/duritong/puppet-user/blob/9bbd720da1549bf58c7707c1ac109a47e4b4a946/manifests/groups/manage_user.pp [15:21:46] Jeff_Green: maybe they merged that or similar meanwhile [15:21:49] oh hmm [15:21:53] lemme try it and find out [15:21:59] (on my local...) [15:22:54] they are using Augeas (like we do for iptables) [15:23:05] to have manage_user [15:23:50] fenari:/tmp/manifests [15:24:01] um, how does this ever work? [15:24:02] require => Group[$gid], [15:24:09] the group is not named by $gid [15:24:10] this is from the very stripped down and incomplete payments puppet [15:24:16] class wikidev { [15:24:16] group { "wikidevgroup": [15:24:16] name => "wikidev", [15:24:16] gid => 500, [15:24:23] ottomata: it'll take either gid or group name iirc [15:24:28] it doesn't care [15:24:34] Could not find dependency Group[602] [15:24:44] group { "erosen": [15:24:44] gid => $uid, [15:24:44] } [15:24:49] $uid = 602 [15:24:57] maybe iirw [15:25:12] hehe, maybe i'm doing something done [15:25:15] dumb* [15:25:26] http://docs.puppetlabs.com/references/stable/type.html#user [15:25:56] oh [15:25:59] interesting [15:26:10] so why the require = > Group[$gid] in the unixaccount define? [15:26:47] where are you? I'm lost [15:27:19] If Puppet is managing the user’s primary group (as provided in the gid attribute), the user resource will autorequire that group. [15:27:29] yup [15:27:43] in unixaccount define where user is defined [15:27:46] require => Group[$gid], [15:27:56] user { "${username}": [15:27:56] … [15:27:56] require => Group[$gid], [15:27:59] admins.pp line 36 [15:28:02] you're looking at the production admins.pp? [15:28:05] yup [15:28:44] maybe a legacy thing? [15:28:49] yeah maybe [15:29:00] its ok though, I can specify gid as the group name [15:29:01] and that works [15:29:09] ahhh phooey [15:29:13] my suggested change will not work [15:29:23] Could not set groups on user[erosen]: Execution of '/usr/sbin/usermod -G nonya,wikidev erosen' returned 6: usermod: group 'nonya' does not exist [15:29:59] good news though [15:30:04] using groups => [groupA, groupB] [15:30:07] is limiting [15:30:17] it will ensure they are only in the secondary groups listed [15:31:07] so, my idea doesn't work. [15:31:15] but, a bunch of nodes in site.pp [15:31:19] manually include groups::wikidev [15:31:28] yep [15:31:29] group { "nonya": ensure => "present" ? [15:31:29] can I just do that on the db hosts where I am including erosen? [15:31:37] OR! [15:31:39] how about [15:31:41] in unixaccount [15:31:51] if $gid == 500 { include groups::wikidev } [15:31:52] ? [15:31:58] ottomata: this is why I was suggesting a class like those at the end of admins.pp [15:32:12] it gives you one tidy place to add your groups and users [15:32:16] yeah, but a new class just for erosen? [15:32:20] why not? 
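A minimal sketch of the kind of per-role class being suggested, using the admins::globaldev name that gets settled on just below; the body is illustrative rather than the merged patchset, and it assumes wikidev is not already declared for the node by another admins class (if it is, this class would just require it instead):

    class admins::globaldev {
        # shared access group, declared once so member accounts can depend on it
        group { "wikidev":
            ensure => present,
            gid    => 500,
        }

        # accounts that belong to the "global development" role
        include accounts::erosen

        # future global-dev hires get added here rather than via
        # per-node include lines in site.pp
    }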
[15:32:22] hmm [15:32:26] i guess he is a global dev [15:32:33] i could make a admins::globaldev class [15:32:38] it's a class for the type of user [15:32:43] wouldn't mind an admins::analytics class either [15:32:55] would simplify some of the other machines we are included on [15:32:57] we have a new type of user, for better or worse we do that with a class [15:33:00] okey dokey [15:33:44] i thought now you would just add the group via "group" ensure = "present" and then require that where you just ran into group 'nonya' does not exist. [15:34:07] yeah i could do that [15:34:11] but this was a test [15:34:19] but i am getting confused as well, AND looking at Jeffs manifests as well [15:34:22] to see if puppet would complain if I specified a secondary group that didn't exist [15:34:24] hahah [15:34:46] mutante: fair warning, the thing that is broken with my manifests--you can't include a user in multiple classes [15:34:57] you get a conflict [15:35:25] I started looking at virtual resources to fix that, but I couldn't get them to work as I interpreted the documentation [15:36:01] and I'm not up on augeas or other advanced puppetisms [15:37:13] Jeff_Green: gotcha, and me neither, just the manage_user.pp looked fairly easy from the resulting puppet code (once you got the right Augeas commands there) [15:37:30] mutante: yup [15:37:56] i sent out an email on my prototype to ops@ on 5/21 which explains it a little [15:39:01] you're right. this is something for lists for a while now [15:39:21] I actually *like* the fact that a user can only be created in one place, but I can see why others would hate that [15:43:06] New review: Dereckson; "Not ready yet, it's still a draft." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/12556 [15:44:02] <^demon> drdee: Done :) [15:44:11] ^demon: sweet! [15:47:28] hey does anyone here understand udp2log well? [15:47:42] kinda [15:47:44] sort of [15:47:55] together we should be able to figure it out :D [15:48:01] if you are talking about the code itself, not so much [15:48:07] but the setup on the servers, jajaja [15:48:23] i might know a bit about the code, the real expert is Tim Starling [15:48:26] with the chatter on changing the log rotation scheme, peter (fundraising tech peter) is wondering about flipping back from 1:100 sampling to 1:1 so he can study it a bit [15:49:06] Jeff_Green: which box? [15:49:10] how long is a bit? does he want to leave it like that for just a few hours, or days? [15:49:17] they're on locke [15:49:22] hour or two, and not today [15:49:23] mmmmmmmmmmmm [15:49:28] he's just wondering about the ease of doing so [15:49:35] that sounds tricky on Locke [15:49:47] can't he run the test on Oxygen? [15:49:52] it was like that on locke until a month ago [15:50:06] this is how we ran FR logging on lock until May [15:50:21] yeah but you increase the traffic by a factor 100 [15:50:26] the question we're asking is whether 1:100 sampling will be acceptable for the fundraiser [15:50:55] drdee yep that's known [15:51:06] drdee_: if we can leave locke at 1:100 and run an hour or two simultaneously on oxygen at 1:1 that would be amazing [15:51:13] New patchset: Ottomata; "Putting erosen in admins::globaldev class that includes group wikidev." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12600 [15:51:14] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Puppet has not run in the last 10 hours [15:51:15] that' no problem [15:51:27] ottomata, agree? 
[15:51:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12600 [15:51:52] as long as you are watching it while it is at 1:1 [15:52:01] and you can revert if things go sour [15:52:02] then jaaa [15:52:14] also the filters on oxygen are less volatile than on locke/emery, so tha tis less dangerous [15:52:31] and we just disabled the overal sampled filter on oxygen [15:52:37] i have a second question--does anyone know whether we can afford to do file compression on locke now? since I've been here the word has always been "zomg if you use any cpu you'll cause udp2log to drop packets" [15:52:40] so there are more resources available [15:53:51] like, as they come in? [15:54:03] logs should be compressed on rotation, ja? [15:54:07] on rotation [15:54:18] pretty sure that is happening now [15:54:25] not with the fundraising stuff [15:54:29] hm [15:54:32] where is that? [15:54:37] the banner log was likely the issue [15:54:47] they were HUGE at 1:1 [15:54:53] well this is what I'm wondering [15:55:12] also now that the filter is 1:100 does that reduce CPU utilization significantly as well for the logging portion? [15:55:30] ok yeah, dont' see any rotation for /a/squid/fundraising [15:55:41] I know there's none :-) [15:56:07] ah right, you are copying them off elsewhere [15:56:12] yeah [15:56:21] afaik, filter stuff is never CPU bound but i could be very wrong [15:56:22] scripts/rotate_and_archive_fundraising_logs [15:56:58] if we can compress on rotate, and rotate every 15 min, I can get behind the logrotate idea [15:57:26] if we can't compress, then transferring the files becomes more of a pain in the ass and I will be sad [15:57:32] load is currently ~36% on locke [15:57:33] what is the overhead to logrotate? [15:58:00] jeff_green, is that better than what you are currently doing? [15:58:04] to logrotate itself? or with gzip [15:58:14] pgehres: to logrotate itself? or with gzip [15:58:15] what are you doing right now? [15:58:35] Jeff_Green: I was just considering the idea of rotating every 5 or 10 to get stats faster [15:58:53] i assume gzip is the same for X data in however many chunks [15:58:59] how often does your manual rotate do it now? [15:59:02] pgehres: the only part of that which would concern me is what happens with udp2log when you hup [15:59:13] 15 minutes [15:59:22] ottomata: every 15min, script runs and does what logrotate would do--copies aside and hups udp2log [15:59:29] then it rsyncs to storage3 [15:59:35] request to other ops folks, who could review https://gerrit.wikimedia.org/r/#/c/11898/ [15:59:41] that's fine then, right? if your script is working now…might as well keep usin git? [15:59:49] next cycle it purges what it sent last time [15:59:49] oh nono [15:59:52] tha tis not ready drdee [16:00:01] there is a discussion on the ops list about what to do with that [16:00:03] needs some work [16:00:04] whooopsie [16:00:30] ottomata: I like standardization! script is stupid and only made sense with the insane limitations of last years fundraiser [16:00:35] the work I needed to do was dependent on my udp2log refactor, which has been merged [16:00:38] going to try to do that today [16:00:45] aye yeah [16:01:02] it even bugs me that people seem to arbitrarily choose where to log on locke [16:01:06] haha [16:01:07] oh man [16:01:09] did you know [16:01:11] to me it should all be done by one system user [16:01:12] that on all 3 udp2log machines [16:01:18] packet-loss.log is in 3 different locations? 
[16:01:18] it's all different! [16:01:18] heheh [16:01:20] yes [16:01:31] yeah, I want to go in and standardize all locations [16:01:39] so when mark commented that the fundraising pipeline is hacky, i bit my tongue [16:01:50] ottomata: I'm happy to help [16:02:03] yeah, not going to do it right now though, …kinda low priority :p [16:02:04] but! [16:02:06] as for your logrotate [16:02:11] the only diff [16:02:21] is that right now your custom script is not gzipping? [16:02:26] right [16:02:32] how much data is it? [16:02:41] like, per 15 minutes or whatever? [16:02:43] and logrotate doesn't need to gzip either, but if it does the rsync to final destination is trivial [16:02:51] aye yeah [16:02:56] i think its fine to try it [16:03:01] but, it wouldnt' be thaaat much more standarized [16:03:06] only a couple of MB once gzipped per 15m [16:03:07] ottomata: if the FR folks are good with 1:100 they're small [16:03:09] as the udp2log refactor only works with a single defaul tlog directory [16:03:14] you'd have to install your own custom logrotate file [16:03:15] which is fine [16:03:29] we could make a template [16:03:33] but you could do what I did for zero [16:03:36] Jeff_Green: 1:1 for landing pages, 1:100 for banners [16:03:48] for zero, i prefixed all of the log files with zero- [16:03:50] and a standard puppet config that tweezes filter, creates dirs, sets up logrotate, configures rsync [16:03:54] and kept them in the default log directory [16:04:02] then the default logrotate scirpt worked with them [16:04:11] and I can rsync them off by name [16:04:23] rsync -r …/zero*.gz /dest [16:04:32] buuuuut oh yeah [16:04:35] you want to rotate every 15 mins [16:04:39] so you'd need a file anyway [16:04:56] this has been done tons before i'm sure [16:04:57] https://www.google.com/search?sugexp=chrome,mod=10&sourceid=chrome&ie=UTF-8&q=puppet+logrotate [16:04:59] yeah--i'd logrotate to datestamped.gz files, and keep ~1wk [16:05:17] and from the archiving side, just rsync those right into the permanent archive dir [16:05:33] aye [16:05:51] https://github.com/rodjek/puppet-logrotate [16:05:54] the FR folks want to store permanently [16:06:02] yup yup [16:08:21] if I can't compress, I guess I'd do a no-op rsync to see what's available, then come back and fetch what isn't already compressed in the permanent archive. that's what I meant by pain in the ass [16:08:29] a script, no big whoop [16:09:40] aye, kinda annoying though [16:09:47] woudl be nice to just do an rsync and get the new changes [16:09:50] new files* [16:09:58] yeah [16:10:01] i say try it, [16:10:12] try logrotate + gz [16:10:30] my concern is that I won't find out its causing packet loss until 2 months later in the middle of the fundraiser [16:10:49] I seek the origin of the urban legend [16:10:59] packet loss on your fundraising logs? or others [16:11:04] globally [16:11:19] hmm, well, there are packet loss monitors for the global ones [16:11:20] literally udp being dropped at the interface I believe [16:11:55] just packet-loss.log? [16:12:01] yeah, so the montior should catch it [16:12:06] ah ok [16:12:08] yeah, and there is a ganglia/nagios alert [16:12:12] so we'll test [16:12:13] if the loss goes too high [16:12:43] i mean, maybe you want to just move your fundraising stuff over to oxygen? [16:12:55] that way it doesn't break (too many) people's stuff? 
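A sketch of how the "logrotate + gzip every 15 minutes" idea could be puppetised, keeping the config out of /etc/logrotate.d so that only the cron entry below triggers it. The config path, the log glob under /a/squid/fundraising, and the udp2log process name used for the HUP are assumptions; the rsync to storage3 stays a separate step:

    # hypothetical standalone logrotate config for the fundraising logs
    file { "/etc/logrotate-udp2log-fundraising.conf":
        mode    => 0444,
        content => "/a/squid/fundraising/logs/*.log {
            rotate 672
            compress
            missingok
            notifempty
            sharedscripts
            postrotate
                /usr/bin/killall -HUP udp2log
            endscript
        }\n",
    }

    # force a rotation every 15 minutes; rotate 672 keeps roughly a week of
    # 15-minute chunks, and the HUP mirrors what the current script does so
    # that udp2log reopens its log files after they are moved aside
    cron { "udp2log-fundraising-rotate":
        command => "/usr/sbin/logrotate -f /etc/logrotate-udp2log-fundraising.conf",
        user    => "root",
        minute  => [ 0, 15, 30, 45 ],
        require => File["/etc/logrotate-udp2log-fundraising.conf"],
    }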
[16:13:11] the metrics meeting stats that erik generates are from some of the files on locke and emery [16:13:20] oxygen just has the wikipedia zero filters right now [16:13:24] which are pretty low traffix [16:13:27] traffic* [16:13:45] does sound compelling [16:14:02] oop. my folks are here, so I'm going to do lunch. [16:14:07] ok, yeah i need to lunchy too [16:14:32] I'll try compression stuff and think toward logrotate assuming the FR folks 1;100 test go well [16:17:07] PROBLEM - NTP on virt1002 is CRITICAL: NTP CRITICAL: No response from NTP server [16:42:17] New review: MaxSem; "Have you seen the comment above: "Better, don't add wikis here unless you are Andrew or he knows wha..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12366 [16:50:33] !log added a database account on db9/10 for read-only access to the gerrit database [16:50:39] Logged the message, Master [16:53:04] notpeter / Jeff_Green [16:53:05] https://gerrit.wikimedia.org/r/#/c/12600/ [16:53:08] whenever you get a chance [16:59:05] New patchset: Pyoungmeister; "setting mc hosts to use mw.cfg for partitioning" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12607 [16:59:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12607 [17:13:46] New patchset: Pyoungmeister; "setting mc hosts to use mw.cfg for partitioning" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12607 [17:14:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12607 [17:18:43] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12607 [17:18:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12607 [17:33:57] PROBLEM - Host owa2 is DOWN: PING CRITICAL - Packet loss = 100% [17:36:33] anyone available for a puppet brain bounce? [17:36:39] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [17:36:39] RECOVERY - Host owa2 is UP: PING OK - Packet loss = 0%, RTA = 1.66 ms [17:36:45] i want to make some changes to the way generic::rsyncd works [17:36:50] not sure of the best way to do it [17:37:22] ottomata: yes, however i'm not sure how much brain i have to bounce ;) [17:37:49] aye, you have a lot of brain i know it [17:37:56] so i need to add rsync daemon modules to each of the udp2log hosts [17:38:05] but they each have their log directories in different places [17:38:07] which is fine [17:38:22] but I don't want to add a files/rsync/rsyncd.conf.{oxygen,emery,locke} [17:38:29] that is not a very puppet way to do things [17:38:49] i'd like to make an rsyncd.conf.erb [17:39:17] but the trouble is, rsyncd.conf can contain mutiple modules [17:39:40] it'd be nice if rsyncd used an /etc/rsync.d/ directory [17:39:44] but it doesn't [17:39:49] just /etc/rsyncd.conf [17:41:52] i could do this [17:41:52] http://projects.puppetlabs.com/projects/puppet/wiki/Generating_a_config_file_from_fragments [17:42:10] which would take some work [17:42:26] or. 
i could just set up a udp2log_rsyncd.conf.erb [17:42:37] that would be the same for all log machines, except for the directory [17:42:42] that would solve my problem [17:42:44] New patchset: preilly; "fix MobileFrontend javascript reference" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12612 [17:42:50] but woudln't genericise it for others later [17:42:56] generisize* [17:42:59] (is that a word?) [17:43:35] hmm, maybe i'll just solve my problem with udp2log_rsyncd.conf.erb right now [17:43:35] New review: preilly; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12612 [17:43:38] Change merged: preilly; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12612 [17:43:51] and give myself a TODO to fix up the whole rsyncd stuff all nice an abstract later? [17:43:55] whatcha think? [17:44:36] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours [17:45:22] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:45:30] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100% [17:56:45] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms [18:01:23] cmjohnson1/RobH - so, how's it going ? and, want to work on wireless ? [18:01:42] PROBLEM - Host owa1 is DOWN: PING CRITICAL - Packet loss = 100% [18:02:11] cool [18:02:18] so i think pmtpa wireless should be working right now :) [18:04:15] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:04:51] RECOVERY - Host owa1 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [18:08:09] PROBLEM - Swift HTTP on owa1 is CRITICAL: Connection refused [18:08:29] yeah yhea, thanks nagios. [18:15:21] RECOVERY - SSH on ms-be5 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:16:22] Ryan_Lane: I'm sure this is a repeat question, but I can't remember what the answer was... [18:16:31] sources.list includes a reference to security.ubuntu.com. [18:16:42] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [18:16:50] Neither virt1001 nor virt1002 can ping security.ubuntu.com. Yet, apt-get update works on virt1001 and not on virt1002. [18:17:12] So there must be a super-specific security rule somewhere? [18:19:24] PROBLEM - Host owa3 is DOWN: PING CRITICAL - Packet loss = 100% [18:21:39] RECOVERY - Host owa3 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [18:26:03] alllrighhhthtyyyy, another q for ops room [18:26:16] binasher wants me to restrict these new rsync modules i'm setting up [18:26:22] via both the rsync module setting and iptables [18:26:29] how do I do so wtih iptables and puppet? [18:26:57] look at swift.pp for an iptables example. [18:27:01] it's craptastic. [18:27:10] ha, k looking [18:29:31] ah ok, so i can probably use iptables_add_service [18:29:35] since this is an rsync daemon [18:29:45] the list of services is in iptables.pp [18:29:48] if it's not there you can add it. [18:29:50] oh [18:29:51] k [18:30:09] that's the ports? [18:30:17] yup. [18:30:20] k [18:30:24] 873 is rsyncd default [18:30:24] cool [18:30:25] will add it [18:30:33] ooh [18:30:33] i tis there [18:30:34] cool [18:31:45] oh researchers. [18:36:09] !log removing 28790 bounce messages from exim queue on mchenry [18:36:14] Logged the message, Mistress of the network gear. 
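A sketch of the shape of the generic::rsyncd change discussed above: rsync reads a single /etc/rsyncd.conf and has no conf.d-style fragment directory, so the class can accept either a canned config file or rendered ERB content. The parameter names and body here are illustrative, not the code that was merged:

    class generic::rsyncd($config = "", $content = "") {
        package { "rsync":
            ensure => present,
        }

        # note: on Ubuntu the init script also wants RSYNC_ENABLE=true in
        # /etc/default/rsync, which this sketch does not manage
        service { "rsync":
            ensure    => running,
            require   => Package["rsync"],
            subscribe => File["/etc/rsyncd.conf"],
        }

        if $content != "" {
            # per-role ERB output, e.g. content => template("udp2log/rsyncd.conf.erb")
            file { "/etc/rsyncd.conf":
                mode    => 0444,
                content => $content,
            }
        } else {
            # old behaviour: a static per-host file shipped from puppet
            file { "/etc/rsyncd.conf":
                mode    => 0444,
                source  => $config,
            }
        }
    }

A udp2log host might then pull it in with something like class { "generic::rsyncd": content => template("udp2log/rsyncd.conf.erb") }, matching the templates/udp2log/rsyncd.conf.erb file that gets added further down the log.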
[18:42:05] New patchset: Bhartshorne; "moving data to different drives for swift hosts with ssds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12671 [18:42:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12671 [18:45:22] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12671 [18:45:24] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12671 [18:48:03] RECOVERY - Puppet freshness on ms-be5 is OK: puppet ran at Fri Jun 22 18:47:36 UTC 2012 [18:49:51] RECOVERY - swift-object-server on ms-be5 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:49:51] RECOVERY - swift-object-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:49:51] RECOVERY - swift-container-server on ms-be5 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [18:50:00] RECOVERY - swift-account-server on ms-be5 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [18:50:09] RECOVERY - swift-account-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:50:18] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:50:18] RECOVERY - swift-container-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:50:36] RECOVERY - swift-object-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:50:36] RECOVERY - swift-account-reaper on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:50:54] RECOVERY - swift-container-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:53:56] New review: Nemo bis; "MaxSem, that's becaue LQT deploy is "forbidden" on regular Wikimedia projects, but is otherwise a no..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12366 [18:58:15] RECOVERY - NTP on ms-be5 is OK: NTP OK: Offset -0.02327513695 secs [18:59:25] New patchset: Reedy; "Disable Zak Greants account. No longer an employee" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12673 [18:59:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12673 [19:02:41] Change abandoned: Ottomata; "This commit got too far behind to be useful. I will do these changes based on ops recommendations i..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11898 [19:06:20] New patchset: Ottomata; "generic::rsyncd now allows content for use of ERb template config files." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12676 [19:06:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12676 [19:08:03] maplebed, if you got a sec, could you review that one? 
[19:08:08] i've got other commits coming in that depend on that [19:08:16] and mark asked me to do things like this in separate commits [19:08:39] OHHH POOOP [19:08:45] i didn't want to commit all those files [19:08:47] grrrrrrrrrrrrrrr [19:09:01] yargh [19:09:04] so much git trouble today [19:11:08] Should've done status before commit :P [19:11:12] i did [19:11:18] i was all ready [19:11:21] but then did commit -a [19:11:33] cause my fingers have a mind of their own [19:13:43] Change abandoned: Ottomata; "AGH! Didn't mean to commit all these files. Yargh, fat git fingers today. :(" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12676 [19:14:57] New patchset: Ottomata; "generic::rsyncd now allows content for use of ERb template config files." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12677 [19:15:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12677 [19:16:00] oook [19:16:02] that is much better [19:16:17] maplebed or LeslieCarr maybe. whenever you get a chance: [19:16:17] New review: Dereckson; "@MaxSem this is maybe why I added werdna as a reviewer." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12366 [19:16:18] https://gerrit.wikimedia.org/r/#/c/12677/ [19:16:19] :) [19:16:58] ottomata: ok checking out now [19:16:59] yay [19:17:20] danke [19:18:16] ooo, there's a downside to split_spool_directory - it looks liek exipick doesn't work on that [19:18:20] perhaps i should revert that... [19:18:28] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [19:22:49] RECOVERY - swift-account-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:26:15] LeslieCarr: Remember a couple of weeks ago when you had a server where the serial-on-lan connection was hung? Do you remember how you got it unstuck? [19:26:34] like where it won't let another person on ? [19:26:55] racadm racreset soft [19:27:38] That's a management interface command? [19:27:43] On the cisco itself? [19:29:39] LeslieCarr: Sorry, I think I don't know what that is/how to do it. [19:29:49] oh [19:29:50] doh [19:29:55] i was thinking dells [19:29:59] don't know how ot do it on cisco [19:30:02] The symptom is that the mgmt interface works fine but when I do 'host connect' I get no response at all. [19:30:13] i haven't played with the ciscos [19:30:17] I think you/we solved a similar problem recently. [19:30:30] ok [19:30:41] Well, one of them just now came up, maybe I just need to be patient. [19:48:58] LeslieCarr, how'd that commit look, did you get a chance to peruse? (asking cause I want to commit my next one) [19:53:47] New patchset: Ottomata; "Adding rsync jobs for stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681 [19:54:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12681 [20:48:01] ottomata: sorry i went to food [20:49:14] no probs! 
[20:52:38] woosters: could we get an update on https://rt.wikimedia.org/Ticket/Display.html?id=2996 [20:58:26] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/12677 [20:58:28] ottomata: commented [20:58:35] mostly good [20:58:40] looking at our long list of rewrite rules, I have to wonder how much performance pain it brings us [20:58:41] (inline comments) , not "no comment" [20:59:01] <^demon> (no comment) just means "No cover message" -- output there kinda sucks. [20:59:38] gerrit's not the most user friendly … but it is better than anything else [20:59:59] <^demon> It's not gerrit's fault :p [21:00:07] <^demon> It's our fault, hooks are ours :) [21:00:53] LeslieCarr [21:00:59] since I have committed a later commit that depends on this one [21:01:12] do you know if it is still possible to amend my commit? [21:01:19] my generic::rsyncd commit? [21:01:38] that stretches my git knowledge (as to whether or not that will then require a rebase ...) [21:01:55] it will quite possibly be ok [21:01:58] <^demon> Probably will need to rebase. As long as it's not Merged|Abandoned, you can always amend a commit [21:02:46] <^demon> Protip: You can actually write a branch new commit, and if you use the same Change-Id, it'll be grouped as a followup patchset. Kinda useful for the situation where making the one-line-fix all over again is quicker than rebasing and such. [21:02:54] New patchset: Ottomata; "generic::rsyncd now allows content for use of ERb template config files." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12677 [21:02:56] <^demon> s/branch/brand/ [21:03:03] cool it worked! [21:03:16] i just checked out the previous commit [21:03:17] modified [21:03:25] and then commit —amend [21:03:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12677 [21:03:39] LeslieCarr, done. [21:04:10] <^demon> That'll work too :) [21:04:27] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12677 [21:04:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12677 [21:04:32] going through the next change :) [21:04:36] <^demon> It's 5 o'clock here. Have a good weekend everyone. [21:04:53] so ^demon, you're saying that I can branch from a previous commit [21:04:55] and then when I commit [21:05:01] just make sure I keep the same change-id [21:05:01] ? [21:05:10] like, manually paste it into the commit message? [21:05:24] ah, look at that https://gerrit.wikimedia.org/r/#/c/12681/1 - it will require a rebase [21:05:33] <^demon> Yeah, if you manually paste it into the commit message, it'll treat is as a patchset 2 (which is what happens with --amend, too) [21:05:48] hmmm [21:05:49] <^demon> Your change-id is preserved on --amend [21:06:29] so, for the rebase [21:06:50] should I just do [21:06:52] git rebase production [21:06:53] ? [21:07:28] <^demon> `git rebase origin/production` [21:07:32] k [21:07:44] <^demon> Anyway, I'm out :) [21:07:45] after fetch? [21:08:05] <^demon|away> Yeah, you're wanting to rebase your changes on top of master, so it'd be after the fetch. [21:09:48] right [21:09:52] New patchset: Ottomata; "Adding rsync jobs for stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681 [21:10:08] cool! [21:10:13] LeslieCarr, i think that did it [21:10:24] i jsut ran [21:10:24] New review: gerrit2; "Lint check passed." 
[21:10:26] git-review
[21:10:28] cool
[21:10:29] after the rebase
[21:10:34] i think it uploaded a new patchset
[21:10:35] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[21:10:37] but the content is the same
[21:10:49] yeah, cool, new parent on the 2nd patchset
[21:10:50] perfect
[21:10:58] New review: Lcarr; "(no comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681
[21:11:09] those reviews are on patchset one, but grr whitespace
[21:11:14] hah, ok
[21:11:17] let me actually check out all the content better
[21:11:23] (first read and second read)
[21:11:24] k, will fix whitespace
[21:13:35] New patchset: Ottomata; "Adding rsync jobs for stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681
[21:14:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12681
[21:21:50] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 2.75 ms
[21:24:59] PROBLEM - swift-account-auditor on ms-be5 is CRITICAL: Connection refused by host
[21:25:08] PROBLEM - SSH on ms-be5 is CRITICAL: Connection refused
[21:25:26] PROBLEM - swift-object-replicator on ms-be5 is CRITICAL: Connection refused by host
[21:25:26] PROBLEM - swift-account-server on ms-be5 is CRITICAL: Connection refused by host
[21:25:26] PROBLEM - swift-container-server on ms-be5 is CRITICAL: Connection refused by host
[21:25:35] PROBLEM - swift-object-server on ms-be5 is CRITICAL: Connection refused by host
[21:25:35] PROBLEM - swift-account-replicator on ms-be5 is CRITICAL: Connection refused by host
[21:25:35] PROBLEM - swift-container-replicator on ms-be5 is CRITICAL: Connection refused by host
[21:25:35] PROBLEM - swift-account-reaper on ms-be5 is CRITICAL: Connection refused by host
[21:26:01] LeslieCarr, I'm out pretty soon
[21:26:02] PROBLEM - swift-object-updater on ms-be5 is CRITICAL: Connection refused by host
[21:26:07] feel free to merge that if you think it is ok
[21:26:11] PROBLEM - swift-container-updater on ms-be5 is CRITICAL: Connection refused by host
[21:26:11] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: Connection refused by host
[21:26:14] it should be ok without a babysitter
[21:26:36] ottomata: hey
[21:26:39] one last whitespace
[21:26:41] ack!
[21:26:44] i missed one?!
[21:26:46] udp2log.pp:49
[21:26:56] got it
[21:26:58] other than that, looks good, i'll merge once fixed up
[21:27:12] ok cool, then I will babysit it
[21:27:16] New patchset: Ottomata; "Adding rsync jobs for stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681
[21:27:18] drdee will be happy if it goes well
[21:27:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12681
[21:28:21] ok goodbye whitespace!
[21:30:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12681
[21:30:18] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12681
[21:30:26] woohoo
[21:30:33] lemme know when it is on sockpuppet and i'll try it
[21:30:55] it is pushed to sockpuppet now
[21:30:58] mmk
[21:31:04] need me to run puppet on emery ?
[21:31:12] naw i can do it
[21:31:14] wanna watch it
[21:31:20] cool
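"Run puppet on emery" in the exchange above amounts to triggering a one-off agent run on the host once the merged change has reached the puppetmaster (sockpuppet). A hedged sketch using stock Puppet options rather than any site-specific wrapper:

    # on the target host (emery here), once the change is on the puppetmaster
    sudo puppet agent --test --noop   # dry run: report what would change without applying it
    sudo puppet agent --test          # single verbose foreground run, applying the catalog

Watching that first real run is exactly the babysitting being discussed: a missing file or a wrong variable name shows up immediately, which is what happens next in the log.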
[21:31:21] aahhhh!
[21:31:23] forgot to add a file
[21:31:26] glad I babysat
[21:32:09] New patchset: Ottomata; "Adding missing templates/udp2log/rsyncd.conf.erb file." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12688
[21:32:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12688
[21:33:14] LeslieCarr, wouldya merge that real quick so I can try again?
[21:33:54] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12688
[21:33:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12688
[21:36:44] more probs
[21:37:42] New patchset: Ottomata; "Wrong variable name. Should be 'hosts_allow', not 'allow_hosts'." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12691
[21:37:51] ok, LeslieCarr, hopefully last one
[21:37:54] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[21:38:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12691
[21:38:21] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12691
[21:38:28] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12691
[21:39:27] andrewbogott: ping?
[21:39:32] ottomata: merged
[21:39:39] paravoid: What's up?
[21:39:48] so, among the many hours I've spent with the Ciscos
[21:39:58] danke
[21:40:00] I've managed to have a look at their web if
[21:40:06] maybe this will help with your hung server
[21:40:34] paravoid: Sure... how much setup does it take to get a remote view of the web interface?
[21:40:43] I just logged in
[21:41:02] Oh, cool. What do you see?
[21:41:06] I tried a hard reset, didn't seem to matter.
[21:41:11] I never even see a memory test, just a hang.
[21:42:32] wait
[21:42:32] New patchset: Lcarr; "adding mc 1-16 to dhcpd.conf Change-Id: I10300f62408fe35f4f96167dcdfc69b23b297d48" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12520
[21:42:44] LeslieCarr: what's mc1-16 btw? :)
[21:42:55] memcache1-16
[21:42:57] in tampa
[21:43:05] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12520
[21:43:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12520
[21:43:28] andrewbogott: it doesn't get a DHCP lease
[21:43:42] No DHCP or proxyDHCP offers were received
[21:43:45] after Broadcom PXE
[21:43:48] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12520
[21:43:51] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12520
[21:43:54] but that's beside the point; the machine works, so the problem's with SOL
[21:44:00] I'd say let's reset the mgmt
[21:44:13] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/12673
[21:44:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12673
[21:44:34] andrewbogott: I just hit "Reboot CIMC"
[21:44:52] paravoid: Cool. I predict success.
[21:45:25] I predict sleep
[21:45:27] good night :)
[21:45:28] or day
[21:45:34] 'night!
[21:45:36] g'night!
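On the "No DHCP or proxyDHCP offers were received" message paravoid quotes from the Cisco's PXE ROM: it means the firmware broadcast a DHCPDISCOVER and never saw an offer, so the next place to look is usually the DHCP/install server's log. A sketch, assuming ISC dhcpd logging to syslog; the log path and the MAC address are placeholders, not values from this log:

    # on the DHCP/install server
    grep -i 'aa:bb:cc:dd:ee:ff' /var/log/syslog | grep -iE 'dhcpdiscover|dhcpoffer'
    # no DHCPDISCOVER at all: the request never arrived (wrong VLAN, missing helper-address)
    # DHCPDISCOVER but no DHCPOFFER: dhcpd has no matching host entry for that MAC

As paravoid notes, though, the lease was beside the point here; the actual fault was the stuck SOL console, fixed with the CIMC reboot.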
[21:50:12] PROBLEM - NTP on virt1008 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:50:13] PROBLEM - NTP on virt1003 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:50:21] PROBLEM - NTP on virt1005 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:50:39] PROBLEM - NTP on virt1004 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:50:40] New patchset: Bhartshorne; "calling the swift storage raid config what it is, adjusting it to use sdc and sdd." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12692
[21:50:57] PROBLEM - NTP on virt1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:51:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12692
[21:51:39] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12692
[21:51:42] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12692
[21:51:48] paravoid: no dice :(
[21:53:07] LeslieCarr: I think I caught your change - dhcp for the new mc servers?
[21:53:11] yeah
[21:53:13] oops forgot to merge
[21:53:16] can you merge ?
[21:54:30] New patchset: Bhartshorne; "second half of partman recipe file name change" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12693
[21:54:51] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 2.03 ms
[21:55:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/12693
[21:55:54] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/12693
[21:55:56] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12693
[21:57:10] LeslieCarr, thanks so much for the merges!
[21:57:12] it is wooooorrrrking!
[21:57:17] ottomata: woot
[21:57:23] well have a great rsynctastic weekend
[21:57:40] danke! you tooooo!
[22:06:00] New review: Lcarr; "The standard class has a mailserver (exim)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/12384
[22:07:46] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[22:13:18] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[22:15:06] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[22:23:19] !log stopping mysql on es3, reseeding slave via innodb hotbackup of es1004
[22:23:24] Logged the message, Master
[22:26:12] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[23:03:38] RECOVERY - SSH on ms-be5 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[23:03:47] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[23:04:05] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[23:04:41] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[23:07:59] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[23:10:23] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[23:11:08] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.053 second response time
[23:11:15] !log restarted apache on srv278
[23:11:20] Logged the message, Mistress of the network gear.
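The 22:23 !log above reseeds the es3 slave from an InnoDB hot backup of es1004. The exact tooling is not shown in the log; as one hedged illustration of the same general approach, using Percona XtraBackup's innobackupex in place of the commercial InnoDB Hot Backup, with hostnames, paths and replication coordinates as placeholders:

    # on the donor (es1004): take an online InnoDB backup and make it consistent
    innobackupex /srv/backups/
    innobackupex --apply-log /srv/backups/<timestamp>/

    # copy it to the slave being reseeded (es3), with mysql stopped there
    rsync -a /srv/backups/<timestamp>/ es3:/srv/mysql-datadir.new/

    # on es3: swap the datadir in, fix ownership, start mysqld, then point replication
    # at the coordinates the backup recorded (xtrabackup_binlog_info / xtrabackup_slave_info)
    mysql -e "CHANGE MASTER TO MASTER_LOG_FILE='<binlog-file>', MASTER_LOG_POS=<pos>; START SLAVE;"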
[23:12:01] since when did morebots recognize who's logging it?
[23:13:59] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[23:14:26] Jasper_Deng: since it started becoming self-aware
[23:14:40] which was...?
[23:15:21] when I started, Ryan modified the code for my username ;)
[23:15:40] a nice title it gives you xD
[23:20:17] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[23:24:31] hehe, yeah
[23:29:36] New patchset: Jalexander; "Adding WikimediaShopLink extension to labs cluster and setting required globals." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12706
[23:29:51] New patchset: Platonides; "Enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185
[23:30:53] New patchset: Platonides; "Enhance account throttling" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185
[23:34:14] New review: Platonides; "Note: all those patchsets contain the same change. I had problems with some changes where it report..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12185
[23:36:11] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[23:39:22] sigh, and I was expecting these gerrit-wm reports at #mediawiki
[23:44:16] maplebed: Regarding partman... you caught Faidon's earlier note about turning of vmedia, right?
[23:44:26] I think I missed that one.
[23:44:29] *off
[23:44:38] Ah, ok. Lemme find you a link.
[23:45:06] http://wikitech.wikimedia.org/view/Cisco_UCS_C250_M1#Virtual_Media
[23:45:19] That solved many of my problems regarding drive lettering. Not sure if that's your issue or not.
[23:45:29] unlikely.
[23:45:45] sda and b in my server (ms-be5 / dell c2100) are SSDs.
[23:46:04] I want them there (and they're there post boot), it just doesn't seem to want to boot from them.
[23:46:06] Oh, ok, thought you were working on a cisco box same as me.
[23:48:36] nope.
[23:48:42] thanks for the thought though.
[23:49:13] "doesn't seem to want to boot from them." looks like "doesn't have the drivers to read the disk"
[23:49:52] maybe... but the same install process successfully works for other SSD-based servers.
[23:50:00] same SSDs, sfaik.
[23:50:10] I suppose that is something to double check with RobH.
[23:59:02] PROBLEM - NTP on ms-be5 is CRITICAL: NTP CRITICAL: No response from NTP server
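One way to narrow down the ms-be5 "doesn't want to boot from the SSDs" question above, from the installer's console or a rescue shell, is to confirm that the disks really are sda/sdb at install time and that a boot sector was actually written to the disk the BIOS tries first. The device names are the ones from the discussion; the commands are a generic sketch, not the procedure that was actually used on this box:

    cat /proc/partitions          # are the SSDs really sda/sdb at this stage?
    fdisk -l /dev/sda             # partition table present, one partition flagged bootable?
    dd if=/dev/sda bs=512 count=1 2>/dev/null | od -An -tx1 | tail -1
    # a valid MBR ends in "55 aa"; anything else means no boot loader was ever written there

If the boot sector is in place but the BIOS still skips the disk, the boot-order or controller settings maplebed plans to double-check with RobH become the more likely culprit.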