[00:04:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.608 seconds [00:23:28] RECOVERY - Disk space on srv192 is OK: DISK OK [00:24:40] RECOVERY - MySQL Replication Heartbeat on db45 is OK: OK replication delay 0 seconds [00:25:07] RECOVERY - MySQL Slave Delay on db45 is OK: OK replication delay 0 seconds [00:31:52] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 290 seconds [00:33:31] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 386 seconds [00:39:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:45:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.129 seconds [01:05:19] New patchset: Lcarr; "Installing ssl module and certificate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:05:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2815 [01:12:24] PROBLEM - MySQL Replication Heartbeat on db1006 is CRITICAL: CRIT replication delay 2866 seconds [01:12:51] RECOVERY - MySQL Slave Delay on db1006 is OK: OK replication delay NULL seconds [01:16:34] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2815 [01:16:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [01:16:45] PROBLEM - MySQL Slave Running on db1006 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Duplicate key name page_redirect_namespace_len on query. 
De [01:22:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:25:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.580 seconds [01:31:09] PROBLEM - Disk space on srv191 is CRITICAL: DISK CRITICAL - free space: / 284 MB (3% inode=63%): /var/lib/ureadahead/debugfs 284 MB (3% inode=63%): [01:33:46] New patchset: Lcarr; "Commenting out duplicate check definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2817 [01:34:16] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2817 [01:34:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2817 [01:55:00] RECOVERY - Disk space on srv191 is OK: DISK OK [01:55:08] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:56:33] New patchset: Lcarr; "ensure sample config file removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [01:56:56] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2819 [01:57:04] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2819 [01:57:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [02:00:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds [02:12:33] RECOVERY - MySQL Slave Running on db1006 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [02:13:27] !log moved db1006 to new s6 master [02:13:30] Logged the message, Master [02:14:30] RECOVERY - MySQL Replication Heartbeat on db1006 is OK: OK replication delay 0 seconds [02:16:00] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [02:16:01] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [02:23:57] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 388 seconds [02:23:57] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 388 seconds [03:01:48] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [03:02:51] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [03:10:13] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [03:12:01] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds [03:17:52] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 303 seconds [03:18:10] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 320 seconds [04:03:48] RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Feb 28 04:03:20 UTC 2012 [04:10:42] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication 
delay 0 seconds [04:10:51] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 0 seconds [04:11:18] RECOVERY - ps1-d2-pmtpa-infeed-load-tower-A-phase-Y on ps1-d2-pmtpa is OK: ps1-d2-pmtpa-infeed-load-tower-A-phase-Y OK - 1150 [04:17:45] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 302 seconds [04:17:45] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 302 seconds [04:36:21] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [04:42:30] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [05:26:30] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [05:26:48] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [06:14:38] PROBLEM - MySQL Replication Heartbeat on db34 is CRITICAL: CRIT replication delay 21203 seconds [06:24:14] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [06:24:59] RECOVERY - MySQL Replication Heartbeat on db34 is OK: OK replication delay 0 seconds [06:25:08] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds [06:33:14] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [06:33:14] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [06:46:41] PROBLEM - NTP on srv278 is CRITICAL: NTP CRITICAL: Offset unknown [06:47:17] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [06:50:35] RECOVERY - NTP on srv278 is OK: NTP OK: Offset 0.007147789001 secs [07:09:11] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [08:21:02] hello [08:31:50] i have something strange in my replication from the master database hosts. I see an "alter table". 
is this possible and correct? [08:36:22] i wonder why we get this patch now since the fields existed already some days ago (i added them manually to our databases) [09:56:54] New patchset: Hashar; "redirect some missing Swift syslog messages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2820 [09:57:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2820 [11:20:43] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:22:30] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [11:31:45] New patchset: Hashar; "allow hashar on formey host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2821 [11:32:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2821 [13:02:59] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [13:04:02] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [13:50:09] thanks apergos - I've added https://www.mediawiki.org/wiki/SwiftMedia/status#2012-02-29 and I saw Ben already added https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/February#Site_infrastructure [13:50:17] good [13:50:28] here it is feb 28 already [13:51:29] how time flies [13:51:29] * sumanah will let others fix up what I may have missed in the distinction between Swift & SwiftMedia [13:51:29] s/I/she/ [13:51:29] apergos: It needs to be "Feb 29" in order to be autotranscluded for the monthly report [13:51:29] :-/ [13:51:30] heh [14:37:30] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours [14:44:33] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [14:44:33] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [14:46:08] New review: 
Demon; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2821 [15:27:58] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0594110526 (gt 8.0) [15:30:39] Change abandoned: Mark Bergsma; "Old" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/887 [15:38:50] Change abandoned: Mark Bergsma; "This doesn't work as it causes duplicate Puppet definitions." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1726 [15:44:27] Change abandoned: Mark Bergsma; "This does not look like it can go into production as it is now, sorry. Since it's an old change I wi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2254 [15:45:46] New review: Mark Bergsma; "This breaks the mail gateway I'm afraid" [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2446 [15:47:36] New review: Mark Bergsma; "What is the status of this change?" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2495 [15:47:46] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:49:43] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [15:52:07] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.81116736842 (gt 8.0) [16:02:01] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.60184761062 [16:26:16] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [16:35:16] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [16:37:22] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.5292930702 (gt 8.0) [16:50:45] hrmm [16:50:53] i just checked something in, shouldnt it be spamming the channel =/ [16:51:15] or does it no longer do that for test branch? [16:51:26] not since a long time ago [16:52:21] shows how often until past two days I messed with labs ;] [17:00:13] maplebed: I CC'd you [17:00:32] k. [17:00:37] they may just be MW though [17:00:41] * AaronSchulz looks [17:02:40] maplebed: more interesting is why swift purging is slow http://noc.wikimedia.org/cgi-bin/report.py?db=1.19&sort=onereal&limit=50 [17:02:53] nah, that's not terribly interesting. [17:02:59] wmfPurgeBackendThumbCache [17:03:01] ;) [17:03:13] someone was complaining about deletion slowness [17:03:32] I don't know what to make of that number. [17:03:40] is it higher or lower than it used to be? [17:04:00] it's always been that slow [17:04:57] that says 1/2 second, right? [17:05:14] that's in line with this: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&hreg[]=^ms-fe[1-9]&mreg[]=swift_GET_200_90th&mreg[]=swift_PUT_201_90th&mreg[]=swift_DELETE_204_90th&x=4&gtype=line&title=Swift+90th+percentile+query+response+time+-+200s&aggregate=1 [17:05:47] 5.86 * 1000 ms [17:06:59] AaronSchulz: that bug you gave me hasn't been updated since august of last year. 
[17:07:00] I don't think swift is involved. [17:07:03] :P [17:07:27] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.6548744737 (gt 8.0) [17:07:36] might be the wrong one, let me check [17:07:53] it is talking about tiffs... [17:08:23] its 34724 [17:08:47] ah, the one you linked me was 2 something. [17:09:13] I already see a MW bug [17:11:48] AaronSchulz: that tiff is in squid but not in swift. [17:18:41] AaronSchulz: I'm going to lay off that tiff bug because there's something going on with mediawiki. [17:21:24] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.59682 [17:23:43] maplebed: then you can look at the slowness :) [17:24:08] I'm going to be working on clearing out cruft from swift today. [17:31:18] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0625042105 (gt 8.0) [17:46:29] mark: in case you're interested, this is what happened when I told swift to forget about ms-be5: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=network_report&s=by+name&c=Swift+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [17:46:54] aka "whee, let's quick make a third copy of everything that was on ms-be5!!!" 
[17:47:21] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.23885684211 [17:47:53] hehe nice [18:01:59] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:03:01] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:04] New patchset: Mark Bergsma; "Add strontium and palladium as bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:04:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2834 [18:04:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2834 [18:08:48] New patchset: Ryan Lane; "Moving operations/software commits to operations channels." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:22:05] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.4970663158 (gt 8.0) [18:33:38] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2836 [18:33:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2836 [18:42:11] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.64073380531 [18:59:46] New patchset: Mark Bergsma; "Make strontium and palladium internal rightaway, and prepare for converting the others" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2837 [19:00:27] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2837 [19:00:28] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2837 [19:35:49] New patchset: Lcarr; "Ensuring conflicting files absent Also fixing some indenting" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2838 [19:37:21] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2838 [19:37:22] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2838 [19:56:52] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2629 [19:56:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2629 [19:57:26] New patchset: Lcarr; "switching to exec whenever (as does not trigger if nagios is not running already)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2839 [19:57:49] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2781 [19:57:49] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2781 [19:58:01] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2839 [19:58:02] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2839 [19:58:10] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2590 [19:58:11] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2590 [19:58:30] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2578 [19:58:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2578 [20:00:22] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2514 [20:02:56] New patchset: Lcarr; "amending neon search path" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2840 [20:03:20] New review: Lcarr; "(no comment)" 
[operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2840 [20:03:23] New patchset: Hashar; "pyc files are now ignored" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2514 [20:03:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2840 [20:05:28] opengear is insane fast [20:05:34] i placed the order less than two hours ago [20:05:37] its already shipped. [20:15:11] New patchset: Mark Bergsma; "Reformat" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2841 [20:15:36] New patchset: Mark Bergsma; "Let's try raid1-lvm.cfg for bits servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2842 [20:16:06] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2841 [20:16:07] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2841 [20:16:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2842 [20:16:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2842 [20:18:57] RobH: I placed 4 orders today, 3 out of 4 shipped within an hour [20:19:20] netherlands vendors are a lot more on the ball than their US counterparts ;P [20:19:42] though i suppose i shouldnt think of this like a server order, its an on shelf item [20:19:55] yeah not all [20:19:56] it really depends [20:20:20] the fibers didnt ship yet ;] (though they say most orders before 2pm est ship same day) [20:20:30] so i expect them in by friday [20:20:33] at latest [20:20:40] i ordered a hdmi-cec adapter, a cat6 wall socket, some cables and a hdmi-over-utp adapter kit [20:20:50] cleaning up the home theater system? 
[20:20:59] no I bought a projector [20:21:09] so extending it into the bedroom [20:21:33] i am both impressed by it, and feel the point to chant 'nerd' or 'one of us' [20:21:42] haha [20:21:50] a friend of mine had projectors at home, it was awesome for xbox parties. [20:22:02] i would borrow one from work for it as well [20:22:08] lots of nerds playing halo. [20:37:09] RECOVERY - Host db1004 is UP: PING WARNING - Packet loss = 93%, RTA = 1924.98 ms [20:40:08] !log streaming hot backup of db1006 to db1040 [20:40:11] Logged the message, Master [20:41:03] PROBLEM - mysqld processes on db1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:41:12] don't overheat my switches asher [20:42:30] but they need a workout [20:42:49] I hope you did a proper warmup first then [20:43:23] !log streaming a hotbackup of db1038 to db1004 [20:43:25] Logged the message, Master [20:43:59] i must not have, the eqiad db's are a total strain [20:44:32] i hope the grounding issue fixes it [20:45:39] New patchset: Mark Bergsma; "Fix syntax, add missing 'echo'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2843 [20:47:02] New review: Mark Bergsma; "lint check can kiss my..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2843 [20:47:10] New review: Mark Bergsma; "lint check can kiss my..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2843 [20:47:11] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2843 [21:04:04] New patchset: Mark Bergsma; "Really fix syntax this time. And coffee." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2844 [21:04:57] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2844 [21:04:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2844 [21:20:51] New patchset: Mark Bergsma; "Readd the mountpoint, accidently removed by the reformat" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2845 [21:21:24] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2845 [21:21:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2845 [21:28:01] LeslieCarr: puppet linter got merged in :) Doc in commit message of https://gerrit.wikimedia.org/r/#change,2629 [21:28:08] ok cool :) [21:33:38] New review: Reedy; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2821 [21:34:21] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:35:33] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:39:42] New patchset: Lcarr; "Adding in proper cgi.cfg for nagios3 and moving all nagios3 specific files to their own folder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:41:09] anyone know what "warning: Implicit invocation of 'puppet apply' by passing files (or flags) directly to 'puppet' is deprecated, and will be removed in the 2.8 series. Please invoke 'puppet apply' directly in the future." is ? the error isn't popping up in my local lint check [21:44:52] PROBLEM - MySQL Idle Transactions on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:45:10] PROBLEM - MySQL Recent Restart on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:45:28] PROBLEM - SSH on db24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:45:55] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:04] PROBLEM - DPKG on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:04] PROBLEM - mysqld processes on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:22] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:22] PROBLEM - MySQL Slave Running on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:31] PROBLEM - Disk space on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:31] PROBLEM - MySQL disk space on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:41] PROBLEM - RAID on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:46:41] PROBLEM - Full LVS Snapshot on db24 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:48:07] New review: Lcarr; "gerrit borked" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2846 [21:48:11] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2846 [21:49:36] New patchset: Ryan Lane; "Revert "rm ansi sequences when validating puppet changes". Unfortunately broke the lint check." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2858 [21:49:58] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2858 [21:49:59] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2858 [21:57:57] !log Setup servers strontium and palladium as additional (internal) bits servers in eqiad. 
awaiting connection of eth1-3 before deployment [21:57:59] Logged the message, Master [22:04:37] New patchset: Hashar; "rm ansi sequences when validating puppet changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2863 [22:09:41] PROBLEM - NTP on db24 is CRITICAL: NTP CRITICAL: No response from NTP server [22:16:29] New patchset: Lcarr; "Fixing file paths" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2864 [22:17:18] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2864 [22:17:19] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2864 [22:22:08] New patchset: Lcarr; "removing default site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2865 [22:22:32] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2865 [22:23:28] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2865 [22:23:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2865 [22:30:32] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:32:29] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:34:26] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:36:23] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:38:02] RECOVERY - Disk space on db40 is OK: DISK OK [22:38:20] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:39:05] RECOVERY - MySQL disk space on db40 is OK: DISK OK [22:40:17] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:42:14] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours 
[22:44:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:46:17] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:48:14] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:48:33] !log rebuilding manganese to act as new gerrit server [22:48:35] Logged the message, Master [22:49:26] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 22:49:16 UTC 2012 [22:50:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:52:08] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:53:29] PROBLEM - RAID on db40 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:05] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:56:02] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [22:57:59] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:00:05] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:02:02] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:03:59] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:05:02] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [23:05:56] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [23:05:56] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:07:17] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:06:55 UTC 2012 [23:07:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:09:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the 
last 10 hours [23:11:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:13:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:15:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:17:41] New patchset: Ryan Lane; "Adding manganese as a new gerrit server." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2866 [23:17:44] ^demon|away: ^^ [23:17:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:18:23] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2866 [23:18:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2866 [23:19:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:20:02] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:19:50 UTC 2012 [23:21:21] New patchset: Ryan Lane; "Removing sumanah from manganese, for now." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2867 [23:21:41] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:22:35] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2867 [23:22:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2867 [23:23:38] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:25:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:27:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:29:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:30:05] New patchset: Ryan Lane; "Adding an apache server to the gerrit proxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2868 [23:30:50] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Tue Feb 28 23:30:42 UTC 2012 [23:31:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:32:10] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2868 [23:32:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2868 [23:34:31] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:36:37] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:37:41] why is nagios-wm going crazy ? 
[23:38:34] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:38:48] New patchset: Ryan Lane; "Fix requirement chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2869 [23:40:31] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:42:28] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:44:11] New patchset: Ryan Lane; "Prefer RC4 ciphers to combat BEAST" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2870 [23:44:25] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:44:34] RECOVERY - SSH on db24 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [23:44:34] RECOVERY - MySQL Idle Transactions on db24 is OK: OK longest blocking idle transaction sleeps for seconds [23:44:52] RECOVERY - Disk space on db24 is OK: DISK OK [23:45:01] RECOVERY - MySQL Recent Restart on db24 is OK: OK seconds since restart [23:45:01] RECOVERY - RAID on db24 is OK: OK: 1 logical device(s) checked [23:45:10] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay seconds [23:45:37] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay seconds [23:45:46] RECOVERY - DPKG on db24 is OK: All packages OK [23:45:52] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2869 [23:45:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2869 [23:46:03] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2870 [23:46:04] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2870 [23:46:04] RECOVERY - MySQL Slave Running on db24 is OK: OK replication [23:46:22] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours 
[23:46:40] bah. I obviously didn't review that well [23:47:45] New patchset: Ryan Lane; "Fix stupid requirement copy/paste" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2871 [23:48:19] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:48:54] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2871 [23:48:55] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2871 [23:50:16] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:51:10] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 7830 seconds [23:51:55] PROBLEM - MySQL Slave Running on db24 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [23:52:22] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:54:19] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:56:16] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours [23:58:13] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours