[00:01:44] maplebed: your new swift storage nodes are ordered
[00:01:50] win!
[00:02:05] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.223 seconds
[00:02:14] also, fyi: if you need to do power and drac shit to them
[00:02:21] ipmi_mgmt is a script on sockpuppet now
[00:02:29] running it with no arguments shows help
[00:02:43] going to swap all dells over to use that and document it on wikitech
[00:02:46] but fyi for now
[00:04:27] tnx.
[00:05:55] PROBLEM - Disk space on db1042 is CRITICAL: Connection refused by host
[00:05:55] RECOVERY - MySQL disk space on db30 is OK: DISK OK
[00:07:35] PROBLEM - DPKG on srv300 is CRITICAL: Connection refused by host
[00:07:35] PROBLEM - DPKG on srv225 is CRITICAL: Connection refused by host
[00:09:15] PROBLEM - RAID on mw1129 is CRITICAL: Connection refused by host
[00:09:25] PROBLEM - DPKG on srv263 is CRITICAL: Connection refused by host
[00:09:25] PROBLEM - Disk space on srv300 is CRITICAL: Connection refused by host
[00:10:15] PROBLEM - DPKG on mw1129 is CRITICAL: Connection refused by host
[00:10:16] RobH: can you check db47's raid? it had a drive replaced (ticket closed yesterday) but it still shows a span as degraded. both drives in it are marked as online now though
[00:10:45] PROBLEM - RAID on mw69 is CRITICAL: Connection refused by host
[00:11:25] PROBLEM - Disk space on srv263 is CRITICAL: Connection refused by host
[00:11:38] yea i will take a glance at it
[00:12:15] PROBLEM - Disk space on mw1129 is CRITICAL: Connection refused by host
[00:13:05] RECOVERY - mysqld processes on db36 is OK: PROCS OK: 1 process with command name mysqld
[00:13:47] PROBLEM - DPKG on mw1147 is CRITICAL: Connection refused by host
[00:14:15] PROBLEM - DPKG on srv199 is CRITICAL: Connection refused by host
[00:14:35] PROBLEM - DPKG on db1042 is CRITICAL: Connection refused by host
[00:14:45] PROBLEM - RAID on srv300 is CRITICAL: Connection refused by host
[00:15:05] PROBLEM - RAID on srv225 is CRITICAL: Connection refused by host
[00:16:23] PROBLEM - DPKG on snapshot1001 is CRITICAL: Connection refused by host
[00:16:33] PROBLEM - RAID on srv193 is CRITICAL: Connection refused by host
[00:16:33] PROBLEM - DPKG on srv216 is CRITICAL: Connection refused by host
[00:16:43] PROBLEM - RAID on srv263 is CRITICAL: Connection refused by host
[00:16:43] PROBLEM - RAID on srv269 is CRITICAL: Connection refused by host
[00:17:03] PROBLEM - MySQL disk space on db1042 is CRITICAL: Connection refused by host
[00:17:13] PROBLEM - Disk space on ganglia1001 is CRITICAL: Connection refused by host
[00:17:33] RECOVERY - DPKG on srv300 is OK: All packages OK
[00:17:40] New patchset: Asher; "monitoring for all core dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2135
[00:17:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2135
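The ipmi_mgmt script mentioned at the top of this log isn't shown here; as a rough sketch of the kind of ipmitool calls such a wrapper typically fronts for Dell DRAC/BMC power work, something like the following, where the management hostname and password file are placeholders rather than the actual sockpuppet setup:

    # query and control chassis power over the IPMI lanplus interface
    MGMT_HOST="db47.mgmt.pmtpa.wmnet"   # hypothetical management interface name
    ipmitool -I lanplus -H "$MGMT_HOST" -U root -f /root/.ipmi_pass chassis power status
    ipmitool -I lanplus -H "$MGMT_HOST" -U root -f /root/.ipmi_pass chassis power cycle
    # serial-over-LAN console, the usual remote-debugging part of "drac shit"
    ipmitool -I lanplus -H "$MGMT_HOST" -U root -f /root/.ipmi_pass sol activate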
[00:18:03] PROBLEM - Disk space on mw69 is CRITICAL: Connection refused by host
[00:18:13] PROBLEM - DPKG on ganglia1001 is CRITICAL: Connection refused by host
[00:18:13] PROBLEM - DPKG on srv193 is CRITICAL: Connection refused by host
[00:18:23] PROBLEM - Disk space on srv225 is CRITICAL: Connection refused by host
[00:18:33] PROBLEM - DPKG on srv269 is CRITICAL: Connection refused by host
[00:18:33] RECOVERY - Disk space on srv300 is OK: DISK OK
[00:18:38] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2135
[00:18:38] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2135
[00:19:33] RECOVERY - DPKG on mw1129 is OK: All packages OK
[00:19:33] RECOVERY - DPKG on mw1147 is OK: All packages OK
[00:20:13] PROBLEM - Disk space on srv193 is CRITICAL: Connection refused by host
[00:20:23] PROBLEM - DPKG on srv261 is CRITICAL: Connection refused by host
[00:20:23] PROBLEM - Disk space on srv216 is CRITICAL: Connection refused by host
[00:21:23] PROBLEM - Disk space on srv264 is CRITICAL: Connection refused by host
[00:21:33] RECOVERY - DPKG on srv263 is OK: All packages OK
[00:21:33] RECOVERY - Disk space on mw1129 is OK: DISK OK
[00:22:03] PROBLEM - RAID on snapshot1001 is CRITICAL: Connection refused by host
[00:22:33] PROBLEM - Disk space on srv261 is CRITICAL: Connection refused by host
[00:22:33] PROBLEM - DPKG on srv264 is CRITICAL: Connection refused by host
[00:22:43] PROBLEM - RAID on srv261 is CRITICAL: Connection refused by host
[00:22:53] RECOVERY - DPKG on db1042 is OK: All packages OK
[00:23:13] PROBLEM - RAID on ganglia1001 is CRITICAL: Connection refused by host
[00:23:23] PROBLEM - Disk space on srv269 is CRITICAL: Connection refused by host
[00:23:33] RECOVERY - DPKG on srv199 is OK: All packages OK
[00:24:13] PROBLEM - RAID on srv264 is CRITICAL: Connection refused by host
[00:24:13] RECOVERY - Disk space on srv263 is OK: DISK OK
[00:24:23] PROBLEM - RAID on srv216 is CRITICAL: Connection refused by host
[00:24:23] PROBLEM - Disk space on srv223 is CRITICAL: Connection refused by host
[00:24:23] RECOVERY - RAID on srv225 is OK: OK: no RAID installed
[00:25:53] RECOVERY - Disk space on db1042 is OK: DISK OK
[00:26:03] RECOVERY - RAID on mw1129 is OK: OK: no RAID installed
[00:26:23] RECOVERY - RAID on srv300 is OK: OK: no RAID installed
[00:26:33] RECOVERY - DPKG on snapshot1001 is OK: All packages OK
[00:26:53] PROBLEM - RAID on srv190 is CRITICAL: Connection refused by host
[00:26:53] RECOVERY - RAID on srv193 is OK: OK: no RAID installed
[00:26:53] RECOVERY - RAID on srv263 is OK: OK: no RAID installed
[00:27:03] RECOVERY - RAID on srv269 is OK: OK: no RAID installed
[00:27:23] PROBLEM - RAID on mw1142 is CRITICAL: Connection refused by host
[00:27:23] RECOVERY - MySQL disk space on db1042 is OK: DISK OK
[00:27:23] RECOVERY - Disk space on ganglia1001 is OK: DISK OK
[00:27:53] RECOVERY - DPKG on srv225 is OK: All packages OK
[00:28:03] RECOVERY - Disk space on mw69 is OK: DISK OK
[00:28:13] PROBLEM - DPKG on srv235 is CRITICAL: Connection refused by host
[00:28:23] PROBLEM - DPKG on srv190 is CRITICAL: Connection refused by host
[00:28:23] RECOVERY - DPKG on srv193 is OK: All packages OK
[00:28:23] RECOVERY - DPKG on ganglia1001 is OK: All packages OK
[00:28:33] RECOVERY - Disk space on srv225 is OK: DISK OK
[00:28:33] RECOVERY - DPKG on srv269 is OK: All packages OK
[00:28:43] PROBLEM - RAID on srv223 is CRITICAL: Connection refused by host
[00:28:43] PROBLEM - Disk space on srv235 is CRITICAL: Connection refused by host
[00:29:33] New patchset: Asher; "experiment at avoiding endlessly appending to nagios svc check files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2136
[00:29:43] PROBLEM - DPKG on mw1142 is CRITICAL: Connection refused by host
[00:29:45] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/2136
[00:30:03] RECOVERY - RAID on mw69 is OK: OK: no RAID installed
[00:30:23] PROBLEM - Disk space on srv190 is CRITICAL: Connection refused by host
[00:30:23] RECOVERY - Disk space on srv193 is OK: DISK OK
[00:30:23] PROBLEM - RAID on srv204 is CRITICAL: Connection refused by host
[00:30:43] PROBLEM - RAID on srv210 is CRITICAL: Connection refused by host
[00:30:43] RECOVERY - Disk space on srv216 is OK: DISK OK
[00:30:53] RECOVERY - DPKG on srv261 is OK: All packages OK
[00:31:33] RECOVERY - Disk space on srv264 is OK: DISK OK
[00:32:03] PROBLEM - RAID on mw25 is CRITICAL: Connection refused by host
[00:32:23] RECOVERY - RAID on snapshot1001 is OK: OK: no RAID installed
[00:32:33] PROBLEM - DPKG on srv223 is CRITICAL: Connection refused by host
[00:32:43] RECOVERY - Disk space on srv261 is OK: DISK OK
[00:32:43] RECOVERY - DPKG on srv264 is OK: All packages OK
[00:32:53] RECOVERY - RAID on srv261 is OK: OK: no RAID installed
[00:33:03] PROBLEM - Disk space on mw1142 is CRITICAL: Connection refused by host
[00:33:23] RECOVERY - RAID on ganglia1001 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[00:33:53] RECOVERY - Disk space on srv269 is OK: DISK OK
[00:34:03] PROBLEM - Disk space on srv204 is CRITICAL: Connection refused by host
[00:34:03] PROBLEM - DPKG on srv204 is CRITICAL: Connection refused by host
[00:34:23] RECOVERY - RAID on srv264 is OK: OK: no RAID installed
[00:34:33] PROBLEM - Disk space on srv210 is CRITICAL: Connection refused by host
[00:34:33] RECOVERY - RAID on srv216 is OK: OK: no RAID installed
[00:34:43] RECOVERY - Disk space on srv223 is OK: DISK OK
[00:35:03] PROBLEM - DPKG on mw25 is CRITICAL: Connection refused by host
[00:36:13] PROBLEM - RAID on mw1115 is CRITICAL: Connection refused by host
[00:36:33] PROBLEM - Disk space on mw25 is CRITICAL: Connection refused by host
[00:36:53] RECOVERY - DPKG on srv216 is OK: All packages OK
[00:37:03] RECOVERY - RAID on srv190 is OK: OK: no RAID installed
[00:38:13] PROBLEM - DPKG on srv288 is CRITICAL: Connection refused by host
[00:38:33] RECOVERY - DPKG on srv190 is OK: All packages OK
[00:38:43] RECOVERY - DPKG on srv235 is OK: All packages OK
[00:38:53] RECOVERY - RAID on srv223 is OK: OK: no RAID installed
[00:38:53] PROBLEM - Disk space on srv288 is CRITICAL: Connection refused by host
[00:38:53] RECOVERY - Disk space on srv235 is OK: DISK OK
[00:39:53] RECOVERY - DPKG on mw1142 is OK: All packages OK
[00:40:03] PROBLEM - DPKG on mw1115 is CRITICAL: Connection refused by host
[00:40:33] PROBLEM - RAID on srv191 is CRITICAL: Connection refused by host
[00:40:43] RECOVERY - Disk space on srv190 is OK: DISK OK
[00:40:43] RECOVERY - RAID on srv210 is OK: OK: no RAID installed
[00:41:53] PROBLEM - Disk space on mw1105 is CRITICAL: Connection refused by host
[00:41:53] PROBLEM - Disk space on mw1115 is CRITICAL: Connection refused by host
[00:42:13] RECOVERY - RAID on mw25 is OK: OK: no RAID installed
[00:42:33] PROBLEM - DPKG on srv191 is CRITICAL: Connection refused by host
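Asher's r2136 above (abandoned as "fail" later in the log) targets a real failure mode: if each puppet run appends service definitions to a shared checks file, re-runs duplicate them until nagios drowns. A minimal sketch of the usual fix — regenerate the file from scratch and swap it in atomically — assuming a hypothetical generate_service_checks helper; none of this is from the actual patchset:

    # regenerate the full nagios service-check file instead of appending to it
    generate_service_checks > /etc/nagios/puppet_checks.cfg.new   # hypothetical generator
    # atomic rename, so nagios never reads a half-written config
    mv /etc/nagios/puppet_checks.cfg.new /etc/nagios/puppet_checks.cfg
    # reload only if the merged config validates
    nagios3 -v /etc/nagios/nagios.cfg && service nagios3 reload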
[00:42:53] RECOVERY - DPKG on srv223 is OK: All packages OK
[00:43:23] RECOVERY - Disk space on mw1142 is OK: DISK OK
[00:44:13] RECOVERY - DPKG on srv204 is OK: All packages OK
[00:44:23] RECOVERY - Disk space on srv204 is OK: DISK OK
[00:44:44] RECOVERY - Disk space on srv210 is OK: DISK OK
[00:45:13] RECOVERY - DPKG on mw25 is OK: All packages OK
[00:45:35] binasher: so yea, db47's disk was changed, to another bad disk
[00:46:23] PROBLEM - DPKG on mw1105 is CRITICAL: Connection refused by host
[00:46:23] RECOVERY - RAID on mw1115 is OK: OK: no RAID installed
[00:46:33] PROBLEM - RAID on mw1105 is CRITICAL: Connection refused by host
[00:46:53] RECOVERY - Disk space on mw25 is OK: DISK OK
[00:47:43] RECOVERY - RAID on mw1142 is OK: OK: no RAID installed
[00:48:00] so i think something is up with spence
[00:48:33] RECOVERY - DPKG on srv288 is OK: All packages OK
[00:49:03] RECOVERY - Disk space on srv288 is OK: DISK OK
[00:50:13] RECOVERY - DPKG on mw1115 is OK: All packages OK
[00:50:43] RECOVERY - RAID on srv204 is OK: OK: no RAID installed
[00:51:03] RECOVERY - RAID on srv191 is OK: OK: no RAID installed
[00:52:03] RECOVERY - Disk space on mw1115 is OK: DISK OK
[00:52:03] RECOVERY - Disk space on mw1105 is OK: DISK OK
[00:52:53] RECOVERY - DPKG on srv191 is OK: All packages OK
[00:53:08] LeslieCarr: what's up with it?
[00:53:25] just the craziness with nagios, taking forever to do puppet runs
[00:56:33] RECOVERY - DPKG on mw1105 is OK: All packages OK
[00:56:44] RECOVERY - RAID on mw1105 is OK: OK: no RAID installed
[01:00:18] Change abandoned: Asher; "fail" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2136
[01:03:57] New patchset: Ottomata; "Tab -> spaces, formatting changes for PEP8." [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2137
[01:03:58] New patchset: Ottomata; "observation.py - fixed __str__ method" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2138
[01:09:17] New patchset: Ottomata; "Changes to Pipeline classes + unit tests. Need to talk about this with Diederik (which is why it is in a new branch!)" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2139
[01:27:51] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT replication delay 172315 seconds
[01:29:11] PROBLEM - MySQL Replication Heartbeat on db48 is CRITICAL: NRPE: Unable to read output
[01:29:16] New patchset: Pyoungmeister; "adding cp1001 and 1002 as ganglia agregators for eqiad text squids" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2140
[01:29:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2140
[01:30:32] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2140
[01:30:32] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2140
[01:33:01] PROBLEM - MySQL Replication Heartbeat on db49 is CRITICAL: NRPE: Unable to read output
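For the db47 thread above (drive replaced, span still degraded, replacement disk also bad), the usual way to confirm this on a Dell PERC-style controller is MegaCli; the adapter/enclosure defaults here are the common ones, not verified against db47:

    # logical drive state: a healthy span reports "Optimal", a broken/rebuilding one "Degraded"
    MegaCli -LDInfo -Lall -aALL | grep -i state
    # per-physical-drive state and error counters; a bad replacement disk shows nonzero error counts
    MegaCli -PDList -aALL | egrep 'Slot|Firmware state|Media Error|Other Error'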
[01:39:13] New patchset: Catrope; "WIP puppetization of fatal error log (RT 623). DO NOT MERGE" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2141
[01:58:36] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: NRPE: Unable to read output
[02:03:46] PROBLEM - MySQL Replication Heartbeat on db1048 is CRITICAL: NRPE: Unable to read output
[02:28:53] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[02:40:43] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[04:15:38] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:26:18] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:33:48] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[04:41:48] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:49:13] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2139
[04:50:01] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2138
[04:51:30] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2137
[04:51:30] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2139
[04:51:31] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2138
[04:51:31] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2137
[05:50:24] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:16] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[09:07:16] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[09:57:36] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 402461 MB (3% inode=99%):
[09:58:46] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 395279 MB (3% inode=99%):
[10:04:04] someone who is good with git around?
[10:04:17] I am wondering where the puppet files used on labs live
[10:04:40] I checked out the test branch but I don't see the files I merged a while ago
[10:04:55] the weird thing is that they are present on the machines, but not in the repo
[10:05:07] I believe I am in the wrong branch or something
[10:05:19] I guess Leslie is not around now?
[10:05:26] * Leslie Carr
[10:05:51] because she fixed it last time
[10:07:58] hi Reedy
[10:08:10] you know git well?
[10:08:29] Nope
[10:10:00] lol
[10:10:01] Why?
[10:10:20] I need to find where the current puppet config files live
[10:10:38] of labs
[10:10:40] not prod
[11:06:45] RECOVERY - MySQL slave status on es1004 is OK: OK:
[12:39:37] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[12:51:37] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[14:23:47] methecooldude: hey
[14:23:51] around?
[14:24:02] you've been doing some gerrit stuff, or not?
[14:33:17] someone who understands gerrit :o
[14:37:03] !log dist-upgrading storage3
[14:37:05] Logged the message, Master
[14:44:54] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[14:49:15] mark around?
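For the labs puppet question above (merged files visible on the machines but missing from the checkout), the usual diagnosis is that the changes landed on a different branch than the one checked out. A generic way to check, with the branch and path names here only illustrative:

    # refresh and list all remote branches, and see which one you're actually on
    git fetch --all
    git branch -a
    # find which commits, on any branch, touched the missing file
    git log --all --oneline -- manifests/site.pp   # path is illustrative
    # then see which remote branches contain one of those commits
    git branch -r --contains <commit>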
[17:15:16] !log s2 dbs are a sad lot. streaming hotbackup of db1034 to db54 to build a new slave
[17:15:18] Logged the message, Master
[17:32:15] New patchset: Diederik; "Initial commit" [analytics/udp-filters] (master) - https://gerrit.wikimedia.org/r/2142
[17:57:42] roan: shall we do it now before everybody is in the office?
[18:20:19] anyone want to see if they can help out with https://bugs.launchpad.net/ubuntu/+source/php5/+bug/922650
[18:20:27] it would fix a php bug on commons
[18:32:34] !log dns update for fluorine host
[18:32:35] Logged the message, RobH
[18:48:43] maplebed: mechanical! http://en.wikipedia.org/wiki/Microelectromechanical_systems
[18:56:07] andrew_wmf: i raise you http://en.wikipedia.org/wiki/Micro_Machines
[18:56:39] http://www.youtube.com/watch?v=j2egGfd5j_k
[18:57:39] RobH: I think that's something else :)
[18:57:47] it's better.
[18:57:48] ;]
[18:58:12] Although I would totally buy a 1/1,000,000-scale remote control car.
[19:15:53] PROBLEM - check_gcsip on payments4 is CRITICAL: Connection timed out
[19:15:53] PROBLEM - check_gcsip on payments1 is CRITICAL: Connection timed out
[19:15:53] PROBLEM - check_gcsip on payments3 is CRITICAL: Connection timed out
[19:15:53] PROBLEM - check_gcsip on payments2 is CRITICAL: CRITICAL - Cannot make SSL connection
[19:18:13] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[19:18:13] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[19:20:23] RECOVERY - check_gcsip on payments2 is OK: HTTP OK: HTTP/1.1 200 OK - 378 bytes in 0.175 second response time
[19:20:23] RECOVERY - check_gcsip on payments1 is OK: HTTP OK: HTTP/1.1 200 OK - 378 bytes in 0.161 second response time
[19:20:23] RECOVERY - check_gcsip on payments4 is OK: HTTP OK: HTTP/1.1 200 OK - 378 bytes in 0.171 second response time
[19:20:23] RECOVERY - check_gcsip on payments3 is OK: HTTP OK: HTTP/1.1 200 OK - 378 bytes in 0.157 second response time
[19:30:14] New review: Catrope; "Comments inline. I apologize for my pedantry :)" [analytics/udp-filters] (master) C: 0; - https://gerrit.wikimedia.org/r/2142
[19:32:23] RECOVERY - RAID on ms-fe2 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[19:32:23] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[19:35:33] RECOVERY - DPKG on ms-fe1 is OK: All packages OK
[19:35:43] RECOVERY - DPKG on ms-fe2 is OK: All packages OK
[19:37:18] !log reinstalling sq31
[19:37:20] Logged the message, RobH
[19:39:43] RECOVERY - Disk space on ms-fe1 is OK: DISK OK
[19:40:03] RECOVERY - Disk space on ms-fe2 is OK: DISK OK
[19:41:23] RECOVERY - Memcached on ms-fe1 is OK: TCP OK - 0.003 second response time on port 11211
[19:46:02] New patchset: Asher; "db54 -> s2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2143
[19:46:29] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2143
[19:46:30] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2143
[20:00:25] New patchset: Bhartshorne; "inserting real AUTH key for pmtpa prod swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2144
[20:00:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2144
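The "streaming hotbackup of db1034 to db54" logged above is the standard xtrabackup recipe for rebuilding a slave without stopping the source. A minimal sketch, with the datadir paths assumed rather than taken from the log:

    # on db1034: take a hot backup and stream it as tar to the new slave
    # (tar on the receiving end needs -i to skip the zeroed blocks xtrabackup emits)
    innobackupex --stream=tar /tmp | ssh db54 'tar -ixf - -C /a/sqldata'
    # on db54 afterwards: replay the redo log so the datadir is consistent, then fix ownership
    innobackupex --apply-log /a/sqldata
    chown -R mysql:mysql /a/sqldata

The mysqld-processes CRITICAL on db54 at 21:18 and its recovery at 23:52 later in this log bracket exactly this kind of rebuild window.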
[20:00:43] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2144
[20:00:43] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2144
[20:04:47] New patchset: Bhartshorne; "adding in accept all traffic from localhost" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2145
[20:05:03] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2145
[20:05:04] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2145
[20:05:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2145
[20:17:52] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:18:02] PROBLEM - RAID on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:32:12] PROBLEM - DPKG on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:32:22] PROBLEM - DPKG on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:34:02] PROBLEM - Disk space on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:34:02] PROBLEM - Disk space on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:49:02] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:03:50] LeslieCarr: http://www.noob.us/entertainment/mr-ghetto-walmart/
[21:18:20] PROBLEM - mysqld processes on db54 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[21:28:04] !log dns update
[21:28:05] Logged the message, RobH
[21:34:30] !log applying loopback filter on cr1-eqiad
[21:34:31] Logged the message, Mistress of the network gear.
[21:46:50] LeslieCarr: would you be able to tag yttrium on asw-a4-eqiad into the public services vlan for that row please?
[21:46:53] or should i drop you a ticket?
[21:47:01] (ports are name labeled on switch)
[21:47:16] please don't say tag :)
[21:47:18] that's something else
[21:47:23] as you'll find out next week
[21:48:03] RobH sure i can put it in that vlan
[21:51:01] cool, ping me when done so i can do install on it thx =]
[22:23:19] RECOVERY - Disk space on ms-fe2 is OK: DISK OK
[22:25:10] New patchset: Bhartshorne; "adding iptables rules to allow connections to port 80 for anybody that wants to talk to swift" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2146
[22:25:27] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2146
[22:25:27] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2146
[22:25:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2146
[22:48:29] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:49:49] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[22:57:29] PROBLEM - Disk space on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:02:59] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
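The two swift firewall changes merged above (r2145 "accept all traffic from localhost", r2146 "allow connections to port 80") live in puppet manifests not shown here; in raw iptables terms their intent amounts to roughly the following, with chain names and rule order being the stock defaults rather than copied from the manifests:

    # accept everything arriving on the loopback interface (r2145's intent)
    iptables -A INPUT -i lo -j ACCEPT
    # let anyone reach the swift proxy on port 80 (r2146's intent)
    iptables -A INPUT -p tcp --dport 80 -j ACCEPT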
[23:09:12] LeslieCarr: is there any continuing reason for the root of ganglia-web to be http://ganglia.wikimedia.org/2.2.0/ ?
[23:09:47] so we can have multiple installs running simultaneously - i did the 2.2.0 so that people stopped complaining about it being called /ganglia again ;)
[23:10:12] can / just be the root?
[23:11:20] Ryan_Lane: why was it a better idea to have it redirect to another directory again ?
[23:11:25] Ryan had a very convincing argument
[23:11:29] that I should have written down :)
[23:12:24] i want to be able to have permanent ganglia links that don't require a redirect or aren't bound to a version
[23:13:21] even if it's something silly like /latestganglia!!!/graph.php that is internally rewritten to the current latest install
[23:13:34] but the point being, internally rewritten, not redirected
[23:15:38] i could call the default "latest"
[23:15:51] or "current"
[23:16:48] that would work
[23:18:05] why doesn't the office have a nap room?
[23:18:17] * RobH needs a nap
[23:22:08] I thought it did?
[23:22:31] Baby feeding room?
[23:22:38] where is that?
[23:22:44] i doubt they want me napping in there ;]
[23:22:57] we need kittens.. and then feed the kittens in there
[23:23:13] wtf do we have a baby feeding room for?
[23:23:19] isn't any unused meeting room that?
[23:23:30] why are folks bringing babies to the office?
[23:23:46] heh
[23:23:54] i think it's not only for that, just that takes priority
[23:23:57] or something
[23:24:03] * RobH doesn't really care if all the mothers breastfeeding start hating him, it only makes him stronger.
[23:24:15] this must be on the 6th floor
[23:24:24] yeah
[23:29:29] RobH: it is indeed on the 6th floor, next to the board room, has a nice little couch thing and is currently unoccupied
[23:29:43] the 6th floor scares me
[23:30:05] lol, it is really quiet up here today, so many people are not here
[23:30:32] in fact, i only count 5 people on this side of the floor
[23:32:48] Are the rest under the floor?
[23:33:08] i wish we had raised floors, i miss them
[23:44:24] LeslieCarr: yeah. I'm not a huge fan of the version in the url
[23:44:50] LeslieCarr: usually it's good to alias /ganglia to
[23:45:07] but the scheme you guys came up with is good too
[23:45:19] hehe but everyone was complaining about calling it /ganglia because it seemed redundant
[23:45:25] having the version breaks links when the version upgrades
[23:45:37] mail.google.com/mail <--
[23:45:56] i'll rename it to "latest"
[23:49:28] it can't just be /?
[23:52:20] New patchset: Lcarr; "Changing the ganglia redirect to /latest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2147
[23:52:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2147
[23:52:45] RECOVERY - mysqld processes on db54 is OK: PROCS OK: 1 process with command name mysqld
[23:52:48] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2147
[23:52:48] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2147
[23:55:10] binasher: done
[23:55:15] it's redirecting to /latest
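The redirect-versus-rewrite distinction Ryan_Lane draws above is visible from the client side: a redirect answers with a 3xx status and a Location header sending the browser to the versioned path, while an internal rewrite or alias serves the content at the stable URL directly, so old links never break across upgrades. A quick way to observe which one a server is doing (the hostname is the one under discussion; the exact responses after r2147 aren't in the log):

    # a redirect shows a 301/302 status plus a Location: header naming the target dir
    curl -sI http://ganglia.wikimedia.org/ | head -5
    # follow any redirects and print the final URL the client lands on
    curl -sIL -o /dev/null -w '%{url_effective}\n' http://ganglia.wikimedia.org/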
[23:56:15] New patchset: Bhartshorne; "moving pmtpa swift cluster to treat upload as its backend while populating the cache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2148
[23:56:31] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2148
[23:56:32] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2148
[23:56:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2148