[00:26:59] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [00:42:12] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [01:53:49] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [02:42:07] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 65055 MB (3% inode=99%): [02:49:55] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [02:53:13] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 61995 MB (3% inode=99%): [04:19:01] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No [06:05:09] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 63754 MB (3% inode=99%): [07:18:30] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 453216 MB (3% inode=99%): [07:29:18] PROBLEM - SSH on db25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:29:27] PROBLEM - Full LVS Snapshot on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:27] PROBLEM - MySQL Slave Delay on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:45] PROBLEM - MySQL Replication Heartbeat on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:45] PROBLEM - mysqld processes on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:45] PROBLEM - MySQL disk space on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:54] PROBLEM - MySQL Slave Running on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:54] PROBLEM - MySQL Idle Transactions on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:30:03] PROBLEM - MySQL Recent Restart on db25 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:51:08] PROBLEM - NTP on db25 is CRITICAL: NTP CRITICAL: No response from NTP server [08:36:07] RECOVERY - SSH on db25 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:36:07] RECOVERY - Full LVS Snapshot on db25 is OK: OK no full LVM snapshot volumes [08:36:25] RECOVERY - mysqld processes on db25 is OK: PROCS OK: 1 process with command name mysqld [08:36:43] RECOVERY - MySQL disk space on db25 is OK: DISK OK [08:37:10] RECOVERY - MySQL Replication Heartbeat on db25 is OK: OK replication delay seconds [08:37:10] RECOVERY - NTP on db25 is OK: NTP OK: Offset 0.0007747411728 secs [08:38:04] RECOVERY - MySQL Idle Transactions on db25 is OK: OK longest blocking idle transaction sleeps for 0 seconds [08:38:04] RECOVERY - MySQL Slave Running on db25 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [08:38:22] RECOVERY - MySQL Recent Restart on db25 is OK: OK 5574705 seconds since restart [08:41:22] PROBLEM - MySQL Replication Heartbeat on db25 is CRITICAL: CRIT replication delay 210 seconds [08:41:40] RECOVERY - MySQL Slave Delay on db25 is OK: OK replication delay 0 seconds [08:42:43] RECOVERY - MySQL Replication Heartbeat on db25 is OK: OK replication delay 0 seconds [08:56:13] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [08:56:13] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [09:11:13] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [09:11:13] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [09:13:10] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:34:32] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:28:04] PROBLEM - Puppet freshness on search1022 is 
CRITICAL: Puppet has not run in the last 10 hours [10:43:04] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [11:54:51] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [12:50:56] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [13:36:20] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 73, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave] [13:39:11] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 75, down: 0, dormant: 0, excluded: 0, unused: 0 [15:16:07] 14:56:22 < halfak_home> Hey folks. I'm hoping someone can give me a hand with the analytics database db1047. [15:16:10] 14:56:57 < halfak_home> I'm trying to run a query that sorts the revision table by timestamp, but I'm running out of disk space with the temp table. [15:21:15] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 67887 MB (3% inode=99%): [15:24:09] halfak_home: i already pasted you in here [15:24:15] no one responded yet [15:24:19] Thanks [15:24:39] (i haven't looked into it at all myself but my access is quite minimal) [15:24:46] you might want to look at ganglia [15:25:15] halfak_home: if you're lucky, domas will take a look for you [15:25:28] kk [15:25:34] domas: <3 [15:25:44] disk filling up on temp table [15:25:57] halfak_home: is it actually `create temporary table x` or what? [15:26:05] negative. [15:26:19] Using a server-side cursor to run: SELECT * FROM revision ORDER BY rev_timestamp; [15:26:29] oh, it's just a query that the server decided to sort on disk [15:26:34] Yup [15:27:20] halfak_home: one solution is to pull them all down and sort them on your own box [15:28:01] @info db1047 [15:29:56] I was going to try that. but databases are better at it than I am. 
;) [15:30:18] Also, my 2 virtual cores + 4GB of ram is a little silly when compared to an enterprise database server. [15:30:40] I'm sure I don't have to tell anyone in this channel the specs of the DB machines. [15:30:42] errrr.... you sure it's not a COTS DB server? ;) [15:32:39] yes [15:33:00] According to asher, it has 64GB of ram and the fast variant of 1+0 array. [15:33:49] halfak_home: http://ganglia.wikimedia.org/latest/?p=2&c=MySQL%20eqiad&h=db1047.eqiad.wmnet [15:34:27] :) [15:34:35] halfak_home: can you ssh to that box? [15:34:41] Negative :\ [15:35:05] jeremyb: I have indirect access through the virtual machine I described above [15:35:46] i didn't see the VM [15:38:23] 2 virtual cores + 4GB of memory. [15:52:41] oh, i thought you were describing a workstation [15:53:24] anyway, i still think you may wait over a day for an answer or you (if you do it yourself) can have an answer in under an hour. [15:53:38] i'm happy to help sort if you need help [15:54:14] (or is that a simplified test case and you had something more complex planned?) [16:51:21] jeremyb: I have my own workaround that will allow me to use db1047's massive amount of memory to sort smaller sets at a time. Thanks for your help though. [16:51:53] heh, where range ? [16:52:36] LIMIT 100000 with a continue query wrapper in a python iterator. [16:56:12] ^^^ Forces the query to take advantage of the available btree index = no on-disk sort. 
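(Editor's sketch of the "LIMIT 100000 with a continue query wrapper in a python iterator" idea described above. This is not halfak_home's actual code, which does not appear in the log. It assumes a `revision` table with `rev_id` and `rev_timestamp` columns and uses an in-memory SQLite table as a stand-in for db1047 so it runs anywhere. The function name `iter_revisions`, the tie-break on `rev_id`, and the composite index are illustrative assumptions. Against MySQL the placeholder style and driver would differ, and whether the server avoids the on-disk sort depends on the indexes actually present.)

```python
import sqlite3
from typing import Iterator, Tuple


def iter_revisions(conn: sqlite3.Connection,
                   batch_size: int = 100000) -> Iterator[Tuple[int, str]]:
    """Yield (rev_id, rev_timestamp) rows in timestamp order, one batch at a time.

    Each batch resumes after the last row already seen, so the server can
    walk the index in order and never has to sort the whole table at once.
    """
    last_ts, last_id = "", 0  # assumption: timestamps are non-empty strings
    while True:
        rows = conn.execute(
            "SELECT rev_id, rev_timestamp FROM revision "
            "WHERE rev_timestamp > ? OR (rev_timestamp = ? AND rev_id > ?) "
            "ORDER BY rev_timestamp, rev_id LIMIT ?",
            (last_ts, last_ts, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        # Remember where this batch ended; rev_id breaks timestamp ties
        # so rows sharing a timestamp are neither skipped nor repeated.
        last_id, last_ts = rows[-1]


def demo() -> list:
    """Build a tiny stand-in table and read it back in batches of two."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT)")
    # Hypothetical composite index so the ORDER BY can be served from the index.
    conn.execute("CREATE INDEX rev_ts ON revision (rev_timestamp, rev_id)")
    stamps = ["20120303", "20120101", "20120202", "20120101", "20120404"]
    conn.executemany(
        "INSERT INTO revision (rev_timestamp) VALUES (?)", [(s,) for s in stamps])
    return list(iter_revisions(conn, batch_size=2))
```

(The caller just loops over `iter_revisions(conn)` as if it were a single sorted result set. The batch size trades round trips against per-query memory on the server.)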
[18:57:06] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [18:57:06] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [19:12:06] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [19:12:06] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [20:29:18] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [20:44:18] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [21:55:30] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [22:52:15] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [23:00:48] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 182 seconds [23:00:57] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 185 seconds [23:02:00] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 191 seconds [23:02:09] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 192 seconds [23:04:42] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [23:07:42] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)