[00:11:13] New patchset: Ryan Lane; "Add gluster cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2663
[00:12:03] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2663
[00:12:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2663
[00:16:09] New patchset: Ryan Lane; "Fix labstore node syntax" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2664
[00:16:31] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2664
[00:16:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2664
[00:16:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2664
[00:19:23] New patchset: Ryan Lane; "RAWR. How did this upstart job not get fixed?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2665
[00:19:45] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2665
[00:19:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2665
[00:20:01] BarkingFish: Evening guys, are you aware of server issues with the Simple English Wikipedia? I've tried 3 times to submit an edit now, and keep getting a message from my browser informing me that the server did not send any data. Error 324 (Empty Response) (from -tech)
[00:20:51] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666
[00:27:42] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666
[01:38:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.038 seconds
[01:56:10] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 626s
[01:56:19] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 635s
[02:13:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:15:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.064 seconds
[02:30:31] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[02:38:01] RECOVERY - Puppet freshness on search1002 is OK: puppet ran at Sat Feb 18 02:37:45 UTC 2012
[02:56:46] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:57:31] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[03:23:48] PROBLEM - RAID on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:24:24] PROBLEM - DPKG on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:24:42] PROBLEM - Disk space on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:56:54] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[05:59:54] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[06:02:54] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[06:02:55] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[06:43:58] who set up neon? it is seriously cronspamming...
[08:10:21] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:13:12] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[08:13:21] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:14:24] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[08:14:25] PROBLEM - DPKG on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:36] RECOVERY - DPKG on db1047 is OK: All packages OK
[08:38:24] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[08:43:21] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:53:06] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:54:09] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.006 second response time on port 8123
[08:54:54] RECOVERY - Lucene on search3 is OK: TCP OK - 0.006 second response time on port 8123
[08:56:17] !log restarted all the searchpool1 lsearchds
[08:56:20] Logged the message, Master
[09:05:17] New patchset: Mark Bergsma; "Remove Ryan's ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2667
[09:05:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2608
[09:05:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2667
[09:05:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2667
[09:05:52] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2667
[09:10:54] New patchset: Mark Bergsma; "Revert "Remove Ryan's ssh key"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2668
[09:11:15] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2668
[09:12:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2668
[09:12:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2668
[09:14:24] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[09:21:00] RECOVERY - Disk space on srv219 is OK: DISK OK
[09:23:42] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Sat Feb 18 09:23:29 UTC 2012
[09:25:08] that was a curious set of commits
[10:47:30] PROBLEM - Disk space on db30 is CRITICAL: DISK CRITICAL - free space: / 287 MB (3% inode=87%):
[10:47:48] PROBLEM - MySQL disk space on db30 is CRITICAL: DISK CRITICAL - free space: / 287 MB (3% inode=87%):
[11:54:37] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:58:57] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[12:31:39] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[13:47:52] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:50:25] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[14:12:55] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 236 seconds
[14:18:10] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 3 seconds
[15:47:37] New patchset: Pyoungmeister; "adding searchidx1001 to site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2669
[15:47:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2669
[15:57:22] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[16:00:22] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[16:03:22] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[16:03:22] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[16:06:13] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:07:25] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[16:40:34] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:41:46] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[18:11:16] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:27:55] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:07] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[18:40:04] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[18:43:49] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:44:34] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 232 MB (3% inode=62%): /var/lib/ureadahead/debugfs 232 MB (3% inode=62%):
[18:45:55] RECOVERY - Disk space on srv224 is OK: DISK OK
[18:49:49] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 208 MB (2% inode=62%): /var/lib/ureadahead/debugfs 208 MB (2% inode=62%):
[18:50:34] RECOVERY - Disk space on srv222 is OK: DISK OK
[18:51:10] RECOVERY - Disk space on srv224 is OK: DISK OK
[19:34:49] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[20:45:07] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2669
[20:45:08] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2669
[20:56:50] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused
[22:32:09] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[23:11:15] New patchset: Pyoungmeister; "new conf for eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2670
[23:11:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2670
[23:14:04] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2670
[23:14:04] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2670
[23:23:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:24:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.648 seconds
[23:58:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:59:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.033 seconds