[00:08:28] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:08:18 UTC 2013
[00:08:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:09:55] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:10:55] PROBLEM - Host sq54 is DOWN: PING CRITICAL - Packet loss = 100%
[00:11:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:11:07 UTC 2013
[00:11:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:12:05] RECOVERY - Host sq54 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms
[00:13:25] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:13:23 UTC 2013
[00:14:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:15:35] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:15:33 UTC 2013
[00:16:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:17:35] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:17:27 UTC 2013
[00:18:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:19:05] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:18:56 UTC 2013
[00:19:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:20:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:20:11 UTC 2013
[00:20:25] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:20:55] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:20:50 UTC 2013
[00:21:26] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:39:19] PROBLEM - Host sq52 is DOWN: PING CRITICAL - Packet loss = 100%
[00:39:48] RECOVERY - Host sq52 is UP: PING OK - Packet loss = 0%, RTA = 26.71 ms
[00:44:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 00:44:51 UTC 2013
[00:45:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[00:53:08] PROBLEM - Host sq56 is DOWN: PING CRITICAL - Packet loss = 100%
[00:53:48] RECOVERY - Host sq56 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[00:59:58] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours
[01:02:58] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.0008350610733 secs
[01:04:28] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.001398563385 secs
[01:04:38] PROBLEM - Host sq51 is DOWN: PING CRITICAL - Packet loss = 100%
[01:05:35] RECOVERY - Host sq51 is UP: PING OK - Packet loss = 0%, RTA = 26.62 ms
[01:06:23] PROBLEM - NTP on sq51 is CRITICAL: NTP CRITICAL: Offset unknown
[01:09:23] RECOVERY - NTP on sq51 is OK: NTP OK: Offset 0.08956778049 secs
[01:11:43] PROBLEM - Host sq53 is DOWN: PING CRITICAL - Packet loss = 100%
[01:12:53] PROBLEM - Host sq55 is DOWN: PING CRITICAL - Packet loss = 100%
[01:13:23] RECOVERY - Host sq53 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms
[01:13:53] RECOVERY - Host sq55 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[01:23:13] PROBLEM - Host sq58 is DOWN: PING CRITICAL - Packet loss = 100%
[01:23:24] PROBLEM - Host sq57 is DOWN: PING CRITICAL - Packet loss = 100%
[01:24:13] RECOVERY - Host sq57 is UP: PING OK - Packet loss = 0%, RTA = 26.92 ms
[01:24:23] RECOVERY - Host sq58 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms
[02:01:09] odder: Wikimedia is a unique position to beta test software like WordPress, I think.
[02:01:13] in a *
[02:06:32] !log LocalisationUpdate completed (1.22wmf4) at Thu May 23 02:06:32 UTC 2013
[02:06:48] Logged the message, Master
[02:13:09] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[02:16:09] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[02:19:36] Susan: I'm betting the ops would want WP to care a little bit more about security if they were going to beta test it
[02:25:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 23 02:25:13 UTC 2013
[02:25:28] Logged the message, Master
[02:43:51] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:47:31] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:49:35] p858snake|l: MediaWiki has had like six security updates this year. How many has WordPress had? ;-)
[02:50:11] Susan: It seems to have them every second week (or did when I last looked at it)
[02:50:21] [citation needed]
[02:51:31] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[02:57:32] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:01:31] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[03:07:33] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:08:23] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[03:11:35] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:23:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:24:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[03:25:23] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[03:31:08] Change abandoned: Andrew Bogott; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64197
[03:31:33] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:35:23] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[03:41:33] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:43:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:44:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.163 second response time
[03:48:23] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[03:59:33] PROBLEM - HTTPS on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:02:33] RECOVERY - HTTPS on formey is OK: OK - Certificate will expire on 08/22/2015 22:23.
[04:08:21] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:08:18 UTC 2013
[04:08:41] RECOVERY - HTTP on formey is OK: HTTP OK: HTTP/1.1 200 OK - 3596 bytes in 8.888 second response time
[04:09:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[04:09:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[04:09:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[04:09:21] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:09:50] RECOVERY - Puppet freshness on mc15 is OK: puppet ran at Thu May 23 04:09:46 UTC 2013
[04:10:32] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:10:20 UTC 2013
[04:11:21] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:12:32] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:12:21 UTC 2013
[04:13:20] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:14:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:14:05 UTC 2013
[04:14:20] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:15:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:15:30 UTC 2013
[04:16:21] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:17:01] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 04:16:55 UTC 2013
[04:17:21] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[04:17:26] looks like there's no problem deps for the python-virtualenv in raring. could we copy that into the wikimedia precise?
[04:17:32] (for labs)
[04:18:08] (i.e. the deps seem almost exactly the same as the deps for the precise package with the same name)
[04:19:55] huh, I hopped on formey to see what was wrong and now of course it is fine
[04:20:03] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=formey.wikimedia.org&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1369280894&g=mem_report&z=large&c=Miscellaneous%20pmtpa
[04:20:08] nothing in the logs, etc
[04:20:14] it's playing tricks on you?
[04:20:27] yep
[04:20:33] not so nice early in the morning
[04:21:15] huh, weird graphs
[04:21:30] it's 7:21:30 by my calculations
[04:23:01] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours
[05:07:36] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:10:56] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[06:27:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 06:27:41 UTC 2013
[06:28:21] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[06:28:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 06:28:47 UTC 2013
[06:29:20] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[06:29:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 06:29:35 UTC 2013
[06:30:20] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[07:35:37] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.005872249603 secs
[08:08:08] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 08:07:59 UTC 2013
[08:08:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:09:29] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 08:09:24 UTC 2013
[08:10:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:14:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 08:14:49 UTC 2013
[08:15:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:16:08] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[08:16:08] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[08:16:08] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[08:19:08] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:19:08] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours
[08:32:15] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[08:33:08] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[08:33:28] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.001932382584 secs
[08:42:17] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[08:44:18] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[08:44:59] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours
[09:32:17] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[09:34:17] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[10:10:13] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[10:20:13] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (856 MB out of 952 MB)
[10:23:54] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[10:24:54] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[10:25:13] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (856 MB out of 952 MB)
[10:30:03] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[10:35:13] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[10:40:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[10:45:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[10:50:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[10:55:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:00:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:00:31] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours
[11:05:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:10:06] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:15:09] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:20:07] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:25:07] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:29:47] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:30:07] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:30:47] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:30:57] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[11:31:57] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:35:07] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:40:08] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:45:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:50:00] greg-g, hi, can i deploy a bug fix in a bit?
[11:50:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[11:50:14] (or who is the release manager now?)
[11:52:01] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[11:53:05] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[11:55:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:00:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:02:47] anybody alive? would really like to push out a fix
[12:05:05] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[12:05:05] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:06:53] I am here but not the right person for deployments
[12:08:23] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:08:13 UTC 2013
[12:08:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:09:03] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[12:10:03] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:10:38] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:10:31 UTC 2013
[12:10:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:12:32] ok, i will go ahead then, hope i won't step on anyone's toes for dir-syncing Zero extension
[12:12:48] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:12:43 UTC 2013
[12:12:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:13:08] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[12:14:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:14:54 UTC 2013
[12:14:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:15:08] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:16:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:16:31 UTC 2013
[12:17:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:18:10] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:18:05 UTC 2013
[12:18:48] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/
[12:18:57] Logged the message, Master
[12:19:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:19:01] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 12:18:58 UTC 2013
[12:20:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:20:09] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:25:09] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:27:02] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[12:30:02] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[12:30:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:35:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:40:07] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:45:08] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:50:08] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:51:01] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[12:53:02] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[12:55:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[12:59:01] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[13:00:11] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:02:01] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:03:02] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[13:05:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:06:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:07:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[13:09:02] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:10:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:15:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:18:02] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[13:19:02] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:20:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:22:13] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[13:25:14] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:29:13] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:30:13] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:35:03] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:39:11] New patchset: Yurik; "Added Opera CIDRs per their recommendation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65117
[13:40:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:43:00] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[13:44:00] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:45:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:50:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:51:10] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[13:52:00] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[13:52:10] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:54:00] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[13:55:10] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[13:56:10] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[13:57:11] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:00:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[14:03:02] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[14:03:12] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[14:04:02] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:05:12] PROBLEM - check_swap on db1025 is CRITICAL: SWAP CRITICAL - 90% free (855 MB out of 952 MB)
[14:09:44] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:09:44] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[14:09:44] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[14:09:44] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[14:10:21] PROBLEM - DPKG on virt6 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[14:10:22] PROBLEM - mysqld processes on db44 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[14:10:22] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:11:22] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[14:14:31] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[14:15:11] RECOVERY - check_swap on db1025 is OK: SWAP OK - 100% free (0 MB out of 0 MB)
[14:19:32] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:23:21] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours
[14:28:31] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[14:30:54] New patchset: Ottomata; "Adding ganglia view for Kafka stats" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65122
[14:31:23] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65122
[14:31:41] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[14:32:31] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:33:41] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:34:42] New patchset: Ottomata; "Fixing typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65124
[14:34:50] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65124
[14:40:38] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[14:42:38] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:46:29] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[14:47:33] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[14:54:19] LeslieCarr: *poke*
[14:57:38] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[14:58:38] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:07:15] New review: Ori.livneh; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890
[15:08:27] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[15:11:28] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:21:26] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[15:22:27] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[15:23:26] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:23:26] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:31:26] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[15:34:26] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:44:20] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[15:48:20] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[15:59:20] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[16:02:20] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:05:20] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[16:07:21] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:08:21] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:08:14 UTC 2013
[16:08:31] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[16:08:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:10:31] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:10:32] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:10:26 UTC 2013
[16:11:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:12:22] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[16:12:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:12:50 UTC 2013
[16:13:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:21] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:14:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:14:35 UTC 2013
[16:15:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:15:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:15:42 UTC 2013
[16:16:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:16:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:16:39 UTC 2013
[16:17:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:18:11] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:18:09 UTC 2013
[16:18:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:18:55] New patchset: Ottomata; "Fixing confdir for ganglia::view" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65131
[16:19:05] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65131
[16:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[16:25:31] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[16:27:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:28:21] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[16:28:31] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:29:21] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[16:31:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[16:40:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:41:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[16:44:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 16:44:50 UTC 2013
[16:45:37] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:52:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.152 second response time
[17:19:33] PROBLEM - Disk space on ms-be1011 is CRITICAL: DISK CRITICAL - /var/lib/ceph/osd/ceph-125 is not accessible: Input/output error
[17:20:33] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out
[17:20:33] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out
[17:20:44] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out
[17:20:44] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out
[17:20:53] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out
[17:20:58] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out
[17:20:58] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:20:58] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection timed out
[17:22:23] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[17:22:33] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.765 second response time
[17:22:43] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[17:22:43] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.048 second response time
[17:22:44] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 61049 bytes in 0.233 second response time
[17:23:02] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.597 second response time
[17:23:03] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.483 second response time
[17:23:42] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time
[17:26:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:29:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[17:33:21] New patchset: Bsitu; "Add new eventlogging schema: EchoPrefUpdate" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65137
[17:43:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:44:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[18:05:27] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65137
[18:17:05] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[18:17:06] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[18:17:06] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[18:20:05] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours
[18:20:05] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours
[18:27:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:28:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time
[18:30:26] !log bsitu synchronized wmf-config/CommonSettings.php 'Add new eventlogging schema: EchoPrefUpdate'
[18:30:35] Logged the message, Master
[18:40:37] going to run scap in a second
[18:45:09] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours
[18:47:37] !log bsitu Started syncing Wikimedia installation... : Update Echo to Master
[18:47:46] Logged the message, Master
[18:54:02] !log bsitu Finished syncing Wikimedia installation... : Update Echo to Master
[18:54:10] Logged the message, Master
[19:17:10] !log kaldari synchronized php-1.22wmf4/extensions/Echo/modules 'syncing Echo modules for css updates'
[19:17:19] Logged the message, Master
[19:53:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:54:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time
[20:08:19] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:08:16 UTC 2013
[20:08:41] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:10:29] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:10:24 UTC 2013
[20:10:39] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:11:08] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[20:12:28] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:12:22 UTC 2013
[20:12:38] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:13:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:13:55 UTC 2013
[20:14:38] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:14:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:14:55 UTC 2013
[20:15:38] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:15:48] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Thu May 23 20:15:43 UTC 2013
[20:16:38] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:28:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:29:06] !log spage synchronized php-1.22wmf4/extensions/GuidedTour 'E3 latest GuidedTour'
[20:29:14] Logged the message, Master
[20:29:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time
[20:30:45] !log spage synchronized php-1.22wmf4/extensions/GettingStarted 'E3 latest GettingStarted'
[20:30:54] Logged the message, Master
[20:39:39] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:41] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[20:46:24] FOR SALE CHEAP: 75 minutes of deploy window time. Unused, new in box. Operators are standing by.
[20:47:06] * Nemo_bis makes a list of desired config changes and calculates offer
[20:58:03] "All apache error logs are routed to tin in /h/w/l/syslog/apache.log" according to https://wikitech.wikimedia.org/wiki/How_to_deploy_code , but not so.
tin's /home is different from fenari's nas1-a.pmtpa.wmnet:/vol/home_pmtpa [21:01:21] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [21:03:20] Those are still on fenari [21:07:53] New patchset: Bsitu; "Enable email bundling on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65155 [21:08:42] New patchset: Yurik; "Removed WMF detection as a ZERO carrier" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65156 [21:09:42] uuh [21:14:52] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/65155 [21:15:10] New patchset: Yurik; "Removed WMF detection as a ZERO carrier" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/65156 [21:16:57] greg-g: hi [21:22:22] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable echo email bundling on enwiki' [21:22:30] Logged the message, Master [21:32:00] * Nemo_bis not receiving any enotif, bundled or otherwise [22:00:23] Reedy,thanks, I updated How to deploy [22:13:45] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [22:58:31] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [23:48:23] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server