[00:36:00] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [02:48:43] hi Tim [02:54:39] hi [02:58:39] still struggling a little bit with converting ip4 addresses to a long or 32 bit integer, can't figure out what standard library i should use or if i should just do it manually myself [03:00:41] i first do a getaddrinfo() call [03:01:12] and then i do an inet_pton call but the value that i get does not look right [03:34:31] you should use a library [03:39:15] getaddrinfo() is all you need, why are you calling inet_pton after that? [03:39:47] to try to get the actual 32bit integer [03:39:58] getaddrinfo gives you a sockaddr* [03:40:56] what field do i need from sockaddr? [03:42:34] if you want a 32-bit integer you can cast the sockaddr* to a sockaddr_in* and then get sin_addr.s_addr, that's your integer [03:42:46] see man ip [03:42:55] ahhhhh [03:43:09] but you have to know that it is IPv4 before you do that [03:43:30] yes, but sockaddr tells me that, right? [03:43:48] yes, in sa_family [03:43:56] sa_family == AF_INET [03:44:02] http://manpages.ubuntu.com/manpages/lucid/en/man7/ip.7.html [03:44:25] chec [03:44:26] k [03:44:34] http://manpages.ubuntu.com/manpages/lucid/en/man2/bind.2.html [03:44:50] those manpages have the definitions of the structs [03:45:17] and for ip6 i just need to make sure that the destination variable is big enough (128bit), right? [03:45:50] there's no 128-bit integer type if that's what you're thinking [03:47:00] you would cast the sockaddr* to a sockaddr_in6* [03:47:08] mmmmmm, i was thinking that but that's obviously not the case [03:47:24] http://manpages.ubuntu.com/manpages/precise/en/man7/ipv6.7.html [03:47:57] then you have sin6_addr.s6_addr, which has the bytes in it [03:47:59] so how can you determine whether an ip6 address falls in a ip6 range? [03:51:22] ok, thanks for the pointers! this should help me fix it! 
[03:51:36] it's a more interesting question, I was reading some manuals [03:51:49] in fact the way to do it for IPv4 might not be as simple as you think [03:52:48] the ip manpage says it: the s_addr field is in network byte order [03:53:20] so you'd have to do ntohl() on it [03:54:15] is this related to the big-endian / little-endian problem? [03:54:51] yes [03:55:00] you might have to do the range checks yourself [03:55:46] this is my understanding: little endian is least significant byte first and big endian is most significant byte first [03:55:59] and network byte order is big endian [03:56:13] is that correct? [03:57:32] yes [03:58:23] and so ntohl will convert network byte order to little endian and then you can safely do a larger / smaller comparison [04:00:03] with IPv4, yes [04:00:16] with IPv6 you only have the raw bytes [04:01:50] i'll have to do some more googling to see how to work with ip6 and comparisons [04:02:12] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [04:02:31] thanks again for all your feedback, i appreciate it a lot! [04:03:25] no problem [04:08:12] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [04:08:12] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [04:12:21] !log disabling search lvs1 check because it's going to false-positive in 4 hours... [04:12:24] Logged the message, and now dispaching a T1000 to your position to terminate you. 
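The getaddrinfo() / sa_family / ntohl() steps from the conversation above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the udp-filters change: the helper names (parse_numeric, ipv4_to_host_u32, ipv6_to_bytes, ipv6_in_range) are hypothetical, and AI_NUMERICHOST is assumed so that only literal addresses are accepted and no DNS lookup happens. Strictly speaking ntohl() converts to host byte order, which is little endian on x86 but not everywhere. For IPv6, one possible approach (a suggestion here, not something settled in the chat) is to compare the 16 raw bytes with memcmp(), since s6_addr stores the most significant byte first.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a numeric address string with getaddrinfo().
 * AI_NUMERICHOST means no DNS lookup is attempted. Returns 0 on success. */
static int parse_numeric(const char *text, struct sockaddr_storage *out)
{
    struct addrinfo hints, *res = NULL;

    assert(text != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */
    hints.ai_flags = AI_NUMERICHOST;

    if (getaddrinfo(text, NULL, &hints, &res) != 0)
        return -1;
    memset(out, 0, sizeof *out);
    memcpy(out, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    return 0;
}

/* IPv4: check the family first, then convert s_addr from network byte
 * order to host byte order so that <= and >= compare numerically. */
static int ipv4_to_host_u32(const char *text, uint32_t *out)
{
    struct sockaddr_storage ss;

    assert(out != NULL);
    if (parse_numeric(text, &ss) != 0 || ss.ss_family != AF_INET)
        return -1;
    *out = ntohl(((const struct sockaddr_in *)&ss)->sin_addr.s_addr);
    return 0;
}

/* IPv6: there is no standard 128-bit integer type, so keep the raw bytes. */
static int ipv6_to_bytes(const char *text, unsigned char out[16])
{
    struct sockaddr_storage ss;

    assert(out != NULL);
    if (parse_numeric(text, &ss) != 0 || ss.ss_family != AF_INET6)
        return -1;
    memcpy(out, ((const struct sockaddr_in6 *)&ss)->sin6_addr.s6_addr, 16);
    return 0;
}

/* The bytes are stored most significant first, so memcmp() ordering matches
 * numeric ordering. Returns 1 if lo <= addr <= hi, otherwise 0. */
static int ipv6_in_range(const unsigned char addr[16],
                         const unsigned char lo[16],
                         const unsigned char hi[16])
{
    return memcmp(addr, lo, 16) >= 0 && memcmp(addr, hi, 16) <= 0;
}
```

An IPv4 range check then reduces to converting the address and both ends of the range with ipv4_to_host_u32() and testing lo <= addr && addr <= hi. For IPv6, ipv6_to_bytes() on all three values followed by ipv6_in_range() does the same job without needing a 128-bit type.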
[06:15:23] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours [07:17:48] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [07:19:36] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [07:23:21] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [07:32:48] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.270 second response time [07:40:27] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [08:01:36] PROBLEM - Lucene on search1 is CRITICAL: Connection timed out [08:04:00] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out [08:09:16] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Tue Feb 21 08:08:42 UTC 2012 [08:33:52] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [08:35:49] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [08:39:52] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [09:01:37] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [09:26:09] PROBLEM - DPKG on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:28:42] RECOVERY - DPKG on db1047 is OK: All packages OK [09:31:15] RECOVERY - Lucene on search1 is OK: TCP OK - 2.999 second response time on port 8123 [09:39:39] PROBLEM - Lucene on search1 is CRITICAL: Connection timed out [09:51:21] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [09:58:15] RECOVERY - udp2log processes on locke is OK: OK: all filters present [09:59:54] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 18.7802274783 (gt 8.0) [10:03:39] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [10:05:01] RECOVERY - udp2log processes on locke is OK: OK: all filters present [10:09:03] RECOVERY - Lucene on search3 is OK: TCP OK - 0.006 second response time on port 8123 [10:10:22] RECOVERY - Lucene on search1 is OK: TCP OK - 0.001 second response time on port 8123 [10:13:58] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [10:15:19] RECOVERY - udp2log processes on locke is OK: OK: all filters present [10:27:01] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.267108521739 [10:37:13] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [11:07:13] PROBLEM - Puppet freshness on db46 is CRITICAL: Puppet has not run in the last 10 hours [11:07:13] PROBLEM - Puppet freshness on mw1002 is CRITICAL: Puppet has not run in the last 10 hours [12:23:40] New review: Mark Bergsma; "Is the Puppet dependency between the packages necessary, i.e. doesn't APT resolve it?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2614 [12:27:37] New review: Mark Bergsma; "Instead of matching on $hostname to determine which site, please just use $::site, which is pmtpa or..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2670 [12:43:35] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [12:44:29] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [12:47:29] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [13:07:53] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [13:57:03] New patchset: QChris; "Set up .gitignore" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2683 [13:57:07] New patchset: QChris; "Create directory for FileUtils.writeFile, if it does not exist" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2684 [14:02:43] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [14:08:43] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [14:08:43] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [15:06:22] New patchset: Pyoungmeister; "search lvs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2685 [15:10:15] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:15] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:15] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:15] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:20] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:21] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:21] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:21] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:20:17] PROBLEM - 
check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:20:17] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:20:17] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:20:26] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:14] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:14] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:14] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:25:14] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:30:29] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:30:30] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:30:30] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:30:30] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:35:17] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:35:26] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:35:26] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:35:27] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:14] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:23] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:23] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[15:40:24] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:20] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:20] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:29] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:45:29] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:17] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:17] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:17] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:26] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:54:00] New patchset: Pyoungmeister; "allowing eqiad to rsync from home" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2686 [15:55:02] Change abandoned: Pyoungmeister; "going to think about this one some more..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2686 [15:55:14] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:55:15] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:55:15] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:55:50] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:54] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:54] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:54] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:54] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:05:33] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:05:33] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:05:33] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:05:33] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:08:57] New patchset: Pyoungmeister; "allowing 10.64.0.0/22 - private1-a-eqiad to rsync from home" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2687 [16:10:30] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:10:30] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:10:30] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:10:30] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:27] PROBLEM - check_minfraud1 on payments2 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:28] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:28] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:28] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:15:54] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours [16:16:19] New patchset: Pyoungmeister; "do sites properly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2688 [16:20:33] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:33] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:34] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:34] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:30] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:30] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:30] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:25:30] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:27] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:27] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:28] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:30:28] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:33] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:33] 
PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:33] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:33] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:30] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:30] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:31] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:31] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:42:05] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2687 [16:42:05] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2687 [16:45:27] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:45:28] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:45:28] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:45:28] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:49:09] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2688 [16:49:10] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2688 [16:50:33] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:33] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:33] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:33] PROBLEM - 
check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:51:18] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.56340678261 (gt 8.0) [16:53:42] PROBLEM - Disk space on searchidx1001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=51%): /var/lib/ureadahead/debugfs 0 MB (0% inode=51%): [16:58:47] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:58:47] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:58:47] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:58:47] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:15] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:15] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:15] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:15] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:03:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.036 seconds [17:03:21] RECOVERY - Disk space on searchidx1001 is OK: DISK OK [17:03:21] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.9399722807 [17:05:50] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:50] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:50] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:50] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - 
Socket timeout after 10 seconds [17:10:30] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:30] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:31] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:31] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:14:12] New patchset: Pyoungmeister; "needs more root..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2689 [17:14:35] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2689 [17:14:39] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2689 [17:15:28] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:28] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:28] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:28] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:16:38] !log reimaging searchidx1001 :( [17:16:40] Logged the message, and now dispaching a T1000 to your position to terminate you. [17:19:14] New patchset: Diederik; "Added full support for ip address and ip range filtering Added full support for regular expression matching Incorporated feedback from Tim, still struggling around lines 235-240. 
Change-Id: I8d52bbd84fd4ec39a6d735d802d9b87f95d1b0a0" [analytics/udp-filters] (refactoring) - https://gerrit.wikimedia.org/r/2626 [17:19:57] PROBLEM - Host searchidx1001 is DOWN: PING CRITICAL - Packet loss = 100% [17:20:33] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:34] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:34] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:34] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:21] RECOVERY - Host searchidx1001 is UP: PING OK - Packet loss = 0%, RTA = 26.42 ms [17:25:31] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:31] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:31] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:31] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:13] PROBLEM - Disk space on searchidx1001 is CRITICAL: Connection refused by host [17:28:49] PROBLEM - RAID on searchidx1001 is CRITICAL: Connection refused by host [17:29:06] PROBLEM - SSH on searchidx1001 is CRITICAL: Connection refused [17:29:15] PROBLEM - DPKG on searchidx1001 is CRITICAL: Connection refused by host [17:30:27] PROBLEM - check_minfraud1 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:28] PROBLEM - check_minfraud1 on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:28] PROBLEM - check_minfraud1 on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:28] PROBLEM - check_minfraud1 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:09] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 
(protocol 2.0) [17:33:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:15] RECOVERY - check_minfraud1 on payments1 is OK: HTTP OK: HTTP/1.1 200 OK - 8643 bytes in 0.390 second response time [17:35:16] RECOVERY - check_minfraud1 on payments2 is OK: HTTP OK: HTTP/1.1 200 OK - 8643 bytes in 0.314 second response time [17:35:16] RECOVERY - check_minfraud1 on payments4 is OK: HTTP OK: HTTP/1.1 200 OK - 8643 bytes in 0.313 second response time [17:35:16] RECOVERY - check_minfraud1 on payments3 is OK: HTTP OK: HTTP/1.1 200 OK - 8643 bytes in 0.313 second response time [17:35:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.045 seconds [17:41:24] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [17:47:43] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [17:48:10] RECOVERY - DPKG on searchidx1001 is OK: All packages OK [17:48:55] RECOVERY - Disk space on searchidx1001 is OK: DISK OK [17:53:24] PROBLEM - NTP on searchidx1001 is CRITICAL: NTP CRITICAL: Offset unknown [17:56:07] RECOVERY - NTP on searchidx1001 is OK: NTP OK: Offset 0.001218318939 secs [18:09:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:13:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.842 seconds [18:18:21] mark: you around ? [18:18:52] maplebed: post-lunch want to see if we can get packaging working on the fw creator ? 
:) [18:20:40] New patchset: Pyoungmeister; "searchidx wants this too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2690 [18:22:01] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2690 [18:22:02] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2690 [18:22:31] LeslieCarr: that sounds like FUN! [18:22:35] :) [18:22:42] yeah, I'm game. [18:29:24] puppet knowledgeable peeps - anyone know if i can refer to files in another class? - example is here http://pastebin.com/MufKN5KK [18:36:02] LeslieCarr: I believe that that should work [18:36:21] although I'd break it out to a separate class. like puppet::config, or some such [18:38:44] but yeah, you can totally grab a list from another class with an absolute reference [18:40:12] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.36314886957 (gt 8.0) [18:44:09] hrm, so i have a failure that claims to be "]" failure, except i don't see any missing ones... [18:44:22] notpeter: if you'd like to check out https://gerrit.wikimedia.org/r/#change,2666 :) [18:46:17] <^demon|class> LeslieCarr: nagios.pp, line 494, looks like you're missing a [ in front of /etc/nagios/puppet_checks.d [18:46:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:46:31] ah :) [18:46:41] thank you :) [18:46:45] <^demon|class> yw. 
[18:47:02] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.31944417391 [18:47:16] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [18:51:08] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [18:52:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.365 seconds [18:53:02] ^demon|class: do you see what's wrong at line 490 ? (for some reason the installation of puppet on my machine doesn't include parser ) [19:03:44] New patchset: Ottomata; "Comments, fixing tests" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2691 [19:04:43] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2691 [19:04:43] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2691 [19:12:38] could someone please check out line 490 on nagios.pp in this change ? 
https://gerrit.wikimedia.org/r/#change,2666 [19:12:57] i have gone over it again and again and can't find out why it gives me an error at ":" expecting "]" [19:13:31] oh wait [19:13:35] i might have found it a few lines later [19:13:57] nope [19:14:00] still hating on me [19:14:33] LeslieCarr: there's a ] imbalance on the subscribe [19:14:46] 7x [ [19:14:52] 6 x ] [19:14:57] so i found i was missing a "}" at the end of subscribe [19:15:00] File[nagios::monitor::["/etc/nagios/puppet_checks.d"] ]; [19:15:00] to [19:15:02] File[nagios::monitor::["/etc/nagios/puppet_checks.d"]]]; [19:15:07] but it still is giving the same error [19:15:10] after fixing that [19:16:45] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [19:16:52] pushed the fixed "]" as another patchset [19:17:49] service { nagios3: [19:17:57] should the nagios3 be in ""? [19:18:25] hrm, the service nagios wasn't, but i think "" is best practice so i'll change that [19:18:46] # snmp tarp stuff [19:18:48] It's a tarp! [19:18:50] It's a tarp! [19:18:53] * Reedy grins [19:18:58] :) [19:19:56] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [19:20:52] does that mean you know anything about the neon setup? [19:24:46] well i just built it [19:24:59] and then put the old nagios setup on it [19:24:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:25:11] actually might want to wipe it again before putting on this new class and see if it works [19:25:24] apergos: can you see what's wrong with line 490 ? [19:25:43] I don't see what's wrong. sorry, LeslieCarr. [19:25:46] is there still something? [19:25:49] lemme go look [19:26:03] cause what reedy said was correct [19:26:39] yeah, i fixed the ] and it's still complaining.. 
same error [19:26:44] ok [19:26:46] lemme stare [19:28:03] my staring makes me want to smashy [19:28:19] I don't think it's a mismatch [19:28:25] I think it's complaining about something else [19:28:39] If we made ops spend half an hour on pypuppet everytime they got annoyed with puppet ;) [19:29:00] I will guess it doesn't like the second set of :: in that reference [19:29:11] File[nagios::monitor::["$puppet_files"] ] [19:29:33] hrm [19:29:46] :( [19:30:29] I don't know if it's the second set or the first set, to be honest [19:30:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.192 seconds [19:31:31] but that for sure is what it is telling you... it doesn't like the reference in there [19:32:03] New patchset: Ottomata; "Another test for push" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2692 [19:33:18] ah ok [19:33:20] good to know [19:33:23] bad puppet! [19:34:01] it is [19:34:24] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2692 [19:34:25] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2692 [19:36:49] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [19:38:09] passed [19:38:35] congrats [19:41:19] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2675 [19:42:09] :) [19:42:29] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2666 [19:42:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666 [19:43:35] PROBLEM - SSH on neon is CRITICAL: Connection refused [19:43:53] PROBLEM - RAID on neon is CRITICAL: Connection refused by host [19:43:53] PROBLEM - Disk space on neon is CRITICAL: Connection refused by host [19:44:11] PROBLEM - 
DPKG on neon is CRITICAL: Connection refused by host [19:46:47] sorry, that's me [19:46:51] reformatting neon [19:47:39] ok [19:47:41] that means... [19:47:49] the cron spam will stop! :-D [19:50:56] * AaronSchulz stares down http://www.google.com/support/forum/p/gmail/thread?tid=783a111ce040cf89&hl=en [19:59:38] PROBLEM - NTP on neon is CRITICAL: NTP CRITICAL: No response from NTP server [20:04:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:06:40] yay, the hacky s3 interface to archive.org works for everything now except for multipart uploads (haven't started that) [20:07:53] RECOVERY - SSH on neon is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [20:08:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.665 seconds [20:38:03] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [20:42:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:46:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.746 seconds [20:47:39] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [20:49:00] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [21:07:11] rainman-sr: hey, I have a quick question, if you have a second [21:08:03] PROBLEM - Puppet freshness on db46 is CRITICAL: Puppet has not run in the last 10 hours [21:08:04] PROBLEM - Puppet freshness on mw1002 is CRITICAL: Puppet has not run in the last 10 hours [21:09:38] notpeter, yep sure [21:10:38] hey! so, I'm rsyncing over the current contents of searchidx2 to searchidx1001. what crons/scripts should I run on searchidx1001 to make sure the indexes are up to date before deploying? [21:14:56] notpeter, the stuff in my crontab [21:15:30] all of them? [21:15:43] let me see [21:16:14] yep, all of them [21:16:43] cool! 
will do [21:16:44] thanks! [21:16:48] np [21:20:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:24:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.478 seconds [21:36:38] !log force-running puppet on every labs instance [21:36:41] Logged the message, Master [21:58:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:02:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.773 seconds [22:36:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:40:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.092 seconds [22:44:09] hrmph [22:44:10] The authenticity of host 'searchidx1001 (10.64.0.119)' can't be established. [22:44:15] (on fenari when syncing stuff) [22:44:33] notpeter: Does that mean that you got the syncs to searchidx1001 set up? I assume you just added it to the mediawiki-installation node list? [22:45:48] RoanKattouw: yeah, the search indexer gets a full mediawiki install [22:45:51] so that's all set up [22:45:58] (sorry for the key change....) [22:46:00] OK excellent [22:46:04] Then I don't have to do it anymore [22:46:08] well [22:46:15] That's OK, it'll fix itself when the phase of the moon script runs, right? [22:46:26] I do need confs on all of the other search nodes.... [22:46:32] yes, that will clear itself up [22:46:58] but I was thinking about that, and scap might not be the right tool for pushing those confs out [22:47:21] You could pull em too, if you only need them sporadically [22:47:42] yeah, I was thinking about that as well [22:47:55] I need to be slightly more sure on what they're needed for... 
[22:48:09] but yes, I think an rsync pull cron might be the solution [22:48:33] You can just pull from 10.0.5.8, provided you don't still get the perms issue [22:48:50] nope, got the rsync into eqiad working [22:48:57] so yeah, that will probably be the thing to do [22:49:14] I'll ocnfirm with you tomorrow, but... pull seems reasonable [22:50:58] What was the problem, out of interest? [22:51:35] that rsyncd was only set up to allow from 10.0.0.0/16 [22:51:44] needed to add 10.64.blah [22:52:05] aah [23:14:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:18:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.716 seconds [23:18:40] New patchset: Lcarr; "Changed name of createfirewall.py to match new name of software" [operations/software] (master) - https://gerrit.wikimedia.org/r/2694 [23:19:11] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2694 [23:19:12] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/2694 [23:34:22] New patchset: Pyoungmeister; "adding cron to search hosts to occassionally poll for new mediawiki config files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2695 [23:34:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2695 [23:35:50] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2695 [23:35:51] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2695 [23:43:58] New patchset: Diederik; "Added full support for ip address and ip range filtering Added full support for regular expression matching Incorporated feedback from Tim, still struggling around line 369 - 378. 
Change-Id: I8d52bbd84fd4ec39a6d735d802d9b87f95d1b0a0" [analytics/udp-filters] (refactoring) - https://gerrit.wikimedia.org/r/2626 [23:47:25] RECOVERY - Lucene on search1007 is OK: TCP OK - 0.027 second response time on port 8123 [23:52:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:55:27] New patchset: Pyoungmeister; "adding ganglia data sources for eqiad search" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2696 [23:55:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2696 [23:56:01] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2696 [23:56:01] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2696 [23:56:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.933 seconds