[00:12:44] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours [00:13:02] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [00:13:47] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours [00:22:47] PROBLEM - Puppet freshness on db23 is CRITICAL: Puppet has not run in the last 10 hours [00:23:41] PROBLEM - Puppet freshness on db10 is CRITICAL: Puppet has not run in the last 10 hours [00:26:41] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [00:32:41] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [00:34:47] PROBLEM - Puppet freshness on db21 is CRITICAL: Puppet has not run in the last 10 hours [00:36:19] Thehelpfulone: thanks for the revert, even though the edit to my user page was legit. I didn't realize I wasn't logged in. [00:36:31] err... [00:36:34] one sec. [00:36:41] heh [00:37:48] no problem, can't quite remember where it was or when I did it though [00:38:01] just 5m ago. [00:38:57] oh wait. [00:39:02] I'm misreading the changelog. [00:39:04] are you sure it was me, I haven't edited for a few hours? now you've intrigued me [00:39:13] it was reverted *to* the last version edited by you, not by you. [00:39:18] heh [00:39:31] so... uhh... nevermind. [00:39:34] :D [00:39:41] thanks for nothing! [00:39:44] :P [00:42:29] I just categorised your user page, that was a lot of effort [00:42:55] every grain of sand helps make the beach a wonderful place. [00:43:11] and now, with thanks and misguided platitudes, off to bed. [00:48:44] PROBLEM - Puppet freshness on linne is CRITICAL: Puppet has not run in the last 10 hours [00:56:14] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:56:32] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:57:35] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.109 seconds [00:58:38] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [00:59:23] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 9.713 seconds [01:04:56] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [01:05:50] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [01:06:35] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:07:47] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27544 bytes in 0.115 seconds [01:08:59] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:10:11] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.109 seconds [01:15:53] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [01:19:38] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [01:22:02] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [01:40:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 234 seconds [01:43:38] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.236 seconds [01:44:41] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27406 bytes in 0.106 seconds [01:46:29] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 
seconds [01:47:46] New patchset: Jeremyb; "bug 37006 - fawiki: add Book namespace + aliases" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10084 [01:47:53] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10084 [01:49:52] New review: Jeremyb; "needs a local to sanity check that I didn't butcher the chars and put them in the right place." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/10084 [02:02:14] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [02:03:35] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [02:07:20] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [02:09:08] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:10:38] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.113 seconds [02:21:08] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time [02:36:53] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [02:37:47] RECOVERY - Puppet freshness on linne is OK: puppet ran at Mon Jun 4 02:37:32 UTC 2012 [02:38:14] RECOVERY - Puppet freshness on db21 is OK: puppet ran at Mon Jun 4 02:38:07 UTC 2012 [02:40:11] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Mon Jun 4 02:40:07 UTC 2012 [02:50:41] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:23] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 4.215 seconds [02:56:05] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Mon Jun 4 02:55:51 UTC 2012 [02:57:44] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [03:01:56] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:02:41] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [03:05:14] RECOVERY - Puppet freshness on db23 is OK: puppet ran at Mon Jun 4 03:04:52 UTC 2012 [03:06:17] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27544 bytes in 2.338 seconds [03:06:44] RECOVERY - Puppet freshness on spence is OK: puppet ran at Mon Jun 4 03:06:27 UTC 2012 [03:14:18] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [03:29:18] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [03:43:15] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [03:58:15] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:58:33] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [03:59:36] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 2.971 seconds [04:04:59] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [04:19:05] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.131 seconds [05:41:46] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [05:55:34] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [05:58:07] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python 
/usr/bin/swift-container-auditor [06:01:25] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.108 seconds [06:02:19] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [06:52:47] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [06:55:20] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms [07:05:05] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [07:06:35] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [07:23:24] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.117 seconds [07:32:44] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.170 seconds [08:14:08] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [08:34:01] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27398 bytes in 0.109 seconds [08:45:52] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [08:58:01] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [09:03:43] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [09:25:57] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [09:28:30] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [09:28:30] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [09:28:30] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [09:34:21] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [09:45:27] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours [09:52:57] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [09:59:40] what's up with cp1001/1002? [10:01:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9982 [10:01:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9982 [10:05:24] RECOVERY - Puppet freshness on sq68 is OK: puppet ran at Mon Jun 4 10:05:04 UTC 2012 [10:05:33] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.110 seconds [10:07:21] RECOVERY - Puppet freshness on sq69 is OK: puppet ran at Mon Jun 4 10:06:55 UTC 2012 [10:08:15] RECOVERY - Puppet freshness on sq70 is OK: puppet ran at Mon Jun 4 10:07:59 UTC 2012 [10:11:06] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [10:19:15] New patchset: Mark Bergsma; "Add IPv6 addresses to pmtpa SSL servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10098 [10:19:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10098 [10:19:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10098 [10:19:56] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10098 [10:26:21] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [10:27:24] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.117 seconds [10:28:20] New patchset: Mark Bergsma; "Multiple interface stanzas for the same interface name and different families are allowed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10099 [10:28:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10099 [10:28:48] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10099 [10:28:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10099 [10:28:54] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [10:34:00] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [10:49:09] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [10:52:30] New patchset: Mark Bergsma; "Decommission sq40" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10102 [10:52:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10102 [10:54:12] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10102 [10:54:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10102 [10:56:21] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.124 seconds response time. www.wikipedia.org returns 208.80.154.225 [11:28:08] New patchset: Mark Bergsma; "Don't upgrade wikimedia-lvs-realserver while I test it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10104 [11:28:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10104 [11:28:37] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10104 [11:28:39] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10104 [11:32:14] !log Copied wikimedia-lvs-realserver 0.08 from APT distribution precise-wikimedia to lucid-wikimedia [11:32:21] Logged the message, Master [11:32:45] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [11:36:48] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [11:43:28] New patchset: Mark Bergsma; "Factor prefers the lo ipv6 address for ::ipaddress6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10106 [11:43:35] morning mr lane [11:43:44] service IPs have been allocated [11:43:44] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10106 [11:44:32] New patchset: Mark Bergsma; "Factor prefers the lo ipv6 address for ::ipaddress6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10106 [11:44:49] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10106 [11:45:39] New patchset: Mark Bergsma; "Facter prefers the lo ipv6 address for ::ipaddress6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10106 [11:45:56] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10106 [11:47:04] mark: morning :D [11:47:07] oh? [11:47:15] see dns [11:47:24] cool [11:47:49] stupid puppet takes the loopback addresses for the facter $ipaddress6 variable :( [11:48:07] :D [11:48:23] yay puppet! [11:49:27] oh god it uses ifconfig [11:49:38] was the lvs realserver stuff for ipv6 done yet? [11:49:56] yes [11:50:00] it's not deployed yet but in the repo [11:50:05] don't use it yet [11:50:07] just do nginx conf [11:50:12] i'm still fiddling :) [11:50:35] well, part of that needs to assign addresses via lvs_realserver [11:50:49] i'll handle that [11:50:52] if I don't do that, then I'm going to cause an outage [11:50:58] why? [11:51:05] because the addresses won't be bound [11:51:09] so what? [11:51:13] noone's using it yet [11:51:19] nginx will restart [11:51:24] and fail [11:51:31] ah it's not INADDR_ANY [11:51:34] right [11:51:35] then wait [11:51:38] * Ryan_Lane nods [11:51:46] brb [11:51:58] I can push it into gerrit and do a −1 if you' dlike [11:54:26] ok [11:56:17] heh [11:56:21] New patchset: Mark Bergsma; "Facter prefers the lo ipv6 address for ::ipaddress6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10106 [11:56:30] have a look at interface_add_ip6_address in generic-definitions.py [11:56:37] dirty hacks fest [11:56:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10106 [11:57:10] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10106 [11:57:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10106 [12:04:16] hmmz [12:04:22] ssh -6 ssl1.wikimedia.org doesn't work [12:04:24] why not... [12:11:59] Ryan_Lane: I can remove the old ipv6 service IP stuff from lvs1, right? [12:12:02] it's completely broken anyway [12:12:21] there was stuff there? [12:12:28] iface eth0 inet6 static [12:12:28] address 2620:0:860:2::80:2 [12:12:28] netmask 64 [12:12:28] gateway 2620:0:860:2::1 [12:12:33] that's the wrong subnet ;) [12:15:19] I've now removed it [12:15:23] so that may mean that nginx won't restart now [12:15:25] on ssl1 [12:15:39] but I don't have time now to fix that [12:16:00] i'm gonna reboot the box [12:16:07] it either does or does not come back up with nginx [12:17:09] yeah. 
probably won't restart [12:17:13] I'll depool it [12:17:32] it can be repooled after we have the new ipv6 config in [12:17:45] PROBLEM - Host ssl1 is DOWN: PING CRITICAL - Packet loss = 100% [12:18:05] !log depooling ssl1 [12:18:09] Logged the message, Master [12:18:11] yep [12:19:25] faidon's new wikimedia-lvs-realserver with v6 support seems to work fine [12:19:33] RECOVERY - Host ssl1 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [12:19:37] he's our new packaging ninja [12:19:46] heh [12:20:09] PROBLEM - HTTPS on ssl1 is CRITICAL: Connection refused [12:20:17] i'm gonna ask puppet to upgrade it everywhere now [12:20:27] PROBLEM - NTP on ssl1 is CRITICAL: NTP CRITICAL: Offset unknown [12:20:46] New patchset: Mark Bergsma; "Revert "Don't upgrade wikimedia-lvs-realserver while I test it"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10109 [12:21:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10109 [12:21:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10109 [12:21:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10109 [12:22:47] oh damn [12:22:56] now we need to have puppet manage the ssh keys for the ipv6 addresses as well [12:23:13] it'll become even slower [12:23:39] !log Upgrading wikimedia-lvs-realserver to version 0.08 across the cluster (by Puppet) [12:23:43] Logged the message, Master [12:24:48] RECOVERY - NTP on ssl1 is OK: NTP OK: Offset 0.0009467601776 secs [12:25:10] ugh. yeah [12:25:26] dnssec and keys in the dns entries? [12:25:30] and we'll have all kinds of weirdness soon with all the hosts that now also have AAAA records for their hostnames [12:25:51] well, fingerprints in the dns entries, that is [12:25:54] especially right after fresh install, when the ipv6 address is not present yet [12:28:15] mark: any estimated date when ipv6 is enabled on prod [12:28:39] hm. I'm looking in dns, were's the forward lookups? [12:28:43] for the -lb addresses? [12:28:58] like Erik wrote in email 3 days ago that it was about to be enabled during hackaton [12:29:50] petan|wk: wednesday [12:29:56] ok [12:29:58] Ryan_Lane: think [12:30:29] it's not in the pipe one [12:30:41] no [12:30:55] because giving wikipedia a AAAA address in DNS before infrastructure is ipv6 enabled is a really bright idea ;-) [12:31:05] hahaha [12:31:07] good point [12:31:34] we should have this in git, then it could be sitting in a change [12:31:42] soon [12:31:46] * Ryan_Lane nods [12:32:03] hrm darn [12:32:18] if I add the v6 ips to the service ip hash in lvs.pp then pybal will try to add it everywhere [12:34:15] yep [12:34:20] go ahead on the ssl servers now [12:34:20] wait [12:34:23] is that true? [12:34:23] those will actually be the best test [12:34:28] since they DONT use that hash yet ;) [12:34:32] ah [12:34:34] that's why :) [12:34:45] they have their own service ip listing right now [12:34:47] I want to change that [12:34:52] but right now it's good that we haven't done that yet ;) [12:35:03] ok, I just add the addresses to the list? [12:35:16] Ryan_Lane: btw if you want I can help you with tagging on wikitech and such boring work [12:35:32] regarding the email [12:35:32] Ryan_Lane: I think. 
I've not looked at the protoproxy puppet config at all yet (in a while) [12:35:46] petan|wk: ok [12:35:49] as long as it only affects the ssl servers and not lvs/pybal you should be good [12:36:00] Ryan_Lane: petan|wk: count me in, which wiki are you moving stuff to? [12:36:03] it'll only affect ssl, yeag [12:36:11] wikimedia-lvs-realserver you can give the addresses too though [12:36:12] wait. is that tre? [12:36:19] to [12:36:20] it should be [12:36:51] ugh, reading these addresses from the reverse is horrible [12:36:56] hehe [12:37:07] it's all 2620:0:860:ed1a::0 to ::11 [12:37:13] (for pmtpa) [12:37:14] Ryan_Lane: you can create account Petrb with email benapetr at gmail dot com when you aren't busy, let me know then and I will take a look there [12:37:23] petan|wk: ok [12:38:52] so, wikimedia-lb is 2620:0:860:ed1a::0 ? [12:38:58] yes [12:39:05] (or 2620:0:860:ed1a:: ) [12:39:15] it seemed appropriate ;) [12:39:18] fucking hate ipv6's scheme [12:40:04] well, mobile-lb is 12 [12:40:21] hh [12:40:23] *heh [12:40:24] 12? [12:40:28] c? [12:40:31] yes [12:42:52] i've made wikimedia-lvs-realserver really flexible yesterday btw [12:43:00] you can now give it any combination of arrays and hashes [12:43:07] and it will just compile a list of ip addresses out of that [12:43:10] so you can use that if you want [12:43:26] instead of manually specifying each value from key in a hash [12:43:50] eqiad is: 2620:0:861:ed1a:: ? [12:44:04] correct [12:44:08] so for example, this also works: [12:44:09] # TEMP: during ipv6 migration [12:44:09] if $::site == "pmtpa" { [12:44:09] class { "lvs::realserver": realserver_ips => [ $lvs::configuration::lvs_service_ips[$::realm]['bits'][$::site], "2620:0:860:ed1a::a" ] } [12:44:09] } [12:44:09] else { [12:44:10] class { "lvs::realserver": realserver_ips => $lvs::configuration::lvs_service_ips[$::realm]['bits'][$::site] } [12:44:10] } [12:44:35] just putting an entire hash in an array with a literal value [12:45:24] hm [12:45:24] ok [12:45:50] New patchset: Mark Bergsma; "Add IPv6 service IP to pmtpa bits servers for testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10110 [12:46:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10110 [12:46:20] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10110 [12:46:23] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10110 [12:47:55] hrm [12:48:07] why didn't faidon make the lvs service IPs scope host instead of global [12:48:20] ah to distinguish them [12:52:45] Geo = {} [12:52:51] is what geoiplookup returns for v6 clients [12:52:53] that's fine for now [12:53:08] * Ryan_Lane nods [12:54:02] I think i'm gonna modify the pybal conf template to filter out ipv6 addresses/services except if the host is in a special ipv6 class [12:54:03] for now [12:54:09] until all pybals have been upgraded [12:54:12] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [12:54:19] seems like the easiest way to handle things [12:56:15] New patchset: Ryan Lane; "Adding ipv6 support for all sites for protoproxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10111 [12:56:22] * Ryan_Lane is stealing all the credit ^^ [12:56:35] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10111 [12:56:38] bah [12:57:46] New patchset: Ryan Lane; "Adding ipv6 support for all sites for protoproxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10111 [12:58:07] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10111 [12:58:16] all the blame [12:59:19] brb [13:00:38] New patchset: Ryan Lane; "Adding ipv6 support for all sites for protoproxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10111 [13:01:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10111 [13:01:09] \o/ [13:01:16] mark: review? ^^ [13:01:22] yeah I will [13:01:29] damn, just got a bag of crisps :P [13:01:50] I need to get some food :( [13:03:21] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [13:04:37] New review: Mark Bergsma; "Many service IPs are wrong for the different sites." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/10111 [13:05:03] how so? [13:05:09] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [13:05:10] 860 for all sites [13:05:16] see inline comments for an example [13:05:18] damn it [13:05:19] right [13:07:02] New patchset: Ryan Lane; "Adding ipv6 support for all sites for protoproxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10111 [13:07:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10111 [13:08:27] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.165 seconds [13:15:30] notpeter: regarding bellin, going to have to replace the main board. The problem did not follow the DIMM. Should get the new board today. [13:16:38] New review: Mark Bergsma; "The existing ipv6 address *is* in use in Amsterdam (it's in DNS there for certain providers), so we ..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/10111 [13:16:51] Ryan_Lane: perhaps a better idea to do this change in small steps at a time? ;) [13:17:27] which existing ip address? [13:17:28] upload? [13:17:30] yes [13:17:34] I can change that back [13:17:50] hrm [13:17:53] that one is gonna be a pain in the butt [13:17:56] perhaps I just disable that now [13:18:00] didn't know if you had changed it already [13:18:08] haven't [13:18:11] that's been like that for years [13:18:12] why not put it in the pipe? [13:18:18] what? [13:18:20] then disable the pipe later? [13:18:32] I think i'll disable that ipv6 thing now [13:18:36] ok [13:18:40] then when dns ttl expires for that, we can make this change [13:18:48] * Ryan_Lane nods [13:19:05] people won't have ipv6 for a few days [13:19:08] and will complain about that ;) [13:19:12] heh [13:19:14] but in 2 days they should be happy [13:19:27] we're going backwards in ipv6 support [13:19:38] briefly ;) [13:19:46] that should be our report [13:20:21] back... [13:20:29] oh [13:20:32] there's an even easier way to do that [13:20:41] that thing i still called upload.esams instead of upload-lb.esams [13:20:47] wb paravoid [13:20:55] that took a while [13:21:38] hm. can I put more than one ipv6 address in there.... 
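For context on the review of r/10111 above ("Many service IPs are wrong for the different sites" / "860 for all sites"): a hypothetical sketch of the per-site service-IP layout being discussed. The hash name and the esams prefix are assumptions; the pmtpa/eqiad prefixes, the ed1a -lb subnet, and the wikimedia-lb (::) and mobile-lb (::c) suffixes are the values quoted in the conversation.

    # Sketch only, not the merged change: one IPv6 service IP per site per service.
    # pmtpa = 2620:0:860, eqiad = 2620:0:861 (esams = 2620:0:862 is assumed), -lb subnet ed1a.
    $ipv6_service_ips = {
        'pmtpa' => { 'wikimedia-lb' => '2620:0:860:ed1a::', 'mobile-lb' => '2620:0:860:ed1a::c' },
        'eqiad' => { 'wikimedia-lb' => '2620:0:861:ed1a::', 'mobile-lb' => '2620:0:861:ed1a::c' },
        'esams' => { 'wikimedia-lb' => '2620:0:862:ed1a::', 'mobile-lb' => '2620:0:862:ed1a::c' },
    }
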
[13:21:41] paravoid: i'm gonna disable selective answer for a few days so it's not in our way during our changes [13:22:00] maybe I can just have both [13:22:04] naah [13:22:05] let's not [13:22:07] ok [13:22:09] i want to get rid of it anyway [13:22:16] might as well do that now [13:22:27] where are we? [13:22:48] what can I do? [13:22:56] you can tell me what you did with the lvs balancers [13:23:00] (after having a shower, since I stink atm) [13:23:06] TMI [13:23:19] d-i partitioning wasn't ready [13:23:25] and I was one of the last people left at the venue [13:23:30] so aborted and left [13:23:33] ok [13:23:44] Ryan told me it'd be risky to leave lvs1 down for a lengthy period of time [13:23:53] in case it's pair had a fault [13:23:59] so I just put it back into prod with lucid [13:24:05] good [13:24:36] so, I can do that [13:25:00] please do [13:25:03] make sure lvs1 is not used in any way [13:25:22] nice [13:25:28] dns scenario pmtpa-down still has yaseo in it [13:25:32] I think i'm not gonna touch it now :P [13:25:49] no idea what yaseo is [13:25:57] :| [13:26:04] a squid cluster in south korea we got rid of in 2008 [13:26:40] lol! [13:27:16] !log Changed upload.esams.wikimedia.org CNAME to upload-lb.esams, effectively disabling the IPv6 selective answer script [13:27:20] Logged the message, Master [13:27:50] heh [13:29:03] in an hour or so we can touch the esams ssl servers then [13:29:11] time for a shower [13:30:12] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:33:30] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.153 seconds [13:36:44] yaseo? that's been dead how many years now? [13:37:01] well, I'm going to go get some food, then [13:40:31] ok [13:47:49] New patchset: Mark Bergsma; "Add variable $ipv6_hosts to enable/disable IPv6 on certain PyBal hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10113 [13:48:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10113 [13:48:34] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10113 [13:48:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10113 [13:53:19] New patchset: Mark Bergsma; "Define the IPv6 LVS service IP for bits.pmtpa in the LVS hash" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10114 [13:53:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10114 [13:54:05] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10114 [13:54:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10114 [14:22:57] argh [14:22:59] ipvsadm is stupid [14:25:08] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [14:27:09] how come? 
[14:27:19] I never checked whether ipvsadm actually wroked [14:27:20] worked [14:27:26] it doesn't use [ ] around the realserver [14:27:28] just for the service [14:28:14] New patchset: Mark Bergsma; "ipvsadm doesn't use square brackets for realservers" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/10116 [14:28:56] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10116 [14:29:04] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10116 [14:29:06] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/10116 [14:29:17] hi paravoid, I had a funny thing happen over the weekend. One of the candidates for the QA Engineer position, Alister Scott, was experimenting with doing some browser automation on http://commons.wikimedia.beta.wmflabs.org/ and I think you blocked him for editing with nonsense. I've spoken with him, can you unblock him (or give me the privs to do it?) [14:29:50] chrismcmahon: no, I didn't block anyone [14:29:56] chrismcmahon: and I don't know how to block or unblock [14:30:11] paravoid, one sec... [14:33:41] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [14:35:19] there's nothing in the block log on that wiki [14:35:20] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.293 seconds [14:35:20] http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Special%3ALog&type=block&user=&page=&year=&month=-1&tagfilter= [14:36:26] I have no idea what you're talking about :-) [14:36:45] I haven't touched mediawiki *at all* on beta [14:36:56] and I've never blocked anyone in any other mean [14:37:06] and I've done nothing wrt to beta this weekend [14:37:26] if I can do something to help I'd be happy to, although I don't know how mediawiki works [14:39:22] paravoid there is a user "Orashmatash" doing a lot of maintenance: http://en.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges but he doesn't have a user page or talk page [14:39:55] paravoid: I thought that might be you, seems I was wrong [14:40:09] oh, okay :) [14:40:12] nope, not me [14:40:15] I either go by faidon or paravoid [14:40:44] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.106 seconds [14:41:32] petan|wk: do you happen to know who "Orashmatash" is? http://en.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges [14:41:59] chrismcmahon: no [14:42:26] I will try to find it out [14:42:55] root@brewster:/srv/wikimedia/incoming# reprepro ls pybal [14:42:55] pybal | 0.1+r74215 | lucid-wikimedia | amd64, source [14:42:56] pybal | 1.00 | precise-wikimedia | amd64, i386, source [14:43:27] petan|wk: I want to get the account there for Alister Scott unblocked, but I'd like to get a message to whoever Orashmatash is about that. [14:43:39] can you please take this discussion elsewhere? [14:43:47] chrismcmahon: ok, let's move to -labs [14:43:48] New patchset: Mark Bergsma; "pybal (1.00) precise; urgency=low" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/10117 [14:44:17] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10117 [14:44:19] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/10117 [14:44:39] paravoid: so... 
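A hedged illustration of the ipvsadm detail fixed above in r/10116 ("ipvsadm doesn't use square brackets for realservers"): the virtual service address is written with brackets, the realserver address without. The service IP below is the bits.pmtpa address quoted earlier in the log; the realserver address is a documentation-prefix placeholder, and the scheduler/forwarding flags are ordinary ipvsadm usage rather than anything taken from PyBal.

    # Add an IPv6 virtual service (brackets around the service address)...
    ipvsadm -A -t [2620:0:860:ed1a::a]:80 -s wrr
    # ...then a realserver for it, written without brackets (2001:db8::10 is a placeholder).
    ipvsadm -a -t [2620:0:860:ed1a::a]:80 -r 2001:db8::10 -g
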
[14:44:51] the new wikimedia-lvs-realserver has been deployed everywhere by puppet [14:45:12] sq67-70 (bits.pmtpa) and ssl1-4 have ipv6 addresses bound [14:45:15] YAY [14:45:21] ipv6 service IPs have been allocated for all LVS services [14:45:32] and sq67-70 have those service IPs bound to loopback as well [14:45:47] pybal has been rebuilt for precise and is in the apt repo [14:45:52] wow, you rock [14:45:57] so it's now waiting on your reinstall of lvs1 [14:46:04] i'll help you with that in a bit ;) [14:46:04] so, there's a partman recipe for amslvs* already [14:46:08] I can begin with that [14:46:11] but I'm also getting visitors at some point [14:46:14] yeah [14:46:20] then I can do the manual install on lvs1 [14:46:26] obviously partitioning is not very important for lvs [14:46:28] oh yeah [14:46:31] do get-selections --installer [14:46:34] I also modified the pybal template to ignore ipv6 addresses [14:46:39] get the values and put them on puppet [14:46:41] so we can add them to the service ip hash safely [14:46:42] then reinstall the rest [14:46:47] that's the plan atm, I was about to begin [14:46:50] only if you add the lvs host to the $ipv6_hosts variable in that class [14:46:55] they will be added to the pybal config [14:47:04] oh? how come? [14:47:11] I modified the manifest [14:47:21] otherwise the current lucid pybals will try to add ipv6 services [14:47:27] I don't think that will fare well ;-) [14:47:28] ah, right [14:47:37] so once lvs1 has been reinstalled, and pybal is up on it [14:47:42] we can add it to the $ipv6_hosts list [14:47:46] amslvs! [14:47:49] and then ipv6 lvs services will be added [14:47:50] right, amslvs [14:48:04] all in all, going well [14:48:16] screw hackathons, working from home is just better ;-) [14:48:20] hahaha [14:48:26] well, it was a noisy hackathon [14:48:31] yeah exactly [14:48:35] so once we have tested one lvs service [14:48:40] I'll add ips to the other sites and stuff as well [14:48:46] no point in doing that now [14:49:00] actually [14:49:08] that's one reason why I'd like you to start on pmtpa lvs first [14:49:12] there we have inactive realservers [14:49:13] in esams we don't [14:49:36] hm, okay then [14:49:40] manual install it is :-) [14:49:45] partitioning that is [14:49:50] not that crazy [14:51:15] yeah [14:51:21] you can use one of the standard recipes [14:51:27] install with LVM, root only, etc [14:51:32] it doesn't matter [14:51:38] there's no stored data, everything is brought in by puppet [14:51:45] as long as it doesn't die with one broken disk I'm happy [14:51:58] lvs1 has one disk :-) [14:52:02] not sure about the rest [14:52:04] raid1 I think [14:52:06] hw raid [14:52:12] ah, maybe [14:52:16] I just saw an sda [14:52:16] i'm pretty sure [14:52:20] so don't worry about sw raid [14:52:31] okay, RAC has hanged [14:52:44] have to reset it from within linux, don't remember how [14:52:55] just do "racadm racreset" in the drac [14:52:58] mark: ready for me to make the ssl changes? [14:53:04] Ryan_Lane: i'd break it up in smaller changes [14:53:08] this is kind of silly [14:53:16] no, I can't SSH [14:53:17] why? [14:53:20] it either works or breaks everything [14:53:24] it only affects one service [14:53:25] Trying 10.1.4.1... [14:53:25] Connected to lvs1.mgmt.pmtpa.wmnet. [14:53:25] paravoid: oh, odd [14:53:26] Escape character is '^]' [14:53:29] hangs there [14:53:33] darn [14:53:44] the only service it can break is upload [14:53:53] Ryan_Lane: why? 
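One small how-to from the exchange above ("do get-selections --installer, get the values and put them on puppet"): after the manual lvs1 install, the installer's debconf answers can be dumped and the relevant ones folded back into the autoinstall config. A sketch, assuming the standard debconf-utils tool; the output path is made up.

    # On the freshly installed host: capture the installer's answers...
    debconf-get-selections --installer > /tmp/lvs1-installer-selections.txt
    # ...and pick out the partitioning answers worth preseeding for the other amslvs boxes.
    grep -i partman /tmp/lvs1-installer-selections.txt
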
[14:53:56] I can test the change on ssl1 first, which is depooled [14:54:01] that sounds fair [14:54:14] I wish we could have environments [14:54:38] then I could merge this in without worrying if the rest will pull the change [14:55:06] !log disabling puppet on all ssl hosts [14:55:11] Logged the message, Master [14:55:36] i'm going to add static routes to the routers now [14:55:38] deactivated [14:58:38] hrm [14:58:44] i'll need ipv6 addresses for the balancers first ;) [14:58:46] i'll wait with that [15:01:35] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10111 [15:01:38] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10111 [15:02:43] !log depooling ssl1001 and ssl3001 [15:02:47] Logged the message, Master [15:03:14] RECOVERY - HTTPS on ssl1 is OK: OK - Certificate will expire on 08/22/2015 22:23. [15:03:54] * paravoid sighs [15:04:00] installing ia32-libs on lvs1... [15:04:09] huh? [15:04:11] so that the binary racadm that I extracted from an RPM can work [15:04:31] (ipmitool bmc reset cold didn't work) [15:04:40] so... [15:04:44] chris is in the data center right now :) [15:04:45] mark: well, it seems that all the IPs were added and nginx starts [15:04:46] might be easier! [15:04:56] Ryan_Lane: cool, i'll check it out in a bit [15:05:11] lemme also do this in eqiad and esams [15:05:55] uhh okay [15:06:00] how do I get in contact with him? [15:06:08] he's here in the channel [15:06:10] cmjohnson1: ping [15:06:24] worked in eqiad [15:06:30] can you help paravoid? [15:07:06] hi! [15:07:14] lvs1's drac is hanged [15:07:22] can you physically powercycle? [15:07:41] thanks a lot! [15:08:05] Logged the message, Master [15:08:20] I'd say remove power cables and re-add [15:08:32] power button might not do it [15:09:54] perfect [15:13:17] PROBLEM - Host lvs1 is DOWN: PING CRITICAL - Packet loss = 100% [15:13:33] !log Added new IPv6 LVS prefixes to all routers for uRPF filters; BGP import filters still need adjusting for dual-family sessions [15:13:37] Logged the message, Master [15:14:48] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: host 208.80.152.197, sessions up: 6, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [15:14:58] (that's lvs1, ignore that) [15:15:15] awesome, tahnks! [15:16:08] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [15:16:18] and down again :-) [15:16:28] (because I'm resetting it) [15:17:44] Ryan_Lane: why are the ipv6 vhosts listening on ipv4 ips also again? [15:18:00] I was wondering that myself too [15:18:01] I don't remember [15:18:24] on protoproxies you mean, right? [15:18:32] yes [15:18:35] because it also does ssl? [15:18:48] # IPv6 proxying [15:18:48] server { [15:18:48] listen 208.80.152.201:80; [15:18:48] listen [2620:0:860:ed1a::1]:80; [15:18:50] RECOVERY - Host lvs1 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [15:18:52] ah [15:19:01] I don't remember.... 
[15:19:05] lemme think for a minute [15:19:12] its probably documented [15:19:15] you were quite rigid about that [15:19:32] well, that was for ssl [15:19:40] I don't remember a good reason for ipv6 [15:19:50] let's try it without [15:19:52] now is the time ;-) [15:19:53] ok [15:19:55] yep [15:19:56] gimme a sec [15:20:10] if there was a reason, it should have been documented [15:20:17] yeah take your time [15:20:26] we have 2 days left ;) [15:20:29] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: host 208.80.152.197, sessions up: 6, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [15:20:34] well, it's not documented, so obviously there's no good reason ;) [15:20:40] hehe [15:21:00] !log reinstalling lvs1 with precise [15:21:00] oh [15:21:04] Logged the message, Master [15:21:06] because the template doesn't check [15:21:23] if it's in proxy_addresses, it gets added [15:21:58] hm, how do I check to see if it's an ipv6 address or not. [15:22:26] PROBLEM - SSH on lvs1 is CRITICAL: Connection refused [15:26:47] RECOVERY - SSH on lvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:28:26] is there anything in the new row C yet? [15:28:32] they're going in that row [15:28:37] where was that SSH key that Rob was telling me about? [15:28:38] sounds fine [15:28:45] New patchset: Ryan Lane; "Only add ipv6 addresses to the ipv6 proxy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10118 [15:28:45] paravoid: for installs? [15:28:47] to login to newly-provisioned servers [15:28:48] yes [15:28:49] sockpuppet's root dir [15:28:54] ah, sockpuppet [15:28:58] was looking at it on brewster [15:29:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10118 [15:29:52] review, anyone? [15:29:56] ok [15:31:06] I guess that works [15:31:12] don't know ruby's continue syntax and such ;) [15:31:33] I had to look it up ;) [15:31:39] then it's probably fine ;) [15:31:41] go ahead [15:31:43] I also looked up how to do a one line if [15:32:03] hehe [15:32:16] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10118 [15:32:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10118 [15:32:35] I like how we're not using autosigning but our instructions do not mention how to verify the key... [15:33:08] that's still better than autosigning ;) [15:33:38] the current crap with sockpuppet as CA first twice is a bit annoying too [15:33:41] should really fix that soon [15:33:54] ok. seems that worked [15:35:46] hooray for clean configs [15:36:13] :) [15:36:58] err: Could not retrieve catalog from remote server: Server hostname 'sockpuppet.pmtpa.wmnet' did not match server certificate; expected sockpuppet.pmtpa.wmnet [15:37:04] oh puppet, you're so very useful [15:37:28] hahaha [15:37:29] so the first two times, use --server sockpuppet (even though it complains the 2nd time) [15:37:31] then switch to stafford [15:37:40] yeah yeah, I figured it out [15:37:43] ok ;) [15:37:47] I was just pointing out the "funny" error messge [15:37:51] yes [15:37:56] so annoying [15:41:09] argh, precise has privacy extensions enabled by default [15:41:42] does that impact lvs? 
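For context on r/10118 above ("Only add ipv6 addresses to the ipv6 proxy") and the "ruby's continue syntax" / one-line-if remarks: a hypothetical sketch of that kind of template guard, not the actual protoproxy template. The variable name and the naive colon test are assumptions; Ruby's "continue" is "next".

    <%# Sketch: emit listen lines only for the IPv6 literals in proxy_addresses. %>
    server {
    <% proxy_addresses.each do |addr| -%>
    <% next unless addr.include?(":") -%>
        listen [<%= addr %>]:80;
    <% end -%>
    }
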
[15:42:14] oh damn [15:42:33] need to make sure pybal doesn't announce the v6 addresses over bgp [15:42:37] since that will surely break ;) [15:42:40] hmm can do that in the template [15:45:13] New patchset: Mark Bergsma; "Hard disable BGP for IPv6 LVS services, PyBal doesn't support that yet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10119 [15:45:36] New patchset: Faidon; "LVS: remove dependency on Linux 2.6.36" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10120 [15:45:39] please review [15:45:43] ah yes [15:45:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10119 [15:45:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10120 [15:47:07] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10119 [15:47:09] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10119 [15:47:11] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10120 [15:47:13] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10120 [15:47:26] I have another bug, don't merge yet [15:47:30] ok [15:52:01] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [15:54:43] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [15:55:15] New patchset: Faidon; "Make interface_setting to work when having inet6 too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10121 [15:55:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10121 [15:55:46] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [15:56:40] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [15:56:41] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10121 [15:56:45] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10121 [15:58:19] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:25] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [16:01:01] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27544 bytes in 6.912 seconds [16:02:22] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [16:06:52] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:07:33] New patchset: Ryan Lane; "Enable ipv6 on the ssl hosts for all datacenters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10122 [16:08:23] New patchset: Faidon; "autoinstall: make early_command work with precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10123 [16:08:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10122 [16:08:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10123 [16:08:25] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10122 [16:08:25] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10122 [16:08:46] single space commit, woo! [16:09:34] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27544 bytes in 8.337 seconds [16:09:43] where'd the stupid bot go? [16:09:43] !log restarting ircecho on manganese [16:09:43] that didn't seem to help [16:09:43] hm. maybe a split? [16:09:47] Logged the message, Master [16:09:51] yep [16:09:56] Ryan_Lane: merged puppet btw [16:10:03] 10123 is live [16:10:10] a split [16:10:10] heh [16:10:16] thanks [16:10:22] was looking for it :) [16:11:40] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds [16:13:34] re-reinstalling lvs1 :-) [16:14:49] PROBLEM - SSH on lvs1 is CRITICAL: Connection refused [16:15:22] New patchset: ArielGlenn; "make worker script take a specific wiki arg so en dumps can be like the rest" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/10124 [16:16:46] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:16:50] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10124 [16:16:52] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/10124 [16:18:07] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27544 bytes in 1.102 seconds [16:19:28] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:20:49] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 9.729 seconds [16:23:58] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:24:52] RECOVERY - SSH on lvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:26:40] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27542 bytes in 5.495 seconds [16:34:55] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [16:39:39] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.185 seconds [16:42:57] PROBLEM - SSH on lvs1 is CRITICAL: Connection refused [16:47:00] RECOVERY - Host db1047 is UP: PING OK - Packet loss = 0%, RTA = 26.45 ms [16:48:56] !log upgraded kernel on db1047 / analytics [16:49:00] Logged the message, Master [16:50:27] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 101010 seconds [16:50:36] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 101021 seconds [16:52:06] PROBLEM - Host lvs1 is DOWN: PING CRITICAL - Packet loss = 100% [16:56:09] RECOVERY - SSH on lvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:56:18] RECOVERY - Host lvs1 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [17:02:01] New review: Jeremyb; "I read the consensus a little in google translate and I think it's good." 
[operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10084 [17:05:27] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [17:12:31] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [17:16:51] New patchset: Faidon; "autoinstall: also fix netboot.cfg to work with precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10126 [17:17:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10126 [17:18:13] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10126 [17:18:15] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10126 [17:20:54] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [17:23:09] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [17:24:21] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [17:35:11] New patchset: Mark Bergsma; "Add IPv6 addresses on reinstalled (Precise) LVS servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10128 [17:35:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10128 [17:35:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10128 [17:35:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10128 [17:54:34] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [17:56:40] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [17:58:56] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [17:59:04] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [18:01:19] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [18:05:22] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [18:12:25] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: host 208.80.152.197, sessions up: 6, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [18:13:10] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.162 seconds [18:14:58] PROBLEM - Apache HTTP on mw64 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error [18:15:16] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [18:17:31] PROBLEM - Frontend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:18:52] RECOVERY - Frontend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27542 bytes in 4.820 seconds [18:22:01] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [18:27:43] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [18:32:17] New patchset: Mark Bergsma; "Allow IPv6 LVS services on lvs1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10132 [18:32:38] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10132 [18:32:47] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10132 [18:32:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10132 [18:34:01] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [18:43:55] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.268 seconds [18:46:28] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [18:53:28] New patchset: Hashar; "tests for databases configurations" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10133 [18:53:33] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10133 [18:54:43] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [18:55:55] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [19:26:55] New patchset: Mark Bergsma; "Add IPv6 LVS service IPs for upload.pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10137 [19:27:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10137 [19:27:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10137 [19:27:38] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10137 [19:30:30] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [19:30:30] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [19:30:30] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [19:32:10] !log force running puppet on ssl servers [19:32:15] Logged the message, Master [19:38:11] New patchset: Mark Bergsma; "Add Squid IPv6 IPs to a special 'ipv6' LVS service, similar to https" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10139 [19:38:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10139 [19:39:31] New patchset: Mark Bergsma; "Add Squid IPv6 IPs to a special 'ipv6' LVS service, similar to https" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10139 [19:39:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10139 [19:39:53] man lvs is getting complicated ;) [19:40:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10139 [19:40:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10139 [19:47:46] New patchset: Mark Bergsma; "Add 'ipv6' LVS service for ipv6 protoproxies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10140 [19:48:08] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10140 [19:48:12] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:49:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10140 [19:49:13] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10140 [19:49:42] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 9.311 seconds [20:06:49] New patchset: Mark Bergsma; "Bind IPv6 LVS service IPs to lvs1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10141 [20:07:12] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/10141 [20:07:43] New patchset: Mark Bergsma; "Bind IPv6 LVS service IPs to lvs1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10141 [20:08:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10141 [20:08:15] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10141 [20:08:18] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10141 [20:12:41] New patchset: Mark Bergsma; "Fix LVS service IP lookup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10143 [20:13:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10143 [20:13:30] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10143 [20:13:33] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10143 [20:13:42] PROBLEM - Host bellin is DOWN: PING CRITICAL - Packet loss = 100% [20:15:03] RECOVERY - Host bellin is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms [20:16:11] New patchset: Mark Bergsma; "Fix duplicate key in LVS service IP hash" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10145 [20:16:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10145 [20:16:34] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10145 [20:16:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10145 [20:23:18] PROBLEM - mysqld processes on es4 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:26:00] PROBLEM - Host es4 is DOWN: PING CRITICAL - Packet loss = 100% [20:27:21] RECOVERY - Host es4 is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms [20:28:10] hiya [20:29:35] internet stopped working [20:29:45] using my host's internet stick [20:29:49] trying to get mine registered [20:29:50] ouch [20:30:04] yep :( [20:30:09] you any idea why ssl1 had the wrong ipv6 subnet configured? [20:30:17] 2620:0:862:2 instead of :1 [20:30:22] it did? [20:30:29] did you do anything for it? [20:30:32] is it wrong in puppet? [20:30:38] well it can't be [20:30:42] it takes values from facter [20:31:01] however once something is wrong, it can stay wrong [20:31:02] that was assigned to lo? 
[20:31:04] so may be nothing [20:31:06] no [20:31:08] eth0 [20:31:11] main server ip, v6 [20:31:12] ah [20:31:17] I didn't do anything there [20:31:20] alright [20:31:39] it's strange, since I removed your old ::80:2 ip (with that wrong subnet) earlier and rebooted the box [20:31:43] ouch. this connection is terrible too [20:31:47] so it seems strange that it now came back with the new v4 encoding in it [20:31:48] hehe [20:31:51] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 14051 seconds [20:32:28] also [20:32:32] (I put this in the RT ticket) [20:32:39] we need to make nginx listen on the main ipv6 ip [20:32:46] why's that? [20:32:48] port 80 [20:32:48] oh [20:32:50] right [20:32:52] otherwise pybal can't contact [20:32:54] yeah [20:32:56] oh [20:32:58] no not ipv6 [20:32:59] just like the ssl ones [20:32:59] ipv4 is fine [20:33:02] is that what that ip did? [20:33:05] which you removed earlier :D [20:33:09] hahaha [20:33:12] pybal contacts over ipv4 only [20:33:13] maybe so [20:33:14] (for now) [20:33:21] -_- [20:33:24] well... [20:33:27] so it needs to listen on port 80 and 443 of the *main server ip * [20:33:28] I'll put that back in ;) [20:33:33] check if it's that though [20:33:38] I thought it was a different ip [20:33:39] not sure now [20:33:48] oh. no. it was the service ip [20:33:51] which is wrong [20:33:52] yeah that's no help [20:34:02] so we need a vhost which does nothing but listen on those ports [20:34:07] just for pybal [20:34:18] well, not necessarily that [20:34:31] I'll do whatever I'm doing for ssl [20:34:38] which is? [20:34:42] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [20:35:12] listen <%= ipaddress %>:443 [20:35:12] so, listen <%= ipaddress %>:80 [20:35:22] ok [20:35:24] yes that will work [20:35:29] it won't work for ipaddress6 in the future [20:35:35] since that takes some lvs service ip [20:35:40] ipaddress6_eth0 would work though [20:35:52] ipaddress => 208.80.152.120 [20:35:52] ipaddress6 => 2620:0:860:ed1a::b [20:35:52] ipaddress6_eth0 => 2620:0:860:1:208:80:152:120 [20:36:04] no? because puppet is broken? [20:36:07] yeah [20:36:27] well [20:36:28] bleh [20:36:29] also the kernel [20:36:30] ipaddress6_eth0 will do, I guess [20:36:30] whatever [20:36:39] you don't need that now [20:36:43] pybal doesn't do v6 for monitoring yet [20:36:43] New patchset: Ryan Lane; "Add the server's IP to the listen addresses, so that it can be used by pybal." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10148 [20:36:44] I kno [20:36:46] w [20:36:48] ok [20:36:58] we can write a custom fact to fix that better later [20:37:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10148 [20:37:12] now to figure out how to register my sim.... [20:37:43] paravoid: please don't reinstall more LVS servers yet [20:38:11] and now google's translation service is breaking :( [20:45:09] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [20:47:23] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.161 seconds [20:48:41] this translation thing in chrome is really the most useful thing ever [20:50:11] mutante: you on?
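A minimal sketch of the nginx change being discussed (Gerrit change 10148, "Add the server's IP to the listen addresses, so that it can be used by pybal"), assuming an ERB-templated vhost and the Puppet facts quoted above; this is not the actual template from operations/puppet, just an illustration of binding the listen directives to the host's own IPv4 address so pybal's health checks can reach the box directly:

    # Sketch only; assumes ssl_certificate/ssl_certificate_key are set elsewhere.
    server {
        # Listen on the main server IPv4 address (the "ipaddress" fact), so
        # pybal can poll this host on ports 80 and 443 in addition to the
        # LVS service IP it serves.
        listen <%= ipaddress %>:80;
        listen <%= ipaddress %>:443 ssl;

        # For IPv6 the per-interface fact would be the safer choice, since
        # "ipaddress6" can resolve to an LVS service IP on these hosts; pybal
        # does not monitor over v6 yet, so this stays commented out:
        # listen [<%= ipaddress6_eth0 %>]:443 ssl;

        server_name _;
        return 204;   # placeholder; the real vhost terminates SSL and proxies to the backends
    }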
[20:50:33] I can't read the phone number attached to this sim :D [20:58:05] New review: Demon; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10133 [20:58:07] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10133 [21:03:06] !log restarting nginx on all ssl hosts [21:03:10] Logged the message, Master [21:03:17] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay 0 seconds [21:04:25] !log repooling ssl1, ssl1001, ssl3001 [21:04:29] Logged the message, Master [21:04:56] crap [21:05:05] I should have merged and pushed out that change before doing that [21:05:24] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10148 [21:05:26] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10148 [21:07:04] !log force running puppet on all ssl hosts again [21:07:08] Logged the message, Master [21:16:13] !log restarting nginx on all ssl boxes again [21:16:18] Logged the message, Master [21:33:08] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:34:30] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.119 seconds [21:46:02] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:47:23] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.110 seconds [21:48:57] PROBLEM - Apache HTTP on mw7 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:48:57] PROBLEM - LVS HTTP on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:06] PROBLEM - Apache HTTP on mw11 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:06] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:06] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:06] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:06] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:24] PROBLEM - Apache HTTP on mw12 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:33] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:33] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:33] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:33] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:33] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:34] PROBLEM - Apache HTTP on mw6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:34] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:34] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:35] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:36] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:42] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:42] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:42] PROBLEM - Apache HTTP on mw17 is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [21:49:42] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:42] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:43] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:49:43] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:18] PROBLEM - Apache HTTP on srv190 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:18] PROBLEM - Apache HTTP on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:27] PROBLEM - LVS HTTP on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:27] PROBLEM - Apache HTTP on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:27] PROBLEM - Apache HTTP on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:36] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:36] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:36] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:36] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:36] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:37] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:37] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:45] PROBLEM - Apache HTTP on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:45] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:45] PROBLEM - Apache HTTP on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:45] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:45] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:46] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:46] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:47] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:47] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:48] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:48] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:49] PROBLEM - Apache HTTP on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:49] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:50] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:50] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:51] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:51] PROBLEM - Apache HTTP on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:52] PROBLEM - LVS HTTP on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:54] mark? 
[21:50:54] PROBLEM - LVS HTTPS on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:54] PROBLEM - Apache HTTP on mw15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:50:54] PROBLEM - Apache HTTP on mw13 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:03] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:03] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:03] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:03] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:03] PROBLEM - Apache HTTP on mw5 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:04] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:04] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:04] PROBLEM - Apache HTTP on mw53 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:05] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:12] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:13] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:13] PROBLEM - Apache HTTP on mw16 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:14] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:21] PROBLEM - Apache HTTP on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:21] PROBLEM - Apache HTTP on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:21] PROBLEM - Apache HTTP on mw9 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:21] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:21] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:22] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:22] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:23] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:23] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:30] PROBLEM - LVS HTTPS on foundation-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out [21:51:57] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:06] PROBLEM - LVS HTTP on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.0 504 Gateway Time-out [21:52:06] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:15] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:15] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - 
Socket timeout after 10 seconds [21:52:15] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:15] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:15] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:28] mark: that is not good :/ [21:52:33] weird [21:52:40] could it be some network switch? [21:53:49] nope [21:54:45] dberror.log got a ton of db related issues [21:54:54] about 10.0.0.228 [21:54:54] yup [21:54:59] should be resolved [21:55:04] good [21:55:10] I was just done for the day [21:55:50] * mark off again ;) [21:55:54] things should be ok now [21:55:58] thanks [21:56:02] * hashar waves at mark [21:56:04] have a good night [21:56:19] es4 not happy then? [22:01:11] yeah :/ [22:04:38] binasher: I wish we used something else for es sometimes [22:05:00] the es is a fucking nightmare [22:05:11] ERROR 144 (HY000) at line 382: Table './arwiki/blobs_cluster22' is marked as crashed and last (automatic?) repair failed [22:05:24] having this all in myisam is terrible [22:06:51] binasher: do you know of a key/value store with decent compression, where you could perhaps hint that "this blob derives from that blob"? [22:06:54] new stuff will be in innodb whenever the hardware gets here but hopefully we can replace mysql entirely [22:07:14] all our custom compression code is scary [22:07:20] yeah it is [22:07:22] and any mistakes can cause dataloss (as in the past) [22:08:50] binasher: funny, my myisam search table crashed on my testwiki [22:08:59] * AaronSchulz did TRUNCATE to "fix" :p [22:09:13] what would you want the 'derives from other blob' for? storing diffs instead of full revisions, or just having them stored on the same or neighbor pages for faster diffs? [22:10:19] well, the delta compression greatly reduced the space...though at the expense of read time, so it was only done on the older, less-used, stuff [22:10:39] though, if "disk space is cheap", we can just compress single objects and not care [22:11:20] i'll have to read more on innodb compression / talk to domas - facebook has been heavily developing it. i think its compression is on a per-page basis [22:11:21] If I recall it was about 20x compression [22:11:47] blobs often won't be stored w/ the primary key and will get their own page though, so it might not be the best fit [22:14:41] it might be though, depending on what the actual avg size of the blobs is as fully compressed [22:15:38] i was talking to one of the couchdb cofounders at the hackathon [22:16:45] it has a big flaw around b+tree compaction that would make it annoying for some applications, but it could be manageable for the es [22:17:21] it natively supports gz compression for its blob equivalent ("attachments") [22:18:38] no compressing of related blobs together though [22:19:55] how much space are we using on the recompressed es stores now? [22:20:52] i don't think anything recent has been recompressed [22:21:42] right, since it would hurt performance [22:21:54] if you have to fetch 1000 blobs to read one ;) [22:22:19] I'm curious how much we can just suck up using more space and migrate to something else [22:22:44] * AaronSchulz wonders how big those binlogs get... [22:23:48] 3-5GB a day [22:25:37] mark: back [22:28:43] hah! another one working late [22:39:24] !log stopping mysql on es4.
all tables marked as having repair fails are in cluster22, resyncing just those from es1002 [22:39:28] Logged the message, Master [22:39:48] crap [22:49:58] !log started an experiment on es1004 - altering all es tables from myisam to innodb one at a time with file_per_table enabled [22:50:02] Logged the message, Master [23:13:03] paravoid: Looks like the ldap server for labs is down... am I the last one to notice that? [23:15:55] New review: Demon; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/8344 [23:17:13] <^demon> andrewbogott: I don't think LDAP is down...otherwise couldn't login to gerrit. [23:17:36] Can you still log in as of two minutes ago? [23:17:50] I haven't tried gerrit yet, but can't sudo in labs instances and can't log into labsconsole. [23:18:21] <^demon> Just tried 2 seconds ago. [23:18:22] <^demon> WFM [23:18:30] New patchset: Asher; "setting myisam-recover to quick mode since es tables should never have deletes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10197 [23:18:52] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10197 [23:18:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10197 [23:18:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10197 [23:19:48] ^demon: labs console, too? [23:20:02] <^demon> Hmm, labsconsole is yelling at me on login. [23:20:05] <^demon> But gerrit works [23:20:42] "No Nova credentials found for your account." ? [23:21:00] <^demon> No, "Incorrect password entered. Please try again." [23:21:09] hm [23:22:07] <^demon> Can't sudo on one of my instances either. [23:22:10] <^demon> Hrm. [23:24:03] andrewbogott: same problems here. [23:24:22] Does anyone have a guess where those credentials are served from? I would've thought formey. [23:25:31] maybe virt0? [23:25:37] <^demon> virt0, iirc. [23:25:39] <^demon> Not formey [23:26:40] Grrrr the default vim settings on all production machines hurt my eyes [23:26:59] <^demon> server = ldaps://virt0.wikimedia.org [23:27:02] Well, nova on virt0 complains that it can't contact ldap. You think that ldap is also running on virt0? [23:27:06] Oh, fair enough :) [23:27:06] <^demon> According to gerrit.config [23:28:12] <^demon> Wonder if gerrit's still working cuz it caches credentials. [23:28:18] <^demon> (Which would be mildly scary, tbh) [23:28:42] I was hoping there would be an 'ldapd' in the service list. [23:32:28] New patchset: Asher; "fix perms" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10200 [23:32:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/10200 [23:32:56] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10200 [23:33:02] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10200 [23:33:51] ^demon, ssmollett: better? [23:34:24] <^demon> Yup yup :) [23:34:51] fixed [23:36:03] Cool. No idea why it went down, though :( [23:36:52] <^demon> Unicorns weren't fed today? [23:42:14] !log re-enabled es4 monitoring. its currently our only es server without any tables marked as crashed / needing recovery, myisam recovery has been absent for all systems since the ms servers were migrated off of in nov 2011. 
(Sum of human knowledge * Rényi entropy = ES) [23:48:55] New patchset: awjrichards; "Make it possible to use wgExtensionAssetsPath for mobile frontend custom logos" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10203 [23:49:02] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10203 [23:51:34] New patchset: awjrichards; "Make it possible to use wgExtensionAssetsPath for mobile frontend custom logos" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10203 [23:51:40] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10203
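A rough sketch of the engine-conversion experiment !logged above for es1004 ("altering all es tables from myisam to innodb one at a time with file_per_table enabled"), assuming it was done with plain ALTER TABLE statements; the table name is borrowed from the crashed-table error quoted earlier and the exact procedure used is not shown in the log:

    -- Assumed form of the per-table MyISAM -> InnoDB conversion.
    -- innodb_file_per_table is a global server option; on older MySQL builds
    -- it may need to be set in my.cnf rather than at runtime.
    SET GLOBAL innodb_file_per_table = 1;

    -- Convert one external-storage blob table at a time, e.g. the cluster22
    -- table that was marked as crashed on es4:
    ALTER TABLE blobs_cluster22 ENGINE = InnoDB;
    -- ...and so on for each remaining blobs table, one table per ALTER.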