[00:03:44] RECOVERY - Host es3 is UP: PING OK - Packet loss = 0%, RTA = 0.66 ms
[00:05:05] !log es3:~# rm -rf /usr/local/mysql*
[00:05:10] Logged the message, Master
[00:09:41] !log pointed es3 to MASTER_LOG_FILE='es1-bin.000788', MASTER_LOG_POS=453509865
[00:09:45] Logged the message, Master
[00:12:53] PROBLEM - MySQL Slave Delay on es3 is CRITICAL: CRIT replication delay 1207 seconds
[00:21:35] New patchset: Asher; "es1 = master" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11013
[00:23:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11013
[00:24:44] !log shutdown mysql on es3. stopped slaving on es1002, rsyncing cluster23 tables to es3
[00:24:48] Logged the message, Master
[00:26:08] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11013
[00:26:10] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11013
[00:27:08] PROBLEM - MySQL Slave Delay on es1003 is CRITICAL: CRIT replication delay 260 seconds
[00:28:20] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 334 seconds
[00:32:50] RECOVERY - MySQL Slave Delay on es1003 is OK: OK replication delay 0 seconds
[00:34:11] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 1 seconds
[00:42:33] New patchset: Asher; "ben's thir^H^H^H^Hsecond prod ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11015
[00:42:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11015
[00:43:19] maplebed: want to review it in gerrit? ^^
[00:43:26] yup.
[00:43:49] nice snark in the comment there.
[00:44:39] it might be snake, but seriously - there are places where not replacing your key after losing it would not be ok
[00:44:44] *snark
[00:45:27] luckily, wiki doesn't care about security :)
[00:48:18] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/11015
[00:48:27] reviewed.
[00:49:34] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11015
[00:49:37] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11015
[00:51:04] hm, I think I just added edns-client-subnet support to pdns' geo backend
[00:51:33] that would solve the "I'm in Europe, use 8.8.8.8 as a DNS and hit eqiad instead of esams" little problem
[00:51:42] binasher: which host should I try? (did you run puppet manually on one of the bastions?)
[00:53:31] puppet is being slow but it will be on bast1001
[00:54:06] oh well, 4am
[00:54:09] k.
[00:54:11] time to sleep
[00:54:21] srsly, paravoid I think you're not human.
[00:54:28] ?
[00:54:29] you never sleep!
[00:54:34] I sleep a lot
[00:54:57] hmmm....
[00:55:10] I'll believe it when I see it.
[00:55:12] ;)
[00:55:26] nope, i added the key incorrectly
[00:55:29] have you considered that our sleep times might overlap? :-)
[00:55:30] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:55:56] binasher: a conflict with the existing ensure absent key?
[00:56:13] just a typo - Parameter type failed: Invalid value "ssh-dsa"
[00:56:54] grumble. sorry I didn't catch that.
[00:57:03] geobackend.cc | 16 ++++++++++++++--
[00:57:03] 1 file changed, 14 insertions(+), 2 deletions(-)
[00:57:15] New patchset: Asher; "key type typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11016
[00:57:19] * paravoid loves when things are so easy
[00:57:30] loves it even
[00:57:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11016
[00:57:49] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/11016
[00:58:16] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11016
[00:58:18] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11016
[01:00:40] notice: /Stage[main]/Accounts::Ben/Ssh_authorized_key[ben@JDoe-LinuxBookAir-3.local]/ensure: created
[01:01:14] success. thanks binasher
[01:12:57] hrm, slight problem with my change
[01:13:21] would DoS our NS kind of change, dammit
[01:15:25] :(
[01:22:54] !log removing one slave from each db shard to upgrade/restart
[01:23:00] Logged the message, notpeter
[01:35:48] !log passes the dba mantel to notpeter
[01:35:53] Logged the message, Master
[01:41:30] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 259 seconds
[01:43:54] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 375 seconds
[01:46:00] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds
[01:46:36] PROBLEM - mysqld processes on db54 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[01:46:36] PROBLEM - Host db32 is DOWN: PING CRITICAL - Packet loss = 100%
[01:48:15] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[01:50:03] RECOVERY - Host db32 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[01:51:24] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 24 seconds
[02:40:18] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[04:14:29] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[05:58:53] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[05:59:11] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 1 seconds
[06:23:42] !Log db1047 looks like the aft_article_filter_count is missing a few rows compared to the master (after replication caught up), presumably this is a side effect of the repair, have pinged binasher for help, leaving everything running and hope it's tolerable error for a day
[06:23:59] grrr
[06:24:06] !log db1047 looks like the aft_article_filter_count is missing a few rows compared to the master (after replication caught up), presumably this is a side effect of the repair, have pinged binasher for help, leaving everything running and hope it's tolerable error for a day
[06:24:12] Logged the message, Master
[06:24:24] you really need to learn to read capital letters
[07:25:35] New review: Dzahn; "in retrorespect: please also leave a line for the existing / non-1000 virt servers in here, mapping ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/10951
[07:48:24] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[07:48:24] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:48:24] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[08:19:17] New review: Dereckson; "I confirm the Extension:Collection doesn't have any ZIM reference in the code." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10855
[08:42:27] PROBLEM - Puppet freshness on srv232 is CRITICAL: Puppet has not run in the last 10 hours
[08:52:22] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[09:21:21] New review: Hashar; "Mail sent to operations mailing list to get some feedback about this change." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/7773
[09:28:21] New review: Hashar; "I have asked Ryan to take a look at this old change :-]" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/4145
[09:35:16] New review: Hashar; "I have asked Tim for review and send him an email to get his opinion on that limit." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/9130
[09:56:34] New patchset: Hashar; "bug 37391 - Install Translate extension on be.wikimedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10593
[09:56:40] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/10593
[09:57:15] New review: Hashar; "Lets merge it!" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10593
[09:57:18] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10593
[10:00:54] New patchset: Mark Bergsma; "Add RFC 4760 to the supported features list" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11020
[10:00:55] New patchset: Mark Bergsma; "During IPPrefix constructors, address family is not set yet" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11021
[10:00:55] New patchset: Mark Bergsma; "Set self.addressfamily when not None" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11022
[10:00:56] New patchset: Mark Bergsma; "Fix some newly introduced bugs" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11023
[10:00:57] New patchset: Mark Bergsma; "Add rudimentary support for BGP capability advertisements" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11024
[10:00:58] New patchset: Mark Bergsma; "Simplify BGP MP attribute encoding through code reuse" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11025
[10:00:58] New patchset: Mark Bergsma; "Fix several bugs in IPPrefix and Attribute handling" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11026
[10:00:59] New patchset: Mark Bergsma; "Make missing attribute checking optional, as it breaks with multi protocol" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11027
[10:00:59] New patchset: Mark Bergsma; "Fix IPv6 address __str__ method" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11028
[10:01:00] New patchset: Mark Bergsma; "Increase code reuse in MP attribute classes, add NotificationSent __str__ method" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11029
[10:02:27] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10859
[10:02:37] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10859
[10:02:39] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/10859
[10:03:12] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10860
[10:03:14] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/10860
[10:03:39] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10861
[10:03:41] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/10861
[10:04:12] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10862
[10:04:15] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/10862
[10:04:57] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10865
[10:04:59] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/10865
[10:05:23] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11020
[10:05:25] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11020
[10:05:47] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11021
[10:05:49] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11021
[10:06:11] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11022
[10:06:13] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11022
[10:06:44] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11023
[10:06:46] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11023
[10:07:34] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11024
[10:07:36] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11024
[10:08:08] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11025
[10:08:14] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11025
[10:08:58] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11026
[10:08:59] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11026
[10:09:29] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11027
[10:09:30] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11027
[10:09:54] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11028
[10:09:56] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11028
[10:10:40] New review: Mark Bergsma; "(no comment)" [operations/debs/pybal] (mp-bgp); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11029
[10:10:42] Change merged: Mark Bergsma; [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11029
[10:33:35] PROBLEM - Host db54 is DOWN: PING CRITICAL - Packet loss = 100%
[10:36:35] RECOVERY - Host db54 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[10:41:32] RECOVERY - mysqld processes on db54 is OK: PROCS OK: 1 process with command name mysqld
[10:43:47] PROBLEM - MySQL Slave Delay on db54 is CRITICAL: CRIT replication delay 31738 seconds
[10:43:56] PROBLEM - MySQL Replication Heartbeat on db54 is CRITICAL: CRIT replication delay 31695 seconds
[10:49:49] New review: Hashar; "Deployed and bug 37391 marked resolved." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10593
[10:56:42] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:01:57] PROBLEM - mysqld processes on db22 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[11:03:09] PROBLEM - mysqld processes on db11 is CRITICAL: Connection refused by host
[11:03:32] all of those DBs are me, btw. and it's fine. they're out of db.php right now for kern upgrades
[11:03:45] PROBLEM - Host db44 is DOWN: PING CRITICAL - Packet loss = 100%
[11:03:54] PROBLEM - Host db47 is DOWN: PING CRITICAL - Packet loss = 100%
[11:04:23] going to cycle mw1042 / mw1071 (down since ~ 3d) unless they are special cases
[11:04:30] PROBLEM - mysqld processes on db26 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[11:04:46] mutante: I believe it is still the case that none of those are in use yet
[11:04:59] so should be fine
[11:05:11] although, if they have hardware issues, that'd be good to note
[11:05:12] ok
[11:05:15] PROBLEM - Host db22 is DOWN: PING CRITICAL - Packet loss = 100%
[11:05:15] PROBLEM - Host db11 is DOWN: PING CRITICAL - Packet loss = 100%
[11:06:09] RECOVERY - Host db47 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[11:06:45] RECOVERY - Host db44 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[11:06:45] RECOVERY - Host db22 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[11:07:21] RECOVERY - Host db11 is UP: PING OK - Packet loss = 0%, RTA = 0.17 ms
[11:07:57] RECOVERY - mysqld processes on db22 is OK: PROCS OK: 1 process with command name mysqld
[11:08:33] !log powercycled mw1042 to check for hardware issues and fscked. appears to be just unused (though down since ~3d like mw1071 per nagios)
[11:08:37] Logged the message, Master
[11:09:18] PROBLEM - mysqld processes on db47 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[11:10:30] RECOVERY - mysqld processes on db11 is OK: PROCS OK: 1 process with command name mysqld
[11:10:39] RECOVERY - Host mw1042 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms
[11:12:27] RECOVERY - mysqld processes on db47 is OK: PROCS OK: 1 process with command name mysqld
[11:14:24] PROBLEM - MySQL Replication Heartbeat on db11 is CRITICAL: CRIT replication delay 931 seconds
[11:14:24] PROBLEM - MySQL Slave Delay on db11 is CRITICAL: CRIT replication delay 931 seconds
[11:14:24] PROBLEM - MySQL Replication Heartbeat on db47 is CRITICAL: CRIT replication delay 691 seconds
[11:14:42] PROBLEM - MySQL Slave Delay on db47 is CRITICAL: CRIT replication delay 675 seconds
[11:14:51] PROBLEM - Host db26 is DOWN: PING CRITICAL - Packet loss = 100%
[11:16:21] RECOVERY - Host db26 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms
[11:17:24] ACKNOWLEDGEMENT - SSH on gilman is CRITICAL: Server answer: daniel_zahn grep gilman site.pp. is gilman dead? can we remove this?
[11:17:33] ACKNOWLEDGEMENT - jenkins_service_running on gilman is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. daniel_zahn grep gilman site.pp. is gilman dead? can we remove this?
[11:19:09] ACKNOWLEDGEMENT - Host mw1071 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn most likely not in use yet. please remove flag once it is.
[11:19:39] RECOVERY - mysqld processes on db26 is OK: PROCS OK: 1 process with command name mysqld
[11:19:48] ACKNOWLEDGEMENT - Host mw1102 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn most likely not in use yet. please remove flag once it is.
[11:21:09] ACKNOWLEDGEMENT - Host knsq25 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT #2918 - hardware fail
[11:23:24] ACKNOWLEDGEMENT - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn puppet actually runs - fix firewalling for UDP / snmptraps?
[11:25:57] RECOVERY - MySQL Replication Heartbeat on db47 is OK: OK replication delay 0 seconds
[11:26:15] RECOVERY - MySQL Slave Delay on db47 is OK: OK replication delay 4 seconds
[11:28:14] !log powercycling downed srv232 (also cause for check_all_memcached crit)
[11:28:19] Logged the message, Master
[11:28:48] RECOVERY - MySQL Slave Delay on db11 is OK: OK replication delay 0 seconds
[11:28:57] RECOVERY - MySQL Replication Heartbeat on db11 is OK: OK replication delay 0 seconds
[11:29:56] MySQL error: 1637: Too many active concurrent transactions (10.0.6.50) :o
[11:30:45] PROBLEM - Host srv232 is DOWN: PING CRITICAL - Packet loss = 100%
[11:31:03] RECOVERY - Memcached on srv232 is OK: TCP OK - 0.002 second response time on port 11000
[11:31:12] RECOVERY - Host srv232 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[11:31:38] dang it
[11:31:39] RECOVERY - SSH on srv232 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[11:31:48] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:32:15] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online
[11:32:42] PROBLEM - Apache HTTP on mw10 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:32:51] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:33:09] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.312 second response time
[11:33:17] whats going on with the mws now
[11:33:54] 10.0.6.50 = db40
[11:34:03] RECOVERY - Apache HTTP on mw10 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.537 second response time
[11:34:12] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.378 second response time
[11:34:21] better:)
[11:36:02] aude / notpeter: just temp.? / db40 out of db.php?
[11:36:36] mutante: works now
[11:36:41] cool
[11:36:46] just a glitch
[11:37:00] * mutante nods
[11:38:00] ok
[11:38:42] RECOVERY - Puppet freshness on srv232 is OK: puppet ran at Tue Jun 12 11:38:37 UTC 2012
[11:39:45] PROBLEM - MySQL Slave Delay on db26 is CRITICAL: CRIT replication delay 2093 seconds
[11:40:13] mutante: db40 is the parser cache
[11:40:38] the page rendered except the article cache
[11:40:45] article text, errr
[11:40:48] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[11:40:51] ah, well, that's potentially awful
[11:40:57] PROBLEM - MySQL Replication Heartbeat on db26 is CRITICAL: CRIT replication delay 1967 seconds
[11:41:35] yeah, db40 load spiked crazily, but seems to be evening back out
[11:41:39] ACKNOWLEDGEMENT - Host srv266 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn went down repeatedly in the past. hardware RT #2896
[11:41:41] good
[11:41:54] notpeter: ok, gotcha
[11:42:08] well, hrm, maybe
[11:42:57] New patchset: Hashar; "wmfHostnames array to easily change hostnames on a cluster basis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11034
[11:43:02] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11034
[11:43:57] nagios-wm: can have a seperator between user and message @ "daniel_zahn went down repeatedly in the past." ? heh
[11:44:12] haha
[11:44:13] s/user/host
[11:44:43] daniel_zahn went high repeatedly in the past
[11:45:01] up it is
[11:45:40] mark: ganglia seems to be showing that the parsecache isn't able to purge fast enough
[11:45:58] what do you know about parsercache, or who that is online knows stuff
[11:46:03] tim and asher are both not online...
[11:46:10] I don't know anything about the parsercache
[11:46:15] damn.
[11:47:11] transactions unpurged seems to have a downward trend at the moment
[11:47:21] so this might recover, but if not... hello, asher!
[11:48:45] is asher the ops DBA ?
[11:48:53] I mean the main DBA ?
[11:49:16] New patchset: Hashar; "vary wgUploadStashScalerBaseUrl based on cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11035
[11:49:21] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11035
[11:49:48] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[11:49:48] RECOVERY - MySQL Replication Heartbeat on db26 is OK: OK replication delay 0 seconds
[11:49:48] RECOVERY - MySQL Slave Delay on db26 is OK: OK replication delay 0 seconds
[11:52:29] RECOVERY - MySQL Replication Heartbeat on db54 is OK: OK replication delay 0 seconds
[11:52:56] RECOVERY - MySQL Slave Delay on db54 is OK: OK replication delay 0 seconds
[11:53:22] hashar: yeah, I'd say so
[11:53:34] db40 seems to be calming down
[11:54:17] hashar: more specifically, if there were a parsercache problem, I'd say that he and tim are the two people well qualified to fix it. perhaps doma_s as well
[11:59:02] New patchset: Hashar; "move mobile related conf to their own files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11036
[11:59:03] New patchset: Hashar; "cleanup whitespace in mobile-pmtpa.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11037
[11:59:08] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11036
[11:59:10] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11037
[11:59:36] notpeter: hoo the parser cache that is a crazy part of MediaWiki
[11:59:46] I am pretty sure only a handful of people actually understand it
[11:59:50] (and I am not one of them)
[12:00:35] yeah, nor am I. but I know that when db40 becomes angry, the site goes down very quickly :)
[12:00:45] hehe
[12:26:20] mark: here?
[12:26:27] yes
[12:26:42] so
[12:26:59] I was bored late last night and was playing a bit with powerdns
[12:27:09] I managed to add edns-client-subnet to the geobackend
[12:27:15] (which I found out that you wrote!)
[12:27:20] I did
[12:27:21] nice :)
[12:27:31] although I'm kind of planning to rewrite geobackend with something better
[12:27:33] it's a bit aged ;)
[12:27:54] that means that we'll get clients using 8.8.8.8 as their NS to the correct geolocated site
[12:28:01] yeah :)
[12:28:05] very nice
[12:29:02] put the commits in gerrit and I'll happily review
[12:29:09] I got twisted bgp multiprotocol support working btw
[12:29:11] it's for pdns 3.1 ;-)
[12:29:21] didn't even check for 2.x, didn't see the point
[12:29:25] yeah
[12:29:35] btw, they've put the edns fields into the pipe's backend proto v3
[12:29:38] I dunno what SCM they're using atm
[12:29:47] (we're using v2)
[12:29:49] yep
[12:30:01] so we can even do that in pipe if you feel like dumping the geobackend
[12:30:09] we might
[12:30:15] could just write something around maxmind
[12:30:17] geo needs a lot of changes to support ipv6 as I saw it
[12:30:23] yeah
[12:30:29] oh, and another problem
[12:30:29] and something which does straight A records instead of CNAMES
[12:30:34] and AAAA
[12:30:52] the edns spec specifies that you get a request that has an address and a netmask
[12:31:02] so, e.g. google sends no less that /24 for privacy reasons
[12:31:17] then on your replies, you reply with a so-called scope netmask
[12:31:35] so you have to get that from
[12:31:38] what's it called again
[12:31:40] ipreftre ;)
[12:31:55] as in, you might reply for the whole /16 or for a more specific block
[12:31:59] yeah
[12:32:01] ippreftree
[12:32:09] yes
[12:32:14] basically python radix ;)
[12:32:15] I haven't done that yet
[12:32:17] and maxmind too
[12:32:30] I just reply with the same netmask as the request
[12:32:42] sounds reasonable
[12:32:47] I had a look at ipt, but it would have been more complicated
[12:32:55] yeah
[12:32:59] it's implemented in a recursive function rather than an iteration
[12:33:07] well I don't know what powerdns wants, but I basically just want to make a geobackend v2
[12:33:10] which is quite different
[12:33:17] yes
[12:33:23] although I don't recall the specifics
[12:33:27] fuck, it's like 10 years ago now
[12:33:28] so it's difficult to keep the real mask on the side
[12:33:50] plus, I was hoping to find an external library that does prefix trees for both v4/v6 in the meantime :)
[12:33:55] yup
[12:34:08] you see why I'd rather start over ;)
[12:34:12] heh :)
[12:37:16] anyway
[12:37:24] so the real questions now arew
[12:37:38] what's the plan for ns[0-4]? :)
[12:37:51] 4?
[12:38:01] 3, sorry
[12:38:05] ns0-2 ;)
[12:38:08] 2??
[12:38:12] just 2? really?
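The edns-client-subnet exchange earlier in this log (12:30 to 12:34) turns on two numbers: the source netmask the resolver sends (Google sends no shorter than a /24) and the scope netmask the authoritative server puts in its reply. The following is a minimal sketch under stated assumptions, not the geobackend.cc patch discussed here: it uses Python's `ipaddress` module to do a longest-prefix match of the client subnet against a made-up prefix-to-site table and reports the matched table prefix as the scope, which is the "reply for the whole /16 or for a more specific block" option. It then formats the answer in the style of the PowerDNS pipe backend ABI version 3 (`Q` request with a trailing edns-subnet field, `DATA scopebits auth ...` reply); that field layout is an assumption to check against the PowerDNS documentation for the version in use. `SITE_TABLE`, `SITE_TARGETS`, `lookup`, `answer_query` and `serve` are hypothetical names.

```python
import ipaddress
import sys

# Hypothetical prefix -> site table (private/documentation ranges, made-up mapping).
SITE_TABLE = {
    ipaddress.ip_network("10.20.0.0/16"): "eqiad",
    ipaddress.ip_network("10.20.30.0/24"): "esams",
    ipaddress.ip_network("2001:db8::/32"): "eqiad",
}
DEFAULT_SITE = "eqiad"

# Hypothetical CNAME target for each site.
SITE_TARGETS = {"eqiad": "text.eqiad.example.org", "esams": "text.esams.example.org"}


def lookup(client_subnet, table=SITE_TABLE, default=DEFAULT_SITE):
    """Longest-prefix match of a client subnet such as '10.20.30.0/24'.

    Returns (site, scope_prefix_len). The scope is the length of the table
    entry that decided the answer, so one reply can cover a whole /16; with
    no matching entry the answer applies to everyone (scope 0).
    """
    net = ipaddress.ip_network(client_subnet, strict=False)
    best = None
    for prefix, site in table.items():
        if prefix.version != net.version:
            continue  # subnet_of() raises TypeError across v4/v6
        # Only entries containing the whole disclosed subnet can decide the answer.
        if net.subnet_of(prefix) and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, site)
    if best is None:
        return default, 0
    return best[1], best[0].prefixlen


def answer_query(line, ttl=300):
    """Answer one pipe-backend 'Q' line, assuming the ABI 3 field layout:

    request: Q qname qclass qtype id remote-ip local-ip edns-subnet
    reply:   DATA scopebits auth qname qclass qtype ttl id content
    The reply lines always end with END. The TTL of 300 is arbitrary.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8 or fields[0] != "Q":
        return ["FAIL"]
    qname, qclass, qtype, qid, remote, subnet = (
        fields[1], fields[2], fields[3], fields[4], fields[5], fields[7])
    if qtype not in ("ANY", "CNAME"):
        return ["END"]  # this sketch only serves CNAMEs
    # Assumption: an empty subnet field means no ECS data; use the resolver address.
    site, scope = lookup(subnet or remote)
    data = ["DATA", str(scope), "1", qname, qclass, "CNAME", str(ttl), qid, SITE_TARGETS[site]]
    return ["\t".join(data), "END"]


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Minimal request loop: ABI 3 handshake, then one answer per Q line."""
    for line in stdin:
        if line.startswith("HELO"):
            # Refuse other ABI versions, since answer_query assumes the v3 layout.
            ok = line.rstrip("\n") == "HELO\t3"
            stdout.write("OK\tgeo ECS sketch\n" if ok else "FAIL\n")
        elif line.startswith("Q\t"):
            stdout.write("\n".join(answer_query(line)) + "\n")
        else:
            stdout.write("END\n")  # AXFR and other commands: nothing to send
        stdout.flush()
```

Replying with the same netmask as the request, as described at 12:32:30, amounts to returning the source prefix length instead of the matched one. That is always safe but stops a resolver from caching one answer for a whole /16, which is why a prefix tree (the ippreftree / python radix mentioned above) matters; the linear scan here is only for brevity.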
[12:38:16] 3
[12:38:20] learn to count dude ;p
[12:38:29] yeah one in each DC
[12:38:31] haha
[12:38:59] we need something newer than hardy preferrably :)
[12:39:26] basically the plan is to redo the dns deployment system with something git, start using a newer geobackend (possibly pipe based, possibly native C++)
[12:39:33] put in something more automated for ipv6 perhaps
[12:39:44] and definitely everything on precise
[12:40:15] even newer than that I'd say for pdns
[12:40:28] yes, pdns3 based anyway
[12:40:29] precise has 3.0, there's 3.1 available in quantal+
[12:40:47] and 3.1 is supposed to fix a lot of .0 bugs
[12:40:52] yup
[12:40:58] habbie has done a lot of good work on it
[12:41:23] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[12:42:13] Habbie: apparently a colleague of mine added edns-client-subnet support to geobackend last night
[12:42:20] er
[12:42:22] hehe
[12:42:31] wrong channel
[12:42:57] hahaha
[12:42:59] i'll let them know so we can sync up plans
[12:46:14] hmm
[12:46:31] i'm thinking about how to redo NaiveBGPPeering
[12:46:34] with ipv6 support
[12:46:43] it's so naive that I kind of hate it
[12:46:47] but doing it well takes a bit of time ;)
[12:47:50] also I need a good way to get the main ipv6 address in python
[12:47:56] twisted doesn't help me yet
[13:01:37] 15:00:50 <@Habbie> mark, oh, cute - of course client-subnet support is a 3 line patch ;)
[13:01:37] 15:01:04 <@Habbie> mark, i also have a patch lying around that allows running multiple geobackends with separate configs, but it eats RAM like crazy
[13:01:37] 15:01:27 <@mark> the one that loaded every ip database separately right
[13:02:39] it's more like 16, but yes, simple
[13:38:03] New review: Tim Starling; "I don't know how many bytes would have to change for it to saturate the network or whether that's ev..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/9130
[13:41:24] !log added awight to fr-tech@wikimedia.org email alias
[13:41:29] Logged the message, Master
[13:45:33] New patchset: Pyoungmeister; "re-adding one db per shard after kernel upgrade" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11039
[13:45:38] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11039
[13:50:39] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 223 seconds
[14:28:45] New patchset: Ottomata; "Installing reportcard.wikimedia.org on stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11042
[14:29:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11042
[14:30:18] New review: Ottomata; "This needs to wait until stat1 is reinstalled with Precise, so that we have the newer nodejs version." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/11042
[14:32:49] so,
[14:32:56] I have installed the new php5 on srv193 aka test.wp.org
[14:32:58] and it seems to work
[14:33:08] shall I go ahead and put it in the repo?
[14:33:25] reminder: puppet has ensure => latest, so that means that it will hit servers immediately
[14:33:41] mark?
[14:34:21] I don't want it to immediately be in effect on the snapshot hosts
[14:34:46] it's just a security update, it shouldn't affect anything
[14:35:00] I would try it on one production server first
[14:35:04] before you roll it everywhere
[14:35:07] build was identical to the old build but with a few patches? same configs etc?
[14:35:11] but other than that...
[14:35:27] mark: other than srv193 you mean?
[14:35:36] apergos: yes
[14:35:52] ok, that is reassuring
[14:36:17] ah this is not the build for precise. duh ok
[14:37:11] no, I have that too
[14:37:17] although I don't see the point of rolling that out
[14:37:27] and it's not the php 5.4 one either
[14:38:11] yeah then I'm fine with it after it's run on a production server (i.e. serving things other than just test.wp)
[14:38:20] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds
[14:39:01] okay
[14:40:53] silly questions then: a) how do I know which of the MWs are getting real traffic b) when I upgrade one of them how do I check if things indeed work? is there a log of some kind?
[14:42:58] paravoid: yes
[14:43:09] yes? :-)
[14:43:12] a) /home/w/conf/pybal/pmtpa/apaches
[14:43:36] b) you dont. check logs, do manual tests
[14:44:05] !log resumed replication on es3, es1002 after cluster23 sync completed
[14:44:10] Logged the message, Master
[14:44:21] thanks a lot
[14:44:54] mw1 has Nagios Apache HTTP and memcached monitoring. other mw100x hosts have just NTP / SSH / puppet. https://nagios.wikimedia.org/nagios/cgi-bin/status.cgi?host=mw1xxx
[14:45:29] !log putting kern-upgraded DBs back into pools
[14:45:33] Logged the message, notpeter
[14:45:40] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11039
[14:45:42] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11039
[14:52:26] !log set innodb_max_dirty_pages_pct = 0 on db40 in prep for shutdown
[14:52:31] Logged the message, Master
[14:53:29] !rt 2919
[14:53:29] http://rt.wikimedia.org/Ticket/Display.html?id=2919
[14:53:45] that should be a DNS change for wikidata.org
[14:54:08] can someone please tell me the status or resolve it? thanks!
[14:54:08] wasn't Ryan involved in that?
[14:54:15] i haven't a clue [14:54:19] it should be a trivial change [14:54:28] ack, there was discussion whether it should be a data.wikimedia.org URL or its own domain [14:54:42] because usually there was some policy or so to use wm subdomains in these cases [14:54:43] or at least as trivial as any other DNS change [14:54:49] and then it stalled [14:54:50] for what? [14:55:21] this is not about where to host a service. this is just a website for publishing info for people looking up the term "wikidata" [14:58:16] hrmm, yea lengthy discussion on 2919. would like another Ryan comment as he asked for more discussion on the URIs [15:00:15] !log rebooting db40 [15:00:20] Logged the message, Master [15:00:45] mutante: so i need to poke ryan ? [15:01:05] jeremyb: eh, yes please b/c quote "could have unforeseen [15:01:06] consequences," [15:01:56] well i guess maybe it's too early there now but for when he comes back... [15:02:00] Ryan_Lane: comments? [15:02:13] 12 14:55:21 < jeremyb> this is not about where to host a service. this is just a website for publishing info for people looking up the term "wikidata" [15:02:30] PROBLEM - Host db40 is DOWN: PING CRITICAL - Packet loss = 100% [15:02:30] unless there's some further context in the ticket that i'm not understanding [15:03:13] (obviously i can't see it) [15:03:39] There is/was a big discussion about actually whether it should be using wikidata.wikimedia.org or similar [15:04:42] Reedy: what should? [15:04:50] the wikidata project [15:05:02] err... what should? [15:05:16] using for what? [15:05:16] I don;t see why we wouldn't do it like any other project (i.e. like commons, meta, etc) = x.wikimedia.org [15:05:27] lol [15:05:30] RECOVERY - Host db40 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms [15:05:34] we have a model, why break it? [15:05:34] right. but this is not for a project. 
this is not for a wiki [15:05:51] I'm only saying what the discussion is [15:05:52] this is for (for now) 1 static HTML page that mostly just links to meta [15:05:54] I didn't say it either way [15:06:10] so do they want wikidata.static.wm.o? :-P [15:06:21] ugh [15:06:25] :-D [15:06:42] it's no emergency but it's been a while and IMO is frankly rediculous ;-( [15:07:18] (at least given what i know now... maybe there's still more context that i don't know about) [15:07:49] I personally think they should use https://secure.wikimedia.org/wikipedia/wikidata/wiki/Main_Page [15:08:01] <3 ;P [15:08:08] sold! [15:08:37] I econd ryan's view [15:08:44] question is who gets to decide [15:08:49] apergos: can you elaborate? [15:08:52] does some manager just arbitrarily say "here's how it is"? [15:08:55] *second [15:08:57] I suppose the point is here, is they just want a landing page [15:09:10] yes exactly reedy [15:09:22] the rest can be bikeshedded later [15:09:43] that too. again this is not about where to host a service ;) [15:10:21] New patchset: Pyoungmeister; "removing another 6 dbs from the pools for kernel upgrades" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11043 [15:10:26] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11043 [15:17:29] jeremyb: i added you as a CC: to that ticket, like Lydia from WMFDE, so now if you ask somebody to "reply" to the ticket you should get mail (which might not happen if people just comment) [15:17:50] k, thanks [15:19:21] jeremyb: you can add comments via direct mail to 2919@rt [15:19:37] ACK [15:31:34] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11043 [15:31:36] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11043 [15:31:48] !log doing another round of DB kernel upgrades [15:31:52] Logged the 
message, notpeter [15:39:37] !log migrating enwiki.aft_article_filter_count to innodb [15:39:42] Logged the message, Master [15:41:50] !log migrating enwiki.moodbar_feedback to innodb [15:41:54] Logged the message, Master [15:42:34] andrewbogott: you about? [15:42:52] was wondering how you were logging into virt1002 since the defaults dont seem to work... [15:43:42] RobH: Yep, I'm here. [15:43:52] Logging in to mgmt? Or to the OS? [15:43:55] mgmt [15:44:00] New review: Hashar; "Thanks for the input Tim. Sam, mine testing that out starting with a fork limit of 5, check CPU, ra..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/9130 [15:44:10] I think it worked fine for me, but I'll double-check. [15:44:18] You're remembering to do admin@ rather than root@ ? [15:44:33] tried both [15:45:03] !log migrating enwiki.bv2009_edits (?) to innodb [15:45:08] Logged the message, Master [15:45:33] PROBLEM - mysqld processes on db33 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:46:00] PROBLEM - mysqld processes on db55 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:46:06] andrewb@bast1001:~$ ssh admin@virt1002.mgmt.eqiad.wmnet [15:46:06] admin@virt1002.mgmt.eqiad.wmnet's password: [15:46:06] ucs-c250-m1# [15:46:09] Works for me :( [15:46:45] PROBLEM - mysqld processes on db56 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:47:09] bleh, some bad cert stuff, had to blow out firefox [15:47:21] PROBLEM - mysqld processes on db34 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:47:39] PROBLEM - mysqld processes on db50 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:48:51] PROBLEM - mysqld processes on db36 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [15:49:00] Logged the message, Master [15:52:58] andrewbogott: so i am poking at virt1001 and virt1002 to compare them [15:53:08] i have a feeling im going to have to reboot into 
raid post [15:53:57] PROBLEM - Host db50 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:12] I don't know what that means :) but, I continue to have no investment in the current state of either machine, so reboot away. [15:54:15] PROBLEM - Host db33 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:15] PROBLEM - Host db36 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:15] PROBLEM - Host db34 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:15] PROBLEM - Host db55 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:15] PROBLEM - Host db56 is DOWN: PING CRITICAL - Packet loss = 100% [15:55:00] RECOVERY - Host db50 is UP: PING OK - Packet loss = 0%, RTA = 1.19 ms [15:55:09] RECOVERY - Host db56 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [15:55:17] !log virt1001 and virt1002 rebooting, disregard [15:55:22] Logged the message, RobH [15:55:36] RECOVERY - Host db55 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [15:55:54] RECOVERY - Host db36 is UP: PING OK - Packet loss = 0%, RTA = 1.34 ms [15:56:21] RECOVERY - Host db34 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [15:56:57] RECOVERY - Host db33 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [15:57:42] RECOVERY - mysqld processes on db34 is OK: PROCS OK: 1 process with command name mysqld [15:58:00] PROBLEM - Host virt1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:58:09] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [15:58:27] RECOVERY - mysqld processes on db56 is OK: PROCS OK: 1 process with command name mysqld [16:00:42] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 668 seconds [16:01:13] PROBLEM - MySQL Slave Delay on db50 is CRITICAL: CRIT replication delay 1006 seconds [16:01:13] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 998 seconds [16:01:49] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 467 seconds [16:01:49] PROBLEM - MySQL Replication Heartbeat on db56 
is CRITICAL: CRIT replication delay 234 seconds [16:02:11] andrewbogott: i hate these systems. [16:02:16] PROBLEM - MySQL Replication Heartbeat on db34 is CRITICAL: CRIT replication delay 1139 seconds [16:02:16] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 969 seconds [16:02:27] comparing the sas adapter properties, virt1001 and virt1002 are identical. [16:02:34] PROBLEM - MySQL Slave Delay on db34 is CRITICAL: CRIT replication delay 1137 seconds [16:02:34] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 978 seconds [16:03:01] PROBLEM - Lucene disk space on search32 is CRITICAL: Connection refused by host [16:03:19] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 0 seconds [16:03:35] mutante, paravoid, jeremyb: we had this discussion at the hackathon, and they decided upon the URI scheme. [16:03:44] RobH: Yeah, that was my experience too; the mgmt interface didn't show any differences, it was only when I started to configure that they diverged. [16:03:53] I'm not sure if they updated that or not on the documentation page [16:03:54] Ryan_Lane: they == wikidata ppl? [16:03:54] Ryan_Lane: update RT please? :) [16:03:57] not me [16:04:02] they need to [16:04:10] well, it's a bug in our queue [16:04:13] add a comment perhaps? [16:04:16] and they submitted it [16:04:19] * Ryan_Lane shrugs [16:04:22] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 6 seconds [16:04:26] what's the number again? 
[16:04:31] 2919 [16:04:33] I'll update it asking for them to update it [16:04:36] which seems silly to me [16:04:39] I had a vague recollection that you were involved [16:04:40] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [16:04:45] it's better to have such stuff in the tracker [16:04:47] if you "reply" you will mail Lydia and JeremyB with it [16:04:56] (i think not if you merely comment) [16:05:20] yeah [16:05:22] even if you add it as a comment (i.e. just for us) saying "We talked about that during the hackathon; we're waiting for them" or something [16:05:28] I updated it [16:05:30] Ryan_Lane: can we break up project uri scheme and project landing page location for what people get when they google "wikidata" into separate tasks? [16:06:19] Ryan_Lane: i.e. (at least initially) just a single static page that mostly just links to meta [16:07:04] PROBLEM - Lucene on search32 is CRITICAL: Connection refused [16:11:52] jeremyb: thats not really my decision [16:12:02] k [16:12:28] RECOVERY - MySQL Replication Heartbeat on db34 is OK: OK replication delay 0 seconds [16:12:37] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds [16:14:43] RECOVERY - MySQL Replication Heartbeat on db50 is OK: OK replication delay 0 seconds [16:15:19] RECOVERY - MySQL Slave Delay on db50 is OK: OK replication delay 0 seconds [16:18:01] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds [16:18:19] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds [16:20:16] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [16:21:19] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms [16:21:28] RECOVERY - Lucene on search32 is OK: TCP OK - 0.001 second response time on port 8123 [16:21:55] RECOVERY - Lucene disk space on search32 is OK: DISK OK [16:25:31] New patchset: Pyoungmeister; "re-adding upgraded DBs to pools" [operations/mediawiki-config] (master) 
- https://gerrit.wikimedia.org/r/11046 [16:25:37] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11046 [16:26:07] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11046 [16:26:09] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11046 [16:30:24] New patchset: Pyoungmeister; "removing db25 from pool for kern upgarde" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11047 [16:30:29] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11047 [16:36:22] New review: MaxSem; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11037 [16:43:39] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11047 [16:43:41] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11047 [16:54:19] PROBLEM - mysqld processes on db25 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [16:55:17] New review: MaxSem; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/7773 [17:00:19] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [17:00:46] PROBLEM - Host db25 is DOWN: PING CRITICAL - Packet loss = 100% [17:01:04] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [17:03:01] RECOVERY - Host db25 is UP: PING OK - Packet loss = 0%, RTA = 0.96 ms [17:04:04] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [17:08:58] PROBLEM - MySQL Replication Heartbeat on db25 is CRITICAL: CRIT replication delay 1175 seconds [17:09:16] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response 
time [17:09:25] PROBLEM - MySQL Slave Delay on db25 is CRITICAL: CRIT replication delay 1189 seconds [17:10:47] New patchset: Mark Bergsma; "Fix use of some undefined variables" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11051 [17:10:48] New patchset: Mark Bergsma; "Replace type() checking in AttributeSet by instanceof()" [operations/debs/pybal] (mp-bgp) - https://gerrit.wikimedia.org/r/11052 [17:13:05] cmjohnson1_: any news on search32? [17:13:07] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11034 [17:13:30] cool [17:13:53] then I shall not worry about it for a while :) [17:15:37] I can shut it down. is it not in use? [17:18:06] hah, ok, I'll try to get some kinda confermation [17:20:53] drdee: quick question. some documentation says that you're the man to talk to re: owa3. can it be turned off for repair? [17:21:09] owa3? [17:21:18] yep [17:21:23] open web analytics? [17:21:28] where does it say this? [17:21:35] somehwere in RT [17:21:37] might be wrong [17:21:45] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9237 [17:21:49] I'm just trying to track down someone who knows what they're being used for :) [17:22:16] which RT ticket? [17:22:31] sure, trying to help out [17:22:34] 2511 [17:23:21] what are the specs of these machines? [17:23:59] no clue... [17:28:24] Ryan_Lane: mutante: re 2919, fyi, i would expect no further news on it until tomorrow. lydia didn't know what had been discussed/decided at the hackathon and said she'd ask them about it in a meeting tomorrow. (their day is essentially over and yours is still young!) [17:29:17] well, mine is not [17:29:20] I'm still in berlin [17:29:32] ohh. nvm! 
[17:30:15] btw, srv278 was acting up again [17:30:15] there's a long-standing RT ticket [17:31:17] srv278 sounds like a popular subject of nagios-wm's spam iirc [17:32:35] notpeter: i would love to use the owa3 machines once we fixed the dimm errors [17:36:27] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9136 [17:36:43] RECOVERY - MySQL Slave Delay on db25 is OK: OK replication delay 0 seconds [17:36:43] RECOVERY - MySQL Replication Heartbeat on db25 is OK: OK replication delay 1 seconds [17:37:37] RECOVERY - Host virt1001 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms [17:38:53] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/9131 [17:42:30] notpeter: so yes you can turn off owa3 for repair [17:45:07] Reedy: https://rt.wikimedia.org/Ticket/Display.html?id=2619 does that have any status update? [17:45:24] no [17:45:26] notpeter: I used owa3 most recently. [17:45:45] it's currently part of a hardware swift test cluster [17:46:01] (at least it's supposed to be; you could check puppet) [17:46:53] though all that detail aside, I also think it's ok to take it down for maintenance. :) [17:49:46] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [17:49:46] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [17:49:46] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [17:53:55] New review: Aaron Schulz; "Can we be more consistent with the names. The current pattern is ".php" and "-wmflabs.ph..." 
[operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/11036 [17:54:32] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11035 [18:02:32] !log halting owa3 for repairs [18:02:37] Logged the message, notpeter [18:03:00] cmjohnson1_: ^ [18:05:22] PROBLEM - Host owa3 is DOWN: PING CRITICAL - Packet loss = 100% [18:06:27] New patchset: Jgreen; "moved fundraising udplog collection to its own script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11059 [18:06:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11059 [18:07:25] New patchset: Pyoungmeister; "repooling db25" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11060 [18:07:31] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11060 [18:08:20] New review: Pyoungmeister; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11060 [18:08:22] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11060 [18:09:10] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11059 [18:09:12] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11059 [18:13:37] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9132 [18:25:15] New patchset: Jgreen; "adding cron job to fetch fundraising udplogs for archiving" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11064 [18:25:39] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11064 [18:25:44] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11064 [18:25:47] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11064 [18:32:22] mark: can you copy data from stat1:/a to some place safe? [18:34:20] New patchset: Dereckson; "(bug 37482) Adding Proofread Page ext. namespaces on nl.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11067 [18:34:26] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11067 [18:35:41] maplebed: do you have a sec to come over? [18:35:55] about to head out to lunch. [18:39:20] robh; do you have two decommissioned servers available for the analytics team for 24 hours [18:39:39] so we can move some data temporarily to those machines while we are reinstalling stat1 with precise [18:39:55] maybe owa1 and owa2 or some other boxes? 
[18:44:13] New patchset: Jgreen; "fixed cron minute syntax" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11072 [18:44:35] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11072 [18:44:35] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11072 [18:52:53] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [19:02:35] New review: Alex Monk; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10855 [19:19:13] New patchset: Jgreen; "adjusting cron user for hume's impression_log_rotator" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11079 [19:19:37] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11079 [19:19:37] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11079 [19:33:17] robh are you around? [19:33:27] whats up? [19:33:45] do you have two decommissioned servers available for the analytics team for 24 hours [19:33:52] so we can move some data temporarily to those machines while we are reinstalling stat1 with precise [19:33:58] maybe owa1 and owa2 or some other boxes? [19:34:09] if they are decommissioned, its due to them being failed [19:34:16] New patchset: Dereckson; "Changing namespaces on the Kurdish wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11082 [19:34:17] owa1 and owa2 are not decomissioned. [19:34:21] (that im aware of) [19:34:22] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11082 [19:34:27] are they being used at all? 
[19:34:37] thats not the same thing ;] [19:34:40] we need about 400Gb of hard disk space [19:34:44] for 24 hours [19:35:06] does it need to be an independent server or can we simply attached a usb disk? [19:35:17] the latter is quite a bit easier [19:35:39] we have a disk drive toaster and if i recall, some spare disks that we want for another use that is delayed [19:35:47] cmjohnson1_: you have those grosley disks for upgrade right? [19:36:11] if i recall, those are over 1TB disks. [19:36:20] can you hook up a 400Gb usb drive to stat1? then let's do that [19:36:36] if we can attach one in the drive toaster and attach it to stat1 pls ping an affirmative [19:36:40] cmjohnson1_: ^ [19:36:50] drdee: you have sudo on stat1? [19:36:58] no, but ottomata has [19:37:11] ok, if he is comfortable formatting and mounting the disk thats cool [19:37:18] if not he can feel free to ping me to assist [19:37:34] im going to go ahead and create an RT ticket and assign it to cmjohnson1_ [19:37:41] sure he can do that [19:42:09] i wish the nfs solution in tmapa was working =P [19:42:09] would make this easier [19:45:09] hm [19:45:18] RobH, I can probably figure it out [19:45:37] New review: Siebrand; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/11067 [19:45:48] i've actually never worked with a USB device in linux before…but I assume it just shows up in /dev [19:45:54] yep [19:46:05] k, lemme know which one it is and ja, no probs [19:46:20] well, cmjohnson1_ is afk so he hasnt started yet [19:46:33] ok [19:46:34] but you are on the ticket, so once he has it done you will get an email [19:46:37] ok cool [19:47:25] New patchset: Dereckson; "Changing namespaces on the Kurdish wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11082 [19:47:31] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11082 [19:47:49] RobH: thx!
[19:49:22] New review: Siebrand; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/11003 [19:51:35] New review: Siebrand; "See https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Volap?k_Wikibooks" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/10755 [19:56:45] New review: Siebrand; "No expected side effects?" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/10707 [19:57:46] New review: Siebrand; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/11082 [20:03:18] New review: MaxSem; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11067 [20:03:22] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11067 [20:04:37] New review: MaxSem; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10084 [20:04:44] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/10084 [20:07:39] New review: MaxSem; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/11082 [20:10:39] RobH: Did you learn anything useful about the cisco drives? [20:10:54] still poking at some screens [20:11:05] but they seem identical to me and its starting to annoy me that i dunno why =P [20:11:14] https://gerrit.wikimedia.org/r/#/c/11082/ [20:11:23] most reviewers added ever. [20:11:25] going to take a break from them shortly before i have the urge to set fire to them [20:12:46] New patchset: Dereckson; "Changing namespaces on the Kurdish wikis." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11082 [20:12:53] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11082 [20:14:37] RobH: Understandable [20:22:32] RECOVERY - Host owa3 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms [20:23:29] !log Fixed ownership of php-1.20wmf{4,5}/cache/l10n , should be l10nupdate:wikidev . The wmf4 copy had wrong ownership causing rebuildLocalisationCache.php to fail for shell users (e.g. from scap) [20:23:33] Logged the message, Mr. Obvious [20:27:14] !log hume has a full disk [20:27:19] Logged the message, Mr. Obvious [20:27:31] cmjohnson1_: yea use the 3tb [20:27:33] its temp anyhow [20:28:05] RobH: you did not get to moving of test2, right? [20:28:10] !log Correction: the /usr/local/apache filesystem is full on hume, the root fs is not [20:28:15] Logged the message, Mr. Obvious [20:28:28] nope, not yet, been too swamped with last minute ashburn tasks as i fly to tampa tomorrow [20:28:55] I forgot about scheduling that [20:29:07] RobH: Hmm, Wikimania hackathon maybe? [20:29:31] that would be great - i could see it in action [20:30:43] no can do hackathon, prior work commitments [20:32:45] woot woot [20:32:53] thanks cmjohnson1_ [20:35:34] RobH, what device? [20:35:37] or [20:35:37] cmjohnson1_ [20:35:38] ? 
[20:36:19] cmjohnson1: wont be able to query that [20:36:22] lemme take a look [20:36:22] New patchset: Dereckson; "(bug 37340) Set default search options on vi.wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11086 [20:36:28] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11086 [20:39:49] hrmm [20:40:23] ottomata: ok, its sdb [20:40:35] ha, ok [20:40:38] just was confirming it [20:40:47] i can see it get applied in syslog [20:41:11] it doesnt appear as 3tb, but its either a smaller disk [20:41:18] or an issue iwth theusb drive chassis detection [20:41:26] but sincey you only need less than what it shows now, should be ok. [20:41:31] ok cool [20:41:34] formatting now [20:45:53] ottomata: for paranoia sake, feel free to do your copy, then we can ask cmjohnson1 to move it to another server [20:45:58] and we can mount it for a test read [20:46:28] New patchset: Hashar; "import CommonSettings from wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9131 [20:46:34] New review: jenkins-bot; "Build Failed " [operations/mediawiki-config] (master); V: -1 C: 0; - https://gerrit.wikimedia.org/r/9131 [20:47:06] New review: Hashar; "Patchset 2 use wmfConfigDir in require() call." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/9131 [20:49:19] RobH, ottomata: let's make sure that we trust the backup for 100% :D [20:49:29] hence my suggestion ;] [20:51:31] New patchset: Hashar; "import CommonSettings from wmflabs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9131 [20:51:37] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9131 [20:52:49] New review: Hashar; "Patchset 3 is a rebase, fixing a trivial conflict." 
[operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/9131 [20:53:48] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9131 [20:54:18] I love platform engineer team [20:54:41] We have two very quick bots: Sam and Aaron ;-D [20:55:09] ok sounds good [20:55:12] it is copying now [20:56:58] New patchset: Hashar; "move mobile related conf to their own files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/11036 [20:57:03] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/11036 [20:57:28] PROBLEM - Puppet freshness on es1004 is CRITICAL: Puppet has not run in the last 10 hours [20:57:42] New review: Hashar; "Patchset2:" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11036 [21:04:37] New review: Hashar; "This change need approval from either Patrick or Arthur (aka mobile engineers). We do not want to m..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11036 [21:10:31] PROBLEM - Apache HTTP on mw15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:12:01] RECOVERY - Apache HTTP on mw15 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [21:12:05] New patchset: Jgreen; "added key for root@silicon-->logmover@{various fr hosts}" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11116 [21:12:28] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11116 [21:12:43] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11116 [21:12:45] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11116 [21:19:26] New patchset: Jgreen; "grr, actually don't want silicon key authd by logmover account" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11117 [21:19:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/11117 [21:20:10] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11117 [21:20:13] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11117 [21:22:12] New patchset: Jgreen; "adding fundraising::backup::archive to storage3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11118 [21:22:34] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11118 [21:22:34] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11118 [21:29:30] New patchset: Jgreen; "added loudon root public key to backupmover@{various hosts}" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11120 [21:29:53] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11120 [21:29:53] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11120 [21:43:59] New patchset: Jgreen; "adjusting fundraising backup scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11121 [21:44:22] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/11121 [21:44:22] Change merged: Jgreen; [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/11121 [21:51:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [22:42:08] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [22:57:46] does anyone here have access to mailman error logs? [22:58:19] maplebed or Reedy maybe? [22:58:31] ops will, I don't [22:58:36] sorry, can't help you at the moment. [22:58:52] ok [22:59:58] maybe you can help troubleshoot without it Reedy, there's an error with trying to view a subscriber list, one of the mailing list threads said this could be because [23:00:01] > This is the same issue. Due to a redirect or something else, probably in [23:00:02] > your web server configuration, you are losing the POST data from all [23:00:07] > your web transactions. [23:00:50] lighttpd/50-mailman.conf has one part of the config [23:01:34] and exim/exim4.listserver_aliases.conf has another part, but I don't think it's that file [23:03:18] wouldn't Casey know about those? [23:03:23] do you know where the POST data could be getting lost? [23:04:01] i suspect he won't [23:04:03] Theo10011, I don't think so, this is the more technical ops side of mailman [23:04:14] k [23:58:25] !log started swift container listing loop to compare purge timing when listings are fresh [23:58:31] Logged the message, Master [23:58:39] AaronSchulz: ^^^^ [23:58:59] hmm [23:59:01] it's running in a screen session on iron, fwiw. [23:59:07] * AaronSchulz debugs an infinite loop [23:59:19] so now we wait 12-24hrs and look at the graphs.