[00:05:53] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [00:11:29] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [00:18:42] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [00:19:26] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [00:51:32] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [01:15:32] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [01:41:02] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 234 seconds [01:41:38] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 234 seconds [01:42:23] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time [01:48:59] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 677s [01:53:02] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 14 seconds [01:53:47] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 9 seconds [01:54:59] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 2s [02:38:20] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [02:50:20] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [03:13:08] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [03:36:45] New review: Jeremyb; "still didn't work, re-followup: I45266eacf4002e" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/14869 [03:36:57] New patchset: Jeremyb; "wikipedie.cz/experti_na_prirodu followup (again)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15697 [03:43:57] New patchset: Jeremyb; "wikipedie.cz/experti_na_prirodu followup (again)" [operations/apache-config] 
(master) - https://gerrit.wikimedia.org/r/15697 [03:48:54] New review: Jeremyb; "I thought I tested this before? anyway..." [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/15697 [04:13:08] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [05:04:16] morning [05:05:53] morning [05:26:11] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [05:27:14] bah, I should look at mw8, but I also *should finish up testing on this code I'm working on... an the latter is much more motivating than the former [05:31:57] !log reboot to upgrade kernel etc on mw8 since it's been flapping anyways [05:32:06] Logged the message, Master [05:41:11] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [05:43:11] you lie, it is back up [05:43:13] silly nagios [05:46:35] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [05:47:11] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [05:47:19] ok that was mw8 rebooting itself. how irritating [06:12:50] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [06:14:40] moin! [06:14:47] paravoid: home yet? [06:16:00] !g 15697 | *, could use a deploy when you have a chance [06:16:00] *, could use a deploy when you have a chance: https://gerrit.wikimedia.org/r/#q,15697,n,z [06:16:09] ;) [06:21:03] * jeremyb sleeps [06:38:29] jeremyb: yep! 
[06:38:56] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [06:45:32] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [06:51:54] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0376452756 (gt 8.0) [06:54:36] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.63721990991 [06:55:30] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [07:00:00] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [07:02:51] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [07:11:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [07:25:30] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.15754380531 (gt 8.0) [07:28:03] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.652939206349 [07:33:09] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [07:43:13] New patchset: Tim Starling; "redirects.conf cleanup" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15698 [07:54:45] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [08:00:18] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [08:12:18] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [08:32:06] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [08:39:36] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [08:44:06] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [09:43:48] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [09:46:48] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [11:35:06] PROBLEM 
- Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:39:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds [12:11:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:20:51] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [12:21:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [12:37:48] PROBLEM - Puppet freshness on srv194 is CRITICAL: Puppet has not run in the last 10 hours [12:49:03] New patchset: Dzahn; "RT-3244,redirect for wikipedie.cz,fix double encoding,make _ or - optional" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15705 [12:49:55] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15705 [12:52:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:56:06] !log sync-apache, graceful-all to fix wikipedia.cz redirect [12:56:15] Logged the message, Master [13:03:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [13:13:21] mutante: Hello! jeremyb made a new change to the wikipedie.cz redirect https://gerrit.wikimedia.org/r/#/c/15697/ [13:13:45] [13:56:23] !log sync-apache, graceful-all to fix wikipedia.cz redirect [13:13:48] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [13:14:06] hashar: hi. already fixed / mid-air collision :) [13:14:14] \O/ [13:14:56] hashar: https://gerrit.wikimedia.org/r/#/c/15705/1/redirects.conf [13:15:22] (-l_) ?? 
[13:15:27] hashar: also makes "_" or "-" optional, somebody pointed out using underscores may not be the best choice when it is used in print [13:15:29] oh it is a pipe [13:16:02] Tim did some huge rewrite of redirects.conf in https://gerrit.wikimedia.org/r/#/c/15698/ [13:16:24] ooh, so much text [13:16:30] he dropped the regex to use RewriteCond . =somematch [OR] RewriteCond . =someothermatch [13:16:31] hehe [13:16:55] I am looking for an Apache conf unit testing [13:17:01] hashar: i liked the deployment though i want to point out. first one without svn [13:17:31] probably easier to use only one SCM [13:17:37] hashar: /h/w/bin/apache-fast-test [13:17:39] make sure you submit back in gerrit :) [13:18:11] i just do gerrit [13:18:22] and Jeff-test: testing 8 urls on 126 servers, totalling 1008 requests [13:18:36] spawning threads..... [13:21:03] New review: Dzahn; "already fixed in this one:" [operations/apache-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/15697 [13:22:19] "Removed RewriteRule directives for some hosts that aren't in the [13:22:28] ServerAlias list" . heh . yea :p [13:22:56] maybe there is too many changes in Tim commit :) [13:23:01] might end up splitting it [13:23:04] hashar: wow, yea, what a cleanup [13:24:03] so [OR] and 2 Conditions is better than | in one condition [13:24:20] ? 
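Note on the [OR] question: a minimal sketch, in Python rather than Apache config, of how one alternation regex can go wrong where two exact comparisons cannot. The host names are hypothetical and Python's `re` stands in for PCRE; only the anchoring and dot behaviour shown here is assumed to carry over.

```python
import re

# Hypothetical host names, for illustration only (not taken from redirects.conf).
ALLOWED = ("shop.example.org", "store.example.org")

# A plausible-looking but buggy alternation: "^" applies only to the first
# branch, "$" only to the second, and each unescaped "." matches any character.
BUGGY = re.compile(r"^shop.example.org|store.example.org$")


def matches_regex(host):
    """Mimic a single regex condition using alternation."""
    return BUGGY.search(host) is not None


def matches_exact(host):
    """Mimic two exact string conditions joined with [OR]."""
    return host == ALLOWED[0] or host == ALLOWED[1]
```

With these definitions, a host such as `shop.example.org.evil.test` passes the regex check but not the exact check, which is the kind of mistake the two-condition form rules out.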
[13:24:29] it surely prevents people from doing regex mistakes [13:24:56] it replace 1 PCRE call by 2 strings comparisons, might be even slightly faster (I have no idea) [13:25:05] Tim point was to prevents people from doing mistakes [13:27:03] ah, and he removed these "RewriteCond %{REQUEST_URI}" [13:27:34] to be replaced with regular RewriteRule [13:34:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:42:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.341 seconds [13:44:40] mw8 php: PHP Fatal error: Allowed memory size of 157286400 bytes exhausted (tried to allocate 284992 bytes) in /usr/local/apache/common-local/php-1.20wmf6/includes/objectcache/MemcachedClient.php on line 932 [13:44:52] apergos: you said you saw mw8 rebooting ... [13:45:50] yes I sure did [13:45:58] you can look at the dmesg.* logs for that [13:46:08] but I had it reboot from under me [13:46:18] (and checked the uptime to see it had in fact rebooted righ then [13:46:30] back in a bit, I have to get juice, I'm out. [13:46:40] oki [13:46:47] poor little laptop, it's working harder than ever (fan going on full) [13:46:50] it's probably faster, but negligible either way [13:47:00] it replace 1 PCRE call by 2 strings comparisons, might be even slightly faster (I have no idea) [13:47:08] ah, thanks Tim [13:47:16] Tim-away: definitely clearer and less error prone :-D [13:48:13] yep, that's the idea [13:48:39] apergos: no wonder,it's probably outside specified operating temperature with the heat over there [13:49:01] system-wide profiling on mw1 shows that apache regexes are less than 0.01% of CPU time [13:50:15] !log mw8 PHP fatal errors, running out of memory [13:50:24] Logged the message, Master [13:56:17] mutante: I pulled the system event log on mw8 and there are DIMM errors...I will need to do a few test to satisfy Dell support but most likely needs new DIMM. 
[13:56:59] cmjohnson1: ah, gotcha, i just happened to see that in syslog because it was reported as rebooting [13:58:41] !log mw8 - alright, most likely just needs new DIMM per cmjohnson [13:58:49] Logged the message, Master [14:00:47] grr [14:00:56] I always end up writing parsers in PHP [14:00:59] ;) [14:02:45] !log Stopped PyBal on lvs6 to failover traffic to lvs2 [14:02:53] Logged the message, Master [14:05:24] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, sessions up: 5, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [14:07:06] New patchset: Mark Bergsma; "Move lvs6 configuration to new-style with IPv6, for after reinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15707 [14:07:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15707 [14:11:23] back [14:12:00] yeah I was going to ask for memory tests first [14:12:06] yay cmjohnson1 who got them done already [14:12:40] !log Reinstalling lvs6 with Ubuntu Precise [14:12:47] Logged the message, Master [14:13:16] apergos: random rebooting is usually the indicator....will need to call Dell to send us new DIMM...but they require a special report. 
[14:13:25] ok [14:13:48] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [14:16:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15707 [14:16:57] PROBLEM - SSH on lvs6 is CRITICAL: Connection refused [14:17:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:20:06] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [14:25:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.546 seconds [14:29:33] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [14:32:33] RECOVERY - BGP status on cr1-sdtpa is OK: OK: host 208.80.152.196, sessions up: 6, down: 0, shutdown: 0 [14:33:02] !log lvs6 is back up and serving traffic [14:33:09] Logged the message, Master [14:35:23] !log adjusted swift rings; set new object servers to 20, new container servers to 100 [14:35:31] Logged the message, Master [14:45:30] !log Stopped PyBal on lvs5 to failover traffic to lvs1 [14:45:37] Logged the message, Master [14:48:11] New patchset: Mark Bergsma; "Move lvs5 configuration to new-style with IPv6, for after reinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15709 [14:48:49] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15709 [14:49:03] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, sessions up: 5, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [14:49:41] !log Reinstalling lvs5 with Ubuntu Precise [14:49:48] Logged the message, Master [14:51:45] PROBLEM - Host lvs5 is DOWN: PING CRITICAL - Packet loss = 100% [14:57:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15709 [15:00:00] RECOVERY - Host lvs5 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [15:00:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:04:16] New patchset: Mark Bergsma; "Use a selector instead of if-statements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15710 [15:04:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15710 [15:10:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds [15:11:51] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.130 second response time [15:14:43] New patchset: Dzahn; "planet - new configs compatible with planet-venus, add index.html, include locals" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15711 [15:15:20] New patchset: Mark Bergsma; "Clean up old style LVS service IP configuration Use a selector instead of if-statements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15710 [15:15:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15711 [15:15:54] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15710 [15:17:42] RECOVERY - BGP status on cr1-sdtpa is OK: OK: host 208.80.152.196, sessions up: 6, down: 0, shutdown: 0 [15:17:45] !log lvs5 is back up and serving traffic [15:17:53] Logged the message, Master [15:18:30] New review: Dzahn; "not yet, these should actually be templates too" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/15711 [15:22:00] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15710 [15:22:30] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [15:24:00] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [15:26:51] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [15:28:50] New patchset: Mark Bergsma; "Enable IPv6 for lvs3, after reinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15712 [15:29:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15712 [15:29:35] !log Reinstalling lvs3 with Ubuntu Precise [15:29:42] Logged the message, Master [15:31:48] PROBLEM - Host lvs3 is DOWN: PING CRITICAL - Packet loss = 100% [15:33:09] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: host 208.80.152.197, sessions up: 6, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [15:34:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15712 [15:37:21] RECOVERY - Host lvs3 is UP: PING OK - Packet loss = 0%, RTA = 1.27 ms [15:37:48] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [15:40:40] mark: oh? [15:40:54] oh what? :) [15:41:03] precise/ipv6 on all lvs? 
[15:41:08] well, just tampa today [15:41:10] and not lvs4 yet [15:41:20] lvs4 is active internal tampa [15:41:32] still :) [15:41:34] it does some different checks which haven't gotten testing yet with the new pybal, the runmonitor [15:41:48] but yes [15:42:01] is that the mbgp pybal? [15:42:08] i found sufficiently severe bugs in the old version that the new one can't possibly be worse ;) [15:42:09] yes [15:42:27] PROBLEM - Host mw1032 is DOWN: PING CRITICAL - Packet loss = 100% [15:42:33] yay [15:42:47] i'll to eqiad/esams later this week [15:42:49] do [15:43:27] Logged the message, Master [15:43:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:44:17] there is a chance I will be late for tonight's meeting. if so it will only be by 10 or 15 minutes, please don't wait though [15:44:21] back in a while [15:44:34] are you ever in time for it? ;) [15:45:16] New patchset: Dzahn; "add missing locale pt_PT.UTF-8 UTF-8 for pt.planet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15713 [15:45:51] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15713 [15:45:59] New review: Dzahn; "fix pt.planet runs" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/15713 [15:46:01] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15713 [15:46:57] mark: we can share the work if you like [15:47:14] can do, but i'd still rather not do it today [15:47:20] sure [15:47:24] this is the first time lvs on precise gets some serious traffic for somet ime [15:47:32] before it just did ipv6 traffic and the occasional failover [15:47:40] yes, make sense to do it gradually [15:47:45] then again, it's gotten quite a bit of traffic during the recent juniper trouble [15:48:00] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [15:48:02] yeah [15:48:02] I missed that [15:48:08] yes [15:48:10] you know what caused it? [15:48:38] "reject" instead of "discard" on the ACLs ;-) [15:48:40] I vaguely recollect coming to irc and reading something about the reject term? [15:48:44] right [15:48:44] yeah [15:49:12] there was a bug somewhere in the nicaragua telcos [15:49:23] and I was getting each SMS about ~30 times [15:49:31] haha ouch [15:49:31] over the course of 3-4 days [15:49:44] can you imagine what happened when that occured? :) [15:49:52] yeah [15:49:56] had a nice holiday otherwise? :) [15:49:59] hehe [15:50:05] yeah, it was great [15:50:16] mithrandir said you were quite enjoying the poolside ;) [15:50:24] hahaha! [15:50:45] when I said "too bad faidon's on holiday, or I would have him merge this debian git repo in with your patches" [15:51:09] well, I'm back ;) [15:51:27] too late, did it myself now [15:51:41] wb ;) [15:51:45] oh, too bad [15:51:56] hrm, question for you [15:52:07] are we going to keep ipv6 traffic on the ipv4-inactive lvs? [15:52:20] or move both afis to the same servers? 
[15:52:20] i am not decided about that yet [15:52:29] right now they don't announce the ipv6 ips for most (squid) services [15:52:38] they do for native (varnish) [15:52:45] in the latter case, it means that we'll get blind monitoring-wise regarding the amount of ipv6 traffic that we get [15:52:47] but once we've ugpraded them all, we can change that [15:52:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds [15:52:54] yeah [15:53:00] i'm considering keeping it as is too [15:53:03] but now with bgp and failover [15:53:16] but i'll wait until the upgrades are complete, less complicated that way [15:54:14] we can always find another way to measure it [15:55:48] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [15:56:35] !log lvs3 is back up, and idling [15:56:42] Logged the message, Master [16:00:05] lvs3 seems to do fine as well [16:09:04] New patchset: Platonides; "Update to the new url." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15715 [16:09:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15715 [16:24:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:28:03] PROBLEM - Host mw1041 is DOWN: PING CRITICAL - Packet loss = 100% [16:33:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.148 seconds [16:57:12] New patchset: Matthias Mullie; "lower AFTv4 odds to display AFTv5 at 5% (inverse odds)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15719 [16:57:21] Change abandoned: Matthias Mullie; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14041 [17:03:02] !log draining cr2-eqiad to cr1-sdtpa link for moving of fiber [17:03:09] Logged the message, Mistress of the network gear. 
[17:03:45] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [17:07:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:08:38] !log reactivating XO transit connections on cr1-sdtpa [17:08:45] Logged the message, Mistress of the network gear. [17:12:45] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [17:16:12] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave]BR [17:17:52] 10Gbps of sexyness [17:18:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds [17:19:10] hehehe [17:19:17] fyi the pulling that is me :) [17:21:18] it's just one wave, good knows how many more colours are there [17:21:50] well, only for us… oh man, i wish we bought dark fiber instead of just waves … that would be so cool :) [17:23:17] Dark fiber always makes me think of un-light fiber rather than shared betwean isps fiber... damn funky terms [17:28:52] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15719 [17:30:38] http://dilbert.com/strips/comic/2002-06-14/ [17:35:13] LeslieCarr: we're not doing it right. 
We need to get these suppliers paying to peer with us ;) [17:35:23] hahaha [17:43:48] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 [17:50:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:48] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [17:59:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds [18:03:24] LeslieCarr: I saw that you merged the LVS in separate ganglia group when I was away [18:03:48] LeslieCarr: but I don't see it in ganglia.wm.org [18:04:35] any ideas? :) [18:04:42] hrm [18:04:56] gmetad.conf looking at that multicast address ? [18:31:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:38:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.455 seconds [18:43:47] New patchset: Alex Monk; "(bug 38424) Replace hywikiquote logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15730 [19:11:33] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 187 seconds [19:12:45] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 249 seconds [19:13:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:24:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds [19:40:03] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 24 seconds [19:40:21] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 14 seconds [19:44:51] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [19:47:42] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 
hours [19:56:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:06:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [20:38:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:47:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [21:21:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:30:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.056 seconds [22:03:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:11:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.576 seconds [22:13:05] Change abandoned: Jeremyb; "done in I5c1b572084caaee" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15697 [22:19:30] Ryan_Lane: Can you fix my LDAP info "Full name" entry to be "Timo Tijhof" instead of "Krinkle"? Looks like more people from svn have that mis-set. [22:19:31] My git commits do have the right settings [22:21:20] you can't set it in mediawiki? [22:21:54] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [22:22:36] Ryan_Lane: What do you mean "in mediawiki" ? [22:22:47] If I go to my Gerrit preferences the field is locked, it comes from LDAP. [22:22:50] ah [22:23:01] (or at least I think that's the reason) [22:23:03] sec [22:23:51] I'm not sure which field that pulls from [22:23:59] your cn is Krinkle [22:24:16] all kinds of shit will break if I change tat [22:24:18] *that [22:24:40] username is lowercase "krinkle", that's fine. But the Full name field shouldn't affect anything, right ? [22:25:10] Hm.. 
[22:26:02] New patchset: Asher; "optionally collect ibdata free space" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15741 [22:26:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15741 [22:29:41] Krinkle, I think it was you git config of user.name [22:30:00] My user.name in git is Timo Tijhof, always has been [22:30:29] hmm.. it is [22:31:16] yes, the Krikles come from gerrit merges [22:31:34] I go with TImo Tijhof, then [22:32:25] So the three places I see "Krinkle" in Gerrit (could all becoming from the same source): https://gerrit.wikimedia.org/r/#/settings/web-identities https://gerrit.wikimedia.org/r/#/settings/ (profile) and whenever I make a comment [22:32:38] Platonides: yes [22:34:16] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15741 [22:37:18] Krinkle: full name matters, yes [22:37:23] it's used for wiki and gerrit log in [22:37:31] and renames don't work in gerrit right now [22:38:05] once that's solved I'll gladly rename you [22:38:08] Ryan_Lane: Used as mediawiki username or mediawiki full name ? [22:38:13] ok [22:38:14] mediawiki username [22:38:19] and gerrit log in name [22:38:19] aye [22:38:36] yeah, but login name isn't an issue. That only affects me logging in :D [22:38:51] PROBLEM - Puppet freshness on srv194 is CRITICAL: Puppet has not run in the last 10 hours [22:38:59] Ryan_Lane: Is there an open bug for this in Gerrit? [22:39:16] yes [22:39:19] somewhere [22:44:00] Logged the message, Master [22:44:08] Logged the message, Master [22:44:16] Logged the message, Master [22:44:23] Logged the message, Master [22:44:31] Logged the message, Master [22:44:39] Logged the message, Master [22:44:48] New patchset: Asher; "Revert "optionally collect ibdata free space"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15743 [22:45:23] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15743 [22:45:29] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15743 [22:45:29] Logged the message, Master [22:45:37] Logged the message, Master [22:45:45] Logged the message, Master [22:45:48] o.O [22:45:52] Logged the message, Master [22:46:00] Logged the message, Master [22:46:08] ok. that's gotta stop [22:46:09] Logged the message, Master [22:46:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:54:15] it's interesting, I see 14 merges by Krinkle and 11 by Timo [22:54:52] some are performed by Gerrit (uses LDAP name + email), and others by myself directly from git command line (which uses my .gitconfig) [22:56:58] you got that changed now? [22:57:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds [23:00:39] Platonides: No, Gerrit doesn't support renames yet and (contrary to mediawiki) full name is actually used in references, not just as freeform field. [23:01:02] what? wtf [23:01:24] it's locked after creation [23:01:29] it pulls from ldap [23:01:40] I understand your permissions might need to be changed, if it's adding you by name [23:01:43] Krinkle: mediawiki renames are bs ;) [23:01:48] (easy since we do it by groups) [23:02:04] but I don't see anything in git where it uses it in a reference [23:02:14] AaronSchulz: Don't mix up full name and user name. User rename in MediaWiki is something else. I'm not talking about changing my ldap user name. Just the full name property. [23:02:16] hmm... maybe as a primary key for user search [23:02:26] well [23:02:27] ... [23:02:28] Platonides: Not in git, git is just text based and hard coded. 
[23:02:31] full name is cn, too [23:02:47] so is user name [23:02:54] Logged the message, Master [23:03:13] Logged the message, Master [23:03:19] this pdf server crap needs to be fixed [23:03:23] add that to the gerrit vs others list :) [23:03:40] Jeff_Green: where are these bots and what are they doing? [23:03:48] bleh [23:03:51] Platonides: afiak its not much gerrit related [23:04:01] I don't understand why bugs that can be fixed are being used as fodder against gerrit [23:04:11] its just that for some reason we use the ldap full name as labsconsole wiki user name (instead of the ldap user ID (e.g. "krinkle" instead of "Krinkle") [23:04:23] especially when other systems don't have any ldap support *at all* [23:04:42] Krinkle: that's for consistency sake [23:04:47] all web login uses the same thing [23:05:00] all shell login uses the same thing [23:05:16] from experience I know this is a good idea [23:05:59] the problem is that gerrit doesn't support user renames. that can and should be fixed. [23:06:00] the shell login uses the full name? [23:06:13] if you use the same user name for web and shell, yes [23:06:22] if you don't, then no [23:06:48] the shell name is restricted to lowercase-only and no special characters other than . _ and - [23:07:32] the web name can use full UTF-8 [23:08:25] I'm open to suggestions, but the current configuration is the most flexible option [23:08:50] and the way the authentication is split in a consistent scheme is the least confusing option [23:13:19] New review: Tim Starling; "I tested redirects.conf with apache itself:" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15720 [23:14:20] New patchset: Cmjohnson; "Adding db63 -db77 to the dhcpd file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15748 [23:14:51] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [23:14:55] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15748 [23:16:07] Ryan_Lane: bots? [23:16:34] Jeff_Green: the pp-pdf1-92390234890324 writing to the logs [23:17:41] even worse, it seems that they connect to the channel, then log, then exit the channel [23:18:02] so, for every server we have 4 lines in IRC for every log [23:18:07] it's super-spammy [23:18:44] ah. no clue [23:18:49] i have not had anything to do with that [23:20:34] who's working with the pediapress people right now? [23:26:23] New patchset: Tim Starling; "redirects.conf cleanup" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15698 [23:27:59] New review: Tim Starling; "PS2: rebase to after Daniel's change I5c1b5720, tested that line." [operations/apache-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/15698 [23:28:46] I want to deploy that apache change shortly, anyone here who cares? [23:28:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:29:09] I tried PMing mutante but I guess he's gone for the day [23:37:38] #if APR_CHARSET_EBCDIC [23:37:38] what = apr_xlate_conv_byte(ap_hdrs_to_ascii, (unsigned char)what); [23:37:38] #endif /*APR_CHARSET_EBCDIC*/ [23:37:58] wow, I've never seen C code written to work in EBCDIC before [23:39:38] New patchset: Asher; "added on innodb_data_free metric to report ibdata free space, if collectable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15749 [23:40:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.015 seconds [23:40:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15749 [23:40:30] are you sure an empty path pais not possible? 
[23:40:35] *part [23:41:02] I think that at least under some conditions you must not place a leading / in a RewriteRule [23:41:20] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15749 [23:44:20] in .htaccess [23:45:23] so they are required in but never present in .htaccess? [23:45:41] the coders of mod_rewrite could have made it consistent... [23:50:04] many shop.* domains lead to the project, not to http://shop.wikimedia.org/ [23:50:25] (on current version)+ [23:51:58] I don't see the point in having shop.mediawiki.org, when they are wikipedia goodies [23:53:00] or shop.quickipedia.net? [23:53:48] well, that's not even ours [23:56:30] Ryan_Lane: afaik nobody [23:56:56] # SHOP redirects [23:56:57] RewriteCond %{HTTP_HOST} ^(shop|store)\.(wik|mediawiki) [23:56:57] RewriteRule ^(.*)$ http://shop.wikimedia.org/ [R=301,L] [23:57:14] looks like they go to shop.wikimedia.org to me [23:57:27] try one [23:57:54] other wildcards may be matched earlier [23:57:58] see http://shop.wikimediafoundation.org/ [23:58:48] yes, there is an earlier wikimediafoundation rule [23:59:12] I tried some others which also redirected to the home project [23:59:19] but the *.mediawiki.org rule is after the shop redirect [23:59:26] that was my test case [23:59:28] shop.wikiquote.org [23:59:35] Jeff_Green: did you test any of these? [23:59:35] that also fails
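
Editor's note on the "other wildcards may be matched earlier" point: a rough Python sketch of first-match-wins rule ordering, which is how a chain of RewriteCond/RewriteRule pairs ending in [L] behaves. Only the shop host pattern and its target are taken from the pasted config. The first entry is a hypothetical stand-in for the earlier wikimediafoundation rule, whose real pattern and target are not shown in the log, and `first_match` is an illustrative helper, not anything from Apache. It models only the shop.wikimediafoundation.org case and does not explain the shop.wikiquote.org failure.

```python
import re

# Ordered (host pattern, target) pairs, evaluated top to bottom.
# Only the shop pattern and target come from the pasted config; the first
# entry is a HYPOTHETICAL stand-in for the earlier wikimediafoundation rule.
RULES = [
    (r"\.wikimediafoundation\.org$", "http://wikimediafoundation.org/"),  # assumed
    (r"^(shop|store)\.(wik|mediawiki)", "http://shop.wikimedia.org/"),
]


def first_match(host, rules=RULES):
    """Return the target of the first rule whose pattern matches host."""
    for pattern, target in rules:
        if re.search(pattern, host):
            return target  # like [L]: stop at the first rule that fires
    return None


for host in (
    "shop.wikimediafoundation.org",
    "shop.mediawiki.org",
    "shop.quickipedia.net",
):
    print(host, "->", first_match(host))
```

Under these assumptions the shop rule would catch shop.wikimediafoundation.org if it were reached, but the earlier wildcard fires first; reversing the order of the two entries sends that host to the shop target instead.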