[00:31:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:41:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.090 seconds
[01:14:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:27:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds
[01:41:45] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 237 seconds
[01:41:54] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 248 seconds
[01:48:57] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 670s
[01:53:54] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds
[01:54:49] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 1s
[01:54:57] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 1 seconds
[01:58:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:08:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.911 seconds
[02:48:30] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[02:49:45] nagios-wm: quiet, 281 is out of rotation
[03:19:06] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours
[03:32:09] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[03:56:09] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[04:36:26] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours
[04:51:26] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[05:04:20] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[05:19:20] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[06:48:53] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[06:48:53] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[06:48:53] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[07:19:03] New patchset: ArielGlenn; "rsync setup for ms10 (tampa media mirror)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17789
[07:19:42] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/17789
[07:23:59] Change abandoned: ArielGlenn; "ms10 has been set up as an internal host. gotta reinstall, gahhhh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17789
[08:36:33] PROBLEM - Puppet freshness on bayes is CRITICAL: Puppet has not run in the last 10 hours
[08:38:30] PROBLEM - Puppet freshness on srv242 is CRITICAL: Puppet has not run in the last 10 hours
[08:38:30] PROBLEM - Puppet freshness on niobium is CRITICAL: Puppet has not run in the last 10 hours
[08:39:33] PROBLEM - Puppet freshness on mw27 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:33] PROBLEM - Puppet freshness on srv190 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:33] PROBLEM - Puppet freshness on srv238 is CRITICAL: Puppet has not run in the last 10 hours
[12:29:59] argh stat1
[12:30:03] sendmail?!?
[12:30:46] (cleanup cronspam monday)
[12:35:59] Cannot chdir to /mnt/htdocs/wikibooks for the other one
[12:36:00] nice
[12:40:23] welcome back :)
[12:44:25] thanks
[12:47:26] Is fenari heavily loaded atm?
[12:47:42] took ages to get a login prompt..
[12:48:37] I dunno
[12:48:40] lemme see
[12:48:57] nope
[12:49:10] how's nfs? :-P
[12:49:36] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[12:49:59] seems ok when logged in
[12:50:04] suggesting my connection or similar
[12:52:53] maybe
[12:53:01] it didn't seem like a long delay for me for the login
[12:54:03] Computers suck!
[12:54:16] Hmm, loading again and it's fine
[12:54:32] yes they do
[13:12:51] PROBLEM - Host ps1-b1-eqiad is DOWN: CRITICAL - Network Unreachable (10.65.0.40)
[13:13:43] from 10.64.0.141 via cp1015.eqiad.wmnet (squid/2.7.STABLE9) to ()
[13:13:43] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Mon, 06 Aug 2012 13:13:06 GMT
[13:14:25] Our servers are currently experiencing a technical problem.
[13:14:26] :(
[13:14:29] squid down...
[13:14:40] amssq33.esams.wikimedia.org (squid/2.7.STABLE9) to ()
[13:15:00] amssq34.esams.wikimedia.org (squid/2.7.STABLE9) to ()
[13:15:41] via amssq33.esams.wikimedia.org (squid/2.7.STABLE9) to ()
[13:15:42] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Mon, 06 Aug 2012 13:14:02 GMT
[13:17:54] uh oh
[13:18:03] think its network...
[13:19:48] http://en.m.wikipedia.org/
[13:19:49] heh :)
[13:20:31] different set of servers I think
[13:21:31] yep
[13:21:44] if it's an emergency, you can read wikipedia on the mobile site
[13:22:16] grr, was browsing Wikipedia on US Elections 2012...
[13:23:14] Hydriz: that was a *bad* idea :)
[13:23:24] haha :P
[13:35:05] woots
[13:35:30] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_primary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_primary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_primary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:30] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:31] PROBLEM - check_minfraud_primary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:39:33] PROBLEM - NTP on db1022 is CRITICAL: NTP CRITICAL: No response from NTP server
[13:40:27] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_primary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_primary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_primary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:27] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:40:28] PROBLEM - check_minfraud_primary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:41:21] RECOVERY - MySQL Slave Delay on es2 is OK: OK replication delay 12 seconds
[13:45:25] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_primary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_primary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_primary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:25] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:45:26] PROBLEM - check_minfraud_primary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:46:00] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 289 seconds
[13:47:48] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay 12 seconds
[13:50:30] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_primary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_primary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_primary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:30] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:31] PROBLEM - check_minfraud_primary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:50:48] PROBLEM - LVS on payments.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:52:36] PROBLEM - MySQL Slave Delay on es2 is CRITICAL: CRIT replication delay 300 seconds
[13:53:30] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 353 seconds
[13:55:27] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:27] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:27] PROBLEM - check_minfraud_primary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:27] PROBLEM - check_minfraud_primary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:27] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:27] PROBLEM - check_minfraud_primary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:28] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:55:28] PROBLEM - check_minfraud_primary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:57:24] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[13:57:43] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 55.46 ms
[13:57:43] RECOVERY - Host cp3001 is UP: PING OK - Packet loss = 0%, RTA = 123.11 ms
[13:57:43] RECOVERY - Host amslvs2 is UP: PING OK - Packet loss = 0%, RTA = 123.27 ms
[13:57:43] RECOVERY - Host amssq61 is UP: PING OK - Packet loss = 0%, RTA = 122.85 ms
[13:57:43] RECOVERY - Host amssq50 is UP: PING OK - Packet loss = 0%, RTA = 121.71 ms
[13:57:44] RECOVERY - Host amssq53 is UP: PING OK - Packet loss = 0%, RTA = 122.99 ms
[13:57:44] RECOVERY - Host amssq55 is UP: PING OK - Packet loss = 0%, RTA = 123.07 ms
[13:57:45] RECOVERY - Host amssq62 is UP: PING OK - Packet loss = 0%, RTA = 122.96 ms
[13:57:45] RECOVERY - Host amssq57 is UP: PING OK - Packet loss = 0%, RTA = 121.71 ms
[13:57:46] RECOVERY - Host amssq52 is UP: PING OK - Packet loss = 0%, RTA = 122.91 ms
[13:57:46] RECOVERY - Host amssq36 is UP: PING OK - Packet loss = 0%, RTA = 121.62 ms
[13:59:03] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay 0 seconds
[13:59:03] RECOVERY - Host bits.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 118.87 ms
[13:59:04] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: (Service Check Timed Out)
[13:59:12] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: (Service Check Timed Out)
[18:02:20] * Damianz pats wm-bot
[18:02:23] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17827
[18:03:13] Could someone merge/push https://gerrit.wikimedia.org/r/#/c/17774/ ?
[18:03:27] I'm not sure what difference it'll make, but either way, in its current form it is wrong
[18:04:10] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17774
[18:04:11] PROBLEM - Host csw1-esams is DOWN: PING CRITICAL - Packet loss = 58%, RTA = 3556.09 ms
[18:04:28] Reedy: back if you didn't notice (wmbot)
[18:04:34] Ta
[18:04:54] Reedy: mark recently moved it to internal
[18:05:02] and he's going to do the rest at some point too
[18:09:44] RECOVERY - Host csw1-esams is UP: PING OK - Packet loss = 0%, RTA = 118.27 ms
[18:09:48] Ryan_Lane: it's not just mobile servers
[18:10:00] then it definitely wasn't our change :)
[18:10:04] yes it was
[18:10:07] how?
[18:10:22] it's all varnish?
[18:10:22] that fragment is in templates/varnish/wikimedia.vcl.erb
[18:10:37] varnish only serves bits and mobile
[18:10:39] and takes effect if xff_sources is non-empty
[18:10:49] you can't edit through bits
[18:11:15] who talked about edit?
[18:11:26] it's about stats etc. too, isn't it?
[18:11:27] the entire thread is about edits
[18:11:36] no. it's specifically about edits
[18:12:02] in fact, the only carrier this likely doesn't affect is opera mini
[18:12:04] I hijacked the thread and talking about XFF & Opera Mini in general, sorry if I wasn't clear
[18:12:06] because of our change
[18:12:21] right, so our change only affects varnish
[18:12:27] yeah, the opera mini stuff is completely separate from the original purpose of the thread
[18:12:29] so, bits and mobile
[18:12:38] and upload?
[18:12:53] is upload totally on varnish right now?
[18:13:54] either way, this shouldn't really affect stats or edits
[18:14:08] trusting the XFF simply means we don't strip it
[18:14:22] the stats will get the same thing, with the XFF field added
[18:14:36] mediawiki would get the same thing, with XFF added
[18:15:00] which means edits originating from the varnish servers would actually have the correct IP
[18:15:19] the correct meaning "not opera's"?
[18:15:22] as long as mediawiki trusts them for XFF
[18:15:33] opera is likely the only one actually working
[18:15:35] so, mediawiki has another layer of "trusting X-F-F"?
[18:15:39] yes
[18:15:43] okay
[18:15:48] didn't know that.
[18:15:58] Yeah
[18:16:03] it's how the thread started ;)
[18:16:26] In the case of 10.64.169, it wasn’t actually listed for XFF in $wgSquidServersNoPurge. 154.53 and 154.54 were already listed in $wgSquidServersNoPurge.
[18:16:30] so, why do we do that overriding in varnish then?
[18:16:37] quoting from the email ^^
[18:16:39] To determine the IP, MW uses the real IP unless that's in the trusted Squids list, in which case it follows the XFF chain to find the first untrusted IP
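(The trust-walk quoted above, sketched in PHP. This is an illustration of the algorithm only, not MediaWiki's actual wfGetIP() code; the function name and example addresses are hypothetical, and $trusted stands in for the $wgSquidServersNoPurge list.)

    <?php
    // Walk the X-Forwarded-For chain right to left, stepping past trusted
    // proxies, and stop at the first address we do not trust.
    function resolveClientIp( $remoteAddr, $xff, array $trusted ) {
        if ( $xff === '' ) {
            return $remoteAddr;
        }
        $ip = $remoteAddr;
        // XFF reads "origin, proxy1, proxy2, ..."; the nearest hop is rightmost.
        $chain = array_reverse( array_map( 'trim', explode( ',', $xff ) ) );
        foreach ( $chain as $hop ) {
            if ( !in_array( $ip, $trusted, true ) ) {
                break; // current candidate is untrusted: this is our client
            }
            $ip = $hop; // trusted proxy: step one hop further out
        }
        return $ip;
    }

    // An edit arriving via a Squid the wiki trusts:
    // resolveClientIp( '10.0.6.1', '203.0.113.7', array( '10.0.6.1' ) )
    // returns '203.0.113.7'; with an empty trust list it returns '10.0.6.1'.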
[18:17:35] what's the point of doing it in two places? couldn't we just add the SSL proxies to that trusted XFF list?
[18:19:39] the MW one?
[18:19:44] yes
[18:19:46] Hmm yeah why is there a trust list in Varnish?
[18:19:56] Can't Varnish just follow the protocol and append to the XFF, then let MW sort it out?
[18:20:29] possibly
[18:21:10] I know there was some reasoning behind this when we did it
[18:21:20] hell if I can remember off the top of my head right now
[18:21:30] seems like the kind of thing we should have documented :D
[18:22:25] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17650
[18:22:42] it says "Needed for the geoiplookup code later on as it will use xff." there
[18:22:46] perhaps that's the reasoning?
[18:22:59] both varnish and squid strip XFF unless it comes from the HTTPS servers
[18:23:26] RobH: So apparently my Parsoid server in Tampa (#3271) is ready from Chris's end, when can we get it set up?
[18:23:41] it doesn't really matter if people spoof XFF for geoiplookup
[18:23:51] that's your comment :P
[18:24:01] we inherently don't trust XFF, so we added it for geoiplookup, though
[18:26:39] Aaaah right
[18:26:50] geoip needs to recognize the XFF set by the SSL proxies
[18:27:14] It didn't always do this, I remember filing a bug about always getting "San Francisco, CA" when hitting geoiplookup over https
[18:27:36] we don't actually need to strip XFF for that, though
[18:27:48] No
[18:27:50] just need to have a trust list for geoip
[18:27:53] both varnish and squid strip XFF unless it comes from the HTTPS servers
[18:27:58] I'm pretty sure that is false
[18:28:02] is it?
[18:28:08] At least the text Squids should be preserving XFF
[18:28:11] are we just stripping XFP?
[18:28:52] I don't know what happens in practice, I just know what should happen in theory
[18:29:16] Which is that all caching proxies should process XFF per protocol
[18:29:31] except for geoip which should interpret the XFF if the originating IP is an SSL proxy
[18:30:41] just XFP
[18:30:42] header_access X-Forwarded-Proto deny !sslproxy
[18:30:52] which makes sense
[18:31:14] Where "process per protocol" means prepend the originating IP to the XFF header
[18:31:23] * Ryan_Lane nods
[18:33:24] ok, so we are only reassigning XFF in varnish if its coming from the ssl servers or opera mini
[18:33:39] wait
[18:33:54] it's in fact the opposite
[18:34:12] set req.http.X-Forwarded-For = client.ip;
[18:34:30] By reassigning you mean destroying the XFF chain?
[18:34:30] that said, this still wouldn't trigger the bug we're seeing
[18:35:14] are you guys examining the inline C that remaps client.ip in very limited cases?
[18:35:19] yes
[18:35:23] Are you sure that line doesn't overwrite the XFF header and destroy the information in the incoming XFF header?
[18:35:28] Because that would be bad
[18:35:42] RoanKattouw: it does.
[18:35:44] The incoming XFF header could be from a legitimate proxy that MW trusts
[18:35:53] an outside proxy?
[18:35:55] Yes
[18:36:05] MW trusts lots of external proxies
[18:36:31] this should be irrelevant to the issue of how mw sees edits via varnish
[18:36:41] binasher: that's what I said
[18:37:03] Hmm right
[18:37:14] MW would report the IP of the external proxy, not the IP of some internal proxy
[18:37:31] RoanKattouw: either way, if we *do* trust lots of external proxies, we should have this available as a puppet variable so that https and varnish can also have the same trust list
[18:37:38] Wait
[18:37:45] The proxies shouldn't *need* trust lists
[18:37:53] PROBLEM - Puppet freshness on bayes is CRITICAL: Puppet has not run in the last 10 hours
[18:37:58] If they just manipulate XFF per protocol it'll be fine
[18:39:08] New patchset: Catrope; "Add a service class for Parsoid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15856
[18:39:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15856
[18:39:13] looking at https config
[18:39:13] RoanKattouw: If the server is ready for install, are all the puppet manifest changes checked in?
[18:39:13] if so we can review and merge change, and a puppet run on the server will make it live
[18:39:14] binasher: Who and what needs to be done for me to shut down db1047 to replace its bad dimm?
[18:39:14] its one of the two analytics slaves according to rt 3084
[18:39:14] I assume since it is one of two slaves, I can just do a clean shutdown
[18:39:16] it doesn't strip
[18:39:24] proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
[18:39:28] it just adds to the chain, properly
[18:39:35] OK, good
[18:39:46] Then you don't need to trust anything on the proxy's end, do you?
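(The two behaviors being contrasted here, reduced to PHP for clarity. The helper names are hypothetical; the real implementations are nginx's $proxy_add_x_forwarded_for and the quoted VCL assignment.)

    <?php
    // Protocol-correct forwarding: append the address we saw to the chain,
    // which is what $proxy_add_x_forwarded_for does on the SSL terminators.
    function appendXff( $incomingXff, $clientIp ) {
        return $incomingXff === '' ? $clientIp : $incomingXff . ', ' . $clientIp;
    }

    // Destructive forwarding: what "set req.http.X-Forwarded-For = client.ip;"
    // amounts to - any chain built by upstream proxies is discarded.
    function overwriteXff( $incomingXff, $clientIp ) {
        return $clientIp;
    }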
[18:39:47] RobH: the analytics people should be notified first
[18:39:49] PROBLEM - Puppet freshness on niobium is CRITICAL: Puppet has not run in the last 10 hours
[18:39:50] PROBLEM - Puppet freshness on srv242 is CRITICAL: Puppet has not run in the last 10 hours
[18:39:50] so, just varnish is changing the chain
[18:40:01] RobH: https://gerrit.wikimedia.org/r/#/c/15856/
[18:40:02] RobH: if you want to do it at a certain time today, i'll email them
[18:40:42] binasher: I wanna do it as soon as possible, but its not an emergency
[18:40:44] PROBLEM - Puppet freshness on mw27 is CRITICAL: Puppet has not run in the last 10 hours
[18:40:44] PROBLEM - Puppet freshness on srv238 is CRITICAL: Puppet has not run in the last 10 hours
[18:40:44] PROBLEM - Puppet freshness on srv190 is CRITICAL: Puppet has not run in the last 10 hours
[18:40:50] RoanKattouw: and we only do it for geoiplookup
[18:40:53] so whatever the minimum leadtime they need would be best
[18:41:01] so, ideally we should check the host header there
[18:41:04] OK that's fine then
[18:41:09] RoanKattouw: well...
[18:41:25] RoanKattouw: what I meant is: we only *need* to do it for geoiplookup
[18:41:29] Right
[18:41:38] RobH: want to do it in 30min?
[18:41:40] Remember that geoiplookup is no longer its own host
[18:41:48] right now we do it for every non-https or opera-mini client
[18:41:55] It's no longer http://geoiplookup.wikimedia.org , it's now http://bits.wikimedia.org/geoiplookup
[18:42:01] damn it
[18:42:08] well, we can strip for bits
[18:42:09] that's fine
[18:42:15] it doesn't take edits
[18:42:18] Yeah
[18:42:24] let me open an rt
[18:42:40] binasher: That would be great, yep!
[18:42:43] this isn't really a problem right now
[18:42:49] but if someone cannot do that, and needs two hours, thats fine.
[18:42:50] it will be when we use varnish for text, though
[18:42:56] I will be here until 7PM EST
[18:42:59] or when we start allowing edits from mobile
[18:45:22] RobH: What's the hostname of my new shiny server?
[18:45:50] RobH: go for it at 3:15PM EST, a normal clean shutdown should be ok
[18:45:58] binasher: great, thank you!
[18:46:58] RoanKattouw: wtp1
[18:47:08] wtp.pmtpa.wmnet
[18:47:12] sorry, wtp1.pmtpa.wmnet
[18:47:13] RoanKattouw: added an rt to fix this
[18:47:22] Thanks
[18:47:35] welcome
[18:47:42] or you thanking ryan ;]
[18:47:47] i take his thanks as mine.
[18:47:59] hahaha
[18:48:01] he didnt send me any of the booze from sysadmin day
[18:48:05] so he owes me anyhow
[18:48:12] there's still two unopened bottles
[18:48:29] hmm?
[18:48:38] wouldnt this get server kitties drunk?
[18:48:58] They're not in the datacentre(s)
[18:50:05] New patchset: Catrope; "Install Parsoid on wtp1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17831
[18:50:14] RobH, Ryan_Lane: ---^^
[18:50:46] New patchset: Catrope; "Add a service class for Parsoid" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15856
[18:50:47] argh, when did the theme take effect
[18:50:56] its very bright.
[18:50:59] hahaha
[18:51:10] i dont love it.
[18:51:13] I know you aren't going to say you like the old one better
[18:51:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17831
[18:51:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15856
[18:51:26] meh, honestly, dont recall what it looked like already
[18:51:32] except it was easier on the brightness
[18:51:59] bleh
[18:52:02] it has a dependency
[18:52:21] * RobH goes back to on site stuff
[18:53:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17831
[18:53:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15856
[18:54:14] RoanKattouw: ok. merged all the way in
[18:55:44] Thanks
[18:56:00] * RoanKattouw hopes Parsoid will magically come up on wtp1 in the next hour
[18:56:34] I would run Puppet on the machine manually except it doesn't trust my key for root yet because Puppet hasn't run :)
[18:56:43] heh
[19:01:53] cmjohnson1: 3298 needs someone to install these machines and put them into puppet
[19:04:46] New patchset: MaxSem; "Add user accounts to the WLM host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17623
[19:05:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17623
[19:05:29] paravoid, ^^
[19:07:58] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17623
[19:09:52] paravoid, thank you
[19:10:39] MaxSem: done and applied on yttrium
[19:11:29] cool, I can log in now
[19:13:02] \o/
[19:13:12] thanks guys :)
[19:14:07] awjr, lol: The program 'git' is currently not installed. To run 'git' please ask your administrator to install the package 'git'
[19:14:15] * MaxSem goes back to puppet
[19:14:50] doh
[19:15:24] !log db1047 mysql and system shutdown per rt 3084 for bad memory swap
[19:15:33] Logged the message, RobH
[19:16:43] * RobH waits for mysql to actually shut down
[19:17:07] cmjohnson1: looks like you reseated dimms in mc1 this morning? hows it look?
[19:17:43] We need to have 2 more sent from DELL
[19:17:57] ah, ok
[19:17:58] i have a few things for them so I will be calling in a few mins
[19:18:01] !log db1047 shutting down
[19:18:07] cool
[19:18:09] Logged the message, RobH
[19:18:45] cmjohnson1: will you have time in the next day or two to try making the 10gb nics in those hosts pxe'able?
[19:19:06] yes
[19:19:17] PROBLEM - Host db1047 is DOWN: PING CRITICAL - Packet loss = 100%
[19:19:35] hoping to have them for you NLT Wednesday
[19:19:49] that would be great
[19:20:21] RobH: can you try to get the eqiad mc1-16 hosts pxe bootable from their 10gb nics this week too?
[19:20:38] PROBLEM - Apache HTTP on mw18 is CRITICAL: Connection refused
[19:26:11] binasher: only mc1001-1008 are racked and wired
[19:26:22] the other 8 i need a second person to help me, and that will prolly be next week.
[19:26:40] robh: i have a windows7 iso if you need a windows disk to help you with the mc servers
[19:26:42] ok, good to know. let me know if the time frame changes at all
[19:26:56] will do
[19:27:38] * RobH is watching db1047 post
[19:27:54] RobH: one other thing.. i mentioned wanting to buy a bunch of dbs for a new sharded data store in the ops meeting on monday. do you still need me to RT ticket you for the quote?
[19:28:17] RECOVERY - Host db1047 is UP: PING OK - Packet loss = 0%, RTA = 35.76 ms
[19:30:12] !log db1047 back online
[19:30:21] Logged the message, RobH
[19:30:28] binasher: if there isnt a ticket in procurement, please create one with hardware details =]
[19:31:13] will do!
[19:31:35] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 649 seconds
[19:32:37] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 642 seconds
[19:35:19] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 302 Found - 0.015 second response time
[19:38:35] Ryan_Lane: there is some confusion
[19:38:46] you have RT 3127 to relabel kypton as virt1000
[19:38:51] but you already have a virt1000 in eqiad.
[19:39:41] assigning ticket 3127 back to you
[19:44:13] !log asw-c8-eqiad PEM1 power reseated, cleared alarm rt 3204
[19:44:22] Logged the message, RobH
[19:44:27] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds
[19:45:30] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[20:00:20] RobH: Are you somehow able to force a puppet run on wtp1?
[20:03:17] RoanKattouw: I'm getting "[564af014] 2012-08-06 20:02:41: Fatal exception of type MWException" when trying to see recent changes on mediawiki.org
[20:07:02] Just got the same thing
[20:07:05] Jasper_Deng: Yeah, I'm pushing out a prospective fix as we speak, sorry about that
[20:07:26] mediawiki.org is our guinea pig for new code, so this kind of thing happens every second Monday
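(For reference: the opaque "[564af014] ... Fatal exception of type MWException" hides its message and backtrace by default. On a test wiki, two standard MediaWiki/PHP settings in LocalSettings.php expose the details; not something to leave enabled in production.)

    <?php
    // Show the message and stack trace behind "Fatal exception of type MWException"
    // instead of just the bracketed exception ID.
    $wgShowExceptionDetails = true;

    // And for plain PHP errors rather than MediaWiki exceptions:
    error_reporting( -1 );
    ini_set( 'display_errors', 1 );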
[20:07:55] Jasper_Deng, in the mean time you can see irc.wikimedia.org #mediawiki.wikipedia
[20:08:12] RoanKattouw: is it up?
[20:08:19] Krenair: does not appear to be a valid channel
[20:08:19] it doesnt appear to respond to ssh
[20:08:24] RobH: It's responding to ping but not ssh
[20:08:25] RoanKattouw: was the OS installed?
[20:08:31] RobH: I have absolutely no idea
[20:08:35] checking mgmt
[20:08:53] its not installed
[20:08:56] its on the partitioning menu
[20:09:07] RoanKattouw: did the RT handoff say the os was installed?
[20:09:29] Let me check
[20:09:29] Jasper_Deng, what? I'm in it...
[20:09:42] It said "the server is ready to go! "
[20:09:45] Which is ambiguous I suppose
[20:10:15] yea, the os isnt installed
[20:10:15] Krenair: are you sure you typed it right?
[20:10:18] someone needs to do that
[20:10:27] i would kick that ticket back asking about it
[20:10:31] * [Krenair] #meta.wikimedia #mediawiki.wikipedia
[20:10:54] you don't appear to be in it atm
[20:11:53] RobH: I commented on the ticket but I have no idea how RT workflow works so I didn't attempt to give it back to Chris or anything
[20:12:02] It's #
[20:12:05] 3271
[20:12:15] RoanKattouw: if he was about i would ping and ask
[20:12:29] otherwise its a plain install, so asking ct to find an owner is prolly best
[20:12:38] anyone in ops can do it, just someone has to find time and such
[20:12:42] Right
[20:17:46] New patchset: Demon; "(bug 39040) The Wikimedia logo is too close to the change ID" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17888
[20:18:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17888
[20:22:16] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17888
[20:26:32] ex/away
[20:35:27] RECOVERY - Host search23 is UP: PING OK - Packet loss = 0%, RTA = 1.78 ms
[20:46:14] binasher: fed up with pmtpa? :)
[20:58:10] paravoid, do we need to puppetize mysql users on yttrium, or just set them up to our liking?
[20:58:34] I'm not sure of who does what on that box tbh
[21:00:11] so I guess it's the latter... it's just a temporary service for our evil needs
[21:00:18] haven't used it, but there is a puppet module for handling mysql users and db's https://github.com/puppetlabs/puppetlabs-mysql
[21:11:38] is daniel around?
[21:12:18] yep
[21:12:31] andrew_wmf: what's up
[21:15:44] New patchset: Catrope; "Parsoid is very intolerant of double slashes right now" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17900
[21:16:01] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17900
[21:19:21] * jeremyb would love a dump of the watchmouse config
[21:20:00] there's ~5 services there that are green for today and listed as 100% uptime. i think it's wrong for all of those (and there's ~3 that are green and 100% and I think are accurate)
[21:20:32] or at least it's plausible for those 3
[21:21:03] s{1,4}{,-uncahced} and mobile are all wrong i think
[21:21:21] New patchset: MaxSem; "Add Git on yttrium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17902
[21:21:23] uncached*
[21:21:36] please review^^
[21:22:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17902
[21:24:01] ^demon: what's the labs LDAP problem you speak of? still broke? [[mw:talk:developer access]]
[21:24:27] Ryan restarted opendj which "fixed" it
[21:25:57] LeslieCarr_afk: ct was asking me about switching back to eqiad.. how do you feel about the current connectivity between colos and fpl in general?
[21:26:29] binasher: pretty good - how about i call up fpl and ask about a freaking post mortem and then we'll see
[21:26:36] <^demon> jeremyb: Yeah, what Damianz said. If it's back up and working, a response on-wiki would be nice :)
[21:26:36] if they're doing more splice work i don't trust them
[21:26:45] since it's too easy to accidentally mess shit up
[21:26:50] ^demon: danke
[21:26:57] there's a lot of d names to tab through ;)
[21:27:05] LeslieCarr_afk: that sounds great
[21:27:33] though talking on a public channel does break my illusion of being away ;)
[21:28:00] quick, throw a smoke bomb
[21:28:24] LeslieCarr_afk: OTRS uses aliases and you can too!
[21:28:36] sneaky...
[21:29:46] notpeter: It looks like the cron job for PageTriage still isn't working. Are you the best person to bug about that?
[21:30:56] binasher: splicing activity is completed
[21:31:01] post mortem coming in 72 hours
[21:31:21] but, should be safe (as it ever is in tampa)
[21:31:29] kaldari: the last message I had from benny was that the cron wouldn't start producing output until september 9th
[21:31:42] also, is it ok to respond to the foundation-l thread about having too much money with "if we do, can we please spend some on a new datacenter?"
[21:31:44] I was under the impression that it was working correctly (albeit not doing anything yet)
[21:31:54] or is that just trolling the community
[21:32:15] ^demon: make me an account?
[21:32:17] they might troll us back about all the money spent on eqiad :)
[21:32:45] ^demon: https://www.mediawiki.org/wiki/Project:Labsconsole_accounts?diff=569529&oldid=569288
[21:33:29] <^demon> jeremyb: We should grant you access to this ;-)
[21:33:29] LeslieCarr_afk: do you want to work some bgp magic to resume routing of traffic bound for tampa via eqiad and that fiber at some point before we actually change dns?
[21:33:31] <^demon> One moment.
[21:33:37] ^demon: heh
[21:33:58] notpeter: I think there might have been some miscommunication. We're up to 18,000+ unreviewed articles because the cron isn't running. This isn't horrible, but we definitely need it resolved before September 9th, since that is the official launch. The extension is already collecting articles and being used right now though.
[21:34:17] * Damianz finds lcarr some cookies
[21:34:29] * binasher finds lcarr some rye
[21:34:40] :)
[21:34:53] notpeter: so ideally we would like the cron to be running now, but September 9th at the very latest.
[21:34:57] mmm dipping cookies in manhattans....
[21:35:02] ok
[21:35:05] this is what is being run
[21:35:06] 55 20 */2 * * /usr/local/bin/mwscript extensions/PageTriage/cron/updatePageTriageQueue.php enwiki > /tmp/updatePageTriageQueue.en.log
[21:35:11] * ^demon finds LeslieCarr_afk some hydrochloric acid.
[21:35:16] is that correct?
[21:35:39] output is
[21:35:39] Started processing...
[21:35:39] processed 0
[21:35:39] Completed
[21:35:47] eh? hmm
[21:35:49] that's weird
[21:35:56] yes
[21:35:57] so
[21:36:00] maybe there's a bug in the cron script
[21:36:00] I mean, I think it's running
[21:36:06] unless I'm invoking it incorrectly
[21:36:15] i somehow think oatmeal raisin cookies would work with a manhattan dip
[21:36:23] but I asked reedy if the syntax was correct, as it's something that I get wrong often
[21:36:27] and he gave it a green light
[21:36:29] i hate the c2100.
[21:36:32] New patchset: MaxSem; "Add Git on yttrium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17902
[21:36:42] binasher: +1 to the oatmeal cookies and manhattans
[21:36:54] notpeter: at this point it should be touching thousands of rows
[21:37:10] I'll double check the script
[21:37:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17902
[21:37:15] kk
[21:38:45] <^demon> jeremyb: Done, he should have an e-mail w/ password.
[21:38:57] ^demon: danke
[21:39:28] PROBLEM - Swift HTTP on ms-fe1001 is CRITICAL: Connection refused
[21:39:40] ^^^^ that's me and it's ok.
[21:39:49] paravoid, could you merge https://gerrit.wikimedia.org/r/17902 ?
[21:47:03] notpeter: I think I see a problem with the cron script
[21:47:22] notpeter: thanks for checking on it though. sorry to bug you.
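(For context, the "Started processing... / processed 0 / Completed" output above is the usual shape of a MediaWiki maintenance script run via mwscript. A skeletal 1.19-era version is sketched below; it is illustrative only - the real updatePageTriageQueue.php query, and the bug being chased here, are not shown in the log.)

    <?php
    // Skeleton of an mwscript-runnable maintenance job (illustrative).
    require_once( dirname( __FILE__ ) . '/Maintenance.php' );

    class UpdateExampleQueue extends Maintenance {
        public function execute() {
            $this->output( "Started processing...\n" );
            $dbw = wfGetDB( DB_MASTER );
            // The real script selects and refreshes queue rows here;
            // "processed 0" means the selection matched nothing.
            $res = $dbw->select( 'page', array( 'page_id' ),
                array( 'page_namespace' => 0 ), __METHOD__,
                array( 'LIMIT' => 100 ) );
            $processed = 0;
            foreach ( $res as $row ) {
                $processed++;
            }
            $this->output( "processed $processed\nCompleted\n" );
        }
    }

    $maintClass = 'UpdateExampleQueue';
    require_once( RUN_MAINTENANCE_IF_MAIN );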
[21:48:00] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100%
[21:48:12] Thehelpfulone: fyi for the mailing list overview table: travel-l --> travel
[21:48:30] Thehelpfulone: travel-l deleted, config copied to travel, now using travel and done
[21:53:05] maplebed: I am ok to take down ms-be1005 right?
[21:53:13] yes.
[21:53:16] it appears online, the disks you list failing to mount are mounted
[21:53:20] i take it you did that manually?
[21:53:32] lemme check.
[21:53:33] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 35.70 ms
[21:53:39] I see all the dmesg errors as well
[21:53:56] i checked the cables on ms-be1003 and rebooted it as well
[21:54:11] !log saving space on wikitech linode - gzipping old .sql dumps and stuff
[21:54:27] Logged the message, Master
[21:54:45] maplebed: the ms-be1003 will require reinstall later to see if it clears the issue, i have not checked post cable check
[21:54:48] RobH: I checked one drive (sdd) and it appears mounted, but trying to 'touch foo' in the mountpoint yields 'no space left on device'.
[21:54:51] i have both tickets
[21:54:53] mutante: that !log ate up more space on wikitech!
[21:54:55] so... something's broken.
[21:55:01] will shut it down and check connections
[21:55:03] thx
[21:55:05] k.
[21:55:13] jeremyb: heh yeah, but not several GB like the dumps:)
[21:58:41] PROBLEM - swift-container-updater on ms-be1005 is CRITICAL: Connection refused by host
[21:58:48] PROBLEM - swift-object-updater on ms-be1005 is CRITICAL: Connection refused by host
[21:58:48] PROBLEM - swift-account-server on ms-be1005 is CRITICAL: Connection refused by host
[21:58:57] PROBLEM - swift-object-server on ms-be1005 is CRITICAL: Connection refused by host
[21:59:06] PROBLEM - swift-account-auditor on ms-be1005 is CRITICAL: Connection refused by host
[21:59:07] !log rebooting db1026, upgrading to precise
[21:59:07] PROBLEM - swift-container-auditor on ms-be1005 is CRITICAL: Connection refused by host
[21:59:15] PROBLEM - SSH on ms-be1005 is CRITICAL: Connection refused
[21:59:15] PROBLEM - swift-object-replicator on ms-be1005 is CRITICAL: Connection refused by host
[21:59:15] PROBLEM - swift-container-replicator on ms-be1005 is CRITICAL: Connection refused by host
[21:59:15] PROBLEM - swift-account-reaper on ms-be1005 is CRITICAL: Connection refused by host
[21:59:15] Logged the message, Master
[21:59:24] PROBLEM - swift-object-auditor on ms-be1005 is CRITICAL: Connection refused by host
[21:59:33] PROBLEM - swift-account-replicator on ms-be1005 is CRITICAL: Connection refused by host
[21:59:51] PROBLEM - MySQL disk space on db1028 is CRITICAL: Connection refused by host
[22:00:01] PROBLEM - swift-container-server on ms-be1005 is CRITICAL: Connection refused by host
[22:00:27] PROBLEM - Host db1026 is DOWN: PING CRITICAL - Packet loss = 100%
[22:01:23] so who do i talk to about watchmouse? maybe i should make an RT
[22:01:30] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100%
[22:02:24] RECOVERY - Host db1026 is UP: PING OK - Packet loss = 0%, RTA = 35.38 ms
[22:03:39] PROBLEM - NTP on db1026 is CRITICAL: NTP CRITICAL: Offset unknown
[22:06:30] RECOVERY - NTP on db1026 is OK: NTP OK: Offset -0.04107093811 secs
[22:12:30] RECOVERY - MySQL disk space on db1028 is OK: DISK OK
[22:13:42] !log rebooting db1027, upgrade to precise
[22:13:51] Logged the message, Master
[22:14:50] !log rebooting db1028, upgrading
[22:14:58] Logged the message, Master
[22:15:48] PROBLEM - Host db1027 is DOWN: PING CRITICAL - Packet loss = 100%
[22:17:27] RECOVERY - Host db1027 is UP: PING OK - Packet loss = 0%, RTA = 35.45 ms
[22:17:27] PROBLEM - Host db1028 is DOWN: PING CRITICAL - Packet loss = 100%
[22:18:12] RECOVERY - Host db1028 is UP: PING OK - Packet loss = 0%, RTA = 35.40 ms
[22:21:56] maplebed: when do you think we can work on https://bugzilla.wikimedia.org/show_bug.cgi?id=34814?
[22:22:29] AaronSchulz: I've got two things on my plate that are more pressing - upgrading to 1.5 and setting up cross-colo replication.
[22:23:05] though, it's not trouble for me to make you an account to play with, especially in one of the labs clusters.
[22:27:42] !log dist-upgrading wikitech instance
[22:28:06] Logged the message, Master
[22:29:12] mutante: does wikitech get monitored in any way?
[22:33:16] jeremyb: linode sends mail for some things like disk space, disk i/o...
[22:33:19] New patchset: Bhartshorne; "rebuilt swift upgrade cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17915
[22:33:57] mutante: oh, didn't know they do disk space. i was thinking ganglia/nagios
[22:34:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17915
[22:34:32] mutante: for some reason didn't think you might be using the builtin monitoring, okey ;)
[22:35:36] jeremyb: a check on watchmouse wouldn't hurt, right
[22:35:42] jeremyb: but no, not in Nagios
[22:36:10] mutante: idk if there's a limit to what can go in watchmouse and that's not why i was asking about watchmouse
[22:36:25] * jeremyb wouldn't object to adding to watchmouse but maybe is overkill
[22:36:29] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17915
[22:36:29] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17614
[22:36:35] jeremyb: there is a limit but we can ask them to have it raised afaik
[22:36:45] k
[22:37:05] * jeremyb will be writing some watchmouse mail soon
[22:44:36] jeremyb: sure, i'm afraid there is no way to just dump the config though besides taking lots of screenshots or something
[22:44:45] it's all just web ui
[22:45:51] mutante: maybe they use some ajax that can be scraped? /me was worried about that very problem
[22:46:58] they have a 30 day free trial. i can get that and play with it
[22:47:13] will be fragile though
[22:47:35] mutante: do they offer readonly accounts? or everyone can do everything?
[22:47:40] btw, it is now called "Nimsoft Cloud Monitor"
[22:48:27] yeah
[22:48:43] jeremyb: not sure about read-only, please add that to ticket and i can check on it later
[22:48:49] > Sign up for a Nimsoft Cloud Monitor account and be the first to know when your company's website is not performing.
[22:49:05] heh, but " API refused to cooperate. Please try again later. (err 1000)
[22:49:06] did they tell us first this morning? ;)
[22:49:17] mutante: where's that?
[22:49:47] it was here, but it's gone already, very temp. it seems http://www.watchmouse.com/en/checkit.php
[22:49:58] huh
[22:49:59] jeremyb: they woke me up, so kinda yes. likely the europeans already knew though.
[22:50:33] <^demon> mutante: WFM. It says at the top "You can use this 5 times today."
[22:50:38] <^demon> Maybe you already used it 5 times?
[22:50:45] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[22:51:19] oh, yay, someone fixed the duplicate nagios-wm
[22:51:47] hrmmm, why is amsterdam on the list twice when i do checkit?
[22:51:55] ^demon: i think it was just unlucky timing and i caught it in a downtime that lasted a few seconds only, i also get the "5 times" message now
[22:52:06] no, 3x
[22:52:24] <^demon> mutante: Let's keep refreshing ;-)
[22:54:48] interesting, so somebody actually used "bazaar" in the past on wikitech
[22:55:57] <^demon> It was domas, maybe it was for mydumper? :)
[22:58:00] New patchset: Alex Monk; "(bug 34135#c4) Let admins change FlaggedRevs stable settings on cawikinews." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17918
[23:11:20] !log temp. setting wikitech to readonly for maintenance
[23:12:37] Isn't morebots supposed to reply to that?
[23:14:00] !log performing db schema upgrade on wikitech
[23:14:09] morebots: slap
[23:14:17] morebots is on wikitech
[23:14:46] just kill the python process, it's in a restart loop
[23:14:50] heh, ok..:p
[23:15:52] or maybe the bot is fine, maybe it just couldn't log to a read-only wiki
[23:16:16] oh yeah, so true
[23:16:51] lemme wait for the update.php
[23:17:00] it is still doing rev_id changes
[23:17:32] * AaronSchulz snickers
[23:17:43] mutante: sha1? It won't be as bad as enwiki ;)
[23:17:56] Reedy: which is still going btw
[23:18:01] indeed
[23:18:27] ugh, done but upgrade fail
[23:18:32] wikitech wiki is pretty big, because of all those server admin log revisions
[23:18:49] rev_sha1 and ar_sha1 population complete [44548 revision rows, 5284 archive rows].
[23:18:58] wikipages don't make for the most efficient logs ;)
[23:19:33] mutante: what was the error?
[23:19:47] <^demon> AaronSchulz: You mean we're not using an ideal solution?
[23:19:54] Reedy: getting no content on the page anymore
[23:19:56] <^demon> Maybe we could write an extension. Special:SAL :)
[23:19:58] TimStarling: can you comment on https://gerrit.wikimedia.org/r/#/c/17379/ when you get the chance?
[23:20:05] HTTP Error 500 (Internal Server Error): An unexpected condition was encountered while the server was attempting to fulfil the request.
[23:20:15] It looks like that read-only wasn't necessary
[23:20:25] Reedy: very specific
[23:20:47] <^demon> AaronSchulz: Add me to reviewer list. I want to see the seekrit commit ;-)
[23:21:10] I thought no drafts were fool-proof secret
[23:21:21] <^demon> Pfft, I don't wanna dig up a stupid gitweb url.
[23:21:49] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours
[23:22:36] mutante: need a hand?
[23:22:42] Reedy: yes please
[23:22:59] ^demon: Security by Laziness
[23:23:00] Reedy: well, or i could rollback
[23:23:15] <^demon> AaronSchulz: Hey, it works I guess :p
[23:23:48] We need some debugging:
[23:23:48] error_reporting( -1 );
[23:23:48] ini_set( 'display_errors', 1 );
[23:24:08] I'm surprised drafts are secret to gerrit admins.
[23:24:30] <^demon> Of course they are.
[23:24:42] <^demon> READ permissions are enforced even on admins.
[23:24:52] <^demon> But admins can change permissions ;-)
[23:25:01] heh
[23:25:18] <^demon> 'cept drafts.
[23:25:26] <^demon> I guess I could just watch the stream-events.
[23:25:30] Even more surprised that drafts aren't completely secret. Wouldn't you need to know the commit's hash to find it in gitweb?
[23:25:43] Reedy: ini_set( 'display_errors', 1 ); error_reporting( E_ALL );
[23:25:58] <^demon> Krenair: Either the hash or the refs/changes/whatever it got stashed in.
[23:26:15] <^demon> It doesn't surprise me at all. Gitweb is a flaming pile of dog crap.
[23:27:23] * AaronSchulz always thought of dog crap as not being that flammable
[23:27:42] ^demon: do we monitor temperature for the pile? maybe infrared?
[23:27:59] AaronSchulz: dry it out first
[23:28:13] mutante: that's already in localsettings?
[23:28:20] <^demon> jeremyb: I just added it to Watchmouse earlier.
[23:28:36] ^demon: cool, thanks
[23:28:38] I noticed it seems to be somewhat confused
[23:28:40] The requested URL /w/api.php was not found on this server.
[23:28:41] Reedy: the first line was commented and i uncommented, the other was in as is
[23:28:58] Did you move the source folders around?
[23:29:26] nope, but note it has /view/ instead of /wiki/
[23:29:42] (not that that should matter about /w/ i guess)
[23:30:12] Can't say I ever looked where wikitechs api was
[23:32:41] What's the apache error logs showing?
[23:33:49] File does not exist: /srv/org/wikimedia/wikitech/w/
[23:33:58] but that was just us looking for that manually
[23:34:43] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[23:36:38] Reedy: uhm.. www-data needs write permissions on ./cache/ right
[23:37:46] yeha
[23:40:48] Reedy: is it me or is wikitech throwing an error?
[23:41:08] 500 internal server error
[23:41:25] I mentioned this 20 minutes ago ;)
[23:41:29] they are updating it, i understand
[23:41:56] what's going on?
[23:41:57] !log wikitech currently down - will rollback soon if we can't fix
[23:42:28] there's no place to log things if there's a syntax error somewhere in the PHP, right?
[23:42:43] that was more for you:)
[23:42:54] and adding !log out of habit
[23:42:55] lol
[23:43:23] mutante: are there any error log entries?
[23:43:29] no :/
[23:43:34] php -l is ok?
[23:43:47] PHP Warning: PHP Startup: apc.shm_segments setting ignored in MMAP mode in Unknown on line 0
[23:43:50] but that's it
[23:45:11] it appears someone else tried this a while ago and it also broke
[23:45:26] What version was wikitech on?
[23:45:27] there is a "-broken" directory and an 1.18 tarball
[23:45:29] 1.17
[23:45:36] And you tried to update it to what?
[23:45:39] 1.19
[23:45:42] lol
[23:46:52] mutante: which server?
[23:46:55] rolling back
[23:47:09] i can load it again
[23:47:09] AaronSchulz: wikitech
[23:47:10] :p
[23:47:30] I wonder why it hasn't been switched to a vcs checkout
[23:47:34] ok guys, i rolled back the files
[23:47:41] * AaronSchulz has no access there
[23:47:42] but i did not yet rollback the db schema update
[23:47:57] There shouldn't be anything backwards incompatible
[23:48:05] yeah
[23:48:11] no more errors
[23:48:14] seems to work for me
[23:48:19] ok good
[23:48:40] !log test
[23:48:46] mutante: did you try php maintenance/eval.php ?
[23:48:50] Logged the message, Master
[23:49:06] !log wikitech back to old version after failed upgrade attempt
[23:49:15] Logged the message, Master
[23:49:25] AaronSchulz: no, i just did a maintenance/update.php
[23:51:14] This wiki is powered by [//www.mediawiki.org/ MediaWiki]
[23:51:16] lol
[23:51:42] 1.17wmf1 (r90469)
[23:51:47] Per aaron, putting the newer files back, and using eval.php might yield some errors
[23:51:59] I'm not quite sure if wikitech really needs to run wmf branches
[23:52:21] hmm, let's try again these days. I'd appreciate to work with somebody from dev next time:)
[23:52:33] so we can go to the linode shell together and take a look
[23:53:29] I think there's probably only Tim and Roan (and brion maybe, for old times sake ;)) in dev that'd have access..
[23:54:12] ok, yeah, that's why i said we can do it together at office.. i am here now
[23:54:34] and we can open a shell at my desk
[23:55:24] ahh
[23:55:35] i think Brion put the 1.18 tarball there
[23:55:46] well at least telling from the permissions
[23:56:18] I'll try and ask him
[23:58:01] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[23:58:24] AaronSchulz: that draft looks good in principle
[23:59:40] TimStarling: but...? :)
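(Postscript: the maintenance/eval.php route Aaron suggested above is often the quickest way to see the fatal that a bare HTTP 500 hides, since it bootstraps LocalSettings.php on the command line and prints any startup error straight to the terminal. A hypothetical session against the rolled-back 1.17 install; the actual wikitech error never made it into this log.)

    $ php maintenance/eval.php
    > echo $wgVersion;
    1.17.0
    > echo Title::newMainPage()->getPrefixedText();
    Main Page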