[00:03:02] New patchset: MaxSem; "WLM banner" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23021
[00:03:26] New review: MaxSem; "Not now." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/23021
[00:03:36] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours
[00:04:30] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[00:04:30] PROBLEM - Puppet freshness on search35 is CRITICAL: Puppet has not run in the last 10 hours
[00:04:33] !log authdns-update to support wicipediacymraeg.org
[00:04:42] Logged the message, RobH
[00:05:33] PROBLEM - Puppet freshness on search32 is CRITICAL: Puppet has not run in the last 10 hours
[00:05:33] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[00:05:33] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours
[00:06:30] !log remove cn from uniqueness check in LDAP. it's a global check and causes problems
[00:06:39] Logged the message, Master
[00:07:30] PROBLEM - Puppet freshness on search36 is CRITICAL: Puppet has not run in the last 10 hours
[00:11:33] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[00:11:33] New patchset: RobH; "added in redirect for wicipediacymraeg.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23022
[00:13:30] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours
[00:14:33] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[00:14:33] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours
[00:14:33] PROBLEM - Puppet freshness on search28 is CRITICAL: Puppet has not run in the last 10 hours
[00:15:36] PROBLEM - Puppet freshness on search22 is CRITICAL: Puppet has not run in the last 10 hours
[00:15:36] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours
[00:15:47] New patchset: Aaron Schulz; "Added global backend config for things like math." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22941
[00:16:30] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours
[00:16:30] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[00:16:30] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours
[00:17:33] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[00:17:33] PROBLEM - Puppet freshness on search33 is CRITICAL: Puppet has not run in the last 10 hours
[00:19:13] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22941
[00:21:36] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours
[00:22:30] PROBLEM - Puppet freshness on search21 is CRITICAL: Puppet has not run in the last 10 hours
[00:22:51] New patchset: ArielGlenn; "db70 -> ms-be13" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23023
[00:23:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23023
[00:25:06] New patchset: ArielGlenn; "db70 -> ms-be13" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23023
[00:25:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23023
[00:26:09] Change abandoned: ArielGlenn; "committed a bunch of cruft instead of just what I wanted, grrr. need moar sleep." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23023
[00:27:55] New patchset: ArielGlenn; "db70 -> ms-be13" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23024
[00:28:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23024
[00:29:23] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23024
[00:31:58] ^demon: 'can not update the reference as a fast forward'
[00:32:19] ^demon: did push get disabled for the config repo?
[00:32:43] AaronSchulz: Possibly. Or you forgot to pull before you pushed?
[00:32:52] I already did pull --rebase
[00:33:00] Hmm right
[00:33:03] You should be able to push
[00:33:05] <^demon> I just quickly granted a couple of permissions to get it up and running earlier.
[00:33:20] <^demon> It may need more setup. Anyone in deployment can adjust those permissions.
[00:33:21] Right, the ACLs got hosed too
[00:33:51] * AaronSchulz sighs
[00:38:25] RoanKattouw: any idea how to fix?
[00:40:55] Let me see
[00:41:42] Try now
[00:41:59] worked
[00:59:52] New patchset: ArielGlenn; "ms-be13 disk layout without ssds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23028
[01:00:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23028
[01:07:05] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[01:07:46] apergos: so where did we say ext-dist is going?
[01:07:53] we didn't
[01:08:18] my notes there say "it would be nice not to be in swift. but dunno where."
[01:08:20] ^h
[01:08:23] (on the cruft page)
[01:16:53] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23028
[01:20:00] New patchset: Dzahn; "bug 31369 - make redirects protorel where possible" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/13293
[01:22:05] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[01:22:51] New patchset: Dzahn; "bug 31369 - make redirects protorel where possible" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/13293
[01:26:03] New review: Dzahn; "JeremyB: actually the problem was just the comment in line 164, Apache takes that as an argument to ..." [operations/apache-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/13293
[01:27:57] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 7s
[01:28:05] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 17 seconds
[01:29:58] apergos: so it seems like you guys won't need me for the docroot/jar stuff right?
[01:30:11] I think not for the jar stuff for sure
[01:30:46] we'll be keeping the same path in timedmediahandler and in uh
[01:30:50] ogghandler
[01:31:13] just making sure that url winds up going somewhere else.
[01:32:24] and ext-dist doesn't need me if you just move it to another server and update the 3 globals
[01:33:31] * apergos goes grepping for 503s
[01:33:33] * jeremyb waves mutante
[01:33:52] feel free to propose that on the cruft page
[01:33:59] if it isn't already
[01:34:15] mutante: sorry, didn't get to that yet, thanks for the extra testing
[01:34:57] jeremyb: i rebased it. there was a major cleanup of redirects.conf in between.
[01:38:25] mutante: yup, just went through the diffs (but ignored the rebasing)
[01:40:32] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 190 seconds
[01:41:26] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 289 seconds
[01:43:33] apergos: do you want the task of ext-dist?
[01:43:55] huh?
[01:44:03] I won't get to it right away
[01:44:26] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 5 seconds
[01:44:31] I need to do the r510, then the dell call, if there is anything left to do for ms-be6 then that, and then pybaltest.txt
[01:44:44] if I get that working I want to do jars/bla next
[01:45:02] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 9 seconds
[01:45:14] basically I need to know where we want stuff like that, the mw tarballs, etc
[01:45:23] and I have no frickin clue
[01:45:31] that = ext dist files
[01:51:44] New patchset: Jeremyb; "tweak sync wrapper script" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23033
[01:54:42] New review: Jeremyb; "do both paths exist? is there a symlink or bind mount in there somewhere? which is really canonical?" [operations/apache-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23033
[01:55:53] logging off
[01:55:59] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Z on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2438*
[01:57:02] bye apergos!
[02:01:41] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Y on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2438*
[02:07:50] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Y on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2450*
[02:12:38] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-Y on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2575*
[02:18:38] PROBLEM - Host professor is DOWN: PING CRITICAL - Packet loss = 100%
[02:19:05] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-Z on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575*
[02:22:31] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-Z on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575*
[02:25:58] PROBLEM - mysqld processes on es9 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[02:26:34] PROBLEM - mysqld processes on es10 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[02:28:31] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-Z on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2588*
[02:30:37] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:31:40] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-X on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-X CRITICAL - *2575*
[02:37:40] PROBLEM - ps1-a5-sdtpa-infeed-load-tower-A-phase-Z on ps1-a5-sdtpa is CRITICAL: ps1-a5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2600*
[02:38:52] RECOVERY - Puppet freshness on search1003 is OK: puppet ran at Fri Sep 7 02:38:31 UTC 2012
[02:39:28] RECOVERY - Puppet freshness on search16 is OK: puppet ran at Fri Sep 7 02:39:17 UTC 2012
[02:42:01] RECOVERY - Puppet freshness on search1001 is OK: puppet ran at Fri Sep 7 02:41:48 UTC 2012
[02:42:55] RECOVERY - Puppet freshness on search19 is OK: puppet ran at Fri Sep 7 02:42:39 UTC 2012
[02:43:41] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:41] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:41] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:41] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:41] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[02:43:42] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:42] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:43] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:43] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:43:49] RECOVERY - ps1-a5-sdtpa-infeed-load-tower-A-phase-Z on ps1-a5-sdtpa is OK: ps1-a5-sdtpa-infeed-load-tower-A-phase-Z OK - 2375
[02:43:58] RECOVERY - Puppet freshness on search1002 is OK: puppet ran at Fri Sep 7 02:43:51 UTC 2012
[02:43:58] RECOVERY - Puppet freshness on search33 is OK: puppet ran at Fri Sep 7 02:43:51 UTC 2012
[02:44:07] RECOVERY - ps1-a5-sdtpa-infeed-load-tower-A-phase-X on ps1-a5-sdtpa is OK: ps1-a5-sdtpa-infeed-load-tower-A-phase-X OK - 2288
[02:44:34] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Y on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2438*
[02:45:01] RECOVERY - Puppet freshness on search31 is OK: puppet ran at Fri Sep 7 02:44:54 UTC 2012
[02:45:10] RECOVERY - ps1-a5-sdtpa-infeed-load-tower-A-phase-Y on ps1-a5-sdtpa is OK: ps1-a5-sdtpa-infeed-load-tower-A-phase-Y OK - 2400
[02:45:28] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Z on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2475*
[02:45:55] RECOVERY - Puppet freshness on search18 is OK: puppet ran at Fri Sep 7 02:45:41 UTC 2012
[02:47:52] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Y on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2413*
[02:48:01] RECOVERY - Puppet freshness on search26 is OK: puppet ran at Fri Sep 7 02:47:46 UTC 2012
[02:48:01] RECOVERY - Puppet freshness on searchidx2 is OK: puppet ran at Fri Sep 7 02:47:49 UTC 2012
[02:49:31] RECOVERY - Puppet freshness on search17 is OK: puppet ran at Fri Sep 7 02:49:20 UTC 2012
[02:50:25] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Fri Sep 7 02:50:02 UTC 2012
[02:51:28] RECOVERY - Puppet freshness on search34 is OK: puppet ran at Fri Sep 7 02:51:22 UTC 2012
[02:51:55] RECOVERY - Puppet freshness on search15 is OK: puppet ran at Fri Sep 7 02:51:31 UTC 2012
[02:52:31] RECOVERY - Puppet freshness on search28 is OK: puppet ran at Fri Sep 7 02:52:05 UTC 2012
[02:52:31] RECOVERY - Puppet freshness on search23 is OK: puppet ran at Fri Sep 7 02:52:16 UTC 2012
[02:54:55] RECOVERY - Puppet freshness on search29 is OK: puppet ran at Fri Sep 7 02:54:55 UTC 2012
[02:56:25] RECOVERY - Puppet freshness on search25 is OK: puppet ran at Fri Sep 7 02:56:06 UTC 2012
[02:57:01] RECOVERY - Puppet freshness on search32 is OK: puppet ran at Fri Sep 7 02:56:51 UTC 2012
[02:58:58] RECOVERY - Puppet freshness on search30 is OK: puppet ran at Fri Sep 7 02:58:35 UTC 2012
[03:00:01] RECOVERY - Puppet freshness on search24 is OK: puppet ran at Fri Sep 7 02:59:32 UTC 2012
[03:00:55] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Fri Sep 7 03:00:45 UTC 2012
[03:01:31] RECOVERY - Puppet freshness on search14 is OK: puppet ran at Fri Sep 7 03:01:08 UTC 2012
[03:03:28] RECOVERY - Puppet freshness on search36 is OK: puppet ran at Fri Sep 7 03:03:02 UTC 2012
[03:03:28] RECOVERY - Puppet freshness on search35 is OK: puppet ran at Fri Sep 7 03:03:04 UTC 2012
[03:04:58] RECOVERY - Puppet freshness on search21 is OK: puppet ran at Fri Sep 7 03:04:46 UTC 2012
[03:07:31] RECOVERY - poolcounter on ersch is OK: PROCS OK: 1 process with command name poolcounterd
[03:07:40] RECOVERY - Puppet freshness on search22 is OK: puppet ran at Fri Sep 7 03:07:28 UTC 2012
[03:07:49] RECOVERY - Puppet freshness on search27 is OK: puppet ran at Fri Sep 7 03:07:38 UTC 2012
[03:43:22] New review: Ori.livneh; "Might want to add your e-mail address to the comment." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22172
[03:50:45] apergos: pybaltestfile was/is on my radar...
[03:52:03] apergos: as for the jars, I'm not sure if I agree, I've added some comments on the wiki page, have you seen that?
[04:39:00] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[05:16:53] I didn't see your comments
[05:17:44] I was planning to move it into swift in a "jars" container
[05:17:59] maybe we'll wind up with other java crapola
[05:18:11] so might as well have a container for it
[05:18:14] paravoid:
[05:19:08] as for pybaltestfile, unless you were planning to do it in the next couple days, I might get to it before you
[05:19:16] we'll see
[05:26:12] the next couple days I don't plan to do anything
[05:26:14] (weekend)
[05:27:04] and for pybaltestfile I was looking at modifying rewrite.py to provide one or two status URLs
[05:27:19] like /status or something
[05:27:37] one to be handled internally in rewrite.py and one that checks backends
[05:27:48] and with pybal & squid using the former
[05:28:47] if we don't do that, then we're risking that the loss of that file (because e.g. we had a triple failure like the one that almost happened these days) would result in a DoS of the cluster
[05:29:19] also, I'd like to have a few URLs that tell us more about the overall cluster health and plug those into Nagios.
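The /status idea paravoid describes above can be pictured as a small WSGI middleware in front of Swift (rewrite.py is such a middleware). This is only a hypothetical sketch of the approach, not the code that was eventually written; the URL and response body are assumptions.

```python
# Minimal sketch of a health-check middleware: /status is answered by
# the proxy itself, so pybal/squid checks don't depend on fetching a
# real object, and losing one stored file can't depool the cluster.
def status_middleware(app):
    def wrapped(environ, start_response):
        if environ.get('PATH_INFO') == '/status':
            # Handled internally; no backend round trip.
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return [b'OK\n']
        # Everything else falls through to the wrapped application.
        return app(environ, start_response)
    return wrapped
```

A second, separate URL that really queries the backends (as suggested above) could then be used for deeper Nagios checks rather than for LVS pooling decisions.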
[05:29:56] so, I may not have committed code yet, but as you see I've already planned it a bit :-)
[05:30:00] PROBLEM - Puppet freshness on oxygen is CRITICAL: Puppet has not run in the last 10 hours
[05:30:36] TimStarling: wouldn't you say that jars fit better into bits than upload?
[05:30:40] or am I way off?
[05:31:39] you are way off
[05:32:09] heh
[05:32:09] they would have been on bits to start with, except that it is required that they be hosted from the same domain as the video files, due to the same origin policy in Java
[05:32:32] java has crossdomain.xml nowadays
[05:32:50] it's always had it, I think
[05:33:05] no, it's >= 6 update 10
[05:33:15] (I checked that yesterday)
[05:33:26] I knew it was a Flash thing that Java copied at some point
[05:33:50] couldn't we just serve a crossdomain.xml from docroot and have cortado in bits?
[05:34:29] when was update 10 released?
[05:34:48] 2008 apparently
[05:35:02] but Java generally has autoupdate, so it's slightly better
[05:35:26] where is the documentation for it?
[05:35:40] for what? 2008 or crossdomain.xml?
[05:35:52] crossdomain.xml
[05:36:05] http://www.oracle.com/technetwork/java/javase/plugin2-142482.html
[05:36:12] links to http://www.adobe.com/devnet/articles/crossdomain_policy_file_spec.html
[05:36:31] Java Web Start and the Java Plug-In currently implement only a subset of the cross-domain policy file functionality. Specifically, site access restrictions are not supported. Access to a particular server is only granted if the crossdomain.xml file is present and contains only an entry granting access from all domains:
[05:36:36] hah
[05:36:54] so, yeah,
[05:36:58] should suffice
[05:41:18] * paravoid checks
[05:44:45] http://javasourcecode.org/html/open-source/jdk/jdk-6u23/com/sun/deploy/net/CrossDomainXML.html
[05:44:54] "This implementation only supports the cross-domain policy . More specific access policies are not implemented. This class will return false if they are specified."
[05:45:59] I just mentioned that above
[05:46:12] but allow from=* should be fine for us
[05:46:14] so you did
[05:47:29] how do you set a system property?
[05:48:44] maybe that is just java web start, not the plugin
[05:49:33] mmm, I guess so
[05:49:50] and if you run things with JWS you're probably screwed anyway, ask Leslie
[05:50:14] ok, I am fine with it
[06:18:44] heh, not exactly way off after all :-)
[06:22:48] so, I ran a grep on the squid logs for Java versions in UAs
[06:24:03] http://pastebin.com/EFyjQ6kE
[06:24:41] that includes all hits, incl. quite a few that are API hits
[06:25:16] and none of Java/1.6.0 with no update number are browser hits
[06:31:08] the cortado.jar sample is very small, 29 hits in the whole sampled-1000.log
[06:31:29] out of those only 2 are < se6 upd. 10, but the sample is too small to extrapolate
[06:31:55] so, yeah, looks good
[06:42:29] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[06:42:29] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[06:42:29] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[06:43:55] TimStarling: I created a test page in my own domain, verified it did request '/crossdomain.xml' and didn't work by saying "Not allowed"
[06:44:12] TimStarling: then I placed the crossdomain.xml above in ms7 and retried and it works
[06:44:22] :-)
[06:44:31] so, who should I talk with to move this to bits?
[06:44:35] file a bug?
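Per the Oracle documentation quoted above, the Java plugin only honors a policy file that grants access from all domains, so the crossdomain.xml placed on ms7 for this test would have looked essentially like the following (a minimal sketch following the Adobe cross-domain policy file spec; the DOCTYPE line is optional):

```xml
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM
  "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
  <!-- Java accepts only an all-domains grant; any more specific
       access rule makes the plugin deny cross-domain access. -->
  <allow-access-from domain="*" />
</cross-domain-policy>
```

Note that "allow from=*" here means any site's applets may read from this host, which is acceptable for a public media server but worth keeping in mind.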
[06:48:29] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[06:48:56] I can probably do it
[06:49:26] that'd be great :)
[06:49:38] always happy to work with you :-)
[06:49:55] TMH sucks so much
[06:50:04] it's both OggHandler and TMH iirc
[06:50:09] oh and TMH has the path hardcoded in the code
[06:50:23] yeah I noticed
[06:51:43] you can open a bug against TMH asking them to clean up their code and support some sort of runtime configuration
[06:51:49] I'll do a patch for OggHandler
[06:53:50] thanks!
[06:54:05] is that crossdomain.xml file likely to disappear in the future?
[06:56:07] you know we're moving to swift right? ;)
[06:57:16] hahaha :)
[06:57:53] so, I've really liked your docroot idea
[06:58:06] and I also like having an index.html, favicon.ico, robots.txt etc.
[06:58:11] New patchset: Tim Starling; "Don't customise $wgCortadoJarFile" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23036
[06:58:15] that's good
[06:58:27] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23036
[06:59:29] I haven't decided where to host that docroot yet obviously
[06:59:37] but we could probably do it on all appservers, right?
[06:59:56] you're pushing docroot regularly to them aiui
[07:00:03] so it'd just be another vhost
[07:01:02] doesn't work for me
[07:01:12] what isn't?
[07:01:14] "Not allowed http://upload.wikimedia.org/..."
[07:01:19] cortado can't read from upload
[07:02:10] try http://tty.gr/cortado.html
[07:02:39] not allowed
[07:03:07] you have Java/1.6.0_24 from what I can see
[07:03:20] could you tcpdump and see if you request crossdomain.xml?
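The "just another vhost" idea floated above could look roughly like this Apache fragment. This is purely illustrative: the ServerName and DocumentRoot paths are assumptions, not the configuration actually deployed.

```apache
# Hypothetical sketch: serving the small static docroot
# (crossdomain.xml, robots.txt, favicon.ico, index.html)
# from the appservers as one more virtual host.
<VirtualHost *:80>
    ServerName upload.wikimedia.org
    DocumentRoot /usr/local/apache/common/docroot/upload
    <Directory /usr/local/apache/common/docroot/upload>
        Options None
        AllowOverride None
    </Directory>
</VirtualHost>
```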
[07:03:30] works for me
[07:04:38] I have icedtea-6-plugin 1.2-2ubuntu1.2
[07:04:53] 26.709004 192.168.24.60 -> 178.32.250.226 HTTP GET /cortado.jar HTTP/1.1
[07:04:56] 27.037665 192.168.24.60 -> 91.198.174.234 HTTP GET /crossdomain.xml HTTP/1.1
[07:04:59] 27.488855 192.168.24.60 -> 91.198.174.234 HTTP GET /wikipedia/commons/0/0e/Merikoski_Dam_Oulu_20120810.OGG HTTP/1.1
[07:07:44] no, it doesn't fetch crossdomain.xml, confirmed with tcpdump
[07:10:14] it is definitely IcedTea
[07:10:20] yeah I'm on their bugtracker
[07:10:29] searching
[07:11:19] sigh
[07:12:06] New patchset: Tim Starling; "Revert "Don't customise $wgCortadoJarFile"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23037
[07:12:27] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23037
[07:14:02] your 99.9% figure probably still stands, but yeah, let's not be evil and break icedtea knowingly :)
[07:14:43] actually it seems to be broken with the old location too
[07:15:37] hm? broken how?
[07:16:24] New patchset: Dereckson; "(bug 39942) Disables patrol on fi.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22999
[07:16:28] "no such element: httpsrc (check plugins.ini)"
[07:18:22] httpsrc = ElementFactory.makeByName("httpsrc", "httpsrc");
[07:18:22] if (httpsrc == null) {
[07:18:22] noSuchElement ("httpsrc");
[07:18:39] New review: Dereckson; "Amending the config change to include new page patrol." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/22999
[07:22:09] so maybe it doesn't matter
[07:22:39] everyone who uses linux has a few different ways to play theora files, most notably the browser
[07:22:49] okay, a bit offtopic, but why are we using Java in the first place?
[07:22:53] instead of flash?
[07:23:24] open source
[07:23:46] and you can't play theora in flash
[07:23:57] probably the latter more than the former
[07:24:14] ha hm, apparently so
[07:24:25] I just googled that just after I asked
[07:25:13] cortado contains a complete port of ogg/theora/vorbis to java
[07:35:43] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[07:36:53] same here with icedtea plugin
[07:37:15] and that's with icedtea-7-plugin
[07:37:49] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[07:51:10] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/countries-100.log, /var/log/squid/countries-10.log, /var/log/squid/countries-1.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[08:14:07] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[08:14:07] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[08:37:43] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[08:37:43] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours
[09:08:43] New patchset: Dereckson; "(bug 37524) Change namespaces configuration - ku.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12556
[09:09:42] New review: Dereckson; "Rebased to master" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12556
[09:16:39] New patchset: Dereckson; "(bug 37524) Change namespaces configuration - ku.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12556
[09:22:09] New patchset: Dereckson; "(bug 37524) Change namespaces configuration - ku.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12556
[09:22:26] New review: Dereckson; "Added namespaces aliases." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12556
[11:07:03] !log Sending Canadian upload traffic to upload-lb.eqiad (Varnish)
[11:07:13] Logged the message, Master
[11:11:13] PROBLEM - Auth DNS on ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[11:12:34] RECOVERY - Auth DNS on ns1.wikimedia.org is OK: DNS OK: 0.023 seconds response time. www.wikipedia.org returns 208.80.154.225
[11:14:02] are you picking countries at random? :)
[11:14:08] no
[11:14:12] i'm picking small countries
[11:14:17] argentina was spanish speaking
[11:14:20] now an english speaking one
[11:14:25] and needs to be outside europe
[11:15:24] funny to read argentina & canada as small countries, but I know what you meant :)
[11:15:46] yeah, in terms of wikipedia traffic
[11:15:54] argentina was 1%, canada around 2% of our traffic iirc
[11:16:17] how do you get those numbers?
[11:20:12] stats.wikimedia.org
[11:20:48] amazing, I didn't know that
[11:20:52] I knew about reportcard but not that
[11:21:09] it has quite a lot
[11:24:18] are you planning to gradually switch all traffic?
[11:35:39] well, no
[11:35:41] did that last time
[11:36:02] and when I switched pretty much all traffic, noticed that all GigE links were at > 80% capacity ;-)
[11:36:10] 8 servers was plenty to handle the load in terms of varnish efficiency
[11:36:18] but forgot that with fewer servers, GigE links become a problem ;-)
[11:36:24] then added 8 more servers just to solve that problem
[11:36:36] but now I have 8 servers active again, the remaining 8 i'm gonna have moved and upgraded to 10G
[11:36:40] and then probably 8 is enough again
[11:36:50] but until then, these can't really handle the full traffic load
[11:36:54] but I want to seed the cache a bit
[11:37:02] and having a small mix of languages/countries active on it will help with that
[11:59:21] 10G? not bonds?
[11:59:33] like 2x1G?
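The link-capacity question above is easy to sanity-check with back-of-envelope arithmetic, using the figures already mentioned in the conversation (8 active servers, single GigE links at roughly 80% utilization):

```python
# Back-of-envelope: spread the same aggregate demand over
# single 1G links, 2x1G bonds, or 10G links per server.
servers = 8
demand_gbps = servers * 1.0 * 0.8                  # ~6.4 Gb/s aggregate

bonded_util = demand_gbps / (servers * 2 * 1.0)    # 2x1G per server
tengig_util = demand_gbps / (servers * 10.0)       # 10G per server

print(f"aggregate demand: {demand_gbps:.1f} Gb/s")
print(f"2x1G per-link utilization: {bonded_util:.0%}")   # 40%
print(f"10G  per-link utilization: {tengig_util:.0%}")   # 8%
```

Bonding halves the per-link load and adds NIC redundancy, while 10G leaves far more headroom per server; the trade-off discussed here is exactly headroom versus resiliency.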
[12:00:22] if we were at ~80%, then 2x1G would get us to 40% and better resiliency than a single network card
[12:11:11] morning
[12:12:24] apergos: morning, slowly drifting I see
[12:12:29] the reason I didn't really care about moving the pybal status check file from ms7 to swift is that retrieving it from ms7 was also a way to have the whole cluster fall over if ms7 was sad (but actually if ms7 dies then the upload squids are going to be unhappy anyways)
[12:12:34] no, I woke up at 4 am again
[12:12:49] just stubbornly stayed in the bed for an hour to see if I could sleep more. no dice
[12:13:14] I wonder if I'll go through the whole 2 weeks without any adjustment
[12:13:21] New patchset: Dereckson; "(bug 29902) Cleaning InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23059
[12:13:52] paravoid: no not bonds
[12:14:01] those servers can do a whole lot more than 2G
[12:14:01] today I'll finish the r510 setup and, assuming all goes well, I plan to add it into the pool of backends, it's temporary though because asher wants it back
[12:14:06] and the switches only have 48 ports
[12:14:16] apergos: then don't
[12:14:41] filling up a backend server takes more than a week
[12:14:42] we need to know it's going to work
[12:14:50] why wouldn't it work
[12:14:53] we know that these servers work
[12:15:02] stop checking whether wheels are round, they are
[12:15:04] have we had them in the pool? no
[12:15:07] the only "question" is whether we can see individual disks
[12:15:13] instead of raid
[12:15:16] we don't know how they are performance wise
[12:15:19] you can always make individual raid arrays, which sucks a bit, but does work
[12:15:24] they are very good performance wise
[12:15:29] they run our mysql core databases
[12:15:39] we don't know how they are performance wise for swift
[12:15:47] ben said no, asher said yes, I tend to believe asher but let's double-check. we don't need to fill it for that
[12:15:48] again, stop checking whether wheels are round
[12:16:38] well I already got told that not having ssds is enough of an issue that we could wind up returning 503s to people purging thumbs
[12:16:49] so yes I want to test
[12:17:38] feel free to, but please don't move around 20TB back and forth just for that...
[12:17:50] it won't be 20t
[12:17:59] this box has smaller disks and I intend to weight it lower
[12:20:43] I also don't see the point of performance testing it, and I see it even less if you use different disks and downweight it in traffic
[12:20:45] ah paravoid do you think you will be on the dell call (do you care about being on it)? I think it will mostly be them asking about how the system is set up
[12:21:37] I won't be, I have a couple of friends visiting Greece and we're going out for dinner
[12:21:45] ok no worries
[12:22:45] replacements aren't even gonna be R510s
[12:22:49] since those are end of sale
[12:22:57] they will be 520s I expect
[12:23:03] i expect they'll be 720XDs
[12:23:44] what's the difference in the specs?
[12:23:54] this is the first I've heard that we would put 720xds iin there [12:23:57] http://www.dell.com/us/enterprise/p/poweredge-r720xd/pd?~ck=anav [12:24:02] no it's not [12:24:05] yes, it is [12:24:06] i've said this multiple times already [12:24:12] we bought these in esams [12:24:15] they're racked there [12:24:53] you said you had these in esams, we've (or at least while I've been in the channel) never discussed it as a replacement for the c2100s [12:25:00] well DUH [12:25:06] and I've heard about the 720xds exactly once before this [12:25:29] so testing an R510 now, which has a rather different configuration, seems pretty pointless [12:26:33] fine, I'll give theh box back and undo my puppet changes [12:27:16] it would have been nice to know this was the plan though, I have been talking about this for a few days (in the channel) [12:27:24] i've said it several times [12:27:45] it's not "the plan", I don't think there is "a plan", but it's what makes sense to me [12:27:56] in any case it's not gonna be end of sale R510s [12:28:14] R720XD looks like pretty similar to C2100 except in the proper dell poweredge R line [12:28:17] no, I assumed it would be 520s [12:29:22] dell NL didn't want to sell the C2100 to us as they were having many issues and were going out [12:30:05] what a good idea. too bad dell usa didn't do the same [12:31:54] do you know if you got the h310 or the h710 for the ones in amsterdam? [12:32:12] what do you mean? [12:32:15] oh the raid controller [12:33:04] H310 [12:33:10] which they assured me, can do JBOD [12:33:13] but i haven't tested that yet [12:33:20] RT #1961 [12:34:06] uh huh [12:38:27] ok so if the backplane doesn't do the trick on ms-be6, what do we want next? 
(this assumes no great insights are gained from the dell phone call) [12:39:47] i have not been following that, so no idea [12:39:57] i think those systems are broken and need to be replaced [12:40:18] basically what I'm asking is: we will have tried the controller, the motherboard, the backplane, and yes I think they are broken [12:40:36] but I don't want this to turn into "dell ships us new bits or a new box, we deploy it, it still sucks" like dataset1 [12:40:50] no, a different box, not the same box [12:40:54] I want to get the broken stuff replaced by working stuff as soon as possible [12:42:30] having not done this with a vendor before, would we be in a position to lean on them to send us a 720xd and refund us for the c2100 for example? [12:43:11] what do you think? [12:43:25] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2563* [12:43:49] I think vendors don't like to send new gear, and they like to insist that the warranty means they will send a new box of the old type [12:43:57] that's why I'm asking for your input [12:44:11] I haven't spent really any time working with dell as a vendor [12:44:12] I think vendors like to keep getting more money [12:44:28] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [12:44:28] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [12:44:28] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [12:44:28] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [12:44:28] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [12:44:29] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [12:44:29] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [12:44:30] PROBLEM - 
Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [12:44:30] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [12:44:55] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2363 [12:52:50] New patchset: Dereckson; "Cleaning Berlin hackathon tutorial configuration." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23062 [13:01:14] apergos: I don't worry much about performance testing of boxes [13:01:27] we have the same traffic that was handled by two boxes spread out over twelve [13:01:41] I already said I would stop what I was doing, revert the puppet changes in the repo and give the box back [13:01:58] I'm not talking about this specific test but generally [13:03:22] in the specific case we would have the non ssd issue [13:03:36] yeah that's bad [13:03:58] which reminds me I was going to look to see if we are hitting any of those issues now [13:04:29] the summary is that thumb purges rely on container listings which if done on boxes where the containers are not on ssds can take longer than the 10 seconds we allow before timeout [13:05:10] if none of the three replicas have the copy on ssd, then (says aaron, which is what ben found I guess) the user will wait 30 seconds and then not actually get their thumbs purged [13:05:52] anyways I'll look at that today [13:07:12] is there a plan to remove the SSDs? [13:07:21] some boxes don't have ssds [13:07:40] and several with them are down or out of the rings [13:09:29] no new disks unreachable today it seems, that's good [13:15:21] two OLD boxes onto twelve NEW ones yeah ;) [13:17:57] the two boxes weren't handling all read traffic were they? I thought part of what we did recently was move a pile of read traffic over [13:18:40] were the 2 old boxes handling all thumb reads or something?
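(Editor's note: the container-listing timeout theory above could be sanity-checked by timing a listing through the swift proxy. This is only a sketch; the host, auth token, account, and container names below are placeholders, not the real cluster values.)

```shell
# Hypothetical check: time a thumb-container listing against the swift proxy.
# If this regularly approaches the 10-second proxy timeout, purges relying on
# the listing will fail. All names here are made up for illustration.
time curl -s -o /dev/null \
  -H "X-Auth-Token: $TOKEN" \
  "http://swift-proxy.example/v1/AUTH_example/wikipedia-commons-local-thumb.a0"
```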
[13:20:23] New patchset: Dereckson; "(bug 39942) Disables patrol on fi.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22999 [13:20:37] New review: Dereckson; "shellpolicy" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/22999 [13:24:55] how weren't they handling all read traffic? [13:26:21] ok which two boxes were these? [13:26:50] ms5 and ms7? [13:27:32] 12 + 4 proxies actually [13:29:14] ah those two boxes, but they were doing a lot less work than swift does [13:29:35] how? :) [13:30:43] one get via swift has a lot more associated with it than one get via nginx [13:31:15] that makes swift inefficient and my argument all the more true ;) [13:31:26] sure, it needs to replicate more, but even ms7 was replicating [13:38:48] hashar: you there? [13:38:58] notpeter: yup [13:39:14] hey, so I was looking at the labs imagescaler puppet configs [13:39:30] they look pretty (nearly completely) similar to the production ones, is this correct? [13:40:26] also, if I can get a test box up in production (but not live) by the tech days, would you be willing to help me test it?
[13:40:48] notpeter: yeah beta is almost using the same conf as production [13:40:55] imagescaler might have some specific ::labs classes though [13:41:15] it seems to have an nfs mount that prod doesn't, but that seems to be about it [13:42:14] also the imagescaler class is applied on apache boxes IIRC [13:42:26] yeah [13:42:33] it's sorta just an extension of the apache stuff [13:44:30] so the 'beta' Apache boxes are applying imagescaler::labs [13:44:31] but it does not seem to be there anymore [13:44:55] imagescaler::labs is in site.pp [13:45:05] (which is confusing, I admit) [13:45:20] (we're trying to get rid of having those role classes in site.pp) [13:47:48] bah [13:47:57] I forgot my labs ssh key password [13:48:13] doh [13:58:44] notpeter: so I re-ran puppet manually on the Apache boxes [13:58:49] seems the class works [14:05:41] woosters: added canadian traffic today [14:06:06] cool [14:06:23] and if something fails you know what to do [14:06:31] "blame Canada" [14:07:24] u think it has more traffic than Argentina? [14:07:46] can see the traffic surging 50% ahead already [14:08:36] hmm double actually [14:08:42] yes, quite a bit more [14:10:41] guess u could turn it up on Monday when u are here [14:10:59] can't do full traffic anyway [14:11:21] so it can simmer like this until rob is back at eqiad and we can upgrade network links [14:12:01] you are not using the other 8 then [14:12:08] indeed [14:12:23] don't you just love Varnish [14:12:27] quite some saving [14:12:36] this hw is optimized for squid [14:12:46] we could do even less [14:13:16] what do you mean optimized for squid?
[14:14:35] not a lot of cpu, squid does (almost) no multithreading anyway [14:14:38] and is quite inefficient [14:14:49] so with varnish we can do multiple cpus and many cores [14:14:56] and get a lot more out of a single box [14:15:05] and thus more memory too [14:15:51] hehe [14:16:17] it's also why they don't have 10G ;) [14:16:29] new varnish boxes we do generally buy with 10G [14:27:00] New patchset: Hashar; "meta favicon as symbolic links" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23074 [14:39:57] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [14:44:41] ok so the good news is that I am not seeing a bunch of purge related 503s in the swift logs afaict [14:45:11] gonna check the rings for the containers and see how they are distributed [14:45:24] later, cause I should go into the office instead of typing from my hotel room [14:46:47] in other good news, Monday's rebalance seems to be nearing completion [14:46:56] yay for that [14:47:08] I'm going to push another rebalance on Monday to increase the weight from 66->100 [14:47:14] on two (out of three) of those boxes [14:47:22] ok [14:47:25] I'm going to keep ms-be12 at 66, since we need to empty that at some point [14:47:28] to change its zone [14:50:04] New patchset: Hashar; "migrate favicon from upload to bits" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23077 [15:00:29] mark: can you verify that mrjp-c2-sdtpa port 14 on LC11 is enabled plz [15:01:25] it's down but not disabled [15:01:49] cmjohnson1: did you see the ticket about professor? [15:01:58] paravoid: that is what I am working on now.
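(Editor's note: the weight bump described above would look roughly like the following with swift-ring-builder. This is a sketch only; the builder file name and device IDs are placeholders for the real ring entries.)

```shell
# Sketch of the planned rebalance: raise two devices from weight 66 to 100
# so the ring gradually shifts partitions onto them. Device IDs d10/d11 and
# the builder file name are hypothetical.
swift-ring-builder object.builder set_weight d10 100
swift-ring-builder object.builder set_weight d11 100
# ms-be12's devices stay at 66, since that box must later be emptied
# to change its zone.
swift-ring-builder object.builder rebalance
# then distribute the regenerated object.ring.gz to the proxies and
# storage nodes
```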
[15:02:04] apparently the port is down [15:02:14] ah okay :) [15:02:36] not really involved with that, I just happened to be around when Tim was asking for remote hands [15:03:07] mark: please enable for me [15:03:16] it is enabled [15:03:20] i see its done ..thx [15:03:23] it wasn't disabled [15:03:25] i didn't do anything [15:03:50] flaky cable? [15:03:52] New review: Tpt; "It's not needed. Default will de set by the ProofreadPage extension at 250 for Page and 252 for Inde..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/20876 [15:04:10] mark: it was a flaky cable [15:04:21] probably one from when the site was built [15:04:46] sdtpa? i think that was also stayonline back then ;) [15:05:10] that's pretty recent you know ;-) [15:07:29] !log moving en search traffic from pmtpa to eqiad [15:07:39] Logged the message, notpeter [15:08:27] RECOVERY - Host professor is UP: PING OK - Packet loss = 0%, RTA = 1.49 ms [15:09:06] New patchset: Pyoungmeister; "lucene.php: moving pool1 (en) traffic to eqiad" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23080 [15:13:31] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23080 [15:14:12] New review: Hashar; "They are different scripts :-/ We used to have all scripts in /home/wikipedia/bin, then they got m..." [operations/apache-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23033 [15:14:19] bah [15:14:25] sync-apache has two different versions :( [15:16:47] ugh hashar [15:17:08] jeremyb: that is messy :-) [15:17:51] hashar: so it's unversioned then? 
[15:17:57] !log moving pool2,3,4 and prefix search traffic to eqiad [15:18:06] Logged the message, notpeter [15:19:12] jeremyb: /home/wikipedia/bin is full of old scripts [15:19:30] some of them being in the same state they were before being migrated to puppet :/ [15:20:10] * jeremyb closes his ears [15:20:23] * hashar hides his eyes [15:20:46] New patchset: Pyoungmeister; "lucene.php: moving all search traffic to eqiad" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23081 [15:21:07] OHHHH [15:21:14] that is an apache-config [15:21:15] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23081 [15:22:32] New review: Hashar; "The issue is caused by I6f2686844d916ffc6af95d9917193df0b57f5923 (by myself)." [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/23033 [15:22:39] jeremyb: good to me now :-) [15:23:13] New review: Hashar; "Path changed to the correct '/usr/local/bin' with If7a71a57" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15440 [15:23:36] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active [15:23:41] jeremyb: we still need to phase out the /h/w/bin scripts though [15:24:15] hashar: shouldn't be too hard. can chat about it later (e.g. tomorrow or monday) [15:24:38] not there this weekend [15:24:45] and I am flying to SF on monday [15:25:01] I might raise the issue on the op list [15:25:26] New patchset: Ottomata; "filters.emery.erb - udp-filter no longer uses the -f flag" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23082 [15:25:48] hashar: i was thinking have a single wrapper script for all of /home/w/b and symlink to it as needed [15:26:08] hashar: it detects what it needs to call in /usr/local/bin based on symlink's name [15:26:31] hashar: and could even send some stats to graphite before forwarding to /usr/local/bin ;-P [15:26:43] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23082 [15:27:02] jeremyb: you got me at the 'graphite' part [15:27:04] anyone around to review that? [15:27:07] you are definitely evil :-]]]]]]] [15:27:16] PROBLEM - Varnish traffic logger on cp1022 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:27:16] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:27:16] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:27:16] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:27:16] PROBLEM - Varnish traffic logger on cp1042 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:27:26] hashar: /me ? why of course [15:27:50] jeremyb: the symlinking is a good idea [15:27:52] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23082 [15:28:00] ottomata: done [15:28:01] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:28:01] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:28:01] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:28:06] thank youuuu! 
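(Editor's note: the wrapper-plus-symlinks idea for /home/wikipedia/bin discussed above could be sketched as a single dispatcher function. The `DISPATCH_DIR` override is an invention here so the target directory, /usr/local/bin in the discussion, can be substituted for testing; the graphite/stats hook is left out.)

```shell
# One dispatcher for all legacy scripts: each old script name becomes a
# symlink to a tiny wrapper that calls dispatch "$0" "$@". The dispatcher
# forwards to the script of the same name under the real bin directory.
dispatch() {
    name=$(basename "$1"); shift            # $1 is the invoked path, e.g. "$0"
    dir="${DISPATCH_DIR:-/usr/local/bin}"   # override for testing
    if [ ! -x "$dir/$name" ]; then
        echo "$name: no matching script in $dir" >&2
        return 127
    fi
    # a stats/graphite hook could be inserted here before forwarding
    "$dir/$name" "$@"
}
```

Installed for real, each symlink's target would just be a two-line wrapper running `dispatch "$0" "$@"`.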
[15:28:15] fresh air, brb [15:28:20] hashar: /me too [15:28:46] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa [15:30:16] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [15:30:43] PROBLEM - Puppet freshness on oxygen is CRITICAL: Puppet has not run in the last 10 hours [15:38:58] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 3 processes with command name varnishncsa [15:43:55] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 3 processes with command name varnishncsa [15:45:34] RECOVERY - Varnish traffic logger on cp1042 is OK: PROCS OK: 3 processes with command name varnishncsa [15:48:34] RECOVERY - Varnish traffic logger on cp1022 is OK: PROCS OK: 3 processes with command name varnishncsa [15:49:19] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa [15:54:16] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [15:57:27] New patchset: Hashar; "Deploying OSB to beta" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22172 [15:58:03] New patchset: Hashar; "Deploying OSB to beta" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22172 [16:01:21] New review: Hashar; "I am not sure what happened with the previous patchset. Anyway, this is fine." 
[operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22172 [16:01:21] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22172 [16:09:18] New patchset: Dereckson; "(bug 39432) Enable Narayam on ka.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23085 [16:10:53] New patchset: Dereckson; "(bug 39432) Enable Narayam on ka.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23085 [16:11:23] New review: Dereckson; "PS2: removed trailing space" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23085 [16:15:41] New patchset: Pyoungmeister; "disabling lucene loggin fro prefix hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23086 [16:16:33] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/23086 [16:17:29] New patchset: Pyoungmeister; "disabling lucene loggin fro prefix hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23086 [16:18:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23086 [16:21:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23086 [16:23:17] New patchset: Pyoungmeister; "log4j for lucene: fixing typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23087 [16:24:17] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23087 [16:24:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23087 [16:43:58] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [16:43:58] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours [16:43:58] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [16:49:58] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [17:15:56] notpeter: i am looking into search 32 and despite the sys event log saying there is bad dimm everything else looks normal [17:16:19] i checked the avail mem and it shows [17:16:20] total used free shared buffers cached [17:16:20] Mem: 96675 6056 90618 0 102 1884 [17:16:20] -/+ buffers/cache: 4070 92604 [17:16:20] Swap: 951 0 951 [17:20:01] cmjohnson1: huh, weird [17:20:10] it seems to be staying on this time, though.... [17:20:54] it does… thought about reseating the dimm that shows an error but… it is working [17:21:39] heh, yes [17:21:48] let's not mess with this house of cards [17:22:36] sounds good to me [17:23:01] I guess feel free to close the ticket, as well. can always reopen if/when it dies again [17:24:05] right… it is not like we have done that 3-4 times before ;] [17:25:52] hehehe, yes indeed [17:41:37] Change abandoned: RobH; "(no reason)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23022 [17:53:40] New review: Dereckson; "shellpolicy" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/23076 [18:00:47] New patchset: Ottomata; "Adding --settings flag to gerrit stats cronjob" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23093 [18:01:40] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23093 [18:03:06] heyo, mergey wergey? [18:03:08] notpeter? [18:14:54] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [18:14:54] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [18:20:38] sup [18:21:01] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23093 [18:21:04] sure, sounds great [18:22:47] danke [18:23:37] New patchset: Dereckson; "(bug 39264) Add Tudalen: and Indecs: namespaces to cy.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23094 [18:24:42] New review: Dereckson; "shellpolicy" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/23094 [18:38:54] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [18:38:54] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [19:00:53] New patchset: Asher; "moving carbon-collector pidfile somewhere ephemeral to prevent a stale file post-server crash" [operations/software] (master) - https://gerrit.wikimedia.org/r/23099 [19:29:17] New review: Helder.wiki; "Reedy, do you know if I missed something? Or is there a bug preventing this from working as desired?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21475 [19:38:31] http://www.reddit.com/r/unitedkingdom/comments/zhut0/dont_wait_for_the_uk_snoopers_charter_to_pass/ [19:49:01] New patchset: Dereckson; "(bug 38840) Namespaces configuration on uz.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23102 [19:57:44] robh: after changing out the mobo… do i need to install OS again?
[19:57:59] usually yep [19:58:34] ok [19:59:01] !log loading new OS on mw8 after mother board swap [19:59:10] Logged the message, Master [20:15:31] robh: [20:23:48] New patchset: Pyoungmeister; "all mw, srv and bast hosts now use mw.cfg. simplified." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23106 [20:24:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23106 [20:24:52] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23106 [20:26:25] cmjohnson1: you should be good to go now [20:26:39] notpeter: thx for fixing that [20:26:45] yep, no prob [20:35:03] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms [20:43:00] RECOVERY - SSH on mw8 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [20:44:56] New patchset: Ottomata; "Rsyncing lucene logs over to dataset2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23107 [20:45:48] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/23107 [20:46:18] oo [20:46:56] New patchset: Ottomata; "Rsyncing lucene logs over to dataset2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23107 [20:47:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23107 [20:48:23] zats better [20:48:27] notpeter could you do me a kindness? 
[20:54:05] ack cafe closing, gotta run [21:04:40] PROBLEM - NTP on mw8 is CRITICAL: NTP CRITICAL: No response from NTP server [21:07:17] Change restored: RobH; "(no reason)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23022 [21:08:46] New patchset: RobH; "added in redirect for wicipediacymraeg.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23022 [21:11:33] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/23022 [21:20:34] PROBLEM - Puppet freshness on mw8 is CRITICAL: Puppet has not run in the last 10 hours [21:25:33] paravoid: I tagged you for a mw revision [21:25:56] hm? [21:26:01] gerrit? [21:26:14] ah [21:26:39] sorry for the delay, I am still dealing with dell [21:28:38] AaronSchulz: want to explain a bit more? I'm not familiar with your code :) [21:28:44] is that for the shorted thumb URLs? [21:31:03] yes [21:31:29] this patch just redirects requests for the long style names [21:34:05] did you see notmyname's patch? [21:35:12] New patchset: MaxSem; "Bug 39919 - weekly feeds for frwikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23112 [21:36:00] paravoid: yes, what about it? [21:36:12] New review: MaxSem; "Requires a FF commit to be merged first." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/23112 [21:39:34] !log pulled mw8 out of dsh groups since its down [21:39:35] New review: Dereckson; "Config ok. Comments and commit message could be improved." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23112 [21:39:44] Logged the message, RobH [21:42:04] !log For reference, the dsh groups Rob removed mw8 from are mediawiki-installation and apaches [21:42:14] Logged the message, Mr. 
Obvious [21:43:16] !log for more reference, mw8 is down for DIMM errors (RT-3499) [21:43:25] Logged the message, Master [21:46:38] RobH: apache-graceful-all logging should be fixed now [21:46:47] Also [21:46:51] Let me actually make it !log [21:47:30] Previously it would just output "robh has gracefulled all apaches" into #wikimedia-tech without prefixing it with !log [21:49:40] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [21:50:52] RECOVERY - swift-account-replicator on ms-be6 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [21:51:09] that's a lie, it's only up so I can test with the latest mpt2sas driver [21:51:29] New review: MaxSem; "> No local consensus for this solution (please get a local consensus first)." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/23112 [22:01:18] New review: Dereckson; "I only see at https://fr.wikisource.org/wiki/Wikisource:Scriptorium#Flux_des_.22textes_de_la_semaine..." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23112 [22:22:19] !log creating new gerrit project for the wikibugs IRC bot as a .deb (operations/debs/wikibugs) [22:22:28] Logged the message, Master [22:36:39] PROBLEM - Host ms-be6 is DOWN: PING CRITICAL - Packet loss = 100% [22:39:03] RECOVERY - Host ms-be6 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [22:45:30] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [22:45:30] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [22:45:30] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [22:45:30] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [22:45:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [22:45:31] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [22:45:31] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [22:45:32] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [22:45:32] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [22:50:42] binasher: can you osc drop fj_path_sha1 from the filejournal tables? [22:50:53] another todo list item I guess [22:53:12] AaronSchulz: what's fj_new_sha1 ? [22:53:52] and is that droppable now? [22:53:55] the hash of the file that something attempted to store there [22:54:10] only fj_path_sha1 is droppable now [22:55:45] ok, added it to http://wikitech.wikimedia.org/view/Schema_changes#To_be_scheduled [22:56:15] maybe i'll run all of the pending ones during the allstaff [22:57:17] i'll just *have* to have my laptop open to keep an eye on them, damn! [23:02:05] New patchset: Andrew Bogott; "Update wiki instance-status pages automatically." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/23155 [23:02:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23155 [23:16:00] binasher: where can i find redirector.c? i see the redirector binary in /puppet/files/squid/redirector but can't find the .c [23:16:46] it's with the squid pkg source [23:16:48] hold on [23:17:27] awjr: the operations/debs/squid repo [23:20:27] thanks binasher [23:21:15] binasher is there a bug or rt ticket open about the issue? [23:21:50] not that i know of, i think erik just emailed patrick from the offsite [23:36:13] binasher: you should ask about mem use in #swiftstack [23:36:31] bah, having weird issues compiling redirector.c to test - redirector.c:(.text.startup+0xa9): undefined reference to `pcre_compile' [23:36:59] AaronSchulz: yeah.. although, i wouldn't be surprised if the stock answer was "upgrade to 1.7.1" [23:38:38] awjr: -lpcre ? [23:38:55] binasher yeah im doing gcc -O3 -o redirector -lpcre redirector.c [23:46:46] awjr: does the same compile the version in master ok on your test system? [23:47:36] binasher: no [23:49:15] binasher: i was able to compile elsewhere; must be something peculiar to my test box. [23:51:46] New patchset: awjrichards; "Disable redirect to officewiki" [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/23161 [23:52:03] binasher: https://gerrit.wikimedia.org/r/#/c/23161/ [23:52:27] whoops need to amend [23:53:18] New patchset: awjrichards; "Disable redirect to officewiki" [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/23161 [23:53:36] binasher ^ [23:55:30] Change merged: Asher; [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/23161 [23:56:02] hm, i'd rather not deploy to the squids right now though [23:56:07] monday morning i guess [23:57:28] binasher: for sure.
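(Editor's note: the `undefined reference to pcre_compile` failure above is consistent with link-order sensitivity. GNU ld resolves libraries left to right, and toolchains that default to `--as-needed` drop a `-lpcre` that appears before any object referencing it. Reordering the same command, assuming libpcre's dev files are installed, is the usual fix; the box where it "compiled elsewhere" plausibly had an older, more forgiving toolchain.)

```shell
# Libraries go after the objects that reference them:
gcc -O3 -o redirector redirector.c -lpcre
```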