[00:03:53] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 00:03:48 UTC 2013 [00:04:43] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:27:53] (03PS1) 10Springle: db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 [00:29:17] (03PS2) 10Springle: db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 [00:30:30] mwalker, why does CN set mediaWiki.user.sessionId cookie? [00:31:10] (03CR) 10Springle: [C: 032] db1018 to precise, mariadb, innodb_file_per_table [operations/puppet] - 10https://gerrit.wikimedia.org/r/85943 (owner: 10Springle) [00:31:23] the only reason I know of is, if, when connection to meta the requesting user does not have a SUL session, MW will autocreate one [00:32:04] it probably also does it when you view a Special:CentralNotice page that uses HTMLForms [00:32:15] I observe it being created from CN JS [00:32:29] interesting; can you explain further? [00:33:05] like what file/line? [00:33:53] because a simple case insensitive grep on my resource loader modules doesn't come up with anything [00:35:35] mwalker, my bad, in addition to CN there was an ULS in that RL blob:( [00:35:37] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 00:35:32 UTC 2013 [00:35:37] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [00:36:20] it's ULS [00:36:37] hehe, 'an ULS' [00:36:58] they might be multiplying! [00:37:08] off to hang out with the alots [00:37:19] me looks in Varnish t o see if it results in cache bypass [00:42:25] I'll poke mark tomorrow about it [00:55:44] !log started xtrabackup clone db1002 to db1018 [00:55:58] Logged the message, Master [00:58:51] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [01:03:26] (03PS1) 10Legoktm: Enable MassMessage extension on test2.wikipedia.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 [01:03:29] (03PS11) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:10:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:13:36] PROBLEM - Varnish HTTP upload-frontend on cp1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:49] Can someone fix wikibugs? [01:13:56] It should still be connected. [01:14:02] But not joined to #mediawiki. 
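A minimal sketch of the case-insensitive grep mwalker mentions above, widened from one extension's ResourceLoader modules to a whole extensions checkout, which might have pointed at the ULS/EventLogging code path sooner; the ~/src/extensions path is only an assumed local clone used for illustration:

# Search all client-side JS for code that reads or writes mw.user.sessionId.
# ~/src/extensions is an assumed local checkout of the extension repositories.
grep -rni --include='*.js' -e 'user\.sessionId' -e 'sessionId' ~/src/extensions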
[01:14:36] RECOVERY - Varnish HTTP upload-frontend on cp1064 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.907 second response time [01:14:42] (03PS12) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:17:32] (03PS13) 10Yuvipanda: Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 [01:17:36] PROBLEM - Varnish HTTP upload-frontend on cp1051 is CRITICAL: HTTP CRITICAL - No data received from host [01:19:36] RECOVERY - Varnish HTTP upload-frontend on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 3.131 second response time [01:20:10] (03CR) 10Ryan Lane: [C: 032] Add labsvagrant module [operations/puppet] - 10https://gerrit.wikimedia.org/r/85814 (owner: 10Yuvipanda) [01:20:24] wooho, thanks Ryan_Lane [01:20:36] yw [01:20:40] hm [01:20:45] it didn't go all the way through yet [01:21:19] must be multiple api calls ;) [01:23:36] PROBLEM - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.016 second response time [01:24:36] RECOVERY - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 519 bytes in 2.222 second response time [01:25:29] (03PS1) 10Yuvipanda: Add labsvagrant role [operations/puppet] - 10https://gerrit.wikimedia.org/r/85946 [01:25:30] bah [01:25:33] hop online says ok. [01:25:40] bah i say! [01:26:06] PROBLEM - Varnish HTTP upload-frontend on cp1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:26:56] RECOVERY - Varnish HTTP upload-frontend on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 0.589 second response time [01:30:20] (03CR) 10Ryan Lane: [C: 032] Add labsvagrant role [operations/puppet] - 10https://gerrit.wikimedia.org/r/85946 (owner: 10Yuvipanda) [01:30:36] PROBLEM - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.986 second response time [01:31:36] No ottoman? [01:31:40] Ottomata. [01:32:36] RECOVERY - LVS HTTPS IPv4 on upload-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 517 bytes in 5.233 second response time [01:34:56] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 01:34:50 UTC 2013 [01:35:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:37:35] RobH / Ryan_Lane: either of you have access to the box wikibugs sits on (machenry iirc) [01:37:37] ? [01:37:50] (03PS3) 10Yuvipanda: Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 [01:40:13] !log restarted wikibugs [01:40:26] Logged the message, Master [01:40:30] Thanks. [01:43:33] p858snake|l: everyone on the ops team does [01:44:06] Ryan_Lane: I know, that was more of a "is anyone around to do it" question [01:45:44] to do what? [01:45:55] I'm on a plane, making changes isn't likely a great idea [01:46:05] Ryan_Lane: to restart wikibugs (but tim did it already) [01:46:23] ah, ok [01:46:26] (03PS4) 10Yuvipanda: Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 [01:47:13] if we have learn't anything from domas, that making changes on a plane trip is the perfect place! [01:47:43] I miss Domas. [01:47:52] And River. [01:47:56] (03CR) 10Ryan Lane: [C: 032] Add labsbastion role [operations/puppet] - 10https://gerrit.wikimedia.org/r/84927 (owner: 10Yuvipanda) [01:48:04] And Jens. 
[01:48:04] woohoo, ty Ryan_Lane [01:48:26] yw [01:48:39] p858snake|l: well, I'm reviewing and merging code ;) [02:00:46] PROBLEM - Varnish HTTP upload-frontend on cp1062 is CRITICAL: HTTP CRITICAL - No data received from host [02:01:46] RECOVERY - Varnish HTTP upload-frontend on cp1062 is OK: HTTP OK: HTTP/1.1 200 OK - 230 bytes in 4.828 second response time [02:05:16] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 02:05:14 UTC 2013 [02:06:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:07:36] PROBLEM - Varnish HTTP upload-frontend on cp1051 is CRITICAL: HTTP CRITICAL - No data received from host [02:08:36] RECOVERY - Varnish HTTP upload-frontend on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 3.186 second response time [02:10:56] PROBLEM - LVS HTTPS IPv6 on upload-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:11:11] upload.wikimedia.org is occasionally throwing a 502 Bad Gateway error. [02:11:46] RECOVERY - LVS HTTPS IPv6 on upload-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 515 bytes in 0.014 second response time [02:11:50] yeah, meta too [02:11:58] clearly something going on [02:12:04] HTTPS? [02:12:08] https://bugzilla.wikimedia.org/show_bug.cgi?id=50891 [02:12:52] it is on https… though to be fair I'm always on https now so not sure if it's the same thing or different [02:13:00] !log LocalisationUpdate completed (1.22wmf18) at Wed Sep 25 02:13:00 UTC 2013 [02:13:05] hmmm [02:13:08] it could have been that too [02:13:17] Logged the message, Master [02:14:51] (03CR) 10MZMcBride: "I believe you also need to update wmf-config/extension-list." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 (owner: 10Legoktm) [02:17:20] loading pages on meta is giving me a ton of time outs from upload too [02:17:34] (I'm assuming that would happen everywhere) [02:17:42] * Jamesofur tries on http [02:18:36] errr [02:18:42] is meta on https only for anon now too? [02:18:55] * Jamesofur clearly missed that memo [02:19:05] Elsie: hm, yeah i do [02:19:43] Jamesofur: You're logged in. [02:19:47] So presumably you're being redirected. [02:19:53] Because otherwise it would be annoying. [02:20:03] I logged out …. and still being redirected [02:20:15] (I even manually went to Special:logout just in case) [02:20:42] (03PS2) 10Legoktm: Enable MassMessage extension on test2.wikipedia.org [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85944 [02:20:46] PROBLEM - Varnish HTTP upload-frontend on cp1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:20:46] Dunno. That bug should probably be re-opened. [02:21:17] I think it's that varnish error ^ [02:21:25] there were a couple more before I joined it seems looking at the logs [02:21:36] I have the bot on ignroe. [02:21:37] ignore [02:21:47] fair enough [02:22:36] RECOVERY - Varnish HTTP upload-frontend on cp1050 is OK: HTTP OK: HTTP/1.1 200 OK - 229 bytes in 0.387 second response time [02:23:16] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [02:25:31] !log LocalisationUpdate completed (1.22wmf17) at Wed Sep 25 02:25:30 UTC 2013 [02:25:43] Logged the message, Master [02:26:31] one of my extensions must be causing the redirect, don't have it on incognito (though I don't have SSLEverywhere on this one on purpose oh well). still seem to be having some of the issues on http though [02:28:21] You're getting errors on HTTP? 
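The intermittent bad gateways being discussed above are easy to sample from outside; a rough sketch (not a command anyone in the channel actually ran), using an arbitrary well-known file URL, to check whether the 502s really are HTTPS-only:

# Sample upload.wikimedia.org over HTTPS and HTTP and tally the response codes.
# The file path is illustrative; any cached upload URL would do. A meta.wikimedia.org
# page URL could be substituted to cover the "meta too" report above.
for i in $(seq 1 50); do
  for proto in https http; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      "$proto://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png")
    echo "$proto $code"
  done
  sleep 1
done | sort | uniq -c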
[02:32:37] Hi [02:32:39] I'm here [02:35:16] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 02:35:14 UTC 2013 [02:35:55] Elsie: I'm seeing some upload time outs at least [02:36:03] no meta time out yet [02:36:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:36:18] s/time out/bad gateway [02:45:35] I take that back, the stuff I'm seeing on http is different [02:45:40] the bad gateways are only on https so far [02:45:44] * Jamesofur reopens bug for now [02:47:11] ahh you beat me to it [02:48:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 25 02:48:19 UTC 2013 [02:48:32] Logged the message, Master [03:04:06] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 03:03:59 UTC 2013 [03:04:16] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:33:51] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 03:33:46 UTC 2013 [03:34:11] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [03:45:56] (03PS1) 10Springle: repool db1018 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85951 [03:46:54] (03CR) 10Springle: [C: 032] repool db1018 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85951 (owner: 10Springle) [03:47:52] !log springle synchronized wmf-config/db-eqiad.php 'repool db1018' [03:48:04] Logged the message, Master [04:08:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:19:43] something funky going on with site performance: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.rendering.%28desktop%7Cmobile%29_median%24>ype=line&title=Rendering%3A+responseEnd+to+loadEventEnd&aggregate=1 [04:19:56] weekly view also revealing: http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.rendering.%28desktop%7Cmobile%29_median%24>ype=line&title=Rendering%3A+responseEnd+to+loadEventEnd&aggregate=1 [04:20:17] ditto http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&hreg[]=client-side&mreg[]=%5Ebrowser.loading.%28desktop%7Cmobile%29_median%24>ype=line&title=Loading%3A+navStart+to+loadEventStart&aggregate=1 [04:20:48] slowness reports on en.wp: https://en.wikipedia.org/wiki/Wikipedia:VPT#Is_preview_slower_than_usual.3F [04:21:26] TimStarling, ping [04:34:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 04:34:01 UTC 2013 [04:34:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:02:59] PROBLEM - Puppet freshness on db1033 is CRITICAL: No successful Puppet run in the last 10 hours [05:04:55] looking [05:05:36] !log tstarling cleared profiling data [05:05:50] Logged the message, Master [05:06:35] may have resolved itself meanwhile, there was a follow-up on VPT saying the slowness went away, and the graphs bear that out [05:07:14] graph at http://status.wikimedia.org/8777/131241/Images-&-media suspicious too [05:09:40] (03PS1) 10Springle: depool db1035 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85954 [05:11:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:11:52] (03CR) 10Springle: [C: 032] depool db1035 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85954 (owner: 10Springle) 
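Regarding MZMcBride's review note above about wmf-config/extension-list: a hedged sketch of the places a change like Gerrit 85944 usually has to touch; the wmgUseMassMessage switch name simply follows the usual wmgUse<Extension> convention and is an assumption here, not taken from the change itself:

# Check the usual touchpoints when enabling an extension in operations/mediawiki-config.
cd operations/mediawiki-config
grep -n 'MassMessage' wmf-config/extension-list
grep -n 'wmgUseMassMessage' wmf-config/InitialiseSettings.php wmf-config/CommonSettings.php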
[05:12:08] so do you know what was slow exactly? [05:12:59] !log springle synchronized wmf-config/db-eqiad.php 'depool db1035 for upgrade' [05:13:11] Logged the message, Master [05:13:39] TimStarling, I do see users complaining about image loads specifically, but also slow previews/saves. And the usual (frequent) complaints about bits.wm.o being slow. [05:14:01] maybe just an external network issue then [05:15:51] actually, this is a bit suspicious: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Upload+caches+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [05:16:06] that looks like a plateau doesn't it? [05:16:37] or a boa constrictor digesting an elephant [05:16:59] on the weekly: http://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=network_report&s=by+name&c=Upload+caches+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [05:17:06] you can see that w've exceeded this traffic before [05:17:21] so it may be that we are throttled by some upstream network event [05:18:05] ah, i see the reasoning, makes sense [05:18:31] well, you know that a plateau is indicative of an overload [05:18:42] so if it is a plateau, that is a bad thing [05:19:07] you get a plateau when the normal traffic curve is truncated by capacity constraints [05:21:14] the observium data does not look normal [05:21:22] and if the limit was regularly exceeded in the past than the constraint is external to the system [05:21:30] what's observium? [05:22:13] https://observium.wikimedia.org/ [05:23:14] oh, i don't have access to that [05:23:53] 25/Sep/13 05:06:46 Device status changed to Down [05:24:00] on csw1-sdtpa [05:27:07] also there was an event at 20:00 [05:27:54] can you send a text to leslie? it's not too late there is it? [05:28:30] 10:30pm, i'll grab my phone [05:28:34] this hasn't already been discussed with leslie has it? [05:31:00] not that I've seen (just trying to catch up while doing other things right now) [05:32:01] I'll send a message, if you haven't already [05:33:07] i just did [05:33:16] sorry, had to get my phone from the other room [05:34:00] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 05:33:55 UTC 2013 [05:34:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:36:25] observium is a terrible piece of software [05:36:31] it takes forever to find anything in it [05:39:12] like showing 6000 separate traffic graphs [05:40:19] or if you go to cr1-eqiad, it shows 727 separate traffic graphs, in little thumbnails, all with a different scale so that you can't tell where the traffic is going at a glance [05:40:22] what about torrus? google cache suggests it recently had content about that router but it errors out for me [05:40:37] i didn't know about that subdomain either [05:41:15] there was never much data in torrus [05:41:23] at least observium does actually have the graphs [05:43:30] anyway, if/when Leslie comes online, I just want her to know that there were reports of slowness, ganglia data is suspicious, and observium shows a rerouting of some kind at 18:00 that may be related [05:44:08] ^^ LeslieCarr [05:44:29] I'll flag it if she shows up and I'm still around [05:44:36] but an e-mail might be warranted [05:45:18] I can just copy/paste the last 30 minutes or so into an email if you're running off [05:47:49] I'm not going anywhere [05:49:59] TimStarling: 18 UTC? [05:50:05] when you say 18:00 [05:50:14] that's a long time ago. 
(~12 hrs) [05:50:32] actually it was closer to 20:00 [05:50:36] and yes, it was a long time ago [05:50:48] that's why I asked if it has already been discussed [05:50:51] k. just wanted to be sure i understood :) [05:51:26] TimStarling: 24 19:18:42 < LeslieCarr> yay all uplinks are 40g now [05:51:34] no idea if that's relevant [05:51:40] (that's UTC) [05:54:49] yes, it would have been around that time [05:59:13] (03CR) 10Physikerwelt: "Mh..." [operations/puppet] - 10https://gerrit.wikimedia.org/r/61767 (owner: 10Physikerwelt) [06:02:01] yeah, look, here is the saturation: https://observium.wikimedia.org/graphs/to=1380088901/id=134/type=port_bits/from=1380002501/ [06:02:21] plain as day [06:08:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:09:57] what does it look like? [06:10:17] I'll attach it to my mail which I'm writing [06:10:31] this is non-urgent, we have until ~18:00 tomorrow to fix this [06:11:06] 2am here, g'night, thanks for poking [06:11:29] good night [06:34:12] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 06:34:04 UTC 2013 [06:34:52] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:11:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:18:47] (03PS1) 10TTO: Temporary celebration logo for tawiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 [07:34:40] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 07:34:30 UTC 2013 [07:34:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:21] (03CR) 10Jeremyb: [C: 04-1] "the file doesn't appear to be protected. (although I can't read the language so who knows)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 (owner: 10TTO) [08:12:25] hi, any ops around to discuss ESI deployment? [08:13:16] yurik: i think it's reasonable to expect that all besides tim and maybe mark are asleep [08:14:00] jeremyb, true that, but mark should be the best person for that i would think :) [08:14:11] for some reason i don't see him online :( [08:14:11] but he's not here [08:14:21] 6JTAABPF7: ^^ [08:14:39] is that his secret random nick? :) [08:15:00] yes! or maybe just the result of a split? [08:15:35] * yurik now knows mark's secret identity! Superman bewarned! [08:25:01] yurik: I would recommend using engineering list [08:25:10] yurik: Gabriel Wicke (in SF) would be interested probably. [08:25:26] yurik: Gabriel wrote some ESI support back in the old day for squid 3, though it never get used. [08:25:53] hashar, i just sent my proposal to ops & wikitech-l [08:26:02] yurik: and from a discussion I had with him during the all-staff, he is still interested in it. An evil plan would be to use yet another layer of cache and have the frontend cache include text from that "new" backend cache. [08:26:07] thx for the heads up abt gabriel [08:26:20] yurik: aren't you SF based? [08:26:24] NYC [08:26:33] sleep you should! [08:26:35] 4:30am here :) [08:26:36] :-D [08:26:42] still on SF time [08:26:43] but yeah definitely Gabriel + Mark [08:26:51] yurik: we have an RFC about it, and a bug IIRC [08:27:02] some other folks would be interested as well such as the mobile team, parsoid team as a whole and ori [08:27:05] Matt Walker did benchmarking [08:27:18] gwicke: Guten Abend :-] [08:27:22] hmm... look i shall. 
Although something tells me that ESI can be implemented extremelly well in varnish (need to check the code of course) [08:27:24] good morning ;) [08:27:29] are you in Germany? [08:27:30] lol [08:27:46] hashar: no, in SF- still playing with Cassandra [08:28:03] yurik: yes, we did our experiments with Varnish so far [08:28:12] you know what I love, we are all talking across three timezones, each of us being of a different nationality. Who could have imagined that 20 years ago? :-] [08:28:21] Matt verified that more includes slow things down a lot [08:28:42] at 5 includes request rates drop by 50% IIRC, so that is kind of the upper limit [08:29:06] the challenge now is to properly divide up the skin & content to get by with 2-3 fragments.. [08:29:07] hashar, hehe, i had such a drive when i used cell phone in the middle of a lake on a canoe a few years ago... [08:29:29] yurik: indeed. I feel like a kid right now [08:29:52] gwicke: you should talk to domas about cassandra. I am not sure whether it is still actively maintained [08:29:55] gwicke, i don't understand why the perf degrades so badly - basically ESI should be no different from serving several resources though a keep-alive http connection [08:30:24] yurik: the Varnish implementation uses regexps to find the syntax [08:30:33] then there are the subrequests and splicing [08:30:41] has nothing to do with pipelining really [08:31:15] hmm, kind of weird -- it doesn't need to search it on every request [08:31:30] Squid 3 ESI used libxml, which was both brittle (wikis did not produce anything like XML) and slow [08:32:51] you should really follow up on the list so that other people knows about all of that :D [08:32:54] in my mind it should be like this: get main from apache, search of esi tags, split it into several chunks and store those chunks in cache together with metadata about ESI urls. On every subsequent request just pull the metadata, determine other chunks to be included in the main one, and stream those chunks one after another [08:32:59] hashar: errr, doesn't seem so far-fetched. US (CONUS at least) has 4 timezones and 100+ nationalities [08:33:02] :) [08:33:03] we talked about that on the list and in the bug [08:33:16] mainly wanted to get Yuri up to speed ;) [08:33:39] thx gwicke, is my thinking about ESI is how they implemented it? [08:33:49] jeremyb: get to sleep :-D [08:33:52] see http://open.blogs.nytimes.com/tag/varnish/ ; they only run ESI if the backend says they should [08:34:10] jeremyb: err "please kindly head to your bed and enjoy a nice night of sleep" [08:34:15] (as of 3 years ago) [08:34:22] hashar: ok!!! [08:34:23] yurik: afaik fragments are re-validated and reassembled on each request [08:34:32] why??? [08:34:57] otherwise Varnish would have to track which complete page uses a given fragment [08:35:00] its already parsed, all that is needed is dynamically push chunks out one after another [08:35:17] gwicke, not sure i understand why it needs that? 
[08:35:18] so that it can refresh the page on fragment purge [08:36:00] again, not sure why - when the request comes in for URL1 that includes URL2, it goes to URL1, sees in metadata that it needs URL2, checks the cashing status of URL2, fetches it if needed, and serves [08:36:32] that is what it does, and it is not fast beyond a few fragments [08:36:33] if URL2 needs to be purged, it can simply be deleted [08:36:55] I thought you wanted to also cache the fully assembled page to speed things up [08:37:07] i don't think its a good idea [08:37:25] there is no major benefit as it is all in ram [08:37:39] you can simply give a linked list of pointers to the sending que [08:38:21] you also need to split up the parent page [08:38:37] we did not plan to rearchitect Varnish for now ;) [08:38:39] gwicke, the parent page only gets split up once on the first request [08:38:56] then the parent page is stored as several chunks [08:39:01] performance will be fine with a few fragments [08:39:46] yes, but this approach allows unlimited fragments really [08:40:19] without any substantial degradation of performance (unless of course each of your chunks is a few bytes long and there are thousands of them :) [08:40:28] I believe it when I see it ;) [08:41:03] hehe, i guess i should go dig into varnish now :) [08:42:23] were you thinking about the main site or about Zero? [08:42:41] if the latter, then you probably don't have to worry at all [08:42:47] zero for now, the world later :) [08:44:23] so the plan Matt and me were considering was to have a per-page loader page, then a user-specific head section including the tag, then the content fragment, and finally a footer fragment and the user-specific navi fragment [08:45:09] were you thinking of possibly doing it through ajax? [08:45:27] instead of returning content on every call :) [08:45:42] * yurik still hopes for a more on-the-fly site [08:45:45] that content is fairly small compared to all the JS and CSS we ship these days [08:45:51] exactly [08:46:01] so its better to load everything else once :) [08:46:09] single page foreva [08:47:00] the main functionality should work without JS [08:47:25] and many visits only open a single page after following a link from a search engine [08:47:26] do you think its wise for modern browsers? We could gracefully fallback [08:47:37] the second point is true [08:49:03] there are use cases for ajax, but I still see them more in the optional feature area [08:49:17] at least for content views [08:49:37] well, i think that if the user is logged in, they should be ajaxy - because they are more likely to view multiple pages [08:49:42] on high latency links doing several small requests sucks [08:49:53] better to get 10k of compressed HTML in one go [08:49:54] whereas if they are anonymous, they should be served fastest html chunk [08:50:37] true true [08:50:54] * yurik is digging into varnish source... 
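For context on the figures quoted above (request rates dropping roughly 50% at five includes, per Matt Walker's benchmarking): a minimal sketch of how such a comparison could be rerun against a test Varnish with ESI enabled; the host name and the /flat.html and /esi-5.html test pages are made-up names for illustration, not WMF endpoints or Matt's actual methodology:

# Compare throughput of a flat page vs. a page assembled from five ESI fragments,
# with all fragments already in cache, using ApacheBench.
for page in flat.html esi-5.html; do
  echo "== $page =="
  ab -n 10000 -c 20 "http://varnish-test.local/$page" | grep 'Requests per second'
done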
[08:54:26] yurik: https://bugzilla.wikimedia.org/show_bug.cgi?id=32618 [08:55:31] gwicke, not sure what you proposed there [08:59:26] gwicke, ok, it seems there are no re-assembling or re-parsing (not as sure about later) [08:59:43] reading cache_esi_deliver.c [09:00:21] apparently they mostly have to deal with added complexity of zipped content including unzipped child and vice versa [09:00:45] yes, that is part of the task [09:01:12] and vary etc needs to be considered [09:01:25] for fragments as well [09:01:44] not exactly - vary is dealt with in the request handling for all requests [09:02:00] right, including subrequests [09:02:08] every time it processes a URL, it goes through all those steps again [09:02:37] only when it needs to reassemble it needs to figure out how to work with zipped content, fix CRC, etc [09:02:59] the problem is knowing when it needs to do so [09:03:40] for that it needs to check the cache status for each fragment [09:03:41] so i suspect the real bottleneck is in fixing compression - if you have main and child both zipped, it might be a bit harder to deal with them in a fast manner [09:04:07] Matt went through a lot of permutations re compression [09:04:30] true, but it is still a local search - shouldn't be that long compared with hitting backend [09:05:06] i mean - how long is processing of one request vs 2 requests if there is no network overhead [09:05:40] haha- once you hit the backend you are in a different magnitude [09:05:57] for zero none of this matters really [09:06:18] i meant - 2 requests vs 1 request only on the varnish server :) [09:06:30] of course the backend kills everything completelly (PHP be damned!!!) [09:07:11] five subrequests can be had at a 50% slowdown [09:07:37] which means that it is faster than full requests, but slower than a non-ESI page [09:08:04] this is assuming that all of those are in cache [09:08:23] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:08:56] hashar: re Cassandra, there are quite a few users outside of FB these days [09:09:17] afaik Cassandra 2 and CQL happened outside of FB [09:10:02] * gwicke waves goodbye [09:10:21] gwicke, oki, so it means we won't use varnish as wiki templates :) [09:10:33] gwicke_away, ^ :) [09:11:12] (03CR) 10Dzahn: "i see puppet is disabled here. bug or purpose?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [09:34:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 09:33:54 UTC 2013 [09:34:23] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:35:31] <6JTAABPF7> !log Put all *.wikimedia.org, wikidata, wikivoyage eqiad traffic on Varnish [09:35:48] Logged the message, Master [09:35:58] bah puppet doc generator is broken :( https://integration.wikimedia.org/ci/job/operations-puppet-doc/ [09:36:23] * hashar looks at https://gerrit.wikimedia.org/r/#/c/85840/ [09:36:35] by my wth [09:40:48] Reedy: Do you know if the software (as it was in late 2003/early 2004) also recorded timestamps in UTC? I have no reason to believe otherwise, just curious if anyone else does. [09:41:01] e.g. deleted revisions in bnwiki.archive [09:41:03] hashar: the .git stuff ? [09:41:26] mark: gwicke_away: AaronSchulz: TimStarling: ^ [09:41:53] akosiaris: yeah that is crazy. 
The git plugin is rewriting the submodules URLs [09:42:05] akosiaris: I have reopened an old bug https://bugzilla.wikimedia.org/show_bug.cgi?id=42953#c5 [09:42:05] i'm pretty sure it did by 2004, not sure about before that [09:42:18] I'm trying to unfold the exact time of the first edit to that wiki for their 10 year party. [09:42:41] OK [09:42:49] Krinkle: brion / tim might know. [09:43:05] else you ill have to dig in the old code :D [09:43:48] don't bother :) [09:43:53] it won't be the real first edit anyway [09:44:21] did we drop some history at one point? [09:44:35] I am pretty sure we are missing the history of use mod era since it did not have history [09:44:44] and some history got lost when migrating from phase2 to phase3 [09:44:46] bnwiki wasn't in UseMod afaik. [09:44:55] but bnwiki is probably old enough to have started directly on phase3 [09:45:14] phase3 or phase3. It started amonth after the first MediaWiki release [09:45:29] since we were probably using it internally for a little bit by then, I guess so. [09:46:02] I wonder why the first few edits have hostnames as the username though. I thought that was a UseMod thing. [09:46:03] e.g. flets-a-west-15-144.dsn.jp [09:46:13] Did MediaWiki ever do that? [09:46:54] Krinkle: oldest entry 27 January 2004 // https://bn.wikipedia.org/w/index.php?oldid=6&uselang=en [09:47:11] hashar: I'm way ahead of you [09:47:12] https://meta.wikimedia.org/w/index.php?title=User:Krinkle/Queries/The_Start_of_Bengali_Wikipedia [09:47:21] 2003 actually [09:47:46] HomePage was deleted [09:47:57] oh [09:48:31] hi mark, when you have a chance, take a look at the ESI email i sent earlier wrt zero -- do you think we can start testing the first and second steps this week? (I will be travelling next week and we can already start getting some data on it) [09:49:37] hashar: These are interesting as well: https://bn.wikipedia.org/wiki/User:!Popular_articles [09:49:44] There's about a dozen pages like that [09:50:00] but not created by "MediaWiki default". All by different IPs or user names [09:50:24] https://bn.wikipedia.org/wiki/Special:PrefixIndex/User:! [09:51:46] 3 of them created by Microsoft-owned IPs [09:51:54] I don't see those on other wikis. Weird. [09:52:34] Krinkle: the bengali wikipedia idea has been posted on dec 2003 : http://lists.wikimedia.org/pipermail/wikitech-l/2003-December/007320.html [09:52:50] Krinkle: I can't remember where we logged wiki creations, maybe on meta [09:53:09] "Previous message: [Wikitech-l] MediaWiki 1.1.0 release" [09:53:10] :D [09:53:13] :-D [09:53:18] That explains that [09:53:50] I'd rather not use that link since pipermail urls tend to get screwed over time [09:54:03] http://www.gossamer-threads.com/lists/wiki/wikitech/11087 [09:54:09] thx google [09:55:24] (cur | prev) 21:24, 16 July 2004‎ Angela (talk | contribs)‎ . . (40 bytes) (+40)‎ . . 
(Requests for new languages moved to Requests for new projects) (thank) [09:55:25] bah [09:55:27] copy pasted [09:55:30] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&offset=20041211234301&action=history [09:56:03] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_projects&action=history [09:56:04] :D [09:56:16] I wish we had a way to merge history of articles moved by copy pasting [09:57:10] There is [09:57:14] Delete, Move, Undelete [09:57:37] "The first subdomain created for a non-English Wikipedia was deutsche.wikipedia.com (created on 16 March 2001, 01:38 UTC),[37] followed after a few hours by Catalan.wikipedia.com" when did we even have language codes?:p [09:58:21] when I first edited in october 2002, we already had the short prefixes [09:58:28] though I did a bunch of edit in wikipedia.com [09:58:43] I quickly used wikipedia.org because it looked nicer [09:59:18] and we didn't keep redirects heh [09:59:32] .com did redirect to .org [09:59:37] at some point [10:00:02] Krinkle: sorry can't find any archive about the creation of the bengali wiki beside the mail above [10:00:22] hashar: Looks like they were on wikipedia-l in stead of wikitech-l [10:00:27] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&dir=prev&action=history [10:00:32] https://meta.wikimedia.org/w/index.php?title=Requests_for_new_languages&oldid=76039 [10:00:46] "request on mailing list: [ .. ]" they all point to wikipedia-l [10:00:48] Krinkle: yeah I look on wikipedia-l, could not find anything relevant. http://lists.wikimedia.org/pipermail/wikipedia-l/ [10:01:53] http://web.archive.org/web/20031225175553/http://bn.wikipedia.org/ [10:02:14] http://web.archive.org/web/20020501000000*/http://bn.wikipedia.com/ :D [10:02:39] hehe the .com did redirect to .org http://web.archive.org/web/20040124201602/http://bn.wikipedia.com/ :-D [10:02:39] Krinkle: http://web.archive.org/web/20040201230339/http://bn.wikipedia.org/wiki/Special:Recentchanges [10:03:39] ah, here [10:03:42] http://web.archive.org/web/20030816194900/http://bn.wikipedia.com/wiki.cgi?action=rc&from=1022969958 [10:03:45] http://markmail.org/message/qnv7j6cxcrtlspr4 [10:03:51] No updates since June 1, 2002 3:19 pm [10:05:28] that date is passed as query parameter though [10:05:35] so that domain probably didn't exist in 2002 [10:05:45] "?action=rc&from=1022969958" [10:05:49] Page generated August 16, 2003 12:49 pm [10:05:55] that is interesting though [10:05:57] at that point the home page said "This subdomain is reserverd for the creation of a Wikipedia in the [[Bengali]] language. " [10:06:04] yeah [10:06:28] that placeholder message was first edited in 2003-12 [10:06:29] I think [10:06:33] (the archived edit() [10:06:52] HomePage [10:07:08] :D [10:07:39] but if bn.wikipedia.com existed in 2003-08, then why did someone ask for it on wikitech-l through Jimbo in 2003-12? [10:07:47] (the mailing list http://www.gossamer-threads.com/lists/wiki/wikitech/11087 ) [10:07:49] Your User ID number: 1005 [10:08:03] in preferences in archive [10:09:33] hmm.. 
maybe subdomains with placeholders had been autocreated [10:10:10] Or the cache was copied from an empty en.wiki db [10:10:21] so the wiki probably wasn't really visible in August 2003 [10:12:28] unattended luggage found at Dusseldorf airport, evacuation, bomb squad, hours of delay..until they found out it wasn't a bomb, it was "just" full of cocaine :p [10:12:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:13:10] Krinkle: digging in mediawiki/core history is not that helpful :/ [10:13:24] need svn logs ? [10:13:27] Krinkle: the first revision from April 2003 already had a bn.wikipedia.com entry [10:13:47] ah [10:13:56] we probably have put some placeholders there [10:14:12] s/we/the good old timers/ [10:14:50] hashar: dnshistory.org but it's too new ..hrmm [10:17:51] 14 years ago, I should have accepted that work at Sun [10:18:03] I would probably now be able to hack Java :D [10:18:31] hashar: what revision had a bn.wikipedia.com entry? [10:18:50] d82c14fb4fbac288b42ca5918b0a72f33ecb1e69 [10:18:53] initial revision [10:18:56] by Lee Daniel Crocker [10:19:04] that is http://mediawiki.org/wiki/Special:Code/MediaWiki/1284 [10:19:05] or http://mediawiki.org/wiki/Special:Code/MediaWiki/1286 [10:19:12] which really comes from CVS [10:19:21] I am pretty sure the source forge project got deleted [10:19:27] brion might have a copy of the cvs repo [10:19:42] anyway, that is from April 2003, before bn.wiki got created anyway [10:19:42] + "bn" => "http://bn.wikipedia.com/wiki.cgi?\$1", [10:19:43] aha [10:19:50] or maybe we created them in bulks [10:19:51] Interesting [10:20:04] then opened the wiki for editing if some people got interested in the project [10:20:08] i have no clue [10:20:18] just take the date of the oldid=6 maybe? [10:20:24] on 20030813 it is still "reserved for Bengali", on 20031022 it was "Got an HTTP 302 response at crawl time" [10:21:17] I'm not choosing anything, I;ll let them decide however it will probably be 20040127164554 (revision 6), or 20031226054910 (oldest revision from archive table) [10:21:39] 20031226054910 is a revision from HomePage, the page still contained the "reserved text" but it is not the revision that introduces it. [10:21:39] ahh [10:21:43] it is a user changign that page [10:21:55] probably got created in december so [10:24:38] heh, w gotta add creation_date as claims on Wikidata for all projects [10:24:49] and source for claims :p [10:25:33] :D [10:25:41] Sheldon Cooper (Q629583) ‎ (‎Created claim: instance of (P31): Doctor of Philosophy (Q752297)) [10:26:12] and support to add references for such claims [10:27:05] +=== Sister Projects === [10:27:05] +[http://www.nupedia.com Nupedia] - [http://meta.wikipedia.org Meta-Wikipedia] - [http://sep11.wikipedia.org/ September 11 Memorial Wiki] - [http://wiktionary.org Wiktionary] - [http://wikibooks.org Wikibooks] - [http://wikiquote.org Wikiquote] - [http://sources.wikipedia.org Wikisource] [10:28:13] heh, we turned the sep11 into a redirect to archive.org at least instead of just removing it [10:28:27] nupedia is timing out [10:29:00] !log nupedia is down!!! [10:29:30] maybe log bot runs on nupedia as well. it is timing out :P [10:30:45] hehe, there's a space [10:31:17] irc must've trimmed it. 
[10:31:28] client [10:34:10] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 10:34:00 UTC 2013 [10:34:40] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:35:44] https://www.wikidata.org/w/index.php?title=Q52&diff=72521511&oldid=71127219 [10:43:46] (03CR) 10Mark Bergsma: [C: 032] Initial version of PROXY support for Varnish [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/81244 (owner: 10Mark Bergsma) [10:50:38] (03CR) 10Mark Bergsma: [V: 032] Initial version of PROXY support for Varnish [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/81244 (owner: 10Mark Bergsma) [10:50:52] (03CR) 10Mark Bergsma: [C: 032 V: 032] Fix PROXY bug [operations/debs/varnish] (patches/proxy-support) - 10https://gerrit.wikimedia.org/r/82426 (owner: 10Mark Bergsma) [10:51:02] :) [10:55:23] If the remote URL ends with /.git, a non-bare repository is assumed. [10:55:24] If the remote URL does NOT end with /.git, a bare repository is assumed. [10:55:29] I like trying to understand logic puzzles [10:59:28] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [11:04:56] !log running checksetup.pl on bugzilla, installed InlineHistory extension [11:05:09] Logged the message, Master [11:08:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:12:06] mark, I think I discovered a problem with text varnish: some paths in JS create a cookie called mediaWiki.user.sessionId for anons. it's not a problem for squid due to XVO [11:12:44] however, for varnish this should result in uncacheable requests [11:13:17] hmmm [11:13:29] currently, the offender is ULS, but the cookie is set in core so everything might call this [11:14:49] and this is currently not caught by the session/token regex [11:22:01] (03PS1) 10Mark Bergsma: Pass on requests with a mediaWiki.user.sessionId cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 [11:22:35] MaxSem: ^ [11:24:56] mmm, I wonder when that cookie gets unset when you log in [11:25:17] * MaxSem looks [11:26:50] so presumably whenever an anon changes any uls setting that cookie gets created? [11:28:02] no, for every page view [11:28:20] ugh, and it's still there after a login [11:29:01] so ^^ will not work [11:29:09] ok [11:32:27] what is the XVO header sent here? [11:32:54] Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwikiSession;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut;string-contains=mf_useformat;string-contains=stopMobileRedirect;string-contains=forceHTTPS [11:34:25] I think I'm missing something [11:34:29] why is it not a problem for Squid? [11:34:36] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 11:34:27 UTC 2013 [11:35:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:44:19] (03PS1) 10Dzahn: stats.wikimedia.org - replace webserver::apache::site / inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [11:55:19] (03CR) 10Krinkle: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [12:00:08] (03CR) 10Krinkle: "This is set my mediawiki core's mediawiki.user.js. 
Only (supposed to be) used by client side scripts. If server-side code uses it, that co" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:01:53] Krinkle: MaxSem says (as far as I understand it) that the presence of that cookie should result in uncacheable request/response. However, I don't currently see how Squid would be doing that at the moment, and I'm also not sure why it's needed... [12:02:28] (03CR) 10Mark Bergsma: "MaxSem says (as far as I understand it) that the presence of that cookie should result in uncacheable request/response. However, I don't c" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:02:45] mark: It shouldn't be needed. It is only set and read by javascript. [12:02:47] I said I'm suspicious about varnish, but that squid is ok [12:03:05] Assuming that we just preserve any such cookies [12:03:44] what do you mean exactly by "preserve" here? [12:03:49] it's client side only, right? [12:03:56] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 12:03:46 UTC 2013 [12:03:56] so as long as mediawiki doesn't remove it it should be ok? [12:04:00] Yes [12:04:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:04:33] I'm not sure what that patch set does, but if it results in regular article page views (e.g. nl.wikipedia.org//wiki/Foo) being sent to apache's instead of static cache, that seems wasteful. [12:04:34] in our varnish setup, IF a request ends up at mediawiki because it's not in cache, mediawiki receives the original cookies as sent by the client [12:05:24] so indeed, I guess this isn't needed [12:05:47] I'd also worry about scalability as that cookie is part of a generic mediawiki.js api that any extension or site script can use. [12:06:36] Krinkle, ULS sets it for every pageview [12:07:04] afaik there should be no reason for it to be unconditionally triggered on a plain page view where the user doesn't interact with non-core UI components, only when the user does something with a specific component. [12:07:04] but MaxSem: why do you think it should break caching? [12:07:09] But in theory it can be set for every user. [12:07:51] paranoia, cookie name contains "session" [12:07:55] If ULS calls it unconditionally, that seems like yet another js problem in ULS. But harmless in general though, just code quality. [12:08:42] mw.user.sessionId is used to generate a random id, and preserve it whithin the current browser session (e.g. expires when the browser closes) [12:08:53] mostly for testing and statistics. [12:09:02] a/b testing, that is. [12:09:10] ok [12:09:15] only for anonymous users [12:09:20] and ULS uses it for EventLogging (WTF?) [12:09:24] i'm going to abandon that patch now until we have strong evidence and understanding that it is needed [12:09:37] Krinkle: ahh hhhh [12:09:43] Krinkle: I have upgraded ruby-jsduck by mistake :( [12:09:52] There is most certainly some quesionable things going on in ULS around this, but I'm pretty sure it doesnt' concern server-side cachability however. 
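mark's and Krinkle's conclusion above, that mediaWiki.user.sessionId is a purely client-side cookie and should not affect cacheability, can be sanity-checked from outside; a rough sketch comparing frontend cache headers with and without the cookie (the article URL and cookie value are arbitrary):

URL='https://en.wikipedia.org/wiki/Main_Page'
# Anonymous request without the cookie:
curl -sI "$URL" | grep -iE '^(x-cache|age):'
# Same request with a fake session-id cookie; a comparable cache hit here
# supports the conclusion that the cookie does not need to break caching.
curl -sI -H 'Cookie: mediaWiki.user.sessionId=deadbeefcafe' "$URL" | grep -iE '^(x-cache|age):'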
[12:09:59] (03Abandoned) 10Mark Bergsma: Pass on requests with a mediaWiki.user.sessionId cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/85969 (owner: 10Mark Bergsma) [12:10:06] (03PS2) 10Dzahn: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [12:10:15] hashar: OK, I'll keep an eye on the next jsduck-publish job and see if it breaks anything. [12:10:24] It's fine, I was going to do that tomorrow, but I'll check it out now. [12:10:29] dzahn: how about putting it behind the misc varnish cluster? [12:10:30] sorry :( [12:10:33] mutante: ^ [12:11:04] (03PS3) 10Dzahn: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 [12:12:25] mark: hmm, i said i'd make the setup the same as metrics.wm because that is on the same host and conflicted and they already got their SSL certs for this. if we'd move one then probably all and also metrics.wm [12:15:14] which SSL certs are they using? [12:16:06] stats.wikimedia.org.pem and metrics.wikimedia.org.pem, so they don't have wildcards [12:16:16] good [12:16:20] we have already bought them [12:16:31] that makes it less urgent [12:16:45] but because metrics.wm and stats.wm used diffrent puppet ways to setup Apache ... [12:16:52] i offered them to just fix that first [12:16:54] right [12:23:59] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 12:33:52 UTC 2013 [12:33:59] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:36:21] (03PS1) 10Dzahn: stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 [12:55:31] (03PS1) 10Springle: upgrade db1035 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85976 [12:56:40] (03CR) 10Springle: [C: 032] upgrade db1035 to mariadb [operations/puppet] - 10https://gerrit.wikimedia.org/r/85976 (owner: 10Springle) [13:22:38] !log re-enabling disabled puppet on (fixed) mw1046 [13:22:54] Logged the message, Master [13:29:12] cmjohnson: hi [13:29:21] hi [13:30:07] on mw1046, puppet was disabled, was that the bug or did you actually disable it [13:30:22] after reinstall [13:30:29] i didn't disable it [13:30:42] that's what i guessed, so it's a bug [13:30:59] and then there is a bug in a bug, because the log message tells you to do something that doesn't work :p [13:31:16] so if the log is just full of this: " notice: Skipping run of Puppet configuration client; administratively disabled; use 'puppet Puppet configuration client --enable' to re-enable." 
[13:31:27] then it wants to say: puppet agent client --enable [13:32:14] i did that, and the bad news is now it has issues installing packages [13:32:16] okay...what about the package install issues [13:32:36] it's the PHP version again somehow [13:33:09] i wonder why it would happen on reinstall (now) [13:34:02] err: /Stage[main]/Applicationserver::Packages/Package[php5-cli]/ensure: change from 5.3.10-1ubuntu3.8 to 5.3.10-1ubuntu3.6+wmf1 failed: [13:35:14] it wants to downgrade [13:35:51] which is odd...i know i've had to manually install pkgs during initial setup [13:36:25] 5.3.10-1ubuntu3.6+wmf1 is what f.e. mw1047 has [13:37:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:39:14] PROBLEM - Apache HTTP on mw1046 is CRITICAL: Connection refused [13:39:14] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 13:39:10 UTC 2013 [13:39:24] PROBLEM - twemproxy process on mw1046 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [13:39:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:40:53] (03CR) 10Ottomata: "Ok, so! Removing a class or resource in puppet won't actually turn anything off or do anything. Puppet is declarative, and in order to '" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [13:42:14] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 13:42:11 UTC 2013 [13:42:17] hold on ..watching another run after trying things [13:42:39] (and there is also something about twemproxy service not starting yet) [13:42:54] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [13:43:24] RECOVERY - twemproxy process on mw1046 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [13:43:56] !log apt-get install php5-cli on mw1046 fixes 'ensure: change from 5.3.10-1ubuntu3.8 to 5.3.10-1ubuntu3.6+wmf1 failed' issue after reinstall [13:44:10] Logged the message, Master [13:44:14] RECOVERY - Apache HTTP on mw1046 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.079 second response time [13:44:24] cmjohnson: ii php5-cli 5.3.10-1ubuntu3.6+wmf1 ... [13:44:32] mutante did you manually start twemproxy [13:44:49] no, it worked by puppet [13:44:57] but not all things work after a single run [13:45:16] it needs 2 or 3 or so :p [13:57:57] (03PS3) 10Dzahn: adding mw1046 back to dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [13:58:32] (03CR) 10Dzahn: [C: 032] "it's back up and running" [operations/puppet] - 10https://gerrit.wikimedia.org/r/85869 (owner: 10Cmjohnson) [14:00:03] (03PS4) 10Ottomata: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:06:25] (03PS5) 10Ottomata: stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apache [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:10:34] RECOVERY - Disk space on analytics1003 is OK: DISK OK [14:10:35] RECOVERY - Disk space on analytics1004 is OK: DISK OK [14:11:23] (03PS3) 10Ottomata: analytics1003 and analytics1004 now have public IPs. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/85878 [14:12:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:12:59] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 14:12:53 UTC 2013 [14:13:34] (03CR) 10Dzahn: [C: 032] stats.wikimedia.org - enable SSL, replace webserver::apache::site and its inline Apache config with apache_site and template in ./sites/apac [operations/puppet] - 10https://gerrit.wikimedia.org/r/85971 (owner: 10Dzahn) [14:13:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [14:13:55] (03PS2) 10Dzahn: stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 [14:14:04] (03PS1) 10Ottomata: Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 [14:14:10] (03CR) 10Dzahn: [C: 032] stats.wm - have an apache/ports.conf with NameVirtualHost *:443 to avoid '_default_ virtualhost overlap on port 443' [operations/puppet] - 10https://gerrit.wikimedia.org/r/85973 (owner: 10Dzahn) [14:15:47] (03PS2) 10Ottomata: Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 [14:15:54] (03CR) 10Ottomata: [C: 032 V: 032] Looks like icinga and ganglia didn't properly decommission analytics100[34] yesterday. [operations/puppet] - 10https://gerrit.wikimedia.org/r/85979 (owner: 10Ottomata) [14:18:30] PROBLEM - Disk space on analytics1003 is CRITICAL: Connection refused by host [14:18:59] PROBLEM - RAID on analytics1003 is CRITICAL: Connection refused by host [14:23:30] (03PS2) 10QChris: Turn off generating geowiki limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 [14:24:25] (03CR) 10QChris: "Thanks! Good to know." [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [14:25:27] (03CR) 10Ottomata: [C: 032 V: 032] Turn off generating geowiki limn files [operations/puppet] - 10https://gerrit.wikimedia.org/r/85853 (owner: 10QChris) [14:32:40] (03PS1) 10Dzahn: stats.wm - re-enable loading of Apache SSL module in metrics.wm now that we don't do that with webserver::apache::site anymore. or there is a missing dependency on Webserver::Apache::Module[ssl] [operations/puppet] - 10https://gerrit.wikimedia.org/r/85981 [14:33:43] (03CR) 10Dzahn: [C: 032] stats.wm - re-enable loading of Apache SSL module in metrics.wm now that we don't do that with webserver::apache::site anymore. or there is [operations/puppet] - 10https://gerrit.wikimedia.org/r/85981 (owner: 10Dzahn) [14:35:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 14:35:51 UTC 2013 [14:36:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:38:24] gah, why didn't someone call ? [14:40:15] LeslieCarr: what's up? [14:41:08] i guess something was broken, need to read through all scrollback [14:41:20] but someone texted, except i was asleep... [14:43:26] oh, didn't see until now. around 22:26 < TimStarling> I25/Sep/13 05:06:46 IDevice status changed to Down [14:43:27] ? 
[14:43:58] ah, tinet got saturated [14:44:04] that is the what the problem was [14:44:08] yeah, changed vrrp prio for row C for now [14:44:46] the csw-sdtpa "changing to down" is just a symptom of a broken bit in csw-sdtpa … which would require a reboot to fix, as well as some reseating/possibly swapping out power supplies [14:46:09] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 14:46:04 UTC 2013 [14:46:41] since we have two transits on cr1 maybe we want to swap one of row a/b as well ? [14:46:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [14:49:06] let's wait until I'm done with confeds ;) [14:49:12] there's no acute problem right now [14:49:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [14:50:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 18.894 second response time [14:55:06] (03CR) 10Steinsplitter: [C: 031] disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [14:56:53] yeah [15:02:54] (03PS1) 10Dzahn: fix Apache syntax error 'Illegal override option Nonesta' in stats.wikimedia.org/htdocs/reportcard/pediapress that prevented Apache graceful [operations/puppet] - 10https://gerrit.wikimedia.org/r/85984 [15:04:03] (03CR) 10Reedy: [C: 04-1] "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:04:20] uh [15:04:39] oh, wtf did i do [15:04:44] (03CR) 10Dzahn: [C: 032] fix Apache syntax error 'Illegal override option Nonesta' in stats.wikimedia.org/htdocs/reportcard/pediapress that prevented Apache graceful [operations/puppet] - 10https://gerrit.wikimedia.org/r/85984 (owner: 10Dzahn) [15:04:45] :) [15:06:03] LeslieCarr: I think that's a non-solution anyway [15:06:18] stuff shouldn't depend so much on whichever router happens to be the vrrp master at any time [15:06:35] yeah but it will until we split the routers into virtual routeres [15:06:41] sadly [15:06:49] stupid bgp tiebreakers [15:06:59] yup [15:07:03] we should do that [15:07:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:07:39] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 15:07:33 UTC 2013 [15:07:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:08:07] (03PS2) 10Aude: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 [15:08:21] (03CR) 10Aude: "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:08:51] (03PS3) 10Reedy: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:08:56] (03CR) 10Reedy: [C: 032] disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:09:08] (03Merged) 10jenkins-bot: disable add links wikidata widget on commons, per bug 54497 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85837 (owner: 10Aude) [15:10:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 27.005 second response time [15:12:21] hashar: https://stats.wikimedia.org [15:13:29] PROBLEM - 
Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 30 seconds [15:14:15] mutante: I have no idea why it is missing the favicon, but that seems to solve some RT ticket :-) kudos! [15:14:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 17.324 second response time [15:15:10] does puppet actually ever run on neon? :p [15:15:21] when I try I often get this: [15:15:21] err: Could not retrieve catalog from remote server: execution expired [15:15:24] after like 5 minutes or longer [15:21:50] !log reedy synchronized wmf-config/CommonSettings.php [15:21:58] hrm [15:22:03] that's no good [15:22:06] Logged the message, Master [15:25:29] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 15:25:27 UTC 2013 [15:25:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [15:29:49] (03CR) 10Dzahn: [C: 031] RT: allow login via LDAP [operations/puppet] - 10https://gerrit.wikimedia.org/r/80577 (owner: 10Faidon Liambotis) [15:30:25] (03PS1) 10Cmjohnson: Decommissioning sq45, removing some missed decom entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/85988 [15:34:59] (03CR) 10Lcarr: [C: 032] analytics1003 and analytics1004 now have public IPs. [operations/dns] - 10https://gerrit.wikimedia.org/r/85878 (owner: 10Ottomata) [15:35:29] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 15:35:27 UTC 2013 [15:35:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:35:55] ottomata: have you done dns changes before ? [15:36:28] LeslieCarr: we has a repo now!! \o/ [15:36:33] also, if neon is still misbehaving… you can sorta cheat to make sure it goes through ;) [15:36:41] yes! it's awesome, thanks to paravoid :) [15:37:02] and i am sure others as well [15:37:13] yes, danke! [15:37:27] yes, made dns changes before [15:37:32] how do I cheat LeslieCarr? [15:37:43] remote the icinga files for those hosts? [15:37:56] kill off other puppet processes to get the load down on stafford [15:38:03] hahahaha [15:38:13] so cheating [15:38:18] echo "notice: Finished catalog run in 565.19 seconds" >> /var/log/puppet.log [15:38:46] well you can uber cheat and put an iptables rule on, blocking all of port 8140 except for the host you want to run…. but i try to only mildly cheat [15:39:33] (03CR) 10Cmjohnson: [C: 032] Decommissioning sq45, removing some missed decom entries [operations/puppet] - 10https://gerrit.wikimedia.org/r/85988 (owner: 10Cmjohnson) [15:39:49] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 712 MB (3% inode=90%): /var/lib/ureadahead/debugfs 712 MB (3% inode=90%): [15:45:49] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 15:45:41 UTC 2013 [15:45:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:05:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 16:05:51 UTC 2013 [16:06:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:59] PROBLEM - DPKG on cp1050 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:10:30] (03PS4) 10Ottomata: analytics1003 and analytics1004 now have public IPs. [operations/dns] - 10https://gerrit.wikimedia.org/r/85878 [16:10:38] (03CR) 10Ottomata: [C: 032 V: 032] analytics1003 and analytics1004 now have public IPs. 
[operations/dns] - 10https://gerrit.wikimedia.org/r/85878 (owner: 10Ottomata) [16:11:25] !log authdns-update to give analytics100[34] public IPs, about to reinstall them [16:11:41] Logged the message, Master [16:11:59] RECOVERY - DPKG on cp1050 is OK: All packages OK [16:12:38] ^ the dpkg on cp1050 thing is me, fixed now [16:13:21] woot [16:13:25] upgrading to new varnish ? [16:14:49] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 16:14:46 UTC 2013 [16:15:10] LeslieCarr: can I just killall puppet on stafford? [16:15:49] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:55] you can… i try to only kill about 20-30 [16:17:17] killing via htop is fun, because then it's like a game…. when will the bars go down….. the world may never know [16:17:32] haha [16:17:33] haha [16:18:00] LesliaCarr: yeah, just an upgraded package to include the netmapper thing. upgrading/restarting varnish is a scary process :) [16:18:29] hahahah, this is fun LeslieCarr :) [16:18:41] :) [16:18:49] it makes me feel like an evil villain [16:18:54] i think it's all the bars on the top [16:20:15] man those bars sneak back up real fast though [16:20:18] hard to keep them at bay [16:20:21] more like tower defense or something [16:20:41] after you kill 100, you get an upgrade that lets you kill 5 per keystroke [16:20:47] ooo [16:21:39] ok if this neon puppet run doesn't go through i might just go ahead an reinstall these anyway and deal with the icinga blasts for a bit [16:21:43] cmon neon! go go go [16:22:54] * akosiaris hates etherpad.... [16:23:01] lite or otherwise... [16:23:24] ottomata: rob is done with the bastion host in ulsfo... we must start installing machines there [16:24:24] !log added a favicon.ico on stats.wikimedia.org for hashar [16:24:40] Logged the message, Master [16:29:09] coooool [16:29:31] akosiaris: is there a doc with servers that need installed, bastion info, etc? [16:29:56] :) [16:30:11] puppet tower defense.... [16:30:43] ottomata: not that i am aware off... there is RT #5828... but not much after that... LeslieCarr? [16:30:58] any idea how we get what needs to be on in ulsfo ? [16:31:11] http://en.wikipedia.org/wiki/The_Typing_of_the_Dead [16:32:10] hahaha [16:32:16] the game cabinet!haha [16:32:44] it's awesome, gaming and increase your typing speed [16:34:23] have you played it? [16:34:31] yea [16:34:44] um, ask RobH for the server stuff? i think bast4001 is all set up but not sure ? [16:34:46] on a Windows box [16:35:29] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 16:35:20 UTC 2013 [16:35:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:36:13] agg [16:36:14] err: Could not retrieve catalog from remote server: execution expired [16:36:18] looks like #5828: update dhcp files so bast4001 is the tftpdboot sever for ulsfo subnet(s) [16:36:44] damnit [16:36:48] that is the easy part..... after that ? [16:37:24] i'm going to just manually edit the icinga files....objections? [16:37:26] honestly i'm not completely sure :) install all the machines, give htem roles... [16:37:28] profit ? [16:37:38] ottomata: nope, just do a reload of icinga after you do [16:37:48] it shoud overwrite next time puppet manages to run [16:37:54] yo [16:38:07] ulsfo stuff, i know what needs to happen, i thought i put on ticket but we can chat here =] [16:38:21] which ticket ? 
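[Editor's note: the "uber cheat" LeslieCarr describes above — letting a single agent reach the overloaded puppetmaster on stafford while everything else is held off port 8140 — would look roughly like the sketch below, run on the puppetmaster itself. The allowed address is a placeholder, not taken from the log, and the rules have to be removed again afterwards.]

    # Hypothetical one-host-only window for the puppetmaster (port 8140)
    ALLOWED=203.0.113.42   # placeholder: the one agent you want to let through
    iptables -I INPUT 1 -p tcp --dport 8140 -s "$ALLOWED" -j ACCEPT
    iptables -I INPUT 2 -p tcp --dport 8140 -j REJECT
    # ...run puppet on the chosen host, then undo:
    iptables -D INPUT -p tcp --dport 8140 -s "$ALLOWED" -j ACCEPT
    iptables -D INPUT -p tcp --dport 8140 -j REJECT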
[16:38:25] akosiaris: So the bast4001 is also a tftp server [16:38:32] lemme find, i have a chain of them [16:38:56] https://rt.wikimedia.org/Ticket/Display.html?id=5828 [16:39:04] that one i got ... seems easy enough [16:39:14] parent ticket https://rt.wikimedia.org/Ticket/Display.html?id=5702 [16:39:34] there we go [16:39:35] nice [16:39:36] So yea, update the ulsfo stuff to point at bast4001 ip, then its just normal system setup (doesnt differ much else) [16:39:36] thanx [16:39:54] The lvs machines should be setup like the lvs stuff in eqiad, in terms of OS and parititions and the like [16:40:29] akosiaris: I'm not sure how familar you are with our install setup [16:40:40] So if I am not explaining enough, or too much, lemme know. [16:40:51] not at all... that was the idea... that i learn about it [16:40:56] this will help [16:40:58] https://wikitech.wikimedia.org/wiki/Server_Lifecycle [16:41:17] You'll be taking over from the 'install' step onward [16:41:21] installation even. [16:41:39] I purposefully did NOT do any preinstall work for you guys [16:41:41] normally I do [16:41:47] but I assumed you'd want to do it all to learn it [16:42:20] :-) [16:42:48] akosiaris: i have done all these steps before so we can do them together, i'm resinstalling 2 analytics machines atm [16:43:17] ottomata: cool. when do we start ? [16:43:28] yea and i'm happy to help too, just ask any questions you guys come across [16:43:42] I'd advise setting up the varnish systems first, as they will be easier [16:43:47] then lvs [16:43:57] and afaik varnish now uses internal IPs [16:44:11] (no ips are assigned, but there is a puppet repo for that now thanks to paravoid ;) [16:44:17] sorry, not puppet repo [16:44:19] git repo. [16:44:27] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 16:44:19 UTC 2013 [16:44:38] so its a lot easier than it used to be (imo) [16:44:47] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:44:55] akosiaris, i also have a standup coming up, so i might be busy for the next hourish [16:44:55] so we also get to assign IPs and create the DNS entries :-) [16:45:02] go ahead and start and RobH and I can chime in with help [16:45:02] yep [16:45:11] i didnt assign anything [16:45:18] except the bast4001 stuff [16:45:34] ja so, akosiaris, maybe first check out the operations/dns repo [16:45:38] and look at the files there, to see where to add stuff [16:45:39] so once dhcp is updated to point to bast4001 , then you can do installs. the dhcpd files in puppet repo need update [16:45:39] oh [16:45:53] you can do dns, or update dhcp, but both happen before install =] [16:46:01] (so order is immaterial) [16:46:30] oh hm [16:46:34] ok i don't know how to do that [16:46:55] bast4001 is already puppetized, just needs dhcp stuff? 
[16:47:03] nono [16:47:08] sorry, not explaining [16:47:12] So dhcpd still runs on brewster [16:47:14] across the link [16:47:29] so in puppet://files/dhcpd/dhcpd.conf needs update [16:47:30] ok i am trying to debug freaking etherpad right now, but i will read up on all the material and get start tomorrow early (it is getting almost 20:00 here) [16:47:33] for the ulsfo subnet section [16:47:42] akosiaris: Ahh, yea thats no problem man [16:47:45] ok cool, akosiaris sounds good, I probably have a buncha stuff to do today too [16:47:56] cmjohnson also offered to help [16:48:11] cmjohnson: i think akosiaris and i want to do this initial stuff to get a feel for stuff [16:48:26] so we might wait til tomorrowish to do that, but after we've at least installed a few ourselves, we're happy for more help [16:48:47] yea i didnt ask anyone else to touch it, since i knew you two wanted them [16:51:28] RobH, i'm still confused about the neede dhcpd changes, but i'll ask again when I'm ready to do some stuff [16:51:53] !log install a few package upgrades on iron [16:52:06] Logged the message, Master [16:52:08] !log taking down analytics100[34] for reinstall with public IPs [16:52:15] ottomata: i think i know that needs to be done there [16:52:20] Logged the message, Master [16:52:20] ok cool [16:52:30] cmon Ciscos! I know you don't don't like to do what you are told, but you can do it! [16:52:34] ottomata: if you open the dhcpd.conf file in files/dhcpd/ [16:52:49] ja looking [16:52:50] you change the next-server 208.80.154.10; # carbon (tftp server) [16:53:02] OH [16:53:11] notice for tampa, it points to brewster [16:53:19] so for ulsfo, point it to bast4001 [16:53:23] ok okok, setting up dchpd for ulsfo and having tftp point at bast4001 [16:53:24] ok cool [16:53:28] so TFTP is very, very bad over latency links [16:53:30] got it [16:53:31] ok cool [16:53:32] Setting up smbclient .. samba-common <-- when i see Samba upgrades i always wonder if we want to purge it in general [16:53:47] so we only do the single run for the tftpd server per site over link [16:53:53] and then each site gets own tftp server [16:54:13] RobH, do we need sections for the private internal subnets too? [16:54:18] i just see ulsfo public right now [16:54:29] ottomata: we do indeed, which I intentionally did NOT do! [16:54:38] i just forgot to mention it ;P [16:54:41] great, got it. [16:54:52] but you can get the ip info for that subnet off the dns files =] [16:54:55] those nets are listed somewhere I will find when I start looking? [16:54:58] great ja danke [16:55:13] I assumed it was better to give you all the sources, rather than just tell you [16:55:15] =] [16:58:23] ja thanks [17:00:07] ahhhhh ciscos never pxe boot when I tell them [17:00:08] yargghh [17:04:32] ciscos never do anything when folks tell them [17:04:47] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 17:04:44 UTC 2013 [17:04:49] mark, hi, any thoughts on ESI deployment? [17:05:17] yurik: nope, didn't have time yet today, I'll look at your mail(s) tomorrow [17:05:47] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:12:57] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 17:12:48 UTC 2013 [17:13:47] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [17:15:19] hmm, [17:15:20] Sep 25 17:15:05 brewster dhcpd: DHCPDISCOVER from 88:43:e1:c2:99:48 via 10.64.21.3: network 10.64.21/24: no free leases [17:15:31] LeslieCarr: ? 
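[Editor's note: the files/dhcpd/dhcpd.conf edit RobH walks through above (around 16:52) amounts to giving each ulsfo subnet stanza a next-server that points at bast4001, so PXE/TFTP stays local to the site instead of crossing the link to brewster/carbon. A sketch is below; the subnet, gateway and bast4001 addresses are placeholders — the real values come from the operations/dns repo. The private ulsfo subnets mentioned above would each get a matching stanza.]

    # Hypothetical ulsfo stanza in files/dhcpd/dhcpd.conf (ISC dhcpd syntax)
    subnet 198.51.100.0 netmask 255.255.255.0 {   # placeholder ulsfo subnet
        authoritative;
        option subnet-mask 255.255.255.0;
        option routers 198.51.100.1;              # placeholder gateway
        next-server 198.51.100.5;                 # bast4001, the local TFTP server for ulsfo
    }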
[17:15:46] that is analytics1003 trying to get dhcp [17:15:55] Reboot and Select proper Boot device [17:15:56] or Insert Boot Media in selected Boot device and press a key [17:16:19] hmm, maybe wrong network? [17:16:31] hrm [17:16:33] oh i know [17:16:38] didn't switch the vlans [17:17:04] oh hm [17:17:06] that is on the switch? [17:17:23] yep, getting that now [17:23:17] ok [17:23:19] sayonara [17:23:24] i'm tired ;) [17:23:57] au revoir [17:27:49] AaronSchulz: https://www.mediawiki.org/wiki/User:GWicke/Notes/Storage#Cassandra_compression [17:28:39] AaronSchulz: also found out that Cassandra actually orders by timestamp despite presenting a type 1 UUID externally [17:28:52] so no need to mess with that [17:30:07] RECOVERY - DPKG on cp1062 is OK: All packages OK [17:30:32] oh i fixed the vlans [17:30:37] sorry, forgot to say [17:30:56] oh danke [17:31:03] pushing a key to continue [17:31:24] gwicke: nice [17:31:47] gwicke, you're seriously considering Cassandra? [17:32:53] cool, doing [17:32:56] yeah i wanted to ask about that too [17:33:04] cassandra who wha? i want who wha? [17:33:10] gwicke, for external storage? [17:33:35] https://www.mediawiki.org/wiki/Extension:Cassandra ;) [17:33:47] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 17:33:45 UTC 2013 [17:34:01] I did that 3 years ago;) [17:34:47] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:36:56] gwicke, but now I want to do ES via DataStore and just have a generic DS class for Cassie:) [17:37:16] OR MONGO, IT'S WEBSCALE! [17:37:55] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [17:42:45] RECOVERY - Puppet freshness on mw1046 is OK: puppet ran at Wed Sep 25 17:42:43 UTC 2013 [17:44:04] but is it cloudscale MaxSem ? [17:45:24] LeslieCarr, WEB CONSISTS OF CLOUDS SO YES! [17:52:09] MaxSem: we are considering to use it for Parsoid HTML/JSON etc initially, and if that works well we could use it for wikitext too [17:56:40] MaxSem: we'll create a REST storage API for it, maybe that could be interesting as a backend for your key/value storage proposal [17:57:15] gwicke, so you're going to abstract away Thrift? [17:57:26] we are using CQL [17:57:47] but yes, both Cassandra and CQL/Thrift will be hidden [17:58:11] meh, I last looked at it 3 years ago:) [17:58:40] there are some nice new features in cassandra 2 [17:59:09] http://www.datastax.com/documentation/cql/3.1/webhelp/index.html [18:00:20] manybubbles/^d: did you see the new ES official client apis? [18:00:46] paravoid: I don't believe so. let me look [18:01:20] http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/#php [18:03:50] cool! this isn't great timing because we just made an elastica mediawiki module for other folks to use. [18:04:24] I'll certainly investigate them though. [18:05:22] heh [18:05:57] <^d> Whoops. [18:05:59] I can imagine [18:06:09] <^d> Was going to say the same thing as manybubbles. We'll definitely take a look. [18:06:41] ^d and paravoid: they are pretty thin wrappers, it looks like. [18:06:50] <^d> I imagine so. [18:06:56] all I can think of when I see elastica: http://www.youtube.com/watch?v=ilKcXIFi-Rc [18:07:13] <^d> Having one in python is kinda nice though. Would make it easy to write like wmf-specific maintenance scripts. [18:07:29] <^d> Like "delete an index" or somesuch stuff.
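[Editor's note: the "delete an index" maintenance task ^d mentions just above really is a one-liner against Elasticsearch's REST API, with or without the new official clients; the host and index name below are made up for illustration.]

    # Hypothetical example: drop a search index over the REST API
    curl -XDELETE 'http://localhost:9200/some_wiki_index'

The official Python client wraps the same call (roughly Elasticsearch(...).indices.delete(index='some_wiki_index')), mostly adding connection handling and failover on top of the raw HTTP API.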
[18:08:17] ^d: curl and shell is enough for some of that, but yeah. There has always been a python client. they just support this one. [18:09:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:09:53] ^d: it looks like the new perl client is by the same guy that made the old one. I've seen lots of good things about the old perl client. [18:10:28] I think none of them are new [18:10:34] they're just blessing them now [18:10:57] paravoid: the perl one is actually a rewrite, it says [18:11:03] ah [18:11:08] okay, I should clearly read more carefully [18:11:54] paravoid: that one has a license change though. GPL/Artistic to Apache [18:15:22] manybubbles, since I haven't started poking at elastica, you may as well switch to something else:) [18:16:16] errrrg, third-party dependencies [18:16:17] MaxSem: I'll add it to my list! Let me know when you start looking at stuff. You may as well start with the official client. [18:16:47] WTF is Monolog? [18:17:00] and Pimple... [18:17:44] we don't need composer everywhere... [18:20:34] hmm, MaxSem - any idea when geodata search will be available on wikidata? [18:20:38] or if they're even working at it? [18:20:47] or will Extension:GeoData need to provide it for the foreseeable future? [18:21:13] YuviPanda|train, no timeline afaik. I'm waiting for them [18:21:22] MaxSem: but is it at least on the cards? [18:21:53] ask them!:) [18:26:46] MaxSem: booorrring :P [18:27:32] * MaxSem sends YuviPanda to the Barrels o' Fun D2 level:P [18:29:44] YuviPanda: sounds like you should be talking to aude [18:29:54] just curious, jeremyb. [18:30:04] aude: ! [18:34:04] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 18:34:01 UTC 2013 [18:34:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:37:33] (03PS1) 10Cmjohnson: adding 1 more file for decom of sq45 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86009 [18:38:10] (03CR) 10Cmjohnson: [C: 032] adding 1 more file for decom of sq45 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86009 (owner: 10Cmjohnson) [18:41:04] hmm, mgmt ips for analytics1003 and 1004 don't resolve [18:41:13] they're still up, I can ssh to the IPs [18:41:23] but the mgmt addresses don't resolve anymore [18:41:25] hm [18:45:22] * hexmode grumbles about mailman [18:46:03] who can help me replace a listadmin pwd? [18:47:47] (03PS1) 10Jdlrobson: Replace Watchlist specific schema with generic ClickTracking schema [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86011 [18:47:47] hrm... nm?? I think it worked? [18:48:36] oh, no. [18:48:39] ugh. [18:54:33] hexmode: ops+philippe [18:54:54] hexmode: can you create a rt ticket? [18:55:13] cmjohnson: yes... [18:56:05] ottomata: check your forward dns entries [18:57:01] ? [18:57:20] check to make sure the mgmt entries are in wmnet [18:57:38] oh yeah, they are in the configs ,just not resolving [18:57:59] want me to take a look? 
[18:58:10] sure [18:58:23] dig -t any analytics1003.mgmt.eqiad.wmnet has nothing [18:58:33] ost analytics1003.mgmt.eqiad.wmnet [18:58:33] Host analytics1003.mgmt.eqiad.wmnet not found: 3(NXDOMAIN) [18:58:55] yeah i see that [19:02:15] https://rt.wikimedia.org/SelfService/Display.html?id=5837 [19:05:21] ottomata: the entries for an1003/4.mgmt are missing in wmnet [19:05:31] i can fix since I am there now or you can do ...whatever [19:05:51] bwerrrr [19:06:45] ok thanks, i must have taken them out, sorry, i saw that the mgmt entries were in the .arpa file, just assumed they were where they were supposed to be [19:06:53] i must have taken them out when I was removing the regualr .eqiad addies [19:06:56] thanks cmjohnson [19:07:18] (03PS1) 10Cmjohnson: adding forward dns entries for analytics1003/4 mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/86013 [19:07:38] yeah..the 10.65.x.x are mgmt in eqiad just leave those alone [19:08:24] ottomata: plz review https://gerrit.wikimedia.org/r/#/c/86013/ [19:09:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:10:03] cmjohnson: that file looks like it uses tabs instead of spaces [19:10:05] aside from that lgtm [19:10:17] oh also [19:10:25] the WMF… entry [19:10:38] not really sure what those are [19:11:44] those are the asset tags associated with the server. when they're first racked sometimes we don't know the names yet but we need mgmt ips so we name them after the asset tag [19:11:50] nothing critical there [19:11:59] and they are spaces...looks right in vim [19:12:11] (03CR) 10Cmjohnson: [C: 032] adding forward dns entries for analytics1003/4 mgmt [operations/dns] - 10https://gerrit.wikimedia.org/r/86013 (owner: 10Cmjohnson) [19:13:26] !log authdns update [19:13:42] Logged the message, Master [19:14:12] omg ciscos [19:14:18] why are they so annoying [19:14:30] hahaha...they're irritating [19:14:37] just rebooting is a 10 minute process [19:14:52] yeah, and the pxe boot order never seems to work consistently [19:14:54] have to catch it [19:16:32] i thought i reinstalled these already aggghhhh, ok one more try [19:19:24] yea the boot order is annoying as the one time options seem to not work [19:19:34] plus there is no manual 'skip memory test' option [19:19:51] which would eliminate 11 of the 15 min post. [19:33:27] (03PS1) 10Ryan Lane: Pull modules/returners/pillars when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [19:33:44] oh no, i am being pinged :) [19:34:50] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 19:34:42 UTC 2013 [19:35:27] YuviPanda: MaxSem jeremyb manybubbles maybe geodata / elastic / wikidata stuff is something we can poke at when i come to SF in october [19:35:44] ping!! [19:35:47] i figured out how to make wikibase and the geodata extension / solr work together [19:35:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:36:03] i don't think elastic / wikidata would be terribly diffitul [19:36:06] difficult [19:36:21] cool. I'm not planning on being in SF in October, but I'd love to work with you [19:36:52] manybubbles: awww, well ^d is around and it might be something i can focus on then [19:37:00] yeah! [19:37:10] manybubbles: where are you normally? 
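[Editor's note: the missing records fixed above in https://gerrit.wikimedia.org/r/86013 are plain forward A records in the wmnet zone for the eqiad management network (10.65.x.x). A sketch of what such entries look like follows; the final octets are placeholders, since the real addresses are not quoted in this log.]

    ; Hypothetical forward entries in the wmnet zone file (BIND syntax)
    $ORIGIN mgmt.eqiad.wmnet.
    analytics1003   1H  IN A    10.65.2.13   ; placeholder address
    analytics1004   1H  IN A    10.65.2.14   ; placeholder address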
[19:37:13] i need to get up to speed on elastic, install it play with it [19:37:18] Raleigh, North Carolina [19:37:19] <^d> aude: Yes :) [19:37:22] oh, right [19:37:34] desparate need for wikidata, generally, also to improve our search! [19:38:19] (03PS2) 10Ryan Lane: Pull modules/returners/pillars when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [19:39:10] (03PS1) 10Ottomata: Removing analytics100[34] from decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/86017 [19:39:16] manybubbles: and i am impressed that foursquare uses elastic search [19:39:21] so it must be good for geo [19:39:23] (03CR) 10Ottomata: [C: 032 V: 032] Removing analytics100[34] from decom.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/86017 (owner: 10Ottomata) [19:43:22] \o/ I think I just eliminated the last manual step for deployment targets in git-deploy :) [19:43:57] Ryan_Lane: what's the timeline for git-deploy? [19:44:01] I guess I should also make them do a deployment [19:44:04] when might we try it again? [19:44:18] aude: platform is targeting next quarter [19:44:21] oooh [19:44:22] ok [19:44:25] or is it now this quarter? [19:44:40] maybe on tuesday? [19:44:46] or whenever october starts [19:44:54] * Ryan_Lane nods [19:45:28] I'm not fully working on it. just pushing in things to improve the system when I have time [19:45:42] ok, that's fine [19:46:11] it's just something i should try to see what it does / how it works [19:51:48] hexmode: i enabled your mail subscription and sent you your password to MediaWiki-distributors [19:52:33] cmjohnson: that isn't the pw for /admindb/ :( [19:52:49] which is what I was looking for [19:53:25] i know but I don't see you listed as an admin...I need to check with someone else [19:53:32] I get emails almost every day for spammy senders to the list, but I cannot defer/discard them [19:53:47] cmjohnson: let me see what email is admin [19:56:18] hexmode: i found you. [19:56:30] :) [20:02:44] (03PS3) 10Ryan Lane: Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [20:03:58] (03PS4) 10Ryan Lane: Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 [20:06:00] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 20:05:59 UTC 2013 [20:06:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:11] greg-g: I'm I good to deploy CentralAuth now? I'm assuming manybubbles is done with the Cirrus deploy [20:08:48] csteipp: I had a Cirrus deploy? [20:09:16] "CirrusSearch to closed wikis" 11-12p (pdt) [20:09:41] Although I think chad said you guys weren't doing more of those... so maybe it was canceled? [20:11:45] csteipp: we did that yesterday I think [20:13:14] manybubbles: Cool. In that case I'll just deploy :) [20:18:56] (03CR) 10Ryan Lane: [C: 032] Pull dependencies and deploy all when targeted [operations/puppet] - 10https://gerrit.wikimedia.org/r/86014 (owner: 10Ryan Lane) [20:28:49] akosiaris: mind if I add a new kafka .deb to apt? 
[20:28:52] !log csteipp synchronized php-1.22wmf18/extensions/CentralAuth 'update to master for SUL fix' [20:29:03] Logged the message, Master [20:29:06] i'm building it now with an udpated version num [20:31:43] !log csteipp synchronized php-1.22wmf17/extensions/CentralAuth 'update to master for SUL fix' [20:31:54] Logged the message, Master [20:33:50] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 20:33:47 UTC 2013 [20:34:50] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:32] ottomata: feel free [20:39:36] So… resource is defined like this: gitclone { 'integration/zuul': [20:39:43] (where 'gitclone' is a definition, not a class) [20:39:56] When referred to later, like this: subscribe => gitclone['integration/zuul'], [20:40:15] should it be capitalized like subscribe => Gitclone['integration/zuul'] ? [20:40:24] Ryan_Lane, any idea? Anyone else? [20:40:46] <^d> Does gitclone suck less than git::clone? [20:41:07] yes [20:41:11] Gitclone [20:41:23] and if it was something that was namespaced [20:41:25] Git::Clone [20:42:28] ^d: gitclone is tentatively a copy/paste of git::clone into a module [20:42:45] ottomata, so, doesn't matter that it's a define and not a class? [20:43:00] nope, you can treat a define basically like a custom resource [20:43:09] so if you were doing a file [20:43:12] ^d: If you want us to use vcsrepo instead, then… join the fray https://gerrit.wikimedia.org/r/#/c/74099/ [20:43:15] subscribe => File['blabla'] [20:43:18] same as [20:43:19] ottomata, ok, cool. Thanks. [20:43:25] subscribe => My::Define['woohoo'] [20:44:32] andrewbogott: what module? [20:44:42] um… 'gitclone' [20:44:58] i don't see it, [20:45:06] That's because I'm making it now [20:45:11] oh ha [20:45:12] ok [20:45:23] faidon will say no to using a define in a module's init.pp, btw [20:45:51] really? [20:46:04] The alternative is to have a 'git' module that has no init at all. [20:46:08] That's preferred you think? [20:46:20] http://docs.puppetlabs.com/puppet/2.7/reference/modules_fundamentals.html#example [20:46:23] yeah that's fine [20:46:32] i do that for the cdh4 module [20:46:55] Hm… I guess that makes for less renaming anyway. [20:46:59] yeah! [20:47:05] Feels weird, but, *shrug* [20:47:23] also a bit nice in case we want to have more git module features, buuuut then again, i betcha there are a billion git puppet modules already out there [20:47:23] but ja [20:47:25] whaevvva [20:47:28] * andrewbogott git reset --hards [20:47:36] dear.lord. [20:48:03] ok running home for some more workey time, back in a bit [20:48:21] twkozlowski ? [20:48:36] (03CR) 10Odder: [C: 04-1] "Not the right way to do this, and you have no way to ensure this gets reverted on October 31." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85960 (owner: 10TTO) [20:48:48] andrewbogott: ^^ [20:49:18] hm [21:00:04] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [21:00:36] kback [21:19:24] heya paravoid hi? [21:19:32] looking into using ferm module to firewall off public kafkas [21:19:50] i think i want to use R_SERVICE, should I add a define::ferm::r_service? [21:20:16] or maybe just an extra arg to ferm::service [21:20:28] and use r_service if $srange is defined [21:43:54] Krinkle screen re-attached? [21:44:22] I'm not krinkle|detached, so yes :). that nickname is automatically changed when I disconnect from the bouncer. 
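[Editor's note: the capitalization rule ottomata spells out above — declare a define with its lowercase name, reference it with the name capitalized, and capitalize every namespace segment — in a self-contained Puppet sketch. The parameter and service names are invented for illustration, not the real integration/zuul manifests.]

    # Declaring uses the lowercase define name:
    gitclone { 'integration/zuul':
        directory => '/srv/zuul',    # hypothetical parameter
    }

    # Referring to it elsewhere capitalizes the define name:
    service { 'zuul':
        ensure    => running,
        subscribe => Gitclone['integration/zuul'],
    }

    # If the define were namespaced as git::clone, the reference would be
    # Git::Clone['integration/zuul'] -- every segment capitalized.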
[21:45:08] ah [21:45:16] so I was thinking [21:45:22] perhaps we can semi-revive the thing [21:45:32] we could reserve ranges for projects [21:46:04] PPP -> XYZ each 2 units of X can refer to a project [21:46:31] thats 2592 projects per family which should be more than enough [21:46:56] (03PS1) 10Andrew Bogott: Moved git::clone into a new, skeletal 'git' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 [21:46:57] (03PS1) 10Andrew Bogott: Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 [21:46:58] (03PS1) 10Andrew Bogott: Move and rename the (currently unused) gitconfig module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 [21:47:47] ToAruShiroiNeko: Why bother though? [21:48:05] so 2xx and 3xx would be wikipedia etc [21:48:11] well it has the advantage of being readable [21:48:20] you can tell you are going to wikipedia [21:48:29] It's going to be base36 int, so not very readable [21:48:42] its code [21:48:50] people can tell they are going to commons [21:48:58] hence a possible penis file [21:49:00] We can't make the PPP or XYZ deterministic either, since some subdomains are not real language codes. So we'll need a map of each individual subdomain anyway. [21:49:18] subdomains? [21:49:25] you mean en in en.wikipedia [21:49:27] commons, en, nl, meta [21:49:28] right? [21:49:30] Yes [21:49:31] No [21:50:01] commons is not a family, so that wouldn't have a recognizable prefix unless you know its second and third digit. [21:50:03] see I would group commons, meta into one 1296 group [21:50:13] its not a family in the real sense [21:50:14] which you'll have to know in both PPP and FPP (family, pp) [21:50:35] yes, family=special (special.dblist is a database list of *.wikimedia.org list that aren't chapterwikis) [21:50:49] I would group chapter wikis too [21:50:51] + mediawiki.org I think [21:50:52] say Cxx [21:50:58] C for chapter [21:51:07] yes, they are already grouped as families, this is not something we'd invent [21:51:11] Bxx for background [21:51:16] what? [21:51:51] commons and meta would be in family "special", which means to know that its commons you'll need to know the 2 digits after the family id. [21:52:16] Which is the case in both FPP and PPP, so no advantage there. [21:52:32] Also, commons shadows through local wikis, so you can link to any file on commons through any domain name [21:52:48] zh.wiktionary.org/wiki/File:omg works likewise [21:53:21] What does "background" mean? [21:53:47] Krinkle well [21:53:56] background like mediawiki wiki [21:53:58] bugzilla [21:53:59] etc [21:54:32] I suppose you are right but I think grouping would make things more orderly [21:54:45] there is already a family grouping in place in wikimedia configuration, so we don't need to create a new one. [21:54:48] its a simple task while generating the map [21:54:55] but we can use that, yes. [21:55:17] I especially want to classify chapter wikis differently for example [21:55:29] I want to avoid giving en wikipedia 001 designation as well [21:55:44] 00 range perhaps shouldnt be used [21:55:52] we can start form 100 [21:56:23] we can do clever short cuts [21:56:42] like COMPPPPPP for commons [21:57:00] or NEWPPPPPP for wiki news [21:57:09] er [21:57:23] NXXPPP [21:57:27] N for news [21:57:34] WXXPPP for wikipedia [21:57:51] (03PS2) 10Andrew Bogott: Moved git::clone into a new, skeletal 'git' module. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 [21:57:52] (03PS2) 10Andrew Bogott: Move and rename the (currently unused) gitconfig module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 [21:57:53] (03PS2) 10Andrew Bogott: Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 [21:58:03] it goes back a little to your plan [21:58:18] Krinkle it being 25 characters or less is the objective for QR [21:58:29] if we have an extra char we can use it [21:59:34] (03CR) 10Andrew Bogott: "Hashar, I don't see this used anyplace -- can I move it with impunity, or are there labs instances that depend on the old name?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 (owner: 10Andrew Bogott) [22:01:10] (03CR) 10Andrew Bogott: "Leslie and Chad, added you as reviewers because this patch murders a class that y'all have worked on. If this needs preserving please spe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:02:03] ToAruShiroiNeko: so if zh.wikipedia.org is family=wikipedia, #99, it'd be w2rXXXXX (where base_convert(99, 10, 36) == '2r') [22:02:10] is that what you mean? [22:02:15] (03CR) 10Andrew Bogott: "A more Ori-friendly approach to this is in the patchset beginning here: https://gerrit.wikimedia.org/r/#/c/86033/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/74099 (owner: 10Andrew Bogott) [22:02:20] Krinkle yeah [22:02:36] I can immediately tell I am going to A wikipedia [22:02:39] murderer!!!!!! [22:03:05] When did jenkins-bot start +2ing things? [22:03:53] * andrewbogott briefly looks up at LeslieCarr, resumes murdering [22:04:17] andrewbogott: oh my god [22:04:23] tell me that's not soooooo much better [22:04:24] ? [22:04:27] ToAruShiroiNeko: For any special wiki that has a full 3-character short key, that means the first letter of that is no longer available as a family code [22:04:37] e.g. if commons gets 'COM', we can't use 'C' for chapterwikis [22:04:43] andrewbogott: i'm just happy with it, is all :P [22:04:50] ori-l, hard to argue apart from imaginary future use conditions :) [22:04:53] or shouldn't (we can program it to skip 'OM', but that would be confusing for reading the url as well) [22:05:25] (03CR) 10Lcarr: [C: 032] Moved git::clone into a new, skeletal 'git' module. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86033 (owner: 10Andrew Bogott) [22:05:36] this patch does the job in 106 lines, other patch was 2302(!) [22:05:36] Krinkle right [22:05:44] so maybe SCO [22:05:48] Special - Commons [22:06:01] point is commons shouldnt be some random string [22:06:10] SMW - for mediawiki wiki [22:06:19] or semantic mediawiki :P [22:06:22] or maybe TMW for Techinal - Mediawiki [22:06:27] (03CR) 10Chad: [C: 031] Remove the unused git::init define [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:06:33] (03CR) 10Lcarr: [C: 032] "despite the fact that you're murdering the (horrible) generic-definitions.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86034 (owner: 10Andrew Bogott) [22:06:44] TMZ should probably redirect to WMF wiki :p [22:07:44] (03CR) 10Lcarr: [C: 031] "I like the new name" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86035 (owner: 10Andrew Bogott) [22:08:41] Krinkle the community probably should decide what goes to where [22:09:08] ToAruShiroiNeko: I'd recommend not spending too much time on the implementation of the PPP (or FPP). 
The RFC is still quite elaborate, and we might go for full url shortening instead. [22:09:36] true but this is something to consider during implementation phase [22:09:52] with 46,656 possibilities we dont have to be THAT careful in wasting slots [22:10:52] otherwise all redirects will start with a 0 or a 1 [22:10:59] we dont have 1296 wikis [22:11:40] Krinkle mind that the code on wiki should only generate the PPPPPP part [22:11:58] err [22:12:00] CCCCCC part [22:12:13] ammending PPP in front which would be same for the entire wiki [22:12:33] no point in including it on the code table as it would be 3 always same characters [22:13:26] 'the code' is likely a mediawiki extension written in php, generic for all wikis at once, not per wiki. It would be reading in the wikimedia cluster configuration, getting the current wikis's prefix and the current page's shorturl pageid base36 it and lastly the short domainname itself, and generate the url. [22:13:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:05] nothing hardcoded anywhere [22:24:58] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [22:32:31] (03CR) 10Ryan Lane: [C: 032] Remove stupid package I don't need [operations/puppet] - 10https://gerrit.wikimedia.org/r/85240 (owner: 10Chad) [22:35:58] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 22:35:53 UTC 2013 [22:36:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:31] (03PS2) 10Ryan Lane: Add 'gdash' to git-deploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/84193 (owner: 10Ori.livneh) [22:44:36] (03CR) 10Ryan Lane: [C: 032] Add 'gdash' to git-deploy [operations/puppet] - 10https://gerrit.wikimedia.org/r/84193 (owner: 10Ori.livneh) [22:46:52] (03CR) 10Ryan Lane: "Any update on this? It's sitting in my review queue." [operations/puppet] - 10https://gerrit.wikimedia.org/r/62336 (owner: 10QChris) [23:04:58] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 23:04:57 UTC 2013 [23:05:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [23:09:56] (03PS1) 10Ori.livneh: Applying misc::graphite::gdash makes host a gdash deployment target [operations/puppet] - 10https://gerrit.wikimedia.org/r/86050 [23:10:02] Ryan_Lane: ^ [23:10:19] Krinkle we can do with just 2 digits then [23:10:34] we have a little over 800 wikis after all [23:10:38] !log krinkle synchronized php-1.22wmf18/extensions/TemplateData 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:10:46] I think we should mproperly take advantage of the 3 digits :) [23:10:50] Logged the message, Master [23:11:12] I could create you a map if you like [23:11:23] ToAruShiroiNeko: It doesn't seem improbably that we'll never have more than 36*36 wikis though. We can't easily extend this [23:11:28] no need for a map now [23:11:36] mop then? 
:p [23:11:55] you seen my idea above W, for wikipedia N for news maybe D for wiktionary [23:12:11] B for books [23:12:17] yes, FPP [23:12:26] !log krinkle synchronized php-1.22wmf18/extensions/VisualEditor 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:12:36] C for chapters, what's tricky is projects like commons [23:12:39] Logged the message, Master [23:17:44] !log krinkle synchronized php-1.22wmf17/extensions/VisualEditor 'I3653efbe7b6841c70f352da526f4946fa0a4c76f' [23:17:58] Logged the message, Master [23:18:51] Ryan_Lane: Cool commit summary bro https://gerrit.wikimedia.org/r/85240 definitely the most descriptive commit message of all time ;) I'm confident we will never see another commit with that commit message ever again, it's so unique. I know exactly what that commit does just from reading the message [23:18:53] [23:19:10] (Sorry, our team is really anal about enforcing descriptive commit message so this one cracked us up) [23:21:26] !log krinkle Started syncing Wikimedia installation... : [23:21:37] Logged the message, Master [23:21:53] RoanKattouw: that was not cool [23:22:10] also, it wasn't even Ryan's changeset [23:22:35] Ryan approved it [23:23:14] that is not a reason at all to be mean and snarky [23:23:33] It definitely deserves a place on our virtual commit message wall though [23:24:26] Hm.. scap has gotten a lot more verbose in its output (tells me about each sync). I hope failed syncs still stand out though? [23:24:52] e.g. dead servers that we never unlist for various reasons [23:25:09] nope, it doesn't have them stand out or aggregate them at the end [23:28:23] Krinkle: No, scap is terrible. [23:28:54] Krinkle: Hence the demand to fix things. [23:29:30] it wasn't like this always. Less than 1 or 2 months ago it was just idle in output during the main sync phase, except for servers it couldn't sync to [23:32:10] !log krinkle Finished syncing Wikimedia installation... : [23:32:21] Logged the message, Master [23:34:48] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Wed Sep 25 23:34:40 UTC 2013 [23:35:28] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [23:41:04] LeslieCarr: Oh whoops yeah that was Chad's change. I'm sorry that was overly snarky, I meant that as poking fun but I didn't do that very well [23:53:20] scap is not terrible
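[Editor's note: a worked sketch of the family + base36 short-key idea discussed earlier in this log ("w" for a wikipedia, wiki #99 encoding to "2r", page id in base36). It assumes PHP, as the conversation does; the family letter, padding width and wiki number here are illustrative assumptions, not an agreed mapping.]

    <?php
    // Hypothetical FPP-style encoder: family letter + 2-char base36 wiki number
    // + base36 page id.
    function shortKey( $familyLetter, $wikiNumber, $pageId ) {
        return $familyLetter
            . str_pad( base_convert( $wikiNumber, 10, 36 ), 2, '0', STR_PAD_LEFT )
            . base_convert( $pageId, 10, 36 );
    }

    echo shortKey( 'w', 99, 1234567 );  // "w2rqglj" -- 99 encodes to "2r", as quoted above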