[03:04:28] serviceops, Operations, HHVM, Patch-For-Review, Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (Krinkle) >>! In T229792#5574497, @gerritbot wrote: > Change 539128 had a related patch set uploaded (by Effie Mouzeli; owner: Giuseppe Lavagetto):...
[05:32:56] serviceops, Growth-Team, Notifications, Operations, and 3 others: Provision Kask for Echo timestamp storage in k8s - https://phabricator.wikimedia.org/T234376 (Joe) Heh yes sorry, I forgot to tell you yesterday - you need to use `helmfile destroy` in newer versions of helmfile.
[12:15:29] serviceops, Operations, Performance-Team: Increased latency in POST requests - https://phabricator.wikimedia.org/T235755 (jijiki)
[12:15:33] serviceops, Operations, Performance-Team: Increased latency in POST requests - https://phabricator.wikimedia.org/T235755 (jijiki)
[12:15:37] serviceops, Operations, Performance-Team: Increased latency in POST requests - https://phabricator.wikimedia.org/T235755 (jijiki)
[13:18:55] serviceops, DBA, Operations, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo) The copy finished correctly and actually found a bug on transfer.py: ` ERROR: Original checksum c89dcd766fa3072718753b9ab0bdfb7d baculasd...
[13:58:44] serviceops, Growth-Team, Notifications, Operations, and 3 others: Provision Kask for Echo timestamp storage in k8s - https://phabricator.wikimedia.org/T234376 (Eevans) >>! In T234376#5582893, @Joe wrote: > Heh yes sorry, I forgot to tell you yesterday - you need to use `helmfile destroy` in newer...
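Note: jcrespo's update at 13:18:55 quotes a checksum mismatch that transfer.py reported after copying backup data. The following is a minimal sketch of that kind of post-copy verification, assuming MD5 digests (the 32-hex-digit value in the error is consistent with MD5, but the log does not say which algorithm transfer.py uses). `file_md5` and `verify_copy` are hypothetical helpers for illustration, not transfer.py's actual code.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large backup files never need to fit in memory."""
    digest = hashlib.md5()  # assumption: MD5, matching the 32-hex-digit checksum in the log
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original: Path, copy: Path) -> bool:
    """Return True if the copy's checksum matches the original's; report a mismatch otherwise."""
    original_sum = file_md5(original)
    copy_sum = file_md5(copy)
    if original_sum != copy_sum:
        print(f"ERROR: Original checksum {original_sum} does not match copy checksum {copy_sum}")
        return False
    return True


if __name__ == "__main__":
    # Demo on throwaway files: an identical copy passes, a two-byte swap fails.
    with tempfile.TemporaryDirectory() as tmp:
        src, good, bad = (Path(tmp) / name for name in ("src", "good", "bad"))
        src.write_bytes(b"backup payload")
        good.write_bytes(b"backup payload")
        bad.write_bytes(b"backup paylaod")
        print(verify_copy(src, good))  # True
        print(verify_copy(src, bad))   # prints the ERROR line, then False
```

Hashing in chunks keeps memory flat regardless of backup size; comparing digests rather than bytes means the original and the copy can be checksummed on different hosts.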
[14:32:11] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn)
[14:46:37] serviceops, Operations, service-runner, CPT Initiatives (RESTBase Split (CDP2)), and 5 others: RESTBase/RESTRouter/service-runner rate limiting plans - https://phabricator.wikimedia.org/T235437 (Joe) One detail I want to understand about the Redis hypothesis: - What happens if the rate-limiting...
[14:51:11] serviceops, Operations, service-runner, CPT Initiatives (RESTBase Split (CDP2)), and 5 others: RESTBase/RESTRouter/service-runner rate limiting plans - https://phabricator.wikimedia.org/T235437 (Pchelolo) > What happens if the rate-limiting service is unavailable or is lagging? In Change-Prop, w...
[17:15:55] serviceops, Release-Engineering-Team: Missing annotations for sync-wikiversions - https://phabricator.wikimedia.org/T235787 (jijiki)
[17:52:38] mutante, i am ready to run the perf tests today .. can you depool the eqiad server (wtp1025 i guess since that has the logstash routing enabled in puppet) in the next hour or two and let me know?
[18:01:19] <_joe_> subbu: {{done}}
[18:01:28] <_joe_> the server is now depooled
[18:01:33] thanks!, just got back online
[18:01:41] ty.
[18:17:01] so, looks like curl -x wtp1025.eqiad.wmnet:80 -v http://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/page/html/Hospet doesn't work ... -x scandium.eqiad.wmnet:80 of course works
[18:19:33] looking in logstash
[18:21:08] ah .. the symlinking part is missing ... this will need a fresh parsoid deploy since marko added it as a post-deploy scap task .. or a manual symlink for now.
[18:22:16] subbu: do you want me to create the link for right now?
[18:22:23] cannot do it manually of course .. yes. that would be helpful.
[18:22:25] which one is it on scandium
[18:22:34] scandium is fine.
[18:22:42] yea, i mean to compare what is missing
[18:22:52] so on wtp1025 ...
cd /srv/deployment/parsoid/deploy/src; ln -s ../vendor
[18:24:48] vendor: symbolic link to ../vendor
[18:24:56] deploy-service deploy-service 9 Oct 17 18:24 vendor -> ../vendor
[18:25:10] done. as deploy-service
[18:25:33] thanks .. works now.
[18:26:36] nice
[18:29:32] interesting .. scandium is faster than wtp1025 (for both parsoid/js & parsoid/php)
[18:30:08] maybe it is a newer server?
[18:32:26] Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (wtp1025) vs Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz (scandium) ... wtp1025 has more procs (40 vs 32) and more cache and cpu mhz is higher than scandium ..
[18:34:45] scandium is a Dell PowerEdge R440
[18:35:04] wtp1025 is a Dell PowerEdge R430
[18:36:57] scandium was bought July 2018, wtp1025 was bought March 2017
[18:38:03] ok. anyway, that tells me i shouldn't just look a /proc/cpuinfo numbers :)
[18:38:07] *at
[18:42:56] anyway, it does look like basic perf tests show that parsoid/php is indeed faster than parsoid/js for wt->html. i'll grab some lunch and do some more tests across multiple endpoints. parsoid/php is 3 weeks stale on that server .. but perf hasn't changed much in that timeframe. but we'll do a fresh parsoid deploy monday and i can rerun tests then.
[18:44:08] oh wow that's great, hope it holds up!
[18:44:16] :)
[18:46:14] apergos, it does .. we've known this for the last 2 months .. just verifying it is true on an actual prod server ... not fully sure why it is the case .. caching effects are probably one factor .. we might investigate at a later point.
[18:46:32] congrats, in any case :-)
[18:46:35] but html->wt is slower on larger doms.
[18:48:18] hrm
[18:48:30] job security? :-D
[18:50:22] job security?
[18:54:44] mutante, and eqiad servers are newer than codfw right?
[18:58:47] subbu|lunch: wtp2001 has purchase date Jan 2015 and is a Dell PowerEdge R420, yes, right
[18:58:51] <_joe_> I think the basic reason is that getting data over two tcp/ip hops is slower than one
[18:59:18] <_joe_> parsoid => mediawiki => data layer
[18:59:26] <_joe_> vs parsoid => data layer
[19:56:09] _joe_, maybe, but i am not convinced that is the entire answer because a lot of the time, parsoid/js is cpu bound and overlaps computation with i/o wait. if anything, the argument was that we are offloading computation work to the mediawiki cluster .. so as long as there isn't i/o wait .. that shouldn't be an issue.
[19:57:16] it is probably still one part of the answer but i think it would be useful to investigate carefully to see if that is the entire answer.
[19:58:16] parsoid/js parse.js script has the ability to record api responses to disk and replay them .. so that effectively eliminates network i/o time .. parse/php doesn't have this record/replay ability, but adding that would help us isolate factors
[19:59:49] I suspect the PHP's libxml C-based dom impl might also be a factor since dom work is ~30% of wt->html parse time usually. anyway, something for later. :)
[20:16:42] <_joe_> oh yes also nodejs model of execution is not really suited for a parser
[20:16:53] <_joe_> (actually one of my interview questions :P)
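Note: the record/replay idea at 19:58:16 is a general way to separate network I/O time from parse time. The following is a minimal Python sketch of the technique, assuming the parser receives its API-fetching function as a parameter. `RecordReplayFetcher`, `time_parse`, and the stand-in `slow_fetch`/`toy_parse` functions are hypothetical illustrations, not how parse.js actually implements it.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict

Fetch = Callable[[str], str]


class RecordReplayFetcher:
    """Record mode: call the real fetcher and keep every response.
    Replay mode: answer only from saved responses, so no network I/O happens."""

    def __init__(self, fetch: Fetch, store: Path, mode: str) -> None:
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.fetch = fetch
        self.store = store
        self.mode = mode
        self.responses: Dict[str, str] = (
            json.loads(store.read_text()) if mode == "replay" else {}
        )

    def __call__(self, url: str) -> str:
        if self.mode == "replay":
            return self.responses[url]  # KeyError if this request was never recorded
        body = self.fetch(url)
        self.responses[url] = body
        return body

    def save(self) -> None:
        """Write recorded responses to disk for later replay runs."""
        self.store.write_text(json.dumps(self.responses))


def time_parse(parse: Callable[[Fetch], object], fetcher: Fetch) -> float:
    """Wall-clock seconds for one parse using the given fetcher."""
    start = time.perf_counter()
    parse(fetcher)
    return time.perf_counter() - start


if __name__ == "__main__":
    def slow_fetch(url: str) -> str:
        time.sleep(0.05)  # hypothetical stand-in for a round trip to the MediaWiki API
        return f"response for {url}"

    def toy_parse(fetch: Fetch) -> list:
        # hypothetical stand-in for a parse that needs two API responses
        return [fetch("/api/siteinfo"), fetch("/api/page/Hospet")]

    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "responses.json"
        recorder = RecordReplayFetcher(slow_fetch, store, "record")
        print(f"record: {time_parse(toy_parse, recorder):.3f}s")
        recorder.save()
        replayer = RecordReplayFetcher(slow_fetch, store, "replay")
        print(f"replay: {time_parse(toy_parse, replayer):.3f}s")
```

Under these assumptions, timing the same input in replay mode on both implementations would leave mostly compute (including DOM work) in the measurement, which is the factor the discussion above wants to isolate.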