[00:05:19] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) The memory_limit set in php-fpm config is 500M. When trying to curl from restbase1017 to wtp1025 the error shows a limit of 660M though. ` [restbase1017:~] $ curl -H "Host: en.w...
[00:47:46] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) Ok, wrong curl command to actually talk to wtp1025 and not the cluster while avoiding cert issue. This works: ` curl --header "Host: en.wikipedia.org" --resolve 'parsoid-php.dis...
[00:57:09] The Syrian war template page can't be fixed even with much higher memory_limit, the other example URL though can be fixed if it would be raised to about 980M.
[00:57:22] wtp1025 reverted live hack and pooled again
[03:04:18] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) Let us not bump the memory limit quite yet. I am curious to see how many instances of these we run into and we can then test these pages on scandium and determine what might be a...
[03:39:35] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) >>! In T236833#5617979, @ssastry wrote: > Let us not bump the memory limit quite yet. I am curious to see how many instances of these we run into and we can then test these pages...
[05:40:41] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Upgrade php-apcu to 5.1.18 - https://phabricator.wikimedia.org/T236800 (jijiki)
[06:02:19] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Upgrade php-apcu to 5.1.18 - https://phabricator.wikimedia.org/T236800 (Joe) Instead of upgrading to a later version, we can backport that patch to the version we're using instead.
[06:04:40] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Upgrade php-apcu to 5.1.18 - https://phabricator.wikimedia.org/T236800 (Joe) We're on 5.1.17 so backporting the patch should be pretty simple.
[06:14:57] serviceops, Core Platform Team, MediaWiki-Cache, Performance-Team (Radar): Ensure apcu incr/decr are atomic - https://phabricator.wikimedia.org/T236800 (jijiki)
[06:38:58] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Joe) @WDora...
[08:19:56] serviceops, Operations, observability, Patch-For-Review, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (Joe) In the case of HHVM, fatals were correctly handled by the daemon and...
[09:33:37] serviceops, Operations, observability, Patch-For-Review, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (fgiunchedi) >>! In T234283#5618330, @Joe wrote: > We need to do one of th...
[11:17:24] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog, and 2 others: Resolve service instability due to excessive event loop blockage since starting PCS response pregeneration - https://phabricator.wikimedia.org/T229286 (jijiki) @Mholloway Can we mark this...
[12:13:21] serviceops, Mobile-Content-Service, Page Content Service, Product-Infrastructure-Team-Backlog, and 2 others: Resolve service instability due to excessive event loop blockage since starting PCS response pregeneration - https://phabricator.wikimedia.org/T229286 (Mholloway) Open→Resolved
[12:21:50] serviceops, Operations, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Allow testing of feature-flag-protected features in deployment-charts CI - https://phabricator.wikimedia.org/T236899 (Joe)
[12:22:01] serviceops, Operations, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Allow testing of feature-flag-protected features in deployment-charts CI - https://phabricator.wikimedia.org/T236899 (Joe) p:Triage→High a:Joe
[12:29:47] <_joe_> hey if anyone has opinions on my proposed structure for adding "tests" to our helm charts, please comment there ^^
[13:46:50] puppet runs on deploy* are printing a warning that /etc/hhvm cannot be removed, which is caused by a fatal-error.php in there, is that known issue/addressed by some of the ongoing patches?
[13:53:17] moritzm: that is me
[13:53:32] I will remove fatal-error.php in a new patch
[13:53:50] how come you all found out today?
[13:53:54] :D
[13:54:14] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/547079/
[13:55:27] I will actually wait for whatever is up with puppet
[14:13:23] serviceops, DBA, Operations, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo) Some jobs are getting stuck (single xml backups) for some issue complaining about the sd daemon. Even cancel gets stuck (expe...
[14:25:27] <_joe_> effie: this kind of issues are part of the "I told you" about puppet absenting resources
[14:30:26] _joe_: I know
[14:30:33] you also said you wouldn't say I told you so
[14:30:49] :d
[14:30:50] :D
[14:32:11] <_joe_> I said "I told you"
[14:32:12] <_joe_> :P
[14:33:54] haaha
[15:20:45] serviceops, Mobile-Content-Service, Operations, Page Content Service, and 4 others: New Service Request: wikifeeds - https://phabricator.wikimedia.org/T223469 (akosiaris) Open→Resolved a:akosiaris This is done. wikifeeds has been deployed for some time now
[16:46:11] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (dbarratt) As far as UX is concerned... The HTTP request should **not** happen on page load...
[16:52:50] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Joe) Hi, I assumed the fetching of such data would happen via an async job indeed, upon ima...
[16:56:29] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Joe) >>! In T236797#5620179, @dbarratt wrote: > As far as UX is concerned... > > The HTTP...
[16:57:45] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (dbarratt) >>! In T236797#5620272, @Joe wrote: > I would assume that running in the client w...
[17:08:47] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) The HTTP requests for labels happen asynchronously in a deferred update on uploa...
[17:27:40] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Joe) >>! In T236797#5620320, @Mholloway wrote: > The HTTP requests for labels happen asynch...
[17:29:02] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Anomie) a:Anomie→None I'm not actively working on...
[17:34:34] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) @joe That sounds good, thanks. I'll update the code accordingly.
[17:41:22] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Joe) Basically what you need is: - a setting for the domains to exclude from proxying to -...
[18:05:46] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway) While investigating this yesterday I found that the Google client library that o...
[18:11:20] serviceops, Machine vision, Operations, Product-Infrastructure-Team-Backlog: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Joe) So after some quick grepping, we already define a proxy in `mediawiki-config`, and it...
[18:14:18] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (kaldari) >After filtering out *wiktionary ns0 and ns1, loo...
[18:35:21] dear serviceops: a heads up; echo seen-time storage just went live. kthnxbye. Love, Me
[18:37:20] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[18:38:51] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn) RT (requesttracker) moved from jessie and public IP (ununpentium) to buster and private IP (moscovium) and https to backend via https://rt.discovery.wmnet
[18:57:51] serviceops: Outdated Blubber package 0.6 in repo - https://phabricator.wikimedia.org/T236942 (holger.knust)
[19:09:57] serviceops, Operations, Patch-For-Review: decom cobalt - https://phabricator.wikimedia.org/T236187 (Dzahn)
[19:36:06] dear urandom: serviceops is almost completely people in europe who are now not here. there's two exceptions, and one of those is a new person. :-P :-P
[19:36:29] * apergos is also not here. (9:30 pm, dinner time!)
[19:38:32] apergos: I know, but if shit happens and people come running back, I'm sure they'll see my message :)
[19:38:44] :-P :-P :-P :-P
[19:42:44] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Anomie) >>! In T219279#5620704, @kaldari wrote: >>After fi...
[19:50:00] <_joe_> urandom: that's great
[19:50:08] <_joe_> urandom: are we still reading from redis?
[19:50:37] _joe_: it falls back on a miss, yeah
[19:59:37] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (Arlolra) Alas, we're not quite there for beta, ` 19:53:36 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'parsoid/deploy', '-g', 'default', 'pro...
[20:04:14] _joe_: looks good, I think
[20:04:21] _joe_: ~1k reqs/sec
[20:05:19] the latency distribution is great IMO, with a mean latency of < 2ms and a 99p of < 4ms
[20:06:27] I doubt we need so many pods
[20:06:32] Pchelolo, when are you deploying that restbase patch to fix the req. rate issue?
[20:08:04] <_joe_> subbu: what's the req rate issue?
[20:08:21] <_joe_> urandom: probably, but melius abundare quam deficere
[20:08:29] _joe_, let me find the phab task .. one moment
[20:09:04] _joe_: I had to look that up, but, +1 :)
[20:09:06] https://phabricator.wikimedia.org/T236838
[20:09:09] _joe_, ^
[20:11:36] subbu: will answer in 20 mins, in a meeting
[20:13:16] k
[20:14:14] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (wiki_willy) @jijiki - just following up to see if this is still an issue or if we can resolve this.
Thanks, Willy
[20:37:13] subbu: sorry, so yeah, I've been thinking about deploying it today together with logging requests that cause core to 412
[20:37:21] maybe in 30 minutes or so?
[20:37:32] sounds good.
[20:37:45] ok. I'll ping you once there's anything new to look at
[21:09:34] serviceops, Operations, Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (hashar) Eventually a few days after I noticed jobs were slower than expected and spend a couple days narrowing down. Tha...
[21:21:11] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) Okay, I have the full set of about 110 urls (that were failing with Parsoid/PHP with OOMs in the last 24 hours) and am ready to test these urls on scandium. @mutante, can you bum...
[21:22:40] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (ssastry) Some of those urls were being repeatedly retried (many 10s of times over the last 24 hours probably because of T236838) .. so, that is why only ~100 unique urls even though there...
[22:29:36] !log scandium - live hack /srv/mediawiki/wmf-config/InitialiseSettings.php - set wmgMemoryLimit to 850 (*1024 *1024), restart php7.2-fpm (T236833)
[22:30:33] mutante wrong channel :P
[22:31:39] paladox: yes :)
[22:32:15] well, didn't hurt to mention here
[22:34:44] serviceops, Operations, Parsoid-PHP: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Dzahn) > @Mutante, can you bump the limit to 850 MB on scandium? @ssastry I changed it to 850MB on scandium in MediawikiConfig as logged above. (Please use the @Dzahn user here on Ph...
[22:50:00] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (Dzahn) a:Joe→Dzahn