[03:33:24] <wikibugs>	 serviceops, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: 502 Server Hangup Error for "Upload a new version of this file" on Special:Upload on Commons - https://phabricator.wikimedia.org/T247454 (Rsteen) To @AntiCompositeNumber. As someone who regularly encounters this ty...
[09:16:46] <effie>	 _joe_: should I merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/636047 ?
[09:17:05] <effie>	 and to a cluster restart using systemctl reload?
[09:17:17] <effie>	 do*
[09:18:59] <_joe_>	 did you check what effect that change has?
[09:24:47] <effie>	 so in our case, it will further delay restarting php-fpm due to almost exhausting our max keys in opcache
[09:26:02] <effie>	 right now php-fpm will restart if we are 2000 keys away from opcache restarting
[09:26:50] <effie>	 so right now we restart php-fpm when one of the following conditions is met:
[09:27:08] <effie>	 1) we have less than 200MB of free opcache
[09:27:26] <effie>	 2) when apcu fragmentation is 95%
[09:28:25] <effie>	 3) when the current number of keys in opcache is ~30k
[09:29:22] <effie>	 if we change 3) to ~63k keys I anticipate that either
[09:30:06] <effie>	 a) we will never reach this number, if we have fewer php scripts to cache, so php-fpm will never restart due to that
[09:30:57] <effie>	 b) if we do reach this number, we will rarely have restarts due to this condition because conditions 1) and 2) will be met earlier
[09:43:51] <effie>	 both approaches are ok, but I think increasing the number is better in terms of fewer restarts and fewer requests killed
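The three restart conditions effie lists can be sketched as a small check. This is only an illustration of the logic described in the conversation: the names `FpmStats` and `restart_reasons` are hypothetical, the real check lives in the puppet change linked above and is not shown here, and the thresholds are the approximate figures quoted in the chat (200MB free opcache, 95% APCu fragmentation, ~30k keys now versus ~63k proposed).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FpmStats:
    """Hypothetical snapshot of the values the restart check looks at."""
    free_opcache_mb: float
    apcu_frag_pct: float
    opcache_keys: int


def restart_reasons(
    stats: FpmStats,
    max_keys: int = 30_000,       # condition 3; the proposal raises this to ~63k
    min_free_mb: float = 200.0,   # condition 1
    max_frag_pct: float = 95.0,   # condition 2
) -> List[str]:
    """Return the conditions that would trigger a php-fpm restart."""
    reasons = []
    if stats.free_opcache_mb < min_free_mb:
        reasons.append("low free opcache")
    if stats.apcu_frag_pct >= max_frag_pct:
        reasons.append("apcu fragmentation")
    if stats.opcache_keys >= max_keys:
        reasons.append("opcache keys")
    return reasons


# A host with plenty of free opcache and low fragmentation but 35k cached keys:
stats = FpmStats(free_opcache_mb=500.0, apcu_frag_pct=40.0, opcache_keys=35_000)
print(restart_reasons(stats))                    # restarts under the ~30k limit
print(restart_reasons(stats, max_keys=63_000))   # no restart under the ~63k limit
```

With the higher key limit, the same host only restarts once condition 1) or 2) trips, which is the behaviour anticipated in case b).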
[09:54:03] <wikibugs>	 serviceops, Push-Notification-Service, Product-Infrastructure-Team-Backlog (Kanban), User-jijiki: High latency on push notification service initialization - https://phabricator.wikimedia.org/T265258 (Jgiannelos) @jijiki Nothing comes in mind for that specific task, given that we still don't use i...
[10:44:09] <_joe_>	 I don't care much about having a bunch of requests failing every now and then, that is mostly covered by ats retries
[10:44:31] <_joe_>	 but I am worried about the effects on perf/stability of changing that setting
[10:44:54] <_joe_>	 I would frankly apply it on some canary hosts, and I would certainly wait until after the switchover is done
[10:53:42] <effie>	 _joe_: I have not come across any reference to this causing any issues
[10:54:21] <effie>	 _joe_: is there something specific on your mind?
[10:54:26] <_joe_>	 effie: I'm just saying it's better to do changes to php-fpm configurations with some caution
[10:54:37] <_joe_>	 and that includes not changing settings the day before a switchover
[10:54:43] <_joe_>	 but if you disagree, go on
[10:54:55] <effie>	 I am happy to wait until Thursday 
[10:55:16] <effie>	 I will do so then 
[11:10:19] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes: Test deployment-charts for kubernetes 1.19 compatibility - https://phabricator.wikimedia.org/T266032 (JMeybohm) a:JMeybohm My current plan is to build a kubeval deb and add a git repo with the needed kubernetes api schema. I think we don't need fancy auto-...
[11:14:19] <wikibugs>	 serviceops, Desktop Improvements, Operations, Product-Infrastructure-Team-Backlog, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (ovasileva)
[11:28:39] <akosiaris>	 I got an interesting question. I was playing a bit with the percentiles of etcd requests. Normally GETs are faster than QGETs (as one would expect), but only until ~p83 (per https://grafana.wikimedia.org/d/unWPiYtGz/etcd-tests?orgId=1). Around there, GETs become slower than QGETs, and by p99 the trend has entirely reversed. I am wondering which requests those are. Metrics don't have anything in them to indicate that, however.
[11:32:05] <_joe_>	 akosiaris: uhm interesting
[11:32:13] <_joe_>	 it might be large recursive requests
[11:32:28] <_joe_>	 normally we do qgets only for CAS operations
[11:32:38] <_joe_>	 quorum read -> CAS operation
[11:33:08] <_joe_>	 so they tend to be smaller requests and responses
[11:33:32] <_joe_>	 while for example `confctl select` will query a large amount of data to sweep through
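The "quorum read -> CAS operation" pattern _joe_ mentions can be illustrated with a toy in-memory store. This is not an etcd client and does not model quorum or replication; `ToyStore` and its methods are hypothetical names used only to show why a compare-and-swap needs an up-to-date read first: the swap succeeds only if the key's modification index still matches the one that was read.

```python
class ToyStore:
    """Toy key/value store with a per-key modification index (illustrative only)."""

    def __init__(self):
        self._data = {}   # key -> (value, modified_index)
        self._index = 0   # store-wide counter, bumped on every write

    def set(self, key, value):
        self._index += 1
        self._data[key] = (value, self._index)
        return self._index

    def read(self, key):
        """Return (value, modified_index); stands in for the fresh read done before a CAS."""
        return self._data[key]

    def compare_and_swap(self, key, new_value, prev_index):
        """Write new_value only if the key was not modified since prev_index was read."""
        _, current_index = self._data[key]
        if current_index != prev_index:
            return False  # someone else wrote in between; the caller must re-read
        self.set(key, new_value)
        return True


store = ToyStore()
store.set("pooled", "yes")

value, index = store.read("pooled")
print(store.compare_and_swap("pooled", "no", index))   # index is current, swap applies
print(store.compare_and_swap("pooled", "yes", index))  # index is now stale, swap is refused
```

Both the read and the swap touch a single small key here, which matches the observation that such requests and responses tend to be small compared with a large recursive GET.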
[13:35:21] <wikibugs>	 serviceops, User-jijiki: Create a structured testing environment for applications running on kubernetes - https://phabricator.wikimedia.org/T264025 (jijiki)
[14:19:03] <wikibugs>	 serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move dedcode/mwaddlink from github to gerrit - https://phabricator.wikimedia.org/T261403 (kostajh) Open→Resolved
[16:01:18] <wikibugs>	 serviceops, Wikifeeds, Kubernetes: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation - https://phabricator.wikimedia.org/T266194 (JMeybohm) Open→Resolved Looks way better now, even under higher load.
[16:24:11] <wikibugs>	 serviceops, Kubernetes, User-jijiki: Create a structured testing environment for applications running on kubernetes - https://phabricator.wikimedia.org/T264025 (jijiki)
[16:27:43] <wikibugs>	 serviceops, Operations, Scap, Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)): Make a way to build Scap .deb in Docker - https://phabricator.wikimedia.org/T265501 (jijiki) p:High→Low
[17:23:43] <wikibugs>	 serviceops, Operations, Platform Engineering, Performance-Team (Radar), and 2 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (jijiki)
[17:24:21] <wikibugs>	 serviceops, Desktop Improvements, Operations, Product-Infrastructure-Team-Backlog, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (ovasileva)
[19:14:44] <wikibugs>	 serviceops, Prod-Kubernetes, observability, Kubernetes, Patch-For-Review: Store Kubernetes events for more than one hour - https://phabricator.wikimedia.org/T262675 (JMeybohm) I've changed the field names to be more specific so events are indexed now.  Also I created a fancy dashboard using m...
[21:06:04] <wikibugs>	 serviceops, MW-on-K8s, Operations, TechCom-RFC, Patch-For-Review: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (BPirkle)
[21:07:06] <wikibugs>	 serviceops, Operations, Parsoid, Parsoid-Tests: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (ssastry)
[22:36:29] <wikibugs>	 serviceops, Platform Engineering, observability, Developer Productivity: Set ENV SERVERGROUP for jobrunner MW web requests - https://phabricator.wikimedia.org/T266515 (Krinkle)
[22:38:06] <wikibugs>	 serviceops, Platform Engineering, observability, Developer Productivity: Set ENV SERVERGROUP for jobrunner MW web requests - https://phabricator.wikimedia.org/T266515 (Krinkle) Job execution currently uses the `/rpc/RunSingleJob` endpoint in production which iirc has its own VirtualHost, that mig...