[00:43:44] 10serviceops, 10Wikidata, 10Wikidata Query Builder, 10Wikidata Query UI, 10User-Addshore: Host static sites on kubernetes - https://phabricator.wikimedia.org/T264710 (10Dzahn) While we are generally interested in moving all static sites at some point in the future we are not there yet at the current time...
[07:04:27] 10serviceops, 10Prod-Kubernetes, 10Release Pipeline, 10Patch-For-Review: Refactor our helmfile.d dir structure for services - https://phabricator.wikimedia.org/T258572 (10JMeybohm) >>! In T258572#6562150, @Ottomata wrote: > Ok! Done. Great, thanks!
[07:29:44] 10serviceops, 10Prod-Kubernetes, 10Release Pipeline, 10Patch-For-Review: Refactor our helmfile.d dir structure for services - https://phabricator.wikimedia.org/T258572 (10Joe) >>! In T258572#6562150, @Ottomata wrote: > Ok! Done. Thanks a ton, I'm going to remove all the special casing both in puppet and...
[07:32:23] 10serviceops, 10Prod-Kubernetes, 10observability, 10Kubernetes, 10Patch-For-Review: Store Kubernetes events for more than one hour - https://phabricator.wikimedia.org/T262675 (10JMeybohm) Quick chat in IRC turned out that we don't have a "good for kubernetes" way to discover the kafka brokers (like DNS S...
[07:56:44] 10serviceops, 10Prod-Kubernetes, 10observability, 10Kubernetes, 10Patch-For-Review: Store Kubernetes events for more than one hour - https://phabricator.wikimedia.org/T262675 (10Joe) >>! In T262675#6562991, @JMeybohm wrote: > Quick chat in IRC turned out that we don't have a "good for kubernetes" way to...
[08:01:19] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10jcrespo) I have one question before everything else- does the parsercache expansion mean like a new "cluster/service" in parallel to the...
[08:26:03] _joe_: akosiaris: I stumbled again over an unapplied change in helmfile.d/admin and I think we should maybe add something to prevent, or at least alert on, that. It's like unmerged puppet changes, right?!
[08:26:21] <_joe_> +1
[08:26:38] Maybe we can just store the last git sha applied to the cluster in a configmap or annotation?
[08:26:51] and alert on diff after whatever time
[08:27:16] <_joe_> write a task?
[08:27:22] <_joe_> that looks like a sensible idea
[08:27:36] yeah, will do
[08:28:33] probably not so sensible, as helmfile.d/admin is not a separate repo... we'll see
[08:28:57] hmm, good point
[08:41:14] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Alert on unapplied changes in deployment-charts repo - https://phabricator.wikimedia.org/T265979 (10JMeybohm)
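(Editor's note: a minimal sketch of the 08:26 idea above, using the official kubernetes Python client. The namespace, configmap name, annotation key and repo path are all invented for the example; nothing here reflects what WMF production actually does.)

```python
#!/usr/bin/env python3
# Sketch of "store the last applied git sha on the cluster, alert when it
# drifts from HEAD". All names below are hypothetical placeholders.
import subprocess
from kubernetes import client, config

NAMESPACE = "kube-system"                                # hypothetical
CONFIGMAP = "helmfile-admin-state"                       # hypothetical
ANNOTATION = "helmfile.wikimedia.org/last-applied-sha"   # hypothetical

def repo_head_sha(repo_path="/srv/deployment-charts"):   # path assumed for the sketch
    """SHA of the deployment-charts checkout on the deploy host."""
    return subprocess.check_output(
        ["git", "-C", repo_path, "rev-parse", "HEAD"], text=True).strip()

def applied_sha(api):
    """SHA recorded on the cluster the last time helmfile apply ran."""
    cm = api.read_namespaced_config_map(CONFIGMAP, NAMESPACE)
    return (cm.metadata.annotations or {}).get(ANNOTATION)

def main():
    config.load_kube_config()
    api = client.CoreV1Api()
    head, applied = repo_head_sha(), applied_sha(api)
    if head != applied:
        # In practice this would feed an Icinga/Prometheus check, not a print.
        print(f"DRIFT: repo HEAD {head} != last applied {applied}")

if __name__ == "__main__":
    main()
```

As the 08:28 follow-up points out, helmfile.d/admin is just a directory inside deployment-charts, so a naive HEAD comparison like this would also flag unrelated chart changes; a real check would need to track only the admin directory's state.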
[08:42:17] 10serviceops, 10Prod-Kubernetes, 10Kubernetes, 10Patch-For-Review: Make helm upgrades atomic - https://phabricator.wikimedia.org/T252428 (10JMeybohm) 05Open→03Resolved
[09:25:48] 10serviceops: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10Marostegui)
[09:25:58] 10serviceops: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10Marostegui) p:05Triage→03Medium
[09:26:10] 10serviceops, 10Operations, 10vm-requests: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10Marostegui)
[09:33:22] 10serviceops, 10Operations, 10vm-requests: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10akosiaris) LGTM, perhaps do codfw as well since you are at it, to have a fallback/backup?
[09:34:00] 10serviceops, 10Operations, 10vm-requests: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10Marostegui) No, no need for codfw for now, we are still in super early stages.
[10:54:11] 10serviceops, 10Operations, 10vm-requests: eqiad: New ganeti instance for orchestrator installation - https://phabricator.wikimedia.org/T265982 (10Kormat)
[11:41:06] 10serviceops, 10Operations, 10Performance-Team (Radar): Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (10Marostegui) 05Open→03Resolved a:03Marostegui I am going to close this, as there is not much else we can really do here a...
[11:43:34] 10serviceops, 10Operations: php-fpm invalid opcode on mw1317 - https://phabricator.wikimedia.org/T236292 (10Marostegui) @jijiki what do you want to do with this task?
[13:17:45] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10Pchelolo) Thank you for the answers! > I have one question before everything else- does the parsercache expansion mean like a new "clus...
[13:18:41] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10Marostegui) >>! In T263587#6562289, @Pchelolo wrote: > I guess we have to begin here. > > TLDR of the problem is that we will not have...
[13:25:33] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10jcrespo) Small addendum: Note that parsercache functionality is memcached + MySQL, not just MySQL. In fact the MySQL part was a later ad...
[13:30:20] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10jcrespo) Another small correction: > it could bring us capability to write into the ParserCache from the secondary DC, which we don't cu...
[13:30:39] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10ArielGlenn) >>! In T263587#6563095, @jcrespo wrote: > I have one question before everything else- does the parsercache expansion mean li...
[13:32:45] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10Joe) Cassandra is not absent of its own issues, and it has a much higher cost per GB than parsercache currently has (I did no research,...
[13:35:42] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10Marostegui) @ArielGlenn the current parsercache hosts run SSDs.
[13:36:14] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10jcrespo) >>! In T263587#6564251, @ArielGlenn wrote: > I'm going by the Dell quotes for the hw, backtracking from the racking task. If th...
[13:47:22] 10serviceops, 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (10Marostegui) >>! In T263587#6564281, @jcrespo wrote: >> I'm going by the Dell quotes for the hw, backtracking from the racking task. If t...
[14:14:57] so rzl last night it was wikifeeds that was overwhelmed, right?
[14:17:51] we didn't serve any 429s last night -- looks like we didn't really come close to hitting the generous ratelimit on the `/api/rest_v1/page/random/summary` URL -- but that's just restbase ofc, not wikifeeds
[14:19:22] `/api/rest_v1/feed/featured/2020/10/20` and `/api/rest_v1/feed/onthisday/events/10/20` both spike at 22:00 ofc, and I assume they're both wikifeeds?
[14:21:00] <_joe_> yes
[14:21:24] the former, btw, is quite a large response on enwiki, *and* is `pass` in the frontend (and a hit in ats-be)
[14:21:28] that seems a bit wrong
[14:21:56] <_joe_> if it's a hit on ats-be, why is it a pass in varnish? size?
[14:22:12] 100 258k 0 258k 0 0 1012k 0 --:--:-- --:--:-- --:--:-- 1012k
[14:22:14] <_joe_> did you try requesting it compressed?
[14:22:15] potentially
[14:22:55] requesting it compressed, it is still a pass in the frontend
[14:23:26] it must be size, it's a hit on itwiki and most other wikis I've tried
[14:23:34] cdanis: here now -- it was wikifeeds that paged yeah, I haven't checked on anything else
[14:27:05] 10serviceops, 10Operations: php-fpm invalid opcode on mw1317 - https://phabricator.wikimedia.org/T236292 (10jijiki) 05Open→03Resolved a:03jijiki Resolve it since it has not been updated for so long :)
[14:29:57] weirdly we didn't actually serve that many 5xx
[14:30:35] only about 2.5k failed wikifeeds requests over a span of 3 minutes
[14:35:16] https://grafana.wikimedia.org/d/lxZAdAdMk/wikifeeds?viewPanel=25&orgId=1&from=1603144093031&to=1603146481971
[14:35:18] did we crash a pod?
[14:35:24] we definitely saturated all of them on CPU
[14:35:30] but the dip in limit makes me think we crashed a pod
[14:37:16] <_joe_> we should set up a cron on deploy1001 adding replicas before 22:00
[14:37:17] yup https://grafana.wikimedia.org/d/lxZAdAdMk/wikifeeds?viewPanel=100&orgId=1&from=1603144093031&to=1603146481971
[14:37:33] <_joe_> and remove them afterwards
[14:37:46] also, the latency quantiles stop at 1s, and we were far in excess of that :)
[14:37:49] <_joe_> a cron, not a systemd timer, it needs to feel duct-tapey
[14:40:51] Kubernetes Will Save Us From All This, Of Course
[14:40:58] hashtag elasticity
[14:42:01] <_joe_> rzl: well I doubt the horizontal pod autoscaler has reaction times suitable to this surge
[14:42:11] <_joe_> if we want to enable it
[14:42:42] that's okay, we'll just bolt it to an ML system trained on our traffic graphs
[14:42:49] "oh, it's 21:50, time to scale up wikifeeds"
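(Editor's note: _joe_'s duct-tape option above amounts to a scheduled bump of the Deployment's replica count before the 22:00 surge and a scale-down afterwards. A rough sketch with the official kubernetes Python client; in reality this would more likely go through helmfile values, and the deployment name, namespace and replica numbers below are invented.)

```python
# Illustrative only: patch a Deployment's replica count from a scheduled job.
from kubernetes import client, config

def scale(deployment: str, namespace: str, replicas: int) -> None:
    """Set the Deployment's replica count directly via the scale subresource."""
    config.load_kube_config()
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=deployment,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

# e.g. from a cron entry firing at 21:45 UTC (numbers invented):
#   scale("wikifeeds", "wikifeeds", 16)   # pre-surge bump
# and at 22:30 UTC:
#   scale("wikifeeds", "wikifeeds", 8)    # back to the steady-state count
```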
[14:43:49] rzl: https://prometheus.io/docs/prometheus/latest/querying/functions/#holt_winters
[14:44:11] Holt Winters is a comic book action hero, change my mind
[14:44:15] <_joe_> cdanis: holt-winters is not really great at predicting things
[14:44:26] _joe_: I'm sorry, was this a serious conversation?
[14:44:29] <_joe_> I've tried to use it in the past for 5xx patterns
[14:44:34] ahah
[14:44:39] <_joe_> yeah :P
[14:44:59] <_joe_> then I went to read the maths and :}
[14:45:09] Holt Winters and the Midnight Surge
[14:45:24] Holt Winters and the Thundering Herd
[14:45:30] Holt Winters and the Fisher-Yates Controversy
[14:45:42] that issue sucked tbh
[14:46:04] okay
[14:46:31] I have a suspicion that part of the wikifeeds problem was that the dewiki /api/rest_v1/feed/onthisday/events/10/20 response is especially large
[14:46:43] and thus has the same "hit in ats-be, pass in varnishfe" behavior
[14:47:03] which probably means we also miss out on a bunch of request coalescing to the backends
[14:47:35] sure, I'll buy that
[14:49:22] hm
[14:49:28] even on smaller responses I'm still seeing a pass in varnishfe
[15:01:56] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade production kubernetes clusters to a security supported version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[15:01:58] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Define the plan for the upgrade of kubernetes cluster to a security supported release - https://phabricator.wikimedia.org/T241076 (10JMeybohm)
[15:02:02] 10serviceops, 10Operations, 10Kubernetes, 10User-fsero: Upgrade calico in production to version 2.4+ - https://phabricator.wikimedia.org/T207804 (10JMeybohm)
[15:11:11] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[15:13:54] <_joe_> cdanis: that seems very strange indeed
[15:14:13] I traced the VCL and couldn't figure it out
[15:14:38] <_joe_> should we summon the vcl oracle?
[15:15:13] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-fsero: Upgrade Calico - https://phabricator.wikimedia.org/T207804 (10JMeybohm)
[15:15:40] cdanis: the bahviors aren't constant for a given URI, either
[15:15:44] *behaviors
[15:16:16] the frontend has a few different mechanisms for trying to be smart-ish about when to pass in the fe
[15:17:26] hmmm but we're currently setting text+upload to an admission_policy of "none", so maybe not so complex right now
[15:17:37] yeah
[15:18:15] and e.g. https://de.wikipedia.org/api/rest_v1/feed/onthisday/events/10/17 is ~90kB and never gets cached AFAICT
[15:18:23] (by fe; it does get cached by ats-bes eventually)
[15:19:07] does it have a CL header, when it arrives at v-fe?
[15:19:40] I saw a hint in another ticket that there may have been a change affecting CL logic with our V6 upgrade too, but haven't followed it yet
[15:23:39] fe pass on a be hit is problematic in general, because we have other code that assumes the passes are consistent across layers, and thus randomizes the backend cache choice instead of chashing...
[15:24:01] (which I imagine you already stumbled on, since you said it gets cached "eventually")
[15:26:04] ah hold on
[15:26:19] that URL is ~90KB compressed, but the true CL is ~500KB
[15:26:47] so it is the size cutoff that's causing the frontend pass in this case
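(Editor's note: a quick client-side way to reproduce the compressed-vs-true-size observation above, purely as a sketch; the cache itself looks at the origin's Content-Length, not a client fetch like this. The 256 KB constant is the frontend cutoff mentioned a bit further down.)

```python
# Fetch the URL with gzip accepted and compare the on-the-wire size
# (compressed Content-Length, if the origin sends one) with the size of
# the decompressed body that requests hands back.
import requests

URL = "https://de.wikipedia.org/api/rest_v1/feed/onthisday/events/10/17"
FE_CUTOFF = 256 * 1024  # varnish-fe won't cache objects larger than this, uncompressed

r = requests.get(URL, headers={"Accept-Encoding": "gzip"})
wire_kb = int(r.headers.get("Content-Length", 0)) / 1024  # compressed; 0 if chunked
body_kb = len(r.content) / 1024                           # decompressed by requests

print(f"on the wire: ~{wire_kb:.0f} kB, decompressed: ~{body_kb:.0f} kB")
print("fe pass (size cutoff)" if len(r.content) > FE_CUTOFF else "small enough for fe cache")
```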
[15:27:20] but what's curiouser to me, and has probably been causing us some inefficiencies for a very long time... is that willful passes of cacheable content like this shouldn't be replacing backend-selection chash with randomization
[15:28:36] (by shouldn't, I don't mean the code is buggy, I mean we probably never even tried to do this right, but should)
[15:32:16] making a ticket about this with some more light digging in it
[15:39:48] ugh
[15:40:05] describing the problem in sufficient detail is way harder than proposing a patch in this case :/
[15:41:05] 10serviceops, 10Push-Notification-Service, 10Product-Infrastructure-Team-Backlog (Kanban), 10User-jijiki: High latency on push notification service initialization - https://phabricator.wikimedia.org/T265258 (10jijiki) @Jgiannelos is there any help you would like from #serviceops ?
[15:43:48] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Test deployment-charts for kubernetes 1.19 compatibility - https://phabricator.wikimedia.org/T266032 (10JMeybohm)
[15:44:04] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Test deployment-charts for kubernetes 1.19 compatibility - https://phabricator.wikimedia.org/T266032 (10JMeybohm) p:05Triage→03High
[15:44:29] 10serviceops, 10Operations, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (10jijiki)
[15:44:57] 10serviceops, 10Operations, 10Growth-Team (Current Sprint), 10Patch-For-Review, and 2 others: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (10jijiki)
[15:48:17] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Define the plan for the upgrade of kubernetes cluster to a security supported release - https://phabricator.wikimedia.org/T241076 (10JMeybohm)
[15:48:18] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[15:54:15] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[15:55:02] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[16:00:11] 10serviceops, 10Operations, 10Platform Engineering: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 (10jijiki)
[16:00:13] 10serviceops, 10Operations, 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[16:01:30] 10serviceops, 10Operations, 10Platform Engineering, 10User-jijiki: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 (10jijiki)
[16:06:41] https://phabricator.wikimedia.org/T266040 for the pass/random stuff above
[16:21:51] bblack: I'm hanging that off T264821 if you don't mind
[16:22:26] 10serviceops, 10Operations, 10Wikidata: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (10RLazarus)
[16:22:45] rzl: yeah sounds good, as I'm sure this is contributing to the impact
[16:23:15] if I'm right about this problem (I think I am, just maybe not about the solution), it probably has been having some pretty wide-ranging negative effects for a long time, for many things :/
[16:25:01] yeah, makes sense
[16:25:30] ugh yeah, I noticed the same request hopping amongst backends but didn't think too hard about it, but ofc that's a problem
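(Editor's note: a toy sketch of the chash-vs-random point above. A plain modulo hash stands in for the real consistent-hash ring, and the eight backend names are invented; with a deterministic choice a given URL lives on one backend cache, with a random choice it ends up stored, and missed, on all of them.)

```python
import hashlib
import random

BACKENDS = [f"cp-be{i}" for i in range(1, 9)]  # 8 backend caches, as in the discussion below

def chash_pick(url: str) -> str:
    """Deterministic pick: the same URL always maps to the same backend."""
    digest = hashlib.sha256(url.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

def random_pick(url: str) -> str:
    """What fe passes get today: any backend may be chosen per request."""
    return random.choice(BACKENDS)

url = "/api/rest_v1/feed/onthisday/events/10/20"
print({chash_pick(url) for _ in range(1000)})   # a single backend
print({random_pick(url) for _ in range(1000)})  # all eight: roughly 8x the misses and 8x the copies
```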
[16:29:29] <_joe_> so yeah this is also the reason for other behaviour I've seen, probably 500kb uncompressed is a tad too small as a threshold?
[16:29:49] <_joe_> esp for api responses
[16:30:38] the cutoff is 256KB uncompressed
[16:30:58] <_joe_> oh
[16:30:59] we could potentially tune that, but without taking a good data-driven approach, any guess is as good as another really
[16:31:06] <_joe_> sure
[16:31:16] <_joe_> I wasn't suggesting we divine tea leaves
[16:32:24] the basic parameters of the situation are that a typical modern frontend cache box has a total memory storage size of 194GB, and that has to cover any frontend-side caching for the entire cache_text dataset (basically everything but upload.wm.o and maps.wm.o)
[16:32:46] so the 256KB cutoff is intended to protect it against large objects pushing out lots and lots of small ones.
[16:33:00] <_joe_> right, you want to strike the right balance to optimize cache hit-ratio
[16:33:06] (and then the large objects are still a dc-local hit in the backend cache, so not a huge cost)
[16:33:38] but the way it's working now, it's also causing all of those large objects to be cached 8 times (once in each backend cache), instead of just once, at the backend layer
[16:33:38] <_joe_> it could already be enough to say "if size is between 256kb and X, don't randomize the backend"
[16:33:50] 8x the misses to get it cached for everyone, and 1/8th the space for them all, etc
[16:34:08] yeah, the 8x the misses is why I think wikifeeds melted on a particularly large response
[16:35:17] there are a few "easy" ways to fix the problem we're staring at here. The hard thing is to fix it and not screw up the 2,924,316 classes/patterns of traffic we're not thinking about that are currently working fine.
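(Editor's note: purely a sketch of _joe_'s 16:33 suggestion, keeping the size-based frontend pass but only falling back to randomized backend selection above some much larger bound. The 1 MB upper bound stands in for the unspecified "X" and is invented for the example.)

```python
FE_CUTOFF = 256 * 1024          # fe won't cache above this (uncompressed)
RANDOMIZE_ABOVE = 1024 * 1024   # hypothetical "X": only truly huge objects spread out

def policy(uncompressed_size: int) -> tuple[str, str]:
    """Return (frontend action, backend selection) for a given object size."""
    if uncompressed_size <= FE_CUTOFF:
        return ("cache in fe", "chash")
    if uncompressed_size <= RANDOMIZE_ABOVE:
        return ("pass in fe", "chash")   # the proposed change: one backend copy, not eight
    return ("pass in fe", "random")

print(policy(90 * 1024))         # ('cache in fe', 'chash')
print(policy(500 * 1024))        # ('pass in fe', 'chash')  e.g. the dewiki onthisday response
print(policy(5 * 1024 * 1024))   # ('pass in fe', 'random')
```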
[16:39:20] This seems like a good time to tactically nerdsnipe you into reading https://ferd.ca/you-reap-what-you-code.html instead of staring at this horror
[16:41:09] oh I heard about this talk! thanks for the link
[16:47:39] good talk, thanks Brandon
[16:49:28] +1
[16:50:12] followed the author at https://twitter.com/mononcqc too
[16:54:36] 10serviceops, 10DBA, 10Operations, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo)
[17:41:21] 10serviceops, 10Performance-Team, 10Patch-For-Review, 10Sustainability (Incident Followup), 10User-jijiki: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (10Krinkle)
[17:47:35] 10serviceops, 10Operations, 10Wikimedia-production-error: PHP7 corruption reports in 2020 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (10Mholloway)
[17:52:06] 10serviceops, 10Operations, 10Wikimedia-production-error: PHP7 corruption reports in 2020 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (10CDanis)
[20:32:34] 10serviceops, 10MediaWiki-Authentication-and-authorization, 10Platform Team Workboards (Clinic Duty Team), 10Wikimedia-production-error: Error fetching URL "http://localhost:600...": (curl error: 28) Timeout was reached - https://phabricator.wikimedia.org/T265551 (10Clarakosi) p:05Triage→03High
[20:36:09] 10serviceops, 10Operations, 10Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10Dzahn) parsoid: WIP in https://gerrit.wikimedia.org/r/c/operations/puppet/+/634383 / T257906
[20:45:06] 10serviceops, 10Operations, 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) 05Stalled→03Open
[20:45:11] 10serviceops, 10Operations, 10Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (10jijiki)
[20:45:44] 10serviceops, 10Operations, 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[20:46:00] 10serviceops, 10Operations, 10Platform Engineering, 10Performance-Team (Radar), and 2 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (10jijiki)
[21:15:11] 10serviceops, 10Release-Engineering-Team, 10Patch-For-Review: replace production deployment servers - https://phabricator.wikimedia.org/T265963 (10Dzahn) p:05Triage→03Medium
[22:51:31] 10serviceops, 10Performance-Team, 10Patch-For-Review, 10Sustainability (Incident Followup), 10User-jijiki: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (10tstarling) My idea for detection/prevention of opcache corruption is to use a [[http://manpages.ubuntu.c...