[00:07:45] serviceops, Operations, Patch-For-Review: move 20 new codfw parsoid servers (parse2*) into production - https://phabricator.wikimedia.org/T247441 (Dzahn) p:Triage→Medium
[03:53:51] serviceops, OpenRefine, Operations, Traffic, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (Antigng) Just to mention that apache httpd does a camel casing by default when proxying back from an http2-talking...
[04:33:57] serviceops, Graphoid, Operations, Core Platform Team (Icebox), MW-1.35-notes (1.35.0-wmf.34; 2020-05-26): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Jseddon)
[06:53:34] serviceops, CX-cxserver, Language-Team (Language-2020-Focus-Sprint), Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry) Some relevant updates: I've fixed Apertium packages for few bugs in Debian with coordin...
[07:04:01] serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (elukey) The change has been rolled out, but there is one use case that changed, namely the mcrouter proxies in codfw. All the eqiad mcrouters are configured to use 4 mw2* mcrouters (via TLS) in codfw as proxie...
[07:06:14] serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (elukey) Remaining things to do: 1) think about a "proxy-gutter-pool" for codfw proxies (likely in another task) 2) verify if the one minute delay added works better than the previous value (3s), and decide if...
[07:11:47] serviceops, Operations, Performance-Team (Radar), User-Elukey: mcrouter codfw proxies sometimes lead to TKOs - https://phabricator.wikimedia.org/T227265 (elukey) As described in T255511 we should think about adding a gutter pool for mw2* proxies to better handle TKOs when they happen.
[07:28:31] serviceops, Graphoid, Operations, Core Platform Team (Icebox), MW-1.35-notes (1.35.0-wmf.34; 2020-05-26): Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402 (Jseddon)
[07:53:20] serviceops, Graphoid, Operations, Core Platform Team (Icebox): Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402 (Jseddon)
[08:26:43] serviceops, Prod-Kubernetes, Kubernetes: Write a script to prune old chart versions/charts from chartmuseum - https://phabricator.wikimedia.org/T257408 (JMeybohm)
[08:29:49] serviceops, Operations, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Joe) So one of my concerns is actually about the parent task, I'll comment there.
[10:44:27] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Move proton to use TLS only - https://phabricator.wikimedia.org/T255877 (jcrespo) p:Triage→Medium
[10:51:34] etcd1003 is down, known issue?
[10:52:49] moritzm: comment points to T244530 memory upgrade
[10:53:35] T244530#6288145
[10:53:47] ah sure, I didn't make the connection! but in fact those etcd instances use the local storage, so that explains
[14:35:17] serviceops, Performance-Team: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (jcrespo) Today at 2020-07-08T14:00:12, with no deployments happening, mw1346 [[ https://logstash.wikimedia.org/goto/45f9809016562944d644dacd81613cfd | started generating exceptions at...
[14:39:19] serviceops, Performance-Team: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (Joe) Interestingly, according to the opcache metadata, the file where the error was (`/srv/mediawiki/php-1.35.0-wmf.39/skins/MinervaNeue/includes/MinervaHooks.php`) was in opcache sin...
[15:03:22] serviceops, Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (MSantos)
[16:17:31] serviceops, Operations, Wikimedia-production-error: PHP7 corruption: Method call executed on unrelated object (also: Call to undefined method) - https://phabricator.wikimedia.org/T245183 (Krinkle) Copying here so that this task remains complete for analysis of the problem, separate from T253673 which...
[18:09:09] serviceops, Performance-Team: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (CDanis) FTR `i`-->`h` is a single bit-flip in the LSB.
[18:23:56] serviceops, Performance-Team: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (CDanis) >>! In T253673#6290704, @CDanis wrote: > FTR `i`-->`h` is a single bit-flip in the LSB. Sorry, I was off by one; it's actually a transposition. So seems much less likely to...
[18:28:55] serviceops, Operations, Wikimedia-production-error: PHP7 corruption: Method call executed on unrelated object (also: Call to undefined method) - https://phabricator.wikimedia.org/T245183 (mmodell) Happened twice recently on host: `wtp1040` timestamp: `2020-07-08T18:06:14` reqid: `1e314c17-ae0b-48...
[18:37:29] _joe_: ema: I've just deployed the last set of cleanups for kafka purges project. We have zero HTCP packets flowing around in our infrastructure
[18:38:13] Pchelolo: \o/
[18:43:32] serviceops, Performance-Team: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (Daimona) I think it's also interesting to compare this failure to T221347: there it was `L` -> `K`. It's a -1 in both cases. Unsure what this could mean...
[18:44:58] <_joe_> Pchelolo: \o/
[19:24:03] serviceops, Performance-Team, Sustainability (Incident Prevention): Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (Krinkle)