[00:15:13] PHP version on Phabricator prod server upgraded to 7.2.22
[00:19:50] serviceops, Operations: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 (Dzahn) Thank you @MoritzMuehlenhoff and @jijiki first done on 2001 now also done on 1001. upgrade command: ` sudo cumin phab1003.eqiad.wmnet 'export DEBIAN_FRONTEND=noninteractive; apt-get install...
[00:26:58] serviceops, DC-Ops, Operations, ops-eqiad: mw1239 memory errors - https://phabricator.wikimedia.org/T227867 (Dzahn) self-healing?? <+icinga-wm> RECOVERY - Memory correctable errors -EDAC- on mw1239 is OK: (C)4 ge (W)2 ge 1
[02:04:59] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (Krinkle)
[02:05:24] serviceops, Operations, Wikimedia-Logstash, observability, Patch-For-Review: Errors managed by php-wmerrors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (Krinkle) >>! In T233828#5534006, @Krinkle wrote: >>>! In T233828#5532983, @Joe wrote: >>[…]...
[02:05:31] serviceops, Operations, observability, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (Krinkle)
[02:05:45] serviceops, Operations, Wikimedia-Logstash, observability, Patch-For-Review: Errors managed by php-wmerrors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (Krinkle) Open→Resolved a:herron
[08:17:34] serviceops, Operations: Update component/php72 to 7.2.22 - https://phabricator.wikimedia.org/T230024 (MoritzMuehlenhoff) Open→Resolved 7.2.22 is rolled out fleet-wide to all servers using PHP 7.2
[08:17:37] serviceops, Operations: Remove PHP 7.0 from production application servers - https://phabricator.wikimedia.org/T220600 (MoritzMuehlenhoff)
[10:46:38] serviceops, Operations, Traffic, Puppet: Puppet systemd::mask is an anti pattern that has unwanted side effect - https://phabricator.wikimedia.org/T233839 (ema) We are using `systemd::mask` and `systemd::unmask` to ensure that package installation does not trigger service startup (see for example...
[14:26:45] serviceops, Operations, Release Pipeline, CPT Initiatives (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (akosiaris) >>! In T223953#5535332, @mobrovac wrote: > @akosiaris regarding rate limiting,...
[15:15:58] got a ping about https://phabricator.wikimedia.org/T230917 after no response for over a month - could someone take a look and respond?
[15:19:39] _joe_ akosiaris ^
[15:21:08] dammit, I was pinged on it as well and forgot about it
[15:34:23] <_joe_> nah it's my fault
[15:34:39] <_joe_> I was oncall that week, and alone, and I might have just lost track of that task
[15:34:55] <_joe_> anyways, I will look in a few minutes
[15:35:54] <_joe_> I thought they wanted something more complex than what was requested
[15:40:27] _joe_: I think I got it
[15:40:33] lemme upload my patch
[15:40:39] <_joe_> ok!
[15:41:42] serviceops, Core Platform Team, Performance-Team, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[15:47:15] _joe_: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/540155/
[15:47:22] should address this relatively ok
[15:47:55] <_joe_> check_procs, eek
[15:48:00] <_joe_> but ok, it's a solution
[15:49:21] <_joe_> merging it :)
[15:49:24] well, aaron already ruled out talking to celery itself
[15:49:43] inspect stats and inspect ping seem to not be working
[15:49:51] _joe_: cool, thanks
[15:54:37] <_joe_> akosiaris: I prefer a check on the systemd unit
[15:54:55] <_joe_> btw, I will try to make the systemd general check more explicit about what is failing
[15:55:08] I think that it should be possible now
[15:55:20] I clearly remember systemd wasn't reporting what the issue was on jessie
[15:55:31] or stretch, that part is unclear
[15:55:40] but there was work underway to add that info
[15:55:45] <_joe_> I will see how we can fix that
[15:57:01] <_joe_> akosiaris: basically, what I want to do is
[15:57:12] <_joe_> if system is degraded, check which units are failing
[15:57:23] <_joe_> (it can be done, not necessarily in an elegant way)
[16:00:26] serviceops, ORES, Operations, Patch-For-Review, Scoring-platform-team (Current): celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Joe) First of all, apologies for losing track of this task. What you...
[16:18:14] serviceops, ORES, Operations, Patch-For-Review, Scoring-platform-team (Current): celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) Thanks @Joe and no worries. I'm happy to move this one off...
[21:17:02] serviceops, ORES, Operations, Patch-For-Review, Scoring-platform-team (Current): celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (akosiaris) Open→Resolved I'll resolve this, all workers hav...
[21:17:09] How are you feeling re. HHVM? Is it time to start dropping HHVM from CI and MW?
[21:41:19] James_F: i think so, but also ask Effie and on https://phabricator.wikimedia.org/T229792
[21:41:52] when we talked about that change to remove it from servers the consensus was that reimaging of the servers is preferred
[21:42:29] regarding CI .. it seems to make sense as soon as it is also out of production
[21:42:37] * James_F nods.
[21:42:56] Happy to wait on that task.
[21:45:09] serviceops, Operations, HHVM, Patch-For-Review, Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (Jdforrester-WMF)
[22:44:34] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[22:46:04] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn) p:Normal→High
[23:07:27] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn) @thcipriani You and the other members of gerrit-roots admin group can now ssh to gerrit1001.wikimed...
[23:11:56] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (thcipriani) >>! In T222391#5539841, @Dzahn wrote: > @thcipriani You and the other members of gerrit-roots...
[23:34:32] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn) >>! In T222391#5539856, @thcipriani wrote: > Confirmed that I can ssh in and I can see `/srv/gerrit...