[07:15:42] serviceops, Core Platform Team, MediaWiki-General, Operations, Wikimedia-Incident: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (Joe) >>! In T245170#5916911, @WDoranWMF wrote: > Moving this to feature requests for PMs to...
[09:21:24] _joe_: is it somewhat new that envoy requires the package getenvoy-envoy? It is missing on jessie so i am running into that on contint1001, but in the past on OTRS i don't think we needed it
[09:22:04] <_joe_> mutante: I have no idea what you're talking about
[09:22:12] <_joe_> jessie imports the package from an external repo
[09:22:40] E: Unable to locate package getenvoy-envoy
[09:23:02] that happened to me when i just added the class on a jessie machine
[09:23:22] but on another one in the past it did not seem to be an issue
[09:24:25] Notice: /Stage[main]/Envoyproxy/Package[getenvoy-envoy]/ensure: created
[09:24:33] it just needs 2 puppet runs
[09:25:20] i assumed it is jessie related but that was wrong then
[09:39:34] <_joe_> mutante: probably we don't setup the apt component in the right way
[09:40:43] _joe_: makes sense, probably needs a dependency added. i'll take a look at a fix
[10:14:02] reminder that folks in greece are off today and monday, so any indications that we are actually working should be ignored :-)
[10:56:57] that would in fact be fixed with apt::package_from_component, but not really worth fixing with jessie going away
[11:00:17] ok
[11:21:01] serviceops, Operations, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw1253.eqiad.wmnet` - mw1253.eqiad.wmnet (**FAIL**) - Host steps raised exception: Empty...
[11:22:26] serviceops, Operations, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw1253.eqiad.wmnet` - mw1253.eqiad.wmnet (**FAIL**) - Failed downtime host on Icinga (li...
[14:04:42] _joe_: so, it turns out, the period jobs we've already moved to systemd timers, ALSO exist as cron jobs still
[14:04:45] *periodic
[14:05:16] this revelation brought to you by cron spam from processEchoEmailBatch.php even though we migrated it last week, and verified in /var/spool/cron/crontabs/www-data on mwmaint1002
[14:05:21] so, I missed a step, clearly
[14:07:32] presumably both have been running -- it looks like the cron job is failing and the systemd timer is succeeding, so I guess they're contending on some resource and the systemd copy is winning
[14:15:10] <_joe_> rzl: oh shit I forgot to tell you that cron in puppet doesn't purge :/
[14:18:55] _joe_: should I edit the crontab manually then?
[14:20:05] oh I didn't realize you're in a meeting right now, this can wait
[14:23:44] rzl: if it's already gone from puppet, i think that's the only way, yea. otherwise "ensure => absent"
[14:24:47] nod
[14:25:34] I'm reluctant to go ensure=>absent for the remaining ones, just because it means keeping both the cron and systemd resources in there temporarily, which gets messy when there are so many of these to keep track of
[14:25:42] I'll probably just add the manual deletion step to my routine
[14:26:43] anyone object to doing that cleanup on a Friday? seems worthwhile to get out of this broken state asap
[14:31:11] <_joe_> rzl: +1 to manual deletion
[14:31:14] <_joe_> go go!
[14:31:17] <_joe_> :P
[14:53:58] serviceops, DC-Ops, Operations, ops-eqiad: scb1001: Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T250482 (Dzahn)
[15:21:02] done: https://phabricator.wikimedia.org/P11012
[15:21:18] I'll do the rest as they're merged
[15:58:58] <_joe_> rzl: oh there is a neat trick
[15:59:19] <_joe_> crontab -u www-data -d && run-puppet-agent
[15:59:33] <_joe_> it's -d to remove it? I don't remember
[15:59:47] oh, so puppet recreates only the ones it's aware of
[15:59:47] cute
[16:00:05] is that safe?
[16:01:53] <_joe_> rzl: https://gfycat.com/weethosecoqui
[16:02:20] good enough for me
[16:22:44] rzl: if you wanted you could just move aside the crontab, run-puppet-agent, and do a diff
[16:22:57] but if anyone manually defined a cron on a machine, they deserve what happens
[16:23:05] ha, fair
[16:47:29] serviceops, MediaWiki-Cache, Operations, Performance-Team, and 2 others: WANObjectCache::getWithSetCallback seems not to set objects when fetching data is slow - https://phabricator.wikimedia.org/T244877 (Krinkle) Open→Resolved a:aaron
[16:50:27] serviceops, DC-Ops, Operations, ops-eqiad: scb1001: Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T250482 (wiki_willy) a:Cmjohnson @Dzahn - when I look at the purchase date in Netbox, it shows this server was first installed 7yrs ago in January 2013. If that's accurate...
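Editor's note on the 16:22 suggestion (move the crontab aside, run-puppet-agent, diff): a minimal sketch of the diff step, assuming the old and the regenerated crontab have each been read into a string. The function names and the sample job paths are hypothetical and do not come from the log. For the 15:59 question, common cron implementations remove a user's crontab with `crontab -r`; check the local man page before relying on that.

```python
def cron_entries(crontab_text: str) -> list[str]:
    """Return the non-blank, non-comment lines of a crontab.

    Comment lines (such as a config-management name header) are skipped.
    Environment lines like MAILTO=... are kept and compared like jobs.
    """
    entries = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            entries.append(stripped)
    return entries


def stale_entries(old_text: str, regenerated_text: str) -> list[str]:
    """Entries present in the old crontab but absent from the regenerated one.

    These are the candidates for jobs that are no longer managed and were
    only surviving because nothing purged them.
    """
    kept = set(cron_entries(regenerated_text))
    return [entry for entry in cron_entries(old_text) if entry not in kept]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real host.
    old = (
        "# managed job a\n"
        "*/5 * * * * /usr/local/bin/example-job-a\n"
        "\n"
        "# managed job b\n"
        "0 3 * * * /usr/local/bin/example-job-b\n"
    )
    regenerated = (
        "# managed job b\n"
        "0 3 * * * /usr/local/bin/example-job-b\n"
    )
    for entry in stale_entries(old, regenerated):
        print("no longer managed:", entry)
```

Comparing whole lines keeps the sketch simple; a job whose schedule changed shows up as stale under its old schedule, which is usually what one wants to review before discarding the saved copy.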
[18:36:53] serviceops: Please provide our special component/php72 in buster-wikimedia - https://phabricator.wikimedia.org/T250515 (Jdforrester-WMF)
[18:39:19] serviceops: Please provide our special component/php72 in buster-wikimedia - https://phabricator.wikimedia.org/T250515 (Jdforrester-WMF) p:Triage→Low
[19:28:32] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 5 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (holger.knust) @DannyS712 Look at https://commons.wikimedi...