[06:54:52] serviceops, MediaWiki-General, Operations, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (Physikerwelt) What do you think about implementing a bot that checks tha...
[07:19:10] serviceops, LDAP-Access-Requests, Operations, observability, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (Dzahn) Alright, thanks @AMooney !
[07:37:21] serviceops, observability, Patch-For-Review: "PHP opcache hit ratio" alert shouldn't bother on mwdebug*/scandium/etc - https://phabricator.wikimedia.org/T254025 (Dzahn) After the merge above, the opcache check has been removed from scandium while it is unchanged on regular appservers. https://icinga...
[07:42:27] serviceops, observability, Patch-For-Review: "PHP opcache hit ratio" alert shouldn't bother on mwdebug*/scandium/etc - https://phabricator.wikimedia.org/T254025 (Dzahn) mwdebug hosts have not been changed so far, but now it would just be a Hiera change
[08:35:31] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (akosiaris) >>! In T218733#6190306, @Mholloway wrote: > Awesome! Thanks, @akosiaris! I'...
[08:47:25] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (ops-monitoring-bot) Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts: ` ['ganeti1009.eqiad.wmnet', 'ganeti1010.eqiad.wmnet',...
[08:51:55] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (Dzahn) Was another reimage needed? I already did these. Something wrong with RAID still?
[09:12:02] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['ganeti1011.eqiad.wmnet', 'ganeti1017.eqiad.wmnet', 'ganeti1012.eqiad.wmnet', 'ganeti1013.eqiad.wmnet', '...
[09:14:55] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (akosiaris) >>! In T228924#6191712, @Dzahn wrote: > Was another reimage needed? I already did these. Something wrong with RAID still? buster vs stretch. the curr...
[09:17:26] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (Dzahn) >>! In T228924#6191828, @akosiaris wrote: > buster vs stretch. the current clusters are stretch Oh yea, that makes a lot of sense. gotcha, thanks.
[09:22:56] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade all TLS enabled charts to v0.2 tls_helper - https://phabricator.wikimedia.org/T253396 (JMeybohm)
[09:45:26] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade all TLS enabled charts to v0.2 tls_helper - https://phabricator.wikimedia.org/T253396 (JMeybohm)
[13:51:17] serviceops, DC-Ops, Operations, ops-eqiad: decom 36 old appservers in eqiad (onsite, dcops) - https://phabricator.wikimedia.org/T253856 (Jclark-ctr) @Cmjohnson Hosts have been removed from racks and netbox has been updated for removing from rack.
[14:15:42] serviceops, Prod-Kubernetes, Kubernetes: (Re) add wmf.chartid as label to all kubernetes objects - https://phabricator.wikimedia.org/T254479 (JMeybohm)
[14:47:25] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (Mholloway) Thanks for that explanation, @akosiaris, that process sounds great. Of cours...
[14:57:34] hello akosiaris around? would appreciate a review on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/602087
[15:18:14] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (bearND) Yes, this plan sounds good to me. I think we should probably try to deploy both...
[15:29:23] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (JMeybohm) If you want to start with TLS (via envoy) right away (which would be great!) y...
[16:04:40] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (Mholloway) Thanks, @JMeybohm. Using TLS right away sounds great! I've reserved port 41...
[16:40:52] serviceops, Continuous-Integration-Infrastructure, Operations: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (Jdforrester-WMF)
[16:58:33] serviceops, Operations, Performance-Team (Radar), Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) Update - The gutterpools are live. The conversation here does not look finished though, so it'...
[17:09:53] serviceops, Operations, Performance-Team (Radar), Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (elukey) Trying to answer :) >>! In T240684#6193525, @Krinkle wrote: > Update - The gutterpools are liv...
[17:40:37] serviceops, Operations, Performance-Team, Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle)
[17:48:12] serviceops, Operations, Patch-For-Review, Performance-Team (Radar): Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (Krinkle) `name=From IRC I wonder how the backfill logic would work...would it be get...
[17:48:27] serviceops, Operations, Performance-Team, Patch-For-Review: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (Krinkle)
[19:11:38] serviceops, Operations, observability, Sustainability (Incident Prevention): add monitoring of sustained memcached TKO rates - https://phabricator.wikimedia.org/T253384 (CDanis)
[19:23:53] serviceops, Operations, PHP 7.2 support, PHP 7.3 support, Patch-For-Review: PHP 7.2 is very slow on an allocation-intensive benchmark - https://phabricator.wikimedia.org/T230861 (cscott)
[20:15:19] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Deploy multi-site plugin to gerrit1001 and gerrit2001 - https://phabricator.wikimedia.org/T217174 (Jdforrester-WMF)
[20:20:32] serviceops, Gerrit, Operations, Patch-For-Review: Convert Gerrit to use H2 as the database - https://phabricator.wikimedia.org/T211139 (Paladox) Stalled→Declined Declining as we're going straight to 3.1 so we won't be needing a db from that release.
[20:30:31] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Deploy multi-site plugin to gerrit1001 and gerrit2001 - https://phabricator.wikimedia.org/T217174 (Paladox)
[20:30:43] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Deploy multi-site plugin to gerrit1001 and gerrit2001 - https://phabricator.wikimedia.org/T217174 (Paladox) p:Medium→Low
[20:34:49] serviceops, Gerrit, Operations, Patch-For-Review: Convert Gerrit to use H2 as the database - https://phabricator.wikimedia.org/T211139 (Paladox)
[21:29:37] serviceops, Gerrit, Operations, Release-Engineering-Team-TODO, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Cmjohnson)
[21:50:59] serviceops, Operations, Performance-Team, Patch-For-Review: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (aaron) I suspect that the keys that cause trouble are big text/JSON blobs and ParserOutput objects, all of...
[22:32:10] serviceops: Evaluate Locust Stress Test Tool - https://phabricator.wikimedia.org/T254530 (wkandek)
[22:34:46] serviceops: Evaluate Locust Stress Test Tool - https://phabricator.wikimedia.org/T254530 (wkandek)
[22:47:41] serviceops: Evaluate Locust Stress Test Tool - https://phabricator.wikimedia.org/T254530 (wkandek) Documentation: https://wikitech.wikimedia.org/wiki/LocustProofOfConcept Code: https://github.com/wkandek/locust Locust was first set up locally under Virtualbox VMs and functionally tested. Vagrant was used to...
[22:52:21] serviceops: Evaluate Locust Stress Test Tool - https://phabricator.wikimedia.org/T254530 (wkandek) p:Triage→Medium
[23:37:45] serviceops, Operations, Performance-Team, Patch-For-Review: Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (Krinkle) > Some keys are super hot - take for instance `WANCache:v:global:CacheAwarePropertyInfoStore:wiki...