[04:03:32] serviceops, Operations, Performance-Team, Patch-For-Review, User-jijiki: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (jijiki) @Krinkle @aaron do you think we are ready to move this forward?
[04:29:24] serviceops, MW-on-K8s, Operations: Sandbox/limit child processes within a container runtime - https://phabricator.wikimedia.org/T252745 (tstarling)
[04:29:48] serviceops, MW-on-K8s, Operations, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) Resolved→Open Can the task stay open to track implementation? The RFC workboard has "Approved" and "Implemente...
[05:14:51] serviceops, MW-on-K8s, Operations, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Krinkle) Given the title and task description, I assumed it was a dedicated task, but I see it's used as tracking task indeed. So...
[05:30:36] serviceops, CX-cxserver, Language-Team (Language-2020-October-December), Patch-For-Review, Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry)
[08:19:22] jayme / akosiaris do you want us to run the helmfile deploy command to staging, codfw and eqiad or just staging? How will we know if it "works" if it isn't available over the network yet?
[08:58:23] kostajh: start with staging I'd say. You can test with `curl https://staging.svc.eqiad.wmnet:4005` from anywhere in the fleet. deploy1001 also has service-checker-swagger installed so you can do `service-checker-swagger staging.svc.eqiad.wmnet https://staging.svc.eqiad.wmnet:4005`. If that succeeds, you can move to eqiad/codfw.
[08:58:23] Addressing eqiad/codfw requires extra work (setting up LVS, DNS and discovery), so they are not directly accessible yet
[08:58:48] Got it, thanks
[09:22:36] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (kostajh) >>! In T269581#6680513, @Tgr wrote: > If it's easy to selectively address one or the other from MediaWiki, we could just have a special p...
[09:25:19] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (JMeybohm) Can/should it maybe be integrated into https://wikitech.wikimedia.org/wiki/API_Gateway instead of going through MW?
[10:01:16] serviceops, Prod-Kubernetes, Kubernetes: Implement switching of staging clusters - https://phabricator.wikimedia.org/T269835 (JMeybohm)
[10:01:21] serviceops, Prod-Kubernetes, Kubernetes: Implement switching of staging clusters - https://phabricator.wikimedia.org/T269835 (JMeybohm) p:Triage→Medium
[10:14:21] effie: o/ - qq: is the redis lock manager going to stay in the long term, or is there a plan to migrate it out of Redis?
[10:14:46] elukey: I have been thinking to start the convo on the phasing out redis task
[10:15:20] I have it in my mind to pick it up with aaron
[10:15:27] next Q
[10:15:30] I checked all the nice Lua code that we use to lock from mediawiki (sigh), but in theory somebody from CPT should be able to verify that it works with Redis 5.x and unblock us..
Otherwise it is an endless pain :(
[10:16:21] I saw that main stash should probably go to mariadb, that is great
[10:16:22] we are installing redis 2.8
[10:16:39] so for now, that painful part won't be an issue
[10:17:07] the mainstash migration is a bit stalled, but yes it is on its way
[10:17:29] bullseye is not super far, forward porting 2.8 again will surely be not ideal for example
[10:17:47] I mean I completely get this particular use case
[10:18:18] but it worries me that we are kept into this due to lua code that we don't know how to test (we as technology in general)
[10:19:11] no the 2.8 is a bandaid until we remove redis completely
[10:19:46] from a very ignorant (me) code review it seems that it doesn't use any advanced/deprecated command, the only question mark is if redis 5.x still runs lua in "Atomic mode"
[10:20:20] we assessed that since we have been trying to get out of redis since last year
[10:20:40] (seems so from https://redis.io/commands/eval)
[10:20:45] running redis 2.8 for one more Q is not that bad compared to analysing if redis 5.X will work
[10:21:20] anyway, let's get into this rabbithole if we reach the end of Q3 and we are still where we are today
[10:21:45] we have spent already many hours discussing this, without getting anywhere
[10:22:09] the decision to keep 2.8 unblocked our memcached upgrade
[10:22:10] sure it's ok, my point is not questioning this choice, but long term it would be great to be able to quickly test redis in a reliable way, I'd love to understand what is blocking CPT or other teams
[10:22:50] I think just to seriously push for it, the two main issues after mainstash is the lock and the file upload thing
[10:23:07] to keep the same filename to be uploaded while one file is being uploaded
[10:26:34] i would be +1 to push other teams to have a Q-goal to sign off for redis upgrades, it seems something doable, and if there is something to fix we'll deal with it
[10:58:20] serviceops, Operations,
Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (MoritzMuehlenhoff) >>! In T245757#6680909, @Dzahn wrote: >>>! In T245757#6645352, @jijiki wrote: >> @Dzahn...
[11:39:10] serviceops, Growth-Team, Operations, Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (jijiki)
[11:39:19] serviceops, Operations, Platform Engineering, Patch-For-Review, User-jijiki: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 (jijiki)
[11:39:25] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki)
[11:40:12] serviceops, Growth-Team, Operations, Patch-For-Review, and 2 others: Reimage one memcached shard per DC to Buster - https://phabricator.wikimedia.org/T252391 (jijiki) Open→Resolved a:jijiki
[11:52:13] serviceops, Performance-Team, Platform Engineering, Wikimedia-Rdbms, and 2 others: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (jijiki)
[11:53:20] `helmfile -e staging -I apply` worked fine for linkrecommendation, but `service-checker-swagger staging.svc.eqiad.wmnet https://staging.svc.eqiad.wmnet:4005` fails with timeout issues for the apidocs url
[11:54:00] I don't exactly understand why, since `curl -L https://staging.svc.eqiad.wmnet:4005/apidocs` works without issue
[11:55:22] serviceops, Performance-Team, Platform Engineering, Wikimedia-Rdbms, and 2 others: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (jijiki) @Krinkle do you think it would be possible to schedule this for Q3? Either move to the 'redis_misc' cluster o...
[12:08:15] oh, it's because i need to pass the JSON endpoint specifically: `service-checker-swagger staging.svc.eqiad.wmnet https://staging.svc.eqiad.wmnet:4005 -t 2 -s /apispec_1.json`
[12:27:28] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki) @aaron In order to change the servers defined in the mediawiki-config (and use other redis instances), apart from roll change them...
[14:17:45] serviceops, Platform Engineering Roadmap Decision Making, Code-Health-Objective, Performance-Team (Radar), and 4 others: Determine multi-dc strategy for CentralAuth - https://phabricator.wikimedia.org/T267270 (Naike)
[15:01:43] serviceops, Prod-Kubernetes, Kubernetes: Implement switching of staging clusters - https://phabricator.wikimedia.org/T269835 (akosiaris) > When switching from staging-eqiad to staging-codfw (and vice versa) we would need to: > * Ensure all services currently deployed on staging-eqiad are deployed to...
[15:17:28] serviceops, Operations, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Performance Issue, and 3 others: Strategy for storing parser output for "old revision" (Popular diffs and permalinks) - https://phabricator.wikimedia.org/T244058 (Pchelolo)
[15:54:26] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm) [puppet-private] (487bdca0) (jayme) Add calicoctl and calico-cni kubernetes users
[15:56:39] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (jijiki) >>! In T245757#6681662, @MoritzMuehlenhoff wrote: > ffmpeg -i Wall_of_Death_-_Pitts_Todeswand_2017_...
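Aside on the 11:53 to 12:08 exchange: `curl -L .../apidocs` succeeding only shows that the docs page loads, while the checker run that worked was pointed at the JSON spec with `-s /apispec_1.json`. Below is a minimal Python sketch of that distinction, assuming the spec is a Swagger/OpenAPI JSON document with a top-level `paths` object. The helper names `spec_url`, `looks_like_spec` and `fetch_spec` are hypothetical, the 2-second default only mirrors `-t 2` on the assumption that it is a timeout in seconds, and none of this is meant to describe how service-checker-swagger is implemented.

```python
import json
import urllib.request
from urllib.parse import urljoin


def spec_url(base_url: str, spec_path: str) -> str:
    """Join the service base URL and the spec path (hypothetical helper)."""
    return urljoin(base_url.rstrip("/") + "/", spec_path.lstrip("/"))


def looks_like_spec(body: str) -> bool:
    """True if body parses as JSON and has a top-level 'paths' object."""
    try:
        doc = json.loads(body)
    except ValueError:
        # An HTML docs page (or any non-JSON body) ends up here.
        return False
    return isinstance(doc, dict) and isinstance(doc.get("paths"), dict)


def fetch_spec(base_url: str, spec_path: str, timeout: float = 2.0) -> bool:
    """Fetch the spec over HTTP and report whether it looks usable."""
    url = spec_url(base_url, spec_path)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return looks_like_spec(resp.read().decode("utf-8"))
```

Calling `fetch_spec("https://staging.svc.eqiad.wmnet:4005", "/apispec_1.json")` from a host that can reach staging would exercise the same URL as the command at 12:08:15; an HTML body would make `looks_like_spec` return `False` instead of raising.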
[15:57:28] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (jijiki)
[15:59:43] serviceops, Operations, Parsoid: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (jijiki) @ssastry is there some way we could check that parse2001, which is running on buster now, works as expected?
[16:13:02] serviceops, Operations, Parsoid: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (ssastry) Run a few curl commands like these but while using parse2001 as a proxy. Here is the equivalent for scandium itself: ` curl -L -x http://scandium.eqiad.wmnet:80 http://en.wikipedia...
[16:20:21] serviceops, Operations, Parsoid: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (ssastry) Ah, because parse2001 and parse2002 are codfw, not eqiad. Anyway, here goes: ` ssastry@scandium:~$ curl -L -x http://parse2001.codfw.wmnet:80 http://en.wikipedia.org/w/rest.php/en....
[16:58:32] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2032.codfw.wmnet ` The log can be...
[17:24:13] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2032.codfw.wmnet'] ` and were **ALL** successful.
[17:31:09] effie: mc1035's get hit ratio jumped from 0.95 to 0.96, really really great!
[17:31:47] please elukey you must learn some salesmanship. "we reduced misses by 20%!"
[17:32:21] luca is selling the main memcached cluster, I am trying to sell the onhost one
[17:32:32] :D
[17:37:38] cdanis: ahahahaha
[17:38:04] cdanis: should I also add more !!!!! to my sentences??
[17:38:07] ahaha
[17:38:12] I do honestly think it makes it clearer
[17:38:19] going from, say, .99 to .995 is very very hard
[17:38:27] elukey: you need to tweet about it
[17:38:41] that is when things get truly real
[17:38:43] we went from 1.3 nines to 1.4 nines!!
[17:39:16] effie: 🤔 I feel as if I'm being mocked
[17:40:47] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1032.eqiad.wmnet ` The log can be...
[17:53:50] serviceops, Analytics, Analytics-Kanban, Event-Platform, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (fdans) p:Triage→Medium
[18:11:11] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1032.eqiad.wmnet'] ` and were **ALL** successful.
[18:40:20] cdanis: no no
[21:35:10] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (Tgr) >>! In T269581#6681424, @JMeybohm wrote: > Can/should it maybe be integrated into https://wikitech.wikimedia.org/wiki/API_Gateway instead of...
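Aside on the hit-ratio exchange at 17:31 to 17:38: the "20%" and "nines" figures both follow from looking at the miss rate (1 minus the hit ratio) instead of the hit ratio itself. A small Python sketch of that arithmetic; the function names `miss_reduction` and `nines` are made up for illustration, and "nines" is taken here to mean -log10 of the miss rate.

```python
import math


def miss_reduction(old_hit: float, new_hit: float) -> float:
    """Relative drop in miss rate when the hit ratio moves old -> new."""
    old_miss = 1.0 - old_hit
    new_miss = 1.0 - new_hit
    return (old_miss - new_miss) / old_miss


def nines(hit: float) -> float:
    """Hit ratio expressed as 'nines': -log10 of the miss rate."""
    return -math.log10(1.0 - hit)


# The mc1035 numbers quoted in the channel.
print(f"miss reduction: {miss_reduction(0.95, 0.96):.0%}")
print(f"nines: {nines(0.95):.1f} -> {nines(0.96):.1f}")
```

For 0.95 to 0.96 the misses fall from 5% to 4% of gets, which is a 20% reduction and roughly 1.3 to 1.4 nines, matching the numbers quoted in the channel. The same functions show why .99 to .995 is hard: it requires halving the remaining misses.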