[08:17:53] effie: memcached on mc103[4-6] looks really good! \o/
[08:18:29] I know! I am happy they reach a satisfactory ratio pretty fast too
[08:29:09] effie: I filed https://gerrit.wikimedia.org/r/c/operations/puppet/+/647190 as an improvement, doesn't really change much right now (but it could be confusing)
[08:39:53] cool I will take a look, I am trying to get redis running on these hosts
[08:39:59] and get this out of the way too
[09:35:02] serviceops, Operations, cloud-services-team (Kanban): Upgrade labweb servers to buster - https://phabricator.wikimedia.org/T269004 (MoritzMuehlenhoff) >>! In T269004#6677107, @Andrew wrote: > It's most useful if effort is directed towards completing T237773, which will render this issue moot. In theo...
[10:40:55] akosiaris: mind to take a quick look at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/647211 ?
[10:41:28] this currently leads to weird lint errors like in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/646649
[10:42:16] ah indeed, I just met that yesterday with apertium. I was planning to figure out how to fix it, that's a nice way to do it
[10:42:39] merged
[10:42:46] thanks!
[10:43:12] that was the mind boggling rabbithole I referred to on Monday
[10:43:17] aaah
[10:43:25] yeah..
[10:44:16] the trick is to "helmfile -e staging template" the chart, extract the mentioned "broken" yaml file from the output and count lines there. They do match :-)
[11:04:49] jayme: thank you for your reviews! do you know who I should ask for a review with a +2?
[11:05:53] kostajh: oh, I wasn't aware that you can't do +2 there yourself. I can do that ofc!
[11:07:52] sorry for all this being a bit of a steep learning curve :)
[11:14:15] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm) The new calico chart is merged, thanks @akosiaris What is missing currently is a proper RoleBinding for the calicoctl user as I was n...
[11:57:55] heh, https://docs.projectcalico.org/reference/cni-plugin/configuration#cni-network-configuration-lists cni can now do bandwidth limiting.. nice
[11:59:06] kostajh: I got an action item to give the 3 people you mentioned in the task +2 rights
[12:02:50] serviceops, Operations, Datacenter-Switchover: Updates to warmup script - https://phabricator.wikimedia.org/T269179 (jbond) p:Triage→Medium
[12:02:57] jayme: thank you!
[12:03:13] akosiaris: yeah, that's cool right?! (traffic shaping)
[12:03:40] akosiaris: thanks. It could just be me and Tgr (Gergő), I'm not sure Martin from Research needs +2 per se...
[12:03:40] jayme: yeah. We don't have an actual use case, but it might prove handy at some point
[12:04:38] kostajh: fine by me. https://en.wikipedia.org/wiki/Principle_of_least_privilege says that if it's not needed, no reason for it
[12:08:50] serviceops, Operations, Platform Engineering: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (jbond) p:Triage→Medium
[12:09:56] +1
[13:28:32] serviceops, Discovery-Search, Maps, Product-Infrastructure-Team-Backlog: [OSM] Backport imposm3 to the debian channel - https://phabricator.wikimedia.org/T238753 (Jgiannelos) It looks like the debian package are also able to be built on stretch if backports are enabled. I uploaded a debian packag...
[13:48:01] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (jbond) p:Triage→Medium
[13:48:20] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (jbond) @Dzahn would this be for you?
[14:35:48] serviceops, CX-cxserver, Language-Team (Language-2020-October-December), Patch-For-Review, Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry)
[14:36:16] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (akosiaris) Arguably, in the interest of https://en.wikipedia.org/wiki/Separation_of_concerns it's probably better than whatever instance of the se...
[14:36:18] serviceops, CX-cxserver, Language-Team (Language-2020-October-December), Patch-For-Review, Release-Engineering-Team (Pipeline): Migrate apertium to the deployment pipeline - https://phabricator.wikimedia.org/T255672 (KartikMistry) All done except announcement/notes to the team!
[14:36:36] akosiaris / jayme could I schedule a short (30min) meeting tomorrow or Friday with either / both of you or whoever else from SRE to talk about next steps for deploying the link recommendation service?
[14:37:13] Oh, and I was going to ask about T269581 specifically but I see akosiaris just replied there, thank you :)
[14:38:13] kostajh: Silent Fridays :-)
[14:38:24] sshh...
[14:38:42] but I think once you got our deployer access all you need is a helmfile -e sync and you'll deploy
[14:38:47] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (kostajh) >>! In T269581#6679284, @akosiaris wrote: > Arguably, in the interest of https://en.wikipedia.org/wiki/Separation_of_concerns it's probab...
[14:39:27] Docs at https://wikitech.wikimedia.org/wiki/Deployments_on_kubernetes#Deploying_with_helmfile
[14:39:51] Once we are sure it works, we can set up LVS and allow mediawiki/other internal clients to talk to it
[14:41:24] ok. When I get deploy access I'll give it a go
[14:43:59] +1 to what akosiaris said :-) I think you're good to go to deploy (and we need to pick up then and set up LVS for you when the service is working as expected).
[14:44:12] serviceops, Discovery-Search, Maps, Product-Infrastructure-Team-Backlog: [OSM] Backport imposm3 to the debian channel - https://phabricator.wikimedia.org/T238753 (MoritzMuehlenhoff) Relying on stretch-backports isn't much of an option, it can disappear any moment from the Debian archive and we're...
[14:44:18] I just have to add your mysql secrets still. Will do in a sec
[14:45:45] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (akosiaris) >>! In T269581#6679290, @kostajh wrote: >>>! In T269581#6679284, @akosiaris wrote: >> Arguably, in the interest of https://en.wikipedia...
[15:06:54] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (akosiaris) >>! In T267653#6678721, @JMeybohm wrote: > The new calico chart is merged, thanks @akosiaris > > What is missing currently is a prop...
[15:35:48] serviceops, Dumps-Generation, Operations, Platform Engineering: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ArielGlenn)
[15:37:19] serviceops, Dumps-Generation, Operations, Platform Engineering: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ArielGlenn) I can do the testbed host first, and then the rest. Do we have a mediawiki server on buster anywhere in the cluster yet?
[15:39:04] serviceops, Dumps-Generation, Operations, Platform Engineering: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (MoritzMuehlenhoff) Yes, mwdebug1003 is running Buster, you can select it with the latest version of the WikimediaDebug browser extension.
[15:52:59] thanks moritzm I will maybe set up a buster instance in deployment-prep and see how things look
[15:53:14] slow though, what's our timeline for the move anyways?
[15:54:30] early next Q, but will take a bit to get them all done
[15:55:35] I think instead of fiddling with deployment-prep you can just as well do the testbed in prod directly, the PHP are identical except minor toolchain changes, so would not expect any issues at all for dumps
[16:12:16] will keep it in mind, thanks!
[16:17:48] serviceops, Packaging: Please provide our special component/php72 in buster-wikimedia - https://phabricator.wikimedia.org/T250515 (MoritzMuehlenhoff) >>! In T250515#6676486, @MoritzMuehlenhoff wrote: > That's actually a bug, will update the package tomorrow. Fixed.
[16:34:16] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Cmjohnson) @Dzahn Is it possible to move mw1281,82 and 83? I need this space for the an-workers on 10G. I can move them to A8.
[17:00:07] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (elukey) I can definitely help on this @Dzahn, lemme know if you need a pair of extra hands :)
[17:16:42] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Dzahn)
[18:02:03] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (ops-monitoring-bot) Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack ` mw1281.eqiad.wmnet `
[18:02:06] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (ops-monitoring-bot) Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack ` mw1282.eqiad.wmnet `
[18:02:09] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (ops-monitoring-bot) Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack ` mw1283.eqiad.wmnet `
[18:06:17] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Dzahn) @Cmjohnson Yes. I just depooled mw1281-1283, downtimed them and then shut them down physically. You can move them.
[18:08:51] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Refactor calico deploy strategy - https://phabricator.wikimedia.org/T267653 (JMeybohm) >>! In T267653#6679363, @akosiaris wrote: >>>! In T267653#6678721, @JMeybohm wrote: >> The new calico chart is merged, thanks @akosiari...
[18:12:06] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with...
[18:14:31] serviceops, Add-Link, Growth-Team: Add Link engineering: Puppetize DB credentials - https://phabricator.wikimedia.org/T269573 (JMeybohm) Open→Resolved a:JMeybohm Secrets have been planted with "[puppet-private] (d2082d1e) (jayme) Add linkrecommendation db credentials". Generated YAML look...
[18:19:18] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2...
[19:01:07] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) meanwhile there is no more /srv/parsoid on testreduce1001 but /srv/parsoid-testing instead. I tried an "npm...
[19:25:18] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2243.codfw.wmnet'] ` and were **ALL** successful.
[19:43:30] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Cmjohnson)
[19:43:59] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Cmjohnson) @dzahn completed the move and mw1281-83 are up
[20:17:10] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) Open→Resolved
[20:29:20] serviceops, DC-Ops, Operations, ops-eqiad: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Dzahn) @Cmjohnson Thank you. Repooled and receiving traffic again. Monitoring looks good.
[20:35:46] serviceops, DC-Ops, Operations, ops-eqiad, Patch-For-Review: eqiad: Physical moves for MediaWiki servers - https://phabricator.wikimedia.org/T266164 (Dzahn) @Cmjohnson Should this stay open for mw1313-mw1316 or did we solve the issue by moving other servers now?
[20:51:48] serviceops, Add-Link, Growth-Team: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (Tgr) If it's easy to selectively address one or the other from MediaWiki, we could just have a special page or API proxy queries to the secondary...
[21:23:33] serviceops, MW-on-K8s, Operations, TechCom-RFC, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Krinkle) The service is mainly for executing shell commands. Right now that happens through `wfShellExec`. Typically to invoke program...
[21:24:39] serviceops, MW-on-K8s, Operations: Sandbox/limit child processes within a container runtime - https://phabricator.wikimedia.org/T252745 (Krinkle)
[21:24:42] serviceops, MW-on-K8s, Operations, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Krinkle) Open→Resolved
[21:24:56] serviceops, MW-on-K8s, Operations, Patch-For-Review, and 2 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Krinkle) >>! In T260330#6632099, @Krinkle wrote: > Put on Last Call until 2 December. This RFC has been approved and is now closed.
[22:01:56] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2...
[23:13:01] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2243.codfw.wmnet'] ` and were **ALL** successful.
[23:20:46] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[23:24:58] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, and 2 others: Upgrade MediaWiki appservers to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn) >>! In T245757#6645352, @jijiki wrote: > @Dzahn @hnowlan After discussing with @Muehlenhoff, since w...