[00:12:26] <wikibugs>	 serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Replace production deployment servers and update them to Buster - https://phabricator.wikimedia.org/T265963 (Dzahn) p:Medium→High
[00:55:08] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (thcipriani) >>! In T275559#6862829, @Legoktm wrote: >> but are the credentials anywhere else outsi...
[00:57:37] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (Legoktm) You should be able to obtain the password for the "ci-build" user in `/etc/docker-pkg/int...
[01:17:11] <wikibugs>	 serviceops: Phase out legacy "uploader" docker-registry.wikimedia.org user - https://phabricator.wikimedia.org/T275581 (Legoktm)
[01:17:37] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (Legoktm) Open→Resolved a:Legoktm Tested that pushing to the registry still works with...
[01:18:32] <wikibugs>	 serviceops: Phase out legacy "uploader" docker-registry.wikimedia.org user - https://phabricator.wikimedia.org/T275581 (Legoktm) I'll probably aim to do this mid-next week just to give a bit of time to find any other broken things.
[01:19:37] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (thcipriani) >>! In T275559#6863276, @thcipriani wrote: >>>! In T275559#6862829, @Legoktm wrote: >>...
[08:46:53] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) This has now been added i have included the full list of the contin-admins...
[08:47:31] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) Open→Resolved a:jbond
[14:02:09] <wikibugs>	 serviceops, MW-on-K8s, Platform Engineering, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Ladsgroup) What is this blocked on? Just needs to be done?
[14:49:17] <jakob_WMDE_>	 Hi! I want to add the request id of the incoming mediawiki request to termbox's log output for better traceability. is there a standard way to do this? any naming convention for the field? I checked whether something like that is already built into service-runner but it doesn't look like it.
[14:49:35] <jakob_WMDE_>	 (ticket https://phabricator.wikimedia.org/T268640)
[15:23:25] <jakob_WMDE_>	 akosiaris: ^ if you have a minute, I think you may be able to answer this :)
[17:18:32] <godog>	 so I've been staring at swift logs for a couple of days, I've trimmed down the issue of "some swift uploads reported slow" in T275752 to jobrunners, specifically buster jobrunners it looks like (from latest updates)
[18:00:53] <legoktm>	 commented on the termbox task
[18:02:21] <legoktm>	 godog: do you think it's another "new kernel makes it slower" or something else?
[18:35:54] <wikibugs>	 serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (Legoktm) Thanks!
[19:08:40] <wikibugs>	 serviceops, SRE: upgrade mwmaint servers to buster - https://phabricator.wikimedia.org/T267607 (Dzahn) T274170 introduced new hardware mwmaint2002 and can be used now. timing :p
[19:10:05] <wikibugs>	 serviceops, SRE: move mwmaint2002 into production - https://phabricator.wikimedia.org/T275905 (Dzahn)
[19:12:14] <wikibugs>	 serviceops, SRE: move mwmaint2002 into production - https://phabricator.wikimedia.org/T275905 (Dzahn)
[19:19:09] <wikibugs>	 serviceops, SRE: move mwmaint2002 into production, replace mwmaint2001 - https://phabricator.wikimedia.org/T275905 (Dzahn)
[21:13:14] <mutante>	 when running "scap pull" on a deployment server:
[21:13:21] <mutante>	 sudo: /usr/local/sbin/check-and-restart-php: command not found
[21:13:29] <mutante>	 there is no such command on any deploy* though
[21:13:44] <mutante>	 normally we don't scap pull there but .. we want to sync new servers
[21:17:52] <mutante>	 but why do we have this: https://gerrit.wikimedia.org/r/c/operations/puppet/+/667043/2/modules/scap/templates/scap.cfg.erb and at the same time the restart commands aren't installed on deploy*?
[21:25:18] <mutante>	 I'll make a patch to include ::profile::mediawiki::php::restarts  in deployment_server role
[21:25:38] <mutante>	 I am surprised we did not hear complaints from deployers though.. since these are in dsh groups
[21:31:08] <legoktm>	 I don't think deployers would run `scap pull` on deployment servers themselves
[21:32:03] <legoktm>	 most deployers would only ever run it on mwdebug* after testing a patch or when we were doing the reimages on various appservers
[21:32:32] <legoktm>	 also, there's no php-fpm to restart on the deployment servers I think
[21:33:06] <mutante>	 legoktm: yea, the first part is why we haven't noticed but if they are in dsh groups and there is no restart command wouldn't it also error on normal deploy
[21:33:32] <mutante>	 legoktm: yea, that it's missing is something I want to fix now so that we can scap pull when migrating deployment servers
[21:33:50] <mutante>	 because "scap init" can't work on the inactive servers 
[21:34:13] <legoktm>	 the restart should be the final step, doesn't it sync everything anyways?
[21:34:30] <legoktm>	 but for whatever reason, when you do a normal sync, it doesn't error out
[21:34:50] <mutante>	 yes, it does, except another issue about deleting non-empty cache dirs
[21:34:55] <mutante>	 but yes, it does sync
[21:36:15] <mutante>	 ah.. so they are in dsh.yaml but not _as targets_, only as _masters_
[21:36:30] <mutante>	 that's why it is not an issue during deployments
[21:40:06] <mutante>	 well.. I can either just add the restart command or ignore the error because "not an issue unless in this special case"
[21:43:18] <legoktm>	 mutante: there's a `scap pull-master` command that might do it correctly
[21:43:25] <legoktm>	 > Sync local MediaWiki staging directory with deploy server state.
[21:43:46] <mutante>	 legoktm: that handles only "staging" and "patches" 
[21:44:45] <legoktm>	 hm
[21:44:56] <legoktm>	 not sure then, maybe someone in -releng knows
[21:45:32] <mutante>	 ack, already 2 people and tickets in :)
[21:46:59] <legoktm>	 :D
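(Editor's note on the 21:40 message.) The "ignore the error" option mutante weighs can be pictured as a guard that skips the restart step when the helper is not installed. This is only a minimal sketch under stated assumptions: `restart_if_present` is a hypothetical helper written for illustration, not scap's code, and the log does not show what arguments the real restart command takes.

```python
import os
import subprocess


def restart_if_present(command, runner=subprocess.run):
    """Run `command` only if its executable exists.

    Hypothetical helper for illustration; this is not how scap itself
    decides whether to restart php-fpm. Returns True if the command ran,
    False if it was skipped because the executable is missing.
    """
    executable = command[0]
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        # Missing helper (as on the deploy* hosts in the log): treat it as
        # "nothing to restart" instead of failing the whole pull.
        return False
    # check=True makes a real restart failure raise instead of passing silently.
    runner(command, check=True)
    return True
```

For example, `restart_if_present(["/usr/local/sbin/check-and-restart-php"])` would return False on a host where that path does not exist; any arguments are omitted here because the log does not show them. The other option in the log, installing the restart command via the puppet profile, makes such a guard unnecessary.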
[22:19:34] <wikibugs>	 serviceops, SRE, Patch-For-Review: move mwmaint2002 into production, replace mwmaint2001 - https://phabricator.wikimedia.org/T275905 (Dzahn)
[22:33:27] <mutante>	 another thing.. if you switch the deployment server and scap master in codfw...
[22:33:37] <mutante>	 and then scap pull on a random codfw host to confirm it works
[22:33:45] <mutante>	 that isn't a real test
[22:34:01] <mutante>	 why? because: deployment.codfw.wmnet is an alias for deploy1001.eqiad.wmnet.
[22:35:02] <mutante>	 we will have to just test it once we switch eqiad, messing with that seems a bad idea
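(Editor's note.) The point about `deployment.codfw.wmnet` being an alias for `deploy1001.eqiad.wmnet` can be checked mechanically: resolve the service name and compare the site label of the alias with that of its canonical name. The sketch below is illustrative only. It assumes hostnames follow the `host.site.wmnet` pattern seen in the log, and `site_of` and `alias_crosses_site` are hypothetical names; the resolver is injectable so the logic can be exercised without DNS access.

```python
import socket


def site_of(fqdn):
    """Return the site label, assuming names shaped like host.site.wmnet."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError(f"expected host.site.domain, got {fqdn!r}")
    return labels[-2]


def alias_crosses_site(alias, resolve=socket.gethostbyname_ex):
    """True if `alias` resolves to a canonical host in a different site.

    socket.gethostbyname_ex returns (canonical_name, alias_list, ip_list);
    only the canonical name is used here.
    """
    canonical, _aliases, _addresses = resolve(alias)
    return site_of(alias) != site_of(canonical)
```

With the alias described above, `alias_crosses_site("deployment.codfw.wmnet")` would return True, which is why a `scap pull` on a codfw host does not exercise the new codfw deployment server.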