[00:12:26] serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Replace production deployment servers and update them to Buster - https://phabricator.wikimedia.org/T265963 (Dzahn) p:Medium→High
[00:55:08] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (thcipriani) >>! In T275559#6862829, @Legoktm wrote: >> but are the credentials anywhere else outsi...
[00:57:37] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (Legoktm) You should be able to obtain the password for the "ci-build" user in `/etc/docker-pkg/int...
[01:17:11] serviceops: Phase out legacy "uploader" docker-registry.wikimedia.org user - https://phabricator.wikimedia.org/T275581 (Legoktm)
[01:17:37] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (Legoktm) Open→Resolved a:Legoktm Tested that pushing to the registry still works with...
[01:18:32] serviceops: Phase out legacy "uploader" docker-registry.wikimedia.org user - https://phabricator.wikimedia.org/T275581 (Legoktm) I'll probably aim to do this mid-next week just to give a bit of time to find any other broken things.
[01:19:37] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (thcipriani) >>! In T275559#6863276, @thcipriani wrote: >>>! In T275559#6862829, @Legoktm wrote: >>...
[08:46:53] serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) This has now been added i have included the full list of the contin-admins...
[08:47:31] serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) Open→Resolved a:jbond
[14:02:09] serviceops, MW-on-K8s, Platform Engineering, Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Ladsgroup) What is this blocked on? Just needs to be done?
[14:49:17] Hi! I want to add the request id of the incoming mediawiki request to termbox's log output for better traceability. Is there a standard way to do this? Any naming convention for the field? I checked whether something like that is already built into service-runner but it doesn't look like it.
[14:49:35] (ticket https://phabricator.wikimedia.org/T268640)
[15:23:25] akosiaris: ^ if you have a minute, I think you may be able to answer this :)
[17:18:32] so I've been staring at swift logs for a couple of days, I've trimmed down the issue of "some swift uploads reported slow" in T275752 to jobrunners, specifically buster jobrunners it looks like (from latest updates)
[18:00:53] commented on the termbox task
[18:02:21] godog: do you think it's another "new kernel makes it slower" or something else?
[18:35:54] serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (Legoktm) Thanks!
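The termbox question above is about carrying a per-request ID into structured log output. Termbox itself is a Node.js service and the actual answer was left as a comment on T268640, so the following is only a language-agnostic sketch of the general pattern, written in Python. It assumes the incoming MediaWiki request carries the ID in an `X-Request-Id` header and logs it under a `request_id` field; both names are assumptions for illustration, not a confirmed Wikimedia or service-runner convention.

```python
import json
import logging
import uuid
from typing import Mapping

# Assumption: the upstream request ID arrives in this header.
# HTTP header names are case-insensitive, so compare in lower case.
REQUEST_ID_HEADER = "x-request-id"


def get_request_id(headers: Mapping[str, str]) -> str:
    """Return the incoming request ID, or mint a new one if none was sent."""
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER and value:
            return value
    # No upstream ID: generate one so this request's own log lines still correlate.
    return str(uuid.uuid4())


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including the request ID field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            # "request_id" is a hypothetical field name, not a confirmed convention.
            "request_id": getattr(record, "request_id", None),
        })


def request_logger(logger: logging.Logger, headers: Mapping[str, str]) -> logging.LoggerAdapter:
    """Wrap a logger so every record it emits carries this request's ID."""
    return logging.LoggerAdapter(logger, {"request_id": get_request_id(headers)})
```

In use, a handler would be configured once with `JsonFormatter()`, and each incoming request would log through `request_logger(logger, request_headers)`, so every line for that request shares the same `request_id` value that MediaWiki's own logs would carry. Reusing the upstream ID rather than always generating a fresh one is what makes cross-service tracing possible.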
[19:08:40] serviceops, SRE: upgrade mwmaint servers to buster - https://phabricator.wikimedia.org/T267607 (Dzahn) T274170 introduced new hardware mwmaint2002 and can be used now. timing :p
[19:10:05] serviceops, SRE: move mwmaint2002 into production - https://phabricator.wikimedia.org/T275905 (Dzahn)
[19:12:14] serviceops, SRE: move mwmaint2002 into production - https://phabricator.wikimedia.org/T275905 (Dzahn)
[19:19:09] serviceops, SRE: move mwmaint2002 into production, replace mwmaint2001 - https://phabricator.wikimedia.org/T275905 (Dzahn)
[21:13:14] when running "scap pull" on a deployment server:
[21:13:21] sudo: /usr/local/sbin/check-and-restart-php: command not found
[21:13:29] there is no such command on any deploy* though
[21:13:44] normally we don't scap pull there but .. we want to sync new servers
[21:17:52] but why do we have this: https://gerrit.wikimedia.org/r/c/operations/puppet/+/667043/2/modules/scap/templates/scap.cfg.erb and at the same time the restart commands aren't installed on deploy*?
[21:25:18] I'll make a patch to include ::profile::mediawiki::php::restarts in deployment_server role
[21:25:38] I am surprised we did not hear complaints from deployers though, since these are in dsh groups
[21:31:08] I don't think deployers would run `scap pull` on deployment servers themselves
[21:32:03] most deployers would only ever run it on mwdebug* after testing a patch or when we were doing the reimages on various appservers
[21:32:32] also, there's no php-fpm to restart on the deployment servers I think
[21:33:06] legoktm: yea, the first part is why we haven't noticed, but if they are in dsh groups and there is no restart command, wouldn't it also error on a normal deploy?
[21:33:32] legoktm: yea, that it's missing is something I want to fix now so that we can scap pull when migrating deployment servers
[21:33:50] because "scap init" can't work on the inactive servers
[21:34:13] the restart should be the final step, doesn't it sync everything anyways?
[21:34:30] but for whatever reason, when you do a normal sync, it doesn't error out
[21:34:50] yes, it does, except another issue about deleting non-empty cache dirs
[21:34:55] but yes, it does sync
[21:36:15] ah.. so they are in dsh.yaml but not _as targets_, only as _masters_
[21:36:30] that's why it is not an issue during deployments
[21:40:06] well.. I can either just add the restart command or ignore the error because "not an issue unless in this special case"
[21:43:18] mutante: there's a `scap pull-master` command that might do it correctly
[21:43:25] > Sync local MediaWiki staging directory with deploy server state.
[21:43:46] legoktm: that handles only "staging" and "patches"
[21:44:45] hm
[21:44:56] not sure then, maybe someone in -releng knows
[21:45:32] ack, already 2 people and tickets in :)
[21:46:59] :D
[22:19:34] serviceops, SRE, Patch-For-Review: move mwmaint2002 into production, replace mwmaint2001 - https://phabricator.wikimedia.org/T275905 (Dzahn)
[22:33:27] another thing.. if you switch the deployment server and scap master in codfw...
[22:33:37] and then scap pull on a random codfw host to confirm it works
[22:33:45] that isn't a real test
[22:34:01] why?
because: deployment.codfw.wmnet is an alias for deploy1001.eqiad.wmnet.
[22:35:02] we will have to just test it once we switch eqiad, messing with that seems a bad idea
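The reason a codfw `scap pull` is not a real test is that `deployment.codfw.wmnet` resolves to `deploy1001.eqiad.wmnet`, so the pull still talks to eqiad. A minimal sketch of checking that before trusting such a test: it assumes the datacenter is the second-to-last label of a `*.wmnet` name (as in both names above), and `datacenter_of`, `points_into_dc` and `canonical_name` are hypothetical helper names. The lookup uses the standard `socket.gethostbyname_ex`, whose first return value is the resolver's primary (canonical) host name.

```python
import socket


def datacenter_of(fqdn: str) -> str:
    """Return the datacenter label of a name like 'deploy1001.eqiad.wmnet'.

    Assumes the <host>.<dc>.wmnet layout seen in the log; anything else raises.
    """
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 3 or labels[-1] != "wmnet":
        raise ValueError(f"unexpected name layout: {fqdn!r}")
    return labels[-2]


def points_into_dc(alias: str, canonical: str) -> bool:
    """True if the host an alias resolves to lives in the alias's own datacenter."""
    return datacenter_of(alias) == datacenter_of(canonical)


def canonical_name(name: str) -> str:
    """Resolve a name and return the primary host name reported by the resolver."""
    primary, _aliases, _addresses = socket.gethostbyname_ex(name)
    return primary
```

On a production host, `points_into_dc("deployment.codfw.wmnet", canonical_name("deployment.codfw.wmnet"))` returning `False` would indicate, as in the log, that a codfw `scap pull` still exercises the eqiad deployment server rather than the new codfw one.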
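For the `check-and-restart-php` error earlier in this log, the underlying problem was that `scap pull` on a deployment server tried to call `/usr/local/sbin/check-and-restart-php`, which is not installed on any deploy* host. A hypothetical pre-flight check (not part of scap, and it does not parse `scap.cfg`, so the paths must be supplied by hand) could list which of the expected commands are missing before a manual `scap pull` on a new server:

```python
import os

# Path taken from the `scap pull` error in the log; other paths are up to the caller.
RESTART_CMD = "/usr/local/sbin/check-and-restart-php"


def missing_commands(paths):
    """Return the paths that are not existing, executable files on this host."""
    return [
        path
        for path in paths
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]
```

Running `missing_commands([RESTART_CMD])` on a deploy* host at the time of this log would have returned the restart command's path, matching the `command not found` error; on an appserver with the `::profile::mediawiki::php::restarts` profile applied it would presumably return an empty list.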