[05:22:42] 10serviceops, 10Continuous-Integration-Infrastructure, 10Operations, 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10Joe) @thcipriani why do we even need to save the images? We don't really care about loc...
[08:03:04] 10serviceops, 10Continuous-Integration-Infrastructure, 10Operations, 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10hashar) >>! In T207707#5302475, @Joe wrote: > @thcipriani why do we even need to save t...
[08:19:55] 10serviceops: deploy CoreDNS as a in-cluster DNS service - https://phabricator.wikimedia.org/T226516 (10fsero) p:05Triage→03Normal
[09:01:58] o/ _joe_ and akosiaris, I need some help with this: I uploaded a secret value for helm on puppet, ran a puppet run, and it failed due to this https://gerrit.wikimedia.org/r/c/operations/puppet/+/520387
[09:02:34] could you please take a quick review? what I'm doing there is creating a system user, since I'm using the username field of the services array as a user and that user does not exist on deploy1011
[09:02:37] deploy1001
[09:02:49] * akosiaris looking
[09:03:09] don't know if that's the best course of action but it seems the quickest
[09:03:38] <_joe_> uhm
[09:03:48] <_joe_> let me see the data it's reading
[09:03:50] it seems ugly to me
[09:03:59] <_joe_> I assumed user/group would be the usual deployment group
[09:04:06] me too, I think the user should be 'wikidev'
[09:04:35] I can do that as well
[09:05:08] I wonder though why it's different from $data['username'] a few lines below
[09:05:11] * akosiaris looking
[09:06:46] ah I think I know
[09:06:46] https://www.irccloud.com/pastebin/uAYu9NeE/
[09:07:16] "I think the user should be 'wikidev'": there is no wikidev user, there is a wikidev group
[09:08:49] yeah it's trebuchet:wikidev
[09:08:50] sigh
[09:09:10] the other choice is mwdeploy:wikidev
[09:09:36] both are fine in this case, and indeed avoiding that old trebuchet thing would be wiser
[09:09:41] lemme upload a change
[09:10:07] sure
[09:12:19] <_joe_> yeah, either that or we change the concept of what's a deployer
[09:12:48] <_joe_> and yes, not trebuchet pls :D
[09:13:25] <_joe_> but the wider question is - do we want everyone to be able to deploy everyone else's code?
[09:13:57] they actually can right now
[09:14:22] but if we don't want that, and probably we don't want that, we need to play with groups I guess
[09:17:12] yup, exactly that
[09:17:33] we have all the stuff in place aside from creating/naming and assigning the groups
[09:18:02] so what we are missing is essentially turning the knob
[09:18:06] fsero: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/520390
[09:18:50] there's a bit of duplication in there for now; we can avoid it by creating a define and passing a default value
[09:19:50] should we add some Require[ User[] and Group[] ]
[09:19:57] for the sake of safety?
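
(A sketch of what that Require[] proposal at 09:19 would look like in the manifest. The file path, mode, and secret() argument here are illustrative assumptions, not the contents of change 520390.)

```puppet
# Hypothetical sketch: private helm values file owned by mwdeploy:wikidev,
# with the explicit dependencies being proposed. Path and secret() argument
# are made up for illustration.
file { '/srv/private/helm/zotero/values.yaml':
  ensure  => present,
  owner   => 'mwdeploy',
  group   => 'wikidev',
  mode    => '0440',
  content => secret('helm/zotero/values.yaml'),
  require => [User['mwdeploy'], Group['wikidev']],
}
```

(As the discussion below notes, Puppet autorequires a managed owner/group anyway, so the explicit require mainly changes where a mistake surfaces: at compile time rather than at catalog apply time.)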
[09:20:09] besides that LGTM
[09:20:13] I'd abandon mine
[09:20:14] <_joe_> fsero: +1
[09:20:25] <_joe_> we should require User[] and Group[]
[09:20:39] they are implicitly required
[09:21:01] puppet at least does that for us (and it can be a pain when you are trying to figure out a cycle)
[09:21:15] <_joe_> akosiaris: nope, not with users/groups
[09:21:22] <_joe_> unless there is something I never saw
[09:21:34] file -> user and file -> group exist implicitly
[09:22:56] https://puppet.com/docs/puppet/5.3/type.html#file
[09:22:57] Autorequires: If Puppet is managing the user or group that owns a file, the file resource will autorequire them. If Puppet is managing any parent directories of a file, the file resource will autorequire them.
[09:23:18] that's a big if IMHO :)
[09:23:40] don't jump into 2025 when we are using puppet 5 https://puppet.com/docs/puppet/4.8/type.html#file
[09:23:45] still stands true
[09:23:49] but be cautios :P
[09:23:52] *cautious
[09:23:53] we are already using 5.3
[09:23:56] * vgutierrez hides
[09:23:59] yeah it's been true since 0.26
[09:24:10] hence my easy with versions
[09:24:14] ease*
[09:24:25] ok, I'll merge, akosiaris
[09:25:39] cool
[09:29:14] works like a charm :)
[09:29:40] you will see several commits today on the private repo, just filling in the blanks for helm secrets on staging
[09:54:53] <_joe_> akosiaris: "If Puppet is managing the user or group that owns a file,"
[09:55:04] <_joe_> that's why I proposed to add the explicit require
[09:55:12] <_joe_> so that we're sure those users are puppet-managed
[09:55:33] what makes you believe they won't be?
[09:55:45] <_joe_> a future error?
[09:55:49] like?
[09:56:16] <_joe_> like someone like me changes the user for 'zotero' to a new one and forgets to check if it's declared :P
[09:56:31] <_joe_> the compiler will tell me "all ok"
[09:56:37] <_joe_> and the catalog will fail to apply
[09:56:47] <_joe_> while with the require, compilation would fail
[09:57:21] which would be easily fixed
[09:57:46] I am not so sure it's worth it tbh, feel free to add it, but I don't feel it's really needed
[09:57:57] <_joe_> yeah it was "nice to have"
[09:58:02] <_joe_> rather than "needed"
[12:53:33] _joe_: anything me or cdanis can do to help with the conftool "blocker"?
[12:53:56] <_joe_> volans: err wait, I'm about to send a patch
[12:55:08] great, thanks :) [I was also asking because of the team meeting today, in order to have up-to-date info ;)]
[12:55:28] <_joe_> volans: https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/scap/+/491412/
[13:55:07] https://www.irccloud.com/pastebin/1oM0G8Cx/
[13:55:30] no changes between what is live in staging and what is captured in code in the deployment-charts repo plus the private repo on puppet
[14:58:29] 10serviceops, 10Core Platform Team Backlog (Designing), 10Services (designing): Allow service-checker to run multiple domains for RESTBase - https://phabricator.wikimedia.org/T227198 (10Pchelolo)
[14:59:27] 10serviceops, 10Core Platform Team Backlog (Designing), 10Services (designing): Allow service-checker to run multiple domains for RESTBase - https://phabricator.wikimedia.org/T227198 (10Pchelolo) Having said that, we need to think whether this can wait until Phester is ready and we are making a decision to r...
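
(For context on the helmfile check at 13:55 above: a sketch of the kind of invocation meant. The repo path and environment name are assumptions about the local setup, not the exact command that was run.)

```sh
# Assumed invocation: render the charts from deployment-charts plus the
# private values and diff them against what is live in the cluster.
# Requires the helm-diff plugin; "staging" and the path are assumed names.
cd /srv/deployment-charts/helmfile.d/services/zotero
helmfile -e staging diff
# Empty output means the live release matches what is in the repos.
```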
[15:53:28] same for eqiad
[15:53:34] https://www.irccloud.com/pastebin/gY07Niza/
[16:10:29] https://www.irccloud.com/pastebin/YyLLTI7M/
[16:36:26] <_joe_> fsero: \o/
[16:36:30] <_joe_> that's great
[16:44:40] regarding the "CI needs 2 disks added in contint1001": I created exactly what hashar requested. a RAID1, then LVM on top, then a volume group and a logical volume with 250G for docker images, mounted it on /mnt/docker and then assigned it back
[16:45:16] regarding phabricator, merged the change Mukunda needed to manage the new ssh.log file
[16:45:28] created the zotero namespace from helmfile, and recreated the zotero service from helmfile as well
[16:45:33] it works
[16:45:42] also, as you may notice, helmfile is also able to spam
[16:45:45] on SAL
[16:46:47] <_joe_> mutante: it should be possible to move to the lvm storage engine for docker on contint1001
[16:47:05] <_joe_> but you would need to get an ok from them that they are ok with losing all images
[16:47:13] <_joe_> and having to pull them again
[16:47:37] lvm storage is only good for images that receive a lot of writes; for a CI server I think either ZFS or overlay2 are better options
[16:48:41] <_joe_> well CI does a lot of writes, compared to the rest
[16:48:50] <_joe_> but I think we can live with overlay
[16:49:44] akosiaris: _joe_ how do we contact service owners to make them change to helmfile and drop scap-helm? announce to the engineering@ list?
[16:49:58] maybe they are already reading this channel?
[16:50:21] <_joe_> fsero: remove scap-helm and put a script that outputs a link to the wikitech page about helmfile :)
[16:50:29] I like that
[16:50:44] lemme write the page first
[16:50:47] but yep
[16:50:51] <_joe_> and yes, send an email to the few people who already use that
[16:50:52] ok, I read a bit about overlay2, will look into switching to that
[16:51:04] <_joe_> mutante: we use overlay2 right now
[16:51:13] mutante: it's the default for newer versions
[16:51:29] <_joe_> unless it's still on overlay
[16:52:16] ok, but I thought you meant which file system I put on the new logical volume
[16:52:26] that is for storing images
[16:52:38] 10serviceops, 10Prod-Kubernetes, 10Patch-For-Review: Helm packages deployment tool, at least for cluster applications. - https://phabricator.wikimedia.org/T212130 (10fsero) pending some documentation for helping people to migrate this is essentially done
[16:52:52] I wasn't clear on whether that just stores files or they run from there, I guess
[16:53:18] overlay and overlay2 use files over an ext4 fs
[16:53:43] it also supports xfs
[16:53:45] ok, I formatted it with ext4
[16:53:54] https://docs.docker.com/v17.09/engine/userguide/storagedriver/overlayfs-driver/
[16:58:40] ok, I see, thanks. I see Tyler already made comments about having options between overlay2 and devicemapper.. well, as I just saw on that page, overlay2 should be more performant than devicemapper.. so they should probably stick to that
[16:59:15] <_joe_> remember contint is jessie
[16:59:20] <_joe_> will have like docker 3
[16:59:54] <_joe_> scratch that, it has 18.06
[16:59:56] also.. 2) The devicemapper storage driver is deprecated in Docker Engine 18.09, and will be removed in a future release. It is recommended that users of the devicemapper storage driver migrate to overlay2.
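
(For reference, the driver switch discussed above boils down to a daemon.json change plus a restart: a sketch assuming a hand-managed dockerd config rather than the puppet-managed one, with the data root pointed at the new ext4 logical volume. Note that images stored under the old driver are not migrated, hence having to pull them again.)

```sh
# Sketch only: configure dockerd to use overlay2 with its data root on the
# new logical volume mounted at /mnt/docker (paths are assumptions).
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay2",
  "data-root": "/mnt/docker"
}
EOF
systemctl restart docker
docker info --format '{{.Driver}}'   # should now print: overlay2
```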
[17:00:21] <_joe_> fsero: interesting, we're using devicemapper on the kube nodes I think
[17:00:28] <_joe_> because, well, at the time overlay was a joke
[17:00:29] yep, and we shouldn't
[17:00:43] well, on kube I think we are using docker 1.12
[17:00:47] which is quite old right now
[17:00:59] <_joe_> we could upgrade one node, see what happens!
[17:02:08] 10serviceops, 10Continuous-Integration-Infrastructure, 10Operations, 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10Dzahn) >>! In T207707#5302139, @thcipriani wrote: > The other option would be to move t...
[17:17:10] _j.oe_: for tomorrow, https://gerrit.wikimedia.org/r/c/operations/puppet/+/520503 review this please :)
[17:17:24] adding a buster base docker image
[17:17:35] moritz.m: is happy about it
[17:17:53] damn
[17:18:24] fsero: ETOOLATE
[17:18:25] :)
[17:18:44] ETOOLATTE
[17:18:48] lol
[17:18:56] yup
[17:19:19] I guess you'll wait for the official release
[17:19:50] well, it's next monday, I can definitely wait, yep
[17:20:03] buster next monday?
[17:20:25] 6th of July, 2019
[17:20:25] The release date of Debian 10 Buster has been set to 6th of July, 2019
[17:22:01] saturday!
[17:22:13] caturday!
[17:22:16] utc time of release please?
[17:22:21] baturday!
[17:22:26] or busterday or something
[17:27:31] apergos: where is ready ofc :)
[17:27:33] *when
[17:28:35] yes, but it could come out saturday at 11 pm pst, let's say
[17:29:12] which is already sunday for us
[17:29:22] and then that's half the weekend gone :-P
[17:47:22] they said "in 3 days" today on #debian
[17:47:28] oh, yep
[17:48:38] fsero: I think you have a remnant from copy/paste in that patch
[17:49:11] yup
[17:49:17] copy pasta driven development
[17:49:27] I'll fix it tomorrow :)
[17:49:35] ty mutante
[17:50:02] yw, and good night
[19:54:34] 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: Degraded RAID on mw2250 - https://phabricator.wikimedia.org/T226948 (10Papaul) I talked to @MoritzMuehlenhoff on irc about this system. We have no 500GB 2.5" SATA disks on site for replacement. Option 1: Open a procurement task to request spare disks...
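
(On the degraded-RAID task above: assuming mw2250 uses the usual software RAID1, the md side of the disk swap once a spare arrives is the standard mdadm sequence. A generic sketch; the device and array names are assumptions, not taken from the host.)

```sh
# Generic md RAID1 member replacement (device names are assumptions;
# check /proc/mdstat for the real layout first).
mdadm --manage /dev/md0 --fail /dev/sdb1      # mark the dying member failed
mdadm --manage /dev/md0 --remove /dev/sdb1    # drop it from the array
# ...physically swap the disk, then copy the partition table from the
# surviving disk (sfdisk for MBR; use sgdisk -R for GPT) and re-add:
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb1
cat /proc/mdstat                              # watch the resync progress
```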