[00:00:58] and /blubber/ has content now [00:08:51] nice [01:00:40] 10serviceops, 10Patch-For-Review, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Next): Publish Blubber releases on releases.wikimedia.org - https://phabricator.wikimedia.org/T213563 (10thcipriani) 05Open→03Resolved a:03thcipriani yipee! I pushed up latest binaries to releases and they ar... [08:26:41] _joe_: I love your first comment in https://phabricator.wikimedia.org/T213318#4887059 "Sounds of DBAs screaming in horror" :P [09:11:04] <_joe_> addshore: I was just awake with 2 hours of meetings in front of me [09:11:29] Morning, make sure you grab a healthy sized coffee! [09:11:49] <_joe_> oh I did [09:11:55] <_joe_> the meeting was from 7 am to 9 am [09:13:09] oooof [09:35:22] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Mainframe98) This should probably go in Tech News, as the HHVM beta feature was too. It could repurpose from [[ https://meta.wiki... [09:39:59] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Joe) Very good point @Mainframe98 - in fact I was planning to write an email to wikitech-l once the beta feature is set up and I... [09:51:27] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Aklapper) > ` > Please [[phab:|report bugs]] if you see them. > ` To notify Tech News, see #notice. Small nitpick: Please avoid l... [09:54:16] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Joe) @Krinkle did mention he saw a couple fatal errors that looked worrisome, so I'd wait for him to comment before backporting t... [09:57:23] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Mainframe98) >>! In T213934#4887396, @Aklapper wrote: >> ` >> Please [[phab:|report bugs]] if you see them. >> ` > [...] Small ni... [10:00:28] 10serviceops, 10Core Platform Team, 10Operations, 10Performance-Team, and 3 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10Joe) >>! In T213934#4887401, @Mainframe98 wrote: >>>! In T213934#4887396, @Aklapper wrote: >>> ` >>> Please [[phab:|report bugs]]... [12:27:48] 10serviceops, 10Analytics, 10EventBus: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10fselles) @Ottomata you can create a blubberfile like ` version: v3 base: docker-registry.wikimedia.org/wikimedia-stretch apt: packages: - git... 
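Filled out, the blubberfile fselles sketches in the truncated comment above might look something like this. A minimal sketch assuming Blubber's v3 syntax: only the base image and the git apt package come from the original comment, while the variant name and entrypoint are illustrative assumptions.
`
# Hypothetical complete version of the truncated blubberfile above.
# `version`, `base`, and the git apt package are from the original;
# the `build` variant name and entrypoint are illustrative.
version: v3
base: docker-registry.wikimedia.org/wikimedia-stretch
apt:
  packages:
    - git
variants:
  build:
    entrypoint: [git, --version]
`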
[13:23:20] fsero: _joe_: zotero handed over to mvolz :-) [13:23:50] \o/ [13:23:52] we need to set up some icinga alerting for her, but she is now able to do a deployment and rollback starting from the "upload a change to the repo" step [13:32:45] <_joe_> and maybe we (== pipeline people) can help set up the pipeline so that it detects details like zotero not being able to start [13:32:50] <_joe_> before it hits production [13:36:06] we need an alert based on desired replicas vs current ready replicas [13:36:22] it will signal issues launching the pod and cluster pressure [13:43:24] <_joe_> for every service based on k8s, yes [13:43:39] <_joe_> which means defining something via puppet probably [13:46:39] akosiaris: so... I'd like your help with https://phabricator.wikimedia.org/T214041 [13:46:54] akosiaris: mainly a memory effort regarding the steps you took while getting rid of zoterov2 [13:49:08] * akosiaris looking [13:50:00] <_joe_> oh I can explain that [13:50:15] <_joe_> pybal didn't clear the ipvs configuration upon restart I guess [13:50:24] <_joe_> or pybal was never restarted in the first place [13:50:36] it was, I am pretty sure of that [13:50:57] no SAL mention though [13:51:18] Jan15 739:44 /usr/bin/python /usr/sbin/pybal [13:51:38] <_joe_> also you would've had a different alert in that case [13:51:56] probably forgot to log it. I did inform the rest of the team about it [13:52:22] <_joe_> so, I think we don't remove all ipvs entries in pybal when it restarts [13:54:47] there is one small caveat, conftool data still has the 1968 port, lemme fix that [13:55:06] mmm where? [13:55:13] but that is probably unrelated [13:55:21] <_joe_> that's in services, but that's not used by pybal [13:55:59] yeah... basically pybal doesn't clean ipvs on restart [13:56:37] and it being a removed service, pybal didn't do anything further with it [13:58:08] so.. is it ok with you guys if I remove 10.2.2.29:1968 from IPVS? [13:58:16] yes [13:59:12] we've removed that IP from everything 2 days ago [13:59:39] pybal not cleaning ipvs on restart was expected? [14:00:27] I remember us running into this once before [14:00:42] it's not every day we remove an lvs service [14:01:24] fsero: yes, because cleaning everything up and reinstating all rules would mean lost requests [14:04:20] i see.. however, a sort of 'absent' state should be declared so that only that specific rule is cleaned up [14:04:59] true. pybal doesn't have that concept AFAIK [14:19:26] <_joe_> fsero: ot [14:19:32] <_joe_> argh [14:19:43] <_joe_> still off by one sorry [14:20:13] <_joe_> It's intended IIRC, so that if someone wipes out the pybal config and it gets restarted, it won't wipe out the configuration [14:20:19] <_joe_> but I'm not 100% certain [14:20:50] <_joe_> akosiaris: we do wipe out the configs for the services we're configuring though [14:21:02] <_joe_> we just don't touch things that are not in the config anymore [14:21:13] yes [14:25:15] 10serviceops, 10Analytics, 10EventBus: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) Yar ok. I don't really want to put the schemas into EventGate, soooo I'll make a deploy repo after all! :) [14:26:01] 10serviceops, 10Analytics, 10EventBus: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) Hm, well, I did consider doing this for final prod deployment.........yar ok. Nevermind. I'll DO IT!
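For reference, the desired-vs-ready replica alert fsero proposes at 13:36 could be expressed as a Prometheus alerting rule along these lines. A sketch assuming kube-state-metrics is being scraped: the metric names are kube-state-metrics' standard ones, but the rule name, duration, and severity are illustrative, not what was eventually deployed.
`
# Illustrative Prometheus rule: fire when a deployment's available
# replicas lag behind the desired count for 10 minutes, signalling
# pods failing to launch or cluster pressure, as discussed above.
groups:
  - name: k8s-replicas  # hypothetical group name
    rules:
      - alert: DeploymentReplicasMismatch
        expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available
        for: 10m
        labels:
          severity: warning
        annotations:
          description: >-
            Deployment {{ $labels.namespace }}/{{ $labels.deployment }}
            does not have the desired number of available replicas.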
[14:33:43] fsero: if I do it in blubber, i won't be able to do the initContainer thing, which I'd much rather do, especially at first [14:33:55] <_joe_> *why* [14:34:04] _joe_: if you don't come to meetings you don't get to know! :D [14:34:05] jk [14:34:08] <_joe_> it's just conceptually wrong. [14:34:11] _joe_: no [14:34:15] <_joe_> it is [14:34:17] because the schemas are an evolving resource [14:34:22] <_joe_> now you can have a practical reason for that [14:34:28] they could as well be stored in a database [14:34:38] and we wouldn't package the data in a database into a container [14:34:41] image [14:34:52] that's actually debatable [14:34:53] <_joe_> sorry, it's a non-sequitur. [14:34:53] we just wanted to decentralize them, so we are using git [14:35:08] there are cases where data for the database is packaged with the source [14:35:09] <_joe_> we do pack geoip data in containers [14:35:16] <_joe_> and they vary much more than your ones [14:35:19] anyway, yesterday we discussed, and the initContainer is much more flexible especially while we figure this out (e.g. in staging) [14:35:21] things like fixtures, dummy data, initdata and so on [14:35:38] <_joe_> it doesn't look like you convinced someone [14:35:47] eventually schemas will be added at any time by many people [14:36:04] it was more of a "we can tolerate this and see if it indeed has merit" [14:36:06] when that happens we'll have a separate service for those ones [14:36:17] i thought it was ok for this development phase [14:36:23] <_joe_> ottomata: so answer me about this [14:36:25] but we'll switch to packaging in containers when we actually do it for real [14:36:48] grumpy channel this morning sheessshhh :p [14:37:19] _joe_: he called you grumpy I think [14:37:22] <_joe_> ottomata: well, it seems you're rejecting some reasonable objections, hence the irritation, at least on my part [14:37:24] hahah [14:37:28] old and grumpy I believe :P [14:37:33] <_joe_> I'm not grumpy, I'm irritated [14:37:45] _joe_: i'm not rejecting anything! [14:38:20] <_joe_> because this thing is spinning out of control already. So mutating the content of containers at runtime is a very bad idea IMHO [14:38:35] <_joe_> it doesn't allow you an easy way to know what each container is running [14:38:45] _joe_: out of control? [14:38:58] to recap what I said yesterday: "I still think that bundling the schemas with the container is better, but if you can use the initContainer and know which versions of the schemas you are running, ok" [14:39:02] we talked about this in a meeting yesterday, it's a discussion! [14:39:13] <_joe_> a thing that both bundling the schemas in the container or a database [14:39:21] for the initContainer case, the chart will specify which version of the git repo it is going to clone [14:39:22] <_joe_> or a registry to reach via http [14:39:26] <_joe_> will allow you to avoid [14:39:27] and it doesn't auto pull anything [14:39:32] <_joe_> so... [14:39:34] to get a new schema we'd have to redeploy [14:39:36] <_joe_> why not bundling? [14:39:41] <_joe_> what's the difference? [14:39:48] because building in the image requires going through the whole CI pipeline [14:39:53] <_joe_> not having to rebuild the container via an automated pipeline? [14:39:56] <_joe_> seriously? [14:40:07] <_joe_> sorry, I don't think that's an acceptable reason [14:40:25] <_joe_> how long does the pipeline take to run? [14:40:33] _joe_: i think you think i'm way more opinionated on this than I am, yes that's what I want to do. it's much easier especially at this phase
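The initContainer arrangement ottomata describes, with the chart pinning the git ref to clone, would map onto a deployment template fragment roughly like the following. This is a sketch, not the actual chart from the gerrit change linked below: the image, repository URL, ref, and paths are all illustrative placeholders.
`
# Sketch of "git clone in an initContainer": fetch schemas at a pinned
# ref into a shared emptyDir, so the schema version running in the pod
# is known per-deploy and nothing auto-pulls at runtime.
# All names here are placeholders.
spec:
  initContainers:
    - name: fetch-schemas
      image: docker-registry.wikimedia.org/wmfdebug  # a git-capable image
      command:
        - sh
        - -c
        - git clone --branch "$SCHEMA_REF" --depth 1 "$SCHEMA_REPO" /schemas
      env:
        - name: SCHEMA_REPO
          value: https://gerrit.wikimedia.org/r/mediawiki/event-schemas  # placeholder
        - name: SCHEMA_REF
          value: master  # the chart would pin a specific ref here
      volumeMounts:
        - name: schemas
          mountPath: /schemas
  containers:
    - name: eventgate
      image: docker-registry.wikimedia.org/eventgate  # placeholder
      volumeMounts:
        - name: schemas
          mountPath: /srv/schemas
          readOnly: true
  volumes:
    - name: schemas
      emptyDir: {}
`
Getting a new schema in then means redeploying with a new ref, which is exactly the tradeoff debated here: no image rebuild through CI, but pod startup depends on gerrit's availability.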
[14:40:39] <_joe_> mind that that's also traced clearly in VCS [14:40:41] and in the meeting yesterday everyone was ok with it [14:40:53] hence why i was trying to move forward with it [14:40:55] <_joe_> that's... not what I got [14:41:06] akosiaris: am I wrong about that? [14:41:07] <_joe_> but ok then [14:41:16] <_joe_> just don't use alpine [14:41:21] <_joe_> that I veto :P [14:41:26] _joe_: sure I don't care about that at all [14:41:35] ottomata: I did recap what I said above. [14:41:36] i just wanted to run git clone in an initContainer [14:41:47] akosiaris: to recap what I said yesterday: "I still think that bundling the schemas with the container is better, but if you can use the initContainer and know which versions of the schemas you are running, ok" [14:42:00] ya, that's what i remember [14:42:17] btw note that there's one thing you did not account for and it's the initContainer failing [14:42:27] and we decided that we'd do initContainer for now, but when we turn it on for real it would be done in the image [14:42:56] it is bound to cause some interesting debugging sessions [14:43:12] I may be wrong ofc, but let's wait and see [14:43:56] makes sense, but i guess is it ok to use the initContainer for now and then later build it into the image for prod purposes? if not, i can close that 'give me git in a prebuilt image' task [14:44:05] <_joe_> well, that the initContainer has more chances to fail (and ties the startup of a pod to gerrit's availability) is not an opinion, akosiaris [14:45:05] that also makes a lot of sense _joe_, that argument i buy more than any other, it's one of the reasons we even wanted to decouple from an external schema service [14:45:06] <_joe_> but ok, if it's temporary, who cares [14:45:13] but ya, temporary! :) [14:45:23] <_joe_> ottomata: oh, I thought you wanted a registry service [14:45:30] _joe_: we do. [14:45:35] there are different deployments of eventgate [14:45:35] <_joe_> in that case, you would want to make it as solid as possible [14:45:44] <_joe_> like - nginx serving static files [14:45:48] ottomata: for now, yeah it's ok. I agreed to that already yesterday, didn't I ? [14:45:55] _joe_: ya that's the idea for that [14:45:57] the one that things like cp will use will have rarely changing schemas [14:46:05] and likely have the schema repo built into the image [14:46:15] my impression is that they want a registry service, but not necessarily to spend the time immediately on building a fully-productionized registry service? [14:46:16] the eventlogging-analytics replacement will have often changing schemas (daily even) [14:46:41] akosiaris: yes ok. you did, but i was just getting a lot of mixed messages! :) [14:46:57] I haven't yet understood why that will happen btw (that is the very often changing schemas) [14:47:18] <_joe_> cdanis: a non-fully productized one is just nginx plus a git clone :) [14:47:23] I'd like to reboot darmstadtium/docker registry to pick up the SSBD-enabled qemu, there's nothing really to follow except !log and boot?
it's a ganeti instance, it will only be gone for 30s or so anyway [14:47:35] moritzm: yeah go for it [14:47:37] akosiaris: https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#Using_EventLogging:_The_workflow [14:47:40] thanks for letting us know [14:47:41] <_joe_> moritzm: it should be perfectly fine [14:47:51] ack, will proceed in a few minutes [14:47:56] https://meta.wikimedia.org/w/index.php?title=Special%3AAllPages&from=&to=&namespace=470 [14:48:34] https://meta.wikimedia.org/w/index.php?title=Schema:NavigationTiming&action=history [14:49:00] _joe_: indeed! well, plus probably multiple git clones but ya [14:49:09] that is the idea, a very simple http fileserver [14:49:52] akosiaris: ok, so what is needed to put an image with git available in our docker-registry? [14:50:30] yeah. as long as it's based off of wikimedia-stretch [14:50:33] <_joe_> ottomata: you need to build off of docker-registry.wikimedia.org/nodejs [14:50:33] that's fine [14:50:38] lol [14:50:45] <_joe_> akosiaris: they need to use the nodejs image as a base, right? [14:50:49] <_joe_> then add git via blubber [14:50:52] _joe_: https://github.com/wikimedia/eventgate/blob/master/.pipeline/blubber.yaml [14:50:55] that's different [14:50:58] that's the app image [14:51:14] i'm asking for a very minimal prebuilt image with git for the initContainer [14:51:32] <_joe_> well then start from wikimedia-stretch [14:51:45] <_joe_> in... production-images? [14:51:45] line 33 here https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/483035/12/charts/eventgate-analytics/templates/deployment.yaml [14:51:50] <_joe_> I dunno, akosiaris [14:52:23] ottomata: actually there is one already [14:52:23] could our wikimedia-stretch images just have it by default? [14:52:25] oh? [14:52:33] (actually i haven't checked that they don't have it...) [14:52:41] if it's just for development there is the wmfdebug image [14:52:48] oh! [14:52:49] ok [14:52:50] that has git plus a whole set of other tools [14:53:06] ok cool, i'll use that. and we will never deploy it this way outside of staging [14:53:12] tcpdump iproute mtr-tiny iputils-arping iputils-ping tshark sudo dnsutils dstat gdb git curl wget httpry iperf jq moreutils binutils ngrep ncdu psmisc strace sysstat tree linux-perf nmap [14:53:24] so, should suit you fine [14:53:55] q, where do we keep our prod blubber/docker files? [14:54:01] i'm only going off of https://people.wikimedia.org/~thcipriani/docker/ [14:54:14] lol, what's this? [14:54:47] <_joe_> a simple frontend to the registry [14:54:47] so, blubber-specific images are in their respective repos [14:54:55] the bases are at https://gerrit.wikimedia.org/r/operations/docker-images/production-images [14:55:10] <_joe_> but IIRC fsero is planning to install a decent frontend [14:55:22] generally any blubber image is based on one from that repo [14:55:46] aye ok [14:56:11] there's an even lower layer of images but that's just wikimedia-stretch, wikimedia-jessie and it's best to avoid layering right on top of them unless adding support for some new language [14:56:30] great. [15:00:04] 10serviceops, 10Analytics, 10EventBus: Include git in our alpine docker image on docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T213963 (10Ottomata) 05Open→03Declined For development phase, i'll use the wmfdebug image. For prod deployment (outside of staging k8s), we'll build the sch...
[15:17:44] _joe_: that's phase 2 of the registry, to add portus, and while we are at it, one thing that would help is a +1 here https://gerrit.wikimedia.org/r/c/operations/puppet/+/484510 [15:17:49] it's a really silly thing [15:18:06] <_joe_> fsero: sure [15:18:23] <_joe_> fsero: I'm half a manager, I have to sell your successes well [15:18:32] <_joe_> it's your problem to make the magic happen!!1! [15:22:06] :D [15:38:25] fsero: https://phabricator.wikimedia.org/T211247#4886237 [15:38:33] i wonder if it's worth doing the requirements.yaml thing after all [15:38:54] we can keep kafka-single-node in the repo, and then just in the README of eventgate mention that to run it in minikube you need to install it first [15:39:06] ? [15:39:39] Let me go for coffee and take a look :) [15:39:42] k danke [15:41:08] <_joe_> coffee, good idea [15:51:21] * ottomata just learned about minikube dashboard ... [15:51:56] ottomata: works for me, but kafka-single-node chart should be copied (or symlinked as you did) [15:52:04] https://www.irccloud.com/pastebin/bvO0DrJZ/ [15:52:21] eventgate-analytics git:(master) ✗ helm list (⎈ |eventgate:default) [15:52:21] NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE [15:52:22] eventgate 1 Thu Jan 17 16:49:03 2019 DEPLOYED eventgate-analytics-0.0.1 default [15:52:42] fsero: yea it works with the symlink [15:52:49] this is without a symlink [15:52:51] is copied [15:52:55] another option is this one.. [15:52:56] or copied [15:52:58] i mean [15:53:08] it has to be in the sub charts/ directory of my chart [15:53:21] it just doesn't seem to respect file://../ [15:54:04] nope, needs to live in the charts directory yep [15:54:12] https://www.irccloud.com/pastebin/gdP8ywai/ [15:54:18] that's another option [15:54:21] hm [15:54:42] what do you think is best? i also don't mind not adding the requirement, and making people do it manually in minikube [15:54:56] it's a little easier that way for iterative development, because the kafka pods don't need to be stopped and started each time [15:55:09] ^ if kafka-single-node is installed manually [15:55:18] instead of via requirements.yaml [15:55:28] the symlink here is the most flexible for both [15:55:31] if you are ok with that [15:55:56] this is for development and hence ok with that [15:56:02] right now the symlink works, and if i want to auto install kafka via requirements, i just do --set subcharts.kafka=true [15:56:10] ok cool, thanks [15:56:17] won't do it for a prod chart, in that scenario i would vendor the chart [15:56:23] either copying the chart or the tgz [15:56:44] aye, makes sense [15:57:20] it could make sense to name kafka-single-node dev-kafka-single-node, keeping it clear it's just for dev [16:04:06] fsero: k [16:04:33] will do at https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/484498/, lemme know if there are other changes I should make to it [16:04:35] before we can merge [16:09:06] fsero 'kafka-dev' ok for the name? [16:09:47] yep [16:30:46] <_joe_> fsero: meeting! [16:58:14] how does the service.{externalIP,port} work in a prod deployment? [16:58:23] is that something that the scap-helm wrapper takes care of? [17:13:12] fsero, akosiaris: i think https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/483035/ (and also https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/484498/) are ready for review (and eventual merge) [17:13:17] would be fun to be able to do a staging deploy next week [17:15:03] ottomata: sure. But kind of swamped today and tomorrow. Will try and review on Monday though
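The requirements.yaml option being weighed here, with the dev-only kafka subchart gated behind a flag, would look roughly as follows. A sketch assuming Helm 2 conventions: the version is a placeholder, and per the discussion above the chart still had to be physically copied or symlinked into the chart's charts/ directory, since file://../ paths weren't being respected.
`
# Sketch of a requirements.yaml for eventgate that pulls in the
# dev-only kafka chart behind a condition, so it is only installed
# when explicitly requested (e.g. --set subcharts.kafka=true).
dependencies:
  - name: kafka-dev
    version: 0.0.1          # placeholder version
    repository: file://../kafka-dev  # in practice: copied/symlinked into charts/, see above
    condition: subcharts.kafka
`
Left unset, the condition keeps kafka out of the release, which suits the iterative-development workflow described above: a manually installed kafka keeps running while the eventgate chart is torn down and reinstalled.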
[17:15:37] ottomata: oh and we are removing externalIP. See https://gerrit.wikimedia.org/r/484670 [17:15:38] ok, thanks [17:15:49] turns out it causes more harm than good currently [17:17:32] akosiaris: ok, I should probably pick a non-default service-template-node port then? [17:38:33] ottomata: no it's fine if you leave it as is [17:38:53] kubernetes can do the translation of the external port to the internal (on the container, that is) port [17:39:08] but yeah feel free to pick one to help yourself when debugging [17:39:25] aye, but it might be nice to just have them both set to the same? and we have to pick a manual nodePort in prod anyway, ya? [17:40:40] helm actually does it but yes it's probably better to pick one anyway [17:40:52] you can avoid nasty surprises indeed [19:39:42] arrived at SF office and got new power supply, laptop boots [21:39:04] mutante: woohoo!
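The port translation akosiaris describes maps onto a Service spec roughly like this. A sketch assuming a NodePort-type service; all three port numbers are illustrative, not what eventgate shipped with.
`
# Sketch of Service-level port translation: `port` is what clients
# hit, `targetPort` is what the container listens on (kubernetes
# translates between them), and `nodePort` is the manually picked
# per-node port for prod. All numbers are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: eventgate  # placeholder
spec:
  type: NodePort
  selector:
    app: eventgate
  ports:
    - name: http
      port: 8192        # service port
      targetPort: 8192  # container port; keeping both equal, per ottomata
      nodePort: 30192   # pinned manually in prod, per the discussion
`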