[08:25:50] serviceops, Operations, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (elukey) 1. is done :)
[09:11:48] serviceops, Continuous-Integration-Infrastructure, Developer-Wishlist (2017), Patch-For-Review, and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (hashar) The broken OOJS page (T206046) does work properly on doc1001.eqiad.wmnet CI has been...
[09:31:54] <_joe_> akosiaris: so what was the interesting feedback?
[10:41:51] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482675/ akosiaris _joe_ I will appreciate a review
[10:43:33] <_joe_> fsero: I'll take a look in a few
[10:46:28] Thanks _joe_ I'm going to be afk for a while but let me know
[11:00:45] _joe_: people had a lot of questions on how we handle IPv6, especially with regard to firewalling. Logging was a funny part (I've promised them a dedicated presentation for that). Of course many knew IPVS, they more or less do the same thing we do. The extending swagger specs part got some people interested. They were also wondering how we treat TLS certificates. Some even wondered if we were interested in doing AI in kubernetes. Of course questions over the deployment auth method. One guy told me that tiller is going away in Helm 3. Need to refresh my memory on this one. Maybe things have changed since May. Many people liked the presentation overall.
[11:01:26] oh ofc questions about network connectivity of hosts (some people expected we do MLAG)
[11:01:36] and PDUs as well
[11:01:41] <_joe_> PDUs?
[11:01:41] all kinds of questions
[11:01:58] yeah. "single or dual pdus?"
[11:02:04] <_joe_> ahah I would've been in a big time difficulty remembering that
[11:02:20] <_joe_> I think we do dual pdus in general right?
[11:02:29] oh I remembered the a5 incident with all the memcacheds going offline
[11:02:30] <_joe_> but I understand where the question is coming from
[11:02:39] cause of the faulty pdu and then operator error
[11:02:52] <_joe_> single-pdu can make sense for k8s nodes
[11:02:56] oh and a question about blubber!
[11:03:02] just so jijiki is happy
[11:03:11] <_joe_> I knew that, my spy already told me
[11:03:23] <_joe_> the only thing I knew about your talk
[11:03:30] <_joe_> is that you namedropped blubber
[11:04:46] akosiaris: helm 3 is going tillerless
[11:05:12] https://github.com/helm/community/blob/master/helm-v3/000-helm-v3.md
[11:05:46] _joe_: I did not. I only showed them the slide
[11:05:59] and the redhat guy noticed it. He was impressed we even have a logo for it
[11:06:02] <_joe_> how is that going to work with our rbac setup?
[11:06:35] * akosiaris has to read the design doc
[11:06:39] <_joe_> we'll still have a kube role helm will need to use
[11:06:56] <_joe_> and given it's going to be namespaced anyways, that shouldn't be an issue
[11:06:59] Credentials will be provided over helm
[11:07:14] <_joe_> An extension mechanism (Lua scripts stored in the chart)
[11:07:27] Helm needs to have a proper kubeconfig attached to a user
[11:07:29] <_joe_> lua keeps crawling back into my life
[11:07:56] <_joe_> oh and they're making a helm release a kubernetes resource
[11:08:06] <_joe_> which makes a lot of sense after all
[11:08:21] <_joe_> basically they're moving what was implemented in tiller to be part of kubernetes
[11:08:28] <_joe_> which is both logical and scary to me
[11:09:57] to be clear. Even now tiller stores the releases in kubernetes
[11:10:14] under a configmap in a json object
[11:10:32] so the change is that the object now has a schema and is a CRD
[11:10:58] Well it makes sense
[11:11:05] and they talk about an alternative implementation via a controller
[11:11:18] which I guess is just tiller transformed ?
[11:11:33] https://github.com/helm/community/blob/master/helm-v3/008-controller.md
[11:11:45] Yep, essentially instead of using cli
[11:11:49] You create a crd
[11:11:57] That the helm operator will execute
[11:12:10] However this is far in kubernetes terms
[11:12:17] At least 6 months away
[11:13:57] well even the first release of tiller 3 seems months away
[11:14:30] I am not concerned much. We have other fish to fry right now
[11:14:54] but we should be keeping an eye on it, cause we don't know for how long they will agree to keep maintaining helm 2.x
[11:54:20] <_joe_> fsero: I did review your code, it wasn't that bad :)
[12:16:18] thanks _joe_ I'll address comments
[13:33:26] serviceops, Continuous-Integration-Infrastructure, Developer-Wishlist (2017), Patch-For-Review, and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (hashar) https://doc.wikimedia.org/ is now served by doc1001.eqiad.wmnet and working as expecte...
[13:39:12] serviceops, Operations, User-jijiki: Add `supervised` option to redis configuration - https://phabricator.wikimedia.org/T212102 (jijiki)
[13:45:24] serviceops, Continuous-Integration-Infrastructure, Developer-Wishlist (2017), Patch-For-Review, and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (hashar) Open→Resolved Doc updated on the wiki https://www.mediawiki.org/w/index.php?ti...
[15:17:42] serviceops, Scap, Release-Engineering-Team (Watching / External), User-jijiki: Allow scap sync to deploy gradually - https://phabricator.wikimedia.org/T212147 (jijiki)
[15:32:57] _joe_: to be fair, Alex didn't namedrop blubber
[15:33:12] but someone saw "blubber" on the diagram
[15:33:27] and asked alex "I don't recognise that blubber thing, tell me more"
[15:33:30] <_joe_> and asked themselves who would be so foolish to name a software like that
[15:33:34] <_joe_> ahahahah
[15:34:23] while alex thought that I put that guy up to this
[15:34:25] :p
[15:43:04] serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[15:43:10] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki) Open→Stalled
[15:44:35] hi. apologies; I'm going to ask a very naive question. would thumbor be a good candidate for running on k8s?
[15:45:59] serviceops, Operations, Thumbor, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[15:46:02] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki) Stalled→Resolved
[15:46:18] serviceops, Operations, Thumbor, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki)
[15:48:50] hey cdanis IMO makes perfect sense as a candidate
[15:49:06] only requirement is set up application autoscaling inside the cluster
[15:50:10] cdanis: yes
[15:50:43] we have discussed this option, it is a stateless service
[15:51:01] <_joe_> yes, thumbor seems like a perfect candidate for an autoscaling deployment
[15:51:17] <_joe_> between a minimum and a maximum, so that we don't unload the bottleneck on swift
[15:51:46] cdanis: my general impression is that we will upgrade it to stretch
[15:52:00] and then instead of upgrading to buster
[15:52:20] (we have some librsvg bugs which are resolved in buster)
[15:52:33] <_joe_> jijiki: I'm not sure of the timeline, but I would think we should plan to move thumbor next quarter
[15:52:39] <_joe_> to k8s I mean
[15:52:39] we will move it to k8s
[15:52:55] _joe_: yeah we could do the stretch upgrade this quarter
[15:53:18] I will organise a meeting sometime in the following week/weeks
[15:53:32] and discuss what we want to do
[15:54:22] <_joe_> try to write up a proposal first
[15:54:44] on phab ?
[15:54:54] <_joe_> on paper if you prefer :P
[15:55:07] lol, then I will snail mail it to you all
[15:55:11] <_joe_> or on etherpad, or on-wiki under your user page namespace, it doesn't matter
[15:55:22] ok ok
[15:55:35] <_joe_> just get to the meeting with an idea we can rubberstamp and take credit for
[15:55:35] what's the work needed there, once the stretch upgrade is done? make our own docker image for thumbor, helm chart, probably some k8s networking config, maybe the k8s cluster would need to have more nodes in it?
[15:55:59] cdanis: well first, we need to test the buster version really
[15:56:00] <_joe_> cdanis: we'd need to convert the thumbor servers to k8s nodes probably, yes
[15:56:08] jijiki: i can help you with that
[15:56:10] if you want
[15:56:31] ofc I will need help if I take on this task
[15:56:37] I am blind when it comes to k8s
[15:56:38] cdanis: docker image, helm chart, deploying on k8s and if we want autoscaling there
[15:56:42] enable api aggregation layer
[15:56:44] I attend the meetings though :p
[15:56:49] deploy metrics-server
[15:56:52] and meetups
[15:56:55] and set up scaling policy
[15:57:10] but anyway let's not get ahead of ourselves
[15:57:10] and probably expand the cluster a little bit more yes
[15:57:13] jijiki: haha i hear that. i should really find some time to play with minikube and helm on a toy project
[16:00:15] <_joe_> cdanis: you could try, just as a toy project, to make this cool wiki software work in kubernetes
[16:00:27] <_joe_> it's called "MediaWiki", comes with a few helper services
[16:00:36] <_joe_> I think it can be done over a weekend
[16:01:34] https://i.imgur.com/dSjMyuO.jpg
[16:01:37] _joe_: Making it work is a simple matter of engineering, and is left as an exercise for students.
[16:01:53] <_joe_> James_F: spoken like Lev Landau would!
[16:02:06] * James_F grins.
[16:02:12] the problem with mediawiki is outside the software Ñ=
[16:02:13] :)
[16:02:14] <_joe_> (I studied 6 of his theoretical physics books while at the university)
[16:02:21] technically speaking is easy
[16:02:22] a past colleague was very fond of "just a simple matter of programming"
[16:02:29] and we have discussed this i know :P
[16:02:38] <_joe_> fsero: more or less true
[16:02:56] however we should do a PoC
[16:02:57] The divide at my uni between the Software Engineers and the Computer Scientists was acute, and each group of faculty clearly had some contempt for the others.
[16:03:01] to test our ideas
[16:03:35] <_joe_> James_F: I'm a physicist, we are taught maybe on the second day of course that engineers are clearly an inferior race
[16:04:20] James_F: I'm a physicist, we are taught maybe on the first day of course that engineers are clearly a superior race
[16:04:26] The advantage the Engineers had was that they made all the money for the department by making Beowulf clusters for NASA and the US Department of Energy.
[16:04:44] but that is because in greece whoever ends up at the physics school
[16:04:56] means they failed to enter an engineering school :p
[16:06:41] <_joe_> jijiki: pretty much the opposite here!
[16:07:08] _joe_: frankly I was one of the maybe 10/240 people who chose to study physics
[16:07:26] <_joe_> I mean given the choice between studying quantum physics and sewer systems, what would you pick
[16:07:28] so initially, I didn't understand what was going on
[16:07:43] lol _joe_, true that
[16:08:17] <_joe_> (in my imagination, all engineers have to take at least two exams about sewer systems)
[16:08:49] my favourite one is concrete I
[16:08:51] <_joe_> James_F: oh beowulf clusters, what unfortunate memories you bring back to me
[16:09:10] * jijiki used to run a MOSIX cluster
[16:09:14] * _joe_ had to write parallel fortran)
[16:09:14] _joe_: Sorry. :-)
[16:09:42] <_joe_> James_F: https://en.wikipedia.org/wiki/Cactus_Framework
[16:09:50] <_joe_> I can still scream just thinking about it
[16:12:53] https://www.dcs.warwick.ac.uk/~saj/papers/PMEOpaper.pdf was more "our" thing (but I didn't do the PhD there, so I escaped).
[16:30:57] geez, someone dropped a fortran reference
[16:31:01] you people are old
[16:31:10] * urandom knows because he is old
[16:32:49] COBOL 4 LIFE
[16:33:15] * bd808 actually doesn't remember anything about COBOL except that the manual was havy
[16:33:22] *heavy
[16:33:44] someone put gerrit in k8s https://gerrit-review.googlesource.com/q/project:k8s-gerrit+status:open
[16:38:11] it was very verbose for those days (the language)
[18:14:02] serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (greg) p:Triage→Normal
[18:14:06] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (greg) p:Triage→Normal
[18:14:14] serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (greg) p:Triage→Normal
[18:15:20] serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team, Services: Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (thcipriani)
[18:15:51] serviceops, Citoid, Release Pipeline, Release-Engineering-Team, Services: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (thcipriani)
[18:16:03] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team, Services: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (thcipriani)
[18:17:54] serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request:
Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (WMDE-leszek) Open→Stalled I've submitted RFC about the whole concept of Wikibase front end changes as T213318. I've taken the liber...
[18:54:12] _joe_: I changed my cr with all your suggestions so if you can take a look that would be great (or someone else)
[19:37:54] serviceops, Continuous-Integration-Infrastructure, Developer-Wishlist (2017), Patch-For-Review, and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (hashar)
[19:43:46] serviceops, Citoid, Release Pipeline, Services, Release-Engineering-Team (Next): Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (greg)
[19:43:51] serviceops, CX-cxserver, Release Pipeline, Services, Release-Engineering-Team (Next): Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (greg)
[19:43:58] serviceops, ChangeProp, Release Pipeline, Services, Release-Engineering-Team (Next): Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (greg)
[19:55:32] serviceops, Mail, Operations, Phabricator, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (greg)
[20:03:40] serviceops, CX-cxserver, Release Pipeline, Release-Engineering-Team (Next), Services (watching): Migrate cxserver to kubernetes - https://phabricator.wikimedia.org/T213195 (Pchelolo)
[20:03:44] serviceops, Citoid, Release Pipeline, Release-Engineering-Team (Next), Services (watching): Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 (Pchelolo)
[20:03:49] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team (Next), Services (watching): Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Pchelolo)
[20:50:19] serviceops, Continuous-Integration-Infrastructure, Developer-Wishlist (2017), Patch-For-Review, and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (Paladox)
[21:15:14] looks at loading "mod_journald" on httpd to get error_logs in journalctl https://httpd.apache.org/docs/trunk/de/mod/mod_journald.html
[21:15:33] then shell users.. who already have journalctl in sudo privs.. could read them
[21:21:02] version 2.5 ... hmmmm
[21:24:27] meeeeh.. thread from 2014 https://lists.gt.net/apache/dev/442533 Is there any special reason why mod_systemd and mod_journald (available in trunk) are not backported to 2.4 yet
[21:33:00] 16:31 < Cefiar> mutante: I wouldn't just yet anyway. systemd-journald has known exploits if you push lots of data at it (see https://seclists.org/oss-sec/2019/q1/54), so doing that before they "fix" the exploits is probably not a great idea.
[21:35:47] < thumbs> mutante: no easy way, other than patching 2.4
[22:04:16] we should add some users to "adm" group and forget about all this and even about using sudo
[22:05:04] ops membership means we automatically get "adm" as well, which means cat /var/log/apache2/error.log without sudo. shell users like hashar on his doc.wm VM don't get "adm" but should have it