[05:31:09] serviceops, Core Platform Team, MediaWiki-General, Operations, and 2 others: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (tstarling) Plan: * Of the pseudo-libraries, EtcdConfig and RESTBagOStuff have short default timeouts a...
[08:03:18] <_joe_> jayme: I'm about to merge https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/582777/ - it could be nice to try to use that in one of the new services getting TLS
[08:05:02] _joe_: I've tree of them still not deployed with TLS enabled. I could switch to the v0.2 implementation prior to doing so
[08:05:55] *three :-p
[08:05:56] <_joe_> yeah that was my idea
[08:06:09] <_joe_> it should be a 1:1 replacement so you just need to update the symlink
[08:06:37] ok. Will take a look
[08:11:52] serviceops, Operations, Puppet: delete the puppet module "apache" - https://phabricator.wikimedia.org/T252190 (Dzahn) Open→Resolved This has happened. The module is gone now.
[08:12:28] _joe_: how is "/srv/images/production-images/" updated on the build hosts? Should that happen regularly or do I have to git pull manually when building production images?
[08:15:18] <_joe_> yes for now yes
[08:15:27] <_joe_> the idea is in the future CI should do it
[08:15:44] <_joe_> you submit the change, CI builds and publishes the images
[08:18:48] that would be nice :)
[08:25:44] <_joe_> I just had this conversation with jbond42 in private
[08:26:22] <_joe_> we've never trusted our current CI (with jenkins basically exposed to the internet) to be trusted enough / have enough separation of concerns to allow us building artifacts for production
[08:26:30] <_joe_> that has changed with the deployment pipeline though
[08:26:42] <_joe_> we're now building docker images that go in prod from CI
[08:27:40] <_joe_> we've had a discussion for the future of CI - and I drew a simple schematics at the time https://people.wikimedia.org/~oblivian/ci/ci-threat.pdf
[08:28:51] <_joe_> as moritzm and hashar can attest, jenkins hasn't a great record with vulnerabilities, esp for plugins
[08:32:23] _joe_: what are the timelines for the future of CI and realising that diagram?
[08:33:23] <_joe_> I have no idea tbh :)
[08:33:33] kk thx :)
[08:48:08] o/
[08:48:27] they have vulnerabilities, a bunch of them
[08:48:52] which is not that surprising for a web app that is fairly complicated and has probably a couple thousands plugins at least
[08:48:57] most of them being community written
[08:49:35] that being said, they do some kind of embargo, try to fix common issues in bulk and do pre advisories announcements
[08:49:43] so they at least have a nice security process :]
[08:50:14] I also suggested them to not disclose on a friday evening but earlier in the week
[08:50:27] no security process which has biweekly releases of 5-20 issues is ever nice...
[08:57:38] https://www.cvedetails.com/vendor/15865/Jenkins.html 254 in 2019 ... that is a lot indeed
[08:59:37] <_joe_> yeah but I mean, I wasn't trying to make a judgement call, I was just stating what we've all agreed to - that our jenkins installation being mostly publicly-accessible puts us one sql injection away from compromise
[09:00:06] <_joe_> so we would really need a separated, fully-protected CI instance to perform trusted processes
[09:01:10] there is another one for releases, though it is still publicly reachable
[09:02:48] <_joe_> ideally only the build logs should be publicly accessible, possibly on a completely separated server.
[09:03:21] <_joe_> anyways, I wanted to give some context to jayme and jbond42, not move forward a better CI architecture *right now* :)
[09:07:03] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (jbond) @Andrew I have actually made a bit of progress on this here https://gerrit.wikimedia.org/r/c/operations/debs/mcrouter/+/596779 Sorry i didn't link the task as it was still a bit WIP
[09:19:10] _joe_: there is a little quoting error in the v0.2 template: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/597229
[09:19:33] <_joe_> oh sigh
[09:19:42] <_joe_> thanks
[09:19:55] np
[09:41:28] and this one _joe_ ;) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/597240
[09:44:03] <_joe_> lol, to be fair, I planned to do the work you're doing myself
[09:44:05] <_joe_> sorry for that
[09:46:39] Never mind. It's almost impossible to spot prior to rendering and no human has ever managed to write helm files that render without error on first try ;)
[09:51:22] _joe_: looks good now. Could you please look at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/597230 ? I would deploy that as noop and merge/deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/597032 afterwards
[09:53:01] <_joe_> the second patch might need some additional data to add to values.yaml, but LGTM in general
[09:53:37] What do you think is missing there?
[09:53:56] <_joe_> lemme check the default values in the chart first :)
[09:54:44] <_joe_> no it's all there, actually
[09:54:48] <_joe_> we should be GTG
[10:00:08] nice. Noop deploy went through as well. If you'd give that your +1 I would merge and deploy
[10:00:35] (did a rebase because gerrit was complaining about a merge conflict...did not see one though)
[10:03:48] <_joe_> I think that repo is ff-only
[12:25:34] serviceops, Analytics, Operations, vm-requests: Create a VM for matomo1002 (eqiad) - https://phabricator.wikimedia.org/T252742 (elukey) ` elukey@ganeti1003:~$ sudo gnt-group list Group Nodes Instances AllocPolicy NDParams row_A 4 44 preferred ovs=False, ssh_port=22, ovs_link=, spin...
[12:26:33] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (elukey) a:elukey→None
[12:27:26] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (elukey) @jbond feel free to assign the task to yourself, I was added by mistake, not working on it :)
[12:59:19] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (jbond) p:Triage→Medium a:jbond
[13:27:31] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` ganeti1009.eqiad.wmnet ` The log can be found in `/var/lo...
[13:39:24] serviceops, Operations, Patch-For-Review: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet - https://phabricator.wikimedia.org/T228924 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['ganeti1009.eqiad.wmnet'] ` Of which those **FAILED**: ` ['ganeti1009.eqiad.wmnet'] `
[13:49:32] serviceops, Analytics, Operations, vm-requests: Create a VM for matomo1002 (eqiad) - https://phabricator.wikimedia.org/T252742 (elukey) Open→Stalled This is currently blocked due to resource constraints in row_c eqiad for Ganeti, see https://wikitech.wikimedia.org/wiki/Ganeti#Verify_clust...
[14:14:07] serviceops, Operations, ops-codfw: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (Papaul) We have 5 mw servers left to be racked in row c rack c3 since we used 10 servers in T252185
[14:41:49] serviceops, Operations, Kubernetes, Patch-For-Review, and 2 others: Support kubernetes Egress networkpolicies in our helm charts - https://phabricator.wikimedia.org/T249927 (akosiaris) blubberoid in staging switch to have the new policy in the Networkpolicy. Verified with `kubectl describe netpol...
[14:43:58] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (MoritzMuehlenhoff) So I think there are separate things here - How to rebuild it for Buster - How to build mcrouter in general (I tried to rebuild our current package for Buster, but 0.37.0 is...
[15:15:44] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (jbond) >>! In T251574#6148741, @MoritzMuehlenhoff wrote: > (I tried to rebuild our current package for Buster, but 0.37.0 is incompatible with OpenSSL 1.1, this is likely fixed in current releas...
[15:19:43] serviceops, Packaging: prepare mcrouter package for Debian Buster - https://phabricator.wikimedia.org/T251574 (jbond) > That's exactly what i have used, docker_entry.sh is just a cleaned up version of the FB packaging scripts Correction i based my script on https://github.com/facebook/mcrouter/tree/mast...
[15:22:51] Does anyone know if operations/docker-images/production-images is only for base images or is it also intended as a place for built-in-house service images?
[15:42:14] <_joe_> example?
[15:42:34] <_joe_> It's supposed to be for base images for things that run in production
[15:42:46] <_joe_> so it's hard to tell in abstract :)
[15:45:03] for example, if we wanted to run the Grafana stack in Docker (but not k8s)
[15:46:31] <_joe_> that fits perfectly
[15:46:45] <_joe_> also yes please not in k8s :P
[15:46:56] <_joe_> not our tool for monitoring k8s
[15:47:44] Great. Thank you :)
[15:47:46] <_joe_> shdubsh: I now get your original question. production-images is definitely also for built-in-house service images
[15:49:29] <_joe_> shdubsh: may I ask you to expand the README if you get to work on that repo? so that the next person has less doubts
[15:50:02] good idea, I'll do that
[15:50:49] <_joe_> also, if you have to find out something that's undocumented / poorly searchable, let me know
[15:51:04] <_joe_> every person who's fresh to a process is a good chance to improve the docs
[15:53:16] sure. I ran into the same issue Kosta had wrt the requests ca certificate bundle, but found the env var in the docker-pkg code. It wasn't clear to me why docker-pkg wasn't actually building the images until the changelog was amended, then it worked perfectly.
[15:55:01] also had some issues getting the update command to work. it's not clear that update requires both NAME and DIRECTORY
[15:56:05] for example `docker-pkg -c config.yaml update --version 1.14 golang` yields: usage: docker-pkg update [-h] [--reason REASON] [--version VERSION] NAME
[15:56:24] docker-pkg update: error: the following arguments are required: NAME
[15:56:52] but adding the directory after that works, so I think I was holding it wrong
[16:00:27] <_joe_> heh that's argparse being silly :/
[16:00:37] <_joe_> can you file a task?
[16:00:44] sure :)
[16:09:31] ahoy - do any of ye know much about the ratelimiting service in restbase? I think it might not have worked for a while because of firewall config- this came up a while ago https://phabricator.wikimedia.org/T249699
[16:37:27] serviceops, Kubernetes, Patch-For-Review: Make helm upgrades atomic - https://phabricator.wikimedia.org/T252428 (Jdforrester-WMF) Test run with 2.16.7-2: https://integration.wikimedia.org/ci/job/helm-lint/1317/console
[16:39:26] serviceops, Operations, docker-pkg: docker-pkg update cli renders unclear guidance - https://phabricator.wikimedia.org/T253131 (colewhite)
[19:08:45] serviceops, Operations, Thumbor, Sustainability (Incident Prevention): Reverse proxy supporting XFF-based per-IP concurrency limit and request queueing - https://phabricator.wikimedia.org/T252749 (RLazarus) p:Triage→Medium Agree we ought to do this, and I think it's something Envoy can do...
[20:39:32] serviceops, Operations, ops-codfw: (Need by: TBD) rack/setup/install kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet. - https://phabricator.wikimedia.org/T252185 (Papaul)
[22:27:35] serviceops, Operations, docker-pkg: docker-pkg update cli renders unclear guidance - https://phabricator.wikimedia.org/T253131 (RLazarus) p:Triage→Medium Looks like this comes from [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/docker-pkg/+/1023a62053f730a349db99dd...