[06:58:04] <_joe_> uhm jijiki can you take a look at https://gerrit.wikimedia.org/r/#/c/wikibase/termbox/+/493078/ when you have time? I'd need a +1 [06:58:32] oh come on man [06:58:52] * jijiki runs [06:59:02] <_joe_> ? [07:00:02] Add bubberoid hint to docs [07:00:27] <_joe_> oh sorry got the wrong link [07:00:34] LOL SURE [07:00:38] <_joe_> I meant https://gerrit.wikimedia.org/r/c/operations/puppet/+/508827 [07:00:44] <_joe_> :D [07:00:59] <_joe_> I just told fsero it was too long since I last rickrolled you with blubber [07:01:32] hahaha [07:01:48] you've being blubbered [07:02:06] termbox threw me off really [07:02:07] <_joe_> oh man [07:02:30] <_joe_> jijiki: I am an internet troll with 20+ years of experience [07:04:29] lol [08:36:59] i would like an extra set of eyes on those CRs https://gerrit.wikimedia.org/r/c/operations/puppet/+/508994 https://gerrit.wikimedia.org/r/c/operations/dns/+/508996 [08:54:50] jijiki: _joe_ ☝️ ☝️ :) [08:59:16] <_joe_> will do in a few! [09:34:58] turns out we have to have metrics for our Core Work by tomorrow [09:35:08] so i propose we discuss having one for Service Ops in the meeting today [09:40:06] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Joe) There is a simple solution to run services that are now on k8s on deployment-prep: - Creat... [09:41:41] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Joe) Please also note you can run multiple services on the same VM if you really want to, it's e... [09:42:09] scb* has uwsgi installed, but it seems entirely unused, probably from the time ORES was running on scb* ? 
[09:50:17] <_joe_> moritzm: probably [09:50:37] <_joe_> mark: ok, I wanted to talk about phab and gerrit boards, but we can keep that brief [09:50:55] <_joe_> mark: can I ask what kind of metrics? [09:51:06] <_joe_> so that I can come with some ideas to the meeting :) [09:51:30] there is no guidance on what kind of metrics [09:51:35] basically we have key deliverables for "core work" [09:51:47] but for our core work those are more "ongoing activities" rather than real deliverables/outputs [09:52:08] and so for Service Ops I basically (tongue-in-cheek) aggregated it as "Platform as a Service" which right now is the KD for service ops [09:52:20] and I would say we should have -a- reasonable metric there for our core work [09:53:33] * akosiaris buffled [09:53:57] like "The platform keeps running for X% of the time?" [09:54:32] no [09:54:47] we can't even measure that [09:54:53] it's not well defined [09:55:13] well... let's define "core work" then ? [09:55:27] maybe the metric will come out on its own after that [09:56:32] <_joe_> I mean I'm not sure I understand if it should be performance/uptime metrics, or work advancement metrics [09:56:45] work advancement is where I'm leaning [09:57:20] <_joe_> that is "we served content from services A,B,C within their SLO 99.97% of the time" vs "All services apart from mediawiki are on kubernetes [09:57:30] the latter is not in scope [09:57:36] because the pipeline is part of the strategic programs [09:57:41] they will have separate metrics there [09:57:43] mark: I can quickly whip up those package update stats we did in the past if that's helpful [09:57:53] <_joe_> it was an example [09:57:54] moritzm: that might be a metric for infra-foundations yes [09:58:15] <_joe_> frankly I don't know what besides SLOs, that don't only depend on us [09:58:17] _joe_: so yeah, something along those lines [09:58:19] <_joe_> very little in fact [09:58:22] but not that one :) [09:59:34] there is also this key deliverable [09:59:37] which is 
across SRE [09:59:46] Update essential infrastructure code to evolving standards [09:59:57] which I put there for python 2 to python 3 [10:00:20] and I think Corey is now adding node10 migration into that one as well (mainly for their team) [10:00:33] <_joe_> ok for kubernetes, I would assume "do not run an unsupported version of kubernetes for more than X days" [10:00:41] yes [10:00:43] <_joe_> would be a good objective [10:00:50] but arguably most pipeline stuff will be under the strategic pipeline program [10:00:56] <_joe_> so etcd3 is also in that key deliverable I guess [10:01:04] so I already came up with an example metric for infra foundations [10:01:07] which is: [10:01:07] <_joe_> not just for k8s [10:01:23] " [10:01:23] Number of Debian Linux systems running on EOL releases [10:01:23] " [10:01:40] but that's for faidon to change/update, since he's traveling i've just put something [10:01:45] <_joe_> what can we measure for us? [10:01:53] <_joe_> I mean that's not going to be on k8s [10:02:04] * akosiaris still buffled. so "core work" is the old TEC1? [10:02:07] <_joe_> automation? That's definitely something we need to improve on [10:02:37] <_joe_> I'm trying to think of metrics that will stimulate us rather than stress us. [10:03:22] 10serviceops, 10ORES, 10Scoring-platform-team: Ores hosts: mwparserfromhell tokenizer random segfault - https://phabricator.wikimedia.org/T222866 (10Volans) [10:03:43] akosiaris: I'm sorry :-P [10:04:15] volans: don't be. 
You just volunteered yourself :P [10:04:34] I did not know about it btw, nice find [10:04:54] lol [10:04:56] akosiaris: this is kind of the old TEC1, yes [10:05:07] but because our SRE sub teams are roughly organized around areas of expertise/infra [10:05:15] we've also come up with a single KD/activity per team [10:05:17] totally unrelated, I was looking for repros of the uwsgi segfault and ores hosts happen to have uwsgi :D [10:05:27] for python2/3, how about this one: [10:05:46] Percentage Python LOC able to run on Python 3 under operations git repositories [10:05:48] roughly like that [10:05:58] mark: how to measure it? [10:06:05] yeah hard [10:06:08] unit tests? ;) [10:06:14] but that's a metric by itself hehe [10:06:26] <_joe_> I would go with projects % maybe, but then it's risky [10:09:49] and really [10:09:57] they are asking for "up to 2" metrics per outcome per department [10:10:01] our outcome for this is [10:10:08] Wikimedia’s essential technical infrastructure is sustained and evolves along with industry standards [10:10:18] (essentially that's TEC1) [10:11:00] i don't want to go with uptime again, that simply doesn't mean anything across many services [10:11:04] and we can't even measure it well [10:11:10] maybe if we had SLOs for everything, that would make sense [10:11:19] and one of the projects for service ops is actually [10:11:24] (taken from the roadmap): [10:11:32] Process improvement around management of services (runbooks/cookbooks/SLOs...) 
[10:11:37] we could make a metric out of that [10:11:52] but it depends on headcount also how much progress we'll be able to make on that next year [10:11:58] <_joe_> that looks like the kind of metric that is a positive and not a negative [10:12:37] <_joe_> frankly getting 90% of the services running on the pipeline to have a defined SLO would be great [10:12:45] <_joe_> by the end of the year [10:12:49] that is a metric for the pipeline i'm sure [10:12:56] so you could perhaps say, service ops doesn't need another metric [10:13:00] it's already part of strategic [10:13:00] also one could argue that we can have a perfectly stable environment with good SLOs and fail at "evolves along with industry standards" ;) [10:13:10] yes [10:13:14] so key deliverables are outoputs [10:13:15] outputs [10:13:24] and outputs are "hoped" to improve things to achieve outcomes ;) [10:13:43] <_joe_> ok, so I circle back to automation/cookbooks [10:13:54] * volans hides in advance [10:13:56] <_joe_> else we have very specific goals [10:14:16] volans / moritzm: anyway, it may be good to do some brainstorming around infra foundations metrics for now [10:14:22] we'll need them by tomorrow, faidon is traveling :) [10:14:27] the KD for Infra Foundations is: [10:14:36] "Maintain and improve foundational infrastructure" [10:14:43] * _joe_ bbiab, restarting pybals now [10:14:46] I originally put tongue-in-cheek "Debian as a Service" there ;-p [10:14:49] faidon didn't like it haha [10:14:59] are these metrics per quarter or FY? [10:15:03] (and admittedly it doesn't cover nearly the entire scope of the team) [10:15:10] moritzm: they are for the year and for the mid-term-plan [10:15:16] so separate targets for "next year" and "in 3 years" [10:15:25] or rougly 3 years anyway [10:15:28] that's not even well defined :P [10:15:33] ok, so the last/current year is out of scope? 
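The "Percentage Python LOC able to run on Python 3" metric floated above could be approximated mechanically: try to parse every `.py` file with Python 3's grammar and count lines. A rough sketch, not anything in use at WMF — it is syntax-only (it catches `print "x"` statements but not runtime incompatibilities), so the result is an upper bound:

```python
import ast
from pathlib import Path

def py3_loc_stats(repo_root):
    """Count lines of Python source that parse with Python 3's grammar.

    A file that fails to parse (e.g. a bare `print "x"` statement) has
    all of its lines counted as incompatible. Syntax-only: code that
    parses but misbehaves at runtime under Python 3 still counts as ok.
    """
    total = ok = 0
    for path in Path(repo_root).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="replace")
        loc = len(source.splitlines())
        total += loc
        try:
            ast.parse(source)
            ok += loc
        except SyntaxError:
            pass
    return ok, total
```

Run per checkout over the operations repositories, `100 * ok / total` gives the LOC percentage, and per-repo runs give the "projects %" variant _joe_ mentions.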
[10:15:47] out of scope, we need to measure against the start of next fiscal [10:15:48] so july 1 [10:15:53] ok [10:15:54] ideally we're able to measure this metric on july 1 [10:16:07] how many metrics are expected per team? [10:16:18] we have 3 main areas (observability, infra, automation) + various minor ones though [10:16:19] finance wants no more than 2 per outcome... [10:16:25] ok [10:16:28] so really only 2 for the outcome which is mostly SRE [10:16:35] but they also didn't want more KDs [10:16:37] and we did make more KDs [10:16:43] otherwise it would be 1-2 for all of SRE, just doesn't work [10:16:50] it's not ideal [10:16:54] i would say, try to have one per team [10:16:56] but we can deviate [10:17:08] and as said, maybe service ops can go with none as there's already a big one under the pipeline in strategic [10:17:13] only one that covers all of foundations seems hard to find to me [10:17:20] it doesn't need to cover everything [10:18:21] "Instructions - Core/maintenance/operational [10:18:21] For work not directly related to the priorities, we will ask each department who does this type of work to describe 1.) the multi-year outcomes that departments intend to achieve 2.) metrics and targets to track our progress on those outcomes and 3.) the Key Deliverables that the department commits to achieve in the next 12 months. [10:18:21] There should be up to 3 Outcomes per department and up to 3 deliverables per outcome. Copy and paste the template provided to add an additional department. [10:18:21] " [10:18:50] I'll move this to #wikimedia-sre-foundations so that we can flesh something out [10:18:51] but there's no way we can cover everything with 2 metrics [10:18:53] yes [10:19:59] Definitions [10:19:59] Key deliverables - These are the principal outputs that we commit to accomplish in the next 12 months. They are the 1-3 things that we think will have the biggest impact on our Medium-term outcomes or core/maintenance/operational Outcomes.
[10:19:59] Core/maintenance/operational outcomes and deliverables - This is work that we do that doesn’t directly support the Medium-term Plan’s outcomes. To help determine if a specific deliverable or project directly relates to our priorities, try asking yourself “Will accomplishing this particular deliverable or project “move the needle” on the Medium-term outcomes or metrics? Examples include: work in Finance, Talent & [10:19:59] Culture, and maintaining our technical infrastructure. [10:20:25] i don't think we need to question whether sustaining and evolving our technical infrastructure supports the MTP outcomes :) [10:21:34] volans: btw, observability has its own KD [10:21:56] ok [10:24:05] dc ops's KD is "Bare Metal as a Service" ;) [10:24:15] katherine commented she hadn't heard that one before [10:28:46] I'm following along on this discussion but have no great insights. pipeline work already seems like a huge undertaking [10:29:08] yeah it's the main work of the team next year and will also have mediawiki in scope [10:29:15] (unlike this year) [10:30:28] i think the MTP has a metric for "code health" across all code in all repositories [10:31:01] akosiaris: can I gently poke you about the termbox service deployment? Any tasks you see likely to come on our radar? [10:31:01] which is, let's say, challenging :P [10:33:54] all code. in all the various languages. in all repos. uh. 
[10:34:01] * apergos heaves a heavy sigh [10:34:53] i think my python 3 one is still easier to measure ;) [10:34:59] but already challenging [10:35:10] we'll just ask volans to automate that ;) [10:36:02] :-) [10:40:14] totally the metric should have been "number of volans nitpicks" [10:40:26] perhaps we should just go with that [10:41:10] run it up the flagpole and see who salutes, as the saying goes [10:48:27] mark: as long as you keep a 64bit counter :-P [10:49:19] and then volans nitpicks become toil [10:49:53] if we need a 64 bit counter, perhaps we can train a ML model for it [10:52:49] but will that be on wmf-supplied laptops or will it be on desktops paid for out of our pockets? :-P [10:56:48] * _joe_ sarcasm warning [11:02:51] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Krenair) Will the service run into any differences in its environment due to being run with role... [13:14:40] akosiaris: thanks for the review (r/507397)! [13:15:02] akosiaris: out of curiosity, what was the edit that created patch 5? I can't see any change [13:15:27] * urandom isn't sure if he likes the new gerrit UI [13:15:50] urandom: https://gerrit.wikimedia.org/r/c/mediawiki/services/kask/+/507397/4..5 [13:16:30] volans: that. is awesome. [13:16:31] thanks [13:17:21] volans: did the old UI do that? [13:17:44] the UI way to get that is where it says File Patcheset # -> Patchset # to click on the two PS and select start/end [13:17:47] yes it did [13:18:02] was "Diff against:" [13:18:21] ^ [13:18:23] TIL, I guess [13:18:42] wondering how I missed it all this time [13:21:10] you probably got it always right at PS1 :-P [13:21:26] lol [13:31:20] akosiaris: did you see https://gerrit.wikimedia.org/r/c/mediawiki/services/kask/+/507397/4#message-7a13c78bc3f79511f04c16f7b422fdf7f645bb18 btw? 
I think this might be related to that edit you made (patch 5), if your intention there was that $CWD would be the default location of the spec [14:03:01] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Ottomata) I believe the VM has to be Jessie atm, unfortunately. Can't remember exactly why. Sin... [14:14:43] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10akosiaris) > > Since the docker container will be the same as the one running in production, I... [14:16:43] hiya fsero [14:16:57] o/ [14:16:59] eventgate-main in staging doesn't seem to be able to talk to kafka1001 etc. [14:17:02] trying to figure out why [14:17:47] i can't get a shell long enough on the eventgate container to test exactly, since it dies pretty fast [14:17:53] and also doesn't have debug tools [14:18:13] but i deployed a pod with debug_mode_enabled: true, which deploys a wmfdebug image [14:18:29] and i can't ping either the v4 or v6 addy from it [14:19:07] maybe my changes to the template to use wmf.releasename messed up some network policy stuff? [14:20:37] no, this is calico stuff that didn't get updated [14:20:49] let me take a look, this is one thing alex commented on the other day [14:21:02] we should update calico so that it stores its config in kubernetes and not outside of it [14:21:34] ah, this is not the default-kubernetes-policy stuff in puppet? [14:21:51] it is, but is not applied automatically [14:22:05] give me a min, I'm going to _joe_ you, i have an important patch to merge [14:22:24] haha ok [14:26:06] <_joe_> ?
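The patchset-to-patchset comparison volans linked earlier (the kask 507397 `4..5` diff) follows a predictable Gerrit URL scheme; a trivial helper to build such links, with the scheme inferred from that one example:

```python
def gerrit_interdiff_url(base, project, change, ps_from, ps_to):
    """Build a Gerrit URL that diffs two patchsets of one change,
    in the form <base>/c/<project>/+/<change>/<from>..<to>,
    as seen in the kask 507397 patchset 4..5 link above."""
    return f"{base}/c/{project}/+/{change}/{ps_from}..{ps_to}"
```

The same diff is reachable in the UI via the "Patchset" selector ("Diff against:" in the old UI), as described in the conversation.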
[14:26:24] * _joe_ wonders since when I've become a verb [14:28:23] 10serviceops, 10Operations, 10Release Pipeline, 10Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (10WMDE-leszek) Hey @akosiaris and @mobrovac we've been wondering if you had a chance to look into our service again. As reporte... [14:51:39] ottomata: could you try again please? [14:52:35] ya... [14:53:28] <_joe_> fsero, akosiaris is there an easy way to see which version of mathoid is deployed to production? [14:54:16] docker-registry.discovery.wmnet/wikimedia/mediawiki-services-mathoid:build-39 [14:54:39] <_joe_> how did you get that? get pods? [14:55:15] yep [14:57:00] 10serviceops, 10Operations: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (10jijiki) [14:57:47] looks better fsero thank you! [14:58:49] fsero: even though we are choosing v4 addresses if we can [14:58:51] just to be sure [14:58:57] did you also make changes to allow the ipv6 ones too? [14:59:18] i applied what is on puppet [14:59:25] so i think it should be there [15:01:15] great [15:01:16] thanks [15:02:51] awesome, -main in staging looking good! [15:05:00] fsero: i'm going to proceed with deploying to eqiad and codfw, if that's ok with you. [15:05:05] the throughput here is lower than -analytics [15:05:13] it would fasil [15:05:14] around 1.5K-2K / second [15:05:15] OH [15:05:16] ya?
[15:05:22] so my replicas can be less [15:05:24] the policy is not updated there [15:05:35] lemme update it for ya and maybe earn my t-shirt [15:05:39] haha ok [15:08:27] please proceed [15:08:38] the usage will be different, clients will wait for produce request to finish [15:08:44] so http requests will be open longer to this service [15:08:52] which means req latency will be higher [15:09:29] however IMHO if client cannot connect to kafka it should output something before it dies [15:09:32] i still seem to be able to push ~ 1K msgs/sec through staging main [15:10:02] so, how many replicas here? [15:10:05] maybe 5 is fine for now? [15:10:15] would that be ok fsero? [15:10:53] the current load in our kubelets is really low [15:10:56] so go ahead [15:11:05] why 5 ? [15:11:07] why not 3? [15:11:42] 3 could be fine too. [15:12:04] was just trying to have a little extra headroom [15:12:41] let's start with 3 [15:18:23] all looks good fsero thank you! [15:18:33] now for an lvs task... :) [15:18:39] yeah lvs [15:19:17] i think we can do both on monday [15:19:20] if you are ok with it [15:19:21] cool [15:19:22] ya [15:21:22] 10serviceops, 10Analytics, 10EventBus, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (10Ottomata) [15:24:58] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Joe) >>! In T220235#5169739, @Krenair wrote: > Will the service run into any differences in its... [15:26:12] 10serviceops, 10Beta-Cluster-Infrastructure: Puppet broken on VMs in deployment-prep - https://phabricator.wikimedia.org/T221654 (10Joe) The way to go for such things is to use `role::beta::docker_services` on a fresh VM. I've already created deployment-docker-mathoid01 that should replace the old mathoid ser...
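_joe_'s earlier question ("which version of mathoid is deployed?") was answered by reading the image tag off `get pods`; that lookup can be scripted against `kubectl get pods -o json`. A sketch — the embedded pod list is a hypothetical, trimmed-down stand-in for real kubectl output, not what the cluster actually returns in full:

```python
import json

def deployed_images(pods_json):
    """Return the set of container images referenced by a pod list,
    as produced by `kubectl get pods -o json`."""
    pods = json.loads(pods_json)
    return {
        container["image"]
        for pod in pods.get("items", [])
        for container in pod["spec"]["containers"]
    }

# Hypothetical, heavily trimmed sample of `kubectl get pods -o json`.
SAMPLE = json.dumps({
    "items": [
        {"spec": {"containers": [
            {"name": "mathoid",
             "image": "docker-registry.discovery.wmnet/wikimedia/mediawiki-services-mathoid:build-39"}
        ]}}
    ]
})
```

Feeding it the output of `kubectl -n <namespace> get pods -o json` would list every image (and thus build tag) currently running in that namespace.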
[15:28:54] 10serviceops, 10Operations: Separate Wikitech cronjobs from production - https://phabricator.wikimedia.org/T222900 (10jijiki) [15:29:35] 10serviceops, 10Operations, 10cloud-services-team, 10Core Platform Team Backlog (Watching / External), and 3 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10jijiki) [15:29:40] 10serviceops, 10Operations: Separate Wikitech cronjobs from production - https://phabricator.wikimedia.org/T222900 (10jijiki) [15:30:15] I am joining in 1' [16:25:16] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, and 2 others: migrate endpoint from old registry instance to new one - https://phabricator.wikimedia.org/T221101 (10fsero) [16:26:11] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, and 2 others: improve docker registry architecture - https://phabricator.wikimedia.org/T209271 (10fsero) [16:26:18] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, and 2 others: migrate endpoint from old registry instance to new one - https://phabricator.wikimedia.org/T221101 (10fsero) 05Open→03Resolved [16:30:25] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (10fdans) p:05Triage→03High [16:30:41] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (10fdans) [17:20:38] jijiki: re "Separate Wikitech cronjobs from production". the ones running in the www-data crontab, right? i only see 2, runJobs.php and the TorBlock one [17:20:48] or also ones running in the root crontab [17:21:04] i am on labweb1001 comparing to mwmaint1002 [17:21:42] mutante: I have not checked the details, thus the task description is below par [17:22:40] if we figure out which are common and will break on wikitech if we switch to PHP7, that would be awesome! [17:22:54] jijiki: ok.
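The labweb1001/mwmaint1002 crontab comparison mutante is doing here can be mechanized by diffing the job lines of two `crontab -l` dumps. A sketch under the assumption that the dumps are plain text; any sample crontab content used to exercise it is hypothetical:

```python
def cron_jobs(crontab_text):
    """Extract active job lines from `crontab -l` output, skipping
    comments, blank lines, and VAR=value environment settings."""
    jobs = set()
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line.split(None, 1)[0]:  # e.g. MAILTO=root
            continue
        jobs.add(line)
    return jobs

def cron_diff(a, b):
    """Return (jobs only in a, jobs only in b, jobs common to both)."""
    ja, jb = cron_jobs(a), cron_jobs(b)
    return ja - jb, jb - ja, ja & jb
```

The "common to both" set is exactly what jijiki asks for: the jobs shared by wikitech and mwmaint that would need checking before a PHP7 switch.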
let me try to do that and update [17:23:08] oh dear, thank you thank you [17:23:23] did I thank you ? [17:23:27] thank you :p [17:23:30] haha, yes :) [17:29:33] 10serviceops, 10Operations: Separate Wikitech cronjobs from production - https://phabricator.wikimedia.org/T222900 (10Dzahn) cron jobs running as 'www-data' on wikitech (labweb1001/1002) are only 2: ` # Puppet Name: run-jobs * * * * * /usr/local/bin/mwscript maintenance/runJobs.php --wiki=labswiki > /dev/nul... [17:31:18] wikitech is already PHP7, just not 7.2.. it's 7.0 [17:32:06] considers making a patch to upgrade that to 7.2 [17:34:52] 10serviceops, 10MediaWiki-Cache, 10Operations, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 5 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10EvanProdromou) [17:36:16] 10serviceops, 10MediaWiki-Cache, 10Operations, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 5 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10EvanProdromou) I moved this task so it's a direct chi... [17:48:13] 10serviceops, 10Operations: Separate Wikitech cronjobs from production - https://phabricator.wikimedia.org/T222900 (10Dzahn) labweb1001/1002 are on PHP 7, but it's 7.0 instead of 7.2 Regarding the TorBlock job, i tried to switch that in the past and it worked fine in production but failed on labweb, so it was... [18:15:01] 10serviceops, 10Operations: Separate Wikitech cronjobs from production - https://phabricator.wikimedia.org/T222900 (10Dzahn) The TorBlock job is already separated between wikitech and mwmaint. For mwmaint we are using `profile::mediawiki::periodic_job { 'mediawiki_tor_exit_node':` and for wikitech we are using... 
[18:15:36] TLDR: i think it's already done ^ [20:26:13] 10serviceops, 10Gerrit, 10Operations, 10cloud-services-team: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [20:33:55] 10serviceops, 10Gerrit, 10Operations, 10cloud-services-team: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [20:43:55] 10serviceops, 10Beta-Cluster-Infrastructure, 10Release Pipeline, 10Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Ottomata) An example of environmental differences: service-runner uses statsd. In prod we use... [20:45:24] 10serviceops, 10Gerrit, 10Operations, 10cloud-services-team: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [20:53:35] 10serviceops, 10Gerrit, 10Operations, 10cloud-services-team: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (10Paladox) [23:51:03] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Kanban): Add k8s credentials for Blubberoid continuous deployment - https://phabricator.wikimedia.org/T217147 (10greg) p:05Triage→03Normal