[07:24:10] <_joe_> good morning
[08:38:56] hey joe
[08:40:35] <_joe_> hello
[08:57:31] _joe_: hi, thanks for the merge!
[08:58:32] <_joe_> paladox: thanks for the patch
[08:59:09] You're welcome :)
[08:59:30] * paladox has already used your module elsewhere and got php 7.3 working :)
[09:01:03] <_joe_> paladox: sooner rather than later we should start publishing new, well-written modules to puppetforge
[09:02:14] :)
[09:02:15] <_joe_> hey team: anyone want to work with me on rebuilding a debian package, uploading it to reprepro and installing it across the fleet?
[09:02:32] <_joe_> jijiki, fsero one of you two I guess
[09:03:18] I can trade you a discussion about pigeons and caches
[09:03:25] and scap
[09:06:18] <_joe_> hehe you have your hands full already :P
[09:16:34] Heh
[09:16:43] <_joe_> anyone have experience in setting up stashbot for a channel?
[09:29:37] _joe_: I can help in a while, still not caffeinated
[09:34:12] <_joe_> ahahah I wouldn't ask you to do php debian packaging without caffeine
[09:47:25] volans _joe_
[09:47:33] so one thing we have on the table now is
[09:47:45] before we deploy mediawiki to servers
[09:47:59] we git pull all relevant repos on a deploy server
[09:48:04] mediawiki + extensions
[09:48:20] the idea is that this pull will have, let's say, a version name
[09:48:31] bla-1.0
[09:48:53] if we need to add a patch, even if it is a single line in a file
[09:49:09] it will change to e.g. bla-1.0p1
[09:49:17] (naming is provisional ofc)
[09:49:41] <_joe_> yeah I got it from `bla`
[09:49:51] we could have something that updates the version automatically
[09:50:11] <_joe_> I see a few issues with anything that can't be automated via scap itself
[09:50:11] given this, we will not have scap rsync single files any more
[09:50:28] <_joe_> that's... not a small thing
[09:50:35] <_joe_> and it would kill the dbas as we stand
[09:50:45] tell me
[09:50:56] we don't want to kill anyone in the process
[09:50:57] <_joe_> scap sync-file takes what, 1 minute?
[09:51:06] yes
[09:51:08] <_joe_> scap sync takes even several minutes
[09:51:21] <_joe_> dbas have to do sync-file like 20 times a day
[09:51:42] <_joe_> so you're moving the overhead for them from 20 minutes to 2 hours
[09:51:50] can you give me an example?
[09:52:06] oh mediawiki config
[09:52:19] <_joe_> https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/480008
[09:52:27] yep
[09:52:28] <_joe_> I would much prefer if we had something like
[09:52:42] I have another idea about this
[09:52:45] <_joe_> we check the version of mediawiki-config and all the submodules
[09:52:55] or we can just include the dbconfig in etcd as a pre-requisite ;)
[09:52:56] * volans hides
[09:53:08] <_joe_> volans: yeah it's not just the dbas
[09:53:41] so what if we have a manifest with all file hashes
[09:53:52] <_joe_> jijiki: we need a way to describe the state of each sub-repo
[09:54:00] lol, are you re-implementing a debian package? :-P
[09:54:03] <_joe_> plus the version of PrivateSettings
[09:54:17] volans: I think I am reinventing the wheel really
[09:54:27] <_joe_> volans: we're inventing how to add state to something that wasn't designed to have one
[09:54:32] _joe_: do we have generated files? how do they work? (where are they generated)
[09:54:38] <_joe_> well "designed", "sedimented"
[09:54:56] <_joe_> volans: AIUI, we have several sources for what gets distributed to the appservers
[09:55:01] <_joe_> 1 - mediawiki-config
[09:55:24] volans: I think the total number of repos is ~190
[09:55:51] <_joe_> 2 - various subdirectories named php-1.XX.YY-wmf.Z
[09:56:38] <_joe_> I'm not sure how they are created
[09:56:42] <_joe_> 3 - PrivateSettings.php which is written by puppet
[09:56:53] <_joe_> 4 - possibly security patches
[09:57:36] <_joe_> 5 - symlinks like php => php-1.XX.YY-wmf.Z
[09:57:55] <_joe_> so it's a glorious mess, really.
[09:58:04] any minified CSS/JS or things like those?
[09:58:07] <_joe_> describing the state of the whole thing is going to be challenging
[09:58:26] <_joe_> volans: no, AIUI that's done before things hit the deployment server anyways
[09:58:50] ack
[09:58:55] * volans brb
[09:58:57] <_joe_> describing the state of that thing is going to be challenging.
[09:59:32] <_joe_> jijiki: I was thinking of a radically simpler approach, which ofc would give us way less leverage
[10:00:35] <_joe_> we keep a file with a simple counter in /srv/mediawiki-staging
[10:00:53] <_joe_> scap updates that file every time it is run, for any type of sync
[10:00:59] <_joe_> and it distributes it too
[10:01:28] <_joe_> this way you know exactly if a sync has gone to a server. It doesn't give you any protection against past partial releases ofc
[10:01:46] <_joe_> but in general, I think the right approach is to first decide what we want to achieve
[10:02:12] <_joe_> I think what we want is an easy way to know if the current scap release has reached all servers
[10:02:28] <_joe_> err, s/scap/mediawiki/
[10:02:46] <_joe_> oh my, what did I write
[10:03:17] <_joe_> I think what we want is an easy way to know if the current scap action has been applied to all servers
[10:04:51] <_joe_> it doesn't guarantee consistency, but short of doing very expensive things (or removing scap sync-file, which I'd rather not do for now), I don't think it's possible
[10:05:48] isn't the failure of some nodes already reported by scap?
[10:05:59] <_joe_> that's not what we want
[10:06:04] <_joe_> sorry, I thought you had context
[10:06:24] <_joe_> context is: we want to support gradual deploys to a % of traffic
[10:06:32] right
[10:06:36] <_joe_> hence we want to be able to deploy to a % of servers
[10:06:42] hang on though
[10:06:52] when it comes to e.g. dbs
[10:06:53] <_joe_> and we need scap to be able to know which servers have been deployed to
[10:06:59] we don't want a percentage
[10:07:01] right?
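[Editor's note: a minimal sketch, in Python, of the two ideas floated above — jijiki's manifest of file hashes and _joe_'s simple counter file in /srv/mediawiki-staging that scap would bump and distribute on every sync. File names, JSON layout and function names are illustrative assumptions, not scap's actual implementation.]

    import hashlib
    import json
    import time
    from pathlib import Path

    # Hypothetical paths; the real layout would be decided inside scap.
    STAGING = Path("/srv/mediawiki-staging")
    VERSION_FILE = STAGING / ".scap-sync-state"

    def file_hashes(root: Path) -> dict:
        """The manifest idea: one hash per file, so even a single-line
        patch changes the manifest and therefore the release version."""
        return {
            str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(root.rglob("*"))
            if p.is_file()
        }

    def bump_version(sync_type: str) -> dict:
        """The simpler counter idea: bump a sequence number on every scap
        run, for any type of sync, and distribute the file with the rest."""
        state = {"seq": 0}
        if VERSION_FILE.exists():
            state = json.loads(VERSION_FILE.read_text())
        state.update(seq=state["seq"] + 1, sync_type=sync_type, ts=int(time.time()))
        VERSION_FILE.write_text(json.dumps(state))
        return state

Comparing the seq value read back from a target host against the one on the deployment server is then enough to tell whether the last sync reached that host.]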
[10:07:07] when we depool a db server
[10:07:15] we need it to be depooled everywhere
[10:07:17] <_joe_> no, but for other things we use scap sync-file for, we might
[10:07:45] maybe we could separate those requirements
[10:08:00] and leave scap sync-file for specific deployments
[10:08:02] <_joe_> let's first focus on what is our current problem: there is no way to persist information about what was deployed where between scap runs
[10:08:33] <_joe_> even if they're for the same code version
[10:08:47] <_joe_> I'm not sure what is the right way to do it
[10:09:28] <_joe_> yup, all good
[10:09:47] <_joe_> sorry wrong window :P
[10:10:11] what if we fine-grain it
[10:10:23] as in, keep individual versions of repo
[10:10:26] repos*
[10:10:45] and have scap detect what change was on which repo
[10:11:13] and implicitly do scap sync-file if needed
[10:11:30] <_joe_> anyways, one obvious way to do this without getting into the hornet's nest of determining what is being synced univocally is to assign an ID to the scap operation, and have scap store the state somewhere
[10:11:41] I am over-optimistic that this will be easy to implement, I know
[10:11:52] <_joe_> jijiki: maybe that works, yeah :P
[10:12:21] so just to sum up a bit, our issues are:
[10:12:39] I have a step-back question: the "deploy to a % of servers" is for "features", so basically we have version X on A% of servers and most likely X+1 on B% of servers
[10:12:56] <_joe_> yes
[10:13:00] that means that we'll need to be able to deploy a change in mediawiki config either to all of them or to just a group of them
[10:13:06] so we'll have multiple mediawiki-configs?
[10:13:08] volans: yes, if we assume that rollback means increasing the version number
[10:13:14] <_joe_> volans: what?
[10:13:16] or something like feature toggles
[10:13:22] you deploy the config everywhere
[10:13:26] <_joe_> stop
[10:13:29] fsero: that is another Pandora's box
[10:13:29] but just a % of servers will react to them
[10:13:31] :p
[10:13:39] yeah yeah, I know, I'm talking Pandora's box
[10:13:42] hahahaha
[10:13:43] :P
[10:13:51] * fsero hides
[10:13:57] fsero: there is an idea for this to be a next step
[10:14:02] _joe_: you've got feature X with a toggle in mediawiki config, you want group A to have it disabled, group B to have it enabled
[10:14:06] <_joe_> fsero, volans let me explain the problem in terms a product manager would like
[10:14:22] <_joe_> this is not about A/B testing
[10:14:24] <_joe_> a feature
[10:14:29] <_joe_> we do that in other ways
[10:14:51] <_joe_> our problem is:
[10:15:03] <_joe_> - we release the MediaWiki train
[10:15:19] <_joe_> - we find issues, we rollback after having caused an outage
[10:15:40] <_joe_> right now we release the code first to N servers, then to all
[10:15:54] but that release to N servers lasts 20"
[10:16:12] <_joe_> what we want to support is releasing, from scap, first to 5% of servers, wait some minutes, release to 10%, etc
[10:16:19] so tl;dr within a few minutes we go from 8% -> 100%
[10:16:50] <_joe_> this is different from toggling features visible to the users on/off
[10:17:15] toggling features could be implemented at code level to my understanding
[10:17:25] e.g. if a feature is visible to only logged-in users
[10:17:30] <_joe_> jijiki: that's already being done for user-visible features
[10:17:34] ok, so my gut feeling is what we need first is a way to describe the state
[10:17:35] yeah
[10:17:40] <_joe_> fsero: exactly
[10:17:47] if this is the *only* feature you want, it seems easier to just have scap keep a list of hosts for that deployment on the deployment server and keep track of what was deployed there and what not
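[Editor's note: a sketch of the staged rollout _joe_ describes above ("first to 5% of servers, wait some minutes, release to 10%, etc"). The stage percentages and host names are made up for the example; this is not how scap selects targets today.]

    import math

    def rollout_stages(hosts, stages=(5, 25, 100)):
        """Split a host list into cumulative percentage stages, so a sync
        can go to 5% of servers, pause, continue to 25%, then to all."""
        done = 0
        for pct in stages:
            target = math.ceil(len(hosts) * pct / 100)
            batch = hosts[done:target]
            done = target
            if batch:
                yield pct, batch

    # Example: 40 hypothetical appservers rolled out in three waves.
    hosts = [f"mw{i:04d}" for i in range(1, 41)]
    for pct, batch in rollout_stages(hosts):
        print(f"sync to {pct}% -> {len(batch)} new hosts")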
[10:17:52] so we need to "version" mediawiki deploys somehow
[10:18:03] <_joe_> volans: that keeping track is not that easy
[10:18:13] <_joe_> that's what this was all about
[10:18:17] volans: that is where the issues we just discussed come up
[10:18:23] depooling a db server
[10:18:28] should be done everywhere
[10:18:32] <_joe_> so keeping an incremental version as a tracker, basically
[10:18:56] <_joe_> the easiest way to keep track (and to have monitoring as a bonus!) is to sync a version file
[10:19:03] <_joe_> but yeah, we might want to do both things
[10:19:27] jijiki: we can treat db changes as changes that don't allow rolling deployments for now
[10:19:33] i think we should focus first on features
[10:19:38] but that's again my gut feeling
[10:19:38] <_joe_> volans: the picture is much more complicated than how you're thinking about it
[10:20:04] <_joe_> fsero: I would advise against having sync-file support partial rollouts
[10:20:19] <_joe_> but it will still need to increase the version number
[10:20:22] <_joe_> that's my point
[10:21:04] <_joe_> volans: if we keep state on the deployment server, then we need it to know which servers are pooled/depooled at the time of the release
[10:21:11] <_joe_> and what happens when they get repooled?
[10:21:18] <_joe_> and what when I run "scap pull"?
[10:21:51] <_joe_> so we can either rethink completely how scap and releases work
[10:22:02] <_joe_> which we're bound to do someday anyways
[10:22:47] <_joe_> or we try to work within the limitations and constraints of the current system
[10:23:16] <_joe_> we can anyways limit the new feature to just work for full syncs (that is, restarting a sync)
[10:23:32] how is scap pull handled with the version file on the target host anyway?
[10:23:48] I think you'll end up changing some logic in scap anyway ;)
[10:23:52] <_joe_> volans: that scap pull will pull the latest version from deploy1001
[10:24:03] <_joe_> which will include that file with the latest sequence number
[10:24:05] that will stay like that
[10:24:06] <_joe_> easy
[10:24:48] <_joe_> volans: the only real limitation I see is
[10:24:59] <_joe_> someone does a release that is incomplete for $reason
[10:25:15] <_joe_> someone else does sync-file and that updates the version
[10:25:19] <_joe_> silently
[10:25:31] <_joe_> that means that we don't ensure consistency
[10:25:41] <_joe_> but that's not our current goal, right?
[10:26:02] <_joe_> I mean sure it would be great, but it's not our current cat to skin
[10:26:57] agree, yeah, maybe a version file with train version + incremental is doable
[10:27:00] <_joe_> it would help us monitor inconsistencies a bit though
[10:27:16] <_joe_> volans: why do you need the train version?
[10:27:46] <_joe_> I mean we can make the file fancy and nice later, in a first iteration just a counter would be enough I think
[10:28:07] how do we keep track of which host is on which train version?
[10:28:11] <_joe_> well
[10:28:17] if in the middle we depool a db
[10:28:43] <_joe_> ok that's the point, we should not allow overlapping releases
[10:28:52] but we do, all the time :)
[10:28:55] <_joe_> no we don't
[10:29:01] <_joe_> like at all
[10:29:09] <_joe_> there is locking in scap to prevent that
[10:29:35] <_joe_> volans: or do you think you can run scap sync-file while someone else is running it too?
[10:29:39] <_joe_> because that's not the case
[10:30:47] no, not that
[10:30:48] <_joe_> so scap should do something like: check the file everywhere, if it's not at the desired version, tell the user "not every server is at version X. please do a full sync" if someone tries a sync-file
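[Editor's note: the guard _joe_ just described, sketched in Python — before honouring a sync-file, check the version file on every target and refuse if any host is behind. fetch_remote_state is a hypothetical stand-in for however scap would read the file back from a host; it and the file layout are assumptions.]

    class InconsistentFleet(Exception):
        """Raised when some servers never received the last full sync."""

    def assert_fleet_at_version(hosts, expected_seq, fetch_remote_state):
        """Refuse a scap sync-file unless every server already carries the
        sequence number the deployment server expects."""
        stale = [
            host for host in hosts
            if fetch_remote_state(host).get("seq") != expected_seq
        ]
        if stale:
            raise InconsistentFleet(
                f"not every server is at version {expected_seq}; "
                f"please do a full sync first (stale: {', '.join(stale)})"
            )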
[10:31:07] AIUI the only way to ensure consistency is to remove sync-file, and besides the dbas i don't know of other usages for it, and for the dbas we already thought of removing the dbconfig from file and moving the config to a configuration server (etcd)
[10:31:36] _joe_: so we would not be able to depool a DB during a long-running procedure of deploying gradually from 0-100%?
[10:31:41] <_joe_> fsero: you don't ensure consistency with the full sync either, but yeah it would be much less of a problem
[10:32:26] <_joe_> volans: if it's a long-running procedure, you would not be able to, no. I don't see a way around that
[10:32:51] <_joe_> without restructuring radically how we organize /srv/mediawiki-staging
[10:33:13] if you have a counter for scap-full and a counter for sync-file I guess you can enable it without too much hassle
[10:33:21] <_joe_> uhm
[10:33:24] it could be train+incremental or two incrementals
[10:33:32] * effie errand
[10:33:47] <_joe_> not sure that's the case, but I have to think about it
[10:34:01] <_joe_> let's talk in terms of scap sync and scap sync-file
[10:34:21] <_joe_> don't think in terms of what's in that dir, it is a can of worms for the ages
[10:34:37] <_joe_> fsero: I agree that's the only way to ensure consistency, btw
[10:35:38] <_joe_> volans: so two files, I would say. Sync-file without the ability to do partial deploys, and it only syncs its own file
[10:35:45] <_joe_> yeah that could work
[10:35:59] <_joe_> anyways, let's start working on excimer fsero
[10:36:24] <_joe_> we should probably do it in private, not sure it's of general interest
[11:30:14] hi! for https://phabricator.wikimedia.org/T211124 I'm looking to include profile::rsyslog::udp_localhost_compat somewhere in the mediawiki role or profile, which place would be best?
[11:30:27] _joe_ perhaps you know ^
[11:31:02] <_joe_> godog: it depends on where you want that config to go
[11:31:14] <_joe_> if the answer is "any server with mediawiki configured", heh
[11:31:30] <_joe_> I would say in role::mediawiki::common
[11:31:59] yeah, any server with mw configured
[11:32:19] thanks, I'll use that, will be sending the cr your way
[12:19:30] volans: is there other stuff on your list with ideas about this?
[12:21:04] tell us after lunch :p
[12:21:07] jijiki: given we're probably not going with the solve-all-problems solution as it will be too much, I guess not :)
[12:21:22] :(
[12:21:26] the two serials I guess should cover most of our short-term requirements
[12:21:43] have lunch, we'll revisit it later
[12:21:45] :D
[12:21:49] ack, ttyl :)
[15:15:25] fsero, _joe_, mark, jijiki: btw I've added the deploy of eventgate to our TEC3 goal. Looks like analytics will ask to deploy it next quarter. It's pending a final confirmation from Andrew IIRC, but here it is
[15:16:28] <_joe_> well the good thing with that is that otto will do the work
[15:16:34] <_joe_> and we'll reap the credit
[15:16:38] <_joe_> so a win-win
[15:19:31] it's a net win indeed I think. Will definitely help with the other 3 bullet points (docs, training, helm replacements)
[15:21:57] for the newcomers
[15:21:59] what's eventgate?
[15:24:02] lemme see if I can find otto's slides
[15:24:40] sigh, not in https://office.wikimedia.org/wiki/Operations/SRE_sessions
[15:26:19] awesome
[15:26:30] yeah, pinged him already
[15:26:33] how about the bullet to productionize the k8s cluster a bit more? :)
[15:27:19] well, not sure how to write that down tbh...
[15:27:30] fsero: got any ideas?
[15:28:29] this https://phabricator.wikimedia.org/T201963 ?
[15:28:57] fsero: yes
[15:29:54] SRE sessions page updated
[15:30:37] slide 17, it's the stream intake service
[15:30:52] hey, does option 3 in that ticket have to do with the now-not-FLOSS software that paravoid was talking about earlier today? :)
[15:32:36] <_joe_> cdanis: yes
[15:32:42] <_joe_> good we didn't decide to use it
[15:32:49] kafka REST is no longer open source?
[15:32:57] <_joe_> I even pushed for using it
[15:33:14] <_joe_> or well, I asked for a strong reason not to
[15:33:35] <_joe_> akosiaris: it's not free
[15:33:51] I don't follow. https://github.com/confluentinc/kafka-rest right?
[15:33:53] <_joe_> akosiaris: it adopted a non-OSI-compliant license, even
[15:33:57] https://www.confluent.io/confluent-community-license-faq
[15:34:11] ah, the mess that bryan cantrill was ranting about the other day?
[15:34:45] <_joe_> "a source-available license"
[15:34:47] there was a response to it on medium by the CEO of confluent I think
[15:34:52] <_joe_> I love it every time I read it
[15:36:21] oh god
[15:36:30] every time i join a company i need to deploy an event collector
[15:36:39] http://dtrace.org/blogs/bmc/2018/12/14/open-source-confronts-its-midlife-crisis/ btw
[15:36:56] it's like the 3rd or 4th for us :-)
[15:37:26] thankfully it's way less convoluted and problematic than PDF generation around here
[15:37:59] at the same time there is a pretty large discussion now about event propagation
[15:38:29] there's even talk about a graph database for tracking down some pretty interesting relationships
[15:38:58] e.g. templates being used in templates being used in templates, ending up being used in some heavy pages
[15:44:14] mark akosiaris i've rephrased and moved things around on goals for the productionizing part, please feel free to change / edit
[15:44:42] akosiaris: if kafka lives outside of the cluster
[15:44:43] fsero: which pad though?
[15:44:53] https://etherpad.wikimedia.org/p/SRE-goals-FQ3-FY1819
[15:44:53] https://etherpad.wikimedia.org/p/SRE-goals-FQ3-FY1819
[15:45:08] hmm why don't I see the changes ...
[15:45:25] ah, I see .. different section
[15:45:38] yea i've messed up the section
[15:45:41] Amend TEC3 at line 228 if you want
[15:47:58] re event logging: if kafka lives outside the cluster, a service that receives events (probably transforms them a little) and ingests them over kafka is a good fit for k8s
[15:48:17] <_joe_> fsero: yeah it's exactly that
[15:49:45] too much detail for a goal :)
[15:51:23] <_joe_> who is going to work on configuring stashbot so that we get info on our tickets here?
[15:51:42] <_joe_> I am not sure I got that detail in our meeting
[15:56:25] Licensee is not granted the right to, and Licensee shall not, exercise the License for an Excluded Purpose. For purposes of this Agreement, “Excluded Purpose” means making available any software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online service that competes with Confluent products or services that provide the Software.
[15:56:37] ok... I just read it, now I get what all the fuss is about
[15:56:55] congrats confluent. I was wondering how they were making money, looks like they weren't
[15:57:29] i think it's the same debate as with RedisLabs before: http://antirez.com/news/120
[15:58:40] <_joe_> there is a reason why things like the CNCF and the linux foundation are good for the ecosystem. The idea is that large companies should pour their money in there
[15:58:59] yeah it's essentially the same thing, albeit in Redis's case it's the modules that will be released under the Commons Clause
[15:59:12] <_joe_> so not the software we use
[15:59:14] <_joe_> just some addons
[15:59:17] so it's an open core / openish modules model
[15:59:26] <_joe_> well it's not even open core
[15:59:31] <_joe_> no one uses those modules
[15:59:35] <_joe_> no one I know at least
[15:59:44] well, if they did it, they disagree
[16:00:58] mark summarized goal
[16:06:42] wow, bryan fired back at jay. http://dtrace.org/blogs/bmc/2018/12/16/a-eula-in-foss-clothing/
[16:26:57] fsero: nice
[16:31:32] wow, it's so shiny in here
[16:35:06] that's how new channels are
[16:35:17] it'll get all beat up soon enough
[16:39:04] I guess we should see what's up with wikibugs, whether it's been whitelisted yet
[16:39:55] apergos: apparently the channel was not registered
[16:40:03] oh that's right
[16:40:21] I pinged mark but he might not have noticed, and I forgot to mention it when we spoke earlier, completely forgot about that
[16:40:46] let's hope he puts in a cameo at tonight's meeting at least
[16:41:06] <_joe_> the channel is now registered
[16:41:11] <_joe_> don't you see mark has the @
[16:41:23] he had it
[16:41:27] there's no ChanServ in here
[16:43:16] channels can be registered without chanserv being present
[16:44:31] < valhallasw`cloud> mutante: Channel Services Channel #wikimedia-serviceops is not registered. -
[16:45:51] (06:45:35 PM) ChanServ: (notice) #wikimedia-serviceops is not registered. this is chanserv telling me right now
[16:45:54] so...
[16:47:33] and mark will have to do it as he's op
[16:47:41] <_joe_> I poked mark too earlier :)
[16:48:16] alright alright
[16:48:42] it's now registered.
[16:48:46] go ahead and set the guard on the channel to while you're at it please
[16:48:51] that will bring chanserv in
[16:49:21] *too
[16:49:39] thank you!
[16:49:57] mutante: back to you :-)
[16:52:29] _joe_ i would like to thank you for the php module! It helped me identify a performance issue (stemming from mw 1.31) using the slowlog (and later tested in mw 1.32, where it seems to be fixed)
[16:52:52] <_joe_> so, now we have the SRE meeting, but urandom and clarakosi have an interesting discussion to be had about the storage service and what language to pick for it. We can circle back to it at a later date
[16:53:19] <_joe_> paladox: you mean you found an underperforming endpoint by looking at slowlog?
[16:53:32] yeh
[16:53:35] <_joe_> that reminds me we're not collecting those to logstash at the moment :D
[16:53:42] is this the session storage piece, you mean?
[16:53:48] _joe_ i found the slowness to be related to commons.
[16:54:13] it doesn't seem to be caching (guessing here by how long it takes to load a page using resources from commons)
[16:54:21] _joe_: mostly clarakosi, she's turning out to be a real troublemaker
[16:54:43] <_joe_> apergos: yes, basically they think it can perform drastically better if not written in {php,js/node,python}
[16:54:57] did I hallucinate that they are writing it in go?
[16:55:10] or is that rumour real?
[16:55:12] <_joe_> *cough* *cough*
[16:55:25] <_joe_> that's what we should discuss, really
[16:55:34] urandom: 😂
[16:55:51] * apergos adds 'go memory and performance issues' to their already large reading pile
[16:56:28] <_joe_> apergos: so my point is - we already run large complex applications in both go and java
[16:56:33] if someone has a link for a summary of the discussion, I would love it
[16:56:47] <_joe_> so from an operational point of view, either would be ok as long as the service is well observable
[16:57:11] yet another pandora's box cc jijiki
[16:57:14] <_joe_> that meaning we have RED metrics for every endpoint exposed to prometheus, more or less
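[Editor's note: a minimal illustration of the observability bar _joe_ sets above — RED metrics (Rate, Errors, Duration) per endpoint, exported to Prometheus — using the prometheus_client Python library. Metric names, labels and the port are assumptions for the example, not an existing WMF service.]

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    # One counter covers rate and errors (via the status label); the
    # histogram covers duration: the classic RED trio, per endpoint.
    REQUESTS = Counter("app_requests_total", "Requests served", ["endpoint", "status"])
    LATENCY = Histogram("app_request_seconds", "Request duration", ["endpoint"])

    def handle(endpoint, fn):
        with LATENCY.labels(endpoint).time():
            try:
                result = fn()
                REQUESTS.labels(endpoint, "200").inc()
                return result
            except Exception:
                REQUESTS.labels(endpoint, "500").inc()
                raise

    if __name__ == "__main__":
        start_http_server(9100)  # Prometheus scrapes /metrics on this port
        while True:  # fake some traffic so the demo has data to show
            handle("/example", lambda: time.sleep(random.random() / 10))
            time.sleep(1)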
[16:57:15] what's wrong with go?
[16:57:20] <_joe_> fsero: nothing
[16:57:25] who do we have on the team that is go-fluent? not that this is a blocker, but we should at least have a couple people to start with
[16:57:46] i'm much, much more fluent in go than python or php for instance
[16:57:46] <_joe_> fsero: just that none of our self-written services is in go
[16:58:01] <_joe_> I'm not fluent, but I can read and write code
[16:59:16] I am also teaching it to myself
[16:59:23] but it's not going well up to now
[16:59:37] mostly ETOOMANYCONTEXTSWITCHES
[16:59:40] i started down that road but didn't
[16:59:45] <_joe_> yup, same here
[16:59:45] yeah that. didn't get so far
[17:00:00] <_joe_> but I'm going through a good book over the weekend
[17:00:15] https://exercism.io/tracks/go
[17:00:37] opinionated, but i think it's useful to read: https://peter.bourgon.org/go-for-industrial-programming/
[17:03:05] i have repeated the request for the wikibugs exemption on #freenode-sigyn
[17:03:13] now that the channel is registered
[17:10:34] <_joe_> also we now have a go service in production, blubberoid
[17:10:37] <_joe_> cc jijiki
[17:11:59] * urandom cringes at WMF naming once again
[17:12:40] <_joe_> I am not joking, we really called a service 'blubberoid'
[17:13:23] <_joe_> but I blame dan and tyler for naming the software it originates from "blubber" :P
[17:13:32] <_joe_> the -oid is an internal joke
[17:13:37] <_joe_> more or less
[17:13:42] <_joe_> you know we have failoid
[17:13:46] <_joe_> right?
[17:14:26] <_joe_> it's a service designed to quickly send 503s to requestors, it's used during switchovers of active/passive services
[17:16:16] No, I did not know that; I am Jack's complete lack of surprise
[17:16:54] we also now have "EventGate", which sounds like a conspiracy uncovered
[17:17:13] scandalous!
[17:17:40] <_joe_> urandom: we also have eventbus and eventlogging
[17:17:53] <_joe_> and as you might have guessed, they all do similar-but-not-equal things
[17:17:55] I know
[17:18:37] I argued (and lost) on eventbus
[17:18:49] <_joe_> ahem
[17:19:26] <_joe_> don't worry, now you have the opportunity to argue and lose on eventgate, join my team :P
[17:19:48] ahem... failoid doesn't send 503s itself
[17:19:50] by lost I mean that it was discussed ad nauseam until I wanted to chew my own arm off, and I went away while others named it eventbus
[17:19:57] but the end result is basically that
[17:20:28] volans: irony-as-a-service?
[17:21:02] lol
[17:26:26] _joe_: fsero: I've just persisted the zotero-related changes so that at least that ticking time bomb is no longer around.
[17:26:39] you already got my +1
[17:26:57] :-)
[18:05:35] _joe_: we depooled mc2033 from mcrouter
[18:05:44] hashar is requesting a Ganeti VM for doc.wikimedia.org, to separate it from contint1001/2001. he added a detailed explanation, seems all reasonable. i was going to grant it and create that. i guess this means "doc1001"
[18:05:55] but we found connections there
[18:06:21] what did we do wrong?
[18:06:45] nutcracker! :)
[18:06:49] <_joe_> mutante: uhm why does he want to separate it?
[18:06:57] <_joe_> jijiki: nutcracker, but let em be
[18:07:06] yep yep _joe_
[18:07:13] should we leave it as it is?
[18:07:21] "only doc.wikimedia.org requires php however the machine runs on Jessie and lacks php7.0. That breaks oojs demos and probably other ones" "whenever the CI machine is under maintenance, doc.wikimedia.org is no more available."
[18:07:23] IIRC we didn't remove the shard from there the last time
[18:07:30] "although content is code-review +2 by project owners, the code is running on a production machine that has jenkins/zuul/docker which might be a security breach."
[18:07:31] ok, other question: will anything alert tomorrow when they recable stuff?
[18:07:48] separation of concerns, mostly, there's a task: https://phabricator.wikimedia.org/T206046
[18:07:57] _joe_: quotes from https://phabricator.wikimedia.org/T211974
[18:08:20] <_joe_> ack
[18:09:28] mutante: FYI there are jenkins jobs that push documentation to doc.w.o on post-merge, not sure if/what needs to be updated for this ;)
[18:10:46] volans: "Networking Requirementsssh/rsync from contint1001.wikimedia.org (208.80.154.17), HTTP for Varnish caches (text-lb?)"
[18:11:05] volans: ack, the plan is to rsync https://phab.wmfusercontent.org/file/data/l7cf6ytcn4jzomr4azhv/PHID-FILE-dlffqssayirxnk3qt6as/201812_Proposal_for_doc_coverage_publishing.png
[18:11:07] copy/paste lost a space
[18:12:05] _joe_ shall we open a task to get rid of nutcracker? Not sure if we already have one or not
[18:12:27] <_joe_> elukey: we should move things off of it, but we can't get rid of it yet
[18:12:54] sure sure, this is what I meant, I know that there is some reference in mw config
[18:13:12] <_joe_> more than some, but sorry, in a meeting
[18:13:21] I'll open a task with jijiki
[18:13:28] so the performance team will love us
[18:13:29] greg-g, mutante: ack
[18:13:39] :)
[18:26:16] apergos: it should be exempted™, otherwise feel free to blame me, and we'll unkline if it gets k-lined
[18:26:23] such confidence
[18:26:28] heh
[18:26:34] guess we'll see
[18:26:43] 13:22 note that it seems we have one global configuration for everything wikimedia, so it will also work for other potential wikimedia bots and users, I hope that's fine (it likely is)
[18:27:01] 13:22 otherwise this will be tricky
[18:27:33] I pretty much don't care
[18:27:51] care to invite it in then?
[18:28:25] that just unblocked the gerrit change, right
[18:28:31] yep
[18:28:47] so, just pinging on https://gerrit.wikimedia.org/r/#/c/labs/tools/wikibugs2/+/479512/ with the good news
[20:25:07] serviceops, Wikibugs, Patch-For-Review: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (valhallasw) Testing wikibugs.
[20:25:13] mutante: ^
[20:25:21] valhallasw`cloud: :) thank you!
[20:25:30] you're welcome!
[20:25:50] serviceops, Operations, User-ArielGlenn, User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (valhallasw)
[20:25:52] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (valhallasw) Open→Resolved a:valhallasw
[20:25:56] serviceops, Operations, User-ArielGlenn, User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (Dzahn)
[20:25:58] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (Dzahn) Resolved→Open a:valhallasw→None
[20:26:10] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (Dzahn) a:valhallasw
[20:26:13] ^ test, works
[20:26:15] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (jijiki) p:Triage→Normal
[20:27:04] serviceops, Operations, User-ArielGlenn, User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (valhallasw)
[20:27:06] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (valhallasw) Open→Resolved
[20:27:07] oh we need the nice colours we have on -operations
[20:27:16] edit conflicts and phabricator are still not a happy marriage
[20:27:40] channel mode +c is set, which blocks colors
[20:27:54] valhallasw`cloud: thank you very much btw
[20:28:14] we will deal with our channel op
[20:29:04] * mobrovac hands an imaginary brush to jijiki
[20:29:23] lol
[20:30:06] mobrovac: https://www.youtube.com/watch?v=KjACWDPQ8nw#t=37m52s ? :-p
[20:30:41] hahahaha
[20:30:46] impressive!
[20:30:54] wow
[20:31:59] and then they say basic income should not be a thing? just try to imagine how creative people would get if they had all the time in the world
[20:32:06] * mobrovac will shut up now :P
[20:34:59] but we have a lot of issues as well because some people have all the time in the world :p
[20:34:59] mobrovac: who knows, maybe they'll even write an encyclopedia ;-)
[20:35:31] hopefully :)
[20:36:32] jijiki: well, boredom is a dangerous thing, both for the person experiencing it and everybody around them :P
[20:51:42] need a couple volunteers to also have chanops in here, I'd prefer not to do it
[21:07:21] serviceops, Scap, Release-Engineering-Team (Watching / External): Allow scap sync to deploy gradually - https://phabricator.wikimedia.org/T212147 (greg)
[21:34:05] serviceops, Wikibugs: add wikibugs to #wikimedia-serviceops - https://phabricator.wikimedia.org/T211912 (Dzahn)
[21:34:49] serviceops, Operations, User-ArielGlenn, User-jijiki: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (Dzahn)
[21:41:21] <_joe_> apergos: I'd expect all members of the team to have ops privileges, to use only if needed
[22:23:14] so because it keeps coming up.. where do we put interface::add_ip6_mapped? site, role, profile, module?
[22:23:39] afair first we said nodes only, but then we changed it to profiles being ok.. especially where we have groups of nodes
[22:23:53] and style check says it's bad in both.. checks again
[22:28:50] serviceops, Operations: Add `supervised` option to redis configuration - https://phabricator.wikimedia.org/T212102 (colewhite) p:Triage→Normal