[01:38:02] serviceops, Operations: Canaries canaries canaries - https://phabricator.wikimedia.org/T210143 (Dzahn) [07:36:36] serviceops, Scap, Release-Engineering-Team (Watching / External): Allow scap sync to deploy gradually - https://phabricator.wikimedia.org/T212147 (Joe) From our meeting yesterday: - To do progressive deploys of the train, we just have to modify `sync-wikiversion` and keep track of its completion (by... [08:32:38] serviceops, Core Platform Team, MediaWiki-Cache, Operations, Performance-Team: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (Joe) [10:35:12] serviceops, Operations, TechCom, Wikidata, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (mobrovac) [10:40:01] <_joe_> mobrovac: thanks for tagging that ticket for us [10:40:17] yw [10:42:53] <_joe_> https://phabricator.wikimedia.org/F27624864 I love how the php layer is first called "mediawiki php" and then "wikidata api" [10:43:08] <_joe_> this picture is problematic architecturally [10:43:37] you don't like that the guy above the browser is happy, admit it! [10:44:26] <_joe_> yeah [10:44:47] <_joe_> so frankly, I would've had the js code running in the browser to generate the termbox calling the api [10:44:55] <_joe_> if I had to write this thing [10:45:24] <_joe_> but even for non-js users, one could think of an html version not composited into the main page? [10:46:55] i think this is a larger discussion about presentation layer composition [10:47:06] but definitely using the client as the fallback for the server is backwards [10:48:02] oh oh, mediawiki php and wikidata api are actually the same entity [10:48:20] for the same project, i mean [10:49:36] these two diagrams that are there are different [10:50:03] why can't mwapi give all the info to the service right away as depicted in the arch diagram?
[10:51:16] <_joe_> you mean sending that info to the service, yes [10:52:18] <_joe_> that would make it a rendering service [10:53:51] <_joe_> so I see one argument for doing server-side rendering, and that is caching [10:54:35] <_joe_> and compatibility with non-js-browsers, although I don't know what level of support to those we're giving [10:56:39] having a working site for non-js is a hard requirement [10:58:52] <_joe_> ok, define working :) [11:03:39] haha [11:05:14] <_joe_> so if a wikidata page is not considered "working" without the termbox, then yes, we need server-side rendering [11:10:46] iirc, the full page has to be displayed regardless of whether js is enabled/available or not [11:11:17] so from that standpoint, i'd say we need to provide a server-side fallback at least [11:11:48] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Joe) Looking at the attached diagrams, it seems that the flow of a request is as follows: - page gets requested to MediaWiki - MW sends a request to the... [11:15:10] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Addshore) [11:15:26] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Joe) Also: it is stated in https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service that "In case of no configured server-side rendering service or... [11:36:12] hey we have colours [11:36:17] yey! 
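The fallback requirement discussed above (the full page must render even without JS, and the SSR service may be absent or slow) can be sketched roughly as below. The endpoint, query parameters, and timeout are all made up for illustration; they are not the actual Termbox SSR API, which is still being specified in T212189.

```python
import urllib.error
import urllib.request

# Hypothetical endpoint and parameters; `.invalid` is a reserved TLD,
# standing in for whatever internal hostname the real service gets.
SSR_URL = "http://ssr-termbox.invalid/termbox?entity={eid}&lang={lang}"

def fetch_termbox_html(entity_id, lang, timeout=0.25):
    """Ask the SSR service for pre-rendered termbox HTML; return None on
    any failure so the page still ships and the termbox hydrates in the
    browser. The server falls back to the client, not the other way around.
    """
    url = SSR_URL.format(eid=entity_id, lang=lang)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return None
```

With this shape, a missing or misbehaving service degrades to the client-side render instead of breaking the page, which is the property the thread argues for.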
[11:39:41] haha [11:49:59] * _joe_ installs the pizzascript [12:26:22] Lol [12:42:48] i am discussing the staging goal with effie in private [12:43:16] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10mobrovac) As a further optimisation of both the architecture as well as parsing and load times, MW/Wikibase could populate a (hidden?) tag in the DOM wit... [12:43:47] so is it really sadly the case that (mediawiki/scap 2) deploys do not simply reflect a single git revision in git repo(s)? [12:45:32] for train deploys, iirc they flatten out the repo and the submodules, but for full scap deploys (say swat), afaik you deploy the git repo as-is [12:46:03] i can see difficulties around security patches etc [12:46:14] that said, i don't know if scap does some flattening magic behind the scenes when you do a full scap [12:46:25] i have no idea how scap 2 and 3 work [12:46:27] never used them [12:46:39] but wait, why scap2? [12:46:44] <_joe_> lol [12:46:47] mw deploys use scap3, don't they? [12:46:51] <_joe_> nope [12:47:07] nope [12:47:08] i'm just wondering why keeping track of deployed mediawiki revision state in scap is apparently so hard [12:47:12] sooo the migration to scap3 has never happened? [12:47:16] no [12:47:21] til [12:47:25] ha [12:47:29] <_joe_> mark: it's hard because of the way we're used to do deployments [12:47:49] <_joe_> we want to rsync single files-directories [12:48:14] i would say that's fine if those single files are the only changes in that git revision :P [12:48:20] <_joe_> and also to deploy N different repositories, plus security patches, plus privatesettings coming from puppet, plus magic! [12:48:51] <_joe_> it's clear we have to rethink how our code is organized [12:48:56] <_joe_> and how it's deployed [12:49:01] so why isn't that all merged into a (private) branch in a git repo first? 
[12:49:06] <_joe_> I mean organized in production [12:49:13] i can see why sometimes some things can't be in the public git repo [12:49:17] for a while [12:49:33] but then at least can we have some private git branch somewhere that has the exact revision of what's deployed? [12:49:57] i suppose there are reasons why trebuchet failed [12:50:00] and I know i'm being naive here [12:50:52] <_joe_> no the problem is that if we allow people to sync a single file manually (and in some cases right now that's needed) you can't really easily keep track of what's where [12:51:00] <_joe_> we do not use git to deploy the code [12:51:19] <_joe_> we actively strip git info out of most repositories we bundle up in mediawiki [12:51:32] <_joe_> well trebuchet had at least one fatal flaw [12:51:41] having the private repo that reflects the state would work for trains, but not for swats [12:52:15] <_joe_> mobrovac: hence my idea that we should split the repos, and deploy each separately [12:52:38] <_joe_> the php code for core+extensions, the "docroot" part of wmf-config [12:52:43] <_joe_> err [12:52:47] <_joe_> mediawiki-config [12:52:56] <_joe_> and the "wmf-config" directory there [12:53:05] volans: this is your cue to say etcd [12:53:10] <_joe_> also, things like dbconfig will remove the need for many deploys [12:53:16] <_joe_> jijiki: I did :P [12:53:20] ehe [12:53:20] jijiki: docker [12:53:45] shared hosting! 
[12:53:45] paravoid: this work will lead us to dockerise it :p [12:53:54] lol [12:53:57] ok lets ditch it all, move to ftp [12:54:24] :D [12:55:14] <_joe_> tsk [12:55:23] <_joe_> if you want to go old school [12:55:30] <_joe_> we create a gopher server [12:55:36] and a time machine [12:55:38] ack [12:55:43] <_joe_> I must have my wordpress2gopher server somewhere [12:55:54] let's just move to geocities [12:56:00] <_joe_> it's in obfuscated perl [12:56:25] <_joe_> written by me when I still was in astrophysics [12:56:34] <_joe_> I'm sure we can repurpose it easily [12:56:47] I guess it's time for me to say NFS and run away :-P [12:56:58] <_joe_> I should've added, it's involuntarily obfuscated, like most perl [12:57:15] i don't know what's special about a swat deploy I guess [12:57:32] <_joe_> mark: I don't think there is anything special [12:57:38] <_joe_> things could be changed [12:57:42] patch by patch is deployed successively [12:58:08] which means that in a span of one hour you have X versions of the code in production [12:58:11] <_joe_> it just needs an idea of where do we want to get, a plan, and people working on it [12:58:17] X being the number of patches swatted [12:58:37] _joe_: i like how you made it sound simple with "just" [12:58:39] <_joe_> and we rollback each patch individually if needed, too [12:59:03] but i agree that it's unnecessarily complicated [12:59:06] <_joe_> and yes, SWAT is something that would need a saner approach in general IMHO [13:00:04] so, question: if we had a veritable staging env, could we ditch this patch-by-patch approach for swat?
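The state-tracking problem raised here (neither scap version records what revision is live, and single-file syncs make "what's where" unanswerable) could be addressed with something as small as an append-only journal. This is purely illustrative; scap does not do this, and the journal path and entry fields are invented:

```python
import json
import time
from pathlib import Path

# Arbitrary example location; per the discussion, scap itself keeps no
# such state today.
JOURNAL = Path("/tmp/deploy-journal.jsonl")

def record_sync(paths, git_revision, deployer, journal=JOURNAL):
    """Append one entry per sync (full scap or single-file), so the live
    revision of every synced path can be reconstructed later."""
    entry = {
        "ts": time.time(),
        "paths": sorted(paths),
        "revision": git_revision,
        "deployer": deployer,
    }
    with journal.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

def current_state(journal=JOURNAL):
    """Replay the journal and return the last revision synced per path."""
    state = {}
    if journal.exists():
        for line in journal.read_text().splitlines():
            entry = json.loads(line)
            for path in entry["paths"]:
                state[path] = entry["revision"]
    return state
```

Replaying the journal answers both questions from the thread: which revision a SWAT window left in production, and which individual files were synced on top of it.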
[13:00:10] <_joe_> yes [13:00:18] <_joe_> I was about to say that [13:00:18] if so, things get much simpler [13:00:26] <_joe_> but in a more passive-aggressive way [13:01:18] <_joe_> mobrovac: but even now, we could deploy the patches just to the mwdebug servers [13:01:23] <_joe_> have people verify [13:01:25] i don't see why 98% of things couldn't be a straight reflection of a set of git repos together [13:01:26] oh i thought my approach was passive-aggressive since i posed a question instead of bluntly stating it :P [13:01:43] <_joe_> and then deploy the whole shebang to the cluster [13:01:52] successive patches? this isn't the 90s :P [13:01:56] <_joe_> mark: no one said they couldn't [13:02:08] <_joe_> reality is a bit different right now [13:02:19] but I do realize I'm just being naive and theorizing about a problem space I've ignored for over a decade :P [13:02:20] <_joe_> and changing the state of things requires some work [13:02:48] mark: ignoring it for a decade still doesn't put you in the 90's which is where we're at :D [13:03:03] <_joe_> I don't think it's a fair assessment [13:03:05] yep back then I thought people were going to fix it ;) [13:03:24] <_joe_> I've seen way worse deployment methods everywhere else I worked [13:03:33] there are quite some complicating factors of course [13:03:36] large files etc [13:03:44] security patches [13:03:52] generated language files?
[13:04:06] <_joe_> at JOB~1 I had to weed through an NFS with 800 untracked files [13:04:22] <_joe_> and move them from CVS to git [13:04:31] <_joe_> yes, that's not a typo [13:04:44] <_joe_> mark: that's what I call being in the 90s :P [13:05:02] <_joe_> mark: yes, but all that generation of files could and should be done by CI [13:05:36] <_joe_> and artifacts should be either committed to git or offered for download with any kind of versioning [13:06:29] <_joe_> so the real complexity as I see it is twofold: technical (as there are a thousand things that explicitly or subtly depend on the way we do things) and cultural (however we change things, a decade-old process will need to change) [13:06:47] yeah [13:07:05] <_joe_> finally, I'd like whatever we decide to change to make our lives easier once we try to move mediawiki to kubernetes [13:07:08] and this should all fit/prepare for a kubernetes world as well [13:07:10] heh [13:07:22] <_joe_> it's actually the main reason why I thought we should do mediawiki last :P [13:07:34] well given all things we have mentioned [13:07:36] <_joe_> as opposed to do it as second [13:07:46] it would be impossible not to be the last one [13:10:19] so what are the main differences between scap 2 and 3 [13:10:26] and what are the blockers migrating mediawiki to scap 3? [13:10:37] <_joe_> they're two completely different pieces of software [13:10:48] <_joe_> one is a glorified rsync [13:11:11] <_joe_> the other uses git and ssh and symlinks to the deployment directories [13:11:25] <_joe_> neither stores state, they have that in common [13:12:10] scap 2 is glorified rsync [13:12:13] <_joe_> in the meantime, I think I found a way to make scap sync-file not be destructive for php7 [13:12:17] scap 2 is what bryan davis wrote? [13:12:20] <_joe_> yes [13:12:32] <_joe_> it's a rewrite in python of the old bash magic IIRC [13:12:39] yes i remember [13:13:11] <_joe_> when I say "glorified rsync" I'm not downplaying it.
Rsync is awesome, and the way we use it in scap is awesomer [13:13:49] landlord is here, bbl [13:48:31] 10serviceops, 10Operations, 10Wikibase-Containers, 10Wikidata, and 2 others: Create a wmf production ready nginx image - https://phabricator.wikimedia.org/T209292 (10hashar) #serviceops should be able to help / review. [13:53:21] im late to the party but for me it's difficult to change things without changing the process somehow. If we want mediawiki to run on k8s that means that either the image only has Apache and PHP, with the actual code coming from outside (so we treat code as data), or the code is included in the docker image. [13:54:14] having lived in an immutable world i can see the benefits of having the code inside the image, especially easy rollbacks, but that would impact deployment processes [14:33:03] 10serviceops, 10Operations, 10vm-requests: eqiad: 1-2 VM requests for docker-registry-beta.wikimedia.org - https://phabricator.wikimedia.org/T212212 (10fselles) [14:34:15] ^^ i think i can do this myself _joe_ akosiaris jijiki but help/tips/general guidance is appreciated [14:34:41] <_joe_> fsero: do you want eqiad-only? [14:34:54] <_joe_> in that case I suggest getting row redundancy [14:34:54] for starters yes [14:35:12] i think moving from one instance to HA in the same DC is some improvement [14:35:25] <_joe_> yeah I was trying to understand if that was 2 vms per DC :) [14:36:07] <_joe_> so, https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM [14:36:14] I don't see a reason to cap the request on a single DC.
We can create all 4 VMs now and work on adding the necessary bits for codfw later on [14:36:20] <_joe_> but you need to put each VM in a different row [14:36:24] <_joe_> that too [14:36:43] <_joe_> oh makevm, that's new :D [14:36:58] yup, daniel's script [14:37:15] it's meant to become a spicerack cookbook next Q [14:39:20] akosiaris: i wanted your input so im happy to create 4, 2 per DC [14:39:42] yeah that sounds fine [14:40:23] 10serviceops, 10Operations, 10vm-requests: eqiad: 1-2 VM requests for docker-registry-beta.wikimedia.org - https://phabricator.wikimedia.org/T212212 (10akosiaris) Add codfw in the mix as well, no reason to cap this to eqiad. Everything else LGTM [14:42:23] fsero: sure [14:42:28] <_joe_> fsero: so I was thinking, regarding this. The new registry can support clair, it would be nice to use clair reports to launch rebuilding of our images when needed, at least for the base images [14:42:41] are we going to use the current redis::misc cluster? [14:42:45] <_joe_> so I will work on docker-pkg to make it work with that [14:43:05] <_joe_> jijiki: we need a page for that cluster, on wikitech, detailing its users [14:43:06] so where do we put clair DB? [14:43:25] is it a good idea to set up a local postgres for that service? [14:43:29] _joe_: I will start one yes [14:43:40] in my mind that would be another VM just for clair API with local db [14:43:43] <_joe_> fsero: unless the dataset is huge [14:44:05] <_joe_> I would say one of our mixed dbs is the best choice, but let's ask the dbas [14:44:25] <_joe_> that is if it supports mysql [14:44:41] i think it only supports postgresql [14:44:59] <_joe_> local db it is! [14:45:33] https://github.com/coreos/clair/blob/master/Documentation/running-clair.md [14:45:45] <_joe_> hey the docs say deploy it with kubernetes [14:45:47] <_joe_> !!1!
[14:46:11] sure, we can deploy it on kube, that would mean having a db on kube [14:46:25] <_joe_> yeah, I wasn't suggesting that [14:47:06] <_joe_> what is the "notification endpoint" in that page? clair can be configured to send notifications to $any_service ? [14:47:56] yep [14:48:07] https://www.irccloud.com/pastebin/cTt92EGc/ [14:48:13] this is part of the config [14:48:28] i guess you are thinking on we can notify debmonitor [14:48:31] nope? [14:51:24] yeah [14:52:37] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10daniel) @mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the s... [15:15:54] <_joe_> fsero: or, for now, any kind of 'service' that can build a report and trigger an application rebuild, yes [15:15:58] <_joe_> err, image [15:19:52] _joe_: I will update this doc https://wikitech.wikimedia.org/wiki/Redis [15:20:06] makes sense to document the role there [15:30:27] <_joe_> sure [16:20:51] i've posted our PHP7 goal [16:21:07] i will wait for confirmation on staging and deployment pipeline [16:29:51] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10EvanProdromou) a:03EvanProdromou [16:31:05] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10daniel) Calls to getMainObjectStash: https://codesearch.wmflabs.org/search/?q=getMainObjectStash&i=nope... 
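The two pieces being discussed here, a local postgres and a notification endpoint, both live in Clair v2's config.yaml, roughly as below. The hostnames, credentials, and the debmonitor-style receiver are placeholders, and the exact keys should be double-checked against the running-clair documentation linked above:

```yaml
clair:
  database:
    type: pgsql
    options:
      # placeholder credentials; a local postgres on the same VM
      source: postgresql://clair:secret@localhost:5432/clair?sslmode=disable
  notifier:
    attempts: 3
    renotifyinterval: 2h
    http:
      # any HTTP receiver works here, e.g. a bridge that feeds debmonitor
      # or triggers docker-pkg rebuilds of affected base images
      endpoint: https://clair-notify.example.invalid/hook
```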
[16:53:31] with moritz's help, I have packaged librsvg 2.40.20 for stretch [16:54:08] what is preferable: to have a go at installing those manually on deployment-prep and test it [16:54:30] or make changes in the puppet thumbor role [16:54:58] to use those packages if role is thumbor and debian release is stretch? [16:55:24] (which will go to waste if 2.40.20 has the same issues as .18 and .16) [16:59:57] 10serviceops, 10Operations, 10Thumbor, 10Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (10jijiki) [17:35:57] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10mobrovac) >>! In T212189#4831314, @daniel wrote: > @mobrovac Please note that the term box is shown based on user preferences (languages spoken), the ini... [17:37:14] 10serviceops, 10Operations, 10vm-requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): eqiad: 1 VM request for doc.wikimedia.org - https://phabricator.wikimedia.org/T211974 (10Dzahn) @Hashar Here you go: - doc1001.eqiad.wmnet - stretch - php 7.2, not just 7.0 - php-fpm, not mod_php anymore... [17:38:01] i am also going to post the staging goal [17:38:04] and the other goal I think [17:38:05] so we have it up there [17:38:08] we can still edit the wiki [17:38:13] but deadline is tomorrow end of day [17:41:05] I made the new ganeti VM and puppet role/profile for hashar to move doc.wikimedia.org to. (doc1001/ 'doc') I did php-fpm, not mod_php and php 7.2 not just 7.0, following the appservers and phab.
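The "use those packages if role is thumbor and debian release is stretch" option could look roughly like the fragment below inside the thumbor role. This is a sketch only: the package names and version string are assumptions, not the actual output of the librsvg 2.40.20 packaging work; only the codename fact is standard Facter.

```puppet
# Sketch: pin the backported librsvg only on stretch Thumbor hosts,
# so jessie hosts keep the distro version. Package set and version
# string are hypothetical.
if $facts['os']['distro']['codename'] == 'stretch' {
    package { ['librsvg2-2', 'librsvg2-bin']:
        ensure => '2.40.20-1~wmf1',
    }
}
```

Doing it in the role rather than by hand means deployment-prep testing and the eventual production rollout share one code path, at the cost of a throwaway patch if 2.40.20 turns out to have the same issues as .18 and .16.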
[17:42:18] he will take over now with patches to push static files from contint* [18:48:50] 10serviceops, 10Operations, 10vm-requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): eqiad: 1 VM request for doc.wikimedia.org - https://phabricator.wikimedia.org/T211974 (10Dzahn) - added "doc" as an official cluster prefix https://wikitech.wikimedia.org/w/index.php?title=Infrastructure_n... [19:03:24] CI webserver on contint also not using apache module anymore now [19:05:13] adding support for php7 to unblock contint1002 on stretch [19:17:36] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Imarlier) [20:11:17] 10serviceops, 10Operations, 10TechCom, 10Wikidata, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Joe) >>! In T212189#4831959, @mobrovac wrote: >>>! In T212189#4831314, @daniel wrote: >> @mobrovac Please note that the term box is shown based on user p... [20:19:43] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10aaron) [20:22:39] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10aaron) We need persistence and replication. The plan is to use the same store as session for th... [20:38:27] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Joe) >>! In T212129#4832390, @aaron wrote: > We need persistence and replication. The plan is t... 
[20:38:45] <_joe_> oh sigh [20:42:00] <_joe_> the discussion on the MainStash is really worrisome [20:43:18] 10serviceops, 10Release Pipeline, 10Release-Engineering-Team (Kanban): Allow access to blubberoid.discovery.wmnet:8748 - https://phabricator.wikimedia.org/T212251 (10dduvall) [20:48:16] 10serviceops, 10Release Pipeline, 10Release-Engineering-Team (Kanban): Allow access to blubberoid.discovery.wmnet:8748 - https://phabricator.wikimedia.org/T212251 (10dduvall) [21:07:53] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Eevans) >>! In T212129#4832455, @Joe wrote: > [ ... ] > This needs a thorough discussion ASAP.... [21:10:50] 10serviceops, 10Core Platform Team, 10MediaWiki-Cache, 10Operations, 10Performance-Team (Radar): Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Joe) Well that discussion was limited to Session storage, and I stand by the idea that service,...