[12:18:22] * andre__ feels like he's spending half of the day fixing broken links
[12:20:04] broken links?
[12:21:57] twentyafterfour, I refer to links to fab.wmflabs
[12:22:06] "intended 404s" I should say, I guess
[12:22:32] and I'm on mediawiki.org's Phabricator pages, so external links
[12:22:38] sorry if my wording was confusing :)
[12:23:08] caught my attention because I'm currently building the bug link redirecting code
[12:23:15] actually I just finished it
[12:24:29] oh awesome
[12:26:20] twentyafterfour: I admit it's still a bit unclear to me where we are with regard to OAuth on the production instance and if we could enable it tomorrow so folks could log in. Sorry :-/
[12:26:41] are we safe to enable the local implementation (and replace it by the upstreamed one, once upstream has accepted it)?
[12:27:46] andre__: that's totally not my call. I mean, yes I think it's safe and there is really very little reason to wait on upstream, however, chase has been a stickler about upstream reviews previously so I have my doubts
[12:28:12] twentyafterfour: heh, alright. That's helpful to know
[12:29:55] the only issue really is if we merge it locally then we have to deal with a conflict when the identical code lands upstream (because it doesn't get tracked as the same change when it goes through differential, and git thinks it's a conflicting change)
[12:31:16] The more we merge locally the more (potentially) difficult it is the next time we pull from upstream.
[12:32:45] hi, OAuth is already deployed on Wikimedia servers (legalpad) so we are not creating a new precedent...
[12:37:51] twentyafterfour, in any case, can you ask Evan to review your patches? I don't think he is aware that this upstreaming might be blocking our migration right now, and if he knew...
[12:40:19] andre__, do you know the current status of the migration? Are all the tasks imported? Will the workboards be populated?
[12:40:36] qgil, all tasks imported; workboards not automatically populated
[12:40:39] doing so right now
[12:40:46] ouch
[12:40:50] qgil: He's already reviewed the code that really matters, the only thing remaining is reviewing the changes I made to implement his suggestions in the last review... however, I don't think it's fair to put pressure on upstream to merge our change in order to save us the trouble of a merge conflict later :-/
[12:41:09] qgil, plus assignees... well, without accounts in production they are also not there, of course.
[12:41:16] I guess that's something to communicate.
[12:41:25] but yeah, I'm cleaning up currently on prod
[12:42:17] twentyafterfour, ok, fair enough, but then I personally think we need to swallow the pill and deploy OAuth as it is. Waiting for upstream while we have an LDAP-only instance doesn't make much sense
[12:42:20] it'd be helpful if I had an account on production ;)
[12:42:38] twentyafterfour, oh, would you like to log in via LDAP?
[12:42:47] twentyafterfour, got your account info handy?
[12:42:48] Also, let's not forget that we wanted to use this margin of time before Bugzilla migration to test OAuth better
[12:42:51] I tried, it says ldap is disabled
[12:43:03] twentyafterfour: I know, but I could enable it for a minute for you
[12:43:21] we just don't want "normal" people to log in yet, that's why it's off
[12:43:22] andre__: yes I have my account info handy.. it's username 20after4 in labs ldap
[12:44:44] twentyafterfour: alright, please give it a shot (and be quick)
[12:45:04] qgil: I strongly prefer to just merge the oauth patch now and not wait for upstream. But I have no power to do so.
[12:46:12] andre__: done
[12:46:31] alright, disabled LDAP auth again
[12:46:43] s/auth/registration/
[12:47:41] qgil: if chase isn't comfortable with merging the patch then I guess I'll bug epriestley to review the patch. I did notify him with an @mention when submitting the updated patch so he's at least aware of it
[12:49:00] ok
[13:43:10] chasemp: I can haz administrator access to phab? mainly wanting to look at how you have the custom fields configured but /config/ is 403 forbidden :(
[13:45:48] twentyafterfour, I have added you
[13:45:57] please try again
[13:46:34] * andre__ has set up most workboards for projects that he still had data for; some smaller projects with less than 10 open tasks remaining; hope not too disruptive
[14:25:59] :)
[14:48:24] twentyafterfour: all of the config stuff is in puppet
[14:48:27] no mystery
[14:48:37] I mean there shouldn't be extra stuff local to install
[15:33:39] twentyafterfour: hey man if you wanted to look at what it would take to put that cert in place for wmfusercontent.org in puppet
[15:33:52] that would be great
[15:35:35] chasemp: ok
[15:43:58] chasemp: what machine hosts that domain?
[15:44:31] there is a generic https termination cluster that consists of cp1043, cp1044
[15:44:46] uses the role::cache::misc class
[15:44:56] but it's tricky as protoproxy is used for lots of prod stuffs
[15:44:59] so lots to go wrong
[15:49:16] so it should go to the iridium backend?
[15:52:09] yes
[15:56:18] uhhhhh
[15:56:42] I get http://paste.debian.net/120424/ when trying to view the RelEng project page: http://phabricator.wikimedia.org/project/view/22/
[15:56:53] now a 503
[15:57:01] yes it's getting updated
[15:57:07] ah, cool
[15:57:10] good morning!
[15:57:14] wfm
[15:57:21] I get a generic error if I try to go to http://phabricator.wikimedia.org - http instead of https
[15:57:27] andre__: does now, yeah :)
[15:57:29] shall I open a ticket for that in our prod instance?
[15:57:50] we know about it, to be talked about in 2 minutes
[15:57:54] must be because you're logged in, I don't get an error with http-only
[15:57:55] look at bottom right of page
[15:57:56] * qgil just realizes that the sequence of comments is lost i.e. http://phabricator.wikimedia.org/T10
[15:58:13] oh wow, are they out of order
[15:58:13] ok, but now let's meet :)
[15:58:51] I'll be late :/ but no worries :)
[15:59:06] This is an incentive to close the old tasks. Yay for sequential discussions! :)
[16:01:03] looks sequential to me, but maybe not
[16:38:01] migration question from a curious bystander. if phabricator.wikimedia.org already has tasks, does that mean that the bugzilla ids will not line up after import?
[16:59:41] cburroughs: no, they will not line up, but there will be redirects for show_bug.cgi URLs
[17:39:56] andre__: twentyafterfour probably want to hold off on updating tickets there until we are live, I was considering blowing data away to try to do the timestamp conversions
[17:40:04] didn't realize you guys were doing work there already
[17:40:15] anything we can't recreate easily?
[17:40:42] chasemp: I haven't done anything significant
[17:40:51] but I think andre__ has
[17:50:46] just so I'm transparent, this scares me "Downtime of both Bugzilla and RT of 1-3 days" :)
[17:50:57] from a pitchfork and torches perspective
[17:52:15] we can just use sf.net to track bugs for those days!
[17:52:25] (but yes, that scares me too)
[17:52:29] * greg-g kicks legoktm in the shins
[17:53:03] https://phabricator.wikimedia.org/T1 <-- so no chance of getting bug 1 to be that? :(
[17:53:54] greg-g: not sure of alternative?
[17:55:33] https://bugzilla.wikimedia.org/show_bug.cgi?id=1 will go to the right new task at least
[17:56:00] I was originally told it'd be on the order of hours for BZ to be read-only. Via doing: 1) dump BZ, 2) import into Phab (this gets the vast majority of the data), 3) make BZ read-only, 4) dump again (much faster), 5) import new things (much faster), 6) redirects in place, 7) open the flood gates on Phab
[17:56:46] Phab would be readonly until step 7
[17:57:12] (so there aren't conflicts)
[17:57:36] not sure who told you that
[17:57:49] IF we could make sure closed issues in bugzilla stay closed
[17:57:53] we could pre-fetch all closed issues
[17:57:54] twentyafterfour :)
[17:57:57] and cut down the time dramatically
[17:58:00] but
[17:58:14] my understanding is until the moment we turn it off anyone could reopen any issue and post new imfo
[17:58:15] info
[17:58:31] so we have to grab everything in one go and it has to be after taking it offline
[17:59:02] the difficulty is that the state of even the oldest bugzilla bug can change at any time
[17:59:04] so it's more "dump current state"/"import state" as opposed to "dump a replayable script"/"replay the script in new instance"?
[17:59:06] so it all has to be within the window
[17:59:31] what is the difference in those two scenarios?
[17:59:32] not understanding
[18:00:49] the first is a situation that would barf between step 1 and 5, the second is one that would only replay events that happened between step 1 and 3 when doing the import in step 5.
[18:01:55] how do you replay changes since you dumped content without rechecking all tickets?
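Editor's sketch for step 6 (redirects) and the earlier remark that Bugzilla IDs will not line up with task IDs: because the numbers differ, a redirect for show_bug.cgi URLs needs a lookup table rather than arithmetic. The table contents, the `redirect_target` name, and the task numbers below are hypothetical and are not taken from the actual redirect code mentioned in the log.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical bug ID -> task ID table; a real one would be built during import.
BUG_TO_TASK = {1: 2001, 2: 2002}

PHAB_BASE = "https://phabricator.wikimedia.org"


def redirect_target(url, bug_to_task=BUG_TO_TASK):
    """Return the task URL for a show_bug.cgi URL, or None if it cannot be mapped."""
    parsed = urlparse(url)
    if not parsed.path.endswith("/show_bug.cgi"):
        return None
    ids = parse_qs(parsed.query).get("id", [])
    if len(ids) != 1:
        return None
    try:
        bug_id = int(ids[0])
    except ValueError:
        return None
    task_id = bug_to_task.get(bug_id)
    if task_id is None:
        return None
    return f"{PHAB_BASE}/T{task_id}"


print(redirect_target("https://bugzilla.wikimedia.org/show_bug.cgi?id=1"))
```

Unknown or malformed IDs return None so the caller can decide whether to serve a 404 or a search page.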
[18:02:33] that's my point :) we're a "full state as one go" not a "there's a recent changes stream I can suck down"
[18:02:45] one is cp'ing a tarball, one is replaying git merges
[18:03:01] (is my understanding)
[18:03:08] sure I'm asking if you have an idea you are pitching for the second
[18:03:14] because you said you are unhappy with the first or worried
[18:03:41] well, yeah, worried, so if there was a way to extract the data from BZ in a git-merge-list kind of way, then I'd prefer that
[18:03:42] I'm not sure how we could do it in that manner, any way I've thought of is not less time / work than freeze/dump/restore
[18:04:04] it's not about less work, it's about customer expectations :)
[18:04:22] I meant work in the window
[18:04:28] which determines length of the window
[18:04:31] if it's technically not doable, that's one thing
[18:04:43] well...do you have an idea of how to do it?
[18:05:10] no, I meant my voice of concern to be a starting point for brainstorming, sorry if that was unclear
[18:05:26] chasemp: I set a few assignees and the workboards
[18:05:55] greg-g: it's cool, not busting balls either, just unless someone has a way to do the thing you are suggesting
[18:06:04] we are stuck w/ a long window
[18:06:16] legoktm: no, that's not how it works :)
[18:06:25] chasemp, but feel free to blow away again, really not a biggie
[18:06:43] andre__: we should change how it works then xP
[18:07:09] chasemp: no worries, so you don't see a way of doing it? if that's the case, then we just need to be prepared to say *why* it's a 1-3 day full on no task management work can be done situation
[18:07:24] but I think losing bug 1 would be a huge loss :(
[18:07:40] the api doesn't offer a transactional view of history per se, really would be unusual
[18:07:43] show me all changes since x
[18:07:48] but if it did then hell yeah
[18:07:52] chasemp: /me nods
[18:07:52] but w/o that
[18:07:58] I have no idea how that would be feasible
[18:07:59] MediaWiki does it! ;)
[18:08:03] without caching a state of things
[18:08:07] and then rechecking
[18:08:10] yeah
[18:08:16] which is more work than just grabbing a full good state 1 time
[18:08:31] I've also had this convo like 10 times so I'm further down the conclusion road :)
[18:08:41] sorry :(
[18:08:44] np
[18:08:54] now I'm putting in my emails greg said we should use mw instead of phab :D
[18:09:44] chasemp: :P
[18:09:45] in reality, I _think_ we can be back up from soup to nuts in less than 48 hours...I think
[18:09:53] but leaving a full day to cover the "oh no!" case
[18:09:58] yeah, good
[18:10:00] is much better than saying 48 hours
[18:10:10] and then communicating a last minute delay
[18:10:26] all my estimations are thought + padding, which ends up being close-ish
[18:10:38] so, that downtime, I assume Quim or Andre are in charge of communicating it to the community and WMF Product Managers? :)
[18:10:50] likely
[18:11:03] they are going to do it in unison, an octave apart
[18:11:08] is my understanding
[18:11:48] octave? I was into a tritonus
[18:12:41] can you do that while holding hands and staring deeply into each other's eyes like we discussed?
[18:12:45] impressive
[18:13:13] not sure for how long, but yeah
[18:13:54] hey was the conclusion then to force https?
[18:14:03] I know there was chatter but I missed a definitive "yes"
[18:14:55] I'd say it was
[18:15:18] speaking of https, andre__ https://bugzilla.mozilla.org/show_bug.cgi?id=758857#c25 ;)
[18:15:47] I have an idea about how to do the incremental catch up. when I get to a real computer. (bz import)
[18:15:49] chasemp, I mean, if I go to http bugzilla, I end up on https too.
[18:16:08] where is the bz import code anyway?
[18:16:09] greg-g, speaking of, see my posting on ops@? :)
[18:16:47] jeremyb: phabricator/tools ...but it's a bit messy and changing frequently at this point
[18:17:15] is there a good prod puppet role I can apply in labs (or vagrant?) to magically get a phab test instance?
[18:17:27] bbl
[18:17:42] there is a phabricator::labs role, not sure if avail in the normal list, I usually ./localrun it
[18:17:44] but it's there
[18:18:22] Adding to the list in a project is easy -- https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup
[18:18:49] Adding globally takes some magic I don't know
[18:20:12] I meant to get it in there, just haven't had time and no one asked yet who wasn't ops and can do it themselves :)
[18:21:16] andre__: ah, I missed it (deep thread/lots of messages today)
[18:22:16] andre__: that's probably worth a Scrum of Scrums mention (and even, mentioning it to mark to put on the TechOps meeting agenda for monday)
[18:24:59] greg-g: hmm, probably a good idea that I wouldn't have had myself. Now how do I escalate that properly?
[18:25:13] jeremyb: drop me a line on the idea, I can at least tell you if we've tried / explored it
[18:30:11] chasemp: I expect to be on sometime between 1-2 hours from now
[18:30:59] all good, I'm around
[18:34:03] andre__: for scrum of scrums: attend it (or have Quim do so in your absence?), for TechOps meeting, forward that email to mark to add to the agenda
[18:53:38] twentyafterfour: any luck on the ssl cert SNI stuff?
[18:53:45] going to look at it now if you haven't gotten a chance
[18:54:20] I looked at it but I've been watching reedy do a deployment so I couldn't concentrate on that much
[18:54:44] I don't see anywhere that mentions the wmfusercontent domain and I don't see anywhere to put the ssl cert specifically
[18:57:34] k
[18:58:52] is that domain even set up at all or is it brand new just for this purpose?
[19:04:45] new
[19:05:25] and no nginx sni anywhere else
[19:05:32] so it's all open to disaster :)
[19:05:50] not my call man
[19:17:07] the domain itself is RT #8161, for the records
[19:22:10] chasemp, if you look at the last comment at http://phabricator.wikimedia.org/T10 you will see that it is not the most recent. The sequence is broken several times. I haven't checked many more tasks, and many look correct, so I don't know how big this problem is now.
[19:23:56] thanks qgil
[19:45:38] maybe we just dropped removed comments? dunno
[19:57:58] Something that I think is not yet clear and wasn't discussed in the meeting: we will need a test instance back, to try new stuff e.g. all the code review stuff. It can also serve as a school for newbies, to avoid test tasks in the real Phab.
[19:58:35] Is the creation of this test instance in your plans? In Labs or some phabricator-test.wikimedia.org, for better replication?
[19:59:44] It can (or even "should") be empty, and this time users will have to expect that nothing there will stay
[20:00:24] qgil: could wipe it every X days to make that explicit, even ;) (I'm not seriously recommending that, unless it seems like a good idea later on)
[20:00:27] that's pretty easy
[20:00:32] all the puppet stuff is ther
[20:00:33] e
[20:01:04] great, then when?
[20:03:03] before or after phabricator.wikimedia.org is open with oauth? Before RT migration, I guess? (no need to answer with dates, but relative to other milestones)
[20:04:26] greg-g, I think the content can be like https://test.wikipedia.org/ , where it is clearly volatile (although it seems to be surprisingly persistent)
[20:05:12] I guess just like test.wikipedia, it is useful to keep the content through updates to check that no update wipes out everything unintentionally :)
[20:05:22] qgil: test/test2.wikipedias aren't volatile in any "the DB-died" sense of the word, only "your stuff might get edited by someone else and you better be ok with that 'cuz this is a wild west"
[20:05:53] that would work conceptually for phabricator-test as well
[20:05:59] I kind of see it as...I'll have a test instance for migration
[20:06:08] anyone is free to stand one up for other things?
[20:06:18] whenever they need
[20:06:55] chasemp, will users be able to play with the one you use for testing the migration, knowing that it sometimes might be blocked, down, empty?
[20:07:21] preferably not as I won't know what changes were what, makes it hard to test automated stuffs
[20:07:38] instances are cheap tho no?
[20:08:04] they are if you can set a third on the fly, otherwise I'll need to find a volunteer (or twentyafterfour )
[20:08:35] I think jeremyb was going to set one up for fun
[20:08:39] and poking
[20:08:58] if jeremyb can use the same puppet rules that you are using, then great
[20:09:32] honestly, we need one that is supported and for people to play with repos/diffs/etc
[20:09:46] not jeremyb's that he's using as a test for migration of bz
[20:09:46] bottom line: you don't worry about this third instance, I will find someone
[20:10:48] chasemp, however, would you set it up as phabricator-test.wikimedia.org or Labs (or Labs now, and then when we have time phabricator-test.w.o)?
[20:10:51] i can't find another ticket w/ messed up history besides T10
[20:10:53] this is making me nuts
[20:11:11] is there another example? need a common thread, Idk if labeled dates are wrong or what
[20:11:12] let me see...
[20:12:09] we could do phabricator-test.w.o
[20:12:16] I don't know what other similar stuff does
[20:12:21] (To continue my thought/that point: Just like when Robla and Erik said today that this shows that we need more eyes on this stuff pre-migration, we *need* an instance that is there for testing, like what fab.wmflabs.org was (but with an explicit "we won't migrate data, mmmkay?" policy))
[20:12:23] probably a better greg-g question
[20:13:01] agreed and agreed, but ppl critiquing migration "outcome" / "look"
[20:13:05] on the same instance ppl can change
[20:13:05] test.phabricator.w.o probably isn't a good idea, eh?
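Editor's sketch for the search for other tasks with "messed up history": if each imported task's comments can be listed in display order with their timestamps, a scan for timestamps that go backwards would surface every affected task at once. The input shape (task ID mapped to a list of datetimes) and the function names are assumptions for illustration; how to fetch that data from Phabricator is not shown.

```python
from datetime import datetime


def out_of_order_positions(comment_times):
    """Return indexes where a comment is older than the one displayed before it."""
    return [
        i
        for i in range(1, len(comment_times))
        if comment_times[i] < comment_times[i - 1]
    ]


def tasks_with_jumps(tasks):
    """Map task ID -> out-of-order positions, only for tasks that have any."""
    report = {}
    for task_id, times in tasks.items():
        positions = out_of_order_positions(times)
        if positions:
            report[task_id] = positions
    return report


# Hypothetical sample data: the second task has one comment displayed out of sequence.
sample = {
    "T1": [datetime(2014, 5, 1), datetime(2014, 5, 2), datetime(2014, 5, 3)],
    "T2": [datetime(2014, 5, 1), datetime(2014, 5, 3), datetime(2014, 5, 2)],
}
print(tasks_with_jumps(sample))
```

Comparing the affected tasks (length of discussion, removed comments, date ranges) might then point at the common thread.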
[20:13:07] is not really great
[20:13:18] as ppl will complain about things that are just things other ppl have changed
[20:13:21] word, so remove admin abilities?
[20:13:35] yes but then it can't also be quim's idea :)
[20:13:42] so I was going to stand up a "this is what it will look like" instance
[20:13:43] sure
[20:13:49] but I think he wants another that is play time
[20:13:54] oh
[20:13:57] I missed that
[20:13:57] which I have no opinion on really
[20:14:00] sounds good
[20:14:03] but where to put it
[20:14:41] https://phabricator.wikimedia.org/T124 also has jumps
[20:14:51] the instance to solicit feedback for UI I will take care of for sure
[20:14:52] it could be that you just need long discussions
[20:15:01] hmmm, maybe
[20:15:16] thank you for looking
[20:15:30] What is the problem with phabricator-test.wikimedia.org to test new stuff, learn, and play with?
[20:16:36] I was asking for a place people can: interact as they would with a real instance (add/edit/comment on tasks, add/edit/review Diffs, etc etc) but not necessarily have admin rights to change the look/feel from what is at phabricator.w.o
[20:16:52] greg-g, sure, that works for me
[20:17:08] people would test with the permissions they would usually have
[20:17:23] (also good for testing upgrades)
[20:17:54] word
[20:17:58] GO TEAM!
[20:18:06] :)
[20:18:11] e.g. an average user would be an average user, but someone like Chad, Antoine or Roan could be admins or whatever role with special permissions when they need to figure out how code review will work
[20:19:33] so let's say it's the same instance, but doesn't get turned over to the wild
[20:19:36] until after bugzilla
[20:19:42] and before that it's "come see how it will look post migration"
[20:21:39] chasemp, this means that people will not have anything to play with before the Bugzilla migration, and this might mean that we will have the pressure and the garbage in phabricator.wikimedia.org
[20:22:05] then we can not say that :) either way to me
[20:23:02] It is in our interest to have a test instance to fill the void left by the decommissioned Labs instance. The more users we have with a clue about Phabricator, the less trouble we will have right after the Bugzilla migration
[20:23:25] if you say that instances are cheap and easy to create, I'd like to offer one to these new users sooner than later
[20:25:03] maybe we can have fab.wmflabs.org back, and then use the one for testing the migration, after the migration? Would that be the easiest compromise?
[20:26:03] not sure what this means ' and then use the one for testing the migration, after the migration'
[20:26:04] https://phabricator.wikimedia.org/T149 also has jumps, I'm not having trouble finding them, apparently
[20:26:06] but overall sounds good
[20:26:21] looking at the sorting issue now
[20:26:25] 1. bring fab back
[20:26:39] 2. you get your instance for testing the migration and only you use it
[20:26:48] (others can watch, not touch)
[20:26:56] 3. Bugzilla migration
[20:27:21] 4. Your instance becomes phabricator-test for everybody with managed permissions etc
[20:27:31] 5. we unplug fab for good
[20:29:50] sure, assuming my instance is in labs as well and we use the puppet stuff to set up the new fab
[20:29:54] I think is good
[21:43:40] chasemp: hey. sorry my schedule is crazy. typing something quick and then running again
[21:46:05] chasemp: so look at https://bugzilla.wikimedia.org/buglist.cgi?chfieldfrom=24h&columnlist=bug_status%2Cchangeddate&
[21:46:23] basic idea is agnostic to where you're importing *to* (doesn't have to be phab)
[21:46:40] do a complete, consistent dump. leave BZ read only
[21:46:52] import dump to new instance
[21:47:45] set readonly and dump again and run query like I just linked to. for each bug in the results for that URL, do a diff for that one bug between old dump and new dump
[21:47:57] catch up the new instance with whatever it had missed
[21:48:11] maybe there's something i've not thought of though
[21:48:13] also...
[21:48:42] I'm concerned that we wouldn't have a chance for people to evaluate problems with the import process
[21:48:56] and then you couldn't scrap it and rerun the import
[21:49:10] e.g. this fab -> phab
[21:49:18] messed up comment order
[21:50:21] so, i think we need to test the script by importing to a throwaway instance and having people like andre__ and carmela (mzmcbride), nemo, etc. dig up where the problems are
[21:50:50] and then rerun import to prod, make BZ read-only, rerun import again in catch up mode
[21:51:10] also, wtf @ phab can't be read-only for maint windows???!?!??????
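Editor's sketch of the catch-up idea described at 21:46-21:47: given the old dump, the new dump, and the list of bug IDs the "changed since" query returns, compute per-bug what the new instance is missing. The dump layout (a dict per bug with "fields" and "comments") and the `catch_up` name are assumptions made for illustration, as is the premise that comments are append-only between the two dumps; this is not the actual import tooling.

```python
def catch_up(old_dump, new_dump, changed_ids):
    """Return, per changed bug, the field changes and new comments to apply."""
    updates = {}
    for bug_id in changed_ids:
        new = new_dump[bug_id]
        old = old_dump.get(bug_id)
        if old is None:
            # Bug was filed after the first dump: import it whole.
            updates[bug_id] = {"create": new}
            continue
        # Fields whose value differs from the first dump.
        fields = {
            name: value
            for name, value in new["fields"].items()
            if old["fields"].get(name) != value
        }
        # Assumes comments are only ever appended, so the tail is what is new.
        comments = new["comments"][len(old["comments"]):]
        if fields or comments:
            updates[bug_id] = {"fields": fields, "comments": comments}
    return updates


# Hypothetical dumps: bug 1 was reopened and commented on, bug 3 is new.
old = {
    1: {"fields": {"status": "RESOLVED"}, "comments": ["first"]},
    2: {"fields": {"status": "NEW"}, "comments": []},
}
new = {
    1: {"fields": {"status": "REOPENED"}, "comments": ["first", "still broken"]},
    2: {"fields": {"status": "NEW"}, "comments": []},
    3: {"fields": {"status": "NEW"}, "comments": ["hello"]},
}
print(catch_up(old, new, [1, 3]))
```

The sketch only covers the comparison step; it still requires a second dump, which is the cost the replies that follow focus on.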
[21:51:20] I can't imagine any ops being happy about that
[21:51:26] ok, bye :)
[21:51:28] greg-g: ^
[21:51:33] chasemp: ^
[21:53:02] so on the first thought, we could do that but it adds more time than doing it the other wy
[21:53:04] way evn
[21:53:08] dumping is the expensive part
[21:53:15] so dumping to later dump to then compare
[21:53:25] is more expensive than dumping and then creating from vanilla
[21:53:37] on the second point, yep we basically came to the same conclusion
[21:53:46] find a way to get more eyes on a prototype
[21:55:45] as long as that prototype has a real data import, I guess
[21:56:43] I was thinking some subject
[21:56:47] every 100 tickets or something
[21:56:49] idk
[21:56:52] every 1000 even
[21:57:44] also if you set phab to read only for maintenance in this case, which is actually a case of needing to create content I don't know how it works
[21:57:46] subset? :)
[21:57:57] whao, didn't even notice that correct
[21:57:59] yes
[21:58:00] :)
[21:58:21] if it's a subset we'd be liable to miss the stuff that qgil found this morning
[21:58:51] your thought is to restore 80,000 tickets into a labs instance?
[21:59:09] random sampling
[21:59:12] or a prod test one
[21:59:18] that's fair but we don't have one
[21:59:21] so that adds to the timeline
[21:59:32] what's the turn around like for a new machine nowadays?
[21:59:40] * greg-g asks robh
[21:59:54] few days setup
[22:00:02] after we get it
[22:00:04] * greg-g nods
[22:00:38] I thought the puppet stuff made that easy peasy (/me is trying to reconcile what you said earlier and now, if "easy peasy" (not your words) meant "a few days")
[22:00:42] plus you have to get the db allocation
[22:00:48] * greg-g counts parens, yep, I'm good
[22:01:01] a labs one is easy peasy
[22:01:06] a physical machine allocation
[22:01:09] is more work
[22:01:17] you are familiar with vm's vs physical :D
[22:01:33] plus you are talking prod db then, etc
[22:01:39] backups and all kinds of stuff
[22:01:41] idk
[22:02:06] it's another prod setup completely, which is not awful but a bit of work to setup and verify, do we monitor it?
[22:02:17] who is responsible for up keep? do we put it behind the misc-web?
[22:02:29] all that is setup