[06:19:32] Phabricator: Resolving a task should not assign it to me - https://phabricator.wikimedia.org/T138806#2410881 (bmansurov)
[06:41:59] Phabricator: Resolving a task should not assign it to me - https://phabricator.wikimedia.org/T138806#2410909 (Peachey88)
[06:42:02] Phabricator: Phabricator is auto claiming tasks when closing as resolved (again) - https://phabricator.wikimedia.org/T134106#2410911 (Peachey88)
[06:46:23] Phabricator: Phabricator is auto claiming tasks when closing as resolved (again) - https://phabricator.wikimedia.org/T134106#2410918 (bmansurov) > What is the use-case for closing tasks but leaving them owned by nobody? That makes it difficult to search for tasks closed by a given user. When signing-off on...
[09:49:17] Gitblit-Deprecate, Operations, Release-Engineering-Team, Developer-notice, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2411057 (Danny_B) >>! In T137224#2407860, @Dzahn wrote: > I merged...
[09:51:52] Gitblit-Deprecate, Operations, Release-Engineering-Team, Developer-notice, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2411064 (Danny_B) I guess the cause may be the asterisk in ` A spammer: https://phabricator.wikimedia.org/p/Lullu121verma/
[13:30:45] Andre is on vacation, can someone deal with them, please?
[15:56:18] Hi, can't think of a better place to ask: I can't access glamtools, it's just stuck loading. Is it down?
[15:59:02] #wikimedia-labs would probably be a better place :)
[15:59:24] not a development tool
[16:03:16] thanks
[16:03:18] :)
[17:35:14] Phabricator, Design-Research, Team-Practices: Design-Research-Archive tasks seem to remain unresolved forever though there is often nothing left to do - https://phabricator.wikimedia.org/T138662#2406725 (Capt_Swing) These can be closed. We were following @DarTar's system. Then he changed his system....
[18:37:54] Gitblit-Deprecate, Operations, Release-Engineering-Team, Developer-notice, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2412573 (Dzahn) Yes, the problem appears when these rules are adde...
[18:48:36] Gitblit-Deprecate, Operations, Release-Engineering-Team, Developer-notice, and 2 others: Redirect Gitblit urls (git.wikimedia.org) -> Diffusion urls (phabricator.wikimedia.org/diffusion) - https://phabricator.wikimedia.org/T137224#2412605 (Paladox) The redirect Bugzilla looks like https://phabric...
[19:09:48] what is the difference between the phabricator ::labs role and the prod role?
[19:10:01] is it easy to tell why we cant just use the prod role
[19:10:10] since that would be much better for testing
[19:10:16] twentyafterfour ^^
[19:10:32] or was it just one specific thing that cant work in labs
[19:11:00] and separately, paladox , which error did you get when using the labs role on a new instance?
[19:12:18] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item scap::deployment_server in any Hiera data file and no default supplied at /etc/puppet/modules/scap/manifests/target.pp:97 on node git-phab4.git.eqiad.wmflabs
[19:12:19] Warning: Not using cache on failed catalog
[19:12:19] Error: Could not retrieve catalog; skipping run
[19:12:19] trusted_proxies => $cache_misc_nodes[$::site],
[19:12:19] line 34 ^^
[19:12:22] mutante ^^
[19:13:10] ah.. hmm
[19:13:41] wondering if that just happens because you are in a different project and is fixed in the phabricator project
[19:13:56] mutante not sure maybe
[19:14:02] because it's a hiera lookup...
[19:14:08] Oh.
[19:14:25] so you saw phab-01 thrugh phab-05 ?
[19:14:38] are they all configured they same way and using that ::labs role?
[19:15:07] do you have access for that?
[19:16:25] mutante
[19:16:28] i doint have access
[19:16:31] But you do
[19:16:40] And i doint think there configured the same
[19:16:42] as each other
[19:17:27] which of the instances should be used for testing?
[19:17:32] (asking the channel)
[19:17:53] for testing the Apache setup that is.. so it's as close to prod as possible
[19:18:31] or i can just make yet another one
[19:18:43] mutante looks like this one https://wikitech.wikimedia.org/wiki/Nova_Resource:Phab-04.phabricator.eqiad.wmflabs
[19:18:46] apply that ::labs role and see what happens
[19:18:49] but yes set up another one
[19:18:58] please
[19:19:16] i would like it if it wasnt't a "labs" role
[19:19:34] mutante we could try prod puppet role
[19:19:36] ok, what made you pick -04 ?
[19:19:55] Since it is the only one showing it as using the phabricator labs role.
[19:20:03] and was running ubuntu trusty
[19:20:08] same as the prod machine
[19:20:08] mutante: The prod role has a bunch of things that don't work in labs, but could probably be dealt with
[19:20:39] you can use any of the phab-0* for testing
[19:21:10] twentyafterfour i think phab-01 keeps going down due to small instance
[19:21:23] i think all phab instances needs to be a medium or higher
[19:21:27] twentyafterfour: ah! hi! that would be cool as a medium term goal. have an instance that is using the prod role and has no other changes. so some instance is for staging new stuff and one is just always like prod
[19:21:36] since i had the problem with running phd the daemons and it crashing.
[19:21:48] paladox: ah, you're probably right
[19:21:49] twentyafterfour: short term it's just where we test those rewrite rules that DannyB wrote..
[19:22:13] mutante: yeah like paladox said, phab-04 sounds like a fine one or any of them really
[19:22:16] twentyafterfour yep, would there be a way to swith phab-01 to a medium to prevent any more crashes please.
[19:22:23] I was working on phab-tin for deployment via scap
[19:22:30] but that's not quite set up right atm
[19:22:45] paladox: I don't know how to switch it other than destroy and re-create
[19:22:50] oh
[19:23:11] it's probably better to destroy and re-create
[19:23:15] anyways
[19:23:26] twentyafterfour could you help us test the rewrites for git.wikimedia.org please.
[19:23:35] Since we broke bugzilla by mistake.
[19:23:51] so these rules seem to be ok when they are standalone
[19:24:01] but once combined with the existing setup for bugzilla
[19:24:03] it breaks
[19:24:34] mutante would havent it set on a different apache port work
[19:24:48] that's why we need tests with the same apache config and it's not sufficient to run on git.wmflabs or a laptop
[19:24:53] I found with gerrit and jenkins on a test instance i had to do that to allow them to work.
[19:25:37] but we dont want to run 2 Apache instances in prod anyways
[19:25:46] so the rules need to work together
[19:26:04] oh ok
[19:26:44] that would mean 2 different varnish backends on the same instance.. etc.. just for this
[19:26:58] it's about the VirtualHosts and the order they are in
[19:27:05] oh
[19:27:17] brb
[19:27:53] or something.. we need to compare the Apache config part between 04 and prod too
[19:28:32] the root of the issue remains that nothing is a real test if we have diffs
[19:34:03] paladox: I'm on train duty
[19:34:09] I can try to help test
[19:39:53] Im back
[19:39:58] twentyafterfour Ok
[19:40:19] mutante ^^
[19:40:50] twentyafterfour hi, could you have another look at https://phabricator.wikimedia.org/D273 please or do you have a better way please
[19:42:59] twentyafterfour: unrelated..i got pinged about rebooting iridium tomorrow during phab maintenace.. for kernel upgrade
[19:43:28] mutante do you want to use phab-04 or should we create another instance
[19:43:31] to test on please
[19:45:32] mutante: I can probably do the reboot from the shell but it wouldn't hurt to have an opesen around just in case
[19:45:47] twentyafterfour: ok, i can be around, np
[19:46:21] twentyafterfour currently i think https://phabricator.wikimedia.org/D273 is the only way for now.
[19:46:31] Since it will cause refs/cache* to break repos
[19:46:45] currently without that patch
[19:48:57] mutante :)
[19:50:16] paladox: did you want to request access to phab project?
[19:50:37] mutante im not sure, how do i do that.
[19:50:43] please
[19:51:04] not sure either
[19:51:19] Oh, maybe i could ask twentyafterfour
[19:51:44] twentyafterfour hi, would i be able to access the https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator please
[19:51:46] per mutante
[19:51:54] paladox: added you to the project
[19:51:59] :)
[19:52:14] I thought you were already a member :)
[19:52:25] twentyafterfour thanks, and no, i was on a different project.
[19:52:27] :)
[19:52:45] made you a projectadmin as well
[19:52:53] twentyafterfour thanks :)
[19:53:04] mutante should i create the instance?
[19:55:21] paladox: sure, it won't hurt to make a new one. as Ryan always said, we should treat instances "like cows not pets"
[19:55:29] create, test, remove
[19:55:33] mutante ok :)
[19:55:37] which size
[19:55:39] please
[19:55:55] default / same as the other existing ones
[19:55:55] medium?
[19:56:04] Ok thanks twentyafterfour
[19:56:18] how do i ssh. Do i do the same. ssh into bastion.
[19:56:22] ?
[19:56:28] yes it's the same
[19:56:59] I have *.wmflabs set up in my .ssh/config so it automatically uses the bastion
[19:57:12] twentyafterfour oh, thanks do i set the os as trusty
[19:57:16] ?
[19:57:38] please
[19:57:59] paladox: yes
[19:58:04] Ok thanks
[19:58:12] paladox: ssh just like we set it up for git.wmflabs.org
[19:58:20] that instance i made in the other project
[19:58:25] mutante ok, tahnks
[19:58:27] thanks
[19:58:37] except you have to put .phabricator. in host name for this
[19:58:39] mutante and twentyafterfour ive set it up as phab-05
[19:58:45] medium and os is trusty
[19:58:46] :)
[19:58:48] phab-04.phabricator.eqiad.wmflabs
[19:58:52] same style as this
[19:59:00] and yes, use trusty
[19:59:04] because prod is trusty
[19:59:18] Ok
[19:59:20] :)
[19:59:58] thanks paladox, i will be back in a few minutes
[20:00:10] mutante ok, and your welcome.
[20:00:12] then we can compare Apache config
[20:00:14] with iridium
[20:00:21] after you applied the role
[20:01:55] ok
[20:02:23] twentyafterfour i think phab-01 has crashed again since website is not working properly and i carn't seem to ssh into it.
[20:02:27] What should we do.
[20:02:46] Should we destroy it and re create it as a medium instance to prevent it crashing all the time.
[20:03:16] i think there are 2 separate use cases here
[20:03:30] one is to test new stuff before it gets to prod
[20:03:34] Yep
[20:03:44] and the other is to test changes on something that is like the existing prod
[20:04:18] we could define which instance is for what
[20:04:28] Yep
[20:04:31] or use slightly different names
[20:04:35] instead of just the numbers
[20:04:42] mutante i think one of the instances doint work.
[20:05:02] make a new phab-01 too but separate from what you just did
[20:05:30] -staging vs -somethingelse .. hmm
[20:05:33] mutante should i delete phab-01 and recreate it
[20:06:42] mutante rebooting phab-01 brings up ERROR
[20:07:39] hmm. do not delete it
[20:07:46] mutante oh, what do i do
[20:07:55] How do i get it to work
[20:08:02] please
[20:08:18] re-open https://phabricator.wikimedia.org/T137270
[20:08:23] and ask there what to do
[20:08:31] ok
[20:10:44] Phabricator, Labs: Upgrade phab-01.wmflabs.org - https://phabricator.wikimedia.org/T127617#2412816 (Paladox)
[20:10:48] Phabricator, Labs: https://phab-01.wmflabs.org returns a core exception - https://phabricator.wikimedia.org/T137270#2412814 (Paladox) Resolved>Open Re opening this. I tried ssh into it but failed I looked at the website and it showed http 500 error so I rebooted it but now it failed to reboot....
[20:12:00] mutante i can setup another instance and migrate it from phab 01 since we can re point the domain dns to the new instance then fix up phab-01 and re point it back to it.
[20:12:57] mutante says this now
[20:12:58] channel 0: open failed: connect failed: No route to host
[20:12:58] stdio forwarding failed
[20:12:58] ssh_exchange_identification: Connection closed by remote host
[20:15:54] mutante it says this
[20:15:55] Messageinternal error: No PCI buses available
[20:15:58] twentyafterfour ^^
[20:17:16] paladox: ok, let's handle that via the ticket.
let's focus on the redirects for now
[20:17:44] mutante ok
[20:18:00] that pci buses notice I believe is related to general labs overload
[20:18:03] re-pointing DNS is just clicking in wikitech ui too btw
[20:18:17] "manage proxies"
[20:18:19] I don't have a minute to try to shuffle that instance now but if there is a task cc me on it
[20:18:42] chasemp oh this one https://phabricator.wikimedia.org/T137270 please
[20:19:23] paladox: let's look at -05
[20:19:25] yeah that's fine there is a general ticket for instances struck by overload
[20:19:31] but I can't get to this right this moment
[20:19:36] mutante ok
[20:19:39] idk why we have so many phab instances tbh
[20:20:02] and chasemp i doint think we have a ticket for the issue like that. But we have one for phab-01 going down
[20:20:25] because you never know what the others have been used for, so you create a new one to be sure it's "clean"
[20:20:43] Yep
[20:20:46] i think that is why
[20:21:39] only of them is configured to use the ::labs role when paladox just checked
[20:21:46] Yep
[20:21:47] and none with the prod role
[20:21:54] so the others are manual testing
[20:24:02] what we wanted is just one of them to be actually like prod otherwise you dont know if the test is a real test
[20:24:14] yep
[20:24:16] we should clean sweep those VMs
[20:24:39] the labs role comes w/ a different banner since no oauth, local mysql, and none of the email crazy routing or prod monitoring
[20:24:52] afa phab itself is concerned there isn't a difference
[20:25:03] it's the ecosystem that has to be self contained or nonexistent
[20:26:06] ok, thanks, we wanted to find out just that, what the difference is in the ::labs role
[20:26:31] chasemp is there a way to swap phab-01 to a medium instance
[20:26:40] without deleting please
[20:26:42] no, other than delete it and set up a new instance
[20:26:50] and we need ot remove some of these to do that
[20:26:54] Oh ok
[20:26:56] we are literally on overload atm in labs
[20:26:56]
actually all we need right now is the same Apache config .. and a new proxy
[20:27:06] mutante we can rename the instance
[20:27:15] then setup a new phab-01
[20:27:34] you cannot rename the instance like that in the VM itself if that's what you mean
[20:27:46] we just change the proxy config , paladox
[20:27:57] but let's stick to one of the issues at a time
[20:28:04] chasemp oh, but it shows the option to do it.
[20:28:26] in horizon?
[20:28:40] Yep
[20:28:41] I wouldn't trust that at all, there are too many things associated and I'm not sure it's been tested
[20:28:48] puppet certs, etc
[20:28:51] chasemp ok.
[20:30:06] paladox: did you apply a role on the new instance yet?
[20:30:18] mutante im going to do that now
[20:30:21] cool
[20:30:35] mutante how do i apply the role again in puppet on instance
[20:30:39] not on wikitech
[20:30:40] please
[20:31:01] you click "configure instance" in wikitech
[20:31:12] mutante yep, i mean in instance
[20:31:15] The command
[20:31:20] please
[20:31:25] you can't afaik
[20:31:28] Oh
[20:31:34] i thought you run puppet
[20:31:37] puppet run
[20:31:43] puppet agent -tv
[20:31:46] Thanks
[20:31:47] yes, you do that
[20:32:01] but after telling it via wikitech which role class to use
[20:32:14] It says
[20:32:14] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: DNS lookup failed for phab-tin.phabricator.eqiad.wmflabs Resolv::DNS::Resource::IN::A at /etc/puppet/modules/scap/manifests/target.pp:98 on node phab-05.phabricator.eqiad.wmflabs
[20:32:14] Warning: Not using cache on failed catalog
[20:32:14] Error: Could not retrieve catalog; skipping run
[20:32:48] hmm.. that means the ::labs role is broken?
[20:33:02] try running puppet on the -04 instance
[20:33:08] same error ?
[20:33:41] or failed DNS lookups are also symptoms of the general labs overload?
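The Puppet run above fails on a DNS lookup for phab-tin.phabricator.eqiad.wmflabs, and the open question is whether the name fails to resolve or the role itself is broken. One way to narrow that down is to try resolving the name directly from a host in the same project. What follows is only a minimal sketch, assuming Python 3 is available on the instance; `check_hosts` is a hypothetical helper name, not an existing tool, and a lookup made on an instance does not necessarily reflect what the puppetmaster sees when it compiles the catalog.

```python
import socket


def check_hosts(hostnames, resolver=socket.gethostbyname):
    """Try to resolve each hostname; return {hostname: ip or None}.

    None means the lookup raised an error (for example NXDOMAIN or a
    resolver timeout). `resolver` is injectable so the function can be
    exercised without real DNS.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Calling `check_hosts(["phab-tin.phabricator.eqiad.wmflabs"])` on more than one instance and comparing the results would show whether the failure is specific to one host or general.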
[20:34:36] if that's the case we can't really test it i guess
[20:34:56] maybe we should postpone the announced change
[20:35:09] no dns should be ok, no clue whwat the deal is there
[20:35:13] mutante phab-04 is suffering the same as phab-01
[20:35:23] plus i coulden ssh into phab-04
[20:35:24] that's a scap issue
[20:35:31] probbly because they moved deploy of phab to scap3
[20:35:35] sigh, ok. so we have instances without a role.. and the ones with the broken role
[20:35:44] and maybe none of this labs stuff was ever dealt w/
[20:35:49] that's a releng question I guess
[20:35:52] how hae they been testing?
[20:35:53] mutante we can create a new puppet rule without scap ?
[20:35:53] idk
[20:36:03] yes, what chase just said
[20:36:11] i wanted to say the same
[20:36:23] Ok
[20:36:32] i also dont know
[20:37:03] twentyafterfour can we help
[20:37:04] please
[20:37:07] paladox: can you see the Apache config on -04 though?
[20:37:12] mutante nope
[20:37:15] ..
[20:37:16] i carn't ssh into it
[20:37:21] plus it is suffering the same fault
[20:37:24] as phab-01
[20:38:08] did you keep track of the number of issues ?;p
[20:38:17] since we "just" wanted to test
[20:38:35] can you ssh to any other instance?
[20:38:38] let me try as well
[20:39:20] mutante ok, i will try the 2nd and 3rd one
[20:40:15] ssh root@phab-04.phabricator.eqiad.wmflabs
[20:40:15] channel 0: open failed: connect failed: No route to host
[20:40:15] stdio forwarding failed
[20:40:20] yea.. no
[20:40:36] Yep
[20:40:40] I get that error
[20:41:26] i dont know what to say, but nothing workds today
[20:41:32] Yep
[20:45:11] i am on -05 now, but doesnt help because of the puppet issue
[20:46:17] so now we could try to manually copy the Apache config ..
and then we have yet another instance with manual stuff :p but we can delete it right after we used it
[20:46:24] at least
[20:47:34] that also means manually installing Apache, unless we have another role that currently does that
[20:47:55] there used to be webserver::php5 but pretty sure that is historic and doesnt work anymore
[20:48:03] looks
[20:48:22] Oh
[20:48:54] do you see any other role that instance use, that installs Apache?
[20:49:14] well, it's also puppet groups _per project_ ..
[20:49:27] so we'd have to add that in phab project context
[20:50:10] mutante
[20:50:11] apache:
[20:50:11]
[20:50:11]
[20:50:11] role::simplelamp
[20:50:28] ah, right. wanna try that on -05 ?
[20:50:40] mutante ok
[20:50:43] I will do that now
[20:51:20] It says
[20:51:21] Notice: Run of Puppet configuration client already in progress; skipping (/var/lib/puppet/state/agent_catalog_run.lock exists)
[20:51:23] mutante ^^
[20:51:53] that's when puppet is already running
[20:52:01] Oh
[20:52:01] you can just wait
[20:52:03] Ok
[20:52:06] or tail -f /var/log/syslog now
[20:52:10] Ok
[20:52:12] to see the output of the ongoing run
[20:52:44] how do i exit it
[20:52:46] mutante
[20:52:46] it is started via cron, so that was just timing
[20:52:58] meaning the screen
[20:53:01] please
[20:53:20] exit from "tail -f?" ctrl+c
[20:53:30] Thanks
[20:53:46] i see it installed apaache and mod_rewrite.
which we need
[20:53:47] good
[20:53:53] Notice: /Stage[main]/Role::Labs::Instance/Exec[enable_sites_local]/returns: executed successfully
[20:53:53] Notice: Augeas[Apache2 logs](provider=augeas):
[20:53:53] --- /etc/logrotate.d/apache2 2016-05-04 17:07:02.000000000 +0000
[20:53:53] +++ /etc/logrotate.d/apache2.augnew 2016-06-28 20:53:43.052539723 +0000
[20:53:53] @@ -1,7 +1,7 @@
[20:53:55] /var/log/apache2/*.log {
[20:53:57] - weekly
[20:53:59] + daily
[20:54:01] missingok
[20:54:03] - rotate 52
[20:54:05] + rotate 30
[20:54:07] compress
[20:54:09] delaycompress
[20:54:10] yea, ok, no need to paste all that
[20:54:11] notifempty
[20:54:13] Notice: /Stage[main]/Apache::Logrotate/Augeas[Apache2 logs]/returns: executed successfully
[20:54:15] mutante ^^
[20:54:22] yea, that's normal
[20:54:29] Ok sorry
[20:54:30] doesnt matter for us for testing
[20:54:37] Ok
[20:55:04] ok, hold on.. getting config from prod
[20:55:13] mutante http://phab-05.wmflabs.org/
[20:55:15] Ok
[20:55:26] you can try making a proxy
[20:55:42] mutante already done http://phab-05.wmflabs.org/
[20:56:04] oh, ok. do we need to delete the old one?
[20:56:09] so git.wmflabs is free again
[20:56:12] and use it for this
[20:56:18] Ok
[20:56:48] yea, in the context of the project "git", you can delete that, and also the instance if you want
[20:57:05] and then re-create it and let it be used by phab-05
[20:57:24] so we can test against git.wmflabs.org
[20:57:27] Ok
[20:57:35] Actually i also use project git
[20:57:40] but i can delete the domain
[20:57:43] if you want
[20:57:47] then just delete the domain, yea
[20:57:49] the proxy
[20:57:50] Ok
[20:58:47] mutante done https://git.wmflabs.org/
[20:59:43] on iridium, prod phab
[20:59:53] we have 50-phabricator.conf
[21:00:04] 6 ServerName phabricator.wikimedia.org
[21:00:04] 7 ServerAlias phab.wmfusercontent.org
[21:00:15] but that file does not have the Bugzilla rules
[21:00:51] going back to DannyB's change
[21:01:41] Ok
[21:02:14] yea, so it doesnt work like that either.. hrmm
[21:02:19] Oh
[21:02:25] we need the result of the change that had to be reverted
[21:02:42] i cant just copy existing config
[21:02:47] mutante what do you mean by results of change.
[21:02:58] https://gerrit.wikimedia.org/r/#/c/296085/2/modules/phabricator/manifests/init.pp
[21:03:16] https://gerrit.wikimedia.org/r/#/c/296138/
[21:03:18] mutante ^^
[21:03:39] yea, so put that on this intance
[21:03:49] mutante oh, how do i do that
[21:03:51] please
[21:04:01] that's what i mean by "we need the result of the change"
[21:04:26] the content of the template https://gerrit.wikimedia.org/r/#/c/296138/2/modules/phabricator/templates/gitblit_vhost.conf.erb
[21:04:36] but without the templating stuff
[21:04:43] Do i need to add class phabricator to puppet group
[21:05:47] no, that's what we tried before this and didnt work, right
[21:06:06] Oh i tryed the phabricator::main
[21:06:20] But yes dosent work
[21:06:38] How do i apply the files we want to apply please
[21:08:07] mutante ^^
[21:08:10] twentyafterfour
[21:09:31] you need a puppet role that uses the code above and gets applied on the instance.. or manually copy/paste from that .erb template, then remove all the templating syntax..
[21:10:07] so it's back to the issue we had before or manual and error-prone
[21:10:28] i'll give it a shot nevertheless..running out of options
[21:11:44] Ok
[21:11:48] mutante thanks
[21:13:00] paladox: meanwhile we should have a ticket about the puppet error with the main:: role
[21:13:12] need that fixed anyways
[21:13:21] mutante ok. Im not sure what to write.
[21:13:32] just paste the full error that puppet gave you
[21:13:37] Ok
[21:13:41] say which role you picked, onj which instance
[21:13:49] that should be enough
[21:14:00] Ok
[21:14:35] ...
[21:17:22] ooh, we still have 50-git-wikiemdia.org.conf on iridium in sites-available
[21:17:39] because reverting dpuppet id not mean it deletes that
[21:17:55] did not manually delete it
[21:18:10] so i copied that over to the -05 instance
[21:18:11] Phabricator, Labs: Applying role role::phabricator::main causes errors on instances - https://phabricator.wikimedia.org/T138881#2412999 (Paladox)
[21:18:14] mutante https://phabricator.wikimedia.org/T138881
[21:18:43] twentyafterfour could we have some help with testing please :)
[21:18:56] paladox: I'm still on train duty
[21:19:08] what can I help with? I lost track of the convo in here
[21:19:23] twentyafterfour we are trying to apply https://gerrit.wikimedia.org/r/#/c/296138/
[21:19:40] but also role::phabricator::main and role::phabricator:labs doint work
[21:20:23] role::phabricator::labs should work but you have to clone /srv/phab manually or do a scap deployment from a deployment server
[21:20:44] Oh
[21:20:46] and we don't have a place to deploy from yet so just cloning rPHDEP into /srv/phab should do the trick
[21:21:13] twentyafter would not cloning that stop role::phabricator::labs from working
[21:21:17] twentyafterfour ^^
[21:21:35] Since i get
[21:21:35] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: DNS lookup failed for phab-tin.phabricator.eqiad.wmflabs Resolv::DNS::Resource::IN::A at /etc/puppet/modules/scap/manifests/target.pp:98 on node phab-05.phabricator.eqiad.wmflabs
[21:21:35] Warning: Not using cache on failed catalog
[21:21:59] i did this: copy 50-phabricator.conf from production and 50-git.wikimedia.org.conf from gerrit to labs instance phab-05
[21:22:14] restarting Apache, syntax error
[21:22:30] Could not open configuration file /etc/apache2/phabbanlist.conf:
[21:22:41] so it would really be easier to use the puppet role
[21:22:45] that also installs that
[21:22:54] instead of using simplelamp
[21:23:15] mutante seems /srv/phab is installed
[21:23:43]
mutante: I have no idea what it'd take to make production role work on labs
[21:23:50] production role has a lot of extra crap
[21:24:15] define "crap"
[21:24:24] how do we test changes
[21:24:40] is there any isntance that has the same setup as iridium when it comes to Apache
[21:24:53] where we can also apply the suggested change by DannyB
[21:25:09] i had to revert that in prod
[21:25:29] and i dont want to keep testing in prod
[21:25:42] mutante: that was the reason I asked for phab2001
[21:25:48] because I had no way to test things
[21:25:55] well, among other reasons
[21:26:02] i thought that is for failover when we switchover DCs
[21:26:10] or if a server dies
[21:26:14] it's that too
[21:26:21] _and_ testing?
[21:26:28] that seems like a conflict
[21:26:31] but having a way to test things without killing prod, would be a nice benefit
[21:27:00] well, if we have broken redirects.. for example for the old bugzilla URLs
[21:27:03] like we had with this one
[21:27:08] and people click the URLs
[21:27:12] they get cached in varnish
[21:27:20] and next we cant even revert easily
[21:27:21] it's kind of hard to test redirects, no matter how you go about it
[21:27:35] for the bugzilla redirects I mocked it up in beta
[21:27:37] it's definitely not sufficient that DannyB runs them on a laptop
[21:27:54] so.. wait
[21:27:54] and just tested mock urls
[21:28:01] lets use phab2001 then?
[21:28:16] well, i don't think the varnish stuff is set up yet for phab2001
[21:28:33] lets just fix phab-05. what's broken on that?
[21:28:37] I can probably make it work
[21:28:40] mutante would it be easy just to setup another http port
[21:28:43] for git.wikimedia.org
[21:29:13] paladox: can you answer that?
[21:29:16] the error from earlier
[21:29:25] mutante error?
[21:29:25] DNS lookup for tin.. ?
[21:29:28] Oh
[21:29:37] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: DNS lookup failed for phab-tin.phabricator.eqiad.wmflabs Resolv::DNS::Resource::IN::A at /etc/puppet/modules/scap/manifests/target.pp:98 on node phab-05.phabricator.eqiad.wmflabs
[21:29:37] Warning: Not using cache on failed catalog
[21:29:48] mutante ^^
[21:31:13] paladox: i am deleting the config again that i put on it manually.. or it's already not clean
[21:31:24] Oh
[21:32:16] that is not related to the error you pasted
[21:33:08] twentyafterfour: ^ yea, so the DNS lookup for phab-tin is the blocker
[21:33:28] I don't know why it can't look up phab-tin? that's a valid host name or it should be
[21:33:45] that's what made me think it's back to general labs issues
[21:33:49] mutante maybe it could be the hardware failure
[21:34:53] i doubt it's hardware failure , given the history of DNS issues in labs
[21:34:59] Oh
[21:35:16] mutante would it be easy just to setup another http port
[21:35:22] how would that help
[21:35:29] I'll look into it, but cloning the deployment repo into /srv/phab might fix it?
[21:35:42] twentyafterfour
[21:35:46] the repo is already clone
[21:35:48] into there
[21:35:50] oh
[21:35:57] I doint know how it done it
[21:36:02] but i didnt do it.
[21:36:24] Theres phab-old too
[21:37:25] we already need a new instance again :/
[21:37:44] first you applied the phabricator role on -05
[21:37:49] it failed with the DNS lookup thing
[21:38:24] then we applied "simplelamp" to get just an Apache and manually copy config over
[21:38:50] now you already have to clean all that up again and delete that stuff
[21:39:27] removing the roles usually does not mean there are no remnants
[21:39:38] why not just apply the apache.conf manually to a working phab-0* instance?
[21:39:52] there should be one that works, maybe phab-03?
[21:39:53] that's what i tried above
[21:39:58] syntax error
[21:39:59] mutante oh
[21:40:07] missing more stuff that would be installed by the phab role
[21:40:23] can you try that?
[21:40:29] twentyafterfour sorry i was mistaken
[21:40:33] phab was installed
[21:40:41] 14:27 < mutante> i did this: copy 50-phabricator.conf from production and 50-git.wikimedia.org.conf from gerrit to labs instance phab-05
[21:40:53] the 2 files are on the -05 instance in /root/
[21:41:04] i can copy them back to sites-available
[21:41:10] and link them to sites-enabled
[21:42:11] mutante i get
[21:42:11] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Class[Mysql::Config] is already declared in file /etc/puppet/modules/role/manifests/phabricator/labs.pp:35; cannot redeclare at /etc/puppet/modules/mysql/manifests/server.pp:35 on node phab-05.phabricator.eqiad.wmflabs
[21:42:11] Warning: Not using cache on failed catalog
[21:42:11] Error: Could not retrieve catalog; skipping run
[21:42:12] now
[21:42:22] twentyafterfour ^^
[21:42:33] paladox: you're trying to apply production + labs role?
[21:42:41] twentyafterfour yep
[21:42:43] you can't have both and phab production role won't work on labs
[21:42:49] paladox: like i said, you applied on simplelamp, it's basically messed up
[21:42:50] because of mysql
[21:42:57] Oj
[21:42:59] oh
[21:43:00] labs role has a local mysql instance instead of connecting to the prod cluster
[21:43:06] should i delete the instance and restart
[21:43:10] mutante ^^
[21:43:14] oh
[21:43:30] paladox: yes, delete instance, recreate, apply only phabricator::main role
[21:43:37] Ok
[21:43:42] but the DNS lookup issue is still the problem
[21:43:43] yeah probably should start fresh. I don't think the prod role is going to fly.
we can copy files from iridium to make apache work right though
[21:43:51] Ok
[21:43:51] or comment out the banlist stuff since it's not needed for this test
[21:43:59] ok, wait
[21:44:11] we just said opposite things
[21:44:21] if we cant apply production role
[21:44:26] how do you get an Apache on it?
[21:44:28] I don't think phabricator::main will work on labs
[21:44:41] we tried to use "simplelamp"
[21:44:43] phabricator::labs should install apache right
[21:44:46] I recreated phab-05 instance now
[21:44:46] that is why he got the mysql stuff now
[21:44:50] aha!!
[21:45:12] phabricator::labs
[21:45:18] then that paladox
[21:45:35] mutante then?
[21:45:36] you mean "role::phabricator::labs"
[21:45:38] right
[21:45:44] I apply that
[21:46:01] so .. still the DNS lookup fail
[21:46:04] right
[21:46:16] yea, we are going in circles
[21:46:19] no apache becaues of that
[21:46:47] ok let me try to dig into the dns issue. sorry I'm on deployment duty so I've only been able to give this 10% of my attention
[21:47:17] ssh phab-tin.phabricator.eqiad.wmflabs
[21:47:30] I'm able to log into phab-tin so no idea why dns would fail? testing some things.
[21:47:57] twentyafterfour if you restart it
[21:48:00] it will fail
[21:48:12] if I restart phab-tin?
[21:48:12] phab-01 and phab-04 are failing
[21:48:17] Yep
[21:49:04] labs has been real unreliable, at least the phabricator project is constantly failing somehow
[21:49:32] phab-03 works
[21:50:01] Yep
[21:50:02] twentyafterfour@phab-03:~$ ping phab-tin
[21:50:04] PING phab-tin.phabricator.eqiad.wmflabs (10.68.18.56) 56(84) bytes of data.
[21:50:08] dns works apparently
[21:51:34] Im getting
[21:51:34] Permission denied (publickey).
[21:51:34] Killed by signal 1.
[21:51:41] twentyafterfour ^^
[21:51:45] for the new instance
[21:51:48] i recreated
[21:51:52] phab-06
[21:52:38] mutante ^^
[21:53:43] yeah same for phab-06
[21:53:50] that's also common, I don't know why
[21:54:15] lame :-/
[21:54:24] Oh
[21:56:31] we're up to 6 now? :)
[21:57:25] 4 of them cant be ssh'ed to
[21:57:33] 1 of them uses a role
[21:57:35] greg-g i think more then 6 but i deleted phab-05 the one i created
[21:57:42] but decided to name it 06
[21:58:22] paladox: i dont know why we cant ssh to a new instance, that is a -labs problem
[21:58:29] Oh
[22:07:54] twentyafterfour: after manually copying the configs in place i got "14:28 < mutante> Could not open configuration file /etc/apache2/phabbanlist.conf:
[22:08:13] taking that from prod too
[22:08:47] oh wait.. is that private data ? hrmm
[22:09:01] i'll put a placeholder i guess
[22:09:15] but first we need to be able to create a new instance
[22:09:51] until then i think we are blocked and should postpone the announced gitblit chnage
[22:10:06] Yep
[22:10:20] mutante but can we just add another port and do it that
[22:10:21] way
[22:10:25] eg another http port
[22:10:52] that would be the easiast and would allow us to continue with tomarror git.wikimedia.org redirection
[22:10:55] ? i dont understand what you mean
[22:11:03] why would that fix our problem?
[22:11:43] you would still do the same that DannyB did before
[22:11:46] wouldnt you
[22:11:59] you would confirm "yea, those rules work by themselves"
[22:12:03] like before
[22:12:13] but it would still break when merged in prod
[22:12:35] unless we test all the configs together.. we dont know it works
[22:12:48] mutante but if done in a different port, the bugzilla redirection wont affect the git.wikimedia.org redirection
[22:12:56] Phabricator, User-greg: Change of workboard columns in #Phabricator - https://phabricator.wikimedia.org/T131568#2413119 (greg) BTW: My reasoning for using columns as categories instead of state-tracking is that, predominately, state is best tracked in team or project specific 'sprint' (loosely defined) b...
[22:13:12] but the point is that we need git.wm and bugzilla.wm to both work together
[22:13:26] Oh, but they will just on different ports
[22:13:46] Phabricator, User-greg: Change of workboard columns in #Phabricator - https://phabricator.wikimedia.org/T131568#2171121 (greg)
[22:13:52] mutante ^^
[22:15:09] paladox: how would that work in production?
[22:15:28] you mean you also want to have 2 separate varnish backends?
[22:15:46] mutante because well one redirect will only work on for example port 80 the other one on 81 it shoulden affect both if on different port
[22:16:00] oh, is varnish what we use to seperate things
[22:16:03] then yes.
[22:16:08] please
[22:16:15] that would mean 2 separate varnish backends on different ports on the same machine (i think not possible currently)
[22:16:26] and even if that would be very unusual
[22:16:29] mutante yes, maybe we could try please
[22:16:34] because that is what VirtualHosts are for
[22:16:57] but we can clean it up later that way we can get more testing in on it so it dosent affect bugzilla.
[22:17:22] running a separate backend for some redirects just because we cant get the rewrite rules to work together is total overkill
[22:17:36] and i think it wont even work
[22:17:39] mutante maybe.
[22:18:03] ..or we could go back to my original suggestion and put it on the cluster
[22:18:54] mutante well yes, then work on getting them to iridum
[22:18:59] setting it up manually on -06 would be ok now.. if only we could use it
[22:19:04] I think that would be the right way forward
[22:19:09] for now
[22:19:20] mutante yep.
[22:19:23] i think the right way forward is to get on that fresh instance
[22:19:28] Im trying to fix it so i can get it working
[22:19:58] mutante so we put it on the phab instance on labs
[22:20:15] try rebooting that instance first
[22:20:28] i think that is what is suggested in -labs usually
[22:20:33] mutante i made it a large now for the extra traffic
[22:20:44] which traffic
[22:22:09] I restarted
[22:22:12] git.wikimedia.org
[22:22:24] since i thought you said we were hosting the rewrites on phab-06
[22:22:55] you mean git.wmflabs.org, right
[22:23:02] that doesnt mean it will get much traffic
[22:23:05] or needs to be large
[22:23:23] but as long as it works and you can ssh to it,, fine
[22:23:25] can you?/
[22:23:36] Nope i carn't ssh into
[22:23:43] it but restarting still dosent ssh into it
[22:25:11] ok.. so that's our blocker
[22:25:36] Yep
[22:26:08] :-/
[22:29:08] twentyafterfour i doint know why ssh isent working
[22:29:09] ?
[22:29:49] paladox: I don't know either
[22:29:55] Oh
[22:30:00] it's been an issue in the phabricator labs project for a while on-and-off
[22:30:26] i have had the same issue before in other projects
[22:30:37] it's some kind of race condition is all i know
[22:31:03] it sounds bad.. but make another one :/
[22:32:19] mutante can we use salt
[22:33:08] like yuvipanda did for phab-02
[22:33:14] he did?
[22:33:28] yep, but i doint know how to use salt
[22:33:48] mutante https://phabricator.wikimedia.org/T137270#2381745
[22:34:27] it's a way to execute commands, but we'd still need to know which commands
[22:34:40] he used that to get access and remove existing roles/config afaict
[22:35:00] in this case the instance is brandnew and doesnt even have roles yet,
[22:35:03] rihgt
[22:35:10] yep
[22:35:36] what happened to -05 ?
[22:35:49] did you delete it?
[22:36:02] Yep
[22:36:08] since i thought that was one of the
[22:36:11] reason for ssh
[22:36:14] but no it wasent
[22:36:28] just try repeating it
[22:36:30] best i have
[22:36:36] so delete -06 too
[22:36:40] and try it again
[22:36:44] or with some other name
[22:37:18] Ok
[22:38:29] meanwhile.. breaking news about terror in Turkey .. pretty horrible
[22:38:37] and need to leave in a few
[22:40:19] mutante ok
[22:40:52] paladox: i feel it will work again later and we should maybe just take a break and then try again
[22:40:58] mutante we have phab-03
[22:41:01] i can ssh into that one
[22:41:07] ah:)
[22:41:16] yep
[22:41:22] does it have apache?
[22:41:34] go to /etc/apache2/sites-enabled/
[22:41:41] let me ssh to..ok
[22:42:36] mutante yep it does
[22:42:41] since phabricator is installed
[22:42:47] phab-03.wmflabs.org
[22:43:54] ok, just a sec
[22:44:26] mutante seems syntax error in apache
[22:44:50] yes, that's why i said to wait
[22:45:30] that's the missing phabbanlist.conf as i pasted earlier
[22:45:42] and why i said we cant just copy that one config
[22:46:26] mutante we can remove that bit from it
[22:46:29] try now
[22:46:32] ok
[22:46:48] Yep works now
[22:46:58] But shows AH00112: Warning: DocumentRoot [/srv/git.wikimedia.org] does not exist
[22:47:08] already fixed
[22:47:12] ok
[22:47:24] so that's 3 files that were copied manually
[22:47:38] the 2 sites and the bannlist (i removed the actual IPs)
[22:47:48] Oh
[22:48:21] now we need to either change the server aliases
[22:48:27] Yep
[22:48:31] and have 2 proxies
[22:48:48] Oh
[22:49:09] or test with curl and manually sending host header
[22:49:11] mutante i can update the proxy
[22:49:20] go ahead if you know how
[22:49:43] Ok, ive updated it now
[22:49:47] mutante ^^
[22:49:52] how
[22:49:52] its pointing to phab-03
[22:49:56] https://horizon.wikimedia.org/project/proxy/
[22:50:10] git.wmflabs.org
[22:50:17] Gets redirected now :)
[22:50:25] what about phabricator. and bugzilla.
[22:50:32] do they also get redirected?
[22:50:38] that is the problem we had
[22:50:50] we already know that git.wm redirected.. but everything redirected
[22:50:50] But phab-03 is redirected too
[22:50:57] so.. same issue as before
[22:51:11] multiple vhosts
[22:51:14] mutante oh it is redirecting bugzilla
[22:51:15] now
[22:51:37] how do you know
[22:51:39] mutante not even on same machine and causing problems.
[22:51:42] tested it
[22:51:47] tested what?
[22:51:49] http://bugzilla.wikimedia.org/555
[22:52:02] mutante we should quickly change port
[22:52:11] eh, that is just the production setup
[22:52:22] that doesnt test anything about the new setup
[22:52:31] no, we should not change the port
[22:53:15] the problem did not change, it's still that those virtual hosts and rules must work together
[22:53:30] i will look again but i ran out of time right now
[22:53:31] Oh
[22:54:20] root@phab-03:/etc/apache2/sites-enabled# apache2ctl -S | grep namevhost port 80 namevhost git.wikimedia.org (/etc/apache2/sites-enabled/50-git-wikimedia-org.conf:1) port 80 namevhost phabricator.wikimedia.org (/etc/apache2/sites-enabled/50-phabricator.conf:5)
[22:54:55] ^ phabricator.wikimedia.org and git.wikimedia.org are still the names of the virtual servers in that apache
[22:55:03] that will not work
[22:55:18] mutante oh, as you ran out of time will you do it later, since probaly by the time your back online i will be offline since it is 12:00am almost 5 mins to go
[22:55:23] Oh
[22:55:35] either it needs to be git.wmflabs and phabricator.wmflabs or somehting that you can proxy to
[22:55:52] mutante we need git.wmflabs.org
[22:55:56] or you need to test by telling Apache (with curl) which VirtualHost you want to request
[22:55:57] so we should change that
[22:56:03] oh
[22:56:07] yes, like we did before, right
[22:56:11] yep
[22:56:25] just that you also need that for the other VirtualHost
[22:56:43] and..
[22:56:52] Yep
[22:56:56] the Bugzilla rewrite rules... they are not there yet
[22:57:00] Oh
[22:57:06] so where are they coming from
[22:57:13] Not sure
[22:57:29] root@phab-03:/etc/apache2/sites-enabled# grep -r bugzilla *
[22:57:29] root@phab-03:/etc/apache2/sites-enabled#
[22:58:11] eh.. that is actually the same on production
[22:58:46] mutante the rules are comming from 50-git-wikimedia-org.conf
[22:58:56] yea, so that is because it is not even done in Apache itself
[22:58:58] on phab-03
[22:58:59] but in phabricator
[22:59:01] templates/redirect_config.json.erb: "pattern": "(bugs|bugzilla).wikimedia.org/show_bug.cgi\\?id=([0-9]+)",
[22:59:05] yep
[22:59:09] which means we cant test
[22:59:13] by setting up just Apache
[22:59:17] we need phabricator
[22:59:19] Oh
[22:59:27] twentyafterfour ^^
[22:59:36] which means back to the bug about applying the puppet role
[22:59:39] sorry, need to go
[22:59:40] laters
[22:59:44] Ok bye
[22:59:52] Will you be on tomarror
[22:59:57] mutante ^^
[23:00:06] are you still on the same timezone as me?
[23:05:20] mutante im going to upgrade windows now.
[23:05:31] I will be online tomarror so we can hopefully test it.
[23:10:23] twentyafterfour it seems we may have remove phab-03 apache file, would you be able to re add it please
[23:10:35] so phab-03.wmflabs.org works please
[23:10:40] or bd808 ^^
[23:18:13] I fixed it for now
[23:30:29] Phabricator, LDAP: Having difficulty logging into Phabricator via LDAP when multiple accounts returned for username - https://phabricator.wikimedia.org/T138672#2413320 (Aklapper) p:High>Normal
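Editorial note on the pattern quoted at 22:59:01 from templates/redirect_config.json.erb: the pattern itself can be checked in isolation, even though the redirect as a whole needs a running Phabricator. The sketch below is only an illustration. It assumes Python's `re` module treats this simple pattern the same way the engine Phabricator uses does, and the sample URLs and the `bug_id` helper are made up for the example. The redirect target is not shown in the log, so it is not modeled here.

```python
import json
import re

# The pattern exactly as quoted in the log. It is a JSON string literal,
# so decode it first to turn the "\\?" escape into a regex-level "\?".
raw_json = r'"(bugs|bugzilla).wikimedia.org/show_bug.cgi\\?id=([0-9]+)"'
pattern = re.compile(json.loads(raw_json))


def bug_id(url):
    """Hypothetical helper: return the captured bug id, or None if no match."""
    match = pattern.search(url)
    return match.group(2) if match else None


# Sample inputs chosen only for illustration.
samples = [
    "bugzilla.wikimedia.org/show_bug.cgi?id=555",
    "bugs.wikimedia.org/show_bug.cgi?id=42",
    "bugzilla.wikimedia.org/555",
    "git.wikimedia.org/show_bug.cgi?id=1",
]
for url in samples:
    print(url, "->", bug_id(url))
```

This one pattern only covers `show_bug.cgi?id=N` style URLs on the bugs/bugzilla hosts. A short URL such as the `bugzilla.wikimedia.org/555` tested at 22:51:49 does not match it, so any redirect for that form would have to come from a rule not quoted in the log.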