[00:00:02] And since they are invisible characters you are not seeing them when you use `cat` or `less` to examine the file on the linux side [00:00:05] what should I save it as? [00:00:43] tomthirteen_, I agree with Bryan, try Notepad++ [00:00:50] the answer is probably "use another text editor". Notepad++ or something similar [00:00:59] rather than whatever editor you're using right now [00:01:11] I am writing it in note++ [00:01:19] saving it as .txt [00:01:28] notepad++* [00:02:06] tomthirteen_: https://stackoverflow.com/a/8432691/8171 -- there is a "Encoding > Encode in UTF-8 without BOM" setting [00:02:07] tomthirteen_, if you open the 'Encoding' drop-down what does it say? [00:02:39] BOM stands for "Byte Order Mark" and that's the invisible character that is causing troubles [00:02:53] utf-8 [00:03:04] https://i.imgur.com/v0fNzv4.png [00:03:38] should i switch it to bom? [00:03:43] no [00:04:06] to be clear [00:04:11] this file is saved with notepad++ right? [00:04:16] it's already at utf-8 [00:04:17] you do not ever edit it with the WinSCP editor? [00:04:37] actually I've viewed the files in winscp [00:04:42] is that screwing it up? [00:04:53] might be? [00:05:33] hmmm [00:05:58] and now the correct command to run bash files is "cat" + file name? [00:06:42] tomthirteen_: reading an article like http://matt.might.net/articles/bash-by-example/ might help you with the concepts of executable shell scripts [00:07:30] `cat` is similar to the windows `type` command. It prints a file to the console [00:08:12] tomthirteen_, no [00:08:19] tomthirteen_, cat merely shows the contents of the file [00:08:34] oh [00:10:06] hmmm [00:10:26] i think it's working now but i don't have a clue what's been happening [00:11:19] i work in notepad++. I've been running scripts in python to put variables in the commands. save as text [00:11:24] sounds good? [00:11:35] should I avoid the winsc editor? [00:12:57] tomthirteen_: I haven't used winscp for a long time, but if the file you uploaded that worked had not been opened with its editor/file viewer and it worked when the others did not, then I would say yes avoid it [00:13:22] yes i think you're right [00:13:40] can i ask what do you use instead of winsc? [00:14:09] notepad++ should be able to save utf-8 encoded txt files *without* a byte order mark (BOM). That's what you want for use on the linux side [00:14:33] bd808 > so just utf-8, yes? [00:14:51] tomthirteen_: I actually use Apple's OS X for my laptop. It has command line programs for ssh and scp that I use [00:15:06] k [00:15:08] tomthirteen_: I think so, yes if the other options mention BOM [00:15:18] thank you [00:18:21] On Windows, I still use OpenSSH and SCP [00:18:45] they are available with Git for Windows etc. [00:25:32] it also doesn't seem to like if the sql enwiki_p is on a separate line [00:26:50] Ok, I "think" I got it. bd808 and Krenair, thank you SO much. this forum is always tremendously helpful. I appreciate your time and effort. [00:28:08] you could try Edit -> EOL Conversion -> Unix (LF) [00:28:18] that might solve some problems relating to line endings [00:32:13] I will try that. Did you also say that "source" was unusual for running commands? [00:35:11] Bryan did, and he's probably right [00:35:25] it's valid though [00:35:30] how should you run bash files then?
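For the invisible-character problem above, a minimal sketch of how one might confirm and strip a UTF-8 BOM (and Windows line endings) from the Linux side; the filename myscript.sh is just a placeholder:

```bash
# show the first bytes of the file; a UTF-8 BOM shows up as "ef bb bf"
head -c 3 myscript.sh | xxd

# strip a leading BOM in place, if one is present (GNU sed)
sed -i '1s/^\xEF\xBB\xBF//' myscript.sh

# convert CRLF (Windows) line endings to LF (Unix), if dos2unix is installed
dos2unix myscript.sh
```

Saving from Notepad++ with "Encode in UTF-8 without BOM" plus Edit -> EOL Conversion -> Unix (LF) avoids needing either fix.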
[00:35:38] I'm also not sure changing away from source would solve any of the problems you've experienced today [00:35:47] ok [00:36:12] normally you make the file executable, and ./name_of_file [00:36:27] this computer i'm using is on its last leg. would that have to do with it? [00:37:43] no [00:40:53] ok [11:12:25] !log toolsbeta mangle sources.list to handle some apt warnings related to missing repos, etc [11:12:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [11:12:42] !log toolsbeta mangle sources.list to handle some apt warnings related to missing repos, etc in toolsbeta-k8s-master-01: [11:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [11:17:18] !log toolsbeta install by hand some openstack client packages that puppet would refuse to install in toolsbeta-k8s-master-01 [11:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [11:45:58] !log toolsbeta T224273 create toolsbeta-k8s-master-arturo-[12] stretch VMs [11:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [11:46:00] T224273: Toolforge: develop new k8s cluster in toolsbeta - https://phabricator.wikimedia.org/T224273 [11:49:33] !log toolsbeta T224273 create `toolsbeta-k8s-master-arturo` puppet prefix in horizon [11:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [15:04:13] I have a doubt related to adding a co-maintainer for a tool. Anybody online here? [15:06:56] I am currently a maintainer for 2 tools namely templatetransclusioncheck and fireflytools. I am also working on Incubator as a test admin for two wikis namely Khowar Wikipedia and Malayalam Wikivoyage [15:09:44] All the wikis present in the incubator uses an option called localisation which works using the tool named Robin. The tool has two maintainers namely MF-Warburg and SPQRobin [15:10:54] Since I was interested in making use of that tool, I was having interest in making that tool run which led me to the idea of becoming a co-maintainer of that tool [15:12:12] When I asked to MF-Wafburg, the reply was that he know nothing about that tool and informed me that it was totally controlled by Robin. Then I sent out a message to robin for which I haven't got any replies yet. [15:13:13] I would like to know whether I can request in Phab in order to become a co-maintainer of that tool. I would also like to know whether there is any sample form which I can use for making such a request [15:14:10] If this message is seen by someone while looking through the history, please do post a reply on any of my userpages. My username is Adithyak1997 [15:16:19] Adithya: you probably want to take a look at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Abandoned_tool_policy [15:16:43] Adithya: I think that https://wikitech.wikimedia.org/wiki/Help:Toolforge/Abandoned_tool_policy is what you are looking for [15:16:55] jinx. thanks Lucas_WMDE :) [15:17:00] what bd808 said ;) [15:18:23] according to https://tools.wmflabs.org/guc/?src=rc&user=SPQRobin, SPQRobin was active less than a week ago on nlwiki, so the tool wouldn’t be considered abandoned yet IIUC [16:09:37] Hi! I'm trying to edit a hiera config in deployment-prep but it doesn't appear obvious how to do that. I imagine there should be an edit button or the like in the "Actions" column but there is nothing there. Anything else I should be looking at? 
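A small sketch of the "make the file executable, and ./name_of_file" approach mentioned above; the filename and interpreter line are examples, not anything from the log:

```bash
# the script's first line should name an interpreter, e.g.  #!/bin/bash
chmod +x myscript.sh    # mark the file as executable (only needed once)
./myscript.sh           # run it

# or, without touching the executable bit:
bash myscript.sh
```

Unlike `source myscript.sh`, both of these run the script in its own shell process rather than in the current shell.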
[16:24:07] shdubsh: you have 3 entry points in horizon: VM puppet config, puppet prefix config and project-wide puppet config [16:24:39] shdubsh: if the hiera key is already applied and you want to change the value, you would like to see where is defined in the first place, and update the value [16:26:32] it looks like the config is applied to the host (deployment-logstash03), but there are a couple of inherited configs that do not contain the key I want to update (profile::elasticsearch::common_settings) [16:27:29] check all the 3 panels I would say [16:27:50] the 3 horizon panels in red https://usercontent.irccloud-cdn.com/file/TfhZE6pB/Screenshot_2019-05-24_18-25-54.png [16:28:05] shdubsh: which key do you want to update? I kinda/sorta wrote a bit of that puppet so you can blame me :( [16:28:27] shdubsh: also, you can check https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [16:30:01] I suspect I don't have permissions to edit the config. What am I looking for? A button or a text box? [16:30:55] shdubsh: imo editing on wikitech is the least desirable place to edit, it's more annoying to get merged but hieradata/labs/deployment-prep/common.yaml in the operations/puppet repository is preferred imo [16:32:25] wikitech/horizon [16:32:33] ebernhardson: does that common.yaml sit higher in the hierarchy than the horizon configs? [16:34:07] shdubsh: hmm, i think the order of overrides has yaml's in the repository at the lowest level, so horizon should override it [16:34:29] bummer... [16:34:37] shdubsh: as to your other question, if i open up prefix puppet in horizon and scroll all the way to the bottom it has `Hiera Config`, then a text box containing yaml, then a blue `Edit` button [16:34:54] ah, I don't have a blue edit button! [16:35:45] no [16:35:53] hieradata/labs/deployment-prep/common.yaml is to be avoided [16:36:15] Krenair: why? [16:36:22] it is in the ops/puppet repository [16:36:35] where we cannot merge freely without going through ops [16:36:54] Krenair: right, meaning if something in puppet changes it can be found via grep, instead of simply breaking deployment-pre [16:37:30] there is puppet swat for getting changes merged, although ymmv [16:37:44] puppet swat rarely works [16:38:15] well i can agree there are certainly tradeoffs, i might have better luck getting puppet merged [16:46:23] shdubsh: if you want i could probably give you appropriate access to have that button. You currently have user while i also have projectadmin. But various sre also have projectadmin [16:48:27] I would like that if it's possible :) [16:48:29] shdubsh can definitely have projectadmin access [16:49:24] shdubsh: applied [16:49:38] thanks! [16:52:57] it works now, thanks again :) [18:02:51] o/ bstorm_ [18:02:53] Sorry I'm late. Ready to migrate? [18:03:11] I think so! The monitor should be downtimed now. (checks) [18:04:08] Hrm. It didn't seem to. Well I have the patch up to change DNS [18:05:03] It says the command is successful... [18:05:18] The replica seems to be up to date, though, and I've got the promote command ready [18:05:33] OK hold off one second. [18:05:54] ok ready [18:05:58] bstorm_, ^ [18:06:07] ok :) [18:06:21] Let me know when it *should* be working :) [18:06:22] This really doesn't want to downtime [18:09:30] andrewbogott: I'm trying to downtime the toolschecker that checks the rw status of wikilabelsdb...is it just not working to do that for a service in the web UI now, in your experience? 
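The "Hiera Config" box behind that blue Edit button (on any of the three Horizon panels) takes plain YAML; a hypothetical override of the profile::elasticsearch::common_settings key mentioned here, with placeholder values only to show the shape:

```yaml
# placeholder sub-keys and values, not real elasticsearch settings
profile::elasticsearch::common_settings:
  some_setting: some_value
  another_setting: 42
```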
I may just downtime the entire server [18:09:58] "fun" [18:10:28] I would expect it to work but don't know any more than that [18:10:54] And am on my way to lunch [18:10:56] I just made sure that we have no open connections -- if that is relevamt. [18:11:45] andrewbogott, come to the coffee shop and help me manage this migration ;) [18:11:53] lol [18:12:22] Ok, I'm just going to have to let it page people. The downtime script doesn't recognize that host. [18:13:16] Ok, halfak: I'm merging the patch and running the DNS script [18:13:22] kk [18:16:51] It's...broken? [18:17:34] Uh? [18:17:47] Should I reset our DB connections? [18:17:53] Nope :) [18:18:00] The dns script just blew up. [18:18:05] Which I really didn't expect [18:18:05] Rollback! [18:18:22] It's not my change that caused it [18:18:33] It's a 500 in another service [18:19:31] andrewbogott: when you are back from lunch. art.uro got this error as well at one point (saw a passing email about it) requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://cloudservices1003.wikimedia.org:9001/v2/zones/04c45c1f-214d-450b-a733-028dcdc87a12/recordsets [18:19:50] There's a circular reference error as well. [18:19:54] I wonder what that's about [18:20:21] bstorm_, can I just re-open my rw connections on the old DB? [18:20:24] wikilabels DNS is unchanged [18:20:29] Cool [18:20:46] If you want to for now, sure. I have to figure this out. To test it, I have to switch to the new server, though :) [18:21:03] The new server will work fine, you just cannot write until I run a quick command to promote it [18:22:32] The migration should take a minute once I figure this out...but this is something that broken in our DNS system. Sorry about that. [18:22:53] T224000 [18:22:54] T224000: wmcs-wikireplica-dns: error: circular reference detected - https://phabricator.wikimedia.org/T224000 [18:23:08] Looks like it was not resolved. I'll try to figure that out. [18:23:28] bstorm_, OK. For now, I'll plan to stay on the old host. Looks like we recovered just fine. [18:23:56] Yeah, there's no real change yet until the script can actually change DNS lol [18:24:15] OK let me know when you are ready to reschedule then [18:24:34] we've seen errors like that one from designate before bstorm_ [18:24:37] You probably wouldn't even notice if I was able to suddenly change it. [18:24:41] lol [18:25:28] If we just used the IP address of the server, we could proceed. The DNS was supposed to make this easier [18:27:35] halfak: if I can figure it out today, is there any possibility of doing it later on? I don't want to crash the service when it tries to write to your DB in the brief time it would go read only. [18:29:08] Krenair: I suspect there's a larger problem with the script that caused designate to blow up. [18:29:24] I'll revert my patch to change the IP of the db for now [18:29:33] And I suppose I'll work on that script today [18:29:44] bstorm_, I could work with you on this again some time in the next 3 hours, but not the immediate 30 minutes. [18:29:51] I have to get on call. [18:30:02] No worries, I doubt I'll find the error that fast anyway :) [18:30:46] Back in 30 minutes! [18:46:16] back and available to talk migration when you are ready bstorm_. [18:51:28] bstorm_, Designate should never be throwing http 500s back at our scripts [18:51:52] it is possible some error in our script triggers the faulty designate behaviour [19:05:52] Krenair: it is clear we are sending garbage to designate, though [19:06:01] From the tracebacks. 
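One way to do the "made sure that we have no open connections" check mentioned before the cut-over; this assumes psql access as the postgres superuser and that the database is named wikilabels, which is a guess based on the service name:

```bash
# list client connections to the wikilabels database, excluding this session
sudo -u postgres psql -d wikilabels -c \
  "SELECT pid, usename, client_addr, state
     FROM pg_stat_activity
    WHERE datname = 'wikilabels' AND pid <> pg_backend_pid();"
```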
[19:06:06] ok [19:06:09] So Im' sure it does :) [19:06:11] so yeah [19:06:14] problems on both sides [19:06:33] bad error handling in the client lib too, with that circular reference thing [19:06:39] So I figure, I'll fix our script and dream of upgrades to designate in the future...put in a PR if I'm every really bored. [19:06:43] *ever [19:06:57] Sorry openstack uses gerrit. s/PR/patch/ [19:07:25] It seems like pyyaml fails to load the config. Something is wrong in there as well [19:07:35] That may be the root [19:07:58] Thanks halfak :) I'm still troubleshooting python and such [19:08:07] godspeed [19:12:49] bstorm_: I'm back but have lost track of what's happening… is there still a designate issue that's blocking you? [19:14:22] Designate or script. I think it's the script. [19:14:26] But yes [19:14:29] effectively [19:14:41] I'm digging around for the error [19:15:15] First thing I do with any designate issue is make sure there either is or isn't a terminating . on the domain name [19:15:38] which script is breaking? [19:15:46] wmcs-wikireplica-dns [19:15:56] T224000 [19:15:57] T224000: wmcs-wikireplica-dns: error: circular reference detected - https://phabricator.wikimedia.org/T224000 [19:17:11] It hasn't been run since bd808 tried to add the additional cnames, maybe? But that might not be it either. I corrected a random string in an error message, but it shouldn't be trying to create names that already exist, for one [19:17:29] regardless of the incorrect log :) [19:17:44] So it's confused somewhere. I'm walking through it to find where [19:20:15] It looks like it did have a dot at the end in the resulting payload? [19:20:48] it looks like it's trying to create a record named 's6.{u'status': u'ACTIVE', u'masters': [], u'name': u'analytics.db.svc.eqiad.wmflabs.', u'links': {u'self': u'http://cloudservices1003.wikimedia.org:9001/v2/zones/04c45c1f-214d-450b-a733-028dcdc87a12'}, u'transferred_at': None, u'created_at': u'2017-09-15T23:16:54.000000', u'pool_id': u'794ccc2c-d751-44fe-b57f-8894c9f5c842', u'updated_at': u'2019-04-23T20:55:06.000000', u'email': [19:20:48] u'root@wmflabs.org', u'version': 2864, u'ttl': 60, u'action': u'NONE', u'attributes': {}, u'serial': 1556052260, u'project_id': u'noauth-project', u'type': u'PRIMARY', u'id': u'04c45c1f-214d-450b-a733-028dcdc87a12', u'description': u'long running wiki replica queries'}' ? [19:20:59] Like maybe it's formatting a whole object into the domain name [19:21:13] but also maybe that's just how that debug line looks… I'm trying to find the code now [19:21:28] Yup :) [19:21:51] There's a few things there. The traceback is not correct because the log line isn't right [19:21:59] it should not try to create that record [19:23:22] where are you running the script? [19:24:07] nm, foun [19:24:08] found [19:25:18] :) [19:25:25] I'm walking through it and debugging. [19:26:19] has this script ever worked? Either the config file handling is 100% broken or I'm misreading badly [19:26:20] Damn it, it's python 2 :-/ [19:26:22] yes [19:26:38] has it ever worked without the —zone arg passed in? [19:26:41] The config handling isn't broken entirely [19:26:45] yes [19:26:48] I've used it [19:27:04] ok :) [19:27:20] checking the file history. I think it worked before bry.an's latest patch [19:27:32] and in theory it's idempotent so it's ok for me to just run it with no args, repeatedly? [19:27:47] yes...but it is broken. So it may blow away DNS records [19:27:53] That payload looked bad to me [19:28:13] ok. 
So… probably I'm telling you what you already know… it looks to me like the code expects all_zones to contain a simple list of zones [19:28:14] So I would not run it in its current state. I'm afraid it may succeed in doing something [19:28:25] but instead it also looks like all_zones is going to contain a bunch of dicts instead [19:28:26] It does [19:28:33] not exactly [19:28:35] and then hilarity ensues [19:28:47] The dict is coming out of mwopenstackclients [19:30:22] I have not seen it work since https://github.com/wikimedia/puppet/commit/ec0d9312f6b57a1c15d0f553b357fa749afac6af#diff-f48c187908e72a7ee7f2b2be9e7edc9b [19:30:42] hm, ok, I made a test and all of the above is wrong [19:31:32] heh [19:34:34] I assume the issue in T224000 happened on repeated runs? [19:34:35] T224000: wmcs-wikireplica-dns: error: circular reference detected - https://phabricator.wikimedia.org/T224000 [19:34:47] no [19:34:54] oh :( [19:34:55] Oh well yes, it happens every time [19:35:05] It happened on the first run, though [19:36:20] This is what we've had changed since then. https://gerrit.wikimedia.org/r/c/operations/puppet/+/506019 [19:36:38] IIRC, the aliases it is supposed to create don't yet exist either [19:36:59] checking [19:37:22] Nope, it does :) [19:37:31] so that suggests it must have run once [19:37:36] ? [19:45:33] somewhere in this loop "for svc, ips in config['zones'][zone].iteritems()" the 'zone' variable is having its value modified [19:45:46] Nope, got it. [19:46:08] It's what's coming out of new cnames stuff [19:46:14] It gets processed lower down [19:46:19] but comes back around [19:47:03] Sorry, I should say "yes" not nope because that is in this loop [19:47:08] ok :) [19:47:11] I was briefly confused [19:47:11] But that's where it is happening [19:47:12] zone = find_zone_for_fqdn(dns, cname) [19:47:19] I think [19:47:29] yeah... [19:47:36] I don't know what all this is doing but I know you shouldn't modify a loop variable [19:47:38] and this does that [19:48:42] It starts happening on the first variable that hits that spot [19:48:53] So, I'll throw up a patch we can try out [19:49:27] did it look like this? https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/512423/ [19:50:00] LOL, yes something like that [19:51:19] What is bugging me is how it is quite coming back around to where the error is happening [19:51:40] Wait...I think I get it [19:54:27] That's really the only issue, I think [19:54:38] I was just being obsessive about it. [19:55:45] I'm inclined to ignore the specifics of the particular designate error since the thing we were passing in was deeply nonsensical. [19:56:14] Yeah. exactly [19:57:53] So, while I do kind of love the zonipus, how does this look? https://gerrit.wikimedia.org/r/c/operations/puppet/+/512425 [20:00:48] I don't honestly see any other changes to it that should do anythin gbad. It was just shadowing a variable in a loop (which is terribly naughty) [20:01:33] I think I'm confident enough to try running it again when puppet is done [20:02:23] Running... [20:02:40] It completes without error. Let me make sure that didn't just erase all of DNS or something [20:03:06] that commit certainly looks like it will unbreak some stuff [20:03:39] It looks great [20:03:50] DNS appears to be working as is the script [20:04:06] Thanks andrewbogott and Krenair [20:04:23] I didnt do anything [20:04:30] cool [20:04:35] You looked and reviewed my nitpick change :) [20:04:42] I appreciate that [20:04:52] :) [20:04:55] thanks [20:05:29] Ok, so! 
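A stripped-down, runnable illustration of the bug pattern just fixed: reassigning the loop variable inside the loop body, so later iterations build record names out of the wrong object. The names and data here are simplified stand-ins, not the actual wmcs-wikireplica-dns code:

```python
config = {"svc.eqiad.wmflabs.": ["s1", "s2", "s3"]}

def find_zone_for_fqdn(fqdn):
    # stand-in for the real designate lookup: it returns a zone *object* (a dict)
    return {"name": "svc.eqiad.wmflabs.", "id": "04c45c1f"}

# buggy: the assignment at the bottom clobbers the loop variable 'zone',
# so from the second service onward the record name is built from a dict,
# much like the garbled name seen in the traceback
for zone in config:
    for svc in config[zone]:
        record_name = "{}.{}".format(svc, zone)
        print("would create {}".format(record_name))
        zone = find_zone_for_fqdn(record_name)

# fixed: give the looked-up zone object its own name, so 'zone' stays intact
for zone in config:
    for svc in config[zone]:
        record_name = "{}.{}".format(svc, zone)
        print("would create {}".format(record_name))
        cname_zone = find_zone_for_fqdn(record_name)  # used for the cname handling
```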
halfak: how much of a stomach have you got for a DB migration at this point? [20:05:43] Enough. Want to do it now? [20:06:46] How about in maybe 10 minutes? The bus is turning onto a dirt road in the middle of some mountains in Nevada. I don't want to be bounced into the keyboard. [20:07:38] Sure. That works for me [20:07:52] Also, you must have a strong cell receiver! [20:07:57] I remember trying to do some work on a bus once [20:08:07] not fun [20:08:17] and that was on proper roads [20:08:36] I do, halfak :) I use a radio amplifier and large antenna on the roof [20:09:11] I've been on a highway this whole time. [20:09:51] Roads that are "maintained" by the Bureau of Land Management are harder to type safely on. [20:13:00] Krenair: proper road? As in the A roads or the motorway? [20:13:17] both [20:13:42] certainly not dirt roads [20:13:49] Heh [20:14:01] what about B roads? [20:17:22] Ok, managed to get the patch back up: https://gerrit.wikimedia.org/r/c/operations/puppet/+/512428 [20:17:36] idk [20:18:21] I'll have to be quick since I hate setting off alarms, and I cannot quiet the toolschecker [20:18:39] halfak: we are in camp now (and not moving), so I'm ready when you are [20:19:03] OK let's do it. [20:19:10] Let me know when we go read-only. [20:19:34] merging. [20:19:43] You should end up that once I've run the script [20:20:56] * bstorm_ waiting for puppet [20:22:14] * bstorm_ running the script now...which will take a few moments for DNS to propagate, etc. [20:23:38] I should probably have used the --zone arg and -v [20:24:02] Last time I did this the change seemed quicker [20:24:10] I still see the old value in DNS [20:24:36] andrewbogott: if you are around, is there a way to confirm if designate saw a change? [20:25:30] probably. Like, want me to just check the api logs? [20:25:59] That'd be great [20:26:03] what zone am I looking for? [20:26:13] wikilabels.db.svc.eqiad.wmflabs [20:26:31] https://quarry.wmflabs.org/ Quarry is down? 502 Bad Gateway – Sadly Quarry is currently experiencing issues and is unavailable. Please try again later. [20:26:57] somewhere in designate will be a zone for that [20:27:07] but I think it might be one of the noauth-project special things [20:27:14] might not be able to get to it with novaobserver [20:28:06] hm… bstorm_ the logs are much quieter than I expected. I'll dig in the db [20:28:25] It seems to have not changed, which is surprising to me so far. [20:28:40] I can run the script with debug logging. [20:29:35] it's there [20:29:38] it's just not synced to ns0 yet [20:29:55] $ dig wikilabels.db.svc.eqiad.wmflabs @cloud-ns0.wikimedia.org +short [20:29:55] 172.16.5.119 [20:30:03] $ dig wikilabels.db.svc.eqiad.wmflabs @cloud-ns1.wikimedia.org +short [20:30:03] 172.16.3.117 [20:30:16] that sync between nameservers is always the slowest part of changes made in designate [20:31:05] Ok! [20:31:19] So in that case, it went through and is just taking ages [20:31:34] halfak: you are read only now [20:32:03] stopping postgres on the primary and promoting the replica [20:32:11] $ dig wikilabels.db.svc.eqiad.wmflabs @cloud-ns0.wikimedia.org +short [20:32:12] 172.16.3.117 [20:32:14] done now [20:32:53] Confirmed we're not able to write [20:33:13] bstorm_, I'm ready to kick wikilabels when you say go. [20:33:39] It's up [20:33:48] it should be able to write now [20:34:20] halfak: ^^ [20:35:51] halfak: I see a mistake I made. Adding your pg_conf stuff [20:35:59] Ok. 
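A sketch of the "stopping postgres on the primary and promoting the replica" step above. The log does not say which PostgreSQL version or cluster name is in use, so the 9.6/main values and paths below are assumptions:

```bash
# on the old primary: stop postgres so nothing else can write there
sudo systemctl stop postgresql@9.6-main

# on the replica: promote it from read-only standby to read-write primary
sudo pg_ctlcluster 9.6 main promote
# roughly equivalent, using the low-level tool directly:
# sudo -u postgres /usr/lib/postgresql/9.6/bin/pg_ctl promote -D /var/lib/postgresql/9.6/main
```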
Was just going to say :) [20:37:38] That should be fixed now [20:38:22] Sorry [20:40:15] halfak: how's it look now? [20:40:52] The last log msg I have to go on is that it reread config when I realized I forgot a very important step :-p [20:41:07] Looks good bstorm_ [20:41:15] 🎊 [20:41:28] Great. I should be able to set up the replica whenever without impact. [20:41:57] enabling puppet [20:42:41] Thanks [20:43:12] It looks like we're good to go [20:43:15] Wurgl: oops did I kill it? [20:43:42] what is going on [20:44:06] Great! Looks like the prometheus exporter doesn't work right, but I can fix that later [20:44:16] That probably was true of the other server as well [20:44:31] Considering its a missing role, and this is a replica [20:44:33] zhuyifei1999_: I have no idea, not my project. Just a dumb user [20:45:02] bstorm_, could the wikilabels DB move have any impact on quarry? [20:45:08] I do not know what that DB does [20:45:10] It shouldn't [20:45:23] Plus, that was down before I moved it [20:45:29] ok [20:45:44] Quarry would be more impacted by replica DNS not working or something... [20:45:49] I seem to still have some quarry admin access, lets see [20:45:58] zhuyifei1999_: It seems to be up [20:46:10] I just started it [20:46:14] oh [20:46:14] yep [20:46:16] there it is [20:46:38] Thanks [20:47:00] the journal is like [20:47:02] https://www.irccloud.com/pastebin/a6lfdmX1/ [20:47:09] no idea what is going on [20:47:19] like why systemd stopped the service [20:48:11] huh [20:48:29] Looks like puppet thought that was correct? [20:48:44] Still everything ok halfak? Just one more check before I go and eat [20:48:59] * zhuyifei1999_ tries a manual puppet run [20:49:35] yep Notice: /Stage[main]/Uwsgi/Service[uwsgi]/ensure: ensure changed 'running' to 'stopped' [20:51:11] !log quarry disabled puppet on quarry-web-01 because it wants uwsgi dead [20:51:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [20:51:49] bstorm_: any ideas what could be the reason? [20:52:28] bstorm_, still good! [20:53:03] Great! [20:53:38] zhuyifei1999_: not off the top of my head unless it's controlled by hiera and that was broken at the time [20:54:47] Looking at the puppetizatin [20:54:53] *puppetization [20:54:59] it always stops the uwsgi service [20:55:25] this is the profile that the web server runs https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/quarry/web.pp#L12 [20:55:53] Thanks just found it [20:55:56] looks ok [20:56:16] zhuyifei1999_, what about the uwsgi-quarry-web service? [20:57:07] huh both uwsgi and uwsgi-quarry-web are running [20:57:23] oh I'll stop the former and restart the latter [20:58:06] Maybe that's it? 
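A quick sketch of the cleanup being described: stop the stray generic uwsgi unit, make sure the app-specific one is running, and (as done shortly after this) mask the generic unit so nothing starts it again:

```bash
sudo systemctl stop uwsgi                  # stop the stray generic unit
sudo systemctl restart uwsgi-quarry-web    # keep the app-specific unit running
sudo systemctl mask uwsgi                  # prevent the generic unit from being started again
```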
[20:58:37] I notice this is using base::service_unit instead of systemd::service, that's sort of dangerous in terms of where things get slashed by the SRE team :) [20:58:53] But that likely doesn't explain this [20:58:59] yep thanks Krenair and bstorm_ [20:59:10] np [20:59:24] Ok, now I'm going to eat :) [20:59:26] !log quarry reenabled puppet on quarry-web-01, should use uwsgi-quarry-web service not uwsgi service [20:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [21:00:17] !log quarry masked uwsgi service on quarry-web-01 to prevent future mess-ups [21:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [21:00:39] !log clouddb-services T224062 Moved wikilabels postgres db to clouddb-wikilabels-01 [21:00:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL [21:00:41] T224062: clouddb1002 low on space -- move wikilabelsdb - https://phabricator.wikimedia.org/T224062 [21:07:26] halfak: does https://gerrit.wikimedia.org/r/#/c/analytics/quarry/web/+/512420/1/quarry/web/output.py look good to you? [21:09:04] zhuyifei1999_, what exactly are we catching here? [21:09:15] Looks like we're matching things that start with a tab. [21:09:19] But I'm not following the rest [21:09:34] CSV injection. https://phabricator.wikimedia.org/T209226 [21:11:02] the first part catch those that are already tab-prepended. the second part catch those that don't need tab-prepending. [21:11:35] tab-prepending is needed when it starts with one of the affected CSV injection characters [21:12:00] the "not (element and ..." is super confusing/ [21:12:07] ik [21:12:24] would not element or not... be better? [21:12:45] if : do thing to escape [21:12:48] else: don't do thing [21:12:49] * zhuyifei1999_ feels like element and blah... is better and `not (...)` [21:12:51] k [21:13:51] What if the value is "===Foobar==="? [21:14:09] Are we going to put a weird tab in there because spreadsheet docs read that weird? [21:14:15] *spreadsheet apps. [21:14:27] There must be a better way to escape. [21:15:24] In the end, this doesn't escape really. It mangles. [21:15:32] well, yeah [21:15:41] do you have a better solution? [21:16:08] Hmm. [21:16:23] How do you make a literal "=" for excel? [21:16:42] is that possible? [21:17:30] It looks like google spreadsheets understands "'=" to be a literal "=" [21:17:39] That's a single tick (apostrophe) [21:18:07] It's such a messy hack [21:18:19] Stupid spreadsheet apps are getting in the way of clean data. [21:19:37] I've got to run away. If you want to stick with tabs, that's cool zhuyifei1999_. [21:19:46] what about other spreadsheet apps? [21:19:52] Ping me if you want another review and I'll aim to get it done this weekend. [21:20:00] zhuyifei1999_, sorry no idea. [21:20:03] libreoffice calc and and microsoft excel [21:20:08] ok [21:20:31] I just need a way to have literal '=1+1' that doesn't show up as 2 [21:20:45] and works across all spreadsheep apps [21:20:55] *spreadsheet [21:21:18] Gotta run. I'll talk to you later [21:21:19] o/ [21:21:27] ok byw [21:21:29] *bye [21:54:10] bstorm_, did the designate logs show anything useful in https://phabricator.wikimedia.org/T224000 ? [21:54:26] is there some known way to reproduce that designate http 500? [21:54:52] !log codesearch maintenance to reduce disk space [21:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL [21:57:11] Not to my knowledge. I'm not at all worried though. 
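Back on the CSV-injection question from earlier: a minimal sketch of the tab-prepending approach under review. The helper name and the exact set of trigger characters are illustrative, not Quarry's actual code:

```python
FORMULA_TRIGGERS = ('=', '+', '-', '@')  # illustrative set of formula-starting characters

def escape_csv_cell(value):
    # Prepend a tab so spreadsheet apps read the cell as text rather than
    # evaluating it as a formula. This mangles the raw value slightly,
    # which is the trade-off being discussed above.
    if isinstance(value, str) and value.startswith(FORMULA_TRIGGERS):
        return '\t' + value
    return value

print(repr(escape_csv_cell('=1+1')))          # '\t=1+1', shows up as text instead of 2
print(repr(escape_csv_cell('===Foobar===')))  # also mangled, the concern raised above
print(repr(escape_csv_cell('plain text')))    # unchanged
print(repr(escape_csv_cell(None)))            # non-string values pass through untouched
```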
GIGO applies. [21:57:42] Plus, as long as we are using old versions of everything in openstack, I'm entirely happy to blame that for a lot of bad bug handling [21:57:47] I know the Openstack team will [21:57:48] :) [21:58:37] We are 100% confident that we were sending garbage to designate. I only expect garbage back from that (even if APIs should be more resilient than 500). [21:59:00] I'm just glad it didn't try to somehow insert the garbage into a DNS record! [22:12:28] bstorm_, designate has a bug here [22:12:54] I believe that, but we are using an unmaintained version of openstack [22:13:02] if we dont report it on the basis of being out of date, we must check an up to date version for it [22:13:19] otherwise it might never get reported and fixed [22:14:30] I am not upset by the fact that it threw a 500 either. I'd much rather that than create a faulty dns record. Nobody takes bug reports for EoL software unless you can reproduce it on a modern version. [22:14:50] right [22:15:02] which is why I think we should test it on a modern version [22:17:38] And I still won't mind a 500 if I send it a random pile of nonsense as a name. [22:17:41] Which is what we were doing [22:18:00] To top it off, we are working on python 2. I wonder if OpenStack signed the pledge... [22:18:42] Nope, they haven't yet :) [22:18:55] So theoretically, they may still support python2 [22:19:40] doesn't really matter [22:20:15] that sort of thing is why we'd test on an up to date install [22:21:10] what was the garbage we were sending designate? [22:21:21] A dict for a name [22:21:31] It wasn't even json [22:21:36] It was a dict [22:24:14] I mean, I'm not against troubleshooting, but I don't have time to dig into it. Also, if it only happens on an error--and error that isn't graceful in an error condition is hardly the worst thing in OpenStack. There are loads of APIs out there that will 500 if you send them garbage. [22:24:40] *an error... [22:24:54] Because it is correct that it cannot handle what was sent to it [22:26:52] The traceback was a bit interesting because keystone middleware having circular references in it during some instances is a bug that is still open on OpenStack: https://bugs.launchpad.net/designate/+bug/1605331 [22:27:13] BWAHAHAHAA, just saw the submitter of that bug [22:27:45] Krenair: I think you may have already reported this bug 😉 [22:28:39] No idea if it still exists, though. [22:30:20] lol [22:30:29] but if I could not reproduce that bug on mitaka [22:30:36] why are we seeing it on mitaka here? [22:31:27] It's probably a new manifestation...not exactly the same or something. [22:33:26] If you look at the top of the ticket at the warning log, it says it's creating a record for .(somedict object that represents a zone). It tried to create that at some point. The methods it used are in the mwopenstackclients.py file if you want to try to recreate that locally. [22:38:36] I should also add that the designate logs are literally silent on the matter :-/ [22:40:26] alright, thanks [22:55:28] bstorm_: sorry I left a typo/name collision for you to trip over in that script :/ [22:55:47] No worries. Python is one of my happy places :) [22:56:12] I was thinking DB maintenance, so it mostly took me a long time to context shift. [22:56:42] Better to find it now than during a much worse meltdown. 
[22:56:48] * bd808 should do more testing and less code and deploy cycles [22:57:02] I am teh hax0r [22:59:49] https://twitter.com/LEGO_Group/status/1131922873222868992 [23:01:02] * bd808 is here for new space LEGO® sets [23:01:10] pinged! [23:01:16] spaaaaaaaaaccceeeeeeee [23:01:40] Reedy, hey it's the openstack logo [23:02:05] :D [23:06:18] !log clouddb-services T224318 cloudb-wikilabels-02 is now a replica of clouddb-wikilabels-01! [23:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL [23:06:21] T224318: Set up clouddb-wikilabels-02 as postgresql replica - https://phabricator.wikimedia.org/T224318 [23:37:15] @log clouddb-services T224062 clouddb1002 is now free of postgresql services and the volume is reclaimed for toolsdb use [23:37:15] T224062: clouddb1002 low on space -- move wikilabelsdb - https://phabricator.wikimedia.org/T224062 [23:37:55] Oops [23:38:04] !log clouddb-services T224062 clouddb1002 is now free of postgresql services and the volume is reclaimed for toolsdb use [23:38:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL [23:40:00] I feel like these steps for running devstack on top of stretch may belong in some docs somewhere: https://phabricator.wikimedia.org/T204013#4639071 [23:40:18] Upstream ones are for Ubuntu 16.04 [23:41:28] Krenair: I have literally never gotten devstack to work [23:41:49] never spent the time to stand on the correct foot and hum the right tune I guess [23:42:33] Krenair: but yeah, you could put up a page on wikitech about how to get it to work [23:43:40] I just went back and tried to use them [23:43:47] and ran into errors [23:43:52] :( [23:52:24] AH00534: apache2: Configuration error: No MPM loaded. [23:52:40] and yet: [23:52:41] krenair@labs-t224000-alex-osdev:~/devstack$ grep mpm /etc/apache2/mods-enabled/ -R [23:52:47] /etc/apache2/mods-enabled/mpm_event.conf: [23:52:55] /etc/apache2/mods-enabled/mpm_event.load:# Conflicts: mpm_worker mpm_prefork [23:53:01] /etc/apache2/mods-enabled/mpm_event.load:LoadModule mpm_event_module /usr/lib/apache2/modules/mod_mpm_event.so [23:58:19] looks like apache2.conf was mostly wiped out so mods-enabled was not getting used
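On the "No MPM loaded" dead end at the very end: if /etc/apache2/apache2.conf has been mostly wiped, the mods-enabled/ symlinks (including mpm_event) are never read, which matches the symptom. Roughly these are the include directives a stock Debian apache2.conf carries; restoring them should make the enabled MPM load again:

```
# from a stock Debian /etc/apache2/apache2.conf
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf
IncludeOptional conf-enabled/*.conf
IncludeOptional sites-enabled/*.conf
```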