[04:18:55] mutante: yeah [04:38:55] If something _can_ be run on Toolforge, does that mean you cannot have a Cloud VPS for said thing? [04:44:40] davidwbarratt: just because it can doesn't mean it should [04:44:51] haha, well that's what I like to hear. :) [04:45:11] I was just wondering if that was a requirement before a project can be created [04:45:17] if there are some legit reason why it shouldn't / can't feel free to file a ticket [04:47:40] zhuyifei1999_ thanks! [04:47:54] (np) [06:23:18] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454749 (owner: L10n-bot) [08:53:34] jynus: any comments regarding T202549 ? [08:53:35] T202549: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 [09:08:12] arturo: yes, that using @jcrespo instead of #dba, is the best way to get tasks ignored ;-) [09:08:28] :-) [09:13:01] jynus: just fixed it [10:11:51] hi guys [10:12:14] i found that german wp replica asks for permissions in some table (externallinks) [10:12:16] https://pastebin.com/B1tK17tz [10:50:08] MariaDB [dewiki_p]> select 1 from externallinks limit 1; [10:50:08] ERROR 1356 (HY000): View 'dewiki_p.externallinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them [10:50:11] very interesting [10:50:33] wonder if there is some problem between the actual dewiki schema and maintain-views [10:50:42] this is fine on enwiki_p [10:53:02] !help Does labs have some sort of SpamAssassin/EXIM spam filters for their tools.wmflabs.org mail addresses, and if so, where's that configured? Cfr. T202558. Thank you. 
[10:53:02] Hauskatze: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [10:53:03] T202558: Ban spam arriving to my tools email - https://phabricator.wikimedia.org/T202558 [10:53:50] Hauskatze: I'm not sure, I would like to discuss that with the team [10:54:31] arturo: please do, it's annoying to be receiving this level of spam [10:57:35] gracias arturo [10:58:37] made visible to you both pastes [10:58:42] full email headers [10:59:20] with regards to the externallinks view [10:59:44] the view definition between enwiki_p and dewiki_p looks identical [10:59:57] can someone check the table definitions in enwiki and dewiki? [11:09:17] Krenair: perhaps ping jynus [11:10:44] arturo- your team maintains the views [11:10:53] I have nothing to do with them [11:23:48] jynus, so does the table definition differ between enwiki and dewiki or not? [12:08:08] Krenair: They shouldn't differ, but it can happen if something is updated between runs of the view script. I can take a look [12:08:44] well the view is identical [12:08:49] what I'm interested in is the underlying table [12:09:02] jynus: This is actually about the table, not the views, but I can check it myself anyway :) [12:09:31] Is on the views servers anyhow [12:10:29] Krenair: The original tables are different [12:11:05] https://www.irccloud.com/pastebin/eBSycuKG/ [12:12:10] so the problem is that dewiki's view includes a field el_from_namespace that doesn't exist in the actual table [12:12:43] https://phabricator.wikimedia.org/T86415 [12:13:07] Yeah, that's a database issue, and I wonder if they actually differ in prod. [12:13:46] Oh yeah, that ticket definitely applies [12:13:51] lol [12:16:20] bstorm_, so your paste is from a labsdb server, what does s5-master have for dewiki.externallinks? [12:16:23] Weirder is that it doesn't match its view definition. 
I can update the view, but why wouldn't it have the column? [12:16:33] Checking [12:17:07] don't worry about the view for a minute I think the problem might be either a schema difference between servers, or an actually missing column on dewiki.externallinks [12:18:41] It's missing in production [12:18:52] Just confirmed [12:18:56] so [12:19:06] somehow the whole of dewiki is missing that field [12:19:11] If you want to write a ticket, you can mention that I confirmed it. [12:19:19] Otherwise, I can create one. [12:19:27] But yes, dewiki is missing it [12:19:38] What I would've done in the past at this point is check what other wikis have the field, and which do not have the field [12:19:41] via information_schema [12:19:42] Which is strange because the view shouldn't look for it [12:19:57] The view isn't a custom one. [12:20:05] Which makes it seem like the column was dropped [12:20:20] We don't define that view by hand. It's a full view [12:20:24] does maintain-views check for that and ignore missing columns? [12:20:44] oh or was it one of the ones where it just provides a 100% complete view of the table? [12:20:50] On a full view, it just presents the table as a view, reading columns in and spitting them out [12:21:05] For custom views, they are managed manually and this would make sense [12:21:08] yeah that does sound like it got dropped then [12:21:16] It does [12:21:28] The question becomes....was that intentional? [12:21:29] https://phabricator.wikimedia.org/T114117 [12:21:29] :) [12:21:52] okay [12:21:57] Ahah! It was Manuel. We can blame him. [12:22:05] This will break views, though. [12:22:19] so either we run maintain-views each time they make progress there [12:22:29] or we temporarily make it a custom view with only the fields that should exist everywhere [12:22:33] We should I think. 
[12:22:37] rerun it [12:22:41] I'll add a note in there [12:22:44] second one might be easier [12:24:45] Less work at least :) [12:24:55] It's "easier" to rerun the script every now and then. [12:25:14] No real thought or chance of blowing away every tool lol [12:25:38] But yeah, a custom view that just nulls the column resolves things from my end. [12:26:00] hm [12:26:03] could null it [12:26:10] but might be better to leave it out entirely [12:28:52] Depends on how they do the work. If it's per shard, reruns may be practical enough. It takes me a minute. [12:29:05] Unlike major table changes where I need things depooled. [12:31:37] I only even consider doing reruns because temporary view changes are very "tech debt" ish :) [12:33:35] bstorm_, actually marmick!~marmick@44.red-79-148-227.staticip.rima-tde.net found it [12:33:40] Left a comment in there to coordinate either way [12:33:44] Ok thanks :) [12:36:16] can't find their wikitech info right now... [12:36:21] :-/ [12:38:27] marmick: if you want me to subscribe you to the task, let me know what your phab handle is. [12:38:48] not online. Oh well :) [12:41:24] that looks like an spanish ISP address [12:58:49] Those views/tables are now resynced, and the DBA team will let me know when they are making the changes. I decided I didn't want to change the definition because I'll forget and turn it into random tech debt, and the real problem is getting a full S3 run squeezed in between table locks (which means I'll have to have them depooled by the DBAs, usually). 
[13:08:06] !log admin T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.254,end=10.64.22.254 e4fb2771-a361-4add-ac4e-280cc300c59f` [13:08:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [13:08:09] T202115: cloudvps: eqiad1: review floating IP mechanisms - https://phabricator.wikimedia.org/T202115 [13:09:05] !log tools.openstack-browser Added multi-region checks to nova lookups -- yesterday [13:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser/SAL [13:10:28] !log admin T202115 (was `{"start": "10.64.22.2", "end": "10.64.22.254"}` ) [13:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [13:15:13] !log admin T202115 `root@cloudcontrol1003:~# neutron subnet-update --allocation-pool start=10.64.22.4,end=10.64.22.4 e4fb2771-a361-4add-ac4e-280cc300c59f` [13:15:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [13:15:15] T202115: cloudvps: eqiad1: review floating IP mechanisms - https://phabricator.wikimedia.org/T202115 [16:17:42] !log admin T188589 bstorm_ merged patch to reduce nova DB connection usage [16:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:17:44] T188589: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 [16:18:49] zhuyifei1999_: ok :) i think if i convert it to a profile first then your change will not get the jenkins downvote [16:23:05] thanks mutante :-) [16:30:03] mutante: ok. what's the difference between 'include', 'require', and a class declaration? [16:31:24] I'll be on a plane (then a bus) tomorrow to the US so will have a very long offtime [16:31:32] zhuyifei1999_: "include" says nothing about the order of things.. 
"require" says A must come before B [16:31:51] ok [16:32:38] and for include/require compared to declaring a class: [16:32:39] https://puppet.com/docs/puppet/5.3/lang_classes.html#include-like-vs-resource-like [16:32:55] when declaring the class you get the "resource-like" behavior [16:34:14] so it seems that using include/require is superior? [16:34:35] (if the class declaration is empty) [16:36:24] well. it depends.. but yea, if you dont need parameters you can just include and you dont run into duplicate declaration issues if you do it twice in 2 locations [16:37:10] the thing about the quarry module is that it includes stuff from another module inside a module [16:37:29] that's just supposed to happen in profiles [16:37:32] ok [16:37:51] yeah jy.nus explained about it already [16:38:12] (about the profile thing) [16:38:12] ok, cool [16:39:36] zhuyifei1999_: are you still online for a couple minutes? we could merge that now i think .. [16:39:51] the prerequisite i mean [16:39:53] yeah, wait. saw some talk page messages [16:41:17] ok, let's do this [16:43:52] should the other 'stuffs' in the module (like redis, celeryrunner, and querykiller) be changed to profiles as well? [16:46:27] btw, any comments on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/451698/ , besides it doesn't use profiles and will soon need rebasing? [16:49:53] mutante: [16:50:41] zhuyifei1999_: i haven't seen that change yet. but executing pip isn't recommended [16:53:01] is there any other way to set up a virtualenv in puppet in cloud? [16:53:01] zhuyifei1999_: can you check puppet on the relevant instances after i merge? [16:53:01] ok [16:53:01] i dont know that last question [16:53:01] but let's do one thing at a time [16:53:01] just a sec [16:53:01] the database and web are both quarry-main-01. /me is ssh'ed in [16:56:21] ok, i am merging the first one [16:56:56] ok [16:56:59] done.. 
but i thikn there might be some delay between merging it on prod puppetmaster and you seeing it [16:57:04] but try running puppet [16:57:34] (doing) [16:57:44] btw, if you look at the jenkins output from gerrit: [16:57:46] https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/26765/console [16:57:58] and then search it for "wmf-style" [16:58:04] "wmf-style: total violations delta -1 [16:58:06] https://www.irccloud.com/pastebin/vqWXcxMS/ [16:58:19] Resolved violations: wmf-style: class 'quarry::database' declares class mysql::server from another module [16:58:22] see that [16:58:38] yeah. thanks :) [16:59:20] hmm.. so i'm not sure how long it takes to sync the masters.. keep forgettting [16:59:35] wonder if we should merge all 3 of my changes [17:00:10] sure [17:00:50] this /data/project/quarry is being really weird though (probably related to NFS maintenance?). I guess I'll do a recursive chown. [17:01:14] (they are all owned by '998') [17:04:18] arturo: bstorm_ hey, is there a way to get real IPs behind requests to cloud VPS? someone is using ores.wmflabs.org a lot (instead of ores.wikimedia.org) and I want to contact the person (the user agent is pretty generic) [17:04:51] by a lot I mean 200k times a day [17:05:01] hmm 998 is the id on the runners [17:05:38] When I'm back I can check at the proxies, perhaps [17:05:50] Thanks [17:06:05] zhuyifei1999_: i think i'll merge more and ask you to run puppet every 5 min until you see a change :) [17:06:13] ok [17:07:49] zhuyifei1999_: i wonder about the " require => [Class['::labs_debrepo']]," i wonder if it uses the "labs_debrepo" stuff [17:08:11] replaced it with require ::labs_debrepo in case it does [17:08:23] I honestly don't even know what that is [17:08:29] let me check [17:08:39] it's a local APT repo [17:08:48] but maybe it was removed globally [17:09:18] also my next change i was about to merge.. it would affect different instances.. 
the ones for the quarry "web" role [17:09:54] oh my [17:10:12] so that is how all the python packages are installed [17:10:30] yes web is still quarry-main-01 [17:12:28] ok so keep labs_debrepo. It's still in use in the current code (but after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/451698/ it will be no longer used) [17:13:47] yea.. keeping it! ack [17:14:15] are you also on the "web" instances.. or should we only touch "db" now [17:14:39] yes [17:14:49] I mean on quarry-main-01 [17:14:57] or are they the same and using more than one role per instance [17:15:02] yes [17:15:17] ok :) so that's another thing. ideally a role is one thing [17:15:17] well, main-01 is web, redis, db, and killer [17:15:21] that can only happen once per instance [17:15:27] but one role can include many profiles [17:15:32] ok [17:16:05] well, the instance assignment is more dynamic right now [17:16:07] merging the "web" change [17:16:10] k [17:16:45] is there only one type of quarry intsance ? [17:16:47] I think I should disable auto puppet runs so I can catch changes right? [17:16:49] using all the roles at once? [17:17:03] there are also two runners [17:17:08] well.. if there is an error we will see it on the next run as well [17:17:13] that does celeryworker [17:17:25] so how many different types do we actually have? 2? [17:17:32] everything else is currently on main [17:17:34] yes [17:17:42] ok, so in the end we should have 2 roles [17:17:51] including the profiles [17:18:02] in the 'big quarry migration' we want to have 3 or 4 [17:18:14] so separating redis and db from main [17:18:25] yea, we can have many roles that's fine. but each instance should be one or the other [17:18:30] *nod* [17:18:33] (and might want to have a project-local puppetmaster) [17:19:24] I just did a puppet run and don't see anything interesting [17:19:38] ok :) [17:19:51] wait [17:19:55] well, i have one more.. the one for redis [17:20:04] did you get errors now because masters synced ? 
:P [17:20:06] https://www.irccloud.com/pastebin/nKtAtHjv/ [17:20:13] yes [17:20:41] awww.. i see.. i wish we could compile these before merging [17:20:51] almost expected [17:21:10] so that was the "web" change. the "db" change is fine then [17:21:22] yeah [17:21:24] will upload a fix in a moment [17:21:27] ok [17:31:57] zhuyifei1999_: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454862/ but jenkins hates it [17:32:00] " role 'role::labs::quarry::web' includes quarry::base which is neither a role nor a profile" [17:32:15] so here we have an example for difference between include/require and class declaration .. [17:32:27] you get those duplicate delcaration errors [17:33:04] and.. jenkins hates this too because it's a cycle.. quarry::base would also have to be a profile for this to be ok.. but one has to be first [17:33:18] it's messy [17:33:26] why can't modules be included? [17:34:08] https://wikitech.wikimedia.org/wiki/Puppet_coding#Modules [17:34:48] must not use classes from other modules ... per "represent basic units of functionality and should be mostly general-purpose and reusable across very different environments." [17:35:11] ok. I guess we should make base a profile then [17:35:22] guess so,, yea [17:35:38] if it's ok to leave this as it is for a few minutes. i would get food and continue [17:37:09] i can also override jenkins vote on this one and then follow-up fixing "base" [17:41:03] yea, i did that. i merged it ignoring jenkins (usually dont do this) but that should fix the issue you saw on puppe trun [17:41:12] and then i will make yet another fix for base [17:41:20] will be back soon [17:41:40] ok [17:56:49] zhuyifei1999_: is puppet run fixed ? 
[17:57:47] yeah [17:58:05] :) ok [18:03:48] (PS1) Ladsgroup: Use ORES in production instead of the cloud VPS setup [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/454866 (https://phabricator.wikimedia.org/T202653) [18:16:59] zhuyifei1999_: doing this now https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454870/ [18:17:15] ok [18:18:37] such a rabbit hole: https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/26774/console [18:18:44] Parameter 'clone_path' of class 'profile::quarry::base' has no call to hiera [18:18:51] Parameter 'result_path_parent' of class 'profile::quarry::base' has no call to hiera [18:19:02] profile 'profile::quarry::base' includes non-profile class redis::client::python [18:19:05] it doesnt stop [18:19:16] legacy code *facepalm* [18:19:17] and all of this was there before, it just notices when you touch it [18:19:29] or jenkins was overridden before [18:20:04] were these checks in place when quarry was built? [18:20:20] I remember profiles being a somewhat recent thing [18:20:26] this change fixes 2 violations [18:20:30] but introduces 4 [18:20:38] so the "delta" is 2 .. so it votes -1 [18:21:21] fair point, yea, that's possible it is older than the checks [18:23:00] so we are supposed to move those values to Hiera.. let's see _where_ in Hiera though [18:24:54] I'm pretty sure the current values are all default values [18:31:13] zhuyifei1999_: yea, it wants us to put them in Hiera and not even have defaults at all... [18:31:23] but then we have Hiera in repo or Hiera in wiki.. [18:32:43] why can't we put the defaults in profiles? [18:32:46] and in prod we do hiera by puppet role.. but hieradata/labs doesnt have ./role/ [18:32:52] it has project names [18:33:42] I could make hiera on wiki, but to me it looks like unnecessary redundancy [18:35:11] hmm. maybe the style guide changed about that. 
i cant find the "no defaults" comment right now [18:36:13] i'll do hiera in the repo [18:36:45] project-wide [18:37:00] ok thanks [18:46:51] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454870/ now it solves 2 and adds just 1 new one. so delta -1, so it likes it [18:48:25] merging it [18:49:49] ok :) [18:56:16] mutante: [18:56:20] https://www.irccloud.com/pastebin/lHypH92Z/ [18:56:34] i think i already know.. looking [18:56:41] ok [18:57:01] yes:) fix is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454889/ [18:58:57] I guess these two should be made into profiles [18:59:12] well, lol [18:59:13] class 'quarry::celeryrunner' includes profile::quarry::base from another module [18:59:21] class 'quarry::querykiller' includes profile::quarry::base from another module [18:59:23] (such a rabbit hole) [18:59:27] i was about to say they dont have to .. but they do [19:02:12] ok, have to power through it at this point. merging the one for redis.. then i'll amend to do the last 2 in one step.. then hopefully done [19:02:46] ok [19:11:52] andrewbogott: I'm having issues writing files on my VPS [19:12:09] Everytime I try to upload to my VPS I get an error returned. [19:12:17] Claiming the file write failed. [19:12:23] And it's not a permission error. [19:12:28] Any chance your disk is full? [19:12:37] It shouldn't be [19:13:05] what VM? [19:13:13] cyberbot-exec-iabot-01 [19:13:36] its drive is full [19:14:02] Holy shit [19:14:07] how did that happen [19:14:23] :/ [19:14:35] run du [19:15:51] zhuyifei1999_: what units are the numbers in? [19:16:00] mutante: I'm getting really sleepy. 
let's continue next week after I get to the US [19:16:15] Cyberpower678: du -h for human-readable units [19:16:50] du -h returns 375M total [19:17:05] zhuyifei1999_: 1 minute :) [19:17:13] um ok [19:17:15] merged [19:17:24] Cyberpower678: du -h counts working directory [19:17:37] zhuyifei1999_: I know [19:17:37] unless you specify a path yourself [19:17:40] I'm in the root [19:17:49] For my bot [19:17:54] do you have a lot of opened but deleted files? [19:18:10] How are files that are deleted still open? [19:18:24] IABot should close all handles before removing the files. [19:18:29] du -h / [19:18:54] paladox: 3.9G [19:19:03] Did you mount any paths? (Ie to add the extended storage) [19:19:10] Cyberpower678: run $ sudo lsof [19:19:15] zhuyifei1999_: i merged all my changes. now it would be time to rebase your original one that started it [19:19:25] paladox: no [19:19:31] zhuyifei1999_: i will do that if you want.. you dont have to wait [19:19:36] where PID is the process that may be having opened filed [19:19:40] 3.9G? I thought the total is 19G? [19:19:40] zhuyifei1999_: but maybe if you can check puppet one last time now [19:19:56] paladox: it should be [19:20:09] mutante: ok thanks. I'll do the rebasing in the weekend / next week [19:20:25] zhuyifei1999_: I wouldn't know. There are 23 processes open [19:20:44] though, I think puppet has just updated and still broken [19:20:48] If IABot can't write to disk, it just defaults to holding the stuff in memory [19:20:55] https://www.irccloud.com/pastebin/KgwN5nte/ [19:20:57] mutante: ^ [19:20:57] Maybe try [19:20:59] find / -type f -size +1024k [19:21:08] Or find / -size +50000 -exec ls -lahg {} \; [19:21:19] Cyberpower678: can you let me ssh in? [19:21:38] zhuyifei1999_: oh no.. "role/manifests/labs/quarry/killer.pp" ? what :P [19:21:58] zhuyifei1999_: sure. [19:22:07] zhuyifei1999_: do you need me to do anything? 
[19:22:46] well, I'll see if it;s caused by deleted but opened files [19:23:30] oh I don't have cloud-wide access so might need you to grant me access [19:23:42] paladox: it dumped mostly system files, and one larger txt file I am aware of, but it's not 20GB [19:23:55] Grunt. [19:24:28] Ok [19:25:03] zhuyifei1999_: remind me where I can grant you access? [19:25:10] horizon [19:25:16] * Cyberpower678 hasn't touched the nova interface in a while [19:25:28] Oh it's on Horizon now? [19:25:32] http://horizon.wikimedia.org/ [19:25:35] Cyberpower678: try 'lsof -nP | grep '(deleted)' ' [19:26:08] oh wait the project membership could be wikitech [19:26:24] nope, Horizon [19:26:38] andrewbogott: lot's of sh, php, and flock, but I don't know what to make of it. [19:27:03] zhuyifei1999_: merged a fix for that last issue [19:27:11] IABot writes extensionless files, and then closes and deletes them when it's done with ti [19:27:17] mutante: ok, checking [19:28:14] https://www.irccloud.com/pastebin/TPu1pLlE/ [19:28:18] I think it broke [19:29:35] Cyberpower678: you can see a process id in that list, and then see what the process is with ps [19:29:36] the runners [19:29:41] https://www.irccloud.com/pastebin/qvluIfnc/ [19:29:53] So, for example [19:29:54] sh 1410 cyberpower678 2u REG 254,3 14779359232 3068 /tmp/tmpfnYa23b (deleted) [19:30:05] There's a big file being held open by process 1410 [19:30:16] https://www.irccloud.com/pastebin/xs93Bxd8/ [19:30:20] And that's what process 1410 is [19:30:26] zhuyifei1999_: crap... it's related to finding the values in Hiera.. on it [19:30:54] * zhuyifei1999_ thinks hiera is confusing [19:31:17] andrewbogott: how is a deleted file being held open? [19:31:17] Cyberpower678: that file alone is many gb, and you have quite a few process holding files open of that size [19:31:20] * zhuyifei1999_ also thinks stuffs in wikitech vs stuffs in horizon is confusing [19:31:40] I don't know of anything that dumps GBs into a file. 
[19:31:55] IABot's dumps are usually around 1 MB [19:32:35] Cyberpower678: it's just what it sounds like — a file has been deleted in the filesystem but it's still being held open by whoever writes to it [19:32:36] Cyberpower678: an opened file can be deleted, moved over, or otherwise made inaccessible just as if it's not open [19:32:50] but as long as it's open the space can't be re-used [19:33:04] the space is only reclaimed if all opened fds are closed [19:33:10] yeah that [19:33:24] zhuyifei1999_: what is the complete instance name you were testing on plz.. i will ssh there myself [19:33:30] Anyway to access this file? [19:33:51] mutante: quarry-main-01.quarry.eqiad.wmflabs and quarry-runner-01.quarry.eqiad.wmflabs [19:34:00] As usually horizon isn't work [19:34:02] *working [19:34:27] 'not working' in what sense? [19:34:29] Cyberpower678: /proc/1410/fd/ [19:35:28] it's actually fd 2, so /proc/1410/fd/2 [19:35:34] andrewbogott: none of my instances are being listed [19:35:43] Probably you need to change your region to 'eqiad' [19:35:56] Out of the 20 times I've used it, Horizon has worked about 5 of them [19:36:00] but anyway weren't you logging in to change project access? That works across regions [19:36:51] I'm trying to figure this out. [19:37:00] fwiw fd 2 is usually stderr [19:37:02] I managed project access on Wikitech [19:37:25] Project -> Access -> Project Members [19:37:38] zhuyifei1999_: I need your username [19:37:46] zhuyifei1999 [19:38:04] just my irc nick without _ [19:38:36] zhuyifei1999_: you're in. No breaking shit. :-) [19:39:03] But fixing shit, is very much allowed. ;-) [19:39:05] mutante: ok and thanks for doing all these fixes [19:39:48] Cyberpower678: I'd probably be read-only and do nothing except having new stuffs in bash_history [19:40:05] and ask you to do the real fixes :P [19:40:35] I'd first need to know what's broken. 
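The "deleted but still open" behaviour described above can be reproduced in a few lines. The following is only a minimal sketch, assuming a Linux/POSIX filesystem; the payload and the temporary file are hypothetical stand-ins for the /tmp file found with lsof, not the actual IABot output.

```python
import os
import tempfile

# Hypothetical payload standing in for the warnings that filled the real file.
payload = b"PHP Warning: example\n" * 1000

fd, path = tempfile.mkstemp()   # create a file and keep its descriptor open
os.write(fd, payload)
os.unlink(path)                 # "delete" it: only the directory entry goes away

name_exists = os.path.exists(path)    # the path no longer resolves
size_via_fd = os.fstat(fd).st_size    # but the blocks are still allocated

os.lseek(fd, 0, os.SEEK_SET)
head = os.read(fd, 11)                # and the data is still readable via the fd

os.close(fd)                          # only after the last close is the space freed

print(name_exists, size_via_fd, head)
```

This is why du on the directory tree reports far less than the disk actually holds: du walks names, and the file no longer has one. On Linux the same descriptor is also reachable from outside the process under /proc/PID/fd/, which is how the file was inspected in the conversation.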
[19:40:48] zhuyifei1999_: i am on those machines and fixing the "runner" one right now first [19:40:50] I have never seen any part of IABot create a massive file. [19:41:00] mutante: ok thanks [19:42:06] And none of the files IABot are producing are remotely that large. [19:42:18] https://www.irccloud.com/pastebin/IY80460T/ [19:42:27] hmm need sudo I think [19:42:53] zhuyifei1999_: you have access to that [19:43:28] https://www.irccloud.com/pastebin/49bhfl14/ [19:43:44] I thought by default all members have sudo [19:44:08] zhuyifei1999_: projectadmins have sudo. Members do not. [19:44:24] Otherwise everyone on tools could sudo and that would be awkward and dangerous. :p [19:44:51] (oh I think I was intending to sudo -l rather than -i, too used to running -i) [19:45:04] Cyberpower678: there's a sudo policy on horizon [19:45:08] quarry-runnter-01 has a running quarry worker and puppet is happy again. on to the other one [19:45:17] mutante: ok thanks [19:46:04] zhuyifei1999_: oh so I need to give you access? [19:46:10] Cyberpower678: I'm pretty sure projectadmin is just horizon write access and hiera write access. sudo is it's own [19:46:34] could you? (thanks) [19:47:08] zhuyifei1999_: should be done [19:47:28] indeed, thanks. /me looks to lsof [19:49:17] * Cyberpower678 is sad that he isn't an advanced linux user [19:49:21] https://www.irccloud.com/pastebin/x1kvXHED/ [19:49:33] the big deleted files are the same one [19:49:59] but why is it opened by cron... [19:50:16] Can /tmp/tmpfnYa23b be accessed at all? [19:52:13] https://www.irccloud.com/pastebin/eckPhncI/ [19:52:44] !log ores shutting down ores-worker-06 [19:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [19:52:48] the contents of this 14 GB file is basically... PHP warnings :{ [19:52:48] Is that the contents of /dev/null? 
[19:52:55] *:P [19:52:57] no [19:53:16] Well that's where the output should be going [19:53:19] the contents of this to-be-deleted /tmp/tmpfnYa23b [19:53:53] Granted it shouldn't be creating such a massive amount of errors, but still. [19:55:07] /dev/null is empty because all reads returns length 0 (https://github.com/torvalds/linux/blob/06dd3dfeea60e2a6457a6aedf97afc8e6d2ba497/drivers/char/mem.c#L648), but that's irrelevant :P [19:55:23] zhuyifei1999_: so how do I prevent these errors from getting written to disk at all. Everything should be going to /dev/null [19:55:38] what's your crontab command? [19:55:54] /Service[uwsgi-quarry-web]/ensure: ensure changed 'stopped' to 'running' [19:56:01] The crontab can be seen in the IABot root directory in home [19:56:02] mutante: yay thanks [19:56:06] Service[celery-quarry-worker]/ensure: ensure changed 'stopped' to 'running' [19:56:22] the issue was it used variables names in a template that referred to the old class name [19:56:31] so the "clone_path" was wrong on both [19:57:12] though it's still not 100% right [19:57:20] I hatenot being able to tab-complete when drive is full [19:57:54] Cyberpower678: so basically instead of doing > /dev/null, do &> /dev/null [19:58:09] zhuyifei1999_: that will fix it? [19:58:30] other more 'complex' methods include > /dev/null 2> /dev/null, 2>&1 > /dev/null [19:58:50] I assume before I can write anything, I need to do a complete reboot [19:59:15] that will redirect stderr to /dev/null for future processes [19:59:28] well, you can kill the relevant processes [19:59:32] !log ores shutting down ores-worker-05 [19:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL [20:00:13] zhuyifei1999_: I'm just going to reboot the entire thing. If there is one errant process, then there are at 22 others. 
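The difference between the redirections discussed above can be checked with a small experiment. This is a hedged sketch, assuming a POSIX /bin/sh is available; JOB is a made-up stand-in for the cron command, not the real crontab entry. It uses the portable spelling `> /dev/null 2>&1` because `&>` is a bash shorthand and cron usually runs commands with /bin/sh, which may not be bash.

```python
import subprocess

# Hypothetical job that writes one line to stdout and one to stderr.
JOB = "{ echo out; echo err >&2; }"

def run(redirect):
    """Run JOB under /bin/sh with the given redirection; return (stdout, stderr)."""
    result = subprocess.run(
        ["/bin/sh", "-c", f"{JOB} {redirect}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout, result.stderr

stdout_only = run("> /dev/null")        # stderr still escapes
both = run("> /dev/null 2>&1")          # both streams discarded
wrong_order = run("2>&1 > /dev/null")   # stderr goes to the *old* stdout

print(stdout_only)   # ('', 'err\n')
print(both)          # ('', '')
print(wrong_order)   # ('err\n', '')
```

Redirections are applied left to right, so the order matters: with `2>&1 > /dev/null`, stderr is pointed at wherever stdout was going before stdout is sent to /dev/null, and the errors are not discarded. Under cron, anything that still escapes is captured by cron itself, which fits the open temporary file seen in the lsof output.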
[20:01:08] there is only 5 [20:01:09] https://www.irccloud.com/pastebin/x1kvXHED/ [20:01:10] (PS2) Lokal Profil: Ensure Python container waits for the database to be ready [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454224 (owner: Jean-Frédéric) [20:01:27] 5 processes has that file open [20:01:38] (and we don't count threads) [20:02:53] whatever reboot works too, but if cron started anything before you update crontab they will run with the old command [20:04:04] ok I really need to sleep. gtg to airport in... 4 hours [20:04:26] * zhuyifei1999_ facepalm [20:05:08] zhuyifei1999_: thanks for your help, and good night, and no facepalming [20:09:07] zhuyifei1999_: the changes from https://www.irccloud.com/pastebin/TPu1pLlE/ have been reverted and both instances show no issues now [20:09:15] after one more thing i merged [20:09:21] good night! [20:11:03] (CR) Lokal Profil: [C: 2] "Thanks for this. One of those annoyances that is easy to work around (by waiting) but hits you every time." 
[labs/tools/heritage] - https://gerrit.wikimedia.org/r/454224 (owner: Jean-Frédéric) [20:12:55] (PS2) Lokal Profil: Streamline start-up instructions in ReadMe [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454225 (https://phabricator.wikimedia.org/T202293) (owner: Jean-Frédéric) [20:13:17] (Merged) jenkins-bot: Ensure Python container waits for the database to be ready [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454224 (owner: Jean-Frédéric) [20:14:55] (CR) jenkins-bot: Ensure Python container waits for the database to be ready [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454224 (owner: Jean-Frédéric) [20:19:59] (CR) Lokal Profil: Streamline start-up instructions in ReadMe (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454225 (https://phabricator.wikimedia.org/T202293) (owner: Jean-Frédéric) [20:44:14] (PS1) Lokal Profil: Add more known fields to ru_ru [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454920 [20:47:39] (PS1) Lokal Profil: Add missingCommonscatPage to Swedish datasets [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454923 [21:00:41] (CR) Lokal Profil: "WIP because I'm checking if there are any more fields to add (since I forgot this task for a year)" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454920 (owner: Lokal Profil) [21:53:40] (CR) Jean-Frédéric: "Thanks for fixing the linting failures − I had not even noticed :D" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454224 (owner: Jean-Frédéric) [21:54:19] (CR) Jean-Frédéric: [C: 2] Add missingCommonscatPage to Swedish datasets [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454923 (owner: Lokal Profil) [21:55:37] (Merged) jenkins-bot: Add missingCommonscatPage to Swedish datasets [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454923 (owner: Lokal Profil) [21:56:44] (CR) jenkins-bot: Add missingCommonscatPage to Swedish 
datasets [labs/tools/heritage] - https://gerrit.wikimedia.org/r/454923 (owner: Lokal Profil) [22:09:09] (PS1) Jean-Frédéric: Use Pipenv to manage requirements [labs/tools/heritage] - https://gerrit.wikimedia.org/r/455015 [22:09:11] (PS1) Jean-Frédéric: Bump pywikibot to latest version 3.0.20180823 [labs/tools/heritage] - https://gerrit.wikimedia.org/r/455016 (https://phabricator.wikimedia.org/T202378) [22:11:28] (CR) jerkins-bot: [V: -1] Bump pywikibot to latest version 3.0.20180823 [labs/tools/heritage] - https://gerrit.wikimedia.org/r/455016 (https://phabricator.wikimedia.org/T202378) (owner: Jean-Frédéric) [22:35:48] (PS1) Lokal Profil: Handle missing commonsTrackerCategory gracefully [labs/tools/heritage] - https://gerrit.wikimedia.org/r/455023 (https://phabricator.wikimedia.org/T147750)