[00:21:28] !log deployment-prep ORES with git-lfs
[00:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[00:33:15] awight: I think you need to set
[00:33:20] git_binary_manager: git-lfs
[00:33:29] In scap.cfg
[00:35:13] oho, paladox thank you!
[00:35:51] You're welcome :)
[00:35:59] awight: did that work? :)
[00:36:08] !log wikilabels u_wikilabels=> update campaign set active = 'f' where wiki = 'fawiki' and id != 71;
[00:36:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[15:47:28] !log deployment-prep ORES with git-lfs, scap config
[15:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[15:48:38] halfak: I’ll deploy that quarry thing next hour hopefully. Outside right now. Thanks for reviewing the patch
[15:56:45] !log quarry deploying d653400 to quarry-runner-0{1,2} T188564
[15:56:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[15:56:47] T188564: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564
[15:58:44] halfak: can you test if it works now?
[16:07:36] !log puppet3-diffs add myself to the project as admin so I can update compiler facts
[16:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs/SAL
[16:07:40] * halfak tries
[16:08:41] the task gets killed without the status updating if I kill it manually via SIGALRM
[16:09:28] zhuyifei1999_, it failed
[16:09:29] https://quarry.wmflabs.org/query/25502
[16:09:30] (i.e. the signal handler seems to be the default one if I do it manually)
[16:09:39] Hmm.
[16:09:50] Writing 92k rows with one int should be fast, right?
[16:09:57] Do you see a spike in IO?
[16:10:24] looking
[16:10:35] * halfak looks back at the code.
[16:11:31] Hmm. Looks right.
[16:11:36] * halfak runs some tests
[16:11:49] yeah IO spike right now
[16:12:11] Oh. Hmm. Shouldn't be related to the query that just failed, you'd think
[16:12:28] (when I ran my forked query)
[16:14:15] halfak: shall I switch back to the unlimited version and see how long the save takes?
[16:14:28] Just ran a test on the analytics machines and the save took less than .5 seconds. :|
[16:14:48] I know, but it saves the query results to NFS
[16:14:54] OH!
[16:15:01] maybe NFS is unhappy :|
[16:15:16] which is... one of the slowest things here
[16:15:25] damn.
[16:15:33] Can never get away from NFS
[16:16:14] FWIW, the output file from my test is 809K -- which seems reasonable.
[16:16:22] This is the default TSV format that MySQL outputs
[16:16:33] if I switch to instance-local storage I doubt the storage space of single instances can handle the amount of query output
[16:16:41] zhuyifei1999_, right.
[16:16:42] Hmm
[16:16:53] I think there are some options for big-disk instances
[16:17:22] how about we just switch back to unlimited and see how long it takes?
[16:17:56] I'm down for attempting that but I need to step away from the computer soon.
[16:18:06] Can you test in my absence for the next ~hour?
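For context on the SIGALRM behaviour discussed above: when no handler is installed, the signal's default action simply kills the process, so the runner never records why the query stopped. Below is a minimal sketch of installing a handler that records a status before exiting. `mark_query_failed` and `run_query_and_save_results` are hypothetical stand-ins, not Quarry's actual code, and the 5-minute limit mirrors the one being tested in this log.

```python
import signal
import sys
import time

QUERY_TIME_LIMIT = 5 * 60  # seconds; the limit under test in this log


def mark_query_failed(reason):
    # Hypothetical helper: a real runner would update the query's
    # status row in the database here, before the process dies.
    print("query failed: %s" % reason, file=sys.stderr)


def on_alarm(signum, frame):
    # Without a handler installed, SIGALRM's default action kills the
    # process outright and the status is never updated.
    mark_query_failed("timed out after %d seconds" % QUERY_TIME_LIMIT)
    sys.exit(1)


def run_query_and_save_results():
    time.sleep(1)  # stand-in for running the query and writing results


signal.signal(signal.SIGALRM, on_alarm)  # replace the default handler
signal.alarm(QUERY_TIME_LIMIT)           # arm the timer
run_query_and_save_results()
signal.alarm(0)                          # disarm if we finished in time
```

With a handler like this installed, sending SIGALRM to the process by hand (as tried at 16:08) would run the handler instead of the default kill.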
[16:18:13] ok
[16:18:21] !log quarry depool quarry-runner-01
[16:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[16:18:53] o/
[16:20:57] !log installed python-dbg on quarry-runner-02 because it's so good
[16:20:57] zhuyifei1999_: Unknown project "installed"
[16:21:04] !log quarry installed python-dbg on quarry-runner-02 because it's so good
[16:21:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[16:23:08] py-bt is empty o.O
[16:23:55] oh, wrong version of python running...
[16:29:04] !log quarry quarry-runner-02 is on d9cc1c8
[16:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[16:30:28] (PS1) Volans: Puppetboard: add dummy secret [labs/private] - https://gerrit.wikimedia.org/r/419775
[16:53:42] (CR) Filippo Giunchedi: [C: 1] Puppetboard: add dummy secret [labs/private] - https://gerrit.wikimedia.org/r/419775 (owner: Volans)
[16:54:14] (CR) Volans: [V: 2 C: 2] Puppetboard: add dummy secret [labs/private] - https://gerrit.wikimedia.org/r/419775 (owner: Volans)
[16:56:33] !log granted elasticsearch credentials to tools.denkmalbot T185624
[16:56:34] zhuyifei1999_: Unknown project "granted"
[16:56:34] T185624: Elasticsearch credential request for denkmalbot - https://phabricator.wikimedia.org/T185624
[16:56:40] !log tools granted elasticsearch credentials to tools.denkmalbot T185624
[16:56:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:10:38] (PS1) Volans: Add keyholder dummy keys for puppetboard [labs/private] - https://gerrit.wikimedia.org/r/419796 (https://phabricator.wikimedia.org/T184563)
[17:14:24] (CR) Volans: [V: 2 C: 2] Add keyholder dummy keys for puppetboard [labs/private] - https://gerrit.wikimedia.org/r/419796 (https://phabricator.wikimedia.org/T184563) (owner: Volans)
[17:34:07] o/ zhuyifei1999_
[17:34:09] Sorry was AFK for longer than expected.
[17:34:32] hi
[17:35:02] so I tested your query and the whole run seems a bit shorter than 5 mins
[17:35:14] I guess we can double to 10?
[17:35:49] Writing out 800K takes 5 mins?
[17:35:57] Or the query + the write time?
[17:36:09] the query finishes in seconds
[17:36:27] so the query+write
[17:37:00] (if the explain says nothing then it is probably writing)
[17:38:30] Strange that it would take so long. It seems like this is an issue for the NFS-based setup.
[17:38:42] zhuyifei1999_, do you have a good way to check how much storage quarry is currently using?
[17:38:55] du? probably
[17:38:57] https://phabricator.wikimedia.org/T178520 fyi
[17:39:15] Graphite?
[17:39:29] Oh nice.
[17:39:38] OK. So I think the new big-disk instances are ~300GB
[17:39:48] andrewbogott, ^
[17:39:51] Can you confirm?
[17:40:36] the other problem is how to send the query results to the bigdisk, wherever it is
[17:40:39] halfak: that's right
[17:41:42] and what if the instance is gone... query results are a snapshot of a specific state and are not rebuildable
[17:43:15] another way would be to kill on memory usage > a value, but that would be complicated
[17:43:55] neither the malloc-ed size nor the resident size is an accurate measure of how much memory is 'used'
[17:46:06] (they accumulate asymptotically for a non-memory-leaking program)
[17:48:13] zhuyifei1999_, maybe we write primarily to the bigdisk and do a periodic backup to nfs or a second bigdisk?
[17:48:27] hmm
[17:48:39] yeah sounds good to me
[17:48:49] andrewbogott, sorry to bug again. Any better option for a long-term backup of impossible-to-reproduce data in Cloud VPS?
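On the "kill on memory usage > a value" idea just above, here is a small Linux-only sketch (not Quarry code) of the two numbers being discussed and why neither maps cleanly to "memory used":

```python
import resource


def peak_rss_kib():
    # ru_maxrss on Linux is the peak resident set size in KiB.
    # It only ever grows ("accumulates asymptotically"), so it says
    # nothing about how much memory is in use right now.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def current_rss_kib():
    # VmRSS in /proc/self/status is the current resident set size,
    # but it still counts pages the allocator is holding on to and
    # has not returned to the kernel, so "resident" != "used".
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None


if __name__ == "__main__":
    print("peak RSS: %s KiB, current RSS: %s KiB"
          % (peak_rss_kib(), current_rss_kib()))
```

Either number could be compared against a threshold, but as noted in the discussion, picking a threshold that reliably reflects real usage is the hard part.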
[17:49:11] so how to send the query results to bigdisk?
[17:51:12] (when I think of virtual file IO I keep on thinking of FUSE, then I realize nobody here actually runs fuse according to puppet, so I'd have to be very adventurous to go that way)
[17:51:45] halfak: not really. I think if you search phab you'll find tickets about pending backup systems.
[17:56:09] yes in theory we have something on the horizon (pun!) but nothing super near-term, procurement and then time to implement
[17:56:19] halfak: out of curiosity how big are we talking storage-wise?
[17:57:04] zhuyifei1999@quarry-main-01:/data/project/quarry$ du -sh results
[17:57:04] 127G results
[17:57:31] ack
[17:57:32] 112G in Oct 2017 https://phabricator.wikimedia.org/T178520
[17:57:46] zhuyifei1999_: would it compress well?
[17:58:24] I can try, but unlikely. It's sqlite databases
[17:58:50] usually to back up a sqlite database I have dumped it to text and compressed it
[18:00:10] http://www.sqlitetutorial.net/sqlite-dump/
[18:02:15] zhuyifei1999@quarry-main-01:/data/project/quarry/results/1232/18542$ gzip -vc 176005.sqlite > /dev/null
[18:02:15] 176005.sqlite: 51.4%
[18:02:22] random selection ^
[18:02:30] not bad
[18:03:41] https://www.irccloud.com/pastebin/wq6q9UwH/
[18:03:45] a few more
[18:04:00] looks like smaller databases compress worse
[18:04:31] less data means less to dedupe, in the case of text usually
[19:27:29] !log quarry switch back to d9cc1c8 on both hosts
[19:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[20:45:45] halfak: any suggestions on how to send the results from the query runners to the bigdisk?
[21:35:50] hey, how can we go about deploying https://gerrit.wikimedia.org/r/#/c/403833/ ? we've finally received a DBA approval for the underlying table, but we need a Sanitarium config for it
[21:53:01] MaxSem: Needs manual rebase
[21:58:47] MaxSem: You might be able to ask bstorm_ nicely as she's been working on the script
[22:01:12] To deploy that, it needs a rebase, merge and then puppet-merge, but the sanitarium config is more the DBAs' area, I think.
[22:01:27] zhuyifei1999_, I saw your ping here and added some thoughts in the task :)
[22:01:36] k
[22:01:50] I responded as well
[22:02:58] MaxSem: Do you need sanitarium stuff? The comment from Jaime seems to suggest otherwise?
[22:03:24] I can get that in there up to the sanitarium
[22:05:30] rebased
[22:05:49] he seems to be unsure whether this patch has already been merged or not
[22:05:57] question for bd808, @here: hi everyone. Looking for some info. I have a tool (HostBot) that's been offline since August, and the tool has (had?) a public user db on the old replica servers. I'm just now seeing this blog post https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/ has my old db been deleted, or migrated to a new server?
[22:06:20] MaxSem: I'll check :)
[22:06:59] J-Mo: I suspect it would be gone
[22:07:18] mmm. was afraid of that. thanks Platonides
[22:07:26] * J-Mo goes hunting for his latest backup
[22:07:33] I'm not completely sure about that
[22:08:06] but I think there were some warnings about not keeping some dbs
[22:08:23] so my question is: shall I create the table now, or does the puppet change need to go first?
[22:08:50] The puppet change is just for creating the labs views after the fact.
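The dump-then-compress approach mentioned around 17:58-18:02 (dump the sqlite file to SQL text, then gzip it) can be sketched with just the Python standard library; the paths in the comment are placeholders mirroring the layout shown above, not a real backup job:

```python
import gzip
import sqlite3


def dump_and_compress(sqlite_path, out_path):
    # Dump the database to its SQL text form (what .dump does in the
    # sqlite3 shell) and gzip it. Text dumps of similar rows tend to
    # compress better than the binary .sqlite file itself.
    conn = sqlite3.connect(sqlite_path)
    try:
        with gzip.open(out_path, "wt") as out:
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()


# Hypothetical usage, with placeholder paths:
# dump_and_compress("/data/project/quarry/results/1232/18542/176005.sqlite",
#                   "/srv/backups/176005.sql.gz")
```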
[22:09:09] J-Mo: iirc there was a backup
[22:09:19] If you need to create a table, this won't show up on the Cloud DB views until after we manually run the script anyway, MaxSem
[22:09:33] before the complete shutdown
[22:09:46] * zhuyifei1999_ checks
[22:09:52] The puppet change is so that a view will also get created... once we run the scripts. The table has to be there before we run it.
[22:10:02] But it doesn't run on its own
[22:10:17] Does that help?
[22:12:07] J-Mo: https://phabricator.wikimedia.org/T183758#3887502
[22:13:20] thanks zhuyifei1999_ I'll go hunt in scratch
[22:13:42] np
[22:15:38] yep. found it! what a relief
[22:20:09] J-Mo: I think they archived them somewhere during the migration as well
[22:21:38] chasemp thank you! I've got the zipped archive downloaded. Will migrate it to tools.labsdb
[22:25:48] MaxSem: What wiki/db should the table exist on at this point? I was just poking to see if the table is created yet. I don't see it on enwiki at least
[22:26:11] ok, thanks bstorm_. I'll create the table and populate it with a couple of test rows
[22:26:47] Ok :)
[22:27:09] bstorm_: it'll be on the centralauth db
[22:27:16] Oh!
[22:27:17] Ok
[22:27:24] only one, not hundreds :)
[22:29:34] In that case, it isn't in that db either on our replicas
[22:31:31] That means it is either not in prod or not on the sanitarium, etc.
[22:31:50] either way, it's not there yet, so the puppet patch doesn't need to merge yet
[22:32:25] Once it is replicating through, then the puppet patch will be ready, per se.
[22:47:55] bstorm_: the table is up
[22:48:21] I see it on one of the replicas
[22:48:57] So from that point, we should be able to merge this and run the script, which will cause it to show up in the clients
[22:49:13] well, to be available to them at least on the replicas
[22:55:00] MaxSem: do you have the ability to merge on puppet or do you need me to push it through?
[22:55:25] no, I'm not an ops =)
[22:55:31] Ok, I'll get it
[22:57:02] Merged
[23:00:31] Now for the table to show up on the wiki replicas, the script still needs to be run. I have another thing I'm working on that this is waiting behind.
[23:01:01] If I can get that cleared up soon, I'll run the scripts
[23:02:53] thanks bstorm_!
[23:12:16] This user is now online in #wikimedia-cloud. I'll let you know when they show some activity (talk, etc.)
[23:12:16] @notify halfak what do you think of just increasing the time limit to 10 mins for the time being, before we come up with a plan for moving the data?
[23:13:12] why don't I see their name here...
[23:18:09] The bot lies
[23:18:27] lol
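For the kind of check bstorm_ was doing around 22:25-22:48 (is the new table visible on a replica yet?), a hedged sketch using pymysql and information_schema; the host, schema, and table names are placeholders, not the real ones from the patch:

```python
import pymysql


def table_exists(host, schema, table, user, password):
    # Ask information_schema on the given host whether the table is visible.
    conn = pymysql.connect(host=host, user=user, password=password,
                           db="information_schema")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT COUNT(*) FROM tables "
                "WHERE table_schema = %s AND table_name = %s",
                (schema, table),
            )
            (count,) = cur.fetchone()
            return count > 0
    finally:
        conn.close()


# Hypothetical usage; the host and table name are made up for illustration:
# table_exists("replica-host.example", "centralauth_p", "some_new_table",
#              user="...", password="...")
```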