[00:15:34] RileyH: regarding T188684, do you leave a browser open with your server?
[00:15:35] T188684: Actively running servers shut down unexpectedly - https://phabricator.wikimedia.org/T188684
[00:15:56] Sometimes yes, sometimes no
[00:16:23] The last time it happened, no. Previous times, probably once or twice.
[00:16:58] if you do leave it open then it is something new, otherwise it is a known bug
[00:17:31] inactivity is defined as time since last "ping" by the browser to the server
[00:17:58] Isn't the whole point of this being able to walk away from your computer or close your browser?
[00:18:20] If I am understanding you correctly, this should not happen if you leave the terminals open in your browser.
[00:18:44] yes, that is the current limitation we have
[00:19:28] ideally both inactive but open could be shut down and active but closed could be kept alive (your use case)
[00:19:57] Eh..
[00:20:13] That doesn't work for me then
[00:20:19] I can't have both servers open
[00:20:38] I can only have one open at a time due to OAuth
[00:20:58] unless yu.vi or some other jupyterhub developer has already solved this (they might), it will probably need significant developer time to solve this
[00:21:23] I use more than one account with Chrome's profiles, if that helps
[00:21:42] Smart.. Could also just use two browsers
[00:21:58] or, you can use toolforge itself for the time being
[00:22:33] (I dislike the idea of saying PAWS is limited but it is the reality at this point)
[00:23:10] I am awaiting toolforge access
[00:23:24] PAWS would be ideal for me if I could access both bots from my main account
[00:24:08] Do you have the phab task for the known issue?
[00:25:23] RileyH: bd80.8 has already approved your toolforge access
[00:25:50] you're [[User:Huntley]], aren't you?
[00:25:56] Sweet, that would have been within the last two hours then
[00:25:57] Indeed
[00:26:03] https://toolsadmin.wikimedia.org/tools/membership/status/255 perfect
[00:31:00] RileyH: looking at phab tasks, I think it is a known issue upstream only. I'll turn T188684 into an upstream tracking task once I find the relevant GitHub issue again
[00:31:00] T188684: Actively running servers shut down unexpectedly - https://phabricator.wikimedia.org/T188684
[00:33:39] Cheers
[01:42:42] bd808: do you have a little bit of time to help me understand why codesearch is out of disk space but at the same time not out of space?
[01:42:49] or anyone else (!help) :)
[01:42:57] df -h shows
[01:42:58] /dev/vda3 19G 12G 5.7G 69% /
[01:42:58] legoktm: sure. what's the instance name?
[01:43:02] codesearch2
[01:43:09] but I can't even tab complete
[01:43:10] legoktm@codesearch2:~$ cd /var
-bash: cannot create temp file for here-document: No space left on device
[01:43:21] and puppet has presumably been failing too
[01:43:22] The last Puppet run was at Thu Mar 1 10:55:00 UTC 2018 (885 minutes ago).
[01:43:26] I bet you are out of inodes
[01:43:33] https://tools.wmflabs.org/nagf/?project=codesearch shows 6000GB left
[01:43:34] oh
[01:43:38] (I mean 6GB)
[01:43:42] how do I check that again?
[01:44:26] df -i
[01:44:27] /dev/vda3 1245184 1245184 0 100% /
[01:44:28] yep
[01:44:29] awesome
[01:44:34] thanks bd808 :)
[01:44:45] inodes are the worst
[01:44:59] they are the worst indeed
[01:45:19] are inodes monitored by anything?
[01:45:21] also putting your working data on the / partition is the second worst
[01:45:31] where should I be putting data?
[01:46:04] probably /srv after mounting the extra disk. Is this a small instance or something else?
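The symptom bd808 diagnoses above, "out of disk space but at the same time not out of space", comes down to comparing block usage with inode usage. A minimal sketch of that check; the device path /dev/vda3 is taken from the log, and the tune2fs superblock query is an extra, assumed-available step rather than something run in the session:

```bash
# "No space left on device" while df -h still shows free space usually
# means the filesystem has run out of inodes, not blocks.
df -h /     # block usage: Size, Used, Avail, Use%
df -i /     # inode usage: Inodes, IUsed, IFree, IUse%

# Totals straight from the ext4 superblock (device path as seen on codesearch2).
sudo tune2fs -l /dev/vda3 | grep -Ei 'inode count|free inodes'
```

An ext filesystem gets a fixed inode table at mkfs time, so once IUse% hits 100% free blocks do not help; the fix is either deleting many small files or reformatting with a denser inode ratio, as happens later in the log.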
[01:46:05] - /srv
[01:46:19] lemme check
[01:46:43] it's medium
[01:46:52] it is in /srv/ right now
[01:46:57] but how do I mount the disk?
[01:47:33] legoktm: a puppet class does it, I think
[01:47:34] there is a puppet role for it ...
[01:47:43] lsb
[01:47:56] I'm going to create a fresh instance for this
[01:48:08] yeah, that would be best
[01:48:15] legoktm: but move your data before applying it, otherwise the data will be lost
[01:48:29] hidden really, not lost
[01:48:32] I think ...
[01:48:34] the data doesn't matter
[01:48:41] it's just the index of the git repos
[01:48:45] I don't think the puppet role blanks the mount point
[01:49:59] legoktm: applying role::labs::lvm::srv will format and mount the quota the instance has above the 20G that is used for the / partition
[01:50:32] ok but does that have more inodes too?
[01:50:51] well, yes, because nothing else will be in there
[01:51:04] inodes are per-partition
[01:51:28] we could rebuild the filesystem on a new mount too to have higher inode density
[01:52:02] !log codesearch created codesearch3 instance (jessie, medium flavor)
[01:52:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[01:52:11] !log codesearch applied role::labs::lvm::srv role to codesearch3
[01:52:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[01:52:27] the default is something like 1 inode per 512K or something I think...
[01:53:12] /dev/mapper/vd-second--local--disk 21G 44M 19G 1% /srv
[01:53:35] /dev/mapper/vd-second--local--disk 1343488 11 1343477 1% /srv (inodes)
[01:53:45] ok :D
[01:58:49] legoktm: you might want to unmount and reformat that /srv partition with `mkfs.ext4 -T news /dev/mapper/vd-second--local--disk` to get more inode headspace
[01:59:22] bd808: ok, is that the full command I need to run?
[01:59:30] I .. think so
[01:59:50] https://wiki.archlinux.org/index.php/ext4#Bytes-per-inode_ratio
[02:00:02] mke2fs 1.42.12 (29-Aug-2014)
[02:00:02] /dev/mapper/vd-second--local--disk contains a ext4 file system
[02:00:02] last mounted on Fri Mar 2 01:52:40 2018
[02:00:02] Proceed anyway? (y,n) y
[02:00:03] /dev/mapper/vd-second--local--disk is mounted; will not make a filesystem here!
[02:00:20] yeah, you have to unmount it first
[02:00:28] sudo umount /srv
[02:00:56] you probably should disable puppet first too so it doesn't fight you...
[02:01:22] uh, how do I do that?
[02:01:43] puppet agent --disable?
[02:01:55] sudo puppet agent --disable "legoktm messing with partitions"
[02:02:13] legoktm: just for the record, the other codesearch endpoints do work fine (core, skins, ext, skin+ext), it's just the /search (everything) that's not responding
[02:02:17] nicely isolated :)
[02:02:35] E.g.
https://codesearch.wmflabs.org/core/ works
[02:02:55] they're probably going to fall over sooner or later
[02:03:53] bd808: awesome, it now has 5 million inodes instead of 1 million
[02:04:02] \o/
[02:04:07] 5 times the fun
[02:04:50] don't forget to `sudo puppet agent --enable` when you are done with manual stuff
[02:04:56] just did that :)
[02:05:24] !log codesearch ran `mkfs.ext4 -T news /dev/mapper/vd-second--local--disk` (h/t bd808) to quintuple the number of inodes
[02:05:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[02:09:25] !log codesearch pointed codesearch.wmflabs.org at codesearch3:3002
[02:09:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[02:09:54] ok, looks back to normal now ;)
[02:09:58] thanks bd808 :)
[02:10:01] legoktm: Thank you!
[02:10:12] you did all the work legoktm :)
[02:10:32] I would have never figured out the inode thing on my own
[02:10:44] is that monitored by anything? I don't see it on nagf
[02:11:01] fair enough. I heard "why is the disk full when it's not full" and I figured inodes
[02:11:25] it probably is in the raw monitoring data feed somewhere
[02:12:16] !log codesearch deleted codesearch2 instance
[02:12:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[02:16:12] I'll be back in a bit and then document everything I just learned :)
[02:26:27] legoktm: here's your graph of doom -- https://graphite-labs.wikimedia.org/render/?width=586&height=308&target=codesearch.codesearch2.diskspace.root.inodes_percentfree&from=-90days
[02:33:52] bd808: haha, that looks perfect
[02:35:14] Krinkle: https://github.com/wikimedia/nagf/issues/19 :)
[03:32:01] legoktm: https://tools.wmflabs.org/nagf/?project=codesearch#h_overview_disk-inodes
[03:32:16] ooh, thanks :)
[03:44:57] (PS1) Legoktm: Document how to increase inodes during image setup [labs/codesearch] - https://gerrit.wikimedia.org/r/415797
[03:45:10] bd808: ^ sanity check? :)
[04:11:37] good night irccloud
[04:12:56] (CR) BryanDavis: [C: +1] "You might want to add in the puppet agent --disable/--enable too." [labs/codesearch] - https://gerrit.wikimedia.org/r/415797 (owner: Legoktm)
[04:23:25] (PS2) Legoktm: Document how to increase inodes during image setup [labs/codesearch] - https://gerrit.wikimedia.org/r/415797
[04:23:42] (CR) Legoktm: [C: +2] "Done, thanks!" [labs/codesearch] - https://gerrit.wikimedia.org/r/415797 (owner: Legoktm)
[04:23:57] (Merged) jenkins-bot: Document how to increase inodes during image setup [labs/codesearch] - https://gerrit.wikimedia.org/r/415797 (owner: Legoktm)
[11:56:33] can anyone help with this? https://phabricator.wikimedia.org/T188496
[11:56:53] cannot access the dumps
[13:41:40] !log tools doing some testing with puppet classes in tools-package-builder-01 via horizon
[13:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:19:20] general question: is there already a fixed date when all servers with trusty should be migrated to something different?
[17:22:22] Sagan: I cannot answer your question (not a cloud member), but apparently trusty is EOLed in April 2019
[17:22:33] https://wiki.ubuntu.com/Releases
[17:35:36] Sagan: we haven't made the timeline yet, but as jynus says it will be before April 2019
[17:45:40] So my script that normally runs 15-20 minutes weekly just took 127 minutes :-(
[17:46:41] Dispenser: do you want help figuring out why?
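Collected from the 01:58-02:05 exchange above, the reformat procedure as one hedged sketch. The device path and the puppet disable/enable steps are the ones from the log; the explicit mount command is an assumption (role::labs::lvm::srv normally manages the mount, so a puppet run would re-mount it as well), and whether `-T news` is the right inode density depends on the average size of the files you plan to store:

```bash
# Keep puppet from fighting the manual work on /srv.
sudo puppet agent --disable "reformatting /srv for more inodes"

sudo umount /srv

# -T news picks a smaller bytes-per-inode ratio than the ext4 default,
# so the same partition gets several times as many inodes.
sudo mkfs.ext4 -T news /dev/mapper/vd-second--local--disk

# Re-mount and confirm the larger inode table.
sudo mount /dev/mapper/vd-second--local--disk /srv
df -i /srv

sudo puppet agent --enable
```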
[17:47:09] I'm guessing it's the beginning of the month
[17:48:52] chicocvenancio: The thing I'd /REALLY/ like to know is why https://phabricator.wikimedia.org/T181401 became 100-1000x slower in a few weeks on both ToolForge and production
[17:50:09] That's a bit beyond my abilities. Could table growth explain it?
[17:50:52] Dispenser: you know data is not static, right?
[17:51:30] especially wikidata - there are huge amounts of edits coming every second
[17:51:51] and the wikidata team has been changing the structure to be more efficient in the last few months
[17:52:27] maybe some queries have to adapt, too - but I don't know the details
[17:53:19] based on this - https://phabricator.wikimedia.org/T181401#3791003 - the query is trying to read over 50 million rows
[17:57:59] Dispenser: checking the query - your query is not well written
[17:58:52] change it to SELECT 1 FROM wikidatawiki_p.wb_items_per_site WHERE ips_site_page='Harrington Place, Stellenbosch' AND ips_site_id IN ('frwiki', 'eswiki', ...);
[17:59:11] and you will be able to make use of the index
[17:59:31] The query was running sub-1-second (probably 0.01 sec) in November
[18:00:11] I am telling you here and now: the query is badly written
[18:01:59] jynus: How would I find out if we dropped an index? Specifically wb_ips_site_page: https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/sql/Wikibase.sql#L28
[18:02:17] check for changes in the SQL, yes
[18:02:21] that would be the case
[18:02:34] but I would find it strange if there was an index for that
[18:02:49] you can ask for a custom index, however, if you indicate which one to add and it seems reasonable
[18:03:14] I would suggest that changing the queries is the right way, based on that
[18:03:40] != 'enwiki' cannot be resolved efficiently
[18:04:05] That's more of a HAVING statement...
[18:04:16] my version up there will return the same results 100x faster
[18:05:47] Why isn't it using the INDEX?
[18:06:31] Seriously, what happened to it?
[18:06:43] I can't figure out how to browse gerrit
[18:07:23] phabricator is easier; it has a source code navigator
[18:09:42] Dispenser: I have commented on https://phabricator.wikimedia.org/T181401#4018690 about why it is slow
[18:10:15] I cannot give you more tips than that, but feel free to comment there further; maybe someone else can help
[18:12:22] I didn't write it there, but my version of the query takes 0.1 seconds
[18:12:31] instead of 17 minutes
[18:14:00] jynus: Is there a way I can do SHOW INDEXES FROM wikidatawiki.wb_items_per_site; on ToolForge views?
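jynus's rewrite from 17:58, written out as it might be run from a Toolforge bastion. The replica host alias and the replica.my.cnf credentials file are the usual Toolforge conventions of that period rather than anything stated in the log, and the site list here is a short illustrative stand-in for whatever set of wikis the report actually needs:

```bash
# An explicit IN () list over ips_site_id lets the optimizer use the
# table's site/page index (jynus's point above); an open-ended
# != 'enwiki' predicate cannot be resolved that way.
mysql --defaults-file="$HOME/replica.my.cnf" \
      -h wikidatawiki.analytics.db.svc.eqiad.wmflabs wikidatawiki_p <<'SQL'
SELECT 1
FROM wb_items_per_site
WHERE ips_site_page = 'Harrington Place, Stellenbosch'
  AND ips_site_id IN ('frwiki', 'eswiki', 'dewiki');  -- extend the list as needed
SQL
```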
[18:14:54] I can show them to you; they should be the same as on production (tables.sql, or the extension)
[18:15:07] actually
[18:15:22] I will show you, but they should be in the information_schema_p database
[18:16:19] actually no, the structure is not there
[18:16:25] let me copy and paste for you
[18:16:47] this is from labsdb1009, the current analytics replica
[18:18:04] https://phabricator.wikimedia.org/P6782
[18:18:15] so I am not denying the structure can change - it does
[18:18:36] but there is no guarantee that it will be stable
[18:19:01] basically, the "contract" with wikireplicas is that you get access to the internal stuff
[18:19:23] but the internal stuff can change (unlike the official API, which is stable)
[18:19:25] Last I looked, I couldn't use force index on views
[18:19:42] my solution doesn't involve force
[18:20:04] the problem is the !=
[18:20:10] that is bad for performance
[18:20:55] it could be that some time ago, non-enwiki items were smaller
[18:21:00] and that worked
[18:21:20] but that is how queries are - as data changes, the query plan changes, too
[18:21:38] Long time ago = 2016 to November 2017
[18:21:45] for "force"
[18:21:55] we have special views, like user_revisions
[18:22:16] we can also create those - special views with a different querying strategy
[18:22:31] as I said, propose a way, and we can help :-)
[18:23:19] but yes, sadly data changes and structure changes
[18:23:59] I am trying to help you here by reducing your query time 100x
[18:24:12] The documentation https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/sql/Wikibase.sql;10a69d8231e94ff39098fd0a680ebf9a3864ec11$28 lists the `wb_ips_site_page` INDEX, but it's gone from production. What other documentation can I use?
[18:24:55] let me see
[18:28:31] so, apparently, that was written
[18:28:42] but never deployed
[18:28:46] or removed
[18:30:11] Can we make a tool that dumps the actual schema with indexes?
[18:30:13] yep, it was dropped: https://phabricator.wikimedia.org/T179793
[18:30:35] Dispenser: yes, in fact, I thought information_schema_p had it
[18:30:48] I think the problem here is something else
[18:31:04] it was requested to be dropped, but the documentation was never updated to reflect this
[18:31:39] I am going to request that it be updated
[18:33:34] so the databases are fine; the documentation wasn't
[18:35:27] the index seems reasonable to add to the wikireplicas only; I would be OK with asking for it, Dispenser
[18:35:44] although I am not 100% sure it would fix the query
[18:37:08] jynus: I don't want to mess with the databases. I'm only upset that the documentation I consulted when building the query wasn't up to date.
[18:37:51] I am too :-)
[18:38:04] AND ips_site_id IN (SELECT dbname FROM meta_p.wiki WHERE dbname != 'enwiki') -- Isn't too complicated to add :-)
[18:38:05] that is why I reopened T179793
[18:38:06] T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793
[18:38:27] my advice on bug reporting is
[18:38:53] change how you express things, and people will be more receptive
[18:43:44] jynus: I picked up the wrong project. Apparently "dyslexia gets better with age" is a complete fucking lie. Should've picked a project with fewer people adept at communicating.
[19:23:36] bd808: ok, that is long enough. I first thought it would be this year; that was why I asked
[19:57:03] Sagan: We will probably start talking to everyone about it in August and try to get most people to migrate by January or February.
[19:57:24] still enough time for me :)
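Dispenser's meta_p.wiki suggestion from 18:38, written out as a complete statement under the same assumed Toolforge connection details as above. meta_p is replicated to the wikireplica servers, so the cross-database subquery should be possible; whether the optimizer handles the IN (subquery) as well as an explicit literal list is worth verifying against the query's actual runtime before relying on it:

```bash
# Build the "every site except enwiki" list from meta_p.wiki instead of
# hard-coding wiki names (Dispenser's suggestion at 18:38).
mysql --defaults-file="$HOME/replica.my.cnf" \
      -h wikidatawiki.analytics.db.svc.eqiad.wmflabs wikidatawiki_p <<'SQL'
SELECT 1
FROM wb_items_per_site
WHERE ips_site_page = 'Harrington Place, Stellenbosch'
  AND ips_site_id IN (SELECT dbname FROM meta_p.wiki WHERE dbname != 'enwiki');
SQL
```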