[15:41:49] andrewbogott: I seem to have a major memory leak in my VM. Any chance you can help me figure out the cause?
[15:42:55] When I nuke all of the bot jobs, I'm still left with 8 GB of RAM being used.
[15:43:16] So the leak is currently chewing up more than half of my VM's RAM.
[15:45:36] or maybe bd808?
[15:45:40] Cyberpower678: I have a meeting in a few minutes but may be able to look later. In the meantime you could try using 'top' to show you how memory usage is distributed.
[15:45:59] That's the thing. Nothing in particular stands out.
[15:46:15] My PHP processes (30 of them) each use about 250 MB.
[15:46:54] Err, 200
[15:46:58] if the memory is not attached to any proc, could that be disk cache?
[15:47:26] No idea. I'm not too good at reading top.
[15:47:41] It's a little cluttered in my mind.
[15:47:53] what is the output of `free -m`?
[15:48:17] KiB Mem : 16436228 total, 1493488 free, 8620712 used, 6322028 buff/cache
[15:48:54] When I completely kill the bot, I get KiB Mem : 16436228 total, 6922568 free, 3171264 used, 6342396 buff/cache
[15:49:57] could you try `free -g`? too many digits
[15:50:27] $ free -m
[15:50:27]               total        used        free      shared  buff/cache   available
[15:50:27] Mem:          16051        8092        2199         170        5759        7457
[15:50:29] Swap:           510           0         510
[15:52:22] arturo: ^
[15:57:57] there's 5.7GB in the mem cache.. try freeing that: `sync; echo 1 > /proc/sys/vm/drop_caches`
[16:00:24] * Cyberpower678 wonders why the cache filled up so much.
[16:02:16] phamhi: the cache only dropped to 4740
[16:02:39] what's the free memory now?
[16:02:50] 3039
[16:03:39] do you have anything running on that vm?
[16:03:51] About 30 IABot workers
[16:05:23] and each of them uses 250 MB?.. so 8GB used sounds about right
[16:06:30] phamhi: except I just killed them all and the memory is not free.
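(Aside: as a rough model of the `free -m` row pasted above, "available" is approximately free memory plus the reclaimable part of buff/cache; it is not the kernel's exact accounting, since shared memory and dirty pages in the cache are not freely reclaimable. Feeding the pasted `Mem:` row through awk shows the arithmetic:)

```shell
# Columns of the pasted "Mem:" row: total used free shared buff/cache available.
# free + buff/cache slightly overshoots "available" because not all cache
# is reclaimable. Sample values are copied from the paste above.
sample='Mem: 16051 8092 2199 170 5759 7457'
echo "$sample" | awk '{print "free+cache =", $4 + $6, "MB (vs available:", $NF, "MB)"}'
# prints: free+cache = 7958 MB (vs available: 7457 MB)
```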
[16:06:53] # killall php; sync; echo 1 > /proc/sys/vm/drop_caches
[16:07:04] # free -m
[16:07:04]               total        used        free      shared  buff/cache   available
[16:07:04] Mem:          16051        3701        7705         170        4644       11871
[16:07:06] Swap:           510           0         510
[16:07:26] try echo 3 > /proc/sys/vm/drop_caches
[16:08:56] phamhi: there we go. Memory usage now dropped to zero.
[16:09:03] phamhi: what is that command?
[16:09:19] https://tecadmin.net/flush-memory-cache-on-linux-server/
[16:09:52] you generally don't want to force linux to drop the cache.. it does it automatically (lazily most of the time).. but it will clear it as needed
[16:10:36] phamhi: ah so I'm guessing it was flooded with lots of inode entries.
[16:10:43] yah
[16:11:02] IABot will write up to 600K files in a minute
[16:11:16] *write/delete
[16:12:02] just remember to close the file descriptors if you don't need them
[16:12:21] phamhi: it does. Otherwise I'd hit socket errors everywhere. :p
[16:12:50] It's got a really good handler in place.
[16:13:15] And if the garbage collector fails to run on exit, then it will run on start, cleaning up all the files not being actively used by a process.
[16:14:12] phamhi: lest I repeat what happened on my Toolforge account. :p
[16:14:46] The collector was broken and I didn't notice for months. I ran rm -rf on the directory and it took 5 hours to run.
[16:15:17] Files that amounted to 600 GB of data.
[16:15:39] Across a few billion files.
[20:48:23] Speaking of which, I think the iowait on tools-sgebastion-07 must be astronomical; I issue "top" or any other command and it takes minutes
[20:51:42] Maybe it's getting better now, or not. I only see some gzipping going on
[20:55:48] Nemo_bis: thanks for checking...
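(Aside: the difference between the two `drop_caches` values explains what happened here. Per the kernel's sysctl/vm documentation, `1` frees only the page cache, `2` frees reclaimable slab objects such as dentries and inodes, and `3` frees both. The numbers from the paste make the split visible:)

```shell
# drop_caches modes (kernel admin-guide, sysctl/vm):
#   1 = page cache only; 2 = dentries + inodes (slab); 3 = both.
# From the paste: echo 1 only shrank buff/cache from 5759 MB to 4740 MB --
echo "$((5759 - 4740)) MB of page cache freed by mode 1"
# -- so most of the "cache" was dentry/inode slab from the bot creating and
# deleting up to 600K files a minute, which only mode 2 or 3 releases.
# prints: 1019 MB of page cache freed by mode 1
```

As noted in the conversation, forcing this is a diagnostic step, not routine maintenance: the kernel reclaims cache on its own as memory pressure requires.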
it's getting better for me too
[21:10:38] !log openstack updating eqiad1 haproxy configuration
[21:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[22:32:02] !log openstack disable puppet agent and keystone on cloudcontrol1004 (standby) T223907
[22:32:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[22:32:05] T223907: Set up HA endpoints for keystone, glance, nova, designate apis - https://phabricator.wikimedia.org/T223907
[22:32:25] !log openstack add icinga downtime for puppet and systemd state on cloudcontrol1004 T223907
[22:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL