[08:28:12] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases doesn't say anything about quotas
[08:31:55] what's a reasonable size for user DBs?
[08:32:33] I'm trying to make a dump of ORES data for some analytics, and it would be in the 100M-1G range
[08:32:45] is that acceptable?
[14:23:49] Can someone explain to me why I'm getting OOM fatal errors after just 125829120 bytes?
[14:24:16] My limit is set to 256M from PHP and my cron job is set to 1G
[14:24:27] andrewbogott: ^
[14:28:02] Cyberpower678, 120x1024x1024 ??
[14:28:32] Apparently. But my limits are set higher than that. WAAAY higher
[14:29:18] some funny overhead? Not clearing something?
[14:31:54] No. It literally only happens when running via jsub
[14:32:20] It works correctly everywhere else, using 128 MB on average
[14:32:49] It's like something in PHP is seriously misconfigured on the exec nodes
[14:32:58] "My limit is set to 256M from PHP" -- how did you set that?
[14:34:18] ini_set
[14:34:35] Nothing unusual.
[14:34:55] ini_set( 'memory_limit', '256M' );
[14:36:12] valhallasw`cloud: ^
[14:36:49] mm, I think that should work for php-cli as well
[14:37:44] Well, when I'm running the PHP from the command line and not on the job grid, it works just fine.
[14:37:47] You could try passing it as a command line parameter instead, i.e. php -d memory_limit=256M
[14:38:05] But when passed to jsub, it almost immediately crashes with OOM.
[14:38:18] Yes. But php-cli and php-from-lighttpd use different configuration files with different defaults
[14:38:37] I thought the cli hard-codes it so there is no limit (regardless of what you set)?
[14:38:53] And the php-cli has a limit of 120MB?
[14:39:04] That seems awfully low.
[14:39:20] The config file /etc/php/7.2/cli/php.ini says:
[14:39:23] ; Maximum amount of memory a script may consume (128MB)
[14:39:23] ; http://php.net/memory-limit
[14:39:23] memory_limit = -1
[14:39:44] that page does suggest -1 means no limit
[14:40:20] also -- to be clear, when you say 'crashes with OOM', that means you get a 'Fatal error: Out of memory (allocated 17039360) (tried to allocate 77824 bytes)', right?
[14:40:30] Yes
[14:40:44] Oh, and "PHP Warning: mysqli_connect(): (HY000/2002): php_network_getaddresses: getaddrinfo failed: System error in /mnt/nfs/labstore-secondary-tools-project/iabot/IABot/Core/DB.php on line 181" just happened
[14:41:30] I love how that error becomes a warning
[14:42:24] You're using the default php version, i.e. 7.2?
[14:42:57] PHP Fatal error: Out of memory (allocated 125829120) (tried to allocate 12288 bytes) in /mnt/nfs/labstore-secondary-tools-project/iabot/IABot/Core/DB.php on line 192
[14:43:14] PHP 7.2.16-1+0~20190307202415.17+stretch~1.gbpa7be82+wmf1 (cli) (built: Mar 11 2019 11:23:18) ( NTS )
[14:43:56] valhallasw`cloud: the warning becomes an error for IABot, as it will throw an exception if it can't connect to the DB.
[14:44:31] PHP Fatal error: Uncaught Exception: Unable to connect to the database in /mnt/nfs/labstore-secondary-tools-project/iabot/IABot/Core/DB.php:196 :-)
[14:46:16] ok -- so as far as I can see the only point where memory_limit is set is /etc/php/7.2/cli/php.ini, and this setting (-1) is still there when I run
[14:46:16] Cyberpower678: is the error preceded by "mmap() failed: [12] Cannot allocate memory"?
[14:50:00] valhallasw`cloud: on the login VM the memory limit errors do not happen. Only when run on the job scheduler
[14:50:11] I think so
[14:51:07] ok, then just bump the -mem parameter up some more.
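A minimal sketch, assuming PHP 7.2 CLI as in the log above, for checking which memory_limit is actually in effect; running it both on a login host and via jsub should show whether the grid nodes load a different configuration (the file name and output format are illustrative):

```php
<?php
// check_limit.php -- compare the configured and runtime memory_limit.

// The limit as loaded from php.ini (or from -d on the command line):
echo 'memory_limit from config:   ', ini_get('memory_limit'), "\n";

// Raising the limit at runtime, as done in the log at 14:34:55:
ini_set('memory_limit', '256M');
echo 'memory_limit after ini_set: ', ini_get('memory_limit'), "\n";

// Which ini files were actually read -- useful for spotting a stray
// per-host override:
echo 'Loaded php.ini:  ', php_ini_loaded_file(), "\n";
echo 'Extra ini files: ', (php_ini_scanned_files() ?: 'none'), "\n";
```

Invoking it as `php -d memory_limit=256M check_limit.php` exercises the command-line override suggested at 14:37:47, which takes effect before any script code runs.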
[14:51:52] I need more than 1G to run an application that is limiting itself to 256M?
[14:52:12] if you're getting mmap errors, it's not limiting itself to 256M
[14:52:27] that means it's trying to allocate more memory than SGE will allow it to
[14:52:35] Granted, it will increase the limit to 1G if it finds it can't write large amounts of data to files, but that seems odd.
[14:52:43] it could be that it's not PHP itself doing that but maybe the mysql library
[14:53:20] * Cyberpower678 sets it to 2G
[14:54:05] During the incidents, though, IABot had a set limit of 256M, with no report of having had to increase it to 1G to account for an NFS failure.
[14:54:51] And real memory usage came back at 128 MB when doing test runs.
[14:55:47] measuring memory usage is a bit of a subtle art, especially when it comes to dynamically loaded libraries
[14:56:06] so I would suggest only using SGE itself as a measurement (through qstat or qacct)
[14:56:24] because that will be the actual measurement used (which, iirc, includes the memory used by all the libraries)
[14:56:59] valhallasw`cloud: real memory usage was reported by IABot. It basically reports the peak real memory usage after every article it analyzes. It usually stays around 128 MB.
[14:57:51] define 'peak real mem usage'?
[14:58:52] valhallasw`cloud: https://www.php.net/manual/en/function.memory-get-peak-usage.php
[14:59:03] the $real_usage argument is set to true
[14:59:30] Ok, so that will probably give you the same measure that's used for php's memory_limit
[15:00:11] the updated cron should be firing in about 1 minute
[15:00:12] but that's not a reliable indicator of memory usage as reported by the system
[15:02:32] And 2 workers crashed
[15:03:16] for example, allocating a 256MB block of memory using $array = new SplFixedArray(256 * (1024 * 1024 / 16)); results in memory_get_peak_usage reporting approx 270MB, while SGE measures maxvmem = 288MB.
[15:03:40] what's the job number?
[15:04:24] 6092014
[15:04:38] ok, so that reports maxvmem = 153M
[15:04:47] That's for one of them. It went down because it can't connect to the db, given the previous DNS warning.
[15:04:54] ah ok
[15:04:57] well, not ok
[15:05:04] but a different problem
[15:57:57] valhallasw`cloud: any ideas as to why my bot is getting DNS errors to a VM?
[15:58:23] namely cyberbot-db-01
[16:17:20] Cyberpower678: not really -- especially the System Error part is weird. Normally you'd get a Temporary failure in name resolution
[16:18:18] valhallasw`cloud: /usr/bin/php7.2: error while loading shared libraries: libargon2.so.0: cannot open shared object file: Error 24
[16:18:34] Might be related?
[16:19:45] "Errno 24 means: 'Too many open files'"
[16:20:40] so that sounds like your code or PHP is opening too many sockets and/or files in a short time? I don't know if that's also the exact error that getaddrinfo gets
[16:21:52] valhallasw`cloud: maybe. I did discover a fatal flaw in the updated code I'm working to fix. It writes several thousand files of data that would consume generous amounts of memory if kept in memory. But it should close them when done.
[16:22:02] And then delete them.
[16:22:24] so "System Error" means getaddrinfo returned EAI_SYSTEM / "Other system error, check errno for details."
[16:22:34] But the script is a self-restarting script; it's possible that on a fatal error that doesn't actually kill it, the sockets remain open.
[16:22:52] Good learning experience, here.
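A rough reconstruction, under stated assumptions, of the allocation experiment quoted at 15:03:16: reserve roughly 256MB via SplFixedArray, then compare PHP's own peak counter against the maxvmem that SGE reports afterwards via `qacct -j <jobid>`. The 16-bytes-per-slot figure comes from the original message; the output formatting is illustrative:

```php
<?php
// Reserve ~256MB: 16,777,216 SplFixedArray slots at ~16 bytes per zval.
$slots = (int) (256 * (1024 * 1024 / 16));
$array = new SplFixedArray($slots);

// true => "real" memory allocated from the system, rather than the amount
// the script itself has consumed -- the usage described at 14:59:03.
printf("memory_get_peak_usage(true): %.1f MB\n",
       memory_get_peak_usage(true) / 1024 / 1024);
```

The gap between the two numbers in the log (approx 270MB from PHP versus 288MB from SGE) is the overhead SGE counts on top: the interpreter itself plus dynamically loaded libraries, which is why the jsub -mem parameter has to sit above PHP's memory_limit.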
[16:23:35] unfortunately it seems php does not expose glibc's errno
[16:23:48] so that information seems lost
[16:27:47] I think I fixed the error causing the constant restarts. When it can't finish analyzing a page and spontaneously restarts, the sockets remain open, I guess.
[16:32:02] valhallasw`cloud: yea, that seems to have fixed it.
[17:32:21] bd808: hi - I see you have +2 on wikimedia/iegreview.git. For when you have time, could you please review https://gerrit.wikimedia.org/r/#/q/status:open+project:wikimedia/iegreview ?
[17:32:34] three very simple patches await your royal assent :)
[19:52:39] !log tools rebooting tools-worker-1023
[19:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
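A minimal sketch of the descriptor-leak failure mode fixed at 16:27:47, under assumed structure (the function and file names are illustrative, not IABot's actual code): a long-running, self-restarting script that opens thousands of scratch files will eventually hit EMFILE ("Too many open files", errno 24) if a fatal path skips the close, whereas try/finally guarantees cleanup even when the analysis throws.

```php
<?php
// Hypothetical per-page worker: write intermediate data to a scratch file,
// and guarantee the handle is closed and the file deleted on every exit path.
function analyzePage(string $scratchFile): void
{
    $fh = fopen($scratchFile, 'w');
    if ($fh === false) {
        throw new RuntimeException("cannot open $scratchFile");
    }
    try {
        fwrite($fh, "intermediate analysis data\n");
        // ... analysis that may throw ...
    } finally {
        fclose($fh);           // always runs, even when an exception is thrown
        fclose($fh ?? null) === null ?: null; // (no-op guard removed below)
    }
}
```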