[00:00:16] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 7.54, 6.61, 5.62
[00:11:05] RECOVERY dpkg-check is now: OK on bots-abogott-devel.pmtpa.wmflabs 10.4.1.42 output: All packages OK
[00:38:34] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 22% free memory
[00:41:04] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[01:01:56] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 18% free memory
[01:04:07] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[01:06:26] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 179 processes
[01:11:25] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 102 processes
[01:28:03] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 2.14, 3.54, 4.85
[01:56:32] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes
[02:40:27] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.99, 4.79, 4.97
[02:42:07] PROBLEM Current Load is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: WARNING - load average: 9.97, 9.18, 6.58
[02:48:37] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.04, 5.00, 5.03
[03:17:13] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 0.95, 1.70, 3.97
[03:28:34] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.87, 4.91, 4.97
[04:06:16] RECOVERY Disk Space is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: DISK OK
[04:14:15] PROBLEM Disk Space is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: DISK WARNING - free space: / 520 MB (5% inode=70%):
[04:15:15] PROBLEM Current Load is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: WARNING - load average: 7.69, 7.81, 6.18
[04:30:22] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 0.33, 2.79, 4.64
[04:37:33] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 22% free memory
[04:38:53] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[04:40:53] PROBLEM Free ram is now: WARNING on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: Warning: 19% free memory
[04:41:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.13, 5.12, 5.06
[04:45:52] RECOVERY Free ram is now: OK on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: OK: 28% free memory
[04:56:52] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.93, 4.91, 4.98
[04:56:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[05:10:33] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 18% free memory
[06:29:44] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes
[06:32:43] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 152 processes
[06:54:45] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 148 processes
[06:57:44] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 149 processes
[07:03:15] PROBLEM Current Load is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: WARNING - load average: 4.90, 5.66, 5.06
[07:08:14] RECOVERY Current Load is now: OK on nagios-main.pmtpa.wmflabs 10.4.0.120 output: OK - load average: 4.64, 4.94, 4.89
[07:17:44] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 152 processes
[08:42:02] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[08:43:32] PROBLEM Free ram is now: WARNING on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: Warning: 16% free memory
[09:00:12] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[09:13:34] RECOVERY Free ram is now: OK on techvandalism-bot.pmtpa.wmflabs 10.4.0.194 output: OK: 24% free memory
[12:17:53] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:23:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[12:40:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 22% free memory
[12:40:52] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[12:52:54] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[12:53:34] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 18% free memory
[13:03:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[13:18:52] PROBLEM Current Load is now: CRITICAL on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: Connection refused by host
[13:19:32] PROBLEM Disk Space is now: CRITICAL on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: Connection refused by host
[13:20:14] PROBLEM Free ram is now: CRITICAL on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: Connection refused by host
[13:21:44] PROBLEM Total processes is now: CRITICAL on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: Connection refused by host
[13:22:24] PROBLEM dpkg-check is now: CRITICAL on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: Connection refused by host
[13:23:54] RECOVERY Current Load is now: OK on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: OK - load average: 0.77, 0.90, 0.53
[13:24:34] RECOVERY Disk Space is now: OK on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: DISK OK
[13:25:12] RECOVERY Free ram is now: OK on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: OK: 96% free memory
[13:26:42] RECOVERY Total processes is now: OK on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: PROCS OK: 100 processes
[13:27:22] RECOVERY dpkg-check is now: OK on maps-pgsql3.pmtpa.wmflabs 10.4.1.75 output: All packages OK
[13:33:13] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[13:38:22] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.39, 0.29, 0.32
[13:53:40] so i just rebooted my labs instance, and now i can't ssh back in
[13:54:07] i see this error message in the 'Get console output' action from the labs UI:
[13:54:20] Feb 2 13:51:28 pdbhandler-2 dhclient: bound to 10.4.1.73 -- renewal in 46 seconds.
[13:54:21] Feb 2 13:51:31 pdbhandler-2 automount[917]: add_host_addrs: hostname lookup failed: Name or service not known
[13:54:23] Feb 2 13:52:00 pdbhandler-2 nslcd[1119]: [206613] error writing to client: Broken pipe
[13:55:01] any ideas what's going wrong, Damianz et al?
[13:57:43] ah, now it works
[14:13:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[15:27:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[15:33:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[16:38:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 22% free memory
[17:01:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 18% free memory
[17:49:18] can anyone help me please? I'm getting Exit status 254 when sshing to my instance
[18:03:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:07:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[18:29:32] PROBLEM dpkg-check is now: CRITICAL on nagios-dev.pmtpa.wmflabs 10.4.0.201 output: DPKG CRITICAL dpkg reports broken packages
[18:34:34] RECOVERY dpkg-check is now: OK on nagios-dev.pmtpa.wmflabs 10.4.0.201 output: All packages OK
[18:43:02] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[18:56:53] matanya which? when?
[18:57:09] it was somehow fixed suddenly
[18:57:17] ok
[18:57:21] but petan I have some other issue
[18:57:36] hm
[18:57:53] I'm trying to work on nagios (icinga) but I'm on a vm
[18:58:11] so the hardware checks I write don't work
[18:58:41] what can be done about that? moreover, the instance can't connect to the internet
[18:58:50] why icinga?
[18:58:59] how is that better than nagios
[18:59:21] I write nagios checks, it is equivalent
[18:59:36] i have absolutely no idea what you mean, is that on labs?
[18:59:45] yes
[18:59:47] what are you checking and where
[18:59:58] on prod wikimedia uses icinga
[19:00:07] instead of nagios
[19:00:40] ok but I still don't see ur problem
[19:00:56] I can't check hardware checks on a VM
[19:01:05] the hardware is abstracted
[19:01:07] what hardware u check?
[19:01:11] * what
[19:01:16] dell
[19:01:30] cpu, memory, raid, disks
[19:01:32] etc
[19:02:03] ok but unless you can emulate this on the system you are testing it on, it will hardly be possible
[19:02:26] I can't, that is the point
[19:02:32] I need a real server, not a vm
[19:03:07] I believe people in ops can make such checks themselves if they needed it
[19:03:16] this is more for -operations
[19:03:32] I asked, peter encouraged me to help out here
[19:04:01] he asked me to develop it in a lab instance
[19:04:07] but it doesn't work
[19:04:22] then u need to use magic ;)
[19:04:36] ah, I did that already :)
[19:04:46] I came here after even that failed
[19:04:58] I don't know how your checks work but u need to somehow emulate the prod env
[19:05:09] I know that :)
[19:06:29] is there any op around that can help?
[19:06:39] in -operations
[19:06:43] not here
[19:06:54] I don't see anyone there
[19:07:52] if u need hardware then u want Rob and I doubt he's in office today
[19:08:34] RobH?
[19:09:44] http://wikimediafoundation.org/wiki/User:RobH
[19:09:46] yup
[19:10:24] last time I wanted sth regarding hw I was told to speak to him
[19:11:33] ok, thanks
[19:23:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[20:08:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:22:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[20:38:54] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory
[20:41:32] RECOVERY Free ram is now: OK on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: OK: 22% free memory
[20:41:42] PROBLEM Total processes is now: WARNING on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS WARNING: 154 processes
[20:47:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[20:54:32] PROBLEM Free ram is now: WARNING on stackfarm-sql2.pmtpa.wmflabs 10.4.1.23 output: Warning: 18% free memory
[21:06:53] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 14% free memory
[21:08:53] PROBLEM Free ram is now: WARNING on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: Warning: 12% free memory
[21:27:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 8% free memory
[22:07:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[22:23:52] RECOVERY Free ram is now: OK on newchanges-bot.pmtpa.wmflabs 10.4.0.221 output: OK: 28% free memory
[22:46:43] RECOVERY Total processes is now: OK on wikidata-dev-9.pmtpa.wmflabs 10.4.1.41 output: PROCS OK: 150 processes
[22:53:02] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[23:02:57] petan, around?
[23:03:10] yes
[23:03:15] wm-bot seems to have a strange bug where it only reports edits by certain users to a page
[23:03:26] in #wikimedia-OTRS I've got it stalking OTRS/Volunteering
[23:03:58] but it's not reported the last few edits, and yesterday was only reporting my edits
[23:04:15] can you help? :)
[23:04:53] that's weird
[23:05:01] afaik it only reports edits
[23:05:10] not new pages / protection changes and moves
[23:05:17] I will look into that but not now
[23:05:30] yeah after I moved it it would report my edits and another user's (User:Daniel), but not other users' edits
[23:05:32] sure, thanks
[23:05:44] that's really weird
[23:06:05] btw there is bugzilla for it
[23:48:02] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[23:49:32] PROBLEM Disk Space is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[23:54:33] RECOVERY Disk Space is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: DISK OK