[00:07:23] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[00:13:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[00:27:22] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[00:37:23] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[00:44:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[00:47:43] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 14% free memory
[00:57:23] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[00:57:43] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: Critical: 4% free memory
[01:02:43] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 177% free memory
[01:07:32] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[01:09:32] PROBLEM Total processes is now: WARNING on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS WARNING: 174 processes
[01:14:32] RECOVERY Total processes is now: OK on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS OK: 97 processes
[01:14:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[01:28:12] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[01:38:02] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[01:44:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[01:46:52] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 11% free memory
[01:58:12] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[02:08:03] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[02:15:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[02:28:13] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[02:38:03] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[02:39:53] RECOVERY Free ram is now: OK on wikistream-1 i-0000016e.pmtpa.wmflabs output: OK: 25% free memory
[02:39:53] RECOVERY Free ram is now: OK on sube i-000003d0.pmtpa.wmflabs output: OK: 39% free memory
[02:45:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[02:58:14] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:08:03] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[03:12:52] PROBLEM Free ram is now: WARNING on wikistream-1 i-0000016e.pmtpa.wmflabs output: Warning: 11% free memory
[03:16:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:29:02] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:38:52] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[03:47:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:59:03] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[04:08:52] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[04:17:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[04:29:42] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[04:38:53] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[04:47:13] RECOVERY Disk Space is now: OK on kubo i-000003dd.pmtpa.wmflabs output: DISK OK
[04:47:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[04:59:43] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[05:08:54] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[05:17:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[05:29:43] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[05:39:32] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[05:47:45] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[05:59:43] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[06:09:32] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[06:18:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[06:29:33] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 154 processes
[06:29:43] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[06:31:32] PROBLEM Total processes is now: WARNING on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS WARNING: 151 processes
[06:34:33] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 150 processes
[06:36:32] RECOVERY Total processes is now: OK on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS OK: 146 processes
[06:39:32] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[06:49:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[06:50:53] RECOVERY Disk Space is now: OK on scribunto i-0000022c.pmtpa.wmflabs output: DISK OK
[06:58:52] PROBLEM Disk Space is now: WARNING on scribunto i-0000022c.pmtpa.wmflabs output: DISK WARNING - free space: / 558 MB (5% inode=81%):
[07:01:43] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[07:09:32] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[07:19:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[07:24:32] PROBLEM Total processes is now: WARNING on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: PROCS WARNING: 156 processes
[07:32:22] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[07:39:33] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[07:49:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[08:02:24] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[08:09:33] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[08:19:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[08:33:12] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[08:39:33] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[08:46:53] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[08:49:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[08:56:54] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 7% free memory
[09:03:14] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[09:10:53] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[09:19:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[09:34:02] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[09:36:52] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[09:41:42] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[09:50:22] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[10:04:03] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[10:12:23] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[10:20:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[10:34:42] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[10:36:53] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[10:43:02] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[10:50:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[10:52:11] Hello! Has anyone here had problems with the LocalisationCache when turning on memcached with a MW installation???
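[Editor's note: the overnight flood above is the same three instances alerting over and over. A minimal sketch (not part of the original log) of reducing such a flood to the set of distinct failing hosts; the sample lines are abbreviated copies of the alerts above.]

```shell
# Sample Nagios-style alert lines, abbreviated from the log above.
alerts='[00:07:23] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN
[00:13:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN
[00:27:22] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN
[00:37:23] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN'

# Pull the host field (4th whitespace-separated column) out of each
# "PROBLEM host:" line, then de-duplicate.
printf '%s\n' "$alerts" | awk '/PROBLEM host:/ {print $4}' | sort -u
```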
[11:04:42] PROBLEM host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[11:05:34] @labs-resolve 51a
[11:05:34] I don't know this instance - aren't you are looking for: I-0000051a (fawikitest),
[11:05:48] Damianz ping
[11:11:52] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 6% free memory
[11:13:02] PROBLEM host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[11:17:48] ACKNOWLEDGEMENT host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[11:18:03] ACKNOWLEDGEMENT host: i-0000051a.pmtpa.wmflabs is DOWN address: i-0000051a.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[11:18:18] ACKNOWLEDGEMENT host: i-00000536.pmtpa.wmflabs is DOWN address: i-00000536.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000536.pmtpa.wmflabs)
[11:18:21] :P
[11:18:22] here we go
[11:19:42] DISK CRITICAL - free space: /export 189 MB (1% inode=49%):
[11:32:59] Change on 12mediawiki a page Developer access was modified, changed by Chihonglee link https://www.mediawiki.org/w/index.php?diff=617567 edit summary: /* User:Chihonglee */
[11:46:52] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 4% free memory
[11:54:38] @labs-info dumps-bot1
[11:54:38] [Name dumps-bot1 doesn't exist but resolves to I-000003ed] I-000003ed is Nova Instance with name: dumps-bot1, host: virt6, IP: 10.4.0.4 of type: m1.large, with number of CPUs: 4, RAM of this size: 8192M, member of project: dumps, size of storage: 90 and with image ID: ubuntu-12.04-precise
[11:55:06] that's a lot of ram
[12:00:12] whoever wrote these bots, they are not using resources effectively :/
[12:11:53] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 6% free memory
[12:30:43] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 13% free memory
[12:40:43] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: Critical: 4% free memory
[12:46:53] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[12:49:43] PROBLEM Disk Space is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[12:49:43] PROBLEM Total processes is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[12:49:43] PROBLEM dpkg-check is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: Connection refused or timed out
[12:49:53] PROBLEM Current Load is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: Connection refused or timed out
[12:49:53] PROBLEM Current Users is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: Connection refused or timed out
[12:49:53] PROBLEM SSH is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: No route to host
[12:56:07] well .. grmpf
[12:56:51] either there is something wrong with bots-3 .. or there is something wrong with my bots
[12:57:16] But if they first run for days .. weeks .. without problems .. and now everything gets killed in one day, then I am not sure if it is the bots
[12:58:52] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs)
[13:02:22] 1 day and approx 4 hours running, to be precise
[13:29:33] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs)
[13:59:29] @labs-resolve e5
[13:59:29] I don't know this instance - aren't you are looking for: I-000000e5 (bots-3), I-000001e5 (vumi), I-000002e5 (su-fe1), I-000004e5 (timing),
[13:59:33] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs)
[13:59:38] o.o
[14:13:42] PROBLEM dpkg-check is now: CRITICAL on bots-labs i-0000015e.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[14:14:32] RECOVERY Disk Space is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: DISK OK
[14:14:32] RECOVERY Total processes is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: PROCS OK: 95 processes
[14:14:42] RECOVERY host: i-000000e5.pmtpa.wmflabs is UP address: i-000000e5.pmtpa.wmflabs PING OK - Packet loss = 0%, RTA = 0.75 ms
[14:14:42] RECOVERY dpkg-check is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: All packages OK
[14:14:52] RECOVERY Current Users is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: USERS OK - 0 users currently logged in
[14:14:52] RECOVERY Current Load is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK - load average: 0.02, 0.03, 0.00
[14:14:52] RECOVERY SSH is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[14:15:43] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 1388% free memory
[14:16:34] @labs-resolve i-000000fd
[14:16:34] The i-000000fd resolves to instance I-000000fd with a fancy name patchtest2 and IP 10.4.0.74
[14:16:46] @labs-resolve i-0000010c.pmtpa.wmflabs
[14:16:47] The i-0000010c.pmtpa.wmflabs resolves to instance I-0000010c with a fancy name aggregator1 and IP 10.4.0.79
[14:23:42] RECOVERY dpkg-check is now: OK on bots-labs i-0000015e.pmtpa.wmflabs output: All packages OK
[15:13:23] wtf is adminbot on bots-1
[16:06:53] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 10% free memory
[16:24:17] ACKNOWLEDGEMENT Current Load is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:24:32] ACKNOWLEDGEMENT Current Users is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:24:32] ACKNOWLEDGEMENT Disk Space is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:24:47] ACKNOWLEDGEMENT Free ram is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:24:47] ACKNOWLEDGEMENT Total processes is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:25:17] ACKNOWLEDGEMENT Current Load is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:25:32] ACKNOWLEDGEMENT Current Users is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:26:02] ACKNOWLEDGEMENT Disk Space is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:26:17] ACKNOWLEDGEMENT Free ram is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[16:28:14] :|
[16:28:17] ACKNOWLEDGEMENT Total processes is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
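[Editor's note: the "177%" and "1388% free memory" readings above suggest the check's arithmetic is off, most plausibly by mixing stale totals with reclaimable cache. A sketch (not part of the original log) of a sane calculation from /proc/meminfo; the heredoc-style sample stands in for a real host's values.]

```shell
# Sample /proc/meminfo fields; on a live host you would read the real file.
meminfo='MemTotal:        8192000 kB
MemFree:          409600 kB
Buffers:          102400 kB
Cached:          1638400 kB'

# Count buffers and page cache as "free" (the kernel can reclaim them),
# then express the sum as a percentage of MemTotal. The result can never
# exceed 100%, unlike the readings in the log above.
free_pct=$(printf '%s\n' "$meminfo" | awk '
  /^MemTotal:/ {total=$2}
  /^MemFree:/  {free=$2}
  /^Buffers:/  {free+=$2}
  /^Cached:/   {free+=$2}
  END {printf "%d", free*100/total}')
echo "${free_pct}% free memory"
```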
[16:28:29] @labs-resolve i-00000207
[16:28:29] The i-00000207 resolves to instance I-00000207 with a fancy name rds and IP 10.4.0.18
[16:28:38] @labs-info rds
[16:28:38] [Name rds doesn't exist but resolves to I-00000207] I-00000207 is Nova Instance with name: rds, host: virt7, IP: 10.4.0.18 of type: m1.medium, with number of CPUs: 2, RAM of this size: 4096M, member of project: hadoop, size of storage: 50 and with image ID: ubuntu-11.10-oneiric
[16:28:52] @labs-project-info hadoop
[16:28:52] The project Hadoop has 3 instances and 3 members, description: {unknown}
[16:28:57] :/
[16:29:06] we really want to use the description field
[16:29:20] @labs-project-users hadoop
[16:29:20] Following users are in this project (showing all 3 members): Diederik, Novaadmin, Whym,
[17:05:08] hey petan, can i pm?
[17:05:16] fast pls :D
[17:05:21] I am just walking out of the office
[18:06:16] Does anyone need a reprieve before I make $HOME read-only for the day?
[18:20:55] WARNING: Labs homedirs will be read-only for much of today.
[18:20:59] starting… in just a minute.
[18:21:37] great
[18:21:50] how much storage will the new one have?
[18:22:06] I hope for more than what we have :P
[18:22:13] storage per project would be cool
[18:22:18] rather than per labs
[18:22:57] It will be per-project. I'm not sure how/if Ryan has quota'd the volumes.
[18:23:07] ok
[18:42:14] <^demon> Ryan_Lane: https://ocroquette.wordpress.com/2012/12/16/simple-user-management-for-gerrit/ makes debugging LDAP for Gerrit way easier.
[18:42:36] <^demon> (Single perl script that's an ldap daemon reading from flat files. Only implements enough LDAP to satisfy gerrit)
[18:42:46] heh
[18:42:46] <^demon> (Wish this existed a week or two ago)
[18:43:15] andrewbogott: working now?
[18:43:21] looks like
[18:43:25] cool
[18:43:42] 12/17/2012 - 18:43:42 - Updating keys for mwang at /export/keys/mwang
[18:43:54] yep, 'permission denied'
[18:46:37] permission denied?
[18:46:48] Just, I succeeded in making things read-only.
[18:46:54] In this case 'permission denied' is a good thing
[18:47:00] Hm... what does it mean that there are 155 dirs in /export/home but labsconsole thinks there are only 145 projects?
[18:48:36] ah
[18:48:37] cool
[18:48:50] good question :)
[18:50:02] andrewbogott: some of them are groups and not projects
[18:50:30] 'groups'?
[18:50:59] allnovausersstatic <— that was a virtual group (since deleted) that listed all members
[18:59:28] btw I made nagios stop spamming
[18:59:35] !nlogin
[18:59:42] eh
[18:59:45] @search login
[18:59:45] Results (Found 1): newgrp,
[19:00:23] ok, if you open /nlogin in a browser you will be able to log in to nagios and use the browser to stop checks from spamming in here
[19:00:37] you need to create yourself an account though
[19:00:54] sudo htpasswd /etc/nagios3/ht* name
[19:01:44] PROBLEM dpkg-check is now: CRITICAL on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[19:02:54] if someone knew how to connect it to ldap, that would be cool
[19:04:14] PROBLEM host: i-000004c8.pmtpa.wmflabs is DOWN address: i-000004c8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004c8.pmtpa.wmflabs)
[19:05:56] @labs-resolve c8
[19:05:56] I don't know this instance - aren't you are looking for: I-000001c8 (swift-be2), I-000004c8 (integration-androidsdk),
[19:06:31] ah
[19:06:32] cool
[19:06:35] petan: thanks :)
[19:06:45] but these are freshly down
[19:06:55] this channel was spammed by 3 dead ones
[19:07:00] which have been dead for months
[19:07:02] * Ryan_Lane nods
[19:07:08] they are probably the corrupted ones
[19:07:12] I should just delete them
[19:07:20] maybe
[19:07:20] people are bad about cleaning up their own instances
[19:07:28] disabling them in nagios is the first step
[19:07:55] I think Damianz should swap host and alias in nagios
[19:08:05] so we can see fancy names instead of real names
[19:08:14] I know I can just @resolve all of them but :D
[19:11:53] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[19:13:36] @labs-project-info integration
[19:13:36] The project Integration has 6 instances and 5 members, description: A project to try out Jenkins for MediaWiki and its relevant projects.
[19:21:53] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 6% free memory
[19:34:53] PROBLEM host: i-000004c8.pmtpa.wmflabs is DOWN address: i-000004c8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000004c8.pmtpa.wmflabs)
[19:36:53] RECOVERY dpkg-check is now: OK on integration-jobbuilder i-000004e3.pmtpa.wmflabs output: All packages OK
[19:36:53] PROBLEM Free ram is now: CRITICAL on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Critical: 5% free memory
[19:37:33] RECOVERY host: i-000004c8.pmtpa.wmflabs is UP address: i-000004c8.pmtpa.wmflabs PING OK - Packet loss = 0%, RTA = 0.60 ms
[19:48:54] PROBLEM dpkg-check is now: CRITICAL on integration-androidsdk i-000004c8.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[20:00:32] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 155 processes
[20:04:33] Change on 12mediawiki a page Developer access was modified, changed by Fomafix link https://www.mediawiki.org/w/index.php?diff=617745 edit summary: /* User:Fomafix */ Account created. Request removed.
[20:11:30] benestar_: please don't use bastion for your bot
[20:11:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: WARNING - load average: 10.58, 9.23, 6.81
[20:11:54] Ryan_Lane: huh?
[20:12:03] are you bene in labs?
[20:12:11] yes
[20:12:13] i-000000ba.pmtpa.wmflabs : Dec 17 19:34:22 : bene : user NOT in sudoers ; TTY=pts/14 ; PWD=/home/bene ; USER=root ; COMMAND=/bin/mkdir /data/project/benebot
[20:12:18] i am trying but it does not work
[20:12:31] did i change anything?
[20:12:34] no
[20:12:41] but, don't use bastion for your bots
[20:12:47] ask to be added to the bots project
[20:12:51] Damianz, petan: ^^
[20:12:53] well, i tried to log in to bots-4
[20:13:06] but it says Permission denied (publickey).
[20:13:14] giftpflanze: have a look
[20:13:16] ok, let's solve that, then :)
[20:13:24] benestar_ hi
[20:13:24] benestar_: is your agent forwarded?
[20:13:28] giftpflanze was trying to help me
[20:13:32] @labs-user Benestar
[20:13:33] That user is not a member of any project
[20:13:42] what is your ldap
[20:13:43] @labs-user Bene
[20:13:43] Bene is member of 2 projects: Bastion, Bots,
[20:13:49] ok
[20:13:52] @labs-project wordpress
[20:14:04] @labs-project-info wordpress
[20:14:04] The project Wordpress has 0 instances and 1 members, description: A project for testing upgrades to wordpress.
[20:14:39] @labs-project membase
[20:14:45] -info :P
[20:15:03] @labs-project-info
[20:15:08] * andrewbogott is only half paying attention :)
[20:15:11] thanks
[20:15:19] @labs-project-info membase
[20:15:20] The project Membase has 0 instances and 1 members, description: {unknown}
[20:15:31] btw the description is a nice thing :P
[20:15:35] Ryan_Lane: where did we stop?
[20:15:37] you should use it more
[20:15:41] yeah
[20:15:43] agreed
[20:15:52] benestar_: on bastion type this: ssh-add -l
[20:16:28] @labs-project-info turnkey-mediawiki
[20:16:28] The project Turnkey-mediawiki has 0 instances and 2 members, description: {unknown}
[20:16:36] Ryan_Lane: The agent has no identities.
[20:16:46] @labs-project-info bugtracker
[20:16:46] The project Bugtracker has 1 instances and 3 members, description: {unknown}
[20:16:49] damn, there are a lot of empty projects
[20:16:57] benestar_: your agent isn't properly forwarded, then
[20:17:01] !access
[20:17:01] https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[20:17:06] ok
[20:17:09] let's sprint to set descriptions for all projects
[20:17:09] so what to do?
[20:17:11] benestar_: see: https://labsconsole.wikimedia.org/wiki/Access#Using_agent_forwarding
[20:17:16] andrewbogott: empty projects are fine, unused instances are not
[20:17:30] yep, I agree. Surprised, not alarmed.
[20:17:31] because these eat cpu and storage
[20:17:45] @labs-project-info hackathon
[20:17:45] The project Hackathon has 0 instances and 3 members, description: {unknown}
[20:18:20] Ryan_Lane: i already tried this
[20:18:32] on your local system: ssh-add -l
[20:18:40] does it show your key?
[20:18:48] well, the fingerprint, that is
[20:18:50] his local system is windows ;)
[20:19:03] are you using putty?
[20:19:11] he is
[20:19:13] if so, you'll need to use pageant
[20:19:28] and you'll need to add your key to pageant
[20:19:45] then you'll need to configure putty to forward your agent when logging into bastion
[20:19:51] I wish we had screencaps of this
[20:19:55] * Ryan_Lane doesn't have windows
[20:19:56] it has worked before, I must be missing pageant (it even worked without adding keys)
[20:20:08] Ryan_Lane: I have windows, I can create some
[20:20:19] but I don't use putty so much :P
[20:20:20] https://labsconsole.wikimedia.org/wiki/Access#Using_agent_forwarding
[20:20:34] yes, pageant seems not to work
[20:20:48] I bought windows cuz it's great for gaming :D
[20:20:55] there should be an icon in the tray bar, according to my sources
[20:20:58] http://www.siteground.com/tutorials/ssh/putty.htm
[20:21:09] need to fix huggle too
[20:21:39] hm, that second doc goes through creating keys too
[20:21:44] btw
[20:21:46] which may not be what you want
[20:21:47] petan: ?
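[Editor's note: the pageant/putty troubleshooting above has a command-line equivalent on a unix client. A self-contained sketch (not part of the original log); the key path is a throwaway generated on the spot, and the bastion hostname in the comments is illustrative.]

```shell
# Generate a throwaway key so the sketch runs standalone; in practice you
# would add your real labs key instead.
keyfile=$(mktemp -u)
ssh-keygen -q -t rsa -N '' -C demo-key -f "$keyfile"

eval "$(ssh-agent -s)" > /dev/null   # start an agent for this shell
ssh-add "$keyfile" 2> /dev/null      # load the key into the agent
ssh-add -l                           # should now list the key's fingerprint

# With the agent populated, forward it into the bastion so the second hop
# (e.g. bots-4) can authenticate with the same key:
#   ssh -A <user>@bastion.wmflabs.org
#   ssh bots-4      # on bastion, 'ssh-add -l' should show the same key,
#                   # not "The agent has no identities."
```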
[20:21:55] checked out btrfs and it's a cool thing
[20:22:07] yeah, still pretty unstable, though
[20:22:30] I don't know, redhat wants to use it in production builds
[20:22:30] the problem
[20:22:40] if i start pageant, nothing appears
[20:22:57] @labs-project-info tor
[20:22:57] The project Tor has 1 instances and 1 members, description: {unknown}
[20:23:07] I switched to btrfs on my linux box and it's so great
[20:23:07] benestar_: there's no icon in your lower-right menu?
[20:23:11] whatever that is called
[20:23:18] tasktray?
[20:23:43] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 6.39, 7.10, 5.72
[20:23:53] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 6.86, 7.81, 5.87
[20:24:15] Ryan_Lane: :o
[20:24:18] <^demon> Ryan_Lane: Notification area, technically.
[20:24:18] i see it now
[20:24:22] damned
[20:24:26] wee
[20:24:28] yeah, pageant is annoying in that regard
[20:24:29] <^demon> Is the official name :)
[20:24:43] it's a really non-intuitive way to run an application
[20:24:51] ^demon: it's taskbar :D
[20:24:57] that's how microsoft calls it
[20:25:02] <^demon> The whole bar is the taskbar.
[20:25:09] meh
[20:25:11] <^demon> The bottom right area is called the "Notification area"
[20:25:16] omg
[20:25:18] it works xD
[20:25:27] wee
[20:25:38] when I roll over it: "click to select what icons appear in taskbar"
[20:25:41] \o/
[20:25:50] * on the taskbar
[20:25:50] <^demon> petan: http://en.wikipedia.org/wiki/Taskbar#Taskbar_elements ;-)
[20:25:59] meh
[20:26:10] wikipedia can't beat microsoft :D
[20:26:15] <^demon> "The notification area is the portion of the taskbar that displays icons for system and program features that have no presence on the desktop as well as the time and the volume icon. It contains mainly icons that show status information..."
[20:26:25] would you trust some encyclopedia which everyone can edit?
[20:26:26] <^demon> petan: We already did. See [[w:Encarta]] ;-)
[20:26:26] :D :D
[20:26:42] lol
[20:26:52] PROBLEM Free ram is now: WARNING on dumps-bot1 i-000003ed.pmtpa.wmflabs output: Warning: 6% free memory
[20:29:52] PROBLEM Free ram is now: WARNING on sube i-000003d0.pmtpa.wmflabs output: Warning: 19% free memory
[20:30:15] @labs-user Bene
[20:30:16] Bene is member of 2 projects: Bastion, Bots,
[20:30:34] @labs-user Ryan Lane
[20:30:34] Ryan Lane is member of 72 projects: Analytics, Bastion, Bots, Category-sorting, Cvresearch, Demo, Deployment-prep, Deployment-prepbackup, Dumps, Editor-engagement, Etherpad, Feeds, Ganglia, Gareth, Gerrit, Gluster, Integration, Juju, Linkcache, Lvs-labs, Mail, Mailman, Maps, Mapstory, Mediahandler, Mediawiki-bugfix, Membase, Mobile, Mobile-sms, Mobile-stats, Mwreview, Nagios, Nginx, Opengrok, Openstack, Otrs, Outreach, Patchtest, Pediapress, Phabricator, Project-proxy, Publicdata, Puppet, Puppet-cleanup, Queue, Reportcard, Selenium, Shop-analytics, Signwriting, Simplewiki, Statsgrokse, Storage, Sugarcrm, Swift, Swiftupgrade, Syslog-collection, Testing, Testlabs, Translation-memory, Turnkey-mediawiki, Video, Visualeditor, Webplatform, Webtools, Wiki-migration, Wikidata-dev, Wikifollower, Wikisource-dev, Wikistats, Wikistream, Wikitrust, Wlmjudging,
[20:30:44] bah
[20:30:49] he needs to be everywhere :P
[20:30:49] I'm in a ton of projects
[20:31:05] Ryan_Lane, petan: still does not work
[20:31:12] benestar_: you're on bastion trying that?
[20:31:17] i can connect but immediately disconnect again
[20:31:23] ah
[20:31:25] ooohhhh
[20:31:26] I know why
[20:31:29] benestar_: I suggest you create another key which is simple
[20:31:33] we're in the middle of home directory migration
[20:31:33] :P
[20:31:38] :/
[20:31:39] it can't create your home directory right now
[20:31:42] ok
[20:31:45] heh
[20:31:49] we'll be done with that today
[20:32:00] that sucks but at some point it could be sorted out
[20:32:06] I think your home directory exists
[20:32:11] but your key doesn't
[20:32:22] hmm
[20:32:22] ah
[20:32:27] let me check
[20:32:34] it's the same key as for bastion
[20:33:10] benestar_: the first time, you weren't disconnected immediately, right?
[20:33:29] or were you?
[20:33:34] ok
[20:33:34] i was
[20:33:37] ok
[20:33:41] but i didn't see it at first
[20:33:43] your $home doesn't exist
[20:33:43] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 5.57, 4.69, 4.94
[20:33:47] that's it
[20:33:54] so what to do?
[20:34:09] you can either wait or we could do something extremely hardcore :D
[20:34:14] who needs a ~ anyway :p
[20:34:17] :o
[20:34:31] like temporarily changing your home to /tmp/bene or some such thing :D
[20:34:38] :D
[20:34:53] and how long must i wait?
[20:34:55] or!
[20:34:56] or for what?
[20:35:10] take a fork and poke andrewbogott with it
[20:35:18] lol
[20:35:30] It's going to be a while… many unexpected complications
[20:35:40] :/
[20:35:49] more than a week?
[20:35:52] he's moving home
[20:35:59] see topic
[20:36:00] can bene use sftp in the meanwhile?
[20:36:03] or only some hours?
[20:36:05] I mean UNIX home :D
[20:36:08] not real
[20:36:21] ah, tomorrow
[20:36:51] so this is the point
[20:37:20] maintenance work like this is pretty rare
[20:37:34] hmm
[20:37:42] yes, like just twice every hour :)
[20:37:43] block filesystems are a pain in the ass, in this regard
[20:37:55] <3 btrfs
[20:38:07] snapshots ftw
[20:38:09] yeah
[20:38:20] we may use ceph in the future
[20:38:22] rather than gluster
[20:38:29] we'll be testing it in eqiad, likely
[20:38:35] mm
[20:38:55] not using btfrfs as a backend, though ;)
[20:39:00] btrfs, that is
[20:40:17] it's still flagged as not ready for production use but from what I heard it's nearly finished
[20:40:40] it supports some really cool stuff
[20:40:53] Ryan_Lane: http://dpaste.org/rsOas/ <-?
[20:41:00] subvolumes, mirroring on object level etc
[20:41:22] andrewbogott: what does the gluster volume info command show for that one?
[20:41:24] Ryan_Lane: Going to get us some sexy ssds?
[20:41:40] :D
[20:41:41] Damianz: well, only for logging/metadata/etc
[20:41:48] they are expensive and small
[20:41:53] yep
[20:41:56] Sounds like a hooker
[20:42:02] :)
[20:42:03] Ryan_Lane: Looks right; added.
[20:42:17] I mean -- I added the output from gluster volume info to the paste
[20:42:22] oh
[20:42:32] investing money into computers, cars... and girls
[20:42:37] that's all the same
[20:42:38] andrewbogott: I don't see it
[20:42:45] Damianz ^^
[20:42:56] oh… http://dpaste.org/iKF4m/
[20:42:57] Nah girls are more cuddly
[20:43:06] lol
[20:43:32] hm
[20:43:49] Is maybe gluster down on that third brick?
[20:44:06] lemme see
[20:44:21] bleh. gluster is swapping hard
[20:44:26] due to that stupid memory leak
[20:44:31] let me restart it on all of the nodes
[20:44:40] is it in the middle of copying all of the data now?
[20:44:45] or was it stuck on that one?
[20:44:52] nope, I haven't mounted a single thing yet
[20:44:55] ah
[20:44:55] ok
[20:48:24] stupid labstore1
[20:48:34] when I stopped gluster it had segfaults
[20:48:36] and wouldn't restart
[20:48:39] I hate gluster
[20:50:46] and sda2 has errors :(
[20:51:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 5.20, 6.10, 5.47
[20:52:29] don't worry, gluster will have totally replicated that and will ensure no data loss
[20:52:33] rofllmfao
[20:53:14] this fsck is absurd
[20:59:53] RECOVERY Free ram is now: OK on sube i-000003d0.pmtpa.wmflabs output: OK: 25% free memory
[21:04:35] RECOVERY Total processes is now: OK on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: PROCS OK: 100 processes
[21:07:06] this is killing me
[21:07:28] andrewbogott: so, most of them had OOM'd processes
[21:07:31] now what?
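The volume check Ryan asks for above, plus the per-node daemon restart he then performs, might look roughly like this. The volume name is only an example, and the init-script name assumes a Gluster-on-Ubuntu setup like the one implied in the log; this is a sketch, not the exact procedure used.

```shell
# Hedged sketch: inspecting a Gluster volume and restarting the daemon on a
# storage node. Needs the gluster CLI installed on the brick server.
check_volume() {
    vol="$1"
    gluster volume info "$vol"      # replica layout and brick list
    gluster volume status "$vol"    # whether each brick process is online
}
# Per-node restart for a leaking daemon (run on each labstore host):
# service glusterfs-server restart
```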
[21:07:31] which would cause your problem
[21:07:36] Ah, sure.
[21:07:37] I'm fixing that
[21:07:42] RECOVERY Current Load is now: OK on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: OK - load average: 1.23, 2.66, 4.83
[21:07:56] going to a newer point release will also permanently solve this
[21:07:56] So do we need to add a cron job to restart gluster every day?
[21:08:06] Or will an upcoming upgrade fix this?
[21:08:08] nah, it's weird. it is triggered by something
[21:08:17] then it just eats up a shit-ton of memory
[21:08:33] upgrading will fix it, though, yeah
[21:08:56] OK. So, you're clear of the gluster boxes and I can start migrating again?
[21:10:08] almost done
[21:10:11] 1/2 anyway
[21:10:15] 'k
[21:10:19] labstore3 isn't letting me in via ssh
[21:10:20] that's bad
[21:10:22] ah
[21:10:23] there we go
[21:10:28] it's letting me in now
[21:11:27] of course
[21:13:52] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 1.75, 3.29, 4.88
[21:16:30] labstore3 isn't being a happy camper
[21:18:29] ok, now it's starting to come back up
[21:23:16] ok, last box is being rebooted
[21:29:52] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d.pmtpa.wmflabs output: WARNING - load average: 15.52, 12.25, 6.12
[21:32:53] PROBLEM Free ram is now: WARNING on sube i-000003d0.pmtpa.wmflabs output: Warning: 19% free memory
[21:37:58] andrewbogott: ok, should be good now
[21:38:05] ok, let's see...
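Ryan's "most of them had OOM'd processes" diagnosis would typically come from the kernel log. A sketch of pulling the victim's name out of an OOM-killer line — the sample line below is fabricated for illustration; on a real host you would pipe `dmesg` or `/var/log/kern.log` in instead:

```shell
# A hypothetical OOM-killer log line; real ones come from `dmesg` on the
# affected host, e.g.: dmesg | grep -i 'killed process'
oom_line='Killed process 1234 (glusterfsd) total-vm:8388608kB, anon-rss:0kB'
# Extract the name of the process the kernel killed:
victim=$(printf '%s\n' "$oom_line" |
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')
echo "$victim"   # prints: glusterfsd
```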
[21:41:43] seems the spam in the logs is caused by another fixed bug
[21:45:04] goddamn now labs-nfs1 is so full I can't even mkdir
[21:45:06] again
[21:45:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 10.31, 9.28, 5.97
[21:46:53] heh
[21:49:12] bleh gluster's download site is down
[22:09:54] RECOVERY Current Load is now: OK on labs-nfs1 i-0000005d.pmtpa.wmflabs output: OK - load average: 0.47, 0.80, 4.34
[22:14:43] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 5.31, 5.91, 5.38
[22:16:27] ugh
[22:16:29] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Glusterfs%2520cluster%2520pmtpa&tab=m&vn=
[22:16:31] andrewbogott:
[22:16:46] dear gluster, you are so, so full of fail
[22:18:06] :-D
[22:18:44] to the tune of 'dear prudence'
[22:20:07] Ryan_Lane: That's the spike from me doing one rsync?
[22:20:32] seems like it
[22:29:02] looks like labstore1 is dead
[22:32:08] wait, already? Migrations are still working for me…
[22:32:21] Bah, should not have said that
[22:33:28] did they stop?
[22:36:09] getting some failures in wordpress; not sure what about yet
[22:36:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 5.54, 5.63, 5.25
[22:37:12] yeah, I have a mount but I can't write to it.
[22:41:47] * Ryan_Lane sighs
[22:42:06] this is absurd
[22:45:03] Actually that might be project-specific.
Other things are working
[22:45:53] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 3.14, 3.67, 4.59
[22:46:53] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 5.07, 4.72, 4.95
[22:50:16] andrewbogott: well, I think it's based on which node is being written to
[22:50:21] 1/4 freaked out
[22:59:22] @labs-info deployment-dbdump
[22:59:22] [Name deployment-dbdump doesn't exist but resolves to I-000000d2] I-000000d2 is Nova Instance with name: deployment-dbdump, host: virt7, IP: 10.4.0.56 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: deployment-prep, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img
[22:59:42] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 3.88, 4.24, 4.79
[23:05:07] ok. I may have brought it back into a stable state
[23:05:20] one of the gluster people is looking at making a lucid version
[23:07:37] andrewbogott: is it still working?
[23:07:55] the cluster is stable again: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Glusterfs%2520cluster%2520pmtpa&tab=m&vn=
[23:08:06] Most projects are working. A few gluster volumes I can't write to for some reason.
[23:08:09] e.g. wordpress-home
[23:08:22] I bet the client process needs to be killed
[23:48:48] Ryan_Lane: Nope, not the client process. You can investigate if you want, otherwise I'm keeping a list of failures to look at when I get to the end.
[23:49:06] I predict that if you mount wordpress-home you will immediately find that you can't write to it.
[23:49:10] cool
[23:49:20] * Ryan_Lane nods
[23:49:25] let me look at the brick log
[23:50:08] A couple of others (selenium and ganglia) did partial rsyncs and then failed midway through. Those worry me more.
[23:50:39] that may have been due to the process restarts I was doing
[23:52:40] I reproduced it though, same behavior.
[23:52:52] heh
[23:52:58] I just caused gluster to die
[23:53:10] winner!
[23:53:26] my rsyncs are still running...
[23:53:40] I just killed it on one
[23:53:48] by stopping wordpress-home
[23:54:15] I think gluster volume start/stop/create/delete commands are causing the memory issue
[23:54:22] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Glusterfs%2520cluster%2520pmtpa&tab=m&vn=
[23:54:28] see what happened just now?
[23:54:59] and on start. same thing
[23:58:32] andrewbogott: stop/start on wordpress-home fixed it
[23:59:04] and the load spike is definitely being caused by start/stop
[23:59:44] ok, I'll sync wordpress next
[23:59:47] thanks.
[23:59:57] Will also see if that fixes ganglia and selenium
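The stop/start fix applied to wordpress-home above could be wrapped as a tiny helper; `--mode=script` suppresses the gluster CLI's interactive confirmation prompt. Only a sketch, and per the log the start/stop operation itself triggers the memory leak, so this is a workaround rather than a cure.

```shell
# Hedged sketch of the volume bounce used above; needs the gluster CLI on a
# management node, and should be used sparingly since start/stop itself
# appears to trigger the leak discussed in the channel.
bounce_volume() {
    vol="$1"
    gluster --mode=script volume stop "$vol" &&
    gluster --mode=script volume start "$vol"
}
# Usage: bounce_volume wordpress-home
```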