[03:04:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs)
[03:04:58] Is there anyone who knows C++ in here?
[03:05:10] Ryan_Lane?
[03:05:28] I don't kow c++
[03:05:30] *know
[03:05:32] :(
[03:05:36] Ok, thanks anyway
[03:05:39] yw
[03:05:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs)
[03:05:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs)
[03:06:02] PROBLEM host: i-00000469.pmtpa.wmflabs is DOWN address: i-00000469.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:07:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs)
[03:07:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:07:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs)
[03:07:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs)
[03:07:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs)
[03:08:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs)
[03:08:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:08:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100%
[03:08:44] TParis: did you try asking in #wikipedia-bag ?
[03:08:55] iirc Earwig knows C++, but he's away atm
[03:12:52] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs)
[03:12:52] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs)
[03:45:52] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Warning: 15% free memory
[03:46:21] !log metavidwiki installed MediaWiki 1.18.5 + SMW 1.7.1 on metavidwiki.wmflabs.org, works as expected on ubuntu-12.04-precise. next up: testing MetaVidWiki compatibility on this base install (...which may run into compatibility issues)
[03:46:22] Logged the message, Master
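For anyone reproducing the metavidwiki setup above: wiring SMW 1.7 into a MediaWiki install of that era came down to two lines in LocalSettings.php plus the updater. A sketch only, assuming a stock /var/www/w install path (the real path on that instance isn't in the log):

  # enable Semantic MediaWiki; the enableSemantics() argument should match the wiki's hostname
  sudo tee -a /var/www/w/LocalSettings.php <<'EOF'
  require_once( "$IP/extensions/SemanticMediaWiki/SemanticMediaWiki.php" );
  enableSemantics( 'metavidwiki.wmflabs.org' );
  EOF
  # create SMW's database tables
  php /var/www/w/maintenance/update.php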
[04:05:52] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Critical: 5% free memory
[04:15:52] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f.pmtpa.wmflabs output: OK: 95% free memory
[04:55:18] Change on mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by Nemo bis link https://www.mediawiki.org/w/index.php?diff=589104 edit summary: Explanation by Carl (CBM): «[[mailarchive:toolserver-l/2012-September/005382.html|plans for WMF Labs don't seem to include database replication ''in a form that makes it a useful, direct replacement for toolserver'']]».
[05:10:48] 10/01/2012 - 05:10:48 - Created a home directory for apramana in project(s): bastion,watchlist-groups
[05:11:00] 10/01/2012 - 05:10:59 - Creating a home directory for apramana at /export/keys/apramana
[05:14:02] PROBLEM Total processes is now: WARNING on incubator-apache i-00000211.pmtpa.wmflabs output: PROCS WARNING: 158 processes
[05:15:49] 10/01/2012 - 05:15:49 - User apramana may have been modified in LDAP or locally, updating key in project(s): bastion,watchlist-groups
[05:15:59] 10/01/2012 - 05:15:58 - Updating keys for apramana at /export/keys/apramana
[05:23:08] Hello, I'm having trouble connecting to Wikimedia Labs. I've followed all the steps on the Help:Access page. I am receiving both errors: "Permission denied (publickey)" & "Connection closed by remote host" It doesn't appear to be a problem with my SSH client because I can access gerrit without any trouble. Any help is appreciated, thanks!
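A hedged first-aid kit for the "Permission denied (publickey)" report above. bastion.wmflabs.org is the documented entry point of the era; the key path is just the client default, and success with gerrit proves little since gerrit's SSH keys were typically managed separately from labsconsole's:

  # trace exactly which keys the client offers and where auth stops
  ssh -vvv yourname@bastion.wmflabs.org 2>&1 | tee ssh-debug.log

  # fingerprint of the local key; compare against the one uploaded to labsconsole
  ssh-keygen -l -f ~/.ssh/id_rsa.pub

  # ssh silently skips keys with loose permissions
  chmod 600 ~/.ssh/id_rsa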
[06:10:32] !log glam created initial instance for the gwtoolset project
[06:10:34] Logged the message, Master
[07:00:12] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1732164
[07:18:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 917948
[08:28:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[08:37:20] !log bots restarting mysql on bots-sql2 - going oom again
[08:37:22] Logged the message, Master
[08:38:22] RECOVERY Free ram is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: 822980
[08:40:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1683128
[08:40:25] cd ~/
[08:40:58] !log nagios Stopped puppet/ircecho again, commented out in crontab this time
[08:40:59] Logged the message, Master
[08:41:07] Damianz why?
[08:42:12] Repetitive spam/broken checks/not much point until it gets fixed properly/it's annoying/it's been complaining about those ~14 checks for like 10 hours
[08:42:37] * Damianz goes to sign paperwork as it's the start of the month
[08:43:33] maybe we could just disable the irc feed for these broken ones?
[08:43:43] that's what I was doing before
[08:44:15] I was going to change the config a little and either mark them as down so they don't report or just ignore them totally. Not got time to do it until this afternoon though.
[08:58:07] Damianz: or petan: do either of you have a moment to answer some noob ?'s regarding puppet
[08:58:32] just ask :D
[08:58:48] !ask
[08:58:48] Hi, how can we help you? Just ask your question.
[09:01:37] earlier i tried creating an instance w/o any puppet config selected, then tried to add a puppet config after the instance finished building - my assumption was that by adding the puppet config the instance would install the software i selected, but it didn't ... i tried webserver::apache2, webserver::php5, webserver::php5-mysql thinking it would basically install a lamp stack
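petan's next request, as a sketch: changing classes on labsconsole only takes effect when the agent runs, so force one and keep the output. Standard puppet 2.x usage, nothing labs-specific:

  # run the agent once in the foreground with verbose output, keeping a copy to paste
  sudo puppetd --test --verbose 2>&1 | tee /tmp/puppet-run.log

  # dry-run variant: show what would change without applying it
  sudo puppetd --test --noop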
[09:02:03] dan-nl: sudo puppetd -tv and result pls :)
[09:02:24] ok
[09:02:26] so
[09:02:29] php5 includes apache
[09:02:33] petan: tried that as well and got a certificate error - couldn't connect to the puppet server or something like that
[09:02:33] apache will conflict
[09:02:37] our manifests suck
[09:02:50] dan-nl can you just paste it
[09:03:07] got rid of that instance and started to use apt-get instead ...
[09:03:16] aha
[09:03:19] can create another instance and show you what i mean ... give me a few moments
[09:03:36] it may be a conflict, or a temporary problem with the puppet master
[09:03:45] or that manifest is wrong
[09:03:51] dan-nl: Did you select puppetmaster::self at all?
[09:03:58] no
[09:04:04] do i need that as well?
[09:04:07] nope
[09:04:17] that breaks stuff
[09:04:20] If you do and ever reboot the instance, puppetmaster isn't configured to start, so it breaks
[09:04:56] We really need to fix bots-sql2
[09:05:02] Damianz what's wrong there
[09:05:06] literally just running puppet is enough to oom it due to mysql's memory usage
[09:05:12] aha
[09:05:16] has like 2/3 heavy bots that hammer mysql
[09:05:18] I can decrease the buffer
[09:05:28] but it will be slow
[09:05:30] probably needs it, I can't even log back in now
[09:05:37] ok
[09:05:55] mysql 9666 92.1 61.6 832168 631080 ? Ssl 09:04 0:58 /usr/sbin/mysqld < urgh
[09:06:21] Shame we can't increase it to like 4/8 gb of ram =/
[09:06:41] total used free shared buffers cached
[09:06:42] Mem: 1023448 1014144 9304 0 24688 156892
[09:06:43] -/+ buffers/cache: 832564 190884
[09:06:44] Swap: 0 0 0
[09:06:53] Probably due to all these
[09:06:54] SELECT COUNT(distinct concat(lang,wikidomain)) AS total_record FROM linkwatcher_linklog WHERE doma
[09:06:54] 190mb free
[09:06:57] that's fine
[09:07:01] thanks linkwatcher
[09:07:02] no reason to decrease it?
[09:07:21] Hmm
[09:07:28] Damianz we can increase RAM, but that would require either a reboot or maybe a reinstall
[09:08:23] reinstall I think
[09:08:28] http://ganglia.wmflabs.org/latest/?c=bots&h=bots-sql2&m=load_one&r=hour&s=by%20name&hc=4&mc=2 < hmm
[09:08:44] it seems to have enough ram but still randomly keeps having oomkiller come out (see console log)
[09:09:22] pff, tried creating a new instance, which error'd, then tried to delete it and got the following error "Successfully deleted instance, but failed to remove glam-puppet-test DNS entry."
[09:09:38] Yeah the dns code sucks
[09:10:38] great, how do i get rid of the entry ... maybe that's why it error'd - i used the same name earlier
[09:11:45] I think it needs manually deleting out of ldap, though you could try deleting the instance again if it didn't remove it.
[09:11:47] very strange, tried creating another instance and that one error'd as well ...
[09:12:19] have no idea why ... did not select any puppet packages ... https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000471
[09:12:25] petan: Well the bots-sql2 issue seems to be puppet; if you run it and watch the ram it gets down to like 20mb, then oomkiller kicks in. Might be alright for a while but probably an idea to start using sql3 more.
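What "decrease the buffer" would look like in practice - a hedged sketch, not the actual bots-sql2 config. The option names are stock MySQL 5.x; the values and the conf.d drop-in are illustrative for a 1 GB box:

  # watch mysqld's resident memory while puppet runs in another terminal
  watch -n2 'ps -o rss=,cmd= -C mysqld; free -m | head -3'

  # trade query speed for headroom
  sudo tee /etc/mysql/conf.d/lowmem.cnf <<'EOF'
  [mysqld]
  innodb_buffer_pool_size = 128M
  key_buffer_size         = 32M
  max_connections         = 50
  EOF
  sudo service mysql restart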
[09:13:41] hmm
[09:13:49] we really need that buffer, even more than puppet
[09:14:27] yeah, puppet isn't really doing much on that instance currently
[09:14:45] should be scheduled for recreation
[09:14:52] with like 4gb of ram
[09:14:55] or more
[09:14:57] mhm
[09:15:14] shame we can't rename instances or we could do it, import the dump then rename rather than take it down for like 30min.
[09:15:18] I can do it today, but it will be an outage of like 2 hours
[09:15:26] We do at least have daily dumps in project storage now though.
[09:15:31] we can't import dump and rename
[09:15:43] or wait
[09:15:44] we can
[09:15:56] I could setup a second server and make a replica
[09:16:13] but... bleh, that would still be a small outage
[09:16:32] or we would need to change the DNS in /etc/hosts
[09:16:36] The 'less downtime' way would be to bring up a second server, import the dump, setup mysql replication, switch stuff over. But everything points at 'bots-sql2' which we can't move the instance name over. So it's easier to just re-install.
[09:16:53] importing the dump is nonsense
[09:17:01] it will make an inconsistent db
[09:17:15] because you can't import a live db which is being used
[09:17:25] Damianz: petan: good luck with getting it sorted out
[09:17:48] Well realistically it's going to take like 30min + however long it takes to import the dump.
[09:18:15] if I replicated the db to a second server we would have an online backup
[09:18:28] then we can change the DNS on each machine of the bots project
[09:18:40] so that they start resolving bots-sql2 as the new machine
[09:18:48] then we recreate the bots-sql2 :D
[09:18:51] and replicate it again
[09:18:53] could just stick an entry in the hosts file
[09:18:57] then we kill the new machine
[09:19:03] yes, that's what I mean
[09:19:07] It would be nice to be able to move the hostname between instances
[09:19:12] yes...
[09:20:12] sounds like an interesting way to spend the rest of monday :D
[09:20:18] lol
[09:20:30] It would be kind of nice to have replicas setup in general
[09:20:34] heh
[09:20:37] indeed
[09:20:46] we could just keep the new machine :P
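The cutover petan and Damianz sketch verbally, as commands. Hostnames, the repl user, and the dump path are placeholders; the sequence is a standard MySQL 5.x replication bootstrap and assumes binary logging is already enabled on the master:

  # on bots-sql2: consistent dump that records the master's binlog position
  mysqldump --single-transaction --master-data=2 --all-databases > /data/project/sql2-full.sql

  # on the replacement box: load it, then stream changes from the old master
  mysql < /data/project/sql2-full.sql
  mysql -e "CHANGE MASTER TO MASTER_HOST='bots-sql2', MASTER_USER='repl', MASTER_PASSWORD='secret'; START SLAVE;"
  mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Slave_IO_Running|Seconds_Behind'

  # once caught up: repoint the name on every bots instance via /etc/hosts
  echo '10.4.0.99  bots-sql2' | sudo tee -a /etc/hosts   # 10.4.0.99 = made-up new IP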
[09:21:25] !log bots upgrading bots-sql2 to higher ram, going to take a while
[09:21:26] Logged the message, Master
[09:22:11] !log bots creating new bots-sql2r
[09:22:12] Logged the message, Master
[09:22:26] WTF
[09:22:34] all new instances have 10GB storage max?
[09:22:48] @seen Ryan_Lane
[09:22:48] petan: Last time I saw Ryan_Lane they were quiting the network N/A at 10/1/2012 7:07:45 AM
[09:22:55] they do?
[09:23:02] seems like that
[09:23:22] so they do...
[09:23:26] mutante: ping
[09:23:34] we don't support attaching volumes either bleh
[09:24:21] hmm how did this instance get a 50gb secondary drive then, weird
[09:24:28] in the past it was possible
[09:24:36] now it's not
[09:24:40] dunno why
[09:25:53] !logs bots waiting for someone from ops to let me create big instance...
[09:26:00] !log bots waiting for someone from ops to let me create big instance...
[09:26:01] Logged the message, Master
[09:26:24] we really need some EU ops
[09:27:28] * Damianz points at paravoid
[09:27:38] he's sorta european and sorta an op
[09:27:57] I'm not sure how I should respond to that
[09:28:03] since I'm both a European and an op :)
[09:28:07] :D
[09:28:13] not "sorta"
[09:28:17] paravoid why are there no more big storage instances
[09:28:29] You're not classifiable as european if you're not in europe, and since you only just re-appeared it's 'sorta' :)
[09:33:36] @seenrx Ryan.*
[09:33:36] petan: Last time I saw Ryan_Lane they were quiting the network N/A at 10/1/2012 7:07:45 AM (02:25:51.0563240 ago) (multiple results were found: Ryan_Lane1)
[09:33:50] better :)
[09:34:21] @seenrx Ryan.*
[09:34:21] petan: Last time I saw Ryan_Lane they were quiting the network N/A at 10/1/2012 7:07:45 AM (02:26:35.9355800 ago) (multiple results were found: Ryan_Lane1)
[09:34:21] !ping
[09:34:22] pong
[09:34:32] am I lagging or is the bot
[09:34:47] @help seen
[09:34:47] Info for seen: Display information about user activity
[09:34:48] bot is but like 2seconds
[09:34:53] s/but/by/
[09:34:57] :/
[09:35:02] it lives on labs
[09:35:03] !ping
[09:35:03] pong
[09:35:09] @seenrx Ryan.*
[09:35:09] petan: Last time I saw Ryan_Lane they were quiting the network N/A at 10/1/2012 7:07:45 AM (02:27:24.1138110 ago) (multiple results were found: Ryan_Lane1)
[09:35:15] the command runs in its own thread
[09:35:20] can't slow it down
[09:35:21] that's not laggy now
[09:35:48] hmm
[09:36:08] what is some expensive regex
[09:37:06] @seenrx /^(e+)+$/
[09:37:06] petan: I have never seen /^(e+)+$/
[09:37:14] @seenrx ^(e+)+$
[09:37:14] petan: I have never seen ^(e+)+$
[09:37:21] mhm
[09:37:46] Hmm, is regex really needed? Most of the time a simple glob would do.
[09:38:19] glob?
[09:38:30] there is @seen for simple search
[09:38:44] @seen Damianz
[09:38:45] petan: Damianz is in here, right now
[09:38:55] @seenrx D.mian.
[09:38:56] petan: Last time I saw Damianz they were talking in the channel, they are still in the channel #wikimedia-labs at 10/1/2012 9:37:46 AM (00:01:09.8385000 ago) (multiple results were found: Damianz_)
[09:40:06] Glob would be like
[09:40:10] @seen damian*
[09:40:10] Damianz: I have never seen damian*
[09:40:21] aha
[09:40:46] I prefer regexes over that
[09:45:57] @seenrx .*
[09:45:57] petan: Last time I saw petan they were talking in the channel, they are still in the channel #wikimedia-labs at 10/1/2012 9:45:57 AM (00:00:00.1371620 ago) (multiple results were found: KuboF, Jarry1250, gerrit-wm, Danny_B|backup, Tlusta and 1648 more results)
[09:46:40] .* matches me :)
[09:46:42] that's good
[09:47:07] except that on this channel it also mentions people from other channels
[09:47:17] :P
[10:19:52] https://i.chzbgr.com/completestore/12/9/19/Tm06DgsywUeGRsAduE7VdQ2.jpg rofl
[10:48:07] hey all, anyone have any ideas why accessing web services of an instance via socks proxy would refuse a connection? it worked about two hours ago and now i get a Connection refused message
[10:52:24] I assume your webserver is running
[10:52:30] are you accessing it via i- or name?
[10:52:34] name seems to not work that well
[11:00:22] Damianz: was using this http://glam-gwtoolset.pmtpa.wmflabs:8080/
[11:00:58] Damianz: but now i'm trying to clean it up .. i had installed php5 with apt-get php5 ... so removed that ... now cleaning up ...
[11:02:48] Either the webserver isn't running or the security group is restricting traffic
[11:04:43] ja, really weird ... it worked about three hours ago ... will try and clean up a bit more and see
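A generic triage for the "Connection refused" above (the URL is the one dan-nl pasted; the rest assumes nothing about his instance). Refused usually means no listener or an active reject, whereas a security group that drops packets tends to show up as a timeout instead:

  # on the instance: is anything listening on 8080 at all?
  sudo netstat -tlnp | grep :8080

  # from the instance itself, bypassing the proxy and security groups
  curl -sI http://localhost:8080/ | head -1

  # through the proxy path
  curl -v http://glam-gwtoolset.pmtpa.wmflabs:8080/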
[11:05:05] rebooting
[11:19:21] Damianz: got apache cleaned up and running again and the security group has port 8080 open ... still Connection refused
[11:19:32] Damianz: any ideas?
[11:25:26] anyone have any ideas on this error when running puppetd -tv ... err: Could not request certificate: getaddrinfo: Name or service not known
[13:00:25] anyone have any ideas why an instance build will fail? have tried a few times to create glam-puppet-test without success, current attempt is i-00000474. no puppet packages are selected when i create the instance.
[13:57:34] dan-nl: Sorry, got distracted; does it work if you try the instance fqdn?
[13:57:56] And there's a really annoying scheduler issue where the compute nodes get marked as offline and any requests fail, then they come back
[13:59:38] Damianz: np, do you mean instance creation - no, it still says error
[14:00:06] Once it says error it won't fix itself; if it never got past building it needs deleting/creating again
[14:00:40] Damianz: right - tried that a few times with no success, will try another build type now to see if that helps
[14:00:49] hmm
[14:00:54] are you using the tiny type by any chance?
[14:01:19] Damianz: no, have been using m1.medium
[14:01:41] interesting, I can't see the logs to check but there might be something weird going on
[14:02:54] Damianz: ja, even the m1.small came back with err
[14:04:24] Damianz: regarding apache ... finally got it to work when i opened both 80 and 8080 in one of the security groups i'm using ... thought 8080 would be enough but it wasn't
[14:04:26] You'll probably have to wait a few hours for someone that can poke the logs to see why, because our error feedback sucks.
[14:04:51] lol
[14:14:19] I need to nip out to the shop so I'll be back in a while
[14:22:26] Change on mediawiki a page Developer access was modified, changed by Das Schäfchen link https://www.mediawiki.org/w/index.php?diff=589200 edit summary:
[14:59:01] Change on mediawiki a page Developer access was modified, changed by Atrawog link https://www.mediawiki.org/w/index.php?diff=589208 edit summary:
[15:13:02] hashar: chrismcmahon: just updated the status for https://www.mediawiki.org/wiki/Beta_cluster so if it's inaccurate or incomplete you should finish it
[15:13:22] thanks sumanah, looking
[15:13:49] sumanah: thanks :-]
[15:15:43] hashar: chrismcmahon: same for https://www.mediawiki.org/wiki/Continuous_integration
[15:16:51] chrismcmahon: hashar - it would probably be good to indicate whether we hit our end-of-Sept milestones for CI https://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals#QA_Engineering_and_Continuous_Integration
[15:20:30] sumanah: I added a sentence to https://www.mediawiki.org/wiki/Beta_cluster. Also, I spent 3+ hours in a hangout with Zeljko the new QA Engineer (it's his first day) discussing browser test automation etc., and we cemented the first project; we should be able to present that this week.
[15:20:53] got it
[15:53:49] anyone know why every time i try and create an instance it errors out?
[16:03:00] does not seem to matter which instance type i select ... also, have not selected any puppet configurations as suggested
[16:06:24] anyone know why when i try to change the puppet config it never seems to install anything?
[16:07:28] <^demon> You have to run puppet afterwards.
[16:07:34] <^demon> `sudo puppetd -tv`
[16:08:12] <^demon> That'll force a puppet run, if you don't want to wait for puppet to do its thing.
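On the earlier "Could not request certificate: getaddrinfo" error: despite the wording it is a DNS failure - the agent can't resolve its master's hostname. A hedged check; "puppet" is only the stock default server name, and a labs instance may set something else in puppet.conf:

  # which master is the agent configured to use?
  grep -E '^\s*server' /etc/puppet/puppet.conf

  # can this instance resolve and reach it? (substitute the configured name)
  host puppet
  nc -zv puppet 8140   # 8140 is the default puppet master port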
[16:09:34] ^demon: you mean, after i get the error status go to configure instance and select a puppet configuration?
[16:11:24] <^demon> I don't know about the errors. But to configure puppet, yeah, that's how it's done.
[16:13:19] ^demon: i was having a different issue with puppet earlier, but this issue has to do with the actual creation of the instance ... i'm at https://labsconsole.wikimedia.org/wiki/Special:NovaInstance ... view the project > server region and click add instance ... set values and submit ... instance state stays at error ... no console output ... setting a puppet config does not trigger any further build
[16:14:09] ^demon: instance is never built so i can't log into it and run puppetd -tv
[16:15:35] <^demon> I have no clue how to help with that.
[16:15:48] k, any idea who might know ...
[16:16:12] <^demon> Ryan, Andrew. The usual suspects.
[16:16:23] k, thanks
[16:31:39] ^demon: any idea why when i run sudo puppetd -tv i get the following error ... err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Package[apache2] is already defined in file /etc/puppet/manifests/webserver.pp at line 91; cannot redefine at /etc/puppet/manifests/webserver.pp:42 on node i-0000046f.pmtpa.wmflabs
[16:32:39] dan-nl: Do you select webserver::apache2 and webserver::php5
[16:32:48] yes
[16:32:49] dan-nl: Can you clarify? You say you can't create instances, except it sounds like you have one running at least...
[16:32:58] Did instance creation work once and then never again?
[16:33:18] andrewbogott: yes, i have one that i created a few days ago
[16:33:55] dan-nl: Ok. For that one… I believe that webserver::php5 includes apache already. So you should deselect the webserver::apache2 package and rerun puppet.
[16:33:56] dan-nl: You must not select webserver::apache2 when you select webserver::php5. webserver::php5 includes webserver::apache2.
[16:34:14] andrewbogott: i couldn't get puppet running properly so i started to use apt-get in that instance, and tried to create another to play around with puppet called glam-puppet-test ... that's the one that i could not create
[16:34:16] dan-nl: Lemme see if I can create an instance today.
[16:34:35] andrewbogott: sounds good
[16:36:12] andrewbogott: Can we change this so webserver::php5 and webserver::apache2 can be selected together?
[16:37:02] Jan_Luca: Maybe. We're gradually refactoring a lot of those puppet classes to make them less broken.
[16:43:22] andrewbogott: that would be great, find them very confusing atm ... tried selecting webserver::php5-mysql and figured it would install apache2, php5, and mysql but it didn't ... so i switched to using apt-get and ran into issues with it as well ...
[16:44:14] andrewbogott: it has been very difficult to get a regular lamp stack in place with phpmyadmin ... still haven't achieved that goal yet
[16:44:28] dan-nl: Most all of those classes are an undocumented mess. The 'Basic instance configurations' section on this page is an attempt to offer guidance: https://labsconsole.wikimedia.org/wiki/Help:Contents
[16:44:42] Although there isn't much there yet.
[16:44:57] andrewbogott: yes, i've been in there adding documentation as i go ...
[16:45:13] I would expect that /just/ webserver::php5 would get you most or all of a lamp stack in place.
[16:46:13] andrewbogott: that is what i had hoped but it didn't work ... puppet kept complaining about not being able to access a certificate ... today it seems to be able to do that, but now mysql is complaining about ... ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
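ERROR 2002 means the client found no socket to connect to - almost always mysqld not running, or never installed (nothing suggests the php5 class pulls in a database server). A generic first pass, not specific to dan-nl's instance:

  # is the server up, and if not, why did it stop?
  sudo service mysql status
  sudo tail -n 30 /var/log/mysql/error.log

  # does the socket the client expects exist?
  ls -l /var/run/mysqld/mysqld.sock

  # if it was simply never installed:
  sudo apt-get install mysql-server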
[16:48:11] also noticed that unselecting a puppet configuration does not uninstall what it installed ... was hoping it would do that so i could try another puppet config and see the difference, but it just doesn't seem to work that way ... that's another reason i thought of creating another instance to work with, but then that didn't work ...
[16:49:53] creating a new instance is the right thing to do; I'm trying to figure out why that is broken.
[16:50:00] k
[16:50:06] It's true that puppet is largely a one-way street.
[16:53:53] imo we should just remove the puppet options on the install page
[16:54:54] yeah, probably.
[16:55:10] * Damianz wonders if zeljkof__ is the said Zeljko, if so HI!
[16:56:04] Damianz: he is :)
[16:56:22] chrismcmahon: Yay, he's the friendly socialising type :)
[16:56:55] * andrewbogott thinks that puppetd must keep a log someplace but can never find it
[16:57:21] it does
[16:57:44] it keeps a log of basically what -tv outputs and keeps 'state' files which are like yamlish diffs of what changed
[16:57:53] They're actually awesome to collect and parse and then complain about centrally
[16:58:14] And… where does that log go? I mean for automated runs, not by-hand runs
[16:58:26] Our servers have /var/log/puppet but, of course, no logs there.
[16:59:18] I can see nova-compute restarting frequently, tempted to blame puppet.
[17:00:22] * Damianz looks at how puppet is setup here
[17:00:39] well, wait, maybe compute is doing what it's supposed to be doing… I'm baffled.
[17:01:23] if you look at /var/lib/puppet/state/last_run_report.yaml it will tell you if it restarted the service on the last run and roughly why
[17:01:35] not sure why we don't actually log anything in /var/log/puppet ...
[17:02:21] oh that reminds me
[17:02:48] andrewbogott: Dunno if petan asked you, but is it possible to create an instance with a secondary disk? (they all have 10gb only now, sql servers need more and can't run on gluster).
[17:02:55] Or is it possible to up the ram (I don't think so)
[17:03:30] Don't the other image options have bigger drive and ram sizes?
[17:03:34] Or is that not what you mean?
[17:03:42] nope, they all say 10gb now
[17:03:46] it used to list a 50gb option
[17:04:10] Oh, well… that's probably because Ryan is trying to encourage you to use project storage instead of instance storage :)
[17:04:29] Would love gluster to play nice with mysql but it has locking issues and causes data loss
[17:04:31] But I guess project storage is gluster and not suitable for a database?
[17:05:12] Really we just need more ram on bots-sql2, but the only way to up it to like 4/8gb is to make a new vm, and there isn't the option to get the same disk size again heh
[17:05:25] * Damianz would love to do live resizing of instances for ram/cpu, which kvm can more than do
[17:05:36] I wouldn't want to risk resizing at this point, I've never seen it work properly.
[17:05:41] speak of the devil
[17:05:47] But perhaps Ryan_Lane can suggest a solution, he having just arrived :)
[17:06:03] Ryan_Lane: Hi!
[17:06:22] I'd like to try a resize on a test instance before doing it on a live one
[17:06:24] The max storage on instances is now 10gb; can we have a 50gb option with like 4/8 gb of ram to create a new instance to replace bots-sql2 with?
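Where the state files Damianz describes live - standard puppet 2.x paths; whether daemonized runs also land in syslog depends on the instance's configuration:

  # one-screen summary of the last run: timings, resource counts, failures
  sudo cat /var/lib/puppet/state/last_run_summary.yaml

  # the full report, including which services were restarted and why
  sudo less /var/lib/puppet/state/last_run_report.yaml

  # daemonized agent runs usually log via syslog rather than /var/log/puppet
  grep -i puppet /var/log/syslog | tail -20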
[17:06:34] it's ooming constantly every time puppet runs
[17:06:40] and mysql is getting hammered
[17:06:46] yeah, you can always do that
[17:06:53] or do you need your quota increased?
[17:07:03] well I only have options for 10gb storage
[17:07:09] used to have at least a 50gb option
[17:07:20] hm
[17:07:43] oh wow
[17:07:45] that's weird
[17:07:50] (since mysql can't use project storage without the locking data corruption issues)
[17:07:56] yeah
[17:08:07] <^demon> Ryan_Lane: fyi speaking of memory, we had an issue about an hour ago. manganese went swapping on a puppet run. Mark killed puppet, gerrit's fine.
[17:08:09] seems essex changed all of the flavor types
[17:08:23] stupid puppet
[17:08:53] I should increase medium to 20 and large and xlarge to 50
[17:08:55] <3 random unexpected changes
[17:09:08] and just delete the s types
[17:09:15] tiny is useless
[17:09:18] yeah
[17:09:23] I need to delete that one too
[17:09:26] hmm bots-sql* need to use the puppet mysql class it seems too
[17:09:52] ok, well I need to head into the office
[17:10:18] <^demon> I've used small.
[17:10:23] <^demon> Not totally useless.
[17:10:24] :o it's like 10am heh
[17:10:26] have fun
[17:10:36] ^demon: small is ok, tiny you can't even run puppet on
[17:10:44] <^demon> yup :)
[17:11:13] Ryan_Lane: When you get to said office could you poke Leslie about network restrictions to the TS :)
[17:11:27] <^demon> xsmall can go too :p
[17:11:51] We should just have 'one size fits all' :D
[17:12:09] Though I did just spend the last hour and a bit trying on like 6 different sizes of helmets so I'm not a fan of random sizing.
[17:12:30] Ryan_Lane: Did you find the bug with the project storage?
[17:12:42] Jan_Luca: haven't had a chance to look at it just yet
[17:12:50] dan-nl: Did andrewbogott sort your building instance issues?
[17:13:05] Damianz: Still debugging.
[17:13:12] Ah ok :)
[17:13:27] I assume related to the puppet question then, and probably the same scheduling issue
[17:13:43] * Damianz goes to put his stupidly expensive clothes away and returns to bike shopping
[17:23:27] When creating an instance I get the status "error".
[17:29:40] Jan_Luca: andrewbogott is looking at that
[17:34:52] Damianz: i agree regarding removing the puppet config option from the create instance page ... based on the suggestion there i've never used it, and if it is the case that it can cause a build fail, might as well just indicate something like “set puppet configurations on the instance’s config page” after the instance has been created ... would also love to see a “humanly understandable” description of those puppet configurations
[17:35:29] and some suggestions on which ones to use ...
[17:35:37] dan-nl: What I'd love to do (and has been suggested before) is to parse the manifests, pull comments from each class, and we can easily describe each option in human language
[17:35:44] It's a little hard because puppet is a *mess* right now
[17:35:51] Hence in labs it's manually added fields
[17:35:58] Damianz: I was about to say that :) There should be embedded docstrings in the puppet .pp files.
[17:36:06] ah, that's too bad ... nice suggestion though ... makes a lot of sense
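The parsing half of that idea is small. A rough sketch (not the parser Damianz attempted): print each class name with the comment block sitting directly above it, assuming docstrings are leading # comments:

  # crude docstring extractor for puppet manifests
  awk '
    /^[[:space:]]*#/ { doc = doc $0 "\n"; next }       # accumulate comment lines
    /^class[[:space:]]/ { printf "%s\n%s\n", $2, doc } # class name + its docstring
    { doc = "" }                                       # anything else resets the block
  ' /etc/puppet/manifests/*.pp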
[17:36:07] things like apache2 conflicting with php are just stupid, we'll fix it eventually
[17:36:26] andrewbogott: Real docs in role classes would be perfect in the long run =D
[17:36:41] I don't think we need to fix puppet before adding docstrings… at the moment they could say things like "this class is stupid, don't use it" which would get us well on our way :)
[17:37:03] heh
[17:37:03] ah, that would definitely work!
[17:37:13] very clear
[17:37:15] I actually tried writing a parser and gave up because I was busy
[17:37:24] Taking a point that we *only* include documented things
[17:37:32] Then forcing people to document classes they want could work
[17:37:40] Yeah, correlating docs with classes might be hard or easy, I haven't thought about it much.
[17:37:43] We'd still need a manual option for testing new classes with puppetmaster::self though
[17:38:09] In theory writing a parser to go through every class and pull docstrings is easy; it's in ruby so it easily also causes rage
[17:40:55] andrewbogott: i’ll stop for now ... hopefully you’ll have a solution for instance creation tomorrow and then i’ll have a chance to play around with it a bit more and see if i can sort it out properly ... thanks for your help
[17:41:15] dan-nl: I have a fix that I'm about to push, but I don't know if it's /the/ fix.
[17:41:28] But, yeah, let me know if things go any better tomorrow :)
[17:41:31] andrewbogott: hehe, okay
[17:46:08] Platonides, any plans for the webtools project? (for the near future)
[17:48:01] Hello, I'm having trouble connecting to Wikimedia Labs. I've followed all the steps on the Help:Access page. I am receiving both errors: "Permission denied (publickey)" & "Connection closed by remote host" It doesn't appear to be a problem with my SSH client because I can access gerrit without any trouble. Any help is appreciated, thanks!
[17:48:29] blackjack48: Do you have a key setup?
[17:48:44] alternatively paste ssh -vvv into a pastebin somewhere
[17:49:20] Damianz: Yes, I added it on NovaKey.
[17:49:39] 1sec
[17:49:45] let me check if you're in the project
[17:49:49] is your username blackjack48
[17:50:07] it's "Aaron Pramana"
[17:50:28] ok, you are a member
[17:50:54] errr
[17:51:08] andrewbogott: Since Ryan is MIA, wtf is the labs-homedir bot thing
[17:51:16] as it's not here I assume it's not updating keys or homedirs
[17:51:41] Um… no idea :(
[17:51:43] oh it is here
[17:51:49] * Damianz wonders why labs-home-wm isn't voiced
[17:52:10] blackjack48: Did you see labs-home-wm say it had added your key?
[17:52:44] I know there was an issue with new project storage the other day... I wouldn't be surprised if bastion just wants nscd restarting again heh
[17:53:52] Damianz: yes, I think so
[17:54:00] hmm
[17:54:04] the keys show up
[17:54:20] so paste the output of your ssh with -vv so it tells you what auth methods it's trying
[17:54:45] ok (brb)
[17:56:09] you know, sometimes it would be easier to tail the log while you log in
[17:56:28] totally should work on getting logstash setup and piping bastion logs into it
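The simplest form of "pipe bastion logs into logstash" - a sketch with a made-up collector hostname; a real deployment would want TCP or a dedicated shipper rather than bare UDP syslog:

  # forward auth logs off the bastion via rsyslog (UDP)
  echo 'auth,authpriv.*  @logs.pmtpa.wmflabs:514' | sudo tee /etc/rsyslog.d/60-ship-auth.conf
  sudo service rsyslog restart

  # the low-tech version mentioned above: tail the log while the user retries
  sudo tail -f /var/log/auth.log | grep -iE 'publickey|Failed'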
[18:01:51] dschoon: (I think it was you.. might have been the other dscwhoooy person) TS access from labs is sorted now
[18:02:04] hi Damianz :) sorry, I have just noticed your "hi"
[18:02:07] Other dsc :)
[18:02:15] sorry Damianz
[18:05:17] damn
[18:05:24] he can read the mailing list though :)
[18:08:52] Damianz: Sorry, I haven't been able to copy/paste from VirtualBox, so I took screenshots of the output: http://www.sociotopia.com/sshproblem/
[18:09:10] errr, any reason you're using virtualbox heh
[18:09:43] have alternate recommendations?
[18:09:46] ok
[18:09:48] that host doesn't exist
[18:10:07] (watchlist-groups.pmtpa.wmflabs)
[18:10:30] Well I use normal ssh, you could use putty or such if on windows
[18:10:35] oh is it supposed to be the instance?
[18:10:37] I find a vm for daily stuff a bit of a pita
[18:11:09] that's the project name
[18:11:16] go to
[18:11:22] https://labsconsole.wikimedia.org/wiki/Special:NovaInstance
[18:11:25] hit add instance
[18:11:27] select some options
[18:11:30] press ok
[18:11:33] wait
[18:11:35] wait a bit more
[18:11:38] hope andrewbogott has fixed the issue
[18:11:40] wait
[18:11:43] then ssh to the instance name
[18:11:49] I can't see any instances in that project atm
[18:12:11] Damianz: want to see if we can resize?
[18:12:20] I'm going to create a fake instance and resize it
[18:12:25] sure :D
[18:12:33] I'd love to just up the ram, would save a load of effort/downtime
[18:12:57] We at least have daily dumps so if you do screw it over it's mostly recoverable heh
[18:13:02] ah. right. I need to modify the flavors too
[18:13:14] Damianz: ok, I tried setting up an instance last night, but it resulted in an error... I'll try again later today after my classes.
[18:13:18] that's a really annoying thing for them to change the defaults on :(
[18:13:31] blackjack48: Ok, talk to andrewbogott if you get another error
[18:13:45] andrewbogott: the gluster script on labstore2 seems to not be updating :(
[18:13:59] andrewbogott: there's some projects that were created that aren't being found, it seems
[18:13:59] Can't believe I've been sending out email for over a week with my real name set to the mx ip :( *embarrassed*
[18:14:31] Ryan_Lane: Also dunno if you saw my email but Leslie is awesome
[18:14:46] Ryan_Lane: OK, before I start thinking about a new problem, can I tell you about the immediate one?
[18:14:49] Damianz: yeah, when I got in I asked her if she could do it, and she said "funny, I just did it"
[18:14:56] andrewbogott: what's the immediate problem? :)
[18:14:56] heh
[18:15:30] Instance creation is failing. nova-manage service list shows the compute nodes as being down and up alternately every few minutes...
[18:15:33] ah
[18:15:44] any idea which service it's failing in?
[18:15:46] also, which project?
[18:15:49] we may be out of IPs
[18:16:05] which is one of the other immediate problems that I was talking about :)
[18:16:15] nova-compute instances are doing 'qemu-img info' every few minutes as well (which I thought they were supposed to only do when compute started up?)
[18:16:29] I thought this was because of a puppet bug I introduced, but I fixed that to no effect.
[18:16:34] hm
[18:16:46] are the services actually restarting themselves?
[18:16:55] Not according to ps
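How the cause gets found a few minutes later - paging back through the compute log for a stack trace. Log locations are the usual nova defaults, not confirmed from this log:

  # are the services really cycling, or just being marked down?
  ps -ef | grep -E 'nova-(compute|network|scheduler)' | grep -v grep

  # hunt for the stack trace behind the bare 'error' instance state
  sudo grep -n 'TRACE\|ERROR' /var/log/nova/nova-compute.log | tail -20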
[18:17:46] Damianz: You should be excited that our project inspires such enthusiasm [18:18:19] :D [18:18:25] andrewbogott: Oh people are very enthusiastic about creating more work for reviewers :D [18:18:26] Ryan_Lane: Of course, I haven't spent much time looking at logs when things are working properly. So maybe everything I'm noticing is normal [18:18:41] andrewbogott: well, so, the first thing to find is if the instance is being bound to a host [18:18:50] if it is, then it isn't failing at the scheduler [18:19:02] then, to see if it is getting a private IP [18:19:09] that means it isn't failing at the network node [18:19:33] it should be getting bound to virt5, since it's new and has way fewer instances [18:19:35] Yeah, nova-scheduler creates these log lines that are as big as my whole screen, making it hard to follow. Lemme look again... [18:19:47] yeah. nova-scheduler sucks [18:19:58] well, you should be able to do a nova show on the instance id [18:20:02] and it'll tell you which host [18:20:03] "Casted 'run_instance' to compute 'virt5'" [18:20:34] 2012-10-01 18:15:28 TRACE nova.rpc.amqp RemoteError: Remote error: NoMoreFixedIps Zero fixed ips available. [18:20:42] well, that's a bitch [18:20:49] Oh, where was that? [18:20:59] andrewbogott: do you have any scratch instances around that can be deleted? [18:21:04] I'll delete some of mine too [18:21:06] I can delete 1 [18:21:14] I'll also delete some of the corrupted ones [18:21:25] shouldn't we have, like, 2^24 fixed ips? [18:21:32] no [18:21:36] we have 253 [18:21:41] actually less [18:21:45] we need some /8's :D [18:21:45] 250 or so [18:21:53] we're about to add another network right now [18:22:08] Yikes, no wonder. [18:22:21] maybe we should add a larger range this time :D [18:22:40] we didn't really expect to have this many instances right now [18:23:02] we're not even open yet :D [18:23:30] Aren't the labs instances all on a private network, and hence should be able to use 10.everything? [18:23:30] I know. it's crazy [18:23:35] we do [18:23:48] 10.4.0.0/24 [18:24:47] It's non-trivial to change that to /16? [18:25:10] andrewbogott: yes, unfortunately [18:25:20] we're going to add another network [18:25:32] I should try this in eqiad first [18:25:44] ok. btw, where did you find that notice about IPs? In virt5 nova-compute.log? [18:25:48] yeah [18:25:53] hi, I would like to set up mysql access from my instance; should I run a server in it, or request a db on another instance? [18:26:01] I just view'd it, then kept going back till I found a stack trace [18:26:10] gribeco: which project? [18:26:34] Ryan_Lane: the instance is bots-salebot, it may be in the bots project [18:26:52] yes, it is [18:27:00] ok, deleted 2 unused instances [18:27:21] gribeco: ask later :) [18:27:36] I deleted a couple of instances; that should last us until maybe lunchtime. [18:27:36] Damianz: okay, when is a good time? [18:27:58] hopefully we're going to make bots sql servers suck much less if resizing works (or re-install some, if really needed) so a few hours if it's not uber important. [18:28:05] okay [18:28:14] currently they're a little... err... overloaded [18:29:25] Ryan_Lane: Just so you don't forget, the DNS code sucks ass [18:29:58] not that there is anything wrong with sucking ass, if that's your thing. If you're designed to do dns and instead of managing dns you suck ass that's an issue. [18:31:00] OK, now, labstore… Ryan_Lane, can you remind me where the script and its log live? 
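(Editorial sketch: the arithmetic behind "we have 253, actually less": a /24 has 256 addresses, and the network, broadcast, and gateway addresses — plus, as mentioned later in the log, a few reserved for network gear — come off the top. A quick Python check using the stdlib ipaddress module; the reserved count of 5 is an illustrative assumption, since the exact number depends on the nova configuration:)

    import ipaddress

    # Addresses lost to network/broadcast/gateway plus a few
    # reserved for network gear -- illustrative figure only.
    RESERVED = 5

    for prefix in ('10.4.0.0/24', '10.4.0.0/21', '10.4.0.0/16'):
        net = ipaddress.ip_network(prefix)
        print(prefix, 'usable ~', net.num_addresses - RESERVED)

(A /24 leaves roughly 250 usable addresses, which matches the log; the /21 they move to later gives roughly 2,040, consistent with the "extra 1,785 instances" figure once existing allocations are subtracted.)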
su to glustermanager [18:32:02] it's in its cron [18:32:48] Any reason it's in the user crontab (which is a pita for line-exists stuff in puppet) vs cron.d with the user prefixed to the job (which is easy to add/delete in puppet)? [18:32:56] (just wondering) [18:33:33] because this is how puppet works [18:35:09] Have an example of a project that's getting missed? [18:35:10] seems bizarre to me heh [18:35:26] Or, I guess, my actual question: How do you know it's broken? [18:35:42] Ryan_Lane: Is the overload general, or just IP related? I can probably delete maps-test-osm2pgsql if it helps. [18:35:51] ip [18:35:55] just IP related [18:36:00] are there memory quotas on bots? [18:36:07] 'no' [18:36:08] apmon: don't delete anything [18:36:10] check ulimit for the user though [18:36:19] we're deleting scratch instances we own [18:36:30] because they aren't needed and we should be cleaning up behind ourselves [18:36:35] but we're also adding more IPs [18:36:35] On another note, I don't suppose the stuff that's in the error log is exposed via the api at all? ('error' is not useful; 'error: no ips' is more useful and stops making people run around in circles). [18:37:08] I wish people would clean up *looks at his complaining email* [18:37:35] Actually should read replies to that, need to find an open Chinese place first. As if my fav Chinese doesn't open Mondays and restricts me to my fav Indian [18:38:30] Well, I haven't really used that instance in a while and probably won't in the foreseeable future. I created the instance to see if a substantial import of the OSM data was feasible on labs in its current form. Which it was not. [18:39:02] apmon: Aside from the current noted missing features are you lacking anything else that's blocking using labs? [18:39:06] For small data extracts to test things out, e.g. puppetization, there is no need for such a big instance. [18:39:35] Damianz: Depends on what type of project. [18:39:53] For developing and testing software, labs is fine. And having root access, which you don't have on toolserver, is great [18:40:09] any :) we like to know and complain about what doesn't work and hopefully eventually fix it. [18:40:45] But for actually running tools, one needs a replication of the OSM (rendering) database. [18:40:56] But that is sort of already on the list of missing features... [18:41:41] apmon: ah. ok. if you don't need it, then deleting it is ok [18:41:52] apmon: that was the postgres database? [18:42:20] Yes. I tried to import the europe extract, which is about half of the full OSM dataset [18:42:45] But it a) caused too much I/O load on toolserver and b) still actually ran out of disk space before the import was completed [18:42:57] I suspect postgres will also suck on gluster so will also need larger instance disks [18:42:59] s/toolserver/labs [18:43:22] would be nice for user dbs on postgres too though (sidenote) [18:45:07] Random thought, how do people feel about having a 'labs-scripts' or 'labs-tool' repo to dump stuff like mysql backup scripts, creating per-user dirs for apache automatically etc so people can just clone it into /data/project and stick cron entries in (we have some scripts on bots, but I'm thinking reuse-wise it might be an idea to cross-project collaborate). 
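(Editorial sketch: a shared 'labs-scripts' repo like the one floated above would mostly hold small wrappers like this. A hedged sketch of a per-database mysqldump cron job; the host, credentials file, and dump path are placeholders, not the real bots setup:)

    #!/usr/bin/env python
    """Dump every database on a host to dated .sql.gz files.
    Meant to run from cron, e.g. once a day."""
    import datetime
    import os
    import subprocess

    HOST = 'bots-sql1'                      # placeholder
    DEFAULTS = '/data/project/.my.cnf'      # placeholder creds file
    DEST = '/data/project/backups'          # placeholder dump dir

    def databases():
        # -N suppresses the column-name header; assumes no
        # whitespace in database names.
        out = subprocess.check_output(
            ['mysql', '--defaults-file=' + DEFAULTS, '-h', HOST,
             '-N', '-e', 'SHOW DATABASES'])
        skip = {b'information_schema', b'performance_schema'}
        return [db for db in out.split() if db not in skip]

    def dump(db):
        stamp = datetime.date.today().isoformat()
        path = os.path.join(DEST, '%s-%s.sql.gz' % (db.decode(), stamp))
        with open(path, 'wb') as f:
            mysqldump = subprocess.Popen(
                ['mysqldump', '--defaults-file=' + DEFAULTS, '-h', HOST,
                 '--single-transaction', db],
                stdout=subprocess.PIPE)
            # Pipe the dump through gzip straight to disk.
            subprocess.check_call(['gzip'],
                                  stdin=mysqldump.stdout, stdout=f)
            if mysqldump.wait() != 0:
                raise RuntimeError('mysqldump failed for %r' % db)

    if __name__ == '__main__':
        for db in databases():
            dump(db)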
[18:45:37] Clearly auto updating would be a no-no given the security issue of one project exploiting another, though we could git pull via puppet and force it to a known good ref which would force ops review [18:45:41] Damianz: Good idea [18:45:44] * Damianz stops random thinking [18:46:05] Ryan_Lane: Can you tell me more about the problem with labstore? Or tell me how to reproduce it, or tell me which projects are troubled? [18:46:28] andrewbogott: any of the new projects [18:46:33] Jan_Luca was affected iirc [18:46:34] Jan_Luca: what's your project name again? [18:46:50] Ryan_Lane: centralauth [18:48:43] Which of the database puppet classes is the right one? [18:49:08] the one that has labs in it and database afaik [18:49:29] role::labs-mysql-server? [18:49:39] I might try writing a parser again later, kinda want to avoid ruby but finding a decent python implementation of the dsl is apparently hard [18:49:42] yes [18:50:13] Jan_Luca: I take it centralauth had instances in it at some point? [18:50:31] the dir should exist without instances anyway [18:50:41] At the moment only centralauth-frontend [18:51:07] Other instance creations were affected by the IP problem [18:52:26] Ryan_Lane: When you add the next range how will it affect security groups? For example ping is enabled based on the /24 iirc, is that a manual update to each project, or is there a secret script to update nova? [18:52:33] (ip wise) [18:53:11] andrewbogott: You can delete the instance if needed. There is at the moment only an Apache with PHP5 running, without any content [18:57:19] Damianz: I'm going to need to modify all of them [18:57:30] That's going to suck [18:57:33] are we ready to break the world? [18:59:07] break the world? is that like pushing a bgp update for every ip range and nullrouting them all [18:59:11] heh [18:59:19] I'm ready when you are [18:59:24] Damianz: it's worse than you think [18:59:37] Damianz: I also need to manually extend the network by modifying the database [18:59:43] seems openstack engineers hate us all [18:59:46] seriously [18:59:59] someone on the openstack network told me I should use windows [19:00:02] he's an asshole [19:00:04] ewww [19:00:12] yeah, windows clearly affects openstack [19:00:21] Damianz: he was calling me a newb [19:00:27] oh fyi, the addresses of 10.4.16.253-255 are off limits [19:00:30] lol [19:00:44] because I didn't think I should have to manually modify the network for this [19:01:04] I'll be writing a bitchy email about this later [19:01:15] it's currently impossible to modify networks at all [19:01:15] if you ever have to modify the db directly the tools are fucking useless [19:01:17] sorry, revising: 10.4.16.252-255 are off limits [19:01:42] due to existing network ips [19:01:42] even after the change? [19:01:58] I'm confused [19:02:03] Didn't we manage to get one of the not-host ips assigned to an instance once? heh [19:02:08] we have existing network gear assigned to those ips [19:02:17] if it's a pain in the ass i can change the gear ips [19:02:21] ah [19:02:24] I can mark them as reserved [19:02:38] so that nova won't allocate them [19:04:03] ok [19:04:23] You know a while back when you were trying to point out that we have to add 'novaadmin' to every project since the acl change? For things like changing the security groups on everything it would /really/ suck if we didn't enforce that outside of nova... kinda liked the 'global cloud admin' role that got access to everything. [19:04:56] Damianz: yeah. 
it sucks that global roles don't exixt [19:04:59] *exist [19:05:00] so ips are changed [19:05:09] /21 is a go [19:05:17] cool. thanks [19:05:22] =D [19:05:35] How much bribing do you think it would take to get like 2 /8's from Leslie? [19:05:40] hahaha [19:05:49] 10.4.16/0-10.4.23.255? [19:05:54] yeah, that would require us to be privately routing actual public ip space [19:06:05] err .0 [19:06:11] make her CEO of level3 and create a WMF / Level3 joint venture that would hold all level3 IP space :) [19:06:13] hello btw [19:06:14] Ryan_Lane: correct [19:06:17] ok. cool [19:06:20] hashar: yes, that would be the best way :) [19:06:27] oh god all that ip space [19:06:31] it would be so beautiful [19:06:37] now to see if I'm going to break everything [19:06:40] * Ryan_Lane backs up the database [19:06:45] LeslieCarr: or get v6 in labs :-)))))) [19:06:55] RIPE is rather mean with ip assignments now; it's annoying when you get randomly broken-up subnets due to the 'only assign what you really, really need, right now' policy. [19:06:57] look at Ryan_Lane for that [19:07:06] v6 needs testing in the new region [19:08:36] It would be sort of nice to have a random range of ips for testing stuff like this on the nova dev instances (yes it would be a massive pita technically because of security, probably). [19:08:44] Testing in semi-production never broke anything though :) [19:09:58] 'Chicken Steak' what [19:10:05] how is it possible to have a *chicken* steak [19:10:45] fillet would be a better description [19:10:53] chicken-fried steak? [19:11:54] I'd happily take a plump breast grilled on the bbq in a bap, only eating a 'steak' if it comes off a m00m00 [19:12:15] Or an ostrich, but I'm too cheap to buy ostrich. [19:14:48] LeslieCarr: broadcast is 10.4.23.255? [19:14:51] okay, signing off of this channel again [19:14:52] yep [19:15:01] or want me to stay on until you're sure it works? [19:15:18] andrewbogott: Do you already have an idea what's wrong? [19:15:37] Not yet. Well, not much of one. [19:16:11] Doesn't it just do an ldap search then mkdir the folder and add a gluster export? [19:17:38] it should... [19:17:57] 'labs ' [19:18:15] A new slogan :-) [19:18:33] Well magic stopped happening when the rainbow unicorn went away :( [19:19:16] Actually tempted to photoshop in a rainbow coming out of its.... tail for lols. [19:19:34] heh [19:22:26] okay, Ryan_Lane now you have 10.4.0.0 to 10.4.7.255 [19:22:30] enjoy! [19:22:48] hrmmmm, i don't really even remember subscribing to openstack-operators. i guess it was during the discussion about merging or not [19:22:52] cool. thanks [19:22:59] http://weknowmemes.com/wp-content/uploads/2012/01/theres-so-much-room-for-activities.jpg [19:23:30] there's an openstack-operators list too? [19:23:42] extra 1,785 instances available (at least from the network side, i know the machines will fall over) [19:23:46] Developers, Review and Users is spammy enough heh [19:23:48] okay, now bye from this channel :) [19:24:04] ok. modified the networks entry [19:24:04] I'm sure we can use 1785 instances [19:24:13] now to add the fixed_ips entries [19:25:48] You know you use your debit card too much when you've physically worn the numbers off but can remember the long number, expiry and security code off the top of your head =/ [19:28:48] Hmm [19:29:22] Without using exported resources there's no way to just push data onto something like an array then loop over it in puppet, right? (due to it running stuff in random order) [19:31:45] ok. 
added another 255 addresses [19:31:50] let's see if this breaks everything [19:32:13] --dhcp-lease-max=2048 [19:32:16] that's a good sign [19:32:34] fixed_range=10.4.0.0/24 [19:32:36] I need to fix that [19:32:48] are we still on 1 dhcp/network server? iirc you were saying something about bgp support being pushed back a release so we couldn't do it. [19:33:02] I'm going to backport it [19:33:09] I'm going to test it in eqiad [19:33:14] ah cool [19:33:55] you know it would be interesting to have jenkins take the latest stable tag of our branch, merge in code, run unit tests, build debs and push up to the repo =D [19:34:13] yes [19:34:14] it would be [19:34:27] per-project repos [19:34:29] would be nice [19:34:36] that would be nice too [19:34:43] it can push into the repo, under a specific release [19:34:49] the release being the project name [19:34:59] then instances in a project could be configured to use that [19:35:34] It would be nice (based on git) if, when a build succeeds, it tags a release number, builds the corresponding deb, and pushes that. Then we could just edit the puppet manifest, bump the pin and yay we can test and run prod together in happiness. [19:35:46] andrewbogott: When you find the bug, can you send me a mail (https://labsconsole.wikimedia.org/wiki/Special:EmailUser/Jan)? [19:35:51] Damianz: yep [19:36:01] Jan_Luca: OK. [19:36:16] Thank you! [19:36:31] We could also then, based on version numbers, do diffs against what changed when things break, and fix tests. [19:36:36] Feedback loops in workflow ftw! [19:36:46] All Europeans: Good night! [19:36:51] * Damianz sits and jumps up and down waiting for his Indian to arrive. [19:36:54] Jan_Luca: nn [19:37:01] Damn, it's half 8 already [19:50:47] 10/01/2012 - 19:50:47 - Created a home directory for andrew in project(s): conventionextension [19:50:56] :o [19:51:14] oh reminds me [19:51:39] Ryan_Lane: Any reason labs-home-wm isn't voiced like the other bots? [19:51:50] no clue [19:55:50] 10/01/2012 - 19:55:50 - User andrew may have been modified in LDAP or locally, updating key in project(s): conventionextension [20:04:57] * Ryan_Lane sighs [20:05:04] well, this surely isn't going well [20:07:42] Ryan_Lane: I think I know what the gluster problem is, although I don't yet know how to fix it. [20:07:53] andrewbogott: oh, what's that? [20:08:05] The next step involves using getfattr on labstore2, mind if I apt-get it? [20:08:38] Gluster seems to believe that /a is already part of a volume, and therefore won't allow anything in /a to be added to a new volume. http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ [20:08:56] So every new volume creation has failed for… quite a while. [20:09:57] Obviously /a should not itself be part of a brick, but that's my best guess for what's happening. [20:12:27] Ryan_Lane: I created a brand new directory, /a/andrewistesting, and gluster immediately told me "/a/andrewistesting or a prefix of it is already part of a volume" which means that either / or /a is being used as a brick. 
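(Editorial sketch: what getfattr is being used for here. Gluster marks a directory as belonging to a brick with trusted.* extended attributes — trusted.glusterfs.volume-id and trusted.gfid, per the blog post linked above — and refuses to reuse any path whose prefix carries those markers. A Python 3 sketch for walking up from a failing brick path and reporting where the marker actually lives; needs root, since trusted.* xattrs are root-only, and the starting path is illustrative:)

    import os

    # The xattrs commonly behind the "already part of a volume"
    # error, per the Joe Julian post linked in the log.
    MARKERS = ('trusted.glusterfs.volume-id', 'trusted.gfid')

    def brick_markers(path):
        """Return any gluster brick markers set on path."""
        found = {}
        for attr in MARKERS:
            try:
                found[attr] = os.getxattr(path, attr)
            except OSError:
                pass  # attribute not set (or not readable)
        return found

    # Walk up from a failing brick path, including '/', and
    # report every ancestor carrying a marker.
    path = '/a/glam/home'   # illustrative
    while True:
        marks = brick_markers(path)
        if marks:
            print(path, '->', marks)
        if path == '/':
            break
        path = os.path.dirname(path)

(The fix described in that post is removing the stray attributes with setfattr -x on every node — which matches the "delete them from /a on all nodes" advice that follows.)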
[20:13:37] !log glam created glam-gwtoolset-apt instance to test apt-get installations [20:13:38] Logged the message, Master [20:14:03] !log glam created glam-gwtoolset-puppet instance to test puppet installations [20:14:04] Logged the message, Master [20:15:16] !log glam deleted glam-gwtoolset as its configuration seems to have gone wrong [20:15:17] Logged the message, Master [20:32:06] andrewbogott: ugh [20:32:10] andrewbogott: that's annoying [20:33:33] Ryan_Lane, but if / or /a is used as a brick that's surely a mistake, right? [20:33:39] yep [20:33:56] I really don't see how that's possible [20:34:06] ok. So, to investigate… should I just install attr via apt, or should I puppetize? [20:34:16] via apy [20:34:17] apt [20:35:23] 'k [20:38:42] dead [20:39:54] Ryan_Lane: OK, yes indeed there are gluster attrs set on /a. I can just delete them blindly, but it would be nice to know where they came from... [20:40:10] yeah. no clue how they got there [20:40:21] you need to delete them from /a on all nodes [20:40:50] ok, blind deletion coming up! [20:42:01] * Damianz looks at loooooool [20:42:55] would it be possible that there is a bug whereby the root got added accidentally then removed again [20:45:32] Ryan_Lane: It's for a volume called 'dht' [20:54:09] Grrrrr [20:54:10] stupid [20:54:12] bloody [20:54:13] wiki [21:07:01] andrewbogott: dht? [21:07:03] o.O [21:07:13] dunno how that would have happened [21:07:20] maybe I fucked something up at some point? [21:23:34] Ryan_Lane: OK, I need you to check my work before I destroy the world. On labstore2, look at the diff between /usr/local/sbin/manage-volumes and manage-volumes-new [21:24:10] The problem with the attrs in /a caused many volume creations to abort half-way through, leaving attrs scattered in subdirs (e.g. /a/glam/home) [21:24:19] which means that gluster now refuses to create those volumes, etc. etc. [21:24:38] My proposed change should clean that stuff up **without erasing any existing volumes** [21:24:47] I'd like you to verify the **without** bit :) [21:24:58] heh [21:26:02] yeah [21:26:03] that's fine [21:26:17] it's rmdir, so, if things aren't in the directory, it'll delete it [21:27:03] And presumably gluster won't fill the volumes on some bricks while leaving others empty... [21:27:28] OK, thanks for reading… we'll see if this helps. [21:27:59] nope! [21:28:18] :( [21:28:21] I hate gluster [21:36:37] ugh [21:36:42] this range is *everywhere* [21:36:48] this is going to take me all day [21:37:20] * Damianz wonders how well ceph would function for both instance and project storage so we can actually have redundancy and not-breakage back [21:40:57] Damianz: very poorly [21:41:03] well.... [21:41:05] depends, really [21:41:19] labstore1-4 are SATA [21:41:31] if we stuck it there, it would perform very, very poorly [21:41:54] if we had two ceph clusters, one on local disk and the other on labstore1-4, then it should be fine [21:47:31] ok. instances build now. that's one step down [21:48:39] hmm [21:48:44] ok. I'm lying [21:48:51] sata is kinda slow for vms [21:48:59] puppetmaster is blocking, now [21:50:46] how the hell am I going to update all of the security groups? [21:50:47] 10/01/2012 - 21:50:46 - Created a home directory for marktraceur in project(s): visualeditor [21:51:42] magic [21:52:31] in theory it should be possible to write a script to do it using the admin login, but that's going to suck so much and I guarantee it won't go through cleanly in one go. [21:55:20] yeah. 
that's what I plan on doing [21:55:36] I really only care about ssh [21:55:37] ping is fine [21:55:52] 10/01/2012 - 21:55:52 - User marktraceur may have been modified in LDAP or locally, updating key in project(s): visualeditor [21:56:28] I updated the docs to say /21 is labs [21:56:39] really that needs to be modified further for when we add eqiad [21:59:48] it's kinda a shame you can't have a 'global' security group like 1 default for every project [21:59:54] yeah [21:59:56] that would be nice [22:00:08] it's available for blocking [22:00:12] just not for opening up [22:00:27] that seems rather silly, can't be that much effort to add opening [22:00:45] 10/01/2012 - 22:00:44 - User jasonspriggs may have been modified in LDAP or locally, updating key in project(s): bots,bastion,testing,wikibits,wlmjudging [22:00:55] 10/01/2012 - 22:00:55 - Updating keys for jasonspriggs at /export/keys/jasonspriggs [22:01:28] Ryan_Lane, is it possible to assign a wildcard hostname to an IP? like *.wlm.wmflabs.org [22:04:31] yes [22:04:36] we need to add the wlm domain, though [22:05:36] well it's already there but i would just need to assign *.wlm to an ip; correct? or is there some other way to make OpenStack understand [22:06:16] Ryan_Lane: user glustermanager should be able to sudo without a password? [22:07:16] andrewbogott: yeah [22:07:32] JasonDC: if the wlm domain is already there, then it's just adding * to the wlm domain [22:07:35] yeah [22:07:50] great :) thanks [22:08:10] Ryan_Lane: OK, so, what am I missing here? [22:08:12] # whoami [22:08:12] root [22:08:13] root@labstore1:~# su - glustermanager [22:08:14] glustermanager@labstore1:~$ sudo echo bananas [22:08:15] [sudo] password for glustermanager: [22:08:22] that won't work [22:08:25] it's specific to commands [22:08:28] not for everything [22:08:44] Ah, ok. [22:08:51] So my problem is I need to add rmdir to that list of commands. [22:08:54] How do I do that? [22:09:11] change the sudo list in puppet [22:09:25] yeah [22:09:41] right now it's: glustermanager ALL = NOPASSWD: /bin/mkdir -p /a/* [22:09:41] glustermanager ALL = NOPASSWD: /usr/sbin/gluster * [22:09:52] make sure to specify where it's allowed to rmdir [22:09:56] like I'm doing with mkdir [22:10:08] ok [22:10:39] Personally I'd write a script that takes a minimal set of args and allow sudo to that, or even possibly setuid it. Not a fan of allowing args to be passed to things [22:10:40] ugh [22:10:47] I can't ssh into the new instance because of the security groups. heh [22:11:13] actually, I'm surprised I can ssh to it at all [22:11:27] the security group rule is /24 [22:13:35] 'How scaleable is the sqlite backend for xxxx' every time a kitten dies [22:16:25] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/25960/1/manifests/openstack.pp [22:17:00] andrewbogott: yep. looks good [22:17:11] +2'd [22:17:45] +2 is the most stupid terminology ever [22:17:51] heh [22:18:25] Damianz: I scripted it [22:18:28] and it was easy [22:18:34] fuck [22:18:35] security groups? 
did you add like 20 conditions for when nova bitch slaps you and just doesn't do some projects [22:18:55] I just got rate limited [22:19:01] LOL [22:19:27] now I need to delete them all and try again [22:19:29] I really shouldn't drink while Ryan tries things [22:19:30] with a sleep this time [22:19:59] You know if you had a global admin right you could just make it unlimited [22:20:01] heh [22:21:28] obviously novaadmin isn't in some projects [22:21:57] obviously [22:23:32] I do keep playing with the idea of implementing some maintenance script that can clean up the wiki crap as we have instances with data out of sync with nova and projects with missing settings. Not figured out how easy it is to pull from the nova api though. [22:23:37] well, it seems the delete worked [22:23:44] even if it threw 500s [22:23:53] The wiki updating works pretty sweet, except when we have instances that have died and thus not triggered an update [22:24:11] dead or not they'll trigger an update [22:24:29] ok. adding rules [22:24:34] sleeping for 2 secs :) [22:24:43] no rate limiting for me! [22:24:55] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000336 [22:24:55] that was sleeping for 9 secs [22:24:58] state = active [22:25:01] it's no active [22:25:05] s/no/not/ [22:25:11] it's dead? [22:25:15] mother fucker [22:25:19] I got rate limited [22:25:23] rofl [22:25:25] this is killing m [22:25:25] me [22:26:18] bad example [22:26:35] apparently I need to use a sleep of 3 or 4 [22:26:52] I'll use 5, just to be safe. heh [22:27:09] Or just if($rate_limited) sleep(10); else hell_for_leather() [22:27:17] meh [22:27:27] I need to add the 5666 rule too [22:27:33] and then remove the /24 rules [22:27:41] and rule 31, can't forget rule 31 [22:27:42] * Damianz giggles [22:28:10] the compute services are probably having shit fits right now [22:28:28] you probably just made them rebuild their iptables like 400 times [22:28:32] yeah [22:28:42] more than that, I'm sure [22:29:25] I assume it's also effectively just calling the iptables command rather than using the C api so will be lameish. [22:29:34] yeah [22:30:10] Sad times when my biggest issue with nagios is I have so many hosts just forking the processes for checks is too much overhead =\ [22:30:35] Though maybe that's my fault for trying to check ~50k services every min [22:31:55] Btw since you're screwing with security groups did you see my email note re filtering between pmtpa and eqiad causing nagios sadface? [22:32:50] ues [22:32:51] *yes [22:33:03] hm [22:35:20] I know I'm bored when I'm reading Telly release notes [22:35:42] on another note, when can we upgrade to puppet 3.0! [22:48:15] Damianz: when we kill off all the deprecation warnings in our code base [22:48:35] lol [22:48:49] sometimes I think it would be nice to just take a blank repo and start from scratch writing modules [23:00:16] heh [23:09:35] I kinda wish I wasn't going to Dublin next week now, oh well I'll enjoy my little trip anyway. [23:10:02] what are you doing there? [23:12:32] Job interview + 3 days in the city for kicks. I just accepted a slightly more awesome offer that means I don't have to move like 300 miles and lets me stay involved as an activity instructor at the local camp sites. Since I've already paid for the hotel I'm going anyway and just pushing back starting my new job for a couple of weeks, but now I'd rather just dive in head first heh. 
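(Editorial sketch: the sleep-and-retry dance with the rate limiter above, as a generic pattern. The add_rule callable is hypothetical, standing in for whatever nova call adds one security group rule; the 5-second pause is the figure settled on in the log:)

    import time

    def with_retries(call, attempts=5, pause=5):
        """Run call(), sleeping and retrying when it fails.

        A real version would only retry on the API's
        rate-limit error rather than on any exception.
        """
        for attempt in range(1, attempts + 1):
            try:
                return call()
            except Exception as e:
                if attempt == attempts:
                    raise
                print('attempt %d failed (%s), sleeping %ds'
                      % (attempt, e, pause))
                time.sleep(pause)

    # Usage sketch: one rule per project, paced so the rate
    # limiter is never hit in the first place.
    # for project in projects:
    #     with_retries(lambda: add_rule(project, 'tcp', 22,
    #                                   '10.4.0.0/21'))
    #     time.sleep(5)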
[23:13:02] I like talking about interesting problems so it will make for a nice 'break' either way, even if I have no plans of moving out there in the medium term. [23:32:42] Damianz: ah. nice [23:37:58] hi again... so, how do I get mysql installed in my instance? =) [23:38:29] gribeco: Oh you wanted a mysql db in bots, right? [23:38:35] yes [23:38:36] * legoktm would like one too [23:38:47] Ryan_Lane: Don't suppose you managed to test resizing instances yet [23:41:37] urgh [23:42:01] wth is bots-sql3 running mariadb [23:44:55] see pms [23:45:20] Hopefully should work. [23:45:20] thanks [23:46:48] I hate the way sql is set up atm, we should have 1 user <> 1 db not x users <> everything, might take a look at what needs hacking up in salt to make it work for us. [23:50:36] Damianz: is phpmyadmin installed anywhere that could access bots-sql1? [23:52:05] no [23:52:17] it kept getting exploited, I'd suggest using something like mysql workbench over an ssh tunnel [23:52:53] What about how the toolserver requires an LDAP login first? [23:53:51] Possibly, but we're dealing with shared instances and self-signed ssl certs so intercepting someone's password would be pretty easy [23:54:07] What we really need is openid/oauth/something that can do auth without giving away the password [23:54:09] Hm alright [23:54:18] I would also like that :) [23:54:24] It's really the only secure way to do it, sadly the extension sucks [23:54:31] I'll look into mysql workbench then
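(Editorial sketch: the "workbench over an ssh tunnel" suggestion, spelled out. Forward a local port to the sql host through a bastion, then point any mysql client at 127.0.0.1. A sketch driving plain OpenSSH via subprocess; the bastion and sql hostnames are placeholders:)

    import subprocess

    BASTION = 'bastion.wmflabs.org'   # placeholder
    SQL_HOST = 'bots-sql1'            # placeholder
    LOCAL_PORT = 3307

    # -N: no remote command, just forward the port.
    # Local 3307 -> bots-sql1:3306, as resolved from the bastion.
    tunnel = subprocess.Popen(
        ['ssh', '-N',
         '-L', '%d:%s:3306' % (LOCAL_PORT, SQL_HOST), BASTION])

    # Now point MySQL Workbench (or the mysql CLI) at
    # 127.0.0.1:3307; all traffic rides the ssh connection,
    # so no database password crosses the wire in the clear.
    # When done:
    #   tunnel.terminate()

(This sidesteps both problems raised above: nothing web-facing to exploit like phpmyadmin, and no self-signed-cert interception, since auth happens over ssh keys.)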