[00:00:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.043 seconds [00:01:15] ^demon: what's this registerEmailPrivateKey thing? [00:01:19] it just got removed from the config [00:01:23] is it important? [00:01:39] <^demon> I didn't touch that in my change to gerrit. [00:02:04] it was added on the upgrade [00:02:10] when puppet ran it removed it [00:02:43] <^demon> Hmm. Don't know. It's not documented in [auth] [00:02:53] <^demon> But it is in an example config for secure.config :\ [00:03:06] <^demon> Inconsistent docs, whe [00:03:52] I'm looking at the code [00:04:26] seems it's needed for allowing people to register their email addresses [00:04:37] they can't do that anyway [00:04:39] since we're using ldap [00:05:40] <^demon> The UI tells them they can, then spews errors when you try. [00:05:46] <^demon> Finally makes sense why now [00:05:48] ugh [00:05:59] well, we'll need to add this to the private repo [00:06:17] lemme add this really quick [00:06:38] <^demon> You can register multiple user names under your account. The only one you can't set is preferred_email since that'll always revert back to LDAP version [00:07:32] New patchset: Ryan Lane; "Adding missing field for email registration in gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14434 [00:08:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14434 [00:09:13] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14434 [00:09:25] !log force running puppet on manganese, it'll restart gerrit [00:09:33] Logged the message, Master [00:10:05] <^demon> 2.5 hasn't been branched yet, so we shouldn't have to repeat this soon. In the meantime, I'll work on gerrit it 100% working with puppet on labs, so we can iron out some of these issues. [00:10:33] <^demon> s/it/so it is/ [00:10:34] bleh [00:10:38] undefined [00:10:47] I fucked that up somehow [00:11:29] Are they still on track to add in the plugin interface to 2.5? [00:11:44] <^demon> Yeah, that's already in master. [00:11:49] New patchset: Ryan Lane; "Add email key to gerrit config class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14435 [00:11:50] Oh cool. [00:12:05] <^demon> So unless they branch some arbitrary point before that, yeah it'll be in 2.5 [00:12:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14435 [00:12:29] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14435 [00:12:52] <^demon> 2.5 is gonna be super awesome. I need to try writing a plugin :) [00:13:17] saved searches, saved searches! [00:13:20] heh [00:13:27] <^demon> Don't think that made it in yet :( [00:13:32] <^demon> Plugins are a Big Deal though. [00:13:38] write that as a plugin ;) [00:13:46] !log restarting gerrit [00:13:54] Logged the message, Master [00:14:24] well, we're done :) [00:14:28] <^demon> Ryan_Lane: My first idea for a plugin is actually getting drdee_'s stats stuff integrated. Having it as a dashboard like /stats/ would be super cool. [00:14:35] oh [00:14:36] yeah [00:14:37] that would be [00:15:25] <^demon> Ok, gonna mark this FIXED :) [00:15:32] Can we have a plugin that tracks how many times ryan has to restar it per upgrade? 
It's java so I'd expect a steady upwards line :D [00:15:40] -_- [00:17:16] <^demon> We'll resolve a bunch of these issues before next time :) [00:21:50] <^demon> Ryan_Lane: Thanks so much! [00:21:55] yw [00:30:56] Ryan_Lane: When you talk about per-project puppet branches requiring modules... are you envisioning one module per repo, or a bunch of modules all in one repo? [00:31:13] a bunch of modules all in one repo [00:31:17] 'k [00:31:25] otherwise we need to deal with submodules [00:31:35] and the workflow for that would suck [00:31:52] git submodules, you mean? ok, makes sense. [00:32:00] yeah [00:32:03] * andrewbogott doesn't hate git submodules, but understands why they are generally hated [00:32:12] it would be much nicer than one monolithic repo [00:32:29] Git submodules are awesome but are 100% not svn externals. [00:32:41] well, nice to everyone but us [00:33:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:33:24] Supposedly gerrit is moderately smart about submodules. Not that I've ever seen it done. [00:33:32] Gerrit is smart about it [00:33:37] See the mediawiki/extensions.git repo [00:35:58] andrewbogott: but, either way, we need to use puppet modules :) [00:36:07] it'll likely be a long time till we're fully using modules [00:36:46] email incoming [00:41:56] New patchset: Bhartshorne; "updating swift ring files putting ms-be6,7,8 into rotation for containers and accounts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14437 [00:42:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14437 [00:42:32] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14437 [00:43:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [00:44:41] hm... forgot that Faidon is on vacation forever :( [00:55:41] andrewbogott: ah. I see what you were asking about now [00:55:48] we'd really want to have modules for each key piece [00:55:58] like, openstack would be a module with a number of components [00:56:13] like keystone, nova, glance, cinder, swift, quantum, etc [00:56:21] gerrit would be a module [00:56:53] even if we went with one giant module, we'd need to rename every class [00:57:09] it's better to just break out pieces slowly over time and turn them into modules [01:05:57] Erik and I are just getting caught up on the db40 fun today....still ongoing, or in cleanup now? [01:06:15] Eloquence: I just asked :) [01:06:19] k [01:07:38] mutante: still around? [01:09:43] * robla may have to resort to random pinging [01:09:57] robla: asher and domas would likely know that answer [01:10:03] ask via email? [01:10:16] I'd imagine that it's done, or we'd see them around talking about it and working on it [01:16:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:21:02] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/zero-orange-kenya.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [01:25:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.073 seconds [01:26:05] Ryan_Lane: Why would we have to rename ever class? 
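The registerEmailPrivateKey setting discussed above (from [00:01:15] onward) lives in Gerrit's secure.config, which uses git-config syntax, so it can be inspected or set with plain git. A minimal sketch, assuming a conventional review_site path on the Gerrit host — the path and the placeholder value are assumptions; in production the real value comes from the private puppet repo:

    # read the current key, if any, from Gerrit's secure.config
    sudo git config --file /var/lib/gerrit2/review_site/etc/secure.config --get auth.registerEmailPrivateKey
    # set it; Gerrit needs a restart afterwards, as was done at [00:13:46]
    sudo git config --file /var/lib/gerrit2/review_site/etc/secure.config auth.registerEmailPrivateKey 'PLACEHOLDER-KEY'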
[01:26:19] the class names need to start with the module name [01:26:46] so, openstack would need to start with openstack::nova, openstack::swift, etc [01:28:32] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active [01:29:15] New patchset: Ryan Lane; "Diablo ppa is gone, removing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14439 [01:29:20] ah, ok. Hm. [01:29:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14439 [01:31:02] it should be somewhat easy to convert a lot of it quickly [01:31:31] it's the spaghetti code-ish parts that will be difficult [01:35:44] !log updated squid redirector to cover wiki(quotes|books|versity) [01:35:53] Logged the message, Master [01:40:50] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14439 [01:40:50] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 224 seconds [01:42:47] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 251 seconds [01:49:32] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 655s [01:52:25] binasher: I think I got FlaggedRevs working with LocalRDBStore now in testing \o/ [01:52:38] * AaronSchulz should really go home now... [01:52:44] :O that's awesome!! [01:54:02] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 24s [01:54:29] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 23 seconds [01:54:56] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 1 seconds [01:58:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:09:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds [02:34:58] PROBLEM - Puppet freshness on cp1017 is CRITICAL: Puppet has not run in the last 10 hours [02:34:58] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours [02:48:55] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [02:53:01] http://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/RTVE_Testcard.svg/200px-RTVE_Testcard.svg.png [02:53:10] ***MEMORY-ERROR***: rsvg-convert[9385]: GSlice: failed to allocate 496 bytes (alignment: 512): Cannot allocate memory [02:53:28] Error generating thumbnail Error creating thumbnail: [02:53:39] http://commons.wikimedia.org/wiki/File:RTVE_Testcard.svg [02:59:16] ToAruShiroiNeko: surprise [02:59:37] 24 megabytes of SVG are not easy to render [03:07:23] if its normal behaviour, please disregard [03:48:47] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [05:01:44] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [05:11:47] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [05:21:48] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [05:34:51] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours [06:18:59] New patchset: Tim Starling; "Remove some accumulated crap from live-1.5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14447 [06:38:26] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours [06:38:26] PROBLEM - Puppet freshness on 
cp3002 is CRITICAL: Puppet has not run in the last 10 hours [06:40:32] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours [06:41:26] PROBLEM - Puppet freshness on search24 is CRITICAL: Puppet has not run in the last 10 hours [06:41:26] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours [06:41:26] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours [06:46:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [06:47:26] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [06:47:26] PROBLEM - Puppet freshness on search21 is CRITICAL: Puppet has not run in the last 10 hours [06:47:26] PROBLEM - Puppet freshness on search22 is CRITICAL: Puppet has not run in the last 10 hours [06:47:26] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [06:47:26] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours [06:47:27] PROBLEM - Puppet freshness on search33 is CRITICAL: Puppet has not run in the last 10 hours [06:47:27] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours [06:47:27] PROBLEM - Puppet freshness on search36 is CRITICAL: Puppet has not run in the last 10 hours [06:47:28] PROBLEM - Puppet freshness on search28 is CRITICAL: Puppet has not run in the last 10 hours [06:50:26] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours [06:52:57] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours [06:54:00] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours [06:54:00] PROBLEM - Puppet freshness on sq67 is CRITICAL: Puppet has not run in the last 10 hours [06:54:00] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours [06:56:06] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [06:57:09] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [06:57:09] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [06:57:09] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours [07:00:00] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [07:04:03] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [07:04:03] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours [07:04:03] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours [07:04:03] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [07:06:00] PROBLEM - Puppet freshness on arsenic is CRITICAL: Puppet has not run in the last 10 hours [07:09:00] PROBLEM - Puppet freshness on palladium is CRITICAL: Puppet has not run in the last 10 hours [07:16:35] ACKNOWLEDGEMENT - Host srv266 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn RT-2896 - hardware fail [07:29:40] !log continue to restart and upgrade downed mw10xx servers [07:29:47] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [07:29:50] Logged the message, Master [07:36:14] RECOVERY - Host mw1015 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [07:36:23] 
RECOVERY - Host mw1044 is UP: PING OK - Packet loss = 0%, RTA = 31.46 ms [07:36:23] RECOVERY - Host mw1040 is UP: PING OK - Packet loss = 0%, RTA = 30.88 ms [07:36:23] RECOVERY - Host mw1047 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [07:41:56] RECOVERY - Host mw1048 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [07:42:05] RECOVERY - Host mw1050 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [07:42:32] RECOVERY - Host mw1085 is UP: PING WARNING - Packet loss = 80%, RTA = 30.90 ms [07:45:50] PROBLEM - SSH on mw1085 is CRITICAL: Connection refused [07:46:35] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [07:47:20] RECOVERY - SSH on mw1085 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [07:48:05] RECOVERY - Host mw1087 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms [07:48:15] RECOVERY - Host mw1089 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [07:50:55] New patchset: Raimond Spekking; "Move generic wikisource/wikiversitry entries to top of the section" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14448 [07:53:20] PROBLEM - Host mw1048 is DOWN: PING CRITICAL - Packet loss = 100% [07:53:29] RECOVERY - Host mw1092 is UP: PING OK - Packet loss = 0%, RTA = 30.88 ms [07:54:23] RECOVERY - Host mw1048 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [07:58:08] RECOVERY - Host mw1093 is UP: PING OK - Packet loss = 0%, RTA = 30.94 ms [07:59:20] RECOVERY - Host mw1095 is UP: PING OK - Packet loss = 0%, RTA = 31.00 ms [07:59:20] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [07:59:38] RECOVERY - Host mw1096 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [07:59:38] RECOVERY - Host mw1105 is UP: PING OK - Packet loss = 0%, RTA = 30.95 ms [07:59:38] RECOVERY - Host mw1098 is UP: PING OK - Packet loss = 0%, RTA = 30.94 ms [07:59:47] RECOVERY - Host mw1110 is UP: PING WARNING - Packet loss = 50%, RTA = 42.77 ms [08:00:59] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 30.97 ms [08:03:14] PROBLEM - SSH on mw1110 is CRITICAL: Connection refused [08:03:23] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:04:45] RECOVERY - SSH on mw1110 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:16:45] RECOVERY - Host mw1117 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [08:16:45] RECOVERY - Host mw1114 is UP: PING OK - Packet loss = 0%, RTA = 30.88 ms [08:16:45] RECOVERY - Host mw1115 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [08:16:54] RECOVERY - Host mw1160 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms [08:16:54] RECOVERY - Host mw1132 is UP: PING OK - Packet loss = 0%, RTA = 30.98 ms [08:16:54] RECOVERY - Host mw1141 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [08:17:03] RECOVERY - Host mw1154 is UP: PING OK - Packet loss = 0%, RTA = 30.88 ms [08:20:12] PROBLEM - SSH on mw1132 is CRITICAL: Connection refused [08:20:12] PROBLEM - SSH on mw1117 is CRITICAL: Connection refused [08:21:42] RECOVERY - SSH on mw1132 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:21:42] RECOVERY - SSH on mw1117 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:21:51] RECOVERY - Host mw1119 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [08:22:36] RECOVERY - Host mw1128 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms [08:22:36] RECOVERY - Host mw1123 is UP: PING OK - Packet loss = 0%, 
RTA = 30.91 ms [08:35:57] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [08:39:15] PROBLEM - Host mw1141 is DOWN: PING CRITICAL - Packet loss = 100% [08:41:39] RECOVERY - Host mw1141 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [08:46:27] PROBLEM - swift-object-auditor on ms-be8 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:46:54] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:58:01] !log dist-upgrading (unused) db10xx servers [08:58:09] RECOVERY - swift-object-auditor on ms-be8 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [08:58:10] Logged the message, Master [09:02:57] PROBLEM - Host db1009 is DOWN: PING CRITICAL - Packet loss = 100% [09:03:51] RECOVERY - Host db1009 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms [09:04:54] PROBLEM - Host db1010 is DOWN: PING CRITICAL - Packet loss = 100% [09:05:57] RECOVERY - Host db1010 is UP: PING OK - Packet loss = 0%, RTA = 30.96 ms [09:08:10] PROBLEM - Host db1027 is DOWN: PING CRITICAL - Packet loss = 100% [09:09:40] RECOVERY - Host db1027 is UP: PING OK - Packet loss = 0%, RTA = 30.95 ms [09:09:58] PROBLEM - Host mw1158 is DOWN: PING CRITICAL - Packet loss = 100% [09:10:16] PROBLEM - Host db1028 is DOWN: PING CRITICAL - Packet loss = 100% [09:11:28] RECOVERY - Host db1028 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [09:11:55] PROBLEM - Host db1013 is DOWN: PING CRITICAL - Packet loss = 100% [09:12:58] RECOVERY - Host db1013 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [09:13:34] PROBLEM - Host db1015 is DOWN: PING CRITICAL - Packet loss = 100% [09:14:01] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [09:15:13] RECOVERY - Host db1015 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [09:15:31] RECOVERY - Host mw1158 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [09:19:25] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [09:21:31] PROBLEM - SSH on mw1156 is CRITICAL: Connection refused [09:23:01] RECOVERY - SSH on mw1156 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [09:23:28] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [09:23:46] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [09:24:49] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [09:41:01] PROBLEM - SSH on mw1152 is CRITICAL: Connection refused [09:42:31] RECOVERY - SSH on mw1152 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [09:43:16] PROBLEM - Host mw1151 is DOWN: PING CRITICAL - Packet loss = 100% [09:48:04] RECOVERY - Host mw1151 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [10:04:55] New patchset: Mark Bergsma; "Redo monitor module imports, to not conflict" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14450 [10:06:23] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14450 [10:18:47] PROBLEM - Host db1046 is DOWN: PING CRITICAL - Packet loss = 100% [10:20:44] RECOVERY - Host db1046 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [10:22:05] PROBLEM - Host db1045 is DOWN: PING CRITICAL - Packet loss = 100% [10:23:44] RECOVERY - Host db1045 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [10:24:11] 
PROBLEM - Host db1030 is DOWN: PING CRITICAL - Packet loss = 100% [10:25:41] RECOVERY - Host db1030 is UP: PING OK - Packet loss = 0%, RTA = 30.95 ms [10:27:29] PROBLEM - Host db1016 is DOWN: PING CRITICAL - Packet loss = 100% [10:27:38] PROBLEM - SSH on mw1127 is CRITICAL: Connection refused [10:28:32] RECOVERY - Host db1016 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [10:28:38] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14447 [10:29:08] RECOVERY - SSH on mw1127 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:29:35] PROBLEM - Host db1014 is DOWN: PING CRITICAL - Packet loss = 100% [10:32:17] RECOVERY - Host db1014 is UP: PING OK - Packet loss = 0%, RTA = 30.92 ms [10:33:38] PROBLEM - Host db1011 is DOWN: PING CRITICAL - Packet loss = 100% [10:34:41] RECOVERY - Host db1011 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms [10:36:29] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 220 seconds [10:36:56] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 242 seconds [10:39:56] PROBLEM - SSH on mw1129 is CRITICAL: Connection refused [10:41:26] RECOVERY - SSH on mw1129 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:42:47] PROBLEM - Host db1048 is DOWN: PING CRITICAL - Packet loss = 100% [10:43:59] RECOVERY - Host db1048 is UP: PING OK - Packet loss = 0%, RTA = 31.12 ms [10:45:20] PROBLEM - Host db1029 is DOWN: PING CRITICAL - Packet loss = 100% [10:47:44] PROBLEM - Host db1044 is DOWN: PING CRITICAL - Packet loss = 100% [10:49:05] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 5 seconds [10:49:05] RECOVERY - Host db1044 is UP: PING OK - Packet loss = 0%, RTA = 30.94 ms [10:49:32] PROBLEM - Host db1031 is DOWN: PING CRITICAL - Packet loss = 100% [10:49:50] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 5 seconds [10:50:35] RECOVERY - Host db1031 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [10:50:53] RECOVERY - Host db1029 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [10:51:02] PROBLEM - SSH on mw1137 is CRITICAL: Connection refused [10:52:32] RECOVERY - SSH on mw1137 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:53:35] PROBLEM - Host db1026 is DOWN: PING CRITICAL - Packet loss = 100% [10:54:02] PROBLEM - SSH on db1029 is CRITICAL: Connection refused [10:55:05] PROBLEM - MySQL disk space on db1029 is CRITICAL: Connection refused by host [10:55:05] RECOVERY - Host db1026 is UP: PING OK - Packet loss = 0%, RTA = 31.16 ms [10:55:23] PROBLEM - Host db1012 is DOWN: PING CRITICAL - Packet loss = 100% [10:56:26] RECOVERY - Host db1012 is UP: PING OK - Packet loss = 0%, RTA = 31.20 ms [10:57:20] PROBLEM - Host db1003 is DOWN: PING CRITICAL - Packet loss = 100% [10:58:23] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:59:26] RECOVERY - Host db1003 is UP: PING OK - Packet loss = 0%, RTA = 31.03 ms [11:11:15] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:13:33] New patchset: Tim Starling; "Delete more junk files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14452 [11:17:29] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14452 [11:21:00] RECOVERY - Host mw1069 is UP: PING OK - Packet loss = 0%, RTA = 30.98 
ms [11:26:42] RECOVERY - Host mw1076 is UP: PING OK - Packet loss = 0%, RTA = 30.91 ms [11:31:13] New patchset: Mark Bergsma; "Fix failure.check() invocation" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14457 [11:31:49] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14457 [11:32:24] RECOVERY - Host mw1071 is UP: PING OK - Packet loss = 0%, RTA = 30.94 ms [11:32:24] RECOVERY - Host mw1064 is UP: PING OK - Packet loss = 0%, RTA = 31.30 ms [11:32:33] RECOVERY - Host mw1082 is UP: PING WARNING - Packet loss = 80%, RTA = 584.91 ms [11:36:18] PROBLEM - SSH on mw1082 is CRITICAL: Connection refused [11:37:48] RECOVERY - SSH on mw1082 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [11:38:06] RECOVERY - Host mw1078 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [11:38:06] PROBLEM - Host mw1071 is DOWN: PING CRITICAL - Packet loss = 100% [11:38:42] RECOVERY - Host mw1071 is UP: PING OK - Packet loss = 0%, RTA = 30.90 ms [12:07:57] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:08:06] If anyone's around labs currently has rather high load with what looks like a chunk of io wait, probably not much that can be done about it but just incase there is :) [12:08:12] * Damianz goes back to failing to login to bastion [12:09:24] New patchset: Mark Bergsma; "Fix calcStatus broken by a previous commit" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14459 [12:09:40] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:10:18] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14459 [12:25:49] New review: Hashar; "Thanks for cleaning all of the old files! I have been wondering myself if they were actually of any ..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14452 [12:36:04] PROBLEM - Puppet freshness on cp1017 is CRITICAL: Puppet has not run in the last 10 hours [12:36:04] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours [12:45:42] New patchset: Reedy; "pngcrush everything" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13285 [12:50:10] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [12:56:47] New patchset: Mark Bergsma; "Merge branch 'master' into monitors/dns" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14463 [12:56:48] New patchset: Mark Bergsma; "Bug fixes" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14464 [12:56:48] New patchset: Mark Bergsma; "Fix successful result report" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14465 [12:56:49] New patchset: Mark Bergsma; "Improve DNS query error handling" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14466 [12:56:50] New patchset: Mark Bergsma; "Report up on NXDOMAIN" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14467 [12:56:50] New patchset: Mark Bergsma; "Fix DNS query error handling bugs" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14468 [12:56:51] New patchset: Mark Bergsma; "Rename DNS monitor to DNSQuery" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14469 [12:56:52] New patchset: Mark Bergsma; "Improve DNS query error messages" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14470 [12:56:53] New patchset: Mark Bergsma; "Allow configuration of down status on NXDOMAIN responses" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14471 [12:56:53] New patchset: Mark Bergsma; "Add DNSQuery monitor example" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14472 [12:56:56] New patchset: Mark Bergsma; "Configuration variables are enforced to be lower case" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14473 [12:56:56] New patchset: Mark Bergsma; "Shorten NXDOMAIN error message" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14474 [12:56:57] New patchset: Mark Bergsma; "Fix string expansion" [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14475 [12:56:59] New patchset: Mark Bergsma; "Merge branch 'monitors/dns'" [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14476 [12:57:34] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14385 [12:57:56] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14386 [12:58:13] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14463 [12:58:50] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14464 [12:59:17] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14465 [12:59:49] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14466 [13:00:24] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14467 [13:00:49] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - 
https://gerrit.wikimedia.org/r/14468 [13:01:23] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14469 [13:01:51] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14470 [13:02:18] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14471 [13:02:42] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14472 [13:03:06] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14473 [13:03:28] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14474 [13:04:06] Change merged: Mark Bergsma; [operations/debs/pybal] (monitors/dns) - https://gerrit.wikimedia.org/r/14475 [13:04:25] Change merged: Mark Bergsma; [operations/debs/pybal] (master) - https://gerrit.wikimedia.org/r/14476 [13:15:54] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:22:04] !log Inserted new pybal 1.04 package in the precise-wikimedia APT repository, and upgraded all precise LVS servers [13:22:14] Logged the message, Master [13:25:39] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [13:28:16] New patchset: Mark Bergsma; "Fix lvs manifest on hosts that don't have IPv6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14481 [13:28:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14481 [13:28:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14481 [13:30:36] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 31.16 ms [13:31:03] RECOVERY - Puppet freshness on cp1017 is OK: puppet ran at Fri Jul 6 13:30:54 UTC 2012 [13:31:12] RECOVERY - Puppet freshness on search18 is OK: puppet ran at Fri Jul 6 13:30:59 UTC 2012 [13:31:48] RECOVERY - Frontend Squid HTTP on cp1017 is OK: HTTP OK HTTP/1.0 200 OK - 27535 bytes in 0.189 seconds [13:32:09] New patchset: Mark Bergsma; "Some hosts don't have eth0" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14482 [13:32:43] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14482 [13:32:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14482 [13:33:27] RECOVERY - Puppet freshness on search25 is OK: puppet ran at Fri Jul 6 13:33:11 UTC 2012 [13:34:21] RECOVERY - Puppet freshness on sq70 is OK: puppet ran at Fri Jul 6 13:34:10 UTC 2012 [13:34:57] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Fri Jul 6 13:34:29 UTC 2012 [13:36:00] RECOVERY - Puppet freshness on search30 is OK: puppet ran at Fri Jul 6 13:35:54 UTC 2012 [13:36:27] RECOVERY - Puppet freshness on search24 is OK: puppet ran at Fri Jul 6 13:36:01 UTC 2012 [13:36:27] RECOVERY - Puppet freshness on search23 is OK: puppet ran at Fri Jul 6 13:36:02 UTC 2012 [13:37:03] RECOVERY - Puppet freshness on palladium is OK: puppet ran at Fri Jul 6 13:36:46 UTC 2012 [13:39:00] RECOVERY - Puppet freshness on sq68 is OK: puppet ran at Fri Jul 6 13:38:52 UTC 2012 [13:40:03] RECOVERY - Puppet freshness on cp3002 is OK: puppet ran at Fri Jul 6 13:39:34 UTC 2012 [13:40:21] RECOVERY - Puppet freshness on search28 is OK: puppet ran at Fri Jul 6 13:40:04 UTC 2012 [13:41:33] RECOVERY - Puppet freshness on arsenic is OK: puppet ran at Fri Jul 6 13:41:17 UTC 2012 [13:41:33] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Fri Jul 6 13:41:22 UTC 2012 [13:42:27] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:46:57] RECOVERY - Puppet freshness on search36 is OK: puppet ran at Fri Jul 6 13:46:53 UTC 2012 [13:47:33] RECOVERY - Puppet freshness on strontium is OK: puppet ran at Fri Jul 6 13:47:07 UTC 2012 [13:49:03] RECOVERY - Puppet freshness on sq67 is OK: puppet ran at Fri Jul 6 13:48:36 UTC 2012 [13:49:21] RECOVERY - Puppet freshness on search34 is OK: puppet ran at Fri Jul 6 13:49:07 UTC 2012 [13:49:30] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [13:50:33] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Fri Jul 6 13:50:08 UTC 2012 [13:50:42] RECOVERY - Puppet freshness on search16 is OK: puppet ran at Fri Jul 6 13:50:27 UTC 2012 [13:50:51] RECOVERY - Puppet freshness on search33 is OK: puppet ran at Fri Jul 6 13:50:35 UTC 2012 [13:51:00] RECOVERY - Puppet freshness on search29 is OK: puppet ran at Fri Jul 6 13:50:45 UTC 2012 [13:51:00] RECOVERY - Puppet freshness on search19 is OK: puppet ran at Fri Jul 6 13:50:49 UTC 2012 [13:51:21] New patchset: Mark Bergsma; "Add DNSQuery monitor to dns_rec LVS service" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14483 [13:51:55] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14483 [13:52:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14483 [13:52:30] RECOVERY - Puppet freshness on search15 is OK: puppet ran at Fri Jul 6 13:52:19 UTC 2012 [13:53:33] RECOVERY - Puppet freshness on search27 is OK: puppet ran at Fri Jul 6 13:53:18 UTC 2012 [13:53:51] RECOVERY - Puppet freshness on search31 is OK: puppet ran at Fri Jul 6 13:53:41 UTC 2012 [13:54:27] RECOVERY - Puppet freshness on search17 is OK: puppet ran at Fri Jul 6 13:54:11 UTC 2012 [13:55:57] RECOVERY - Puppet freshness on search22 is OK: puppet ran at Fri Jul 6 13:55:41 UTC 2012 [13:55:57] RECOVERY - Puppet freshness on search26 is OK: puppet ran at Fri Jul 6 13:55:55 UTC 2012 [13:56:10] yeah more puppet fixes :-) [13:57:00] RECOVERY - Puppet freshness on sq69 is OK: puppet ran at Fri Jul 6 13:56:52 UTC 2012 [13:58:03] RECOVERY - Puppet freshness on search21 is OK: puppet ran at Fri Jul 6 13:57:54 UTC 2012 [14:00:09] RECOVERY - Puppet freshness on search14 is OK: puppet ran at Fri Jul 6 13:59:57 UTC 2012 [14:04:39] RECOVERY - Backend Squid HTTP on cp1017 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.190 seconds [14:10:57] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [14:20:58] labs again unusable [14:25:58] Has been most the day :( [14:28:45] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 214 seconds [14:38:55] !log authdns-update to correct typo in mgmt dns entry [14:39:03] Logged the message, Master [14:39:22] hrmm, someone updated morebots and yanked my name out of its confirmation.... [14:40:36] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 seconds [14:44:15] New patchset: RobH; "added vanadium to dhcp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14485 [14:44:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14485 [14:45:25] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14485 [14:49:13] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14277 [14:50:26] New review: Reedy; "I think leaving it as was, and prefixing a + might work" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14277 [14:50:48] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14448 [15:03:15] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [15:05:39] hey ^demon, what do you get as output when you run the gerrit ls-projects command through ssh? [15:05:48] PROBLEM - udp2log log age for oxygen on oxygen is CRITICAL: CRITICAL: log files /a/squid/zero-orange-kenya.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [15:06:06] <^demon> drdee_: One second, lemme pastebin. [15:06:10] my suspicion is that gerrit has 'a one off' error when returning the projects list [15:06:49] I sent ya two, but indeed, one goes where the memcached are [15:07:03] you should work with mark or Leslie to get it setup [15:07:19] <^demon> drdee_: http://p.defau.lt/?g_L6c0NT7ENRJOm53FMXVg [15:07:20] but you can do the usual, rack it, power and serial wire, mgmt wire [15:07:28] what rack is it? 
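For the ls-projects question above, the command is normally run over Gerrit's SSH interface; a minimal sketch, assuming the standard Gerrit SSH port and a placeholder username:

    # list every project visible to the account, then count them to compare with the web UI
    ssh -p 29418 USERNAME@gerrit.wikimedia.org gerrit ls-projects
    ssh -p 29418 USERNAME@gerrit.wikimedia.org gerrit ls-projects | wc -l

If the count really comes out one short of what the UI shows, that would support the suspected off-by-one in the returned list.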
[15:08:09] ack, none of the servers are in it yet in racktables =P [15:08:10] orgchart is also missing for you [15:08:34] cmjohnson1: So yea, normally we rack the switches in rack top, so just slap that in u44/45 [15:08:37] ^demon pretty sure it's a gerrit bug :) [15:09:02] <^demon> Weird.... [15:09:05] <^demon> Let's file it. [15:09:10] or fix it? ;) [15:09:22] where in the source code is the ls-projects defined [15:09:29] we can have a look at that first [15:09:34] * RobH kicks vanadium [15:09:47] work damn you, i dont wanna have to drive 30 miles because today you stopped rebooting when yesterday you were fine [15:10:14] sigh....im gonna have to drive in and kick this thing. [15:10:15] <^demon> drdee_: Lemme check. [15:10:32] cmjohnson1: the other one is a spare [15:10:42] we [15:10:46] we'll need it if the active one fails [15:11:26] ok, heading into eqiad to poke at the analytics machine that is failing to work for me ;_; [15:12:14] RobH, is that stat1001? [15:13:00] RobH: Leslie opened a ticket about mgmt serial console redirect on neon (RT-3235), appears to work for me after a racreset so i closed it already, but just in case it is temporary or comes back after a while...hrmm [15:13:09] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [15:14:42] <^demon> drdee_: Should be mostly in gerrit-server/src/main/java/com/google/gerrit/server/project/ListProjects.java I think. It's injected into ListProjectsCommand in gerrit-sshd. [15:15:10] thx, i'll have a look [15:16:39] <^demon> If you end up seeing an easy fix, I've got push access upstream :) [15:17:18] drdee_: nope, analytics click tracking? [15:17:28] vanadium, its serial redirection is not working now for some reason [15:17:33] stat1001 i thought we fixed. [15:17:44] i thought you needed to call dell [15:17:53] it wouldn't powercycle [15:17:55] AFAIK [15:18:14] hrmm, it may be one of the ones i called dell on, my bad. [15:18:21] i have a notepad page of dell cases right now [15:18:30] ^demon: look at lines 178-181 [15:18:36] ok, headed in, back online shortly from eqiad [15:18:52] could that be the cause why the orgchart repo is not showing? [15:19:10] <^demon> I don't know why it wouldn't be visible [15:19:19] me neither :) [15:19:33] or maybe it's not visible to you and me [15:20:17] <^demon> But it is via the UI :\ [15:20:33] <^demon> https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/orgchart,access [15:21:10] robh: I like how you "liked" what I did on my 4th -- bugzilla4 packaging! ;) [15:21:34] Logged the message, Master [15:22:29] oh look, I'm talking to a person who left :{P [15:22:45] <^demon> drdee_: I wonder if line 163 is to blame. [15:22:45] hexmode: i like to hear that too. yay for the package [15:22:52] <^demon> drdee_: Like it's missing from the cache or something. [15:23:13] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [15:23:32] mmmmmm, possible although i have no clue what / how gerrit caches stuf [15:24:16] <^demon> Just tried flushing the project_list and projects caches, no change. [15:28:21] <^demon> drdee_: The comment there is kinda funny, "If we can't get it from the cache, pretend its not present." [15:29:12] yeah, so maybe not a bug but a feature :D [15:29:14] // lets home we have some background worker warming that caches [15:29:19] hope* [15:29:45] gerrit will be sloows [15:34:06] New patchset: Alex Monk; "(bug 38216) Let frwiki autoconfirmed users see private AbuseFilter log entries." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14491 [15:36:15] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours [15:56:02] New patchset: Ottomata; "filters.oxygen.erb - re-enabling Bangladesh filter with new range" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14495 [15:56:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14495 [15:57:38] New patchset: Ottomata; "check_udp2log_log_age - fixing name of orange kenya log in list of slow logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14497 [15:58:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14497 [16:15:06] New patchset: RobH; "added precise to vanadium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14498 [16:15:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14498 [16:16:43] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14498 [16:17:12] RECOVERY - udp2log log age for oxygen on oxygen is OK: OK: all log files active [16:22:45] wtf vanadium. [16:22:55] all your serial stuff is right, why arent you redirecting. [16:27:09] New patchset: Bhartshorne; "moving swift ring files to volatile storage to get them out of git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14501 [16:27:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14501 [16:28:40] how annoying, it needed a full power removal to fix the bad state. [16:29:03] mark: this kind of stuff makes me think we should start using all switched cdus. [16:29:15] no remote hands needed for odd power related issues like this. [16:33:44] New patchset: Bhartshorne; "moving swift ring files to volatile storage to get them out of git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14501 [16:34:43] New patchset: RobH; "removed vanadium from autopart install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14502 [16:35:29] come on lint check. [16:35:32] so slow. [16:35:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14501 [16:35:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14502 [16:35:49] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14502 [16:36:09] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:39:51] New patchset: RobH; "setting up vanadium like oxygen" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14503 [16:40:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14503 [16:43:38] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14503 [16:46:21] RECOVERY - SSH on neon is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:46:36] RobH: about vanadium, I don't think it needs the lucene upd2log instance, right ottomata? 
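The cache flushing mentioned above ([15:24:16]) goes through the same SSH interface; a minimal sketch, assuming admin rights and the standard Gerrit SSH port (ADMIN is a placeholder):

    # see which caches the server knows about
    ssh -p 29418 ADMIN@gerrit.wikimedia.org gerrit flush-caches --list
    # flush the two project-related caches named in the discussion
    ssh -p 29418 ADMIN@gerrit.wikimedia.org gerrit flush-caches --cache project_list
    ssh -p 29418 ADMIN@gerrit.wikimedia.org gerrit flush-caches --cache projects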
[16:47:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [16:50:59] drdee_: take a look at my commit, i can pull it out and change it if needed [16:51:11] but lemme know what in the site.pp manifest is wrong, i have not synced it with puppet yet [16:52:08] otherwise im setting it like oxygen, small 120gb boot raid, rest in lvm for data, and this has room for two additional disks to be added, raided and lvm'd for data if needed [16:53:37] RobH: ignore my comment, CT looped me in [16:53:45] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14501 [16:53:48] ok, cool, then its being deployed now [16:53:58] the OS install will finish in a bit, then I will do the initial puppet runs [16:54:05] then if all that works it will be time for lunch [16:57:38] New patchset: Jeremyb; "bug 38111 - rm space from mlwikiquote sitename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14506 [17:03:19] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14491 [17:08:25] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13428 [17:09:40] Reedy: want another shell req while you're at it? [17:09:58] New review: Jeremyb; "has been sanity checked and consensus link reviewed by native speaker and steward Jyothis. (I don't ..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/14506 [17:24:54] RECOVERY - swift-object-auditor on ms-be7 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [17:28:49] !log adjusted swift ring files to move container listings off spinning disks (ms-be1-4) [17:28:58] Logged the message, Master [17:30:45] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [17:34:00] New patchset: Jeremyb; "bug 36884 - redirect labs.wm.o -> labsconsole" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/14509 [17:39:26] cmjohnson1: are you in the dc ? [17:45:58] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [17:47:30] uhoh, the eqiad dns recursor is borked [17:48:56] is it? [17:48:56] * mark looks [17:51:23] how do i ssh into a machine on eqiad w/o a public ip? [17:51:46] ori-l: do you have an account on said machine ? [17:52:06] LeslieCarr: yeah [17:52:07] RECOVERY - swift-object-auditor on ms-be6 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [17:52:25] LeslieCarr: though i'd be hard-pressed if you asked me to prove it now :) [17:52:28] ssh in via bast1001 or fenari [17:53:15] New patchset: Ottomata; "statistics.pp - installing python-mysqldb on stat1 for gerrit-stats" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14510 [17:53:18] cmjohnson1: ping [17:53:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14510 [17:54:05] * ori-l tries [17:55:06] New patchset: Mark Bergsma; "Add dns_rec LVS service IP to lvs1002" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14511 [17:55:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14511 [18:01:35] cmjohnson1: i figured out my own question, but thanks :) [18:05:53] cmjohnson1: hi! can you swap some drives around in pc1.pmtpa today? 
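For the question above about reaching an eqiad host with no public IP ([17:51:23]), the usual route is a hop through a bastion; a minimal sketch using the hosts named in the discussion:

    # log in via the bastion, then on to the internal host
    ssh -t bast1001.wikimedia.org ssh vanadium.eqiad.wmnet
    # fenari works the same way
    ssh -t fenari.wikimedia.org ssh vanadium.eqiad.wmnet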
[18:06:05] RECOVERY - Puppet freshness on neon is OK: puppet ran at Fri Jul 6 18:05:48 UTC 2012 [18:07:20] LeslieCarr: ok, i guess i don't have access [18:08:07] mark: yt? any chance you can push my key to vanadium? [18:10:12] cmjohnson1: thanks! also wondering about the status of a couple of SSD orders, and what's arrived so far. there were a couple of separate orders for intel 710 ssds and then one for 42 x 480GB intel 520 ssd's that are intended for tool labs… do you how many of the 480GB 520's (if any) have arrived? [18:11:08] ori-l: what's your account name ? [18:11:10] RECOVERY - NTP on neon is OK: NTP OK: Offset -0.05941545963 secs [18:12:41] LeslieCarr: 'olivneh'. my key is linked to from this ticket: https://rt.wikimedia.org/Ticket/Display.html?id=3116 [18:12:41] thanks, checking it out [18:12:41] LeslieCarr: erm, https://gerrit.wikimedia.org/r/#/c/13223/1/manifests/admins.pp [18:12:48] thanks [18:16:00] "We want to get this out of Labs eventually, so [18:16:00] we'll likely have to puppetize the current Etherpad Lite instance and [18:16:03] deploy it to a dedicated machine." [18:16:17] Hi, ops! I'm here to get some help with that :) [18:17:49] No rush, but whenever someone gets some time, I'd love the help [18:18:17] ori-l: hrm, trying to figure out why your account wasn't created on vanadium… it definitely should be there [18:20:09] marktraceur - sounds good! let us know once u have that puppetized [18:20:09] LeslieCarr: thanks! [18:20:59] woosters: How do I do that, I guess, is the question [18:29:22] PROBLEM - Host mw1100 is DOWN: PING CRITICAL - Packet loss = 100% [18:29:29] New patchset: RobH; "added Ori for access to vanadium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14513 [18:30:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14513 [18:30:34] RobH: thanks :) [18:30:59] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14513 [18:31:04] RobH: he was already in via admins::restricted [18:31:23] RobH: no accounts are showing up at all on vanadium [18:32:03] ori-l: quite welcome, its merged and I am forcing a puppet run on vanadium now [18:32:13] oh? [18:32:25] LeslieCarr: you workign on this already? [18:32:39] yeah [18:32:56] sorry then, lemme revert my change [18:32:59] well, trying.... [18:33:06] New patchset: RobH; "Revert "added Ori for access to vanadium"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14514 [18:33:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14514 [18:33:56] Change abandoned: RobH; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14514 [18:37:52] question for the opsen: should i able to connect to db10 from stat1 using mysql? [18:42:23] drdee: there is no db10, so no. [18:42:56] i am trying to access the gerrit db, on which host is that running? [18:43:11] db9? [18:43:20] oh, that! yeah, db9 [18:43:36] and you probably won't have an account there ;) [18:43:39] i doubt any db contains grants that would allow stat1 [18:45:05] cmjohnson1: woo. Ryan_Lane: those can go into ciscos as we discussed for tool labs dbs. 
let me know when you and faidon decide on os / container build details and i'll help [18:45:20] i am using the gerrit readonly account [18:45:31] so i do have access to the gerrit db [18:45:59] and through fenari i am accessing db10 to query gerrit db, so i am pretty sure it is db10 [18:46:00] it's db9/db10 [18:46:07] db10 is the replica [18:46:13] so, it's likely best to use that [18:46:22] it's db10.pmtpa.wmnet [18:46:30] you may need to use the fqdn [18:46:47] drdee: nevermind what i said before, the gerritro grant is for @% [18:46:56] Thanks Ryan_Lane!!!! [18:47:03] fqdn is the solution [18:47:04] yw [18:47:52] New patchset: Lcarr; "trying commenting out then uncommenting accounts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14515 [18:48:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14515 [18:48:29] binasher: I'd say let's go with precise [18:48:39] however the dbs are normally created [18:49:13] otherwise it'll just have nova-compute added, which will handle the containers [18:49:15] $gerrit_db_host = "db9.pmtpa.wmnet" [18:49:26] Reedy: ? [18:49:30] that's for writing [18:49:31] if you want the same mysql build as prod, we'll have to stick with lucid for now, but we don't have to do that necessarily [18:49:38] ah [18:49:41] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14515 [18:49:55] we'll be upgrading nova soon [18:49:58] which will require precise [18:50:29] Change restored: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14514 [18:50:38] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14514 [18:50:52] Ryan_Lane: go ahead and build them as you want them [18:50:58] ok [18:51:19] i'll either build new mysql packages for both distros (on my general to-do list) or figure out something else [18:51:27] New patchset: Lcarr; "trying commenting out groups on vanadium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14516 [18:52:01] New patchset: Hashar; "(bug 38111) rm space from mlwikiquote sitename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14506 [18:52:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14516 [18:52:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14516 [18:52:08] New review: Hashar; "Good for me. I am not deploying on a friday night though." [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/14506 [18:52:11] sounds good [18:53:54] New patchset: Lcarr; "Revert "trying commenting out groups on vanadium"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14517 [18:54:07] hashar: i wonder why the rebase? [18:54:26] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14517 [18:55:01] jeremyb: I usually rebase to avoid a merge commit by gerrit [18:55:12] jeremyb: sometime I also tweak the commit message for minor typos [18:58:43] hashar: how odd [18:59:58] hashar: if we don't want merge commits can't we just tell gerrit not to do them? I think there was even a list thread on that very topic? 
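For the db10 access worked out above, the fix was simply to use the fully-qualified hostname; a minimal sketch with the read-only account mentioned in the discussion (the database name reviewdb is an assumption):

    # query the Gerrit replica from stat1 using the FQDN
    mysql -h db10.pmtpa.wmnet -u gerritro -p reviewdb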
[19:00:17] that can be done [19:00:24] that is how I have setup the integration/jenkins repo [19:00:31] "fast forward only" or something like that [19:00:47] now that we have a [rebase] button, we can probably use the same on other repositories [19:01:05] i haven't seen this button! [19:03:14] * ori-l brbs [19:03:20] maybe it is restricted to some people ? :/ [19:04:30] jeremyb: the button just happened last night with the upgrade [19:04:46] LeslieCarr: i figured. but i still haven't seen it! [19:04:50] weird [19:04:55] I don't see it either [19:04:58] hard reload ? [19:05:09] idk where to look for it [19:06:32] maybe on one of your changes [19:06:34] that is pending [19:06:39] along with the "review" button [19:06:46] Oh wait [19:06:47] I see it [19:06:54] 'Rebase Change' [19:07:02] It's available to change owner, change submitter, and users with the 'Rebase' permission [19:13:13] New patchset: Hashar; "pngcrush everything" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13285 [19:14:00] sigh, i can't figure out why accounts aren't being created .... [19:15:45] * jeremyb spies a rebase button [19:17:02] New patchset: Hashar; "pngcrush everything" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13285 [19:18:20] New review: Hashar; "Patchset 3 : removes tEXt fragments in gnu-fdl.png files" [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/13285 [19:19:46] anyone have any ideas as to why accounts aren't being created on precise ? [19:20:35] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [19:21:10] are they being made in search? [19:21:16] cuz that precise isnt it? [19:21:52] LeslieCarr: yea, search makes them [19:22:03] i wonder if its something else in the puppet run thats interrupting it. [19:22:21] weird [19:22:29] search includes the roots and mortals groups for mediawiki updates [19:22:45] though its a group inclusion not individual. [19:22:53] interesting ... [19:23:01] this isn't making group or individual [19:23:08] maybe if i only comment out the individual ... [19:24:05] running puppet with debug didn't help ….. stupid debug [19:24:29] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [19:28:23] New patchset: Lcarr; "testing commenting out more accounts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14519 [19:28:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14519 [19:34:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14519 [19:38:14] LeslieCarr: yikes, sorry -- didn't realize this would be so complicated [19:38:28] ori-l: coming to wikimania? [19:38:42] jeremyb: yeah! [19:39:07] leslie can get revenge then ;) [19:39:21] cool [19:39:22] hehehe [19:39:26] :) [19:39:31] sigh... [19:40:44] LeslieCarr: why are all of the invidual accounts includes there? not just groups? [19:40:57] individual* [19:41:06] i guess maybe that's the same thing rob asked [19:41:13] i have no idea … [19:41:25] i'm just trying to make the accounts even happen ;) [19:41:42] and i have no clue... [19:51:48] notpeter: are you around ? [19:52:00] notpeter: did you have any troubles setting up user accounts on the search machines ? [19:52:41] LeslieCarr: so what's `getent passwd` and `getent group` look like? 
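For the question just above, getent is the quick way to see whether puppet actually created anything; a minimal sketch using the UID/GID values mentioned in the conversation:

    # look up the specific IDs asked about
    getent passwd 565
    getent group 500
    # or dump all accounts sorted by UID and eyeball the list
    getent passwd | sort -t: -k3 -n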
[19:53:14] they're not in the passwd list [19:53:55] New patchset: Lcarr; "reverting commenting of users" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14521 [19:54:29] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14521 [19:55:19] LeslieCarr: but is there e.g. a 565 in the list? [19:55:30] LeslieCarr: or group 500 even? [19:55:41] nope [19:55:50] nothing other than the default users :( [19:57:58] Is there documentation for puppetizing a service? [19:59:32] marktraceur: Like http://docs.puppetlabs.com/references/stable/type.html#service? Or do you mean for a 'service' rather than a service. [20:01:15] Damianz: he means etherpad-lite. it's already been packaged but idk if anyone reviewed the packaging [20:01:17] Damianz: I don't know, I haven't worked with puppet before. I basically have a bunch of code that I want to run on a dedicated server, and apparently need to "puppetize" in order to do that [20:01:47] marktraceur: you've seen the package, right? [20:01:59] Ah, probably not the best person to point you in that direction... still getting hang of puppet myself. [20:02:04] jeremyb: If someone already packaged, it's unlikely that they had the right version; I needed to get the develop branch of the upstream and add several plugins [20:02:31] marktraceur: still you can use their packaging [20:02:44] marktraceur: prerequisite for puppetizing is first packaging [20:03:07] Mmkay [20:03:35] Well, I haven't seen the package, and I don't know any of the process. Just resources for getting started would be helpful. [20:04:29] marktraceur: http://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/etherpad-lite/ [20:04:52] marktraceur: make that build with whatever extra stuff you need to add [20:05:36] marktraceur: i'll not be around much until monday but feel free to poke me with further questions here or in person at the hackathon [20:07:10] jeremyb: /me will not be at the hackathon [20:07:50] Poking someone in person sounds less fun [20:07:53] marktraceur: is there some reason you're trying to do this right now? may be best to either 1) try to participate in hackathon remotely) or 2) wait 10 days and try again [20:08:13] (unless there's an immediate need for it) [20:08:15] Ha, it's just one of the few things on my TODO list [20:08:25] k [20:08:25] Other than "I need something to do", there's no huge need for it [20:08:31] hah [20:08:34] I can't get the stupid nagios host back up [20:08:43] Ryan_Lane: labs or prod? [20:08:46] labs [20:08:59] marktraceur: well i'll send you a mail about packaging but idk if it will help [20:21:05] marktraceur: check your mail [20:21:55] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours [20:22:08] jeremyb: I saw, thanks [20:34:00] mark: are you going ot be at the hackathon ? [20:34:21] marktraceur: i meant for above --- are you going to be at the hackathon ? [20:35:39] LeslieCarr: he won't [20:36:26] LeslieCarr: Not that I know of, and I suspect that if I don't know yet I'm not going :P [20:36:40] marktraceur: you should come! [20:36:45] marktraceur: where art thou? [20:36:53] jeremyb: San Francisco [20:36:59] too far away [20:37:01] *nod* [20:37:18] I wish I could come, but no luck [20:37:22] LeslieCarr: hence i said he should either try to participate remotely or he should wait ~10 days. (given it's not urgent). orrrrrrr he can start on his own and ask questions as he goes. but that's essentially just remote participation in the hackathon! 
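On the "packaging first, then puppetizing" point above: once a .deb exists, the puppet side of a simple service is usually just a package/config/service triple. A rough sketch follows; the class name, package name, config path, and source file are all placeholders rather than the real etherpad-lite manifest.

    # hypothetical sketch only -- not the actual etherpad-lite puppetization
    class etherpad_lite {
        package { 'etherpad-lite':
            ensure => present,
        }

        file { '/etc/etherpad-lite/settings.json':
            ensure  => present,
            source  => 'puppet:///files/etherpad/settings.json',
            require => Package['etherpad-lite'],
        }

        service { 'etherpad-lite':
            ensure    => running,
            enable    => true,
            require   => Package['etherpad-lite'],
            subscribe => File['/etc/etherpad-lite/settings.json'],
        }
    }

The require/subscribe ordering is the usual pattern: install the package, lay down the config, and restart the service whenever the config changes.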
[20:37:25] eh, think you could sneak into one of my suitcases ? [20:37:30] ;) [20:37:31] yeah [20:38:33] LeslieCarr: I fear the additional baggage charges would almost equal the cost of a plane ticket! [20:39:21] hehe that's true [20:40:09] <^demon> Driving 2h -> one of the best travel plans ever. [20:40:18] <^demon> Although RobH has got me beat--one metro stop. [20:42:06] ^demon: tiffany? [20:42:19] <^demon> Yeah, couple of people have me beat. [20:42:32] <^demon> But I'm looking forward to just...not flying [20:43:01] could you even fly it if you wanted? [20:44:17] <^demon> Not usually. I looked it up before and I had to go through Philly. [20:44:20] <^demon> Which is just stupid. [21:07:33] !log removed a bit more load from the swift spinning media container servers by adjusting the ring weights [21:07:42] Logged the message, Master [21:48:28] !log truncating pagetriage tables on enwiki (per bsitu) [21:48:36] Logged the message, Master [21:51:39] New patchset: Ryan Lane; "Change nrpe address for labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14595 [21:52:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14595 [21:52:54] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14595 [22:18:35] LeslieCarr: how's it going? (this isn't urgent, btw, so if you moved on to something else that's ok) [22:18:46] ori-l: i'm totally stuck [22:19:23] what's wrong? or is it going to be hopelessly over my head? [22:19:34] puppet's just not making the user accounts [22:19:41] i'm not sure if it's because this is a precise box or what [22:19:53] notpeter set up the search boxes with precise and they have user accounts [22:19:56] i'm hoping he has some fix [22:20:19] hm. user accounts should work on precise [22:20:21] what node is this? [22:21:36] LeslieCarr: ? [22:21:49] Ryan_Lane: vanadium [22:21:56] on eqiad [22:23:51] LeslieCarr: fqdn s vanadium.eqiad.wmnet [22:23:53] *is [22:24:03] node in puppet is vanadium.wikimedia.orf [22:24:04] yep [22:24:05] *org [22:24:06] ah [22:24:10] heh [22:24:13] hah, no wonder it had no effect [22:24:18] yep :) [22:25:39] New patchset: Lcarr; "switching fqdn to internal for vanadium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14598 [22:26:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14598 [22:26:21] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14598 [22:28:00] hehe, amazing that everything is working properly now ;) [22:29:50] \o/ [22:30:17] sweet! thanks LeslieCarr / Ryan_Lane [22:30:19] ori-l: how's it look now ? [22:30:42] like this: [22:30:42] olivneh@vanadium:~$ [22:30:44] :) [22:36:18] yay [22:36:45] LeslieCarr: not sure how to sudo, tho [22:37:08] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours [22:38:03] oh, the request was for restricted shell access [22:38:04] not sudo [22:38:06] https://rt.wikimedia.org/Ticket/Display.html?id=3116 [22:38:27] LeslieCarr: that's for emery [22:38:39] LeslieCarr: sorry, that's my bad -- i just quoted it because it linked to my ssh public key [22:39:01] LeslieCarr: the ticket for vanadium is this (rather epic) one: https://rt.wikimedia.org/Ticket/Display.html?id=3152 [22:39:37] i don't see anything about sudo access in here though ? 
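The vanadium fix above reduces to how puppet matches nodes: the agent presents its certname (normally the machine's fqdn, here vanadium.eqiad.wmnet), and a site.pp entry keyed on vanadium.wikimedia.org simply never matches it, so the account classes were never applied at all. A sketch of the before and after, with the included class name being a placeholder:

    # before: never matches -- the agent's certname is the internal fqdn
    node 'vanadium.wikimedia.org' {
        include accounts::common    # placeholder for the real account classes
    }

    # after: matches vanadium's certname, so the users and groups get managed
    node 'vanadium.eqiad.wmnet' {
        include accounts::common
    }

This also explains why commenting accounts in and out had no visible effect: the host never picked up that node block in the first place.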
[22:51:14] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [22:54:44] ori-l: please don't tell me after all that, sudo is required ? [22:56:28] lol [22:58:53] PROBLEM - SSH on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:59:38] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.26:11000 (Connection timed out) [23:01:35] PROBLEM - Memcached on srv276 is CRITICAL: Connection refused [23:01:44] RECOVERY - SSH on srv276 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [23:13:35] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:14:56] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds [23:17:47] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [23:18:05] RECOVERY - Memcached on srv276 is OK: TCP OK - 0.004 second response time on port 11000 [23:29:06] LeslieCarr: sorry, i got pulled into a meeting. yes, i need to have sudo on vanadium -- this wouldn't make sense otherwise [23:31:24] … that's a different approval set, can you ask woosters for approval for that ? [23:31:35] sorry :( [23:32:57] i'll bbi10 [23:33:20] LeslieCarrafk: sure. [23:35:34] actually, RobH might be the person if he's around [23:47:49] back [23:50:12] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [23:54:21] LeslieCarr: do you need this instance anymore? https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000000b7 [23:54:45] I'm pretty sure it's corrupted due to broken kvm block migration [23:54:48] I was going to delete it [23:54:49] you can delete it [23:54:51] thanks [23:55:43] PROBLEM - udp2log log age for lucene on vanadium is CRITICAL: NRPE: Command check_udp2log_log_age-lucene not defined [23:56:01] PROBLEM - udp2log log age for oxygen on vanadium is CRITICAL: NRPE: Command check_udp2log_log_age-oxygen not defined
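On the two "Command ... not defined" criticals at the end: that message comes from the NRPE daemon on vanadium itself, and it means the Nagios server is asking for a check that the host's local nrpe configuration does not (yet) define, typically because the puppet class that ships the command definition has not been applied there or nrpe has not been reloaded since. What the missing definition would roughly look like is sketched below; the plugin path and arguments are guesses, not the real wikimedia check.

    # hypothetical snippet for an nrpe include file on vanadium;
    # the plugin path and its arguments are placeholders
    command[check_udp2log_log_age-lucene]=/usr/lib/nagios/plugins/check_udp2log_log_age lucene

Once puppet delivers that definition and nrpe is restarted, the alert should flip from the "not defined" CRITICAL to whatever the check itself reports.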