[00:01:54] mm oranges
[00:05:20] Sounds like you had fun, not.
[00:05:45] nope, but woosters was nice and found me a headset … so i looked like an old fashioned operator, but at least i wasn't craning my neck any more
[00:06:02] lol
[00:07:00] well done, sounds like it was quite a tricky issue
[00:07:42] indeed
[00:07:52] cookies will have to wait until September
[00:08:14] are you headed here in september paravoid ?
[00:08:18] or will i see you in amsterdam ?
[00:09:36] Could someone please merge https://gerrit.wikimedia.org/r/#/c/13170/ ? It fixes the cron entry for extension distributor, so we don't have to worry about moving configs (the invoker was fixed a little while back)
[00:11:04] LeslieCarr: I'll be there for the staff meeting
[00:11:07] (afaik)
[00:11:30] anyway
[00:11:33] going now
[00:11:36] bye-bye
[00:11:39] oh yeah :)
[00:11:40] bye
[00:12:01] Reedy: sure
[00:12:20] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13170
[00:16:23] Thanks :)
[00:33:44] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 8 seconds
[00:34:02] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 1 seconds
[01:10:56] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[01:14:50] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[01:40:40] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 214 seconds
[01:42:28] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 251 seconds
[01:49:31] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 674s
[01:51:10] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 3 seconds
[01:52:22] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 16s
[01:53:07] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds
[03:38:34] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active
[03:51:15] New patchset: Tim Starling; "Re-enable API action=purge for commonswiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14011
[03:51:56] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14011
[04:05:18] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:20:00] Ryan_Lane: hey, just wondering what's up with gerrits 8120/8344 (ircecho)
[04:20:19] * jeremyb heads back to scrollback
[04:22:42] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:25:00] paravoid: good flight? how was the sala de protocolo? ;)
[04:25:10] i wonder if i've ever had greek cheese...
[04:28:33] PROBLEM - Puppet freshness on cp1017 is CRITICAL: Puppet has not run in the last 10 hours
[04:28:33] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[04:29:36] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:31:06] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[04:36:55] jeremyb: I'll take a look at them soon
[04:37:07] jeremyb: well, I was sick for the flight, so it wasn't the best ever ;)
[04:41:36] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[04:49:55] Ryan_Lane: youch. and now? recovered?
[04:50:40] I'm feeling way better now
[04:50:45] I'm just a little jet lagged
[04:55:07] slightly lagged is appropriate ;)
[04:55:25] although... now you get to bounce back east again soon!
[04:56:03] whoops, almost typed the wrong thing in here. can't decide if it would be worse than a password
[04:56:06] ;P
[04:56:21] heh
[04:56:36] yeah. wikimedia is quickly approaching
[04:56:40] not looking forward to the heat
[04:56:50] wikimedia is all around you
[04:56:55] wikimania is approaching
[04:57:13] heat ain't so bad. at least not worse than NY i think
[04:58:47] NY summer heat is terrible ;)
[04:58:59] that part of the country doesn't seem to understand what air conditioning is
[04:59:10] http://forecast.weather.gov/zipcity.php?inputstring=10003
[04:59:13] heh. wikimania, right
[04:59:15] http://forecast.weather.gov/zipcity.php?inputstring=20052
[04:59:35] 82, at night
[04:59:47] Friday Sunny and hot, with a high near 98.
[04:59:51] 98!!
[05:00:12] that is *incredibly* hot
[05:00:38] * jeremyb declares it to be 37 degrees
[05:02:48] Ryan_Lane: it isn't even triple digits
[05:19:08] Ryan_Lane: quicker question i hope, if you're still there. can we do (or at least approve) the simple DNS change of RT 2919 (in a separate RT if needed) and deal with the bigger question of where services live later? this (immediate need) is not about a deployment
[05:19:18] (this is for wikidata)
[05:19:45] I'm not so sure it's as simple as a dns change
[05:20:22] but the uri scheme was already approved
[05:20:55] Ryan_Lane: what's not simple about it?
[05:21:19] we need 2 A records installed is all
[05:21:27] www.wikidata.org and wikidata.org
[05:21:41] ah, this is just for the placeholder for now?
[05:21:52] because the full change needs apache rules and such
[05:21:53] to point to a one page site which is essentially a soft redirect to meta
[05:21:57] yes
[05:22:00] no apache rules
[05:22:02] just DNS
[05:22:08] this is hosted at WMDE not WMF
[05:22:17] ...
[05:22:42] i've no idea why but i did test (once upon a time) with hosts file and it looks fine
[05:22:50] well, I'd imagine that needs to be discussed with legal, then
[05:22:56] ugh
[05:23:03] then i'll cc them?
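The change requested above is just two A records, sanity-checked beforehand with a hosts-file override. A minimal illustrative sketch in Python of that kind of override check (the hostnames are from the log, the IP is the EU address mentioned later in the discussion, and the helper itself is hypothetical — a real check would query DNS with dig/host):

```python
# Emulate an /etc/hosts-style override to preview where the two proposed
# A records (wikidata.org and www.wikidata.org) would send visitors.
# Illustrative only -- not how the actual zone change would be made.

HOSTS_OVERRIDE = {
    "wikidata.org": "188.93.10.125",      # EU IP mentioned later in the log
    "www.wikidata.org": "188.93.10.125",
}

def resolve(name, overrides):
    """Return the overridden A record for name, or None if not overridden."""
    return overrides.get(name.lower().rstrip("."))

for host in ("wikidata.org", "www.wikidata.org"):
    print(host, "->", resolve(host, HOSTS_OVERRIDE))
```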
[05:23:20] sure
[05:23:24] k, thanks
[05:23:37] people will be getting sent to a server outside of our control
[05:23:43] in europe
[05:23:44] right
[05:24:00] well it's not terribly relevant where that server is i think. but who knows
[05:24:00] have a giant feeling they're not going to like that
[05:24:11] it definitely matters where the server is
[05:24:12] no cookie sharing with the rest of the sites
[05:24:29] anyway, we'll see what they say
[05:24:31] it doesn't matter. data retention laws, and privacy laws are much different in the EU
[05:24:38] yeah
[05:25:05] the ticket doesn't mention anything about wmde hosting the servers, does it?
[05:25:38] it just says they need a placeholder
[05:25:55] certainly the gerrit does
[05:26:14] i thought the RT too. but of course i can't see the whole RT
[05:26:18] anyway, i'll mail
[05:26:47] nope. not in the rt at all
[05:27:31] k
[05:32:25] Ryan_Lane: can you also reply to the ticket explicitly approving this 1) placeholder and 2) WMDE hosting from your end? (pending legal)
[05:32:50] I'd actually prefer we host a placeholder on our cluster
[05:34:49] sure we can. but this is what they decided and set up. i personally am ok with it. i definitely think adding another roadblock is not good at this point. (maybe installing on cluster is smaller than legal approval? idk) we don't even seem to know exactly how to deploy redirects.conf changes from gerrit! ;-( ;-(
[05:34:55] they can push the placeholder into the docroot for the domain, and add an apache rule for it to display, if they'd like it to go faster
[05:35:09] right
[05:35:21] anyway, i'm simultaneously pinging lydia and also composing this mail
[05:35:24] maybe they shouldn't make decisions like that without discussing it with people?
[05:35:47] it only takes a couple minutes to ask if it's easy or hard
[05:35:58] hosting stuff in the eu is hard for us
[05:36:15] (even the toolserver is apparently troublesome for us)
[05:36:58] Ryan_Lane: i think they did discuss it (on gerrit even!). and not a single person mentioned the EU factor until you just did. it's been a whole month that this request has been going (for the simple not whole uri scheme version)
[05:37:15] idk where else they discussed it but i think gerrit's not the only place
[05:37:26] dude, I'm not seeing anything about this being hosted anywhere but on the cluster
[05:37:41] sure. i got it
[05:37:42] can you point me to the gerrit change that discusses this?
[05:37:49] cause it surely isn't in the RT
[05:37:58] !g 9874 | Ryan_Lane
[05:37:58] Ryan_Lane: https://gerrit.wikimedia.org/r/#q,9874,n,z
[05:38:00] and gerrit is definitely not the right place for that
[05:38:30] what's 3078 for that matter? is that the one submitted by me?
[05:38:36] (RT 3078)
[05:39:06] * Ryan_Lane sighs
[05:39:24] anyway, i'm still getting ahold of wikidata people. i think i don't really need anything from you right now
[05:39:54] 3078 does indeed mention an EU IP
[05:40:00] I'm wondering if no one noticed that
[05:41:40] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[05:41:54] jeremyb: either way, we'll get something in place before wikimania
[05:42:54] Ryan_Lane: right. i hope so. and kinda know so (partly because i will personally hunt down people and make it so)
[05:44:01] speaking of which i have to go ping freenode staff on the RT i didn't get a reply to ;)
[05:44:15] Ryan_Lane: so, is 3078 from me?
[05:44:54] yes
[05:45:02] k
[05:49:33] there's an error in WMDE's markup ;)
[06:08:31] ugh, and they used DOS line endings!
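The DOS-line-endings complaint above is mechanical to detect and fix. A small illustrative Python sketch (equivalent in spirit to what dos2unix does; the function names and sample markup are mine, not from the log):

```python
def has_dos_line_endings(data):
    """True if the byte string contains any CRLF (\\r\\n) line ending."""
    return b"\r\n" in data

def to_unix(data):
    """Normalize CRLF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")

# Hypothetical placeholder-page fragment, saved with DOS line endings:
sample = b"<html>\r\n<body>placeholder</body>\r\n</html>\r\n"
print(has_dos_line_endings(sample))  # the sample uses CRLF, so: True
print(to_unix(sample))
```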
[06:55:32] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[07:05:35] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[07:15:29] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:35] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[07:31:32] PROBLEM - Puppet freshness on tarin is CRITICAL: Puppet has not run in the last 10 hours
[07:59:39] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:08:21] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:31:45] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours
[08:31:45] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours
[08:33:51] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:46] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours
[08:35:48] PROBLEM - Puppet freshness on search24 is CRITICAL: Puppet has not run in the last 10 hours
[08:35:48] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours
[08:39:44] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:44] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:44] PROBLEM - Puppet freshness on search22 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:44] PROBLEM - Puppet freshness on search28 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:44] PROBLEM - Puppet freshness on search21 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:45] PROBLEM - Puppet freshness on search36 is CRITICAL: Puppet has not run in the last 10 hours
[08:40:47] PROBLEM - Puppet freshness on search33 is CRITICAL: Puppet has not run in the last 10 hours
[08:41:50] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours
[08:41:50] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[08:41:50] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[08:43:47] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours
[08:45:53] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours
[08:46:47] PROBLEM - Puppet freshness on sq67 is CRITICAL: Puppet has not run in the last 10 hours
[08:47:50] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours
[08:47:50] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours
[08:48:53] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours
[08:49:47] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[08:50:50] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[08:50:50] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours
[08:52:47] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[08:54:53] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours
[08:54:53] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours
[08:54:53] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[08:55:47] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours
[08:57:53] PROBLEM - Puppet freshness on arsenic is CRITICAL: Puppet has not run in the last 10 hours
[09:00:44] PROBLEM - Puppet freshness on palladium is CRITICAL: Puppet has not run in the last 10 hours
[09:21:53] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[09:32:23] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:41:12] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[11:11:53] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:56] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[12:50:09] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[12:51:48] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[12:57:23] New patchset: Matthias Mullie; "Bug 37998 - give "confirmed" users same AFT privileges as "autoconfirmed"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14036
[13:18:26] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[13:19:00] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484
[13:20:08] New review: Demon; "Biggest missing feature: need to move gerrit_config crap as parameters to role::gerrit so they can b..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/13484
[13:20:13] New patchset: Matthias Mullie; "lower AFTv4 odds to display AFTv5 at 2% (inverse odds)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14040
[13:20:57] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484
[13:21:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484
[13:22:22] New patchset: Matthias Mullie; "lower AFTv4 odds to display AFTv5 at 5% (inverse odds)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14041
[13:23:35] New patchset: Matthias Mullie; "lower AFTv4 odds to display AFTv5 at 10% (inverse odds)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14044
[13:23:38] Change abandoned: Demon; "Went ahead and squashed this into the omnibus I3d8a9fd2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/12876
[13:24:43] New review: Matthias Mullie; "Do not merge before July 5" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/14040
[13:25:06] New review: Matthias Mullie; "Do not merge before July 10" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/14041
[13:25:24] New review: Matthias Mullie; "Do not merge before July 17" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/14044
[14:24:48] Hey RobH, would you have some time to look at stat1001? (see https://rt.wikimedia.org/Ticket/Display.html?id=3121 and assuming that you are physically nearby that box)
[14:29:45] PROBLEM - Puppet freshness on cp1017 is CRITICAL: Puppet has not run in the last 10 hours
[14:29:45] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours
[14:42:58] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[15:41:37] PROBLEM - Host mw1160 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:58] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours
[16:14:00] PROBLEM - Host mw1069 is DOWN: PING CRITICAL - Packet loss = 100%
[16:14:59] grrrrrr stupid stupid juniper not getting back to me
[16:15:34] :)
[16:15:54] !log silicon gets dist-upgrade & reboot
[16:16:03] Logged the message, Master
[16:16:13] I'm still waiting to hear back from a case I opened with them like 4 months ago about a VRRP bug, come to not expect so much of hardware suppliers.
[16:16:20] sigh
[16:16:50] oh that reminds me
[16:16:58] i still have a foundry bfd ipv6 bug open
[16:17:08] i should poke them again
[16:37:24] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13121
[16:39:38] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13301
[16:41:58] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/13347
[16:43:43] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/12185
[16:44:51] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14036
[16:56:39] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours
[17:00:07] notpeter: https://gerrit.wikimedia.org/r/#/c/14070/
[17:00:58] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/14070/
[17:02:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14070
[17:02:59] New patchset: Jeremyb; "make a single page wikidata.org site (meta portal)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14071
[17:03:14] New patchset: Jeremyb; "make a single page wikidata.org site (meta portal)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/9874
[17:04:57] !log replacing bad disk in db1047
[17:05:10] Logged the message, RobH
[17:06:42] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours
[17:08:48] PROBLEM - Host mw1117 is DOWN: PING CRITICAL - Packet loss = 100%
[17:09:23] New patchset: Jeremyb; "make a single page wikidata.org site (meta portal)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/9874
[17:10:45] !log db1047 disk0 rebuild in progress
[17:10:53] Logged the message, RobH
[17:16:45] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[17:22:12] !log pulling helium offline for disk testing with fluorine disks
[17:22:21] Logged the message, RobH
[17:22:22] !log fluorine offlining to test disks
[17:22:24] mutante: ^
[17:22:31] Logged the message, RobH
[17:23:57] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[17:25:23] RobH: Who should I poke to get an update to our bugzilla configuration deployed? I've been waiting for a fix to the comment link regex for weeks now
[17:25:45] Its not deployable by most people. It requires root on a specific server
[17:25:52] (Or so Im told)
[17:26:12] its all in svn and ready to roll
[17:26:15] hrmm
[17:26:30] i would normally be happy to snag and deploy for you, but im busy with on site stuff only for now
[17:26:36] too much to do that only i can do here.
[17:26:49] RobH: np :) but do you know who else has access to this that I might poke?
[17:26:55] i would prolly email the entire ops list
[17:27:05] poking folks individually is something we wanna get away from
[17:27:12] does that require being subscribed to the list?
[17:27:15] (since I'm not)
[17:27:25] it just gets moderated and we approve to list is all
[17:27:32] k
[17:27:42] PROBLEM - Puppet freshness on ms1 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:02] <^demon> Krinkle: We really really need to get that puppetized so we can avoid having to bug a root for this :\
[17:28:15] yeah, and moved to gerrit
[17:28:18] once its puppetized root still has to roll it live
[17:28:22] but indeed, its easier.
[17:28:46] where in svn?
[17:28:58] ^demon: how was articlefeedback converted? by you?
[17:29:14] <^demon> Yes.
[17:29:24] <^demon> Using the svn2git script I have.
[17:29:29] ^demon: no tags? is that the default?
[17:30:00] (came up on list recently)
[17:30:10] <^demon> Tags were there if I knew about them.
[17:30:12] i couldn't tell if it was your conversion or someone else did it
[17:30:30] <^demon> I didn't go digging for random tags/branches unless someone asked for them
[17:32:00] k
[17:32:39] PROBLEM - Puppet freshness on tarin is CRITICAL: Puppet has not run in the last 10 hours
[17:36:20] jeremyb: re: "where in svn": https://www.mediawiki.org/wiki/Special:Code/MediaWiki/?path=/trunk/tools/bugzilla
[17:36:33] thanks
[17:36:52] Krinkle: so you think i'm ready for deploy?
[17:37:10] whether you are ready for deploy?
[17:37:30] In which repository are you :P
[17:37:35] meta portal for wikidata.org
[17:37:40] ah
[17:37:53] Yes, looks good to me. But "someone else has to approve"
[17:38:09] does that apply to both changesets? ;P
[17:38:15] Yep
[17:38:22] okey
[17:38:27] I don't have access to ops
[17:38:41] I'll +1 both
[17:38:44] right, you have access to deploy one of the two maybe? but definitely not both
[17:38:44] <^demon> Krinkle: I found t-shirts that say "+2 LGTM" :p
[17:38:55] haha
[17:38:57] ^demon: and do not submit?
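The svn2git exchange above notes that tags only survived the conversion "if I knew about them" — Subversion models tags as copies under a /tags directory, so discovering them means collecting the first path component under it. A hypothetical Python sketch of that discovery step (the paths are invented; this is not the actual svn2git script):

```python
# Collect git tag names from svn paths: anything shaped like
# /tags/<name>/... contributes <name>, deduplicated in first-seen order.
# Illustrative only -- a real conversion reads the repo history, not paths.

def svn_tags_to_git(paths):
    """Map '/tags/<name>/...' svn paths to a deduplicated list of tag names."""
    seen = []
    for p in paths:
        parts = p.strip("/").split("/")
        if len(parts) >= 2 and parts[0] == "tags" and parts[1] not in seen:
            seen.append(parts[1])
    return seen

print(svn_tags_to_git([
    "/tags/REL1_19/ArticleFeedback/",          # hypothetical tag path
    "/tags/REL1_19/ArticleFeedback/AF.php",
    "/trunk/extensions/ArticleFeedback/",      # trunk path, ignored
]))
```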
[17:39:11] <^demon> And "I would prefer you didn't submit this. Really"
[17:39:11] Oh, I'd love those t-shirts.
[17:39:32] <^demon> http://www.zazzle.com/i_would_prefer_t_shirts-235689879400316433 http://www.zazzle.com/2_lgtm_tee_shirt-235862924096653439
[17:39:59] You could even produce them for couples with pointy arrows in them like "Looks good to me. Devoted. +2 <3" :D
[17:40:18] omg, that's awesome
[17:40:33] Is that Ryan_Lane in the picture :P ?
[17:40:43] <^demon> Haha, not as far as I know
[17:40:46] (I only see the facial hair)
[17:40:50] heh
[17:40:52] not me
[17:40:55] okay
[17:44:01] RobH: sorry about the delay in the virt ports, getting this now (was distracted by the junos "fun" yesterday)
[17:44:29] hah, fun!
[17:45:47] New patchset: Bhartshorne; "setting up labs swift cluster to test upgrades" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14079
[17:46:09] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[17:46:20] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14079
[17:46:53] ok, i updated RT 2919
[17:51:35] !log investigating stat1001 power issue
[17:51:44] Logged the message, RobH
[17:54:01] drdee: working on stat1001 for you, had to pull all power
[17:57:25] RobH: thanks i saw the rt update
[17:57:32] bummer :(
[17:57:37] yea, i am going to have to call dell later today
[17:57:51] going to wrap up a fair number of other hands on things to queue up a number of dell issues and call once
[18:00:07] heh, i never was able to call dell about more than one issue at a time IIRC
[18:00:14] New patchset: Bhartshorne; "updating labsupgrade cluster with real rings" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14082
[18:00:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14082
[18:02:31] jeremyb: dunno how they are going to stop me
[18:02:40] i get one tech on the line, then i open as many cases as I need
[18:02:48] Im not about to hang up, call back, do hold process for each one.
[18:02:54] RobH: i mean i never had that many at once
[18:03:03] oh, sounds nice ;]
[18:03:17] RobH: remember i only ever had one rack at peak ;)
[18:03:23] and not all were dell
[18:03:39] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14082
[18:04:24] heh, 98% of the hardware in eqiad is dell
[18:11:16] !log virt1006 mgmt serial not set correctly, fixed
[18:11:26] Logged the message, RobH
[18:18:26] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 213 seconds
[18:19:56] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay 0 seconds
[18:27:28] New patchset: Bhartshorne; "updating labsupgrade cluster with the right real rings" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14087
[18:28:00] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14087
[18:32:59] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours
[18:32:59] PROBLEM - Puppet freshness on search31 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:05] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:59] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours
[18:37:02] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours
[18:37:03] PROBLEM - Puppet freshness on search24 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:05] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:05] PROBLEM - Puppet freshness on search21 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:05] PROBLEM - Puppet freshness on search36 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:05] PROBLEM - Puppet freshness on search28 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:05] PROBLEM - Puppet freshness on search27 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:06] PROBLEM - Puppet freshness on search22 is CRITICAL: Puppet has not run in the last 10 hours
[18:41:59] PROBLEM - Puppet freshness on search33 is CRITICAL: Puppet has not run in the last 10 hours
[18:43:02] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours
[18:43:02] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[18:43:02] PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[18:44:59] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours
[18:47:05] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours
[18:47:59] PROBLEM - Puppet freshness on sq67 is CRITICAL: Puppet has not run in the last 10 hours
[18:49:02] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours
[18:49:02] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours
[18:50:05] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours
[18:50:59] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[18:52:02] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[18:52:02] PROBLEM - Puppet freshness on search25 is CRITICAL: Puppet has not run in the last 10 hours
[18:53:59] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours
[18:56:05] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours
[18:56:05] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[18:56:05] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours
[18:57:08] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours
[18:59:05] PROBLEM - Puppet freshness on arsenic is CRITICAL: Puppet has not run in the last 10 hours
[19:02:05] PROBLEM - Puppet freshness on palladium is CRITICAL: Puppet has not run in the last 10 hours
[19:03:21] New patchset: Hashar; "rsyslog should send logs to $::syslog_server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14090
[19:03:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14090
[19:09:06] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/14095/
[19:10:16] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14095
[19:13:31] will all mifis be at wikimania? i guess except for tampa? will some be available for supplementing wifi in some places or will just stay with individuals?
[19:13:59] (e.g. people don't know if all the hotels will stay up under load)
[19:14:31] jeremyb, is there power to start with? :)
[19:14:35] <^demon> jeremyb: stay in a different hotel from everyone else ;-)
[19:14:49] hey, i come bearing my own connection
[19:15:23] bring a 48 port gigabit switch and some ethernet cables
[19:15:28] <^demon> I used to, but my $cellularProvider decided to start charging me for tethering :(
[19:16:33] it's cheaper than a mifi!
[19:17:11] <^demon> $14.95/mo for a feature I barely use? Not worth it.
[19:17:27] <^demon> That's kinda why it bugs me. I *don't* use it often, but I liked having it when I needed it.
[19:18:36] Considering you can do almost anything on a phone now... differentiating tethering traffic is strange
[19:21:20] <^demon> Reedy: Supposedly I can work around it with my UA, but that's stealing. I don't want to steal, I just want to keep getting it for free :p
[19:21:32] ottomata: ping
[19:23:02] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[19:24:02] binasher, that list is public, in case you don't know
[19:24:28] i was just told by a couple people
[19:24:35] oki
[19:24:43] <^demon> binasher: wsor list? That's public.
[19:39:49] New patchset: Matthias Mullie; "blacklist AFTv5 whitelist for AFTv4 on test as well" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14098
[19:40:31] New patchset: Hashar; "docroot for deployment.wikimedia.beta.wmflabs.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14099
[19:45:37] ^demon: how can you be charged for tethering ?
[19:45:55] ping!
[19:45:56] ping?
[19:45:56] <^demon> Ask my company :p
[19:46:00] jeremyb hiya
[19:46:13] ^demon: I will :-]
[19:46:14] ottomata: well, half nvm, you mailed
[19:46:20] ottomata: also wondering about lucene
[19:46:31] bout how it is going?
[19:46:32] good!
[19:46:33] i got it
[19:46:40] now i'm organizing and building debs for every piece
[19:46:46] finally got this working all nice
[19:46:52] https://github.com/wmf-analytics/thrift-debian
[19:47:45] *click*
[19:48:24] *click*
[19:49:13] * hashar 's heuristic detects a potentially brain damaging repo (keywords: thrift, debian, analytics, DHAVE.*, amd64)
[19:49:57] ottomata: it is awesome to know someone is working on thrift :-)
[19:50:15] haah
[19:50:38] we need it in MediaWiki :-D
[19:51:07] i put that on github rather than gerrit, because some people have already done some work on debian packaging for this stuff there, and I want it to be more easily reachable by others
[19:51:09] really!?
[19:51:09] cool
[19:51:13] what for?
[19:51:17] no idea
[19:51:22] but we will find a usage, trust me :-)
[19:51:24] that should be ready!
[19:51:24] I've checked in the .debs
[19:51:24] i'm doing this for scribe
[19:51:35] i hope to get those debs in our apt repo eventually
[19:57:14] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14098
[19:59:56] ottomata: poke ops on the mailing list I guess
[20:01:38] I am out, see you tomorrow
[20:02:29] laters
[20:09:47] New patchset: Bhartshorne; "last puppet commit for labsupgrade cluster setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14103
[20:10:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14103
[20:14:57] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14103
[20:29:34] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:35:25] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[20:40:40] PROBLEM - NTP on db32 is CRITICAL: NTP CRITICAL: Offset unknown
[20:48:51] New review: Platonides; "Where is docroot/labs/images/wiki-en.png used?" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/14099
[20:51:19] RECOVERY - NTP on db32 is OK: NTP OK: Offset 0.001102805138 secs
[21:05:23] * jeremyb wonders who from ops is still around? i guess maybe one of the european ops could do it tomorrow but it would be nice to finally have it done ;)
[21:05:29] !rt 2919
[21:05:29] http://rt.wikimedia.org/Ticket/Display.html?id=2919
[21:12:55] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[21:13:20] jeremyb: hey
[21:13:21] what's up
[21:13:58] LeslieCarr: gave the RT link
[21:14:10] i *think* it should be straightforward
[21:14:38] see the latest mail from me today?
[21:14:41] hrm, reading
[21:16:45] so, are we switching wikidata.org to 188.93.10.125 or are we hosting wikidata.org, is my big question from reading through everything
[21:16:58] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[21:23:44] LeslieCarr: no DNS change at all. hosting on cluster
[21:23:59] sorry, i wasn't paying attention, please say my name next time ;)
[21:24:05] oh no worries
[21:24:28] ok, so rt 3078 is unneeded ?
[21:24:32] jeremyb:
[21:24:46] err, i guess. no DNS changes should be made AFAIK
[21:25:10] just push those 2 changes live and that should do it think
[21:25:14] i think*
[21:30:40] SQL is down
[21:30:48] ....
[21:30:55] SQL is what?
[21:30:56] That's not helpful
[21:30:58] too many active concurrent transactions
[21:30:59] it says
[21:31:04] for which IP?
[21:31:10] it didn't say
[21:31:12] does the msg say retry?
[21:31:13] (back up now)
[21:31:14] huh?!
[21:31:19] MySQL error: 1637: Too many active concurrent transactions (10.0.6.50)
[21:31:19] it always says the IP
[21:31:20] the message was not the Wikimedia error
[21:31:26] not this time
[21:31:31] db40
[21:31:36] parsercache!
[21:31:37] that's parser cache still?
[21:32:13] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=MySQL+pmtpa&h=db40.pmtpa.wmnet&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
[21:32:19] It's had a rather large load bump
[21:33:59] LeslieCarr: running in ~10 mins fyi. brb
[21:34:13] oh fun
[21:34:34] I am getting some weird error messages (both on Meta and Office wiki) that I've never seen before upon submitting a revision: "A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "SqlBagOStuff::set". Database returned error "1637: Too many active concurrent transactions (10.0.6.50)"." – is this a known issue?
[21:34:55] Tim-away: ^ there's no binasher. Fancy looking at db40?
[21:34:58] else maplebed?
[21:35:18] um, i'll look at db40 though i would say i'm not the best person for it ...
[21:35:26] super high load
[21:35:44] it's dropped, a little
[21:36:24] (if you feel like still working on it after db40's resolved, feel free to mail me any questions or i'll be back on irc in ~2.5 hrs)
[21:38:39] we had a spam of "slow queries"
[21:39:53] yeah, it fixes itself
[21:41:27] write queries
[21:43:41] ah LeslieCarr before I forget, a friend of mine said we should look into this, dunno what you think: https://ring.nlnog.net/
[21:43:53] I imagine we aren't already in it, right?
[21:44:17] we are not in it
[21:44:47] i don't know if i trust it security-wise enough… not without doing a proper security audit
[21:44:49] Did fenari's key fingerprint change recently?
[21:44:57] no
[21:45:00] Oh, nm
[21:45:08] I'm logged in as a different user, that's why
[21:45:23] mm hmm, it's access for a lot of folks I guess
[21:45:46] You are willing to give full sudo access to the Ring-Admins
[21:45:54] anyways, maybe worth thinking about
[21:46:03] and now I really am off to bed, see yas
[21:46:28] don't forget tomorrow is higgs boson day! :-)
[21:48:07] "Too many active concurrent transactions" = usually indicates the anti-pattern of "BEGIN; ... do something ...; wait a while; do something else; COMMIT"
[21:48:26] (when you're not organically overloaded with writes)
[21:54:00] bye!
[21:56:38] New patchset: Hashar; "labs: projects uses different NFS host for apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14152
[21:57:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14152
[22:01:06] New patchset: Catrope; "Revert "RT #1424 (Set up log rotation for wmerrors log)"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14154
[22:01:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14154
[22:27:14] !log updating testswarm submitter on gallium
[22:27:24] Logged the message, Master
[22:27:36] Krinkle: let's talk here, it is quieter :-]
[22:27:42] ok
[22:29:27] hashar: as you may have noticed, I'm working on implementing the jshint cli
[22:29:32] works pretty well
[22:29:38] much better than I had expected, this is going to be simple
[22:29:43] that is, granted we have node available
[22:30:50] hashar: can I trigger the build now?
[22:31:46] ..
[22:34:09] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[22:35:04] PROBLEM - Host db1013 is DOWN: PING CRITICAL - Packet loss = 100%
[22:39:03] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:02:27] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:10:06] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:13:06] RECOVERY - NTP on cp1040 is OK: NTP OK: Offset -0.01972687244 secs
[23:14:36] PROBLEM - MySQL Slave Delay on db1001 is CRITICAL: CRIT replication delay 265 seconds
[23:15:12] PROBLEM - MySQL Replication Heartbeat on db1001 is CRITICAL: CRIT replication delay 303 seconds
[23:15:36] db1001 is me
[23:20:12] New patchset: Reedy; "Install clamav on mediawiki app servers (upload scanning)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14162
[23:20:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14162
[23:22:06] RECOVERY - Host cp1038 is UP: PING OK - Packet loss = 0%, RTA = 33.07 ms
[23:23:18] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:23:28] New patchset: preilly; "add more opera mini IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/14163
[23:23:58] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/14163/1/templates/varnish/mobile-frontend.inc.vcl.erb
[23:24:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/14163
[23:28:24] PROBLEM - Host mw1092 is DOWN: PING CRITICAL - Packet loss = 100%
[23:41:18] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[23:46:18] @replag
[23:46:19] Krinkle: [s1] db36: 1s, db32: 1s, db59: 3s, db60: 1s, db12: 4s
[23:55:16] PROBLEM - Host mw1046 is DOWN: PING CRITICAL - Packet loss = 100%
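[Editor's note] The "BEGIN; ... do something ...; wait a while; do something else; COMMIT" anti-pattern mentioned at 21:48 can be sketched as follows. This is a minimal illustrative sketch only, not the actual SqlBagOStuff code: it uses Python's sqlite3 module as a runnable stand-in for a MySQL/InnoDB connection, and the table, key names, and `slow_work` helper are hypothetical. The point it shows is transaction scoping: hold the transaction open only around the statements that need it, not across unrelated slow work, since each long-held transaction keeps occupying one of the server's limited active-transaction slots.

```python
import sqlite3
import time

# Stand-in connection: isolation_level=None puts sqlite3 in autocommit
# mode, so the explicit BEGIN/COMMIT below behave like server-side SQL.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE pc (k TEXT PRIMARY KEY, v TEXT)")

def slow_work():
    """Stands in for the 'wait a while' between statements."""
    time.sleep(0.01)

def write_antipattern(k, v):
    # Anti-pattern: the transaction stays open (holding locks and an
    # active-transaction slot) across unrelated slow work.
    conn.execute("BEGIN")
    conn.execute("INSERT OR REPLACE INTO pc VALUES (?, ?)", (k, v))
    slow_work()  # transaction is still open here
    conn.execute("COMMIT")

def write_short_txn(k, v):
    # Better: do the slow work first, then hold the transaction only
    # around the statements that actually need it.
    slow_work()
    conn.execute("BEGIN")
    conn.execute("INSERT OR REPLACE INTO pc VALUES (?, ?)", (k, v))
    conn.execute("COMMIT")

write_antipattern("k1", "v1")
write_short_txn("k2", "v2")
```

Both variants write the same rows; the difference under load is how long each transaction stays active, which is what the 1637 error is counting.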