[00:00:29] RobH: looking
[00:02:10] in the old slot format config file, we had to pull down memcached servers out asap
[00:02:23] so just wondering if the new memcached and config format also needs this, or if its smarter =]
[00:02:35] (i rather know before we have a crashed mc server, heh ;)
[00:02:45] so nothing is wrong right now?
[00:02:51] on the doc?
[00:02:56] oh, on cluster
[00:02:57] on the site :)
[00:02:58] nothing is wrong now.
[00:03:01] ok
[00:03:09] there will still be a flood of errors
[00:03:19] so I'd imagine you'd want to remove servers from the list
[00:03:22] I am just updating the wikitech docs so when we have a broken one we know
[00:03:26] cool, sounds good to me
[00:03:31] the only difference is that you don't have to put in a replacement
[00:03:47] since the hashing is consistent, a lot of keys will still map to the same servers (though not all)
[00:05:13] I suppose changing the servers in the list more than once a in a short time could cause keys to map back to a server they used to before...which could cause consistency problems
[00:05:27] a short time is ?
[00:05:37] 1/5/10 minutes?
[00:05:53] adding to docs so folks know what to do
[00:06:35] depends how long items are cached
[00:06:39] we should probably audit the MW code
[00:09:10] RobH: I can think of same things that cache for up to a day that expect some consistency
[00:09:17] eww
[00:09:51] i think the mctest.php only tests tampa memcached.
[00:10:09] yep...
[00:10:25] so our memcache testing script is checking tampa memcached, and i thikn we are running memcached out of eqiad arent we?
[00:10:35] say a file is cached as "not existing" on mc1, mc 1 is pulled, and mc7 is used, someone uploads A and it is cached as "existing", then someone adds mc1 back, and the old "not existing" key comes back
[00:10:35] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours
[00:10:55] well, if you restarted mc1 that won't happen
[00:10:56] New patchset: Jdlrobson; "Enable Watchlist schema in config file (EventLogging)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48060
[00:11:19] but you could imagine, maybe pulling mc2 would cause the key to move from m1 => mc7 and back when mc2 is re-added
[00:11:42] in which case the file would be falsely cached as not existing and thumbnails would give 404s for no apparent reason
[00:11:44] and this wasnt an issue before sicne servers had slot IDs?
[00:11:55] and since a server wouldnt go back to its old slot id, no problem then?
[00:12:05] it wasn't because we used modulo hash and always swapped in a spare
[00:12:10] yea
[00:12:13] so keys were never getting remapped around
[00:12:37] i put in docs that it can cause an issue, and why, and to be careful
[00:12:40] the advantage of consistent hashing is that if *you don't have a spare* you can pull a server without causing almost everything to map to new servers
[00:12:48] but i dont think there is much else one can do then when swapping if a server is down.
[00:13:12] AaronSchulz: if an mc host is actually pulled by ops, its likely to come back after a reboot
[00:13:27] binasher: see my above comment :)
[00:13:30] i just put in the docs on wikitech to reboot before pulling
[00:13:43] and only pull from the config if the server cannot be resurrected asap
[00:13:45] er, reboot after pulling?
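The contrast drawn above between the old modulo slot scheme and consistent hashing can be measured with a small toy model. This is a sketch under stated assumptions, not libmemcached's ketama code or MediaWiki's client: 16 hypothetical servers mc1 to mc16, a 32-bit hash taken from MD5, and 100 ring points per server. The example pulls mc2 without swapping in a spare and counts how many keys change server.

```python
import bisect
import hashlib


def h(s: str) -> int:
    """Stable 32-bit hash taken from the first 4 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")


def modulo_pick(key: str, servers: list) -> str:
    # Old "slot" style: server = slot[hash % number_of_slots]
    return servers[h(key) % len(servers)]


class Ring:
    """Toy consistent-hash ring with `points` virtual nodes per server."""

    def __init__(self, servers, points=100):
        self.ring = sorted((h(f"{s}-{i}"), s) for s in servers for i in range(points))
        self.hashes = [p for p, _ in self.ring]

    def pick(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.hashes, h(key)) % len(self.ring)
        return self.ring[i][1]


def moved_fraction(before, after, keys) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


servers = [f"mc{i}" for i in range(1, 17)]
remaining = [s for s in servers if s != "mc2"]  # mc2 pulled, no spare swapped in
keys = [f"key:{n}" for n in range(20000)]

modulo_moved = moved_fraction(lambda k: modulo_pick(k, servers),
                              lambda k: modulo_pick(k, remaining), keys)
old_ring, new_ring = Ring(servers), Ring(remaining)
ring_moved = moved_fraction(old_ring.pick, new_ring.pick, keys)
print(f"modulo: {modulo_moved:.1%} of keys moved, ring: {ring_moved:.1%}")
```

Under modulo, shrinking the list remaps most keys, which is why the slot setup always swapped in a spare. On the ring, only the keys that lived on mc2 move, on the order of 1/16 of them.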
[00:13:51] binasher: the server pulled won't be the only with keys remapped
[00:14:07] ie: dont take server out of mc-site.php until its rebooted and isnt goign to come back easily
[00:14:28] if it's coming back right away, less overhead and errors to simply fix
[00:14:33] it's better to let the log spam for a few minutes than pulling a server
[00:14:37] AaronSchulz: ahh, your pulling mc2 causing mc1 keys to move to mc7 comment
[00:14:39] on old setup one would always swap out with an up spare first, then troubleshoot
[00:14:44] unless that server is used for same crazy hot key (like slave lag cache)
[00:14:51] you'd probably know that when you see it :)
[00:14:55] binasher: this is all discussion, all mc servers are fine now =]
[00:14:57] yeah, puling servers should be done in ways that support the consistent hashing
[00:15:10] im trying to update the memcache wikitech page since it was horribly outdate
[00:15:11] d
[00:15:15] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 187 seconds
[00:15:21] Also, the mctest.php script seems to only test tampa mc servers.
[00:15:59] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 211 seconds
[00:16:34] binasher: maybe it would be nice to have a feature to list a server as down, so that keys that map to it would map to the "next" server on the hash ring
[00:16:38] AaronSchulz: we use http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients by way of libmemcached
[00:17:15] New patchset: FastLizard4; "Allow override of the MySQL server bind address through the mysql_server_bind_address puppet variable. The default will still be 127.0.0.1 (end result: Instance administrators may now set their MySQL server's bind address in their NovaInstance configurat" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48585
[00:17:24] binasher: right
[00:17:26] removing a server should not remap many keys, though its possible that a small percentage will be
[00:17:57] that's what I was saying about, most stuff will map to the same place
[00:18:04] s/about/above
[00:18:25] which is different than in the slot system
[00:18:48] but slot system style, we can also still avoid changing the number of server slots when a box is down for maintenance
[00:19:43] or as you said, put up with the log spam
[00:20:03] sorry, i jumped in on this without having read all of the back scroll
[00:20:53] no worries
[00:21:08] so who do i bug to fix mctest.php?
[00:21:09] ;]
[00:21:10] RobH: actually when I said "module hash" I meant "modulo slot", but you know :)
[00:21:24] wasn't that fixed?
[00:21:37] it outputs tampa mc servers
[00:21:47] shouldnt it be outputting whatever the active cluster is? (in this case, eqiad?)
[00:21:55] RobH: Running from fenari?
[00:21:58] yep
[00:21:59] try it on bastion1001?
[00:22:03] ^
[00:22:14] bleh.
[00:22:52] different error
[00:22:55] you guys try this? ;]
[00:23:04] What's the error there?
[00:23:05] Could not open input file: /srv/deployment/mediawiki/common/multiversion/MWScript.php
[00:23:11] lols
[00:23:24] heh
[00:23:31] I guess bast1001 isn't a deployment target
[00:23:55] so, then is the spence memcached check running against tampa servers?
[00:24:03] makes it kind of pointless then ;]
[00:24:04] it wouldn't suprise me
[00:24:33] New review: Andrew Bogott; "As discussed on IRC, this change is good but should be done in the labs role class instead." [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/48585
[00:25:25] RobH: run /usr/bin/sync-common on bast1001 as root
[00:25:25] heh, so copper is running a memcache check on nagios
[00:25:37] so somepalce does have mc in eqiad chekcing
[00:25:39] thats good.
[00:25:41] Then someone needs to add it to the dsh group
[00:26:22] well, the check is on nagios but its its own memcache instance, nm.
[00:26:36] i still think we may not be checking mc service in eqiad
[00:27:09] trwikibooks: Memcached error for key "trwikibooks:revisiontext:textid:33143" on server "10.64.0.183:11211": ITEM TOO BIG
[00:27:10] heh
[00:27:27] so we list the ports in the mc-pmtpa.php
[00:27:31] but no ports in mc-eqiad.php
[00:27:57] still same ports, just odd they dont match configurations.
[00:28:18] !log aaron synchronized php-1.21wmf9/includes/db/Database.php 'deployed d8705542627f006a7ec9f81a9fb488fcc9a367bd'
[00:28:19] Logged the message, Master
[00:28:21] (why the hell do we run it on non standard port anyhow?)
[00:28:29] AaronSchulz: the max slab size in memcached is no longer limited to 1mb objects, it's user configurable
[00:28:30] legacy?
[00:28:37] Reedy: thats all i can see.
[00:28:44] i hate that reason.
[00:28:48] Kids these days.
[00:28:52] RobH: we don't.
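The "list a server as down" idea from 00:16:34 can be tried out on a toy ring. This is a sketch that assumes a hypothetical `down` flag on a ketama-like ring; it is not something this log shows libmemcached offering. Keys owned by a server marked down fall through to the next live server clockwise and return once the flag is cleared. Replaying the mc1 thumbnail scenario from 00:10:35 shows how an mc1 that rejoins without a restart can serve a stale "not existing" entry.

```python
import bisect
import hashlib


def h(s: str) -> int:
    """Stable 32-bit hash taken from the first 4 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")


class Ring:
    """Toy ring whose membership never changes; servers are flagged down instead."""

    def __init__(self, servers, points=100):
        self.points = sorted((h(f"{s}-{i}"), s) for s in servers for i in range(points))
        self.hashes = [p for p, _ in self.points]
        self.down = set()  # hypothetical "listed as down" flag

    def pick(self, key: str) -> str:
        start = bisect.bisect(self.hashes, h(key))
        # Walk clockwise from the key, skipping points owned by down servers.
        for step in range(len(self.points)):
            server = self.points[(start + step) % len(self.points)][1]
            if server not in self.down:
                return server
        raise RuntimeError("every server is marked down")


ring = Ring([f"mc{i}" for i in range(1, 9)])
cache = {}  # (server, key) -> value, standing in for each server's memory

# Replay the thumbnail scenario for some key that hashes to mc1.
key = next(k for k in (f"file:{n}" for n in range(1000)) if ring.pick(k) == "mc1")
cache[("mc1", key)] = "not existing"       # cached while mc1 is healthy
ring.down.add("mc1")                        # mc1 pulled: key falls through to the next server
cache[(ring.pick(key), key)] = "existing"   # file uploaded in the meantime
ring.down.discard("mc1")                    # mc1 back WITHOUT a restart, old items intact
print(cache[(ring.pick(key), key)])         # prints: not existing
```

Keys that never lived on mc1 are unaffected by the flag. Restarting mc1 before it rejoins, as suggested at 00:10:55, empties its memory and removes the stale entry. Short cache lifetimes bound how long the problem can last.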
[00:28:59] we use the standard port
[00:29:04] i thought the tampa port was non standard
[00:29:16] i guess it was fixed a long time ago and i never noticed =P
[00:29:53] AaronSchulz: i do believe that the 1m limit is hardcoded in libmemcached though, so changing our running memcached wouldn't immediately filter down
[00:29:57] binasher: maybe we should do some logging to get histogram from a sample of attempted sets for size
[00:30:01] we do build and package libmemcached ourselves
[00:30:05] oh
[00:30:10] but i'm not sure if this would even be worth doing
[00:30:14] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 208 seconds
[00:30:18] * RobH is still bummed he cannot deploy new apaches until chris runs some cables
[00:30:26] seems pretty occasional that we hit it the case
[00:30:32] not super often
[00:30:50] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 217 seconds
[00:31:25] all depends on the latency cost of never caching those few keys vs. the ram cost in having memory allocated to huge-object slabs that would very rarely be used
[00:32:30] and we'd still need to generate a key size histogram to determine the new max slab size
[00:32:32] one could even automatically shard keys and use getmulti ;)
[00:32:51] some updates applied: http://wikitech.wikimedia.org/view/Memcached
[00:33:24] AaronSchulz: twemproxy has a cool feature that lets you set an arbitrary shard key for a set of keys
[00:33:25] andrewbogott: http://arstechnica.com/security/2013/02/at-facebook-zero-day-exploits-backdoor-code-bring-war-games-drill-to-life/
[00:33:34] binasher: I kind of wish the hashing was based on names the client gives the hosts
[00:33:35] AaronSchulz: so that when you getmulti, they will all be on the same server
[00:33:49] that way one could swap out a server and the distribution would not change
[00:35:15] New patchset: Reedy; "Remove memcached ports from pmtpa config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48590
[00:35:38] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:36:11] AaronSchulz: splitting very large keys into multiples and reassembling after a getmulti might not be that crazy… though doing so with that twemproxy feature would be better
[00:36:31] getting them on the same server is nice
[00:36:43] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48590
[00:38:17] AaronSchulz: i might start testing twemproxy on a small scale, possibly on a production apache or two
[00:39:05] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 28 seconds
[00:39:21] TimStarling: I've made some more jobqueue commits
[00:39:32] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 11 seconds
[00:39:40] ok
[00:40:46] binasher: btw, how are those redis servers comming? :)
[00:41:21] depends on if we burn db class hosts on them or not… which would kinda be a waste
[00:41:43] RobH: what's up with the continuing saga of the new ssd having high perf misc servers?
[00:42:05] the swift legacy?
[00:43:13] LocalFile::recordUpload2: Transaction already in progress (from SiteStatsUpdate::doUpdate), performing implicit commit!
[00:43:15] * AaronSchulz hrms
[00:48:16] wikitech is full of horribly outdated and misleading cruft.
[00:48:33] RobH: do you just learn that?
[00:48:59] i thought wikitech was getting killed off
[00:49:04] at last years berlin hackathon
[00:49:18] its going to be merged with labsconsole wiki
[00:49:33] so I am attempting to clean up the major pages we actually use, and get rid of some cruft pre-merge
[00:49:48] though most will have to wait post-merge when we have a mass of volunteers who can edit it.
[00:50:05] plus a lot of it is just wrong.
[00:57:54] <^demon> Ok, let's do this!
[00:58:03] meh
[00:58:07] let's push it off another month
[00:58:14] ;)
[00:58:18] <^demon> That's not even funny at this point :p
[00:58:21] ready when you are
[00:58:38] realistically, you're really doing all of the work ;)
[00:58:45] New patchset: Andrew Bogott; "Rework the RT manifests so it can be installed in Labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47026
[00:59:46] <^demon> Ryan_Lane: Ok, can you merge https://gerrit.wikimedia.org/r/#/c/48574/ now and update sockpuppet?
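This sketch returns to the ITEM TOO BIG error at 00:27:09 and to the idea at 00:32:32 and 00:36:11 of splitting very large values across several keys and reassembling them after a getmulti. It rests on several assumptions. It uses a per-item limit of about 1 MB, the limit that per this log libmemcached may enforce. The client is a hypothetical `FakeClient` that only models `get_multi`/`set_multi`. The helper names `tagged`, `set_large` and `get_large` are hypothetical and are not MediaWiki's BagOStuff API. The `{key}` prefix plays the role of a twemproxy-style hash tag, so that a router hashing only the braced part would keep every piece of one value on one server.

```python
import hashlib

# Assumed per-item limit of about 1 MB, minus headroom for key and item overhead.
CHUNK = 1024 * 1024 - 4096


class FakeClient:
    """Hypothetical stand-in for a memcached client exposing get_multi/set_multi."""

    def __init__(self):
        self.store = {}

    def set_multi(self, items: dict) -> None:
        self.store.update(items)

    def get_multi(self, keys: list) -> dict:
        return {k: self.store[k] for k in keys if k in self.store}


def tagged(key: str, suffix: str) -> str:
    # "{key}" is the hash tag: a router hashing only the text inside {} (e.g. twemproxy
    # configured with hash_tag "{}") would send every piece of one value to one server.
    return f"{{{key}}}:{suffix}"


def set_large(client, key: str, value: bytes) -> None:
    parts = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    client.set_multi({tagged(key, f"chunk{i}"): p for i, p in enumerate(parts)})
    # Write the manifest last, so a reader that finds it should also find the chunks
    # (barring eviction, which get_large treats as a miss).
    digest = hashlib.sha1(value).hexdigest()
    client.set_multi({tagged(key, "manifest"): f"{len(parts)}:{digest}".encode()})


def get_large(client, key: str):
    manifest = client.get_multi([tagged(key, "manifest")]).get(tagged(key, "manifest"))
    if manifest is None:
        return None
    count, digest = manifest.decode().split(":")
    names = [tagged(key, f"chunk{i}") for i in range(int(count))]
    found = client.get_multi(names)  # one round trip for all chunks
    if len(found) != len(names):
        return None  # a chunk was evicted: treat the whole value as a miss
    value = b"".join(found[n] for n in names)
    # The checksum catches chunks left over from an older, different value.
    return value if hashlib.sha1(value).hexdigest() == digest else None


client = FakeClient()
blob = b"x" * (3 * CHUNK + 10)
set_large(client, "trwikibooks:revisiontext:textid:33143", blob)
assert get_large(client, "trwikibooks:revisiontext:textid:33143") == blob
```

Because memcached may evict any single chunk independently, the reader treats a missing or mismatched piece as a miss for the whole value rather than returning partial data.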
[01:00:03] yep
[01:00:11] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48574
[01:00:40] ^demon: done
[01:01:25] <^demon> !log stopped puppet and gerrit on manganese and formey
[01:01:27] Logged the message, Master
[01:06:47] AaronSchulz: hey, https://bugzilla.wikimedia.org/show_bug.cgi?id=42133
[01:07:03] Reedy just opened an RT about creating containers for new wikis
[01:07:09] so this is the next time I was referring to :)
[01:07:17] heh
[01:07:25] I got distracted at the end of the process...
[01:09:47] paravoid: do you have dinner plans tonight?
[01:10:13] uhm, kind of
[01:10:23] paravoid: hmm
[01:10:26] tomorrow?
[01:10:32] paravoid: PM
[01:13:24] <^demon> Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/35562/ and https://gerrit.wikimedia.org/r/#/c/34516/, please?
[01:13:51] ooh, style changes
[01:14:08] New review: Ryan Lane; "Patch Set 2: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/35562
[01:14:09] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35562
[01:14:29] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/34516
[01:14:39] New review: Ryan Lane; "Patch Set 1: Verified+2" [operations/puppet] (production); V: 2 - https://gerrit.wikimedia.org/r/34516
[01:14:48] path conflict
[01:14:54] needs rebase
[01:14:58] on 2nd change
[01:16:52] <^demon> Rebased.
[01:17:05] <^demon> (Hmm, no notif? Will investigate in a bit)
[01:17:22] is it down?
[01:17:47] it's not working too well for me
[01:17:58] <^demon> wfm?
[01:18:07] when I click review it just sits there
[01:18:13] saying "Working ..."
[01:18:42] maybe it's just working hard
[01:18:48] ;)
[01:19:01] yeah, this is broken
[01:19:07] I tried firefox and chrome
[01:19:21] Ditto
[01:19:49] GET https://gerrit.wikimedia.org/r/accounts/self/avatar?s=100 404 (Not Found) 7EC8574480641920105EFDDCBDBC26F2.cache.html:22797
[01:19:49] GET https://gerrit.wikimedia.org/r/accounts/self/avatar?s=26 404 (Not Found) 7EC8574480641920105EFDDCBDBC26F2.cache.html:22797
[01:19:49] Uncaught TypeError: Cannot read property 'length' of undefined 7EC8574480641920105EFDDCBDBC26F2.cache.html:11326
[01:20:13] The last error specifically from trying to review
[01:22:02] <^demon> The 404s are expected, knowing how that feature works.. The last error is the problem.
[01:23:07] <^demon> Hmm, was able to review https://gerrit.wikimedia.org/r/#/c/48487/
[01:23:54] another 405
[01:23:56] *404
[01:23:57] GET https://gerrit.wikimedia.org/r/projects/mediawiki%2Fcore/dashboards/default?inherited 404 (Not Found)
[01:24:07] oh
[01:24:14] did that change get applied?
[01:24:18] and did apache restart?
[01:24:24] <^demon> Yes.
[01:24:28] :(
[01:24:37] was hoping it would be an easy fix. heh
[01:24:45] I can apparently review https://gerrit.wikimedia.org/r/#/c/48591/1
[01:25:01] <^demon> Maybe stuff stuck in your cache? Weird, but maybe.
[01:25:52] nope
[01:25:55] ^demon: Possibly related, I'm trying to submit a patch and 'The remote end hung up unexpectedly'
[01:26:02] Working better in an incognito window
[01:26:04] I reset safari and tried
[01:26:24] <^demon> andrewbogott_afk: Gerrit has restarted several times during the process.
[01:26:46] ^demon: he's having the issue right now
[01:27:41] I can't fetch either, it seems
[01:31:18] <^demon> I was just able to push a new patch to https://gerrit.wikimedia.org/r/#/c/28352/
[01:31:37] <^demon> And just fetched from a couple of repos.
[01:31:54] <^demon> And just saw a review from Tyler.
[01:32:11] some pages definitely aren't working, though
[01:32:18] I was able to review and merge 1/2 of the changes
[01:32:30] the other one I cannot
[01:32:32] <^demon> Let's flush all caches.
[01:32:38] ok
[01:32:45] need me to do so, or can you?
[01:32:52] <^demon> Just did.
[01:32:55] ok
[01:33:40] still not working
[01:33:58] I'm resetting safari on each attempt as well
[01:34:40] <^demon> Ahh, error for me in safari too.
[01:34:43] <^demon> (was working in chrome)
[01:35:08] it doesn't work for me in any browser
[01:35:16] specifically this change: https://gerrit.wikimedia.org/r/#/c/34516
[01:36:21] <^demon> The heck is up with that change.
[01:36:28] <^demon> Try some unrelated change?
[01:37:02] https://gerrit.wikimedia.org/r/#/c/22698/
[01:37:19] that also has the problem
[01:38:19] lots of changes have this problem
[01:38:27] https://gerrit.wikimedia.org/r/#/c/43148/
[01:39:29] https://gerrit.wikimedia.org/r/#/c/47535/
[01:39:34] <^demon> The heck. I've not hit this anywhere in our testing.
[01:44:18] ^demon: so, what to do?
[01:44:26] <^demon> I'm looking still.
[01:44:40] ok
[01:46:42] <^demon> A-ha!
[01:46:46] <^demon> There's a fix in master we need.
[01:46:47] <^demon> https://gerrit-review.googlesource.com/#/c/42170/
[01:46:51] <^demon> Re-building now.
[01:47:27] <^demon> Well, didn't know we needed this. But it'll fix the problem.
[01:47:46] <^demon> (and it just went in 30m ago, so wouldn't have seen it)
[01:48:34] heh
[01:48:59] <^demon> So, we will be deploying HEAD after all :p
[01:55:22] <^demon> Ryan_Lane: Working now.
[01:55:31] eh?
[01:55:36] oh
[01:55:39] <^demon> It's working now.
[01:55:40] <^demon> :)
[01:55:49] you deployed a newer version without a package update?
[01:56:00] mind updating the package?
[01:56:07] <^demon> Yes, was going to.
[01:56:08] <^demon> Once I unbroke gerrit.
[01:56:10] I'll build it and push it into the repo
[01:56:11] heh
[01:56:11] ok
[01:56:24] Oh, that's right. It's February 11.
[01:56:32] <^demon> I was afraid of putting in a patch for operations/debs/gerrit that we couldn't review :p
[01:59:31] heh
[01:59:31] right
[02:01:35] New review: Demon; "Patch Set 1:" [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/48594
[02:01:43] <^demon> Ok, updated package.
[02:04:39] <^demon> Ryan_Lane: ^
[02:05:21] New review: Ryan Lane; "Patch Set 1: Code-Review+2" [operations/debs/gerrit] (master) C: 2; - https://gerrit.wikimedia.org/r/48594
[02:05:28] New review: Ryan Lane; "Patch Set 1: Verified+2" [operations/debs/gerrit] (master); V: 2 - https://gerrit.wikimedia.org/r/48594
[02:05:29] Change merged: Ryan Lane; [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/48594
[02:08:08] So… this is thought to work now? Or are other fixes still in the works?
[02:08:29] <^demon> The review problem should be fixed now.
[02:08:44] ^demon: I thought they were going to do something about the "needs verified" error kicking you out of the review screen
[02:08:49] it's so freaking annoying
[02:08:59] <^demon> I don't remember that.
[02:09:11] <^demon> Nothing upstream I know of ever addressing that.
[02:09:15] :(
[02:09:39] <^demon> Can you review https://gerrit.wikimedia.org/r/#/c/34516/ and its child now?
[02:09:49] <^demon> (both a please and "does it work" :)
[02:10:07] Ryan_Lane: Would you have time today or tomorrow for me to pick your brain about how to deploy a 30-megabyte node_modules directory via git-deploy (but presumably not via git itself)?
[02:10:32] RoanKattouw: sure. after the gerrit stuff is done I can
[02:10:36] OK
[02:10:43] Yeah deal with that first obviously :)
[02:11:04] <^demon> Wrapping it up now, should be mostly up.
[02:11:08] New review: Ryan Lane; "Patch Set 2: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/34516
[02:11:09] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34516
[02:11:19] <^demon> (Will need to restart one last time for 2 config changes to go live)
[02:11:41] merged all the way through
[02:13:21] <^demon> Changes live.
[02:15:12] <^demon> Heh, glad we didn't put gitblit live as default. It's having problems on prod.
[02:15:24] <^demon> Will sort that tomorrow.
[02:15:29] ok. you should upgrade the package
[02:15:37] I pushed it into the repo
[02:17:53] <^demon> Done.
[02:17:53] cool
[02:18:59] <^demon> Man this package sucks.
[02:19:08] yep
[02:19:17] I think there's a native one in debian now
[02:19:23] we should look at switching to that
[02:20:10] <^demon> Yeah, worth looking at.
[02:20:19] gerrit is down
[02:20:29] I was just coming to ask if that was intentional.
[02:20:42] https://gerrit.wikimedia.org/r/#/c/48583 isn't loading for me. I get "Service Temporarily Unavailable."
[02:21:01] ^demon: ^^
[02:21:19] <^demon> I had to run puppet one last time because the package does stupid things.
[02:21:23] ah ok
[02:24:29] <^demon> Ok, package in place, and cleaned up after it with puppet.
[02:24:33] <^demon> Everything's back up.
[02:25:05] New review: Andrew Bogott; "Patch Set 2: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48595
[02:25:07] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48595
[02:28:54] !log LocalisationUpdate completed (1.21wmf9) at Tue Feb 12 02:28:53 UTC 2013
[02:28:58] Logged the message, Master
[02:33:11] <^demon> Hmm, replication plugin isn't loading. Will debug that now.
[02:33:29] <^demon> Luckily plugin deployment doesn't require gerrit restart.
[02:52:43] !log LocalisationUpdate completed (1.21wmf8) at Tue Feb 12 02:52:42 UTC 2013
[02:52:45] Logged the message, Master
[03:16:07] <^demon> Ryan_Lane: I'm beat. I've got a couple of loose ends to tie up in the morning, but I think we're mostly in the clear.
[03:16:10] <^demon> Night.
[03:16:40] ^demon|away: night!
[03:50:38] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[03:57:23] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[04:15:30] New review: Asher; "Patch Set 2: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48129
[04:15:32] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48129
[04:18:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[04:18:32] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:18:32] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:19:35] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[04:20:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:22:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.300 seconds
[04:55:08] RECOVERY - MySQL disk space on neon is OK: DISK OK
[05:26:47] RECOVERY - Puppet freshness on srv241 is OK: puppet ran at Tue Feb 12 05:26:37 UTC 2013
[05:34:55] New patchset: FastLizard4; "Allow setting of MySQL bind address in NovaInstance config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48585
[05:52:35] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: Puppet has not run in the last 10 hours
[06:38:38] PROBLEM - Puppet freshness on mw37 is CRITICAL: Puppet has not run in the last 10 hours
[06:43:37] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds
[06:44:04] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 197 seconds
[06:45:25] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[06:45:43] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[07:00:13] New review: FastLizard4; "Patch Set 1: Code-Review-1" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/48585
[07:44:49] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds
[07:46:37] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[07:57:07] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[08:09:07] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 189 seconds
[08:09:52] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 194 seconds
[08:14:32] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 198 seconds
[08:18:16] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[08:18:43] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[08:29:04] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[09:34:20] paravoid, around?
[09:36:43] RECOVERY - Puppet freshness on amssq41 is OK: puppet ran at Tue Feb 12 09:36:29 UTC 2013
[09:41:04] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[09:42:34] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 198 seconds
[09:42:43] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 201 seconds
[09:49:37] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 1 seconds
[09:49:46] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[10:24:43] RECOVERY - Puppet freshness on spence is OK: puppet ran at Tue Feb 12 10:24:32 UTC 2013
[10:33:24] RECOVERY - Solr on vanadium is OK: HTTP OK HTTP/1.1 200 OK - 6435 bytes in 0.081 seconds
[11:25:01] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 202 seconds
[11:25:27] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 215 seconds
[11:33:01] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[11:33:28] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[12:14:40] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds
[12:14:49] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 195 seconds
[12:38:58] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[12:40:28] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[13:00:57] New review: Silke Meyer; "Patch Set 3:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48474
[13:38:17] would it be hard to move branch "production" in operations/puppet to be "master", now that we don't use "test" anymore and every other repo has "master"?
[13:40:42] gerrit-wm: ping
[13:41:51] New review: Dzahn; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48620
[13:41:53] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48620
[13:43:55] merging a change to nginx.conf.erb logformat that was sitting on sockpuppet
[13:51:25] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[13:51:51] singer, heh, for real..just removed old stuff from there
[13:52:19] ah, of course:) fixing
[13:58:16] New review: Dzahn; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48623
[13:58:17] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48623
[13:58:39] gerrit-wm: you don't report new patch sets anymore, but you do report the reviews and merges
[14:00:07] RECOVERY - Puppet freshness on singer is OK: puppet ran at Tue Feb 12 13:59:43 UTC 2013
[14:04:22] hashar: re: changing font size in gerrit.. it is also just hitting Ctrl++
[14:04:47] <^demon> mutante: new patch set hook is messed up, on my todo list this morning.
[14:04:48] C++ ?
[14:05:08] hashar: Ctrl and the + key
[14:05:19] oh sorry
[14:05:24] ^demon: ah, gotcha. thanks
[14:05:31] mutante: well it uses to be hardcoded to 8pt now it is 9pt
[14:05:35] but I guess we can make it dynamic
[14:05:44] that would another hack though to fix the font-family: monospace;
[14:06:21] i don't care much, just expecting people to then say it is too large now.:)
[14:06:44] hehe
[14:06:55] I will let them figure out another patch so :-]
[14:07:03] ok:)
[14:07:05] for now 8pt is too small for my eyes.
[14:07:23] just saying how large 8pt are is under the control of the user anyways..on their computer
[14:07:30] I can detect small fonts when I am not able to read the text without my glasses
[14:07:38] yeah sure :-]
[14:07:44] I use it multiple time
[14:07:44] <^demon> hashar: Upstreaming our skin improvements makes people happy :)
[14:08:08] Ctrl + + is actually the first thing I do on Wikipedia since the font is a bit too small there
[14:08:18] ^demon: if that plays well sureè
[14:08:41] maybe the design team feels like doing gerrit CSS too,heh:)
[14:08:56] <^demon> No, just give them to me in the forms of generic CSS to fix, and I'll upstream them.
[14:09:39] btw.. video is online http://video.fosdem.org/2013/lightningtalks/How_to_hack_on_Wikipedia.webm
[14:09:48] from Quim's talk at Fosdem
[14:10:09] and all the others http://video.fosdem.org/2013/
[14:10:17] someone put it on commons..or i will
[14:10:49] ^demon: remembers me I can't contribute to Gerrit cause of their end user agreement license
[14:15:50] http://mirror.be.gbxs.net/video.fosdem.org//2013/maintracks/Janson/The_Keeper_of_Secrets.webm
[14:19:28] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[14:19:29] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[14:19:29] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[14:19:48] hrmmm, is mutante not in SF? (i.e. why is he awake?)
[14:19:59] no, i am in Germany
[14:20:31] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[14:20:43] good :) (but i thought you moved?)
[14:20:50] i did, i am just visting
[14:21:07] i was in Europe for FOSDEM anyways
[14:21:17] going back in time for S.F. hackathon
[14:25:38] RECOVERY - Puppet freshness on professor is OK: puppet ran at Tue Feb 12 14:25:13 UTC 2013
[14:25:56] !log ran puppet on professor
[14:25:56] Logged the message, Master
[14:32:46] morning
[14:34:09] morning!
[14:40:34] mutante: I got an easy change for you :-D That adds a wikimedia package 'php-luasandbox' ensuring it is latest https://gerrit.wikimedia.org/r/48127
[14:41:02] mutante, you around? looking at 4513 - I'm pretty sure we have a reject posts from non members setting
[14:41:41] hashar: you sure you want automatic updates ?
[14:41:48] mutante: yup :-]
[14:41:55] I can add a comment there though
[14:42:07] Thehelpfulone: if you wanna reply to it, that would be nice:)
[14:42:15] sure
[14:42:15] oh i did
[14:42:25] i just forwarded it to RT to be reminded to check for that sometime
[14:43:20] New review: Dzahn; "Patch Set 2: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48127
[14:43:21] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48127
[14:45:14] \O/
[14:45:36] mutante: just merge on sock puppet, I got it installed on the server already
[14:45:54] hashar: already done. and watching gallium
[14:46:01] hashar: unrelated "err: /Stage[main]/Misc::Docs::Puppet/Git::Clone[puppetsource]/Exec[git_pull_puppetsource]/returns: change from notrun to 0 failed: git pull --quiet returned 1 instead of one of [0]"
[14:46:05] bah
[14:46:21] I need to phase out that puppet doc stuff
[14:46:28] should be made by Jenkins instead of via puppet
[14:47:07] as long as we keep doc.wm :)
[14:47:24] because i already referred people to it when being asked for puppet docs:)
[14:52:05] mark: netapp tech will be here soon to replace main board on nas1001a
[15:02:02] mutante: yeah doc.wikimedia.org is on the Jenkins host :-]
[15:03:37] hashar: ..it reminds us that we should convert everything into modules.. then they would all show up in the upper left corner
[15:03:48]