[00:49:07] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[00:52:32] (CR) Jeremyb: "> Hashar, do you know where to configure the Beta Labs logo the way http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page does?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[00:53:03] (PS3) Jeremyb: Set a special logo for Commons on Beta Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[00:53:42] (PS4) Jeremyb: Set a special logo for Commons on Beta Labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[00:54:13] (CR) Jeremyb: "PS4 is just a rebase" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[00:59:58] Nemo_bis: live hacked seems unlikely
[02:14:35] !log LocalisationUpdate completed (1.23wmf19) at 2014-03-30 02:14:34+00:00
[02:14:46] Logged the message, Master
[02:31:23] !log LocalisationUpdate completed (1.23wmf20) at 2014-03-30 02:31:23+00:00
[02:31:29] Logged the message, Master
[03:11:17] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Mar 30 03:11:14 UTC 2014 (duration 11m 13s)
[03:11:23] Logged the message, Master
[04:35:17] (PS1) Springle: Set userstat=ON for mysql_multi_instance roles, to match coredb. This was previously SET GLOBAL on labsdb, so make it stick. [operations/puppet] - https://gerrit.wikimedia.org/r/122090
[04:49:13] (CR) Springle: [C: 2] Set userstat=ON for mysql_multi_instance roles, to match coredb. This was previously SET GLOBAL on labsdb, so make it stick. [operations/puppet] - https://gerrit.wikimedia.org/r/122090 (owner: Springle)
[04:53:50] (PS1) Springle: Repool db1034 in s7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122092
[04:54:27] (CR) Springle: [C: 2] Repool db1034 in s7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122092 (owner: Springle)
[04:54:34] (Merged) jenkins-bot: Repool db1034 in s7 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122092 (owner: Springle)
[04:55:23] !log springle synchronized wmf-config/db-eqiad.php 's7 db1034 to full steam'
[04:55:29] Logged the message, Master
[05:08:24] (PS1) Springle: Replacement for the ad-hoc pt-kill jobs I've been using to keep the cluster alive during abnormal traffic events. [operations/software] - https://gerrit.wikimedia.org/r/122093
[05:09:02] (CR) Springle: [C: 2] Replacement for the ad-hoc pt-kill jobs I've been using to keep the cluster alive during abnormal traffic events. [operations/software] - https://gerrit.wikimedia.org/r/122093 (owner: Springle)
[07:00:15] (PS1) Ori.livneh: Don't load as-yet-incompatible extensions when running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122096
[07:00:53] (CR) Ori.livneh: [C: 2] Don't load as-yet-incompatible extensions when running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122096 (owner: Ori.livneh)
[07:01:00] (Merged) jenkins-bot: Don't load as-yet-incompatible extensions when running under HHVM [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122096 (owner: Ori.livneh)
[10:31:40] (CR) Steinsplitter: "I need to move File:Commons-Beta-logo.svg to File:Wiki.svg (or creating a Wiki.png?)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/122084 (owner: Mattflaschen)
[19:36:34] (PS1) Hashar: Lint misc/icinga.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122131
[19:44:48] arrg
[19:44:54] i was working on this now :/
[19:57:12] (CR) Matanya: Lint misc/icinga.pp (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122131 (owner: Hashar)
[22:01:31] (PS1) Matanya: applicationserver: lint and tidy [operations/puppet] - https://gerrit.wikimedia.org/r/122269
[23:41:35] hoo: ping
[23:42:19] hey springle :)
[23:42:26] hey :)
[23:42:49] the 800+ connections you saw on s5 -- how many would you say were sleepers? (if you noticed)
[23:43:23] springle: By the time I looked (already got better at that time) like 750+
[23:43:33] ugh
[23:43:52] not much was happening at that time
[23:43:59] this has been a problem lately
[23:44:07] https://bugzilla.wikimedia.org/show_bug.cgi?id=63058
[23:44:12] https://bugzilla.wikimedia.org/show_bug.cgi?id=62303
[23:44:21] couple other stack traces around too
[23:45:19] I'm not sure there's even Scribunto / Luasandbox active on Wikidata
[23:45:23] it probably is, but not widely used
[23:45:29] memcached stuff sounds more likely
[23:46:10] I looked at that memcached-serious logs but it hadn't had anything of interest around that time (none of the logs had anything interesting after all)
[23:47:06] :\
[23:47:11] ok
[23:47:12] thanks
[23:47:18] * springle will dig some more
[23:47:32] springle: Btw, is there a cron or so that kill wikiuser sleepers?
[23:47:36] * kills
[23:48:18] no, not yet. there are some ad-hoc pt-kill jobs around on problem shards
[23:48:36] $wgMemCachedTimeout = 250000; # default is 100000
[23:48:43] why is it so freaking high, mh
[23:48:56] springle: Like locally on the DB hosts?
[23:49:07] i'm hesitant to enshrine kills in code anywhere yet, lest it masks a problem. but i think it's time to do so
[23:49:28] hoo: mostly on db1044 -> problems hosts
[23:50:05] which shard is that?
[23:50:35] no shard. it's the backend to tendril.wikimedia.org
[23:51:50] springle: Could we make icinga cry if a prod. slave hits n connections? Where n is so high that we don't see it during usual peaks
[23:53:19] springle: Any idea why the memcached timeout is set to 250s? That could have caused that, hm...
[23:54:44] all sleepers ever seemed to stay under 300s
[23:57:36] hoo: we could tell icinga, except it should not be critical, only warning. we tried having the innodb processlist checks from percona, but they often just make noise that isn't very useful for transient problems
[23:59:20] yeah... at that time I didn't notice the probs. with Wikidata and even icinga or anything didn't cry out... only a dewiki(!) user pinged because of DB troubles
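
The check discussed at the end of the log (have Icinga warn, but never go critical, when a production slave passes n connections, and report how many of those are sleepers) could be sketched roughly as below. This is only an illustration, not the check that was deployed: the function name `check_connections`, the default threshold of 800 and the input format (one dict per connection, with a `Command` key as in MySQL's SHOW PROCESSLIST output) are all assumptions made for the example.

```python
from typing import Iterable, Mapping, Tuple

# Nagios/Icinga plugin exit codes. CRITICAL (2) is deliberately unused,
# following the "only warning" suggestion in the log.
OK, WARNING = 0, 1


def check_connections(rows: Iterable[Mapping[str, object]],
                      warn_total: int = 800) -> Tuple[int, str]:
    """Return (exit_code, message) for a list of processlist rows.

    `rows` is assumed to hold one mapping per connection, shaped like a
    row of SHOW PROCESSLIST; idle connections have Command == "Sleep".
    `warn_total` is a hypothetical threshold meant to sit above normal peaks.
    """
    rows = list(rows)
    total = len(rows)
    sleepers = sum(1 for row in rows if row.get("Command") == "Sleep")
    status = WARNING if total >= warn_total else OK
    label = "WARNING" if status == WARNING else "OK"
    return status, f"{label}: {total} connections, {sleepers} sleeping"


if __name__ == "__main__":
    # Made-up sample data roughly matching the numbers mentioned in the log.
    sample = [{"User": "wikiuser", "Command": "Sleep", "Time": 120}] * 750
    sample += [{"User": "wikiuser", "Command": "Query", "Time": 1}] * 60
    print(check_connections(sample))
```

Reporting the sleeper count alongside the total keeps the output useful for the question asked in the log (how many of the 800+ connections were sleepers) without the check itself killing anything.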