[04:27:41] anyone milling around who knows much about Wikimedia Labs?
[04:28:38] maybe try #wikimedia-labs?
[04:32:28] thanks legoktm
[15:54:13] Cannot contact the database server: Too many connections (10.64.16.23)
[16:01:19] maybe result of an admin job from terbium
[16:01:32] er maintenance script I mean
[16:02:01] SELECT /* MostlinkedTemplatesPage::reallyDoQuery */ tl_namespace AS namespace,tl_title AS title blah blah,
[16:02:07] and the bad news is "Copying to tmp table on disk"
[16:08:03] I thought we disabled the run of such scripts
[16:08:22] well this is definitely coming from terbium and it's wikiadmin, so...
[16:08:38] against which db ?
[16:08:46] zhwiki
[16:09:11] isExpensive() => true
[16:09:13] so hmm
[16:10:17] apergos: if that cause any issue, feel free to kill it
[16:10:36] if I get more reports, yep
[16:11:03] I am not sure why it is running anyway
[16:12:41] potentially the list of special pages hasn't been updated for quite a long time
[16:12:55] and since wikis kept growing, some of them are now taking super long and should be banned
[16:13:05] can't remember the config off hand though
[16:13:19] I can't keep track of which things Nemo scheduled for once in a while updates and which no
[16:13:20] t
[16:14:00] Nemo_bis: maybe you have some insight? I feel like sprin gle has discussed all this and if I had a memory worth a dang I would be able to say exactly what's supposed to be running and what's not
[16:14:05] but, since I don't... :-/
[17:11:56] apergos: there are no maintenance scripts run in this area of the month
[17:12:07] well it's done now anyways
[17:12:21] and no other reports so...
[17:12:28] spr ingle said they hit another db and that there is the query killer
[17:12:32] what was done?
[17:13:00] that query
[17:13:13] that I reported above
[17:13:21] ok
[17:13:48] apergos: if you want, it could be useful to have a crontab -l from terbium, just to ensure puppet did its job correctly
[17:14:23] well
[17:15:29] mostlinkedtemplates is among the disabled one and should be run by terbium on the 21st
[17:16:40] it's just as easy to check the syslog (which I did)
[17:17:13] I see the usual dispatchChanges, pruneChanges there
[17:17:18] and one /usr/local/apache/common/php/extensions/FlaggedRevs/maintenance/wikimedia-periodic-update.sh
[17:18:14] sprin gle has much better monitoring I believe
[18:34:54] greg-g: It would be super to snag a deploy window sometime tomorrow or Wednesday to push MultimediaViewer updates ahead of the new version going to wikipedias
[18:35:06] If possible.
[18:35:07] marktraceur: yeah
[18:35:32] marktraceur: tomorrow afternoon is open
[18:35:41] marktraceur: what time do you want to break the cluster?
[18:36:22] Well, I like to do my murdering after breakfast, but apart from that I don't really care
[18:36:36] marktraceur: think just an hour? 1-2pm?
[18:36:45] Should be fine, yeah
[18:36:48] cool
[18:36:55] only 1 hour of breakage, please
[18:37:05] Can do
[18:37:27] * greg-g hopes marktraceur's sarcasm detector is working
[18:38:19] marktraceur: can you send gerrit changes that are going out?
[18:38:35] marktraceur: is it like a few cherry picks, or a full on update to master including a lot of stuff?
[18:39:03] I'd sort of prefer an update to master, but I can cherry-pick instead
[18:39:18] Mostly I'm going to want to deploy a bugfix for the "use this file" dialog, because in 0.2 it's basically 100% broken
[18:39:25] * greg-g nods
[18:39:26] Maybe 110% broken
[18:39:28] heh
[18:39:41] The bug isn't the best at what it does, but by god, it has spirit
[18:39:48] but it'll be auto-upgraded to master on thursday, but only to test wikis...
[18:40:18] ok, how much churn happened between what's out there now and on master? (I could look, but I'm lazy, and it's something you know in your head)
[18:40:21] or shoul
[18:40:24] d
[18:41:00] Not that much
[18:41:08] Mostly the changes have been GCI students' patches, I think
[18:41:22] The MM team has been mostly cycling through iterations on patches and doing CR
[18:42:52] is that supposed to instill confidence or not? ;)
[19:04:43] greg-g: My patch will be brilliant, though, so no worries.
[19:04:52] * marktraceur had a few interrupts fire
[19:05:17] marktraceur: ok, yeah, you're on for 2pm tomorrow
[19:05:27] Saweet.
[19:05:34] I'd better fix it before then
[19:05:53] yes
[19:58:53] Just to be sure - has there been any change since October 31st that could cause an edit page API call to return a "503 not available" in a section of code that previously worked just fine?
[20:00:27] Excirial: 503 is returned by the caching proxies when the API servers are overloaded
[20:00:55] This has been the case for years, but our API cluster has been a bit wonky lately
[20:01:33] Thanks - never ran into it but thank goodness it is not a code related issue.
[20:09:56] Excirial: well, it might be
[20:10:03] https://bugzilla.wikimedia.org/show_bug.cgi?id=57865
[20:12:53] That one seems to be aimed at the lab cluster - i'm just using the regular "http://en.wikipedia.org/w/api.php?action=edit&format=xml" though :)
[20:34:46] Krinkle: did you receive your password reset email in the end?
[20:36:25] mchenry keeps being busier than usual https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Miscellaneous+pmtpa&h=mchenry.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS
[20:37:36] speaking of which, ori-l, did you ever find out any suggestion on how to monitor stats of messages delivered by the mail relays? a quick search for ganglia plugins for exim doesn't find anything, probably wrong search :)
[20:42:28] Nemo_bis: no idea; if there's no plugin, you can write a shell script that gets the value and submits it by using the gmetric command-line tool -- same as the jobqueue graph
[20:46:02] ori-l: ah, does exim provide such a value?
[20:48:35] Nemo_bis: are you exim me?
[20:48:54] sorry, awful pun. i have no idea.
[20:49:30] ok :) for a moment I hoped you knew
[20:55:05] Nemo_bis: http://linux.die.net/man/8/eximstats
[20:59:23] nice, let's see if I already have an open bug where to drop the link
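
(Editor's note: the exchange above ends without a concrete recipe, so here is a minimal sketch of the approach ori-l describes at 20:42:28 -- pull a number out of exim and push it to Ganglia with the gmetric command-line tool, the same pattern as the jobqueue graph. It assumes exim and gmetric are on PATH; the metric name, units, polling interval, and the choice of queue length as the sample value are illustrative, not what Wikimedia actually ran. Counts of delivered messages would instead have to be scraped from the eximstats report or the exim mainlog.)

    # Sketch only: poll one exim figure and hand it to the local ganglia
    # agent via gmetric. Assumes 'exim' and 'gmetric' are installed and on
    # PATH, and that the invoking user may inspect the mail queue.
    import subprocess

    def exim_queue_length() -> int:
        # 'exim -bpc' prints the number of messages currently queued.
        out = subprocess.run(["exim", "-bpc"], capture_output=True,
                             text=True, check=True)
        return int(out.stdout.strip())

    def submit_to_ganglia(name: str, value: int) -> None:
        # gmetric pushes a single sample to the local gmond.
        subprocess.run(["gmetric", "--name", name, "--value", str(value),
                        "--type", "uint32", "--units", "messages"],
                       check=True)

    if __name__ == "__main__":
        # The metric name is made up; run this from cron at whatever
        # interval the graphs should resolve to.
        submit_to_ganglia("exim_queue_length", exim_queue_length())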
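
(Editor's note: relating to Excirial's 19:58:53 question above -- since the 503s come from the caching proxies when the API cluster is overloaded rather than from anything in the client's own code, the usual client-side remedy is to retry with backoff. A minimal sketch, assuming the third-party 'requests' library; the URL is the one quoted in the log, the attempt count and backoff are arbitrary, and the parameters of a real action=edit call, including the edit token, are omitted.)

    # Sketch of a retry-on-503 wrapper around a MediaWiki API POST.
    import time
    import requests

    API_URL = "http://en.wikipedia.org/w/api.php"

    def post_with_retry(data, max_attempts=4):
        for attempt in range(max_attempts):
            resp = requests.post(API_URL, data=data)
            if resp.status_code != 503:
                # Anything other than a proxy-side 503 is handled normally.
                resp.raise_for_status()
                return resp
            # Back off before retrying; the overload is on the server side.
            time.sleep(2 ** attempt)
        raise RuntimeError("API still returning 503 after retries")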