[02:16:58] !log LocalisationUpdate completed (1.19) at Tue Apr 10 02:16:58 UTC 2012
[02:17:00] Logged the message, Master
[14:13:08] !log reedy synchronized wmf-config/InitialiseSettings.php 'wgLanguageConverterCacheType for git deployment later'
[14:13:10] Logged the message, Master
[14:58:34] http://commons.wikimedia.org/w/index.php?title=File:Flag_of_Belarus.svg&action=purge <-- what's that?
[14:58:39] looong error message
[14:58:43] any server issues?
[14:59:48] the file is missing as well, no logs of deletion whatsoever
[15:00:24] on local projects using the file gives a red link
[15:00:25] the other message (the first one) reflects the fact that the file is missing
[15:00:37] we should probably be a bit more graceful about handling that
[15:00:43] why is the file missing?
[15:00:47] it has been there before
[15:00:59] I have no idea
[15:01:19] heh
[15:01:23] no recent history about it
[15:02:29] who can do some magic to restore this?
[15:02:33] or solve
[15:02:44] it would be nice to know the first time someone noticed it was missing
[15:03:01] and the most recent time someone saw it on the cluster
[15:03:06] it's a bit tough because we have caching
[15:03:13] ah, it added itself to Category:Pages with broken file links
[15:03:18] today it was the first time
[15:03:54] DarkoNeko: that is because the file is included on its file page itself
[15:04:05] hm
[15:04:49] I'm going to look into it a little more
[15:04:50] apergos: it must be today or in the past days this problem
[15:04:55] I may come up empty-handed
[15:05:26] yesterday the category with Pages with broken file links had none of this file in it
[15:05:32] today suddenly a lot
[15:09:42] wtf it's there (the original) ... http://upload.wikimedia.org/wikipedia/commons/8/85/Flag_of_Belarus.svg
[15:09:59] root@ms7 # ls -l Flag_of_Belarus.svg
[15:09:59] -rw-r--r-- 1 apache apache 18172 Apr 10 03:39 Flag_of_Belarus.svg
[15:10:01] strange
[15:10:17] guess the row in the file table must be gone. I could try to hunt for that
[15:10:38] ie verify it's gone
[15:11:08] anyways it was changed recently (like, earlier tody)
[15:11:10] *today)
[15:11:35] nothing seen in the logs on the wiki that something changed, what kind of change was it then?
[15:11:42] I have no idea
[15:11:56] I just see the timestamp and see that it was replaced at that time
[15:12:01] april 10 3:39 utc
[15:12:37] it certainly seems likely
[15:15:19] mysql> select * from image where img_name = 'Flag_of_Belarus.svg';
[15:15:19] Empty set (0.00 sec)
[15:15:25] so that's bad
[15:15:34] what does it mean?
[15:15:59] it means the image exists on the filesystem, the file description page and its history exist in the database, but the row in the image table is gone.
[15:16:02] ugh
[15:16:30] time to look through a few logs and see if I can find anything from around that time
[15:17:04] nothing in sal
[15:28:22] nothing in syslog
[15:33:27] Apr 10 03:37:28 10.0.2.247 apache2[7840]: PHP Warning: Division by zero in /usr/local/apache/common-local/php-1.19/extensions/ParserFunctions/Expr.php on line 423
[15:33:32] a bit unlikely but it's curious anyways
[15:35:27] As if that's still a warning, everyone knows a division by zero is zero...
[15:35:40] :-D
[15:35:54] bad template or something I guess
[15:35:55] anyways...
[15:36:52] nothing in swift logs
[15:38:10] Does it actually exist in swift?
[15:38:24] thumbs should
[15:38:43] right now I'm just hunting for anything that indicates an error at around that time
[15:38:47] not turning up anything yet
[15:39:21] Interesting that it's missing from the images table, like not being in the pages but on the fs you could say delete fail but being in pages and not images is like wtf
[15:39:46] that category of missing file links might be useful
[15:39:51] to look at
[15:40:12] well it sounds to me like a non atomic transaction failed halfway through
[15:40:13] like:
[15:40:20] we're going to upload a new version of the file now
[15:40:27] here it is, let's write it to the filesystem first
[15:40:35] ok now let's delete the one row from the image table
[15:40:42] and insert the new *whoops* barf
[15:40:50] I'm looking for something like that
[15:41:15] Hmm yeah that could work -- doesn't it keep the old one as the history though and just update the current id after save?
[15:41:33] the row in the image table is one row
[15:41:37] that's a different deal
[15:43:03] Really... mw makes no sense to me sometimes.
[15:43:05] I guess without looking at the schema that oldimage might have older versions in it
[15:44:01] http://www.mediawiki.org/wiki/Manual:Oldimage_table voilà
[15:44:04] or something. :-P
[15:45:49] so what I would ask for today is no one re-upload or restore a previous version of this file
[15:46:00] so that hopefully some other people can look at the issue
[15:46:30] a couple more logs to look at and then I will give up on that angle
[15:49:37] apergos: some silent DB transaction rollback?
[15:49:57] I dunno about that
[15:50:24] a true rollback would restore the old image table row
[15:50:32] it's more affecting postgresql and oracle, but maybe mysql had this problem there too.
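The "non atomic transaction failed halfway through" hypothesis above can be illustrated with a small, self-contained sketch. It is not MediaWiki code: it uses an in-memory SQLite database, and the two-column `image` table, the function names, and the simulated failure are all assumptions made for illustration. It only shows why committing the delete before the insert can leave the row missing, while a single transaction that is rolled back restores it.

```python
import sqlite3


def make_db():
    """Create a toy in-memory 'image' table holding one row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE image (img_name TEXT PRIMARY KEY, img_size INTEGER)")
    conn.execute("INSERT INTO image VALUES ('Flag_of_Belarus.svg', 18172)")
    conn.commit()
    return conn


def replace_row(conn, name, new_size, atomic, fail_midway):
    """Delete the old row and insert a new one, optionally failing in between."""
    try:
        conn.execute("DELETE FROM image WHERE img_name = ?", (name,))
        if not atomic:
            # Non-atomic variant: the delete is committed on its own.
            conn.commit()
        if fail_midway:
            raise RuntimeError("simulated failure between delete and insert")
        conn.execute("INSERT INTO image VALUES (?, ?)", (name, new_size))
        conn.commit()
    except RuntimeError:
        # Only undoes work that has not been committed yet.
        conn.rollback()


def row_count(conn, name):
    """Return how many rows exist for the given file name."""
    cur = conn.execute("SELECT COUNT(*) FROM image WHERE img_name = ?", (name,))
    return cur.fetchone()[0]
```

With `atomic=False` and `fail_midway=True` the row is gone afterwards, which matches the `Empty set` seen in the channel; with `atomic=True` the rollback brings the old row back.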
[15:50:46] well, we commit every so often so I am not sure anymore :)
[15:51:31] i got this last time on postgres - due to error in SQL query (table column was missing due to upgrade not done) - a log entry was written (for a block), but the block itself was not added.
[15:51:37] so that last was the apache log, I'm looking at the swift log now
[15:51:43] I can see the thumb being loaded up etc
[15:52:08] with uploading I get an error :S
[15:52:21] well please don't do these
[15:52:34] I want a couple other people to be able to poke at some of them first
[15:52:49] * jeremyb hasn't caught up on backlog yet but...
[15:53:00] apergos: FYI, mailed ben+aaron about this last night
[15:53:26] care to summarize what you know?
[15:54:22] apergos: you have mail
[15:56:13] still wondering about their TZs
[15:56:55] sf
[15:58:25] here's a thumb delete of the file
[15:58:49] Apr 10 03:32:26 10.0.6.203 object-server 10.0.6.211 - - [10/Apr/2012:03:32:26 +0000] "DELETE ... 120px-Flag_of_Belarus.svg.png
[15:59:26] Apr 10 03:32:26 10.0.6.200 object-server 10.0.6.211 - - [10/Apr/2012:03:32:26 +0000] "DELETE second server
[15:59:41] let's check dumps
[15:59:42] Apr 10 03:32:26 10.0.6.200 container-server 10.0.6.204 - - [10/Apr/2012:03:32:26 +0000] "DELETE third server
[15:59:51] or toolserver
[15:59:56] dumps don't have images
[16:00:06] and I don't have toolserver access
[16:00:13] we won't learn what the malfunction was from there
[16:00:14] 2012-04-10 15:59:46 commonswiki: Dump in progress
[16:00:20] yeah but have DB entries
[16:00:35] all we'll see is that it was gone by the time of the dump, or not
[16:00:36] commons is s4?
[16:00:41] but we know it disappeared already
[16:00:54] I want to know the cause
[16:01:02] I don't remember but you can look at
[16:01:24] http://noc.wikimedia.org/dbtree/
[16:01:41] > select * from image where img_name = 'Flag_of_Belarus.svg';
[16:01:41] Empty set (0.00 sec)
[16:01:55] there should be replication logs somewhere
[16:02:19] Apr 10 03:32:30 10.0.6.202 object-server 10.0.6.211 - - [10/Apr/2012:03:32:30 +0000] "DELETE yet another server
[16:06:25] there are more of these deletes (object server, container server)
[16:06:50] fine if the image was replaced, I guess
[16:07:09] uh huh
[16:14:55] your mail finally showed up, I'm looking at all the links
[16:15:08] wtf, that was forever
[16:15:20] yeah
[16:15:23] jorn: who art thou?
[16:15:38] you look suspiciously close to jorm
[16:16:29] hi, uhm just some random guy ;)
[16:16:43] not related to jorm
[16:16:51] but joern
[16:17:34] ok
[16:17:56] the bug report is not this bug
[16:18:08] the history described in the email is useful however
[16:18:52] 10 16:18:33 < jeremyb> saper: no reason to do that on TS. prod has those logs too
[16:19:11] i can walk you through getting info out of the binlogs if you think that could be helpful
[16:20:13] yes, I think that would be helpful; we would know exactly what got done
[16:20:31] ok, so you've narrowed some time windows
[16:20:47] uh huh
[16:20:57] and we want commons at that time
[16:21:03] we can grab all statements from the logs from those windows and then start grepping
[16:21:08] really from 03:30 through 03:40
[16:21:36] so s4
[16:21:42] can I do this on a slave over there?
[16:21:58] anywhere that has binlog. maybe has to be a master
[16:22:08] yeah prolly
[16:22:09] ook
[16:22:12] but you can copy the binlogs to any arbitrary box and then query them there
[16:22:18] db31 it is
[16:22:21] domas: sorry for pinging you, but i don't know whom else to ask... it's about those awesome wikistats pagecounts. the files seem to be sorted which is great, but any idea what locale was used to sort them? i'm trying to merge sort some of the files but sort keeps telling me the files are not sorted when i check
[16:22:32] load's relatively low
[16:23:07] i already tried several locales and LC_COLLATE combinations (tried C, POSIX, en_US.utf8, en_GB)... but none seems to be the correct one :(
[16:23:23] >look
[16:23:45] You are in a room surrounded by binlogs. There is a bin.index here.
[16:23:49] apergos: so, ls -lktr should give you an idea of the time range for each file (last modified time)
[16:24:26] have one. it covers quite a large timeframe
[16:24:40] 1 gb
[16:24:58] >examine db31-bin.000288
[16:25:03] it looks like any other binary file.
[16:25:09] or maybe there is someone else in here who knows whom to ask for such details about the http://dumps.wikimedia.org/other/pagecounts-raw ?
[16:25:24] all the info I have about the pagecounts is on the index page
[16:25:35] I don't know about the locale unfortunately
[16:25:50] apergos: the parser is mysqlbinlog
[16:26:58] no args
[16:27:01] well that's easy enough
[16:27:11] >find needle in haystack
[16:27:28] You found 1.538.269 needles in the haystack. Which one would you like?
[16:28:33] so, maybe something like --start-datetime="2012-04-10 03:30:00" --stop-datetime="2012-04-10 03:40:00"
[16:28:43] or maybe i have the date wrong
[16:28:49] oh, I'm just skipping up to that point in the file right now
[16:29:00] then I'll start looking at it more closely
[16:33:16] who formats times 0:00 ? come one
[16:33:21] *on
[16:34:12] apergos: i have some vague memory of it being picking. you're talking about me, right?
[16:34:45] no. I'm talking about mysqlbinlog
[16:34:46] output
[16:34:47] picky*
[16:41:32] apergos: AaronSchulz is here
[16:41:48] I saw
[16:43:20] Joan: see interwiki talk page @ wikitech-l
[16:48:34] and a maplebed too!
[16:48:40] whee!
[17:01:02] jeremyb: was that meant for me?
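The binlog search discussed above (narrow the window with `--start-datetime` / `--stop-datetime`, then grep the decoded statements) could be scripted roughly as follows. This is a sketch only: the helper names are hypothetical, the file name `db31-bin.000288` and the time window are the ones quoted in the channel, and the script only builds and prints the command rather than running `mysqlbinlog`.

```python
import shlex


def build_mysqlbinlog_argv(binlog_path, start, stop):
    """Build the argument list for decoding one binlog file within a time window."""
    return [
        "mysqlbinlog",
        f"--start-datetime={start}",
        f"--stop-datetime={stop}",
        binlog_path,
    ]


def matching_lines(lines, needle):
    """Return the decoded lines that mention the needle, e.g. a file name."""
    return [line for line in lines if needle in line]


if __name__ == "__main__":
    argv = build_mysqlbinlog_argv(
        "db31-bin.000288",       # file name as mentioned in the channel
        "2012-04-10 03:30:00",   # window suggested in the channel
        "2012-04-10 03:40:00",
    )
    # Print a shell-quoted command line; running it is left to the operator.
    print(shlex.join(argv))
```

The decoded output could then be passed through `matching_lines(output.splitlines(), "Flag_of_Belarus")` to narrow down the statements of interest.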
[17:01:29] jorn: nope, was for Joan
[17:01:51] assuming you mean the thing i addressed to Joan
[17:01:59] aww, cause those locales are killing me ;)
[17:04:49] jorn: regular binary sort methinks
[17:07:07] domas: tried that... afaik this corresponds to C locale, in that case "." would come after " ", but it's the other way around (e.g., aa.b occurs before aa in the project field)
[17:16:12] domas: is it possible that the projects are always dumped in the order .b, .d, .m, .mw, .n, .q, .s, .v, .w and "" and then the remaining parts are binary sorted?
[17:16:48] oh my, i think that's it... sorry for bothering you
[17:24:31] oh, you are looking at erik's files
[17:24:47] there's an explanation of those formats on the index page for those
[17:25:12] but you want to talk to erik zachte about those if you're looking at his repackaged ones
[17:26:10] i know... but it didn't say anything about the order of the projects... and i had assumed the whole files were just sorted
[17:27:38] no, i'm talking about the rawlogs
[17:27:44] oh
[17:27:45] ok
[17:33:15] jorn: ah, may be
[17:33:30] you can look at the source!!!!!11
[17:33:56] domas: well, tried a couple of files and it now also makes sense why erik uses the .z for the wikipedias ;)
[17:34:27] what? where's the source?
[17:34:58] http://svn.wikimedia.org/svnroot/mediawiki/trunk/webstatscollector/collector.c =)
[17:35:44] ah yes, key internally is project:title
[17:36:25] heh, 2006
[17:37:44] ;) but the logs only go back to 2007!!! ;)
[17:41:17] so jeremyb
[17:41:23] ok, mergesort is running now, *whaa, bus*
[17:41:24] this looks like it's a failed move
[17:41:25] thanks a lot
[17:41:27] so striker
[17:41:42] did i get the name right?
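The ordering puzzle above is consistent with a plain byte-wise sort on a `project:title` key rather than on the space-separated line: "." (0x2E) sorts before ":" (0x3A) but after " " (0x20), so `aa.b` lands before `aa`. A minimal merge sketch under that assumption follows; the key format is taken from the remark in the channel and has not been checked against collector.c, and the function names are hypothetical.

```python
import heapq


def sort_key(line):
    """Byte-wise key 'project:title' built from a 'project title count bytes' line."""
    project, title = line.split(" ", 2)[:2]
    return (project + ":" + title).encode("utf-8")


def merge_sorted(*iterables):
    """Lazily merge inputs that are each already ordered by sort_key."""
    return heapq.merge(*iterables, key=sort_key)


if __name__ == "__main__":
    hour_a = ["aa.b Foo 1 10", "aa Bar 2 20"]
    hour_b = ["aa.d Baz 3 30", "aa Zed 4 40"]
    for line in merge_sorted(hour_a, hour_b):
        print(line)
```

Because `heapq.merge` is lazy, the same function can be fed open file objects instead of lists, so large hourly files do not need to fit in memory.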
[17:41:50] a bunch of stuff goes into filearchive
[17:41:52] with the notation
[17:42:09] Deleted to make way for move from [[File:Flag of Belarus 2012.svg]]
[17:42:15] and then there is the delete from images
[17:42:18] and that's all there is
[17:42:37] so, there's no insert to the other table?
[17:43:05] I don't see anything, I looked for follow-on transactions by the same user
[17:43:06] nada
[17:43:59] what about just searching for the name or primary key? at least one should be in every query i think
[17:44:54] there are revision tables
[17:45:05] yeah I have looked at the transactions for the old and new filename
[17:45:15] revision table updates but as far as what we want, no
[17:47:50] So, heads up for all: Last call for any and all revisions that *need* to be in the wmf branch and any relevant extensions!
[17:50:47] since my guesses about the time frame were a bit off (the real action was around 3:32) I'm looking at the other logs again to see if I see anything failed, a thumb, a rendering, whatever
[18:03:41] !log aaron synchronized wmf-config/swift.php 'Catch e bogus empty file names from listings'
[18:03:43] Logged the message, Master
[18:05:42] * AaronSchulz sighs at typos
[18:05:52] I wondered where that e went, got shifted back
[18:07:39] Hello wikimedia ops.
[18:07:51] There are a few oddities with files being reported on Commons.
[18:07:57] http://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#File:Flag_of_Belarus.svg
[18:08:01] and the thread directly above it.
[18:08:25] yes
[18:08:26] we know
[18:08:32] this looks like it was a failed move
[18:08:41] okay at least you know
[18:08:44] yes
[18:08:47] thanks for the report
[18:08:50] np
[18:09:03] the question is why the move failed
[18:18:57] AaronSchulz: when you do an image thumbnail purge, does it ask swift for a container listing or just use the list from ms5?
[18:19:30] it gets it from swift
[18:27:44] apergos: anything clear with the Belarus flag?
[18:28:39] no, like I was saying earlier there was a move in process, I see the batch inserts into filearchive, the delete of the image, but no new insert
[18:29:52] is there no way to solve this?
[18:33:25] two other people have been asked to look at it
[18:33:32] that deal with the media code more directly
[18:33:58] the immediate fix will be to reset some things in swift and then re-upload the file I suppose
[18:34:07] but the real question is what broke it
[18:34:38] I am guessing that somewhere in the bowels of swift rendering a thumb or retrieving one, something went awry that impacted the rest of the move, but that's only a guess
[18:38:44] typically one should be able to restore after half-completed deletes and the page is back to the old way (able to be deleted again)
[18:39:28] MW should be at least that tolerant of file op failures and php exceptions to some extent
[18:39:52] though some files have mismatched metadata...hopefully new cases shouldn't be popping up
[18:44:57] AaronSchulz: so if a thumbnail exists in ms5 but not in swift, it won't try and delete the image from swift?
[18:45:10] (i.e. ms5 has 100, 200, and 300px versions, swift only has the 300px version)
[18:45:21] right
[18:45:26] how about squid?
[18:45:36] squid is based on ms5 list
[18:45:39] ok.
[19:00:07] !log reedy synchronized php-1.20wmf1/ 'Pushing files for 1.20wmf1'
[19:00:09] Logged the message, Master
[19:05:56] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.20wmf1
[19:05:58] Logged the message, Master
[19:07:12] !log reedy synchronized php-1.20wmf1/LocalSettings.php 'Push LocalSettings out'
[19:07:14] Logged the message, Master
[19:23:54] !log reedy synchronized php-1.20wmf1/extensions/ 'Would you like some extensions to go with that, sir?'
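The thumbnail purge behaviour described at 18:44 to 18:45 (swift deletes driven by the swift listing, squid purges driven by the ms5 listing) can be restated as a tiny sketch. The function name and the returned structure are hypothetical and only mirror what was said in the channel; this is not the actual purge code.

```python
def plan_thumbnail_purge(ms5_listing, swift_listing):
    """Return which thumbnails each backend would be asked to drop."""
    return {
        # Swift only deletes what its own container listing reports.
        "swift_deletes": sorted(swift_listing),
        # Squid purges follow the ms5 listing.
        "squid_purges": sorted(ms5_listing),
    }


if __name__ == "__main__":
    ms5 = {
        "100px-Flag_of_Belarus.svg.png",
        "200px-Flag_of_Belarus.svg.png",
        "300px-Flag_of_Belarus.svg.png",
    }
    swift = {"300px-Flag_of_Belarus.svg.png"}
    print(plan_thumbnail_purge(ms5, swift))
```

In the example from the channel, only the 300px thumbnail would be deleted from swift, while all three sizes would be purged from squid.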
[19:23:56] Logged the message, Master
[19:24:38] !log reedy synchronized wmf-config/ExtensionMessages-1.20wmf1.php 'Sync ExtensionMessages'
[19:24:40] Logged the message, Master
[19:35:09] !log reedy synchronizing Wikimedia installation... : Rebuilding localisation cache for test2/1.20wmf1
[19:35:12] Logged the message, Master
[20:12:29] sync done.
[20:14:01] !log reedy synchronized php-1.20wmf1/extensions/PrefSwitch/ 'PrefSwitch is needed by SimpleSurvey'
[20:14:03] Logged the message, Master
[20:18:06] !log Deleting php-1.18 from all apaches due to lack of space
[20:18:08] Logged the message, Master
[20:19:13] o.0 wow that's old php
[20:19:35] <^demon> No, we just name our directories silly things :)
[20:21:06] You totally should host the mw stuff out of ~/.porn just for kicks
[20:23:03] !log reedy synchronized live-1.5 'Fix symlinks'
[20:23:05] Logged the message, Master
[20:24:17] !log reedy synchronized php-1.20wmf1/ 'Resyncing for apaches with no space'
[20:24:18] Logged the message, Master
[20:31:59] !log reedy synchronized live-1.5/
[20:32:01] Logged the message, Master
[20:35:41] !log reedy synchronized docroot/
[20:35:43] Logged the message, Master
[20:45:53] !log reedy synchronized docroot/
[20:45:55] Logged the message, Master
[20:55:59] !log reedy synchronized docroot/ 'Fix symlinks'
[20:55:59] Logged the message, Master
[21:04:13] !log reedy synchronized php-1.20wmf1/extensions/MobileFrontend/javascripts 'minified JS'
[21:04:15] Logged the message, Master
[21:08:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki to 1.20wmf1
[21:08:13] Logged the message, Master
[21:18:03] good night~
[21:18:56] !log awjrichards synchronized wmf-config/InitialiseSettings.php 'Disabling mobile URL template for mediawiki.org'
[21:18:58] Logged the message, Master
[21:20:58] !log awjrichards synchronized wmf-config/InitialiseSettings.php 'Disabling mobile URL template for mediawiki.org (using "mediawikiwiki" this time)'
[21:20:59] Logged the message, Master
[21:30:08] Joan: see interwiki talk page @ wikitech-l
[21:30:09] What?
[21:30:32] wikitech-l is fairly low traffic. can you find it?
[21:30:53] (low traffic atm*)
[21:31:15] http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059963.html
[21:32:08] Oh.
[21:32:14] Sorry, I had "interwiki map" in my head.
[21:32:18] And I was like wtffff.
[21:32:23] I'll respond now. Thanks! :-)
[22:23:31] !log reedy synchronized php-1.20wmf1/extensions/AntiSpoof/ 'g4103'
[22:23:33] Logged the message, Master
[22:24:22] !log reedy synchronized php-1.20wmf1/extensions/CentralAuth/ 'g4102'
[22:24:24] Logged the message, Master
[22:39:57] is it possible to convert liquid threads back to normal discussions?
[22:41:12] yes, just disable it
[22:42:26] won't that remove all the existing threads?
[22:43:09] copy and paste
[22:45:33] heh, I'm talking on a mass scale
[22:45:51] if I wanted to convert all of strategy wiki's LQT back to normal discussion for example p858snake|l
[22:46:46] copy and paste
[22:47:03] from each talk page?
[22:53:08] !log catrope synchronized php-1.19/extensions/WikimediaMaintenance/jobs-loop.sh 'r114834'
[22:53:10] Logged the message, Master
[22:53:35] !log Trying a graceful restart of the job runner on mw1 by sending SIGHUP to the jobs-loop.sh process
[22:53:37] Logged the message, Mr. Obvious
[23:31:12] !log catrope synchronized wmf-config/InitialiseSettings.php 'bug 35869 - Add strategywiki as an import source on testwiki'
[23:31:14] Logged the message, Master
[23:50:32] !log Removed srv187-189 from /etc/dsh/group/job-runners , their jobrunner class has been commented out in puppet since October
[23:50:35] Logged the message, Mr. Obvious