[01:20:06] * Romaine points at https://bugzilla.wikimedia.org/show_bug.cgi?id=52853 [01:26:07] * Elsie points at Romaine. [01:26:11] You broke Wikipedia. [01:26:45] douze points [01:27:17] hopefully someone finds a cure for the bug [01:27:51] <{{Guy|sleeps}}> Antibiotics works wonders and next time... [01:27:54] > This page was last modified on 10 August 2013, at 17:30 [01:28:13] Ouch. [01:28:19] and still that page is edited after that date [01:28:26] fancy software [01:29:08] TimStarling: https://nl.wikipedia.org/wiki/Wikipedia:Aanmelding_moderatoren [01:29:18] Says "This page was last modified on 10 August 2013, at 17:30." in the footer. [01:29:22] But it was edited today. [01:29:39] I've only ever seen this kind of issue while logged out. [01:30:03] https://nl.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= [01:30:24] A null edit might fix it, but if I destroy the test case, Tim won't be happy. [01:30:55] <{{Guy|sleeps}}> Screenshot? [01:30:59] null edits on the page have been done by various users [01:31:23] but after pressing a couple of F5 it returns [01:31:52] randomly [01:43:10] Romaine: Do you see "10 August" on any pages currently? [01:43:31] I can check that page [01:44:04] yes [01:44:22] 10 aug 2013 om 19:30. [01:44:25] Which server? [01:44:48] [01:45:47] also [01:45:48] [01:45:52] it varies [01:47:06] another one: [01:47:06] [01:47:50] mw1086, mw1040 for me [01:48:02] mw1177 [01:54:24] Elsie: any idea how this is caused? [01:54:41] My guess is parser cache corruption. [01:54:50] It doesn't seem to be related to Squid. [04:12:22] I'd really appreciate if a someone who is in the know of IP geolocating would look into this matter. [04:15:44] ToAruShiroiNeko: what matter? I don't see much context [04:16:08] The suspense is killing me. [04:26:58] ......... [04:27:20] you can't read minds? [04:27:41] closedmouth: I don't have the skill points yet [04:27:57] useless! [04:28:51] :-( [04:28:58] * greg-g sulks [10:37:05] OK I was told to ask here [10:37:22] http://en.wikisource.org/wiki/Page:Ruffhead_-_The_Statutes_at_Large_-_vol_9.djvu/63 [10:37:46] It seems IMPOSSIBLE tro pass paragraph formatting to templates in a manner which actually works consistently [10:38:02] Can someone PLEASE explain HOW you are supposed to do this? [10:38:20] because I fail to see why mediawiki can't cope with something that simple. [10:38:55] Link template and testcase [10:39:02] I did [10:39:23] http://en.wikisource.org/wiki/Template:Cl-act-section/1 is the template [10:39:38] There is NOTHING in the template which should effect how paragrpahs are handled [10:40:27] and I've already tried various approaches to try and get it working. [10:43:50] I see that. [10:44:19] The problem seems to be that when markup is passed to the template, Mediawiki is ignoring whitepsace... [10:44:34] (including the linke break = paragrpah break) [10:45:18] So I changed to using
<br/>
which breaks the template in other ways.. [10:50:01] Something seems to be broken also when I look at the page source ... [10:50:10] In that i'm seeing < where I shouldn't be [10:52:33] I'm not however seeing anything obvious as to why the paragrpahs aren't formatting [10:55:06] Let me look at template on computer. [10:55:53] There isn't anything in the template that should cause whitespace or
<br/>
tags to be ignored [10:56:12] The fault seems to be that Mediawiki is straightforwardly ignoring it internally [10:58:37] I've also tried removing the drop initial [11:00:07] Outside the template the markup works... [11:06:47] The problem seem to be where you are nesting title inside section... [11:07:05] That's not the issue [11:07:18] The title div should be it's own thing [11:07:25] The section template itself has multiple nested switch statements [11:07:33] Yes [11:07:43] That shouldn't be an issue? [11:08:06] The issue is that Mediawiki isn't parsing valid markup [11:08:21] This is NOT an issue about nesting of switch statments [11:08:21] It would take me more time than I have available to trace the template and make a flow chart to see where the issue is. [11:08:41] It's not an issue with the template as far as I can tell [11:09:10] because the same markup supplied as the text= param works perfectly when to supplied to the template [11:09:32] How many times do you think the parser runs per page? If the nesting is too deep, some of the page might not get parsed. [11:10:17] I'm thinking maybe a max template call issue at first glance. [11:10:38] But, I would have to trace the template out to be sure. [11:11:08] It's not a nesting issue [16:16:37] apergos, parent5446: hello [16:16:41] hello [16:17:20] Hey [16:17:44] apergos: i have fixed the bug you encountered yesterday [16:17:54] I saw and I ran it too :-) [16:18:19] I have an unrelated comment, and tht is, that you should choose *some* order for the revisions to be in the xml file [16:18:39] why? [16:18:39] either choose timestamp r choose revid (and say so in the docs) [16:19:16] because although there is not a guarantee with the current dumps, it's generally true that dumps are by revid [16:19:33] with in pages that is, and I am sure that a ton of tools expect that [16:19:37] you know how coders are [16:19:45] ok [16:19:58] so choose one, be able to modify it later, that's all [16:20:33] right [16:20:42] how are the diff dumps coming? [16:20:52] Good day csteipp. [16:22:20] it's getting closer, i think i'll have something worth commiting tomorrow or on monday [16:22:33] great! [16:23:10] Hey Technical_13 [16:24:08] I know in a while you'll be into the speed phase of things, remember that I can do some real-world testing during the periods we don't run the en wp dumps [16:24:39] I can do it at other times but it won't be as nice (server with a lot less cpu and ram) [16:25:43] anything coming up as you are working on the diff dumps? [16:27:31] hmm, i have performance tweaking planned for the start of September (after compression); and that's when enwiki dump seems to usually run [16:29:26] maybe one thing: i consider a revision mostly immutable; the only thing that can change is that text, comment and contributor can be deleted or undeleted [16:30:09] but is that really true? can't for example MediaWiki update change model and format of revision, or something like that? [16:30:25] yeah I run at the beginning of the month in order to give the stats foks time to process the numbers for their reports [16:30:26] if that could happen, i need a way to represent that change in diff dump [16:31:02] the content model could in theory change [16:31:22] I forget, are you getting the revision text directly from the DB? [16:31:52] no, i use fetchText.php, just like current 2-phase dumps [16:32:00] we should expect that sort of thing to be rare (but sometimes there will be e.g. 
a bug and we will wind up rewriting revision rows) [16:32:27] OK then you shouldn't have too much trouble. I think it's pretty rare revision content changes happen. [16:32:41] right now for example the content model etc are not taken from the db but from the page title, iirc [16:32:46] that's mw configuration [16:33:10] hmm and we have seen page moves where in theory the page content handler would change [16:33:19] (though in practice there was a little bug about that :-D) [16:34:04] so you're saying i need to be able to represent any kind of change? including things like timestamp or minor edit flag changing? [16:34:54] but model and format are a property of revision, so if the new move revision has different model and format, that's not a problem for me [16:34:56] well it seems to me there are two choices [16:35:16] either you need to be able to represent changes on all fields [16:35:44] or periodically a new base full goes through and checks all that crap (metadata anyways) [16:36:53] modifying text on an old revision would be extremely rare and in that case I think you would be justified in saying 'we'll only pick that up for a new full dump' (and then only if asked to, eg compare rev_len, or sha1 or whatever) [16:37:10] so people would have to download new full and couldn't use diff dump after i do that? i don't like that [16:38:05] do you really plan to go through all revisions every time and check rev_len and/or sha1 against what you already have? [16:39:07] i already have to check whether deletion status of the three fields changed; so adding a check for SHA1 wouldn't be much of a problem, i think [16:39:20] (and if SHA1 did change, retrieve the new text) [16:39:42] well in that case maybe you want to check all the revision table fields and track changes for them [16:40:02] ok, i'll do that [16:42:31] and i think that's it from me for today [16:43:00] ok, I will be interested to see how that gets implemented :-) [16:43:18] parent5446: got anything you want to bring up? [16:44:37] Nope [16:44:53] see you both tomorrow then [16:45:00] ok, well that's it for me for today too, thanks for catching tht bug so fast [16:45:06] talk to you tomorrow [16:45:11] See you tomorrow [16:55:42] ^d: CirrusSearch is up on all the beta wikis, yes? http://commons.wikimedia.beta.wmflabs.org included? [16:56:03] <^d> chrismcmahon: Should be, I set default => true on the labs config. [16:56:16] <^d> And they've all got LuceneSearch as a secondary backend for comparing results. [16:56:17] ^d: cool, I think I found a bug. [16:56:23] (a little one) [16:56:40] <^d> Cool! File in BZ and Nik or I will take a look :) [16:58:48] yep [18:17:13] marktraceur: hey, not sure if you saw, etherpad.wm.org is etherpad lite now [18:17:46] Eeee [18:17:51] paravoid: Amazing [18:18:08] I had only a tiny part in that [18:18:14] in the US it's called 'diet etherpad' [18:18:27] the convert script needed some serious modifications to work with the old data [18:18:36] also, etherpad lite's database schema is just CRAZY. [18:18:39] "schema" [18:18:46] I mean, wtf. [18:19:30] +------------------------+ [18:19:30] | Tables_in_etherpadlite | [18:19:30] +------------------------+ [18:19:30] | store | [18:19:30] +------------------------+ [18:19:47] | store | CREATE TABLE `store` ( `key` varchar(100) NOT NULL, `value` longtext NOT NULL, PRIMARY KEY (`key`) [18:19:50] ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | [18:20:40] really? [18:20:49] that's 'lite' all right... [18:20:53] Apparently. 
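To make the single store table quoted above concrete: everything ueberDB persists is one row holding a string key and a JSON blob, so there is nothing the database itself can reason about. Below is a minimal sketch in Python, using an in-memory SQLite stand-in for the real MySQL/InnoDB table; the key names (pad:demo, globalAuthor:*) and JSON fields are invented for illustration and are not Etherpad's actual record layout. It shows why something like merging two authors cannot be a targeted UPDATE and instead means decoding and rewriting JSON row by row, which is exactly the complaint that follows.

import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in; the real table is MySQL/InnoDB
conn.execute('CREATE TABLE store ("key" TEXT PRIMARY KEY, value TEXT NOT NULL)')

# Invented keys and fields in the general style of a key -> JSON-blob store.
rows = [
    ("pad:demo", json.dumps({"text": "hello", "authors": ["a.1", "a.2"]})),
    ("globalAuthor:a.1", json.dumps({"name": "Alice"})),
    ("globalAuthor:a.2", json.dumps({"name": "Alice (duplicate)"})),
]
conn.executemany("INSERT INTO store VALUES (?, ?)", rows)

# "Merge author a.2 into a.1": no WHERE clause can find the affected pads,
# so every row is read, JSON-decoded, patched, and written back.
for key, value in conn.execute('SELECT "key", value FROM store').fetchall():
    obj = json.loads(value)
    if "authors" in obj:
        obj["authors"] = ["a.1" if a == "a.2" else a for a in obj["authors"]]
        conn.execute('UPDATE store SET value = ? WHERE "key" = ?',
                     (json.dumps(obj), key))
conn.execute('DELETE FROM store WHERE "key" = ?', ("globalAuthor:a.2",))
conn.commit()
print(conn.execute('SELECT value FROM store WHERE "key" = ?', ("pad:demo",)).fetchone())

The same property is what makes the labs-to-production pad migration discussed just below awkward: there is no relational structure to move over, only opaque blobs keyed by strings.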
[18:21:03] marktraceur: I have the feeling that a lot of people used etherpad.wmflabs.org as "production"; maybe we should import the pads into prod and/or redirect from labs to prod now... [18:21:30] that is, if the labs instance isn't 90% test pads or so :) [18:22:15] apergos: yeah... [18:22:23] guess what happens if you want to e.g. merge two authors [18:22:35] you have to parse & alter json in every single row in your table... [18:22:37] nothing good! [18:22:51] paravoid: Frankly I'm of the opinion that they should have understood the nature of labs, but I'm also OK with yelling "OK, move your pads" on several mailing lists and letting people self-backup [18:23:16] so you're saying we need a fork called etherpad nqtl (not-quite-that-lite) [18:23:40] we tried to tell em 'don't store stuff in the pads, it's for "scratch", store it on a wiki' [18:23:44] fat lot of good that did though [18:24:05] the worst part with the "schema" is that it's more than deliberate [18:24:09] it's well abstracted [18:24:18] it's https://github.com/Pita/ueberDB [18:24:47] really, wtf. [18:24:47] bummer... [18:25:30] paravoid: Blame pita, I guess [18:25:38] I do [18:25:42] Is ueber supposed to be ironic? [18:27:59] there's talks about a rewrite and ueberdb is probably going to be the first to go [18:28:02] ( https://groups.google.com/forum/#!topic/etherpad-lite-dev/ynkkN1n2Nyw ) [18:30:17] hahaha third message even [18:30:46] but at least it doesn't need openoffice to run! [18:30:59] that is true. very very true. [19:59:54] shell user needed for log check https://bugzilla.wikimedia.org/show_bug.cgi?id=15434#c74 [20:00:41] Reedy, MaxSem, ori-l? :) [20:01:16] On which server? ;) [20:01:31] * MaxSem sees no /home/mwdeploy ? [20:02:25] command => "flock -n /var/lock/update-special-pages /usr/local/bin/update-special-pages > /home/wikipedia/logs/norotate/updateSpecialPages.log 2>&1", [20:03:20] The log suggests it got to zuwiktionary [20:03:34] Aug 14 05:46 [20:06:44] Oops [20:06:49] hm, it did https://zu.wiktionary.org/wiki/Special:DeadendPages [20:06:51] I think I just killed that log [20:07:14] we don't need no logs [20:07:51] on commons it definitely did not, can you check the log of today [20:08:04] No [20:08:06] See above [20:08:23] Also, it's from the 14th, not today [20:08:42] well, was [20:08:53] yes, it is correct [20:09:18] ah, you deleted the whole dir? [20:09:23] No [20:09:28] commons is on another file [20:09:43] No [20:09:46] The clue is in the path [20:09:49] They're not rotated [20:11:00] how so, should be /home/mwdeploy/updateSpecialPages/s4@14-DeadendPages.log [20:11:16] Why should it? [20:11:33] http://bug-attachment.wikimedia.org/attachment.cgi?id=13028 [20:11:33] Oh [20:11:58] We can play guess hte host [20:12:35] or use ariel's beloved, https://noc.wikimedia.org/dbtree/ [20:12:54] commonswiki [20:12:54] ------------------------------------- [20:12:54] Statistics completed in 9.56s [20:13:00] suspicious [20:13:39] Also, there are no s6 or s7 logs on disk [20:13:54] even though there is the entry [20:14:07] updatequerypages::cronjob { ['s1@11', 's2@12', 's3@13', 's4@14', 's5@15', 's6@16', 's7@17']: } [20:14:59] yes, they will be in next days [20:15:12] or db-{eqiad|tampa}.php [20:15:31] I was more meaning which host the logs file would exist on [20:15:42] same it should [20:16:00] /home/wikipedia/logs? [20:16:06] no [20:16:16] on terbium [20:16:17] is there one any more? 
[20:16:17] not NFS [20:16:20] ah [20:16:23] yuck [20:16:28] how do we have logs over there now [20:16:40] locally stored for locally run cronjobs [20:16:42] we have them on these random scattered hosts now, worse than ever [20:16:55] Presumably we don't really care about them too much [20:16:57] fluorine ftw, centralize them all [20:18:00] I think this was what that user was supposed to have enough privileges for? [20:18:17] anyway, does not seem too important to debug now [20:18:39] how can such a query last 9 s on commons? it was aborted it seems, but why [20:19:09] Well, that is apparently all the output [20:19:28] let's run it manually for the luls [20:19:51] if they are killed automatically after 10 s we should find them (only) for extremely small wikis [20:20:06] It says completed [20:20:11] so it wasn't killed [20:20:23] or it got bogus output [20:21:17] Why does it say Statistics? [20:21:29] Not Deadendwhateverthehellitshouldbe [20:21:36] reedy@terbium:/home/mwdeploy/updateSpecialPages$ mwscript updateSpecialPages.php commonswiki --only=DeadendPages [20:21:37] Statistics completed in 27.18s [20:21:37] Ancientpages disabled [20:21:37] Deadendpages disabled [20:21:37] Mostlinked disabled [20:21:38] Mostrevisions disabled [20:21:40] Fewestrevisions disabled [20:21:42] Wantedpages disabled [20:26:51] Is that enough rain yet? [21:07:10] Reedy: hmm, are you suggesting that it is updating Special:Statistics? :/ [21:07:22] Nemo_bis: It seems to suggest that [21:07:39] Reedy: I wonder if it is a lowercase vs. uppercase problem, does it default to Statistics if the special page name is not recognised? [21:07:46] but then why would zu.wikt work [21:07:48] what wiki? cause that could take a loooong long time [21:07:59] commons. oh ugh [21:08:04] apergos: not more than a few dozens hours [21:08:14] for each page check if it has [[ ]] [21:08:24] among the other fun things it does [21:08:24] Deadendpages says it's disabled [21:08:47] Reedy: did you forget --override ? [21:08:54] Apparently so [21:09:29] It still does statistics [21:09:35] reedy@terbium:~$ mwscript updateSpecialPages.php commonswiki --only=DeadendPages --override [21:09:35] Statistics completed in 5.78s [21:09:35] reedy@terbium:~$ [21:09:36] command is like https://noc.wikimedia.org/dbtree/ [21:09:40] Buggy maintenance script is buggy [21:09:41] hmpf [21:09:53] and with lowercase P [21:09:58] ? [21:10:34] Runs statistics first [21:10:39] Then doing Deadendpages [21:10:43] heh [21:10:43] Think you need to fix your class ;) [21:10:48] sigh [21:10:55] well, that's an easy fix [21:11:04] thanks [21:11:19] but why did it work on ku.wikt, perhaps because it's not disabled there? [21:11:22] All 6 are buggy [21:11:25] I suspect so [21:11:28] oki [21:11:54] will file a patch in half a hour or so if nobody else does [21:13:15] Yeah, all of those need to be just uppercase of the first letter [21:43:16] Reedy: so it is running on Commons now? [21:43:26] Deadendpages got 5000 rows in 48.12s [21:44:40] lol [22:20:31] chrismcmahon: beta still in semi-permanent 503 ? [22:20:46] Nemo_bis: beta should be pretty healthy right now [22:20:51] <^d> Most things tend to wfm on beta, except special:version. 
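Stepping back to the updateSpecialPages exchange above: the script ran Statistics because --only=DeadendPages did not match the canonical Deadendpages (only the first letter is uppercase), and the same mixed-case names appeared in the cron configuration. The actual maintenance script is PHP (updateSpecialPages.php); the Python sketch below is only an illustration of one defensive fix, resolving the requested name case-insensitively against the known query pages and failing loudly instead of silently falling through. The page list is taken from the output pasted above; resolve_only is a hypothetical helper, not part of MediaWiki.

# Hypothetical helper: resolve an --only value case-insensitively so that
# "DeadendPages" maps to "Deadendpages" rather than being ignored while
# something else (Statistics) runs instead.
QUERY_PAGES = ["Statistics", "Ancientpages", "Deadendpages", "Mostlinked",
               "Mostrevisions", "Fewestrevisions", "Wantedpages"]

def resolve_only(requested: str) -> str:
    by_lower = {name.lower(): name for name in QUERY_PAGES}
    try:
        return by_lower[requested.lower()]
    except KeyError:
        raise SystemExit("Unknown query page: %r" % requested)

print(resolve_only("DeadendPages"))  # -> "Deadendpages"

The fix discussed in the log is simpler, correcting the six configured page names to first-letter-uppercase form, but either approach removes the silent Statistics fallback.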
[22:21:01] Request: GET http://deployment.wikimedia.beta.wmflabs.org/, from 127.0.0.1 via deployment-cache-text1 deployment-cache-text1 ([127.0.0.1]:3128), Varnish XID 1483935948 [22:21:04] Forwarded for: 91.153.141.143, 127.0.0.1 [22:21:04] this is main page [22:21:07] Error: 503, Service Unavailable at Thu, 15 Aug 2013 22:18:47 GMT [22:21:11] Nemo_bis: checking [22:21:26] same on any refresh [22:21:27] <^d> Although bits might be down? [22:21:36] <^d> I'm not getting any CSS/JS on en.wp.beta [22:21:44] Nemo_bis: yep, one moment [22:21:52] http://en.wikipedia.beta.wmflabs.org/wiki/Marching_arts loads unstyled [22:22:00] ^d: +1 [22:22:12] NFS died, just a sec [22:22:21] sure sure :) [22:22:53] yikes, I can't ssh to the apache hosts at all it seems [22:23:06] <^d> Oh, nfs must still be down :\ [22:23:11] <^d> That was a problem earlier. [22:23:20] <^d> Considering beta uses nfs, that would explain bad times. [22:23:25] <^d> nfs :( [22:25:50] well, surely better than glusterfs [22:26:12] <^d> Granted. But if prod can live without nfs it would make it nice if we did the same in beta. [22:39:05] Anyone else seeing 503 Service Unavailable errors for API calls (pywikipediabot)? [22:39:42] they occur on seemingly random edits in page.put [23:29:03] is error 503 a known thing with the api (like 504 is likely to be a large page)? [23:32:11] greg-g: since Andre is not around, and since I don't see Andrew Otto (who I think is on RT duty right now), can you speak to dschwen's question? [23:33:43] dschwen: I was just stepping out, but are you seeing this consistently? Any specific url you're requesting? [23:33:55] ah, I see the scrollback now [23:34:32] dschwen: probably best to report a bug and cc me (greg@wikimedia.org) and link to it in here in case anyone else is watching [23:35:23] * greg-g bikes home [23:38:33] dschwen: with beta or prod? [23:38:56] i guess prod. but greg-g said "scrollback" and beta is up there too [23:39:21] sorry what is beta/prod? [23:39:36] dschwen: "production" is the real site, e.g., en.wikipedia.org. [23:39:44] yeah, production commons site [23:40:21] dschwen: We also have a setup called Wikimedia Labs. Within Labs there is a bunch of sites called "the beta cluster" which are trying to be live replicas of the real production sites except with somewhat newer MediaWiki code running on them. We use the beta cluster for testing [23:40:24] it takes up to 6 retries over 10 mins to get the pages to save [23:40:40] https://www.mediawiki.org/wiki/Wikimedia_Labs has more about Labs [23:40:56] yeah, I know what labs is, I'm running the bot from there ;-) [23:41:16] sorry! [23:41:37] I was like 75% sure you knew what Labs was, but I just wanted to make sure you had the context for the beta cluster https://www.mediawiki.org/wiki/Beta_cluster [23:42:16] yeah, it just did not occur to me that anyone would run a bot against the beta cluster [23:42:21] I guess for testing [23:42:49] yeah [23:43:10] it would actually be a good idea to test against the beta cluster in some cases [23:43:17] (if one were to need to test a bot) [23:43:38] Anyway! so, how long have you been experiencing the issue with the API (on Commons), dschwen? [23:44:36] dschwen: well immediately above you people were talking about beta. but i guess you weren't here yet [23:44:40] yeah, how long? [23:44:56] for the last hour [23:45:20] it seems to occur on rather large pages [23:45:34] at 70kb for example [23:45:55] URL? [23:45:56] which is below the size that would throw a 504 ususally [23:46:07] can you repro with curl? 
[23:46:10] page url? or api call? [23:46:14] both [23:46:24] and also what's the body of the 503? [23:46:29] I'm using pywikipediabot, so the call is buriied somewhere [23:46:38] ok, well try to repro with curl [23:46:41] sorry, not too helpful, eh [23:47:33] http://commons.wikimedia.org/wiki/Commons:Quality_images/Subject/People is one of the pages the bot edited [23:48:43] since it happens on the page save that is not straight forward (for me) to reproduce with curl [23:50:36] legoktm: is there a way to have pywikipediabot dump api calls and server responses?
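On that closing question: whatever tracing pywikipediabot itself offers, the quickest way to see exactly what the servers return is a standalone request against the same api.php, dumping status, headers and body whenever a 503 comes back. Here is a minimal sketch assuming the requests library; it deliberately uses a harmless read-only query against the page mentioned above, because the failing calls were page saves (action=edit), which additionally need login cookies and an edit token. The retry count and sleep interval are arbitrary.

import time
import requests

API = "https://commons.wikimedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "prop": "info",
    "titles": "Commons:Quality images/Subject/People",
    "format": "json",
}

for attempt in range(1, 7):  # the bot above needed up to 6 retries over ~10 min
    resp = requests.get(API, params=PARAMS, timeout=60)
    if resp.status_code == 503:
        # The useful part for a bug report: the response headers (which
        # frontend answered) and the HTML error body, not just the status.
        print("attempt %d: 503" % attempt)
        print(dict(resp.headers))
        print(resp.text[:500])
        time.sleep(30)
        continue
    resp.raise_for_status()
    print(resp.json())
    break

The equivalent check with curl -i against the same URL answers the "can you repro with curl?" question in the log; the point either way is to capture the full 503 response rather than only the status code.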