[00:05:48] Reedy: you around? [00:07:27] perhaps someone else knows [00:07:39] I've been working on some tools related to new page patrolling [00:08:03] by searching through the logging table [00:08:24] and I've noticed that the log_params field of that table seems to have switched formats drastically, at some point in the last year [00:08:41] the new format looks like: a:3:{s:8:"4::curid";i:10091973;s:9:"5::previd";i:10091972;s:7:"6::auto";i:1;} [00:08:55] and the old format is just a number, like: 10074107 [00:10:00] where's the best place for someone to learn how to decode the information in both of those formats? I'm looking for information on how to determine if the patrol was automatic or manual [00:10:36] I assume that, with the new format, I'm just looking for "6::auto";i:1 or "6::auto";i:0 [00:10:45] but for the old format, I have no idea [00:11:32] looks like Reedy had some involvement with it at http://www.mediawiki.org/wiki/Special:Code/MediaWiki/112374#c31649 [00:11:46] The new format is just json [00:12:16] is it? [00:12:24] strangest json I've ever seen [00:12:24] yes [00:12:28] Why? [00:12:33] oh, sorry [00:12:39] it's serialised php isn't it? [00:12:55] might be, i'm not a php guy [00:13:24] the new format seems easy enough, but the old format is another story [00:14:08] http://p.defau.lt/?pdbVE_Wps2G7Fm4ojn0O0A [00:14:26] presumably the old ones are just the curid then [00:14:42] ahh, got it. that makes sense [00:14:51] so with the old format, there's no way to distinguish between autopatrols and manual patrols? [00:15:25] seemingly not [00:15:37] My comment on that revision shows to zeros on new lines [00:15:53] but that's no help to anyone [00:16:10] hmmm [00:17:04] yeah, looks like you're right, those are curid's [00:17:50] Though [00:17:50] v [00:17:52] list( $vals2['cur'], $vals2['prev'], $vals2['auto'] ) = $params; [00:18:04] log_params: 10074107 [00:18:05] 0 [00:18:05] 0 [00:18:07] i could swear that MW could distinguish between auto and manual patrols though.. the switch seems to have happened around feb/march 2012 [00:18:16] ohhhhhhhhhh [00:18:19] missed those [00:18:58] so the third number is auto [00:19:04] that's what I needed! [00:19:20] yeah, would look like that's the case [00:19:36] thanks for digging into that for me [00:19:39] appreciate it [00:36:57] susan is being stupid [00:37:00] the usual [00:37:47] domas: <3 [00:39:47] Susan: lollish, after you said that bug about edit conflicts doesn't interest you :p https://en.wikipedia.org/w/index.php?title=Wikipedia:Requests_for_comment/Article_feedback&curid=38126110&diff=534738832&oldid=534738771 [00:40:26] Nemo_bis: Which bug about edit conflicts? [00:40:30] I only saw one about undo. [00:41:04] susan: that was my point, that it is an anti-pattern [00:41:18] slapping cache on top of expensive data operations and expecting everything to work fine afterwards [00:41:42] Susan: it's the same. [00:41:55] Nemo_bis: I don't see how. [00:42:13] Susan: don't they all use diff3? [00:42:20] Anyway, to bed now [00:42:20] Who's they? [00:42:28] undo and edit conflicts [00:43:10] No. [00:43:37] domas: I guess we could add a field to store the counts. It's probably overkill, though. [00:44:05] is it? [00:44:16] domas: There are about 100 pages with over 50,000 revisions on en.wiki, it looks like. [00:44:23] Yes, WP:AN/I is an edge case. [00:44:37] You could just disable the counts for that page. :-) [00:44:55] I'm looking at . [00:44:56] so nice that I can cut and paste and don't have to write answers, e.g. [00:44:56] [16:36:56] susan is being stupid [00:44:57] [16:36:59] the usual [00:45:19] How expensive is a COUNT(*) on 25,000 rows? [00:45:57] multiple I/Os, megabyte of cache pressure, etc [00:46:11] * AaronSchulz waits for those SSDs [00:46:13] I guess if you overprovision hardware like crazy (or like wikipedia), then nobody cares [00:46:39] Google doesn't overprovision? [00:46:41] Twitter doesn't? [00:46:58] domas: https://gerrit.wikimedia.org/r/#/c/41196/ That one'll make you cry :P [00:47:13] The vast, vast majority of pages have very few revisions. [00:47:19] Probably somewhere around 99%. [00:47:25] Across Wikimedia wikis. [00:47:36] I'm not sure an extra DB field is worth it. [00:47:50] If it's saving a COUNT(*) on 5 rows. [00:48:21] google is exceptionally good at their resource management, I think [00:48:31] Susan: Actually why do you want to know the revision count and in which cases? [00:48:37] hoo: It's interesting information. [00:48:49] In which cases what? I don't follow the second question. [00:49:05] susan: well, how many have 1000 rows? [00:49:11] thats still multiple i/os and multiple pages in cache [00:49:26] how many have 100 rows? [00:49:26] ditto [00:49:33] now, multiple cumulative effect [00:49:41] domas: What would you say is a reasonable number to justify COUNT(*)? [00:49:47] I'm not sure that question is phrased well. [00:49:50] Susan: Why can't you use the TS DBs for that [00:49:53] COUNT(*) should never be done in web environments [00:49:58] or well... how often do you want to do this? [00:50:15] hoo: It's part of the info action in MediaWiki core. [00:50:17] domas: well if you know there is low upper bound it's fine [00:50:24] just like tiny filesorts [00:50:24] hoo: https://en.wikipedia.org/w/index.php?title=Sean_Combs&action=info [00:50:37] aaronschulz: meh, still pulls in more data than it needs [00:50:44] hello? [00:50:53] domas: :) [00:50:56] what is better, 10 counts that count 1000000 rows, or 1000000 counts that count 10 rows? [00:51:08] eric_lee: HI!II?!?! [00:51:23] doms [00:51:24] domas [00:51:26] domas: What is your recommendation? To store the counts in a DB field? [00:51:30] yup [00:51:32] Okay. [00:51:34] domas: on an ssd or hdd? :D [00:51:34] I'll file a bug. [00:51:35] not reco [00:51:37] but question [00:51:46] Susan: That'll probably never happen [00:51:47] * hoo hides [00:51:48] uh [00:51:53] what the counts are for indexes all in ram [00:52:02] hoo: What will never happen? Storing the counts? [00:52:03] domas: so I guess, "it depends" [00:52:10] Susan: Probably, yes [00:52:29] hoo: Well, I think (I know) people are interested in this information. [00:52:30] well, realistically the first one would suck for some other reasons [00:52:56] like tying row purging and all sorts of nonsense [00:53:00] Susan: And I know people actually working on nasty data analysis stuff [00:53:01] *tying up [00:53:01] and still that doesn't work the way they want it [00:53:15] (if someone got a spare computing cluster ping me) [00:53:33] hoo: I'm not sure what you're saying. [00:53:49] hoo: The action can be made more efficient, though I kind of hate stored counts. [00:53:54] Because they're invariably wrong. [00:54:38] they *are* always wrong, that's true :) [00:55:00] Doing that sounds like a nasty hack we should only do if REALLY needed desperatly [00:55:15] mysql> SELECT COUNT(*) FROM revision WHERE rev_page = 5149102; [00:55:16] +----------+ [00:55:16] | COUNT(*) | [00:55:16] +----------+ [00:55:16] | 214211 | [00:55:16] +----------+ [00:55:17] 1 row in set (15.94 sec) [00:55:23] It's not just you! http://upload.wikimedia.org looks down from here. [00:55:24] sorry for the flood, bah [00:55:24] ? [00:55:39] getting that kind of thing right and fast is about as easy as riding a unicycle through a wind tunnel while juggling handkerchiefs [00:56:04] My point should be: Creating that on demand in some rare cases IMO is fine [00:56:11] * was supposed to be [00:56:39] If we introduce a caching field for like everything, we're going to have a littly messy DB scheme [00:56:52] wodim: Seems fine to me. [00:57:34] yes, fine for some people... but not for me or for google appspot [00:57:34] hoo: Domas doesn't like COUNT(*) for Web environments. [00:57:40] so it's something in between i suppose :p [00:57:50] wodim: Do you know how to trace route? [00:57:58] yes i'm on it [00:58:03] Okay. [00:58:09] could use triggers [00:58:13] then the stored counts will be right [00:58:15] Susan: Well... we could of course get the whole data and count it in PHP, I'm not sure that's better though [00:58:36] TimStarling: MySQL triggers? Or you mean just += 1 to the count after every edit (or watch)? [00:58:39] Susan: oh, now it works fine, so... sorry for the noise [00:58:41] TimStarling: Do we really need a new DB field for that? If we do, I want some DB things myself [00:58:59] What DB things do you want? [00:59:00] Starting with some indexes [00:59:05] ah, yeah, triggers, a good way to anchor yourself to a particular rdbms [00:59:09] Heh. [00:59:20] they are quite powerful in pg, if you like the pg trigger language :) [00:59:30] hoo: you're saying we should just remove the feature? [00:59:30] I have a great deal of respect for Domas. But I'm still not convinced that COUNT(*)s are evil. [00:59:46] Susan: you are assuming he is literal [00:59:46] I am fine with removing the feature [01:00:00] domas always exaggerates, isn't that right domas? [01:00:14] TimStarling: You're fine with re-disabling the info action, you mean? [01:00:29] TimStarling: I'd like to see a focus on feature that look at the last X months of activity [01:00:29] no, just removing that particular data item from it [01:00:34] It took me something like seven years to get it re-enabled. [01:00:37] TimStarling: If it affects production, probably, yes [01:00:47] tends to be useful and have much better bounding performance [01:00:55] There are multiple COUNT(*)s in it currently. [01:01:07] Even I introduced one [01:01:23] Is there evidence that it's affecting the production cluster? [01:01:30] And I don't feel very guilty [01:01:38] Domas will fix that. [01:01:59] really, it could just not show some stuff if the estimated count is too high [01:02:02] some people will never understand that performance engineering is a combination of lots of 0.1%s [01:02:09] and it would work fine for 99.9% of pages [01:02:17] Heh. [01:02:22] we have had namespace selectors on RC and contributions pages for some years, despite domas complaining, and the cluster somehow survives [01:02:33] and the filesort for newbie contribs [01:02:33] except that from time to time it breaks shit [01:02:36] and we have to block IPs [01:02:37] and whatnot [01:02:40] domas: Well... the only difference over here is that we don't sum up those 0.1s but multiply 'em... [01:03:04] you multiply 1.01s [01:03:04] and usually scanning 50k rows or so doesn't take long [01:03:05] TimStarling: Right. That was mostly my point much earlier above. [01:03:06] not 0.1s [01:03:15] There are a few dozen ways to cause a large table scan currently. [01:03:18] and I wouldn't expect many pages to have more watchers than that [01:03:39] TimStarling: https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Most-watched_pages [01:03:48] The highest is 76,000-ish. [01:03:54] And it's a steep drop-off. [01:03:55] when you need to read megabytes of data to serve a single digit [01:03:58] there's something wrong at that [01:03:58] domas: No... that's only a problem on rare cases (let's say 0.1) and that again is only a rare fraction of total page requests (let's again say 0.1), ... [01:04:02] but I guess wikimedia has moneys [01:04:11] TimStarling: does watchlist filesort? [01:04:18] domas: The money is much better spent on larger and more robust comments boxes? [01:04:49] AaronSchulz: yes, I think so [01:04:49] I mean, if you're going to raise $35 million, I think providing interesting information to readers and editors is better than most other uses of the money. [01:04:59] well the watch join would be LEFT, so nevermind [01:04:59] hoo: my point was there're lots of small things that are "not big problem" [01:05:10] sometimes it is easier to throw hardware at the problem and ignore outliers [01:05:11] if could be deferred till after the WHERE (god I'd hope that is what would happen) [01:05:16] as long as you don't monitor bad behavior [01:05:19] there is no bad behavior [01:05:21] TimStarling: I think Domas cares less about number of watchers than he does about number of revisions. [01:05:21] and everyone is happy [01:05:22] and cluster is up [01:05:27] * AaronSchulz still would not be comfortable with that [01:05:48] Can ?action=info be profiled via the Web UI? [01:05:48] I mentioned it is anti-pattern [01:05:56] It'd be interesting to see which queries it's running. [01:06:03] imagine that number being on every page, and it being beyond memcached [01:06:15] so it is very cheap when it is in the cache [01:06:19] Analytics is expensive; film at 11. ;-) [01:06:25] but when it is not in the cache, you end up with stampede and thousands of threads all counting same rows [01:06:27] tadaaaa [01:06:40] ddsh ... sudo service memcached restart [01:06:42] domas: Well, there are more nasty things which we yet do [01:06:49] and we will probably do in the future [01:06:53] "the domas performance test" [01:07:16] as someone mentioned, yeh, just buy lots of ssds [01:07:19] and it will be all fine [01:07:32] and fuck-all RAM for the innodb buffer [01:07:41] You could put the counts behind $wgMiserMode. [01:07:44] domas: So in you're opinion it's actually good to have those user_text fields and stuff? [01:07:56] But it feels pretty cruel to punish 700+ wikis because a few are popular. [01:09:16] domas does not care about cruelty to users, only servers [01:09:18] There are a few COUNT(*)s on recent changes. [01:09:25] recentchanges [01:09:26] why is enwiki punished with high cost-per-user of small wikis? [01:09:27] :) [01:09:43] Good night... I'm out, I don't really care about that counter anyway [01:09:56] just put the query in the watchlist group [01:09:57] ok, bye :) [01:10:13] anyway, you can do whatever you want [01:10:14] then it will be guaranteed to be in RAM, and then we are only talking about some tens of milliseconds of CPU time [01:10:21] I'm just telling what are good practices and what are the costs [01:10:38] I'm not sure storing counts for the recentchanges counts would make sense. [01:10:46] talking about overprovisioning hehe [01:10:55] For revision, probably makes sense. For watchlist, it definitely seems like overkill. [01:11:07] well [01:11:10] counts also deadlock easily [01:11:21] at least the way we do them [01:11:32] http://ganglia.wikimedia.org/latest/graph.php?h=db1043.eqiad.wmnet&m=cpu_report&r=day&s=by%20name&hc=4&mc=2&st=1359076267&g=cpu_report&z=medium&c=MySQL%20eqiad [01:11:42] you see that we have some spare CPU time on the watchlist server in eqiad [01:11:42] it needs a password [01:11:43] in any case, it's serializing unrelated thing [01:11:43] heh [01:11:43] For watchlist, after the top 1000 pages on en.wiki, we're talking about 650 rows or fewer, it looks like. [01:11:51] *things [01:12:02] domas: I imagine you're allowed to have the password. [01:12:05] I'm not. :-( [01:12:09] TimStarling: "some" [01:12:09] I had it somewhere [01:12:13] heh [01:12:27] timstarling: well, thats what I say, one can have whatever he wants as long as he overprovisions hardware [01:12:34] * domas is not relevant [01:12:42] irrelevant * [01:13:00] * domas was double-plus-relevant when we had no money to spend on hardware [01:13:10] :) [01:13:35] Of all the complaints I can list about the Wikimedia Foundation, I really can't add "buying too many servers" to the list. [01:13:49] I'm serious, instead of wfGetDB(DB_SLAVE)->select(...) it should be wfGetDB(DB_SLAVE, 'watchlist')->select(...) [01:14:26] I don't think anyone thought you were kidding. You should file a bug. :-) [01:14:26] someone here is serious? [01:14:30] Susan: *shrug*, I never asked for your list, nor I thought it was credible [01:14:52] you can wave your hands as much as you want [01:15:03] it is easy to get metrics on crap like that [01:15:14] I'm just saying that an argument that tries to make overprovisioning the enemy is kind of weak. [01:15:38] how much do you donate to WMF? [01:15:40] in cash? [01:15:47] Not much. [01:15:57] I haven't in years. [01:15:57] how much do your friends donate? [01:15:57] More than me! [01:16:22] how often do you have to do capacity management of any sorts? [01:16:27] I had a friend from work tell me he asked for a donation to Wikipedia for a Christmas gift. [01:16:38] I thought that was sweet. [01:16:46] Not often, thankfully. [01:16:48] I had friends, who were major donors, who stopped being major donors [01:16:59] I used to donate [01:17:01] I stopped too [01:17:02] heh [01:17:11] Sure. [01:17:13] world didn't stop [01:17:16] there's new generation of donors [01:17:21] Right. [01:17:25] and yet you donate a lot of your precious time in this discussion [01:17:27] :-) [01:17:28] And your time and expertise are much more valuable than cash. [01:17:50] I donate quite a lot. [01:17:53] you don't seem to care much about it anyway, ha ha [01:18:06] paravoid: yup [01:18:22] that's not true! people care enough to play along [01:18:30] I care. I just feel your argument is weak. [01:18:52] which one? [01:18:58] that one can solve problems by overprovisioning? [01:19:25] If you could say "X feature costs Y" and Y is understandable and relatable, that'd be a stronger argument, I think. [01:19:49] Instead, the argument seems to be "the problem can be solved with more servers." To which I say: that's a much better use of money than most other things WMF buys. [01:20:02] I'd take a few more servers over a few more staff any day. [01:20:49] i'd take a new datacenter not in tampa :) [01:20:52] And in the abstract, the Y cost doesn't mean anything to anyone but DB engineers. ;-) You could convert it to dollars. [01:20:54] wikipedia, where anyone can be HR/CFO/VP/CTO/etc at the same time [01:21:37] shutting down the datacentre in tampa would obviously be a bigger win for technical spending than removing the watchers count from action=info [01:22:00] well, there would be some dc to takes its place [01:22:17] *take [01:22:30] TimStarling: never said that is not the case [01:22:37] *shrug*, I didn't yet tell anywhere that anyone should do something [01:22:41] AaronSchulz: not really [01:22:41] There's a lot of waste in any non-profit. But overprovisioning servers doesn't feel like waste to me when a huge portion of the goal of the projects is to serve Web pages. [01:22:50] I said there're tradeoffs, indeed [01:22:52] TimStarling: is that not the plan? [01:22:56] the question is whether or not to have a spare datacentre [01:23:02] in case one fails [01:23:07] Susan: I guess in commercial world people have different perspective, because they have to buy their servers [01:23:19] Wikimedia steals them. [01:23:25] extorts [01:23:27] Heh. [01:23:30] would be correct word [01:23:58] TimStarling: there's a question about that? [01:24:14] I thought pmtpa was being taken offline completely. [01:24:15] maybe a question by tim ;) [01:24:27] I thought we were all agreeing that we need a secondary site [01:24:47] A secondary site or a secondary data center? [01:25:12] and that pmtpa has a great deal of technical debt accumulated from being evolved all these years, so it might make sense to build a new one somewhere [01:25:27] and west coast/SF was one of the proposals [01:25:29] I believe the plan is to build a caching site in SF. [01:25:35] But not a data center, per se. [01:26:09] no, the short-term plan is to build a caching center in SF (ulsfo), and possibly build a new DC somewhere in SF (but in ulsfo) [01:26:14] but not* [01:26:38] Oh, I hadn't heard anything about a new DC. [01:26:46] because it's not decided yet [01:26:54] it's just an idea that's been floating. [01:27:02] Sometimes there are discussions before decisions. [01:27:11] Or there used to be... [01:27:13] yes, and we haven't had that discussion yet. [01:27:51] What happened to yaseo and the knams? Are they both still around? [01:28:17] Guess yaseo is gone. [01:28:28] knams still seems to be around. [01:29:24] there are still knsq servers [01:29:31] it's esams now [01:29:37] evoswitch not kennisnet [01:30:06] Should http://wikitech.wikimedia.org/view/Kennisnet_cluster be marked historical then? [01:30:08] we still have some networking equipment in kennisnet but most of the amsterdam stuff is in the evoswitch DC [01:30:14] yea, we have knsq and amssq [01:30:37] sq? [01:30:40] squid [01:30:49] Ah. [01:30:55] Susan: no, it just shows a couple of bits of networking equipment [01:31:01] that's what is there [01:31:16] http://wikitech.wikimedia.org/view/Amsterdam_cluster has a note about how it's allegedly "Enqueued for move to the new wikitech". [01:31:32] I don't think the new wikitech idea ever went anywhere. [01:31:51] http://wikitech.wikimedia.org/view/Special:WhatLinksHere/Template:MoveToNewWiki [01:31:55] Hm. [01:36:07] Krinkle|detached: solution for your mailman question. login as list admin and go to "Edit the public HTML pages and text files", then edit the HTML of your "Subscribe results page" and remove or deactivate the form [03:51:32] TimStarling, Susan: don't you think we should just be a bit ruthless and delete 'historical' pages from Wikitech? Wikitech is less useful to the extent that it is a museum. [03:52:15] I don't think the old pages are in the way. [03:53:26] ori-l: should we delete articles from wikipedia just because they are historical? [03:54:13] p858snake|brb: the primary thing wikitech and wikipedia have in common is the software that powers them; I don't see how that isn't a straw man argument. [03:54:13] That's not really comparable. [03:55:05] I think it's somewhat comparable to having a medical manual that contains historic treatments for various conditions alongside contemporary, evidence-based ones. [03:55:40] The 'historical' template doesn't affect what comes up for search autocompletion or results, or what appears in category lists, or the appearance of wikilinks to the article [03:56:30] So there are a lot of views in which historic content makes an appearance without being clearly marked as such, and that diminishes from wikitech's usefulness as a quick diagnostic manual, and also makes it more difficult to assess the adequacy of coverage of various topics. [03:56:57] Do you have an example? [03:57:16] Sure, hang on a moment. [03:57:59] https://en.wikipedia.org/wiki/Wikipedia_talk:Miscellany_for_deletion/Wikipedia:Esperanza [03:58:16] "Messedrockerfy" is an interesting concept. [03:58:23] Not sure if you're familiar. It's kind of obscure these days. [03:59:38] Well, let's start with the simple and obvious: [03:59:41] http://wikitech.wikimedia.org/view/Main_Page [03:59:46] Which ones are stale? [03:59:53] Can you tell? [04:00:10] well of course historical content won't appear "historical" unless someone spends the time to mark them as such, its kinda a biased argument to make [04:00:57] ori-l: Right. That index just needs to be updated. [04:01:15] I don't have any issue with removing categories or de-linking irrelevant material. [04:01:49] If you start deleting content, I doubt you'll ever stop. The whole site is pretty bad. [04:01:51] right, but the preservationist attitude counsels treating content as precious [04:03:26] Krinkle doesn't seem to think so. [04:03:34] http://wikitech.wikimedia.org/history/Collected_Status [04:03:50] Let me put it another way: instead of deleting pages, why not fix them? :-) [04:04:51] If you delete the pages, you'll have red links that you'll still need to remove. [04:05:00] Why not just remove the links and leave the pages? [04:06:12] I've turned into a bit of a wiki-archaeologist, so it won't be easy to convince me to delete some of this stuff. Sometimes I reference it. [04:09:21] Sorry to be rude, my son woke up, so I have to go afk for a bit. [04:47:31] * Jasper_Deng would think http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#content_not_refreshing_from_Hungary.2C_using_ipv6. 's premise is invalid [05:21:24] TimStarling: So duh tells me that the combination of a lack of interwiki transclusion and the support for global images (from Commons) has lead users to create global user pages created out of images: https://commons.wikimedia.org/wiki/File:Snowolf_gluserpage.png [05:21:30] I thought you might appreciate that. [05:22:41] It's a smart idea imo [05:34:04] smart, for those who want the hassle of creating a new image for each edit to the userpage [05:38:38] I don't understand the linkability. Do you use an imagemap with it, then? [05:40:05] rschen did [05:40:12] Snowolf used |link= [05:40:34] https://de.wikipedia.org/w/index.php?title=Benutzer:Snowolf&action=edit [07:54:16] Susan: I don't use an image map because to update it I'd have to change the image map on all wikis; I merely have a link to meta and I update the image on commons [07:54:29] Jasper_Deng_away: easier to update one image than 732 pages :) [07:54:53] Late at night, I hear the mice. [07:57:10] Have you tried restarting? [07:57:38]