[10:45:48] its a DanielK_WMDE_ [10:46:13] Check my comment on your next services patch, you have a method you talk about but that you havn't implemented yet! [10:49:41] Lydia_WMDE: around? [10:52:37] don't think so [10:52:44] neither of them [10:55:42] :D [12:39:58] We got 871 identifiers now... I guess the biggest pile is done :) [12:45:07] :) [13:21:42] If I go to https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q34201&format=json&languages=en I get wikidata on Zeus, but all the values for statment like "sex or gender" come as further wikidata ids instead of title, as in "6581097" instead of "male", is it possible to add something to the URL so that I get male there instead of 6581097? [14:18:51] addshore: Do you think (greatly) cutting down the number of properties loaded by authority control justifies a few more ms spent in its getproperties function? [14:19:02] yes [14:20:38] In that case, see https://phabricator.wikimedia.org/P2797 [14:21:07] Reduces the number of properties loaded from 41 to 9 on Q64 and from 37 to 11 on Q183 [14:21:45] getProperties is probably a bit slower, but I wasn't able to measure the difference [14:22:08] Only on Q21558717 I could measure it (don't load that one, its gigantic) [14:22:21] it's about 30ms slower on that item [14:26:17] I'll take another look today then! :) [14:35:16] https://gerrit.wikimedia.org/r/270255 [14:35:51] https://gerrit.wikimedia.org/r/278234 [14:37:50] wait hoo https://phabricator.wikimedia.org/P2797 is the same as the last version you gave me... :P [14:37:53] infact, its the same link! [14:38:02] oh wait, or did you edit it? [14:45:54] addshore: Edited it, see history [14:45:59] :) [15:22:38] addshore: around now but might drop out any time [15:22:40] wasup? [15:22:41] :) [15:22:46] shitty hotel wifi... [15:25:03] haha [15:27:35] Lydia_WMDE: https://www.wikidata.org/wiki/Special:Contributions/Addshore [15:29:42] addshore: \o/ [15:29:44] <3 [15:29:51] <- does a little happy dance [15:30:30] yeh, going to see how many it adds [15:30:30] :) [15:52:17] hoo: have you tested that new code then (im guessing yet) [15:52:19] *yes [15:52:30] Yeah, I did [15:52:36] but mostly just profiled it [15:52:43] hoo: looking at the diff why did if ( entity.datatype === "commonsMedia" ) { return true; } get removed? [15:52:53] because commons media is already linked [15:53:05] thus those get caught by the DOM check for > a already [15:53:09] makes that redundant [15:53:11] ahh! [15:53:37] Things to look for: imdb special handling, coordinate special handline [15:53:43] those are the edge cases [15:53:45] updated [15:54:10] addshore: You got the wrong version [15:54:36] needs to be if ( $.isEmptyObject( PROPERTIES ) ) { [15:56:14] whahahaa [15:56:15] ;0 [15:56:39] yeah... I initially didn't realize it's an object [15:56:53] I added that thing after I done most of the testing :P [15:57:21] fixed [15:57:22] :D [15:58:16] The api gets slower with the number of properties loaded (big surprise, eh) [15:58:32] yeh :P [15:58:40] surprise surprise [15:59:09] I guess we're doing meta data prefetching fine these days [15:59:16] so no more low hanging fruits on that end [15:59:20] yeh, although *goes to find the ticket* [15:59:28] but we will only load a few entities anymore now [15:59:47] https://phabricator.wikimedia.org/T131026 [16:00:03] XD [16:00:25] kind of feel like wbgetentites should have a lower limit... [16:00:46] haha, nice [16:01:06] yeh, and that request is the prefetching [16:01:31] probably maria is doing a stupid index choice in that case [16:01:36] need to EXPLAIN the query [16:03:59] hoo: https://phabricator.wikimedia.org/T131026#2187220 [16:25:25] addshore: wtf [16:25:32] how would that possibly be a good index [16:26:33] Open a bug about the index thing... jynus probably has input on it [16:26:41] or we can force an index, if needed [16:28:07] or re-purpose the first bug [16:41:58] :D [17:01:04] addshore: Easy re+2 https://gerrit.wikimedia.org/r/#/c/279118/1 [18:22:21] DanielK_WMDE_: around? [18:23:21] SMalyshev: yes, but jetlagged & recovering. 1 hour time shift is no problem, but a missed connection and a night at an airport hotel is >_< [18:23:23] what's up? [18:23:39] DanielK_WMDE_: sorry to hear that... [18:23:59] but you probably have more of a lag to deal with ;) [18:24:50] DanielK_WMDE_: so I am working on service-izing SearchEngine, and there's this bunch of static methods... I tried to convert them to non-static, but then you can not override them with static ones, and turns out that's what happens in SearchMySQL [18:25:15] i.e. they have static legalSearchChars, they override it, and then they call it as $this->legalSearchChars() [18:25:24] it's an unholy mess :( [18:25:39] So I wonder what'd be the best way to address this [18:26:14] I can fix SearchMySQL but there can be other extensions of course... third party etc [18:28:27] SMalyshev: the b/c trick is to make a new class with non-static methods, and make the static methods on the old class delegate to the non-static methods of the global service instance. [18:28:54] DanielK_WMDE_: I don't really want to make new class for SearchEngine... [18:28:56] it's annoying, but not su horrible if we deprecate and later actually remove the old classes. [18:29:25] Well, we can either have it pretty, or backward compatible :) [18:29:31] especially given these methods are really used only once and mostly return constant or near-constant things [18:29:53] Also, iof the static methods don't access global state, it's not stricly necessary to convert them. Though it's nicer, since static methods can't be mocked. [18:30:11] "near-constant" is a slippery slope :D [18:30:36] DanielK_WMDE_: well, in case of legalSearchChars it's not even truly static... they override it and call it with $this but define it as static :( [18:31:00] So I could leave them as static but that makes it not nice.... [18:31:01] o_O [18:31:04] does that even work?? [18:31:15] sure [18:31:19] DanielK_WMDE_: yes, it works, you can call static method with $this [18:31:25] actually - that means it won't break if you make it non-static ;) [18:31:26] and it would even call the right one [18:31:35] no, not the other way around [18:31:52] DanielK_WMDE_: it would since you can not extend non-static method with static [18:32:10] so if I make it non-static in SearchEngine.php and you have extension which overrides it with static... [18:32:15] messy :( [18:32:32] SMalyshev: you have an extension that declares it static, but treats it as non-static ;) [18:32:33] silly [18:33:10] DanielK_WMDE_: exactly. but that's what we have in out own code, so we can't complain... [18:34:05] SMalyshev: looking at SearchEngine, it does seem to me that this class should be split up anyway. Type-hinting against the default (sql) implementation is evil anyway. So make narrow interfaces, move/split the default impl, keep the old class for compat. [18:34:33] SMalyshev: doing all this at once would be tricky though. Let's try to make small steps towards it. [18:34:56] if that means keeping some static methods for now, so be it. Add some warnings and todos :) [18:35:01] DanielK_WMDE_: SearchEngine doesn't really have DB implementation afaik [18:35:17] it should probably be abstract class [18:35:49] ah, right, it's just the factory method that calls wfGetDB... for no clear reason. [18:36:23] as it is, it's actually a null implementation, it seems [18:36:32] and factory method is going out anyway [18:36:40] to its own class [18:37:24] SMalyshev: my approach would be to gut it: extract sensible parts of the code into separate classes and use composition to keep the behavior of the old class. Change code to tyhint against narrow interfaces, instead of the whole shebang. [18:38:21] DanielK_WMDE_: well, SearchEngine API right now is a bit fat. And I don't want to do too big an overhaul in one patch :) [18:38:35] hm, getNearMatchInternal seems to be an obvious candidate for extraction. Why should that be hard-coded? [18:38:51] but SearchEngine as a base class I think makes sense. Maybe also makes sense to extract some things [18:39:33] SMalyshev: Sure. I'd start by extracting things bit by bit. Ideally, there will be nothing left to do for a base class [18:39:37] DanielK_WMDE_: maybe. getNearMatchInternal worth a look. [18:39:55] hardcoded legalSearchChars. Ugh. [18:40:09] also things like userHighlightPrefs may be happier in config class than SearchEngine class maybe [18:40:23] since they can be potentially configured even if now they are not [18:40:59] SMalyshev: userNamespaces etc are really accessors for configuration, which should be injected. the static method can just look in the global search config [18:41:33] SMalyshev: actually, that seems to be it: static methods are either accessors for config, or they do near match stuff. [18:41:34] DanielK_WMDE_: that is already done. See https://gerrit.wikimedia.org/r/#/c/282217/5/includes/search/SearchEngineConfig.php [18:41:54] DanielK_WMDE_: but not for all statics. So maybe I'll move all of them [18:42:00] except for nearMatch [18:43:10] and then extract nearMatch into a separate service. and let the statics delegate to that service. [18:43:20] I am worried about overriding a bit though... if you wanted to override legalSearchChars like searchMysql does, I need to override config class then. which is tricky [18:43:53] because it's going from config... so not sure how override would work then [18:44:03] the near match service interface can actually be implemented by the concrete search engine class. but the methods would have to have slightly different names then, to avoid name clashes between the old static methods and the methods from the new interface [18:44:46] SMalyshev: does it really need to override the config class? then it's not really a config class, right? the extension could perhaps change the data in the config instead. [18:45:00] DanielK_WMDE_: well, the thing is there's no data :) [18:45:22] DanielK_WMDE_: right now SearchEngine.php has fixed set of legal chars, and SearchMySQL has a different set [18:45:24] but it's also possible to override the wiring for the config at the same time as they override the wiring for the SE itself. not pretty, but entirely possible. [18:46:06] DanielK_WMDE_: that's problematic since in existing config there would be config for SearchEngine but not for config class... so they'd get a wrong one [18:46:50] SMalyshev: the legal chars can just be config, right? Overriding the static function in an extension would no loinger have an effect, once the config is no longer accessed via the static methods. But we could keep doing this for now, and only change it later. [18:47:30] SMalyshev: i'm not sure i follow the bit about the existing config. if they override the wiring, there is no pre-existing config. they controll the instantiation. [18:47:42] if the override the wiring for the config, i mean. [18:48:57] anyway [18:49:00] DanielK_WMDE_: well, so what happens is: now we have legalSearchChars() in SearchEngine.php and legalSearchChars() returning fixed string, and one in SearchMySQL.php returning different fixed string. And we have call like $this->legalSearchChars [18:49:11] so if we move it to config, it can not depend on $this anymore... [18:49:25] sure, it can depend on $this->config. [18:49:54] DanielK_WMDE_: but then SearchEngine concrete class should control which class gets instantiated as config class [18:50:00] DanielK_WMDE_: currently it's not the case [18:50:11] they are independent services [18:50:13] no - the *extension* should control that [18:50:15] but not the class [18:50:27] DanielK_WMDE_: they are both core, not extension [18:50:27] the extension would override both, the wiring for the engine, and the wiring for the config [18:51:06] Ah. in that case, you have a config switch in the instantiator callback that creates the config [18:51:43] so what kind of config you get depends on what search engine you have configured. [18:51:48] not that pretty, but ok [18:52:30] actually, if the concrete implementations are both core, the config class itself could use different hard-coded defaults, based on the search type [18:52:34] that's perhaps the simplest solution [18:52:46] though, again, not pretty :) [18:52:57] DanielK_WMDE_: so you say the SE class would contain like class name for its config? [18:53:13] intermediate refactoring steps with b/c code are often a bit hairy... [18:53:28] and the instantiator would instantiate SE and then get the class name and instantiate that? [18:53:37] SMalyshev: no, why? [18:54:36] in SearchEngineConfig::getLegalSearchCharacters, return different values based on the search type. [18:54:40] DanielK_WMDE_: ok, then what? i.e. when I do legalSearchChars() how I ensure correct one is called? [18:54:48] they can be hard coded in the config class for now. they were hard coded before, right? [18:55:07] DanielK_WMDE_: you mean like switch? I don't like that [18:55:24] return $this->getSearchType() == 'MySqlSearch' ? "xxx" : "yyy". [18:55:28] DanielK_WMDE_: they were hardcoded, but in their own classes [18:55:43] DanielK_WMDE_: ugh. Don't like this. very smelly [18:55:44] true. it's a bit sucky. [18:56:03] if the config knows the class name of the search class, it can look at static there. but that's also not very nice. [18:56:15] and can not work for any custom search type [18:56:25] yes, it's smelly. there is no way to get a perfect solution in one step [18:56:28] we have to compromize [18:56:43] it think it would be ok, if there is a // TODO with a clear way forward. [18:57:35] DanielK_WMDE_: also, note that we can have non-default search engines instantiated [18:57:47] SMalyshev: the thing is: the config knows what engine class to use, and it knows what default chars to use. Both these things come from the same thing (core wiring, or extension wiring) [18:58:09] DanielK_WMDE_: and each of those should use correct char set. so $this->getSearchType() won't work as that is for default engine [18:58:10] the concrete config has this knowledge (as data, or hard coded). [18:58:25] DanielK_WMDE_: currently, config knows nothing [18:58:35] well, it should ;) [18:58:44] except the globals. But you can instantiate different SE that is not in the globals [18:59:11] e.g. SearchUpdate instantiates several [18:59:30] but only one config service object, so it can not serve all these engines at once [18:59:43] ok, let's check again what problem we are trying to solve. [19:00:16] we want to make the legal chars overridable in a dynamic method or member [19:00:28] so they can be injected/overwritten from config [19:00:39] DanielK_WMDE_: well, not exactly [19:00:43] but we want to stay compatible with extensions that override the static method [19:00:51] are there any such extensions? [19:00:53] DanielK_WMDE_: we'd like to have legal chars dpeend on concrete engine class [19:01:14] this is already the case. [19:01:23] so... just leave it? [19:01:40] DanielK_WMDE_: but legalSearchChars is defined as static and used as non-static [19:01:45] it's hardcoded. no global state -> no urgent problem [19:01:53] so we can leave it alone but that kind of sucks. [19:02:03] can the $this... call just be replaced by static::...? [19:02:20] that would do the right thing, right? [19:02:21] I wondered if we can find a better way. Maybe not [19:02:36] i can see none that is 100% b/c [19:02:51] DanielK_WMDE_: it can be maybe but $this is actually better because once we fix it we'd want it to be called with $this [19:02:57] a switch in the config, or the code that create the config, is another solution. [19:03:56] SMalyshev: how about this: make a new non-static methid, call it with $this. The implementation just calls the old static method, with static::, not self:: (unless i'm confusing the semantics). [19:04:10] hmm... I guess I just leave those two config-like functions alone, but will move getNearMatch to a separate class [19:04:11] then deprecate the static method [19:04:29] DanielK_WMDE_: yeah that may be possible [19:04:40] the config-like method that access globals should move. the hard coded onces are not that urgent. [19:05:16] DanielK_WMDE_: by config-like I mean "hardcoded, but throretically may not be" :) [19:05:47] ok :D [19:54:28] what's up with wikidata deployments? https://www.mediawiki.org/wiki/Wikidata_deployment seems to be a bit out of date [19:56:21] aude, hoo: any insights? ^ [19:56:39] 7nick hoo [19:56:48] Sadly it's almost up to date [19:56:55] we didn't deploy any new code since wmf16 [19:57:01] will probably next week [19:57:03] so. no deployments for a month? ouch [19:57:10] yeah :/ [19:57:18] ok then, I'll wait till next week [20:48:18] DanielK_WMDE_: also see https://grafana.wikimedia.org/dashboard/db/mediawiki-watcheditemstore for the stuff I put in WacthedItemStore [20:59:47] addshore: what does eg 1K mean for caching? 1K per what? second? minute? hour? [21:00:50] ...and what happened around 21:00? [21:02:23] is there a way to select all statements using properties with a particular data type in sparql? [21:02:33] it seems like it should be possible, but I can't figure out how [21:27:40] benestar|cloud: About? [21:28:13] benestar|cloud: https://phabricator.wikimedia.org/T105794 [21:28:30] "BeneBot*/1.0 WikibasePhpLib/0.1" (from logstash) [21:28:32] fix it! :) [21:33:04] DanielK_WMDE_: per minuite [21:33:15] and DanielK_WMDE_ thats was the deployment onto the final group of wikis! [21:36:07] oh wait the big spike in uncaches, hmm, I have no idea! [21:36:54] but yeh less than 25% of requests are retrieved cached [22:00:39] addshore: Still around? [22:02:12] Hoo yup [22:02:14] Just [22:02:45] I was just doing some profiling and wtfed [22:02:56] guess where we spent an awful lot of time in wbgetentities [22:03:05] SiteLinkTargetProvider :P [22:03:06] No idea :) [22:03:17] Really? How much? And doing what? [22:03:27] I wonder whether we should ACCEL cache it [22:03:40] especially as it's thhe only reason for loading the sites in most requests [22:04:25] the numbers are not conclusive... profiling an API call I got a baffling 170ms, but that's probably because of the profiler making function calls slow [22:04:45] Oh lol, only 1 method in it! [22:05:00] Running it interactive I get around 20ms on hhvm (on second run, so after already fetching the sites) [22:05:29] php is about the same once the sites have been loaded [22:05:42] not sure how long it takes to load the sites in production (as they're ACCEL cached in hhvm) [22:05:59] but I'm pretty sure that's way to much [22:06:35] well, dont we have profiling stuff from the live site? [22:06:43] WE do [22:07:13] https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=P373|P213|P345&props=claims&format=jsonfm&forceprofile=1 [22:07:14] P213 cirrus visibility - https://phabricator.wikimedia.org/P213 [22:07:14] P373 test-uwsgi-threading.py - https://phabricator.wikimedia.org/P373 [22:07:14] P345 kill sajax - https://phabricator.wikimedia.org/P345 [22:07:32] (you need the wikimedia debug addon thing for that to work) [22:07:44] we need jsonfm so that the profiling information can be put somewhere [22:08:22] *has that* [22:08:39] hah "*": "Unrecognized parameter: 'forceprofile'" [22:09:07] scroll down to the very bottom of the source [22:09:23] 29.40% 174.833 1 - Wikibase\Repo\Api\GetEntities::getAllowedParams [22:09:23] 29.26% 173.963 1 - Wikibase\Repo\SiteLinkTargetProvider::getSiteLis [22:09:54] but hoo https://performance.wikimedia.org/xhgui/url/view?url=%2Fw%2Fapi.php%3Faction%3Dwbgetentities [22:10:04] doesnt that do something too? ;) [22:10:25] dont think that provides the profiling info :9 [22:10:45] oh, or does it https://performance.wikimedia.org/xhgui/run/view?id=5706cf73b47a0c8a761aaa37 [22:11:22] Wikibase\Repo\SiteLinkTargetProvider::__invoke is the longest self wall time and Site::unserialize is the biggest memory hog [22:11:22] Yeah, indeed [22:11:27] and that has the same nubmer [22:12:23] yeh, Wikibase\Repo\SiteLinkTargetProvider::__invoke is just the worst! [22:13:09] but... heh [22:14:23] why __invoke> [22:14:38] addshore: $sites->uasort( … ); [22:14:43] the anon. function [22:14:43] ahh! [22:16:36] Anyway... I think we should just cache that in some way (probably CACHE_ACCEL with a short duration) and ... magic [22:16:51] yeh :P [22:17:22] hoo: well, yeh, it looks like its the sorting that hurts, introduced in https://gerrit.wikimedia.org/r/#/c/225088/3/repo/includes/SiteLinkTargetProvider.php [22:17:48] :( [22:17:48] Order SiteLists alphabetically for easy use in API sandbox -> Maybe best to just do the sorting specifically for the api sandbox? ;) [22:17:59] imo that may also be a good idea [22:18:16] maybe api sandbox should order itself? [22:18:22] I don't think we should bother with it [22:18:30] yeh, I'd be happy with that [22:18:38] especially if it removes such load [22:18:41] but even only loading the objects and unserializing them so often is bad [22:18:57] it would also remove all of the many calls to Site::getGlobalId :P [22:19:02] back to the sites caching drama :( [22:20:13] I mean, removing the sorting there and just doing it in the sandbox itself would save just over 300ms for each and every call to wb getentities [22:21:07] 7,000,000 calls per day [22:21:07] I like how many pages no longer even need API call to wbgetentities now [22:21:16] thats 583 wall hours per day ;_) [22:21:39] 24 wall days per day xd hah [22:22:19] anyway, sleep time in a bit, finally got the pigeon out of the chimney! [22:22:27] I'll leave all these tabs open to look at again tommorrow! [22:22:32] Nice [22:22:40] Yeah, I'll probably leave in a bit as well [22:22:53] I think we did the most to authority control we could [22:23:00] would be nice if it didn't even exist :P [22:23:04] yeh