[05:33:21] ACKNOWLEDGEMENT - WDQS HTTP on wdqs1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 416 bytes in 0.040 second response time daniel_zahn scheduled downtime was set
[05:33:21] ACKNOWLEDGEMENT - WDQS SPARQL on wdqs1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 416 bytes in 0.012 second response time daniel_zahn scheduled downtime was set
[10:16:18] is there a bot that fills in the properties of any wikidata items?
[12:56:04] jzerebecki: is there anything special i need to do for jenkins to pick up a new composer dependency? https://gerrit.wikimedia.org/r/#/c/286144/
[13:17:30] DanielK_WMDE: that is not the problem: see in the log - Installing wikimedia/purtle (v1.0)
[15:05:41] nikki: I reverted two of your Facebook ID additions, not sure if musicbrainz is a trusted source for those
[16:14:27] jzerebecki: https://gerrit.wikimedia.org/r/#/c/286144/
[16:15:46] DanielK_WMDE: did you see https://phabricator.wikimedia.org/T133924 ?
[16:16:03] wonder if you have any suggestions?
[16:16:05] aude: i saw your mails.
[16:16:17] ok
[16:16:48] unrelated, i'm also trying to figure out https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/10722/testReport/junit/(root)/EchoUserLocatorTest__testLocateArticleCreator/testLocateArticleCreator_with_data_set__0/
[16:17:05] Echo gets loaded anyway in our tests because Wikidata depends on ContentTranslation
[16:17:08] now*
[16:17:16] and ContentTranslation requires Echo
[16:17:18] aude: jzerebecki just told me about that
[16:17:23] DanielK_WMDE: ok
[16:17:24] it's silly
[16:17:50] i would like to fix the build first, since we have something to backport (and maybe will have something for the rdf issue)
[16:18:21] aude: the error "bad transition" means that the purtle commands are called in the wrong order. the serializer is stateful, you can't output an object if you didn't output a predicate before (or something).
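A minimal sketch of the kind of stateful streaming writer described in the last message, written in Python rather than purtle's PHP. The class name, the state names, and the transition table are hypothetical and simplified; they are assumptions for illustration, not purtle's actual states or API. It only shows why calling the methods in the wrong order produces a "bad transition" error.

```python
from enum import Enum


class State(Enum):
    DOCUMENT = "document"
    SUBJECT = "subject"
    PREDICATE = "predicate"
    OBJECT = "object"


# Allowed edges of the automaton (hypothetical, simplified).
ALLOWED = {
    State.DOCUMENT: {State.SUBJECT},
    State.SUBJECT: {State.PREDICATE},
    State.PREDICATE: {State.OBJECT},
    State.OBJECT: {State.OBJECT, State.PREDICATE, State.SUBJECT},
}


class SketchWriter:
    """Hypothetical streaming triple writer that only tracks current state."""

    def __init__(self):
        self.state = State.DOCUMENT
        self.triples = []
        self._subject = None
        self._predicate = None

    def _transition(self, new_state):
        # Reject any step that has no edge in the transition table.
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(
                f"bad transition: {self.state.value} -> {new_state.value}"
            )
        self.state = new_state

    def about(self, subject):
        self._transition(State.SUBJECT)
        self._subject = subject
        return self

    def say(self, predicate):
        self._transition(State.PREDICATE)
        self._predicate = predicate
        return self

    def value(self, obj):
        self._transition(State.OBJECT)
        self.triples.append((self._subject, self._predicate, obj))
        return self
```

In this sketch, `SketchWriter().about("Q1").say("P1").value("x")` records one triple, while calling `say()` on a fresh writer raises, because there is no edge from the document state to the predicate state.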
[16:18:34] so we need to find out what causes this to get out of whack
[16:18:34] we could revert https://gerrit.wikimedia.org/r/#/c/285444/ temporarily
[16:18:53] ok
[16:19:00] aude: for the echo thing? i'll leave that to jzerebecki to figure out
[16:19:05] maybe the predicate is deleted?
[16:19:18] no, it's a streaming interface
[16:19:20] or just wrong order as you say
[16:19:22] aude: yes, thx, I'll revert that
[16:19:25] it's all about current state. no persistent state
[16:19:32] there is just one current predicate
[16:19:34] or none
[16:19:34] ok
[16:20:06] not sure the predicate is the problem here. a similar error occurs when you try to write a subject just after you wrote a predicate. can't do that, need an object or literal first.
[16:20:18] i'll have to check what 5 -> 11 actually is
[16:20:21] hmmm
[16:21:00] it's a regular automaton. 5 and 11 are states, but there is no edge between them
[16:27:01] aude: it would be very helpful to know which item triggered this error...
[16:27:21] DanielK_WMDE: yeah
[16:28:09] i think https://gerrit.wikimedia.org/r/#/c/286083/ would help and then we can try to run a dump again
[16:28:21] from looking at the code, i see no way this can happen.
[16:28:42] hey
[16:28:54] or i can try to run with a batch of the missing ids
[16:28:58] hey SMalyshev
[16:29:07] any progress with the dump thing while I slept?
[16:29:14] SMalyshev: we are just discussing
[16:29:26] * aude just woke up not that long ago :)
[16:30:01] aha, I see. It looks like some subject gets skipped wrongly, because 5->11 is document->predicate
[16:30:06] SMalyshev: i'm just looking at the code. 5 -> 11 indicates that something calls say() before about() is called. But QuantityRdfBuilder also calls say() on the same writer that later ComplexValueRdfHelper calls say() on, and the earlier call doesn't fail
[16:30:12] i can't see what could trigger this state
[16:30:43] i.e. subject is missing for something.
I suspect it's some interaction between dedup and current subjects, as it's the only code path that can skip the subject
[16:30:48] but i think aude is right: we should make this more robust, and get a more meaningful error message. then we have a (mostly) complete dump, and we know what data triggers the problem
[16:31:38] so can we merge https://gerrit.wikimedia.org/r/#/c/286083/ then and try to run it on shard 1?
[16:31:54] SMalyshev: yea, i just gave it a +2
[16:32:01] the problem seems to be clearly in shard 1, around 5940875 somewhere
[16:32:03] ah cool!
[16:32:22] SMalyshev: btw, i just released purtle as a separate component. the change that removes purtle code from the wikibase repo was merged a few minutes ago
[16:32:34] i hope you are fine with being in the credits of the component
[16:32:45] yeah, sure, thanks
[16:33:10] SMalyshev: https://github.com/wmde/purtle https://packagist.org/packages/wikimedia/purtle
[16:33:22] ...and of course, it's 1.0.1 already ;)
[16:38:27] we can backport my patch, though for now i'm running the dump script on my list of missing entities
[16:38:47] to see if the error can be reproduced on that set of entities
[16:39:43] aude: in theory, the first entry in your list should trigger the error
[16:42:54] the list of ids is sorted and then i used sdiff to get the missing ids
[16:43:12] anyway, dumpRdf is going quick on the list
[16:55:49] aude: I would assume it would not reproduce just from missing ones.... I tried those individually and they work
[16:56:00] aude: do you want to deploy the backport or should I?
[16:56:12] jzerebecki: if you want
[16:56:20] we should add the rdf patch
[16:56:44] aude: this one?: https://gerrit.wikimedia.org/r/#/c/286083/
[16:56:56] https://gerrit.wikimedia.org/r/#/c/286177/
[16:58:01] SMalyshev: that's worrying
[16:58:21] but actually...
[16:58:25] i am just trying
[16:58:29] SMalyshev could be right
[16:58:37] DanielK_WMDE: yes it is.
[16:58:42] aude: doesn't hurt to try :)
[16:58:43] next thing would be to try an entire dump
[16:58:56] with the improved error logging
[16:58:58] aude: just shard 1, I think that's where the problem is
[16:58:59] aude, SMalyshev: the item that caused the error would have been partially exported. so it would not be "missing". everything after it (in the same shard) would be missing
[16:59:08] SMalyshev: true
[16:59:24] aude: how about trying shard 1/5 up to the first "missing" id?
[16:59:55] DanielK_WMDE: yes, I tried the ones immediately preceding the broken ones, but by themselves they seem to be fine...
[16:59:58] or does the log tell us what id we were trying to export when the error occurred?
[17:00:05] DanielK_WMDE: no it doesn't
[17:00:13] that's what my patch would do
[17:00:21] SMalyshev: that's really strange. from the looks of the error, it should be really localized.
[17:00:40] unless we are hitting a serious bug in zend (or hhvm)
[17:00:47] is there any phab project dedicated to wikidata editing?
[17:01:22] aude, SMalyshev: once we can reproduce it, i recommend trying with another php implementation. just to make sure it's not really a bug in the vm
[17:01:33] Danny_B: "Wikidata"?
[17:01:43] i am trying shard 1 also
[17:01:58] DanielK_WMDE: sure.... ;-) i thought if there is any dedicated one...
[17:02:03] bug in vm seems unlikely... though who knows
[17:02:18] the dump normally runs on snapshot
[17:02:33] snapshot1003
[17:03:01] i am trying on terbium, but think it's also hhvm there
[17:03:13] SMalyshev: yes, unlikely. but let's make sure.
[17:03:22] I'd rather suspect some cleanup missing in purtle parsing states.... i.e. somehow we are ending up in DOCUMENT state with currentSubject still set
[17:03:47] that's the only way we can get 5->11 from quantity generator as far as I can see
[17:04:09] SMalyshev: if the entities work by themselves, but not together, that is very suspicious. the error relates to rdf writer state, and it doesn't hold much state.
there is nothing that could carry over from a previous item
[17:04:13] at least i can't think of anything
[17:04:17] well, alternatively we try to output subject null/null...
[17:04:33] DanielK_WMDE: there's dedup has
[17:04:36] *hash
[17:04:58] SMalyshev: if you follow the code path the stack trace indicates, you will see that just before the error happens, another ->say() call succeeds. That should be impossible.
[17:05:01] so code flow may be different
[17:05:12] true
[17:05:32] DanielK_WMDE: well, the way say() may succeed without changing state is the check for current subject there
[17:06:07] the question is how we end up with current subject but in document state?
[17:06:36] I mean current predicate
[17:07:07] one way that could happen if $base and $local end up nulls, but that seems improbably
[17:07:11] *improbable
[17:09:16] another way if we got them from somewhere else, but then how do we end up in document state? VM bug could do that too, theoretically, by either getting us to null or copying variable to currentPredicate that should not be there... but that's a crazy assumption
[17:09:33] SMalyshev: QuantityRdfBuilder::addValue calls $writer->say()... and it succeeds. $writer should now be in state 12 (STATE_OBJECT). It then calls $this->addValueNode( $writer ... ), which calls ComplexValueRdfHelper::attachValueNode( $writer ... ). The call to $writer->say() in ComplexValueRdfHelper::attachValueNode fails, because it's in state 5 (STATE_DOCUMENT) now, and can't transition to 11 (STATE_PREDICATE) from there.
[17:09:49] But how did it get into state 5 if it was in state 12 already two frames up the stack trace?
[17:10:36] that's what I am saying - transition to state 12 is after if(). So if we took if() instead for some reason, the state may not change
[17:10:45] SMalyshev: to analyze this further, it would be really nice to know what combination of data triggers it
[17:10:52] the question is - why would we take that if()?
[17:11:11] SMalyshev: which if()?
[17:11:27] i see the transition in QuantityRdfBuilder line 52/53
[17:11:28] if ( $base === $this->currentPredicate[0] && $local === $this->currentPredicate[1] ) {
[17:11:28] Buffer 1 is empty.
[17:11:28] return $this; // redundant about() call
[17:11:28] }
[17:11:42] wut?
[17:12:13] DanielK_WMDE: in say() in RdfWriterBase
[17:12:21] SMalyshev: you are trying to find a problem in purtle, i'm looking at the RdfBuilder code
[17:13:23] SMalyshev: hm, but perhaps you are right - the call to state() should come before the bailout.
[17:13:32] otherwise, we might hide errors.
[17:14:01] I'm not sure about that... if we don't actually output anything, I'm not sure we should change state... don't remember already
[17:14:47] the problem is, we should not be able to get there with wrong state in the first place, as currentPredicate can be set only by say()
[17:15:03] but maybe I miss some weird way to get it...
[17:15:22] yea. say() should guarantee that after it returns, the state is STATE_PREDICATE, and not something else.
[17:15:41] this is excellent timing! we get to make new releases of the new component :D
[17:16:17] SMalyshev: i'll make a pull request :)
[17:16:29] SMalyshev: i made you a collaborator on that github project, btw
[17:16:34] you can push
[17:16:47] that still doesn't explain how we got there in the first place. shouldn't ever happen
[17:19:29] SMalyshev: true. but we might get a better stack trace
[17:20:11] yeah at least having entity id would be nice.
[17:20:26] though I fear it proves to be nothing special
[17:21:01] hehe, changing the order of the bail-out messes with output indentation...
[17:22:34] aude: please update the build ( https://gerrit.wikimedia.org/r/#/c/286109/ )
[17:22:57] SMalyshev: huh... trying to "fix" this breaks tests... odd...
i'll try and figure it out
[17:23:18] DanielK_WMDE: yeah I suspected it's not that simple :)
[17:23:44] jzerebecki: doing
[17:41:41] SMalyshev: https://github.com/wmde/purtle/pull/1
[17:41:52] hm, apparently, i didn't set up travis correctly.
[17:44:00] DanielK_WMDE: enabled travis
[17:44:19] aude: thanks. what did i miss?
[17:44:20] just needed to click sync on travis
[17:44:32] on the travis site?
[17:44:34] to make it aware of new stuff in wmde
[17:44:35] yeah
[17:44:37] i thought that would happen automagically
[17:44:49] it synced yesterday, so maybe once a day
[17:44:50] do you know how to add scrutinizer btw?
[17:44:57] no
[17:45:04] not sure i have access to that
[17:45:15] ...and packagist also doesn't seem to work automatically. oh, well
[17:45:26] packagist probably same issue as travis
[17:45:46] on packagist i did all the manual stuff
[17:45:51] DanielK_WMDE: looks ok. I'm not 100% sure OBJECT is the only state but I guess we'll find out...
[17:45:56] it works if i click "update", but not automatically
[17:45:57] hmm
[17:46:09] i have no idea
[17:46:27] SMalyshev: i just spent half an hour thinking about this. i'm pretty sure. 96% ;)
[17:46:35] DanielK_WMDE: same here :)
[17:47:17] i am halfway through the list of missing entities
[17:47:33] SMalyshev: after about() you expect to be able to write a predicate. you can do that in OBJECT state, but not in PREDICATE or SUBJECT.
[17:48:08] sounds plausible :)
[17:48:41] aude: I suspect then you passed the trigger... because they are dumped in order IIRC
[17:48:48] SMalyshev: after say() you expect to be able to write an object or value. you can do that in OBJECT state or PREDICATE state. but calling say() when in predicate state is an error, which we should report by going on to the state() call.
[17:48:56] aude: did you also try shard 1 in parallel?
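The rule from the 17:48:48 message (skip a redundant say() only when the writer is in the object state, and otherwise go on to the state check so the error gets reported) could be sketched roughly as follows. This is Python, not purtle's PHP, and it is not the actual pull request: the class name, the string state names, the transition table, and the `guard_shortcut` flag are all hypothetical, introduced only to contrast the two behaviours.

```python
class PredicateShortcutWriter:
    """Hypothetical sketch; states are plain strings, not purtle's constants."""

    # Simplified, assumed transition table.
    ALLOWED = {
        "document": {"subject"},
        "subject": {"predicate"},
        "predicate": {"object"},
        "object": {"object", "predicate", "subject"},
    }

    def __init__(self, guard_shortcut):
        self.state = "document"
        self.current_predicate = None
        # True: only skip a redundant say() while in the object state.
        # False: skip a redundant say() in any state (can hide errors).
        self.guard_shortcut = guard_shortcut

    def _transition(self, new_state):
        if new_state not in self.ALLOWED[self.state]:
            raise RuntimeError(f"bad transition: {self.state} -> {new_state}")
        self.state = new_state

    def about(self, subject):
        self._transition("subject")
        self.current_predicate = None  # a new subject has no current predicate
        return self

    def say(self, predicate):
        redundant = predicate == self.current_predicate
        if redundant and (not self.guard_shortcut or self.state == "object"):
            return self  # nothing to write; state is left untouched
        # Not redundant, or not safe to skip: let the automaton validate it.
        self._transition("predicate")
        self.current_predicate = predicate
        return self

    def value(self, obj):
        self._transition("object")
        return self
```

With `guard_shortcut=False`, calling `say("P1")` twice in a row returns silently and leaves the writer in the predicate state; with `guard_shortcut=True`, the second call falls through to the transition check and raises, while a repeated `say("P1")` after a `value()` is still skipped.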
[17:49:25] DanielK_WMDE: yeah let's merge it and see
[17:49:31] SMalyshev: doing shard 1 also
[17:49:58] SMalyshev: so we are up to 1.0.3 already ;) i made 1.0.2 an hour or so ago.
[17:50:17] aude: cool, it should drop somewhere after: Processed 1378502 entities.
[17:50:54] aude, SMalyshev: writing the dump to /dev/null may speed things up
[17:51:12] hm, a NullRdfWriter would be nice. it would do the same transitions, and generate no output
[17:51:16] DanielK_WMDE: that's what i'm doing
[17:51:28] * aude doesn't want to use up all the disk space on terbium
[17:51:28] ah good then :)
[17:51:33] and it's faster
[17:51:38] true, though then we won't see the part where it was breaking...
[17:51:52] but maybe it's not needed
[17:52:11] SMalyshev: why not? you still have the state, and the item id, and the stack trace.
[17:52:24] a NullRdfWriter could omit any escaping and encoding
[17:58:55] aude: how far down the line is the dump? I have to leave for like 20 mins but I want to see what happens :)
[18:04:31] SMalyshev: Any idea when the query service will no longer be corrupted? Could you update the task with the ETA?
[18:04:47] SMalyshev: 70% of missing entities
[18:04:48] and
[18:05:03] Processed 315242 entities. of shard 1
[18:05:21] multichill: when we figure out the dump problem and reload it. Probably several days
[18:05:37] aude: thanks! guess it's some time to go yet
[18:05:45] brb
[18:06:13] yeah
[18:34:28] ok, I'm back
[18:57:08] DanielK_WMDE: as always, thanks for your hints!
[19:01:09] matej_suchanek: now we only have to get the Echo tests working properly. they seem to be extremely brittle
[19:02:08] sjoerddebruin: thanks for letting me know. I'm not really sure what would be a good source for them... it doesn't really seem any worse than importing them from wikipedia though.
looking at the two, I should be able to filter out ones which no longer work, but I'm not sure about the other
[19:07:29] DanielK_WMDE: I see, that's a pity
[19:09:13] anyway, there is patch set 18
[19:24:48] DanielK_WMDE: search for map on https://linuxcontainers.org/lxc/security/ and https://linuxcontainers.org/lxc/getting-started/
[20:19:25] I'm getting "VM849:425 Uncaught ReferenceError: async is not defined" on https://www.wikidata.org/wiki/Q7156 in Chrome...
[20:20:26] nikki: ok
[20:20:55] And https://www.wikidata.org/wiki/User:Magnus_Manske/authority_control.js doesn't load anymore...
[20:21:32] same in safari
[20:23:59] sjoerddebruin: Problem with the gadget or Wikidata?
[20:24:09] (userscript*)
[20:24:09] don't use the gadget, but I get the first error
[20:25:03] ok...so something has happened
[21:15:37] aude: any luck with that shard 1?
[21:42:02] SMalyshev: it indeed fails
[21:42:23] i'm running it again and this time we should get an entity id
[21:43:05] Processed 1378395 entities.
[21:43:06] Exception encountered, of type "LogicException"
[21:46:28] aude: ok
[22:08:41] aude: any news on the dump failure?
[22:09:56] DanielK_WMDE: not surprisingly, canr eproduce it
[22:10:12] canr?
[22:10:15] i'm running it again on the shard and this time should get an id
[22:10:17] heh
[22:10:45] have you tried dumping that id alone?
[22:10:55] what is it?
[22:11:17] * DanielK_WMDE wonders if it happens with Special:EntityData too
[22:12:24] i don't know what id it is
[22:12:27] yet
[22:14:28] ah, right
[22:15:24] aude: are you trying with or without the patch against purtle? i doubt that patch will fix the issue. it should give us a stack trace that isn't misleading, though
[22:17:33] the patch is not backported
[22:17:45] would have to be submitted against the wikibase branch
[22:18:31] that patch against purtle isn't even merged. i was thinking of a hot patch.
but you probably don't want to do that on terbium
[22:18:41] yeah
[22:18:57] DanielK_WMDE: we could probably merge the purtle patch?
[22:19:06] SMalyshev: go ahead :)
[22:19:16] sjoerddebruin: sorry for doxxing/outing you on twitter....
[22:19:26] SMalyshev: if you do, also make a release tag. numbers are cheap
[22:20:01] Josve05a: no problem. :)
[22:20:09] (phew)