[08:22:41] Is there a way to get all articles which don't have an image using query. [11:19:59] gyan: something like that should do the job: https://quarry.wmflabs.org/query/32222 [11:48:59] https://quarry.wmflabs.org/query/32229 [11:49:15] The query is running forever [13:02:30] gyan: looks completed? [13:10:09] Yes but it shows only five article name I don't think that is correct. [13:10:27] What do you think LIMIT 5 does? [13:11:20] Sorry. My mistake. I have changed the quarry as it was taking a lot of time. [13:13:12] As your query is doing subqueries, sorting other tables and stuff, it's probably going to take nearly as long to get 5 results as it would 1000 [13:13:25] Because it has to sort the other tables and such [13:15:27] Ok. Thank you. [19:36:52] 1 million entries... [20:21:45] Yes [20:21:57] This should not be. [20:22:01] https://quarry.wmflabs.org/query/32229 [20:23:57] We have around 14000 articles, but 1 Million result [20:42:42] gyan: yes, because you are selecting on both page an imagelinks [20:43:17] for each page in the result the row is duplicated for every imagelink [20:43:57] btw GROUP BY(il_from) is unnecessary. NOT IN is implemented with NOT EXISTS [20:45:12] I know mysql a little but not expert. I am using a query suggested by @framawiki https://quarry.wmflabs.org/query/32222 [20:45:30] if you do want unique results (without fancy stuffs like COUNT) in some query, DISTINCT is usually better than GROUP BY [20:46:17] well, the original query can be improved :) [20:46:35] My intention was to get all the articles which don't have an image. I have not put any limit in the query as I thought the result will be less than 10000. [20:47:00] ok, let me optimize it [20:47:10] Thank You [20:49:01] what is this commonswiki_p part... it's checking if the target if the image exists... [20:49:26] https://quarry.wmflabs.org/query/32245 [20:49:28] done [20:49:32] 6 seconds [20:50:10] I don't have tehe fonts installed to display orwiki titles though [20:50:45] WoW! it's perfect. [20:52:26] :) [20:58:08] I am novice user but what I understand is the previous query just checking all images on commons if they are linked or not but the second just retrieves the pages where there no image link associated. (I don't know this should be discussed here or not) [20:58:53] Thank you. @zhuyifei1999 There should be a button on wiki to make friends :) [21:12:35] gyan: it's okay. let me explain that original query for you [21:13:31] first you have SELECT ... FROM page, imagelinks [21:13:57] this means for every pair of page and imagelinks [21:14:58] so the number of stuffs checked is kind of like (number of pages) * (number of imagelinks) [21:15:29] and then WHERE page_namespace = 0 AND page_is_redirect = 0... the page must be in ns:0 and not a redirect [21:16:03] and then SELECT il_from FROM imagelinks WHERE il_from_namespace = 0 so all the source page ids of imagelinks [21:16:41] and page_id NOT IN ... the page isn't a source of imagelink [21:17:07] and then this AND NOT EXISTS thing seems irrelevant [21:17:55] as in, I cannot comprehend what it means in the context of this query. [21:18:31] it means in the original page * imagelink matrix filter out all the imagelinks that points to a commons image [21:20:33] I got it. [21:22:55] That was showing the correct result but repeated time. [21:23:32] Thank You for your time. [21:25:38] np