[10:21:51] pfischer: do we have a 1:1 ?
[10:47:07] lunch
[14:16:08] o/
[14:26:17] \o
[14:28:56] o/
[14:45:53] hmm, the token at https://gitlab.wikimedia.org/groups/repos/maven/-/settings/access_tokens was created nov 14, 2025 and expired dec 31, 2025
[14:46:09] i don't even like yearly token expires :P
[15:57:30] Don't get me started on the TLS certificate expiration stuff ;P
[16:00:05] i'm not sure where exactly that token was used though :S Probably jenkins, but i can't grep through that to check (i think it's only in the jenkins ui for admins or some such)
[17:09:33] dcausse: Is the space-issue with the structured dumps already mentioned in https://docs.google.com/document/d/1NidOLZdtGtJRcm24jAh2qaEN9M1yQ6y1n94gU8M-AHM/edit?tab=t.0 ? I can definitely raise this to Enterprise.
[17:10:06] pfischer: yes and I raised this doc to enterprise folks
[17:19:08] dcausse: Thanks, sorry, just saw your slack posts.
[17:30:58] I was trying to gather information on what Search already logs that might be used for analytics in context of the semantic search prototype. Looked at the hive tables. DataHub helps to some extent to understand the schema (and purpose), but is there some kind of overview of what we track?
[17:45:47] I put a list together: https://docs.google.com/document/d/1WOeJjG9x2QhzfQy8VC42ckP67o70iFOVHts4aYd3Qro/edit?tab=t.0 Is any source missing from it?
[18:00:44] pfischer: i don't think we have any kind of overview.
The main ones should all be imported into datahub, the list we expose to datahub is in airflow dags repo, sec
[18:01:19] this is the full list of what we considered potentially useful to other teams: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/main/dags/datahub/ingestion/configs/hive_discovery.yaml?ref_type=heads
[18:01:29] i guess there are also the events though, hmm
[18:40:03] https://bigdataboutique.com/blog/opensearch-in-2025-features-improvements-and-community-growth-642146 I just found the OpenSearch "year in review" article. The read/write separation feature sounds interesting
[20:04:43] inflatador: Small request, i think dumps.wikimedia.org has its directories mounted to clouddumps1001.wikimedia.org. Could you place the contents of https://phabricator.wikimedia.org/P86770 in other/cirrussearch/DEPRECATED.txt ?
[20:04:55] probably have to check `mount` to see where it is on the server
[20:05:39] afaict only dumps::distribution::server has that data, and that's only applied to clouddumps100[12].wikimedia.org
[21:40:28] ebernhardson 👀
[21:42:06] thanks! I was also double checking now plugin versions, i was expecting `insource:/\u{1F600}/` to work in prod but it doesn't, suspect we still need to ship an updated plugin (not now, but upcoming)
[21:44:04] yup, looks like we released wmf9 for opensearch-extra plugin but prod only has wmf8. Somehow it looks like i'm still cleaning up things i didn't manage to finish before taking vacation
[21:45:38] * ebernhardson should keep better notes :P
[21:45:53] ebernhardson I'm not seeing what I expect to see on the dumps server, would you mind getting on a quick meet to do a sanity check? https://meet.google.com/nym-guvj-ddr?authuser=0
[21:57:32] ^^ closing the loop, we got it set up
[22:17:31] ebernhardson do we use https://github.com/synhershko/elasticsearch-analysis-hebrew or a derivative?
Just wondering, I've been talking with the author on OpenSearch Slack recently
[22:18:11] hmm, we have a plugin but i have to double check its source. I think it's perhaps a variant of hebmorph
[22:18:48] we apparently use qAgmtcXtFcsrvubJhuv9_W86MQp1OjF3bAk.01.0z13xz1xf
[22:19:00] doh...thankfully that's a temp token
[22:19:25] that's cool, it looks like he wrote hebmorph as well
[22:22:33] but yea we fetch hebmorph, but our fork: https://gitlab.wikimedia.org/repos/search-platform/HebMorph
[22:22:36] inflatador: ^
[22:23:02] looks like it was mostly about updating lucene to match opensearch
[22:24:41] ebernhardson ACK, I thought it was something like that ;)
[22:32:54] * ebernhardson apparently just needed to log out and log back in to fix the clipboard
[22:33:13] it was for some reason only pasting from my terminal, copies were ignored