[01:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[01:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[05:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[05:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[06:40:51] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11561064 (ops-monitoring-bot) Starting pool of db1163 by marostegui@cumin1003: After schema change
[06:41:14] Data-Engineering, DBA, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11561065 (Marostegui)
[06:41:25] Data-Engineering, DBA, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11561066 (Marostegui)
[06:54:51] Data-Engineering, MediaWiki-extensions-EventLogging, Test Kitchen, Essential-Work: Deprecate and remove EventLogging::getMetricsPlatformClient() - https://phabricator.wikimedia.org/T415246#11561108 (phuedx) >>! In T415246#11548974, @Sfaci wrote: > @phuedx are usages like [[https://gerrit.wikimedi...
[07:23:41] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11561119 (Marostegui)
[07:23:47] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11561120 (Marostegui)
[07:24:14] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163#11561129 (Marostegui)
[07:24:23] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11561134 (Marostegui)
[07:26:18] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164#11561136 (ops-monitoring-bot) Completed pooling of db1163 by marostegui@cumin1003: After schema change
[08:15:25] (CR) Joal: [C:+1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1233834 (https://phabricator.wikimedia.org/T396031) (owner: Xcollazo)
[09:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[09:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[13:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[13:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[13:27:17] Data-Engineering, Wikidata Analytics: [Analytics] [Tech Debt] Implement initial data quality assessment system for - https://phabricator.wikimedia.org/T415783 (AndrewTavis_WMDE) NEW
[13:29:21] Data-Engineering, Wikidata Analytics: [Analytics] [Tech Debt] Exploration of initial data quality assessment system for Airflow DAGs - https://phabricator.wikimedia.org/T415783#11562141 (AndrewTavis_WMDE)
[13:47:31] (CR) Xcollazo: [C:+2] Remove mediawiki_wikitext_* from refinery-drop-mediawiki-snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/1233834 (https://phabricator.wikimedia.org/T396031) (owner: Xcollazo)
[13:50:43] (CR) TChin: [C:+1] Remove datahub-cli package env dir [analytics/refinery] - https://gerrit.wikimedia.org/r/1233738 (https://phabricator.wikimedia.org/T415357) (owner: Aqu)
[13:52:45] (CR) Joal: [C:+1] Remove datahub-cli package env dir [analytics/refinery] - https://gerrit.wikimedia.org/r/1233738 (https://phabricator.wikimedia.org/T415357) (owner: Aqu)
[14:04:51] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work: SDS 2.2.6 Improve experiment event data data lake management - https://phabricator.wikimedia.org/T414105#11562338 (xcollazo) Summary from [[ https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1769543688758459?thread_ts=1769525669.672129&cid...
[14:06:46] Data-Engineering, Data-Engineering-Radar, GrowthExperiments, MediaWiki-Engineering, and 8 others: mw.track: support for histogram metrics - https://phabricator.wikimedia.org/T383563#11562352 (DMburugu)
[14:13:58] Data-Engineering, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 (Zabe) NEW
[14:16:16] Data-Engineering, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11562433 (Marostegui) p:Triage→Medium a:Marostegui
[14:17:55] (CR) A-pizzata: [C:+1] Remove datahub-cli package env dir [analytics/refinery] - https://gerrit.wikimedia.org/r/1233738 (https://phabricator.wikimedia.org/T415357) (owner: Aqu)
[14:28:05] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Upgrade DataHub CLI virtualenv used by metadata_ingest_daily to restore Druid ingestion - https://phabricator.wikimedia.org/T415357#11562488 (xcollazo) >git history is preserved (the repo is actually a fork of analytics/refiner...
[14:37:43] Data-Engineering, Privacy Engineering: The soon-to-be-released pageview datasets should be linked from dumps page - https://phabricator.wikimedia.org/T335958#11562525 (Ottomata) Got notification about this task and it made me wonder: should we have done this years ago? Especially now that Data-Engineerin...
[14:50:58] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11562583 (Marostegui)
[14:52:19] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11562602 (Marostegui)
[15:10:28] (CR) Xcollazo: [V:+2 C:+2] Remove mediawiki_wikitext_* from refinery-drop-mediawiki-snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/1233834 (https://phabricator.wikimedia.org/T396031) (owner: Xcollazo)
[15:43:22] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Optimize canary event generation resources consumption on Airflow - https://phabricator.wikimedia.org/T411989#11562796 (Antoine_Quhen) K8s execution deployed, but we are not observing the overall performance gain we would have expected. We later tweak t...
[15:47:44] Data-Engineering (Q3 FY25/26 January 1st - March 31th): mw_content_history_reconcile_enrich api call returned 503 - https://phabricator.wikimedia.org/T415264#11562825 (Ottomata) Weird! - stream config failing should fail the startup of the app, so that is okay. - but something caused the app to originally...
[16:00:12] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform, MW-1.46-notes (1.46.0-wmf.13; 2026-01-27), Patch-For-Review, Technical-Debt: [EventBus] Stabilize EventSerializer and related classes - https://phabricator.wikimedia.org/T392516#11562873 (Ahoelzl)
[16:00:58] Data-Engineering (Q3 FY25/26 January 1st - March 31th): mw_content_history_reconcile_enrich api call returned 503 - https://phabricator.wikimedia.org/T415264#11562879 (Ahoelzl) a:JMonton-WMF
[16:01:28] Data-Engineering (Q3 FY25/26 January 1st - March 31th): mw_content_history_reconcile_enrich api call returned 503 - https://phabricator.wikimedia.org/T415264#11562881 (Ahoelzl) We'll invest time to understand the startup failure.
[17:05:50] Data-Engineering, Data-Engineering-Radar, Test Kitchen, Essential-Work, Event-Platform: X-Experiment-Enrollments EventGate handling reinforcement for MalformedHeaderError cases - https://phabricator.wikimedia.org/T409106#11563129 (Milimetric) looks good, please re-add us if you need support
[17:06:17] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Airflow main performance instance optimization - https://phabricator.wikimedia.org/T411988#11563134 (Ahoelzl)
[17:07:35] Data-Engineering, Data-Engineering-Radar, Test Kitchen, Essential-Work, Event-Platform: More strictly validate X-Experiment-Enrollments-Header - https://phabricator.wikimedia.org/T401198#11563142 (Milimetric) DE doesn't plan to work on this, but we'll discuss in our sync-up meeting
[17:10:13] Data-Engineering, Dumps-Generation, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Certain *recombine tasks in dumps_v1 are non-idempotent and can generate corrupt files - https://phabricator.wikimedia.org/T404859#11563157 (Milimetric) we're shifting to the new Dumps 2.0 system
[17:22:13] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work (WE1 FY2025-26): WE1.5.3 Productize Data for Monthly Active Moderator Actions - https://phabricator.wikimedia.org/T410940#11563275 (Ahoelzl)
[17:22:21] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work (WE1 FY2025-26): WE1.5.3 Productize Data for Monthly Active Moderator Actions - https://phabricator.wikimedia.org/T410940#11563278 (Ahoelzl)
[17:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[17:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[17:29:14] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work: SDS 2.2.6 Improve experiment event data data lake management - https://phabricator.wikimedia.org/T414105#11563317 (AKhatun_WMF) We have the same exact problem in superset dashboards. Project slice had this in it. Summarizing the decisions...
[17:34:04] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Engineering-Wikistats: The stat site show INVALID on Cantonese Wikipedia. - https://phabricator.wikimedia.org/T411938#11563355 (Ahoelzl)
[17:34:29] Data-Engineering, Test Kitchen: json schema tools: Can we allow changing the regex pattern for a schema field - https://phabricator.wikimedia.org/T411518#11563356 (Milimetric) Open→Declined no immediate need for this - please reopen otherwise (ideal would be to reduce friction on major versio...
[18:13:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: duplicated page_title in mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T413888#11563486 (xcollazo) In progress→Resolved
[18:20:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work (WE1 FY2025-26): WE1.5.3 Productize Data for Monthly Active Moderator Actions - https://phabricator.wikimedia.org/T410940#11563519 (Ahoelzl) @OSefu-WMF do you have an update for us? We'd like to get implementation work started soon.
[18:37:16] Analytics-Canonical-Data, Movement-Insights: Add CI checking that the data protection information in the canonical country dataset matches the source - https://phabricator.wikimedia.org/T415817 (nshahquinn-wmf) NEW p:Triage→High
[18:38:17] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Upgrade DataHub CLI virtualenv used by metadata_ingest_daily to restore Druid ingestion - https://phabricator.wikimedia.org/T415357#11563608 (Antoine_Quhen) I've marked all failed dag run as success to clear the UI. The dag is...
[18:39:26] (PS2) Aqu: Remove datahub-cli package env dir [analytics/refinery] - https://gerrit.wikimedia.org/r/1233738 (https://phabricator.wikimedia.org/T415357)
[18:50:23] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Refine generates very large XCOM values - https://phabricator.wikimedia.org/T414953#11563668 (Ahoelzl) a:Antoine_Quhen→None Given the progress on the Airflow infrastructure (le...
[18:54:12] (CR) Aqu: [V:+2 C:+2] Remove datahub-cli package env dir [analytics/refinery] - https://gerrit.wikimedia.org/r/1233738 (https://phabricator.wikimedia.org/T415357) (owner: Aqu)
[18:57:12] Data-Engineering: Migrate cleanup jobs for snapshot datasets from systemd timers to Airflow - https://phabricator.wikimedia.org/T411999#11563708 (Antoine_Quhen) a:Antoine_Quhen→None
[18:59:25] Data-Engineering: Migrate cleanup jobs for snapshot datasets from systemd timers to Airflow - https://phabricator.wikimedia.org/T411999#11563712 (Antoine_Quhen) Following discussion with @Ahoelzl we can postpone that on Q4.
[19:01:09] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Reduce main Airflow DB size and consider splitting heavy workloads into separate instances - https://phabricator.wikimedia.org/T411992#11563715 (Antoine_Quhen) Open→Resolved Closing. Next optimization could be splitting from main instance fi...
[19:15:23] Data-Engineering: [refine] Add support for custom Hive Iceberg partitioning - https://phabricator.wikimedia.org/T377600#11563753 (Ahoelzl) This will also allow for a more direct Growthbook event stream integration https://phabricator.wikimedia.org/T414105
[19:24:00] Data-Engineering (Q3 FY25/26 January 1st - March 31th), DPE-Mediawiki-Content: Inconsistent page title styles in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T410405#11563779 (xcollazo) I've investigated this issue and concluded that the `page_title` inconsistency comes from bac...
[19:55:37] Data-Engineering (Q3 FY25/26 January 1st - March 31th), DPE-Mediawiki-Content: Missing/inconsistent page_redirect_target field for redirects in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T400632#11563844 (xcollazo) >>! In T400632#11383880, @xcollazo wrote: > To recap, this fix...
[20:38:00] Data-Engineering: Decomission dedicated growthbook airflow pipeline(s) - https://phabricator.wikimedia.org/T415826 (AKhatun_WMF) NEW
[20:39:11] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work: SDS 2.2.6 Improve experiment event data data lake management - https://phabricator.wikimedia.org/T414105#11563931 (AKhatun_WMF) Decision Log ([Slack](https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1769623161173349?thread_ts=1769525669.6...
[21:05:50] Data-Engineering (Q3 FY25/26 January 1st - March 31th), DPE-Mediawiki-Content: Inconsistent page title styles in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T410405#11563977 (xcollazo) From description: > Weirdly, there are some pages that have both styles too -- e.g., Categor...
[21:25:20] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest_sampled ingested an unexpected number of records for a Kafka topic partition. ...
[21:25:20] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest_sampled&var-kafka_topic=webrequest_sampled&viewPanel=24 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[22:42:12] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Migrate cleanup jobs for snapshot datasets from systemd timers to Airflow - https://phabricator.wikimedia.org/T411999#11564253 (Ahoelzl) p:Triage→Medium
[22:43:25] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Migrate cleanup jobs for snapshot datasets from systemd timers to Airflow - https://phabricator.wikimedia.org/T411999#11564255 (Ahoelzl) Keeping it in Q3 scope Next Up in case capacity frees up later this quarter.