[00:24:14] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[00:24:14] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[04:20:27] Data-Engineering, Pageviews-Anomaly, SecTeam-Processed, Security: Views data integrity compromised by entity running up fake views - https://phabricator.wikimedia.org/T366554#11680384 (Stevietheman) There was a period of time when the problem seemed to be receding, but now it's happening worse th...
[04:24:14] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[04:24:14] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:24:14] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[08:24:14] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[09:29:15] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Growth-Team, Image-Suggestions: Fix Image suggestion DagProperty values - https://phabricator.wikimedia.org/T419204 (APizzata-WMF) NEW
[10:47:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform: Logs and Monitoring for the HTML pipeline - https://phabricator.wikimedia.org/T418996#11680994 (JMonton-WMF) These are logs taken from a PyFlink application. They seem to mix JSON with plain text, and many errors are reported as INFO...
[11:36:44] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 - https://phabricator.wikimedia.org/T408918#11681214 (JMonton-WMF) There was another issue with the new eventutilities-pyth...
[12:24:17] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[12:24:23] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[12:42:08] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 - https://phabricator.wikimedia.org/T408918#11681493 (JMonton-WMF) This MR: https://gitlab.wikimedia.org/repos/data-enginee...
[13:03:20] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 - https://phabricator.wikimedia.org/T408918#11681546 (JMonton-WMF) Related to that previous MR, I believe that the issue wa...
[14:01:05] Data-Engineering, Data-Platform-SRE (2026-03-06 - 2026-03-27): Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11681850 (Gehel)
[14:01:11] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27), Patch-For-Review: Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11681852 (Gehel)
[14:01:21] Data-Engineering, Data-Engineering-Radar, Privacy Engineering, Security-Team, and 2 others: Privacy review of x1 tables in preparation of adding them to wikireplicas - https://phabricator.wikimedia.org/T415219#11681864 (Gehel)
[14:01:55] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-06 - 2026-03-27), Patch-For-Review: Deploy turnilo to dse-k8s-eqiad - https://phabricator.wikimedia.org/T416113#11681876 (Gehel)
[14:02:24] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Persistence, and 3 others: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#11681884 (Gehel)
[14:02:34] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work, Patch-For-Review: Carry out end-user testing of spark on kubernetes - https://phabricator.wikimedia.org/T412925#11681888 (Gehel)
[14:02:40] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27): Transfer ownership of Watchlist CTR dashboard to Mikhail - https://phabricator.wikimedia.org/T418485#11681898 (Gehel)
[14:02:46] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11681900 (Gehel)
[14:02:52] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights, Data-Platform-SRE (2026-03-06 - 2026-03-27), OKR-Work, Patch-For-Review: Run dbt from Airflow - https://phabricator.wikimedia.org/T410268#11681896 (Gehel)
[14:03:08] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services, and 3 others: Set up x1 replication to Wiki Replicas - https://phabricator.wikimedia.org/T395881#11681904 (Gehel)
[14:03:25] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work, Patch-For-Review: Provide an access to MaxMind GeoIP in DSE K8S pods - https://phabricator.wikimedia.org/T405509#11681906 (Gehel)
[14:03:53] Data-Engineering, Discovery-Search, Java-Scala-Standardization, Data-Platform-SRE (2026-03-06 - 2026-03-27), and 2 others: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#11681919 (Gehel)
[14:04:39] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#11681945 (Gehel)
[14:05:04] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: When wikis cannot be exported due to SiteInfo, don't fail them - https://phabricator.wikimedia.org/T408819#11681957 (Gehel)
[14:06:24] Data-Engineering, Data-Engineering-Radar, FR-Tech-Analytics, Data-Platform-SRE (2026-03-06 - 2026-03-27): Create FR Tech Airflow instance - https://phabricator.wikimedia.org/T417213#11682003 (Gehel)
[14:06:36] Data-Engineering, Data-Platform-SRE (2026-03-06 - 2026-03-27): Task Tries and Logs for Airflow DAGs sometimes unavailable - https://phabricator.wikimedia.org/T419162#11682084 (Gehel)
[14:07:29] Data-Engineering, Technical-blog-posts, Data-Platform-SRE (2026-03-06 - 2026-03-27), Essential-Work: Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#11682093 (Gehel)
[14:32:35] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11682262 (Ottomata) I'd like to start a bikeshed around the 'edit type' name. Now that we ha...
[14:38:11] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11682291 (Ottomata) I like the 'fact' concept, but it is a bit broad. ML world sometimes cal...
[15:00:03] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11682464 (Ottomata) I've also started a [[ https://wikimedia.slack.com/archives/C05F8ERE2CV/p...
[15:06:38] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Commons-Impact-Metrics, Commons-Impact-Metrics-Requests, Patch-For-Review: Update Commons Impact Metrics allow-list February 2026 - https://phabricator.wikimedia.org/T418434#11682488 (xcollazo) In progress→Resolved
[15:26:10] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11682538 (AKhatun_WMF) To me, the term `edit types` seems not bad: it is identifying the sema...
[15:33:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform: Adatp HTML pipeline to the new diffs schema - https://phabricator.wikimedia.org/T419258 (JMonton-WMF) NEW
[15:39:28] Data-Engineering, Data-Engineering-Radar, Essential-Work, Event-Platform, and 2 others: X-Experiment-Enrollments EventGate handling reinforcement for MalformedHeaderError cases - https://phabricator.wikimedia.org/T409106#11682584 (tchin) Eventgate v1.28.0 is now deployed
[15:44:22] Data-Engineering, Dumps-Generation, Prod-Kubernetes, ServiceOps new: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259 (JMeybohm) NEW
[16:13:54] RESOLVED: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[16:13:54] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[16:18:54] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[16:18:54] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[16:23:54] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[16:23:54] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[16:56:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights, Data-Platform-SRE (2026-03-06 - 2026-03-27), OKR-Work, Patch-For-Review: Run dbt from Airflow - https://phabricator.wikimedia.org/T410268#11682867 (Ahoelzl) a:JMonton-WMF→amastilovic
[17:00:29] Data-Engineering, Data-Platform-SRE, Dumps-Generation, Prod-Kubernetes, ServiceOps new: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11682904 (xcollazo)
[17:21:01] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11682969 (TBurmeister) For information about a revision (data computed about the revision), I...
[17:40:01] Data-Engineering: The revision_seconds_to_identity_revert field in wmf.mediawiki_history has sometimes negative values - https://phabricator.wikimedia.org/T419267 (mforns) NEW
[17:49:46] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work (WE1 FY2025-26): [Spike] Adding access_method metadata to moderator action event streams - https://phabricator.wikimedia.org/T419019#11683038 (CMyrick-WMF) > The caveats: > * First, as with other change tags, this registers whether the user...
[19:04:29] Data-Engineering, Data-Platform-SRE (2026-03-06 - 2026-03-27): Task Tries and Logs for Airflow DAGs sometimes unavailable - https://phabricator.wikimedia.org/T419162#11683255 (xcollazo) Another weird behavior that I just observed: I have a long running task currently going for 1 day. For some reason, Ai...
[19:21:39] Data-Engineering, ChangeProp, EventStreams, MediaWiki-Engineering, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11683306 (Jdforrester-WMF)
[19:26:59] Data-Engineering, Data-Platform-SRE (2026-03-06 - 2026-03-27): Task Tries and Logs for Airflow DAGs sometimes unavailable - https://phabricator.wikimedia.org/T419162#11683315 (xcollazo) >>! In T419162#11683255, @xcollazo wrote: > Another weird behavior that I just observed: I have a long running task cur...
[19:54:20] Data-Engineering: The revision_seconds_to_identity_revert field in wmf.mediawiki_history has sometimes negative values - https://phabricator.wikimedia.org/T419267#11683444 (mforns) When troubleshooting and fixing this, we should consider also solving T266374, since diving in mediawiki_history code always tak...
[20:23:54] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[20:23:54] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[20:47:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291 (xcollazo) NEW
[20:49:37] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11683668 (xcollazo) This is the first time we fail on `enwiki` since we went to prod.
[20:58:10] Data-Engineering, Dumps-Generation, Wikimedia Enterprise: Stale data / missing pages in HTML ("enterprise") - https://phabricator.wikimedia.org/T305407#11683690 (JArguello-WMF) Hi all! We’re following up on this task with the results of a 2026 audit of the English Wiktionary Enterprise dumps to confi...
[20:58:49] Data-Engineering, Dumps-Generation, Wikimedia Enterprise: Stale data / missing pages in HTML ("enterprise") - https://phabricator.wikimedia.org/T305407#11683693 (JArguello-WMF) Open→Resolved
[21:19:47] Data-Engineering, Test Kitchen: GrowthBook experiment analysis keeps failing/stalling - https://phabricator.wikimedia.org/T419286#11683754 (mpopov)
[21:39:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11683826 (Pppery) I wonder if this is fallout from #2026-user-javascript-incident
[21:54:08] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Test Kitchen: GrowthBook experiment analysis keeps failing/stalling - https://phabricator.wikimedia.org/T419286#11683849 (Ahoelzl)
[22:00:14] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Test Kitchen: GrowthBook experiment analysis keeps failing/stalling - https://phabricator.wikimedia.org/T419286#11683854 (Ahoelzl) a:amastilovic
[23:20:56] Analytics-Canonical-Data, Movement-Insights: Add CI checking that the data protection information in the canonical country dataset matches the source - https://phabricator.wikimedia.org/T415817#11684014 (nshahquinn-wmf) Open→Declined Folding this into T419304.
[23:41:12] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11684037 (Hghani) Hi everyone, here's a quick update on the investigation: We have isolated approximately...