[01:01:50] Data-Engineering, MobileFrontend, Traffic: Add ismobile attribute to X-Analytics header - https://phabricator.wikimedia.org/T390924#10926212 (tstarling) >>! In T390924#10707573, @phuedx wrote: > IIRC Varnish is the decision maker in production – MobileFrontend simply responds to the presence of the `...
[01:19:04] Data-Engineering, MobileFrontend, Traffic: Add ismobile attribute to X-Analytics header - https://phabricator.wikimedia.org/T390924#10926221 (Krinkle) p:Triage→High a:Krinkle In a sense the question is whether `access_method` should classify the client, or the server response. * client -...
[03:18:28] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[03:31:47] Data-Engineering, Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review: Add ismobile attribute to X-Analytics header - https://phabricator.wikimedia.org/T390924#10926395 (Krinkle)
[05:26:28] Data-Engineering, DBA, Schema-change-in-production: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130#10926495 (Marostegui)
[07:18:28] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[07:59:48] (PS2) Gehel: feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597
[08:20:47] !log restart `hadoop-hdfs-zkfc.service` on an-master1004 T374922
[08:20:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:20:51] T374922: Bring an-conf100[4-6] into service to replace an-conf100[1-3] - https://phabricator.wikimedia.org/T374922
[08:24:24] !log restart `hadoop-hdfs-zkfc.service` on an-master1003 T374922
[08:24:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:47:57] (PS1) Gehel: test: configure minimal logging during tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1160696
[08:51:42] (CR) CI reject: [V:-1] test: configure minimal logging during tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1160696 (owner: Gehel)
[08:53:56] (PS3) Gehel: feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597
[08:53:56] (PS1) Gehel: test: configure minimal logging in refinery-core tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1160697
[08:54:29] (Abandoned) Gehel: test: configure minimal logging during tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1160696 (owner: Gehel)
[08:56:15] (CR) CI reject: [V:-1] feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597 (owner: Gehel)
[08:57:48] (PS4) Gehel: feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597
[09:00:18] (CR) CI reject: [V:-1] feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597 (owner: Gehel)
[09:26:50] (PS5) Gehel: feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597
[09:29:09] (CR) CI reject: [V:-1] feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597 (owner: Gehel)
[10:45:53] Data-Engineering, DBA, Schema-change-in-production: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130#10927482 (Marostegui)
[11:18:28] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[12:23:25] Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform, Patch-For-Review: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions - https://phabricator.wikimedia.org/T347282#10927918 (gmodena) > @xcollazo gave a go for using the mediawiki.-content-hi...
[12:45:38] !log restart `zookeeper` on an-conf1002 to get a new cluster leader T374922
[12:45:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:45:43] T374922: Bring an-conf100[4-6] into service to replace an-conf100[1-3] - https://phabricator.wikimedia.org/T374922
[12:49:33] Data-Engineering, Event-Platform: mediawiki.content_history: flink applications experiencing frequent restarts - https://phabricator.wikimedia.org/T397330 (gmodena) NEW
[12:55:46] Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform, Patch-For-Review: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions - https://phabricator.wikimedia.org/T347282#10928073 (xcollazo) > @xcollazo if you are up to it, we could promote the -n...
[13:07:26] Quarry: quarry: Upgrade Python libraries - https://phabricator.wikimedia.org/T397331 (taavi) NEW
[13:11:29] Quarry: quarry: Use a proper Python package manager - https://phabricator.wikimedia.org/T397332 (taavi) NEW
[13:24:43] Data-Engineering, MediaWiki-Engineering, MW-Interfaces-Team, serviceops, and 3 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#10928172 (HCoplin-WMF) p:High→Low Updating the priority to low. We are curren...
[13:33:15] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112#10928204 (xcollazo) Now rerunning `all-of-wiki-time` [[ https://airflow.wikimedia.org/dags/mw_content_reconcile_mw_c...
[13:34:12] Data-Engineering, MediaWiki-Engineering, MW-Interfaces-Team, serviceops, and 3 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#10928208 (hnowlan) I can't say with certainty, but there is a reasonable chance that...
[13:39:09] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Create Airflow pool for bursty MW Content Pipelines tasks - https://phabricator.wikimedia.org/T397333 (xcollazo) NEW
[13:39:20] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Create Airflow pool for bursty MW Content Pipelines tasks - https://phabricator.wikimedia.org/T397333#10928235 (xcollazo)
[13:39:22] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Dumps 2.0 Phase III: Production level dumps - https://phabricator.wikimedia.org/T366752#10928236 (xcollazo)
[13:52:35] Data-Engineering (Q4 2025 April 1st - June 30th): Investigate mw-page-content-change memory alerts - https://phabricator.wikimedia.org/T397336 (gmodena) NEW
[13:53:27] Data-Engineering (Q4 2025 April 1st - June 30th): Investigate mw-page-content-change memory alerts - https://phabricator.wikimedia.org/T397336#10928324 (gmodena)
[13:59:40] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338 (BTullis) NEW
[14:55:44] Quarry, cloud-services-team (FY2024/2025-Q3-Q4): Deploy prometheus-redis-exporter - https://phabricator.wikimedia.org/T396771#10928594 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/90
[15:16:00] Data-Engineering, Data-Engineering-Icebox, SRE Observability, Data-Platform-SRE (2025.06.13 - 2025.07.04), Patch-For-Review: [Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query - https://phabricator.wikimedia.org/T347430#10928659 (BTullis) a:BTullis
[15:18:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[15:46:48] Data-Engineering: AlertLintProblem - https://phabricator.wikimedia.org/T395539#10928810 (phaultfinder)
[15:52:31] Data-Engineering (Q4 2025 April 1st - June 30th): Establish what data must be backed up before the Hadoop 3 upgrade - https://phabricator.wikimedia.org/T394071#10928822 (BTullis) →Duplicate dup:T397184
[16:16:53] hey btullis - I assume applying the archiva-legacy external service in admin_ng is safe right?
[16:17:30] hnowlan: Oh yes, sorry. Did I forget to do that?
[16:17:51] no problem, I can apply that now
[16:18:16] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, ci-test-error (WMF-deployed Build Failure), Event-Platform, Patch-For-Review: phpunit\integration\PageChangeEmissionTest::testPageMove with data set "Valid move with red... - https://phabricator.wikimedia.org/T397087#10928904
[16:18:47] Cool, thanks. Sorry about that.
[16:26:37] no bother
[16:26:58] btullis: similarly is the removal of postgresql-analytics okay?
[16:27:33] Oh yes. Which k8s cluster is this? Both of wikikubes?
[16:28:10] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338#10928966 (JAllemandou) While I like the technical ability to run large queries asynchronously with Presto, I don't think we should consider this option for the current usability...
[16:29:39] btullis: yeah, and the stagings. it's just codfw for the postgresql one
[16:30:20] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338#10928981 (JAllemandou) Ping @GGoncalves-WMF on this comment above :)
[16:30:21] Ack, cheers.
[16:43:12] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10929024 (srishakatux)
[17:10:38] Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991#10929092 (SD0001) Nunjucks is only used in JS. We can remove the manual build step by making it a part of docker build. I don't think there's a need to remove the build step altogether as it comes at the cost of ha...
[17:11:26] Data-Engineering, Data-Engineering-Icebox, SRE Observability, Data-Platform-SRE (2025.06.13 - 2025.07.04), Patch-For-Review: [Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query - https://phabricator.wikimedia.org/T347430#10929093 (CDanis) >>! In T347430#10917569...
[17:27:57] Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991#10929192 (SD0001) https://github.com/toolforge/quarry/pull/91
[18:27:54] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338#10929383 (BTullis) >>! In T397338#10928966, @JAllemandou wrote: > I think our current approach of having everything queryable by everyone in Superset (almost) is the real problem...
[18:35:36] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:35:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[18:49:49] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338#10929424 (JAllemandou) >>! In T397338#10929383, @BTullis wrote: > Jupyter/Stat server access and therefore access to the spark CLI still has a relatively high amount of friction...
[18:50:36] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:50:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[18:57:06] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:57:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[19:18:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[19:42:06] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[19:42:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[20:33:36] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[20:33:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[20:39:01] Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Traffic: NEW BUG REPORT: Investigate rise in May 2025 Reader metrics - https://phabricator.wikimedia.org/T395934#10929769 (mforns) Yesterday we tried an alternative approach that aims to identify which IPs belong to the bot-net by l...
[21:03:36] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[21:03:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[21:10:06] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[21:10:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[21:34:43] Data-Engineering, Data-Platform-SRE: Enable async queries for Superset with Celery - https://phabricator.wikimedia.org/T397338#10929987 (BTullis) >>! In T397338#10929424, @JAllemandou wrote: >>>! In T397338#10929383, @BTullis wrote: >> Jupyter/Stat server access and therefore access to the spark CLI stil...
[22:10:06] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[22:10:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[23:18:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem