[01:02:33] (CR) Sbisson: [C:-1] add support for pagination for single page collections (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[02:22:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[02:22:04] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[02:22:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[02:54:46] Machine-Learning-Team, Add-Link-Structured-Task, Community Feedback (Growth), Growth-Team: Introduce case sensitivity to machine learning model for Add a Link - https://phabricator.wikimedia.org/T405185#11390790 (Sdkb) @Chipmunkdavis has [[ https://en.wikipedia.org/w/index.php?title=Wikipedia_tal...
[04:32:25] Lift-Wing, Machine-Learning-Team, Wikidata, Wikimedia Enterprise, Wikimedia Enterprise - Content Integrity: Request to host Wikidata Revert Risk on Lift Wing - https://phabricator.wikimedia.org/T406179#11390839 (kevinbazira) @Trokhymovych, here are resources to help you create a comprehensive...
[05:16:57] Lift-Wing, Machine-Learning-Team, Wikidata, Wikimedia Enterprise, Wikimedia Enterprise - Content Integrity: Request to host Wikidata Revert Risk on Lift Wing - https://phabricator.wikimedia.org/T406179#11390873 (kevinbazira) The revertrisk-wikidata inference service is now [[ https://phabrica...
[06:22:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[06:22:04] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[06:22:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[08:41:12] (CR) AikoChou: "Very Nice! I only have one comment :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:55:26] (CR) Bartosz Wójtowicz: revise-tone-task-generator: Adapt code to page_change events. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:03:36] (PS2) Bartosz Wójtowicz: revise-tone-task-generator: Adapt code to page_change events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538)
[09:04:04] (CR) Bartosz Wójtowicz: revise-tone-task-generator: Adapt code to page_change events. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:16:26] (CR) AikoChou: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:19:10] (CR) Bartosz Wójtowicz: [C:+2] revise-tone-task-generator: Adapt code to page_change events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:28:03] (Merged) jenkins-bot: revise-tone-task-generator: Adapt code to page_change events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207175 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[09:49:39] (CR) Nik Gkountas: add support for pagination for single page collections (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[09:52:00] (PS7) Nik Gkountas: Page collections caching: Use sitematrix lang code for all articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387)
[09:53:27] (CR) CI reject: [V:-1] Page collections caching: Use sitematrix lang code for all articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387) (owner: Nik Gkountas)
[10:20:07] Machine-Learning-Team, Add-Link-Structured-Task, Community Feedback (Growth), Growth-Team: Introduce case sensitivity to machine learning model for Add a Link - https://phabricator.wikimedia.org/T405185#11391268 (OKarakaya-WMF) thank you both @Sdkb and @Chipmunkdavis for reporting this issue, I...
[10:22:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[10:22:04] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[10:22:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[10:51:25] (CR) Nik Gkountas: add support for pagination for single page collections (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[11:34:43] (PS8) Nik Gkountas: Page collections caching: Use sitematrix lang code for all articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387)
[11:34:52] (CR) Nik Gkountas: Page collections caching: Use sitematrix lang code for all articles (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387) (owner: Nik Gkountas)
[12:54:34] klausman: o/ could you review this patch for creating the new ns for revise tone task generator when you have a moment? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1207798
[12:54:46] on it
[13:03:39] review done
[13:28:06] (CR) Sbisson: [C:+2] Page collections caching: Use sitematrix lang code for all articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387) (owner: Nik Gkountas)
[13:28:40] (Merged) jenkins-bot: Page collections caching: Use sitematrix lang code for all articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206883 (https://phabricator.wikimedia.org/T410387) (owner: Nik Gkountas)
[13:37:14] (CR) Sbisson: [C:-1] add support for pagination for single page collections (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[13:57:10] klausman: dpogorzelski forgot to merge this: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1207800 on the puppetserver, ok to me just merging now?
[14:03:38] checking
[14:03:51] yes, go ahead
[14:05:00] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538)
[14:19:23] (PS2) Bartosz Wójtowicz: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538)
[14:22:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[14:22:10] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[14:22:10] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[14:44:33] (PS3) Bartosz Wójtowicz: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538)
[14:50:29] (PS4) Bartosz Wójtowicz: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538)
[15:02:27] Machine-Learning-Team, EditCheck, VisualEditor: [SPIKE] Define process for validating Tone Check model eval data for languages staff members do not speak - https://phabricator.wikimedia.org/T407155#11392333 (gkyziridis) ==== Update ===== Target languages: Dutch, Latvian, German, and Polish Target Wik...
[15:42:07] Machine-Learning-Team, Add-Link-Structured-Task, Community Feedback (Growth), Growth-Team: Introduce case sensitivity to machine learning model for Add a Link - https://phabricator.wikimedia.org/T405185#11392670 (Sdkb) Thanks for that clarification, @OKarakaya-WMF! The [[ https://en.wikipedia.org...
[16:19:44] FIRING: LiftWingServiceErrorRate: ...
[16:19:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=recommendation-api-ng&var-backend=recommendation-api-ng-main.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:23:47] Machine-Learning-Team: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663 (Isaac) NEW
[16:24:40] Machine-Learning-Team: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663#11393011 (Isaac) Tagging you @Trokhymovych as I think you mentioned having (similar?) issues with a Qwen re-ranking model as well that seemed to relate to the torch version?
[16:24:44] RESOLVED: LiftWingServiceErrorRate: ...
[16:24:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=recommendation-api-ng&var-backend=recommendation-api-ng-main.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:25:59] Machine-Learning-Team: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663#11393019 (Isaac)
[16:43:23] Machine-Learning-Team, ORES, Browser Test Platform, VisualEditor, Continuous-Integration-Config: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#11393201 (zeljkofilipin)
[17:17:18] (PS1) Nik Gkountas: add support for combining single page collection with topic filter [research/recommendation-api] - https://gerrit.wikimedia.org/r/1207918 (https://phabricator.wikimedia.org/T409338)
[17:23:27] (PS6) Nik Gkountas: add support for pagination for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485)
[17:27:36] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, PersonalDashboard, and 2 others: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438#11393536 (Kgraessle) @Ladsgroup Just curious if you had any reservations about us proceeding with deployin...
[17:42:33] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, PersonalDashboard, and 2 others: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438#11393641 (Kgraessle) a:Kgraessle
[18:00:54] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, PersonalDashboard, and 3 others: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438#11393697 (Ladsgroup) >>! In T409438#11393535, @Kgraessle wrote: > @Ladsgroup > > Just curious if you had a...
[18:22:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[18:22:04] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[18:22:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[19:10:27] (CR) Sbisson: "When we populate the collection cache, we fetch article size in English to partially support size filtering when sourcelang=en. We do that" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1207918 (https://phabricator.wikimedia.org/T409338) (owner: Nik Gkountas)
[19:30:20] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, PersonalDashboard, and 3 others: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438#11393922 (Kgraessle) Open→Stalled
[22:24:16] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[22:24:17] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[22:24:17] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas