[02:24:17] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[02:24:17] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[02:24:19] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[06:24:27] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[06:24:27] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[06:24:27] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[07:20:27] Machine-Learning-Team, Patch-For-Review: Create a Revise Tone Task Generator in LiftWing - https://phabricator.wikimedia.org/T408538#11394844 (BWojtowicz-WMF) Notes on connection issues discovered during development. This is our first service deployed on LiftWing cluster, which requires pod-to-pod commu...
[08:08:59] (PS5) AikoChou: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:11:30] (CR) AikoChou: "This is awesome! Thanks so much for getting this done so fast and with such high quality :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:34:02] (CR) Bartosz Wójtowicz: "Ahh, this makes perfect sense and thank you for extending the code! It looks good to me :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:41:07] (CR) AikoChou: [C:+1] revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:46:43] (CR) Bartosz Wójtowicz: [C:+2] revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[08:56:57] (Merged) jenkins-bot: revise-tone-task-generator: Send weighted tags events. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1207867 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[10:24:27] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[10:24:27] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[10:24:27] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[11:06:46] kevinbazira: Thnx for committing on the operations/puppet repo. I am not very familiar with the current repo, however your patch looks good: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1208189 . Lets wait for a +1 from @klausman as well.
[11:07:04] SGTM!
[11:23:41] LGTM'd. I can merge it and run p-m any time that is convenient for you
[11:29:33] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Add log message if no tone issues are found. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1208309 (https://phabricator.wikimedia.org/T408538)
[11:34:04] (CR) AikoChou: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1208309 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[11:34:12] niiiice thnx folks!
[11:35:37] (CR) Bartosz Wójtowicz: [C:+2] revise-tone-task-generator: Add log message if no tone issues are found. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1208309 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[11:45:05] (Merged) jenkins-bot: revise-tone-task-generator: Add log message if no tone issues are found. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1208309 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[14:24:27] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[14:24:27] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[14:24:27] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[14:44:42] Machine-Learning-Team: model reference-risk: reference_risk_score is always 0. - https://phabricator.wikimedia.org/T410744 (OKarakaya-WMF) NEW
[14:45:37] Machine-Learning-Team: model reference-risk: reference_risk_score is always 0. - https://phabricator.wikimedia.org/T410744#11396094 (OKarakaya-WMF) The service works fine: curl https://api.wikimedia.org/service/lw/inference/v1/models/reference-risk:predict -X POST -d '{"rev_id": 1322686680, "lang": "en"}' {...
[15:36:44] klausman: thanks for the review. looks like patch: 1208189 was +2'd but wasn't submitted
[15:37:20] I asked if I should :)
[15:37:31] 12:23 LGTM'd. I can merge it and run p-m any time that is convenient for you
[15:37:31] please proceed :)
[15:37:35] ack
[15:37:40] ty ty!
[15:48:17] Machine-Learning-Team, Patch-For-Review: Create a Revise Tone Task Generator in LiftWing - https://phabricator.wikimedia.org/T408538#11396458 (akosiaris) >>! In T408538#11394844, @BWojtowicz-WMF wrote: > Notes on connection issues discovered during development. > > This is our first service deployed on...
[15:48:47] Machine-Learning-Team, Goal, OKR-Work: Q1 FY2025-26 Goal: Task generation engine for Revise Tone task - https://phabricator.wikimedia.org/T408341#11396469 (achou) **Weekly Report** Progress update on the hypothesis for the week, including if something has shipped: - Cassandra <-> LiftWing connection...
[16:20:41] Machine-Learning-Team, Patch-For-Review: Create a Revise Tone Task Generator in LiftWing - https://phabricator.wikimedia.org/T408538#11396572 (klausman) >>! In T408538#11396458, @akosiaris wrote: >>>! In T408538#11394844, @BWojtowicz-WMF wrote: >> Thus, our setup requires connection between 2 services: >...
[16:46:00] have a nice weekend!
[16:58:56] o/ enjoy the weekend ozge! :)
[18:24:27] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[18:24:27] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[18:24:27] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[22:24:36] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[22:24:36] Deployment aya-llm-predictor-00006-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00006-deployment - ...
[22:24:36] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas