[01:58:23] (CR) Tim Starling: [C:+2] Drop support for filtering model by rc_type [extensions/ORES] - https://gerrit.wikimedia.org/r/1205974 (https://phabricator.wikimedia.org/T74157) (owner: Zabe)
[02:22:20] (Merged) jenkins-bot: Drop support for filtering model by rc_type [extensions/ORES] - https://gerrit.wikimedia.org/r/1205974 (https://phabricator.wikimedia.org/T74157) (owner: Zabe)
[03:17:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[03:17:04] Deployment aya-llm-predictor-00003-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00003-deployment - ...
[03:17:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[07:14:35] good morning
[07:17:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[07:17:04] Deployment aya-llm-predictor-00003-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00003-deployment - ...
[07:17:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[09:19:23] Machine-Learning-Team, Data-Engineering, serviceops: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11377845 (JMonton-WMF) That sounds good. Then, we could consider increasing the partitions in Jumbo too, `codfw.mediawiki.page_content_change....
[09:24:28] (CR) Nik Gkountas: [C:-1] Page collection validation script (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1203077 (owner: Sbisson)
[09:26:26] (CR) Nik Gkountas: [C:+2] Prevent calling wikidata with no titles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1205206 (owner: Sbisson)
[09:27:59] (Merged) jenkins-bot: Prevent calling wikidata with no titles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1205206 (owner: Sbisson)
[09:37:00] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, and 2 others: WE 1.3.4 Roll out Revert Risk Filters to Wikis that don't have damaging/goodfaith Edit Models - https://phabricator.wikimedia.org/T408388#11377907 (DMburugu)
[09:51:26] (CR) Nik Gkountas: [C:-1] "The script doesn't account for empty page collections, which are not saved in cache. This will always lead to errors if empty page collect" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1203077 (owner: Sbisson)
[09:55:27] (PS1) Kevin Bazira: revertrisk-wikidata: update feature processing based on research team feedback [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206197 (https://phabricator.wikimedia.org/T406179)
[10:12:24] (CR) Kevin Bazira: "This patch has been tested based on the Research team's feedback provided here: https://phabricator.wikimedia.org/T406179#11375298" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206197 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[10:44:56] (CR) Gkyziridis: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206197 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[10:47:49] (PS1) Bartosz Wójtowicz: llm: Resolve system built-in packages last in PYTHONPATH. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206206
[10:49:30] (CR) Kevin Bazira: [C:+2] revertrisk-wikidata: update feature processing based on research team feedback [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206197 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[10:50:07] (Merged) jenkins-bot: revertrisk-wikidata: update feature processing based on research team feedback [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206197 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[10:55:50] (CR) AikoChou: [C:+1] llm: Resolve system built-in packages last in PYTHONPATH. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206206 (owner: Bartosz Wójtowicz)
[11:03:36] (CR) Bartosz Wójtowicz: [C:+2] llm: Resolve system built-in packages last in PYTHONPATH. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206206 (owner: Bartosz Wójtowicz)
[11:04:11] (Merged) jenkins-bot: llm: Resolve system built-in packages last in PYTHONPATH. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206206 (owner: Bartosz Wójtowicz)
[11:14:55] Machine-Learning-Team, Data-Engineering, serviceops: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11378447 (jijiki) Thank you for the discussion everyone! Reading through, I would suggest proceeding with Option D for the time being. This ap...
[11:17:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[11:17:04] Deployment aya-llm-predictor-00003-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00003-deployment - ...
[11:17:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[11:51:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[11:51:49] Deployment aya-llm-predictor-00003-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00003-deployment - ...
[11:51:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[12:11:43] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Do not query when deleting from cache. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206358 (https://phabricator.wikimedia.org/T408538)
[12:39:45] (CR) AikoChou: revise-tone-task-generator: Do not query when deleting from cache. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206358 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[12:45:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[12:45:49] Deployment aya-llm-predictor-00005-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00005-deployment - ...
[12:45:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[13:14:35] (CR) Bartosz Wójtowicz: revise-tone-task-generator: Do not query when deleting from cache. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206358 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[14:20:10] Machine-Learning-Team, LPL Hypothesis, Recommendation-API, LPL Projects (Other), Unplanned-Sprint-Work: Collection data unavailable in several rec-api hosts - https://phabricator.wikimedia.org/T406854#11379023 (Nikerabbit)
[16:11:54] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Do not query when deleting from cache. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206358 (https://phabricator.wikimedia.org/T408538)
[16:35:00] (PS1) Nik Gkountas: add support for "continue_from" for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485)
[16:36:28] (CR) CI reject: [V:-1] add support for "continue_from" for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1206409 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[16:38:08] Machine-Learning-Team, EditCheck, VisualEditor: [SPIKE] Define process for validating Tone Check model eval data for languages staff members do not speak - https://phabricator.wikimedia.org/T407155#11379733 (gkyziridis) ==== Update ==== > You can try tweaking the filters in the notebook, such as loo...
[16:38:25] (CR) AikoChou: [C:+1] revise-tone-task-generator: Do not query when deleting from cache. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1206358 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[16:46:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[16:46:04] Deployment aya-llm-predictor-00005-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00005-deployment - ...
[16:46:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[20:46:04] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[20:46:04] Deployment aya-llm-predictor-00005-deployment in llm at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=llm&var-deployment=aya-llm-predictor-00005-deployment - ...
[20:46:04] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[21:42:10] Machine-Learning-Team, ORES, Browser Test Platform, Testing Support, and 2 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#11381346 (SDunlap)