[00:00:29] ah, nm. seems to be something with cassandra. we'll try to disable the persistence
[12:02:36] serviceops, Prod-Kubernetes, Release-Engineering-Team, Kubernetes: CI pipeline/job to build and release helm chart artifacts - https://phabricator.wikimedia.org/T257333 (JMeybohm)
[12:55:00] akosiaris: mutante: do you think we need a redirect notice or something on https://releases.wikimedia.org/charts/ to link to https://helm-charts.wikimedia.org/stable/? I would prefer not to do a HTTP redirect so that we can catch old repository config ...
[12:56:21] jayme: probably not. An email to wikitech-l is probably sufficient
[12:58:29] akosiaris: Nice. Will need to send a mail anyways to tell people not to "helm package && helm repo index" anymore
[13:29:02] <_joe_> jayme: oh no
[13:29:07] <_joe_> I loved doing that
[14:09:15] _joe_: you could have said that earlier...you could have taken the cronjobs job :)
[14:42:30] serviceops, OTRS, Operations, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (eyazi) Not sure if you did, but you should also reset the Ticket::SearchIndexModule setting. Can be done on the interface if you have acce...
[16:30:05] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2016.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[16:31:45] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2017.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[16:34:22] jayme: +1 wikitech-l
[16:34:32] i am doing the last 4 wtp* reimages and then it should be done today
[18:19:06] serviceops, Operations: reinstall xhgui* with buster - https://phabricator.wikimedia.org/T259206 (Dzahn) Open→Resolved both xhgui1001 and xhgui2001 are now on buster, have xhgui package installed and puppet is happy
[18:40:31] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2017.codfw.wmnet'] ` and were **ALL** successful.
[18:42:13] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2016.codfw.wmnet'] ` and were **ALL** successful.
[18:47:04] oh.. look, using httpbb paid off
[18:47:38] all these reinstalls and they all worked after reinstall.. but now i have that one case where it's "29 of 40 assertions failed"
[18:47:49] one-off
[18:53:07] \o/
[18:58:46] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2018.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[19:20:17] in the end i had to restart apache/php7.2-fpm manually and wait a moment and it went from "29/40 fails" to "40 working". some race. just one out of a dozen
[19:21:25] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2019.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[19:28:09] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2020.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[20:06:11] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 3 others: Drop PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Akuckartz) @Kringle Thanks, yes, I agree. I was irritated by some of the comments :-)
[21:07:44] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2018.codfw.wmnet'] ` and were **ALL** successful.
[21:24:58] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) a:JMeybohm→Dzahn
[21:42:55] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2019.codfw.wmnet'] ` and were **ALL** successful.
[21:54:28] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2020.codfw.wmnet'] ` and were **ALL** successful.
[22:36:42] serviceops, Operations, Platform Engineering, Release Pipeline, and 6 others: Kask functional testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041 (jeena) We attempted to run the tests using CI, but ran into errors deploying cassandra to k8s (on the ci cl...
[23:10:34] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) All wtp* and parse* servers have been reimaged. With the exception of wtp2019 they have also been tested with httpbb, parsoid service running, repooled and look fine in mo...
[23:14:55] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) ` [cumin1001:~] $ sudo cumin wtp* 'df -h | grep mapper | cut -d "/" -f1,2' 43 hosts will be targeted: wtp[2001-2004,2006-2020].codfw.wmnet,wtp[1025-1048].eqiad.wmnet Confirm...
[23:15:13] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Dzahn) Open→Resolved
[23:31:21] all wtp* and parse* hosts have been reimaged and have the new /srv partition.
[23:31:38] all repooled with one exception. wtp2019 has some issue i will look at again later
[23:32:57] icinga all green, tested with httpbb and that parsoid service runs (it does, except on 2019) (i needed some manual things for different race conditions during the reimaging process, like downtimes, restarting services, deleting deployment-cache etc)