[07:06:23] serviceops: mc1024 broke - replace it or remove it from configs - https://phabricator.wikimedia.org/T272078 (Peachey88)
[07:55:35] hello folks
[07:56:06] mc1024 is down (hw failure) and OOW, so we should think what to do with the shard
[10:02:07] serviceops, SRE, User-jijiki: Enable TLS on memcached - https://phabricator.wikimedia.org/T271967 (jijiki) p:Triage→Medium
[10:24:24] serviceops: mc1024 broke - replace it or remove it from configs - https://phabricator.wikimedia.org/T272078 (jijiki)
[10:35:01] serviceops: mc1024 broke - replace it or remove it from configs - https://phabricator.wikimedia.org/T272078 (jijiki) As it has been noted on the description, traffic is serviced by the gutter pool so we can wait to hear from DCops when this server can be replaced. In the meantime we can leave things as they...
[20:53:16] serviceops, SRE, ops-eqsin: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` ganeti5002.eqsin.wmnet ` The log can be found in `/var/log/...
[21:03:17] serviceops, SRE, ops-eqsin: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (Papaul) This issue was that after replacing the system motherboard I am guess that the credentials were restored in the new IDRAC board from the chassis flash bac...
[21:29:30] _joe_: well, they would be uwsgi logs if uwsgi is being used as the wsgi container yes. RIght now Toolhub is not using uwsgi, but that can certainly change for the prod deployment if it has advantages.
[21:38:04] serviceops, SRE, ops-eqsin: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['ganeti5002.eqsin.wmnet'] ` and were **ALL** successful.
[21:39:43] serviceops, SRE, ops-eqsin: ganeti5002 was down / powered off, machine check entries in SEL - https://phabricator.wikimedia.org/T261130 (RobH) Open→Resolved a:wiki_willy→RobH So this is now ready to be pushed back into service, resolving this hw repair task.