[03:16:46] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (mmodell) Phab1001 disk I/O seems a lot slower than phab1003. Running `lshw -class storage` yields one obvious difference: phab1001 is running in lega...
[03:22:10] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (mmodell) Slow disk is manifesting with tasks blocked for extended periods of time waiting for I/O: `name=/var/log/kern.log Dec 4 02:08:53 phab1001...
[04:55:04] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn) > @Dzahn: Can we switch it over to AHCI in the bios? Do we need DC-Ops for that? @mmodell I did reboot into BIOS and looked at it but when swi...
[05:05:01] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn)
[05:15:31] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn)
[05:16:07] serviceops, Operations, Traffic, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (Dzahn) https://ticket.wikimedia.org (OTRS) has been switched to use https://ticket.discovery.wmnet (envoy on mendelevium).
[05:58:18] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Mholloway) https://integration.wikimedia.org/ci/job/mobileapps-periodic-test/ has been failing since this happened. I'm guessing that is not a coinci...
[11:48:11] serviceops, Analytics, Event-Platform: conntrack -L - https://phabricator.wikimedia.org/T239795 (Aklapper)
[11:55:49] serviceops, Analytics, Event-Platform: Connection tracking on kubernetes hosts alerts - https://phabricator.wikimedia.org/T239795 (akosiaris) Open→Resolved p:Triage→Normal
[12:02:00] serviceops, Analytics, Event-Platform, Patch-For-Review: Connection tracking on kubernetes hosts alerts - https://phabricator.wikimedia.org/T239795 (akosiaris)
[13:40:26] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by rzl on cumin1001.eqiad.wmnet for hosts: ` ['mw2274.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201912041329_rzl_100644.log`.
[13:45:31] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Lucas_Werkmeister_WMDE) The recently added Gerrit integration directly beneath a tasks description seems to be gone, could that be related to this swi...
[14:21:44] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2274.codfw.wmnet'] ` and were **ALL** successful.
[14:41:45] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (RLazarus)
[14:48:36] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by rzl on cumin1001.eqiad.wmnet for hosts: ` ['mw2273.codfw.wmnet', 'mw2272.codfw.wmnet', 'mw2267.codfw.wmnet'] ` The log can be found in `/var/log/w...
[15:24:41] serviceops, Operations, Puppet, User-jbond: Rolling restart of etcd to pick up the renewed CA public certificate. - https://phabricator.wikimedia.org/T237362 (jbond) The new CA has been distributed now so this can be started
[15:59:11] serviceops, Operations: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (MoritzMuehlenhoff)
[16:11:44] serviceops: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835 (akosiaris)
[16:11:55] serviceops: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835 (akosiaris) p:Triage→High
[16:21:08] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Mainframe98) I suspect that this move inadvertently caused {T239786} as I only started seeing it today.
[16:32:17] serviceops: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835 (akosiaris)
[16:35:11] serviceops, Operations, Patch-For-Review: VP9-enabled ffmpeg doesn't get installed after reimage of mw job runner/video scaler - https://phabricator.wikimedia.org/T239831 (jijiki) a:jijiki
[16:47:50] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2273.codfw.wmnet', 'mw2272.codfw.wmnet', 'mw2267.codfw.wmnet'] ` and were **ALL** successful.
[17:41:56] serviceops, DC-Ops, Operations, ops-codfw: mw2259 down and mgmt does not exist? - https://phabricator.wikimedia.org/T239758 (Papaul) Open→Resolved Reset the IDRAC server is back up
[18:46:05] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (RLazarus)
[18:58:59] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by rzl on cumin1001.eqiad.wmnet for hosts: ` ['mw2266.codfw.wmnet', 'mw2265.codfw.wmnet', 'mw2264.codfw.wmnet', 'mw2263.codfw.wmnet'] ` The log can b...
[19:14:05] serviceops, Release-Engineering-Team, Patch-For-Review: switch prod Phabricator from phab1003 to phab1001 - https://phabricator.wikimedia.org/T238956 (Dzahn) Unfortunately we have to switch back to the server before, change a BIOS setting in the current server, reimage it and then switch back a third...
[19:41:27] serviceops, Proton, Product-Infrastructure-Team-Backlog (Kanban): Profile proton memory usage for Helm chart - https://phabricator.wikimedia.org/T238830 (WDoranWMF)
[19:46:28] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (mmodell)
[21:26:37] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2266.codfw.wmnet', 'mw2265.codfw.wmnet', 'mw2264.codfw.wmnet', 'mw2263.codfw.wmnet'] ` and were **ALL** successful.
[21:42:22] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (RLazarus)