[00:42:55] SRE-tools, Infrastructure-Foundations, SRE: Reboot cookbook workflow leaves Puppet disabled - https://phabricator.wikimedia.org/T410944#11403724 (RLazarus)
[01:06:36] SRE-tools, Infrastructure-Foundations, SRE, Traffic: Reboot cookbook workflow leaves Puppet disabled - https://phabricator.wikimedia.org/T410944#11403767 (ssingh)
[09:06:48] netops, Infrastructure-Foundations, SRE: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11404346 (fgiunchedi)
[09:09:23] netops, Infrastructure-Foundations, SRE: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11404351 (fgiunchedi) The logical side on the host side is done. Next up is deleting the interfaces from netbox for the hosts and unplug network cables. I'll file subtasks
[09:11:01] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989 (fgiunchedi) NEW
[10:09:49] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11404665 (cmooney) @fgiunchedi have all the physical interfaces been removed on site? Typically I would ask DC-Ops to remove in Netbox when they remove...
[11:07:52] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11404877 (cmooney) Open→Resolved a:cmooney > and run homer what we can do if the second port being in an "up" state on the switch is a...
[11:48:21] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11405013 (cmooney) Resolved→Open
[12:18:54] netops, Infrastructure-Foundations, SRE: rancid: message has lines too long for transport - https://phabricator.wikimedia.org/T410606#11405075 (cmooney)
[13:08:00] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11405226 (fgiunchedi) >>! In T410989#11404665, @cmooney wrote: > @fgiunchedi have all the cables been removed on site? > > Typically I would ask DC-Ops...
[13:18:01] netops, Infrastructure-Foundations: Remove extra netbox interfaces for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11405248 (cmooney) >>! In T410989#11405226, @fgiunchedi wrote: > I see, thank you I was not aware of the procedure and it makes sense! Yeah the main th...
[13:18:31] netops, Infrastructure-Foundations: Remove second network connection for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11405249 (cmooney)
[13:24:03] netops, Infrastructure-Foundations: Remove second network connection for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11405268 (fgiunchedi) >>! In T410989#11405248, @cmooney wrote: > Thinking it through what is probably best: > > # We disable the switch interfaces te...
[13:43:23] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11405332 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[13:49:42] netops, Infrastructure-Foundations, SRE: Eqiad row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11405342 (cmooney)
[13:59:06] SRE-tools, DC-Ops, Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#11405424 (jcrespo) Happened to me again today.
[14:36:55] SRE-tools, DC-Ops, Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#11405703 (elukey) @jcrespo sadly the upstream website changed and the way that we used to get the latest firmware doesn't work a...
[14:51:51] Mail, Infrastructure-Foundations, SRE: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue? - https://phabricator.wikimedia.org/T411027#11405766 (Aklapper) [@JKelsoteel-WMF: Please set project tags so tasks can be found on project workboards - thanks!]
[14:53:54] Mail, Infrastructure-Foundations, SRE: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue? - https://phabricator.wikimedia.org/T411027#11405776 (Aklapper) > (see screenshot). There is no screenshot. See also https://www.mediawiki.org/wiki/Phabricator/Help#Uploa...
[14:56:04] Mail, Infrastructure-Foundations, SRE: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue? - https://phabricator.wikimedia.org/T411027#11405786 (JKelsoteel-WMF) @Aklapper sorry, here they are! Thanks for the flag. I also messaged Jesse on Slack to ask if certain...
[15:04:20] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11405805 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[15:15:00] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11405866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[15:52:57] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11406114 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[16:46:12] Mail, Infrastructure-Foundations, SRE: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue? - https://phabricator.wikimedia.org/T411027#11406435 (JKelsoteel-WMF) Hello @jhathaway, our requester has let us know that he hopes to use the no-reply@ address in early D...
[17:18:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:51:44] SRE-tools, Infrastructure-Foundations, SRE, Traffic: Reboot cookbook workflow leaves Puppet disabled - https://phabricator.wikimedia.org/T410944#11406729 (Vgutierrez) From SREBatchRunnerBase `__reboot_action()`: `lang=python puppet = self._spicerack.puppet(hosts) reboot_time = da...
[17:55:58] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11406753 (Papaul)
[18:18:28] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11406867 (Jclark-ctr) Day 10 Update: - 7 host Moved, 11 Remaining - 300 host at start of migration - John worked with Ben directly to migrate the (4) Data P...
[19:30:22] netops, Infrastructure-Foundations, SRE: Eqiad row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11407124 (cmooney)
[20:29:22] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 3 others: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad - https://phabricator.wikimedia.org/T405609#11407342 (cmooney) @BCornwall thanks for the gerrit reviews! Could you have a look at...
[20:29:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 3 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11407343 (cmooney) @BCornwall thanks for the gerrit reviews! Could you have a look at...
[20:38:51] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11407364 (RobH)
[20:46:18] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11407373 (RobH) New host count: 7 host Moved, 11 Remaining - 308 host at start of migration (counting the 8 John audited and filed a task for)
[21:18:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:01:33] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054 (cmooney) NEW p:Triage→Medium
[22:01:43] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11407632 (cmooney)
[22:03:18] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11407640 (cmooney)
[22:09:56] netops, Infrastructure-Foundations, SRE: Eqiad row C/D servers need to boot/reimage in UEFI mode - https://phabricator.wikimedia.org/T410910#11407654 (cmooney)
[23:18:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed