[08:24:06] is there a way to stop deployment-prep puppet failure email spam?
[08:42:56] fix puppet? :)
[09:02:36] FYI I've just offboarded WMCS contacts from legacy paging in icinga (cfr T276792) please let me know if sth is amiss, cc arturo / dcaro
[09:02:37] T276792: Remove cloud contacts from legacy paging - https://phabricator.wikimedia.org/T276792
[09:04:01] thanks godog, ack
[09:04:23] godog: I think the comment by bstorm yesterday is all we needed
[09:05:05] arturo: ah ok! good to know
[09:05:15] I'll go ahead and tentatively resolve
[09:07:38] thanks!
[09:07:54] !log admin [codfw1dev] reimaging cloudvirt2003-dev (T276964)
[09:07:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:07:58] T276964: cloudvirt2003-dev: missing neutron bridges - https://phabricator.wikimedia.org/T276964
[09:12:21] godog: thanks!
[09:37:22] !log admin draining cloudvirt1023 for T275753
[09:37:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:16:02] !log deployment-prep briefly stopping deployment-puppetdb03 to disable VMX CPU flag
[10:16:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[10:26:35] !log wmf-research-tools briefly stop region-groundtruth-test to disable VMX cpu flag
[10:26:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmf-research-tools/SAL
[10:29:42] !log admin rebooting cloudvirt1023 for T275753
[10:29:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:33:40] !log admin draining cloudvirt1028 for T275753
[10:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:55:48] !log rcm briefly stopped VM alderaan to disable VMX cpu flag
[10:55:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[10:56:14] !log tools briefly stopped VM tools-k8s-etcd-7 to disable VMX cpu flag
[10:56:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:57:33] !log clouddb-services briefly stopped VM clouddb-wikireplicas-proxy-1 to disable VMX cpu flag
[10:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[10:58:50] !log devtools briefly stopped VM 'doc' to disable VMX cpu flag and live-migrate it
[10:58:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[10:59:34] !log recommendation-api briefly stopped VM 'spd-test' to disable VMX cpu flag and live-migrate it
[10:59:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Recommendation-api/SAL
[11:00:26] !log admin rebooting cloudvirt1028 for T275753
[11:00:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:05:49] !log admin draining cloudvirt1013 for T275753
[11:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:29:46] !log admin rebooting cloudvirt1013 for T275753
[11:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:31:29] !log admin draining cloudvirt1029 for T275753
[11:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:53:55] !log admin [codfw1dev] restart nova-conductor in all 3 cloudcontrol servers for T276964
[11:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:54:00] T276964: cloudvirt2003-dev: missing neutron bridges - https://phabricator.wikimedia.org/T276964
[11:56:51] !log admin [codfw1dev] restart rabbitmq-server in all 3 cloudcontrol servers for T276964
[11:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:43:56] !log dumps briefly stopping VM dumps-5 and dumps-4 to migrate hypervisor
[12:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dumps/SAL
[12:44:26] !log mobile briefly stopping VM wikiwho-ios-experiments to migrate hypervisor
[12:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mobile/SAL
[12:44:36] !log wmf-research-tools briefly stopping VM wikipediaWikidata to migrate hypervisor
[12:44:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmf-research-tools/SAL
[12:44:44] !log clouddb-services briefly stopping VM clouddb-wikireplicas-proxy-2 to migrate hypervisor
[12:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[12:47:41] !log admin rebooting cloudvirt1029 for T275753
[12:47:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:51:40] !log admin draining cloudvirt1030 for T275753
[12:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:14:52] !log admin starting manually the canary VM for cloudvirt1029 (nova start 349830f6-3b39-4a8c-ada4-a7439f65cffe) (T275753)
[13:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:48:18] !log entity-detection briefly stopping VM maps-beta-1 to migrate hypervisor
[16:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Entity-detection/SAL
[16:48:23] !log toolhub briefly stopping VM toolhub-beta01 to migrate hypervisor
[16:48:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
[16:48:29] !log toolsbeta briefly stopping VM toolsbeta-test-k8s-etcd-8 to migrate hypervisor
[16:48:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[16:48:34] !log wmf-research-tools briefly stopping VM content-similarity-prototype to migrate hypervisor
[16:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmf-research-tools/SAL
[16:51:47] !log admin rebooting cloudvirt1030 for T275753
[16:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:56:36] !log tools.pagepile-visual-filter deployed c86ef3f7a5 (better error handling)
[18:56:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pagepile-visual-filter/SAL
[18:59:31] hey, I changed where the 185.15.56.34 floating ip points (deployment-ircd to deployment-ircd02) but all VMs somehow think it's pointing to the internal IP of the old target
[19:00:17] sorry, should been more clear. irc.beta.wmflabs.org dns record points to it on Horizon, but WMCS VMs see that pointing to the old target deployment-ircd
[19:01:48] Majavah: could it just be DNS cache?
[19:02:38] Zppix: querying it directly from ns-recursor0.openstack.eqiad1.wikimediacloud.org, not the hosts own cache but maybe recursors have their own cache too
[19:03:26] Try the recursors Majavah
[19:03:49] huh?
[19:04:04] Sorry, try looking to see if they have their own cache
[19:04:30] I have no idea, and that's why I'm asking here
[19:04:33] I think all hosts have their own local DNS cache
[19:04:45] I know that it's not cached anywhere on my VM
[19:04:59] Are you on buster?
[19:05:07] yes
[19:05:49] Yeah doesnt appear by default theres a dns cache
[19:06:25] “there is no OS-level DNS caching unless a caching service such as Systemd-Resolved”
[19:06:30] As I already said I know it's not cached by my VM, as I already said.
[19:06:35] Oh
[19:07:01] Sorry i missed that
[19:08:33] How long ago did you make the change?
[19:08:59] https://sal.toolforge.org/log/9Qx2HXgB8Fs0LHO5BJo6
[19:08:59] Majavah: that sounds like a potential problem with the split horizon DNS setup, but I'm not sure. It does look like the PTR for 185.15.56.34 is showing the expected instance-deployment-ircd02.deployment-prep.wmflabs.org target.
[19:10:52] bd808: external connectivity works just fine and goes to ircd02, but internally something rewrites irc.beta.wmflabs.org to the internal ip of deployment-ircd
[19:15:18] Majavah: *nod* I can see with `dig @208.80.154.143 irc.beta.wmflabs.org` and `dig @208.80.154.24 irc.beta.wmflabs.org` that the internal resolver is still returning the 172.16.5.68 from deployment-ircd
[19:16:03] andrewbogott: ^ troubleshooting tips for Designate not updating the internal mapping for a floating ip that moved to a new instance?
[19:25:54] Majavah: I'm running our "leak detector" dns script to see if it notices what is wrong. I haven't been able to find the bad record manually yet.
[19:38:32] bd808: magically fixed now
[19:40:07] bd808: sorry for the delay… those are updated by a cron and I don't think it runs all that often
[19:46:59] Majavah, andrewbogott: yeah. it looks like labs-ip-alias-dup ran at 19:32 and fixed the mapping. Wikitech says it runs every 30m, but I'm going to find the timer and update the docs if that's no longer true.
[19:47:12] *labs-ip-alias-dump
[19:48:18] now that it's working, I get to shut down one more Jessie instance :P
[19:49:59] just to close the loop: the script that sets up the internal IP DNS for floating IPs runs once an hour at the 30m mark more or less. Docs said every 30m until I just changed them.
[19:59:44] thanks bd808
[19:59:48] !log tools.lexeme-forms deployed 94dfecbc2a (generic API error handler)
[19:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[20:05:46] !log tools.lexeme-forms deployed 712d262475 (restore logging for generic API errors)
[20:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
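
Editor's note on the floating IP discussion above: the log concludes that the internal DNS mapping for a floating IP is refreshed by a job that runs roughly once an hour at the 30-minute mark. A minimal sketch of how one might estimate the wait after moving a floating IP, assuming exactly that schedule. The function name `next_sync`, the `minute_mark` parameter, and the calendar date are hypothetical, for illustration only; the 18:59:31 timestamp is when the problem was reported in the log, used here as a stand-in for the time of the change.

```python
from datetime import datetime, timedelta


def next_sync(after: datetime, minute_mark: int = 30) -> datetime:
    """Return the first hh:<minute_mark>:00 strictly after `after`.

    Assumes an hourly job at a fixed minute, as described in the log;
    the real job may drift by a few minutes (it ran at 19:32 that day).
    """
    candidate = after.replace(minute=minute_mark, second=0, microsecond=0)
    if candidate <= after:
        # This hour's slot has already passed, so take the next hour's.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    # Placeholder date; only the time of day comes from the log.
    reported = datetime(2021, 1, 1, 18, 59, 31)
    run = next_sync(reported)
    print("next expected sync:", run.time(), "wait:", run - reported)
```

Under these assumptions, a change made just after the half-hour mark could wait close to a full hour before VMs inside the cloud resolve the name to the new instance, which fits the delay seen in the log.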