[01:46:46] !log tools killed the toolschecker cron job, which had an LDAP error, and ran it again by hand
[01:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:35:36] !log tools rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it
[03:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:37:11] !log tools deleted a massive number of stuck jobs that misfired from the cron server
[03:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:49:30] !log tools restarting sssd on tools-sgegrid-master
[03:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[03:59:13] !log tools rebooting grid master. sorry for the cron spam
[03:59:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:12:52] !log tools rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP... likely root cause of the issues tonight
[04:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[06:43:12] https://wiki.debian.org/DebianStretch says that Stretch should be on kernel 4.9, but somehow I have deployment-ms-be[05-06] on stretch and kernel 4.19, which is causing some issues with swift. any ideas how this happened? :/
[09:56:28] Majavah: no, sorry :-(
[10:28:34] arturo: all stretch wmcs VMs I checked are on 4.19 for some reason, I guess I'll have to work around this specific issue, looks like it should be as simple as changing a condition in a puppet manifest
[10:37:30] Majavah: I'm not familiar with why that happened. Maybe open a phab task and ask for clarification from other deployment-prep admins?
[10:46:24] arturo: it's not limited to deployment-prep, everything I checked on toolforge (bastion and a random sgeexec node) and the generic bastions is on 4.19, so I'm suspecting it's either some cloud-vps automatic update or in the base images itself
[10:46:56] it can be worked around, just curious why cloud vps uses a different kernel than production on the same distro. should I still open a task, and where? cloud-vps and wmcs-kanban?
[10:49:00] Majavah: I see stretch-backports has 4.19.118-2+deb10u1~bpo9+1, so perhaps the VMs are using a backported kernel?
[10:50:59] Majavah: does this help?
[10:51:01] https://www.irccloud.com/pastebin/M2XfZh2I/
[10:51:32] maybe https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/wmcs/instance.pp#77 ?
[10:51:46] arturo: Installed: 4.9+80+deb9u13
[10:51:52] now I'm just more confused
[10:52:12] the interesting commands are the last 2
[10:52:48] Majavah:
[10:52:51] https://www.irccloud.com/pastebin/LasFZpcY/
[10:53:08] somehow a backported kernel ended up in the stretch security repository
[10:53:40] https://paste.toolforge.org/view/173d16d8
[10:55:04] dcaro: I guess that's where the kernel comes from, I just thought each package would need to have been pulled from backports manually
[10:55:08] Majavah: yeah, same: backported kernel in the security repository
[10:55:54] https://www.debian.org/security/2021/dsa-4843 ?
[10:58:11] Majavah: in deployment-prep VMs you can simply downgrade the kernel. But again, that's something you deployment-prep folks should decide on
[11:00:05] arturo: okay, thanks for the help, not sure yet what's the best way forward (fixing puppet manifests to support 4.19 on stretch or downgrading) but I'm sure we'll figure something out
[11:08:23] 👍
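(The pastebins above aren't preserved here, but the kind of check being discussed, i.e. which kernel a VM is running and which APT repository supplies it, might look roughly like this on a stretch instance. The package name linux-image-amd64 and the file paths are standard Debian defaults, not values taken from the pastes.)

    # Which kernel is the VM actually running?
    uname -r
    # Which version of the kernel metapackage is installed, and which
    # repository offers the candidate? (linux-image-amd64 is the usual
    # Debian metapackage name on amd64; adjust for other architectures.)
    apt-cache policy linux-image-amd64
    # List the configured APT sources to see whether stretch/updates
    # (security) or stretch-backports could be supplying a 4.19 kernel.
    grep -rh '^deb' /etc/apt/sources.list /etc/apt/sources.list.d/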
[12:47:32] !log toolsbeta delete puppet prefix `toolsbeta-buster-grirdmaster` (no longer useful) T277653
[12:47:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:47:38] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[12:48:13] !log toolsbeta destroy VM toolsbeta-buster-gridmaster (no longer useful) T277653
[12:48:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:50:10] !log toolsbeta added puppet prefix `toolsbeta-sgegrid-shadow`, migrate puppet config from VM to here
[12:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:51:30] !log toolsbeta rebuild toolsbeta-sgegrid-shadow instance as debian buster (T277653)
[12:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:53:11] !log toolsbeta create anti-affinity server group toolsbeta-sgegrid-master-shadow
[12:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
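(For context on the 12:53:11 entry: an anti-affinity server group can be created and used with the standard OpenStack CLI roughly as sketched below. The flavor, image and network values are placeholders, not taken from this log.)

    # Create a server group whose members the scheduler keeps on
    # different hypervisors.
    openstack server group create --policy anti-affinity toolsbeta-sgegrid-master-shadow
    # Boot an instance into that group by passing its UUID as a scheduler
    # hint; flavor, image and network here are placeholders.
    openstack server create \
        --flavor <flavor> \
        --image <debian-buster-image> \
        --network <network> \
        --hint group=<server-group-uuid> \
        toolsbeta-sgegrid-shadow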
[16:19:09] !log tools added profile::toolforge::infrastructure class to puppetmaster T277756
[16:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:20:41] !log tools disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
[16:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:21:30] !log tools enabling puppet tools-wide
[16:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:24:10] !log toolsbeta live-hacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456
[16:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[16:52:53] andrewbogott: sorry I got distracted. I'm ready to pay attention to the grid stuff now
[16:53:27] np, I'm in the process of building toolsbeta-sgeexec-0902, once it's up I'll have you check my work :)
[16:53:38] ok, that will take some time
[16:54:00] yeah :/
[16:58:05] maybe I will get some lunch while puppet runs. arturo if you want to go we can revisit this tomorrow, otherwise I'll ping you when it finishes
[16:59:28] andrewbogott: ok! I will disconnect then for today
[16:59:45] I think this doesn't block me for the grid buster upgrade stuff I plan to do tomorrow anyway
[16:59:56] (without the patch it's just a small puppet agent complaint)
[17:03:31] I am getting the "Puppet failure" mails for nodes like "node3.cluster.local" but I have no idea what they are. Normally you receive this mail if you are a project admin. But this one surprises me. Should I dig more?
[17:43:28] mutante: that suggests that a VM has forgotten its hostname :(
[17:44:42] let me see if I can figure out who is saying that.
[17:47:03] mutante: are you a member of pontoon by chance? That would be my first guess...
[17:47:15] andrewbogott: aha! thank you. so actually there are 3 of them: node1, node2 and node3
[17:47:20] Things in automation-framework are also chronically broken
[17:47:32] checking pontoon
[17:47:58] When I last looked it seemed beyond help :)
[17:48:39] no, I don't see pontoon in my project list on Horizon
[17:49:06] maybe "puppet-diffs". let's see if I can leave that
[17:49:46] ah no.. then I probably can't sync compiler facts
[17:50:41] checking which of the projects I am in has exactly 3 nodes.. hmm
[17:50:54] bastion does
[17:51:49] packaging does
[17:52:28] puppet-diffs does... yeah, those 3. the others would not match
[17:55:09] I'm doing some cumin searches for broken instances… no guarantee this will find whatever's emailing you though
[17:55:12] https://www.irccloud.com/pastebin/PELdagPK/
[17:56:18] andrewbogott: I guess let's just see if it keeps doing it every day in the future or it stops
[17:56:59] mutante: here are the other candidates:
[17:57:02] https://www.irccloud.com/pastebin/eqg1ehF2/
[17:57:14] although if the hostname is scrambled I have no idea if cumin can reach them
[17:57:23] if it mails me as the proper "k8splay" again I will ping Wolfgang and check why puppet is broken there
[17:58:09] if it keeps mailing me as "node1" though I can open a ticket
[17:59:11] andrewbogott: hmm, thanks, but in that paste I see nothing that looks familiar. maybe it was k8splay though
[18:14:06] mutante: ok! It's clearly not urgent, just depends on your tolerance for cronspam
[18:17:29] andrewbogott: tolerance is high enough :)
[18:18:03] we all have lots of practice
[18:35:31] mutante: that domain sounds like k8s
[18:44:33] !log toolsbeta replacing toolsbeta-sgegrid-master with a Debian Buster VM (T277653)
[18:44:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:44:38] T277653: Toolforge: migrate grid to Debian Buster - https://phabricator.wikimedia.org/T277653
[18:49:29] !log toolsbeta deleting VM toolsbeta-workflow-test, no longer useful
[18:49:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:50:19] !log toolsbeta deleting VMs toolsbeta-paws-worker-1001 toolsbeta-paws-worker-1002 toolsbeta-paws-master-01 (testing for PAWS should happen in the paws project)
[18:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:55:55] !log toolsbeta set profile::toolforge::infrastructure across the entire project with login_server set on the bastion prefix
[18:55:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:24:23] !log tools set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes
[19:24:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:52:23] * bstorm lunch
[19:52:37] ugh, getting my tabs confused. moving that to the other channel
[20:02:26] bd808: andrewbogott: maybe it's just my monitoring application, but I just mounted the new Cinder volume under /srv-new and my usage monitor is already claiming 19GB of used space.
[20:03:11] I haven't put anything in it yet.
[20:05:12] what does df say?
[20:06:20] That my monitoring application is being dumb and I should switch to something better lol.
[20:06:57] Thanks
[20:08:04] np! Glad you're trying out cinder.
[20:09:23] I've been eagerly waiting for this new feature. :D
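(The "what does df say?" question above is about trusting the filesystem's own numbers over the monitoring application's; a minimal check against the /srv-new mount point from the conversation would be:)

    # Report actual size, used and available space on the new volume.
    df -h /srv-new
    # And how much data is really sitting under the mount point.
    sudo du -sh /srv-new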
[20:10:49] andrewbogott: I
[20:11:09] I'm reading the section on moving old /srv data to the new Cinder volume, but the instructions confuse me a little.
[20:11:37] Step 2 tells me to verify that the mounted drive for the new volume exists.
[20:11:50] So what am I mounting in Step 6?
[20:13:51] https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Moving_old_/srv_data_to_new_volume
[20:23:10] Cyberpower678: you're reading the deprecated LVM section now, I think?
[20:23:17] So not sure if it applies to what you're doing
[20:23:48] Yeah, I noticed. I just adapted the command to copy from /srv to /srv-new
[20:24:00] 'k
[20:24:22] But once I have it copied, how can I change the mount points to move /srv to /srv-old and /srv-new to /srv?
[20:25:02] I would edit /etc/fstab and then reboot
[20:25:35] although typically when people are moving from an lvm volume to cinder it's in anticipation of attaching that cinder volume to a fresh VM, I would think
[20:25:42] alright, that's easy enough.
[20:26:20] Not inclined to set up the VM at this time. I've got so much to do right now. :-)
[20:27:40] Should I also remove the srv puppet role, or will it not matter?
[20:33:08] You probably should remove it, otherwise they'll collide
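(A rough sketch of the copy-and-swap discussed above. The rsync flags and the device paths in the fstab comment are illustrative assumptions; check lsblk or blkid and the instance's existing /etc/fstab before changing anything.)

    # 1. Copy the existing data onto the Cinder volume already mounted at /srv-new.
    sudo rsync -aHAX /srv/ /srv-new/
    # 2. Edit /etc/fstab so the old volume mounts at /srv-old and the Cinder
    #    volume mounts at /srv, e.g. (device paths are placeholders):
    #      /dev/<old-lvm-volume>   /srv-old  ext4  defaults  0 2
    #      /dev/<cinder-volume>    /srv      ext4  defaults  0 2
    sudoedit /etc/fstab
    # 3. Reboot so the new mount layout takes effect, as suggested at 20:25:02.
    sudo reboot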