[00:01:41] ariutta: yeah, that seems like a good summary.
[00:02:40] bd808: thanks. Any recs on where to store database content or where a webserver root should be?
[00:03:42] Both would probably be best on a Cinder volume now that we have them. The db storage for sure is a common use case for a Cinder volume.
[00:06:29] trying to think of a good analogy for each storage system. The instance's "system disk" lives and dies with the instance, maybe think of it as a laptop drive. A Cinder volume is a bit like an external USB drive that you can unplug from one instance and then plug into another (but only one at a time usually).
[00:06:36] And then NFS is like tying your data to the feet of pigeons and hoping they bring it back when you need it. ;)
[00:12:01] bd808: haha, thanks! Helpful analogies.
[00:13:19] bd808: unless my puppet patch gets merged, the prepare_cinder_volume script won't work for anyone who tries it, FYI. I've tested my patch on an instance and it worked as well.
[00:14:16] I'm confident that one of the SREs will review and merge as appropriate :)
[00:14:31] I know, I was just letting you know
[00:14:58] I didn't want it to surprise you when someone came back and said it errored :)
[06:53:28] do you have a way of knowing which instance(s) currently have a connection with kornbluth.freenode.net? ref T276299#6877466
[06:53:28] T276299: dbctl not sending !log to IRC - https://phabricator.wikimedia.org/T276299
[07:00:13] Majavah: this isn't cloud
[07:00:18] Majavah: logmsgbot is prod
[07:00:36] Zppix: yes, but logmsgbot is currently connecting from a cloud address
[07:00:55] I believe it's been established that it only appears that way due to NAT
[07:01:24] as it's hosted on the same server as icinga, IIRC
[07:05:22] so prod servers use the WMCS NAT address too?
[09:26:58] no, they don't. Wikis and cloud are different networks
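A minimal sketch of the Cinder workflow from the storage discussion above ([00:03:42]-[00:06:29]), assuming the stock OpenStack CLI plus standard Linux tools; the volume name, size, and device path are placeholders, and in practice the prepare_cinder_volume helper mentioned at [00:13:19] wraps the format/mount steps.

    # create a volume and attach it to an instance (run wherever you have OpenStack credentials)
    openstack volume create --size 20 db-data
    openstack server add volume my-instance db-data
    # inside the instance: format once, then mount (device name may differ, e.g. /dev/sdb or /dev/vdb)
    sudo mkfs.ext4 /dev/sdb
    sudo mkdir -p /srv/db-data
    sudo mount /dev/sdb /srv/db-data
    # add an /etc/fstab entry if the mount should survive reboots

Matching the external-USB-drive analogy, the volume can later be detached (`openstack server remove volume my-instance db-data`) and attached to a different instance, but only to one instance at a time.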
[09:30:04] !log admin installing linux kernel 5.10.13-1~bpo10+1 in cloudnet1003 and rebooting it (network failover) (T271058)
[09:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:30:08] T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem - https://phabricator.wikimedia.org/T271058
[09:58:58] !log admin update firmware-bnx2x from 20190114-2 to 20200918-1~bpo10+1 on cloudnet1003 (T271058)
[09:59:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:59:02] T271058: cloudnet1004/cloudnet1003: network hiccups because broadcom driver/firmware problem - https://phabricator.wikimedia.org/T271058
[10:00:57] !log admin rebooting again cloudnet1003 (no network failover) (T271058)
[10:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:01:56] !log admin icinga-downtime cloudnet1003 for 14 days bc potential alerting storm due to firmware issues (T271058)
[10:02:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:47:58] !log admin moved cloudvirt1014 to the 'maintenance' host aggregate, drain it for T275753
[11:48:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:50:46] !log admin moved cloudvirt1023 away from the maintenance host aggregate, leave it in the ceph aggregate (was in the 2)
[11:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:58:40] !log admin rebooting cloudvirt1014 for T275753
[11:58:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:59:37] !log admin cloudvirt1014 now in the ceph host aggregate
[11:59:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:01:45] !log admin draining cloudvirt1016 for T275753
[12:01:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:20:34] !log admin rebooting cloudvirt1016 for T275753
[12:20:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:22:23] !log admin draining cloudvirt1017 for T275753
[12:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:26:17] !log tools.stewardbots hard restart stewardbot, disconnected from the network but nothing in logs
[12:26:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[12:27:39] Majavah: that would be me failing over cloudnet1003 earlier today
[12:34:53] * dcaro lunch
[12:49:40] !log admin rebooting cloudvirt1017 for T275753
[12:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:28:14] !log admin draining cloudvirt1018 for T275753
[13:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:59:04] !log admin rebooting cloudvirt1018 for T275753
[13:59:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:31:08] !log admin draining cloudvirt1021 for T275753
[14:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:48:53] !log tools killed pywikibot instance running in tools-sgebastion-07 by user msyn
[14:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:11:34] !log tools tools-sgebastion-07 triggered a neutron exception (unauthorized) while being live-migrated from cloudvirt1021 to 1029. Resetting nova state with `nova reset-state bd685d48-1011-404e-a755-372f6022f345 --active` and trying again
[15:11:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
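The cloudvirt drain/reboot entries above follow a pattern that, with the stock OpenStack and nova CLIs, looks roughly like the sketch below; the aggregate and host names are taken from the log, the instance UUID is a placeholder, and the actual WMCS tooling may wrap these steps differently.

    # pull the hypervisor out of normal scheduling
    openstack aggregate remove host ceph cloudvirt1016
    openstack aggregate add host maintenance cloudvirt1016
    # see what is still running there, then move each instance off
    openstack server list --all-projects --host cloudvirt1016
    nova live-migration <instance-uuid> cloudvirt1029
    # if a migration wedges (e.g. the neutron "unauthorized" error in the [15:11:34] entry),
    # clear the stuck state and retry:
    nova reset-state --active <instance-uuid>
    # once the host is empty: reboot it, then move it back into the ceph aggregate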
[15:17:08] !log tools shutting down tools-sgebastion-07 in an attempt to fix nova state and finish hypervisor migration
[15:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:34:55] !log admin rebooting cloudvirt1021 for T275753
[15:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:00:14] !log admin move cloudvirt1013 into the 'toobusy' host aggregate, it has 221% CPU subscription and 82% MEM subscription
[16:00:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:03:30] !log admin draining cloudvirt1022 for TT275753
[16:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:03:39] !log admin draining cloudvirt1022 for T275753
[16:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:10:01] dcaro: I hope it's okay, but I went ahead and fixed your typo on SAL for you :)
[16:10:32] Zppix: thanks! no problem :)
[16:12:44] anytime
[17:16:34] !log admin restarting rabbitmq-server on cloudcontrol1003,1004,1005; trying to explain amqp errors in scheduler logs
[17:16:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:26:39] !log maps resizing deployment-maps08 to (identical) flavor g2.cores4.ram8.disk80
[17:26:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[18:19:30] So to make sure I'm reading the standalone puppetmaster docs right: after doing step 1, do I then run the rm -rf /var/lib/puppet/ssl in step 2? Wouldn't that delete the cert that was just generated in step 1?
[18:21:58] Zppix: yes, that's what you want, because that cert is signed by the central puppetmaster, and now you want one signed by your own new puppetmaster
[18:22:46] Zppix: but run it on the client.. not the master
[18:24:05] mutante: ok, so I shouldn't apply the puppetmaster hiera to the local puppetmaster then?
[18:29:33] Zppix: ehm.. I _think_ it can be both ways. Obviously you want all your clients (agents) to use your own master. Then the remaining question is.. which master should the agent on your master use: itself, or another master.
[18:30:25] Zppix: but the quote "Only if the client is the puppetmaster itself," under 2.3 tells me it is using itself, and then the answer is.. you should apply the hiera on the puppetmaster too
[18:30:58] but I shouldn't run the rm on the /ssl dir on the master regardless, right?
[18:32:46] actually.. yeah, you should. And then run all the things under "Only if the client is the puppetmaster itself, run also:"
[18:33:06] ok
[18:33:14] when the docs say stuff about "on the client instance" you gotta see the master as both a master and a client
[18:33:21] as
[18:33:35] if it uses itself as the master
[18:34:02] alright, I'll try it, thanks
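A rough sketch of the re-keying steps being discussed above (not the verbatim Wikitech instructions); the hostnames are placeholders, the service and path names assume the Debian Buster Puppet 5 packages, and newer Puppet releases replace `puppet cert sign` with `puppetserver ca sign`.

    # on the client (which may be the new puppetmaster itself, acting as its own client):
    sudo systemctl stop puppet                 # pause the agent while re-keying
    sudo rm -rf /var/lib/puppet/ssl            # drop the cert signed by the old/central master
    sudo puppet agent --test --server puppetmaster.myproject.eqiad1.wikimedia.cloud   # placeholder hostname
    # on the new master, if the certificate request is not auto-signed:
    sudo puppet cert sign client.myproject.eqiad1.wikimedia.cloud                     # placeholder hostname
    # then re-run the agent on the client and start the puppet service again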
[21:50:39] so my Cloud VPS project was made before eqiad1 existed; should I only see 1 DNS zone in Horizon, that being for projectname.wmflabs.org., or should there also be another zone with the updated URLs?
[21:51:40] I'm just trying to figure out why this new puppetmaster instance keeps returning a 403
[21:58:48] Zppix: your VMs are going to talk to the master on an internal domain, most likely .eqiad1.wikimedia.cloud
[21:59:48] andrewbogott: ok, but I've not even been able to finish setting up my new master yet, because it gets a 403 when trying to run puppet agent
[22:00:09] is the master trying to use itself as a master?
[22:00:18] if so, you're going to have a chicken-and-egg issue
[22:00:20] yes, but the old one did just fine
[22:01:36] so should it not be set as its own master?
[22:05:41] You'll be much happier if you have your puppetmaster pointed at the standard puppetmaster.cloudinfra.wmflabs.org
[22:05:51] ok
[22:06:00] otherwise you'll get into conditions like the one you're in now (where the master is trying to set itself up using a master that is not yet set up)
[22:06:24] I'm just confused about how it managed to work fine before on the old master
[22:12:29] anyway, thanks
[22:45:01] My instance wikipathways-dev.wikipathways.eqiad1.wikimedia.cloud has image name "debian-10.0-buster (deprecated 2021-02-22)". Does that "deprecated" mean I should do something to update/upgrade?
[22:55:51] ariutta: not really, no. It mostly means that there is a newer base image available if you were to build a fresh instance. There will be lots of loud and persistent noises when you run an instance that is using an OS that we are truly deprecating. :)
[22:56:51] haha, figured that was the case.
[22:57:23] We do things like https://os-deprecation.toolforge.org/ to reach out to folks running an actually deprecated OS
[22:57:51] and that comes with Phabricator task nags and mailing list announcements, etc.
[22:59:28] BTW, thanks to everyone for the help getting up to speed w/ Cloud VPS! Our data site is now working: https://wikipathways-data.wmcloud.org
[23:03:15] awesome!
[23:36:27] public service announcement: "/mode +R" will help prevent you from receiving Freenode PM spam. It makes your account require that anyone attempting to PM you be authenticated with Freenode, which in turn makes it much harder for spam bots to pester you.
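One way to act on the advice at [22:05:41] is to point the half-configured master's own agent back at the shared cloudinfra puppetmaster while bootstrapping. This is a minimal sketch using `puppet config` directly; on Cloud VPS the master an instance uses is normally managed through the instance's puppet/hiera configuration in Horizon (the hiera approach mentioned in the earlier puppetmaster discussion), so treat the commands as illustrative rather than the documented procedure.

    # check which master this agent currently talks to
    sudo puppet config print server --section agent
    # point it at the shared cloudinfra master until the new local master is working
    sudo puppet config set server puppetmaster.cloudinfra.wmflabs.org --section agent
    sudo puppet agent --test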