[08:15:04] morning. the RAM usage on cloudcephosd1006 did reset yesterday at 18:00 UTC, but I think it was due to dcar.o tweaking the setting
[08:15:12] it seems to be growing again, I'll keep an eye on it today
[14:30:41] andrewbogott: do you have any explanation for the quota behavior in T399746?
[14:30:41] T399746: Request to increase Object Storage capacity - Wikilink project - https://phabricator.wikimedia.org/T399746
[14:31:02] user_quota.max_objects is set, but it seems to apply per-bucket
[14:31:08] I would expect it to apply per-project
[14:32:00] I can look at the docs! Just to make sure I understand... the behavior you're seeing means that users can create /more/ objects than they expect, right?
[14:32:28] yes, not enough for their needs... but more than I would expect
[14:32:42] I looked at the docs and I was still confused
[14:33:04] unless the quota applies "eventually", and they managed to create a few thousand objects before the quota kicked in, but that seems weird
[14:33:13] Are you setting the quotas with a cookbook, and does the cookbook specify --quota-scope?
[14:33:25] the current quota was set by somebody else months ago...
[14:33:53] ok
[14:33:54] but there is a SAL entry from then: "--quota-scope=user"
[14:34:01] https://phabricator.wikimedia.org/T392870#10806765
[14:34:04] I definitely see what you mean about it not doing what you'd expect
[14:34:29] All I can think is that having an unlimited per-bucket quota somehow overrides the user quota
[14:34:33] but then what even is a user quota?
[14:34:44] if you're also confused, maybe I can try reproducing it on another bucket with a smaller quota
[14:35:01] I think there's nothing for it but for us to write a script to upload a bunch of files and then experiment and see if we can /ever/ hit an object limit.
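The probing script proposed at 14:35 could look something like the minimal sketch below. It uploads throwaway objects until one is rejected and reports how many succeeded, which would reveal the effective object limit (if any). The transport is injected as a plain callable so the loop is testable offline; in real use it would wrap a boto3 `put_object` call against the radosgw endpoint. The fake transport and its 20k cutoff are illustrative stand-ins, mirroring the currently configured `max_objects` discussed above.

```python
def probe_object_limit(put_object, max_attempts):
    """Upload objects until one is rejected; return how many succeeded.

    `put_object` takes an object key and raises on rejection
    (radosgw surfaces a quota hit as a QuotaExceeded-style error).
    """
    uploaded = 0
    for i in range(max_attempts):
        try:
            put_object(f"quota-probe-{i:07d}")
        except Exception:
            break
        uploaded += 1
    return uploaded


# Illustrative fake transport enforcing a 20k per-bucket limit,
# standing in for e.g. s3.put_object(Bucket=..., Key=..., Body=b"").
fake_state = {"count": 0}

def fake_put(key):
    fake_state["count"] += 1
    if fake_state["count"] > 20000:
        raise RuntimeError("QuotaExceeded")

print(probe_object_limit(fake_put, 25000))  # → 20000 if the quota is enforced
```

If the real run sails past the configured `max_objects` without an error, that would confirm the suspicion that the user-scope quota is not being enforced per-project.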
[14:35:05] yeah
[14:35:11] I assumed "user quota" meant "project quota" given that the user name is "wikilink$wikilink"
[14:35:19] yes, that is also what I assumed
[14:35:30] I think that the docs have failed and it's time for empiricism.
[14:35:45] agreed :D I was hoping you had a magic answer and/or had run into this before
[14:36:08] I haven't, although I'm pretty sure that I have at least had users run out of objects? So there must be /some/ enforcement someplace
[14:36:27] ok, I'll see if empiricism gives me any answers :)
[14:36:27] You are right that in this context a rados 'user' should == an openstack 'project'
[14:36:37] thanks, I am now curious
[14:36:58] in other news, I'm crossing my fingers that dcar.o's change from yesterday to the ceph memory limits might be helping
[14:37:14] we'll find out in a couple of hours, see the graph at https://grafana.wikimedia.org/goto/K9_YyJUNR?orgId=1
[14:38:27] if not, I expect cloudcephosd1006 (the bookworm one) might crash later today, are you around if it needs a restart?
[14:42:31] I'll be around.
[14:43:04] great. if it _does_ crash it might be worth downgrading it back to bullseye, or taking it out of the cluster... otherwise it might need further reboots during the weekend
[14:43:08] Even if the change manages the memory overrun, that seems more like managing a symptom... although maybe that's enough.
[14:43:42] yeah. Although even if it behaves very badly it shouldn't affect cluster behavior, since it's just the one host.
[14:43:49] true that
[14:43:58] ^^ TEMPTING FATE ^^
[14:44:02] :D
[14:44:54] * andrewbogott wonders what the online chat equivalent is of knocking on wood
[14:46:33] I still don't like the idea of one host swapping and/or doing crazy things for the entire weekend :)
[14:47:15] yeah, me neither. Especially if SSDs are going to fail at the same time.
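A first empirical step, before any upload script, would be to check which quota scopes are actually enabled for the `wikilink$wikilink` user. The sketch below parses the JSON that `radosgw-admin user info --uid=...` emits and summarises both quota blocks; the `user_quota`/`bucket_quota` field names follow the radosgw-admin output format, but treat the exact shape as an assumption to verify against a real cluster, and the sample values here are illustrative.

```python
import json

def summarize_quotas(user_info_json):
    """Summarise the user- and bucket-scope quotas from radosgw-admin user info.

    In radosgw quota output, max_objects == -1 means unlimited; a quota
    block only takes effect when its "enabled" flag is true.
    """
    info = json.loads(user_info_json)
    summary = {}
    for scope in ("user_quota", "bucket_quota"):
        quota = info.get(scope, {})
        summary[scope] = {
            "enabled": quota.get("enabled", False),
            "max_objects": quota.get("max_objects", -1),
        }
    return summary


# Illustrative sample matching the situation described above:
# user-scope quota set to 20k objects, bucket-scope quota unset.
sample = '''{
  "user_quota":   {"enabled": true,  "max_size": -1, "max_objects": 20000},
  "bucket_quota": {"enabled": false, "max_size": -1, "max_objects": -1}
}'''
print(summarize_quotas(sample))
```

If the output showed `user_quota` enabled but enforcement still not kicking in per-project, that would narrow the mystery to radosgw's enforcement behaviour rather than a misconfigured scope.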
[14:47:47] the one that failed was on a host that was not having issues though (which adds to the mystery)
[16:50:12] andrewbogott: I was busy working on T396724, so I could not do any empirical testing of the object quotas
[16:50:13] T396724: [trove] Disk full for DBapp instance in glamwikidashboard project - https://phabricator.wikimedia.org/T396724
[16:50:18] I think they can wait until next week
[16:51:02] yep, the user isn't going to get blocked by having too much room
[16:51:25] well, it looks like they are blocked, maybe I could just bump it to 100k as they asked, and then we do some checks next week
[16:52:27] they want 100k objects in a SINGLE bucket, and my understanding is that right now they can only put 20k per bucket (to be verified)
[16:54:11] if you think it's fine to bump the current quota from 20k to 100k, leave a +1 on the request and I'll run the command. T399746
[16:54:12] T399746: Request to increase Object Storage capacity - Wikilink project - https://phabricator.wikimedia.org/T399746
[16:55:19] I'm not sure they actually tried to upload more than 20k in a single bucket, maybe you're right and the quotas are simply not working at all
[16:56:36] I think it's OK to bump it up
[17:04:56] can you leave a +1 on the task, per the standard quota request procedure?
[17:10:19] reading again, I don't think they're //immediately// blocked, just worried about long-term growth
[17:10:36] I will go offline and think about it on Monday :)
[17:12:26] have a good weekend!
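The quota bump discussed at 16:54 (20k to 100k objects) would, in plain radosgw-admin terms, look roughly like the commands below. This is a sketch of the underlying CLI, not the WMF cookbook the team would actually run; the `--quota-scope=user` flag matches the SAL entry quoted earlier, and the uid comes from the user name mentioned in the log. Verify against the cookbook before running anything.

```
# Set the user-scope (per-project, in this deployment) object quota to 100k...
radosgw-admin quota set --quota-scope=user --uid='wikilink$wikilink' --max-objects=100000

# ...and make sure the quota is actually enabled, since a disabled quota
# block is ignored regardless of its max_objects value.
radosgw-admin quota enable --quota-scope=user --uid='wikilink$wikilink'
```

Given the open question about whether user-scope quotas are enforced at all here, re-running `radosgw-admin user info --uid='wikilink$wikilink'` afterwards to confirm both the new limit and the enabled flag would be a cheap sanity check.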