[08:15:04] morning. the RAM usage on cloudcephosd1006 did reset yesterday at 18:00 UTC, but I think it was due to dcar.o tweaking the setting
[08:15:12] it seems to be growing again, I'll keep an eye on it today
[14:30:41] andrewbogott: do you have any explanation for the quota behavior in T399746?
[14:30:41] T399746: Request to increase Object Storage capacity - Wikilink project - https://phabricator.wikimedia.org/T399746
[14:31:02] user_quota.max_objects is set, but it seems to apply per-bucket
[14:31:08] I would expect it to apply per-project
[14:32:00] I can look at the docs! Just to make sure I understand... the behavior you're seeing means that users can create /more/ objects than they expect, right?
[14:32:28] yes, not enough for their needs... but more than I would expect
[14:32:42] I looked at the docs and I was still confused
[14:33:04] unless the quota applies "eventually", and they managed to create a few thousand objects before the quota kicked in, but that seems weird
[14:33:13] Are you setting the quotas with a cookbook, and does the cookbook specify --quota-scope?
[14:33:25] the current quota was set by somebody else months ago...
[14:33:53] ok
[14:33:54] but there is a SAL entry from then: "--quota-scope=user"
[14:34:01] https://phabricator.wikimedia.org/T392870#10806765
[14:34:04] I definitely see what you mean about it not doing what you'd expect
[14:34:29] All I can think is that having an unlimited per-bucket quota somehow overrides the user quota
[14:34:33] but then what even is a user quota?
[14:34:44] if you're also confused, maybe I can try reproducing it on another bucket with a smaller quota
[14:35:01] I think there's nothing for it but for us to write a script to upload a bunch of files and then experiment and see if we can /ever/ hit an object limit.
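The probing script proposed at 14:35 could look something like the minimal sketch below. It uploads throwaway objects until one is rejected and reports how many succeeded, which would reveal the effective object limit (if any). The transport is injected as a plain callable so the loop is testable offline; in real use it would wrap a boto3 `put_object` call against the radosgw endpoint. The fake transport and its 20k cutoff are illustrative stand-ins, mirroring the currently configured `max_objects` discussed above.

```python
def probe_object_limit(put_object, max_attempts):
    """Upload objects until one is rejected; return how many succeeded.

    `put_object` takes an object key and raises on rejection
    (radosgw surfaces a quota hit as a QuotaExceeded-style error).
    """
    uploaded = 0
    for i in range(max_attempts):
        try:
            put_object(f"quota-probe-{i:07d}")
        except Exception:
            break
        uploaded += 1
    return uploaded


# Illustrative fake transport enforcing a 20k per-bucket limit,
# standing in for e.g. s3.put_object(Bucket=..., Key=..., Body=b"").
fake_state = {"count": 0}

def fake_put(key):
    fake_state["count"] += 1
    if fake_state["count"] > 20000:
        raise RuntimeError("QuotaExceeded")

print(probe_object_limit(fake_put, 25000))  # → 20000 if the quota is enforced
```

If the real run sails past the configured `max_objects` without an error, that would confirm the suspicion that the user-scope quota is not being enforced per-project.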
[14:35:05] yeah
[14:35:11] I assumed "user quota" meant "project quota" given that the user name is "wikilink$wikilink"
[14:35:19] yes, that is also what I assumed
[14:35:30] I think that the docs have failed and it's time for empiricism.
[14:35:45] agreed :D I was hoping you had a magic answer and/or had run into this before
[14:36:08] I haven't, although I'm pretty sure that I have at least had users run out of objects? So there must be /some/ enforcement someplace
[14:36:27] ok, I'll see if empiricism gives me any answers :)
[14:36:27] You are right that in this context a rados 'user' should == an openstack 'project'
[14:36:37] thanks, I am now curious
[14:36:58] in other news, I'm crossing my fingers that dcar.o's change from yesterday to the ceph memory limits might be helping
[14:37:14] we'll find out in a couple of hours, see the graph at https://grafana.wikimedia.org/goto/K9_YyJUNR?orgId=1
[14:38:27] if not, I expect cloudcephosd1006 (the bookworm one) might crash later today, are you around if it needs a restart?
[14:42:31] I'll be around.
[14:43:04] great. if it _does_ crash it might be worth downgrading it back to bullseye, or taking it out of the cluster... otherwise it might need further reboots during the weekend
[14:43:08] Even if the change manages the memory overrun, that seems more like managing a symptom... although maybe that's enough.
[14:43:42] yeah. Although even if it behaves very badly it shouldn't affect cluster behavior, since it's just the one host.
[14:43:49] true that
[14:43:58] ^^ TEMPTING FATE ^^
[14:44:02] :D
[14:44:54] * andrewbogott wonders what the online chat equivalent is of knocking on wood
[14:46:33] I still don't like the idea of one host swapping and/or doing crazy things for the entire weekend :)
[14:47:15] yeah, me neither. Especially if SSDs are going to fail at the same time.
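A first empirical step, before any upload script, would be to check which quota scopes are actually enabled for the `wikilink$wikilink` user. The sketch below parses the JSON that `radosgw-admin user info --uid=...` emits and summarises both quota blocks; the `user_quota`/`bucket_quota` field names follow the radosgw-admin output format, but treat the exact shape as an assumption to verify against a real cluster, and the sample values here are illustrative.

```python
import json

def summarize_quotas(user_info_json):
    """Summarise the user- and bucket-scope quotas from radosgw-admin user info.

    In radosgw quota output, max_objects == -1 means unlimited; a quota
    block only takes effect when its "enabled" flag is true.
    """
    info = json.loads(user_info_json)
    summary = {}
    for scope in ("user_quota", "bucket_quota"):
        quota = info.get(scope, {})
        summary[scope] = {
            "enabled": quota.get("enabled", False),
            "max_objects": quota.get("max_objects", -1),
        }
    return summary


# Illustrative sample matching the situation described above:
# user-scope quota set to 20k objects, bucket-scope quota unset.
sample = '''{
  "user_quota":   {"enabled": true,  "max_size": -1, "max_objects": 20000},
  "bucket_quota": {"enabled": false, "max_size": -1, "max_objects": -1}
}'''
print(summarize_quotas(sample))
```

If the output showed `user_quota` enabled but enforcement still not kicking in per-project, that would narrow the mystery to radosgw's enforcement behaviour rather than a misconfigured scope.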
[14:47:47] the one that failed was on a host that was not having issues though (which adds to the mystery)
[16:50:12] andrewbogott: I was busy working on T396724, so I could not do any empirical testing of the object quotas
[16:50:13] T396724: [trove] Disk full for DBapp instance in glamwikidashboard project - https://phabricator.wikimedia.org/T396724
[16:50:18] I think they can wait until next week
[16:51:02] yep, the user isn't going to get blocked by having too much room
[16:51:25] well, it looks like they are blocked, maybe I could just bump it to 100k as they asked, and then we do some checks next week
[16:52:27] they want 100k objects in a SINGLE bucket, and my understanding is that right now they can only put 20k per bucket (to be verified)
[16:54:11] if you think it's fine to bump the current quota from 20k to 100k, leave a +1 on the request and I'll run the command. T399746
[16:54:12] T399746: Request to increase Object Storage capacity - Wikilink project - https://phabricator.wikimedia.org/T399746
[16:55:19] I'm not sure they actually tried to upload more than 20k in a single bucket, maybe you're right and the quotas are simply not working at all
[16:56:36] I think it's OK to bump it up
[17:04:56] can you leave a +1 on the task, per the standard quota request procedure?
[17:10:19] reading again, I don't think they're //immediately// blocked, just worried about long-term growth
[17:10:36] I will go offline and think about it on Monday :)
[17:12:26] have a good weekend!
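The quota bump discussed at 16:54 (20k to 100k objects) would, in plain radosgw-admin terms, look roughly like the commands below. This is a sketch of the underlying CLI, not the WMF cookbook the team would actually run; the `--quota-scope=user` flag matches the SAL entry quoted earlier, and the uid comes from the user name mentioned in the log. Verify against the cookbook before running anything.

```
# Set the user-scope (per-project, in this deployment) object quota to 100k...
radosgw-admin quota set --quota-scope=user --uid='wikilink$wikilink' --max-objects=100000

# ...and make sure the quota is actually enabled, since a disabled quota
# block is ignored regardless of its max_objects value.
radosgw-admin quota enable --quota-scope=user --uid='wikilink$wikilink'
```

Given the open question about whether user-scope quotas are enforced at all here, re-running `radosgw-admin user info --uid='wikilink$wikilink'` afterwards to confirm both the new limit and the enabled flag would be a cheap sanity check.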