[05:37:41] bd808: can I use https://csp-report.toolforge.org/collect for Cloud VPS projects?
[10:03:15] ryankemper: excellent! I assume you will handle all the customer-facing communication part? Thanks
[16:06:51] legoktm: Yes and no. It would need a small amount of work to do so safely. The collector looks at the document-uri from the report to determine which tool to assign the report to. Today it would cause naming collisions to send in a report from .wmcloud.org if it was also a tool name.
[18:49:04] bd808 hi there. Is there any update on the new provisioning mechanism you guys are implementing for adjusting VM disk size allocations after they have been provisioned?
[18:51:25] CP678: yes! https://lists.wikimedia.org/pipermail/cloud-announce/2021-February/000366.html is the teaser.
[18:52:20] That feature is ready now and you could use it to separate your database storage into a volume that can be moved from one instance to another as you upgrade things.
[18:52:38] I'm drooling from excitement.
[18:54:25] a.ndrewbogott is working on a follow-up announcement too. We plan to make the base disk size 20GiB for all flavors, with additional disk space being these quota-limited Cinder volumes. We will finally be able to reason about disk usage separately from CPU/RAM.
[18:57:49] I love it. My 300GB disk is running out, and I garbage collected everything I could collect.
[19:11:47] bd808: https://phabricator.wikimedia.org/T277333
[19:28:34] !log tools.lexeme-forms deployed aa07bef3bd (i18n update) – also, previous SAL message mentioned 712d262475 but that’s still in git log @..@{u}, so I think I forgot to rebase last time
[19:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[19:47:45] bd808 I assume I need to wait until Wednesday before I see action on the request?
[19:49:03] CP678: we are not going to give you 1TB
[19:49:17] How much can I request then?
[19:50:47] bd808 ^
[19:50:52] what is your current disk usage for just the database storage? Cinder volumes are resizable, so I think you should start at something like 110-125% of your actual usage today.
[19:51:04] asking for 310% of your usage is too much
[19:51:07] Current usage is 270 GB
[19:53:17] Disk cap is 300 GB
[19:53:46] 232G is your current mysql data usage.
[19:53:54] With the bot being deployed to new wikis, this will fill fairly quickly
[19:54:32] CP678, as bd808 mentions, since the volume can be resized it would be helpful to have something more reflective of current needs
[19:54:33] /srv reports 264
[19:54:45] how quickly? You've been running this thing long enough that you should be able to make pretty good estimates based on the size of each wiki
[19:55:02] `du -sh /srv/mysql/data` 232G /srv/mysql/data
[19:55:46] bd808 a few months, best estimate.
[19:56:01] CP678: show me the math please
[19:58:39] CP678: this may sound harsh, but you cried wolf on filling the 300GiB multiple times, so I have developed a lack of trust in your seat-of-the-pants estimates
[19:59:53] That's unfair. The DB was full in those instances, and I was able to mitigate the problem.
[20:00:12] But now we are beginning to hit the limit for real.
[20:01:32] If I had not pushed back on your request and actually looked at your database myself, would you have "mitigated" the problems or just kept storing the junk?
[20:03:01] bd808: hard to say, but your pushback proved to be an educational moment for me.
[20:03:14] CP678: you probably should look at T269914 more seriously too.
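As a follow-up to the `du -sh` output and the "show me the math" request above: one way to base growth estimates on data rather than gut feel is to break the 232G down per table. The sketch below is a minimal example, assuming the local MySQL instance is reachable with the `mysql` client; `cyberbot` is a placeholder schema name, since the real one is not given in the log.

```bash
# Minimal sketch: per-table on-disk size, largest first.
# "cyberbot" is a placeholder schema name, not the real one from the log.
mysql --table -e "
  SELECT table_name,
         ROUND((data_length + index_length) / POW(1024, 3), 1) AS size_gib
  FROM information_schema.tables
  WHERE table_schema = 'cyberbot'
  ORDER BY data_length + index_length DESC;"
```

Running this periodically and comparing the per-table numbers against the wikis added in between is one way to produce the kind of estimate being asked for here.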
[20:03:14] T269914: IAbot sending a huge volume of action=raw requests (HTTP 415 errors) - https://phabricator.wikimedia.org/T269914
[20:05:00] bd808: I did.
[20:11:37] bd808: back to the issue at hand, it's really hard to gauge how much a new wiki adds, as they differ wildly in size. Given that we can resize at any time, is it unreasonable to ask to bump it to 350 GB for now?
[20:12:06] that's a much more reasonable request
[20:12:27] :-)
[20:12:43] And BTW, I never INTENTIONALLY cry wolf.
[20:12:49] You might want to think about externallinks_scan_log too and how much of that you really need to keep.
[20:13:35] I didn't anticipate it would grow as rapidly as it did, but it should be retained as well as possible for error-diagnostic reasons.
[20:14:19] I will probably need to make it a rolling log.
[20:15:34] arturo: yup I’ve got all the comms for the wdqs-cloudvirt1001 restart (and verifying service health after the restart etc)
[20:15:54] bd808: I have updated the ticket for the quota increase.
[20:18:01] CP678: in 3 months externallinks_scan_log has become larger than externallinks_global. It is 96G of your disk usage
[20:18:21] this looks like another foot gun
[20:18:32] Not necessarily.
[20:18:35] !log tools.lexeme-forms deployed 9500beeed4 (three new translations) – should be a no-op but I didn’t want to leave it lying around without a webservice restart either
[20:18:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[20:18:42] CP678: I replied on the 415 task
[20:19:12] This holds information useful in debugging false-positive reports when the bot fails to properly assess URLs.
[20:19:50] I'm sure it does, but it also is what is currently eating up all of your disk space
[20:20:02] s/all/one third/
[20:20:14] Without it, debugging transient/random issues is a lot harder. But I will implement a means to purge older data from the set to keep the size in check.
[20:21:12] legoktm: looking again. I changed it yesterday, so maybe that wasn't it. :-(
[20:21:52] I was holding off commenting to give it time to see if it resolved.
[20:23:49] bd808: I was hoping that it would retain values going back a year before I would have to start cleaning it up, but you are right, that is a bit large at just 3 months. Maybe I will have to limit it to 1 month of logging and hope FP reports appear within that time.
[20:33:01] legoktm: Okay, my bad, it wasn't commented, it was starred (*). I commented it out (again). Let's see if this fixes it, because it's a broken task at the moment and not vital. I thought it was disabled long ago.
[21:11:06] ack, thanks
[23:05:13] * bstorm weekend
[23:05:28] oh I usually do that in another channel, meh. Whichever
[23:06:06] Also, it is a lie because there are sick grid queues
[23:13:53] !log tools cleared error state for all grid queues
[23:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
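On the "rolling log" idea for externallinks_scan_log discussed above (keeping roughly one month of history and purging the rest): a minimal sketch of what such a purge could look like is shown below. It assumes the table has a timestamp column; `scan_date` and the `cyberbot` schema are placeholder names, since the real schema is not given in the log.

```bash
# Minimal sketch: keep roughly one month of scan-log history.
# "cyberbot" and "scan_date" are placeholder names; the real schema and
# column are not shown in the log. In practice you would likely delete in
# smaller batches (DELETE ... LIMIT in a loop) to avoid long locks on a
# table this large, and run the purge from cron.
mysql -e "
  DELETE FROM cyberbot.externallinks_scan_log
  WHERE scan_date < NOW() - INTERVAL 30 DAY;"
```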