I was curious why some instances were consuming far more disk space than they should, given that usage ought to be roughly a factor of the number of users and the number of reports they submit per day.
I navigated to the couchdb data folder and found very large db.couch.compact.meta and db.couch.compact.data files, even though no compaction was running. There isn't much documentation on what these files do, but from their naming they appear to be working files used during compaction.
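A quick way to spot such leftovers (a minimal sketch in Python, assuming a data directory of /opt/couchdb/data and an arbitrary 100 MB threshold; both are placeholders, adjust for your deployment):

```python
import os

DATA_DIR = "/opt/couchdb/data"    # assumed path; check database_dir in your CouchDB config
THRESHOLD = 100 * 1024 * 1024     # flag anything over ~100 MB (arbitrary cut-off)

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        if ".compact" not in name:   # matches both .compact.data and .compact.meta
            continue
        path = os.path.join(root, name)
        size = os.path.getsize(path)
        if size >= THRESHOLD:
            print(f"{size / 1024**2:8.1f} MB  {path}")
```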
Questions:
What would happen if we delete these files?
Why have they grown so big?
Why are they lingering after compaction has completed?
@derick Can you check the timestamps of the data files (e.g. medic.1682956796)? If the ones from .compact are older than the actual data files, it will help answer some of your other questions.
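A minimal sketch of that check in Python; the shard directory below is a placeholder, point it at wherever medic.1682956796.couch lives on your instance:

```python
import glob
import os
from datetime import datetime

# Assumed path; use the shard range directory that holds medic.1682956796.couch
SHARD_DIR = "/opt/couchdb/data/shards/00000000-1fffffff"

def mtime(path):
    return datetime.fromtimestamp(os.path.getmtime(path))

for data_file in glob.glob(os.path.join(SHARD_DIR, "*.couch")):
    for suffix in (".compact.data", ".compact.meta"):
        compact_file = data_file + suffix
        if not os.path.exists(compact_file):
            continue
        relation = "OLDER than" if mtime(compact_file) < mtime(data_file) else "newer than"
        print(f"{os.path.basename(compact_file)}: {mtime(compact_file)} "
              f"({relation} {os.path.basename(data_file)}, {mtime(data_file)})")
```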
What would happen if we delete these files?
Compaction should restart
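If the auto-compaction daemon doesn't pick it up again on its own, compaction can also be kicked off explicitly through CouchDB's standard _compact endpoint. A rough sketch; the URL, credentials and database name are placeholders for your instance:

```python
import requests

COUCH = "http://localhost:5984"   # placeholder; use your instance URL
AUTH = ("admin", "password")      # placeholder admin credentials
DB = "medic"

# POST /{db}/_compact needs admin rights and a JSON content type
resp = requests.post(f"{COUCH}/{DB}/_compact",
                     auth=AUTH,
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
print(resp.json())  # {'ok': True} means the compaction request was accepted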
Why have they grown so big?
Why are they lingering after compaction has completed?
From your investigation, it points to a stuck or stale compaction process, especially if nothing is present in _active_tasks and the timestamps of the files have moved on.
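To confirm that, something like the following (again a sketch; URL and credentials are placeholders) filters /_active_tasks for compaction jobs:

```python
import requests

COUCH = "http://localhost:5984"   # placeholder
AUTH = ("admin", "password")      # placeholder

tasks = requests.get(f"{COUCH}/_active_tasks", auth=AUTH).json()
compactions = [t for t in tasks
               if t.get("type") in ("database_compaction", "view_compaction")]

if not compactions:
    print("No compaction tasks running; the leftover .compact files are likely stale.")
for t in compactions:
    target = t.get("database") or t.get("design_document")
    print(f"{t['type']} on {target}: {t.get('progress')}% done")
```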
Do we have a test instance of one of these projects with low doc/user counts but high disk usage, or can we get a clone? I would love to test your idea of deleting the files to see whether compaction restarts, and what impact those files have on CouchDB and our actual data. This would also help us determine whether the files grow back to the same size, or get cleaned up, once we can verify that compaction finishes.
Can you check the timestamps of the data files (e.g. medic.1682956796)? If the ones from .compact are older than the actual data files, it will help answer some of your other questions.
On one instance we have:
On another:
As an aside: I’m surprised we have similar shard ids on different instances
Do we have a test instance of one of these projects with low doc / users but high disk usage or can we get a clone?
I’ll check with the team and circle back cc @Karim_K_Kanji