Hi @sablearjun-ola - welcome to the forum!
I’m sorry to hear you’re having issues with your CHT instance. Can you clarify what version 5.0.0-custom-20260119 is? Is it a custom-built version of the CHT? If so, can you try running the official 5.0.1 release of CHT?
However, I’m not sure the version really matters here - from what you’ve described, I think you’re looking at data loss :(
To answer your questions:
Is there any supported way to repair a corrupted shard in CouchDB 3.5.0?
If a shard is corrupt and reporting read_beyond_eof errors, the supported way is to restore from backup.
Is rebuilding from analytics (v1.couchdb) considered safe in CHT disaster recovery?
Data in PostgreSQL populated from either couch2pg or CHT Sync can continue to be used when the upstream CHT instance is offline due to data corruption. However, you cannot rebuild the data in PostgreSQL without restoring the CHT instance.
Additionally, the data in PostgreSQL involves aggregation and consolidation. As such, there is intentional data loss, and CouchDB data cannot be recreated from couch2pg or CHT Sync data in PostgreSQL.
Are there recommended hardening steps to prevent shard corruption in Docker deployments?
Running production CHT instances in a data center or with a cloud provider is likely the best way. These have high-availability hardware with multiple levels of redundant power supply, which avoids hard shutdowns of VMs or bare-metal servers.
For more budget-constrained instances, running on a dedicated server that is used only for production services and is protected by a UPS can help.
Should we upgrade CouchDB before re-importing data?
If you have a valid backup of the corrupt shard that you’re trying to restore, do not change the version of CouchDB.
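As a rough illustration of that check, here is a minimal sketch that compares the version a running CouchDB reports at its root endpoint (`GET /` returns a JSON document with a `version` field) against the version the backup was taken from. The function names are hypothetical, and the base URL and how you record the backup's CouchDB version are assumptions you would need to adapt to your own deployment.

```python
import json
from urllib.request import urlopen


def reported_version(root_response: dict) -> str:
    """Return the version string from CouchDB's root (GET /) response."""
    return root_response["version"]


def safe_to_restore(root_response: dict, backup_version: str) -> bool:
    """True only when the running CouchDB matches the backup's CouchDB version."""
    return reported_version(root_response) == backup_version


def fetch_root(base_url: str) -> dict:
    """Fetch GET / from a CouchDB node.

    base_url is a placeholder (hypothetical) that you must supply for your
    own deployment; this function is not called automatically.
    """
    with urlopen(base_url) as resp:
        return json.load(resp)
```

For example, `safe_to_restore(fetch_root(your_couchdb_url), "3.5.0")` would only return true if the node you are about to restore into is also running 3.5.0.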
cc @diana @binod - we had a private chat about this topic. Adding you in case I missed anything or made any mistakes!