We have a node running CHT 4.6.0, and since December we have been experiencing slow syncing and replication. Our specs are below.
Server Specs:
Cores: 8
RAM: 16GB
OS: Ubuntu 22.04 LTS
Watchdog Stats (last 90 days):
Monthly Active Users: 769
Replication 50th Percentile: 21.9 s 43.4 mins
Replication 90th Percentile: 41.1 s 56.7 mins
Replication Max: 3 mins 1 hour
CouchDB Stats:
Mode: Single node
Doc count: 8800965
What We Have Tried So Far:
Upgraded our infrastructure to its current state
Upgraded CHT, but due to bugs with the upgrade service we could only upgrade from v4.3.1 to v4.6.0
Adjusted replication depth for some of our user groups to reduce load on the server
Attempted to force an upgrade to v4.11.0 manually through Docker, to pick up some of the changes to how replication is done, but it would just hang when running migrations
I think your server specs are quite low for an instance that serves almost 800 users. I would first recommend trying to upgrade your instance further.
I think @binod's question is very relevant: the number of docs (including purged docs) has a significant impact on replication times.
Thirdly, can you please share which bugs in the upgrade service you are referring to? Have you tried upgrading the upgrade service first?
About the upgrade service, we encountered the view indexing and CouchDB crashing bugs. We tried restarting the affected services each time, but the upgrade would never finish. We referred to the docs here: Troubleshooting 4.x upgrades | Community Health Toolkit.
@danielmwakanema - thanks for sharing your issue! As @diana mentioned, before following any of the troubleshooting steps - be sure you upgrade your upgrade service. Assuming you’re using docker to host, do this by pulling the image with:
You can check before and after to see if your image is updated with a docker images call - it should have been created ~4 mo ago:
docker images public.ecr.aws/s5s3h4s7/cht-upgrade-service
REPOSITORY TAG IMAGE ID CREATED SIZE
public.ecr.aws/s5s3h4s7/cht-upgrade-service latest bf1133f540ed 4 months ago 396MB
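The before-and-after check described above could be sketched roughly as follows. This assumes Docker hosting and the `latest` tag shown in the listing; `image_summary` is a hypothetical helper name, and the `docker pull` line is an assumption based on the image name above, not a quote from the original post.

```shell
#!/usr/bin/env bash
# Image name taken from the listing above; the `latest` tag is assumed.
IMAGE="public.ecr.aws/s5s3h4s7/cht-upgrade-service"

# Hypothetical helper: print the local image ID and age, or "not present"
# if the image (or Docker itself) is not available on this host.
image_summary() {
  local out
  out="$(docker images --format '{{.ID}} {{.CreatedSince}}' "$IMAGE" 2>/dev/null || true)"
  echo "${out:-not present}"
}

# Usage (run on the Docker host):
#   image_summary                  # before
#   docker pull "$IMAGE:latest"    # assumed pull command
#   image_summary                  # after: the ID changes if a newer image was pulled
```

If the ID is unchanged after the pull, the local copy was probably already current.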
Can you please be a little more verbose? We need a bit more information to help you.
we could only upgrade up to v4.6.0, from v4.3.1
So is this right?
You upgraded to 4.6.0
The upgrade to 4.6.0 completed (?)
You had some crashes
After some time of running 4.6.0, you downgraded back to 4.3.1?
Can you provide some details about the crashes? Logs? At least which service is crashing? Why exactly do you think it was “during indexing” (usually this completes before the upgrade)? Did you check disk space, which is the usual cause of crashes during indexing?
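A quick way to answer the disk space question is a small check like the one below. It is only a sketch: `disk_ok` is a hypothetical helper, the 80% threshold is an arbitrary assumption, and the path to pass in is whatever directory holds your CouchDB data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding path $1
# is below $2 percent used (default threshold 80, chosen arbitrarily).
disk_ok() {
  local path="$1" limit="${2:-80}" used
  # `df -P` prints one header line and one data line; column 5 is "NN%".
  used="$(df -P "$path" | awk 'NR==2 { gsub("%", "", $5); print $5 }')"
  [ "$used" -lt "$limit" ]
}

# Usage (the data path and container name are placeholders):
#   disk_ok /path/to/couchdb/data 80 || echo "disk is above 80% used"
#   docker logs --tail 200 <couchdb-container>   # recent logs from the crashing service
```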
@kenn, the upgrade to v4.6.0 completed. However CouchDB would crash during indexing, the indexing that happens after a restart, with the following error: .
We know it was during indexing because there were some indexing tasks running at those times. We verified through Fauxton. We also verified that we have enough disk space.
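The same check that Fauxton shows can also be done from the command line via CouchDB's `_active_tasks` endpoint, which is handy for capturing evidence alongside a crash. A minimal sketch, assuming a `COUCH_URL` variable (hypothetical) that holds the server URL with admin credentials; `count_indexers` is a hypothetical helper that just counts tasks of type `indexer` in the JSON it reads from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count view-indexing tasks in _active_tasks JSON read from stdin.
# Uses a plain text match rather than a JSON parser, so it is a rough count only.
count_indexers() {
  grep -oE '"type"[[:space:]]*:[[:space:]]*"indexer"' | wc -l | tr -d '[:space:]'
}

# Usage (COUCH_URL is an assumed variable, e.g. http://admin:password@localhost:5984):
#   curl -s "$COUCH_URL/_active_tasks" | count_indexers
```

A non-zero count just before a crash would support the claim that indexing was in progress at the time.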
For KE deployment, the most comparable instance is Isiolo with ~800 users. Server specs are 16 cores, 8GB RAM & 500 GB disk. It’s been operating smoothly so far. I recommend monitoring resource consumption periodically to identify where the bottleneck is.
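Periodic monitoring could be as simple as appending a snapshot to a log file every few minutes. The sketch below is one possible approach, not an official CHT tool: `snapshot` is a hypothetical helper, and the log path and interval in the usage note are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a timestamp, the host load average, and
# per-container CPU/memory usage (skipped quietly if Docker is unavailable).
snapshot() {
  echo "time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "load: $(uptime 2>/dev/null)"
  docker stats --no-stream \
    --format '{{.Name}} cpu={{.CPUPerc}} mem={{.MemUsage}}' 2>/dev/null || true
}

# Usage: record a snapshot every 5 minutes (path and interval are placeholders):
#   while true; do snapshot >> /var/log/cht-resources.log; sleep 300; done
```

Comparing snapshots taken during slow replication with quiet periods should show whether CPU or memory is the tighter constraint.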