Chunking Data in the CHT Android App

Hello everyone,

We’ve been encountering synchronization issues on our CHT application due to large payloads and network interruptions. Some clinics can accumulate multiple days of data (potentially reaching hundreds of thousands of characters), and if a network failure happens, the entire payload must be resent.

To mitigate this, we’re considering a chunk-based approach, where data is split into smaller pieces, so if a single chunk fails during transmission, only that portion is re-sent instead of the entire dataset. This would reduce both bandwidth usage and the likelihood of hitting payload limits.
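Conceptually, what we have in mind is something like the following (a hypothetical sketch, not CHT code; `uploadChunk` is a stand-in for whatever transport is actually used):

```javascript
// Hypothetical sketch of the chunk-based approach (not CHT code).
// Split the pending documents into fixed-size chunks so that a network
// failure only forces re-sending the failed chunk, not the whole payload.
function splitIntoChunks(docs, chunkSize) {
  const chunks = [];
  for (let i = 0; i < docs.length; i += chunkSize) {
    chunks.push(docs.slice(i, i + chunkSize));
  }
  return chunks;
}

// `uploadChunk` is an assumed async function that sends one chunk to the
// server and throws on network failure.
async function syncInChunks(docs, uploadChunk, { chunkSize = 100, maxRetries = 3 } = {}) {
  const failed = [];
  for (const chunk of splitIntoChunks(docs, chunkSize)) {
    let attempt = 0;
    for (;;) {
      try {
        await uploadChunk(chunk);
        break; // only this chunk is retried on failure, never the full dataset
      } catch (err) {
        if (++attempt > maxRetries) {
          failed.push(chunk); // give up on this chunk, continue with the rest
          break;
        }
      }
    }
  }
  return failed; // chunks that still need to be re-sent later
}
```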

  1. Does the CHT application have any built-in support for sending data in smaller chunks?
  2. If not, has anyone successfully modified or extended the sync process to enable chunking?
  3. Any best practices or code samples you can share regarding chunk-based uploads or sync retries?

We’d appreciate any insights, experiences, or suggestions you may have. Thank you in advance for your help!


@Kenyuri - thanks for your question - this sounds interesting!

Can you quantify what you mean by “issues on our CHT application due to large payloads and network interruptions”? Specifically:

  • how many documents are you trying to sync when you see this issue?
  • are individual documents too large, or is the total size of many small documents the problem?
  • how did you discover this was a problem? what were the symptoms that you saw in the CHT and with the end user? how do you know this is a large payload issue?

Helping us better understand your issue will guide the conversation toward whether this is a CHT issue or a configuration issue.

Otherwise, no work on chunking has been done that I know of.

To reduce the amount of data synced (specifically the number of documents, rather than individual large documents), be sure to read up on these topics:


Just to add a bit more context to what @mrjones has already said: under the hood, the CHT just uses PouchDB - CouchDB replication to synchronize data between offline users’ devices and the server.

The Pouch/Couch data transfer uses an extremely robust protocol that should allow for partial syncs (replicating just some of the documents as network connectivity allows). That being said, I wonder if you are actually talking about chunking data at a sub-doc level? How big are your documents normally? Is that what you mean by “hundreds of thousands of characters”?
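For reference, PouchDB’s replication API already exposes batching and retry controls, so partial syncs and resumption on network failure come largely for free. A minimal sketch (the option names are from PouchDB’s documented `replicate` API; the specific back-off values here are illustrative, not CHT defaults):

```javascript
// Illustrative PouchDB replication options controlling batching and retries.
// batch_size, batches_limit, retry, and back_off_function are documented
// PouchDB replication options; the values chosen here are examples only.
const replicationOptions = {
  batch_size: 100,    // documents per request (the CHT reportedly batches by 100)
  batches_limit: 10,  // how many batches may be processed ahead
  retry: true,        // automatically retry replication after network failures
  // Exponential back-off between retries, capped at 60 seconds:
  back_off_function: (delay) => (delay === 0 ? 1000 : Math.min(delay * 2, 60000)),
};

// Usage (assuming `localDb` and `remoteDb` are PouchDB instances):
// localDb.replicate.to(remoteDb, replicationOptions)
//   .on('paused', () => console.log('caught up, or waiting for connectivity'))
//   .on('error', (err) => console.error('replication failed', err));
```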


Uploads are already batched by document count (and the batch size is 100 documents).

We have one additional “chunking” mechanism for when the payload the client sends exceeds the maximum size the server accepts (currently 32 MB); it decreases the batch size even further: Users with a lot of large docs to replicate continuously fail · Issue #6143 · medic/cht-core · GitHub
However, this does not cover uploads that repeatedly fail on the client side for other reasons.

The document size for the CHT is generally very small, and the maximum batch size sent to the server is 100 documents. When we hit the 32 MB limit and implemented the batch reduction I linked above, the cause was users uploading large numbers of reports with image attachments - a batch of 100 reports, each with an image attachment, can easily exceed 32 MB.
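The batch-reduction idea can be sketched roughly like this (a simplified illustration of the mechanism described in the issue above, not the actual cht-core implementation; `upload` is a stand-in for the real request function):

```javascript
// Sketch of batch reduction on "payload too large" (not actual cht-core code).
// When the server rejects a batch with HTTP 413, halve the batch size and
// retry from the same position; otherwise propagate the error.
async function uploadWithBatchReduction(docs, upload, { initialBatchSize = 100, minBatchSize = 1 } = {}) {
  let batchSize = initialBatchSize;
  let i = 0;
  while (i < docs.length) {
    const batch = docs.slice(i, i + batchSize);
    try {
      await upload(batch);
      i += batch.length; // batch accepted, advance
    } catch (err) {
      if (err.status === 413 && batchSize > minBatchSize) {
        batchSize = Math.max(minBatchSize, Math.floor(batchSize / 2));
      } else {
        throw err; // a different failure, or the batch cannot shrink further
      }
    }
  }
}
```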

I’m curious what kind of payloads fail in your case, @Kenyuri. Do you also have image attachments on your reports?
