CHT app slowing down

danielmwakanema · September 18, 2023, 11:20am

We are running an instance of CHT 4.x and using Samsung A7 Lite tablets to see clients. We have noticed that the app’s performance on health worker tablets has degraded quickly over the past month or two, as the number of documents has increased. We started with ~700 documents and now have ~10k.

CHT specs:
Version: 4.x

Device specs:

Additional details:

average load time is 7+ seconds on the tablets
the app has ~10k documents at the moment
the app is slow when loading health centers, the Reports tab, the Tasks tab and during initialization
we tried running the app on higher-end devices such as a Xiaomi Redmi Note 10T, an iPhone 14, a MacBook and a mid-2010s desktop, and the performance problems ranged from somewhat improved to completely gone

Nice to have:

low-cost solution

diana · September 18, 2023, 2:32pm

Hi @danielmwakanema

It’s expected for the number of documents to affect device performance.
Loading times vary depending on many factors, including battery levels, how many other apps are running on the same device, etc.
We store loading time statistics in telemetry docs: User telemetry | Community Health Toolkit

I’m curious whether you’re noticing the same performance across the board, or noticing differences depending on, for example, how long the app has been running.

It would be useful if you could inspect telemetry and share some numbers with us.
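To turn a telemetry doc into numbers worth sharing, something like the following Python sketch could help. It assumes each metric in a telemetry doc is stored as running aggregates with `min`, `max`, `sum`, `count` and `sumsq` (sum of squares) fields, and it uses `boot_time` in milliseconds as an example metric; both are assumptions to check against your own docs. With those fields, the mean and standard deviation can be recovered without the raw samples.

```python
import math
from dataclasses import dataclass


@dataclass
class MetricStats:
    count: int
    mean: float
    std_dev: float
    minimum: float
    maximum: float


def summarize_metric(metric: dict) -> MetricStats:
    """Recover mean and (population) standard deviation from running aggregates.

    Assumes the metric looks like {"min", "max", "sum", "count", "sumsq"},
    where sumsq is the sum of squared sample values.
    """
    count = metric["count"]
    if count == 0:
        raise ValueError("metric has no samples")
    mean = metric["sum"] / count
    # Var(x) = E[x^2] - E[x]^2; clamp tiny negative values caused by rounding.
    variance = max(metric["sumsq"] / count - mean ** 2, 0.0)
    return MetricStats(count, mean, math.sqrt(variance), metric["min"], metric["max"])


def summarize_doc(doc: dict) -> dict:
    """Summarize every metric in one telemetry doc, keyed by metric name."""
    return {name: summarize_metric(m) for name, m in doc.get("metrics", {}).items()}


# Example: three boot times of 6, 8 and 10 seconds (values in ms, assumed unit).
doc = {"metrics": {"boot_time": {"min": 6000, "max": 10000, "sum": 24000,
                                 "count": 3, "sumsq": 200_000_000}}}
print(summarize_doc(doc)["boot_time"])
```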

Thank you!

danielmwakanema · September 19, 2023, 6:07am

Hello @diana, we have noticed the performance drop across-the-board.

Alright, I will have a look at the telemetry data. I see I can find this metadata in CouchDB. Is this data also available/visualized in Watchdog or something similar?

diana · September 19, 2023, 7:42am

Hi @danielmwakanema

I believe this is only available in CouchDB out of the box. I’m not entirely sure whether we ship any prebuilt queries for PostgreSQL. Maybe @binod can help there?

danielmwakanema · September 19, 2023, 8:17am

Alright, thanks. Those would be a nice-to-have.

binod · September 20, 2023, 3:05pm

Do you have Couch2PG set up?

It will make it easier to query the telemetry and feedback docs in PostgreSQL once it is set up and data synced.

danielmwakanema · September 20, 2023, 4:12pm

Hello @binod, yes, we do have Couch2pg set up :)

binod · September 20, 2023, 4:34pm

In that case, you can have a look at the telemetry and feedback data from these materialized views:

useview_telemetry
useview_telemetry_devices
useview_telemetry_metrics
useview_feedback
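As a sketch of the kind of query these views support, the Python below ranks users by their average value for one metric. Python’s built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere, and the table’s columns (`user_name`, `metric`, `sum`, `count`) are assumptions about how `useview_telemetry_metrics` is laid out, not its documented schema. Against PostgreSQL the same SQL should work with the driver’s placeholder style (for example `%s` in psycopg2) in place of `?`.

```python
import sqlite3

# Hypothetical stand-in for the PostgreSQL view; the real column names may differ.
SCHEMA = """
CREATE TABLE useview_telemetry_metrics (
    user_name TEXT,
    metric    TEXT,
    "sum"     REAL,
    "count"   INTEGER
)
"""

# Average per user = total of all samples / total number of samples,
# so each telemetry period is weighted by how many samples it holds.
QUERY = """
SELECT user_name,
       SUM("sum") / SUM("count") AS avg_value,
       SUM("count") AS samples
FROM useview_telemetry_metrics
WHERE metric = ?
GROUP BY user_name
ORDER BY avg_value DESC
"""


def slowest_users(conn: sqlite3.Connection, metric: str) -> list:
    """Users ranked from slowest to fastest average for one metric."""
    return conn.execute(QUERY, (metric,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO useview_telemetry_metrics VALUES (?, ?, ?, ?)",
        [
            ("nurse_a", "boot_time", 32000.0, 4),
            ("nurse_a", "boot_time", 20000.0, 2),
            ("nurse_b", "boot_time", 15000.0, 5),
        ],
    )
    return conn


print(slowest_users(demo_connection(), "boot_time"))
```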

Here are some forum posts with SQL query examples that you might find helpful:

kenn · September 20, 2023, 5:49pm

I met with the team today to discuss these client-side challenges.

Proposed mitigation: This project already has a few cron tasks supporting an integration with OpenMRS. These tasks maintain a per-patient summary of data by inspecting the CHT reports and combining them with data from OpenMRS. Our plan to solve the client-side scaling challenges is to extend these tasks so they store all the information the CHT app needs, which would allow us to purge all other CHT reports for all users.
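The selection rule for that purge could look like the sketch below. As we understand it, CHT’s server-side purging is configured with a JavaScript function that receives a contact with its reports and returns the IDs of docs to purge; this Python sketch only models that selection rule, and the `patient_summary` form code is hypothetical. It keeps the newest summary report and purges everything else, and purges nothing until a summary exists, so a patient is never left with no data on the device.

```python
SUMMARY_FORM = "patient_summary"  # hypothetical form code for the cron-maintained summary


def ids_to_purge(reports: list[dict]) -> list[str]:
    """Pick which of one contact's reports to purge from the device.

    Keeps only the newest summary report (by reported_date). If the contact has
    no summary yet, nothing is purged so the device never loses its only data.
    """
    summaries = [r for r in reports if r.get("form") == SUMMARY_FORM]
    if not summaries:
        return []
    newest = max(summaries, key=lambda r: r.get("reported_date", 0))
    return [r["_id"] for r in reports if r["_id"] != newest["_id"]]


reports = [
    {"_id": "visit-1", "form": "visit"},
    {"_id": "sum-old", "form": SUMMARY_FORM, "reported_date": 1},
    {"_id": "sum-new", "form": SUMMARY_FORM, "reported_date": 5},
]
print(ids_to_purge(reports))  # ['visit-1', 'sum-old']
```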

Scale-up: There are about 2500 contacts and 8000 reports on these slow devices today. That is about 40% of the contacts needed to support their largest health center, and 15% of these contacts have reports today.

At scale, and after this mitigation, there will be ~5500 contacts and ~5500 reports (1 summary report per contact, with all other reports purged). That is ~11k docs per device, which is about where we are today. Performance is satisfactory on desktops, iPhones, iPads, etc., but unsatisfactory on the production hardware. Our recommendation is that this project procure faster devices (at least for nurses working in the largest facilities). Our Nepal team recently went through a similar experience procuring new devices for users, so I’m going to consult with them before we share device recommendations.
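The doc-count comparison above can be checked with a couple of lines of Python. The sketch counts only contacts and reports and ignores task docs and meta docs, so it is a lower bound on what each device holds rather than a full total.

```python
def docs_per_device(contacts: int, reports: int) -> int:
    """Contacts plus reports synced to one offline user (tasks and meta docs not counted)."""
    return contacts + reports


today = docs_per_device(contacts=2500, reports=8000)     # 10500
at_scale = docs_per_device(contacts=5500, reports=5500)  # 11000: one summary report per contact
print(today, at_scale)
```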

kenn · September 22, 2023, 7:28am

This may prove to be an important client-side performance bug affecting tab load times, especially since this project has so many contacts. It affects 3.17.1, which is the production version for this app: Low performance in Reports tab · Issue #8576 · medic/cht-core · GitHub

kenn · October 10, 2023, 8:20am

I’ve reproduced the dataset for this project: 5500 contacts and 0 reports. I’ve warmed all views and synced all docs. I’m testing on a prerelease of CHT Core 4.4.1 (which includes the fix for #8576 mentioned above), on a beefy desktop machine.

Contacts load at 0:07
Contact summary loads at 0:57
When I scroll down, I just see white. The list of contacts actually renders at 1:51

Numerous times I saw the “Page Unresponsive” warning.
The page is unresponsive: I can’t open the hamburger menu or change tabs.
System CPU is pinned at 100% for nearly the full 2 minutes.
I’m not surprised that these little A7 Lite tablets are struggling under these conditions.

I’m thinking to try to work around this by putting these contacts into useless places that serve no purpose except to render fewer contacts at a time. It’s a pretty bad experience, though. Any other suggestions for handling this?

Opened for consideration as a long-term fix: Pagination for contacts within the selected contact tab · Issue #8627 · medic/cht-core · GitHub
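The idea behind pagination is to render a fixed-size slice of a place’s children and fetch the next slice only when the user scrolls, instead of rendering all 5500 at once. The sketch below is a hypothetical illustration of offset-based paging over an in-memory list, not how cht-core implements (or will implement) it; a real implementation would query one page at a time from the database, and the `page_size` of 50 is an arbitrary choice.

```python
from typing import Iterator, Sequence


def paginate(contacts: Sequence[dict], page_size: int = 50) -> Iterator[list]:
    """Yield consecutive pages of contacts, sorted by name, page_size at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    ordered = sorted(contacts, key=lambda c: c.get("name", ""))
    for start in range(0, len(ordered), page_size):
        yield ordered[start:start + page_size]


# Usage: render only the first page, and pull the next one on scroll.
patients = [{"name": f"Patient {i:04d}"} for i in range(5500)]
pages = paginate(patients)
first = next(pages)  # 50 contacts to render instead of 5500
print(len(first), first[0]["name"])
```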

derick · October 10, 2023, 8:56am

Recording to accompany the post

derick · October 10, 2023, 8:58am

I’m thinking to try to work around this by putting these contacts into useless places that serve no purpose except to render fewer contacts at a time

This should work

System CPU is pinned at 100% for nearly the full 2 minutes.
Do we know what’s causing this? Are we trying to fetch (and render?) all contacts at once?

danielmwakanema · October 10, 2023, 9:12am

I think pagination is a tried and tested path. First thing that came to mind for me as well.

kenn · October 20, 2023, 4:33pm

These are the total doc counts for a user on this project:

Metric	Value
Total Docs	26802
Total Metadocs	27
Nurses	11
Patients	983
Tombstones	1503
Tasks	20267
Reports	3787

Looking more closely at a single offline user, they download ~11k docs to their device: all contacts, all reports and ~6k task docs.

Here are some load times on my laptop and on the Samsung A7 Lite tablets used in the field (I accidentally tested on CHT 4.4 even though production is running 3.17.2). Actions are measured in the order listed. “1st Load” means the load time after an initial replication. “2nd Load” means the “1st Load” has completed and the user then repeats the action.

Action	Laptop 1st Load (w/o task purge)	Laptop 2nd Load	Tablet 1st Load (w/ task purge)	Tablet 2nd Load
Tasks Tab	15m32s / 11m32s	~2s	24m47s	~5s
Tasks Tab 7 days after load	1m51s	~2s	-	-
Reports Tab	4m25s / 4m35s	~2s	7m57s	~4s
Contacts Tab	2m23s	~25s	8s	3s
Contact Search	2m06s	< 1s	4m42s	~1s

Here is a report from the field which suggests that these “1st Load” times are highly impactful:

Yesterday, when we got to the site it took about 10 minutes for the app to load, unfortunately, all clients searched could not be loaded and the delays prompted the nurses to proceed to work without the tablets because they had close to 30 clients waiting to be seen.

It is odd that they are experiencing >10 min load times months after installing the app and using it successfully. Perhaps the device is going to sleep while loading? I’m going to look into the battery settings and into keeping the device awake while loading.

Perhaps they are experiencing a lot of view warming even after the 1st load. I’ve set up a cron task to monitor the seq number over time to get a better idea of how docs are changing; maybe there is a lot of document churn causing long indexing times. I don’t believe this to be the case, though.
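A monitoring job like this could reduce its samples to a rate of changes per hour. The sketch assumes each sample pairs a timestamp with the `update_seq` field returned by CouchDB’s database info endpoint (for example `GET /medic`), and that on CouchDB 2.x and later the integer before the first `-` in that opaque sequence string is a reasonable approximation of the number of changes. That makes it good for spotting trends, not for exact counts. The function names are hypothetical.

```python
from datetime import datetime
from typing import Sequence, Tuple, Union

Seq = Union[int, str]


def seq_number(update_seq: Seq) -> int:
    """Approximate change count from a CouchDB update_seq.

    CouchDB 1.x returns a plain integer; 2.x and later return an opaque string
    such as '12345-g1AAAA...', whose leading integer we treat as approximate.
    """
    if isinstance(update_seq, int):
        return update_seq
    return int(update_seq.split("-", 1)[0])


def changes_per_hour(samples: Sequence[Tuple[datetime, Seq]]) -> float:
    """Average rate of change between the first and last (timestamp, update_seq) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must be in increasing time order")
    return (seq_number(s1) - seq_number(s0)) / hours


samples = [
    (datetime(2023, 10, 20, 0, 0), "1000-g1AAAA"),
    (datetime(2023, 10, 20, 12, 0), "4600-g1AAAB"),
]
print(changes_per_hour(samples))  # 300.0
```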

michael · October 27, 2023, 2:07pm

We had a chat with the team this morning and learned that they actually have a very meaningful way to split these up into different places. Each patient is always associated with a specific “distribution point”, and there are ~130 distribution points, which would give a much more reasonable ~42 contacts per place (5,500 / 130). A patient can visit another distribution point, but they generally have a primary distribution point.

We also learned that nurses (i.e. the users we are concerned with) never work in the health facilities; they only ever use their tablets and attend to the distribution points. Patients associated with the same distribution point may actually go to different health facilities, but this does not really affect the nurses’ workflow; the health facility is primarily useful for reporting purposes.

Are there any other reasons not to split patients by distribution point? Nurses would still need access to all distribution points and to all the patients in every distribution point (so the amount of data would be the same), but I think this would solve the issue on the Contacts tab.
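Grouping patients by distribution point is a plain bucket-by-key operation. The Python sketch below assumes each patient record carries a hypothetical `distribution_point` field; with 5,500 patients spread evenly over 130 points, every place ends up with 42 or 43 contacts, in line with the ~42 estimate above.

```python
from collections import defaultdict


def group_by_distribution_point(patients):
    """Bucket patients by their (hypothetical) distribution_point field."""
    groups = defaultdict(list)
    for patient in patients:
        groups[patient.get("distribution_point", "unassigned")].append(patient)
    return dict(groups)


# Illustrative data: 5500 patients assigned round-robin to 130 distribution points.
patients = [{"name": f"Patient {i}", "distribution_point": f"DP-{i % 130:03d}"}
            for i in range(5500)]
groups = group_by_distribution_point(patients)
sizes = [len(members) for members in groups.values()]
print(len(groups), min(sizes), max(sizes))  # 130 42 43
```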

kenn · November 2, 2023, 9:04pm

I believe the hierarchy for this app has been discussed in greater detail here. As I recall, we never found a CHT hierarchy that worked for this project, and the best we could do was one CHT application/instance per health facility with a flat hierarchy (health facility > patient).

Maybe something has changed? I’ll DM you to get the specifics of your conversation and learn who you are speaking with.

kenn · November 22, 2023, 7:52pm

We know the CHT is slow on 1st load: users see ~24 min load times. Since CARES has two apps (one per facility), users have to pay this price twice, so the ideal scenario for these CHT users is an expected 48 minutes of loading per user.

But! CARES has 10 users who are sharing 5 devices. Each day when nurses start work, they pick a device at random. They log in to whatever device they get, do their work, and log out at the end of the day. The next day they pick a new device at random from the pool.

The impact of this device sharing is that instead of paying this 24 min price 2x (48 min waiting), they are paying it 10x (240 min waiting): once per device per app, i.e. 5 devices × 2 apps. Across the whole project, this is a ton of unnecessary load time. Days of it.
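The waiting-time arithmetic can be written out under the model described here: a first load of ~24 min is paid once per user, per device, per app, with 2 apps, 10 users, and a pool of 5 devices that each user eventually uses. Those inputs come from the posts above; the model itself is a simplification of real usage, and the constant names are illustrative.

```python
FIRST_LOAD_MIN = 24  # approximate first-load time per app, from the field measurements
APPS = 2             # one CHT app per facility
USERS = 10
DEVICE_POOL = 5


def first_load_minutes(devices_per_user: int) -> int:
    """Minutes one user spends on first loads: once per device, per app."""
    return devices_per_user * APPS * FIRST_LOAD_MIN


shared_per_user = first_load_minutes(DEVICE_POOL)  # 5 * 2 * 24 = 240
dedicated_per_user = first_load_minutes(1)         # 1 * 2 * 24 = 48

print(f"shared:    {shared_per_user} min/user, {USERS * shared_per_user / 60:.0f} h project-wide")
print(f"dedicated: {dedicated_per_user} min/user, {USERS * dedicated_per_user / 60:.0f} h project-wide")
```

With a dedicated device per user, the project-wide cost drops from about 40 hours to about 8 hours of first loads in this model.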

We recently got budget to buy a few high-end devices so things would be faster. Instead, we are going to spend the money on more A7 Lite devices so each user can have a dedicated device and we can eliminate all this unnecessary loading.