We are running an instance of 4.x and using Samsung A7 Lite tablets to see clients. We have noted that the performance of the app on health worker tablets has degraded quickly over a month or two, as the number of documents has increased. We started with ~700 documents and now we have ~10k.
CHT specs:
Version: 4.x
Device specs:
Additional details:
avg load time is 7+ seconds on the tablets
the app has ~10k documents at the moment
the app is slow when loading health centers, reports tab, tasks tab and initializing
we tried running the app on higher-end devices like a Xiaomi Redmi Note 10T, iPhone 14, MacBook, and a mid-2010s desktop, and the performance problems ranged from improved to completely gone
It’s expected for the number of documents to affect device performance.
Loading times vary depending on many factors, including battery levels, how many other apps are running on the same device, etc.
We store loading time statistics in telemetry docs: User telemetry | Community Health Toolkit
I’m curious if you’re noticing the same performance across the board, or noticing differences depending on, for example, how long the app has been running.
It would be useful if you could inspect telemetry and share some numbers with us.
Hello @diana, we have noticed the performance drop across-the-board.
Alright, I will have a look at the telemetry data. I see I can find this metadata in CouchDB. Is this data also available/visualized in Watchdog or something similar?
I believe this is only available in CouchDB out of the box. I’m not entirely sure if we ship any prebuilt queries for Postgres. Maybe @binod can help there?
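For anyone pulling these numbers by hand, here is a minimal sketch of how daily telemetry docs could be rolled up into one average per metric. It assumes each doc carries a `metrics` object whose entries hold `sum` and `count`, as described on the User telemetry page; the sample docs and their `boot_time` values are made up for illustration, and fetching the docs from CouchDB is left out.

```python
from collections import defaultdict


def average_metrics(telemetry_docs):
    """Combine per-doc sum/count pairs into one mean per metric name."""
    totals = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for doc in telemetry_docs:
        for name, stats in doc.get("metrics", {}).items():
            totals[name]["sum"] += stats["sum"]
            totals[name]["count"] += stats["count"]
    # Skip metrics that were never recorded to avoid dividing by zero.
    return {
        name: t["sum"] / t["count"]
        for name, t in totals.items()
        if t["count"] > 0
    }


# Made-up sample docs (values in milliseconds), not real project data.
sample_docs = [
    {"metrics": {"boot_time": {"sum": 14000, "count": 2}}},
    {"metrics": {"boot_time": {"sum": 9000, "count": 1}}},
]
averages = average_metrics(sample_docs)
print(averages)
```

Weighting by `count` rather than averaging the per-doc means keeps days with many recorded loads from being under-represented.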
I met with the team today to discuss these client-side challenges.
Proposed Mitigation
This project already has a few cron tasks to support an integration with OpenMRS. These tasks maintain a summary of data per patient by inspecting the CHT reports and combining it with data from OpenMRS. Our plan to solve client-side scaling challenges is to extend this task further so it stores all relevant information needed for the CHT app & this would allow us to purge all other CHT reports for all users.
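To make the purge idea concrete, here is a rough sketch of the selection logic in Python. It is not the CHT purge API (CHT purge rules live in the app configuration and are written in JavaScript); the `patient_summary` form name and the shape of the report dicts are hypothetical.

```python
def reports_to_purge(reports, summary_form="patient_summary"):
    """Return the ids of every report except the per-patient summary."""
    return [r["_id"] for r in reports if r.get("form") != summary_form]


# Hypothetical reports for a single patient.
reports = [
    {"_id": "r1", "form": "visit"},
    {"_id": "r2", "form": "patient_summary"},
    {"_id": "r3", "form": "refill"},
]
purge_ids = reports_to_purge(reports)
print(purge_ids)
```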
Scale-up
There are about 2500 contacts and 8000 reports on these slow devices today. This is about 40% of the contacts required to scale to support their largest health center + 15% of these contacts have reports today.
At scale + after this mitigation there will be ~5500 contacts and ~5500 reports (1 summary report per contact + all other reports purged). That is 11k docs/device, which is where we are today. Performance is satisfactory on desktops, iPhones, iPads (etc) but unsatisfactory on the production hardware. Our recommendation is that this project procure faster devices (at least for nurses working in the largest facilities). Our Nepal team recently went through a similar experience procuring new devices for users, so I’m going to consult with them before we share device recommendations.
I’ve reproduced the dataset for this project: 5500 contacts and 0 reports. I’ve warmed all views and synced all docs. Testing on CHT Core prerelease of 4.4.1 (includes fix for #8576 mentioned above). I’m testing this on a beefy desktop machine.
Contacts load at 0:07s
Contact summary loads at 0:57s
When I scroll down, I just see white. The list of contacts actually renders at 1:51s
Numerous times I saw the “Page Unresponsive” warning.
Page is non-responsive - can’t open hamburger menu or change tabs.
System CPU is pinned at 100% for nearly the full 2 minutes.
Not surprised that these little A7 Lite tablets are struggling under these conditions.
I’m thinking to try to work around this by putting these contacts into useless places that serve no purpose except to render fewer contacts at a time. Pretty bad experience though. Any other suggestions for handling this?
I’m thinking to try to work around this by putting these contacts into useless places that serve no purpose except to render fewer contacts at a time
This should work
System CPU is pinned at 100% for nearly the full 2 minutes.
Do we know what’s causing this? Are we trying to fetch (and render?) all contacts at once?
This is the total doc counts for a user on this project:
| Metric | Value |
|---|---|
| Total Docs | 26802 |
| Total Metadocs | 27 |
| Nurses | 11 |
| Patients | 983 |
| Tombstones | 1503 |
| Tasks | 20267 |
| Reports | 3787 |
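A quick way to see where the docs go is to express each row as a share of the stated total. This only redoes arithmetic on the numbers in the table above; nothing here is new data.

```python
total_docs = 26802
counts = {
    "Metadocs": 27,
    "Nurses": 11,
    "Patients": 983,
    "Tombstones": 1503,
    "Tasks": 20267,
    "Reports": 3787,
}

# Percentage of the stated total, rounded to one decimal place.
shares = {name: round(100 * n / total_docs, 1) for name, n in counts.items()}

# Print largest share first.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11}{pct:>5}%")
```

Task docs come out at roughly three quarters of the total, with reports a distant second.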
Deep diving into a single offline user, they download ~11k docs to their device. This consists of all contacts + all reports + ~6k task docs.
Here are some load times on my laptop & on the Samsung A7 Lite tablets used in the field (I accidentally used CHT 4.4 to test this even though production is running 3.17.2). Actions are measured in the order listed. “1st Load” means load time after an initial replication. “2nd Load” means the “1st Load” has completed and then the user repeats the action.
| Action | Laptop 1st Load (w/o task purge) | Laptop 2nd Load | Tablet 1st Load (w/ task purge) | Tablet 2nd Load |
|---|---|---|---|---|
| Tasks Tab | 15m32s / 11m32s | ~2s | 24m47s | ~5s |
| Tasks Tab 7 days after load | 1m51s | ~2s | | |
| Reports Tab | 4m 25s / 4m 35s | ~2s | 7m57s | ~4s |
| Contacts Tab | 2m23s | ~25s | 8s | 3s |
| Contact Search | 2m06s | < 1s | 4m42s | ~1s |
Here is a report from the field which suggests that these “1st Load” times are highly impactful:
Yesterday, when we got to the site it took about 10 minutes for the app to load, unfortunately, all clients searched could not be loaded and the delays prompted the nurses to proceed to work without the tablets because they had close to 30 clients waiting to be seen.
It is odd they are experiencing >10 min load times months after installing the app and using it successfully. Perhaps their device is going to sleep while loading? I’m going to investigate battery settings and keeping the device awake while loading.
Perhaps they are experiencing a lot of view warming even after 1st load … I’ve set up a cron task to monitor the seq number over time to get a better idea of how docs are changing – maybe there is a lot of document churn which is causing large indexing times. But I don’t believe this to be the case.
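For reference, a minimal sketch of how the sampled seq values could be turned into a churn rate. It assumes each sample is the `update_seq` string CouchDB reports for the database, that its numeric prefix is a good-enough approximation of the sequence number, and that each sample has a timestamp; the sample values are invented.

```python
def seq_number(update_seq):
    """Extract the numeric prefix from an update_seq such as '1532-g1AAAA'."""
    return int(str(update_seq).split("-", 1)[0])


def churn_per_hour(samples):
    """samples: list of (unix_seconds, update_seq) tuples, oldest first."""
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        hours = (t1 - t0) / 3600
        rates.append((seq_number(s1) - seq_number(s0)) / hours)
    return rates


# Invented samples: one hour apart, then two hours apart.
samples = [(0, "1000-g1AAAA"), (3600, "1040-g1AAAB"), (10800, "1100-g1AAAC")]
rates = churn_per_hour(samples)
print(rates)
```

A consistently high rate between samples would point toward document churn and re-indexing; a flat one would support the hunch that churn is not the cause.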
We had a chat with the team this morning and learned that they actually have a very meaningful way to split these up into different places. Each patient is always associated to a specific “distribution point” and there are ~130 distribution points… that would give you a much more reasonable ~42 contacts per place (5,500 / 130). It is possible for a patient to visit another distribution point but they generally have a primary distribution point.
We also learned that… Nurses (ie the users we are concerned with) are never working in the health facilities, they are always only using their tablets and attending to the distribution points. Patients associated to the same distribution point may actually go to different health facilities, but this does not really affect the Nurse’s workflow, the health facility is primarily useful for reporting purposes.
Any other reasons not to split patients by distribution points? Nurses would still need to have access to all distribution points as well as all the patients in every distribution point (so the amount of data would be the same) but I think this would solve the issue on the contact’s tab.
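As a back-of-the-envelope check on that split, here is a small sketch that groups patients by distribution point and reports the group sizes. The `distribution_point` field and the evenly spread synthetic patients are assumptions for illustration; real distribution points will vary in size.

```python
from collections import Counter


def contacts_per_place(patients, key="distribution_point"):
    """Count how many patients would be listed under each place."""
    return Counter(p[key] for p in patients)


# Synthetic data: 5500 patients spread evenly over 130 distribution points.
patients = [
    {"_id": f"p{i}", "distribution_point": f"dp{i % 130}"}
    for i in range(5500)
]
sizes = contacts_per_place(patients)
print(len(sizes), min(sizes.values()), max(sizes.values()))
```

Even with an uneven real-world spread, each place should list far fewer contacts than a single flat list of ~5,500.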
I believe the hierarchy for this app has been discussed in greater detail here. I believe we never did find a CHT hierarchy that worked for this project and the best we could do was to have one CHT application/CHT instance per health facility + a flat hierarchy (health facility > patient).
Maybe something has changed…? I’ll DM you to find specifics of your conversation + learn who you are speaking with.
We know the CHT is slow on 1st load, and users see ~24min load times. Since CARES has two apps (one per facility), users have to pay this price twice. So the ideal scenario for these CHT users is an expected 48 minutes of loading per user.
But! CARES has 10 users who are sharing 5 devices. Each day when nurses start work, they pick a device at random. They login to whatever device they get, do their work, and logout at the end of the day. The next day they pick a new device at random from the pool.
The impact of this device sharing is that instead of paying this 24min price 2x (48min waiting) they are paying it 10x (240min waiting) - once per device per app. Across the whole project, this is a ton of unnecessary load time. Days of it.
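The arithmetic behind this, as a small sketch. It assumes ~24 min per first load, 2 apps, 5 shared devices and 10 users (all taken from the posts above), and that every user eventually logs in to every device. Under those assumptions, 10 first loads at 24 min each comes to 240 min per user.

```python
FIRST_LOAD_MIN = 24  # approximate first-load time per app
APPS = 2             # one app per facility
DEVICES = 5          # shared pool of tablets
USERS = 10

# Dedicated device: each user pays the first load once per app.
dedicated = FIRST_LOAD_MIN * APPS

# Shared pool: each user pays it once per device per app.
shared = FIRST_LOAD_MIN * APPS * DEVICES

# Extra waiting across all users caused by sharing, in minutes.
project_extra = (shared - dedicated) * USERS
print(dedicated, shared, project_extra / 60)
```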
We recently got budget to buy a few high-end devices so things would be faster. Instead, we are going to spend the money to get more A7 Lite devices so each user can have a dedicated device and we can eliminate all this unnecessary loading.