Exposing impact metrics from the API

yuv · September 24, 2025, 1:19pm

Hi CHT Community,

We are exploring the possibility of reporting certain impact metrics from the CHT API itself. Impact metrics are quantifiable measures that show how effectively health programs are achieving their intended outcomes, such as the number of pregnancy registrations or the total number of caring activities.

At the moment, impact metrics can only be extracted once CouchDB data has been sent to Postgres with couch2pg or cht-sync and the required SQL objects have been created. This requires running extra services and an understanding of SQL. While not everything achievable in an RDBMS can be achieved from CouchDB alone, we are looking at a set of metrics that can be, and we’ll release them as an initial piece of work. All of the metrics exposed from the API will be aggregated and will not contain any PII or PHI. Here’s what’s planned for the current development, with an example:

```json
{
  "contactsByType": {
    "g10_district": 3,
    "g20_municipality": 2,
    "g30_health_center": 2,
    "g40_clinic": 1,
    "person": 1999
  },
  "totalReports": 3007,
  "reportsByForm": {
    "L": 1005,
    "N": 2002
  }
}
```

This initial work will return the count of contacts (places and persons) by contact type, the total number of reports (which can be counted as caring activities), and the distribution of reports by form.
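As a rough illustration of how a consumer might read this payload, here is a minimal Python sketch that parses the example response and derives a few totals. It assumes the field names stay exactly as in the example, and it treats only the `person` type as a person (every other type is counted as a place). `PERSON_TYPES`, `ImpactSummary` and `summarize` are hypothetical names introduced for illustration; instances with additional person types would need to extend `PERSON_TYPES`.

```python
import json
from dataclasses import dataclass

# Example payload copied from the proposed response above.
EXAMPLE = """
{
  "contactsByType": {"g10_district": 3, "g20_municipality": 2,
                     "g30_health_center": 2, "g40_clinic": 1, "person": 1999},
  "totalReports": 3007,
  "reportsByForm": {"L": 1005, "N": 2002}
}
"""

# Assumption (hypothetical): "person" is the only person type; all others are places.
PERSON_TYPES = {"person"}


@dataclass
class ImpactSummary:
    places: int
    persons: int
    total_reports: int
    forms_match_total: bool  # does the per-form distribution add up to the total?


def summarize(payload: str) -> ImpactSummary:
    data = json.loads(payload)
    by_type = data["contactsByType"]
    persons = sum(n for t, n in by_type.items() if t in PERSON_TYPES)
    places = sum(n for t, n in by_type.items() if t not in PERSON_TYPES)
    total = data["totalReports"]
    # Soft check only: the sketch does not assume every report has a known form.
    forms_match = sum(data["reportsByForm"].values()) == total
    return ImpactSummary(places, persons, total, forms_match)


summary = summarize(EXAMPLE)
print(summary)
# ImpactSummary(places=8, persons=1999, total_reports=3007, forms_match_total=True)
```

The consistency check is reported rather than raised as an error, since it is not stated whether `totalReports` will always equal the sum of `reportsByForm`.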

We are starting with these metrics because they don’t require us to create new CouchDB views and they do not contain any PHI/PII.

We’d like to hear from the community: what other metrics would be helpful for you to have exposed from the API?

jkuester · September 25, 2025, 3:26pm

Thanks for raising this here, @yuv! I had a couple of questions for consideration:

How does the kind of data returned from the new impact endpoint compare with the kinds of things we include on the monitoring endpoint? Could we just include these impact stats in the monitoring response?
What are the performance implications of the impact endpoint? Watchdog will call the monitoring endpoint every 5min when collecting metrics. Would calling the impact endpoint on a similar interval be a problem?
If performance is a consideration, should we perhaps expose these impact values separately (on their own endpoints) instead of returning all the data in the same response? Particularly as the list of impact metrics grows, consumers might not want or need all the values.

In addition, I also have some rambling thoughts here on the API structure (happy to continue this on a different thread if you don’t want to pollute this one with a more theoretical discussion…).

I know there are as many different opinions on “the correct way” to do REST APIs as there are IP addresses, but my personal opinion is that the best way to structure REST endpoints is around identifiable entities that can be derived from the data. This is opposed to more “purpose-built” HTTP endpoints that exist to perform a specific function or to provide a view of a particular collection of data (these other HTTP endpoints might not technically be “RESTful”). Some advantages of entity-based REST endpoints are that they tend to be more consistent, predictable, and generically useful (they don’t make many assumptions about what the consumer is doing). One disadvantage, though, is that for the originally targeted use case they might be less efficient, since a consumer might need multiple calls to different endpoints to achieve the desired workflow.

Bringing this all back to the data we are looking to provide on the impact endpoint, let’s take as an example the number of contacts that exist with a particular type. We have the GET /api/v1/contact/uuid endpoint, which returns an array of UUIDs for contacts, filterable by a type query param. The JSON object returned from that endpoint is an envelope that currently contains the data array with the UUIDs and a cursor value for paging. We could potentially add a numeric total_count field to that envelope containing the total number of entities found.

Then, to get the number of g40_clinic contacts that exist, you could just request:

GET /api/v1/contact/uuid?type=g40_clinic&limit=0

With limit=0, the data array in the response would be empty, but we could still fill in total_count.
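To make the `limit=0` idea concrete, here is a minimal Python sketch of the proposed envelope behavior. The real CHT API is implemented in Node.js, so this only models the semantics: `get_contact_uuids`, the in-memory `CONTACTS` list, its `contact_type` field and the offset-string cursor are all hypothetical stand-ins, and `total_count` is the field being proposed here, not something the endpoint returns today.

```python
from typing import Optional, TypedDict


class Page(TypedDict):
    data: list[str]
    cursor: Optional[str]
    total_count: int  # proposed field, not part of the current envelope


# Hypothetical in-memory stand-in for the contacts stored in CouchDB.
CONTACTS = [
    {"_id": "c1", "contact_type": "g40_clinic"},
    {"_id": "c2", "contact_type": "person"},
    {"_id": "c3", "contact_type": "person"},
]


def get_contact_uuids(contact_type: str, limit: int = 100,
                      cursor: Optional[str] = None) -> Page:
    """Return one page of UUIDs plus the total number of matching contacts."""
    matches = [c["_id"] for c in CONTACTS if c["contact_type"] == contact_type]
    start = int(cursor) if cursor is not None else 0
    page = matches[start:start + limit]
    end = start + len(page)
    # Only hand back a cursor when a non-empty page leaves more results to fetch.
    next_cursor = str(end) if limit > 0 and end < len(matches) else None
    return {"data": page, "cursor": next_cursor, "total_count": len(matches)}


print(get_contact_uuids("g40_clinic", limit=0))
# {'data': [], 'cursor': None, 'total_count': 1}
```

Because `total_count` is computed independently of `limit` and `cursor`, ordinary paging (for example `limit=1` followed by the returned cursor) keeps working unchanged, and a `limit=0` request returns only the count.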

(Also, let’s not get hung up on the uuid part of the endpoint. We could add similar functionality at /api/v1/contact. It is just that we don’t currently offer any functionality at that endpoint, so there is no way yet to retrieve multiple contact entities at the same time.)

yuv · September 26, 2025, 1:49pm

Hi @jkuester, let me briefly answer some of the questions here.

How does the kind of data returned from the new impact endpoint compare with the kinds of things we include on the monitoring endpoint? Could we just include these impact stats in the monitoring response?

As you know, the monitoring endpoint reports more on instance health and on how certain expected functionality is performing. The impact endpoint answers the question “how many?”: for example, how many reports have been submitted to date, how many patients have been registered in the system, etc. For separation of concerns, we are not adding these to the /monitoring endpoint.

What are the performance implications of the impact endpoint? Watchdog will call the monitoring endpoint every 5min when collecting metrics. Would calling the impact endpoint on a similar interval be a problem?

Currently, the impact endpoint derives its output from CouchDB’s built-in reduce functions and database info, so calling it has no real performance implications. Also, it does not need to be queried every 5 minutes like monitoring. Calling it once a month is absolutely fine, since we only report impact metrics as monthly aggregates. It can be called daily if required, but that isn’t necessary.
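For readers less familiar with CouchDB reduce functions, here is a minimal Python sketch of how per-form counts can be read from a view that uses the built-in `_count` reduce. The database, design document and view names are placeholders, not necessarily the views the impact endpoint uses, and the sketch assumes the view emits the form code as a plain string key. It only builds the query URL and folds a sample response; it makes no network call.

```python
import json
from urllib.parse import urlencode

# Placeholders (assumptions): not necessarily the views the impact endpoint queries.
COUCH_URL = "http://localhost:5984"
DB = "medic"
DDOC = "medic-client"
VIEW = "reports_by_form"


def view_url(group: bool = True) -> str:
    # group=true asks CouchDB to apply the view's reduce (e.g. the built-in
    # _count) once per distinct key instead of once over the whole view.
    params = urlencode({"group": "true" if group else "false"})
    return f"{COUCH_URL}/{DB}/_design/{DDOC}/_view/{VIEW}?{params}"


def fold_rows(body: str) -> dict[str, int]:
    """Turn a grouped view response into a {key: count} mapping."""
    rows = json.loads(body)["rows"]
    return {row["key"]: row["value"] for row in rows}


# Sample grouped response in the shape CouchDB returns for a reduced view.
SAMPLE = '{"rows":[{"key":"L","value":1005},{"key":"N","value":2002}]}'
counts = fold_rows(SAMPLE)
print(view_url())
print(counts, sum(counts.values()))
```

With `group=true`, CouchDB returns one row per distinct key, so summing the values gives the total across forms. The “db info” part corresponds to `GET /{db}`, whose response includes a `doc_count` field.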

jkuester · September 26, 2025, 10:09pm

Yeah, this was my first thought as well. However, the more I think about it, the less of a distinction I see… There are quite a few “how many?” values currently returned by monitoring (doc_count, outgoing messages, connected_users, etc.). Now, maybe those don’t belong in monitoring either, but the point is that stuff like “total number of reports” is very similar data.

I definitely think it would be valuable to be able to capture this data in Watchdog. (I agree that a 5m interval is probably needlessly frequent, but perhaps something like 1h would be useful from a monitoring perspective.) Ideally, if we are going to scrape the endpoint with Watchdog, it would be unauthenticated (like /monitoring). Do you think that would be a problem for /impact?
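If Watchdog were to scrape the impact values, they would need to end up as time series. Assuming they land in Prometheus alongside Watchdog’s other metrics, here is a minimal Python sketch that flattens the example payload into the Prometheus text exposition format. The metric names (`cht_impact_contacts`, `cht_impact_total_reports`, `cht_impact_reports`) and label names are hypothetical, and all three are exposed as gauges because the counts can go down when documents are deleted.

```python
# Hypothetical metric names; Watchdog does not define these today.
EXAMPLE = {
    "contactsByType": {"g10_district": 3, "g20_municipality": 2,
                       "g30_health_center": 2, "g40_clinic": 1, "person": 1999},
    "totalReports": 3007,
    "reportsByForm": {"L": 1005, "N": 2002},
}


def escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline
    # in label values; the backslash must be escaped first.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(payload: dict) -> str:
    lines = ["# TYPE cht_impact_contacts gauge"]
    for contact_type, count in sorted(payload["contactsByType"].items()):
        label = escape_label(contact_type)
        lines.append(f'cht_impact_contacts{{contact_type="{label}"}} {count}')
    lines.append("# TYPE cht_impact_total_reports gauge")
    lines.append(f"cht_impact_total_reports {payload['totalReports']}")
    lines.append("# TYPE cht_impact_reports gauge")
    for form, count in sorted(payload["reportsByForm"].items()):
        lines.append(f'cht_impact_reports{{form="{escape_label(form)}"}} {count}')
    return "\n".join(lines) + "\n"


print(to_prometheus(EXAMPLE), end="")
```

A scrape interval such as the 1h mentioned above would then be set on the Prometheus side (its `scrape_interval` setting) rather than in the API.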