CHT Dashboards and Analytics in 2022

Data dashboards are an essential component of any Health Management Information System, providing analysis of key health indicators in the community. However, the schemaless data stored in the CHT’s CouchDB database isn’t suitable for bespoke aggregate queries over large data sets. To make such queries possible, the data is copied to a data warehouse, which makes querying easier and faster. couch2pg was developed to replicate the raw JSON data into a PostgreSQL database, but the replicated data was still schemaless, which made it difficult to query and optimize. To solve this, cht-couch2pg was developed to create a suite of CHT-specific materialized views that provide a consistent schema to query against. Many projects have successfully deployed these tools and integrated them with third-party dashboard applications to meet the needs of their users.

However, the CHT is now used by increasingly large projects, and the couch2pg solution no longer scales as required, sometimes taking hours to update the materialized views. The dashboard queries are also hard to optimize, leading to unacceptably long dashboard loading times. In addition, a recent survey of app builders found that dashboard development is one of the two hardest parts of a CHT application to configure.

This led to the realization that a complete review of the dashboard solution was required.

The new dashboard stack uses proven third-party libraries and streams the data into the schema incrementally, in near real time. First, Logstash and PostgREST are configured to stream data from CouchDB’s changes feed straight into a raw JSON table in PostgreSQL. From there, dbt incrementally extracts the raw data into a series of schemas suitable for querying. Application builders will configure dbt modules to perform bespoke data transforms that extract useful schemas out of their unique JSON documents.

Once the data warehouse pipeline is complete, the final piece of the puzzle is supporting an open-source, self-hosted dashboard tool. The leading contender is Apache Superset, which will give application developers full control over the dashboards and allow deployments to keep full control over their own data, from collection to dashboard.

Development has already started, and initial performance comparisons on production data have shown that the new approach scales well. If you’re interested in trying out a beta version of the new tool on your data, please let me know, as this will help iron out any issues before the first version is released.

If you have any questions, please respond below. Otherwise keep watching this space for more announcements.

Thanks @gareth for this announcement and the work on this :+1:
Is it possible to bundle this new dashboard tech stack in a single docker-compose file? I.e. mainly Logstash, PostgREST, dbt and PostgreSQL.

Yes, absolutely! Exactly what this will look like is still up for discussion. For example, best practice is for databases such as PostgreSQL to be managed separately. However, the rest of the stack should run with a single docker-compose up command.

Sounds awesome :+1: Thanks, @gareth