Scalable, near real time sync for Dashboards: CHT Sync and CHT Pipeline

Philip_Mwago · July 20, 2023, 5:05pm

Medic has built CHT Sync and CHT Pipeline to enhance the user experience with analytics. The goal of these two tools is to get data quickly and efficiently from CouchDB into PostgreSQL for immediate dashboard visibility.

CHT Sync is a service running on the server that listens for changes in the CHT database and updates the analytics database accordingly. CHT Pipeline defines the data models that transform the raw CouchDB data into a more useful format, which can then be queried to build dashboards. The two tools are meant to run together to produce data for high-quality dashboards.
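
To make the "listen for changes, then update" idea concrete, here is a minimal sketch of applying one batch from a CouchDB-style `_changes` response to a target store. It is not CHT Sync's actual code: the function name `apply_changes` is hypothetical, a plain dict stands in for the analytics database, and the sample feed assumes the changes were requested with `include_docs=true` so each entry carries a `doc`.

```python
from typing import Any, Dict, Optional


def apply_changes(store: Dict[str, Dict[str, Any]], feed: Dict[str, Any]) -> Optional[str]:
    """Apply one batch of CouchDB-style changes to `store` (a stand-in for PostgreSQL).

    Returns the feed's last_seq so a caller could resume from that point.
    """
    for change in feed.get("results", []):
        doc_id = change["id"]
        if change.get("deleted"):
            # The document was deleted upstream, so drop it from the target too.
            store.pop(doc_id, None)
        elif "doc" in change:
            # Insert the document, or overwrite it with the latest revision.
            store[doc_id] = change["doc"]
    return feed.get("last_seq")


# Illustrative feed: one new report and one deletion.
feed = {
    "results": [
        {"seq": "1-a", "id": "report-1", "changes": [{"rev": "1-x"}],
         "doc": {"_id": "report-1", "_rev": "1-x", "type": "data_record"}},
        {"seq": "2-b", "id": "report-2", "changes": [{"rev": "2-y"}], "deleted": True},
    ],
    "last_seq": "2-b",
}
analytics_store = {"report-2": {"_id": "report-2"}}
checkpoint = apply_changes(analytics_store, feed)
```

Returning the last sequence value is what would let a listener pick up where it left off after a restart, rather than replaying the whole database.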

Dashboards

Because the sync is near real time, the dashboards show accurate, up-to-date data: CHT Sync continuously updates the analytics database, so the visualizations built on it stay current. The data is made generally available in PostgreSQL, where any other system can read it. We currently recommend Superset as the dashboard tool because of its feature set, ease of use, and friendly license.

As with the CHT in general, we recommend running CHT Sync with Docker for easier operation. The README has instructions on getting started. It’s important to note that CHT Sync operates independently of your CHT app and does not depend on any specific version of the CHT. This means you can use it now, no matter what version of the CHT your app currently runs on.

Note: CHT Sync is not a web application, and it does not have a user interface.

CHT Sync and Couch2pg

CHT Sync serves as a complete substitute for Couch2pg as both tools are designed to retrieve data from CouchDB and replicate it to PostgreSQL. However, they differ in their implementation and scalability potential.

Making the data reportable

Replicating a document to PostgreSQL takes just a few milliseconds from its creation. However, the replicated document is stored in a JSON column, which makes it unsuitable for direct use in dashboards or analytics. This is where CHT Pipeline comes into play: it takes the data from the JSON column and converts it into more usable tables or views, depending on the requirements.
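
As a rough illustration of what "converting the JSON column into a more usable shape" means, the sketch below flattens one raw document into a flat row. The real transformations are dbt models running in the database, not Python. The function name `flatten_report`, the output column names, and the sample document (including the `patient_id` field) are assumptions made up for this example.

```python
import json
from typing import Any, Dict


def flatten_report(raw_doc: str) -> Dict[str, Any]:
    """Turn one raw JSON document into a flat, column-like row."""
    doc = json.loads(raw_doc)
    fields = doc.get("fields", {})  # nested form answers, if present
    return {
        "uuid": doc.get("_id"),
        "form": doc.get("form"),
        "reported_date": doc.get("reported_date"),
        "patient_id": fields.get("patient_id"),  # hypothetical field
    }


# Illustrative document as it might sit in the JSON column.
raw = json.dumps({
    "_id": "report-1",
    "type": "data_record",
    "form": "pregnancy",
    "reported_date": 1689872700000,
    "fields": {"patient_id": "12345"},
})
row = flatten_report(raw)
```

A dashboard can then filter and aggregate on `form` or `reported_date` directly instead of digging into nested JSON on every query.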

CHT Pipeline runs in the background at an interval you can configure based on how fresh the data needs to be. Each run executes the dbt transformations and normalizes the data. By default, the interval is set to 5 seconds, but for complex transformations that take longer, we recommend raising the interval to a few minutes.
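
The sketch below shows why a short interval stops helping once a transformation takes longer than the interval itself. It is a generic fixed-interval loop, not the actual scheduler: `run_at_interval` and its parameters are hypothetical, and whether the real tool waits the full interval after each run or only the remainder is not stated here.

```python
import time
from typing import Callable, List


def run_at_interval(
    task: Callable[[], None],
    interval_seconds: float,
    runs: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `task` `runs` times, aiming for one start every `interval_seconds`."""
    for _ in range(runs):
        started = clock()
        task()
        elapsed = clock() - started
        # If the task overran the interval, there is nothing left to wait for.
        sleep(max(0.0, interval_seconds - elapsed))


# Tiny demo with a very short interval so it finishes quickly.
calls: List[int] = []
run_at_interval(lambda: calls.append(1), interval_seconds=0.01, runs=3)
```

With a 5-second interval and a transformation that takes 7 seconds, a loop like this never pauses between runs, so the database is transformed back to back. Raising the interval to a few minutes gives it room to breathe.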

To better assess the performance of the system, the team plans to conduct performance testing soon. This testing will provide more concrete insights into how the system performs with “real-world” data and use cases.

You can read more on CHT Sync and CHT Pipeline on our documentation website.

ghmwesigwa · August 2, 2024, 10:48am

The design choice to separate the data synchronization (CHT Sync) from the data transformation (CHT Pipeline) is particularly commendable, as it allows for a scalable architecture. Overall, this implementation exemplifies a well-thought-out strategy to address the complexities of real-time data handling and transformation in a robust and scalable manner. Kudos!