Allies is excited to announce we’ve begun development of a monitoring and alerting solution using best-in-class tools. The solution is available for CHT 3.12.0 and later and only requires you to run a simple `docker-compose up` command to get started. We say this is “best in class” because it uses Grafana and Prometheus, which are not only easy to configure for cases like the CHT but also come with a lot of out-of-the-box monitoring. Right now we’re focused on just the CHT, but it should be straightforward to add support for Docker, CouchDB, RapidPro and more! Finally, if there are other custom metrics deployments are looking to track, it will be easy to capture and show these without impacting the default ones Medic is shipping.
Right now this feature has a public ticket tracking its active development. If you’re looking to try it out today, head over to this branch in the newly created repository and check out the readme. We welcome any early feedback (it is safe to run against production)! Finally, to give some more background on each of the metrics, there’s a draft of some of the documentation we’re looking to publish.
If you prefer a video demonstration instead, here’s a ~4 minute video walking through setup and initial deployment:
I wanted to follow up on the above post with two announcements:
- The testing branch mentioned above has been released (merged into `main` \o/) and is ready for use in production
- The documentation on how to set this up is now public as well
We welcome any questions or feedback! We’re going to continue to improve CHT monitoring and alerting in the repository, so please keep an eye out for future additions!
Currently our team has access to a Superset installation that uses PostgreSQL as its data source. Data is ingested into these PostgreSQL data sources from the CHT instances with https://github.com/medic/cht-app-monitoring-data-ingest
In cht-watchdog, the data is obtained from the CHT API. My questions are:
- Do Grafana, Prometheus, or the cht-watchdog code access PostgreSQL directly to obtain additional data?
- Can cht-watchdog work without the PostgreSQL database that couch2pg uses as its sink database? If not, which kinds of data does cht-watchdog need to obtain from PostgreSQL that are not available through the monitoring API?
@bamatic great questions! Watchdog is intended to be the spiritual successor of cht-app-monitoring-data-ingest! In Watchdog we want to take the wisdom and insights from app-monitoring, generalize and expand the scope of the monitoring/alerting, and provide everything in a format that is easily accessible to self-hosting partners!
Do Grafana, Prometheus, or the cht-watchdog code access PostgreSQL directly to obtain additional data?
Yes! Watchdog can be configured to connect to your couch2pg database and record metrics from Postgres data. Currently only a single metric (`couch2pg_progress_sequence`) is being recorded, but many more will be added soon (and PRs are welcome!).
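As a rough illustration of what recording a value like `couch2pg_progress_sequence` can involve, the sketch below takes couch2pg progress rows and renders them as Prometheus text-format gauge samples. It is not Watchdog's actual implementation: the table name in `PROGRESS_QUERY`, the `(source, seq)` row shape, the `source` label, and the choice to graph the numeric prefix of the CouchDB sequence (the part before the first `-`) are all hypothetical assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical query; the real couch2pg schema and Watchdog's query may differ.
PROGRESS_QUERY = "SELECT source, seq FROM couchdb_progress"


def seq_number(seq: str) -> int:
    """Return the numeric prefix of a CouchDB update sequence such as '4321-g1AAAA...'.

    Plain integer sequences (e.g. '17') are returned unchanged as ints.
    """
    return int(seq.split("-", 1)[0])


def render_progress(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (source, seq) rows as Prometheus text-format gauge samples."""
    lines = [
        "# HELP couch2pg_progress_sequence Numeric prefix of the last CouchDB sequence copied by couch2pg",
        "# TYPE couch2pg_progress_sequence gauge",
    ]
    for source, seq in rows:
        # Label values must escape backslashes, double quotes and newlines.
        label = source.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'couch2pg_progress_sequence{{source="{label}"}} {seq_number(seq)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example rows as they might come back from running PROGRESS_QUERY.
    rows = [("medic", "4321-g1AAAABXeJzLYWBg"), ("medic-sentinel", "87-g1AAAAB")]
    print(render_progress(rows), end="")
```

Only the numeric prefix is used because the opaque suffix of a CouchDB 2.x+ sequence cannot be plotted; tracking the prefix over time is enough to see whether couch2pg is still making progress.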
Can cht-watchdog work without the PostgreSQL database that couch2pg uses as its sink database?
cht-app-monitoring-data-ingest queries the Postgres DB populated by couch2pg and then stores the recorded metric values back in a Postgres database (where they can be queried by Superset). Watchdog, on the other hand, uses Prometheus to store the historical metrics data. Metric values are recorded by calling the CHT `/monitoring` endpoint as well as by querying the couch2pg DB (as noted above), and are then stored in Prometheus (where they can be queried by Grafana). Watchdog does not store metrics (or do any other create/update) in Postgres.
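To make the flow above a little more concrete, here is a minimal sketch of turning a JSON payload into Prometheus text-format samples that Prometheus could then scrape and store. The sample payload is only loosely shaped like a monitoring response, and the field names, the `cht_` metric prefix, and the flattening rule (join nested keys with `_`, keep only numeric leaves) are illustrative assumptions, not the actual schema of the CHT endpoint or the metric names Watchdog uses.

```python
import json
import re
from typing import Any, Dict, Iterator, Tuple, Union

# Prometheus metric names may only contain [a-zA-Z0-9_:]; we map anything else to "_".
_INVALID = re.compile(r"[^a-zA-Z0-9_]")

Number = Union[int, float]


def flatten(payload: Dict[str, Any], prefix: str) -> Iterator[Tuple[str, Number]]:
    """Yield (metric_name, value) for every numeric leaf in a nested JSON object."""
    for key, value in payload.items():
        name = f"{prefix}_{_INVALID.sub('_', key)}"
        if isinstance(value, dict):
            yield from flatten(value, name)
        elif isinstance(value, bool):
            yield name, int(value)  # booleans become 1 / 0
        elif isinstance(value, (int, float)):
            yield name, value
        # Strings, lists and nulls are skipped in this sketch.


def to_exposition(payload: Dict[str, Any], prefix: str = "cht") -> str:
    """Render the numeric leaves of payload as untyped Prometheus samples."""
    return "".join(f"{name} {value}\n" for name, value in flatten(payload, prefix))


if __name__ == "__main__":
    # Illustrative payload only; not the real /monitoring response schema.
    sample = json.loads(
        '{"version": {"app": "4.2.0"},'
        ' "couchdb": {"medic": {"doc_count": 1520, "fragmentation": 1.8}},'
        ' "outbound_push": {"backlog": 0}}'
    )
    print(to_exposition(sample), end="")
```

With the sample payload this prints `cht_couchdb_medic_doc_count 1520`, `cht_couchdb_medic_fragmentation 1.8` and `cht_outbound_push_backlog 0`; the string-valued `version.app` is dropped because Prometheus samples must be numeric.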
Which kinds of data does cht-watchdog need to obtain from PostgreSQL that are not available through the monitoring API?
The `/monitoring` endpoint provides a good amount of general data about the CHT instance, but there is still quite a lot of useful information that can be mined from the data collected by couch2pg. Pretty much all of the metrics currently being collected from Postgres by `cht-app-monitoring-data-ingest` are not available via the `/monitoring` endpoint. (For example, if you wanted to know how many users have an unsupported WebView version, that data is not available from `/monitoring`, but it can be calculated from the couch2pg data.)
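As a hedged example of the kind of calculation mentioned above, the sketch below counts users whose Android WebView reports a Chrome major version below some minimum. It assumes the user-agent strings have already been pulled out of the couch2pg data as `(user_name, user_agent)` pairs, one latest entry per user; that extraction query, the input shape, and the `min_major` threshold used in the example are hypothetical and do not reflect the CHT's actual supported-version policy.

```python
import re
from typing import Iterable, Optional, Set, Tuple

# Android WebView user agents include a token like "Chrome/90.0.4430.210".
_CHROME = re.compile(r"Chrome/(\d+)\.")


def chrome_major(user_agent: str) -> Optional[int]:
    """Return the Chrome/WebView major version in a user-agent string, if present."""
    match = _CHROME.search(user_agent)
    return int(match.group(1)) if match else None


def count_outdated_users(rows: Iterable[Tuple[str, str]], min_major: int) -> int:
    """Count distinct users whose user agent reports a major version below min_major.

    rows are assumed to already hold one (latest) entry per user; the set only
    guards against exact duplicates. Agents without a Chrome token are ignored.
    """
    outdated: Set[str] = set()
    for user, agent in rows:
        major = chrome_major(agent)
        if major is not None and major < min_major:
            outdated.add(user)
    return len(outdated)


if __name__ == "__main__":
    rows = [
        ("chw_amina", "Mozilla/5.0 (Linux; Android 8.1.0; wv) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Version/4.0 Chrome/70.0.3538.110 Mobile Safari/537.36"),
        ("chw_joseph", "Mozilla/5.0 (Linux; Android 12; wv) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Version/4.0 Chrome/120.0.6099.144 Mobile Safari/537.36"),
    ]
    print(count_outdated_users(rows, min_major=90))  # → 1
```

In a Watchdog-style setup, a count like this would be exported as a single gauge so Grafana can chart it and alert when it rises.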