CHT-Textit metric monitoring

Hello community

We have set up our 2WT with TextIt and an external SMS gateway, and we have a couple of questions about how best to monitor the setup: we want to be sure all the components in the system are in sync and talking to each other, and to be notified of any breakpoint or breach. We want to rely mainly on cht-watchdog, since we won’t have direct access to the systems after handing them over to the client’s technical team, but we would still like to keep monitoring the components closely via Grafana alerts.

  • What would be the ideal metrics to keep tabs on and visualize in the dashboards for this? We already have the ones from the monitoring API, but we were wondering if there are others out there.

  • How can I monitor and visualize that CHT is talking successfully to TextIt via the outbound feature (and vice versa), so that I am notified in case of a failure?

cc @mrjones @michael @jkuester @diana

Sorry for the delayed response @cliff (many of us have been OOO for summer break)!

I don’t know that I have a perfect answer to your question, but I can say that the config in cht-watchdog is designed to leverage all the data currently available from the CHT monitoring API. And that is pretty much all the metric data that is available out-of-the-box. If you are using couch2pg, you could set up some custom metrics if there are particular things you want to watch out for. If you are on 4.3.0+, it would be worth checking out the cht_api_* metrics being collected, since those will give a pretty good picture of how the server is performing. For more low-level system metrics, you should consider setting up cAdvisor on the CHT host server.
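To see which cht_api_* metrics your own Prometheus instance is actually collecting, a rough sketch like the following may help. It assumes the Prometheus HTTP API is reachable from wherever you run it (the base URL is something you would supply), and it uses the standard label-values endpoint to list metric names. The function names are made up for illustration and are not part of cht-watchdog.

```python
from __future__ import annotations

import json
import urllib.request


def metric_names_with_prefix(response: dict, prefix: str = "cht_api_") -> list[str]:
    """Filter a Prometheus label-values response down to names with the given prefix."""
    if response.get("status") != "success":
        raise ValueError(f"unexpected Prometheus response: {response!r}")
    return sorted(name for name in response.get("data", []) if name.startswith(prefix))


def fetch_metric_names(prometheus_url: str, prefix: str = "cht_api_",
                       timeout: float = 10.0) -> list[str]:
    """Ask Prometheus for every metric name it knows, then keep the matching ones.

    prometheus_url is an assumption: pass the base URL of your own Prometheus.
    """
    url = prometheus_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names_with_prefix(json.load(resp), prefix)
```

Calling `fetch_metric_names` with your Prometheus base URL should return the sorted list of matching names, which you can then pick from when building extra Grafana panels.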

How can I monitor and visualize that CHT is talking successfully to TextIt via the outbound feature (and vice versa), so that I am notified in case of a failure?

The existing Grafana panel for the cht_outbound_push_backlog_count metric in watchdog is probably the best thing to watch here. If there is a problem pushing messages to TextIt, I think this is where you would see it (since the backlog would continue growing if messages could not be pushed).
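If you want to read the same number outside Grafana (for a quick check or a custom script), here is a minimal sketch that runs an instant query for cht_outbound_push_backlog_count against the Prometheus HTTP API. The Prometheus base URL and the helper names are assumptions for illustration, and the labels you get back depend on your own scrape config.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

BACKLOG_METRIC = "cht_outbound_push_backlog_count"


def parse_instant_vector(response: dict) -> list[tuple[dict, float]]:
    """Turn a Prometheus instant-query response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response!r}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "string value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def fetch_backlog(prometheus_url: str, timeout: float = 10.0) -> list[tuple[dict, float]]:
    """Fetch the current outbound push backlog for every instance Prometheus scrapes.

    prometheus_url is an assumption: pass the base URL of your own Prometheus.
    """
    query = urllib.parse.urlencode({"query": BACKLOG_METRIC})
    url = f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

Each returned pair holds the label set for one scraped CHT instance and its current backlog count.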


There is also the rapidpro2pg service, although it hasn’t been updated in the last two years. I haven’t tried it myself yet.

Since TextIt is a hosted RapidPro service, it should work similarly.

In the CHT docs, there is a single line about it:

  • Remember to set up the rapidpro2pg service to get your RapidPro workspace data over to the Postgres database.

sure no problem :slightly_smiling_face:

Yeah, I have set up cAdvisor on the CHT host server as one of the services in the docker-compose file. Is that approach right?

@jkuester so with this metric, when the number is growing in the Grafana panel, does that mean there is a problem?

oh thanks @binod :upside_down_face:
going to check it out and see how it works

Yeah, I have set up cAdvisor on the CHT host server as one of the services in the docker-compose file. Is that approach right?

Correct! With cAdvisor running on the same Docker host as the CHT instance, you can get lots of useful data about the CHT containers themselves.

So with this metric, when the number is growing in the Grafana panel, does that mean there is a problem?

Right. If the backlog is non-zero, that is not necessarily a problem since there may just be queued messages that the system has not had time to send yet. However, if the number is very large and continuing to grow, there is definitely a problem.
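To make that rule of thumb concrete, here is a small sketch of the "large and still growing" check over a window of backlog readings. The thresholds are placeholders I picked for illustration, not values from cht-watchdog, so tune them to your own message volume.

```python
from __future__ import annotations


def backlog_looks_stuck(samples: list[float], min_backlog: float = 100,
                        min_growth: float = 1) -> bool:
    """Return True when the backlog is both large and still growing.

    samples: backlog counts ordered oldest to newest (e.g. one per scrape).
    min_backlog and min_growth are hypothetical thresholds, not CHT defaults.
    """
    if len(samples) < 2:
        return False  # not enough history to talk about a trend
    latest = samples[-1]
    if latest < min_backlog:
        return False  # a small non-zero backlog is just normal queueing
    # Growing means the count never drops inside the window and ends higher than it began.
    never_shrinks = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_shrinks and (latest - samples[0]) >= min_growth
```

For example, readings of 120, 150, 190, 240 would be flagged, while 500, 480, 300 would not, since that backlog is draining even though it is large.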
