What is the recommended approach to monitoring the details of Couch performance?

jkuester · December 22, 2022, 3:51pm

I am looking for a way to holistically evaluate the health of a Couch instance. I see that Couch's _stats and _system endpoints, as well as the CHT monitoring API, expose lots of interesting data. This data would be most helpful if it were compiled into a longitudinal view so you can see changes over time.

My question is: can anyone share their general approach for polling and compiling this data? What software do you use? How often do you poll? Which endpoints do you collect data from? What values are most useful to watch?

(I think what I am dreaming of is the unfinished second half of these monitoring docs that gives the practical approach for what tech to use…)

Thanks!

jkuester · December 23, 2022, 10:57pm

Well, I have created my own example configuration that provides something of an answer to this question. Basically, it is the set of config files necessary to spin up a CouchDB instance wired up to Prometheus via the couchdb-prometheus-exporter. Prometheus periodically pulls data from Couch (through the exporter) and stores it in a queryable format. It also provides a basic web interface for graphing the results of queries. (My understanding is that you would want to use something like Grafana to build more in-depth reporting dashboards from the Prometheus data. However, the basic graphing functionality in Prometheus works great for evaluating/debugging.)
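
To give a feel for the kind of polling involved, here is a minimal Python sketch that fetches Couch's stats once and flattens the nested JSON into a single timestamped sample. It is only an illustration under stated assumptions, not what the exporter actually does internally. The function names (flatten_stats, fetch_json, poll_once) are hypothetical, and it assumes the stats are reachable at /_node/_local/_stats with basic auth and that each metric is an object with "value" and "type" keys.

```python
import base64
import json
import time
import urllib.request


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def flatten_stats(node, prefix=""):
    """Flatten nested stats JSON into {"dotted.metric.name": number}.

    Assumes each metric is a dict with "value" and "type" keys, and that
    histogram metrics carry a dict of sub-values (min, max, ...).
    """
    flat = {}
    for key, child in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if not isinstance(child, dict):
            continue
        if "value" in child and "type" in child:
            value = child["value"]
            if _is_number(value):
                flat[name] = value
            elif isinstance(value, dict):
                # Histogram: keep only the plain numeric sub-values.
                for sub_key, sub_value in value.items():
                    if _is_number(sub_value):
                        flat[f"{name}.{sub_key}"] = sub_value
        else:
            # Not a metric yet, so descend into the next grouping level.
            flat.update(flatten_stats(child, name))
    return flat


def fetch_json(url, user, password):
    """GET a URL with basic auth and parse the JSON response body."""
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def poll_once(base_url, user, password):
    """Return one timestamped, flattened sample of the node's stats."""
    # Assumed path: "_local" as the alias for the node answering the request.
    stats = fetch_json(f"{base_url}/_node/_local/_stats", user, password)
    sample = {"timestamp": time.time()}
    sample.update(flatten_stats(stats))
    return sample
```

Calling poll_once on a fixed interval and appending each sample to a file or database would give a crude longitudinal view. Prometheus plus the exporter handles the scheduling, storage, and querying for you, which is why I went that route.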