Community Health Toolkit

Are Materialized Views with dependencies refreshed in order by couch2pg?

The CHT “couch2pg” service refreshes Postgres materialized views at a regular cadence. If I have two materialized views A and B where B is dependent on A - what is the expected behaviour for this refresh? Is A refreshed before B so data properly cascades down the dependency chain? If not, what is the behaviour?

Not at all an answer to your question, but it looks like there’s very little mention of how to configure custom materialized views in our couch2pg docs or on the parent repo. Are there docs you know of that I couldn’t find?

I’d love to use what ever you have, or create what is possibly entirely missing, so how to create mat. views is more clear. A big part of this will be clearly documenting how to resolve dependencies like you’re looking to understand in this thread. I’ll follow along here so the next time this comes up we have good docs around it!

Thanks thanks!

The function we create is to refresh matviews is declared here. As I read it there is no order just a generic loop on the matviews FOR matview IN SELECT matviewname FROM pg_catalog.pg_matviews LOOP.

Reading the docs on postgres the call we make to EXECUTE format('REFRESH MATERIALIZED VIEW CONCURRENTLY %I', matview.matviewname); with the parameter CONCURRENTLY is what I think answers your question. It doesn’t lock the matview so connections are still allowed to it while being refreshed. Which makes me wonder if you could run into incorrect data as it is refreshing if it were a large table.

CONCURRENTLY
Refresh the materialized view without locking out concurrent selects on the materialized view. Without this option a refresh which affects a lot of rows will tend to use fewer resources and complete more quickly, but could block other connections which are trying to read from the materialized view. This option may be faster in cases where a small number of rows are affected.

This option is only allowed if there is at least one UNIQUE index on the materialized view which uses only column names and includes all rows; that is, it must not be an expression index or include a WHERE clause.

This option may not be used when the materialized view is not already populated.

Even with this option only one REFRESH at a time may run against any one materialized view.

I have a postgres environment setup and would be happy to test it out if you can provide me some matviews.

@Nick I think you’re right. It looks like they all just refresh at the same time, so if there is interdependence then the result of how new data is handled would be non-deterministic.

Here are some example mat views (sorry, don’t know of public ones relevant to CHT’s app data)

It looks like they all just refresh at the same time

They are refreshed serially based on what postgres returns when we query about the matviews present in a database.

We could implement this in couch2pg to guarantee the correct ordering during refreshes.