Decreasing the number of reports modified by a big move-contact operation

I definitely recognize the challenge of moving contacts with large numbers of historical records. Just wanted to add some of my initial thoughts here!

I think you are right that implementing something like modify_purged_reports_when_move_contacts would require some kind of association between particular types of records and roles (since there is no universal concept of a doc being “purged” that is independent of particular roles). This might be more of a paradigm shift than is worth making for just this use case… (unless there was other value we could obtain from that kind of mapping :thinking: )

A --skip-older-than parameter for move-contacts would, I think, be simpler to implement. I think cht-conf would still have to retrieve all the reports (so it can check the date), but I would expect that avoiding updates to a significant percentage of the reports would still have a positive effect on performance (and reduce the overall load on the system caused by the move). The downside, of course, is that the skipped older reports would keep the contact's previous lineage, so their lineage data would be inconsistent with the contact's new location.
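
To make the idea a bit more concrete, here is a rough sketch of the filtering step. It is only an illustration under some assumptions: --skip-older-than is the hypothetical flag from this thread (not an existing cht-conf option), the function name split_reports_by_age is made up, and I am using Python just for readability (cht-conf itself is a Node.js tool). It relies on reports carrying a reported_date in milliseconds since the epoch.

```python
from datetime import datetime, timedelta, timezone


def split_reports_by_age(reports, skip_older_than_days, now=None):
    """Partition reports into (to_update, skipped) around an age cutoff.

    Hypothetical helper: `skip_older_than_days` stands in for the proposed
    --skip-older-than value. Reports are plain dicts with an optional
    `reported_date` in epoch milliseconds.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=skip_older_than_days)
    cutoff_ms = int(cutoff.timestamp() * 1000)

    to_update, skipped = [], []
    for report in reports:
        reported_date = report.get("reported_date")
        # Reports without a date are updated, to err on the side of consistency.
        if reported_date is not None and reported_date < cutoff_ms:
            skipped.append(report)
        else:
            to_update.append(report)
    return to_update, skipped
```

Every report still has to be fetched so its date can be checked, but only the to_update list would need its lineage rewritten and saved back, which is where the reduction in writes (and in downstream replication work) would come from.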

Another idea is to take out of couchdb all purged docs or docs that are older than a given duration, so that un-purging them later would be impossible.

This proposal is something that has come up several times recently and I appreciate it getting raised again here in the context of moving contacts! Permanently “archiving” historical data (so it is no longer in the main CouchDB) could help alleviate a number of performance-related issues on long-running instances.


I know that @mrjones, @iesmail, and I were recently discussing the challenges with moving contacts within the CHT. It is clear that the current tooling/support for moving contacts in the CHT is inadequate. In my opinion, we may need to rethink the approach of de-normalizing the contact lineage into each report. (But, of course, that data is there to make the contact-specific replication work properly. So changing how that data is stored would require updating the fundamentals of the replication algorithm…)
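
For anyone less familiar with why a move touches so many docs, here is a simplified sketch of the de-normalization in question. The assumption is that each report embeds the submitting contact's ancestry as a nested contact → parent → parent chain of _id stubs. The IDs and the replace_lineage helper are made up for illustration (again in Python for readability), and this is not cht-conf's actual implementation.

```python
import copy


def replace_lineage(lineage, moved_id, new_parent):
    """Re-parent `moved_id` inside a nested lineage chain, in place.

    `lineage` is a dict shaped like {"_id": ..., "parent": {...}}.
    Returns True if the moved contact was found and the doc was changed.
    """
    node = lineage
    while node is not None:
        if node.get("_id") == moved_id:
            # Copy so that documents do not share one mutable parent chain.
            node["parent"] = copy.deepcopy(new_parent)
            return True
        node = node.get("parent")
    return False


# A report submitted by "chw-1", who currently sits under "hc-old".
report = {
    "_id": "report-1",
    "type": "data_record",
    "contact": {
        "_id": "chw-1",
        "parent": {"_id": "hc-old", "parent": {"_id": "district-1"}},
    },
}

# Moving "chw-1" under "hc-new" means rewriting the chain stored in the report.
new_parent = {"_id": "hc-new", "parent": {"_id": "district-2"}}
changed = replace_lineage(report["contact"], "chw-1", new_parent)
```

Because the chain is copied into every report, this rewrite (and the resulting doc update) has to happen once per historical report, which is exactly the cost that grows with the age of the instance.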