Hi @anro
Thanks so much for the detailed response! I can see where your concern is coming from. Excellent research on date_of_birth, and thank you for noticing that this is not documented in our database schema section: Database schema conventions | Community Health Toolkit
I’ve created an issue to address this: Update database schema description to include all "hardcoded" fields · Issue #1183 · medic/cht-docs · GitHub
As far as the migration goes, I think you should not add a migration to the CHT directly, since this will be a one-time run.
If you insist on migrating your data, you can just write a one-off script in whatever language you want; it doesn’t even need to be JavaScript. The script can connect to CouchDB endpoints directly to read and edit documents.
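As a rough sketch of what such a one-off script could look like, here is a single-document update against CouchDB's HTTP API (`GET` then `PUT` on `/{db}/{docid}`). It assumes Node 18+ for the global `fetch`. The environment variable names, the `example_migrated` field and the `transform` function are placeholders I made up for illustration; they are not CHT names.

```javascript
// Placeholders: point these at your own instance. None of these names come from the CHT.
const COUCH_URL = process.env.COUCH_URL || 'http://localhost:5984/medic';
const AUTH = 'Basic ' + Buffer
  .from(`${process.env.COUCH_USER}:${process.env.COUCH_PASS}`)
  .toString('base64');

// Hypothetical change. Replace the body with whatever your data actually needs.
// Returns the updated doc, or null when the doc should be left untouched.
function transform(doc) {
  if (doc.example_migrated) {
    return null; // already updated, skip the write
  }
  return { ...doc, example_migrated: true };
}

// Reads one document, applies transform, and writes it back only if it changed.
async function updateOne(docId) {
  const url = `${COUCH_URL}/${encodeURIComponent(docId)}`;
  const headers = { Authorization: AUTH, 'Content-Type': 'application/json' };

  const res = await fetch(url, { headers });
  if (!res.ok) {
    throw new Error(`GET ${docId} failed with status ${res.status}`);
  }
  const doc = await res.json();

  const updated = transform(doc);
  if (!updated) {
    return false;
  }

  // The body still carries _rev from the read, which CouchDB requires for an update.
  const put = await fetch(url, { method: 'PUT', headers, body: JSON.stringify(updated) });
  if (!put.ok) {
    throw new Error(`PUT ${docId} failed with status ${put.status}`);
  }
  return true;
}

// Usage (not executed here):
// updateOne('some-doc-id').then(changed => console.log(changed ? 'updated' : 'skipped'));
```

Skipping the write when nothing changed matters here, because every write has the view and replication costs described further down.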
If you do choose to create a migration: migrations run once, when the API service starts. If a migration succeeds, it adds an entry to a document called migration-log, so that it will not run again on the next startup.
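To illustrate the run-once behaviour, here is a small sketch. This is not the actual CHT code: `runOnce`, the in-memory `log` object and its `migrations` array are assumptions standing in for the real migration-log document and startup logic.

```javascript
// Illustrative only. `log` stands in for the migration-log doc (assumed shape),
// `migration` is any object with a name and an async run() function.
async function runOnce(log, migration) {
  if (log.migrations.includes(migration.name)) {
    return false; // recorded on a previous startup, so it is skipped
  }
  await migration.run();               // if this throws, nothing gets recorded
  log.migrations.push(migration.name); // recorded only after success
  return true;
}
```

The consequence is that a migration that fails is not recorded, so it would be attempted again on the next startup.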
Disruptions vary depending on how much data you are updating and how much data exists. I would not recommend doing a migration over a database with millions of docs.
From your example, you are using db.medic.allDocs and iterating over all docs in the database. This will never work on an instance with millions of docs: the request will likely time out, or crash the process with an out-of-memory error. If you have a small dataset, this will work fine. Otherwise, you will need to use other endpoints to get your data, and batch your requests.
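One possible way to batch, sketched under assumptions: page through `allDocs` with `limit`, and resume each page from the last id seen using `startkey` plus `skip: 1`. The helper names `pageOptions` and `allDocsInBatches` and the default batch size are made up for this example. `db` is assumed to be a PouchDB-style handle like the `db.medic` in your code.

```javascript
// Builds the allDocs options for one page.
function pageOptions(batchSize, lastId) {
  const opts = { include_docs: true, limit: batchSize };
  if (lastId !== undefined) {
    opts.startkey = lastId; // resume from the last doc already processed...
    opts.skip = 1;          // ...and skip it so it is not processed twice
  }
  return opts;
}

// Yields arrays of docs, at most batchSize at a time, so that only one
// batch is held in memory at any point.
async function* allDocsInBatches(db, batchSize = 1000) {
  let lastId;
  while (true) {
    const { rows } = await db.allDocs(pageOptions(batchSize, lastId));
    if (!rows.length) {
      return;
    }
    yield rows.map(row => row.doc);
    if (rows.length < batchSize) {
      return; // short page means there is nothing left
    }
    lastId = rows[rows.length - 1].id;
  }
}

// Usage (not executed here), writing back only the docs that changed:
// for await (const docs of allDocsInBatches(db.medic, 500)) {
//   const changed = docs.filter(needsUpdate).map(applyUpdate); // your own functions
//   if (changed.length) await db.medic.bulkDocs(changed);
// }
```

Resuming from `startkey` keeps each request cheap, whereas paging with a growing `skip` value gets slower the further you go. Note that `allDocs` also returns design documents, so the filter should exclude anything you don't intend to edit.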
The disruption consists of several things:
- once you edit a document, every view that indexes that document will need to be updated. View catchups can be costly and view requests will not respond until views are up to date.
- once you edit a document, every device (users logged in on mobile phones) that has a copy of that document will need to download it again.
- API startup is delayed until migrations finish. This means that while your migration code is running, your server is completely offline.
The first two points apply even if you run a custom script to update the data.