Migrate data from one form structure to another

With the timelines being what they are, we’ve unfortunately had to resort to using our non-prod environment as a “semi” prod in the interim.

We’ve changed two main things:
  1. Age is now being dynamically calculated, whereas before it was persisted to the db.
  2. We’ve changed our dob variable to date_of_birth to align with the CHT, in order to utilize the ageInYears function made available to the context object in the <form_name>.properties.json files (see the sketch after this list).
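For illustration, a form’s properties file can then gate the form on the contact’s age with an expression along these lines. This is a sketch only: the flags, contact type and threshold are placeholders, and it assumes ageInYears reads date_of_birth from the selected contact:

{
  "context": {
    "person": true,
    "place": false,
    "expression": "contact.contact_type === 'hhm' && ageInYears(contact) >= 12"
  }
}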

The problem is that some records have already been captured in the old format, and with us gearing up for a new deployment, we’re unsure how to approach a migration that minimizes the impact on the integrity of the existing data.

How would one go about writing some sort of migration to handle the transformation from one form structure to an updated one?

In our case the dob → date_of_birth change only affects the person contacts in the hierarchy with contact_type = hhm.

This approach would also be useful to keep in mind as our requirements/specs change going forward.

I see there’s a migrations folder in the api/src/ folder. Would that be the location to house such migration scripts, and if so, how does one trigger the migration to run?

Hi @Anro

What are your concerns on keeping the data as is, without changes?

As your project grows, chances are high that you will need to update your forms many times. As data size grows, migrating becomes very impractical and even highly disruptive if a large number of docs need updates.
Your configuration should be designed so that it can handle old data formats, and the CHT already embraces a flexible structure.

We no longer create costly data migrations. When a new migration is added, it only affects a few strategic docs - the settings doc or translation docs, for example. Any new migration will always update a known and limited set of documents, to avoid large disruptions when upgrading.

Hi @diana

Thank you for taking the time to get back to me.

The main concern is being able to hook into existing CHT functionality without having to maintain source file changes ourselves - which would also require a new image to be built.
Take the webapp/src/ts/services/xml-forms-context-utils.ts for example.
By having our date capture field named date_of_birth (instead of dob) we gain the ability to perform date calculation in the <app_form_name>.properties.json file - a feature we require for our new-born-child and pregnancy-and-womans-health forms.

I’m sure there are a few other benefits to aligning our properties the way the CHT caters for them, but at this time I’m not too well versed in all the functionality provided out of the box.


I’ve read (and I’m speaking under correction) that with non-relational databases one usually caters for these changes on the client side.
At this time though we’re not yet in full operation, but in a phase where we’re almost ready to go live - so we have explored this option as a possible solution to this issue and perhaps future scenarios.
It is with this in mind that I am attempting to keep the code base as clean as possible before more change requests come in - as I can imagine file complexity can increase quite significantly.
Especially since there’s a possibility that changes need to be made in multiple files for a change to take effect.
For instance, changing dob → date_of_birth required a change in contact-summary.js, household-create (since we’re creating a person in there), the household_member-create and -edit files, and all the app forms that reference that value for certain calculations.

Writing a migration script would fix the problem by simply “renaming” the field on all existing contact_type='hhm' records in the db, and would not require each of the files mentioned above to hold an additional reference to a dob field that may no longer apply as time goes on and that would also need to be maintained.

Perhaps I’m not yet informed enough on how CHT uses the flexible structure, could you please provide an example?


The disruptions mentioned, do you perhaps have some more info on how impactful that may be?

At the moment the “solution” is the following file content placed in the “migrations” folder:

const db = require('../db');

module.exports = {
  name: 'update-dob-in-hhm',
  created: new Date(2023, 9, 13, 17, 6, 0, 0), // note: JS months are zero-based, so 9 = October
  run: async function() {
    console.log('************** RUN MIGRATION **************');

    // Note: loading every doc in one request only works on a small dataset.
    const results = await db.medic.allDocs({ include_docs: true });

    const updatedDocs = [];
    for (const row of results.rows) {
      const doc = row.doc;
      if (doc.contact_type === 'hhm' && Object.prototype.hasOwnProperty.call(doc, 'dob')) {
        doc.date_of_birth = doc.dob;
        delete doc.dob;
        updatedDocs.push(doc);
        console.log(`${doc.name} queued for migration`);
      }
    }

    if (updatedDocs.length) {
      // Write all updates in one request and fail the migration if any doc was rejected.
      const responses = await db.medic.bulkDocs(updatedDocs);
      const errors = responses.filter(response => response.error);
      if (errors.length) {
        throw new Error(`Failed to migrate ${errors.length} doc(s): ${JSON.stringify(errors)}`);
      }
    }

    console.log('*****************************');
  },
};

Which gets executed by the following file:

const migration = require('./2023-09-13_dob_change.js');

console.log('Starting migration...');

// run() returns a promise, so wait for it to settle instead of passing a callback.
migration.run()
  .then(() => console.log('Migration complete.'))
  .catch(err => console.error('Error running migration:', err))
  .finally(() => console.log('Migration script execution finished.'));

The last file gets triggered manually by running node <file_name>.js when testing locally.
With our current deployment environment though, we don’t have all the project files available.
Since these files rely on ../db being present, this adds some complexity.
Do you perhaps have an idea of how to approach this in a “flat” manner?

Hi @anro

Thanks so much for the detailed response! I can see where your concern is coming from. Excellent research on date_of_birth, and thank you for noticing that this is not documented in our database schema section: Database schema conventions | Community Health Toolkit

I’ve created an issue to address this: Update database schema description to include all "hardcoded" fields · Issue #1183 · medic/cht-docs · GitHub

As far as the migration goes, I think you should not add a migration to the CHT directly, since this will be a one-time run.
If you insist on migrating your data, you can just write a one-off script using whatever software you want - it doesn’t even need to be JavaScript - and you can connect to CouchDB endpoints directly to read and edit documents.
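As a rough illustration, a standalone script that talks to CouchDB over HTTP could look something like the sketch below. It assumes Node 18+ for the built-in fetch, and the COUCH_URL, COUCH_USER and COUCH_PASSWORD environment variable names are just placeholders:

// One-off rename of dob -> date_of_birth on contact_type 'hhm' docs,
// talking to CouchDB's HTTP API directly (no CHT source files required).
// Sketch only: assumes COUCH_URL ends with the db name, e.g. https://host:5984/medic
const baseUrl = process.env.COUCH_URL;
const auth = 'Basic ' + Buffer.from(
  `${process.env.COUCH_USER}:${process.env.COUCH_PASSWORD}`
).toString('base64');

const request = async (path, options = {}) => {
  const res = await fetch(`${baseUrl}${path}`, {
    ...options,
    headers: { 'Content-Type': 'application/json', Authorization: auth },
  });
  if (!res.ok) {
    throw new Error(`${path} failed with ${res.status}: ${await res.text()}`);
  }
  return res.json();
};

(async () => {
  // Mango query for household members that still carry the old field.
  // For larger datasets, page through the results with the returned bookmark.
  const { docs } = await request('/_find', {
    method: 'POST',
    body: JSON.stringify({
      selector: { contact_type: 'hhm', dob: { $exists: true } },
      limit: 1000,
    }),
  });

  const updated = docs.map(({ dob, ...rest }) => ({ ...rest, date_of_birth: dob }));

  // Write the updates in one request and report any per-doc failures (e.g. conflicts).
  const results = await request('/_bulk_docs', {
    method: 'POST',
    body: JSON.stringify({ docs: updated }),
  });
  const failed = results.filter(r => r.error);
  console.log(`Updated ${results.length - failed.length} docs, ${failed.length} failed`);
})();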

If you do choose to create a migration: migrations run once when the API service starts. If a migration succeeds, an entry is added to a document called migration-log so that it will not run again on the next startup.
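For reference, that bookkeeping document is shaped roughly like the following - treat the exact fields as an approximation and check the doc on your own instance:

// GET <server>/medic/migration-log (approximate shape - verify on your instance)
{
  "_id": "migration-log",
  "type": "meta",
  "migrations": [
    "update-dob-in-hhm"
  ]
}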

Disruptions vary depending on how much data you are updating and how much data exists. I would not recommend doing a migration over a database with millions of docs.

From your example, you are using db.medic.allDocs and iterating over all docs in the database. This will never work on an instance with millions of docs; the request will likely time out or crash the process with an out-of-memory error. You will need to use other endpoints to get your data, and batch your requests (see the sketch below). If you have a small dataset, this will work fine.
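A batched version of the loop inside run() could look roughly like this - a sketch only, reusing the same db.medic handle, with an arbitrary batch size:

// Process docs in pages instead of loading the entire database at once.
const BATCH_SIZE = 100;

const migrateInBatches = async () => {
  let startkey;
  while (true) {
    const options = { include_docs: true, limit: BATCH_SIZE };
    if (startkey) {
      options.startkey = startkey;
      options.skip = 1; // don't re-process the last doc of the previous page
    }
    const results = await db.medic.allDocs(options);
    if (!results.rows.length) {
      break;
    }

    const toUpdate = results.rows
      .map(row => row.doc)
      .filter(doc => doc && doc.contact_type === 'hhm' && doc.dob);
    toUpdate.forEach(doc => {
      doc.date_of_birth = doc.dob;
      delete doc.dob;
    });
    if (toUpdate.length) {
      await db.medic.bulkDocs(toUpdate);
    }

    startkey = results.rows[results.rows.length - 1].id;
  }
};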
The disruption consists of several things:

  1. once you edit a document, every view that indexes that document will need to be updated. View catchups can be costly and view requests will not respond until views are up to date.
  2. once you edit a document, every device (users logged in on mobile phones) that has a copy of that document will need to download it again.
  3. API startup is delayed until migrations run. This will mean that while your migration code is running, your server is completely offline.

Points 1 & 2 apply even if you run a custom script to update the data.