I wanted to know what the effects (if any) of a rollback in a production environment are. What do we need to keep in mind if we decide to do it?
Please clarify what you mean by rollback.
Rolling back to a previous snapshot of the data? Rolling back an upgrade? Something else?
Rolling back an upgrade.
The impact depends on which version you upgraded to and which version you roll back to, but the biggest concern is rolling back data structures.
All schema changes, whatever they might be, are implemented with migrations. We don’t have documentation about migrations (except for them being mentioned as a development step here: Development Workflow | Community Health Toolkit) but you can take a look at the migrations that exist now: cht-core/api/src/migrations at master · medic/cht-core · GitHub
In short, whenever a change in schema (for one document, one type of document or several types of documents) is necessary, this is achieved with a migration. We try to avoid large migrations (ones that affect a large number of documents), but migrations that affect a reasonable number of documents might exist between versions.
Migrations don’t have rollback algorithms (we don’t develop them).
There’s no guarantee that the previous version of the app will work as expected with the new schema for affected docs. This is not something that we test or support.
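To make the point about forward-only migrations concrete, here is a minimal sketch in Python. It is not taken from cht-core: the document fields (`first_name`, `last_name`, `name`, `schema_version`) and both functions are hypothetical, invented only to show why an older app version can break on migrated docs and why a reverse migration is not always possible.

```python
# Hypothetical document shapes and function names, for illustration only.
# None of this is real cht-core code.

def migrate_contact(doc: dict) -> dict:
    """Forward migration: fold first_name/last_name into a single name field."""
    if doc.get("schema_version", 1) >= 2:
        return doc  # already migrated, so re-running is a no-op
    migrated = {k: v for k, v in doc.items()
                if k not in ("first_name", "last_name")}
    migrated["name"] = f"{doc['first_name']} {doc['last_name']}".strip()
    migrated["schema_version"] = 2
    return migrated


def old_app_display_name(doc: dict) -> str:
    """How a hypothetical pre-upgrade app version reads the document."""
    return f"{doc['last_name']}, {doc['first_name']}"


original = {"_id": "c1", "first_name": "Mary Anne", "last_name": "Smith"}
migrated = migrate_contact(original)

try:
    old_app_display_name(migrated)
except KeyError as missing_field:
    # The old code expects fields that the migration removed.
    print(f"old version fails on migrated doc, missing {missing_field}")
```

Note that there is no reliable inverse here: from the migrated `name` value alone you cannot tell where the first name ends and the last name begins, so a rollback would have to guess. Real migrations may or may not lose information like this, but without a tested reverse step there is no way to be sure.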
Given the distributed nature of the data, even if you roll back to a previous snapshot of the database, any migrated docs that a CHW has already downloaded will be synced back up to the main database and can cause the same problems as if the server-side data rollback had never happened.
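The sync problem can be sketched with a toy model. This is a deliberate simplification and an assumption on my part: it uses a plain integer `rev` where the higher number wins, which is not how CouchDB revision trees actually work, and the `push` helper and document fields are hypothetical. It only illustrates how a device can bring migrated docs back after a server restore.

```python
import copy


def push(source: dict, target: dict) -> list:
    """Toy one-way sync: copy docs that are new or have a higher rev.

    Returns the ids of the docs that were written to the target.
    """
    changed = []
    for doc_id, doc in source.items():
        if doc_id not in target or doc["rev"] > target[doc_id]["rev"]:
            target[doc_id] = copy.deepcopy(doc)
            changed.append(doc_id)
    return changed


# Server state before the upgrade, plus a backup snapshot of it.
server = {"c1": {"rev": 1, "first_name": "Mary", "last_name": "Smith"}}
snapshot = copy.deepcopy(server)

# The upgrade runs a migration that rewrites the doc on the server.
server["c1"] = {"rev": 2, "name": "Mary Smith"}

# A CHW's device syncs and downloads the migrated doc.
device = {}
push(server, device)

# Rollback: the server is restored from the pre-upgrade snapshot.
server = copy.deepcopy(snapshot)

# The next time the device syncs, the migrated doc comes back.
reintroduced = push(device, server)
print(reintroduced, server["c1"])
```

In this model the restored server ends up holding the migrated shape of `c1` again after a single sync, so restoring a snapshot on the server alone does not undo the migration across the whole deployment.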
I don’t believe we have a comprehensive document or statement about the impact of rollbacks.
Maybe we should. What do you think, @gareth?
@diana Yes, definitely! It looks like there isn’t any documentation on upgrading at all, so that’d be worth adding (why, when, how, etc), and part of that should be how to recover if something goes wrong.
Aside from the technical feasibility, it would be helpful to know why you are looking to roll back an upgrade. Was there a performance issue, or a deprecated feature that you were relying on? That context can help to see if there are alternatives, and also to make sure whatever you are seeing as broken is fixed too.
Sometimes an upgrade results in an unforeseen critical bug that affects the whole project. For instance, when we upgraded to 3.9, we experienced a logout issue which affected all the CHVs. In that kind of scenario, a rollback becomes a good option until the issue is fixed.
Thanks for the extra context, and apologies for the difficulties you experienced. In most cases, fixing a current issue would be less problematic than rolling back a version, and we’ll document that better, along with possible recovery instructions.
With our commitment to testing, smooth upgrades, and responsive support we hope to mitigate any difficulties that would be experienced. When a severe bug is discovered we aim to have a patch released with no delay, and the sooner the issue is raised the quicker we can investigate – as was done when you reported your experience. We hope to avoid similar issues in the future, and appreciate your feedback and ideas to ensure that the CHT suits the needs of your deployments.