Form versioning

Hi community,
We have an issue with the form versioning feature. I’ve been comparing GitHub file changes (diffs) and form version sha256 values for the patient_assessment.xlsx file.
Of the last 20 deploys to production, 7 modified this file, so we expect to find 7 distinct sha256 values over that period. Instead we find 20, a new sha256 per deploy. In addition, comparing deploy timestamps with xml_version.time values, each deploy is followed a few seconds later by an xml_version timestamp in the form doc in CouchDB.

Looking at the deployment runs, I can see that the file patient_assessment.xml is uploaded even when the corresponding xlsx has no changes (that is, when the GitHub diff shows no changes to the file).

Our problem is that we find more sha256 version values than real changes in the xlsx files. We have proposed uploading the new xlsx along with the new xml using the cht-conf tool, and storing the replaced xml/xlsx instead of overwriting the old one with the new one. But our idea does not seem to help relate the xml sha256 to the xlsx file that generated the xml file. Currently we need to use GitHub and xlsx comparison to map several sha256 versions to the same version id that we generate, so that data analysts can access the xlsx that describes the form used to submit a given piece of data.

We can identify the commit that was deployed in a deployment where the xml changed and was uploaded to the CHT by cht-conf. However, if we download the xlsx file from this commit and compare it with the xlsx file downloaded from the commit of the previous deploy, we get identical xlsx files. We use the xlCompare.com online tool to compare the files.

Have you had any feedback about this kind of behavior from other projects?
Could you help me figure out what is going on?
Thanks

@bamatic - oh no! I’m sorry to hear you’re having an issue with the hashes and timestamps of uploaded forms :frowning:

Doing a quick check, I don’t see any outstanding issues for folks facing this problem, but I’d like to know more about your situation! Are you able to reproduce the issue with a few simple steps? Ideally if we can reproduce the issue, we can more easily debug it.

Also, does it depend on the specific version of cht-conf you use? Which version are you on?

All the same, I’ll take some time over the coming day to try and reproduce your findings, but I wanted to inquire before doing so in case you had some more info that might help.

Hi @mrjones thanks very much,
Here in Mali today is the Aïd El Fitr holiday, and other countries will celebrate it tomorrow, so I will work on this later this week and come back to you with more details.

@bamatic - no worries about the break for holiday!

That said, I believe I’ve gotten to the bottom of the behavior you’re seeing. The main issue is that cht-conf has you install Medic’s version of pyxform. This version is behind its upstream, so it is missing a number of features. One feature it’s missing is deterministic ordering of XML attributes when converting a form.

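What this means is that when you run convert-app-forms or convert-contact-forms on the exact same .xlsx file, it can output a different .xml file each time. Each different output gets a different sha256, even though the form itself has not changed. On very simple forms there may be only 3 or 4 possible combinations. On very complex forms, there are so many possible combinations that you will effectively get a new one on every conversion.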
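
To illustrate the mechanism: two XML documents that differ only in attribute order are equivalent as XML but differ as bytes, so their sha256 values differ. Here is a minimal Python sketch of that. The XML fragments are made up for illustration and are not real pyxform output, and `canonicalize` needs Python 3.8 or newer. It only demonstrates the effect; it does not change how the CHT computes the form version hash.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Hypothetical fragments: the same element with its attributes in a different order.
XML_A = '<bind nodeset="/delivery/name" type="string" required="true()"/>'
XML_B = '<bind required="true()" type="string" nodeset="/delivery/name"/>'


def sha256_text(text: str) -> str:
    """Hash the text exactly as written, byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_sha256(xml_text: str) -> str:
    """Hash the C14N 2.0 form, which writes attributes in a fixed sorted order."""
    return sha256_text(canonicalize(xml_text))


# Raw hashes differ because the bytes differ.
raw_differs = sha256_text(XML_A) != sha256_text(XML_B)
# Canonical hashes match because the attribute order is normalized first.
canonical_matches = canonical_sha256(XML_A) == canonical_sha256(XML_B)
print(raw_differs, canonical_matches)
```

If you need to group several sha256 values back to one real form version in the meantime, comparing a canonical form like this might be easier than comparing the xlsx files by hand, though I have not tried it against real converted forms.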

One way to work around this is to explicitly convert forms only when they’ve changed. So instead of calling convert-app-forms like this:

cht --url=https://medic:password@cht.server.com convert-app-forms upload-app-forms

You can specify to ONLY convert and upload one form, in this case delivery:

cht --url=https://medic:password@cht.server.com convert-app-forms upload-app-forms -- delivery

You can chain multiple forms at the end, separating them by spaces if you’d like.
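
If you want to automate this in a deploy script, one option is to derive the form names from the files that changed between the previously deployed commit and the new one. Below is a rough Python sketch under a few assumptions: app forms live under forms/app/ as .xlsx files, the form name is the file name without its extension, and the list of changed paths comes from something like `git diff --name-only` between the two commits. The helper names are made up for illustration and are not part of cht-conf.

```python
from pathlib import PurePosixPath
from typing import List

APP_FORMS_DIR = PurePosixPath("forms/app")  # assumed project layout


def changed_app_forms(changed_paths: List[str]) -> List[str]:
    """Return sorted, de-duplicated names of app forms whose .xlsx changed."""
    names = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.parent == APP_FORMS_DIR and path.suffix == ".xlsx":
            names.add(path.stem)
    return sorted(names)


def build_cht_command(url: str, forms: List[str]) -> List[str]:
    """Build the argument list for the command above; empty if nothing changed."""
    if not forms:
        return []
    return ["cht", f"--url={url}", "convert-app-forms", "upload-app-forms", "--", *forms]


# Example input, as it might come from `git diff --name-only`.
changed = [
    "forms/app/delivery.xlsx",
    "forms/app/patient_assessment.xlsx",
    "forms/app/delivery.properties.json",
    "tasks.js",
]
forms = changed_app_forms(changed)
command = build_cht_command("https://medic:password@cht.server.com", forms)
print(" ".join(command))
```

Note that this sketch only looks at .xlsx changes. If only a form's properties file or media changed, you would need to widen the filter, and contact forms would need the equivalent contact form actions.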

For sure, the much better path forward is to fix pyxform to be deterministic per the ticket.
