Cht-conf convert forms changes xml even when xlsx form does not change

An issue that I’m having is that cht convert-contact-forms changes the .xml output even when the source .xlsx file does not change. This is a problem for version control because it makes it difficult to see which forms have changed.

Steps to reproduce:

  1. download person-create.xlsx
  2. cht convert-contact-forms
  3. git add .
  4. git commit -m "convert forms"
  5. cht convert-contact-forms
  6. git diff

Expected behavior: no changes in person-create.xml
Actual behavior: changes in person-create.xml

cht-conf version 3.18.3
macOS 13.1

:disappointed: Yes, sadly our current version of pyxform does not deterministically sort the xml attributes. So, regenerating the xml results in a randomized sorting that can be different each time the form is converted (which as you have noted is a huge pain for version control).
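
Until that is fixed, one way to tell a real form change from a pure attribute reordering is to compare the canonical (C14N) form of the old and new XML, since canonicalization writes attributes in a fixed order. The following is only a minimal sketch: it assumes Python 3.8+ (for `xml.etree.ElementTree.canonicalize`), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def same_after_canonicalization(old_xml, new_xml):
    """Return True if the two XML strings are identical once canonicalized.

    Canonical XML writes attributes in a fixed order, so two outputs that
    differ only in attribute order compare as equal here.
    """
    return ET.canonicalize(old_xml) == ET.canonicalize(new_xml)


# Hypothetical example: the same element with its attributes in a different order.
before = '<bind nodeset="/data/name" type="string" required="true()"/>'
after = '<bind required="true()" type="string" nodeset="/data/name"/>'
print(same_after_canonicalization(before, after))
```

A check like this reads the files without modifying them, so the generated XML stays exactly as cht-conf produced it.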

The “good” news is that this has been fixed in the upstream version of pyxform, and an issue has been logged to pull these changes into medic/pyxform!

Thanks! Any idea when that issue might be fixed?

Unfortunately I do not have any specific timeline, but I can say that there is nothing blocking that issue from being addressed. It is just a matter of an alignment of dev time/priorities (or someone submitting a PR) :smile:

Ok thank you.

In case others are having this issue, here is some code I wrote (with ChatGPT :slight_smile: ) that will sort the xml files. This could be used as part of a Git pre-commit hook.

My code isn’t great: the processed xml files don’t work correctly (they fail validation with cht-conf because of namespace issues). But the processed xml files are useful for version control.

import glob
import re
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

# NOTE: the output is meant for diffing only. Child elements are re-ordered
# and namespace prefixes are rewritten, so the files no longer validate.

def pretty_print_xml(xml_string):
    """Re-indent the XML and drop any blank lines."""
    dom = parseString(xml_string)
    pretty_xml = dom.toprettyxml(indent="  ")
    return '\n'.join(line for line in pretty_xml.split('\n') if line.strip())

def sort_attributes(element):
    """Rebuild the attribute dict in alphabetical key order."""
    if element.attrib:
        element.attrib = {k: element.attrib[k] for k in sorted(element.attrib)}

def sort_elements(element):
    """Recursively sort attributes and child elements (by tag, then text)."""
    sort_attributes(element)
    element[:] = sorted(element, key=lambda child: (str(child.tag), child.text or ''))
    for child in element:
        sort_elements(child)

def sort_xml(xml_string):
    """Parse the XML (keeping comments), sort it, and serialize it again."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_string, parser=parser)
    sort_elements(root)
    return ET.tostring(root, encoding="utf-8", method="xml").decode("utf-8")

def normalize_whitespace(xml_string):
    """Remove whitespace between tags so pretty-printing is repeatable."""
    # The earlier regex that stripped spaces around "=" is gone: ElementTree
    # never writes such spaces, and it also rewrote " = " inside attribute
    # values such as XPath expressions.
    return re.sub(r'>\s+<', '><', xml_string)

def sort_and_overwrite_xml_files(directory):
    """Sort and re-indent every .xml file under directory, in place."""
    for xml_file in glob.glob(f"{directory}/**/*.xml", recursive=True):
        with open(xml_file, "r", encoding="utf-8") as file:
            xml_content = file.read()

        sorted_xml = sort_xml(xml_content)
        normalized_xml = normalize_whitespace(sorted_xml)
        beautified_xml = pretty_print_xml(normalized_xml)

        with open(xml_file, "w", encoding="utf-8") as file:
            file.write(beautified_xml)


if __name__ == "__main__":
    sort_and_overwrite_xml_files("forms")
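
One likely cause of the namespace issues is that ElementTree replaces the original prefixes (such as `h:`) with generated ones like `ns0:` when it writes the tree back out. Below is a minimal sketch of a possible mitigation, assuming that is the cause: register every prefix declared in the file before serializing. The helper names are illustrative, and this has not been checked against cht-conf validation.

```python
import io
import xml.etree.ElementTree as ET


def register_declared_namespaces(xml_text):
    """Register each prefix declared in xml_text so ElementTree reuses it.

    Note: ET.register_namespace is global to the process and rejects
    prefixes of the form "ns<number>".
    """
    declared = {}
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(xml_text), events=("start-ns",)):
        ET.register_namespace(prefix, uri)
        declared[prefix] = uri
    return declared


def roundtrip_keeping_prefixes(xml_text):
    """Parse and re-serialize xml_text while keeping its declared prefixes."""
    register_declared_namespaces(xml_text)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened form header used only for illustration.
sample = (
    '<h:html xmlns="http://www.w3.org/2002/xforms" '
    'xmlns:h="http://www.w3.org/1999/xhtml">'
    '<h:head><model/></h:head></h:html>'
)
print(roundtrip_keeping_prefixes(sample))
```

ElementTree only writes declarations for namespaces that are used in element or attribute names, so a prefix that appears only inside an attribute value (for example in an XPath expression) would still lose its declaration. This alone may therefore not be enough to make the files valid again.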

I am experiencing the same version control frustration when forms aren’t changing. I was looking to see if anyone had a workaround before reporting it, and came across this post.

Finding a solution to this would be extremely helpful to avoid the silent introduction of bugs or changes to forms. Otherwise, it’s easy for any one of the config contributors to commit unintended form changes.

Does anyone have a suitable workaround already, or is anyone in the community already considering picking this up? Is there an open issue somewhere for this to be tracked?

Yes, as @jkuester has said above:
