Cht-conf convert forms changes xml even when xlsx form does not change

An issue that I’m having is that cht convert-contact-forms changes the .xml output even when the source .xlsx file does not change. This is a problem for version control because it makes it difficult to see which forms have changed.

Steps to reproduce:

  1. download person-create.xlsx
  2. cht convert-contact-forms
  3. git add .
  4. git commit -m "convert forms"
  5. cht convert-contact-forms
  6. git diff

Expected behavior: no changes in person-create.xml
Actual behavior: changes in person-create.xml

cht-conf version 3.18.3
macOS 13.1

:disappointed: Yes, sadly our current version of pyxform does not deterministically sort the xml attributes. So, regenerating the xml results in a randomized sorting that can be different each time the form is converted (which as you have noted is a huge pain for version control).

The “good” news is that this has been fixed in the upstream version of pyxform, and an issue has been logged to pull these changes into medic/pyxform!
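Until that lands, one way to check whether a regenerated form really changed, or only had its attributes shuffled, is to compare canonical forms. This is a minimal sketch using `xml.etree.ElementTree.canonicalize` from the Python standard library (3.8+), which sorts attributes as part of canonicalization. The helper name `same_xml` and the sample `bind` elements are made up for illustration and are not part of cht-conf or pyxform.

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Return True when two XML strings are identical after canonicalization.

    Canonicalization (C14N 2.0) writes attributes in a fixed order, so two
    documents that differ only in attribute order compare equal.
    """
    return ET.canonicalize(a) == ET.canonicalize(b)


# Hypothetical sample data: the same element with shuffled attributes.
first = '<bind nodeset="/data/name" type="string" required="true()"/>'
second = '<bind required="true()" type="string" nodeset="/data/name"/>'
third = '<bind nodeset="/data/age" type="int" required="true()"/>'

print(same_xml(first, second))
print(same_xml(first, third))
```

Whitespace between elements is still significant here, so this only hides attribute-order noise, not other formatting differences.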


Thanks! Any idea when that issue might be fixed?

Unfortunately I do not have any specific timeline, but I can say that there is nothing blocking that issue from being addressed. It is just a matter of an alignment of dev time/priorities (or someone submitting a PR) :smile:


Ok thank you.

In case others are having this issue, here is some code I wrote (with ChatGPT :slight_smile: ) that will sort the xml files. This could be used as part of a Git pre-commit hook.

My code isn’t great - the processed xml files don’t work correctly (they fail validation with cht-conf because of namespace issues). But the processed xml files are useful for version control.

import xml.etree.ElementTree as ET
import glob
import re
from xml.dom.minidom import parseString

def pretty_print_xml(xml_string):
    dom = parseString(xml_string)
    pretty_xml = dom.toprettyxml(indent="  ")
    # Remove unnecessary blank lines
    pretty_xml = '\n'.join([line for line in pretty_xml.split('\n') if line.strip()])
    return pretty_xml

def sort_attributes(element):
    if element.attrib:
        element.attrib = {k: element.attrib[k] for k in sorted(element.attrib)}

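def sort_elements(element):
    # Sort this element's attributes, then its children by (tag, text), recursively.
    # Reordering children changes document order, so the output is for diffing only.
    sort_attributes(element)
    # Comment nodes have a non-string tag; map it to '' so the sort key stays comparable.
    element[:] = sorted(element, key=lambda child: (
        child.tag if isinstance(child.tag, str) else '', child.text or ''))
    for child in element:
        sort_elements(child)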

def sort_xml(xml_string):
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_string, parser=parser)
    sort_elements(root)
    return ET.tostring(root, encoding="utf-8", method="xml").decode("utf-8")

def normalize_whitespace(xml_string):
    # Collapse whitespace between tags so pretty-printing does not add blank lines.
    # (No regex on '=' here: ET.tostring never writes spaces around '=' in
    # attributes, and such a regex would also rewrite ' = ' inside attribute values.)
    xml_string = re.sub(r'>\s+<', '><', xml_string)
    return xml_string

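def sort_and_overwrite_xml_files(directory):
    # Rewrite every .xml file under `directory` (recursively) in sorted, pretty-printed form
    for xml_file in glob.glob(f"{directory}/**/*.xml", recursive=True):
        # Explicit encoding so the result does not depend on the platform default
        with open(xml_file, "r", encoding="utf-8") as file:
            xml_content = file.read()

        sorted_xml = sort_xml(xml_content)
        normalized_xml = normalize_whitespace(sorted_xml)
        beautified_xml = pretty_print_xml(normalized_xml)

        with open(xml_file, "w", encoding="utf-8") as file:
            file.write(beautified_xml)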


if __name__ == "__main__":
    sort_and_overwrite_xml_files("forms")
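A possible alternative that avoids the namespace problem: Python 3.8+ ships `xml.etree.ElementTree.canonicalize`, which sorts attributes but keeps the original namespace prefixes and leaves element order alone. Below is a rough sketch of the same "rewrite everything under a directory" idea built on it. The function name `canonicalize_forms` is made up, and I have not checked whether cht-conf accepts the result. Note that canonicalization also drops the XML declaration, expands empty tags like `<a/>` into `<a></a>`, and omits namespace declarations that no element or attribute name uses.

```python
import glob
import os
import xml.etree.ElementTree as ET


def canonicalize_forms(directory):
    """Rewrite each .xml file under `directory` in canonical (C14N 2.0) form.

    Returns the list of paths whose content changed, so a caller such as a
    pre-commit hook can decide what to re-stage.
    """
    changed = []
    pattern = os.path.join(directory, "**", "*.xml")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, encoding="utf-8") as f:
            original = f.read()

        # Attributes come out in a fixed order; element order and
        # namespace prefixes are preserved.
        canonical = ET.canonicalize(original) + "\n"

        if canonical != original:
            with open(path, "w", encoding="utf-8") as f:
                f.write(canonical)
            changed.append(path)
    return changed
```

Calling `canonicalize_forms("forms")` after each conversion should make repeated runs produce identical files, since running it a second time on already-canonical output changes nothing.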