In an SMS project, we have received complaints from multiple sources that valid SMS reports are sometimes not being accepted as reports and are going to the messages tab.
Sample SMS:
ज 12345 2
Here,
- ज: Form code
- 12345: Patient ID
- 2: Days
The SMS message looks fine, but when using a Unicode text analyzer, we can see there are hidden (non-printing) characters:
- Zero-width joiner - U+200D
- Zero-width non-joiner - U+200C
In Nepali Unicode, these non-printing characters are used to enable/disable the transformation of certain other characters and ligatures.
Examples:
Without ZWJ: प+र्+यो = पर्यो (incorrect)
Using ZWJ: प+र्+ZWJ+यो = पर्यो (correct)
Without ZWNJ: अहम्+को = अहम्को (incorrect)
With ZWNJ: अहम्+ZWNJ+को = अहम्को (correct)
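For anyone who wants to check a received message without an external Unicode analyzer, here is a minimal Python sketch that lists the code points in a string and flags the invisible ones. The function names and the sample string are made up for illustration; it relies only on the standard `unicodedata` module, in which both ZWJ and ZWNJ have the general category `Cf` (format).

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name, category) for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def hidden_characters(text):
    """Return the code points of format characters (category Cf), e.g. ZWJ and ZWNJ."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"]


# Hypothetical sample: the form code followed by a stray zero-width joiner.
sample = "\u091c\u200d 12345 2"

for code_point, name, category in describe_code_points(sample):
    print(code_point, category, name)

print("Hidden characters:", hidden_characters(sample))
```

Running this on a copied message body should show whether a report that looks valid actually carries extra code points.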
The CHWs use various phone models (mostly keypad/feature phones), and we don’t know which key combinations produce these invisible characters. Although this is most likely user error, users can enter the characters without realizing it. The problem is also hard to identify, because everything looks normal in the CHT app. Since this is not limited to a few users or a few instances, should we consider handling it in the CHT?
Can we ignore invisible characters (there could be others besides these two) in fields other than the text fields?
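To make the idea concrete, here is a rough Python sketch of what "ignoring invisible characters" in non-text fields could look like. It is not how the CHT parses SMS; `strip_format_chars`, `parse_report`, and the three-field layout are assumptions based only on the sample message above. It drops every character in the Unicode `Cf` (format) category, which covers ZWJ and ZWNJ but also others such as the soft hyphen and the byte order mark.

```python
import unicodedata


def strip_format_chars(value):
    """Remove Unicode format characters (category Cf), such as ZWJ and ZWNJ."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cf")


def parse_report(message):
    """Parse a '<form code> <patient ID> <days>' message (hypothetical layout).

    Returns a dict on success, or None if the message does not match.
    """
    # Clean each whitespace-separated token, then drop tokens that
    # consisted only of invisible characters.
    parts = [strip_format_chars(part) for part in message.split()]
    parts = [part for part in parts if part]
    if len(parts) != 3:
        return None
    form_code, patient_id, days = parts
    # isdecimal() also accepts Devanagari digits, which int() can convert.
    if not (patient_id.isdecimal() and days.isdecimal()):
        return None
    return {"form": form_code, "patient_id": patient_id, "days": int(days)}


# Hypothetical message with a ZWJ after the form code and a ZWNJ after the ID.
print(parse_report("\u091c\u200d 12345\u200c 2"))
```

Free-text fields are deliberately left out of this cleanup, since Nepali text there may need ZWJ and ZWNJ to render correctly. If a configured form code itself contained one of these characters, the same stripping would have to be applied to the configured code before comparing.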