Duplicated form data, except with different report_uuid and slightly different report_date (off by less than 5 min)

Hello,

It is my first time posting, and I am still new to CHT, so please let me know if there are posting conventions I should follow!

We have found many instances in our form data where the data submitted by the Community Health Toolkit user is duplicated, except for a different report_uuid and a slightly different report_date. I’ve screenshotted an example below, and highlighted in yellow the report_uuid, report_date, and open text field (exactly the same text):

This happens across our forms and is not isolated to one form type.

Does anyone have any insight into why this happens? I’ve posted our SQL query below as well. Nothing seems to be amiss from my perspective:

```sql
CREATE OR REPLACE VIEW public.reportview_malnutrition_48h
AS SELECT reports.doc ->> '_id'::text AS report_uuid,
    to_timestamp((((reports.doc ->> 'reported_date'::text)::bigint) / 1000)::double precision) AS report_date,
    reports.doc ->> 'form'::text AS which_report,
    reports.doc #>> '{contact,parent,_id}'::text[] AS hh_uuid,
    reports.doc #>> '{fields,patient_uuid}'::text[] AS hhmem_uuid,
    reports.doc #>> '{contact,_id}'::text[] AS chc_uuid,
    reports.doc #>> '{fields,patient_uuid}'::text[] AS patient_uuid,
    reports.doc #>> '{fields,patient_id}'::text[] AS patient_id,
    reports.doc #>> '{fields,patient_gender}'::text[] AS patient_gender,
    reports.doc #>> '{fields,inputs,contact,date_of_birth}'::text[] AS birthdate,
    reports.doc #>> '{fields,patient_age_in_months}'::text[] AS patient_age_inmonths,
    reports.doc #>> '{fields,patient_age_in_years}'::text[] AS patient_age_inyears,
    reports.doc #>> '{fields,g_continue,continue}'::text[] AS continue_followup,
    reports.doc #>> '{fields,g_visit,visit_rhu}'::text[] AS visitrhu_continue_yes,
    reports.doc #>> '{fields,g_input,visit_date}'::text[] AS rhu_date_attendance,
    reports.doc #>> '{fields,g_input,weight}'::text[] AS continue_weight,
    reports.doc #>> '{fields,g_input,height}'::text[] AS continue_height,
    reports.doc #>> '{fields,data,__hfa_class}'::text[] AS hfa_class,
    reports.doc #>> '{fields,data,__hfa_score}'::text[] AS hfa_score,
    reports.doc #>> '{fields,data,__weight}'::text[] AS patient_weight,
    reports.doc #>> '{fields,data,__target_weight}'::text[] AS target_weight,
    reports.doc #>> '{fields,g_treatment,treatment}'::text[] AS treatment,
    reports.doc #>> '{fields,data,__wfh_class}'::text[] AS wfh_class,
    reports.doc #>> '{fields,data,__wfh_score}'::text[] AS wfh_score,
    reports.doc #>> '{fields,data,__rhu_visit_done}'::text[] AS rhu_done_visitation,
    reports.doc #>> '{fields,g_continue,reschedule}'::text[] AS resched_date,
    reports.doc #>> '{fields,inputs,contact,parent,parent,contact,name}'::text[] AS patient_parent_name,
    reports.doc #>> '{fields,inputs,contact,parent,parent,contact,phone}'::text[] AS patient_parent_phone
   FROM raw_reports reports
  WHERE (reports.doc ->> 'form'::text) = 'malnutrition_followup_48h'::text
  ORDER BY (to_timestamp((((reports.doc ->> 'reported_date'::text)::bigint) / 1000)::double precision)) DESC;
```

Thanks for any insight, and please let me know if other information would be helpful to provide!


Hi @helenamanguerra-icm! So great to have you with us on the CHT forum. The Technical Support category is the right place to seek assistance on any technical issues or bugs you could be facing.

Regarding the issue, do you have any more information on these duplicates and your project? Some questions I would ask:

  • What version of the CHT Core Framework are you on?
  • Roughly how many duplicate cases have been observed?
  • Are these duplicates seen across all/many users?

It is possible that users are resubmitting reports for some reason. If this is confirmed not to be the case, then reproducing the bug would be a key step in trying to fix it.

@samuel, @diana, @mrjones, after downloading our data we have noticed some cases with duplicate entries, as shown below, and we are trying to find the source of the problem.

Because of poor internet connectivity in some areas, the limited capacity of the data collection phones, and the number of documents, some data capture teams had to sync more than once. Could this have an effect on the duplicates?

Hi @oyierphil

I’m not sure what you mean by this. Syncing multiple times should not produce duplicates.
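To make the point concrete, here is a deliberately simplified toy model (not CouchDB’s real replication protocol, which also tracks revisions and conflicts): documents are stored keyed by `_id`, so pushing the same documents again leaves one copy of each. The `sync` helper and the sample ids are hypothetical. A new row only appears when a document with a new `_id` is created, which is what a second form submission does.

```python
def sync(target: dict, source: dict) -> None:
    """Toy one-way sync: copy every doc from source into target, keyed by _id.

    Simplification: real CouchDB replication also handles revisions and
    conflicts; this only shows that documents are identified by their _id.
    """
    for doc_id, doc in source.items():
        target[doc_id] = dict(doc)


phone = {
    "report-a": {"_id": "report-a", "form": "malnutrition_followup_48h"},
}
server: dict = {}

sync(server, phone)
sync(server, phone)  # syncing again, e.g. after a dropped connection
print(len(server))  # 1: the same _id is not stored twice

# Submitting the form a second time creates a new document with a new _id.
phone["report-b"] = {"_id": "report-b", "form": "malnutrition_followup_48h"}
sync(server, phone)
print(len(server))  # 2
```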

@diana, we have a few users with duplicate records, as many as 43, and we are trying to find out what happened. In total, we have about 18,000 duplicate records after filtering on unique IDs. Could multiple sync operations using the same account, on one or several devices, have caused this?

What’s the effect of editing the same record many times? Does this process create a new record with the same UUID? For example, a record is submitted at time t1, then edited and resubmitted at t2.
I’m trying to understand what could have happened.

Hi @oyierphil

From the screenshot that you shared, I’m not sure what property you have highlighted.
Is this the result of the export function in the CHT? Is it an export from postgres? Which field are you highlighting and what is the significance of the other fields that are in the image?

Editing a report should not produce a copy of it, unless your form creates additional docs along with the original report; see medic/cht-core issue #7594 on GitHub, “Editing a report that created extra docs duplicates all extra docs”.
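A similarly simplified sketch of an edit, assuming a toy in-memory store: the existing document is updated under the same `_id` and only its revision changes, so the document count stays the same. The `edit` helper is hypothetical, and the integer `_rev` stands in for CouchDB’s real string revisions (which look like `2-` followed by a hash).

```python
def edit(db: dict, doc_id: str, changes: dict) -> dict:
    """Toy edit: update an existing doc under the same _id and bump its revision.

    Simplification: an integer _rev replaces CouchDB's "N-<hash>" strings.
    """
    doc = dict(db[doc_id])       # copy the current version
    doc.update(changes)          # apply the edited values
    doc["_rev"] = doc["_rev"] + 1
    db[doc_id] = doc             # same key, so no new document is created
    return doc


db = {"r1": {"_id": "r1", "_rev": 1, "fields": {"weight": "10.2"}}}
edit(db, "r1", {"fields": {"weight": "10.4"}})
print(len(db), db["r1"]["_rev"])  # 1 2
```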

@diana, this data was downloaded from the app. The highlighted values are duplicate contact_ids; I removed some details. The other fields are binary responses to survey questions.

Hi @oyierphil

Can you please share which field they correspond to from the report?

@diana, inputs.contact._id and patient_id fields

Hi @oyierphil

Having duplicates for inputs.contact._id is expected, because that field represents the contact of the report submitter. Every time a CHW submits a report, the report they create will have the same inputs.contact._id.
Similarly for patient_id, every report about a specific patient will have the same patient_id field.

@diana, what baffles me is that only a few users have this challenge while the rest are fine. From the explanation above, could it mean the duplicates were deliberate 🙂, with a few people editing and creating multiple instances of the same patient in the hope of increasing the count of patients seen? Would this make sense?

Hi @oyierphil

I’m sorry, but for me it’s not clear what sort of duplication or challenge you are referring to.
Could you please provide an example of two reports that are duplicated, including all relevant fields, but please replace PHI or sensitive information with placeholders.

Thanks!

@diana, I have picked one of the instances and tried to anonymize it; I hope it helps. We have many instances of Diana Philip.

Hi @oyierphil

Please share the JSON contents of the reports instead of screenshots from the export.

> we have many instances of the Diana Philip

Diana Philip is your CHW, I would expect her to submit reports, so what you’re showing me is expected.

@diana, yes, Diana Philip is a CHW. My issue is repeated reports for the same patient, as above: for other CHWs we have one report per client/patient, but 15 in the case above. So if CHW Diana Philip has 10 patients, each duplicated 15 times, you get 150 records instead of 10. This is our problem.
If, for example, you pay per number of patients entered in the system, you can see the challenge.
I have the JSON download of the reports from the CouchDB table; I will extract the duplicates and share them this evening.
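One quick way to quantify this pattern is to count reports per (CHW, patient) pair. This minimal sketch assumes the CouchDB JSON docs are loaded as a list of dicts, with the submitter at `contact._id` and the patient at `fields.patient_id` (the same paths the SQL view above reads as `chc_uuid` and `patient_id`). The helper names and the `threshold` parameter are hypothetical; some follow-up forms are legitimately submitted more than once per patient, so the threshold would need tuning per form.

```python
from collections import Counter


def reports_per_patient(reports, form):
    """Count reports of one form per (submitting contact, patient) pair."""
    counts = Counter()
    for doc in reports:
        if doc.get("form") != form:
            continue
        chw = doc.get("contact", {}).get("_id")
        patient = doc.get("fields", {}).get("patient_id")
        counts[(chw, patient)] += 1
    return counts


def flag_repeats(counts, threshold=1):
    """Keep only pairs with more than `threshold` reports."""
    return {pair: n for pair, n in counts.items() if n > threshold}


def make_report(chw, patient, form="malnutrition_followup_48h"):
    # Hypothetical minimal report doc with only the fields used above.
    return {"form": form, "contact": {"_id": chw}, "fields": {"patient_id": patient}}


reports = [make_report("chw-1", "p-1") for _ in range(3)] + [make_report("chw-2", "p-2")]
print(flag_repeats(reports_per_patient(reports, "malnutrition_followup_48h")))
# {('chw-1', 'p-1'): 3}
```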

Ah I see. Thanks for sharing. I think it’d be useful to compare all fields of such duplicates.
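A minimal sketch of such a field-by-field comparison, assuming each report is a CouchDB JSON doc loaded as a dict. The hypothetical `diff_reports` helper skips the top-level `_id`, `_rev` and `reported_date` (which always differ between two submissions) and yields every nested field whose value differs; an empty result means the two reports carry identical answers.

```python
IGNORED_TOP_LEVEL = {"_id", "_rev", "reported_date"}


def diff_reports(a, b, path=()):
    """Yield (dotted_path, value_in_a, value_in_b) for every differing leaf."""
    for key in sorted(set(a) | set(b)):
        if not path and key in IGNORED_TOP_LEVEL:
            continue  # metadata that differs between any two submissions
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_reports(va, vb, path + (key,))
        elif va != vb:
            yield (".".join(path + (key,)), va, vb)


r1 = {"_id": "a", "_rev": "1-x", "reported_date": 1000, "form": "f",
      "fields": {"patient_id": "p1", "q1": "yes", "q2": "no"}}
r2 = {"_id": "b", "_rev": "1-y", "reported_date": 2000, "form": "f",
      "fields": {"patient_id": "p1", "q1": "yes", "q2": "yes"}}
print(list(diff_reports(r1, r2)))  # [('fields.q2', 'no', 'yes')]
```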

@diana, the comparison returns almost the same data, with slight differences in some cases, especially in the survey questions.

Hi @oyierphil, thanks for reviving this topic! We have also continued to see this same issue (though I neglected to follow-up on this forum). Our current hypothesis is that our CHWs are not hitting submit on the report during the follow-up. Then when the CHW returns for another follow-up visit with the household member, they then hit submit on the previous report and submit a follow-up report on the same day. We are guessing there isn’t a big change in the household member’s health status between the two visits, so in the DB, it looks like two duplicate forms with slightly different submit times.

@diana Any insight you can provide would be very helpful. Thank you!

Welcome back @helenamanguerra-icm !

This is very valuable insight! Thank you!

@oyierphil , given that this is happening for only some CHW, repeatedly, I would follow up with them to check how they are using the app and whether there aren’t any misunderstandings in regards to the workflows.

From a technical perspective, I’m not aware of other reports of duplicate data, and the fact that some fields differ between submissions most likely points to user actions rather than bugs.

It would still be helpful to see the whole contents of two of your reported duplicates, if you could provide those.

Thanks!

If you only care about how many patients were registered and when, you could look at the contacts (type = person, for example) instead of the reports (type = data_record). The contact record has a reported_date property.
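A minimal sketch of counting registrations from contact documents instead of reports, assuming person contacts have `type == "person"` (projects using configurable contact types may instead store `type: "contact"` with a `contact_type`) and that `reported_date` is in epoch milliseconds, which is why the SQL view above divides it by 1000. The `registrations_per_day` helper is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def registrations_per_day(docs):
    """Count person contacts by the UTC day of their reported_date."""
    per_day = Counter()
    for doc in docs:
        if doc.get("type") != "person" or "reported_date" not in doc:
            continue  # skip reports (data_record) and other contact types
        ts = datetime.fromtimestamp(doc["reported_date"] / 1000, tz=timezone.utc)
        per_day[ts.date().isoformat()] += 1
    return per_day


JAN_1_2024_MS = 1704067200000  # 2024-01-01T00:00:00Z in milliseconds
HOUR_MS = 3600 * 1000
DAY_MS = 24 * HOUR_MS

docs = [
    {"type": "person", "reported_date": JAN_1_2024_MS},
    {"type": "person", "reported_date": JAN_1_2024_MS + HOUR_MS},
    {"type": "person", "reported_date": JAN_1_2024_MS + DAY_MS},
    {"type": "data_record", "reported_date": JAN_1_2024_MS},  # a report, ignored
]
print(dict(registrations_per_day(docs)))  # {'2024-01-01': 2, '2024-01-02': 1}
```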

By comparing the _id and the reported_date of the reports, you can generally tell whether it’s a potential system issue or simply the user submitting multiple reports: if both differ (particularly if the reported_dates are significantly different), that points to multiple submissions by the user.
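A minimal heuristic sketch for flagging likely resubmissions, assuming reports are CouchDB JSON docs loaded as dicts with a numeric `reported_date` in milliseconds. It treats two reports as the same visit submitted twice when they have different `_id`s but the same form, submitter (`contact._id`) and patient (`fields.patient_id`), and their `reported_date`s fall within five minutes, the gap described at the top of this thread. The rule and the `likely_resubmissions` helper are assumptions, not CHT behavior; comparing every pair is fine for a spot check, but grouping by patient first would scale better.

```python
from itertools import combinations

WINDOW_MS = 5 * 60 * 1000  # five minutes, the gap reported in this thread


def likely_resubmissions(reports, window_ms=WINDOW_MS):
    """Return (_id, _id) pairs that look like one visit submitted twice."""
    pairs = []
    for a, b in combinations(reports, 2):
        if a["_id"] == b["_id"]:
            continue  # same document, not a second submission
        same_visit = (
            a.get("form") == b.get("form")
            and a.get("contact", {}).get("_id") == b.get("contact", {}).get("_id")
            and a.get("fields", {}).get("patient_id") == b.get("fields", {}).get("patient_id")
        )
        if same_visit and abs(a["reported_date"] - b["reported_date"]) <= window_ms:
            pairs.append((a["_id"], b["_id"]))
    return pairs


def make_report(doc_id, patient, reported_date):
    # Hypothetical minimal report doc with only the fields used above.
    return {"_id": doc_id, "form": "malnutrition_followup_48h",
            "contact": {"_id": "chw-1"}, "fields": {"patient_id": patient},
            "reported_date": reported_date}


t0 = 1_700_000_000_000
reports = [
    make_report("r1", "p-1", t0),
    make_report("r2", "p-1", t0 + 2 * 60 * 1000),      # 2 minutes later
    make_report("r3", "p-1", t0 + 24 * 3600 * 1000),   # next day
    make_report("r4", "p-2", t0 + 60 * 1000),          # different patient
]
print(likely_resubmissions(reports))  # [('r1', 'r2')]
```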

Also… it looks like these reports are being submitted by using the “New Action” option on the Contacts tab instead of as a result of a task (it’s cut off in the screenshot, but I assume the inputs.source is contact)… so it’s probably not an issue of a task not clearing properly.
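To check the same thing across the whole dataset, here is a minimal sketch that tallies reports by `fields.inputs.source`, assuming the form inputs are stored under `fields.inputs` (the SQL view above reads `birthdate` from that group) and that `source` holds values such as `contact` or `task`. The `source_breakdown` helper is hypothetical; reports without a source are counted under `None`.

```python
from collections import Counter


def source_breakdown(reports):
    """Count reports by fields.inputs.source; missing values count as None."""
    return Counter(
        doc.get("fields", {}).get("inputs", {}).get("source") for doc in reports
    )


reports = [
    {"fields": {"inputs": {"source": "contact"}}},
    {"fields": {"inputs": {"source": "contact"}}},
    {"fields": {"inputs": {"source": "task"}}},
    {"fields": {}},  # no inputs group recorded
]
counts = source_breakdown(reports)
print(counts["contact"], counts["task"], counts[None])  # 2 1 1
```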