Coming Soon: Duplicate Contact Detection!

@Anro and team are currently developing a new CHT feature that detects when a user is about to create a new contact that duplicates one of the existing contacts accessible to that user. When that happens, the user will be warned about the existing contact(s) and given the choice of whether to proceed.

More details about the design can be found on the issue thread, but some of the feature highlights are:

  • New/edited contacts will be checked against each of their sibling contacts (contacts with the same type/parent)
  • “Equality” of string fields (e.g. name) can be measured according to their Levenshtein distance, allowing for fuzzy matching when desired.
  • By default, sibling contacts of the same type whose name fields have a Levenshtein distance of less than 3 will be considered duplicates.
  • For each contact type, you can configure a JS expression to be used as your custom duplicate checking logic. So, you can fully customize the conditions for when a contact should be considered duplicate (and the fields involved).
  • Additionally, for each contact type you can choose to completely disable dupe checking if it is not needed
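
To make the default behavior described above a bit more concrete, here is a minimal sketch of a sibling check based on Levenshtein distance. This is not the actual implementation from the working draft: the `Contact` shape, the `parentId` field, and the function names are all assumptions made for illustration, and the threshold simply follows the "less than 3" wording above.

```typescript
// Hypothetical, simplified contact shape; real contact docs carry many more fields.
interface Contact {
  _id: string;
  type: string;
  parentId: string;
  name: string;
}

// Standard dynamic-programming Levenshtein distance:
// the minimum number of insertions, deletions and substitutions to turn a into b.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: names closer than this distance count as duplicates.
const NAME_DISTANCE_LIMIT = 3;

// Returns the siblings (same type, same parent) whose name is "close enough".
function findDuplicateSiblings(candidate: Contact, existing: Contact[]): Contact[] {
  return existing.filter((other) =>
    other._id !== candidate._id &&
    other.type === candidate.type &&
    other.parentId === candidate.parentId &&
    levenshtein(candidate.name, other.name) < NAME_DISTANCE_LIMIT
  );
}
```

A custom expression for a contact type would, in effect, replace the name comparison in `findDuplicateSiblings` with whatever logic the configurer provides.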

Development of this feature is currently underway with a working draft! Check out this video for a demo of what the feature looks like…

One goal of this post is to solicit feedback on the UX for this feature. We would love to hear from community members (particularly any UX designers out there!) regarding the look and feel here, and especially regarding the kinds of information that would be most useful to a user trying to decide whether to add a contact flagged as a duplicate. Keep in mind, though, that for the initial PR we are striving for the minimum viable feature set. What is the simplest feature set that would actually be useful for end users? Once we get this feature into the hands of users, we can start iterating on more sophisticated UX workflows for this functionality.

Here are some additional stills of how things look now:

I will go ahead and jump in here with my UI mockup suggestion! :sweat_smile: I am thinking we could just go with a contact list similar to what you would see for the list of children on a contact’s profile page. Instead of providing an accordion to view the contact doc details, the MVP functionality would just be that each contact row would serve as a button to take you to that contact’s profile page.

The emphasis here is on simplicity and UI consistency (at the cost of not providing the user with access to more data about the duplicate contacts).

My thoughts…

  • I would not be in favor of showing the contact doc. (It has too much information, more than the person creating / editing probably needs.)

  • I like your suggestion to show the duplicate contacts.

Suggestion: Would it help if the “Duplicates found” header included which field triggered the possible duplicate, to give a hint of what the problem is? Something like:

Duplicate found (names similar) or Duplicate found (postal code similar)

Thanks for the great feedback @Ben_Kiarie!

This is a good question and something that has been heavily discussed in the issue thread. To summarize, I see two challenges with giving the user a hint about what is triggering the duplicates:

  1. Technical: With a dynamic expression function (that can include basically any of the fields from the contact doc) it becomes very challenging to track through exactly which fields matched. (And really, nothing is stopping the expression code from doing extra logic besides simple field matching.)
  2. Functional (how to communicate something meaningful to the user): Even if we could update the expression code to somehow report the actual matching fields, then we have the challenge of how to communicate this to the user. There is not necessarily a one-to-one mapping between the questions visible to a user in the contact form and the fields stored on the contact doc. Suppose the contact doc name field is calculated as a result of concatenating the answers to the first_name and last_name questions. If the dupe-check flags the contact as duplicate because of the contents of the name field, what do we tell the user? Either we just say there is an issue with the name field (which in a more complex case might be meaningless to the user who has no knowledge of the internal data structure of the form), or we try to implement some crazy reverse matching to figure out which form questions are visible and contributed to the doc field.
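
Both challenges can be illustrated with a small sketch. Everything here is hypothetical (the `ContactDoc` shape, `buildDoc`, and the example expression are made up for this post and are not the real configuration format), but it shows why a yes/no expression gives the app nothing to report back, and how a form question like first_name can disappear into a calculated doc field.

```typescript
// Hypothetical, simplified contact doc.
interface ContactDoc {
  name: string;
  street_address: string;
}

// Hypothetical form answers: the user sees first_name and last_name, never "name".
interface FormAnswers {
  first_name: string;
  last_name: string;
  street_address: string;
}

// The doc's name field is calculated by concatenating two form questions.
function buildDoc(answers: FormAnswers): ContactDoc {
  return {
    name: answers.first_name + " " + answers.last_name,
    street_address: answers.street_address,
  };
}

// A duplicate check modelled as an opaque predicate: it answers yes or no and nothing else.
type DuplicateExpression = (current: ContactDoc, existing: ContactDoc) => boolean;

// Example custom expression that can be satisfied by either of two fields.
const customExpression: DuplicateExpression = (current, existing) =>
  current.name === existing.name || current.street_address === existing.street_address;

const existingDoc = buildDoc({ first_name: "Mary", last_name: "Atieno", street_address: "12 Market Rd" });
const sameName = buildDoc({ first_name: "Mary", last_name: "Atieno", street_address: "7 Hill St" });
const sameAddress = buildDoc({ first_name: "Grace", last_name: "Wanjiru", street_address: "12 Market Rd" });

console.log(customExpression(sameName, existingDoc));    // true
console.log(customExpression(sameAddress, existingDoc)); // true
```

Both calls return `true`, and from the outside there is no way to tell that one matched on name and the other on street_address. Even in the name case, the matching doc field does not correspond to any single question the user answered.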

Any approach I can think of here seems well beyond the scope of what is feasible for the MVP. :grimacing:

Finally, if we take a step back and consider the actual problem we are trying to solve with the dupe-checking functionality, I am not sure flagging the duplicate fields is even helpful to the user, anyway. If the user is trying to create a new contact and gets the duplicate warning, they really just need to decide if a contact record already exists for the person/place they are trying to create or if they should override the dupe check because they know the person/place they are trying to create is real and distinct from the existing contacts. The user does NOT need to know which questions in the form they should update so that the new contact is not flagged as a duplicate (in fact, I think we really want to avoid prompting this behavior). We can assume their answers to the questions were correct and the only thing that the user needs to decide is if the contact is distinct from the existing ones or not.

I see where you’re coming from regarding the technical and functional challenges of showing the user which field is the duplicate in the MVP. From an MVP perspective, then, perhaps we do not need to show the field.

That being said, I was thinking it might be helpful because of this scenario:

In the screenshot above it would be intuitive that the duplicate is being flagged because of the First Child name. Let’s ignore the example that the name field could be calculated from concatenating the first_name and last_name. Let’s think of it as a user: when I see a contact or place is being flagged as duplicate, can I tell why so that I can decide whether it is duplicate or not?

What got me thinking about this is a situation where the expression function checks for duplicates on a field in the contact doc that is not visible on a contact card.

As a user I might see there is a duplicate of a place, but when I check the two places, I would see they have different names and wonder what’s going on, when all along the duplicate is being flagged because of a matching street_address that I cannot see. In that case, I would not be able to tell why it is being flagged as a duplicate.

Could that scenario happen?

Hmm, yes, this is a very plausible user story. Thanks for thinking through this with me!

My thinking at this point is that we should keep things simple for the MVP, but in the long run the user is probably going to need more info than just a name/dob to be able to make a decision about duplicates. That might include figuring out how to flag the fields involved, or it might just be providing more context/data about each contact in the list. (Imagine if tapping the contact card popped open a drawer with more data about the contact (maybe their contact-summary fields?))…

My hope is that once this feature is actually being used, it will be more clear what is needed to best support the user decision. :+1:

To add to @Ben_Kiarie’s point, if the user is forced to click on a duplicate item and navigate away from the form—losing their filled-in data (including the found duplicates)—just to see more details on a profile screen that may or may not display the fields causing the duplicate, that feels less user-friendly.

IMO, the user should see enough information upfront to identify the duplicate. This isn’t just to help them decide whether to proceed with their current submission, but also because dealing with multiple duplicates (e.g., five or more) could be challenging if only limited information is available.

Imagine being a user who needs to check five different items, switching back and forth. What information would you need to confidently verify all duplicates at once instead of revisiting them repeatedly?

Agreed! The tricky part, though, is what qualifies as “enough information” :sweat_smile:. From the discussion here there seems to be a general concern that just name and date_of_birth might not be enough information even for the “minimum viable product” implementation of this feature.

In that case, we need to determine the simplest thing we can provide to the user so they have “enough information” (still keeping in mind that we are going for an MVP here).

Are there other standard fields we could hard-code in to be displayed for each duplicate contact? If we think the list of fields must be configurable, then I suggest we consider my previous suggestion of just showing the contact-summary fields card for each contact. (Hoping that would (or could) contain all of the most important info for each contact.)

Another possible approach would be to support displaying a custom duplicate error message to the user that is specific to each type of contact (and therefore to each custom duplicate_check.expression). The error message could help inform the user of next steps or things to check. A static error message would be easy to implement, but things could get a lot more complex if the message needs to be dynamic (and contain information specific to the current duplicate contacts)…

I wonder if we could use a modal to preview the possible duplicates? I’m thinking of modals as they appear for training cards where a user can click any of the possible duplicates and they’ll show in the modal with “Previous” and “Next” buttons to easily flip between all the possible duplicates, as well as a “Close” (“Cancel”?) button to return to editing the current form.

@mrjones I do like the idea of previous and next buttons; perhaps we can pair that with the drawer approach discussed in the issue thread. I would then suggest < and > arrows to avoid confusing the user with the “Previous” and “Next” buttons of the form itself.

The issue with modals, as discussed in that issue thread, is that they prohibit the user from scrolling and interacting with the current form while the duplicates are open. To cross compare various fields, opening and closing the modal would be quite frustrating. This issue is especially prevalent in multi-page forms.

A drawer would allow the user to resize the duplicate ‘window’ to facilitate easy field comparison (or minimize if necessary), allow the user to interact with the form underneath (scroll, make changes), and it is situated at the bottom of the screen for easy thumb access on mobile devices.

That said, I don’t think this is MVP.


@jkuester THAT… is the million-dollar question :sweat_smile: . The reported_date is maybe another addition to the “must have” fields, but the concern of not being enough remains. A last_updated/last_edited field would be nice, to indicate activity, but I don’t think there’s a field for that (perhaps not on our version). Just thinking out loud.

I like the contact-summary suggestion. Like you’ve noted before, if the properties are important enough to display there, they will probably be important enough to be considered in the expression. We should probably still consider some standard fields. We can’t display a blank contact if the user didn’t configure any contact-summary fields.

If we’re to supply a message, it would be more useful to make it specific. Would the contact-specific error message supplement the duplicate contact list? On its own it would necessitate navigating in and out of forms, which would be a terrible user experience without saving forms as drafts. I envisioned it being a translation key supplied alongside the duplicate_check.expression. Maybe I’m oversimplifying it.

Right, this was essentially what I was thinking. (Though, maybe we would not even need to configure a key in the properties.json since it could be derived from the actual contact-type in question. E.g. the translation key format would be something like duplicate_check.contact.person.duplication_message…)
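
As a rough sketch of what deriving the key could look like (the key format is just the example floated above, and `duplicateMessageKey`, `duplicateMessage`, and the plain-object translation lookup are hypothetical names, not existing CHT functions):

```typescript
// Builds the translation key from the contact type, using the format suggested above.
function duplicateMessageKey(contactType: string): string {
  return "duplicate_check.contact." + contactType + ".duplication_message";
}

// Looks up the type-specific message and falls back to a generic one
// when no translation has been configured for that contact type.
function duplicateMessage(
  contactType: string,
  translations: { [key: string]: string },
  fallback: string
): string {
  const key = duplicateMessageKey(contactType);
  return Object.prototype.hasOwnProperty.call(translations, key) ? translations[key] : fallback;
}
```

With a fallback like this, contact types that never configure a message would still show something sensible to the user.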

Unfortunately, no, there is currently no support for this built into the CHT. (Personally, I am appalled this is not just a core feature of Couch…) I do have some hope that once we get cht-core updated to use cht-datasource as the primary interface for interacting with Couch/Pouch that will give us the power to include a feature like this (without fear that the field will be stale half the time…).

Depending on the UX design, there could be the opportunity for both a small set of default fields and then a view to see the rest of the contact-summary fields (if any). I tend to agree that name, date_of_birth and reported_date should probably be shown no matter what. However, if we are going to show the contact-summary fields we need to be careful with how we show date_of_birth and reported_date since those values might be duplicated in the contact-summary.
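
One way to handle that overlap might look like the sketch below. The `SummaryField` and `ContactInfo` shapes are assumptions, and matching on `label` is a simplification: real contact-summary fields may use translation keys as labels, so the comparison would likely need to be smarter than this.

```typescript
// Hypothetical shape of a field shown on the duplicate card.
interface SummaryField {
  label: string;
  value: string;
}

// Hypothetical subset of a contact doc with the three "always shown" fields.
interface ContactInfo {
  name: string;
  date_of_birth: string;
  reported_date: string;
}

// Default fields come first; contact-summary fields are appended
// only if they do not repeat one of the defaults.
function fieldsToDisplay(contact: ContactInfo, summaryFields: SummaryField[]): SummaryField[] {
  const defaults: SummaryField[] = [
    { label: "name", value: contact.name },
    { label: "date_of_birth", value: contact.date_of_birth },
    { label: "reported_date", value: contact.reported_date },
  ];
  const defaultLabels = defaults.map((field) => field.label);
  const extras = summaryFields.filter((field) => defaultLabels.indexOf(field.label) === -1);
  return defaults.concat(extras);
}
```

If no contact-summary fields are configured, the function still returns the three defaults, so the card is never blank.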