Mitigate duplicate data capture

gareth · February 7, 2024, 6:46am

Our search splits the overall search string into keywords and then drops the short ones. This means the “01” in your search phrase is ignored entirely. We do this for performance: if we indexed on, for example, “a”, the index would be enormous and the results wouldn’t be useful, because the term would match a large share of the docs in the database.
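
To make the splitting step concrete, here is a minimal sketch in Python. It is not the CHT Core code; the function name and the cutoff of 3 characters are assumptions for illustration only.

```python
MIN_TERM_LENGTH = 3  # assumed cutoff; the real value in CHT Core may differ


def index_terms(text, min_length=MIN_TERM_LENGTH):
    """Split text into lowercase keywords and drop the short ones."""
    words = text.lower().split()
    return [word for word in words if len(word) >= min_length]


print(index_terms("test 01"))  # ['test']
```

With this rule, "test 01" and "test 02" both end up indexed under "test" only.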

There is a notable change in the next CHT Core release. In previous versions your search term would have returned no results because, while the short keyword was ignored during indexing, it wasn’t ignored on the client side. As of 4.6.0, as you noticed, you get both results, since it’s better to get too many results than too few. I hope this goes some way towards addressing the duplicate creation issue in this thread, as it will no longer appear as though “test 01” doesn’t exist.
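
The difference between the two behaviours can be sketched roughly as follows. This is a simplified model, not the actual implementation: the contact names, the 3-character cutoff, and the assumption that every query term must match are all hypothetical.

```python
MIN_TERM_LENGTH = 3  # assumed cutoff for illustration


def keywords(text):
    return text.lower().split()


def build_index(names):
    """Index each name under its keywords, skipping short ones."""
    index = {}
    for name in names:
        for word in keywords(name):
            if len(word) >= MIN_TERM_LENGTH:
                index.setdefault(word, set()).add(name)
    return index


def search(index, query, drop_short_terms):
    """Return names matching every query term (assumed AND semantics)."""
    terms = keywords(query)
    if drop_short_terms:
        terms = [t for t in terms if len(t) >= MIN_TERM_LENGTH]
    if not terms:
        return set()
    return set.intersection(*[index.get(t, set()) for t in terms])


index = build_index(["test 01", "test 02"])  # hypothetical contacts
before = search(index, "test 01", drop_short_terms=False)  # "01" is looked up, nothing matches
after = search(index, "test 01", drop_short_terms=True)    # only "test" is looked up
```

In this model `before` is empty because "01" was never indexed, while `after` contains both contacts.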

I think this could work, but exposing the right APIs to complete it would take quite a lot of effort.

On the other hand, the ability to detect during contact creation whether the new contact looks like one that already exists seems like a really useful addition to CHT Core that would benefit all projects. I’m not sure how this should be implemented, but let’s raise an issue and start the discussion.
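
As a starting point for that discussion, one possible approach is a fuzzy name comparison at creation time. The sketch below uses Python's standard `difflib`; the function names and the 0.85 threshold are assumptions, and nothing here reflects an existing CHT Core feature.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def likely_duplicates(new_name, existing_names, threshold=0.85):
    """Return (name, score) pairs whose similarity meets the assumed threshold."""
    target = normalise(new_name)
    hits = []
    for name in existing_names:
        score = SequenceMatcher(None, target, normalise(name)).ratio()
        if score >= threshold:
            hits.append((name, score))
    # Most similar candidates first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


candidates = likely_duplicates("Test  01", ["test 01", "Jane Doe"])
```

A real implementation would probably need to compare more than the name (for example parent place or date of birth) and would need a threshold tuned per project.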