Upper limit on number of docs server will sync

What would you advise, client side or server side purging, given our current challenges with document limits?

Sorry, I think I made this more confusing by saying “server-side”. Newer versions of the CHT really just support only one kind of purging (described on the docs page). This purging has both server-side and client-side components, but it is all working together with the same configuration (so there is no choice between one or the other).

Would appreciate guidance on writing purge rules so that we don’t lose any data, which is our greatest worry

The most important thing to understand about CHT purging is that the documents are removed from the client devices, but they are not deleted from the server. Technically speaking, the medic database on the server is not affected at all by the purging process. Instead, the purging process will separately maintain a list of docs that should be considered “purged” for specific users. These docs will be removed from the client devices for those specific users, but will remain untouched in the medic database on the server. In this way, it is not possible for the purging process to result in data loss (in the sense that the data is gone from the server).

Purged docs will be removed (or not replicated in the first place) from affected client devices. So, if you have a CHW user that currently has access to 11317 documents and 11000 of those docs get purged, then the user will only sync 317 docs when logging into their device. Of course, that means that the user will only have those 317 documents on their device. This can break workflows that depend on existing data (e.g. pregnancy followup tasks may not be triggered if the original pregnancy document is purged from the device).

A good approach, when setting up your purge configuration, is to build a matrix of user roles against the types of data records created by your various forms. In each cell of the matrix, fill in how long that user role needs access to that type of record to perform their workflows. For example, a pregnancy record might need to remain on a client device for 9+ months, a patient assessment form might be safe to purge after 1 month, and vaccination records should perhaps never be purged. These time periods are completely dependent on your particular config and the needs of your users, so we do not really have any kind of “recommended” purge configuration. This data matrix can then be referenced when actually writing your purge configuration.
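One way to keep such a matrix close to the purge rule is to write it down as a plain lookup table. The sketch below is only illustrative: the role names, form names, retention periods, and the `shouldPurge` helper are hypothetical placeholders rather than recommendations, and `null` is used here to mean "never purge".

```javascript
// Hypothetical retention matrix: role -> form -> months to keep on the device.
// null means "never purge". All names and numbers are placeholders.
const RETENTION_MONTHS = {
  chw: { pregnancy: 10, assessment: 1, vaccination: null },
  chw_supervisor: { pregnancy: 10, assessment: 3, vaccination: null },
};

const DAY_MS = 1000 * 60 * 60 * 24;

// Returns true when a report is older than the retention period for the role.
// Unknown roles or forms are never purged, which is the conservative default.
const shouldPurge = (role, report, now = Date.now()) => {
  const months = (RETENTION_MONTHS[role] || {})[report.form];
  if (months === null || months === undefined) {
    return false;
  }
  // A month is approximated as 30 days, as in the purge rules in this thread.
  return report.reported_date <= now - DAY_MS * 30 * months;
};
```

Defaulting to "keep" for anything missing from the table means a newly added form is not purged until someone has decided how long it is needed.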

@jkuester, our use case is simple. CHWs register households, fill in the suspect registration form and, for each suspect, fill in the CIF form while offline, then sync in the evening. Health Workers register suspects at the clinic and fill in the CIF form for each case while offline, then sync in the evening. We are not using the messaging app, so we don’t store any messages and have no need to purge them.

I have reviewed the code in the docs (Purging | Community Health Toolkit), customized it for our scenario, and pushed it to the test instance as below:
purge.js
module.exports = {
  text_expression: 'Everyday at 4 pm',
  run_every_days: 7,
  cron: '0 16 * * *',

  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;

    const reportsToPurge = reports.filter(r => {
      if (userCtx.roles.includes('chw_supervisor' || 'chw' || 'health_worker')) {
        return true;
      }
      const purgeThreshold = ['household', 'suspected_case', 'case_investigation'].includes(r.form) ? 12 : 6;
      return r.reported_date <= monthAgo(purgeThreshold);
    }).map(r => r._id);

    return [...reportsToPurge];
  }
};
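
A note on the role check in the rule above: in JavaScript, `||` returns its first truthy operand, so the argument passed to `includes` collapses to a single string. The short sketch below shows the effect; the `roles` array is a made-up example user, not data from this instance.

```javascript
// The || operator returns its first truthy operand, so this expression
// collapses to a single string before includes() ever sees it.
const collapsed = 'chw_supervisor' || 'chw' || 'health_worker';

// Hypothetical user with only the chw role.
const roles = ['chw'];

// Equivalent to roles.includes('chw_supervisor'): the other two roles are never checked.
const buggyMatch = roles.includes('chw_supervisor' || 'chw' || 'health_worker');

// Checking each role explicitly gives the intended result.
const TARGET_ROLES = ['chw_supervisor', 'chw', 'health_worker'];
const intendedMatch = roles.some(role => TARGET_ROLES.includes(role));

console.log(collapsed, buggyMatch, intendedMatch); // chw_supervisor false true
```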

@jkuester, @diana, I have tested the revised purge rule above on a test instance with few records.

Retrieving a list of all purge logs, I get the following:
{"total_rows":1295,"offset":23,"rows":[
{"id":"purgelog:1674996116976","key":"purgelog:1674996116976","value":{"rev":"1-3b88f4a26e28ff412f9732ccad3414a5"}},
{"id":"purgelog:1674995816884","key":"purgelog:1674995816884","value":{"rev":"1-cc42012ef55f410eab30687b436df1b0"}},
{"id":"purgelog:1674734276001","key":"purgelog:1674734276001","value":{"rev":"1-5b1a16c98cd62e95c42bd03dee77b65a"}}
]}
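
The numeric suffix of each purge log id looks like a Unix timestamp in milliseconds. Treating that as an assumption, a small helper (the name `purgeLogDate` is made up for this sketch) can turn the ids into readable run times, which helps when checking whether purging ran when the cron expression says it should:

```javascript
// Assumption: the suffix after "purgelog:" is a Unix timestamp in milliseconds.
const purgeLogDate = id => new Date(Number(id.split(':')[1]));

// Ids copied from the purge log listing above.
const ids = [
  'purgelog:1674996116976',
  'purgelog:1674995816884',
  'purgelog:1674734276001',
];

// ISO 8601 strings in UTC, one per purge run.
const runTimes = ids.map(id => purgeLogDate(id).toISOString());
console.log(runTimes);
```

The printed times are in UTC, so they need to be shifted to the server's local time zone before comparing them with the schedule.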

Retrieving the list of all purge logs with errors, I get the following:
{"total_rows":1295,"offset":23,"rows":[
]}
I finally get the following from Fauxton:

I am trying to understand what this means. Before pushing to the production instance, I wanted to test a rule that would purge documents (household, suspected_case and case_investigation) for chw, chw_supervisor and health_worker users every day at 4 pm. I would appreciate any insights.

@jkuester, @diana, I would like to get feedback before pushing the code to the production instance, thank you

Hi @oyierphil ,
Having worked on configuring purging for one of our apps, I will try to answer some of your questions here. Diana and Josh, please feel free to validate and add any more information.

The medic-purged-role-xxxxx databases that you’re seeing in Fauxton contain the uuids of documents that are purged for a particular role, as mentioned here. The xxxxxxxx in these database names is the hash of the role for which the documents were purged. In your case, the second role had 18 documents purged.

That could be one of chw_supervisor, chw or health_worker. You can open the medic-purged-role- database and see the list of ids that are purged. You can pick one and look up that id in the medic database to see which document will be purged.

In your test data, did you have documents to be purged for each of the roles? Looking at your Fauxton database screenshot, only one role will have 18 documents purged. Please verify that on your end.

@yrimal, thank you for the feedback. I created users with only one of the roles, chw, and asked them to enter data, which explains the 18 documents for that role and 0 for the other two. I want to push the code to the production instance tomorrow, and maybe increase the number of months to two. Given your experience, I would appreciate a review of the rule we created above to be sure it will achieve the purpose, regards

So you’re looking to purge records that are older than 12 months for the 'household', 'suspected_case', and 'case_investigation' forms, and purge all records older than 6 months for all other forms, right? And you’re purging for three roles, right? That’s my best understanding based on this code, and if that’s what you’re looking for, the purging rule looks good.

@yrimal, thank you for the great insights. We started data collection in November last year, thus want to purge documents for November and December 2022 (two months), run the code today at 4 pm, then revert to every Friday evening.
The revised code is as below:

module.exports = {
  text_expression: "Purge reports at 4 pm on Wednesday",
  run_every_days: 7,
  cron: "0 16 * * WED",

  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
    const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];

    const reportsToPurge = reports.filter(r => {
      if (userCtx.roles.includes("chw_supervisor") || userCtx.roles.includes("chw") || userCtx.roles.includes("health_worker")) {
        return true;
      }
      const purgeThreshold = FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(r.form) ? 2 : 3;
      return r.reported_date <= monthAgo(purgeThreshold);
    }).map(r => r._id);

    return [...reportsToPurge];
  }
};

I do not think you want the role check (the `if` that returns true) inside the filter. Returning true there will cause all reports for users with the specified roles to be purged! The filter returns true for every report belonging to a user with one of these roles, so none of the reports are filtered out and the reported_date logic is never reached.
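
To make the effect concrete, here is a minimal sketch with made-up sample data (the report ids and the `chwUser` object are hypothetical). It mirrors the shape of the filter in the rule above and shows that a report created moments ago is still selected for purging:

```javascript
// Hypothetical sample data: one old report and one created just now.
const now = Date.now();
const DAY_MS = 1000 * 60 * 60 * 24;
const sampleReports = [
  { _id: 'old-report', form: 'household', reported_date: now - 100 * DAY_MS },
  { _id: 'new-report', form: 'household', reported_date: now },
];
const chwUser = { roles: ['chw'] };

// Same shape as the filter in the rule above: the role check returns early.
const purgedIds = sampleReports
  .filter(r => {
    if (chwUser.roles.includes('chw')) {
      return true; // every report passes, regardless of age
    }
    return r.reported_date <= now - DAY_MS * 30 * 2;
  })
  .map(r => r._id);

console.log(purgedIds); // both ids, including the report created just now
```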

I think you probably should check the roles outside of the filter and just return an empty array (with no report ids) if the user does not have the proper role. Something like what I have done in this example:

fn: (userCtx, contact, reports) => {
  const NOW = Date.now();
  const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];
  const USER_ROLES = ["chw_supervisor", "chw", "health_worker"];
  const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
  const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
  if(!userCtx.roles.some(role => USER_ROLES.includes(role))) {
    return [];
  }
  return reports
    .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
    .map(r => r._id);
}
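
Before deploying, a function like this can be exercised locally with a few fake reports. The sketch below wraps the function above in an object so it can be called directly; the sample reports, the `some_other_form` form name, and the `data_entry` role are all invented for the test and are not part of any real config.

```javascript
// The function from the example above, wrapped so it can be called directly.
const purgeConfig = {
  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];
    const USER_ROLES = ["chw_supervisor", "chw", "health_worker"];
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
    const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
    if (!userCtx.roles.some(role => USER_ROLES.includes(role))) {
      return [];
    }
    return reports
      .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
      .map(r => r._id);
  }
};

// Hypothetical reports, aged in days relative to now.
const DAY_MS = 1000 * 60 * 60 * 24;
const daysAgo = days => Date.now() - days * DAY_MS;
const fakeReports = [
  { _id: 'household-70d', form: 'household', reported_date: daysAgo(70) },
  { _id: 'household-10d', form: 'household', reported_date: daysAgo(10) },
  { _id: 'other-70d', form: 'some_other_form', reported_date: daysAgo(70) },
  { _id: 'other-100d', form: 'some_other_form', reported_date: daysAgo(100) },
];

// A chw gets only reports past their threshold (60 or 90 days) purged.
const chwPurged = purgeConfig.fn({ roles: ['chw'] }, {}, fakeReports);
// A user without any of the three roles gets nothing purged.
const otherPurged = purgeConfig.fn({ roles: ['data_entry'] }, {}, fakeReports);

console.log(chwPurged, otherPurged);
```

With these inputs, only the 70-day-old household report and the 100-day-old report of the other form should be listed for the chw user, and the list for the other user should be empty.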