Upper limit on number of docs server will sync

jkuester · January 24, 2023, 4:52pm

What would you advise, client side or server side purging, given our current challenges with document limits?

Sorry, I think I made this more confusing by saying “server-side”. Newer versions of the CHT really just support only one kind of purging (described on the docs page). This purging has both server-side and client-side components, but it is all working together with the same configuration (so there is no choice between one or the other).

Would appreciate guidance on writing purge rules so that we don’t lose any data, our greatest worry

The most important thing to understand about CHT purging is that the documents are removed from the client devices, but they are not deleted from the server. Technically speaking, the medic database on the server is not affected at all by the purging process. Instead, the purging process will separately maintain a list of docs that should be considered “purged” for specific users. These docs will be removed from the client devices for those specific users, but will remain untouched in the medic database on the server. In this way, it is not possible for the purging process to result in data loss (in the sense that the data is gone from the server).

Purged docs will be removed (or not replicated in the first place) from affected client devices. So, if you have a CHW user that currently has access to 11317 documents and 11000 of those docs get purged, then the user will only sync 317 docs when logging into their device. Of course, that means that the user will only have those 317 documents on their device. This can break workflows that depend on existing data (e.g. pregnancy followup tasks may not be triggered if the original pregnancy document is purged from the device).
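
As a rough mental model (not the actual CHT implementation), the docs a device ends up with can be thought of as the docs the user is allowed to see, minus the ids recorded as purged for that user. The function and variable names below are made up for illustration; the counts are the ones from the example above.

```javascript
// Conceptual sketch only: the server keeps every doc, and a separate list
// of purged ids decides what a given user's device does not receive.
const docsToReplicate = (allowedIds, purgedIds) => {
  const purged = new Set(purgedIds);
  return allowedIds.filter(id => !purged.has(id));
};

// 11317 docs visible to the user, 11000 of them marked as purged.
const allowedIds = Array.from({ length: 11317 }, (_, i) => `doc-${i}`);
const purgedIds = allowedIds.slice(0, 11000);

const onDevice = docsToReplicate(allowedIds, purgedIds);
console.log(onDevice.length);   // 317
console.log(allowedIds.length); // 11317, the full list itself is unchanged
```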

A good approach, when setting up your purge configuration, is to build a matrix of user roles and the types of data records that get created by your various forms. In the matrix you can then fill out how long each user role needs access to each type of record to perform their workflows. For example, a pregnancy record might need to remain on a client device for 9+ months, a patient assessment form might be safe to purge after 1 month, and vaccination records should maybe never be purged. These time periods are completely dependent on your particular config and the needs of your users, so we do not really have any kind of “recommended” purge configuration. This data matrix can then be referenced when actually writing your purge configuration.
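
One way such a matrix could be turned into purge logic is sketched below. The role names, form names, and retention periods are placeholders chosen for illustration (not a recommended configuration), and `selectReportsToPurge` is a hypothetical helper, not part of the CHT API.

```javascript
// Hypothetical retention matrix: role -> form -> months to keep on the device.
// null means "never purge". Every name and number here is a placeholder.
const RETENTION_MONTHS = {
  chw: { pregnancy: 10, assessment: 1, vaccination: null },
  chw_supervisor: { pregnancy: 12, assessment: 3, vaccination: null },
};

const DAY_MS = 1000 * 60 * 60 * 24;
// A month is approximated as 30 days.
const monthsAgo = (now, months) => now - DAY_MS * 30 * months;

// Returns the number of months to keep a form for a user, or null to keep it
// forever. With several roles, the longest period wins (the cautious choice).
const retentionFor = (roles, form) => {
  const periods = roles
    .filter(role => RETENTION_MONTHS[role])
    .map(role => RETENTION_MONTHS[role][form]);
  if (!periods.length || periods.some(p => p === null || p === undefined)) {
    return null; // unknown role or form, or explicitly "never purge"
  }
  return Math.max(...periods);
};

// Returns the ids of reports that are past their retention period.
const selectReportsToPurge = (roles, reports, now) => reports
  .filter(r => {
    const months = retentionFor(roles, r.form);
    return months !== null && r.reported_date <= monthsAgo(now, months);
  })
  .map(r => r._id);
```

Anything not listed in the matrix is kept, so a forgotten form or role errs on the side of leaving data on the device. In a real purge.js the matrix and helpers would probably need to be defined inside `fn` itself; that is an assumption worth checking against the Purging docs page.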

oyierphil · January 24, 2023, 9:01pm

@jkuester, our use-case is simple. CHWs register households, fill the suspect registration form and, for each suspect, fill the CIF form while offline, then sync in the evening. Health Workers register suspects at the clinic and fill the CIF form for each case while offline, then sync in the evening. We are not using the messaging app, so we don’t store any messages and have no need to purge them.

Have reviewed the code here, Purging | Community Health Toolkit and customized for our scenario, pushed the code to the test instance as below:
purge.js
module.exports = {
  text_expression: 'Everyday at 4 pm',
  run_every_days: 7,
  cron: '0 16 * * *',

  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;

    const reportsToPurge = reports.filter(r => {
      if (userCtx.roles.includes('chw_supervisor' || 'chw' || 'health_worker')) {
        return true;
      }
      const purgeThreshold = ['household', 'suspected_case', 'case_investigation'].includes(r.form) ? 12 : 6;
      return r.reported_date <= monthAgo(purgeThreshold);
    }).map(r => r._id);

    return [...reportsToPurge];
  }
};
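
A side note on the role check in the rule above: in JavaScript, `'a' || 'b' || 'c'` evaluates to its first truthy operand, so `includes('chw_supervisor' || 'chw' || 'health_worker')` only ever tests for `chw_supervisor`. A minimal sketch of the difference follows; `hasAnyRole` is a hypothetical helper name, not part of the CHT API.

```javascript
// || returns its first truthy operand, so this is just 'chw_supervisor'.
const checkedRole = 'chw_supervisor' || 'chw' || 'health_worker';
console.log(checkedRole); // chw_supervisor

const chwRoles = ['chw'];
// A user with only the chw role therefore does not match this check...
console.log(chwRoles.includes(checkedRole)); // false

// ...whereas testing each role explicitly does.
const hasAnyRole = (userRoles, wanted) => wanted.some(role => userRoles.includes(role));
console.log(hasAnyRole(chwRoles, ['chw_supervisor', 'chw', 'health_worker'])); // true
```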

oyierphil · January 29, 2023, 1:38pm

@jkuester, @diana, I have tested the revised purge rule above on a test instance with few records.

Retrieving a list of all purge logs, I get the following information:
{"total_rows":1295,"offset":23,"rows":[
{"id":"purgelog:1674996116976","key":"purgelog:1674996116976","value":{"rev":"1-3b88f4a26e28ff412f9732ccad3414a5"}},
{"id":"purgelog:1674995816884","key":"purgelog:1674995816884","value":{"rev":"1-cc42012ef55f410eab30687b436df1b0"}},
…
{"id":"purgelog:1674734276001","key":"purgelog:1674734276001","value":{"rev":"1-5b1a16c98cd62e95c42bd03dee77b65a"}}
]}
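
The numeric suffix of these ids looks like a Unix timestamp in milliseconds. That is an assumption based on the values shown here, not something confirmed in this thread. Under that assumption, a small helper (`purgeLogDate`, a made-up name) can show when each log was written:

```javascript
// Assumption: the part after "purgelog:" is a Unix timestamp in milliseconds.
const purgeLogDate = id => new Date(Number(id.split(':')[1]));

const ids = [
  'purgelog:1674996116976',
  'purgelog:1674995816884',
  'purgelog:1674734276001',
];

// Print each id next to the UTC time it would correspond to.
ids.forEach(id => console.log(id, purgeLogDate(id).toISOString()));
```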

Retrieving list of all purge logs with errors, I get the following:
{"total_rows":1295,"offset":23,"rows":[]}
I finally get the following from Fauxton:

I am trying to understand what this means. I wanted to test a rule that would purge documents (household, suspected_case and case_investigation) for chw, chw_supervisor and health_worker users every day at 4 pm before pushing it to the production instance. I would appreciate any insights.

oyierphil · January 30, 2023, 10:43am

@jkuester, @diana, would wish to get feedback before pushing the code to the production instance, thank you

yuv · January 30, 2023, 4:05pm

Hi @oyierphil ,
Having worked on configuring purging for one of our apps, I will try to answer some of your questions here. Diana and Josh, please feel free to validate and add any more information.

The medic-purged-role-xxxxx databases that you’re seeing in Fauxton contain the UUIDs of the documents that are purged for a particular role, as mentioned here. The xxxxx part of each database name is the hash of the role for which the documents were purged. In your case, the second role had 18 documents purged.

That could be one of chw_supervisor or chw or health_worker. You can open the medic-purged-role- database and see list of ids that are purged. You can pick one and check that id on medic database to know which document will be purged.

In your test data, did you have documents for each of the roles to be purged? Looking at your Fauxton database screenshot, only one role will have 18 documents purged. Please verify that on your end.

oyierphil · January 30, 2023, 7:15pm

@yuv, thank you for the feedback. I created users as chws, one of the roles and requested them to enter data, this explains the 18 documents and 0 for the other two roles. Want to push the code to the production instance tomorrow, and maybe increase the number of months to two, will appreciate a review of the rule we created above, given your experience to be sure it will achieve the purpose, regards

yuv · January 30, 2023, 8:43pm

So you’re looking to purge records that are older than 12 months for ‘household’, ‘suspected_case’, and ‘case_investigation’ forms and purge all records older than 6 months for all other forms, right? And you’re purging for three roles, right? That’s my best understanding based on this code, and if that’s what you’re looking for, the purging rule looks good.

oyierphil · January 31, 2023, 9:22am

@yuv, thank you for the great insights. We started data collection in November last year, thus want to purge documents for November and December 2022 (two months), run the code today at 4 pm, then revert to every Friday evening.
The revised code is as below:

module.exports = {
  text_expression: "Purge reports at 4 pm on Wednesday",
  run_every_days: 7,
  cron: "0 16 * * WED",

  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
    const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];

    const reportsToPurge = reports.filter(r => {
      if (userCtx.roles.includes("chw_supervisor") || userCtx.roles.includes("chw") || userCtx.roles.includes("health_worker")) {
        return true;
      }
      const purgeThreshold = FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(r.form) ? 2 : 3;
      return r.reported_date <= monthAgo(purgeThreshold);
    }).map(r => r._id);

    return [...reportsToPurge];
  }
};

jkuester · February 1, 2023, 3:12pm

I do not think you want this code here inside the filter since returning true in this case is going to end up causing all reports for users with the specified roles to be purged! (You are returning true from the filter for each report for a user of these types so none of the reports will be filtered out and you will never reach your reported_date logic!)
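
To make the effect concrete, here is a small standalone sketch with made-up reports (the ids and dates are invented for illustration). With the role check inside the filter, a report from yesterday is selected just like one from three months ago:

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;
const NOW = Date.now();
const monthAgo = months => NOW - DAY_MS * 30 * months;

// Invented sample data: one old report and one recent report.
const reports = [
  { _id: 'old-report', form: 'household', reported_date: NOW - 100 * DAY_MS },
  { _id: 'recent-report', form: 'household', reported_date: NOW - 1 * DAY_MS },
];
const userCtx = { roles: ['chw'] };

// Role check inside the filter: returns true before the date is ever looked at.
const withEarlyReturn = reports.filter(r => {
  if (userCtx.roles.includes('chw')) {
    return true;
  }
  return r.reported_date <= monthAgo(2);
}).map(r => r._id);

// Date check only: just the old report is selected.
const withDateCheck = reports
  .filter(r => r.reported_date <= monthAgo(2))
  .map(r => r._id);

console.log(withEarlyReturn); // [ 'old-report', 'recent-report' ]
console.log(withDateCheck);   // [ 'old-report' ]
```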

I think you probably should check the roles outside of the filter and just return an empty array (with no report ids) if the user does not have the proper role. Something like what I have done in this example:

fn: (userCtx, contact, reports) => {
  const NOW = Date.now();
  const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];
  const USER_ROLES = ["chw_supervisor", "chw", "health_worker"];
  const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
  const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
  if(!userCtx.roles.some(role => USER_ROLES.includes(role))) {
    return [];
  }
  return reports
    .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
    .map(r => r._id);
}
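
A rule like this can be exercised locally with a few fake reports before it goes anywhere near a server. The sketch below copies the function above into a plain constant and calls it directly. The sample reports, their ids, the `other_form` form and the `nurse` role are invented for the test, and this is not how the CHT itself invokes the function.

```javascript
// Copy of the purge function above, assigned to a constant so it can be called.
const fn = (userCtx, contact, reports) => {
  const NOW = Date.now();
  const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];
  const USER_ROLES = ["chw_supervisor", "chw", "health_worker"];
  const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
  const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
  if (!userCtx.roles.some(role => USER_ROLES.includes(role))) {
    return [];
  }
  return reports
    .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
    .map(r => r._id);
};

const DAY_MS = 1000 * 60 * 60 * 24;
const daysAgo = days => Date.now() - days * DAY_MS;

// Invented sample reports on either side of the 60-day and 90-day thresholds.
const sampleReports = [
  { _id: 'a', form: 'household', reported_date: daysAgo(70) },   // older than 2 months
  { _id: 'b', form: 'household', reported_date: daysAgo(50) },   // newer than 2 months
  { _id: 'c', form: 'other_form', reported_date: daysAgo(70) },  // newer than 3 months
  { _id: 'd', form: 'other_form', reported_date: daysAgo(100) }, // older than 3 months
];

console.log(fn({ roles: ['chw'] }, {}, sampleReports));   // [ 'a', 'd' ]
console.log(fn({ roles: ['nurse'] }, {}, sampleReports)); // []
```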

oyierphil · February 1, 2023, 5:58pm

@jkuester, thank you for reviewing the code and the feedback. I did run the code for the first time on a test VM and got some results shared above. Made slight changes and nothing happened as you correctly observed, did further slight modifications but still nothing happened, no output, no errors.

Revised the rule as below and ran on another test instance with the following output:
  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["household", "suspected_case", "case_investigation"];
    const USER_ROLES = ["chw_supervisor", "chw", "health_worker"];
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
    const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
    if (!userCtx.roles.some(role => USER_ROLES.includes(role))) {
      return [];
    }
    return reports
      .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
      .map(r => r._id);
  }
};

Retrieving a list of all your purge logs gives the following:

| total_rows | 5978 |
| offset | 681 |
| rows | [] |

Retrieve a list of purge logs with errors gives the following:

| total_rows | 5978 |
| offset | 681 |
| rows | [] |

Trying to review the code further to see why it is returning empty array, will appreciate any further insights

oyierphil · February 2, 2023, 12:53pm

@jkuester, @yuv, tried the following simple code and had results:

module.exports = {
    text_expression: 'at 3 pm on Thursday',
    run_every_days: 7,
    cron: '20 14 * * THUR',
    fn: function(userCtx, contact, reports) {
      const NOW = Date.now(); 
      const monthAgo = NOW - (1000 * 60 * 60 * 24 * 30);  
      const reportsToPurge = reports
        .filter(r => r.reported_date <= monthAgo)
        .map(r => r._id);
      
      return [...reportsToPurge];
    }
  };

Retrieving a list of all your purge logs gives the following:

{"total_rows":5983,"offset":682,"rows":[
{"id":"purgelog:1675337139005","key":"purgelog:1675337139005","value":{"rev":"1-24ba339e137a712b206a7169ef085293"}},
{"id":"purgelog:1675336840161","key":"purgelog:1675336840161","value":{"rev":"1-bd711a77f7629d901e913b639bbf0bc6"}},
{"id":"purgelog:1675336540491","key":"purgelog:1675336540491","value":{"rev":"1-a2d547c69e1b39911b2b5022b76fea90"}}
]}

Retrieve a list of purge logs with errors gives the following:

{"total_rows":5983,"offset":682,"rows":[]}

We have noted that the challenge with polling and replication mainly affects CHWs. I logged in as one of them using my laptop at noon today, with 6500 documents, and the app is still polling now, at almost 4 pm. We now have a good number of CHW users, especially from some regions, with the same challenge as below:

The code was modified to target them first in purging, and I am wondering why it doesn’t give results. Any ideas? I just want to be sure I got it correct before pushing to the production instance:

module.exports = {
  text_expression: 'Everyday at 4 pm',
  run_every_days: 7,
  cron: '0 22 * * Wed',

  fn: (userCtx, contact, reports) => {
    const NOW = Date.now();
    const FORMS_WITH_2_MONTH_PURGE_THRESHOLD = ["case_investigation"];
    const USER_ROLES = ["chw"];
    const monthAgo = months => NOW - 1000 * 60 * 60 * 24 * 30 * months;
    const getPurgeThreshold = ({ form }) => FORMS_WITH_2_MONTH_PURGE_THRESHOLD.includes(form) ? 2 : 3;
    if(!userCtx.roles.some(role => USER_ROLES.includes(role))) {
      return [];
    }
    return reports
      .filter(r => r.reported_date <= monthAgo(getPurgeThreshold(r)))
      .map(r => r._id);
  }
};

Finally, I noted that medic-purged-role-… database documents were created, can I delete as I review and test the codes?

yuv · February 2, 2023, 3:37pm

Hi @oyierphil ,
For offline users, whenever they log in for the first time on a device that has not been logged in previously, it takes some time to poll data. But this is a one-time cost. Once polling is completed and records are downloaded for that user, it should not usually take that long. If multiple users log in at the same time for the first time, they all start polling data and it can be slower. The thing that seems to impact replication time the most is the quality of the data connection that the phone has.

can I delete as I review and test the codes?

If it’s the test instance and you don’t need purging testing anymore, you can remove this database. On production instance however, you should not delete this database as it holds the ids of the records that are purged for that role. Is it still showing polling or it has changed?

oyierphil · February 2, 2023, 6:54pm

@yuv, I told the browser to forget the session, it took very very long, I was on a LAN. Wanted to delete to start again and see if the code works or not, not sure how to confirm if we have the medic-purge-… as above. I still can’t explain why the second code can’t run

oyierphil · February 3, 2023, 9:31am

@yuv, @jkuester, checked Fauxton today to see progress of purging status as below:

For some users (mainly CHWs) the app doesn’t load and instead displays the message “Polling replication data”. We had set replication depth for them as 3 (CHW Area, Households and Household Members).

My assumption was that if purging is done, then we will reduce the number of documents to poll, but the challenge still exists. Any ideas where else to check? The users are frustrated, and so am I. I tried to log in from my laptop as one of the users and hit the same challenge; I had to forget the site to stop the polling and replication screen

jkuester · February 6, 2023, 2:55pm

@oyierphil sorry for the delayed response! I see you added some more details in this thread and I have a few followup questions.

Starting index update for db: shards/a0000000-bfffffff/xxxx.1668361497 idx: _desi…, Index update finished for db: shards/00000000-1fffffff/xxxx.1668361497 idx: _desi…

You mentioned seeing a lot of these messages in the Couch logs. These are normal messages that indicate Couch is properly updating the views for changes occurring to the DB. (Though an increase in the frequency of these messages could indicate an increase in load/traffic to Couch.) Couch should be able to process these changes very quickly. One thing to look for in the Couch logs is any kind of timeout error. (Though at this point any error in the logs would be of interest.) The timeout errors could indicate that the Couch instance is currently just inundated with traffic and is having trouble keeping up. Another thing to check would be to see if there are any long-running “Active Tasks” in Fauxton: _utils/#/activetasks. As I mentioned above, normally these active tasks should cycle through pretty quickly, but if some of them are blocking for some reason that could indicate an issue.
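
The same information that Fauxton shows is available from CouchDB's `_active_tasks` endpoint, so long-running tasks can also be picked out with a few lines of script. This is only a sketch: `longRunningTasks` is a made-up helper, the sample tasks and database names are invented, and the 10-minute cutoff is an arbitrary choice.

```javascript
// Returns the tasks that started more than maxAgeSeconds ago.
// started_on is a Unix timestamp in seconds in the _active_tasks output.
const longRunningTasks = (tasks, nowSeconds, maxAgeSeconds) => tasks
  .filter(task => nowSeconds - task.started_on > maxAgeSeconds)
  .map(task => ({
    type: task.type,
    database: task.database,
    progress: task.progress,
    runningForSeconds: nowSeconds - task.started_on,
  }));

// Invented sample shaped like an _active_tasks response.
const sampleTasks = [
  { type: 'indexer', database: 'example-db-shard-1', progress: 12, started_on: 1000, updated_on: 4000 },
  { type: 'indexer', database: 'example-db-shard-2', progress: 99, started_on: 4500, updated_on: 4590 },
];

// Pretend "now" is second 4600 and flag anything older than 10 minutes.
const stuck = longRunningTasks(sampleTasks, 4600, 600);
console.log(stuck);
```

Against a real server, the `tasks` array would come from a `GET` request to `/_active_tasks` made with admin credentials, and `nowSeconds` would be `Math.floor(Date.now() / 1000)`.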

had to forget the site to stop the polling and replication screen

Once you forgot the site and re-tried the login, did it work? Is this behavior happening for all users or just some? Does it happen consistently for the users or are they sometimes able to successfully load the app?

My assumption was that if purging is done, then we will reduce the number of documents to poll,

Purging will reduce the number of documents to “replicate” (copy to the device) but technically it should slightly increase the amount of time it takes to “poll” (determine which docs should be replicated for that user). During the polling process, the CHT server will compare the docs associated with the user with the purge records to see which docs should be considered “purged” for that user. Under normal circumstances, this should be a very quick process. However as was suggested in the other thread, it may take much longer if Couch is still rebuilding the views necessary for fetching the purge data. This is why I am interested to hear about the active tasks.

oyierphil · February 6, 2023, 6:34pm

@jkuester, thank you for the feedback.

 Couch should be able to process these changes very quickly

I have checked the logs just now and I no longer see index update messages.

One thing to look for in the Couch logs is any kind of `timeout` error.

I have checked the logs and I share the feedback as below:
[2023-02-06 18:13:43] REQ 5112058b-5ec0-469e-9302-2d57df01d9ce 105.160.38.170 - GET /medic/_changes?timeout=600000&style=all_docs&heartbeat=10000&since=6411

[2023-02-06 18:14:06] REQ 679802c2-a885-4eb6-9c44-acff609db295 41.81.75.210 - GET /medic/_changes?timeout=600000&style=all_docs&heartbeat=10000&since=641080

[2023-02-06 18:20:33] RES dbc61bff-328b-4f6d-87f9-980e0fb97d2a 197.237.245.238 - GET /medic/_changes?feed=longpoll&heartbeat=10000&since=641173-g1AAAAJ7eJyd

[2023-02-06 18:20:42] RES 7869ec3c-a130-458c-a2bd-a04a9ed10870 41.90.5.10 - GET /medic/_changes?timeout=600000&style=all_docs&heartbeat=10000&since=641169-g

Once you forgot the site and re-tried the login, did it work?

No it didn’t, same screen, polling replication data even after three (3) hrs, thus had to forcefully stop the process

Is this behavior happening for all users or just some? Does it happen consistently for the users or are they sometimes able to successfully load the app?

This behavior doesn’t happen to all users, but a few. Some users are able to load the app after a very very very long time, we are dealing with CHWs, most of whom are not patient with technology

Another thing to check would be to see if there are any long-running “Active Tasks” in Fauxton: _utils/#/activetasks.

I have checked and no active tasks as below:

I now can’t move users from one place to another, it takes long to load the associated contacts, the app keeps on submitting the changes as below but never completes, thus I can’t move users who have changed places or revise users placed at the wrong places:

Finally, the purge database grows daily from my observation, the next job will run tomorrow at 6 pm, hoping this will improve access for our users, we are at 60240 now

oyierphil · February 8, 2023, 9:56am

@jkuester, syncing takes time for some users, especially those with many documents, but eventually works. Our current challenges are: 1) Polling replication data, which is taking a very long time and doesn’t complete for some users and 2) Inability to move users from one place to another, the submit command never completes. We used to move users with ease, not sure what has happened

Purging ran, logs with errors returns null as below:
| total_rows | 566073 |
| offset | 158437 |
| rows | [] |

We have complaints from the affected users. Any idea where else to check? About 130,000 records have been submitted by teams from the four counties.

diana · February 10, 2023, 12:25pm

Hi @oyierphil

Both of these actions call an endpoint that checks the documents that a specific user will download. If this is taking very long for some users, I would suspect that they have too many docs and replication depth rules should be updated, to lower how many hierarchy levels these users have access to.

oyierphil · February 10, 2023, 12:27pm

@diana, these are CHWs, hierarchy set to 3 (CHW Area, Household and Household member)

diana · February 10, 2023, 12:52pm

Hi @oyierphil

I would be interested to check how many docs these users have. Do you think we could work out how you could share some access credentials to your server?

Thanks!