Is purging broken because of #7280?

kenn · April 1, 2022, 5:46pm

Purging hasn’t run on a large project instance since Dec, 2021. I’d like to confirm that the problem is “Server Side Purge does not complete to the end” (Issue #7280, medic/cht-core on GitHub), which was fixed in 3.14.0. What’s the best way to confirm this?

I can see purging log events, for example:

2022-04-01 17:35:50 INFO: Task purging completed for today.

mrjones · April 1, 2022, 10:00pm

Hey @kenn !

Are you debugging an instance that has been upgraded to >=3.14.0 with the fix, or a version <3.14.0 without the fix? I ask because you said both “Purging hasn’t run … since Dec, 2021” and then showed a log line with purging running today (2022-04-01).

If I had to guess, you’re on >=3.14.0 and seeing poor performance for some users who have more than 20k docs. I suspect this because, with the original bug, only the initiation messages would be logged, and not the completion message (which you’re seeing), like so:

[2021-08-27 08:30:00] 2021-08-27 08:30:00 INFO: Running server side purge 
[2021-08-27 09:30:00] 2021-08-27 09:30:00 INFO: Running server side purge 
[2021-08-27 14:00:00] 2021-08-27 14:00:00 INFO: Running server side purge
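One rough way to check a log excerpt for this pattern is to count start lines against completion lines. The sketch below is only an illustration: `summarizePurgeLog` is a hypothetical helper, not part of cht-core, and it assumes the two marker strings appear exactly as in the log lines quoted in this thread.

```javascript
// Hypothetical helper: summarise purge activity in a sentinel log excerpt.
// Both marker strings are copied from log lines quoted in this thread;
// other CHT versions may word them differently (assumption).
const START_MARKER = 'Running server side purge';
const DONE_MARKER = 'Task purging completed';

function summarizePurgeLog(logText) {
  const lines = logText.split('\n');
  const started = lines.filter((line) => line.includes(START_MARKER)).length;
  const completed = lines.filter((line) => line.includes(DONE_MARKER)).length;
  // Starts with no completion at all match the pattern described for the original bug.
  return { started, completed, looksStuck: started > 0 && completed === 0 };
}
```

Feeding it the three lines above would report three starts and no completions, which is the signature of the original bug rather than of a purge that finished.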

The 3.14.0 purge consideration docs call out the 20k threshold as being treated differently:

As of 3.14.0, contacts that have more than 20,000 associated reports + messages will be skipped, and none of their associated reports and messages will be purged. A single contact that has more than 20,000 associated records most likely points to a configuration issue. Skipped contacts’ ids are reported both in logs and in purgelog files
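To make the quoted rule concrete, here is a minimal sketch of how such a threshold could be applied. It is not the cht-core implementation: `partitionContacts` and the `reportCount` / `messageCount` fields are hypothetical names, and only the 20,000 limit comes from the documentation quoted above.

```javascript
// Limit taken from the quoted 3.14.0 documentation.
const MAX_CONTACT_RECORDS = 20000;

// Hypothetical sketch: split contacts into those that would be purged and
// those that would be skipped. Field names are assumptions for illustration.
function partitionContacts(contacts) {
  const toPurge = [];
  const skipped = [];
  for (const contact of contacts) {
    const total = contact.reportCount + contact.messageCount;
    if (total > MAX_CONTACT_RECORDS) {
      // "More than 20,000" records: none of this contact's records are purged.
      skipped.push(contact.id);
    } else {
      toPurge.push(contact.id);
    }
  }
  return { toPurge, skipped };
}
```

The skipped ids are the ones the docs say are reported in the logs and purgelog files.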

Can you check your logs to see if you have any relevant entries? Or, if I guessed the specifics of your situation wrong, please fill me in so we can get you sorted!

diana · April 2, 2022, 5:25am

Hi @kenn

Inspecting the instance, the medic-sentinel process is up, and active, with uptime since Jan 26.
This is not the same behavior we saw with the original instance that reported this (where the sentinel process ran out of memory).
However, I’ve inspected the logs from the last unsuccessful purge (6h ago) and I can see a couchdb request failing (when parsing the response) and then purging failing as a consequence (the failure was graceful).

Running the same request locally, I see the result contains 423,644 reports.
Trying to run the same request with include_docs, I get:

node:buffer:806
    return this.utf8Slice(start, end);
                ^

Error: Cannot create a string longer than 0x1fffffe8 characters
    at Buffer.toString (node:buffer:806:17)
    at Request.<anonymous> (/home/diana/projects/medic-utils/node_modules/request/request.js:1135:39)
    at Request.emit (node:events:527:28)
    at IncomingMessage.<anonymous> (/home/diana/projects/medic-utils/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (node:events:641:28)
    at IncomingMessage.emit (node:events:539:35)
    at endReadableNT (node:internal/streams/readable:1342:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  code: 'ERR_STRING_TOO_LONG'
}

This confirms it’s the same issue, where a batch of contacts has too many reports to be processed at one time.
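The error above means the whole response body could not be turned into a single JavaScript string (the limit shown, 0x1fffffe8, is about 537 million characters). A general way around that kind of limit is to request the data in smaller batches so that no single response gets that large. The sketch below illustrates the idea only; it is not how cht-core addresses the issue, and `chunk`, `processInBatches`, `fetchBatch` and `handleBatch` are hypothetical names.

```javascript
// Split a list of keys (for example contact ids) into fixed-size batches.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical driver: fetchBatch(keys) is assumed to return a promise of rows
// for just those keys, so each response stays well under the string limit.
async function processInBatches(keys, batchSize, fetchBatch, handleBatch) {
  let processed = 0;
  for (const batch of chunk(keys, batchSize)) {
    const rows = await fetchBatch(batch);
    await handleBatch(rows);
    processed += rows.length;
  }
  return processed;
}
```

A fixed batch size does not help when a single contact alone has hundreds of thousands of reports; that case would need paging within the contact, or skipping it as the 20,000 threshold does.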