Purging Not Working Anymore

Current Setup

The purging process is scheduled using the cron literal 0 22 * * SAT, meaning it should run every Saturday at 22:00 UTC.
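
As a quick sanity check, the expression can be run through any cron parser to confirm the intended Saturday 22:00 UTC firing times. The sketch below is illustrative only and assumes the third-party croniter Python package; it is not CHT code:

```python
# Sanity-check the purge schedule: "0 22 * * SAT" should fire Saturdays at 22:00 UTC.
# Illustrative only; assumes the third-party croniter package (pip install croniter).
from datetime import datetime, timezone
from croniter import croniter

# lowercase "sat" is the same schedule as the configured "0 22 * * SAT"
schedule = croniter("0 22 * * sat", datetime(2025, 5, 14, tzinfo=timezone.utc))
for _ in range(3):
    print(schedule.get_next(datetime).isoformat())
# -> 2025-05-17T22:00:00+00:00, 2025-05-24T22:00:00+00:00, 2025-05-31T22:00:00+00:00
```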

Log Analysis

Querying the logs via:
http(s)://<host>/medic-sentinel/_all_docs?end_key="purgelog:"&start_key="purgelog:\ufff0"&descending=true
revealed the following details (a scripted version of this query is sketched after the list below):

  • Last failed purge: "date": "2025-04-06T11:04:20.320Z"
    {"id":"purgelog:error:1743937460320","key":"purgelog:error:1743937460320","value":{"rev":"1-af19dfd921471d31a906ff7a64e5fa3d"}}
    
  • Last successful purge: "date": "2025-05-11T08:27:22.916Z"
    {"id":"purgelog:1746952042916","key":"purgelog:1746952042916","value":{"rev":"1-f4cb23014b0f09b458823205c1051e23"}}
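
For reference, the same query can be scripted. The sketch below is illustrative only (it assumes the third-party requests package, and the host and admin credentials are placeholders); it lists the most recent purgelog documents and converts the epoch-millisecond suffix of each id back into a date:

```python
# List recent purgelog docs from medic-sentinel and decode the epoch-ms in their ids.
# Illustrative sketch: <host> and the credentials are placeholders for your instance.
from datetime import datetime, timezone
import requests

resp = requests.get(
    "https://<host>/medic-sentinel/_all_docs",
    params={
        "start_key": '"purgelog:\\ufff0"',  # keys are JSON strings, hence the quotes
        "end_key": '"purgelog:"',
        "descending": "true",
        "limit": 10,
    },
    auth=("admin-user", "admin-password"),
)
resp.raise_for_status()

for row in resp.json()["rows"]:
    doc_id = row["id"]                        # e.g. purgelog:1746952042916
    epoch_ms = int(doc_id.rsplit(":", 1)[-1])
    when = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    status = "FAILED " if ":error:" in doc_id else "success"
    print(status, when.isoformat(), doc_id)
```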
    

Extract of the Last Successful Purge

2025-05-10T22:00:00.002 INFO: Running purging  
2025-05-10T22:00:00.138 INFO: Purging: Starting contacts batch: key "", doc id "", batch size 1000  
...
2025-05-11T08:27:22.910 INFO: Purging: Starting "targets" purge batch with id "target~2024-10~fe1de9e0-2a34-44bb-96e0-b1d559d4eb00~org.couchdb.user:korkagu"  
2025-05-11T08:27:22.916 INFO: Purging completed in 627.38 minutes  
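
As a side note, the reported duration lines up exactly with the surrounding timestamps, which is a handy cross-check when reading these logs (a minimal Python sketch, nothing CHT-specific):

```python
# Cross-check the reported purge duration against the log timestamps:
# from "Running purging" at 22:00:00.002 to completion at 08:27:22.916 the next day.
from datetime import datetime

start = datetime.fromisoformat("2025-05-10T22:00:00.002")
end = datetime.fromisoformat("2025-05-11T08:27:22.916")
print(f"{(end - start).total_seconds() / 60:.2f} minutes")  # -> 627.38 minutes
```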

Issues & Observations

After a new user role was added, the expected purge did not run to completion, which prevented access to the related account.

Scheduled Purging Operation for 17/05

2025-05-17T22:00:00.001 INFO: Running purging  
2025-05-17T22:00:00.111 INFO: Purging: Starting contacts batch: key "", doc id "", batch size 1000  
...
2025-05-18T17:15:55.657 INFO: Purging: Starting contacts batch: key "c50_family", doc id "A3D161AC-16CC-A746-9B19-DC68B7443CA1", batch size 112  
2025-05-18T17:17:10.845 ERROR: Error while running purging: FetchError: request to `<server>` failed, reason: socket hang up  
2025-05-18T17:17:10.845 INFO: Purging failed after 1157.18 minutes  

Despite this failure, no corresponding purgelog:error document appears in the output of the query above, which raises concerns about the reliability of the purge log.

Scheduled Purging for 24/05

2025-05-24T22:00:00.000 INFO: Running purging  
2025-05-24T22:00:00.150 INFO: Purging: Starting contacts batch: key "", doc id "", batch size 1000  
...
2025-05-26T05:29:11.792 INFO: Purging: Starting contacts batch: key "c50_family", doc id "b6db8eda-936b-47d8-b401-114a8a73ae0a", batch size 112  
2025-05-26T05:36:38.406 INFO: Purging: Starting contacts batch: key "c50_family", doc id "b87b4158-7c09-43df-8c01-28efc2daf66b", batch size 112  

Several log entries like the following appear, but it is unclear whether they are directly related to the purge operation:

INFO: Task purging started  
INFO: Task purging completed  

The end of the purging operation is never logged, and there is no error entry either.

Further Troubleshooting

To increase the chances of execution, the cron literal was modified to '0 22 * * MON'. However, the purge did not run on Monday either. Logs showed no indication of execution.

Apart from the generic entries such as:

INFO: Task purging started  
INFO: Task purging completed  

there was nothing related to an actual purge execution.

So here are my questions:

  1. How can we confirm that a purging operation has actually been triggered?
  2. Is there a way to receive notifications when a purge operation fails?
  3. How can we manually execute purge operations instead of waiting for a cron job to run after deployment?
  4. How can we restart our purge process so that it works properly?

Hi @Gilbert

Just to clarify: by this, do you mean that the errored purgelog was not recorded?

The cron config is ignored if you have a text_expression as well. Can you please confirm that you are not also using text_expression?

How can we confirm that a purging operation has actually been triggered?

You should see a purgelog document recording either the success or the failure. My only guess for the lack of purgelogs is that CouchDB was inaccessible at the time and the log also failed to save. You would see an error message to that effect immediately after the `Purging failed after ...` line.

Is there a way to receive notifications when a purge operation fails?

Unfortunately there is no mechanism for this. You could enable your own log trailing or changes watching alerts for this, but the CHT does not include anything of the sort.
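
For example, one possible do-it-yourself approach, sketched below under the same assumptions as the earlier query script (requests package, placeholder host and credentials, plus a placeholder notify() hook), is to poll the medic-sentinel _changes feed and alert whenever a purgelog:error document appears:

```python
# Watch medic-sentinel's _changes feed and alert on new purgelog:error:* documents.
# Illustrative sketch only: the host, credentials and notify() hook are placeholders.
import time
import requests

BASE = "https://<host>/medic-sentinel"
AUTH = ("admin-user", "admin-password")


def notify(message):
    # Placeholder: wire this up to email, Slack, SMS, etc.
    print("ALERT:", message)


def watch(poll_seconds=300):
    since = "now"  # only react to documents created after the watcher starts
    while True:
        resp = requests.get(
            f"{BASE}/_changes",
            params={"since": since, "include_docs": "true"},
            auth=AUTH,
            timeout=60,
        )
        resp.raise_for_status()
        body = resp.json()
        since = body["last_seq"]
        for change in body["results"]:
            if change["id"].startswith("purgelog:error:"):
                notify(f"Purge failed: {change['id']}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()
```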

How can we manually execute purge operations instead of waiting for a cron job to run after deployment?

There is no way to manually execute purge. You can change the cron property and wait for sentinel to start purging. The scheduler runs every 5 minutes, so you can set your cron to run every minute and wait for a max of 5 minutes before purging starts.
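
As an illustration of the timing (again using croniter, which is not part of the CHT, and taking the 5-minute scheduler interval from the answer above): an every-minute cron becomes due almost immediately, whereas a weekly expression can be days away:

```python
# Compare how soon a purge becomes due under a weekly cron versus an
# every-minute cron; sentinel's scheduler then picks it up within about
# 5 minutes (interval taken from the answer above, not measured here).
from datetime import datetime, timezone
from croniter import croniter

now = datetime.now(timezone.utc)
for expr in ("0 22 * * mon", "* * * * *"):
    next_due = croniter(expr, now).get_next(datetime)
    print(f"{expr!r}: next due {next_due.isoformat()} (in {next_due - now})")
```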

How to restart our purge process to make it properly work?

I am not sure I understand this question. The failure was due to some networking or request failure:

Error while running purging: FetchError: request to `<server>` failed, reason: socket hang up

Restarting would require you to change the purge cron to make it run again sooner than the original time.

Thank you @diana for your responses.
Here are the answers to your questions:

We are unable to find the failed purge operation from 17/05 using the query:
http(s)://<host>/medic-sentinel/_all_docs?end_key="purgelog:"&start_key="purgelog:\ufff0"&descending=true
The last recorded failed operation in the list appears to be the one from 2025-04-06 shown above.

We don't have a text_expression in the purge config.

The purge operation following the one that failed due to a network issue still has no recorded completion.
Additionally, the purge scheduled after the incomplete operation never started.

Once the purge starts, how will the system handle subsequent executions if the cron runs every minute? Will ongoing purges prevent new ones from starting, or will they overlap?

The purging script is aware that purging is already running and will not start another process.

The purge operation following the one that failed due to a network issue still has no recorded completion.
Additionally, the purge scheduled after the incomplete operation never started.

I suspect that the network failure also affected saving the log, which is another database operation. If the database is not reachable, then neither purging would succeed nor would the log be saved, unfortunately.
