How to reset the sentinel sequence ID to skip processing a large backlog?

I have a large CHT instance with a sentinel backlog of more than 60 million. Since it will never catch up, and it isn’t processing current, day-to-day requests, I would like to reset the sentinel sequence ID (mentioned here).

How do you do this?

You can see in this Watchdog chart that the backlog is going down, but not fast enough:

This has worked for me:

  1. Stop the Sentinel container
  2. Update the value stored in medic-sentinel/_local/background-seq to match the current update-seq value for the medic database (or whatever update-seq you are targeting)
  3. Restart Sentinel

Thanks @jkuester !

So, using curl to poke around a bit, the update_seq for me is 289631, correct?

curl -qs https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443/medic/_design/medic/_info | jq ".view_index"

returns:

{
  "updates_pending": {
    "minimum": 0,
    "preferred": 0,
    "total": 0
  },
  "waiting_commit": false,
  "waiting_clients": 0,
  "updater_running": false,
  "update_seq": 289631,
  "sizes": {
    "file": 255092816,
    "external": 229458710,
    "active": 244840183
  },
  "signature": "5fa04631e0f06462a0ae01de34ba1207",
  "purge_seq": 0,
  "language": "javascript",
  "compact_running": false,
  "collator_versions": [
    "153.120"
  ]
}

And then I would use Fauxton (at /_utils/) to update the value in _rev as shown below? Do I need to prepend it with 0-? What is the value in value? Oh, wait! Do I need to increment _rev to 0-3871 and then prefix the value of value with 289631? Also, what is the suffix after the sequence ID?

curl -qs https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443/medic-sentinel/_local/background-seq | jq

returns:

{
  "_id": "_local/background-seq",
  "_rev": "0-3871",
  "value": "274411-g1AAAAO1eJyV0jFOwzAUBmCLIrGwMHAJkFBix3EywcSKoLVE2fLiRlFUysTCwsgJyhGg9SU4RSVYGXsGWvPjeKyQLA-_ZMuf3nv2lDF22A4MO6ofHuvW0EXK1VmClU5xtFcxOtZ61LUDYmx8dY-9gzpNyozErjv_SH-OphNrlz12--WxPJUFL2QsBsfSuXPbgG09RpxXqaljMTiOUN2wx-5e-8rqCWV5dJtwRtRZuwiVfXvMmFIKnsdicJb07NwmYJ99m3lTChndJpztbJ_daD0P3IvnuCr4RKhIDtIQEuIdbxE-yNyDwhjFRezkIC0gIX6c-wjgzINZUyVJ2cSDG0iIa61X4W2fPFglSjQmjQcxvBXizdp1qPDSg0pyJUseD2J4awR-oAvgqQcLSUUus11Xu19C-AB_"
}

I’m testing this by:

  1. starting a docker helper instance
  2. stopping the sentinel container
  3. running TDG Easy Mode
  4. spinning up Watchdog and checking what it sees:

It’s not the _local/background-seq document that you need to update. That document isn’t actually used anymore. It was used before the Nairobi replication protocol, when we were generating tombstones.

The document that you need to edit is _local/transitions-seq!

To get the update sequence of the database you need to call:

curl -qs https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443/medic/ | jq ".update_seq"
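To compare the two values before touching anything, here is a minimal shell sketch, assuming bash, curl and jq are installed. COUCH_URL is a hypothetical variable holding the CouchDB base URL with credentials (defaulting to the one used above). It reads the medic update_seq and the value of medic-sentinel/_local/transitions-seq and prints the difference between their leading integers. Treat that difference only as a rough indication of how far behind Sentinel is; it is not necessarily the exact figure Watchdog reports. The part after the dash is an opaque CouchDB encoding and should be copied whole, never edited by hand.

```shell
#!/usr/bin/env bash
# Sketch: compare Sentinel's position with the medic database's update_seq.
# COUCH_URL is a hypothetical variable for the CouchDB base URL with credentials.
: "${COUCH_URL:=https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443}"

# Leading integer of a CouchDB sequence ("307933-g1AAA..." -> 307933).
# A plain integer (no dash) is returned unchanged.
seq_to_int() {
  printf '%s\n' "${1%%-*}"
}

# Leading integer of sequence $1 minus that of sequence $2.
seq_gap() {
  echo $(( $(seq_to_int "$1") - $(seq_to_int "$2") ))
}

# Fetch both sequences and print a rough measure of how far Sentinel is behind.
check_backlog() {
  local target current
  target="$(curl -qs "$COUCH_URL/medic/" | jq -r '.update_seq')"
  current="$(curl -qs "$COUCH_URL/medic-sentinel/_local/transitions-seq" | jq -r '.value')"
  echo "medic update_seq: $(seq_to_int "$target")"
  echo "transitions-seq:  $(seq_to_int "$current")"
  echo "approximate gap:  $(seq_gap "$target" "$current")"
}
```

Save it under any name, source it, and run check_backlog. Only the network calls live inside check_backlog, so sourcing the file has no side effects.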

Thanks @diana and @jkuester !

So to pull it all together:

  1. Confirm in Watchdog that you have a sentinel backlog. This is likely in the thousands or millions. While it may be decreasing, you’ll note that it will take weeks or months to reach zero. While this backlog is being processed, no new sentinel transitions will be processed:

  2. Get the current sequence ID for the medic database via curl -qs https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443/medic/ | jq ".update_seq". It returns a string of roughly 400 characters starting with a number:

    307933-g1AAAAO1eJyV0j1OwzAUB3CLIiEQUsXAJRiQP2I7ZoErQGsxMMV2oigqMDFzil4BUk-sTJyCKzAwMtOGZ8cLUlXJ9vCGJ__k97cXCKHjduLQiX18sq0zV4TKcwybLKC197CPXr3_hjLA6tqJQcgd3kProOSmFLzYdnQ3eKP1J5QXcBN4FEHJqeSK5oPrYfiAcg3uCNa3EaywZI0j-WDvvYfyC2664XsEi6bCWDX54Ezr5RilT-BFBJlzkjKWCZrnYdiMQS5Hzr5FjsqS1kzmcp33q2CuE_YVMSMaxbjNxbTW8yD2Cesj5pzijIpc7BI-XRBn6V3vIiZsbQqRndoZpB_ETXqC6TgmpRVx2WOewq2CuPqXmSC8pCXPxCoUuHmSfqJkCVaF2Tpj9wfVKv_m
    
  3. In Fauxton, go to the transitions-seq document via the URL https://192-168-68-26.local-ip.medicmobile.org:10443/_utils/#database/medic-sentinel/_local/transitions-seq . Before you edit it, note that its leading integer is lower than the one from step 2. For me it was "value": "292713-g1AAAAO1e...

  4. Stop the sentinel container. This is critical: if you don’t stop it, the timing may be such that the sentinel process overwrites your reset of the sequence ID, and the backlog will not be cleared.

  5. Carefully edit value, pasting in the string you got in step 2, and click “Save Changes”.

  6. Restart the Sentinel container

  7. Watchdog should now show a sharp drop-off to zero backlog:
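For anyone who prefers the command line to Fauxton, steps 2 to 6 can be scripted roughly as follows. This is a minimal sketch, not an official CHT tool. It assumes bash, curl, jq and docker on the host, and COUCH_URL and SENTINEL_CONTAINER are hypothetical variables you would set for your own deployment (check the real container name with docker ps). It stops Sentinel before reading _local/transitions-seq, so Sentinel cannot change the document’s _rev in between. It also refuses to move the sequence backwards and restarts Sentinel on every exit path.

```shell
#!/usr/bin/env bash
# Sketch: point Sentinel's transitions-seq at the current medic update_seq.
# Hypothetical names: COUCH_URL (CouchDB base URL with admin credentials) and
# SENTINEL_CONTAINER (hypothetical default; check `docker ps` for the real name).
: "${COUCH_URL:=https://medic:password@192-168-68-26.local-ip.medicmobile.org:10443}"
: "${SENTINEL_CONTAINER:=sentinel}"

# Succeeds if sequence $1 has a leading integer >= that of sequence $2,
# i.e. the reset moves Sentinel forward rather than backward.
seq_is_ahead() {
  (( ${1%%-*} >= ${2%%-*} ))
}

# The body runs in a subshell so "set -e" and the trap do not leak into
# a shell that sources this file.
reset_transitions_seq() (
  set -euo pipefail
  local_doc="$COUCH_URL/medic-sentinel/_local/transitions-seq"

  # Stop Sentinel first so it cannot overwrite the doc while we edit it.
  docker stop "$SENTINEL_CONTAINER" >/dev/null
  # Restart Sentinel however this subshell exits, including on errors.
  trap 'docker start "$SENTINEL_CONTAINER" >/dev/null' EXIT

  # Target sequence: the medic database's current update_seq.
  target="$(curl -qsf "$COUCH_URL/medic/" | jq -r '.update_seq')"
  # Current local doc (_id, _rev, value) and the sequence it holds.
  doc="$(curl -qsf "$local_doc")"
  current="$(jq -r '.value' <<<"$doc")"

  if ! seq_is_ahead "$target" "$current"; then
    echo "refusing: $target is behind $current" >&2
    exit 1
  fi

  # Write the doc back with only "value" replaced; the fetched _rev is kept
  # so CouchDB treats this as a normal update of the existing document.
  jq --arg seq "$target" '.value = $seq' <<<"$doc" |
    curl -qsf -X PUT -H 'Content-Type: application/json' --data-binary @- "$local_doc"
  echo
)
```

After setting both variables, source the file and run reset_transitions_seq. Then confirm in Watchdog (step 7) that the backlog has dropped.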

In case anyone had issues following the instructions above, be sure to re-read them. They’ve just been updated to include a step to restart the sentinel container.

Thanks!
