Corrupt Shard on CouchDB

Hello,
I am getting the error below from Sentinel:

2024-06-14T07:26:19.259 ERROR: Task backgroundCleanup completed with error: {
  error: 'bad_return_value',
  reason: '{read_beyond_eof,"./data/shards/40000000-5fffffff/medic-user-jar-binza-meta.1685021444.couch"}',
  status: 500,
  name: 'bad_return_value',
  message: '{read_beyond_eof,"./data/shards/40000000-5fffffff/medic-user-jar-binza-meta.1685021444.couch"}',
  stack: 'Error\n' +
    '    at Object.generateErrorFromResponse (/service/sentinel/node_modules/pouchdb-errors/lib/index.js:104:18)\n' +
    '    at /service/sentinel/node_modules/pouchdb-adapter-http/lib/index.js:254:33\n' +
    '    at runMicrotasks (<anonymous>)\n' +
    '    at processTicksAndRejections (node:internal/process/task_queues:96:5)'
}

Does this mean that the shard is corrupt? How can I rebuild it?

Hi @Job_Isabai

There’s no way to rebuild a shard, only to restore it from backup.
I believe we have seen two conditions that can cause this error, which is itself extremely rare:

  • a storage write/lock hiccup - this is extremely rare
  • two CouchDB instances running against the same data - this is the much more likely cause.

Can you check whether you have two CouchDB instances running?
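
For anyone wanting to script that check, here is a minimal sketch. It assumes CouchDB runs in Docker on a single host and that the image name contains "couchdb"; both are assumptions, and the function names are hypothetical. It will not detect a second instance on another host that mounts the same storage.

```python
import subprocess


def couchdb_containers(docker_ps_output: str, needle: str = "couchdb") -> list[str]:
    """Return names of containers whose image name contains `needle`.

    Expects one container per line in the form "<name>\t<image>".
    """
    names = []
    for line in docker_ps_output.splitlines():
        if not line.strip():
            continue
        name, _, image = line.partition("\t")
        if needle in image.lower():
            names.append(name.strip())
    return names


def running_couchdb_containers() -> list[str]:
    """Ask the local Docker daemon for running containers and filter them."""
    # `docker ps --format` takes a Go template; .Names and .Image are standard fields.
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return couchdb_containers(result.stdout)
```

If `running_couchdb_containers()` returns more than one name, it is worth checking whether those containers mount the same data directory.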

I am running only one instance of CouchDB.
This issue might be a result of the migration from 4.3 to 4.5. There were some failures in indexing views during the upgrade. Let me see if I can restore a backup.
Thanks

You did not need to migrate from 4.3 to 4.5; a simple upgrade would have been enough. Did you actually migrate or upgrade?

Sorry, I upgraded. I staged 4.5 and upgraded afterwards.

I’m facing a similar issue, this time affecting a shard in medic-sentinel. There are no recent data backups, but we are restoring the VMs from snapshots.

My question is: can the medic-sentinel database be rebuilt from data from the medic database?

Logs:

last msg: redacted
     state: [{data,[{"State",{file,{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9520>,owner => <0.21178.1>,r_ahead_size => 0,r_buffer => #Ref<0.445768430.378667012.51607>}},false,5324976374,#Ref<0.445768430.378535937.249408>,infinity}},{"InitialFilePath","./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}]}]
    extra: [<0.21173.1>,[{gen,do_call,4,[{file,"gen.erl"},{line,214}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,243}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]}]]
[error] 2025-03-27T06:57:47.272910Z couchdb@127.0.0.1 <0.20964.1> a174b56a38 req_err(4070966599) {bad_return_value,
    {file_truncate_error,eof,
        "./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}} : {gen_server,call,[<0.21178.1>,{pread_iolist,5324969561},infinity]}
    [<<"gen_server:call/3 L247">>,<<"couch_file:pread_iolist/2 L170">>,<<"couch_file:pread_binary/2 L166">>,<<"couch_file:pread_term/2 L155">>,<<"couch_btree:get_node/2 L474">>,<<"couch_btree:stream_node/8 L1069">>,<<"couch_btree:fold/4 L242">>,<<"couch_bt_engine:fold_docs_int/5 L1129">>]
[error] 2025-03-27T06:57:47.272955Z couchdb@127.0.0.1 <0.21178.1> -------- CRASH REPORT Process  (<0.21178.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.21177.1>,<0.21176.1>], message_queue_len: 0, links: [<0.21177.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9520>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1691
[error] 2025-03-27T06:57:47.273023Z couchdb@127.0.0.1 <0.21178.1> -------- CRASH REPORT Process  (<0.21178.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.21177.1>,<0.21176.1>], message_queue_len: 0, links: [<0.21177.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9520>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1691

 last msg: redacted
     state: [{data,[{"State",{file,{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667130.9492>,owner => <0.22573.1>,r_ahead_size => 0,r_buffer => #Ref<0.445768430.378667014.16814>}},false,5324976374,#Ref<0.445768430.378535940.52656>,infinity}},{"InitialFilePath","./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}]}]
    extra: [<0.22569.1>,[{gen,do_call,4,[{file,"gen.erl"},{line,214}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,243}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]}]]
[error] 2025-03-27T06:59:50.639461Z couchdb@127.0.0.1 <0.22569.1> f8dae97157 rexi_server: from: couchdb@127.0.0.1(<0.21670.1>) mfa: fabric_rpc:all_docs/3 exit:{{bad_return_value,{file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}},{gen_server,call,[<0.22573.1>,{pread_iolist,5324969561},infinity]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,247}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]}]
[error] 2025-03-27T06:59:50.639485Z couchdb@127.0.0.1 <0.22573.1> -------- CRASH REPORT Process  (<0.22573.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.22572.1>,<0.22571.1>], message_queue_len: 0, links: [<0.22572.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667130.9492>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1693
[error] 2025-03-27T06:59:50.639540Z couchdb@127.0.0.1 <0.22573.1> -------- CRASH REPORT Process  (<0.22573.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.22572.1>,<0.22571.1>], message_queue_len: 0, links: [<0.22572.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667130.9492>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1693
[error] 2025-03-27T06:59:50.639675Z couchdb@127.0.0.1 <0.21670.1> f8dae97157 req_err(4070966599) {bad_return_value,
    {file_truncate_error,eof,
        "./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}} : {gen_server,call,[<0.22573.1>,{pread_iolist,5324969561},infinity]}
    [<<"gen_server:call/3 L247">>,<<"couch_file:pread_iolist/2 L170">>,<<"couch_file:pread_binary/2 L166">>,<<"couch_file:pread_term/2 L155">>,<<"couch_btree:get_node/2 L474">>,<<"couch_btree:stream_node/8 L1069">>,<<"couch_btree:fold/4 L242">>,<<"couch_bt_engine:fold_docs_int/5 L1129">>]
[notice] 2025-03-27T06:59:50.639833Z couchdb@127.0.0.1 <0.21670.1> f8dae97157 haproxy:5984 172.22.0.6 medic GET /medic-sentinel/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 500 ok 2
[error] 2025-03-27T06:59:50.641861Z couchdb@127.0.0.1 <0.22587.1> -------- gen_server <0.22587.1> terminated with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}
  last msg: redacted
     state: [{data,[{"State",{file,{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9542>,owner => <0.22587.1>,r_ahead_size => 0,r_buffer => #Ref<0.445768430.378667012.52690>}},false,5324976374,#Ref<0.445768430.378535938.97218>,infinity}},{"InitialFilePath","./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}]}]
    extra: [<0.22583.1>,[{gen,do_call,4,[{file,"gen.erl"},{line,214}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,243}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]}]]
[info] 2025-03-27T06:59:50.641921Z couchdb@127.0.0.1 <0.266.0> -------- db shards/20000000-3fffffff/medic-sentinel.1669626451 died with reason {bad_return_value,{file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}}
[error] 2025-03-27T06:59:50.641961Z couchdb@127.0.0.1 <0.22587.1> -------- gen_server <0.22587.1> terminated with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}
  last msg: redacted
     state: [{data,[{"State",{file,{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9542>,owner => <0.22587.1>,r_ahead_size => 0,r_buffer => #Ref<0.445768430.378667012.52690>}},false,5324976374,#Ref<0.445768430.378535938.97218>,infinity}},{"InitialFilePath","./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}]}]
    extra: [<0.22583.1>,[{gen,do_call,4,[{file,"gen.erl"},{line,214}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,243}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]}]]
[error] 2025-03-27T06:59:50.641974Z couchdb@127.0.0.1 <0.22583.1> 4df543a60d rexi_server: from: couchdb@127.0.0.1(<0.21670.1>) mfa: fabric_rpc:all_docs/3 exit:{{bad_return_value,{file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}},{gen_server,call,[<0.22587.1>,{pread_iolist,5324969561},infinity]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,247}]},{couch_file,pread_iolist,2,[{file,"src/couch_file.erl"},{line,170}]},{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,166}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,155}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,474}]},{couch_btree,stream_node,8,[{file,"src/couch_btree.erl"},{line,1069}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,242}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1129}]}]
[error] 2025-03-27T06:59:50.642037Z couchdb@127.0.0.1 <0.22587.1> -------- CRASH REPORT Process  (<0.22587.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.22586.1>,<0.22585.1>], message_queue_len: 0, links: [<0.22586.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9542>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1697
[error] 2025-03-27T06:59:50.642169Z couchdb@127.0.0.1 <0.22587.1> -------- CRASH REPORT Process  (<0.22587.1>) with 1 neighbors exited with reason: bad return value {file_truncate_error,eof,"./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"} at gen_server:handle_common_reply/8(line:815) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.22586.1>,<0.22585.1>], message_queue_len: 0, links: [<0.22586.1>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,#{handle => #Ref<0.445768430.378667132.9542>,...}},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 29, reductions: 1697
[error] 2025-03-27T06:59:50.642165Z couchdb@127.0.0.1 <0.21670.1> 4df543a60d req_err(4070966599) {bad_return_value,
    {file_truncate_error,eof,
        "./data/shards/20000000-3fffffff/medic-sentinel.1669626451.couch"}} : {gen_server,call,[<0.22587.1>,{pread_iolist,5324969561},infinity]}
    [<<"gen_server:call/3 L247">>,<<"couch_file:pread_iolist/2 L170">>,<<"couch_file:pread_binary/2 L166">>,<<"couch_file:pread_term/2 L155">>,<<"couch_btree:get_node/2 L474">>,<<"couch_btree:stream_node/8 L1069">>,<<"couch_btree:fold/4 L242">>,<<"couch_bt_engine:fold_docs_int/5 L1129">>]
[notice] 2025-03-27T06:59:50.642304Z couchdb@127.0.0.1 <0.21670.1> 4df543a60d haproxy:5984 172.22.0.6 medic GET /medic-sentinel/_all_docs?include_docs=true&startkey=%22_design%2F%22&endkey=%22_design%2F%EF%BF%B0%22 500 ok 2
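
For readers triaging logs like these, a small sketch that pulls out the distinct shard files named in the errors. It only matches the two error atoms that appear in this thread (`read_beyond_eof` and `file_truncate_error`); the function name is hypothetical, and other corruption-related errors may look different.

```python
import re
from collections import Counter

# The error atom, an optional "eof," term, then the quoted shard path.
# \s* lets the path sit on the next line, as in the multi-line req_err entries.
SHARD_ERROR = re.compile(
    r'(read_beyond_eof|file_truncate_error),(?:eof,)?\s*"([^"]+\.couch)"'
)


def corrupt_shards(log_text: str) -> Counter:
    """Count how often each shard file is named in one of the errors above."""
    return Counter(path for _atom, path in SHARD_ERROR.findall(log_text))
```

Calling `corrupt_shards(open("couchdb.log").read())` (the file name is only an example) gives one entry per affected file, which shows quickly whether the problem is confined to a single shard.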

Hi @derick

Sorry about your data corruption issue.

can the medic-sentinel database be rebuilt from data from the medic database?

The answer is no. For several reasons:

  1. We don’t store duplicate data in medic and medic-sentinel. So everything that is unique about the docs in medic-sentinel (transitions, last replication dates, muting histories, outbound tasks) is only stored in medic-sentinel.
  2. Practically speaking, I don’t know how you could “rebuild” a shard so that CouchDB will recognize it. I don’t think this is an option. And I’m not sure that’s what you are suggesting.

However, you should have a copy of the medic-sentinel docs in your Postgres database, if you have cht-sync / couch2pg set up.

Just to add:

  • I am still experiencing this issue. I have tried reconstructing the shard and the views and changing database compaction, among other fixes.
  • This issue causes intermittent restarts/failures of the couch_db container, resulting in 503 errors that make our application unreachable.
  • I have a scheduled upgrade and hope it might fix the issue. If not, can I export users and contacts, import them into a fresh database, and archive the corrupted database?

I don’t know how you could “rebuild” a shard so that CouchDB will recognize it.

My question was more about whether we can delete medic-sentinel and have Sentinel follow medic’s changes feed to recreate what’s needed. Your response, however, indicates this is not feasible, since the data stored in medic-sentinel is not stored in the medic database.

Hi @Job_Isabai

I’m not sure I understand. Have you restored from backup since you originally reported the issue?
If not, an upgrade will not fix it. The only way to fix this is to resolve the corrupted database file, either by removing the database or by restoring from backup.
It seems that in your case this database is a user’s meta database that doesn’t contain important data (just a record of which reports were read by this user), so it would be easiest to just delete it. It will get automatically created again when this user syncs.
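
A minimal sketch of that deletion, assuming shard files are named `<database>.<timestamp>.couch` as in the error above. `DELETE /{db}` is CouchDB's standard endpoint for removing a database; the helper names, base URL and credentials are hypothetical, and I have not verified how the request behaves when one of the shards is unreadable.

```python
import re
import urllib.request
from urllib.parse import quote

# e.g. ./data/shards/40000000-5fffffff/medic-user-jar-binza-meta.1685021444.couch
SHARD_PATH = re.compile(r"shards/[0-9a-f]{8}-[0-9a-f]{8}/(?P<db>.+)\.\d+\.couch$")


def db_name_from_shard(path: str) -> str:
    """Extract the database name from a shard file path."""
    match = SHARD_PATH.search(path)
    if match is None:
        raise ValueError(f"not a shard file path: {path!r}")
    return match.group("db")


def delete_database(base_url: str, db: str, auth_header: str) -> int:
    """Send DELETE /{db} to CouchDB and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/{quote(db, safe='')}",
        method="DELETE",
        headers={"Authorization": auth_header},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

For the file in the original error, `db_name_from_shard` returns `medic-user-jar-binza-meta`, which is the name to pass to `delete_database`. Double-check the name before running anything destructive.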

Hello Diana,
What I intend to do is archive all the data in flat files for each form. I would like to know if it’s possible to export contacts and their corresponding users, app settings and so on, so that I can import them into the newly created database. This is to avoid the data corruption issue after the upgrades.
Regards,
Job

I suggest you make actual backups, as it will be easier to restore if this is necessary.
Another option is to replicate all your data to other databases, but this doesn’t serve as a full backup: you have to enable replication on a per-database basis, and it takes up CouchDB time (CPU) for both backing up and restoring. You’d be creating much more work for yourself if you ever need to restore.

With full disk backups, you just copy all the files with simple commands with no additional know-how required.
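
As a rough illustration of how little is involved, here is a sketch that copies a CouchDB data directory into a timestamped backup folder. The paths and function name are hypothetical. It also assumes the copy is taken from a consistent state (for example from a filesystem snapshot or while CouchDB is stopped), which is my assumption rather than something stated in this thread.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy the whole CouchDB data directory into a new timestamped folder."""
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"data directory not found: {source}")
    destination = Path(backup_root) / f"couchdb-{time.strftime('%Y%m%dT%H%M%S')}"
    # copytree creates missing parent directories and refuses to overwrite an
    # existing destination, so an earlier backup is never clobbered.
    shutil.copytree(source, destination)
    return destination
```

Restoring is then the reverse: stop CouchDB, copy the saved directory back into place, and start it again.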
