CHT on Kubernetes

Hello,
There is minimal documentation on installing CHT on Kubernetes: Kubernetes vs Docker | Community Health Toolkit
I would like to know if anyone has implemented this and can share their experience or documentation, so that I can refer to it and make a case for migrating from Docker to Kubernetes. My focus is building a much more robust infrastructure.
Thanks,
Job

Hi Job,

We’ve migrated CHT projects from Docker to Kubernetes and are working on adding guides in PR#1557. I’d be happy to walk through the process with you when you’re ready.


Hello @elijah
At ICRC we’d like to deploy our first CHT instance on Kubernetes. Are there any resources (Helm charts, YAML files) we can use?
thanks

Hi Frederic,

CHT’s helm charts are hosted in this repository: GitHub - medic/helm-charts
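If you want a local copy to explore, cloning the repository is the simplest starting point (assuming git is installed):

  git clone https://github.com/medic/helm-charts.git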


Thanks @elijah
I suppose we should use helm-charts/charts/cht-chart-4x at main · medic/helm-charts · GitHub, correct?

Confirmed, Frederic. That is the correct link for the recommended CHT v4 deployment.


Hello,
I was looking at the Helm charts and tried to set them up locally in a Docker environment with Kubernetes, but I couldn’t find clear documentation or a clear process. Would you please share documentation for both Windows and Linux environments?

Hi Job,

Helm charts can be deployed to a Kubernetes cluster with helm install <release-name> /path/to/chart. Additional details are available on the Helm documentation site: Helm | Using Helm
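As a rough sketch, assuming a local clone of medic/helm-charts and your own values.yaml (the release name cht and namespace cht-dev below are placeholders):

  # Install the CHT v4 chart from the cloned repository into its own namespace
  helm install cht ./helm-charts/charts/cht-chart-4x \
    --namespace cht-dev --create-namespace \
    --values ./values.yaml

  # Confirm the release was created and inspect its status
  helm status cht --namespace cht-dev

The same commands work on both Windows (PowerShell or WSL) and Linux, as long as helm and kubectl are pointed at your cluster.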

Hi Elijah,
Yes, that’s exactly what I thought, but I kept getting this error on a couple of the YAML files:
Error: INSTALLATION FAILED: YAML parse error on cht-chart-4x/templates/api-deployment.yaml: error converting YAML to JSON: yaml: line 35: mapping values are not allowed in this context

Please share the values.yaml file that you’re using for this installation
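In the meantime, Helm’s own tooling can usually point at the offending line; as a quick check (the chart path and values file below are placeholders for wherever yours live):

  # Static checks on the chart plus your values
  helm lint ./cht-chart-4x --values ./values.yaml

  # Render the templates locally; --debug prints the YAML that fails to parse
  helm template cht ./cht-chart-4x --values ./values.yaml --debug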

project_name: "msfecare" # e.g. mrjones-dev
namespace: "ecare-dev" # e.g. "mrjones-dev"
chtversion: 4.10.0
# cht_image_tag: 4.1.1-4.1.1 #- This is filled in automatically by the deploy script. Don't uncomment this line.

# If images are cached, the same image tag will never be pulled twice. For development, this means that it's not
# possible to upgrade to a newer version of the same branch, as the old image will always be reused.
# For development instances, set this value to false.
cache_images: true

# Don't change upstream-servers unless you know what you're doing.
upstream_servers:
  docker_registry: "public.ecr.aws/medic"
  builds_url: "https://staging.dev.medicmobile.org/_couch/builds_4"
upgrade_service:
  tag: 0.32

# CouchDB Settings
couchdb:
  password: "Password" # Avoid using non-url-safe characters in password
  secret: "f9053a0a-ef77-4be3-994d-87d6732600fd" # for prod, change to output of `uuidgen`
  user: "medic"
  uuid: "7300115e-1a98-4607-a37c-50e0c9913767" # for prod, change to output of `uuidgen`
  clusteredCouch_enabled: false
  couchdb_node_storage_size: 100Mi
clusteredCouch:
  noOfCouchDBNodes: 3
toleration:   # This is for the couchdb pods. Don't change this unless you know what you're doing.
  key: "dev-couchdb-only"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
ingress:
  annotations:
    groupname: "dev-cht-alb"
    tags: "Environment=dev,Team=QA"
    certificate: "arn:aws:iam::720541322708:server-certificate/2024-wildcard-dev-medicmobile-org-chain"
  # Ensure the host is not already taken. Valid characters for a subdomain are:
  #   a-z, 0-9, and - (but not as first or last character).
  host: "<subdomain>.dev.medicmobile.org"  # e.g. "mrjones.dev.medicmobile.org"
  hosted_zone_id: "Z3304WUAJTCM7P"
  load_balancer: "dualstack.k8s-devchtalb-3eb0781cbb-694321496.eu-west-2.elb.amazonaws.com"

environment: "remote"  # "local", "remote"
cluster_type: "eks" # "eks" or "k3s-k3d"
cert_source: "eks-medic" # "eks-medic" or "specify-file-path" or "my-ip-co"
certificate_crt_file_path: "/path/to/certificate.crt" # Only required if cert_source is "specify-file-path"
certificate_key_file_path: "/path/to/certificate.key" # Only required if cert_source is "specify-file-path"

nodes:
  # If using clustered couchdb, add the nodes here: node-1: name-of-first-node, node-2: name-of-second-node, etc.
  # Add equal number of nodes as specified in clusteredCouch.noOfCouchDBNodes
  node-1: "" # This is the name of the first node where couchdb will be deployed
  node-2: "" # This is the name of the second node where couchdb will be deployed
  node-3: "" # This is the name of the third node where couchdb will be deployed
  # For single couchdb node, use the following:
  # Leave it commented out if you don't know what it means.
  # Leave it commented out if you want to let kubernetes deploy this on any available node. (Recommended)
  # single_node_deploy: "gamma-cht-node" # This is the name of the node where all components will be deployed - for non-clustered configuration. 

# Applicable only if using k3s
k3s_use_vSphere_storage_class: "false" # "true" or "false"
# vSphere specific configurations. If you set "true" for k3s_use_vSphere_storage_class, fill in the details below.
vSphere:
  datastoreName: "DatastoreName"  # Replace with your datastore name
  diskPath: "path/to/disk"         # Replace with your disk path

# -----------------------------------------
#       Pre-existing data section
# -----------------------------------------
couchdb_data:
  preExistingDataAvailable: "false" #If this is false, you don't have to fill in details in local_storage or remote.
  dataPathOnDiskForCouchDB: "data" # This is the path where couchdb data will be stored. Leave it as data if you don't have pre-existing data.
    # To mount to a specific subpath (If data is from an old 3.x instance for example): dataPathOnDiskForCouchDB: "storage/medic-core/couchdb/data"
    # To mount to the root of the volume: dataPathOnDiskForCouchDB: ""
    # To use the default "data" subpath, remove the subPath line entirely from values.yaml or name it "data" or use null.
    # for Multi-node configuration, you can use %d to substitute with the node number.
    # You can use %d for each node to be substituted with the node number.
    # If %d doesn't exist, the same path will be used for all nodes.
    # example: test-path%d will be test-path1, test-path2, test-path3 for 3 nodes.
    # example: test-path will be test-path for all nodes.
  partition: "0" # This is the partition number for the EBS volume. Leave it as 0 if you don't have a partitioned disk.

# If preExistingDataAvailable is true, fill in the details below.
# For local_storage, fill in the details if you are using k3s-k3d cluster type.
local_storage:  #If using k3s-k3d cluster type and you already have existing data.
  preExistingDiskPath-1: "/var/lib/couchdb1" #If node1 has pre-existing data.
  preExistingDiskPath-2: "/var/lib/couchdb2" #If node2 has pre-existing data.
  preExistingDiskPath-3: "/var/lib/couchdb3" #If node3 has pre-existing data.
# For ebs storage when using eks cluster type, fill in the details below.
ebs:
  preExistingEBSVolumeID-1: "vol-0123456789abcdefg" # If you have already created the EBS volume, put the ID here.
  preExistingEBSVolumeID-2: "vol-0123456789abcdefg" # If you have already created the EBS volume, put the ID here.
  preExistingEBSVolumeID-3: "vol-0123456789abcdefg" # If you have already created the EBS volume, put the ID here.
  preExistingEBSVolumeSize: "100Gi" # The size of the EBS volume.

Thanks Job,

The following deployment files within the templates directory reference cht_image_tag, which has been replaced by chtversion in values.yaml:

  • api-deployment.yaml
  • couchdb-single-deployment.yaml
  • haproxy-deployment.yaml
  • healthcheck-deployment.yaml
  • sentinel-deployment.yaml

After making this update, the charts will render successfully.
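If it helps, here is a quick way to double-check for leftover references and confirm the fix, assuming you are editing a local copy of the chart (release name and values file are placeholders):

  # List any template still referencing the old value
  grep -rn "cht_image_tag" cht-chart-4x/templates/

  # Re-render the chart to confirm the YAML parse error is gone
  helm template cht ./cht-chart-4x --values ./values.yaml > /dev/null && echo "templates render OK"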

Hello Elijah,
Thank you very much for this. I made the update and am now able to run installations. On to my next issue:

  • All pods come up except couchdb and haproxy, even when running a single CouchDB node or a clustered setup.
  • The couchdb logs:
[notice] 2025-01-08T11:18:18.773849Z couchdb@127.0.0.1 <0.109.0> -------- config: [admins] medic set to '****' for reason nil
[info] 2025-01-08T11:18:18.812620Z couchdb@127.0.0.1 <0.254.0> -------- Apache CouchDB has started. Time to relax.
[notice] 2025-01-08T11:18:18.818237Z couchdb@127.0.0.1 <0.353.0> -------- rexi_server : started servers
[notice] 2025-01-08T11:18:18.819214Z couchdb@127.0.0.1 <0.357.0> -------- rexi_buffer : started servers
[warning] 2025-01-08T11:18:18.838070Z couchdb@127.0.0.1 <0.365.0> -------- creating missing database: _nodes
[info] 2025-01-08T11:18:18.838123Z couchdb@127.0.0.1 <0.366.0> -------- open_result error {not_found,no_db_file} for _nodes
[warning] 2025-01-08T11:18:18.889595Z couchdb@127.0.0.1 <0.381.0> -------- creating missing database: _dbs
[warning] 2025-01-08T11:18:18.889595Z couchdb@127.0.0.1 <0.382.0> -------- creating missing database: _dbs
[info] 2025-01-08T11:18:18.889640Z couchdb@127.0.0.1 <0.384.0> -------- open_result error {not_found,no_db_file} for _dbs
[notice] 2025-01-08T11:18:18.907307Z couchdb@127.0.0.1 <0.396.0> -------- mem3_reshard_dbdoc start init()
[notice] 2025-01-08T11:18:18.926356Z couchdb@127.0.0.1 <0.398.0> -------- mem3_reshard start init()
[notice] 2025-01-08T11:18:18.926461Z couchdb@127.0.0.1 <0.399.0> -------- mem3_reshard db monitor <0.399.0> starting
[notice] 2025-01-08T11:18:18.930542Z couchdb@127.0.0.1 <0.398.0> -------- mem3_reshard starting reloading jobs
[notice] 2025-01-08T11:18:18.930639Z couchdb@127.0.0.1 <0.398.0> -------- mem3_reshard finished reloading jobs
[info] 2025-01-08T11:18:18.952029Z couchdb@127.0.0.1 <0.405.0> -------- Apache CouchDB has started. Time to relax.
[info] 2025-01-08T11:18:18.952116Z couchdb@127.0.0.1 <0.405.0> -------- Apache CouchDB has started on http://0.0.0.0:5984/
[notice] 2025-01-08T11:18:18.967965Z couchdb@127.0.0.1 <0.426.0> -------- chttpd_auth_cache changes listener died because the _users database does not exist. Create the database to silence this notice.
[error] 2025-01-08T11:18:18.968299Z couchdb@127.0.0.1 emulator -------- Error in process <0.427.0> on node 'couchdb@127.0.0.1' with exit value:
{database_does_not_exist,[{mem3_shards,load_shards_from_db,[<<"_users">>],[{file,"src/mem3_shards.erl"},{line,430}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,405}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,434}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,100}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,214}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,160}]}]}
[error] 2025-01-08T11:18:18.968424Z couchdb@127.0.0.1 emulator -------- Error in process <0.427.0> on node 'couchdb@127.0.0.1' with exit value:
{database_does_not_exist,[{mem3_shards,load_shards_from_db,[<<"_users">>],[{file,"src/mem3_shards.erl"},{line,430}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,405}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,434}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,100}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,214}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,160}]}]}
[notice] 2025-01-08T11:18:19.015081Z couchdb@127.0.0.1 <0.474.0> -------- Missing system database _users
Waiting for cht couchdb
  • HAProxy logs:
# servers are added at runtime, in entrypoint.sh, based on couchdb-1.ecare.svc.cluster.local,couchdb-2.ecare.svc.cluster.local,couchdb-3.ecare.svc.cluster.local
  server couchdb-1.ecare.svc.cluster.local couchdb-1.ecare.svc.cluster.local:5984 check agent-check agent-inter 5s agent-addr healthcheck.ecare.svc.cluster.local agent-port 5555
  server couchdb-2.ecare.svc.cluster.local couchdb-2.ecare.svc.cluster.local:5984 check agent-check agent-inter 5s agent-addr healthcheck.ecare.svc.cluster.local agent-port 5555
  server couchdb-3.ecare.svc.cluster.local couchdb-3.ecare.svc.cluster.local:5984 check agent-check agent-inter 5s agent-addr healthcheck.ecare.svc.cluster.local agent-port 5555
[alert] 007/111000 (1) : parseBasic loaded
[alert] 007/111000 (1) : parseCookie loaded
[alert] 007/111000 (1) : replacePassword loaded
[NOTICE]   (1) : haproxy version is 2.6.17-a7cab98
[NOTICE]   (1) : path to executable is /usr/local/sbin/haproxy
[ALERT]    (1) : config : [/usr/local/etc/haproxy/backend.cfg:7] : 'server couchdb-servers/couchdb-1.ecare.svc.cluster.local' : parsing agent-addr failed. Check if 'healthcheck.ecare.svc.cluster.local' is correct address..
[ALERT]    (1) : config : [/usr/local/etc/haproxy/backend.cfg:8] : 'server couchdb-servers/couchdb-2.ecare.svc.cluster.local' : parsing agent-addr failed. Check if 'healthcheck.ecare.svc.cluster.local' is correct address..
[ALERT]    (1) : config : [/usr/local/etc/haproxy/backend.cfg:9] : 'server couchdb-servers/couchdb-3.ecare.svc.cluster.local' : parsing agent-addr failed. Check if 'healthcheck.ecare.svc.cluster.local' is correct address..
[ALERT]    (1) : config : Error(s) found in configuration file : /usr/local/etc/haproxy/backend.cfg
[ALERT]    (1) : config : Fatal errors found in configuration.

Hi Job,

The CouchDB logs indicate that it is trying to create its system databases, but the writes are failing.

I would recommend setting up CHT locally using k3s, starting with a single-node deployment, then moving to a multi-node deployment, and finally to the production environment. This approach will make it easier to isolate where specific issues are occurring.
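To help narrow things down, a few generic kubectl checks are usually a good first step; the pod names and namespace below are placeholders, so substitute the namespace your release was installed into:

  # Overall pod state and restart counts
  kubectl get pods -n <namespace>

  # Events on the couchdb pod often reveal volume or scheduling problems
  kubectl describe pod <couchdb-pod-name> -n <namespace>

  # Confirm the persistent volume claims backing couchdb were bound
  kubectl get pvc -n <namespace>

  # The haproxy agent-check needs the healthcheck service to exist and resolve
  kubectl get svc -n <namespace>

  # Full logs from any failing pod
  kubectl logs <pod-name> -n <namespace> --tail=200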