Hosting Total Cost of Ownership Squad

27 May 2025 Call

Attending

Notes

  • General discussion about “upgrades take a lot of space” - very likely Hosting TCO Squad 2.0 focus
    • why does CHT upgrading (adding new DDocs) cause a more than 100% increase in disk space?
    • view changes are what cause a large increase in disk space, and they do not happen in every upgrade
    • @jkuester to file a POC ticket in CHT Core to show how disk space goes up on a generic Couch instance with view re-indexing. Discuss w/ @diana and @twier. We’ll then take this over to the Couch Slack channel for questions.
  • @elijah - looking to have VMs to test eCHIS KE upgrades, using clones of their production instances, from 4.11 to nouveau@master
  • @mrjones - confirm how much extra disk space is used when upgrading 4.19/4.18 → nouveau@master
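
For the POC on view re-indexing mentioned in these notes, one way to observe the growth is to sample CouchDB's reported on-disk sizes before and after indexing. This is only a minimal sketch under assumptions: the base URL, database name and ddoc names passed in are placeholders, and authentication is left out. It reads `sizes.file` from `GET /{db}` and `view_index.sizes.file` from `GET /{db}/_design/{ddoc}/_info`.

```python
import json
import urllib.request


def db_file_bytes(db_info: dict) -> int:
    """On-disk size of the database file, from a GET /{db} response."""
    return db_info["sizes"]["file"]


def view_file_bytes(ddoc_info: dict) -> int:
    """On-disk size of one ddoc's view index, from GET /{db}/_design/{ddoc}/_info."""
    return ddoc_info["view_index"]["sizes"]["file"]


def growth_percent(before: int, after: int) -> float:
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (auth omitted for brevity)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def total_bytes(base_url: str, db: str, ddocs: list[str]) -> int:
    """Database file plus the view indexes of the listed ddocs, in bytes."""
    total = db_file_bytes(fetch_json(f"{base_url}/{db}"))
    for ddoc in ddocs:
        total += view_file_bytes(fetch_json(f"{base_url}/{db}/_design/{ddoc}/_info"))
    return total
```

Calling `total_bytes` before adding a new ddoc and again once indexing finishes, then passing both values to `growth_percent`, gives a rough growth figure. These numbers only cover the ddocs listed, so comparing them with filesystem-level usage of the data directory is probably still worthwhile.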

3 Jun 2025 Call

Attending

Notes

  • Review main ticket
  • if any ddoc changes, all of its views are rebuilt
    • See related comment on research ticket
    • Q: do we know for certain if a 4.19 → 4.20 upgrade will cause this spike in disk use? A: From the current diff of 4.19.0...master, I do not see any changes to the ddoc files
    • Do we need more upgrade disk space testing on an EC2 test instance with prod data?
  • MoH KE update on testing Nouveau on prod data
    • had to work through MoH process for provisioning a new VM
    • got a large and small cloned instance
    • will commence testing once ready - but need to make sure large instance has enough free space before proceeding
  • hosting TCO 2.0
    • consider moving views to more/different ddocs?
    • existing research ticket
    • current ticket status is “it’s tricky”
    • still looking into this effort though as it’s still quite promising - it is likely a key part of path forward to address 5x disk space in upgrading
    • relevant findings in shard/cpu research on forums
    • Tom is exploring why exactly the space used is more than the ~5x we see in production. Ideally it’s ~2x.
    • early research showing map reduce (including, but beyond, freetext) views are resource intensive (both CPU & Disk)
    • some early research on removing freetext views
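
The question of whether a given upgrade touches any ddoc could also be checked mechanically instead of by reading the diff. The sketch below is an assumption-laden illustration: it takes two dicts mapping ddoc names to ddoc bodies (how those are loaded from each release is left open), and it hashes only the `views` field. This approximates, but is not, the signature CouchDB itself uses to decide on a rebuild. All function and ddoc names here are hypothetical.

```python
import hashlib
import json


def views_fingerprint(ddoc: dict) -> str:
    """Stable hash of a ddoc's view definitions (not CouchDB's own signature)."""
    views = ddoc.get("views", {})
    canonical = json.dumps(views, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ddocs(old: dict, new: dict) -> list[str]:
    """Names of ddocs that are new, or whose views differ, between two releases."""
    changed = []
    for name, ddoc in new.items():
        if name not in old or views_fingerprint(old[name]) != views_fingerprint(ddoc):
            changed.append(name)
    return sorted(changed)
```

An empty result would suggest that no view rebuild, and so no disk space spike from re-indexing, is expected for that upgrade.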

10 Jun 2025 Call

Attending

Notes

  • Review main ticket
  • Review large MoH deployment testing Nouveau branch on prod data
    • How much spare disk space do they need?
    • Sugat concerned about more than 5x as seen in research ticket
    • Elijah: I’m in the process of procuring additional storage to begin the upgrade process and wanted to clarify that we settled on 5x current capacity, e.g. one clone has a 5TB volume with 3.2TB utilization. Should we expand to 16TB, or up to 25TB to get some margin of safety?
    • recommendation: 16TB should be fine since the utilization is 3.2TB
    • Interrupted upgrades should not lose all progress, and should be able to resume where they left off
    • recommend starting in MoH Data Center with 8vCPU/16GB RAM to see how it goes. Success or failure will both help inform the TCO Squad’s next steps.
  • If a 3.2TB instance needs >16TB (25TB!!?!), do we need to ship TCO V2 (e.g. in 4.2x) before TCO v1 (in 5.0), which will reduce total space needed?
  • TCO Squad agrees that we should wait for Elijah’s testing in the MoH datacenter. This may take weeks, possibly months in the worst case, to complete. In the interim, upgrade disk space research can continue
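
The sizing arithmetic in this call can be written down as a small helper. It is a sketch that assumes the 5x rule means "total volume size = 5 × current utilization", which is how the 3.2TB → 16TB figure above works out. The function names are made up for illustration.

```python
def required_volume_tb(used_tb: float, multiplier: float = 5.0) -> float:
    """Target volume size: current utilization times the safety multiplier."""
    return used_tb * multiplier


def extra_storage_tb(volume_tb: float, used_tb: float, multiplier: float = 5.0) -> float:
    """Additional capacity to procure; zero if the volume is already large enough."""
    return max(0.0, required_volume_tb(used_tb, multiplier) - volume_tb)


# The clone discussed above: 5TB volume, 3.2TB utilized.
print(required_volume_tb(3.2))
print(extra_storage_tb(5.0, 3.2))
```

Changing `multiplier` makes it easy to compare the 5x recommendation with a larger safety margin, such as the 25TB option raised in the question.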

24 Jun 2025 Call

Attending

Notes

  • Review main ticket
  • Does eCHIS KE need to test Nouveau before we release 5.0?
    • nothing will really change for the branch if we don’t test with MoH KE before the release
    • eCHIS KE only has 50% of avail storage, so upgrades won’t be easy b/c they don’t have 5x avail free disk
    • input from eCHIS KE: prepare community for what they’ll need to upgrade and what the benefits will be
    • eCHIS has two instances, one small and one big, small is done and big is maybe 3/4 done.
    • will retest and run CHT Toolbox to closely monitor disk space
    • Despite the earlier choice to wait, we will not wait for eCHIS KE test results. If we get lucky and it’s done beforehand, we’ll incorporate the findings, but it is not blocking.
  • k8s effort underway
  • review 5.0 milestone