Hosting Total Cost of Ownership 2.0 Squad

The initial effort to decrease hosting total cost of ownership (TCO) is wrapping up. The squad was hugely successful, reducing day-to-day disk use by up to 30% for large deployments. The ticket covering this work will be released in CHT 5.0.

However, while testing the Hosting TCO 1.0 feature, it became clear that our deployments feel another big disk-use pain: the huge amount of disk space the CHT needs to upgrade. Upgrades can need up to 5x as much disk space as the data itself. For example, a large deployment with 1.8TB of data on a 4TB disk has 55% free disk space: plenty of room to grow for months, or possibly years! However, it would need to move to a 9TB disk to have enough free space to upgrade. This is a massive imposition.
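The arithmetic in that example can be sketched as a small headroom check. It assumes the disk must hold roughly 5x the current data during an upgrade, which is how the 9TB figure appears to follow from 1.8TB × 5; treating the multiplier as applying to total disk size is an interpretation, and `upgrade_disk_check` and its parameters are hypothetical, not part of the CHT.

```python
def upgrade_disk_check(data_tb: float, disk_tb: float, upgrade_multiplier: float = 5.0) -> dict:
    """Illustrative upgrade headroom check (hypothetical helper).

    Assumption: during an upgrade the disk must hold about
    `upgrade_multiplier` times the current data size.
    """
    free_tb = disk_tb - data_tb                      # space left today
    free_pct = free_tb / disk_tb * 100               # as a percentage of the disk
    required_disk_tb = data_tb * upgrade_multiplier  # disk size needed to upgrade
    return {
        "free_tb": free_tb,
        "free_pct": free_pct,
        "required_disk_tb": required_disk_tb,
        "can_upgrade": disk_tb >= required_disk_tb,
    }


if __name__ == "__main__":
    # The deployment from the example: 1.8TB of data on a 4TB disk.
    print(upgrade_disk_check(data_tb=1.8, disk_tb=4.0))
```

With these inputs the check reports about 55% free space but a required disk of about 9TB, so `can_upgrade` is false even though day-to-day headroom looks comfortable.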

To better visualize the issue, here’s a chart showing the impressive >30% gain achieved in “TCO 1.0”. It also shows the ~500% ephemeral space needed for upgrades, labeled “TCO 2.0”, since that is where the focus of 2.0 will be:

Hence, Hosting TCO 2.0 is born. The squad will have exactly the same members as the current squad, but will focus on reducing the disk space needed during upgrades.

For anyone interested in joining the squad or dropping in on our weekly call, please see the calendar and check for the Tuesday “Hosting TCO Squad Call”. All are welcome!


1 Jul 2025 Call

Attending

Notes

  • Not many updates this week, as most squad members are offline.
  • @twier: working on a PR to add Nouveau to the Helm charts.
  • Another ticket depends on the Helm chart upgrades.

15 Jul 2025 Call

Attending

Notes

  • Josh: working on a chtoolbox pre-stage script that sequentially builds the staging indexes for an upgrade. Hoping to be done today; will post updates to this issue thread.
  • Tom: has been doing local tests to measure the actual performance improvements and other benefits of cleaning up redundant/unused indexes.
    • From a disk-space perspective, it seems the biggest gains come from freetext indexes. Everything else will be incremental.
    • Also looking deeper into I/O improvements. The MOH I/O issue seems to come from lots of small reads multiplied by high read latency.
    • Still worth trying to reduce indexes, since general performance improvements will result.
    • Probably not worth chasing all the tiny incremental improvements here; the return won't justify the effort.
  • Need more investigation into the CouchDB IO queue config.
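To illustrate what "sequentially building staging indexes" can mean, here is a minimal sketch, not the chtoolbox script itself. It relies on standard CouchDB behaviour: querying a view (here with `limit=0`) makes CouchDB bring that design doc's view indexes up to date before responding, and all views in one design doc are built together. Waiting for each request before starting the next keeps only one index build running at a time. The function names, the injectable `fetch` parameter, and the design doc/view names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable

# A function that performs an HTTP GET and returns the decoded JSON body.
Fetch = Callable[[str], dict]


def http_get_json(url: str, timeout: float = 3600.0) -> dict:
    """GET a URL and decode its JSON body (long timeout: index builds are slow)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_indexes_sequentially(
    base_url: str,
    db: str,
    views: Iterable[tuple[str, str]],
    fetch: Fetch = http_get_json,
) -> list[str]:
    """Trigger one design doc's index build at a time.

    `views` holds (design_doc_name, view_name) pairs, one view per design doc.
    Each query blocks until CouchDB has built that design doc's indexes, so
    builds never overlap. Returns the design doc names in build order.
    """
    built = []
    for ddoc, view in views:
        url = (
            f"{base_url}/{urllib.parse.quote(db, safe='')}"
            f"/_design/{urllib.parse.quote(ddoc, safe='')}"
            f"/_view/{urllib.parse.quote(view, safe='')}?limit=0"
        )
        fetch(url)  # wait for this build to finish before starting the next
        built.append(ddoc)
    return built
```

Running builds one after another trades total wall-clock time for lower peak disk and I/O pressure. For very large databases a single view request may time out before the build finishes, so a more robust tool would likely poll CouchDB's `_active_tasks` endpoint instead of relying on one long request.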