Since vSAN was released in 2014 there has been a bit of confusion as to how much cache should be sized for the cluster. This article is intended to clear that up and provide direction for both Hybrid and All-Flash configurations. The recommendations differ primarily because in Hybrid the cache is a Read Cache as well as a Write Buffer, whereas in All-Flash it’s just serving as a Write Buffer, so the sizing is not “one size fits all”.
- RAW Capacity – This is the amount of capacity the vSAN Datastore will provide
- Usable Capacity – This is the amount of capacity that can be provided based on the FTT level specified in storage policies
- Provisioned / Deployed Capacity – This is the amount of space taken by objects before FTT is taken into account
- Consumed Capacity – This is the amount of space that has been consumed by objects taking into account FTT, for example a 100GB Object with FTT=1 will consume 200GB of storage space
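The relationship between Provisioned and Consumed Capacity can be sketched in a few lines. This is a minimal illustration that assumes mirroring, where FTT=n keeps n+1 full copies of an object, as in the 100GB example above; the function name is hypothetical and not part of any vSAN tooling.

```python
def consumed_capacity_gb(provisioned_gb: float, ftt: int) -> float:
    """Return space consumed on the datastore for an object.

    Assumption: mirroring, so FTT=n stores n+1 full copies of the object.
    """
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return provisioned_gb * (ftt + 1)


# The 100GB object with FTT=1 from the definition above
print(consumed_capacity_gb(100, 1))
```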
Hybrid Cache Sizing
In Hybrid, the recommendation has always been 10% of Usable Capacity or Deployed Capacity. Take a 3-Node vSAN cluster where each host has two disk groups and each disk group has 7×1.2TB 10K SAS drives. Each host then has 16.8TB of RAW Capacity and the 3-Node cluster has 50.4TB of RAW Capacity. Depending on the FTT value in our storage policy, our total Usable Capacity is:
FTT=0 – 50.4TB
FTT=1 – 25.2TB
FTT=2 – 16.8TB
FTT=3 – 12.6TB
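The figures above can be reproduced with a short calculation. This is only a sketch: the constant and function names are illustrative, and it assumes mirroring, so Usable Capacity is RAW Capacity divided by FTT+1.

```python
# Example cluster from the text: 3 hosts, 2 disk groups each, 7 x 1.2TB drives per group
DRIVES_PER_DISK_GROUP = 7
DRIVE_SIZE_TB = 1.2
DISK_GROUPS_PER_HOST = 2
HOSTS = 3

raw_tb = DRIVES_PER_DISK_GROUP * DRIVE_SIZE_TB * DISK_GROUPS_PER_HOST * HOSTS


def usable_tb(raw_capacity_tb: float, ftt: int) -> float:
    """Usable capacity, assuming mirroring (FTT+1 copies of every object)."""
    return raw_capacity_tb / (ftt + 1)


for ftt in range(4):
    print(f"FTT={ftt}: {usable_tb(raw_tb, ftt):.1f}TB usable")
```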
Based on the 10% rule our cache requirements are as follows:
FTT=0 – 5.04TB which equates to 1.68TB Per node which equates to 840GB Per Disk Group
FTT=1 – 2.52TB which equates to 0.84TB Per node which equates to 420GB Per Disk Group
FTT=2 – 1.68TB which equates to 0.56TB Per node which equates to 280GB Per Disk Group
FTT=3 – 1.26TB which equates to 0.42TB Per node which equates to 210GB Per Disk Group
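As a rough check of the 10% rule, the per-disk-group cache size can be computed from the Usable Capacity. The helper below is a hypothetical sketch for this example cluster (3 hosts, 2 disk groups per host) and uses 1TB = 1000GB, as the figures in this article do.

```python
def hybrid_cache_per_disk_group_gb(
    usable_tb: float,
    hosts: int = 3,
    disk_groups_per_host: int = 2,
    cache_ratio: float = 0.10,
) -> float:
    """Cache per disk group in GB under the 10% rule (1TB = 1000GB)."""
    total_cache_gb = usable_tb * cache_ratio * 1000
    return total_cache_gb / (hosts * disk_groups_per_host)


# Usable Capacity per FTT level for the 50.4TB RAW example cluster
usable_by_ftt = {0: 50.4, 1: 25.2, 2: 16.8, 3: 12.6}

for ftt, usable in usable_by_ftt.items():
    per_dg = hybrid_cache_per_disk_group_gb(usable)
    print(f"FTT={ftt}: {per_dg:.0f}GB cache per disk group")
```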
The above sizing is all well and good if you are only using a single FTT level. However, vSAN allows you to define policies with different FTT levels, which means you can have objects on vSAN with varying levels of protection, and this makes sizing using the above method all the more difficult.
The best way to size the cache in a Hybrid cluster is to base it on your deployed or provisioned capacity. For example, with the above RAW capacity of 50.4TB you may choose to have the following:
10 Objects based on FTT=0 of 500GB which totals 5TB of Provisioned Capacity and 5TB of Consumed Capacity
10 Objects based on FTT=1 of 500GB which totals 5TB of Provisioned Capacity and 10TB of Consumed capacity
10 Objects based on FTT=2 of 500GB which totals 5TB of Provisioned Capacity and 15TB of Consumed Capacity
10 Objects based on FTT=3 of 500GB which totals 5TB of Provisioned Capacity and 20TB of Consumed Capacity
If you total up the above, our Provisioned Capacity is 20TB but our Consumed Capacity is 50TB. Based on the Provisioned Capacity of 20TB, 10% is 2TB, which equates to 0.67TB Per Node, or 333GB per Disk Group. This is how your cache in Hybrid should be sized.
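The mixed-policy example can be worked through in code. This is a sketch under the same assumptions as the text: mirroring for every FTT level, 3 hosts with 2 disk groups each, and 10% of Provisioned Capacity as the cache target. The variable names are illustrative only.

```python
# (object count, object size in GB, FTT level) for each policy in the example
object_groups = [
    (10, 500, 0),
    (10, 500, 1),
    (10, 500, 2),
    (10, 500, 3),
]

HOSTS = 3
DISK_GROUPS_PER_HOST = 2
CACHE_RATIO = 0.10  # the 10% rule, applied to Provisioned Capacity

# Provisioned Capacity ignores FTT; Consumed Capacity multiplies by FTT+1 copies
provisioned_gb = sum(count * size for count, size, _ in object_groups)
consumed_gb = sum(count * size * (ftt + 1) for count, size, ftt in object_groups)

cache_total_gb = provisioned_gb * CACHE_RATIO
cache_per_node_gb = cache_total_gb / HOSTS
cache_per_disk_group_gb = cache_per_node_gb / DISK_GROUPS_PER_HOST

print(f"Provisioned: {provisioned_gb}GB, Consumed: {consumed_gb}GB")
print(f"Cache: {cache_total_gb:.0f}GB total, "
      f"{cache_per_node_gb:.0f}GB per node, "
      f"{cache_per_disk_group_gb:.0f}GB per disk group")
```

Sizing from Provisioned Capacity keeps the result independent of how the FTT levels are mixed, which is why it is easier to apply than the per-FTT table.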
All-Flash Cache Sizing
All-Flash has a lot more factors to consider: Erasure Coding, Dedupe and Compression, and the fact that the cache is purely a Write Buffer, so write endurance has to be taken into account. The usual 10% sizing therefore does not apply here. In reality, with the typical 70% Read / 30% Write workload, a lot of the requests are served from the Capacity Tier, which in this case is flash based anyway. This means the cache layer can be much smaller than it would have been in Hybrid for the same RAW capacity, although the write endurance factor still has to be considered. We all know there is a write buffer limit in vSAN, but that does not mean you should limit the size of the SSD drives based on it. The main reason to go larger is to increase the endurance of the drive: vSAN will cycle through all the cells on the drive irrespective of the Write Buffer Limit. VMware recently published a new sizing guide for All-Flash, which is shown below.
There we have it!