Enabling RDMA for vSAN with intel e810 adapter

The Intel E810 network adapter is now fully certified for RDMA support in vSAN, I thought I would try it out and see what performance improvement I would get by enabling it. However I found that just installing the drivers is not enough to enable RDMA on the adapter itself.

At the time of writing this article, the driver versions that have been certified are as follows:

  • icen version
  • irdman version
  • E810 firmware 2.40

After installing the above drivers, I did not see any RDMA adapters listed in the vSphere UI:

So it would appear that the driver module has to be told to switch on RDMA, in order to do this you run the following two commands:

esxcli system module parameters set -m icen -p "RDMA=1,1"
esxcli system module parameters set -m irdman -p "ROCE=1,1"

The above two commands enable RDMA at the driver level, and then the version of RDMA at the RDMA driver level, for both ports. After a reboot of the host, you should now see an option in the UI for RDMA adapters:

Now going into the vSAN Services under network, you can now enable RDMA for your vSAN cluster:

In the networking section it should now show that RDMA Support is Enabled:

Now that RDMA is enabled there should be a performance boost due to the offload capabilities that RDMA offers. I will post some results as soon as my test cycles have completed.

Do I need a Bigger Write Buffer?

Even since VMware published this article on cache sizing guidelines for all-flash I still get asked two questions, the first one is around the amount of cache required per node? The second is about when vSAN will have a larger write buffer?

The amount of cache required per node in all-flash is not dependant on the amount of usable space like it was in Hybrid configurations, the amount of cache per node is based purely on the endurance of the cache SSDs, which typically fall into four categories:

  • Up to 2 drive writes per day
  • 3 drive writes per day
  • 10 drive writes per day
  • 30 drive writes per day

With the birth of the P5800X from Intel having an endurance capability of 100 drive writes per day, I would expect a 5th category will appear soon too.

If we look at the amount of DWPD a drive is capable of we can see whether it would be good in a cache tier or not, for example a device with 0.4-2 DWPD is likely to be certified for the vSAN capacity tier and not the cache tier.

Since the cache tier is where 100% of the writes happen, this is where you need the higher endurance devices, the higher the writes in your environment means you need to look at the endurance as this will be the biggest factor in the amount of cache you need.

If we look at the 3 DWPD category, this is normally categorised by the vendors as “Mixed-Use”, and is the most economically priced cache device for vSAN, but because of the lower endurance, you actually need more cache. I have looked at a lot of Live Optics reports over the past few months to gather information on what is the average % of writes in a customer environment, and the number that came out was 37%, yes higher than the 30% normally envisaged.

So based on 3DWPD and >30% Random Writes, the VMware Article states you need 3.6TB of cache per node based on an AF-8 Config, so this would result in a likely configuration of 3x 1.6TB Devices:

The next category of 10 DWPD, these are usually classed as “Write Intensive” by the vendors, again according to the VMware table you would need 1.2TB of cache per node, again based on an AF-8 Config with >30% Random writes:

Then we come to the final category of 30 DWPD, these devices are usually categorised as “Write Intensive Express Flash”, and this is usually Intel Optane SSD Devices such as the P4800X, for the same workload, the VMware recommendation is to have 400GB of cache per node:

As you can see, the amount of cache you require is based on the endurance of the devices when it comes to vSAN all-flash.

To address the second question about vSAN ever having a larger write buffer, this has been mentioned for a long time, but my opinion here is that you do not need to have a larger write buffer if you are using high endurance devices, and with the new Intel P5800X having an endurance factor of 100 DWPD, I expect that the amount of cache per node would be lower still, so I would not expect a big emphasis on the write buffer from a vSAN perspective.

As SSDs become faster, more higher endurance, it mitigates the need to have larger write buffers, especially in Full NVMe configurations for example where the storage is sat on the PCIe BUS directly, rather than sat behind a disk controller. And in my experience with Intel Optane SSDs, the 375GB (P4800X) and 400GB (P5800X) serve very well even in write intensive environments.

vSAN 7.0U1 – A New Way to Add Capacity

As we all know there are a number of ways of scaling capacity in a vSAN environment, you can add disks to existing hosts and scale the storage independently of compute, or you can add nodes to the cluster and scale both the storage and compute together, but what if you are in a situation where you do not have any free disk slots available, and / or you are unable to add more nodes to the existing cluster? Well vSAN 7.0U1 comes with a new feature called vSAN HCI Mesh, so what does this mean and how does it work?

Let’s take the scenario below, we have two vSAN clusters in the same vCenter, Cluster A is nearing capacity from a storage perspective, but the compute is relatively under utilised, there are no available disk slots to expand out the storage. Cluster B on the other hand has a lot of free storage capacity but is more utilised on the compute side of things:

Now the vSAN HCI Mesh will allow you to consume storage on a remote vSAN cluster providing it exists within the same vCenter inventory, there are no special hardware / software requirements (apart from 7.0U1) and the traffic will leverage the existing vSAN network traffic configuration.

This cool feature adds an elastic capability to vSAN Clusters, especially if you need to have some additional temporary capacity for application refactoring or service upgrade where you want to deploy the new services but keep the old one operational until the transition is made.

VMware has not left the monitoring capabilities of such use out either, in the UI you can monitor the usage of “Remote VM” from a capacity perspective as well as within the performance service

So this clearly allows dissagregation of storage and compute in a vSAN environment and offers that flexibility and elasticity of storage consumption are there any limitations?

  • A vSAN cluster can only mount up to 5 remote vSAN Datastores
  • The vSAN Cluster must be able to access the other vSAN cluster(s) via the vSAN Network
  • vSphere and vCenter must be running 7.0U1 or later
  • Enterprise and Enterprise Plus editions of vSAN
  • Enough hosts / configuration to support storage policy, for example if your remote cluster has only four hosts, you cannot use a policy which requires RAID6

So this is a pretty cool feature and sort of elliminates the need for Storage Only vSAN nodes which was discussed in the past at many VMworlds

It's all about VMware vSAN