The Intel E810 network adapter is now fully certified for RDMA support in vSAN, so I thought I would try it out and see what performance improvement I would get by enabling it. However, I found that just installing the drivers is not enough to enable RDMA on the adapter itself.
At the time of writing this article, the driver versions that have been certified are as follows:
icen version 184.108.40.206
irdman version 220.127.116.11
E810 firmware 2.40
After installing the above drivers, I did not see any RDMA adapters listed in the vSphere UI.
So it would appear that the driver module has to be told to switch on RDMA. To do this, run the following two commands:
esxcli system module parameters set -m icen -p "RDMA=1,1"
esxcli system module parameters set -m irdman -p "ROCE=1,1"
The first command enables RDMA at the driver level, and the second sets the version of RDMA at the RDMA driver level, in each case for both ports. After a reboot of the host, you should now see an option in the UI for RDMA adapters.
Now, in the vSAN Services section under networking, you can enable RDMA for your vSAN cluster.
The networking section should now show that RDMA Support is Enabled.
Now that RDMA is enabled there should be a performance boost due to the offload capabilities that RDMA offers. I will post some results as soon as my test cycles have completed.
As we all know, storage media has evolved very quickly over the past few years. We have seen the decline of the spinning disk and the move to flash based storage devices, and also the shift from SAS/SATA protocol based drives to NVMe protocol based drives in order to address the performance limitations of older protocols that were designed for spinning disks and not for SSDs.
A question I get asked regularly is what type of SSD is best for the vSAN cache tier. There are vSAN ready nodes out there that contain NAND based SSDs for both the cache tier and the capacity tier, but there are also other technologies like Intel Optane™ SSDs being used for the cache tier, so let's talk about the two. For the purpose of this comparison I am going to use the most common 3D NAND based NVMe drive in vSAN Ready node configurations, the 1.6TB Intel P4610, and the 375GB P4800X Intel Optane™ SSD. Both of these SSDs are NVMe based devices.
As you can see from the above table, there are some major differences between the two SSDs. The most notable is random write performance, which is critical in the vSAN cache tier because all incoming writes are random in nature and are absorbed by the cache tier; the NAND based SSD does not have as much random write capability as the Optane™ SSD. The biggest impact on a vSAN cache tier, however, is Drive Writes Per Day (DWPD). If you look at the specifications in detail, the P4610 can handle around 3.5 DWPD, which equates to around 5.6TB of data written daily, whereas the 375GB Optane™ SSD can handle up to 60 DWPD, which equates to around 22.5TB of data written daily. Remember that the Optane™ SSD is also less than a quarter of the capacity of the P4610. So in a vSAN cache tier the Optane™ SSD wins hands down from an endurance perspective, as well as in its ability to handle random writes a lot quicker. So why such a difference?
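The DWPD comparison is simple arithmetic: capacity multiplied by the DWPD rating. The following is a minimal sketch using only the capacities and ratings quoted in this article; the function name is made up for illustration and decimal units (1TB = 1000GB) are assumed. Note that, with the 60 DWPD rating quoted here, straight multiplication gives 22.5TB per day for the 375GB device.

```python
def daily_write_budget_tb(capacity_gb: float, dwpd: float) -> float:
    """Rated data written per day in TB, assuming 1 TB = 1000 GB."""
    return capacity_gb * dwpd / 1000.0


# Figures quoted in the article (capacity in GB, rated DWPD).
p4610_tb_per_day = daily_write_budget_tb(1600, 3.5)
optane_tb_per_day = daily_write_budget_tb(375, 60)

# Capacity of the Optane device relative to the P4610.
capacity_ratio = 375 / 1600

print(f"P4610:  {p4610_tb_per_day:.1f} TB/day")    # 5.6 TB/day
print(f"Optane: {optane_tb_per_day:.1f} TB/day")   # 22.5 TB/day
print(f"Optane capacity is {capacity_ratio:.0%} of the P4610")  # 23%
```

So the smaller device is rated for roughly four times the daily writes of the larger one, which is the point being made about cache tier endurance.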
Well, if you look at NAND based SSDs, firstly there is usually an element of DRAM that acts as a buffer to the NAND media, typically around 1GB of DRAM for every TB of media, so any incoming writes hit the DRAM buffer first. This can give a positive boost in short, low block size write bursts, but it cannot be sustained over a longer period of time. In an Optane™ SSD there is no such DRAM buffer, so the data is written directly to the media. The VxRAIL team at Dell EMC have done some extensive testing around this and clearly demonstrated that a NAND based SSD cannot sustain the same level of write performance in a continuous fashion, whereas the Optane™ SSD maintains the same level of write performance consistently. Below are the results of their performance testing:
The way NAND based SSDs and Optane™ SSDs perform write operations is fundamentally different. In any vendor's NAND, media has to be read and written in pages, but everything has to be erased in blocks. Page updates are typically written to a new unused block. As new data is written, old pages become stale, and on an SSD these stale pages can build up fairly quickly, which means that at some point there are significant chunks of blocks that are obsolete and have to be garbage collected. Garbage collection clears the block and allows that block to receive data again, and the process starts all over again.
Optane™ SSDs are transistor-less, which essentially means that each cell state can be changed between 0 and 1 independently of other cells on the device. This means that Optane™ SSDs are completely bit addressable as opposed to having to write in pages, and no garbage collection is required. This has a positive impact on performance as well as endurance, which is why Optane™ SSDs have very high endurance capabilities.
So what does all this mean from an application perspective? Well, the VxRAIL guys at Dell EMC also did some performance testing using HammerDB and showed some significant performance gains when using Intel Optane™ SSDs versus traditional NAND as cache, as much as a 61% gain in performance in a complex OLTP workload.
As we all know, latency is critical in any type of workload. What I have seen in performance testing is that Intel Optane™ SSDs consistently provide lower latency as well as a much more tightly controlled standard deviation on latency versus the P4610. Even though in some smaller block size tests the performance of both devices was similar, in larger block size tests the Optane™ SSD again delivered lower latency and a tightly controlled standard deviation in latency, and also provided much higher performance in comparison to the P4610. You also have to remember that the P4610 device was only using 37% span due to vSAN currently having a limit of 600GB write buffer per disk group, whereas the Optane™ SSD was using 100% span, so the P4610 had a bit of an unfair advantage here.
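The span figures follow from the 600GB write buffer limit per disk group mentioned above. As a rough sketch (the function name is illustrative and not part of any vSAN tooling), the fraction of a cache device that vSAN can actually use as write buffer is:

```python
WRITE_BUFFER_LIMIT_GB = 600  # per disk group limit quoted in the article


def write_buffer_span(device_gb: float, limit_gb: float = WRITE_BUFFER_LIMIT_GB) -> float:
    """Fraction of the cache device usable as write buffer (0.0 to 1.0)."""
    return min(device_gb, limit_gb) / device_gb


p4610_span = write_buffer_span(1600)   # 1.6TB P4610
optane_span = write_buffer_span(375)   # 375GB Optane SSD

print(f"P4610 span:  {p4610_span:.1%}")   # 37.5%
print(f"Optane span: {optane_span:.1%}")  # 100.0%
```

The 1.6TB device leaves well over half of its capacity untouched by the write buffer, while the 375GB device is used in full.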
Conclusion
What is clear from a vSAN perspective is that endurance plays a critical role in the vSAN cache tier. In the very early days of vSAN there was no other choice but SAS or SATA based NAND devices, with DWPD ratings ranging between 10 and 25 based on an 800GB drive, but as the technology evolution pushes the boundaries of performance and endurance, technology like Intel Optane™ SSDs clearly has an edge, offering up to 60 DWPD on a smaller capacity of 375GB.
Smaller cache device…are you serious?
In the testing I have done on full NVMe systems, where Intel Optane™ SSDs are used in the vSAN cache tier and standard, more read-intensive NVMe drives like the Intel P4510 are used in the capacity tier, a 375GB Optane™ SSD is more than sufficient. In most workloads a 750GB Optane™ SSD did not improve performance, and even with 375GB I was only able to saturate the write buffer to 60% (based on vSAN 6.7 Update 3).
So whilst NAND based devices are fully supported as a vSAN cache device, they may not be the right choice when it comes down to consistent performance and endurance required for a modern infrastructure.
I met with a team from Intel recently and discussed their new additions to the vSAN Compatibility Guide, mainly their QLC NVMe drives. I have spoken to many customers about full NVMe configurations on many occasions, and usually there was a slightly higher price to pay for such configurations, but the QLC NVMe drives could be a turning point for future proofing your HCI platform because they are cheaper than their SAS/SATA equivalents!
This being said, I have heard many times that the days of SATA/SAS based drives are numbered, but clearly with these QLC NVMe drives this could be much sooner rather than later.
Right now the 7.68TB D5-P4320 has been certified, and I have been informed by Intel that the 15.3TB one is currently going through certification. That is a game changer for delivering high amounts of capacity at a reasonable cost. If I take the 4-node full NVMe cluster I have access to and replace all the current NVMe devices with the 7.68TB QLC NVMe devices, I would have an effective usable capacity of 166TB, and double that with the 15.3TB drives. This is based on a RAID5 storage policy only and also takes into account the 10% difference between device capacity and actual capacity. So let's take a closer look at these new QLC NVMe drives from Intel:
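The 166TB figure can be reproduced with a short calculation. This is only a sketch under stated assumptions: 8 capacity devices per host (the count used later in this article), a RAID5 3+1 layout giving 75% space efficiency, and the 10% device-to-actual capacity difference applied as a flat factor. The function name is illustrative.

```python
def usable_capacity_tb(hosts: int, drives_per_host: int, drive_tb: float,
                       actual_fraction: float = 0.90,   # 10% device vs actual capacity
                       raid5_efficiency: float = 0.75   # RAID5 3+1: 3 data + 1 parity
                       ) -> float:
    """Approximate usable capacity in TB for an all-RAID5 vSAN cluster."""
    raw_tb = hosts * drives_per_host * drive_tb
    return raw_tb * actual_fraction * raid5_efficiency


# 4-node cluster, assuming 8 QLC capacity devices per host.
usable_7_68 = usable_capacity_tb(4, 8, 7.68)
usable_15_3 = usable_capacity_tb(4, 8, 15.3)

print(f"7.68TB drives: {usable_7_68:.0f} TB usable")  # 166 TB
print(f"15.3TB drives: {usable_15_3:.0f} TB usable")  # 330 TB
```

Real clusters will vary with slack space and any policies other than RAID5, so treat these as ballpark numbers.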
From the ARK portal we can determine the following information: sequential read and sequential write throughput (up to), random read and random write performance (100% span), read and write latency, and the interface, which is PCIe NVMe 3.1 x4.
Now, if you remember my blog on full NVMe performance, combining Intel Optane with their NVMe drives delivers far superior performance characteristics versus traditional SAS/SATA. In addition to that, these new QLC NVMe drives also reduce the cost of capacity, but just how much of a difference is it?
So I checked out the prices here in the UK, from the same supplier, here’s the link to the NVMe QLC Device and here’s the link to a SAS Equivalent.
For the benefit of this exercise I compared the lowest cost SAS 12G 7.68TB drive on the vSAN Compatibility Guide, since Intel do not manufacture SAS based SSDs and vendors seem to favour SAS based SSDs over SATA.
As you can clearly see, the cost per GB is significantly lower at £0.20 per GB (this falls to around £0.18 per GB on the larger 15.3TB device). However, there is one thing to note: the DWPD of the QLC NVMe device is much lower in comparison to the SAS device, but in a vSAN environment should this matter too much? The simple answer here is no. If we look at the maths, with 8 of the QLC devices in each host in my 4-node cluster and a usable capacity of 166TB, I would have to be writing 33.2TB of data per day to hit the 0.2 DWPD rating, so the lower DWPD in a vSAN environment is not significant unless you are constantly writing fresh data that would exceed that amount.
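The maths above can be written out as a quick sketch, using the 166TB usable capacity and the 0.2 DWPD rating from this article. The per-device figure is added purely for illustration, and the function name is not from any tool.

```python
def daily_writes_tb(capacity_tb: float, dwpd: float) -> float:
    """Data written per day, in TB, at which the DWPD rating is reached."""
    return capacity_tb * dwpd


QLC_DWPD = 0.2  # rating quoted in the article

# Cluster-wide view, based on the usable capacity used in the article.
cluster_tb_per_day = daily_writes_tb(166, QLC_DWPD)

# Single 7.68TB device, for scale.
device_tb_per_day = daily_writes_tb(7.68, QLC_DWPD)

print(f"Cluster: {cluster_tb_per_day:.1f} TB/day")    # 33.2 TB/day
print(f"Per device: {device_tb_per_day:.2f} TB/day")  # 1.54 TB/day
```

In other words, the cluster would need to absorb over 33TB of fresh writes every day before the QLC endurance rating becomes the limiting factor.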
I am hoping that I can get some of these QLC NVMe drives from Intel to get some performance data from them in order to complete the write up and give some performance characteristics, but based on my previous full NVMe performance testing I would not expect them to be lower than those previous tests.