
Why a NAND-based SSD may not be the right choice for the vSAN cache tier

As we all know, storage media has evolved very quickly over the past few years: the decline of the spinning disk and the move to flash-based storage devices, but also the shift from SAS/SATA protocol drives to NVMe protocol drives, in order to address the performance limitations of older protocols that were designed for spinning disks rather than SSDs.

A question I get asked regularly is what type of SSD is best for the vSAN cache tier. There are vSAN ready nodes out there that contain NAND-based SSDs for both the cache tier and the capacity tier, but there are also other technologies, like Intel Optane™ SSDs, being used for the cache tier, so let’s talk about the two. For the purpose of this comparison I am going to use the most common 3D NAND NVMe drive in vSAN ready node configurations, the 1.6TB Intel P4610, and the 375GB Intel Optane™ P4800X; both are NVMe-based devices.

Let’s compare the two devices:

| Specification | 1.6TB Intel P4610 | 375GB Intel Optane™ P4800X |
|---|---|---|
| Capacity | 1.6TB | 375GB |
| Sequential Read (up to) | 3200 MB/sec | 2400 MB/sec |
| Sequential Write (up to) | 2080 MB/sec | 2000 MB/sec |
| Random Read (100% span) | 643,000 IOPS | 550,000 IOPS |
| Random Write (100% span) | 199,000 IOPS | 500,000 IOPS |
| Latency (Read) | 77 µs | 10 µs |
| Latency (Write) | 18 µs | 10 µs |
| Endurance Rating | 12.25 PBW (around 3.5 DWPD) | Up to 60 DWPD (around 41 PBW) |
Data obtained from ark.intel.com

As you can see from the table above, there are some major differences between the two SSDs, notably in random write performance, which is critical in the cache tier of a vSAN environment: all incoming writes are random in nature and are absorbed by the cache tier, and the NAND-based SSD simply does not have the same random write capability as the Optane™ SSD. But the biggest impact on a vSAN cache tier is the Drive Writes Per Day (DWPD) rating. If you look at the specifications in detail, the P4610 can handle around 3.5 DWPD, which equates to around 5.6TB of data written daily, whereas the 375GB Optane™ SSD can handle up to 60 DWPD, which equates to around 22.5TB of data written daily. Remember that the Optane™ SSD is also less than a quarter of the capacity of the P4610. So in a vSAN cache tier, the Optane™ SSD wins hands down from an endurance perspective, as well as in its ability to handle random writes a lot quicker. So why such a difference?
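To make the endurance arithmetic concrete, here is a minimal sketch, assuming the conventional 5-year warranty period that is normally used to relate DWPD and PBW (the capacities and DWPD figures come from the table above):

```python
# Endurance arithmetic for the two drives compared above.
# Assumes the conventional 5-year warranty period when converting
# between DWPD (drive writes per day) and PBW (petabytes written).

WARRANTY_DAYS = 5 * 365  # 1825 days

def daily_writes_tb(capacity_tb: float, dwpd: float) -> float:
    """Sustainable data written per day, in TB."""
    return capacity_tb * dwpd

def lifetime_pbw(capacity_tb: float, dwpd: float) -> float:
    """Total petabytes written over the warranty period."""
    return capacity_tb * dwpd * WARRANTY_DAYS / 1000

# 1.6TB P4610 at ~3.5 DWPD
print(daily_writes_tb(1.6, 3.5))   # ~5.6 TB/day
print(lifetime_pbw(1.6, 3.5))      # ~10.2 PBW (the spec sheet says 12.25 PBW,
                                   # so 3.5 DWPD is a rounded figure)

# 375GB Optane P4800X at 60 DWPD
print(daily_writes_tb(0.375, 60))  # ~22.5 TB/day
print(lifetime_pbw(0.375, 60))     # ~41 PBW, matching the table
```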

Well, if you look at NAND-based SSDs, there is usually an element of DRAM that acts as a buffer in front of the NAND media, typically around 1GB of DRAM for every 1TB of media, so any incoming writes hit the DRAM buffer first. This can give a positive boost in short, low block size write bursts, but it cannot be sustained over a longer period of time. An Optane™ SSD has no such DRAM buffer, so data is written directly to the media. The VxRail team at Dell EMC have done some extensive testing around this and clearly demonstrated that a NAND-based SSD cannot sustain the same level of write performance continuously, whereas the Optane™ SSD maintains the same level of write performance consistently. The results of their performance testing are linked below:

https://www.esg-global.com/validation/esg-technical-validation-dell-emc-vxrail-with-intel-xeon-scalable-processors-and-intel-optane-ssds

The way NAND-based SSDs and Optane™ SSDs perform write operations is fundamentally different. In NAND, media has to be read and written in pages, but erased in blocks. Page updates are typically written to a new, unused page; as new data is written, the old pages become stale, and on a busy SSD these stale pages can build up fairly quickly, which means that at some point significant chunks of blocks are obsolete and have to be garbage collected. Garbage collection copies any still-valid pages elsewhere and erases the block, allowing it to receive data again, and the process starts all over again.
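To illustrate the cost of this, here is a toy simulation (my own simplification, not any vendor’s actual flash translation layer) that counts how many extra pages a NAND device must program when garbage collection copies still-live pages forward before erasing a block; the ratio it prints is the classic write amplification factor:

```python
import random

PAGES_PER_BLOCK = 64
NUM_BLOCKS = 20
LOGICAL_PAGES = 1000  # < 1280 physical page slots; the slack mimics
                      # the overprovisioning real SSDs rely on

live = [set() for _ in range(NUM_BLOCKS)]   # live logical pages per block
used = [0] * NUM_BLOCKS                     # programmed slots (live + stale)
where = {}                                  # logical page -> its current block
free_blocks = list(range(1, NUM_BLOCKS))
open_blk = 0
host_writes = device_writes = 0

def program(lpage):
    """Program one page into the open block, opening a fresh one when full."""
    global open_blk, device_writes
    if used[open_blk] == PAGES_PER_BLOCK:
        open_blk = free_blocks.pop()
    live[open_blk].add(lpage)
    where[lpage] = open_blk
    used[open_blk] += 1
    device_writes += 1

def garbage_collect():
    """Erase the block with the fewest live pages, copying those forward."""
    victim = min((b for b in range(NUM_BLOCKS)
                  if b != open_blk and b not in free_blocks),
                 key=lambda b: len(live[b]))
    for p in list(live[victim]):
        program(p)              # copies the host never asked for
    live[victim].clear()
    used[victim] = 0            # the whole block is erased in one operation
    free_blocks.append(victim)

for _ in range(200_000):        # sustained random overwrites, as in a cache tier
    lp = random.randrange(LOGICAL_PAGES)
    host_writes += 1
    if lp in where:
        live[where[lp]].discard(lp)  # old copy goes stale but keeps its slot
    program(lp)
    while len(free_blocks) < 2:      # keep headroom so GC has room to copy into
        garbage_collect()

print(f"write amplification: {device_writes / host_writes:.2f}")
```

Every page the device programs beyond what the host requested costs both performance and endurance, which is exactly the overhead the next paragraph says Optane™ avoids.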

Optane™ SSDs are transistor-less, which essentially means that each cell’s state can be changed between a 0 and a 1 independently of the other cells on the device. This means that Optane™ SSDs are bit addressable, as opposed to having to write in pages, and no garbage collection is required; this obviously has a positive impact on performance as well as endurance, which is why Optane™ SSDs have such high endurance ratings.

So what does all this mean from an application perspective?
Well, the VxRail team at Dell EMC also did some performance testing using HammerDB and showed significant performance gains when using Intel Optane™ SSDs versus traditional NAND as cache: as much as a 61% gain in performance in a complex OLTP workload.

As we all know, latency is critical in any type of workload. What I have seen in performance testing is that Intel Optane™ SSDs consistently provide lower latency, as well as a much more tightly controlled standard deviation in latency, versus the P4610. Even though in some smaller block size tests the performance of the two devices was similar, in larger block size tests the Optane™ SSD again delivered lower latency and a tightly controlled standard deviation in latency, while also providing much higher performance than the P4610. You also have to remember that the P4610 was only using a 37.5% span, because vSAN currently has a 600GB write buffer limit per disk group (600GB of a 1.6TB device is 37.5%), whereas the Optane™ SSD was using a 100% span, so the P4610 had a bit of an unfair advantage here.

Conclusion
What is clear from a vSAN perspective is that endurance plays a critical role in the cache tier. In the very early days of vSAN there was no choice but SAS or SATA NAND devices, with DWPD ratings ranging between 10 and 25 on an 800GB drive; but as technology evolution pushes the boundaries of performance and endurance, technology like Intel Optane™ SSDs clearly has the edge, offering up to 60 DWPD on a smaller 375GB capacity.

Smaller cache device…are you serious?

In the testing I have done on full NVMe systems, where Intel Optane™ SSDs are used in the vSAN cache tier and more read-intensive standard NVMe drives like the Intel P4510 are used in the capacity tier, a 375GB Optane™ SSD is more than sufficient: in most workloads a 750GB Optane™ SSD did not improve performance, and even with 375GB I was only able to fill the write buffer to 60% (based on vSAN 6.7 Update 3).

So whilst NAND-based devices are fully supported as vSAN cache devices, they may not be the right choice when it comes down to the consistent performance and endurance required for a modern infrastructure.

Full NVMe or not Full NVMe, that is the question


As you have seen, my recent posts have been around Intel Optane and the performance gains that can be delivered by implementing the technology in a vSAN environment. I have been asked many times what benefits a full NVMe solution would bring and what such a solution would look like, but before we go into that, let’s talk about NVMe. What exactly is NVMe?

Non-Volatile Memory Express (NVMe) is not a drive type, but an interface and protocol that looks set to replace the SAS/SATA interface. It attaches over PCIe, and the whole purpose of NVMe is to exploit the parallelism that flash media provides, which in turn reduces I/O overhead and improves performance; where AHCI/SATA offers a single command queue 32 commands deep, NVMe supports up to 64K queues, each up to 64K commands deep. Protocols like SAS/SATA were designed for slower hard disks, where the delay between the CPU request and the data transfer was much higher; as SSDs became faster, the requirement for a faster protocol became evident, and this is where NVMe comes into play.

So in a vSAN environment, what does a full NVMe solution look like? Because vSAN is currently a two-tier architecture (cache and capacity), a full NVMe solution means that both tiers have NVMe-capable drives. This can be done either with standard NVMe drives in both the cache and capacity tiers, or by using a technology like Intel Optane NVMe for cache and standard NVMe for capacity. From an architecture perspective it is pretty straightforward, but how does performance compare? For this I persuaded my contacts at Intel to provide me with some full NVMe kit in order to perform some benchmark tests, and to provide a like-for-like comparison I ran the same benchmark tests on an Optane+SATA configuration.

Cluster Specification:
Number of Nodes: 4
Network: 2x 10Gbit in LACP configuration
Disk groups per node: 2
Cache Tier both clusters: 2x Intel Optane 375GB P4800X PCIe Add In Card
Capacity Tier Optane/SATA: 8x 3.84TB SATA S4510 2.5″
Capacity Tier Full NVMe: 8x 2.0TB NVMe P4510 2.5″ U.2

Test Plan:
Block Size: 4K, 8K, 16K, 32K, 64K, 128K
I/O Pattern: Random
Read/Write Ratio: 0/100, 30/70, 70/30, 100/0
Number of VMs: 120
Number of VMDKs per VM: 1
Size of VMDK: 50GB
Storage Policy: FTT=1, RAID1
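
The matrix above works out to 24 distinct runs. As a side note, here is a small sketch (my own illustration, not the actual configuration format of a benchmark harness like HCIBench) that enumerates them as fio-style option strings; the harness fans an equivalent per-VM workload profile out across the 120 VMs:

```python
# Enumerate the 24 cells of the test matrix as fio-style option strings.
# rw=randrw with rwmixread is standard fio syntax; a harness such as
# HCIBench generates and distributes equivalent per-VM workload profiles.

from itertools import product

block_sizes = ["4k", "8k", "16k", "32k", "64k", "128k"]
read_percentages = [0, 30, 70, 100]   # read share of the random mix

for bs, rd in product(block_sizes, read_percentages):
    print(f"fio --name=vsan-test --rw=randrw --rwmixread={rd} "
          f"--bs={bs} --direct=1 --ioengine=libaio --size=50g")

# 6 block sizes x 4 read/write mixes = 24 runs per cluster configuration
```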

Let’s look at the results:

[Charts and full result tables comparing the Optane+SATA and full NVMe configurations]

What is clear here is that Optane serves the cache tier really well in both solutions; however, in the full NVMe solution read performance is significantly improved as well. In the 128K, 100% read test the 2x 10G links were being pushed to their limits, but not only were we able to push up throughput and IOPS, we also drove down latency, in some cases reducing it by over 50%.

So why would you choose a full NVMe solution? The simple answer: if you have latency-sensitive applications, then clusters dedicated to those applications would be well served from an IOPS, throughput, and latency perspective by full NVMe.

Vendors have also recognised this; for example, Dell EMC have just launched their Intel Optane powered full NVMe vSAN ready node, based on the R740xd platform and consisting of similar drives to those I used in the tests here, namely the Optane 375GB and P4510 U.2 NVMe drives. You can see the vSAN ready node details here

So clearly NVMe has major performance benefits over traditional SAS/SATA devices; could this be the end of SAS/SATA in the not too distant future?

Linear scaling is anything but a myth

I have many conversations with many people, whether they be customers, friends, colleagues, or potential customers, but the question is always the same: does vSAN really scale linearly?

So to answer this question, I have access to an 8-node cluster, from which I essentially removed four of the nodes and ran a performance test using HCIBench. For the 4-node cluster I ran a total of 120 VMs, and, to scale the workload with the cluster, 240 VMs in the 8-node cluster.

For the purpose of the test, I wanted to run all the performance tests I have run previously: all block sizes up to 128K, as well as read/write percentages of 0%, 30%, 70% and 100%. So let us take a look at the 4-node performance:

[Charts: 4-node cluster IOPS, throughput in MB/sec, and latency in ms]

As you can see, even the four-node cluster was pretty performant: it can quite easily achieve in excess of 200K IOPS on reads and 150K IOPS on a 70/30 split. So what happens when we add another four nodes to the cluster?

[Charts: 8-node cluster IOPS, throughput in MB/sec, and latency in ms]

As you can see from both tests, latency was pretty much the same in both sets of runs, indicating a comparable test, and the IOPS and throughput more or less doubled when doubling the size of the cluster, proving that vSAN does scale linearly. I would have liked an additional eight nodes to show further scaling, but all the customers I have spoken to about increasing their cluster sizes confirm that it continues to scale linearly.
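
If you want to quantify “more or less double”, a simple check is to compare per-node IOPS across the two runs. Here is a minimal sketch, using hypothetical numbers rather than the measured results from the charts above:

```python
# Sanity-check linear scaling from two runs by comparing per-node IOPS
# before and after doubling the cluster. The figures below are
# placeholders, not the measured results from the charts above.

def scaling_efficiency(iops_small, nodes_small, iops_large, nodes_large):
    """1.0 means perfectly linear scaling; below 1.0 means sub-linear."""
    per_node_small = iops_small / nodes_small
    per_node_large = iops_large / nodes_large
    return per_node_large / per_node_small

# hypothetical example: 200K IOPS on 4 nodes, 395K IOPS on 8 nodes
print(f"{scaling_efficiency(200_000, 4, 395_000, 8):.2%}")  # ~98.75%
```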