
How Intel Optane and NVMe help with node consolidation

As core density on CPUs increases, it opens up the opportunity to consolidate the number of nodes required in any given cluster. In a vSAN cluster, however, node consolidation has a negative effect on available IOPS: each node provides a specific amount of IOPS, so lowering the number of hosts in the cluster removes the IOPS capability of the nodes you consolidate away. Take the following for example:

Number of VMs : 200
vCPU Per VM : 4
Virtual Memory per VM : 32GB
Storage per VM : 600GB
vCPU to Core Ratio : 4 to 1

Now, for the purpose of this sizing exercise, I am going to use the vSAN Sizing tool and apply some cluster settings as per below:

So in the above scenario the number of cores per CPU is 18, and I want to ensure that this is a two disk group configuration. We then input the workload details as per below:

You will see, when we click on Recommendation, that it shows a required node count of 8 (not taking into account any N+1 capability, as we left that at 0 for the purpose of this sizing).

And we can see the disk config below:

However, if we increase the number of CPU cores to 20 by clicking on the “+” in the sizing output, we can see that it changes the number of nodes.

And if we increase the number of cores again, to 22, we get a further reduction in the number of nodes to 6.

The sizing tool will dynamically increase or decrease the number of disks required per host, as well as the RAM required per node, as you can see here:

But one thing we have not factored in here is the decrease in IOPS capability that reducing the cluster by two nodes brings. If, for example, each node was capable of 80K IOPS, reducing the node count by two means you have just lost 160K IOPS of capability. So what can we do to mitigate that?
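To make the trade-off concrete, here is a quick back-of-the-envelope calculation using the illustrative 80K IOPS per node figure above. It is a simple linear model, not a measured value for any particular host build:

```python
# Back-of-the-envelope aggregate IOPS for the cluster, using the
# illustrative 80K IOPS per node figure from above (a simple linear
# model, not a measured value for any particular host configuration).

PER_NODE_IOPS = 80_000

def cluster_iops(node_count: int, per_node_iops: int = PER_NODE_IOPS) -> int:
    return node_count * per_node_iops

before = cluster_iops(8)   # original 8-node design
after = cluster_iops(6)    # consolidated 6-node design

print(f"8 nodes: {before:,} IOPS")                   # 640,000
print(f"6 nodes: {after:,} IOPS")                    # 480,000
print(f"Capability lost: {before - after:,} IOPS")   # 160,000
```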

Well, instead of using SAS/SATA SSDs in your vSAN design, you could opt to use Intel Optane for cache and NAND-based NVMe drives for capacity.
As I have written about before, Intel Optane greatly improves write performance, and because the capacity devices are NVMe, read performance is greatly accelerated as well. Reducing your node count by two in this case and utilising this kind of technology means you still get similar levels of performance. The best part is that the overall solution will cost you less too, so your TCO comes down, which is good for your finance department, right?

One question I get asked frequently is what size Optane device is sufficient?

Well, in all of my testing I very rarely saturated the write buffer, even with 375GB Optane drives as cache devices. The reason for this is that vSAN starts de-staging from the cache tier to the capacity tier when the write buffer becomes around 30% full, and because the capacity tier is NVMe based, the de-staging happens a lot quicker, especially since vSAN 6.7 U3, where the de-stage limits were removed.
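To put rough numbers on that 30% figure, here is a minimal sketch. The threshold is the approximate value mentioned above, and I have assumed the well-known vSAN (OSA) behaviour of capping the usable write buffer per disk group at 600GB; neither value is stated as exact in this post, so treat the output as ballpark only:

```python
# Rough numbers behind the ~30% de-stage threshold mentioned above.
# Assumptions: the threshold is approximate and version dependent, and
# vSAN (OSA) caps the usable write buffer per disk group at 600GB, so a
# cache device larger than that does not buy a bigger buffer.

WRITE_BUFFER_CAP_GB = 600
DESTAGE_THRESHOLD = 0.30

def destage_point_gb(cache_device_gb: float) -> float:
    usable_buffer = min(cache_device_gb, WRITE_BUFFER_CAP_GB)
    return usable_buffer * DESTAGE_THRESHOLD

for cache_gb in (375, 750):
    print(f"{cache_gb}GB Optane cache: de-staging kicks in at roughly "
          f"{destage_point_gb(cache_gb):.0f}GB of buffered writes")
# 375GB -> ~113GB, 750GB -> ~180GB (buffer capped at 600GB)
```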

So when would a 750GB Optane be useful?

High write-intensive workloads such as video surveillance and databases, or configurations where your capacity disks are much slower. Optane can still be used in vSAN configurations where the capacity tier is SAS/SATA, which of course is not as fast as most NVMe devices, so the write buffer can fill up further.

So, just to recap: you can save money on your vSAN deployments by consolidating hosts onto higher core count CPUs, as well as by leveraging newer technology such as Intel Optane in the cache tier and NVMe in the capacity tier, maintaining the same level of performance or better while spending less. What's not to like?

QLC NVMe – could this signal the end of SAS and SATA?

I met with a team from Intel recently and discussed their new additions to the vSAN Compatibility Guide, mainly around their QLC NVMe drives. I have spoken to many customers about full NVMe configurations, and usually there was a slightly higher price to pay for such configurations, but the QLC NVMe drives could be a turning point for future-proofing your HCI platform because they are cheaper than their SAS/SATA equivalents!

That being said, I have heard many times that the days of SATA/SAS based drives are numbered, and with these QLC NVMe drives the end could come much sooner rather than later.

Right now the 7.68TB D5-P4320 has been certified, and I have been informed by Intel that the 15.3TB model is currently going through certification; that is a game changer for delivering high amounts of capacity at a reasonable cost. If I took the 4-node full NVMe cluster I have access to and replaced all the current NVMe devices with the 7.68TB QLC NVMe devices, I would have an effective usable capacity of 166TB, and double that with the 15.3TB drives. This is based on a RAID5 storage policy only and takes into account the 10% difference between device capacity and actual capacity. So let's take a look a bit more closely at these new QLC NVMe drives from Intel.
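Before digging into the drive specs, here is a quick sanity check of that 166TB figure. This is a minimal sketch, assuming 8 capacity devices per host (as in the cluster described later in this post), the ~10% device-versus-formatted capacity difference mentioned above, and the standard 1.33x space overhead of a vSAN RAID5 policy:

```python
# Quick sanity check of the ~166TB usable figure quoted above.
# Assumptions: 4 nodes with 8 capacity devices per host (as in the cluster
# used for the tests later in this post), ~10% difference between device
# capacity and actual formatted capacity, and the 1.33x space overhead of a
# vSAN RAID5 (erasure coding, FTT=1) storage policy.

NODES = 4
DRIVES_PER_NODE = 8
FORMATTING_FACTOR = 0.90   # ~10% device vs actual capacity difference
RAID5_EFFICIENCY = 0.75    # 3 data + 1 parity

def usable_tb(drive_tb: float) -> float:
    raw = NODES * DRIVES_PER_NODE * drive_tb
    return raw * FORMATTING_FACTOR * RAID5_EFFICIENCY

print(f"7.68TB drives: {usable_tb(7.68):.0f}TB usable")   # ~166TB
print(f"15.3TB drives: {usable_tb(15.3):.0f}TB usable")   # ~330TB
```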

From the ARK portal we can determine the following information:

Format: U.2 2.5-inch
Sequential read (up to): 3200 MB/s
Sequential write (up to): 1000 MB/s
Random read (100% span): 427,000 IOPS
Random write (100% span): 36,000 IOPS
Latency (read): 138 µs
Latency (write): 30 µs
Interface: PCIe NVMe 3.1 x4

Now, if you remember my blog around full NVMe performance, combining Intel Optane with Intel's NVMe drives delivers far superior performance versus traditional SAS/SATA. These new QLC NVMe drives additionally reduce the cost of capacity, but just how much of a difference is it?

So I checked out the prices here in the UK, from the same supplier, here’s the link to the NVMe QLC Device and here’s the link to a SAS Equivalent.

For the benefit of this exercise I compared against the lowest cost SAS 12G 7.68TB drive on the vSAN Compatibility Guide, since Intel do not manufacture SAS based SSDs and vendors seem to favour SAS based SSDs over SATA.

Correct as of 11th August 2019:

                       Samsung 7.68TB SAS 12G    Intel P4320 7.68TB QLC NVMe
Capacity               7.68TB                    7.68TB
Interface              SAS                       NVMe
Total cost of drive    £3,093.60                 £1,609.20
Cost per GB            £0.40                     £0.20
DWPD                   1                         0.2

As you can clearly see, the cost per GB is significantly lower at £0.20 per GB (this falls to around £0.18 per GB on the larger 15.3TB device). There is one thing to note, however: the DWPD of the QLC NVMe device is much lower in comparison to the SAS device, but in a vSAN environment should this matter too much? The simple answer here is no. If we look at the maths, with 8 of the QLC devices in each host of my 4-node cluster I have a usable capacity of 166TB, and at 0.2 DWPD that means I would have to be writing 33.2TB of data per day to hit the rating. So the lower DWPD in a vSAN environment is not significant unless you are constantly writing enough fresh data to exceed that.
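For reference, here is that maths spelled out. The only inputs are the DWPD ratings from the table and the 166TB usable capacity worked out earlier, and it follows the same simple approach as above of applying the rating to usable capacity rather than raw device capacity:

```python
# Daily-write threshold behind the DWPD discussion above, using the DWPD
# ratings from the price comparison table and the ~166TB usable capacity
# worked out earlier. This applies the rating to usable capacity, as the
# post does, rather than to raw device capacity.

USABLE_TB = 166      # usable capacity of the 4-node QLC cluster
QLC_DWPD = 0.2       # rated drive writes per day of the P4320
SAS_DWPD = 1.0       # rated drive writes per day of the SAS drive in the table

for label, dwpd in (("QLC NVMe", QLC_DWPD), ("SAS", SAS_DWPD)):
    print(f"{label}: {USABLE_TB * dwpd:.1f}TB of new writes per day "
          f"before exceeding {dwpd} DWPD")
# QLC NVMe: 33.2TB/day, SAS: 166.0TB/day
```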

I am hoping that I can get some of these QLC NVMe drives from Intel so I can gather some performance data, complete the write-up and give some performance characteristics, but based on my previous full NVMe performance testing I would not expect them to fall below those earlier results.

Full NVMe or not Full NVMe, that is the question


As you have seen, my recent posts have been around Intel Optane and the performance gains that can be delivered by implementing the technology in a vSAN environment. I have been asked many times what benefits a full NVMe solution would bring and what such a solution would look like, but before we go into that, let's talk about NVMe. What exactly is NVMe?

Non-Volatile Memory Express (NVMe) is not a drive type but an interface and protocol that looks set to replace SAS/SATA. It runs over PCIe, and the whole purpose of NVMe is to exploit the parallelism that flash media provides, which in turn reduces I/O overhead and improves performance. SAS and SATA were designed for slower hard disks, where the delay between the CPU request and the data transfer was much higher; as SSDs become faster, the need for a faster protocol becomes evident, and this is where NVMe comes into play.

So, in a vSAN environment, what does a full NVMe solution look like? Because vSAN is currently a two-tier architecture (cache and capacity), a full NVMe solution means that both tiers have NVMe-capable drives. This can be done either with standard NVMe drives in both cache and capacity, or with a technology like Intel Optane NVMe as the cache and standard NVMe as capacity. From an architecture perspective it is pretty straightforward, but how does performance compare? To find out, I persuaded my contacts at Intel to provide me with some full NVMe kit to run benchmark tests on, and to provide a like-for-like comparison I ran the same benchmark tests on an Optane+SATA configuration.

Cluster Specification:
Number of Nodes: 4
Network: 2x 10gbit in LACP configuration
Disk groups per node: 2
Cache Tier both clusters: 2x Intel Optane 375GB P4800X PCIe Add In Card
Capacity Tier Optane/SATA: 8x 3.84TB SATA S4510 2.5″
Capacity Tier Full NVMe: 8x 2.0TB NVMe P4510 2.5″ U.2

Test Plan:
Block Size: 4K, 8K, 16K, 32K, 64K, 128K
I/O Pattern: Random
Read/Write Ratio: 0/100, 30/70, 70/30, 100/0
Number of VMs: 120
Number of VMDKs per VM: 1
Size of VMDK: 50GB
Storage Policy: FTT=1, RAID1
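The post does not state which load generator produced these runs (HCIBench is a common choice for vSAN benchmarking). Purely as an illustration of the matrix above, here is a hedged sketch that expands the block sizes and read/write ratios into fio-style random I/O jobs; the device path and queue depth are placeholder assumptions, not values taken from the original tests:

```python
# Illustrative only: expands the test matrix above (every block size against
# every read/write ratio) into fio-style random I/O job command lines that a
# guest VM could run against its 50GB test VMDK. /dev/sdb and the queue depth
# are placeholders, not values from the original test runs.

from itertools import product

block_sizes = ["4k", "8k", "16k", "32k", "64k", "128k"]
read_pcts = [0, 30, 70, 100]   # read percentage of each read/write ratio

for bs, read_pct in product(block_sizes, read_pcts):
    print(
        f"fio --name=vsan-test --filename=/dev/sdb --direct=1 "
        f"--rw=randrw --rwmixread={read_pct} --bs={bs} "
        f"--iodepth=8 --runtime=600 --time_based"
    )
# 6 block sizes x 4 ratios = 24 test runs per cluster configuration
```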

Let’s look at the results:

And if you want the numbers:

So what is clear here is that Optane serves really well in the cache tier in both solutions; however, in the full NVMe solution read performance is significantly improved as well. In the 128K, 100% read test the 2x 10G links were being pushed to their limits, but not only were we able to push up throughput and IOPS, we also drove down latency, in some cases reducing it by over 50%.

So why would you choose a full NVMe solution? The simple answer is that if you have applications that are latency sensitive, clusters dedicated to those applications will be adequately served by full NVMe from an IOPS, throughput and latency perspective.

Vendors have also recognised this; for example, Dell EMC have just launched their Intel Optane powered full NVMe vSAN Ready Node, based on the R740xd platform, which consists of similar drives to the ones I used in the tests here, namely the Optane 375GB and the P4510 U.2 NVMe drives. You can see the vSAN Ready Node details here.

So clearly NVMe has major performance benefits over traditional SAS/SATA devices, could this be the end of SAS/SATA in the not so distant future?