
Why a NAND based SSD may not be the right choice for the vSAN Cache Tier

As we all know, storage media has evolved very quickly over the past few years: the decline of the spinning disk and the move to flash based storage devices, but also the shift from SAS/SATA protocol based drives to NVMe protocol based drives, in order to address the performance limitations of older protocols that were designed for spinning disks rather than SSDs.

A question I get asked regularly is what type of SSD is best for the vSAN cache tier. There are vSAN ready nodes out there that contain NAND based SSDs for both the cache tier and the capacity tier, but there are also other technologies, like Intel Optane™ SSDs, being used for the cache tier, so let's talk about the two. For the purpose of this comparison I am going to use the most common 3D NAND based NVMe drive in vSAN Ready node configurations, the 1.6TB Intel P4610, and the 375GB Intel Optane™ SSD P4800X; both are NVMe based devices.

Let’s compare the two devices:

| | 1.6TB Intel P4610 | 375GB Intel Optane™ SSD P4800X |
|---|---|---|
| Capacity | 1.6TB | 375GB |
| Sequential Read (up to) | 3200 MB/sec | 2400 MB/sec |
| Sequential Write (up to) | 2080 MB/sec | 2000 MB/sec |
| Random Read (100% span) | 643,000 IOPS | 550,000 IOPS |
| Random Write (100% span) | 199,000 IOPS | 500,000 IOPS |
| Read Latency | 77 µs | 10 µs |
| Write Latency | 18 µs | 10 µs |
| Endurance Rating | 12.25 PBW (around 3.5 DWPD) | Up to 60 DWPD (around 41 PBW) |
Data obtained from ark.intel.com

As you can see from the table above, there are some major differences between the two SSDs, notably the Random Write performance, which is critical in the cache tier of a vSAN environment because all incoming writes are random in nature and are absorbed by the cache tier; the NAND based SSD does not have as much random write capability as the Optane™ SSD. But the biggest impact on a vSAN cache tier is the Drive Writes Per Day (DWPD) rating. If you look at the specifications in detail, the P4610 can handle around 3.5 DWPD, which equates to around 5.6TB of data written daily, whereas the 375GB Optane™ SSD can handle up to 60 DWPD, which equates to around 22.5TB of data written daily. Remember that the Optane™ SSD is also less than a quarter of the capacity of the P4610. So in a vSAN cache tier, the Optane™ SSD wins hands down from an endurance perspective, as well as in its ability to handle random writes a lot quicker. So why such a difference?
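The endurance figures above are simple arithmetic: a drive's daily write budget is just its capacity multiplied by its DWPD rating. A minimal sketch (the helper name is my own, not from any vSAN or Intel tooling):

```python
# Daily write budget implied by a DWPD (Drive Writes Per Day) rating.
# Sketch only; the function name is illustrative.
def daily_write_budget_tb(capacity_tb: float, dwpd: float) -> float:
    """Terabytes that can be written per day while staying inside the rating."""
    return capacity_tb * dwpd

# 1.6TB P4610 rated around 3.5 DWPD
print(daily_write_budget_tb(1.6, 3.5))    # ~5.6 TB/day
# 375GB Optane(TM) P4800X rated up to 60 DWPD
print(daily_write_budget_tb(0.375, 60))   # 22.5 TB/day
```

Note how the smaller device sustains roughly four times the daily writes of a drive more than four times its capacity.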

Well, if you look at NAND based SSDs, there is usually an element of DRAM that acts as a buffer in front of the NAND media, typically around 1GB of DRAM for every 1TB of media, so any incoming writes hit the DRAM buffer first. This can give a positive boost in short, low block size write bursts, but it cannot be sustained over a longer period of time. In an Optane™ SSD there is no such DRAM buffer, so the data is written directly to the media. The VxRAIL team at Dell EMC have done some extensive testing around this and clearly demonstrated that a NAND based SSD cannot sustain the same level of write performance in a continuous fashion, whereas the Optane™ SSD maintains the same level of write performance consistently. Below are the results of their performance testing:

https://www.esg-global.com/validation/esg-technical-validation-dell-emc-vxrail-with-intel-xeon-scalable-processors-and-intel-optane-ssds

The way NAND based SSDs and Optane™ SSDs perform write operations is fundamentally different. On NAND, media has to be read and written in pages, but erased in blocks. Page updates are typically written to a new unused block; as new data is written, old pages become stale, and on an SSD these stale pages can build up fairly quickly, which means that at some point significant chunks of blocks are obsolete and have to be garbage collected. Garbage collection clears a block and allows it to receive data again, and the process starts all over again.
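To make the cost of this concrete, here is a deliberately simplified toy model (my own simplification for illustration; real flash translation layers are far more complex). Repeatedly overwriting one logical page programs a fresh physical page each time; when the open block fills, garbage collection has to relocate the one still-live page before the block can be erased, so the media absorbs more writes than the host issued:

```python
# Toy model of NAND garbage collection (invented for illustration, not a
# real FTL). Each logical overwrite programs a new physical page and leaves
# the old copy stale; a full block is garbage collected by copying out its
# single live page and then erasing the whole block.
PAGES_PER_BLOCK = 4

def simulate(host_writes: int) -> dict:
    media_writes = 0          # physical page programs (host + GC relocation)
    erases = 0                # whole-block erase cycles
    pages_in_open_block = 0
    for _ in range(host_writes):
        media_writes += 1     # program the new copy; old copy becomes stale
        pages_in_open_block += 1
        if pages_in_open_block == PAGES_PER_BLOCK:
            media_writes += 1          # GC copies the one live page out
            erases += 1                # then the block can be erased
            pages_in_open_block = 1    # relocated page occupies a new block
    return {"host_writes": host_writes,
            "media_writes": media_writes,
            "erases": erases}

r = simulate(8)
print(r, "write amplification:", r["media_writes"] / r["host_writes"])
```

Even this tiny model shows write amplification above 1.0; GC writes and erase cycles are exactly what eats into both the sustained performance and the endurance of a NAND device.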

Optane™ SSDs are transistor-less, which essentially means that each cell's state can be changed from a 0 to a 1 independently of other cells on the device. This means that Optane™ SSDs are completely bit addressable, as opposed to having to write in pages, and there is no garbage collection required. This obviously has a positive impact on performance as well as endurance, which is why Optane™ SSDs have very high endurance capabilities.

So what does all this mean from an application perspective?
Well, the VxRAIL guys at Dell EMC also did some performance testing using HammerDB and showed some significant performance gains when using Intel Optane™ SSDs versus traditional NAND as cache: as much as a 61% gain in performance in a complex OLTP workload.

As we all know, latency is critical in any type of workload. What I have seen in performance testing is that Intel Optane™ SSDs consistently provide lower latency, as well as a much more tightly controlled standard deviation in latency, versus the P4610. Even though in some smaller block size tests the performance of both devices was similar, in larger block size tests the Optane™ SSD again delivered lower latency and a tightly controlled standard deviation in latency, while also providing much higher performance in comparison to the P4610. You also have to remember that the P4610 was only using a 37% span, due to vSAN currently having a limit of 600GB of write buffer per disk group, whereas the Optane™ SSD was using a 100% span, so the P4610 had a bit of an unfair advantage here.
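The point about standard deviation is worth illustrating. The samples below are invented numbers, not measured results: they simply show how two latency distributions can have comparable-looking behaviour most of the time while one hides occasional large spikes that blow out the standard deviation and the worst case:

```python
# Why latency standard deviation matters as much as the average.
# The sample values are made up purely for illustration.
import statistics

def summarize(samples_us):
    """Mean, standard deviation, and max of a list of latency samples (in µs)."""
    return (statistics.fmean(samples_us),
            statistics.stdev(samples_us),
            max(samples_us))

spiky  = [80, 82, 79, 81, 400, 80, 83, 78]   # one GC-style outlier
steady = [80, 82, 79, 81, 84, 80, 83, 78]    # tight distribution

for name, s in (("spiky", spiky), ("steady", steady)):
    mean, dev, worst = summarize(s)
    print(f"{name}: mean={mean:.1f}µs stdev={dev:.1f}µs max={worst}µs")
```

A single outlier barely seven of eight samples wide dominates both the mean and the deviation, which is why a tightly controlled spread translates into predictable application response times.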

Conclusion
What is clear from a vSAN perspective is that endurance plays a critical role in the vSAN cache tier. In the very early days of vSAN there was no choice other than SAS or SATA based NAND devices, with DWPD ratings ranging between 10 and 25 based on an 800GB drive, but as the technology evolution pushes the boundaries of performance and endurance, technology like Intel Optane™ SSDs clearly has an edge, offering up to 60 DWPD on a smaller capacity of 375GB.

Smaller cache device…are you serious?

In the testing I have done on full NVMe systems, where Intel Optane™ SSDs are used in the vSAN cache tier and standard, more read-intensive NVMe drives like the Intel P4510 are used in the capacity tier, a 375GB Optane™ SSD is more than sufficient. In most workloads a 750GB Optane™ SSD did not improve performance, and even with 375GB I was only able to saturate the write buffer to 60% (based on vSAN 6.7 Update 3).
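The span difference mentioned earlier follows directly from the write buffer limit. A quick sketch of the arithmetic, using the 600GB per disk group figure the text cites for vSAN 6.7 U3:

```python
# Fraction of a cache device that vSAN's write buffer can actually address,
# given the 600GB per disk group limit mentioned above (sketch only).
WRITE_BUFFER_LIMIT_GB = 600

def span_used(device_capacity_gb: float) -> float:
    return min(1.0, WRITE_BUFFER_LIMIT_GB / device_capacity_gb)

print(f"1.6TB P4610:  {span_used(1600):.1%}")   # 37.5% span
print(f"375GB Optane: {span_used(375):.1%}")    # 100.0% span
```

This is why an oversized NAND cache device buys no extra usable buffer, and why the smaller Optane™ device exercises its full media while the P4610 only exercises just over a third of its cells.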

So whilst NAND based devices are fully supported as vSAN cache devices, they may not be the right choice when it comes down to the consistent performance and endurance required for a modern infrastructure.

Why HCI Matters in the Datacenter and Beyond

Technology is changing and evolving at an ever-increasing pace; whether you are a consumer of electronics or the CEO of a large organisation with a large IT infrastructure, the changes in technology affect us all in different ways.  An example of this is CPUs and Flash Storage: we're now in an era of constantly increasing CPU core densities, and Flash Storage is becoming bigger and faster. These technology transformations are not only changing the way we operate as human beings in our own personal IT bubbles at home, but also within organisations too.

As organisations large and small take on business transformation, a key element of that transformation is their IT. Whilst over the last 15 years IT was more focused on being IT centric, with traditional applications and the wide adoption of the internet, the next 15 years poses some challenges as IT becomes more business centric, along with cloud applications and the Internet of Everything.

A key enabler of the whole IT transformation is the Software Defined Data Center. Many of you will have heard me talk about the Software Defined Data Center not as an object, but more as an operating system that runs your IT infrastructure. If you are asked what three things are required to run an operating system, you'll find yourself answering storage, compute and networking for connectivity, which are essentially the three key elements that make up the Software Defined Data Center.

Hyper-Converged Infrastructure allows you to deliver the capabilities that underpin the whole Software Defined Data Center based on a standard x86 architecture, and offers a building block approach. It also brings the storage closer to the CPU and memory, which in a virtualised environment is highly beneficial, and it is more VM centric rather than storage centric.

So why is HCI being adopted by the masses?

There are a number of reasons for this. We've already outlined the fact that having the storage closer to the compute delivers a much more efficient platform, but beyond that there is a Hardware Evolution driving the changes in infrastructure, rather like an Infrastructure Revolution.

Higher CPU core densities mean you can run much denser workloads; in conjunction with this, RAM has become commoditized, affordable and available in larger capacities.  From a storage aspect, Flash has evolved in such a way that it has enabled the delivery of high capacity, high performing devices that only a few years ago would have taken a whole refrigerator sized array to produce, but can now be delivered by a device that you can hold in the palm of your hand.  Another aspect from the storage side of things is that traditional storage is unable to keep up with the demands of applications and IT; this resulted in a new approach to storage and infrastructure… HCI


What is required from your storage platform?

I have met with many customers in various meetings or events, and depending on who you talk to in the organisation you will get a different answer to that question:

  • Application Owner – Performance and Scalability
    They need to deliver an application that performs well and offers scalability, so the storage has to be able to deliver both.
  • Infrastructure Owner – Simplicity and Reliability
    They need the platform to be simple to deploy, simple to manage but also offer reliability, they don’t want to be getting calls in the middle of the night from the Application Owner!
  • CFO / Finance Team – Lower Cost and Operational Efficiency
    There’s always somebody looking at the numbers and it’s usually this side of the organisation, reducing TCO, CAPEX, OPEX and making IT more cost effective is the biggest driver here.

Everyone is aiming for that sweet spot where all three circles converge. The only problem is that with traditional infrastructure you can never satisfy all three of the above requirements; there's usually one requirement that has to be sacrificed, and it's usually the Finance Team or CFO that has to back down in order to deliver the requirements of the Application Owner and the Infrastructure Owner.  This is where HCI is different: HCI brings everyone to that central convergence and meets all of the requirements, so now everyone is happy. Let's take a closer look at how HCI powered by vSAN meets these requirements.


vSAN HCI delivers an architecture that not only delivers on performance but is also scalable, simply by adding more nodes or by adding more storage; it also allows for linear scaling of performance.  This means that as your IT or business applications scale and demand more capacity or performance, this is easily delivered in whatever increments meet the requirements at that point in time.


vSAN HCI allows the infrastructure team to deploy and manage environments through a simple management plane in a single interface; no separate management tools are required, which means no extensive retraining of staff is required.  Reliability and resiliency are built in, with the ability to protect from the disk level all the way up to the site level.


We’ve already talked about how HCI offers a building block approach; this means environments can be built to meet your requirements now and grown as and when required.   Because there’s a much simpler management plane, operational efficiencies come into play as well, offering a more streamlined approach to IT.

At this point we have met all of the criteria set by the three key stakeholders, but the benefits of HCI don’t stop there; there are other positive impacts that HCI brings to your organisation:


vSAN HCI offers a much wider choice of hardware, along with different hardware vendors to choose from, and there is also a range of different deployment options. This allows organisations to have a lot more flexibility in how they adopt HCI, as well as having choices for newer hardware technology at their fingertips. This includes:

  • vSAN Ready nodes from all major server OEM vendors to suit all performance and capacity requirements
  • Turnkey appliance solution from Dell EMC which is VxRAIL
  • VMware Cloud Foundation which incorporates a full SDDC Stack

For deployment options, vSAN HCI offers the following:

  • Standard clusters up to 64 Nodes
  • Remote Office / Branch Office (ROBO) Solutions for customers with multiple sites
  • Stretched Cluster Solutions
  • Disaster Recovery Solutions
  • Rack Level Solutions
  • Same Site “Server Room” configurations


vSAN HCI allows organisations to become more agile by enabling faster deployments and faster procurement, and by giving more control back to the business, which in a competitive world is a key enabler of success.

As you can see, no matter what size your IT infrastructure is, HCI brings a wealth of benefits; from large scale data center deployments to multi-site ROBO deployments, there’s a perfect fit for HCI.