All posts by MrVSAN

Virtual SAN ROBO

Along with the Stretched Cluster feature, Virtual SAN 6.1 (vSphere 6.0 U1) also delivered the Remote Office Branch Office (ROBO) solution. For many months, customers who typically run a remote office branch office vSphere solution were not best pleased that they needed a minimum of three Virtual SAN nodes per site. With the ROBO solution the minimum number of hosts per site has changed, and the licensing for ROBO is different from the conventional Virtual SAN solution as well.

ROBO solutions from VMware have been around for a while. There is a licensing model for vSphere that incorporates a ROBO solution, and in the days of the VMware vSphere Storage Appliance (VSA) there was a ROBO solution that let customers run two ESXi hosts on site with a “Cluster” node as a thin client on site too, so technically you still needed three systems for VSA. So how does Virtual SAN ROBO work?

Typical ROBO Setup:


The above image is typical of a Remote Office – Branch Office setup: all remote site vSphere clusters are managed centrally by a single vCenter Server over standard WAN links. With the vSphere ROBO license pack you can run up to 25 virtual machines per site, or split the 25 virtual machines across multiple sites, for example 5 individual sites each running 5 virtual machines under a single 25 virtual machine ROBO license. The Virtual SAN license for ROBO works exactly the same way: it is purchased in packs of 25 virtual machines, and those can be spread across multiple sites just as the vSphere ROBO license can be.

Virtual SAN ROBO Requirements:


Each remote site hosts two ESXi servers running Virtual SAN, which allows the RAID 1 mirroring of disk objects to be fulfilled. In the centralized office there is a witness host for each remote site; the witness is a virtualised ESXi host and performs the same functions as in the Virtual SAN Stretched Cluster. The storage requirements for the witness appliance also remain the same at 16MB per disk object, so the maximum of 25 virtual machines with 60 disks each (1,500 disk objects) results in a maximum storage requirement for the witness appliance of around 23GB.

Unlike the stretched cluster, the requirements for the link between the remote clusters and the witness are much lower; the diagram below shows the requirements:


As you can see from the above diagram, the witness has to reside on a different network to the hosts to prevent I/O traffic flowing across the WAN link and back to the other ESXi host. The latency requirements, however, are much more relaxed than for the Stretched Cluster, mainly because the maximum number of virtual machines per site is 25 versus the potential thousands for the Stretched Cluster. You will also notice that 1Gb networking between the two hosts on the remote site is fully supported; again, this is due to the maximum number of virtual machines that can be run on the remote site. It is still recommended to use multiple interfaces in an Active/Standby configuration in order to recover in the event of a network interface failure.

The main question I am asked is “Is it necessary to have a network switch on site?” The answer is: not necessarily. The VSAN VMkernel interfaces on the ESXi hosts have to be able to communicate with the Witness Appliance, so they could be connected directly to a multi-port router that supports 1Gbps. With a single-port router, a switch would be required. Connecting a crossover cable between the ports on the hosts would not allow communication with the witness appliance, and therefore it would not work.



Virtual SAN ROBO will be sold in packs of 25 licenses. A pack can be split across sites or used on a single site, which means you can run anything from one site with 25 virtual machines to 25 sites with a single virtual machine each. It is a single license per site, so there is no going above 25 virtual machines per site with the ROBO license. You can split the license as many ways as you want, provided you do not exceed the 25 virtual machines.
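The pack-splitting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming a single 25 virtual machine pack; the function name and the site names are made up for the example and are not part of any VMware tooling.

```python
PACK_SIZE = 25  # virtual machines covered by one ROBO license pack


def remaining_robo_capacity(vms_per_site):
    """Return the unused VM slots in one pack, or raise if the split is invalid.

    vms_per_site maps a (hypothetical) site name to its number of VMs.
    """
    for site, count in vms_per_site.items():
        if count < 1:
            raise ValueError(f"{site}: a licensed site needs at least 1 VM")
    total = sum(vms_per_site.values())
    if total > PACK_SIZE:
        raise ValueError(f"{total} VMs exceeds the {PACK_SIZE} VM pack")
    return PACK_SIZE - total


# Five sites with five virtual machines each use the pack exactly.
sites = {f"site-{n}": 5 for n in range(1, 6)}
print(remaining_robo_capacity(sites))
```

Because the per-site limit and the pack limit are both 25, checking the total is enough to enforce both.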

Over the past few days I have spoken with multiple customers about this solution. These customers come from a wide variety of sectors, but the end goal remains the same: to reduce the hardware footprint at the remote sites and reduce management overhead. With Virtual SAN ROBO this is easily achieved. The main points are:

  • No need to have a third host on site
  • No need for a shared storage array that is managed separately
  • Single CPU servers will also reduce the cost of hardware
  • Lower power consumption and hardware footprint
  • Single pane of glass to manage the remote cluster and storage








Virtual SAN Stretched Cluster

As you have already heard, one of the major features in VSAN 6.1 is the Stretched Cluster. With this feature, Virtual SAN enables customers to keep their data on a second site in order to increase availability and data protection. So what does the stretched cluster feature offer exactly? Let’s take a look:

  • Increased Enterprise availability and data protection
  • Ability to deploy Virtual SAN across dispersed locations
  • Active/Active architecture
  • Synchronous replication between sites
  • Site failure tolerated without data loss and almost zero downtime

So what does this mean, and how does it work, you may ask? Here are the details:

Active / Active Datacenter configuration

In the above scenario, we have virtual machines running on both sites, so this is considered an Active/Active configuration. The Virtual SAN datastore is still a single datastore that spans both sites; each site contributes an equal amount of storage, so in essence you have 50% of the VSAN datastore capacity on each site.

There is one question that springs to mind straight away based on the functionality of Virtual SAN: what about the witness? As we already know, the function of the witness is to provide a >50% voting mechanism, and this is still the case in the stretched cluster. The witness still exists, but this time in the form of an appliance-based ESXi host, which can be hosted on a third site or even in vCloud Air.

In order to use the stretched cluster, three fault domains are required: one for each data site and a third for the witness. The image below shows this:

Stretched Cluster Witness

The witness only contains metadata; there is no I/O traffic from the virtual machines and no VMDK data on the witness appliance. There are some space requirements for the witness appliance though: each disk object residing on Virtual SAN needs 16MB of storage on the witness. For example, if you have 1000 VMs and each VM has 4 disk objects, then the space requirement would be 4000 * 16MB = 64GB. Each VMDK on the appliance is limited to 21000 objects, with a maximum of 45000 objects per stretched cluster. The VMDKs for the appliance can be thin provisioned if needed in order to save space.
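The sizing arithmetic above is easy to reproduce. Here is a minimal Python sketch, assuming the 16MB per disk object figure and the 45000 object cluster limit quoted in this post; the function name is hypothetical, and the result is reported in decimal GB (1000MB) to match the worked example.

```python
MB_PER_OBJECT = 16                # witness storage per disk object
MAX_OBJECTS_PER_CLUSTER = 45_000  # object limit per stretched cluster


def witness_storage_mb(vm_count, objects_per_vm):
    """Estimate the witness appliance storage requirement in MB."""
    objects = vm_count * objects_per_vm
    if objects > MAX_OBJECTS_PER_CLUSTER:
        raise ValueError(
            f"{objects} objects exceeds the limit of {MAX_OBJECTS_PER_CLUSTER}"
        )
    return objects * MB_PER_OBJECT


# 1000 VMs with 4 disk objects each: 4000 objects * 16MB = 64000MB
print(witness_storage_mb(1000, 4) / 1000, "GB")
```

This is only an estimate for planning; thin provisioning the appliance VMDKs means the space is not consumed up front.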


Another question that would be asked is: what are the network requirements for the stretched cluster? The image below shows this:

Stretched Cluster Network

As you can see from the above image, the connection between the two sites must be at least 10Gbps with latency no higher than 5ms. Remember that when a virtual machine submits a write, the acknowledgement only comes once both sites have received the data, the exception being when one site is down.

In addition to this, the link between the two data sites can also be routed over L3.

The connection between each site and the witness site, whether this is an on-premises third location or vCloud Air, needs to be at minimum a 100Mbit connection with a response time of no more than 100ms. There is some relaxation of the response time based on the number of ESXi hosts, as follows:

  • Up to 10 hosts per site, latency must be below 200ms to the witness
  • Above 10 hosts per site, latency must be below 100ms to the witness
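As a quick sanity check for a planned design, the witness link rules above can be expressed as a small Python sketch. The function names are made up for illustration, and the thresholds are simply the figures quoted in this post (100Mbit minimum, 200ms or 100ms depending on hosts per site).

```python
MIN_WITNESS_BANDWIDTH_MBPS = 100  # minimum link speed to the witness site


def max_witness_latency_ms(hosts_per_site):
    """Return the latency ceiling to the witness for a given site size."""
    if hosts_per_site < 1:
        raise ValueError("a site needs at least one host")
    return 200 if hosts_per_site <= 10 else 100


def witness_link_ok(hosts_per_site, latency_ms, bandwidth_mbps):
    """True if the link meets both the latency and bandwidth figures."""
    return (
        latency_ms < max_witness_latency_ms(hosts_per_site)
        and bandwidth_mbps >= MIN_WITNESS_BANDWIDTH_MBPS
    )


# An 8-host site tolerates 150ms to the witness; a 12-host site does not.
print(witness_link_ok(8, 150, 100))
print(witness_link_ok(12, 150, 100))
```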

An L3 network is required between the main sites and the witness location. This is very important: putting them all on the same L2 network can result in I/O traffic going over the witness link, which is not something you want.

Other things to know:

Read Locality – With standard Virtual SAN, data locality is not important. With the stretched cluster, however, read locality matters, as it would be silly to have a virtual machine running on Site A fetching its reads from Site B. The concept of read locality is built into the stretched cluster functionality: read requests are only served from the site/domain where the virtual machine compute is running, while writes go to both sites. If a virtual machine is vMotioned to a host in the other site, read locality also switches to the site where the virtual machine compute now resides.

Failures to Tolerate (FTT) – Since there are only two data sites, you can only configure storage policies with a maximum of FTT=1. Remember the formula for the number of fault domains required is 2n+1, where “n” is the number of failures to tolerate, so the three fault domains here (two sites plus the witness) allow n=1.
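The 2n+1 rule can be checked in both directions with a short Python sketch. The function names are illustrative only; the second function simply inverts the formula to find the highest FTT a given number of fault domains can support.

```python
def fault_domains_required(ftt):
    """Fault domains needed to tolerate ftt failures (2n + 1)."""
    return 2 * ftt + 1


def max_ftt(fault_domains):
    """Highest FTT supported by a given number of fault domains."""
    return (fault_domains - 1) // 2


# Two data sites plus the witness give three fault domains.
print(fault_domains_required(1))
print(max_ftt(3))
```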

Hybrid or All-Flash – Both Hybrid and All-Flash will be supported with the stretched cluster functionality

Licensing – The stretched cluster functionality will only be included in the Virtual SAN Advanced license. This license also covers All-Flash, so any customer who already has a license for All-Flash is automatically entitled to use the Stretched Cluster functionality.

What’s new in Virtual SAN 6.1

Since Virtual SAN was released in March 2014 we have seen various features and functionality added. Below is a list of the major features added in the second release of Virtual SAN (version 6.0):

  • Fault Domain Support
  • Pro-Active Rebalance
  • All-Flash
  • Virtual SAN Health UI
  • Disk Serviceability Functions
  • Disk and Disk Group Evacuation
  • JBOD Support
  • UI Improvements such as:
    • Storage Consumption Models
    • Resync Dashboard

Version 6.1, recently announced at VMworld, is no exception, with even more enterprise features included. Just to be clear, Virtual SAN 6.1 is being released as part of vSphere 6.0 Update 1, so if you missed the announcement, here is a recap of the features:

  • Stretched / Metro Cluster with RPO=0 for sites no more than 100km apart and a response time of <5ms
  • 5 Minute RPO for vSphere Replication, this is exclusive to Virtual SAN
  • Multiple CPU Fault Tolerance (SMP-FT)
  • Support for Oracle RAC
  • Support for Microsoft Failover Clustering (DAG and AAG)
  • Remote Office – Branch Office (ROBO) 2 Node Virtual SAN Solution
  • Support for new Flash Hardware
    • Intel NVMe
    • Diablo ULLtraDIMM
  • Further UI Enhancements such as:
    • Integrated Health Check Plugin for Hardware Monitoring and Compliance
    • Disk and Disk Group Claiming enhancements
    • Virtual SAN On-Disk format upgrade
    • vRealize Operations (vROPS) integration

This clearly demonstrates the investment that VMware is making in Virtual SAN. I will be writing up some of the features in more detail, particularly the Stretched Cluster and ROBO solution, so watch out for those.