VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster based infrastructure #BCO4872

It’s Wednesday and time for a session about vSphere Metro Storage Clusters (vMSC), presented by Duncan Epping and Lee Dilworth. They will explain the various ways to design a vMSC and what you have to pay attention to during the design phase.

The session started off by showing a typical stretched cluster environment using HA, stretched networking and stretched storage. This type of environment provides disaster avoidance, since both networking and storage are available in both sites. HA (a form of disaster recovery) does not have to kick in unless one of your sites fails or gets isolated. More about this in a second.

Stretched clustering is not a product or solution, but a configuration supported by VMware. What you should not do is deploy this configuration and leave every setting at its default. With the default configuration, a situation can occur where all VMs end up powered off and HA does not restart them.

Compared to Site Recovery Manager (SRM), stretched clustering doesn’t provide testing, orchestration or automation. If you need more control when disaster strikes, you should still look at the features SRM offers. Depending on the size of your environment, you need to buy SRM in packs of 25 VMs or a vCloud Suite Enterprise license for every socket in your ESXi hosts. A vMSC, on the other hand, requires you to stretch your network and storage, which means additional hardware and network connectivity between your sites, and that hardware must support this type of configuration. Determine the break-even point at which one of the solutions becomes cheaper for your environment.
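
To make the break-even comparison concrete, here is a minimal Python sketch. It assumes SRM is licensed either in packs of 25 VMs or with a per-socket vCloud Suite license (whichever is cheaper), and it models the stretched cluster as a one-off hardware cost plus a yearly inter-site link cost. All function names and prices are hypothetical placeholders, not VMware list prices.

```python
import math
from typing import Optional

SRM_PACK_SIZE = 25  # SRM is sold in packs of 25 VMs (as mentioned in the session)


def srm_license_cost(num_vms: int, pack_price: float,
                     sockets: int, suite_price_per_socket: float) -> float:
    """Cheaper of: SRM packs (rounded up to whole packs) or vCloud Suite per socket."""
    packs = math.ceil(num_vms / SRM_PACK_SIZE)
    return min(packs * pack_price, sockets * suite_price_per_socket)


def stretched_cluster_cost(stretch_hardware: float,
                           link_cost_per_year: float, years: int) -> float:
    """Hypothetical model: one-off stretched storage/network kit plus inter-site links."""
    return stretch_hardware + link_cost_per_year * years


def break_even_vms(pack_price: float, sockets: int, suite_price_per_socket: float,
                   stretch_hardware: float, link_cost_per_year: float, years: int,
                   max_vms: int = 10_000) -> Optional[int]:
    """Smallest VM count at which SRM licensing costs at least as much as stretching."""
    target = stretched_cluster_cost(stretch_hardware, link_cost_per_year, years)
    for n in range(1, max_vms + 1):
        if srm_license_cost(n, pack_price, sockets, suite_price_per_socket) >= target:
            return n
    return None  # SRM stays cheaper over the whole range checked


# Made-up figures: 16 sockets, 150,000 of stretch hardware, 20,000/year links, 3 years.
print(break_even_vms(10_000, 16, 20_000, 150_000, 20_000, 3))  # 501
```

A real comparison would also include costs this sketch leaves out; the point is only to show how pack rounding and the per-socket alternative shape the break-even point.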

Lee explained that several mechanisms keep your metro cluster healthy. The first is network heartbeating, the default heartbeat mechanism in an ESXi cluster: hosts in the cluster must be able to communicate with each other over the management network. The maximum latency for this network is 5ms RTT (round-trip time), unless you are using Enterprise Plus licenses, which include Metro vMotion and allow up to 10ms RTT. For synchronous storage, the maximum latency is usually 5ms RTT, depending on the type of storage you are using.
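
The latency limits quoted in the session can be turned into a simple pre-flight check. This is a hedged Python sketch: the thresholds are the figures from the talk, the storage limit is a parameter because it depends on your array, and the `SiteLink` class and `latency_problems` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as quoted in the session; verify them for your vSphere version and array.
MAX_NETWORK_RTT_MS = 5.0           # default
MAX_NETWORK_RTT_MS_METRO = 10.0    # Enterprise Plus with Metro vMotion
TYPICAL_SYNC_STORAGE_RTT_MS = 5.0  # depends on the storage you use


@dataclass
class SiteLink:
    """Hypothetical measurement of the link between the two sites."""
    network_rtt_ms: float
    storage_rtt_ms: float
    enterprise_plus: bool = False
    storage_limit_ms: float = TYPICAL_SYNC_STORAGE_RTT_MS  # use your vendor's figure


def latency_problems(link: SiteLink) -> List[str]:
    """Return human-readable violations; an empty list means the link fits."""
    problems: List[str] = []
    net_limit = MAX_NETWORK_RTT_MS_METRO if link.enterprise_plus else MAX_NETWORK_RTT_MS
    if link.network_rtt_ms > net_limit:
        problems.append(f"network RTT {link.network_rtt_ms}ms exceeds {net_limit}ms")
    if link.storage_rtt_ms > link.storage_limit_ms:
        problems.append(
            f"storage RTT {link.storage_rtt_ms}ms exceeds {link.storage_limit_ms}ms")
    return problems


# 7ms between sites is only acceptable with Enterprise Plus (Metro vMotion).
print(latency_problems(SiteLink(network_rtt_ms=7.0, storage_rtt_ms=3.0)))
# ['network RTT 7.0ms exceeds 5.0ms']
print(latency_problems(SiteLink(network_rtt_ms=7.0, storage_rtt_ms=3.0, enterprise_plus=True)))
# []
```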

Besides network heartbeats, HA also uses datastore heartbeats. Each host writes heartbeat files to two or more datastores (chosen automatically, or specified manually). When a host stops responding to network heartbeats, the HA master checks these files: if the host is still updating them, it is isolated or partitioned rather than failed, and the master does not restart its VMs. The isolated host itself then executes the isolation response you configured (leave VMs powered on, power off VMs or shut down VMs). Datastore heartbeats are therefore only used when network heartbeats between hosts fail.
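
The heartbeat logic can be sketched as a few small decision functions: how the HA master classifies an unresponsive host, how a host decides it is isolated by pinging its isolation addresses, and what the isolated host then does. This is a simplified Python model of vSphere HA's documented behaviour, not VMware code; all names are hypothetical, and real HA adds timeouts and election logic that are omitted here.

```python
from enum import Enum
from typing import Callable, Iterable


class HostState(Enum):
    ALIVE = "alive"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def master_view(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """How the HA master classifies another host (simplified, no timeouts)."""
    if network_heartbeat:
        return HostState.ALIVE  # datastore heartbeats are not consulted at all
    if datastore_heartbeat:
        # Host still updates its heartbeat files: it is running, so don't restart its VMs.
        return HostState.ISOLATED_OR_PARTITIONED
    return HostState.FAILED  # no sign of life: restart its VMs on surviving hosts


def host_is_isolated(receives_ha_traffic: bool,
                     isolation_addresses: Iterable[str],
                     ping: Callable[[str], bool]) -> bool:
    """Host-side check: isolated only if no HA traffic arrives and no address answers."""
    if receives_ha_traffic:
        return False
    return not any(ping(address) for address in isolation_addresses)


def isolation_action(isolated: bool, response: str) -> str:
    """Action the isolated host itself takes; response is the cluster setting."""
    allowed = {"leave powered on", "power off", "shut down"}
    if response not in allowed:
        raise ValueError(f"unknown isolation response: {response}")
    return response if isolated else "none"


# Hypothetical isolation addresses, one per site; no address answers here.
isolated = host_is_isolated(False, ["192.168.1.1", "192.168.2.1"], lambda addr: False)
print(isolation_action(isolated, "power off"))  # power off
```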

Lee and Duncan described different scenarios in which HA would power off VMs and restart them in the other site. These scenarios covered two architectures: uniform and non-uniform. In a uniform architecture with two sites, a LUN is active (read/write) in the primary site, while the secondary site only has read-only access to its copy of that LUN. As a result, I/O from hosts in the secondary site flows to the primary site over the ISL (inter-switch link) between the sites.

A non-uniform architecture has two active sites, with active storage nodes at both ends and synchronous replication between them. In this architecture it is very important to set up site awareness on your storage array, so that hosts in site A send their I/O to storage nodes in site A, and hosts in site B to storage nodes in site B. Host I/O then does not need to traverse the ISL; only the synchronous replication traffic between the storage nodes crosses the sites. If a storage node in one site fails, I/O fails over to another node in that same site (which, of course, requires two or more nodes in each site).
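
The difference between the two architectures comes down to which storage node serves a host's writes and whether that host I/O crosses the ISL. The Python sketch below models this with hypothetical `StorageNode` and `io_path` names. It assumes that in the uniform case a LUN's read/write copy lives in a designated primary site, and that the non-uniform case uses site-aware, same-site node selection; replication traffic is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StorageNode:
    name: str
    site: str
    healthy: bool = True


def pick_node(nodes: List[StorageNode], site: str) -> Optional[StorageNode]:
    """First healthy node in the given site (local failover between nodes)."""
    for node in nodes:
        if node.site == site and node.healthy:
            return node
    return None


def io_path(architecture: str, host_site: str, primary_site: str,
            nodes: List[StorageNode]) -> Tuple[Optional[str], bool]:
    """Return (node serving the host's writes, whether host I/O crosses the ISL)."""
    if architecture == "uniform":
        # Only the primary site's copy is read/write, so every host writes there.
        node = pick_node(nodes, primary_site)
        return (node.name if node else None, host_site != primary_site)
    if architecture == "non-uniform":
        # Site awareness: hosts write to storage in their own site.
        node = pick_node(nodes, host_site)
        return (node.name if node else None, False)
    raise ValueError(f"unknown architecture: {architecture}")


nodes = [StorageNode("A1", "A"), StorageNode("A2", "A"),
         StorageNode("B1", "B"), StorageNode("B2", "B")]
print(io_path("uniform", "B", "A", nodes))      # ('A1', True)
print(io_path("non-uniform", "B", "A", nodes))  # ('B1', False)
```

Marking node B1 unhealthy makes a site B host in the non-uniform case fall back to B2, mirroring the same-site failover described above. A `None` node means the sketch does not model failover to the other site.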

The next thing to consider is site awareness within vSphere itself. vSphere is not site-aware: when you create a vSphere cluster that spans multiple sites, vSphere does not know that the ESXi hosts are in different physical locations. Because of this, related workloads may all end up in the same site, and a failure of that site could take important workloads offline. To make your metro cluster more robust, create site awareness by configuring DRS affinity rules: create a host group per site (each containing the hosts in that site), create VM groups (let's say VMs-Site-A and VMs-Site-B), and link each VM group to its host group. For an application running on two VMs in a load-balanced or clustered setup, place one VM in group A and the other in group B, so that the application remains available if a site fails.
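
The grouping scheme can be planned before touching vCenter. The following Python sketch builds per-site host groups, VM groups and one 'should' rule per site, and spreads the VMs of a load-balanced application across the sites. The `Hosts-Site-X` naming, the dictionary layout and both function names are assumptions for illustration; the sketch does not call any vSphere API.

```python
from typing import Dict, List


def spread_across_sites(vms: List[str], sites: List[str]) -> Dict[str, List[str]]:
    """Place the VMs of one load-balanced/clustered app round-robin over the sites."""
    placement: Dict[str, List[str]] = {site: [] for site in sites}
    for i, vm in enumerate(vms):
        placement[sites[i % len(sites)]].append(vm)
    return placement


def plan_site_affinity(hosts_by_site: Dict[str, List[str]],
                       vms_by_site: Dict[str, List[str]]) -> dict:
    """Build per-site host groups, VM groups and one 'should' VM-host rule per site."""
    host_groups = {f"Hosts-Site-{s}": sorted(h) for s, h in hosts_by_site.items()}
    vm_groups = {f"VMs-Site-{s}": sorted(v) for s, v in vms_by_site.items()}
    rules = [
        {
            "name": f"VMs-Site-{s}-should-run-on-Hosts-Site-{s}",
            "vm_group": f"VMs-Site-{s}",
            "host_group": f"Hosts-Site-{s}",
            "mandatory": False,  # 'should' rule, so HA may still restart VMs in the other site
        }
        for s in vms_by_site
    ]
    return {"host_groups": host_groups, "vm_groups": vm_groups, "rules": rules}


# Example: two hosts per site and a load-balanced web pair split across the sites.
web = spread_across_sites(["web01", "web02"], ["A", "B"])
plan = plan_site_affinity({"A": ["esx01", "esx02"], "B": ["esx03", "esx04"]}, web)
print(plan["vm_groups"])  # {'VMs-Site-A': ['web01'], 'VMs-Site-B': ['web02']}
```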

Finally, Duncan and Lee shared some do’s and don’ts:

  • Don’t use ‘must’ DRS affinity rules. HA respects these rules after a failure, so your VMs could stay powered off. Use ‘should’ rules instead.
  • Configure isolation addresses for a metro cluster using the ‘das.isolationaddress’ advanced options (das.isolationaddress0 through das.isolationaddress9) in your cluster’s vSphere HA settings. You can specify up to 10 isolation addresses, which lets a host decide more accurately whether it is isolated. The default gateway of your ESXi host is already used as an isolation address by default.
  • Configure heartbeat datastores, and when using multiple storage systems, increase the default of 2 heartbeat datastores to 4 (two per site), using the ‘das.heartbeatDsPerHost’ advanced option.
  • When using vSphere 5.0, make sure you have configured the PDL enhancements by setting ‘das.MaskCleanShutdownEnabled’ to ‘true’ in your cluster settings; it defaults to ‘false’.
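  • Enable the PDL (Permanent Device Loss) enhancements so that HA can respond to PDL by killing the VM process when its storage has been lost. This has to be done at host level (in vSphere 5.0 Update 1 and 5.1, by setting ‘disk.terminateVMOnPDLDefault’ to ‘True’ in /etc/vmware/settings).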
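
The checklist above lends itself to an automated audit. The Python sketch below takes the cluster's HA advanced options as a plain dictionary and reports deviations. The option names das.isolationaddressX, das.heartbeatDsPerHost and das.maskCleanShutdownEnabled are VMware HA advanced options; the checker function, its parameters, and how you collect its input are hypothetical.

```python
from typing import Dict, List, Optional

ISOLATION_PREFIX = "das.isolationaddress"


def check_metro_ha_settings(advanced: Dict[str, str],
                            vsphere_version: str,
                            drs_rules: List[dict],
                            terminate_vm_on_pdl: Optional[bool] = None) -> List[str]:
    """Report deviations from the session's do's and don'ts (simplified sketch)."""
    # Compare option names and values case-insensitively.
    opts = {k.lower(): str(v).strip().lower() for k, v in advanced.items()}
    warnings: List[str] = []

    # Don't: 'must' (mandatory) DRS VM-host rules.
    for rule in drs_rules:
        if rule.get("mandatory"):
            warnings.append(f"rule {rule.get('name')!r} is a 'must' rule; use 'should'")

    # Do: at least one das.isolationaddressX (X = 0 to 9).
    if not any(k.startswith(ISOLATION_PREFIX) and k[len(ISOLATION_PREFIX):].isdigit()
               for k in opts):
        warnings.append("no das.isolationaddressX configured")

    # Do: four heartbeat datastores (two per site) instead of the default two.
    if int(opts.get("das.heartbeatdsperhost", "2")) < 4:
        warnings.append("das.heartbeatDsPerHost is below 4")

    # Do (vSphere 5.0): das.maskCleanShutdownEnabled defaults to false there.
    if (vsphere_version.startswith("5.0")
            and opts.get("das.maskcleanshutdownenabled", "false") != "true"):
        warnings.append("das.maskCleanShutdownEnabled is not true")

    # Do: host-level PDL setting; None means "not checked".
    if terminate_vm_on_pdl is False:
        warnings.append("hosts are not configured to kill VMs on PDL")
    return warnings


example = {
    "das.isolationaddress0": "192.168.1.1",  # hypothetical address in site A
    "das.isolationaddress1": "192.168.2.1",  # hypothetical address in site B
    "das.heartbeatDsPerHost": "4",
    "das.maskCleanShutdownEnabled": "True",
}
print(check_metro_ha_settings(example, "5.0",
                              [{"name": "web-should", "mandatory": False}], True))  # []
```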

Thanks to Lee and Duncan for the presentation!
