Transforming from baremetal VMware vSphere to nested VSAN: Part Two (Preparation and Execution)

Welcome to part two of my VSAN ‘trilogy’! If you’ve missed the first part, see the link below.
Transforming from baremetal VMware vSphere to nested VSAN: Part One (Design)
This post contains the steps I’ve taken to prepare my transformation from a traditional vSphere lab environment to a VSAN-powered environment.

So far I have mainly been gathering information to make sure the transformation is possible, so I don’t lose time finding out halfway that it won’t work anyway. This is the summary I have, based on the design in part one.

  • Number of VMs and templates: 14
  • Total size of VMs: 500GB
  • Total space on temporary disk: 800GB
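
As a quick sanity check on those numbers, the fit on the temporary disk can be sketched in a few lines of Python. This assumes the 800GB is actually free; the 20% safety margin and the function name are my own rule of thumb, not something vSphere enforces.

```python
def fits_on_disk(data_gb, disk_gb, margin=0.20):
    """Return True if data_gb fits on disk_gb while keeping `margin` of the disk free.

    The 20% default margin is an assumed rule of thumb, not a vSphere limit.
    """
    usable_gb = disk_gb * (1 - margin)
    return data_gb <= usable_gb


total_vm_gb = 500    # total size of VMs and templates (from the summary above)
temp_disk_gb = 800   # space on the temporary disk (assumed to be free space)

print("fits with margin:", fits_on_disk(total_vm_gb, temp_disk_gb))
print("left after the move:", temp_disk_gb - total_vm_gb, "GB")
```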

I currently have 4 physical 1TB disks in a RAID configuration for my VMs, so I need to clear the RAID configuration before I can create a datastore on each of them to place my VSAN hosts on. As described in the previous post, I decided to move the VMs to a single physical disk that hosts my templates and media. I moved each VM from the RAID-backed datastore to that disk. Keeping all VMs powered on was not an option, given the number of VMs and the I/O load of both running and moving them.

I kept my Domain Controller, Database server (which hosts the vCenter Server DB) and vCenter Server VMs running on the temporary disk; the rest were powered off.

Next I deleted my RAID configuration using the hpacucli tool, which is available by default if you are using the HP Custom ESXi ISO. Why HP? Well, I’m using an HP P410 array controller =)
I created four new RAID-0 logical volumes using the tool, so I had four volumes in my physical ESXi box to create datastores on.

Now it’s time to create virtual ESXi hosts and place them on their own (private) datastore.

Oh, before I forget: create an internal-only switch on the physical ESXi host to carry the VSAN replication and vMotion traffic.

I have used the specs below:

Guest OS: VMware ESXi 5.x (selectable only after you create the VM with any other Guest OS and change it afterwards)
CPU: 2 vCPU (Enable “Expose hardware assisted virtualisation” under the CPU settings when editing the VM)
Memory: 12GB
E1000E Network Adapter 1: Management Portgroup (In the same VLAN as my physical ESXi host)
E1000E Network Adapter 2: VSAN-Replication Portgroup (Internal-Only switch)
E1000E Network Adapter 3: vMotion Portgroup (Internal-Only switch)
E1000E Network Adapter 4: VM Trunk Portgroup (New Virtual Machine portgroup in the existing switch used for VM networks. Use VLAN 4095 to create it as a trunk, so the virtual/nested ESXi hosts can tag their own VM portgroups)
SCSI Controller: LSI Logic SAS
Hard Disk 1: 4GB Thick Provisioned on private datastore
Hard Disk 2: 16GB Thick Provisioned Eager Zeroed on SSD datastore
Hard Disk 3: 500GB Thick Provisioned Eager Zeroed on private datastore

You need a minimum of 3 VSAN hosts to create a VSAN cluster. You obviously don’t need to use the same specs, but with this setup I was able to get everything to work. It should look like this:

After creating 3 hosts, I decided to give it a go and configure VSAN. Just leave the new hosts directly under your datacenter object. First create a new VSAN VMkernel port using the vSphere Web Client by selecting the specific host, going to Manage > Networking > VMkernel adapters and clicking the “Add host networking” icon. Make sure you connect the port to the right portgroup, the one that also contains the virtual NICs of the other VSAN hosts.

Next, create a new cluster in the vSphere Web Client and tick the “Turn On” checkbox next to Virtual SAN.

I left the claiming of disks to manual so I can control which disks to initialize for VSAN.
After the cluster has been created, you can move your hosts into it. Navigate to the cluster object and click the “Manage” tab on top. Navigate to Settings > Virtual SAN > General to check if your VSAN is healthy. You should have a minimum of three hosts, 3 of 3 SSD disks and 3 of 3 data disks (when using that number of SSDs and data disks).

Click “Disk Management” right under the VSAN General menu you were just in.

Now, create a disk group for each host using the second icon (disks with a green plus icon on it). Assign the SSDs and data disks you want to initialize for that specific host. Each host you configure this way will contribute to the total amount of storage space available. After you’re done, you should have a new datastore on each of your hosts, shown as “vsanDatastore”. Placing VMs on this datastore will distribute data across VSAN hosts to ensure availability.

So off I went, migrating VMs in groups to the new datastore. Everything went well; I even booted some VMs up and noticed the high responsiveness of the systems! Because it was already quite late, I decided to continue the next day.

When I tried to log in to vCenter the day after, I was greeted by an error saying the server name could not be found. Hmm.. =]

Long story short: I didn’t plan my required capacity accurately and filled up my VSAN datastore, which was also running vCenter Server by then. You need vCenter Server to manage your VSAN.. DOH
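
In hindsight, a rough capacity estimate would have been a useful warning. The sketch below assumes the default storage policy (number of failures to tolerate = 1, so every object is mirrored once), counts only the data disks because the SSDs in a disk group act as cache, and ignores witness components, swap objects and snapshots. The function names are illustrative, and the 3 x 500GB layout is taken from the host specs above.

```python
def raw_capacity_gb(hosts, data_disk_gb):
    """Raw VSAN capacity: the sum of all data disks (SSDs are assumed cache only)."""
    return hosts * data_disk_gb


def raw_needed_gb(vm_data_gb, failures_to_tolerate=1):
    """Raw space consumed when each object is stored as FTT + 1 mirror copies."""
    return vm_data_gb * (failures_to_tolerate + 1)


raw = raw_capacity_gb(hosts=3, data_disk_gb=500)   # 1500
needed = raw_needed_gb(vm_data_gb=500)             # 1000

print(f"raw capacity: {raw} GB")
print(f"raw space needed for 500 GB of VMs: {needed} GB")
print(f"headroom: {raw - needed} GB ({needed / raw:.0%} of raw capacity used)")
```

Under these assumptions, 500GB of VMs already takes two thirds of the raw space before swap files and snapshots are counted, which leaves little room for error.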

I tried freeing up space by deleting VMs which I had in my backup anyway, but this didn’t do the trick. I managed to solve this by SSH’ing into the physical ESXi box (which I had added to the VSAN cluster as well, I realize now), copying my vCenter VM files to a local drive and booting the VM up there. Thankfully this worked and the datastore didn’t get corrupted or become inaccessible (well, it was rather slow, but that’s no surprise).

After quite some time spent bringing my main systems back up by copying them manually and booting them, I got to the point where I needed to increase the capacity of my VSAN. I edited my original virtual hard disks and added about 150GB to each, for an increase of 450GB of raw space. Well, VSAN doesn’t really like it if you change the capacity of your ‘physical’ disk. Of course, this would not even be possible in the physical world, but I had to work around it in my environment. To save you time and effort, this is the fastest method in my opinion:

  • Put your host in maintenance mode choosing Full data migration (VSAN will automatically migrate data to ensure availability)
  • Remove the disk group of your host (destroying all data on SSD/Data disks, so make sure your data is safe!)
  • Increase the capacity of your virtual data disk
  • Refresh the host storage system and recreate the disk group (check if the new capacity is shown)
  • Take your host out of maintenance mode

I made a mistake here when I copied my VM data, as VSAN spreads the data across nodes and also makes cross references from the original VMDK file to unique ID extents on your VSAN datastore. I cleaned up my VSAN disks as described, but didn’t know about the cross references. So there I was, with a VMDK file pointing to a “vsan://” path which did not exist anymore.. Hmpf =]

If you consolidate your VMs using the Snapshot menu first, any extents are committed to the primary virtual disk file and there are no active cross references left to worry about.

I ended up recovering my Domain Controller VM (the one whose VMDK is shown above) by restoring the e47d… extent file (thanks, Veeam!) to the same directory as the VM and changing the path inside the VMDK accordingly.
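
The path change itself can be scripted. Below is a minimal sketch that rewrites the extent line in a VMDK descriptor so that a “vsan://” reference points at a local extent file instead. It assumes the extent line has the form RW <sectors> VMFS "<path>" and that the disk has a single extent; the function name, the sample descriptor and the file names are placeholders, not the ones from my environment. Work on a copy of the descriptor, not the original.

```python
import re

# Matches an extent line such as: RW 83886080 VMFS "vsan://<some-uuid>"
EXTENT_RE = re.compile(r'^(RW\s+\d+\s+\w+\s+)"vsan://[^"]*"\s*$')


def repoint_vsan_extents(descriptor_text, local_extent_name):
    """Replace vsan:// extent paths with local_extent_name; leave other lines alone."""
    out = []
    for line in descriptor_text.splitlines():
        match = EXTENT_RE.match(line)
        if match:
            # Keep the access mode, size and type; swap only the quoted path.
            line = f'{match.group(1)}"{local_extent_name}"'
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical descriptor content, for illustration only.
sample = (
    "# Disk DescriptorFile\n"
    "version=1\n"
    "\n"
    "# Extent description\n"
    'RW 83886080 VMFS "vsan://placeholder-uuid"\n'
)
print(repoint_vsan_extents(sample, "restored-extent-file"))
```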

Status: All data secure, now eager-zeroing new virtual disks for my other 2 VSAN hosts so I can hopefully migrate my VMs back to VSAN soon. I will leave my primary domain controller, vCenter Server and database server running outside VSAN until I have some more faith in my nested VSAN configuration.

The next post will be about my experiences using VSAN: performance, dos and don’ts, and everything else I think is valuable to share with you.

Thanks for reading!
