Speeding up LUN snapshot imports / resignaturing

Background Information:

This past weekend I was tasked with migrating 307 virtual machines off old HP blades (AMD processors) onto new Cisco UCS blades (Intel processors).  Since there is no supported way to live migrate running virtual machines between AMD and Intel hosts (vMotion, even with EVC, only works within a single CPU vendor), I had to perform this work during a scheduled maintenance window, a.k.a. an offline migration.

While planning this migration, it was decided that all of the VMs would move from the 2 clusters that existed on the HP blades into a single cluster in the UCS environment.  As such, the storage LUNs would have to be renumbered (given new LUN IDs) for the destination cluster to avoid conflicts.  To accomplish this, a SAN-side clone operation was used.  The clone allowed the LUNs to be renumbered for the destination cluster, and it also acted as a fail-safe: if something went terribly wrong, the old clusters could simply be powered back up.

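As expected, after the cloned LUNs were presented to the UCS blades, the datastores were not immediately recognized.  This is due to how vSphere handles snapshot LUNs: the VMFS signature on each clone still refers to the original LUN's device ID, so ESXi sees the mismatch, flags the volume as an unresolved snapshot, and leaves it unmounted.  Not a big deal, as all that needed to be done was a quick resignaturing of the datastores.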
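To make that detection step concrete, here is a deliberately simplified Python model. It is not ESXi's actual implementation: it assumes each VMFS header records a volume UUID, a label, and the device ID (e.g. an NAA identifier) of the LUN it was created on, and that a volume counts as an unresolved snapshot whenever that recorded ID differs from the LUN it is currently found on. The `VmfsHeader` class, the function names, and the sample IDs are all hypothetical.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VmfsHeader:
    """Hypothetical, simplified view of the fields that make up a VMFS signature."""
    volume_uuid: str
    label: str
    device_id: str  # ID of the LUN the volume was created on (e.g. an NAA ID)


def is_unresolved_snapshot(header: VmfsHeader, current_device_id: str) -> bool:
    # A byte-for-byte SAN clone carries the original header, so the recorded
    # device ID no longer matches the LUN the volume now lives on.
    return header.device_id != current_device_id


def resignature(header: VmfsHeader, current_device_id: str) -> VmfsHeader:
    # Resignaturing writes a fresh volume UUID and binds the volume to its new LUN.
    # uuid4() is only a stand-in; real VMFS UUIDs use a different format.
    new_uuid = str(uuid.uuid4())
    return replace(
        header,
        volume_uuid=new_uuid,
        # vSphere renames resignatured volumes to "snap-<8 hex chars>-<old label>";
        # here the 8 characters are simply borrowed from the new UUID.
        label=f"snap-{new_uuid[:8]}-{header.label}",
        device_id=current_device_id,
    )


original = VmfsHeader("uuid-a", "DS01", "naa.600a0001")
print(is_unresolved_snapshot(original, "naa.600a0002"))  # True
```

The model only captures why a clone shows up as "unresolved" and what resignaturing changes; it says nothing about how long the host takes to discover such volumes.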

The Problem:

The problem became apparent as soon as I opened the New Datastore dialog box.  It took approximately 5 to 10 minutes for the dialog to list the LUNs that the ESXi host could see.  As you know, you can only add one datastore at a time, so each of the 15 LUNs I needed to import meant another 5 to 10 minute wait: roughly 75 to 150 minutes spent just waiting on the dialog.  You can see my pain…  Thinking that the problem might be related to the Web Client (this was a vSphere 5.5 environment), I also tried the legacy C# client.  No luck in speeding things up.

The Solution:

Desperate for a way to speed up this process, I found the answer buried in the Advanced Settings area of the ESXi host's configuration.


In the host's Advanced Settings (at least in vSphere 5.5) there is an option called VMFS.UnresolvedVolumeLiveCheck, which defaults to true.  It is my understanding that, because the snapshot LUNs present an “unresolved” VMFS filesystem to the ESXi host, the host checks each of those filesystems to make sure it is not live (that is, not seeing I/O from another host).  Since I was 100% sure that no other ESXi hosts were using the filesystems (these were brand-new cloned LUNs), I decided to flip VMFS.UnresolvedVolumeLiveCheck to false and see what happened.

As soon as I changed the setting to “false” and reopened the Add Datastore dialog box: BOOM!  All of the LUNs appeared immediately, which let me work through the resignaturing much more quickly.  Once the datastores were resignatured, I used this PowerCLI script to re-import my VMs on the new cluster, and life was good!

I should note that I only had to change this setting on one host in the new UCS cluster.  I added all of the datastores from this single host, assigning each one a new signature, and the rest of the hosts in the cluster found the datastores immediately once the new signatures had been written.  Once all of the datastores had been imported, I flipped VMFS.UnresolvedVolumeLiveCheck back to “true”.
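The "flip it, do the work, flip it back" routine is easy to leave half-finished if something fails partway through. Here is a minimal Python sketch of the same pattern as a context manager that always restores the previous value. The `get_option` and `set_option` callables are hypothetical stand-ins for however you actually change the setting (vSphere Client, esxcli, PowerCLI); the dictionary in the usage example is only an illustration, not a real host API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporarily_set(
    get_option: Callable[[str], object],
    set_option: Callable[[str, object], None],
    key: str,
    value: object,
) -> Iterator[None]:
    """Set `key` to `value` for the duration of the with-block, then restore it."""
    previous = get_option(key)
    set_option(key, value)
    try:
        yield
    finally:
        # Restore the original value even if the work inside the block raised.
        set_option(key, previous)


# Illustration only: a dict stands in for one host's advanced settings.
host_settings = {"VMFS.UnresolvedVolumeLiveCheck": True}

with temporarily_set(host_settings.get, host_settings.__setitem__,
                     "VMFS.UnresolvedVolumeLiveCheck", False):
    # ...resignature the cloned datastores from this one host here...
    print(host_settings["VMFS.UnresolvedVolumeLiveCheck"])  # False

print(host_settings["VMFS.UnresolvedVolumeLiveCheck"])  # True
```

Restoring in a `finally` clause means the safety check comes back on even if the import is interrupted, which mirrors flipping the setting back to “true” once the datastores are in.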

Here is my obligatory cautionary note:  DO NOT use this process for LUNs when there is even a slight chance that the filesystems might be in use elsewhere.  In my case I was certain that this operation was safe due to my use of a SAN clone.  I am not responsible for the bad things that might happen to your data if you do not understand what is going on behind the scenes with this operation.