VM Power-up failure - Reason: Failed to lock the file. Cannot open the disk…

Recently, I was tasked with investigating an error encountered when powering up a virtual machine. The end user was getting the following error message:

An unexpected error was received from the ESX host while powering on VM vm-26615. Reason: Failed to lock the file. Cannot open the disk /vmfs/volumes/5029553f-63b72580-1c18-002655238b38/Windows7x64-GoldImage/Windows7x64-GoldImage.vmdk or one of the snapshot disks it depends on.

While investigating this issue, I found VMware KB10051. The article describes how to use the ‘vmkfstools’ command to obtain the MAC address of the ESXi host that currently holds the lock on the VMDK file in question. Here is the complete command I executed:

vmkfstools -D /vmfs/volumes/5029553f-63b72580-1c18-002655238b38/Windows7x64-GoldImage/Windows7x64-GoldImage.vmdk
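As I understand the KB, the MAC address is not printed on its own: it is read off the last 12 hex digits of the "owner" value in the command's output. Here is a minimal sketch of that conversion. The function name owner_to_mac and the sample owner value are made up for illustration, and the exact output layout may differ between ESXi builds, so treat this as a convenience helper rather than anything official.

```shell
# Hypothetical helper: convert the "owner" value reported by `vmkfstools -D`
# into a colon-separated MAC address.
# Assumption: the final dash-separated segment of the owner value is the
# 12-hex-digit MAC of the host holding the lock.
owner_to_mac() {
    owner="$1"
    last="${owner##*-}"    # keep only the text after the final dash
    # Put a colon after every pair of hex digits, then drop the trailing one.
    printf '%s\n' "$last" | sed 's/../&:/g; s/:$//'
}

# Example with a made-up owner value:
#   owner_to_mac 4f2a1c3e-9b7d6e10-2c4a-0050567898a2
```

The result can be pasted straight into a MAC address lookup against the hosts in the cluster.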

After running the ‘vmkfstools’ command from the first host in the cluster, I was presented with the MAC address of a different host that supposedly held the file lock. To determine which host that MAC address belonged to, I used the following PowerCLI snippet (slightly modified from an example posted by Robert van den Nieuwendijk):

Get-Cluster UCS01 | Get-VMHost | Get-VMHostNetworkAdapter | Where-Object {$_.Mac -eq "00:50:56:78:98:a2"} | Format-List -Property *

The PowerCLI snippet gave me the information I was looking for, or so I thought…

I tried once again (like a fool hoping for a different result) to start the VM and was presented with the same error message I had encountered initially.

Next, I logged in to the suspect host via SSH, and re-executed the vmkfstools command exactly as I did before (listed above).  This time I was given the MAC address of the first host in the cluster.  I thought to myself, “how can this be possible???”  Not believing the results, I ran the command a few more times.  I was able to see the MAC address alternate from the first host to the second host, but only after attempting to power on the VM in question.

Then it hit me…  Could it be possible that there were two VMs pointing to the same VMDK file simultaneously?   Thinking on my feet, I ran the following command from the SSH console of the first ESXi host:

grep "Windows7x64-GoldImage.vmdk" /vmfs/volumes/*/*/*.vmx

The result (ignoring all of the ‘device is busy’ errors) was a listing of two .vmx files that both pointed at the same VMDK file. One VM was powered up, which prevented the other from starting. This explained everything!
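If you need to repeat this search, the grep can be wrapped in a small function that hides the error noise and prints only the names of the matching .vmx files. This is just a sketch: the function name find_vmx_refs and its optional second argument are my own, and it assumes the same datastore/folder/file.vmx layout as the glob in the original command.

```shell
# Hypothetical helper: list every .vmx file that references a given disk name.
#   $1 = disk file name to look for (e.g. Windows7x64-GoldImage.vmdk)
#   $2 = root directory to search (defaults to /vmfs/volumes)
find_vmx_refs() {
    disk="$1"
    root="${2:-/vmfs/volumes}"
    # -l prints only the matching file names; -F treats the name as a literal
    # string, so the dot in ".vmdk" is not a regex wildcard.
    # 2>/dev/null discards errors from files that cannot be read.
    grep -l -F -e "$disk" "$root"/*/*/*.vmx 2>/dev/null
}

# Example:
#   find_vmx_refs "Windows7x64-GoldImage.vmdk"
```

More than one file in the output means more than one VM is configured to use that disk, which is the situation described above. Searching on the bare file name catches both a relative reference from the VM's own folder and a full path from another folder.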

After ensuring that both VMs were fully backed up, I moved the VMDK of the running VM into its folder and restored the conflicted VM from its most recent backup.  It turns out that a new staff member who was not familiar with how to clone virtual machines had created a new VM and figured that he simply needed to point the new VM at the existing virtual hard drive in order to provision his clone.

After (very politely) educating the user on how VM templates, cloning, etc. are supposed to work, I figured I would share this story with all of you to help you find that “ah-ha!” moment if you are ever presented with this situation.

Happy hunting!