VMware ESXi 6.x.x: “Invalid VM Status” or “Virtual machines are numbers”

I woke up today to a wonderful sight: my ESXi 6.7 server boot looping! It was failing with error code 15 (Not Found) on a driver file. I know I certainly did not delete it, but oh well, this is what backups are for… I don’t have backups of the ESXi boot flash drive, but I do for everything else. Luckily, I was able to change the boot environment to an older version. In doing this, it uninstalled my Fusion IOdrive2 drivers and, strangely, appended an old version of my virtual machine inventory file to the running one. Because VMware is special, it also changed the UUID of my Fusion IOdrive2’s datastore, so now I have to fix the file paths in the inventory file. To restore my virtual machines, I did the following:

Method 1: Fixing the Problem

First, you need to find the VM object IDs of your VMs. Luckily, this is easy: the following command lists all of the invalid virtual machines.

vim-cmd vmsvc/getallvms | grep invalid

This is the output I got on my problem child:

[root@ESXi-S2600:~] vim-cmd vmsvc/getallvms | grep invalid
Skipping invalid VM '553'
Skipping invalid VM '714'
Skipping invalid VM '715'

From this, we know that the IDs of the invalid VMs are 553, 714, and 715.
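If you have more than a handful of invalid VMs, it can help to pull just the numbers out of that output. Here is a minimal sketch; the function name `invalid_vm_ids` is something I made up for illustration, and I am assuming the messages always look like the ones in my output. The `2>&1` in the comment is a guess in case the "Skipping" lines go to stderr on your build.

```shell
# Print the numeric ID from every "Skipping invalid VM '<id>'" line on stdin.
# invalid_vm_ids is a hypothetical helper, not part of ESXi.
invalid_vm_ids() {
    sed -n "s/^Skipping invalid VM '\([0-9][0-9]*\)'.*/\1/p"
}

# Sample input copied from my output above. On the host you would run:
#   vim-cmd vmsvc/getallvms 2>&1 | invalid_vm_ids
printf '%s\n' "Skipping invalid VM '553'" \
              "Skipping invalid VM '714'" \
              "Skipping invalid VM '715'" |
    invalid_vm_ids
```

With the sample input this prints 553, 714, and 715, one per line, which is handy for feeding a loop later on.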

Next, we need to look at the ESXi inventory file to get the path of the virtual machine. The file is located at /etc/vmware/hostd/vmInventory.xml. The command below searches the file for every line containing the string “553” and prints each match along with the two lines after it. Replace 553 with your VM ID.

grep 553 -A 2 /etc/vmware/hostd/vmInventory.xml

The output will be similar to this:

[root@ESXi-S2600:/vmfs/volumes] grep 553 -A 2 /etc/vmware/hostd/vmInventory.xml
<objID>553</objID>
<secDomain/>
<vmxCfgPath>/vmfs/volumes/5af3d037-c454c5ba-f280-0026b97b8252/VPN EU Routing/VPN EU Routing.vmx</vmxCfgPath>

With this, we now know the file path VMware is expecting. In my case, it is “/vmfs/volumes/5af3d037-c454c5ba-f280-0026b97b8252/VPN EU Routing/VPN EU Routing.vmx”.
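A plain grep for “553” will also match any other line that happens to contain those digits (a UUID, for example). A slightly stricter sketch is below. It assumes the inventory file has the same layout as my output, with the vmxCfgPath within two lines of the objID; `vmx_path_for_id` is a hypothetical helper name, and the demo runs against a scratch file rather than the real inventory.

```shell
# Print the .vmx path recorded for one VM ID in an inventory file.
# Usage: vmx_path_for_id ID FILE   (hypothetical helper)
vmx_path_for_id() {
    # Match the whole <objID> element so 553 does not also match 5530.
    grep -A 2 "<objID>$1</objID>" "$2" |
        sed -n 's:.*<vmxCfgPath>\(.*\)</vmxCfgPath>.*:\1:p'
}

# Demo against a scratch copy of the three lines shown above.
# On the host, FILE would be /etc/vmware/hostd/vmInventory.xml.
inv=$(mktemp)
cat > "$inv" <<'EOF'
<objID>553</objID>
<secDomain/>
<vmxCfgPath>/vmfs/volumes/5af3d037-c454c5ba-f280-0026b97b8252/VPN EU Routing/VPN EU Routing.vmx</vmxCfgPath>
EOF
vmx_path_for_id 553 "$inv"
rm -f "$inv"
```

The demo prints only the path between the vmxCfgPath tags, which is the part you need for the next steps.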

If I try to cat that file, I get an error saying the file is not found. This is because the UUID of my “FusionIO” datastore has changed. I can find the new UUID either by changing to the datastore’s directory on the command line or by searching for the file.

[root@ESXi-S2600:~] cd /vmfs/volumes/FusionIO/

The command prints nothing, but the prompt changes to show the new directory:

[root@ESXi-S2600:/vmfs/volumes/5e3e8d1a-40d7d622-f7e7-001b2142c328]

The UUID is whatever the directory changed to, in this case “5e3e8d1a-40d7d622-f7e7-001b2142c328” instead of the “FusionIO” I told it to go to. That is because the datastore name is a symlink to the UUID of the datastore volume.
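Since the datastore name is just a symlink, you can also read the link directly instead of watching the prompt change. This is a small sketch that assumes `readlink` and `basename` are available in your shell; `datastore_uuid` is a hypothetical helper name, and the demo builds a scratch directory that mimics the /vmfs/volumes layout rather than touching a real host.

```shell
# Print the volume UUID that a datastore name under /vmfs/volumes points to.
# Usage: datastore_uuid /vmfs/volumes/FusionIO   (hypothetical helper)
datastore_uuid() {
    # Strip a trailing slash first, otherwise readlink sees a directory.
    basename "$(readlink "${1%/}")"
}

# Demo: a UUID-named directory plus a symlink carrying the friendly name.
vols=$(mktemp -d)
mkdir "$vols/5e3e8d1a-40d7d622-f7e7-001b2142c328"
ln -s 5e3e8d1a-40d7d622-f7e7-001b2142c328 "$vols/FusionIO"
datastore_uuid "$vols/FusionIO/"
rm -rf "$vols"
```

In the demo this prints the UUID-named directory that the FusionIO link points to.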

Another way of finding the UUID is by searching all of your drives for the .vmx file VMware is looking for. The name of my VMX file is “VPN EU Routing.vmx”. I can run the following command to find it in the filesystems:

find /vmfs/volumes/ | grep "VPN EU Routing.vmx"

My output was the following:

/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx
/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx.lck
/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx~

With this, we now have the full path of the virtual machine configuration file. The only one that matters is the .vmx, not the .vmx.lck or the .vmx~.
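If you would rather not filter the .lck and ~ entries out by eye, find can match on the exact file name instead of grepping for a substring. A minimal sketch, assuming your find supports `-name` and `-type` (`find_vmx` is a hypothetical helper name, and the demo uses a scratch directory):

```shell
# Print only regular files whose name is exactly NAME somewhere below ROOT.
# Usage: find_vmx ROOT NAME   (hypothetical helper)
find_vmx() {
    find "$1" -type f -name "$2"
}

# Demo: recreate the three entries from my output in a scratch directory.
vols=$(mktemp -d)
dir="$vols/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing"
mkdir -p "$dir"
touch "$dir/VPN EU Routing.vmx" "$dir/VPN EU Routing.vmx.lck" "$dir/VPN EU Routing.vmx~"
find_vmx "$vols" "VPN EU Routing.vmx"
rm -rf "$vols"
```

Only the plain .vmx path is printed, because `-name` compares against the whole file name rather than a substring of it. On the host, ROOT would be /vmfs/volumes/.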

 

Next, we need to stop some of the VMware services so that they pick up the change to the file when they start again. Run the following commands:

/etc/init.d/vpxa stop
/etc/init.d/hostd stop

 

Now that we have the new UUID of the datastore volume, we need to change it in the inventory file. We can use sed to accomplish this. I will be replacing my old UUID “5af3d037-c454c5ba-f280-0026b97b8252” with my new UUID “5be986bd-b0d0aa76-cf8d-001b2142c328”. MAKE SURE TO BACK UP THE FILE BEFORE DOING THIS STEP! The -i.bak option makes a backup as well, but it is good to be thorough.

sed -i.bak 's/5af3d037-c454c5ba-f280-0026b97b8252/5be986bd-b0d0aa76-cf8d-001b2142c328/' /etc/vmware/hostd/vmInventory.xml
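Before editing in place, it can be reassuring to preview exactly which lines would change. The sketch below writes nothing: it only prints the lines where the substitution would happen. `preview_uuid_swap` is a hypothetical helper name, and it assumes the UUIDs contain only hex digits and hyphens, so they are safe to drop straight into a sed pattern. The demo uses a scratch file, not the real inventory.

```shell
# Print only the lines of FILE that would change if OLD were replaced by NEW.
# Nothing is written; FILE is left untouched.
# Usage: preview_uuid_swap OLD NEW FILE   (hypothetical helper)
preview_uuid_swap() {
    sed -n "s/$1/$2/gp" "$3"
}

# Demo against a scratch file containing one affected and one unaffected line.
inv=$(mktemp)
cat > "$inv" <<'EOF'
<secDomain/>
<vmxCfgPath>/vmfs/volumes/5af3d037-c454c5ba-f280-0026b97b8252/VPN EU Routing/VPN EU Routing.vmx</vmxCfgPath>
EOF
preview_uuid_swap 5af3d037-c454c5ba-f280-0026b97b8252 \
                  5be986bd-b0d0aa76-cf8d-001b2142c328 "$inv"
rm -f "$inv"
```

The demo prints just the vmxCfgPath line with the new UUID in it. If the preview looks right, the real sed -i.bak command makes the same change for real.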

Now that the file has been edited, we can restart the services:

/etc/init.d/vpxa start
/etc/init.d/hostd start

 

Next, we need to reload the virtual machine, with 553 being my VM ID.

vim-cmd vmsvc/reload '553'

Now we can see whether the VM was fixed, either by logging in to the web UI and looking for the missing virtual machine, or by running:

vim-cmd vmsvc/getallvms | grep invalid

If you don’t see the VM ID, then it worked! If the ID does still show up, all hope may not be lost; there may be other methods to re-add the VMs. If you know of any, PLEASE POST THEM BELOW!
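If you are repairing several VMs, checking by eye gets old quickly. Here is a rough sketch of a per-ID check; `vm_is_fixed` is a hypothetical helper name, and the sample text is what I would expect to see if only 553 had been repaired, not real output from my host.

```shell
# Exit 0 if ID is absent from the "invalid VM" lines on stdin, 1 otherwise.
# vm_is_fixed is a hypothetical helper, not part of ESXi.
vm_is_fixed() {
    ! grep -q "invalid VM '$1'"
}

# Assumed sample: the check output after only 553 was repaired.
after="Skipping invalid VM '714'
Skipping invalid VM '715'"

for id in 553 714; do
    if printf '%s\n' "$after" | vm_is_fixed "$id"; then
        echo "$id: fixed"
    else
        echo "$id: still invalid"
    fi
done
```

On the host you would pipe the real `vim-cmd vmsvc/getallvms` output into the function instead of the sample text.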

Method 2: Remove and Re-add

The last-resort method is to remove the VMs and re-add them. You can accomplish this by doing the following.

First, you need to find the VM object IDs of your VMs. Luckily, this is easy: the following command lists all of the invalid virtual machines.

vim-cmd vmsvc/getallvms | grep invalid

This is the output I got on my problem child:

[root@ESXi-S2600:~] vim-cmd vmsvc/getallvms | grep invalid
Skipping invalid VM '553'
Skipping invalid VM '714'
Skipping invalid VM '715'

From this, we know that the IDs of the invalid VMs are 553, 714, and 715.

 

Next, we need to look at the ESXi inventory file to get the path of the virtual machine. The file is located at /etc/vmware/hostd/vmInventory.xml. The command below searches the file for every line containing the string “553” and prints each match along with the two lines after it. Replace 553 with your VM ID.

grep 553 -A 2 /etc/vmware/hostd/vmInventory.xml

The output will be similar to this:

[root@ESXi-S2600:/vmfs/volumes] grep 553 -A 2 /etc/vmware/hostd/vmInventory.xml
<objID>553</objID>
<secDomain/>
<vmxCfgPath>/vmfs/volumes/5af3d037-c454c5ba-f280-0026b97b8252/VPN EU Routing/VPN EU Routing.vmx</vmxCfgPath>

From this, we can see the name of the .vmx file. In my case, it is “VPN EU Routing.vmx”. Save this filename for future use.

 

Now that we have the filename, we can remove the problem virtual machine, in my case ID 553.

vim-cmd vmsvc/unregister 553

 

Now that the machine is removed, we can re-add it by finding the .vmx, in my case “VPN EU Routing.vmx”:

find /vmfs/volumes/ | grep "VPN EU Routing.vmx"

My output was the following:

/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx
/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx.lck
/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx~

With this, we now have the full path of the virtual machine configuration file. The only one that matters is the .vmx, not the .vmx.lck or the .vmx~.

 

Finally, we can issue a command to add the virtual machine from the filepath discovered by the find command.

vim-cmd solo/registervm "/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx"

And just like that, it is re-added. You will need to disconnect and reconnect the host in vCenter to see the changes. Since it counts as a new VM, you will have to redo auto start, resource groups, and logical folders, as well as anything else that used the VM’s old object ID. It sucks, but at least you did not lose any data.
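Because unregistering is destructive to the inventory entry, I like to see the exact commands before running them, especially with paths that contain spaces. The sketch below only prints the two Method 2 commands for one VM and runs nothing. `reregister_cmds` is a hypothetical helper name; the ID and path in the demo are the ones from my host.

```shell
# Print the two Method 2 commands for one VM without running them.
# Usage: reregister_cmds ID VMX_PATH   (hypothetical helper)
reregister_cmds() {
    printf 'vim-cmd vmsvc/unregister %s\n' "$1"
    # Quote the path so spaces in the VM name survive copy and paste.
    printf 'vim-cmd solo/registervm "%s"\n' "$2"
}

reregister_cmds 553 "/vmfs/volumes/5be986bd-b0d0aa76-cf8d-001b2142c328/VPN EU Routing/VPN EU Routing.vmx"
```

Once the printed lines look right, you can paste them into the host shell one at a time.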

If you have a better method for fixing this (or a similar issue) please post about it below. I would love to see better methods, as the documentation for these issues is not too great.



About: Ryan Parker

I am the former captain of the Cyber Defense team from Cal State San Bernardino. I also have a side job helping small to medium businesses with anything technology, including but not limited to servers, networking, and end user devices. One of my hobbies is building out infrastructures for myself, friends, and clients. I currently maintain a VMware ESXi cluster with about 280GB of RAM, with a 10Gbit network as its backbone.

