A few weeks ago I realized that Storage VMotion (svmotion) was not working.
Right after it started, it stopped with the error message: Failed to connect to host
I checked several KBs and posts out there, but the usual suggestions were about name resolution (DNS issues), deleting snapshots, or restarting the VMware management services.
So I decided to look further. First I wanted to make sure VMotion was working (the first 10% of an svmotion task is a VMotion to the same host). It worked fine; my host was licensed for VMotion.
Then I looked at the host’s logs (/var/log/vmkernel, /var/log/vmkwarning and /var/log/vmware/hostd.log), but nothing in there gave me a clue.
A few more troubleshooting steps followed.
I moved the guest to another cluster to make sure the issue was not with the guest. It worked, so the problem was with the hosts.
I moved one of the hosts to a new cluster, just to make sure the problem was not with some cluster configuration. It was not.
Then I looked at the Virtual Center logs (they may be located at: C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs ).
Look for the current one; the files range from vpxd-0.log to vpxd-9.log.
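If you do not want to check the files one by one, a small script can pick the current log for you. This is only a rough sketch: it assumes the current log is simply the vpxd-N.log file with the latest modification time, and the function name newest_vpxd_log is my own, not anything provided by VMware.

```python
import glob
import os


def newest_vpxd_log(log_dir):
    """Return the most recently modified vpxd-N.log in log_dir, or None.

    Assumption: the "current" log is the one with the latest
    modification time among vpxd-0.log .. vpxd-9.log.
    """
    # Match only vpxd-0.log through vpxd-9.log, not other vpxd-* files.
    candidates = glob.glob(os.path.join(log_dir, "vpxd-[0-9].log"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

Pass it the Logs folder of your own Virtual Center install; the path above is just where it may be located.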
I found a very interesting log entry:
[VpxdInvtHost] IP address change for xxx.xxx.xxx.9 to xxx.xxx.xxx.106 not handled, SSL certificate verification is not enabled.
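To check whether your own logs contain the same entry, something like the following could help. It is a minimal sketch, assuming the message wording is exactly the same as in the entry above; the regular expression and the function name find_ip_changes are my own and purely illustrative.

```python
import re

# Pattern built from the log entry quoted above; the wording may differ
# in other Virtual Center versions (assumption).
IP_CHANGE = re.compile(r"IP address change for (\S+) to (\S+) not handled")


def find_ip_changes(lines):
    """Return (old_ip, new_ip) pairs for every matching log line."""
    results = []
    for line in lines:
        match = IP_CHANGE.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results
```

You can feed it an open log file directly, for example find_ip_changes(open(path, errors="replace")), since a file object yields its lines one at a time.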
So, apparently Virtual Center thought I had changed the cOS IP (I had not), and during svmotion it was getting lost in the middle of the process.
At this point, I put the host into maintenance mode (to move the guests off), then disconnected the host and removed it from VC.
I waited a few seconds and added it back (this registers the host in VC again with the correct IP).
After that, my svmotion started working again.
Hope this helps someone out there.
Bye