As a Consulting Architect at VMware Professional Services, I spend most of my time with clients, helping them create innovative solutions and overcome challenges.
Since every environment is unique, I sometimes stumble into weird situations. This past week was one of them.
The client was upgrading their ESXi hosts from version 6.0 to 6.5. While the majority of the hosts went smoothly, a couple of them presented some undesired behavior.
Update Manager was used to remediate the hosts. Everything was going fine: the patches had been staged and the first reboot occurred as expected. During the installation, however, the host crashed with a blue screen and an error message:
*******************
An unexpected error occurred
See logs for details
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 513: invalid start byte
*******************
(I'm sorry about the image quality; I was in a hurry trying to figure it out.)
And then the installation rolled back automatically to ESXi 6.0.
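For context, UnicodeDecodeError is the exception Python raises when bytes that are not valid UTF-8 get decoded as text, and 0x8b can never be the first byte of a UTF-8 character. The small sketch below triggers the same kind of message with made-up data (513 ASCII characters followed by a stray 0x8b byte). It is only an illustration and says nothing about which file on the host was actually being read.

```python
# Made-up data: 513 bytes of plain ASCII, then one byte that is not valid UTF-8.
data = b"a" * 513 + b"\x8b"

try:
    data.decode("utf-8")
    message = None  # not reached for this input
except UnicodeDecodeError as err:
    # The exception text names the offending byte and its offset.
    message = str(err)

print(message)
```

In other words, the message points to binary or damaged data sitting where text was expected.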
Surprisingly, all hosts were the same model, installed in the same period, the same way, with the same ISO, so there was nothing special about those hosts that we could think of.
Some basic troubleshooting turned up nothing, and an internet search for this error did not return anything relevant.
Time to search internally, and voilà: that’s when I found a couple of past cases with the same behavior.
Long story short, the altbootbank was corrupted for some reason; we never found out why.
The solution was to recreate the altbootbank from the bootbank partition.
First, we got rid of the content in /altbootbank and then we copied the content from /bootbank to it.
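The two steps above can be sketched in Python. This is only an illustration of the idea, not the exact procedure we ran on the hosts: the function name recreate_bank and the file name example.cfg are made up for this example, and the demo runs against temporary directories standing in for /bootbank and /altbootbank.

```python
import shutil
import tempfile
from pathlib import Path


def recreate_bank(source: Path, target: Path) -> None:
    """Empty `target`, then copy everything from `source` into it."""
    # Step 1: get rid of the existing (possibly corrupted) content.
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Step 2: copy the content of the source bank over.
    for entry in source.iterdir():
        if entry.is_dir():
            shutil.copytree(entry, target / entry.name)
        else:
            shutil.copy2(entry, target / entry.name)


# Demo on throwaway directories, not on a real host.
with tempfile.TemporaryDirectory() as tmp:
    bootbank = Path(tmp) / "bootbank"
    altbootbank = Path(tmp) / "altbootbank"
    bootbank.mkdir()
    altbootbank.mkdir()
    # Hypothetical file: a good copy in one bank, a damaged copy in the other.
    (bootbank / "example.cfg").write_text("example content\n")
    (altbootbank / "example.cfg").write_bytes(b"\x8b corrupted")

    recreate_bank(bootbank, altbootbank)
    restored = sorted(p.name for p in altbootbank.iterdir())
    restored_text = (altbootbank / "example.cfg").read_text()
    print(restored, repr(restored_text))
```

The order matters: the target is emptied first so that no damaged leftovers survive next to the fresh copy.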
Wait a minute, what are /altbootbank and /bootbank all about?
ESXi keeps two independent copies of its boot partition: bootbank and altbootbank. One of them, bootbank, holds the active image, which is used to boot the system. The other, altbootbank, holds an alternate image. You can think of it as the last known good state, so if your boot partition becomes corrupted, you can reboot your host from that last known good state (altbootbank).
It really took me a while to figure out how to solve it. I’m publishing it hoping it can save you some time too. Just let me know if you have faced this issue as well.