Wednesday, April 26, 2017

vSAN HCL Warning

A few weeks ago I talked about identifying your controllers and installing the correct firmware and drivers for a vSAN environment.
Even if you follow my recommendations, vSAN Health Check may still show a warning for the vSAN cache tier controllers.

Don’t be afraid: there’s a bug that prevents Health Check from correctly identifying PCIe/NVMe controllers, so the alarm is triggered even with the certified driver and firmware installed.

It took me some time to find this information, so I thought I could save you some time too.
Here's KB 2146676, which outlines this bug.

Note: This bug has been fixed in vSAN 6.5.

Wednesday, April 12, 2017

vSphere 6.5 Enhanced Logging

Among all the amazing new features and enhancements of vSphere 6.5 there are always one or two that fly under the radar, until you need them.

That’s the case with this small but crucial enhancement to the logging functionality: it now comes with detailed information about who did what and when, making it more audit-quality and security focused than the previous troubleshooting-oriented logging.

Let’s take an example: I’m increasing the vCPU and memory allocation of a VM.

On vSphere 6, the only thing you are going to see is a reconfigure event, without much information about what has been changed.

You might guess that the ESXi logs have more details. Guess again!
There are no details about what has changed there either, just that a reconfigure action has been performed on the specified VM.

Now on vSphere 6.5 this has been improved considerably, adding sections for modified, added and removed components.

Now we can clearly see that the vCPU count has been increased from 1 to 2 and memory has been reduced from 512 MB to 64 MB.

Guess what about the ESXi logs... You can find the same information there.

This makes it much easier to diagnose an issue and revert to the previous state, if you believe that change is what caused the issue.

For those with a higher need for an audit trail, that’s a good motivator to move to vSphere 6.5.

See you next time, folks.

Tuesday, April 4, 2017

Identifying controllers and drivers for vSAN

One of the critical aspects of a vSAN environment is the drivers and firmware of its I/O controllers and flash devices. VMware provides two sets of guidelines, one for general purpose and one specific to vSAN, and they are NOT necessarily the same.

When you install ESXi, either through the VMware ISO or a vendor’s custom installation ISO, it comes with general purpose drivers, which might lead to issues in your vSAN environment, so you need to replace them with the vSAN certified ones.

Let me make it crystal clear: if you are building a vSAN environment, you MUST use the firmware and drivers certified for vSAN.

There are many ways and tools to help you correctly identify which controller model and driver are right for your environment.

So, let’s see how to check them:

First, check that your card has been found and loaded by the host.
run: esxcfg-scsidevs -a

As you can see, my controller has been loaded as adapter vmhba5 and it’s using the lsi_mr3 driver.

Second, let’s check which driver version this module is using, so we can compare it with the HCL later.
run: vmkload_mod -s <module>   (in my example, lsi_mr3)

In the example, it's running version 6.610.17.00.

Third, let’s list the device details. You can filter by adapter name; I’m just filtering for anything with hba in it.
run: vmkchdev -l | grep hba

The important thing here is the four four-digit codes,
in my example 1000:005d 1014:0454, representing:

VID - Vendor ID: 1000
DID - Device ID: 005d
SVID - SubVendor ID: 1014
SSID - SubDevice ID: 0454
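
As a minimal sketch, the four codes can be pulled out of a line of that output with a few lines of Python. The sample line below is illustrative only: the PCI address and the trailing fields are assumed, only the four codes come from my example, and parse_pci_ids is a hypothetical helper name.

```python
import re

# Matches the "VID:DID SVID:SSID" pair, e.g. "1000:005d 1014:0454".
ID_PATTERN = re.compile(
    r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b"
)

def parse_pci_ids(line):
    """Return the four IDs found in one line of output, or None."""
    match = ID_PATTERN.search(line)
    if match is None:
        return None
    vid, did, svid, ssid = (part.lower() for part in match.groups())
    return {"VID": vid, "DID": did, "SVID": svid, "SSID": ssid}

# Illustrative line; only the four codes come from the example above.
sample = "0000:03:00.0 1000:005d 1014:0454 vmkernel vmhba5"
print(parse_pci_ids(sample))
```

Having the codes in a small dictionary makes it easy to copy them into the HCL filter fields without mixing them up.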

Now let’s find exactly what this controller is.

Go to the VMware HCL (remember, this is the general purpose HCL; at this step we just identify the correct controller model).
Switch the filter to IO Devices, then use the fields on the right to filter with the details we just grabbed in the previous step.
Once you update the results, you will get your controller model.

Remember, this is general purpose information; the idea is just to identify the controller based on the codes we got directly from the hardware itself.

Now let’s switch to the vSAN HCL.
Switch the filter to vSAN and click on “Build your Own based on Certified Components”.

Type the controller model we just found into the keyword field.

Now we have it.
Click on the controller to see its firmware and driver requirements.

As you can see, in my example I should be using the megaraid driver version 6.610.15.00 instead of lsi_mr3.

After I changed the driver from lsi_mr3 to megaraid, the vSAN Health Check status changed from Warning to Passed.
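
The comparison I did by eye can be written down as a small check. This is only a sketch: matches_hcl is a hypothetical helper, the installed values are the ones from my host, and the required values are the ones the vSAN HCL showed for my controller, so yours will differ. An exact match on name and version is a simplification of what the HCL may list.

```python
def matches_hcl(installed, required):
    """True only when both driver name and version equal the HCL entry."""
    return (installed["driver"] == required["driver"]
            and installed["version"] == required["version"])

# Values from the example in this post.
installed = {"driver": "lsi_mr3", "version": "6.610.17.00"}
required = {"driver": "megaraid", "version": "6.610.15.00"}

if matches_hcl(installed, required):
    print("Driver matches the vSAN HCL entry")
else:
    print("Mismatch: installed %(driver)s %(version)s" % installed)
```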

At this point we have only checked the driver version. The firmware version, on the other hand, is hard to find from the host, and it depends on whether the vendor provides such functionality, but it's critically important to have it in accordance with the vSAN HCL. Please check with your vendor how to obtain this information; most of the time it comes from baseboard utilities.

Another thing to keep in mind is the ESXi lifecycle: when creating your Update Manager baselines, make sure to include only the drivers certified for vSAN; otherwise, it might install general purpose drivers and vSAN Health Check will report warnings again.


Monday, March 20, 2017

vSAN stretched cluster topology explained

The VMware vSAN 6.2 Stretched Cluster & 2 Node Guide does a great job of covering all the supported network topologies and the configuration steps involved in setting up such a solution, but it does not detail some network aspects that could go unnoticed and cause issues down the road if you did not consider them during the design phase.

That’s what I’m trying to help clarify today.

My example is based on VMware’s recommended architecture: a stretched L2 network between the data sites and an L3 network between the data sites and the witness site.

If you follow the guide, you will soon notice that the hosts from both sites would use the same gateway.

If you implement something like that, the traffic from Site B would have to traverse the link between sites in order to reach the witness site.

Guess what: in case of a link failure between the witness site and Site A, a link failure between the sites, or a gateway failure, hosts on site B would also be unable to reach the witness, even though there’s an alternate path through the gateway on site B.
Without access to the witness, the hosts on site B would not have access to more than 50% of the components and would therefore not be able to run or restart any VM.
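
To make the 50% rule concrete, here is a toy model. It assumes the simplest case, one object with one data component on each data site plus a witness component, each carrying one vote; real vSAN objects can have more components and votes, and has_quorum is a hypothetical helper, not a vSAN API.

```python
def has_quorum(reachable_votes, total_votes):
    """An object stays accessible only with more than 50% of its votes."""
    return reachable_votes * 2 > total_votes

# Assumed layout: one vote each for site A, site B and the witness.
votes = {"site_a": 1, "site_b": 1, "witness": 1}
total = sum(votes.values())

# Site B cut off from both site A and the witness: only its own vote.
print(has_quorum(votes["site_b"], total))                      # False
# Site B still reaching the witness through its local gateway.
print(has_quorum(votes["site_b"] + votes["witness"], total))   # True
```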

Back to my hypothetical scenario: to avoid this cross-link situation and make the hosts use their own local gateway and links, this is what I’m proposing.

- First, plan your vSAN IP address space in advance;
Let’s take the example below: my L2 stretched vSAN network is 192.168.0.X/24
I’ll configure half of the IPs to be used on site A and the other half on site B.
        * Hosts on site A, will have their vSAN VMkernel IPs from a range of 
, ( is already taken by the gateway).
        * Hosts on site B, will have their vSAN VMkernel IPs from a range of 
, ( is already taken by the gateway).

- Second, hosts will have static routes to the witness site using their local gateway;
        * Hosts on site A will have a static route through
        * Hosts on site B will have a static route through

Now the tricky part: on the witness gateway, we have to tell it to use different routes to reach each subset of the vSAN network. Each gateway vendor might implement this in its own way.

In my design, from the witness gateway’s perspective, I made my /24 network look like two /25 networks, even though I did not make any change underneath.

Now I can add 2 static routes on the witness gateway, one for each site going through its own link and gateway
        * Route to through
        * Route to through
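
The split of the /24 into two /25 views can be checked with Python's standard ipaddress module. This is a sketch under assumptions: I take 192.168.0.0/24 as the stretched network from my example, and which half maps to which site is a labeling choice made here for illustration only.

```python
import ipaddress

# Stretched vSAN network from the example above.
vsan_net = ipaddress.ip_network("192.168.0.0/24")

# Splitting one bit deeper yields the two /25 halves.
lower_half, upper_half = vsan_net.subnets(prefixlen_diff=1)

# Assumption for illustration: lower half = site A, upper half = site B.
for site, net in (("site A", lower_half), ("site B", upper_half)):
    hosts = list(net.hosts())
    print(site, net, "usable:", hosts[0], "-", hosts[-1])
```

Each half can then be the destination of its own static route on the witness gateway, while the hosts themselves keep the original /24 mask.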

Another option would be to use dynamic routing among the gateways, which I’m not covering in this post.

This way, communication will always flow through the local gateway and link, providing full high availability for each site.

I’m pretty sure there are a lot of different topologies, site interconnections, and technologies to solve this issue.

This example is not a one-size-fits-all solution, but my main goal here is to create awareness that this routing discussion must take place.

Who am I

I’m an IT specialist with over 15 years of experience, ranging from IT infrastructure to management products, with troubleshooting and project management skills in medium to large environments. Nowadays I'm working for VMware as a Senior Consultant, helping customers embrace the Cloud Era and succeed on this journey. Despite the fact that I'm a VMware employee, these postings reflect my own opinion and do not represent VMware's positions, strategies or opinions.
