Wednesday, April 12, 2017

vSphere 6.5 Enhanced Logging


Among all the amazing new features and enhancements in vSphere 6.5, there's always one or two that fly under the radar, until you need them.

That's the case with this small but crucial enhancement to the logging functionality: it now comes with detailed information about who did what and when, making it more audit-quality and security focused than the previous, troubleshooting-oriented logging.

Let's take an example: I'll change the vCPU and memory allocation on a VM.

On vSphere 6, the only thing you're going to see is a reconfigure event, without much information about what was changed.


You could guess that the ESXi logs might have more details. Guess again!
There are no details about what changed there either, just that a reconfigure action was performed on the specified VM.

Now, on vSphere 6.5, this has been improved considerably, adding sections for modified, added and removed components.


Now we can clearly see that the vCPU count has been increased from 1 to 2 and the memory has been reduced from 512 MB to 64 MB.

And what about the ESXi logs? You can find the same information there.
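If you want to check on the host itself, a quick way to spot those entries (just a rough sketch; the exact wording of the log lines varies between builds) is to search hostd's log for the reconfigure task:

run: grep -i reconfigure /var/log/hostd.log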

 
This makes it much easier to diagnose an issue and revert to the previous state, if you believe the change is what caused the issue.

For those with stronger audit-trail requirements, that's a good motivator to move to vSphere 6.5.

See you next time, folks.
 

Tuesday, April 4, 2017

Identifying controllers and drivers for vSAN


Some of the critical components in a vSAN environment are its I/O controllers and their drivers and firmware. VMware provides two sets of guidelines, one for general purpose use and one specifically for vSAN, and they are NOT necessarily the same.

When you install ESXi, either through the VMware ISO or a vendor's custom installation ISO, it comes with general purpose drivers, which might lead to issues in your vSAN environment, so you need to replace them with the vSAN-certified ones.

Let me make this crystal clear: if you are building a vSAN environment you MUST use the firmware and drivers certified for vSAN.

There are many ways and tools to help you correctly identify which controller model, driver and firmware are right for your environment.

So, let's see how to check them:

First, identify whether your card has been detected and its driver loaded by the host.
run: esxcfg-scsidevs -a
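The output lists every adapter alongside the driver that claimed it. In my case the relevant line looked roughly like this (illustrative only; your description, UID and PCI address will differ):

vmhba5  lsi_mr3  link-n/a  sas.5xxxxxxxxxxxxxxx  (0000:05:00.0) LSI MegaRAID SAS Controller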

As you can see, my controller has been loaded as adapter vmhba5 and it's using the lsi_mr3 driver.

Second, let's also check which driver version this module is using, so we can compare it with the HCL later.
run: vmkload_mod -s "module"   (in my example the module is lsi_mr3)
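The command dumps the module's metadata; the line we are after is the Version one. Trimmed down, the output looks something like this (illustrative):

vmkload_mod module information
 input file: /usr/lib/vmware/vmkmod/lsi_mr3
 Version: 6.610.17.00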

In the example, it's running version 6.610.17.00.

Third, let's list the device's details. You can filter by adapter name; I'm just filtering for anything with hba in it.
run: vmkchdev -l | grep hba
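Each line shows the device's PCI address, the vendor/device and sub-vendor/sub-device ID pairs, and the alias it was mapped to. Something along these lines (illustrative):

0000:05:00.0 1000:005d 1014:0454 vmkernel vmhba5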


The important thing here is the four four-digit codes,
in my example 1000:005d 1014:0454, representing:

VID - Vendor ID: 1000
DID - Device ID: 005d
SVID - SubVendor ID: 1014
SSID - SubDevice ID: 0454

Now let’s find exactly what this controller is.

Go to the VMware HCL (remember, it's the general purpose HCL; at this step we are just identifying the correct controller model).
Switch the filter to IO Devices, then use the fields on the right to filter with the details we just grabbed in the previous step.
Once you update the results you will get your controller model.


Remember, this is the general purpose information; the idea is just to identify the controller based on the codes we got directly from the hardware itself.

Now let's switch to the vSAN HCL.
Switch the filter to vSAN and click on "Build Your Own based on Certified Components".


With the controller model we just found, type it into the keyword field.


Now we have it.
Click on the controller to see its firmware and driver requirements.


As you can see, in my example I should be using the megaraid driver version 6.610.15.00 instead of the lsi_mr3.

After I changed the driver from lsi_mr3 to megaraid, the vSAN health check status changed from Warning to Passed.
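In case you are wondering how to perform the swap itself, a rough esxcli sketch is below; the VIB name and the offline bundle path are placeholders, so use the exact names from the vSAN HCL and your vendor's download page, and plan for maintenance mode and a reboot:

run: esxcli software vib list | grep -i mr3       (check what is currently installed)
run: esxcli software vib remove -n lsi_mr3        (remove the general purpose driver)
run: esxcli software vib install -d /vmfs/volumes/datastore1/vsan-certified-driver-bundle.zip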

At this point we have only checked the driver version. The firmware version, on the other hand, is harder to find from the host and depends on whether the vendor exposes such functionality, but it's critically important to keep it in line with the vSAN HCL as well. Please check with your vendor how to obtain this information; most of the time it comes from the baseboard management utilities.

Another thing to keep in mind is the ESXi lifecycle: when creating your Update Manager baselines, make sure to include only the drivers certified for vSAN; otherwise it might install general purpose drivers again and the vSAN Health Check will go back to reporting warnings.

 

Monday, March 20, 2017

vSAN stretched cluster topology explained


The VMware vSAN 6.2 Stretched Cluster & 2 Node Guide covers in great detail all the supported network topologies and the configuration steps involved in setting up such a solution, but it does not detail some network aspects which could go unnoticed and cause issues down the road if you did not consider them during the design phase.

That’s what I’m trying to help clarify today.

My example is based on VMware's recommended architecture: a stretched L2 network between the data sites and an L3 network between the data sites and the witness site.

If you follow the guide, you will soon notice that the hosts from both sites would use the same gateway.

If you implement something like that, the traffic from Site B has to traverse the inter-site link in order to reach the witness site.


Guess what: in case of a link failure between the witness site and Site A, a link failure between the data sites, or a failure of the gateway itself (192.168.0.1), the hosts on Site B would be unable to reach the witness, even though there's an alternate path through the gateway on Site B (192.168.0.129).
Without access to the witness, the hosts on Site B would lose access to more than 50% of the components and would not be able to run or restart any VMs.

Back to my hypothetical scenario: to avoid this cross-link situation and make the hosts use their own local gateway and links, this is what I'm proposing.

- First, plan your vSAN IP address space in advance;
Let's take the example below: my L2 stretched vSAN network is 192.168.0.x/24.
I'll configure half of the IPs to be used on Site A and the other half on Site B.
        * Hosts on Site A will have their vSAN VMkernel IPs from the range
          192.168.0.2-192.168.0.126 (192.168.0.1 is already taken by the gateway).
        * Hosts on Site B will have their vSAN VMkernel IPs from the range
          192.168.0.130-192.168.0.254 (192.168.0.129 is already taken by the gateway).

- Second, hosts will have static routes to the witness site using their local gateway (see the example after this list);
        * Hosts on site A will have a static route through 192.168.0.1
        * Hosts on site B will have a static route through 192.168.0.129
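A hedged sketch of what that looks like on an ESXi host in Site A, assuming (for illustration only) that the witness site's vSAN network is 192.171.0.0/24:

run: esxcli network ip route ipv4 add --network 192.171.0.0/24 --gateway 192.168.0.1

Hosts in Site B would get the equivalent route pointing at 192.168.0.129 instead.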

Now the tricky part: on the witness gateway (192.171.0.1), we have to tell it to use different routes to reach each subset of the vSAN network, and each gateway vendor might implement this in its own way.

In my design, from the witness gateway's perspective, I made my /24 network look like two /25 networks, even though I did not make any change underneath.

Now I can add two static routes on the witness gateway, one for each site, going through its own link and gateway (a vendor-specific example follows this list):
        * Route to 192.168.0.0/25 through 192.169.0.10
        * Route to 192.168.0.128/25 through 192.170.0.10
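On a Cisco-style router, for instance, that could translate into something like the lines below; this is purely illustrative, so check your own vendor's syntax:

ip route 192.168.0.0 255.255.255.128 192.169.0.10
ip route 192.168.0.128 255.255.255.128 192.170.0.10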

Another option would be to use dynamic routing among the gateways, but I'm not covering that in this post.

This way, communication will always flow through its local gateway and link, providing full high availability for each site.


I'm pretty sure there are a lot of different topologies, site interconnections and technologies that could solve this issue.

This example is not a one-size-fits-all solution, but my main goal here is to create awareness that this routing discussion must take place.
 

Monday, March 13, 2017

vSphere Replication Traffic Isolation


Once again I was working on a vSphere Site Recovery Manager project using vSphere Replication as the replication technology and had to explain how to isolate the replication traffic from the other network traffic. While explaining it is never a problem, I was wondering how many people out there still have doubts about it, so creating this post is my way of trying to reach as many of you as I can.

But before we start, we should ask ourselves: why isolate the replication traffic?

Isolating the replication traffic can enhance network performance in the data center because you separate it from the other business-critical network traffic; you can then apply individual prioritization and QoS methods, use a dedicated physical uplink or even an entirely separate network, and improve monitoring and troubleshooting because you know exactly what the purpose of the traffic is and where it flows. Security is enhanced as well, since you don't mix and match them. It's all benefits ; )

While the vSphere Replication Installation and Configuration guide has a procedure for accomplishing the isolation, it does not really explain how it works or what the replication flow looks like. Let's see if I can shed some light on it.

Let’s work on a hypothetical environment:

- 2 sites;
- a management segment on each site (VLAN 120 and 220);
- a replication segment on each site (VLAN 110 and 210).



The goal, again, is to isolate the replication traffic from the other traffic.

Obs: I'm intentionally hiding other services like vMotion or the VM networks to keep it simple, but imagine they are all running on their own segments.

Now let's set up the environment properly.

ESXi Preparation:
The way vSphere Replication isolation works is by sending and receiving the replication traffic through specific VMkernel adapters, as we will see below.

On each ESXi host, create 2 new VMkernel adapters; make sure to select the portgroup which correlates to the VLAN ID of the replication segment for that site and configure an appropriate IP address for the adapter (don't forget to select the right service for each adapter; there is also an esxcli sketch after this list):

- One for outgoing traffic (vSphere Replication traffic)

- One for incoming traffic (vSphere Replication NFC traffic)
 Obs: We are creating both VMkernel adapters on each host so they can work bi-directionally, which means a host can be the source of a replication but also a destination.
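If you prefer the command line over the Web Client, a rough esxcli sketch for one host is shown below; the portgroup name and the IP addresses are placeholders I made up for illustration, and the replication services are applied with the tag command:

run: esxcli network ip interface add --interface-name vmk2 --portgroup-name Replication-PG
run: esxcli network ip interface ipv4 set --interface-name vmk2 --type static --ipv4 192.168.110.11 --netmask 255.255.255.0
run: esxcli network ip interface tag add --interface-name vmk2 --tagname vSphereReplication

run: esxcli network ip interface add --interface-name vmk3 --portgroup-name Replication-PG
run: esxcli network ip interface ipv4 set --interface-name vmk3 --type static --ipv4 192.168.110.12 --netmask 255.255.255.0
run: esxcli network ip interface tag add --interface-name vmk3 --tagname vSphereReplicationNFC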

If you remember the ESXi TCP/IP stacks, there is no individual stack for replication, so the host would use the default gateway (on the management interface) to send replication traffic to a routed segment; that's not our goal.
In this case, we must add a static route to each and every ESXi host, telling it to reach the replication segment on the other site through the new VMkernel interface. KB2001426 is a very nice KB on how to add static routes to ESXi hosts.
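Assuming, for illustration only, that Site A's replication segment is 192.168.110.0/24 with a gateway of 192.168.110.1 and Site B's is 192.168.210.0/24, a host on Site A would get something like this (and the mirror-image route on Site B):

run: esxcli network ip route ipv4 add --network 192.168.210.0/24 --gateway 192.168.110.1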


vSphere Replication Preparation:
vR comes with a single vNIC, which is used for everything: management traffic, which involves communication with vCenter and ESXi as well as coordination with other vR appliances, plus the replication itself.
Since we want to isolate the traffic, we will add a second vNIC just for this purpose.

- First, shut down the vR appliance;
- Add a second vNIC;
- Power it on and access its VAMI console (https://"vr_ip":5480);
- On the Network tab, select Address;
- Scroll down to eth1 and add an IP from the replication segment according to the site;



- Go back to the VR tab and select Configuration;
- Add the recently configured replication IP in the "IP Address for Incoming Storage Traffic" field.

As was the case with ESXi, we don't want to use the default gateway on the management segment to send the replication traffic, so we need to add a static route to vR as well.
Since it's a Linux box, we can add the static route information to the /etc/sysconfig/network/routes file (see the example after these steps);
- Restart the network services.
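Sticking with the same made-up subnets as before, an entry in /etc/sysconfig/network/routes on the Site A appliance could look like the line below (destination, gateway, netmask, interface):

192.168.210.0  192.168.110.1  255.255.255.0  eth1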

Now, with everything ready, let's see how the replication flows:


- Configure a VM for replication;

- Once an RPO is met, the ESXi host running the VM which needs to be replicated sends its data through the VMkernel adapter set up for vSphere Replication traffic to the vSphere Replication server on the destination site (dark blue flow).

- The vSphere Replication appliance on the destination site receives and buffers the data and then sends it to an ESXi host on the destination site, which receives it through the VMkernel adapter set up for vSphere Replication NFC traffic and saves the data to the destination datastore (red flow).

As we can see, the traffic is isolated from the management segment and hopefully from the others.

Some might argue that you could create one dedicated segment for vSphere Replication traffic (outgoing) and another for vSphere Replication NFC traffic (incoming), but personally I believe that, since it's all replication-related traffic, breaking it down further just adds complexity, so I like to keep it simple with a single segment for all replication traffic, incoming and outgoing.

We are done. You can start a replication and check whether the traffic is flowing the way it's supposed to.

Hope this makes it clear how to isolate vSphere Replication traffic.

Who am I

I'm an IT specialist with over 15 years of experience, working from IT infrastructure to management products, with troubleshooting and project management skills in medium to large environments. Nowadays I'm working for VMware as a Senior Consultant, helping customers embrace the Cloud Era and make this journey successful. Despite the fact that I'm a VMware employee, these postings reflect my own opinion and do not represent VMware's positions, strategies or opinions.
