Monday, March 20, 2017

vSAN stretched cluster topology explained

VMware vSAN 6.2 Stretched Cluster & 2 Node Guide covers greatly all the network topologies supported and the configuration steps involved on setting up such solution, but the guide does not detail some network aspects which could pass unnoticed and cause some issues down the road in case you did not consider them during design phase.

That’s what I’m trying to help clarify today.

My example is based on VMware’s recommended architecture, a stretched L2 network between data sites and L3 network between data sites and witness site.

If you follow the guide, soon you will notice the hosts from both sites would use the same gateway.

 If you implement something like that, the traffic from Site B would have to transverse the link between sites in order to reach the witness site.

Guest what, in case of link failure between witness site and Site A, link failure between sites or gateway ( failure, hosts on site B would also be unable to reach the witness, even though there’s an alternate path through the gateway on site B (
Without access to the witness, the hosts on site B, would not have access to 50% plus of the components and then would not be able to run or restart any VM on it.

Back to my hypothetical scenario, to avoid this cross-link situation and make the hosts use their own local gateway and links this is what I’m proposing.

- First, plan your vSAN IP address space in advance;
Let’s take the example bellow: my L2 stretched vSAN network is 192.168.0.X/24
I’ll configure half of the IPs to be used on site A and the other half on site B.
        * Hosts on site A, will have their vSAN VMkernel IPs from a range of 
, ( is already taken by the gateway).
        * Hosts on site B, will have their vSAN VMkernel IPs from a range of 
, ( is already taken by the gateway).

- Second, hosts will have static routes to the witness site using it’s local gateway;
        * Hosts on site A will have a static route through
        * Hosts on site B will have a static route through

Now the tricky part, on the witness gateway (, we will have to tell it to use different routes to reach each subset of vSAN network, each gateway vendor might implement it in its own way.

On my design, from the witness gateway perspective, I made my /24 network looks like two /25 networks, even though I did not make any change underneath.

Now I can add 2 static routes on the witness gateway, one for each site going through its own link and gateway
        * Route to through
        * Route to through

Another option would be using dynamic routes among the gateways, not covering it on this post.

This way, communication will always follow through its local gateway and link, providing full high-availability for each site.

I’m pretty sure there are a lot of different topologies, sites interconnection, and technologies to solve this issue. 

This example is not a single size fits all solution, but my main goal here is to create the awareness that this routing discussion must be taken. 

Monday, March 13, 2017

vSphere Replication Traffic Isolation

Once again I was working on a vSphere Site Recovery Manager project using vSphere Replication as a replication technology and had to explain how to isolate replication traffic from others network traffics, while explaining it is never a problem, I was wondering how many out there still have doubts about it, so creating this post is my way to try reach as many of you as I can.

But before we start we should ask ourselves, why isolate the replication traffic ?

Isolating the replication traffic might enhance the network performance in the data center because you isolate it from the other critical business network traffics, then we can apply individual prioritization and QoS methods, utilize a dedicated physical uplink or entirely network, enhance your monitoring and troubleshooting because you know exactly what is the purpose of the traffic and where they are flowing, security is enhanced as well as you don’t mix and match them. It’s all benefits ; )

While vSphere Replication Installation and Configuration guide has a procedure on how to accomplish the isolation, it does not really explain how it works and what the replication flow looks like, let’s see if I can through some lights on it.

Let’s work on a hypothetical environment:

- 2 sites;
- a management segment on each site (VLAN 120 and 220);
- a replication segment on each site (VLAN 110 and 210).

The goal, again, is to isolate the replication traffic from other traffics.

Obs: I’m intentionally hiding other services like vMotion or VM’s network to keep it simple, but imagine they are all running on their own segments.

Now let’s setup the environment properly

ESXi Preparation:
The way vSphere replication isolation works is sending and receiving the replication traffic through some specific VMkernel adapters, as we will see bellow.

On each ESXi hosts, create 2 new VMkernel adapters, make sure to select the portgroup which correlates to the VLAN ID for the replication segment for each site and configure an IP address accordingly for the adapter. (don't forget to select the right service for each adapter)

******* Updated information 07/12/2017  **************
I might be to conservative in here, creating one VMkernel for each traffic direction, while it allows more granular control it also might create some additional complexity and requires more effort setting up the environment.

The point is, if you want, you can create a single VMkernel and enable both services, 
got it ?

- One for outgoing traffic (vSphere Replication traffic)

- One for incoming traffic (vSphere Replication NFC traffic)
 Obs: We are creating both VMkernels adapter on each host, so they can work bi-directional, which means, it could be a source of a replication but also a destination.

If you remember ESXi TCP/IP stack there’s not individual stack for replication, so it would use the default gateway (on management interface) to replicate the traffic to a routed segment, that’s not our goal.
In this case, we must add a static route to each and every ESXi host telling to use another route through the new VMkernel interface to reach the replication segment on the other site. (KB2001426) is a very nice KB on how to add static routes to ESXi hosts.

vSphere Replication Preparation:
vR comes with a single vNIC, which is used for everything, management traffic, which involves communication with vCenter and ESXi and coordination with other vRs as well, plus replication.
Since we want to isolate the traffic, we will add a second vNIC just for this purpose.

- First, shutdown the vR appliance;
- add a second vNIC;
- power it on and access it’s VAMI Console. (https://"vr_ip":5480)
- On the Network tab, select Address;
- Scroll down to the eth1 and add an IP of the replication segment according to the site;

- Go back to VR tab, select Configuration;
- Add the recently configured replication IP at IP Address for incoming Storage Traffic field.

As has been on ESXi, we don’t want to use the default gateway, on management segment, to send the replication traffic, so we need to add a static route to vR.
Since it's a Linux box we can add the static routes information to /etc/sysconfig/network/routes file.
- Restart network services.

Now with everything ready, let's see how the replication flows:

- Configure on VM for replication; 

- Once an RPO is met, the ESXi hosting the VM which needs to be replicated sends it’s data through the VMkernel setup as vSphere Replication Traffic to the vSphere Replication Server on the destination site. (Dark Blue flow)

The vSphere Replication on the destination site receives and buffer the data and then sends it to the ESXi host on the destination site, which receives it through the VMkernel setup as vSphere Replication NFC traffic and saves the data to the destination datastore. (Red flow)

As we can see the traffic is isolated from management segment and hopefully from others.

Others can argue that you could create a dedicated segment for vSphere Replication traffic (outgoing) and other for vSphere Replication NFC traffic (incoming), but personally I believe, since it’s all replication traffic related, breaking it down further just adds complexity, so I like to keep it simple with a single segment for all replication traffic, incoming and outgoing.

We are done, you can start a replication and see if the traffic is following how it supposed to.

Hope this makes clear on how to isolate vSphere Replication traffic.

Thursday, March 2, 2017

vRealize Production Test

Maintain a healthy environment is part of any IT Administrator’s duty,  sometimes is hard to assess several components that might compound a solution, would it not be nice to have a tool that looks at these solutions and provides you a health report ?


Well, that’s what vRealize Production Test is all about. 


It started as a testing tool to vRealize Automation but VMware has been evolving it during the past months and now it includes engines to test vRealize Business and vRealize Operations as well. 


Let’s see how it works. 


-       Download vRealize Production Test jar file;






It’s part of the vRealize Automation download page, just go to Drivers & Tools and there is it.


 -       Open a command prompt and navigate to the directory where you save it;

      It’s based on Java, so be sure you have java installed on the system where you will run it from.


-       Start configuring the tool, running:  

java –jar vrealize-productiontest-1.7.0.jar config

Obs: make sure you change to your version file accordingly


When running for the first time you need to accept the EULA

-       On the left menu, you should select each product you wanna test and fill out its information;

-       Fill out the field appropriately and Save;

It will create an XML file with the information you just provide


-    To start checking your environment run:

java –jar vrealize-productiontest-1.7.0.jar run --oobList VRA

Once it’s done, there will be a html-reports folder on the source where you ran the script, inside that, open the VRPTR.html file

The report shows a nice summary of findings, but also a detailed section, where you can learn what has been found and the details on how to fix that.

If you want to learn more about the tool, there’s a detailed guide as well.


OBS: don’t be alarmed about the number of issues on my environment, I intentionally gave some bad hostnames and credentials to generate more errors ; )

Who am I

My photo
I’m an IT specialist with over 15 years of experience, working from IT infrastructure to management products, troubleshooting and project management skills from medium to large environments. Nowadays I'm working for VMware as a Consulting Architect, helping customers to embrace the Cloud Era and make them successfully on their journey. Despite the fact I'm a VMware employee these postings reflect my own opinion and do not represents VMware's position, strategies or opinions. Reach me at @dumeirell

Most Viewed Posts

Blog Archive