vSAN stretched cluster topology explained

The VMware vSAN 6.2 Stretched Cluster & 2 Node Guide does a great job of covering the supported network topologies and the configuration steps involved in setting up such a solution, but it does not detail some network aspects that could go unnoticed and cause issues down the road if you did not consider them during the design phase.

That’s what I’m trying to help clarify today.

My example is based on VMware’s recommended architecture: a stretched L2 network between the data sites and an L3 network between the data sites and the witness site.

If you follow the guide, you will soon notice that the hosts on both sites use the same gateway.

If you implement something like that, the traffic from site B has to traverse the link between the data sites in order to reach the witness site.

Guess what: if the link between the witness site and site A fails, the link between the data sites fails, or the gateway (192.168.0.1) fails, the hosts on site B will also be unable to reach the witness, even though there’s an alternate path through the gateway on site B (192.168.0.129).

Without access to the witness, the hosts on site B would not have access to more than 50% of the components and therefore would not be able to run or restart any VM.
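To make the failure cases above concrete, here is a rough sketch that models each path from site B to the witness as a set of components that must all be up. The component names are hypothetical labels I made up for this illustration; they don't come from any VMware tool.

```python
# Hypothetical component labels, used only for this illustration.
# Path taken by site B when every host uses the site A gateway.
SHARED_GATEWAY_PATH = {
    "inter_site_link",
    "gateway_192.168.0.1",
    "site_a_witness_link",
}

# Alternate path through site B's own gateway and link.
LOCAL_GATEWAY_PATH = {
    "gateway_192.168.0.129",
    "site_b_witness_link",
}


def witness_reachable(path, failed):
    """A path works only if none of its components has failed."""
    return path.isdisjoint(failed)


if __name__ == "__main__":
    for failure in sorted(SHARED_GATEWAY_PATH):
        shared = witness_reachable(SHARED_GATEWAY_PATH, {failure})
        local = witness_reachable(LOCAL_GATEWAY_PATH, {failure})
        print(f"{failure} down: shared path ok={shared}, local path ok={local}")
```

Any single failure on the shared path cuts site B off from the witness, while the path through its local gateway is unaffected by all three.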

Back to my hypothetical scenario: to avoid this cross-link situation and make the hosts use their own local gateway and links, this is what I’m proposing.

- First, plan your vSAN IP address space in advance;

Let’s take the example below: my L2 stretched vSAN network is 192.168.0.X/24

I’ll configure half of the IPs to be used on site A and the other half on site B.

* Hosts on site A will have their vSAN VMkernel IPs in the range 192.168.0.2-192.168.0.126 (192.168.0.1 is already taken by the gateway).

* Hosts on site B will have their vSAN VMkernel IPs in the range 192.168.0.130-192.168.0.254 (192.168.0.129 is already taken by the gateway).
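As a quick sanity check of the address plan, this minimal sketch uses Python's standard `ipaddress` module to split the /24 into two halves and derive the ranges listed above. The helper name `vmkernel_range` is my own, and it assumes the first usable address of each half is the gateway, as in this example.

```python
import ipaddress

# The stretched L2 vSAN network from the example.
stretched = ipaddress.ip_network("192.168.0.0/24")

# Logical split only: the hosts themselves still sit on the same /24.
site_a_net, site_b_net = stretched.subnets(prefixlen_diff=1)


def vmkernel_range(half):
    """Return (gateway, first_host_ip, last_host_ip) for one half.

    Assumes the first usable address of the half is the site gateway.
    """
    usable = list(half.hosts())
    return usable[0], usable[1], usable[-1]


if __name__ == "__main__":
    for name, half in (("Site A", site_a_net), ("Site B", site_b_net)):
        gateway, first, last = vmkernel_range(half)
        print(f"{name}: {half} gateway={gateway} hosts={first}-{last}")
```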

- Second, hosts will have static routes to the witness site using their local gateway;

* Hosts on site A will have a static route through 192.168.0.1

* Hosts on site B will have a static route through 192.168.0.129
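The sketch below shows one way to pick the right gateway for a host from its vSAN VMkernel IP and build the matching `esxcli` static route command. The witness network 192.171.0.0/24 is an assumption on my part (I only know the witness gateway is 192.171.0.1), and the function names are made up for this example, so adjust them to your environment.

```python
import ipaddress

# Each logical half of the stretched /24 and its local gateway.
SITE_GATEWAYS = {
    ipaddress.ip_network("192.168.0.0/25"): ipaddress.ip_address("192.168.0.1"),
    ipaddress.ip_network("192.168.0.128/25"): ipaddress.ip_address("192.168.0.129"),
}

# Assumption: the witness vSAN network is a /24 around gateway 192.171.0.1.
WITNESS_NET = ipaddress.ip_network("192.171.0.0/24")


def local_gateway(vmk_ip):
    """Return the gateway of the site that owns this VMkernel IP."""
    addr = ipaddress.ip_address(vmk_ip)
    for half, gateway in SITE_GATEWAYS.items():
        if addr in half:
            return gateway
    raise ValueError(f"{vmk_ip} is not in the stretched vSAN network")


def static_route_command(vmk_ip):
    """Build the esxcli command that adds the witness route on a host."""
    return (
        "esxcli network ip route ipv4 add "
        f"--gateway {local_gateway(vmk_ip)} --network {WITNESS_NET}"
    )


if __name__ == "__main__":
    for host_ip in ("192.168.0.10", "192.168.0.200"):
        print(host_ip, "->", static_route_command(host_ip))
```

Generating the command from the host IP keeps the address plan and the routes in sync, so a host cannot end up pointing at the other site's gateway by mistake.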

Now the tricky part: on the witness gateway (192.171.0.1), we have to tell it to use a different route to reach each subset of the vSAN network. Each gateway vendor might implement this in its own way.

In my design, from the witness gateway's perspective, I made my /24 network look like two /25 networks, even though I did not change anything underneath.

Now I can add two static routes on the witness gateway, one for each site, each going through its own link and gateway:

* Route to 192.168.0.0/25 through 192.169.0.10

* Route to 192.168.0.128/25 through 192.170.0.10
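To illustrate how the witness gateway would choose between these two routes, here is a simplified longest-prefix-match lookup. It is only a model of the routing decision, not the configuration syntax of any particular gateway vendor, and the names `WITNESS_ROUTES` and `next_hop` are my own.

```python
import ipaddress

# The two static routes from the list above: (destination, next hop).
WITNESS_ROUTES = [
    (ipaddress.ip_network("192.168.0.0/25"), "192.169.0.10"),
    (ipaddress.ip_network("192.168.0.128/25"), "192.170.0.10"),
]


def next_hop(destination, routes=WITNESS_ROUTES):
    """Return the next hop of the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # Longest prefix wins, as in a regular routing table.
    best_net, best_hop = max(matches, key=lambda match: match[0].prefixlen)
    return best_hop


if __name__ == "__main__":
    for host_ip in ("192.168.0.20", "192.168.0.200"):
        print(host_ip, "via", next_hop(host_ip))
```

A site A host (for example 192.168.0.20) falls in the first /25 and a site B host (for example 192.168.0.200) in the second, so replies from the witness go back over the same site's link.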

Another option would be to use dynamic routing among the gateways, which I'm not covering in this post.

This way, communication will always flow through the local gateway and link, providing full high availability for each site.

I’m pretty sure there are a lot of different topologies, site interconnections, and technologies to solve this issue.

This example is not a one-size-fits-all solution, but my main goal here is to raise awareness that this routing discussion must take place.
