Wednesday, June 21, 2017

vSphere Integrated Containers – Performance over the Limits - 1/2


After I talked about resource management within vSphere Integrated Containers, some questions arose immediately:

What happens if I try to create containers above the limits?

That’s what I’m about to demonstrate today; let’s take this journey together, shall we?

By default, the VCH endpoint VM has 1 vCPU and 2GB of memory, and that’s what we’ll work with.
With that said, the first VCH I’ll create has a low CPU limit (only 100MHz);
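For reference, the creation command would look something along these lines (just a sketch of vic-machine create; the target, credentials and port group names are placeholders from my lab, and depending on your environment you may also need options such as --thumbprint):

vic-machine-linux create \
    --target vcenter.lab.local \
    --user administrator@vsphere.local \
    --name vch-low-cpu \
    --bridge-network vch-bridge \
    --public-network "VM Network" \
    --cpu 100 \
    --no-tlsverify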

Apparently the VCH creation went normally, but during the endpoint VM communication validation step some errors were reported, which looks like a timeout issue;

If we take a look at vSphere, the endpoint VM is taking a long time to boot up, just showing a banner on its console...
… eventually, it boots up, but with some errors on the console; you might try, but even a simple “ping” does not work.
Looking further, we can see on the performance chart that the VM is entitled to only 100MHz, the same amount that was specified during the VCH creation.
So during the boot process, the endpoint VM demands more than it’s entitled to; that’s when we see a high READY time, meaning the VM is ready to run but cannot get scheduled on the physical ESXi CPUs.
Now the timeout makes sense ; )


Let’s try something different this time and create a VCH with a low memory limit (only 100MB);

This time the VCH creation fails with a more descriptive message: "Failed to power on appliance. The available Memory resources in the parent resource pool are insufficient for the operation".
As we can see, there’s not enough memory to power on the endpoint VM, so the creation cannot proceed.

Well, it’s clear that, at a minimum, we need to set the limits higher than the VCH endpoint VM configuration.

I think that’s enough for one post; next I will cover what happens when the containers consume more than the limits.

Stay tuned !!

Tuesday, June 13, 2017

vSphere Integrated Containers – User Defined Network


A few weeks ago I talked about vSphere Integrated Containers networking: what each network is used for, its syntax, and how the traffic flows to and from the containers, but that was all from the point of view of vSphere administrators provisioning a virtual container host, VCH.

Developers, on the other hand, are used to creating their own networks, for several reasons: isolating containers from each other, creating a backend network for some application, or just getting service discovery outside of the default bridge network. These are called user-defined networks.

Let’s see how it works:
The standard deployment of VCH comes with a default bridge network;

When we create a container without any network specification, it’s connected to the port group backing the bridge network, which was specified during VCH provisioning, in this case "backend", and it gets an IP address from the 172.16.0.0/24 address space.
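If you want to follow along, it’s just a plain docker run against the VCH with no network options (the container name and image below are only examples); docker network inspect then shows the containers attached to the default bridge and the IPs they got:

docker run -itd --name web busybox
docker network inspect bridge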
 

Now, let’s create a user-defined network;
Obs: I’m using the --subnet option because I don’t have a DHCP server listening on that segment.
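This is the command I’m referring to; the network name and subnet below are just examples:

docker network create --subnet 10.10.10.0/24 my-backend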



This time I will create another container, connected to the user-defined network I just created.
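Nothing VIC-specific here either; it’s the standard Docker syntax (again, the names are placeholders):

docker run -itd --net my-backend --name db busybox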


As expected, the container is connected to the same port group backing the bridge network, but it received an IP address from the range specified during the user-defined network creation (10.10.10.0/24).



My point here is that, although they are connected to the same segment (port group), the different address spaces provide enough segregation between the containers.

That’s one of the reasons we recommend a dedicated segment for each VCH bridge network; otherwise, different users could create additional user-defined networks with the same address space as each other, which might inadvertently allow access to each other’s containers or cause an IP conflict.

See you next

Wednesday, June 7, 2017

vSphere Integrated Containers – Resource Manager

One of the many benefits of vSphere Integrated Containers, VIC, is its ability to control and manage resource allocation.
Let’s make a comparison between VIC and a traditional container deployment to clarify what I mean by that.

With a traditional container deployment, you have to size your container host upfront, and we all know how easy it is to foresee adoption and future growth, right ?!
Inevitably you will end up in one of two situations: either you sized your container host too small and in a few weeks or months it will be full and your developers will ask for another one, or you sized it too big and the container host sits out there just wasting resources which could be utilized somewhere else. Not efficient.
Let’s be honest, neither of them is a good scenario.

VIC, on the other hand, approaches resource allocation in a different way:
first, when you create your virtual container host, VCH, you are not allocating resources, you are just defining its boundaries; think of it as the vSphere resource pool definition we have all known for years.

When you create your VCH, it will show up in vCenter as a vApp (nothing really new).


By default, the VCH is created without any limitation; just edit the VCH settings and you will see it.

At this point, you are probably worried that your developers could consume ALL your resources.
Luckily, VIC has the tools to solve that problem: during VCH creation you can specify limits for memory (in MB) and CPU (in MHz), just by adding the options --memory “size” and/or --cpu “amount".
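In other words, you just append the limit flags to whatever create command you are already using; for example (the values below are only illustrative):

vic-machine-linux create [your usual options] --memory 8192 --cpu 2000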


Now the limitation is applied to the vApp

It's also reported back to the developers.
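If you are curious how the developers see it, the totals reported by docker info against the VCH endpoint should reflect the limits (the address and port below are just my lab’s assumptions, a VCH with TLS listening on 2376):

docker -H vch01.lab.local:2376 --tls info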



Well, that does not protect us from unexpected growth, does it?
But since the VCH is just a resource pool-like object, you can manually edit it to expand or shrink its limits without any impact or downtime to the running containers.
It’s what I call an elastic solution !!!

What about the containers themselves ?

By default, they are created with 2 vCPUs and 2GB of RAM.

If you want, you can give them more or fewer resources; just add the options --memory “size” and/or --cpuset-cpus “number of vCPUs” when creating your container.
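For example, a container with 4 vCPUs and 4GB of RAM would be created with something like this (image and name are placeholders; as far as I can tell, VIC interprets the --cpuset-cpus value as the number of vCPUs for the container VM):

docker run -d --cpuset-cpus 4 --memory 4g --name big-app nginx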


Remember, since every container is a unique VM in vCenter, you can see its allocation is properly set up.



Now you can size your container host like a boss !!!

Thursday, June 1, 2017

Top vBlog 2017


http://topvblog2017.questionpro.com/

Once again, Eric Siebert launched his popular VMware-related blog contest for 2017. We should give this man credit; despite all the personal challenges, he made the effort to put it out there again.
Thanks, Eric !!!

When voting, try to keep in mind the following criteria:

  • Longevity – Anyone can start a blog but it requires dedication, time & effort to keep it going. Some bloggers start a blog only to have it fall to the wayside several months later. Things always come up in life but the good bloggers keep going regardless of what is happening in their life.
  • Length – It’s easy to make a quick blog post without much content, nothing wrong with this as long as you have good content in the post that people will enjoy. But some bloggers post pretty long detailed posts which take a lot of time and effort to produce. The tip of the hat goes to these guys that burn the midnight oil trying to get you some great detailed information.
  • Frequency – Some bloggers post several times a week which provides readers with lots of content. This requires a lot of effort as bloggers have to come up with more content ideas to write about. Frequency ties into length, some do high frequency/low length, some do low frequency/high length, some do both. They’re all good and require a lot of time and effort on the bloggers part.
  • Quality – It all comes down to what's in the blog post regardless of how often or how long the blog posts are. After reading a blog post if you come away with learning something that you did not previously know and it benefits you in some way then you know you are reading a quality post. Good quality is usually the result of original content, it's easy to re-hash something previously published elsewhere, the good bloggers come up with unique content or put their own unique spin on popular topics.
 
What are you waiting for? Show your appreciation for the bloggers that help you the most.

GO VOTE !!!

Ohhh yes, I'm there as well...


Let's see if I can do better this year ; )


 

Friday, May 19, 2017

vSphere Integrated Containers - Networking


At this point, you probably already heard about vSphere Integrated Containers and how it solves many of the challenges with traditional containers management at scale.

One of its key pillars is the implementation of Container-VMs, meaning that for each container you have a corresponding VM; this approach allows us to use vSphere Distributed Port Groups or NSX Logical Switches to secure and segregate network communication as never seen before.

Let’s examine what network options are available for VIC.

Public Network
Mandatory: YES
Syntax: --public-network “port_group”
Its main use is to pull container images from registries, like Docker Hub, and to enable access to container services through port mapping.
If no other optional network is specified during VCH creation, it will also handle the communication from the VCH to vCenter and the ESXi hosts, as well as the Docker API calls from clients creating and managing containers.

Bridged Network
Mandatory: YES
Syntax: --bridge-network “port_group”
It’s the segment where the containers communicate with each other.
If the management network is not specified, it will also handle the communication between the VCH and the containers.
Best practice is to use a dedicated segment for each VCH.

Management Network
Mandatory: NO
Syntax: --management-network “port_group”
When specified, it will be used by the VCH to communicate with vCenter and the ESXi hosts, and for the communication between the VCH and the containers.

Client Network
Mandatory: No
Syntax: --client-network “port_group”
When specified, it will be used to expose the Docker API service that clients use to create and manage containers.

Containers Network
Mandatory: No
Syntax: --container-network “port_group:friendly_name”
When specified, it will expose containers services directly on the network, bypassing the VCH altogether.
That’s a great option that only VIC can provide, avoiding the single point of failure the VCH can be and also providing dedicated network bandwidth per container.
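Putting all the syntaxes above together, a VCH using every one of these networks would be created with something along these lines (just a sketch; the port group names and the "routable" friendly name are placeholders, and credentials/TLS options are omitted):

vic-machine-linux create \
    --target vcenter.lab.local \
    --user administrator@vsphere.local \
    --name vch01 \
    --public-network pg-public \
    --bridge-network pg-bridge \
    --management-network pg-mgmt \
    --client-network pg-client \
    --container-network pg-containers:routable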

Let’s see all the communication possible within this solution.



1 – Developers create containers through the Client network, leveraging Docker API calls
2 – The VCH pulls images from Docker Hub through the Public network
3 – The VCH, through the Management network, calls vCenter and the ESXi hosts to create Container-VMs
4 – Through the Public network, users access container services via port mapping
5 – The frontend container app accesses data from the backend database through the Bridge network
6 – Users connect directly to the containers' services through the Container network

If you want to learn more about the networking and advanced examples, please, check VIC documentation here.

Wednesday, April 26, 2017

vSAN HCL Warning

A few weeks ago I talked about Identifying your controllers and installing the correct firmware and drivers for a vSAN environment.
Even if you followed my recommendations, vSAN Health Check may still show a warning for the vSAN cache tier controllers.


Don’t be afraid; there’s a bug that prevents Health Check from correctly identifying PCIe/NVMe controllers, so the alarm is triggered even with the certified driver and firmware installed.

It took me some time to find this information, so I thought I could save some of your time too.
Here's KB2146676, which outlines this bug.

Obs: This bug has been fixed in vSAN 6.5

Wednesday, April 12, 2017

vSphere 6.5 Enhanced Logging


Among all the amazing new features and enhancements of vSphere 6.5 there are always one or two that fly under the radar, until you need them.

That’s the case with this small but crucial enhancement to the logging functionality: it now comes with detailed information about who did what and when, becoming more audit- and security-focused than the previous, troubleshooting-oriented logging.

Let’s take an example: I'm changing the vCPU and memory allocation on a VM.

On vSphere 6, the only thing you're going to see is a reconfigure event, without much information about what has been changed.


You could guess that the ESXi logs might have more details. Guess again !!!
There are no details about what has changed there either, just that a reconfigure action has been performed on the specified VM.

Now, on vSphere 6.5, this has been improved considerably, adding sections for modified, added and removed components.


Now we can clearly see that the vCPU count has been increased from 1 to 2 and the memory has been reduced from 512MB to 64MB.

Guess what about ESXi’s logs… you can find the same information there.
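If you want to dig into the host yourself, the reconfigure entries end up in hostd.log, so a simple grep is enough (assuming the default log location on ESXi):

grep -i reconfigure /var/log/hostd.log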

 
It will be great for diagnosing an issue and reverting to the previous state, if you believe that's what caused the issue.

For those with stronger audit-trail requirements, that's a good motivator to move to vSphere 6.5.

See you next folks
 

Tuesday, April 4, 2017

Identifying controllers and drivers for vSAN


Some of the most critical components in a vSAN environment are its I/O controllers and flash devices, along with their drivers and firmware. VMware provides two sets of guidelines, one for general purposes and one specifically for vSAN, and they are NOT necessarily the same.

When you install ESXi, either through the VMware ISO or a vendor’s custom installation ISO, it comes with general-purpose drivers, which might lead to issues in your vSAN environment, so you need to replace them with the vSAN-certified ones.

Let me make it crystal clear here: if you are building a vSAN environment, you MUST use the firmware and drivers certified for vSAN.

There are many ways and tools to help you correctly identify which controller model and driver are right for your environment.

So, let’s see how to check what might be those:

First, identify if your card has been found and loaded by the host.
run: esxcfg-scsidevs -a

As you can see, my controller has been loaded as adapter vmhba5 and it’s using the lsi_mr3 driver.

Second, let’s also check which driver version this module is using, so we can compare it with the HCL later.
run: vmkload_mod -s “module”   (in my example it’s using lsi_mr3)

In the example, it's running version 6.610.17.00.

Third, let’s list the device’s details. You can filter by adapter name; I’m just filtering anything with hba in it.
run: vmkchdev -l | grep hba


The important thing here is the four four-digit codes, in my example 1000:005d 1014:0454, representing:

VID - Vendor ID: 1000
DID - Device ID: 005d
SVID - SubVendor ID: 1014
SSID - SubDevice ID: 0454

Now let’s find exactly what this controller is.

Go to the VMware HCL (remember, it’s the general-purpose HCL; at this step we will just identify the correct controller model).
Switch the filter to IO Devices, then use the fields on the right to filter with the details we just grabbed from the previous step.
Once you update the results you will get your controller model.


Remember it’s the general purpose information, and the idea is just to identify the controller based on the codes we got directly from the hardware itself.

Now let’s switch to the vSAN HCL.
Switch the filter to vSAN and click on “Build your Own based on Certified Components”.


With the controller model we just found previously, just type it into the keyword field.


Now we have it.
Click on the controller to see its firmware and driver requirements.


As you can see, in my example, I should be using the megaraid driver version 6.610.15.00 instead of the lsi_mr3.

After I changed the driver from lsi_mr3 to megaraid, the vSAN Health Check status changed from Warning to Passed.
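In case you are wondering how the swap itself is done, this is roughly the flow (a sketch only; the offline bundle path/name is a placeholder for whatever your vendor or the vSAN HCL points you to, and the host should be in maintenance mode and rebooted afterwards):

esxcli software vib list | grep -i lsi
esxcli software vib install -d /vmfs/volumes/datastore1/megaraid-driver-offline-bundle.zip
esxcli system module set --enabled=false --module=lsi_mr3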

At this point we have only checked the driver version. Firmware version, on the other hand, is harder to find from the host and depends on whether the vendor provides such functionality, but it's critically important to have it in accordance with the vSAN HCL as well. Please check with your vendor how to obtain this information; most of the time it comes from baseboard management utilities.

Another thing to keep in mind is the ESXi lifecycle: when creating your Update Manager baselines, make sure to include only the drivers certified for vSAN; otherwise, it might install general-purpose drivers and vSAN Health Check will report warnings again.

 

Monday, March 20, 2017

vSAN stretched cluster topology explained


The VMware vSAN 6.2 Stretched Cluster & 2 Node Guide covers in great detail all the supported network topologies and the configuration steps involved in setting up such a solution, but the guide does not detail some network aspects that could pass unnoticed and cause issues down the road if you did not consider them during the design phase.

That’s what I’m trying to help clarify today.

My example is based on VMware’s recommended architecture: a stretched L2 network between the data sites and an L3 network between the data sites and the witness site.

If you follow the guide, soon you will notice the hosts from both sites would use the same gateway.

If you implement something like that, the traffic from Site B would have to traverse the link between the sites in order to reach the witness site.


Guess what: in case of a link failure between the witness site and Site A, a link failure between the data sites, or a gateway (192.168.0.1) failure, the hosts on site B would be unable to reach the witness, even though there’s an alternate path through the gateway on site B (192.168.0.129).
Without access to the witness, the hosts on site B would not have access to more than 50% of the components and therefore would not be able to run or restart any VMs.

Back to my hypothetical scenario: to avoid this cross-link situation and make the hosts use their own local gateways and links, this is what I’m proposing.

- First, plan your vSAN IP address space in advance;
Let’s take the example below: my L2 stretched vSAN network is 192.168.0.X/24
I’ll configure half of the IPs to be used on site A and the other half on site B.
        * Hosts on site A, will have their vSAN VMkernel IPs from a range of 
          192.168.0.2-192.168.0.126, (192.168.0.1 is already taken by the gateway).
        * Hosts on site B, will have their vSAN VMkernel IPs from a range of 
          192.168.0.130-192.168.0.254, (192.168.0.129 is already taken by the gateway).

- Second, hosts will have static routes to the witness site using their local gateway (see the example command right after this list);
        * Hosts on site A will have a static route through 192.168.0.1
        * Hosts on site B will have a static route through 192.168.0.129
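On the hosts that's a single esxcli command each (a sketch; I'm assuming the witness vSAN network is 192.171.0.0/24, as suggested by the witness gateway in my diagram):
        * Hosts on site A:  esxcli network ip route ipv4 add --network 192.171.0.0/24 --gateway 192.168.0.1
        * Hosts on site B:  esxcli network ip route ipv4 add --network 192.171.0.0/24 --gateway 192.168.0.129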

Now the tricky part: on the witness gateway (192.171.0.1), we have to tell it to use different routes to reach each subset of the vSAN network; each gateway vendor might implement this in its own way.

In my design, from the witness gateway’s perspective, I made my /24 network look like two /25 networks, even though I did not make any change underneath.

Now I can add 2 static routes on the witness gateway, one for each site going through its own link and gateway
        * Route to 192.168.0.0/25 through 192.169.0.10
        * Route to 192.168.0.128/25 through 192.170.0.10

Another option would be to use dynamic routing among the gateways, but I’m not covering that in this post.

This way, communication will always flow through its local gateway and link, providing full high availability for each site.


I’m pretty sure there are a lot of different topologies, sites interconnection, and technologies to solve this issue. 

This example is not a one-size-fits-all solution, but my main goal here is to create awareness that this routing discussion must take place.
 

Monday, March 13, 2017

vSphere Replication Traffic Isolation


Once again I was working on a vSphere Site Recovery Manager project using vSphere Replication as the replication technology and had to explain how to isolate the replication traffic from the other network traffic. While explaining it is never a problem, I was wondering how many of you out there still have doubts about it, so creating this post is my way of trying to reach as many of you as I can.

But before we start we should ask ourselves, why isolate the replication traffic ?

Isolating the replication traffic can enhance network performance in the data center because you separate it from the other business-critical network traffic; you can then apply individual prioritization and QoS methods, utilize a dedicated physical uplink or even an entire dedicated network, and improve monitoring and troubleshooting because you know exactly what the purpose of the traffic is and where it flows. Security is enhanced as well, since you don’t mix and match them. It’s all benefits ; )

While the vSphere Replication Installation and Configuration guide has a procedure for accomplishing the isolation, it does not really explain how it works or what the replication flow looks like; let’s see if I can shed some light on it.

Let’s work on a hypothetical environment:

- 2 sites;
- a management segment on each site (VLAN 120 and 220);
- a replication segment on each site (VLAN 110 and 210).



The goal, again, is to isolate the replication traffic from the other traffic.

Obs: I’m intentionally hiding other services, like vMotion or the VM networks, to keep it simple, but imagine they are all running on their own segments.

Now let’s set up the environment properly.

ESXi Preparation:
The way vSphere Replication isolation works is by sending and receiving the replication traffic through specific VMkernel adapters, as we will see below.

On each ESXi host, create 2 new VMkernel adapters; make sure to select the port group that correlates to the VLAN ID of the replication segment for each site and configure an IP address accordingly for each adapter (don't forget to select the right service for each adapter).

- One for outgoing traffic (vSphere Replication traffic)

- One for incoming traffic (vSphere Replication NFC traffic)
Obs: We are creating both VMkernel adapters on each host so they can work bi-directionally, which means a host can be the source of a replication but also a destination.
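If you prefer the command line over the Web Client, the two adapters can be created and tagged with esxcli along these lines (a sketch only; the vmk numbers, port group name and IPs are placeholders for my lab, I'm assuming a standard vSwitch port group, and the service tags can just as easily be enabled from the Web Client):

esxcli network ip interface add --interface-name vmk2 --portgroup-name Replication-110
esxcli network ip interface ipv4 set -i vmk2 -I 192.168.110.11 -N 255.255.255.0 -t static
esxcli network ip interface tag add -i vmk2 -t vSphereReplication
esxcli network ip interface add --interface-name vmk3 --portgroup-name Replication-110
esxcli network ip interface ipv4 set -i vmk3 -I 192.168.110.12 -N 255.255.255.0 -t static
esxcli network ip interface tag add -i vmk3 -t vSphereReplicationNFC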

If you remember the ESXi TCP/IP stacks, there’s no individual stack for replication, so the host would use the default gateway (on the management interface) to send replication traffic to a routed segment; that’s not our goal.
In this case, we must add a static route to each and every ESXi host, telling it to use a route through the new VMkernel interface to reach the replication segment on the other site. KB2001426 is a very nice KB on how to add static routes to ESXi hosts.
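For example, on a host at the protected site, reaching the recovery site's replication segment would need something like this (the subnets and gateway are placeholders that follow my VLAN 110/210 example):

esxcli network ip route ipv4 add --network 192.168.210.0/24 --gateway 192.168.110.1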


vSphere Replication Preparation:
vR comes with a single vNIC, which is used for everything: management traffic, which involves communication with vCenter and the ESXi hosts and coordination with other vR appliances, plus the replication itself.
Since we want to isolate the traffic, we will add a second vNIC just for this purpose.

- First, shutdown the vR appliance;
- add a second vNIC;
- power it on and access its VAMI console (https://"vr_ip":5480);
- On the Network tab, select Address;
- Scroll down to the eth1 and add an IP of the replication segment according to the site;



- Go back to VR tab, select Configuration;
- Add the recently configured replication IP in the "IP Address for Incoming Storage Traffic" field.

As with ESXi, we don’t want to use the default gateway on the management segment to send the replication traffic, so we need to add a static route to the vR appliance as well.
Since it's a Linux box, we can add the static route information to the /etc/sysconfig/network/routes file (see the example entry below).
- Restart network services.
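The format of each entry in that file is destination, gateway, netmask and interface; keeping the same placeholder addressing as above, the entry on the protected site's vR appliance would look like this:

192.168.210.0   192.168.110.1   255.255.255.0   eth1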

Now with everything ready, let's see how the replication flows:


- Configure a VM for replication;

- Once the RPO interval is reached, the ESXi host running the VM that needs to be replicated sends its data, through the VMkernel adapter set up for vSphere Replication traffic, to the vSphere Replication server on the destination site. (Dark blue flow)

- The vSphere Replication appliance on the destination site receives and buffers the data, then sends it to an ESXi host on the destination site, which receives it through the VMkernel adapter set up for vSphere Replication NFC traffic and saves the data to the destination datastore. (Red flow)

As we can see, the traffic is isolated from the management segment and, hopefully, from the others.

Some may argue that you could create one dedicated segment for vSphere Replication traffic (outgoing) and another for vSphere Replication NFC traffic (incoming), but personally I believe that, since it’s all replication-related traffic, breaking it down further just adds complexity, so I like to keep it simple with a single segment for all replication traffic, incoming and outgoing.

We are done; you can start a replication and see if the traffic is flowing the way it's supposed to.

Hope this makes it clear how to isolate vSphere Replication traffic.

Thursday, March 2, 2017

vRealize Production Test


Maintaining a healthy environment is part of any IT administrator’s duty, but sometimes it's hard to assess all the components that compose a solution. Wouldn't it be nice to have a tool that looks at these solutions and provides you with a health report ?

 

Well, that’s what vRealize Production Test is all about. 

 

It started as a testing tool for vRealize Automation, but VMware has been evolving it over the past months and now it includes engines to test vRealize Business and vRealize Operations as well.

 

Let’s see how it works. 

 

-       Download vRealize Production Test jar file;

It’s part of the vRealize Automation download page; just go to Drivers & Tools and there it is.

 

-       Open a command prompt and navigate to the directory where you saved it;

      It’s based on Java, so be sure you have Java installed on the system you will run it from.

 

-       Start configuring the tool, running:  

java -jar vrealize-productiontest-1.7.0.jar config

Obs: make sure to change the file name according to your version

 

When running for the first time you need to accept the EULA

-       On the left menu, you should select each product you want to test and fill out its information;

-       Fill out the fields appropriately and Save;

It will create an XML file with the information you just provided.

 

-    To start checking your environment run:

java -jar vrealize-productiontest-1.7.0.jar run --oobList VRA

Once it’s done, there will be an html-reports folder in the directory where you ran the tool; inside it, open the VRPTR.html file.

The report shows a nice summary of the findings, but also a detailed section where you can learn what has been found and how to fix it.

If you want to learn more about the tool, there’s a detailed guide as well.

 

OBS: don’t be alarmed about the number of issues on my environment, I intentionally gave some bad hostnames and credentials to generate more errors ; )

Who am I

I’m an IT specialist with over 15 years of experience, working from IT infrastructure to management products, with troubleshooting and project management skills in medium to large environments. Nowadays I'm working for VMware as a Senior Consultant, helping customers embrace the Cloud Era and make them successful on this journey. Despite the fact that I'm a VMware employee, these postings reflect my own opinion and do not represent VMware's positions, strategies or opinions.

