Friday, January 30, 2015

Transparent Page Sharing Dilemma 2/2

Now that we have learned what this Transparent Page Sharing (TPS) vulnerability is about, it’s time to see how you can secure your environment. If you have not been following this series, read the previous post first.

Everybody knows that TPS shares identical pages of memory among VMs running on the same host to save physical memory; that’s called Inter-VM sharing. But only a few know that it also shares identical pages of memory within a single VM, which is called Intra-VM sharing.

To secure vSphere environments, VMware introduced the concept of Salting, which lets you control and manage the boundaries of memory sharing among VMs.
To make it simple, you can think of it as Grouping: only VMs that are members of the same Group (Salt) share pages of memory. This way you can group the VMs of the same client, customer or organizational unit without worrying that their memory will be shared with unsecured VMs.

Its behavior is governed by the host advanced setting Mem.ShareForceSalting.

These new capabilities were first introduced by the following patches:

Up to the builds above, the default Salting behavior looks like this:
By default, Mem.ShareForceSalting = 0, which means TPS works as it always has and memory is shared among all VMs.
If you set Mem.ShareForceSalting = 1, you enable Salting and memory is shared only among VMs of the same Salt group. (By default, no VMs are joined together in the same Salt group.)

To make VMs belong to the same Salt group, and so share memory among them, set the VM advanced setting sched.mem.pshare.salt = "unique string" to the same value on each of those VMs.

Be aware that this default behavior will change starting with the following releases:
  • ESXi 5.5 U2d planned for Q1, 2015.
  • ESXi 5.1 U3 released on December 4, 2014.
  • ESXi 5.0 U3d planned for Q1, 2015.

With those releases, the default Salting behavior becomes Mem.ShareForceSalting = 2.

It’s important to note that once you install those new builds, the behavior of Mem.ShareForceSalting = 1 changes as well. If a VM has no Salt group configured, its memory will be shared with all the other VMs that have no Salt group either.
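To make these rules easier to reason about, here is a minimal Python sketch of how I read them after the newer builds are installed. It is only an illustrative model, not VMware code: the function names effective_salt and can_share are made up for this post, and the handling of setting 2 (a VM without a Salt falls back to a value unique to that VM, so it shares with nobody else) is my understanding of KB2097593, so double check it against the KB.

```python
from typing import Optional, Tuple

# A VM is modeled as (salt, uuid); salt is None when
# sched.mem.pshare.salt is not set on that VM.
VM = Tuple[Optional[str], str]


def effective_salt(force_salting: int, vm: VM) -> tuple:
    """Return the tag that decides which VMs may share pages (illustrative)."""
    salt, uuid = vm
    if force_salting == 0:
        # Salting disabled: every VM lands in the same sharing domain.
        return ("all",)
    if force_salting not in (1, 2):
        raise ValueError("Mem.ShareForceSalting is expected to be 0, 1 or 2")
    if salt is not None:
        # An explicit Salt group always wins for settings 1 and 2.
        return ("salt", salt)
    if force_salting == 1:
        # Assumption (post-update behavior): unsalted VMs share together.
        return ("all",)
    # Assumption (setting 2): unsalted VMs are isolated per VM.
    return ("uuid", uuid)


def can_share(force_salting: int, vm_a: VM, vm_b: VM) -> bool:
    """True if the two VMs may share identical pages with each other."""
    return effective_salt(force_salting, vm_a) == effective_salt(force_salting, vm_b)


if __name__ == "__main__":
    web1, web2 = ("customer-a", "uuid-1"), ("customer-a", "uuid-2")
    plain1, plain2 = (None, "uuid-3"), (None, "uuid-4")
    for setting in (0, 1, 2):
        print(setting, can_share(setting, web1, web2), can_share(setting, plain1, plain2))
```

In this model Intra-VM sharing is always possible, because a VM trivially has the same tag as itself.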

Before this new capability, the only way to disable TPS was to change the advanced setting Mem.ShareScanGHz from 4 to 0.
VMware strongly suggests using the new capability instead of this old method, so if you have made that adjustment, revert it from 0 back to 4.

If you want to learn more about Salting behavior, check these VMware KBs.
For the initial Salting behavior, check KB2091682.
For the Salting behavior after the Update bundle, check KB2097593.
See you guys.

Friday, January 23, 2015

Transparent Page Sharing Dilemma 1/2

The big dilemma these days seems to be whether or not to disable Transparent Page Sharing (TPS). Let me try to shed some light on the discussion.

Last year Worcester Polytechnic Institute released a paper called "Wait a minute! A fast, cross-VM attack on AES", describing a vulnerability in vSphere environments.
In general, when TPS is enabled, an attacker in one VM could flush shared memory from the CPU cache and time how long it takes to reload, in order to recover an AES encryption key from another VM and use it in a further attack.
As much as it sounds like sci-fi, it’s true. VMware has acknowledged it and, even though it believes such an attack is very hard and unlikely to succeed, it decided to take precautions, allowing us to disable TPS completely or adjust its behavior a little bit (in the next post I will cover how to adjust it)!

You might be asking yourself: what’s the impact of disabling TPS?

Let’s look at that from two different angles, security and performance.

- Security
In enterprise (private cloud) environments, all your VMs probably belong to the same company and you have a higher level of trust in your users, so leaving TPS enabled would not be a big concern.
On the other hand, if you are a public cloud provider and you don’t know your users, disabling TPS guarantees that no memory pages are shared between them.

- Performance
Since ESXi started to leverage memory Large Pages, a looong time ago, TPS has not made much of a difference in the majority of implementations. Large Pages are hard to share among VMs, so you will see little page sharing in normal operation. But during periods of memory contention, as one of its memory reclamation techniques, the host breaks large pages into small pages, which are easy to share.
In summary, if your environment has high rates of overcommitment, then TPS is playing an important role in it.

You can check if your system is a heavy user of TPS with esxtop.

Shared is the amount of guest physical memory that is being shared.
Common is the amount of machine memory that is common across World(s).
Saving is the amount of machine memory that is saved due to page-sharing.
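These three counters are related: as far as I understand esxtop, Saving is simply Shared minus Common. The small Python sketch below shows that arithmetic and expresses the saving as a share of the host memory. The function names and the sample numbers are made up for illustration; plug in the values you read from your own host.

```python
def tps_saving_mb(shared_mb: float, common_mb: float) -> float:
    """Machine memory saved by page sharing, assuming saving = shared - common."""
    if common_mb > shared_mb:
        raise ValueError("common cannot be larger than shared")
    return shared_mb - common_mb


def saving_share_of_host(saving_mb: float, host_total_mb: float) -> float:
    """Fraction of the host's physical memory that TPS is currently saving."""
    if host_total_mb <= 0:
        raise ValueError("host_total_mb must be positive")
    return saving_mb / host_total_mb


if __name__ == "__main__":
    # Hypothetical readings, in MB, for a host with 64 GB of RAM.
    saving = tps_saving_mb(shared_mb=4096, common_mb=1024)
    print(f"saving: {saving:.0f} MB")
    print(f"share of host memory: {saving_share_of_host(saving, 65536):.1%}")
```

If the share is small, disabling TPS will probably go unnoticed; if it is large, plan for the extra memory first.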

If you think reading that paper was boring, maybe you can watch the authors explain it.

Friday, January 16, 2015

vSphere Replication Capacity Planning

There’s a common question that always pops up when I’m doing Disaster Recovery projects.
What amount of bandwidth would be required for replication?

It’s a fair question, but it’s hard to estimate without the proper tools or data in place!!!

The majority of storage vendors have tools that monitor their appliances and can tell you the amount of data that would need to be replicated during a period of time.

But what if you want to use host-based replication (vSphere Replication) or you don’t have those tools in place?

In general, you would define your RPO and estimate the amount of data changed on the servers you want to protect (that’s the hard part). You could use the disk metrics provided by vSphere to estimate that.
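As a back-of-the-envelope illustration of that reasoning, here is a minimal Python sketch. It assumes the data changed during one RPO window has to be shipped within that same window, and it ignores compression, protocol overhead and bursts, so treat the result as a rough floor rather than a sizing rule. The function name and the sample numbers are made up for this example.

```python
def required_mbps(changed_mb_per_rpo: float, rpo_minutes: float) -> float:
    """Average bandwidth (megabits/s) needed to ship the changed data in one RPO."""
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    megabits = changed_mb_per_rpo * 8      # megabytes -> megabits
    seconds = rpo_minutes * 60             # RPO window in seconds
    return megabits / seconds


if __name__ == "__main__":
    # Hypothetical VM: 900 MB changed during a 15 minute RPO window.
    print(f"{required_mbps(900, 15):.1f} Mbit/s")
```

The hard part remains getting a trustworthy number for the changed data per RPO window.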

Luckily, there’s a Fling that can do all this calculation for you. It’s called vSphere Replication Capacity Planning Appliance.

Since it’s an appliance, you just have to download the OVF and deploy it in your environment as you always do with appliances.
Note: this tool is not meant to work alongside vSphere Replication. If you already have VR deployed in your environment, remove it first.

There’s one undocumented requirement: when it powers on, the appliance checks for an IP Pool associated with its portgroup. If there is none, power-on fails with the error: Cannot initialize property vami.netmask0 has no associate network protocol profile.

Once it's up, the first thing to do is access the appliance portal for some basic configuration.

Address: https://"appliance_IP":5480
User: root
Password: vmware

Now we need to start the replication on the VMs so you can measure their change rates.
Note: it does not actually replicate any data; it’s just a simulation.

Log in to the appliance through SSH to enable the replication monitoring. All the commands should be run from /opt/vmware/hbrtraffic/bin

To enable replication run:
./configureReplication --vc="vcenter" --vcuser="username" --vcpass="password" --lwd="appliance_ip" --vmname="VM_name" --rpo="mins"

If the command succeeds, you will see the message: Enable replication for vm

You will also see a task in vCenter.

Note: remember it’s a Fling, so it has not been exhaustively tested and might not be able to handle hundreds of VMs. Be nice to it… test just a few VMs at a time.
In my case, after every VM I added I needed to reboot the appliance in order to add the next one  : (
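If you have a handful of VMs to enable, typing that command over and over gets old. Here is a small Python sketch that only builds and prints the command lines, using exactly the flags shown in this post; it does not run anything. The helper name build_configure_cmd and the sample values are made up, and I have not checked whether the tool accepts other flags, so take it as a convenience sketch.

```python
import shlex
from typing import List, Optional


def build_configure_cmd(vc: str, vcuser: str, vcpass: str, appliance_ip: str,
                        vm_name: str, rpo_mins: Optional[int]) -> List[str]:
    """Build the configureReplication argument list for one VM.

    Pass rpo_mins=None to build the "disable" (--remove) variant instead.
    """
    cmd = [
        "./configureReplication",
        f"--vc={vc}",
        f"--vcuser={vcuser}",
        f"--vcpass={vcpass}",
        f"--lwd={appliance_ip}",
        f"--vmname={vm_name}",
    ]
    cmd.append("--remove" if rpo_mins is None else f"--rpo={rpo_mins}")
    return cmd


if __name__ == "__main__":
    # Hypothetical values; keep the batch small, as advised above.
    for vm in ["app01", "db01"]:
        cmd = build_configure_cmd("vcenter.lab.local", "admin", "secret",
                                  "192.168.1.50", vm, rpo_mins=15)
        print(shlex.join(cmd))  # quotes arguments safely for copy and paste
```

Printing instead of executing keeps you in control of the pace, which matters if the appliance needs a reboot between VMs.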

Leave the appliance collecting data for a few RPO cycles.
Then, you can go to the Graphs page and check the results:

Address: https://"appliance_IP":5480/vr-graphs/

Just click on the VM whose data you want to check.

There you get a nice graph with the information captured for the period.
You can look at the average for the past 4 hours, daily, monthly… my favorite metric is delta size average.

WOW amazing……

Just one bad thing… there’s no graph that sums all the VMs collected; you will have to sum them yourself.
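Summing by hand is not hard, but a few lines of Python avoid mistakes. This sketch assumes you have copied the average delta size of each VM from its graph into a dictionary; the VM names and the values in MB are invented for the example.

```python
from typing import Dict, Tuple


def summarize_deltas(avg_delta_mb: Dict[str, float]) -> Tuple[float, str]:
    """Return the total average delta size and the VM that contributes the most."""
    if not avg_delta_mb:
        raise ValueError("no VMs collected")
    total = sum(avg_delta_mb.values())
    biggest = max(avg_delta_mb, key=avg_delta_mb.get)
    return total, biggest


if __name__ == "__main__":
    # Hypothetical per-VM "delta size average" values, in MB per RPO cycle.
    collected = {"app01": 120.0, "db01": 300.0, "web01": 45.5}
    total, biggest = summarize_deltas(collected)
    print(f"total delta per cycle: {total} MB, largest contributor: {biggest}")
```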

Once you are done monitoring, don’t forget to disable the replication.

To disable replication run:
./configureReplication --vc="vcenter" --vcuser="username" --vcpass="password" --lwd="appliance_ip" --vmname="VM_name" --remove

Again, if the command succeeds, you will see the message: Disable replication of vm

And a message in vCenter as well.

One last tip...
If you already have all the data about your VMs and how much they change over time, you can just use the vSphere Replication Calculator.

I bet you are now better prepared to estimate your bandwidth needs, right?

Thursday, January 8, 2015

VMware’s TOP 10 Blog posts of 2014

Keeping my tradition of listing my top 10 blog posts of the year, here comes the list.

8. Demystifying vSphere Replication
10. ESXi Hardware Health Monitoring

A few comments about them:
- #1 and #2 are still the same as in 2013; it seems there’s still room for P2V out there.
- I’m glad to see there are a few posts from 2014 that made their way up; it’s hard to compete when posts from past years have been collecting views since January 1st.
- There’s only one post from 2011 (#5); not sure if this issue is still hurting people out there (I haven’t seen it for a long time) or if it’s just a good combination of words for the search engines.

If you want to compare the lists, here are the ones from 2013 and 2012.

An amazing 2015 for all of you !!!

Who am I

I’m an IT specialist with over 15 years of experience, ranging from IT infrastructure to management products, with troubleshooting and project management skills in medium to large environments. Nowadays I'm working for VMware as a Consulting Architect, helping customers embrace the Cloud Era and be successful on their journey. Despite the fact that I'm a VMware employee, these postings reflect my own opinion and do not represent VMware's position, strategies or opinions. Reach me at @dumeirell
