Just Another IT Blog

It's time to share some of my experiences, crazy ideas, tips and tricks !!!


This week one of my clients replaced their vCenter’s certificate, breaking the communication between vCenter and Big Data Extensions (BDE) and preventing users from creating new Hadoop clusters.

We first saw the issue when creating clusters through the BDE plug-in. The error message was:
Create Cluster failed. Serengeti initialization error: Cannot login to vCenter Server Server initialization canceled.

It is worth mentioning that I could still see the already provisioned clusters and their status; the error only appeared when I tried to create something new or scale the already provisioned resources out or up.

Then I tried to log in through the BDE CLI. This time the error message was:
The connection is refused, may be invalid username or invalid password. Try to reconnect.

Even though I was sure the username and password were correct, I gave it a few more tries, without luck of course !!!

It was time to check the log at /opt/serengeti/logs/serengeti.log

The message could not have been clearer !!!
Server certificate chain is not trusted and thumbprint does not match.
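
If you want to confirm you are hitting the same problem, a quick filter on the log helps. This is only a minimal sketch: the function name recent_cert_errors is my own, and the search words are simply taken from the message above, so adjust them if your log reads differently.

```shell
# Show the most recent log lines that mention certificates or thumbprints.
# $1 = log file (defaults to the BDE log path used in this post)
# $2 = how many matching lines to show (defaults to 20)
recent_cert_errors() {
    log="${1:-/opt/serengeti/logs/serengeti.log}"
    count="${2:-20}"
    # -i: ignore case, -n: prefix each match with its line number
    grep -inE 'certificate|thumbprint' "$log" | tail -n "$count"
}

# Usage on the BDE management server:
#   recent_cert_errors
```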

It became clear that replacing vCenter’s certificate does in fact impact BDE.

I started with the most obvious step: unregistering and re-registering the BDE plug-in in vCenter through https://bde:8443/register-plugin

That did not have any effect; the authentication problem persisted.

At this point I decided to check the configuration files in /opt/serengeti/conf
The vc.properties file holds vCenter’s thumbprint.
In my case it still had the old thumbprint, so I manually updated it to reflect the new one.

Another file I wanted to check was vhm.properties
Again, it contained vCenter’s thumbprint, so I updated it the same way I did with vc.properties.
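
In case you need to look up the new thumbprint yourself, here is a minimal sketch using openssl. It assumes the thumbprint BDE stores is the SHA-1 fingerprint in colon-separated hex, so compare the output format with the old value in your files before pasting it in. The function names and the vcenter.example.com host are placeholders of mine, not something from BDE.

```shell
# Print the SHA-1 fingerprint (AA:BB:...) of a PEM certificate read from stdin.
cert_sha1_thumbprint() {
    # openssl prints "<label> Fingerprint=AA:BB:..."; keep only the value
    openssl x509 -noout -fingerprint -sha1 | cut -d= -f2
}

# Fetch the certificate a server presents and print its SHA-1 fingerprint.
# $1 = host name (placeholder: use your own vCenter), $2 = port (default 443)
vcenter_thumbprint() {
    host="$1"
    port="${2:-443}"
    # </dev/null closes the TLS session right after the handshake
    openssl s_client -connect "${host}:${port}" -servername "$host" \
        </dev/null 2>/dev/null | cert_sha1_thumbprint
}

# Usage (hypothetical host name):
#   vcenter_thumbprint vcenter.example.com
```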

I restarted the services to see whether these changes did the trick.
I ran service tomcat restart and tried to log in again.

Unfortunately, it did not work.
Curiously, the thumbprint value in vc.properties had reverted to its old value.

Scratching my head, I wondered where it was getting this information from ?!?!
After some time searching through the system I found another curious file, /opt/serengeti/tmp/vcproperties.xml, which holds a lot of information about my environment and, guess what ?!?! the old vCenter’s thumbprint was in there as well.

As before, I updated the thumbprint in all the files and restarted the services again. This time the thumbprint in vc.properties did not change and authentication was working again.
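
Looking back, a recursive search for the old thumbprint would have saved me some time. Here is a minimal sketch of that search; the function name find_thumbprint_refs is my own, and the thumbprint in the usage line is a placeholder for your old value.

```shell
# List every file under the given paths that still contains a thumbprint.
# $1 = thumbprint to look for, remaining arguments = files or directories
find_thumbprint_refs() {
    thumbprint="$1"
    shift
    # -r: recurse, -l: print file names only, -i: ignore hex case,
    # -F: treat the thumbprint as a plain string, not a regex
    grep -rliF -- "$thumbprint" "$@" 2>/dev/null
}

# Usage (placeholder thumbprint):
#   find_thumbprint_refs "AA:BB:CC:..." /opt/serengeti/conf /opt/serengeti/tmp
```

An empty result means none of the searched files still carries the old value.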

But I was not satisfied yet, so I rebooted the BDE management server to make sure this solution would survive a reboot.
Well, it did not !!! The old vCenter’s thumbprint was back in vcproperties.xml and was then replicated to vc.properties, causing the authentication issue again.

Back to the question: what happens during the boot sequence that updates the vcproperties.xml file ?!?
After a long time troubleshooting I found a script (/opt/serengeti/sbin/serengeti-onboot.rb) that runs at every reboot and creates or updates this file.

This script makes a lot of calls that control the entire system; scrolling through it I found a section for vCenter.

It turns out that on each reboot the script reads the OVF environment settings and then populates the vcproperties.xml file.

I checked the OVF environment settings of the BDE vApp in vCenter, hoping it would just be a matter of adjusting them there, but the thumbprint on the vApp was already correct.

At this point I could only guess it’s a bug.
I filed a support request (SR16904215603) with VMware support; there is no fix for this yet and they are working on a patch.
If you have the same issue, please open a support request as well and reference mine. The more clients report the same issue, the faster the fix will be released.

Since I could not afford to leave the system down until a definitive fix is released, here’s the workaround I used to bring it back into production.

-       adjust vcproperties.xml with all the correct information;
-       comment out the line in serengeti-onboot.rb which populates vcproperties.xml.

This way, the next boot will not overwrite the information you entered manually.
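
I edited the script by hand, but if you prefer to script the change, here is a minimal sketch. The function name comment_out_matching and the .before-edit suffix are my own, and the text to match is whatever you find in the vCenter section of your copy of serengeti-onboot.rb; I am deliberately not quoting the line here, since it may differ between BDE versions.

```shell
# Comment out every line of a file that contains a given plain-text string.
# $1 = file to edit, $2 = text that identifies the line(s) to disable
comment_out_matching() {
    file="$1"
    pattern="$2"
    # Keep an untouched copy next to the file before rewriting it
    cp -p "$file" "$file.before-edit" || return 1
    # index() does a plain substring match; lines that already start with
    # '#' (after optional indentation) are left alone
    awk -v pat="$pattern" '
        index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
        { print }
    ' "$file.before-edit" > "$file"
}

# Usage (the quoted text is a placeholder, replace it with the real line):
#   comment_out_matching /opt/serengeti/sbin/serengeti-onboot.rb '<text of the line>'
```

Check the result with diff against the .before-edit copy before rebooting, to make sure only the intended line changed.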

As a final test I rebooted the system once more, and this time everything was working properly again.

A few pieces of advice:
Perform these steps at your own risk. If possible, test them first in a non-production environment, and I highly encourage you to go through them with VMware personnel.

Take a backup of all the files before changing them:
cp /opt/serengeti/conf/vc.properties /opt/serengeti/conf/vc.properties.bkp
cp /opt/serengeti/conf/vhm.properties /opt/serengeti/conf/vhm.properties.bkp
cp /opt/serengeti/tmp/vcproperties.xml /opt/serengeti/tmp/vcproperties.xml.bkp
cp /opt/serengeti/sbin/serengeti-onboot.rb /opt/serengeti/sbin/serengeti-onboot.rb.bkp 

Hope it helps.
