This week one of my clients replaced their vCenter’s certificate, breaking
the communication between vCenter and Big Data Extensions (BDE) and preventing users from
creating new Hadoop clusters.
We started to see this issue when creating clusters through the BDE plug-in;
the error message was:
Create Cluster failed.
Serengeti initialization error: Cannot login to vCenter Server Server
initialization canceled.
It’s worth mentioning that I was still able to see the already
provisioned clusters and their status; the error only appeared when I tried to create
something new or scale out/up resources that were already provisioned.
Then I tried to log in through the BDE CLI; this time the error message
was:
The connection is refused,
may be invalid username or invalid password. Try to reconnect.
Even though I was sure the username and password were correct, I gave it
a few tries, without luck of course!
It was time to check the log messages at /opt/serengeti/logs/serengeti.log.
The message could not have been clearer:
Server certificate chain is
not trusted and thumbprint does not match.
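If you want to confirm you are hitting the same problem, a quick search of the log is enough. Here is a minimal sketch as a shell function; the function name is my own, and it only assumes the phrase "thumbprint does not match" appears on a single line of the log.

```shell
#!/usr/bin/env bash
# Sketch: search a Serengeti log for the certificate/thumbprint failure.
# On the BDE management server the log was /opt/serengeti/logs/serengeti.log.
find_thumbprint_errors() {
  local log_file="$1"
  # -n prints line numbers so each hit can be located in the full log;
  # -i makes the match case-insensitive. Exit status is 1 when nothing matches.
  grep -n -i "thumbprint does not match" "$log_file"
}
```

For example: find_thumbprint_errors /opt/serengeti/logs/serengeti.log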
It became clear that replacing vCenter’s certificate does in fact impact
BDE.
I started with the most obvious step: unregistering the BDE plug-in from vCenter and registering it again at https://bde:8443/register-plugin.
That had no effect; the authentication problem persisted.
At this point I decided to check the configuration files in /opt/serengeti/conf.
The vc.properties file holds vCenter’s thumbprint information.
In my case it still held the old thumbprint, so I manually
updated it to reflect the new one.
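The post does not show how to obtain the new thumbprint. One common way, sketched below as an assumption rather than an official BDE procedure, is to ask vCenter for its certificate with openssl and print its SHA-1 fingerprint. I am assuming openssl is available, vCenter listens on port 443, and vc.properties stores the value in the same colon-separated hex format openssl prints; compare against the existing entry before editing. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compute the SHA-1 thumbprint of the certificate vCenter presents now.
# Assumptions: openssl is installed, vCenter answers on port 443, and the
# stored value uses the colon-separated hex format that openssl prints.

# Strip openssl's "<digest> Fingerprint=" prefix, keeping only the hex value.
# The prefix spelling differs between openssl versions, so match loosely.
parse_fingerprint() {
  sed -n 's/^.*Fingerprint=//p'
}

vc_thumbprint() {
  local host="$1"
  # s_client fetches the server certificate; </dev/null ends the session
  # right after the handshake. x509 then prints the SHA-1 fingerprint.
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha1 \
    | parse_fingerprint
}
```

For example: vc_thumbprint vcenter.example.com (the host name is a placeholder).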
Another file I wanted to check was vhm.properties,
and again it contained vCenter’s thumbprint. As I did
with vc.properties, I updated it accordingly.
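To spot every stale value at once, a grep across the files is handy. This minimal sketch (the function name is hypothetical) matches on the word "thumbprint" because the exact property names are not reproduced here; treat that as an assumption and adjust the pattern to what you actually see in your files.

```shell
#!/usr/bin/env bash
# Sketch: list every line mentioning a thumbprint across the given files.
# Matching on the word "thumbprint" is an assumption about the key names.
show_thumbprints() {
  # -H prefixes each hit with its file name, -n with its line number,
  # -i ignores case so "Thumbprint" and "thumbprint" both match.
  grep -H -n -i "thumbprint" "$@"
}
```

For example: show_thumbprints /opt/serengeti/conf/vc.properties /opt/serengeti/conf/vhm.properties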
To check whether these changes did the trick, I recycled the services
by running service tomcat restart and tried to log in again.
Unfortunately, it did not work.
Curiously, the thumbprint value in vc.properties
had reverted to its old value.
Scratching my head, I wondered where it was getting this information
from.
After some time digging through the system I found another curious file,
/opt/serengeti/tmp/vcproperties.xml,
which holds a lot of information about my environment and, guess what, the
old vCenter thumbprint was in there as well.
As before, I updated the thumbprint in all the files and recycled the
services again. This time the thumbprint in vc.properties
did not change, and authentication was working again.
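Updating the same value in several files by hand is error prone, so here is a minimal sketch of a replace helper (the name is hypothetical). It assumes GNU sed, as found on the Linux-based BDE management server, and thumbprints made of hex digits and colons, which contain no sed metacharacters. Take the backups listed at the end of this post before running it.

```shell
#!/usr/bin/env bash
# Sketch: replace the old vCenter thumbprint with the new one in each file.
# Assumes GNU sed (-i without a suffix argument) and hex/colon thumbprints.
replace_thumbprint() {
  local old="$1" new="$2"
  # Refuse to run with an empty value, which would produce a bad sed script.
  if [ -z "$old" ] || [ -z "$new" ]; then
    return 1
  fi
  shift 2
  local f
  for f in "$@"; do
    # "|" as delimiter avoids clashing with "/"; g replaces every occurrence.
    sed -i "s|${old}|${new}|g" "$f"
  done
}
```

For example: replace_thumbprint OLD_THUMBPRINT NEW_THUMBPRINT /opt/serengeti/conf/vc.properties /opt/serengeti/conf/vhm.properties /opt/serengeti/tmp/vcproperties.xml, followed by service tomcat restart.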
But I was not satisfied yet, so I rebooted the BDE management server to make sure this
solution would survive a reboot.
Well, it did not! The old vCenter thumbprint was back in vcproperties.xml, which was then
replicated to vc.properties, causing
the authentication issue again.
Back to the question: what happens during the boot sequence that
updates the vcproperties.xml file?
After a long time troubleshooting I found a script (/opt/serengeti/sbin/serengeti-onboot.rb)
that runs at every boot and updates/creates this file.
This script makes a lot of calls that control the entire system; reading
through it, I found a section for vCenter.
It turns out that on each boot the script reads the OVF environment
settings and then populates the vcproperties.xml
file.
I checked the BDE vApp’s OVF environment settings in vCenter, hoping it
would just be a matter of adjusting them there, but the thumbprint information
on the vApp was correct.
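You can also check, from inside the BDE management server, the OVF environment that the guest sees. This sketch assumes VMware Tools’ vmtoolsd is installed in the guest, which exposes the OVF environment through guestinfo.ovfEnv; since the property key names are not shown here, filtering on "thumbprint" is a guess, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print OVF environment properties whose key mentions a thumbprint.
# Assumptions: vmtoolsd is installed, and the relevant key contains the
# word "thumbprint" (the real key names are not known here).

# Read OVF environment XML on stdin and print each matching
# <Property .../> element (key matched case-insensitively).
ovf_thumbprint_props() {
  grep -i -o '<Property [^>]*key="[^"]*thumbprint[^"]*"[^>]*>'
}

show_ovf_thumbprints() {
  vmtoolsd --cmd "info-get guestinfo.ovfEnv" | ovf_thumbprint_props
}
```

Comparing this output with the value openssl reports for vCenter tells you whether the vApp settings themselves are stale.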
At this point I could only assume it was a bug.
I filed a support request
(SR16904215603) with VMware support; there is no fix for this yet, and they are
working on a patch.
If you have the same issue, please open a support request as
well and reference mine: the more clients report the same issue, the faster
the fix will be released.
Since I could not afford to leave the system down until a definitive fix is released, here is the workaround
to bring it back to production:
- adjust vcproperties.xml with all the correct information;
- comment out the line in serengeti-onboot.rb that populates vcproperties.xml,
so that on the next boot it will not overwrite the information you
entered manually.
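For the second step, the sketch below (both function names are hypothetical) first lists the lines of serengeti-onboot.rb that mention vcproperties.xml and then prefixes a chosen line with a Ruby comment marker. I am not identifying the exact line here, so pick it after reading the vCenter section yourself; the helper only takes a line number and assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: locate and disable a line in the onboot script.

# Print "lineno:content" for every line mentioning vcproperties.xml
# (-F treats the dot literally instead of as a regex wildcard).
find_vcproperties_lines() {
  grep -n -F "vcproperties.xml" "$1"
}

# Prefix line LINENO of FILE with "# " so Ruby ignores it.
# Assumes GNU sed; rejects a non-numeric line number.
comment_out_line() {
  local file="$1" lineno="$2"
  case "$lineno" in
    ''|*[!0-9]*) return 1 ;;
  esac
  sed -i "${lineno}s/^/# /" "$file"
}
```

After editing, ruby -c /opt/serengeti/sbin/serengeti-onboot.rb checks the script’s syntax, which matters if the line you disabled was part of a multi-line statement.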
For a final test I rebooted the system once more, and this time everything was
back to working properly.
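To make that final check repeatable after each reboot, this minimal sketch (hypothetical name) reports whether each file still contains the thumbprint you expect and returns non-zero if any does not. It assumes the value is stored verbatim, so a plain fixed-string match is enough.

```shell
#!/usr/bin/env bash
# Sketch: check that every given file contains the expected thumbprint.
# Prints "OK <file>" or "STALE <file>"; returns 1 if any file is stale
# (a missing or unreadable file also counts as stale).
verify_thumbprint() {
  local expected="$1"
  shift
  local f status=0
  for f in "$@"; do
    # -F: literal match, -q: quiet, --: expected value is not an option.
    if grep -q -F -- "$expected" "$f" 2>/dev/null; then
      echo "OK $f"
    else
      echo "STALE $f"
      status=1
    fi
  done
  return "$status"
}
```

For example: verify_thumbprint NEW_THUMBPRINT /opt/serengeti/conf/vc.properties /opt/serengeti/conf/vhm.properties /opt/serengeti/tmp/vcproperties.xml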
A few pieces of advice:
Perform these steps at your own risk. If possible, test them first in a non-production environment, and I highly encourage you to go through these steps with VMware personnel.
Take a backup of all the files before changing them:
cp /opt/serengeti/conf/vc.properties /opt/serengeti/conf/vc.properties.bkp
cp /opt/serengeti/conf/vhm.properties /opt/serengeti/conf/vhm.properties.bkp
cp /opt/serengeti/tmp/vcproperties.xml /opt/serengeti/tmp/vcproperties.xml.bkp
cp /opt/serengeti/sbin/serengeti-onboot.rb /opt/serengeti/sbin/serengeti-onboot.rb.bkp
Hope it helps.