This document provides an overview and demonstration of the Splunk App for VMware. It discusses how the app provides insight into VMware data, persists granular VMware data over time for analytics, and gives visibility into other infrastructure layers when monitoring VMware environments. The demo shows the app architecture, and the document covers installing and scaling the app, including deploying and configuring a forwarder virtual appliance to collect VMware data.
4. Monitoring VMware Environments
• Provide insight into all VMware data
• Persist granular data over time for analytics
• Gain visibility into other infrastructure layers
5. Data Collected
Data type            Data description
Performance          Performance metrics for hosts, VMs and clusters
ESX/i and VC logs    Logs from your physical hosts and virtual centers
Tasks/Events         Tasks/events performed on physical hosts and virtual centers
Hierarchy/Inventory  Hierarchy of VMware environment and inventory data about hosts and VMs
6. Proactive Monitoring
• Visualize the state of the environment with health dashboards
• Narrow down problems to sections of the environment with the topology tree
• Drill down through the workflow to find the root cause
7. Comprehensive Operational Analytics
• Uses 20-second performance metrics for historical trends and statistical comparisons
• Analyze ESX and VC logs and exceptions with pre-packaged topology-based filters
• Track environment and user changes using tasks and events, for security and change analyses
8. Platform for End-to-End Visibility
• Correlate virtualization data with other technologies in your IT stack
• Harness data from large-scale distributed VMware deployments
• Scale to handle any volume of data across technologies and data centers
14. VMware Prerequisites
Ensure you have supported versions of VMware components:
– vCenter Server 4.1, 5.0, 5.0 Update 1, 5.1
– vCenter Server on Windows (for VC logs)
– ESX/i hosts 4.1, 5.0, 5.0 Update 1, 5.1 on 64-bit x86 CPUs
Install vCenter on recommended VMware reference hardware
15. Splunk Prerequisites
Splunk versions 4.3.5 and above are supported
Ensure Splunk is installed on reference hardware
Ensure your license supports summary indexing
Ensure you have adequate licensing volume
– 500 MB - 1 GB per host per day for the default config
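As a rough sizing sketch based on the per-host figure above, daily license volume can be estimated as follows. The function name and the 750 MB midpoint default are illustrative, not part of the app:

```python
def estimated_daily_volume_gb(num_hosts, mb_per_host_per_day=750):
    """Estimate daily Splunk license volume in GB for a VMware environment.

    mb_per_host_per_day defaults to 750 MB, the midpoint of the
    500 MB - 1 GB per-host-per-day range quoted for the default config.
    """
    return num_hosts * mb_per_host_per_day / 1024.0

# A 20-host environment at the midpoint needs roughly 14.6 GB/day:
print(round(estimated_daily_volume_gb(20), 1))
```

Running the calculation before deployment helps confirm your license volume covers the environment you plan to monitor.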
18. Deploy and Resource the FA VM
The FA VM setup has three stages:
• Deploy and resource the FA VM
• Create service accounts
• Configure data collection
19. Deploying the FA VM with Required Access
[Diagram: FA VM connecting to the vCenter Server (Windows) and ESX/i hosts]
• Network access
• DNS configured
• Set time zones
20. FA VM Resources for Best Results
Out of the box:
– 2 vCPUs, 250 MHz reservation
– 4 GB memory, 128 MB reservation
– Can monitor up to 20 hosts (25 VMs per host)
Increase to:
– 4 vCPUs, or 2 vCPUs with 2 cores, 4 GHz reservation
– Memory reservation of 2 GB
– Can monitor up to 30 hosts (25 VMs per host)
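The capacity figures above suggest a simple planning calculation. This is a hypothetical helper, not something shipped with the app, using 20 hosts per FA VM out of the box and 30 with increased resources:

```python
import math

def fa_vms_needed(num_hosts, increased_resources=False):
    """Return the number of FA VMs needed to monitor num_hosts.

    Capacity per FA VM: 20 hosts out of the box, or 30 with the
    increased vCPU and memory reservations described above.
    """
    capacity = 30 if increased_resources else 20
    return math.ceil(num_hosts / capacity)

print(fa_vms_needed(50))        # 50 hosts, default resources -> 3
print(fa_vms_needed(50, True))  # 50 hosts, increased resources -> 2
```

This is why resourcing the FA VM up front pays off: fewer FA VMs means less to deploy and manage as the environment grows.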
23. Use Script to Create Service Accounts
Using the script minimizes errors and saves time
Create accounts on all hosts in a VC
– Expects the same admin account for all hosts
– Creates the same account on all hosts
Create an account on one host, managed or unmanaged
Permission existing accounts on hosts
– These can be AD accounts
25. Use Script to Create Engine Files for Data Collection
Using the script minimizes errors and saves time
The script creates .conf files that are used by inputs.conf
Creates engine-<datatype>.conf files
– Specify what data to collect and which entity to collect it from
– Splits the .conf files for parallelized data collection
We wanted to give access to all the data coming from your VMware environment; this includes performance metrics, logs, and a host of other data. The VMware environment can be very much like a black box, and visibility is key. This is something unique to our VMware app: we provide access to performance and log data all in one place. Splunk can maintain data granularity and persist data as far back as desired while still retrieving it with very good performance, something other solutions are not capable of. Finally, while the VMware environment provides a lot of data, ranging from performance to logs, it can be affected by other technologies in the infrastructure. With Splunk you can correlate any data to get an end-to-end view of your environment, again a very unique aspect of using Splunk.
The VMware app 2.0 has a completely new workflow, and while the data in the background is the same as in previous versions, the ability to troubleshoot has been greatly improved. The app now allows the user to proactively monitor what is going wrong in their environment through health dashboards and to use the workflow to drill down and find the root cause of the problem. The improved workflow visualizes the key areas of the environment that are in trouble and allows for quicker investigation of issues.
For deeper analysis you can use the performance views to dig through the granular 20-second metrics from the VC. The VC generally expires the 20-second metrics after 2 hours, but we can persist them for as long as desired. This allows for more accurate troubleshooting and analysis of performance issues in your environment. Furthermore, you can not only identify performance issues but also delve directly into the ESX and VC logs to find events of note and correlate them with the performance issues in order to narrow down the cause.
The VMware solution has 3 components. On the left is your VMware environment, and on the right is your Splunk indexer/search head, drawn as one unit for simplicity. The three components are: the VMware app, which has all the dashboards and visualizations; the Splunk_TA_vcenter, which sits on a Splunk forwarder on a Windows vCenter Server and is responsible for identifying all the VC logs that need to be forwarded to the indexer; and finally the FA VM, the Forwarder Appliance Virtual Machine. This is a machine running Splunk and a component called Splunk_TA_vmware, which is responsible for making API calls to the ESX hosts and the VC to collect performance, ESX log, hierarchy, inventory, task, and event data. The yellow lines indicate these data flows. The ratio of FA VMs to hosts is around 1:30, but we will cover that later.
This is the actual FA VM. We ship it as a single .ova file (open virtual appliance), downloadable from Splunkbase. It is essentially a machine running CentOS 5.7 with a Splunk forwarder and a component called Splunk_TA_vmware that contains Perl modules which connect to VMware's Perl SDK and make all the API calls. The reason for providing a data collection VM is to make it easier for you, the user, to get up and running faster. You do have to install the VMware Perl SDK yourself. We provide root access to the OS so you can keep it upgraded as you like. We also support building your own data collection appliance on RedHat; the docs cover what is supported and what is not.
We support vSphere 4.1 and above. It is highly recommended that you install your vCenter on VMware's recommended hardware. It is likely you have already done this, but I stress it because we make a number of API calls to the VC, and the impact on the VC is minimal when it is installed on recommended hardware. If VC logs are desired, we only support vCenter on Windows, not the Linux appliance. Having the Linux appliance, however, does not mean you cannot get all the other data such as performance, ESX logs, tasks, and events; all of this data comes from the API, so it does not matter which VC you are running.
Ensure Splunk is installed on reference hardware, because the app is search-heavy and constantly performs data summarization. The app makes heavy use of summary indexing and cannot work without it. Enterprise licenses definitely include summary indexing, but some other licenses, such as Education licenses, may not.
The first step, prior to the main installation steps, is to create a limited-permission service account. This has to be done manually on the Windows machine.
On Splunkbase: Splunk App for VMware, Splunk Technology Add-on for VMware vCenter, and Splunk Forwarder Virtual Appliance for VMware.
The first two steps are very simple and involve unzipping apps onto the indexer/search head or the forwarder on the vCenter. I will focus most of my time today on the third step, which relates to the configuration of the FA VM.
The FA VM deployment and configuration can be broken up into 3 major best-practice areas: first, deploy and resource it correctly; second, create service accounts on the ESX hosts using the tools we provide; and finally, get your engine running by configuring data collection.
The FA VM needs network access to send data to and receive data from your vCenter Servers and ESX/i hosts, and to send data to your Splunk indexer. We recommend you install the FA VM in the same subnet as the VC and hosts. Ensure DNS is configured on the FA VM; DNS configuration allows the FA to use the list of ESX hosts provided by the VC in order to connect to the hosts. Synchronize time zones between your FA VM and indexer(s); we recommend using NTP, which is enabled by default. Also check your ESX hosts and make sure they are all in the same time zone.
Increase the resources on your FA VM while deploying it. Increasing the reservation will ensure this machine gets the resources it needs to crawl the hosts. Doing this will save you a lot of time later on if you want to monitor more hosts, as you can just add hosts without worrying about whether the FA VM can manage that much load. It will also reduce your management hassle by requiring fewer FA VMs.
You can use the scripts provided with the FA VM to remotely create or re-permission local limited-permission service accounts on your ESX hosts. Remember, the vCenter account is created manually.
The script is called enginebuilder.py. It is important to always use this script because it helps parallelize data collection by splitting the conf files efficiently, and it also performs a credentials check before it creates the files: it checks the credentials on your VC and all your ESX hosts.
This is how the enginebuilder.py script splits up the conf files. It creates 4 separate conf files, one for each major data type. The inputs.conf file then uses these configuration files as parameters to a scripted input. The conf files specify where to collect which data from; e.g. performance data is collected only from hosts, hierarchy and inventory only from the VC, and so on.
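As an illustration of the split, the layout might look like the sketch below. The file names follow the engine-&lt;datatype&gt;.conf pattern described above, but the specific script path and stanza contents here are hypothetical; check the files enginebuilder.py actually generates in your deployment:

```ini
# One engine-<datatype>.conf per major data type:
#   engine-perf.conf      - performance metrics, collected from ESX/i hosts
#   engine-hierarchy.conf - hierarchy and inventory data, collected from the VC
#   engine-task.conf      - tasks, collected from the VC
#   engine-event.conf     - events, collected from the VC

# inputs.conf then passes each file to a scripted input, e.g.:
[script://./bin/engine.py engine-perf.conf]
interval = 60
disabled = 0
```

Because each data type has its own conf file and its own scripted input, the four collectors can run in parallel rather than serially.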
You can use enginebuilder.py to further split up data collection if you have more than one FA VM. You just tell it how many hosts to assign per FA VM, and it splits the configuration into a set of conf files: plain files on the FA VM where enginebuilder is being run, and tarballs for the other FA VMs, which you copy to the desired FA VM and untar correctly using enginebuilder on that machine.
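The splitting logic can be sketched as a simple chunking of the host list. This is purely illustrative, not the actual enginebuilder.py code; the host names are made up:

```python
def split_hosts(hosts, hosts_per_fa_vm):
    """Chunk a host list into one group per FA VM.

    The first group would be collected by the local FA VM; the
    remaining groups correspond to the tarballs copied to other FA VMs.
    """
    return [hosts[i:i + hosts_per_fa_vm]
            for i in range(0, len(hosts), hosts_per_fa_vm)]

hosts = ["esx%02d" % n for n in range(1, 8)]   # 7 hypothetical hosts
print(split_hosts(hosts, 3))
# -> [['esx01', 'esx02', 'esx03'], ['esx04', 'esx05', 'esx06'], ['esx07']]
```

Seven hosts at three per FA VM yields three groups, matching the idea that each FA VM gets its own slice of the environment to crawl.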
This is an example of what a scaled environment might look like. One FA VM is dedicated to collecting data from the VC and may also collect from some hosts; the second FA VM only collects from hosts.