VMworld 2013
Bernd Harzog, The Virtualization Practice
Mark Leake, VMware
VMworld 2013: Building the Management Stack for Your Software Defined Data Center
1. Building the Management Stack for Your
Software Defined Data Center
Bernd Harzog, The Virtualization Practice
Mark Leake, VMware
VCM4869
#VCM4869
2. Bernd Harzog — Virtualization Performance and
Capacity Management Analyst
• Analyst and Consultant Focused upon:
— Infrastructure Performance and Capacity Management of Virtualized Systems
— Application Performance Management
— Transaction Performance Management
— End User Experience Management
• Clients include:
— Enterprises seeking
virtualization performance
management solutions
— Vendors offering solutions
• Key Findings
— Virtualization introduces
sharing and dynamic behavior
— Agile Development produces
rapidly changing applications
— Both combine to require new
tools, organizations, and
management processes
3. Key Trends
• The demand for business functionality implemented in
software is infinite (therefore so is the backlog)
• More and more software – from different sources, from
different tools, in different languages, on different run times
• Scaled out commodity deployment platforms
• Distribution of applications across data centers and
private/public clouds
• Virtualization of business critical and performance critical
applications
• More than one hypervisor in the enterprise
• Rapidly changing applications running on dynamic platforms
• The Software Defined Data Center will deliver dramatic
benefits and create significant management challenges
5. Your New World
• Agile Development creates
rapidly changing applications
• Built in diverse languages
and running on diverse
language runtimes
• Running on next generation
deployment platforms
• Deployed on multiple
virtualization platforms
• Running on scaled out
commodity hardware
• Located in multiple clouds
with multiple owners
Your Cloud, Public Cloud, Hybrid Cloud
8. What’s Different about a Software Defined Data
Center?
• Configuration, management, and some of the functional execution of CPU,
memory, networking and storage is done in the SDDC software
— Example – configuration of a virtual tunnel between two VMs across clusters or
even virtual data centers
• Some of the services currently performed by dedicated hardware appliances
will be performed by software plug-ins to the SDDC
— Example – load balancing and security
• Almost all of the configuration for a VM or for an N-tier application system
can be done in one place, and will follow that workload around
• Since all of this configuration will be done in the SDDC software layer, and
since it will all be exposed via APIs, configuration changes will occur much
more frequently and easily
• Private clouds will be able to address a broad range of business critical
applications since the required resources will be able to be automatically
marshaled by the Cloud Management platform from the SDDC
• An SDDC supporting a private cloud will be a highly dynamic computing
platform that relies on automation to continuously execute a variety
of actions
9. Management Principles for the Software Defined
Data Center
• Start Over – Start with a new Reference Architecture - do not assume
that any tool you have purchased automatically makes the cut
• Insist upon easy to try, easy to buy, easy to manage, and results in
production before purchase
• Organize for the successful virtualization of business critical applications
• Define Performance as Latency and Response time, not Resource
Utilization
• Manage every application for performance, not just the 5% most painful
and important ones
• Get Real Time, Deterministic and Comprehensive about Data Collection
• Design your management architecture for the distributed cloud case
even if you are not there yet
10. If you Don’t Believe Me!
Statistics Collection & Telemetry
Another area of focus for an open networking ecosystem should be defining a framework for common
storage and query of real time and historical performance data and statistics gathered from
all devices and functional blocks participating in the network. This is an area that doesn’t
exist today. Similar to Quantum, the framework should provide for vendor specific extensions and
plug-ins. For example, a fabric vendor might be able to provide telemetry for fabric link utilization,
failure events and the hosts affected, and supply a plug-in for a Tool vendor to query that data and
subscribe to network events.
http://cto.vmware.com/open-source-open-interfaces-and-open-networking/
11. The Worse than Useless Test
Apply this test to every single management product in
your company
1. Does it operate on a real-time, continuous, and deterministic basis?
2. Does it support workloads distributed across data centers, both yours and
the ones you rent (cloud)?
3. Does it work across your virtualization and cloud vendor
environments?
4. Can it re-configure itself every time you change something in the
environment or in the applications?
5. Can you support it and use it without the continuous presence of on
premise consultants from the vendor of the tool?
6. If it is a monitoring tool, does it focus upon response time and
latency?
7. Can you try it, for free, in production, before you buy it or more of it?
If the answer is not “Yes” to all seven, junk the tool
and start over
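The seven questions amount to an all-or-nothing checklist. The Python sketch below is illustrative only: the class name `ToolAssessment`, its field names, and both helper functions are assumptions made for this example and do not come from any product.

```python
from dataclasses import dataclass, fields


@dataclass
class ToolAssessment:
    """Yes/no answers to the seven questions for one management tool."""
    real_time_continuous_deterministic: bool
    supports_distributed_workloads: bool
    works_across_vendors: bool
    self_reconfiguring: bool
    usable_without_vendor_consultants: bool
    # Question 6 only applies to monitoring tools; answer True for other tools.
    focuses_on_response_time_and_latency: bool
    free_production_trial: bool


def failed_criteria(assessment):
    """Return the names of the questions answered "No", in question order."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def passes_test(assessment):
    """A tool passes only if every one of the seven answers is "Yes"."""
    return not failed_criteria(assessment)


# Hypothetical assessment of one tool.
legacy_tool = ToolAssessment(False, True, True, False, True, False, True)
print(passes_test(legacy_tool), failed_criteria(legacy_tool))
```

Listing the failed questions, rather than returning only a pass or fail result, makes it easier to explain why a tool was rejected.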
12. Starting Over – Rethink ITIL and the CMDB
• ITIL is designed to get you to document and slow down the rate of change
• Don’t tell the Change Control Committee about vMotion!
• Your CMDB will never be able to keep up with the rate of change in a Software
Defined Data Center
• Every configuration change needs to be tracked in real time, and cross-
correlated with performance degradations and resource contention
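As a rough illustration of cross-correlating configuration changes with performance degradations, the Python sketch below pairs each degradation with any change to the same target that happened shortly before it. The record types, the field names, and the five-minute window are assumptions chosen for the example; a real implementation would work on a live event stream rather than on lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConfigChange:
    timestamp: datetime
    target: str        # hypothetical identifier, e.g. a VM or host name
    description: str


@dataclass(frozen=True)
class Degradation:
    timestamp: datetime
    target: str
    latency_ms: float


def correlate(changes, degradations, window=timedelta(minutes=5)):
    """Pair each degradation with changes to the same target that
    occurred at most `window` before the degradation was observed."""
    pairs = []
    for degradation in degradations:
        for change in changes:
            elapsed = degradation.timestamp - change.timestamp
            if change.target == degradation.target and timedelta(0) <= elapsed <= window:
                pairs.append((change, degradation))
    return pairs


# Hypothetical events.
t0 = datetime(2013, 8, 26, 10, 0, 0)
changes = [ConfigChange(t0, "vm-42", "vMotion to another host")]
degradations = [Degradation(t0 + timedelta(minutes=2), "vm-42", 180.0)]
for change, degradation in correlate(changes, degradations):
    print(change.description, "->", degradation.latency_ms, "ms")
```

A time window on the same target is only a starting point: it finds candidates for investigation and does not prove that the change caused the degradation.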
13. Starting Over – Rethink ITIL Business Service
Management
There will be no time to “Design Services”. They will need to be
discovered automatically as they are put into production
14. Starting Over – Legacy Management Solutions
will Never be able to Cope with the SDDC
• A Software Defined Data Center changes too frequently for
legacy management solutions to be able to keep up.
• Legacy solutions cannot be incrementally modified to be able to
cope with the SDDC
• Gluing a new product from an acquired startup to the side of a
legacy management solution cannot fix a fundamentally broken
approach.
• Put the dino in a cage and do not let him out – build a new
management stack for your SDDC – isolate the dino to your
legacy physical environment
Blind
Dinosaur
15. Gartner is Not Going to Be Much Help Either
• Gartner used to cover “Operations
Management” tools in its “IT Event
Correlation and Analysis” Magic Quadrant
• That MQ was last published in December
2012, and was retired in 2013
• Gartner has not yet come up with a
replacement MQ that includes legacy vendors
like IBM, BMC, HP and CA, as well as
newcomers like VMware vCenter
Operations, Dell vFoglight, Microsoft
SCVMM, VMTurbo, etc.
16. Insist upon the New Way of Trying,
Implementing, and Buying Management Software
The Old Way
• Rep takes the CIO to play golf
• Enterprise software deal gets signed
• Some products work, others don’t
• People go around the ELA to get the tools they need
The New Way
• You get to download and use the software in production first
• You prove to yourself that it really does work and add value in your environment
• Then (and only then) do you buy it
17. Organize for Virtualization of Critical Applications,
Agility, and Success
Virtualization is Just
One Team
Data Center Operations
LAN Team
Windows Server Team
Linux Server Team
WAN Team
Database Team
Java Server Team
Web Server Team
SAN Team
Storage Team
Programmer/Analyst Team
Virtual Operations
Tier 3 Support
Tier 2 Support
Tier 1 Help Desk
Application Operations Support
Systems Engineering
Virtualization and Application
Operations are THE Teams
• The existing IT Operations Organization will not be able to cope with the
SDDC or the clouds that run on it
• Virtualization pervades IT Operations, and becomes Virtual Operations
• Application Operations is responsible for the performance of every application
in production (purchased and custom developed)
18. Performance ≠ Resource Utilization
Performance = Response Time & Latency
The Root of All Evil
• CPU and Memory are horrible
indicators of performance
Latency is the appropriate
measure of infrastructure
performance
Response Time is the
appropriate measure of
application performance
20. A Reference Architecture for your SDDC
Management Stack
App Performance Mgmt
Automation & Orchestration
Infrastructure Perf. Mgmt
The SDDC Management Stack
Cloud Management
Security*
Operations Mgmt
Big Data Repository
Self-Learning Analytics
Data Protection*
* Not Covered in this Presentation
22. Big Data Repository
Potential Vendors
• VMware (Log Insight)
• Splunk
• Cloudera (Hadoop)
• 10gen (MongoDB)
• NuoDB
• Pivotal (HVE)
We Need a Multi-Vendor Management Data Store!
App Performance Mgmt
Automation & Orchestration
Infrastructure Perf. Mgmt
Cloud Management
Security
Operations Mgmt
Big Data Repository
Self-Learning Analytics
Data Protection
Key Functionality
• All management products
should feed one data store
• One version of the truth
as to the state of the SDDC
• Since the SDDC is one
“Domain”
• The only feasible way to do
“entire-Domain” root cause
and reporting
• The only feasible way to do
“entire domain” analytics
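To make the "one data store" idea concrete, the Python sketch below has several management layers write to a single table so that one query can return everything known about an entity across all of them. SQLite from the standard library is used purely as a small stand-in and is not a big data repository; the table layout, the function names, and the source and metric names are assumptions for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one table shared by every management product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics ("
        "ts REAL, source TEXT, entity TEXT, metric TEXT, value REAL)"
    )
    return conn


def record(conn, ts, source, entity, metric, value):
    """Write one observation; `source` names the management layer that produced it."""
    conn.execute(
        "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
        (ts, source, entity, metric, value),
    )


def entity_view(conn, entity, start, end):
    """Return every observation about one entity, from all sources, in time order."""
    cursor = conn.execute(
        "SELECT ts, source, metric, value FROM metrics "
        "WHERE entity = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (entity, start, end),
    )
    return cursor.fetchall()


# Hypothetical observations from two different layers of the stack.
store = open_store()
record(store, 1.0, "infrastructure", "vm-42", "storage_latency_ms", 40.0)
record(store, 2.0, "apm", "vm-42", "response_time_ms", 350.0)
for row in entity_view(store, "vm-42", 0.0, 10.0):
    print(row)
```

Because both layers write to the same table with the same entity key, a single query lines up the storage latency and the application response time, which is the precondition for the "entire-Domain" root cause analysis described above.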
23. Operations Management
Key Features
1. Host and guest resource
utilization monitoring
2. Capacity Mgmt & Planning
3. Used by IT Operations
Example Vendors
• Cirba
• CloudPhysics
• HP (VPV)
• ManageEngine
• Quest (vOperations)
• Reflex Systems
• SolarWinds
• Splunk
• Veeam
• VMTurbo
• VMware vC Ops
• Zenoss
24. Key Criteria for Resource Based Performance and
Capacity Monitoring
• Out of the box value – if it is not providing value in
10 minutes junk it and find something else (auto-
discovery is key)
• Collect data from vCenter AND the other
virtualization platforms that you support or plan to
support
• Look for the integration of performance
management, capacity management, and
configuration management
• Collecting, dashboarding, alerting, and reporting on
vCenter data is commodity functionality – look for
value in analytics and automation
25. Infrastructure Performance (Latency)
Management
• Servers
• Storage
• SAN Fabric
• Network Fabric
Key Features
1. Understanding of end-to-end
infrastructure performance
2. Capacity management and
planning
3. Infrastructure response time is
the key metric
4. Used by the team supporting
the virtual infrastructure
Example Vendors
• AppNeta
• ExtraHop Networks
• Riverbed
• SevOne
• Virtual Instruments
• Xangati
26. Key Criteria for Infrastructure Response Time
Solutions
• Measure IRT – Monitor how long it takes the infrastructure to
respond to requests for work, not how much resource it takes
• Deterministic – Get the real data, not a synthetic transaction,
or an average
• Real Time – Get the data when it happens, not seconds or
minutes later
• Comprehensive – Get all of the data, not a periodic sample of
the data
• Zero-Configuration (Discovery) – Discover the environment
and its topology, and keep this up to date in real time
• Application (or VM) Aware – Understand where the load is
coming from and where it is going
• Application Agnostic – Work for every workload or VM type in
the environment irrespective of how the application is built or
deployed
27. Example - Infrastructure Performance
Management & Real Time Metrics
• Knowing whether performance is good or not all of the time, requires
measuring performance in a comprehensive, deterministic, and real
time manner
• Averaging good transactions with bad transactions obscures the true
nature and impact of the bad transactions
VMware vCenter
5 Minute Average Data
Virtual Instruments VirtualWisdom
Real Time Data
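The effect of averaging can be shown with a few lines of Python. The sketch below summarizes one hypothetical reporting interval in which most transactions are fast and a few are very slow; the sample values, the 100 ms "slow" threshold, and the function name `summarize` are assumptions for illustration.

```python
import math
import statistics


def summarize(latencies_ms, slow_threshold_ms):
    """Summarize the per-transaction latencies of one reporting interval."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    # Nearest-rank 99th percentile: the smallest sample that has at least
    # 99% of all samples at or below it.
    rank = math.ceil(99 * len(ordered) / 100)
    return {
        "mean_ms": statistics.fmean(ordered),
        "p99_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
        "slow_count": sum(1 for value in ordered if value > slow_threshold_ms),
    }


# Hypothetical interval: 290 transactions at 5 ms and 10 transactions at 2000 ms.
interval = [5.0] * 290 + [2000.0] * 10
print(summarize(interval, slow_threshold_ms=100.0))
```

For this interval the mean is 71.5 ms, a latency that no transaction actually experienced, while the maximum and the count of slow transactions show that ten requests took two seconds each.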
28. Application Performance Management
Key Features
1. Understanding of app response time
across the application system
2. Used by Operations and Application
Support
Example Vendors
• AppEnsure
• AppDynamics
• AppFirst
• AppNeta
• BlueStripe
• Boundary
• Confio Software
• Correlsense
• Compuware (dynaTrace)
• ExtraHop Networks
• HP (Performance Anywhere)
• New Relic
• Quest (Foglight)
• Riverbed
29. APM is not just for Custom Applications
Apps Ops = Every Application!
Legacy, Custom Developed Apps (DevOps)
• CA/Wily
• HP Diagnostics
• IBM ITCAM
• Precise
Modern, Custom Developed Apps (DevOps)
• AppDynamics
• AppNeta (TraceView)
• Compuware
• HP (Perf. Anywhere)
• New Relic
• Quest (Foglight)
Legacy, Every App (AppOps)
• BMC Patrol
• NetIQ
• HP BAC
• CA Unicenter/Spectrum
Modern, Every App (AppOps)
• AppEnsure
• AppFirst
• BlueStripe
• Boundary
• Confio Software
• Correlsense
• ExtraHop
• Riverbed
30. Key Criteria for Application Response Time
Solutions
• Measure Actual Application Response Time – How long did it take, not
how much resource it used
• Breadth of Application Support – Ideally support every application
running in the environment automatically (conflicts with depth)
• Depth of Root Cause Diagnostics – Provide deep analysis into the
application stack for root cause (conflicts with breadth)
• Deterministic – Get the real data, not a synthetic transaction, or an
average
• Real Time – Get the data when it happens, not seconds or minutes later
• Comprehensive – Get all of the data, not a periodic sample of the data
• Application Discovery and Topology Mapping – Automatically discover
new applications and their topology and keep this up to date
automatically and continuously
• Analytics and Baselining – Avoid manual thresholds, learn normal
behavior and alarm based upon deviations from normal
• Public Cloud Ready – Allow applications to be distributed across
organizational boundaries, and have monitoring work without firewall
changes
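The "Analytics and Baselining" criterion can be illustrated with a deliberately simple stand-in: learn "normal" from a rolling window of recent response times and flag samples that deviate from it. The Python sketch below uses a rolling mean and standard deviation, which is far cruder than what analytics products do; the window size, the threshold of three standard deviations, and the `min_spread_ms` floor are assumptions.

```python
import statistics


def deviations(samples_ms, window=20, threshold=3.0, min_spread_ms=1.0):
    """Return (index, value) for each sample that deviates from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    alarms = []
    for i in range(window, len(samples_ms)):
        history = samples_ms[i - window:i]
        mean = statistics.fmean(history)
        # Floor the spread so a perfectly flat history does not alarm on noise.
        spread = max(statistics.pstdev(history), min_spread_ms)
        if abs(samples_ms[i] - mean) > threshold * spread:
            alarms.append((i, samples_ms[i]))
    return alarms


# Hypothetical response times in milliseconds with one spike.
response_times = [100, 102, 98, 101, 99] * 4 + [100, 400, 101]
print(deviations(response_times))
```

No fixed threshold such as "alarm above 250 ms" appears anywhere: the alarm condition follows whatever the recent history looks like. One known weakness of this sketch is that a spike enters the history window and temporarily widens the baseline.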
32. Cloud Management
Key Features
1. Automated Provisioning of Services
2. Presentation of Services in a Service
Catalog
Example Vendors
• BMC CLM
• Cisco (Cloupia)
• Citrix (Cloud.com)
• CloudBolt Software
• Embotics
• Eucalyptus
• FluidOps
• Piston Cloud (OpenStack)
• ServiceMesh
• VirtuStream
• VMware vCAC
33. The Three Phases of Cloud Management
1) AWS Clone Phase (Self-Service from IT)
– Let IT offer what AWS offers
– Probably not as easy
– Probably not as flexible
– Probably not as cheap
– Why the first generation of Cloud Management failed
2) Tactical IT Agility Phase (Automated Provisioning)
– Automates provisioning of tactical and simple production applications
– Does not address anything that really matters to the business
– Where we are now
3) Enterprise Application Phase (Lifecycle Management)
– Automate the management of the applications that matter (DevOps,
SAP)
– Address the core of what IT does day in and day out
– The strategy for the enterprise capable Cloud Management vendors
34. IT Automation in Your SDDC
Puppet Chef
vFabric AppDirector
Legacy Automation Process
Populate the
Image
Assemble the
Application
35. Self-Learning Analytics – The Only Way to Keep
up with your SDDC
Self-Learning
Analytics
• The right organization, the right tools, and the right data
• Combined with the right self-learning Analytics
• Leads to an automated “entire stack” Root Cause Analysis Process
App Performance Mgmt
Automation & Orchestration
Infrastructure Perf. Mgmt
Cloud Management
Security
Operations Mgmt
Big Data Repository
Data Protection
Real Time, Deterministic
and Comprehensive Data
Prelert
Netuitive
VMW vC Ops
36. Before You Try to be Predictive….
• Instrument your infrastructure for end-to-end latency
(Infrastructure Performance Management)
• Implement a real-time operational data store that can keep up
with the rate of change in your virtual environment
• Implement a modern Developer focused APM solution for your
critical custom developed applications
• Implement an Operations focused APM solution to measure
response time for every application
• Get as real time, deterministic, and comprehensive as possible
with all of your response time and latency metrics
• Reorganize and implement an Application Operations function
staffed with application domain experts
• Operationalize finding and fixing problems in real time
• Then and only then – try to get truly predictive
37. Evaluation Criteria for Performance Analytics
• How automated is the learning (really)
• Diversity of accepted data (time series, events)
• Frequency and quantity of data inputs
• Breadth of plug-ins to the monitoring products you
own, or are going to own
• Process for learning (handling) “normal” events
• Tradeoffs between false positives (false alarms) and
false negatives (you missed something)
• Ease of implementation (time and cost)
• Quality of the Analysis (can you trust it?)
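The tradeoff between false positives and false negatives can be examined by replaying labeled history through an analytics tool at several alarm thresholds. The Python sketch below assumes the tool emits an anomaly score per interval and that each interval has been labeled afterwards as a real incident or not; the scores, labels, and function names are hypothetical.

```python
def confusion_counts(scores, is_incident, threshold):
    """Count false alarms and missed incidents when alarming at score >= threshold."""
    false_positives = sum(
        1 for score, incident in zip(scores, is_incident)
        if score >= threshold and not incident
    )
    false_negatives = sum(
        1 for score, incident in zip(scores, is_incident)
        if score < threshold and incident
    )
    return false_positives, false_negatives


def sweep(scores, is_incident, thresholds):
    """Map each candidate threshold to its (false positives, false negatives)."""
    return {t: confusion_counts(scores, is_incident, t) for t in thresholds}


# Hypothetical anomaly scores and after-the-fact incident labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
is_incident = [False, False, True, False, True, True]
for threshold, (fp, fn) in sweep(scores, is_incident, [0.3, 0.5, 0.85]).items():
    print(threshold, "false alarms:", fp, "missed incidents:", fn)
```

Raising the threshold trades false alarms for missed incidents, so the comparison between products is most useful when each is swept over the same labeled history.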
38. The Reference Architecture with VMware
Management Solutions
The SDDC Management Stack – The VMware Implementation
• App Performance Mgmt – Partner Solutions
• Automation & Orchestration – vC Orchestrator and Puppet
• Infrastructure Perf. Mgmt – Future Networking Instrumentation
• Cloud Management – vCloud Automation Center
• Security – vShield
• Operations Mgmt – vCenter Operations Manager
• Big Data Repository – Log Insight
• Self-Learning Analytics – vC Ops & Log Insight Analytics
• Data Protection – VDP & SRM
The first vendor of an SDDC (VMware) will be the first
vendor of an SDDC Management Stack (VMware)
39. A Reference Architecture for your SDDC
Management Stack
The SDDC Management Stack
• App Performance Mgmt – AppDynamics, AppEnsure, AppFirst, AppNeta, BlueStripe, Boundary, Compuware, Correlsense, ExtraHop, INETCO, New Relic, Riverbed
• Automation & Orchestration – Puppet, Chef, Cloud Sidekick, Intigua
• Infrastructure Perf. Mgmt – Confio, ExtraHop, GigaMon, Virtual Instruments, Xangati
• Cloud Management – CloudBolt, Embotics, FluidOps, ServiceMesh, VirtuStream
• Security*
• Operations Mgmt – Cirba, CloudPhysics, Dell, HP, Hotlink, VMTurbo, Zenoss
• Big Data Repository – Splunk
• Self-Learning Analytics – Netuitive, Prelert
• Data Protection*
* Not Covered in this Presentation
41. One Final Point (Wrap Up)
• In this industry we are great at inventing things to
solve problems that we did not know that we had
• The PC, the LAN, Client/Server, the Internet, Java,
Server Virtualization, VDI, Clouds and Smartphones are
all innovations that targeted previously unknown
problems
• We are very good at propagating these innovations
throughout enterprise organizations worldwide
• Every time we do this we forget about managing the
innovation before we deploy it
• If you buy the right management products at the right
time you can avoid repeating this mistake with your
SDDC
43. Building a New Management Stack for your
Software Defined Data Center (and your Cloud)
Bernd Harzog
Analyst, Virtualization Performance and Capacity Management
bharzog@virtualizationpractice.com
44. Other VMware Activities Related to This Session
HOL:
HOL-SDC-1301
Applied Cloud Operations
HOL-SDC-1313
vCloud Suite Use Cases - Infrastructure Provisioning (IaaS)
HOL-SDC-1314
vCloud Suite Use Cases - Application Provisioning (PaaS)
HOL-SDC-1307
Enable Hybrid Cloud Automation & Governance with vCAC
Group Discussions:
VCM1002-GD, VCM1004-GD
Cloud Operations with Hicham Mourad or Sam McBride
VCM1003-GD
Cloud Automation with Naomi Sullivan