3. Monitoring – Holistic view of performance and availability
Operating Systems
Applications
Hypervisors
Hardware
Transactions
SmartCloud Monitoring
4. What is SmartCloud Monitoring?
Health dashboards to provide an instant, consolidated glimpse into cloud health
Topology views of the key interrelated components of the cloud
Reports on the health trends of cloud components and workloads, powered by Cognos
What-If capacity planning scenarios
Policy-Based optimization to put workloads where they’ll perform best, not just where they’ll fit
Performance Analytics for right-sizing of virtual machines
Integration with industry-leading IBM service management portfolio
5. SmartCloud Monitoring Broad Hypervisor & Platform Support
•X86-Hypervisors
• VMware
• KVM (IBM, Red Hat)
• Citrix XenServer
• Citrix XenApp, Citrix XenDesktop
• Hyper-V
•IBM z/VM
• Monitoring of z/VM and Linux on System z
•IBM Power
• Monitoring of IBM PowerVM (CEC, VIOS, LPARs (AIX, Linux), HMC)
•Other Hypervisors
• Solaris/Zones
•Transactions
• Transaction tracking
• Response time
• Robotic transactions
•Applications
• SAP
• Exchange
• Lotus Notes
• PeopleSoft
• etc
•Databases
• DB2
• Oracle
• SQL
• etc
•Middleware
• WebSphere
• MQ
• WebLogic
• etc
•OS
• Windows
• Linux
• AIX, Linux on p, i5
• z/OS, z/VM
• Solaris
• HP-UX
•Storage/Network
• TPC: IBM Storage, EMC, Hitachi, NetApp
• DFM: NetApp
• Ethernet Switches
• etc
•Server Platforms/OS
• x86 (IBM, non-IBM)
• IBM Power 5, 6, 7
• IBM System z, zEnterprise
• Sun (SPARC)
• HP
• Cisco UCS
• etc
•Integrate with ITM/ITCAM and other tools to extend scope beyond hypervisors
6. Monitor the Virtualization/Cloud Ecosystem
Diagram labels: ITM, ESX/ESXi hosts, vCenter Server
Holistic Approach to Monitoring including storage,
networking, hypervisor, etc.
NetApp Storage Agent:
– Provides Monitoring data in ITM
– Integrates into Health Dashboard
TADDM Integration
– TADDM DLA discovers the vCenter environment/topology
– TADDM provides change data to VMware Health Dashboard
IBM Director Integration
– ITM Agent provides integration with the Director Server
– Allows for Management of VMware resources
– Historical Collection of HW data
Tivoli Storage Productivity Center (TPC):
– Agent provides storage metrics in TEP
– Integrates into Health Dashboard
– Warehouse storage metrics for reporting and analysis
Network Monitoring Agent
– Monitor switches used by VMware
– Integrate Network Events into Dashboard
Consider adding application monitoring to the ecosystem
Diagram labels: NetApp, NetApp Agent, DFM, TADDM, Health Dashboard, TPC, IBM Storage, Hitachi, EMC, Network Switches, VI Agent, Apps / Middleware
7. SmartCloud Monitoring 7.2 – What's new?
Additional VMware Metrics:
− Orphaned VMDK files
Completely rewritten VMware Health Dashboard:
− Lighter weight/Faster Response Time
− More intuitive and easy to navigate
− Fewer clicks to drill down to root cause
New DASH user interface:
− Single User Interface where multiple Tivoli products are integrated
− Includes TCR 3.1 which includes Active Reporting
− Sample Active Report Attached Here:
− Self-Service dashboarding capabilities. Build dashboards from any ITM data and from data brought in through Tivoli Directory Integrator
− Support for tablet devices
Improved VMware Capacity Planning:
− Can save existing customization
− Can do partial loads of the VMware environment
− New VMware Expense Reduction Report and other reports
− Improved benchmark matching
− Evaluates CPU, Memory, Network I/O, Storage, and Storage Topology
8. SmartCloud Monitoring 7.2 – What's new?
Power Systems:
− Capacity Planning: What-if scenarios, server sizing, etc. for Power Systems
− Enhanced Power Systems Agents including consolidation of UNIX OS Agent and Premium AIX Agent
Enhancements to other hypervisors:
− Other hypervisors such as Citrix and Cisco UCS have been enhanced
ITM 6.3 Enhancements:
− OS Dashboard in DASH
− OSLC Linked Data for integration with TBSM and other products
− 64-bit TEPS that doubles the number of concurrent users and improves scale
− Warehouse Range Partitioning
This can dramatically improve performance by eliminating the need for pruning of the historical data
− Authorization Policy Server
Restricts dashboard users' access to Managed System Groups and to individual agent managed systems.
Grants role-based access control in addition to Access Control Lists, making access control easier and safer.
Role inheritance for scalable management.
11. Single Cluster view showing events and KPIs
Can be real-time or historical
Select any of the links below to go to Servers, VMs, or Datastores
12. Single Cluster view showing vSphere Servers
Click on a link to drill down to a single server
13. Single Server showing historical data
Actions allow you to launch in context to TEP or TCR
Select any of the links below to go to Cluster, Servers, VMs, or Datastores
23. Dynamic Thresholding & Adaptive Monitoring
View real-time and time-aligned historical data
Analyze the trends and distinguish trends from anomalies
Use Avg, Max, Min, Percentile, Mode
Monitors can be defined for shifts or can be adjusted for seasonal differences
Agents provide static monitoring thresholds
IBM Tivoli Monitoring provides static thresholds and Dynamic Thresholds/Adaptive Monitoring
The system learns the “normal” behaviour for a resource and sets the threshold based on historical data
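As a rough illustration of a threshold learned from historical data, here is a minimal Python sketch. It assumes the "normal" range is summarized by a percentile of past samples, computed separately per shift. The function names and the choice of the 95th percentile are assumptions made for this example; they are not the IBM Tivoli Monitoring implementation or API.

```python
from statistics import quantiles


def dynamic_threshold(history, percentile=95):
    """Return the given percentile of historical samples as a threshold.

    Hypothetical helper for illustration; the percentile is an assumption.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(history, n=100, method="inclusive")
    return cuts[percentile - 1]


def thresholds_by_shift(samples, percentile=95):
    """Compute one threshold per shift from (shift_name, value) samples."""
    by_shift = {}
    for shift, value in samples:
        by_shift.setdefault(shift, []).append(value)
    return {
        shift: dynamic_threshold(values, percentile)
        for shift, values in by_shift.items()
    }


def is_anomaly(value, threshold):
    """A sample is flagged when it exceeds the learned threshold."""
    return value > threshold
```

With separate thresholds per shift, a CPU level that is ordinary during the day can still be flagged at night, which is the point of defining monitors for shifts.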
24. Performance Analyzer: Predictive Trending
• Hands-off capacity monitoring
• Automates performance analysis and reporting
• Prediction of application bottlenecks
• Creation of alerts for potential service threats
• “What will my resources look like tomorrow, next week, next month or next year?”
• “What IT Resources should I worry about next?”
• “Will I have enough capacity to get me through Monday?”
Leverage collected data to spot trends and highlight emerging concerns
Chart labels: Time, Metric, Actual Monitor Data, Predicted trend, Threshold, Predicted Metric Violation
25. Key Capabilities of Performance Analyzer
Out of the box analytics tasks for:
– OS Agents (CPU, Memory, Disk, Network)
– DB2 (Pool Read/Write, Sort Time, Memory, Tablespaces, etc.)
– Oracle (Cache Hits, Archive Space, Tablespaces, Transactions, etc.)
– VMware (Physical Server, VMs, Memory, CPU, Datastores, etc.)
– Power Systems (SEA, Network, CPU, Memory, I/O, Entitlement, etc.)
– Response Time (Web, Robotic, and Client Response Time)
Easily Customized to Analyze any numeric data
– Arithmetic Modules for calculating data and normalizing data
– Analytic Modules for performing predictive analytics
– Predict the following:
• How many days until I reach a Warning or Critical Threshold
• Predict the value 7, 30, 90 days into the future (customizable)
When defining Situations in Performance Analyzer, define the number of days' notice you need
– Take into account the statistical values such as the strength, number of data points, etc.
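The two predictions listed above (days until a threshold is reached, and the value some days into the future) can be sketched with an ordinary least-squares line over daily samples. This is only an illustration under that assumption: the function names are hypothetical, the slide does not say which statistics Performance Analyzer uses, and R squared stands in here for the "strength" of the trend.

```python
def fit_line(values):
    """Least-squares fit of daily values against day index 0..n-1.

    Returns (slope, intercept, r_squared). Hypothetical helper.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared is undefined for a flat series; it is reported as 0.0 here.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def forecast(values, days_ahead):
    """Predicted value days_ahead days after the last sample."""
    slope, intercept, _ = fit_line(values)
    return intercept + slope * (len(values) - 1 + days_ahead)


def days_until(values, threshold):
    """Days until the trend line reaches threshold, or None if it never does."""
    slope, intercept, _ = fit_line(values)
    current = intercept + slope * (len(values) - 1)
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope
```

A situation that needs, say, 14 days' notice would fire when `days_until(...)` drops to 14 or below, ideally only when the fit is strong and based on enough data points.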
26. Key Capabilities of Performance Analyzer
Select Agent Type
Select Attribute Group
Select Hourly, Daily, Weekly, etc.
Analyze all or a subset of your agents
27. Key Capabilities of Performance Analyzer
Define Warning and Critical Thresholds
29. Performance Analyzer for:
VMware and Power Systems (CPU Trends, Disk Utilization, Memory, Network) out of the box
For VMware, recommend setting up Analytic Tasks for:
Cluster Percent Effective CPU utilization, Cluster Percent Effective Memory utilization, CPU Percent Ready
Set up Analytic Tasks for other hypervisors (Hyper-V, KVM, Solaris Zones)
Linear and non-linear trending: the tool picks the model that best fits the data
Chart labels: Model Chosen, Time to Critical and Warning
Bar chart represents the historical data and the line graph represents the statistical trend line
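To make "picks the model that best fits the data" concrete, here is a small Python sketch that fits a linear and an exponential trend and keeps whichever has the lower sum of squared errors. Both the candidate models and the selection criterion are assumptions for illustration; the slides do not state which models or fit measure the tool actually uses, and `pick_model` is a hypothetical name.

```python
import math


def _least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def _sse(ys, predictions):
    """Sum of squared errors between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


def pick_model(ys):
    """Fit linear and exponential trends; return (name, fitted_values) of the best."""
    if len(ys) < 2:
        raise ValueError("need at least two data points")
    xs = list(range(len(ys)))
    candidates = {}

    slope, intercept = _least_squares(xs, ys)
    candidates["linear"] = [intercept + slope * x for x in xs]

    # The exponential model is fitted on log(y), so it needs positive data.
    if all(y > 0 for y in ys):
        slope, intercept = _least_squares(xs, [math.log(y) for y in ys])
        candidates["exponential"] = [math.exp(intercept + slope * x) for x in xs]

    best = min(candidates, key=lambda name: _sse(ys, candidates[name]))
    return best, candidates[best]
```

Steady growth such as `[10, 12, 14, 16, 18]` selects the linear model, while a doubling series such as `[1, 2, 4, 8, 16]` selects the exponential one.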
31. Why is Capacity Management Critical to Cloud Management?
Helps consolidate and reduce IT costs
– Reduces HW and labor costs
– Fewer physical servers needed
– Reduce hypervisor license costs
– Increase VM density to drive Cloud ROI
– Predict how many more customers / VMs can be serviced
Helps ensure application availability and reduce risk
– Are any resources overloaded? When will physical resources reach their limits?
– Have there been any significant changes in my environment recently?
– Identify trends to predict bottlenecks, or free up space and balance workloads
– Ensure supply can meet demand
– Ensure technical and business policies are met to reduce risk
Helps optimize resource utilization
– Right size virtual machines and allocate based on usage, over-commit within known risk limits
– Pack VMs on the infrastructure to optimize resources
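A minimal Python sketch of right-sizing and over-commit checking follows. It assumes a VM's vCPU allocation is derived from its observed peak CPU demand plus a headroom factor, and that the over-commit risk limit is expressed as a maximum ratio of allocated vCPUs to physical cores. The function names, the 20% headroom, and the ratio-based limit are all assumptions for illustration, not the product's sizing rules.

```python
import math


def right_size_vcpus(cpu_used_mhz, mhz_per_vcpu, headroom=0.2):
    """Recommend a vCPU count from observed peak usage plus headroom."""
    if not cpu_used_mhz:
        raise ValueError("need at least one usage sample")
    needed_mhz = max(cpu_used_mhz) * (1 + headroom)
    return max(1, math.ceil(needed_mhz / mhz_per_vcpu))


def overcommit_ratio(allocated_vcpus, physical_cores):
    """Ratio of vCPUs allocated across VMs to physical cores on the host."""
    return sum(allocated_vcpus) / physical_cores


def within_risk_limit(allocated_vcpus, physical_cores, max_ratio):
    """True when the host's over-commit ratio stays at or under max_ratio."""
    return overcommit_ratio(allocated_vcpus, physical_cores) <= max_ratio
```

A VM allocated 4 vCPUs whose peak demand fits in 2 is a candidate for shrinking, which frees capacity to raise VM density on the same host.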
32. Capacity Planning & Analytics - Architecture
Platforms, Storage, Network, Hypervisors, Servers, …
Workload Characterization
- Establish patterns using historical data
- Capture workload attributes to enable optimization policies
Capacity Planning Database
Copy, Federate
Custom Tags enhance Config Profiles and workload relationships
Benchmarking data
Usage profiles, workload relationships
Business and Technical policies
Optimization Engine to size and place VMs
Plan Recommendation (minimize systems, license, balance)
33. The Case for Holistic Capacity Analytics
Unbalanced
Can’t move workload to this cluster because it’s almost out of datastore space
On one screen, I can check all of the key resources to see if my workload is balanced
34. The Case for Holistic Capacity Analytics
Unbalanced across Clusters and within a cluster
Need to look at all key attributes to look for bottlenecks and imbalance
Disk I/O Metrics also available
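Checking every key attribute for imbalance can be sketched as a comparison of utilization across clusters, one metric at a time. The Python below assumes utilization is available as a percentage per cluster and per metric, and flags a metric when the gap between the busiest and idlest cluster exceeds a chosen spread. The function name, data layout, and 20-point default are assumptions for illustration.

```python
def imbalance_report(clusters, max_spread=20.0):
    """Flag metrics whose utilization differs widely across clusters.

    clusters: {cluster_name: {metric_name: percent_used}}
    Returns {metric: (spread, busiest_cluster, idlest_cluster)}.
    """
    metrics = set()
    for usage in clusters.values():
        metrics.update(usage)

    report = {}
    for metric in sorted(metrics):
        values = {
            name: usage[metric]
            for name, usage in clusters.items()
            if metric in usage
        }
        busiest = max(values, key=values.get)
        idlest = min(values, key=values.get)
        spread = values[busiest] - values[idlest]
        if spread > max_spread:
            report[metric] = (spread, busiest, idlest)
    return report
```

Looking at all metrics together matters because a cluster with spare CPU and memory may still be unable to take more workload if its datastores are nearly full.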
37. Policy Driven Capacity Planning
Opens a new tab with pre-built rules or a rule editor for customer rules
Choose rules for “what-if” scenario
Out of the box rules
•Colocation/Anti-colocation
• Place Win2003 32-bit on the same ESX server
•Boundary Rules
• Place Win2003 32-bit on the same ESX server
•Utilization Rules
• Provide 20% growth for key business application
Create custom rules
SPECint data used for analysis
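The effect of such rules on a "what-if" placement can be sketched in Python. The example assumes a simple first-fit-decreasing packing, a utilization rule expressed as a growth factor applied to each VM's demand, and anti-colocation rules given as groups of VMs that must not share a host. The function name and the packing heuristic are assumptions for illustration; the slides do not describe the optimization engine's actual algorithm.

```python
def place_vms(vms, host_capacity, growth=0.2, anti_colocate=()):
    """Pack VMs onto as few identical hosts as first-fit-decreasing allows.

    vms: {vm_name: demand}, in the same units as host_capacity
         (for example a SPECint-normalized CPU figure).
    growth: utilization rule, extra capacity reserved per VM (0.2 = 20%).
    anti_colocate: groups (sets) of VM names that must not share a host.
    Returns a list of hosts, each a list of VM names.
    """
    # Map each VM to the VMs it must be kept apart from.
    conflicts = {}
    for group in anti_colocate:
        for name in group:
            conflicts.setdefault(name, set()).update(set(group) - {name})

    hosts = []  # each host: {"free": remaining capacity, "vms": [names]}
    for name, demand in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        need = demand * (1 + growth)
        if need > host_capacity:
            raise ValueError(f"{name} does not fit on an empty host")
        for host in hosts:
            fits = host["free"] >= need
            clashes = conflicts.get(name, set()) & set(host["vms"])
            if fits and not clashes:
                break
        else:
            # No existing host is suitable, so open a new one.
            host = {"free": host_capacity, "vms": []}
            hosts.append(host)
        host["free"] -= need
        host["vms"].append(name)
    return [host["vms"] for host in hosts]
```

Running the same VM list with and without a rule shows how a policy changes the recommended layout while the host count may stay the same.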
39. Capacity Planning Results
Recommendations to optimize workloads; reduce risk while eliminating or reallocating servers
Started with 9 ESX servers and 24 VMs
Reduce from 9 ESX servers down to 4 while lowering memory and CPU utilization
Utilization projections accommodated growth
The tool always includes headroom to ensure you don’t run out of capacity
41. ROI Case Study – IBM Test & Development Cloud
Cluster consisting of 18 Servers and 1802 Virtual machines
Goal: Analyze an existing, production virtual environment in search of further optimization, and show ROI using management and capacity planning
Solution: Used IBM SmartCloud Monitoring to analyze the current environment and perform “what-if” analysis
Results: More optimized environment uses fewer physical servers, which results in savings in hardware, administrator/support, energy, data center floor space and license costs, resulting in an additional ROI of 14.4% over a year, and the ability to accommodate an additional 113 virtual machines.
Optimization of an IBM internal development and test cloud using IBM SmartCloud Monitoring results in an additional ROI of 14.2%
“In order to realize true cost savings from a virtualization or cloud investment, customers need to be able to run virtual machines densely enough to maximize consolidation, yet be assured that their workloads are still running as well as they were before being virtualized, with room for expansion.”
14.4% Annual ROI
45. SmartCloud Monitoring
Virtual Machines | Storage | Networks
Provides greater visibility to cloud health
• Track cloud service levels & performance, and predict cloud problems before clients are impacted
• Understand performance and capacity today, and know what it will look like months from now
Lowers total cost of operations
• Optimize workload placement to wring maximum capacity and performance out of your cloud investment
• Freedom from expensive hypervisor or OS lock-in with a heterogeneous cloud infrastructure monitoring solution
Optimizes cloud performance
• Built-in performance analytics for right-sizing of virtual machines and resource optimization in the cloud
• Real-time proactive & predictive alerts help identify and fix problems quickly
Health Dashboards | Capacity Analytics | Performance Optimization
Increased Density | Reduced Risk | Minimized Outages | Optimized Workload Placement | Improved Service Levels
Years of experience managing mission critical workloads in the world’s largest enterprises yield peerless best practices advice
IBM SmartCloud Monitoring: Optimize your cloud performance and maximize ROI