SmartCloud Monitoring and Capacity Planning

© 2013 IBM Corporation
SmartCloud Monitoring
Simon Coote
29 May 2013, Copenhagen

Agenda

Introduction

What is SmartCloud Monitoring?

Demonstration

Health Dashboards

Predictive Analytics

Capacity Planning

Reporting

Summary

Monitoring – Holistic view of performance and availability
Operating Systems
Applications
Hypervisors
Hardware
Transactions
SmartCloud Monitoring

What is SmartCloud Monitoring?

Health dashboards to provide an instant, consolidated glimpse into cloud health

Topology views of the key interrelated components of the cloud

Reports on the health trends of cloud components and workloads, powered by Cognos

What-If capacity planning scenarios

Policy-Based optimization to put workloads where they’ll perform best, not just where they’ll fit

Performance Analytics for right-sizing of virtual machines

Integration with industry-leading IBM service management portfolio


SmartCloud Monitoring Broad Hypervisor & Platform Support

x86 Hypervisors
• VMware
• KVM (IBM, Red Hat)
• Citrix XenServer
• Citrix XenApp
• Citrix XenDesktop
• Hyper-V

IBM z/VM
• Monitoring of z/VM and Linux on System z

IBM Power
• Monitoring of IBM PowerVM (CEC, VIOS, LPARs (AIX, Linux), HMC)

Other Hypervisors
• Solaris/Zones

Integrate with ITM/ITCAM and other tools to extend scope beyond hypervisors

Transactions
• Transaction tracking
• Response time
• Robotic transactions

Applications
• SAP
• Exchange
• Lotus Notes
• PeopleSoft
• etc

Databases
• DB2
• Oracle
• SQL
• etc

Middleware
• WebSphere
• MQ
• WebLogic
• etc

OS
• Windows
• Linux
• AIX, Linux on p, i5
• z/OS, z/VM
• Solaris
• HP-UX

Storage/Network
• TPC: IBM Storage, EMC, Hitachi, NetApp
• DFM: NetApp
• Ethernet Switches
• etc

Server Platforms/OS
• x86 (IBM, non-IBM)
• IBM Power 5, 6, 7
• IBM System z, zEnterprise
• Sun (SPARC)
• HP
• Cisco UCS
• etc

Monitor the Virtualization/Cloud Ecosystem
[Diagram components: ITM, vCenter Servers, ESX/ESXi hosts]

Holistic Approach to Monitoring including storage,
networking, hypervisor, etc.

NetApp Storage Agent:
– Provides Monitoring data in ITM
– Integrates into Health Dashboard

TADDM Integration
– TADDM DLA discovers the vCenter environment/topology
– TADDM provides change data to VMware Health Dashboard

IBM Director Integration
– ITM Agent provides integration with the Director Server
– Allows for Management of VMware resources
– Historical Collection of HW data

Tivoli Storage Productivity Center (TPC):
– Agent provides storage metrics in TEP
– Integrates into Health Dashboard
– Warehouse storage metrics for reporting and analysis

Network Monitoring Agent
– Monitor switches used by VMware
– Integrate Network Events into Dashboard

Consider adding application monitoring to the ecosystem
[Diagram components: Health Dashboard, VI Agent, NetApp Agent, DFM, NetApp, TADDM, TPC (IBM Storage, Hitachi, EMC, NetApp), Network Switches, Apps / Middleware]

SmartCloud Monitoring 7.2 – What's new?

Additional VMware Metrics:
− Orphaned VMDK files

Completely rewritten VMware Health Dashboard:
− Lighter weight/Faster Response Time
− More intuitive and easy to navigate
− Fewer clicks to drill down to root cause

New DASH user interface:
− Single User Interface where multiple Tivoli products are integrated
− Includes TCR 3.1 which includes Active Reporting
− Sample Active Report Attached Here:
− Self-Service dashboarding capabilities. Build dashboards using any ITM data, plus data brought in through Tivoli Directory Integrator
− Support for tablet devices

Improved VMware Capacity Planning:
− Can save existing customization
− Can do partial loads of the VMware environment
− New VMware Expense Reduction Report and other reports
− Improved benchmark matching
− Evaluates CPU, Memory, Network I/O, Storage, and Storage Topology

SmartCloud Monitoring 7.2 – What's new?

Power Systems:
− Capacity Planning: What-if scenarios, server sizing, etc. for Power Systems
− Enhanced Power Systems Agents including consolidation of UNIX OS Agent
and Premium AIX Agent

Enhancements to other hypervisors:
− Other hypervisors such as Citrix and Cisco UCS have been enhanced

ITM 6.3 Enhancements:
− OS Dashboard in DASH
− OSLC Linked Data for integration with TBSM and other products
− 64-bit TEPS that doubles the number of concurrent users and improves scale
− Warehouse Range Partitioning
   • This can dramatically improve performance by eliminating the need for pruning of the historical data
− Authorization Policy Server
   • Restricts dashboard users' access to Managed System Groups and to individual agent managed systems
   • Adds the ability to grant role-based access control in addition to Access Control Lists, making access control easier and safer
   • Role inheritance for scalable management
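
To make the idea of role inheritance concrete, here is a minimal sketch of how inherited roles could resolve to a set of permitted Managed System Groups. It does not use the Authorization Policy Server API; the `Role` class, the `can_view` helper, and all role and group names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical role: grants access to groups and may inherit from parents."""
    name: str
    groups: set = field(default_factory=set)
    parents: list = field(default_factory=list)

    def effective_groups(self):
        """Groups granted directly plus everything inherited from parent roles.

        Assumes the role graph has no cycles.
        """
        granted = set(self.groups)
        for parent in self.parents:
            granted |= parent.effective_groups()
        return granted


def can_view(role, managed_system_group):
    """True if the role, or any role it inherits from, grants the group."""
    return managed_system_group in role.effective_groups()


# Made-up roles and group names, for illustration only.
linux_operator = Role("linux_operator", groups={"LinuxServers"})
vmware_operator = Role("vmware_operator", groups={"VMwareClusters"})
cloud_admin = Role(
    "cloud_admin",
    groups={"Datastores"},
    parents=[linux_operator, vmware_operator],
)
```

With inheritance, a new group granted to `linux_operator` becomes visible to `cloud_admin` without touching the administrator role, which is the scaling benefit the slide alludes to.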

Today’s Agenda
High-level VMware dashboard showing all clusters, events, and key KPIs
Click to drill down

Single Cluster view showing events and KPIs
Can be real-time or historical
Select any of the links below to go to Servers, VMs, or Datastores

Single Cluster view showing vSphere Servers
Click on a link to drill down to a single server

Single Server showing historical data
Actions allow you to launch in context to TEP or TCR
Select any of the links below to go to Cluster, Servers, VMs, or Datastores

Configuration tab shows config data from TADDM
Bread crumbs allow for easy navigation

Change History data from TADDM

Networking KPIs for the vSphere Server
Select any of the links below to go to Cluster, VMs, or Datastores

Virtual Machine Page showing real-time or historical KPIs and Events
Link to OS Dashboard for the VM

OS Dashboard with metrics and Events from the OS Agent

VMware Datastores
Select to drill down

Single Datastore shows real-time or historical data and Events
Select any of the links below to go to connected Clusters, Servers, or VMs

Dynamic Thresholding & Adaptive Monitoring
• View real-time and time-aligned historical data
• Analyze the trends and see trends vs anomalies
• Use Avg, Max, Min, Percentile, Mode
• Monitors can be defined for shifts or can be adjusted for seasonal differences
• Agents provide static monitoring thresholds
• IBM Tivoli Monitoring provides static thresholds and Dynamic Thresholds/Adaptive Monitoring
• The system learns the “normal” behaviour for a resource and sets the threshold based on historical data
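
The idea of learning a threshold from history can be sketched in a few lines. This is an illustration of the general technique only, not the algorithm IBM Tivoli Monitoring uses: it assumes the threshold is simply a chosen percentile of past samples, computed separately per shift. The function names and the shift labels are hypothetical.

```python
import statistics


def dynamic_threshold(history, percentile=95):
    """Return the given percentile (1..99) of the historical samples."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[percentile - 1]


def thresholds_by_shift(samples, percentile=95):
    """samples: iterable of (shift_name, value); returns one threshold per shift."""
    by_shift = {}
    for shift, value in samples:
        by_shift.setdefault(shift, []).append(value)
    return {
        shift: dynamic_threshold(values, percentile)
        for shift, values in by_shift.items()
    }
```

Keeping a separate threshold per shift means a CPU level that is routine during the day shift can still raise an alert when it appears overnight.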

Performance Analyzer: Predictive Trending
Leverage collected data to spot trends and highlight emerging concerns
• Hands-off capacity monitoring
• Automates performance analysis and reporting
• Prediction of application bottlenecks
• Creation of alerts for potential service threats
• “What will my resources look like tomorrow, next week, next month or next year?”
• “What IT resources should I worry about next?”
• “Will I have enough capacity to get me through Monday?”
[Chart: metric over time, showing actual monitor data, the predicted trend, the threshold, and the predicted metric violation]

Key Capabilities of Performance Analyzer

Out of the box analytics tasks for:
– OS Agents (CPU, Memory, Disk, Network)
– DB2 (Pool Read/Write, Sort Time, Memory, Tablespaces, etc.)
– Oracle (Cache Hits, Archive Space, Tablespaces, Transactions, etc.)
– VMware (Physical Server, VMs, Memory, CPU, Datastores, etc.)
– Power Systems (SEA, Network, CPU, Memory, I/O, Entitlement, etc.)
– Response Time (Web, Robotic, and Client Response Time)

Easily Customized to Analyze any numeric data
– Arithmetic Modules for calculating data and normalizing data
– Analytic Modules for performing predictive analytics
– Predict the following:
• How many days until I reach a Warning or Critical Threshold
• Predict the value 7, 30, 90 days into the future (customizable)

When defining Situations in Performance Analyzer, define the number of days' notice you need
– Take into account the statistical values such as the strength, number of data points, etc.
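
The two predictions listed above (days until a threshold is reached, and the value N days ahead) can be sketched with an ordinary least-squares line over daily samples. This is an assumption-laden illustration, not Performance Analyzer's implementation: the function names, the use of R² as the "strength" measure, and the default `min_points` and `min_r_squared` values are all hypothetical.

```python
def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept, r_squared).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def forecast(values, days_ahead):
    """Trend-line value `days_ahead` days after the latest sample."""
    slope, intercept, _ = fit_line(values)
    return intercept + slope * (len(values) - 1 + days_ahead)


def days_until(values, threshold):
    """Days until the trend line reaches `threshold`.

    Returns 0.0 if the trend is already there, None if it is flat or falling.
    """
    slope, intercept, _ = fit_line(values)
    current = intercept + slope * (len(values) - 1)
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope


def needs_alert(values, threshold, notice_days, min_points=14, min_r_squared=0.5):
    """Alert only if the fit is trustworthy and the threshold is within the notice period."""
    if len(values) < min_points:
        return False
    _, _, r_squared = fit_line(values)
    if r_squared < min_r_squared:
        return False
    remaining = days_until(values, threshold)
    return remaining is not None and remaining <= notice_days
```

Checking the number of data points and the strength of the fit before alerting mirrors the advice above: a steep trend drawn through three noisy samples is not a good basis for a capacity alarm.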

Select Agent Type
Select Attribute Group
Select Hourly, Daily, Weekly, etc.
Analyze all or a subset of your agents
Define Warning and Critical Thresholds
Define prediction durations

Performance Analyzer for:
• VMware and Power Systems (CPU Trends, Disk Utilization, Memory, Network) out of the box
• For VMware, recommend setting up Analytic Tasks for:
   – Cluster Percent Effective CPU utilization, Cluster Percent Effective Memory utilization, CPU Percent Ready
• Set up Analytic Tasks for other hypervisors (Hyper-V, KVM, Solaris Zones)
• Linear and non-linear trending: the tool picks the model that best fits the data
[Chart callouts: Model Chosen; Time to Critical and Warning]
The bar chart represents the historical data and the line graph represents the statistical trend line
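
Picking the model that best fits the data can be illustrated by fitting several candidate curves and keeping the one with the smallest squared error. The slides do not say which non-linear models the tool considers, so the exponential candidate below is an assumption, as are the function names; this is a sketch of the selection idea, not the product's algorithm.

```python
import math


def _least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def _sse(model, xs, ys):
    """Sum of squared errors of `model` against the observed values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_model(values):
    """Fit linear and (when possible) exponential trends; return (name, predict)."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    xs = list(range(len(values)))
    b, a = _least_squares(xs, values)
    candidates = {"linear": lambda x, a=a, b=b: a + b * x}
    # The exponential fit works on log(y), so it needs strictly positive data.
    if all(v > 0 for v in values):
        k, log_a = _least_squares(xs, [math.log(v) for v in values])
        candidates["exponential"] = (
            lambda x, k=k, log_a=log_a: math.exp(log_a + k * x)
        )
    best = min(candidates, key=lambda name: _sse(candidates[name], xs, values))
    return best, candidates[best]
```

The returned `predict` function takes a day index, so `predict(len(values) + 29)` gives the trend value 30 days after the last sample.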

Why is Capacity Management Critical to Cloud Management?
• Helps consolidate and reduce IT costs
   – Reduces HW and labor costs
   – Fewer physical servers needed
   – Reduce hypervisor license costs
   – Increase VM density to drive Cloud ROI
   – Predict how many more customers / VMs can be serviced
• Helps ensure application availability and reduce risk
   – Are any resources overloaded? When will physical resources reach their limits?
   – Have there been any significant changes in my environment recently?
   – Identify trends to predict bottlenecks, or free up space and balance workloads
   – Ensure supply can meet demand
   – Ensure technical and business policies are met to reduce risk
• Helps optimize resource utilization
   – Right-size virtual machines and allocate based on usage, over-commit within known risk limits
   – Pack VMs on the infrastructure to optimize resources

Capacity Planning & Analytics - Architecture
[Architecture diagram]
Data sources: Platforms, Storage, Network, Hypervisors, Servers, …
Copy, Federate
Benchmarking data
Usage profiles, workload relationships
Custom Tags enhance Config Profiles and workload relationships
Capacity Planning Database
Workload Characterization
- Establish patterns using historical data
- Capture workload attributes to enable optimization policies
Business and Technical policies
Optimization Engine to size and place VMs
Plan Recommendation (minimize systems, license, balance)

The Case for Holistic Capacity Analytics
Unbalanced: can’t move workload to this cluster because it’s almost out of datastore space
On one screen, I can check all of the key resources to see if my workload is balanced

The Case for Holistic Capacity Analytics
Unbalanced across clusters and within a cluster
Need to look at all key attributes to look for bottlenecks and imbalance
Disk I/O metrics also available

SmartCloud Monitoring Capacity Planning Center

Planning Center – applying parameters

Policy Driven Capacity Planning
Choose rules for “what-if” scenario
Opens a new tab with pre-built rules or a rule editor for customer rules
Out of the box rules
• Colocation/Anti-colocation
   – Place Win2003 32-bit on the same ESX server
• Boundary Rules
   – Place Win2003 32-bit on the same ESX server
• Utilization Rules
   – Provide 20% growth for key business application
Create custom rules
SPECint data used for analysis
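
As a rough illustration of what colocation, anti-colocation, and utilization rules mean for a candidate placement, the sketch below checks a VM-to-host mapping against such rules. It is not the product's rule editor or rule syntax; the function names, the data shapes, and the VM and host names in the usage note are all hypothetical.

```python
def violations(placement, colocate=(), anti_colocate=()):
    """Check a candidate placement against simple placement rules.

    placement:      dict mapping VM name -> host name
    colocate:       groups of VMs that must all share one host
    anti_colocate:  groups of VMs that must all be on different hosts
    Returns a list of human-readable rule violations (empty if none).
    """
    problems = []
    for group in colocate:
        hosts = {placement[vm] for vm in group}
        if len(hosts) > 1:
            problems.append(
                f"colocation: {sorted(group)} span hosts {sorted(hosts)}"
            )
    for group in anti_colocate:
        first_on_host = {}
        for vm in sorted(group):
            host = placement[vm]
            if host in first_on_host:
                problems.append(
                    f"anti-colocation: {first_on_host[host]} and {vm} share {host}"
                )
            else:
                first_on_host[host] = vm
    return problems


def demand_with_growth(demand, growth=0.20):
    """Utilization rule: inflate measured demand by a growth allowance."""
    return demand * (1 + growth)
```

For example, with `placement = {"web1": "esx1", "web2": "esx1"}` and `anti_colocate=[{"web1", "web2"}]`, the check reports that the two web VMs share `esx1`; a what-if scenario would reject or re-plan that placement.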

Spec data provides granular capacity planning

Capacity Planning Results
Recommendations to optimize workloads; reduce risk while eliminating or reallocating servers
Started with 9 ESX servers and 24 VMs
Reduce from 9 ESX servers down to 4 while lowering memory and CPU utilization
Utilization projections accommodated growth
The tool always includes headroom to ensure you don’t run out of capacity
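
Consolidation with headroom can be sketched as a bin-packing problem. The example below uses first-fit decreasing, a standard heuristic that stands in for the product's optimization engine, whose actual algorithm is not described in these slides. The `Host` class, the `consolidate` function, and the 20% default headroom are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; capacities are in arbitrary normalized units."""
    name: str
    cpu: float
    mem: float
    vms: list = field(default_factory=list)
    cpu_used: float = 0.0
    mem_used: float = 0.0


def consolidate(vms, hosts, headroom=0.2):
    """Place VMs largest-first on the first host that still fits.

    vms:      list of (name, cpu_demand, mem_demand)
    headroom: fraction of each host's CPU and memory that must stay free
    Returns the hosts that ended up with at least one VM.
    """
    limit = 1.0 - headroom
    for name, cpu, mem in sorted(vms, key=lambda v: (v[1], v[2]), reverse=True):
        for host in hosts:
            fits_cpu = host.cpu_used + cpu <= host.cpu * limit
            fits_mem = host.mem_used + mem <= host.mem * limit
            if fits_cpu and fits_mem:
                host.vms.append(name)
                host.cpu_used += cpu
                host.mem_used += mem
                break
        else:
            raise ValueError(f"no host can take {name} within the headroom limit")
    return [host for host in hosts if host.vms]
```

As a usage example with made-up sizes (not figures from the slide): nine hosts of 100 CPU units and 128 GB each, and 24 VMs needing 10 CPU units and 16 GB each, pack onto four hosts at 20% headroom, because memory allows six such VMs per host. Hosts left empty are the candidates for elimination or reallocation.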

Utilization after optimization
Headroom
Reduce both CPU and Memory Utilization reservations to free up resources

ROI Case Study – IBM Test & Development Cloud
Cluster consisting of 18 Servers and 1802 Virtual machines
Goal: Analyze an existing, production virtual environment in search of further optimization, and show ROI using management and capacity planning
Solution: Used IBM SmartCloud Monitoring to analyze the current environment and perform “what-if” analysis
Results: A more optimized environment uses fewer physical servers, which results in savings in hardware, administrator/support, energy, data center floor space and license costs, resulting in an additional ROI of 14.4% over a year, and the ability to accommodate an additional 113 virtual machines.
Optimization of an IBM internal development and test cloud using IBM SmartCloud Monitoring results in an additional ROI of 14.2%
“In order to realize true cost savings from a virtualization or cloud investment, customers need to be able to run virtual machines densely enough to maximize consolidation, yet be assured that their workloads are still running as well as they were before being virtualized, with room for expansion.”
14.4% Annual ROI

VMware Expense Reduction Report - Example

SmartCloud Monitoring
Virtual Machines | Storage | Networks
Provides greater visibility to cloud health
• Track cloud service levels & performance, and predict cloud problems before clients are impacted
• Understand performance and capacity today, and know what it will look like months from now
Lowers total cost of operations
• Optimize workload placement to wring maximum capacity and performance out of your cloud investment
• Freedom from expensive hypervisor or OS lock-in with a heterogeneous cloud infrastructure monitoring solution
Optimizes cloud performance
• Built-in performance analytics for right-sizing of virtual machines and resource optimization in the cloud
• Real-time proactive & predictive alerts help identify and fix problems quickly
Health Dashboards | Capacity Analytics | Performance Optimization
Increased Density | Reduced Risk | Minimized Outages | Optimized Workload Placement | Improved Service Levels
Years of experience managing mission-critical workloads in the world’s largest enterprises yield peerless best-practices advice
IBM SmartCloud Monitoring: Optimize your cloud performance and maximize ROI

Kiitos!
Tack!
Thank you!
Takk!
Tak!
