A presentation on Nastel AutoPilot's capabilities for advanced application analytics. Built on Complex Event Processing (CEP), AutoPilot provides early warning of potential or actual problems across multiple data sources, in real time.
2. Challenges many of our customers face
Competitive Pressures
Ability to react to volatile market
Rapid changes in demand
Need to retain customers and keep service levels high
Requirement for Sustainable Cost Reduction
Offshoring & Outsourcing
De-duplication – overlapping products and roles
Need to accomplish more for less
Regulatory Challenges
Dodd-Frank, Basel III, HIPAA and more
Need to manage risk
5. Nastel helps address Competitive Pressures
Competitive Pressures
Identifies issues that could prevent systems from handling rapid changes in order volume
Reduces number and duration of outages
AutoPilot’s Complex Event Processing helps manage competitive pressures by providing automated problem detection, reducing the number and duration of outages
6. Nastel helps address Competitive Pressures
Competitive Pressures
Big Data – if you don’t master the exploitation of big data,
your competitors will…
If you master big data, you can:
Resolve problems faster, improve service levels and retain customers
Understand customer behaviour
See the patterns, learn how your users use your apps, and from this design ones that better meet their needs before your competitors do
AutoPilot is almost unique in understanding application performance data and analytics, both web and legacy. This capability was built into AutoPilot from the ground up and is provided as close to real time as possible.
9. Nastel helps address Cost Reduction
Requirement for Sustainable Cost Reduction
Improve effectiveness of offshore teams by avoiding
eyes-on-screen monitoring
Offshore team effectiveness improved: no eyes-on-screen monitoring is necessary because AutoPilot alerts a human only when absolutely necessary, resulting in improved utilization of IT resources
Reduce the number of tools required for monitoring and management
Start by consolidating their data into AutoPilot for consistency
Number of tools can be reduced: AutoPilot supports all major middleware platforms with a unified monitoring platform
Diagram labels: Cloud, Servers, Application Servers, TIBCO, WMQ, System Z, DataPower, Solace, DB, J2EE/.NET, CEP
Improve productivity by eliminating false-positive alerts
AutoPilot improves productivity by using CEP to calculate a trend. Instead of raising false alerts at T1, T2, T3 and T4, CEP dynamically creates its own metrics from the events it receives from collectors (agents/probes), turns them into actionable information, and correctly alerts on the trend at T5. The result is more effective staff utilization.
Chart: CPU over Time against a Threshold, with points marked T1, T2, T3, T4 and T5
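One way to picture the T1 to T5 behaviour is to alert on a smoothed trend rather than on every raw sample. The sketch below is an illustration under stated assumptions, not AutoPilot's actual algorithm: the simple moving average, the window of 3 samples, the threshold of 80 and the CPU readings are all made up for the example.

```python
from collections import deque

def static_alerts(samples, threshold):
    """Indexes at which a raw sample exceeds the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]

def trend_alerts(samples, threshold, window=3):
    """Indexes at which the moving average of the last `window`
    samples exceeds the threshold (hypothetical trend rule)."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(samples):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(i)
    return alerts

# Made-up CPU % readings: isolated spikes first, then a sustained rise.
cpu = [40, 85, 40, 86, 42, 84, 41, 88, 60, 82, 90, 92]

print(static_alerts(cpu, 80))  # fires on every spike
print(trend_alerts(cpu, 80))   # fires only once the rise is sustained
```

With these readings the static rule fires on each isolated spike, while the trend rule stays quiet until the last sample, where the average of the most recent three readings is above the threshold.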
12. Nastel helps address Regulatory Challenges
Regulatory Challenges
Segregation of duties, Privileged access, recertification
AutoPilot helps enterprises control segregation of duties and privileged access via a single security model employed across all middleware, which helps reduce risk
Screenshot labels: User name: Albert Mavashev; Password expires in: 30 days; Account disabled; Account locked; Audit account; LDAP; Inherit permissions from owner; Administrator@Acme.com; group checkboxes for WMQ, DataPower, Solace, TIBCO RV and TIBCO EMS
Provides vital insight into compliance with regulatory standards
AutoPilot automatically tracks applications across the enterprise, capturing vital insight into compliance with regulatory standards. Its real-time performance monitoring enables you to stay compliant with your internal and external commitments.
Diagram labels: TradeStart, Missing Verification, TradeEnd, Customer Access
16. Middleware-Centric Application Performance Monitoring
INFRASTRUCTURE: Storage, Servers, Databases, Network
Messaging Middleware, Application Servers, Enterprise Service Bus, SOA Appliances
APPLICATIONS: Trading Equities, Claims Processing, Funds Transfers, Order Handling, Payments Processing
TRANSACTIONAL MONITORING: Trade Auditing, Cust ID Tracking, Balance Authorization, Failed TX, Lost TX, Validation
OPERATIONAL MONITORING
CEP Policy Engine
Repository
Business Service Views for Line of Business
Real-time Views for Operations
18. AutoPilot Architecture: Foundation for building Elastic APM
Diagram labels: Domain Server (CEP); CEP Servers for PROD, QA and DEV; Pub-sub over IP; Grid; State; Fail-over; PMDB
Policies
• Business Rules
• Analytics
• Actions
• Notifications
• Desired state
Monitors
• Sampling
• Events
• Transactions
• Streaming
• Data sources
Facts
• Events
• Event payload
• Metrics
• KPIs & KBIs
• Derived Metrics
Diagram labels: Monitors, Facts, KPIs, KBIs, Policies, Objectives, Goals, Users, Dashboard, Alerts, Notifications
19. Active Data Grid:
In-memory cache with persistence
Elastic APM:
Just-in-time deployment across CEP instances
Diagram labels: two CEP Instances, each with Policies, Data Sources and a Persistent Store
20. CEP: Complex Event & Metric Processing
Diagram labels: Events & Metrics; AutoPilot CEP; Policies: Rules & Situation Analysis; Compound Event / Predicted Situation; KPIs, Events, Actions and Notifications
Rules processing speed:
A single CEP engine running on a 64-bit quad-CPU server with 4 GB of memory can process 2M rules per second. Because CEP is a virtual machine, it can scale linearly: adding a second CEP engine doubles the speed.
21. Metrics
Metric              Short Description
Value               Current value
Update-Count        Times value updated (changed or same)
Change-Count        Times value changed
Reset-Count         Number of resets
Previous-Value      Previous value
Time-Created        Time created
Last-Updated        Time last updated
Last-Changed        Time last changed
Update-Age          Time since update
Change-Age          Time since change
Time-Difference     Time difference in ms between fact publisher (origin) and subscriber
Min                 Overall minimum since reset
Max                 Overall maximum since reset
MAvg                Moving average
Counter             Last actual value for a counter type, versus the delta reported
Time-Since-Reset    Time since reset
Change-Latency      Time between latest changes
Update-Latency      Time between latest updates
Update-Velocity     Rate of update
History-Size        Number of facts in history store
History-Max-Size    Maximum number of history samples
History-Time        Time represented by history
History-Avg         Average of values in history facts
History-EMAvg       Exponential moving average of values in history facts
History-Max         Maximum value in fact history
History-Min         Minimum value in fact history
History-Variance    Variance of values in fact history
History-Deviation   Standard deviation of values in fact history
History-Dev-Mean    Number of standard deviations from the mean
History-Bound       Upper bound based on Chebyshev inequality
History-Band-High   High band based on Bollinger Bands
History-Band-Low    Low band based on Bollinger Bands
History-RSI         Relative Strength Index
History-SO-K        Stochastic oscillator
History-CAvg        Average percent change in history (based on % change)
History-CVariance   Variance of values in fact history (based on % change)
History-CDeviation  Standard deviation of values in fact history (based on % change)
History-CBound      Upper bound based on Chebyshev inequality (based on % change)
History-CDev-Mean   Number of standard deviations from the mean (based on % change)
History-CBand-High  High band based on Bollinger Bands (based on % change)
History-CBand-Low   Low band based on Bollinger Bands (based on % change)
History-CAvg-Gain   Average percent gain
History-CAvg-Loss   Average percent loss
History-CAD-Ratio   Ratio of advances to declines
History-HROC        Historical rate of change percent
History-IROC        Instantaneous rate of change percent
Some of the derived facts we provide
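To make a few of these derived facts concrete, here is a minimal sketch of how they could be maintained for a single metric. The `Fact` class and its method names are hypothetical and are not AutoPilot's API; the use of population variance and the choice not to count the first value as a change are assumptions.

```python
from collections import deque
from statistics import fmean, pstdev, pvariance

class Fact:
    """Tracks one metric plus a bounded history (illustrative only)."""

    def __init__(self, history_max_size=100):
        self.history = deque(maxlen=history_max_size)
        self.value = None
        self.previous_value = None
        self.update_count = 0
        self.change_count = 0
        self.min = None
        self.max = None

    def update(self, new_value):
        self.update_count += 1  # every update, changed or same
        if self.value is not None and new_value != self.value:
            self.change_count += 1  # only real changes
        self.previous_value = self.value
        self.value = new_value
        self.min = new_value if self.min is None else min(self.min, new_value)
        self.max = new_value if self.max is None else max(self.max, new_value)
        self.history.append(new_value)

    def derived(self):
        """Return some derived facts; needs at least one update."""
        avg = fmean(self.history)
        dev = pstdev(self.history)
        return {
            "Value": self.value,
            "Previous-Value": self.previous_value,
            "Update-Count": self.update_count,
            "Change-Count": self.change_count,
            "Min": self.min,
            "Max": self.max,
            "History-Size": len(self.history),
            "History-Avg": avg,
            "History-Variance": pvariance(self.history),
            "History-Deviation": dev,
            # Distance of the current value from the mean, in deviations.
            "History-Dev-Mean": (self.value - avg) / dev if dev else 0.0,
        }

fact = Fact()
for sample in [10, 10, 12, 14]:
    fact.update(sample)
print(fact.derived())
```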
23. Complex Event Processing Capabilities
Decouples rule evaluation from physical event structure
Changes to the event patterns or structure do not break rules
Simulations and replay can be accomplished easily
Live recording and replay of actual event feeds
No need for actual event sources
Rules can be tested with simulations before going live
White Board aids the design and development of rules based on transient data (real-time events)
Evaluations can be performed on statistics computed from real-time feeds.
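The decoupling and replay ideas above can be sketched in a few lines. This is an illustration only: the JSON event layout, the `to_facts` adapter, the `failure_rate_rule` rule and the 10% limit are all hypothetical. The point is that the rule reads named facts, so a change in the raw event structure only touches the adapter, and a recorded feed can stand in for a live event source.

```python
import json

def to_facts(raw_event):
    """Adapter: map a raw event (hypothetical layout) to named facts."""
    payload = raw_event["payload"]
    return {"orders_filled": payload["filled"],
            "orders_failed": payload["failed"]}

def failure_rate_rule(facts, limit=0.1):
    """Rule: sees only named facts, never the raw event structure."""
    total = facts["orders_filled"] + facts["orders_failed"]
    return total > 0 and facts["orders_failed"] / total > limit

def replay(recorded_lines, adapter, rule):
    """Feed recorded events (one JSON document per line) through a rule."""
    return [rule(adapter(json.loads(line))) for line in recorded_lines]

# A tiny made-up recording standing in for a captured live feed.
recording = [
    '{"payload": {"filled": 95, "failed": 5}}',
    '{"payload": {"filled": 80, "failed": 20}}',
]
print(replay(recording, to_facts, failure_rate_rule))
```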
25. Ways to detect performance trends
Measure relevant application performance indicators
Orders filled, failed, missed
JVM GC activity, memory, I/O
Create a baseline for each relevant indicator
1-60 second sampling for near real-time baseline
1, 10, 15 min, daily, weekly, monthly for short- and long-term baselines
Samples can range anywhere from 1-60 seconds depending on the level of required resolution
Apply analytics to determine trends and behavior
Can vary from simple to complex
Prefer the KISS approach (Keep It Simple, Stupid)
26. 3 Simple methods to detect trends
(No complex math required)
Bollinger Bands
Determine high and low bands based on available baseline
Defines a normal channel, which is typically within 2 standard deviations of the mean
Compute STDDEV, Mean, Current sample
% Change
Sample to sample, day-to-day, week-to-week, etc.
Velocity
Number of measured units per unit of time (example: response time rises from 10 to 20 seconds over a 5-second interval, which means (20-10)/5 = 2 units/sec)
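The three methods can be sketched as small functions. This is a minimal illustration, assuming a population standard deviation and a band width of 2 deviations; the function names are made up for the example.

```python
from statistics import fmean, pstdev

def bollinger_bands(baseline, k=2.0):
    """Return (low, high) bands: mean -/+ k standard deviations."""
    mean = fmean(baseline)
    dev = pstdev(baseline)
    return mean - k * dev, mean + k * dev

def percent_change(previous, current):
    """Percent change between two samples (previous must be non-zero)."""
    return (current - previous) / previous * 100.0

def velocity(start_value, end_value, interval_seconds):
    """Measured units gained per second over the interval."""
    return (end_value - start_value) / interval_seconds

low, high = bollinger_bands([8, 12, 8, 12])
print(low, high)               # normal channel for this baseline
print(percent_change(50, 60))  # sample-to-sample change in percent
print(velocity(10, 20, 5))     # the slide's example: 2 units/sec
```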
27. Typical Usage
High Band
Given a set of metrics, alert when one or more are above the High band for 2 or more samples
Indication of abnormal activity over a period of time
Caution: abnormal can become the new normal
% Change
Useful indicator for near real-time monitoring of resources (such as
heap, memory, CPU, storage)
Useful indicator for long term trends (daily, weekly)
Velocity
Very useful for monitoring metrics that measure usage of resource
that have a finite upper bound (memory, storage, table space
etc.)
Measuring velocity can help estimate when upper limits will be reached
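Two of these usages reduce to very small checks. The sketch below assumes the high band has already been computed elsewhere and that velocity stays constant. Both are simplifications, and the function names are hypothetical.

```python
def sustained_breach(samples, high_band, min_samples=2):
    """True if the latest `min_samples` samples are all above the band."""
    if len(samples) < min_samples:
        return False
    return all(s > high_band for s in samples[-min_samples:])

def seconds_until_limit(current, limit, velocity_per_sec):
    """Estimated seconds until `limit` is hit at constant velocity.

    Returns None when the metric is flat or shrinking.
    """
    if velocity_per_sec <= 0:
        return None
    return max(0.0, (limit - current) / velocity_per_sec)

print(sustained_breach([10, 15, 16], high_band=14))  # last two are above
print(sustained_breach([10, 16, 12], high_band=14))  # a single spike only
print(seconds_until_limit(current=70, limit=100, velocity_per_sec=2))
```

Requiring two or more consecutive samples filters out one-off spikes, and the time-to-limit estimate is only as good as the assumption that the current velocity continues.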
28. Required instrumentation
Data collectors
Attempt to collect all relevant indicators within the same time tick
Response time, GC activity, memory usage, CPU usage
Build a history for each collected metric
Either in memory for near real-time analysis
Or in storage for short- and long-term analysis (minutes, hours, days)
Pattern matching, analytics
Need to scan and pattern match application metrics (such as find all
applications whose GC is above High Bollinger Band for 2+ samples)
Run as a continuous query, which is executed as metrics are collected
and updated
Actionable Outcome
Alerts, notifications, actions
Visualization, dashboards
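The continuous-query idea, re-evaluating a pattern each time a metric arrives, might look like the sketch below. Everything here is an assumption for illustration: the `GcBandQuery` class, the in-memory per-application history, the 2-deviation band and the sample values. It is not how AutoPilot implements queries.

```python
from collections import defaultdict, deque
from statistics import fmean, pstdev

class GcBandQuery:
    """Finds applications whose GC metric is above the high
    Bollinger band for N or more consecutive samples."""

    def __init__(self, baseline_size=20, breach_samples=2, k=2.0):
        self.history = defaultdict(lambda: deque(maxlen=baseline_size))
        self.breaches = defaultdict(int)
        self.breach_samples = breach_samples
        self.k = k

    def on_metric(self, app, gc_value):
        """Call whenever a collector reports a new GC sample."""
        baseline = self.history[app]
        if len(baseline) >= 2:
            high = fmean(baseline) + self.k * pstdev(baseline)
            if gc_value > high:
                self.breaches[app] += 1   # extend the breach streak
            else:
                self.breaches[app] = 0    # streak broken
        baseline.append(gc_value)

    def matches(self):
        """Applications currently matching the pattern."""
        return sorted(app for app, n in self.breaches.items()
                      if n >= self.breach_samples)

query = GcBandQuery()
for value in [5, 6, 5, 6, 5, 6, 20, 25]:   # GC jumps at the end
    query.on_metric("orders", value)
for value in [5, 6, 5, 6, 5, 6, 5, 6]:     # steady GC
    query.on_metric("billing", value)
print(query.matches())
```

Because breaching samples are also added to the baseline, a sustained shift will eventually widen the band, which is the "abnormal can become the new normal" effect in practice.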
29. Example: Monitoring Java Application by examining GC Activity
Java application running in a standalone JVM container
Monitoring JVM GC (Garbage Collection) as a byproduct of application activity
Sample GC every 10 seconds
# GC Samples
GC Duration (ms.)
GC CPU Usage %
Avg. GC CPU Usage (since JVM startup)
JVM Heap Utilization %
33. Typical causes of Java leaks
Programming errors, bugs
Unchecked array, list, hash map growth
Not closing JDBC Prepared Statements
Not closing Sockets, File handles
Thread leaks, handle leaks
Class loader leaks
Resources allocated outside JVM
34. Leaking Chart Pattern – Detecting Resource Accumulation
Chart: VM Heap Usage %
35. Detecting Resource Leaks using Momentum Oscillator
Chart annotations: leak pattern detected; Momentum Oscillator trending higher; heap not yet exhausted
Momentum Oscillator: values range from 0 to 100 and are derived from the difference between the sum of all recent gains and the sum of all recent losses in the underlying metric. A value of 50 means that gains and losses net to zero.
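One formulation consistent with this description is sketched below. It is an assumption, since the slides do not give the exact formula: with G the sum of gains and L the sum of losses over the recent samples, the value is 50 + 50 * (G - L) / (G + L), which simplifies to 100 * G / (G + L). The heap readings are made up.

```python
def momentum_oscillator(values):
    """Oscillator in the range 0-100 from summed gains and losses.

    Assumed formula: 100 * gains / (gains + losses), so 50 means
    gains and losses cancel out.
    """
    gains = 0.0
    losses = 0.0
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        if delta > 0:
            gains += delta
        else:
            losses -= delta
    if gains + losses == 0:
        return 50.0  # flat series: no net gain or loss
    return 100.0 * gains / (gains + losses)

healthy_heap = [50, 55, 50, 55, 50]      # GC reclaims what was allocated
leaking_heap = [40, 45, 44, 52, 51, 60]  # each cycle retains a bit more

print(momentum_oscillator(healthy_heap))
print(momentum_oscillator(leaking_heap))
```

A heap that keeps returning to the same level hovers around 50, while a leaking heap pushes the value well above 50 long before the heap is exhausted.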
36. Conclusion: Monitoring Elastic Environments
Elastic Applications can’t be monitored using static models
Static thresholds
Static data/transaction flow models
Complex systems layered on top of complex systems
Too many constantly changing variables
Makes root cause analysis very difficult
Requires extensive cross-technology expertise
Preferred approach – Holistic Application Monitoring
Granular data collection:
Application and infrastructure metrics
Analytics, automated baselines
Real-time and historical
Resource monitoring coupled with Transaction Profiling
Visualization that connects different teams:
Application support, DevOps, IT Support
37. Some of our valued Clients
Delivering value since 1994
Over 200 customers
Logos shown include SEMC and De Post – La Poste, with customer tenures of 7, 10 and 11 years