Brighttalk reason 114 for learning math - final
1. Reason #114 For Learning Math:
Using Analytics to Improve Service Assurance
Follow Us: #ITSMSummit!
2. Andrew White
Cloud and Smarter Infrastructure Solution Specialist
IBM Corporation
Mr. White has fifteen years of experience designing and managing the
deployment of Systems Monitoring and Event Management software. Prior
to joining IBM, Mr. White held various positions including the leader of the
Monitoring and Event Management organization of a Fortune 100 company
and developing solutions as a consultant for a wide variety of organizations,
including the Mexican Secretaría de Hacienda y Crédito Público, Telmex,
Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance and the US
Navy Facilities and Engineering Command.
4. Ground rules for this session…
• If you can’t tell if I am trying to be funny… GO AHEAD AND LAUGH!
• Feel free to text, tweet, yammer, or whatever
to share with the rest of the attendees
• If you have a question, no need to wait until
the end. Just interrupt me. Seriously… I
don’t mind.
5. I am here today to share some of what I have learned about
6. CIOs turn to innovative technologies to deliver better outcomes
Big Data Analytics
§ Analyze an enormous variety of information sources
§ Real-time insights & actions on streaming data
Mobile Enterprise
§ Hybrid mobile app development
§ Multi-channel integration
§ Device management
§ Workloads on the move
Cloud & Optimized Workloads
§ Agile provisioning
§ Elastic compute power
§ Scalable storage resources
§ Intelligent services
Security Intelligence
§ People & identity
§ Data & information
§ Application security
§ Security analytics
Source: IBM CIO Study (2012)
7. Why is problem solving hard?
Non-transparency (lack of clarity of the situation)
• commencement opacity
• continuation opacity
Polytely (multiple goals)
• inexpressiveness
• opposition
• transience
Complexity (large numbers of items, interrelations, and decisions)
• enumerability
• connectivity (hierarchy relation, communication relation, allocation relation)
• heterogeneity
Dynamics (time considerations)
• temporal constraints
• temporal sensitivity
• phase effects
• dynamic unpredictability
9. Predictive Modeling Timeline
Past Behavior
• The observation period used to feed the forecasting models
Point of Observation
Future Behavior
• The performance period the model is trying to predict
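The timeline above can be sketched in code as a split of timestamped events around a point of observation. This is only an illustration: the function name, the `lookback` and `horizon` parameters, and the sample events are assumptions, not part of the original slides.

```python
from datetime import datetime, timedelta

def split_at_observation_point(events, observation_point, lookback, horizon):
    """Split (timestamp, name) events into an observation window and a performance window."""
    start = observation_point - lookback
    end = observation_point + horizon
    # Past behavior: what the forecasting model is allowed to see.
    past = [e for e in events if start <= e[0] < observation_point]
    # Future behavior: what the model is trying to predict.
    future = [e for e in events if observation_point <= e[0] < end]
    return past, future

# Hypothetical sample data: one event per hour.
t0 = datetime(2013, 1, 1, 0, 0)
events = [(t0 + timedelta(hours=h), f"event-{h}") for h in range(10)]

past, future = split_at_observation_point(
    events,
    observation_point=t0 + timedelta(hours=6),
    lookback=timedelta(hours=4),
    horizon=timedelta(hours=2),
)
print(len(past), "events in the observation period")
print(len(future), "events in the performance period")
```

Keeping the two windows strictly separate matters: if events from the performance period leak into the observation period, the model appears more accurate than it really is.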
10. Predictive models harness the information lost in past data so you can discretely identify situations and react to them quickly.
11. Analytics 1.0
In the early days, we were just
happy to know if the network
was up or down.
We suffered from event floods
and the perpetually red event
console.
12. Analytics 2.0
Eventually the technology
allowed us to correlate based
on topology and filter
unnecessary events.
Dashboards were all the rage
and were measured in data
per square inch.
13. Evolution of Analytics
[Chart: Value (vertical axis) against Difficulty (horizontal axis), both increasing from the first stage to the last]
Descriptive Analytics - What Happened?
Diagnostic Analytics - Why Did It Happen?
Predictive Analytics - What Will Happen?
Prescriptive Analytics - How Do We Make It Happen?
Adapted from Gartner
16. Our Thought Process
[Diagram labels: Stimulus; Perception (via the senses)***; Cognition; Conscious Choice (via motor centers); Long-term memory; Cortex]
Limbic Center (hippocampus and amygdala): most primitive, seat of the unconscious
Pre-Frontal Cortex: conscious, meaning, choice
*** not very reliable
17. Short Term Memory
Short-term memory is
where the real work of
sense-making takes place
Short-term memory
has a limited
amount of space
(The estimate is 7 ± 2)
Working Memory
Understanding
Judgement
Relationship
Your Brain
24. Models of Reasoning
[Diagram labels: Theory Development, Theory Interpretation, Hypothesis, Data, Hypothesis Testing]
• Inductive
– Starts with Data Available
– Concludes with Possible Hypotheses
– Bottom Up “Data Driven Approach”
• Deductive
– Starts with Theoretical Framework
– Concludes with Logical Deductions
– Theory Driven Approach
25. Two Types of Decision Making
Programmed Decisions
– Routine
– Repetitive
– Well-Structured
– Predetermined Decision Rules
Non-Programmed Decisions
– Unique
– Presence of Risk
– Presence of Uncertainty
– Black Swans
26. How To Improve Decision Making
• Programmed Decision Making
– Collect evidence
– Identify the problem
– Select a solution
– Implement and evaluate the outcome
• Non-Programmed Decision Making
– Narrow evidence down to the ideal level
– Apply heuristics to limit the impact of cognitive bias
– Present options to a human for a decision
27. Four Sources of Bad Decisions
• Failure to frame the problem correctly
• Poor use of evidence
• Faulty decision making process
• No feedback for improvement
28. Common Logical Fallacies
• Appeals to Authority – where you rely on an expert source to form the basis of your argument
• False Inductions – where you infer a causal relationship where none is evident
• Reification – when you rely on taking a hypothesis or potential theory and present it as a known truth
• The Slippery Slope – when you base an argument on the thinking that once one action is taken, it will trigger a sequence of events that will result in the direst of consequences
• The Band Wagon – when you present an argument as true on the basis of its popularity
• The False Dichotomy – when you provide only two options and force a choice to be made
• The Straw Man – when you create a false argument and refute it implying that the counter argument is true
• Observational Selection – when you draw attention to the positive aspects of an idea and ignore the negatives
• Statistics of Small Numbers – when you take one (or a very small sample) and use it to draw a general conclusion
29. The problem is not that
there are no silver bullets…
the problem is that there are
no werewolves.
- Jim Tussing, CTO, Nationwide Insurance
30.
31. Global Warming and Inflation
Global warming
Inflation
35. Boyd’s Loop
[Diagram: Observe → Orient → Decide → Act, linked by Feed Forward arrows, with Feedback and Implicit Guidance & Control paths]
Observe: Unfolding Circumstances, Outside Information, Unfolding Interaction With Environment, Observation
Orient: Cultural Norms, Knowledge Life Cycle, New Information, Cognitive Abilities, Prior Wisdom
Decide: Decision (Hypothesis)
Act: Action (Test)
• Note how observation shapes orientation, shapes decision, shapes action, and in turn is shaped by the feedback and other phenomena coming into our sensing or observing window.
• Also note how the entire “loop” (not just orientation) is an ongoing many-sided implicit cross-referencing process of projection, empathy, correlation, and rejection.
From “The Essence of Winning and Losing,” John R. Boyd, January 1996.
37. Sometimes We Miss What
is Going On
Say… what’s a mountain
goat doing all the way up
here in these clouds?
39. The Gaussian Bell Curve
[Figure: bell curve centered on the Mean, with markers at ±1σ, ±2σ, and ±3σ]
Within ±1σ of the mean: 68%
Within ±2σ of the mean: 95%
Within ±3σ of the mean: 99.7%
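The share of a normal distribution that falls within a given number of standard deviations of the mean can be checked with the standard library. The function name `coverage_within` is an assumption made for this sketch.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage_within(k):.2%}")
```

The three values come out at roughly 68.27%, 95.45%, and 99.73%. These figures hold only when the data really is normally distributed, which is the assumption the next slide calls into question.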
40. The trick is not to spend our
time trying to get better at
predicting this world, or
making it more predictable, for
both of these strategies are
bound to fail.
- Nassim Nicholas Taleb, Author and Philosopher
41. Normative Decision
Making Model
• Limited Information Collection
– 7 +/- 2
– Tendency to acquire manageable rather than optimal amounts of
information
– Difficulty identifying all possible options
• Judgmental Heuristics
– Judgmental heuristics - rules of thumb or shortcuts that people use to
reduce information processing demands
– Availability heuristic - tendency to base decisions on information
readily available in memory
– Representativeness heuristic - tendency to assess the likelihood of an
event occurring based on impressions about similar occurrences
• Satisficing
– Choosing a solution that meets a minimum standard of acceptance
42. The Analytics Focus…
In addition to handling monitoring and performance alerts, it
helps drive improved availability.
The Formula:
1. Continually collect, categorize, and analyze all events from as many sources as possible
2. Correlate events and analyze them using previous outages as patterns to identify situations worth investigating
3. Notify a support team so the situation can be mitigated before becoming an outage
4. Automate responses that have well established situational fingerprints and proven resolution steps
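A minimal sketch of steps 2 through 4, assuming a "situational fingerprint" can be modeled as a set of event types that appeared together in a previous outage. The `Fingerprint` class, the `triage` function, and every event and fix name below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fingerprint:
    name: str
    required_events: frozenset  # event types seen together in a previous outage
    automated_fix: str = ""     # empty string: no proven resolution steps yet

def triage(window_events, fingerprints):
    """Return (action, fingerprint name) for each fingerprint matched by the event window."""
    seen = set(window_events)
    results = []
    for fp in fingerprints:
        if fp.required_events <= seen:  # every required event type is present
            action = "automate" if fp.automated_fix else "notify"
            results.append((action, fp.name))
    return results

# Hypothetical fingerprints built from past outages.
FINGERPRINTS = [
    Fingerprint("queue-backlog", frozenset({"MQ_FLOW_INTERRUPTED", "CICS_ABEND"})),
    Fingerprint("disk-full", frozenset({"DISK_90_PERCENT", "LOG_WRITE_FAIL"}),
                automated_fix="rotate_logs"),
]

window = ["CICS_ABEND", "HEARTBEAT", "MQ_FLOW_INTERRUPTED"]
print(triage(window, FINGERPRINTS))
```

Here the window matches only the "queue-backlog" fingerprint, which has no proven fix, so the result is a notification rather than an automated response.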
43. Most Common Modeling Tasks
• Classification: predicting an item class, “decision tree”
• Clustering: finding natural groups or clusters in data
• Association: finding things that occur together
• Deviation: finding changes or outliers
• Estimation: predicting values
• Linkage: finding relationships among actors
• Mining: extracting information from data
44. Types of Analytical Algorithms
Algorithm - Description
Decision Tree - Calculating the odds of an outcome
Association Rules - Identifying the relationships between elements
Naïve Bayes - Clearly showing the differences in a particular variable
Sequence Clustering - Grouping data based on a sequence of events
Time Series - Analyze and forecast time-based data
Neural Networks - Seek to uncover non-intuitive relationships in data
Text Mining - Analyze unstructured text data looking for context and meaning
Linear Regression - Determine the relationship between columns to predict an outcome
Logistic Regression - Evaluate the relationship between columns in order to evaluate the probability that a column will contain a specific state
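As one concrete instance from the list above, linear regression can be sketched with ordinary least squares on two columns. The disk-usage figures and the `fit_line` helper are assumptions made for this example, not data from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical columns: day number and disk usage in percent.
days = [0, 1, 2, 3, 4]
usage = [50.0, 52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(days, usage)
# Predict the outcome of interest: the day on which usage would reach 100%.
day_full = (100.0 - intercept) / slope
print(f"growth {slope:.1f}%/day, full on day {day_full:.0f}")
```

With this perfectly linear sample the fit recovers a growth of 2% per day and projects a full disk on day 25. Real telemetry is noisier, so a projection like this is a prompt for investigation rather than a guarantee.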
45. Questions Answered by Analytics
Business Question - Method (in descending order of value)
What is the best that can happen? - Optimization
What will happen next? - Predictive
What if this trend continues? - Predictive/Forecasting
Why is this happening? - Variance analysis/Root Cause
Is some action needed? - Alerts
Where is the problem? - Query/Drill Down
How many, how often, when? - Ad hoc reports
What happened? - Standard reports
47. Incident Life Cycle
[Diagram: timeline with the milestones Outage, Detection, Diagnosis, Repair, Recover, Restore, overlaid with Observe, Orient, Decide, Act]
Intervals shown: Detection Time, Response Time, Repair Time, Recovery Time, Down Time
48. Anatomy of an Outage
[Diagram: numbered markers 1 to 5 placed on an architecture of Web Servers, Load Balancer, Firewall, Corporate LANs & VPNs, WAS, Database, Message Queue, MQ, zOS, CICS, and DB2]
5:45-ish pm: CICS ABENDS start flooding the console but not high enough to ticket
6:00-ish pm: MQ flows start being interrupted and are alerting in Flow Diagnostics
6:04pm: Synthetic transactions fail, and at 6:14 the Ops Center confirms the issue and creates a P0 Incident
6:54pm: Support teams investigate the interrupted flows and determine it is a “back-end” problem
10:29pm: Support teams investigate MQ, ultimately rule it out, and decide to reset CICS to resolve the issue
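The elapsed times in this outage can be worked out from the timestamps on the slide. The sketch below assumes a placeholder date (the slide gives only times of day) and treats "5:45-ish" as exactly 5:45pm; the milestone names are labels chosen for this example.

```python
from datetime import datetime

DAY = "2013-01-01"  # placeholder date, not from the slide

def at(hhmm):
    """Build a datetime on the placeholder day from a 24-hour HH:MM string."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")

def minutes_between(start, end):
    return int((end - start).total_seconds() // 60)

timeline = {
    "first_symptom": at("17:45"),       # CICS ABENDS begin (approximate)
    "incident_created": at("18:14"),    # Ops Center creates the P0 Incident
    "backend_identified": at("18:54"),  # flows traced to a "back-end" problem
    "reset_decided": at("22:29"),       # decision to reset CICS
}

symptom_to_incident = minutes_between(timeline["first_symptom"], timeline["incident_created"])
incident_to_reset = minutes_between(timeline["incident_created"], timeline["reset_decided"])
symptom_to_reset = minutes_between(timeline["first_symptom"], timeline["reset_decided"])

print(f"first symptom to incident: {symptom_to_incident} min")
print(f"incident to reset decision: {incident_to_reset} min")
print(f"first symptom to reset decision: {symptom_to_reset} min")
```

On these assumptions the first symptom preceded the incident ticket by about 29 minutes, and roughly 4 hours 15 minutes passed between the ticket and the decision to reset CICS, the component that showed the first symptom.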
49. Why did this happen?
http://www.ithakabound.com/wp-content/uploads/2010/02/DC-Snow-men-pushing-car.jpg
50. The Problem
Why aren’t operations teams preventative today?
§ Too much data to analyze manually
§ Existing analytic techniques, such as standard thresholds, are not up to the task
§ They cannot detect problems while they are emerging (before business impact)
§ Set the threshold too high: insufficient warning before total failure
§ Set the threshold too low: too much noise, so everything is ignored
If there is no ‘early detection’ before the outage, operations teams can only react while the outage is already in effect and the business is already losing money.
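The threshold dilemma can be illustrated by comparing a fixed threshold with a baseline learned from recent history. This is a sketch of one common alternative (a rolling mean plus k standard deviations), not a method named in the presentation; the series, window size, and k value are made up.

```python
from statistics import mean, stdev

def static_alerts(values, threshold):
    """Indexes where the value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def baseline_alerts(values, window, k):
    """Indexes where the value is more than k standard deviations above
    the mean of the preceding `window` samples."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and values[i] > mu + k * sigma:
            alerts.append(i)
    return alerts

# Hypothetical response times in ms: steady, then ramping toward failure.
series = [100, 102, 99, 101, 100, 98, 101, 100, 115, 130, 160, 400]

fixed = static_alerts(series, threshold=300)
adaptive = baseline_alerts(series, window=8, k=3.0)
print("static threshold first fires at index", fixed[0])
print("rolling baseline first fires at index", adaptive[0])
```

On this series the fixed threshold of 300 fires only at the final sample (index 11), while the rolling baseline first fires at index 8, when the ramp begins. A baseline like this has its own failure modes, for example it adapts to a slow degradation and stops flagging it.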
51. Processing Streams
[Diagram: three inputs feed a Situational Awareness Engine, which outputs Detected and Predicted Situations]
Inputs:
• Real-Time Event Streams
• Patterns from Historical Data
• Causal Relationship from Past RCAs
Adapted from http://www.slideshare.net/TimBassCEP/getting-started-in-cephow-to-build-an-event-processing-application-presentation-717795
52. Complex Event Processing
[Diagram: Event Pipeline]
Inputs: Data Events, Control Event, Other Events
Stages: Event Filter, Time Window, Event Queries (A, B, C), Event Intelligence Scenarios, Feedback Loop
Output: Action Events
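Two of the pipeline elements named on this slide, the event filter and the time window, can be sketched as a small rule that emits an action event when enough matching events arrive close together. The class name, the rule parameters, and the sample stream are assumptions for illustration; a real CEP engine offers a far richer query language.

```python
from collections import deque

class SlidingWindowRule:
    """Emit an action event when `count` events of `event_type`
    arrive within `window_seconds` of each other."""

    def __init__(self, event_type, count, window_seconds):
        self.event_type = event_type
        self.count = count
        self.window_seconds = window_seconds
        self._times = deque()

    def process(self, timestamp, event_type):
        # Event filter: ignore everything except the type this rule watches.
        if event_type != self.event_type:
            return None
        self._times.append(timestamp)
        # Time window: discard matches that are too old to count.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.count:
            self._times.clear()  # start fresh after raising the situation
            return {"action": "raise_situation", "event_type": event_type, "at": timestamp}
        return None

# Hypothetical stream of (seconds, event type) pairs.
rule = SlidingWindowRule("CICS_ABEND", count=3, window_seconds=60)
stream = [(0, "CICS_ABEND"), (10, "HEARTBEAT"), (100, "CICS_ABEND"),
          (120, "CICS_ABEND"), (150, "CICS_ABEND")]

actions = [a for a in (rule.process(t, e) for t, e in stream) if a]
print(actions)
```

The isolated event at second 0 ages out of the window, so the rule fires once, at second 150, when three matching events have arrived within 60 seconds.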
53. One Integrated Environment
CMDB
Paging
Presentation Framework
Service Desk
Knowledge
3rd Party Providers
Asset Mgmt
Enrichment & Correlation
Event API
Event Pool
Predictive
Business Telemetry
Mainframe
Distributed
Database
Network
Middleware
Storage
Operational Data Warehouse
Event Catalog
54. Integrate Your Processes
Audit Information and Suspicious Activity
Automated
Discovery
Status Indications
Trend-Related Faults
Discovered Problems
Availability
Management
Performance
Management
Asset Management
& Topology
Database
Configuration
Management
Change
Management
Topology Snapshots
Historical Data
Configuration Discrepancies
Security
Management
Incidents
Change Activity
Aggregation
and Analysis
“Enriched” Events
Enrichment Data
Enterprise Data
Sources
Business Activity Data
Business
Telemetry
Information
Business Activity Data
Enrichment Data
Presentation
Framework
55. [Architecture diagram: component labels]
Service Provider Managed Monitoring System
Vendor Managed Monitoring System
LOB Managed Monitoring System
Element Manager (three instances)
Distributed Collectors (three instances)
Business Telemetry Data
Common Event Format
Meta-Data Integration Bus
Enrichment
Correlation and Event Suppression
Predictive Analysis
Root Cause Analysis
Business Impact Analysis
Triage
Notification and Escalation
Archive and Report
Automated Action
Automated Action Tools
Automated Provisioning System
Automated Change Reconciliation
KM Entries
Topology And Relationship Database
Service Center and Enterprise Notification Tool
Service Center
Security Management
Yammer
CMDB
CVOL
APM
Visualization Framework
xMatters
56. Optimized Performance
Perform: Track, Optimize, and Predict capacity and performance needs over time
• Track capacity and performance of applications and services in classic and cloud environments
• Optimize resource deployment with what-if and best fit planning tools
• Escalate capacity and performance problems before they cause critical failures
Predictive Outage Avoidance
Predict: Ensure availability of applications and services
• Use learning tools to augment custom best practices
• Leverage statistical methods to maximize predictive warning
• Improve problem detection across IT silos
Faster Problem Resolution
Resolve: Find & correct problems faster with tools that determine actions required to resolve issues
• Identify problems quicker with insight to large unstructured repositories
• Isolate problems quicker by bringing relevant unstructured data into problem investigations
• Repair problems quicker with the right details quickly to hand.
Improved Insight
Know: Enhance visibility into systems resource relationships while increasing customer satisfaction
• Determine what resources are interdependent to assess impact of failures
• Gain insight into what is important to your customer
• Decrease customer churn and acquisition costs while increasing customer retention and satisfaction
Automated Analytics helps lower IT Administration Costs:
• Performance and Capacity planning tools monitor appropriately and escalate, reducing time consuming report browsing
• Learning tools reduce customization and best practices investment on initial deployment
• Log Analysis helps speed problem resolution to be able to do more with less