This talk reviews probabilistic models including frequentism and Bayesian logic before discussing business scenarios where statistics will fail to provide answers.
1. THE LIMITS OF STATISTICS
Paul Barsch
Director, Teradata Marketing
2. 2 2/5/2016 Teradata Confidential
Landscape of Data Analysis
Computer
Science
Algorithms, databases
Medicine,
Finance,
Business, Web
Computer
Programming
(C++, Java, Python)
Applications
Supply Chain, CRM,
Pricing optimization
Data Analysis
SNA, Data Mining,
Geospatial, Temporal,
Predictive, Sentiment
Probability &
Statistics
(Machine Learning, Risk
Operations Research)
Mathematics
(Calculus)
*Adapted from
simplystatistics.org
3.
“Statistical and applied probabilistic knowledge is
the core of knowledge; statistics is what tells you
if something is true, false, or merely anecdotal; it is
the "logic of science"; it is the instrument of risk-
taking; you can't be a modern intellectual and
not think probabilistically.”
- Nassim Nicholas Taleb, Professor of Risk Engineering, NYU
Value of Statistics and Probability
4.
• Successful investors think in terms of probabilities, as
Charles Munger noted in his 1994 lecture to the University of
Southern California, “Warren Buffett…automatically
thinks in terms of decision trees and the elementary
math of permutations and combinations…”
• “Sound decisions are based on identifying relevant
variables and attaching probabilities to each of them.
That’s an analytic process but also involves subjective
judgments.”
– Former US Treasury Secretary Robert Rubin
• Probabilistic models are indispensable in science, engineering
and business
Value of Thinking Probabilistically
5.
• Risk Management
> “Risk is no longer something to be faced, risk has become a set
of opportunities open to choice” –Peter L. Bernstein
> Probability theory helps quantify risks
> Normal distribution forms the core of most systems of risk
management
• Forecasting
> Science of forecasting – a systematic method of analyzing
future outcomes –Peter L. Bernstein
> Be careful…
– Past performance is a frail guide to the future
Probability Applied to…
7.
Problems with Predictions
“In all my experience I have
never been in any accident of
any sort worth speaking about.
I have seen but one vessel in
distress in all my years at
sea…I never saw a wreck and
have never been wrecked, nor
was I ever in any predicament
that threatened to end in
disaster of any sort.”
- Edward John Smith, Captain, Titanic
8.
Problems with Predictions
"I think there is a
world market for
maybe five
computers."
- Thomas Watson, chairman of IBM, 1943
9.
• “We’ve never had a decline in
house prices on a nationwide
basis. So, what I think what is
more likely is that house prices
will slow, maybe stabilize, might
slow consumption spending a bit.
I don’t think it’s gonna drive the
economy too far from its full
employment path, though.” – July 2005
• “I expect there will be some
failures. I don’t anticipate any
serious problems of that sort
among the large internationally
active banks that make up a very
substantial part of our banking
system.” – Feb 2008
Problems with Predictions
10.
• In the early 1990s, JK Rowling’s
Harry Potter and the
Philosopher’s Stone was rejected
by 12 UK publishers.
• “Not unique enough to stand out
in the marketplace” – recording
studios to Madonna in early
1980s
More Problems with Predictions
13.
Frequentism/Bayes/Black Swans
• There are known knowns;
there are things we know that
we know.
• There are known unknowns;
that is to say, there are things
that we now know we don't
know.
• But there are also unknown
unknowns – there are things
we do not know we don’t
know.
» Donald Rumsfeld
February 12, 2002
14.
Bell Curve – The Search for Significance
• Frequentism
> Measures
frequency of an
event that can
be repeated
over and over
> Need a large
number of
observations
• Assumes:
> Normal
distribution
(randomness)
> Independence
(e.g. coin flips)
68-95-99.7 rule
Unlikely events are RARE: many std dev from mean
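The 68-95-99.7 rule of thumb for the normal distribution can be checked directly. A minimal sketch, assuming a standard normal variable; the function name prob_within is illustrative and not from the talk:

```python
import math

def prob_within(k: float) -> float:
    """Probability that a standard normal value lies within k std dev of the mean."""
    # For Z ~ N(0, 1): P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} std dev: {prob_within(k):.4f}")
```

The rule only holds when the data really are normal and independent, which is exactly the assumption listed on this slide.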
15.
Bayesian Inference – “In All Likelihood…”
• Bayes is subjective probability
– a measure of belief.
• Not precise, not objective. We
can learn from approximations
• Allows making of predictions
with no prior information at all
• Infer where objects are based
on learned experience; each
new bit of information gets you
closer to certitude, keep
revising probabilities
• Compute power helps
• The hunt for U-Boats and
Soviet Subs!
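The "keep revising probabilities" idea can be sketched as a Bayesian search update. This is an illustrative toy, not the method used in any actual submarine hunt: the three search cells, the prior, and the 80% detection probability are all assumed values.

```python
def update_after_miss(prior, searched, p_detect):
    """Posterior over cells after searching one cell and finding nothing.

    prior    : list of prior probabilities that the object is in each cell
    searched : index of the cell that was searched
    p_detect : assumed chance of finding the object if it is in that cell
    """
    # Likelihood of "nothing found": (1 - p_detect) in the searched cell, 1 elsewhere.
    unnormalized = [
        p * ((1.0 - p_detect) if i == searched else 1.0)
        for i, p in enumerate(prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

prior = [0.5, 0.3, 0.2]  # hypothetical starting beliefs
posterior = update_after_miss(prior, searched=0, p_detect=0.8)
print([round(p, 3) for p in posterior])
```

A failed search lowers belief in the searched cell and raises it everywhere else; feeding the posterior back in as the next prior gives the repeated revision the slide describes.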
16.
What Lies Beneath? Black Swans!
• Beware Outliers
> Ten sigma events
> 2008 probability
1 in 73
quadrillion
• Black Swans
> Swell up, take
decades
> Statistics don’t
work here
> 1 in 100 year
events now happen
every 2-3 years!
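To see why many-sigma events look impossible under a bell curve, the one-sided tail probability of a normal distribution can be computed directly. A minimal sketch, assuming a standard normal model; it does not try to reproduce the specific figure quoted on the slide.

```python
import math

def normal_upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (3, 5, 10):
    p = normal_upper_tail(k)
    print(f"{k} sigma: p = {p:.3e} (about 1 in {1 / p:.3e})")
```

If events that the model rates this unlikely keep showing up, the more plausible reading is that the normal assumption is wrong, not that the world was extraordinarily unlucky.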
17.
• Most people ignored very
low probability risks of the
worst outcomes.
• They spent an inordinate
amount of time worrying
about the 20% chance of
having a bad day and no
time thinking about the
1% chance of their
entire life being turned
upside down.
– Robert Rubin at HBS
From Rubin’s Lips…
18.
Business Professionals – Ready to Place
Your Bets?
19.
$100K For A Rock Thrown at You Every Day
What we assume –
“Mediocristan”
What happens very
infrequently – but with
large impact!
“Extremistan”
20.
• “Our minds are in the business of
turning history into something smooth
and linear, which makes us underestimate
randomness.”
• “Complex systems – like the world we
live in – are full of non-linear responses
with disproportionate responses”
• Which has the proportionately larger
impact?
> Run my car into a wall 100x at 1 mph?
> Run my car into a wall 1x at 100 mph?
• “More harmed by a single rock than
1000 pebbles”
We Expect the Smooth and Linear…
* All quotes sourced from Antifragile by Nassim Taleb
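The car-and-wall question has a simple non-linear answer if impact harm is assumed to scale with kinetic energy, which grows with the square of speed. That proportionality is a simplifying assumption made here for illustration, and relative_energy is a hypothetical helper:

```python
def relative_energy(speed_mph: float, repetitions: int) -> float:
    """Total impact energy in arbitrary units, assuming energy ~ speed squared."""
    return repetitions * speed_mph ** 2

many_small = relative_energy(speed_mph=1, repetitions=100)   # 100 crashes at 1 mph
one_large = relative_energy(speed_mph=100, repetitions=1)    # 1 crash at 100 mph

print(many_small, one_large, one_large / many_small)
```

A linear mindset treats the two cases as equal (100 x 1 = 1 x 100); under the squared relationship the single large impact carries 100 times the total energy.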
21.
Beware Mickey Mouse Probabilities
• Before the 23% drop in the
1987 crash, the worst previous
in sample move was close to
10%
> Take 40 years of market data: 1
day accounts for 80% of the
kurtosis – or peak (tail weight)
• “Not in a million years would we
have expected this gyration to
be as vicious and enduring as it
has been,”
– Steven Solmonson, head of Park Place
Capital Ltd.
• “A turkey cannot figure out
what is in store for it tomorrow
based on the events of today.”
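How a single day can dominate kurtosis is easy to see on made-up numbers. The sketch below uses synthetic returns (10,000 calm days of plus or minus 1% and one 23% drop), not real market data, and measures the largest day's share of the summed fourth powers, which is the numerator of kurtosis.

```python
def largest_day_share_of_fourth_moment(returns):
    """Fraction of the sum of (r - mean)^4 contributed by the single largest term."""
    mean = sum(returns) / len(returns)
    fourth_powers = [(r - mean) ** 4 for r in returns]
    return max(fourth_powers) / sum(fourth_powers)

# Synthetic sample: roughly 40 years of trading days, one extreme day.
synthetic_returns = [0.01, -0.01] * 5000 + [-0.23]

share = largest_day_share_of_fourth_moment(synthetic_returns)
print(f"largest day's share of the fourth moment: {share:.1%}")
```

Because deviations are raised to the fourth power, one extreme observation can outweigh thousands of ordinary ones, so a kurtosis estimate from such a sample says little about the next extreme.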
22.
• http://video.pbs.org/video/2202847024/ (watch from 1:25 to
8:25)
Modern Day Challenges: Fat Tails Happen!
23.
• 40% of world’s embedded microcontrollers (for cars) made
at factory disrupted by Fukushima
Who Needs a Microcontroller Anyway?
• Controls engine
• Sensing systems for
airbags
• Dashboard display
systems
• GPS Navigation
• Collision warning
• Advanced features
such as self parking
systems, internet
access
Production of 370,000 cars delayed at Toyota
24.
Thai Floods Take Toll
On Dell:
• Feb 2012: Shares fall 5%
after Dell misses 4th
quarter profit targets
• Absorbed $150m cost
increase in rising HDD
prices
• “Struggled to find mix of
high end drives needed to
carry its high margin
product line”
On Honda:
• Honda’s Thai assembly
plant (5% of global output)
shut for a month due to flood
disruption (FT)
25.
Drastic Changes in Past 20 Years
• Infrastructure investments
> Dot.com/Y2K
• World is Flat Phenomenon
> Work follows the sun
• Other Characteristics:
> Big Data
> Speed and Zero Latency
> Interconnectivity
> Fewer buffers
> Consolidated players
• Now we live in a system – fewer
islands
• Is the World more dangerous?
26.
Implications
• Humans mostly think linearly –
tomorrow will be like today
• “Globalization creates
interlocking fragility, while
reducing volatility and giving
the appearance of stability. In
other words it creates
devastating Black Swans”
– Nassim Taleb
• Lockstep: everything moving to
correlation of 1
• Tight Coupling
> Errors cascade through the
system, and fast
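Why a "correlation of 1" matters can be shown with the textbook volatility formula for an equally weighted portfolio. A minimal sketch under assumed inputs: n identical assets, each with volatility sigma and the same pairwise correlation rho; the numbers are illustrative only.

```python
import math

def portfolio_vol(n: int, sigma: float, rho: float) -> float:
    """Volatility of an equal-weight portfolio of n assets.

    Variance = sigma^2 * (1/n + (1 - 1/n) * rho) when every asset has
    volatility sigma and every pair has correlation rho.
    """
    variance = sigma ** 2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return math.sqrt(variance)

for rho in (0.0, 0.5, 1.0):
    print(f"rho = {rho}: portfolio vol = {portfolio_vol(100, 0.20, rho):.3f}")
```

With uncorrelated assets, spreading across 100 names cuts volatility tenfold; as rho approaches 1 the benefit disappears and the portfolio behaves like a single asset.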
27.
• Robustness good, but not enough
• Anti-fragility benefits from disorder
and harm – much like the hydra
> Self healing
> Get stronger – like bacteria
> Improve over time
• Methods:
> Smaller units that individually do not
threaten the system, instead of too big to fail (TBTF)
> Skin in the game – sleeping under the
bridge?
> Barbell to limit exposure and know
your maximum loss
> Look for optionality – “the right, not
the obligation” – limited loss, large
upside
Solutions? Aim for Anti-Fragility
* All quotes sourced from Antifragile by Nassim Taleb
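"Limited loss, large upside" is the payoff shape of a call option. The sketch below is a generic illustration with made-up numbers (strike 100, premium 5); it is not taken from the book or the talk.

```python
def option_payoff(outcome: float, strike: float, premium: float) -> float:
    """Net payoff of holding the right, not the obligation, to buy at `strike`."""
    # The holder exercises only when it pays, so the loss is capped at the premium.
    return max(outcome - strike, 0.0) - premium

for outcome in (0, 50, 100, 150, 300):
    print(outcome, option_payoff(outcome, strike=100, premium=5))
```

However bad the outcome, the loss never exceeds the premium paid, while the gain keeps growing with the outcome; that asymmetry is what the barbell and trial-and-error methods on this slide aim for.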
28.
• Remember – things that haven’t happened before will
happen. Things that have happened will happen again.
• Your “worst case scenario” probably isn’t really the “worst
case scenario” (think: Fukushima’s seawall).
• With knowledge that things are fragile– better to be a little
wrong (limited loss) than majorly wrong (out of business)
• Don’t think of Black Swans as only negative (what I can
avoid)
> Think of them as “options”. Little investments with limited loss.
Many have more upside than downside
> Trial and error are options with small costs. There are huge pay-
offs for being right such as big discoveries (positive Black
Swan).
Food for Thought
29.
Rumsfeld’s Framework
• There are known knowns;
there are things we know that
we know. (FREQUENTISM)
• There are known unknowns;
that is to say, there are things
that we now know we don't
know. (BAYESIAN INFERENCE)
• But there are also unknown
unknowns – there are things
we do not know we don’t
know. (UNPREDICTABLE)
» Donald Rumsfeld
February 12, 2002