1. Big Data Wonderland:
Two Views on the Big Data Revolution
Strata New York
October 2012
Mark Madsen
Third Nature, Inc.
mark@thirdnature.net
@markmadsen
Marc Demarest
Noumenal, Inc.
marc@noumenal.com
2. Preamble
Twenty Years On
• We came up together in this industry in the early 1990s, as pointy-headed advocates of data and star schema design, trained by the deity himself, Ralph Kimball
Our Alma Mater
• Back then, it was a simpler world...big iron, big DBMS, hand-coded ETL, star schema, a thousand rinky-dink query tools
• Mostly, conversation was dominated by ETL and schema design
• “There will never be a decisional database larger than 10 GB...”
St. Ralph
Third Nature, Inc. || Noumenal, Inc.
3. Preamble
Twenty Years On
• Twenty years on, we find ourselves with opposing views on what is either the biggest con, or the biggest sea-change, in our data warehousing odyssey
Madsen as Jack Kilpatrick?
• Question: Is the big data revolution big, or a revolution?
• Question: Do we have to change? And if so, how?
• Not a round table. A slugfest....
Demarest as Shana Alexander?
5. Compromise
You take the blue pill. The story ends, you wake up in your bed and believe whatever you want to believe.
You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes.
Remember, all I am offering is the truth: nothing more.
Demarest
Madsen
6. The Issues
1. Data As A Factor of Production
RED
BLUE
Hype.
For most companies, data is an asset supporting process, not a factor in the production of its products or services.
Execute vs manage the business.
7. The Issues
1. Data As A Factor of Production
RED
Amen.
This change has been in process for more than a decade.
Social media leads the way, but we’re all affected.
BLUE
Hype.
For most companies, data is an asset supporting process, not a factor in the production of its products or services.
Execute vs manage the business.
8. The Issues
2. The Reality of Big Data
RED
No company escapes.
Text, social, sensors, streaming -- the instrumentation of the real world transforms company decision-making processes.
BLUE
9. The Issues
2. The Reality of Big Data
RED
No company escapes.
Text, social, sensors, streaming -- the instrumentation of the real world transforms company decision-making processes.
BLUE
Few companies transformed.
Social media held up a lot with no quantification of benefits.
Management consultants? Asleep at the switch.
10. The Issues
3. The Commodity Hardware Revolution & Radical Scale-Out
RED
BLUE
The current topology is alive and well.
These commodity building blocks are, after all, just SMP platforms.
Real problems are under-investment, bad design.
11. The Issues
3. The Commodity Hardware Revolution & Radical Scale-Out
RED
The new topology.
Cheap compute, unintelligent direct-attach storage and free comms make large scale-out grids the future.
BLUE
The current topology is alive and well.
These commodity building blocks are, after all, just SMP platforms.
Real problems are under-investment, bad design.
12. The Issues
4. Merchant DBMSs
RED
Increasingly irrelevant.
We’ve been over-structured and under-resourced for 20 years.
CSV is still the international standard.
BLUE
13. The Issues
4. Merchant DBMSs
RED
Increasingly irrelevant.
We’ve been over-structured and under-resourced for 20 years.
CSV is still the international standard.
BLUE
Will rise to the challenge.
Any worthwhile innovation will be absorbed by the merchant DBMS players.
Even the big players use these things.
14. The Issues
5. Query, Reporting & Dashboarding Tools
RED
BLUE
Will rise to the challenge.
We have two generations of analysts trained to feed using these tools.
Big data offers no last-mile answers.
15. The Issues
5. Query, Reporting & Dashboarding Tools
RED
Ineffective, now and in the future.
Can’t do real-time, can’t visualize large data sets, can’t support discovery and exploration.
BLUE
Will rise to the challenge.
We have two generations of analysts trained to feed using these tools.
Big data offers no last-mile answers.
16. The Issues
6. Structured Query Language
RED
Toast.
Too complex, too hard to code, too hard to debug.
A way of ensuring dependency on merchant DBMSs.
BLUE
17. The Issues
6. Structured Query Language
RED
Toast.
Too complex, too hard to code, too hard to debug.
A way of ensuring dependency on merchant DBMSs.
BLUE
Tasty.
Powerful, expressive language for complex analytical problems.
Why do NoSQL vendors reinvent it all the time?
18. The Issues
7. New Programming Models
RED
BLUE
The “new model” looks a lot like SAS, only with Java and no support.
Open source doesn’t mean free. Or easy.
The skills gap here is huge, we can’t fill it.
19. The Issues
7. New Programming Models
RED
Say hello to Pig.
New analytical problems (decisioning, discovery, exploration) require new languages, new tools and new programming models.
BLUE
The “new model” looks a lot like SAS, only with Java and no support.
Open source doesn’t mean free. Or easy.
The skills gap here is huge, we can’t fill it.
20. The Issues
8. Conventional DW Architecture
RED
A relic.
Overly complex. Difficult to implement.
Controlled by the supply side of the market, anyway.
BLUE
21. The Issues
8. Conventional DW Architecture
RED
A relic.
Overly complex. Difficult to implement.
Controlled by the supply side of the market, anyway.
BLUE
Perfectly viable. No need to change.
Some new technologies may play roles, but we’re good to go, generally.
Built by developers for users. The new is built by developers for developers.
22. The Issues
9. The Cloud
RED
BLUE
Don’t go there.
Your inside-the-firewall apps remain the core information asset.
Where is “there” anyway?
23. The Issues
9. The Cloud
RED
We all go there.
Most of the interesting data is there; it’s more effective to move our data, and our analyses, to where the data is, already.
BLUE
Don’t go there.
Your inside-the-firewall apps remain the core information asset.
Where is “there” anyway?
24. The Issues
10. New Technologies
RED
Save Us.
Best of breed integration led by in-house designers is back, with a vengeance.
BLUE
25. The Issues
10. Emerging Technologies
RED
Save Us.
Best of breed integration led by in-house designers is back, with a vengeance.
BLUE
Distract Us.
We’ve already seen what best-of-breed gives us: a circus.
26. What We Really Think
1. Data As A Factor of Production
2. The Reality of Big Data
3. The Commodity Hardware Revolution
4. Merchant DBMSs
5. Query, Reporting & Dashboarding Tools
6. Structured Query Language
7. New Programming Models
8. Conventional DW Architecture
9. The Cloud
10. New Technologies