The document discusses challenges related to large volumes of data, or "Big Data". Traditional technologies try to divide and separate data across different systems, but this becomes difficult to manage at scale. The presenter introduces Hadoop as an alternative approach that can handle large volumes of data in a single system and democratize access to data. Hadoop provides a framework for storage, management and processing of large datasets in a distributed manner across commodity hardware.
4. 100% Open Source – Democratized Access to Data. The leaders of Hadoop’s development. We do Hadoop. Drive innovation in the platform – we lead the roadmap. Community driven, enterprise focused.
5. We do Hadoop successfully. Support, Training, Professional Services.
14. We are obsessive compulsive about collecting and structuring our data.
15. Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…
16. You need data. Your customers expect you to know what they want before they do.
24. The solution? [Diagram: separate systems labeled OLTP, EDW, Another EDW, Yet Another EDW, and Analytical DB, each holding its own blocks of data]
25. Ummm…you dropped something. [Diagram: the same systems (OLTP, EDW, Another EDW, Yet Another EDW, Analytical DB) with many more blocks of data than on the previous slide]
28. Wait, you’ve seen this before. [Diagram: many blocks of data around an "Analytics Sausage Factory"]
31. “Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling
http://www.youtube.com/watch?v=hVimVzgtD6w
32. Your data silos are lonely places. [Diagram: blocks of data grouped under EDW, Accounts, Customers, and Web Properties]
33. …Data likes to be together. [Diagram: the EDW, Accounts, Customers, and Web Properties data shown together]
34. Data likes to socialize too. [Diagram: the EDW, Accounts, Customers, and Web Properties data joined by CDR, Machine Data, Facebook, Weather Data, and Twitter]
35. New types of data don’t quite fit into your pristine view of the world. [Diagram: Logs, Machine Data, and "My Little Data Empire", with question marks among the blocks of data]
36. To resolve this, some people take hints from Lord Of The Rings...
38. …but that has its problems too. [Diagram: Data, several ETL steps, EDW, Schema]
39. Fragile workflows make supporting the analytical models you want expensive and time-consuming. [Diagram: Data, several ETL steps, EDW, Schema]
42. Your segmentation today. Town/City; Middle Income Band; Female; Male; Age: 25-30; Product Category Preferences.
43. Your segmentation with better data. GPS coordinates; Looking to start a business; Walking into Starbucks right now…; Spent 25 minutes looking at tea cozies; Unhappy with his cell phone plan; $65-68k per year; Pregnant; Tea Party; Hippie; A depressed Toronto Maple Leafs fan; Gene Expression for Risk Taker; Male; Female; Age: 27 but feels old; Product recommendations; Thinking about a new house; Products left in basket indicate drunk Amazon shopper.
44. Pick up all of that data that was prohibitively expensive to store and use.
55. If you could design a system that would handle this, what would it look like?
56. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [Diagram: a grid of Storage nodes]
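As a rough illustration of what "resilient" and "self-healing" can mean for distributed storage, here is a minimal Python sketch. It is a toy model, not Hadoop's actual implementation or API: the `ToyCluster` class, the node and block names, and the replication target of 3 are all assumptions made for the example. Each block is copied to several nodes, and when a node is lost the surviving copies are used to restore the target count.

```python
import random

REPLICATION = 3  # assumed target number of copies per block (illustrative)


class ToyCluster:
    """Toy model of replicated block storage; hypothetical, not a real HDFS API."""

    def __init__(self, node_names, seed=0):
        self.nodes = {name: set() for name in node_names}  # node -> block ids
        self.rng = random.Random(seed)

    def write_block(self, block_id):
        # Place the block on REPLICATION distinct nodes.
        for name in self.rng.sample(sorted(self.nodes), REPLICATION):
            self.nodes[name].add(block_id)

    def replica_count(self, block_id):
        return sum(block_id in blocks for blocks in self.nodes.values())

    def heal(self, block_id):
        # A block with no surviving copy cannot be restored.
        if self.replica_count(block_id) == 0:
            return
        # Copy the block to nodes that lack it until the target is met again.
        candidates = [n for n in sorted(self.nodes) if block_id not in self.nodes[n]]
        self.rng.shuffle(candidates)
        while self.replica_count(block_id) < REPLICATION and candidates:
            self.nodes[candidates.pop()].add(block_id)

    def fail_node(self, name):
        # Simulate losing a machine, then re-replicate everything it held.
        lost = self.nodes.pop(name)
        for block_id in sorted(lost):
            self.heal(block_id)


cluster = ToyCluster(["n1", "n2", "n3", "n4", "n5"])
for block in ["blk-1", "blk-2", "blk-3"]:
    cluster.write_block(block)
cluster.fail_node("n1")
print({b: cluster.replica_count(b) for b in ["blk-1", "blk-2", "blk-3"]})
```

The cost-efficiency argument on the slide follows from the same idea: if copies on other machines cover a failure, no single machine has to be expensive or highly reliable.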
57. It would probably need a completely parallel processing framework that took tasks to the data… [Diagram: Processing placed alongside Storage on every node]
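The idea of taking tasks to the data can be sketched in a few lines of Python. This is a simplified, single-machine imitation of the map and reduce pattern, not Hadoop code: the `NODE_DATA` layout, the node names, and the functions `map_on_node` and `run_job` are hypothetical, and threads stand in for separate machines. Each task reads only the lines its own node holds, and only the small per-node summaries are combined at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each "node" already stores its own slice of a log.
NODE_DATA = {
    "node-1": ["error disk full", "info job started"],
    "node-2": ["error timeout", "info job finished"],
    "node-3": ["error disk full"],
}


def map_on_node(node):
    """Map task: count words using only the lines stored on this node."""
    counts = Counter()
    for line in NODE_DATA[node]:
        counts.update(line.split())
    return counts


def run_job():
    # One task per node, run in parallel; threads stand in for machines.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_on_node, NODE_DATA))
    # Reduce step: merge the small per-node summaries into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = run_job()
print(result["error"], result["disk"], result["info"])
```

In this sketch the raw lines never leave their node; only the word counts move, which is the property the slide is pointing at.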
58. It would probably run on commodity hardware, virtualized machines, and common OS platforms. [Diagram: Processing placed alongside Storage on every node]
59. It would probably be open source so innovation could happen as quickly as possible.