Introduction to Hadoop
- 2. What we’ll cover
  • Understand Hadoop components
  • Understand the different technologies involved
  • Embrace Big Data!
Lynx Consultants © 2013
- 4. What is Big Data?
  • SQL has a limited ability to process changing data
    ◦ SQL schemas are the truth: data needs to fit them
- 9. What is Big Data?
  • Big Data is the solution!
    ◦ Data can be truly dynamic
    ◦ Designed to handle terabytes of data
    ◦ Designed for fault tolerance and for securing data
    ◦ Designed around exploiting hardware to the fullest
    ◦ Designed around Map/Reduce
- 10. Who runs Big Data?
  • A few small companies
- 25. What is Hadoop?
  • Hadoop is one of the big players in Big Data
    ◦ Developed as an open-source implementation of Google's MapReduce and Google File System (GFS) designs (Google's BigTable was the model for HBase)
    ◦ Mainly developed at Yahoo!
    ◦ Current companies behind it: Hortonworks and Cloudera
- 26. What are the features of Hadoop?
  • HDFS – Hadoop Distributed File System
    ◦ HDFS is a filesystem distributed across many nodes
    ◦ Keeps several copies of your data (default: 3)
    ◦ If one node goes down, HDFS re-creates the lost copies on the remaining nodes so every block is fully replicated again
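To make the replication behaviour concrete, here is a toy, single-process model in Python. It is not HDFS code: `ToyCluster` and its methods are hypothetical, and the placement rule (put each copy on the least-loaded live node) is a simplification of HDFS's real, rack-aware placement policy. What it does show is the invariant the NameNode maintains: every block keeps `replication` copies (3 by default, the `dfs.replication` setting), and when a node dies its blocks are copied again from the surviving replicas.

```python
REPLICATION = 3  # HDFS default replication factor (the dfs.replication setting)


class ToyCluster:
    """Toy model of HDFS replication; hypothetical, not the real NameNode logic."""

    def __init__(self, nodes, replication=REPLICATION):
        self.nodes = set(nodes)
        self.replication = replication
        self.block_locations = {}  # block id -> set of nodes holding a copy

    def _target(self):
        # How many copies we can keep with the nodes that are still alive.
        return min(self.replication, len(self.nodes))

    def _least_loaded(self, exclude):
        # Simplified placement: the live node holding the fewest blocks
        # (ties broken by name). Real HDFS placement is rack-aware.
        load = {n: 0 for n in self.nodes}
        for holders in self.block_locations.values():
            for n in holders:
                if n in load:
                    load[n] += 1
        return min((load[n], n) for n in self.nodes if n not in exclude)[1]

    def store(self, block_id):
        holders = set()
        while len(holders) < self._target():
            holders.add(self._least_loaded(exclude=holders))
        self.block_locations[block_id] = holders

    def fail(self, node):
        # The node stops responding: forget it, then re-replicate every
        # block that is now under-replicated, using the surviving copies.
        self.nodes.discard(node)
        for holders in self.block_locations.values():
            holders.discard(node)
            while len(holders) < self._target():
                holders.add(self._least_loaded(exclude=holders))


cluster = ToyCluster(["dn1", "dn2", "dn3", "dn4"])
cluster.store("blk_1")
cluster.store("blk_2")
cluster.fail("dn1")
print({b: sorted(h) for b, h in cluster.block_locations.items()})
# → {'blk_1': ['dn2', 'dn3', 'dn4'], 'blk_2': ['dn2', 'dn3', 'dn4']}
```

After `dn1` fails, both blocks are back to three copies, spread over the three surviving nodes.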
- 28. What are the features of Hadoop?
  • HBase – Hadoop NoSQL Database
    ◦ Schemaless key-value storage: only the column families are declared up front, the columns can vary from row to row
    ◦ All data exportable in JSON
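The storage model is easier to picture with a small example. The class below is a hypothetical in-memory stand-in, not the HBase client API: each row is addressed by a row key, each cell by `family:qualifier`, a write to an undeclared family is rejected, and two rows of the same table can carry different columns. Rows come back sorted by row key, as they do in an HBase scan. The JSON dump uses Python's `json` module and does not reproduce any particular HBase export tool.

```python
import json


class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)  # declared when the table is created
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are free-form: any new column is accepted in a known family.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return self.rows.get(row_key, {})

    def scan(self):
        # Rows are returned in row-key order.
        return [(key, self.rows[key]) for key in sorted(self.rows)]

    def to_json(self):
        return json.dumps(dict(self.scan()), sort_keys=True)


users = ToyTable(families=["info"])
users.put("user1", "info:name", "Ada")
users.put("user1", "info:email", "ada@example.com")
users.put("user2", "info:name", "Linus")  # this row has no email column at all
print(users.to_json())
# → {"user1": {"info:email": "ada@example.com", "info:name": "Ada"}, "user2": {"info:name": "Linus"}}
```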
- 30. What are the features of Hadoop?
  • Map/Reduce – The key to it all
    ◦ The programming model was introduced by Google
    ◦ Map: given a dataset, process each record and emit intermediate key/value pairs (for example, only for the records that match a criterion)
    ◦ Reduce: combine all the values that share a key into a result
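The classic illustration is counting words. The sketch below runs the map, shuffle and reduce stages one after another in a single Python process, so it only shows the data flow; in a real Hadoop job the map and reduce functions would be written against Hadoop's Java API (or run through Hadoop Streaming) and executed in parallel on many nodes, with the framework doing the shuffle. The function names are made up for this example, and the optional `matches` predicate shows the "only the records that match a criterion" idea.

```python
from collections import defaultdict


def map_phase(lines, matches=lambda word: True):
    # Map: turn every input record into intermediate (key, value) pairs.
    # The map step may also drop records that do not match a criterion.
    for line in lines:
        for word in line.lower().split():
            if matches(word):
                yield word, 1


def shuffle(pairs):
    # Shuffle/sort: group all values by key, ordered by key.
    # In Hadoop the framework does this between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    # Reduce: collapse the list of values for each key into one result.
    return {key: sum(values) for key, values in groups.items()}


lines = ["the quick brown fox", "the lazy dog", "The fox"]
print(reduce_phase(shuffle(map_phase(lines))))
# → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be split across machines without any coordination beyond the shuffle.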
- 32. What are the features of Hadoop?
  • Hive – SQL for NoSQL
    ◦ Hive provides a SQL-like language called HiveQL
    ◦ Provides a good entry point for SQL users :)
- 33. What are the features of Hadoop?
  • Pig – Map/Reduce made easy
    ◦ Produces data results from scripts in a simplified language (Pig Latin) that is compiled into Map/Reduce jobs
    ◦ Reinvents SQL, in a way
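To give an idea of the language, the commented lines below are a small Pig Latin script that counts server errors per URL; Pig would compile such a script into Map/Reduce jobs. The Python underneath only mimics each step in memory so the data flow can be followed (and run) without a cluster. The log format, the `access_log` file name and the helper functions are invented for the example.

```python
from collections import Counter

# Hypothetical Pig Latin script that this sketch mirrors step by step:
#   logs   = LOAD 'access_log' USING PigStorage(' ') AS (ip:chararray, url:chararray, status:int);
#   errors = FILTER logs BY status >= 500;
#   by_url = GROUP errors BY url;
#   counts = FOREACH by_url GENERATE group AS url, COUNT(errors) AS n;

RAW_LOG = """\
10.0.0.1 /index.html 200
10.0.0.2 /api/orders 500
10.0.0.3 /api/orders 503
10.0.0.1 /login 500
"""


def load(text):
    # LOAD ... USING PigStorage(' '): split each line on spaces into typed fields.
    for line in text.splitlines():
        ip, url, status = line.split(" ")
        yield {"ip": ip, "url": url, "status": int(status)}


def filter_by(records, predicate):
    # FILTER logs BY ...: keep only the records that satisfy the condition.
    return (r for r in records if predicate(r))


def group_and_count(records, key):
    # GROUP ... BY url, then FOREACH ... GENERATE group, COUNT(...).
    return dict(sorted(Counter(r[key] for r in records).items()))


logs = load(RAW_LOG)
errors = filter_by(logs, lambda r: r["status"] >= 500)
counts = group_and_count(errors, "url")
print(counts)  # → {'/api/orders': 2, '/login': 1}
```

Four declarative lines of Pig Latin replace the hand-written map and reduce functions, which is the sense in which Pig makes Map/Reduce easy.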
- 36. What are the features of Hadoop?
  • HDFS – Hadoop Distributed File System
  • HBase – Hadoop NoSQL Database
  • Map/Reduce – The key to it all
  • Hive – SQL for NoSQL
  • Pig – Map/Reduce made easy
  • Flume – Fault-tolerant transport
- 39. What are the features of Hadoop?
  • Flume
    ◦ Divided into sources, channels and sinks
    ◦ You can run several of each, which makes it fault tolerant
    ◦ Many sources!
      ▪ Avro, Exec, JMS, Syslog, HTTP, NetCat, your own (Java)
    ◦ Many channels!
      ▪ Memory, File, your own (Java)
    ◦ Many sinks!
      ▪ Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, your own (Java)
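The source → channel → sink split is what lets Flume survive a sink outage: the channel buffers events, and an event only leaves the channel once the sink has delivered it. The sketch below models that in plain Python. All class and function names are hypothetical; real Flume agents are Java processes wired together in a properties file, although their events likewise carry headers plus a byte-array body. The in-memory channel here, like Flume's Memory channel, loses its buffer if the process dies; Flume's File channel persists events to disk instead.

```python
from collections import deque


class MemoryChannel:
    """Toy stand-in for a Flume channel: a bounded buffer between source and sink."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")  # the source has to retry later
        self.events.append(event)

    def take(self):
        return self.events.popleft() if self.events else None

    def rollback(self, event):
        # Delivery failed: put the event back at the head so it is not lost.
        self.events.appendleft(event)


class ListSink:
    """Toy sink that 'delivers' events to a list and can simulate an outage."""

    def __init__(self):
        self.delivered = []
        self.down = False

    def process(self, channel):
        event = channel.take()
        if event is None:
            return False
        if self.down:
            channel.rollback(event)
            return False
        self.delivered.append(event)
        return True


def line_source(lines, channel):
    # Hypothetical source: turn each input line into an event.
    for line in lines:
        channel.put({"headers": {}, "body": line.encode()})


channel = MemoryChannel(capacity=10)
sink = ListSink()
line_source(["login ok", "disk full"], channel)

sink.down = True
sink.process(channel)          # delivery fails, the event stays in the channel
sink.down = False
while sink.process(channel):   # drain the channel once the sink is back
    pass
print([e["body"] for e in sink.delivered])  # → [b'login ok', b'disk full']
```

Running several sinks against the same channel (with failover between them) extends the same idea: as long as one sink can deliver, buffered events keep flowing.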
- 41. What Hadoop looks like in a data center (DC)
  • Components
    ◦ Primary NameNode
      ▪ Controls the whole cluster and knows where the data resides (it holds the filesystem metadata)
      ▪ The same master often also runs the JobTracker, which keeps track of Map/Reduce jobs
      ▪ Single point of failure: shadowing it is a potential option
    ◦ Secondary NameNode
      ▪ Performs periodic housekeeping: it merges the NameNode's edit log into the filesystem image (checkpointing); it is not a hot standby
    ◦ DataNode
      ▪ Stores the actual data (HDFS blocks)
      ▪ Runs the Map/Reduce tasks, close to the data
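The Secondary NameNode is easy to misread as a backup. Its real job is checkpointing: the NameNode keeps the namespace as an image file (fsimage) plus an edit log of every change since that image was written, and the Secondary NameNode periodically merges the two into a fresh fsimage, so the log does not grow without bound and a restart has less to replay. The toy model below shows only that merge; the edit format and function names are hypothetical, and real fsimage and edit files are binary. Block-to-DataNode locations are not stored in the fsimage at all: the NameNode rebuilds them from the block reports the DataNodes send it.

```python
import copy

# Toy NameNode metadata: the fsimage maps each file path to its list of block
# ids, and the edit log records every namespace change made since that snapshot.


def apply_edit(namespace, edit):
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]      # edit[2]: the file's block ids
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit[2]: the new path
    return namespace


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a new fsimage, as a checkpoint does in spirit."""
    merged = copy.deepcopy(fsimage)    # leave the old image untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []                  # new fsimage, emptied edit log


fsimage = {"/logs/a.log": ["blk_1"]}
edit_log = [
    ("create", "/logs/b.log", ["blk_2", "blk_3"]),
    ("rename", "/logs/a.log", "/archive/a.log"),
    ("delete", "/logs/b.log"),
]
fsimage, edit_log = checkpoint(fsimage, edit_log)
print(fsimage, edit_log)  # → {'/archive/a.log': ['blk_1']} []
```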