Adam Muise – Solution Architect, Hortonworks

ELEPHANT AT THE DOOR: HADOOP AND NEXT GENERATION DATA
Who am I?
Who is Hortonworks?
100% Open Source – Democratized Access to Data
The leaders of Hadoop’s development
We do Hadoop
Drive Innovation in the platform – We lead the roadmap
Community driven, Enterprise Focused
We do Hadoop successfully.
Support
Training
Professional Services
We do Hadoop successfully everywhere.
We do Hadoop successfully, everywhere, with partners.
What is Hadoop?
What is everyone talking about?
Data
“Big Data” is the marketing term of the decade in IT
What lurks behind the hype is the democratization of Data.
You need data.
But what do you do with your data now?
We are obsessive compulsive about collecting and structuring our data.
Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…
You need data. Your customers expect you to know what they want before they do.
Let’s talk challenges…
Volume
[Slide: the word "Volume" repeated across the whole slide]
Storage, Management, Processing all become challenges with Data at Volume
Traditional technologies adopt a divide, drop, and conquer approach
The solution?
[Diagram: Data divided among EDW, Another EDW, Yet Another EDW, Analytical DB, and OLTP]
Ummm…you dropped something
[Diagram: EDW, Another EDW, Yet Another EDW, Analytical DB, and OLTP, with loose Data scattered around them]
Analyzing the data usually raises more interesting questions…
…which leads to more data
Wait, you’ve seen this before.
[Diagram: Data around an "Analytics Sausage Factory"]
Data begets Data.
What keeps us from our Data?
“Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling
http://www.youtube.com/watch?v=hVimVzgtD6w
Your data silos are lonely places.
[Diagram: separate EDW, Accounts, Customers, and Web Properties silos]
… Data likes to be together.
[Diagram: EDW, Accounts, Customers, and Web Properties sharing their Data]
Data likes to socialize too.
[Diagram: EDW, Accounts, Customers, and Web Properties alongside CDR, Machine Data, Facebook, Weather Data, and Twitter]
New types of data don’t quite fit into your pristine view of the world.
[Diagram: Logs and Machine Data outside "My Little Data Empire", with question marks over where they belong]
To resolve this, some people take hints from Lord Of The Rings...
…and create One-Schema-To-Rule-Them-All…
[Diagram: Data passing through ETL jobs into an EDW with a single Schema]
…but that has its problems too.
[Diagram: Data passing through ETL jobs into an EDW with a single Schema]
Fragile workflows make supporting the analytical models you want expensive and time-consuming.
[Diagram: Data passing through ETL jobs into an EDW with a single Schema]
What do you want to do with data?
Marketing Analytics needs data.
Work with the population, not just a sample.
Your segmentation today.
Town/City
Middle Income Band
Female
Age: 25-30
Male
Product Category Preferences
Your segmentation with better data.
GPS coordinates
Looking to start a business
Walking into Starbucks right now…
Spent 25 minutes looking at tea cozies
Unhappy with his cell phone plan
$65-68k per year
Pregnant
Tea Party
Hippie
A depressed Toronto Maple Leafs fan
Gene Expression for Risk Taker
Male
Female
Age: 27 but feels old
Product recommendations
Thinking about a new house
Products left in basket indicate drunk Amazon shopper
Pick up all of that data that was prohibitively expensive to store and use.
Why do viewer surveys…
…when raw data can tell you what button on the remote was pressed during what commercial for the entire viewer population?
To approach these use cases you need an affordable platform that stores, processes, and analyzes the data.
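The remote-control example can be sketched as a plain count over every raw event instead of a survey sample. This is only an illustration: the event tuples, field names, and the function `presses_by_commercial` are hypothetical and do not come from the slides or from any real set-top-box feed.

```python
from collections import Counter

# Hypothetical raw events: (viewer_id, commercial, button).
# Names and values are made up for illustration.
EVENTS = [
    ("v1", "car_ad", "mute"),
    ("v2", "car_ad", "channel_up"),
    ("v3", "car_ad", "mute"),
    ("v1", "soda_ad", "volume_up"),
    ("v2", "soda_ad", "mute"),
]


def presses_by_commercial(events):
    """Count (commercial, button) pairs over every event, with no sampling."""
    return Counter((commercial, button) for _viewer, commercial, button in events)


if __name__ == "__main__":
    for (commercial, button), n in sorted(presses_by_commercial(EVENTS).items()):
        print(commercial, button, n)
```

At real scale the same count would run over far more events than one machine can hold, which is the gap the platform described in the rest of the talk is meant to fill.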
  	
  
So what is the answer?
Enter the Hadoop.
………
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
Traditional architecture didn’t scale enough…
[Diagram: rows of App servers backed by DB servers and SAN storage, repeated]
Databases can become bloated and useless
Traditional architectures cost too much at that volume…
$upercomputing
$/TB
$pecial Hardware
So what is the answer?
If you could design a system that would handle this, what would it look like?
It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: a grid of Storage nodes]
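One way to picture "resilient" and "self-healing" is a toy model in which each block is copied to several storage nodes and re-copied when a node disappears. The sketch below is a simplified illustration, not the HDFS API: `ToyCluster`, its methods, and the replica count of 3 are all assumptions made for the example.

```python
REPLICATION = 3  # assumed replica count for this sketch


class ToyCluster:
    """Toy model of replicated block storage; not the HDFS API."""

    def __init__(self, node_names):
        self.nodes = {name: set() for name in node_names}  # node -> block ids

    def holders(self, block):
        """Return the sorted names of nodes that hold a copy of the block."""
        return sorted(n for n, blocks in self.nodes.items() if block in blocks)

    def put(self, block):
        """Place replicas on the least-loaded nodes (ties broken by name)."""
        targets = sorted(self.nodes, key=lambda n: (len(self.nodes[n]), n))
        for name in targets[:REPLICATION]:
            self.nodes[name].add(block)

    def fail(self, name):
        """Drop a node, then re-replicate blocks that fell below REPLICATION."""
        lost = self.nodes.pop(name)
        for block in sorted(lost):
            have = set(self.holders(block))
            missing = max(0, REPLICATION - len(have))
            spare = sorted(
                (n for n in self.nodes if n not in have),
                key=lambda n: (len(self.nodes[n]), n),
            )
            for target in spare[:missing]:
                self.nodes[target].add(block)


if __name__ == "__main__":
    cluster = ToyCluster(["n1", "n2", "n3", "n4"])
    cluster.put("b1")
    cluster.put("b2")
    cluster.fail("n1")
    print(cluster.holders("b1"), cluster.holders("b2"))
```

Losing a node costs nothing but a background copy, which is why such a design can run on inexpensive machines.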
  
It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: a grid of nodes, each pairing Processing with Storage]
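"Taking tasks to the data" can be sketched as running the same small function on each node against only the lines stored there, then merging the small per-node summaries. The node names, sample lines, and the functions `map_task` and `run_job` are hypothetical; this is a single-process illustration of the idea, not Hadoop's MapReduce API.

```python
from collections import Counter

# Hypothetical layout: which node stores which lines of a log file.
BLOCKS_ON_NODE = {
    "node1": ["error disk", "ok"],
    "node2": ["ok", "error net"],
    "node3": ["ok ok"],
}


def map_task(lines):
    """Runs next to the data: count words in the local lines only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_job(blocks_on_node):
    """Send the same function to every node, then merge the partial counts."""
    # Each node processes its own lines; the raw data never moves.
    partials = {node: map_task(lines) for node, lines in blocks_on_node.items()}
    # Reduce step: only the small per-node summaries are combined.
    total = Counter()
    for partial in partials.values():
        total += partial
    return total


if __name__ == "__main__":
    print(run_job(BLOCKS_ON_NODE))
```

The design choice is that the function is small and the data is large, so moving the function is the cheaper direction.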
  
It would probably run on commodity hardware, virtualized machines, and common OS platforms
[Diagram: a grid of nodes, each pairing Processing with Storage]
It would probably be open source so innovation could happen as quickly as possible
It would need a critical mass of users
Hadoop 2 just hit the ground: Introducing YARN
YARN lets you run more data apps than ever before
[Diagram: MapReduce V2, MapReduce V?, STORM, Giraph, Tez, MPI, HBase, … and more, running on YARN over HDFS2]
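The idea of many kinds of applications sharing one cluster can be illustrated with a toy resource manager that hands out memory "containers" to whichever application asks. This is a loose sketch in the spirit of YARN, not its actual API or scheduling policy: `ToyResourceManager`, the most-free-memory rule, and the memory figures are all assumptions for the example.

```python
class ToyResourceManager:
    """Toy container scheduler; names and policy are illustrative only."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node -> free memory in MB
        self.grants = []  # list of (app, node, mb)

    def request(self, app, mb):
        """Grant one container on the node with the most free memory, or None."""
        # sorted() makes ties deterministic: the first name alphabetically wins.
        node = max(sorted(self.free), key=lambda n: self.free[n])
        if self.free[node] < mb:
            return None
        self.free[node] -= mb
        self.grants.append((app, node, mb))
        return node


if __name__ == "__main__":
    rm = ToyResourceManager({"node1": 4096, "node2": 4096})
    for app, mb in [("mapreduce", 2048), ("storm", 1024), ("tez", 3072)]:
        print(app, "->", rm.request(app, mb))
```

The scheduler never needs to know what the application does, only how much it asks for, which is what lets different processing engines share the same nodes.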
  
YARN turns Hadoop into a smart phone: An App Ecosystem
hortonworks.com/yarn/
YARN: Yeah, we did that too.
hortonworks.com/yarn/
Apache Hadoop
[Diagram: HDFS, YARN, MapReduce, Pig, Hive, HCatalog, HBase, Storm, Ambari, Sqoop, Falcon, Flume]
Hortonworks Data Platform
[Diagram: HDFS, YARN, MapReduce, Pig, Hive, HCatalog, HBase, Storm, Ambari, Sqoop, Falcon, Flume]
What else are we working on?
hortonworks.com/labs/
Hadoop is the new Data Operating System for the Enterprise
There is NO second place

Hortonworks
…the Bull Elephant of Hadoop Innovation
  
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION


Contenu connexe

Tendances

What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaEdureka!
 
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Edureka!
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaEdureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big DataEdureka!
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop Edureka!
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersEdureka!
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview QuestionsZaranTech LLC
 

Tendances (20)

What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
 
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions
 

Similaire à Hadoop and Next Generation Data Architect Adam Muise

Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Database Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory WebcastDatabase Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory WebcastEric Kavanagh
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15Edureka!
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Edureka!
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 
INTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPINTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPDr Geetha Mohan
 
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Josh Patterson
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond Rajesh Kumar
 
Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Pavlo Baron
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Jennifer Walker
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 
Linked data business models
Linked data business modelsLinked data business models
Linked data business modelsJesus Contreras
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupScott Mitchell
 
Big data tim
Big data timBig data tim
Big data timT Weir
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 

Similaire à Hadoop and Next Generation Data Architect Adam Muise (20)

Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Database Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory WebcastDatabase Survival Guide: Exploratory Webcast
Database Survival Guide: Exploratory Webcast
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
Solving Big Data Problems
Solving Big Data ProblemsSolving Big Data Problems
Solving Big Data Problems
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 
INTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPINTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOP
 
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
 
Data vault: What's Next
Data vault: What's NextData vault: What's Next
Data vault: What's Next
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)Big Data - JAX2011 (Pavlo Baron)
Big Data - JAX2011 (Pavlo Baron)
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
Linked data business models
Linked data business modelsLinked data business models
Linked data business models
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Big data tim
Big data timBig data tim
Big data tim
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 

More from Adam Muise

2015 nov 27_thug_paytm_rt_ingest_brief_final
Moving to a data-centric architecture: Toronto Data Unconference 2015
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2014 sept 26_thug_lambda_part1
2014 sept 4_hadoop_security
May 29, 2014 Toronto Hadoop User Group - Micro ETL
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Sept 17 2013 - THUG - HBase a Technical Introduction
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 march 26_thug_etl_cdc_talking_points
2013 feb 20_thug_h_catalog
KnittingBoar Toronto Hadoop User Group Nov 27 2012
2012 sept 18_thug_biotech
hadoop 101 aug 21 2012 tohug

Hadoop and Next Generation Data Architect Adam Muise

  • 1. Adam Muise – Solution Architect, Hortonworks. ELEPHANT AT THE DOOR: HADOOP AND NEXT GENERATION DATA
  • 3. Who is [logo image]?
  • 4. 100% Open Source – Democratized Access to Data. The leaders of Hadoop’s development. We do Hadoop. Drive Innovation in the platform – We lead the roadmap. Community driven, Enterprise Focused.
  • 5. We do Hadoop successfully. Support, Training, Professional Services.
  • 6. We do Hadoop successfully everywhere.
  • 7. We do Hadoop successfully, everywhere, with partners.
  • 8. What is Hadoop? What is everyone talking about?
  • 10. “Big Data” is the marketing term of the decade in IT
  • 11. What lurks behind the hype is the democratization of Data.
  • 13. But what do you do with your data now?
  • 14. We are obsessive compulsive about collecting and structuring our data.
  • 15. Put it away, delete it, tweet it, compress it, shred it, wikileak-it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…
  • 16. You need data. Your customers expect you to know what they want before they do.
  • 19. Volume [the word repeats across the slide]
  • 20. Volume [the word repeats more densely]
  • 21. Volume [the word repeats more densely still]
  • 22. Storage, Management, Processing all become challenges with Data at Volume
  • 23. Traditional technologies adopt a divide, drop, and conquer approach
  • 24. The solution? [Diagram: "Data" items split across OLTP, EDW, Another EDW, Yet Another EDW, and Analytical DB]
  • 25. Ummm…you dropped something [Diagram: OLTP, EDW, Another EDW, Yet Another EDW, and Analytical DB, plus many additional "Data" items]
  • 26. Analyzing the data usually raises more interesting questions…
  • 27. …which leads to more data
  • 28. Wait, you’ve seen this before. [Diagram: Analytics Sausage Factory, with many "Data" items]
  • 30. What keeps us from our Data?
  • 31. “Prices, Stupid passwords, and Boring Statistics.” - Hans Rosling http://www.youtube.com/watch?v=hVimVzgtD6w
  • 32. Your data silos are lonely places. [Diagram: EDW, Accounts, Customers, Web Properties, each with "Data" items]
  • 33. … Data likes to be together. [Diagram: EDW, Accounts, Customers, Web Properties, each with "Data" items]
  • 34. Data likes to socialize too. [Diagram: EDW, Accounts, Customers, and Web Properties alongside CDR, Machine Data, Facebook, Twitter, and Weather Data]
  • 35. New types of data don’t quite fit into your pristine view of the world. [Diagram: Logs and Machine Data next to "My Little Data Empire", with question marks]
  • 36. To resolve this, some people take hints from Lord Of The Rings...
  • 37. …and create One-Schema-To-Rule-Them-All… [Diagram: EDW, Schema, "Data" items]
  • 38. …but that has its problems too. [Diagram: multiple ETL steps, EDW, Schema, "Data" items]
  • 39. Fragile workflows make supporting the analytical models you want expensive and time-consuming. [Diagram: multiple ETL steps, EDW, Schema, "Data" items]
  • 40. What do you want to do with data?
  • 41. Marketing Analytics needs data. Work with the population, not just a sample.
  • 42. Your segmentation today. Town/City; Middle Income Band; Female; Age: 25-30; Male; Product Category Preferences
  • 43. Your segmentation with better data. GPS coordinates; Looking to start a business; Walking into Starbucks right now…; Spent 25 minutes looking at tea cozies; Unhappy with his cell phone plan; $65-68k per year; Pregnant; Tea Party; Hippie; A depressed Toronto Maple Leaf’s Fan; Gene Expression for Risk Taker; Male; Female; Age: 27 but feels old; Product recommendations; Thinking about a new house; Products left in basket indicate drunk amazon shopper
  • 44. Pick up all of that data that was prohibitively expensive to store and use.
  • 45. Why do viewer surveys…
  • 46. …when raw data can tell you what button on the remote was pressed during what commercial for the entire viewer population?
  • 47. To approach these use cases you need an affordable platform that stores, processes, and analyzes the data.
  • 48. So what is the answer?
  • 49. Enter the Hadoop. ……… http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
  • 50. Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
  • 51. Traditional architecture didn’t scale enough… [Diagram: App, DB, and SAN tiers]
  • 52. Databases can become bloated and useless
  • 53. Traditional architectures cost too much at that volume… [Labels: $upercomputing, $/TB, $pecial Hardware]
  • 54. So what is the answer?
  • 55. If you could design a system that would handle this, what would it look like?
  • 56. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [Diagram: grid of Storage nodes]
  • 57. It would probably need a completely parallel processing framework that took tasks to the data… [Diagram: Processing and Storage nodes]
  • 58. It would probably run on commodity hardware, virtualized machines, and common OS platforms [Diagram: Processing and Storage nodes]
  • 59. It would probably be open source so innovation could happen as quickly as possible
  • 60. It would need a critical mass of users
  • 61. Hadoop 2 just hit the ground: Introducing YARN
  • 62. YARN lets you run more data apps than ever before [Diagram: MapReduce V2, MapReduce V?, STORM, Giraph, Tez, MPI, HBase, … and more, on YARN and HDFS2]
  • 63. YARN turns Hadoop into a smart phone: An App Ecosystem hortonworks.com/yarn/
  • 64. YARN: Yeah, we did that too. hortonworks.com/yarn/
  • 65. Apache Hadoop: Storm, HDFS, YARN, Pig, MapReduce, HCatalog, Hive, HBase, Ambari, Sqoop, Falcon, Flume
  • 66. Hortonworks Data Platform: Storm, Pig, HDFS, YARN, MapReduce, HCatalog, Hive, HBase, Ambari, Sqoop, Falcon, Flume
  • 67. What else are we working on? hortonworks.com/labs/
  • 68. Hadoop is the new Data Operating System for the Enterprise
  • 69. There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation. © Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
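
Slides 56 and 57 describe storage spread across many nodes and a parallel framework that takes tasks to the data. The sketch below is only a rough illustration of that idea in plain Python. It does not use Hadoop's API, and the node names, block contents, and function names are all made up for the example. Each hypothetical node counts words in the blocks it already holds, and only the small partial counts are merged afterwards.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "node" stores its own blocks of text locally.
NODES = {
    "node1": ["big data big", "data"],
    "node2": ["hadoop data"],
    "node3": ["big hadoop"],
}


def map_on_node(blocks):
    """Map phase: count words in the blocks stored on a single node."""
    counts = Counter()
    for block in blocks:
        counts.update(block.split())
    return counts


def merge_counts(left, right):
    """Reduce phase: combine two partial word counts into one."""
    return left + right


def word_count(nodes):
    # Every node works on its own blocks, so raw data never moves;
    # only the per-node partial counts are brought together.
    partials = [map_on_node(blocks) for blocks in nodes.values()]
    return reduce(merge_counts, partials, Counter())


totals = word_count(NODES)
print(dict(totals))
```

In a real cluster the per-node work would run in parallel on separate machines. Here it runs in a simple loop so the example stays self-contained.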