Introduction to Hadoop

Marc Cluet – Lynx Consultants
What's behind Big Data
  
What we'll cover
- Understand Hadoop components
- Understand the different technologies involved
- Embrace Big Data!

Lynx Consultants © 2013
  
What is Big Data?
- SQL has a limited ability to process changing data
  - The SQL schema is the source of truth; data has to fit it
  
What is Big Data?
- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle terabytes of data
  - Designed for fault tolerance and securing data
  - Designed around exploiting hardware to the fullest
  - Designed around Map/Reduce
  
Who runs Big Data?
- A few small companies
  
What is Hadoop?
- Hadoop is one of the big players for Big Data
  - Developed as an open source implementation of the designs in Google's MapReduce and Google File System papers (Google BigTable is the model for HBase)
  - Mainly developed at Yahoo!
  - Current companies behind it: Hortonworks and Cloudera
  
What are the features of Hadoop?
- HDFS – Hadoop Distributed File System
  - HDFS is a filesystem distributed across many nodes
  - Keeps several copies of your data (default: 3)
  - If a node goes down, HDFS re-replicates the affected data so every block is back to its target number of copies
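
The replication behaviour described above can be sketched with a toy model. This is not HDFS code and does not reproduce its real placement policy: the node names, block ids and the random placement in `place_blocks` and `handle_node_failure` are assumptions made for illustration. It only shows the idea that every block keeps three copies, and that losing a node triggers new copies elsewhere.

```python
import random

REPLICATION = 3  # the default number of copies mentioned on the slide


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(sorted(nodes), replication)) for block in blocks}


def handle_node_failure(placement, live_nodes, failed, replication=REPLICATION, seed=1):
    """Forget the failed node and copy under-replicated blocks to other live nodes."""
    rng = random.Random(seed)
    live = set(live_nodes) - {failed}
    for block, holders in placement.items():
        holders.discard(failed)
        candidates = sorted(live - holders)
        missing = replication - len(holders)
        holders.update(rng.sample(candidates, min(missing, len(candidates))))
    return live


# Hypothetical five-node cluster holding four blocks.
nodes = {"node1", "node2", "node3", "node4", "node5"}
placement = place_blocks(["blk_1", "blk_2", "blk_3", "blk_4"], nodes)
live = handle_node_failure(placement, nodes, failed="node2")
for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure, every block is again held by three nodes and none of them is the failed one.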
  
What are the features of Hadoop?
- HBase – Hadoop NoSQL Database
  - Schemaless key-value storage
  - All data exportable in JSON
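
To make "schemaless key-value" concrete, here is a minimal sketch using a plain Python dictionary. It does not use the HBase client API; the `put` helper, the row keys and the `family:qualifier` column names are made up for illustration. The point is that rows in the same table can carry different columns, and the whole thing serialises to JSON.

```python
import json

# Toy in-memory "table": row key -> {column: value}. Rows need not share columns.
table = {}


def put(row_key, column, value):
    """Store one cell; the column does not have to be declared beforehand."""
    table.setdefault(row_key, {})[column] = value


put("user:1", "info:name", "Ada")
put("user:1", "info:email", "ada@example.org")
put("user:2", "info:name", "Linus")
put("user:2", "stats:logins", 42)  # a column that user:1 never had

exported = json.dumps(table, sort_keys=True, indent=2)
print(exported)
```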
  
What are the features of Hadoop?
- Map/Reduce – The key to it all
  - This was invented by Google
  - Given a dataset, we Map over every record and emit the ones that match a criterion
  - Then we Reduce those to a result
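
The two phases can be sketched in plain Python, without Hadoop. The sample log lines, the "level is ERROR" criterion and the function names are assumptions chosen for illustration; on a real cluster the framework runs the map and reduce functions in parallel across nodes and does the grouping in between.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: "<host> <level> <message>" log lines.
logs = [
    "web1 ERROR disk full",
    "web2 INFO started",
    "web1 ERROR timeout",
    "db1 ERROR replication lag",
    "web2 WARN slow request",
]


def map_phase(line):
    """Emit (host, 1) for every line that matches the criterion (level == ERROR)."""
    host, level, _ = line.split(" ", 2)
    return [(host, 1)] if level == "ERROR" else []


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


mapped = [pair for line in logs for pair in map_phase(line)]
errors_per_host = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(errors_per_host)
```

Each call to `map_phase` looks at a single line and each call to `reduce_phase` at a single key, which is what lets the work be spread over many machines.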
  
What are the features of Hadoop?
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entry point for SQL users :)
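
As a rough illustration of why this is a comfortable entry point, the sketch below runs an ordinary aggregate query. It is an assumption-heavy stand-in: it uses Python's built-in `sqlite3` module instead of a Hive connection, and the `logs` table and its columns are invented. The query itself sticks to constructs (SELECT, WHERE, GROUP BY, COUNT, ORDER BY) that HiveQL also offers.

```python
import sqlite3

# sqlite3 stands in for a Hive connection here; no Hive cluster is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (host TEXT, level TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("web1", "ERROR", "disk full"),
        ("web2", "INFO", "started"),
        ("web1", "ERROR", "timeout"),
        ("db1", "ERROR", "replication lag"),
    ],
)

# WHERE plays the role of the map step (pick the matching rows);
# GROUP BY + COUNT play the role of the reduce step (collapse them per key).
query = """
    SELECT host, COUNT(*) AS errors
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY host
    ORDER BY host
"""
rows = conn.execute(query).fetchall()
print(rows)
```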
  
What are the features of Hadoop?
- Pig – Map/Reduce made easy
  - Produces results from scripts written in a small, high-level language (Pig Latin)
  - Reinvents SQL, in a way
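
Pig scripts are written as a chain of steps, each turning one relation into another. The sketch below imitates that style in plain Python; it is not Pig Latin and does not call Pig. The sample records are made up, and the comments only point at the Pig operators (FILTER, GROUP, FOREACH ... GENERATE) that each step loosely resembles.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical relation of (host, level) tuples.
records = [
    ("web1", "ERROR"),
    ("web2", "INFO"),
    ("web1", "ERROR"),
    ("db1", "ERROR"),
]

# Step 1, similar to FILTER: keep only the error records.
errors = [r for r in records if r[1] == "ERROR"]

# Step 2, similar to GROUP: bring records with the same host together.
by_host = groupby(sorted(errors, key=itemgetter(0)), key=itemgetter(0))

# Step 3, similar to FOREACH ... GENERATE with COUNT: one output tuple per group.
counts = [(host, sum(1 for _ in group)) for host, group in by_host]
print(counts)
```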
  
What are the features of Hadoop?
- Flume – Fault-tolerant transport
  - Divided into Sources, Channels and Sinks
  - Can have multiple of everything, which makes it fault tolerant
  - Many sources!
    - Avro, Exec, JMS, Syslog, HTTP, NetCat, your own (Java)
  - Many channels!
    - Memory, File, your own (Java)
  - Many sinks!
    - Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, your own (Java)
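
The source, channel and sink split can be sketched in a few lines of Python. This is a toy model, not Flume's Java API or its configuration format: `MemoryChannel`, `list_source`, `FlakySink` and `drain` are hypothetical names. It shows the one property that matters for fault tolerance: the channel buffers events, so a sink failure delays delivery instead of losing data.

```python
from collections import deque


class MemoryChannel:
    """Buffers events between a source and a sink (toy in-memory channel)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        return self._events.popleft() if self._events else None

    def put_back(self, event):
        self._events.appendleft(event)


def list_source(lines, channel):
    """Source: turns incoming lines into events and hands them to the channel."""
    for line in lines:
        channel.put({"body": line})


class FlakySink:
    """Sink that fails on its first delivery attempt, then succeeds."""

    def __init__(self):
        self.delivered = []
        self._failed_once = False

    def write(self, event):
        if not self._failed_once:
            self._failed_once = True
            raise IOError("destination unavailable")
        self.delivered.append(event["body"])


def drain(channel, sink):
    """Move events from channel to sink; a failed write puts the event back for a retry."""
    while (event := channel.take()) is not None:
        try:
            sink.write(event)
        except IOError:
            channel.put_back(event)


channel = MemoryChannel()
sink = FlakySink()
list_source(["line 1", "line 2", "line 3"], channel)
drain(channel, sink)
print(sink.delivered)
```

Despite the simulated failure on the first write, all three events reach the sink in their original order.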
  
What Hadoop looks like in a DC
- Components
  - Primary Namenode
    - Controls the whole cluster and knows where the data resides
    - Runs the job tracker to keep track of Map/Reduce jobs
    - Biggest single point of failure; shadowing it is a potential option
  - Secondary Namenode
    - Performs secondary housekeeping operations (periodic checkpoints of the Namenode metadata); it is not a hot standby
  - Data Node
    - Stores all the information
    - Runs Map/Reduce
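
The division of labour between the Namenode and the Data Nodes can be sketched as two lookups. Everything here is a simplified assumption for illustration (the dictionaries, node names, block ids and the `read_file` helper are not Hadoop APIs): the Namenode holds only metadata about where blocks live, and the Data Nodes hold the bytes.

```python
# Namenode side: metadata only (file -> blocks, block -> Data Nodes holding a copy).
namenode = {
    "files": {"/logs/app.log": ["blk_1", "blk_2"]},
    "locations": {
        "blk_1": ["datanode1", "datanode2", "datanode3"],
        "blk_2": ["datanode2", "datanode3", "datanode4"],
    },
}

# Data Node side: the actual bytes, keyed by block id.
datanodes = {
    "datanode1": {"blk_1": b"first half, "},
    "datanode2": {"blk_1": b"first half, ", "blk_2": b"second half"},
    "datanode3": {"blk_1": b"first half, ", "blk_2": b"second half"},
    "datanode4": {"blk_2": b"second half"},
}


def read_file(path, down=frozenset()):
    """Ask the Namenode where each block lives, then read it from a live Data Node."""
    data = b""
    for block in namenode["files"][path]:
        holder = next(n for n in namenode["locations"][block] if n not in down)
        data += datanodes[holder][block]
    return data


print(read_file("/logs/app.log"))
print(read_file("/logs/app.log", down={"datanode1", "datanode2"}))
```

Losing Data Nodes only changes which copy is read. Losing the `namenode` mapping would make every block unreachable, which is why the slide calls it the biggest point of failure.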
  
Questions?
