SOFTWARE DEVELOPMENT DONE RIGHT
       Netherlands | USA | India | UK | France
What is Big Data?

  Generally refers to data that cannot be
processed efficiently by traditional systems,
mainly because of its size.


    Twitter/Facebook example
    
      Facebook – 500 TB of data daily
    
      Twitter – 250 million tweets daily


 90% of data has been generated in the last 2-3
years.
Big Data Sources

    Sources:
    • Social networking sites like Twitter, Facebook, etc.
    • Smart phones
    • Trading platforms
    • Machines
    • Log Files


    This data is used for purposes such as
     • Product Trends
     • Market Analysis
What is Hadoop?

  Apache Hadoop is a framework for running
applications on large clusters built of commodity
hardware.

  Transparently provides applications with both
reliability and data motion.

  Implements a computational paradigm named
Map/Reduce, in which an application is divided
into many small fragments of work, each of which
may be executed or re-executed on any node in the
cluster.

  Provides a distributed file system (HDFS).

  Moves code to the nodes where the data is
stored, rather than moving data to the code.

  Hadoop opened the gates for processing Big
Data.
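
To make the Map/Reduce idea above concrete, here is a minimal in-process sketch of a word count. It is a toy illustration in plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example, and everything runs on one machine, whereas Hadoop would distribute the map and reduce calls across the cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as a framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Each line is an independent fragment of work; on a cluster the
    # map calls could run in parallel on different nodes.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data needs hadoop", "hadoop moves code to data"]
    print(word_count(sample))
```

Because every map call sees only one line and every reduce call sees only one key, a failed fragment can be re-run on its own, which is what makes the model fit large clusters of commodity machines.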
Hadoop's History

    Hadoop is based on work done by Google

    
        GFS – HDFS

    
      Google MapReduce – Hadoop MapReduce

    
        BigTable – HBase
Hadoop Features

    Partial Failure Support


    Data Recoverability


    Component Recovery


    Consistency


    Scalability
Hadoop Components

    Core Components
    • HDFS – Hadoop Distributed File System
    • MapReduce



    Projects in the Hadoop Ecosystem
    • Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
HDFS
Map/Reduce
Case Study

  Product: a data quality and cleansing
solution.


    Before Hadoop
     
       Two node DB cluster
     
       Multi-threaded Java application for de-duplication
     
       1 million records took 10 hours to process


    After Hadoop
     
        8 GB RAM, 4 cores, 4 machines in the cluster.
     
       1 million records took 30 minutes to process
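
The slides do not show how the de-duplication job was written. As a rough sketch of how de-duplication can be phrased in map/reduce terms, the example below keys each record by a normalized form (the map step) and keeps one record per key (the reduce step). The record fields, the normalize rule, and the function names are all hypothetical, and the code runs in a single Python process rather than on Hadoop.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def normalize(record: dict[str, str]) -> tuple[str, str]:
    """Hypothetical match key: case- and whitespace-insensitive name and email."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return name, email


def deduplicate(records: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    # Map step: key every record by its normalized form.
    groups: dict[tuple[str, str], list[dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[normalize(record)].append(record)
    # Reduce step: keep the first record seen for each key.
    return [group[0] for group in groups.values()]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    records = [
        {"name": "Ada  Lovelace", "email": "ADA@example.com"},
        {"name": "ada lovelace", "email": "ada@example.com "},
        {"name": "Alan Turing", "email": "alan@example.com"},
    ]
    for record in deduplicate(records):
        print(record)
```

Since records with different keys never need to be compared, each key group can be reduced on a different machine, which is one plausible reason such a job parallelizes well across a cluster.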
Hadoop in Use

    Any application that
     
       Has more than 10 TB of data
     
       Needs fast and cheap processing

    Log Analysis

    Recommendation Engine

    Feed Analysis

    Data Mining

    Statistical Analysis

    ETL Processing

    Business Intelligence
Cloudera
 
   Cloudera is “The commercial Hadoop
 company”.

 
   Founded by leading Hadoop experts
 from Facebook, Google, Oracle and Yahoo.

 
   Provides consulting and training services
 for Hadoop users.

 
   Staff includes committers to virtually all
 Hadoop projects.
Resources
 
     Books
      
        Hadoop: The Definitive Guide (by Tom White)
      
        HBase: The Definitive Guide (by Lars George)
      
        MapReduce Design Patterns (by Donald Miner and Adam Shook)

 
     Web
      
          http://hadoop.apache.org/
      
          http://hbase.apache.org/
      
          http://research.google.com/archive/bigtable.html
      
          http://research.google.com/archive/mapreduce-osdi04.pdf
Contact us @ Xebia India

Website
www.xebia.com
www.xebia.in
www.xebia.fr

Thought Leadership
http://blog.xebia.com
http://podcast.xebia.com
