Hadoop
    Hadoop         (HDFS)



     




Public 2009/5/13
• Hadoop
• Hadoop   (HDFS)
    –
    –
    –
•




                    Copyright 2009 - Trend Micro Inc.
Hadoop               ?

• Hadoop

• Apache top-level

• Hadoop
    –                (HDFS)
    – MapReduce
•       Java
•                 C++/Java/Shell/Command…
•
    – Linux    Mac OS X    Windows     Solaris
    –

[Diagram, top to bottom: Cloud Applications; MapReduce, HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]


Hadoop

• 2003/2
  – Google           MapReduce
• 2003/10
  – Google     Google File System (GFS)
• 2004/12
  – Google     MapReduce
• 2005/7
  – Doug Cutting     Nutch                MapReduce
• 2006/2
  – Hadoop          Nutch            Lucene
• 2006/11
  – Google     Bigtable



Hadoop

• 2007/2
  – Mike Cafarella        HBase
• 2007/4
  – Yahoo!    1000                Hadoop
• 2008/1
  – Hadoop       Apache




Who uses Hadoop?
•   Yahoo!
    – Hadoop          2              CPU        10
•   Google
    –                 Hadoop
•   Amazon
    – Amazon          Hadoop
    –
•   IBM
    – Blue Cloud
•   Trend Micro
    –        Hadoop

•             Hadoop           …
    – http://wiki.apache.org/hadoop/PoweredBy



Hadoop   (HDFS)




HDFS

•                                                 (Single
    Namespace)
•
    – 1          1             10 petabytes
•
    – Write-once-read-many
    –
•                              (block)
    –                        128 MB
    –                                 (replica)
            (DataNode)
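
The block and replica figures on this slide can be turned into a quick sizing calculation. The following is a minimal sketch, not HDFS code: it assumes the 128 MB block size shown above and the replication factor of 3 mentioned on the next slide, and the function names are hypothetical. Treating an empty file as zero blocks is also an assumption of the sketch.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size shown on the slide
REPLICATION = 3                 # replica count from the following slide


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks needed to hold file_size bytes; the last block may be partial."""
    if file_size <= 0:
        return 0
    # Integer ceiling division avoids float rounding on very large files.
    return (file_size + block_size - 1) // block_size


def raw_bytes(file_size: int, replication: int = REPLICATION) -> int:
    """Bytes consumed across all DataNodes once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_count(one_gib), "blocks,", raw_bytes(one_gib), "raw bytes")
```

A file one byte larger than a multiple of the block size needs one extra, nearly empty block, which is why the count uses ceiling division.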




HDFS

•
    –

•       (File replication)
    –                3   .
    –
•
    –
    –
•
    –                         (low latency)

    –    (Batch processing)

(NameNode)

• NameNode           HDFS                (File System
  Namespace)
   –                  (blocks)
   –         (block)             Data Node
• Hadoop cluster
•




NameNode                              (Metadata)

•   Name node         Metadata

     –         Metadata

     –

•   Metadata
     –              (files)
     –                   (blocks)

     –       (block)
             (Data Node)
     –
         •      :            (creation time),
                       (replication factor)



NameNode                             (Metadata)
•             (      EditLog)
    –

•   FsImage
    – Name Node

         •                (Name Space)
         •        (Block)     (File)

         •
    – NameNode
      FsImage  EditLog


•   Checkpoint
    –     NameNode
    –           FsImage
        EditLog    EditLog
                          FsImage



(Secondary NameNode)

•    NameNode        FsImage     EditLog        NameNode

•    FsImage   EditLog                           FsImage
•        FsImage       NameNode
    – NameNode        EditLog
• Secondary NameNode            NameNode           (Fail over)
    – Hadoop              NameNode


[Diagram: FsImage, EditLog, FsImage (new)]
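
The checkpoint idea on the last two slides, combining an FsImage with the EditLog to produce a new FsImage, can be sketched as a replay of logged operations over a snapshot. This is an illustration only: the dictionary layout and the operation names (`create`, `delete`, `set_replication`) are hypothetical, and the real FsImage and EditLog are on-disk formats that look nothing like this.

```python
from typing import Dict, List, Tuple

FsImage = Dict[str, dict]   # path -> file metadata (hypothetical layout)
Edit = Tuple                # (operation, path, *arguments)


def apply_edit(image: FsImage, edit: Edit) -> None:
    """Apply one logged namespace change to the image in place."""
    op, path, *args = edit
    if op == "create":
        image[path] = {"replication": args[0]}
    elif op == "delete":
        image.pop(path, None)
    elif op == "set_replication":
        image[path]["replication"] = args[0]
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(image: FsImage, edit_log: List[Edit]) -> Tuple[FsImage, List[Edit]]:
    """Replay the edit log over a copy of the image.

    Returns the new image and an empty log, since every edit is now
    reflected in the image itself.
    """
    new_image = {path: dict(meta) for path, meta in image.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


if __name__ == "__main__":
    old = {"/a": {"replication": 3}}
    log = [("create", "/b", 3), ("set_replication", "/a", 2), ("delete", "/b")]
    print(checkpoint(old, log))
```

Working on a copy leaves the old image usable until the new one is complete.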



NameNode

•   NameNode          SPOF (single point of failure)
•              (High Availability)


               SPOF!!




(DataNode)

•                    (Blocks)

    –                     (     ext3)

    –        block   metadata
        •               (CRC), block

    –
•   Block
    –            Blocks
      NameNode
    –   NameNode
      block
            NameNode
      block



HDFS –                     (Replication)

•             3
•                                 (block size)
    (replication factor)
•                                     (rack-aware)
        .




Block Placement

• Policy (v0.19)
    –
    –
    –
    –
•




Heartbeats

• DataNode   Heartbeats    NameNode
   –   3
• NameNode    Heartbeats      DataNode
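
How a NameNode might use heartbeats to notice a missing DataNode can be sketched as a last-seen table with a timeout. This is a simplified illustration, not the HDFS implementation. The "3" on the slide is read here as a three-second heartbeat interval, which is an assumption because the unit did not survive; the 30-second timeout and the class name are hypothetical.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL = 3.0  # seconds; assumed reading of the "3" on the slide


class HeartbeatMonitor:
    """Tracks when each DataNode last reported in (hypothetical helper)."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        # The timeout value is an illustrative assumption, not an HDFS default.
        self.timeout = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Called whenever a heartbeat arrives from a DataNode."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """DataNodes whose most recent heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record("dn1", now=0.0)
    monitor.record("dn2", now=0.0)
    monitor.record("dn1", now=27.0)  # dn2 has gone quiet
    print(monitor.dead_nodes(now=31.0))
```

Passing the clock in as an argument keeps the sketch deterministic; a real monitor would read the system clock.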




(Data Correctness)

•       Checksum
    – Cyclic Redundancy Check (CRC32 )
•
    –          512                Checksum
    – DataNode    Checksum
•
    –                     Checksum
    –
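
The per-chunk checksum scheme named on this slide can be sketched with Python's standard `zlib.crc32`. This is a minimal illustration, assuming the "512" on the slide means one CRC32 checksum per 512 bytes of data; the function names are hypothetical and the sketch does not reproduce how HDFS stores its checksums.

```python
import zlib
from typing import List, Optional

BYTES_PER_CHECKSUM = 512  # assumed meaning of the "512" on the slide


def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Compute one CRC32 value for every chunk_size bytes of data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def first_corrupt_chunk(
    data: bytes,
    expected: List[int],
    chunk_size: int = BYTES_PER_CHECKSUM,
) -> Optional[int]:
    """Return the index of the first chunk whose checksum differs, or None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    if len(actual) != len(expected):
        # Data is longer or shorter than the stored checksums cover.
        return min(len(actual), len(expected))
    return None


if __name__ == "__main__":
    block = bytes(range(256)) * 5          # 1280 bytes, so three chunks
    stored = chunk_checksums(block)        # computed when the data is written
    damaged = bytearray(block)
    damaged[600] ^= 0xFF                   # flip one byte in the second chunk
    print(first_corrupt_chunk(bytes(damaged), stored))
```

Checksumming small chunks rather than the whole block lets a reader pinpoint which part of the data is damaged.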




(User Interface)

•   API
     – Java API
     – A C language wrapper for the Java API is also available

•   POSIX-like commands
     – hadoop dfs -mkdir /foodir
     – hadoop dfs -cat /foodir/myfile.txt
     – hadoop dfs -rm /foodir/myfile.txt

•   DFSAdmin
     – bin/hadoop dfsadmin -safemode
     – bin/hadoop dfsadmin -report
     – bin/hadoop dfsadmin -refreshNodes

•   Web
     – http://host:port/dfshealth.jsp


Web




Web
  (http://172.16.203.136:50070)




POSIX Like command




Java API




POSIX Like command




• Hadoop document and installation
   – http://hadoop.apache.org/
• Hadoop Wiki
   – http://wiki.apache.org/hadoop/
• Google File System Paper
   – http://labs.google.com/papers/gfs.html




