SlideShare une entreprise Scribd logo
1  sur  14
Introduction to Big Data &
Hadoop Architecture
Module 1
• What is Big Data?
• Hadoop Ecosystem Components
• Hadoop Architecture
• Hadoop Storage: HDFS
• Hadoop Processing: MapReduce Framework
• Hadoop Server Roles: NameNode, DataNode, Secondary NameNode
• Anatomy of File Read and Write
What is Big Data?
• Walmart handles more than one million customer transactions every
hour.
• Facebook handles 40 billion photos from its user base.
• New York Stock Exchange generates about one TB of new trade data
per day.
• Last.fm hosts approximately 25 million users, taking up one TB of
storage daily.
• Twitter generates 7 TB of data daily.
• IBM claims 90% of today’s stored data was generated in last two
years.
Three Characteristics of Big Data V3s
• Volume(Data quantity)
• Facebook ingests 500 TB of new data every day.
• Boeing 737 will generate 240 TB of flight data during a single journey.
• Velocity(Data Speed)
• High Frequency stock trading algorithms reflect market changes within
microseconds.
• Clickstreams capture user behavior at millions of events per second.
• Variety(Data Types)
• Geospatial data, Audio and Video, unstructured text.
The structure of Big Data
• Structured
• CSV, Data stored in RDBMS
• Semi-Structured
• XML, JSON, SGML
• Unstructured
• Video data, Audio Data, Images
How Big Data impacts on IT?
• By 2016 4.4 million IT jobs in Big Data; 1.9 million is in US itself
• India will require a minimum of 1 lakh data scientist in the next
couple of years in addition to data analysts and data managers to
support the Big Data space.
• The opportunity for Indian Service providers lies in offering services
around Big Data implementation and analytics for global
multinationals.
References
• www.slideshare.com
• www.wikipedia.com
• www.computereducation.org
• www.youtube.com
• www.about.com
What is Hadoop?
• Apache™ Hadoop® is an open source software project that enables
the distributed processing of large data sets across clusters of
commodity servers.
• It is designed to scale up from a single server to thousands of
machines, with a very high degree of fault tolerance.
• Rather than relying on high-end hardware, the resiliency of these
clusters comes from the software’s ability to detect and handle
failures at the application layer.
Hadoop – Ecosystem Components
Hadoop Ecosystem
• Pig: A scripting language that simplifies the creation of MapReduce
jobs and excels at exploring and transforming data.
• Hive: Provides SQL like access to your Big Data.
• HBase: A Hadoop database.
• Sqoop: For efficiently transferring bulk data between Hadoop and
relation databases.
• Oozie: A workflow scheduler system to manage Apache Hadoop jobs.
• Flume: For efficiently collecting, aggregating, and moving large
amounts of log data.
Hadoop Architecture
Hadoop – Core Components
• HDFS - A file system that spans all the nodes in a Hadoop cluster for
data storage. It links together the file systems on many local nodes to
make them into one big file system. HDFS assumes nodes will fail, so
it achieves reliability by replicating data across multiple nodes
• Map/Reduce – The data processing framework that understands and
assigns work to the nodes in a cluster.
Anatomy of a File Read
Page 70 Definitive Guide
Anatomy of a File Write
Page 73 Definitive Guide

Contenu connexe

Tendances

Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingSachin Gowda
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATAGauravBiswas9
 

Tendances (20)

Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Hive
HiveHive
Hive
 
Hadoop
HadoopHadoop
Hadoop
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computing
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 

En vedette

Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebookparallellabs
 
Individual and societal risk
Individual and societal riskIndividual and societal risk
Individual and societal riskSruthi Madhu
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarCloudera, Inc.
 
Presentación de Moodle
Presentación de MoodlePresentación de Moodle
Presentación de Moodlecruizgaray
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkMikio L. Braun
 
El cambio
El cambioEl cambio
El cambiomemoop
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningSwiss Big Data User Group
 
Building Distributed Data Streaming System
Building Distributed Data Streaming SystemBuilding Distributed Data Streaming System
Building Distributed Data Streaming SystemAshish Tadose
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)Cloudera, Inc.
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis PatternsMikio L. Braun
 
Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynoteparallellabs
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingApache Apex
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentApache Apex
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 

En vedette (20)

Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebook
 
REDES NEURONALES
REDES NEURONALESREDES NEURONALES
REDES NEURONALES
 
Individual and societal risk
Individual and societal riskIndividual and societal risk
Individual and societal risk
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 
Presentación de Moodle
Presentación de MoodlePresentación de Moodle
Presentación de Moodle
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into Flink
 
El cambio
El cambioEl cambio
El cambio
 
The influence-of-prayer-coping-on-patients
The influence-of-prayer-coping-on-patientsThe influence-of-prayer-coping-on-patients
The influence-of-prayer-coping-on-patients
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time Learning
 
Building Distributed Data Streaming System
Building Distributed Data Streaming SystemBuilding Distributed Data Streaming System
Building Distributed Data Streaming System
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynote
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App Development
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 

Similaire à Introduction to Big Data & Hadoop Architecture - Module 1

Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Getting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightGetting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightNilesh Gule
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 

Similaire à Introduction to Big Data & Hadoop Architecture - Module 1 (20)

Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big data
Big dataBig data
Big data
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Getting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsightGetting started with big data in Azure HDInsight
Getting started with big data in Azure HDInsight
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 

Plus de Rohit Agrawal

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Rohit Agrawal
 
Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Rohit Agrawal
 
Advance HBase and Zookeeper - Module 8
Advance HBase and Zookeeper - Module 8Advance HBase and Zookeeper - Module 8
Advance HBase and Zookeeper - Module 8Rohit Agrawal
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Rohit Agrawal
 
Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Rohit Agrawal
 
Advance MapReduce Concepts - Module 4
Advance MapReduce Concepts - Module 4Advance MapReduce Concepts - Module 4
Advance MapReduce Concepts - Module 4Rohit Agrawal
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Rohit Agrawal
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6Rohit Agrawal
 

Plus de Rohit Agrawal (9)

Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10Apache Oozie Workflow Scheduler - Module 10
Apache Oozie Workflow Scheduler - Module 10
 
Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9
 
Advance HBase and Zookeeper - Module 8
Advance HBase and Zookeeper - Module 8Advance HBase and Zookeeper - Module 8
Advance HBase and Zookeeper - Module 8
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7
 
Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5Pig and Pig Latin - Module 5
Pig and Pig Latin - Module 5
 
Advance MapReduce Concepts - Module 4
Advance MapReduce Concepts - Module 4Advance MapReduce Concepts - Module 4
Advance MapReduce Concepts - Module 4
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6
 

Dernier

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Dernier (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Introduction to Big Data & Hadoop Architecture - Module 1

  • 1. Introduction to Big Data & Hadoop Architecture
  • 2. Module 1 • What is Big Data? • Hadoop Ecosystem Components • Hadoop Architecture • Hadoop Storage: HDFS • Hadoop Processing: MapReduce Framework • Hadoop Server Roles: NameNode, DataNode, Secondary NameNode • Anatomy of File Read and Write
  • 3. What is Big Data? • Walmart handles more than one million customer transactions every hour. • Facebook handles 40 billion photos from its user base. • New York Stock Exchange generates about one TB of new trade data per day. • Last.fm hosts approximately 25 million users, taking up one TB of storage daily. • Twitter generates 7 TB of data daily. • IBM claims 90% of today’s stored data was generated in last two years.
  • 4. Three Characteristics of Big Data V3s • Volume(Data quantity) • Facebook ingests 500 TB of new data every day. • Boeing 737 will generate 240 TB of flight data during a single journey. • Velocity(Data Speed) • High Frequency stock trading algorithms reflect market changes within microseconds. • Clickstreams capture user behavior at millions of events per second. • Variety(Data Types) • Geospatial data, Audio and Video, unstructured text.
  • 5. The structure of Big Data • Structured • CSV, Data stored in RDBMS • Semi-Structured • XML, JSON, SGML • Unstructured • Video data, Audio Data, Images
  • 6. How Big Data impacts on IT? • By 2016 4.4 million IT jobs in Big Data; 1.9 million is in US itself • India will require a minimum of 1 lakh data scientist in the next couple of years in addition to data analysts and data managers to support the Big Data space. • The opportunity for Indian Service providers lies in offering services around Big Data implementation and analytics for global multinationals.
  • 7. References • www.slideshare.com • www.wikipedia.com • www.computereducation.org • www.youtube.com • www.about.com
  • 8. What is Hadoop? • Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. • It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. • Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.
  • 9. Hadoop – Ecosystem Components
  • 10. Hadoop Ecosystem • Pig: A scripting language that simplifies the creation of MapReduce jobs and excels at exploring and transforming data. • Hive: Provides SQL like access to your Big Data. • HBase: A Hadoop database. • Sqoop: For efficiently transferring bulk data between Hadoop and relation databases. • Oozie: A workflow scheduler system to manage Apache Hadoop jobs. • Flume: For efficiently collecting, aggregating, and moving large amounts of log data.
  • 12. Hadoop – Core Components • HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes • Map/Reduce – The data processing framework that understands and assigns work to the nodes in a cluster.
  • 13. Anatomy of a File Read Page 70 Definitive Guide
  • 14. Anatomy of a File Write Page 73 Definitive Guide