SlideShare a Scribd company logo
1 of 25
Download to read offline
Bigdata and Hadoop Tutorial
By www.ITJobZone.biz
Audience
Students
Working Professionals
Developers
Database experts
Adminstrators
Those looking for career change.
Topics Covered
What is Bigdata?
Characteristics of Bigdata.
Use Cases and Need of Hadoop.
Apache Hadoop Ecosystem and its components
Brief History of Hadoop.
Hadoop Architecture.
Who Uses Hadoop
What is Bigdata?
Lots of Data (Terabytes or PetaBytes)
Bigdata is term for collection of datasets so large and
complex that it becomes difficult to process using on
hand database tools or Traditional data processing
applications.
The challenges include capture, curation, storage, search,
sharing, transfer, analysis and visualization.
What is Bigdata?
System /Enterprises generates huge amount of data from Terabytes to Petabytes of information
Stock Market generates around one terabyte of new data per day to
perform stock trading analysis and determine trends for optimal trades.
Bigdata Characteristics
Bigdata - Types of Data
● Structured Data
- Data from enterprise (ERP, CRM)
● Semi structured Data
- xml, json, csv, log files
● Unstructured Data
- audio, video, image, archive documents
Unstructured Data is Exploding
Bigdata Scenarios
Web and e-tailing
Recommendation Engines
Ad Targeting
Search Quality
Abuse and Click Fraud Detection
Telecommunication
Customer Churn Analysis and Prevention
Network Performance Optimization
Call Data Record (CDR) Analysis
Analysing Network to Predict Failure
Bigdata Scenarios (Cont…)
Government
Fraud Detection and Cyber Security
Welfare Schemes
Justice
Telecommunication
Health Information Exchange
Gene Sequencing
Serialization
Healthcare service Quality Improvements
Drug Safety
Brief history of Hadoop
Designed to answer the question
“ How to process big data with reasonable cost and time”?
Google ( published white papers)
GFS
2003
Mapreduce 2005
Yahoo( implementation by Doug Cutting)
HDFS 2006-07
Mapreduce 2007-08
What is Hadoop
Apache top level project, Opensource implementation of
frameworks for reliable, scalable, distributed computing
and storage.
It is a flexible and highly-available architecture for large
scale computation and data processing on a network of
commodity hardware.
Goals / Requirements
Abstract & facilitate the storage and processing of large and rapidly growing data.
• Structured and unstructured datasets
• Simple programming model
High scalability and availability
Use commodity hardware with little redundancy
Fault tolerance
Move computation rather data
Goals / Requirements
Hardware will fail.
Processing will be run in batches. Thus there is an emphasis on
high throughput as opposed to low latency.
Applications that run on HDFS have large data sets. A typical
file in HDFS is gigabytes to terabytes in size.
It should provide high aggregate data bandwidth and scale to
hundreds of nodes in a single cluster. It should support tens
of millions of files in a single instance.
Applications need a write-once-read-many access model.
Architecture
Distributed, with some centralization
Main nodes of cluster are where most of the computational power and storage
of the system lies
Main nodes run TaskTracker to accept and reply to MapReduce tasks, and
also DataNode to store needed blocks closely as possible
Central control node runs NameNode to keep track of HDFS directories &
files, and JobTracker to dispatch compute tasks to TaskTracker
Written in Java, also supports Python and Ruby
Architecture
Architecture
Architecture
Hadoop Distributed Filesystem
Tailored to needs of MapReduce
Targeted towards many reads of filestreams
Writes are more costly
High degree of data replication (3x by default)
No need for RAID on normal nodes
Large blocksize (64MB)
Location awareness of DataNodes in network
Architecture - NameNode:
Stores metadata for the files, like the directory structure of a typical FS.
The server holding the NameNode instance is quite crucial, as there is
only one.
Transaction log for file deletes/adds, etc. Does not use transactions for
whole blocks or file-streams, only metadata.
Handles creation of more replica blocks when necessary after a
DataNode failure
Architecture - DataNode:
Stores the actual data in HDFS
Can run on any underlying filesystem (ext3/4, NTFS, etc)
Notifies NameNode of what blocks it has
NameNode replicates blocks 2x in local rack, 1x elsewhere
Architecture
Some examples
•Yahoo! : More than 100,000 CPUs in ~20,000 computers running Hadoop;
biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support
research for Ad Systems and Web Search
•AOL : Used for a variety of things ranging from statistics generation to running
advanced algorithms for doing behavioral analysis and targeting; cluster size is
50 machines, Intel Xeon, dual processors, dual core, each with 16GB Ram and
800 GB hard-disk giving us a total of 37 TB HDFS capacity.
•Facebook: To store copies of internal log and dimension data sources and use
it as a source for reporting/analytics and machine learning; 320 machine cluster
with 2,560 cores and about 1.3 PB raw storage;
Join Our Hadoop Online Training
http://www.itjobzone.biz/Big-Data-Hadoop-training.html
Get More Hadoop Tutorials on YouTube
Big Data and Hadoop Introduction Training Demo
Click - https://youtu.be/SHxf-v8ePk4
Apache Hadoop Standalone Installation Using Virtual box Training Demo
Click - https://youtu.be/u-YhaIkqubU

More Related Content

What's hot

Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellKhalid Imran
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...Rahul K Chauhan
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake ArchitectureDATAVERSITY
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introductionAlexey Grigorev
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 

What's hot (20)

Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Big data
Big dataBig data
Big data
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Big Data and Analytics on AWS
Big Data and Analytics on AWS Big Data and Analytics on AWS
Big Data and Analytics on AWS
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 

Viewers also liked

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...Edureka!
 
Hadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inHadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inTIB Academy
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaEdureka!
 
java tutorial for beginner - Free Download
java tutorial for beginner - Free Downloadjava tutorial for beginner - Free Download
java tutorial for beginner - Free DownloadTIB Academy
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Как устроен Blockchain. Лекция 4
Как устроен Blockchain. Лекция 4Как устроен Blockchain. Лекция 4
Как устроен Blockchain. Лекция 4Pavel Kravchenko, PhD
 
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...Edureka!
 
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014Ahmed Mahmoud
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

Viewers also liked (14)

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
Hadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inHadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.in
 
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | EdurekaHadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
Hadoop Tutorial | What is Hadoop | Hadoop Project on Reddit | Edureka
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
java tutorial for beginner - Free Download
java tutorial for beginner - Free Downloadjava tutorial for beginner - Free Download
java tutorial for beginner - Free Download
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Smart devices
Smart devicesSmart devices
Smart devices
 
Smart devices
Smart devicesSmart devices
Smart devices
 
Как устроен Blockchain. Лекция 4
Как устроен Blockchain. Лекция 4Как устроен Blockchain. Лекция 4
Как устроен Blockchain. Лекция 4
 
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
React Components Lifecycle | React Tutorial for Beginners | ReactJS Training ...
 
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Introduction to Big Data Hadoop Training Online by www.itjobzone.biz

Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.MaharajothiP
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xNPN Training
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenmaharajothip1
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 

Similar to Introduction to Big Data Hadoop Training Online by www.itjobzone.biz (20)

Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Hadoop
HadoopHadoop
Hadoop
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
paper
paperpaper
paper
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 

More from ITJobZone.biz

Sailpoint Online Training on IAM overview
Sailpoint Online Training on IAM overviewSailpoint Online Training on IAM overview
Sailpoint Online Training on IAM overviewITJobZone.biz
 
CyberArk Online Training By Expert Trainer - itjobzone
CyberArk Online Training By Expert Trainer - itjobzoneCyberArk Online Training By Expert Trainer - itjobzone
CyberArk Online Training By Expert Trainer - itjobzoneITJobZone.biz
 
Worksoft certify online training basic demo tutorial
Worksoft certify online training basic demo tutorialWorksoft certify online training basic demo tutorial
Worksoft certify online training basic demo tutorialITJobZone.biz
 
SharePoint Online Training for Admin or Super User or End User
SharePoint Online Training for Admin or Super User or End User SharePoint Online Training for Admin or Super User or End User
SharePoint Online Training for Admin or Super User or End User ITJobZone.biz
 
Introduction to RPA Blue prism Online Training
Introduction to RPA Blue prism Online TrainingIntroduction to RPA Blue prism Online Training
Introduction to RPA Blue prism Online TrainingITJobZone.biz
 
90/10-principle-how-you-react?
90/10-principle-how-you-react?90/10-principle-how-you-react?
90/10-principle-how-you-react?ITJobZone.biz
 
Team spirit and attitude
Team spirit and attitudeTeam spirit and attitude
Team spirit and attitudeITJobZone.biz
 

More from ITJobZone.biz (7)

Sailpoint Online Training on IAM overview
Sailpoint Online Training on IAM overviewSailpoint Online Training on IAM overview
Sailpoint Online Training on IAM overview
 
CyberArk Online Training By Expert Trainer - itjobzone
CyberArk Online Training By Expert Trainer - itjobzoneCyberArk Online Training By Expert Trainer - itjobzone
CyberArk Online Training By Expert Trainer - itjobzone
 
Worksoft certify online training basic demo tutorial
Worksoft certify online training basic demo tutorialWorksoft certify online training basic demo tutorial
Worksoft certify online training basic demo tutorial
 
SharePoint Online Training for Admin or Super User or End User
SharePoint Online Training for Admin or Super User or End User SharePoint Online Training for Admin or Super User or End User
SharePoint Online Training for Admin or Super User or End User
 
Introduction to RPA Blue prism Online Training
Introduction to RPA Blue prism Online TrainingIntroduction to RPA Blue prism Online Training
Introduction to RPA Blue prism Online Training
 
90/10-principle-how-you-react?
90/10-principle-how-you-react?90/10-principle-how-you-react?
90/10-principle-how-you-react?
 
Team spirit and attitude
Team spirit and attitudeTeam spirit and attitude
Team spirit and attitude
 

Recently uploaded

social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 

Recently uploaded (20)

social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

Introduction to Big Data Hadoop Training Online by www.itjobzone.biz

  • 1. Bigdata and Hadoop Tutorial By www.ITJobZone.biz
  • 3. Topics Covered What is Bigdata? Characteristics of Bigdata. Use Cases and Need of Hadoop. Apache Hadoop Ecosystem and its components Brief History of Hadoop. Hadoop Architecture. Who Uses Hadoop
  • 4. What is Bigdata? Lots of Data (Terabytes or PetaBytes) Bigdata is term for collection of datasets so large and complex that it becomes difficult to process using on hand database tools or Traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
  • 5. What is Bigdata? System /Enterprises generates huge amount of data from Terabytes to Petabytes of information Stock Market generates around one terabyte of new data per day to perform stock trading analysis and determine trends for optimal trades.
  • 7. Bigdata - Types of Data ● Structured Data - Data from enterprise (ERP, CRM) ● Semi structured Data - xml, json, csv, log files ● Unstructured Data - audio, video, image, archive documents
  • 9. Bigdata Scenarios Web and e-tailing Recommendation Engines Ad Targeting Search Quality Abuse and Click Fraud Detection Telecommunication Customer Churn Analysis and Prevention Network Performance Optimization Call Data Record (CDR) Analysis Analysing Network to Predict Failure
  • 10. Bigdata Scenarios (Cont…) Government Fraud Detection and Cyber Security Welfare Schemes Justice Telecommunication Health Information Exchange Gene Sequencing Serialization Healthcare service Quality Improvements Drug Safety
  • 11. Brief history of Hadoop Designed to answer the question “ How to process big data with reasonable cost and time”? Google ( published white papers) GFS 2003 Mapreduce 2005 Yahoo( implementation by Doug Cutting) HDFS 2006-07 Mapreduce 2007-08
  • 12. What is Hadoop Apache top level project, Opensource implementation of frameworks for reliable, scalable, distributed computing and storage. It is a flexible and highly-available architecture for large scale computation and data processing on a network of commodity hardware.
  • 13. Goals / Requirements Abstract & facilitate the storage and processing of large and rapidly growing data. • Structured and unstructured datasets • Simple programming model High scalability and availability Use commodity hardware with little redundancy Fault tolerance Move computation rather data
  • 14. Goals / Requirements Hardware will fail. Processing will be run in batches. Thus there is an emphasis on high throughput as opposed to low latency. Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance. Applications need a write-once-read-many access model.
  • 15. Architecture Distributed, with some centralization Main nodes of cluster are where most of the computational power and storage of the system lies Main nodes run TaskTracker to accept and reply to MapReduce tasks, and also DataNode to store needed blocks closely as possible Central control node runs NameNode to keep track of HDFS directories & files, and JobTracker to dispatch compute tasks to TaskTracker Written in Java, also supports Python and Ruby
  • 18. Architecture Hadoop Distributed Filesystem Tailored to needs of MapReduce Targeted towards many reads of filestreams Writes are more costly High degree of data replication (3x by default) No need for RAID on normal nodes Large blocksize (64MB) Location awareness of DataNodes in network
  • 19. Architecture - NameNode: Stores metadata for the files, like the directory structure of a typical FS. The server holding the NameNode instance is quite crucial, as there is only one. Transaction log for file deletes/adds, etc. Does not use transactions for whole blocks or file-streams, only metadata. Handles creation of more replica blocks when necessary after a DataNode failure
  • 20. Architecture - DataNode: Stores the actual data in HDFS Can run on any underlying filesystem (ext3/4, NTFS, etc) Notifies NameNode of what blocks it has NameNode replicates blocks 2x in local rack, 1x elsewhere
  • 22.
  • 23.
  • 24. Some examples •Yahoo! : More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support research for Ad Systems and Web Search •AOL : Used for a variety of things ranging from statistics generation to running advanced algorithms for doing behavioral analysis and targeting; cluster size is 50 machines, Intel Xeon, dual processors, dual core, each with 16GB Ram and 800 GB hard-disk giving us a total of 37 TB HDFS capacity. •Facebook: To store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning; 320 machine cluster with 2,560 cores and about 1.3 PB raw storage;
  • 25. Join Our Hadoop Online Training http://www.itjobzone.biz/Big-Data-Hadoop-training.html Get More Hadoop Tutorials on YouTube Big Data and Hadoop Introduction Training Demo Click - https://youtu.be/SHxf-v8ePk4 Apache Hadoop Standalone Installation Using Virtual box Training Demo Click - https://youtu.be/u-YhaIkqubU