BigData Hadoop
Second Floor and Third Floor,
5/3 BEML Layout,
Varathur Road, Thubarahalli,
Kundalahalli Gate, Bangalore 66
Landmark – Behind Kundalahalli Gate bus stop,
Opposite to SKR Convention Mall,
Next to AXIS Bank.
WHAT I LEARNED ?
1. Course : BigData Hadoop
2. Technologies Learned :
● Hadoop
● MapReduce
● Single node & Multi node Cluster
● Docker
● Ansible
● Python
What Is Big Data ?
● Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to deal
with them.
● Generally speaking, big data is:
● Large datasets
● The category of computing strategies and technologies that are used
to handle large datasets.
● A "large dataset" means a dataset too large to reasonably process or
store with traditional tooling or on a single computer.
Categories Of BigData
● Social Media Data:
Social networking sites such as Facebook and Twitter contain the information and the
views posted by millions of people across the globe.
● Black Box Data:
Black boxes are carried by aircraft and store a large amount of information,
including the conversations between crew members and any other communications (alert
messages or orders passed) with the ground duty staff.
● Search Engine Data:
Search engines retrieve large amounts of data from many different databases.
● Stock Exchange Data:
It holds information (complete details of business transactions) about the
'buy' and 'sell' decisions on the shares of different companies made by
customers.
● Power Grid Data:
Power grid data mainly holds information about the power consumed by a particular node,
such as a base station.
● Transport Data:
It includes data from various transport sectors, such as the model, capacity, distance
and availability of a vehicle.
BigData Challenges &
Issues
4 V’s of BigData :
● Volume
● Variety
● Velocity
● Veracity
VOLUME
● The main characteristic that makes data “big” is the
sheer volume.
● Volume refers to the huge amount of data that is
produced each day by companies.
● The generation of data is so large and complex that
it can no longer be stored or analyzed using
conventional data processing methods.
VARIETY
● Variety refers to the diversity of data types and data
sources.
● Types of data :
Structured
Semi-structured
Unstructured
VARIETY Continued..
Structured Data :
● Structured data is the most familiar kind.
● Structured data refers to any data that resides in a fixed
field within a record or file.
● It covers all data that can be stored in an SQL database in
tables with rows and columns, and in spreadsheets.
VARIETY Continued..
Unstructured Data :
● Unstructured data represents around 80% of all data.
● It is everything that can't be so readily classified and fit
into a neat box.
● It often includes text and multimedia content.
● Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations,
webpages and many other kinds of business documents.
VARIETY Continued..
Semi-structured Data :
● Semi-structured data is information that doesn’t reside in a
relational database but that does have some organizational
properties that make it easier to analyze.
● Examples of semi-structured data :
CSV, XML and JSON documents are semi-structured, and
NoSQL databases are also considered semi-structured.
● Note : Structured and semi-structured data together represent only a small
part of all data (5 to 10%), so the dominant type is
unstructured data.
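As an illustrative sketch (not part of the original slides; the records are made up for the example), the snippet below shows why JSON counts as semi-structured: records share organizational properties (named keys) but need not follow a fixed schema:

```python
import json

# Two JSON records: each is organized by keys, but there is no fixed
# schema -- the second record has extra fields and omits "age".
records = [
    '{"name": "Alice", "age": 30}',
    '{"name": "Bob", "city": "Bangalore", "tags": ["hadoop", "python"]}',
]

for raw in records:
    doc = json.loads(raw)        # parse the semi-structured record
    print(sorted(doc.keys()))    # each record exposes its own set of fields
```

A relational table would force both records into the same columns; here each document simply carries whatever fields it has.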
VELOCITY
● Velocity is the frequency at which incoming data needs to be
generated, analyzed and processed.
● Today this is often possible within a fraction of a second, known as
real time.
● Think about how many SMS messages, Facebook status updates, or
credit card swipes are sent on a particular telecom carrier every
minute of every day, and you’ll have a good appreciation of velocity.
● A streaming platform like Amazon Web Services is an example of
a system that handles the velocity of data.
VERACITY
● Veracity == Quality
● A lot of data and a big variety of data with fast access are
not enough. The data must have quality and produce
credible results that enable the right action when it comes
down to decision making.
● Veracity refers to the biases, noise and abnormality in data,
and also to the trustworthiness of the data.
BIGDATA SOLUTIONS
Traditional Enterprise Approach
● In this approach, an enterprise uses a single computer to store and process big data.
● For storage, it relies on the database vendor of its choice, such as
Oracle, IBM, etc.
● The user interacts with the application, which handles data storage and
analysis.
LIMITATION
● This approach works well for applications that require
modest storage, processing and database capabilities,
but when it comes to dealing with large amounts of
scalable data, it becomes a bottleneck.
SOLUTION
● Google solved this problem
using an algorithm called
MapReduce.
● This algorithm divides the task
into small parts or units and
assigns them to multiple
computers, then collects and
integrates the intermediate
results to produce the desired
result.
Hadoop To The Rescue
HADOOP
● Apache Hadoop is the most important framework for working
with Big Data.
● Hadoop is an open source framework written in Java.
● It efficiently processes large volumes of data on a cluster of
commodity hardware.
● Hadoop can be set up on a single machine, but the real power of
Hadoop comes with a cluster of machines.
● It can be scaled from a single machine to thousands of nodes.
HADOOP Continued...
● Hadoop’s biggest strength is scalability.
● It upgrades from working on a single node to thousands of
nodes in a seamless manner, without any issue.
● It is designed to scale from a single server to thousands
of machines, each offering local computation and storage.
● It supports large collections of data sets in a distributed
computing environment.
Hadoop Framework Architecture
Hadoop High-Level
Architecture
Hadoop’s architecture is based on two main
components, namely MapReduce and HDFS :
HDFS & MapReduce
HDFS (Hadoop Distributed File System)
● The Hadoop Distributed File System provides high-throughput access
to application data.
● A scalable, fault-tolerant, high-performance distributed file system.
● The NameNode holds the filesystem metadata.
● Files are broken up and spread over DataNodes.
● Data is divided into 64 MB (default) or 128 MB blocks, and each block is
replicated 3 times (default).
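As a rough illustration (not from the original slides; it uses the 128 MB block size and replication factor 3 quoted above), the sketch below shows how a file would be split into blocks and how much raw storage the replicas consume:

```python
# Sketch: how HDFS splits a file into fixed-size blocks.
# Assumes the defaults quoted above: 128 MB blocks, replication factor 3.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB in bytes
REPLICATION = 3

def split_into_blocks(file_size: int) -> list:
    """Return the size of each block a file of file_size bytes occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(BLOCK_SIZE, remaining))  # last block may be partial
        remaining -= BLOCK_SIZE
    return blocks

# A 300 MB file -> two full 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
print(len(sizes))                   # 3 blocks
print(sum(sizes) * REPLICATION)     # raw bytes stored across the cluster
```

Note that the last block only occupies as much space as it needs; HDFS does not pad partial blocks to the full block size.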
ARCHITECTURE OF HDFS
WORKING OF HDFS
MAPREDUCE
● MapReduce is a programming model for processing and generating big data sets with a
parallel, distributed algorithm on a cluster.
● “Map” Step : Each worker node applies the "map()" function to its local data and writes the
output to temporary storage. A master node ensures that only one copy of redundant input
data is processed.
● “Shuffle” Step : Worker nodes redistribute data based on the output keys (produced by the
"map()" function), such that all data belonging to one key is located on the same worker
node.
● “Reduce” Step : Worker nodes now process each group of output data, per key, in parallel.
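The three steps above can be sketched in plain Python with the classic word-count example. This is an illustrative single-process simulation, not Hadoop's actual Java API:

```python
from collections import defaultdict

# "Map" step: emit (word, 1) pairs from each line of input.
def map_fn(line):
    return [(word.lower(), 1) for word in line.split()]

# "Shuffle" step: group all pairs by key, as the framework does
# between the map and reduce phases.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# "Reduce" step: combine the list of values for each key.
def reduce_fn(key, values):
    return (key, sum(values))

lines = ["hadoop stores big data", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_fn(line)]
result = dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In a real cluster, map and reduce run on different worker nodes and the shuffle moves data across the network; the data flow, however, is exactly this.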
MAPREDUCE PROCESS
The world’s leading software
container platform
VM’s vs CONTAINERS
DOCKER
● Docker is the world’s leading software container platform.
● What is a container ?
Containers are a way to package software in a format that can run isolated on a
shared operating system. Unlike VMs, containers do not bundle a full operating
system - only the libraries and settings required to make the software work are
included. This makes for efficient, lightweight, self-contained systems and
guarantees that software will always run the same, regardless of where it’s
deployed.
WHY USE DOCKER ?
Docker automates the repetitive tasks of setting up and
configuring development environments so that developers
can focus on what matters: building great software.
ANY QUERIES ?