What is Hadoop
http://www.asterixsolution.com/big-data-hadoop-training-in-mumbai.html
 In the past, handling data was a fairly easy task because professionals
had only a limited amount of it to work with. Years ago, a single processor
and a single storage unit were enough to handle data.
 The data was structured and kept in a database that contained the
relevant records.
 SQL queries made it possible to search large tables with many rows and
columns.
 As the years went by and data generation increased, higher volumes and
more formats emerged. Hence, multiple processors were needed to
process data in order to save time.
 However, a single storage unit became the bottleneck due to the network
overhead that was generated. This led to using a distributed storage unit
for each processor, which made data access easier.
 This method is known as parallel processing with distributed storage -
several computers each run their processes against their own storage unit.
 Big Data and its Challenges
 Big data refers to massive amounts of data that cannot be stored,
processed, and analyzed using traditional methods.
 The main elements of big data are:
 Volume - There is a massive amount of data generated every second.
 Velocity - The speed at which data is generated, collected and analyzed
 Variety - The different types of data: structured, semi-structured,
unstructured
 Value - The ability to turn data into useful insights for your business
 Veracity - Trustworthiness in terms of quality and accuracy
 Hadoop and its Components
 Hadoop is a framework that uses distributed storage and parallel processing to
store and manage big data. It is one of the most widely used software
frameworks for handling big data. Two of its core components are:
 Hadoop HDFS - Hadoop Distributed File System (HDFS) is the storage unit of
Hadoop.
 Hadoop MapReduce - Hadoop MapReduce is the processing unit of Hadoop.
 Hadoop HDFS
 Data is stored in a distributed manner in HDFS. There are two components
of HDFS - name node and data node. While there is only one active name
node, there can be multiple data nodes.
 HDFS is specially designed for storing huge datasets on commodity
hardware. An enterprise-grade server costs roughly $10,000 per terabyte of
storage. If you need to buy 100 of these enterprise servers, the cost
goes up to about a million dollars (100 x $10,000).
 Hadoop enables you to use commodity machines as your data nodes.
 Features of HDFS
 Provides distributed storage
 Can be implemented on commodity hardware
 Provides data security
 Highly fault tolerant - Each block of data is replicated on several
machines, so if one machine goes down, the data is still available from
another machine
 Master and slave nodes
 Master and slave nodes form the HDFS cluster. The name node is called
the master and the data nodes are called the slaves.
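
The fault tolerance described above can be illustrated with a toy model. The sketch below is not the Hadoop API; the function names, block ids, and the round-robin placement rule are assumptions made for illustration (real HDFS placement is rack-aware). It only shows the idea that the name node keeps metadata about which data nodes hold each block, and that with a replication factor of 3 (the HDFS default) a block stays readable when one data node fails.

```python
# Toy model of HDFS-style block replication (illustrative only, not the Hadoop API).

REPLICATION = 3  # HDFS's default replication factor


def place_blocks(blocks, data_nodes, replication=REPLICATION):
    """Return name-node-style metadata: block id -> data nodes holding a replica."""
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        # Hypothetical round-robin placement; each replica lands on a different node.
        placement[block] = [
            data_nodes[(i + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live data node."""
    return {
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }


placement = place_blocks(
    ["blk_1", "blk_2", "blk_3", "blk_4"], ["dn1", "dn2", "dn3", "dn4"]
)
survivors = readable_blocks(placement, failed_nodes={"dn2"})
print(survivors == set(placement))  # every block survives the loss of dn2
```

In this model a block is lost only if every node holding one of its replicas fails at the same time.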
 Hadoop MapReduce
 Hadoop MapReduce is the processing unit of Hadoop. In the MapReduce
approach, the processing is done at the slave nodes and the final result is
sent to the master node.
 The code that processes the data is sent to the nodes where the data is
stored. This code is usually very small in comparison to the data itself.
 You only need to send a few kilobytes of code to perform heavy-duty
processing on the computers.
 The input dataset is first split into chunks of data. In this example, the input
has three lines of text: “bus car train”, “ship ship train”, and “bus ship
car”.
 The dataset is then split into three chunks, one per line, and the chunks are
processed in parallel.
 In the map phase, each word becomes a key with a value of 1. In this case,
there are four distinct keys: bus, car, ship, and train.
 These key-value pairs are then shuffled and sorted together based on their
keys. In the reduce phase, the values for each key are aggregated to obtain
the final output: bus 2, car 2, ship 3, train 2.
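
The word-count example above can be sketched in plain Python. This is a single-process illustration of the map, shuffle/sort, and reduce steps, not Hadoop's MapReduce API; the function names are hypothetical, and on a real cluster the map and reduce tasks would run in parallel on different nodes.

```python
from collections import defaultdict

# The three input lines from the example; each line is treated as one chunk.
lines = ["bus car train", "ship ship train", "bus ship car"]


def map_phase(line):
    """Emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group the values by key and order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Aggregate the list of 1s for each key into a count."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_and_sort(mapped))
print(counts)  # {'bus': 2, 'car': 2, 'ship': 3, 'train': 2}
```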
 Hadoop YARN
 Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource
management unit of Hadoop and is available as a component of Hadoop
version 2.
 Hadoop YARN acts like an OS for Hadoop. It is not a file system; it is a
resource management layer that sits on top of HDFS.
 It is responsible for managing cluster resources to make sure you don't
overload one machine.
 It performs job scheduling to make sure that the jobs are scheduled in the
right place.
 Suppose a client machine wants to run a query or some code for data
analysis. This job request goes to the resource manager (Hadoop YARN),
which is responsible for resource allocation and management.
 In the node section, each node has its own node manager. The node manager
manages the node and monitors its resource usage. A container is a collection
of physical resources, such as RAM, CPU, or disk. Whenever a job request comes
in, the app master requests containers from the resource manager, and the node
managers launch those containers on their nodes and report their status back
to the resource manager.
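
The allocation idea above can be sketched as a toy scheduler. This is not YARN's API or its actual scheduling policy; the `Node` class, the `allocate` function, the "most free memory first" rule, and the resource figures are all assumptions for illustration. It only shows a container as a bundle of memory and CPU that is placed on a node with enough free capacity, so that no single machine is overloaded.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node as tracked by a (hypothetical) resource manager."""
    name: str
    free_mem_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


def allocate(nodes, mem_mb, vcores):
    """Place one container on the fitting node with the most free memory."""
    candidates = [
        n for n in nodes if n.free_mem_mb >= mem_mb and n.free_vcores >= vcores
    ]
    if not candidates:
        return None  # no capacity: the request has to wait
    node = max(candidates, key=lambda n: n.free_mem_mb)
    node.free_mem_mb -= mem_mb
    node.free_vcores -= vcores
    node.containers.append((mem_mb, vcores))
    return node.name


nodes = [Node("node1", 4096, 4), Node("node2", 8192, 8)]
# Four requests for a 3 GB / 2 vcore container each.
placements = [allocate(nodes, 3072, 2) for _ in range(4)]
print(placements)  # ['node2', 'node2', 'node1', None]
```

The fourth request returns `None` because neither node has 3 GB free any longer, which is the point at which a real scheduler would queue the request.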
 A Use Case of Hadoop
 In this case study, we will discuss how Hadoop can combat fraudulent activities.
 Let us look at the case of Zions Bancorporation. Their main challenge was how
the Zions security team could combat the fraudulent activities that were
taking place. The problem was that they relied on an RDBMS, which was
unable to store and analyze huge amounts of data.
 In other words, they were only able to analyze small amounts of data. But with a
flood of customers coming in, there were so many things they couldn’t keep
track of, which left them vulnerable to fraudulent activities.
 They began to use parallel processing. However, much of the data was
unstructured, so analyzing it was not possible. They had both a huge amount of
data that could not fit into their databases and data that had no structure.
 Hadoop enabled the Zions team to pull all of that data together and store
it in one place. It also became possible to process and analyze the huge
amounts of unstructured data that they had. Analysis took less time, and
in-depth analysis of various data formats became easier. The Zions team
could now detect everything from malware and spear-phishing attempts to
account takeovers.
www.asterixsolution.com
www.plus.google.com/+Asterixsolutionlab
www.facebook.com/asterixsolutionlab
To know more, visit:
http://www.asterixsolution.com/big-data-hadoop-training-in-mumbai.html
