THE SOLUTION FOR BIG DATA
NAME: SIVAKOTI TARAKA SATYA PHANINDRA
ROLL NO: 15K81D5824
COURSE: CSE M.TECH/SEM-1
CONTENT:
Data – Trends in storing data
Big Data – Problems in the IT industry
Why Big Data?
Introduction to Hadoop
HDFS (Hadoop Distributed File System)
MapReduce
Prominent users of Hadoop
Conclusion
Data – Trends in storing data
What is data? Any real-world symbol (a character, a number, a special
character) or a group of such symbols is data. It may be visual, audio,
or textual: images, recordings, documents, and so on.
File system
Databases
Cloud (internet)
BIG DATA:
What is big data? In IT, it is a collection of data sets so
large and complex that it becomes difficult to process
using on-hand database management tools or traditional
data processing applications.
• As of 2016, the limit on the size of data sets that could
feasibly be processed in a reasonable time was on the order
of exabytes. (KB, MB, GB, TB, PB, EB, ZB)
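To make the unit ladder concrete, here is a minimal sketch that formats a byte count with these prefixes. It assumes decimal units (each step is a factor of 1000); the function name `humanize` is hypothetical and chosen only for this illustration.

```python
# Unit ladder from the slide, smallest to largest (decimal prefixes assumed).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(num_bytes):
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the last unit in the ladder.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    print(humanize(10**18))       # one exabyte
    print(humanize(8 * 10**14))   # 0.8 petabytes, expressed in terabytes
```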
BIG DATA and the problems with it
• About 0.8 petabytes of updates are made to Facebook every day,
including 50 million photos.
• Every day, YouTube receives enough new video to watch continuously
for one year.
• Large data sets impose limitations in many areas, including
meteorology, genomics, complex physics simulations, and biological and
environmental research.
• They also affect Internet search, finance, and business informatics.
• The challenges include capture, retrieval, storage, search, sharing, analysis,
and visualization.
Why BIG DATA?
Unstructured data growth!
THEN WHAT COULD BE THE SOLUTION FOR BIG DATA?
Hadoop's Developers:
• 2005: Doug Cutting and Michael J. Cafarella developed Hadoop to
support distribution for the Nutch search engine project.
• The project was funded by Yahoo.
• 2006: Yahoo gave the project to the Apache Software Foundation.
[Photo: Doug Cutting]
What is Hadoop?
• It is open-source software written in Java.
• The Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of
computers using simple programming models.
• It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
• An Apache top-level project: an open-source implementation of
frameworks for reliable, scalable, distributed computing and data storage.
• It is a flexible and highly available architecture for large-scale
computation and data processing on a network of commodity hardware.
The project includes these modules:
Hadoop Common
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
1. Hadoop Common
• It provides access to the filesystems supported by Hadoop.
• The Hadoop Common package contains the JAR files and scripts
needed to start Hadoop.
• The package also provides source code, documentation,
and a contribution section that includes projects from
the Hadoop community (Avro, Cassandra, Chukwa, HBase,
Hive, Mahout, Pig, ZooKeeper).
2. Hadoop Distributed File System (HDFS):
• Hadoop uses HDFS, a distributed file system based on GFS (Google
File System), as its shared filesystem.
• The HDFS architecture divides files into large chunks (about 64 MB;
this is configurable) that are distributed across data servers.
• It has a namenode and datanodes.
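The chunking described above can be sketched with a little arithmetic. This is a minimal illustration, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 64 MB block size is simply the figure quoted on the slide.

```python
# Block size quoted on the slide; in a real cluster this value is configurable.
BLOCK_SIZE = 64 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the (offset, length) of every block a file of file_size bytes occupies."""
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mb = 1024 * 1024
    for offset, length in split_into_blocks(200 * mb):
        print(f"block at {offset // mb} MB, length {length // mb} MB")
```

With these assumptions a 200 MB file occupies three full 64 MB blocks plus one 8 MB block, and each block could then be placed on a different datanode.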
What does HDFS contain?
• HDFS can run several namenodes, each managing its own namespace;
together they are federated.
• The datanodes are used as common storage for blocks by all the
namenodes.
• Each datanode registers with all the namenodes in the cluster.
• Datanodes send periodic heartbeats and block reports and handle
commands from the namenodes.
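The registration and heartbeat bookkeeping can be pictured with a toy model. This is only a sketch under stated assumptions: the class `NameNode`, the helper `register_everywhere`, and the 30-second timeout are all hypothetical and do not mirror the real Hadoop classes or defaults.

```python
class NameNode:
    """Toy namenode that tracks which datanodes are alive (illustrative only)."""

    def __init__(self, name, timeout=30.0):
        self.name = name
        self.timeout = timeout   # assumed: seconds of silence before a node counts as dead
        self.last_seen = {}      # datanode id -> time of its last heartbeat
        self.block_map = {}      # datanode id -> set of block ids it reported

    def register(self, datanode_id, now):
        self.last_seen[datanode_id] = now
        self.block_map[datanode_id] = set()

    def heartbeat(self, datanode_id, now):
        self.last_seen[datanode_id] = now

    def block_report(self, datanode_id, block_ids):
        self.block_map[datanode_id] = set(block_ids)

    def live_datanodes(self, now):
        """Datanodes whose last heartbeat is within the timeout window."""
        return sorted(d for d, t in self.last_seen.items() if now - t <= self.timeout)


def register_everywhere(datanode_id, namenodes, now):
    """In a federated setup each datanode registers with every namenode."""
    for namenode in namenodes:
        namenode.register(datanode_id, now)


if __name__ == "__main__":
    nn_a, nn_b = NameNode("nn-a"), NameNode("nn-b")
    for dn in ("dn1", "dn2"):
        register_everywhere(dn, [nn_a, nn_b], now=0.0)
    nn_a.heartbeat("dn1", now=25.0)
    nn_a.block_report("dn1", ["blk_1", "blk_2"])
    print(nn_a.live_datanodes(now=40.0))  # dn2 has been silent for longer than the timeout
```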
Structure of Hadoop system:
Master Node:
  Name Node
  Secondary Name Node
  Job Tracker
Slaves:
  Data Node
  Task Tracker
MASTER NODE:
• The master node keeps track of the namespace and metadata about items.
• It keeps track of MapReduce jobs in the system.
• Hadoop is currently configured with centurion064 as the master
node.
• Hadoop is installed locally on each system.
• The install location is /localtmp/hadoop/hadoop-0.15.3
SLAVE NODES:
• The slave nodes manage blocks of data sent from the master node.
• In GFS terms, these are the chunkservers.
• Currently centurion060 and centurion064 are the two slave nodes in
use.
• Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is
created automatically by the DFS).
• Once you use the DFS, relative paths are resolved from /usr/{your usr id}
Advantages and Limitations of HDFS:
• It reduces traffic during job scheduling.
• Files can be accessed through the native Java API or from a
language of the user's choice (C++, Java, Python, PHP, Ruby,
Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml).
• It cannot be directly mounted by an existing operating
system.
• It should be run on a UNIX or Linux system.
3. Hadoop MAPREDUCE SYSTEM:
• The Hadoop MapReduce framework harnesses a cluster of
machines and executes user-defined MapReduce jobs across
the nodes in the cluster.
• A MapReduce computation has two phases:
a map phase and a reduce phase.
MAP AND REDUCE METHODS USAGE…
[Figure labels: Map function; Reduce function; Run this program as a MapReduce job]
WORD COUNT OVER A GIVEN SET OF STRINGS
Map output:
We 1
love 1
India 1
We 1
Play 1
Tennis 1
Reduce output:
love 1
India 1
We 2
Tennis 1
Play 1
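The word count above can be reproduced with a small in-memory sketch of the map, shuffle, and reduce steps. It is plain Python rather than the Hadoop API, the function names are hypothetical, and two things are assumptions: the input strings "We love India" and "We play tennis" (reconstructed from the pairs listed on the slide) and the lower-casing of words.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word.lower(), 1)   # lower-casing is an assumption of this sketch


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return (word, sum(counts))


def word_count(lines):
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["We love India", "We play tennis"]))
```

In a real job the map and reduce functions run on different machines and the framework performs the grouping; here everything runs in one process purely to show the data flow.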
MAPREDUCE WITH NO REDUCE TASKS
MAPREDUCE WITH TWO REDUCE TASKS - AUTOMATIC PARALLEL EXECUTION IN MAPREDUCE
Shuffle and sort in MapReduce with multiple reduce tasks
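When there are several reduce tasks, each key has to be routed to exactly one of them, and each task receives its keys in sorted order. The sketch below illustrates that idea by hashing the key modulo the number of reduce tasks. The function names are hypothetical, and `zlib.crc32` is an assumption of this example, chosen only so the result is stable between runs; it is not the hash Hadoop itself uses.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick the reduce task for a key by hashing it modulo the task count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Route each (key, value) pair to one reduce task, then sort each task's keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reduce task sees its keys in sorted order with all values grouped.
    return [sorted(bucket.items()) for bucket in buckets]


if __name__ == "__main__":
    map_output = [("we", 1), ("love", 1), ("india", 1),
                  ("we", 1), ("play", 1), ("tennis", 1)]
    for task_id, task_input in enumerate(shuffle_and_sort(map_output, 2)):
        print(f"reduce task {task_id}: {task_input}")
```

Because every occurrence of a key lands in the same partition, the reduce tasks can run in parallel without exchanging data.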
Prominent users of HADOOP
• Amazon – 100 nodes
• Facebook – two clusters of 8000 and 3000 nodes
• Adobe – 80-node system
• eBay – 532-node cluster
• Yahoo – cluster of about 4500 nodes
• IIIT Hyderabad – 30-node cluster
Trending: Hadoop Jobs
Salary Trends in Hadoop:
Achievements:
• 2008 - Hadoop wins the Terabyte Sort Benchmark (sorted 1 terabyte of data in 209
seconds, compared to the previous record of 297 seconds)
• 2009 - Avro and Chukwa became new members of the Hadoop framework family
• 2010 - Hadoop's HBase, Hive and Pig subprojects completed, adding more
computational power to the Hadoop framework
• 2011 - ZooKeeper completed
• March 2011 - Apache Hadoop takes top prize at the Media Guardian Innovation
Awards
• 2013 - Hadoop 1.1.2 and Hadoop 2.0.3 alpha released;
Ambari, Cassandra, and Mahout added
Conclusion:
Hadoop reduces the burden of capture, storage, search, sharing, analysis, and
visualization.
A huge amount of data can be stored, and large computations
can be done, in a single system with full safety and security
at low cost.
Big data and big data solutions are among the burning issues in
the present IT industry, so working on them will surely make you
more useful to it.