BY
B.REVATHI REDDY
(19/11/2016)
BIG DATA
What is Big Data?
Big data refers to the huge amount of digital information collected from many different sources.
Big Data is transforming the way we do everyday things, each of which leaves a digital trace that can be
used and analyzed. Big Data refers to our ability to make use of ever-increasing
volumes of data, with the aim of solving new problems, or solving old problems
in a better way.
Data generated by us:
Mobile devices
Conversation data
Photo and video image data
Social network data
Satellites
Internet of Things data
Big Data is characterized by the 3 Vs:
Volume – Data Quantity
Velocity - Data Speed
Variety – Types of data
Storing Big Data
Analyzing data characteristics
Selecting data sources for analysis
Eliminating redundant data
Processing Big Data
Mapping data to the programming framework
Connecting and extracting data from storage
Transforming data for processing
Subdividing data for Hadoop MapReduce
Creating the components of Hadoop MapReduce jobs
Executing Hadoop MapReduce Jobs
Monitoring the progress of the job flows
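The "subdividing data" step can be illustrated with a minimal Python sketch. `make_splits` is a hypothetical helper that cuts a list of records into contiguous groups. In Hadoop each input split is handled by one map task, but real splits are byte ranges (usually aligned to HDFS blocks), not record counts, so treat this only as an illustration of the idea.

```python
from typing import List


def make_splits(records: List[str], records_per_split: int) -> List[List[str]]:
    """Subdivide input records into contiguous splits (hypothetical helper).

    Each split would be handed to one map task; real Hadoop splits are
    byte ranges of files, not fixed record counts as assumed here.
    """
    if records_per_split <= 0:
        raise ValueError("records_per_split must be positive")
    return [records[i:i + records_per_split]
            for i in range(0, len(records), records_per_split)]


records = ["r1", "r2", "r3", "r4", "r5"]
splits = make_splits(records, 2)
print(splits)  # [['r1', 'r2'], ['r3', 'r4'], ['r5']]
```

Because the splits are contiguous and non-overlapping, concatenating them gives back the original input, so no record is processed twice or skipped.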
The Structure of Big Data
Structured – traditional data sources: data stored in
fields in a database
Semi-structured – a form of structured data that does not
conform to the formal structure of the data models of relational
databases, but contains tags or other markers to separate
semantic elements within the data
Unstructured – video data, audio data: data that does not
reside in a traditional row-column database
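The three structures can be contrasted in a short Python sketch that uses only the standard library. The sample records are made up: CSV stands in for a relational table, JSON for tagged semi-structured data, and a raw byte string for unstructured media.

```python
import csv
import io
import json

# Structured: every record has the same fixed fields, like a table row.
table = io.StringIO("id,name,age\n1,Asha,34\n2,Ravi,29\n")
rows = list(csv.DictReader(table))

# Semi-structured: keys (tags) label each element, but records need not
# share one schema; the second record carries an extra "email" field.
docs = json.loads(
    '[{"id": 1, "name": "Asha"},'
    ' {"id": 2, "name": "Ravi", "email": "ravi@example.com"}]'
)

# Unstructured: raw bytes (e.g. audio or video) with no row-column layout;
# any meaning has to be extracted by further processing.
blob = bytes(range(16))  # stand-in for media content

for row in rows:
    print("structured:", row["name"], row["age"])
for doc in docs:
    print("semi-structured fields:", sorted(doc))
print("unstructured bytes:", len(blob))
```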
How is Big Data actually used?
Some examples…
Better understand and target customers
Understand and optimize business processes
Improving health
Improving security
Improving sports performance
Improving and optimizing cities and countries
There are endless applications of Big Data. Any business that
doesn't seriously consider the implications of big data
runs the risk of being left behind!
Infrastructure of Big Data
To handle the different dimensions of big data (volume, velocity
and variety), an effective and efficient design is needed to process
large amounts of data arriving at high speed from different sources.
Such a design has multiple phases:
Multi-source Big Data generation
Big Data storage
Big Data processing
Cloud Computing and Big Data
Big Data needs massive amounts of memory or storage space to
hold all the data. This is where cloud computing comes into the
picture: it is cost-saving and scalable, and it provides a variety of
services such as large processing power and high storage capacity.
Survey paper on Big Data (IEEE)
Ms. Vibhavari Chavan, Prof. Rajesh N. Phursule (IJCSIT paper)
Big Data usually includes data sets with sizes beyond the ability of
commonly used software tools to capture, manage and process the data
within a tolerable elapsed time.
• The size of big data is constantly a moving target.
• Big Data is a set of techniques and technologies that require new
forms of integration to uncover large hidden value from large data sets.
• A big data environment is used to organize and analyze various
types of data.
The MapReduce framework generates a lot of intermediate data.
Hadoop
Hadoop is an open-source framework
The Hadoop framework is written in Java
Response time varies depending on the complexity of the process
Massive scalability is the key advantage
Currently used for web search indexing, email spam detection,
prediction in financial services, etc.
For storing and processing data, Hadoop consists of two components:
HDFS and MapReduce
HDFS
HDFS is the file system component of the Hadoop framework, designed
and optimized to store large amounts of data on low-cost hardware.
The architecture of HDFS has:
NameNode – a master node that holds the metadata: the addresses of
all DataNodes, free space, which DataNodes are active or passive,
the stored data, and the job tracker.
DataNode – a slave node in Hadoop that stores the data. Each DataNode
also runs a TaskTracker, which tracks the jobs running on that DataNode
and the jobs assigned to it by the master node.
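The split between the NameNode (metadata only) and the DataNodes (actual blocks) can be sketched as a toy Python model. `ToyNameNode` is hypothetical; it assumes a fixed block size in arbitrary units and a simple round-robin replica placement. Real HDFS uses rack-aware placement, and its default replication factor is 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Block = Tuple[str, List[str]]  # (block id, DataNodes holding a replica)


@dataclass
class ToyNameNode:
    """Toy master node: stores only metadata about blocks, never file bytes."""
    datanodes: List[str]
    block_size: int               # arbitrary units, e.g. MB (assumption)
    replication: int = 3
    block_map: Dict[str, List[Block]] = field(default_factory=dict)
    _cursor: int = 0              # round-robin start position for placement

    def add_file(self, name: str, size: int) -> List[Block]:
        """Split a file of `size` units into blocks and pick replica holders."""
        n_blocks = -(-size // self.block_size)  # ceiling division
        copies = min(self.replication, len(self.datanodes))
        blocks: List[Block] = []
        for i in range(n_blocks):
            holders = [self.datanodes[(self._cursor + r) % len(self.datanodes)]
                       for r in range(copies)]
            self._cursor += 1
            blocks.append((f"{name}#blk{i}", holders))
        self.block_map[name] = blocks
        return blocks


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], block_size=128)
for block_id, holders in nn.add_file("logs.txt", 300):
    print(block_id, holders)
```

A 300-unit file with a 128-unit block size needs three blocks, and each block is placed on three distinct DataNodes, so losing any single DataNode leaves every block readable.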
MapReduce Framework
Two input files:
file1: “hello world hello moon”
file2: “goodbye world goodnight moon”
Three operations:
Map
Combine
Reduce
Map
First map (file1): < hello, 1 > < world, 1 > < hello, 1 > < moon, 1 >
Second map (file2): < goodbye, 1 > < world, 1 > < goodnight, 1 > < moon, 1 >
COMBINE
First map: < moon, 1 > < world, 1 > < hello, 2 >
Second map: < goodbye, 1 > < world, 1 > < goodnight, 1 > < moon, 1 >
REDUCE
< goodbye, 1 >
< goodnight, 1 >
< moon, 2 >
< world, 2 >
< hello, 2 >
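The Map, Combine and Reduce steps above can be reproduced with a small single-process Python sketch. The function names are hypothetical. In a real Hadoop job the framework also sorts and shuffles the pairs by key between the combine and reduce steps; this sketch folds that into `reduce_phase`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def map_phase(text: str) -> List[Tuple[str, int]]:
    """Map: emit <word, 1> for every word in one input file."""
    return [(word, 1) for word in text.split()]


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combine: sum counts locally for one map task before shuffling."""
    local: Counter = Counter()
    for word, count in pairs:
        local[word] += count
    return dict(local)


def reduce_phase(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Reduce: merge the combined outputs of all map tasks."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # adds counts key by key
    return dict(total)


files = ["hello world hello moon", "goodbye world goodnight moon"]
combined = [combine(map_phase(f)) for f in files]
result = reduce_phase(combined)
for word in sorted(result):
    print(f"< {word}, {result[word]} >")
```

Printed in sorted key order, the counts match the REDUCE list: goodbye and goodnight appear once, hello, moon and world twice. The combiner is what turns the two `< hello, 1 >` pairs from the first file into a single `< hello, 2 >`, which reduces the intermediate data sent to the reducers.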
PIG
Pig, initially developed at Yahoo!, is a platform used to handle any kind of data.
Pig has two components:
• the language itself, called "Pig Latin"
• the runtime environment in which Pig Latin programs are executed.
The language is easier to use than writing mapper and reducer programs by hand:
• The first step is to LOAD the data to be manipulated from HDFS.
• Then run the data through a set of TRANSFORMations (which are in turn
converted into mapper and reducer tasks).
• Finally, DUMP the data to the screen or STORE the results elsewhere.
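To show what the LOAD, TRANSFORM and DUMP flow does, here is a Python sketch of an equivalent pipeline. The comments give roughly corresponding Pig Latin statements. The file name `users.csv`, its fields and its values are hypothetical, and Pig itself would compile these steps into MapReduce tasks rather than run them in-process.

```python
import csv
import io

# LOAD: read comma-separated records; an in-memory string stands in for a
# file in HDFS.  Pig Latin (roughly):
#   users = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int);
raw = io.StringIO("alice,30\nbob,17\ncarol,45\ndave,16\n")
users = [(name, int(age)) for name, age in csv.reader(raw)]

# TRANSFORM: filter rows, then project one field.
#   adults = FILTER users BY age >= 18;
#   names  = FOREACH adults GENERATE name;
adults = [(name, age) for name, age in users if age >= 18]
names = [name for name, _age in adults]

# DUMP: print the result to the screen (STORE would write it out instead).
#   DUMP names;
for name in names:
    print(f"({name})")
```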
HIVE
Hive, initially developed by Facebook and now an Apache project, is a data warehouse
infrastructure built on top of Hadoop for querying, summarizing and analyzing data.
Supports analysis of datasets stored in Hadoop's HDFS and other compatible file systems
Different storage types: plain text, HBase and others
Metadata is stored in an RDBMS, which reduces the time needed for semantic checks
Operates on compressed data stored in Hadoop
Built-in user-defined functions (UDFs)
SQL-like queries ("HiveQL") that are implicitly converted into MapReduce jobs
Provides indexes, including bitmap indexes, to speed up queries
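To make "implicitly converted into MapReduce jobs" concrete, this Python sketch walks through how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` maps onto map, shuffle and reduce steps. It is a conceptual illustration only, not Hive's actual query planner. The `employees` table and its rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical table rows: (name, dept)
employees = [("asha", "eng"), ("ravi", "ops"), ("meera", "eng"), ("john", "eng")]


def map_rows(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    # Map: emit <dept, 1>, i.e. the GROUP BY key plus a count of one.
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    # Shuffle/sort: bring all values with the same key together.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_counts(grouped: Iterable[Tuple[str, List[int]]]) -> Dict[str, int]:
    # Reduce: COUNT(*) for each key.
    return {key: sum(values) for key, values in grouped}


counts = reduce_counts(shuffle(map_rows(employees)))
print(counts)  # {'eng': 3, 'ops': 1}
```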
HBase
HBase is a column-oriented database, whereas HDFS is a file system.
HBase stores data in tables with rows and columns, and each table should
have a primary key (the row key) that is used for all access to the table.
Many attributes can be grouped into column families.
The table schema must be predefined along with the column families, but
new columns can be added to a family at any time, which keeps the schema
flexible.
Just as HDFS has a NameNode and slave nodes, MapReduce has a JobTracker
and TaskTracker slave nodes.
Availability of the master node is a concern here too, just as with the
NameNode in HDFS, and the system is sensitive to loss of the master node's
information.
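The HBase data model (row key, predefined column families, freely added columns) can be sketched with nested dictionaries. `ToyHTable` is a hypothetical in-memory model of the layout, not the HBase client API. It uses HBase's `family:qualifier` column naming and omits the per-cell timestamps and versions that real HBase keeps.

```python
from typing import Dict, Iterable


class ToyHTable:
    """In-memory model of HBase's data layout; not the HBase client API."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are fixed when the table is created.
        self.families = frozenset(families)
        # row key -> column family -> column qualifier -> value
        self.rows: Dict[str, Dict[str, Dict[str, str]]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        """Write `value` to a "family:qualifier" column of one row."""
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # A new qualifier inside an existing family needs no schema change.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key: str) -> Dict[str, Dict[str, str]]:
        """All reads go through the row key."""
        return self.rows.get(row_key, {})


users = ToyHTable(["info", "stats"])
users.put("user1", "info:name", "Asha")
users.put("user1", "info:city", "Pune")   # new column in an existing family
users.put("user1", "stats:visits", "3")
print(users.get("user1"))
```

Writing to an undeclared family, for example `users.put("user1", "misc:x", "1")`, raises `KeyError`. This mirrors the rule that column families are predefined while individual columns are not.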
Conclusion
Hadoop MapReduce is an open-source framework for data-intensive processing
that is reliable, fault tolerant and scalable, offers many implementation
options, and allows algorithms to be rewritten in MapReduce form.
The framework breaks large data into smaller chunks and processes each chunk.
A data-aware cache framework can be designed and evaluated that requires
minimal change to the original MapReduce programming model while providing
incremental processing for Big Data applications that use the MapReduce model.
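The data-aware cache idea can be illustrated with a hypothetical sketch. It is not the design of the framework mentioned above: it simply keys each input chunk by a SHA-256 digest and reuses that chunk's map output when the same chunk appears again, so an incremental run only maps new or changed data. `CachedWordCount` and its methods are invented names.

```python
import hashlib
from collections import Counter
from typing import Dict, List


class CachedWordCount:
    """Hypothetical data-aware cache: reuses map results for unchanged chunks."""

    def __init__(self) -> None:
        self._cache: Dict[str, Counter] = {}  # chunk digest -> partial counts
        self.map_calls = 0                    # chunks actually (re)processed

    def _map_chunk(self, chunk: str) -> Counter:
        self.map_calls += 1
        return Counter(chunk.split())

    def run(self, chunks: List[str]) -> Counter:
        total: Counter = Counter()
        for chunk in chunks:
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if digest not in self._cache:      # only new data is mapped
                self._cache[digest] = self._map_chunk(chunk)
            total.update(self._cache[digest])  # reduce: merge partial counts
        return total


job = CachedWordCount()
job.run(["hello world hello moon", "goodbye world goodnight moon"])
result = job.run(["hello world hello moon", "goodbye world goodnight moon", "hello again"])
print(job.map_calls)  # 3: the first two chunks were served from the cache
print(dict(result))
```

The reduce step still runs over every chunk; only the map work is saved. That is the trade-off this kind of cache makes against the unchanged MapReduce programming model.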
References
www.quoble.com
www.insidebigdata.com
www.ibmbigdatahub.com
www.data-magnum.com
Varsha B. Bobade, "Survey paper on Big Data and Hadoop", IRJET, vol. 3
Ms. Vibhavari Chavan, Prof. Rajesh N. Phursule, "Survey paper on Big Data", IJCSIT, vol. 5
