Big Data Analytics - Hadoop
Vishwajeet Jadeja
MSc Statistics
Department of Statistics
The Maharaja Sayajirao University of Baroda
Introduction
• Big data burst upon the scene in the first decade of the 21st century.
• Big data analytics is the process of examining large data sets
containing a variety of data types -- i.e., “big data” -- to uncover
hidden patterns, unknown correlations, market trends, customer
preferences and other useful business information.
• The analytical findings can lead to more effective marketing, new
revenue opportunities, better customer service, improved operational
efficiency, competitive advantages over competing organizations and
other business benefits.
• “Data analytics” is used to describe statistical and mathematical data
analysis that clusters, segments, scores and predicts what scenarios are
most likely to happen.
Introduction (contd.)
• Arguably, firms like Google, eBay, LinkedIn, and Facebook were built
around big data from the beginning.
• Analytics on big data have to coexist with analytics on other types of data.
• Hadoop clusters have to do their work alongside IBM mainframes.
• Data scientists must somehow get along and work jointly with mere
quantitative analysts.
• Firms that have long handled massive volumes of data are beginning to
enthuse about the ability to handle a new type of data—voice or text or log
files or images or video.
Three V’s
Examples
• A retail bank is getting a handle on its multi-channel customer interactions
for the first time by analyzing log files.
• A hotel firm is analyzing customer lines with video analytics.
• A health insurer is able to better predict customer dissatisfaction by
analyzing speech-to-text data from call center recordings.
In short, these companies can have a much more complete picture of their
customers and operations by combining unstructured and structured data.
Objectives for Big Data
• Cost Reduction from Big Data Technologies
• Time Reduction from Big Data
• Developing New Big Data-Based Offerings
• Supporting Internal Business Decisions
Cost Reduction from Big Data Technologies
• Some organizations pursuing big data believe strongly that MIPS and
terabyte storage for structured data are now most cheaply delivered through
big data technologies like Hadoop clusters.
• Organizations that were focused on cost reduction made the decision to
adopt big data tools primarily within the IT organization on largely
technical and economic criteria.
Time Reduction from Big Data
• The second common objective of big data technologies and solutions is time
reduction.
• A key objective of time reduction is to interact with the customer in
real time, using analytics and data derived from the customer
experience.
• If the customer has “left the building,” targeted offers and services are likely
to be much less effective.
• This means rapid data capture, aggregation, processing, and analytics.
Developing New Big Data-Based Offerings
• One of the most ambitious things an organization can do with big data is to
employ it in developing new product and service offerings based on data.
• Many of the companies that employ this approach are online firms, which
have an obvious need to employ data-based products and services.
• Examples: LinkedIn, Google.
Supporting Internal Business Decisions
• The primary purpose behind traditional, “small data” analytics was to
support internal business decisions. What offers should be presented to a
customer? Which customers are most likely to stop being customers soon?
How much inventory should be held in the warehouse? How should we
price our products?
• These types of decisions employ big data when there are new, less
structured data sources that can be applied to the decision.
• Business decisions with big data can also involve other traditional areas for
analytics such as supply chains, risk management, or pricing. The factor that
makes these big data problems, rather than small, is the use of external data
to improve the analysis.
Hadoop: Introduction
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models.
• It is an open-source project of the Apache Software Foundation; version 1.0 was released in 2011.
• Written in Java.
History
• Inventor: Doug Cutting, creator of Apache Lucene.
• The origin of the name “Hadoop”:
“The name my kid gave a stuffed yellow elephant. Short, relatively easy to
spell and pronounce, meaningless, and not used elsewhere: those are my
naming criteria.” (Doug Cutting)
• Google’s GFS paper, published in 2003, described a solution to the storage problem.
• In February 2006, the distributed filesystem and MapReduce components moved
out of Nutch to form an independent subproject of Lucene called Hadoop.
History (contd.)
• At around the same time, Doug Cutting joined Yahoo.
• In January 2008, Hadoop was made its own top-level project at Apache,
confirming its success and its diverse, active community.
• In February 2008, Yahoo! announced that its production search index was
being generated by a 10,000-core Hadoop cluster.
What we’ve got: Hadoop!
• Fault-tolerant file system.
• Hadoop Distributed File System (HDFS)
• Modeled on Google File system
• Takes computation to data
• Data Locality
• Scalability:
• Program remains same for 10, 100, 1000,… nodes
• Corresponding performance improvement
• Parallel computation using MapReduce
• Other components: Pig, HBase, Hive, ZooKeeper
HDFS: Hadoop Distributed File System
• Filesystems that manage the storage across a network of machines are
called distributed filesystems.
• Hadoop comes with a distributed filesystem called HDFS, which stands for
Hadoop Distributed Filesystem.
• HDFS is designed to hold very large amounts of data (terabytes or even
petabytes) and to provide high-throughput access to this information.
• It ties many small, reasonably priced machines together into a single
cost-effective computer cluster.
HDFS: Hadoop Distributed File System (contd.)
• Data and application processing are protected against hardware failure.
• If a node goes down, jobs are automatically redirected to other nodes to
make sure the distributed computing does not fail.
• It automatically stores multiple copies of all data.
• It provides a simplified programming model that allows users to quickly
read from and write to the distributed system.
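The idea of storing multiple copies of each block can be sketched as follows. This is a minimal illustration, not HDFS's actual placement policy: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for the example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_copies(placement: dict[int, list[str]], failed_node: str) -> dict[int, list[str]]:
    """Return the replica locations that remain after one node fails."""
    return {b: [n for n in ns if n != failed_node] for b, ns in placement.items()}


# Illustrative values only: a 10-byte "file", 4-byte blocks, four made-up nodes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"], replication=3)
after_failure = surviving_copies(placement, "node-b")
```

With three replicas per block, losing any single node still leaves at least two copies of every block, which is why a node failure does not make data unavailable.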
The master node: NameNode
Functions:
• Manages the file system: maps files to blocks and blocks to DataNodes
• Maintains the status of DataNodes
• Heartbeat
• Each DataNode sends a heartbeat at regular intervals
• If no heartbeat is received, the DataNode is declared dead
• Blockreport
• Each DataNode sends a list of the blocks it holds
• Used to check the health of HDFS
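The heartbeat mechanism above can be sketched in a few lines. This is a simplified model, not NameNode code: the class name `HeartbeatMonitor`, the 30-second timeout, and the node identifiers are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per node and reports nodes that went silent."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[str, float] = {}

    def record_heartbeat(self, node_id: str, now: float) -> None:
        """Remember when this node last reported in."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now: float) -> list[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Illustrative timeline: datanode-2 stops sending heartbeats after t=0.
monitor = HeartbeatMonitor(timeout_seconds=30.0)
monitor.record_heartbeat("datanode-1", now=0.0)
monitor.record_heartbeat("datanode-2", now=0.0)
monitor.record_heartbeat("datanode-1", now=25.0)
dead = monitor.dead_nodes(now=40.0)  # only datanode-2 has been silent for more than 30 s
```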
NameNode Functions
• Replication
• On Datanode failure
• On Disk failure
• On Block corruption
• Data integrity
• Checksum for each block
• Stored in a hidden file
• Rebalancing (balancer tool)
• Addition of new nodes
• Decommissioning
• Deletion of some files
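The per-block checksum idea can be illustrated with CRC32 from Python's standard `zlib` module. This is only a sketch under assumptions: the function names and sample data are made up, and the checksum algorithm and granularity used by a real HDFS deployment may differ.

```python
import zlib


def block_checksum(block: bytes) -> int:
    """Compute a CRC32 checksum to store alongside the block."""
    return zlib.crc32(block)


def verify_block(block: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return zlib.crc32(block) == stored_checksum


original = b"hadoop block contents"
stored = block_checksum(original)

# Simulate corruption on disk: one byte differs from the original.
corrupted = b"hadoop block c0ntents"

ok = verify_block(original, stored)
bad = verify_block(corrupted, stored)
```

A block that fails verification would be treated as corrupt, which is one of the triggers for re-replication listed above.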
HDFS Robustness
• Safemode
• At startup: no replication possible
• Receives Heartbeats and Blockreports from DataNodes
• Only a percentage of blocks are checked for the defined replication factor
• All is well => exit Safemode
• Replicate blocks wherever necessary
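The Safemode decision can be sketched as a threshold check over the replica counts gathered from block reports. The function names, block identifiers, and threshold values below are hypothetical and chosen only for illustration; they are not HDFS defaults.

```python
def can_exit_safemode(reported_replicas: dict[str, int], min_replicas: int, threshold: float) -> bool:
    """True when a large enough fraction of blocks has at least `min_replicas` copies."""
    if not reported_replicas:
        return True  # nothing to wait for on an empty file system
    safe = sum(1 for count in reported_replicas.values() if count >= min_replicas)
    return safe / len(reported_replicas) >= threshold


def under_replicated(reported_replicas: dict[str, int], target_replication: int) -> list[str]:
    """Blocks that need extra copies once Safemode has been left."""
    return sorted(b for b, count in reported_replicas.items() if count < target_replication)


# Replica counts per block, as they might be assembled from block reports.
reports = {"blk_1": 3, "blk_2": 2, "blk_3": 1, "blk_4": 0}

exit_ok = can_exit_safemode(reports, min_replicas=1, threshold=0.75)
to_replicate = under_replicated(reports, target_replication=3)
```

Here three of four blocks have at least one reported copy, so the sketch leaves Safemode and then lists the blocks that still fall short of the target replication factor.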
MapReduce
• It is a powerful paradigm for parallel computation.
• Hadoop uses MapReduce to execute jobs on files in HDFS.
• Hadoop will intelligently distribute computation over the cluster.
• It is a programming model with an associated implementation for processing
and generating large data sets.
• A MAP function processes a key/value pair to generate a set of intermediate
key/value pairs.
• A REDUCE function merges all intermediate values associated with the
same intermediate key.
NameNode Overview
Programming model
• Format of input and output: (key, value)
• Map: (k1, v1) → list(k2, v2)
• Reduce: (k2, list(v2)) → list(k3, v3)
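The signatures above can be made concrete with a small in-memory word count. This is a single-process sketch of the programming model, not Hadoop's API: `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample records are hypothetical names and data, and the grouping step stands in for the shuffle that a real cluster performs between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: (k1, v1) -> list(k2, v2). Emits (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: (k2, list(v2)) -> list(k3, v3). Sums the counts for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, group intermediate values by key, then reduce each group."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


# Input records as (line number, line text) pairs.
records = [(0, "big data big clusters"), (1, "Big data")]
result = run_mapreduce(records, map_fn, reduce_fn)
# result == [("big", 3), ("clusters", 1), ("data", 2)]
```

Because each call to `map_fn` depends only on its own record and each call to `reduce_fn` only on one key's values, both phases can be spread across many machines without changing the program.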
Example 1
Example 2
Hadoop example: word count of 5000 random letters
Hadoop Distributions
Who is using Hadoop
Thank You
