2. What is big data
Big data is a term used to describe collections of data that are huge in volume and still growing exponentially with time.
In short, such data is so large and complex that none of the traditional data management tools can process it.
Three diverse data sources: machine data, organizational data, and people-generated data.
Machine-generated data is the largest source of big data.
An example of machine data is weather station sensor output.
An example of organizational data is disease data from the Centers for Disease Control.
3. Advantages of big data
Real-time notification enables real-time actions
Design differently: culture shift to real-time operations
Increased use of scalable computing
Supervisory Control and Data Acquisition (SCADA)
4. Big data generated by people: unstructured data

Company    Data processed daily
eBay       100 petabytes (PB)
Google     100 PB
Facebook   30+ PB
Spotify    64 terabytes (TB)
Twitter    100 TB
7. Organization-generated data: benefits come from combining with other types
The UPS success:
16 million shipments per day
40 million tracking requests
UPS is estimated to have 16 PB of data around its operations
Large operational data + optimization algorithms → route optimization → $50 million in savings
13. The Basic Hadoop Components
• Hadoop Common – libraries and utilities
• Hadoop Distributed File System (HDFS) – a distributed file system
• Hadoop YARN – a resource-management and job-scheduling platform
• Hadoop MapReduce – a programming model for large-scale data processing
15. Original HDFS Design Goals
• Resilience to hardware failure
• Streaming data access
• Support for large datasets: scalability to hundreds or thousands of nodes with high aggregate bandwidth
• Application locality to data
• Portability across heterogeneous hardware and software platforms
16. Original HDFS Design
• Single NameNode – a master server that manages the file system namespace and regulates access to files by clients
• Multiple DataNodes – typically one per node in the cluster. Functions:
  • Manage storage
  • Serve read/write requests from clients
  • Block creation, deletion, and replication based on instructions from the NameNode
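The NameNode/DataNode split above can be pictured with a toy simulation. The class and method names below are hypothetical illustrations, not real HDFS APIs; it only models the idea that the NameNode tracks block locations and orders re-replication when a DataNode fails (3 is HDFS's default replication factor).

```python
import random

class NameNode:
    """Toy model (not real HDFS code): tracks which DataNodes hold each block."""
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes      # names of the DataNodes in the cluster
        self.replication = replication  # HDFS defaults to 3 replicas per block
        self.block_map = {}             # block id -> list of DataNodes with a replica

    def create_block(self, block_id):
        # Place `replication` replicas on distinct DataNodes.
        targets = random.sample(self.datanodes, self.replication)
        self.block_map[block_id] = targets
        return targets

    def datanode_failed(self, node):
        # Re-replicate every block that dropped below the target replica count.
        for block_id, holders in self.block_map.items():
            if node in holders:
                holders.remove(node)
                spares = [d for d in self.datanodes
                          if d != node and d not in holders]
                holders.append(random.choice(spares))

nn = NameNode(datanodes=["dn1", "dn2", "dn3", "dn4", "dn5"])
nn.create_block("blk_0001")
nn.datanode_failed("dn3")   # replica count is restored to 3 on surviving nodes
```

In real HDFS the DataNodes report via heartbeats and the NameNode issues replication instructions to them; here that exchange is collapsed into one method call.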
18. MapReduce Framework
• Software framework for writing parallel data-processing applications
• A MapReduce job splits the input data into chunks
• Map tasks process the data chunks
• The framework sorts the map output
• Reduce tasks use the sorted map output as input
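The four steps above (split, map, framework sort, reduce) can be sketched in-process with the classic word-count example. This is a single-machine illustration of the programming model, not Hadoop code; real MapReduce runs the same phases distributed across a cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_task(chunk):
    # Map phase: emit a (word, 1) pair for every word in the chunk.
    for word in chunk.split():
        yield (word.lower(), 1)

def reduce_task(word, counts):
    # Reduce phase: sum all counts emitted for one key.
    return (word, sum(counts))

# The job splits the input into chunks; each map task gets one chunk.
chunks = ["Big data is big", "data tools for big data"]
mapped = [pair for chunk in chunks for pair in map_task(chunk)]

# Shuffle/sort phase: the framework sorts map output by key,
# so all pairs for one word become adjacent.
mapped.sort(key=itemgetter(0))

# One reduce call per distinct key.
counts = dict(
    reduce_task(word, (c for _, c in group))
    for word, group in groupby(mapped, key=itemgetter(0))
)
print(counts)  # {'big': 3, 'data': 3, 'for': 1, 'is': 1, 'tools': 1}
```

Note that map tasks never see each other's chunks and reduce tasks never see unsorted data, which is what lets Hadoop parallelize both phases freely.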
20. YARN: NextGen MapReduce
• Main idea – separate resource management from job scheduling/monitoring
• Global ResourceManager (RM)
• NodeManager on each node
• ApplicationMaster – one for each application
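The separation of concerns above can be illustrated with a toy model. The classes below are hypothetical stand-ins, not the YARN API: the ResourceManager only hands out containers from a global pool, while the per-application ApplicationMaster decides which of its own tasks to place in the containers it receives.

```python
class ResourceManager:
    """Toy global RM: tracks free container slots per node."""
    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free container slots

    def allocate(self, n):
        # Grant up to n containers; knows nothing about the application's tasks.
        granted = []
        for node, slots in self.free.items():
            while slots > 0 and len(granted) < n:
                granted.append(node)
                slots -= 1
            self.free[node] = slots
        return granted

class ApplicationMaster:
    """Toy per-application AM: owns the task list and the placement decisions."""
    def __init__(self, rm, tasks):
        self.rm = rm
        self.tasks = list(tasks)

    def run(self):
        placements = []
        while self.tasks:
            containers = self.rm.allocate(len(self.tasks))
            if not containers:
                break  # cluster full; a real AM would wait and ask again
            for node in containers:
                placements.append((self.tasks.pop(0), node))
        return placements

rm = ResourceManager({"node1": 2, "node2": 1})
am = ApplicationMaster(rm, tasks=["map_0", "map_1", "reduce_0"])
placements = am.run()
print(placements)  # [('map_0', 'node1'), ('map_1', 'node1'), ('reduce_0', 'node2')]
```

Because scheduling logic lives in the ApplicationMaster, frameworks other than MapReduce can run on the same cluster simply by shipping their own AM.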
21. YARN Features
• ResourceManager high availability
• Timeline Server
• Use of cgroups
• Secure containers
• YARN web services REST APIs