3. BIG DATA
• The data comes from everywhere: sensors used to
gather climate information, posts to social media sites,
digital pictures and videos, purchase transaction records,
and cell phone GPS signals, to name a few. This data
is called Big Data.
• Every day, we create 2.5 quintillion bytes of data (one
quintillion bytes = one billion gigabytes); so much that
90% of the data in the world today has been created in
the last two years alone.
4. IN FACT, IN A MINUTE…
• Email users send more than 204 million messages;
• Mobile Web receives 217 new users;
• Google receives over 2 million search queries;
• YouTube users upload 48 hours of new video;
• Facebook users share 684,000 bits of content;
• Twitter users send more than 100,000 tweets;
• Consumers spend $272,000 on Web shopping;
• Apple receives around 47,000 application downloads;
• Brands receive more than 34,000 Facebook 'likes';
• Tumblr blog owners publish 27,000 new posts;
• Instagram users share 3,600 new photos;
• Flickr users add 3,125 new photos;
• Foursquare users perform 2,000 check-ins;
• WordPress users publish close to 350 new blog posts.
5. Big Data Vectors
• High-volume:
The amount of data
• High-velocity:
The speed at which data is collected, generated, or
processed
• High-variety:
The range of data types, such as audio, video, and images
Big Data = Transactions + Interactions + Observations
6. What is Hadoop?
• HADOOP
High-availability distributed object-oriented platform, or
“Hadoop”, is a software framework that analyzes structured
and unstructured data and distributes applications across
different servers.
• Basic Application of Hadoop
Hadoop is used for maintaining, scaling, error handling,
self-healing, and securing data at large scale. This data can
be structured or unstructured. In other words, when data is
too large, traditional systems are unable to handle it.
8. DIFFERENT COMPONENTS
Data Access Components: PIG & HIVE
Data Storage Components: HBASE
Data Integration Components: APACHE FLUME, SQOOP, CHUKWA
Data Management Components: AMBARI, ZOOKEEPER
Data Serialization Components: THRIFT & AVRO
Data Intelligence Components: APACHE MAHOUT, DRILL
9. What does it do?
• Hadoop implements Google’s MapReduce, using
HDFS
• MapReduce divides applications into many small
blocks of work.
• HDFS creates multiple replicas of data blocks for
reliability, placing them on compute nodes
around the cluster (see the sketch after this slide).
• MapReduce can then process the data where it
is located.
• Hadoop's target is to run on clusters on the order
of 10,000 nodes.
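As a hedged sketch (not part of the original slides), the HDFS Java client API can show where the replicated blocks of a file physically live; this is the placement information MapReduce uses to schedule tasks near the data. The file path here is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical input file; replace with a real HDFS path.
    FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));
    // Each BlockLocation lists the datanodes holding one replicated block.
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("block at offset " + loc.getOffset()
          + " -> replicas on: " + String.join(", ", loc.getHosts()));
    }
  }
}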
10. How does MapReduce work?
• The runtime partitions the input and provides it
to different Map instances;
• Map(key, value) → (key′, value′)
• The runtime collects the (key′, value′) pairs and
distributes them to several Reduce functions, so
that each Reduce function gets all the pairs with
the same key′.
• Each Reduce produces a single output file (or none).
• Map and Reduce are user-written functions (a
word-count sketch follows).
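To make the pattern concrete, here is a minimal sketch of user-written Map and Reduce functions, mirroring the canonical Hadoop word-count example. Map emits a (word, 1) pair for every token; the runtime groups the pairs by key, and Reduce sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Map(key, value) -> (key', value'): emits (word, 1) for every token in a line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce gets all values for one key' and emits a single (word, total) pair.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}

A Job driver class (omitted here for brevity) wires the two together; the packaged jar is then typically launched with the standard command "hadoop jar wordcount.jar WordCount <input> <output>".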
11. HYPERTABLE
What is it?
• Open-source Bigtable clone
• Manages massive sparse tables with timestamped cell
versions (see the toy sketch after this list)
• Single primary key index
What is it not?
• No joins
• No secondary indexes (not yet)
• No transactions (not yet)
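A toy sketch (my own illustration, not Hypertable code) of the Bigtable-style data model the slides describe: every cell is addressed by a row key, a column, and a timestamp, and the single sorted index over row keys makes range scans cheap while ruling out joins and secondary indexes.

import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a sparse, versioned table: row -> column -> timestamp -> value.
// The outer TreeMap is the single primary-key (row) index; there is no other index.
public class ToySparseTable {
  private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
      new TreeMap<>();

  public void put(String row, String column, long timestamp, String value) {
    rows.computeIfAbsent(row, r -> new TreeMap<>())
        .computeIfAbsent(column, c -> new TreeMap<>())
        .put(timestamp, value);
  }

  // Latest version of a cell: the highest timestamp wins.
  public String getLatest(String row, String column) {
    NavigableMap<String, NavigableMap<Long, String>> cols = rows.get(row);
    if (cols == null) return null;
    NavigableMap<Long, String> versions = cols.get(column);
    return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
  }

  // Range scan over the primary key, e.g. all rows in ["a", "b").
  public NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> scan(
      String from, String to) {
    return rows.subMap(from, true, to, false);
  }
}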
16. RANGE SERVER
• Manages ranges of table data
• Caches updates in memory (Cell Cache)
• Periodically spills (compacts) cached updates to disk (CellStore)
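A hedged toy sketch of this cache-then-spill write path; it illustrates the pattern, not Hypertable's actual implementation, and the threshold and file naming are made up.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy RangeServer write path: buffer sorted updates in memory,
// spill them to an immutable on-disk "CellStore" when the cache fills up.
public class ToyRangeServer {
  private static final int SPILL_THRESHOLD = 1000;   // made-up limit
  private final TreeMap<String, String> cellCache = new TreeMap<>();
  private int spillCount = 0;

  public void update(String key, String value) throws IOException {
    cellCache.put(key, value);                       // in-memory Cell Cache
    if (cellCache.size() >= SPILL_THRESHOLD) {
      spill();
    }
  }

  // Compaction: write the sorted cache out as one immutable file, then clear it.
  private void spill() throws IOException {
    List<String> lines = new ArrayList<>();
    cellCache.forEach((k, v) -> lines.add(k + "\t" + v));
    Files.write(Path.of("cellstore-" + (spillCount++) + ".tsv"), lines);
    cellCache.clear();
  }
}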
17. PERFORMANCE OPTIMIZATIONS
Block Cache
• Caches CellStore blocks
• Blocks are cached uncompressed
Bloom Filter
• Avoids unnecessary disk access
• Filter by rows or by rows + columns
• Configurable false positive rate (see the sizing sketch
after this list)
Access Groups
• Physically store co-accessed columns together
• Improves performance by minimizing I/O
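For the Bloom filter above, here is a sketch of the standard sizing math behind a "configurable false positive rate" (these are the textbook formulas, not Hypertable-specific code): for n keys and a target rate p, the filter needs m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions.

// Standard Bloom filter sizing: how big must the bit array be, and how many
// hash functions are needed, to hit a desired false positive rate?
public class BloomSizing {
  public static void main(String[] args) {
    long n = 1_000_000;   // expected number of keys (illustrative)
    double p = 0.01;      // target false positive rate (the configurable knob)

    // m = -n * ln(p) / (ln 2)^2 bits; k = (m / n) * ln 2 hash functions
    double m = -n * Math.log(p) / (Math.log(2) * Math.log(2));
    double k = (m / n) * Math.log(2);

    System.out.printf("bits: %.0f (%.2f bits/key), hash functions: %.1f%n",
        m, m / n, k);
    // Roughly 9.59 million bits (about 9.6 bits per key) and ~7 hash functions:
    // one small in-memory probe can rule out most keys without touching disk.
  }
}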
18. ADVANTAGES
• Flexible: easily accesses structured and unstructured
data.
• Scalable: it can store and distribute very large data sets
across hundreds of inexpensive servers that operate in
parallel.
• Efficient: by distributing the data, it can process it in
parallel on the nodes where the data is located.
• Resistant to failure: it automatically maintains
multiple copies of data and automatically redeploys
computing tasks when failures occur.