2. What is Big Data?
Generally refers to data that cannot be
processed efficiently by traditional systems,
mainly because of its size.
Twitter/Facebook examples:
Facebook – 500 TB of data daily
Twitter – 250 million tweets daily
About 90% of all data in existence was generated
in the last 2-3 years.
3. Big Data Sources
Sources include:
• Social networking sites like Twitter, Facebook, etc.
• Smartphones
• Trading platforms
• Machines
• Log files
This data is used for purposes such as:
• Product Trends
• Market Analysis
4. What is Hadoop?
Apache Hadoop is a framework for running
applications on large clusters built of commodity
hardware.
It transparently provides applications with both
reliability and data motion.
It implements a computational paradigm named
Map/Reduce, in which an application is divided into
many small fragments of work.
It provides a distributed file system (HDFS) that
stores data on the compute nodes.
It moves code to the data rather than moving data to the code.
Hadoop opened the door to processing Big
Data on commodity hardware.
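The Map/Reduce paradigm described above is usually illustrated with a word-count job. The sketch below is a single-process Python simulation of the three stages (map, shuffle, reduce), not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and on a real cluster each map task would run on the node that already holds its input block.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all intermediate values by key, ordered by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all counts emitted for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is an independent "small fragment of work" for the map stage.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs hadoop", "hadoop processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

Because every call to `map_phase` depends only on its own line, the map stage can be spread across many machines; only the shuffle needs to move data between them.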
5. Hadoop's History
Hadoop is based on work done by Google; each Google system
has a Hadoop ecosystem counterpart:
GFS (Google File System) – HDFS
Google MapReduce – Hadoop MapReduce
BigTable – HBase
6. Hadoop Features
Partial Failure Support
Data Recoverability
Component Recovery
Consistency
Scalability
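Partial failure support, data recoverability, and component recovery in HDFS rest largely on block replication: each file is split into blocks, and each block is stored on several nodes (the HDFS default replication factor is 3), so losing a machine does not lose data. The Python sketch below is a toy model under stated assumptions: the node names, the helpers `place_blocks`, `readable_after_failure`, and `under_replicated`, and the simple rotating placement are all hypothetical and far simpler than HDFS's rack-aware placement policy.

```python
from typing import Dict, List, Set


def place_blocks(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    # Toy placement (assumed, NOT HDFS's real policy): store each block's
    # replicas on `replication` distinct nodes, rotating the starting node.
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    # The file is still readable if every block has at least one replica on a healthy node.
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


def under_replicated(placement: Dict[int, List[str]], failed: Set[str],
                     replication: int = 3) -> Dict[int, int]:
    # Blocks that lost replicas and would need re-replication: block -> live replica count.
    result: Dict[int, int] = {}
    for block, replicas in placement.items():
        live = sum(1 for n in replicas if n not in failed)
        if live < replication:
            result[block] = live
    return result


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_blocks(4, nodes)
    print(readable_after_failure(placement, {"node2"}))                    # True
    print(under_replicated(placement, {"node2"}))                          # {0: 2, 1: 2, 3: 2}
    print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False
```

With three replicas on a four-node cluster, any two nodes can fail without losing a block; `under_replicated` lists the blocks a recovery process would copy back up to full strength.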
7. Hadoop Components
Core Components
• HDFS – Hadoop Distributed File System
• MapReduce
Projects in the Hadoop Ecosystem
• Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
10. Case Study
Product – Data quality and cleansing solution.
Before Hadoop
Two-node DB cluster
Multi-threaded Java application for de-duplication
Processing 1 million records took 10 hours
After Hadoop
4 machines in the cluster, each with 8 GB RAM and 4 cores
Processing 1 million records took 30 minutes
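The case study does not say how the de-duplication job was written. One common MapReduce approach, sketched below in Python purely as an illustration, is for the mapper to emit a normalized match key for each record so that the shuffle groups likely duplicates together, and for the reducer to keep one record per key. The record fields (`id`, `name`, `postcode`), the `normalize_key` match rule, and the "keep the first record" survivorship rule are hypothetical. For scale, the reported timings (10 hours before, 30 minutes after) amount to a 20× speedup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def normalize_key(record: Record) -> str:
    # Hypothetical match rule: same name (ignoring case and extra spaces)
    # and same postcode (ignoring spaces).
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return f"{name}|{postcode}"


def dedup_map(record: Record) -> Iterator[Tuple[str, Record]]:
    # Mapper: key each record by its match key.
    yield normalize_key(record), record


def dedup_reduce(key: str, records: List[Record]) -> Record:
    # Reducer: assumed survivorship rule, keep the first record seen for the key.
    return records[0]


def deduplicate(records: Iterable[Record]) -> List[Record]:
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        for key, value in dedup_map(rec):
            groups[key].append(value)  # shuffle: group candidates by match key
    return [dedup_reduce(k, v) for k, v in groups.items()]


if __name__ == "__main__":
    rows = [
        {"id": "1", "name": "Asha  Rao", "postcode": "560 001"},
        {"id": "2", "name": "asha rao", "postcode": "560001"},
        {"id": "3", "name": "Vikram Shah", "postcode": "110001"},
    ]
    print([r["id"] for r in deduplicate(rows)])  # ['1', '3']
```

Because records with different match keys never need to be compared, each reducer only examines a small group, which is what lets the work spread across the cluster instead of requiring pairwise comparison of all records.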
11. Hadoop in Use
Any application that:
Has more than 10 TB of data
Needs fast and cheap processing
Log Analysis
Recommendation Engine
Feed Analysis
Data Mining
Statistical Analysis
ETL Processing
Business Intelligence
12. Cloudera
Cloudera is “The commercial Hadoop company”.
Founded by leading Hadoop experts from Facebook,
Google, Oracle and Yahoo.
Provides consulting and training services for Hadoop users.
Its staff includes committers to virtually all Hadoop projects.
13. Resources
Books
Hadoop: The Definitive Guide (by Tom White)
HBase: The Definitive Guide (by Lars George)
MapReduce Design Patterns (by Donald Miner and Adam Shook)
Web
http://hadoop.apache.org/
http://hbase.apache.org/
http://research.google.com/archive/bigtable.html
http://research.google.com/archive/mapreduce-osdi04.pdf
14. Contact us @ Xebia India
Website
www.xebia.com
www.xebia.in
www.xebia.fr
Thought Leadership
http://blog.xebia.com
http://podcast.xebia.com