2. Topics
• Data Management Today
• New Interests, Expectations, Problems
• Big Data
• New Approach
• Big Data Ecosystem
• Q&A
3. Data Management Today
• Relational Databases: Oracle, MySQL, MS-SQL Server
• Data Warehouse Appliances: Teradata, IBM Netezza
• Legacy Systems: Mainframes
4. New Interests, Expectations
• Collect More, Data-Mine More
• Complex Data Integration
• Advanced Analytics
• Social Data Analysis
• Machine Data Analysis
• Real-time Data Analysis
• Actionable Insights
• Extension of Investments
• Talent Management
• ROI
• TCO
• Business Continuity
5. How Big is Data?
• 90% of the world’s data was created in the last two years
• $214 is the average amount companies have to spend per compromised customer when a data breach occurs
• 2.7bn: average number of “likes” and “comments” posted on Facebook daily
• 247bn e-mail messages are sent each day… about 80% of them are spam
• It would take 2,000 hours to watch all the YouTube videos uploaded while we’re talking on this panel*
• 500,000+ data centers across the world are large enough to fill 5,955 football fields
*this is 3x more than just 2 short years ago
(as of Oct 2012)
6. New Problems
• Unpredictable Volume
• Data Processing Issues
• Data Integration Issues
• Identifying Source-of-Truth
• Store vs. Analyze
• Data Retrieval Requirements
• Computing Limitations
• Information vs. Insights
• Business Requirements
• Regulatory Requirements
• True Value-of-Data
• Price to Performance Dilemma
7. What is Big Data?
Volume
• Very large data sets
• Sizes from 100 TB to 50 PB of data
• Larger than “one machine”
• Whole data set analysis replaces “sampling”
Velocity
• Real-time data streaming
• High volume / Low latency
• Write heavy
• Read heavy
• Both are common
Variety
• Structured data: OLTP, DW, ODS, Data marts
• Unstructured data: Text, Audio, Video, Click streams, Log files
Complexity
• Data acquisition
• Analysis
• Deriving insights
Source: Ventana Research
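To make "larger than one machine" concrete, the sizes quoted on this slide can be turned into a rough node count. This is only a back-of-the-envelope sketch: the 24 TB of disk per node and the 70% fill ratio are hypothetical figures chosen for illustration, and the replication factor of 3 is assumed because it is a common default in distributed file systems such as HDFS.

```python
import math

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

def nodes_needed(dataset_bytes, node_capacity_bytes, replication=3, fill_ratio=0.7):
    """Estimate how many storage nodes a data set needs.

    replication: copies kept of every block (assumed: 3).
    fill_ratio: fraction of each node's disk we allow to fill (hypothetical: 0.7).
    """
    raw_bytes = dataset_bytes * replication
    usable_per_node = node_capacity_bytes * fill_ratio
    return math.ceil(raw_bytes / usable_per_node)

# The two ends of the range quoted on the slide, on hypothetical 24 TB nodes.
for label, size in [("100 TB", 100 * TB), ("50 PB", 50 * PB)]:
    print(label, "->", nodes_needed(size, node_capacity_bytes=24 * TB), "nodes")
```

Under these assumptions even the low end of the range needs a small cluster, and the high end needs thousands of nodes.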
8. New Approach
• Commodity Hardware
• Open Compute Project
• Open Source Solutions, Frameworks
• Value Added Products – Cloudera, DataStax, 10gen
• Research Oriented Product Development
• Augmented Ecosystem
9. Big Data: Ecosystem
Layers of the stack, top to bottom:
• Advanced Analytics (Data Analytics): Predictive & Optimization Modeling, Business Processes Analysis, Functional Analysis. Tools: R, Splunk, SAS, Madlib, Mahout
• Data Visualization (Data Delivery): Visual Analytics, Advanced Visualizations, Dashboards, Scorecard (Strategy Maps), Spatial & Temporal Analysis. Tools: Tableau, SpotFire, Datameer
• BI / Reporting (Data Engineering, Data Agility): Performance Reporting, Enterprise Metrics, Data Mining, OLAP Modeling etc. Tools: Pig, Hive, Lucene, Karmasphere, Crunch, Pangool, other BI tools with Hadoop connectors
• Data Storage and Processing (Data Consolidation, Data Economics): Data Storage, Data Processing. Tools: HDFS, HBase, Mapreduce, Cassandra
• Data Integration & Management: Data Filtering, Data Consolidation & Warehousing, Data Quality, Metadata Management, Job Scheduling. Tools: Flume, Scribe, Avro, Sqoop, Chukwa, Zookeeper, Oozie, native Hadoop ETL, traditional ETL with Hadoop connectors
• Distributed Infrastructure
Legend: Hadoop components; Open source Hadoop platforms; 3rd party Hadoop supporting platforms
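Mapreduce sits at the storage-and-processing layer of this stack. As a rough illustration of the programming model only, the sketch below runs the map, shuffle, and reduce steps in a single Python process. It does not use Hadoop's API, and the function names and sample log lines are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map -> shuffle -> reduce over an iterable of text records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical log lines standing in for a large input file.
logs = ["error disk full", "warn disk slow", "error network down"]
counts = run_job(logs)
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on many machines, with the framework handling the shuffle between them. Here everything runs sequentially to keep the data flow visible.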
10. What can Big Data do that traditional data warehousing and analytics cannot?
Traditional DW → Big Data
• Complete records from known transactional systems. → Data from many different internal & external sources with unknown quality and/or utility.
• Data is structured, and data fields have known (and often complex) interrelationships. → Loosely structured data. Flat schemas with few complex interrelationships; connections between data elements have to be probabilistically inferred.
• Multiple terabytes of data → Multiple petabytes of data
• Mostly Scale Up Architecture → Scale Out Architecture
• Analytics run on a stable data model. → The analytic models are larger and require very large amounts of hardware resources to process them in a timely manner.
• Low Performance/Cost ratio, as most of the software/hardware platforms are proprietary and license based → High Performance/Cost ratio, as most of the software/hardware platforms are commodity, free, open source
11. What can Big Data do that traditional data warehousing and analytics cannot?
Traditional DW → Big Data
• Aggregate data (structured) → Raw data (structured and unstructured)
• Aggregate / Segment analytics → Individual level analytics, micro segmentation, individualized offers to customers
• Mainstream analytics (structured analysis, OLAP cubes) → Outlier analytics, pattern discovery, simulation and modeling, machine learning
• Sample data is used for identifying patterns → Entire population of granular data can be leveraged
• Reports & Dashboards are done on a production basis → Real-time operational analytics and reporting. Intra-day decision making.
• Traditional models good for small amounts of data due to time constraints → Big Models: computationally intensive analyses, simulations, models with many parameters
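The "sample data" versus "entire population" row can be illustrated with a toy example. The data, the outlier positions, and the threshold below are all hypothetical. The sample is a systematic 1% sample (every 100th record), chosen so that the result is deterministic.

```python
# Hypothetical data: 10,000 transaction amounts, three of them rare outliers.
amounts = [20.0] * 10_000
for i in (1234, 5678, 9001):  # hypothetical positions of the rare events
    amounts[i] = 5_000.0

def find_outliers(values, threshold=1_000.0):
    """Return every value above the (hypothetical) outlier threshold."""
    return [v for v in values if v > threshold]

sample = amounts[::100]  # systematic 1% sample: every 100th record

print("outliers found in sample:   ", len(find_outliers(sample)))
print("outliers found in full scan:", len(find_outliers(amounts)))
```

A random 1% sample would do little better on average, since each rare record has only a 1% chance of being drawn. Scanning the whole data set is what makes outlier analytics reliable.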