1. Couchbase 2012
Couchbase Server and IBM BigInsights:
One + One = Three
Steve Beier
Program Director, Big Data Applications & Solutions, IBM
Dipti Borkar
Director, Product Management, Couchbase
© 2012 IBM Corporation
2. 2 kinds of database management system
OLTP
Analytics
5. 2 kinds of database management system
Big Users
Big Data
6. 2 kinds of database management system
Simple, fast, elastic NoSQL database with sub-millisecond performance at scale
Map-reduce against huge datasets to cook up insights and answers
7. Ad and offer targeting
Ad Targeting: 40 milliseconds to pick the right offer
[Diagram: data flows labeled "profiles, raw event data", "campaigns / offers, actionable insights", "raw event data", and "cooked insights"]
9. sqoop
sqoop == sql RDBMS + hadoop
• a data transfer tool for Hadoop
• for moving data from non-Hadoop data sources (like relational databases, NoSQL) into/out of Hadoop
Couchbase provides a Cloudera Certified sqoop connector
10. Ad Targeting
[Diagram: Ad Targeting Platform, Couchbase Server Cluster, and Hadoop Cluster. Logs reach Hadoop through a flume flow; sqoop import carries data from Couchbase to Hadoop, and sqoop export carries data from Hadoop back to Couchbase.]
11. Content Driven Site
To keep up with changing demands for richer, more targeted content delivered very quickly to ever larger audiences, the data behind content-driven sites is shifting to Couchbase.
Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources.
[Diagram: Content Driven Web Site, Couchbase Server Cluster, Original RDBMS, Logs, and Hadoop Cluster, connected by a flume flow, sqoop import, and sqoop export]
13. Couchbase → Hadoop
$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table DUMP
$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table BACKFILL_5
For import, table must be either:
• DUMP: All items currently in Couchbase
• BACKFILL_n: All item mutations for n minutes
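The table rule above can be checked before sqoop is ever invoked. The following is a minimal sketch, not part of sqoop or the Couchbase connector: `build_import_cmd` is a hypothetical helper that accepts only `DUMP` or `BACKFILL_` followed by digits, and prints the command instead of running it so that it works on a machine without sqoop installed.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a sqoop feature):
# validate the --table value, then print the sqoop import command.
build_import_cmd() {
  connect_url="$1"
  table="$2"
  case "$table" in
    DUMP)
      ;;  # all items currently in Couchbase
    BACKFILL_*)
      # Everything after the prefix must be digits (the number of minutes).
      minutes="${table#BACKFILL_}"
      case "$minutes" in
        ''|*[!0-9]*)
          echo "error: BACKFILL_n needs a numeric n, got '$table'" >&2
          return 1
          ;;
      esac
      ;;
    *)
      echo "error: table must be DUMP or BACKFILL_n, got '$table'" >&2
      return 1
      ;;
  esac
  # Print rather than execute, so the sketch runs without sqoop.
  echo "sqoop import --connect $connect_url --table $table"
}

build_import_cmd "http://couchbase-01:8091/pools" "BACKFILL_5"
```

Piping the printed line to `sh` would run the import on a host where sqoop and the connector are installed.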
14. Hadoop → Couchbase
$ sqoop export \
    --connect http://couchbase-01:8091/pools \
    --table REQUIRED_BUT_IGNORED \
    --export-dir HDFS_DIRECTORY_TO_EXPORT
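As the placeholder name suggests, the export command needs a `--table` argument even though its value is not used. A minimal sketch of a wrapper that always supplies one is shown here. `build_export_cmd`, the `IGNORED` placeholder value, and the HDFS path are assumptions made for illustration. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a sqoop feature):
# build a sqoop export command for the Couchbase connector.
build_export_cmd() {
  connect_url="$1"
  export_dir="$2"
  if [ -z "$export_dir" ]; then
    echo "error: an HDFS directory to export is required" >&2
    return 1
  fi
  # --table must be present, but per the slide its value is ignored,
  # so a fixed placeholder is passed.
  echo "sqoop export --connect $connect_url --table IGNORED --export-dir $export_dir"
}

# /user/demo/insights is a made-up HDFS path.
build_export_cmd "http://couchbase-01:8091/pools" "/user/demo/insights"
```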
15. sqoop Versions
sqoop 1.4.2
Cloudera CDH3
• Ubuntu 10.10 – 11.10; later versions missing package needed for CDH3
Cloudera CDH4 update 1 needed
• sqoop bug fix in Cloudera CDH4u1 required
16. Couchbase sqoop - Resources
http://www.couchbase.com/develop/connectors/hadoop
http://www.couchbase.com/docs/hadoop-plugin/
https://github.com/couchbase/couchbase-hadoop-plugin
http://www.ibm.com/developerworks/opensource/library/ba-hadoop-couchbase/ba-hadoop-couchbase-pdf.pdf
17. Big Data platform: Bring Together a Large Volume and Variety of Data to Find New Insights
§ Analyzing a variety of data at enormous volumes
§ Insights on streaming data
§ Large volume structured, semi-structured and unstructured data analysis
Big Data Platform
• Variety
• Velocity
• Volume
T-Mobile: Multi-channel customer experience analysis
UOIT: Detect life-threatening conditions in time to intervene
Vestas: Predict weather patterns to plan optimal wind turbine usage
Dublin City Council: Optimization and monitoring of public transportation
Brocade: Identify network security intrusions
18. Green Energy: Vestas Wind Systems A/S
Volume
§ Weather and geographic data analysis for wind turbine and wind farm site planning
§ Deployed IBM Big Data to store, manage, and analyze location-specific data
§ Analyzing 2.8 petabytes of public and private weather data for each geographic location
§ Reduced the modeling time for wind forecasting information by 97%, from weeks to hours
19. IBM Watson Demonstrated the Power of Big Data Analytics
Variety
Can we design a computing system that rivals a human’s ability to answer
questions posed in natural language, interpreting meaning and context and
retrieving, analyzing and understanding vast amounts of information in real-time?
20. Big Data Analytics in Smarter Hospitals
Velocity
Big Data enabled doctors from the University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance
[Video: IBM Data Baby, youtube.com]
21. Asian telco reduces billing costs and improves customer satisfaction.
Capabilities:
• Stream Computing
• Analytic Accelerators
Real-time mediation and analysis of 6B CDRs per day
Data processing time reduced from 12 hrs to 1 sec
Hardware cost reduced to 1/8th
Proactively address issues (e.g. dropped calls) impacting customer satisfaction.
22. Telecommunications – Analyze in real time
§ A Telco processing Call Detail Records (CDRs)
– 6 billion CDRs per day
– Deduplicating data over 7 days
– Processing latency reduced from 12 hours to a few seconds
– Callout: 500K/sec, 6B+ IPDRs analyzed per day on more than 4 PBs/yr., sustaining 1GBps
§ A Telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real-time
– 91% reduction in time to merge data
– 93% reduction in storage requirements
– 85% reduction in servers used
§ A Telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option
– “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect)
– Even at these volumes, Streams provided near linear scalability
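As a rough sanity check on the first telco's figures, 6 billion CDRs per day can be converted to an average per-second rate. This is a back-of-the-envelope sketch that assumes the load is spread evenly across the day, which the slide does not claim.

```shell
#!/bin/sh
# Average records per second implied by 6 billion CDRs per day,
# assuming (for illustration only) an even spread over 24 hours.
cdrs_per_day=6000000000
seconds_per_day=$((24 * 60 * 60))
avg_per_sec=$((cdrs_per_day / seconds_per_day))
echo "average CDRs/sec: $avg_per_sec"   # prints: average CDRs/sec: 69444
```

The quoted 500K/sec is roughly seven times this average, so it may describe a peak rate or a different record type. The slide does not say which.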
23. Big Data is an integral part of an enterprise data platform
§ Manage Big Data from the instant it enters the enterprise
§ High fidelity – no changes to original format
§ Available for new uses, analyses, and integrations
[Diagram: Big Data Platform]
• Big Data Applications: IBM Big Data Solutions; Client and Partner Solutions
• Big Data User Environment: Developers, End Users, Admin.
• Big Data Enterprise Engine: Streaming analytics, Internet-scale analytics
• Source data (Web, sensors, logs, media, etc.)
• Govern: Quality, Lifecycle Management, Security, Privacy
• Connected systems: Business Analytic Applications (e.g. Cognos, SPSS) and Solutions; Operational Data Store; Warehouse and Appliances; Traditional data sources
24. IBM’s Big Data Platform
Bringing Big Data to the Enterprise
[Diagram: platform layers and connected IBM products]
Platform layers:
• IBM Big Data Solutions; Client and Partner Solutions
• Big Data User Environments: Developers, End Users, Administrators
• Big Data Enterprise Engines: Streaming Analytics, Internet Scale Analytics
• Open Source Foundational Components: Hadoop, HBase, Pig, Lucene, Jaql, Hive
Integration / Agents
Connected products:
• Data Warehouse: InfoSphere Warehouse
• Warehouse Appliances: Netezza
• Master Data Mgmt: InfoSphere MDM
• Database: DB2, Informix
• Content Analytics: ECM
• Information Server
• Business Analytics: Cognos & SPSS
• Marketing: Unica
• Data Growth Management: InfoSphere Optim
25. IBM Big Data Platform Tools
Business Users
Data Scientists
Business Analysts
Developers
Administrators
• Determine product sentiment, intent, customer segmentation
• Execute reusable Apps to classify users, predict sales, and forecast trends
• Create spreadsheets and dashboards
Analyzing big data
• Productive environment for executing analysis (cluster, rank, score with R, ML, Text)
• Create reusable analytic Apps without programming
• Dynamic open dashboard
26. THANK YOU
sbeier@us.ibm.com
dipti@couchbase.com