4. $30B Database Market Being Disrupted
[Chart: database market share by technology, 2012 vs. 2027. Relational Technology: 95% in 2012, <50%? in 2027. Other segments shown: Other Technology, NoSQL Technology.]
All new database growth will be NoSQL
5. Operational vs. Analytic Databases
• Real-time, Interactive Databases (NoSQL): fast access to data
  Couchbase, MongoDB, Cassandra, HBase
• Analytic Databases: get insights from data
  Cloudera, Hortonworks, MapR
6. [Chart: survey responses]
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
8. What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational
databases.
You can use Sqoop to import data from a relational database management
system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File
System (HDFS), transform the data in Hadoop MapReduce, and then export
the data back into an RDBMS.
sqoop.apache.org
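The import, transform, and export round trip described above can be modeled in a few lines. This is a toy sketch, not Sqoop: Python's standard sqlite3 module stands in for the RDBMS, a list of CSV lines stands in for files in HDFS, and the table names and the upper-casing transform are hypothetical.

```python
import csv
import io
import sqlite3


def import_table(conn, table):
    """'Import': read every row of an RDBMS table and serialize it as CSV
    lines, a stand-in for the delimited text files an import lands in HDFS.
    The table name is interpolated directly, so it must be trusted."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in conn.execute(f"SELECT id, name FROM {table} ORDER BY id"):
        writer.writerow(row)
    return buf.getvalue().splitlines()


def transform(lines):
    """'Transform': placeholder for the work a MapReduce job would do.
    Here it only upper-cases the name column."""
    return [[rec[0], rec[1].upper()] for rec in csv.reader(lines)]


def export_rows(conn, table, rows):
    """'Export': insert the transformed records into a target RDBMS table."""
    conn.executemany(
        f"INSERT INTO {table} (id, name) VALUES (?, ?)",
        [(int(rec_id), name) for rec_id, name in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE customers_clean (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])

hdfs_lines = import_table(conn, "customers")      # RDBMS -> "HDFS"
export_rows(conn, "customers_clean", transform(hdfs_lines))  # "HDFS" -> RDBMS
print(conn.execute("SELECT * FROM customers_clean ORDER BY id").fetchall())
```

A real Sqoop import runs as parallel map tasks and writes files to HDFS; the sketch keeps only the shape of the data flow.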
9. What is Sqoop?
• Traditional ETL
[Diagram: Data, a single Transform (T) step, Application, Data]
10. What is Sqoop?
• A different paradigm
[Diagram: Application, Data, Data]
11. • A very scalable different paradigm
[Diagram: three Application/Data pairs and an additional Data store]
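One way to picture why this paradigm scales: by default, Sqoop parallelizes an import across several map tasks (four unless configured otherwise) by splitting a column's value range, normally the primary key, so each task reads its own slice. The sketch below only approximates that idea for an integer key; it is not Sqoop's splitter code, and the function name is hypothetical.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Divide the inclusive integer range [min_id, max_id] into up to
    num_mappers contiguous, non-overlapping (lo, hi) ranges, one per
    parallel task. Each task would then read only its own rows, e.g.
    WHERE id >= lo AND id <= hi."""
    if min_id > max_id:
        return []
    total = max_id - min_id + 1
    num = min(num_mappers, total)      # never create empty ranges
    base, extra = divmod(total, num)   # spread the remainder over the first tasks
    ranges = []
    lo = min_id
    for i in range(num):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


print(split_ranges(1, 10))
```

Because the ranges never overlap, the tasks can run on different machines without coordinating, which is what lets the work spread across a cluster.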
12. What is Sqoop?
• Where did the Transform go?
[Diagram: many parallel Transform (T) steps, drawn as TTT TTT TTT TTT, alongside Application and Data]
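The repeated T's suggest that the transform is no longer one serial stage between source and destination; it runs alongside each parallel slice of the data. A minimal sketch of that idea using Python's standard concurrent.futures, where the partitions and the per-record transform are hypothetical stand-ins for map tasks:

```python
from concurrent.futures import ThreadPoolExecutor


def transform_record(rec):
    """Hypothetical per-record transform: normalize a name field."""
    rec_id, name = rec
    return rec_id, name.strip().title()


def transform_partition(partition):
    """Each worker transforms only its own slice, like one map task."""
    return [transform_record(r) for r in partition]


def parallel_transform(partitions, workers=4):
    # Executor.map preserves input order, so results line up with partitions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_partition, partitions))


partitions = [[(1, " alice ")], [(2, "BOB"), (3, "carol")]]
print(parallel_transform(partitions))
```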
13. What is Sqoop?
• Sqoop
  • Default connection is via JDBC
  • Lots of custom connectors
    • Couchbase, VoltDB, Vertica
    • Teradata, Netezza
    • Oracle, MySQL, Postgres
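The slide's point, a generic JDBC path plus optional specialized connectors, amounts to a simple dispatch rule: use a dedicated connector when one exists for the target database, otherwise fall back to generic JDBC. The sketch models that rule with a registry keyed on the JDBC URL's sub-protocol; the registry contents and function name are hypothetical and are not Sqoop's actual connector-selection code.

```python
# Hypothetical registry mapping a JDBC sub-protocol to a specialized connector.
CUSTOM_CONNECTORS = {
    "mysql": "MySQL connector",
    "postgresql": "Postgres connector",
    "oracle": "Oracle connector",
    "teradata": "Teradata connector",
}

GENERIC = "generic JDBC connector"


def pick_connector(jdbc_url, registry=CUSTOM_CONNECTORS):
    """Return a specialized connector when one is registered for the URL's
    sub-protocol (e.g. 'mysql' in 'jdbc:mysql://host/db'); otherwise fall
    back to the default generic JDBC path."""
    parts = jdbc_url.split(":")
    if len(parts) < 2 or parts[0].lower() != "jdbc":
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    return registry.get(parts[1].lower(), GENERIC)


print(pick_connector("jdbc:mysql://db.example.com/sales"))
print(pick_connector("jdbc:sqlserver://db.example.com"))
```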
14. Ad and offer targeting
• 40 milliseconds to respond with the decision.
[Diagram: numbered flow, steps 1 to 3. Labels: "profiles, campaigns" (1), "profiles, real time campaign statistics" (3), "events"]
19. Couchbase Server Core Principles
• Easy Scalability: Grow cluster without application changes, without downtime, with a single click
• Consistent High Performance: Consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: No downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema.
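"No fixed schema" means documents stored side by side need not share the same fields. The sketch below illustrates that with JSON documents kept under string keys in a plain Python dict; it is a hypothetical stand-in, not the Couchbase SDK, and the key format and field names are made up for the example.

```python
import json

# Hypothetical in-memory stand-in for a bucket of JSON documents keyed by ID.
bucket = {}


def upsert(key, doc):
    """Store a document as JSON text. No schema is checked, so documents
    in the same bucket may carry different fields."""
    bucket[key] = json.dumps(doc)


def get(key):
    """Fetch a document by key and parse it back into a dict."""
    return json.loads(bucket[key])


upsert("user::1", {"name": "Alice", "email": "alice@example.com"})
# A later document adds fields the first one lacks; no migration is needed.
upsert("user::2", {"name": "Bob", "interests": ["ads", "travel"], "age": 31})

print(get("user::2")["interests"])
```

In a relational table, adding `interests` would typically mean an `ALTER TABLE` or a new join table; here the new field simply appears in the new document.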
The database industry is about $30B today and is dominated by companies like Oracle, IBM, and Microsoft. Relational technology has dominated the industry for the last 40 years and is the underlying technology for 95% of the industry today. We believe the database industry is being disrupted. In 10 to 15 years, we believe relational technology will make up a much smaller percentage of the industry. It’s too early to tell whether it will be 50%, 40%, or 30%, but it seems clear to me it will be much smaller than 95%. We believe most of the future operational database growth will be NoSQL.
There are two types of databases, each focused on a very different problem. Analytic databases were referred to in the past as OLAP databases. They are focused on looking through every record in a huge database to answer a question or gain an insight about the data it contains. These analyses are batch processes that access every piece of data in the database, are very “read” heavy, and produce results in seconds, minutes, or sometimes days. For analytic databases, “real time” means an analysis takes a few seconds to run. Real-time interactive databases are often referred to as operational databases. They store a lot of data, but usually much less than an analytic database. They must provide access to individual records in milliseconds so that users of an application get good response times. Since the requirements of each type are very different, their architectures and capabilities are very different as well. When I refer to NoSQL in my presentation, I am referring to real-time, interactive databases. This is the type of NoSQL database Couchbase provides.
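The difference in access patterns can be shown with a toy in-memory dataset: an operational request touches one record by key, while an analytic query reads every record to compute an aggregate. The dataset, field names, and functions are hypothetical, a Python dict is only a stand-in for a real database, and the printed timings will vary by machine.

```python
import time

# Hypothetical dataset: 200,000 user profiles keyed by user ID.
profiles = {f"user::{i}": {"id": i, "visits": i % 17} for i in range(200_000)}


def operational_lookup(user_id):
    """Operational pattern: fetch a single record by key (a hash lookup)."""
    return profiles[user_id]


def analytic_average_visits():
    """Analytic pattern: read every record to compute an aggregate."""
    total = sum(p["visits"] for p in profiles.values())
    return total / len(profiles)


t0 = time.perf_counter()
rec = operational_lookup("user::42")
t1 = time.perf_counter()
avg = analytic_average_visits()
t2 = time.perf_counter()

print(rec, f"lookup took {1e6 * (t1 - t0):.1f} microseconds")
print(f"average visits {avg:.3f}, full scan took {1e3 * (t2 - t1):.1f} ms")
```

The lookup cost stays roughly constant as the dataset grows, while the scan grows with the number of records, which is why the two workloads favor different architectures.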