Analyzing transactional data residing in Oracle databases is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage demands. Although there are many techniques available for loading Oracle data, getting up-to-date data into your data warehouse is a more difficult problem. VMware Continuent provides data replication from Oracle to data warehouses and analytics engines, so you can derive insight from big data for better business decisions. Learn practical tips on how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from Oracle into Hadoop, Amazon Redshift, and HP Vertica.
2. Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
3. Introducing VMware Continuent
Database Clustering
• Business continuity for business-critical MySQL database applications
• Commercial-grade multi-site HA/DR
• Products:
– MySQL Single Site HA
– MySQL Multi-Site HA and DR
Data Replication
• Flexible, high-performance replication for Oracle and MySQL
• Simple data loading into analytics and big data
• Products:
– Oracle to Oracle
– MySQL to Oracle
– MySQL to MySQL (+ MariaDB, Percona Server)
– Oracle to Hadoop, Redshift, Vertica
– MySQL to Hadoop, Redshift, Vertica
4. Replication solves important problems for RDBMS users
• Real-time local copies in case the DBMS fails
• Real-time remote copies in case the site fails
• Loading data quickly into analytic systems
• Feeding edge applications from the Oracle mother ship
• Migrating from Oracle to:
– New Oracle versions
– Less expensive editions
– Non-Oracle DBMS
CONFIDENTIAL 4
5. Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
6. VMware Continuent implements flexible, high-performance replication for Oracle and MySQL
[Diagram: on the source (primary), a Replicator reads the MySQL DBMS logs with low application impact and writes them to THL (transactions + metadata). The secondary Replicator downloads transactions via the network or from the file system (low latency transfer), stores them in its own THL (transactions + metadata), and applies them to the target MySQL using JDBC.]
7. VMware Continuent captures transactions directly from Oracle REDO logs
[Diagram: on the Oracle DBMS host (source), raw transactions and the data dictionary are captured from the REDO logs into a staging area for REDO log data. On the Replicator host, the primary Replicator converts them to serialized row changes and DDL and writes them to THL (transactions + metadata), which is sent on to the secondary.]
8. Low-impact, high performance
• Source Oracle DBMS requirements:
– Supplemental logging
– Archive logs
– Replicator metadata stored in DBMS
– Replicator login with access to catalogs and flashback query
– Local process to read REDO logs
• Target Oracle DBMS requirements:
– Replicator metadata stored in DBMS
9. Transaction Based Replication
Transaction Log
(Row changes + Statements)
0 Create table db1.foo
1 Create table db2.foo
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
4 Insert into db2.foo values(5,…)
5 Insert into db1.foo values(3,…)
6 Delete from db2.foo where id=5
Source
Target
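As a rough illustration of transaction-based replication, the sketch below replays the log from this slide on a target in sequence-number order. It is a minimal model, not the replicator's implementation: the `Event` and `Target` classes and the `last_seqno` restart position are hypothetical names introduced here, and the statements are treated as opaque strings (elisions are kept as on the slide).

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seqno: int       # position of the change in the transaction log
    statement: str   # row change or statement, kept as an opaque string


@dataclass
class Target:
    applied: list = field(default_factory=list)
    last_seqno: int = -1  # hypothetical restart position kept on the target

    def apply(self, event: Event) -> bool:
        """Apply an event once; skip anything at or before the restart position."""
        if event.seqno <= self.last_seqno:
            return False
        self.applied.append(event.statement)
        self.last_seqno = event.seqno
        return True


# The log from the slide; "…" marks values elided in the original.
LOG = [
    Event(0, "Create table db1.foo"),
    Event(1, "Create table db2.foo"),
    Event(2, "Insert into db1.foo values(1, …"),
    Event(3, "Update db1.foo where id=1…"),
    Event(4, "Insert into db2.foo values(5,…)"),
    Event(5, "Insert into db1.foo values(3,…)"),
    Event(6, "Delete from db2.foo where id=5"),
]


def replicate(log, target: Target) -> int:
    """Replay the log in seqno order and return how many events were applied."""
    return sum(target.apply(e) for e in sorted(log, key=lambda e: e.seqno))


if __name__ == "__main__":
    target = Target()
    print(replicate(LOG, target), "events applied")
    print(replicate(LOG, target), "events applied on replay")
```

Tracking the last applied sequence number on the target is one way to make a replay after a restart harmless; it is an assumption of this sketch rather than something stated on the slide.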
14. We can even divide logs into transaction sequences on keys
Table=db1.foo, key=1
2 Insert into db1.foo values(1, …
3 Update db1.foo where id=1…
Table=db2.foo, key=5
4 Insert into db2.foo values(5,…)
6 Delete from db2.foo where id=5
Table=db1.foo, key=3
5 Insert into db1.foo values(3,…)
Source
Target
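The split shown on this slide can be sketched as a grouping of log entries by (table, key), with sequence order preserved inside each group. This is only an illustration under assumptions: the table and key of each change are taken as already known (no SQL parsing is shown), the operation names are shorthand, and `split_by_key` is a hypothetical helper, not part of the product.

```python
from collections import defaultdict

# (seqno, table, key, operation) for the row changes on the slide.
LOG = [
    (2, "db1.foo", 1, "insert"),
    (3, "db1.foo", 1, "update"),
    (4, "db2.foo", 5, "insert"),
    (5, "db1.foo", 3, "insert"),
    (6, "db2.foo", 5, "delete"),
]


def split_by_key(log):
    """Group seqnos by (table, key), keeping seqno order within each group."""
    sequences = defaultdict(list)
    for seqno, table, key, _op in sorted(log):
        sequences[(table, key)].append(seqno)
    return dict(sequences)


if __name__ == "__main__":
    for (table, key), seqnos in split_by_key(LOG).items():
        print(f"Table={table}, key={key}: {seqnos}")
```

Because changes to different keys never share a group, each sequence could in principle be handled independently while changes to the same row stay in order.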
16. Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
17. Data Warehouse Integration and Usage Are Changing
• Traditional data warehouse usage was based on dumps from the transactional store, loaded into the data warehouse
• Data warehousing and analytics were done on the historical data that had been loaded
• Data warehouses often use merged data from multiple sources, which was hard to handle
• Data warehouses are now frequently sources as well as targets for data, e.g.:
– Export data to data warehouse
– Analyze data
– Feed summary data back to application to display stats to users
19. How do we cope with that model?
• Traditional Extract-Transform-Load (ETL) methods take too long
• Data needs to be replicated into a data warehouse in real-time
• Continuous stream of information
• Replicate everything
• Use data warehouse to provide join and analytics
20. Data Warehouse Choices
• Oracle
• Hadoop
– General purpose storage platform
– MapReduce for data processing
– Front-end interfaces for SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark) interaction
– JDBC/ODBC interfaces improving
• Vertica
– Massive cluster-based column store
– SQL and ODBC/JDBC Interface
• Amazon Redshift
– Highly flexible column store
– Easy to deploy
21. VMware Continuent for Replication/Data Warehouses
VMware Continuent (software formerly known as Tungsten Replicator) is a fast, open source database replication engine
Designed for speed and flexibility
Apache V2 license
100% open source, find it on GitHub
22. The Data Warehouse Impedance Mismatch
[Diagram: a transactional store and a data warehouse. Dump/provision and batch loads connect the two; a direct flow of transactions is marked with a question mark and an X.]
23. Transactional and Data Warehouse Metadata
• Replicating data is not just about the data
• Table structures must be replicated too
• ddlscan handles the translation
– Migrates an existing MySQL or Oracle schema into the target schema
– Template based
– Handles underlying data type matches
– Needs to be executed before replication starts
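To make the idea of template-based schema translation concrete, here is a small sketch of the general technique. It does not reproduce ddlscan or its templates: the type map, the `translate_column` and `render_table` helpers, and the generic target dialect are all hypothetical, chosen only to show how source column types might be matched to target types and rendered through a template.

```python
from string import Template

# Hypothetical source-to-target type map: (target type, keep source length?).
# Real mappings depend on the target system and on the templates in use.
TYPE_MAP = {
    "INT": ("INTEGER", False),
    "VARCHAR": ("VARCHAR", True),
    "DATETIME": ("TIMESTAMP", False),
    "TEXT": ("VARCHAR(65000)", False),
}

TABLE_TEMPLATE = Template("CREATE TABLE $schema.$table (\n$columns\n);")


def translate_column(name: str, source_type: str) -> str:
    """Map one source column definition to a target column definition."""
    base, _, length = source_type.partition("(")
    try:
        target, keep_length = TYPE_MAP[base.strip().upper()]
    except KeyError:
        raise ValueError(f"no mapping for source type {source_type!r}") from None
    if keep_length and length:
        target = f"{target}({length}"  # length still carries its closing ")"
    return f"  {name} {target}"


def render_table(schema: str, table: str, columns) -> str:
    """Render target DDL for a table from (name, source_type) pairs."""
    body = ",\n".join(translate_column(n, t) for n, t in columns)
    return TABLE_TEMPLATE.substitute(schema=schema, table=table, columns=body)


if __name__ == "__main__":
    print(render_table("db1", "foo", [("id", "INT(11)"), ("msg", "VARCHAR(80)")]))
```

Raising an error for an unmapped type, rather than guessing, reflects the point on the slide that the target schema has to be in place before replication starts.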
29. Comparing Loading Methods for Hadoop
                        | Manual via CSV            | Sqoop                        | Tungsten Replicator
Process                 | Manual/Scripted           | Manual/Scripted              | Fully Automated
Incremental Loading     | Possible with DDL changes | Requires DDL changes         | Fully Supported
Latency                 | Full-load                 | Intermittent                 | Real-time
Extraction Requirements | Full table scan           | Full and partial table scans | Low-impact CDC/binlog scan
30. Sqoop and Materialization within Hadoop
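[Diagram: Sqoop and the replicator ("Replicate") load data into Hadoop as CSV, populating a staging table and a base table; a Hive materialization step combines them.]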
31. How the Materialization Works
Staged changes:
Op | Seqno | ID | Msg
I  | 1     | 1  | Hello World!
I  | 2     | 2  | Meet MC
D  | 3     | 1  |
I  | 3     | 1  | Goodbye World
Materialized result:
Op | Seqno | ID | Msg
I  | 2     | 2  | Meet MC
I  | 3     | 1  | Goodbye World
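The materialization step can be sketched as a replay of the staged changes that keeps only the latest surviving row per ID. The slides describe this happening inside Hive; the Python below is just a model of the logic, and the `materialize` function and tuple layout are assumptions made for illustration.

```python
# Staged changes from the slide: (op, seqno, id, msg).
STAGED = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]


def materialize(staged):
    """Replay inserts and deletes in seqno order, keeping the latest row per ID."""
    current = {}
    # sorted() is stable, so a delete and an insert that share a seqno
    # are replayed in the order in which they were staged.
    for op, seqno, row_id, msg in sorted(staged, key=lambda row: row[1]):
        if op == "D":
            current.pop(row_id, None)
        elif op == "I":
            current[row_id] = (op, seqno, row_id, msg)
    return sorted(current.values(), key=lambda row: row[1])


if __name__ == "__main__":
    for row in materialize(STAGED):
        print(row)
```

In this model an update arrives as a delete followed by an insert with the same sequence number, which is why the order within a sequence number matters.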
33. Data Warehouse Possibilities: Time Series Generation
Staged changes:
Op | Seqno | ID | Date   | Msg
I  | 1     | 1  | 1/6/14 | Hello World!
I  | 2     | 2  | 2/6/14 | Meet MC
I  | 3     | 1  | 2/6/14 | Goodbye World
I  | 4     | 1  | 3/6/14 | Hello Tuesday
I  | 4     | 2  | 3/6/14 | Ruby Wednesday
I  | 5     | 1  | 4/6/14 | Final Count
Time series for ID 1:
ID | Date   | Msg
1  | 1/6/14 | Hello World!
1  | 2/6/14 | Goodbye World
1  | 3/6/14 | Hello Tuesday
1  | 4/6/14 | Final Count
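Because the staged changes retain every version of a row, a history for one ID can be read straight out of them. The sketch below shows one way to do that, assuming the tuple layout used here; `time_series` is a hypothetical helper and not a product feature.

```python
# Staged changes from the slide: (op, seqno, id, date, msg).
STAGED = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def time_series(staged, row_id):
    """Return every inserted version of one row, ordered by seqno."""
    history = [row for row in staged if row[0] == "I" and row[2] == row_id]
    history.sort(key=lambda row: row[1])
    return [(rid, date, msg) for _op, _seqno, rid, date, msg in history]


if __name__ == "__main__":
    for row in time_series(STAGED, 1):
        print(row)
```

The same staged data therefore supports both the current-state view from materialization and a per-row history, depending on how it is queried.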
34. Agenda
1 Introduction to VMware Continuent
2 Understanding VMware Continuent Replication
3 Using Analytics and Data Warehouses
4 Wrap-up and Questions
35. Wrap-up
• VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments
• Replication features compare favorably to Oracle GoldenGate and Data Guard
• VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases
36. For more information, contact us:
Robert Noyes
Alliance Manager, AMER & LATAM
rnoyes@vmware.com
+1 (650) 575-0958
Philippe Bernard
Alliance Manager, EMEA & APJ
pbernard@vmware.com
+41 79 347 1385
MC Brown
Senior Product Line Manager
mcb@vmware.com
Eero Teerikorpi
Sr. Director, Strategic Alliance
eteerikorpi@vmware.com
+1 (408) 431-3305
www.vmware.com/products/continuent