©Continuent 2014
Real-Time Loading from MySQL to Hadoop
Featuring Continuent Tungsten
MC Brown, Director of Documentation
Introducing Continuent
• The leading provider of clustering and replication for open-source DBMSs
• Our product: Continuent Tungsten
  • Clustering: commercial-grade HA, performance scaling, and data management for MySQL
  • Replication: flexible, high-performance data movement
Quick Continuent Facts
• The largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data
• Tungsten Replicator was Application of the Year at the 2011 MySQL User Conference
• A wide variety of topologies, including MySQL, Oracle, Vertica, and MongoDB, are in production now
• MySQL-to-Hadoop deployments are now in progress with multiple customers
Selected Continuent Customers
[Customer logos]
Five-Minute Hadoop Introduction
What Is Hadoop, Exactly?
a. A distributed file system
b. A method of processing massive quantities of data in parallel
c. The Cutting family's stuffed elephant
d. All of the above
Hadoop Distributed File System
[Diagram: clients such as a Java client, Hive, Pig, and the hadoop command find a file through the NameNode (directory), then read its block(s) from the DataNodes (replicated data).]
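The find-then-read flow in the diagram can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the HDFS client API: it assumes the NameNode maps a path to an ordered list of block IDs with replica locations, and that a reader falls back to another replica when a DataNode is down. `NameNode`, `DataNode`, and `read_file` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Hypothetical directory: path -> ordered list of (block_id, replica DataNode names)."""
    files: dict[str, list[tuple[str, list[str]]]] = field(default_factory=dict)

    def find_file(self, path: str) -> list[tuple[str, list[str]]]:
        # "Find file": ask the directory where each block of the file lives.
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]


@dataclass
class DataNode:
    """Hypothetical storage node holding raw block contents."""
    blocks: dict[str, bytes] = field(default_factory=dict)
    alive: bool = True

    def read_block(self, block_id: str) -> bytes:
        if not self.alive:
            raise ConnectionError("DataNode unavailable")
        return self.blocks[block_id]


def read_file(namenode: NameNode, datanodes: dict[str, DataNode], path: str) -> bytes:
    """'Read block(s)': fetch each block, in order, from the first replica that answers."""
    data = bytearray()
    for block_id, replicas in namenode.find_file(path):
        for name in replicas:
            try:
                data += datanodes[name].read_block(block_id)
                break  # this block is done; move on to the next one
            except ConnectionError:
                continue  # replication lets us try another DataNode
        else:
            raise OSError(f"no live replica for block {block_id}")
    return bytes(data)
```

Because each block has several replicas, losing one DataNode (here `dn1`) does not stop the read, as long as another copy of every block is reachable.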
Typical MySQL to Hadoop Use Case
[Diagram: MySQL transaction processing feeding a Hadoop cluster that is used for analytics through Hive, annotated with open questions: Initial load? Latency? App changes? Materialized views? Changes? App load?]
Traditional Hadoop Deployments
• Data analytics
• Single databases
• Collective databases
• Databases and external information
• Non-structured data
• Long-term datastores and archiving
[Diagram: client and back-office systems]
Future Hadoop Deployments
• Online analytics
• Real-time queries and caching
• Fully heterogeneous deployments
[Diagram: client and back-office systems]
Options for Loading Data
[Diagram: three routes into Hadoop: manual loading of CSV files, Sqoop, and Tungsten Replicator]
Comparing Methods in Detail
Criterion                 Manual via CSV              Sqoop                         Tungsten Replicator
Process                   Manual/scripted             Manual/scripted               Fully automated
Incremental loading       Possible with DDL changes   Requires DDL changes          Fully supported
Latency                   Full load                   Intermittent                  Real-time
Extraction requirements   Full table scan             Full and partial table scans  Low-impact binlog scan
Replicating MySQL Data to Hadoop Using Tungsten Replicator
What Is Tungsten Replicator?
A real-time, high-performance, open-source database replication engine
• GPL V2 license: 100% open source
• Download from https://code.google.com/p/tungsten-replicator/
• Annual support subscription available from Continuent
“Golden Gate without the Price Tag”®
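Tungsten Replicator Overview
[Diagram: on the master, the replicator extracts transactions from the DBMS logs and stores them, with metadata, in its Transaction History Log (THL); the slave replicator copies the THL (transactions + metadata) into its own THL and applies the transactions.]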
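The extract, THL, and apply stages can be sketched as a toy pipeline. Assumptions (not Tungsten's internal API): each logged transaction is a list of row-change dicts, the THL is an append-only list of events numbered by a sequence number (seqno), and the slave remembers the last seqno it applied so that it can restart without reapplying anything. `THLEvent`, `extract`, and `SlaveApplier` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class THLEvent:
    seqno: int            # position in the (toy) Transaction History Log
    source: str           # metadata: which master produced the transaction
    changes: list[dict]   # the row changes that made up the transaction


def extract(binlog: list[list[dict]], source: str, start_seqno: int = 0) -> list[THLEvent]:
    """Master side: number each logged transaction and wrap it with metadata."""
    return [THLEvent(start_seqno + i, source, txn) for i, txn in enumerate(binlog)]


@dataclass
class SlaveApplier:
    table: dict = field(default_factory=dict)  # toy target: primary key -> row
    last_applied: int = -1                     # restart point, stored with the data

    def apply(self, thl: list[THLEvent]) -> int:
        """Apply every event newer than last_applied; return how many were applied."""
        applied = 0
        for event in thl:
            if event.seqno <= self.last_applied:
                continue  # already applied before a restart
            for change in event.changes:
                if change["op"] == "D":
                    self.table.pop(change["id"], None)
                else:  # inserts and updates both store the new row image
                    self.table[change["id"]] = change["row"]
            self.last_applied = event.seqno
            applied += 1
        return applied
```

Feeding the same THL to the applier twice applies nothing the second time, which is the property that makes a seqno-tracking slave safe to restart.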
Tungsten Replicator 3.0 and Hadoop
• Extract from MySQL or Oracle
• Base Hadoop plus commercial distributions: Cloudera, Hortonworks, Amazon EMR, IBM
• Provision using Sqoop or parallel extraction
• Automatic replication of incremental changes
• Transformation to preferred HDFS formats
• Schema generation for Hive
• Tools for generating materialized views
Basic MySQL to Hadoop Replication
[Diagram: MySQL (binlog_format=row) → Tungsten master replicator "hadoop", which extracts from the MySQL binlog → Tungsten slave replicator "hadoop", which writes CSV files → raw CSV loaded into HDFS on the Hadoop cluster (e.g., via LOAD DATA to Hive) → access via Hive]
Master-side filtering:
• pkey: fill in primary key info
• colnames: fill in column names
• cdc: add update type and schema/table info
• source: add source DBMS
• replicate: subset of tables to be replicated
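The master-side filter chain can be pictured as a sequence of functions that each enrich or drop a row-change event. This is only an illustration of the idea, not Tungsten's filter API: the event dict layout, the `CATALOG` lookup, and all function names are hypothetical, and `replicate` is placed first so that later filters only see tables that are being replicated.

```python
from __future__ import annotations
from typing import Callable, Optional

# Hypothetical catalog: (schema, table) -> (column names, primary-key columns).
CATALOG = {
    ("sales", "sales"): (["id", "salesman", "planet", "value"], ["id"]),
}

Event = dict
Filter = Callable[[Event], Optional[Event]]


def colnames(event: Event) -> Event:
    # Row-based binlog events carry values by position; attach the column names.
    names, _ = CATALOG[(event["schema"], event["table"])]
    event["row"] = dict(zip(names, event.pop("values")))
    return event


def pkey(event: Event) -> Event:
    # Record which columns form the primary key, for downstream merges.
    _, keys = CATALOG[(event["schema"], event["table"])]
    event["pkey"] = keys
    return event


def cdc(event: Event) -> Event:
    # Add the update type plus schema/table info to the event.
    event["cdc"] = {"op": event["op"], "schema": event["schema"], "table": event["table"]}
    return event


def make_source(dbms_host: str) -> Filter:
    def source(event: Event) -> Event:
        event["source"] = dbms_host  # which DBMS the change came from
        return event
    return source


def make_replicate(allowed: set[str]) -> Filter:
    def replicate(event: Event) -> Optional[Event]:
        # Drop events for tables outside the "schema.table" subset.
        return event if f"{event['schema']}.{event['table']}" in allowed else None
    return replicate


def run_filters(event: Event, filters: list[Filter]) -> Optional[Event]:
    for f in filters:
        event = f(event)
        if event is None:
            return None  # filtered out; later filters never see it
    return event
```

For example, `run_filters(event, [make_replicate({"sales.sales"}), colnames, pkey, cdc, make_source("logos1")])` returns an enriched event for `sales.sales` and `None` for any other table.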
Hadoop Data Loading: Gory Details
[Diagram: the "hadoop" replicator receives transactions from the master and writes the data to CSV files; a JavaScript load script (e.g., hadoop.js) loads the files into staging "tables" in HDFS using the hadoop command. Generated table definitions describe the staging tables and base tables, and running MapReduce turns the staging data into base tables and materialized views.]
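A minimal sketch of the "write data to CSV" step, producing one staging file body per table in the column layout of the generated staging schema (tungsten_opcode, tungsten_seqno, tungsten_row_id, then the table columns) with \001 as the field separator. Two details are assumptions, not taken from the replicator: an update is staged as a delete of the old row image followed by an insert of the new one, and special characters are escaped by prefixing a backslash. `to_staging_lines` and `escape` are hypothetical names.

```python
from __future__ import annotations
from collections import defaultdict

DELIM = "\x01"  # matches FIELDS TERMINATED BY '\001' in the staging DDL
NULL = "\\N"    # Hive's default text representation of NULL


def escape(value) -> str:
    """Render one field as text using an assumed backslash-escaping convention."""
    if value is None:
        return NULL
    text = str(value)
    # Escape the backslash first so the backslashes added afterwards stay single.
    for ch in ("\\", DELIM, "\n"):
        text = text.replace(ch, "\\" + ch)
    return text


def to_staging_lines(transactions: list[tuple[int, list[tuple]]]) -> dict[str, list[str]]:
    """Turn (seqno, [(op, schema, table, old_row, new_row), ...]) into staging lines per table."""
    files: dict[str, list[str]] = defaultdict(list)
    for seqno, changes in transactions:
        row_id = 0  # position of each staged row inside this transaction
        for op, schema, table, old_row, new_row in changes:
            if op == "I":
                images = [("I", new_row)]
            elif op == "D":
                images = [("D", old_row)]
            else:  # assumption: an update becomes delete(old image) + insert(new image)
                images = [("D", old_row), ("I", new_row)]
            for opcode, row in images:
                row_id += 1
                fields = [opcode, seqno, row_id, *row]
                files[f"{schema}.{table}"].append(DELIM.join(escape(f) for f in fields) + "\n")
    return dict(files)
```

Carrying the seqno and a per-transaction row ID in every line is what later lets a MapReduce job replay the changes for each key in commit order.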
Demo #1: Replicating Data into Hadoop
JavaScript Batch Loader
• Simple, flexible batch loader
• prepare() - when we go online
• begin() - start of batch
• apply() - write the events
• commit() - commit the events
• release() - when we go offline
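The real batch loader is a JavaScript script (e.g., hadoop.js); the Python sketch below only mirrors the order in which the five hooks are called. The driver function, the one-CSV-file-per-`apply()` assumption, and the `RecordingLoader` class are hypothetical, written to make the lifecycle easy to check.

```python
from __future__ import annotations


class RecordingLoader:
    """Hypothetical loader implementing the five hooks; it only records what happens."""

    def __init__(self):
        self.calls: list[str] = []
        self.pending: list[str] = []
        self.loaded: list[str] = []

    def prepare(self):                # replicator goes online
        self.calls.append("prepare")

    def begin(self):                  # a new batch starts
        self.calls.append("begin")
        self.pending = []

    def apply(self, csv_file: str):   # write the events captured in one CSV file
        self.calls.append(f"apply:{csv_file}")
        self.pending.append(csv_file)

    def commit(self):                 # make the batch's events visible
        self.calls.append("commit")
        self.loaded.extend(self.pending)

    def release(self):                # replicator goes offline
        self.calls.append("release")


def run_batches(loader, batches: list[list[str]]) -> None:
    """Hypothetical driver: prepare once, begin/apply/commit per batch, release at the end."""
    loader.prepare()
    try:
        for csv_files in batches:
            loader.begin()
            for csv_file in csv_files:
                loader.apply(csv_file)
            loader.commit()
    finally:
        loader.release()  # runs even if a batch fails
```

Keeping `prepare()` and `release()` outside the batch loop means expensive setup, such as opening connections, happens once per online period rather than once per batch.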
Viewing MySQL Data in Hadoop
Generating Staging Table Schema
$ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
    -user tungsten -pass secret \
    -url jdbc:mysql:thin://logos1:3306/sales -db sales
...
DROP TABLE IF EXISTS sales.stage_xxx_sales;

CREATE EXTERNAL TABLE sales.stage_xxx_sales
(
  tungsten_opcode STRING ,
  tungsten_seqno INT ,
  tungsten_row_id INT ,
  id INT ,
  salesman STRING ,
  planet STRING ,
  value DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/sales/sales';
Generating Base Table Schema
$ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
    -pass secret -url jdbc:mysql:thin://logos1:3306/sales -db sales
...
DROP TABLE IF EXISTS sales.sales;

CREATE TABLE sales.sales
(
  id INT,
  salesman STRING,
  planet STRING,
  value DOUBLE )
;
Creating a Materialized View in Theory
[Diagram: Log #1, Log #2, ... Log #N feed a MAP phase that sorts rows by key(s) and transaction order; the REDUCE phase emits the last row per key if it is not a delete.]
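The map-sort-reduce rule above fits in a few lines of Python. This sketch assumes staging rows are tuples shaped like the staging table (opcode, seqno, row_id, key, then the other columns) and that opcodes are "I" and "D"; `materialize` is a hypothetical name, not part of the Tungsten tools.

```python
from __future__ import annotations
from itertools import groupby
from operator import itemgetter


def materialize(staging_rows: list[tuple]) -> list[tuple]:
    """staging_rows: (opcode, seqno, row_id, key, *columns) from any number of logs.

    Returns the current (key, *columns) rows, i.e. the materialized view.
    """
    # MAP/shuffle: order by key, then by transaction order (seqno, row_id).
    ordered = sorted(staging_rows, key=itemgetter(3, 1, 2))
    view = []
    # REDUCE: within each key, only the last change matters.
    for _key, group in groupby(ordered, key=itemgetter(3)):
        last = list(group)[-1]
        if last[0] != "D":  # a trailing delete means the row no longer exists
            view.append(last[3:])
    return view
```

Because the rows are sorted before reducing, the order in which the logs are read does not matter: `materialize(log1 + log2)` and `materialize(log2 + log1)` give the same view.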
MapReduce
Map input (2013):
Acme,2013,4.75
Spitze,2013,25.00
Acme,2013,55.25
Excelsior,2013,1.00
Spitze,2013,5.00
Map input (2014):
Spitze,2014,60.00
Spitze,2014,9.50
Acme,2014,1.00
Acme,2014,4.00
Excelsior,2014,1.00
Excelsior,2014,9.00
MAP output (2013):
Acme,(4.75,55.25)
Spitze,(25.00,5.00)
Excelsior,(1.00)
MAP output (2014):
Spitze,(60.00,9.50)
Acme,(1.00,4.00)
Excelsior,(1.00,9.00)
REDUCE output:
Acme,65.00
Excelsior,11.00
Spitze,99.50
Equivalent SQL:
SELECT COMPANY, SUM(VALUE) FROM ... WHERE ... GROUP BY COMPANY
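The same computation in plain Python, using the slide's own data: each map call groups one input split's values by company, and the reduce step sums every company's values across all map outputs. The function names are illustrative only; a real Hadoop job would distribute these steps across nodes.

```python
from __future__ import annotations
from collections import defaultdict


def map_phase(lines: list[str]) -> dict[str, list[float]]:
    """Group one input split's values by company, like each MAP box."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        company, _year, value = line.split(",")
        grouped[company].append(float(value))
    return dict(grouped)


def reduce_phase(map_outputs: list[dict[str, list[float]]]) -> dict[str, float]:
    """Sum every company's values across all map outputs, like the REDUCE box."""
    totals: dict[str, float] = defaultdict(float)
    for output in map_outputs:
        for company, values in output.items():
            totals[company] += sum(values)
    return dict(sorted(totals.items()))


split_2013 = ["Acme,2013,4.75", "Spitze,2013,25.00", "Acme,2013,55.25",
              "Excelsior,2013,1.00", "Spitze,2013,5.00"]
split_2014 = ["Spitze,2014,60.00", "Spitze,2014,9.50", "Acme,2014,1.00",
              "Acme,2014,4.00", "Excelsior,2014,1.00", "Excelsior,2014,9.00"]

totals = reduce_phase([map_phase(split_2013), map_phase(split_2014)])
print(totals)  # {'Acme': 65.0, 'Excelsior': 11.0, 'Spitze': 99.5}
```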
Creating a Materialized View in Hive
$ hive
...
hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
hive> FROM (
    SELECT sales.*
    FROM sales.stage_xxx_sales sales
    DISTRIBUTE BY id
    SORT BY id,tungsten_seqno,tungsten_row_id
  ) map1
  INSERT OVERWRITE TABLE sales.sales
  SELECT TRANSFORM(
    tungsten_opcode,tungsten_seqno,tungsten_row_id,id,
    salesman,planet,value)
  USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,salesman,planet,value'
  AS id INT,salesman STRING,planet STRING,value DOUBLE;
The DISTRIBUTE BY/SORT BY subquery is the MAP phase; the TRANSFORM through tungsten-reduce is the REDUCE phase.
Comparing MySQL and Hadoop Data
$ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
...
$ /opt/continuent/tungsten/bristlecone/bin/dc \
    -url1 jdbc:mysql:thin://logos1:3306/sales \
    -user1 tungsten -password1 secret \
    -url2 jdbc:hive2://localhost:10000 \
    -user2 'tungsten' -password2 'secret' -schema sales \
    -table sales -verbose -keys id \
    -driver org.apache.hive.jdbc.HiveDriver
22:33:08,093 INFO DC - Data comparison utility
...
22:33:24,526 INFO Tables compare OK
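The idea behind a keyed table comparison can be sketched with in-memory row lists. This is not how the bristlecone `dc` utility works internally; it assumes both row sets have already been fetched (in practice over JDBC) and uses exact equality, whereas comparing real MySQL and Hive data may require normalizing types first. `compare_tables` and `tables_match` are hypothetical names.

```python
from __future__ import annotations


def compare_tables(source_rows: list[tuple], target_rows: list[tuple],
                   key_index: int = 0) -> dict[str, list]:
    """Compare two row sets by key (like -keys id).

    Returns keys missing from the target, extra in the target, and keys whose rows differ.
    """
    source = {row[key_index]: row for row in source_rows}
    target = {row[key_index]: row for row in target_rows}
    return {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "different": sorted(k for k in source.keys() & target.keys() if source[k] != target[k]),
    }


def tables_match(report: dict[str, list]) -> bool:
    """True when no key is missing, extra, or different."""
    return not any(report.values())
```

Matching on the key rather than on row position means the comparison still succeeds when Hive returns rows in a different order than MySQL.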
Doing It All at Once
$ git clone https://github.com/continuent/continuent-tools-hadoop.git

$ cd continuent-tools-hadoop

$ bin/load-reduce-check \
    -U jdbc:mysql:thin://logos1:3306/sales \
    -s sales --verbose
Demo #2: Constructing and Checking a Materialized View
Scaling It Up!
MySQL to Hadoop Fan-In Architecture
[Diagram: MySQL masters m1, m2, and m3, each using row-based replication (RBR), are read by master-side replicators; a slave-side replicator runs one service per master (m1, m2, m3) and loads all three streams into a Hadoop cluster (many nodes).]
Integration with Provisioning
[Diagram: MySQL (binlog_format=row) → Tungsten master "hadoop" reading the MySQL binlog → Tungsten slave "hadoop" writing CSV files → Hadoop cluster, accessed via Hive. Sqoop/ETL performs the initial provisioning run from MySQL into Hadoop.]
On-Demand Provisioning via Parallel Extract
[Diagram: MySQL (binlog_format=row) → Tungsten master replicator "hadoop", which extracts from the MySQL tables as well as the MySQL binlog → Tungsten slave replicator "hadoop", which writes CSV files → raw CSV loaded into HDFS on the Hadoop cluster (e.g., via LOAD DATA to Hive) → access via Hive]
Master-side filtering:
• pkey: fill in primary key info
• colnames: fill in column names
• cdc: add update type and schema/table info
• source: add source DBMS
• replicate: subset of tables to be replicated
• (other filters as needed)
Tungsten Replicator Roadmap
• Parallel CSV file loading
• Partitioning of loaded data by commit time
• Data formats and tools to support additional Hadoop clients as well as HBase
• Replication out of Hadoop
• Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger, ...)
Getting Started with Continuent Tungsten
Where Is Everything?
• Tungsten Replicator 3.0 builds are now available on code.google.com: http://code.google.com/p/tungsten-replicator/
• Replicator 3.0 documentation is available on the Continuent website: http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html
• Tungsten Hadoop tools are available on GitHub: https://github.com/continuent/continuent-tools-hadoop
Contact Continuent for support.
Commercial Terms
• Replicator features are open source (GPL V2)
• Investment elements
  • POC / development (walk-away option)
  • Production deployment
  • Annual support subscription
• Governing principles
  • Annual subscription required
  • More up-front investment, less annual subscription
We Do Clustering Too!
Tungsten clusters combine off-the-shelf open-source MySQL servers into data services with:
• 24x7 data access
• Scaling of load on replicas
• Simple management commands
...without app changes or data migration.
[Diagram: example deployment for GonzoPortal.com in Amazon US West, with Apache/PHP application servers connecting through Tungsten Connectors]
In Conclusion: Tungsten Offers...
• Fully automated, real-time replication from MySQL into Hadoop
• Support for automatic transformation to HDFS data formats and creation of full materialized views
• The ability to take advantage of evolving real-time features in Hadoop
Continuent web page: http://www.continuent.com
Tungsten Replicator 3.0: http://code.google.com/p/tungsten-replicator
Our blogs:
http://scale-out-blog.blogspot.com
http://mcslp.wordpress.com
http://www.continuent.com/news/blogs
560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
E-mail: sales@continuent.com

 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0

  • 1. ©Continuent 2014
      Real-Time Loading from MySQL to Hadoop
      Featuring Continuent Tungsten
      MC Brown, Director of Documentation
  • 3. Introducing Continuent
      • The leading provider of clustering and replication for open source DBMS
      • Our product: Continuent Tungsten
          • Clustering: commercial-grade HA, performance scaling and data management for MySQL
          • Replication: flexible, high-performance data movement
  • 4. Quick Continuent Facts
      • The largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data
      • Tungsten Replicator was Application of the Year at the 2011 MySQL User Conference
      • A wide variety of topologies, including MySQL, Oracle, Vertica and MongoDB, are in production now
      • MySQL to Hadoop deployments are now in progress with multiple customers
  • 6. Five-Minute Hadoop Introduction
  • 7. What Is Hadoop, Exactly?
      a. A distributed file system
      b. A method of processing massive quantities of data in parallel
      c. The Cutting family's stuffed elephant
      d. All of the above
  • 8. Hadoop Distributed File System
      [Diagram: clients (a Java client, Hive, Pig, the hadoop command) ask the NameNode (directory) to find a file, then read block(s) from the DataNodes (replicated data).]
  • 9. Typical MySQL to Hadoop Use Case
      [Diagram: data moves from transaction processing into a Hadoop cluster used for Hive analytics. Open questions: initial load? latency? app changes? materialized views? changes? app load?]
  • 10. Traditional Hadoop Deployments
      • Data analytics
      • Single databases
      • Collective databases
      • Databases and external information
      • Non-structured data
      • Long-term datastores and archiving
      [Diagram: Client, Back Office]
  • 11. Future Hadoop Deployments
      • Online analytics
      • Real-time queries and caching
      • Fully heterogeneous deployments
      [Diagram: Client, Back Office]
  • 12. Options for Loading Data
      [Diagram: three routes into Hadoop: manual loading of CSV files, Sqoop, and Tungsten Replicator.]
  • 13. Comparing Methods in Detail
      |                         | Manual via CSV            | Sqoop                        | Tungsten Replicator    |
      |-------------------------|---------------------------|------------------------------|------------------------|
      | Process                 | Manual/scripted           | Manual/scripted              | Fully automated        |
      | Incremental loading     | Possible with DDL changes | Requires DDL changes         | Fully supported        |
      | Latency                 | Full load                 | Intermittent                 | Real-time              |
      | Extraction requirements | Full table scan           | Full and partial table scans | Low-impact binlog scan |
  • 14. Replicating MySQL Data to Hadoop Using Tungsten Replicator
  • 15. What Is Tungsten Replicator?
      A real-time, high-performance, open source database replication engine
      • GPL V2 license: 100% open source
      • Download from https://code.google.com/p/tungsten-replicator/
      • Annual support subscription available from Continuent
      “Golden Gate without the Price Tag”®
  • 16. Tungsten Replicator Overview
      [Diagram: on the master, the replicator extracts transactions from the DBMS logs and stores them, with metadata, in its Transaction History Log (THL); the slave replicator copies the THL (transactions + metadata) and applies it.]
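The overview reduces to two loops: the master replicator extracts transactions from the DBMS logs into the THL, each with a sequence number, and the slave replicator applies them in order. A minimal sketch of the apply side, assuming a hypothetical in-memory THL and a stored last-applied seqno (these names are illustrative, not Tungsten's internal API), shows why an increasing seqno makes restarts safe: anything at or below the stored position is skipped.

```javascript
// Hypothetical in-memory THL: each entry is one extracted transaction,
// identified by a monotonically increasing sequence number (seqno).
const thl = [
  { seqno: 0, table: "sales.sales", change: "insert id=1" },
  { seqno: 1, table: "sales.sales", change: "update id=1" },
  { seqno: 2, table: "sales.sales", change: "delete id=1" },
];

// Apply every event newer than the last applied seqno and return the new
// position. A real applier would persist this position in the same
// transaction as the applied data so a crash cannot separate the two.
function applyFrom(log, lastAppliedSeqno, applyFn) {
  let position = lastAppliedSeqno;
  for (const event of log) {
    if (event.seqno <= position) continue; // already applied earlier
    applyFn(event);
    position = event.seqno;
  }
  return position;
}

const applied = [];
let lastSeqno = applyFrom(thl, -1, (e) => applied.push(e.seqno)); // applies 0, 1, 2
lastSeqno = applyFrom(thl, lastSeqno, (e) => applied.push(e.seqno)); // restart: nothing new
console.log(applied, lastSeqno);
```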
  • 17. Tungsten Replicator 3.0 Hadoop
      • Extract from MySQL or Oracle
      • Base Hadoop plus commercial distributions: Cloudera, Hortonworks, Amazon EMR, IBM
      • Provision using Sqoop or parallel extraction
      • Automatic replication of incremental changes
      • Transformation to preferred HDFS formats
      • Schema generation for Hive
      • Tools for generating materialized views
  • 18. Basic MySQL to Hadoop Replication
      Master-side filtering (Tungsten master replicator reading the MySQL binlog, binlog_format=row):
      • pkey: fill in primary key info
      • colnames: fill in column names
      • cdc: add update type and schema/table info
      • source: add source DBMS
      • replicate: subset tables to be replicated
      [Diagram: the master replicator extracts from the MySQL binlog; the Tungsten slave replicator (hadoop) writes CSV files and loads the raw CSV into HDFS (e.g., via LOAD DATA to Hive) on the Hadoop cluster, where the data is accessed via Hive.]
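Each master-side filter above enriches or trims row-change events before they are written to the THL. The sketch below models that as a chain of plain functions over a hypothetical event object; the field names (`keys`, `columnNames`, `changeType`, `sourceHost`) and the metadata table are assumptions for illustration, not the Tungsten filter interface, where these built-in filters are enabled through configuration.

```javascript
// Hypothetical table metadata the filters consult (not a Tungsten API).
const tableMeta = {
  "sales.sales": { columns: ["id", "salesman", "planet", "value"], primaryKey: ["id"] },
};

// Hypothetical row-change event from a row-based binlog: positional values,
// no column names, no key information.
function makeEvent(table = "sales") {
  return { schema: "sales", table, action: "UPDATE", values: [1, "Fred", "Earth", 10.5] };
}

const qualified = (e) => `${e.schema}.${e.table}`;

// Each filter returns a (new) event, or null to drop it.
const replicate = (allowed) => (e) => (allowed.has(qualified(e)) ? e : null);
const pkey = (e) => ({ ...e, keys: tableMeta[qualified(e)].primaryKey });
const colnames = (e) => ({ ...e, columnNames: tableMeta[qualified(e)].columns });
const cdc = (e) => ({ ...e, changeType: e.action, changeTable: qualified(e) });
const source = (host) => (e) => ({ ...e, sourceHost: host });

function runFilters(event, filters) {
  let current = event;
  for (const filter of filters) {
    current = filter(current);
    if (current === null) return null; // dropped, e.g. by replicate
  }
  return current;
}

// replicate runs first so dropped tables never need a metadata lookup.
const chain = [replicate(new Set(["sales.sales"])), pkey, colnames, cdc, source("logos1")];
console.log(runFilters(makeEvent(), chain));
```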
  • 19. Hadoop Data Loading: Gory Details
      [Diagram: the hadoop slave replicator receives transactions from the master; a JavaScript load script (e.g. hadoop.js) writes the data to CSV files and loads them into staging "tables" using the hadoop command; MapReduce runs build base tables and materialized views from the staging tables. Table definitions are generated for both staging and base tables.]
  • 20. Demo #1: Replicating Data into Hadoop
  • 21. JavaScript Batch Loader
      • Simple, flexible batch loader
      • prepare(): when we go online
      • begin(): start of batch
      • apply(): write the events
      • commit(): commit the events
      • release(): when we go offline
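The five hooks above define the loader's lifecycle. The simulation below is a sketch of the call order only; the hook signatures (here `apply()` receives a list of CSV file names) and the recorded "load" steps are hypothetical stand-ins for the real script's calls to the hadoop command, not the hadoop.js interface.

```javascript
// Hypothetical loader object: instead of calling hadoop, it records what it
// would do so the call order is visible.
function makeLoader() {
  const log = [];
  let pending = [];
  return {
    log,
    prepare() { log.push("prepare"); },             // replicator goes online
    begin() { pending = []; log.push("begin"); },   // a new batch starts
    apply(csvFiles) { pending.push(...csvFiles); }, // collect this batch's CSV files
    commit() {                                      // load the whole batch at once
      for (const file of pending) log.push(`load ${file}`);
      log.push(`commit ${pending.length} file(s)`);
      pending = [];
    },
    release() { log.push("release"); },             // replicator goes offline
  };
}

// Drive the hooks as the slide describes: online, N batches, offline.
function runBatches(loader, batches) {
  loader.prepare();
  for (const batch of batches) {
    loader.begin();
    for (const files of batch) loader.apply(files);
    loader.commit();
  }
  loader.release();
  return loader.log;
}

const trace = runBatches(makeLoader(), [[["sales-1.csv"], ["sales-2.csv"]], [["sales-3.csv"]]]);
console.log(trace);
```

Buffering the files until `commit()` keeps each batch all-or-nothing from the loader's point of view, which matches the begin/commit framing on the slide.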
  • 22. Viewing MySQL Data in Hadoop
  • 23. Generating Staging Table Schema
      $ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
          -user tungsten -pass secret \
          -url jdbc:mysql:thin://logos1:3306/sales -db sales
      ...
      DROP TABLE IF EXISTS sales.stage_xxx_sales;

      CREATE EXTERNAL TABLE sales.stage_xxx_sales
      (
        tungsten_opcode STRING ,
        tungsten_seqno INT ,
        tungsten_row_id INT ,
        id INT ,
        salesman STRING ,
        planet STRING ,
        value DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE LOCATION '/user/tungsten/staging/sales/sales';
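The staging table above fixes the layout of each line the replicator writes: three Tungsten metadata columns (opcode, seqno, row id) followed by the table's own columns, separated by Hive's default Ctrl-A (`\001`) delimiter. A minimal formatter for that layout, assuming `I`/`D` opcodes, Hive's default `\N` null marker and backslash as the escape character; values containing newlines are rejected rather than escaped.

```javascript
// Assumptions: Hive text-table defaults and backslash escaping.
const FIELD_SEPARATOR = "\u0001"; // Hive's default field delimiter, Ctrl-A (\001)
const NULL_MARKER = "\\N";        // Hive's default NULL marker in text tables

// Escape backslashes and field separators with a leading backslash.
function escapeField(value) {
  if (value === null || value === undefined) return NULL_MARKER;
  const text = String(value);
  if (text.includes("\n")) throw new Error("newlines are not handled by this sketch");
  return text.replace(/\\/g, "\\\\").replace(/\u0001/g, "\\\u0001");
}

// One staging line: opcode, seqno, row id, then the row's own columns.
function stagingLine(opcode, seqno, rowId, values) {
  return [opcode, seqno, rowId, ...values].map(escapeField).join(FIELD_SEPARATOR) + "\n";
}

console.log(JSON.stringify(stagingLine("I", 42, 1, [1, "Fred", "Earth", 10.5])));
```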
  • 24. Generating Base Table Schema
      $ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
          -pass secret -url jdbc:mysql:thin://logos1:3306/sales -db sales
      ...
      DROP TABLE IF EXISTS sales.sales;

      CREATE TABLE sales.sales
      (
        id INT,
        salesman STRING,
        planet STRING,
        value DOUBLE )
      ;
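ddlscan derives both schemas from the same MySQL table metadata using different templates. The sketch below imitates what such templates produce; the MySQL column types and the small type map are hypothetical (the shipped templates cover far more types), and the `stage_xxx_` prefix and HDFS location are copied from the staging example.

```javascript
// Hypothetical, deliberately small MySQL -> Hive type map; unknown types become STRING.
const TYPE_MAP = { int: "INT", bigint: "BIGINT", double: "DOUBLE", varchar: "STRING", text: "STRING" };

function hiveType(mysqlType) {
  const base = mysqlType.toLowerCase().replace(/\(.*\)$/, ""); // varchar(64) -> varchar
  return TYPE_MAP[base] || "STRING";
}

const columnLines = (columns) => columns.map((c) => `  ${c.name} ${hiveType(c.type)}`);

// Base table: only the source columns, as a managed (non-external) Hive table.
function baseTableDdl(db, table, columns) {
  return `CREATE TABLE ${db}.${table}\n(\n${columnLines(columns).join(",\n")})\n;`;
}

// Staging table: Tungsten metadata columns first, external over the CSV directory.
function stagingTableDdl(db, table, columns, location) {
  const meta = ["  tungsten_opcode STRING", "  tungsten_seqno INT", "  tungsten_row_id INT"];
  const cols = [...meta, ...columnLines(columns)].join(",\n");
  return (
    `CREATE EXTERNAL TABLE ${db}.stage_xxx_${table}\n(\n${cols})\n` +
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' ESCAPED BY '\\\\'\n" +
    "LINES TERMINATED BY '\\n'\n" +
    `STORED AS TEXTFILE LOCATION '${location}';`
  );
}

// Hypothetical MySQL column definitions for the sales table.
const columns = [
  { name: "id", type: "INT" },
  { name: "salesman", type: "VARCHAR(64)" },
  { name: "planet", type: "VARCHAR(64)" },
  { name: "value", type: "DOUBLE" },
];
console.log(baseTableDdl("sales", "sales", columns));
console.log(stagingTableDdl("sales", "sales", columns, "/user/tungsten/staging/sales/sales"));
```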
  • 25. Creating a Materialized View in Theory
      [Diagram: Log #1, Log #2, ... Log #N feed a MAP step (sort by key(s), then transaction order) and a REDUCE step (emit the last row per key if it is not a delete).]
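The map/reduce idea fits in a few lines when the staging rows are in memory: sort by key, then by transaction order (seqno, then row id), and keep the last row of each key unless that last operation was a delete. A sketch, assuming a single key column, opcodes `I` (insert) and `D` (delete), and updates arriving as a delete followed by an insert:

```javascript
// Order helper for keys that may be numbers or strings.
function compareValues(x, y) {
  return x < y ? -1 : x > y ? 1 : 0;
}

// Rebuild current table contents from staging rows.
// Assumes one key column and opcodes "I"/"D" (updates = delete + insert).
function materialize(stagingRows, keyColumn) {
  // "Map" side: sort by key, then by transaction order.
  const sorted = [...stagingRows].sort(
    (a, b) =>
      compareValues(a[keyColumn], b[keyColumn]) ||
      a.tungsten_seqno - b.tungsten_seqno ||
      a.tungsten_row_id - b.tungsten_row_id
  );
  // "Reduce" side: keep the last row of each key unless it is a delete.
  const result = [];
  sorted.forEach((row, i) => {
    const next = sorted[i + 1];
    const lastForKey = next === undefined || next[keyColumn] !== row[keyColumn];
    if (lastForKey && row.tungsten_opcode !== "D") {
      const { tungsten_opcode, tungsten_seqno, tungsten_row_id, ...data } = row;
      result.push(data);
    }
  });
  return result;
}

const stagingRows = [
  { tungsten_opcode: "I", tungsten_seqno: 1, tungsten_row_id: 1, id: 1, salesman: "Fred", planet: "Earth", value: 10 },
  { tungsten_opcode: "I", tungsten_seqno: 2, tungsten_row_id: 1, id: 2, salesman: "Alice", planet: "Mars", value: 20 },
  // An update of id 1 in seqno 3, arriving as delete + insert.
  { tungsten_opcode: "D", tungsten_seqno: 3, tungsten_row_id: 1, id: 1, salesman: "Fred", planet: "Earth", value: 10 },
  { tungsten_opcode: "I", tungsten_seqno: 3, tungsten_row_id: 2, id: 1, salesman: "Fred", planet: "Venus", value: 15 },
  // id 2 is deleted in seqno 4.
  { tungsten_opcode: "D", tungsten_seqno: 4, tungsten_row_id: 1, id: 2, salesman: "Alice", planet: "Mars", value: 20 },
];
console.log(materialize(stagingRows, "id")); // one row: id 1 on Venus
```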
  • 27. Creating a Materialized View in Hive
      $ hive
      ...
      hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
      hive> FROM (
              SELECT sales.*
              FROM sales.stage_xxx_sales sales
              DISTRIBUTE BY id
              SORT BY id,tungsten_seqno,tungsten_row_id
            ) map1
            INSERT OVERWRITE TABLE sales.sales
            SELECT TRANSFORM(
              tungsten_opcode,tungsten_seqno,tungsten_row_id,id,
              salesman,planet,value)
            USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,salesman,planet,value'
            AS id INT,salesman STRING,planet STRING,value DOUBLE;
      (Slide annotations: the subquery is the MAP phase; the TRANSFORM clause is the REDUCE phase.)
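In the Hive version, DISTRIBUTE BY and SORT BY perform the map-side work, so the reducer only walks rows that are already grouped and ordered; Hive's TRANSFORM hands them to the script as tab-separated lines. The following is not tungsten-reduce itself but a hypothetical streaming equivalent, assuming the column order from the query (opcode, seqno, row id, then the table columns), the key in the first table column, and `D` as the delete opcode. It holds only one row at a time.

```javascript
// Hypothetical streaming equivalent of the reduce step (not tungsten-reduce).
const META_COLUMNS = 3; // tungsten_opcode, tungsten_seqno, tungsten_row_id
const KEY_INDEX = 3;    // id is the first table column

// Yield the final image of a key, unless its last change was a delete.
function* emit(fields) {
  if (fields[0] !== "D") yield fields.slice(META_COLUMNS).join("\t");
}

// Rows arrive sorted by key, seqno and row id as tab-separated lines,
// so only the latest row of the current key needs to be kept.
function* reduceSorted(lines) {
  let pending = null;
  for (const line of lines) {
    const fields = line.split("\t");
    if (pending !== null && pending[KEY_INDEX] !== fields[KEY_INDEX]) yield* emit(pending);
    pending = fields;
  }
  if (pending !== null) yield* emit(pending);
}

const sortedInput = [
  "I\t1\t1\t1\tFred\tEarth\t10",
  "D\t3\t1\t1\tFred\tEarth\t10",
  "I\t3\t2\t1\tFred\tVenus\t15",
  "I\t2\t1\t2\tAlice\tMars\t20",
  "D\t4\t1\t2\tAlice\tMars\t20",
];
for (const out of reduceSorted(sortedInput)) console.log(out);
```

To run as an actual TRANSFORM script, the same generator could be fed lines from standard input (for example through Node's `readline` module) with each result written to standard output. The `-k` and `-c` options of tungsten-reduce appear to play the role of `KEY_INDEX` and the column layout here.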
  • 28. Comparing MySQL and Hadoop Data
      $ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
      ...
      $ /opt/continuent/tungsten/bristlecone/bin/dc \
          -url1 jdbc:mysql:thin://logos1:3306/sales \
          -user1 tungsten -password1 secret \
          -url2 jdbc:hive2://localhost:10000 \
          -user2 'tungsten' -password2 'secret' -schema sales \
          -table sales -verbose -keys id \
          -driver org.apache.hive.jdbc.HiveDriver
      22:33:08,093 INFO DC - Data comparison utility
      ...
      22:33:24,526 INFO Tables compare OK
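Conceptually, a keyed comparison like the one `dc` performs matches rows from both sides on the key and reports rows that are missing, extra or different. The sketch below illustrates the idea on in-memory rows and is not how dc is implemented; it assumes a single key column and compares values as strings, because MySQL and Hive can return different types for the same column.

```javascript
// Illustrative sketch, not dc's implementation: single key, string comparison.
function compareTables(sourceRows, targetRows, key) {
  const index = (rows) => new Map(rows.map((r) => [String(r[key]), r]));
  const source = index(sourceRows);
  const target = index(targetRows);
  const diffs = [];
  for (const [k, row] of source) {
    const other = target.get(k);
    if (!other) diffs.push({ key: k, problem: "missing in target" });
    else if (!sameRow(row, other)) diffs.push({ key: k, problem: "values differ" });
  }
  for (const k of target.keys()) {
    if (!source.has(k)) diffs.push({ key: k, problem: "extra in target" });
  }
  return diffs;
}

// Column-by-column comparison using the string form of each value.
function sameRow(a, b) {
  const cols = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const c of cols) if (String(a[c]) !== String(b[c])) return false;
  return true;
}

const mysqlRows = [{ id: 1, planet: "Venus", value: 15 }, { id: 2, planet: "Mars", value: 20 }];
const hiveRows = [{ id: 1, planet: "Venus", value: "15" }, { id: 3, planet: "Pluto", value: 5 }];
console.log(compareTables(mysqlRows, hiveRows, "id"));
```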
  • 29. Doing It All at Once
      $ git clone \
          https://github.com/continuent/continuent-tools-hadoop.git

      $ cd continuent-tools-hadoop

      $ bin/load-reduce-check \
          -U jdbc:mysql:thin://logos1:3306/sales \
          -s sales --verbose
  • 30. Demo #2: Constructing and Checking a Materialized View
  • 32. MySQL to Hadoop Fan-In Architecture
      [Diagram: three MySQL masters (m1, m2, m3), each with its own master replicator reading row-based replication (RBR) logs; on the slave side, replicator services m1, m2 and m3 all apply into a single Hadoop cluster (many nodes).]
  • 33. Integration with Provisioning
      [Diagram: the Tungsten master replicator reads the MySQL binlog (binlog_format=row); the Tungsten slave replicator (hadoop) writes CSV files into the Hadoop cluster, accessed via Hive; an initial provisioning run loads existing data through Sqoop/ETL.]
  • 34. On-Demand Provisioning via Parallel Extract
      Master-side filtering (binlog_format=row):
      • pkey: fill in primary key info
      • colnames: fill in column names
      • cdc: add update type and schema/table info
      • source: add source DBMS
      • replicate: subset tables to be replicated
      • (other filters as needed)
      [Diagram: the Tungsten master replicator extracts from the MySQL tables; the Tungsten slave replicator (hadoop) writes CSV files and loads the raw CSV into HDFS (e.g., via LOAD DATA to Hive) on the Hadoop cluster, where the data is accessed via Hive.]
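Parallel extraction means splitting each table into independent chunks that several workers can read at once. One common technique, shown here as an assumption rather than a description of Tungsten's own chunking, divides a numeric primary-key range into contiguous slices and issues one range-bounded SELECT per slice:

```javascript
// Assumed chunking strategy (not necessarily Tungsten's): split
// [minId, maxId] into at most `chunks` contiguous, non-overlapping ranges.
function keyRanges(minId, maxId, chunks) {
  const size = Math.ceil((maxId - minId + 1) / chunks);
  const ranges = [];
  for (let lo = minId; lo <= maxId; lo += size) {
    ranges.push([lo, Math.min(lo + size - 1, maxId)]);
  }
  return ranges;
}

// One SELECT per range; each can run on its own connection.
function chunkQueries(table, key, ranges) {
  return ranges.map(([lo, hi]) => `SELECT * FROM ${table} WHERE ${key} BETWEEN ${lo} AND ${hi}`);
}

console.log(chunkQueries("sales.sales", "id", keyRanges(1, 10, 3)));
```

Each chunk must also be tied to a known binlog position so that replication of later changes resumes from the right point; that coordination is outside this sketch.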
  • 35. Tungsten Replicator Roadmap
      • Parallel CSV file loading
      • Partition loaded data by commit time
      • Data formats and tools to support additional Hadoop clients as well as HBase
      • Replication out of Hadoop
      • Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger, ...)
  • 36. Getting Started with Continuent Tungsten
  • 37. Where Is Everything?
      • Tungsten Replicator 3.0 builds are now available on code.google.com:
        http://code.google.com/p/tungsten-replicator/
      • Replicator 3.0 documentation is available on the Continuent website:
        http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html
      • Tungsten Hadoop tools are available on GitHub:
        https://github.com/continuent/continuent-tools-hadoop
      Contact Continuent for support
  • 38. Commercial Terms
      • Replicator features are open source (GPL V2)
      • Investment elements
          • POC / development (walk-away option)
          • Production deployment
          • Annual support subscription
      • Governing principles
          • Annual subscription required
          • More upfront investment, less annual subscription
  • 39. We Do Clustering Too!
      Tungsten clusters combine off-the-shelf open source MySQL servers into data services with:
      • 24x7 data access
      • Scaling of load on replicas
      • Simple management commands
      ...without app changes or data migration
      [Diagram: GonzoPortal.com on Amazon US West, with Apache/PHP application servers connecting through Connectors.]
  • 40. In Conclusion: Tungsten Offers...
      • Fully automated, real-time replication from MySQL into Hadoop
      • Support for automatic transformation to HDFS data formats and creation of full materialized views
      • Positions users to take advantage of evolving real-time features in Hadoop
  • 41. Continuent web page: http://www.continuent.com
      Tungsten Replicator 3.0: http://code.google.com/p/tungsten-replicator
      Our blogs:
      • http://scale-out-blog.blogspot.com
      • http://mcslp.wordpress.com
      • http://www.continuent.com/news/blogs
      560 S. Winchester Blvd., Suite 500, San Jose, CA 95128
      Tel +1 (866) 998-3642
      Fax +1 (408) 668-1009
      e-mail: sales@continuent.com