View our quarterly customer education webcast to learn about the latest advancements in Syncsort DMX and DMX-h data integration software and in DataFunnel, our new easy-to-use, browser-based database onboarding application. Learn about DMX Change Data Capture and the advantages of true streaming over micro-batch.
View this webcast on demand to hear the latest news on:
• Improvements in Syncsort DMX and DMX-h
• What’s next in the new DataFunnel interface
• Streaming data in DMX Change Data Capture
• Hadoop 3 support in Syncsort Integrate products
Customer Education Webcast: New Features in Data Integration and Streaming CDC
1. New Features Plus Streaming Change Data Capture
Paige Roberts, Integrate Product Marketing Manager
Ashwin Ramachandran, Integrate Product Manager
2. New Features Plus Streaming Change Data Capture
• Data Integration Product News
  1. Basics – Syncsort Data Integration Products
  2. Lineage, Hive Improvements, Impala Support, …
  3. Hadoop 3 Support
  4. Onboarding User Experience Improvements – DataFunnel
• Streaming Change Data Capture Demo
• Streaming Change Data Capture
3. Disclaimer
Any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment.
Syncsort undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
5. Syncsort DMX: High Performance ETL Software
• Template-driven design for:
o High performance ETL
o SQL migration/Database ELT offload
o Mainframe data movement
• Small footprint on commodity hardware
• High-speed flat file processing
• Self-tuning engine – Intelligent Execution
6. DMX ETL: What We Deliver
Typical Results
Efficient
• Up to 75% less CPU
• Up to 75% less memory
• Up to 90% less storage
Fast
• Up to 10x lower elapsed processing times
• Linear scalability
Easy
• Install in minutes
• Develop in hours
• Deploy in weeks
• Never tune again
9-month payback*, 65% lower TCO, 200% ROI*
*Source: The Total Economic Impact of Syncsort DMExpress™, December 2011, Forrester Research, Inc.
7. Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration
DMX: High Performance ETL Software
• Template-driven design for:
o High performance ETL
o SQL migration/DB offload
o Mainframe data movement
• Lightweight footprint on commodity hardware
• High-speed flat file processing
• Self-tuning engine
DMX-h: High Performance Hadoop ETL Software
• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on cluster
• Use-case Accelerators to fast-track development
• Broad-based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per-node scalability and throughput
Design Once, Deploy Anywhere
Intelligent Execution - Insulate your organization from underlying complexities of Hadoop.
Get excellent performance every time
without tuning, load balancing, etc.
No re-design, re-compile, no re-work ever
• Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x
• Move from dev to test to production
• Move from on-premise to Cloud
• Move from one Cloud to another
Use existing ETL skills
No parallel programming – Java, MapReduce, Spark …
No worries about:
• Mappers, Reducers
• Big side or small side of joins …
Design once in the visual GUI, deploy anywhere!
• On-premise, Cloud
• MapReduce, Spark, future platforms
• Windows, Unix, Linux
• Batch, streaming
• Single node, cluster
9. DMX Versatility: Desktop, Server, Cluster, Cloud
Run your DMX tasks and jobs in a variety of ways on many different platforms.
Desktop
• Test and execute DMX jobs on your Windows laptop or desktop
Server
• Execute jobs stand-alone on an ETL server
• Scales to take advantage of the cores on any size server, for any size data
• Works with enterprise schedulers or with its own scheduler
Cluster
• If you move to Hadoop or Spark, DMX-h runs the same jobs designed in DMX
• Run jobs as MapReduce or Spark processes that take full advantage of cluster resources, or run on the edge node, with no coding and no tuning
Cloud
• Move to the Cloud with the same ease: DMX/DMX-h is Cloud-agnostic
• Microsoft Azure Gold Cloud Partner, Google Cloud Partner, Amazon Cloud Partner
10. Get Your Database Data into Hadoop at the Press of a Button
DMX/DMX-h Includes DMX DataFunnel to:
Funnel hundreds of tables at once into your data lake
• Extract, map and move whole DB schemas in one invocation
• Extract from DB2, Oracle, Teradata, Netezza, S3, Redshift …
• To SQL Server, Postgres, Redshift, Hive, and HDFS
• Automatically create target tables
Process multiple funnels in parallel on edge node or data nodes
• Order data flows by dependencies
• Leverage DMX-h high performance data processing engine
Filter unwanted data before extraction
• Data type filtering
• Table, record or column exclusion / inclusion
In-flight transformations and cleansing
DMX DataFunnel™
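The "order data flows by dependencies" and "process multiple funnels in parallel" points describe a familiar scheduling idea: load parent tables before the tables that reference them, and run independent loads side by side. The following is a minimal Python sketch of that idea using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not DataFunnel's implementation; the table names and the `load_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical funnel graph: each table maps to the tables it depends on
# (e.g. a table whose foreign keys reference other tables).
dependencies = {
    "orders": {"customers", "products"},
    "order_items": {"orders", "products"},
    "customers": set(),
    "products": set(),
}


def load_waves(deps):
    """Group flows into waves. Every flow in a wave has all of its
    dependencies loaded by earlier waves, so a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(load_waves(dependencies), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Run as a script, this prints `wave 1: customers, products`, `wave 2: orders` and `wave 3: order_items`. Grouping by waves is one simple design choice; a real scheduler could instead start each flow the moment its own dependencies finish.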
11. DMX Change Data Capture
Keep data in sync in real time
• Without overloading networks.
• Without affecting source database
performance.
• Without coding or tuning.
Reliable transfer of data you can trust even if connectivity fails on either side.
• Auto restart.
• No data loss.
Real-time replication with transformation, conflict resolution, collision monitoring, tracking and auditing
[Diagram: replication between files, RDBMS, streams, mainframe, cloud, OLAP and data lake endpoints]
Broad Source and Target Support
• Mainframe, IBM i – Db2, VSAM, …
• Streams – Kafka, Amazon Kinesis, …
• Relational databases – Oracle, SQL Server, …
• Cloud – MS Azure SQL, S3, …
• OLAP databases – Teradata, …
• Hadoop / Big Data – Hive, HDFS, Impala, …
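The "auto restart, no data loss" promise is commonly delivered by checkpointing: the replicator records the position of the last change it applied, and after a failure it re-reads the change log and skips anything at or before that position. The Python sketch below illustrates that general technique only. It assumes each change event carries a monotonically increasing sequence number, and `ChangeEvent` and `Replicator` are hypothetical names, not part of DMX Change Data Capture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    seq: int                    # position in the source change log (assumed monotonic)
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the changed row
    row: Optional[dict] = None  # new row image; None for deletes


class Replicator:
    def __init__(self) -> None:
        self.target: dict = {}  # stands in for the target table, keyed by primary key
        self.checkpoint = 0     # sequence number of the last applied event

    def apply(self, events) -> None:
        for ev in events:
            if ev.seq <= self.checkpoint:
                continue        # already applied before a restart: skip it
            if ev.op == "delete":
                self.target.pop(ev.key, None)
            else:               # insert and update both write the full row image
                self.target[ev.key] = ev.row
            self.checkpoint = ev.seq


log = [
    ChangeEvent(1, "insert", 1, {"name": "Ada"}),
    ChangeEvent(2, "insert", 2, {"name": "Lin"}),
    ChangeEvent(3, "update", 1, {"name": "Ada L."}),
    ChangeEvent(4, "delete", 2),
]

if __name__ == "__main__":
    rep = Replicator()
    rep.apply(log[:2])  # simulate a failure after the first two events
    rep.apply(log)      # restart: replay the whole log from the beginning
    print(rep.target, rep.checkpoint)  # {1: {'name': 'Ada L.'}} 4
```

Replaying after the simulated failure yields the same target as applying the log once. In a real replicator the checkpoint has to be committed atomically with the target change; otherwise, depending on which is written first, a crash between the two writes could re-apply or skip an event.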
13. Hive Support Enhancements
• JDBC connectivity
• Support for partitioned tables: ORC, Parquet, Avro, HDFS
• Support for Truncate and Insert
• Automatic creation of Hive and other HCatalog-supported tables
• Direct distributed processing of Hive
• Update of Hive statistics
• Use Hive tables for lookups
• Change data capture target
• Hive ACID Merge support – Updates, inserts, deletes, and upserts in Hive
• Full support for complex types – arrays, structs, etc.
• Improved usability and support for mapping entire arrays, array elements and composite fields
• Automatic creation of tables in which the hierarchy of composite fields is either flattened or maintained
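Hive ACID MERGE (available on transactional tables since Hive 2.2) combines `WHEN MATCHED ... THEN UPDATE`, `WHEN MATCHED ... THEN DELETE` and `WHEN NOT MATCHED THEN INSERT` clauses in one statement, which is how updates, inserts, deletes and upserts can be applied in a single pass. The Python sketch below mimics those semantics on in-memory rows so the rules are easy to see. The `id` key and `is_deleted` flag column are assumptions for illustration, and the sketch does not show how DMX generates or issues the statement.

```python
def merge(target, source, key="id", delete_flag="is_deleted"):
    """Mimic SQL MERGE on lists of dict rows (hypothetical helper):
    matched + delete flag -> DELETE, matched -> UPDATE,
    not matched -> INSERT. UPDATE and INSERT together give an upsert."""
    result = {row[key]: dict(row) for row in target}  # copy; inputs stay untouched
    for row in source:
        k = row[key]
        values = {c: v for c, v in row.items() if c != delete_flag}
        if k in result:
            if row.get(delete_flag):
                del result[k]               # WHEN MATCHED AND flag THEN DELETE
            else:
                result[k].update(values)    # WHEN MATCHED THEN UPDATE SET ...
        elif not row.get(delete_flag):
            result[k] = values              # WHEN NOT MATCHED THEN INSERT ...
    return sorted(result.values(), key=lambda r: r[key])


if __name__ == "__main__":
    target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
    source = [
        {"id": 1, "qty": 7, "is_deleted": False},  # update
        {"id": 2, "is_deleted": True},             # delete
        {"id": 3, "qty": 1, "is_deleted": False},  # insert
    ]
    print(merge(target, source))  # [{'id': 1, 'qty': 7}, {'id': 3, 'qty': 1}]
```

Like SQL MERGE, the sketch assumes at most one source row per key; Hive, by default, raises an error when several source rows match the same target row.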
14. Cloudera Impala Support
• JDBC connectivity
• Both read and write support
• Support for Impala tables backed by: Parquet, Kudu
• Automatic creation of Impala tables
• Direct distributed processing of Impala – on edge node, framework-designated node, or distributed on cluster with MapReduce or Spark
• Update of Impala statistics
• Use Impala tables for lookups
• Full support for update and insert
• Change data capture target support
16. Hadoop 3 Support
First data integration partner certified on Cloudera 6!
Certified integration to:
• Cloudera Director
• Cloudera Manager
• Cloudera Navigator
• Apache Sentry
17. Govern and Track Everything for Compliance
• Metadata and data lineage for Hive, Avro and Parquet through HCatalog
• Metadata lineage export and REST API from DMX/DMX-h
o Simplify audits, analytics dashboards, metrics
o Integrate with enterprise metadata repositories
• Cloudera Navigator certified integration
o Audit and track data from source to cluster
o HDFS, YARN, Spark and other metadata
o Lineage, tagging
o Business and structural metadata
• Apache Atlas ingestion lineage integration
o Audit and track data from source to cluster
o Lineage, tagging
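End-to-end lineage amounts to a directed graph of datasets connected by the jobs that read and write them; an auditor's question "where did this table come from?" is a walk upstream through that graph. The Python sketch below shows that walk over a hand-written edge list. The dataset names are hypothetical, and the code does not call the DMX/DMX-h lineage REST API, Cloudera Navigator or Apache Atlas; it only illustrates the kind of query their lineage metadata supports.

```python
from collections import deque

# Hypothetical lineage edges: (input dataset, output dataset, producing process)
EDGES = [
    ("oracle.sales.orders", "hdfs:/landing/orders", "onboarding job"),
    ("db2.crm.customers", "hdfs:/landing/customers", "onboarding job"),
    ("hdfs:/landing/orders", "hive.sales.orders", "Spark job"),
    ("hdfs:/landing/customers", "hive.sales.customers", "Spark job"),
    ("hive.sales.orders", "hive.marts.revenue_by_customer", "HiveQL"),
    ("hive.sales.customers", "hive.marts.revenue_by_customer", "HiveQL"),
]


def upstream(dataset, edges=EDGES):
    """Return every dataset that feeds `dataset`, directly or indirectly (BFS)."""
    parents = {}
    for src, dst, _ in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([dataset])
    while queue:
        for src in parents.get(queue.popleft(), []):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


def original_sources(dataset, edges=EDGES):
    """Upstream datasets that nothing else produces: the systems outside the cluster."""
    produced = {dst for _, dst, _ in edges}
    return {d for d in upstream(dataset, edges) if d not in produced}


if __name__ == "__main__":
    print(sorted(original_sources("hive.marts.revenue_by_customer")))
```

Because the walk keeps a `seen` set, shared ancestors are visited once, which matters once many jobs fan in and out of the same landing tables.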
18. Onboard ALL Enterprise Data – Mainframe to Streaming
Data Sources
• Access data from streaming and batch sources outside the cluster.
• Onboard data, modifying it on the fly to match the Hadoop storage model, or store it unchanged for archive and compliance.
Data Lake
• Transform, join, cleanse and enhance data in the cluster with MapReduce or Spark.
Analytics, Visualization, Machine Learning
• Complete data: analytics, visualizations, and machine learning algorithms get ALL necessary data.
19. Get Source to Consumption, End-to-End Data Lineage
Data Sources
• Access data from streaming and batch sources outside the cluster.
• Onboard data, modifying it on the fly to match the Hadoop storage model, or store it unchanged for archive and compliance.
Data Lineage REST API
• Pass source-to-cluster data lineage info to Navigator or Atlas.
Data Lake
• Transform, join, cleanse and enhance data in the cluster with MapReduce or Spark.
• Data changes are made by MapReduce, Spark and HiveQL.
• Navigator or Atlas gathers any other changes made to data on the cluster.
Analytics, Visualization, Machine Learning
• Complete data: analytics, visualizations, and machine learning algorithms get ALL necessary data.
Auditors get end-to-end data lineage.
22. Onboard Relational Data Quickly
• Faster Parallel Data Extraction
• Extract, map and move whole DB schemas in one invocation
• Extract from Oracle, Db2, MS SQL Server, Teradata, Netezza and Redshift
• To SQL Server, Postgres, Hive, HDFS, Redshift and Amazon S3
• Automatically create target Hive and HCat tables
• Onboard hundreds of tables into your cluster
• Onboard whole database schemas at once
• Create target tables automatically in Hive
• Transform data in flight
• Filter unwanted:
o Tables
o Rows
o Data types
o Columns
DMX DataFunnel™
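Filtering tables, columns, data types and rows before extraction means less data crosses the network. The Python sketch below shows one way such filters could be expressed and turned into per-table `SELECT` statements. The catalog, the glob-style include/exclude patterns and the helper names are hypothetical and do not reflect DataFunnel's actual configuration format.

```python
from fnmatch import fnmatchcase

# Hypothetical source catalog: table -> list of (column, data type)
CATALOG = {
    "SALES.ORDERS": [("ID", "INTEGER"), ("AMOUNT", "DECIMAL"), ("NOTES", "CLOB")],
    "SALES.ORDERS_BAK": [("ID", "INTEGER")],
    "HR.SALARIES": [("EMP_ID", "INTEGER"), ("SALARY", "DECIMAL")],
}


def plan_extraction(catalog, include=("*",), exclude=(), skip_types=(), skip_columns=()):
    """Return {table: [columns]} left after table, data type and column filters."""
    plan = {}
    for table, columns in catalog.items():
        if not any(fnmatchcase(table, p) for p in include):
            continue  # table not selected
        if any(fnmatchcase(table, p) for p in exclude):
            continue  # table explicitly excluded
        kept = [name for name, dtype in columns
                if dtype not in skip_types and name not in skip_columns]
        if kept:
            plan[table] = kept
    return plan


def extraction_queries(plan, row_filters=None):
    """Build one SELECT per table; row filters become WHERE clauses.
    Row filters are raw SQL here, so they must come from trusted configuration."""
    row_filters = row_filters or {}
    queries = {}
    for table, cols in plan.items():
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        if table in row_filters:
            sql += f" WHERE {row_filters[table]}"
        queries[table] = sql
    return queries


if __name__ == "__main__":
    plan = plan_extraction(CATALOG, include=("SALES.*",), exclude=("*_BAK",),
                           skip_types=("CLOB",))
    print(extraction_queries(plan, {"SALES.ORDERS": "AMOUNT > 0"}))
```

Pushing the filters into the source query, rather than discarding data after it is read, is what keeps unwanted tables, rows and columns off the wire.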
27. Keep Your Data Fresh with Real-Time Change Data Capture
Keep data in sync in real time
• Without overloading networks.
• Without affecting source database
performance.
• Without coding or tuning.
• JDBC Encryption support
• Streaming to Apache Kafka
• Streaming to Amazon Kinesis
Reliable transfer of data you can trust even if connectivity fails on either side.
• Auto restart.
• No data loss.
Real-time replication with transformation, conflict resolution, collision monitoring, tracking and auditing
[Diagram: replication between files, RDBMS, streams, mainframe, cloud, OLAP and data lake endpoints]
28. Real-Time Change Data Capture – New Sources and Targets
Sources
• MS SQL Server
• MS SQL Server 2017
• MS SQL Server Standard Edition
• Oracle
• Oracle RAC
• Oracle 12c – For Cloud
• Apache Kafka
• IBM Db2 for z, i, LUW
• VSAM
• IBM Informix
• Sybase
Targets
• Apache Kafka – Streaming
• Amazon Kinesis – Streaming
• Oracle
• Oracle RAC
• Oracle 12c – For Cloud
• SQL Server
• MS SQL Server 2017
• MS SQL Server Standard Edition
• Hive, HDFS, Impala (Kudu, Parquet, ORC)
• IBM Db2
• MS Azure SQL
• PostgreSQL
• MySQL
• Sybase
• Teradata
30. THANK YOU!
For more information on DMX Change Data Capture:
http://www.syncsort.com/en/Resource-Center/BigData/SolutionSheets/Syncsort-DMX-Change-Data-Capture