Don’t wait! Get the ultimate in data agility with the software you own today!
While other data integration companies hold out the possibility of simplifying cross-platform data integration down the road, our customers have enjoyed these capabilities for close to two years! And our industry-leading Big Data integration software keeps getting better.
Join our upcoming customer education webcast to learn how the latest advancements in Syncsort DMX and DMX-h software empower your organization to get maximum business value from your data, fast, whether on premises or in the cloud.
In this webcast, we will cover important new features that help you speed development, adapt to the latest data management requirements, and leverage rapid innovation in Big Data technology, including:
• Unparalleled simplicity and flexibility to adapt to changing workloads and frameworks, with our new Integrated Workflow capability and support for Spark 2.0
• Enhanced Cloud capabilities, including support for more sources and integration with Cloudera Director
• Unprecedented data governance flexibility and choice with new open metadata management capabilities, as well as Apache Atlas integration
We will also preview an exciting new integration with Syncsort’s industry-leading Trillium Data Quality software.
2. Agenda
What’s New?
• Big Data + Quality
• DMX/DMX-h
• Big Data Integration
– Access
– Integrate
– Comply
– Simplify
– Extend
What’s Coming Soon?
Integrated Workflow Demo
Syncsort Confidential and Proprietary - do not copy or distribute
3. BIG DATA + QUALITY!
What’s New
4. Bringing Together Best-of-Breed Data Integration & Data Quality
“Existing customers and prospects can view this acquisition as positive. It extends Syncsort's information management capabilities through strengthened data quality and data governance functionality for the use cases they encounter.”
- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016
5. Foundational Components of Any Enterprise Data Management Strategy
Data integration:
– Best-in-class data integration functionality & performance
– Early adopter & leader in Hadoop, Spark, Cloud, Real-time
– Extensive partner ecosystem, and out-of-the-box integration with the Hadoop tool stack
– Most robust mainframe access & integration capabilities in the market
Data quality:
– Best-in-class, broad data quality capabilities & functions
– Expertise in Cloud, Big Data & Real-time
– Most robust profiling, parsing, standardization and matching capabilities in the market
– Supports a broad range of verticals and business data quality objectives
6. DMX / DMX-h
What’s New
7. Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration
DMX-h: High Performance Hadoop ETL Software
• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on Hadoop
• Use-case Accelerators to fast-track development
• Broad-based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per-node scalability and throughput
DMX: High Performance ETL Software
• Template-driven design for:
o High performance ETL
o SQL migration/DB offload
o Mainframe data movement
• Lightweight footprint on commodity hardware
• High-speed flat file processing
• Self-tuning engine
8. SIMPLIFY BIG DATA INTEGRATION
What’s New
9. Simplify Big Data Integration with Syncsort
Access: Get best-in-class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
10. Access: Get Your Database Data into Hadoop at the Press of a Button
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza
‒ Load into SQL Server, Postgres, Hive, HDFS and S3
‒ Automatically create target Hive and HCat tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage DMX-h high performance data processing engine
• Extract only the data you want
‒ Data type filtering
‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing
DMX DataFunnel™: Move thousands of tables in days, not weeks!
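"Order data flows by dependencies" and "process multiple funnels in parallel" fit together naturally: tables that reference other tables wait for their parents, and everything whose parents are done can load at the same time. The sketch below shows that idea with Python's standard `graphlib`; the table names and foreign-key map are hypothetical, and this is not how DataFunnel itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key map: each table lists the tables it references.
# A parent table must be loaded before any child table that references it.
dependencies = {
    "orders": {"customers", "products"},
    "order_items": {"orders", "products"},
    "customers": set(),
    "products": set(),
}

def load_waves(deps):
    """Group tables into waves; all tables within one wave can load in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every parent of these is already loaded
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves

for number, wave in enumerate(load_waves(dependencies), start=1):
    print(f"wave {number}: {wave}")
```

Grouping by waves rather than emitting a single linear order is what leaves room for parallelism: `customers` and `products` have no parents, so they can load side by side before `orders`.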
11. Access: Bring ALL Enterprise Data Securely to the Data Lake
Database
– RDBMS
– MPP
– NoSQL
Mainframe
– DB2/z
– VSAM
– FTP Binary
– Mainframe Fixed
– Mainframe Variable
– Mainframe Distributable
– COBOL IT line sequential
– All file formats…
Big Data
– JSON
– Avro
– Parquet
– ORC
– Hive (Enhancements)
Streaming
– Kafka
– MapR Streams
– HDF (NiFi)
Cloud
– Amazon S3
– Amazon Redshift, RDS
– Google Cloud Storage
…and more!
12. Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Direct distributed processing of Hive
Update of Hive statistics
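Hive stores a partitioned table as one directory per partition value, named `column=value` under the table location, and the partition column values live in the path rather than in the data files. A minimal sketch of how incoming rows map to those directories, assuming a hypothetical table root `/warehouse/sales` partitioned by `year` and `month`; it ignores Hive's escaping of special characters and does not show how DMX-h itself writes partitions.

```python
from collections import defaultdict

def partition_path(table_root, partition_cols, record):
    """Build a Hive-style partition directory, e.g. <root>/year=2016/month=12."""
    parts = [f"{col}={record[col]}" for col in partition_cols]
    return "/".join([table_root.rstrip("/")] + parts)

def group_by_partition(table_root, partition_cols, records):
    """Bucket rows by target partition directory. Partition columns are dropped
    from each row because Hive encodes them in the directory path."""
    buckets = defaultdict(list)
    for rec in records:
        path = partition_path(table_root, partition_cols, rec)
        row = {k: v for k, v in rec.items() if k not in partition_cols}
        buckets[path].append(row)
    return dict(buckets)

# Usage with hypothetical sales rows
rows = [
    {"year": 2016, "month": 11, "amount": 1},
    {"year": 2016, "month": 12, "amount": 2},
    {"year": 2016, "month": 12, "amount": 3},
]
for path, part_rows in group_by_partition("/warehouse/sales", ["year", "month"], rows).items():
    print(path, part_rows)
```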
13. Simplify Big Data Integration with Syncsort
Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
14. Integrate: Single Interface for Streaming & Batch
Kafka, MapR Streams, Apache NiFi, and Spark!
Combine legacy batch and cutting-edge streaming data sources
Easy development in the GUI: no need to write Scala, C or Java code
Spark 2.0!
Simplify Streaming Data Integration
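The "single interface for streaming and batch" idea is that one transformation definition is written once and fed either a finite dataset or an unbounded stream. The sketch below illustrates that principle with a Python generator; it is a conceptual analogy, not the DMX-h API, and `fake_stream` is a hypothetical stand-in for a Kafka or MapR Streams consumer.

```python
from typing import Iterable, Iterator

def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """One transformation, reused unchanged for batch and streaming input."""
    for rec in records:
        if rec.get("amount") is None:
            continue  # drop incomplete records
        yield {**rec, "amount_cents": round(rec["amount"] * 100)}

# Batch: a finite, fully materialized list
batch = [{"id": 1, "amount": 2.5}, {"id": 2, "amount": None}]
print(list(enrich(batch)))

# Streaming: records arrive one at a time (hypothetical consumer stand-in)
def fake_stream():
    for i in range(3):
        yield {"id": i, "amount": i * 1.25}

for out in enrich(fake_stream()):
    print(out)
```

Because `enrich` consumes any iterable lazily, nothing in it depends on whether the input ends; only the source changes.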
15. Globalization Enhancements
Improved Fujitsu NetCOBOL support
Localization
Support for multi-byte copybooks
Complete support for ALL ICU code pages
– Drop-down list in the GUI shows the most common code pages at the top
– Remembers the most recent code page selection and pre-populates it
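Copybook support and code page handling matter because mainframe files store text in EBCDIC and numbers in formats such as packed decimal (COBOL `COMP-3`). The sketch below decodes a fixed-length record under a hypothetical layout (`NAME PIC X(5)` followed by `AMOUNT PIC S9(3)V99 COMP-3`). It uses the single-byte EBCDIC code page `cp037`, which ships with Python's standard codecs; multi-byte copybook fields would need a codec such as those provided by ICU. This illustrates the data formats only, not how DMX parses copybooks.

```python
from decimal import Decimal

# Hypothetical layout: NAME PIC X(5) (5 bytes) + AMOUNT PIC S9(3)V99 COMP-3 (3 bytes)
RECORD_LEN = 8

def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte, sign in the
    final nibble. 0xD marks a negative value; other sign nibbles are treated as positive."""
    nibbles = []
    for b in raw:
        nibbles.extend((b >> 4, b & 0x0F))
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)  # apply the implied decimal point (V99)

def parse_fixed_records(data: bytes, codepage: str = "cp037"):
    """Split a fixed-length mainframe file into records and decode each field."""
    if len(data) % RECORD_LEN:
        raise ValueError("data length is not a multiple of the record length")
    for off in range(0, len(data), RECORD_LEN):
        rec = data[off:off + RECORD_LEN]
        name = rec[0:5].decode(codepage).rstrip()  # EBCDIC text, space-padded
        amount = unpack_comp3(rec[5:8], scale=2)
        yield name, amount
```

For example, the packed bytes `12 34 5C` decode to `123.45` with two implied decimal places.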
16. Simplify Big Data Integration with Syncsort
Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
17. Comply: Manage
Cloudera Manager
– Deploy DMX-h across a Cloudera cluster
– Monitor DMX-h jobs
Apache Ambari
– Deploy DMX-h across Hortonworks and other clusters
– Monitor DMX-h jobs
Cloudera Director
– Deploy DMX-h on Cloudera in the cloud
– Elastically expand and reduce capacity as needed for workload spikes
18. Comply: Govern
Metadata and data lineage for Hive, Avro and Parquet through HCatalog
Metadata lineage export from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
Apache Atlas lineage integration
– Lineage, tagging
– Audit and track
(Technical preview available now)
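Exported lineage is what lets an auditor answer "which sources ultimately feed this Hive table?". A minimal sketch of that question as a graph traversal, assuming lineage arrives as hypothetical `(source, target)` pairs; the deck does not specify the DMX/DMX-h export format, and the dataset names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: (source dataset, target dataset)
EDGES = [
    ("db2.CUSTOMER", "hdfs:/landing/customer"),
    ("hdfs:/landing/customer", "hive.dim_customer"),
    ("oracle.ORDERS", "hive.fact_orders"),
    ("hive.dim_customer", "hive.fact_orders"),
]

def upstream(dataset, edges):
    """Return every dataset that feeds `dataset`, directly or indirectly (BFS)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([dataset])
    while queue:
        for src in parents.get(queue.popleft(), []):
            if src not in seen:  # guards against revisiting shared ancestors
                seen.add(src)
                queue.append(src)
    return seen

print(sorted(upstream("hive.fact_orders", EDGES)))
```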
19. Simplify Big Data Integration with Syncsort
Simplify: Design once, deploy anywhere, and insulate your organization from the rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premises or in the cloud.
20. Simplify: Same Solution – On Premises or In the Cloud
• ETL engine on AWS Marketplace, updated to version 9.x
• Available on EC2, EMR, Google Cloud
• S3 and Redshift connectivity
• First & only leading ETL engine on Docker Hub
• Google Cloud Storage connectivity
Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective
21. Intelligent Execution - Insulate your people from the underlying complexities of Hadoop.
Simplify: Design Once, Deploy Anywhere
Use existing ETL skills.
No worries about mappers, reducers, big side, small side, and so on.
Automatic optimization for best performance, load balancing, etc.
No changes or tuning required, even if you change execution frameworks
Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.
Intelligent Execution Layer
One interface to design jobs to run on:
Single Node, Cluster
MapReduce 1, 2.x, Spark, Spark 2.0
Windows, Unix, Linux
On-Premise, Cloud
Batch, Streaming
22. Intelligent Execution - Insulate your people from the underlying complexities of Hadoop.
Simplify: Design Once, Deploy Anywhere
Integrated Workflow
In a single job, combine any execution location, framework or style.
Ingest data on an edge node, then process on the cluster in a single workflow
Combine MapReduce ETL with Spark data analysis
Run extended tasks and custom functions in framework of your choice
25. Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks
• Enable data scientists to add new functions
• Ability to add custom transformation functions
– Shown in the GUI just like built-in functions
– Available via the function pull-down and signature
• Ability to add job extensions to the data flow
• Library published on Syncsort's GitHub
– Rounding Package
– Advanced Math Package
– Multiple Pivot options
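The deck lists a Rounding Package among the published functions without describing its contents. As a hypothetical illustration of why a transformation library might offer more than one rounding rule, the sketch compares half-up (commercial) and half-even (banker's) rounding with Python's `decimal` module; `round_to` is an illustrative name, not a DMX function.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to(value: str, places: int, mode: str = ROUND_HALF_UP) -> Decimal:
    """Round a decimal string to a fixed number of places using the given mode.
    Taking a string avoids binary floating-point error (float 2.345 is slightly below 2.345)."""
    quantum = Decimal(1).scaleb(-places)  # places=2 -> Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=mode)

print(round_to("2.345", 2))                   # half-up
print(round_to("2.345", 2, ROUND_HALF_EVEN))  # half-even
```

Half-up always rounds a trailing 5 away from zero, while half-even rounds it toward the even neighbor, which avoids a systematic upward bias when many values are summed.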
26. Integrate: Extend User Base with Data Transformation Language (DTL)
• Metadata-driven dynamic creation of DMX-h jobs
• Enables partners and end users to build on and extend DMX
• Human-readable, script-like interface for developing jobs
• Legacy ETL migrations to DMX
– Ability to import DTL into the DMX graphical user interface
– Maintain applications in the GUI
– Export metadata to DTL
28. Access: Keep Legacy and Modern Systems in Sync
• Capture changes in source database as they happen
• Update target systems automatically
• Capture changes in huge tables without straining network capacity
• Minimize impact to source database performance
Delta Change Data Capture
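"Update target systems automatically" means applying only the captured changes to the target instead of reloading whole tables, which is also why change capture eases network load on huge tables. The sketch below shows the apply side, assuming change events arrive in commit order as hypothetical `(operation, key, row)` tuples; the deck does not say how DMX captures changes or what its event format looks like.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical change event: (operation, primary key, new row or None for deletes)
Change = Tuple[str, Any, Optional[dict]]

def apply_changes(target: Dict[Any, dict], changes: Iterable[Change]) -> Dict[Any, dict]:
    """Apply captured change events, in commit order, to a target keyed by primary key."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = row  # upsert: the latest change for a key wins
        elif op == "delete":
            target.pop(key, None)  # tolerate deletes for rows never loaded
        else:
            raise ValueError(f"unknown operation: {op}")
    return target

# Usage: only three small events travel, not the whole table
target = {1: {"name": "Ann"}}
events = [("update", 1, {"name": "Anne"}), ("insert", 2, {"name": "Bo"}), ("delete", 1, None)]
print(apply_changes(target, events))
```

Applying events strictly in commit order matters: swapping the `update` and `delete` above would leave a row that the source no longer has.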
29. Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Distributed processing of Hive
Update of Hive statistics
Support for Hive tables with very complex arrays
30. Access: New User Experience for DataFunnel
DMX DataFunnel™