Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX and DMX-h

What’s New in DMX/DMX-h?
March 2017

Agenda
What’s New?
• Big Data + Quality
• DMX/DMX-h
• Big Data Integration
– Access
– Integrate
– Comply
– Simplify
– Extend
What’s Coming Soon?
Integrated Workflow Demo
Syncsort Confidential and Proprietary - do not copy or distribute

BIG DATA + QUALITY!
What’s New

Bringing Together Best-of-Breed Data Integration & Data Quality
“Existing customers and prospects can view this acquisition as
positive. It extends Syncsort's information management capabilities
through strengthened data quality and data governance
functionality for the use cases they encounter.”
- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016

Foundational Components of Any Enterprise Data Management Strategy
– Best-in-class data integration functionality & performance
– Early adopter & leader in Hadoop, Spark, Cloud, Real-time
– Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack
– Most robust mainframe access & integration capabilities in market
– Best-in-class, broad data quality capabilities & functions
– Expertise in Cloud, Big Data & Real-time
– Most robust profiling, parsing, standardization and matching capabilities in the market
– Support breadth of verticals and business data quality objectives

DMX / DMX-H
What’s New

Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration
DMX: High Performance ETL Software
• Template driven design for:
o High performance ETL
o SQL migration/DB offload
o Mainframe data movement
• Lightweight footprint on commodity hardware
• High speed flat file processing
• Self tuning engine
DMX-h: High Performance Hadoop ETL Software
• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on Hadoop
• Use-case Accelerators to fast-track development
• Broad based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per node scalability and throughput

SIMPLIFY BIG DATA INTEGRATION
What’s New

Simplify Big Data Integration with Syncsort
Access
Get best in class data ingestion capabilities for Hadoop.
Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.

Access: Get Your Database Data into Hadoop at the Press of a Button
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza
‒ To SQL Server, Postgres, Hive, HDFS and S3
‒ Automatically create target Hive and HCat tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage DMX-h high performance data processing engine
• Extract only the data you want
‒ Data type filtering
‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing
DMX DataFunnel™
Move thousands of tables in days, not weeks!
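
As a rough illustration of two ideas on this slide, table inclusion / exclusion and ordering data flows by dependencies, the sketch below filters a list of tables by name pattern and then sorts them so that each table follows the tables it depends on. It is a generic Python example, not how DataFunnel is implemented or configured; the table names, the `depends_on` mapping, and both function names are hypothetical.

```python
from fnmatch import fnmatch
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def select_tables(tables, include=("*",), exclude=()):
    """Keep tables that match any include pattern and no exclude pattern."""
    return [
        t for t in tables
        if any(fnmatch(t, p) for p in include)
        and not any(fnmatch(t, p) for p in exclude)
    ]


def order_by_dependencies(tables, depends_on):
    """Return the tables so each one comes after the tables it depends on."""
    selected = set(tables)
    # Ignore dependencies on tables that were filtered out.
    graph = {
        t: {d for d in depends_on.get(t, ()) if d in selected}
        for t in tables
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical schema: order_items depends on orders, orders on customers.
tables = ["customers", "orders", "order_items", "tmp_audit"]
depends_on = {"orders": ["customers"], "order_items": ["orders"]}

selected = select_tables(tables, exclude=["tmp_*"])
plan = order_by_dependencies(selected, depends_on)
print(plan)
```

A topological sort raises an error when the dependencies contain a cycle, which is a useful early check before any data is moved.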

Access: Bring ALL Enterprise Data Securely to the Data Lake
Database
– RDBMS
– MPP
– NoSQL
Mainframe
– DB2/z
– VSAM
– FTP Binary
– Mainframe Fixed
– Mainframe Variable
– Mainframe Distributable
– COBOL IT line sequential
– All file formats…
Big Data
– JSON
– Avro
– Parquet
– ORC
– Hive (Enhancements)
Streaming
– Kafka
– MapR Streams
– HDF (NiFi)
Cloud
– Amazon S3
– Amazon Redshift, RDS
– Google Cloud Storage
… And more!

Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Direct distributed processing of Hive
Update of Hive statistics

Integrate
Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.

Integrate: Single Interface for Streaming & Batch
Kafka, MapR Streams, Apache NiFi, and Spark!
Combine legacy batch and cutting-edge streaming data sources
Easy development in GUI – no need to write Scala, C or Java code
Spark 2.0!
Simplify Streaming Data Integration

Globalization Enhancements
Improved Fujitsu NetCOBOL support
Localization
Support for multi-byte copybooks
Complete support of ALL ICU code pages
– Drop down list in GUI that provides most common code pages at the top
– Remembers most recent code page selection and pre-populates
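
To make the role of a code page concrete, here is a small sketch that decodes a fixed-width text field using a named code page. It uses Python's built-in codecs rather than ICU or DMX, so it only illustrates the general idea; `decode_fixed_field` is a hypothetical helper, and `cp037` (an EBCDIC code page) and `cp932` (a multi-byte Japanese code page) are chosen purely as examples.

```python
import codecs


def decode_fixed_field(raw: bytes, code_page: str = "cp037") -> str:
    """Decode a fixed-width text field and strip trailing space padding."""
    codecs.lookup(code_page)  # raises LookupError for an unknown code page name
    return raw.decode(code_page).rstrip()


# Single-byte EBCDIC: 0x40 is the space character in cp037.
ebcdic_field = "HELLO".encode("cp037") + b"\x40\x40\x40"
text = decode_fixed_field(ebcdic_field)

# Multi-byte code page: each of these three characters takes two bytes in cp932.
japanese_field = "データ".encode("cp932")
japanese_text = decode_fixed_field(japanese_field, "cp932")

print(text, japanese_text, len(japanese_field))
```

The same bytes decode to different text under different code pages, which is why choosing the right one per source matters.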

Comply
Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.

Comply: Manage
Cloudera Manager
– Deploy DMX-h across Cloudera cluster
– Monitor DMX-h jobs
Apache Ambari
– Deploy DMX-h across Hortonworks and other clusters
– Monitor DMX-h jobs
Cloudera Director
– Deploy DMX-h on Cloudera in the Cloud
– Elastically expand and reduce capacity as needed for spikes in workload

Comply: Govern
Metadata and data lineage for Hive, Avro and Parquet through HCatalog
Metadata lineage export from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
Apache Atlas lineage integration
– Lineage, tagging
– Audit and track
(Technical preview available now)

Simplify
Design once, deploy anywhere & insulate your organization from the rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premise or in the cloud.

Simplify: Same Solution – On Premise or In the Cloud
• ETL engine on AWS Marketplace – Update to version 9.x
• Available on EC2, EMR, Google Cloud
• S3 and Redshift connectivity
• First & only leading ETL engine on Docker Hub
• Google Cloud Storage connectivity
Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective

Simplify: Design Once, Deploy Anywhere
Intelligent Execution - Insulate your people from underlying complexities of Hadoop.
Use existing ETL skills.
No worries about mappers, reducers, big side, small side, and so on.
Automatic optimization for best performance, load balancing, etc.
No changes or tuning required, even if you change execution frameworks.
Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.
Intelligent Execution Layer
One interface to design jobs to run on:
Single Node, Cluster
MapReduce 1, 2.x, Spark, Spark 2.0
Windows, Unix, Linux
On-Premise, Cloud
Batch, Streaming

Simplify: Design Once, Deploy Anywhere
Integrated Workflow
In a single job, combine any execution location, framework or style.
Ingest data on an edge node, then process on the cluster in a single workflow
Combine MapReduce ETL with Spark data analysis
Run extended tasks and custom functions in framework of your choice

Integrated Workflow

ADD CUSTOM FUNCTIONALITY
Extend

Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks
• Enable data scientists to add new functions
• Ability to add custom transformation functions
– Shown in the GUI same as built-in functions
– Available via function pull-down and signature
• Ability to add job extensions to the data flow
• Publish a library in Syncsort GitHub
– Rounding Package
– Advanced Math Package
– Multiple Pivot options

Integrate: Extend User Base with Data Transformation Language (DTL)
• Metadata driven dynamic creation of DMX-h jobs
• Enables partners and end users to build on and extend DMX
• Human readable script-like interface for developing jobs
• Legacy ETL migrations to DMX
– Ability to import DTL to the DMX Graphical User Interface
– Maintain applications in the GUI
– Export metadata to DTL

WHAT’S NEXT?
Roadmap

Access: Keep Legacy and Modern Systems in Sync
• Capture changes in source database as they happen
• Update target systems automatically
• Capture changes in huge tables without straining network capacity
• Minimize impact to source database performance
Delta Change Data Capture
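
The general idea of a delta (what changed between two states of a table) can be sketched as a comparison of two keyed snapshots. This is a generic snapshot-diff illustration in Python, not a description of how the roadmap feature works; `row_digest`, `compute_delta`, and the sample rows are hypothetical.

```python
import hashlib


def row_digest(row):
    """Stable digest of a row's values, so rows can be compared cheaply."""
    joined = "\x1f".join(str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compute_delta(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns (inserts, updates, deletes): new rows, changed rows,
    and the keys of rows that disappeared.
    """
    prev_digests = {k: row_digest(r) for k, r in previous.items()}
    inserts = {k: r for k, r in current.items() if k not in prev_digests}
    updates = {
        k: r for k, r in current.items()
        if k in prev_digests and row_digest(r) != prev_digests[k]
    }
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes


previous = {1: ("Ann", "NY"), 2: ("Bob", "LA"), 3: ("Cy", "SF")}
current = {1: ("Ann", "NY"), 2: ("Bob", "SF"), 4: ("Di", "TX")}

inserts, updates, deletes = compute_delta(previous, current)
print(inserts, updates, deletes)
```

Only the inserts, updates, and deleted keys need to travel to the target system, which is where the reduction in network load comes from in this kind of approach.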

Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Distributed processing of Hive
Update of Hive statistics
Support for Hive tables with very complex arrays

Access: New User Experience for DataFunnel
DMX DataFunnel™

THANK YOU!
