What’s New in DMX/DMX-h?
March 2017
Agenda
What’s New?
• Big Data + Quality
• DMX/DMX-h
• Big Data Integration
– Access
– Integrate
– Comply
– Simplify
– Extend
What’s Coming Soon?
Integrated Workflow Demo
Syncsort Confidential and Proprietary - do not copy or distribute
BIG DATA + QUALITY!
What’s New
Bringing Together Best-of-Breed Data Integration & Data Quality
“Existing customers and prospects can view this acquisition as
positive. It extends Syncsort's information management capabilities
through strengthened data quality and data governance
functionality for the use cases they encounter.”
- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016
Foundational Components of Any Enterprise Data Management Strategy
– Best-in-class data integration functionality & performance
– Early adopter & leader in Hadoop, Spark, Cloud, Real-time
– Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack
– Most robust mainframe access & integration capabilities in the market
– Best-in-class, broad data quality capabilities & functions
– Expertise in Cloud, Big Data & Real-time
– Most robust profiling, parsing, standardization and matching capabilities in the market
– Support breadth of verticals and business data quality objectives
DMX / DMX-H
What’s New
Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration
DMX-h: High Performance Hadoop ETL Software
• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on Hadoop
• Use-case Accelerators to fast-track development
• Broad-based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per-node scalability and throughput
DMX: High Performance ETL Software
• Template-driven design for:
o High performance ETL
o SQL migration/DB offload
o Mainframe data movement
• Lightweight footprint on commodity hardware
• High-speed flat file processing
• Self-tuning engine
SIMPLIFY BIG DATA INTEGRATION
What’s New
Simplify Big Data Integration with Syncsort
Access
Get best-in-class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
Access: Get Your Database data into Hadoop, At the Press of a Button
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza
‒ To SQL Server, Postgres, Hive, HDFS and S3
‒ Automatically create target Hive and HCat tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage DMX-h high performance data processing engine
• Extract only the data you want
‒ Data type filtering
‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing
DMX
DataFunnel™
Move thousands of tables in days, not weeks!
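The "order data flows by dependencies" point above can be pictured as a topological sort. The following is a minimal sketch in Python, not DataFunnel's actual implementation; the table names and the dependency map are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_tables(dependencies):
    """Return table names so that each table follows the tables it depends on.

    `dependencies` maps a table name to the set of tables that must load first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical schema: orders references customers and products.
deps = {
    "customers": set(),
    "products": set(),
    "orders": {"customers", "products"},
    "order_items": {"orders", "products"},
}
load_order = order_tables(deps)
```

Tables with no dependency on each other (here, customers and products) could be moved in parallel, which is the situation the "process multiple funnels in parallel" bullet describes.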
Access: Bring ALL Enterprise Data Securely to the Data Lake
Database
– RDBMS
– MPP
– NoSQL
Mainframe
– DB2/z
– VSAM
– FTP Binary
– Mainframe Fixed
– Mainframe Variable
– Mainframe Distributable
– COBOL IT line sequential
– All file formats…
Big Data
– JSON
– Avro
– Parquet
– ORC
– Hive (Enhancements)
Streaming
– Kafka
– MapR Streams
– HDF (NiFi)
Cloud
– Amazon S3
– Amazon Redshift, RDS
– Google Cloud Storage
… And more!
Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Direct distributed processing of Hive
Update of Hive statistics
Integrate
Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
Integrate: Single Interface for Streaming & Batch
Kafka, MapR Streams, Apache NiFi, and Spark!
Combine legacy batch and cutting-edge streaming data sources
Easy development in GUI – no need to write Scala, C or Java code
Spark 2.0!
Simplify Streaming Data Integration
Globalization Enhancements
Improved Fujitsu NetCOBOL support
Localization
Support for multi-byte copybooks
Complete support of ALL ICU code pages
– Drop-down list in GUI that provides most common code pages at the top
– Remembers most recent code page selection and pre-populates it
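Code page selection matters because the same bytes decode to different characters under different code pages. The following is a minimal sketch using Python's standard codecs, not DMX itself; it assumes EBCDIC code page 037, and the sample bytes are made up for illustration.

```python
def decode_record(raw: bytes, code_page: str = "cp037") -> str:
    """Decode a raw mainframe record using the given code page."""
    return raw.decode(code_page)


# Five bytes that spell a word in EBCDIC (code page 037).
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
text = decode_record(sample)              # "HELLO" under cp037
as_latin1 = sample.decode("latin-1")      # same bytes, wrong code page
```

Decoding the same bytes as Latin-1 yields accented characters instead of the intended word, which is why a GUI that remembers the last code page selection saves rework.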
Comply
Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
Comply: Manage
Cloudera Manager
– Deploy DMX-h across Cloudera cluster
– Monitor DMX-h jobs
Apache Ambari
– Deploy DMX-h across Hortonworks and other clusters
– Monitor DMX-h jobs
Cloudera Director
– Deploy DMX-h on Cloudera in the Cloud
– Elastically expand and reduce capacity as needed for spikes in workload
Comply: Govern
Metadata and data lineage for Hive, Avro and Parquet through HCatalog
Metadata lineage export from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
Apache Atlas lineage integration
– Lineage, tagging
– Audit and track
(Technical preview available now)
Simplify
Design once, deploy anywhere & insulate your organization from a rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premise or in the cloud.
Simplify: Same Solution – On Premise or In the Cloud
• ETL engine on AWS Marketplace – Update to version 9.x
• Available on EC2, EMR, Google Cloud
• S3 and Redshift connectivity
• First & only leading ETL engine on Docker Hub
• Google Cloud Storage connectivity
Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective
Intelligent Execution - Insulate your people from underlying complexities of Hadoop.
Simplify: Design Once, Deploy Anywhere
Use existing ETL skills.
No worries about mappers, reducers, big side, small side, and so on.
Automatic optimization for best performance, load balancing, etc.
No changes or tuning required, even if you change execution frameworks
Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.
Intelligent Execution Layer
One interface to design jobs to run on:
Single Node, Cluster
MapReduce 1, 2.x, Spark, Spark 2.0
Windows, Unix, Linux
On-Premise, Cloud
Batch, Streaming
Integrated Workflow
In a single job, combine any execution location, framework or style.
Ingest data on an edge node, then process on the cluster in a single workflow
Combine MapReduce ETL with Spark data analysis
Run extended tasks and custom functions in framework of your choice
ADD CUSTOM FUNCTIONALITY
Extend
Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks
• Enable data scientists to add new functions
• Ability to add custom transformation functions
– Shown in the GUI the same as built-in functions
– Available via function pull-down and signature
• Ability to add job extensions to the data flow
• Publish a library in Syncsort GitHub
– Rounding Package
– Advanced Math Package
– Multiple Pivot options
Integrate: Extend User Base with Data Transformation Language (DTL)
• Metadata-driven dynamic creation of DMX-h jobs
• Enables partners and end users to build on and extend DMX
• Human-readable script-like interface for developing jobs
• Legacy ETL migrations to DMX
– Ability to import DTL to the DMX Graphical User Interface
– Maintain applications in the GUI
– Export metadata to DTL
WHAT’S NEXT?
Roadmap
Access: Keep Legacy and Modern Systems in Sync
• Capture changes in source database as they happen
• Update target systems automatically
• Capture changes in huge tables without straining network capacity
• Minimize impact to source database performance
Delta Change Data Capture
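The idea of capturing only what changed can be illustrated with a snapshot comparison. The following is a minimal sketch in Python of snapshot diffing, a simpler technique than whatever capture mechanism the product uses, which the slides do not describe; the rows and keys are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two {primary_key: row} snapshots and return the changes.

    Returns three dicts: rows to insert, rows to update, rows to delete.
    """
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserts, updates, deletes


# Hypothetical source table before and after a batch of changes.
old = {1: ("Ann", "NY"), 2: ("Bob", "LA"), 3: ("Cy", "SF")}
new = {1: ("Ann", "NY"), 2: ("Bob", "TX"), 4: ("Di", "DC")}
inserts, updates, deletes = diff_snapshots(old, new)
```

Only the three changed rows would need to travel to the target system; the unchanged row for key 1 is never sent, which is the network saving the bullets above refer to.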
Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat-supported tables
Distributed processing of Hive
Update of Hive statistics
Support for Hive tables with very complex arrays
Access: New User Experience for DataFunnel
DMX
DataFunnel™
THANK YOU!

Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX and DMX-h

  • 1. What’s New in DMX/DMX-h? March 2017
  • 2. Agenda: What’s New? • Big Data + Quality • DMX/DMX-h • Big Data Integration – Access – Integrate – Comply – Simplify – Extend. What’s Coming Soon? Integrated Workflow Demo. Syncsort Confidential and Proprietary - do not copy or distribute
  • 3. BIG DATA + QUALITY! What’s New
  • 4. Bringing Together Best-of-Breed Data Integration & Data Quality. “Existing customers and prospects can view this acquisition as positive. It extends Syncsort's information management capabilities through strengthened data quality and data governance functionality for the use cases they encounter.” - “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016
  • 5. Foundational Components of Any Enterprise Data Management Strategy – Best-in-class data integration functionality & performance – Early adopter & leader in Hadoop, Spark, Cloud, Real-time – Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack – Most robust mainframe access & integration capabilities in market – Best-in-class, broad data quality capabilities & functions – Expertise in Cloud, Big Data & Real-time – Most robust profiling, parsing, standardization and matching capabilities in the market – Support breadth of verticals and business data quality objectives
  • 6. DMX / DMX-H What’s New
  • 7. Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration. DMX-h, High Performance Hadoop ETL Software: • GUI for developing MapReduce & Spark jobs • Test & debug locally in Windows; deploy on Hadoop • Use-case Accelerators to fast-track development • Broad based connectivity with automated parallelism • Simply the best mainframe access and integration with Hadoop • Improved per node scalability and throughput. DMX, High Performance ETL Software: • Template driven design for: – High performance ETL – SQL migration/DB offload – Mainframe data movement • Light weight footprint on commodity hardware • High speed flat file processing • Self tuning engine
  • 8. SIMPLIFY BIG DATA INTEGRATION What’s New
  • 9. Simplify Big Data Integration with Syncsort. Access: Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
  • 10. Access: Get Your Database Data into Hadoop, At the Press of a Button • Funnel hundreds of tables at once into your data lake – Extract, map and move whole DB schemas in one invocation – Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza – To SQL Server, Postgres, Hive, HDFS and S3 – Automatically create target Hive and HCat tables • Process multiple funnels in parallel on edge node or data nodes – Order data flows by dependencies – Leverage DMX-h high performance data processing engine • Extract only the data you want – Data type filtering – Table, record or column exclusion / inclusion • In-flight transformations and cleansing. DMX DataFunnel™: Move thousands of tables in days, not weeks!
  • 11. Access: Bring ALL Enterprise Data Securely to the Data Lake • Database – RDBMS – MPP – NoSQL • Mainframe – DB2/z – VSAM – FTP Binary – Mainframe Fixed – Mainframe Variable – Mainframe Distributable – COBOL IT line sequential – All file formats… • Big Data – JSON – Avro – Parquet – ORC – Hive (Enhancements) • Streaming – Kafka – MapR Streams – HDF (NiFi) • Cloud – Amazon S3 – Amazon Redshift, RDS – Google Cloud Storage • … And more!
  • 12. Access: Hive Enhancements. Improvements to Hive support: • JDBC connectivity • Support for partitioned tables: ORC, Parquet, Avro, HDFS • Support for Truncate and Insert • Automatic creation of Hive and other HCat supported tables • Direct distributed processing of Hive • Update of Hive statistics
  • 13. Simplify Big Data Integration with Syncsort. Access: Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more. Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
  • 14. Integrate: Single Interface for Streaming & Batch. Simplify Streaming Data Integration • Kafka, MapR Streams, Apache NiFi, and Spark! • Combine legacy batch and cutting edge streaming data sources • Easy development in GUI – no need to write Scala, C or Java code • Spark 2.0!
  • 15. Globalization Enhancements • Improved Fujitsu NetCOBOL support • Localization • Support for multi-byte copybooks • Complete support of ALL ICU code pages – Drop down list in GUI that provides most common code pages at the top – Remembers most recent code page selection and pre-populates
  • 16. Simplify Big Data Integration with Syncsort. Access: Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more. Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming. Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
  • 17. Comply: Manage • Cloudera Manager – Deploy DMX-h across Cloudera cluster – Monitor DMX-h jobs • Apache Ambari – Deploy DMX-h across Hortonworks and other clusters – Monitor DMX-h jobs • Cloudera Director – Deploy DMX-h on Cloudera in the Cloud – Elastically expand and reduce capacity as needed for spikes in workload
  • 18. Comply: Govern • Metadata and data lineage for Hive, Avro and Parquet through HCatalog • Metadata lineage export from DMX/DMX-h – Simplify audits, analytics dashboards, metrics – Integrate with enterprise metadata repositories • Cloudera Navigator certified integration – Extends HCatalog metadata – HDFS, YARN, Spark and other metadata – Lineage, tagging – Business and structural metadata • Apache Atlas lineage integration – Lineage, tagging – Audit and track (Technical preview available now)
  • 19. Simplify Big Data Integration with Syncsort. Access: Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more. Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming. Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry. Simplify: Design once, deploy anywhere & insulate your organization from rapidly changing ecosystem. Future proof your applications for new compute frameworks, on premise or in the cloud.
  • 20. Simplify: Same Solution – On Premise or In the Cloud • ETL engine on AWS Marketplace – Update to version 9.x • Available on EC2, EMR, Google Cloud • S3 and Redshift connectivity • First & only leading ETL engine on Docker Hub • Google Cloud Storage connectivity. Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective
  • 21. Simplify: Design Once, Deploy Anywhere. Intelligent Execution - Insulate your people from underlying complexities of Hadoop. • Use existing ETL skills. • No worries about mappers, reducers, big side, small side, and so on. • Automatic optimization for best performance, load balancing, etc. • No changes or tuning required, even if you change execution frameworks. • Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0. Intelligent Execution Layer: One interface to design jobs to run on: – Single Node, Cluster – MapReduce 1, 2.x, Spark, Spark 2.0 – Windows, Unix, Linux – On-Premise, Cloud – Batch, Streaming
  • 22. Simplify: Design Once, Deploy Anywhere. Intelligent Execution - Insulate your people from underlying complexities of Hadoop. Intelligent Execution Layer: One interface to design jobs to run on: – Single Node, Cluster – MapReduce 1, 2.x, Spark, Spark 2.0 – Windows, Unix, Linux – On-Premise, Cloud – Batch, Streaming. Integrated Workflow: In a single job, combine any execution location, framework or style. • Ingest data on an edge node, then process on the cluster in a single workflow • Combine MapReduce ETL with Spark data analysis • Run extended tasks and custom functions in framework of your choice
  • 23. Integrated Workflow
  • 24. ADD CUSTOM FUNCTIONALITY Extend
  • 25. Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks • Enable data scientists to add new functions • Ability to add custom transformation functions – Shown in the GUI same as built-in functions – Available via function pull-down and signature • Ability to add job extensions to the data flow • Publish a library in Syncsort GitHub – Rounding Package – Advanced Math Package – Multiple Pivot options
  • 26. Integrate: Extend User Base with Data Transformation Language (DTL) • Metadata driven dynamic creation of DMX-h jobs • Enables partners and end users to build on and extend DMX • Human readable script-like interface for developing jobs • Legacy ETL migrations to DMX – Ability to import DTL to the DMX Graphical User Interface – Maintain applications in the GUI – Export metadata to DTL
  • 27. WHAT’S NEXT? Roadmap
  • 28. Access: Keep Legacy and Modern Systems in Sync. Delta Change Data Capture • Capture changes in source database as they happen • Update target systems automatically • Capture changes in huge tables without straining network capacity • Minimize impact to source database performance
  • 29. Access: Hive Enhancements. Improvements to Hive support: • JDBC connectivity • Support for partitioned tables: ORC, Parquet, Avro, HDFS • Support for Truncate and Insert • Automatic creation of Hive and other HCat supported tables • Distributed processing of Hive • Update of Hive statistics • Support for Hive tables with very complex arrays
  • 30. Access: New User Experience for DataFunnel. DMX DataFunnel™
  • 31. Access: New User Experience for DataFunnel. DMX DataFunnel™
  • 32. THANK YOU!