The document discusses a Dell EMC, Cloudera, and Syncsort solution for offloading ETL workloads from data warehouses to Hadoop. It notes that traditional tools are not working well due to high costs and performance issues. The solution aims to reduce deployment time, develop ETL jobs quickly, and improve productivity. It provides an overview of the solution architecture and components. Key benefits include faster data transformation, reduced administration costs, and easier ongoing operations.
3. The digital transformation will cause disruption
Dell - Internal Use - Confidential
Business leaders see a chaotic, uncertain future ahead
• 48% don’t know what their industry will look like in 3 years
• 78% feel threatened by digital startups
• 45% fear they may become obsolete in 3-5 years
Source: Digital Transformation Index, October, 2016
Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are
transforming to meet changing customer demands and business leaders’ plans to succeed in the connected future.
4. Businesses still have a huge opportunity to get this right
This is how leaders plan to leap ahead
• 73% say a centralized tech strategy needs to be a priority
• 72% plan to expand their software development capabilities
• 66% are incentivized to invest in IT infrastructure and digital skills leadership
Source: Digital Transformation Index, October, 2016
5. Leaders agreed the following digital business attributes are imperatives to success
Source: Digital Transformation Index, October, 2016
• Predictively spot new opportunities
• Demonstrate transparency and trust
• Deliver unique and personalized experiences
• Innovate in agile ways
• Operate in real time
Big Data and Analytics will be at the core of enabling all these attributes
6. Data-driven organizations are more effective
50% greater revenue growth for businesses that leverage data effectively
But 44% of organizations do not know how to start…
Become data-driven. A journey begins with a single step.
• Align IT / Business goals
• Improve operational efficiency
• Transform your organization
Data from Dell Global Technology Adoption Index, November 2015
7. Align business and IT
Organizational goals
• Empower end users
• Control costs
• Improve outcomes
Dell helps by
• Utilizing ALL data to deliver deeper insights and enhanced data-driven decision making
• Reducing TCO and seamlessly integrating with existing investments to enable greater ROI
• Providing secure anywhere, anytime access to data and analytics for improved productivity
9. Traditional tools are not working
#1 Challenge: Organizations cite TCO as the biggest obstacle to data integration tools
• 70% of all Data Warehouses are performance and capacity constrained (*Gartner)
• 80%: Data integration and transformation drive a majority of the EDW capacity
Dell accelerates time to value by lowering data transformation costs and improving performance by augmenting the Enterprise Data Warehouse (EDW)
The Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution reduces Hadoop deployment to weeks, lets you develop Hadoop ETL jobs within hours, and makes you fully productive within days after deployment
10. Too many workloads in the EDW
Modernize the data pipeline with Hadoop
Traditional data pipeline
• Enterprise data warehouse + ETL: data transformation jobs, business reporting, query
• Data staging tool: extract and load data, clean and parse data
• Disparate data sources
The results
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow ad hoc query
• Too costly to scale
Perf / Capacity
Modern data pipeline
• Enterprise data warehouse: business reporting, query
• Hadoop + ETL: data transformation jobs (clean, parse, transform)
• Disparate data sources
The results
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast ad hoc query
• Scales economically
Perf / Capacity
11. Customer value
Solution stack, Components, Customer value
Dell Services
Reference Architecture
ETL Offload
PE R730XD, Networking
Faster deployment from months to weeks
Hadoop Distribution: Cloudera 5.9 (data management and security)
Data Transformation: Syncsort DMX-h version 9.1 (convert SQL jobs into native Hadoop execution)
Deployment
Business application
Build operational efficiency with Hadoop
No other vendor offers this solution
12. Dell data solutions drive operational efficiency
• Control costs: Reduce data warehouse administrative costs up to 76%
• Improve productivity: Transform data 60% faster for analysis
• Simplify ongoing operations: Develop and design complex data transformation jobs up to 54% faster
14. Operational Efficiency: From use case to action
Source, 1. Connect, 2. Analyze, 3. Act
• Preventive Maintenance
• IT Resource Capacity and Utilization
• Operational Process Improvement
• Business Process Cost Optimization
• Cyber Security Analytics
• Improved Forecasting
• Compliance and Reporting
Operational data sources; extract, transform, load; business reporting and query
Enterprise data warehouse, relational management database, data mart
Services • Management • Infrastructure • Security • Dell Financial Services
Parse, Clean, Translate, Sort, Aggregate, Group, Compute
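The transformation steps named on this slide (parse, clean, translate, sort, aggregate, group, compute) can be illustrated with a small sketch. This is not DMX-h or Hadoop code; it is plain Python under stated assumptions: the input data, field names, and function names are all hypothetical and chosen only to show what each step does.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; field names and values are illustrative only.
RAW = """region,amount,currency
east, 10.5 ,usd
WEST,4,usd
east,,usd
West,6.5,usd
"""

def parse(text):
    """Parse: turn raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: trim whitespace and drop rows that have no amount."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned

def translate(rows):
    """Translate: normalise codes and convert text to typed values."""
    return [
        {
            "region": row["region"].lower(),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        }
        for row in rows
    ]

def group_and_aggregate(rows):
    """Sort, group, aggregate: total amount per region, in region order."""
    totals = defaultdict(float)
    for row in sorted(rows, key=lambda r: r["region"]):
        totals[row["region"]] += row["amount"]
    return dict(totals)

def compute(totals):
    """Compute: derive each region's share of the grand total."""
    grand_total = sum(totals.values())
    return {region: round(value / grand_total, 3) for region, value in totals.items()}

totals = group_and_aggregate(translate(clean(parse(RAW))))
shares = compute(totals)
```

In an offload scenario these steps would run on the Hadoop cluster instead of inside the data warehouse; the sketch only shows the logical order of the work.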
23. Goals of the Modern Data Architecture
• Centralize all your data
Collect raw data from every source within the enterprise, regardless of
complexity. Only when you can collect and retain all your data can you
see the full picture.
• Turn raw data into insight
Cleanse, blend and transform your data, and give it context and meaning so
decision makers can act.
• Maintain governance, compliance and security standards
Increase consistency and confidence in decision making by preserving the
confidentiality, integrity and availability of information. Protect data from
unauthenticated and unauthorized access.
• Eliminate complexities within IT
Your Modern Data Architecture should automate and optimize your data needs,
keep pace with the evolution of technology, and homogenize platforms and
infrastructures.
Syncsort Confidential and Proprietary - do not copy or distribute
24. Shift Data and ELT Workloads out of Data Warehouses
25. Simplify Big Data Integration with Syncsort
• Access: Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
• Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
• Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
• Simplify: Design once, deploy anywhere & insulate your organization from a rapidly changing ecosystem. Future proof your applications for new compute frameworks, on premise or in the cloud.
27. Access: Bring ALL Enterprise Data Securely to the Data Lake
• Collect virtually any data from mainframe to relational, cloud and NoSQL sources
• Batch & streaming sources
• Access, re-format and load data directly into Hive & Parquet. No staging required!
• Pull hundreds of tables at once into your data hub, whole DB schemas in one invocation
• Load more data into Hadoop in less time
Build Your Enterprise Data Hub
28. Access: Get Your Database data into Hadoop, At the Press of a Button
• Pull multiple data sources and funnel into your data lake -- extract and move whole DB schemas in one invocation
• One-step data movement, auto-generating jobs
• Process multiple funnels in parallel on your edge node or from data nodes
‒ Leverages DMX-h high speed data engine via DTL
‒ Generated applications can be imported into GUI
• In-flight transformations
‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion
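The idea of moving a whole database schema in one invocation, with table inclusion and exclusion, can be sketched in a few lines. This is a minimal illustration and not the DataFunnel interface: it uses Python's built-in sqlite3 module as a stand-in for both source and target, and the function names `list_tables` and `funnel_schema`, along with the sample tables, are hypothetical.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

def funnel_schema(source, target, include=None, exclude=()):
    """Copy every selected table from source to target in a single call."""
    copied = []
    for table in list_tables(source):
        # Table inclusion/exclusion filtering.
        if include is not None and table not in include:
            continue
        if table in exclude:
            continue
        # Recreate the table on the target from the source definition.
        (ddl,) = source.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        target.execute(ddl)
        rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
        copied.append(table)
    target.commit()
    return copied

# Hypothetical source schema with three tables.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    CREATE TABLE audit_log (id INTEGER, message TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.5);
""")
target = sqlite3.connect(":memory:")

# One invocation moves the schema, skipping the excluded table.
copied = funnel_schema(source, target, exclude={"audit_log"})
```

A production tool would also handle data type mapping, dependency ordering, and parallel transfer, which this sketch leaves out.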
DMX DataFunnel™
30. Integrate: Achieve the Fastest Path from Raw Data to Insight
• Prepare data on-the-fly
• Load into Hadoop without staging
• Write directly into Big Data formats (Parquet, Hive, etc.)
• Connect fast to NoSQL databases (Cassandra, HBase, etc.)
• Cloud Connectivity: Amazon AWS, Google Cloud Platform, Microsoft Azure
• Get the fastest, most efficient data joins and sorts
• Dynamic planning/optimization at runtime
• Create Tableau & QlikView files with one click
• Fastest parallel loads to Amazon Redshift, Greenplum, Netezza, Oracle, Teradata & Vertica
Feed Business Intelligence Visualization
31. Integrate: Single Interface for Streaming & Batch
A single tool for designing both streaming and batch jobs
• Kafka, Spark, Apache NiFi, HDF
• Combine legacy batch and cutting edge streaming data sources
• Easy development in GUI – no need to write Scala, C or Java code
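One way to picture a single design serving both batch and streaming sources is to write the job logic once against a generic iterable and feed it from either kind of source. The sketch below is plain Python and only illustrates the principle; the `enrich` function, the record fields, and the `stream_source` generator are hypothetical, and the generator merely stands in for a real streaming consumer such as a Kafka client.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]

def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Shared job logic: skip records without a value, add a derived field."""
    for record in records:
        if record["value"] is None:
            continue
        yield {**record, "value_doubled": record["value"] * 2}

# Batch source: a finite, fully materialised collection.
batch = [{"id": 1, "value": 5}, {"id": 2, "value": None}]
batch_result = list(enrich(batch))

def stream_source() -> Iterator[Record]:
    """Stand-in for a streaming source; yields records one at a time."""
    for i in range(3):
        yield {"id": i, "value": i}

# Streaming source: the same logic consumes records as they arrive.
stream_result = list(enrich(stream_source()))
```

Because `enrich` is a generator, it processes one record at a time and never needs the whole input in memory, which is what lets the same logic serve both modes.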
Simplify Streaming Data Integration
33. Comply: Secure, Manage & Monitor Your Cluster
• Kerberos-secured clusters
– Authenticated browsing
– Authenticated sampling
• Apache Sentry security certified
• Cloudera Manager
– Deploy DMX-h across cluster
– Monitor DMX-h jobs
34. Comply: Get Governance, Metadata and Lineage
• Metadata and data lineage for Hive, Avro and Parquet through HCatalog
• Metadata lineage export from DMX
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
• Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
36. Simplify: Design Once, Deploy Anywhere
• Use existing ETL skills
• No need to worry about mappers, reducers, big side or small side of joins, and so on
• Automatic optimization for best performance, load balancing, etc.
• No changes or tuning required, even if you change execution frameworks
• Future-proof job designs for emerging compute frameworks, e.g. Spark
Single GUI Execute Anywhere!
Intelligent Execution - Insulate your organization from underlying complexities of Hadoop.
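The "design once, deploy anywhere" principle can be sketched as a job definition that carries no knowledge of the engine that runs it. The example below is a simplified illustration in plain Python, not how DMX-h works internally: the `JOB` structure and the `run_local` and `run_parallel` runners are hypothetical, and a thread pool stands in for a distributed execution framework.

```python
from concurrent.futures import ThreadPoolExecutor

# Job design: an ordered list of (operation, function) steps with no
# reference to the engine that will execute them.
JOB = [
    ("filter", lambda r: r["qty"] > 0),
    ("map", lambda r: {**r, "total": r["qty"] * r["price"]}),
]

def apply_steps(rows, job):
    """Apply every step of the job to a list of rows."""
    rows = list(rows)
    for operation, fn in job:
        if operation == "filter":
            rows = [r for r in rows if fn(r)]
        elif operation == "map":
            rows = [fn(r) for r in rows]
        else:
            raise ValueError(f"unknown step: {operation}")
    return rows

def run_local(job, records):
    """Runner 1: execute the job sequentially in one process."""
    return apply_steps(records, job)

def run_parallel(job, records, workers=2):
    """Runner 2: split the input into contiguous chunks and process them in threads."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: apply_steps(chunk, job), chunks))
    return [row for part in parts for row in part]

records = [
    {"id": 1, "qty": 2, "price": 3.0},
    {"id": 2, "qty": 0, "price": 5.0},
    {"id": 3, "qty": 1, "price": 4.5},
]
local_result = run_local(JOB, records)
parallel_result = run_parallel(JOB, records)
```

The job definition is unchanged between the two runs; only the runner differs, which is the property the slide describes for moving between execution frameworks.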
37. Simplify: Reclaim days of valuable time
Using the Dell | Cloudera | Syncsort solution for Hadoop, an entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time than a Hadoop expert
• Load: Fact dimension load with type 2 SCD
• Validate: Data validation and pre-processing
• Int.: Vendor mainframe file integration
Cut Development Time in Half!
8.3 Days vs. 3.8 Days
Source: http://en.community.dell.com/techcenter/blueprints/m/resources
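For readers unfamiliar with the first use case, a type 2 slowly changing dimension (SCD) keeps history by expiring the current row and appending a new one whenever a tracked attribute changes. The following is a minimal in-memory sketch of that general technique, not the job built in the study: the function name `apply_scd2`, the column names (`valid_from`, `valid_to`, `is_current`), and the sample data are all assumptions made for illustration.

```python
from datetime import date

def apply_scd2(dimension, incoming, key, tracked, today):
    """Apply a type 2 SCD update to a list of dimension rows, in place.

    dimension: rows with valid_from, valid_to and is_current columns
    incoming:  rows holding the latest source values
    key:       name of the business key column
    tracked:   columns whose changes should create a new version
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[col] == new[col] for col in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            # Expire the previous version instead of overwriting it.
            old["valid_to"] = today
            old["is_current"] = False
        dimension.append(
            {**new, "valid_from": today, "valid_to": None, "is_current": True}
        )
    return dimension

# Hypothetical customer dimension with one current row.
dimension = [
    {"customer_id": 1, "city": "Austin",
     "valid_from": date(2016, 1, 1), "valid_to": None, "is_current": True},
]
incoming = [
    {"customer_id": 1, "city": "Dallas"},  # changed attribute
    {"customer_id": 2, "city": "Boston"},  # new business key
]
apply_scd2(dimension, incoming, key="customer_id", tracked=["city"],
           today=date(2017, 1, 1))
```

After the call the dimension holds three rows: the expired Austin version, the current Dallas version, and a first version for customer 2.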