The document discusses data virtualization as a solution to integrate disparate data sources in real-time. It outlines challenges with traditional data integration approaches and describes how a data abstraction layer using data virtualization can provide a single access point for all data while supporting security, governance and self-service. Key benefits include reducing data silos, faster data access, lower integration costs and enabling real-time decisions.
2. 1
Data Integration – “The Way We Were…”
Operational Data Stores
Staging Area
Data Warehouse
Data Marts
Analytics and Reporting
(stages connected by ETL steps: ETL, ETL, ETL)
3. 2
Data Exploration
Data Integration – A Modern Data Ecosystem
Governance
Platforms
Security, Compliance & Business Continuity
Information
Access
Actionable
Insight
Business
Outcomes
Data Integration
Streaming Computing
Operational and Analytical Repositories
Shared Reference Information
Data Sources and
Data Acquisition
Data Repositories
Sandboxes
New Insight
In-Memory
DB/Grid
“Fit for Purpose” Data Marts
EDW
Event Detection and Action
CRM
Marketing Automation
HR
Finance
ERP
Logistics
…………
Data reservoir
& Refinery
Discover data
Parse & Refine
Transform & Cleanse
ODS
Reports
Dashboards
Discovery
Visualization
Advanced
Analytics
4. 3
The Data Integration Challenge
Manually access different systems
IT responds with point-to-point
data integration
Takes too long to get answers to
business users
Marketing, Sales, Executive, Support
Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL
“Data bottlenecks create business
bottlenecks.”
– Create a Road Map For A Real-time, Agile, Self-Service Data
Platform, Forrester Research, Dec 16, 2015
5. 4
The Solution – A Data Abstraction Layer
Abstracts access to disparate data
sources
Acts as a single repository (virtual)
Makes data available in real-time
to consumers
DATA ABSTRACTION LAYER
“Enterprise architects must revise their
data architecture to meet the demand
for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data
Platform, Forrester Research, Dec 16, 2015
6. 5
1 Connect: Any source, any format
2 Combine: Right information at right time
3 Consume: in business applications
DATA CONSUMERS
Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data
DISPARATE DATA SOURCES (More Structured to Less Structured)
Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
Multiple protocols,
formats
Linked data services
query, search, browse
Request/Reply,
event driven
Secure
delivery
Library of
wrappers
Web
automation
Any data
or content
Read
& Write
DATA VIRTUALIZATION
DATA CONSUMERS: Analytical, Operational
CONNECT COMBINE CONSUME
Share, Deliver,
Publish, Govern,
Collaborate
Discover,
Transform,
Prepare,
Improve Quality,
Integrate
Normalized
views of
disparate data
Agile Development
Performance
Resource Management
Lifecycle Management Data Services
Data Catalog
Governance & Metadata
Security & Data Privacy
Denodo Data Virtualization Architecture
7. 6
Modern Data Architectures are much more complex than the architectures
of just 10 years ago
Replicating (copying) data into a central repository doesn’t work at the scale or
complexity needed today
Data Virtualization can provide access to all of your data in real time while supporting
self-service with a common data model (in the context of the business users)
Let’s find out how…
8. Logical Data Warehouse
“The Logical Data Warehouse (LDW) is a new data management
architecture for analytics combining the strengths of traditional
repository warehouses with alternative data management and
access strategy.”
- Gartner Hype Cycle for Enterprise Information Management, 2012
9. 8
Data Warehouse + Cloud Dimensional Data
Time
Dimension
Fact table
(sales) Product
Dimension
Customer
Dimension
CRM
SFDC
Customer
EDW
10. 9
Multiple Data Warehouse Integration
Time
Dimension Sales fact Product
Dimension
Region
Finance EDW
City
Marketing EDW
Customer
Fidelity facts
Product
Dimension
*Real Examples: Nationwide POC, IBM tests
Store
11. 10
Horizontal Partitioning
Data Warehouse Historical Offloading
Time
Dimension
Fact table
(sales)
Product
Dimension
Retailer
Dimension
Current Sales Historical Sales
EDW
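A minimal sketch of the horizontal partitioning idea on this slide, assuming two hypothetical in-memory partitions (current sales kept in the EDW, historical sales offloaded elsewhere) split at a cutoff date. The names `CUTOFF`, `current_sales`, `historical_sales` and `query_sales` are illustrative only and are not part of any Denodo API. The virtual view touches only the partitions whose date range can match the request, then unions the results.

```python
from datetime import date

# Hypothetical cutoff: rows before this date live in the historical store.
CUTOFF = date(2017, 1, 1)

# Stand-ins for the two physical partitions of the sales fact table.
current_sales = [
    {"sale_date": date(2017, 3, 1), "amount": 120.0},
    {"sale_date": date(2017, 6, 15), "amount": 80.0},
]
historical_sales = [
    {"sale_date": date(2015, 5, 20), "amount": 200.0},
    {"sale_date": date(2016, 11, 2), "amount": 50.0},
]


def query_sales(start, end):
    """Return (rows, partitions touched) for sales dated within [start, end]."""
    rows = []
    touched = []
    if start < CUTOFF:  # the range reaches into historical data
        touched.append("historical")
        rows += [r for r in historical_sales if start <= r["sale_date"] <= end]
    if end >= CUTOFF:  # the range reaches into current data
        touched.append("current")
        rows += [r for r in current_sales if start <= r["sale_date"] <= end]
    return rows, touched
```

A query limited to 2017 would touch only the current partition, while a query spanning 2016 and 2017 would read both and return one combined result to the consumer.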
12. 11
Providing access to integrated data in real time
Big Data Analytics Framework
Benefits
▪ Enhanced insight across the
business without physically moving
data
▪ Simplified data consumption with a
single endpoint for all data access
▪ Faster integration of new data
sources
▪ Smarter decision making via
additional information-enrichment
capabilities
▪ Increased speed and agility of both
business and IT, significantly
increasing customer satisfaction
13. 12
Logical Data Warehouse at Autodesk
Benefits
▪ For the first time, Autodesk can do
single-point security enforcement
and have uniform data
environment for access.
▪ Reduced replication of data with
less use of ETL processes
▪ Single point of enforcement for
security
▪ Uniform environment for data
access in place
▪ Development flexibility to
understand what is needed to
build before actually building
14. 13
Summary
The Logical Data Warehouse (LDW) is an evolution and augmentation of DW
practices, not a replacement
A repository-only style data warehouse contains a single ontology/taxonomy,
whereas in the LDW a semantic layer can contain many combinations of use cases,
many business definitions of the same information
The LDW permits an IT organization to make a large number of datasets available for
analysis via query tools and applications
16. 15
- Gartner, Magic Quadrant for Data Integration, 2017
The Denodo Platform ... incorporates dynamic query optimization as a key value
point. This capability includes support for cost-based optimization specifically for
high data volume and complexity;... it has also added an in-memory data grid
with Massively Parallel Processing (MPP) architecture to its platform.
17. 16
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (1)
Naive Strategy (BI Tools, BDI Tools, Simple federation engines):
join
union
group by
Customers (3M)
Sales previous years (3B)
Sales this year (290M)
290M rows (sales this year)
300M rows (sales previous year)
3M rows (customers)
593M rows through the network
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
18. 17
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (2)
Denodo Strategy – Aggregation push-down
join
union
group by
Customers (3M)
Sales previous years (3B)
Sales this year (290M)
3M rows (sales by customer this year)
3M rows (sales by customer previous year)
3M rows (customers)
9M rows through the network
group by
customer
group by
customer
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
Denodo 6 | 51 sec | Aggregation push-down
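The difference between the two strategies can be sketched in a few lines of Python. This is an illustration of the general aggregation push-down technique under stated assumptions, not Denodo's optimizer: the three lists are tiny hypothetical stand-ins for the sources on the slide, and `sum_by_customer`, `total_by_country`, `naive` and `pushed_down` are made-up names. On the slide's figures, the naive plan moves 290M + 300M + 3M = 593M rows, while the push-down plan moves 3 × 3M = 9M rows.

```python
from collections import defaultdict

# Tiny stand-ins for the three sources on the slide (hypothetical data).
customers = [
    {"customer_id": 1, "country": "AU"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 3, "country": "AU"},
]
sales_this_year = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 7.0},
    {"customer_id": 3, "amount": 1.0},
]
sales_previous_year = [
    {"customer_id": 2, "amount": 3.0},
    {"customer_id": 2, "amount": 4.0},
    {"customer_id": 3, "amount": 2.0},
]


def sum_by_customer(rows):
    """Simulates a GROUP BY customer executed at the data source."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer_id"]] += r["amount"]
    return [{"customer_id": c, "amount": a} for c, a in totals.items()]


def total_by_country(sales_rows):
    """Runs in the virtualization layer: join to customers, group by country."""
    country_of = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(float)
    for r in sales_rows:
        totals[country_of[r["customer_id"]]] += r["amount"]
    return dict(totals)


def naive():
    # Every raw sales row crosses the network before any aggregation.
    moved = sales_this_year + sales_previous_year
    return total_by_country(moved), len(moved) + len(customers)


def pushed_down():
    # Each source pre-aggregates, so at most one row per customer is moved.
    moved = sum_by_customer(sales_this_year) + sum_by_customer(sales_previous_year)
    return total_by_country(moved), len(moved) + len(customers)
```

Both functions return the same totals by country; only the second element, the number of rows transferred, differs. The rewrite is valid here because a sum grouped by country can be computed from partial sums grouped by customer.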
19. 18
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (3)
union
group by
3M rows (sales by customer this year)
3M rows (sales by customer previous year)
3M rows (customers)
Aggregation push-down
group by
customer
group by
customer
join
Integrated
MPP processing
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
Denodo 6 | 51 sec | Aggregation push-down
Denodo 7 | 13 sec | Aggregation push-down + MPP integration
Customers (3M)
20. 19
You can achieve excellent performance in
Logical Analytic Architectures
Key
techniques
needed:
Advanced Dynamic Optimization to minimize network
traffic and leverage the power of data sources
In-memory MPP processing to speed operations at the
Data Virtualization layer
Advanced incremental caching for reusing commonly
used data and complex calculations
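As a rough illustration of the incremental caching idea, the sketch below keeps previously fetched rows and asks the source only for rows added since the last read. It assumes rows carry a monotonically increasing `id`; the class `IncrementalCache` and the `fetch_since` callable are hypothetical and do not describe how the Denodo Platform implements its cache.

```python
class IncrementalCache:
    """Keeps cached rows and fetches only rows newer than the last seen id."""

    def __init__(self, fetch_since):
        # fetch_since is a callable: last_id -> list of rows with id > last_id
        self.fetch_since = fetch_since
        self.rows = []
        self.last_id = 0

    def read(self):
        """Return all rows, pulling only the new ones from the source."""
        new_rows = self.fetch_since(self.last_id)
        if new_rows:
            self.rows.extend(new_rows)
            self.last_id = max(r["id"] for r in new_rows)
        return list(self.rows)
```

The first `read()` loads everything; later calls transfer only the delta, which is the point of refreshing a cache incrementally rather than reloading it in full.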
22. 21
Different Data Sources – Different Security Models
Databases/EDW – Mature RBAC model
Hadoop – Kerberos
▪ Cloudera – Apache Sentry and Knox
▪ Hortonworks – Apache Ranger and Atlas
Cloud – OAuth 2.0 (?)
Files – Binary – Read access or none
Web Services – Multiple models
In many cases, the consumer has to deal with these
different security models and technologies
23. 22
Abstracting Data Source Security
Provide single data model to consumers
▪ Role-based access to data on an as-needed basis
▪ Removes data silo security
Hide complexity of different security models and
maturities
Integrate with existing authentication system
(e.g. LDAP/AD)
Single point for monitoring/auditing
▪ Who, what, when, how, …
Ensure compliance with corporate policies
Data access and privacy rules enforced ‘on the fly’
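A minimal sketch of what a single enforcement point might look like, assuming a hypothetical policy table keyed by role. `POLICIES`, `AUDIT_LOG` and `secure_read` are illustrative names, not a Denodo API; a real deployment would take roles from the existing authentication system (e.g. LDAP/AD) rather than from a function argument. Each read applies a row filter and column masking "on the fly" and records who accessed what.

```python
# Hypothetical policy table: role -> visible columns and optional row filter.
POLICIES = {
    "analyst": {
        "columns": {"customer_id", "country", "amount"},
        "row_filter": None,
    },
    "support_au": {
        "columns": {"customer_id", "country", "email"},
        "row_filter": lambda row: row["country"] == "AU",
    },
}

AUDIT_LOG = []  # single place recording who accessed what


def secure_read(user, role, rows):
    """Apply the role's row filter and column masking, then audit the access."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"role {role!r} has no access")
    keep = policy["row_filter"] or (lambda row: True)
    result = [
        # Columns outside the role's allow-list are masked, not dropped.
        {k: (v if k in policy["columns"] else "***") for k, v in row.items()}
        for row in rows
        if keep(row)
    ]
    AUDIT_LOG.append({"user": user, "role": role, "rows_returned": len(result)})
    return result
```

Because every consumer goes through the same function, the rules and the audit trail live in one place regardless of which underlying source supplied the rows.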
24. 23
Security in a Hybrid Environment
Moving data to Cloud can exacerbate security and privacy problems
SaaS and Cloud data sources often have different security models
Not integrated with corporate authentication mechanisms
Potential for recreating authentication model in Cloud
A Data Virtualization abstraction layer means Cloud sources can use the same security
mechanisms and access controls as on-premises sources
25. 24
Customer Use Case - Asurion
International Expansion - moving into
different privacy and data protection
jurisdictions
New products – need for different data
types and sources
▪ Mixing structured, multi-structured,
streaming, text, video, voice, geo-
location, etc.
Moving to Cloud for increased speed and
agility
▪ Easier to spin up new virtual servers for
new data sets
Competing pressures for securing data
and providing access to data sets
Security Constraints
Geographical
Constraints
Contractual
Client
Obligations
PII Protection
Departmental
Restrictions
Fast Changing Hadoop & Cloud
Technologies
Hive, Spark,
Redshift
Maintaining
different code
base
Discover, Correlate,
Enable Predictive
Analytics
Text, CSV, Voice,
JSON,
Streaming, 3rd
Party Data
60TB+ structured,
200TB+
telemetry &
unstructured
data
26. 25
Asurion – Hybrid Architecture
After implementing hybrid Data
Virtualization layer, Asurion was able to:
▪ Control security across entire
infrastructure from a single point
▪ Easily meet regional security and
privacy requirements
▪ Keep client data separate as
contractually required – but allow
analytics over all (anonymized) data
▪ Perform complete audits of data
access, as needed
▪ Quickly add new, compliant data
sources to system
27. 26
Governance…
Governance features are pervasive in Denodo Platform:
▪ Users can inspect catalog of virtualization objects through catalog search to find data
combinations for reuse
▪ Data lineage helps users to understand where data has come from and how it has changed from
the source
▪ Impact analysis helps architects understand the consequences of changes in the data source
schemas
▪ Propagate changes selectively with a single click.
28. 27
Data Lineage
Graphical view for showing data lineage for any field in any virtual view.
Trace source of any field:
• Includes any functions applied
to field contents.
Trace source of calculated fields:
• View calculations used to
create new fields.
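Field-level lineage of this kind can be modelled as a walk over view definitions. The sketch below assumes a hypothetical metadata dictionary, `VIEWS`, that maps each derived field to the expression that produces it and the fields it reads; the view and column names are invented for illustration and the structure is not the Denodo metadata format.

```python
# Hypothetical view definitions: each field maps to (expression, input fields).
# Fields that are not keys here are treated as physical source columns.
VIEWS = {
    "sales_report.total_usd": ("amount * fx_rate", ["sales_clean.amount", "fx.rate"]),
    "sales_clean.amount": ("ROUND(raw_amount, 2)", ["edw.sales.raw_amount"]),
}


def trace(field, depth=0):
    """Return indented lineage lines for a field, walking back to its sources."""
    indent = "  " * depth
    if field not in VIEWS:
        return [f"{indent}{field} (source column)"]
    expression, inputs = VIEWS[field]
    lines = [f"{indent}{field} = {expression}"]
    for parent in inputs:
        lines.extend(trace(parent, depth + 1))
    return lines
```

Printing `"\n".join(trace("sales_report.total_usd"))` lists the calculated field, the function applied at each intermediate view, and the source columns it ultimately depends on.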
30. 29
Single point for security and governance
Extends single point of control across Cloud and on-premises architectures
Catalog search helps users find data combinations for reuse
Data lineage helps users to understand where data has come from and how it has
changed from the source
Impact analysis helps architects understand the consequences of change
32. 31
Self-Service Challenges…
Tools are designed for data analysts (or power users)
▪ Users who are happy finding, wrangling, cleansing data
▪ Creating calculations, aggregations within the data
What about the other business users?
▪ People who don’t want to spend hours fighting the spreadsheet…
Will they use common definitions for key business entities and metrics?
▪ Or will they pick and choose their own?
Ultimately, can you trust the numbers?
▪ Where did the data come from?
▪ How has it been manipulated?
33. 32
Self-Service with Guardrails
Don’t build just for the ‘data cowboys’
Create a common and consistent semantic layer
▪ Everyone is using the same definitions and metrics
Create pre-integrated, pre-calculated data services
▪ Save the user having to do this themselves
▪ Ensures consistency of calculations, etc.
But allow the cowboys to ‘roam and wrangle’
▪ Even the cowboys can only access ‘approved’ data
sources
35. 34
Logical Data Warehouse Improves Information Agility
Benefits
▪ Diverse data spread across the entire
enterprise can now be accessed
instantaneously and securely with a
proper authorization structure
▪ Core Business Intelligence logic is
becoming centralized, reducing
duplication of effort and enhancing
development efficiency
▪ Searchable data dictionary helps
report writers find the data they
need and helps improve the
self-service experience
36. 35
“Get it Real-time and Get it Fast!”
The Benefits of Data Virtualization
Complete enterprise information, combining Web, cloud,
streaming, and structured data
ROI realization within 6 months, with the flexibility to
adjust to unforeseen changes
An 80% reduction in integration costs, in terms of
resources and technology
Real-time integration and data access, enabling faster
business decisions
37. 36
Denodo
The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Palo Alto, CA.
Global presence throughout North America,
EMEA, APAC, and Latin America
LEADERSHIP
▪ Longest continuous focus on data virtualization
– since 1999
▪ Leader in 2018 Forrester Wave – Big Data Fabric
▪ Winner of numerous awards
CUSTOMERS
~500 customers, including many F500 and
G2000 companies across every major industry
have gained significant business agility and ROI
DENODO AUSTRALIA
L13 – Macquarie House, 167 Macquarie Street
NSW Sydney 2000