Watch the full webinar: Data Ninja Webinar Series by Denodo: https://goo.gl/gBZNXS
Enterprise semantic modeling is not a new concept. The idea of defining a semantic layer that business users can use and understand has been supported by enterprise reporting tools for a long time. However, those solutions were tied to the reporting tool of choice.
Modern data virtualization platforms like Denodo offer the capabilities to move the semantic layer outside a specific application. This means that the same semantic data model can be shared by a variety of reporting tools, published as data services and queried through a web-based catalog. The virtual layer becomes the true enterprise data fabric; all data is accessible through a unified single layer, security is always in place, and multiple access methods are available to adapt to the needs of the consumer.
This is session 4 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O
Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric
1. Data Virtualization as the Enterprise Data Fabric
Data Ninja Webinar Series
Sessions covering data virtualization solutions for driving business value
4. Agenda
1. The Data Fabric
2. Evolution of the Data Fabric: A Historical Perspective
3. Benefits
4. Performance and Scalability
5. Going Beyond
6. Q&A
5.
In computing, a Fabric is a system of interconnected nodes that looks like a "weave" when viewed collectively from a distance.
In this context, a Data Fabric is a system that allows global access to all your data assets and leverages storage and processing power from multiple heterogeneous nodes.
6. Data Virtualization as the Data Fabric
• Offers a common access point for consumers
• Allows specialized data stores to be used for what they are best at
• Other approaches, like Data Lakes, that are based on replication to a single large target system lose this ability
Data virtualization’s architecture uses the underlying sources whenever possible.
This can be seen as a network of different specialized processing and storage nodes that form the Data Fabric under the umbrella of a common virtual data model.
7. Successful Customer Use Cases
AGILE BUSINESS INTELLIGENCE
Replaced traditional BI with the Logical Data Warehouse that integrates multiple sources around a central EDW
360 VIEW APPLICATIONS
‘Unified Desktop’ that provides integrated customer information
CLOUD INTEGRATION
Virtual layer to abstract access to SaaS applications and enable integration with data center
DATA SERVICES
Services Layer (REST, OData) on top of Denodo’s data model with access to any data
9. The Old Days: EDW Reporting
• Simple WYSIWYG reporting tools
• One-to-one reporting on top of a tailor-made Data Warehouse and Data Marts
Problems:
• Poor reusability
• Reports built on top of the Data Mart data model
• Excessive replication
[Diagram labels: Operational Data, Staging, EDW, Data Mart, SQL]
10. The Dawn: Reporting with Semantic Layers
• More advanced reporting tools with a built-in semantic layer for easier use and better reusability
• One-to-one reporting on top of a tailor-made Data Warehouse
Problems:
• Limited to a single source
• Limited to a single reporting tool
[Diagram labels: Operational Data, Staging, EDW, SQL]
11. Reporting with Federation
• Reporting tools add a built-in federation engine that allows for multi-source reporting
Problems:
• Bad performance
• Limited cross-source security
• Limited to a single reporting tool
[Diagram labels: Operational Data, Staging, EDW, SQL, Other RDBMS]
12. Early Data Virtualization
• Data Virtualization as an independent semantic abstraction layer
• Reusable semantic model can be used by multiple reporting tools
• Engine specialized in federation (optimizer, caching, etc.)
• Integrated security
[Diagram labels: Operational Data, Staging, EDW, SQL, Other RDBMS, Other Sources, Integrated Security, Cache]
15. Benefits
Data Virtualization as the Enterprise Data Fabric
Abstracts access to disparate data sources
• Homogeneous data access regardless of back-end technology
• No need to deal with new languages and APIs: access to SFDC, Excel, Redshift, Oracle, Hadoop, other SaaS APIs, etc.
Acts as a single semantic repository
• Definition of a consistent business data model across all consumers and reporting tools
• Combination of data regardless of locations and nature
• Avoids unnecessary replication
16. Benefits
Data Virtualization as the Enterprise Data Fabric
Centralized security layer
• Role-based authorization to all tables in the virtual layer
• Integration with AD/LDAP and Kerberos
• Security is moved outside the reporting layer to avoid security bypasses
• Centralized access point simplifies operations and auditing
Real-time fabric execution model
• Advanced optimizer designed specifically for virtualization
• Execution push-down to leverage source computing capabilities
• Data comes straight from the sources
• Cache layer to improve performance when needed
18.
A mature virtualization engine like Denodo offers results comparable to single-source executions.
Let’s see how this is possible…
19. Performance
Denodo’s unique query optimizer
Denodo’s optimizer borrows many techniques from traditional RDBMSs:
• Cost-based query plans based on statistics and indexes
• Multiple JOIN methods
• Query rewriting to generate more optimal SQL
However, given the distributed execution of a query in a processing fabric, Denodo has designed unique techniques to maximize performance in this environment:
• Dynamic rewriting focused on maximizing execution at the source and reducing network traffic
• Cost estimates that also factor in:
  • Processing power of the sources (e.g. number of nodes in a Hadoop cluster)
  • Network and transfer rates
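To make the idea of cost estimates that account for source processing power and transfer rates concrete, here is a toy sketch. The formula, the `Plan` fields, and every number below are hypothetical illustrations and do not describe Denodo's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate execution plan (all fields are illustrative assumptions)."""
    name: str
    rows_processed_at_source: int  # rows the source has to scan or aggregate
    source_rows_per_sec: float     # assumed processing rate of the source
    rows_transferred: int          # rows sent over the network to the virtual layer
    network_rows_per_sec: float    # assumed transfer rate of the link


def estimated_cost(plan: Plan) -> float:
    """Estimated seconds: source processing time plus network transfer time."""
    processing = plan.rows_processed_at_source / plan.source_rows_per_sec
    transfer = plan.rows_transferred / plan.network_rows_per_sec
    return processing + transfer


def cheapest(plans):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Two hypothetical plans for the same query over a 290 M row fact table.
full_scan = Plan("full scan", 290_000_000, 50_000_000, 290_000_000, 2_000_000)
push_down = Plan("aggregation push-down", 290_000_000, 50_000_000, 2_000_000, 2_000_000)

best = cheapest([full_scan, push_down])
print(best.name)
```

In this toy model both plans scan the same number of rows at the source, so the difference comes entirely from how many rows cross the network.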
20. Performance
DV Overhead: Direct vs Denodo with single source
TPC-DS benchmark tests using JDBC with IBM Netezza as the data source over a 10 Gbps LAN
[Chart: results in seconds]
When queries only hit an individual source, the data virtualization layer pushes the processing completely to the source with minimal overhead.
As a note, since data needs to flow through the DV layer, the network between the sources and the DV layer should have enough bandwidth to avoid bottlenecks.
21. Performance
Benchmarks: Federating large data sets
Denodo has done extensive testing using queries from the standard benchmark TPC-DS* and the following scenario, which compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
Data set in both scenarios:
• Customer Dim.: 2 M rows
• Sales Facts: 290 M rows
• Items Dim.: 400 K rows
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
22. Performance
Benchmarks: Federating large data sets
Query Description | Returned Rows | Netezza Time | Denodo Time (Federated Oracle, Netezza & SQL Server) | Denodo Optimization Technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement
Execution times are comparable with single source executions based only on automatic optimizer decisions
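As a rough illustration of what a partial aggregation push-down could look like for a query such as "total sales by item brand", the sketch below pre-aggregates sales by the join key at the sales source and finishes the grouping by brand after the join. The sample data, function names, and two-step split are assumptions made for illustration, not Denodo's implementation.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two separate sources.
sales = [  # (item_id, amount) rows held by the "sales" source
    (1, 10.0), (1, 5.0), (2, 7.0), (3, 3.0), (3, 2.0),
]
items = {1: "Acme", 2: "Acme", 3: "Zenith"}  # item_id -> brand, held by the "items" source


def partial_aggregate_at_source(rows):
    """Step 1 (would run at the sales source): pre-aggregate by the join key."""
    totals = defaultdict(float)
    for item_id, amount in rows:
        totals[item_id] += amount
    return dict(totals)


def final_aggregate(partials, item_brand):
    """Step 2 (would run in the virtual layer): join to items, then group by brand."""
    by_brand = defaultdict(float)
    for item_id, subtotal in partials.items():
        by_brand[item_brand[item_id]] += subtotal
    return dict(by_brand)


partials = partial_aggregate_at_source(sales)
result = final_aggregate(partials, items)
print(len(sales), "sales rows reduced to", len(partials), "before transfer")
print(result)
```

The aggregation is only partial because the final grouping column (brand) lives in a different source, so one more aggregation step is needed after the join.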
23. Performance
SELECT c.id, SUM(s.amount) as total
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.id
Reporting Tools are not optimized for federation across sources
System | Execution Time | Data Transferred | Optimization Technique (automatically selected)
Denodo | 9 sec. | 4 M | Aggregation push-down
Tableau | 125 sec. | 292 M | None: full scan
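[Diagram: two query plans over Sales and Customer. In one, 290 M Sales rows and 2 M Customer rows are transferred and then joined and grouped. In the other, the Group By is applied to Sales first, so 2 M Sales rows and 2 M Customer rows are transferred and then joined.]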
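The effect of aggregation push-down on data transfer can be sketched with two in-memory SQLite databases standing in for separate sources. The table layouts follow the query on this slide; the sample rows, helper names, and the in-Python join are assumptions made for illustration only.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


customers = make_source(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY)",
    "INSERT INTO customer VALUES (?)",
    [(1,), (2,)],
)
sales = make_source(
    "CREATE TABLE sales (customer_id INTEGER, amount REAL)",
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0), (2, 1.0), (2, 4.0)],
)


def join_totals(customer_rows, sales_rows):
    """Inner-join on customer id in the middle layer and sum amounts per id."""
    ids = {cid for (cid,) in customer_rows}
    totals = {}
    for cid, amount in sales_rows:
        if cid in ids:
            totals[cid] = totals.get(cid, 0.0) + amount
    return totals


c_rows = customers.execute("SELECT id FROM customer").fetchall()

# Plan A, no push-down: every sales row is fetched before joining.
raw_sales = sales.execute("SELECT customer_id, amount FROM sales").fetchall()
naive = join_totals(c_rows, raw_sales)
naive_transferred = len(c_rows) + len(raw_sales)

# Plan B, aggregation push-down: the sales source groups before sending.
agg_sales = sales.execute(
    "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
).fetchall()
pushed = join_totals(c_rows, agg_sales)
pushed_transferred = len(c_rows) + len(agg_sales)

print(naive_transferred, pushed_transferred)
```

Both plans return the same totals here because the grouping column is also the join key and `customer.id` is unique; only the number of rows fetched from the sales source changes.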
24. Scalability
• Denodo can be deployed in a cluster for HA and horizontal scaling
• “Shared-nothing” execution engine ensures linear scalability
• Based on the use of an external load balancer
• Supports auto-scaling for cloud deployments (like AWS)
[Diagram: a Load Balancer exposes a Virtual Server (SQL Cluster: 192.168.0.10:9999, Web Container Cluster: 192.168.0.10:9090) in front of four nodes, Denodo1 to Denodo4, each listed in the SQL Cluster on port 9999 and in the Web Container Cluster on port 9090, together with a Shared Cache Server.]
26. Going Beyond
What’s cooking in the virtualization space
Holistic Operations Console
• Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc.
Web-based Self Service
• Advanced catalog enables a centralized “data marketplace”
• Keyword-based search
• Collaboration (tags, comments, request for access, etc.)
Next-gen “Fabric” Execution Engine
• Tight integration with in-memory and data grids to move processing from the virtual layer to specialized execution engines
28. Next Steps
Get Started!
Download Denodo Express: www.denodoexpress.com
Access Denodo Platform on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws
Denodo Platform 6.0 Whitepaper
Download & Read:
http://www.denodo.com/en/document/whitepaper/denodo-platform-60-whitepaper
Data Virtualization for Data Services
Visit: http://www.denodo.com/en/solutions/horizontal-solutions/data-services
29. Data Ninja Webinar Series
Sessions covering data virtualization solutions for driving business value
Next Session:
Realizing the Promise of Data Lakes
Thursday, December 15th, 2016