Presentation slides from the Fast Data Strategy Roadshow, San Francisco Bay Area.
For more Denodo 6.0 demos, follow this link: https://goo.gl/XkxJjX
Denodo 6.0: Self-Service Search, Discovery & Governance Using a Universal Semantic Data Model
Pablo Alvarez-Yanez
Director of Solutions Consulting, Denodo
3. What Is Self-Service?
Provide business professionals with new ways to meet their data needs, with the goal of self-reliance, while minimizing the information technology (IT) team bottleneck
4. Enabling Self-Service Is Hard
More than 60% of self-service initiatives are rated “average” or lower
But why?
1. Lack of expertise among business users: self-service is more complicated than expected
▪ Spawns more requests to IT
2. To address (1), organizations try to expose curated information in a business-friendly form by defining data marts
▪ Creating physical, curated repositories is slow, expensive and hard to maintain
▪ Spawns more requests to IT
5. How Can We Simplify That?
Can Data Virtualization help?
All company data accessible in a single place
▪ DV hides technical complexities, source models and query languages
Minimize replication
▪ Access in real time and through caching
Access is secured: no “free for all”
▪ Offers advanced security and integration with LDAP, Kerberos, etc.
Offers datasets in a business-friendly form, adapted to the needs of each type of user
▪ Can add semantic metadata (descriptions, relationships…) to the exposed information
Data in different formats and access paradigms, tailored to different project needs
▪ JDBC/ODBC, SOAP, REST, OData, JMS
▪ Query, browse and keyword-based search
8. Capability 1: Source Abstraction
Abstracts access to disparate data sources in real time thanks to native adapters
Acts as a single virtual repository
Abstracts data complexities such as location, format and protocols
Hides data complexity so that business users can access data easily
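The slide describes source abstraction only at a high level. As an illustration, the sketch below registers two sources with different formats (a relational table and a CSV file) under logical view names, so a consumer reads both through the same call and each read goes to the source at query time instead of to a copy. All names here (`VirtualRepository`, the adapter functions, the sample data) are hypothetical and are not Denodo APIs.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]


class VirtualRepository:
    """Toy 'single virtual repository' (hypothetical, for illustration only).

    Consumers ask for a logical view name; the repository forwards the request
    to whichever source adapter was registered for it.
    """

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[], List[Row]]] = {}

    def register(self, view: str, adapter: Callable[[], List[Row]]) -> None:
        self._adapters[view] = adapter

    def query(self, view: str) -> List[Row]:
        # Nothing is copied in advance: each call reads the source live.
        return self._adapters[view]()


# Source 1: a relational database (an in-memory SQLite DB stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Source 2: a flat file (a CSV string stands in for it).
SALES_CSV = "customer_id,amount\n1,10.0\n2,5.5\n1,4.5\n"


def customer_adapter() -> List[Row]:
    return db.execute("SELECT id, name FROM customer ORDER BY id").fetchall()


def sales_adapter() -> List[Row]:
    reader = csv.DictReader(io.StringIO(SALES_CSV))
    return [(int(r["customer_id"]), float(r["amount"])) for r in reader]


repo = VirtualRepository()
repo.register("customer", customer_adapter)
repo.register("sales", sales_adapter)

# The consumer uses the same call for both, regardless of location or format.
print(repo.query("customer"))  # [(1, 'Ann'), (2, 'Bob')]
print(repo.query("sales"))     # [(1, 10.0), (2, 5.5), (1, 4.5)]
```

The consumer never sees SQL or CSV parsing; swapping a source only means registering a different adapter under the same view name.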
9. Capability 2: Semantic Data Modeling
Business entities and pre-integrated views and reports
User-friendly models decoupled from sources
Combination of data across data sources with a state-of-the-art optimizer
On-the-fly data transformation to homogenize formats
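A minimal sketch of on-the-fly homogenization, assuming two hypothetical sources (labelled "erp" and "crm") that store the same order data with different column names, amount units and date formats. The view function maps both to one business-friendly schema every time it is called, so nothing is replicated. This only illustrates the idea; it is not how Denodo implements its views.

```python
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical raw rows: two sources describe the same "order" entity with
# different column names, amount units and date formats.
erp_rows: List[Dict[str, Any]] = [{"ORD_NO": 101, "AMT_CENTS": 1250, "ORD_DT": "20160315"}]
crm_rows: List[Dict[str, Any]] = [{"orderId": "A-7", "total": "19.99", "placed": "03/16/2016"}]


def from_erp(r: Dict[str, Any]) -> Dict[str, Any]:
    # ERP source: integer ids, amounts in cents, dates as YYYYMMDD.
    return {
        "order_id": str(r["ORD_NO"]),
        "amount": r["AMT_CENTS"] / 100,
        "order_date": datetime.strptime(r["ORD_DT"], "%Y%m%d").date(),
        "source": "erp",
    }


def from_crm(r: Dict[str, Any]) -> Dict[str, Any]:
    # CRM source: string ids, decimal-string amounts, dates as MM/DD/YYYY.
    return {
        "order_id": r["orderId"],
        "amount": float(r["total"]),
        "order_date": datetime.strptime(r["placed"], "%m/%d/%Y").date(),
        "source": "crm",
    }


def orders_view() -> List[Dict[str, Any]]:
    """Business-friendly 'orders' view. The transformations run each time the
    view is queried, so the sources are neither modified nor replicated."""
    return [from_erp(r) for r in erp_rows] + [from_crm(r) for r in crm_rows]


for row in orders_view():
    print(row)
```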
10. Capability 3: Flexible Publication Options
SQL access for applications, reports and dashboards: JDBC, ODBC and ADO.NET
Data services: SOAP, REST, OData
Built-in catalog and data exploration tool
Keyword-based search thanks to native integration with Lucene and Elasticsearch
Multiple options that adapt to the needs of the consumer
11. Capability 4: Development and Operations
Integrated development studio with an easy-to-use drag-and-drop interface
Advanced RDBMS-like security and auditing with support for LDAP, Kerberos, SAML, OAuth, etc.
Monitoring and management applications to simplify DevOps tasks
Data governance: lineage, analysis of source changes and their impact on existing models, etc.
Simplifies data security, privacy and auditing
12. A Modern Data Virtualization Architecture
[Diagram: data consumers reach the data virtualization layer through SQL queries (JDBC, ODBC, ADO.NET), web services (SOAP, REST, OData) and a web-based catalog & search, with secure delivery (SSL/TLS). The data virtualization layer contains the execution engine & optimizer, metadata repository, relational cache, MPP fabric, corporate security, and monitoring & auditing, and sits over disparate data sources.]
14. Performance
Benchmarks: Federating large data sets
Denodo has done extensive testing using queries from the standard TPC-DS* benchmark in the following scenario, which compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
* TPC-DS is the de facto industry-standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
15. Performance
Benchmarks: Federating large data sets
Execution times are comparable to single-source executions, based solely on automatic optimizer decisions.
Query description | Returned rows | Netezza time (single source) | Denodo time (federated Oracle, Netezza & SQL Server) | Denodo optimization technique (automatically selected)
Total sales by customer | 2 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.5 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price is less than current list price | 17 K | 3.5 sec. | 5.2 sec. | On-the-fly data movement
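The last query in the table relies on on-the-fly data movement. A simplified sketch of the idea, assuming two SQLite in-memory databases stand in for the two sources and using a hypothetical schema: the small item table is copied into the sales source as a temporary table, so the join, the filter and the aggregation can all be pushed down there, and only the aggregated rows travel back. Denodo's actual mechanics may differ.

```python
import sqlite3

# Two separate databases stand in for two sources (hypothetical schema).
sales_db = sqlite3.connect(":memory:")  # holds the large fact table
item_db = sqlite3.connect(":memory:")   # holds the small item dimension

sales_db.execute("CREATE TABLE sales (item_id INTEGER, sale_price REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 8.0), (1, 12.0), (2, 4.0), (2, 6.0), (2, 3.0)],
)
item_db.execute("CREATE TABLE item (id INTEGER, list_price REAL)")
item_db.executemany("INSERT INTO item VALUES (?, ?)", [(1, 10.0), (2, 5.0)])


def move_then_push_down() -> list:
    # Step 1: move the small table into the large table's source.
    items = item_db.execute("SELECT id, list_price FROM item").fetchall()
    sales_db.execute("CREATE TEMP TABLE item_tmp (id INTEGER, list_price REAL)")
    sales_db.executemany("INSERT INTO item_tmp VALUES (?, ?)", items)
    # Step 2: join, filter and GROUP BY now run entirely inside one source,
    # so only one row per item comes back.
    result = sales_db.execute(
        """
        SELECT s.item_id, SUM(s.sale_price)
        FROM sales s JOIN item_tmp i ON s.item_id = i.id
        WHERE s.sale_price < i.list_price
        GROUP BY s.item_id
        ORDER BY s.item_id
        """
    ).fetchall()
    sales_db.execute("DROP TABLE item_tmp")  # clean up the moved copy
    return result


print(move_then_push_down())  # [(1, 8.0), (2, 7.0)]
```

Moving the smaller side is what keeps this cheap: here two item rows are copied instead of shipping every sales row out of its source.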
16. Example: Average Sales by State
Most tools: no rewriting
Problem
The join cannot be pushed down, because Sales and Customer live in different sources
The GROUP BY is not pushed down
All sales rows are sent to the integration layer
Unoptimized result
Rows transferred: 290 M + 2 M
Heavy processing in the integration layer
Slow execution, and Netezza is underutilized (full scan)
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
[Diagram: all 290 M Sales rows and 2 M Customer rows are transferred; the join and the GROUP BY both run in the integration layer.]
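To make the row counts concrete, here is a toy sketch of the unoptimized plan, using made-up data (six sales rows, three customers) in place of the 290 M and 2 M row tables. Every row of both tables is shipped to the integration layer, which performs the join and the GROUP BY itself. The data and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up toy data (hypothetical) standing in for the two federated tables.
customers: List[Tuple[int, str]] = [(1, "CA"), (2, "NY"), (3, "CA")]  # (id, state)
sales: List[Tuple[int, float]] = [  # (customer_id, amount)
    (1, 10.0), (1, 20.0), (2, 5.0), (3, 30.0), (3, 0.0), (2, 7.0),
]


def naive_avg_by_state() -> Tuple[Dict[str, float], int]:
    """No rewriting: both tables are shipped in full to the integration layer,
    which then runs the join and the GROUP BY itself."""
    shipped_customers = list(customers)  # full scan of Customer
    shipped_sales = list(sales)          # full scan of Sales
    rows_transferred = len(shipped_customers) + len(shipped_sales)

    state_of = dict(shipped_customers)   # hash-join build side
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for cust_id, amount in shipped_sales:  # probe side, one row at a time
        state = state_of[cust_id]
        total[state] += amount
        count[state] += 1
    return {st: total[st] / count[st] for st in total}, rows_transferred


print(naive_avg_by_state())  # ({'CA': 15.0, 'NY': 6.0}, 9)
```

The rows transferred equal the full size of both tables (here 6 + 3 = 9), mirroring the slide's 290 M + 2 M.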
17. Example: Average Sales by State
After Denodo’s Rewriting: Partial Aggregation Push-Down
Denodo benefit
GROUP BY automatically moved below the JOIN without affecting the results (PK-FK join)
GROUP BY pushed down to Netezza
Automatic identification of partial aggregation
Optimized result
Rows transferred: 2 M + 2 M
Leverages star-schema features:
▪ Size of the GROUP BY output is determined by the cardinality of the dimensions (small)
▪ Star-schema joins allow GROUP BY push-down
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
Partial aggregates computed per customer ID: COUNT(*) and SUM(amount)
Final AVG = (sum of partial SUM(amount)) / (sum of partial COUNT(*))
[Diagram: Sales is grouped by customer ID inside Netezza (2 M rows), joined with Customer (2 M rows), then grouped again to produce the final result.]
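The rewrite can be sketched in Python on made-up toy data (hypothetical, not Denodo's implementation). `source_partial_agg` plays the role of the GROUP BY that runs inside the sales source. Because customer.id is a primary key, each partial group joins with exactly one customer row, so adding up the partial SUMs and COUNTs per state yields the same AVG as the original query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up toy data (hypothetical): Customer is the dimension, Sales the fact table.
customers: List[Tuple[int, str]] = [(1, "CA"), (2, "NY"), (3, "CA")]  # (id, state)
sales: List[Tuple[int, float]] = [  # (customer_id, amount)
    (1, 10.0), (1, 20.0), (2, 5.0), (3, 30.0), (3, 0.0), (2, 7.0),
]


def source_partial_agg(rows: List[Tuple[int, float]]) -> List[Tuple[int, float, int]]:
    """Stands in for the GROUP BY pushed down to the sales source, i.e.
    SELECT customer_id, SUM(amount), COUNT(*) FROM sales GROUP BY customer_id.
    At most one output row per customer, so the result is dimension-sized."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for cust_id, amount in rows:
        sums[cust_id] += amount
        counts[cust_id] += 1
    return [(cust_id, sums[cust_id], counts[cust_id]) for cust_id in sums]


def avg_by_state_pushdown() -> Tuple[Dict[str, float], int]:
    partials = source_partial_agg(sales)  # computed inside the source
    rows_transferred = len(customers) + len(partials)

    state_of = dict(customers)  # customer.id is the primary key
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for cust_id, partial_sum, partial_count in partials:  # PK-FK join
        state = state_of[cust_id]
        # Averaging per-customer averages would be wrong, so the partial
        # SUMs and COUNTs are combined and divided only at the end.
        total[state] += partial_sum
        count[state] += partial_count
    return {st: total[st] / count[st] for st in total}, rows_transferred


print(avg_by_state_pushdown())  # ({'CA': 15.0, 'NY': 6.0}, 6)
```

On this toy data the partials shrink six sales rows to three, so six rows are transferred instead of nine; at the slide's scale the same reasoning turns 290 M + 2 M into 2 M + 2 M.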
18. Example: Average Sales by State
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
How Denodo compares with other federation engines
System | Execution time | Data transferred (rows) | Optimization technique
Denodo | 9 sec. | 4 M | Aggregation push-down
Tableau | 125 sec. | 292 M | None: full scan
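[Diagrams: without rewriting, 290 M Sales rows and 2 M Customer rows are joined and grouped in the integration layer; with Denodo's rewriting, Sales is first grouped by customer ID (2 M rows), joined with Customer (2 M rows), and grouped again.]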
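The same rewrite can also be expressed in SQL. The sketch below runs both the slide's query and one possible rewritten form against SQLite with made-up rows, to check that they agree. It illustrates partial aggregation push-down in general; it is not the SQL Denodo actually generates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE sales (customer_id INTEGER REFERENCES customer(id), amount REAL);
    INSERT INTO customer VALUES (1, 'CA'), (2, 'NY'), (3, 'CA');
    INSERT INTO sales VALUES (1, 10.0), (1, 20.0), (2, 5.0), (3, 30.0), (3, 0.0), (2, 7.0);
    """
)

# Query from the slide (ORDER BY added only to make the output deterministic).
ORIGINAL = """
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s ON c.id = s.customer_id
GROUP BY c.state
ORDER BY c.state
"""

# After partial aggregation push-down: the inner GROUP BY is the part that
# could be sent to the sales source; the outer query joins its small result
# with customer and finishes the average. COUNT(amount) rather than COUNT(*)
# keeps NULL amounts out of the count, exactly as AVG ignores them, and the
# CAST avoids integer division if amount were an INTEGER column.
REWRITTEN = """
SELECT c.state, CAST(SUM(p.sum_amount) AS REAL) / SUM(p.cnt)
FROM customer c
JOIN (SELECT customer_id, SUM(amount) AS sum_amount, COUNT(amount) AS cnt
      FROM sales GROUP BY customer_id) p
  ON c.id = p.customer_id
GROUP BY c.state
ORDER BY c.state
"""

print(con.execute(ORIGINAL).fetchall())   # [('CA', 15.0), ('NY', 6.0)]
print(con.execute(REWRITTEN).fetchall())  # [('CA', 15.0), ('NY', 6.0)]
```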