Watch full webinar here: https://bit.ly/2O9gcBT
Denodo 8 expands data integration and management to data fabric with advanced data virtualization capabilities. What are they? Denodo CTO Alberto Pan will touch upon the key Denodo 8 capabilities.
#DenodoDataFest
Current Challenges in Data Management
1. Faster & more complex demands for decision making
▪ Provide useful information for decision making at all organization levels
▪ New users with advanced analytical skills and needs: e.g. data scientists
▪ Solution? Self-service initiatives led by business users, etc. → either too complex (direct access) or too costly (specific data marts), with governance and consistency problems
2. Regulations, enterprise-wide governance & data security
▪ Tens of new regulations worldwide: tax, finance, privacy, HR, environmental, etc.
▪ Ensure consistency in semantics of delivered data and data quality
▪ Enforce security policies
▪ Solution? Data governance tools: a separate, static documentation system → gets out of sync easily, doesn't enforce policies, and doesn't deliver data to users
3. Complexity of DM infrastructure: IT cost reduction
▪ Huge data growth and operating costs → IT is looking for cheaper, more flexible solutions
▪ Solution? Cloud, data lakes → increase integration complexity in the short term, e.g., Gartner says “83% of Data Lakes projects have failed”
The core of the matter is being able to consolidate many diverse data sources in an
efficient manner by allowing trusted data to be delivered from all relevant data
sources to all relevant data consumers through one common layer.
Source: Demystifying the Data Fabric, Gartner, September 2020
The Data fabric focuses on automating the process integration, transformation,
preparation, curation, security, governance, and orchestration to enable analytics
and insights quickly for business success.
Source: Enterprise Data Fabric Wave, Forrester, June 2020
Denodo’s Logical Data Fabric Enables Information Self-Service
1. Single Access Point to all Data at any location
2. Expose Data in a Business-Friendly form, adapted to the needs of each consumer
3. Up to 80% reduction in data integration costs and time to market
4. Trusted Data: enforce consistent semantics, data quality, governance and security
5. Active Data Catalog builds a Data Marketplace for the business
6. ML and Automation to accelerate all steps of the data management lifecycle
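The "single access point" idea is one common layer that delivers data from several sources without copying it. It can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not Denodo's engine. Two attached in-memory databases stand in for independent sources. The table names (crm.customers, billing.invoices) and the view name customer_revenue are hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources
# (hypothetical "crm" and "billing" systems). No data is copied between them.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?, ?)",
                 [(1, "Acme", "EMEA"), (2, "Globex", "AMER")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# In SQLite only TEMP views may reference tables in other attached databases.
# Here the temp view plays the role of a business-friendly virtual view over both sources.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, c.region, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name, c.region
""")

# Consumers query one access point; the cross-source join runs at query time.
rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

Keeping the join inside a view means that consumers only see `customer_revenue`. The sources behind it could change without affecting their queries.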
Unified Web Administration: Central Web Portal
▪ Entry point for all users to all Denodo locations
▪ SSO to all tools with Kerberos, SAML, or OAuth
▪ Better integration between the different tools, e.g., Diagnostic & Monitoring Tool integrated with Solution Manager
PaaS: Automated Infrastructure Management (1)
Define and configure clusters using the UI, including node types, TLS, load balancing, auto-scaling, etc.
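Auto-scaling is listed among the cluster settings above, but no scaling rule is given. As a hedged illustration only, the sketch below uses the proportional formula of the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric). It leaves out that formula's tolerance band and clamps the result to assumed minimum and maximum node counts. The function name and parameters are hypothetical, and this is not necessarily how Denodo's PaaS decides.

```python
import math

def desired_nodes(current_nodes: int, current_cpu: float, target_cpu: float,
                  min_nodes: int = 1, max_nodes: int = 10) -> int:
    """Proportional scaling rule (as in the Kubernetes HPA, minus its tolerance),
    clamped to hypothetical per-cluster bounds. CPU values use the same unit."""
    if current_nodes <= 0 or target_cpu <= 0:
        raise ValueError("current_nodes and target_cpu must be positive")
    # Scale the node count by how far observed load is from the target load.
    desired = math.ceil(current_nodes * current_cpu / target_cpu)
    return max(min_nodes, min(max_nodes, desired))

# 3 nodes at 90% CPU with a 60% target -> ceil(4.5) = 5 nodes
print(desired_nodes(3, current_cpu=90, target_cpu=60))  # 5
```

The clamp matters in practice. Without it, a load spike could request more nodes than the configured cluster allows, and an idle period could scale the cluster to zero.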
PaaS: Automated Infrastructure Management (2)
▪ Start / stop / monitor clusters
▪ Automatic installation of updates and OS security patches
▪ Graphical modeling wizards to easily define business-friendly data sets
▪ Advanced web-based development Studio for data developers
▪ One-click publishing of secure Data Services using technologies like REST, OData and GraphQL
GraphQL Access to Denodo
GraphQL provides a query language for APIs:
▪ GraphQL is normally used as an abstraction layer between the UI and REST services
▪ Decreases the number of API requests
▪ Removes orchestration from the UI when obtaining data
▪ Denodo can provide declarative execution of GraphQL queries on top of Denodo's virtual data model, with zero code:
▪ Security, optimization, lineage…
▪ Zero development time, better performance than manual coding
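The sketch below makes the "fewer API requests" point concrete. It builds, but does not send, one standard GraphQL-over-HTTP POST: a JSON body with `query` and `variables`. That single request fetches a customer together with its orders, which over plain REST would usually take two calls. The endpoint URL and the `customer`/`orders` fields are hypothetical placeholders, not the schema Denodo generates.

```python
import json
import urllib.request

# Hypothetical endpoint and schema: the URL, the "customer" field and its
# "orders" relationship are illustrative placeholders, not Denodo's actual API.
GRAPHQL_URL = "https://denodo.example.com/graphql"

QUERY = """
query CustomerWithOrders($id: Int!) {
  customer(id: $id) {
    name
    orders { order_id total }
  }
}
"""

def build_graphql_request(customer_id: int) -> urllib.request.Request:
    """Build (but do not send) a single POST carrying the query and its variables.

    With plain REST the UI would typically call one endpoint for the customer and
    another for its orders, then stitch the results together itself.
    """
    body = json.dumps({"query": QUERY, "variables": {"id": customer_id}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_graphql_request(42)
print(req.get_method(), req.full_url)
```

The client states the shape of the data it needs in one place. The server side, here the virtual data model, does the orchestration the UI would otherwise perform.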
Query Execution
(Architecture diagram layers: Source Abstraction, Virtual Modelling, Business Delivery, Query Optimizer, Query Engine, Security & Governance)
Delegate processing to data sources
▪ Transparently switch workloads according to cost or performance
Most advanced execution engine for distributed scenarios
▪ Unique techniques automatically rewrite user queries to maximize pushdown
▪ Leverage MPP capabilities of data sources to deal with large data volumes
Advanced caching / acceleration mechanisms
▪ Selectively materialize subsets of the data
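Aggregation pushdown is one way to delegate processing to the source. In this minimal sketch an in-memory SQLite database plays the remote source; the `sales` table and its data are made up. The same totals are computed twice: once by fetching every row and aggregating in the middle layer, and once by pushing the GROUP BY down to the source. The number of rows returned stands in for network transfer. This shows the general technique only, not Denodo's optimizer.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote source (hypothetical
# "sales" table). Rows returned to the caller approximate network transfer.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
src.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(store, float(i % 10)) for i in range(10_000) for store in (1, 2, 3)],
)

def total_by_store_no_pushdown():
    # Fetch every row, then aggregate in the virtual layer.
    rows = src.execute("SELECT store_id, amount FROM sales").fetchall()
    totals = {}
    for store_id, amount in rows:
        totals[store_id] = totals.get(store_id, 0.0) + amount
    return totals, len(rows)

def total_by_store_with_pushdown():
    # Rewrite the request so that the source runs the GROUP BY itself.
    rows = src.execute(
        "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id"
    ).fetchall()
    return dict(rows), len(rows)

naive, naive_rows = total_by_store_no_pushdown()
pushed, pushed_rows = total_by_store_with_pushdown()
assert naive == pushed  # same answer either way
print(naive_rows, "rows transferred without pushdown vs", pushed_rows, "with pushdown")
```

Both paths return identical totals, but the pushed-down query ships 3 rows instead of 30,000. The gap grows with data volume, which is why rewriting queries to maximize pushdown pays off.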
Smart Query Acceleration for Analytics
(Diagram over three slides, using two example queries: “Store sales by year and product category, indicating store name?” and “Store sales by day, with store location?”)
1. Similar queries share common data and operations
2. Find the common patterns
3. Persist the common patterns (SUMMARY)
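The "persist the common patterns" step can be sketched in plain Python. Fact rows are pre-aggregated once into a summary at the (store, day) grain proposed below. Different queries are then answered by rolling that summary up, or by joining it with a small store dimension, instead of rescanning the facts. All data, store names and function names here are hypothetical. The sketch illustrates the general summary/rollup technique, not Denodo's implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (store_id, sale_date, amount).
sales = [
    (1, date(2019, 3, 1), 10.0),
    (1, date(2019, 3, 1), 5.0),
    (2, date(2019, 7, 9), 20.0),
    (1, date(2020, 1, 2), 7.0),
    (2, date(2020, 1, 2), 3.0),
]
# Hypothetical store dimension, kept outside the summary (which stores only store_id).
stores = {1: ("Downtown", "Madrid"), 2: ("Harbour", "Bilbao")}

def build_summary(rows):
    """Pre-aggregate total sales by (store_id, day): the persisted common pattern."""
    summary = defaultdict(float)
    for store_id, day, amount in rows:
        summary[(store_id, day)] += amount
    return dict(summary)

def sales_by_year(summary):
    """Roll the daily summary up to years instead of rescanning the fact rows."""
    totals = defaultdict(float)
    for (_, day), amount in summary.items():
        totals[day.year] += amount
    return dict(totals)

def sales_by_store_name_and_city(summary):
    """Join the small summary with the store dimension to recover name and city."""
    totals = defaultdict(float)
    for (store_id, _), amount in summary.items():
        totals[stores[store_id]] += amount
    return dict(totals)

summary = build_summary(sales)
print(len(sales), "fact rows ->", len(summary), "summary rows")
print(sales_by_year(summary))
print(sales_by_store_name_and_city(summary))
```

Storing only `store_id` keeps the summary small and lets descriptive attributes like store name and city change without rebuilding it. This works because SUM can be rolled up from partial sums.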
Proposed Summary: total sales by store and day, including only store_id
▪ Approx. 300k rows (100 times smaller)
▪ Precomputed and stored in Redshift
(Diagram: on-prem and AWS Cloud)

Execution time per query, without and with summaries:
Query                                                 No summaries   With summaries
Total sales by year                                   15454          2385
Total sales by quarter, store name and city           22491          2625
Total sales by store and city for a specific quarter  14712          473
Total sales in a specific store                       14363          2663
Total sales in a specific store and year              14326          3187
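The gains in the table above can be summarized as speedup factors: time without summaries divided by time with summaries. The snippet only recomputes ratios from the figures shown. The slide does not state the time unit, so the ratios are unit-free.

```python
# Execution times copied from the table above: (no summaries, with summaries).
timings = {
    "Total sales by year": (15454, 2385),
    "Total sales by quarter, store name and city": (22491, 2625),
    "Total sales by store and city for a specific quarter": (14712, 473),
    "Total sales in a specific store": (14363, 2663),
    "Total sales in a specific store and year": (14326, 3187),
}

# Speedup = time without summaries / time with summaries.
speedups = {q: before / after for q, (before, after) in timings.items()}
for query, factor in speedups.items():
    print(f"{factor:5.1f}x  {query}")
```

The factors range from roughly 4.5x (a specific store and year) to about 31x (store and city for a specific quarter).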
Denodo 8: ML for Smart Query Acceleration
Automatically Recommend Caching / DM Strategies for Enhanced Performance
Previous queries:
▪ SELECT PROD, SUM(PRICE) FROM…
▪ SELECT CUST, COUNT(PRICE) FROM…
▪ SELECT PROD, MIN(SALE_DATE) FROM…
▪ SELECT PROD, SHOP, SUM(PRICE) …
…
Caching expressions (intermediate aggregates):
▪ SELECT PROD, CUST, SUM(PRICE), COUNT(PRICE) …
▪ SELECT PROD, SHOP, MIN(SALE_DATE), SUM(PRICE) …
…
(Diagram labels: Cache Database (or Data Source); Data Sources)
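Denodo 8 uses ML to choose these intermediate aggregates, and that method is not described here. As a plainly simpler stand-in, the sketch below merges past queries greedily into candidate summaries. It rests on three assumptions:
- Each query groups by its non-aggregated columns.
- Only decomposable aggregates (SUM, COUNT, MIN, MAX) may be served from a summary.
- A summary may have at most `max_group_by` grouping columns, a hypothetical size limit.

The class and function names are illustrative.

```python
from dataclasses import dataclass, field

# Aggregates that can be recomputed from a finer-grained summary
# (COUNT rolls up by summing partial counts). AVG, for instance, cannot be
# rolled up directly; it would need SUM and COUNT stored instead.
DECOMPOSABLE = ("SUM(", "COUNT(", "MIN(", "MAX(")

@dataclass(frozen=True)
class Query:
    group_by: frozenset
    aggregates: frozenset

@dataclass
class Summary:
    group_by: set = field(default_factory=set)
    aggregates: set = field(default_factory=set)

    def answers(self, q: Query) -> bool:
        # A summary can serve a query if it is grouped by (at least) the query's
        # columns and already holds every aggregate the query needs.
        return q.group_by <= self.group_by and q.aggregates <= self.aggregates

def recommend(queries, max_group_by=2):
    """Greedy first-fit merge of past queries into candidate summaries."""
    summaries = []
    # Place the most detailed queries first so coarser ones can reuse them.
    for q in sorted(queries, key=lambda q: len(q.group_by), reverse=True):
        if not all(a.startswith(DECOMPOSABLE) for a in q.aggregates):
            continue  # not safely servable from a pre-aggregated summary
        for s in summaries:
            if len(s.group_by | q.group_by) <= max_group_by:
                s.group_by |= q.group_by
                s.aggregates |= q.aggregates
                break
        else:
            summaries.append(Summary(set(q.group_by), set(q.aggregates)))
    return summaries

# The previous queries from the slide, reduced to GROUP BY columns and aggregates.
previous = [
    Query(frozenset({"PROD"}), frozenset({"SUM(PRICE)"})),
    Query(frozenset({"CUST"}), frozenset({"COUNT(PRICE)"})),
    Query(frozenset({"PROD"}), frozenset({"MIN(SALE_DATE)"})),
    Query(frozenset({"PROD", "SHOP"}), frozenset({"SUM(PRICE)"})),
]
for s in recommend(previous):
    print(sorted(s.group_by), sorted(s.aggregates))
```

On the slide's queries this yields two candidates: {PROD, SHOP} with MIN(SALE_DATE) and SUM(PRICE), and {CUST} with COUNT(PRICE). That differs from the slide's recommendation, which pairs PROD with CUST. The difference is a reminder that a real recommender presumably weighs costs and data statistics that this heuristic ignores.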
Data Marketplace for the Business:
▪ Discover and contextualize interesting datasets
▪ Search, query and prepare data
▪ Consume with any visualization / reporting tool
▪ Personalized recommendations and shortcuts to the most used datasets. Think Netflix, but for your data
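A per-user usage count can approximate the "shortcuts to the most used datasets" part of the marketplace. This sketch covers only that frequency-based piece. The access log, user names and dataset names are hypothetical. The sketch says nothing about how Denodo computes its personalized recommendations.

```python
from collections import Counter

# Hypothetical access log of (user, dataset) pairs; names are placeholders.
access_log = [
    ("ana", "sales_by_store"), ("ana", "customers"), ("ana", "sales_by_store"),
    ("ana", "inventory"), ("ana", "sales_by_store"), ("ana", "customers"),
    ("luis", "inventory"),
]

def shortcuts(user, log, k=2):
    """Return the user's k most frequently used datasets (ties: first seen first)."""
    usage = Counter(ds for u, ds in log if u == user)
    return [ds for ds, _ in usage.most_common(k)]

print(shortcuts("ana", access_log))  # ['sales_by_store', 'customers']
```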
Denodo Notebook for Data Science
▪ Based on Apache Zeppelin
▪ Support for SQL queries, charts, and code in Python, R, Spark, etc.
▪ Improved multi-user support
▪ Fully integrated with Denodo's security system and SSO capabilities
Key Takeaways
The Logical Data Fabric plays a crucial role in addressing today's most pressing data management challenges, enabling agile delivery of trusted and governed data to any consumer
Denodo 8 reinforces Denodo's strengths, such as ease of use and performance, extends Denodo's reach to new use cases like Data Science, and offers first-class support for cloud and hybrid scenarios