Watch the full session on-demand here: https://goo.gl/upxC5W
Real-Time Analytics for Big Data, Cloud & Self-Service BI
The world of data is becoming ever more distributed. Privacy, regulations, and the need for real-time decisions are challenging organizations’ legacy information strategies. This webinar includes an expert panel discussion on the Logical Data Warehouse, the Universal Semantic Layer, and Real-time Analytics with Paul Moxon (VP of Data Architectures), Pablo Alvarez (Director of Product Management), and Alberto Pan (CTO).
Attend and learn:
• The major challenges of legacy information strategies.
• How data virtualization can help you overcome these challenges.
• Strategies for enabling agile data management and analytics.
3 Reasons Data Virtualization Matters in Your Portfolio
1. DATA VIRTUALIZATION PACKED LUNCH WEBINAR SERIES
Sessions Covering Key Data Integration Challenges Solved with Data Virtualization
2. Next session
3 Reasons Data Virtualization Matters in Your Portfolio
Thursday, November 16th, 2017 | 11:00am PT | 2:00pm ET
Alberto Pan, Denodo’s CTO
Pablo Alvarez, Denodo’s Director of Product Management
Paul Moxon, Denodo’s VP of Data Architectures & Chief Evangelist
6. The Data Integration Challenge
Manually access different systems
IT responds with point-to-point data integration
Takes too long to get answers to business users
[Diagram: consumers (Marketing, Sales, Executive, Support) and data sources (Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL)]
“Data bottlenecks create business bottlenecks.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
7. The Solution – A Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single repository (virtual)
Makes data available in real-time to consumers
[Diagram label: DATA ABSTRACTION LAYER]
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
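The three properties on this slide (abstracted access, a single virtual repository, real-time availability) can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how any particular product is implemented: the `AbstractionLayer` class, the view names, and the two in-memory "sources" are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class AbstractionLayer:
    """Toy abstraction layer: one access point over several sources."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, view: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Store only the way to reach the source, never a copy of its data.
        self._views[view] = fetch

    def query(self, view: str, **filters: object) -> List[Row]:
        # Data is read from the source when the consumer asks for it.
        rows = self._views[view]()
        return [r for r in rows
                if all(r.get(k) == v for k, v in filters.items())]


# Two hypothetical sources with different native shapes.
crm_accounts = [{"id": 1, "name": "Acme", "country": "US"}]
warehouse_sales = [(1, 120.0), (1, 80.0)]  # (customer_id, amount) tuples

layer = AbstractionLayer()
layer.register("customers", lambda: list(crm_accounts))
layer.register(
    "sales",
    lambda: [{"customer_id": c, "amount": a} for c, a in warehouse_sales],
)

print(layer.query("customers", country="US"))
print(layer.query("sales", customer_id=1))

# A change at the source is visible on the next query, with no reload step.
crm_accounts.append({"id": 2, "name": "Globex", "country": "US"})
print(len(layer.query("customers", country="US")))
```

Consumers only ever call `query` on a view name, so a source can be moved or replaced by changing its registered fetch function without touching the consumers.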
10. Summary
• Modern Data Architectures are much more complex than the architectures of just 10 years ago
• Replicating (copying) data into a central repository doesn’t work at this scale or complexity
• Data Virtualization can provide access to all of your data in real-time and support self-service with a common data model (in the context of the business users)
• Let’s find out how…
11. Logical Data Warehouse
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.”
Gartner Hype Cycle for Enterprise Information Management, 2012
12. The State and Future of Data Integration. Gartner, 25 May 2016
Physical data movement architectures that aren’t designed to support the dynamic nature of business change, volatile requirements and massive data volume are increasingly being replaced by data virtualization.
Evolving approaches (such as the use of LDW architectures) include implementations beyond repository-centric techniques
13. DW + Cloud dimensional data
[Diagram: star schema in the EDW (Sales fact table with Time, Product, and Customer dimensions) combined with Customer data from the CRM (SFDC)]
14. Multiple DW integration
[Diagram: two warehouses integrated, a Finance EDW and a Marketing EDW. Labels: Time Dimension, Sales fact, Product Dimension, Region, City, Customer Fidelity facts, Product Dimension, Store]
*Real Examples: Nationwide POC, IBM tests
16. Summary
▪ “The LDW is an evolution and augmentation of DW practices, not a replacement”
▪ “A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
▪ “The LDW permits an IT organization to make a large number of datasets available for analysis via query tools and applications.”
18. Gartner, Magic Quadrant for Data Integration, 2017
The Denodo Platform ... incorporates dynamic query optimization as a key value point. This capability includes support for cost-based optimization specifically for high data volume and complexity;... it has also added an in-memory data grid with Massively Parallel Processing (MPP) architecture to its platform.
19. Query Optimization: Example (1)
Obtain Total Sales By Customer Country in the Last Two Years
Naive Strategy (BI Tools, BDI Tools, Simple federation engines):
Plan: union of the two sales tables, join with Customers, then group by
Customers (3M): 3M rows
Sales this year (290M): 290M rows
Sales previous years (3B): 300M rows (sales previous year)
593M rows through the network
20. Query Optimization: Example (2)
Obtain Total Sales By Customer Country in the Last Two Years
Denodo Strategy
Plan: group by customer at each sales source, then union, join with Customers, and final group by
Customers (3M): 3M rows
Sales this year (290M): 3M rows (sales by customer this year)
Sales previous years (3B): 3M rows (sales by customer previous year)
9M rows through the network
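The difference between the two strategies can be reproduced at toy scale. The sketch below is illustrative Python, not Denodo code: the function names and the tiny in-memory tables are hypothetical, and `group_by_customer` merely stands in for a GROUP BY that the source system would execute. It relies on SUM being safe to compute in two stages (partial sums per customer, then totals per country).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sale = Tuple[int, float]  # (customer_id, amount)

customers: Dict[int, str] = {1: "US", 2: "ES", 3: "US"}
sales_this_year: List[Sale] = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0)]
sales_previous_year: List[Sale] = [(1, 2.0), (2, 3.0), (2, 4.0),
                                   (3, 8.0), (3, 1.0)]


def total_by_country(rows, customers):
    # Join each row with its customer, then group by country.
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customers[customer_id]] += amount
    return dict(totals)


def naive(customers, *sources):
    # Every raw sales row crosses the network before any aggregation.
    moved = len(customers) + sum(len(s) for s in sources)
    union = [row for s in sources for row in s]
    return total_by_country(union, customers), moved


def group_by_customer(rows):
    # Stand-in for a GROUP BY executed inside the source system.
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return list(totals.items())


def pushdown(customers, *sources):
    # Only one pre-aggregated row per customer leaves each sales source.
    partials = [group_by_customer(s) for s in sources]
    moved = len(customers) + sum(len(p) for p in partials)
    union = [row for p in partials for row in p]
    return total_by_country(union, customers), moved


naive_result, naive_moved = naive(customers, sales_this_year, sales_previous_year)
pushdown_result, pushdown_moved = pushdown(customers, sales_this_year, sales_previous_year)
print(naive_result, naive_moved)
print(pushdown_result, pushdown_moved)

# The same arithmetic with the row counts quoted on the slides.
M = 1_000_000
naive_rows = 290 * M + 300 * M + 3 * M   # 593M rows
pushdown_rows = 3 * M + 3 * M + 3 * M    # 9M rows
print(naive_rows // M, pushdown_rows // M)
```

Both strategies return the same totals; only the number of rows transferred changes, which is the point of the comparison on the slides.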
21. Query Optimization: Example (and 3)
Aggregation pushdown: group by customer at each sales source
3M rows (sales by customer this year)
3M rows (sales by customer previous year)
3M rows (customers)
Integrated MPP processing: union, join, group by

System        | Execution Time | Optimization Technique
No Rewriting  | 20 min         | None
Denodo 6      | 51 sec         | Aggregation push-down
Denodo 7      | 13 sec         | Aggregation push-down + MPP integration
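To put the execution times in perspective, the speed-up relative to the unoptimized run can be computed directly. This is plain arithmetic on the three figures quoted in the slide; the dictionary and variable names are just for illustration.

```python
# Execution times from the slide, converted to seconds.
timings_sec = {"No Rewriting": 20 * 60, "Denodo 6": 51, "Denodo 7": 13}

baseline = timings_sec["No Rewriting"]
speedups = {name: baseline / t for name, t in timings_sec.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

On these figures, aggregation push-down alone is roughly 23.5 times faster than no rewriting, and adding MPP integration brings that to roughly 92 times.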
22. Query Optimization: Summary
▪ You can achieve excellent performance in Logical Analytics Architectures.
▪ Key techniques needed:
▪ Advanced Dynamic Optimization to minimize network traffic and leverage the power of data sources
▪ In-memory MPP processing to speed operations at the DV layer
▪ Advanced incremental caching for reusing commonly used data and complex calculations
24. The Promise of Self-Service Initiatives
• Let business users access the data that they need and stop IT being a bottleneck
• That’s the vision as sold by many BI tool vendors
• i.e. give me the tools and access to the data and stand back ☺
25. Self-Service Issues…
• Tools are designed for data analysts (or power users)
• Users who are happy finding, wrangling, cleansing data
• Creating calculations, aggregations within the data
• What about the other business users?
• People who don’t want to spend hours fighting the spreadsheet…
• Will they use common definitions for key business entities and metrics?
• Or will they pick and choose their own?
• Ultimately, can you trust the numbers?
• Where did the data come from? How has it been manipulated?
26. Rob van der Meulen, Gartner
Gartner predicts that by 2018 most business users will have access to self-service tools, but that only one in 10 initiatives will be sufficiently well-governed to avoid data inconsistencies that negatively impact the business.
27. Self-Service with Guardrails
• Don’t build just for the ‘data cowboys’
• Create a common and consistent semantic layer
• Everyone is using the same definitions and metrics
• Create pre-integrated, pre-calculated data services
• Saves the user having to do this themselves
• Ensures consistency of calculations, etc.
• But allow the cowboys to ‘roam and wrangle’
• Even the cowboys can only access ‘approved’ data sources
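As a rough illustration of these guardrails, the Python sketch below keeps one shared definition per metric and checks every request against a list of approved sources, while still letting power users run their own analysis. All names here (`APPROVED_SOURCES`, `METRICS`, `metric`, `explore`, the source names, and the sample rows) are hypothetical and only serve the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

# Hypothetical list of sources that IT has approved for self-service use.
APPROVED_SOURCES = {"sales_dw", "crm"}

# One shared definition per metric, so everyone computes it the same way.
METRICS: Dict[str, Callable[[List[Row]], float]] = {
    "net_revenue": lambda rows: sum(r["gross"] - r["refunds"] for r in rows),
    "gross_revenue": lambda rows: sum(r["gross"] for r in rows),
}


def check_source(source: str) -> None:
    if source not in APPROVED_SOURCES:
        raise PermissionError(f"source {source!r} is not approved")


def metric(name: str, source: str, rows: List[Row]) -> float:
    # Pre-calculated service: the user picks a metric, not a formula.
    check_source(source)
    return METRICS[name](rows)


def explore(source: str, rows: List[Row],
            analysis: Callable[[List[Row]], float]) -> float:
    # Power users may bring their own logic, but only on approved sources.
    check_source(source)
    return analysis(rows)


rows = [{"gross": 100.0, "refunds": 10.0}, {"gross": 50.0, "refunds": 0.0}]

print(metric("net_revenue", "sales_dw", rows))
print(explore("sales_dw", rows, lambda rs: max(r["gross"] for r in rs)))
```

Keeping the metric formulas in one place is what makes the numbers comparable across users; the source check is what keeps ad hoc exploration inside governed data.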
29. Indiana University – Decision Support Initiative
• Multi-campus public university system in the state of Indiana
• 110,000 students, 8,700 academic staff, 9 campuses statewide
• DSI Goal: To provide timely, relevant, and accurate data to decision makers within the University system
• Turning disparate data into actionable information
• DSI portal provides a ‘one stop shop’ for key data
• Prepackaged data sets available for users
• Role-based access
• Data provisioned through Denodo Platform
• http://dsi.iu.edu
32. The Benefits of Data Virtualization
Complete enterprise information, combining Web, cloud, streaming, and structured data
ROI realization within 6 months, with the flexibility to adjust to unforeseen changes
An 80% reduction in integration costs, in terms of resources and technology
Real-time integration and data access, enabling faster business decisions
“Get it Real-time and Get it Fast!”