Copyright © 1991 ‐ 2016 R20/Consultancy B.V., The
Hague, The Netherlands. All rights reserved. No
part of this material may be reproduced, stored in
a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photographic,
or otherwise, without the explicit written
permission of the copyright owners.
Data Quality and
Governance in a
Data‐Obsessed World
by
Rick F. van der Lans
R20/Consultancy BV
Twitter @rick_vanderlans
www.r20.nl
Copyright © 1991 - 2016 R20/Consultancy B.V., The Hague, The Netherlands 2
Rick F. van der Lans
Rick F. van der Lans is an independent consultant, lecturer, and author. He
specializes in data warehousing, business intelligence, database technology,
and data virtualization. He is managing director of R20/Consultancy B.V. Rick
has been involved in various projects in which data warehousing and
integration technology were applied.
Rick van der Lans is an internationally acclaimed lecturer. He has lectured
professionally for the last twenty-five years in many European and
Middle Eastern countries, the USA, South America, and Australia. He has been
invited by several major software vendors to present keynote speeches.
He is the author of several books on computing, including his new Data
Virtualization for Business Intelligence Systems. Some of these books are
available in different languages. The popular Introduction to
SQL, for example, is available in English, Dutch, Italian, Chinese, and German and is sold
worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL
Developers.
As author for TechTarget.com and BeyeNetwork.com, writer of whitepapers,
chairman for the annual European Enterprise Data and Business Intelligence
Conference, and as columnist for a few IT magazines, he has close contacts
with many vendors.
R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via:
Email: rick@r20.nl
Twitter: @Rick_vanderlans
LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223
Economic Resources
Economic resources = Factors of
production
Primary resources: land, labor, and
capital
• Primary factors facilitate production but
do not become part of the product
Secondary resources: materials and
energy
The New Economic Resource: Data
Usage of Production Data is Changing
Data is used for reporting
Data is used for forecasting and predictions
Data is used for improving business
processes
Data is used for improving customer care
Data is used for product personalization
Data is used by customers and suppliers
Data is used …
Before
Now
The Importance of Data Quality
The quality of raw products determines
the quality of end products
The quality of labor determines the
quality of end products
Likewise …
The quality of data determines the
quality of an organization’s products and
efficiency
Data Quality is Key
Source: Experian Data Quality, 2015; see https://www.edq.com/uk/resources/papers/global-data-quality-research/
The Classic Data Warehouse Architecture
[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Analytics & reporting]
The Classic Data Warehouse Architecture
[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Analytics & reporting]
Data cleansing: ETL and manual corrections
“Old” Requirements
No need for real-time data in reports
• There was time to spend on data cleansing
No need for high-quality data in
production systems
Only internally-produced data used
for reporting
Mostly internal users
All reports developed by IT
specialists
New Requirements
Reporting and analytics require real-time data
External users, such as customers and
suppliers
Mixing of internal with external data
Machine-generated data
Self-service development of reports
…
Operational Business Intelligence
Web analytics: Which ad or product to present now
Security: Real-time face recognition
Factories: Changing machine settings based on real-time events
Call Centers: Predict the chance of churning and
predict which service or upgrade to offer
Incorrect data can lead to the wrong reaction
The Chain is Too Long for Real‐time Reporting
[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Operational analytics & reporting]
Too many steps and too much copying
The Chain is Too Long for Real‐time Reporting
[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Classic analytics & reporting; operational BI reports are shown separately]
Customer‐Driven BI
Real‐Time Reporting for Customers
Real‐Time Analytics for Customers
High Data Quality
is Crucial for
Customer‐Driven BI
Streaming Data
[Diagram: producers of data, listeners, a stream processor, storage of streaming data, and consumers of data]
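A minimal sketch of how a stream processor might check events in flight, so that incorrect data does not reach consumers and trigger the wrong reaction. The slides do not say where quality checks sit in a streaming setup, so the placement is an assumption, as are all names here (`sensor_readings`, `stream_processor`, the temperature range, and the sample events).

```python
from typing import Callable, Iterable, Iterator

Event = dict  # a simple record; the field names below are hypothetical


def sensor_readings() -> Iterator[Event]:
    """Hypothetical producer: yields machine events one at a time."""
    yield {"machine": "M1", "temperature": 71.5}
    yield {"machine": "M2", "temperature": -999.0}  # implausible reading
    yield {"machine": "M1", "temperature": 73.0}


def stream_processor(
    events: Iterable[Event],
    is_valid: Callable[[Event], bool],
    consumers: list,
    on_reject: Callable[[Event], None],
) -> None:
    """Pass each valid event to every consumer; hand invalid events to on_reject."""
    for event in events:
        if is_valid(event):
            for consume in consumers:
                consume(event)
        else:
            on_reject(event)


delivered = []    # stands in for a consumer of data
quarantined = []  # rejected events, kept for inspection

stream_processor(
    sensor_readings(),
    is_valid=lambda e: -50.0 <= e["temperature"] <= 200.0,  # assumed plausible range
    consumers=[delivered.append],
    on_reject=quarantined.append,
)
```

Rejected events are set aside rather than dropped, so they can still be inspected and corrected at the source.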
Data Streaming for Operational BI
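[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Analytics & reporting, drawn together with a streaming path of producers of data, a stream processor, and consumers of data; the diagram is marked with a question mark]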
Self‐Service BI Continues
Self-Service Data
Visualization
Self-Service Analytics
Self-Service ETL
Self-Service Data
Preparation
Self-Service …
Self‐Service Data Preparation
Non-technical interface for
studying data files
Easy way of defining rules
Data is fixed by defining
filters, not by changing data
in source systems
Relationship with data
blending
Users are defining their own
data quality rules
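As a rough illustration of fixing data by defining filters rather than changing the source, the sketch below applies user-defined rules while reading and leaves the input untouched. The `Rule` class, the `prepare` function, and the sample records are hypothetical and do not come from any specific data preparation tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict  # one row of a data file; field names below are hypothetical


@dataclass(frozen=True)
class Rule:
    """A named data quality rule: a condition a record must satisfy."""
    name: str
    check: Callable[[Record], bool]


def prepare(records: Iterable[Record], rules: list):
    """Split records into (accepted, rejected) without modifying the input.

    Each rejected entry is paired with the name of the first rule it failed.
    """
    accepted, rejected = [], []
    for record in records:
        failed = next((r.name for r in rules if not r.check(record)), None)
        if failed is None:
            accepted.append(record)
        else:
            rejected.append((record, failed))
    return accepted, rejected


# Hypothetical sample data standing in for an extract from a source system.
source = [
    {"customer": "A", "country": "NL", "revenue": 1200.0},
    {"customer": "B", "country": "", "revenue": 300.0},
    {"customer": "C", "country": "DE", "revenue": -50.0},
]

# Rules a business user might define.
rules = [
    Rule("country is filled in", lambda r: bool(r["country"])),
    Rule("revenue is not negative", lambda r: r["revenue"] >= 0),
]

accepted, rejected = prepare(source, rules)
```

Because the rules only filter what the user sees, every user can end up with a different view of the same source, which is one reason such rules deserve governance.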
Open Data is Available in Abundance
External Data Integration by IT?
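[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Analytics & reporting. Social media data, open data, and spreadsheets appear as extra sources loaded through additional ETL steps; the diagram is marked with a question mark]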
External Data Integration by Users
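[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Self-service analytics. Social media data, open data, and spreadsheets appear as extra sources; the diagram is marked with a question mark]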
Raising the Data Quality Bar
Option 1: Do Nothing
Option 2:
Old Technology
For New Applications
Option 3:
Adopt New Technology,
but Stick to Old Ideas
Recommendations (1)
Data quality is not only relevant
for reporting and analytics
Data has become a primary
economic resource
Data quality improves reporting
results, but has operational
business impact as well
Poor data quality can be as
damaging to an organization as
other poor-quality resources
Recommendations (2)
Presenting poor-quality data to
customers and suppliers will
reflect poorly on an organization
Poor data quality may lower trust
in the organization
Recommendations (3)
Move data quality checks
upstream
Develop new production systems
with data quality checks built-in
Use new architectures
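One way to read "data quality checks built-in" is that a production system refuses to store a record that fails its checks, so nothing has to be cleansed downstream. The sketch below assumes a hypothetical `Customer` record and `register_customer` function; the specific checks, including the deliberately loose email pattern, are illustrative only.

```python
import re
from dataclasses import dataclass

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class DataQualityError(ValueError):
    """Raised when a record fails a quality check at the point of entry."""


@dataclass(frozen=True)
class Customer:
    name: str
    email: str
    postal_code: str

    def __post_init__(self):
        # Checks run at creation time, before the record can be stored.
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not EMAIL_PATTERN.match(self.email):
            problems.append("email is not well formed")
        if not self.postal_code.strip():
            problems.append("postal code is empty")
        if problems:
            raise DataQualityError("; ".join(problems))


def register_customer(store: list, name: str, email: str, postal_code: str) -> Customer:
    """Hypothetical entry point of a production system."""
    customer = Customer(name, email, postal_code)  # raises before anything is stored
    store.append(customer)
    return customer


store = []
register_customer(store, "Ann", "ann@example.com", "2511 AA")
try:
    register_customer(store, "Bob", "not-an-email", "")
except DataQualityError as error:
    rejected_reason = str(error)
```

The second registration is rejected with both problems listed, and the store still holds only the valid record.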
Shortening the Chain
[Diagram: Source systems -> ETL -> Staging area -> ETL -> Data warehouse -> ETL -> Data marts -> Analytics & reporting, with further ETL arrows drawn alongside the chain]
Recommendations (4)
A data strategy is essential, not
optional, for implementing an
adequate data quality program
What is Data Strategy?
A single, unified, organization-wide plan …
… for the use of corporate data …
… as a vital asset for strategic and
operational decision-making.
Investing in a formal data strategy lends
much needed intentionality around critical
data related issues, such as data quality,
metadata, performance, data distribution,
organization, ownership, security, privacy,
etc.
Source: Capstone Consulting, January 2009
Data Quality