Data Quality Success Stories
1Copyright 2019 by Data Blueprint Slide #
Time:
• timeliness
• currency
• frequency
• time period
Form:
• clarity
• detail
• order
• presentation
• media
Content:
• accuracy
• relevance
• completeness
• conciseness
• scope
• performance
Starting point for new system development
Diagram labels: data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings
Metadata & Data Storage
Starting point for existing systems
Metadata Refinement
• Correct Structural Defects
• Update Implementation
Metadata Creation
• Define Data Architecture
• Define Data Model Structures
Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views
Data Refinement
• Correct Data Value Defects
• Re-store Data Values
Data Manipulation
• Manipulate Data
• Update Data
Data Utilization
• Inspect Data
• Present Data
Data Creation
• Create Data
• Verify Data Values
Data Assessment
• Assess Data Values
• Assess Metadata
Data & Data Relationships
Hypotheses, Rules and Quantifications
Queries and Reports
High Probability Data Quality Problem Cause Formulation
Raw Data
Good data is like good water: best served fresh, and ideally well-filtered. Data Management
strategies can produce tremendous procedural improvements and increased profit margins
across the board, but only if the data being managed is of a high quality. Determining how
Data Quality should be engineered provides a useful framework for utilizing Data Quality
Management effectively in support of business strategy, which in turn allows for speedy
identification of business problems, delineation between structural and practice-oriented
defects in Data Management, and proactive prevention of future issues. Organizations must
realize what it means to utilize Data Quality engineering in support of business strategy. This
webinar will illustrate how organizations with chronic business challenges often can trace the
root of the problem to poor Data Quality. Showing how Data Quality should be engineered
provides a useful framework in which to develop an effective approach. This in turn allows
organizations to more quickly identify business problems as well as data problems caused by
structural issues versus practice-oriented defects and prevent these from re-occurring.
Learning Objectives
• Help you understand foundational data quality concepts based on the DAMA Guide to the Data Management Body of Knowledge (DAMA DMBOK), as well as guiding principles, best practices, and steps for improving data quality at your organization
• Demonstrate how chronic business challenges for organizations are often rooted in poor data quality
• Share case studies illustrating the hallmarks and benefits of data quality success
Date: October 8, 2019
Time: 2:00 PM ET/11:00 AM PT UTC-4
Presenter: Peter Aiken, Ph.D.
Shannon Kempe
Chief Digital Manager at Dataversity.net
Commonly Asked Questions
1) Will I get copies of the
slides after the event?
2) Is this being recorded?
Get Social With Us!
Like Us on Facebook
www.facebook.com/datablueprint
Post questions and comments
Find industry news, insightful content and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
Ask questions and
submit your comments:
#dataed
• DAMA International President 2009-2013 / 2018
• DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Founder, Data Blueprint (datablueprint.com)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced w/ 500+ data
management practices worldwide
• Multi-year immersions
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
Data Quality Success Stories
Who is Joan Smith?
http://www.dataflux.com
Challenges
• Purchased an A4
on June 15 2007
• Had not done
business with the
dealership prior
• "makes them
seem sleazy when
I get a letter in the
mail before I've
even made the
first payment on
the car advertising
lower payments
than I got"
Letter from the Bank
… so please continue to open
your mail from either Chase or
Bank One
P.S. Please be on the lookout for any
upcoming communications from
either Chase or Bank One regarding
your Bank One credit card and any
other Bank One product you may
have.
Problems
• I initially discarded the letter!
• I became upset after reading it
• It proclaimed that Chase has data
quality challenges
How to solve this data quality problem using just tools?
Retail price for the unit was $40
A congratulations
letter from another
bank
Problems
• Bank did not know
it made an error
• Tools alone could
not have prevented
this error
• Lost confidence in
the ability of the
bank to manage
customer funds
DropTable
Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings based stories
5. Innovation based stories
6. Non-monetary stories
7. Takeaways and Q&A
Four ways to make your data sparkle!
1. Prioritize the task
– Cleaning data is costly and time consuming
– Identify mission critical/non-mission critical data
2. Involve the data owners
– Seek input of business units on what constitutes "dirty" data
3. Keep future data clean
– Incorporate processes and technologies that check every zip code and area code
4. Align your staff with business
– Align IT staff with business units
(Source: CIO, July 1, 2004)
• Information transparency
• Analytics
• Business Intelligence
• Increasing efficiencies
• Decreasing costs
• Driving holistic decision-making
across the organization
High Quality Data is Critical
• SQL Server
– 47,000,000,000,000 bytes
– Largest table 34 billion records
• Informix
– 1,800,000,000 queries/day
– 65,000,000 tables / 517,000 databases
• Teradata
– 117 billion records
– 23 TBs for one table
• DB2
– 29,838,518,078 daily queries
Data Footprints
Repeat 100s, thousands, millions, billions of times ...
Death by 1000 Cuts
Garbage In ➜ Garbage Out!
My most profound lesson! (so far)
Perfect Model
Garbage Data
Garbage Results
Data Warehouse
Machine Learning
Business Intelligence
Block Chain
AI
MDM
Data Governance
Analytics
Technology
GI➜GO!
Perfect Model
Quality Data is foundational
Garbage Results
Data Warehouse
Machine Learning
Business Intelligence
Block Chain
AI
MDM
Data Governance
Analytics
Technology
GI➜GO!
Perfect Model
Quality Data is foundational
Good Results
Data Warehouse
Machine Learning
Business Intelligence
Block Chain
AI
MDM
Data Governance
Analytics
Technology
Quality In ➜ Quality Out!
Data Knowledge is insufficient and informal
• Data management happens 'pretty well' at the workgroup level
– Defining characteristic of a workgroup
– Without guidance, what are the chances that all workgroups are pulling toward the same objectives?
– Consider the time spent attempting informal practices
• Data chaff becomes sand in the machinery
– Preventing smooth interoperation and exchanges
– Difficulties that have been hard to account for
• Organizations and individuals lack
– Skills
– Knowledge (architecture)
– Data Engineering (how)
– Data Strategy (why)
Standard data
Data supply
Data literacy
Making a Better Data Sandwich
Standard data
Data supply
Data literacy
This cannot happen without engineering and architecture!
Quality engineering/architecture work products do not happen accidentally!
Making a Better Data Sandwich
Standard data
Data supply
Data literacy
This cannot happen without data engineering and architecture!
Quality data engineering/architecture work products do not happen accidentally!
Our barn had to pass a foundation inspection
Engineering Standards
USS Midway &
Pancakes
• It is tall
• It has a clutch
• It was built in 1942
• It is cemented to the floor
• It is still in regular use!
Why is this an excellent
engineering example?
Hidden Data Factories are expensive https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
• Consider these two questions:
– Were your systems explicitly designed to be integrated or otherwise work together?
– If not then what is the likelihood that they will just happen to work well together?
• Data must function at the most granular interaction or it results in things that:
– Take longer (end-of-day job runs 45 hours)
– Cost more (the wrong assets are transferred)
– Deliver less (features are not delivered)
– Present greater risk (billing delayed 30 days, monthly)
• 20-40% of IT budgets are spent evolving data:
– Data migration (changing the location from one place to another)
– Data conversion (changing it into another form, state, or product)
– Data improvement (inspecting, manipulating it, preparing for subsequent use)
"The choice of data structure and algorithm
can make the difference between software
running in a few seconds or many days."

http://slideplayer.com/slide/7664141/
DQ challenges are context specific!
Much more analysis is
required before we can
implement repeatable
solutions to today's data
quality challenges!
[Infographic: Data Never Sleeps (2018), presented by Domo: how much data is generated every minute of the day]
https://www.domo.com/learn/data-never-sleeps-6
How much Data, by the minute!
For the entirety of 2018, every minute
of every day:
• 18 million weather forecast requests
• Netflix streams almost 100,000
hours of video
• LinkedIn adds 120+ individuals
• 1,300 Uber rides
• (almost) a half million tweets
• 7,000 Tinder matches
• 1.25 new cryptocurrencies are
created
• ...
Great inspiration towards valuation ...
• How to Measure Anything: Finding the Value of Intangibles in Business by Douglas Hubbard (ISBN: 0470539399)
• Measurement is a reduction in uncertainty
• Formalizing stuff forces clarity
• Whatever your measurement problem is,
– it's been done before
• You have more data than you think
• You need less data than you think
• Getting data is more economical than you think
• You probably need different data than you think
• Special shout out to Chapter 7
– Measuring the value of additional information to a decision
Sheena's in color Activity-Based Costing Kills Someone
Enrico Fermi (Nobel Prize Physics 1938)
• Tuners in Chicago ≈ Population/people per household
times % households with tuned pianos
times tunings per year
divided by (tunings per tuner per day
times workdays/year)
• How many piano tuners in the city of Chicago?
– Without using existing lists such as yellow pages, google ...
– Current population of Chicago (3 million at the time)
– Average number of people per household (2 or 3)
– Share of households with regularly tuned pianos (1 in 3)
– Required frequency of tuning (1/year)
– How many pianos can a tuner tune daily? (4 or 5)
– How many days/year are worked (250)
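The estimate above can be worked through numerically. The following is a minimal sketch using the slide's figures; where the slide gives a range, one value is picked as an assumption (3 people per household, 4 tunings per tuner per day). The function and parameter names are illustrative and not from the source. With these inputs the estimate comes out at roughly 333 tuners.

```python
def estimate_tuners(population, people_per_household, share_with_tuned_piano,
                    tunings_per_piano_per_year, tunings_per_tuner_per_day,
                    workdays_per_year):
    """Fermi estimate: yearly tuning demand divided by one tuner's yearly capacity."""
    households = population / people_per_household
    tunings_needed = households * share_with_tuned_piano * tunings_per_piano_per_year
    tunings_per_tuner = tunings_per_tuner_per_day * workdays_per_year
    return tunings_needed / tunings_per_tuner


chicago_tuners = estimate_tuners(
    population=3_000_000,           # from the slide
    people_per_household=3,         # assumption: upper end of "2 or 3"
    share_with_tuned_piano=1 / 3,   # from the slide
    tunings_per_piano_per_year=1,   # from the slide
    tunings_per_tuner_per_day=4,    # assumption: lower end of "4 or 5"
    workdays_per_year=250,          # from the slide
)
print(round(chicago_tuners))
```

Changing any single input within its stated range shows how sensitive the answer is, which is the point of the exercise: a rough but defensible number is obtained without looking anything up.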
Monetization: Time & Leave Tracking
At least 300 employees are spending 15 minutes/week tracking leave/time
Capture Cost of Labor/Category
District-L (as an example)   Leave Tracking   Time Accounting
Employees                    73               50
Number of documents          1000             2040
Timesheet/employee           13.7             40.8
Time spent                   0.08             0.25
Hourly Cost                  $6.92            $6.92
Additive Rate                $11.23           $11.23
Cost per timekeeper          $12.31           $114.56
Total timekeeper cost        $898.49          $5,727.89
Monthly cost                 $21,563.83       $137,469.40
Compute Labor Costs
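The District-L figures can be approximately reproduced with a few multiplications. The sketch below assumes that cost per timekeeper is timesheets per employee times hours per timesheet times the additive rate, and that the total multiplies this by the number of employees. That relationship is inferred from the numbers on the slide and is not stated in the source; the function names are illustrative.

```python
def cost_per_timekeeper(timesheets_per_employee, hours_per_timesheet, additive_rate):
    """Inferred relationship: documents handled x time per document x loaded hourly rate."""
    return timesheets_per_employee * hours_per_timesheet * additive_rate


def total_timekeeper_cost(employees, per_timekeeper):
    """Cost across all employees doing the tracking."""
    return employees * per_timekeeper


# Values taken from the District-L example.
leave_each = cost_per_timekeeper(13.7, 0.08, 11.23)
time_each = cost_per_timekeeper(40.8, 0.25, 11.23)

leave_total = total_timekeeper_cost(73, leave_each)
time_total = total_timekeeper_cost(50, time_each)

print(f"Leave tracking:  {leave_each:.2f} each, {leave_total:.2f} total")
print(f"Time accounting: {time_each:.2f} each, {time_total:.2f} total")
```

The results land within rounding distance of the slide's figures, which suggests the slide was computed from unrounded inputs.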
Annual Organizational Totals
• $100,000 Salem
• $159,000 Lynchburg
• $100,000 Richmond
• $100,000 Suffolk
• $150,000 Fredericksburg
• $100,000 Staunton
• $100,000 NOVA
• $800,000/month or $9,600,000 annually
• Awareness of the cost of things considered overhead
The Data Management Body of Knowledge
Data Management Practice Areas
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Overview: Data Quality Engineering
Definitions
• Quality Data
– Fit for purpose: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler)
– Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
• Data Quality Management
– Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality
– Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
✓ Critical supporting process from change management
✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels
• Data Quality Engineering
– Recognition that data quality solutions cannot be managed but must be engineered
– Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges
– Engineering concepts are generally not known and understood within IT or business!
Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
Why aren't my data problems solved by a data warehouse?
Version 1
Data Strategy
Data Governance
Data Quality
Improving operations in 3 data management practice areas
BI/Warehouse
Version 2
Data Strategy
Data Governance
BI/Warehouse
Metadata
Improving operations in 3 data management practice areas
Version 3
Data Strategy
Data Governance
BI/Warehouse
Reference & Master Data
Perfecting operations in 3 data management practice areas
Improve Operations
Innovation
Data quality focus should be sequenced
Ubiquitous Mystery Object
Complex Data Quality Problems
• Agency manages (4,000,000 data items)
– Executive in charge requested a conversion update
– Was told verbally the conversion was "going well"
– Demanded specifics
• Question: "How many items did you attempt to convert?"
• Answer: "100 items"
• Question: "How many were actually converted?"
• Answer: "5"
• Problems
– Not reporting the "right results"
– These "problems" were discovered too late in the project
– Unsophisticated contractor
Improving Data Quality during System Migration
• Challenge
– Millions of NSN/SKUs maintained in a catalog
– Key and other data stored in clear text/comment fields
– Original suggestion was manual approach to text extraction
– Left the data structuring problem unsolved
• Solution
– Proprietary, improvable text extraction process
– Converted non-tabular data into tabular data
– Saved a minimum of $5 million
– Literally person centuries of work
Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
1        31.47%                      1.34%                       N/A
2        21.22%                      6.97%                       N/A
3        20.66%                      7.49%                       N/A
4        32.48%                      11.99%                      55.53%
…        …                           …                           …
14       9.02%                       22.62%                      68.36%
15       9.06%                       22.62%                      68.33%
16       9.53%                       22.62%                      67.85%
17       9.5%                        22.62%                      67.88%
18       7.46%                       22.62%                      69.92%
Determining Diminishing Returns
Before
After
Time needed to review all NSNs once over the life of the project:
NSNs 2,000,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 10,000,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total Work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 10,000,000
Minutes available person/year 108,000
Total Person-Years 92.6
Resource Cost to cleanse NSN's prior to migration:
Avg Salary for SME year (not including overhead) $60,000.00
Projected Years Required to Cleanse/Total DLA Person Year Saved 93
Total Cost to Cleanse/Total DLA Savings to Cleanse NSN's: $5.5 million
Quantitative Benefits
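The person-year arithmetic in the table can be sketched as follows. The inputs are the table's own figures (5 minutes per NSN, 48 work weeks, 5 days, 7.5 hours, $60,000 salary without overhead); the function name and its return shape are illustrative assumptions, not part of the source.

```python
def cleansing_effort(items, minutes_per_item, work_weeks=48, days_per_week=5,
                     hours_per_day=7.5, annual_salary=60_000):
    """Return (person_years, labor_cost) to review every item once by hand."""
    total_minutes = items * minutes_per_item
    minutes_per_person_year = work_weeks * days_per_week * hours_per_day * 60
    person_years = total_minutes / minutes_per_person_year
    return person_years, person_years * annual_salary


years_all, cost_all = cleansing_effort(2_000_000, 5)
years_reduced, cost_reduced = cleansing_effort(150_000, 5)

print(f"2,000,000 NSNs: {years_all:.1f} person-years, ${cost_all:,.0f}")
print(f"  150,000 NSNs: {years_reduced:.1f} person-years, ${cost_reduced:,.0f}")
```

The slide rounds person-years up to whole years before multiplying by salary, so its dollar figures differ slightly from the unrounded results here.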
Time needed to review all NSNs once over the life of the project:
NSNs 150,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 750,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total Work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 750,000
Minutes available person/year 108,000
Total Person-Years 7
Resource Cost to cleanse NSN's prior to migration:
Avg Salary for SME year (not including overhead) $60,000.00
Projected Years Required to Cleanse/Total DLA Person Year Saved 7
Total Cost to Cleanse/Total DLA Savings to Cleanse NSN's: $420,000
Year 2000 (or Y2K) Bug
• Before the internet
– Computing resources were expensive
– It was worth the tradeoff to represent the year
field using two digits
– 1959 was represented to the computer as 59
– Subtracting 59 from 99 yields the correct answer
40 (for dates prior to 2000/01/01!)
– No one expected those programs to still be in use
– Documentation was poorly created/maintained
• If all these fields were not expanded to four digits before 2000/01/01 then date calculations will not give correct results
– Subtracting 59 from 00 yields the incorrect answer -59 (the correct answer is 41)
– No one knew how long this would take or cost, only when it must be completed!
On the OFFICIAL Clock of the United States at 1
second BEFORE Midnight showed:
December 31, 1999 11:59:59
One SECOND later the OFFICIAL Clock of the United
States showed:
January 1, 19100 00:00:01
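Both failure modes can be shown in a few lines. This is a minimal sketch: the first function mimics two-digit year subtraction as described above, and the second shows one well-known way a display such as "19100" can arise, by gluing a years-since-1900 counter onto the literal "19". That this was the actual cause of the clock display is an assumption; the source does not say. All function names are illustrative.

```python
def age_two_digit(birth_yy, current_yy):
    """Subtract two-digit years, as many pre-2000 programs did."""
    return current_yy - birth_yy


def format_year_buggy(years_since_1900):
    """Assumed bug: concatenate the counter onto "19" instead of adding 1900."""
    return "19" + str(years_since_1900)


def format_year_fixed(years_since_1900):
    """Add the offset numerically, then format."""
    return str(1900 + years_since_1900)


print(age_two_digit(59, 99))    # someone born in 1959, evaluated in 1999
print(age_two_digit(59, 0))     # same person, evaluated in 2000: negative
print(format_year_buggy(99), format_year_buggy(100))
print(format_year_fixed(99), format_year_fixed(100))
```

Storing the full four-digit year, or adding the offset numerically before formatting, removes both problems.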
Why should a knowledge worker
• with a PhD in Chemical Engineering
• have to know whether this product was Y2K compliant?
International Chemical Company Engine Testing
• $1 billion (+) chemical company
• Develops/manufactures additives enhancing the performance of oils and fuels ...
• ... to enhance engine/machine performance
– Helps fuels burn cleaner
– Engines run smoother
– Machines last longer
• Tens of thousands of tests annually
– Test costs range up to $250,000!
1. Manual transfer of digital data
2. Manual file movement/duplication
3. Manual data manipulation
4. Disparate synonym reconciliation
5. Tribal knowledge requirements
6. Non-sustainable technology
Data Integration Solution
• Integrated the existing systems to
easily search on and find similar
or identical tests
• Results:
– Reduced expenses
– Improved competitive edge and customer service
– Time savings and improved operational capabilities
• According to our client’s internal
business case development, they
expect to realize a $25 million
gain each year thanks to this
data integration
Lockheed Martin
• 20 years of project email
– Example from Doug Laney
Logistics Company
• Fortune 450
• Room of 100 associates
• Manually correcting every item on every customer invoice
• Upon noting this to the responsible manager, the reply was:
– This is the best quarter
– Of the best year
– I've ever had
– Perhaps I need to double the number in that room?
US DoD Reverse Engineering Program Manager
• "Your first project is to keep me from
having to testify to a Congressional
Hearing!" (Belkis Leon-Hong former ASD-C3I)
• Problem:
– 37 systems paid personnel within DoD
– How many were needed?
– How many potential losers?
– What do you mean by employee?
• Process modeling
– Inconclusive results
• Data reverse engineering - definitive
– One legged engineer, working in waist deep waters, underneath rotating helicopter blades, on overtime
Reverse Engineering New Systems
Reverse Engineering New Systems for Smooth Implementation. IEEE Software. March/April 1999 16(2):36-43
Platform: UniSys
OS: OS
1998 Age: 21
Data Structure: DMS (Network)
Physical Records: 4,950,000
Logical Records: 250,000
Relationships: 62
Entities: 57
Attributes: 1478
Predicting Engineering Problem Characteristics
New System
Legacy System #1: Payroll
Legacy System #2: Personnel
Platform: Amdahl
OS: MVS
1998 Age: 15
Data Structure: VSAM/virtual database tables
Physical Records: 780,000
Logical Records: 60,000
Relationships: 64
Entities: 4/350
Attributes: 683
Platform: WinTel
OS: Win'95
1998 Age: new
Data Structure: Client/Server RDBMS
Characteristics   Logical    Physical
Records           250,000    600,000
Relationships     1,034      1,020
Entities          1,600      2,706
Attributes        15,000     7,073
The Budget Trap (Parts 1 & 2)
Actual Bid From Systems Integrator
Extreme Data Engineers ...
2 person months = 40 person days
2,000 attributes mapped onto 15,000
2,000/40 person days = 50/person day or 50/8 hours = 6.25 attributes/hour
and
15,000/40 person days = 375/person day or 375/8 hours = 46.875 attributes/hour
Locate, identify, understand, map, transform, document
about 53 attributes/60 minutes
about 0.9 attributes/minute!
What did Rolls Royce Learn from Nascar?
• Old model
– Sell jet engines
• New model
– Sell hours of thrust power
– Power-by-the-hour
– No payment for down time
– Wing to wing
– When was it invented?
Fan Blade Sensor
• 1 Sensor
– Probabilistic (generalist) maintenance
forecasts
• 100 Sensors
– Establish optimal monitoring targets
– Finer tuned and safer maintenance
– Mission Readiness ???
– Storage $$$
– Handling $$$
– Opportunity $$$
– Systemic $$$
– Maintenance $$$
– Total > $1.5 Billion
Armed Force Example
• Lieutenant attempting to correct a 4-year underpayment of his private's pay
– Significant impact on morale
– Immediate cash issues
– Cost tens of man hours over months of time to resolve
Nugee, R. and R. S. Seiner (2010, 6/1/2010). "TDAN.com Interview with Brigadier Richard Nugee – The British Army." 2013, from http://www.tdan.com/view-special-features/13897 and personal communications.
Friendly Fire deaths traced to Dead Battery
• Date: Tue, 26 Mar 2002 10:47:52 -0500
From:
Subject: Friendly Fire deaths traced to dead battery

In one of the more horrifying incidents I've read about, U.S. soldiers and allies were killed in December 2001 because of a stunningly poor design of a GPS receiver, plus "human error."

http://www.washingtonpost.com/wp-dyn/articles/A8853-2002Mar23.html

A U.S. Special Forces air controller was calling in GPS positioning from some sort of battery-powered device. He "had used the GPS receiver to calculate the latitude and longitude of the Taliban position in minutes and seconds for an airstrike by a Navy F/A-18."

• According to the *Post* story, the bomber crew "required" a "second calculation in 'degree decimals'" -- why the crew did not have equipment to perform the minutes-seconds conversion themselves is not explained.

• The air controller had recorded the correct value in the GPS receiver when the battery died. Upon replacing the battery, he called in the degree-decimal position the unit was showing -- without realizing that the unit is set up to reset to its *own* position when the battery is replaced.

The 2,000-pound bomb landed on his position, killing three Special Forces soldiers and injuring 20 others.

• If the information in this story is accurate, the RISKS involve replacing memory settings with an apparently-valid default value instead of blinking 0 or some other obviously-wrong display; not having a backup battery to hold values in memory during battery replacement; not equipping users to translate one coordinate system to another (reminiscent of the Mars Climate Orbiter slamming into the planet when ground crews confused English with metric); and using a device with such flaws in a combat situation.
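The "degree decimals" the aircrew asked for are a fixed arithmetic transformation of a minutes-and-seconds reading. A minimal sketch of that conversion follows, assuming a hypothetical helper named `dms_to_decimal` and made-up sample readings; neither comes from the Post story or from any real receiver's firmware.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees.

    Hypothetical helper for illustration only.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if degrees < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("degrees must be non-negative; minutes and seconds must be in [0, 60)")

    # One degree is 60 minutes and one minute is 60 seconds.
    decimal = degrees + minutes / 60 + seconds / 3600

    # Latitude tops out at 90 degrees, longitude at 180.
    limit = 90 if hemisphere in ("N", "S") else 180
    if decimal > limit:
        raise ValueError(f"{decimal} exceeds {limit} degrees for hemisphere {hemisphere}")

    # South and west are negative by convention.
    return -decimal if hemisphere in ("S", "W") else decimal


# Made-up sample readings, not positions from the incident.
print(dms_to_decimal(31, 30, 0, "N"))  # 31.5
print(dms_to_decimal(65, 15, 0, "W"))  # -65.25
```

The helper raises an error on an out-of-range reading instead of substituting a plausible-looking value, in the spirit of the RISKS point about apparently-valid defaults.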
Formalizing the Role of U.S. Army Data Governance
How one inventory item proliferates data throughout the chain
• 555 Subassemblies & subcomponents
• 17,659 Repair parts or Consumables
• System 1: 18,214 total items, 75 attributes/item, 1,366,050 total attributes
• System 2: 47 total items, 15+ attributes/item, 720 total attributes
• System 3: 16,594 total items, 73 attributes/item, 1,211,362 total attributes
• System 4: 8,535 total items, 16 attributes/item, 136,560 total attributes
• System 5: 15,959 total items, 22 attributes/item, 351,098 total attributes
Total for the five systems shown above:
• 59,350 items
• 179 unique attributes
• 3,065,790 values
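The per-system totals on this slide are items multiplied by attributes per item, and the grand total of values is the sum of those products. A short sketch that recomputes them from the figures as printed; the tuple layout and variable names are assumptions made for illustration, and System 2's "15+" is treated as a lower bound of 15.

```python
# Figures copied from the slide: (system, total items, attributes per item, total attributes).
# System 2 is listed with "15+" attributes per item, so 15 is only a lower bound there.
systems = [
    ("System 1", 18_214, 75, 1_366_050),
    ("System 2", 47, 15, 720),
    ("System 3", 16_594, 73, 1_211_362),
    ("System 4", 8_535, 16, 136_560),
    ("System 5", 15_959, 22, 351_098),
]

for name, items, attrs_per_item, stated_total in systems:
    computed = items * attrs_per_item
    note = "matches" if computed == stated_total else "differs from stated total"
    print(f"{name}: {items:,} x {attrs_per_item} = {computed:,} ({note}; slide says {stated_total:,})")

total_items = sum(items for _, items, _, _ in systems)
total_values = sum(stated_total for _, _, _, stated_total in systems)
print(f"Total items:  {total_items:,}")
print(f"Total values: {total_values:,}")
```

With the figures as printed, the attribute totals add up to the 3,065,790 values shown on the slide. The per-system item counts add up to 59,349, one fewer than the slide's 59,350, and the slide does not say where the difference comes from.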
Business Implications
• National Stock Number (NSN) Discrepancies
– If NSNs in LUAF, GABF, and RTLS are not present in the MHIF, these records cannot be updated in SASSY
– Additional overhead is created to correct data before performing the real maintenance of records
• Serial Number Duplication
– If multiple items are assigned the same serial number in RTLS, the traceability of those items is severely impacted
– Approximately $531 million of SAC 3 items have duplicated serial numbers
• On-Hand Quantity Discrepancies
– If the LUAF O/H QTY and number of items serialized in RTLS conflict, there can be no clear answer as to how many items a unit actually has on-hand
– Approximately $5 billion of equipment does not tie out between the systems
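The three discrepancy types above each map to a simple cross-system check. The sketch below runs them on a handful of invented records; the system names come from the slide, but the record layouts, unit name, NSNs, and serial numbers are all hypothetical and do not reflect the real file formats.

```python
from collections import Counter

# Hypothetical sample data; only the system names come from the slide.
mhif_nsns = {"1005-01-000-0001", "2320-01-000-0002"}  # NSNs known to MHIF
luaf = [  # (unit, NSN, on-hand quantity)
    ("UNIT-A", "1005-01-000-0001", 3),
    ("UNIT-A", "5820-01-000-0003", 2),
]
rtls = [  # (unit, NSN, serial number)
    ("UNIT-A", "1005-01-000-0001", "SN100"),
    ("UNIT-A", "1005-01-000-0001", "SN101"),
    ("UNIT-A", "5820-01-000-0003", "SN200"),
    ("UNIT-A", "5820-01-000-0003", "SN200"),
]

# 1. NSN discrepancies: NSNs referenced elsewhere but absent from MHIF.
referenced = {nsn for _, nsn, _ in luaf} | {nsn for _, nsn, _ in rtls}
missing_from_mhif = sorted(referenced - mhif_nsns)

# 2. Serial number duplication: serials that appear on more than one RTLS record.
serial_counts = Counter(serial for _, _, serial in rtls)
duplicated_serials = sorted(s for s, n in serial_counts.items() if n > 1)

# 3. On-hand quantity discrepancies: LUAF quantity vs. count of serialized RTLS records.
serialized = Counter((unit, nsn) for unit, nsn, _ in rtls)
qty_mismatches = {
    (unit, nsn): (qty, serialized[(unit, nsn)])
    for unit, nsn, qty in luaf
    if qty != serialized[(unit, nsn)]
}

print("NSNs missing from MHIF:", missing_from_mhif)
print("Duplicated serial numbers:", duplicated_serials)
print("On-hand mismatches (LUAF qty, RTLS count):", qty_mismatches)
```

In this invented sample the second NSN's quantity agrees with RTLS only because a duplicated serial number is counted twice, which suggests running the duplicate check before relying on the quantity comparison.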
Best approaches combine manual effort and automation
Humans Generally Better
• Sense low level stimuli
• Detect stimuli in noisy background
• Recognize constant patterns in varying situations
• Sense unusual and unexpected events
• Remember principles and strategies
• Retrieve pertinent details without a priori connection
• Draw upon experience and adapt decision to situation
• Select alternatives if original approach fails
• Reason inductively; generalize from observations
• Act in unanticipated emergencies and novel situations
• Apply principles to solve varied problems
• Make subjective evaluations
• Develop new solutions
• Concentrate on important tasks when overload occurs
• Adapt physical response to changes in situation
Machines Generally Better
• Sense stimuli outside human's range
• Count or measure physical quantities
• Store quantities of coded information accurately
• Monitor prespecified events, especially infrequent
• Make rapid and consistent responses to input signals
• Recall quantities of detailed information accurately
• Retrieve pertinent details without a priori connection
• Process quantitative data in prespecified ways
• Perform repetitive preprogrammed actions reliably
• Exert great, highly controlled physical force
• Perform several activities simultaneously
• Maintain operations under heavy operation load
• Maintain performance over extended periods of time
Potential Data Sources
Data Mapping
[Diagram labels: Mental illness, Deployments, Work History, Soldier, Legal Issues, Abuse, Suicide Analysis; sources: FAPDMSS, G1, DMDC, CID, MDR]
• Data objects complete?
• All sources identified?
• Best source for each object?
• How to reconcile differences between sources?
Senior Army Official
• Room full of Stewards
• A very heavy dose of management support
• Advised the group of his opinion on the matter
• Any questions as to future direction
– "They should make an appointment to speak directly with me!"
• Empower the team
– The conversation turned from "can this be done?" to "how are we going to accomplish this?"
– Mistakes along the way would be tolerated
– Implement a workable solution in prototype form
Managing Data with Guidance?
• Federal employees
• 44 users from whitehouse.gov
• Thousands of military and government e-mails
• Canadian citizens
• One-fifth of Quebec


Ashley Madison: 37,000,000
OPM: 25,000,000
Target: 70,000,000
Target Corporation's Database Contents
• Your age
• Marital status
• Part of town you live in
• How long it takes you to drive to work
• Estimated salary
• If you have recently moved
• Credit cards carried in your wallet
• What websites you visit
• Your ethnicity
• Your job history
• The magazines you read
• Work commute
• Sexual preferences
• If you’ve ever declared bankruptcy or got divorced
• The year you bought (or lost) your house
• Where you went to school(s)
• What kinds of topics you talk about online
• Whether you prefer certain brands of coffee, paper towels, cereal or applesauce
• Your political leanings, reading habits, and charitable giving
• The number of cars you own
https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation/
How the Government Jeopardized Our National Security for More than a Generation
Key Findings
• Preventable
• Leadership failed
– To heed repeated recommendations
– To sufficiently respond to growing threats of sophisticated cyber attacks, and
– To prioritize resources for cybersecurity
• 2014 data breaches were likely connected to, and possibly coordinated with, the 2015 data breach
• OPM misled the public on the extent of the damage of the breach and made false statements to Congress
Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings-based stories
5. Innovation-based stories
6. Non-monetary stories
7. Takeaways and Q&A
Takeaways
• Quality data requires a context-specific definition
• Most business problems have data challenges (hidden data factories) at their root
• All advanced data practices depend on quality data
• AI/ML are suffering from a lack of training data
• Few 'easy' fixes exist
• Data quality engineering works well when combined with other DM BoK 'pie wedges'
• Successful data quality stories demonstrate
– Tangible ongoing savings
– Innovative data uses
– Outcomes more important than money
Questions?
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056

Contenu connexe

Tendances

Linking Data Governance to Business Goals
Linking Data Governance to Business GoalsLinking Data Governance to Business Goals
Linking Data Governance to Business GoalsPrecisely
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Top 7 Capabilities for Next-Gen Master Data Management
Top 7 Capabilities for Next-Gen Master Data ManagementTop 7 Capabilities for Next-Gen Master Data Management
Top 7 Capabilities for Next-Gen Master Data ManagementDATAVERSITY
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Master Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceMaster Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceDATAVERSITY
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model DATUM LLC
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data GovernanceJohn Bao Vuu
 
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Element22
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Overcoming the Challenges of your Master Data Management Journey
Overcoming the Challenges of your Master Data Management JourneyOvercoming the Challenges of your Master Data Management Journey
Overcoming the Challenges of your Master Data Management JourneyJean-Michel Franco
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity ModelsAlan McSweeney
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
Data Governance and Stewardship Roundtable
Data Governance and Stewardship RoundtableData Governance and Stewardship Roundtable
Data Governance and Stewardship RoundtableSumma
 

Tendances (20)

Linking Data Governance to Business Goals
Linking Data Governance to Business GoalsLinking Data Governance to Business Goals
Linking Data Governance to Business Goals
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Top 7 Capabilities for Next-Gen Master Data Management
Top 7 Capabilities for Next-Gen Master Data ManagementTop 7 Capabilities for Next-Gen Master Data Management
Top 7 Capabilities for Next-Gen Master Data Management
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Data Strategy
Data StrategyData Strategy
Data Strategy
 
Master Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and GovernanceMaster Data Management - Aligning Data, Process, and Governance
Master Data Management - Aligning Data, Process, and Governance
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Data Governance for Enterprises
Data Governance for EnterprisesData Governance for Enterprises
Data Governance for Enterprises
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data Governance
 
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Overcoming the Challenges of your Master Data Management Journey
Overcoming the Challenges of your Master Data Management JourneyOvercoming the Challenges of your Master Data Management Journey
Overcoming the Challenges of your Master Data Management Journey
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity Models
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Data Governance and Stewardship Roundtable
Data Governance and Stewardship RoundtableData Governance and Stewardship Roundtable
Data Governance and Stewardship Roundtable
 

Similaire à Data Quality Success Stories

Data Quality Success Stories
Data Quality Success StoriesData Quality Success Stories
Data Quality Success StoriesDATAVERSITY
 
DataEd Slides: Approaching Data Management Technologies
DataEd Slides:  Approaching Data Management TechnologiesDataEd Slides:  Approaching Data Management Technologies
DataEd Slides: Approaching Data Management TechnologiesDATAVERSITY
 
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
DataEd Webinar:  Reference & Master Data Management - Unlocking Business ValueDataEd Webinar:  Reference & Master Data Management - Unlocking Business Value
DataEd Webinar: Reference & Master Data Management - Unlocking Business ValueDATAVERSITY
 
DataEd Slides: Data Modeling is Fundamental
DataEd Slides:  Data Modeling is FundamentalDataEd Slides:  Data Modeling is Fundamental
DataEd Slides: Data Modeling is FundamentalDATAVERSITY
 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)DATAVERSITY
 
DataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDATAVERSITY
 
DataEd Slides: Data Strategy Best Practices
DataEd Slides:  Data Strategy Best PracticesDataEd Slides:  Data Strategy Best Practices
DataEd Slides: Data Strategy Best PracticesDATAVERSITY
 
DataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDATAVERSITY
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality RightDATAVERSITY
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data Blueprint
 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMDATAVERSITY
 
Data Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityData Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityDATAVERSITY
 
DataEd Slides: Data Management Best Practices
DataEd Slides: Data Management Best PracticesDataEd Slides: Data Management Best Practices
DataEd Slides: Data Management Best PracticesDATAVERSITY
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsDATAVERSITY
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data Blueprint
 
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DATAVERSITY
 
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyData-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyDATAVERSITY
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture StrategiesDATAVERSITY
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityDATAVERSITY
 
DataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDATAVERSITY
 

Similaire à Data Quality Success Stories (20)

Data Quality Success Stories
Data Quality Success StoriesData Quality Success Stories
Data Quality Success Stories
 
DataEd Slides: Approaching Data Management Technologies
DataEd Slides:  Approaching Data Management TechnologiesDataEd Slides:  Approaching Data Management Technologies
DataEd Slides: Approaching Data Management Technologies
 
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
DataEd Webinar:  Reference & Master Data Management - Unlocking Business ValueDataEd Webinar:  Reference & Master Data Management - Unlocking Business Value
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
 
DataEd Slides: Data Modeling is Fundamental
DataEd Slides:  Data Modeling is FundamentalDataEd Slides:  Data Modeling is Fundamental
DataEd Slides: Data Modeling is Fundamental
 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
 
DataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data Sins
 
DataEd Slides: Data Strategy Best Practices
DataEd Slides:  Data Strategy Best PracticesDataEd Slides:  Data Strategy Best Practices
DataEd Slides: Data Strategy Best Practices
 
DataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management TechnologiesDataEd Slides: Leveraging Data Management Technologies
DataEd Slides: Leveraging Data Management Technologies
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM
 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDM
 
Data Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityData Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great Accountability
 
DataEd Slides: Data Management Best Practices
DataEd Slides: Data Management Best PracticesDataEd Slides: Data Management Best Practices
DataEd Slides: Data Management Best Practices
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture Requirements
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
 
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyData-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture Strategies
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data Quality
 
DataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data SinsDataEd Slides: The Seven Deadly Data Sins
DataEd Slides: The Seven Deadly Data Sins
 

Plus de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...DATAVERSITY
 

Plus de DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
 


Data Quality Success Stories

  • 1. Data Quality Success Stories
Copyright 2019 by Data Blueprint

[Figure: data quality attributes]
Time: timeliness, currency, frequency, time period
Form: clarity, detail, order, presentation, media
Content: accuracy, relevance, completeness, conciseness, scope, performance

[Figure: data life cycle model, with one starting point for new system development and one for existing systems, built around Metadata & Data Storage]
• Metadata Creation: Define Data Architecture; Define Data Model Structures
• Metadata Structuring: Implement Data Model Views; Populate Data Model Views
• Metadata Refinement: Correct Structural Defects; Update Implementation
• Data Creation: Create Data; Verify Data Values
• Data Utilization: Inspect Data; Present Data
• Data Manipulation: Manipulate Data; Update Data
• Data Refinement: Correct Data Value Defects; Re-store Data Values
• Data Assessment: Assess Data Values; Assess Metadata
Flows between phases: data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings

[Figure: Raw Data; Data & Data Relationships; Hypotheses, Rules and Quantifications; Queries and Reports; High Probability Data Quality Problem Cause Formulation]

Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of a high quality. Determining how Data Quality should be engineered provides a useful framework for utilizing Data Quality Management effectively in support of business strategy, which in turn allows for speedy identification of business problems, delineation between structural and practice-oriented defects in Data Management, and proactive prevention of future issues. Organizations must realize what it means to utilize Data Quality engineering in support of business strategy.

This webinar will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor Data Quality. Showing how Data Quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems, distinguish data problems caused by structural issues from those caused by practice-oriented defects, and prevent both from recurring.

Learning Objectives
• Help you understand foundational data quality concepts based on the DAMA Guide to the Data Management Body of Knowledge (DAMA DMBOK), as well as guiding principles, best practices, and steps for improving data quality at your organization
• Demonstrate how chronic business challenges for organizations are often rooted in poor data quality
• Share case studies illustrating the hallmarks and benefits of data quality success

Date: October 8, 2019
Time: 2:00 PM ET/11:00 AM PT (UTC-4)
Presenter: Peter Aiken, Ph.D.

Shannon Kempe, Chief Digital Manager at Dataversity.net

Commonly Asked Questions
1) Will I get copies of the slides after the event?
2) Is this being recorded?

Get Social With Us!
• Like Us on Facebook (www.facebook.com/datablueprint): Post questions and comments. Find industry news, insightful content and event updates.
• Join the Group (Data Management & Business Intelligence): Ask questions, gain insights and collaborate with fellow data management professionals.
• Live Twitter Feed: Join the conversation! Follow us: @datablueprint, @paiken. Ask questions and submit your comments: #dataed

Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Founder, Data Blueprint (datablueprint.com)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced w/ 500+ data management practices worldwide
• Multi-year immersions: US DoD (DISA/Army/Marines/DLA), Nokia, Deutsche Bank, Wells Fargo, Walmart, …
• DAMA International President 2009-2013 / 2018
• DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Book: Monetizing Data Management: Unlocking the Value in Your Organization's Most Important Asset, by Peter Aiken with Juanita Billings, foreword by John Bottega
  • 2. Who is Joan Smith?
http://www.dataflux.com

Challenges
• Purchased an A4 on June 15 2007
• Had not done business with the dealership prior
• "makes them seem sleazy when I get a letter in the mail before I've even made the first payment on the car advertising lower payments than I got"

Letter from the Bank
"… so please continue to open your mail from either Chase or Bank One. P.S. Please be on the lookout for any upcoming communications from either Chase or Bank One regarding your Bank One credit card and any other Bank One product you may have."
Problems
• I initially discarded the letter!
• I became upset after reading it
• It proclaimed that Chase has data quality challenges

How to solve this data quality problem using just tools?
Retail price for the unit was $40

A congratulations letter from another bank
Problems
• Bank did not know it made an error
• Tools alone could not have prevented this error
• Lost confidence in the ability of the bank to manage customer funds

DropTable
  • 3. Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings based stories
5. Innovation based stories
6. Non-monetary stories
7. Takeaways and Q&A

Four ways to make your data sparkle! (Source: CIO, July 1, 2004)
1. Prioritize the task
  – Cleaning data is costly and time consuming
  – Identify mission critical/non-mission critical data
2. Involve the data owners
  – Seek input of business units on what constitutes "dirty" data
3. Keep future data clean
  – Incorporate processes and technologies that check every zip code and area code
4. Align your staff with business
  – Align IT staff with business units

High Quality Data is Critical
• Information transparency
• Analytics
• Business Intelligence
• Increasing efficiencies
• Decreasing costs
• Driving holistic decision-making across the organization

Data Footprints
• SQL Server
  – 47,000,000,000,000 bytes
  – Largest table 34 billion records
• Informix
  – 1,800,000,000 queries/day
  – 65,000,000 tables / 517,000 databases
• Teradata
  – 117 billion records
  – 23 TBs for one table
• DB2
  – 29,838,518,078 daily queries

Repeat 100s, thousands, millions, billions of times ...
  • 4. Death by 1000 Cuts

Garbage In ➜ Garbage Out! My most profound lesson! (so far)

GI➜GO!
[Figure: Perfect Model, Garbage Data, Garbage Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics, Technology]

GI➜GO!
[Figure: Perfect Model, Quality Data is foundational, Garbage Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics, Technology]
  • 5. GI➜GO!
[Figure: Perfect Model, Quality Data is foundational, Garbage Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics, Technology]

Quality In ➜ Quality Out!
[Figure: Perfect Model, Quality Data is foundational, Good Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics, Technology]

Data Knowledge is insufficient and informal
• Data management happens 'pretty well' at the workgroup level
  – Defining characteristic of a workgroup
  – Without guidance, what are the chances that all workgroups are pulling toward the same objectives?
  – Consider the time spent attempting informal practices
• Data chaff becomes sand in the machinery
  – Preventing smooth interoperation and exchanges
  – Difficulties that have been hard to account for
• Organizations and individuals lack
  – Skills
  – Knowledge (architecture)
  – Data Engineering (how)
  – Data Strategy (why)

Making a Better Data Sandwich
• Standard data
• Data supply
• Data literacy
This cannot happen without engineering and architecture! Quality engineering/architecture work products do not happen accidentally!
  • 6. Making a Better Data Sandwich
• Standard data
• Data supply
• Data literacy
This cannot happen without data engineering and architecture! Quality data engineering/architecture work products do not happen accidentally!

Our barn had to pass a foundation inspection

Engineering Standards

USS Midway & Pancakes
Why is this an excellent engineering example?
• It is tall
• It has a clutch
• It was built in 1942
• It is cemented to the floor
• It is still in regular use!

Hidden Data Factories are expensive
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
• Consider these two questions:
  – Were your systems explicitly designed to be integrated or otherwise work together?
  – If not, then what is the likelihood that they will just happen to work well together?
• Data must function at the most granular interaction or it results in things that:
  – Take longer (end-of-day job runs 45 hours)
  – Cost more (the wrong assets are transferred)
  – Deliver less (features are not delivered)
  – Present greater risk (billing delayed 30 days, monthly)
• 20-40% of IT budgets are spent evolving data:
  – Data migration (changing the location from one place to another)
  – Data conversion (changing it into another form, state, or product)
  – Data improvement (inspecting, manipulating it, preparing for subsequent use)

"The choice of data structure and algorithm can make the difference between software running in a few seconds or many days."
http://slideplayer.com/slide/7664141/
  • 7. DQ challenges are context specific!

Much more analysis is required before we can implement repeatable solutions to today's data quality challenges!

[Figure: infographic presented by Domo (2018) showing online activity for every minute of the day]
https://www.domo.com/learn/data-never-sleeps-6

How much Data, by the minute!
For the entirety of 2018, every minute of every day:
• 18 million weather forecast requests
• Netflix streams almost 100,000 hours of video
• LinkedIn adds 120+ individuals
• 1,300 Uber rides
• (almost) a half million tweets
• 7,000 Tinder matches
• 1.25 new cryptocurrencies are created
• ...

Great inspiration towards valuation ...
• How to Measure Anything: Finding the Value of Intangibles in Business by Douglas Hubbard (ISBN: 0470539399)
• Measurement is a reduction in uncertainty
• Formalizing stuff forces clarity
• Whatever your measurement problem is, it's been done before
• You have more data than you think
• You need less data than you think
• Getting data is more economical than you think
• You probably need different data than you think
• Special shout out to Chapter 7
  – Measuring the value of additional information to a decision

Sheena's in color

Activity-Based Costing Kills Someone

Enrico Fermi (Nobel Prize Physics 1938)
• How many piano tuners in the city of Chicago?
  – Without using existing lists such as yellow pages, google ...
  – Current population of Chicago (3 million at the time)
  – Average number of people per household (2 or 3)
  – Share of households with regularly tuned pianos (1 in 3)
  – Required frequency of tuning (1/year)
  – How many pianos can a tuner tune daily? (4 or 5)
  – How many days/year are worked (250)
• Tuners in Chicago ≈ (population / people per household) × share of households with tuned pianos × tunings per year ÷ (tunings per tuner per day × workdays per year)
  • 8. Monetization: Time & Leave Tracking
At least 300 employees are spending 15 minutes/week tracking leave/time

Capture Cost of Labor/Category

Compute Labor Costs
District-L (as an example)   Leave Tracking   Time Accounting
Employees                    73               50
Number of documents          1000             2040
Timesheet/employee           13.7             40.8
Time spent                   0.08             0.25
Hourly Cost                  $6.92            $6.92
Additive Rate                $11.23           $11.23
Cost per timekeeper          $12.31           $114.56
Total timekeeper cost        $898.49          $5,727.89
Monthly cost                 $21,563.83       $137,469.40

Annual Organizational Totals
• $100,000 Salem
• $159,000 Lynchburg
• $100,000 Richmond
• $100,000 Suffolk
• $150,000 Fredericksburg
• $100,000 Staunton
• $100,000 NOVA
• $800,000/month or $9,600,000/annually
• Awareness of the cost of things considered overhead
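
The District-L labor cost figures from the time and leave tracking slides can be approximated with a short calculation. This is a sketch under stated assumptions: "Time spent" is read as hours per document, and the "Additive Rate" is read as the loaded hourly rate applied to that time. Neither reading is spelled out on the slide, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackingActivity:
    name: str
    employees: int
    documents: int
    hours_per_document: float  # assumption: the slide's "Time spent"
    hourly_rate: float         # assumption: the slide's "Additive Rate"

    def documents_per_employee(self) -> float:
        return self.documents / self.employees

    def cost_per_timekeeper(self) -> float:
        return (self.documents_per_employee()
                * self.hours_per_document * self.hourly_rate)

    def total_cost(self) -> float:
        return self.cost_per_timekeeper() * self.employees


leave = TrackingActivity("Leave Tracking", 73, 1000, 0.08, 11.23)
time_acct = TrackingActivity("Time Accounting", 50, 2040, 0.25, 11.23)

for activity in (leave, time_acct):
    print(f"{activity.name}: "
          f"{activity.documents_per_employee():.1f} documents/employee, "
          f"${activity.cost_per_timekeeper():,.2f} per timekeeper, "
          f"${activity.total_cost():,.2f} total")
```

The results agree with the slide to within a dollar. The small differences are consistent with the slide having used an unrounded rate, though that is a guess.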
  • 9. The Data Management Body of Knowledge
Data Management Practice Areas
(from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)

Overview: Data Quality Engineering

Definitions
• Quality Data
  – Fit for purpose: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler)
  – Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
• Data Quality Management
  – Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality
  – Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
  ✓ Critical supporting process from change management
  ✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels
• Data Quality Engineering
  – Recognition that data quality solutions cannot be managed but must be engineered
  – Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges
  – Engineering concepts are generally not known and understood within IT or business!

Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166

Why aren't my data problems solved by a data warehouse?

Version 1
Improving operations in 3 data management practice areas
[Figure labels: Data Strategy, Data Governance, Data Quality, BI Warehouse]

Version 2
Improving operations in 3 data management practice areas
[Figure labels: Data Strategy, Data Governance, BI Warehouse, Metadata]
  • 10. Version 3
Perfecting operations in 3 data management practice areas
[Figure labels: Data Strategy, Data Governance, BI/Warehouse, Reference & Master Data]

Data quality focus should be sequenced
[Figure labels: Improve Operations, Innovation]

Ubiquitous Mystery Object
  • 11. Complex Data Quality Problems
• Agency manages 4,000,000 data items
  – Executive in charge requested a conversion update
  – Was told verbally the conversion was "going well"
  – Demanded specifics
• Question: "How many items did you attempt to convert?"
• Answer: "100 items"
• Question: "How many were actually converted?"
• Answer: "5"
• Problems
  – Not reporting the "right results"
  – These "problems" were discovered too late in the project
  – Unsophisticated contractor

Improving Data Quality during System Migration
• Challenge
  – Millions of NSN/SKUs maintained in a catalog
  – Key and other data stored in clear text/comment fields
  – Original suggestion was manual approach to text extraction
  – Left the data structuring problem unsolved
• Solution
  – Proprietary, improvable text extraction process
  – Converted non-tabular data into tabular data
  – Saved a minimum of $5 million
  – Literally person centuries of work

Determining Diminishing Returns
Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
1        31.47%                      1.34%                       N/A
2        21.22%                      6.97%                       N/A
3        20.66%                      7.49%                       N/A
4        32.48%                      11.99%                      55.53%
…        …                           …                           …
14       9.02%                       22.62%                      68.36%
15       9.06%                       22.62%                      68.33%
16       9.53%                       22.62%                      67.85%
17       9.5%                        22.62%                      67.88%
18       7.46%                       22.62%                      69.92%

Quantitative Benefits (before and after)
                                                           2,000,000 NSNs   150,000 NSNs
Time needed to review all NSNs once over the life of the project:
  NSNs                                                     2,000,000        150,000
  Average time to review & cleanse (in minutes)            5                5
  Total Time (in minutes)                                  10,000,000       750,000
Time available per resource over a one year period of time:
  Work weeks in a year                                     48               48
  Work days in a week                                      5                5
  Work hours in a day                                      7.5              7.5
  Work minutes in a day                                    450              450
  Total Work minutes/year                                  108,000          108,000
Person years required to cleanse each NSN once prior to migration:
  Minutes needed                                           10,000,000       750,000
  Minutes available person/year                            108,000          108,000
  Total Person-Years                                       92.6             7
Resource Cost to cleanse NSNs prior to migration:
  Avg Salary for SME year (not including overhead)         $60,000.00       $60,000.00
  Projected Years Required to Cleanse/Total DLA Person Year Saved   93      7
  Total Cost to Cleanse/Total DLA Savings to Cleanse NSNs  $5.5 million     $420,000
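
The person-year arithmetic behind the Quantitative Benefits figures can be reproduced as follows. This is a sketch: the function name `cleansing_effort` is hypothetical, and reading the 2,000,000 figure as the full catalog and 150,000 as the items left for manual review is an assumption, since the slides only label the two tables "Before" and "After".

```python
# Working minutes available per person per year, from the slide:
# 48 weeks x 5 days x 7.5 hours x 60 minutes = 108,000 minutes.
MINUTES_PER_PERSON_YEAR = 48 * 5 * 7.5 * 60


def cleansing_effort(nsn_count, minutes_per_nsn=5, salary_per_year=60_000):
    """Return (person_years, cost) to review and cleanse each NSN once."""
    total_minutes = nsn_count * minutes_per_nsn
    person_years = total_minutes / MINUTES_PER_PERSON_YEAR
    return person_years, person_years * salary_per_year


years_full, cost_full = cleansing_effort(2_000_000)
years_reduced, cost_reduced = cleansing_effort(150_000)

print(f"2,000,000 NSNs: {years_full:.1f} person-years, ${cost_full:,.0f}")
print(f"  150,000 NSNs: {years_reduced:.1f} person-years, ${cost_reduced:,.0f}")
```

The slide rounds the smaller case up to 7 person-years before multiplying by the salary, which gives $420,000 rather than the unrounded figure computed here.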
• 12. Year 2000 (or Y2K) Bug
• Before the internet
– Computing resources were expensive
– It was worth the tradeoff to represent the year field using two digits
– 1959 was represented to the computer as 59
– Subtracting 59 from 99 yields the correct answer 40 (for dates prior to 2000/01/01!)
– No one expected those programs to still be in use
– Documentation was poorly created/maintained
• If all these fields were not expanded to four digits before 2000/01/01, then date calculations would not give correct results
– Subtracting 59 from 00 yields the incorrect answer -59 (the correct answer is 41)
– No one knew how long this would take or cost, only when it had to be completed!

The OFFICIAL Clock of the United States at 1 second BEFORE midnight showed:
December 31, 1999 11:59:59
One SECOND later the OFFICIAL Clock of the United States showed:
January 1, 19100 00:00:01

Why should a knowledge worker
• with a PhD in Chemical Engineering
• have to know whether this product was Y2K compliant?

International Chemical Company Engine Testing
• $1 billion (+) chemical company
• Develops/manufactures additives enhancing the performance of oils and fuels ...
• ... to enhance engine/machine performance
– Helps fuels burn cleaner
– Engines run smoother
– Machines last longer
• Tens of thousands of tests annually
– Test costs range up to $250,000!

1. Manual transfer of digital data
2. Manual file movement/duplication
3. Manual data manipulation
4. Disparate synonym reconciliation
5. Tribal knowledge requirements
6. Non-sustainable technology

Data Integration Solution
• Integrated the existing systems to easily search on and find similar or identical tests
• Results:
– Reduced expenses
– Improved competitive edge and customer service
– Time savings and improved operational capabilities
• According to our client’s internal business case development, they expect to realize a $25 million gain each year thanks to this data integration

Lockheed Martin
• 20 years of project email
– Example from Doug Laney
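Returning to the Year 2000 slide above, the two-digit year arithmetic is easy to reproduce. The sketch below is illustrative only: the function names are hypothetical, and the "19100" display is modeled on one commonly cited cause (a hard-coded "19" prefix glued onto a years-since-1900 counter), which is an assumption about how the clock shown on the slide failed rather than something the slides state.

```python
def elapsed_years_two_digit(current_yy, earlier_yy):
    """Elapsed years computed the legacy way, from two-digit year fields."""
    return current_yy - earlier_yy


def elapsed_years_four_digit(current_year, earlier_year):
    """Elapsed years computed from full four-digit years."""
    return current_year - earlier_year


def legacy_year_display(years_since_1900):
    """Assumed legacy formatting: prefix a literal "19" to a years-since-1900 count."""
    return "19" + str(years_since_1900)


# Before 2000/01/01 the two-digit shortcut gives the right answer.
print(elapsed_years_two_digit(99, 59))       # 1999 vs 1959
# At the rollover, 2000 is stored as 00 and the subtraction goes negative.
print(elapsed_years_two_digit(0, 59))        # 2000 vs 1959, stored as 00 and 59
# Four-digit years give the intended result.
print(elapsed_years_four_digit(2000, 1959))
# The display bug: 1999 is 99 years after 1900, 2000 is 100 years after 1900.
print(legacy_year_display(99))
print(legacy_year_display(100))
```

With two-digit fields, 00 minus 59 evaluates to -59, whereas the true elapsed time is 41 years; expanding the field to four digits removes the ambiguity.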
• 13. Logistics Company
• Fortune 450
• Room of 100 associates
• Manually correcting every item on every customer invoice
• Upon noting this to the responsible manager, the reply was:
– This is the best quarter
– Of the best year
– I've ever had
– Perhaps I need to double the number in that room?

Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings based stories
5. Innovation based stories
6. Non-monetary stories
7. Takeaways and Q&A

US DoD Reverse Engineering Program Manager
• "Your first project is to keep me from having to testify to a Congressional Hearing!" (Belkis Leon-Hong former ASD-C3I)
• Problem:
– 37 systems paid personnel within DoD
– How many were needed?
– How many potential losers?
– What do you mean by employee?
• Process modeling
– Inconclusive results
• Data reverse engineering - definitive
– One legged engineer, working in waist deep waters, underneath rotating helicopter blades, on overtime

Reverse Engineering New Systems
Reverse Engineering New Systems for Smooth Implementation. IEEE Software. March/April 1999 16(2):36-43

Predicting Engineering Problem Characteristics

Legacy System #1: Payroll
Platform: UniSys
OS: OS
1998 Age: 21
Data Structure: DMS (Network)
Physical Records: 4,950,000
Logical Records: 250,000
Relationships: 62
Entities: 57
Attributes: 1478

Legacy System #2: Personnel
Platform: Amdahl
OS: MVS
1998 Age: 15
Data Structure: VSAM/virtual database tables
Physical Records: 780,000
Logical Records: 60,000
Relationships: 64
Entities: 4/350
Attributes: 683

New System
Platform: WinTel
OS: Win'95
1998 Age: new
Data Structure: Client/Server RDBMS
Characteristics   Logical    Physical
Records           250,000    600,000
Relationships     1,034      1,020
Entities          1,600      2,706
Attributes        15,000     7,073

The Budget Trap (Parts 1 & 2)
• 14. Actual Bid From Systems Integrator

Extreme Data Engineers ...
2 person months = 40 person days
2,000 attributes mapped onto 15,000
2,000/40 person days = 500/person day or 500/8 hours = 62.5 attributes/hour
and
15,000/40 person days = 375/person day or 375/8 hours = 46.875 attributes/hour
Locate, identify, understand, map, transform, document
108 attributes/60 minutes
1.8 attributes/minute!

What did Rolls Royce Learn
• Old model
– Sell jet engines
• New model
– Sell hours of thrust power
– Power-by-the-hour
– No payment for down time
– Wing to wing
– When was it invented? from Nascar?

Fan Blade Sensor
• 1 Sensor
– Probabilistic (generalist) maintenance forecasts
• 100 Sensors
– Establish optimal monitoring targets
– Finer tuned and safer maintenance
– Mission Readiness ???
– Storage $$$
– Handling $$$
– Opportunity $$$
– Systemic $$$
– Maintenance $$$
– Total > $1.5 Billion

Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings based stories
5. Innovation based stories
6. Non-monetary stories
7. Takeaways and Q&A

Armed Force Example
• Lieutenant attempting to correct a 4-year underpayment of his private's pay
– Significant impact on morale
– Immediate cash issues
– Cost tens of man hours over months of time to resolve
Nugee, R. and R. S. Seiner (2010, 6/1/2010). "TDAN.com Interview with Brigadier Richard Nugee – The British Army." 2013, from http://www.tdan.com/view-special-features/13897 and personal communications.
  • 15. Friendly Fire deaths traced to Dead Battery • Date: Tue, 26 Mar 2002 10:47:52 -0500
 From: 
 Subject: Friendly Fire deaths traced to dead battery
 
 In one of the more horrifying incidents I've read about, U.S. soldiers and
 allies were killed in December 2001 because of a stunningly poor design of a
 GPS receiver, plus "human error."
 
 http://www.washingtonpost.com/wp-dyn/articles/A8853-2002Mar23.html
 
 A U.S. Special Forces air controller was calling in GPS positioning from
 some sort of battery-powered device. He "had used the GPS receiver to
 calculate the latitude and longitude of the Taliban position in minutes and
 seconds for an airstrike by a Navy F/A-18."
 • According to the *Post* story, the bomber crew "required" a "second
 calculation in 'degree decimals'" -- why the crew did not have equipment to
 perform the minutes-seconds conversion themselves is not explained.
 • The air controller had recorded the correct value in the GPS receiver when
 the battery died. Upon replacing the battery, he called in the
 degree-decimal position the unit was showing -- without realizing that the
 unit is set up to reset to its *own* position when the battery is replaced.
 
 The 2,000-pound bomb landed on his position, killing three Special Forces
 soldiers and injuring 20 others.
 • If the information in this story is accurate, the RISKS involve replacing
 memory settings with an apparently-valid default value instead of blinking 0
 or some other obviously-wrong display; not having a backup battery to hold
 values in memory during battery replacement; not equipping users to
 translate one coordinate system to another (reminiscent of the Mars Climate
 Orbiter slamming into the planet when ground crews confused English with
metric); and using a device with such flaws in a combat situation

Formalizing the Role of U.S. Army Data Governance

How one inventory item proliferates data throughout the chain
555 Subassemblies & subcomponents
17,659 Repair parts or Consumables

System     Total items   Attributes/item   Total attributes
System 1   18,214        75                1,366,050
System 2   47            15+               720
System 3   16,594        73                1,211,362
System 4   8,535         16                136,560
System 5   15,959        22                351,098

Total for the five systems shown above:
59,350 Items
179 Unique attributes
3,065,790 values

Business Implications
• National Stock Number (NSN) Discrepancies
– If NSNs in LUAF, GABF, and RTLS are not present in the MHIF, these records cannot be updated in SASSY
– Additional overhead is created to correct data before performing the real maintenance of records
• Serial Number Duplication
– If multiple items are assigned the same serial number in RTLS, the traceability of those items is severely impacted
– Approximately $531 million of SAC 3 items have duplicated serial numbers
• On-Hand Quantity Discrepancies
– If the LUAF O/H QTY and number of items serialized in RTLS conflict, there can be no clear answer as to how many items a unit actually has on-hand
– Approximately $5 billion of equipment does not tie out between the systems

Best approaches combine manual and automated work

Humans Generally Better
• Sense low level stimuli
• Detect stimuli in noisy background
• Recognize constant patterns in varying situations
• Sense unusual and unexpected events
• Remember principles and strategies
• Retrieve pertinent details without a priori connection
• Draw upon experience and adapt decision to situation
• Select alternatives if original approach fails
• Reason inductively; generalize from observations
• Act in unanticipated emergencies and novel situations
• Apply principles to solve varied problems
• Make subjective evaluations
• Develop new solutions
• Concentrate on important tasks when overload occurs
• Adapt physical response to changes in situation

Machines Generally Better
• Sense stimuli outside human's range
• Count or measure physical quantities
• Store quantities of coded information accurately
• Monitor prespecified events, especially infrequent
• Make rapid and consistent responses to input signals
• Recall quantities of detailed information accurately
• Retrieve pertinent details without a priori connection
• Process quantitative data in prespecified ways
• Perform repetitive preprogrammed actions reliably
• Exert great, highly controlled physical force
• Perform several activities simultaneously
• Maintain operations under heavy operation load
• Maintain performance over extended periods of time
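The three discrepancy types listed under Business Implications above are the kind of prespecified, repetitive checks that the comparison assigns to machines. A minimal sketch of how such checks might be automated follows. The file names (MHIF, LUAF, RTLS) come from the slide, but the record layouts, function names, and sample values are all hypothetical and chosen only for illustration; the real systems' formats are not described in the source.

```python
from collections import Counter


def nsns_missing_from_master(master_nsns, *other_files):
    """Return NSNs that appear in any other file but not in the master file."""
    master = set(master_nsns)
    missing = set()
    for nsns in other_files:
        missing |= set(nsns) - master
    return missing


def duplicated_serials(serialized_items):
    """Return serial numbers assigned to more than one (unit, serial) record."""
    counts = Counter(serial for _unit, serial in serialized_items)
    return {serial for serial, n in counts.items() if n > 1}


def on_hand_mismatches(reported_qty, serialized_items):
    """Return units whose reported quantity differs from their serialized count.

    The result maps unit -> (reported quantity, serialized item count).
    """
    serialized = Counter(unit for unit, _serial in serialized_items)
    units = set(reported_qty) | set(serialized)
    return {
        unit: (reported_qty.get(unit, 0), serialized.get(unit, 0))
        for unit in units
        if reported_qty.get(unit, 0) != serialized.get(unit, 0)
    }


# Made-up sample data standing in for extracts from the named files.
mhif_nsns = ["NSN-0001", "NSN-0002"]                   # master file
luaf_nsns = ["NSN-0001", "NSN-0003"]
rtls_nsns = ["NSN-0002", "NSN-0004"]
rtls_items = [("UNIT-A", "SN100"), ("UNIT-A", "SN101"), ("UNIT-B", "SN100")]
luaf_on_hand = {"UNIT-A": 2, "UNIT-B": 3}

missing = nsns_missing_from_master(mhif_nsns, luaf_nsns, rtls_nsns)
duplicates = duplicated_serials(rtls_items)
mismatches = on_hand_mismatches(luaf_on_hand, rtls_items)

print(sorted(missing))
print(sorted(duplicates))
print(mismatches)
```

Checks like these only flag candidate records; deciding which source is right for a flagged item remains the kind of judgment call the comparison above assigns to people.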
• 16. Potential Data Sources

Data Mapping (diagram labels)
Data objects: Mental illness, Deployments, Work History, Soldier, Legal Issues, Abuse
Use: Suicide Analysis
Sources: FAP, DMSS, G1, DMDC, CID, MDR
• Data objects complete?
• All sources identified?
• Best source for each object?
• How to reconcile differences between sources?

Senior Army Official
• Room full of Stewards
• A very heavy dose of management support
• Advised the group of his opinion on the matter
• Any questions as to future direction
– "They should make an appointment to speak directly with me!"
• Empower the team
– The conversation turned from "can this be done?" to "how are we going to accomplish this?"
– Mistakes along the way would be tolerated
– Implement a workable solution in prototype form
• 17. Managing Data with Guidance?
• Federal employees
• 44 users from whitehouse.gov
• Thousands of military and government e-mails
• Canadian citizens
• One-fifth of Quebec

Ashley Madison: 37,000,000
OPM: 25,000,000
Target: 70,000,000

Target Corporation's Database Contents
• Your age
• Marital status
• Part of town you live in
• How long it takes you to drive to work
• Estimated salary
• If you have recently moved
• Credit cards carried in your wallet
• What websites you visit
• Your ethnicity
• Your job history
• The magazines you read
• Work commute
• Sexual preferences
• If you’ve ever declared bankruptcy or got divorced
• The year you bought (or lost) your house
• Where you went to school(s)
• What kinds of topics you talk about online
• Whether you prefer certain brands of coffee, paper towels, cereal or applesauce
• Your political leanings, reading habits, and charitable giving
• The number of cars you own

Key Findings
https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation/
How the Government Jeopardized Our National Security for More than a Generation
• Preventable
• Leadership failed
– To heed repeated recommendations
– To sufficiently respond to growing threats of sophisticated cyber attacks, and
– To prioritize resources for cybersecurity
• 2014 data breaches were likely connected to, and possibly coordinated with, the 2015 data breach
• OPM misled the public on the extent of the damage of the breach and made false statements to Congress

Data Quality Success Stories - Program Overview
1. Data quality must be understood as an engineering challenge
2. Putting a price on data quality
3. DM BoK components complement each other well
4. Savings based stories
5. Innovation based stories
6. Non-monetary stories
7. Takeaways and Q&A
• 18. Takeaways
• Quality data requires a context specific definition
• Most business problems have data challenges (hidden data factories) at their root
• All advanced data practices depend on quality data
• AI/ML are suffering from lack of training data
• Few 'easy' fixes exist
• Data quality engineering works well when combined with other DM BoK 'pie wedges'
• Successful data quality stories demonstrate
– Tangible ongoing savings
– Innovative data uses
– Outcomes more important than money

Questions?
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056