Presentation given to the BCS Data Management Specialist Group on 10th April 2018.
Data quality “tags” are a means of informing decision makers about the quality of the data they use within information systems. Unfortunately, these tags have not been successfully adopted because of the expense of maintaining them. This presentation will demonstrate an alternative approach that achieves improved decision making without the costly overheads.
Data quality in decision making - Dr. Philip Woodall, University of Cambridge
1. Dr Philip Woodall
Senior Research Associate
Distributed Information and Automation Lab
Department of Engineering, University of Cambridge
Data quality and decision making
3. Contents
• Data repurposing
– Data quality problems found when applying data
analytics in manufacturing
• ITALI – IT Architectures for Logistics
– New IT architecture for the sponsor company
– Aligning the physical process and the data
4. Related projects
• Data repurposing
– Data quality problems found when applying data
analytics in manufacturing
• ITALI – IT Architectures for Logistics
– New IT architecture for the sponsor company
– Aligning the physical process and the data
5. Retailers Face Inconsistent Product Data in E-Commerce Efforts
E.g. shoes described as “sneakers” on one website and as “trainers” on another
Poor data consequences: data scientists spend inordinate amounts of time correcting data before it can be used for analysis/decisions.
Some DQ problems arise because of the way we are now attempting to use data…
6. Analytics: a different use of data
• Reuse = using data for the same or similar task again
• Repurposing = using data for a completely different task
7. Data repurposing
• When the use of the data changes so do the
data quality requirements
• How do you know when data is good enough
quality to be used for analytics?
• We conducted a survey to investigate the issues.
8. A survey of the DQ problems that arise when data is repurposed in manufacturing
• What do manufacturers repurpose data for?
• Where do they get the data from?
• What data quality problems are faced when repurposing data?
• Solution: we produced a framework to help analyse the problems
9. Results: What do manufacturers repurpose data for?
• To calculate supplier performance, such as On-Time In Full (OTIF), using purchase order and goods receiving data.
• To perform a parts obsolescence risk assessment for all the parts on an aircraft using the bill of materials.
• To identify performance improvements to the production line and logistics operations.
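As an illustration of the first use, OTIF can be computed by joining purchase-order and goods-receiving records and counting orders that arrived both on time and in full. A minimal sketch, assuming illustrative field names (the talk does not specify a schema):

```python
from datetime import date

# Hypothetical joined purchase-order / goods-receiving records.
orders = [
    {"ordered_qty": 100, "received_qty": 100,
     "due": date(2018, 3, 1), "received": date(2018, 2, 28)},
    {"ordered_qty": 50, "received_qty": 45,
     "due": date(2018, 3, 5), "received": date(2018, 3, 4)},   # not in full
    {"ordered_qty": 20, "received_qty": 20,
     "due": date(2018, 3, 7), "received": date(2018, 3, 9)},   # not on time
]

def otif(orders):
    """Fraction of orders delivered both on time and in full."""
    hits = sum(1 for o in orders
               if o["received"] <= o["due"]
               and o["received_qty"] >= o["ordered_qty"])
    return hits / len(orders)

print(otif(orders))  # 1 of 3 orders qualifies
```

Note that this metric is exactly where the DQ problems described later bite: a dummy ‘actual delivery date’ makes every order look on time.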
11. Results: Where do they get the data from?
Transport mechanism: Data is extracted into a
spreadsheet and emailed to the analysts.
12. Results: example DQ problems arising when using repurposed data
• Dummy data: ‘actual delivery date’ is a copy of ‘expected delivery date’, making it appear that more data is available.
• No synchronisation: updates made in local spreadsheets are not sent back to the original system.
• Unknown assumptions: the analyst is not aware of how the data was collected or pre-processed.
• Unhelpful data cleansing: analysts must convert the data back again (e.g. cm to m, 1 box = 50 items).
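One simple screen for the dummy-data problem is to measure how often ‘actual’ equals ‘expected’ exactly; a rate near 100% suggests copied values rather than genuine measurements. A rough sketch (field names and the 90% threshold are illustrative assumptions):

```python
def dummy_date_rate(rows, expected="expected_delivery", actual="actual_delivery"):
    """Share of rows where the actual date exactly equals the expected date.
    A rate near 1.0 suggests the 'actual' field is a dummy copy."""
    matches = sum(1 for r in rows if r[actual] == r[expected])
    return matches / len(rows)

# Hypothetical extract where every actual date mirrors the expected date.
rows = [
    {"expected_delivery": "2018-03-01", "actual_delivery": "2018-03-01"},
    {"expected_delivery": "2018-03-05", "actual_delivery": "2018-03-05"},
    {"expected_delivery": "2018-03-07", "actual_delivery": "2018-03-07"},
    {"expected_delivery": "2018-03-09", "actual_delivery": "2018-03-09"},
]

rate = dummy_date_rate(rows)
if rate > 0.9:  # illustrative threshold
    print(f"Warning: {rate:.0%} exact matches - possible dummy data")
```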
13. A framework of DQ problems faced when
repurposing manufacturing data
14. Assessing and improving data quality
• Hybrid Approach:
– Discover and measure errors
• TIRM (Total Information Risk Management):
– Assess risks (costs) of poor data quality
– Simulate mitigation actions
– Select the most appropriate actions
15. A model for information risk analysis
16. Related projects
• Data repurposing
– Data quality problems found when applying data
analytics in manufacturing
• ITALI – IT Architectures for Logistics
– New IT architecture for the sponsor company
– Aligning the physical process and the data
17. ITALI - IT Architectures for Logistics Integration
ITALI aims to…
…investigate how existing logistics-related information systems must
evolve to address future logistics needs
Project sponsor
…by exploring
A: Mismatches between physical operations and data
B: How the existing IT systems can be organised into an architecture
that supports the next generation logistics issues (B2B->B2C).
Outputs
1: A new state of the art IT architecture for
logistics and warehousing
2: New concept: Potential Problem Data
Tagging.
For avoiding disruptions caused by data
mismatches
3: A framework for supporting both B2B and B2C commerce.
18. Key requirements for the architecture
• How to integrate data from differing systems
– To generate analytics reports for the organisation
• IT architecture must align with the CEO’s vision for the company
– Requires sophisticated (and flexible) connections between information systems
19. Outcomes
• Key barriers facing organisations when attempting to generate analytics reports:
1. Data must be integrated from different systems
2. Master data management needs to be in place
3. Differences in data models between systems
• Can make it very difficult to write queries to extract data when it is needed for another purpose (e.g. aggregated data in one field)
4. Trivial data quality problems at data entry can render the entire process useless
20. Related projects
• Data repurposing
– Data quality problems found when applying data
analytics in manufacturing
• ITALI – IT Architectures for Logistics
– New IT architecture for the sponsor company
– Aligning the physical process and the data
21. Potential Problem Data Tagging
• How to make sure that both process and data are aligned
• One example: pickers misplace items in the warehouse
23. Approach: Potential Problem Data Tagging
• Tag the data with a level of accuracy
– Count the number of times the data has been exposed to an event that could cause it to become inaccurate.
• Only pick from the most accurate locations

Location | Item type | Item quantity | Tag
1        | A         | 30            | 0.05
2        | A         | 20            | 0
3        | B         | 15            | 0.0975
4        | B         | 4             | 0.14265
5        | -         | 0             | 0
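The tags in the example table are consistent with treating each exposure as an independent chance of error: with per-event error probability p, a location exposed to n risky events gets tag = 1 − (1 − p)^n (with p = 0.05, n = 1, 2, 3 gives 0.05, 0.0975, ≈0.1426; the last table entry appears to round this). A sketch under that assumption, which is inferred from the numbers rather than stated in the talk:

```python
def tag(exposures, p=0.05):
    """Probability the location's data has become inaccurate after
    `exposures` independent error-prone events, each with probability p."""
    return 1 - (1 - p) ** exposures

# Locations as (location_id, item_type, quantity, exposure_count),
# mirroring the example table above.
locations = [
    (1, "A", 30, 1),   # tag 0.05
    (2, "A", 20, 0),   # tag 0
    (3, "B", 15, 2),   # tag 0.0975
    (4, "B", 4, 3),    # tag ~0.1426
    (5, None, 0, 0),
]

def best_location(item_type):
    """Pick the stocked location for an item with the lowest tag."""
    candidates = [(tag(n), loc) for loc, t, qty, n in locations
                  if t == item_type and qty > 0]
    return min(candidates)[1]

print(best_location("A"))  # location 2 (tag 0)
print(best_location("B"))  # location 3 (lower tag than location 4)
```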
24. Results of a simulation
• 100 picks
• An extra 3 to 4 disruptions avoided compared to normal picking
[Chart: mean number of disruptions encountered (0–8), “Normal” vs “Avoid”, across error rates of 1%, 5% and 20% and 2, 12 and 60 degrees of freedom]
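The comparison could be reproduced with a small Monte Carlo experiment along these lines; the exact set-up (how tags accumulate, how corruption occurs) is my assumption, not taken from the talk:

```python
import random

def simulate(picks=100, n_locations=50, error_rate=0.05, avoid=False, seed=0):
    """Count disruptions (picks from a location whose data has become
    inaccurate).  avoid=False picks at random; avoid=True picks the
    location with the fewest error-prone exposures, i.e. the lowest tag."""
    rng = random.Random(seed)
    exposures = [0] * n_locations    # per-location count of risky events
    inaccurate = [False] * n_locations
    disruptions = 0
    for _ in range(picks):
        if avoid:
            loc = min(range(n_locations), key=lambda i: exposures[i])
        else:
            loc = rng.randrange(n_locations)
        if inaccurate[loc]:
            disruptions += 1
        # each pick is itself an event that may corrupt the location's data
        exposures[loc] += 1
        if rng.random() < error_rate:
            inaccurate[loc] = True
    return disruptions

normal = simulate(avoid=False)
avoiding = simulate(avoid=True)
```

Averaged over many seeds, the tag-avoiding policy visits the least-exposed (least likely corrupted) locations and so encounters fewer disruptions, which is the effect the chart summarises.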
25. Results of a simulation
• The tags can also be used to find inaccuracies
• Even greater performance: an extra 6 to 7 inaccuracies can be found compared to normal
[Chart: mean number of inaccuracies found (0–15), “Normal” vs “Find”, across error rates of 1%, 5% and 20% and 2, 12 and 60 degrees of freedom]
26. Dr Philip Woodall
Senior Research Associate
Distributed Information and Automation Lab
Department of Engineering, University of Cambridge
Thank you
27. Related papers
Repurposing
Woodall, P. (2017). The Data Repurposing Challenge: New Pressures from Data Analytics. Journal of Data and Information Quality.
Assessing and improving data quality
Woodall, P., Borek, A. and Parlikad, A. (2013). Data quality assessment: The Hybrid Approach. Information & Management, 50 (7), pp. 369–382.
Borek, A. et al. (2014). A risk based model for quantifying the impact of information quality. Computers in Industry, 65 (2), pp. 354–366.
Potential Problem Data Tagging
Woodall, P. et al. (2016). Data State Tracking: labelling good quality data to improve warehouse operations. In International Conference on Information Quality (ICIQ). Ciudad Real, Spain.
Editor’s notes
Wall Street:
Problems can arise from the way search engines or e-commerce marketplaces pick up product data, Mr. Hogan says. For example, if a particular pair of shoes is described on one website as “sneakers” and on another as “trainers”, the customer faces the possibility of seeing the same product under two different names.