Dr Philip Woodall
Senior Research Associate
Distributed Information and Automation Lab
Department of Engineering, University of Cambridge
Data quality and decision making
Data Management theme
Data Quality
Data Value
Data Sharing
Improving the Management of Industrial Data
Contents
• Data repurposing
– Data quality problems found when applying data
analytics in manufacturing
• ITALI – IT Architectures for Logistics
– New IT architecture for the sponsor company
– Aligning the physical process and the data
Retailers Face Inconsistent Product Data in E-Commerce Efforts
E.g. a pair of shoes described as “sneakers” on one website may be described as “trainers” on another
Poor data consequences
Data scientists spend inordinate amounts of time correcting data before it can be used for analysis and decisions
Some DQ problems arise because of the way we are now
attempting to use data…
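The “sneakers” versus “trainers” example is a synonym problem: two sources use different labels for the same product category. A common remedy is to map source labels to one canonical term before analysis. The sketch below shows that approach. The synonym table `CANONICAL_CATEGORY` and the function `normalise_category` are hypothetical illustrations and are not part of the original work.

```python
# Hypothetical synonym table: source labels mapped to one canonical category.
CANONICAL_CATEGORY = {
    "sneakers": "trainers",
    "trainers": "trainers",
}


def normalise_category(label: str) -> str:
    """Return the canonical category for a product label.

    Labels are trimmed and lower-cased first. Unknown labels come back cleaned
    but otherwise unchanged, so they can be reviewed rather than silently dropped.
    """
    cleaned = label.strip().lower()
    return CANONICAL_CATEGORY.get(cleaned, cleaned)


listings = [("site_a", "Sneakers"), ("site_b", "Trainers "), ("site_c", "Sandals")]
print([(site, normalise_category(label)) for site, label in listings])
```

In practice the synonym table has to be curated and maintained, and that curation is itself a data quality task.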
Reuse = using data for the same or similar task again
Repurposing = using data for a completely different task
Analytics: a different use of data
Data repurposing
• When the use of the data changes, so do the data quality requirements
• How do you know when data is of good enough quality to be used for analytics?
• We conducted a survey to investigate the issues.
A survey of the DQ problems that arise when data is repurposed in manufacturing
• What do manufacturers repurpose data for?
• Where do they get the data from?
• What data quality problems do they face when repurposing data?
• Solution: we produced a framework to help analyse the problems
Results: What do manufacturers repurpose data for?
• To calculate supplier performance, such as On-Time In Full (OTIF), using purchase order and goods receiving data.
• To perform a parts obsolescence risk assessment for all the parts on an aircraft using the bill of materials.
• To identify performance improvements to the production line and logistics operations.
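OTIF is a good example of repurposing: it joins purchase order lines to goods receipt records. The sketch below computes it under one common definition. A line counts if it was received on or before its due date and in the full ordered quantity. Companies vary this definition (delivery windows, partial tolerances), and the `OrderLine` schema and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OrderLine:
    """A purchase-order line joined to its goods-receipt record (hypothetical schema)."""
    po_line: str
    ordered_qty: int
    due_date: date
    received_qty: int
    received_date: Optional[date]  # None if nothing has been received yet


def is_otif(line: OrderLine) -> bool:
    """True if the line was received on or before its due date and in full."""
    if line.received_date is None:
        return False
    on_time = line.received_date <= line.due_date
    in_full = line.received_qty >= line.ordered_qty
    return on_time and in_full


def otif_rate(lines: List[OrderLine]) -> float:
    """Fraction of order lines that were delivered on time and in full."""
    if not lines:
        raise ValueError("no order lines to score")
    return sum(is_otif(line) for line in lines) / len(lines)


lines = [
    OrderLine("PO1-1", 100, date(2017, 3, 1), 100, date(2017, 2, 28)),  # on time, in full
    OrderLine("PO1-2", 50, date(2017, 3, 1), 40, date(2017, 3, 1)),     # short delivery
    OrderLine("PO2-1", 20, date(2017, 3, 5), 20, date(2017, 3, 7)),     # late
    OrderLine("PO2-2", 10, date(2017, 3, 5), 10, date(2017, 3, 5)),     # on time, in full
]
print(otif_rate(lines))  # 0.5
```

The result is only as good as `received_date`. If that field is a copy of the expected date, every line looks on time.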
Results: Where do they get the data from?
Transport mechanism: data is extracted into a spreadsheet and emailed to the analysts.
Results: example DQ problems arising when using repurposed data
• Dummy data: ‘actual delivery date’ is a copy of ‘expected delivery date’, so it appears that more data is available than there really is.
• No synchronisation: updates made in local spreadsheets are not sent back to the original system.
• Unknown assumptions: the analyst is not aware of how the data was collected or pre-processed.
• Unhelpful data cleansing: the data has to be converted back again (e.g. cm to m, 1 box = 50 items).
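The dummy-data problem can sometimes be caught with a simple screening check before analysis. This sketch measures how often the actual delivery date exactly equals the expected date. Genuine on-time deliveries also match, so a high share is only a prompt to ask how the field was populated. The field names, the function names and the 0.9 threshold are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List, Optional

Row = Dict[str, Optional[date]]


def dummy_date_share(rows: List[Row]) -> float:
    """Share of rows whose actual delivery date exactly equals the expected date.

    Rows without an actual date are skipped. A share close to 1.0 is a hint,
    not proof, that the actual date was filled in by copying the expected date.
    """
    populated = [r for r in rows if r.get("actual_delivery_date") is not None]
    if not populated:
        return 0.0
    matches = sum(
        r["actual_delivery_date"] == r["expected_delivery_date"] for r in populated
    )
    return matches / len(populated)


def looks_like_dummy_data(rows: List[Row], threshold: float = 0.9) -> bool:
    """Flag a dataset for review; the 0.9 threshold is an arbitrary illustration."""
    return dummy_date_share(rows) >= threshold


rows = [
    {"expected_delivery_date": date(2017, 3, 1), "actual_delivery_date": date(2017, 3, 1)},
    {"expected_delivery_date": date(2017, 3, 2), "actual_delivery_date": date(2017, 3, 2)},
    {"expected_delivery_date": date(2017, 3, 3), "actual_delivery_date": date(2017, 3, 3)},
    {"expected_delivery_date": date(2017, 3, 4), "actual_delivery_date": date(2017, 3, 6)},
    {"expected_delivery_date": date(2017, 3, 5), "actual_delivery_date": None},
]
print(dummy_date_share(rows), looks_like_dummy_data(rows))  # 0.75 False
```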
A framework of DQ problems faced when repurposing manufacturing data
Assessing and improving data quality
• Hybrid Approach:
– Discover and measure errors
• TIRM (Total Information Risk Management):
– Assess the risks (costs) of poor data quality
– Simulate mitigation actions
– Select the most appropriate actions
A model for information risk analysis
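The TIRM steps above follow an assess–simulate–select cycle. The sketch below shows that cycle in a deliberately simplified form. It does not reproduce the TIRM model from Borek et al. It treats a risk as probability × impact, "simulates" each candidate action by its assumed post-action probability, and keeps only actions whose avoided loss exceeds their cost. All class names, numbers and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    """A data quality problem and the business outcome it can cause (hypothetical figures)."""
    name: str
    probability: float  # chance per period that the problem causes the bad outcome
    impact: float       # cost of that outcome, in any consistent currency unit

    def expected_loss(self) -> float:
        return self.probability * self.impact


@dataclass
class Action:
    """A candidate mitigation action and its simulated effect on one risk."""
    name: str
    risk: Risk
    probability_after: float  # simulated probability once the action is in place
    cost: float               # cost of running the action per period

    def net_benefit(self) -> float:
        avoided_loss = (self.risk.probability - self.probability_after) * self.risk.impact
        return avoided_loss - self.cost


def select_actions(actions: List[Action]) -> List[Action]:
    """Keep actions whose simulated avoided loss exceeds their cost, best first."""
    worthwhile = [a for a in actions if a.net_benefit() > 0]
    return sorted(worthwhile, key=lambda a: a.net_benefit(), reverse=True)


date_risk = Risk("wrong delivery dates in reports", probability=0.2, impact=10_000)
actions = [
    Action("validate dates at entry", date_risk, probability_after=0.05, cost=500),
    Action("monthly manual audit", date_risk, probability_after=0.10, cost=1_200),
]
for a in select_actions(actions):
    print(a.name, round(a.net_benefit()))  # validate dates at entry 1000
```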
ITALI - IT Architectures for Logistics Integration
ITALI aims to investigate how existing logistics-related information systems must evolve to address future logistics needs…
…by exploring:
A: Mismatches between physical operations and data
B: How the existing IT systems can be organised into an architecture that supports the next generation of logistics issues (B2B → B2C)
Outputs
1: A new state-of-the-art IT architecture for logistics and warehousing
2: A new concept, Potential Problem Data Tagging, for avoiding disruptions caused by data mismatches
3: A framework for supporting both B2B and B2C commerce
Key requirements for the architecture
• Integrate data from differing systems
– To generate analytics reports for the organisation
• Align the IT architecture with the CEO's vision for the company
– Requires sophisticated (and flexible) connections between information systems
Outcomes
• Key barriers facing organisations when attempting to generate analytics reports:
1. Data must be integrated from different systems.
2. Master data management needs to be in place.
3. Data models differ between systems.
• This can make it very difficult to write queries that extract data when it is needed for another purpose (e.g. to aggregate data in one field).
4. Trivial data quality problems at data entry can render the entire process useless.
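Barriers 3 and 4 interact. When one system packs several values into a single field and another stores them separately, every cross-system query has to parse the packed field first, and a single data-entry typo makes a record silently fall out of the join. The sketch below illustrates this with two hypothetical systems. "System A" stores warehouse, aisle and bay as one code such as `WH1-A-03`, while "system B" stores them as separate fields. The code format, function names and records are invented for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]

# Hypothetical format in "system A": warehouse, aisle and bay packed into one field.
LOCATION_PATTERN = re.compile(r"^(WH\d+)-([A-Z])-(\d{2})$")


def split_location(code: str) -> Optional[Key]:
    """Split a packed code such as 'WH1-A-03' into (warehouse, aisle, bay)."""
    match = LOCATION_PATTERN.match(code.strip())
    if match is None:
        return None  # malformed entry, e.g. a typo made at data entry
    warehouse, aisle, bay = match.groups()
    return warehouse, aisle, int(bay)


def join_stock_to_picks(stock_a: List[Tuple[str, int]], picks_b: List[dict]):
    """Join system A stock records to system B pick records by location.

    Returns the joined rows and the system A codes that could not be parsed.
    """
    stock_by_key: Dict[Key, int] = {}
    rejects: List[str] = []
    for code, qty in stock_a:
        key = split_location(code)
        if key is None:
            rejects.append(code)
        else:
            stock_by_key[key] = qty
    joined = []
    for pick in picks_b:
        key = (pick["warehouse"], pick["aisle"], pick["bay"])
        if key in stock_by_key:
            joined.append({"location": key, "stock": stock_by_key[key], "picked": pick["picked"]})
    return joined, rejects


stock_a = [("WH1-A-03", 30), ("WH1-B-12", 15), ("WH1 A-04", 20)]  # last code mistyped
picks_b = [
    {"warehouse": "WH1", "aisle": "A", "bay": 3, "picked": 5},
    {"warehouse": "WH1", "aisle": "A", "bay": 4, "picked": 2},
]
joined, rejects = join_stock_to_picks(stock_a, picks_b)
print(joined)   # only the WH1-A-03 pick joins
print(rejects)  # ['WH1 A-04']
```

Returning the rejected codes, rather than discarding them, at least makes the data-entry problem visible to whoever builds the report.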
Potential Problem Data Tagging
• How can we make sure that the physical process and the data remain aligned?
One example: pickers misplace items in the warehouse
Approach: Potential Problem Data Tagging
• Tag the data with a level of accuracy
– Count the number of times the data has been exposed to an event that could cause it to become inaccurate
• Only pick from the most accurate locations
Location   Item type   Item quantity   Tag
1          A           30              0.05
2          A           20              0
3          B           15              0.0975
4          B           4               0.14265
5          -           0               0
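The slide does not give a formula for the tag. One reading that fits the table is a 5% chance that each exposure event independently corrupts a record, which makes the tag the probability that the record is wrong, 1 − 0.95^n after n exposures. This gives 0.05 for n = 1 and 0.0975 for n = 2. The tag for location 4 (0.14265) is close to, but not exactly, 1 − 0.95^3 = 0.142625. The sketch below assumes that model and picks from the lowest-tag location that should hold enough stock. The error rate, exposure counts and function names are assumptions, not the authors' published method.

```python
from dataclasses import dataclass
from typing import List, Optional

ERROR_RATE = 0.05  # assumed chance that a single exposure event corrupts a record


@dataclass
class Location:
    number: int
    item_type: Optional[str]
    quantity: int
    exposures: int = 0  # assumed: events since the record was last verified

    def tag(self, error_rate: float = ERROR_RATE) -> float:
        """Estimated probability that the record is inaccurate: 1 - (1 - p) ** n."""
        return 1 - (1 - error_rate) ** self.exposures


def choose_pick_location(locations: List[Location], item_type: str, qty: int) -> Optional[Location]:
    """Among locations that should hold enough stock, pick the one least likely to be wrong."""
    candidates = [
        loc for loc in locations if loc.item_type == item_type and loc.quantity >= qty
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: loc.tag())


# Table rows with exposure counts inferred (our assumption) from the tag values.
locations = [
    Location(1, "A", 30, exposures=1),
    Location(2, "A", 20, exposures=0),
    Location(3, "B", 15, exposures=2),
    Location(4, "B", 4, exposures=3),
    Location(5, None, 0),
]
print(choose_pick_location(locations, "A", 10).number)  # 2
print(choose_pick_location(locations, "B", 2).number)   # 3
```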
Results of a simulation
• 100 picks
• An extra 3 to 4 disruptions are avoided compared with the Normal strategy
[Chart: mean number of disruptions encountered (axis 0–8) for the Normal and Avoid strategies, at error rates of 1%, 5% and 20%, each with 2, 12 and 60 degrees of freedom]
Results of a simulation
• Tagging can also be used to find inaccuracies
• Even greater performance: an extra 6 to 7 inaccuracies can be found compared with the Normal strategy
[Chart: mean number of inaccuracies found (axis 0–15) for the Normal and Find strategies, at error rates of 1%, 5% and 20%, each with 2, 12 and 60 degrees of freedom]
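The "Find" use of tagging reverses the picking rule: locations whose records are most likely wrong are visited or checked first. The slides do not describe the Find strategy beyond its results, so this is our interpretation. The sketch below ranks locations by descending tag, using the tag values from the Potential Problem Data Tagging table. The function name and the choice of `k` are hypothetical.

```python
from typing import Dict, List

# Tag values from the Potential Problem Data Tagging table (location number -> tag).
TAGS: Dict[int, float] = {1: 0.05, 2: 0.0, 3: 0.0975, 4: 0.14265, 5: 0.0}


def locations_to_check(tags: Dict[int, float], k: int) -> List[int]:
    """Return the k locations whose records are most likely to be inaccurate."""
    return sorted(tags, key=lambda loc: tags[loc], reverse=True)[:k]


print(locations_to_check(TAGS, 2))  # [4, 3]
```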
Dr Philip Woodall
Senior Research Associate
Distributed Information and Automation Lab
Department of Engineering, University of Cambridge
Thank you
Related papers
Repurposing
Woodall, P. (2017). The Data Repurposing Challenge: New Pressures from Data Analytics. Journal of Data and Information Quality.
Assessing and improving data quality
Woodall, P., Borek, A. and Parlikad, A. (2013). Data quality assessment: The Hybrid Approach. Information & Management, 50(7), pp. 369–382.
Borek, A. et al. (2014). A risk based model for quantifying the impact of information quality. Computers in Industry, 65(2), pp. 354–366.
Potential Problem Data Tagging
Woodall, P. et al. (2016). Data State Tracking: labelling good quality data to improve warehouse operations. In International Conference on Information Quality (ICIQ), Ciudad Real, Spain.

Speaker notes

  1. Wall Street Journal: Problems can arise from the way search engines or e-commerce marketplaces pick up product data, Mr. Hogan says. For example, if a particular pair of shoes is described on one website as “sneakers” and on another as “trainers”, the customer may see the same product under two different names.