HackCodeX Forum
5.06.2023, Riga, Latvia
DATA QUALITY AS A PREREQUISITE
FOR BUSINESS SUCCESS:
WHEN SHOULD I START
TAKING CARE OF IT?
Anastasija Nikiforova
Assistant Professor of Information Systems, Faculty of Science and Technology,
Institute of Computer Science, Chair of Software Engineering, University of Tartu
European Open Science Cloud (EOSC) Task Force “FAIR metrics and data quality”
PHD IN COMPUTER SCIENCE – DATA PROCESSING SYSTEMS AND DATA NETWORKING
RESEARCH INTERESTS: DATA MANAGEMENT WITH A FOCUS ON DATA QUALITY, OPEN
GOVERNMENT DATA, SMART CITY, SOCIETY 5.0, SUSTAINABLE DEVELOPMENT, IOT, HCI,
DIGITIZATION.
✔ASSISTANT PROFESSOR AT THE UNIVERSITY OF TARTU, FACULTY OF SCIENCE AND TECHNOLOGY, INSTITUTE OF COMPUTER SCIENCE,
CHAIR OF SOFTWARE ENGINEERING
✔EUROPEAN OPEN SCIENCE CLOUD TASK FORCE “FAIR METRICS AND DATA QUALITY”
✔EDSC AMBASSADOR (EUROPEAN DIGITAL SKILLS CERTIFICATE, AS PART OF ACTION 9 OF THE DIGITAL EDUCATION ACTION PLAN (2021- 2027) –
JRC/SVQ/2022/OP/0013)
✔IFIP WG8.5 ON ICT AND PUBLIC ADMINISTRATION MEMBER
✔ASSOCIATE MEMBER OF THE LATVIAN OPEN TECHNOLOGY ASSOCIATION
✔EXPERT OF THE LATVIAN COUNCIL OF SCIENCES IN (1) NATURAL SCIENCES – COMPUTER SCIENCE & INFORMATICS, (2) ENGINEERING & TECHNOLOGY –
ELECTRICAL ENGINEERING, ELECTRONICS, ICT, (3) SOCIAL SCIENCES – ECONOMICS & BUSINESS
✔EXPERT OF THE COST – EUROPEAN COOPERATION IN SCIENCE & TECHNOLOGY
✔VISITING RESEARCHER AT THE DELFT UNIVERSITY OF TECHNOLOGY, FACULTY OF TECHNOLOGY, POLICY AND MANAGEMENT (TPM)
✔ASSISTANT PROFESSOR AT THE FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
✔RESEARCHER IN THE INNOVATION LABORATORY, FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
✔IT-EXPERT AT THE LATVIAN BIOMEDICAL RESEARCH AND STUDY CENTRE, BBMRI-ERIC LV NATIONAL NODE
✔ADVISOR FOR THE INSTITUTE FOR SOCIAL AND POLITICAL STUDIES, UNIVERSITY OF LATVIA
✔DATA SECURITY SOLUTIONS, LATVIA
MOST RECENT EXPERIENCE
PAST EXPERIENCE
https://www.linkedin.com/posts/georgefirican_data-dataquality-datamanagement-activity-7001229524768108544-v-ne/?originalSubdomain=mv
DATA … DATA ARE EVERYWHERE
Sources: Premium Vector | Artificial intelligence logo, icon. vector symbol ai, deep learning blockchain neural network concept. machine learning, artificial intelligence, ai. (freepik.com), Top 10 Successful Data Science Companies in 2023 - Learn | Hevo (hevodata.com),
How to Use Business Intelligence (BI) to Improve Organizational Alignment | Wyn Enterprise (grapecity.com), Machine learning logo - Wi6Labs, Business Intelligence Icon Gráfico por aimagenarium · Creative Fabrica, Open Data – GEOAFRICA,
https://www.gartner.com/en/articles/4-emerging-technologies-you-need-to-know-about?utm_medium=social&utm_source=linkedin&utm_campaign=SM_GB_YOY_GTR_SOC_SF1_SM-SWG&utm_content=&sf267111387=1
M-Files on Twitter: "Data is the New Oil – Especially in Oil and Gas! https://t.co/zFlrvQqlMs https://t.co/qE3Q4aLNQy" / Twitter
DATA QUALITY - WHAT, WHY, HOW, 10 BEST PRACTICES & MORE - Enterprise Master Data Management • Profisee
https://dataladder.com/the-impact-of-poor-data-quality-risks-challenges-and-solutions/
https://twitter.com/bright_data/status/1346443370718240768
🤨 "Data is the new oil."​ | LinkedIn
Data is the New Oil - HubMeta
NOT REALLY
“DATA IS THE NEW OIL”: WHY IS IT NOT?
✓ DATA, LIKE OIL, is a source of power, and those who control it are establishing themselves as «masters of the universe», just as oil barons did 100 years ago
BUT!
✘ OIL: a finite resource
  DATA: effectively infinitely durable and reusable; treating it like oil (storing it in silos) has little benefit and reduces its usefulness
✘ OIL: requires huge amounts of resources to be transported to where it is needed
  DATA: can be replicated indefinitely and moved around the world at the speed of light, at low cost, through fiber optic networks
✘ OIL: when used, its energy is lost as heat or light, or permanently converted into another form (e.g., plastic)
  DATA: becomes more useful the more it is used; once processed, data often reveals further applications
✘ OIL: as the world’s oil reserves dwindle, extracting it becomes increasingly difficult and expensive
  DATA: becoming increasingly available as computer technology advances
✘ OIL: oil drilling involves damage to the natural environment and exploitation of finite natural resources
  DATA: data mining doesn’t intrinsically involve damage to the environment or exploitation of finite natural resources*
*apart from the electricity used to run the system
Source: Here's Why Data Is Not The New Oil (forbes.com), Image sources: Oil well – Wikipedia, How do we get oil and gas out of the ground? (world-petroleum.org), Customized Silos For Effective Storage of Food | Nextech Solutions (nextechagrisolutions.com)
“IF WE THINK ABOUT DATA AS A POWER SOURCE OR FUEL,
IT WOULD MAKE MORE SENSE TO COMPARE THEM WITH
RENEWABLE SOURCES LIKE THE
SUN, WIND AND TIDES”
-B. Marr, Forbes
Here's Why Data Is Not The New Oil (forbes.com)
Letter from the Editor: Here comes the sun (medicalnewstoday.com), A healthy wind | MIT News | Massachusetts Institute of Technology, Tidal phenomenon: high and low tides | Ponant Magazine
AMONG OTHER “NUANCES”,
DATA QUALITY IS USE-CASE DEPENDENT AND DYNAMIC IN NATURE
“ABSOLUTE DATA QUALITY”,
I.E., A DATA QUALITY LEVEL AT WHICH THE DATA WOULD SATISFY
ALL POSSIBLE USE CASES, IS IMPOSSIBLE TO ACHIEVE,
BUT IT IS A GOAL TO BE PURSUED
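To make the use-case dependence concrete, here is a minimal Python sketch. The records, field names and the two requirement functions are hypothetical and not taken from the talk; they only illustrate that the same data can be fit for one use and unfit for another, and that the verdict can change over time.

```python
from datetime import date

# Hypothetical customer records; all field names are illustrative assumptions.
RECORDS = [
    {"name": "Anna", "email": "anna@example.org", "phone": None, "updated": date(2023, 5, 1)},
    {"name": "Janis", "email": None, "phone": "+371 20000000", "updated": date(2021, 1, 15)},
]

def fit_for_email_campaign(record):
    """An e-mail campaign only needs a name and an e-mail address."""
    return bool(record["name"]) and bool(record["email"])

def fit_for_phone_survey(record, today=date(2023, 6, 5), max_age_days=365):
    """A phone survey needs a phone number that was confirmed recently."""
    fresh = (today - record["updated"]).days <= max_age_days
    return bool(record["phone"]) and fresh

def share_fit(records, requirement):
    """Fraction of records that satisfy one use case's requirement."""
    return sum(1 for r in records if requirement(r)) / len(records)

email_score = share_fit(RECORDS, fit_for_email_campaign)
phone_score = share_fit(RECORDS, fit_for_phone_survey)
# The same records judged two years earlier: the phone number was still fresh then.
phone_score_2021 = share_fit(RECORDS, lambda r: fit_for_phone_survey(r, today=date(2021, 6, 1)))
```

The dataset never changes, yet its score differs per use case and per evaluation date, which is why a single "absolute" quality level cannot be computed.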
Def. 1: FITNESS-FOR-USE = WARRANTY*
Def. 2: FITNESS-FOR-PURPOSE = UTILITY*
Def. 3: FREE OF ERRORS
*According to ITIL® 4: the framework for the management of IT-enabled services
ISO def.: THE DEGREE TO WHICH
DATA SATISFIES THE REQUIREMENTS
OF ITS INTENDED PURPOSE
ISO/IEC 25012
IN SIMPLER TERMS… THINK OF WINE…
INTRINSIC - flavor type & intensity
EXTRINSIC - brand, packaging…
Based on ISO 19157,
Langstaff, S. A. (2010). Sensory quality control in the wine industry.
Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework for EOSC. Zenodo. https://doi.org/10.5281/zenodo.7515816
NOT ONLY ABOUT WHAT, BUT
ALSO ABOUT HOW?
IT IS A PROCESS –
DATA QUALITY MANAGEMENT PROCESS
TOTAL DATA QUALITY MANAGEMENT (TDQM) LIFECYCLE (BY MIT): DEFINE ➔ MEASURE ➔ ANALYSE ➔ IMPROVE
DEFINE: IDENTIFY RELEVANT DQ DIMENSIONS
MEASURE: PRODUCE DQ METRICS
ANALYSE: IDENTIFY ROOT CAUSES FOR DQ PROBLEMS AND
DETERMINE THE IMPACT OF POOR DQ
IMPROVE: IDENTIFY AND EMPLOY TECHNIQUES FOR
IMPROVING DQ
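The four phases can be sketched as four small Python functions. This is only an illustration under stated assumptions: the records, the choice of completeness as the single dimension, and the city-to-country lookup used for improvement are all hypothetical, and TDQM itself does not prescribe any of them.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "country": "LV", "city": "Riga"},
    {"id": 2, "country": "EE", "city": None},
    {"id": 3, "country": None, "city": "Tartu"},
]

def define():
    """DEFINE: pick the DQ dimensions relevant to the use case (here: completeness only)."""
    return {"completeness": ["country", "city"]}

def measure(rows, dimensions):
    """MEASURE: produce a metric, the share of non-missing values per field."""
    return {
        field: sum(r[field] is not None for r in rows) / len(rows)
        for field in dimensions["completeness"]
    }

def analyse(rows, metrics, threshold=1.0):
    """ANALYSE: locate the records behind every metric that misses the threshold."""
    return {
        field: [r["id"] for r in rows if r[field] is None]
        for field, score in metrics.items()
        if score < threshold
    }

def improve(rows, city_to_country):
    """IMPROVE: one possible technique, deriving a missing country from a known city."""
    return [
        {**r, "country": r["country"] or city_to_country.get(r["city"])}
        for r in rows
    ]

dimensions = define()
before = measure(records, dimensions)
problems = analyse(records, before)
after = measure(improve(records, {"Tartu": "EE", "Riga": "LV"}), dimensions)
```

Running the cycle once raises the completeness of `country` to 1.0 while `city` stays unchanged, so the next iteration would start again at DEFINE or MEASURE with the remaining problem.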
•Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L.,
Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework
for EOSC. Zenodo. https://doi.org/10.5281/zenodo.7515816
Source: https://healthinstitute.illinois.edu/connect/news/berd-tips-dimensions-of-data-quality
DQ DIMENSIONS
AVAILABILITY
INTERNAL CONSISTENCY
EXTERNAL CONSISTENCY
ACCESSIBILITY
COMPREHENSIVENESS
INTEGRITY
SEMANTIC ACCURACY
SYNTACTIC ACCURACY
RELEVANCE
BELIEVABILITY
TRUSTWORTHINESS
UNAMBIGUITY
CURRENCY
VOLATILITY
EASE OF UNDERSTANDING
CREDIBILITY
PORTABILITY
RESPONSIVENESS
OBJECTIVITY
REPUTATION
RELIABILITY
AND MANY MORE…
THERE ARE MORE THAN 100 DATA QUALITY DIMENSIONS
IS THERE ANY COMMONLY ACCEPTED DQ DIMENSION
CLASSIFICATION?
https://iso25000.com/index.php/en/iso-25000-standards/iso-25012/136-iso-iec-2012
ISO 25012
SOFTWARE ENGINEERING — SOFTWARE
PRODUCT QUALITY REQUIREMENTS
AND EVALUATION (SQUARE) — DATA
QUALITY MODEL
DIMENSIONS VARY IN DEFINITION AND SCOPE
ONE AND THE SAME NOTION CAN REFER TO DIFFERENT DIMENSIONS
ONE AND THE SAME DIMENSION CAN HAVE
DIFFERENT NOTIONS [IN DIFFERENT SOURCES]
DATA QUALITY RULES ARE THEN DEFINED
FOR EACH DIMENSION
METRICS ARE THEN SELECTED FOR THEM
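A minimal Python sketch of rules defined per dimension, with one metric per dimension. "Syntactic accuracy" and "internal consistency" are dimension names from the slides and completeness is a commonly used one; the concrete rules, the field names and the pass-ratio metric are assumptions made for illustration.

```python
import re

# Illustrative rules grouped by DQ dimension (rules and fields are hypothetical).
RULES = {
    "completeness": [lambda r: r.get("email") not in (None, "")],
    "syntactic accuracy": [
        lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or "") is not None
    ],
    "internal consistency": [lambda r: r["start_year"] <= r["end_year"]],
}

def metric(rows, rules):
    """Metric for one dimension: share of rows that pass all of its rules."""
    passed = sum(all(rule(r) for rule in rules) for r in rows)
    return passed / len(rows)

rows = [
    {"email": "a@example.org", "start_year": 2020, "end_year": 2023},
    {"email": "not-an-email", "start_year": 2022, "end_year": 2021},
    {"email": "", "start_year": 2019, "end_year": 2019},
    {"email": "b@example.org", "start_year": 2018, "end_year": 2020},
]

report = {dimension: metric(rows, rules) for dimension, rules in RULES.items()}
```

Because dimensions vary in definition between sources, the dictionary keys are best treated as local labels agreed with the data users rather than as standard terms.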
SIMPLER
USER-ORIENTED
APPROACH
BASED ON USER-DEFINED DATA
QUALITY REQUIREMENTS
DQ TOOLS FOR (SEMI-)AUTOMATED DQM
✓ STANDARDIZATION, NORMALIZATION AND PARSING
✓ MATCHING / DEDUPLICATION AND MERGING
✓ DATA CLEANSING
✓ VALIDATION
✓ DATA PROFILING / AUDITING
✓ A FEW OF THEM SUPPORT (SEMI-)AUTOMATED DQ RULE RECOGNITION
BASED ON METADATA, BUILT-IN RULES, OR MACHINE LEARNING
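Three of the listed capabilities (standardization, matching / deduplication and profiling) can be sketched with the Python standard library. This is not how any particular DQ tool works; the function names, the similarity threshold of 0.85 and the sample names are assumptions chosen for illustration.

```python
import difflib
from collections import Counter

def standardize(name):
    """Standardization: trim, collapse whitespace and normalize case."""
    return " ".join(name.split()).casefold()

def is_match(a, b, threshold=0.85):
    """Matching: treat two names as the same entity if they are similar enough."""
    ratio = difflib.SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold

def deduplicate(names, threshold=0.85):
    """Deduplication: keep the first representative of every group of matches."""
    kept = []
    for name in names:
        if not any(is_match(name, k, threshold) for k in kept):
            kept.append(name)
    return kept

def profile(values):
    """Profiling: basic counts that could be reported for one column."""
    counts = Counter(values)
    return {
        "rows": len(values),
        "distinct": len([v for v in counts if v is not None]),
        "missing": counts.get(None, 0),
    }

unique_names = deduplicate(
    ["University of Tartu", "  university of  TARTU ", "Delft University of Technology"]
)
column_profile = profile(["LV", "EE", None, "LV"])
```

The threshold is the sensitive design choice here: set too low it merges distinct entities, set too high it misses spelling variants, so it would normally be tuned on a labelled sample.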
SO FAR…
DEFINITION USER TIME
DIMENSION
PROCESS PURPOSE
WHAT ELSE?
DATA OBJECT
DATASET
DATABASE DATA REPOSITORY INFORMATION SYSTEM
SOFTWARE
NO ONE-SIZE-FITS-ALL
DATA OWNER: KNOWN / THIRD-PARTY
DATA STRUCTURE: STRUCTURED DATA, SEMI-STRUCTURED DATA, UNSTRUCTURED DATA
Image sources: https://monkeylearn.com/blog/semi-structured-data/, https://www.pngitem.com/middle/ioJTTbR_organization-structure-icon-png-download-structures-icon-png/
DATA WAREHOUSE DATA LAKE
Maybe even something else?
Running Analytics on the Data Lake - The Databricks Blog
Image source: https://www.grazitti.com/blog/data-lake-vs-data-warehouse-which-one-should-you-go-for/, https://www.qubole.com/data-lakes-vs-data-warehouses-the-co-existence-argument/
DATA LAKE: SCHEMA ON READ
DATA WAREHOUSE: SCHEMA ON WRITE, “SINGLE SOURCE OF TRUTH”
Implementing a Data Lake or Data Warehouse Architecture for Business Intelligence? | by Lan Chu | Towards Data Science
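The difference between schema on write and schema on read can be sketched in a few lines of Python. The `Warehouse` and `Lake` classes, the schema and the payloads are hypothetical stand-ins, not real storage systems; the point is only where the check happens.

```python
import json

SCHEMA = {"id": int, "amount": float}  # hypothetical schema

def conforms(record, schema=SCHEMA):
    """True if the record has every schema field with the expected type."""
    return all(isinstance(record.get(k), t) for k, t in schema.items())

class Warehouse:
    """Schema on write: records are checked before they are stored."""
    def __init__(self):
        self.rows = []

    def write(self, record):
        if not conforms(record):
            raise ValueError(f"rejected at write time: {record!r}")
        self.rows.append(record)

    def read(self):
        return list(self.rows)

class Lake:
    """Schema on read: raw payloads are stored as-is and interpreted later."""
    def __init__(self):
        self.blobs = []

    def write(self, raw):
        self.blobs.append(raw)  # nothing is checked here

    def read(self, schema=SCHEMA):
        parsed = []
        for raw in self.blobs:
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                continue  # unreadable payloads only surface at read time
            if isinstance(record, dict) and conforms(record, schema):
                parsed.append(record)
        return parsed

lake = Lake()
for raw in ['{"id": 1, "amount": 9.5}', '{"id": "2", "amount": 3}', "not json"]:
    lake.write(raw)

warehouse = Warehouse()
warehouse.write({"id": 1, "amount": 9.5})
try:
    warehouse.write({"id": "2", "amount": 3.0})
    rejected = False
except ValueError:
    rejected = True
```

The lake accepts all three payloads and only one survives reading, which is the mechanism by which an unmanaged lake can turn into a swamp; the warehouse refuses the bad record up front at the price of less flexibility.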
NB: EXTRACT-TRANSFORM-LOAD
IS NOT DQM!!!
https://www.slideteam.net/data-lake-it-avoid-data-swamp-in-a-data-lake.html
HOW TO AVOID A DATA SWAMP?
Image source: The abstracted future of data engineering | by Justin Gage | Datalogue | Medium
OR HOW TO AVOID GIGO*?
*“GARBAGE IN, GARBAGE OUT”
DATA LAKE FOR BI
BUSINESS DATA LAKE
https://www.capgemini.com/wp-content/uploads/2017/07/pivotal_data_lake_vs_traditional_bi_20140805.pdf
DATA LAKE
+
DATA WRANGLING
[an asset, not a silver bullet]
✔
Source: https://monkeylearn.com/blog/data-wrangling/, https://www.altair.com/what-is-data-wrangling/ , https://pediaa.com/what-is-the-difference-between-data-wrangling-and-data-cleaning
Image source: https://www.google.com/url?sa=i&url=https%3A%2F%2Ftwitter.com%2Frokar9%2Fstatus%2F1452339921629302784&psig=AOvVaw2IUSKtgUWxeaplk56f7CoK&ust=1668004535620000&source=images&cd=vfe&ved=0CA4QjhxqFwoTCJDHwbjnnvsCFQAAAAAdAAAAABAM
THE DATA WRANGLING PROCESS PREPARES DATA AND INTEGRATES IT INTO AN INFORMATION SYSTEM (IS).
DEPENDING ON THE IS AND THE DESIRED OR REQUIRED TARGET QUALITY*, INDIVIDUAL STEPS
SHOULD BE CARRIED OUT SEVERAL TIMES ➔ DATA WRANGLING IS A CONTINUOUS PROCESS
THAT REPEATS AT REGULAR INTERVALS.
Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022). Combining data lake and
data wrangling for ensuring data quality in CRIS. Procedia Computer Science, 211, 3-16.
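The idea of repeating wrangling steps until a target quality is reached can be sketched as a loop. This is an illustration only, not the procedure from the cited paper: the completeness score, the two steps, the reference table and the round limit are all assumptions.

```python
def is_empty(value):
    """A cell counts as empty if it is None or a blank string."""
    return value is None or (isinstance(value, str) and not value.strip())

def completeness(rows):
    """Quality score used here: share of non-empty cells (an assumed metric)."""
    cells = [v for row in rows for v in row.values()]
    return sum(not is_empty(v) for v in cells) / len(cells)

def strip_whitespace(rows):
    """Wrangling step: trim surrounding whitespace from string values."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()} for row in rows]

def enrich_country(rows, reference):
    """Wrangling step: fill an empty country from a reference table keyed by name."""
    return [
        {**row, "country": reference.get(row["name"], row["country"])}
        if is_empty(row["country"]) else row
        for row in rows
    ]

def wrangle(rows, steps, target, max_rounds=5):
    """Repeat the wrangling steps until the target quality is reached."""
    rounds = 0
    while completeness(rows) < target and rounds < max_rounds:
        for step in steps:
            rows = step(rows)
        rounds += 1
    return rows, rounds

raw = [{"name": " Anna ", "country": "  "}, {"name": "Janis", "country": "LV"}]
steps = [strip_whitespace, lambda rows: enrich_country(rows, {"Anna": "EE"})]
cleaned, rounds = wrangle(raw, steps, target=1.0)
```

In practice `wrangle` would be called again whenever new data arrive, for instance from a scheduled job, and `max_rounds` keeps the loop from running forever when the target cannot be met with the available steps.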
DATA LAKE VS DATA WAREHOUSE
HOW TO TAKE
THE ADVANTAGES OF BOTH?
DATA LAKEHOUSE
DATA LAKEHOUSE IS SEEN AS A COMBINATION OF DATA WAREHOUSING WORKLOADS & DATA LAKE ECONOMICS
Running Analytics on the Data Lake - The Databricks Blog, Build a Lake House Architecture on AWS | AWS Big Data Blog (amazon.com), The Data Lakehouse, the Data Warehouse and a Modern Data platform architecture - Microsoft Community Hub
DATA QUALITY-AWARE SOFTWARE
DEVELOPMENT
&
DATA QUALITY MODEL-BASED TESTING
THINK DATA QUALITY FIRST!!! OR TOWARDS DATA
QUALITY BY DESIGN
Guerra-García, C., Nikiforova, A., Jiménez, S., Perez-Gonzalez, H. G., Ramírez-Torres, M., & Ontañon-
García, L. (2023). ISO/IEC 25012-based methodology for managing data quality requirements in the
development of information systems: Towards Data Quality by Design. Data & Knowledge
Engineering, 145,
DAQUAVORD - A METHODOLOGY FOR PROJECT MANAGEMENT OF DATA QUALITY REQUIREMENTS
SPECIFICATION - AIMED AT ELICITING DQ REQUIREMENTS ARISING FROM DIFFERENT USERS’ VIEWPOINTS
THESE DQ REQUIREMENTS SERVE AS DATA QUALITY SOFTWARE REQUIREMENTS AT THE TIME
OF THE DEVELOPMENT OF SOFTWARE THAT TAKES DATA QUALITY INTO ACCOUNT BY
DEFAULT.
IT IS BASED ON THE VIEWPOINT-ORIENTED REQUIREMENTS DEFINITION (VORD) METHOD AND
THE LATEST AND MOST GENERALLY ACCEPTED ISO/IEC 25012 STANDARD.
DATA ARTIFACT
WHAT DOES THE DQM APPROACH DEPEND ON?
DEFINITION USER
TIME
DIMENSION
PROCESS PURPOSE
MUSK’S TOP PRIORITY: TO IMPROVE THE
PRODUCT…
Q: HOW DOES ONE ENSURE THE RELIABILITY OF DATA
AND DECISIONS MADE BASED ON SAID DATA?
THE ANSWER LIES NOT IN MANAGING THE DATA ALONE,
BUT ALSO THE INFORMATION AROUND AND ABOUT DATA
ACQUISITION, TRANSFORMATIONS AND VISUALIZATION
TO PROVIDE A BETTER UNDERSTANDING AND SUPPORT
DECISION MAKERS
BY FOCUSING ON SUSTAINABLE DATA, CLEAR
DATA GOVERNANCE
AND STRONG DATA MANAGEMENT
https://www.softcrylic.com/blogs/data-catalogs-in-data-governance/
https://www.gqindia.com/get-smart/content/5-things-elon-musk-did-to-become-one-of-the-richest-men-in-the-world
DATA GOVERNANCE IS THE ANSWER
https://www.edq.com/blog/data-quality-vs-data-governance/
Azeroual O., Nikiforova A., Sha K. (2023) Overlooked Aspects of Data Governance:
Workflow Framework For Enterprise Data Deduplication
DATA QUALITY MANAGEMENT IS A CONTINUOUS PROCESS
THINK DATA QUALITY FIRST!
“1-10-100” RULE
$1 SPENT ON PREVENTION
SAVES $10 ON APPRAISAL AND
$100 ON FAILURE COSTS
https://twitter.com/bright_data/status/1346443370718240768
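The arithmetic behind the 1-10-100 rule can be written out in a few lines of Python. The per-record reading of the ratios and the record counts below are assumptions for illustration; the rule itself only states the 1 : 10 : 100 proportion.

```python
# The 1-10-100 ratios, read here as dollars per defective record (an assumption).
COST_PER_RECORD = {"prevention": 1, "appraisal": 10, "failure": 100}

def total_cost(records_by_stage):
    """Total cost when each defective record is handled at the given stage."""
    return sum(COST_PER_RECORD[stage] * n for stage, n in records_by_stage.items())

# 1000 defective records, all stopped at entry ...
early = total_cost({"prevention": 1000})
# ... versus none prevented: 700 caught by later checks, 300 reaching the customer.
late = total_cost({"prevention": 0, "appraisal": 700, "failure": 300})
```

With these hypothetical counts, early prevention costs $1,000 while late handling costs $37,000.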
DEVELOP DATA QUALITY MANAGEMENT AND
GOVERNANCE STRATEGIES
MAINTAIN DQM & DQG STRATEGIES
DEFINE
MEASURE
ANALYSE
IMPROVE
https://starwars.fandom.com/wiki/Destruction_of_Despayre, https://www.linkedin.com/posts/georgefirican_data-dataquality-datamanagement-activity-7001229524768108544-v-ne/?originalSubdomain=mv, History in Objects: Death Star Plans Datacard • Lucasfilm, Video Analysis of an Exploding Death Star | WIRED
For more information, see ResearchGate or anastasijanikiforova.com
For questions, contact me via Nikiforova.Anastasija@gmail.com

Contenu connexe

Tendances

Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesDATAVERSITY
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Guillaume LE GALIARD
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data QualityDATAVERSITY
 
Data Governance
Data GovernanceData Governance
Data GovernanceRob Lux
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmapvictorlbrown
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model DATUM LLC
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying dataHans Verstraeten
 
Data Governance
Data GovernanceData Governance
Data GovernanceBoris Otto
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata ManagementDATAVERSITY
 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceDATAVERSITY
 
data-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptxdata-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptxMohamedHendawy17
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 

Tendances (20)

Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0
 
Approaching Data Quality
Approaching Data QualityApproaching Data Quality
Approaching Data Quality
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
MDM Strategy & Roadmap
MDM Strategy & RoadmapMDM Strategy & Roadmap
MDM Strategy & Roadmap
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying data
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data Governance
 
data-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptxdata-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptx
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 

Similaire à Data Quality as a prerequisite for you business success: when should I start taking care of it?

Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...Anastasija Nikiforova
 
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERSOPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERSAnastasija Nikiforova
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. maigva
 
Smart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart dataSmart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart datacaniceconsulting
 
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaBIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaMaria de la Iglesia
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsDhruv Saxena
 
Causal networks, learning and inference - Introduction
Causal networks, learning and inference - IntroductionCausal networks, learning and inference - Introduction
Causal networks, learning and inference - IntroductionFabio Stella
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Challenges and outlook with Big Data
Challenges and outlook with Big Data Challenges and outlook with Big Data
Challenges and outlook with Big Data IJCERT JOURNAL
 
Mining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmMining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmIRJET Journal
 
How Can Public Data Help Your Organization? An Introduction to DataCommons.org
How Can Public Data Help Your Organization? An Introduction to DataCommons.orgHow Can Public Data Help Your Organization? An Introduction to DataCommons.org
How Can Public Data Help Your Organization? An Introduction to DataCommons.orgTechSoup
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptxPerumalPitchandi
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs UsageIRJET Journal
 
A Research Study On Data Mining
A Research Study On Data MiningA Research Study On Data Mining
A Research Study On Data MiningJessica Oatis
 
Cisco service innovation 20110418 v2
Cisco service innovation 20110418 v2Cisco service innovation 20110418 v2
Cisco service innovation 20110418 v2ISSIP
 

Similaire à Data Quality as a prerequisite for you business success: when should I start taking care of it? (20)

Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality as a prerequisite for your business success: when should I start taking care of it?

  • 1. HackCodeX Forum, 5.06.2023, Riga, Latvia. DATA QUALITY AS A PREREQUISITE FOR BUSINESS SUCCESS: WHEN SHOULD I START TAKING CARE OF IT? Anastasija Nikiforova, Assistant Professor of Information Systems, Faculty of Science and Technology, Institute of Computer Science, Chair of Software Engineering, University of Tartu; European Open Science Cloud (EOSC) Task Force “FAIR metrics and data quality”
  • 2. PHD IN COMPUTER SCIENCE – DATA PROCESSING SYSTEMS AND DATA NETWORKING. RESEARCH INTERESTS: DATA MANAGEMENT WITH A FOCUS ON DATA QUALITY, OPEN GOVERNMENT DATA, SMART CITY, SOCIETY 5.0, SUSTAINABLE DEVELOPMENT, IOT, HCI, DIGITIZATION.
    MOST RECENT EXPERIENCE:
    ✔ ASSISTANT PROFESSOR AT THE UNIVERSITY OF TARTU, FACULTY OF SCIENCE AND TECHNOLOGY, INSTITUTE OF COMPUTER SCIENCE, CHAIR OF SOFTWARE ENGINEERING
    ✔ EUROPEAN OPEN SCIENCE CLOUD TASK FORCE “FAIR METRICS AND DATA QUALITY”
    ✔ EDSC AMBASSADOR (EUROPEAN DIGITAL SKILLS CERTIFICATE, AS PART OF ACTION 9 OF THE DIGITAL EDUCATION ACTION PLAN (2021-2027) – JRC/SVQ/2022/OP/0013)
    ✔ IFIP WG8.5 ON ICT AND PUBLIC ADMINISTRATION MEMBER
    ✔ ASSOCIATE MEMBER OF THE LATVIAN OPEN TECHNOLOGY ASSOCIATION
    ✔ EXPERT OF THE LATVIAN COUNCIL OF SCIENCES IN (1) NATURAL SCIENCES – COMPUTER SCIENCE & INFORMATICS, (2) ENGINEERING & TECHNOLOGY – ELECTRICAL ENGINEERING, ELECTRONICS, ICT, (3) SOCIAL SCIENCES – ECONOMICS & BUSINESS
    ✔ EXPERT OF THE COST – EUROPEAN COOPERATION IN SCIENCE & TECHNOLOGY
    PAST EXPERIENCE:
    ✔ VISITING RESEARCHER AT THE DELFT UNIVERSITY OF TECHNOLOGY, FACULTY OF TECHNOLOGY, POLICY AND MANAGEMENT (TPM)
    ✔ ASSISTANT PROFESSOR AT THE FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
    ✔ RESEARCHER IN THE INNOVATION LABORATORY, FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
    ✔ IT-EXPERT AT THE LATVIAN BIOMEDICAL RESEARCH AND STUDY CENTRE, BBMRI-ERIC LV NATIONAL NODE
    ✔ ADVISOR FOR THE INSTITUTE FOR SOCIAL AND POLITICAL STUDIES, UNIVERSITY OF LATVIA
    ✔ DATA SECURITY SOLUTIONS, LATVIA
  • 6. DATA … DATA ARE EVERYWHERE Sources: Premium Vector | Artificial intelligence logo, icon. vector symbol ai, deep learning blockchain neural network concept. machine learning, artificial intelligence, ai. (freepik.com), Top 10 Successful Data Science Companies in 2023 - Learn | Hevo (hevodata.com), How to Use Business Intelligence (BI) to Improve Organizational Alignment | Wyn Enterprise (grapecity.com), Machine learning logo - Wi6Labs, Business Intelligence Icon Gráfico por aimagenarium · Creative Fabrica, Open Data – GEOAFRICA, https://www.gartner.com/en/articles/4-emerging-technologies-you-need-to-know-about?utm_medium=social&utm_source=linkedin&utm_campaign=SM_GB_YOY_GTR_SOC_SF1_SM-SWG&utm_content=&sf267111387=1
  • 7. DATA … DATA ARE EVERYWHERE M-Files on Twitter: "Data is the New Oil – Especially in Oil and Gas! https://t.co/zFlrvQqlMs https://t.co/qE3Q4aLNQy" / Twitter
  • 8. DATA QUALITY - WHAT, WHY, HOW, 10 BEST PRACTICES & MORE - Enterprise Master Data Management • Profisee
  • 11. 🤨 "Data is the new oil."​ | LinkedIn
  • 12. Data is the New Oil - HubMeta
  • 13. Data is the New Oil - HubMeta NOT REALLY
  • 14. “DATA IS THE NEW OIL”: WHY IS IT NOT? BUT! ✓ DATA, LIKE OIL, is a source of power, and those who control it are establishing themselves as «masters of the universe», just as oil barons did 100 years ago. Source: Here's Why Data Is Not The New Oil (forbes.com). Image sources: Oil well – Wikipedia, How do we get oil and gas out of the ground? (world-petroleum.org), Customized Silos For Effective Storage of Food | Nextech Solutions (nextechagrisolutions.com)
  • 15. “DATA IS THE NEW OIL”: WHY IS IT NOT?
    ✘ DATA is effectively infinitely durable and reusable; treating it like oil (storing it in silos) has little benefit and reduces its usefulness. OIL is a finite resource.
    ✘ DATA can be replicated indefinitely and moved around the world at the speed of light, at low cost, through fiber optic networks. OIL requires huge amounts of resources to be transported to where it is needed.
    ✘ DATA becomes more useful the more it is used: once processed, data often reveals further applications. When OIL is used, its energy is lost as heat or light, or permanently converted into another form (e.g., plastic).
    ✘ DATA is becoming increasingly available as computer technology advances. As the world’s oil reserves dwindle, extracting OIL becomes increasingly difficult and expensive.
    ✘ DATA mining does not intrinsically involve damage to the environment or exploitation of finite natural resources (apart from the electricity used to run the system). OIL drilling involves damage to the natural environment and exploitation of finite natural resources.
    Source: Here's Why Data Is Not The New Oil (forbes.com). Image sources: Oil well – Wikipedia, How do we get oil and gas out of the ground? (world-petroleum.org), Customized Silos For Effective Storage of Food | Nextech Solutions (nextechagrisolutions.com)
  • 16. “IF WE THINK ABOUT DATA AS A POWER SOURCE OR FUEL, IT WOULD MAKE MORE SENSE TO COMPARE THEM WITH RENEWABLE SOURCES LIKE THE SUN, WIND AND TIDES” - B. Marr, Forbes. Source: Here's Why Data Is Not The New Oil (forbes.com). Image sources: Letter from the Editor: Here comes the sun (medicalnewstoday.com), A healthy wind | MIT News | Massachusetts Institute of Technology, Tidal phenomenon: high and low tides | Ponant Magazine
  • 17. AMONG OTHER “NUANCES”, DATA QUALITY IS USE-CASE DEPENDENT AND DYNAMIC IN NATURE. “ABSOLUTE DATA QUALITY”, THE DATA QUALITY LEVEL AT WHICH THE DATA WOULD SATISFY ALL POSSIBLE USE CASES, IS IMPOSSIBLE TO ACHIEVE, BUT IT IS A GOAL TO BE PURSUED.
  • 19. Def. 1: FITNESS-FOR-USE Def. 2: FITNESS-FOR-PURPOSE Def. 3: FREE OF ERRORS
  • 20. Def. 1: FITNESS-FOR-USE = WARRANTY*; Def. 2: FITNESS-FOR-PURPOSE = UTILITY*; Def. 3: FREE OF ERRORS. *According to ITIL® 4: the framework for the management of IT-enabled services
  • 21. ISO def.: THE DEGREE TO WHICH DATA SATISFIES THE REQUIREMENTS OF ITS INTENDED PURPOSE ISO/IEC 25012
  • 22. IN SIMPLER TERMS… THINK OF WINE… INTRINSIC - flavor type & intensity EXTRINSIC - brand, packaging… Based on ISO 19157, Langstaff, S. A. (2010). Sensory quality control in the wine industry. Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework for
  • 24. NOT ONLY ABOUT WHAT, BUT ALSO ABOUT HOW? IT IS A PROCESS
  • 25. NOT ONLY ABOUT WHAT, BUT ALSO ABOUT HOW? IT IS A PROCESS – DATA QUALITY MANAGEMENT PROCESS
  • 27. DATA QUALITY MANAGEMENT PROCESS: TOTAL DATA QUALITY MANAGEMENT (TDQM) LIFECYCLE (BY MIT): DEFINE, MEASURE, ANALYSE, IMPROVE.
    DEFINE: IDENTIFY RELEVANT DQ DIMENSIONS
    MEASURE: PRODUCE DQ METRICS
    ANALYSE: IDENTIFY ROOT CAUSES FOR DQ PROBLEMS AND DETERMINE THE IMPACT OF POOR DQ
    IMPROVE: IDENTIFY AND EMPLOY TECHNIQUES FOR IMPROVING DQ
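The four TDQM steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the TDQM methodology itself: the records, the rule names (`completeness(email)`, `syntactic_accuracy(email)`, `validity(age)`), the 0.9 threshold and the "drop bad rows" improvement are all hypothetical choices made for the example.

```python
# Hypothetical records, for illustration only.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "c@example", "age": -5},
    {"id": 4, "email": "d@example.com", "age": 41},
]

# DEFINE: pick the relevant DQ dimensions and express each as a rule
# (a function that returns True when a record passes).
rules = {
    "completeness(email)": lambda r: bool(r["email"]),
    "syntactic_accuracy(email)": lambda r: "@" in r["email"]
    and "." in r["email"].split("@")[-1],
    "validity(age)": lambda r: 0 <= r["age"] <= 120,
}


def measure(rows, rules):
    """MEASURE: share of rows that satisfy each rule (0.0 to 1.0)."""
    return {
        name: sum(1 for r in rows if rule(r)) / len(rows)
        for name, rule in rules.items()
    }


def analyse(rows, rules, metrics, threshold=0.9):
    """ANALYSE: for each rule scoring below the threshold, list offending ids."""
    return {
        name: [r["id"] for r in rows if not rules[name](r)]
        for name, score in metrics.items()
        if score < threshold
    }


def improve(rows, rules):
    """IMPROVE: the simplest possible technique, keep only rows passing all rules."""
    return [r for r in rows if all(rule(r) for rule in rules.values())]


metrics = measure(records, rules)
problems = analyse(records, rules, metrics)
cleaned = improve(records, rules)
```

In practice the cycle repeats: after an improvement step the metrics are measured again, and the rules themselves are revisited as use cases change.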
  • 28. Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework for EOSC. Zenodo. https://doi.org/10.5281/zenodo.7515816
  • 29. DQ DIMENSIONS: AVAILABILITY, INTERNAL CONSISTENCY, EXTERNAL CONSISTENCY, ACCESSIBILITY, COMPREHENSIVENESS, INTEGRITY, SEMANTIC ACCURACY, SYNTACTIC ACCURACY, RELEVANCE, BELIEVABILITY, TRUSTWORTHINESS, UNAMBIGUITY, CURRENCY, VOLATILITY, EASE OF UNDERSTANDING, CREDIBILITY, PORTABILITY, RESPONSIVENESS, OBJECTIVITY, REPUTATION, RELIABILITY, AND MANY MORE… Source: https://healthinstitute.illinois.edu/connect/news/berd-tips-dimensions-of-data-quality
  • 30. Relevance, Availability, Internal consistency, External consistency, Accessibility, Comprehensiveness, Believability, Integrity, Trustworthiness, Semantic accuracy, Unambiguity, Syntactic accuracy. THERE ARE MORE THAN 100 DATA QUALITY DIMENSIONS. Source: https://healthinstitute.illinois.edu/connect/news/berd-tips-dimensions-of-data-quality
  • 31. IS THERE ANY COMMONLY ACCEPTED DQ DIMENSION CLASSIFICATION? ISO 25012: SOFTWARE ENGINEERING - SOFTWARE PRODUCT QUALITY REQUIREMENTS AND EVALUATION (SQuaRE) - DATA QUALITY MODEL. Source: https://iso25000.com/index.php/en/iso-25000-standards/iso-25012/136-iso-iec-2012
  • 32. DIMENSIONS VARY IN DEFINITION AND SCOPE ONE AND THE SAME NOTION CAN REFER TO DIFFERENT DIMENSIONS ONE AND THE SAME DIMENSION CAN HAVE DIFFERENT NOTIONS [IN DIFFERENT SOURCES] DATA QUALITY RULES ARE THEN DEFINED FOR EACH DIMENSION METRICS ARE THEN SELECTED FOR THEM
  • 33. SIMPLER USER-ORIENTED APPROACH BASED ON USER DEFINED DATA QUALITY REQUIREMENTS
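One way to picture user-defined data quality requirements is as named rules, each tied to a dimension and measured as a pass rate. The sketch below is an illustration only: the `Rule` class, the `evaluate` function, the dimension labels and the sample records are assumptions, not the approach described in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A user-defined DQ requirement: a named check assigned to a dimension."""
    name: str
    dimension: str
    check: Callable[[dict], bool]


def evaluate(records: list[dict], rules: list[Rule]) -> dict[str, float]:
    """Metric per rule: share of records that satisfy the check."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(1 for r in records if rule.check(r)) / len(records)
        for rule in rules
    }


# Hypothetical requirements a user might state for a company register.
rules = [
    Rule("name_present", "completeness", lambda r: bool(r.get("name"))),
    Rule(
        "year_plausible",
        "accuracy",
        lambda r: isinstance(r.get("year"), int) and 1900 <= r["year"] <= 2023,
    ),
]

companies = [
    {"name": "Acme", "year": 1999},
    {"name": "", "year": 2010},
    {"name": "Beta", "year": 3021},
    {"name": "Gamma", "year": 2005},
]
report = evaluate(companies, rules)
```

Keeping the rule, its dimension and its metric together makes it clear which requirement a low score belongs to, even when different sources name the dimension differently.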
  • 34. DQ TOOLS FOR (SEMI-)AUTOMATED DQM ✓ STANDARDIZATION, NORMALIZATION AND PARSING ✓ MATCHING / DEDUPLICATION AND MERGING ✓ DATA CLEANSING ✓ VALIDATION ✓ DATA PROFILING / AUDITING ✓ A FEW OF THEM SUPPORT (SEMI-)AUTOMATED DQ RULE RECOGNITION BASED ON METADATA, BUILT-IN RULES, OR MACHINE LEARNING
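Two of the listed capabilities, standardization and matching / deduplication with merging, can be sketched with the Python standard library alone. This is a simplified illustration using exact matching on a standardized key; real DQ tools typically use fuzzy matching. The function names and sample records are hypothetical.

```python
import re
import unicodedata


def standardize(name: str) -> str:
    """Normalize case, accents, punctuation and whitespace so variants compare equal."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def deduplicate(records: list[dict], key: str = "name") -> list[dict]:
    """Keep the first record per standardized key and fill its empty fields
    from later duplicates (a very simple merge strategy)."""
    merged: dict[str, dict] = {}
    for rec in records:
        k = standardize(str(rec.get(key, "")))
        if k not in merged:
            merged[k] = dict(rec)
            continue
        for field, value in rec.items():
            if merged[k].get(field) in (None, ""):
                merged[k][field] = value
    return list(merged.values())


institutions = [
    {"name": "University of Tartu", "city": ""},
    {"name": "university of  tartu.", "city": "Tartu"},
    {"name": "Riga Technical University", "city": "Riga"},
]
unique = deduplicate(institutions)
```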
  • 36. SO FAR… DEFINITION USER TIME DIMENSION PROCESS PURPOSE
  • 37. SO FAR… DEFINITION USER TIME DIMENSION PROCESS PURPOSE WHAT ELSE?
  • 38. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE NO ONE-SIZE-FITS-ALL
  • 39. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE DATA OWNER KNOWN THIRD-PARTY NO ONE-SIZE-FITS-ALL
  • 40. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE DATA STRUCTURE NO ONE-SIZE-FITS-ALL STRUCTURED DATA UNSTRUCTURED DATA SEMI-STRUCTURED DATA Image sources: https://monkeylearn.com/blog/semi-structured-data/, https://www.pngitem.com/middle/ioJTTbR_organization-structure-icon-png-download-structures-icon-png/
  • 41. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE DATA WAREHOUSE DATA LAKE Maybe even something else? NO ONE-SIZE-FITS-ALL
  • 42. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE Running Analytics on the Data Lake - The Databricks Blog NO ONE-SIZE-FITS-ALL
  • 43. Image source: https://www.grazitti.com/blog/data-lake-vs-data-warehouse-which-one-should-you-go-for/, https://www.qubole.com/data-lakes-vs-data-warehouses-the-co-existence-argument/ SCHEMA ON READ SCHEMA ON WRITE “SINGLE SOURCE OF TRUTH”
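The difference between schema on write (data warehouse) and schema on read (data lake) can be sketched as follows. The `Warehouse` and `Lake` classes and the two-field `SCHEMA` are toy assumptions for illustration and do not model any real product.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"id": int, "amount": float}


def conforms(record: dict) -> bool:
    """True if every schema field is present with the expected type."""
    return all(isinstance(record.get(f), t) for f, t in SCHEMA.items())


class Warehouse:
    """Schema on write: non-conforming records are rejected at load time."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, record: dict) -> None:
        if not conforms(record):
            raise ValueError(f"record violates schema: {record!r}")
        self.rows.append(record)


class Lake:
    """Schema on read: raw payloads are stored as-is; the schema is applied by the reader."""

    def __init__(self) -> None:
        self.raw: list[str] = []

    def write(self, payload: str) -> None:
        self.raw.append(payload)  # nothing is checked here

    def read(self) -> list[dict]:
        rows = []
        for payload in self.raw:
            try:
                record = json.loads(payload)
            except json.JSONDecodeError:
                continue  # unparseable payload stays in the lake but is skipped
            if isinstance(record, dict) and conforms(record):
                rows.append(record)
        return rows


lake = Lake()
for payload in ('{"id": 1, "amount": 9.5}', "not json", '{"id": "x", "amount": 1.0}'):
    lake.write(payload)
readable = lake.read()
```

In this sketch the lake accepts all three payloads but only one survives reading, which is where the "garbage in, garbage out" risk appears if nothing checks quality between ingestion and use.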
  • 44. Implementing a Data Lake or Data Warehouse Architecture for Business Intelligence? | by Lan Chu | Towards Data Science NB: EXTRACT-TRANSFORM-LOAD IS NOT DQM!!!
  • 47. Image source: The abstracted future of data engineering | by Justin Gage | Datalogue | Medium OR HOW TO AVOID GIGO*? *“GARBAGE IN, GARBAGE OUT”
  • 48. DATA LAKE FOR BI BUSINESS DATA LAKE https://www.capgemini.com/wp-content/uploads/2017/07/pivotal_data_lake_vs_traditional_bi_20140805.pdf
  • 49. DATA LAKE + DATA WRANGLING [an asset, not a silver bullet] ✔ Source: https://monkeylearn.com/blog/data-wrangling/, https://www.altair.com/what-is-data-wrangling/ , https://pediaa.com/what-is-the-difference-between-data-wrangling-and-data-cleaning
  • 51. THE DATA WRANGLING PROCESS TO PREPARE DATA AND INTEGRATE IT INTO AN INFORMATION SYSTEM (IS). DEPENDING ON THE IS AND THE DESIRED OR REQUIRED TARGET QUALITY*, INDIVIDUAL STEPS SHOULD BE CARRIED OUT SEVERAL TIMES ➔ !!! DATA WRANGLING IS A CONTINUOUS PROCESS !!! THAT REPEATS AT REGULAR INTERVALS. Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022). Combining data lake and data wrangling for ensuring data quality in CRIS. Procedia Computer Science, 211, 3-16.
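The idea that wrangling steps are repeated until a target quality is reached can be sketched as a loop. The steps (`trim`, `drop_empty`), the `quality` measure and the `max_passes` safety limit are assumptions made for this illustration, not the process from the cited paper.

```python
from typing import Callable

Step = Callable[[list[str]], list[str]]


def quality(values: list[str]) -> float:
    """Share of values that are non-empty and carry no surrounding whitespace."""
    if not values:
        return 1.0
    ok = sum(1 for v in values if v and v == v.strip())
    return ok / len(values)


def trim(values: list[str]) -> list[str]:
    return [v.strip() for v in values]


def drop_empty(values: list[str]) -> list[str]:
    return [v for v in values if v]


def wrangle(
    values: list[str], steps: list[Step], target: float, max_passes: int = 5
) -> tuple[list[str], int]:
    """Repeat all steps until the target quality is met or max_passes is reached."""
    passes = 0
    while quality(values) < target and passes < max_passes:
        for step in steps:
            values = step(values)
        passes += 1
    return values, passes


raw = [" a ", "", "b", "  "]
clean, passes = wrangle(raw, [trim, drop_empty], target=1.0)
```

In a running system the same loop would be triggered again whenever new data arrives, which is what makes wrangling continuous rather than a one-off step.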
  • 52. DATA LAKE VS DATA WAREHOUSE HOW TO TAKE THE ADVANTAGES OF BOTH?
  • 53. DATA LAKE VS DATA WAREHOUSE HOW TO TAKE THE ADVANTAGES OF BOTH? DATA LAKEHOUSE
  • 54. DATA LAKEHOUSE IS SEEN AS A COMBINATION OF DATA WAREHOUSING WORKLOADS & DATA LAKE ECONOMICS Running Analytics on the Data Lake - The Databricks Blog
  • 55. Running Analytics on the Data Lake - The Databricks Blog, Build a Lake House Architecture on AWS | AWS Big Data Blog (amazon.com), The Data Lakehouse, the Data Warehouse and a Modern Data platform architecture - Microsoft Community Hub
  • 56. DATA OBJECT DATASET DATABASE DATA REPOSITORY INFORMATION SYSTEM SOFTWARE Running Analytics on the Data Lake - The Databricks Blog
  • 57. DATA QUALITY-AWARE SOFTWARE DEVELOPMENT & DATA QUALITY MODEL-BASED TESTING
  • 58. THINK DATA QUALITY FIRST!!! OR TOWARDS DATA QUALITY BY DESIGN Guerra-García, C., Nikiforova, A., Jiménez, S., Perez-Gonzalez, H. G., Ramírez-Torres, M., & Ontañon-García, L. (2023). ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards Data Quality by Design. Data & Knowledge Engineering, 145, DAQUAVORD - A METHODOLOGY FOR PROJECT MANAGEMENT OF DATA QUALITY REQUIREMENTS SPECIFICATION, AIMED AT ELICITING DQ REQUIREMENTS ARISING FROM DIFFERENT USERS’ VIEWPOINTS. THESE DQ REQUIREMENTS SERVE AS DATA QUALITY SOFTWARE REQUIREMENTS DURING THE DEVELOPMENT OF SOFTWARE THAT TAKES DATA QUALITY INTO ACCOUNT BY DEFAULT. IT IS BASED ON THE VIEWPOINT-ORIENTED REQUIREMENTS DEFINITION (VORD) METHOD AND THE LATEST AND MOST GENERALLY ACCEPTED ISO/IEC 25012 STANDARD.
  • 59. WHAT DOES THE DQM APPROACH DEPEND ON? DEFINITION USER TIME DIMENSION PROCESS PURPOSE DATA ARTIFACT
  • 61. MUSK’S TOP PRIORITY: TO IMPROVE THE PRODUCT… Q: HOW DOES ONE ENSURE THE RELIABILITY OF DATA AND DECISIONS MADE BASED ON SAID DATA? THE ANSWER LIES NOT IN MANAGING THE DATA ALONE, BUT ALSO THE INFORMATION AROUND AND ABOUT DATA ACQUISITION, TRANSFORMATIONS AND VISUALIZATION TO PROVIDE A BETTER UNDERSTANDING AND SUPPORT DECISION MAKERS https://www.gqindia.com/get-smart/content/5-things-elon-musk-did-to-become-one-of-the-richest-men-in-the-world
  • 62. MUSK’S TOP PRIORITY: TO IMPROVE THE PRODUCT… Q: HOW DOES ONE ENSURE THE RELIABILITY OF DATA AND DECISIONS MADE BASED ON SAID DATA? THE ANSWER LIES NOT IN MANAGING THE DATA ALONE, BUT ALSO THE INFORMATION AROUND AND ABOUT DATA ACQUISITION, TRANSFORMATIONS AND VISUALIZATION TO PROVIDE A BETTER UNDERSTANDING AND SUPPORT DECISION MAKERS BY FOCUSING ON SUSTAINABLE DATA, CLEAR DATA GOVERNANCE AND STRONG DATA MANAGEMENT
  • 64. DATA GOVERNANCE IS THE ANSWER https://www.edq.com/blog/data-quality-vs-data-governance/ Azeroual O., Nikiforova A., Sha K. (2023) Overlooked Aspects of Data Governance: Workflow Framework For Enterprise Data Deduplication
  • 68. THINK DATA QUALITY FIRST! “1-10-100” RULE: $1 SPENT ON PREVENTION SAVES $10 ON APPRAISAL AND $100 ON FAILURE COSTS https://twitter.com/bright_data/status/1346443370718240768
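The arithmetic behind the 1-10-100 rule can be made concrete with a small calculation. The sketch assumes the common per-record reading of the rule (1 unit to prevent a defect, 10 to find and correct it later, 100 if it causes a failure); the record count is an invented example, and the ratios are a rule of thumb rather than measured costs.

```python
# Relative cost per defective record at each stage, per the 1-10-100 rule of thumb.
COST_PER_RECORD = {"prevention": 1, "appraisal": 10, "failure": 100}


def total_cost(defective_records: int, stage: str) -> int:
    """Cost of handling the given number of defective records at one stage."""
    return defective_records * COST_PER_RECORD[stage]


defects = 500  # hypothetical number of defective records
costs = {stage: total_cost(defects, stage) for stage in COST_PER_RECORD}
```

For the assumed 500 defective records, handling them at entry costs 500 units, correcting them later costs 5,000, and leaving them until they cause failures costs 50,000.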
  • 69. DEVELOP DATA QUALITY MANAGEMENT AND GOVERNANCE STRATEGIES. MAINTAIN DQM & DQG STRATEGIES: DEFINE, MEASURE, ANALYSE, IMPROVE
  • 73. For more information, see ResearchGate, anastasijanikiforova.com For questions or any queries, contact me via Nikiforova.Anastasija@gmail.com,