SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
AutonomousDataTrustScore
forDataCatalogs
www.FirstEigen.com • contact@firsteigen.com
FirstEigenWhitePaper
Autonomously Assign Data Trust Score to Each Data Asset Present in the Data Catalog
With the accelerating adoption of Data Catalogs as the core component for enterprise data governance, the need
to provide information about the health and usability of the data assets has become critical. With the availability of
standardized data trust scores within the data catalog, users can easily determine the usability and relevancy of the
dataset in their particular use case.
Why Should the Data Governance Team Care About Data Trust Score?
When working with data, a key question that comes to mind is, “Is the data of the right quality for my purpose?” Data
scientists working on advanced use cases need confidence in the data they use. Information about data quality,
data lineage, and the accountable person, e.g.,
data steward or data owner, are three sorts of
information that help gain trust in a dataset. With
this information, suitable data sources are more
likely to be used, whereas inappropriate data
sources that are likely to produce erroneous results
can be avoided.
Data quality and trust are keys to making the most
efficient use of data. The more accurate the data,
the better the results. It is crucial to measure the
accuracy of the data to guarantee that stakeholders
and automated systems make decisions based on
reliable information.
A Data Trust Score is a single number that
represents a currency of trust during any data usage or exchange process. Data Trust Score can serve as a solution
to the information asymmetry problem that exists in modern catalogs. The consumer and the producer of data do not
have the level of information related to the usefulness of the data.
Data Trust score is generally computed on certain data quality dimensions applied to individual columns in the data
set. The scores are then combined to calculate the total quality score for the entire data set. The combined score is
an average of the scores for all columns.
Data quality is a mandatory element for enabling large-scale and better use of data in any organization. By adding data
quality information within the catalog, users can find insights about the data without ever needing to switch contexts.
Data Catalog users often find the need for data quality information and other metadata of the data assets[1]
.
Although not a data quality tool, a data catalog can help enterprises in providing data quality information[2]
. Most
catalog provides vendor-neutral integrations with dedicated data quality platforms, thus allowing data catalog search
functions to surface higher-quality data first. Data assets with high scores are promoted which are indicative of good
data quality. Other metrics such as usage and company certifications can also guide catalog users to the highest
quality data assets.
How to Calculate Data Trust Score?
Data Trust score can be computed based on the weighted average data quality score on data quality rules.
Progressive organizations categorize the data quality rules along the data quality dimensions. A trust score is then
calculated for each dimension and then rolled up using weighted average methods for the dataset.
•	 Completeness: It determines the completeness of contextually important fields.
•	 Conformity: Dataset should contain relevant data and follow certain rules or patterns. This data quality
dimension determines conformity to a pattern, length, and format of contextually important fields.
•	 Uniqueness: This dimension determines the uniqueness of the individual records. Primary keys are
essential to distinguish between different data values and records to avoid duplication.
•	 Consistency: It determines the consistency of intercolumn relationships (e.g. date of employment must
be before the date of retirement).
•	 Drift: It determines the drift of the key categorical and continuous fields from the historical information.
•	 Anomaly: It can determine the volume and value anomaly of critical columns.
Challenges With Incorporating Data Trust Score into the Data Catalog
Chief Data Officers (CDOs) drive the data strategy for the entire organization. They are accountable for maintaining
data quality. The volume of data assets is increasing rapidly, and data teams are struggling to keep up because
existing data validation approaches are resource-intensive, time-consuming, costly, and not scalable for 1000s
of data assets. Business stakeholders often find errors in their reports and dashboards repeatedly, which leads to
mistrust of data.
In general, the data team experiences the following operational challenges while integrating data quality into the
data catalog:
•	 Analyzing data and consulting the subject matter experts to determine what data quality rules need to be
implemented is time-consuming. data quality analysts are unfamiliar with the data assets obtained from a third
party, either in a public or private context. They need to engage with subject matter experts extensively to build
data quality criteria
•	 Implementation of the rules is specific to each data set/table. So, the effort is linearly proportional to the
number of tables/buckets/folders in the Data Catalog.
•	 Existing open-source tools/approaches come with limited audit trail capability. Generating an audit trail of the
rule execution results for compliance requirements often takes time and effort from the data engineering team.
•	 Maintaining the implemented rules is cumbersome.
Why Leverage Machine Learning?
ML models can learn from massive volumes of data and uncover hidden data patterns. Moreover, they train
overtime on the new data as it changes. Using ML in calculating data trust scores has several advantages:
•	 Machine Learning helps to objectively determine data patterns or data fingerprints and
translate those patterns to data quality rules.
•	 Machine Learning can then use the data fingerprints to detect transactions that do not
adhere to the rules.
•	 Implementing an ML approach can help to assess the data health quickly.
•	 ML is usually more comprehensive and accurate than a human-driven data quality analysis.
More importantly, when a trust score is assigned by the consumer of data, the score becomes susceptible
to the same temptations to hide or even falsify information as the producers of the underlying data assets.
Machine learning-based Data Trust score eliminates human biases and provides an objective score that
both producer and consumer can agree on[3,4]
.
Autonomous Data Trust Score for Data Catalog – an Example
In the example to the right, Machine
Learning based was used to calculate the
data trust score for each of the data assets
registered in a data catalog. The result
of the analysis is then presented using
standardized data quality dimensions as
shown below and embedded within the data
catalog using standard APIs.
The Data Trust Score displayed in the
summary of the Data Quality analysis shows
how the quality score changed between
the last two analyses. Every violation
discovered can be double-clicked for
further information:
•	 Expand the dimension to see which
columns are affected at the data asset
level. Click a column name to see the
dimension details for that column.
•	 At the column level, click the dimension
name for further details.
This provides the catalog users with enough
information regarding the fitness/usability
of the dataset in the context of their use case.
References
[1]
K. Kashalikar, Integrating Data Quality with Data Catalog (2022), LinkedIn
[2]
A CDO’s Guide to the Data Catalog, Alation
[3]
Certification as Solution to the Asymmetric Information Problem? Georg von Wangenheim, University of Kassel, 2019
[4]
From Big Data to Big Profits: Success with Data and Analytics, Russell Walker, 2015
DataBuck by FirstEigen: Serverless, Autonomous, In-Situ Data Validation
FirstEigen’s DataBuck is recognized by Gartner and IDC as the most
innovative data validation software for the Lake and the Cloud. By
leveraging AI/ML it’s >10x more effective in catching unexpected data
errors in comparison to traditional DQ tools. DataBuck’s out-of-the-
box 9-Point Data Quality checks needs no coding, minimal human
intervention, and is operationalized in just a few clicks. It increases the scalability of discovering and applying
essential data quality checks to 1,000’s of tables by auto-discovering relationships and patterns, auto updating the
rules, and continuously monitoring new incoming data. It reduces man-years of labor to days.
www.FirstEigen.com • contact@firsteigen.com
DataBuck – Benefits
People Productivity
Boost >80%
Reduction in Unexpected
Errors: 70%
Cost Reduction
>50%
Time Reduction to
Onboard Data Set ˜90%
Increase in Processing
Speed >10x
Cloud
Native

Contenu connexe

Similaire à FirstEigen-White-Paper_Autonomous-Data-Trust-Score-for-Data-Catalogs.pdf

Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Health Catalyst
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Data Quality
Data QualityData Quality
Data QualityVijaya K
 
A simplified approach for quality management in data warehouse
A simplified approach for quality management in data warehouseA simplified approach for quality management in data warehouse
A simplified approach for quality management in data warehouseIJDKP
 
Data Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxData Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxBalvinder Hira
 
From Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernanceFrom Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernancePrecisely
 
Database and Data Warehousing-Building Business Intelligence
Database and Data Warehousing-Building Business IntelligenceDatabase and Data Warehousing-Building Business Intelligence
Database and Data Warehousing-Building Business IntelligenceYeng Ferraris Portes
 
Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Leigh Ulpen
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...DATAVERSITY
 
how to successfully implement a data analytics solution.pdf
how to successfully implement a data analytics solution.pdfhow to successfully implement a data analytics solution.pdf
how to successfully implement a data analytics solution.pdfbasilmph
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEijcsa
 
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGMETA DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGIJCSEIT Journal
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernAmin Chowdhury
 
Hcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalyticsHcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalyticsHealth Care DataWorks
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsAbhishek Sood
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckBeth Fitzpatrick
 
Data quality management best practices
Data quality management best practicesData quality management best practices
Data quality management best practicesselinasimpson2201
 
Multidimensional Challenges and the Impact of Test Data Management
Multidimensional Challenges and the Impact of Test Data ManagementMultidimensional Challenges and the Impact of Test Data Management
Multidimensional Challenges and the Impact of Test Data ManagementCognizant
 

Similaire à FirstEigen-White-Paper_Autonomous-Data-Trust-Score-for-Data-Catalogs.pdf (20)

Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti...
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Data Quality
Data QualityData Quality
Data Quality
 
A simplified approach for quality management in data warehouse
A simplified approach for quality management in data warehouseA simplified approach for quality management in data warehouse
A simplified approach for quality management in data warehouse
 
Data Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxData Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptx
 
do_dq.pdf
do_dq.pdfdo_dq.pdf
do_dq.pdf
 
From Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernanceFrom Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data Governance
 
Database and Data Warehousing-Building Business Intelligence
Database and Data Warehousing-Building Business IntelligenceDatabase and Data Warehousing-Building Business Intelligence
Database and Data Warehousing-Building Business Intelligence
 
Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
 
how to successfully implement a data analytics solution.pdf
how to successfully implement a data analytics solution.pdfhow to successfully implement a data analytics solution.pdf
how to successfully implement a data analytics solution.pdf
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
 
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSINGMETA DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing Concern
 
Hcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalyticsHcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalytics
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data Analytics
 
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deckDC Salesforce1 Tour Data Governance Lunch Best Practices deck
DC Salesforce1 Tour Data Governance Lunch Best Practices deck
 
Data quality management best practices
Data quality management best practicesData quality management best practices
Data quality management best practices
 
Multidimensional Challenges and the Impact of Test Data Management
Multidimensional Challenges and the Impact of Test Data ManagementMultidimensional Challenges and the Impact of Test Data Management
Multidimensional Challenges and the Impact of Test Data Management
 

Plus de arifulislam946965

abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...arifulislam946965
 
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfarifulislam946965
 
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfBusiness Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfarifulislam946965
 
FirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfFirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfarifulislam946965
 
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfSnowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfarifulislam946965
 
13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdfarifulislam946965
 
What are the signs of pica eating disorder
What are the signs of pica eating disorderWhat are the signs of pica eating disorder
What are the signs of pica eating disorderarifulislam946965
 

Plus de arifulislam946965 (15)

abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
abkb-gives-ahrc-direction-on-screening-and-credibility-bow-river-employment-l...
 
Flowers to the world.pdf
Flowers to the world.pdfFlowers to the world.pdf
Flowers to the world.pdf
 
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdfERC-RP-Weekly-Slides-September-2022-Linked.pdf
ERC-RP-Weekly-Slides-September-2022-Linked.pdf
 
do_ingestion.pdf
do_ingestion.pdfdo_ingestion.pdf
do_ingestion.pdf
 
do_pipelines.pdf
do_pipelines.pdfdo_pipelines.pdf
do_pipelines.pdf
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
strategies.pdf
strategies.pdfstrategies.pdf
strategies.pdf
 
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdfBusiness Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
Business Case for leveraging Machine Learning (ML) to Validate Data Lake.pdf
 
FirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfFirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdf
 
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdfSnowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
Snowflake-Data-validation-Architecture-FirstEigen-White-Paper.pdf
 
13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf13-Essential-Data-Validation-Checks.pdf
13-Essential-Data-Validation-Checks.pdf
 
What are the signs of pica eating disorder
What are the signs of pica eating disorderWhat are the signs of pica eating disorder
What are the signs of pica eating disorder
 
바카라사이트
바카라사이트바카라사이트
바카라사이트
 
카지노사이트
카지노사이트카지노사이트
카지노사이트
 
우리카지노
우리카지노우리카지노
우리카지노
 

Dernier

PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPanhandleOilandGas
 
Structuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdfStructuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdflaloo_007
 
BeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdfBeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdfDerekIwanaka1
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...daisycvs
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAITim Wilson
 
Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024Marel
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizharallensay1
 
Falcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial WingsFalcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial WingsFalcon Invoice Discounting
 
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NSCROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NSpanmisemningshen123
 
Falcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon investment
 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...meghakumariji156
 
Falcon Invoice Discounting: Aviate Your Cash Flow Challenges
Falcon Invoice Discounting: Aviate Your Cash Flow ChallengesFalcon Invoice Discounting: Aviate Your Cash Flow Challenges
Falcon Invoice Discounting: Aviate Your Cash Flow Challengeshemanthkumar470700
 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannaBusinessPlans
 
Buy Verified TransferWise Accounts From Seosmmearth
Buy Verified TransferWise Accounts From SeosmmearthBuy Verified TransferWise Accounts From Seosmmearth
Buy Verified TransferWise Accounts From SeosmmearthBuy Verified Binance Account
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with CultureSeta Wicaksana
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...ssuserf63bd7
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon investment
 
Arti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfArti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfwill854175
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Centuryrwgiffor
 

Dernier (20)

PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation Final
 
Structuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdfStructuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdf
 
BeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdfBeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdf
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
 
Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
 
Falcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial WingsFalcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial Wings
 
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NSCROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
 
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
 
Falcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business Growth
 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
 
Falcon Invoice Discounting: Aviate Your Cash Flow Challenges
Falcon Invoice Discounting: Aviate Your Cash Flow ChallengesFalcon Invoice Discounting: Aviate Your Cash Flow Challenges
Falcon Invoice Discounting: Aviate Your Cash Flow Challenges
 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 Updated
 
Buy Verified TransferWise Accounts From Seosmmearth
Buy Verified TransferWise Accounts From SeosmmearthBuy Verified TransferWise Accounts From Seosmmearth
Buy Verified TransferWise Accounts From Seosmmearth
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
Arti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfArti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdf
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 

FirstEigen-White-Paper_Autonomous-Data-Trust-Score-for-Data-Catalogs.pdf

  • 2. Autonomously Assign Data Trust Score to Each Data Asset Present in the Data Catalog With the accelerating adoption of Data Catalogs as the core component for enterprise data governance, the need to provide information about the health and usability of the data assets has become critical. With the availability of standardized data trust scores within the data catalog, users can easily determine the usability and relevancy of the dataset in their particular use case. Why Should the Data Governance Team Care About Data Trust Score? When working with data, a key question that comes to mind is, “Is the data of the right quality for my purpose?” Data scientists working on advanced use cases need confidence in the data they use. Information about data quality, data lineage, and the accountable person, e.g., data steward or data owner, are three sorts of information that help gain trust in a dataset. With this information, suitable data sources are more likely to be used, whereas inappropriate data sources that are likely to produce erroneous results can be avoided. Data quality and trust are keys to making the most efficient use of data. The more accurate the data, the better the results. It is crucial to measure the accuracy of the data to guarantee that stakeholders and automated systems make decisions based on reliable information. A Data Trust Score is a single number that represents a currency of trust during any data usage or exchange process. Data Trust Score can serve as a solution to the information asymmetry problem that exists in modern catalogs. The consumer and the producer of data do not have the level of information related to the usefulness of the data. Data Trust score is generally computed on certain data quality dimensions applied to individual columns in the data set. The scores are then combined to calculate the total quality score for the entire data set. The combined score is an average of the scores for all columns. Data quality is a mandatory element for enabling large-scale and better use of data in any organization. By adding data quality information within the catalog, users can find insights about the data without ever needing to switch contexts. Data Catalog users often find the need for data quality information and other metadata of the data assets[1] . Although not a data quality tool, a data catalog can help enterprises in providing data quality information[2] . Most catalog provides vendor-neutral integrations with dedicated data quality platforms, thus allowing data catalog search functions to surface higher-quality data first. Data assets with high scores are promoted which are indicative of good data quality. Other metrics such as usage and company certifications can also guide catalog users to the highest quality data assets.
  • 3. How to Calculate Data Trust Score? Data Trust score can be computed based on the weighted average data quality score on data quality rules. Progressive organizations categorize the data quality rules along the data quality dimensions. A trust score is then calculated for each dimension and then rolled up using weighted average methods for the dataset. • Completeness: It determines the completeness of contextually important fields. • Conformity: Dataset should contain relevant data and follow certain rules or patterns. This data quality dimension determines conformity to a pattern, length, and format of contextually important fields. • Uniqueness: This dimension determines the uniqueness of the individual records. Primary keys are essential to distinguish between different data values and records to avoid duplication. • Consistency: It determines the consistency of intercolumn relationships (e.g. date of employment must be before the date of retirement). • Drift: It determines the drift of the key categorical and continuous fields from the historical information. • Anomaly: It can determine the volume and value anomaly of critical columns. Challenges With Incorporating Data Trust Score into the Data Catalog Chief Data Officers (CDOs) drive the data strategy for the entire organization. They are accountable for maintaining data quality. The volume of data assets is increasing rapidly, and data teams are struggling to keep up because existing data validation approaches are resource-intensive, time-consuming, costly, and not scalable for 1000s of data assets. Business stakeholders often find errors in their reports and dashboards repeatedly, which leads to mistrust of data. In general, the data team experiences the following operational challenges while integrating data quality into the data catalog: • Analyzing data and consulting the subject matter experts to determine what data quality rules need to be implemented is time-consuming. data quality analysts are unfamiliar with the data assets obtained from a third party, either in a public or private context. They need to engage with subject matter experts extensively to build data quality criteria • Implementation of the rules is specific to each data set/table. So, the effort is linearly proportional to the number of tables/buckets/folders in the Data Catalog. • Existing open-source tools/approaches come with limited audit trail capability. Generating an audit trail of the rule execution results for compliance requirements often takes time and effort from the data engineering team. • Maintaining the implemented rules is cumbersome.
  • 4. Why Leverage Machine Learning? ML models can learn from massive volumes of data and uncover hidden data patterns. Moreover, they train overtime on the new data as it changes. Using ML in calculating data trust scores has several advantages: • Machine Learning helps to objectively determine data patterns or data fingerprints and translate those patterns to data quality rules. • Machine Learning can then use the data fingerprints to detect transactions that do not adhere to the rules. • Implementing an ML approach can help to assess the data health quickly. • ML is usually more comprehensive and accurate than a human-driven data quality analysis. More importantly, when a trust score is assigned by the consumer of data, the score becomes susceptible to the same temptations to hide or even falsify information as the producers of the underlying data assets. Machine learning-based Data Trust score eliminates human biases and provides an objective score that both producer and consumer can agree on[3,4] . Autonomous Data Trust Score for Data Catalog – an Example In the example to the right, Machine Learning based was used to calculate the data trust score for each of the data assets registered in a data catalog. The result of the analysis is then presented using standardized data quality dimensions as shown below and embedded within the data catalog using standard APIs. The Data Trust Score displayed in the summary of the Data Quality analysis shows how the quality score changed between the last two analyses. Every violation discovered can be double-clicked for further information: • Expand the dimension to see which columns are affected at the data asset level. Click a column name to see the dimension details for that column. • At the column level, click the dimension name for further details. This provides the catalog users with enough information regarding the fitness/usability of the dataset in the context of their use case.
  • 5. References [1] K. Kashalikar, Integrating Data Quality with Data Catalog (2022), LinkedIn [2] A CDO’s Guide to the Data Catalog, Alation [3] Certification as Solution to the Asymmetric Information Problem? Georg von Wangenheim, University of Kassel, 2019 [4] From Big Data to Big Profits: Success with Data and Analytics, Russell Walker, 2015 DataBuck by FirstEigen: Serverless, Autonomous, In-Situ Data Validation FirstEigen’s DataBuck is recognized by Gartner and IDC as the most innovative data validation software for the Lake and the Cloud. By leveraging AI/ML it’s >10x more effective in catching unexpected data errors in comparison to traditional DQ tools. DataBuck’s out-of-the- box 9-Point Data Quality checks needs no coding, minimal human intervention, and is operationalized in just a few clicks. It increases the scalability of discovering and applying essential data quality checks to 1,000’s of tables by auto-discovering relationships and patterns, auto updating the rules, and continuously monitoring new incoming data. It reduces man-years of labor to days. www.FirstEigen.com • contact@firsteigen.com DataBuck – Benefits People Productivity Boost >80% Reduction in Unexpected Errors: 70% Cost Reduction >50% Time Reduction to Onboard Data Set ˜90% Increase in Processing Speed >10x Cloud Native