2. Data Governance
What is Data Governance
What is Data Quality
The challenges
Data governance programme
A publisher approach
The outcome: Book author example
ICEDIS
Summary
3. Data governance
“I think that the key issue here, is that the
information is probably incorrect, inaccurate and in a
form that almost certainly shouldn't have been used”
Dr John Thomson cardiologist at Leeds General Infirmary,
Sky News 30/3/2013
4. Data Governance – a definition
Data governance is defined as the
processes, policies, standards, organisation, and technologies
required to manage and ensure the
availability, accessibility, quality, consistency, auditability, and
security of data
5. Data Quality - definitions
Data are of high quality "if they are fit for their intended uses
in operations, decision making and planning"
Data are deemed of high quality if they correctly represent
the real-world construct to which they refer
6. Data Quality
Data quality attributes:
Accurate
Reliable
Complete
Appropriate
Timely
Credible
Up-to-date
7. The challenge: Data Sources
Multiple data sources – ‘system’ data silos
Multiple locations – ‘geographic’ data silos
Data entered through multiple channels
Data entered by different people
8. The challenge: Data Sources
Typical publisher systems:
Financial system
CRM/Sales database
Authentication system
Fulfilment
Usage statistics
Submissions system
Author database
…..
Data can be entered by:
Organisation staff
Authors
Society members
Agents in the supply chain
3rd party organisations
…..
9. The challenge: Institutions
UCL:
University College London (UK)
Université Catholique de Louvain (Belgium)
Universidad Cristiana Latinoamericana (Ecuador)
University College Lillebælt (Denmark)
Centro Universitario Celso Lisboa (Brazil)
Union County Library (USA)
NPL:
National Physical Laboratory (UK)
National Physical Laboratory (India)
York Uni.
University of York (UK)
York University (Canada)
Northeastern University:
Northeastern University (Boston, USA)
Northeastern University (Shenyang, China)
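The ambiguity above can be sketched in a few lines. This is only an illustration: the `inst-…` values are invented placeholders (not real Ringgold or ISNI identifiers) and the function names are hypothetical. It shows that a label typed by a user can map to several institutions, whereas an identifier maps to exactly one.

```python
from collections import defaultdict

# Hypothetical records: (placeholder id, label a user might type, full name).
# The ids are made up for illustration only.
RECORDS = [
    ("inst-001", "UCL", "University College London (UK)"),
    ("inst-002", "UCL", "Université Catholique de Louvain (Belgium)"),
    ("inst-003", "NPL", "National Physical Laboratory (UK)"),
    ("inst-004", "NPL", "National Physical Laboratory (India)"),
    ("inst-005", "York Uni.", "University of York (UK)"),
    ("inst-006", "York Uni.", "York University (Canada)"),
]


def candidates_by_label(records):
    """Group institution ids under the normalised label a user typed."""
    groups = defaultdict(list)
    for inst_id, label, _full_name in records:
        groups[label.strip().lower()].append(inst_id)
    return dict(groups)


def ambiguous_labels(records):
    """Return only the labels that map to more than one institution."""
    return {
        label: ids
        for label, ids in candidates_by_label(records).items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    for label, ids in ambiguous_labels(RECORDS).items():
        print(f"{label!r} could mean any of {ids}")
```

Matching on the label alone cannot choose between the candidates; storing the identifier alongside the name removes the need to choose.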
10. The challenge: Individuals
How can we uniquely identify individuals? Of the 700,000
individuals known to the RSC in 2012 there were:
Smith: ~1,500
Jones: ~1,000
Li: >10,000
12. Biggest obstacle(s) to data quality improvement in your organization?
Lack of accountability and responsibility for data quality 55.4%
Too many information silos 51.8%
Lack of awareness or communication of the magnitude of data quality problems 51.4%
Lack of common understanding of what data quality means 50.2%
Lack of awareness or communication of the opportunities associated with high quality data 45.0%
Lack of senior leadership in tackling data quality issues 44.2%
Lack of data quality policies, plans, and procedures 42.2%
Perception that data quality is an IT issue only rather than an organisation wide issue 41.8%
Source: The State of Information and Data Quality 2012 Industry Survey & Report: Understanding how Organizations Manage the Quality of their Information and Data Assets. Pierce, Yonke, Malik, Nagaraj (IAIDQ)
13. Data Governance – why it is vital
“processes, policies, standards… ensure quality and consistency”
Increase consistency and confidence in our decision making
Maximise the income generation potential of our data
Provide excellent customer service
Designate accountability for information quality
Minimise or eliminate re-work
Optimise staff effectiveness
Decrease the risk of regulatory fines
Improve data security
Data is one of the most valuable assets within an organisation
16. Plan & prioritise
Sponsorship: director level sponsor?
Program management: business or IT driven?
Organisational structure: local, national, international?
Scope: focus on the most important data?
Ownership: who are the business owners of critical data?
New system implementation: protect investment
17. Plan & prioritise
Resources: dedicated staff?
Funding: which area of the business will fund the program?
Business drivers: what are the major business drivers?
Barriers: what are the main barriers (cultural, funding, resources, priorities etc.) and can they be mitigated?
18. Audit & Analyse
Audit existing data quality
Review all relevant systems
How poor is it?
Incomplete data
Invalid
Out of date
….
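A minimal sketch of such an audit, assuming records are plain dictionaries. The field names (`name`, `email`, `institution_id`, `last_updated`) and the two-year staleness threshold are assumptions made for illustration, not part of any particular system.

```python
from datetime import date

# Hypothetical required fields; adjust to the system being audited.
REQUIRED_FIELDS = ("name", "email", "institution_id")


def audit_record(record, today, max_age_days=730):
    """Return the list of problems found in a single record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"incomplete: {field}")
    email = record.get("email") or ""
    if email and "@" not in email:
        problems.append("invalid: email")
    last_updated = record.get("last_updated")
    if last_updated is not None and (today - last_updated).days > max_age_days:
        problems.append("out of date")
    return problems


def audit(records, today):
    """Count how many records show each kind of problem."""
    summary = {}
    for record in records:
        for problem in audit_record(record, today):
            summary[problem] = summary.get(problem, 0) + 1
    return summary


if __name__ == "__main__":
    sample = [
        {"name": "A", "email": "a@example.org", "institution_id": "x",
         "last_updated": date(2013, 1, 1)},
        {"name": "B", "email": "bad", "institution_id": "",
         "last_updated": date(2005, 1, 1)},
    ]
    print(audit(sample, today=date(2013, 6, 1)))
```

Running the same counts against every relevant system gives a baseline for "how poor is it?" and a way to show progress once cleaning starts.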
19. Clean existing data
Prioritise
Quick wins
Highlight progress
What can be automated?
Introduce unique identifiers
20. Identifiers available
People:
International Standard Name Identifier (ISNI)
Open Researcher and Contributor ID (ORCID)
Scopus Author Identifier
ResearcherID
Organisations:
International Standard Name Identifier (ISNI)
Ringgold ID
DUNS Number (D&B) and other business and finance IDs
MDR PID Numbers and other marketing IDs
Library of Congress MARC Code List for Organizations
21. ISNI
ISNI is designed to be a “bridge identifier”
[Diagram: an ISNI Number links Party ID 1 and Party ID 2; each party holds its own proprietary information and/or metadata]
22. Author IDs
ORCID is designed to persistently identify and disambiguate
scholarly researchers and attach them to research output
ORCID identifiers utilize a format compliant with the ISNI ISO
standard
ISNI has reserved a block of identifiers for use by ORCID, so
there will be no overlaps in assignments
Recorded as http://orcid.org/0000-0001-2345-6789
http://about.orcid.org/
http://www.isni.org/
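Because ORCID iDs follow the ISNI format, the final character is a check character computed with ISO 7064 MOD 11-2, which lets a system reject mistyped identifiers at the point of capture. The sketch below shows that calculation; the function names are illustrative and not part of any ORCID library.

```python
import re

# 15 digits followed by a check character (a digit or X).
_ORCID_RE = re.compile(r"[0-9]{15}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the structure and checksum of an ORCID iD.

    Accepts the bare form (0000-0001-2345-6789) or the URL form.
    """
    bare = orcid.strip().rsplit("/", 1)[-1].replace("-", "").upper()
    if not _ORCID_RE.fullmatch(bare):
        return False
    return orcid_check_character(bare[:15]) == bare[15]


if __name__ == "__main__":
    print(is_valid_orcid("http://orcid.org/0000-0001-2345-6789"))
    print(is_valid_orcid("0000-0001-2345-6788"))
```

A checksum only catches typing errors; it does not confirm that the identifier has been registered or that it belongs to the person who entered it.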
23. Use cases
Disambiguation of researchers and connection to all their research
Links to contributors, editors, compilers and others involved in the research process
Embed IDs into research workflows and the supply chain
Integrate systems
Integrate systems
24. Institutional IDs
Ringgold is an ISNI Registration Agency
Unique institutional ID number maps data across systems
ISNI numbers should be used across the scholarly supply
chain to:
Disambiguate institutional records
Eradicate duplication of data
Map institutions into their hierarchy
Link systems using the institutional ID as the lynchpin
25. Minimising the impact of data silos
Standard identifiers (both individual and institution) can be
used to break down silos by enabling better system linking
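A minimal sketch of this kind of linking, assuming two hypothetical exports (one from a CRM, one from a usage-statistics system) that both carry an `institution_id` field. The field names and function name are illustrative only.

```python
def link_by_identifier(crm_rows, usage_rows, key="institution_id"):
    """Join rows from two systems on a shared identifier.

    Returns (linked, unlinked): CRM rows with their matching usage rows
    attached, and CRM rows for which no usage row shares the identifier.
    """
    usage_by_id = {}
    for row in usage_rows:
        identifier = row.get(key)
        if identifier:
            usage_by_id.setdefault(identifier, []).append(row)

    linked, unlinked = [], []
    for row in crm_rows:
        matches = usage_by_id.get(row.get(key))
        if matches:
            linked.append({**row, "usage": matches})
        else:
            unlinked.append(row)
    return linked, unlinked


if __name__ == "__main__":
    crm = [
        {"institution_id": "inst-001", "account": "York (UK)"},
        {"institution_id": None, "account": "York Uni."},
    ]
    usage = [{"institution_id": "inst-001", "downloads": 120}]
    linked, unlinked = link_by_identifier(crm, usage)
    print(len(linked), "linked;", len(unlinked), "unlinked")
```

The unlinked rows are the useful output for a clean-up exercise: they are the records that still lack an identifier.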
26. Improve data capture
Data quality policy
Web forms
Closer collaboration with 3rd parties to encourage use of
industry standard identifiers such as ISNI or ORCID
27. Data capture - data quality policy
Design to ensure accuracy, quality and consistency
Individual responsibilities:
All staff are responsible for the accuracy and consistency of data
Capture data in such a way that it is uniquely identifiable and easily
shared within the organisation and with 3rd parties
Records relating to individuals
Records relating to institutions
Reporting of inaccuracies to Data Owners
Data owners' responsibilities:
All source data systems must have a designated Data Owner
Data owner retains overall responsibility for all records within their
source data system
28. Improve data capture – web forms
Required fields
Validation
Address validation – postcode lookup
Institution validation – institution lookup
‘Internal’ and ‘external’ web form consistency
Language barriers
Help and hints
Free-text fields
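The required-field, validation and institution-lookup points above could be enforced on the server side along these lines. This is a sketch under stated assumptions: the `INSTITUTIONS` table stands in for a real institution lookup, its ids are placeholders, and the email pattern is deliberately loose.

```python
import re

# Hypothetical lookup table standing in for an institution lookup service.
INSTITUTIONS = {
    "inst-001": "University of York (UK)",
    "inst-002": "York University (Canada)",
}
REQUIRED_FIELDS = ("name", "email", "institution_id")
# Loose pattern: catches obvious mistakes, not every invalid address.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_form(form):
    """Return a dict mapping field name to an error message (empty if valid)."""
    errors = {}
    for field in REQUIRED_FIELDS:
        if not (form.get(field) or "").strip():
            errors[field] = "required"
    email = (form.get("email") or "").strip()
    if email and not EMAIL_RE.fullmatch(email):
        errors["email"] = "not a valid email address"
    institution = (form.get("institution_id") or "").strip()
    if institution and institution not in INSTITUTIONS:
        errors["institution_id"] = "choose an institution from the lookup"
    return errors


if __name__ == "__main__":
    print(validate_form({"name": "A. Author", "email": "a@example.org",
                         "institution_id": "inst-001"}))
    print(validate_form({"name": "", "email": "a@@x",
                         "institution_id": "York"}))
```

Using the same function behind both 'internal' and 'external' forms is one way to keep the two consistent.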
30. A publisher example
Develop a Data Governance Programme
Data ‘champion’
Engagement – at all levels
Ownership – at all levels
Allocate necessary resources
Guidelines/Policy - Data quality policy
Processes put in place
Education - raise awareness
New staff – training on Data Governance and its wider impact
Change of culture
31. A publisher example
Ringgold and DataSalon client
All institutional records contain Ringgold Identifiers
System linking via Individual and Institutional identifiers
Data (both good and bad) visible to all via MasterVision
Use of data governance dashboards
Tidying of existing data
Simple reporting of incorrect data across organisation
New data captured correctly
32. Author database
1. Create a data governance dashboard to
monitor problem areas:
• Book authors with no related institution
• Unknown book authors
• Author records without an affiliation entry
• Author records with commas in the
affiliation entry
• Book authors without an email address
• Book authors with an invalid email address
2. Correct problem records in existing data
• Dashboard clearly highlighted all records of
concern and these records were corrected
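The dashboard checks listed above can be expressed as simple predicates over author records. The sketch below is not the MasterVision implementation; it assumes records are dictionaries with hypothetical `id`, `institution_id`, `affiliation` and `email` fields, and it omits the "unknown book authors" check because the slide does not define it.

```python
import re

# Loose pattern: flags obviously malformed addresses only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _text(record, field):
    """Return a field as stripped text, treating missing values as empty."""
    return (record.get(field) or "").strip()


def _invalid_email(record):
    email = _text(record, "email")
    return bool(email) and not EMAIL_RE.fullmatch(email)


# Check name -> predicate that returns True for a problem record.
CHECKS = {
    "Book authors with no related institution":
        lambda r: not _text(r, "institution_id"),
    "Author records without an affiliation entry":
        lambda r: not _text(r, "affiliation"),
    "Author records with commas in the affiliation entry":
        lambda r: "," in _text(r, "affiliation"),
    "Book authors without an email address":
        lambda r: not _text(r, "email"),
    "Book authors with an invalid email address":
        _invalid_email,
}


def dashboard(authors):
    """Return, for each check, the ids of the records that fail it."""
    return {
        name: [a["id"] for a in authors if is_problem(a)]
        for name, is_problem in CHECKS.items()
    }


if __name__ == "__main__":
    authors = [
        {"id": 1, "institution_id": "inst-001",
         "affiliation": "University of York", "email": "a@example.org"},
        {"id": 2, "institution_id": None,
         "affiliation": "Dept of Chemistry, York Uni.", "email": "not-an-email"},
        {"id": 3, "institution_id": None, "affiliation": "", "email": ""},
    ]
    for name, ids in dashboard(authors).items():
        print(f"{name}: {len(ids)} record(s) {ids}")
```

Listing record ids rather than only counts matches the way the dashboard was used here: the records of concern were highlighted and then corrected.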
33. Author database
3. Ensure new records are created correctly
• Raise staff understanding of the importance of capturing data correctly and
the impact it has across the organisation as a whole (data silos)
• Training covering data governance
4. Ensure appropriate Ringgold coverage
• Where institutions were discovered in the Author database that didn’t exist
within Identify these were reported to Ringgold. This not only means that
individual authors can be linked to the new institution but that any
individuals in other data sources at the same institution can be linked. This
benefits all users of our data and potentially highlights new sales
opportunities.
5. Monitor data quality on an on-going basis
• Books data governance dashboard updated on a weekly basis.
34. Author database – results
[Chart: percentage of authors linked to institutions, axis from 70% to 100%; labels “All data sources” and “ANKO”]
10% will never link:
• Missing data (old records)
• Institution no longer exists
• Retired author
• Genuinely no related institution
End of process:
• 15% increase in authors linked to institutions - information valuable in supporting all areas of the business
• Ready for data migration
35. ICEDIS
The international standards organization EDItEUR is working to
encourage improvements in the ways that "party" information is
communicated
Some parts of the supply chain continue to send unstructured name &
address records, making matching, disambiguation and automatic ingest
near impossible
ICEDIS has collaborated with EDItEUR to develop a highly structured
data model for exchanging names, addresses and standard identifiers.
The group has recently been validating the model by means of a "paper
pilot", using a small library of about 100 name & address types
An XML schema and HTML documentation are freely available
www.editeur.org
www.editeur.org/138/Structured-Name-and-Address-Model
info@editeur.org
36. Summary
Your data is a very valuable asset when managed correctly
Establishing a data governance programme will enable you to
gain maximum benefit from that data
Data governance is as much about changing the culture of an
organisation as it is about processes and procedures
It will take time but the benefits can be enormous