SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
Automatic creation of mappings between
classification systems for bibliographic data
Prof. Magnus Pfeffer
Stuttgart Media University
pfeffer@hdm-stuttgart.de
Agenda


Motivation



Instance-based matching



Application to bibliographic data



Evaluation



Ongoing projects



RDF Representation

November 26th, 2013

Semantic Web in Libraries, Hamburg

2
Motivation

November 26th, 2013

Semantic Web in Libraries, Hamburg

3
Current situation in Germany


Five regional library unions


Subject headings




Classification systems







Predominantly RSWK („Regeln für den Schlagwortkatalog“ „Rules for the subject catalogue“) using a shared authority file
RVK (Regensburg Union Classification)
BK (Basic Classification)
DDC (Dewey Decimal Classification)
Various local classification systems

Low proportion of indexed titles (25-30%)

November 26th, 2013

Semantic Web in Libraries, Hamburg

4
Current situation in Germany


National library


Subject headings




Classification systems






Predominantly RSWK („Regeln für den Schlagwortkatalog“ „Rules for the subject catalogue“) using a shared authority file
DDC (Dewey Decimal Classification)
Coarse categories

DDC only for titles published since 2007
Only „Reihe A“ (print trade publications) is fully indexed
with RSWK

November 26th, 2013

Semantic Web in Libraries, Hamburg

5
Austrian National Library


Subject headings




Predominantly RSWK („Regeln für den
Schlagwortkatalog“ - „Rules for the subject catalogue“)
using a shared authority file

Classification systems


BK since 2007



RVK in the Austrian librariy union catalogue

November 26th, 2013

Semantic Web in Libraries, Hamburg

6
Goals


Re-use existing indexing information


National level






BK is used mainly in northern Germany / Austria
RVK mainly in southern Germany
DDC mainly by the National Library

International level



Make RVK data more accessible to DDC users
Use DDC indexing information available from e.g. the Library
of Congress

November 26th, 2013

Semantic Web in Libraries, Hamburg

7
Ideas


Use of appropriate classification systems


Facetted search in resource discovery systems





Browsing of similar titles




Should be monohierarchical
Should have limited number of classes
→ DDC (first digits) or BK

Should be fine-grained
→ DDC (full) or RVK

(Multi-lingual retrieval)

November 26th, 2013

Semantic Web in Libraries, Hamburg

8
Ideas


Enable the use of existing tools and visualisations

Denton (2012)
November 26th, 2013

Legrady (2005)
Semantic Web in Libraries, Hamburg

9
Instance-based Matching

November 26th, 2013

Semantic Web in Libraries, Hamburg

10
Ontology matching


Well-studied problem in computer science



Several approaches


Based on the descriptors



Based on the structure



Based on the manifestations (instances)

November 26th, 2013

Semantic Web in Libraries, Hamburg

11
Instances


Entries in catalogues with multiple classifications

November 26th, 2013

Semantic Web in Libraries, Hamburg

12
Instance-based matching


Assumptions





Classes with semantic overlap co-occur in instances
The more often these classes co-occur, the stronger
the overlap

Preparation


Extraction of all pairs of classifications from the data



Count of the extracted pairs

November 26th, 2013

Semantic Web in Libraries, Hamburg

13
Example

November 26th, 2013

Semantic Web in Libraries, Hamburg

14
Example


Entry 1



Pairs





179.9 / CC 7200



RVK: CC 7200



179.9 / CC 7250





DDC: 179.9
RVK: CC 7250



179.9 / CC 7200

Entry 2


DDC: 179.9



RVK: CC 7200

November 26th, 2013

Semantic Web in Libraries, Hamburg

15
Normalisation


Comparing solely absolute numbers is bad





Some classes are more often used than others
Number of pairs correlates with the number of entries
that are classified using a given class

Instead:
Use proportion of co-occurrence ↔ occurrence
∣E c1∩ E c2∣
∣E c1∪ E c2∣
number of entries with both classifications
divided by
number of entries with either classification
(Jaccard measure for overlap of sets)
November 26th, 2013

Semantic Web in Libraries, Hamburg

16
Further interpretation


a and b are two classes from two classification
systems A and B


The classes a and b only occur together
→ exact match



a only co-occurs with b, but b co-occurs with other
classes from A
→ a is narrower concept than b



a co-occurs with several classes from B (including b)
→ a is wider concept than b



a and b do not co-occur
→ cannot infer that a and b are unrelated

November 26th, 2013

Semantic Web in Libraries, Hamburg

17
Prior work


Pfeffer (2009)


Analysis of classification system structure and actual
use







Locating classes that describe the same concept
Finding ways to improve existing mappings to RVK
Focus on RVK, using data from library union catalogues
Co-occurrence analysis

Results






High co-occurrence and close in the hierarchy:
→ classes are hard to assign properly
High co-occurrence and far in the hierarchy:
→ classes describe identical concepts
Mappings from RSWK to RVK could be augmented

November 26th, 2013

Semantic Web in Libraries, Hamburg

18
Related work


Isaac et.al. (2007)


Applied instance based matching to bibliographic data



Data from the National Library of the Netherlands



Mapping from a thesaurus to a classification system



Results



Generated mappings are quite good
More sophisticated measures than Jaccard do not lead to
better mappings

November 26th, 2013

Semantic Web in Libraries, Hamburg

19
Application to bibliographic data

November 26th, 2013

Semantic Web in Libraries, Hamburg

20
Bibliographic data is different


Multiple editions



Multiple document types

November 26th, 2013

Semantic Web in Libraries, Hamburg

21
Bibliographic data


Skewed data





Multiple editions → More pairs
Some co-occurrences could appear stronger than others

Solution: Pre-clustering individual titles on the „work“
level


Increases chance for instances with more than one
classifications



Each cluster contributes only once



Allows using absolute co-occurrence numbers



Cut-off for small numbers
Ranking of competing matches

November 26th, 2013

Semantic Web in Libraries, Hamburg

22
Prior work


Pfeffer (2013)


Matching bibliographic records


Based on author, title and uniform title




(as well as information on title changes)

Matches any edition and revision of a work


Including translations



Merge match sets → Discrete clusters



Consolidating indexing information




For indexing purposes, the differences between editions and
revisions are irrelevant
Subject headings and classifications are shared between all
members of a cluster

November 26th, 2013

Semantic Web in Libraries, Hamburg

23
Evaluation

November 26th, 2013

Semantic Web in Libraries, Hamburg

24
Comparison with existing mappings


Existing (partial) mappings can be used as a basis for
evaluation
→ „Gold standard“



Comparison of automatic and manual mapping





Recall: Are all the mappings found?
Precision: Are all found mappings correct?

Analysis of additional links


Maybe the gold standard can be improved?

November 26th, 2013

Semantic Web in Libraries, Hamburg

25
Ongoing projects

November 26th, 2013

Semantic Web in Libraries, Hamburg

26
Data


Bibliographic data



German National Library catalogue



Austrian National Library catalogue





German library union catalogues

British national bibliography

Gold standards


Partial mappings BK ↔ RVK

November 26th, 2013

Semantic Web in Libraries, Hamburg

27
Interesting Mappings


RVK → BK



BK well suited for faceted retrieval





Gold standard exists
RVK has largest proportion of classified titles

RVK ↔ DDC




Enable data sharing between the German National Library
and the RVK-using libraries

Not limited to classification systems


See Pfeffer (2009) and Wang et.al. (2009)

November 26th, 2013

Semantic Web in Libraries, Hamburg

28
Implementation: Tasks


Import and mapping of MAB2 and MARC data



Clustering



Matching and clustering





Generation of keys for the match process
Consolidation of indexing and classification information

Statistics





Co-occurrence counts
Jaccard measure

Output


Full mappings

November 26th, 2013

Semantic Web in Libraries, Hamburg

29
Implementation: State


All steps implemented as a prototype





Perl scripts
File-based data and indexes

Current development


Still Perl scripts (but better documented)



All data is accumulated in a document store




MongoDB

Further plan: Porting to MetaFacture framework
November 26th, 2013

Semantic Web in Libraries, Hamburg

30
RDF representation

November 26th, 2013

Semantic Web in Libraries, Hamburg

31
Classification systems as Linked Data


DDC has been published as Linked Data



RVK has not been published as Linked Data





There is no versioning and no stable identifiers
A project to fix this and to publish RVK as Linked Data has
been cancelled by the university library of Regensburg

BK has not been published as Linked Data


There is authority data in the GVK union catalogue

→ One would have to create temporary URIs for the
RVK and BK classes

November 26th, 2013

Semantic Web in Libraries, Hamburg

32
Direct Mappings


SKOS offers


skos:mappingRelation



skos:closeMatch



skos:exactMatch



skos:broadMatch



skos:narrowMatch



skos:relatedMatch

November 26th, 2013

Semantic Web in Libraries, Hamburg

33
Further Mappings


1:n-Relationships





List of classes that are all narrow matches
Or: A combination of classes is a (near) exact match

Qualified mappings


Express the confidence of the proposed match



Allow applications to optimize for precision or recall

November 26th, 2013

Semantic Web in Libraries, Hamburg

34
Qualification through indirection


RDF relations cannot be qualified
http://dewey.info/class/641/

November 26th, 2013

skos:exactMatch

http://ex.org/rvk/BF_8150

Semantic Web in Libraries, Hamburg

35
Qualification through indirection


So an intermediate node is used
http://dewey.info/class/641/

skos:exactMatch

ex:qualifiedMatch

http://ex.org/rvk/BF_8150

ex:targetConcept
_1

ex:confidence

Further information

“1.0”

November 26th, 2013

Semantic Web in Libraries, Hamburg

36
Summary


Mappings between classification systems are an
important means for interoperability and sharing of
classification information and tools



Simple statistics on the existing data in catalogues can
generate candidates for matches between individual
classes



Where manually created mappings exist, they can be
used to evaluate the algorithmic results



Mappings can be expressed in SKOS terms




To qualify the mappings further, intermediate nodes
need to be introduced
There is no standard for this yet

November 26th, 2013

Semantic Web in Libraries, Hamburg

37
Thank you for listening.

Slides available online
http://www.slideshare.net/MagnusPfeffer/
This work is licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.

November 26th, 2013

Semantic Web in Libraries, Hamburg

38
References


Denton, W. (2012). On dentographs, a new method of visualizing library collections.
In: Code4Lib.



Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007).
An empirical study of instance-based ontology matching. In: The Semantic Web
(pp. 253-266). Springer Berlin Heidelberg.



Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow
Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.



Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden.
Presentation given at the Librarian Workshop of the 33rd Annual Conference of the
German Classification Society on Data Analysis, Machine Learning, and
Applications (GfKl).



Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with
indexing information. In: Data Analysis, Machine Learning and Knowledge
Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für
Klassifikation e. V.. Springer Berlin Heidelberg.



Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009).
Matching multi-lingual subject vocabularies. In: Research and Advanced
Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg.
November 26th, 2013

Semantic Web in Libraries, Hamburg

39

Contenu connexe

Tendances

Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
తేజ దండిభట్ల
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic Data
Felix Ostrowski
 

Tendances (20)

Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
 
Linked data-tooling-xml
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xml
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 
Ivan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3CIvan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3C
 
Data 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDFData 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDF
 
Linking library data
Linking library dataLinking library data
Linking library data
 
Linked data tooling XML
Linked data tooling XMLLinked data tooling XML
Linked data tooling XML
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Establishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBEstablishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNB
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.orgEC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
 
Linked data as a library data platform
Linked data as a library data platformLinked data as a library data platform
Linked data as a library data platform
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked Data
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic Data
 

En vedette

Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMetadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Magnus Pfeffer
 
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzenAusleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Magnus Pfeffer
 

En vedette (17)

Jetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehörtJetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehört
 
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMetadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
 
Bibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data ExtendedBibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data Extended
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open Data
 
Cloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von MetadatenCloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von Metadaten
 
Automatisches Generieren von Konkordanzen
Automatisches Generieren von KonkordanzenAutomatisches Generieren von Konkordanzen
Automatisches Generieren von Konkordanzen
 
Clustering auf Werksebene
Clustering auf WerksebeneClustering auf Werksebene
Clustering auf Werksebene
 
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
 
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...
 
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
 
Linked Data in der Lehre
Linked Data in der LehreLinked Data in der Lehre
Linked Data in der Lehre
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open Data
 
Resource Discovery: Herausforderung und Chance für die Sacherschließung
Resource Discovery:  Herausforderung und Chance für die SacherschließungResource Discovery:  Herausforderung und Chance für die Sacherschließung
Resource Discovery: Herausforderung und Chance für die Sacherschließung
 
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzenAusleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
 
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
 
Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?
 
Open Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von MetadatenmanagementOpen Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von Metadatenmanagement
 

Similaire à Automatic creation of mappings between classification systems for bibliographic data

Rdf and open linked data a first approach
Rdf and open linked data a first approach Rdf and open linked data a first approach
Rdf and open linked data a first approach
@CULT Srl
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madrid
Stefan Gradmann
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
Kalpit Desai
 

Similaire à Automatic creation of mappings between classification systems for bibliographic data (20)

Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
 
Rdf and open linked data a first approach
Rdf and open linked data a first approach Rdf and open linked data a first approach
Rdf and open linked data a first approach
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
gayathrinosql.pptx
gayathrinosql.pptxgayathrinosql.pptx
gayathrinosql.pptx
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
20130527 library linkeddata
20130527 library linkeddata20130527 library linkeddata
20130527 library linkeddata
 
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهمعرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
 
Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...Scalable Cross-lingual Document Similarity through Language-specific Concept ...
Scalable Cross-lingual Document Similarity through Language-specific Concept ...
 
guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDB
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madrid
 
20181106 survey on challenges of question answering in the semantic web saltlux
20181106 survey on challenges of question answering in the semantic web saltlux20181106 survey on challenges of question answering in the semantic web saltlux
20181106 survey on challenges of question answering in the semantic web saltlux
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1
 

Dernier

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Dernier (20)

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

Automatic creation of mappings between classification systems for bibliographic data

  • 1. Automatic creation of mappings between classification systems for bibliographic data Prof. Magnus Pfeffer Stuttgart Media University pfeffer@hdm-stuttgart.de
  • 2. Agenda  Motivation  Instance-based matching  Application to bibliographic data  Evaluation  Ongoing projects  RDF Representation November 26th, 2013 Semantic Web in Libraries, Hamburg 2
  • 3. Motivation November 26th, 2013 Semantic Web in Libraries, Hamburg 3
  • 4. Current situation in Germany  Five regional library unions  Subject headings   Classification systems      Predominantly RSWK („Regeln für den Schlagwortkatalog“ „Rules for the subject catalogue“) using a shared authority file RVK (Regensburg Union Classification) BK (Basic Classification) DDC (Dewey Decimal Classification) Various local classification systems Low proportion of indexed titles (25-30%) November 26th, 2013 Semantic Web in Libraries, Hamburg 4
  • 5. Current situation in Germany  National library  Subject headings   Classification systems     Predominantly RSWK („Regeln für den Schlagwortkatalog“ „Rules for the subject catalogue“) using a shared authority file DDC (Dewey Decimal Classification) Coarse categories DDC only for titles published since 2007 Only „Reihe A“ (print trade publications) is fully indexed with RSWK November 26th, 2013 Semantic Web in Libraries, Hamburg 5
  • 6. Austrian National Library  Subject headings   Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file Classification systems  BK since 2007  RVK in the Austrian librariy union catalogue November 26th, 2013 Semantic Web in Libraries, Hamburg 6
  • 7. Goals  Re-use existing indexing information  National level     BK is used mainly in northern Germany / Austria RVK mainly in southern Germany DDC mainly by the National Library International level   Make RVK data more accessible to DDC users Use DDC indexing information available from e.g. the Library of Congress November 26th, 2013 Semantic Web in Libraries, Hamburg 7
  • 8. Ideas  Use of appropriate classification systems  Facetted search in resource discovery systems    Browsing of similar titles   Should be monohierarchical Should have limited number of classes → DDC (first digits) or BK Should be fine-grained → DDC (full) or RVK (Multi-lingual retrieval) November 26th, 2013 Semantic Web in Libraries, Hamburg 8
  • 9. Ideas  Enable the use of existing tools and visualisations Denton (2012) November 26th, 2013 Legrady (2005) Semantic Web in Libraries, Hamburg 9
  • 10. Instance-based Matching November 26th, 2013 Semantic Web in Libraries, Hamburg 10
  • 11. Ontology matching  Well-studied problem in computer science  Several approaches  Based on the descriptors  Based on the structure  Based on the manifestations (instances) November 26th, 2013 Semantic Web in Libraries, Hamburg 11
  • 12. Instances  Entries in catalogues with multiple classifications November 26th, 2013 Semantic Web in Libraries, Hamburg 12
  • 13. Instance-based matching  Assumptions    Classes with semantic overlap co-occur in instances The more often these classes co-occur, the stronger the overlap Preparation  Extraction of all pairs of classifications from the data  Count of the extracted pairs November 26th, 2013 Semantic Web in Libraries, Hamburg 13
  • 14. Example November 26th, 2013 Semantic Web in Libraries, Hamburg 14
  • 15. Example  Entry 1  Pairs   179.9 / CC 7200  RVK: CC 7200  179.9 / CC 7250   DDC: 179.9 RVK: CC 7250  179.9 / CC 7200 Entry 2  DDC: 179.9  RVK: CC 7200 November 26th, 2013 Semantic Web in Libraries, Hamburg 15
  • 16. Normalisation  Comparing solely absolute numbers is bad    Some classes are more often used than others Number of pairs correlates with the number of entries that are classified using a given class Instead: Use proportion of co-occurrence ↔ occurrence ∣E c1∩ E c2∣ ∣E c1∪ E c2∣ number of entries with both classifications divided by number of entries with either classification (Jaccard measure for overlap of sets) November 26th, 2013 Semantic Web in Libraries, Hamburg 16
  • 17. Further interpretation  a and b are two classes from two classification systems A and B  The classes a and b only occur together → exact match  a only co-occurs with b, but b co-occurs with other classes from A → a is narrower concept than b  a co-occurs with several classes from B (including b) → a is wider concept than b  a and b do not co-occur → cannot infer that a and b are unrelated November 26th, 2013 Semantic Web in Libraries, Hamburg 17
  • 18. Prior work  Pfeffer (2009)  Analysis of classification system structure and actual use      Locating classes that describe the same concept Finding ways to improve existing mappings to RVK Focus on RVK, using data from library union catalogues Co-occurrence analysis Results    High co-occurrence and close in the hierarchy: → classes are hard to assign properly High co-occurrence and far in the hierarchy: → classes describe identical concepts Mappings from RSWK to RVK could be augmented November 26th, 2013 Semantic Web in Libraries, Hamburg 18
  • 19. Related work  Isaac et.al. (2007)  Applied instance based matching to bibliographic data  Data from the National Library of the Netherlands  Mapping from a thesaurus to a classification system  Results   Generated mappings are quite good More sophisticated measures than Jaccard do not lead to better mappings November 26th, 2013 Semantic Web in Libraries, Hamburg 19
  • 20. Application to bibliographic data November 26th, 2013 Semantic Web in Libraries, Hamburg 20
  • 21. Bibliographic data is different  Multiple editions  Multiple document types November 26th, 2013 Semantic Web in Libraries, Hamburg 21
  • 22. Bibliographic data  Skewed data    Multiple editions → More pairs Some co-occurrences could appear stronger than others Solution: Pre-clustering individual titles on the „work“ level  Increases chance for instances with more than one classifications  Each cluster contributes only once  Allows using absolute co-occurrence numbers   Cut-off for small numbers Ranking of competing matches November 26th, 2013 Semantic Web in Libraries, Hamburg 22
  • 23. Prior work  Pfeffer (2013)  Matching bibliographic records  Based on author, title and uniform title   (as well as information on title changes) Matches any edition and revision of a work  Including translations  Merge match sets → Discrete clusters  Consolidating indexing information   For indexing purposes, the differences between editions and revisions are irrelevant Subject headings and classifications are shared between all members of a cluster November 26th, 2013 Semantic Web in Libraries, Hamburg 23
  • 24. Evaluation November 26th, 2013 Semantic Web in Libraries, Hamburg 24
  • 25. Comparison with existing mappings  Existing (partial) mappings can be used as a basis for evaluation → „Gold standard“  Comparison of automatic and manual mapping    Recall: Are all the mappings found? Precision: Are all found mappings correct? Analysis of additional links  Maybe the gold standard can be improved? November 26th, 2013 Semantic Web in Libraries, Hamburg 25
  • 26. Ongoing projects November 26th, 2013 Semantic Web in Libraries, Hamburg 26
  • 27. Data  Bibliographic data   German National Library catalogue  Austrian National Library catalogue   German library union catalogues British national bibliography Gold standards  Partial mappings BK ↔ RVK November 26th, 2013 Semantic Web in Libraries, Hamburg 27
  • 28. Interesting Mappings  RVK → BK   BK well suited for faceted retrieval   Gold standard exists RVK has largest proportion of classified titles RVK ↔ DDC   Enable data sharing between the German National Library and the RVK-using libraries Not limited to classification systems  See Pfeffer (2009) and Wang et.al. (2009) November 26th, 2013 Semantic Web in Libraries, Hamburg 28
  • 29. Implementation: Tasks  Import and mapping of MAB2 and MARC data  Clustering   Matching and clustering   Generation of keys for the match process Consolidation of indexing and classification information Statistics    Co-occurrence counts Jaccard measure Output  Full mappings November 26th, 2013 Semantic Web in Libraries, Hamburg 29
  • 30. Implementation: State  All steps implemented as a prototype    Perl scripts File-based data and indexes Current development  Still Perl scripts (but better documented)  All data is accumulated in a document store   MongoDB Further plan: Porting to MetaFacture framework November 26th, 2013 Semantic Web in Libraries, Hamburg 30
  • 31. RDF representation November 26th, 2013 Semantic Web in Libraries, Hamburg 31
  • 32. Classification systems as Linked Data  DDC has been published as Linked Data  RVK has not been published as Linked Data    There is no versioning and no stable identifiers A project to fix this and to publish RVK as Linked Data has been cancelled by the university library of Regensburg BK has not been published as Linked Data  There is authority data in the GVK union catalogue → One would have to create temporary URIs for the RVK and BK classes November 26th, 2013 Semantic Web in Libraries, Hamburg 32
  • 34. Further Mappings  1:n-Relationships    List of classes that are all narrow matches Or: A combination of classes is a (near) exact match Qualified mappings  Express the confidence of the proposed match  Allow applications to optimize for precision or recall November 26th, 2013 Semantic Web in Libraries, Hamburg 34
  • 35. Qualification through indirection  RDF relations cannot be qualified http://dewey.info/class/641/ November 26th, 2013 skos:exactMatch http://ex.org/rvk/BF_8150 Semantic Web in Libraries, Hamburg 35
  • 36. Qualification through indirection  So an intermediate node is used http://dewey.info/class/641/ skos:exactMatch ex:qualifiedMatch http://ex.org/rvk/BF_8150 ex:targetConcept _1 ex:confidence Further information “1.0” November 26th, 2013 Semantic Web in Libraries, Hamburg 36
  • 37. Summary  Mappings between classification systems are an important means for interoperability and sharing of classification information and tools  Simple statistics on the existing data in catalogues can generate candidates for matches between individual classes  Where manually created mappings exist, they can be used to evaluate the algorithmic results  Mappings can be expressed in SKOS terms   To qualify the mappings further, intermediate nodes need to be introduced There is no standard for this yet November 26th, 2013 Semantic Web in Libraries, Hamburg 37
  • 38. Thank you for listening. Slides available online http://www.slideshare.net/MagnusPfeffer/ This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. November 26th, 2013 Semantic Web in Libraries, Hamburg 38
  • 39. References  Denton, W. (2012). On dentographs, a new method of visualizing library collections. In: Code4Lib.  Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007). An empirical study of instance-based ontology matching. In: The Semantic Web (pp. 253-266). Springer Berlin Heidelberg.  Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.  Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden. Presentation given at the Librarian Workshop of the 33rd Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl).  Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with indexing information. In: Data Analysis, Machine Learning and Knowledge Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für Klassifikation e. V.. Springer Berlin Heidelberg.  Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009). Matching multi-lingual subject vocabularies. In: Research and Advanced Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg. November 26th, 2013 Semantic Web in Libraries, Hamburg 39