Multilingualism ifla 2014 08

IFLA - Lyon, France 19 August 2014
Multilingualism in WorldCat and VIAF
Janifer Gatenby
Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby,
Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz,
Shenghui Wang, Jay Weitz

WorldCat Today
• Resources in nearly all languages
• Contributed by more than 20,000 libraries worldwide
• More than half the database is for works not in English
Languages
English
German
French
Spanish
Chinese
Dutch
Japanese
Russian
Arabic
469 others

WorldCat Today
• Bibliographic Records
– Hybrid records
– Parallel records
• Clustered at Work level (FRBR)

Existing Architecture
[Diagram labels: Authors, Subj, Classif, Holdings, Bibliographic record, Work cluster, Content cluster, Manifestation cluster]

Complementary Initiatives
• Work Level Record
• GLIMIR: Manifestation & Content Clusters
• Multi-lingual Bibliographic Structure

Objective: Work Level Record
Create a consolidated metadata summary for the content of a work

Work Level Record
http://www.oclc.org/research/activities/workrecs.html
Coming Q1 2015

GLIMIR: Objective
Create better work presentations

GLIMIR
Users like C
• The Content Cluster
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content
• The Manifestation Cluster is important
– Consolidates holdings at manifestation level
– In the short term allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
Cataloguers & scholars like C
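The two manifestation-cluster effects listed above, consolidated holdings and choosing the record catalogued in the interface language, can be sketched roughly as follows. This is only an illustration: the field names (`cluster_id`, `cat_lang`, `holdings`), the sample values and the fallback rule (most widely held record) are assumptions, not OCLC's data model or GLIMIR's actual logic.

```python
from collections import defaultdict

# Hypothetical bibliographic records: cluster m1 is one manifestation
# catalogued twice, once in English and once in German.
records = [
    {"id": "r1", "cluster_id": "m1", "cat_lang": "eng", "holdings": 40},
    {"id": "r2", "cluster_id": "m1", "cat_lang": "ger", "holdings": 12},
    {"id": "r3", "cluster_id": "m2", "cat_lang": "eng", "holdings": 5},
]

def consolidate(records):
    """Group records by manifestation cluster and sum their holdings."""
    clusters = defaultdict(lambda: {"records": [], "holdings": 0})
    for rec in records:
        cluster = clusters[rec["cluster_id"]]
        cluster["records"].append(rec)
        cluster["holdings"] += rec["holdings"]
    return dict(clusters)

def pick_display_record(cluster, interface_lang):
    """Prefer the record catalogued in the interface language.

    Assumed fallback: the record with the most holdings.
    """
    for rec in cluster["records"]:
        if rec["cat_lang"] == interface_lang:
            return rec
    return max(cluster["records"], key=lambda r: r["holdings"])

clusters = consolidate(records)
```

Counting `len(clusters)` rather than `len(records)` gives the number of manifestations as opposed to the number of records.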

Manifestation Clustering
So far 103 million records processed (about 30%)

SRU Search: Loti, Pêcheur d’Islande (Work ID 21536567)
Cluster level    Records   Holdings
Work             18        148
Content          14        143
Manifestation    7         115

Multilingual Bibliographic Structure Project
Objective: Improve displays; surface translations

Multilingual Bibliographic Structure Project
• Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data
• Corrects many of the 28 million records coded “und” (undetermined language)
• Better control and linking of translations
• Input to refinement of work clusters
• Smarter data storage

“Most appropriate” questioned
• WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond
• The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why

Which record is better to present to a German speaker?

Most appropriate display
Build the display from all available data

Multilingual Bibliographic Structure Project
• Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through the short to fuller versions of the metadata.
End user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records
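One way to picture building a display from all available data instead of a single "most appropriate" record is the sketch below. It assumes every field value is tagged with a language code; the record layout, the sample data and the fallback order (user language, then English, then anything) are illustrative assumptions only.

```python
def merge_work_data(records):
    """Collect every language-tagged value for each field across a work's records."""
    merged = {}
    for rec in records:
        for field, by_lang in rec.items():
            for lang, value in by_lang.items():
                # Keep the first value seen for each (field, language) pair.
                merged.setdefault(field, {}).setdefault(lang, value)
    return merged

def display(merged, user_lang, fallback="eng"):
    """Pick the user's language per field, falling back when it is missing."""
    view = {}
    for field, by_lang in merged.items():
        if user_lang in by_lang:
            view[field] = by_lang[user_lang]
        elif fallback in by_lang:
            view[field] = by_lang[fallback]
        else:
            view[field] = next(iter(by_lang.values()))
    return view

# Hypothetical records for one work, catalogued in English and in German.
records = [
    {"title": {"eng": "War and Peace"},
     "summary": {"eng": "A novel of the Napoleonic wars."}},
    {"title": {"ger": "Krieg und Frieden"},
     "subject": {"ger": "Russland"}},
]
view = display(merge_work_data(records), "ger")
```

A German-speaking user gets the German title and subject, plus the English summary because no German summary exists; neither single record would have offered all three.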

Proposed new architecture
[Diagram labels: Work, Manif, Contents, Holding, Subj, Classif, Authors, Translations (Language of work); language tags eng, fre, ger, jpn]

Important principles
• Language tagging of elements, particularly
– Summaries (M21 520)
– Subject headings
• Display in script preferred by the user if data is available
• Improve translated interfaces
• Show consolidated holdings as appropriate

Translations
Surfacing the “cream”

Great works are translated
• The cream of the world’s cultural and knowledge heritage is shared by being translated
• WorldCat contains many rich cataloguing records for these translations
GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data

Ιλιάδα
The Iliad
紅樓夢
Dream of the Red Chamber
ঘরে বাইরে
The Home and the World
زقاق المدق
Midaq Alley
Война и миръ
War and Peace
源氏物語
The Tale of Genji
דער בעל-תשובה
The Penitent
સત્યના પ્રયોગો અથવા આત્મકથા
The Story of My Experiments with Truth [Gandhi autobiography]

Translations
Leo Tolstoy: 32 languages
Homer: 28 languages
Rabindranath Tagore: 21 languages
Isaac Bashevis Singer: 17 languages
Najīb Maḥfūẓ: 12 languages
Cao Xueqin: 9 languages
Mahatma Gandhi: 7 languages
Murasaki Shikibu: 7 languages

Improving work clustering
• Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information
Generated uniform title authority records will overcome most of these differences without needing to edit individual records
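To make the inconsistencies above concrete, here is a small sketch of a title matching key that tolerates missing subtitles, inverted titles and accent differences. It is a plain normalisation key, not the generated uniform title authority approach the slide proposes, and the article list and splitting rules are illustrative assumptions.

```python
import re
import unicodedata

# Illustrative list only; a real system would need far more articles.
ARTICLES = ("the", "a", "an", "le", "la", "les", "der", "die", "das")

def title_key(title):
    """Build a rough matching key for a title."""
    # Drop a subtitle or statement of responsibility after ':', ';' or '/'.
    main = re.split(r"\s*[:;/]\s*", title, maxsplit=1)[0]
    # Un-invert "Iliad, The" to "The Iliad" when the tail is a known article.
    head, sep, tail = main.rpartition(",")
    if sep and tail.strip().lower() in ARTICLES:
        main = f"{tail.strip()} {head}"
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", main)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and collapse anything that is not a letter or digit to one space.
    return re.sub(r"[\W_]+", " ", stripped.lower()).strip()
```

With this key, "Pêcheur d'Islande : roman" and "Pecheur d'Islande" fall together, as do "Iliad, The" and "The Iliad".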

Addition of xR records to VIAF
Before
After

xR VIAF Record
VIAF ID for Author
Translated title
Translator

VIAF Linked Data
New Information

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:
– Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977
– Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: 1982-1984
– Title: Tây du ký bình khảo; Language: Vietnamese; Translator: Phan Quân; Date: 1980
– Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983
– Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986
Each translation record carries an IsTranslationOf link back to the original work.
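The HasTranslation / IsTranslationOf pairing above can be modelled as a two-way link between work records. The sketch below uses titles and translators from the slide; the `Work` class and `link` helper are hypothetical names for illustration, not part of VIAF.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# eq=False keeps identity comparison, so comparing two works never
# recurses through the circular original <-> translation links.
@dataclass(eq=False)
class Work:
    title: str
    language: str
    translator: str = ""
    date: str = ""
    has_translation: List["Work"] = field(default_factory=list)
    is_translation_of: Optional["Work"] = None

def link(original, translation):
    """Record both directions of the translation relationship."""
    original.has_translation.append(translation)
    translation.is_translation_of = original

original = Work("西遊記", "Chinese")
yu = Work("Journey to the West", "English", "Anthony C. Yu", "1977")
boner = Work("Monkeys Pilgerfahrt", "German", "Georgette Boner", "1983")
link(original, yu)
link(original, boner)
```

Storing both directions means a display can start from either the original or any translation and reach the others.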

Markup for the Semantic Web
@prefix schema: <http://schema.org/> .
# [new]: marks properties proposed on the slide; no namespace is given for them
# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh ;
    .
# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .

Understanding information sharing across cultures
• What percentage of non-English works are translations of English works, and vice-versa?
• Which authors are translated the most?
• Which works have been translated into the most languages?
• Which countries translate the most English works, the most non-English works?
• Which countries translate a new work the fastest?
Etc.
http://www.oclc.org/research/activities/multilingual-bib-structure.html
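Once translation links are in place, questions such as "which authors are translated the most?" reduce to simple aggregation. A minimal sketch, using a handful of made-up rows rather than WorldCat data:

```python
from collections import defaultdict

# Hypothetical (author, original language, translation language) rows.
translations = [
    ("Leo Tolstoy", "rus", "eng"),
    ("Leo Tolstoy", "rus", "fre"),
    ("Leo Tolstoy", "rus", "eng"),  # a second English translation
    ("Cao Xueqin", "chi", "eng"),
]

def languages_per_author(rows):
    """Count distinct target languages per author, most translated first."""
    langs = defaultdict(set)
    for author, _original_lang, target_lang in rows:
        langs[author].add(target_lang)
    counts = ((author, len(targets)) for author, targets in langs.items())
    # Sort by descending count, then by name for a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))

ranking = languages_per_author(translations)
```

Using a set per author counts languages, not translations, so two English translations of the same author count once.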

Where are we now?
Clustering
• Work clusters done; ongoing refinement
• GLIMIR clustering done for all [simple] text
– 103 million records have GLIMIR IDs
• Working on collected works
Displays
• Working on VIAF expression displays
• Work level displays in WorldCat.org ++
Data Mining for translations

Janifer Gatenby
EMEA Program Manager Metadata
Janifer.gatenby@oclc.org
oclc.org
Explore. Share. Magnify.
