Stefano Bargioni
Pontificia Università della Santa Croce

Catalogue enrichment: importing
Dewey Decimal Classification
from external sources

Oct 18, 2013

ADLUG 2013

The project
● Improving the Dewey search path
  – with a minimal effort
  – while adding BNCF compliant subject headings to our catalog
● Koha 3 <http://koha-community.org> open source ILS
● Can be applied to other ILSs

Version 1: The Batch Mode
● Add Dewey notations to the catalog
  – automatically
  – from selected sources
  – ensure quality and uniformity

Atomic copy cataloguing
● copy cataloguing usually involves the full record
● we only need to copy field 082 (MARC21) or 676 (Unimarc)
● ISBN as the unique identifier
● the policy issue

Records to be modified
● without Dewey notation
● with ISBN
● limit: 008 language
  – SELECT biblionumber, ISBN
    FROM biblio
    WHERE ISBN_present
    AND dewey_absent
    AND language_008='...'

In Koha, the WHERE clause is based on the MySQL ExtractValue function, which works on the biblio.marcxml field through XPath expressions.

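
The three selection conditions can also be checked outside the database. The following is a minimal sketch in Python, not the author's Perl or SQL: it applies the same tests (ISBN present, no 082 field, language code in 008) to one MARCXML string. The function name `needs_dewey`, the sample record, and the ISBN in it are hypothetical, and the sketch assumes the record declares the MARCXML namespace.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace (assumption: the stored records declare it).
NS = {"m": "http://www.loc.gov/MARC21/slim"}


def needs_dewey(marcxml: str, language: str) -> bool:
    """Return True for a record with an ISBN, no 082 field, and the given 008 language."""
    record = ET.fromstring(marcxml)
    has_isbn = record.find("m:datafield[@tag='020']/m:subfield[@code='a']", NS) is not None
    has_dewey = record.find("m:datafield[@tag='082']", NS) is not None
    field_008 = record.findtext("m:controlfield[@tag='008']", default="", namespaces=NS)
    # Positions 35-37 of field 008 carry the MARC21 language code.
    return has_isbn and not has_dewey and field_008[35:38] == language


# Hypothetical sample record: a 40-character 008 with "ita" at positions 35-37.
FIELD_008 = "130101s2010    it " + " " * 17 + "ita" + " d"
SAMPLE = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    f'<controlfield tag="008">{FIELD_008}</controlfield>'
    '<datafield tag="020" ind1=" " ind2=" ">'
    '<subfield code="a">9788800000000</subfield></datafield>'
    "</record>"
)

if __name__ == "__main__":
    print(needs_dewey(SAMPLE, "ita"))
```

For a Unimarc catalogue the same idea would apply with field 676 in place of 082 and a different location for the language code.
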
Dewey Sources (I)
● a choice based on copy cataloguing experience
● OCLC Classify
● some National Libraries
● API, Z39.50 or HTML access

Dewey Sources (II): OCLC Classify
● Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
● This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
● The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].
● Retrieved information is in XML format.
● http://www.oclc.org/research/activities/classify.html?urlm=159746
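
Since the retrieved information is XML, extracting a Dewey number comes down to one path lookup. The sketch below is in Python rather than the Perl used in the project, and it parses a hypothetical, trimmed response. The namespace and the element and attribute names (`recommendations`, `ddc`, `mostPopular`, `sfa`) are assumptions about the response layout and should be checked against the service documentation before use.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace and element names; verify against the Classify documentation.
NS = {"c": "http://classify.oclc.org"}

# Hypothetical, trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = """<classify xmlns="http://classify.oclc.org">
  <response code="0"/>
  <recommendations>
    <ddc>
      <mostPopular holdings="120" sfa="025.431"/>
    </ddc>
  </recommendations>
</classify>"""


def most_popular_ddc(xml_text: str) -> Optional[str]:
    """Return the most widely held DDC number in the response, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("c:recommendations/c:ddc/c:mostPopular", NS)
    return node.get("sfa") if node is not None else None


if __name__ == "__main__":
    print(most_popular_ddc(SAMPLE_RESPONSE))
```

Returning `None` when no recommendation is present lets the caller count the record as "Dewey # not found" rather than fail.
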

Dewey Sources (III): National Libraries
LC    Library of Congress                        (any)  MARC
BNF   Bibliothèque nationale de France           (fre)  MARC
DNB   Deutsche Nationalbibliothek                (ger)  HTML
BNCF  Biblioteca Nazionale Centrale di Firenze   (ita)  HTML
BNCR  Biblioteca Nazionale Centrale di Roma      (ita)  HTML
BNB   British National Bibliography              (eng)  MARC

The logic used in the programs
● open the connection to the bibliographical database
● obtain the ISBN from records without a Dewey number
● open the connection to the Dewey source, if Z39.50
● for each ISBN
  ● query the data source using the current ISBN
  ● if a Dewey number is available in the response
    ● if the Dewey number passes quality control
      ● update the bibliographical record
  ● wait to avoid overloading
● close the connection to the Dewey source, if Z39.50
● close the connection to the bibliographical database
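
The central loop of these steps can be sketched as follows. This is an illustration in Python, not the author's Perl programs. The function `harvest` and its parameters are hypothetical names, and the source lookup, quality check, and catalogue update are passed in as plain callables so the loop can be tried without a database or a remote server. Opening and closing the connections is left to the caller.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def harvest(
    records: Iterable[Tuple[int, str]],
    lookup: Callable[[str], Optional[str]],
    is_acceptable: Callable[[str], bool],
    update: Callable[[int, str], None],
    delay_seconds: float = 2.0,
) -> int:
    """Run the batch loop over (biblionumber, ISBN) pairs; return how many records were updated."""
    updated = 0
    for biblionumber, isbn in records:
        dewey = lookup(isbn)  # query the data source using the current ISBN
        if dewey is not None and is_acceptable(dewey):
            update(biblionumber, dewey)  # update the bibliographical record
            updated += 1
        time.sleep(delay_seconds)  # wait to avoid overloading the source
    return updated


if __name__ == "__main__":
    # Stand-ins for a real source and catalogue (hypothetical data).
    fake_source = {"isbn-a": "230", "isbn-b": "not-a-number"}
    catalogue = {}
    count = harvest(
        [(1, "isbn-a"), (2, "isbn-b"), (3, "isbn-c")],
        lookup=fake_source.get,
        is_acceptable=str.isdigit,
        update=catalogue.__setitem__,
        delay_seconds=0,
    )
    print(count, catalogue)
```

Keeping the three operations as parameters means the same loop can serve an API, a Z39.50 target, or an HTML source by swapping the `lookup` callable.
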

Quality check
● Catalogs contain errors
● DDC has many editions
● Our old Dewey numbers start from edition 19
● Indicators
● Lots of discarded Dewey numbers...
● ... but we moved from 40,000 to 60,000 records with a Dewey number (+50%)

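
The slides do not list the individual checks, so the following Python sketch shows only one plausible, purely syntactic test and is not the project's actual quality control. It assumes that a usable number has three digits, optionally followed by a decimal part that does not end in zero, and that segmentation marks such as "/" may appear in the incoming value. The name `normalize_ddc` is hypothetical.

```python
import re
from typing import Optional

# Three digits, optionally a decimal point and further digits not ending in 0.
DDC_PATTERN = re.compile(r"^\d{3}(\.\d*[1-9])?$")


def normalize_ddc(raw: str) -> Optional[str]:
    """Strip segmentation marks and return the number if well-formed, else None."""
    candidate = raw.strip().replace("/", "").replace("'", "")
    return candidate if DDC_PATTERN.match(candidate) else None


if __name__ == "__main__":
    for value in ["025.4/31", " 230 ", "23", "[Fic]"]:
        print(repr(value), normalize_ddc(value))
```

A check of this kind catches malformed strings only. Questions such as which DDC edition a number comes from need the edition information carried alongside it in the source record.
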
Delay while searching sources
● Continuous searching can overwhelm remote servers
  – robots.txt
  – policies for crawlers
● Continuous indexing can overload your server
● Wait a few seconds between searches or groups of searches
  – this will slow the harvesting process
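
As an illustration of honouring robots.txt before searching a source, the sketch below uses Python's standard `urllib.robotparser` (the project itself used Perl). The robots.txt content, the user agent name `dewey-harvester`, the example.org URLs, and the two-second fallback are all hypothetical. A real run would fetch the file from the source site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would fetch it from the source site.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(rules: RobotFileParser, user_agent: str, default: float = 2.0) -> float:
    """Use the site's Crawl-delay when it declares one, else a default pause in seconds."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    agent = "dewey-harvester"
    print(rules.can_fetch(agent, "http://example.org/search?isbn=123"))
    print(polite_delay(rules, agent))
```

The value returned by `polite_delay` would then be the pause between searches, at the cost of a slower harvest.
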

Statistics
Source    Language  Records  Records   ISBN not  Dewey #    Dewey #
                    scanned  modified  found     not found  discarded
Classify  all         42387     10267      5321       6607      20059
LC        all         31999      1252     21195       8562       1011
BNF       all         30903      2253     21327       7268         55
DNB       ger          4193       163      3867        163          0
BNCF      ita         12017      4088      3643       3542        744
BNCR      ita          7549      1515      3003       2978         53
BNB       eng          6215       193      5449         55        518
Total                           19710

Several works with same ISBN: 8240
ISBN incorrect: 133

Browsing Dewey Index
Besides author, uniform titles and subject headings, our OPAC offers a path of semantic search based on the Dewey classification number

Software
● Query programs were written in Perl, making use of the Koha API and the following libraries available on CPAN:
  – LWP for HTTP connections
  – ZOOM for Z39.50 connections
  – DBI for connections to the MySQL database
  – XML::XPath for XML data processing
  – WWW::Scraper for HTML data processing
  – MARC::Record for MARC record processing

A scientific article
● published in JLIS.it at http://leo.cilea.it/index.php/jlis/article/view/8766
● JLIS.it, Italian Journal of Library and Information Science, is an academic journal of international scope, peer-reviewed and open access
● written with my cataloguers
● doesn't deal with the dynamic component

Version 2.0 - Single Record Mode
● New record:
  – enter the ISBN
  – retrieve Dewey from important catalogs
  – choose and import the best one into the new record
● Or upgrade an old record adding or modifying its Dewey classification

Conclusions
● Increase of available bibliographic data on the net
● Unique identifiers
  – ISBN, ISSN, ...
  – VIAF Id, ISNI, ...
● Catalog enrichment
  – bibliographic records
  – authority records
● Expose rich linked data
  – with coded information like Dewey
  – with standard IDs like ISBN, ISNI, ...

Thank you
Gracias
Grazie

