#mashcat: Evolving MarcEdit: Leveraging Semantic Data in MarcEdit

A Little History
MarcEdit Development started around 1999ish (as parts)
◦ Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM).
◦ I started writing it as an undergraduate to better understand MARC & circumvent OCLC’s Passport for
Windows program
◦ First “MarcEdit” was released Sept. 11, 2000 (thank you WayBack Machine:
http://web.archive.org/web/20001017105529/http://ucs.orst.edu/~reeset/marcedit/indexb.html)
Today:
◦ Written in C# (Windows/Linux) & Objective-C/C# (OSX)
◦ Active user community is ~20,000ish (based on update logs)
◦ Used in ~190ish countries/political regions
◦ Roughly 1/3 of the users reside outside of Canada/United States*
* Based on loose analysis of server logs by my server-side stats software

MarcEdit Evolution
[Screenshots: MarcEdit 1.0-2.0 Main Window; MarcEdit 1.0-2.0 MARC Tools; MarcEdit 1.0-2.0 MarcEditor]

MarcEdit Evolution
Early application was developed to (again, thank you Internet Archive):
1. Be user-friendly (whether I’ve accomplished that is debatable – I’m not a UI designer)
2. Support LC’s MARCBreakr/Maker diacritics (largely yes)
3. Be fast (which I think it is)
4. Simplify editing records in batch
5. Provide a set of programming tools to solve my own local needs

Three development rules I follow
MarcEdit is a real-world metadata tool
◦ Tool is designed to provide workflows for the data problems facing libraries right now
MarcEdit is MARC Agnostic
◦ Too many metadata tools are anglo-centric; MarcEdit has been designed to work within the very
heterogeneous metadata environment that we find ourselves in today, which includes:
◦ Support for MARC (not a particular flavor*)
◦ Near universal character set support (because the world is bigger than MARC8 and UTF8)
◦ Support for a wide range of Library metadata standards beyond MARC
MarcEdit is one part of the larger library metadata tooling environment
◦ So integrations with OCLC, ILSs (when possible), and OpenRefine are important
* And if something assumes MARC21 – call it out

So how does any of this relate to
semantic data in Libraries?
http://musictheorysite.com/img/dwight_question.jpg

A lot of metadata people I talk to fall into
two camps

BibFrame and Linked Data as RDA 2.0
BibFrame
http://www.wired.com/wp-content/uploads/archive/news/images/full/duke_nukem_frever_f.16807.jpg
http://astronomy.nmsu.edu/cwc/Group/magiicat/images/magiicat-logo.gif
Linked Data

BibFrame and linked data as datacorns
https://whatsthebigdata.files.wordpress.com/2015/10/datascience_unicorn.png?w=640

I prefer a more practical outlook…
https://www.etsy.com/search?q=unicorn+cat+hat

MarcEdit’s MARCNext
MarcEdit’s MARCNext is a first attempt to start
having this discussion by:
1. Integrating a linked data framework into
MarcEdit, including tooling for:
a. JSON-LD
b. SPARQL
c. RDF
2. Providing catalogers with proof of concept
tools to begin experimenting with their own
data
3. Providing a method to integrate semantic
concepts into legacy data
4. Providing a toolset that MarcEdit can use to
build new tools.

Let’s take a closer look at two tools
Link Identifiers Tool
◦ This tool embeds URIs into MARC data
◦ Is rules-driven (i.e., not MARC21 centric)
◦ Supports ~24 different in-use data sources
Validate Headings Tool
◦ First tool in MarcEdit to make use of the tool’s linked data platform and available data services to
provide a real-world application.

Initially released in Aug. 2014[1] as a proof of concept for testing the linked data framework
being developed in MarcEdit
◦ Initially only processed LCSH and NAF
Currently, I’ve profiled ~24 data sources, and the tool can be integrated in MarcEdit’s Task
Workflow.
◦ Translation profiles are currently in flux, as I work with a PCC group developing recommendations for
embedding URIs in MARC records.
◦ Working on a process that would allow users to self-profile identifier services, so long as they support
JSON-LD or SPARQL.
[1] MarcEdit’s Research Toolkit: MARCNext: http://blog.reeset.net/archives/1359

Tool has evolved over the last year to utilize a rules-based configuration (example):
<field type="bibliographic">
  <tag>630</tag>
  <ind2 value="0" vocab="naf_lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <subfields>adfkqnp</subfields>
  <uri>0</uri>
  <special_instructions>mixed</special_instructions>
</field>
<field type="authority|bibliographic">
  <tag>336</tag>
  <subfields>a</subfields>
  <index>2</index>
  <uri>0</uri>
</field>
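
To get a feel for how a rules file like this could drive a tool, here is a minimal Python sketch that reads the two rules above into a dictionary keyed by MARC tag. This is not MarcEdit's own loader: the `load_rules` function and the wrapping `<rules>` root element are assumptions made for illustration, and the sketch stores the `<uri>` and `<index>` values as-is without claiming what MarcEdit does with them.

```python
import xml.etree.ElementTree as ET

# The two example rules from the slide, copied verbatim.
RULES_XML = """
<field type="bibliographic">
  <tag>630</tag>
  <ind2 value="0" vocab="naf_lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <subfields>adfkqnp</subfields>
  <uri>0</uri>
  <special_instructions>mixed</special_instructions>
</field>
<field type="authority|bibliographic">
  <tag>336</tag>
  <subfields>a</subfields>
  <index>2</index>
  <uri>0</uri>
</field>
"""


def load_rules(xml_fragment):
    """Parse <field> rules into a dict keyed by MARC tag (hypothetical helper)."""
    # The fragment has several top-level elements, so wrap it in an
    # assumed <rules> root to make it parseable as one XML document.
    root = ET.fromstring("<rules>" + xml_fragment + "</rules>")
    rules = {}
    for field in root.findall("field"):
        tag = field.findtext("tag")
        rules[tag] = {
            # "authority|bibliographic" becomes ["authority", "bibliographic"]
            "record_types": field.get("type", "").split("|"),
            # second-indicator value -> vocabulary name
            "ind2_vocab": {
                ind.get("value"): ind.get("vocab")
                for ind in field.findall("ind2")
            },
            # one-letter subfield codes listed in <subfields>
            "subfields": list(field.findtext("subfields", default="")),
            # stored as found; None when the element is absent
            "uri": field.findtext("uri"),
            "index": field.findtext("index"),
            "special_instructions": field.findtext("special_instructions"),
        }
    return rules


rules = load_rules(RULES_XML)
print(rules["630"]["ind2_vocab"])
print(rules["336"]["record_types"])
```

Keying on the tag keeps the lookup cheap when walking a record field by field; a real profile would likely also need to handle several rules for the same tag.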

Linked Identifiers: Turning strings
=336 $atext$btxt$2rdacontent
=337 $aunmediated$bn$2rdamedia
=338 $avolume$bnc$2rdacarrier
=600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.
=650 0$aGenerals$zChina$vBiography.
=650 0$aGenerals$zTaiwan$vBiography.
=600 17$aHu, Zongnan,$d1896-1962.$2fast$0(OCoLC)fst00131171
=650 7$aGenerals.$2fast$0(OCoLC)fst00939841
=651 7$aChina.$2fast$0(OCoLC)fst01206073
=651 7$aTaiwan.$2fast$0(OCoLC)fst01207854
=655 7$aDiaries.$2lcgft
=655 7$aAutobiographies.$2lcgft

Linked Identifiers: into strings+
=336 $atext$btxt$2rdacontent$0http://id.loc.gov/vocabulary/contentTypes/txt
=337 $aunmediated$bn$2rdamedia$0http://id.loc.gov/vocabulary/mediaTypes/n
=338 $avolume$bnc$2rdacarrier$0http://id.loc.gov/vocabulary/carriers/nc
=600 10$6880-06$aHu, Zongnan,$d1896-1962$vDiaries.$0http://id.loc.gov/authorities/names/n84029846
=650 0$aGenerals$zChina$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105087
=650 0$aGenerals$zTaiwan$vBiography.$0http://id.loc.gov/authorities/subjects/sh2008105117
=600 17$aHu, Zongnan,$d1896-1962.$2fast$0http://id.worldcat.org/fast/00131171
=650 7$aGenerals.$2fast$0http://id.worldcat.org/fast/00939841
=651 7$aChina.$2fast$0http://id.worldcat.org/fast/01206073
=651 7$aTaiwan.$2fast$0http://id.worldcat.org/fast/01207854
=655 7$aDiaries.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026085
=655 7$aAutobiographies.$2lcgft$0http://id.loc.gov/authorities/genreForms/gf2014026047
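
The simplest case in the before/after above is the FAST headings, where the control number in $0 becomes a URI without any lookup. A minimal Python sketch of that one transformation follows. The URI prefix and the `(OCoLC)fst` pattern are taken from the example records; the function names are hypothetical, and this is not how MarcEdit itself is implemented.

```python
import re

# Prefix and control-number pattern as they appear in the example records.
FAST_PREFIX = "http://id.worldcat.org/fast/"
FAST_CONTROL_NUMBER = re.compile(r"^\(OCoLC\)fst(\d+)$")


def fast_uri(subfield_0):
    """Turn '(OCoLC)fst00939841' into a FAST URI, or None if it doesn't match."""
    match = FAST_CONTROL_NUMBER.match(subfield_0)
    return FAST_PREFIX + match.group(1) if match else None


def rewrite_fast_line(line):
    """Rewrite $0 in one mnemonic-format line, only when the field has $2fast."""
    parts = line.split("$")
    if "2fast" not in parts:
        return line  # not a FAST heading, leave untouched
    rewritten = []
    for part in parts:
        if part.startswith("0"):
            uri = fast_uri(part[1:])
            if uri is not None:
                part = "0" + uri
        rewritten.append(part)
    return "$".join(rewritten)


before = "=650 7$aGenerals.$2fast$0(OCoLC)fst00939841"
print(rewrite_fast_line(before))
```

The other fields in the example (336/337/338, LCSH, NAF, LCGFT) need a vocabulary lookup to find the identifier, so they are out of scope for this sketch.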

Linked Data tools
Things that are still hard:
◦ Most identifier services use their own rules for data escaping – and they aren’t documented
◦ Many services are still not well suited for this work
◦ Services that don’t provide an option to do an exact lookup (e.g., ULAN, AAT, or VIAF) all require additional
processing to ensure that results match the queried term.
◦ Many services are little “p” production in that lots of look-ups can (and do) cause problems.
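
The "additional processing" for services without an exact lookup can be sketched as a post-filter: normalize the queried heading and each returned label, then keep only the results that match. The Python below is a rough illustration under stated assumptions. The candidate list, its `label`/`uri` keys, the example.org URIs, and the normalization rules (ignore case, diacritics, extra whitespace, and trailing punctuation) are all hypothetical and not taken from any particular service.

```python
import unicodedata


def normalize(label):
    """Fold case, strip diacritics, trailing punctuation and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().rstrip(" .,").split())


def exact_matches(query, candidates):
    """Keep only the candidates whose label equals the query after normalizing."""
    wanted = normalize(query)
    return [c for c in candidates if normalize(c["label"]) == wanted]


# Hypothetical keyword-search results for one heading.
candidates = [
    {"label": "Hu, Zongnan, 1896-1962", "uri": "http://example.org/id/1"},
    {"label": "Hu, Zongnan", "uri": "http://example.org/id/2"},
]
hits = exact_matches("Hu, Zongnan, 1896-1962.", candidates)
print([c["uri"] for c in hits])
```

How aggressive the normalization should be is a judgment call: folding too much risks false matches, folding too little misses headings that differ only in punctuation.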

Validate Headings
Automated authority control processing
◦ Utilizes id.loc.gov
◦ Provides reports of data that isn’t currently “authorized”
◦ Provides options for generating brief authorities
◦ Extracts for further data processing
◦ Ability to embed URIs during validation
◦ If URIs are present – they are used rather than a direct look up
◦ Automatic heading correction when variants are encountered
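
The "URIs first, lookup second" behavior described above can be sketched as a small decision function in Python. Everything here is an illustration under assumptions: the function names are hypothetical, headings are built by joining subdivisions ($v, $x, $y, $z) with "--", and the label URL pattern for id.loc.gov is an assumption to verify against the service's documentation. The sketch only plans the request; it makes no network calls.

```python
from urllib.parse import quote

# Assumed label-lookup URL patterns for id.loc.gov (verify before relying on them).
LABEL_ENDPOINTS = {
    "names": "https://id.loc.gov/authorities/names/label/",
    "subjects": "https://id.loc.gov/authorities/subjects/label/",
}
SUBDIVISION_CODES = set("vxyz")


def heading_label(subfields):
    """Build a heading string from (code, value) pairs, skipping $0, $2, $6."""
    label = ""
    for code, value in subfields:
        if not code.isalpha():
            continue  # numeric codes are control subfields, not heading text
        if code in SUBDIVISION_CODES:
            label += "--" + value
        elif label:
            label += " " + value
        else:
            label = value
    return label.rstrip(" .")


def plan_lookup(subfields, vocabulary):
    """Prefer an embedded URI; otherwise fall back to a label lookup URL."""
    for code, value in subfields:
        if code == "0" and value.startswith("http"):
            return ("resolve_uri", value)
    label = heading_label(subfields)
    return ("label_lookup", LABEL_ENDPOINTS[vocabulary] + quote(label))


field = [("a", "Generals"), ("z", "China"), ("v", "Biography.")]
print(plan_lookup(field, "subjects"))
```

Checking for the URI first means records that have already been through the Link Identifiers tool skip the string match entirely.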

Validate Headings
Validate Headings can be run from inside the
MarcEditor, or outside as a standalone tool

Continued work…
Would like to continue adding vocabularies
Expand headings validation to more than just LCSH/NAF
Include Linking Profiles for UNIMARC
Using Linked Data sources for sameAs subject generation

Questions
Contact Information:
Terry Reese
Email: reese.2179@osu.edu or reeset@gmail.com
MarcEdit Website: http://marcedit.reeset.net
Help: http://marcedit.reeset.net/help
