2. A Little History
MarcEdit development started around 1999 (in parts)
◦ Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM).
◦ I started writing it as an undergraduate to better understand MARC and to circumvent OCLC’s Passport for Windows program
◦ First “MarcEdit” was released Sept. 11, 2000 (thank you WayBack Machine:
http://web.archive.org/web/20001017105529/http://ucs.orst.edu/~reeset/marcedit/indexb.html)
Today:
◦ Written in C# (Windows/Linux) & Objective-C/C# (OSX)
◦ Active user community is ~20,000 (based on update logs)
◦ Used in ~190 countries/political regions
◦ Roughly 1/3 of the users reside outside of Canada/United States*
* Based on loose analysis of server logs by my server-side stats software
4. MarcEdit Evolution
The early application was developed to (again, thank you Internet Archive):
1. Be user-friendly (whether I’ve accomplished that is debatable – I’m not a UI designer)
2. Support LC’s MARCBreakr/Maker diacritics (largely yes)
3. Be fast (which I think it is)
4. Simplify editing records in batch
5. Provide a set of programming tools to solve my own local needs
6. Three development rules I follow
MarcEdit is a real-world metadata tool
◦ The tool is designed to provide workflows for data problems facing libraries right now
MarcEdit is MARC Agnostic
◦ Too many metadata tools are Anglo-centric; MarcEdit has been designed to work within the very heterogeneous metadata environment in which we find ourselves today, which includes:
◦ Support for MARC (not a particular flavor*)
◦ Near-universal character set support (because the world is bigger than MARC8 and UTF8)
◦ Support for a wide range of library metadata standards beyond MARC
MarcEdit is one part of the larger library metadata tooling environment
◦ So integrations with OCLC, ILSs (when possible), OpenRefine are important
* And if something assumes MARC21 – call it out
7. So how does any of this relate to semantic data in libraries?
http://musictheorysite.com/img/dwight_question.jpg
8. A lot of metadata people I talk to fall into two camps
9. BibFrame and Linked Data as RDA 2.0
BibFrame
http://www.wired.com/wp-content/uploads/archive/news/images/full/duke_nukem_frever_f.16807.jpg
http://astronomy.nmsu.edu/cwc/Group/magiicat/images/magiicat-logo.gif
Linked Data
10. BibFrame and linked data as datacorns
https://whatsthebigdata.files.wordpress.com/2015/10/datascience_unicorn.png?w=640
11. I prefer a more practical outlook…
https://www.etsy.com/search?q=unicorn+cat+hat
12. MarcEdit’s MARCNext
MarcEdit’s MARCNext is a first attempt to start having this discussion by:
1. Integrating a linked data framework into MarcEdit, including tooling for:
a. JSON-LD
b. SPARQL
c. RDF
2. Providing catalogers with proof-of-concept tools to begin experimenting with their own data
3. Providing a method to integrate semantic concepts into legacy data
4. Providing a toolset that MarcEdit can use to build new tools
13. Let’s take a closer look at two tools
Link Identifiers Tool
◦ This tool embeds URIs into MARC data
◦ Is rules-driven (i.e., not MARC21-centric)
◦ Supports ~24 different in-use data sources
Validate Headings Tool
◦ First tool in MarcEdit to make use of the tool’s linked data platform and available data services to provide a real-world application.
15. Link Identifiers Tool
Initially released in Aug. 2014[1] as a proof of concept for testing the linked data framework
being developed in MarcEdit
◦ Initially only processed LCSH and NAF
Currently, I’ve profiled ~24 data sources, and the tool can be integrated in MarcEdit’s Task
Workflow.
◦ Translation profiles are currently in flux, as I work with a PCC group developing recommendations for embedding URIs in MARC records.
◦ Working on a process that would allow users to self-profile identifier services, so long as they support JSON-LD or SPARQL.
[1] MarcEdit’s Research Toolkit: MARCNext: http://blog.reeset.net/archives/1359
16. Link Identifiers Tool
The tool has evolved over the last year to use a rules-based configuration (example):
<field type="bibliographic">
  <tag>630</tag>
  <ind2 value="0" vocab="naf_lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <subfields>adfkqnp</subfields>
  <uri>0</uri>
  <special_instructions>mixed</special_instructions>
</field>
<field type="authority|bibliographic">
  <tag>336</tag>
  <subfields>a</subfields>
  <index>2</index>
  <uri>0</uri>
</field>
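To make the shape of these rules concrete, here is a minimal Python sketch that reads the two rules above into a lookup table. It is not MarcEdit's own code. The <rules> wrapper element, the load_rules function, and the dictionary key names are all assumptions made for illustration, and the reading of <uri> as "subfield that receives the URI" and <index> as "subfield that names the vocabulary" is my interpretation of the example rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Assumption: the two <field> rules are wrapped in a hypothetical <rules>
# root so the fragment parses as a single XML document.
RULES_XML = """
<rules>
  <field type="bibliographic">
    <tag>630</tag>
    <ind2 value="0" vocab="naf_lcsh" />
    <ind2 value="1" vocab="lcshac" />
    <ind2 value="2" vocab="mesh" />
    <subfields>adfkqnp</subfields>
    <uri>0</uri>
    <special_instructions>mixed</special_instructions>
  </field>
  <field type="authority|bibliographic">
    <tag>336</tag>
    <subfields>a</subfields>
    <index>2</index>
    <uri>0</uri>
  </field>
</rules>
"""


def load_rules(xml_text):
    """Return {tag: rule} for every <field> element (hypothetical helper)."""
    rules = {}
    for field in ET.fromstring(xml_text).iter("field"):
        rules[field.findtext("tag")] = {
            # Record types the rule applies to, split on the "|" separator.
            "record_types": field.get("type", "").split("|"),
            # Second-indicator value -> vocabulary name.
            "vocab_by_ind2": {
                e.get("value"): e.get("vocab") for e in field.findall("ind2")
            },
            # Subfield codes listed in <subfields>, one character each.
            "subfields": list(field.findtext("subfields", default="")),
            # Assumed meaning: subfield code that receives the URI.
            "uri_subfield": field.findtext("uri"),
            # Assumed meaning: subfield code naming the vocabulary (None if absent).
            "vocab_subfield": field.findtext("index"),
        }
    return rules


rules = load_rules(RULES_XML)
print(rules["630"]["vocab_by_ind2"].get("2"))
print(rules["336"]["vocab_subfield"])
```

Keeping the rules as data rather than code is what lets the same tool serve fields that choose a vocabulary by indicator (630) and fields that name it in a subfield (336).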
20. Linked Data tools
Things that are still hard:
◦ Most identifier services use their own rules for data escaping, and those rules aren’t documented
◦ Many services are still not well suited for this work
◦ Anything that doesn’t provide an exact-match lookup (e.g., ULAN, AAT, or VIAF) requires additional processing to ensure that results match the queried term.
◦ Many services are little “p” production, in that lots of lookups can (and do) cause problems.
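The "additional processing" for services without an exact lookup can be sketched as a post-filter over whatever candidates a keyword search returns. This is only an illustration: the normalize and exact_matches functions, the candidate dictionaries, and the example.org URIs are hypothetical, and the normalization rules (Unicode NFC, case folding, trimming trailing punctuation, collapsing whitespace) are assumptions that a real profile would need to tune per service.

```python
import unicodedata


def normalize(label):
    """Normalize a heading for comparison (assumed rules, see lead-in)."""
    composed = unicodedata.normalize("NFC", label)
    trimmed = composed.casefold().rstrip(" .,;:")
    # Collapse any run of whitespace to a single space.
    return " ".join(trimmed.split())


def exact_matches(term, candidates):
    """Keep only candidates whose label equals the queried term after normalization."""
    wanted = normalize(term)
    return [c for c in candidates if normalize(c["label"]) == wanted]


# Hypothetical search results from a service that only offers keyword search.
candidates = [
    {"label": "Picasso, Pablo, 1881-1973", "uri": "http://example.org/id/1"},
    {"label": "Picasso, Paloma", "uri": "http://example.org/id/2"},
]

hits = exact_matches("Picasso, Pablo, 1881-1973.", candidates)
print([h["uri"] for h in hits])
```

A filter like this trades recall for precision: a heading with no exact match is left without a URI rather than being given the wrong one.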
21. Validate Headings
Automated authority control processing
◦ Utilizes id.loc.gov
◦ Provides reports of data that isn’t currently “authorized”
◦ Provides options for generating brief authorities
◦ Extracts for further data processing
◦ Ability to embed URIs during validation
◦ If URIs are present, they are used rather than a direct lookup
◦ Automatic heading correction when variants are encountered
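The flow described above (use an embedded URI when one is present, otherwise look the heading up, embed the URI on a match, and report anything unmatched) can be sketched roughly as follows. This is not MarcEdit's implementation. The function names, the status strings, the "$"-delimited subfield notation, the choice of label subfields, and the example.org URI are all assumptions, and the lookup argument is a stand-in for a real call to a service such as id.loc.gov, which the sketch deliberately does not make. Automatic correction of variant headings is not shown.

```python
def parse_subfields(field_data, delimiter="$"):
    """Split '$aCats$xBehavior.' style data into (code, value) pairs."""
    return [(chunk[0], chunk[1:]) for chunk in field_data.split(delimiter) if chunk]


def validate_heading(field_data, lookup, label_codes="avxyz", uri_code="0"):
    """Classify one heading; `lookup` maps a label to a URI or None (hypothetical)."""
    subfields = parse_subfields(field_data)

    # If a URI is already embedded, use it instead of doing a label lookup.
    uris = [value for code, value in subfields if code == uri_code]
    if uris:
        return {"status": "uri-present", "uri": uris[0], "data": field_data}

    # Assumption: subdivisions are joined with "--" to form the lookup label.
    label = "--".join(
        value.strip(" .") for code, value in subfields if code in label_codes
    )
    uri = lookup(label)
    if uri is None:
        # Not found: this heading belongs in the "not authorized" report.
        return {"status": "unauthorized", "uri": None, "data": field_data}

    # Found: embed the URI in the URI subfield during validation.
    return {
        "status": "authorized",
        "uri": uri,
        "data": field_data + "$" + uri_code + uri,
    }


# Stand-in for a remote service: a dictionary of known labels.
known = {"Cats--Behavior": "http://example.org/subjects/123"}

result = validate_heading("$aCats$xBehavior.", known.get)
print(result["status"], result["data"])
```

Passing the lookup in as a function keeps the validation logic testable offline and makes it easy to swap in a different vocabulary service.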
24. Continued work…
Continue adding vocabularies
Expand headings validation to more than just LCSH/NAF
Include Linking Profiles for UNIMARC
Use Linked Data sources for sameAs subject generation