OMG! My metadata is as fresh as the Backstreet Boys: How Google Refine can update, clean up and link your metadata to the wider world
1. OMG! MY METADATA IS AS FRESH AS THE BACKSTREET BOYS: HOW GOOGLE REFINE CAN UPDATE, CLEAN UP AND LINK YOUR METADATA TO THE WIDER WORLD
SARAH BETH WEEKS
LIBRARY TECHNOLOGY CONFERENCE 2013
WEEKSS@STOLAF.EDU
@RASCALWHALE
2. SAMPLE PROJECT: NORDIC AMERICAN IMPRINTS
Situation: We wanted to match the publishers of our books against a list of important Nordic American publishers (compiled by Penny Huffman) to find materials for our special collections.
Problem: Publication information is hard to compare when it is not controlled.
3. ANSWER: GOOGLE REFINE!
Google Refine can “match and merge” messy data filled with:
random leading or trailing spaces
stray punctuation
typos
odd capitalization
and more!
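To make the idea of matching messy values concrete, here is a minimal Python sketch of a "fingerprint" style key, loosely modeled on the key-collision approach that Refine's clustering offers. It is an illustration under assumptions, not Refine's actual code: the function name `fingerprint` and the sample strings are hypothetical, and replacing punctuation with spaces is a choice made for this sketch.

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Return a normalized key; values that share a key are likely variants."""
    # Fold accented characters to their closest ASCII equivalents.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then turn every punctuation character into a space
    # (an assumption of this sketch).
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = ascii_text.lower().translate(table)
    # split() discards leading, trailing, and repeated whitespace;
    # sorting and de-duplicating the tokens makes word order irrelevant.
    return " ".join(sorted(set(cleaned.split())))


# Hypothetical sample values, not taken from the project data.
print(fingerprint("  Example Publishing House, "))
print(fingerprint("example PUBLISHING house"))
```

Both calls produce the same key, so the two spellings would land in one cluster. A key like this handles spacing, punctuation, and capitalization; typos need a fuzzier comparison, such as an edit-distance method.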
24. END RESULT?
Using Google Refine, we were able to reduce the 3,230 unique values for city (260|a) to just 1,153. For publishers (260|b), we went from 11,342 unique names to approximately 6,500.
This project helped to identify over 2,000 potential candidates for our Nordic American Imprints collection. (These are still being evaluated.)
The controlled publishers, cities of publication and dates will be added to a local 9xx field for faceting in our future special collections discovery tool. Users will be able to browse our Nordic American Imprints collection by publisher, city or state.
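A reduction in unique values like the one reported above can be measured by grouping variants under a shared key and counting the groups. The following Python sketch shows one way to do that outside of Refine. The helper names `simple_key` and `merge_variants` are hypothetical, the city values are made up for illustration, and picking the most frequent spelling as the merged value is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def simple_key(value: str) -> str:
    """Hypothetical key: lowercase, keep only letters, digits, and spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(kept.split())


def merge_variants(values: list[str]) -> dict[str, str]:
    """Map each raw value to the most common spelling in its group."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for value in values:
        groups[simple_key(value)][value] += 1
    mapping = {}
    for counter in groups.values():
        # Use the most frequent spelling as the merged value.
        canonical = counter.most_common(1)[0][0]
        for variant in counter:
            mapping[variant] = canonical
    return mapping


# Made-up sample values, not taken from the project data.
cities = ["Minneapolis", "Minneapolis", "minneapolis ", "Minneapolis,",
          "Decorah", "DECORAH", "Decorah"]
mapping = merge_variants(cities)
before = len(set(cities))
after = len(set(mapping.values()))
print(f"{before} unique values reduced to {after}")
```

The resulting `mapping` can also be reviewed by hand before any merged values are written back to the records, much as Refine asks for confirmation of each cluster before merging.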
28. FOR THE REST, CLICK THE OPTIONS TO SEE WHAT EACH REPRESENTS
Then click “Match All Identical Cells” (or the double checkmarks) to link all cells with this text to this Freebase topic.
29. OR “SEARCH FOR MATCH” TO BRING UP AN AUTO-FILL LIST TO CHOOSE FROM
37. THANK YOU!
Questions?
A public version of this presentation is linked from my (personal) blog: gardenandalibrary.blogspot.com
I’m also happy to take questions by e-mail: weekss@stolaf.edu