Denver, Colorado
Sunday, November 4, 2012
Adrian Turner, California Digital Library
Ray R. Larson, School of Information, UC Berkeley
Brian Tingle, California Digital Library
http://www.diglib.org/forums/2012forum/social-networks-and-archival-context-project/
Dlf 2012
1. Ray R. Larson, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
2012 DLF Forum | Denver, CO
4. [Figure: cloud of name headings with the label "Archival Name Authority System". Headings shown: Hamilton, Alexander, 1757-1804; Patton, George S. (George Smith), 1885-1945; Luce, Clare Boothe, 1903-1987; Oppenheimer, J. Robert, 1904-1967; Sontag, Susan, 1933-2004; Washington, George, 1732-1799; Whitman, Walt, 1819-1892; Patton family; Wright, Lloyd, 1890-1978]
5. [Figure: name cloud with the label "Archival Name Authority System", now also including Franklin, Benjamin, 1706-1790; Anthony, Susan B.; Fuller, R. Buckminster (Richard Buckminster), 1895-1983; Berkeley Free Church; Bernstein, Leonard; Block, Herbert, 1909-2001; Bush, Vannevar, 1890-1974; Frankfurter, Felix, 1882-1965]
6. [Figure: name cloud with the label "Archival Name Authority System", crowded with many further name headings, for example Engelland, Jurgen (George); Enwall, Ogie (Aage); Fahl, Hans Johan Fredrik; Gillam, Chandler B., 1833-1899; Henry, Oscar M., 1851-1916; Howard, Barnett Allen, b. 1827; Nakkerud, Trygve Bloch; Rodney family]
7. [Figure: name cloud with the label "Archival Name Authority System", with the name headings repeated and overlapping until the cloud is too dense to read]
8. Archival Name
Authority System
9. Archival Name
Authority System
11. Background
• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities (2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
12. Objectives
1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)
2. Match, merge, and enhance; build a large test corpus of EAC-CPF records
3. Create a prototype biographical resource and access system, using those records
15. Project Team
• University of Virginia, Institute for
Advanced Technology in the Humanities
– Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
– Ray Larson and Yiming Liu
• California Digital Library
– Rachael Hu, Brian Tingle, and Adrian Turner
16. Project Team
• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library
and Information Science)
• Tom Lynch (University of Illinois School of Library
and Information Science)
17.
18. EAC-CPF
• XML-based data structure standard for encoding archival authority records
• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity
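To make the four components listed above concrete, here is a minimal sketch in Python that assembles a simplified EAC-CPF-style record. The element names (eac-cpf, cpfDescription, identity, nameEntry, biogHist, resourceRelation) follow the EAC-CPF schema, but the namespace, the control section, and most attributes are left out, so the output is an illustration and not a valid instance. The helper build_record and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_record(name, entity_type, bio, created, about):
    """Assemble a simplified, non-validating EAC-CPF-style record (hypothetical helper)."""
    root = ET.Element("eac-cpf")
    desc = ET.SubElement(root, "cpfDescription")

    # Authorized name heading for the entity
    identity = ET.SubElement(desc, "identity")
    ET.SubElement(identity, "entityType").text = entity_type
    name_entry = ET.SubElement(identity, "nameEntry")
    ET.SubElement(name_entry, "part").text = name

    # Biographical/historical context for the entity
    description = ET.SubElement(desc, "description")
    bioghist = ET.SubElement(description, "biogHist")
    ET.SubElement(bioghist, "p").text = bio

    # Links to resources created by the entity and resources about it
    relations = ET.SubElement(desc, "relations")
    for rel_type, titles in (("creatorOf", created), ("subjectOf", about)):
        for title in titles:
            rel = ET.SubElement(
                relations, "resourceRelation", {"resourceRelationType": rel_type}
            )
            ET.SubElement(rel, "relationEntry").text = title
    return root


# Sample values for illustration only
record = build_record(
    name="Warren, John Byrne Leicester, 1835-1895",
    entity_type="person",
    bio="Poet; 3rd and last Baron de Tabley.",
    created=["Tabley Muniments"],
    about=["Hypothetical biography of de Tabley"],
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the name heading, the biographical text, and the resource links in separate branches mirrors the idea that the entity is described independently of any single collection.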
23. Data Sources
• EAD finding aids [~150,000]
– 13 regional and statewide consortia
– 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~1.5 million]
– OCLC WorldCat
• Authority records
– OCLC Research: Virtual International Authority File (VIAF) [~16 million]
– Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
– Additional name records from Archives nationales, British Library, NARA, New York State Archives, and Smithsonian Institution Archives
24. Consortia
• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• EAD FACTORY (OhioLink)
• Points
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage
Individual institutions
• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University
32. Activities
1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
2. Develop a blueprint for a sustainable, national archival authority cooperative
34. Activities
1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
2. Develop a blueprint for a sustainable, national archival authority cooperative
Stay tuned for fall 2013!
35. Ray R. Larson, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
2012 DLF Forum | Denver, CO
36. Brian Tingle and Adrian Turner
RBMS
Pre Conference 2012
San Diego, CA
37. The Social Networks and Archival
Context Project: Status Report
Adrian Turner*, Ray R. Larson**, Brian Tingle*
*California Digital Library
**University of California, Berkeley - School of Information
Thanks to Daniel V. Pitti of the Institute for Advanced Technology in the
Humanities, University of Virginia, and Brian Tingle of the California Digital
Library for many of the slides here
DLF 2012 - Denver
2012-11-04 - SLIDE
38. Funding and People
• Funding and Timeline
– National Endowment for the Humanities
– May 2010-April 2012
– Andrew W. Mellon Foundation
– May 2012-April 2014
• People
– Daniel Pitti (PI) and Worthy Martin (Institute for Advanced
Technology in the Humanities, University of Virginia)
– Adrian Turner and Brian Tingle (California Digital Library,
University of California)
– Ray Larson (School of Information, University of California,
Berkeley)
39. Two Interrelated Project
• Further the transformation of archival description
(separate description of records from description of people
documented in them) in order to …
• Enhance access to archival resources, though in fact all
cultural heritage resources
• Enhance understanding of resources by providing the
social-professional context within which people lived and
worked
40. The Source Data
• EAD-encoded finding aids (guides to archival
records)
– 150K
– Primarily from U.S. sources, but also U.K. and
France
• Archival authority records (360K)
– National Archives and Records Administration
– State Archive of New York
– Smithsonian Institution
– British Library
– National Archives (France) & BnF
• WorldCat Archival Descriptions: 2M
41. Library and Museum Authority Records
• Getty Vocabulary Program: Union List of
Artist Names (293K personal and corporate
names)
• Virtual International Authority File (16M+
cluster records)
– Contributed from around the world by national
libraries and others
42. Methods and Processing
• Extract EAC-CPF records from existing EAD-
encoded archival descriptions
– Extracting both creators and referenced CPF
names
• Match EAC-CPF records against one another and
against existing authority records (ULAN, VIAF,
LCNAF)
– Enhance EAC-CPF by normalizing entries, adding
alternative entries, titles (VIAF), and historical data
(ULAN)
• Create a prototype historical resource and access
system
– Historical data and social-professional networks
– Links to archive, library, and museum resources (by
and about)
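The first step above, pulling creator names and referenced names out of an EAD finding aid, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's extraction code: it assumes lowercase EAD 2002 XML element names (origination, persname, famname, corpname), the function extract_names is hypothetical, and the sample record borrows names from the Tabley Muniments finding aid with one reference tagged by hand for the example.

```python
import xml.etree.ElementTree as ET

# Small hand-made sample; the persname inside scopecontent is tagged for illustration only.
SAMPLE_EAD = """<archdesc level="fonds">
  <did>
    <origination label="Creator">
      <famname source="ncarules">Warren, family, of Tabley, Cheshire</famname>
      <persname source="ncarules">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</persname>
    </origination>
  </did>
  <scopecontent>
    <p>Correspondents include <persname>Gosse, Edmund</persname>.</p>
  </scopecontent>
</archdesc>"""

# EAD name elements mapped to corporate body / person / family (CPF) entity types
NAME_TAGS = {"persname": "person", "famname": "family", "corpname": "corporateBody"}


def extract_names(ead_xml):
    """Return one dict per CPF name, marked as creator or merely referenced."""
    root = ET.fromstring(ead_xml)
    # Names found under <origination> are treated as creators of the records.
    creator_ids = {
        id(el)
        for origination in root.iter("origination")
        for el in origination.iter()
        if el.tag in NAME_TAGS
    }
    records = []
    for el in root.iter():
        if el.tag in NAME_TAGS:
            name = " ".join("".join(el.itertext()).split())
            records.append({
                "name": name,
                "entity_type": NAME_TAGS[el.tag],
                "role": "creator" if id(el) in creator_ids else "referenced",
            })
    return records


for rec in extract_names(SAMPLE_EAD):
    print(rec)
```

Each resulting dict could then seed one EAC-CPF record, with the role deciding whether the finding aid is linked as a resource created by the entity or one that refers to it.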
43. Example EAD Record (Hub)
<EAD>
<EADHEADER LANGENCODING = "ISO 639">
<EADID>
GB 0133 TAB
</EADID>
<FILEDESC>
<TITLESTMT>
<TITLEPROPER>
Tabley Muniments
</TITLEPROPER>
</TITLESTMT>
<PUBLICATIONSTMT>
<PUBLISHER>
John Rylands University Library of Manchester
</PUBLISHER>
<ADDRESS>
<ADDRESSLINE>
150 Deansgate
</ADDRESSLINE>
<ADDRESSLINE>
Manchester
</ADDRESSLINE>
<ADDRESSLINE>
... (Parts removed )…
</FRONTMATTER>
<ARCHDESC LEVEL = "FONDS" LANGMATERIAL = "English">
<DID>
<REPOSITORY>
University of Manchester, John Rylands University Library of Manchester
</REPOSITORY>
<UNITID ENCODINGANALOG = "ISADG3.1.1." COUNTRYCODE = "GB" REPOSITORYCODE = "0133">
GB 0133 TAB
</UNITID>
<UNITTITLE LABEL = "Title" ENCODINGANALOG = "ISADG3.1.2.">
Tabley Muniments
</UNITTITLE>
<UNITDATE LABEL = "Dates of Creation" ENCODINGANALOG = "ISADG3.1.3.">
19th century
</UNITDATE>
<PHYSDESC LABEL = "Extent" ENCODINGANALOG = "ISADG3.1.5.">
<EXTENT>
1.24 cu.m
</EXTENT>
</PHYSDESC>
<ORIGINATION LABEL = "Creator" ENCODINGANALOG = "ISADG3.2.1.">
<FAMNAME SOURCE = "NCARULES">
Warren, family, of Tabley, Cheshire
</FAMNAME>
<PERSNAME SOURCE = "NCARULES">
Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet
</PERSNAME>
</ORIGINATION>
</DID>
44. Example EAD Record (Hub)
<BIOGHIST ENCODINGANALOG = "ISADG3.2.2.">
<HEAD>
Administrative/Biographical History
</HEAD>
<P>
The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire,
was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian,
the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He
was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he
published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly
under his own name.
</P>
<P>
His early verse included
<TITLE>
Praeterita
</TITLE>
(1863),
<TITLE>
Eclogues and Monodramas
</TITLE>
(1864),
<TITLE>
Studies in Verse
</TITLE>
(1865),
<TITLE>
Philoctetes
</TITLE>
(1866), and
<TITLE>
Orestes
</TITLE>
(1868). His early work was Tennysonian in style, but he was later to be influenced by both Browning and
Swinburne. In 1873 he produced …. (some data removed)…
45. Example EAD Record (Hub)
<SCOPECONTENT ENCODINGANALOG = "ISADG3.3.1.">
<HEAD>
Scope and Content
</HEAD>
<P>
The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in
literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian
figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates.
</P>
<P>
Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert
Bridges. There are volumes of Tabley's essays and verse, as well as a considerable number of notebooks and
loose manuscripts of verse and other writings. There are various bundles and boxes relating to
"Coins", "Botany", "Poetry", "Literary", "Financial"
and bookplates.
</P>
</SCOPECONTENT>
<ADD>
<OTHERFINDAID ENCODINGANALOG = "ISADG3.4.6.">
<P>
Preliminary survey list.
</P>
</OTHERFINDAID>
<RELATEDMATERIAL ENCODINGANALOG = "ISADG3.5.3.">
<P>
There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM.
The Library also has custody of the important Tabley Book Collection.
</P>
</RELATEDMATERIAL>
<SEPARATEDMATERIAL>
<P>
The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record
Office. Some of these papers were originally in the custody of the John Rylands University Library
of Manchester.
</P>
</SEPARATEDMATERIAL>
</ADD>
47. 2010-2012 Extraction Results
• Source data: 30,000 finding aids
• EAC-CPF records extracted
– LoC: 43,702 from 1,159 finding aids
– OAC: 91,811 from ~15,400
– NWDA: 22,609 from 5,160
– VH: 15,175 from 8,390
– Total 173,297
49. The Problem
• Proliferation of the forms of names
– Different names for the same person
– Different people with the same names
• Examples
– from Books in Print (semi-controlled but not
consistent)
– ERIC author index (not controlled)
52. Library and Archive Authority
• Library (or bibliographic) authority control is almost
exclusively about the control of names
• Archival authority control involves biographical-
historical description of the CPF entity
– Descriptions based on controlled vocabularies, for
example, occupations, place of birth and death
– But also biographical-historical description
• Prose
• Chronological list
• Archival authority control provides context for
understanding records, the context of their
creation, the provenance
53. Merging EAC-CPF Records
[Diagram: records from the EAC Repository are connected using exactly matching names; Cheshire Search over the LCNAF Repository, VIAF Repository, and ULAN Repository connects authority record information; the Repository of connected EAC Records then feeds a "Merge records" step that produces the Repository of merged EAC Records. Storage label: (MongoDB)]
54. Merging EAC-CPF Records
[Diagram: records from the EAC Repository are connected using exactly matching names; Cheshire Search over the VIAF Repository connects authority record information; the Repository of connected EAC Records then feeds a "Merge records" step that produces the Repository of merged EAC Records. Storage label: (MongoDB)]
55. Connect Exact Matches
• The EAC-CPF records provide the names
without having to parse texts, etc.
• Allows us to use some simple methods like
exact matching
– Assume identical name entries mean the
same person/corporate body/family
– Enter the full names and record IDs into a
database and flag IDs with same names for
merging
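The exact-matching step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-memory dictionary in place of the database; the function flag_exact_matches and the record IDs are hypothetical.

```python
from collections import defaultdict


def flag_exact_matches(records):
    """Group record IDs by identical name entry and keep names seen more than once.

    `records` is an iterable of (record_id, name_entry) pairs.
    """
    by_name = defaultdict(list)
    for record_id, name in records:
        by_name[name].append(record_id)
    # Only names shared by two or more records are candidates for merging.
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


# Hypothetical record IDs; the name strings come from examples in this deck.
records = [
    ("loc-001", "Taylor, Zachary, 1784-1850."),
    ("oac-017", "Taylor, Zachary, 1784-1850."),
    ("vh-203", "Zachary Taylor."),
    ("nwda-042", "Tabb, John Banister, 1845-1909."),
]
flagged = flag_exact_matches(records)
print(flagged)
```

Note that "Zachary Taylor." is not flagged: exact matching only links records whose name entries are character-for-character identical.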
56. But…
• Exact merging assumes that archives are
following LC cataloging practice in their
EAD records
– There are some problems with this assumption
57. Some failures for merging…
• Different abbreviations:
– A. & G. Carisch & C.
– A. & G. Carisch & Co.
• And spacing issues:
– A. C. Peters & Bro.
– A. C. Peters & Brother.
– A. C. Peters. (??)
– A. C.Peters & Bro.
• Completeness and alternate rules
– Tabb, John B. (John Banister), 1845-1909.
– Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts
58. More…
• Variant romanizations (and spacing):
– M. P. Belaieff.
– M. P. Belaïeff.
– M. P. Bieliaev.
– M.P. Belaïeff.
– M.P.Belaïeff.
• Initials vs. names:
– Zabolotskii, N.A.
– Zabolotskii, Nikolai Alekseevich, 1903-1958.
– Zabolotskii.
59. More…
• Inverted order vs. uninverted
– Taylor, Zachary, 1784-1850.
– Zachary Taylor.
• Various combinations:
– Tchaikovsky, Peter I.
– Tchaikovsky, Pëtr Il.
– Tchaikovsky, Piotr Ilyich.
– Tchaikovsky, Pyotr Il.
– Tchaikovsky, Pyotr Ilyich.
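Some of these failures (spacing, diacritics, trailing punctuation) can be absorbed by comparing a normalized key instead of the raw string, while others (variant romanizations, abbreviations, inverted order) cannot. The sketch below is one possible normalization, offered as an assumption for illustration and not as the rule set the project applied; match_key is a hypothetical helper.

```python
import unicodedata


def match_key(name):
    """Build a comparison key: strip diacritics, case, whitespace, and trailing periods."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that "ï" compares equal to "i".
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove all whitespace and any trailing period.
    return "".join(no_marks.casefold().split()).rstrip(".")


variants = [
    "M. P. Belaieff.",
    "M. P. Belaïeff.",
    "M. P. Bieliaev.",
    "M.P. Belaïeff.",
    "M.P.Belaïeff.",
]
keys = {match_key(v) for v in variants}
print(sorted(keys))
```

Four of the five forms collapse to one key; the variant romanization "Bieliaev" still stands apart, which is the kind of case that needs authority-file or fuzzy matching.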
61. Search Authority Files
• For each name, formulate a search of the
VIAF database using the Cheshire system
(SGML/XML retrieval system with
probabilistic and Boolean matching)
– Search both the “authoritative” and “non-
authoritative” forms
– Consider any name matching a non-
authoritative form to be a candidate match for
the authoritative form
– Flag EAC records that match the same
authority record as potential matches
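The flagging logic above can be sketched as follows. This is a simplified stand-in: a plain dictionary lookup replaces the Cheshire search (which uses probabilistic and Boolean retrieval), and the authority record, its identifier, and the EAC record IDs are all made up for the example.

```python
from collections import defaultdict

# Hypothetical miniature authority file: one cluster with its name forms.
AUTHORITY = {
    "viaf-0001": {
        "authoritative": "Tchaikovsky, Peter Ilich, 1840-1893",
        "variants": [
            "Tchaikovsky, Pyotr Ilyich",
            "Tchaikovsky, Piotr Ilyich",
            "Tchaikovsky, Peter I.",
        ],
    },
}


def _key(name):
    # Light normalization so a trailing period or case difference does not block a hit.
    return name.casefold().rstrip(".")


def build_index(authority):
    """Map every authoritative and non-authoritative form to its authority ID."""
    index = {}
    for auth_id, forms in authority.items():
        for form in [forms["authoritative"], *forms["variants"]]:
            index[_key(form)] = auth_id
    return index


def flag_by_authority(eac_records, index):
    """Return authority IDs matched by two or more EAC records (potential merges)."""
    hits = defaultdict(list)
    for rec_id, name in eac_records:
        auth_id = index.get(_key(name))
        if auth_id is not None:
            hits[auth_id].append(rec_id)
    return {auth_id: ids for auth_id, ids in hits.items() if len(ids) > 1}


eac_records = [
    ("eac-1", "Tchaikovsky, Pyotr Ilyich."),
    ("eac-2", "Tchaikovsky, Peter I."),
    ("eac-3", "Tchaikovsky, Pëtr Il."),
]
potential = flag_by_authority(eac_records, build_index(AUTHORITY))
print(potential)
```

Here eac-1 and eac-2 are flagged because both hit the same authority record through different non-authoritative forms; eac-3 finds no form in this toy index and stays unmatched.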
62. NGRAM or Shingle Matching
Name: Einstein Albert
Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
Probability that the sequence (ins, nst, ste) follows ein is very high for the
name einstein
Shingle Language Model for names
Krishna Janakiraman and Sean Marimpietri - Biograph
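As a rough illustration of shingling, the sketch below cuts names into character trigrams and compares the resulting sets. It deliberately swaps in Jaccard set overlap for the shingle language model named on the slide, which scores shingle sequences probabilistically; the functions shingles and jaccard are hypothetical helpers.

```python
def shingles(name, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-collapsed name."""
    text = " ".join(name.casefold().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two names (0.0 to 1.0)."""
    set_a, set_b = shingles(a), shingles(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


base = "Einstein Albert"
for other in ("Albert Einstein", "Ainshtain Albert", "Taylor, Zachary"):
    print(f"{base!r} vs {other!r}: {jaccard(base, other):.2f}")
```

Reordered and re-romanized forms of the same name keep a substantial share of their shingles, while an unrelated name shares none, so a threshold on this score can propose candidate matches that exact comparison misses.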
63. [Figure: trigram shingle sets for three forms of one name. Name 1: Einstein Albert; Name 2: Ainshtain Albert; Name 3: Albert Einstein. Shingles shown include ein, ins, nst, ste, tei, alb, lbe, ert, rte, ain, nsh, sht, hta, and tai.]
Shingle Language Model for names
Krishna Janakiraman and Sean Marimpietri - Biograph
65. Merge Flagged Records
• For all of the exact matches and authority
matches
– Use the Authoritative form of the name
– Combine data from each match into a single
EAC-CPF record
– Retain all source record IDs and information
• Finally, output the merged EAC-CPF
records
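A minimal sketch of the merge step, assuming each flagged record is a plain dictionary with an ID, a source, a name, and a list of related resources. The field names, the function merge_records, and the sample values are assumptions for illustration; real EAC-CPF records carry far more structure.

```python
def merge_records(authoritative_name, records):
    """Combine flagged records into one, keeping every source ID and variant name."""
    merged = {
        "name": authoritative_name,   # use the authoritative form of the name
        "alternate_names": [],
        "relations": [],
        "sources": [],
    }
    for rec in records:
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternate_names"]:
            merged["alternate_names"].append(rec["name"])
        for relation in rec.get("relations", []):
            if relation not in merged["relations"]:
                merged["relations"].append(relation)
        # Retain the provenance of every contributing record.
        merged["sources"].append({"id": rec["id"], "source": rec["source"]})
    return merged


# Hypothetical input records for one flagged group
flagged_group = [
    {"id": "loc-12", "source": "LoC",
     "name": "Tabb, John B. (John Banister), 1845-1909.",
     "relations": ["Papers A"]},
    {"id": "vh-7", "source": "VH",
     "name": "Tabb, John Banister, 1845-1909.",
     "relations": ["Papers A", "Letters B"]},
]
merged = merge_records("Tabb, John B. (John Banister), 1845-1909.", flagged_group)
print(merged)
```

Keeping the source list separate from the merged data means a merge can later be audited or undone per contributing record.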
66. Inputs to SNAC merging
• LoC: 43,702 EAC-CPF records derived from 1,159 finding aids
• OAC: 91,814 EAC-CPF records derived from ~15,400 finding aids
• NWDA: 24,952 EAC-CPF records derived from 5,568 finding aids
• VH: 15,175 EAC-CPF records
• Total: 175,688 input EAC records for merging
• Result: 128,781 “unique” names
67. Another view of the numbers…
• 95,624 Person names merged from 125,555 Person records
• 31,287 Institutions merged from 47,189 Institution records
• 1,980 Families merged from 2,899 Family records
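The per-type counts above can be turned into merge ratios with a short calculation. The sketch below only does arithmetic on the figures quoted on this slide; summed, they give 175,643 input records and 128,891 merged names, slightly different from the totals of 175,688 and 128,781 quoted on the previous slide, and the source does not explain the gap.

```python
# (merged names, input records) per entity type, as quoted on this slide
counts = {
    "person": (95_624, 125_555),
    "institution": (31_287, 47_189),
    "family": (1_980, 2_899),
}


def summarize(counts):
    """Return total merged names, total input records, and the merged/input ratio per type."""
    total_merged = sum(merged for merged, _ in counts.values())
    total_records = sum(records for _, records in counts.values())
    ratios = {kind: merged / records for kind, (merged, records) in counts.items()}
    return total_merged, total_records, ratios


total_merged, total_records, ratios = summarize(counts)
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.1%} of input records remain as distinct names")
print(f"all types: {total_merged:,} names from {total_records:,} records")
```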
68. Merging Conclusions
• There will not be a single merging method,
but a staged set of approaches that will
take us from the simplest exact
matches to (we hope) reliably identifying
variant forms of a name when
corroborated by contextual information
(dates, etc.)
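A staged approach of this kind might look like the sketch below. It is not the project's matcher: the two stages, the 0.85 threshold, the record fields, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def match_stage(a, b, threshold=0.85):
    """Return 'exact', 'variant', or None for two name records."""
    na, nb = normalize(a["name"]), normalize(b["name"])
    # Stage 1: simplest case, identical normalized names.
    if na == nb:
        return "exact"
    # Stage 2: accept a close variant only when the dates corroborate it.
    dates_agree = a.get("dates") is not None and a.get("dates") == b.get("dates")
    if dates_agree and SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "variant"
    return None


# Illustrative records; the second name form is a made-up variant.
walt = {"name": "Whitman, Walt", "dates": (1819, 1892)}
walter = {"name": "Whitman, Walter", "dates": (1819, 1892)}
undated = {"name": "Whitman, Walter"}

print(match_stage(walt, walt), match_stage(walt, walter), match_stage(walt, undated))
```

Further stages (for example, matching against external authority files) would slot in after the second check in the same way.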
69. Next
• Developing an updateable database of
merged EAC data (dumping Mongo for
PostgreSQL)
– Will permit incremental addition of new data
and support editing and “forced” merges
• Process the 2M WorldCat archival
descriptions
• Process the 150,000 finding aids
• Convert several hundred thousand archival
authority records into EAC-CPF and run them
through the match/merge process
70. Methods and Processing
• Extract EAC-CPF records from existing EAD-
encoded archival descriptions
– Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and
against existing authority records (ULAN, VIAF,
LCNAF)
– Enhance EAC-CPF by normalizing entries, adding
alternative entries, titles (VIAF), and historical data
(ULAN)
• Create a prototype historical resource and
access system
– Historical data and social-professional networks
– Links to archive, library, and museum resources
(by and about)
71. For More Information
• http://socialarchive.iath.virginia.edu/
(Project website)
• http://socialarchive.iath.virginia.edu/xtf/search
(public prototype)
74. Outline
• User Persona
• Search and Display
• Network graph visualization
• Linked Data / RDF
• Future Plans
75. Meet the target users
Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand
or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)
• Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic
families and networks. Sometimes he comes to the site looking for information on specific people; other
times he is looking for information on a specific subject or event. He also TAs an undergraduate history
class and sometimes has to help students find topics for papers.
• Connie: Works at an institution that contributed records to the project. Will be asking
how this site would be useful to the institution's users. Wants to understand how their records were
used and what the added value is.
• Quincy: Library school student working to QA record matching.
• Adele: Person doing authority work during collection processing.
• Lenny: Likes linked data, and wants to be able to mine the links that have been established
programmatically.
76. Outline
• User Persona
• Search and Display
• Network graph visualization
• Linked Data / RDF
• Future Plans
102. Outline
• User Persona
• Search and Display
• Network graph visualization
• Context widget (needs new name)
• Linked Data / RDF
• Future Plans
103. TinkerPop graph database stack
• Simple "property graph" model
• "JDBC for graph databases" [SNAC is using Neo4j for
the graphDB]
• XPath-like "Gremlin" for graph query
• REST interfaces with "Rexster"
• For me, this was 10 to 100 times easier than using RDF
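For readers unfamiliar with the "property graph" model, the sketch below shows the idea in plain Python: vertices and directed, labeled edges, each carrying key/value properties. It does not use TinkerPop, Gremlin, or Neo4j; the class, the `memberOf` label, and the sample data are hypothetical and for illustration only.

```python
class PropertyGraph:
    """Vertices and directed, labeled edges; both carry key/value properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.edges = []      # (out vertex id, label, in vertex id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, out_v, label, in_v, **props):
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((out_v, label, in_v, props))

    def out(self, vid, label=None):
        """Ids of vertices reached by outgoing edges, optionally filtered by label."""
        return [i for o, l, i, _ in self.edges
                if o == vid and (label is None or l == label)]


g = PropertyGraph()
g.add_vertex("p1", name="Patton, George S.", entity_type="person")
g.add_vertex("f1", name="Patton family", entity_type="family")
g.add_edge("p1", "memberOf", "f1", source="example finding aid")

family_names = [g.vertices[v]["name"] for v in g.out("p1", "memberOf")]
print(family_names)
```

A traversal such as `g.out("p1", "memberOf")` is the kind of step that a graph query language chains together.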
113. Outline
• User Persona
• Search and Display
• Network graph visualization
• Linked Data / RDF
• Future Plans
114. What is Linked Open Data?
• W3C Semantic Web Technology Stack
• Web of atomized data, not a web of documents
• RDF; OWL ontologies; SPARQL queries; triple/quad/quint
stores
• httpRange-14; content negotiation; CURIE
• No restrictions on data use; free and easy license
• Lenny wants it, but does Randy?
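As a rough illustration of "atomized data", the sketch below stores RDF-style statements as (subject, predicate, object) tuples and answers a single triple pattern, which is the basic building block of a SPARQL query. It uses no RDF library; the `http://example.org/` identifiers and the people are hypothetical, while the two predicates come from the FOAF vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/person/"   # hypothetical namespace for this sketch

# Each statement is one (subject, predicate, object) triple.
triples = {
    (EX + "a", FOAF + "name", "Person A"),
    (EX + "b", FOAF + "name", "Person B"),
    (EX + "a", FOAF + "knows", EX + "b"),
}


def match(pattern, store):
    """Return the triples matching (s, p, o); None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


# "Whom does person a know?" expressed as one triple pattern.
known_by_a = [o for _, _, o in match((EX + "a", FOAF + "knows", None), triples)]
print(known_by_a)
```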
115. What is Linked Open Data?
• Getting to the good stuff
• Blue underlined text
• Pulling in data from multiple sources, in an intelligent
way, into a "document"
• Understand and discover relationships
• Open access for research, education, private study and
other fair use
125. My opinion on the use cases for W3C RDF tech
• Good for publishing data
• Good for controlled vocabularies
• Data models?
• Most people with open source RDF-store type systems
do the real stuff with Solr
• Consider a graph database
127. Outline
• User Persona
• Search and Display
• Linked Data / RDF
• Network graph visualization
• Future Plans
128. Future Plans
• Conduct assessment activities involving members of target
audiences to establish mental model of users for design work
• Scale interface to millions of names
• Visualizations useful and integrated (network and geospatial)
• Stable URLs between batches for linked data
• Social and personalization features (gateway to crowdsourcing)
• Integration with local systems (such as with the context widget)