The University of California Libraries and the California Digital Library are in the midst of an ambitious project to build a shared system for creating, managing, and providing access to unique digital resources—many of them archival—across the ten campuses. The UC Libraries Digital Collection project, which was defined by the libraries’ Next Generation Technical Services initiative, has three major objectives: 1) configure a digital asset management system where librarians can centrally add and edit digital files and metadata, 2) harvest metadata for digital resources hosted on external platforms, and 3) create a best-of-breed, integrated public interface so end-users can seamlessly search across these disparate resources. In addition to providing critical infrastructure for campus libraries to more efficiently manage and surface digital content, the resulting platform will also provide opportunities for collaboratively growing the collection. In May 2014, we will be about halfway through the project’s implementation—an ideal time to reflect on progress so far, challenges encountered, and how the project relates to broader strategies for connecting people with archives in the digital age.
Sherri Berger is a product manager at the California Digital Library, where she focuses on helping archives, libraries, and museums provide access to their unique and special collections holdings. She is part of a small team behind the Online Archive of California and Calisphere services, and is currently serving as project manager for implementation of the UC Libraries Digital Collection. Her professional interests include digital library assessment, usability and interaction design, and sustainability planning. Sherri holds an MS in Library and Information Science from the University of Illinois Urbana-Champaign.
*All UC; only those listed on the Online Archive of California (OAC)
**UC Libraries only; 2012 report; may or may not be the same collections listed on the OAC
Next-gen website for public access
• One point of access for all content
• Incorporating research, new tech
• Will subsume the Calisphere site
• …and take the Calisphere name
Thank you all for having me today.
I’m very excited to share with you all some work we have been doing at the University of California—an ambitious project to build a systemwide, ten-campus digital library service.
-------------
Before I begin, I want to provide a little bit of background on the California Digital Library, for those who are unfamiliar.
We are part of the University of California system, and organizationally we are part of the Office of the President
One of the ways we serve the university is through our partnerships with the ten UC campus libraries, and the project I’ll be discussing today is an example of one of those partnerships.
Technical development work is centered at CDL, but this is a truly collaborative project, with the continued input of all ten campus libraries
Back-of-the-napkin calculations to understand what proportion of UC special collections are available online
No good numbers on this, systemwide
So I did my own method (with disclaimers)
This 2.8% number seems plausible to me – and even if it’s off by 10 or 20%, the important fact remains that we are talking about the tip of the iceberg – where only a fraction of the UC Libraries’ collections of unique materials are digitized, let alone available online
Today I’m going to be telling you about a project that is at its heart about providing online access to special collections resources.
To me, there are two major parts involved in getting these resources on the web
The first is about digitization – [talk very briefly about Heather’s team here, the fact that they are working on systemwide processes and funding for digitization]
But that’s not actually the subject of this talk
[click for animation, moves to computer monitor visual]
What I’m going to dig into are the processes of describing, managing, and exposing the digital resources that result from digitization (or potentially are born-digital)
This is a part of the story that I don’t think is always told, but it’s crucial – because if you lack the infrastructure for getting those resources out, there’s not a whole lot of reason to digitize in the first place (aside from preservation)
And what makes this project unique is that it’s about doing this on a large scale, working collaboratively
Putting in place a service for not one collection, or one archival repository, or one campus, but TEN university libraries—each of which has a unique set of needs, priorities, and workflows
One disclaimer: this isn’t a case study yet. We are one year through a two-year project.
I can’t tell you about all our great successes (although I’m sure I could in about a year’s time!)
But what I can do is tell you a little bit about
how and why we got where we are
What we’re doing now
And what we think this means for the future
I really welcomed this speaking opportunity because it forced me to take a little break from the day-to-day workings of the project, to pause and take a look around. It’s a reflection point.
What I want to do first is give you a little background as to how things historically have worked at the systemwide level
Now, this is not the first time we have tackled the project of working collaboratively to provide access to unique digital content
Several years ago, a systemwide model was developed that has worked very well and which we continue to use today.
I want to walk you through this model, so you have a sense of where it is that we’re coming from, and what we’ll be doing differently moving forward
The model for getting resources out onto the web looks something like this.
At the bottom we have the content providers. These are the folks that own and are responsible for stewarding the physical content, the digital content, or both. Now I’m talking specifically about UC Libraries in this case, because that’s the focus of the project at this time, but we also partner with other libraries, archives, and museums in much the same way.
In this model, content providers are locally making decisions about digitization and creating metadata.
And when they have a set of described resources, they send us the “final outputs” – the content files and the METS, which is what we require for the system.
This goes into an access repository that underlies two systemwide websites where the resources are exposed.
The first of those sites is the Online Archive of California, which I imagine many of you are familiar with.
The OAC was originally developed as a union database for EAD finding aids to archival collections, and indeed still serves this important purpose.
And the OAC also accepts digital objects, which it presents primarily within the context of finding aids
So, as researchers are searching the collections that have been described on the OAC, they can see representations of the content which have been digitized—sort of like a ‘tip of the iceberg’ effect
The second of those websites is Calisphere.
Calisphere is effectively an alternative view of the OAC that simply does not have the finding aids.
It’s for the user that may not necessarily be conducting “weighty archival research” and who is just interested in the digital copies.
These users include a large volume of students, from K-12 up to graduate level, as well as a wide range of other users
Calisphere has a more topical organization of resources than the OAC to guide these users to the resources of interest and relevance to them
And this has been a pretty good model. It’s worked. This is a graph showing the growth of image objects on OAC/Calisphere. It starts in November 2009 and goes up to April 2014; so that’s about 35,000 digital objects added in the last 4.5 years. Not too shabby.
However, as the years have gone on, a few key gaps have emerged in this model.
The first is that it assumes that those institutions have solutions for creating and managing digital content – and frankly, this is not always the case
About two years ago we asked our UC library contributors what the biggest barriers were to contributing more digital objects to OAC/Calisphere, and one of the top reasons that came up was a need for a system—or a better system—for managing their digital assets.
Obviously the lack of a proper system for describing and managing digital objects limits the amount of content that the libraries contribute to the systemwide collection
Another complicating use case is that some campuses have their own access solutions (that is, their own websites for surfacing content). This model doesn’t incentivize participating in the systemwide/statewide solution, because of a cost-benefit trade-off. It’s actually a fair amount of work to send us content, and because the model is based on hosting that content in the repository, it means redundant work and redundant copies of the objects living on the web. So that’s another place where we are missing content in the systemwide collection because the model doesn’t quite meet the need.
In 2009 the UC Libraries began thinking of a new model for providing access that really responded to these gaps in the systemwide infrastructure
Brief description here of planning process – it was long, and it was highly collaborative
The resulting project is called the UC Libraries Digital Collection—that’s the project we’re engaged in right now.
This project has a vision with two key parts:
The first is a shared discovery platform for the UC Libraries’ unique digital content. One in which everyone can participate easily, whether they have a lot of local resources, or whether they’re using central systems—so a platform that is agnostic to where the content is actually hosted, because users really don’t care.
The second is shared infrastructure that will foster a growing collection and enable us to feed and adapt it long-term.
So, how does this translate into an actual project?
We have three major objectives
Install and configure a shared DAMS
DAMS stands for digital asset management system
All aspects of managing digital assets / objects
To date, campus libraries have been on their own for this function – some have solutions, but 6 campuses have been either using systems that are less-than-ideal, or really just haven’t been able to do as much as they could in terms of digitizing and surfacing content, because they just don’t have anything robust in place.
Shared DAMS will save the libraries money!
For the curious: the software we are using is called Nuxeo. This is an open-source product that is also vendor-supported so we have the best of both worlds—get support and some essential tools, while being able to customize it for our needs.
Also note that one of our goals is to seed the DAMS with existing collections, so the libraries don’t have to start from scratch.
I know I’ve talked at length about UC Libraries, but we hope the harvest part of the project will be directly applicable to non-UC partners as well.
We are currently working on a pilot with the San Francisco Public Library (SFPL) and the Los Angeles Public Library (LAPL) to see if we can use this harvest infrastructure to aggregate their digital special collections as well
So although a lot of this project is about building infrastructure within UC, we are also actively thinking about how to expand the scope of the new service to institutions across the state
Technical perspective: a brand new site, new “guts”
User perspective: an evolution of the Calisphere they know
You’ll remember this slide – the “before”
Now campus libraries can utilize a shared DAMS if they wish
For campuses that have their own systems, we can harvest their content without needing them to actually send us their objects for hosting
The harvested metadata will go into a common index
And finally, we’ll be building a new Calisphere website where all of these digital resources will be aggregated and exposed to the world
So here’s the “after”
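To make the “after” picture concrete, here is a minimal sketch of the harvest idea, assuming a toy in-memory index. The names `MetadataRecord`, `CommonIndex`, `harvest`, and `search`, the field names, and the example.org URLs are all hypothetical illustrations, not the project’s actual design. The point it shows is that only metadata is collected; each record keeps a pointer back to wherever the object is hosted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Descriptive metadata for one digital object (hypothetical fields)."""
    identifier: str
    title: str
    campus: str
    object_url: str  # where the object itself lives; the object is never copied


class CommonIndex:
    """Toy in-memory stand-in for a common index (an assumption, not the real system)."""

    def __init__(self):
        self._records = {}

    def harvest(self, source_name, records):
        """Add or update metadata from one source; return how many records were processed."""
        count = 0
        for record in records:
            # Keying on (source, identifier) means a re-harvest updates in place
            # instead of creating a duplicate entry.
            self._records[(source_name, record.identifier)] = record
            count += 1
        return count

    def search(self, term):
        """Return records whose title contains the term, sorted by title."""
        needle = term.lower()
        matches = [r for r in self._records.values() if needle in r.title.lower()]
        return sorted(matches, key=lambda r: r.title)


# Hypothetical sample data: one source is the shared DAMS, one is a campus's own site.
index = CommonIndex()
index.harvest("shared-dams", [
    MetadataRecord("a1", "Harbor photograph", "Campus A", "https://dams.example.org/a1"),
])
index.harvest("campus-b-site", [
    MetadataRecord("b7", "Harbor survey map", "Campus B", "https://campus-b.example.org/b7"),
])
for record in index.search("harbor"):
    print(record.title, "->", record.object_url)
```

In this sketch a search spans both sources, yet the campus-hosted object stays where it is, which is the redundancy the harvest is meant to avoid.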
As for timeline, it is a 2-year project to actually build this beast.
And, as I mentioned, we are here – just about halfway in
Shooting to have the DAMS available for campus libraries to add new assets and metadata this summer
Goal is to launch the public site—that is, the new Calisphere interface—in summer of next year
The new site and the current Calisphere site will then run simultaneously for a little while, giving users the option to toggle between the different views.
Before I move on to the future implications… visit our wiki!
So we have a sense of where we are now, and where we will be about this time next year.
Building blocks of infrastructure for a growing systemwide collection, including
Management capabilities for campuses that need it
Harvest to reduce redundancy
Plus a beautiful, next-gen public interface for discovery and access
Assuming we pull all that off, what’s on the horizon?
Well, for one, the new technical architecture for Calisphere opens the gate to a lot of cool potential new features on the site
These are just some of the blue-sky ideas we’ve been tossing around—some for several years—and the new site technology will make it much easier for us to pursue these
Of course, we want to make sure we are pursuing the features of most value to end-users, so I imagine that will be among our next steps for the ongoing development of the site
Another way that this project is exciting is that this technical model will hopefully make it possible for us to easily expose content beyond our own website. This common index provides more flexibility than the repository we used to have, in part because it will have an API (if you’re familiar with that).
This is where I think we really start to see the scalability of this model – becomes a platform for access to these resources
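As a rough illustration of why an API turns the index into a platform, the sketch below has one function produce a JSON payload and a separate consumer reuse it. The function names, the response shape (`query`, `total`, `items`), and the sample records are assumptions made up for this example; the actual API may look quite different.

```python
import json


def api_response(records, query):
    """Return a JSON string of records whose title contains the query.

    The response shape (query, total, items) is hypothetical.
    """
    needle = query.lower()
    items = [r for r in records if needle in r["title"].lower()]
    return json.dumps({"query": query, "total": len(items), "items": items})


def exhibit_links(payload):
    """One possible consumer: turn an API payload into (title, url) pairs."""
    data = json.loads(payload)
    return [(item["title"], item["url"]) for item in data["items"]]


# Hypothetical sample records standing in for entries in the index.
sample_records = [
    {"title": "Harbor photograph", "url": "https://dams.example.org/a1"},
    {"title": "Campus newsletter", "url": "https://campus-b.example.org/b7"},
]
print(exhibit_links(api_response(sample_records, "harbor")))
```

Any other consumer, such as a campus exhibit site or an outside aggregator, could parse the same payload without needing access to the index internals.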
I promise this is the last time you’ll see this diagram!
For example, something we are working on right now is aligning our model with the Digital Public Library of America.
Familiar with DPLA? A national aggregation of metadata for digital resources, with a substantial focus on special collections
By structuring our model as we have, we will be able to share more metadata much more quickly
Another possibility for where resources in this collection appear is on campus library-developed websites and exhibits. There are a lot of reasons we can imagine why a library might want to design something custom – for example if they are celebrating an anniversary or they have a grant project they want to highlight, or they want to develop a site for a particular niche audience. One way we are thinking about supporting this is by developing connections with Omeka. Omeka is a digital library exhibit-builder that is very popular, which allows you to fairly quickly create a website with various categories
And finally, before I wrap up, I want to return to the beginning of my talk and the fact that there is still lots of material that remains to be digitized…
What we’re hoping is that by having this infrastructure in place, and by being able to surface these digital resources in so many venues…
…we’ll have an even stronger case for funding for digitization
We’ve heard that granting organizations are increasingly concerned about the long-term sustainability of digital projects – and this makes sense. They want to know that when they pay for digitization, the resulting objects will be both widely and persistently available. With a new technical infrastructure, we will be even better positioned to ask for grants and other funding.
Well. I know no better way to end a presentation than with a giant bag of money. So…
With that, I’ll close. Thank you all for your time.