Next generation data services at the Marriott Library
1. Next generation data services
at the Marriott Library
Rebekah Cummings
J. Willard Marriott Library
March 5, 2014
2. Two questions to tackle today:
How do you see the data needs of the social
sciences and humanities changing and evolving
over the next five years?
How do you imagine libraries being involved in
this changing landscape and how might we
partner with campus faculty around their data
needs?
4. UCLA Social Science Data Archive
Provided 1:1 consultations to research teams
Conducted data management workshops
Worked closely with the UCLA Civil Rights Project
to create a preservation strategy for their
publication and datasets.
6. What is data curation?
“The active and ongoing management of data
through its lifecycle of interest and usefulness to
scholarship, science, and education. Data curation
activities enable data discovery and retrieval,
maintain its quality, add value, and provide for
reuse over time.” – University of Illinois’ Graduate
School of Library and Information Science
8. Social Science Data
Opinion polls
Surveys
Interviews
Government records
Social/mass media
Laboratory experiments
Field experiments
Census records
Voting records
Economic indicators
9. Humanities Data
Newspapers
Photographs
Letters
Diaries
Books, articles
Birth, death, marriage records
Church records
Court records
Yearbooks
Maps
10. Why libraries?
Opportunity to expand our services in ways that
can benefit faculty.
Opportunity to build stronger relationships
between libraries and research communities.
We can continue to play a role in preserving the
scholarly record and making it available.
11. We have the skills!
Organizing information
Describing information (metadata)
Grant writing
Copyright and licensing
Digital preservation
Open access
Instructional experience
Reference work
Professional ethics
12. Our faculty and students have spoken
In response to the question “What services should be
added at the Marriott Library?” on the Strategic
Planning survey:
40% of users responded by saying they need more
assistance with research data.
60% of library employees responded by saying we
should provide more assistance with research data.
Key recommendation from the external library
consultants was to develop more support for
research data management.
13. Data Policies and Funder Requirements
2011 – National Science Foundation Data Management Plan
requirement
2013 – White House Public Access to Federally Funded
Research memo
2014 – NEH Office of Digital Humanities Data Management
Plan requirement
14. Journal Requirements
“A condition of publication in a Nature
journal is that authors are required to
make materials, data, code, and
associated protocols promptly
available to readers without
qualification.”
– Nature’s open data policy
15. Challenges
Data is unlike materials we’ve worked with in the past
Needs context to be understandable
Varies greatly in size and complexity
Version control
Ethical considerations
Most data are born-digital objects
16. Digital Libraries vs. Data Curation
[Figure: file-format icons. Formats shown: PDF, TIFF, WAV, TXT, PPT, SPSS, XLSX, DOC, GPSS, PY, CSV]
17. Most researchers:
Were not trained in data management
Don’t know how to write a data management plan
Don’t know how to create proper metadata
Have concerns about sharing their data
Aren’t convinced they really have to share their
data
Adapted from http://www.slideshare.net/carlystrasser/iassist20120608
19. The last five years (2010 – 2015), cont.
New policies and funder requirements
Development of best practices and standards
Technical infrastructure
Growing community of research data managers
Case studies as models for excellence
Tools for helping researchers
20. Question #1
How do you see the data needs of
the social sciences and humanities
changing and evolving over the
next five years?
21. Almost everyone is working digitally now
[Figure: two diagrams, labeled 1990 and 2020, each comparing "Researchers" with "Researchers Using Technology"]
23. Larger datasets
HathiTrust Research Center
10.5 million volumes
3.6 billion pages
1890-present
Discover patterns over time
that were previously
invisible
Twitter Archive
Acquired by Library of
Congress, April 2010
As of January 2013, 170
billion tweets
Available to researchers 6
months after posting
27. Changing metrics
Altmetrics
Easier to count use than citations
“Impact” means something different than it did
ten years ago.
28. Increased awareness of the importance
of data sharing
Data management plan requirements
Journal requirements
White House directive
Trends towards transparency and
openness.
30. With all this in mind, remember…
NONE OF US WERE
TRAINED FOR THIS!!!
31. Question #2
How do you imagine libraries being
involved in this changing landscape
and how might we partner with
campus faculty around their data
needs?
32. Research Data Services
Education and Training
Data Management Plan
help
Data Consultation
Metadata Assistance
Analysis and Visualization
Tools (Digital Scholarship
Lab)
Long-term stewardship/
preservation
Repository services
(Uspace)
Mint DOIs and ARKs
Catalog Datasets
Data reference and
acquisition
33. Research Data Curation pilot projects
University of Minnesota
8 months (May – Dec 2013)
Call for pilot datasets among the faculty
Outputs
Data curation workflow
Five pilot datasets
Summary report
Faculty engagement
37. Embedded librarianship
Librarians written into grant proposals as
data managers.
Cost is underwritten in the grant
Option for Sustainability
Allows us to get involved at the beginning
and throughout the entire data lifecycle!
38. The UCLA Civil Rights Project
Started as part of a class project
High profile, mostly quantitative, social science data
Two P.I.s and many graduate students in a distributed
research team.
39. Website to eScholarship
Moved 72 CRP
publications to
eScholarship
Structured metadata
Open access scholarly
publishing for the UC
Community
Some preservation
strategies
40. UCLA CRP Dataverse
Secured datasets and codebooks from CRP researchers
Added four datasets to Dataverse
Converted files to non-proprietary format
Added structured metadata
Added data citation and persistent identifier
Linked to related publications
Worked out data governance with researchers
Created workflow with research team for data and publication
archiving
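The Dataverse steps above (a non-proprietary format, structured metadata, a data citation with a persistent identifier) can be sketched in a few lines of Python. This is only an illustration, not the workflow the CRP team actually used: the helper names `rows_to_csv`, `build_record`, and `format_citation`, the field names, the sample values, and the placeholder identifier are all hypothetical, and a real deposit would follow the repository's own metadata schema.

```python
import csv
import hashlib
import io
import json


def rows_to_csv(header, rows):
    """Serialize tabular data as CSV text, a plain-text, non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_record(title, creators, year, publisher, identifier, csv_text):
    """Bundle descriptive metadata with a SHA-256 checksum of the data file."""
    return {
        "title": title,
        "creators": creators,
        "year": year,
        "publisher": publisher,
        "identifier": identifier,  # placeholder, not a real DOI or ARK
        "format": "text/csv",
        "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
    }


def format_citation(record):
    """Assemble a simple data citation: creators (year). Title. Publisher. Identifier."""
    authors = "; ".join(record["creators"])
    return f'{authors} ({record["year"]}). {record["title"]}. {record["publisher"]}. {record["identifier"]}'


# Hypothetical sample data, for illustration only.
csv_text = rows_to_csv(["district", "enrollment"], [["A", 1200], ["B", 950]])
record = build_record(
    title="Example Enrollment Dataset",
    creators=["Doe, Jane", "Roe, Richard"],
    year=2013,
    publisher="Example Dataverse",
    identifier="hdl:EXAMPLE/0000",
    csv_text=csv_text,
)
print(json.dumps(record, indent=2))
print(format_citation(record))
```

The checksum is stored alongside the descriptive fields so that a later copy of the file can be verified against the record, which is one small piece of a preservation strategy.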
42. Future concerns
Developing a cost model to support the ongoing
expense of curating research data.
Longevity of file formats
Data governance issues
Developing library staff to support data curation
activities
Most of you probably only know me as the Assistant Director for the Mountain West Digital Library. But I did have a former life before moving to Utah and for two years that life primarily revolved around data.
For one year I was on an NSF-funded grant studying the data practices of scientists. We studied scientists as they collected and analyzed their research data and interviewed them about their data practices. This was Katrina Edwards, a P.I. in one of the labs we were studying. Katrina was the one who gave us access to the deep sea biosphere labs because… [tell IODP cruise ship stories]
I also did a one-year internship at the UCLA Social Science Data Archive, where I received valuable hands-on experience working with researchers on their data. I had a chance to conduct 1:1 consultations with research teams, give workshops at UCLA on data management, and even work closely with the UCLA Civil Rights Project to archive their publications and datasets.
Data curation tends to make people nervous. For many of us this was a concept that didn’t exist in library school. We’ve heard the term floating around for a while but may not be sure exactly what it means.
The skills it takes to be a library that excels in data curation already exist here.
Everything that we are talking about is an evolving discussion.
Before we tackle these questions, I want to answer some of the basic questions you may have about data curation.
The most obvious question… what is it?
Data curation is often described as a set of activities.
Often useful to imagine data curation as a lifecycle.
For curation to be successful, the data manager needs to be involved as early in the lifecycle of data as possible.
http://www.bu.edu/datamanagement/background/data-life-cycle/#creating
We have created services at the Marriott Library based on this model.
The one truism is that the earlier a data manager can get involved in the process, the more effective data curation will be.
We all tend to think of “Data” as being something that is more prevalent in the hard sciences but every researcher on this campus uses data. Data curation in the social sciences is nothing new. ICPSR has been around for over 50 years. Census data has been collected and broadly used since 1790.
Many of these items have been digitized and are now available online through different libraries and archives.
This is the kind of material we worked with most often in the Mountain West Digital Library.
I like to think of data in Michael Buckland’s terms, as “alleged evidence” for whatever it is you are trying to prove.
It has always been part of our mission to support the research activities of our researchers on campus and to preserve the scholarly outputs of research. In the past that meant journals and books, but now data are starting to be seen as scholarly outputs of research as well.
History of retooling ourselves to meet the needs of our research community. Things that we may not be great at – data analysis or data visualization.
40% of the 2400 respondents said that they want more assistance with research data.
This became a key recommendation from the strategic planning consultants.
Despite all these challenges, data curation has consistently been identified as a “Top Trend” in academic research libraries for the past few years. One of the main reasons is a host of new policies and funding requirements that require data management plans or data sharing.
https://www.lib.umn.edu/datamanagement/funding
http://libraries.mit.edu/scholarly/publishing/research-funders/research-funder-open-access-requirements/
Nature is the most cited scientific journal in the world and
Journals and funders are requiring data sharing, but they are not offering to provide the services. Funders will pay for data management.
http://www.nature.com/authors/policies/availability.html
For certain types of datasets, submission to a community-endorsed data repository is mandatory.
Recommended data repositories: http://www.nature.com/sdata/data-policies/repositories
Version control – what gets kept? Some datasets never stop amassing data. (Twitter archive, Hubble telescope)
We don’t create the data, and there are literally hundreds of file formats that we may have to consider accepting and working with.
Beyond the technical problems there are lots of social considerations as well…
Data is incredibly important to our researchers, but the culture of data sharing is still new to most of them and it can be kind of scary.
Mention Data Q – funded by IMLS
That was a look at where we’ve been for the past five years. Now I would like to take a look at where we’re going.
I don’t have a crystal ball, but I think based on the trends we’ve seen, we can make some educated guesses about where the data needs are changing in the social sciences and humanities.
When I was a kid, our phone would ring off the hook with people calling my parents with survey questions.
Do they still do this? Yes, but it’s a pretty ineffective way to gather data, especially if you are researching people in their 20s and 30s. I don’t even have a home phone. Now data is being collected on social media, on Google, on dating websites. This raises other ethical considerations about consent and transparency.
We also have increased computing power
Inherently interdisciplinary
Example of collaborative work
UDN is another example of collaborative work.
Another example of how humanities work is getting interdisciplinary.
Another thing that has been changing in the social sciences and humanities is the way impact is evaluated.
This is really important when it comes to datasets because it is still unclear how researchers will be rewarded for creating datasets that are used again and again. Data citation is still a new concept.
One thing that studies have shown is that researchers who publish data with their publications have higher citation counts than those who don’t.
Researchers are going to see that not only do they have to share their datasets, but that there is great value in doing so.
For us, this is what is changing…
Johns Hopkins Data stack layer
Retooling ourselves for these new needs and this changing environment is essential.
We need to give ourselves a little grace and we need to give ourselves lots of opportunities for training in order to retool ourselves to meet these needs.
To me, this is the most exciting part of the prompt. In light of all these changes, what can the Marriott Library do to partner with our campus faculty around their data needs?
First of all, we need to work on expanding the suite of data services that we offer and make sure our researchers know that the library is the place to get these services.
Which services we offer will be determined by two things:
The needs of our research community and
Our own resources
From UCSD - Based on the lessons learned in our processes, the Research Data Curation Program will provide a suite of services from which campus users can pick and choose, as necessary.
Once again, these projects were undertaken because of data management requirements
Visual Roadmap for the Minnesota Data Curation Pilot Project
We all have a lot to learn from each other.
Pilot project – very popular with faculty.
Helping them understand what needs to go on to take care of their data. Highlights the role the library can play. They got a sense of what support was there for them. Other things can be helping to draft policy about faculty ownership of data (all your data belongs to the university): what rights do researchers have? Make them aware of tools and people that exist that might make the job easier, and involve them at the point of writing the data management plan. Embedded in research teams, not going to be on every project. New roles of the librarian.
They have added 12 more publications since I left.
Thank you so much for listening. Does anyone have any questions?