Changing the Curation Equation:
A Data Lifecycle Approach to
Lowering Costs and Increasing Value
Jim Myers1, Margaret Hedstrom1, Beth A Plale2, Praveen Kumar3, Robert
McDonald4, Rob Kooper5, Luigi Marini5, Inna Kouper4, Kavitha Chandrasekar4
myersjd@umich.edu
1 School of Information, University of Michigan, Ann Arbor, MI, United States.
2 School of Informatics and Computing, Indiana University, Bloomington, IN, United States.
3 Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, United States.
4 Data To Insight Center, Indiana University, Bloomington, IN, United States.
5 National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, IL, United States.
Outline
• Quick Project Intro
• What is SEAD? (Stop by the SEAD booth!)
• Why is SEAD?
• How does SEAD work?
• Future active and social curation work
SEAD: Sustainable Environment Actionable Data
• An NSF DataNet project started in
October, 2011
• An international resource for
sustainability science
• A provider of light-weight Data Services
based on novel technical and business
approaches:
– Supporting the long-tail of research
– Enabling active and social curation
– Providing integrated lifecycle support for data
http://sead-data.net/

Margaret Hedstrom, PI
Praveen Kumar, co-PI
Jim Myers, co-PI
Beth Plale, co-PI
Sustainability Research
• Central to solving many of society’s most critical
challenges
• An exemplar of modern research
– Local processes aggregating to produce global consequences
– Multiple time scales
– Coupling of natural and human systems
– Interacting systems-of-systems requiring multidisciplinary understanding
• Environmental – Economic - Social
Diagram labels: Science / Cooperation; Technology; Policy; Economics / Poverty & Justice
SEAD is:
• Data discovery
• Project workspaces
• A data-aware
community network
• Curation and
preservation services
that link to multiple archives and discovery
services
SEAD is:
• Secure project spaces where teams can:
– Gather reference data
– Upload and share new results
– Annotate
– Relate
– Organize
– Publish

Project Dashboard
SEAD is:
• An active repository that creates data pages with
– Previews
– Extracted Metadata
– Overlays
– Tags
– Comments
– Provenance
– Use information
– Download/Embed
SEAD is:
• A tool for community exploration:
– Personal and
Project Profiles
– Publications and
Data Citations
– Co-author,
co-investigator
graphs
– Temporal analysis
SEAD is:
• A way to preprint and publish
data:
– Branded interface
– Discovery metadata
– Drill-down
• Sub-collections
• Data Pages

– Submit for curation and
preservation

The National Center for Earth Surface Dynamics
~1.6 TB, 450K files (2.2 M objects) representing 10
years of research by multiple teams
SEAD is:
• A community platform for reference data:
– Research Object
management
– Inference
– Curation
– Preservation
– ID assignment
– Catalog Registration
– Discovery
– Citation Generation

SEAD’s Virtual Archive allows curators to
access, assess, enhance, package, and submit
data from SEAD project repositories for long-term storage in SEAD-managed storage or
external institutional repositories and cloud
data services.
– Apps read what they need and write what they know
– Curation snapshots meaningful Research Objects
– Multiple ROs can be defined/managed re-using the same underlying ‘living’ content
– The larger graph can be ~reassembled w/o the ongoing cost of managing at the item level
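The snapshot idea in the bullets above can be pictured with a small sketch. It is only an illustration under stated assumptions, not SEAD's implementation: the class names `LivingStore` and `ResearchObject`, the item ids, and the use of SHA-256 checksums are all hypothetical choices made here.

```python
import hashlib
from dataclasses import dataclass, field


def _digest(content):
    """Return the SHA-256 hex digest of some bytes, or None if the item is gone."""
    return None if content is None else hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ResearchObject:
    """A named snapshot: item ids plus the checksums they had at snapshot time."""
    name: str
    checksums: dict

    def changed_since(self, store):
        """List the items whose living content no longer matches the snapshot."""
        return sorted(
            item_id
            for item_id, digest in self.checksums.items()
            if _digest(store.items.get(item_id)) != digest
        )


@dataclass
class LivingStore:
    """Hypothetical 'living' content: item id -> current bytes."""
    items: dict = field(default_factory=dict)

    def put(self, item_id, content):
        self.items[item_id] = content

    def snapshot(self, name, item_ids):
        """Freeze a Research Object over a chosen subset of the living items."""
        return ResearchObject(name, {i: _digest(self.items[i]) for i in item_ids})


if __name__ == "__main__":
    store = LivingStore()
    store.put("flume-run-1.csv", b"t,depth\n0,1.2\n")
    store.put("site-photo.jpg", b"\xff\xd8 placeholder bytes")

    # Two Research Objects re-use the same underlying living content.
    paper_ro = store.snapshot("paper-2013", ["flume-run-1.csv", "site-photo.jpg"])
    subset_ro = store.snapshot("flume-only", ["flume-run-1.csv"])

    # The living data keeps evolving; the snapshots record what was published.
    store.put("flume-run-1.csv", b"t,depth\n0,1.2\n1,1.4\n")
    print(paper_ro.changed_since(store))
    print(subset_ro.changed_since(store))
```

The design point is that the snapshot stores only references and checksums, so defining a second Research Object over the same content costs a small manifest rather than another copy of the data.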

Flickr-style web management of data
Sensor data
Semantic Content Middleware over Scalable File System and Triple Store
Geospatial, social network mash-ups, workflows and services
Curation Services to harvest and package specific data sets
Federation of OAI repositories for long-term preservation
Why is SEAD needed for curation?
• The nature of modern research
• The nature of the data documentation
problem
• Artificial limitations derived from historical
practice
Unless these issues are addressed (in addition to
sheer scale), data curation will remain too
cumbersome and expensive for ubiquitous use…
Data Challenges in Sustainability Research
• Many dimensions, many coordinate systems, many scales,
many formats, a long-tail of providers and users, …
• Managing this data is a drag on productivity…
The Long Tail in Research
• Individuals/small groups
where:
– Scale of research prohibits traditional CI
development, dedicated IT support, full-time
curator…
– shared data but multiple disciplinary views
– Projects involve reference data from external
sources
– Project Team does not control formats and
vocabularies

These are not just
“challenges for the future”
Analyzing the curation/preservation
problem…
• Data and Metadata are known well
during the project
• Producers actually memorize or
record metadata already, and then
spend precious time transferring that
between people and systems
• Data users manually assemble
missing data/metadata but don’t
often have a way to share that with
others
• Repositories struggle to attain the
domain understanding needed to go
beyond basic bibliographic info
– Repositories only use metadata to
help with data discovery and internal
curation decisions

Producers

Users

Bill Michener – DataONE
Jim Myers - SEAD

Who knows what?
When do they know it?
Why will they tell you?
Our collective legacy (questionable assumptions)
• Data can only be in one place…
• Data transfer is costly…
• Mistakes are costly…
• Only the future needs well-organized data
• Curation only happens at data/project/center end-of-life
• Submission events must be formal and complete
• Only cross-trained professionals are capable of getting it
right
• Researchers should see curation only as a public service
What’s different for users?
• When you add a file:
– You can get it back, from anywhere
– You can see your video, zoom in on images, overlay spatial
data on maps and retrieve them from an OGC service
endpoint
– You see the metadata hidden in the file
– You can add titles, descriptions, locations, tags later, not as
required parts of a long submit form, and
• When you do, they are search terms and ways to create custom
maps

– You can add good data and bad, and figure out which data
to keep later (using provenance to guide you)
– Users of your data can add metadata, comments, and
derived datasets that improve quality, adapt the data for
new purposes, etc.
What’s different for curators?
• Curation starts with data and metadata in hand, not as
a search through dusty disks
• Curators can embed with project teams
• Data comes with
– Formal metadata (dc:creator= http://vivo-vis-test.slis.indiana.edu/vivo/individual/n7732 )
– Informal metadata (http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag
tag:cet.ncsa.uiuc.edu,2008:/tag#bpnm)

– Context! (“bpnm” in the WSC_Reach project always means “Birds Point/New Madrid”)
– Producers and users – conversations are possible

• Packaging, repository selection, submission, and
registration with catalogs are all automated/semi-automated…
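The formal metadata, informal metadata, and context named in the list above can be read as subject-predicate-object statements about a dataset. The sketch below uses plain Python tuples rather than an RDF library. The dataset URI, the expansion of the `dc:` prefix, and the per-project tag lookup table are assumptions made for illustration; the creator URI, tag predicate, tag URI, and the "bpnm" meaning come from the slide.

```python
# Assumed expansion of the dc: prefix (Dublin Core elements namespace).
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
# Tag predicate and tag URI as quoted on the slide.
TAGGED_WITH = "http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag"
BPNM_TAG = "tag:cet.ncsa.uiuc.edu,2008:/tag#bpnm"

# Hypothetical dataset identifier, used only for this example.
DATASET = "http://example.org/dataset/42"

triples = [
    # Formal metadata: the creator is a resolvable VIVO profile, not a bare string.
    (DATASET, DC_CREATOR, "http://vivo-vis-test.slis.indiana.edu/vivo/individual/n7732"),
    # Informal metadata: a free-form tag attached by a project member.
    (DATASET, TAGGED_WITH, BPNM_TAG),
]

# Context: what a tag means inside a given project (hypothetical lookup table).
PROJECT_TAG_MEANINGS = {
    ("WSC_Reach", "bpnm"): "Birds Point/New Madrid",
}


def tag_labels(statements, subject, project):
    """Return readable labels for a subject's tags, using project context if known."""
    labels = []
    for s, p, o in statements:
        if s == subject and p == TAGGED_WITH:
            tag = o.rsplit("#", 1)[-1]
            labels.append(PROJECT_TAG_MEANINGS.get((project, tag), tag))
    return labels


if __name__ == "__main__":
    print(tag_labels(triples, DATASET, "WSC_Reach"))
    print(tag_labels(triples, DATASET, "SomeOtherProject"))
```

Within the WSC_Reach project the tag resolves to its agreed meaning; outside that context only the raw tag is available, which is the gap a curator working with the project team can close.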
SEAD Concept
• Leverage incremental, informal active use to
capture data and metadata from first sources
• Provide data-related (metadata-driven) services
to active producers and users of data
• Simplify and automate curation and preservation
processes using captured information and
context
• Leverage existing institutional repository
technologies and organizations to provide long-term storage
Increase Value, Lower Costs, Increase Immediacy
SEAD is:
• Write once, re-use
• Extensible (data, metadata) – within sustainability
research and beyond
• Incremental
• Living datasets → published Research Objects
• Scalable
A tool for data producers and users…
that also provides a long-term data plan…
that can be sustainable at community scale
How?
• Web 2.0, Web 3.0…
• Strong collaboration with researchers and
curators

• Leveraging standards – vocabularies, service
endpoints, transfer protocols, submission
packages, …
• Leveraging existing software – Medici/Tupelo,
VIVO, DataConservancy + Jena, GWT, Geoserver,
MySQL, Fuseki, …
Current Status
• 10 hosted project spaces for pilot groups on VM farm +
community VIVO, VA servers
– ~< 2 TB, ~1800 profiles, proof-of-concept submissions to UI
and IU institutional repositories

• 1.0 OSS release in November, operating as a DataONE
production Member Node (next week)
– Google sign-in, cybersecurity and usability enhancements,
data-maturity-based access control, dashboard, public
discovery, and geobrowse interfaces, …

• Project info: http://sead-data.net
• Demo Space: http://sead-demo.ncsa.illinois.edu
Going forward
– Version 1.0 released
– Open early adopter period
– Improving scalability
– Exploring social feedback mechanisms to further
improve curation – add value, remove costs,
engage producers, users, and curators
– Active outreach: Use SEAD! (software or services),
Extend SEAD!, Collaborate with SEAD!
Acknowledgements
• SEAD Team @ UM, UI, IU
• NSF
• NCED, IRBO, WSC-Reach, IMLCZO, ICPSR, other
sustainability researchers
• and Thank You!
… stop by the SEAD booth and share your thoughts!

http://sead-data.net/
SEAD: Components/
Communications
HTTP Links/
Embedded Content

SEAD VIVO:
Browse Through People, Projects,
Publications, Data Citations, and
Organizations, Visualize Networks and
Community Dynamics

Main Website:
Overview, Project Info,
Services, Documentation,
News

SPARQL Queries
HTTP Data/DOI links
Active Content Repository
(multiple webapps):
Branded Public Access
Active Project Spaces
Individual Data Pages

BAGIT Data/
Metadata Transfer

SEAD Virtual Archive:
Policy Driven Curation
Institutional/Cloud/Grid
Storage
Faceted Search for
Reference Data
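The BagIt data/metadata transfer named in this diagram can be pictured with a minimal sketch. It writes a bag in the layout of the BagIt 1.0 specification (a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest) using only the Python standard library. The function names `make_bag` and `verify_bag`, the file names, and the choice of SHA-256 are assumptions for illustration; this is not SEAD's packaging code, and the BagIt version SEAD used is not stated in the slides.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict) -> Path:
    """Write a minimal bag: bagit.txt, data/ payload, and manifest-sha256.txt.

    `files` maps flat file names to their bytes.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name in sorted(files):
        payload = files[name]
        (data_dir / name).write_bytes(payload)
        digest = hashlib.sha256(payload).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example-bag", {"results.csv": b"t,depth\n0,1.2\n"})
        print(verify_bag(bag))
```

The receiving side only needs the manifest to confirm that the payload arrived intact, which is what makes a bag a convenient unit for handing data and metadata from one service to another.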
SEAD Active Content Repository Architecture
Diagram labels: Web Application; User Management; Data/Metadata Mgmt; Desktop Drop Box; Android Upload; Branded Repository; Geo-webapp; Project Summary; Web Service APIs; Role-based Access Control; Extractors and Indexing; Tupelo 2; RDF + Files; MySQL; Search Page; Admin Page; Map Page; Tag Page; Collection Pages; Data Pages; Lucene; Geoserver; Local File System
Legend: Modified/Configured Medici/Tupelo 2 Components; SEAD ACR Additions and 3rd Party Components
SEAD VIVO Architecture
Diagram labels: Temporal Visualization; Network Visualization; Data Citations; Organizations; Publications; Projects; People; Input Form/Display Generation; Internal APIs; User Management; Joseki/Fuseki/Web Services; Entity Management; Analytics; Jena/RDF; MySQL; Local File System
SEAD Virtual Archive Architecture
Diagram labels: Geo-spatial Search; Facet Search; Matchmaking; Ingest Processing; Curator’s Workbench; Web Services; APIs; Metadata Extraction/Persistent Identifier/Indexing/Archival (Adapted DC Workflow); Solr; Matchmaker/Query Management; DataONE Member Node; Geospatial Query Service; BagIt Repository Conversion (XML) Service; Solr Indexer; PostGIS; SWORD; Local File System; UIUC Ideals; IUScholarworks; Archival Storage
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 

Dernier (20)

Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs and Increasing Value

  • 1. Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs and Increasing Value Jim Myers1, Margaret Hedstrom1, Beth A Plale2, Praveen Kumar3, Robert McDonald4, Rob Kooper5, Luigi Marini5, Inna Kouper4, Kavitha Chandrasekar4 myersjd@umich.edu 1 School of Information, University of Michigan, Ann Arbor, MI, United States. 2 School of Informatics and Computing, Indiana University, Bloomington, IN, United States. 3 Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, United States. 4 Data To Insight Center, Indiana University, Bloomington, IN, United States. 5 National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, IL, United States.
  • 2. Outline • Quick Project Intro • What is SEAD? (Stop by the SEAD booth!) • Why is SEAD? • How does SEAD work? • Future active and social curation work
  • 3. SEAD: Sustainable Environment Actionable Data • An NSF DataNet project started in October, 2011 • An international resource for sustainability science • A provider of light-weight Data Services based on novel technical and business approaches: – Supporting the long-tail of research – Enabling active and social curation – Providing integrated lifecycle support for data http://sead-data.net/ Margaret Hedstrom, PI Praveen Kumar, co-PI Jim Myers, co-PI Beth Plale, co-PI
  • 4. Sustainability Research • Central to solving many of society’s most critical challenges • An exemplar of modern research – Local processes aggregating to produce global consequences – Multiple time scales – Coupling of natural and human systems – Interacting systems-of-systems requiring multidisciplinary understanding • Environmental – Economic – Social Science Cooperation Technology Policy Economics Poverty & Justice
  • 5. SEAD is: • Data discovery • Project workspaces • A data-aware community network • Curation and preservation services that link to multiple archives and discovery services
  • 6. SEAD is: • Secure project spaces where teams can: – Gather reference data – Upload and share new results – Annotate – Relate – Organize – Publish Project Dashboard
  • 7. SEAD is: • An active repository that creates data pages with – Previews – Extracted Metadata – Overlays – Tags – Comments – Provenance – Use information – Download/Embed
  • 8. SEAD is: • A tool for community exploration: – Personal and Project Profiles – Publications and Data Citations – Co-author, co-investigator graphs – Temporal analysis
  • 9. SEAD is: • A way to preprint and publish data: – Branded interface – Discovery metadata – Drill-down • Sub-collections • Data Pages – Submit for curation and preservation The National Center for Earth Surface Dynamics ~1.6 TB, 450K files (2.2 M objects) representing 10 years of research by multiple teams
  • 10. SEAD is: • A community platform for reference data: – Research Object management – Inference – Curation – Preservation – ID assignment – Catalog Registration – Discovery – Citation Generation SEAD’s Virtual Archive allows curators to access, assess, enhance, package, and submit data from SEAD project repositories for long-term storage in SEAD-managed storage or external institutional repositories and cloud data services.
  • 11. – Apps read what they need and write what they know – Curation snapshots meaningful Research Objects – Multiple ROs can be defined/managed re-using the same underlying ‘living’ content – The larger graph can be ~reassembled w/o the ongoing cost of managing at the item level • Flickr-style web management of data • Sensor data • Semantic Content Middleware over Scalable File System and Triple Store • Geospatial, social network mash-ups, workflows and services • Curation Services to harvest and package specific data sets • Federation of OAI repositories for long-term preservation
  • 12. Why is SEAD needed for curation? • The nature of modern research • The nature of the data documentation problem • Artificial limitations derived from historical practice Unless these issues are addressed (in addition to sheer scale), data curation will remain too cumbersome and expensive for ubiquitous use…
  • 13. Data Challenges in Sustainability Research • Many dimensions, many coordinate systems, many scales, many formats, a long-tail of providers and users, … • Managing this data is a drag on productivity…
  • 14. The Long Tail in Research • Individuals/small groups where: – Scale of research prohibits traditional CI development, dedicated IT support, full-time curator… – shared data but multiple disciplinary views – Projects involve reference data from external sources – Project Team does not control formats and vocabularies These are not just “challenges for the future”
  • 15. Analyzing the curation/preservation problem… • Data and Metadata are known well during the project • Producers actually memorize or record metadata already, and then spend precious time transferring that between people and systems • Data users manually assemble missing data/metadata but don’t often have a way to share that with others • Repositories struggle to attain the domain understanding needed to go beyond basic bibliographic info – Repositories only use metadata to help with data discovery and internal curation decisions Producers Users Bill Michener – DataONE Jim Myers - SEAD Who knows what? When do they know it? Why will they tell you?
  • 16. Our collective legacy • Data can only be in one place… • Data transfer is costly… • Mistakes are costly… • Only the future needs well-organized data (questionable assumptions) • Curation only happens at data/project/center end-of-life • Submission events must be formal and complete • Only cross-trained professionals are capable of getting it right • Researchers should see curation only as a public service
  • 17. What’s different for users? • When you add a file: – You can get it back, from anywhere – You can see your video, zoom in on images, overlay spatial data on maps and retrieve them from an OGC service endpoint – You see the metadata hidden in the file – You can add titles, descriptions, locations, tags later, not as required parts of a long submit form, and • When you do, they are search terms and ways to create custom maps – You can add good data and bad, and figure out which data to keep later (using provenance to guide you) – Users of your data can add metadata, comments, and derived datasets that improve quality, adapt the data for new purposes, etc.
  • 18. What’s different for curators • Curation starts with data and metadata in hand, not as a search through dusty disks • Curators can embed with project teams • Data comes with – Formal metadata (dc:creator= http://vivo-vis-test.slis.indiana.edu/vivo/individual/n7732 ) – Informal metadata (http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag tag:cet.ncsa.uiuc.edu,2008:/tag#bpnm) – Context! (“bpnm” in the WSC_Reach project always means “Birds Point/New Madrid”) – Producers and users – conversations are possible • Packaging, repository selection, submission, registration with catalogs are all automated/semiautomated…
  • 19. SEAD Concept • Leverage incremental, informal active use to capture data and metadata from first sources • Provide data-related (metadata-driven) services to active producers and users of data • Simplify and automate curation and preservation processes using captured information and context • Leverage existing institutional repository technologies and organizations to provide long-term storage Increase Value, Lower Costs, Increase Immediacy
  • 20. SEAD is: • Write once, re-use • Extensible (data, metadata) – within sustainability research and beyond • Incremental • Living datasets → published Research Objects • Scalable A tool for data producers and users… that also provides a long-term data plan… that can be sustainable at community scale
  • 21. How? • Web 2.0, Web 3.0… • Strong collaboration with researchers and curators • Leveraging standards – vocabularies, service endpoints, transfer protocols, submission packages, … • Leveraging existing software – Medici/Tupelo, VIVO, DataConservancy + Jena, GWT, Geoserver, MySQL, Fuseki, …
  • 22. Current Status • 10 hosted project spaces for pilot groups on VM farm + community VIVO, VA servers – ~< 2 TB, ~1800 profiles, proof-of-concept submissions to UI and IU institutional repositories • 1.0 OSS release in November, operating as a DataONE production Member Node (next week) – Google sign-in, cybersecurity and usability enhancements, data-maturity-based access control, dashboard, public discovery, and geobrowse interfaces, … • Project info: http://sead-data.net • Demo Space: http://sead-demo.ncsa.illinois.edu
  • 23. Going forward – Version 1.0 released – Open early adopter period – Improving scalability – Exploring social feedback mechanisms to further improve curation – add value, remove costs, engage producers, users, and curators – Active outreach: Use SEAD! (software or services), Extend SEAD!, collaborate with SEAD!
  • 24. Acknowledgements • SEAD Team @ UM, UI, IU • NSF • NCED, IRBO, WSC-Reach, IMLCZO, ICPSR, other sustainability researchers • and Thank You! … stop by the SEAD booth and share your thoughts! http://sead-data.net/
  • 25.
  • 26. SEAD: Components/Communications (diagram) • Main Website: Overview, Project Info, Services, Documentation, News • SEAD VIVO: Browse Through People, Projects, Publications, Data Citations, and Organizations, Visualize Networks and Community Dynamics • Active Content Repository (multiple webapps): Branded Public Access, Active Project Spaces, Individual Data Pages • SEAD Virtual Archive: Policy Driven Curation, Institutional/Cloud/Grid Storage, Faceted Search for Reference Data • Connections: HTTP Links/Embedded Content, SPARQL Queries, HTTP Data/DOI links, BagIt Data/Metadata Transfer
  • 27. SEAD Active Content Repository Architecture (diagram) • Labels: Web Application; User Management; Data/Metadata Mgmt; Desktop Drop Box; Android Upload; Branded Repository; Geo-webapp; Project Summary; Web Service APIs; Role-based Access Control; Extractors and Indexing; Tupelo 2 RDF + Files; MySQL; Search Page; Admin Page; Map Page; Tag Page; Collection Pages; Data Pages; Lucene; Geoserver; Local File System • Legend: Modified/Configured Medici/Tupelo 2 Components; SEAD ACR Additions and 3rd Party Components
  • 28. SEAD VIVO Architecture (diagram) • Labels: Temporal Visualization; Network Visualization; Data Citations; Organizations; Publications; Projects; People; Input Form/Display Generation; Internal APIs; User Management; Joseki/Fuseki/Web Services; Entity Management; Analytics; Jena/RDF; MySQL; Local File System
  • 29. SEAD Virtual Archive Architecture (diagram) • Labels: Geo-spatial Search; Facet Search; Matchmaking; Ingest Processing; Curator’s Workbench; Web Services APIs; Metadata Extraction/Persistent Identifier/Indexing/Archival (Adapted DC Workflow); Solr Query Service; Matchmaker/Repository Management; DataONE Member Node; Geospatial Query Service; BagIt Conversion (XML); Solr Indexer; PostGIS; SWORD; Local File System; UIUC Ideals; IUScholarworks; Archival Storage
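
Slide 18 describes data arriving with formal metadata (a dc:creator link to a VIVO profile), informal metadata (a tag), and project context (in WSC_Reach, "bpnm" means "Birds Point/New Madrid"). The following is a minimal sketch of how such statements could be held as subject-predicate-object triples and read back with project context applied. It is an illustration only, not SEAD's implementation: the dataset identifier, the function names, and the lookup table are hypothetical, and only the creator URI, the tag predicate, the tag URI, and the tag meaning are taken from the slide.

```python
# Values quoted on slide 18.
DC_CREATOR = "dc:creator"
TAGGED_WITH = "http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag"
CREATOR_URI = "http://vivo-vis-test.slis.indiana.edu/vivo/individual/n7732"
BPNM_TAG = "tag:cet.ncsa.uiuc.edu,2008:/tag#bpnm"

# Hypothetical dataset identifier, invented for this illustration.
DATASET = "urn:example:dataset/1"

# Formal and informal metadata stored side by side as triples.
triples = [
    (DATASET, DC_CREATOR, CREATOR_URI),   # formal metadata
    (DATASET, TAGGED_WITH, BPNM_TAG),     # informal metadata (a tag)
]

# Hypothetical context table: what a tag label means inside a given project.
PROJECT_TAG_MEANINGS = {
    ("WSC_Reach", "bpnm"): "Birds Point/New Madrid",
}


def tag_label(tag_uri):
    """Return the short label after the '#' in a tag URI."""
    return tag_uri.rsplit("#", 1)[-1]


def describe(subject, project, triples):
    """Collect creators and tags for a subject, expanding tags with project context."""
    creators = [o for s, p, o in triples if s == subject and p == DC_CREATOR]
    tags = [tag_label(o) for s, p, o in triples if s == subject and p == TAGGED_WITH]
    # Fall back to the raw label when the project defines no meaning for a tag.
    meanings = [PROJECT_TAG_MEANINGS.get((project, t), t) for t in tags]
    return {"creators": creators, "tags": tags, "tag_meanings": meanings}


summary = describe(DATASET, "WSC_Reach", triples)
print(summary)
```

The same tag stays an opaque label outside its project: calling `describe(DATASET, "OtherProject", triples)` leaves "bpnm" unexpanded, which is the gap the slide says a curator can close by working with the project team.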