Hot Topics: The DuraSpace Community Webinar Series, Series Five: "VIVO: Research Discovery and Networking." Webinar #1: An Introduction to VIVO, May 14, 2013
Presented by: Dean Krafft, Chief Technology Strategist at Cornell University Library and Chair of the VIVO-DuraSpace Management Committee; Brian Lowe, Semantic Applications Programmer, Cornell; and Jon Corson-Rikert, VIVO Development Lead, Cornell
5-14-13 An Introduction to VIVO Presentation Slides
1. May 14, 2013 Hot Topics: DuraSpace Community Webinar Series
Hot Topics: The DuraSpace Community Webinar Series
Series Five: "VIVO: Research Discovery & Networking"
Curated by Dean Krafft
2. Webinar 1: Overview of VIVO
Presented by:
Brian Lowe, Semantic Applications Programmer, Cornell
Jon Corson-Rikert, VIVO Development Lead, Cornell
Dean Krafft, Chief Technology Strategist at Cornell University Library and Chair of the VIVO-DuraSpace Management Committee
3. What is VIVO?
• A semantic-web-based researcher and research discovery tool
– People plus much more
• Institution-wide, publicly visible information
– For external as well as internal audiences
• An open, shared platform for connecting scholars, communities, campuses, and countries using Linked Open Data
5. A brief VIVO history
2003-2005 First realization, for the life sciences at Cornell, as a relational database
2006-2008 Expansion to all disciplines at Cornell, and conversion to Semantic Web technologies
2009-2012 The National Institutes of Health-sponsored project "VIVO: Enabling the National Networking of Scientists" transforms VIVO into a multi-institutional open source platform
2013-2014 VIVO Incubator Project with DuraSpace for open community development
6. Major opportunity, 2009
NIH … “invites applications designed to develop, enhance, or extend infrastructure for connecting people and resources to facilitate national discovery of individuals and of scientific resources by scientists and students to encourage interdisciplinary collaboration and scientific exchange.”
8. VIVO Collaboration
Cornell University
Dean Krafft (Cornell PI)
Manolo Bevia
Jim Blake
Nick Cappadona
Brian Caruso
Jon Corson-Rikert
Elly Cramer
Medha Devare
Elizabeth Hines
Huda Khan
Depak Konidena
Brian Lowe
Joseph McEnerney
Holly Mistlebauer
Stella Mitchell
Anup Sawant
Christopher Westling
Tim Worrall
Rebecca Younes
University of Florida
Mike Conlon (VIVO and UF PI)
Beth Auten
Michael Barbieri
Chris Barnes
Kaitlin Blackburn
Cecilia Botero
Kerry Britt
Erin Brooks
Amy Buhler
Ellie Bushhousen
Linda Butson
Chris Case
Christine Cogar
Valrie Davis
Mary Edwards
Nita Ferree
Rolando Garcia-Milan
George Hack
Chris Haines
Sara Henning
Rae Jesano
Margeaux Johnson
Meghan Latorre
Yang Li
Jennifer Lyon
Paula Markes
Hannah Norton
James Pence
Narayan Raum
Nicholas Rejack
Alexander Rockwell
Sara Russell Gonzalez
Nancy Schaefer
Dale Scheppler
Nicholas Skaggs
Matthew Tedder
Michele R. Tennant
Alicia Turner
Stephen Williams
Indiana University
Katy Börner (IU PI)
Kavitha Chandrasekar
Bin Chen
Shanshan Chen
Ryan Cobine
Jeni Coffey
Suresh Deivasigamani
Ying Ding
Russell Duhon
Jon Dunn
Poornima Gopinath
Julie Hardesty
Brian Keese
Namrata Lele
Micah Linnemeier
Nianli Ma
Robert H. McDonald
Asik Pradhan Gongaju
Mark Price
Michael Stamper
Yuyin Sun
Chintan Tank
Alan Walsh
Brian Wheeler
Feng Wu
Angela Zoss
Ponce School of Medicine
Richard J. Noel, Jr. (Ponce PI)
Ricardo Espada Colon
Damaris Torres Cruz
Michael Vega Negrón
This project is funded by the National Institutes of Health, U24 RR029822, "VIVO: Enabling National Networking of Scientists"
The Scripps Research Institute
Gerald Joyce (Scripps PI)
Catherine Dunn
Sam Katkov
Brant Kelley
Paula King
Angela Murrell
Barbara Noble
Cary Thomas
Michaeleen Trimarchi
Washington University School of Medicine in St. Louis
Rakesh Nagarajan (WUSTL PI)
Kristi L. Holmes
Caerie Houchins
George Joseph
Sunita B. Koul
Leslie D. McIntosh
Weill Cornell Medical College
Curtis Cole (Weill PI)
Paul Albert
Victor Brodsky
Mark Bronnimann
Adam Cheriff
Oscar Cruz
Dan Dickinson
Richard Hu
Chris Huang
Itay Klaz
Kenneth Lee
Peter Michelini
Grace Migliorisi
John Ruffing
Jason Specland
Tru Tran
Vinay Varughese
Virgil Wong
9. What does VIVO do?
• Integrates multiple sources of data
– Systems of record
– Faculty activity reporting
– External sources (e.g., Scopus, PubMed, NIH RePORTER)
• Provides a review and editing interface
– Single sign-on for self-editing or editing by proxy
• Provides integrated, filterable feeds to other websites
13. Enabling an (inter)national network
• Open software
• Open data
• Local control
• Decentralized infrastructure
14. What does VIVO model?
• People and more
– Organizations, grants, programs, projects, publications, events, facilities, and research resources
• Relationships among the above
– Meaningful
– Bidirectional
– Navigable context
• Links to URIs elsewhere
– Concepts, identifiers
– People, places, organizations, events
16. Value for institutions
• Common data substrate
– Public, granular and direct
– Discovery via external and internal search engines
– Available for reuse at many levels
• Distributed curation
– E.g., affiliations beyond what the HR system tracks
– Data coordination across functional silos
– Feeding changes back to systems of record
– Direct linking across campuses
• Data that is visible gets fixed
17. The Semantic Web
• Turn data into a web of simple links
• Use an ontology to explain how things are linked
• Use reasoning to add new links automatically
• Be flexible and extensible
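The "use reasoning to add new links automatically" point, together with the bidirectional relationships VIVO models, can be illustrated with a minimal Python sketch. It assumes data held as (subject, predicate, object) triples and a hand-written table of inverse properties; the `ex:` names and the `infer_inverses` helper are hypothetical and are not VIVO ontology terms or VIVO code. A production system would rely on an RDF store and an OWL reasoner rather than this loop.

```python
# Hypothetical example: triples are (subject, predicate, object) string tuples.
# The "ex:" names and inverse pairs are illustrative, not the VIVO ontology.
Triple = tuple[str, str, str]

INVERSES = {
    "ex:authorOf": "ex:hasAuthor",
    "ex:memberOf": "ex:hasMember",
}


def infer_inverses(triples: set[Triple], inverses: dict[str, str]) -> set[Triple]:
    """Return the input triples plus every triple implied by an inverse-property rule."""
    # Make the table work in both directions so either property implies the other.
    both_ways = {**inverses, **{v: k for k, v in inverses.items()}}
    inferred = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            # Asserting (s, p, o) lets us add the reverse link automatically.
            inferred.add((o, both_ways[p], s))
    return inferred


data = {
    ("ex:alice", "ex:authorOf", "ex:paper1"),
    ("ex:alice", "ex:memberOf", "ex:chemistryDept"),
}
for triple in sorted(infer_inverses(data, INVERSES)):
    print(triple)
```

Only the two asserted triples are entered by hand; the reverse links from the paper and the department back to the person are derived, which is what makes each relationship navigable from either end.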
18. The VIVO ontology
• Describe people and organizations in the process of doing research
• Stay discipline neutral
• Use existing scientific domain terminology to describe content of research
19. What is Linked Open Data (LOD)?
• Data
– Structured information, not just documents with text
– A common, simple format
• Open
– Available, visible, mine-able
– Anyone can post, consume, and reuse
• Linked
– Directly by reference
– Indirectly through common references and inference
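Indirect linking through inference can be sketched with owl:sameAs, which OWL treats as an equivalence relation (reflexive, symmetric, transitive). The Python below is a minimal sketch under that assumption: it groups URIs with a union-find structure and then pools the facts recorded under each URI. All URIs, facts, and the `same_as_groups` helper are hypothetical placeholders, not real VIVO data or VIVO code.

```python
# Minimal sketch: owl:sameAs treated as an equivalence relation via union-find.
# All URIs and facts here are hypothetical placeholders.


def same_as_groups(uris, same_as_pairs):
    """Partition URIs into sets that sameAs links declare to be the same individual."""
    parent = {u: u for u in uris}

    def find(u):
        # Walk to the set representative, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for u in uris:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


A = "http://vivo.example-a.edu/individual/n100"  # a person as described by campus A
B = "http://vivo.example-b.edu/individual/p42"   # the same person at campus B
C = "http://vivo.example-a.edu/individual/n200"  # an unrelated person

facts = {
    A: {"title: Professor of Chemistry"},
    B: {"grant: hypothetical-grant-1"},
    C: {"title: Lecturer"},
}

for group in same_as_groups([A, B, C], [(A, B)]):
    # Facts attached to any URI in the group now describe one individual.
    merged = set().union(*(facts[u] for u in group))
    print(sorted(group), sorted(merged))
```

Neither campus has to copy the other's data; a single sameAs link is enough for a consumer to connect the title held at one site with the grant held at the other.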
21. Linked data indexed for search
[Diagram: Linked Open Data from the Ponce, WashU, IU, Cornell Ithaca, Weill Cornell, UF, and Scripps VIVOs, other VIVOs, eagle-i research resources, Harvard Profiles RDF, Digital Vita RDF, and Iowa Loki RDF is indexed into a Solr search index behind vivosearch.org, and into another Solr index.]
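The aggregation shown in the diagram can be illustrated with a toy in-memory inverted index. This is a minimal Python sketch, not Solr's API and not the actual vivosearch.org harvester: it assumes each institution exposes (URI, label) records, and every posting keeps the original URI so that a search hit links back to the researcher's home institution. The institution names, URIs, labels, and the `build_index`/`search` helpers are hypothetical.

```python
from collections import defaultdict


def build_index(sources):
    """Map each lowercase term to the (institution, uri) pairs whose label contains it.

    `sources` maps an institution name to a list of (uri, label) records.
    """
    index = defaultdict(set)
    for institution, records in sources.items():
        for uri, label in records:
            for term in label.lower().split():
                # Keep the home URI in every posting so a hit links back to its source.
                index[term].add((institution, uri))
    return index


def search(index, query):
    """Return the postings that match every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical records standing in for two institutions' VIVO data.
sources = {
    "Institution A": [("http://vivo.example-a.edu/individual/n1", "Jane Doe plant genetics")],
    "Institution B": [("http://vivo.example-b.edu/individual/n9", "John Roe plant pathology")],
}
index = build_index(sources)
for institution, uri in sorted(search(index, "plant genetics")):
    print(institution, uri)
```

The index holds only pointers and search terms; the authoritative data stays at each institution's own VIVO, which fits the decentralized, locally controlled model described earlier.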
26. Implementation challenges
• A simple idea – take the basic public information about researchers at Cornell and make it easy to find for academic purposes
• Why is this hard?
27. Policy issues
• Dirty data
• Lack of common definitions, even of organizations or of who counts as faculty
• Data ownership
• Many dimensions of privacy
• Short-term “go it alone” vs. common good
32. Weill Cornell research reporting
• How has the number of publications co-authored with other institutions changed year to year?
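Once publications and author affiliations are available as structured data, a question like this reduces to a grouped count. The Python below is a minimal sketch assuming each publication is a (year, set of author affiliations) pair; the records, the names "Institution X/Y/Z", and the `cross_institution_counts` helper are made up for illustration. In practice the records would come from a query against VIVO rather than a Python list.

```python
from collections import Counter

HOME = "Weill Cornell Medical College"

# Hypothetical records: (publication year, set of author affiliations).
publications = [
    (2010, {HOME}),
    (2010, {HOME, "Institution X"}),
    (2011, {HOME, "Institution Y"}),
    (2011, {HOME, "Institution X", "Institution Z"}),
    (2012, {HOME}),
]


def cross_institution_counts(pubs, home):
    """Count, per year, publications with at least one affiliation other than `home`."""
    return Counter(year for year, affiliations in pubs if affiliations - {home})


counts = cross_institution_counts(publications, HOME)
for year in sorted({year for year, _ in publications}):
    # Counter returns 0 for years with no cross-institution papers.
    print(year, counts[year])
```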
34. Multi-institutional scenarios for VIVO
• Multiple campuses of one university
• University and federal lab connections
– E.g., Colorado ties with regional federal labs
• Consortia – 60 CTSAs
• International
– 13 Netherlands universities and the National Library
– AgriVIVO
35. Benefits across institutions
• Sharing experience provides clarity and new ideas
• Incentives from sharing development, tools, and customizations
• Potential data-level connectivity
– Research is happening increasingly in teams that span institutions
– Meeting the needs of short- and long-term virtual organizations
36. From outputs to outcomes
• Outputs like papers and patents can be tracked
– Collaborative ontology effort to adequately represent the humanities
• Outcomes such as economic impact or societal benefit are much harder to identify
• Questions about return on research investment beg for consistent, comparable data
– over time
– across institutions
– across domains
39. Partnerships – ORCID
• Open Researcher and Contributor ID
– Attribution for works of any type
• ORCID and VIVO
– ORCID is an attribute in a VIVO profile
– Tools being tested for submission of researcher registrations from VIVO
http://orcid.org
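Since an ORCID iD is stored as an attribute of a VIVO profile, a loader could sanity-check it before saving. ORCID iDs end in a check character computed with ISO/IEC 7064 MOD 11-2 over the first 15 digits; the Python below is a minimal sketch of that check. The function names are hypothetical and are not part of VIVO or of any ORCID client library; the sample iD 0000-0002-1825-0097 is the test identifier used in ORCID's own documentation.

```python
def orcid_check_char(base_digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character over the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Strip hyphens, then check the digit count and the final check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

The check only catches typos and transpositions; it cannot confirm that the iD belongs to the person in the profile, which still requires the registration workflow mentioned above.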
40. VIVO/DuraSpace Partnership
• DuraSpace is a not-for-profit organization supporting the DSpace and Fedora repositories
• Serves as the open source community home for future VIVO development
• Provides a legal and financial framework, extensive tools, and a proven track record of managing community-developed open source projects
• Joint two-year initial governance based on founding sponsors, management team, and dedicated development and leadership effort
42. Meeting about VIVO
• 2nd Australian VIVO Days in February
• CU Boulder hosted 50 attendees for the 3rd VIVO Implementation Fest in April
• May 20th VIVO event for New York City area institutions
• August 2013 will be the 4th Annual VIVO Conference – approximately 200-250 attendees, with workshops, papers, keynotes, invited talks, and posters
44. Research Informatics Infrastructure
• USDA is adopting VIVO for intramural research, and is also using VIVO to knit together data from its 7 major agencies to fulfill reporting mandates to the Office of Science & Technology Policy and Congress
• National Center for Atmospheric Research (NCAR) is piloting VIVO to coordinate large, multi-year, multi-institutional, multi-instrument research projects
45. Research Informatics Infrastructure – cont.
• Accurate, structured VIVO data can feed external profiling and discovery systems (ORCID, Google Scholar, Academic Analytics, etc.)
• VIVO extensibility allows it to represent research resources and tie them to research datasets, publications, and researchers, promoting data discovery and reuse
51. CTSAconnect and the ISF
• VIVO and eagle-i team members won NIH funding in 2012 for a project to unify their ontologies and extend both in the clinical domain
• The unified ontology is known as the Integrated Semantic Framework, or ISF
• VIVO 1.6 and eagle-i’s next release will use the ISF
• This combined ontology is modular to allow selective data population based on local needs
54. Challenges
• Communicating VIVO’s goals to faculty, administrators, funders, and other institutions
• Adapting to constant changes in data sources
• Fully exploiting the opportunities provided by VIVO linked open data
• Co-existing in a world where not everyone uses VIVO
• Positioning VIVO on a sustainable path
55. Next Webinar: Case Studies
• Tuesday, June 4
• Colorado
• Duke
• Brown
• Weill Cornell Medical College
56. 3rd Webinar – Technical Deep Dive
• Tuesday, June 11
• Ontology & Linked Data
• Open source technologies used
• What’s coming in v1.6
• VIVO technical community touch points
• Many ways to participate, benefit, and contribute
57. Questions?
Speaker Notes
Historical overview: the motivation here at Cornell before we thought of the larger context. Access across disciplines via a structure emphasizing connections over hierarchies, predicated on information emerging from what people are doing: outputs, yes, but also other activities including grants, teaching, and talks.
Motivation for the NIH grant and the bigger vision of the VIVO network, from the opening: scholarly needs and desires, and the realization by NIH of the benefits to science when communities have organized community resources like ontologies, databases, and repositories. NIH is also aware that it can't fund the full cost, so it is looking for tools that are locally sustainable.
As you can see, the VIVO project itself is a rather large, geographically dispersed team of 7 institutions. Project areas: development, implementation, ontology, and outreach.
Abawi slide (small). Predicated on information emerging from what people are doing: outputs, yes, but also other activities including grants, teaching, and talks.
Reasoning example: sameAs. An ontology is a representation of entities and relations … for a part of reality … expressed in human- and computer-interpretable form.
The Mike Conlon slide
Results link back to the home institution.
See all or see nothing across colleges. Information in aggregate has been known to reveal unintended identifiable data.
Our philosophy is that you can save time and improve currency and accuracy if you only have to input information one time. This is a feature that is very appealing to faculty. [Cornell intends to participate in the Economic Development Portal being developed by NY?????. Having data in VIVO means we can pipe the data to the Portal, so faculty don't have to fill out yet another form.] The College of Arts and Sciences is using VIVO to feed core data into departmental websites.
Another example of the reuse of data is the geographic display of data. CALS faculty report the impact of their work, and include the geographic focus. Using that data we can generate displays of where particular work is being done. The map in the upper right is New York State, but we can instantly render similar maps for the United States and the world.
Transform static data into a network. Leverage relationships: topic, place, and shared activities. We're not just the nodes; we're the connections.
IICA
Dean will be talking about this
But it's a many-layered problem. Mention Bill Trochim, "Fifteen Most Promising Clinical Research Processes and Outcomes Metrics," from the Evaluation KFC Annual Meeting:
• Time from IRB submission to approval
• Studies meeting accrual goals
• Time from notice of grant award to study opening (e.g., investigator-initiated studies)
• Number of technology transfer products
• Volume of investigators who used services
• Volume of types of services used
• Satisfaction/needs assessment
• Time to publication
• Influence of research publication (e.g., observed/expected citations)
• Researcher collaboration (e.g., team science; collaboration index)
• ROI of pilot and KL2 scholars
• Time from publication to a research synthesis
• Career development
• Career trajectory (e.g., K-R transition)
• Institutional collaboration (public-private; cross-institutional; community)
ISF, euroCRIS, CASRAI, Lattes – pushing toward data compatibility across the world's major initiatives
University of Colorado Boulder, Laboratory for Atmospheric and Space Physics: additional information about research projects, equipment, and facilities will be stored behind a firewall, linked to the main Boulder VIVO via sameAs.
Focus on "enter once, use many times." Cornell needs data, including expertise, partnerships, and geographic focus, for:
• Cornell sesquicentennial campaign
• NY state economic development
• SUNY Knowledge Network
• Carnegie classification as an institution of community engagement
• Competitive landscape analysis
Highlight Cornell's uniqueness while feeding into national initiatives. Build on our existing partnerships, through the Library, campus-wide teams, and specific units as necessary. Strong collaboration already exists with Weill; build on VIVO's momentum to make Cornell a leader in this domain.