Repurposing VIVO Data to Analyze Publications and Infer Expertise
1. Repurposing authoritative data
about faculty to analyze publication
output, infer expertise, and
recommend grant opportunities
Paul Albert, Don Carpenter, and Jie Lin
paa2013@med.cornell.edu
Weill Cornell Medical College
14. Proposed question #5
What articles have faculty published
in the last month in which they were
first or last author?
15. Institutional publication
reporting: choose two*
• High quality disambiguation (>90% accuracy)
• Minimal delay between review and inclusion
in the reporting system
• Tool is simple enough to allow anyone to use
* Or one
18. VIVO Dashboard: a tool for easily
running sophisticated reports
Don Carpenter
dwc92@cornell.edu
Cornell University
19. Prime directive of VIVO
Dashboard
Empower untrained users to run
sophisticated semantic queries on Weill
Cornell faculty publications
* Secondary directive: kill Sarah Connor
22. Workflow
• On a one-time basis, set up the fields in the Drupal
admin.
• On a weekly basis, execute a set of SPARQL
queries against VIVO’s semantic endpoint.
• Import the resulting .csv files into Drupal.
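The weekly export step might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual script: the endpoint URL, the query, and the function names `build_request` and `export_csv` are all hypothetical, and the endpoint is assumed to honor the standard SPARQL protocol with CSV results.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; substitute the address of your own VIVO SPARQL endpoint.
ENDPOINT = "http://vivo.example.edu/sparql"

# Illustrative query only: the classes and properties a real VIVO instance
# exposes depend on its ontology version.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?article ?title
WHERE {
  ?article a bibo:AcademicArticle ;
           rdfs:label ?title .
}
LIMIT 100
"""


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def export_csv(endpoint, query, path):
    """Run the query and save the CSV response for the Drupal import step."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
```

Calling `export_csv(ENDPOINT, QUERY, "articles.csv")` from a weekly scheduled job would produce one of the .csv files that the Drupal import consumes.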
23. Technology Stack
• Drupal 7.x
• Stores content using the robust indexing application,
Apache Solr
• AJAX
• Key modules
- Apache Solr
- Facet API
- Facet API graphs
- D3.js (visualization library)
- Charts and graphs
- VIVO Dashboard (custom module)
24. Performance
• A previous version using MySQL queries took >10
seconds to load
• Completely rewriting the application in Solr
allows us to store X publications
• Performance is now < 5 seconds
25. Future Work
• Enlist the talents of other Drupal developers
• Release this project as open source code
• Create a visualization for global health expertise
26. Publications
The following results cover all publications by active Weill Cornell Medical
College faculty as represented in VIVO.
[Screenshot: VIVO Dashboard results page with Graph, List, and Export views.]
Facets shown:
• Publication Type: Research Article (657), In Process (55), Review (45),
Clinical Guideline (32), more...
• Author Name
• Journal ranking: 15.4 - 68.3
• Date: 2009 - Present
• Journal Name
28. Pressing needs
1. Researchers, development officers, and funding
agencies frequently complain that the process of
learning about grant opportunities is inefficient.
2. As a project manager for VIVO, I want to
accurately include researchers' fields and
expertise.
29. Maybe the needs of grant
recommendations and expertise
can be addressed... together.
30. Our intended workflow
1. Gather information about people and grant notices.
2. Algorithmically make personalized recommendations
of grant opportunities. (Hard.)
3. In exchange for the promise of higher quality
recommendations, we get busy researchers to
provide us feedback on our initial inferences about
expertise.
4. Use expertise data in VIVO.
31. Sources for people
Source: Example
• Clinical expertise and board certifications at WeillCornell.org: clinical pathology
• Medical Subject Headings (MeSH) in published papers: anti-bacterial agents
• Personal statement: ... I’ve always enjoyed medical education...
• Keywords for NIH grants: information system analysis
• CFDA labels for NIH grants: 93.821 – Lung diseases research
• Spending categories for NIH grants: neurosciences
• ClinicalTrials.gov keywords and system-inferred MeSH: violence research
• Global health expertise in Researcher Profile System: Egypt
• NCCR category as asserted by CTSC staff: Developmental and Child Psychology
35. Concept ranking
• Term Frequency-Inverse Document Frequency (TF-IDF):
reward terms for appearing often in a person’s list of
terms and penalize them for appearing in other people’s lists.
• Result: no one is expert on “humans”
• No algorithm is perfect, so we allow faculty to
provide feedback on the controlled terms we have
inferred for them.
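A minimal sketch of this ranking, assuming the textbook TF-IDF formula (the slides do not say which variant is used) and treating each person's term list as one "document". The people and terms below are made-up illustrations, and `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(terms_by_person):
    """Return {person: {term: score}} for each person's list of terms."""
    n_people = len(terms_by_person)
    # Count how many people have each term at least once in their list.
    people_with_term = Counter()
    for terms in terms_by_person.values():
        people_with_term.update(set(terms))

    scores = {}
    for person, terms in terms_by_person.items():
        counts = Counter(terms)
        total = len(terms)
        scores[person] = {
            # Term frequency rewards repetition within one person's list;
            # the log factor shrinks to zero for terms that everyone has.
            term: (count / total) * math.log(n_people / people_with_term[term])
            for term, count in counts.items()
        }
    return scores


# Made-up example data, for illustration only.
people = {
    "a": ["humans", "apoptosis", "apoptosis"],
    "b": ["humans", "urology"],
    "c": ["humans", "immunology"],
}
ranked = tf_idf(people)
```

Because "humans" appears in every list, its inverse document frequency is log(1) = 0, so it scores zero for everyone, which matches the result stated above.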
36. Mapping concepts to fields
• Objective of using a limited number of fields is to
increase overlap between people and grants
• 149 (somewhat arbitrarily) defined fields
• Fields represent eight different lists of fields (Map of
Science, ScanGrants, ABMS specialties...)
• Take concepts and fields and do a co-occurrence search
in MEDLINE.
• For example, after weighting by size of field, how often
does “Natural Language Processing” occur in
conjunction with immunology; medical informatics;
urology...?
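The slides do not specify how the weighting by field size is done. One plausible reading, sketched below, divides each co-occurrence count by the number of records in the field and then normalizes across fields. The function name `field_affinity` is hypothetical and all counts are invented for illustration; they are not MEDLINE figures.

```python
def field_affinity(cooccurrence, field_size):
    """Share of a concept's size-adjusted co-occurrence that falls in each field.

    cooccurrence: {field: records mentioning both the concept and the field}
    field_size:   {field: total records for the field}
    """
    # Dividing by field size keeps very large fields from dominating
    # simply because they have more records.
    rates = {
        field: cooccurrence.get(field, 0) / size
        for field, size in field_size.items()
        if size > 0
    }
    total = sum(rates.values())
    if total == 0:
        return {field: 0.0 for field in rates}
    return {field: rate / total for field, rate in rates.items()}


# Invented counts for the concept "Natural Language Processing".
cooccurrence = {"medical informatics": 300, "immunology": 50, "urology": 5}
field_size = {"medical informatics": 10000, "immunology": 500000, "urology": 100000}
affinity = field_affinity(cooccurrence, field_size)
```

With these invented numbers, medical informatics receives nearly all of the weight even though immunology is a much larger field.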
38. Promise of co-occurrence
searching
Suppose a researcher is working almost exclusively on
autoimmune disease and is highly ranked for the
concept, “apoptosis.”
Apoptosis also frequently co-occurs in MEDLINE with
oncology. Therefore, we can predict her interest in an
oncology grant.
40. Match people to grants
• Not yet done, but early testing is promising.
• The idea is to use cosine similarity to define how
similar any person-grant combination is to any
other person-grant combination
• Then you can rank those connections by people
or by grant.
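Since this step is described as not yet done, the following is only a sketch of one way it could work: represent each person and each grant as a weighted vector over the same fields, score every person-grant pair with cosine similarity, and sort. The names `cosine` and `rank_people_for_grant` and the weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {field: weight}."""
    dot = sum(weight * v.get(field, 0.0) for field, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_people_for_grant(grant, people):
    """Return (person, score) pairs, most similar to the grant first."""
    scored = [(name, cosine(vector, grant)) for name, vector in people.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented field weights, for illustration only.
people = {
    "researcher_1": {"immunology": 0.8, "oncology": 0.6},
    "researcher_2": {"urology": 1.0},
}
grant = {"oncology": 1.0}
ranking = rank_people_for_grant(grant, people)
```

Ranking by grant instead of by person is the same computation with the roles swapped: fix one person's vector and sort the grants by their similarity to it.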
41. Utility for Development Office
• Suppose Dr. Lamon and the Development Office
want to identify candidates to apply for a
particular grant.
• He can get a ranked list of the most
appropriate candidates for this
opportunity.