Repurposing authoritative data
about faculty to analyze publication
output, infer expertise, and
recommend grant opportunities
Paul Albert, Don Carpenter, and Jie Lin
paa2013@med.cornell.edu
Weill Cornell Medical College
Email
cyz2123@med.cornell.edu
Phone
646-962-2551
Address
1300 York Avenue
New York, NY 10065
Other sites
Clinical profile
Where is the ongoing
motivation to keep these
profiles current?
“Researchers use profile systems to find collaborators.”
A widely invoked “fact” about VIVO
(also an old Russian proverb)
How can VIVO data address
pressing needs in order to
strengthen its viability?
Pressing needs
1. Administrators want reports.
2. Both administrators and researchers
want to know about funding
opportunities.
“Invention is 1% inspiration and
(due to rounding error) 98%
perspiration.”
Thomas A. Edison
Source: Yahoo Answers
Administrators are avid
consumers of institutional data.
Proposed question #1
Publications appearing in journals of
a given impact factor
Proposed question #2
In any given year, which paper has
the most incoming citations?
Proposed question #3
Which papers that have received
federal funding are not deposited in
PubMed Central?
Proposed question #4
Which clinical departments tend to
publish the most?
Proposed question #5
What articles have faculty published
in the last month in which they were
first or last author?
Institutional publication
reporting: choose two*
• High quality disambiguation (>90% accuracy)
• Minimal delay between review and inclusion
in the reporting system
• Tool is simple enough to allow anyone to use
* Or one
Sample SPARQL query
SELECT distinct ?Person1_firstName ?Person1_lastName ?Person1_primaryEmail
?AcademicArticle1_label ?Journal1_label ?AcademicArticle1_pmid
?DateTimeValue1_dateTime
WHERE{
?AcademicArticle1 rdf:type bibo:Document .
?AcademicArticle1 bibo:pmid ?AcademicArticle1_pmid .
?AcademicArticle1 vivo:dateTimeValue ?DateTimeValue1 .
?AcademicArticle1 vivo:informationResourceSupportedBy ?FundingOrganization1 .
?AcademicArticle1 bibo:pmid ?AcademicArticle1_pmid .
?DateTimeValue1 rdf:type vivo:DateTimeValue .
?DateTimeValue1 vivo:dateTime ?DateTimeValue1_dateTime .
?FundingOrganization1 rdf:type vivo:FundingOrganization .
?FundingOrganization1 rdfs:label ?FundingOrganization1_label .
?AcademicArticle1 rdfs:label ?AcademicArticle1_label .
?AcademicArticle1 vivo:hasPublicationVenue ?Journal1 .
?Journal1 rdf:type bibo:Journal .
?Journal1 rdfs:label ?Journal1_label .
?AcademicArticle1 vivo:informationResourceInAuthorship ?Authorship1 .
?Authorship1 rdf:type vivo:Authorship .
?Authorship1 vivo:linkedAuthor ?Person1 .
?Person1 rdf:type foaf:Person .
?Person1 vivo:primaryEmail ?Person1_primaryEmail .
?Person1 wcmc:cwid ?Person1_cwid .
?Person1 foaf:firstName ?Person1_firstName .
?Person1 foaf:lastName ?Person1_lastName .
FILTER REGEX (str(?FundingOrganization1_label), 'N.I.H.', 'i')
FILTER NOT EXISTS { ?AcademicArticle1 vivo:pmcid ?AcademicArticle1_pmcid .}
FILTER (xsd:dateTime(?DateTimeValue1_dateTime) >
"2008-04-01T00:00:00"^^xsd:dateTime)
FILTER (xsd:dateTime(?DateTimeValue1_dateTime) <
"2012-12-01T00:00:00"^^xsd:dateTime)
}
ORDER BY ?Person1_lastName
+
VIVO
Dashboard
VIVO Dashboard: a tool for easily
running sophisticated reports
Don Carpenter
dwc92@cornell.edu
Cornell University
Prime directive of VIVO
Dashboard
Empower untrained users to run
sophisticated semantic queries on Weill
Cornell faculty publications
* Secondary directive: kill Sarah Connor
Sample SPARQL query
SELECT distinct ?Article1_pmid ?Person1_cwid ?Authorship1_authorRank
WHERE{
?Article1 rdf:type bibo:Document .
?Article1 vivo:informationResourceInAuthorship ?Authorship1 .
?Article1 bibo:pmid ?Article1_pmid .
?Authorship1 rdf:type vivo:Authorship .
?Authorship1 vivo:authorRank ?Authorship1_authorRank .
?Authorship1 vivo:linkedAuthor ?Person1 .
?Person1 rdf:type foaf:Person .
?Person1 wcmc:cwid ?Person1_cwid .
}
Demo
Workflow
• One time only, set up the fields in the Drupal
admin.
• On a weekly basis, execute a set of SPARQL
queries against VIVO’s semantic endpoint.
• Import resulting .csv files into Drupal.
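The weekly query step above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual tooling: it assumes the VIVO installation exposes an endpoint that follows the SPARQL 1.1 Protocol and can return CSV, and the endpoint URL and function names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; substitute the address of your own VIVO SPARQL endpoint.
ENDPOINT = "https://vivo.example.edu/sparql"


def build_csv_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol POST request that asks for CSV results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_csv_results(text):
    """Turn a CSV result set into a list of dicts keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


def run_query(endpoint, query):
    """Send the query and return parsed rows (needs network access)."""
    with urllib.request.urlopen(build_csv_request(endpoint, query)) as response:
        return parse_csv_results(response.read().decode("utf-8"))
```

A scheduler such as cron could call `run_query` once a week for each report query and write the rows to the .csv files that Drupal imports.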
Technology Stack
• Drupal 7.x
• Stores content using the robust indexing application,
Apache Solr 
• AJAX
• Key modules
- Apache Solr
- Facet API
- Facet API graphs
- D3.js (visualization library)
- Charts and graphs
- VIVO Dashboard (custom module)
Performance
• A previous version using MySQL queries took >10
seconds to load
• Completely rewriting the application in Solr
allows us to store X publications
• Performance is now < 5 seconds
Future Work
• Enlist the talents of other Drupal developers
• Release this project as open source code
• Create a visualization for global health expertise
Publications
The following are all publications by active Weill Cornell Medical
College faculty as represented in VIVO.
[Screenshot: VIVO Dashboard publications view with Graph, List, and Export tabs; chart axis from 25 to 100; facets for Publication Type (Research Article 657, In Process 55, Review 45, Clinical Guideline 32, more...), Author Name, Journal ranking (15.4 - 68.3), Date (2009 - Present), and Journal Name]
Repurposing authoritative
semantic data to infer expertise
and recommend grant
opportunities
Jie Lin
jie265@gmail.com
Cornell University
Pressing needs
1. Researchers, development officers, and funding
agencies frequently complain that the process of
learning about grant opportunities is inefficient. 
2. As a project manager for VIVO, I want to
accurately include researchers' fields and
expertise.
Maybe the needs of grant
recommendations and expertise
can be addressed... together.
Our intended workflow
1. Gather information about people and grant notices.
2. Algorithmically make personalized recommendations
of grant opportunities. (Hard.)
3. In exchange for the promise of higher quality
recommendations, we get busy researchers to
provide us feedback on our initial inferences about
expertise.
4. Use expertise data in VIVO.
Sources for people
Source | Example
Clinical expertise and board certifications at WeillCornell.org | clinical pathology
Medical Subject Headings (MeSH) in published papers | anti-bacterial agents
Personal statement | ... I’ve always enjoyed medical education...
Keywords for NIH grants | information system analysis
CFDA labels for NIH grants | 93.821 – Lung diseases research
Spending categories for NIH grants | neurosciences
ClinicalTrials.gov keywords and system-inferred MeSH | violence research
Global health expertise in Researcher Profile System | Egypt
NCCR category as asserted by CTSC staff | Developmental and Child Psychology
Sources for grant opportunities
• ScanGrants
• Grants.gov
After global pre-filtration, n = ~1,200
Concept ranking
• Term Frequency-Inverse Document Frequency:
reward terms for showing up in a person’s list of
terms and penalize terms for appearing in other people’s lists.
• Result: no one is an expert on “humans”
• No algorithm is perfect, so we allow faculty to
provide feedback on the controlled terms we have
inferred for them.
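The ranking idea above can be sketched with a textbook TF-IDF formula. This is an illustration only: the slides do not specify the exact weighting used, and the function name and sample data are hypothetical. Each person's term list is treated as one "document".

```python
import math
from collections import Counter


def rank_concepts(person_terms):
    """Rank each person's terms by TF-IDF.

    person_terms maps a person to a list of terms (repeats allowed).
    Returns a dict mapping each person to (term, score) pairs, best first.
    """
    n_people = len(person_terms)
    # Document frequency: in how many people's lists does each term appear?
    doc_freq = Counter()
    for terms in person_terms.values():
        doc_freq.update(set(terms))

    ranked = {}
    for person, terms in person_terms.items():
        counts = Counter(terms)
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_people / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked[person] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked


# Hypothetical sample data: everyone has "humans", so it scores zero.
sample = {
    "person_a": ["humans", "apoptosis", "apoptosis"],
    "person_b": ["humans", "urology"],
    "person_c": ["humans", "informatics"],
}
ranking = rank_concepts(sample)
```

Because "humans" appears in every list, its inverse document frequency is log(1) = 0, which is why no one ends up ranked as an expert on it.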
Mapping concepts to fields
• The objective of using a limited number of fields is to
increase overlap between people and grants
• 149 (somewhat arbitrarily) defined fields
• Fields are drawn from eight different lists (Map of
Science, ScanGrants, ABMS specialties...)
• Take concepts and fields and do a co-occurrence search
in MEDLINE.
• For example, after weighting by size of field, how often
does “Natural Language Processing” occur in
conjunction with immunology; medical informatics;
urology...?
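One way to read "weighting by size of field" is to divide each co-occurrence count by the number of records in that field and then normalize. The sketch below makes that assumption; the slides do not give the actual formula, and the counts shown are invented for illustration rather than taken from MEDLINE.

```python
def field_weights(cooccurrence, field_size):
    """Map one concept to fields.

    cooccurrence: field -> number of records mentioning both concept and field.
    field_size:   field -> total number of records for that field.
    Returns field -> weight, with weights summing to 1 (or all 0.0).
    """
    rates = {
        field: cooccurrence.get(field, 0) / size
        for field, size in field_size.items()
        if size > 0
    }
    total = sum(rates.values())
    if total == 0:
        return {field: 0.0 for field in rates}
    return {field: rate / total for field, rate in rates.items()}


# Invented counts for the concept "Natural Language Processing".
nlp_cooccurrence = {"immunology": 50, "medical informatics": 900, "urology": 10}
sizes = {"immunology": 500000, "medical informatics": 100000, "urology": 100000}
nlp_weights = field_weights(nlp_cooccurrence, sizes)
```

Dividing by field size keeps very large fields from dominating simply because they contain more records.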
The math for mapping people
and grants to fields
Promise of co-occurrence
searching
Suppose a researcher is working almost exclusively on
autoimmune disease and is highly ranked for the
concept, “apoptosis.”
Apoptosis also frequently co-occurs in MEDLINE with
oncology. Therefore, we can predict her interest in an
oncology grant.
The downside of co-occurrence
searching
Match people to grants
• Not yet done, but early testing is promising.
• The idea is to use cosine similarity to define how
similar any person-grant combination is to any
other person-grant combination
• Then you can rank those connections by people
or by grant.
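A minimal sketch of the cosine-similarity step, under the assumption that each person and each grant is represented as a vector of field weights. The slides describe this work as not yet done, so the representation, function names, and sample vectors here are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_people_for_grant(grant, people):
    """Return (person, similarity) pairs, most similar first."""
    scored = ((name, cosine(vector, grant)) for name, vector in people.items())
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical field-weight vectors.
people = {
    "p1": {"oncology": 1.0},
    "p2": {"urology": 1.0},
    "p3": {"oncology": 0.5, "immunology": 0.5},
}
oncology_grant = {"oncology": 1.0}
candidates = rank_people_for_grant(oncology_grant, people)
```

Swapping the arguments (one person against many grants) gives the ranking by person instead of by grant.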
Utility for Development Office
• Suppose Dr. Lamon and the Development Office
want to identify candidates to apply for a
particular grant.
• He can get a ranked list of the people who are
the most appropriate candidates for this
opportunity.
Demonstration

 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

Repurposing VIVO Data to Analyze Publications and Infer Expertise

  • 1. Repurposing authoritative data about faculty to analyze publication output, infer expertise, and recommend grant opportunities Paul Albert, Don Carpenter, and Jie Lin paa2013@med.cornell.edu Weill Cornell Medical College
  • 3. [Sample faculty profile: Email cyz2123@med.cornell.edu; Phone 646-962-2551; Address 1300 York Avenue, New York, NY 10065; Other sites: Clinical profile] Where is the ongoing motivation to keep these profiles current?
  • 4. “Researchers use profile systems to find collaborators.” A widely invoked “fact” about VIVO (also an old Russian proverb)
  • 5. How can VIVO data address pressing needs in order to strengthen its viability?
  • 6. Pressing needs: 1. Administrators want reports. 2. Both administrators and researchers want to know about funding opportunities.
  • 7. “Invention is 1% inspiration and (due to rounding error) 98% perspiration.” Thomas A. Edison (Source: Yahoo Answers)
  • 8. Pressing needs: 1. Administrators want reports. 2. Both administrators and researchers want to know about funding opportunities.
  • 9. Administrators are avid consumers of institutional data.
  • 10. Proposed question #1: Publications appearing in journals of a given impact factor
  • 11. Proposed question #2: In any given year, which paper has the most incoming citations?
  • 12. Proposed question #3: Which papers that have received federal funding are not deposited in PubMed Central?
  • 13. Proposed question #4: Which clinical departments tend to publish the most?
  • 14. Proposed question #5: What articles have faculty published in the last month in which they were first or last author?
  • 15. Institutional publication reporting: choose two*
      • High-quality disambiguation (>90% accuracy)
      • Minimal delay between review and inclusion in the reporting system
      • Tool is simple enough to allow anyone to use
      * Or one
  • 16. Sample SPARQL query
      SELECT DISTINCT ?Person1_firstName ?Person1_lastName ?Person1_primaryEmail
             ?AcademicArticle1_label ?Journal1_label ?AcademicArticle1_pmid
             ?DateTimeValue1_dateTime
      WHERE {
        ?AcademicArticle1 rdf:type bibo:Document .
        ?AcademicArticle1 bibo:pmid ?AcademicArticle1_pmid .
        ?AcademicArticle1 vivo:dateTimeValue ?DateTimeValue1 .
        ?AcademicArticle1 vivo:informationResourceSupportedBy ?FundingOrganization1 .
        ?DateTimeValue1 rdf:type vivo:DateTimeValue .
        ?DateTimeValue1 vivo:dateTime ?DateTimeValue1_dateTime .
        ?FundingOrganization1 rdf:type vivo:FundingOrganization .
        ?FundingOrganization1 rdfs:label ?FundingOrganization1_label .
        ?AcademicArticle1 rdfs:label ?AcademicArticle1_label .
        ?AcademicArticle1 vivo:hasPublicationVenue ?Journal1 .
        ?Journal1 rdf:type bibo:Journal .
        ?Journal1 rdfs:label ?Journal1_label .
        ?AcademicArticle1 vivo:informationResourceInAuthorship ?Authorship1 .
        ?Authorship1 rdf:type vivo:Authorship .
        ?Authorship1 vivo:linkedAuthor ?Person1 .
        ?Person1 rdf:type foaf:Person .
        ?Person1 vivo:primaryEmail ?Person1_primaryEmail .
        ?Person1 wcmc:cwid ?Person1_cwid .
        ?Person1 foaf:firstName ?Person1_firstName .
        ?Person1 foaf:lastName ?Person1_lastName .
        FILTER REGEX (str(?FundingOrganization1_label), 'N.I.H.', 'i')
        FILTER NOT EXISTS { ?AcademicArticle1 vivo:pmcid ?AcademicArticle1_pmcid . }
        FILTER (xsd:dateTime(?DateTimeValue1_dateTime) > "2008-04-01T00:00:00"^^xsd:dateTime)
        FILTER (xsd:dateTime(?DateTimeValue1_dateTime) < "2012-12-01T00:00:00"^^xsd:dateTime)
      }
      ORDER BY ?Person1_lastName
  • 17. (Sample SPARQL query from slide 16) + VIVO Dashboard
  • 18. VIVO Dashboard: a tool for easily running sophisticated reports Don Carpenter dwc92@cornell.edu Cornell University
  • 19. Prime directive of VIVO Dashboard: Empower untrained users to run sophisticated semantic queries on Weill Cornell faculty publications. * Secondary directive: kill Sarah Connor
  • 20. Sample SPARQL query
      SELECT DISTINCT ?Article1_pmid ?Person1_cwid ?Authorship1_authorRank
      WHERE {
        ?Article1 rdf:type bibo:Document .
        ?Article1 vivo:informationResourceInAuthorship ?Authorship1 .
        ?Article1 bibo:pmid ?Article1_pmid .
        ?Authorship1 rdf:type vivo:Authorship .
        ?Authorship1 vivo:authorRank ?Authorship1_authorRank .
        ?Authorship1 vivo:linkedAuthor ?Person1 .
        ?Person1 rdf:type foaf:Person .
        ?Person1 wcmc:cwid ?Person1_cwid .
      }
  • 22. Workflow
      • On a one-time basis, set up the fields in the Drupal admin.
      • On a weekly basis, execute a set of SPARQL queries against VIVO’s semantic endpoint.
      • Import the resulting .csv files into Drupal.
  • 23. Technology Stack
      • Drupal 7.x
      • Stores content using the robust indexing application, Apache Solr
      • AJAX
      • Key modules
          - Apache Solr
          - Facet API
          - Facet API graphs
          - D3.js (visualization library)
          - Charts and graphs
          - VIVO Dashboard (custom module)
  • 24. Performance
      • A previous version using MySQL queries took >10 seconds to load
      • Completely rewriting the application in Solr allows us to store X publications
      • Performance is now < 5 seconds
  • 25. Future Work
      • Enlist the talents of other Drupal developers
      • Release this project as open source code
      • Create a visualization for global health expertise
  • 26. [Screenshot of the VIVO Dashboard “Publications” view: “The following publications are for all publications by active Weill Cornell Medical College faculty as represented in VIVO.” View options: Graph, List, Export. Facets: Publication Type (Research Article (657), In Process (55), Review (45), Clinical Guideline (32), more...), Author Name, Journal ranking (15.4 - 68.3), Date (2009 - Present), Journal Name.]
  • 27. Repurposing authoritative semantic data to infer expertise and recommend grant opportunities Jie Lin jie265@gmail.com Cornell University
  • 28. Pressing needs: 1. Researchers, development officers, and funding agencies frequently complain that the process of learning about grant opportunities is inefficient. 2. As a project manager for VIVO, I want to accurately include researchers' fields and expertise.
  • 29. Maybe the needs of grant recommendations and expertise can be addressed... together.
  • 30. Our intended workflow
      1. Gather information about people and grant notices.
      2. Algorithmically make personalized recommendations of grant opportunities. (Hard.)
      3. In exchange for the promise of higher quality recommendations, we get busy researchers to provide us feedback on our initial inferences about expertise.
      4. Use expertise data in VIVO.
  • 31. Sources for people
      Source | Example
      Clinical expertise and board certifications at WeillCornell.org | clinical pathology
      Medical Subject Headings (MeSH) in published papers | anti-bacterial agents
      Personal statement | ... I’ve always enjoyed medical education...
      Keywords for NIH grants | information system analysis
      CFDA labels for NIH grants | 93.821 – Lung diseases research
      Spending categories for NIH grants | neurosciences
      ClinicalTrials.gov keywords and system-inferred MeSH | violence research
      Global health expertise in Researcher Profile System | Egypt
      NCCR category as asserted by CTSC staff | Developmental and Child Psychology
  • 32. Sources for grant opportunities: ScanGrants and Grants.gov. After global pre-filtration, n = ~1,200
  • 35. Concept ranking
      • Term Frequency-Inverse Document Frequency: reward terms for showing up in a person’s list of terms and penalize them for appearing in other people’s lists.
      • Result: no one is expert on “humans”
      • No algorithm is perfect, so we allow faculty to provide feedback on the controlled terms we have inferred for them.
  • 36. Mapping concepts to fields
      • The objective of using a limited number of fields is to increase overlap between people and grants
      • 149 (somewhat arbitrarily) defined fields
      • Fields are drawn from eight different lists of fields (Map of Science, ScanGrants, ABMS specialties...)
      • Take concepts and fields and do a co-occurrence search in MEDLINE.
      • For example, after weighting by size of field, how often does “Natural Language Processing” occur in conjunction with immunology; medical informatics; urology...?
  • 37. The math for mapping people and grants to fields
  • 38. Promise of co-occurrence searching: Suppose a researcher is working almost exclusively on autoimmune disease and is highly ranked for the concept “apoptosis.” Apoptosis also frequently co-occurs in MEDLINE with oncology. Therefore, we can predict her interest in an oncology grant.
  • 39. The downside of co-occurrence searching
  • 40. Match people to grants
      • Not yet done, but early testing is promising.
      • The idea is to use cosine similarity to define how similar any person-grant combination is to any other person-grant combination.
      • Then you can rank those connections by people or by grant.
  • 41. Utility for Development Office
      • Suppose Dr. Lamon and the Development Office want to identify candidates to apply for a particular grant.
      • He can get an ordered list of the top candidates who are appropriate for this opportunity.
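
The concept ranking on slide 35 and the person-to-grant matching on slides 40 and 41 can be illustrated with a small Python sketch. It is not the presenters' implementation: the function names (`tf_idf`, `cosine_similarity`, `rank_candidates`), the sample people, and the sample grant are all hypothetical, and for brevity it compares people and grants directly in concept space rather than first mapping concepts to the 149 fields described on slide 36.

```python
import math
from collections import Counter


def tf_idf(term_lists):
    """Weight each person's terms by TF-IDF.

    term_lists maps a (hypothetical) person name to a list of concept terms.
    A term shared by every person gets weight 0, which is why nobody
    ends up an "expert" on a ubiquitous term such as "humans".
    """
    n_people = len(term_lists)
    doc_freq = Counter()
    for terms in term_lists.values():
        doc_freq.update(set(terms))  # count each term once per person

    weights = {}
    for name, terms in term_lists.items():
        counts = Counter(terms)
        total = len(terms)
        weights[name] = {
            term: (count / total) * math.log(n_people / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(grant_vector, person_vectors):
    """Return (name, score) pairs, best match for the grant first."""
    scored = [
        (name, cosine_similarity(vector, grant_vector))
        for name, vector in person_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, for illustration only.
people = {
    "alice": ["humans", "apoptosis", "apoptosis", "autoimmune diseases"],
    "bob": ["humans", "urology"],
    "carol": ["humans", "apoptosis", "neoplasms"],
}
grant = {"apoptosis": 1.0, "neoplasms": 1.0}

for name, score in rank_candidates(grant, tf_idf(people)):
    print(f"{name}: {score:.3f}")
```

Ranking by grant, as here, gives the ordered candidate list a development office would want; looping over grants for a single person's vector would give the researcher-facing view instead.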