What can we learn from topic 
modeling on 350M documents? 
William Gunn 
Head of Academic Outreach 
Mendeley 
@mrgunn - https://orcid.org/0000-0002-3555-2054
Based in London, Mendeley is 
researchers, graduates and software 
developers from...
The opposite problem 
• We have the papers (400M) and are looking for the best way to turn them into structured knowledge.
• We have useful triage indicators: #altmetrics, reproducibility
• You have great use cases
Mendeley extracts research data...
...and aggregates data in the cloud
Collecting rich signals 
from domain experts.
Rich user profile data
TEAM Project
Academic knowledge management solutions
• Developing algorithms to determine the content similarity of academic papers
• Performing text disambiguation and entity recognition to differentiate between and relate similar in-text entities and authors of research papers
• Developing semantic technologies and semantic web languages with a focus on metadata integration and validation
• Investigating profiling and user analysis technologies, e.g. based on search logs and document interaction
• Improving folksonomies and, through them, ontologies of text
• Analysing tagging behaviour to improve tag recommendations and strategies
• http://team-project.tugraz.at/blog/
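The first TEAM Project goal, content similarity between papers, can be illustrated with a small sketch. The slides do not say which similarity algorithm the project uses, so TF-IDF weighting with cosine similarity is only an assumed, commonly used baseline here. The function names and the three sample abstracts are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so terms present in every document keep a small weight.
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "enzyme catalysis speeds up the reaction in the cell",
    "the reaction rate depends on enzyme catalysis",
    "tag recommendation improves search in digital libraries",
]
vecs = tfidf_vectors(abstracts)
sim_related = cosine(vecs[0], vecs[1])    # two biochemistry abstracts
sim_unrelated = cosine(vecs[0], vecs[2])  # biochemistry vs. library science
print(f"related: {sim_related:.2f}, unrelated: {sim_unrelated:.2f}")
```

The two biochemistry abstracts share several terms and score higher than the unrelated pair. At the scale of hundreds of millions of documents, a real system would likely need an inverted index or approximate nearest-neighbour search rather than pairwise comparison.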
Semantics vs. Syntax 
• Language expresses semantics via syntax
• Syntax is all a computer sees in a research article.
• How do we get to semantics?
• Topic Modeling!
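As a rough illustration of how topic modeling moves from word counts toward meaning, here is a minimal sketch of latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling. The slides do not name the algorithm used on the Mendeley corpus, so LDA is an assumption, chosen because it is a widely used topic model. The function `lda_gibbs`, its hyperparameters, and the toy corpus are all hypothetical.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word probabilities, both as normalized distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    # Count tables updated in place as topic assignments change.
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [defaultdict(int) for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + beta * n_vocab)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1
    doc_topic = [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        {w: (topic_word_counts[k][w] + beta) / (topic_totals[k] + beta * n_vocab)
         for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word

# Toy corpus, invented for illustration only.
corpus = [
    "gene protein cell enzyme protein".split(),
    "cell enzyme gene reaction".split(),
    "network software algorithm network".split(),
    "algorithm software data network".split(),
]
doc_topic, topic_word = lda_gibbs(corpus, n_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each topic is a probability distribution over words and each document is a mixture of topics, which is what lets a purely syntactic input produce something that reads like subject categories. A pure-Python sampler like this is only suitable for toy data; a corpus of hundreds of millions of documents would need a distributed or online implementation.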
Distribution of Topics
[Bar chart, y-axis 0% to 35%. Categories: Bio, Phys, Engineer, Comp Sci, Psych & Edu, Business, Law, Other]
Subcategories of Comp. Sci.
[Bar chart, y-axis 0% to 20%. Categories: AI, HCI, Info Sci, Software Eng, Networks]
Generated Topics - Comp. Sci.
Generated Topics - Biology
Categorization is imperfect
Categorization As A Process 
Thing 
Process 
Reaction 
Catalysis 
Enzymatic
Categories change over time
Can we assist triage?
Code Project 
Use case = mining research papers for facts 
to add to LOD repositories and light-weight 
ontologies. 
• Crowd-sourcing enabled semantic enrichment & integration techniques for integrating facts contained in unstructured information into the LOD cloud
• Federated, provenance-enabled querying methods for fact discovery in LOD repositories
• Web-based visual analysis interfaces to support human-based analysis, integration and organisation of facts
• Socio-economic factors (roles, revenue models and value chains) realisable in the envisioned ecosystem
• http://code-research.eu/
Metrics as a discovery tool
We didn't see that a target is more likely to be validated if it was reported in ten publications or in two publications
NATURE REVIEWS DRUG DISCOVERY 10, 712 (SEPTEMBER 2011)
Either the results were reproducible 
and showed transferability in other 
models, or even a 1:1 reproduction of 
published experimental procedures 
revealed inconsistencies between 
published and in-house data 
NATURE REVIEWS DRUG DISCOVERY 10, 712 (SEPTEMBER 2011)
There is no Gold Standard 
• Amgen: 47 of 53 "landmark" oncology publications could not be reproduced.
• Bayer: 43 of 67 oncology & cardiovascular projects were based on contradictory results
• Dr. John Ioannidis: 432 publications purporting sex differences in hypertension, multiple sclerosis, or lung cancer. Only one data set was reproducible.
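To put the Amgen and Bayer counts on a common scale, the short sketch below turns them into percentages. It uses only the numbers quoted on the slide; the variable and function names are hypothetical. The Ioannidis figure is left out because the slide does not state how many data sets were examined.

```python
from fractions import Fraction

# Counts as quoted on the slide: (studies with the reported problem, studies examined).
reports = {
    "Amgen": (47, 53),   # could not be reproduced
    "Bayer": (43, 67),   # based on contradictory results
}

def share(problem, total):
    """Return the problematic share as a percentage rounded to one decimal."""
    return round(float(Fraction(problem, total) * 100), 1)

shares = {name: share(*counts) for name, counts in reports.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}% of examined studies")
```

This works out to roughly 89% for Amgen and 64% for Bayer.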
Building a reproducibility dataset 
• Mendeley and Science Exchange have started the Reproducibility Initiative
• working with Figshare & PLOS to host data & replication reports
• building open datasets backing high-impact work
• extending the "executable paper" concept to biomedical research
Make it porous & part of the web.
• Our success as a crowdsourcing platform is largely due to our openness & end-user usefulness.
• Communities must be open if they are to thrive.
www.mendeley.com 
william.gunn@mendeley.com 
@mrgunn

More Related Content

What's hot

Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Tom Plasterer
ย 
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...ASIS&T
ย 
Michener-institutional and subject-specific data repositories-nfdp13
Michener-institutional and subject-specific data repositories-nfdp13Michener-institutional and subject-specific data repositories-nfdp13
Michener-institutional and subject-specific data repositories-nfdp13DataDryad
ย 
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...ASIS&T
ย 
Creating impact with accessible data in agriculture and nutrition: sharing da...
Creating impact with accessible data in agriculture and nutrition: sharing da...Creating impact with accessible data in agriculture and nutrition: sharing da...
Creating impact with accessible data in agriculture and nutrition: sharing da...godanSec
ย 
Recommendations for selection process automation in systematic reviews
Recommendations for selection process automation in systematic reviewsRecommendations for selection process automation in systematic reviews
Recommendations for selection process automation in systematic reviewsFaisal Razzak
ย 
Policy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionPolicy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionSven Lieber
ย 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016Susanna-Assunta Sansone
ย 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRMichel Dumontier
ย 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationMichel Dumontier
ย 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
ย 
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa... OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...OpenAIRE
ย 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
ย 
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...ASIS&T
ย 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
ย 
RDAP 033111
RDAP 033111RDAP 033111
RDAP 033111Philip Bourne
ย 
AHM 2014: OceanLink, Smart Data versus Smart Applications
AHM 2014: OceanLink, Smart Data versus Smart Applications AHM 2014: OceanLink, Smart Data versus Smart Applications
AHM 2014: OceanLink, Smart Data versus Smart Applications EarthCube
ย 
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkASIS&T
ย 

What's hot (20)

Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
ย 
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the โ€œREADMEโ€ to support disciplinary Doc...
ย 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
ย 
Michener-institutional and subject-specific data repositories-nfdp13
Michener-institutional and subject-specific data repositories-nfdp13Michener-institutional and subject-specific data repositories-nfdp13
Michener-institutional and subject-specific data repositories-nfdp13
ย 
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...
RDAP 15: โ€œThis is just for meโ€: Researchers on their data documentation pract...
ย 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
ย 
Creating impact with accessible data in agriculture and nutrition: sharing da...
Creating impact with accessible data in agriculture and nutrition: sharing da...Creating impact with accessible data in agriculture and nutrition: sharing da...
Creating impact with accessible data in agriculture and nutrition: sharing da...
ย 
Recommendations for selection process automation in systematic reviews
Recommendations for selection process automation in systematic reviewsRecommendations for selection process automation in systematic reviews
Recommendations for selection process automation in systematic reviews
ย 
Policy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protectionPolicy-compliant data processing: RDF-based restrictions for data-protection
Policy-compliant data processing: RDF-based restrictions for data-protection
ย 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
ย 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
ย 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
ย 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
ย 
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa... OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
ย 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
ย 
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...
RDAP14 Poster: openICPSR: a public access repository for storing and sharing ...
ย 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
ย 
RDAP 033111
RDAP 033111RDAP 033111
RDAP 033111
ย 
AHM 2014: OceanLink, Smart Data versus Smart Applications
AHM 2014: OceanLink, Smart Data versus Smart Applications AHM 2014: OceanLink, Smart Data versus Smart Applications
AHM 2014: OceanLink, Smart Data versus Smart Applications
ย 
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
ย 

Similar to Sci Know Mine 2013: What can we learn from topic modeling on 350M academic documents?

VIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionVIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionWilliam Gunn
ย 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13Kristi Holmes
ย 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
ย 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
ย 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data ManagementCarole Goble
ย 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
ย 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...CILIP MDG
ย 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData ManagementUlrike Wittig
ย 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
ย 
Session 0.0 poster minutes madness
Session 0.0   poster minutes madnessSession 0.0   poster minutes madness
Session 0.0 poster minutes madnesssemanticsconference
ย 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
ย 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
ย 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
ย 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
ย 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
ย 
Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0Elia Brodsky
ย 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
ย 
Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Rudy Potenzone
ย 

Similar to Sci Know Mine 2013: What can we learn from topic modeling on 350M academic documents? (20)

VIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity ExtractionVIVO 2013 Topic Modeling Entity Extraction
VIVO 2013 Topic Modeling Entity Extraction
ย 
Martone grethe
Martone gretheMartone grethe
Martone grethe
ย 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
ย 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
ย 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
ย 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
ย 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
ย 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
ย 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
ย 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
ย 
Session 0.0 poster minutes madness
Session 0.0   poster minutes madnessSession 0.0   poster minutes madness
Session 0.0 poster minutes madness
ย 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
ย 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
ย 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
ย 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ย 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
ย 
Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0
ย 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
ย 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
ย 
Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011Acs denver dirks potenzone 30 aug2011
Acs denver dirks potenzone 30 aug2011
ย 

More from William Gunn

AAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes CollaborationAAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes CollaborationWilliam Gunn
ย 
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...William Gunn
ย 
The Scientific and Technical Foundation for Altmetrics in the United States
The Scientific and Technical Foundation for Altmetrics in the United StatesThe Scientific and Technical Foundation for Altmetrics in the United States
The Scientific and Technical Foundation for Altmetrics in the United StatesWilliam Gunn
ย 
AGU2012: Creating a Collaborative Network for Scientists
AGU2012: Creating a Collaborative Network for ScientistsAGU2012: Creating a Collaborative Network for Scientists
AGU2012: Creating a Collaborative Network for ScientistsWilliam Gunn
ย 
Academia to Entrepreneur: Why and How to Leave Academia Behind
Academia to Entrepreneur: Why and How to Leave Academia BehindAcademia to Entrepreneur: Why and How to Leave Academia Behind
Academia to Entrepreneur: Why and How to Leave Academia BehindWilliam Gunn
ย 
Social metrics for Research: Quantity and Quality
Social metrics for Research: Quantity and QualitySocial metrics for Research: Quantity and Quality
Social metrics for Research: Quantity and QualityWilliam Gunn
ย 
ASIST 2013 Panel: Altmetrics at Mendeley
ASIST 2013 Panel: Altmetrics at MendeleyASIST 2013 Panel: Altmetrics at Mendeley
ASIST 2013 Panel: Altmetrics at MendeleyWilliam Gunn
ย 
Code4lib 2012: Building Research Applications with Mendeley
Code4lib 2012: Building Research Applications with MendeleyCode4lib 2012: Building Research Applications with Mendeley
Code4lib 2012: Building Research Applications with MendeleyWilliam Gunn
ย 
Beyond Academia: Communicating your Work in Academia and Beyond
Beyond Academia: Communicating your Work in Academia and Beyond Beyond Academia: Communicating your Work in Academia and Beyond
Beyond Academia: Communicating your Work in Academia and Beyond William Gunn
ย 
Charleston 2013: The Social Side of Research
Charleston 2013: The Social Side of ResearchCharleston 2013: The Social Side of Research
Charleston 2013: The Social Side of ResearchWilliam Gunn
ย 
Science Online 2013: Data Visualization Using R
Science Online 2013: Data Visualization Using RScience Online 2013: Data Visualization Using R
Science Online 2013: Data Visualization Using RWilliam Gunn
ย 
ESIP FED Spring 2012: Evolving Networks of Expertise
ESIP FED Spring 2012: Evolving Networks of ExpertiseESIP FED Spring 2012: Evolving Networks of Expertise
ESIP FED Spring 2012: Evolving Networks of ExpertiseWilliam Gunn
ย 
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly Content
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly ContentCharleston 2012: Altmetrics: Analyzing the Value in Scholarly Content
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly ContentWilliam Gunn
ย 
VIVO 2010 2010 Paper
VIVO 2010 2010 PaperVIVO 2010 2010 Paper
VIVO 2010 2010 PaperWilliam Gunn
ย 
Mendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperMendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperWilliam Gunn
ย 
Beyond the PDF 2011 Paper
Beyond the PDF 2011 PaperBeyond the PDF 2011 Paper
Beyond the PDF 2011 PaperWilliam Gunn
ย 
Connecting Researchers with Information - and Unlocking It!
Connecting Researchers with Information - and Unlocking It!Connecting Researchers with Information - and Unlocking It!
Connecting Researchers with Information - and Unlocking It!William Gunn
ย 
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly CommunicationSci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly CommunicationWilliam Gunn
ย 
Open Science Summit 2011: It's Time We Changed How Science is Done
Open Science Summit 2011: It's Time We Changed How Science is DoneOpen Science Summit 2011: It's Time We Changed How Science is Done
Open Science Summit 2011: It's Time We Changed How Science is DoneWilliam Gunn
ย 
VIVO 2011 Paper
VIVO 2011 PaperVIVO 2011 Paper
VIVO 2011 PaperWilliam Gunn
ย 

More from William Gunn (20)

AAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes CollaborationAAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes Collaboration
ย 
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...
LISA VII: The Scientific and Technical Foundation for Altmetrics in the Unite...
ย 
The Scientific and Technical Foundation for Altmetrics in the United States
The Scientific and Technical Foundation for Altmetrics in the United StatesThe Scientific and Technical Foundation for Altmetrics in the United States
The Scientific and Technical Foundation for Altmetrics in the United States
ย 
AGU2012: Creating a Collaborative Network for Scientists
AGU2012: Creating a Collaborative Network for ScientistsAGU2012: Creating a Collaborative Network for Scientists
AGU2012: Creating a Collaborative Network for Scientists
ย 
Academia to Entrepreneur: Why and How to Leave Academia Behind
Academia to Entrepreneur: Why and How to Leave Academia BehindAcademia to Entrepreneur: Why and How to Leave Academia Behind
Academia to Entrepreneur: Why and How to Leave Academia Behind
ย 
Social metrics for Research: Quantity and Quality
Social metrics for Research: Quantity and QualitySocial metrics for Research: Quantity and Quality
Social metrics for Research: Quantity and Quality
ย 
ASIST 2013 Panel: Altmetrics at Mendeley
ASIST 2013 Panel: Altmetrics at MendeleyASIST 2013 Panel: Altmetrics at Mendeley
ASIST 2013 Panel: Altmetrics at Mendeley
ย 
Code4lib 2012: Building Research Applications with Mendeley
Code4lib 2012: Building Research Applications with MendeleyCode4lib 2012: Building Research Applications with Mendeley
Code4lib 2012: Building Research Applications with Mendeley
ย 
Beyond Academia: Communicating your Work in Academia and Beyond
Beyond Academia: Communicating your Work in Academia and Beyond Beyond Academia: Communicating your Work in Academia and Beyond
Beyond Academia: Communicating your Work in Academia and Beyond
ย 
Charleston 2013: The Social Side of Research
Charleston 2013: The Social Side of ResearchCharleston 2013: The Social Side of Research
Charleston 2013: The Social Side of Research
ย 
Science Online 2013: Data Visualization Using R
Science Online 2013: Data Visualization Using RScience Online 2013: Data Visualization Using R
Science Online 2013: Data Visualization Using R
ย 
ESIP FED Spring 2012: Evolving Networks of Expertise
ESIP FED Spring 2012: Evolving Networks of ExpertiseESIP FED Spring 2012: Evolving Networks of Expertise
ESIP FED Spring 2012: Evolving Networks of Expertise
ย 
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly Content
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly ContentCharleston 2012: Altmetrics: Analyzing the Value in Scholarly Content
Charleston 2012: Altmetrics: Analyzing the Value in Scholarly Content
ย 
VIVO 2010 2010 Paper
VIVO 2010 2010 PaperVIVO 2010 2010 Paper
VIVO 2010 2010 Paper
ย 
Mendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperMendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 Paper
ย 
Beyond the PDF 2011 Paper
Beyond the PDF 2011 PaperBeyond the PDF 2011 Paper
Beyond the PDF 2011 Paper
ย 
Connecting Researchers with Information - and Unlocking It!
Connecting Researchers with Information - and Unlocking It!Connecting Researchers with Information - and Unlocking It!
Connecting Researchers with Information - and Unlocking It!
ย 
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly CommunicationSci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
ย 
Open Science Summit 2011: It's Time We Changed How Science is Done
Open Science Summit 2011: It's Time We Changed How Science is DoneOpen Science Summit 2011: It's Time We Changed How Science is Done
Open Science Summit 2011: It's Time We Changed How Science is Done
ย 
VIVO 2011 Paper
VIVO 2011 PaperVIVO 2011 Paper
VIVO 2011 Paper
ย 


Sci Know Mine 2013: What can we learn from topic modeling on 350M academic documents?

  • 1. What can we learn from topic modeling on 350M documents? William Gunn, Head of Academic Outreach, Mendeley. @mrgunn - https://orcid.org/0000-0002-3555-2054
  • 2. Based in London, Mendeley is researchers, graduates and software developers from...
  • 3. The opposite problem • We have the papers (400M) and are looking for the best way to turn them into structured knowledge. • We have useful triage indicators: #altmetrics, reproducibility. • You have great use cases.
  • 4. Mendeley extracts research data… and aggregates data in the cloud. Collecting rich signals from domain experts.
  • 6. TEAM Project: academic knowledge management solutions • Algorithms to determine the content similarity of academic papers • Performing text disambiguation and entity recognition to differentiate between and relate similar in-text entities and authors of research papers. • Developing semantic technologies and semantic web languages with the focus of metadata integration/validation • Investigate profiling and user analysis technologies, e.g. based on search logs and document interaction. • We will also improve folksonomies and through that, ontologies of text. • Finally, tagging behaviour will be analysed to improve tag recommendations and strategies. • http://team-project.tugraz.at/blog/
  • 7. Semantics vs. Syntax • Language expresses semantics via syntax • Syntax is all a computer sees in a research article. • How do we get to semantics? • Topic Modeling!
  • 8. Distribution of Topics [bar chart; vertical axis 0% to 35%; categories: Bio, Phys, Engineer, Comp Sci, Psych & Edu, Business, Law, Other]
  • 9. Subcategories of Comp. Sci. [bar chart; vertical axis 0% to 20%; categories: AI, HCI, Info Sci, Software Eng, Networks]
  • 14. Categorization As A Process [diagram labels: Thing, Process, Reaction, Catalysis, Enzymatic]
  • 15. Categorization As A Process [diagram labels: Thing, Process, Reaction, Catalysis, Enzymatic]
  • 17. Can we assist triage?
  • 18. Code Project. Use case = mining research papers for facts to add to LOD repositories and light-weight ontologies. • Crowd-sourcing enabled semantic enrichment & integration techniques for integrating facts contained in unstructured information into the LOD cloud • Federated, provenance-enabled querying methods for fact discovery in LOD repositories • Web-based visual analysis interfaces to support human based analysis, integration and organisation of facts • Socio-economic factors (roles, revenue-models and value chains) realisable in the envisioned ecosystem. • http://code-research.eu/
  • 22. Metrics as a discovery tool
  • 23. "We didn't see that a target is more likely to be validated if it was reported in ten publications or in two publications." (Nature Reviews Drug Discovery 10, 712, September 2011)
  • 24. "Either the results were reproducible and showed transferability in other models, or even a 1:1 reproduction of published experimental procedures revealed inconsistencies between published and in-house data." (Nature Reviews Drug Discovery 10, 712, September 2011)
  • 25. There is no Gold Standard • Amgen: 47 of 53 "landmark" oncology publications could not be reproduced. • Bayer: 43 of 67 oncology & cardiovascular projects were based on contradictory results • Dr. John Ioannidis: 432 publications purporting sex differences in hypertension, multiple sclerosis, or lung cancer. Only one data set was reproducible.
  • 26. Building a reproducibility dataset • Mendeley and Science Exchange have started the Reproducibility Initiative • working with Figshare & PLOS to host data & replication reports • building open datasets backing high-impact work • extending the "executable paper" concept to biomedical research
  • 27. Make it porous & part of the web. • Our success as a crowdsourcing platform is largely due to our openness & end-user usefulness. • Communities must be open if they are to thrive.
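
Slide 7 names topic modeling as the route from syntax to semantics, but the transcript does not say which algorithm was used. As a rough illustration only, the sketch below assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, one common topic-modeling technique. The function name `fit_lda`, the hyperparameter values, and the four toy documents are hypothetical and are not taken from the talk.

```python
import random

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (document, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []                                             # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Three most frequent words assigned to each topic.
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words

# Hypothetical toy corpus; real input would be the text of research papers.
docs = [
    "enzyme catalysis reaction enzyme protein".split(),
    "protein reaction catalysis enzyme".split(),
    "network software algorithm network code".split(),
    "algorithm code software network".split(),
]
theta, top_words = fit_lda(docs)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` gives one document's estimated mix of topics, which is the kind of signal that could feed category charts like those on slides 8 and 9. A corpus of hundreds of millions of documents would need a distributed or online implementation rather than this in-memory loop.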