SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
TEMPLATE DESIGN © 2008
www.PosterPresentations.com
Text-Mining PubMed Search Results to Identify Emerging Technologies
Relevant to Medical Librarians
P. F. Anderson1
<pfa@umich.edu>; Skye Bickett2
; Joanne Doucette3
; Pamela Herring4
; Andrea Kepsel5
; Tierney Lyons6
; Scott McLachlan7
; Carol Shannon1
; Lin Wu
8
University of Michigan-Ann Arbor; 2) Georgia Campus-Philadelphia College of Osteopathic Medicine; 3) MCPHS University, Boston; 4) University of Central Florida College
ofMedicine; 5) Michigan State University; 6) Cerebros Medical Systems, Jessup, PA; 7) Ruskin College, Oxford; 8) University of Tennessee, Memphis
Objectives
The Emerging Technologies Team, part of the Medical Library Association (MLA)
systematic review (SR) projects, conducted a pilot study to identify emerging
technologies relevant to medical librarians. The team analyzed results from its
previously reported PubMed Search filter using text mining to identify patterns,
themes, and trends important to the practice of medical librarianship and the
communities we support.
Methods
Analysis: Challenges & Solutions
Challenges:
1. FLink exports as CSV directly
from PubMed, but only permits
export of 10,000 records, no
abstracts.
2. Inadequate (under-powered)
hardware.
3. Large file size created
challenges with opening file and
file conversion.
4. Unable to install current version
of text mining software
(OpenRefine).
5. IT policies (blocking) and
support at some institutions.
Text Mining Images
All images were created through the Voyant-Tools analysis: <http://voyant-tools.org/>
Next Steps & Recommendations
Voyant was used for basic analysis to identify big concepts from 2016, yet we can
dig deeper with additional tools to generate unknown items. Moving forward, we
will extend the years of analysis for trends and patterns and continue gathering
more data. We will continue the text mining process, using visualization
techniques such as Google Refine and OpenRefine for analysis in context and
AntConc for concordance and fringe concepts. We will then publish the results.
Once one has a dataset, the dataset itself can be useful to look for trends in
specific areas, such as surgery or education, potentially in response to areas of
interest within the library’s target audience.
Results
Sources / Resources
Find us
The finalized search strategy results in a five- year file with 162,339 records.
Deduping in EndNote resulted in 162,221 records. We tested the analysis with 5-year
[162,221], 3-year [107,531], and 1-year sets for each of the five years. For this poster
we tested the analysis process with the single year set from 2016 [35,535].
In our initial project planning, we had identified five main areas of interest
(technology, information, public health, education, and the body). This analysis made
clear additional clusters of interesting content, especially new methodologies (e.g.,
big data and data visualization) and emerging interdisciplinary trends (such as
precision medicine). As those had not been included in the original planning, this
showed the potential benefit of text mining for discovering unknown areas of
relevance.
The primary areas of the body which were strongly represented in the data included
blood, bone, brain, and urine. Related concepts which were strongly represented in
the data set included cancer, diagnostics, treatment, and biomarkers.
The three top technologies that arose from the text mining process were robotics,
simulations, and 3D technologies, especially 3D printing. All three were being used
most heavily in surgery. Simulations were also prominent in education/training.
BIBLIOGRAPHY
Higgins JPT, Deeks JJ, (eds.) 2011. Selecting studies and collecting data, London: The
Cochrane Collaboration.
Mane KK, Borner K. 2004. Mapping topics and topic bursts in PNAS. Proceedings of the
National Academy of Sciences 101(suppl. 1), 5287-5290.
Mikova N. 2016. Recent trends in technology mining approaches: Quantitative analysis of GTM
Conference Proceedings. In: Daim TU, Chiavetta D, Porter AL, Saritas O (eds.) Anticipating
future innovation pathways through large data analysis. Cham, Switzerland: Springer
International Publishing.
Porter AL, Cunningham SW. 2005. Tech mining: Exploiting new technologies for competitive
advantage, Hoboken, NJ, John Wiley & Sons, Inc.
Schünemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou PP, Guyatt GH. 2011.
Interpreting results and drawing conclusions. In: Higgins JPT, Green S (eds.) Cochrane
handbook for systematic reviews of interventions Version 5.1.0 (updated March 2011). London:
The Cochrane Collaboration.
Stevens A, Milne R, Lilford R, Gabbay J. 1999. Keeping pace with new technologies: Systems
needed to identify and evaluate them. BMJ 319, 1291-3.
RESOURCES
Endnote: endnote.com/
Voyant: https://voyant-tools.org/
OpenRefine: http://openrefine.org/
AntConc: www.laurenceanthony.net/software/antconc/
We began by establishing a common competency base through custom training
sessions from higher education data-mining experts. Next, the team 1) reviewed and
finalized the emerging technologies PubMed search strategy created for the project;
2) exported the data; 3) used automated tools to clean extraneous data from the data
set; and 4) tested the data by running preliminary text-mining scans. Steps 3 and 4
were repeated to refine and focus the results. Tools such as GREP, R, FLink,
pubmed.mineR were evaluated and tested for data export and cleaning, with the
ultimate choices settling on a combination of EndNote, Voyant, OpenRefine for data
cleaning, and Voyant, OpenRefine/GoogleRefine and AntConc for analysis.
MLASR6 Google Plus Community: goo.gl/RxtOFg
The Medical Library Association initiated a large systematic review project to assess the level of
evidence available to support the profession and practice of medical librarianship in several
very important questions. Team 6 has been assigned to explore this topic:
The explosion of information, expanding of technology (especially mobile technology), and
complexity of healthcare environment present medical librarians and medical libraries
opportunities and challenges. To live up with the opportunities and challenges, what kinds of
skill sets or information structure do medical librarians or medical libraries are required to have
or acquire so as to be strong partners or contributors of continuing effectiveness to the
changing environment?
Process: Software & Technology
1
st
stage: Voyant was used to identify words/word cloud to create custom
stoplist. Stop word list was created by the team leader, and peer reviewed
within the team. Collocations used to identify major tech concepts from word
cloud concepts. Additional visualizations to refine understanding of top three
tech concepts.
2
nd
stage: OpenRefine will be used to open projects & expand analysis.
(OpenRefine challenges: with full dataset, it stalls out in the project creation
stage on team desktops. Works in Chrome, but not in Opera or Safari.)
3
rd
stage: AntConc will be used for a deep dive into the specific terms/phrases
of interest to discover their context and related concepts.
Data Cleaning
After running the finalized search strategy in PubMed, the resulting list was exported
from the database in the MEDLINE format, creating a TXT file. To support the
proposed text mining analysis of this dataset using Voyant or OpenRefine, a CSV
file needed to be built. FLink, an NLM product developed to create CSV files, was
used initially. Unfortunately, the program was only able to handle 10,000 records
and could not produce a CSV file that included abstracts. To create the appropriate
CSV file, results were exported from PubMed as TXT file, imported to EndNote,
deduplicated within EndNote, then exported using a custom output style created by
the team. The resulting CSV file of 162,339 PubMed records was downloaded to
Excel where all fields except for PMID, Title, Abstract and MeSH or Keywords were
deleted. The remaining content was cleaned by removal of punctuation (using
nested SUBSTITUTE functions) and changing all text to lowercase (using the
LOWER function). Considerations during punctuation removal included separation
of MeSH headings and subheadings by removal of the “/” character, the importance
of “.” character in numerical values, and the decision of whether or not to keep
numeric data.
.
Figure 1: Word Cloud of Whole Corpus
Figure 2: Top Three Technologies Identified in Analysis
Figure 3: 3D Printing in Context
Solutions:
1. Export full records, import to
Endnote, use custom filter for initial
cleaning, export to CSV file.
2. Do you have access to more
powerful computers elsewhere?
3. Break file into smaller chunks for
cleaning; pool for final analysis.
4. Upgrade computer to have more
memory, or use more powerful
computer elsewhere.
5. Ensure good technical support and
administrative backing for project.

Contenu connexe

Tendances

Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data ChallengesPhilip Bourne
 
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GrahamSmith646206
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
Prisma s manuscript preprint
Prisma s manuscript preprintPrisma s manuscript preprint
Prisma s manuscript preprintdaisyfloresc
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_PresentationYatpang Cheung
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)Michael Atkins
 
RES812 U4 Individual Project
RES812  U4 Individual ProjectRES812  U4 Individual Project
RES812 U4 Individual ProjectThienSi Le
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Enayat Rajabi
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsFrancesco Osborne
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data ManagementC. Tobin Magle
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the libraryC. Tobin Magle
 
Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Balachandar Radhakrishnan
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 

Tendances (20)

Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data Challenges
 
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Prisma s manuscript preprint
Prisma s manuscript preprintPrisma s manuscript preprint
Prisma s manuscript preprint
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
RES812 U4 Individual Project
RES812  U4 Individual ProjectRES812  U4 Individual Project
RES812 U4 Individual Project
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Curating and Sharing Structures and Spectra for the Environmental Community
Curating and Sharing  Structures and Spectra for the Environmental CommunityCurating and Sharing  Structures and Spectra for the Environmental Community
Curating and Sharing Structures and Spectra for the Environmental Community
 
Payton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook MetadataPayton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook Metadata
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
 
Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 

Similaire à Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians

Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision AssistantMichael Ippolito
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data World
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-finalPeter Embi
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIAInsight_Altmetrics
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Salam Shah
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachijseajournal
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringKelly Lipiec
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...Catherine Canevet
 
Automatic summarization of the medical literature
Automatic summarization of the medical literatureAutomatic summarization of the medical literature
Automatic summarization of the medical literatureharinithiyagarajan4
 
How to extract data from your paper for systemic review - Pubrica
How to extract data from your paper for systemic review  - PubricaHow to extract data from your paper for systemic review  - Pubrica
How to extract data from your paper for systemic review - PubricaPubrica
 
How to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaHow to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaPubrica
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 

Similaire à Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians (20)

Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
 
Prosdocimi ucb cdao
Prosdocimi ucb cdaoProsdocimi ucb cdao
Prosdocimi ucb cdao
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision Assistant
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approach
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...
 
Automatic summarization of the medical literature
Automatic summarization of the medical literatureAutomatic summarization of the medical literature
Automatic summarization of the medical literature
 
How to extract data from your paper for systemic review - Pubrica
How to extract data from your paper for systemic review  - PubricaHow to extract data from your paper for systemic review  - Pubrica
How to extract data from your paper for systemic review - Pubrica
 
How to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaHow to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – Pubrica
 
As World’s Collide
As World’s CollideAs World’s Collide
As World’s Collide
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 

Plus de University of Michigan Taubman Health Sciences Library

Plus de University of Michigan Taubman Health Sciences Library (20)

Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of BurdenSystematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
 
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
 
Methodology Mashups: Systematic Searches, Plus ...
Methodology Mashups: Systematic Searches, Plus ... Methodology Mashups: Systematic Searches, Plus ...
Methodology Mashups: Systematic Searches, Plus ...
 
#OwnVoices in Graphic Medicine: Creation and Collection
#OwnVoices in Graphic Medicine:  Creation and Collection#OwnVoices in Graphic Medicine:  Creation and Collection
#OwnVoices in Graphic Medicine: Creation and Collection
 
Introducing the "Librome Research Core"
Introducing the "Librome Research Core"Introducing the "Librome Research Core"
Introducing the "Librome Research Core"
 
Storytelling workshop: journeys in health care
Storytelling workshop: journeys in health careStorytelling workshop: journeys in health care
Storytelling workshop: journeys in health care
 
Research Methods: Searches & Systematic Reviews
Research Methods: Searches & Systematic ReviewsResearch Methods: Searches & Systematic Reviews
Research Methods: Searches & Systematic Reviews
 
NISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
NISO — Cutting Edges with Company: Emerging Technologies as a Collective EffortNISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
NISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
 
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
 
Making Comics Fast — The Social Justice Version (2017)
Making Comics Fast — The Social Justice Version (2017)Making Comics Fast — The Social Justice Version (2017)
Making Comics Fast — The Social Justice Version (2017)
 
Making Comics Fast (2018)
Making Comics Fast (2018)Making Comics Fast (2018)
Making Comics Fast (2018)
 
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
 
Rapid Reviews 101
Rapid Reviews 101 Rapid Reviews 101
Rapid Reviews 101
 
Making Research in the Life Sciences More Discoverable
Making Research in the Life Sciences More Discoverable Making Research in the Life Sciences More Discoverable
Making Research in the Life Sciences More Discoverable
 
Methods: Searching & Systematic Reviews
Methods: Searching & Systematic ReviewsMethods: Searching & Systematic Reviews
Methods: Searching & Systematic Reviews
 
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
 
Reinventing Normal 2: David Chesney, Gaming for the Greater Good
Reinventing Normal 2: David Chesney, Gaming for the Greater Good Reinventing Normal 2: David Chesney, Gaming for the Greater Good
Reinventing Normal 2: David Chesney, Gaming for the Greater Good
 
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment  Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
 
Comic creation as an innovative library role
Comic creation as an innovative library roleComic creation as an innovative library role
Comic creation as an innovative library role
 
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Dernier (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians

  • 1. TEMPLATE DESIGN © 2008 www.PosterPresentations.com Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians P. F. Anderson1 <pfa@umich.edu>; Skye Bickett2 ; Joanne Doucette3 ; Pamela Herring4 ; Andrea Kepsel5 ; Tierney Lyons6 ; Scott McLachlan7 ; Carol Shannon1 ; Lin Wu 8 University of Michigan-Ann Arbor; 2) Georgia Campus-Philadelphia College of Osteopathic Medicine; 3) MCPHS University, Boston; 4) University of Central Florida College ofMedicine; 5) Michigan State University; 6) Cerebros Medical Systems, Jessup, PA; 7) Ruskin College, Oxford; 8) University of Tennessee, Memphis Objectives The Emerging Technologies Team, part of the Medical Library Association (MLA) systematic review (SR) projects, conducted a pilot study to identify emerging technologies relevant to medical librarians. The team analyzed results from its previously reported PubMed Search filter using text mining to identify patterns, themes, and trends important to the practice of medical librarianship and the communities we support. Methods Analysis: Challenges & Solutions Challenges: 1. FLink exports as CSV directly from PubMed, but only permits export of 10,000 records, no abstracts. 2. Inadequate (under-powered) hardware. 3. Large file size created challenges with opening file and file conversion. 4. Unable to install current version of text mining software (OpenRefine). 5. IT policies (blocking) and support at some institutions. Text Mining Images All images were created through the Voyant-Tools analysis: <http://voyant-tools.org/> Next Steps & Recommendations Voyant was used for basic analysis to identify big concepts from 2016, yet we can dig deeper with additional tools to generate unknown items. Moving forward, we will extend the years of analysis for trends and patterns and continue gathering more data. We will continue the text mining process, using visualization techniques such as Google Refine and OpenRefine for analysis in context and AntConc for concordance and fringe concepts. We will then publish the results. Once one has a dataset, the dataset itself can be useful to look for trends in specific areas, such as surgery or education, potentially in response to areas of interest within the library’s target audience. Results Sources / Resources Find us The finalized search strategy results in a five- year file with 162,339 records. Deduping in EndNote resulted in 162,221 records. We tested the analysis with 5-year [162,221], 3-year [107,531], and 1-year sets for each of the five years. For this poster we tested the analysis process with the single year set from 2016 [35,535]. In our initial project planning, we had identified five main areas of interest (technology, information, public health, education, and the body). This analysis made clear additional clusters of interesting content, especially new methodologies (e.g., big data and data visualization) and emerging interdisciplinary trends (such as precision medicine). As those had not been included in the original planning, this showed the potential benefit of text mining for discovering unknown areas of relevance. The primary areas of the body which were strongly represented in the data included blood, bone, brain, and urine. Related concepts which were strongly represented in the data set included cancer, diagnostics, treatment, and biomarkers. The three top technologies that arose from the text mining process were robotics, simulations, and 3D technologies, especially 3D printing. All three were being used most heavily in surgery. Simulations were also prominent in education/training. BIBLIOGRAPHY Higgins JPT, Deeks JJ, (eds.) 2011. Selecting studies and collecting data, London: The Cochrane Collaboration. Mane KK, Borner K. 2004. Mapping topics and topic bursts in PNAS. Proceedings of the National Academy of Sciences 101(suppl. 1), 5287-5290. Mikova N. 2016. Recent trends in technology mining approaches: Quantitative analysis of GTM Conference Proceedings. In: Daim TU, Chiavetta D, Porter AL, Saritas O (eds.) Anticipating future innovation pathways through large data analysis. Cham, Switzerland: Springer International Publishing. Porter AL, Cunningham SW. 2005. Tech mining: Exploiting new technologies for competitive advantage, Hoboken, NJ, John Wiley & Sons, Inc. Schünemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou PP, Guyatt GH. 2011. Interpreting results and drawing conclusions. In: Higgins JPT, Green S (eds.) Cochrane handbook for systematic reviews of interventions Version 5.1.0 (updated March 2011). London: The Cochrane Collaboration. Stevens A, Milne R, Lilford R, Gabbay J. 1999. Keeping pace with new technologies: Systems needed to identify and evaluate them. BMJ 319, 1291-3. RESOURCES Endnote: endnote.com/ Voyant: https://voyant-tools.org/ OpenRefine: http://openrefine.org/ AntConc: www.laurenceanthony.net/software/antconc/ We began by establishing a common competency base through custom training sessions from higher education data-mining experts. Next, the team 1) reviewed and finalized the emerging technologies PubMed search strategy created for the project; 2) exported the data; 3) used automated tools to clean extraneous data from the data set; and 4) tested the data by running preliminary text-mining scans. Steps 3 and 4 were repeated to refine and focus the results. Tools such as GREP, R, FLink, pubmed.mineR were evaluated and tested for data export and cleaning, with the ultimate choices settling on a combination of EndNote, Voyant, OpenRefine for data cleaning, and Voyant, OpenRefine/GoogleRefine and AntConc for analysis. MLASR6 Google Plus Community: goo.gl/RxtOFg The Medical Library Association initiated a large systematic review project to assess the level of evidence available to support the profession and practice of medical librarianship in several very important questions. Team 6 has been assigned to explore this topic: The explosion of information, expanding of technology (especially mobile technology), and complexity of healthcare environment present medical librarians and medical libraries opportunities and challenges. To live up with the opportunities and challenges, what kinds of skill sets or information structure do medical librarians or medical libraries are required to have or acquire so as to be strong partners or contributors of continuing effectiveness to the changing environment? Process: Software & Technology 1 st stage: Voyant was used to identify words/word cloud to create custom stoplist. Stop word list was created by the team leader, and peer reviewed within the team. Collocations used to identify major tech concepts from word cloud concepts. Additional visualizations to refine understanding of top three tech concepts. 2 nd stage: OpenRefine will be used to open projects & expand analysis. (OpenRefine challenges: with full dataset, it stalls out in the project creation stage on team desktops. Works in Chrome, but not in Opera or Safari.) 3 rd stage: AntConc will be used for a deep dive into the specific terms/phrases of interest to discover their context and related concepts. Data Cleaning After running the finalized search strategy in PubMed, the resulting list was exported from the database in the MEDLINE format, creating a TXT file. To support the proposed text mining analysis of this dataset using Voyant or OpenRefine, a CSV file needed to be built. FLink, an NLM product developed to create CSV files, was used initially. Unfortunately, the program was only able to handle 10,000 records and could not produce a CSV file that included abstracts. To create the appropriate CSV file, results were exported from PubMed as TXT file, imported to EndNote, deduplicated within EndNote, then exported using a custom output style created by the team. The resulting CSV file of 162,339 PubMed records was downloaded to Excel where all fields except for PMID, Title, Abstract and MeSH or Keywords were deleted. The remaining content was cleaned by removal of punctuation (using nested SUBSTITUTE functions) and changing all text to lowercase (using the LOWER function). Considerations during punctuation removal included separation of MeSH headings and subheadings by removal of the “/” character, the importance of “.” character in numerical values, and the decision of whether or not to keep numeric data. . Figure 1: Word Cloud of Whole Corpus Figure 2: Top Three Technologies Identified in Analysis Figure 3: 3D Printing in Context Solutions: 1. Export full records, import to Endnote, use custom filter for initial cleaning, export to CSV file. 2. Do you have access to more powerful computers elsewhere? 3. Break file into smaller chunks for cleaning; pool for final analysis. 4. Upgrade computer to have more memory, or use more powerful computer elsewhere. 5. Ensure good technical support and administrative backing for project.