PubMed now indexes roughly 25 million articles and is growing by more than a million per year. The scale of this “Big Knowledge” repository renders traditional, article-based modes of user interaction unsatisfactory and demands new interfaces for integrating and summarizing widely distributed knowledge. Natural language processing (NLP) techniques coupled with rich user interfaces can help meet this demand, giving end users enhanced views into public knowledge and stimulating their ability to form new hypotheses.
Knowledge.Bio provides a Web interface for exploring the results from text-mining PubMed. It works with subject, predicate, object assertions (triples) extracted from individual abstracts and with predicted statistical associations between pairs of concepts. While agnostic to the NLP technology employed, the current implementation is loaded with triples from the SemRep-generated SemMedDB database and putative gene-disease pairs obtained using Leiden University Medical Center’s ‘Implicitome’ technology.
Users of Knowledge.Bio begin by identifying a concept of interest using text search. Once a concept is identified, associated triples and concept-pairs are displayed in tables. These tables have text-based and semantic filters to help refine the list of triples to relations of interest. The user then selects relations for insertion into a personal knowledge graph implemented using cytoscape.js. The graph is used as a note-taking or ‘mind-mapping’ structure that can be saved offline and then later reloaded into the application. Clicking on edges within a graph or on the ‘evidence’ element of a triple displays the abstracts where that relation was detected, thus allowing the user to judge the veracity of the statement and to read the underlying articles.
Knowledge.Bio is a free, open-source application that can provide deep, personal, concise, shareable views into the “Big Knowledge” scattered across the biomedical literature.
Application: http://knowledge.bio
Source code: https://bitbucket.org/sulab/kb1/
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
References
[1] Ono, Keiichiro, Barry Demchak, and Trey Ideker. "Cytoscape tools for the
web age: D3.js and Cytoscape.js exporters." F1000Research 3 (2014).
[2] Kilicoglu, Halil, et al. "SemMedDB: a PubMed-scale repository of biomedical
semantic predications." Bioinformatics 28.23 (2012): 3158-3160.
[3] Rindflesch, Thomas C., and Marcelo Fiszman. "The interaction of domain
knowledge and linguistic structure in natural language processing: interpreting
hypernymic propositions in biomedical text." Journal of biomedical informatics
36.6 (2003): 462-477.
[4] Bodenreider, Olivier. "The unified medical language system (UMLS):
integrating biomedical terminology." Nucleic acids research 32.suppl 1 (2004):
D267-D270.
[5] Hettne KM, Thompson M, Van Haagen H, Van der Horst E, Kaliyaperumal R,
Mina E, Tatum Z, Laros JFJ, Van Mulligen EM, Schuemie M, Aten E, Shu Li
T, Bruskiewich R, Good BM, Su AI, Kors JA, Den Dunnen J, Van Ommen G,
Roos M, 't Hoen PAC, Mons B, Schultes EA. The implicitome: a resource for
inferring gene-disease associations. Under review.
[6] https://github.com/BiosemanticsDotOrg/GeneDiseasePaper
[7] Swanson, Don R. "Medical literature as a potential source of new knowledge."
Bulletin of the Medical Library Association 78.1 (1990): 29.
Acknowledgements
NIGMS GM089820
Benjamin M. Good, Ph.D.¹; Richard M. Bruskiewich, Ph.D.²; Kenneth C. Huellas-Bruskiewicz²; Farzin Ahmed²; Andrew I. Su, Ph.D.¹
¹ The Scripps Research Institute, La Jolla, CA, USA. ² STAR Informatics / Delphinai Corporation, Port Moody, BC, Canada
Big Knowledge
Cytoscape.js mindmap [1] for charting semantic
relationships mined from the literature. Users
create their own maps as they interact with the tool.
The maps are interactive, with each edge linked to
the evidence underlying it. Maps can be saved as
local JSON files, shared, and reloaded into the
application.
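A saved map could look roughly like the sketch below. It assumes the file follows the general Cytoscape.js elements layout (nodes and edges that each carry a `data` object); the `label` field, the helper names `make_map`, `save_map`, and `load_map`, and the file name are illustrative assumptions, not the application's actual schema.

```python
import json
import tempfile
from pathlib import Path


def make_map(edges):
    """Build a Cytoscape.js-style elements dict from (source, label, target) tuples."""
    node_ids = sorted({name for s, _, t in edges for name in (s, t)})
    return {
        "elements": {
            "nodes": [{"data": {"id": n}} for n in node_ids],
            "edges": [
                {"data": {"id": f"e{i}", "source": s, "target": t, "label": label}}
                for i, (s, label, t) in enumerate(edges)
            ],
        }
    }


def save_map(concept_map, path):
    """Write the map to a local JSON file."""
    Path(path).write_text(json.dumps(concept_map, indent=2), encoding="utf-8")


def load_map(path):
    """Read a previously saved map back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# The implicit edge shown on this poster; the "implicit" label is hypothetical.
concept_map = make_map([("Smith Lemli Opitz Syndrome", "implicit", "CYP2R1")])
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_map.json"
    save_map(concept_map, path)
    reloaded = load_map(path)
```

Because the map is plain JSON, the same file can be shared with a collaborator and loaded without any server-side state.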
NHGRI HG008015
Contact
@bgood bgood@scripps.edu
Evidence view. Shows original sentence contexts for
explicit triples along with links to view the associated
abstract and to view other triples mined from that
abstract. For implicit relations, clicking on ‘show
evidence’ opens the ‘co-occurrence’ view so that the
user can examine the A-B and B-C connections.
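The idea behind the co-occurrence view can be sketched as follows. This is an illustration under stated assumptions, not the application's code: `abstracts` is a hypothetical mapping from an abstract identifier to the set of concepts found in it, and the function names are invented for the example.

```python
def linking_concepts(abstracts, a, c):
    """Return B concepts that co-occur with `a` in some abstract and with `c` in some abstract."""
    with_a, with_c = set(), set()
    for concepts in abstracts.values():
        if a in concepts:
            with_a |= concepts
        if c in concepts:
            with_c |= concepts
    return (with_a & with_c) - {a, c}


def co_occur(abstracts, a, c):
    """True if `a` and `c` appear together in at least one abstract."""
    return any(a in concepts and c in concepts for concepts in abstracts.values())


# Toy data with placeholder identifiers and concept names.
abstracts = {
    "doc1": {"A", "B1", "B2"},
    "doc2": {"B1", "C"},
    "doc3": {"B3", "C"},
}
bridges = linking_concepts(abstracts, "A", "C")
directly_linked = co_occur(abstracts, "A", "C")
```

Here A and C never share an abstract, yet B1 appears with each of them, which is the kind of A-B and B-C evidence a user would inspect.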
Selecting edges in the map allows the
user to “show evidence” or to
remove the edge.
Table views. Concept relations are
presented in tables that may be
filtered by text or by semantic type.
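A minimal sketch of such filtering, assuming each row is a triple annotated with a semantic type for its subject and object. The `Triple` class, the `filter_triples` function, and the sample rows are hypothetical; the rows are made up for illustration and are not SemMedDB records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    subject_type: str  # semantic type label, e.g. "phsu"
    predicate: str
    object: str
    object_type: str


def filter_triples(triples, text=None, semantic_types=None):
    """Keep triples matching a case-insensitive text query and/or a set of semantic types."""
    kept = []
    for t in triples:
        if text is not None:
            haystack = f"{t.subject} {t.predicate} {t.object}".lower()
            if text.lower() not in haystack:
                continue
        if semantic_types is not None:
            # Keep the row if either end of the relation has a requested type.
            if not ({t.subject_type, t.object_type} & set(semantic_types)):
                continue
        kept.append(t)
    return kept


# Made-up rows for illustration only.
rows = [
    Triple("drug A", "phsu", "treats", "disease X", "dsyn"),
    Triple("gene G", "gngm", "causes", "disease X", "dsyn"),
    Triple("drug A", "phsu", "disrupts", "gene G", "gngm"),
]

treats_only = filter_triples(rows, text="treats")
gene_rows = filter_triples(rows, semantic_types={"gngm"})
```

Combining both arguments narrows the table further, for example `filter_triples(rows, text="disease x", semantic_types={"phsu"})`.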
Concept search. The user begins with
a text search to identify a concept.
Once selected, the table views
provide access to related concepts
spread across the literature.
Key features
• Knowledge.Bio supports a concept-centric rather than document-centric
mode of interacting with the scientific literature.
• It provides a way of using both statistically predicted, implicit associations
and explicit predications linked to specific sentences in the same application.
• User-created concept maps can be saved, shared with others, and reloaded
into the application.
Current sources of concept relations
SemMedDB [2]
• Uses SemRep [3] to extract semantic predications (‘triples’) from
PubMed abstracts.
• The complete database contains more than 70 million predications.
• Utilizes properties such as ‘treats’, ‘causes’, ‘disrupts’, and ‘augments’
from the UMLS semantic network [4].
Example sentence processed by SemRep (PMID 2893633):
“Convulsive seizure was suppressed by
physostigmine (p less than 0.01) or 5-HTP (p less than
0.20).”
Implicitome [5,6]
• Uses ‘concept profiles’ to identify implicit relations in the literature.
• Concept profiles are defined by weighted vectors of co-occurring concepts.
• Concept profiles enable inference of relationships through Swanson’s ABC model [7].
• LUMC generated 204,072,376 ranked, implicit gene-disease relations.
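To make the notion of a concept profile concrete, the sketch below treats each profile as a sparse weighted vector and ranks gene-disease pairs by how similar their profiles are. The poster does not state which matching function the Implicitome uses, so cosine similarity is an assumption here, as are the function names and the toy weights.

```python
import math


def cosine(profile_a, profile_c):
    """Cosine similarity between two sparse concept profiles (dict: concept -> weight)."""
    shared = profile_a.keys() & profile_c.keys()
    dot = sum(profile_a[k] * profile_c[k] for k in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_c = math.sqrt(sum(w * w for w in profile_c.values()))
    if norm_a == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_a * norm_c)


def rank_pairs(gene_profiles, disease_profiles):
    """Score every gene-disease pair and return (score, gene, disease), best first."""
    scored = [
        (cosine(gp, dp), gene, disease)
        for gene, gp in gene_profiles.items()
        for disease, dp in disease_profiles.items()
    ]
    return sorted(scored, reverse=True)


# Toy profiles with placeholder concept names and weights.
gene_profiles = {
    "gene1": {"B1": 1.0, "B2": 1.0},
    "gene2": {"B3": 1.0},
}
disease_profiles = {"diseaseX": {"B1": 1.0, "B2": 1.0}}
ranked = rank_pairs(gene_profiles, disease_profiles)
```

A pair can score highly even if the gene and disease never appear in the same abstract, because the score depends only on the intermediate concepts they share.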
Implicit connection. The edge linking Smith
Lemli Opitz Syndrome to CYP2R1 is
computationally inferred by the Implicitome.
Those concepts never co-occur in any abstract.
The explicit relations emanating from them provide
many hypothetical explanations for their
relationship.
Work in progress
• Conversion of the current implementation, a MySQL-backed
Python/Django application, to a Java server implementation based on
Neo4j.
• Supporting “closed discovery” workflows by suggesting relations that
connect multiple input nodes.
• Integration with http://www.ndexbio.org for storing and sharing user-created
concept maps.
• Better support for collaborative concept-map construction.
• On-demand import of new concept-network sources such as
WikiPathways (http://www.wikipathways.org).
• Capturing user feedback for improving text-mining algorithms.
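One way the planned “closed discovery” support might work is sketched below: given several input concepts, suggest intermediate concepts that are directly related to at least two of them. This is a hypothetical illustration of the idea, not the planned implementation; the function name, the ranking rule, and the toy triples are assumptions.

```python
from collections import defaultdict


def suggest_connectors(triples, inputs):
    """Rank non-input concepts by how many of the input concepts they are directly related to."""
    inputs = set(inputs)
    touched = defaultdict(set)  # candidate concept -> input concepts it links to
    for subj, _pred, obj in triples:
        if subj in inputs and obj not in inputs:
            touched[obj].add(subj)
        if obj in inputs and subj not in inputs:
            touched[subj].add(obj)
    # Keep candidates that connect at least two inputs; most connections first.
    ranked = [(len(hits), c) for c, hits in touched.items() if len(hits) >= 2]
    return [c for _n, c in sorted(ranked, key=lambda x: (-x[0], x[1]))]


# Toy triples with placeholder concept names.
triples = [
    ("A", "treats", "M"),
    ("M", "causes", "C"),
    ("N", "augments", "A"),
    ("N", "disrupts", "C"),
    ("N", "treats", "D"),
    ("P", "causes", "A"),
]
suggestions = suggest_connectors(triples, ["A", "C", "D"])
```

In this toy data, N links to all three inputs and M to two, while P touches only one input and is not suggested.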
[Figure: Number of biomedical research articles created per year (as listed in PubMed), plotted by year from 1914 to 2014; vertical axis from 0 to 1,200,000.]