I presented a talk at the Protege research meeting on 'Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)': https://sites.google.com/site/protegeresearchmeeting/meeting-materials/current-advances-to-bridge-the-usability-expressivity-gap-in-semantic-search
1. CURRENT ADVANCES TO BRIDGE THE
USABILITY-EXPRESSIVITY GAP IN
BIOMEDICAL SEMANTIC SEARCH
(AND VISUALIZING LINKED DATA)
Maulik R. Kamdar
Biomedical Informatics PhD Program
3rd April 2015
2. QUERYING HETEROGENEOUS
DATASETS ON THE LINKED DATA WEB
André Freitas, Edward Curry, João Gabriel
Oliveira and Seán O'Riain
Internet Computing
February 2012
EVALUATING THE USABILITY OF
NATURAL LANGUAGE QUERY
LANGUAGES AND INTERFACES TO
SEMANTIC WEB KNOWLEDGE BASES
Esther Kaufmann and Abraham Bernstein
Journal of Web Semantics
November 2010
3. INTRODUCTION
• Opportunities
— Builds on existing Web Infrastructure (URIs and HTTP)
and Semantic Web Standards (RDF, RDFS, vocabularies)
— Reduce barriers to data publication, consumption, reuse
and availability, adding a fine-grained structure.
— Expose previously siloed databases as data graphs (D2R,
Google Refine) to be interlinked and integrated with other
datasets to create a global-scale interlinked dataspace.
• Challenges
— Awareness of which exposed datasets potentially contain
the data they want, their location and their data model.
— Syntax of structured query languages like SPARQL
— Heterogeneous, different descriptors for same entity,
loosely-connected (yet!) and distributed data sources
10. EXISTING APPROACHES
• Information Retrieval Approaches
— Entity-centric Search (SWSE, Sindice)
— Structure Search (Semplore) – use of inverted indexes
and user feedback strategies
• Natural Language Queries
— Question Answering (PowerAqua, FREyA)
— Difficult to expand across domains
— Best-effort Natural Language Interfaces (Treo)
— Habitability Problem - users need guidance and support
— WordNet/Wikipedia semantic approximation techniques
• Structured SPARQL Queries
11. CHALLENGE DIMENSIONS
• Query expressivity
— Query datasets by referencing elements in the data model, operate
over the data (aggregate results, express conditional statements).
• Usability
— An easy-to-operate, intuitive, and task-efficient query interface.
• Vocabulary-level semantic matching
— Semantically match query terms to dataset vocabulary-level terms.
• Entity reconciliation
— Match entities expressed in the query to semantically equivalent
dataset entities.
• Semantic tractability mechanisms
— Answer queries not supported by explicit dataset statements
(for example, “Is Natalie Portman an Actress?” can be supported by
the statement “Natalie Portman starred Star Wars”).
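A minimal sketch of vocabulary-level matching, assuming a small in-memory index of labels to term URIs. The `example.org` URIs, the `match_term` helper and the 0.6 threshold are hypothetical, and plain string similarity stands in here for the WordNet/Wikipedia semantic approximation techniques mentioned earlier.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary index: label -> term URI (all URIs are made up).
VOCABULARY = {
    "molecular weight": "http://example.org/vocab/molecularWeight",
    "drug": "http://example.org/vocab/Drug",
    "publication": "http://example.org/vocab/Publication",
    "assay": "http://example.org/vocab/Assay",
}


def match_term(query_term, vocabulary=VOCABULARY, threshold=0.6):
    """Return (uri, score) for the best-matching label, or None below threshold."""
    term = query_term.lower().strip()
    best_label, best_score = None, 0.0
    for label in vocabulary:
        # Lexical similarity in [0, 1]; a stand-in for a semantic measure.
        score = SequenceMatcher(None, term, label).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None or best_score < threshold:
        return None
    return vocabulary[best_label], best_score


if __name__ == "__main__":
    print(match_term("Drugs"))
    print(match_term("zzz"))
```

A lexical matcher like this cannot relate terms that share meaning but not spelling, which is one reason the matching dimension is hard.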
16. BIOMEDICAL MOTIVATION
[Figure: funnel of compound counts (~300 000 compounds, ~300 interesting compounds, ~10 interesting compounds, ~5 compounds). Labels: Literature, Virtual Screening, Query databases, Hypothesis Generation, (Linked) Data]
“Are there Drugs with molecular weight under 400 tested against ‘Colon Cancer’?”
“Do any Publications refer to assays using ‘Aspirin’ as the primary Drug in treatment of ‘Prostate Cancer’?”
17. REVEALD: A USER-DRIVEN
DOMAIN-SPECIFIC INTERACTIVE
SEARCH PLATFORM FOR
BIOMEDICAL RESEARCH
Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain,
Stefan Decker and Helena F. Deus
Journal of Biomedical Informatics
February 2014
18. CHALLENGES
• Awareness of which exposed datasets potentially
contain the data users want, and of their data models.
• Large, heterogeneous biomedical data sources, which
are too dynamic for reliable data centralization
• The assembly of SPARQL queries to create the
aggregated information for bioinformatics analysis
still poses a high cognitive entry barrier.
• Human-readable, and more specifically,
domain-specific representation of query results is required.
• None of the previous systems were tested in biomedical
domains, except DistilBio, VIQUEN and Cuebee
• Trade-off between expressivity and usability.
19. BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL
Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.
21. LIFE SCIENCES LINKED OPEN DATA CLOUD
Life Sciences: ~3 billion triples, 53 datasets
Cyganiak, R. and Jentzsch, A. (2014) The Linking Open Data cloud diagram. http://lod-cloud.net/ [Accessed: March 23, 2013]
22. BACKGROUND: CATALOGUING & LINKING
1248 Concepts and 1255 properties were harvested
from more than 53 Linked Biomedical Data Sources
(LBDS) (Life Sciences Linked Open Data – LSLOD
catalogue) and linked to the CanCO Query Elements.
Hasnain, Ali, et al. "Cataloguing and linking life sciences LOD cloud." 1st International Workshop on Ontology Engineering in a Data-driven World (OEDW 2012).
26. REVEALD SEARCH PLATFORM
• ReVeaLD (Real-Time Visual Explorer and
Aggregator of Linked Data) is a user-driven
domain-specific search platform.
• Intuitively formulate advanced search queries
using a click-input-select mechanism
• Visualize the results in a domain-suitable format.
• Entity-centric and Visual Query Search System
• Assembly of the query is governed by a Domain-specific
Language (DSL), which in this case is the
Cancer Chemoprevention Ontology (CanCO)
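One way to picture query assembly from click-input-select steps is a small builder that turns a clicked concept plus entered property constraints into SPARQL. This is only a sketch: the `ex:` prefix, the `Drug` and `molecularWeight` names and the `build_query` function are hypothetical, since the slides do not give the actual CanCO URIs or the platform's implementation.

```python
# Hypothetical prefix; the real DSL namespace is not given in the slides.
PREFIX = "PREFIX ex: <http://example.org/dsl#>"

ALLOWED_OPS = {"<", "<=", ">", ">=", "=", "!="}


def build_query(concept, filters):
    """Assemble a SELECT query from a clicked concept and a list of
    (property, operator, value) inputs selected by the user."""
    patterns = [f"?s a ex:{concept} ."]
    conditions = []
    for i, (prop, op, value) in enumerate(filters):
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        var = f"?v{i}"
        patterns.append(f"?s ex:{prop} {var} .")
        if isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Quote string values and escape embedded double quotes.
            literal = '"' + str(value).replace('"', '\\"') + '"'
        conditions.append(f"FILTER ({var} {op} {literal})")
    body = "\n  ".join(patterns + conditions)
    return f"{PREFIX}\nSELECT DISTINCT ?s WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    # "Drugs with molecular weight under 400"
    print(build_query("Drug", [("molecularWeight", "<", 400)]))
```

Restricting concepts and properties to those defined in the DSL is what keeps the generated queries valid without the user writing SPARQL by hand.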
37. GRAPHIC RULES
• Query: SELECT * WHERE { <clickedURI> ?p ?o }
• Results are subjected to a set of Graphic Rules, which
follow the Event-Condition-Action (ECA) paradigm
and provide visual representations using the
Fresnel Display Vocabulary.
• Example:
— Event: Each retrieved triple as query execution result
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage> "http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO"
— Condition: sdf_file or pdbIdpage (Predicate) + http (Object)
— Action: HTTP GET and invoke a specific Resource Renderer
— Resource Renderer: GLMol Molecular Viewer
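The Event-Condition-Action flow in this example can be sketched as follows. The `GraphicRule` class, the `apply_rules` function and the returned `molecular-viewer:` string are hypothetical placeholders. A real action would issue the HTTP GET and hand the response to a renderer such as GLMol, and real rules would be expressed with the Fresnel Display Vocabulary, not as Python callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class GraphicRule:
    condition: Callable[[Triple], bool]  # Condition: test on the triple
    action: Callable[[Triple], str]      # Action: names the renderer output


def is_structure_link(triple: Triple) -> bool:
    """Predicate local name is sdf_file or pdbIdPage, and the object is an http URL."""
    _, predicate, obj = triple
    local_name = predicate.rstrip("/").rsplit("/", 1)[-1].lower()
    return local_name in {"sdf_file", "pdbidpage"} and obj.startswith("http")


def render_molecule(triple: Triple) -> str:
    # Placeholder only: no network request is made in this sketch.
    return f"molecular-viewer:{triple[2]}"


def apply_rules(triple: Triple, rules: List[GraphicRule]) -> str:
    """Event: one retrieved triple. Fire the first rule whose condition holds;
    if a rule is corrupt (raises), fall back to the textual representation."""
    for rule in rules:
        try:
            if rule.condition(triple):
                return rule.action(triple)
        except Exception:
            break
    return " ".join(triple)


RULES = [GraphicRule(is_structure_link, render_molecule)]

if __name__ == "__main__":
    triple = (
        "http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844",
        "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage",
        "http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO",
    )
    print(apply_rules(triple, RULES))
```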
39. EVALUATION
• Tracking Real-time User Experience (TRUE) methodology,
widely used in the HCI community to evaluate computer games
• Game-based evaluation where domain users are given tasks to complete,
with time and interactions tracked using Google Analytics
• Subjectivistic evaluation where users were asked to fill out a survey.
• The evaluation focused on two usability concerns:
— Does familiarity of the users with the DSL affect the time needed to
formulate the query?
— Does a constrained DSL (smaller DSL), lead to less time needed for
query formulation?
43. OTHER IMPLEMENTATIONS: LINKED TCGA
Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.
http://srvgal78.deri.ie/tcga-pubmed/
49. DISCUSSION
• DSL Incrementation Mechanism
— Extend the current model represented in the Visual Query
Builder by adding new concepts and properties.
— Use or merge publicly available extensions of the DSL
• No reliance on the Federated Query Engine, SPARQL
Endpoint, underlying DSL and Graphic Rules.
• Corrupt Graphic Rules result in the textual
representation of the relevant triple.
• Domain-specific Languages increase usability and
enable abstraction of underlying data models
Query expressivity: Medium (SELECT, FILTER, OPTIONAL)
Usability: Medium (Entity-centric Search, VQS)
Vocabulary-level semantic matching: Low (Indexed Term URI to Concept)
Entity reconciliation: Low (owl:sameAs for same unique keys)
Semantic tractability mechanisms: None
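The "owl:sameAs for same unique keys" form of entity reconciliation could look roughly like the sketch below, assuming each entity comes with one identifier that is comparable across sources. The `same_as_links` function, the `example.org` URIs and the keys are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(records):
    """records: iterable of (entity_uri, unique_key) pairs.
    Return owl:sameAs triples for every pair of distinct entities sharing a key."""
    by_key = defaultdict(set)
    for uri, key in records:
        by_key[key].add(uri)
    links = []
    for key in sorted(by_key):
        # Sorting keeps the output deterministic.
        for a, b in combinations(sorted(by_key[key]), 2):
            links.append((a, OWL_SAME_AS, b))
    return links


if __name__ == "__main__":
    records = [
        ("http://example.org/sourceA/1", "KEY-1"),
        ("http://example.org/sourceB/9", "KEY-1"),
        ("http://example.org/sourceA/2", "KEY-2"),
    ]
    for triple in same_as_links(records):
        print(triple)
```

Exact key equality is why this dimension is rated low: entities described with different identifiers, or with no shared key, are never linked.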
50. FUTURE WORK
• Ontologies, indexed term labels and catalogue as elements
in a Controlled Natural Language to increase usability
• Results pipelined to any problem-solving method (such as
AutoDock Vina, visualization or an ML algorithm)
• Faceted Search, Related Entity Recognition based on
Feature-based Similarity Measures
• Allowing users of the platform to provide their own DSL,
data sources, and graphic rules.
• SPARQL Endpoint availability and latency
• Ontology Reuse instead of Ontology Alignment!
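As an illustration of what a feature-based similarity measure for related entity recognition might look like, the sketch below ranks entities by the Jaccard similarity of their feature sets. Jaccard is one common choice and is an assumption here; the slides do not name a specific measure. The entity names and features are made up.

```python
def jaccard(features_a, features_b):
    """Jaccard similarity: shared features divided by all distinct features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_entities(target, entities, top_k=3):
    """Rank the other entities by similarity to the target's feature set."""
    scores = [
        (name, jaccard(entities[target], feats))
        for name, feats in entities.items()
        if name != target
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]


if __name__ == "__main__":
    entities = {
        "A": {"x", "y"},
        "B": {"x", "y", "z"},
        "C": {"q"},
    }
    print(related_entities("A", entities))
```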