Recent work on searching the Semantic Web has yielded a wide range of approaches with respect to the style of input, the underlying search mechanisms and the manner in which results are presented. Each approach has an impact upon the quality of the information retrieved and the user's experience of the search process. This highlights the need for formalised and consistent evaluation to benchmark the coverage, applicability and usability of existing tools and to indicate future directions for advancing the state of the art. In this paper, we describe a comprehensive evaluation methodology which addresses both the underlying performance and the subjective usability of a tool. We present the key outcomes of a recently completed international evaluation campaign which adopted this approach, and thus identify a number of new requirements for semantic search tools from the perspective of both the underlying technology and the user experience.
Evaluating Semantic Search Systems to Identify Future Directions of Research
1. IWEST 2012 workshop located at ESWC 2012
Evaluating Semantic Search Systems to Identify Future Directions of Research
Khadija Elbedweihy1, Stuart N. Wrigley1, Fabio Ciravegna1, Dorothee Reinhard2, Abraham Bernstein2
1University of Sheffield, UK
2University of Zurich, Switzerland
18.06.2012
2. Outline
• Introduction
• Evaluation Design
• Evaluation Execution
• Usability Feedback and Analysis
• Future Directions for Research
• Conclusions
4. Semantic Search
• Semantic Search tools differ in their:
  – querying approaches (e.g., forms, graphs, keywords)
  – search strategies during query processing and execution
  – format and content of the results presented to the user
• These factors influence the user's perceived performance and the usability of the tool.
• Searching is a user-centric process; evaluating usability is as important as, if not more important than, assessing performance.
5. Previous evaluation efforts
• Kaufmann (2007): compared 4 Semantic Web query interfaces (NL- and graph-based)
• SemSearch Challenge: ad-hoc object retrieval using keywords
• Question Answering over Linked Data (QALD): two NL interfaces
• TREC Entity List Completion (ELC) task: similar to SemSearch
• All previous evaluations are based upon the Cranfield methodology: a test collection, a set of tasks and a set of relevance judgments
• Little or no focus on usability
7. Evaluation Design
Tools
• any query input style
• answers extracted from the data (e.g., a list of URIs or literals, not documents)
Data: Mooney Natural Language Learning Data
• well known within the search community
• simple, well-known domain for the subjects (geography)
• questions already available, e.g.:
  – Give me all the state capitals of the USA.
  – Which rivers in Arkansas are longer than the Alleghany river?
Subjects: 38 subjects (26 male, 12 female), aged between 20 and 35
Criteria
• Usability:
  – query input (expressiveness, etc.)
  – usefulness and suitability of the returned answers (data) and their presentation
• Performance: speed of execution (also affects user satisfaction)
8. Data Captured
• Results for each question:
– time required to formulate query
– number of attempts required to answer question
– success rate (whether the user found a satisfying answer or not)
– query execution time
• Questionnaires capturing user experience
– System Usability Scale (SUS) questionnaire
– Extended questionnaire
– Demographics questionnaire
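As a rough illustration of how these measurements could be organised, the sketch below defines simple record types for the per-question results and the per-subject questionnaire scores. All class and field names are hypothetical and do not come from the evaluation campaign's actual tooling.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QuestionResult:
    """Measurements captured for a single question (names are illustrative)."""
    question_id: str
    input_time_s: float      # time needed to formulate the query
    attempts: int            # number of attempts to answer the question
    answer_found: bool       # did the user find a satisfying answer?
    execution_time_s: float  # time the tool needed to execute the query

@dataclass
class SubjectSession:
    """One subject's session: per-question results plus questionnaire scores."""
    subject_id: str
    results: List[QuestionResult] = field(default_factory=list)
    sus_score: float = 0.0       # System Usability Scale (0-100)
    extended_score: float = 0.0  # extended questionnaire (0-100)

    def answer_found_rate(self) -> float:
        """Fraction of questions for which a satisfying answer was found."""
        if not self.results:
            return 0.0
        return sum(r.answer_found for r in self.results) / len(self.results)
```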
10. Participating tools
K-Search: Form-based
Ginseng: Natural language with constrained vocabulary and grammar
NLP-Reduce: Natural language for full English questions, sentence fragments, and keywords
PowerAqua: Natural language interface
13. Results
Criterion | K-Search (form-based) | Ginseng (controlled NL) | NLP-Reduce (NL-based) | PowerAqua (NL-based)
Mean experiment time (s) | 4313.84 | 3612.12 | 4798.58 | 2003.9
Mean SUS (0-100) | 44.38 ('bad') | 40 ('bad') | 25.94 ('awful') | 72.25 ('good')
Mean extended questionnaire (0-100) | 47.29 | 45 | 44.63 | 80.67
Mean number of attempts | 2.37 | 2.03 | 5.54 (roughly twice the others) | 2.01
Mean answer-found rate | 0.41 | 0.19 | 0.21 | 0.55
Mean execution time (s) | 0.44 | 0.51 | 0.51 | 11 (slowest)
Mean input time (s) | 69.11 | 81.63 (slowest) | 29 | 16.03
14. Feedback: input style
Free NL
• Positive: fast (16 and 29 sec input time on average); most natural (query in plain natural language)
• Negative: mismatch (habitability) problem: "I need to know and use the terms expected by the system and not my own terms to get results"
Controlled NL
• Positive: guidance through suggestions and auto-completion; avoids the habitability problem (only valid queries can be formulated)
• Negative: very restricted language model causes frustration (low SUS), limits flexibility and expressiveness, and slows query formulation (highest input time: 81.63 sec)
Form-based
• Positive: allows users to build more complex queries than NL; helps users learn the search space (concepts and relations)
• Negative: more difficult to use than NL; time consuming (input time: 69.11 sec on average)
15. Feedback: results
Presentation: results were not user-friendly; tools either
• provided full URIs of the concepts (e.g., http://www.mooney.net/geo#tennesse2), or
• used ontology labels as the natural-language representation of the answer (e.g., montgomeryAI)
(a toy illustration of friendlier labelling follows below)
Management: users have high expectations and requested advanced means of managing the results, such as:
• storing and reusing the results of previous queries
• filtering results according to suitable criteria
• checking the provenance of the results
• basic manipulations such as sorting results
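As a toy illustration of the presentation issue (not code from any of the evaluated tools), the sketch below shows one way a raw URI or camelCase ontology label could be rendered in a more readable form; a real system would prefer an rdfs:label from the ontology when one is available.

```python
import re

def friendly_label(value: str) -> str:
    """Render a full URI or camelCase label in a more readable form (toy example)."""
    # Keep only the local name of a URI, e.g. '...geo#tennesse2' -> 'tennesse2'
    local = re.split(r"[#/]", value)[-1]
    # Split camelCase into words and drop trailing disambiguation digits
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local)
    spaced = re.sub(r"\d+$", "", spaced)
    # Capitalise words, leaving all-uppercase tokens untouched
    return " ".join(w if w.isupper() else w.capitalize() for w in spaced.split())

print(friendly_label("http://www.mooney.net/geo#tennesse2"))  # -> 'Tennesse'
print(friendly_label("montgomeryAI"))                         # -> 'Montgomery AI'
```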
17. Input Style
• Visualising the search space shows users:
  – what type of information is available (exploration)
  – what queries are supported (guidance during query formulation)
• Typing queries in natural language is fast and easy.
• Provide a 'dual query formulation' approach:
  – users unfamiliar with the domain can correctly formulate their intended queries using the view-based interface
  – users familiar with the domain can use the faster NL queries
18. Input Style
• Comparatives and superlatives are still a challenge.
• e.g., FREyA uses an 'intervention approach': if a numerical datatype property is found in the user's query, it
  1. generates maximum, minimum and sum functions
  2. lets the user choose the required function
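A minimal sketch of the idea behind such an intervention step is given below; it is not FREyA's actual code, and the property URI and SPARQL query shape are assumptions made for illustration. The system generates one candidate aggregate query per function and asks the user which one matches their intent (e.g., 'longest river' maps to the MAX candidate).

```python
def candidate_aggregates(entity_var: str, numeric_property: str) -> dict:
    """Build candidate SPARQL queries (MAX, MIN, SUM) over a numeric property."""
    template = (
        "SELECT ({agg}(?value) AS ?result) WHERE {{ "
        "{var} <{prop}> ?value . }}"
    )
    return {
        name: template.format(agg=name.upper(), var=entity_var, prop=numeric_property)
        for name in ("max", "min", "sum")
    }

# Hypothetical property URI, used for illustration only
options = candidate_aggregates("?river", "http://www.mooney.net/geo#length")
for name, query in options.items():
    print(name, "->", query)
# The user would then pick the function that matches their question.
```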
19. Query Execution
Delays in response time negatively affect user experience and satisfaction.
• Provide feedback
  – reduces the effect of delays (users are more willing to wait if they know the status of their search process)
• Provide intermediate (partial) results
  – the result set is gradually extended until the complete set is available
  – perceived similarly to (arguably better than) basic feedback
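One possible way to realise this, sketched with a hypothetical search backend, is to run the sub-queries one at a time, report status as basic feedback, and yield the growing result set after each step so the interface can render partial answers immediately.

```python
from typing import Callable, Iterable, Iterator, List

def search_incrementally(
    sub_queries: Iterable[Callable[[], List[str]]],
    report_status: Callable[[str], None],
) -> Iterator[List[str]]:
    """Yield a growing (partial) result set as each sub-query completes."""
    collected: List[str] = []
    for i, run in enumerate(sub_queries, start=1):
        report_status(f"running sub-query {i}...")  # basic feedback for the user
        collected.extend(run())
        yield list(collected)  # partial results, gradually incremented
    report_status("search complete")

# Usage with stand-in sub-queries:
for partial in search_incrementally(
        [lambda: ["Mississippi"], lambda: ["Missouri", "Ohio"]], print):
    print("results so far:", partial)
```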
20. Results
• Presentation
  – results should be attractive, accessible, understandable and user-friendly
  – augment answers with associated information for a 'richer' user experience
• Management
  – filtering and sorting of results
  – some complex questions require multiple sub-queries; the ability to store and reuse result sets could be helpful
  – queries can then be constructed by combining saved queries with logical operators such as 'AND' and 'OR' (sketched below)
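The sketch below illustrates this idea with plain set operations over previously saved result sets; the saved-query names and the toy data are invented for the example.

```python
def combine(saved: dict, expression: list) -> set:
    """Combine saved result sets with 'AND' (intersection) or 'OR' (union).
    `expression` is a flat list such as ['a', 'AND', 'b', 'OR', 'c']."""
    result = set(saved[expression[0]])
    for op, name in zip(expression[1::2], expression[2::2]):
        other = set(saved[name])
        result = result & other if op == "AND" else result | other
    return result

# Toy saved result sets (invented for illustration)
saved = {
    "capitals_in_the_usa": {"Austin", "Sacramento", "Albany"},
    "cities_on_a_river": {"Austin", "Pittsburgh"},
}
print(combine(saved, ["capitals_in_the_usa", "AND", "cities_on_a_river"]))
# -> {'Austin'}
```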