A natural language interface is a system that transforms a user's natural language question into a SPARQL query.
Related papers: https://sites.google.com/site/fadhlinams81/publication
1. Natural Language Interface: Challenges and Partial Solutions
NURFADHLINA MOHD SHAREF (PhD)
Postdoctoral Fellow
Knowledge Technology Group
Centre of Artificial Intelligence
Faculty of Technology and
Information Science
Universiti Kebangsaan Malaysia
fadhlinams81@gmail.com
2. Outline
• Part 1: Introduction to Semantic Web
– RDF
– OWL
– SPARQL
• Part 2: Natural Language Interface
– Semantic Web Search Engine
– NLI Applications
– Challenges and Partial Solutions
– Potential Works
• Part 3: Practical Examples
– Mooney's Geography Dataset
– Automatic SPARQL Construction for Natural Language-based
Search in Semantic Database
3. Part 1
• Introduction to Semantic Web
– RDF
– OWL
– SPARQL
5. Semantic Web: "a web of data that can be processed directly and indirectly by machines" (Tim Berners-Lee)
6. RDF (Resource Description
Framework)
• Talk about resources
– Resources can be pretty much anything
– Resources are identified by Uniform Resource Identifiers (URIs)
– Things (in a broad sense) are labelled with URIs
– URIs act as globally valid names
– Sets of names are organized in vocabularies
– Vocabularies are demarcated by namespaces
• Information is encoded in triples: subject-predicate-object patterns
– Malaysia has capital Kuala Lumpur
– Participant has course Semantic Technology
Taken from: http://www.w3.org/2009/Talks/1030-Philadelphia-IH/Tut6orial.ppt
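The triple pattern above can be sketched in plain Python; the namespace URI and property names below are made up for illustration, not real vocabularies:

```python
# A minimal sketch of RDF-style triples as Python tuples.
# The namespace and property names are illustrative only.
EX = "http://example.org/demo#"   # hypothetical namespace

triples = [
    (EX + "Malaysia", EX + "hasCapital", EX + "KualaLumpur"),
    (EX + "Participant", EX + "hasCourse", EX + "SemanticTechnology"),
]

def objects(subject, predicate, data):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return [o for s, p, o in data if s == subject and p == predicate]

capitals = objects(EX + "Malaysia", EX + "hasCapital", triples)
```

Because every name is a globally valid URI, two independently produced triple sets can be merged by simple list concatenation; that is the core appeal of the model.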
9. RDF Example: Properties of the resource
– The elements artist, country, company, price, and year are defined in the http://www.recshop.fake/cd# namespace.
[Figure: an RDF/XML document with its XML declaration and namespace declarations highlighted.]
12. Ontology in Information Science
• An ontology is an engineering artefact consisting of:
– A vocabulary used to describe (a particular view
of) some domain
– An explicit specification of the intended meaning
of the vocabulary.
• Often includes classification-based information
– Constraints capturing background knowledge
about the domain
• Ideally, an ontology should:
– Capture a shared understanding of a domain of
interest
– Provide a formal and machine-manipulable model
13. OWL
• built on top of RDF
• for processing information on the web
• designed to be interpreted by computers, not to be read by people
• written in XML
• a W3C standard
• based on predecessors (DAML+OIL)
• a Web language: based on RDF(S)
• an ontology language: based on logic
14. OWL vs RDF
• OWL and RDF are closely related, but OWL is a stronger language with greater machine interpretability than RDF.
• OWL has a larger vocabulary and a stronger syntax than RDF:
– specific relations between classes, cardinality, equality, richer typing of properties, characteristics of properties, and enumerated classes.
• OWL comes in three increasingly expressive layers designed for different groups of users:
– OWL Lite, OWL DL, and OWL Full
21. The SPARQL Query Language: Operator AND (".")
SELECT ?name ?faculty
WHERE {
?teacher rdf:type Teachers .
?teacher name ?name .
?teacher faculty ?faculty .
}
Result:
?name ?faculty
Joe "CS"
Fred "CS"
[Figure: class Teachers with instances t1 and t2; t1 has name Joe, t2 has name Fred, and both have faculty "CS".]
22. The SPARQL Query Language: Operator FILTER
SELECT ?name ?faculty
WHERE {
?teacher rdf:type Teachers .
?teacher name ?name .
?teacher faculty ?faculty .
FILTER (?name = "Joe")
}
Result:
?name ?faculty
Joe "CS"
[Figure: same Teachers graph as before; the FILTER keeps only the row for Joe.]
23. The SPARQL Query Language: Operator OPTIONAL
SELECT ?name ?faculty ?title
WHERE {
?teacher rdf:type Teachers .
?teacher name ?name .
?teacher faculty ?faculty .
OPTIONAL {
?teacher title ?title .
}
}
Result:
?name ?faculty ?title
Joe "CS"
Fred "CS" "Professor"
[Figure: same Teachers graph; only t2 (Fred) has a title, "Professor".]
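The behaviour of the three operators can be illustrated with a toy in-memory evaluator over the Teachers example. This is a sketch for intuition only, not a real SPARQL engine:

```python
# Toy evaluator for the AND ("."), FILTER and OPTIONAL operators
# over the Teachers example above; not a real SPARQL engine.
triples = [
    ("t1", "rdf:type", "Teachers"), ("t2", "rdf:type", "Teachers"),
    ("t1", "name", "Joe"),          ("t2", "name", "Fred"),
    ("t1", "faculty", "CS"),        ("t2", "faculty", "CS"),
    ("t2", "title", "Professor"),   # only t2 has a title
]

def match(pattern, binding):
    """Extend `binding` with every way `pattern` matches the data."""
    out = []
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):              # variable term
                if b.get(term, value) != value:   # clash with existing binding
                    ok = False
                    break
                b[term] = value
            elif term != value:                   # constant must match exactly
                ok = False
                break
        if ok:
            out.append(b)
    return out

def bgp(patterns):
    """Operator AND (".") — join a list of triple patterns."""
    bindings = [{}]
    for p in patterns:
        bindings = [b2 for b in bindings for b2 in match(p, b)]
    return bindings

# AND: all teachers with their name and faculty (2 rows: Joe, Fred)
rows = bgp([("?t", "rdf:type", "Teachers"),
            ("?t", "name", "?name"),
            ("?t", "faculty", "?faculty")])

# FILTER: keep only rows where ?name = "Joe"
joe = [b for b in rows if b["?name"] == "Joe"]

# OPTIONAL: attach ?title when present, but keep the row either way
with_title = []
for b in rows:
    ext = match(("?t", "title", "?title"), b)
    with_title.extend(ext or [b])
```

Note how OPTIONAL differs from AND: a row without a matching title survives unchanged, which is exactly why Joe appears in the slide-23 result with an empty title column.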
24. Part 2
• Natural Language Interface
– Semantic Web Search Engine
– NLI Applications
– Challenges and Partial Solutions
– Potential Works
25. Semantic Web Search Engine
• aims to understand the intent of the searcher and return results in the context of the query's meaning.
• distinguished from a standard search engine because its source documents are RDF, OWL, and RDF-extended HTML documents.
• E.g.: Swoogle, Serene, Watson
26. Natural Language Interface (NLI)
• allows users to query in human-like sentences, without requiring them to be aware of the underlying schema, vocabulary, and query language
• best known for question answering
• three types of NLI:
– with structured data such as databases and ontologies,
– with semi- or unstructured data such as text documents,
– in an interactive setting, as a conversational system
• Approaches:
– Controlled natural language for query construction
– Visual-based query construction
– NL query mapping to triple representation
34. Comparison

| System | Year | Input type | Synonym support | Syntactic analysis | Calculate string similarity | Clarification dialogue | Learnability | Support KB heterogeneity |
|---|---|---|---|---|---|---|---|---|
| Semantic Crystal | 1993 | Graphical-based query | NO | NO | NO | NO | NO | NO |
| GINO / Ginseng | 2006 | Controlled natural language based interface | WordNet | YES | NO | NO | NO | NO |
| Querix | 2006 | Query by example | WordNet | NO | NO | YES | NO | NO |
| NLP-Reduce | 2007 | Keywords, sentence fragments and full sentences | NO | NO | NO | NO | NO | NO |
| QuestIO | 2008 | Full natural language | Gazetteer | YES | YES | NO | NO | NO |
| ORAKEL | 2008 | Factual question | Lexicon | NO | NO | NO | NO | NO |
| AquaLog / PowerAqua | 2010 | Full natural language | WordNet, Lexicon | YES | YES | YES | NO | YES |
| FREyA | 2012 | Full natural language | WordNet | YES | NO | YES | YES | NO |
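The "calculate string similarity" feature in the table is typically an edit-distance measure used to match query words against ontology labels. A minimal sketch (standard Levenshtein distance, normalized to a 0-1 similarity; not the specific measure of any one system):

```python
# Minimal Levenshtein edit distance, the usual basis for the
# "string similarity" matching step in NLI systems (sketch only).
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```

For example, `similarity("capital", "capitol")` scores high, so a slightly misspelled query word can still be matched to the right ontology label.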
35. NLI Implementation
• Query: "Who wrote The Neverending Story?"
• PowerAqua triple:
<[person, organization], wrote, Neverending Story>
• Triple matching from DBpedia:
<Writer, IS A, Person>
<Writer, author, The Neverending Story>
• Answer: "Michael Ende"
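The matching step can be sketched as follows. The KB facts and synonym table below are toy illustrations of the idea (relation synonyms plus class subsumption), not PowerAqua's actual algorithm or data:

```python
# Toy sketch of matching a query triple against KB triples.
# The facts and synonym table are illustrative, not DBpedia data.
kb = [
    ("Writer", "isA", "Person"),
    ("Michael Ende", "isA", "Writer"),
    ("Michael Ende", "author", "The Neverending Story"),
]
synonyms = {"wrote": {"author", "wrote", "writer"}}  # assumed lexicon

def answer(query_types, relation, obj):
    """Find a subject linked to `obj` by a synonym of `relation`
    whose class (or its superclass) is among `query_types`."""
    rels = synonyms.get(relation, {relation})
    for s, p, o in kb:
        if p in rels and o == obj:
            # direct types of the candidate subject
            types = {t for x, r, t in kb if x == s and r == "isA"}
            # one level of superclasses (Writer IS A Person)
            supers = {t2 for t in types for x, r, t2 in kb
                      if x == t and r == "isA"}
            if (types | supers) & set(query_types):
                return s
    return None

who = answer(["Person", "Organization"], "wrote", "The Neverending Story")
```

The two ingredients mirror the slide: "wrote" is bridged to the KB property "author" lexically, and the type constraint [person, organization] is satisfied through the subclass chain Writer IS A Person.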
37. NLI Challenges
(Unger et al., 2012)
1.
(a) Which cities have more than three universities?
(b) <[cities], more than, universities three>
(c) SELECT ?y WHERE {
?x rdf:type onto:University . ?x onto:city ?y .
} GROUP BY ?y HAVING (COUNT(?x) > 3)
2.
(a) Who produced the most films?
(b) <[person, organization], produced, most films>
(c) SELECT ?y WHERE {
?x rdf:type onto:Film . ?x onto:producer ?y .
} GROUP BY ?y ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
39. Query Understanding
• Input type
– Current
• Guided query: controlled natural language, query indicators (e.g.: WH-terms)
• Graphical query construction
– Problems
• Confusing
• Requires a degree of background knowledge
• Constrained search
40. Query Understanding
• Compositional Density
– Current
• Triples generated by PowerAqua for "Give me five albums by Pink Lloyd":
– <[albums, five], null, Lloyd Pink>
– <[five], null, albums>
– <[Pink], null, Lloyd>
– Potential Works
• Negation (e.g.: not, outside, except)
• Arithmetic (e.g.: sum of, how many, largest)
• Auxiliary (e.g.: largest, latest, top)
45. Types of queries
(Ferre & Hermann, 2011)
• Visualization
– exploration of the facet hierarchy
• Selection
– count or list items that have a particular feature
• Path
– requires following a path of properties
• Disjunction
– requires the use of unions
• Negation
– requires the use of exclusions
• Inverse
– requires crossing the inverse of properties
• Cycle
– requires the use of co-reference variables (naming and reference navigation links)
49. Result Understanding
• Ranking result
– List vs. finite answer
– Degree of confidence / hit score
– Learnability
50. Part 3
• Practical Examples
– Mooney's Geography Dataset
– Automatic SPARQL Construction for Natural
Language-based Search in Semantic Database
51. Geography.owl

Classes: City, Capital, State, HiPoint, LoPoint, Mountain, Lake, River, Road

DataTypeProperties:

| Name | Domain | Range |
|---|---|---|
| cityPopulation | City | float |
| statePopulation | State | float |
| statePopDensity | State | float |
| abbreviation | State | string |
| stateArea | State | float |
| lakeArea | Lake | float |
| height | Mountain | float |
| hiElevation | HiPoint | float |
| loElevation | LoPoint | float |
| length | River | float |
| number | Road | float |

ObjectProperties:

| Name | Domain | Range |
|---|---|---|
| borders | State | State |
| isCityOf | City | State |
| hasCity | State | City |
| isCapitalOf | Capital | State |
| hasCapital | State | Capital |
| isMountainOf | Mountain | State |
| hasMountain | State | Mountain |
| isHighestPointOf | HiPoint | State |
| hasHighPoint | State | HiPoint |
| isLowestPointOf | LoPoint | State |
| hasLowPoint | State | LoPoint |
| isLakeOf | Lake | State |
| hasLake | State | Lake |
| runsThrough | River | State |
| hasRiver | State | River |
| passesThrough | Road | State |
| hasRoad | State | Road |
52. Example queries:
• Can you tell me the capital of texas?
• Give me all the states of usa?
• Give me the cities in texas?
• Give me the cities which are in texas?
• Give me the lakes in california?
• Give me the states that border utah?
• Give me the number of rivers in california?
• How many citizens in alabama?
• How many citizens live in california?
• Give me the longest river that passes through the us?
• Give me the largest state?
• Could you tell me what is the highest point in the state of oregon?
• Count the states which have elevations lower than what alabama has?
• How big is texas?
• How big is the city of new york?
• How many colorado rivers are there?
• How high is guadalupe peak?
• How high is mount mckinley?
• How many cities named austin are there in the usa?
• How large is texas?
• How long is rio grande?
• How long is the colorado river?
• How long is the mississippi?
• How long is the mississippi river?
• How long is the mississippi river in miles?
• How many capitals does rhode island have?
• How many cities does texas have?
• How many cities does the usa have?
• How many citizens does the biggest city have in the usa?
• How high are the highest points of all the states?
• How high is the highest point in america?
• How high is the highest point in montana?
• How high is the highest point in the largest state?
• How large is the largest city in alaska?
• How long is the longest river in california?
• How long is the longest river in the usa?
• How long is the shortest river in the usa?
• How many big cities are in pennsylvania?
53. Approach
• Can you tell me the capital of texas?
– POS: Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/.
– Triple: <capital, ?, texas>
– SPARQL:
PREFIX geo:<http://www.mooney.net/geo#>
SELECT ?s
WHERE { ?s geo:isCapitalOf geo:texas . }
– Answer: geo:austinTx
• Give me all the states of usa?
– POS: Give/VB me/PRP all/PDT the/DT states/NNS of/IN usa/NN ?/.
– Triple: <states, ?, usa>
– SPARQL:
PREFIX geo:<http://www.mooney.net/geo#>
SELECT ?s
WHERE { ?s a geo:State . }
– Answer: geo:kansas, geo:rhodeIsland, geo:montana, geo:tennessee, geo:arkansas, geo:newMexico, … (all the states)
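The POS-and-pattern heuristic above can be sketched end to end in Python. The tag pattern and the noun-to-property mapping below are simplified assumptions made for illustration, not the actual implementation:

```python
# Simplified sketch of the NL -> triple -> SPARQL pipeline shown above.
# The pattern rule and mapping table are illustrative assumptions.
PREFIX = "PREFIX geo:<http://www.mooney.net/geo#> "

# hypothetical mapping from a question noun to a KB property
NOUN_TO_PROPERTY = {"capital": "geo:isCapitalOf"}

def extract_triple(tagged):
    """From tokens like [('capital','NN'), ('of','IN'), ('texas','NNS')]
    find the NN-of-NN pattern and build a <noun, ?, entity> triple."""
    for i, (tok, tag) in enumerate(tagged[:-2]):
        if (tag.startswith("NN") and tagged[i + 1][0] == "of"
                and tagged[i + 2][1].startswith("NN")):
            return (tok, "?", tagged[i + 2][0])
    return None

def to_sparql(triple):
    """Map the triple's noun to a KB property and emit the query string."""
    noun, _, entity = triple
    prop = NOUN_TO_PROPERTY[noun]
    return (PREFIX + "SELECT ?s WHERE { ?s " + prop
            + " geo:" + entity + " . }")

tagged = [("Can", "MD"), ("you", "PRP"), ("tell", "VB"), ("me", "PRP"),
          ("the", "DT"), ("capital", "NN"), ("of", "IN"), ("texas", "NNS")]
triple = extract_triple(tagged)
query = to_sparql(triple)
```

Even this toy version exposes the open questions on the next slide: the mapping table is domain-dependent, and a single POS pattern cannot cover counting, aggregation, or superlative queries.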
55. More to Do
• Domain dependent or independent?
• Are heuristics based on POS tags and KB compliance enough for SPARQL generation?
• More complex queries
– Arithmetic operations (COUNT, sub-queries)
– Aggregation (requires FILTER, OPTIONAL, HAVING)
– Auxiliary (e.g.: latest, earliest)
56. Conclusion
• NLI is a promising area
• Highlights: ambiguity reduction, query understanding, query-KB matching
• Focus: SPARQL generation and optimization
• Potential sub-areas: negation, arithmetic, temporal, complex queries
57. References
• Ferre, S., & Hermann, A. (2011). Semantic search: Reconciling expressive querying and exploratory search. ISWC '11: Proceedings of the 10th International Semantic Web Conference (pp. 177-192).
• Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., & Cimiano, P. (2012). Template-based question answering over RDF data. WWW '12: Proceedings of the 21st International Conference on World Wide Web, 639. New York, NY, USA: ACM Press. doi:10.1145/2187836.2187923
58. Contact
Nurfadhlina Mohd Sharef
• Postdoctoral Fellow, Knowledge Technology Group, Centre of Artificial Intelligence, Universiti Kebangsaan Malaysia
(Room 4.4, Level 4, Block H)
• Department of Computer Science, Faculty of
Computer Science and Information Technology,
Universiti Putra Malaysia
(Room C2.08, Level 2, Block C)
• fadhlinams81@gmail.com