adfa, p. 1, 2011.
© Springer-Verlag Berlin Heidelberg 2011
A Dialogue System for Telugu, a Resource-Poor
Language
M Ch Sravanthi, IIIT-Hyderabad, Mullapudi.sravanthi@research.iiit.ac.in
K Prathyusha, IIIT-Hyderabad, prathyusha.k@research.iiit.ac.in
Radhika Mamidi, IIIT-Hyderabad, radhika.mamidi@iiit.ac.in
Abstract. A dialogue system is a computer system designed to converse with human
beings in natural language (NL). A lot of work has been done to develop dialogue
systems in regional languages. This paper presents an approach to building a
dialogue system for resource-poor languages. The approach comprises two parts,
namely Data Management and Query Processing. Data Management deals with storing
the data in a format that allows easy and quick retrieval of the requested
information. Query Processing deals with producing a relevant system response to a
user query. Our model can handle code-mixed queries, which are very common in
Indian languages, and it also handles context, which is a major challenge in
dialogue systems. It further handles spelling mistakes and a few grammatical
errors. The model is domain and language independent. As no automated evaluation
tool is available for dialogue systems, we carried out a human evaluation of our
system, which was developed for the Telugu language over the ‘Tourist places of
Hyderabad’ domain. Five people evaluated the system, and the results are reported
in this paper.
1 Introduction
A dialogue system is a computer program that communicates with a human in a natural
way. Much effort is being made to make the conversations seem natural. Despite a lot
of progress in computational linguistics and language processing techniques, we do
not see much use of dialogue systems in practice. Some reasons for this may be the
lack of domain expertise, linguistic experts and computational tools. Our approach
to building a dialogue system is quick and does not require many language processing
tools. It can be described in two parts, namely Data Management and Query
Processing.
─ Data Management: This component deals with categorization, text segmentation
and storage of the data in a hierarchical manner which helps in fast retrieval of the
output.
─ Query Processing: This takes a natural language query from a user as input and
processes it to extract the keywords and update the context if necessary. Based on the
extracted keywords and the context, it either retrieves an answer from the database
or asks the user an interactive question. In this way, it maintains the conversation
with the user throughout.
This model can be implemented with minimal or no use of any language processing
tool. Our approach is language independent and can be extended to any domain. It can
also handle context and code-mixed queries to make the conversations look more
natural.
Dialogue Example:
U1: hExarAbAxulo* mAnyumeVMts emi unnAyi?
(What are the monuments in Hyderabad?)
S1: cArminAr, golkoVMda, makkA masjix. mIku vItilo exi kAvAli?
(Charminar, Golconda, Mecca Masjid. What among these do you need?)
U2: cArminAr eVkkada uMxi?
(Where is Charminar?)
S2: cArminAr yoVkka cirunAmA - cArminAr, heVxarAbAx, weVlaMgANa.
(The address of Charminar is -Charminar, Hyderabad, Telangana.)
U3: eVppudu opeVn uMtuMxi?
(When is it open?)
S3: cArminAr yoVkka samayaM anni rojulu 9:00 am - 5:30 pm
(Charminar open timings - All days 9:00 am - 5:30 pm)
U4: eMtrI PIju eMwa?
(What is the entry fee?)
S4: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)
The above conversation shows a user-system interaction from the ‘Tourism’ domain in
Telugu. User utterances are labelled U1, U2, etc. and system responses are labelled
S1, S2, etc. A user utterance and the system response to it together make a discourse
unit, i.e. U1 and S1 form a discourse unit. We can observe that context is carried
across several discourse units: to answer U3 and U4, we need context information
from U2.
U3 is a code-mixed query, as it contains ‘opeVn’ (open), an English word. U3 has been
successfully processed and understood by the system, which shows that our system
also handles code-mixed queries.
* Words are in WX format (sanskrit.inria.fr/DATA/wx.html). All the examples given in
the paper are from Telugu.
2 Related Work
There has been a lot of progress in the field of dialogue systems in the last few years.
In general, dialogue systems are classified into three types: (a) finite state (or
graph) based systems, (b) frame based systems and (c) agent based systems.
(a) Finite state based systems: In this type of system, the conversation proceeds
according to predefined states or steps. Such a system is simple to construct but
does not allow the user to ask questions or take the initiative. [3] proposed a method
using a weighted finite state transducer for dialogue management.
(b) Frame based systems: These systems have a set of templates which are filled based
on the user responses. The filled templates are then used to perform other tasks. [2]
proposed an approach to build a natural language interface to databases (NLIDB) using
semantic frames based on Computational Paninian Grammar. Context information in
NLIDB is handled by [1], which identified different types of user-system interaction
and handled context for one specific type of interaction. A dialogue based question
answering system [6], which extracts keywords from the user query to identify a query
frame, has been developed for railway information in Telugu.
(c) Agent based systems: These systems allow a more natural flow of communication
between user and system than the other types. The conversation can be viewed as an
interaction between two agents, each of which is capable of reasoning about its own
actions. [4] developed an agent based dialogue system called Smart Personal Assistant
for email management. This was further extended to the calendar task domain in [5].
Our model can be categorized as an agent based system.
3 Our Approach
Fig. 1 shows a flowchart of the internal working of our model.
Fig. 1. System Architecture
The major components in our method are:
─ Data Organization (Database)
─ Query Processing
   • Knowledge Base
   • Query Analyzer
   • Advanced Filter
   • Context Handler
   • Dialogue Manager
3.1 Data Organization
Every domain has an innate hierarchy in it. Study of the possible queries in a domain
gives an insight about the hierarchical organization of the data in that domain. For
example, consider ‘Tourist places of Hyderabad’ domain in Fig. 2.
Fig. 2. Data organization of ‘Tourist places of Hyderabad’ domain
When we store data in this manner, it becomes easy to add information and extend the
domain. We can see the extension of the ‘Tourist places of Hyderabad’ domain to the
‘Hyderabad Tourism’ domain in Fig. 3.
Fig. 3. Data organization of ‘Tourism of Hyderabad’ domain
In this hierarchical tree structure the leaf nodes are at level-0, and the level
increases from a leaf node to the root node. The data at level-n (where n is the level
of the root node) is segmented recursively down to level-1. Each segment at level-i
(i = 1, ..., n) is then given a level-(i-1) tag.
In physical memory all the layers above level-1 are stored as directories, level-1 as
files and level-0 as tags in a file. The text in a file is stored as segments, and each
segment is given a level-0 tag (address, open timings, entry fee etc.). The labels of
all the files and directories, together with the information in the files, make up the
data set.
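The storage layout described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category name, the tab-separated line format and the helper names `store` and `lookup` are assumptions made for illustration. Levels above level-1 become directories, each level-1 item becomes a file, and each line of a file is a segment prefixed by its level-0 tag.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical sample data: level-2 category -> level-1 place -> level-0 tag -> text.
DATA = {
    "mAnyumeVMts": {
        "cArminAr": {
            "cirunAmA": "cArminAr, heVxarAbAx, weVlaMgANa",
            "samayaM": "anni rojulu 9:00 am - 5:30 pm",
        },
    },
}


def store(root: Path, data: dict) -> None:
    """Write each category as a directory and each place as a file of tagged lines."""
    for category, places in data.items():
        directory = root / category
        directory.mkdir(parents=True, exist_ok=True)
        for place, segments in places.items():
            # Assumed on-disk format: one segment per line, "<tag>\t<text>".
            lines = [f"{tag}\t{text}" for tag, text in segments.items()]
            (directory / place).write_text("\n".join(lines), encoding="utf-8")


def lookup(root: Path, place: str, tag: str) -> Optional[str]:
    """Find the file named `place` under `root` and return the segment tagged `tag`."""
    for path in root.rglob(place):
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                key, _, text = line.partition("\t")
                if key == tag:
                    return text
    return None


with tempfile.TemporaryDirectory() as tmp:
    store(Path(tmp), DATA)
    address = lookup(Path(tmp), "cArminAr", "cirunAmA")
    timings = lookup(Path(tmp), "cArminAr", "samayaM")
    missing = lookup(Path(tmp), "cArminAr", "PIju")  # tag not stored in this sample
```

With this layout, extending the domain amounts to adding a directory or a file, and a lookup needs only a level-1 label and a level-0 tag.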
3.2 Query Processing
The entire process from taking a user query to generating a system response is termed
‘Query Processing’. The components of the ‘Query Processing’ module are described in
the following subsections.
3.2.1 Knowledge Base
The knowledge base contains a domain dependent ontology, such as a list of synonyms
and code-mixed words. This helps in handling code-mixed and wrongly spelt words in
the queries. The Query Analyzer uses this module to replace the synonyms, code-mixed
words etc. in a query with the corresponding level-i (i = 0, ..., n) tags. If a language
has knowledge resources like WordNet, DBpedia etc., they can be used to build the
knowledge base. This has to be done manually.
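As a rough illustration of how such a knowledge base could be consulted, the sketch below keeps a hand-built table from each tag to the variants that should map to it. The entries for ‘addreVssu’, ‘tEM’ and ‘opeVn’ follow the examples in this paper; the misspelling ‘adreVssu’ and the function names are hypothetical.

```python
# Hand-built knowledge base: tag -> synonyms, code-mixed words and misspellings.
# "adreVssu" is a hypothetical misspelling added for illustration.
KNOWLEDGE_BASE = {
    "cirunAmA": ["addreVssu", "adreVssu"],
    "samayaM": ["tEM", "opeVn"],
}


def build_index(knowledge_base):
    """Invert the table so that every variant (and the tag itself) maps to its tag."""
    index = {}
    for tag, variants in knowledge_base.items():
        index[tag] = tag
        for variant in variants:
            index[variant] = tag
    return index


def to_tags(tokens, index):
    """Replace known variants by their tag; unknown tokens are left as they are."""
    return [index.get(token, token) for token in tokens]


INDEX = build_index(KNOWLEDGE_BASE)
tagged = to_tags(["golkoVMda", "addreVssu", "emiti", "e", "tEM", "cUdu"], INDEX)
```

Tokens with no entry, such as place names or English words without a Telugu counterpart, pass through unchanged.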
3.2.2 Query Analyzer
The NL query given by the user is converted into a WX query, which is given as input
to the Query Analyzer. The WX query is then tokenized and passed to a morphological
analyzer and a part-of-speech (POS) tagger [11]. The root words of all the tokens in
the query are extracted from the morphological analyzer’s output, and the tokens are
replaced with the corresponding root words. In this modified query the synonyms,
code-mixed words etc. are replaced with the corresponding level-i tags, as discussed
under Knowledge Base. For languages that do not have a morphological analyzer, a
simple stemmer can be built which maintains a root dictionary, applies the minimum
edit distance algorithm to find the root word of a given token, and replaces the token
with that root word [7]. Here, the POS tagger is used only to identify the question
words in the query; languages with no POS tagger can use a list of question words
instead.
Example:
User Query: golkoVMda addreVssu emiti, e tEMlo cUdavaccu?
Golconda address what what time can visit
(What is the address of Golconda, what time to visit?)
POS-tagger: golkoVMda addreVssu emiti/WQ e/WQ tEMlo cUdavaccu
Root word replacement: golkoVMda addreVssu emiti/WQ e/WQ tEM cUdu
Query Analyzer’s Output: golkoVMda cirunAmA emiti/WQ e/WQ samayaM cUdu
From the above output, we can see that English words like ‘addreVssu’ (address) and
‘tEM’ (time) are mapped to the corresponding Telugu words, i.e. ‘cirunAmA’ (address)
and ‘samayaM’ (time) respectively, using the knowledge base. If there is no
corresponding Telugu word, the English word is left as it is.
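The stemmer mentioned for languages without a morphological analyzer can be sketched as follows. This is a minimal illustration using the standard Levenshtein distance; the root dictionary `ROOTS` and the cut-off `max_distance` are assumptions and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# Hypothetical root dictionary for the examples in this section.
ROOTS = ["tEM", "cUdu", "uMdu", "hExarAbax", "golkoVMda"]


def stem(token: str, roots=ROOTS, max_distance: int = 2) -> str:
    """Return the closest root, or the token itself if no root is close enough."""
    best = min(roots, key=lambda root: edit_distance(token, root))
    return best if edit_distance(token, best) <= max_distance else token


stemmed = [stem(token) for token in ["hExarAbaxlo", "tEMlo", "emiti"]]
```

The cut-off keeps tokens such as question words from being forced onto an unrelated root. Long inflections (for example ‘cUdavaccu’ to ‘cUdu’) lie further than two edits from their root, so a real system would need a larger or length-dependent cut-off.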
3.2.3 Advanced Filter
From the modified query above, the question words and level-i words are extracted.
From these words, this module keeps only those which play a role in answer retrieval,
by applying heuristics such as selecting the level-i words nearest to the question
words, level-0 words etc. If these heuristics are not satisfied, all the level-i words
are considered for further processing. The final set of words are the keywords.
A user query can contain more than one question. In such cases, the keywords
belonging to a particular question are grouped together. All the groups of a query
are collectively called a ‘query set’.
Example:
U: nenu golkoVMda xaggara unnAnu, cArminAr PIju eVMwa iMkA cArminAr
makkAmasjix eVkkada unnAyi ?
(I am near Golconda, what is the fee for Charminar and what is the address of
Charminar and Mecca Masjid?)
Query analyzer output: nenu golkoVMda xaggara uMdu cArminAr PIju
eVMwa/WQ iMkA cArminAr makkAmasjix cirunAmA uMdu
Extracted words: golkoVMda, cArminAr, PIju, eVMwa, cArminAr, makkAmasjix,
cirunAmA
Keywords: cArminAr, PIju, eVMwa, cArminAr, makkAmasjix, cirunAmA
Query set: [cArminAr, PIju, eVMwa], [cArminAr, makkAmasjix, cirunAmA]
In ‘U’, ‘I am near Golconda’ is unnecessary information. Therefore, even though
‘Golconda’ is a level-i word, it is not considered a keyword.
3.2.4 Context Handler
Handling context is very important in any conversation in order to capture the user’s
intention, and it is a major challenge in present day dialogue systems. The query set
given by the Advanced Filter is used to update the context. If we identify any level-i
(i = 1, 2, ..., n) word in the query set, there is a shift in the context. If the query
set contains no level-j (j < i) words, we borrow the level-j words relevant to level-i
from the previous query set and add them to form the new query set. If the query set
contains level-0 words and no words from level-i (i = 1, 2, ..., n), we borrow the
level-i words from the previous query set.
Example:
U1: cArminAr eVkkada uMxi?
(Where is Charminar?)
S1: cArminAr yoVkka cirunAmA - cArminAr, heVxarAbAx, weVlaMgANa.
(The address of Charminar is -Charminar, Hyderabad, Telangana.)
U2: eVppudu opeVn uMtuMxi?
(When is it open?)
S2: cArminAr yoVkka samayaM anni rojulu 9:00 am - 5:30 pm
(Charminar open timings -All days 9:00 am - 5:30 pm)
U3: eMtrI PIju eMwa?
(What is the entry fee?)
S3: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners -Rs.150)
U4: golkoVMda eVkkada uMxi?
(Where is Golconda?)
S4: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx,
weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
In this example, to answer U2 and U3 we need contextual information (cArminAr
[Charminar]) from U1. The context does not change for U2 and U3, as there is no
level-i (i = 1, 2, ..., n) word in them. We can observe the context switch from U3 to
U4, i.e. from ‘cArminAr’ (Charminar) to ‘golkoVMda’ (Golconda), due to the occurrence
of Golconda (a level-1 word in the ‘Tourism’ domain) in U4.
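The borrowing rules can be made concrete with a small sketch. It assumes a simplified two-level hierarchy (level-1 places, level-0 tags) and hypothetical query sets for U1 to U4; the function name `update_context` and the dictionary layout are illustrative only.

```python
# Assumed two-level simplification: level-1 words are places, level-0 words are tags.
LEVEL1_WORDS = {"cArminAr", "golkoVMda", "makkAmasjix"}
LEVEL0_WORDS = {"cirunAmA", "samayaM", "PIju"}


def update_context(query_set, previous):
    """Return the context for one query set, borrowing whichever level is missing."""
    places = [word for word in query_set if word in LEVEL1_WORDS]
    tags = [word for word in query_set if word in LEVEL0_WORDS]
    if not places:
        # No level-1 word in the query: keep the places from the previous context.
        places = list(previous["places"])
    if not tags:
        # No level-0 word in the query: keep the tags from the previous context.
        tags = list(previous["tags"])
    return {"places": places, "tags": tags}


# Hypothetical query sets for U1..U4 of the dialogue above.
QUERY_SETS = [
    ["cArminAr", "cirunAmA"],   # U1: Where is Charminar?
    ["samayaM"],                # U2: When is it open?
    ["PIju"],                   # U3: What is the entry fee?
    ["golkoVMda", "cirunAmA"],  # U4: Where is Golconda?
]

context = {"places": [], "tags": []}
history = []
for query_set in QUERY_SETS:
    context = update_context(query_set, context)
    history.append(context)
```

In `history`, the place stays ‘cArminAr’ for U2 and U3 and switches to ‘golkoVMda’ at U4, mirroring the context switch described above.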
3.2.5 Dialogue Manager
In any dialogue system, the Dialogue Manager (DM) is the major component. It
coordinates the activity of several subcomponents and controls the flow of dialogue
by giving relevant responses to user queries. The dialogue manager takes the query
set and the context information as input. If the user query is ambiguous or no
keywords are identified, the dialogue manager poses an interactive question to the
user from a set of canned questions. Otherwise it retrieves a relevant answer from
the database.
Example:
U1: nenu golkoVMda cUdAli
(I have to visit Golconda)
S1: mIku memu e vidamugA sAyapadagalamu?
(How can we help you?)
U2: axi eVkkada uMxi?
(Where is it?)
S2: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx,
weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
In U1, the information provided is insufficient, so the dialogue manager poses an
interactive question to the user. Based on the query set and contextual information
of U2, the dialogue manager then retrieves a relevant answer from the database.
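The decision made by the dialogue manager in this example can be sketched as below. The in-memory `DATABASE`, the single canned question and the function `respond` are assumptions for illustration; the actual system retrieves answers from the hierarchical data store.

```python
# Hypothetical in-memory stand-in for the database: (place, tag) -> stored segment.
DATABASE = {
    ("golkoVMda", "cirunAmA"): "ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008",
    ("cArminAr", "cirunAmA"): "cArminAr, heVxarAbAx, weVlaMgANa",
}
CANNED_QUESTION = "mIku memu e vidamugA sAyapadagalamu?"  # How can we help you?


def respond(places, tags):
    """Answer from the database if possible, otherwise ask a canned question."""
    if not places or not tags:
        # Insufficient information, as in U1 above.
        return CANNED_QUESTION
    answers = []
    for place in places:
        for tag in tags:
            text = DATABASE.get((place, tag))
            if text is not None:
                answers.append(f"{place} yoVkka {tag} - {text}")
    return "\n".join(answers) if answers else CANNED_QUESTION


reply_u1 = respond(["golkoVMda"], [])            # U1: place only, nothing asked
reply_u2 = respond(["golkoVMda"], ["cirunAmA"])  # U2: place borrowed from context
```

Here the place for U2 is assumed to have been supplied by the Context Handler, since ‘axi’ (it) carries no level-1 word of its own.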
4 Detailed Execution
User query: hExarAbaxlo makkAmasjix eVppudu opeVn uMtuMxi?
(When is Mecca Masjid open in Hyderabad?)
POS-tagger: hExarAbaxlo makkAmasjix eVppudu/WQ opeVn uMtuMxi
Replace with root word: hExarAbax makkAmasjix eVppudu/WQ opeVn uMdu
Replace with synonym: hExarAbax makkAmasjix eVppudu/WQ samayaM uMdu
Keywords: makkAmasjix, eVppudu, samayaM
Query set: [makkAmasjix, eVppudu, samayaM]
Answer: makkAmasjix yoVkka samayaM - anni rojulu 4:00 am - 9:30 pm
(Mecca Masjid open timings - All days 4:00 am - 9:30 pm)
5 Evaluation
No automatic evaluation is available for dialogue systems, so only human evaluation
is possible. Five evaluators, all of whom have Telugu as their mother tongue, used
the system and evaluated it on the metrics given in Table 1. Table 1 lists the
evaluation criteria and the average of the ratings given by the evaluators on a scale
of 1-5, where 1 means poor and 5 means excellent.
Table 1. Human evaluation of our system
Metric        Question                                                 Average rating
Speed         How fast are the responses?                              4
Timeout       Does the system hang?                                    5
Recognition   Does the system understand your intention?               3.5
Reliability   Did you find all the information you were looking for?   4
Relevance     Are the responses appropriate?                           4
Usability     Is the system easy to use?                               4
Complexity    Does the system handle complex sentences?                3
Performance   Overall performance of the system                        3.5
6 Error Analysis
Much effort is being made to make the conversations seem natural, i.e. closer to
human conversation. For this, we need to handle many discourse issues, such as
anaphora and ellipsis, as well as other issues like grammatical and spelling errors.
Our system can handle these issues to some extent, but as the complexity of the
sentence increases, the performance of the system degrades. Some such issues are
discussed in this section.
Anaphora Resolution: [8]
U1: cArminAr eVkkada uMxi ?
(Where is Charminar?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U2: xAni PIju eMwa, golkoVMda eVkkada uMxi
(What is its fee and where is Golconda?)
S2: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx,
weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
golkoVMda yoVkka PIju - iMdiyans - 5, vixeSIyulu - 100
(The fee for Golconda - Indians - Rs.5, Foreigners -Rs.100)
Here, in U2, ‘xAni’ (its) may refer to ‘cArminAr’ (Charminar) or ‘golkoVMda’
(Golconda). Although the pronoun ‘xAni’ should map to ‘cArminAr’, it is mapped to
‘golkoVMda’, as ‘golkoVMda’ is nearer to the pronoun.
U1: cArminAr eVkkada uMxi ?
(Where is Charminar?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U2: xAni PIju eMwa ?
(What is its fee?)
S2: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)
Here, in U2, there is no ambiguity in co-reference: ‘xAni’ (its) can only refer to
‘cArminAr’ (Charminar) in U1. The system analyzed the query successfully and gave the
correct response.
Ellipsis: [10]
U1: cArminAr cirunAmA eMti, mari PIju ?
(What is the address of Charminar, (what about its) fees?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners -Rs.150)
In this example, ‘what about its’ in U1 is elided. The system interprets the above
query properly. In general, if the sentence becomes complex, the system may not be
able to handle the ellipsis, as in the example below.
U1: cArminAr cirunAmA eMti, mari PIju, golkoVMda eVppudu opeVn
uMtaxi, mari cArminAr
(What is the address of Charminar, (what about its) fees, When is Golconda open,
(what about) Charminar)
S1: golkoVMda - samayaM - somavAraM wappa anni rojulu
5:00pm – 9:00 pm, AxivAraM 7:00pm - 8:00pm kAMwi praxarSana
(Golconda is open on all days except Monday from 5:00pm to 9:00pm, and on Sundays
there is a laser light show from 7:00pm to 8:00pm)
golkoVMda - PIju - iMdiyans - 5, vixeSIyulu – 100
(The fee for Golconda - Indians - Rs.5, Foreigners -Rs.100)
golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx,
weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
The expected output is the address, entry fee and open timings of Charminar along
with the open timings of Golconda, but the system interpreted the query wrongly and
gave the open timings, entry fee and address of Golconda along with the address of
Charminar.
Sandhi: [9]
Sandhi is a common phenomenon in agglutinative languages. For example, consider
the dialogue below.
U: cArminAreVkkaduMxi ?
(Where is Charminar?)
S: mIku memu e vidamugA sAyapadagalamu?
(How can we help you?)
In U, we can see that a whole sentence is expressed as a single word in Telugu, which
cannot be analyzed by NLP applications. To handle such cases, there is a need for a
sandhi splitter which splits ‘cArminAreVkkaduMxi’ into ‘cArminAr’ (Charminar),
‘eVkkada’ (where) and ‘uMxi’ (present).
7 Conclusion
We have shown a new and quick approach to building a dialogue system. It can be
readily adapted to other languages: in general, only language specific parts like the
database and the knowledge base have to be replaced for this purpose. Our model is
portable to any domain. It requires a stemmer and a set of question words, both of
which can be easily developed. This brings us one step closer to building dialogue
systems for resource-poor languages. Our system also maintains the conversation by
posing questions to the user.
In future, we intend to build a multi-lingual and multi-domain dialogue system by
improving our current model so that it can handle pragmatics and discourse. We also
intend to handle sandhi, ellipsis and anaphora resolution to make the conversations
seem more natural. The system can also be integrated with speech input and output
modules.
Acknowledgements. This work is supported by the Information Technology Research
Academy (ITRA), Government of India, under ITRA-Mobile grant
ITRA/15(62)/Mobile/VAMD/01.
References
1. Akula, A. R., Sangal, R., and Mamidi, R. A novel approach towards incorporating
context processing capabilities in NLIDB system.
2. Gupta, A., Akula, A., Malladi, D., Kukkadapu, P., Ainavolu, V., and Sangal, R. (2012).
A novel approach towards building a portable NLIDB system using the Computational
Paninian Grammar framework. In Asian Language Processing (IALP), 2012 International
Conference on, pages 93–96. IEEE.
3. Hori, C., Ohtake, K., Misu, T., Kashioka, H., and Nakamura, S. (2009). Weighted finite
state transducer based statistical dialog management. In Automatic Speech Recognition
& Understanding, 2009. ASRU 2009. IEEE Workshop on, pages 490–495.
4. Nguyen, A. and Wobcke, W. (2005). An agent-based approach to dialogue management in
personal assistants. In Proceedings of the 10th International Conference on Intelligent
User Interfaces, IUI ’05, pages 137–144, New York, NY, USA. ACM.
5. Nguyen, A. and Wobcke, W. (2006). Extensibility and reuse in an agent-based dialogue
model. In Web Intelligence and Intelligent Agent Technology Workshops, 2006. WI-IAT
2006 Workshops. 2006 IEEE/WIC/ACM International Conference on, pages 367–371.
IEEE.
6. Reddy, R. R. N. and Bandyopadhyay, S. (2006). Dialogue based question answering
system in Telugu. In Proceedings of the Workshop on Multilingual Question Answering,
MLQA ’06, pages 53–60, Stroudsburg, PA, USA. Association for Computational
Linguistics.
7. Srirampur, S., Chandibhamar, R., and Mamidi, R. (2014). Statistical morph analyzer
(SMA++) for Indian languages. COLING 2014, page 103.
8. Mitkov, R. (1999). Anaphora resolution: The state of the art. Technical report,
University of Wolverhampton, Wolverhampton.
9. Kumar, S. (2007). Sandhi splitter and analyzer for Sanskrit (with special reference to
aC sandhi). Thesis submitted to JNU Special Centre for Sanskrit.
http://sanskrit.jnu.ac.in/rstudents/mphil/sachin.pdf
10. Dalrymple, M., Shieber, S. M., and Pereira, F. C. N. (1991). Ellipsis and higher-order
unification. Technical report, Computation and Language E-Print Archive.
11. Brants, T. (2000). TnT – a statistical part-of-speech tagger. In Proceedings of the
Sixth Applied Natural Language Processing Conference (ANLP-2000), pages 224–231.

CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRA NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
 
2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit sat2. an efficient approach for web query preprocessing edit sat
2. an efficient approach for web query preprocessing edit sat
 
IRJET - Voice based Natural Language Query Processing
IRJET -  	  Voice based Natural Language Query ProcessingIRJET -  	  Voice based Natural Language Query Processing
IRJET - Voice based Natural Language Query Processing
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
 

A Dialogue System for Telugu, a Resource-Poor Language

Our approach can be described in two parts, namely Data Management and Query Processing.

─ Data Management: This component deals with categorization, text segmentation and storage of the data in a hierarchical manner, which helps in fast retrieval of the output.
─ Query Processing: This component takes a natural language query from a user as input, processes it to extract the keywords and updates the context if necessary. Based on the extracted keywords and the context, it either retrieves an answer from the database or asks the user an interactive question. In this way, it keeps the conversation with the user going at all times.

This model can be implemented with minimal or no use of language processing tools. Our approach is language independent and can be extended to any domain. It can also handle context and code-mixed queries, which makes the conversations look more natural.

Dialogue Example:

U1: hExarAbAxulo* mAnyumeVMts emi unnAyi?
(What are the monuments in Hyderabad?)
S1: cArminAr, golkoVMda, makkA masjix. mIku vItilo exi kAvAli?
(Charminar, Golconda, Mecca Masjid. What among these do you need?)
U2: cArminAr eVkkada uMxi?
(Where is Charminar?)
S2: cArminAr yoVkka cirunAmA - cArminAr, heVxarAbAx, weVlaMgANa.
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U3: eVppudu opeVn uMtuMxi?
(When is it open?)
S3: cArminAr yoVkka samayaM anni rojulu 9:00 am - 5:30 pm
(Charminar open timings - All days 9:00 am - 5:30 pm)
U4: eMtrI PIju eMwa?
(What is the entry fee?)
S4: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)

The above conversation shows a user-system interaction from the ‘Tourism’ domain for the Telugu language. User responses are labelled U1, U2, etc. and system responses are labelled S1, S2, etc. A user response and the system response that follows it together make a discourse unit, i.e. U1 and S1 form a discourse unit. Context is carried across several discourse units: to answer U3 and U4 the system needs context information from U2. U3 is a code-mixed query, as it contains ‘opeVn’ (open), an English word. U3 was successfully processed and understood by the system, which shows that code-mixed queries are also handled.

* Words are in wx format (sanskrit.inria.fr/DATA/wx.html). All the examples given in the paper are from Telugu language.
2 Related Work

There has been a lot of progress in the field of dialogue systems in the last few years. In general, dialogue systems are classified into three types: (a) finite state (or graph) based systems, (b) frame based systems and (c) agent based systems.

(a) Finite state based systems: In this type of system, the conversation proceeds according to predefined states or steps. Such a system is simple to construct but does not allow the user to ask questions or take the initiative. [3] proposed a method using a weighted finite state transducer for dialogue management.

(b) Frame based systems: These systems have a set of templates which are filled based on the user responses. The templates are then used to perform other tasks. [2] proposed an approach to build a natural language interface to databases (NLIDB) using semantic frames based on Computational Paninian Grammar. Context information in NLIDB is handled by [1]. In that work, different types of user-system interactions were identified and context was handled for one specific type of interaction. A dialogue based question answering system [6], which extracts keywords from the user query to identify a query frame, has been developed for railway information in Telugu.

(c) Agent based systems: These systems allow a more natural flow of communication between user and system than the other types. The conversation can be viewed as an interaction between two agents, each of which is capable of reasoning about its own actions. [4] developed an agent based dialogue system called Smart Personal Assistant for email management. This was further extended to the calendar task domain in [5].

Our model can be categorized as an agent based system.

3 Our Approach

Fig. 1 describes the flow chart of the internal working of our model.
Fig. 1. System Architecture

The major components in our method are:

─ Data Organization (Database)
─ Query Processing
   ─ Knowledge Base
   ─ Query Analyzer
   ─ Advanced Filter
   ─ Context Handler
   ─ Dialogue Manager

3.1 Data Organization

Every domain has an innate hierarchy in it. A study of the possible queries in a domain gives an insight into the hierarchical organization of the data in that domain. For example, consider the ‘Tourist places of Hyderabad’ domain in Fig. 2.

Fig. 2. Data organization of ‘Tourist places of Hyderabad’ domain

When we store data in this manner it becomes easy to add information and extend the domain. We can see the extension of the ‘Tourist places of Hyderabad’ domain to the ‘Hyderabad Tourism’ domain in Fig. 3.
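As a rough illustration of why a hierarchy makes the domain easy to extend, the sketch below holds a small fragment of the tourism data as a nested mapping. The three-layer layout (category, place, tag), the helper names `lookup` and `add_entry`, and the choice of labels are assumptions made for illustration; the values are taken from the dialogue examples in this paper, and the figures themselves may organize the data differently.

```python
# Hypothetical fragment of the 'Tourist places of Hyderabad' hierarchy:
# category -> place -> tag -> text segment (layout assumed for illustration).
TOURISM = {
    "mAnyumeVMts": {  # monuments
        "cArminAr": {
            "cirunAmA": "cArminAr, heVxarAbAx, weVlaMgANa",   # address
            "samayaM": "anni rojulu 9:00 am - 5:30 pm",       # open timings
            "PIju": "iMdiyans - 20, vixeSIyulu - 150",        # entry fee
        },
        "golkoVMda": {
            "cirunAmA": "ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008",
            "PIju": "iMdiyans - 5, vixeSIyulu - 100",
        },
    },
}


def lookup(tree, *path):
    """Walk from the root down the given labels and return what is stored there."""
    node = tree
    for label in path:
        node = node[label]
    return node


def add_entry(tree, path, segment):
    """Store a text segment under a path, creating missing branches on the way."""
    node = tree
    for label in path[:-1]:
        node = node.setdefault(label, {})
    node[path[-1]] = segment


# Extending the domain only touches the branch that is being added.
add_entry(TOURISM, ("mAnyumeVMts", "makkAmasjix", "samayaM"),
          "annirojulu 4:00 am - 9:30 pm")
print(lookup(TOURISM, "mAnyumeVMts", "cArminAr", "PIju"))
print(sorted(lookup(TOURISM, "mAnyumeVMts")))
```

In an on-disk version the upper layers would become directories and files rather than dictionary keys, but the lookup path stays the same.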
Fig. 3. Data organization of ‘Tourism of Hyderabad’ domain

In this hierarchical tree structure the leaf nodes are at level-0 and the level increases from a leaf node to the root node. The data at level-n (where n is the number of levels) is segmented recursively until level-1. Each segment at level-i (i = 1...n) is then given a level-(i-1) tag. In physical memory, all the layers above level-1 are stored as directories, level-1 as files and level-0 as tags in a file. The text in a file is stored in the form of segments and each segment is given a level-0 tag (address, open timings, entry fee etc.). The labels of all the files and directories, along with the information in the files, make up the data set.

3.2 Query Processing

The entire process from taking a user query to generating a system response is termed ‘Query Processing’. The different components of the ‘Query Processing’ module are described in the subsequent sections.

3.2.1 Knowledge Base

The knowledge base contains a domain dependent ontology, such as lists of synonyms and code-mixed words. This helps in handling code-mixed and wrongly spelt words in the queries. The Query Analyzer uses this module to replace the synonyms, code-mixed words etc. in a query with the corresponding level-i (i = 0...n) tags. If a language has knowledge resources like WordNet, DBpedia etc., they can be used to build the knowledge base. Building the knowledge base is a manual step.

3.2.2 Query Analyzer

The NL query given by the user is converted into a wx query, which is given as input to the Query Analyzer. The wx query is then tokenized and given as input to a morphological analyzer and a parts of speech (POS) tagger [11]. From the morphological analyzer’s output, the root words of all the tokens in the query are extracted and the tokens are replaced with the corresponding root words. In this modified query the synonyms, code-mixed words etc. are replaced with the corresponding level-i tags, as discussed in Knowledge Base. Languages that do not have a morphological analyzer can build a simple stemmer which maintains a root dictionary, applies the ‘minimum edit distance algorithm’ to find the root word of a given token and then replaces the token with the root word [7]. Here, the POS tagger is used only to identify the question words in the query. Languages with no POS tagger can use a list of question words instead.

Example:

User Query: golkoVMda addreVssu emiti, e tEMlo cUdavaccu?
Golconda address what what time can visit
(What is the address of Golconda, what time to visit?)
POS-tagger: golkoVMda addreVssu emiti/WQ e/WQ tEMlo cUda-vaccu
Root word replacement: golkoVMda addreVssu emiti/WQ e/WQ tEM cUdu
Query Analyzer’s Output: golkoVMda cirunAmA emiti/WQ e/WQ samayaM cUdu

From the above output, we can see that English words like ‘addreVssu’ (address) and ‘tEM’ (time) are mapped to the corresponding Telugu words, i.e. ‘cirunAmA’ (address) and ‘samayaM’ (time) respectively, by using the knowledge base. If there is no corresponding Telugu word, the English word remains as it is.

3.2.3 Advanced Filter

From the above modified query, question words and level-i words are extracted. From these words this module keeps only the words which play a role in answer retrieval, by applying heuristics such as selecting the level-i words nearest to the question words, the level-0 words etc. If these heuristics are not satisfied, all the level-i words are considered for further processing. The final set of words are the keywords.

A user query can contain more than one question. In such cases, keywords belonging to a particular question are grouped together. All the groups of a query are collectively called a ‘query set’.

Example:

U: nenu golkoVMda xaggara unnAnu, cArminAr PIju eVMwa iMkA cArminAr makkAmasjix eVkkada unnAyi ?
(I am near Golconda, what is the fee for Charminar and what is the address of Charminar and Mecca Masjid?)
Query analyzer output: nenu golkoVMda xaggara uMdu cArminAr PIju eVMwa/WQ iMkA cArminAr makkAmasjix cirunAmA uMdu
Extracted words: golkoVMda, cArminAr, PIju, eVMwa, cArminAr, makkAmasjix, cirunAmA
Keywords: cArminAr, PIju, eVMwa, cArminAr, makkAmasjix, cirunAmA
Query set: [cArminAr, PIju, eVMwa], [cArminAr, makkAmasjix, cirunAmA]
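The grouping step can be sketched as follows. The paper does not spell out its heuristics, so the rule used here is an assumption chosen only because it reproduces the query set of the example: a group is the unbroken run of place names immediately before a level-0 tag, plus a question word that directly follows that tag. The word lists and the name `build_query_set` are hypothetical, and the fallback that keeps all level-i words when no heuristic applies is left out.

```python
# Hypothetical word lists for the 'Tourism' domain.
PLACES = {"golkoVMda", "cArminAr", "makkAmasjix"}   # level-1 words
TAGS = {"PIju", "cirunAmA", "samayaM"}              # level-0 tags


def build_query_set(tokens):
    """Group keywords per question (assumed adjacency heuristic).

    tokens: Query Analyzer output, question words carrying a '/WQ' suffix.
    """
    query_set, run = [], []
    for i, tok in enumerate(tokens):
        if tok in PLACES:
            run.append(tok)                 # extend the current run of places
        elif tok in TAGS:
            group = run + [tok]
            # Attach a question word that directly follows the tag, if any.
            if i + 1 < len(tokens) and tokens[i + 1].endswith("/WQ"):
                group.append(tokens[i + 1][:-len("/WQ")])
            query_set.append(group)
            run = []
        else:
            run = []                        # any other word breaks the run
    return query_set


analyzed = ("nenu golkoVMda xaggara uMdu cArminAr PIju eVMwa/WQ "
            "iMkA cArminAr makkAmasjix cirunAmA uMdu").split()
print(build_query_set(analyzed))
```

Because ‘golkoVMda’ is separated from every tag by other words, it never reaches a group, which matches the keyword list given in the example.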
In query ‘U’ of the Advanced Filter example, ‘I am near Golconda’ is unnecessary information. Therefore, even though ‘Golconda’ is a level-i word, it is not considered a keyword.

3.2.4 Context Handler

Handling context is very important in any conversation in order to capture the user’s intention. This is a major challenge in present day dialogue systems. The query set given by the Advanced Filter is used to update the context. If we identify any level-i (i = 1, 2...n) word in the query set, then there is a shift in the context. If there are no level-j (j < i) words in the query set, we borrow the level-j words relevant to level-i from the previous query set and add them to form the new query set. If the query set contains level-0 words and no words from level-i (i = 1, 2...n), then we borrow level-i words from the previous query set.

Example:

U1: cArminAr eVkkada uMxi?
(Where is Charminar?)
S1: cArminAr yoVkka cirunAmA - cArminAr, heVxarAbAx, weVlaMgANa.
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U2: eVppudu opeVn uMtuMxi?
(When is it open?)
S2: cArminAr yoVkka samayaM anni rojulu 9:00 am - 5:30 pm
(Charminar open timings - All days 9:00 am - 5:30 pm)
U3: eMtrI PIju eMwa?
(What is the entry fee?)
S3: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)
U4: golkoVMda eVkkada uMxi?
(Where is Golconda?)
S4: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)

In this example, to answer U2 and U3 we need contextual information (cArminAr [Charminar]) from U1. The context does not change for U2 and U3, as there is no level-i (i = 1, 2...n) word in them. We can observe the context switch from U3 to U4, i.e. the switch from ‘cArminAr’ (Charminar) to ‘golkoVMda’ (Golconda), due to the occurrence of Golconda (a level-1 word in the ‘Tourism’ domain) in U4.

3.2.5 Dialogue Manager

In any dialogue system, the Dialogue Manager (DM) is the major component. It coordinates the activity of several subcomponents in a dialogue system and controls the flow of dialogue by giving relevant responses to user queries. The dialogue manager takes the query set and the context information as input. If the user query is ambiguous or no keywords are identified, the dialogue manager poses an interactive question to the user from a set of canned questions. Otherwise it retrieves a relevant answer from the database.

Example:

U1: nenu golkoVMda cUdAli
(I have to visit Golconda)
S1: mIku memu e vidamugA sAyapadagalamu?
(How can we help you?)
U2: axi eVkkada uMxi?
(Where is it?)
S2: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)

In U1, the information provided is insufficient. Therefore the dialogue manager posed an interactive question to the user. Based on the query set and the contextual information of U2, the dialogue manager retrieves a relevant answer from the database.

4 Detailed Execution

User query: hExarAbaxlo makkAmasjix eVppudu opeVn uMtuMxi?
(When is Mecca Masjid open in Hyderabad?)
POS-tagger: hExarAbaxlo makkAmasjix eVppudu/WQ opeVn uMtuMxi
Replace with root word: hExarAbax makkAmasjix eVppudu/WQ opeVn uMdu
Replace with synonym: hExarAbax makkAmasjix eVppudu/WQ samayaM uMdu
Keywords: makkAmasjix, eVppudu, samayaM
Query set: [makkAmasjix, eVppudu, samayaM]
Answer: makkAmasjix yoVkka samayaM - annirojulu 4:00 am - 9:30 pm
(Mecca Masjid open timings - All days 4:00 am - 9:30 pm)
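The trace above can be reproduced with a small sketch. It is not the system's actual code: it assumes the tool-free variant described in Section 3.2.2 (a root dictionary searched by minimum edit distance, and a plain list of question words), and the word lists, the two-entry database and the function names are hypothetical. No distance threshold is applied, so an unknown token is always mapped to its nearest root, which a real stemmer would need to guard against.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


# Hypothetical resources, hand-built for this one example.
ROOTS = ["hExarAbax", "makkAmasjix", "cArminAr", "eVppudu", "opeVn", "uMdu"]
SYNONYMS = {"opeVn": "samayaM"}        # knowledge base: code-mixed word -> tag
QUESTION_WORDS = {"eVppudu"}
PLACES = {"makkAmasjix", "cArminAr"}   # level-1 words
TAGS = {"samayaM"}                     # level-0 tags
DATABASE = {
    ("makkAmasjix", "samayaM"): "annirojulu 4:00 am - 9:30 pm",
    ("cArminAr", "samayaM"): "anni rojulu 9:00 am - 5:30 pm",
}
CANNED_QUESTION = "mIku memu e vidamugA sAyapadagalamu?"   # How can we help you?


def analyze(query):
    """Root replacement, synonym replacement, then keyword extraction."""
    tokens = query.rstrip("?").split()
    roots = [min(ROOTS, key=lambda r: edit_distance(t, r)) for t in tokens]
    mapped = [SYNONYMS.get(r, r) for r in roots]
    return [w for w in mapped if w in PLACES | TAGS | QUESTION_WORDS]


def respond(keywords, context):
    """Return (response, new_context); context is the list of places in focus."""
    places = [w for w in keywords if w in PLACES]
    if places:
        context = places               # a level-1 word shifts the context
    tags = [w for w in keywords if w in TAGS]
    answers = [f"{p} yoVkka {t} - {DATABASE[(p, t)]}"
               for p in context for t in tags if (p, t) in DATABASE]
    if not answers:
        return CANNED_QUESTION, context   # nothing to retrieve: ask the user
    return "\n".join(answers), context


keywords = analyze("hExarAbaxlo makkAmasjix eVppudu opeVn uMtuMxi?")
print(keywords)
print(respond(keywords, [])[0])
```

A follow-up query without a place name, such as `analyze("eVppudu opeVn uMtuMxi?")`, yields only a question word and a tag, so `respond` falls back on whatever place is passed in as the context.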
5 Evaluation

No automatic evaluation method is available for dialogue systems, so only human evaluation is possible. Five evaluators used the system and evaluated it based on the metrics given in Table 1. The mother tongue of all the evaluators is Telugu. Table 1 shows the criteria for the evaluation and the average of the ratings given by the human evaluators on a scale of 1-5, where 1 means poor and 5 means excellent.

Table 1. Human evaluation of our system

Metric        Question                                                 Average rating
Speed         How fast are the responses?                              4
Timeout       Does the system hang?                                    5
Recognition   Does the system understand your intention?               3.5
Reliability   Did you find all the information you were looking for?   4
Relevance     Are the responses appropriate?                           4
Usability     Is the system easy to use?                               4
Complexity    Does the system handle complex sentences?                3
Performance   Overall performance of the system                        3.5

6 Error Analysis

Much effort goes into making conversations seem natural, i.e. closer to human conversation. For this, we need to handle many discourse issues such as anaphora, ellipsis etc., and other issues like grammatical and spelling errors. Our system can handle these issues to some extent, but as the complexity of the sentence increases, the performance of the system degrades. Some such issues are discussed in this section.

Anaphora Resolution: [8]

U1: cArminAr eVkkada uMxi ?
(Where is Charminar?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U2: xAni PIju eMwa, golkoVMda eVkkada uMxi
(What is its fee and where is Golconda?)
S2: golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
golkoVMda yoVkka PIju - iMdiyans - 5, vixeSIyulu - 100
(The fee for Golconda - Indians - Rs.5, Foreigners - Rs.100)
Here, in U2, ‘xAni’ (its) may refer to ‘cArminAr’ (Charminar) or ‘golkoVMda’ (Golconda). Though the pronoun ‘xAni’ should map to ‘cArminAr’, it is mapped to ‘golkoVMda’ because ‘golkoVMda’ is nearer to the pronoun.

U1: cArminAr eVkkada uMxi ?
(Where is Charminar?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
U2: xAni PIju eMwa ?
(What is its fee?)
S2: cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)

Here, in U2, as there is no ambiguity in co-reference, ‘xAni’ (its) can only refer to ‘cArminAr’ (Charminar) in U1. The system successfully analyzed the query and gave the correct response.

Ellipsis: [10]

U1: cArminAr cirunAmA eMti, mari PIju ?
(What is the address of Charminar, (what about its) fees?)
S1: cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)
cArminAr yoVkka PIju - iMdiyans - 20, vixeSIyulu - 150
(The fee for Charminar - Indians - Rs.20, Foreigners - Rs.150)

In this example, ‘what about its’ in U1 is an ellipsis. The system properly interprets the above query. In general, if the sentence becomes complex the system may not be able to handle the ellipses, as in the example below.

U1: cArminAr cirunAmA eMti, mari PIju, golkoVMda eVppudu opeVn uMtaxi, mari cArminAr
(What is the address of Charminar, (what about its) fees, when is Golconda open, (what about) Charminar)
S1: golkoVMda - samayaM - somavAraM wappa anni rojulu 5:00pm - 9:00 pm, AxivAraM 7:00pm - 8:00pm kAMwi praxarSana
(Except on Monday, Golconda is open on all other days from 5:00pm to 9:00pm, and on Sundays there is a laser light show from 7:00pm to 8:00pm)
golkoVMda - PIju - iMdiyans - 5, vixeSIyulu - 100
(The fee for Golconda - Indians - Rs.5, Foreigners - Rs.100)
golkoVMda yoVkka cirunAmA - ibrahIM bAg, heVxarAbAx, weVlaMgANa 500008.
(The address of Golconda is Ibrahim Bagh, Hyderabad, Telangana 500008.)
cArminAr - cirunAmA - cArminAr, hExarAbAx, weVlaMgANA
(The address of Charminar is - Charminar, Hyderabad, Telangana.)

The output of the system should have been the address, entry fee and open timings of Charminar along with the open timings of Golconda, but the system interpreted the query wrongly and gave the open timings, entry fee and address of Golconda along with the address of Charminar.

Sandhi: [9]

Sandhi is a common phenomenon in agglutinative languages. For example, consider the dialogue below.

U: cArminAreVkkaduMxi ?
(Where is Charminar?)
S: mIku memu e vidamugA sAyapadagalamu?
(How can we help you?)

In U, we can see that a whole sentence is expressed as a single word in Telugu, which cannot be analyzed by NLP applications. To handle these cases, there is a need for a sandhi splitter which splits ‘cArminAreVkkaduMxi’ into ‘cArminAr’ (Charminar), ‘eVkkada’ (where) and ‘uMxi’ (present).

7 Conclusion

We have shown a new and quick approach to build a dialogue system. It can be readily adapted to other languages. In general, only the language specific parts, the Database and the Knowledge base, have to be replaced for this purpose. Our model is portable to any domain. It requires a stemmer and a set of question words, which can be easily developed. This brings us one step closer to building dialogue systems for resource poor languages. Our system also maintains the conversation by posing questions to the user.

In future, we intend to build a multi-lingual and multi-domain dialogue system by improving our current model, which should be able to handle pragmatics and discourse. We also intend to handle sandhi, ellipses and anaphora resolution to make the conversations seem more natural. This system can also be integrated with speech input and output modules.

Acknowledgements. This work is supported by Information Technology Research Academy (ITRA), Government of India, under ITRA-Mobile grant ITRA/15(62)/Mobile/VAMD/01.

References

1. Akula, A. R., Sangal, R., and Mamidi, R. A novel approach towards incorporating context processing capabilities in nlidb system.
2. Gupta, A., Akula, A., Malladi, D., Kukkadapu, P., Ainavolu, V., and Sangal, R. (2012). A novel approach towards building a portable nlidb system using the computational paninian grammar framework. In Asian Language Processing (IALP), 2012 International Conference on, pages 93–96. IEEE.
3. Hori, C., Ohtake, K., Misu, T., Kashioka, H., and Nakamura, S. (2009). Weighted finite state transducer based statistical dialog management. In Automatic Speech Recognition Understanding, 2009. ASRU 2009. IEEE Workshop on, pages 490–495.
4. Nguyen, A. and Wobcke, W. (2005). An agent-based approach to dialogue management in personal assistants. In Proceedings of the 10th International Conference on Intelligent User Interfaces, IUI ’05, pages 137–144, New York, NY, USA. ACM.
5. Nguyen, A. and Wobcke, W. (2006). Extensibility and reuse in an agent-based dialogue model. In Web Intelligence and Intelligent Agent Technology Workshops, 2006. WI-IAT 2006 Workshops. 2006 IEEE/WIC/ACM International Conference on, pages 367–371. IEEE.
6. Reddy, R. R. N. and Bandyopadhyay, S. (2006). Dialogue based question answering system in telugu. In Proceedings of the Workshop on Multilingual Question Answering, MLQA ’06, pages 53–60, Stroudsburg, PA, USA. Association for Computational Linguistics.
7. Srirampur, S., Chandibhamar, R., and Mamidi, R. (2014). Statistical morph analyzer (sma++) for indian languages. COLING 2014, page 103.
8. Mitkov, R. (1999). Anaphora resolution: The state of the art. Technical report, University of Wolverhampton, Wolverhampton.
9. Kumar, S. (2007). Sandhi splitter and analyzer for Sanskrit (with special reference to aC sandhi). Thesis submitted to JNU Special Centre for Sanskrit. http://sanskrit.jnu.ac.in/rstudents/mphil/sachin.pdf
10. Dalrymple, M., Shieber, S. M., and Pereira, F. C. N. (1991). Ellipsis and higher-order unification. Technical report, Computation and Language E-Print Archive.
11. Brants, T. (2000). TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), pages 224–231.