CLASSIFICATION OF ARABIC QUESTIONS
USING MULTINOMIAL NAIVE BAYES AND
SUPPORT VECTOR MACHINES
Waheeb Ahmed¹ and Babu Anto P²
¹ Department of Information Technology, Kannur University, Kannur, Kerala, India
² Department of Information Technology, Kannur University, Kannur, Kerala, India
International Journal of Latest Trends in Engineering and Technology
Special Issue SACAIM 2016, pp. 82-86
e-ISSN:2278-621X
Abstract- Question classification plays a very important role in Question Answering systems. It assigns a label to a question depending on the type of the question. This label is used by the Answer Extraction module to extract the correct answer. Since there is a wide variety of natural language questions, classifying them is hard and challenging. Very limited research has been done on classifying Arabic questions using machine learning techniques. In this paper, we use Support Vector Machines (SVM) and Multinomial Naive Bayes (MNB) to classify questions. The question types classified are Who, What, Where, When, How many, How much, How and Why. The labels given to these questions are, respectively, Person, Definition, Location, Time/Date, Number/Count, Quantity, Manner and Reason. The SVM produced more accurate results than the MNB. The dataset consisted of 300 questions from the Arabic Wikipedia. Both the SVM and the MNB achieved a precision of 1. The F1 measure is 0.97 for the SVM and 0.95 for the MNB, which is a promising result.
Keywords – Question Classification, Question Answering, Machine Learning.
I. INTRODUCTION
Question Answering (QA) is a computer science discipline that uses Information Retrieval (IR) and Natural Language Processing (NLP) techniques to answer questions posed by humans [1]. Question Answering systems fall into two domains: open domain and closed domain. Open-domain QA deals with questions on any topic, whereas closed-domain QA deals with questions related to a specific domain (Quran, medical applications, biology, etc.) [2]. Our work focuses on the closed domain, that is, the Arabic Wikipedia. Classifying the questions posed by users of a question answering system is considered a very challenging problem [3]. Question classification is a crucial step in QA because it helps anticipate the type of answer, which narrows down the search space for finding the correct answer. The purpose is to concentrate the answer extraction task only on those text segments related to the expected answer type, which is identified by the question classification module [4].
There are two approaches to classifying questions: the rule-based approach and the machine learning approach. Recently, supervised machine learning techniques have been adopted, which train a classifier from manually annotated examples (questions along with their corresponding answer types). Creating a training and testing set is a time-consuming process, but no rule-writing skills are required [5]. Hence, we used the machine learning approach, training a classifier on a set of questions derived from the Arabic Wikipedia.
II. RELATED WORK
Al Chalabi [6] proposed question classification methods for Arabic questions using regular expressions and context-free grammars. They used the NooJ platform [7] to write regular expressions and used linguistic patterns to identify the type of expected answer.
Ali [8] proposed a question classification method using support vector machines. Their classifier can classify only three types of questions, namely "Who", "Where" and "What". They used 1-gram, 2-gram and 3-gram features with TF weighting, and they reported that the 2-gram feature produced the best classification, with an F1 measure of 87.25%.
Abdelnasser [9] used an SVM classifier to classify Quranic questions and obtained an overall accuracy of 77.2%. Their data set consisted of 230 questions from the Quranic domain.
III. PROPOSED METHODOLOGY
SVM is a supervised machine learning technique that has proved to be an efficient classifier for text categorization. MNB is an advanced version of Naive Bayes designed for classifying text documents. It uses the counts of words in documents rather than the presence or absence of particular words, as traditional Naive Bayes does [10]. We use Support Vector Machines and Multinomial Naive Bayes to classify a given question according to the training data that we built. The training data set consists of 300 questions derived from the Arabic Wikipedia. The testing data set consists of 200 questions translated from the Text Retrieval Conference (TREC 10) [1]. We used 1-gram and 2-gram features when training the classifiers.
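To make the count-based behaviour of MNB and the 1-gram and 2-gram features concrete, the following is a minimal from-scratch sketch. It is not the implementation used in this work: the whitespace tokenization, the add-one smoothing, the class name `MultinomialNB`, the helper `ngram_features` and the six toy questions are all assumptions made for illustration, and the SVM classifier is not shown.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_values=(1, 2)):
    """Return the 1-gram and 2-gram tokens of a whitespace-tokenized question."""
    words = text.split()
    feats = []
    for n in n_values:
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes over n-gram counts (add-one smoothing assumed)."""

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)          # questions per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for question, label in zip(questions, labels):
            feats = ngram_features(question)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, question):
        feats = Counter(ngram_features(question))
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(n_docs / total_docs)    # log prior
            for feat, count in feats.items():
                # Counts, not mere presence, scale each log-likelihood term.
                p = (self.feature_counts[label][feat] + 1) / (total + vocab_size)
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (already without diacritics).
train = [
    ("من هو رئيس الولايات المتحدة", "Person"),
    ("من هي مؤلفة الكتاب", "Person"),
    ("أين تقع لندن", "Location"),
    ("أين يقع نهر النيل", "Location"),
    ("متى استقلت تونس", "Time"),
    ("متى بدأت الحرب", "Time"),
]
model = MultinomialNB().fit([q for q, _ in train], [y for _, y in train])
print(model.predict("من هو مخترع الهاتف"))
```

Including 2-grams lets the classifier pick up interrogative phrases such as "من هو" as a single feature in addition to the individual words.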
Figure 1. Question Classification Using SVM and MNB
A natural language question is given by the user. The Arabic diacritics (vowel marks) are removed and the normalized text is passed to the classifiers. The classified question is given a label (the labels are listed in Table 1 in the next section). The accuracy and evaluation stage is used to evaluate the performance of the classifiers.
[Figure 1 blocks: Natural language question → Normalization → SVM/MNB classifiers → Question class (label) → Accuracy and evaluation; Arabic Wikipedia questions for training and testing.]
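The normalization step can be sketched as follows. This is only an illustration under stated assumptions: it treats "diacritics" as the Unicode marks U+064B through U+0652 (the tanween forms, fatha, damma, kasra, shadda and sukun) and additionally collapses whitespace; the function name `remove_diacritics` is hypothetical, and any further normalization the system may apply is not covered.

```python
import re

# Arabic diacritic marks (tashkeel), U+064B through U+0652.
_DIACRITICS = re.compile("[\u064B-\u0652]")


def remove_diacritics(text):
    """Strip Arabic diacritics and collapse runs of whitespace."""
    stripped = _DIACRITICS.sub("", text)
    return " ".join(stripped.split())


# A vocalized form of "who is" is reduced to its bare letters.
print(remove_diacritics("\u0645\u064E\u0646\u0652 \u0647\u064F\u0648\u064E"))
```

Removing the marks means that vocalized and unvocalized spellings of the same word map to the same n-gram feature.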
IV. QUESTION TYPE TAXONOMIES
Questions are classified into the following types: Who (من), When (متى), Where (أين), What (ما هو / ما هي), How many (كم عدد), How much (كم كمية), How (كيف) and Why (لماذا).
Table -1 Classes of Questions
Question type | Expected answer type (label) | Examples
Who (من) | Person (شخص) | man-hua (من هو) / man-hya (من هي): Questions that start with Who (من) ask for a person's name, so the class label given to such a question is Person (شخص). e.g.: Who is the president of the United States? (من هو رئيس الولايات المتحدة؟)
Where (أين) | Location (مكان) | ayin (أين): This question word means 'Where'. It asks for an answer of type Location, which is further divided into four subclasses: City, State, Country and Other. e.g.: Where is London? The main class for this question is Location and the subclass is City.
When (متى) | Time (زمان) | mata (متى): This kind of question asks for a time or date. The main class is Number (رقم) and the subclass is Time (وقت) or Date (تاريخ). e.g.: When did Tunisia gain independence? (متى استقلت تونس؟)
How much (كم) | Quantity (كمية) | kam-kamyat (كم كمية): This question asks for a quantity. e.g.: How much blood is in the human body? (كم كمية الدم في جسم الانسان؟)
How many (كم) | Count (عدد) | kam-aded (كم عدد): This is equivalent to 'How many'. The main class for this question is Number and the subclass is Count. e.g.: How many continents are there? (كم عدد القارات؟)
What (ما هو / ما هي) | Thing (شيء) | This question asks for an entity. e.g.: What is the color of the sun? (ما هو لون الشمس؟) In this case, the class of the question is Entity.
What (ما هو / ما هي) | Definition (تعريف) | ma-hua (ما هو) / ma-hya (ما هي): This question asks for a definition, like 'What is/are' in English. The class of this question is Definition (تعريف). e.g.: What is a computer? (ما هو الكمبيوتر؟)
How (كيف) | Manner (الوسيلة) | kaif (كيف): This question asks for the manner (how), so it is given the label 'Manner'. e.g.: How can water be converted from the liquid state to the gaseous state? (كيف يمكن تحويل الماء من الحالة السائلة الى الحالة الغازية؟)
Why (لماذا) | Reason (المبرر) | limatha (لماذا): This question asks for a reason, so it is given the label Reason. e.g.: Why do birds sing? (لماذا تغني الطيور؟)
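The taxonomy in Table 1 can also be written down as a simple lookup from the leading interrogative phrase to the expected answer type. The sketch below is only an illustration of the table, not the trained SVM or MNB classifier; the names `QUESTION_LABELS` and `expected_answer_type` are hypothetical. Questions starting with ما هو / ما هي are deliberately left out, because Table 1 assigns them either Thing or Definition and the interrogative alone does not decide between the two.

```python
# Interrogative phrases from Table 1 mapped to expected answer types.
# Two-word phrases are listed first so they are tried before single words.
QUESTION_LABELS = [
    ("كم عدد", "Count"),
    ("كم كمية", "Quantity"),
    ("لماذا", "Reason"),
    ("كيف", "Manner"),
    ("متى", "Time/Date"),
    ("أين", "Location"),
    ("من", "Person"),
]


def expected_answer_type(question):
    """Return the Table 1 label for the leading interrogative, or None."""
    words = question.split()
    for phrase, label in QUESTION_LABELS:
        n = len(phrase.split())
        # Compare whole words so that "من" does not match longer words.
        if " ".join(words[:n]) == phrase:
            return label
    return None


print(expected_answer_type("كم عدد القارات؟"))
```

Matching on whole leading words keeps the lookup from firing on words that merely begin with the same letters as an interrogative.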
V. PERFORMANCE EVALUATION OF QUESTION CLASSIFIERS
To measure the performance of a question classifier, we use the precision and recall of the system. Precision (P) is defined as the number of true positives (TP) divided by the number of true positives plus the number of false positives (FP):
$$P = \frac{TP}{TP + FP}$$
Recall (R) is defined as the number of true positives (TP) divided by the number of true positives plus the number of false negatives (FN):
$$R = \frac{TP}{TP + FN}$$
where, for a given class, TP is the number of questions correctly assigned to the class, FP is the number of questions incorrectly assigned to the class, FN is the number of questions that belong to the class but were not assigned to it, and the true negatives (TN) are the questions correctly not assigned to the class.
Precision and recall are combined in the F1 score, which is defined as the harmonic mean of precision and recall:
$$F_1 = 2 \cdot \frac{P \times R}{P + R}$$
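The three measures can be computed directly from the per-class counts. The sketch below follows the definitions above; the function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from per-class counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one class: 9 correct, 1 wrongly assigned, 3 missed.
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=3)
print(round(p, 2), round(r, 2), round(f1, 2))
```

Because F1 is a harmonic mean, it stays closer to the lower of the two values than an arithmetic mean would.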
On the 200 questions used for testing, both MNB and SVM achieved a precision of 1.
Table -1 Performance Evaluation For MNB
Question type | Precision | Recall | F1-measure
Who (من) | 1 | 0.96 | 0.97
Where (أين) | 1 | 0.94 | 0.96
When (متى) | 1 | 0.90 | 0.94
How much (كم كمية) | 1 | 0.89 | 0.94
How many (كم عدد) | 1 | 0.93 | 0.96
What (ما هو / ما هي) | 1 | 0.87 | 0.93
How (كيف) | 1 | 0.91 | 0.95
Why (لماذا) | 1 | 0.88 | 0.93
AVG | 1 | 0.91 | 0.95
Table 1 shows the precision, recall and F1-measure obtained by MNB for the listed question types. The average precision obtained by MNB is 1, the recall is 0.91 and the F1-measure is 0.95.
Table -2 Performance Evaluation For SVM
Question type | Precision | Recall | F1-measure
Who (من) | 1 | 0.98 | 0.99
Where (أين) | 1 | 0.95 | 0.97
When (متى) | 1 | 0.96 | 0.98
How much (كم كمية) | 1 | 0.96 | 0.98
How many (كم عدد) | 1 | 0.93 | 0.96
What (ما هو / ما هي) | 1 | 0.92 | 0.95
How (كيف) | 1 | 0.90 | 0.94
Why (لماذا) | 1 | 0.92 | 0.95
AVG | 1 | 0.94 | 0.97
Table 2 shows the precision, recall and F1-measure obtained by SVM for the listed question types. The average precision obtained by SVM is 1, the recall is 0.94 and the F1-measure is 0.97. These results are very promising compared with some recent research on question answering for the English language, which reports systems with recall 0.63 and precision 0.7 [11] and recall 0.73 and precision 0.73 [12]. Hence, the results show the effectiveness of Support Vector Machines and Multinomial Naive Bayes in classifying questions.
VI. CONCLUSION
In this paper, we proposed a question classification method using SVM and MNB for Arabic questions. We trained the classifiers on 300 questions derived from the Arabic Wikipedia and tested them on a set of 200 questions translated from TREC. The results are very promising, and the classifiers can be used in developing Arabic Question Answering systems.
REFERENCES
[1] E. M. Voorhees, "Overview of the TREC 2001 question answering track," Proceedings of the 10th Text Retrieval Conference, pp. 42–52, 2001.
[2] A. M. N. Allam and M. H. Haggag, "The question answering systems: A survey," International Journal of Research and Reviews in Information Sciences (IJRRIS), vol. 2, no. 3, pp. 211-221, 2012.
[3] V. Punyakanok, D. Roth, and W.-t. Yih, "Natural language inference via dependency tree mapping: An application to question answering," Computational Linguistics, vol. 6, no. 9, 2004.
[4] Antonio, Claudia, Manuel and Luis, "Using Machine Learning and Text Mining in Question Answering," Evaluation of Multilingual and Multi-modal Information Retrieval, Lecture Notes in Computer Science, vol. 4730, pp. 415-423, 2007.
[5] Oleksander and Marie-Francine, "A survey on question answering technology from an information retrieval perspective," Information Sciences 181, pp. 5412-5434, 2011.
[6] H. M. Al Chalabi, "Question Classification for Arabic Question Answering System," International Conference on Information and Communication Technology Research (ICTRC), pp. 310–313, 2015.
[7] NooJ website: http://www.nooj4nlp.net, last visited September 2016.
[8] Ali Muttalib, Lailatul Qadri, "Question Classification using Support Vector Machine and Pattern Matching," Journal of Theoretical and Applied Information Technology, E-ISSN: 1817-3195, 2016.
[9] Heba Abdelnasser, Reham Mohammed, "Al-Bayan: An Arabic Question Answering System for the Holy Quran," Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pp. 57–64, 2014.
[10] A. McCallum and K. Nigam, "A Comparison of Event Models for Naive Bayes Text Classification," in Proceedings of the AAAI/ICML-98 Workshop on Learning for Text Categorization, AAAI Press, pp. 41-48, 1998.
[11] Borhan Samei, Haiying Li, Fazel Keshtkar, Vasile Rus, and Arthur C. Graesser, "Context-based speech act classification in intelligent tutoring systems," in Intelligent Tutoring Systems, Springer International Publishing, pp. 236-241, 2014.
[12] Christina Unger, Corina Forascu, Vanessa Lopez, Axel-Cyrille Ngonga, Elena Cabrio, Philipp Cimiano, Sebastian Walter, "Question Answering over Linked Data (QALD-4)," in Working Notes for CLEF 2014 Conference, volume 1180 of CEUR Workshop Proceedings, pp. 1172–1180, 2014.

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Classification of Arabic Questions Using Multinomial Naive Bayes and Support Vector Machines

  • 1. CLASSIFICATION OF ARABIC QUESTIONS USING MULTINOMIAL NAIVE BAYES AND SUPPORT VECTOR MACHINES
Waheeb Ahmed (1) and Babu Anto P (2)
(1) Department of Information Technology, Kannur University, Kannur, Kerala, India
(2) Department of Information Technology, Kannur University, Kannur, Kerala, India
International Journal of Latest Trends in Engineering and Technology, Special Issue SACAIM 2016, pp. 82-86, e-ISSN: 2278-621X

Abstract - Question classification plays a very important role in Question Answering systems. It assigns a label to a question depending on the type of the question. This label is used by the Answer Extraction module to extract the correct answer. Because natural language questions vary widely, classifying them is hard and challenging. Very limited research has been done on classifying Arabic questions using machine learning techniques. In this paper, we used Support Vector Machines (SVM) and Multinomial Naive Bayes (MNB) to classify questions. The question types classified include Who, What, Where, When, How many, How much, How and Why. The labels given to these questions are, respectively, Person, Entity/Definition, Location, Time/Date, Number/Count, Quantity, Manner and Reason. SVM produced more accurate results than MNB. The training set consisted of 300 questions from the Arabic Wikipedia. Both SVM and MNB achieved a precision of 1. The F1 measure achieved is .97 for SVM and .95 for MNB, which is a promising result.

Keywords – Question Classification, Question Answering, Machine Learning.

I. INTRODUCTION
Question Answering (QA) is a computer science discipline that uses Information Retrieval (IR) and Natural Language Processing (NLP) techniques to answer questions posed by humans [1]. Question Answering systems cover two kinds of domain: open and closed. Open-domain QA deals with questions on any topic, whereas closed-domain QA deals with questions related to a specific domain (Quran, medical applications, biology, etc.) [2]. Our work focuses on a closed domain, namely the Arabic Wikipedia. Classifying the questions posed by users of a question answering system is considered a very challenging problem [3]. Question classification is a crucial step in QA because it helps anticipate the type of answer, which narrows the search space for finding the correct answer. The purpose is to concentrate the answer extraction task only on those text segments related to the expected answer type identified by the question classification module [4].

There are two approaches to classifying questions: rule-based and machine learning. Recently, supervised machine learning techniques have been adopted, which train a classifier from manually annotated examples (questions along with their corresponding answer types). Creating a training and testing set is time-consuming, but no rule-writing skills are required [5]. Hence, we used the machine learning approach, training a classifier on a set of questions derived from the Arabic Wikipedia.
  • 2. II. RELATED WORK
Al Chalabi [6] proposed question classification methods for Arabic questions using regular expressions and context-free grammars. They used the Nooj platform [7] to write regular expressions and used linguistic patterns to identify the type of expected answer. Ali [8] proposed a question classifier based on support vector machines. Their classifier handles only three types of questions, namely "Who", "Where" and "What". They used 1-gram, 2-gram and 3-gram features with TF weighting, and reported that the 2-gram feature produced the best classification, with an F1 measure of 87.25%. Abdelnasser [9] used an SVM classifier to classify Quranic questions and obtained an overall accuracy of 77.2%. Their data set consisted of 230 questions from the Quranic domain.

III. PROPOSED METHODOLOGY
SVM is a machine learning technique that has proved to be an efficient classifier for text categorization. MNB is an advanced version of Naive Bayes designed for classifying text documents. It uses the counts of words in a document rather than only their presence or absence, as traditional Naive Bayes does [10]. We use Support Vector Machines and Multinomial Naive Bayes to classify a given question according to the training data that we built. The training data set consists of 300 questions derived from the Arabic Wikipedia. The testing data set consists of 200 questions translated from the Text Retrieval Conference (TREC 10) [1]. We used 1-gram and 2-gram features while training the classifiers.

[Figure 1. Question Classification Using SVM and MNB. Blocks shown: Arabic Wikipedia Questions for Training and Testing; Natural Language Question; Normalization; SVM/MNB Classifiers; Question Class (Label); Accuracy and Evaluation.]

A natural language question is given by the user. The Arabic diacritics (vowel marks) are removed and the normalized text is passed to the classifiers. The classified question is given a label (the labels are provided in Table 1 in the next section). The accuracy and evaluation stage is used to evaluate the performance of the classifiers.
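The steps described above (diacritic removal, 1-gram and 2-gram features, Multinomial Naive Bayes) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the paper does not name its toolkit, smoothing method, or tokenizer, so whitespace tokenization, add-one smoothing, the class name `TinyMultinomialNB`, and the six toy training questions are all hypothetical. The SVM classifier is not sketched here.

```python
import math
import re
from collections import Counter, defaultdict

# Arabic short-vowel marks (U+064B to U+0652) plus the tatweel stretch character.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(question):
    """Strip diacritics and collapse runs of whitespace."""
    return " ".join(_DIACRITICS.sub("", question).split())


def ngram_features(question):
    """Return the 1-gram and 2-gram tokens of the normalized question."""
    tokens = normalize(question).split()
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class TinyMultinomialNB:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for question, label in zip(questions, labels):
            self.feature_counts[label].update(ngram_features(question))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, question):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for f in ngram_features(question):
                if f in self.vocab:  # features unseen in training are skipped
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set, not the paper's 300 questions.
TOY_TRAINING = [
    ("من هو رئيس الولايات المتحدة؟", "Person"),
    ("من هي مؤلفة الكتاب؟", "Person"),
    ("أين تقع لندن؟", "Location"),
    ("أين يقع نهر النيل؟", "Location"),
    ("متى استقلت تونس؟", "Time/Date"),
    ("لماذا تغني الطيور؟", "Reason"),
]
model = TinyMultinomialNB().fit(*zip(*TOY_TRAINING))
print(model.predict("مَنْ هو مخترع الهاتف؟"))
```

Because the diacritics are stripped before feature extraction, a vocalized question and its unvocalized form produce the same features, which is the point of the normalization stage.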
  • 3. IV. QUESTION TYPE TAXONOMIES
Questions are classified into different types: من (Who), متى (When), أين (Where), ما هو / ما هي (What), كم عدد (How many), كم كمية (How much), كيف (How), لماذا (Why).

Table 1. Classes of Questions
Question type | Expected answer type (LABEL) | Examples
(who) من | (Person) شخص | man-hua من هو / man-hya من هي: Questions that start with Who (من) ask for a person's name, so the class label given to this question is Person (شخص) and the expected answer is a person's name. e.g. Who is the president of the United States? من هو رئيس الولايات المتحدة؟
(where) أين | (Location) مكان | ayin اين: This question has the meaning of "Where". It looks for an answer of type Location. Location is further divided into four subclasses: City, State, Country, and Other. e.g. Where is London? The main class for this question is Location and the subclass is City.
(when) متى | (Time) زمان | mata متى: This kind of question asks for a time or date. The main class is Number (رقم) and the subclass is Time (وقت) or Date (تاريخ). e.g. When did Tunisia gain independence? متى استقلت تونس؟
(how much) كم | (Quantity) كمية | Kam-kamyat كم كمية: This question asks for a quantity. e.g. How much blood is in the human body? كم كمية الدم في جسم الانسان؟
(how many) كم | (Count) عدد | Kam-Aded كم عدد: This is equivalent to "How many". The main class for this question is Number and the subclass is Count. e.g. How many continents are there? كم عدد القارات؟
(what) ما هو / ما هي | (Thing) شيء | This question asks for an entity. e.g. What is the color of the sun? ما هو لون الشمس؟ In this case, the class of the question is Entity.
(what) ما هو / ما هي | (Definition) تعريف | ma-hua ما هو / ma-hya ما هي: It asks for a definition, like "What is/are" in English. The class of this question is Definition (تعريف). e.g. What is a computer? ما هو الكمبيوتر؟
(how) كيف | (Manner) الوسيلة | kaif كيف: This question asks for the manner (How). It is given the label Manner. e.g. How can water be converted from the liquid state to the gaseous state? كيف يمكن تحويل الماء من الحالة السائلة الى الحالة الغازية؟
(why) لماذا | (Reason) المبرر | limatha لماذا: This question asks for a reason, so it is given the label Reason. e.g. Why do birds sing? لماذا تغني الطيور؟
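The taxonomy above amounts to a mapping from the leading interrogative to an expected answer type. The sketch below writes that mapping down as a lookup, purely to make the table concrete. It is not the authors' classifier, which is learned from annotated questions; the names `TAXONOMY` and `expected_answer_type` are hypothetical. Note that the interrogative alone cannot separate Entity from Definition for "what" questions, so the lookup returns a combined label for them.

```python
# Leading interrogative -> expected answer type, following the taxonomy table.
TAXONOMY = [
    ("كم عدد", "Count"),
    ("كم كمية", "Quantity"),
    ("ما هو", "Entity/Definition"),
    ("ما هي", "Entity/Definition"),
    ("من", "Person"),
    ("أين", "Location"),
    ("متى", "Time/Date"),
    ("كيف", "Manner"),
    ("لماذا", "Reason"),
]


def expected_answer_type(question):
    """Return the label whose interrogative starts the question, else None."""
    text = " ".join(question.split())  # collapse whitespace
    for interrogative, label in TAXONOMY:
        if text == interrogative or text.startswith(interrogative + " "):
            return label
    return None


print(expected_answer_type("كم عدد القارات؟"))
```

A lookup like this is the rule-based alternative mentioned in the introduction; it fails on any question whose interrogative is not at the start or is spelled differently, which is where a trained classifier is expected to help.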
  • 4. V. PERFORMANCE EVALUATION OF QUESTION CLASSIFIERS
To measure the performance of a question classifier we use the precision and recall of the system. Precision (P) is defined as the number of true positives (TP) over the number of true positives plus the number of false positives (FP).

$$P = \frac{TP}{TP + FP}$$

Recall (R) is defined as the number of true positives (TP) over the number of true positives plus the number of false negatives (FN).

$$R = \frac{TP}{TP + FN}$$

Here True Positive (TP) is the set of questions that are correctly assigned to the class, False Positive (FP) is the set of questions that are incorrectly assigned to the class, False Negative (FN) is the set of questions that are incorrectly not assigned to the class, and True Negative (TN) is the set of questions that are correctly not assigned to the class. Precision and recall are combined in the F1 score, which is defined as the harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{P \times R}{P + R}$$

The 200 questions used to test MNB and SVM were classified with a precision of 1.

Table 2. Performance Evaluation for MNB
Question type | Precision | Recall | F1-measure
(who) من | 1 | .96 | .97
(where) أين | 1 | .94 | .96
(when) متى | 1 | .90 | .94
(how much) كم كمية | 1 | .89 | .94
(how many) كم عدد | 1 | .93 | .96
(what) ما هو / ما هي | 1 | .87 | .93
(how) كيف | 1 | .91 | .95
(why) لماذا | 1 | .88 | .93
AVG | 1 | .91 | .95

Table 2 shows the precision, recall and F1-measure obtained by MNB for the listed question types. The average precision obtained by MNB is 1, the recall is .91 and the F1-measure is .95.

Table 3. Performance Evaluation for SVM
Question type | Precision | Recall | F1-measure
(who) من | 1 | .98 | .99
(where) أين | 1 | .95 | .97
(when) متى | 1 | .96 | .98
(how much) كم كمية | 1 | .96 | .98
(how many) كم عدد | 1 | .93 | .96
(what) ما هو / ما هي | 1 | .92 | .95
(how) كيف | 1 | .90 | .94
(why) لماذا | 1 | .92 | .95
AVG | 1 | .94 | .97

  • 5. Table 3 shows the precision, recall and F1-measure obtained by SVM for the listed question types. The average precision obtained by SVM is 1, the recall is .94 and the F1-measure is .97. These results compare favourably with some recent research on question answering for English, which reports systems with recall 0.63 and precision 0.7 [11] and recall 0.73 and precision 0.73 [12]. Hence, the results show the effectiveness of Support Vector Machines and Multinomial Naive Bayes in classifying the questions.

VI. CONCLUSION
In this paper, we proposed a question classification method for Arabic questions using SVM and MNB. We trained the classifiers on 300 questions derived from the Arabic Wikipedia and tested them on a set of 200 questions translated from TREC. The results are very promising and can be used in developing Arabic Question Answering systems.

REFERENCES
[1] E. M. Voorhees, "Overview of the TREC 2001 question answering track," Proceedings of the 10th Text Retrieval Conference, pp. 42-52, 2001.
[2] A. M. N. Allam and M. H. Haggag, "The question answering systems: A survey," International Journal of Research and Reviews in Information Sciences (IJRRIS), vol. 2, no. 3, pp. 211-221, 2012.
[3] V. Punyakanok, D. Roth, and W. Yih, "Natural language inference via dependency tree mapping: An application to question answering," Computational Linguistics, vol. 6, no. 9, 2004.
[4] Antonio, Claudia, Manuel and Luis, "Using Machine Learning and Text Mining in Question Answering," Evaluation of Multilingual and Multi-modal Information Retrieval, Lecture Notes in Computer Science, vol. 4730, pp. 415-423, 2007.
[5] Oleksander and Marie-Francine, "A survey on question answering technology from an information retrieval perspective," Information Sciences 181, pp. 5412-5434, 2011.
[6] H. M. Al Chalabi, "Question Classification for Arabic Question Answering System," International Conference on Information and Communication Technology Research (ICTRC), pp. 310-313, 2015.
[7] Nooj website: http://www.nooj4nlp.net, last visited September 2016.
[8] Ali Muttalib and Lailatul Qadri, "Question Classification using Support Vector Machine and Pattern Matching," Journal of Theoretical and Applied Information Technology, E-ISSN: 1817-3195, 2016.
[9] Heba Abdelnasser, Reham Mohammed, "Al-Bayan: An Arabic Question Answering System for the Holy Quran," Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pp. 57-64, 2014.
[10] A. McCallum and K. Nigam, "A Comparison of Event Models for Naive Bayes Text Classification," Proceedings of the AAAI/ICML-98 Workshop on Learning for Text Categorization, AAAI Press, pp. 41-48, 1998.
[11] Borhan Samei, Haiying Li, Fazel Keshtkar, Vasile Rus, and Arthur C. Graesser, "Context-based speech act classification in intelligent tutoring systems," Intelligent Tutoring Systems, Springer International Publishing, pp. 236-241, 2014.
[12] Christina Unger, Corina Forascu, Vanessa Lopez, Axel-Cyrille Ngonga, Elena Cabrio, Philipp Cimiano, Sebastian Walter, "Question Answering over Linked Data (QALD-4)," Working Notes for CLEF 2014 Conference, CEUR Workshop Proceedings, vol. 1180, pp. 1172-1180, 2014.
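As a supplementary check on the evaluation in Section V, the reported average F1 values can be recomputed from the reported average precision and recall using the definitions given there. The sketch below is illustrative only: the function names are hypothetical, and the raw counts in the first example (91 true positives, 9 false negatives) are invented to show the arithmetic, since the paper reports rates rather than counts.

```python
def precision(tp, fp):
    """True positives over everything assigned to the class."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives over everything that belongs to the class."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts for a single class: no false positives, 9 misses.
p = precision(tp=91, fp=0)
r = recall(tp=91, fn=9)
print(p, r, round(f1_score(p, r), 2))

# Average precision and recall as reported for MNB and SVM.
mnb_f1 = round(f1_score(1.0, 0.91), 2)
svm_f1 = round(f1_score(1.0, 0.94), 2)
print(mnb_f1, svm_f1)
```

With precision fixed at 1, F1 depends only on recall, so the gap between the two classifiers comes entirely from the questions each one failed to assign to the correct class.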