SlideShare une entreprise Scribd logo
1  sur  7
Télécharger pour lire hors ligne
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
10.5121/ijnlc.2014.3311 113
VERB BASED MANIPURI SENTIMENT ANALYSIS
Kishorjit Nongmeikapam1
, Dilipkumar Khangembam1
, Wangkheimayum
Hemkumar1
, Shinghajit Khuraijam2
and Sivaji Bandyopadhyay3
1
Department of Computer Engineering, Manipur Institute of Technology, Manipur
University, Imphal, India
2
Senior Software Enggineer at Accenture, India
3
Department of Computer Engineering, Jadav University, Kolkata, India
ABSTRACT
This paper deals about the sentiment analysis of the Manipuri article. The language is very highly
agglutinative in Nature. The document files are the letters to the editor of few local daily newspapers. The
text is processed for Part of Speech (POS) tagging using Conditional Random Field (CRF). The lexicon of
verbs is modified with the sentiment polarity (Positive or Negative or Neutral) manually. With the POS
tagger the verbs of each sentence are identified and the modified lexicon of verbs is used to notify the
polarity of the sentiment in the sentence. The total number of polarity for each category that is positive,
negative and neutral is counted separately. The highest total of the three is the deciding factor of the
sentiment polarity of the document. The system shows a recall of 72.10%, a precision of 78.14% and a F-
measure of 75.00%.
KEYWORDS
CRF, POS, Sentiment, Polarity, Manipuri
1. INTRODUCTION
This document describes, and is written to conform to, author guidelines for the journals of
AIRCC series. It is prepared in Microsoft Word as a .doc document. Although other means of
preparation are acceptable, final, camera-ready versions must conform to this layout. Microsoft
Word terminology is used where appropriate in this document. Although formatting instructions
may often appear daunting, the simplest approach is to use this template and insert headings and
text into it as appropriate.
2. RELATED WORKS
The work of sentiment analysis in Manipuri is not reported up to the best of the author’s
knowledge. Works on emotion identification is reported in [2] and [3]. For Indian languages
works on SentiWordNet and subjectivity detection is reported in [4] and [5]. CRF based work on
sentiment analysis is reported in [6].
3. CONCEPT OF CONDITION RANDOM FIELD
The concept of Conditional Random Field [7] is developed in order to calculate the conditional
probabilities of values on other designated input nodes of undirected graphical models. CRF
encodes a conditional probability distribution with a given set of features. It is an unsupervised
approach where the system learns by giving some training and can be used for testing other texts.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
114
The conditional probability of a state sequence X=(x1, x2,..xT) given an observation sequence
Y=(y1, y2,..yT) is calculated as :
P(Y|X) = t))X,,y,y(fexp(
1
t1-t
T
1t
k
k
k
XZ
∑∑=
λ ---(1)
where, fk( yt-1,yt, X, t) is a feature function whose weight λk is a learnt weight associated with
fkand to be learned via training. The values of the feature functions may range between -∞ … +∞,
but typically they are binary. ZX is the normalization factor:
∑∑∑ =
=
T
t k
kk
y
XZ
1
t1-t t))X,,y,y(fexp λ ---(2)
which is calculated in order to make the probability of all state sequences sum to 1. This is
calculated as in Hidden Markov Model (HMM) and can be obtained efficiently by dynamic
programming. Since CRF defines the conditional probability P(Y|X), the appropriate objective
for parameter learning is to maximize the conditional likelihood of the state sequence or training
data.
∑=
N
1i
)x|P(ylog ii ---(3)
where, {(xi
, yi
)} is the labeled training data.
Gaussian prior on the λ’s is used to regularize the training (i.e., smoothing). If λ ~ N(0,ρ2
), the
objective function becomes,
∑∑ −
= k
i
2
2N
1i 2
)x|P(ylog ii
ρ
λ
---(4)
The objective function is concave, so the λ’shave a unique set of optimal values.
4. SYSTEM DESIGN
Fig. 1 gives the brief block diagram of the system. The first step of system works with identifying
of verbs after running the Part of Speech (POS) tagger using CRF. This is because the sentiment
of the sentence is highly dependent on the verbs. The POS tagging of Manipuri Text document
follows the CRF based POS tagging as mention in [8]. In order to check the polarity of the
sentiment a lexicon of verbs with manual annotation is used.
Once the verbs are identified the identified verbs have a look up to the lexicon table to identify
the polarity of the sentiment. Accordingly, the polarity is notified for each the verb.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
115
Figure 1. System block diagram
The number for each of the polarity is counted in the text document. The sentiment decider of the
letter to the editors of the local daily newspapers is decided with the highest number of the
polarity among the three.
A. POS tagging using CRF
As mention above the POS tagging of Manipuri text document follows the feature selection based
on [8]. Fig. 2 shows the setup of the CRF based POS tagging in the Manipuri text.
Figure 2. CRF based POS tagging
The working of CRF is mainly based on the feature selection. The feature listed for the POS
tagging is as follows:
F= {Wi-m, … ,W i-1, Wi, W i+1, …, W i+n, SWi-m, …, SWi-1, SWi, SWi+1,… , SWi-n , number of
acceptable standard suffixes, number of acceptable standard prefixes, acceptable suffixes
present in the word, acceptable prefixes present in the word, word length, word frequency,
digit feature, symbol feature, RMWE}
The details of the set of features that have been applied for POS tagging in Manipuri are as
follows:
TEXT FILE
IDENTIFY
VERBS
CHECK
COUNT THE TOTAL
NUMBER OF POSITIVE OR
NEGATIVE OR NEUTRAL
VERBS POLARITY
SENTIMENT
CRF POS
TAGGER
VERB
SENTIMENT
Evaluation Results
Pre-processing
Documents Collection
Data Test
Labelling
Features Extraction
Data Training
CRF Model
FeaturesExtraction
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
116
1. Surrounding words as feature: Preceeding word(s) or the successive word(s) are important in
POS tagging because these words play an important role in determining the POS of the present
word.
2. Surrounding Stem words as feature: The Stemming algorithm mentioned in [9] is used. The
preceding and the following stemmed words of a particular word can be used as features. It is
because the preceding and the following words influence the present word POS tagging.
3. Number of acceptable standard suffixes as feature: As mention in [9], Manipuri being an
agglutinative language the suffixes plays an important in determining the POS of a word. For
every word the number of suffixes are identified during stemming and the number of suffixes is
used as a feature.
4. Number of acceptable standard prefixes as feature: Prefixes plays an important role for
Manipuri language. Prefixes are identified during stemming and the prefixes are used as a feature.
5. Acceptable suffixes present as feature: The standard 61 suffixes of Manipuri which are
identified is used as one feature. The maximum number of appended suffixes is reported as ten.
So taking into account of such cases, for every word ten columns separated by a space are created
for every suffix present in the word. A “0” notation is being used in those columns when the word
consists of no acceptable suffixes.
6. Acceptable prefixes present as feature: 11 prefixes have been manually identified in
Manipuri and the list of prefixes is used as one feature. For every word if the prefix is present
then a column is created mentioning the prefix, otherwise the “0” notation is used.
7. Length of the word: Length of the word is set to 1 if it is greater than 3 otherwise, it is set to
0. Very short words are generally pronouns and rarely proper nouns.
8. Word frequency: A range of frequency for words in the training corpus is set: those words
with frequency <100 occurrences are set the value 0, those words which occurs >=100 are set to
1. It is considered as one feature since occurrence of determiners, conjunctions and pronouns are
abundant.
9. Digit features: Quantity measurement, date and monetary values are generally digits. Thus the
digit feature is an important feature. A binary notation of ‘1’ is used if the word consist of a digit
else ‘0’.
10. Symbol feature:Symbols like $,% etc. are meaningful in textual use, so the feature is set to 1
if it is found in the token, otherwise 0. This helps to recognize Symbols and Quantifier number
tags.
11. Reduplicated Multiword Expression (RMWE):(RMWE) are also considered as a feature
since Manipuri is rich of RMWE. The work of RMWE is mention in [11].
5. SYSTEM DESIGN
The experiment starts with the collection and cleaning of letter to the editors from the leading
local daily newspapers. The collection is of 550 letters with an average of 500 words. In total
there are about 2,75,000 words.
The lexicon of verbs is collected from [10] and the polarities of each word are manually added.
By polarity it means positive or negative or neutral. For positive polarity the expert marked with
PV, for the negative polarity the expert marked with NG and for neutral the expert marked with
NU. Sample example is shown in the Fig. 3.
The text document is run for POS tagging using CRF.The C++ based CRF++ 0.53 package1
is
used in this work and it is readily available as open source for segmenting or labeling sequential
data. After this process each work is marked with the POS tag. The words which are tag with
1
http://crfpp.sourceforge.net/
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
117
verbs are identified with a simple algorithm. These identified verbs then look up to the index of
the modified lexicon of verbs. By modified verbs it means the polarity tag verbs.
The total number of negative polarity verbs is counted. Likewise the numbers of positive and
neutral polarity verbs are also counted.
………….
………….
NG
NG
NG
NG
NG
NG
………
. PV
NU
………….
………….
Figure 3. Sample example of Modified Lexicon Verb List
For each of the letter to the editor document the evaluation is done with the parameter of Recall,
Precision and F-score as follows:
Recall, R =
texttheinanscorrectofNo
systemthebygivenanscorrectofNo
Precision, P =
systemthebygivenansofNo
systemthebygivenanscorrectofNo
F-score, F =
RRRRPPPP2222ββββ
1)PR1)PR1)PR1)PR2222((((ββββ
+
+
Where ββββ is one, precision and recall are given equal weight.
The system shows an averagerecall of 72.10%, a precision of 78.14% and a F-measure of 75.00%
which is also listed in Table I.
TABLE I. AVERAGE RESULT
6. CONCLUSIONS
The work on sentiment analysis of Manipuri has to go miles. This work is just the beginning of
this highly agglutinative Indian Schedule language. More methods and algorithms are to be search
and implemented in order to improve the accuracy. The works can also be extended with
improvement to other domains of blog, articles, twits, feedbacks, SMS etc.
Model Average Recall Average Precision Average
F-Score
CRF 72.10 78.14 75.00
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
118
REFERENCES
[1] B. Pang and L. Lee. Opinion mining and sentiment analysis.Foundations and Trends in
Information Retrieval 2(1-2):1–135, 2008.
[2] Esuli, A and Sebastiani, F. 2006. SENTIWORDNET: A Publicly Available Lexical Resource for
Opinion Mining. LREC-06
[3] Das, D. and Bandyopadhyay, S. 2010. Identifying EmotionTopic - An Unsupervised Hybrid
Approach with RhetoricalStructure and Heuristic Classifier. In proceedings of the 6thIEEE-
NLPKE, doi: 10.1109/NLPKE.2010.5587777
[4] A. Das and S. Bandyopadhay, “SentiWordNet for Indian languages,” Asian Federation for Natural
Language Processing, China, pp. 56–63, August 2010.
[5] A. Das and S. Bandyopadhay, “Subjectivity Detection in English and Bengali: A CRF-based
Approach,” In Proceedings of the 7th International Conference on Natural Language Processing,
Macmillan 2009.
[6] J. Zhao, K. Liu and G. Wang, “Adding Redundant Features for CRFs- based Sentence Sentiment
Classication” In Proceedings of the Conference on Empirical Methods in Natural Language
Processing, pp.117-126, 2008
[7] Lafferty, J., McCallum, A. and Pereira, F. 2001. Conditional Random Fields: Probabilistic Models
for Segmenting and Labeling Sequence Data. In: Proc. of the 18th International Conference on
Machine Learning (ICML01).Williamstown, MA, USA. pp. 282-289
[8] Kishorjit, N. and Sivaji, B., “A Transliteration of CRF Based Manipuri POS Tagging”, In the
Proceedings of 2nd International Conference on Communication, Computing & Security (ICCCS-
2012), Elsevier Ltd, 2012
[9] Kishorjit, N., Bishworjit, S., Romina, M., Mayekleima Chanu, Ng. & Sivaji, B., (2011) A Light
Weight Manipuri Stemmer, In the Proceedings of Natioanal Conference on Indian Language
Computing (NCILC), Chochin, India
[10] NingombaM. S., “A Dictionary of Manipuri Verbs”, MALADES, Imphaal, 2010
[11] Kishorjit Nongmeikapam, Nonglenjaoba L., Nirmal Y. & Sivaji Bandhyopadhyay, Reduplicated
MWE (RMWE) Helps in Improving the CRF Based Manipuri POS Tagger, International Journal
of Information Technology Convergence and Services (IJITCS) Vol.2, No.1, DOI :
10.5121/ijitcs.2012.2106, 2012, p.45-59.
Authors
Kishorjit Nongmeikapam is working as Asst. Professor at Department of Computer Science
and Engineering, MIT, Manipur University, India. He has completed his BE from PSG
college of Tech., Coimbatore and has completed his ME from Jadavpur University, Kolkata,
India. He is presently doing research in the area of Multiword Expression and its
applications. He has so far published 30 papers and presently handling a Transliteration
project funded by DST, Govt. of Manipur, India. He is the author of the Book, “See the C Programming
Language”.
Dilipkumar Khangembam is presently a student of Manipur Institute Of Technology. He is
pursuing his B.E. in Dept. of Computer Science and Engineering. His area of interest is NLP.
International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
119
Wangkheimayum Hemkumar is presently a student of Manipur Institute Of Technology. He
is pursuing his B.E. in Dept. of Computer Science and Engineering. His area of interest is
NLP.
Professor Sivaji Bandyopadhyay is working as a Professor since 2001 in the Computer
Science and Engineering Department at Jadavpur University, Kolkata, India. His research
interests include machine translation, sentiment analysis, textual entailment, question
answering systems and information retrieval among others. He is currently supervising six
national and international level projects in various areas of language technology. He has
published a large number of journal and conference publications.
Shinghajit Khuraijam is presently working as Senior Software Enggineer at Accenture. He has
Industry experience of 5 yrs, expert in Portal development and User interface. He pursued his
B.E in Electronics Engg from Mumbai University and M.tech in Information Technology from
CEG, Anna University, Chennai

Contenu connexe

Tendances

NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELNERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
ijnlc
 

Tendances (17)

An Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity RemovalAn Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity Removal
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemming
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONAN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
 
Evaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsEvaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutions
 
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELNERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
 
A Review on a web based Punjabi t o English Machine Transliteration System
A Review on a web based Punjabi t o English Machine Transliteration SystemA Review on a web based Punjabi t o English Machine Transliteration System
A Review on a web based Punjabi t o English Machine Transliteration System
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
 
P-6
P-6P-6
P-6
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Model
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 

En vedette

Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...
ijnlc
 
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
S ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAlS ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAl
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
ijnlc
 

En vedette (19)

Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...Smart grammar a dynamic spoken language understanding grammar for inflective ...
Smart grammar a dynamic spoken language understanding grammar for inflective ...
 
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSINGCLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
CLUSTERING WEB SEARCH RESULTS FOR EFFECTIVE ARABIC LANGUAGE BROWSING
 
Hybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic textHybrid part of-speech tagger for non-vocalized arabic text
Hybrid part of-speech tagger for non-vocalized arabic text
 
Building a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemBuilding a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl system
 
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabic
 
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATIONHANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
HANDLING UNKNOWN WORDS IN NAMED ENTITY RECOGNITION USING TRANSLITERATION
 
Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rules
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
 
Evaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymyEvaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymy
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...
 
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
S ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAlS ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAl
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
Document excel
Document excelDocument excel
Document excel
 
C. Porchietto Torino Cronaca Qui 30.05.09
C. Porchietto Torino Cronaca Qui 30.05.09C. Porchietto Torino Cronaca Qui 30.05.09
C. Porchietto Torino Cronaca Qui 30.05.09
 

Similaire à Verb based manipuri sentiment analysis

Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
kevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
kevig
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
Algoscale Technologies Inc.
 
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmmUnit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
DhruvKushwaha12
 
Benchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise applicationBenchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise application
Conference Papers
 

Similaire à Verb based manipuri sentiment analysis (20)

Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
 
A survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesA survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languages
 
P-6
P-6P-6
P-6
 
ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text
 
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmmUnit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantization
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpus
 
07 04-06
07 04-0607 04-06
07 04-06
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
 
Benchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise applicationBenchmarking nlp toolkits for enterprise application
Benchmarking nlp toolkits for enterprise application
 

Dernier

Dernier (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Verb based manipuri sentiment analysis

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 10.5121/ijnlc.2014.3311 113 VERB BASED MANIPURI SENTIMENT ANALYSIS Kishorjit Nongmeikapam1 , Dilipkumar Khangembam1 , Wangkheimayum Hemkumar1 , Shinghajit Khuraijam2 and Sivaji Bandyopadhyay3 1 Department of Computer Engineering, Manipur Institute of Technology, Manipur University, Imphal, India 2 Senior Software Enggineer at Accenture, India 3 Department of Computer Engineering, Jadav University, Kolkata, India ABSTRACT This paper deals about the sentiment analysis of the Manipuri article. The language is very highly agglutinative in Nature. The document files are the letters to the editor of few local daily newspapers. The text is processed for Part of Speech (POS) tagging using Conditional Random Field (CRF). The lexicon of verbs is modified with the sentiment polarity (Positive or Negative or Neutral) manually. With the POS tagger the verbs of each sentence are identified and the modified lexicon of verbs is used to notify the polarity of the sentiment in the sentence. The total number of polarity for each category that is positive, negative and neutral is counted separately. The highest total of the three is the deciding factor of the sentiment polarity of the document. The system shows a recall of 72.10%, a precision of 78.14% and a F- measure of 75.00%. KEYWORDS CRF, POS, Sentiment, Polarity, Manipuri 1. INTRODUCTION This document describes, and is written to conform to, author guidelines for the journals of AIRCC series. It is prepared in Microsoft Word as a .doc document. Although other means of preparation are acceptable, final, camera-ready versions must conform to this layout. Microsoft Word terminology is used where appropriate in this document. Although formatting instructions may often appear daunting, the simplest approach is to use this template and insert headings and text into it as appropriate. 2. RELATED WORKS The work of sentiment analysis in Manipuri is not reported up to the best of the author’s knowledge. Works on emotion identification is reported in [2] and [3]. For Indian languages works on SentiWordNet and subjectivity detection is reported in [4] and [5]. CRF based work on sentiment analysis is reported in [6]. 3. CONCEPT OF CONDITION RANDOM FIELD The concept of Conditional Random Field [7] is developed in order to calculate the conditional probabilities of values on other designated input nodes of undirected graphical models. CRF encodes a conditional probability distribution with a given set of features. It is an unsupervised approach where the system learns by giving some training and can be used for testing other texts.
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 114 The conditional probability of a state sequence X=(x1, x2,..xT) given an observation sequence Y=(y1, y2,..yT) is calculated as : P(Y|X) = t))X,,y,y(fexp( 1 t1-t T 1t k k k XZ ∑∑= λ ---(1) where, fk( yt-1,yt, X, t) is a feature function whose weight λk is a learnt weight associated with fkand to be learned via training. The values of the feature functions may range between -∞ … +∞, but typically they are binary. ZX is the normalization factor: ∑∑∑ = = T t k kk y XZ 1 t1-t t))X,,y,y(fexp λ ---(2) which is calculated in order to make the probability of all state sequences sum to 1. This is calculated as in Hidden Markov Model (HMM) and can be obtained efficiently by dynamic programming. Since CRF defines the conditional probability P(Y|X), the appropriate objective for parameter learning is to maximize the conditional likelihood of the state sequence or training data. ∑= N 1i )x|P(ylog ii ---(3) where, {(xi , yi )} is the labeled training data. Gaussian prior on the λ’s is used to regularize the training (i.e., smoothing). If λ ~ N(0,ρ2 ), the objective function becomes, ∑∑ − = k i 2 2N 1i 2 )x|P(ylog ii ρ λ ---(4) The objective function is concave, so the λ’shave a unique set of optimal values. 4. SYSTEM DESIGN Fig. 1 gives the brief block diagram of the system. The first step of system works with identifying of verbs after running the Part of Speech (POS) tagger using CRF. This is because the sentiment of the sentence is highly dependent on the verbs. The POS tagging of Manipuri Text document follows the CRF based POS tagging as mention in [8]. In order to check the polarity of the sentiment a lexicon of verbs with manual annotation is used. Once the verbs are identified the identified verbs have a look up to the lexicon table to identify the polarity of the sentiment. Accordingly, the polarity is notified for each the verb.
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 115 Figure 1. System block diagram The number for each of the polarity is counted in the text document. The sentiment decider of the letter to the editors of the local daily newspapers is decided with the highest number of the polarity among the three. A. POS tagging using CRF As mention above the POS tagging of Manipuri text document follows the feature selection based on [8]. Fig. 2 shows the setup of the CRF based POS tagging in the Manipuri text. Figure 2. CRF based POS tagging The working of CRF is mainly based on the feature selection. The feature listed for the POS tagging is as follows: F= {Wi-m, … ,W i-1, Wi, W i+1, …, W i+n, SWi-m, …, SWi-1, SWi, SWi+1,… , SWi-n , number of acceptable standard suffixes, number of acceptable standard prefixes, acceptable suffixes present in the word, acceptable prefixes present in the word, word length, word frequency, digit feature, symbol feature, RMWE} The details of the set of features that have been applied for POS tagging in Manipuri are as follows: TEXT FILE IDENTIFY VERBS CHECK COUNT THE TOTAL NUMBER OF POSITIVE OR NEGATIVE OR NEUTRAL VERBS POLARITY SENTIMENT CRF POS TAGGER VERB SENTIMENT Evaluation Results Pre-processing Documents Collection Data Test Labelling Features Extraction Data Training CRF Model FeaturesExtraction
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 116 1. Surrounding words as feature: Preceeding word(s) or the successive word(s) are important in POS tagging because these words play an important role in determining the POS of the present word. 2. Surrounding Stem words as feature: The Stemming algorithm mentioned in [9] is used. The preceding and the following stemmed words of a particular word can be used as features. It is because the preceding and the following words influence the present word POS tagging. 3. Number of acceptable standard suffixes as feature: As mention in [9], Manipuri being an agglutinative language the suffixes plays an important in determining the POS of a word. For every word the number of suffixes are identified during stemming and the number of suffixes is used as a feature. 4. Number of acceptable standard prefixes as feature: Prefixes plays an important role for Manipuri language. Prefixes are identified during stemming and the prefixes are used as a feature. 5. Acceptable suffixes present as feature: The standard 61 suffixes of Manipuri which are identified is used as one feature. The maximum number of appended suffixes is reported as ten. So taking into account of such cases, for every word ten columns separated by a space are created for every suffix present in the word. A “0” notation is being used in those columns when the word consists of no acceptable suffixes. 6. Acceptable prefixes present as feature: 11 prefixes have been manually identified in Manipuri and the list of prefixes is used as one feature. For every word if the prefix is present then a column is created mentioning the prefix, otherwise the “0” notation is used. 7. Length of the word: Length of the word is set to 1 if it is greater than 3 otherwise, it is set to 0. Very short words are generally pronouns and rarely proper nouns. 8. Word frequency: A range of frequency for words in the training corpus is set: those words with frequency <100 occurrences are set the value 0, those words which occurs >=100 are set to 1. It is considered as one feature since occurrence of determiners, conjunctions and pronouns are abundant. 9. Digit features: Quantity measurement, date and monetary values are generally digits. Thus the digit feature is an important feature. A binary notation of ‘1’ is used if the word consist of a digit else ‘0’. 10. Symbol feature:Symbols like $,% etc. are meaningful in textual use, so the feature is set to 1 if it is found in the token, otherwise 0. This helps to recognize Symbols and Quantifier number tags. 11. Reduplicated Multiword Expression (RMWE):(RMWE) are also considered as a feature since Manipuri is rich of RMWE. The work of RMWE is mention in [11]. 5. SYSTEM DESIGN The experiment starts with the collection and cleaning of letter to the editors from the leading local daily newspapers. The collection is of 550 letters with an average of 500 words. In total there are about 2,75,000 words. The lexicon of verbs is collected from [10] and the polarities of each word are manually added. By polarity it means positive or negative or neutral. For positive polarity the expert marked with PV, for the negative polarity the expert marked with NG and for neutral the expert marked with NU. Sample example is shown in the Fig. 3. The text document is run for POS tagging using CRF.The C++ based CRF++ 0.53 package1 is used in this work and it is readily available as open source for segmenting or labeling sequential data. After this process each work is marked with the POS tag. The words which are tag with 1 http://crfpp.sourceforge.net/
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 117 verbs are identified with a simple algorithm. These identified verbs then look up to the index of the modified lexicon of verbs. By modified verbs it means the polarity tag verbs. The total number of negative polarity verbs is counted. Likewise the numbers of positive and neutral polarity verbs are also counted. …………. …………. NG NG NG NG NG NG ……… . PV NU …………. …………. Figure 3. Sample example of Modified Lexicon Verb List For each of the letter to the editor document the evaluation is done with the parameter of Recall, Precision and F-score as follows: Recall, R = texttheinanscorrectofNo systemthebygivenanscorrectofNo Precision, P = systemthebygivenansofNo systemthebygivenanscorrectofNo F-score, F = RRRRPPPP2222ββββ 1)PR1)PR1)PR1)PR2222((((ββββ + + Where ββββ is one, precision and recall are given equal weight. The system shows an averagerecall of 72.10%, a precision of 78.14% and a F-measure of 75.00% which is also listed in Table I. TABLE I. AVERAGE RESULT 6. CONCLUSIONS The work on sentiment analysis of Manipuri has to go miles. This work is just the beginning of this highly agglutinative Indian Schedule language. More methods and algorithms are to be search and implemented in order to improve the accuracy. The works can also be extended with improvement to other domains of blog, articles, twits, feedbacks, SMS etc. Model Average Recall Average Precision Average F-Score CRF 72.10 78.14 75.00
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 118 REFERENCES [1] B. Pang and L. Lee. Opinion mining and sentiment analysis.Foundations and Trends in Information Retrieval 2(1-2):1–135, 2008. [2] Esuli, A and Sebastiani, F. 2006. SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining. LREC-06 [3] Das, D. and Bandyopadhyay, S. 2010. Identifying EmotionTopic - An Unsupervised Hybrid Approach with RhetoricalStructure and Heuristic Classifier. In proceedings of the 6thIEEE- NLPKE, doi: 10.1109/NLPKE.2010.5587777 [4] A. Das and S. Bandyopadhay, “SentiWordNet for Indian languages,” Asian Federation for Natural Language Processing, China, pp. 56–63, August 2010. [5] A. Das and S. Bandyopadhay, “Subjectivity Detection in English and Bengali: A CRF-based Approach,” In Proceedings of the 7th International Conference on Natural Language Processing, Macmillan 2009. [6] J. Zhao, K. Liu and G. Wang, “Adding Redundant Features for CRFs- based Sentence Sentiment Classication” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.117-126, 2008 [7] Lafferty, J., McCallum, A. and Pereira, F. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proc. of the 18th International Conference on Machine Learning (ICML01).Williamstown, MA, USA. pp. 282-289 [8] Kishorjit, N. and Sivaji, B., “A Transliteration of CRF Based Manipuri POS Tagging”, In the Proceedings of 2nd International Conference on Communication, Computing & Security (ICCCS- 2012), Elsevier Ltd, 2012 [9] Kishorjit, N., Bishworjit, S., Romina, M., Mayekleima Chanu, Ng. & Sivaji, B., (2011) A Light Weight Manipuri Stemmer, In the Proceedings of Natioanal Conference on Indian Language Computing (NCILC), Chochin, India [10] NingombaM. S., “A Dictionary of Manipuri Verbs”, MALADES, Imphaal, 2010 [11] Kishorjit Nongmeikapam, Nonglenjaoba L., Nirmal Y. & Sivaji Bandhyopadhyay, Reduplicated MWE (RMWE) Helps in Improving the CRF Based Manipuri POS Tagger, International Journal of Information Technology Convergence and Services (IJITCS) Vol.2, No.1, DOI : 10.5121/ijitcs.2012.2106, 2012, p.45-59. Authors Kishorjit Nongmeikapam is working as Asst. Professor at Department of Computer Science and Engineering, MIT, Manipur University, India. He has completed his BE from PSG college of Tech., Coimbatore and has completed his ME from Jadavpur University, Kolkata, India. He is presently doing research in the area of Multiword Expression and its applications. He has so far published 30 papers and presently handling a Transliteration project funded by DST, Govt. of Manipur, India. He is the author of the Book, “See the C Programming Language”. Dilipkumar Khangembam is presently a student of Manipur Institute Of Technology. He is pursuing his B.E. in Dept. of Computer Science and Engineering. His area of interest is NLP.
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014 119 Wangkheimayum Hemkumar is presently a student of Manipur Institute Of Technology. He is pursuing his B.E. in Dept. of Computer Science and Engineering. His area of interest is NLP. Professor Sivaji Bandyopadhyay is working as a Professor since 2001 in the Computer Science and Engineering Department at Jadavpur University, Kolkata, India. His research interests include machine translation, sentiment analysis, textual entailment, question answering systems and information retrieval among others. He is currently supervising six national and international level projects in various areas of language technology. He has published a large number of journal and conference publications. Shinghajit Khuraijam is presently working as Senior Software Enggineer at Accenture. He has Industry experience of 5 yrs, expert in Portal development and User interface. He pursued his B.E in Electronics Engg from Mumbai University and M.tech in Information Technology from CEG, Anna University, Chennai