Mrs. Gauri M. Dhopavkar
Ritikesh Bhaskarwar Vimal Shah
Ashwin Borkar Shashil Pohankar
Department of Computer Technology
YESHWANTRAO CHAVAN COLLEGE OF ENGINEERING,
(An Autonomous Institution Affiliated to Rashtrasant Tukadoji Maharaj Nagpur University)
Natural language processing
Natural language processing (NLP) is a
field of computer science, artificial
intelligence, and linguistics concerned
with the interactions between computers
and human (natural) languages.
Natural Language Processing (NLP) is the
computerized approach to analysing text
that is based on both a set of theories and
a set of technologies.
POS Tagging :
Part-of-Speech (POS) tagging is the
process of assigning a part-of-speech like
noun, verb, pronoun or other lexical class
marker to each word in a sentence.
After POS tags are identified, the next
step is chunking, which involves dividing
sentences into non-overlapping non-
ते फूल खूप
THE POS TAGGING EXAMPLE
Need of Marathi POS Tagging :
Lack of significant tools for Indian
Dependence of other NLP activities on
Failure of existing techniques on Indian
Methods for POS Tagging
1. Rule Based 2. Stochastic
The rule based POS tagging
models apply a set of hand
written rules and use
contextual information to
assign POS tags to words.
A stochastic approach uses
probability or statistics. The
approach finds out the most
frequently used tag for a
specific word in the
annotated training data and
uses this information to tag
that word in the
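The most-frequent-tag idea described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names, the `UNK` fallback tag, and the tiny English toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("book", "NOUN"), ("fell", "VERB")],
    [("a", "DET"), ("book", "NOUN"), ("helps", "VERB")],
    [("they", "PRON"), ("book", "VERB"), ("rooms", "NOUN")],
]

def train_most_frequent_tag(tagged_sentences):
    """Count how often each word carries each tag, then keep the top tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag_sentence(words, model, default_tag="UNK"):
    """Tag each word with its most frequent training tag (UNK if unseen)."""
    return [(word, model.get(word, default_tag)) for word in words]

model = train_most_frequent_tag(TOY_CORPUS)
print(tag_sentence(["the", "book", "xyz"], model))
```

In the toy corpus "book" is tagged NOUN twice and VERB once, so the model always tags it NOUN. A word never seen in training gets the fallback tag.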
Methods for POS Tagging
3. Hidden Markov Model 4. Maximum Entropy Model
The HMM model trains on
annotated corpora to find
out the transition and
The Maximum Entropy
Model (MEM) is based on
the principle of Maximum
Entropy, which states that
when choosing between a
number of different
probabilistic models for a
set of data, the most valid
model is the one which
makes fewest arbitrary
assumptions about the
nature of the data
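As a rough illustration of the HMM training step mentioned above, the sketch below estimates tag-to-tag transition probabilities from an annotated corpus by relative frequency. It also estimates word-given-tag (emission) probabilities, which is the usual companion table in an HMM tagger but is an assumption here, since the slide text is cut off. The start marker `<s>`, the function name, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker

# Hypothetical toy corpus of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
]

def normalize(table):
    """Turn nested counts into conditional probabilities per context."""
    probs = {}
    for context, counter in table.items():
        total = sum(counter.values())
        probs[context] = {key: count / total for key, count in counter.items()}
    return probs

def estimate_hmm(tagged_sentences):
    """Return (transition, emission) probabilities by relative frequency."""
    transition_counts = defaultdict(Counter)  # previous tag -> next tag
    emission_counts = defaultdict(Counter)    # tag -> word (assumed table)
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transition_counts[previous][tag] += 1
            emission_counts[tag][word] += 1
            previous = tag
    return normalize(transition_counts), normalize(emission_counts)

transitions, emissions = estimate_hmm(TOY_CORPUS)
print(transitions[START])
print(emissions["NOUN"])
```

A full tagger would combine these two tables with a decoding step such as the Viterbi algorithm, and would smooth the estimates so that unseen words and tag pairs do not get zero probability.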
Architecture and Design :
A Marathi sentence is taken as input, then
the tokens are created, followed by
tagging and finding ambiguity.
TOKENIZING -> TAGGING -> FINDING AMBIGUITY
Details of Identified Modules :
Tokenizer : This module is used to get the
tokens of the input sentence. It also calls
the other modules when required.
Tagging : This module is used for
assigning tags to tokens. It also
searches for ambiguous words, finds
their types and assigns special
symbols to them.
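The tokenizer and tagging modules described above might be wired together as follows. This is a sketch under stated assumptions: whitespace tokenization, a lexicon mapping each word to its set of possible tags, and `*` as the special symbol for ambiguous words are all hypothetical choices, and the English words stand in for Marathi data.

```python
# Hypothetical lexicon: word -> set of possible tags.
TOY_LEXICON = {
    "red": {"ADJ"},
    "book": {"NOUN", "VERB"},
}

def tokenize(sentence):
    """Split the input sentence into tokens on whitespace (assumed scheme)."""
    return sentence.split()

def tag_tokens(tokens, lexicon, ambiguity_marker="*"):
    """Attach candidate tags to each token and mark ambiguous ones.

    Returns (token, sorted candidate tags, marker) triples. The marker is
    non-empty only when the lexicon lists more than one tag for the token.
    """
    results = []
    for token in tokens:
        tags = sorted(lexicon.get(token, []))
        marker = ambiguity_marker if len(tags) > 1 else ""
        results.append((token, tags, marker))
    return results

tagged = tag_tokens(tokenize("red book xyz"), TOY_LEXICON)
for token, tags, marker in tagged:
    print(token, tags, marker)
```

Tokens flagged with the marker are the ones a later disambiguation step would need to resolve. Unknown tokens come back with an empty tag list.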
Details of Identified Modules (contd.)
Root word : This module is used for
finding the root word of each token
by looking it up in the Marathi WordNet.
Probability : This module calculates the
probability and accordingly assigns the tag,
according to the higher probability of
• Showing the results : This module shows
the result. The words are shown with
Experimentation and Results :
• 1000: If the first bit is 1, then we assign the noun tag to
the particular word.
• 1100: In this case, the word can be used as both
• 0100: If the second bit is 1, then we assign the
adjective tag to the particular word.
• 0110: In this case, the word can be used as other
• 0010: If the third bit is 1, then we assign the adverb tag
to the particular word.
• 0001: If the fourth bit is 1, then we assign the verb tag
to the particular word.
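The four-bit codes listed above can be decoded mechanically. The sketch below assumes the bit order given in the list (noun, adjective, adverb, verb) and treats any code with more than one bit set as ambiguous. Reading a multi-bit code as the union of its single-bit tags is an assumption, and the function names are hypothetical.

```python
# Bit positions, left to right, as listed in the slide.
BIT_TAGS = ("noun", "adjective", "adverb", "verb")

def decode_tag_code(code):
    """Return the tags whose bit is set in a 4-character code such as '1100'."""
    if len(code) != len(BIT_TAGS) or set(code) - {"0", "1"}:
        raise ValueError(f"expected a 4-bit string of 0s and 1s, got {code!r}")
    return [tag for bit, tag in zip(code, BIT_TAGS) if bit == "1"]

def is_ambiguous(code):
    """A word is ambiguous when its code has more than one bit set."""
    return len(decode_tag_code(code)) > 1

for code in ("1000", "1100", "0100", "0110", "0010", "0001"):
    print(code, decode_tag_code(code), is_ambiguous(code))
```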
A POS tagger can be seen as a first-step
towards tightening the integration
between speech recognition and natural
A POS tagger in the language model aids
in the identification of boundary tones and
speech repairs, redefining the speech
A typical NLP system consists of
tokenization, sentence delimitation, part-of-
speech (POS) tagging, phrase chunking,
parsing, and concept mapping. As one of the
initial steps, POS tagging determines the part
of speech for each token in a sentence.
Managers, educators, trainers and salespeople
are able to accurately assess the needs of a
group and improve their questioning techniques, thus
improving their skills to achieve more
The user cannot enter more than one sentence,
i.e. cannot enter a paragraph.
It is not able to detect and report the gender
of the word i.e. Morphological analysis in
When ambiguity is encountered, the system
searches for the POS of the ambiguous word.
If it finds few or no words with the correct
POS and a larger number of words with
other POS, it shows an incorrect POS for
the ambiguous word.
Word Sense Disambiguation (WSD)
Machine Translation (MT)
-Text to Text
-Speech to Speech
Conclusion and Future Scope :
The POS tagger described here is very
simple and efficient for automatic tagging,
but the morphological complexity of
Marathi makes it hard. The performance of
the current system is good and the results
achieved by this method are excellent. In
future we wish to improve the accuracy
of our system by adding more tagged
sentences to our training corpus.
Apr. 10, 2019
Natural Language Processing, Marathi POS tagger