SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
E1 geetha2 karthikeyan
Agaraadhi: A Novel Online Dictionary Framework

                        Elanchezhiyan.K, Karthikeyan.S, T V Geetha,
                           Ranjani Parthasarathi & Madhan Karky
           {chezhiyank@gmail.com, sethuramankarthikeyan@gmail.com, madhankarky@gmail.com}
                                     Tamil Computing Lab (TaCoLa),
                       College of Engineering Guindy, Anna University, Chennai.




Abstract
This paper describes Agaraadhi, a dictionary framework for indexing and retrieving Tamil words,
their meaning, analysis and related information. With a database of over 3 lack root words and their
corresponding meaning in English and Tamil, this paper proposes a framework to encompass various
features such as morphological analysis, morphological generation, word usage statistics, word
pleasantness analysis, spell checking, similar word finder, word usage in literature, picture dictionary,
number to text conversion, phonetic transliteration, live usage analysis from micro blogs and more.
Describing various components of the framework the paper concludes with a discussion over
dictionary statistics and possible features for future extension of the framework.

1. Introduction
Most of the Tamil dictionaries are synonym based and they do not give enough information such as
morphological analysis of the word, possible case endings for requested word, pleasantness score,
word usage in the web and social networks, equivalent words or meaning etc. To overcome these
issues we propose Agaraadhi, a framework. Agaraadhi Framework consists of a Morphological
analyser, Morphological generator, Word pleasantness and Word usage score finder as well as
analysis of current usage in Social Networks, Picture dictionary, equivalent Tamil words, Generator
(Word suggestions), Spell checker, Phonetic transliteration, Number to Text Converter, Rare-Word of
the day and Social Network sharing.

Agaraadhi dictionary has more than 3 lac words in various domains such as General, Literature,
Medical, Engineering, Computer Science, etc. The Agaraadhi framework dictionary is a Tamil English
bilingual dictionary. The following sections describe the framework and list the benefits of such a
framework over traditional online Tamil-English dictionaries. A few features proposed in this
framework such as popularity score for a word, to best of our knowledge, are not present in any other
world dictionaries.

2. Agaraadhi Framework
Agaraadhi Dictionary Framework was designed to provide additional information to the user
regarding the word that they query about. Agaraadhi framework presented in figure 1 can be divided



                                                  197
into two major divisions, online and offline, in terms of the time of processing. This section describes
the various components used in the Agaraadhi framework in detail.

2.1 Online Process

Any user query is sent sequentially to dictionary and literature, to retrieve corresponding data from
those indices, fetching phonetic transliteration from transliteration modules, morphological
information from morphological analysis and generator module and fetching live usage analysis from
micro blogs. All those Information are sent to user interface pages, shown in fig 1.




                              Fig 1: Agaraadhi Online Dictionary Framework

2.2 Offline Process

Tamil words and their meanings are entered manually and stored in text files. Those words are
sequentially sent to modules such as popularity score generator, pleasantness score generator, picture
dictionary, phonetic transliteration module and the resulting information is abstracted as a word
object. Tamil literature such as Bharathiyaar songs, Avvaiyar songs, Thirukkural and lyrics are
crawled using a static web crawler and are indexed in hash table as key value pairs.

2.3 Features of Agaraadhi Framework

Agaraadhi dictionary framework consists of more than twenty features such as Morphological
Analysis Morph Generation, pleasantness scoring, popularity scoring, spell error suggestions etc.


                                                  198
3.1 Morphological Analyser

Morphological analyser [1] chunks the query word and gives the morphological features of the query
word such as root word, parts of speech, gender, tense and count. If the Query word is padithaan,
Morphological Analyser gives as padi as root, word represents male gender and query word is past
tense and so on.

3.2 Morphological Generator

A Tamil morphological generator[2] needs to tackle different syntactic categories such as nouns, verbs,
post positions, adjectives, adverbs etc. separately, since the addition of morphological constituents to
each of these syntactic categories depends on different types of information. The generator is used to
generate possible morphological variations of the query word.

3.3 Spell Checker

Spell Checker is used to check the spelling of Tamil words and to provide alternative suggestions for
the wrong words. It uses the Morphological Analyzer. The Morphological Analyzer is used to split the
given Tamil word into the root word and a set of suffixes. If the word is fully split by the analyzer and
its root word is also found in the Agaraadhi dictionary, the given word is termed as correct.
Otherwise, the correction process is invoked to generate all the possible suggestions with minimum
variations from the given word.

3.4 Word Suggestions

Word Suggestion gives the list of equivalent or related words for the given query word.

3.5 Word Pleasantness and Word Popularity Score

Word Pleasantness score generator provides how easy to pronounce the word.

Word Popularity shows the word usage in the web. The Word from agaraadhi is given to web and
found the frequency distribution of the word across the popular blogs, news articles, social nets etc.

3.6 Word Usage in Literature

This feature finds the usage of words in popular literature such as Thirukural, Bharathiyar Padalgal,
Avvai songs and Lyrics.

3.7 Number to Text Converter

It converts a number to Tamil word equivalent as well as in English text. For example in Tamil we
represent oru Arpputham (அ      த ) for 100 million, Kumbam (       பம◌்) for 10 billion and finally up to
                                                                      ◌்
Anniyan (அ நியம◌்) for one zillion.
               ◌்

3.8 Phonetic Transliteration

The pronunciation of words in Tamil and English language, as distinct from their written form based
on the phonology and it can also vary greatly among dialects of a language. Phonetic transliteration
module splits the word into syllables and gives the transliteration for each syllable.




                                                   199
3.9 Picture Dictionary

Pictures, photos or line drawings to depict popular words have been included in the dictionary to
enable efficient learning for children using this tool.

3.10 Social net Sharing and Twitter Update

The framework also provides features to format results to be shared effectively on social networks. An
Agaraadhi Bot was designed to post updates and word of the day on Twitter automatically.

3.11 Word of the Day and Word Usage statistics

A rare word is randomly chosen and is displayed in the opening page to facilitate users to learn a new
word every day.

Word Usage Statistics [3] shows the usage of the word in the social network over the past one week.

3.12 Tamil Word Games

Games play a vital role in learning. Currently Agaraadhi has two Tamil word games namely
Miruginajambo and Thookku Thookki. Miruginajambo is an unscramble game and Thookku Thookki
is a Hangman game in Tamil.

4. Conclusion and Future Work
This paper describes Agaraadhi, an Online Dictionary Framework. Agaraadhi online dictionary is a
bilingual dictionary containing over 3 lac words on various domains like General, Medical,
Engineering, Computer science, Literature etc. This Online Dictionary framework encompass various
features such as morphological analysis, morphological generation, word usage statistics, word
pleasantness analysis, spell checking, similar word finder, word usage in literature, picture dictionary,
number to text conversion, phonetic transliteration, live usage analysis from micro blogs etc.
Providing APIs for programmers and developing mobile apps for Agaraadhi framework will open a
good platform for many researchers and developers working in Tamil Computing area.

References
         Anandan, R. Parthasarathi, and Geetha, Morphological Analyser for Tamil. ICON 2002, 2002.

         Anandan, R. Parthasarathi, and Geetha, Morphological Generator for Tamil. Tamil Inayam,
         Malaysia, 2001.

         J. Jai Hari Raju, P. IndhuReka, Dr. Madhan Karky, Statistical Analysis and visualization of
         Tamil Usage in Live Text Streams, Tamil Internet Conference, Coimbatore, 2010.




                                                    200

Contenu connexe

Tendances

Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...
Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...
Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...IJECEIAES
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalamijcsit
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
Adhyann – a hybrid part of-speech tagger
Adhyann – a hybrid part of-speech taggerAdhyann – a hybrid part of-speech tagger
Adhyann – a hybrid part of-speech taggerijitjournal
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Dhabal Sethi
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionaryEditor IJMTER
 
IRJET - Analysis on Code-Mixed Data for Movie Reviews
IRJET - Analysis on Code-Mixed Data for Movie ReviewsIRJET - Analysis on Code-Mixed Data for Movie Reviews
IRJET - Analysis on Code-Mixed Data for Movie ReviewsIRJET Journal
 
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...IOSR Journals
 
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDIJERA Editor
 
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...IRJET Journal
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...ijnlc
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSijnlc
 
Static dictionary for pronunciation modeling
Static dictionary for pronunciation modelingStatic dictionary for pronunciation modeling
Static dictionary for pronunciation modelingeSAT Publishing House
 

Tendances (19)

Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...
Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...
Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-N...
 
A New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in MalayalamA New Approach to Parts of Speech Tagging in Malayalam
A New Approach to Parts of Speech Tagging in Malayalam
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
Adhyann – a hybrid part of-speech tagger
Adhyann – a hybrid part of-speech taggerAdhyann – a hybrid part of-speech tagger
Adhyann – a hybrid part of-speech tagger
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
 
Detecting Paraphrases in Marathi Language
Detecting Paraphrases in Marathi LanguageDetecting Paraphrases in Marathi Language
Detecting Paraphrases in Marathi Language
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
Arabic MT Project
Arabic MT ProjectArabic MT Project
Arabic MT Project
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse Dictionary
 
IRJET - Analysis on Code-Mixed Data for Movie Reviews
IRJET - Analysis on Code-Mixed Data for Movie ReviewsIRJET - Analysis on Code-Mixed Data for Movie Reviews
IRJET - Analysis on Code-Mixed Data for Movie Reviews
 
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
 
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
 
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...
IRJET- Survey on Grammar Checking and Correction using Deep Learning for Indi...
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
 
Cf32516518
Cf32516518Cf32516518
Cf32516518
 
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTSSENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
SENTIMENT ANALYSIS OF MIXED CODE FOR THE TRANSLITERATED HINDI AND MARATHI TEXTS
 
Static dictionary for pronunciation modeling
Static dictionary for pronunciation modelingStatic dictionary for pronunciation modeling
Static dictionary for pronunciation modeling
 

En vedette (20)

I2 madankarky1 jharibabu
I2 madankarky1 jharibabuI2 madankarky1 jharibabu
I2 madankarky1 jharibabu
 
C2 mala2 janani
C2 mala2 jananiC2 mala2 janani
C2 mala2 janani
 
H3 anuraj
H3 anurajH3 anuraj
H3 anuraj
 
F2 pvairam sarathy
F2 pvairam sarathyF2 pvairam sarathy
F2 pvairam sarathy
 
A1 devarajan
A1 devarajanA1 devarajan
A1 devarajan
 
B5 msaravanan
B5 msaravananB5 msaravanan
B5 msaravanan
 
Agenda tamil
Agenda tamilAgenda tamil
Agenda tamil
 
A2 velmurugan
A2 velmuruganA2 velmurugan
A2 velmurugan
 
Actionresearch
ActionresearchActionresearch
Actionresearch
 
B1 sivakumaran
B1 sivakumaranB1 sivakumaran
B1 sivakumaran
 
I6 mala3 sowmya
I6 mala3 sowmyaI6 mala3 sowmya
I6 mala3 sowmya
 
Alagumuthu ph.d.-progress report
Alagumuthu ph.d.-progress reportAlagumuthu ph.d.-progress report
Alagumuthu ph.d.-progress report
 
E3 ilangkumaran
E3 ilangkumaranE3 ilangkumaran
E3 ilangkumaran
 
D1 singaravelu
D1 singaraveluD1 singaravelu
D1 singaravelu
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 
C3 geetha1 karthikeyan
C3 geetha1 karthikeyanC3 geetha1 karthikeyan
C3 geetha1 karthikeyan
 
C6 agramakrishnan1
C6 agramakrishnan1C6 agramakrishnan1
C6 agramakrishnan1
 
A4 elanjceziyan
A4 elanjceziyanA4 elanjceziyan
A4 elanjceziyan
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammar
 
English idioms dictionary in PDF download for free
English idioms dictionary in PDF download for freeEnglish idioms dictionary in PDF download for free
English idioms dictionary in PDF download for free
 

Similaire à E1 geetha2 karthikeyan

Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerIOSR Journals
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHijnlc
 
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...Justin Knight
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyAlexander Decker
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathiiosrjce
 
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...IJERA Editor
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING kevig
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING kevig
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applicationsUtphala P
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionaryiosrjce
 

Similaire à E1 geetha2 karthikeyan (20)

F017163443
F017163443F017163443
F017163443
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...
A Review On Recognition Of Sentiment Analysis Of Marathi Tweets Using Machine...
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
Developing an architecture for translation engine using ontology
Developing an architecture for translation engine using ontologyDeveloping an architecture for translation engine using ontology
Developing an architecture for translation engine using ontology
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathi
 
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING
 
MYANMAR WORDS SORTING
MYANMAR WORDS SORTING MYANMAR WORDS SORTING
MYANMAR WORDS SORTING
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applications
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionary
 
A017420108
A017420108A017420108
A017420108
 

Plus de Jasline Presilda (20)

I5 geetha4 suraiya
I5 geetha4 suraiyaI5 geetha4 suraiya
I5 geetha4 suraiya
 
I4 madankarky3 subalalitha
I4 madankarky3 subalalithaI4 madankarky3 subalalitha
I4 madankarky3 subalalitha
 
I3 madankarky2 karthika
I3 madankarky2 karthikaI3 madankarky2 karthika
I3 madankarky2 karthika
 
Hari tamil-complete details
Hari tamil-complete detailsHari tamil-complete details
Hari tamil-complete details
 
H4 neelavathy
H4 neelavathyH4 neelavathy
H4 neelavathy
 
H2 gunasekaran
H2 gunasekaranH2 gunasekaran
H2 gunasekaran
 
H1 iniya nehru
H1 iniya nehruH1 iniya nehru
H1 iniya nehru
 
G3 chandrakala
G3 chandrakalaG3 chandrakala
G3 chandrakala
 
G2 selvakumar
G2 selvakumarG2 selvakumar
G2 selvakumar
 
G1 nmurugaiyan
G1 nmurugaiyanG1 nmurugaiyan
G1 nmurugaiyan
 
Front matter
Front matterFront matter
Front matter
 
F1 ferdinjoe
F1 ferdinjoeF1 ferdinjoe
F1 ferdinjoe
 
Emerging
EmergingEmerging
Emerging
 
E2 tamilselvan
E2 tamilselvanE2 tamilselvan
E2 tamilselvan
 
D5 radha chellappan
D5 radha chellappanD5 radha chellappan
D5 radha chellappan
 
D4 sundaram
D4 sundaramD4 sundaram
D4 sundaram
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
C7 agramakirshnan2
C7 agramakirshnan2C7 agramakirshnan2
C7 agramakirshnan2
 

Dernier

Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfMohonDas
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.EnglishCEIPdeSigeiro
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxraviapr7
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxraviapr7
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17Celine George
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICESayali Powar
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapitolTechU
 
Human-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesHuman-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesMohammad Hassany
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptxmary850239
 
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdf
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdfNOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdf
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdfSumit Tiwari
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxDr. Santhosh Kumar. N
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRATanmoy Mishra
 
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptxmary850239
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17Celine George
 

Dernier (20)

Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
 
Human-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming ClassesHuman-AI Co-Creation of Worked Examples for Programming Classes
Human-AI Co-Creation of Worked Examples for Programming Classes
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdf
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdfNOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdf
NOTES OF DRUGS ACTING ON NERVOUS SYSTEM .pdf
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptx
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 

E1 geetha2 karthikeyan

  • 2. Agaraadhi: A Novel Online Dictionary Framework Elanchezhiyan.K, Karthikeyan.S, T V Geetha, Ranjani Parthasarathi & Madhan Karky {chezhiyank@gmail.com, sethuramankarthikeyan@gmail.com, madhankarky@gmail.com} Tamil Computing Lab (TaCoLa), College of Engineering Guindy, Anna University, Chennai. Abstract This paper describes Agaraadhi, a dictionary framework for indexing and retrieving Tamil words, their meaning, analysis and related information. With a database of over 3 lack root words and their corresponding meaning in English and Tamil, this paper proposes a framework to encompass various features such as morphological analysis, morphological generation, word usage statistics, word pleasantness analysis, spell checking, similar word finder, word usage in literature, picture dictionary, number to text conversion, phonetic transliteration, live usage analysis from micro blogs and more. Describing various components of the framework the paper concludes with a discussion over dictionary statistics and possible features for future extension of the framework. 1. Introduction Most of the Tamil dictionaries are synonym based and they do not give enough information such as morphological analysis of the word, possible case endings for requested word, pleasantness score, word usage in the web and social networks, equivalent words or meaning etc. To overcome these issues we propose Agaraadhi, a framework. Agaraadhi Framework consists of a Morphological analyser, Morphological generator, Word pleasantness and Word usage score finder as well as analysis of current usage in Social Networks, Picture dictionary, equivalent Tamil words, Generator (Word suggestions), Spell checker, Phonetic transliteration, Number to Text Converter, Rare-Word of the day and Social Network sharing. Agaraadhi dictionary has more than 3 lac words in various domains such as General, Literature, Medical, Engineering, Computer Science, etc. The Agaraadhi framework dictionary is a Tamil English bilingual dictionary. The following sections describe the framework and list the benefits of such a framework over traditional online Tamil-English dictionaries. A few features proposed in this framework such as popularity score for a word, to best of our knowledge, are not present in any other world dictionaries. 2. Agaraadhi Framework Agaraadhi Dictionary Framework was designed to provide additional information to the user regarding the word that they query about. Agaraadhi framework presented in figure 1 can be divided 197
  • 3. into two major divisions, online and offline, in terms of the time of processing. This section describes the various components used in the Agaraadhi framework in detail. 2.1 Online Process Any user query is sent sequentially to dictionary and literature, to retrieve corresponding data from those indices, fetching phonetic transliteration from transliteration modules, morphological information from morphological analysis and generator module and fetching live usage analysis from micro blogs. All those Information are sent to user interface pages, shown in fig 1. Fig 1: Agaraadhi Online Dictionary Framework 2.2 Offline Process Tamil words and their meanings are entered manually and stored in text files. Those words are sequentially sent to modules such as popularity score generator, pleasantness score generator, picture dictionary, phonetic transliteration module and the resulting information is abstracted as a word object. Tamil literature such as Bharathiyaar songs, Avvaiyar songs, Thirukkural and lyrics are crawled using a static web crawler and are indexed in hash table as key value pairs. 2.3 Features of Agaraadhi Framework Agaraadhi dictionary framework consists of more than twenty features such as Morphological Analysis Morph Generation, pleasantness scoring, popularity scoring, spell error suggestions etc. 198
  • 4. 3.1 Morphological Analyser Morphological analyser [1] chunks the query word and gives the morphological features of the query word such as root word, parts of speech, gender, tense and count. If the Query word is padithaan, Morphological Analyser gives as padi as root, word represents male gender and query word is past tense and so on. 3.2 Morphological Generator A Tamil morphological generator[2] needs to tackle different syntactic categories such as nouns, verbs, post positions, adjectives, adverbs etc. separately, since the addition of morphological constituents to each of these syntactic categories depends on different types of information. The generator is used to generate possible morphological variations of the query word. 3.3 Spell Checker Spell Checker is used to check the spelling of Tamil words and to provide alternative suggestions for the wrong words. It uses the Morphological Analyzer. The Morphological Analyzer is used to split the given Tamil word into the root word and a set of suffixes. If the word is fully split by the analyzer and its root word is also found in the Agaraadhi dictionary, the given word is termed as correct. Otherwise, the correction process is invoked to generate all the possible suggestions with minimum variations from the given word. 3.4 Word Suggestions Word Suggestion gives the list of equivalent or related words for the given query word. 3.5 Word Pleasantness and Word Popularity Score Word Pleasantness score generator provides how easy to pronounce the word. Word Popularity shows the word usage in the web. The Word from agaraadhi is given to web and found the frequency distribution of the word across the popular blogs, news articles, social nets etc. 3.6 Word Usage in Literature This feature finds the usage of words in popular literature such as Thirukural, Bharathiyar Padalgal, Avvai songs and Lyrics. 3.7 Number to Text Converter It converts a number to Tamil word equivalent as well as in English text. For example in Tamil we represent oru Arpputham (அ த ) for 100 million, Kumbam ( பம◌்) for 10 billion and finally up to ◌் Anniyan (அ நியம◌்) for one zillion. ◌் 3.8 Phonetic Transliteration The pronunciation of words in Tamil and English language, as distinct from their written form based on the phonology and it can also vary greatly among dialects of a language. Phonetic transliteration module splits the word into syllables and gives the transliteration for each syllable. 199
  • 5. 3.9 Picture Dictionary Pictures, photos or line drawings to depict popular words have been included in the dictionary to enable efficient learning for children using this tool. 3.10 Social net Sharing and Twitter Update The framework also provides features to format results to be shared effectively on social networks. An Agaraadhi Bot was designed to post updates and word of the day on Twitter automatically. 3.11 Word of the Day and Word Usage statistics A rare word is randomly chosen and is displayed in the opening page to facilitate users to learn a new word every day. Word Usage Statistics [3] shows the usage of the word in the social network over the past one week. 3.12 Tamil Word Games Games play a vital role in learning. Currently Agaraadhi has two Tamil word games namely Miruginajambo and Thookku Thookki. Miruginajambo is an unscramble game and Thookku Thookki is a Hangman game in Tamil. 4. Conclusion and Future Work This paper describes Agaraadhi, an Online Dictionary Framework. Agaraadhi online dictionary is a bilingual dictionary containing over 3 lac words on various domains like General, Medical, Engineering, Computer science, Literature etc. This Online Dictionary framework encompass various features such as morphological analysis, morphological generation, word usage statistics, word pleasantness analysis, spell checking, similar word finder, word usage in literature, picture dictionary, number to text conversion, phonetic transliteration, live usage analysis from micro blogs etc. Providing APIs for programmers and developing mobile apps for Agaraadhi framework will open a good platform for many researchers and developers working in Tamil Computing area. References Anandan, R. Parthasarathi, and Geetha, Morphological Analyser for Tamil. ICON 2002, 2002. Anandan, R. Parthasarathi, and Geetha, Morphological Generator for Tamil. Tamil Inayam, Malaysia, 2001. J. Jai Hari Raju, P. IndhuReka, Dr. Madhan Karky, Statistical Analysis and visualization of Tamil Usage in Live Text Streams, Tamil Internet Conference, Coimbatore, 2010. 200