Tamil Internet Conference 2020
TamilInayaVaani - Integrating TVA Open-Source Spellchecker with Python
T. Shrinivasan, Nithya Duraisamy, Ashok Ramachandran, Manickkavasakam,
Arunmozhi, and A. Muthiah
Who are we?
A few open source contributors from
Ezhil Foundation
Kaniyam Foundation
Thamizha
Mozilla Tamilnadu
Indian Linux Users group, Chennai
IndicNLP
A Shared Dream
An open source Tamil spellchecker:
a dream of many years, now becoming real
Existing Efforts
● Hunspell
● GNU Aspell
● LanguageTool.org
● Open-Tamil Solthiruthi
● Bloom filter based spellchecker
Still a long way to go
How long?
Challenges for a Tamil Spellchecker
● Infinite vocabulary
● Rich morphology
● Agglutination
● Free word order
● Sandhi
● ...
A Few Algorithms
Levenshtein distance search
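Levenshtein distance is the minimum number of single-unit insertions, deletions, and substitutions that turn one word into another; a Levenshtein search returns the dictionary words within some distance of the input. The Python sketch below is an illustration, not TamilinayaVaani's code. It assumes distance should be counted over Tamil letters rather than Unicode code points, because a letter such as மி is a consonant plus a vowel sign (two code points). The hypothetical helper `tamil_letters` approximates that split by attaching combining marks to the preceding character; Open-Tamil's `tamil.utf8.get_letters` is a more complete alternative. `suggest` and its `max_distance` parameter are likewise illustrative names.

```python
import unicodedata


def tamil_letters(word):
    """Hypothetical helper: split a word into letters by attaching combining
    marks (vowel signs, pulli) to the preceding character. Approximate: it
    does not treat conjuncts such as க்ஷ as one letter."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def levenshtein(a, b):
    """Minimum insertions, deletions and substitutions turning sequence a
    into sequence b, using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=1):
    """Return dictionary words within max_distance letters, closest first."""
    target = tamil_letters(word)
    scored = []
    for candidate in dictionary:
        d = levenshtein(target, tamil_letters(candidate))
        if d <= max_distance:
            scored.append((d, candidate))
    return [candidate for _, candidate in sorted(scored)]
```

Here `suggest("தமிள்", ["தமிழ்", "அமிழ்", "தமிழன்"])` returns `["தமிழ்"]`; raising `max_distance` to 2 also admits அமிழ் and தமிழன். Scanning the whole dictionary costs one distance computation per dictionary word, which is why faster candidate generation matters for a multi-million-word list.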
Norvig Algorithm
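Peter Norvig's corrector inverts the search: instead of comparing the input with every dictionary word, it generates every string one edit away (deletions, transpositions, replacements, insertions), keeps the ones that are known words, and picks the most frequent. The sketch below adapts that idea under stated assumptions: edits work on Tamil letters via a hypothetical `tamil_letters` helper, the replacement and insertion alphabet is whatever letters occur in the training corpus, and only distance-1 edits are tried (Norvig's version also tries distance 2, which grows quickly with Tamil's 247 letters). Names such as `NorvigSpeller` are hypothetical.

```python
from collections import Counter
import unicodedata


def tamil_letters(word):
    """Hypothetical helper: attach combining marks to the preceding character."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def edits1(letters, alphabet):
    """All strings one letter-level edit away (Norvig's four edit types)."""
    splits = [(letters[:i], letters[i:]) for i in range(len(letters) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + [right[1], right[0]] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + [c] + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + [c] + right for left, right in splits for c in alphabet]
    return {"".join(e) for e in deletes + transposes + replaces + inserts}


class NorvigSpeller:
    def __init__(self, corpus_words):
        # Word frequencies act as the language model P(c).
        self.counts = Counter(corpus_words)
        # Assumption: the candidate alphabet is every letter seen in the corpus.
        self.alphabet = sorted({letter for w in self.counts
                                for letter in tamil_letters(w)})

    def known(self, words):
        return {w for w in words if w in self.counts}

    def correct(self, word):
        """Keep a known word; otherwise return the most frequent known
        distance-1 edit; otherwise return the word unchanged."""
        candidates = (self.known([word])
                      or self.known(edits1(tamil_letters(word), self.alphabet))
                      or {word})
        return max(candidates, key=lambda w: self.counts[w])
```

With a toy corpus where தமிழ் appears three times and அமிழ் once, `correct("தமிள்")` returns `"தமிழ்"`. The number of generated candidates grows with alphabet size times word length, so a full Tamil letter set makes each lookup much heavier than with the 26-letter English alphabet.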
Still not perfect
Research continues...
TamilinayaVaani
An Open Source Spellchecker from the
Tamil Virtual Academy
Tamil Nadu Government announcement:
All software released under GNU GPL v2
All digital content released under CC-BY-SA
TamilinayaVaani
● Developed as a desktop application
● Written in C#
● A limited version of Vaani.neechalkaran.com
● Cannot be used on Linux
● Cannot be used from the command line
● Cannot be integrated with other applications
Porting to Python
Why?
● Python is easy to develop further
● Easy integration with web applications and APIs
● Scalable
Python Port Code
● https://github.com/tshrinivasan/Tamilinaiya-Spellchecker
The beauty of Open Source
More Contributions
Open-Tamil Python Library
● The de facto Python library for Tamil computing
● Process Tamil text
● Build games and Tamil utilities
● http://Tamilpesu.us
Integrating with SandhiChecker
● Open-Tamil has a SandhiChecker
● 40+ rules
● Added this SandhiChecker to Tamilinayavaani
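The slides do not show the SandhiChecker interface, so the following is a standalone, hypothetical sketch of what a single rule can look like, not Open-Tamil's code. It encodes one well-known rule: after the demonstratives அந்த, இந்த and எந்த, a following word that begins with a hard consonant (வல்லினம்: க, ச, த, ப) doubles that consonant, so அந்த பையன் should be written அந்தப் பையன். The function names are illustrative, and real rules carry exceptions this sketch ignores.

```python
HARD_CONSONANTS = {"க", "ச", "த", "ப"}   # vallinam letters that double
PULLI = "\u0BCD"                          # Tamil virama (pulli)
DEMONSTRATIVES = {"அந்த", "இந்த", "எந்த"}


def check_demonstrative_sandhi(sentence):
    """Return (word_index, found, suggested) for each demonstrative that is
    missing the doubled hard consonant of the next word."""
    words = sentence.split()
    issues = []
    for i, (first, second) in enumerate(zip(words, words[1:])):
        # second[:1] is the first code point, i.e. the base consonant.
        if first in DEMONSTRATIVES and second[:1] in HARD_CONSONANTS:
            issues.append((i, first, first + second[0] + PULLI))
    return issues


def apply_fixes(sentence):
    """Rewrite the sentence with every suggestion applied."""
    words = sentence.split()
    for i, _, suggested in check_demonstrative_sandhi(sentence):
        words[i] = suggested
    return " ".join(words)
```

For example, `apply_fixes("இந்த கடை பெரியது")` returns `"இந்தக் கடை பெரியது"`, while `அந்த மரம்` is left alone because ம is not a hard consonant.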
Python Packaging
● Easy to install on any OS
● pip install tamiliyavaani
Sample Usage
Web Interface with TinyMCE
● Added a web interface built on the TinyMCE editor
Web Interface
JavaScript
A JavaScript port is on the way
TODO
● Provide an API
● Host as a public website
● Test and add more rules
● Set edit distance to 2
● Find a method to yield better alternatives
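Raising the edit distance to 2 makes generated candidate sets explode, and a full Levenshtein scan stays linear in the dictionary size. One option, SymSpell (which this talk lists among future directions), uses symmetric deletes: index every string reachable by deleting up to k letters from each dictionary word, generate only the deletions of the input at query time, and verify every key match with a true edit distance. The sketch below is a hypothetical, simplified version of that idea, not the SymSpell library's API. It assumes letter-level edits via a hypothetical `tamil_letters` helper and ranks alternatives by distance, then alphabetically; word frequency would be a natural tie-breaker.

```python
from collections import defaultdict
from itertools import combinations
import unicodedata


def tamil_letters(word):
    """Hypothetical helper: attach combining marks to the preceding character."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def levenshtein(a, b):
    """Two-row dynamic-programming edit distance over sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def deletes(letters, max_distance):
    """Every string obtained by deleting up to max_distance letters."""
    results = set()
    n = len(letters)
    for k in range(min(max_distance, n) + 1):
        for dropped in combinations(range(n), k):
            results.add("".join(ch for i, ch in enumerate(letters) if i not in dropped))
    return results


class SymmetricDeleteIndex:
    """Hypothetical symmetric-delete index in the spirit of SymSpell."""

    def __init__(self, words, max_distance=2):
        self.max_distance = max_distance
        self.index = defaultdict(set)  # delete-string -> dictionary words
        for w in words:
            for key in deletes(tamil_letters(w), max_distance):
                self.index[key].add(w)

    def suggest(self, word):
        target = tamil_letters(word)
        candidates = set()
        for key in deletes(target, self.max_distance):
            candidates |= self.index.get(key, set())
        # A shared key only bounds the distance by 2 * max_distance, so verify.
        scored = [(levenshtein(target, tamil_letters(c)), c) for c in candidates]
        return [c for d, c in sorted(scored) if d <= self.max_distance]
```

For example, an index over தமிழ், அமிழ் and மொழி with `max_distance=2` suggests `["தமிழ்", "அமிழ்"]` for தமிள். The trade-off is memory: each word contributes one key per combination of up to k deleted letters.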
●
Word Corpus
●
Collected 1,53,548 unique tamil nouns
●
Collected 25,83,000 unique tamil words
●
https://github.com/KaniyamFoundation/all_tamil_words
●
https://github.com/KaniyamFoundation/all_tamil_nouns
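These lists are raw collections. As one automatic pass ahead of manual cleaning, a script could NFC-normalize each entry, keep only tokens made entirely of characters from the Tamil Unicode block (U+0B80 to U+0BFF), and deduplicate them. This is a hypothetical sketch: it assumes one word per line in UTF-8, which the slides do not specify, and its filter also drops tokens containing zero-width joiners, digits, or Latin text, some of which may be legitimate.

```python
import unicodedata


def is_tamil_word(word):
    """True if the word is non-empty and every character is in U+0B80..U+0BFF."""
    return bool(word) and all("\u0B80" <= ch <= "\u0BFF" for ch in word)


def clean_word_list(lines):
    """Strip, NFC-normalize, filter and deduplicate raw word-list lines."""
    unique = set()
    for line in lines:
        word = unicodedata.normalize("NFC", line.strip())
        if is_tamil_word(word):
            unique.add(word)
    return sorted(unique)


def load_word_file(path):
    """Hypothetical loader for a one-word-per-line UTF-8 file."""
    with open(path, encoding="utf-8") as fh:
        return clean_word_list(fh)
```

`len(load_word_file(path))` then gives a unique-word count comparable across cleaning passes.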
TODO
● Clean the lists manually
● Build a golden corpus for quick lookup
● Bloom filter, SymSpell, LSTM and more
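A Bloom filter answers "is this word in the list?" in a fixed amount of memory: it can return false positives at a chosen rate but never false negatives, which suits a quick first-pass lookup. The sketch below is hypothetical, not the project's design. It sizes the bit array with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash positions, and derives the k positions from a single SHA-256 digest by double hashing. For the 25,83,000-word list at p = 0.01, that works out to roughly 24.8 million bits (about 3.1 MB) and 7 hash positions.

```python
import hashlib
import math


class BloomFilter:
    """Set membership in fixed memory: false positives possible, false negatives not."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate)
                              / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, word):
        # Double hashing: k indexes from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # "| 1" keeps the step non-zero
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, word):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))
```

A filter only says whether a word is known; suggesting corrections for unknown words still needs a candidate generator on top of it.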
Please Contribute
● Contribute Tamil rules
● Contribute Tamil corpora
● Write code
● Test
● Document
● Provide hosting
● Donate
Thanks
● Muthu Annamalai
● Tamilnadu Government
● Neechalkaran
● Nithya Duraisamy
● Ashok Ramachandran
● Manickkavasakam
● Arunmozhi
● All contributors to Ezhil Foundation, Kaniyam Foundation, Thamizha, IndicNLP and all other open source teams
Contact
● T Shrinivasan
● tshrinivasan@gmail.com
● Kaniyam.com
