Project Mentor:
Mr. Nikhil Debbarma
Assistant Prof.
CSE Dept.
NIT Agartala
Team Members:
Akash Bhargava (10UCS002)
Ashok Kumar (10UCS010)
Laxmi Kant Yadav (10UCS027)
Vijay Kumar Gupta (10UCS057)
A translator must know the grammatical structure of both the input and the output language.
• According to many researchers, Sanskrit is a very scientific language, with a precisely rule-governed grammar.
• Sanskrit behaves very much like a programming language.
• So if we can build a translator that translates Sanskrit into machine code, it would be a significant development in the field of NLP (Natural Language Processing).
“NASA scientist Rick Briggs had invited 1,000 Sanskrit
scholars from India for working at NASA. But scholars
refused to allow the language to be put to foreign
use”- Dainik
Being understandable to both computers and humans, Sanskrit was considered useful in space research and in many other natural language processing applications.
We will first present some concepts and then apply them:
1. Advantages of using Sanskrit
2. Lexical Analysis
3. Parsing
4. Approach
5. Where we are now
6. Problems
7. References
Advantages of using Sanskrit (Why Sanskrit?)
Fixed Morphology
Vibhakti as Pointer
• Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
• A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.
• A lexer often exists as a single function that is called by a parser or another function, or it can be combined with the parser in scannerless parsing.
• The lexical analyzer is the first phase of the translator. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
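[Figure: interaction of the Lexical Analyzer and the Parser. The source program enters the Lexical Analyzer, which returns a token each time the Parser calls getNextToken; an Indexed Database supports the analysis, and the Parser produces the output.]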
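The interaction in which the parser repeatedly asks for the next token (getNextToken) can be sketched in C as a pull-based lexer. This is a minimal sketch, assuming whitespace-separated words in a UTF-8 string; `Lexer` and `get_next_token` are hypothetical names, not the project's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lexer state: a cursor into the (UTF-8) source text. */
typedef struct {
    const char *pos;
} Lexer;

/* Word separators; UTF-8 multi-byte sequences never contain these bytes. */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Called by the parser whenever it needs the next token.
   Copies the next word into buf (NUL-terminated, at most cap - 1 bytes;
   longer words are truncated but still fully consumed).
   Returns 1 if a word was produced, 0 at end of input. */
int get_next_token(Lexer *lx, char *buf, size_t cap)
{
    assert(lx != NULL && buf != NULL && cap > 0);
    while (*lx->pos != '\0' && is_sep(*lx->pos))
        lx->pos++;                      /* skip separators */
    if (*lx->pos == '\0')
        return 0;                       /* end of input */
    size_t n = 0;
    while (*lx->pos != '\0' && !is_sep(*lx->pos)) {
        if (n + 1 < cap)
            buf[n++] = *lx->pos;
        lx->pos++;
    }
    buf[n] = '\0';
    return 1;
}
```

A parser would call `get_next_token` in a loop until it returns 0, so only one token needs to be held in memory at a time.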
• The output of lexical analysis is a stream of tokens.
• A token is a syntactic category:
◦ In English: noun, verb, adjective, …
◦ In Sanskrit: vibhakti, kriya, visheshana, …
• The parser relies on these token distinctions.
• An implementation must do the following:
1. Recognize the substrings that correspond to tokens.
2. Search for each identified token in the database to recognize its context.
3. Depending on the context, the token may be a different part of speech of Sanskrit, e.g., a verb (kriya) or a vibhakti (dhatu roop).
4. Tag every token accordingly.
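Steps 2 to 4 (search the token in the database, determine its part of speech, tag it) can be sketched with a sorted in-memory table and the C library's `bsearch`. The entries, their simplified romanized spellings, and the names `Entry` and `tag_token` are illustrative assumptions; the project itself used an indexed database on the Linux file system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { POS_UNKNOWN, POS_NOUN, POS_VERB, POS_AVYAYA } PartOfSpeech;

/* One database record: surface form, part of speech, English gloss. */
typedef struct {
    const char *word;
    PartOfSpeech pos;
    const char *meaning;
} Entry;

/* Illustrative entries (simplified romanization), kept sorted by word for bsearch. */
static const Entry DB[] = {
    { "balakena", POS_NOUN,   "by the boy" },
    { "devah",    POS_NOUN,   "god" },
    { "gacchati", POS_VERB,   "goes" },
    { "ramah",    POS_NOUN,   "Rama" },
    { "tatra",    POS_AVYAYA, "there" },
    { "yatra",    POS_AVYAYA, "where" },
};

/* bsearch passes the key first and the table element second. */
static int cmp_entry(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Entry *)elem)->word);
}

/* Returns the database entry that tags this token, or NULL if the word is unknown. */
const Entry *tag_token(const char *token)
{
    assert(token != NULL);
    return bsearch(token, DB, sizeof DB / sizeof DB[0], sizeof DB[0], cmp_entry);
}
```

Keeping the table sorted gives O(log n) lookups; an on-disk index plays the same role for a large dictionary.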
• Two important points:
1. The goal is to partition the string. This is implemented by reading left to right, recognizing one token at a time.
2. “Lookahead” may be required to decide where one token ends and the next token begins.
◦ Even simple cases have lookahead issues: i vs. if, = vs. ==
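Both lookahead cases can be resolved with the longest-match ("maximal munch") rule: peek one character ahead before deciding between `=` and `==`, and read a whole run of letters before deciding whether it is the keyword `if` or an identifier such as `i`. A minimal sketch; the token kinds and the `scan` function are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { T_END, T_ASSIGN, T_EQ, T_IF, T_IDENT, T_OTHER } Kind;

/* Scans one token starting at *p and advances *p past it. */
Kind scan(const char **p)
{
    assert(p != NULL && *p != NULL);
    const char *s = *p;
    while (*s == ' ')
        s++;
    if (*s == '\0') {
        *p = s;
        return T_END;
    }
    if (*s == '=') {
        /* One character of lookahead decides between "=" and "==". */
        if (s[1] == '=') {
            *p = s + 2;
            return T_EQ;
        }
        *p = s + 1;
        return T_ASSIGN;
    }
    if (isalpha((unsigned char)*s)) {
        /* Consume the longest run of letters, then classify it: "i" vs "if". */
        const char *start = s;
        while (isalpha((unsigned char)*s))
            s++;
        *p = s;
        return (s - start == 2 && strncmp(start, "if", 2) == 0) ? T_IF : T_IDENT;
    }
    *p = s + 1;
    return T_OTHER;
}
```

With this rule, `ifx` is a single identifier rather than the keyword `if` followed by `x`.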
LEXICAL ANALYSIS
Consider the dhatu (verb root) meaning 'to heat'.
The following inflections are analyzed lexically:
[Tables: inflected forms for HEATS, WILL HEAT, HEATED and HEAT IT (order), each laid out as three rows of three forms (person × number).]
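A table-driven way to analyze such inflections is to strip a known stem and match the remaining ending against a 3 × 3 paradigm (person × number). The sketch below covers only the present tense of a thematic stem in a simplified ASCII romanization without length marks; the stem `tapa` (assuming the root is *tap*, 'to heat'), the ending table, and the `analyze_verb` name are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

/* Present-tense active endings, indexed [person][number].
   Row 0 is the Sanskrit prathama purusha (English 3rd person),
   row 2 the uttama purusha (English 1st person).
   Long vowels are written without marks (tapami for tapaami). */
static const char *const ENDINGS[3][3] = {
    { "ti", "tah",  "nti" },   /* prathama (3rd) */
    { "si", "thah", "tha" },   /* madhyama (2nd) */
    { "mi", "vah",  "mah" },   /* uttama (1st)   */
};

/* If word = stem + a present-tense ending, stores the row (0..2) and the
   number (0 = singular, 1 = dual, 2 = plural) and returns 1; else returns 0. */
int analyze_verb(const char *word, const char *stem, int *person, int *number)
{
    assert(word && stem && person && number);
    size_t n = strlen(stem);
    if (strncmp(word, stem, n) != 0)
        return 0;
    const char *ending = word + n;
    for (int p = 0; p < 3; p++)
        for (int q = 0; q < 3; q++)
            if (strcmp(ending, ENDINGS[p][q]) == 0) {
                *person = p;
                *number = q;
                return 1;
            }
    return 0;
}
```

Other tenses (future, past, imperative) would add further tables of the same shape, and in practice the stem itself must come from the database.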
LEXICAL ANALYSIS
Consider the noun meaning 'God'.
The following inflections (cases) are possible:
1. Nominative (subject)
2. Accusative (object)
3. Instrumental (by)
4. Dative (to)
5. Ablative (from)
6. Genitive (of)
7. Locative (in)
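Case (vibhakti) identification can be sketched in a similar way, by longest-suffix matching against an ending table. This assumes a masculine a-stem noun such as *deva* ('god') in the singular, written in simplified ASCII romanization; the table and `analyze_noun` are illustrative, and a real analyzer must also handle the dual and plural, other stem classes, and sandhi.

```c
#include <assert.h>
#include <string.h>

static const char *const CASES[7] = {
    "Nominative", "Accusative", "Instrumental", "Dative",
    "Ablative", "Genitive", "Locative",
};

/* Singular endings of a masculine a-stem after removing the final -a
   (deva -> dev-ah, dev-am, dev-ena, ...). Long vowels are unmarked. */
static const char *const SG_ENDINGS[7] = {
    "ah", "am", "ena", "aya", "at", "asya", "e",
};

/* Returns the case index (0..6) of the longest matching ending and copies the
   remaining stem into stem (cap bytes), or returns -1 if no ending matches. */
int analyze_noun(const char *word, char *stem, size_t cap)
{
    assert(word && stem && cap > 0);
    size_t wlen = strlen(word);
    int best = -1;
    size_t best_len = 0;
    for (int c = 0; c < 7; c++) {
        size_t elen = strlen(SG_ENDINGS[c]);
        if (elen < wlen && elen > best_len &&
            strcmp(word + wlen - elen, SG_ENDINGS[c]) == 0) {
            best = c;
            best_len = elen;
        }
    }
    if (best >= 0) {
        size_t slen = wlen - best_len;
        if (slen >= cap)
            slen = cap - 1;
        memcpy(stem, word, slen);
        stem[slen] = '\0';
    }
    return best;
}
```

Preferring the longest ending keeps, for example, `-asya` (genitive) from being misread through a shorter ending that happens to match.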
LEXICAL ANALYSIS
Input Sentence → Tokenize → Avyaya Analysis → Verb Analysis → Noun Analysis → Unknown word (add to database)
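The flow above (tokenize, then try avyaya, verb, and noun analysis in turn, and record unknown words) can be sketched as a chain of lookups. Here each analysis stage is reduced to a membership test against a small word list, and the "database" of unknown words is an in-memory array; the word lists and all names are hypothetical stand-ins for the project's analyzers.

```c
#include <assert.h>
#include <string.h>

typedef enum { W_AVYAYA, W_VERB, W_NOUN, W_UNKNOWN } WordClass;

/* Hypothetical word lists in simplified romanization. */
static const char *const AVYAYAS[] = { "yatra", "tatra", "yadi", "tarhi" };
static const char *const VERBS[]   = { "gacchati", "pathati" };
static const char *const NOUNS[]   = { "ramah", "devah", "balakah" };

#define MAX_UNKNOWN 64
static const char *unknown_words[MAX_UNKNOWN]; /* stand-in for "add to database" */
static int unknown_count = 0;

static int in_list(const char *w, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(w, list[i]) == 0)
            return 1;
    return 0;
}

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Runs the stages in the order of the flow chart. */
WordClass classify(const char *w)
{
    assert(w != NULL);
    if (in_list(w, AVYAYAS, COUNT(AVYAYAS)))
        return W_AVYAYA;
    if (in_list(w, VERBS, COUNT(VERBS)))
        return W_VERB;
    if (in_list(w, NOUNS, COUNT(NOUNS)))
        return W_NOUN;
    if (unknown_count < MAX_UNKNOWN)
        unknown_words[unknown_count++] = w; /* stores the pointer, not a copy */
    return W_UNKNOWN;
}
```

Checking avyaya first is cheap because indeclinables never inflect, so an exact match settles the word immediately.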
• The scanner recognizes words.
• The parser recognizes syntactic units.
• Parser operations:
◦ Check and verify syntax based on the specified syntax rules
◦ Report errors
• Automation:
◦ The process can be automated
1. Simplicity of design
2. Improving efficiency
3. Enhancing portability
Parsing Sanskrit Text
Next, we move toward translating a Sanskrit sentence into its parsed equivalent.
PARSING
Analyze (a sentence) into its component parts and
describe their syntactic roles.
Analyze (a string or text) into logical syntactic components,
typically in order to test conformability to a logical grammar.
Parsing Sanskrit Text
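Sanskrit sentence structure: SOV (subject, object, verb)
English sentence structure: SVO (subject, verb, object)
Example: a Sanskrit sentence in S O V order is rendered in English as "Boy reads chapter" (S V O).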
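Because grammatical roles are marked by inflection, a translator can produce English word order from each word's analyzed role rather than from its position. A minimal sketch, assuming each word has already been tagged as subject, object, or verb and glossed in English; the `Word` struct and `to_svo` function are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { ROLE_SUBJECT, ROLE_OBJECT, ROLE_VERB } Role;

typedef struct {
    const char *gloss; /* English meaning from the database */
    Role role;         /* role recovered from the word's inflection */
} Word;

/* Writes the glosses of words[0..n) into out (cap bytes) in English
   subject-verb-object order, separated by single spaces. */
void to_svo(const Word *words, size_t n, char *out, size_t cap)
{
    static const Role order[3] = { ROLE_SUBJECT, ROLE_VERB, ROLE_OBJECT };
    assert(words != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (int k = 0; k < 3; k++)
        for (size_t i = 0; i < n; i++)
            if (words[i].role == order[k]) {
                if (out[0] != '\0')
                    strncat(out, " ", cap - strlen(out) - 1);
                strncat(out, words[i].gloss, cap - strlen(out) - 1);
            }
}
```

The same output is produced whatever order the Sanskrit words arrive in, which is why role tagging, not position, drives the reordering.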
• We first tokenize the input using strtok(str, " ");
• Each token can be one of three types: noun, verb, or preposition. The task is to identify these tokens, which is done by matching them against the indexed database.
• Each token is stored in a structure along with its meaning and its morphology.
• Then the parser comes into play and forms a tree-like structure from these tokens.
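The tokenizing step can be sketched as follows. `strtok` modifies its argument, so the sketch splits a writable copy of the input; the `Token` layout (word, meaning, morphology) follows the description above, but the field sizes and the `tokenize` name are assumptions. Splitting on the ASCII space byte is safe for UTF-8 Devanagari text, because multi-byte UTF-8 sequences never contain that byte.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 32

/* One analyzed token; meaning and morphology are filled in by the lookup step. */
typedef struct {
    char word[64];
    char meaning[64];
    char morphology[64];
} Token;

/* Splits sentence on spaces into toks (at most max entries).
   Returns the number of tokens, or -1 if memory allocation fails. */
int tokenize(const char *sentence, Token *toks, int max)
{
    assert(sentence != NULL && toks != NULL);
    char *copy = malloc(strlen(sentence) + 1); /* strtok needs a writable buffer */
    if (copy == NULL)
        return -1;
    strcpy(copy, sentence);
    int n = 0;
    for (char *w = strtok(copy, " "); w != NULL && n < max; w = strtok(NULL, " ")) {
        snprintf(toks[n].word, sizeof toks[n].word, "%s", w); /* truncates safely */
        toks[n].meaning[0] = '\0';
        toks[n].morphology[0] = '\0';
        n++;
    }
    free(copy);
    return n;
}
```

`strtok` treats runs of delimiters as one separator, so double spaces do not produce empty tokens; it also keeps hidden state, so it is not safe to interleave two tokenizations.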
• Bottom-up LR parsing
◦ Constructs the parse tree in a bottom-up manner
◦ Finds the rightmost derivation in reverse order
◦ For every potential right-hand side and lookahead token, decides when a production has been found
• More powerful: bottom-up (LR) parsers can handle the largest class of grammars that can be parsed deterministically.
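The bottom-up idea can be illustrated with a hand-coded shift-reduce recognizer for a toy grammar, S → L V and L → L N | N (one or more nouns followed by a verb, matching the SOV pattern). Each step shifts the next tag onto a stack and then reduces the handle on top of it; for this tiny grammar the shift checks and reductions are exactly those of its LR(0) automaton, so no parse table is needed. The grammar and the single-letter tags are assumptions for illustration, not the project's grammar.

```c
#include <assert.h>
#include <string.h>

/* Toy grammar, assumed for illustration (N = noun tag, V = verb tag):
       S -> L V        a sentence is a noun list followed by a verb
       L -> L N | N    a noun list is one or more nouns          */

/* Appends one reduction to the comma-separated trace. */
static void note(char *trace, size_t cap, const char *rule)
{
    if (trace[0] != '\0')
        strncat(trace, ",", cap - strlen(trace) - 1);
    strncat(trace, rule, cap - strlen(trace) - 1);
}

/* Shift-reduce recognizer. Returns 1 if tags (e.g. "NNV") reduce to S,
   0 on a syntax error. The reductions applied are written to trace. */
int parse_sov(const char *tags, char *trace, size_t cap)
{
    assert(tags != NULL && trace != NULL && cap > 0);
    char st[4];   /* parse stack; its height never exceeds 2 for this grammar */
    int top = 0;
    trace[0] = '\0';
    for (const char *t = tags; *t != '\0'; t++) {
        /* Shift only where the LR(0) automaton has a transition. */
        int after_list = top > 0 && st[top - 1] == 'L';
        int ok = (*t == 'N' && (top == 0 || after_list)) || (*t == 'V' && after_list);
        if (!ok)
            return 0;                     /* syntax error */
        st[top++] = *t;                   /* shift */
        if (*t == 'N' && top == 1) {      /* handle "N" */
            st[0] = 'L';
            note(trace, cap, "L->N");
        } else if (*t == 'N') {           /* handle "L N": L stays on top */
            top--;
            note(trace, cap, "L->L N");
        } else {                          /* handle "L V" */
            top--;
            st[top - 1] = 'S';
            note(trace, cap, "S->L V");
        }
    }
    return top == 1 && st[0] == 'S';
}
```

For "NNV" the trace is `L->N,L->L N,S->L V`; read backwards (S ⇒ L V ⇒ L N V ⇒ N N V) it is the rightmost derivation, which is what a bottom-up parser reconstructs in reverse.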
• Programming languages used: C and C++
• Database used: Linux file system, indexed
• Data structures: arrays, linked lists, structures, trees, indexing and hashing
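Linked lists and hashing combine naturally in a word index: a hash table whose buckets are linked lists (separate chaining). A minimal in-memory sketch; the djb2-style hash, the bucket count, and all names are assumptions rather than details of the project's file-system database.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 101

/* One index entry: a word, its English meaning, and the next entry in the chain. */
typedef struct Node {
    char *word;
    char *meaning;
    struct Node *next;
} Node;

static Node *buckets[NBUCKETS];

/* djb2-style string hash over the raw bytes (works for UTF-8 text). */
static unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        h = h * 33 + *p;
    return h % NBUCKETS;
}

static char *dup_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d)
        strcpy(d, s);
    return d;
}

/* Inserts word -> meaning at the head of its chain; returns 0 on allocation failure. */
int index_put(const char *word, const char *meaning)
{
    assert(word && meaning);
    Node *n = malloc(sizeof *n);
    if (!n)
        return 0;
    n->word = dup_str(word);
    n->meaning = dup_str(meaning);
    if (!n->word || !n->meaning) {
        free(n->word);
        free(n->meaning);
        free(n);
        return 0;
    }
    unsigned long b = hash(word);
    n->next = buckets[b];
    buckets[b] = n;
    return 1;
}

/* Returns the meaning stored for word, or NULL if it is not indexed. */
const char *index_get(const char *word)
{
    assert(word != NULL);
    for (const Node *n = buckets[hash(word)]; n; n = n->next)
        if (strcmp(n->word, word) == 0)
            return n->meaning;
    return NULL;
}
```

Because new entries go to the head of the chain, re-inserting a word shadows its older meaning; lookups cost one hash plus a short list walk on average.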
• INPUT: a Sanskrit sentence or paragraph
• OUTPUT: all the parts of speech are recognized, and a tree structure is formed so that the sentence can be understood.
• ::: this is a avyaya.. and the meaning is: where_there ]
• ::: Nominative,Singular, Gender-Masculine ,noun and the root is: and the meaning is Ram
• ::: The root is: the meaning is: go present-tense,first-person,singular
• ::: this is a avyaya.. and the meaning is: there
• ::: Nominative,Plural Gender-Masculine ,noun ,and the root is: and the meaning is god
• ::: Instrumental,Singular, Gender-Masculine ,noun, and the root is: and the meaning is boy
• ::: Accusative,Singular, Gender-Feminine ,noun and the root is: and the meaning is river
Avyaya words (indeclinables) are used to connect two or more simple sentences. Examples, by meaning:
• if ... then
• where ... there
• but
• hence
• provided, if
Not only do avyaya connect sentences, but they also affect the structure of a simple sentence.
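Correlative avyaya pairs suggest a simple clause splitter: when the first member of a pair (for example *yadi*, 'if', or *yatra*, 'where') opens the sentence, the second clause starts at its partner (*tarhi*, 'then'; *tatra*, 'there'). A minimal sketch over romanized tokens; the pair table and the `split_clauses` name are illustrative assumptions, and real text would also need the single-word connectives.

```c
#include <assert.h>
#include <string.h>

/* Correlative pairs in simplified romanization: { opening word, partner }. */
static const char *const PAIRS[][2] = {
    { "yadi",  "tarhi" },  /* if ... then */
    { "yatra", "tatra" },  /* where ... there */
};

/* If toks[0] opens a correlative pair, returns the index of its partner,
   i.e. where the second clause starts; returns -1 otherwise. */
int split_clauses(const char *const *toks, int n)
{
    assert(toks != NULL || n == 0);
    if (n == 0)
        return -1;
    for (size_t p = 0; p < sizeof PAIRS / sizeof PAIRS[0]; p++) {
        if (strcmp(toks[0], PAIRS[p][0]) != 0)
            continue;
        for (int i = 1; i < n; i++)
            if (strcmp(toks[i], PAIRS[p][1]) == 0)
                return i;
    }
    return -1;
}
```

Each resulting clause can then be analyzed as a simple sentence on its own.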
• Sanskrit has no fixed word order, so a word's position gives no clue to its part of speech: every word in the input sentence could be any part of speech.
• Because of this property of Sanskrit, searching (dictionary lookup) becomes important.
• The database and word collection are stored in Unicode, so each word occupies more bytes than a word of the same length in ASCII.
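Assuming the Unicode text is stored as UTF-8, every Devanagari character (block U+0900 to U+097F) takes three bytes, so a word's byte length (`strlen`) and its character count differ. A minimal sketch of counting code points; `utf8_length` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Counts Unicode code points in a valid UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). */
size_t utf8_length(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For राम (three code points: र, ा, म) `strlen` returns 9 while `utf8_length` returns 3, which is why buffer sizes and index keys must be planned in bytes.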
• Grammar of the Sanskrit language
• How to represent it as a BNF grammar
• Parser techniques
• Structure of the code
• A big chunk of our time went into researching the Sanskrit language and its grammar, which was quite difficult.
• So far, we have implemented the lexer and the parser.
Sanskrit & Artificial Intelligence — NASA
Knowledge Representation in Sanskrit and Artificial Intelligence
by Rick Briggs
• http://www.vedicsciences.net/articles/sanskrit-nasa.html
• AI Magazine on the importance of Sanskrit
• http://www.parankusa.org/SanskritAsProgramming.pdf
• http://sanskrit.jnu.ac.in/morph/analyze.jsp
• http://en.wikipedia.org/wiki/Sanskrit_verbs
• http://en.wikipedia.org/wiki/Sanskrit_grammar
Sanskrit parser Project Report

Contenu connexe

Tendances

HCI 3e - Ch 17: Models of the system
HCI 3e - Ch 17:  Models of the systemHCI 3e - Ch 17:  Models of the system
HCI 3e - Ch 17: Models of the systemAlan Dix
 
Principle based classification of design smells
Principle based classification of design smellsPrinciple based classification of design smells
Principle based classification of design smellsGanesh Samarthyam
 
Slice Based testing and Object Oriented Testing
Slice Based testing and Object Oriented TestingSlice Based testing and Object Oriented Testing
Slice Based testing and Object Oriented Testingvarsha sharma
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsEran Zimbler
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebMarina Santini
 
Software Re-Engineering in Software Engineering SE28
Software Re-Engineering in Software Engineering SE28Software Re-Engineering in Software Engineering SE28
Software Re-Engineering in Software Engineering SE28koolkampus
 
Software quality assurance
Software quality assuranceSoftware quality assurance
Software quality assuranceAman Adhikari
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzersArchana Gopinath
 
O (papel do) Arquiteto de Software
O (papel do) Arquiteto de SoftwareO (papel do) Arquiteto de Software
O (papel do) Arquiteto de SoftwarePeter Jandl Junior
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream4Science
 
Component and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief OverviewComponent and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief OverviewRajiv Kumar
 
requirment anlaysis , user requirements
requirment anlaysis , user requirementsrequirment anlaysis , user requirements
requirment anlaysis , user requirementscsk selva
 

Tendances (20)

Introduction to UML
Introduction to UMLIntroduction to UML
Introduction to UML
 
Domain Modeling
Domain ModelingDomain Modeling
Domain Modeling
 
HCI 3e - Ch 17: Models of the system
HCI 3e - Ch 17:  Models of the systemHCI 3e - Ch 17:  Models of the system
HCI 3e - Ch 17: Models of the system
 
Principle based classification of design smells
Principle based classification of design smellsPrinciple based classification of design smells
Principle based classification of design smells
 
AtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of CustodyAtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of Custody
 
Symbol Table
Symbol TableSymbol Table
Symbol Table
 
Slice Based testing and Object Oriented Testing
Slice Based testing and Object Oriented TestingSlice Based testing and Object Oriented Testing
Slice Based testing and Object Oriented Testing
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
Software Engineering by Pankaj Jalote
Software Engineering by Pankaj JaloteSoftware Engineering by Pankaj Jalote
Software Engineering by Pankaj Jalote
 
Software Re-Engineering in Software Engineering SE28
Software Re-Engineering in Software Engineering SE28Software Re-Engineering in Software Engineering SE28
Software Re-Engineering in Software Engineering SE28
 
Software quality assurance
Software quality assuranceSoftware quality assurance
Software quality assurance
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
O (papel do) Arquiteto de Software
O (papel do) Arquiteto de SoftwareO (papel do) Arquiteto de Software
O (papel do) Arquiteto de Software
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
 
interaction norman model in Human Computer Interaction(HCI)
interaction  norman model in Human Computer Interaction(HCI)interaction  norman model in Human Computer Interaction(HCI)
interaction norman model in Human Computer Interaction(HCI)
 
Chapter 2 software process models
Chapter 2   software process modelsChapter 2   software process models
Chapter 2 software process models
 
Component and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief OverviewComponent and Deployment Diagram - Brief Overview
Component and Deployment Diagram - Brief Overview
 
requirment anlaysis , user requirements
requirment anlaysis , user requirementsrequirment anlaysis , user requirements
requirment anlaysis , user requirements
 

En vedette

Learning Sanskrit: The Easy and Practical Way - Workbook 1
Learning Sanskrit: The Easy and Practical Way -  Workbook 1Learning Sanskrit: The Easy and Practical Way -  Workbook 1
Learning Sanskrit: The Easy and Practical Way - Workbook 1Shashi Joshi
 
Sanskrit project
Sanskrit projectSanskrit project
Sanskrit projectAswin R
 
Presentation on Android application
Presentation on Android applicationPresentation on Android application
Presentation on Android applicationAtibur Rahman
 
Android Project Presentation
Android Project PresentationAndroid Project Presentation
Android Project PresentationLaxmi Kant Yadav
 
Proposal Defense Power Point
Proposal Defense Power PointProposal Defense Power Point
Proposal Defense Power Pointjamathompson
 
Good and Bad Power Point Examples Ed Tech
Good and Bad Power Point Examples Ed TechGood and Bad Power Point Examples Ed Tech
Good and Bad Power Point Examples Ed TechLynnylu
 
How to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalHow to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalMiriam College
 

En vedette (8)

Sanskrit Parser Report
Sanskrit Parser ReportSanskrit Parser Report
Sanskrit Parser Report
 
Learning Sanskrit: The Easy and Practical Way - Workbook 1
Learning Sanskrit: The Easy and Practical Way -  Workbook 1Learning Sanskrit: The Easy and Practical Way -  Workbook 1
Learning Sanskrit: The Easy and Practical Way - Workbook 1
 
Sanskrit project
Sanskrit projectSanskrit project
Sanskrit project
 
Presentation on Android application
Presentation on Android applicationPresentation on Android application
Presentation on Android application
 
Android Project Presentation
Android Project PresentationAndroid Project Presentation
Android Project Presentation
 
Proposal Defense Power Point
Proposal Defense Power PointProposal Defense Power Point
Proposal Defense Power Point
 
Good and Bad Power Point Examples Ed Tech
Good and Bad Power Point Examples Ed TechGood and Bad Power Point Examples Ed Tech
Good and Bad Power Point Examples Ed Tech
 
How to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalHow to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a Professional
 

Similaire à Sanskrit parser Project Report

Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShashank Shisodia
 
Sanskrit in Natural Language Processing
Sanskrit in Natural Language ProcessingSanskrit in Natural Language Processing
Sanskrit in Natural Language ProcessingHitesh Joshi
 
Mozilla Intern Summer 2014 Presentation
Mozilla Intern Summer 2014 PresentationMozilla Intern Summer 2014 Presentation
Mozilla Intern Summer 2014 PresentationCorey Richardson
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Dhabal Sethi
 
Text based search engine on a fixed corpus and utilizing indexation and ranki...
Text based search engine on a fixed corpus and utilizing indexation and ranki...Text based search engine on a fixed corpus and utilizing indexation and ranki...
Text based search engine on a fixed corpus and utilizing indexation and ranki...Soham Mondal
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPAndi Wu
 
Dependency Analysis of Abstract Universal Structures in Korean and English
Dependency Analysis of Abstract Universal Structures in Korean and EnglishDependency Analysis of Abstract Universal Structures in Korean and English
Dependency Analysis of Abstract Universal Structures in Korean and EnglishJinho Choi
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesAlgoscale Technologies Inc.
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpusThennarasuSakkan
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlpeSAT Journals
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectorsOsebe Sammi
 
Token classification using Bengali Tokenizer
Token classification using Bengali TokenizerToken classification using Bengali Tokenizer
Token classification using Bengali TokenizerJeet Das
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesIJERA Editor
 
Natural language-processing
Natural language-processingNatural language-processing
Natural language-processingHareem Naz
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Aman Sharma
 

Similaire à Sanskrit parser Project Report (20)

Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
Sanskrit in Natural Language Processing
Sanskrit in Natural Language ProcessingSanskrit in Natural Language Processing
Sanskrit in Natural Language Processing
 
Mozilla Intern Summer 2014 Presentation
Mozilla Intern Summer 2014 PresentationMozilla Intern Summer 2014 Presentation
Mozilla Intern Summer 2014 Presentation
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
 
Text based search engine on a fixed corpus and utilizing indexation and ranki...
Text based search engine on a fixed corpus and utilizing indexation and ranki...Text based search engine on a fixed corpus and utilizing indexation and ranki...
Text based search engine on a fixed corpus and utilizing indexation and ranki...
 
FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
Chinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLPChinese Word Segmentation in MSR-NLP
Chinese Word Segmentation in MSR-NLP
 
Dependency Analysis of Abstract Universal Structures in Korean and English
Dependency Analysis of Abstract Universal Structures in Korean and EnglishDependency Analysis of Abstract Universal Structures in Korean and English
Dependency Analysis of Abstract Universal Structures in Korean and English
 
Towards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian LanguagesTowards Building Semantic Role Labeler for Indian Languages
Towards Building Semantic Role Labeler for Indian Languages
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpus
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Ijetcas14 458
Ijetcas14 458Ijetcas14 458
Ijetcas14 458
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectors
 
Token classification using Bengali Tokenizer
Token classification using Bengali TokenizerToken classification using Bengali Tokenizer
Token classification using Bengali Tokenizer
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Parser
ParserParser
Parser
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar Rules
 
Natural language-processing
Natural language-processingNatural language-processing
Natural language-processing
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Sanskrit parser Project Report

  • 1. Project Mentor: Mr. Nikhil Debbarma Assistant Prof. CSE Dept. NIT,Agartala Team Members: Akash Bhargava (10UCS002) Ashok Kumar(10UCS010) Laxmi Kant Yadav(10UCS027) Vijay Kumar Gupta(10UCS057)
  • 2. Translator must know the Grammatical Structure of both Input and Output language.
  • 3.  According to many researchers, Sanskrit is a very scientific language.  Sanskrit behaves very closely as programming language.  So if we are able to make a translator that translates Sanskrit into machine code, then it would prove to be a significant development in the field of NLP(Natural Language Processing).
  • 4. “NASA scientist Rick Briggs had invited 1,000 Sanskrit scholars from India for working at NASA. But scholars refused to allow the language to be put to foreign use”- Dainik Being a computer and human understandable, Sanskrit was considered useful in Space research and many other natural language processing Applications.
  • 5. We will first put up some concepts then employ them -  1. Advantages of using Sanskrit  2. Lexical Analysis  3. Parsing  4. Approach  5. Where we are now.  6. Problems  7. References
  • 6. Advantages of using Sanskrit - Why Sanskrit)
  • 10.  Lexical analysis is the process of converting a sequence of characters into a sequence of tokens  A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner  A lexer often exists as a single function which is called by a parser or another function, or can be combined with the parser in scanner less parsing  The lexical analyzer is the first phase of translator. It‟s main task is to read the input characters and produces output a sequence of tokens that the parser uses for syntax analysis.
  • 12. The output of lexical analysis is a stream of tokens. A token is a syntactic category: ◦ In English: noun, verb, adjective, … ◦ In Sanskrit: vibhakti (case-inflected nouns), kriya (verbs), visheshana (adjectives), … The parser relies on these token distinctions.
  • 13. An implementation must do the following: 1. Recognize the substrings that correspond to tokens. 2. Search for each identified token in the database to recognize its context. 3. Depending on the context, a token may be a different part of speech of Sanskrit, e.g. a verb (kriya, whose forms are listed as dhatu roop) or a noun inflected for vibhakti. 4. Tag every token accordingly.
  • 14. Two important points: 1. The goal is to partition the string. This is done by reading left to right, recognizing one token at a time. 2. Lookahead may be required to decide where one token ends and the next token begins. ◦ Even a simple example has lookahead issues: i vs. if, = vs. ==
  • 16. LEXICAL ANALYSIS: Consider the dhatu (verb root) meaning 'to heat'. The following inflections are analyzed lexically: HEATS (present), WILL HEAT (future), HEATED (past), and HEAT IT (an order, i.e. imperative).
  • 17. LEXICAL ANALYSIS: Consider the noun representing God. The following case inflections are possible: 1. Nominative (subject) 2. Accusative (object) 3. Instrumental (by) 4. Dative (to) 5. Ablative (from) 6. Genitive (of) 7. Locative (in)
  • 18. LEXICAL ANALYSIS (flow): Input sentence → Tokenize → Avyaya analysis → Verb analysis → Noun analysis → Unknown word (add to database)
  • 19. The scanner recognizes words; the parser recognizes syntactic units. Parser operations: ◦ Check and verify syntax based on the specified syntax rules ◦ Report errors. Automation: ◦ The process can be automated.
  • 20. Reasons for separating the scanner from the parser: 1. Simplicity of design 2. Improving efficiency 3. Enhancing portability
  • 21. Parsing Sanskrit Text: We now move towards translating a Sanskrit sentence into its parsed equivalent. PARSING: analyze (a sentence) into its component parts and describe their syntactic roles; analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.
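  • 22. Parsing Sanskrit Text: Sanskrit sentence structure is SOV (subject, object, verb); English sentence structure is SVO (subject, verb, object). Example: the English "Boy reads chapter" is ordered S V O, while the Sanskrit sentence is ordered S O V (boy, chapter, reads).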
  • 24. We first tokenize the input using strtok(str, " "). Each token can be of three types: noun, verb, or preposition. Tokens are identified by matching them against the indexed database. Each token is stored in a structure along with its meaning and its morphology. The parser then builds a tree structure from these tokens.
  • 25. Bottom-up LR parsing: ◦ Constructs the parse tree bottom-up ◦ Finds the rightmost derivation in reverse order ◦ For every potential right-hand side and token, decides when a production has been found. More powerful: bottom-up parsers can handle the largest class of grammars that can be parsed deterministically.
  • 26. Programming languages used: C and C++. Database: indexed files on the Linux file system. Data structures: array, linked list, structure, tree, indexing and hashing. INPUT: a Sanskrit sentence or paragraph, e.g. ! OUTPUT: recognize all the parts of speech and form a tree structure from which the sentence can be understood.
  • 27. Sample output:
    ::: this is a avyaya.. and the meaning is: where_there ]
    ::: Nominative,Singular, Gender-Masculine ,noun and the root is: and the meaning is Ram
    ::: The root is: the meaning is: go present-tense,first-person,singular
    ::: this is a avyaya.. and the meaning is: there
    ::: Nominative,Plural Gender-Masculine ,noun ,and the root is: and the meaning is god
    ::: Instrumental,Singular, Gender-Masculine ,noun, and the root is: and the meaning is boy
    ::: Accusative,Singular, Gender-Feminine ,noun and the root is: and the meaning is river
  • 28. Avyaya words (indeclinables) are used to connect two or more simple sentences. Examples: (if-then), (where-there), (but), (hence), (provided, if). Avyayas not only connect sentences but also affect the structure of a simple sentence.
  • 29. Any word in the input sentence could be any part of speech of Sanskrit, since there is no fixed word order. Because of this property, searching becomes important. The database and word collection were in Unicode format, so each stored word takes up even more space.
  • 30. Grammar of the Sanskrit language; how to represent it as a BNF grammar; parser techniques; structure of the code.
  • 31. A big chunk of our time went into researching the Sanskrit language and its grammar, which was quite difficult. So far we have implemented the lexer and the parser.
  • 32. References
    ◦ Sanskrit & Artificial Intelligence (NASA): Rick Briggs, "Knowledge Representation in Sanskrit and Artificial Intelligence"
    ◦ http://www.vedicsciences.net/articles/sanskrit-nasa.html
    ◦ AI Magazine publishes the importance of Sanskrit
    ◦ http://www.parankusa.org/SanskritAsProgramming.pdf
    ◦ http://sanskrit.jnu.ac.in/morph/analyze.jsp
    ◦ http://en.wikipedia.org/wiki/Sanskrit_verbs
    ◦ http://en.wikipedia.org/wiki/Sanskrit_grammar