Sanskrit parser Project Report
1. Project Mentor:
Mr. Nikhil Debbarma
Assistant Prof.
CSE Dept.
NIT, Agartala
Team Members:
Akash Bhargava (10UCS002)
Ashok Kumar (10UCS010)
Laxmi Kant Yadav (10UCS027)
Vijay Kumar Gupta (10UCS057)
2. A translator must know the grammatical structure
of both the input and the output language.
3. According to many researchers, Sanskrit
is a very scientific language.
Sanskrit behaves much like a
programming language.
So if we are able to build a translator that
translates Sanskrit into machine
code, it would be a
significant development in the field of
NLP (Natural Language Processing).
4. “NASA scientist Rick Briggs had invited 1,000 Sanskrit
scholars from India for working at NASA. But scholars
refused to allow the language to be put to foreign
use” - Dainik
Being understandable to both computers and humans,
Sanskrit was considered useful in space research and
many other natural language processing applications.
5. We will first introduce some concepts and then
apply them:
1. Advantages of using Sanskrit
2. Lexical Analysis
3. Parsing
4. Approach
5. Where we are now.
6. Problems
7. References
10. Lexical analysis is the process of converting a
sequence of characters into a sequence of tokens.
A program or function that performs lexical analysis
is called a lexical analyzer, lexer, tokenizer, or
scanner.
A lexer often exists as a single function that is
called by a parser or another function, or it can be
combined with the parser in scannerless parsing.
The lexical analyzer is the first phase of a translator.
Its main task is to read the input characters and
produce as output a sequence of tokens that the
parser uses for syntax analysis.
12. The output of lexical analysis is a stream of tokens.
A token is a syntactic category:
◦ In English:
noun, verb, adjective, …
◦ In Sanskrit:
vibhakti, kriya, visheshana, …
The parser relies on these token distinctions.
13. An implementation must do the following:
1. Recognize substrings corresponding to tokens.
2. Search for the identified token in the database to
recognize its context.
3. Depending on the context, the token may be a different
part of speech of Sanskrit, e.g. verb
(kriya), vibhakti (dhatu roop).
4. Every token is tagged accordingly.
14. Two important points:
1. The goal is to partition the string. This is implemented
by reading left-to-right, recognizing one token at a time
2. “Lookahead” may be required to decide where one
token ends and the next token begins
◦ Even our simple example has lookahead issues
i vs. if
= vs. ==
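The two lookahead cases above can be sketched as a small C scanner. This is only an illustration for the programming-language example on this slide, not the project's Sanskrit lexer; the token names and the function `next_token` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for the toy example; all names here are hypothetical. */
typedef enum {
    TOK_END, TOK_IDENT, TOK_IF, TOK_ASSIGN, TOK_EQ, TOK_UNKNOWN
} TokKind;

/*
 * Scan one token starting at src[*pos] and advance *pos past it.
 * One character of lookahead separates "=" from "==", and reading the
 * whole word before classifying it separates "i" from "if".
 */
TokKind next_token(const char *src, size_t *pos)
{
    while (src[*pos] == ' ')            /* skip blanks between tokens */
        (*pos)++;

    char c = src[*pos];
    if (c == '\0')
        return TOK_END;

    if (c == '=') {
        if (src[*pos + 1] == '=') {     /* peek at the next character */
            *pos += 2;
            return TOK_EQ;
        }
        *pos += 1;
        return TOK_ASSIGN;
    }

    if (isalpha((unsigned char)c)) {
        size_t start = *pos;
        while (isalnum((unsigned char)src[*pos]))
            (*pos)++;                   /* consume the longest word */
        size_t len = *pos - start;
        if (len == 2 && src[start] == 'i' && src[start + 1] == 'f')
            return TOK_IF;              /* keyword only if the word is exactly "if" */
        return TOK_IDENT;
    }

    (*pos)++;                           /* any other character */
    return TOK_UNKNOWN;
}
```

Calling `next_token` repeatedly on the same string with the same `pos` variable walks the input left to right, one token at a time.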
16. LEXICAL ANALYSIS
Consider the dhatu (verb root) meaning "to heat".
The following inflections are analyzed lexically:
[Table: inflected forms for HEATS, WILL HEAT, HEATED, and HEAT IT (order); Devanagari text missing]
17. LEXICAL ANALYSIS
Consider the noun representing God.
The following case inflections are possible:
1. Nominative (subject)
2. Accusative (object)
3. Instrumental (by)
4. Dative (to)
5. Ablative (from)
6. Genitive (of)
7. Locative (in)
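One possible way to carry these seven cases through a C program is an enumeration with lookup tables for the name and the English gloss. This is a sketch only; the identifiers `NounCase`, `case_name`, and `case_gloss` are assumptions, not taken from the project code.

```c
#include <stddef.h>

/* The seven cases from the list above, in the same order. */
typedef enum {
    CASE_NOMINATIVE,
    CASE_ACCUSATIVE,
    CASE_INSTRUMENTAL,
    CASE_DATIVE,
    CASE_ABLATIVE,
    CASE_GENITIVE,
    CASE_LOCATIVE,
    CASE_COUNT          /* number of cases, not a case itself */
} NounCase;

static const char *const CASE_NAMES[CASE_COUNT] = {
    "Nominative", "Accusative", "Instrumental", "Dative",
    "Ablative", "Genitive", "Locative"
};

/* English glosses exactly as given in the list above. */
static const char *const CASE_GLOSSES[CASE_COUNT] = {
    "subject", "object", "by", "to", "from", "of", "in"
};

/* Return the case name, or "unknown" for an out-of-range value. */
const char *case_name(NounCase c)
{
    return ((int)c >= 0 && (int)c < CASE_COUNT) ? CASE_NAMES[c] : "unknown";
}

/* Return the English gloss, or "unknown" for an out-of-range value. */
const char *case_gloss(NounCase c)
{
    return ((int)c >= 0 && (int)c < CASE_COUNT) ? CASE_GLOSSES[c] : "unknown";
}
```

A lexer can then store a `NounCase` value in each noun token instead of a free-form string.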
19. The scanner recognizes words
The parser recognizes syntactic units
Parser operations:
◦ Check and verify syntax based on specified
syntax rules
◦ Report errors
Automation:
◦ The process can be automated
21. Parsing Sanskrit Text
Now we move towards translating a Sanskrit
sentence into its parsed equivalent.
PARSING
Analyze (a sentence) into its component parts and
describe their syntactic roles.
Analyze (a string or text) into logical syntactic components,
typically in order to test conformability to a logical grammar.
24. We first tokenize the input using strtok(str, " ");
Each token can be one of 3 types: noun, verb, or
preposition. The task is to identify the type of each token,
which is done by matching it against an indexed database.
Each token is stored in a structure along with its
meaning and its morphology.
Then the parser comes into play and forms a tree-like
structure from these tokens.
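The tokenize-and-match step described above might look roughly like the following in C. This is a minimal sketch under stated assumptions: the three-entry `LEXICON` with romanized word forms, the struct layouts, and the function names are all hypothetical stand-ins for the project's indexed database and token structure, and the lookup is a plain linear scan.

```c
#include <string.h>

#define MAX_TOKENS 32

typedef enum { POS_NOUN, POS_VERB, POS_PREPOSITION } PartOfSpeech;

/* One database record: word form, type, meaning, and morphology. */
typedef struct {
    const char  *form;
    PartOfSpeech pos;
    const char  *meaning;
    const char  *morphology;
} LexEntry;

/* A token is the matched text plus its record (NULL if not found). */
typedef struct {
    const char     *text;
    const LexEntry *entry;
} Token;

/* Tiny illustrative lexicon in romanized form (assumed entries). */
static const LexEntry LEXICON[] = {
    { "ramah",    POS_NOUN, "Ram",   "nominative, singular, masculine" },
    { "nadim",    POS_NOUN, "river", "accusative, singular, feminine"  },
    { "gacchati", POS_VERB, "go",    "present tense, singular"         },
};

/* Linear search; returns NULL when the word is not in the lexicon. */
const LexEntry *lookup(const char *word)
{
    for (size_t i = 0; i < sizeof LEXICON / sizeof LEXICON[0]; i++)
        if (strcmp(LEXICON[i].form, word) == 0)
            return &LEXICON[i];
    return NULL;
}

/*
 * Split sentence on spaces with strtok and tag each word.
 * strtok writes into its argument, so sentence must be a writable
 * buffer that outlives the tokens. Returns the number of tokens.
 */
size_t tokenize(char *sentence, Token out[], size_t max)
{
    size_t n = 0;
    for (char *w = strtok(sentence, " "); w != NULL && n < max;
         w = strtok(NULL, " ")) {
        out[n].text  = w;
        out[n].entry = lookup(w);
        n++;
    }
    return n;
}
```

For example, passing a writable copy of "ramah nadim gacchati" yields three tokens tagged noun, noun, and verb. Words missing from the lexicon come back with a NULL `entry`, so the caller can report them.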
25. Bottom-Up LR
◦ Construct parse tree in a bottom-up manner
◦ Find the rightmost derivation in reverse order
◦ For every potential right-hand side and token, decide
when a production is found
More powerful
Bottom-up parsers can handle the largest class of
grammars that can be parsed deterministically
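The bottom-up idea can be illustrated with a hand-written shift-reduce loop in C. This is a sketch, not a table-driven LR parser and not the project's grammar: it assumes a toy grammar with only two productions (NP → NOUN, S → NP VERB) and a fixed word order, which real Sanskrit does not have. All identifiers are hypothetical.

```c
#include <stddef.h>

#define STACK_MAX 64

/* Terminals (token tags) and nonterminals of the toy grammar. */
typedef enum { SYM_NOUN, SYM_VERB, SYM_NP, SYM_S } Symbol;

/*
 * Try to replace a right-hand side on top of the stack with its
 * left-hand side. Returns 1 if a production was applied, else 0.
 */
static int reduce_once(Symbol *stack, size_t *top)
{
    /* S -> NP VERB */
    if (*top >= 2 && stack[*top - 2] == SYM_NP
                  && stack[*top - 1] == SYM_VERB) {
        *top -= 2;
        stack[(*top)++] = SYM_S;
        return 1;
    }
    /* NP -> NOUN */
    if (*top >= 1 && stack[*top - 1] == SYM_NOUN) {
        stack[*top - 1] = SYM_NP;
        return 1;
    }
    return 0;
}

/*
 * Shift each input tag onto the stack, reducing after every shift.
 * Returns 1 if the whole input reduces to a single S, else 0.
 */
int parse(const Symbol *input, size_t n)
{
    Symbol stack[STACK_MAX];
    size_t top = 0;

    for (size_t i = 0; i < n; i++) {
        if (top == STACK_MAX)
            return 0;                       /* input too long for this sketch */
        stack[top++] = input[i];            /* shift */
        while (reduce_once(stack, &top))    /* reduce while any rule matches */
            ;
    }
    return top == 1 && stack[0] == SYM_S;
}
```

The reductions happen in the reverse order of a rightmost derivation: for the input NOUN VERB, the parser first reduces NOUN to NP and then NP VERB to S.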
26. Programming language used: C and C++
Database used: Linux file system, indexed
Data structures: array, linked list, structure, tree,
indexing and hashing
INPUT: A Sanskrit sentence or paragraph
e.g. [example sentence in Devanagari missing]
OUTPUT: Recognize all the parts of speech and
form a tree structure to be able to understand the
sentence.
27. ::: this is a avyaya.. and the meaning is: where_there
::: Nominative,Singular, Gender-Masculine ,noun and the root is: and the meaning is Ram
::: The root is: the meaning is: go present-tense,first-person,singular
::: this is a avyaya.. and the meaning is: there
::: Nominative,Plural Gender-Masculine ,noun ,and the root is: and the meaning is god
::: Instrumental,Singular, Gender-Masculine ,noun, and the root is: and the meaning is boy
::: Accusative,Singular, Gender-Feminine ,noun and the root is: and the meaning is river
28. Avyaya words (indeclinables) are used to connect two or
more simple sentences. Examples, listed by meaning:
(if-then)
(where-there)
(but)
(hence)
(provided, if)
Avyaya not only connect sentences but
also affect the structure of a simple sentence.
29. Every word encountered in the input sentence could be
any part of speech, as Sanskrit has no fixed word
ordering.
Because of this property,
searching becomes important.
The database and word collection were in Unicode
format, so the size of each word becomes even larger.
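Since every word has to be searched for and Unicode words are stored as multi-byte strings, one way to keep lookups fast is a hash table keyed on the raw bytes of the word. The sketch below is an assumption about how such an index could be built, not the project's actual database: it uses the well-known djb2 string hash with linear probing, and the names `WordIndex`, `index_put`, and `index_get` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64   /* must be larger than the number of stored words */

typedef struct {
    const char *key;      /* word as a NUL-terminated byte string (e.g. UTF-8) */
    const char *meaning;
} Slot;

typedef struct {
    Slot slots[TABLE_SIZE];
} WordIndex;

/* djb2 hash over the raw bytes, so multi-byte characters need no decoding. */
static unsigned long hash_bytes(const char *s)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        h = h * 33 + *p;
    return h;
}

/* Mark every slot as empty. */
void index_init(WordIndex *ix)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        ix->slots[i].key = NULL;
        ix->slots[i].meaning = NULL;
    }
}

/* Insert or update a word; returns 1 on success, 0 if the table is full. */
int index_put(WordIndex *ix, const char *key, const char *meaning)
{
    size_t i = hash_bytes(key) % TABLE_SIZE;
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        Slot *s = &ix->slots[i];
        if (s->key == NULL || strcmp(s->key, key) == 0) {
            s->key = key;
            s->meaning = meaning;
            return 1;
        }
        i = (i + 1) % TABLE_SIZE;       /* linear probing on collision */
    }
    return 0;
}

/* Return the stored meaning, or NULL if the word is not indexed. */
const char *index_get(const WordIndex *ix, const char *key)
{
    size_t i = hash_bytes(key) % TABLE_SIZE;
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        const Slot *s = &ix->slots[i];
        if (s->key == NULL)
            return NULL;                /* empty slot ends the probe chain */
        if (strcmp(s->key, key) == 0)
            return s->meaning;
        i = (i + 1) % TABLE_SIZE;
    }
    return NULL;
}
```

The table stores pointers only, so the key and meaning strings must stay alive for as long as the index is used. Deletion is left out of this sketch.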
30. Grammar of Sanskrit language
How can we represent it in BNF grammar?
Parser techniques
Structure of code
31. A big chunk of our time was invested in researching the
Sanskrit language and its grammar, which was quite
difficult.
So far we have implemented the lexer and parser parts.
32. Sanskrit & Artificial Intelligence — NASA
Knowledge Representation in Sanskrit and Artificial Intelligence
by Rick Briggs
http://www.vedicsciences.net/articles/sanskrit-nasa.html
AI Magazine publishes the importance of Sanskrit
http://www.parankusa.org/SanskritAsProgramming.pdf
http://sanskrit.jnu.ac.in/morph/analyze.jsp
http://en.wikipedia.org/wiki/Sanskrit_verbs
http://en.wikipedia.org/wiki/Sanskrit_grammar