SlideShare une entreprise Scribd logo
1  sur  4
Télécharger pour lire hors ligne
International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume: 3 | Issue: 4 | May-Jun 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 - 6470
@ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 578
Phrase Structure Identification and
Classification of Sentences using Deep Learning
Hashi Haris1, Misha Ravi2
1M.Tech Student, 2Assistant Professor
1,2Department of Computer Science & Engineering,
1,2Sree Buddha College of Engineering, Pathanamthitta, Kerala, India
How to cite this paper: Hashi Haris |
Misha Ravi "Phrase Structure
Identification and Classification of
Sentences using Deep Learning"
Published in International Journal of
Trend in Scientific Research and
Development
(ijtsrd), ISSN: 2456-
6470, Volume-3 |
Issue-4, June 2019,
pp.578-581, URL:
https://www.ijtsrd.c
om/papers/ijtsrd23
841.pdf
Copyright © 2019 by author(s) and
International Journal of Trend in
Scientific Research and Development
Journal. This is an Open Access article
distributed under
the terms of the
Creative Commons
Attribution License (CC BY 4.0)
(http://creativecommons.org/licenses/
by/4.0)
ABSTRACT
Phrase structure is the arrangement of words in a specific order based on the
constraints of a specified language. This arrangement is based on some phrase
structure rules which are according to the productions in context freegrammar.
The identification of the phrase structure can be done by breaking the specified
natural language sentence into its constituents that may be lexical and phrasal
categories. These phrase structures can be identified using parsing of the
sentences which is nothing but syntactic analysis. The proposed system deals
with this problem using Deep Learning strategy. Instead of using Rule Based
technique, supervised learning with sequence labelling is done using IOB
labelling. This is a sequence classification problem which has been trained and
modeled using RNN-LSTM. The proposed work has shown a considerable result
and can be applied in many applications of NLP.
Keywords: Deep Learning, Neural Network, Natural Language Processing, Phrase
Structure, Artificial Intelligence
INTRODUCTION
We are living in an era in which technology is flourishing at a faster pace. The
world is becoming automated by the rise of Artificial Intelligence. Thisneweraof
AI has given birth to very brilliant automated devices and robots which are
equivalent to human beings and possesses almostallcapabilitiesof ahuman.This
ability is due to the advent of deep learning technology. The Artificial Neural
Network concept has been under research and study since the late 1940’s.
Inspired from the structure and processing of the Biological neuron, researcher
studied and paved way for many applications that can be
implemented using the ANNs.
Artificial Neural Networkasthenamesuggests“neural”these
are the systems inspired from human brain which intend to
replicate the strategy that we humans use to learn things.
Neural Networks consist of input, output and hidden layers.
The hidden layers are responsible for the transformation of
input data into a form that can be used by the output layer.
These have become a part of ArtificialIntelligencemainlydue
to the advent of “Back propagation”. This technique allows
network to adjust the nodes in the hidden layers when the
predicted outcome shows deviations from the expected
outcome. Another important advancement is the arrival of
deep neuralnets. Deepneuralnetworkshavemultiplehidden
layers which help the neurons to extractmore features along
with feature engineering in order to get the desired output.
The input data is passed through multiple hidden layers and
within each layer more features are extracted and the output
is obtained. The deviation in the predicted output from the
target outputnamed as theerror would be rectifiedusingthe
backpropagation strategy until the target output is obtained
or error is reduced. Then the network would perform the
tasks without the human intervention. The strategiesusedto
learn and train the network are supervised, unsupervised or
reinforcement learning technique. In case of supervised
learning technique, the inputs are given with certain labels
and based on those labels, training occurs. In case of
unsupervised learning technique, no special features except
the input is provided to the network and the network learns
different patterns and features on its own and produces the
output. Reinforcement learning uses some rewards and
penalty techniques for the network to learn.
Phrase structure is the arrangement of words in a specific
order based on the constraints of a specified language. This
arrangement is based on some phrase structure rules which
are according to the productions in context free grammar.
The identification of the phrase structure can be done by
breaking the specified natural language sentence into its
constituents that may belexicalandphrasalcategories.These
phrases are tagged using IOBsequencelabelling.Thephrases
are classified into Noun Phrase and Verb Phrase. Many of the
NLP works have beendoneinlanguageslikeEnglish,Chinese,
etc. But very few works have been done in Indian Languages
especially South Dravidian languages like Malayalam. Due to
lack of resources such languages are resource poor and
corpus-based NLP tasks cannot be done. Languages like
Malayalam have variations in morphology as compared to
other languages like English or Hindi. The nouns have
IJTSRD23841
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 579
inflections due to to case, gender and number information.
The verbs are inflected due to tense, aspect and mood
information [1].
The state of the art is to identify the phrase structure of
Malayalam language by breaking the sentence into phrases
mainly Noun Phrase and Verb Phrase and tag the phrases
using suitable IOB tags. Sequence labelling is the strategy
applied in NLP for various applications such as named entity
recognition, the proposed work is done by applying this
strategy. The identification and classificationhavebeendone
using deep learning. Several deep neural networks are
available but for this purpose RNN (Recurrent Neural
Network) is used since it is proved to be efficient in sentence
structure representations. Long Short-Term Memory Units
(LSTM) which is a variety of RNN is used so as to overcome
some drawbacks of RNN.
A. RELATED WORKS
1. Convolutional Neural Networks
Convolutional neural networks have been used in manyNLP
tasks so far such as the pos tagging, chunking, sentence
modeling, etc. CNNs are designed in such a way that can
capture more important features from sentences. Works
have been reported in literature on sentence classification.
Such a work done by Kim [2] on sentence classificationtasks
like question type classification, sentiment, subjectivity
classification. But this vanilla network had difficulty in
modeling long distance dependencies. This issue wassolved
by designing a dynamic convolutional neuralnetwork.CNNs
require large training data in developing semantic models
with contextual window. This becomes an issue when data
scarcity occurs. Another issue with CNNs is they find
difficulty in preserving sequential data.
2. Sentence ordering and coherence modeling using
RNN
Modeling the structure of coherent texts is a key NLP
problem. The task of coherently organizing a given set of
sentences has been commonly used to build and evaluate
models that understand such structure. An end-to-end
unsupervised deep learning approach based on the set-to-
sequence framework is proposed to address this problem
[3]. RNNs are now the dominant approach to sequence
learning and mapping problems. An RNN-based approach to
the sentence ordering problem which exploits the set-to-
sequence framework of Vinyals, Bengio, and Kudlur (2015)
is proposed. The model is based on the read, process and
write framework. The model is comprised of a sentence
encoder RNN, an encoder RNN and a decoder RNN.
3. Deep learning for sentence classification
Neural network-based methods have obtained great
progress on a variety of natural language processing tasks.
The primary role of the neural models is to represent the
variable-length text as a fixed-length vector [4]. These
models generally consist of a projection layer that maps
words, sub-word units or n-grams to vector representations
(often trained beforehand with unsupervisedmethods),and
then combine them with the differentarchitecturesof neural
networks. Recurrent neural networks (RNN) are one of the
most popular architectures used in NLP problems because
their recurrent structure is very suitable to process the
variable-length text. A recurrent neural network (RNN)
[Elman, 1990] is able to process a sequence of arbitrary
length by recursively applying a transition function to its
internal hidden state vector ht of the input sequence. The
activation of the hidden state ht at time-step tiscomputed as
a function f of the current input symbol xt and the previous
hidden state ht-1. A simple strategy for modeling sequence is
to map the input sequence to a fixed-sized vector using one
RNN, and then to feed the vector to a softmax layer for
classification or other tasks.
B. PROPOSED SYSTEM
The proposed system titled “Phrase Structure Identification
and Classification of sentences using Deep Learning” for
Indian languages identifies the phrases within a Malayalam
sentence and classifies them into Noun and Verb Phrase.
Phrase Structure Grammar is an inevitable part of a
language. It helps to correctly frame a sentence in an order
based on the phrase structure rules. The proposed system
has a modern solution to the identification and classification
of phrases without the help of rules and no linguistic
knowledge is necessary which is possible through the most
trending hot technology named “Deep Learning”. The
relevance of phrase structure identification is checking the
grammar of a natural language becomes more accurate and
efficient if the phrases are identified correctly and it helps to
track the error within phrase level and it’s easy to rectify
those error phrases. Instead of taking into considerationthe
entire document or sentence, phrase level identification
helps in easy error tracking and rectification of grammar.
Figure B.1: Architecture of Proposed System
Phrase Structure identification and classification has been
done by the combination of two modules:
1. Data Preprocessing
The proposed system can be generalized into a Text
Classification problem. Text Classification is a common task
in NLP which would transform a sequence of text with
indefinite length into a category of text. Data Preprocessing
is a major task in all Deep Learning applications. The input
data fed into the deep neural net must be in a pattern
compatible to the network. Basically, data preprocessing is
undergone through a series of steps like cleaning,stemming,
stop word removal etc. In the proposed system, the dataset
consists of Malayalam sentences. The preprocessing done
here is splitting the sentences into phrases. Each sentence
has a sentenced and its constituent phrases also have the
same sentence id. The dataset is represented using pandas
data frame. The work has been focused onwordlengththree
Malayalam sentences from health and newspaper domain.
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 580
Next a representation of the input text is created so that it
could be fed into the deep learning model.Thephrasesinthe
text are represented asvectors.Thephrasesaremapped into
a dictionary with a sequence of numbers and these numbers
are the vector representation of the phrases. These phrases
are then padded. Since it is classification model the phrases
are tagged into Noun and Verb phrase. These tag sequences
are represented as dictionary with sequenceof numbersand
then padded. To train the model the tag sequences or the
labels are represented in a categorical format. The Noun
phrase and Verb phrase are labelled as NP and VP. Sequence
labelling is the strategy applied with IOB labels I-refers to
inside of a chunk, O- outside of a chunk, B- beginning of a
chunk. The proposed model is trained using supervised
learning and it’s a sequence or text classification task so the
labels are already tagged and the dataset is prepared in a
compatible manner to the DNN model.
2. Phrase Structure Identification and Classification
Training and Testing
The data that is given into the model has been preprocessed,
so the next task in a Deep Learning applicationis buildingup
a model. Recurrent Neural Networks are known to handle
sequential data and has been widelyused and recommended
in literature for text classification tasks. Recurrent Neural
Networks are nothing but a chunk of neural network into
which input data is given and it outputs a hidden layer. This
single chunk of neural network represented as a loop forms
a Recurrent Neural Network as the name suggests, it is
recurrent in nature. But RNN’s have a drawback of vanishing
and exploding gradient problem. The neural networks work
based on a Backpropagation algorithm. This is done mainly
to backtrack the network and rectify the error. At thetimeof
backtracking the network, some gradient values get
vanished or some of them increase exponentially which
would lead to fault in prediction of the output. In theory
RNN’s can handle long term dependencies but in practice it
is found that RNN’s lack that ability. RNN’s can predict the
previous or the next information but it cannot accurately
predict the word or information that are at a larger distance
from the current input word, this is the long-term
dependency problem. To overcome these kinds of issues
persisting in RNN’s a variant of RNN named Long Short-
Term Memory Units (LSTM) are used.
C. BIDIRECTIONAL LSTM’S
Bidirectional LSTMs are an extension of traditional LSTMs
that can improve model performance on sequence
classification problems. In problems where all timesteps of
the input sequence are available, Bidirectional LSTMs train
two instead of one LSTMs on the input sequence.Thefirst on
the input sequence as is and the second on a reversed copy
of the input sequence. This can provide additionalcontextto
the network and result in faster and even fuller learning on
the problem. LSTM in its core, preserves information from
inputs that has already passed through it using the hidden
state. Unidirectional LSTM only preservesinformationofthe
past because the only inputs it has seen are from the past.
Using bidirectional will run the inputs in two ways,onefrom
past to future and one from future to past and what differs
this approach from unidirectional is that, in the LSTM that
runs backwards we can preserve information from the
future and using the two hidden statescombined weareable
in any point in time to preserve information from both past
and future .BiLSTMs show very good results as they can
understand context better.
D. EXPERIMENTAL ANALYSIS
The Phrase Structure Classification model is definedthenthe
model is compiled and fit the model with all
hyperparameters. The first layer is an Embedding layer to
which the input is mapped. A functional Keras API is used to
build the model which uses Bidirectional LSTM with 100
memory units and recurrent dropout of 0.1. The activation
function used is softmax since it is suitable for binary
classification problems. The optimizer used is adam
optimizer which is widely used with binary cross entropy as
loss function. The model is compiled using the above
hyperparameters and it is fitted with batchsizeof32and100
epochs. Once the model is trained, the model is undergone
testingby giving as inputtestdatawhicharealsophrasesand
the model would classify the phrases by predicting them as
Noun Phrase and Verb Phrase.
For development of the proposed system following are the
requirements:
TensorFlow is an open-source software library used to
develop and deploy machine learning applications. Keras is
an open source neural network library written in Python. It
is designed for faster implementation of deep neural
networks. It is user friendly, modularand extensible.Jupyter
notebook is an interactive development environment to
designed to support interactive data science and scientific
computing across all programming languages. Pandas is a
software library designed for the Python programming
language for data manipulation and analysis. NumPy is a
Python programming language library thatdeals withmulti-
dimensional arrays, matrices and various mathematical
functions for operating on these arrays and matrices. The
performance required for the proposed system are:
Processor – Intel core i3 or higher RAM – 4GB or higher,
Speed – 1.80 Ghz or higher, OS – Ubuntu 16 orhigher,MacOS,
OS type-64-bit, Programming language – Python 3.6, Disk
space – 491.2 GB or higher.
E. Results
This section discusses the experimental results of the
proposed system. First of all, the proposed system uses
original Malayalam documents given by CDAC, Trivandrum
for results assessment. The proposed system is composed of
two modules. The first module named Data Preprocessing
module deals with converting Malayalam sentences into
phrases and labelling them with IOB tags. For word
embedding skip gram and cbow models were also
implemented. For the proposed system the embedding is
done by just encoding the input phrases into some integer
vectors. The second module which is the trainingandtesting
phase of the proposed model named Phrase structure
identification and classification training and testing deals
with the development of the model which provided an
accuracy of 100% for 210 sentence phrases with validation
accuracy of 100% with training loss of 1.1286e-04 and
validation loss of 0.0395.
F. Analysis
The proposed system has been modeled as a Sequence
Classification Problem using a sequence labelling strategy
with RNN-Bidirectional LSTM. The following graphs show
the loss and accuracy curves.
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 581
Figure D.b.1: Loss Curve
Figure D.b.2: Accuracy Curve
G. CONCLUSION & FUTURE WORKS
Deep learning has been widely used in many NLP tasks as it
needs only little engineering by hand. The proposed system
is basically a sequence classification problem under NLP
which has been solved by applying supervised learning
strategy. RNN with LSTM has been used to train the model.
The proposed problem hasbeen doneonlyfor200sentences
which is just a miniature version of theproblemandit can be
further improved with the help of more data. The Sequence
IOB labelling technique with Bidirectionallstms havegiven a
considerable accuracy with small loss percentage. The
proposed system can be further improved and used in the
development of many NLP tools.This strategycan beapplied
for other languages also. But still there are various areas in
which deep learning is still in its childhoodstageasincase of
processing with unlabeled data. But with the developing
researches it is expected that deep learning would become
more enhanced and can be applied in different areas.
H. ACKNOWLEDGMENT
The proposed work is an R&D project of CDAC-TVM. The
work was done under supervision of Knowledge Resource
Centre of CDAC-TVM. A sincere gratitude to CDAC-TVM for
providing an opportunity to work on their project, giving
permission to access their resources and guiding for the
successful completion of this work.
I. REFERENCES
[1] Latha R Nair and David peter S “Language Parsing and
Syntax of Malayalam Language” 2nd International
Symposium on Computer,Communication, Controland
Automation (3CA 2013)
[2] Y. Kim, “Convolutional neural networks for sentence
classification,” arXiv preprint arXiv:1408.5882, 2014.
[3] Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
“Sentence Ordering and Coherence Modeling using
Recurrent Neural Networks”
[4] Pengfei Liu Xipeng Qiu and Xuanjing Huang“Recurrent
Neural Network for Text Classificationwith Multi-Task
Learning “Proceedings of the Twenty-Fifth
International Joint Conference onArtificialIntelligence

Contenu connexe

Tendances

Project report - Bengali digit recongnition using SVM
Project report - Bengali digit recongnition using SVMProject report - Bengali digit recongnition using SVM
Project report - Bengali digit recongnition using SVM
Mohammad Saiful Islam
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
Waqas Tariq
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
kevig
 

Tendances (19)

A prior case study of natural language processing on different domain
A prior case study of natural language processing  on different domain A prior case study of natural language processing  on different domain
A prior case study of natural language processing on different domain
 
Project report - Bengali digit recongnition using SVM
Project report - Bengali digit recongnition using SVMProject report - Bengali digit recongnition using SVM
Project report - Bengali digit recongnition using SVM
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
 
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATANAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
 
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
 
Applsci 09-02758
Applsci 09-02758Applsci 09-02758
Applsci 09-02758
 
Nonparametric Bayesian Word Discovery for Symbol Emergence in Robotics
Nonparametric Bayesian Word Discovery for Symbol Emergence in RoboticsNonparametric Bayesian Word Discovery for Symbol Emergence in Robotics
Nonparametric Bayesian Word Discovery for Symbol Emergence in Robotics
 
Semantic Segmentation of Driving Behavior Data: Double Articulation Analyzer ...
Semantic Segmentation of Driving Behavior Data: Double Articulation Analyzer ...Semantic Segmentation of Driving Behavior Data: Double Articulation Analyzer ...
Semantic Segmentation of Driving Behavior Data: Double Articulation Analyzer ...
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficulties
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
 
A Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition SystemA Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition System
 
Rule based algorithm for handwritten characters recognition
Rule based algorithm for handwritten characters recognitionRule based algorithm for handwritten characters recognition
Rule based algorithm for handwritten characters recognition
 
Detection of multiword from a wordnet is complex
Detection of multiword from a wordnet is complexDetection of multiword from a wordnet is complex
Detection of multiword from a wordnet is complex
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
An Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity RemovalAn Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity Removal
 
New instances classification framework on Quran ontology applied to question ...
New instances classification framework on Quran ontology applied to question ...New instances classification framework on Quran ontology applied to question ...
New instances classification framework on Quran ontology applied to question ...
 

Similaire à Phrase Structure Identification and Classification of Sentences using Deep Learning

Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
IJTET Journal
 

Similaire à Phrase Structure Identification and Classification of Sentences using Deep Learning (20)

IRJET- Survey on Text Error Detection using Deep Learning
IRJET-  	  Survey on Text Error Detection using Deep LearningIRJET-  	  Survey on Text Error Detection using Deep Learning
IRJET- Survey on Text Error Detection using Deep Learning
 
LSTM Based Sentiment Analysis
LSTM Based Sentiment AnalysisLSTM Based Sentiment Analysis
LSTM Based Sentiment Analysis
 
Eat it, Review it: A New Approach for Review Prediction
Eat it, Review it: A New Approach for Review PredictionEat it, Review it: A New Approach for Review Prediction
Eat it, Review it: A New Approach for Review Prediction
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
 
Survey on Text Prediction Techniques
Survey on Text Prediction TechniquesSurvey on Text Prediction Techniques
Survey on Text Prediction Techniques
 
Pattern Recognition using Artificial Neural Network
Pattern Recognition using Artificial Neural NetworkPattern Recognition using Artificial Neural Network
Pattern Recognition using Artificial Neural Network
 
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKSENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
 
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGESYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Balochi Language Text Classification Using Deep Learning 1.pptx
Balochi Language Text Classification Using Deep Learning 1.pptxBalochi Language Text Classification Using Deep Learning 1.pptx
Balochi Language Text Classification Using Deep Learning 1.pptx
 
An Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingAn Overview Of Natural Language Processing
An Overview Of Natural Language Processing
 
NLP Techniques for Text Classification.docx
NLP Techniques for Text Classification.docxNLP Techniques for Text Classification.docx
NLP Techniques for Text Classification.docx
 
Semi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term SetSemi Automated Text Categorization Using Demonstration Based Term Set
Semi Automated Text Categorization Using Demonstration Based Term Set
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
NLP Techniques for Named Entity Recognition.docx
NLP Techniques for Named Entity Recognition.docxNLP Techniques for Named Entity Recognition.docx
NLP Techniques for Named Entity Recognition.docx
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAM
 

Plus de ijtsrd

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation
ijtsrd
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
ijtsrd
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
ijtsrd
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
ijtsrd
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Study
ijtsrd
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
ijtsrd
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...
ijtsrd
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
ijtsrd
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
ijtsrd
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
ijtsrd
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
ijtsrd
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
ijtsrd
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Map
ijtsrd
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Society
ijtsrd
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
ijtsrd
 

Plus de ijtsrd (20)

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation
 
Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
 
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Study
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
 
The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Map
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Society
 
Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
 
Streamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine LearningStreamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine Learning
 

Dernier

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Dernier (20)

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 

Phrase Structure Identification and Classification of Sentences using Deep Learning

  • 1. International Journal of Trend in Scientific Research and Development (IJTSRD) Volume: 3 | Issue: 4 | May-Jun 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 - 6470 @ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 578 Phrase Structure Identification and Classification of Sentences using Deep Learning Hashi Haris1, Misha Ravi2 1M.Tech Student, 2Assistant Professor 1,2Department of Computer Science & Engineering, 1,2Sree Buddha College of Engineering, Pathanamthitta, Kerala, India How to cite this paper: Hashi Haris | Misha Ravi "Phrase Structure Identification and Classification of Sentences using Deep Learning" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456- 6470, Volume-3 | Issue-4, June 2019, pp.578-581, URL: https://www.ijtsrd.c om/papers/ijtsrd23 841.pdf Copyright © 2019 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/ by/4.0) ABSTRACT Phrase structure is the arrangement of words in a specific order based on the constraints of a specified language. This arrangement is based on some phrase structure rules which are according to the productions in context freegrammar. The identification of the phrase structure can be done by breaking the specified natural language sentence into its constituents that may be lexical and phrasal categories. These phrase structures can be identified using parsing of the sentences which is nothing but syntactic analysis. The proposed system deals with this problem using Deep Learning strategy. Instead of using Rule Based technique, supervised learning with sequence labelling is done using IOB labelling. This is a sequence classification problem which has been trained and modeled using RNN-LSTM. The proposed work has shown a considerable result and can be applied in many applications of NLP. Keywords: Deep Learning, Neural Network, Natural Language Processing, Phrase Structure, Artificial Intelligence INTRODUCTION We are living in an era in which technology is flourishing at a faster pace. The world is becoming automated by the rise of Artificial Intelligence. Thisneweraof AI has given birth to very brilliant automated devices and robots which are equivalent to human beings and possesses almostallcapabilitiesof ahuman.This ability is due to the advent of deep learning technology. The Artificial Neural Network concept has been under research and study since the late 1940’s. Inspired from the structure and processing of the Biological neuron, researcher studied and paved way for many applications that can be implemented using the ANNs. Artificial Neural Networkasthenamesuggests“neural”these are the systems inspired from human brain which intend to replicate the strategy that we humans use to learn things. Neural Networks consist of input, output and hidden layers. The hidden layers are responsible for the transformation of input data into a form that can be used by the output layer. These have become a part of ArtificialIntelligencemainlydue to the advent of “Back propagation”. This technique allows network to adjust the nodes in the hidden layers when the predicted outcome shows deviations from the expected outcome. Another important advancement is the arrival of deep neuralnets. Deepneuralnetworkshavemultiplehidden layers which help the neurons to extractmore features along with feature engineering in order to get the desired output. The input data is passed through multiple hidden layers and within each layer more features are extracted and the output is obtained. The deviation in the predicted output from the target outputnamed as theerror would be rectifiedusingthe backpropagation strategy until the target output is obtained or error is reduced. Then the network would perform the tasks without the human intervention. The strategiesusedto learn and train the network are supervised, unsupervised or reinforcement learning technique. In case of supervised learning technique, the inputs are given with certain labels and based on those labels, training occurs. In case of unsupervised learning technique, no special features except the input is provided to the network and the network learns different patterns and features on its own and produces the output. Reinforcement learning uses some rewards and penalty techniques for the network to learn. Phrase structure is the arrangement of words in a specific order based on the constraints of a specified language. This arrangement is based on some phrase structure rules which are according to the productions in context free grammar. The identification of the phrase structure can be done by breaking the specified natural language sentence into its constituents that may belexicalandphrasalcategories.These phrases are tagged using IOBsequencelabelling.Thephrases are classified into Noun Phrase and Verb Phrase. Many of the NLP works have beendoneinlanguageslikeEnglish,Chinese, etc. But very few works have been done in Indian Languages especially South Dravidian languages like Malayalam. Due to lack of resources such languages are resource poor and corpus-based NLP tasks cannot be done. Languages like Malayalam have variations in morphology as compared to other languages like English or Hindi. The nouns have IJTSRD23841
  • 2. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 579 inflections due to to case, gender and number information. The verbs are inflected due to tense, aspect and mood information [1]. The state of the art is to identify the phrase structure of Malayalam language by breaking the sentence into phrases mainly Noun Phrase and Verb Phrase and tag the phrases using suitable IOB tags. Sequence labelling is the strategy applied in NLP for various applications such as named entity recognition, the proposed work is done by applying this strategy. The identification and classificationhavebeendone using deep learning. Several deep neural networks are available but for this purpose RNN (Recurrent Neural Network) is used since it is proved to be efficient in sentence structure representations. Long Short-Term Memory Units (LSTM) which is a variety of RNN is used so as to overcome some drawbacks of RNN. A. RELATED WORKS 1. Convolutional Neural Networks Convolutional neural networks have been used in manyNLP tasks so far such as the pos tagging, chunking, sentence modeling, etc. CNNs are designed in such a way that can capture more important features from sentences. Works have been reported in literature on sentence classification. Such a work done by Kim [2] on sentence classificationtasks like question type classification, sentiment, subjectivity classification. But this vanilla network had difficulty in modeling long distance dependencies. This issue wassolved by designing a dynamic convolutional neuralnetwork.CNNs require large training data in developing semantic models with contextual window. This becomes an issue when data scarcity occurs. Another issue with CNNs is they find difficulty in preserving sequential data. 2. Sentence ordering and coherence modeling using RNN Modeling the structure of coherent texts is a key NLP problem. The task of coherently organizing a given set of sentences has been commonly used to build and evaluate models that understand such structure. An end-to-end unsupervised deep learning approach based on the set-to- sequence framework is proposed to address this problem [3]. RNNs are now the dominant approach to sequence learning and mapping problems. An RNN-based approach to the sentence ordering problem which exploits the set-to- sequence framework of Vinyals, Bengio, and Kudlur (2015) is proposed. The model is based on the read, process and write framework. The model is comprised of a sentence encoder RNN, an encoder RNN and a decoder RNN. 3. Deep learning for sentence classification Neural network-based methods have obtained great progress on a variety of natural language processing tasks. The primary role of the neural models is to represent the variable-length text as a fixed-length vector [4]. These models generally consist of a projection layer that maps words, sub-word units or n-grams to vector representations (often trained beforehand with unsupervisedmethods),and then combine them with the differentarchitecturesof neural networks. Recurrent neural networks (RNN) are one of the most popular architectures used in NLP problems because their recurrent structure is very suitable to process the variable-length text. A recurrent neural network (RNN) [Elman, 1990] is able to process a sequence of arbitrary length by recursively applying a transition function to its internal hidden state vector ht of the input sequence. The activation of the hidden state ht at time-step tiscomputed as a function f of the current input symbol xt and the previous hidden state ht-1. A simple strategy for modeling sequence is to map the input sequence to a fixed-sized vector using one RNN, and then to feed the vector to a softmax layer for classification or other tasks. B. PROPOSED SYSTEM The proposed system titled “Phrase Structure Identification and Classification of sentences using Deep Learning” for Indian languages identifies the phrases within a Malayalam sentence and classifies them into Noun and Verb Phrase. Phrase Structure Grammar is an inevitable part of a language. It helps to correctly frame a sentence in an order based on the phrase structure rules. The proposed system has a modern solution to the identification and classification of phrases without the help of rules and no linguistic knowledge is necessary which is possible through the most trending hot technology named “Deep Learning”. The relevance of phrase structure identification is checking the grammar of a natural language becomes more accurate and efficient if the phrases are identified correctly and it helps to track the error within phrase level and it’s easy to rectify those error phrases. Instead of taking into considerationthe entire document or sentence, phrase level identification helps in easy error tracking and rectification of grammar. Figure B.1: Architecture of Proposed System Phrase Structure identification and classification has been done by the combination of two modules: 1. Data Preprocessing The proposed system can be generalized into a Text Classification problem. Text Classification is a common task in NLP which would transform a sequence of text with indefinite length into a category of text. Data Preprocessing is a major task in all Deep Learning applications. The input data fed into the deep neural net must be in a pattern compatible to the network. Basically, data preprocessing is undergone through a series of steps like cleaning,stemming, stop word removal etc. In the proposed system, the dataset consists of Malayalam sentences. The preprocessing done here is splitting the sentences into phrases. Each sentence has a sentenced and its constituent phrases also have the same sentence id. The dataset is represented using pandas data frame. The work has been focused onwordlengththree Malayalam sentences from health and newspaper domain.
  • 3. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 580 Next a representation of the input text is created so that it could be fed into the deep learning model.Thephrasesinthe text are represented asvectors.Thephrasesaremapped into a dictionary with a sequence of numbers and these numbers are the vector representation of the phrases. These phrases are then padded. Since it is classification model the phrases are tagged into Noun and Verb phrase. These tag sequences are represented as dictionary with sequenceof numbersand then padded. To train the model the tag sequences or the labels are represented in a categorical format. The Noun phrase and Verb phrase are labelled as NP and VP. Sequence labelling is the strategy applied with IOB labels I-refers to inside of a chunk, O- outside of a chunk, B- beginning of a chunk. The proposed model is trained using supervised learning and it’s a sequence or text classification task so the labels are already tagged and the dataset is prepared in a compatible manner to the DNN model. 2. Phrase Structure Identification and Classification Training and Testing The data that is given into the model has been preprocessed, so the next task in a Deep Learning applicationis buildingup a model. Recurrent Neural Networks are known to handle sequential data and has been widelyused and recommended in literature for text classification tasks. Recurrent Neural Networks are nothing but a chunk of neural network into which input data is given and it outputs a hidden layer. This single chunk of neural network represented as a loop forms a Recurrent Neural Network as the name suggests, it is recurrent in nature. But RNN’s have a drawback of vanishing and exploding gradient problem. The neural networks work based on a Backpropagation algorithm. This is done mainly to backtrack the network and rectify the error. At thetimeof backtracking the network, some gradient values get vanished or some of them increase exponentially which would lead to fault in prediction of the output. In theory RNN’s can handle long term dependencies but in practice it is found that RNN’s lack that ability. RNN’s can predict the previous or the next information but it cannot accurately predict the word or information that are at a larger distance from the current input word, this is the long-term dependency problem. To overcome these kinds of issues persisting in RNN’s a variant of RNN named Long Short- Term Memory Units (LSTM) are used. C. BIDIRECTIONAL LSTM’S Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems. In problems where all timesteps of the input sequence are available, Bidirectional LSTMs train two instead of one LSTMs on the input sequence.Thefirst on the input sequence as is and the second on a reversed copy of the input sequence. This can provide additionalcontextto the network and result in faster and even fuller learning on the problem. LSTM in its core, preserves information from inputs that has already passed through it using the hidden state. Unidirectional LSTM only preservesinformationofthe past because the only inputs it has seen are from the past. Using bidirectional will run the inputs in two ways,onefrom past to future and one from future to past and what differs this approach from unidirectional is that, in the LSTM that runs backwards we can preserve information from the future and using the two hidden statescombined weareable in any point in time to preserve information from both past and future .BiLSTMs show very good results as they can understand context better. D. EXPERIMENTAL ANALYSIS The Phrase Structure Classification model is definedthenthe model is compiled and fit the model with all hyperparameters. The first layer is an Embedding layer to which the input is mapped. A functional Keras API is used to build the model which uses Bidirectional LSTM with 100 memory units and recurrent dropout of 0.1. The activation function used is softmax since it is suitable for binary classification problems. The optimizer used is adam optimizer which is widely used with binary cross entropy as loss function. The model is compiled using the above hyperparameters and it is fitted with batchsizeof32and100 epochs. Once the model is trained, the model is undergone testingby giving as inputtestdatawhicharealsophrasesand the model would classify the phrases by predicting them as Noun Phrase and Verb Phrase. For development of the proposed system following are the requirements: TensorFlow is an open-source software library used to develop and deploy machine learning applications. Keras is an open source neural network library written in Python. It is designed for faster implementation of deep neural networks. It is user friendly, modularand extensible.Jupyter notebook is an interactive development environment to designed to support interactive data science and scientific computing across all programming languages. Pandas is a software library designed for the Python programming language for data manipulation and analysis. NumPy is a Python programming language library thatdeals withmulti- dimensional arrays, matrices and various mathematical functions for operating on these arrays and matrices. The performance required for the proposed system are: Processor – Intel core i3 or higher RAM – 4GB or higher, Speed – 1.80 Ghz or higher, OS – Ubuntu 16 orhigher,MacOS, OS type-64-bit, Programming language – Python 3.6, Disk space – 491.2 GB or higher. E. Results This section discusses the experimental results of the proposed system. First of all, the proposed system uses original Malayalam documents given by CDAC, Trivandrum for results assessment. The proposed system is composed of two modules. The first module named Data Preprocessing module deals with converting Malayalam sentences into phrases and labelling them with IOB tags. For word embedding skip gram and cbow models were also implemented. For the proposed system the embedding is done by just encoding the input phrases into some integer vectors. The second module which is the trainingandtesting phase of the proposed model named Phrase structure identification and classification training and testing deals with the development of the model which provided an accuracy of 100% for 210 sentence phrases with validation accuracy of 100% with training loss of 1.1286e-04 and validation loss of 0.0395. F. Analysis The proposed system has been modeled as a Sequence Classification Problem using a sequence labelling strategy with RNN-Bidirectional LSTM. The following graphs show the loss and accuracy curves.
  • 4. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID - IJTSRD23841 | Volume – 3 | Issue – 4 | May-Jun 2019 Page: 581 Figure D.b.1: Loss Curve Figure D.b.2: Accuracy Curve G. CONCLUSION & FUTURE WORKS Deep learning has been widely used in many NLP tasks as it needs only little engineering by hand. The proposed system is basically a sequence classification problem under NLP which has been solved by applying supervised learning strategy. RNN with LSTM has been used to train the model. The proposed problem hasbeen doneonlyfor200sentences which is just a miniature version of theproblemandit can be further improved with the help of more data. The Sequence IOB labelling technique with Bidirectionallstms havegiven a considerable accuracy with small loss percentage. The proposed system can be further improved and used in the development of many NLP tools.This strategycan beapplied for other languages also. But still there are various areas in which deep learning is still in its childhoodstageasincase of processing with unlabeled data. But with the developing researches it is expected that deep learning would become more enhanced and can be applied in different areas. H. ACKNOWLEDGMENT The proposed work is an R&D project of CDAC-TVM. The work was done under supervision of Knowledge Resource Centre of CDAC-TVM. A sincere gratitude to CDAC-TVM for providing an opportunity to work on their project, giving permission to access their resources and guiding for the successful completion of this work. I. REFERENCES [1] Latha R Nair and David peter S “Language Parsing and Syntax of Malayalam Language” 2nd International Symposium on Computer,Communication, Controland Automation (3CA 2013) [2] Y. Kim, “Convolutional neural networks for sentence classification,” arXiv preprint arXiv:1408.5882, 2014. [3] Lajanugen Logeswaran, Honglak Lee, Dragomir Radev “Sentence Ordering and Coherence Modeling using Recurrent Neural Networks” [4] Pengfei Liu Xipeng Qiu and Xuanjing Huang“Recurrent Neural Network for Text Classificationwith Multi-Task Learning “Proceedings of the Twenty-Fifth International Joint Conference onArtificialIntelligence