Natural Language Processing (NLP) uses computational techniques to analyze and understand human languages. Key techniques in NLP include sentiment analysis, which classifies emotions in text; text classification, which sorts text into predefined tags or categories; and tokenization, which breaks text into discrete words and punctuation marks. NLP is used to teach machines to read and understand human languages by identifying relationships between words and entities. Other areas of NLP include part-of-speech tagging, constituent structure analysis, and the analysis of pronunciation, morphology, syntax, semantics, and pragmatics.
3. Sentiment analysis
A technique used to interpret and classify emotions in subjective data. Sentiment
analysis is often performed on textual data to detect sentiment in emails, survey
responses, social media data, and beyond.
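As a minimal sketch of one simple approach, lexicon-based sentiment analysis, the function below scores a text by counting positive and negative words and flips the polarity of a word that follows a negator such as "not". The word lists `POSITIVE`, `NEGATIVE` and `NEGATORS` are hypothetical examples, not a standard lexicon; practical systems usually rely on much larger lexicons or trained classifiers.

```python
import re

# Hypothetical mini-lexicons for illustration only
POSITIVE = {"good", "great", "happy", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "terrible", "sad", "hate", "poor", "slow"}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Classify text as positive, negative or neutral by summing word polarities."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        # +1 for a positive word, -1 for a negative word, 0 otherwise
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        # Flip the polarity when the previous word is a negator ("not good")
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The delivery was not good and really slow."))  # negative
```

The one-word negation window is a deliberate simplification: it catches "not good" but misses longer constructions such as "not at all good".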
4. Text classification
Text classification is the process of categorizing text into organized groups. By
using Natural Language Processing (NLP), text classifiers can automatically
analyze text and then assign a set of pre-defined tags or categories based on its
content.
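One common way to build such a classifier is multinomial Naive Bayes. The sketch below, using only the Python standard library, learns word counts per category from labelled examples and assigns a new text the category with the highest smoothed log-probability. The training sentences and the category names `sports` and `politics` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per category from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of training texts
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        label_total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0) for unseen word/label pairs
                count = word_counts[label][word] + 1
                score += math.log(count / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data
TRAINING = [
    ("the team won the match", "sports"),
    ("a great goal in the final minute", "sports"),
    ("the election results were announced", "politics"),
    ("the minister gave a speech", "politics"),
]
model = train(TRAINING)
print(classify("the team scored a goal", model))              # sports
print(classify("the minister announced the results", model))  # politics
```

Working in log space keeps the products of many small probabilities from underflowing to zero.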
5. NLP
Identify, Analyze, Understand and Generate human languages
Applying computational techniques to natural language
• Explain computational linguistic theories
• Apply artificial intelligence in possible contexts
• Apply statistical and mathematical models to human language use
6. NLP
NLP is used to teach a machine how to read and understand human languages.
Trained machines can extract the relationships between words, identify the
entities in a sentence (named-entity recognition), and so on.
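Real entity recognizers are usually trained statistical models. The simplest illustration of the idea, sketched below as a toy under stated assumptions, is a gazetteer (a dictionary of known names) matched against the sentence, preferring the longest match at each position. The `GAZETTEER` entries and the entity-type labels are hypothetical.

```python
# Hypothetical gazetteer: known names and their entity types
GAZETTEER = {
    "john": "PERSON",
    "barack obama": "PERSON",
    "france": "LOCATION",
    "goldsmiths": "ORGANIZATION",
}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def find_entities(sentence):
    """Return (entity, type) pairs found by longest-match dictionary lookup."""
    words = sentence.replace(",", " ").replace(".", " ").split()
    entities = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, e.g. "Barack Obama" before "Barack"
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            span = " ".join(words[i:i + n])
            label = GAZETTEER.get(span.lower())
            if label:
                entities.append((span, label))
                i += n
                break
        else:
            i += 1  # no entity starts at this word; move on
    return entities


print(find_entities("Barack Obama visited France."))
# [('Barack Obama', 'PERSON'), ('France', 'LOCATION')]
```

A gazetteer cannot recognize names it has never seen, which is the main reason statistical recognizers that use context are preferred.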
7. Tokenizing
Breaking up a stream of characters into words, punctuation marks, numbers and
other discrete items.
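A minimal regular-expression tokenizer in Python illustrates the idea. The pattern is a hypothetical choice: it keeps numbers such as `12.50` and contractions such as `doesn't` as single tokens and emits every other punctuation mark on its own. Production tokenizers handle many more cases (URLs, hyphenated words, abbreviations such as "e.g.").

```python
import re

# Numbers (optional decimal part) | words (optional apostrophe suffix) | any other non-space character
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into words, numbers and punctuation marks."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("John's dog costs $12.50, doesn't it?"))
# ["John's", 'dog', 'costs', '$', '12.50', ',', "doesn't", 'it', '?']
```

Putting the number alternative first ensures `12.50` is matched as one token before the word alternative can split it at the decimal point.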
8. Parts of speech
Noun: fish, book, house, pen, procrastination, language
Proper noun: John, France, Barack, Goldsmiths, Python
Verb: loves, hates, studies, sleeps, thinks, is, has
Adjective: grumpy, sleepy, happy, bashful
Adverb: slowly, quickly, now, here, there
Pronoun: I, you, he, she, we, us, it, they
Preposition: in, on, at, by, around, with, without
Conjunction: and, but, or, unless
Determiner: the, a, an, some, many, few, 100
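The simplest possible tagger looks each word up in lists like the ones above. The sketch below is only that baseline: a word such as "fish" is always tagged as a noun even when it is used as a verb, which is why practical taggers also use the surrounding context (for example, hidden Markov models or neural sequence models). Unknown words get the placeholder tag `UNKNOWN`, an assumption of this sketch.

```python
# Word lists from the slide above, keyed by part of speech
POS_LISTS = {
    "Noun": "fish book house pen procrastination language",
    "Proper noun": "John France Barack Goldsmiths Python",
    "Verb": "loves hates studies sleeps thinks is has",
    "Adjective": "grumpy sleepy happy bashful",
    "Adverb": "slowly quickly now here there",
    "Pronoun": "I you he she we us it they",
    "Preposition": "in on at by around with without",
    "Conjunction": "and but or unless",
    "Determiner": "the a an some many few 100",
}
# Invert the lists into a case-insensitive word -> tag lookup table
WORD_TO_TAG = {
    word.lower(): tag for tag, words in POS_LISTS.items() for word in words.split()
}


def tag(sentence):
    """Tag each whitespace-separated word; unknown words get 'UNKNOWN'."""
    return [(word, WORD_TO_TAG.get(word.lower(), "UNKNOWN")) for word in sentence.split()]


print(tag("John loves the happy fish"))
# [('John', 'Proper noun'), ('loves', 'Verb'), ('the', 'Determiner'), ('happy', 'Adjective'), ('fish', 'Noun')]
```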
9. Constituent structure
(((the | a)(cat | dog)) | (John | Jack | Susan))(barked | slept)
Sentence → Noun Phrase, Verb Phrase
Noun Phrase → Determiner, Noun (Example: the dog)
Noun Phrase → Proper Noun (Example: Jack)
Noun Phrase → Noun Phrase, Conjunction, Noun Phrase (Examples: Jack and Jill, the owl and the pussycat)
Verb Phrase → Verb, Noun Phrase (Example: saw the rabbit)
Verb Phrase → Verb, Preposition, Noun Phrase (Examples: went up the hill, sat on the mat)
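The rewrite rules above can be tried out with a small recognizer. This sketch checks whether a sentence can be derived from `S` (Sentence) by trying every way of dividing a span of words among the symbols of a rule, and it memoizes results per symbol and span. Because every symbol must cover at least one word, the left-recursive coordination rule always recurses on a shorter span and cannot loop forever. The lexicon is hypothetical, and since the listed rules contain no intransitive verb phrase, sentences such as "the dog barked" are not covered.

```python
from functools import lru_cache

# Phrase-structure rules from the slide: nonterminal -> possible right-hand sides
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("PN",), ("NP", "Conj", "NP")],
    "VP": [("V", "NP"), ("V", "P", "NP")],
}

# Hypothetical toy lexicon: word -> part-of-speech category
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N", "rabbit": "N", "hill": "N", "mat": "N",
    "owl": "N", "pussycat": "N",
    "jack": "PN", "jill": "PN", "john": "PN", "susan": "PN",
    "saw": "V", "went": "V", "sat": "V",
    "up": "P", "on": "P",
    "and": "Conj",
}


def recognizes(sentence, start="S"):
    """Return True if the grammar derives the sentence from the start symbol."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # A part-of-speech category must cover exactly one matching word
        if symbol not in GRAMMAR:
            return j == i + 1 and LEXICON.get(words[i]) == symbol
        return any(matches(rhs, i, j) for rhs in GRAMMAR[symbol])

    def matches(rhs, i, j):
        # Can the symbol sequence rhs cover words[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Leave at least one word for each remaining symbol
        for k in range(i + 1, j - len(rhs) + 2):
            if derives(rhs[0], i, k) and matches(rhs[1:], k, j):
                return True
        return False

    return len(words) > 0 and derives(start, 0, len(words))


print(recognizes("Jack and Jill went up the hill"))  # True
print(recognizes("saw the rabbit Jack"))             # False
```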
10. Corpus
A corpus is a collection of data selected for a descriptive or applicative
purpose.
A corpus must possess a common set of fundamental properties, including
representativeness, a finite size and existence in electronic format.
11. The Linguistic Data Consortium
Founded in 1992 and based at the University of Pennsylvania in the United
States, this research and development center is financed primarily by the National
Science Foundation (NSF). Its main activities consist of collecting, distributing and
annotating linguistic resources that meet the needs of research centers
and American companies working in the field of language technology. The
Linguistic Data Consortium (LDC) owns an extensive catalog of written and spoken
corpora covering a fairly large number of languages.
12. LFG-GPSG
In LFG, one parses sentences and builds up functional structures; in GPSG,
sentences are parsed and translated into formulas of intensional logic. Hardly
anyone knows how to generate from f-structures or from logical formulas.
13. LFG (Lexical Functional Grammar)
Two levels of structure:
C-structure (constituent structure: a phrase-structure tree)
F-structure (functional structure: a representation of grammatical functions such as subject and object)
Mappings between c-structure and f-structure
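The mapping from c-structure to f-structure can be sketched with nested Python data. In this simplified illustration (a hypothetical encoding, not a full LFG implementation), each c-structure node carries either no annotation, meaning it shares its mother's f-structure (↑=↓), or a grammatical function name such as `SUBJ`, meaning its f-structure becomes the value of that function in the mother's f-structure (↑SUBJ=↓). Words contribute features from a toy lexicon, and a clash between feature values raises an error as a stand-in for unification failure. LFG's completeness and coherence conditions are not checked.

```python
# Hypothetical lexical entries: features each word contributes to its f-structure
LEXICON = {
    "Jack": {"PRED": "Jack"},
    "saw": {"PRED": "see<SUBJ,OBJ>", "TENSE": "past"},
    "the": {"DEF": "+"},
    "rabbit": {"PRED": "rabbit"},
}

# c-structure nodes: (category, function annotation or None, children or word)
C_STRUCTURE = ("S", None, [
    ("NP", "SUBJ", [("PN", None, "Jack")]),
    ("VP", None, [
        ("V", None, "saw"),
        ("NP", "OBJ", [("Det", None, "the"), ("N", None, "rabbit")]),
    ]),
])


def project(node, mother_f):
    """Add the f-structure information of a c-structure node to mother_f."""
    _category, function, content = node
    # None encodes up=down (same f-structure); a name encodes (up FUNCTION)=down
    f = mother_f if function is None else mother_f.setdefault(function, {})
    if isinstance(content, str):  # a word: merge its lexical features
        for attr, value in LEXICON[content].items():
            if f.get(attr, value) != value:
                raise ValueError(f"feature clash on {attr}: {f[attr]} vs {value}")
            f[attr] = value
    else:
        for child in content:
            project(child, f)
    return mother_f


f_structure = project(C_STRUCTURE, {})
print(f_structure)
# {'SUBJ': {'PRED': 'Jack'}, 'PRED': 'see<SUBJ,OBJ>', 'TENSE': 'past', 'OBJ': {'DEF': '+', 'PRED': 'rabbit'}}
```

Note how the S, VP and V nodes all map to the same f-structure: the verb's features end up at the top level of the sentence's functional structure, alongside the `SUBJ` and `OBJ` values.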
14. Pronunciation
Phonology and phonetics are concerned with pronunciation:
the pronunciation of characters in isolation and in combination.
Both regular and irregular pronunciations need to be considered.
Some words have the same pronunciation but different meanings, such as "weak"
and "week". From the sound alone, a computer cannot differentiate between the two words; it needs the context.
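A pronunciation dictionary makes the problem concrete. The sketch below uses a few hand-written entries in ARPAbet notation, the phone set used by the CMU Pronouncing Dictionary (these entries are typed in for illustration, not loaded from that dictionary), and groups words that share a pronunciation. A system that hears W IY1 K gets back both "weak" and "week" and must use context to choose.

```python
from collections import defaultdict

# Hypothetical hand-written entries in ARPAbet (digits mark vowel stress)
PRONUNCIATIONS = {
    "weak": ["W IY1 K"],
    "week": ["W IY1 K"],
    "meet": ["M IY1 T"],
    "meat": ["M IY1 T"],
    "read": ["R IY1 D", "R EH1 D"],  # irregular: present vs past tense
    "reed": ["R IY1 D"],
    "red": ["R EH1 D"],
}


def homophone_groups(prons):
    """Map each pronunciation shared by two or more words to those words."""
    by_sound = defaultdict(set)
    for word, variants in prons.items():
        for phones in variants:
            by_sound[phones].add(word)
    return {phones: sorted(words) for phones, words in by_sound.items() if len(words) > 1}


def words_for(phones, prons):
    """All spellings that could have produced the given phone sequence."""
    return sorted(word for word, variants in prons.items() if phones in variants)


print(words_for("W IY1 K", PRONUNCIATIONS))  # ['weak', 'week']
```

The entry for "read" shows the reverse problem as well: one spelling with two pronunciations.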
15. Morphology
Morphology studies the structure of words in their written (graphemic) and spoken (phonemic) forms. It has
two branches, namely inflection and derivation.
Inflection:
It relates to the grammatical functions of the forms of a word within the same part of speech;
e.g. the paradigm of the verb play:
Play, plays, played, playing
Derivation:
It relates to the production of new words of different parts of speech;
e.g. nation (a noun)
national (an adjective)
nationalize (a verb)
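Regular inflection and simple derivation can both be sketched as suffixation. The function below generates the paradigm of a regular English verb using a few simplified spelling rules; it does not double consonants (as in "stop, stopped") and does not handle irregular verbs such as "go, went", which would need a lookup table. The `derive` helper simply appends suffixes, which happens to work for nation, national, nationalize but not in general.

```python
VOWELS = "aeiou"


def inflect_verb(base):
    """Return [base, 3rd person singular, past, present participle] for a regular verb."""
    consonant_y = base.endswith("y") and base[-2] not in VOWELS
    # Third person singular: -es after sibilants, y -> ies after a consonant, else -s
    if base.endswith(("s", "sh", "ch", "x", "z")):
        third = base + "es"
    elif consonant_y:
        third = base[:-1] + "ies"
    else:
        third = base + "s"
    # Past tense: -d after a final e, y -> ied after a consonant, else -ed
    if base.endswith("e"):
        past = base + "d"
    elif consonant_y:
        past = base[:-1] + "ied"
    else:
        past = base + "ed"
    # Present participle: drop a final silent e (but keep "ee"), else add -ing
    if base.endswith("e") and not base.endswith("ee"):
        participle = base[:-1] + "ing"
    else:
        participle = base + "ing"
    return [base, third, past, participle]


def derive(stem, suffixes):
    """Apply derivational suffixes one at a time, returning each new word."""
    words = [stem]
    for suffix in suffixes:
        words.append(words[-1] + suffix)
    return words


print(inflect_verb("study"))           # ['study', 'studies', 'studied', 'studying']
print(derive("nation", ["al", "ize"]))  # ['nation', 'national', 'nationalize']
```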
16. Morphological Analyzer
A morphological analyzer can extract the base forms of words from documents entered into
a computer.
Applications in this area include:
a. hyphenation (segmenting words into their morphs),
b. spelling correction,
c. stemming, which reduces related words to a common stem as far as possible.
A problem for such computational programs is that they must handle a very broad range of input. Other
applications are parsing and generating natural language utterances in written or
spoken form, and machine translation (Trost, 2006).
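Stemming can be illustrated with a toy suffix-stripping stemmer. This is a deliberately simplified stand-in for real algorithms such as the Porter stemmer: it removes the first matching suffix from a short, hypothetical rule list, provided at least three letters remain, so that related word forms are reduced to a common stem.

```python
from collections import defaultdict

# Hypothetical suffix rules, tried in order: (suffix, replacement)
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]
MIN_STEM_LENGTH = 3


def stem(word):
    """Strip the first matching suffix if a long enough stem remains."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word


def group_by_stem(words):
    """Collect words that reduce to the same stem."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


print(group_by_stem(["play", "plays", "played", "playing", "study", "studies"]))
# {'play': ['play', 'plays', 'played', 'playing'], 'study': ['study', 'studies']}
```

The limits show quickly: "studied" is reduced to "studi", not "study", which is one reason real stemmers use longer, ordered rule sets with conditions on the remaining stem.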
17. Syntax
Syntax is concerned with the structure of sentences.
Syntax analysis checks whether a text is well formed according to the rules of a
formal grammar.
Sometimes a sentence has more than one possible structure, so its word order can be misleading,
e.g. I saw her with a telescope (did I use the telescope, or did she have it?).
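The ambiguity can be made visible by counting parse trees. The sketch below uses a small hypothetical grammar in which a prepositional phrase (PP) may attach either to a noun phrase (NP → NP PP, "her with a telescope") or to a verb phrase (VP → VP PP, "saw her" using a telescope), and counts derivations with memoized recursion over word spans. The example sentence therefore receives two parses.

```python
from functools import lru_cache

# Hypothetical grammar allowing both PP attachments
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pro",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {"i": "Pro", "her": "Pro", "saw": "V", "with": "P", "a": "Det", "telescope": "N"}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees the grammar assigns to the sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in GRAMMAR:  # part-of-speech category covers one word
            return int(j == i + 1 and LEXICON.get(words[i]) == symbol)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence rhs can cover words[i:j]
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        for k in range(i + 1, j - len(rhs) + 2):  # each symbol covers at least one word
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(words)) if words else 0


print(count_parses("I saw her with a telescope"))  # 2
```

A parser used in practice would need extra knowledge (statistics or semantics) to choose between the two trees rather than merely count them.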
18. Semantics
Semantics deals with the meanings of words, phrases and sentences.
A single word may have several meanings,
e.g. chip, well, covers.
“Hot ice-cream” would be rejected by a semantic analyzer because, based on probability, the combination is very unlikely.
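One way a semantic analyzer can judge a word combination "based on probability" is to estimate how likely one word is to follow another in a corpus. The sketch below computes add-one smoothed bigram probabilities from a tiny hypothetical corpus; with real data the counts would come from a very large corpus, and an analyzer might flag a pairing whose probability falls far below that of its alternatives.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would use a very large corpus
CORPUS = [
    "she ate cold ice-cream",
    "cold ice-cream melts slowly",
    "he drank hot tea",
    "hot soup and cold ice-cream",
    "hot tea with milk",
]

sentences = [line.split() for line in CORPUS]
unigrams = Counter(word for sent in sentences for word in sent)
bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
vocab_size = len(unigrams)


def bigram_prob(prev, word):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


for adjective in ("cold", "hot"):
    print(adjective, round(bigram_prob(adjective, "ice-cream"), 3))
# cold 0.235
# hot 0.059
```

Smoothing keeps the unseen pair "hot ice-cream" from getting probability zero, so the analyzer can rank it as unlikely rather than impossible.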
19. Pragmatics
Pragmatics deals with the meaning of an utterance depending on its context.
Interpretation plays a crucial role in understanding the meaning,
e.g. "I am waiting"
can be interpreted as:
a. an ordinary fact,
b. a promise, or
c. a threat.