Natural Language Processing
                                   Using Python




                                    Presented by:-
                                    Sumit Kumar Raj
                                    1DS09IS082

ISE,DSCE-2013
Table of Contents




        • Introduction
        • History
        • Methods in NLP
        • Natural Language Toolkit
        • Sample Code
        • Feeling Lonely?
        • Building a Spam Filter
        • Applications
        • References


What is Natural Language Processing ?




    • Computer-aided text analysis of human language.

    • The goal is to enable machines to understand human
      language and extract meaning from text.

    • It is a field of study that draws on machine learning and,
      more specifically, computational linguistics.




History


  • 1948: first NLP application
         – dictionary look-up system
         – developed at Birkbeck College, London

  • 1949: American interest
         – WWII code breaker Warren Weaver
         – He viewed German as English in code.

  • 1966: over-promised, under-delivered
         – Machine translation worked only word by word.
         – NLP drew the first hostility from research funders.
         – NLP gave AI a bad name before AI had a name.
Natural language processing is heavily used throughout web technologies:

    • Search engines
    • Consumer behavior analysis
    • Site recommendations
    • Sentiment analysis
    • Spam filtering
    • Automated customer support systems
    • Knowledge bases and expert systems


Context


   Little sister: What’s your name?

   Me: Uhh….Sumit..?

   Sister: Can you spell it?

   Me: yes. S-U-M-I-T…..
   Sister: WRONG! It’s spelled “I-T”



Ambiguity

   “I shot the man with ice cream.”
   - A man with ice cream was shot
   - A man had ice cream shot at him




Methods :-

       1) POS Tagging :-

      • In corpus linguistics, part-of-speech (POS) tagging is also called
        grammatical tagging or word-category disambiguation.
      • It is the process of marking up each word in a text as
        corresponding to a particular part of speech.
      • POS tagging is harder than just having a list of words
        and their parts of speech, because the same word can take
        different tags in different contexts.
      • Consider the example:
            The sailor dogs the barmaid.
        Here “dogs”, usually a plural noun, is a verb.
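To see why a word list alone is not enough, here is a minimal sketch of a tagger that only looks words up in a dictionary. The mini-lexicon `MOST_COMMON_TAG` and the function `lookup_tag` are hypothetical names made up for this illustration; the tags follow the Penn Treebank style used later in these slides.

```python
# Hypothetical mini-lexicon: each word mapped to its most common tag.
MOST_COMMON_TAG = {
    "the": "DT",
    "sailor": "NN",
    "dogs": "NNS",     # most often a plural noun
    "barmaid": "NN",
}


def lookup_tag(tokens):
    """Tag each token by dictionary lookup alone, ignoring context."""
    return [(tok, MOST_COMMON_TAG.get(tok.lower(), "NN")) for tok in tokens]


tagged = lookup_tag("The sailor dogs the barmaid".split())
print(tagged)
```

The lookup labels “dogs” as a plural noun (NNS), although in this sentence it is a verb. A real tagger has to use the surrounding words to choose between the candidate tags.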



2) Parsing :-


  • In the context of NLP, parsing may be defined as the process of
    assigning structural descriptions to sequences of words in
    a natural language.
  • Applications of parsing include:
      – simple phrase finding, e.g. for proper name recognition
      – full semantic analysis of text, e.g. for information extraction or
        machine translation
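As a rough illustration of “assigning structural descriptions”, the sketch below counts how many parse trees a tiny grammar gives the ambiguous sentence from the Ambiguity slide. The grammar, the lexicon and the function name `count_parses` are assumptions made up for this example; the counting uses a CYK-style chart over binary rules, which is one of several possible parsing strategies.

```python
from collections import defaultdict

# Hypothetical toy lexicon and grammar (binary rules only).
LEXICON = {
    "I": ["NP"], "shot": ["V"], "the": ["Det"], "man": ["N"],
    "with": ["P"], "ice": ["N"], "cream": ["N"],
}
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "with ice cream" modifies the shooting
    ("NP", "Det", "N"),
    ("NP", "N", "N"),     # compound noun: "ice cream"
    ("NP", "NP", "PP"),   # "with ice cream" modifies the man
    ("PP", "P", "NP"),
]


def count_parses(tokens):
    """Count the distinct parse trees rooted in S for the token list."""
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, symbol) -> number of trees
    for i, tok in enumerate(tokens):
        for sym in LEXICON.get(tok, []):
            chart[(i, i + 1, sym)] += 1
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, "S")]


n_parses = count_parses("I shot the man with ice cream".split())
print(n_parses)
```

Under this toy grammar the sentence has two parses, one for each reading listed on the Ambiguity slide.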




3) Speech Recognition:-



  • It is concerned with mapping a continuous speech signal
    into a sequence of recognized words.
  • The main problems are variation in pronunciation and homonyms.
  • In the sentence “the boy eats”, a bigram model is sufficient to
    model the relationship between “boy” and “eats”. It is not sufficient for
          “The boy on the hill by the lake in our town…eats”
    where many words separate the two.
  • Bigram and trigram models have proven extremely effective for
    obvious, short-range dependencies.
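A bigram model can be sketched in a few lines: count adjacent word pairs and divide by the count of the first word. The corpus below is made up for illustration, and `bigram_prob` is a hypothetical helper name; real recognizers use far larger corpora and smoothing.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only.
corpus = [
    "the boy eats",
    "the boy sleeps",
    "the boy eats apples",
    "the girl eats",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))


def bigram_prob(prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


print(bigram_prob("boy", "eats"))
print(bigram_prob("town", "eats"))
```

The model captures “boy eats” because the two words are adjacent. In the long sentence above, the word before “eats” is “town”, so a bigram model never sees the link back to “boy”.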




4) Machine Translation:-



 • It involves translating text from one natural language (NL) to another.
 • Approaches:
        – simple word substitution, with some changes in ordering to
          account for grammatical differences
        – translating the source language into an underlying meaning
          representation, or interlingua
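The first approach can be sketched as a dictionary lookup plus one reordering rule. Everything here is a toy assumption: the English-to-Spanish word list, the adjective and noun sets, and the function name `translate` are made up for illustration and cover only this one sentence pattern.

```python
# Hypothetical English-to-Spanish mini-dictionary, for illustration only.
EN_TO_ES = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "new": "nuevo"}
ADJECTIVES = {"red", "new"}
NOUNS = {"car"}


def translate(sentence):
    """Word-for-word substitution with a single adjective-noun swap rule."""
    words = sentence.lower().split()
    # Reordering rule: English adjective-noun becomes noun-adjective.
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Substitute each word; unknown words pass through unchanged.
    return " ".join(EN_TO_ES.get(w, w) for w in words)


print(translate("the red car is new"))
```

Even this small case needs a grammar-specific rule, which hints at why purely word-by-word translation, as noted on the History slide, under-delivered.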




5) Stemming:-




  • In linguistic morphology and information retrieval, stemming is
    the process of reducing inflected words to their stem.
  • The stem need not be identical to the morphological root of the
    word.
  • Many search engines treat words with the same stem as synonyms,
    as a kind of query broadening, a process called conflation.
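A minimal sketch of stemming and conflation follows. It uses a deliberately naive suffix stripper, not the Porter algorithm or any NLTK stemmer; the suffix list and the names `naive_stem` and `conflate` are assumptions for illustration.

```python
from collections import defaultdict

# A deliberately naive suffix list; real stemmers use many more rules.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words that share a stem, as a search engine might."""
    groups = defaultdict(list)
    for word in words:
        groups[naive_stem(word)].append(word)
    return dict(groups)


groups = conflate(
    ["connect", "connected", "connecting", "connection", "connects", "ponies"]
)
print(groups)
```

All the “connect” forms land in one group, so a query for any of them could match the others. “ponies” is reduced to “poni”, which shows that a stem need not be a dictionary word.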




Natural Language Toolkit

    • NLTK is a leading platform for building Python programs that
      work with human language data.
    • Provides a suite of text processing libraries for
      classification, tokenization, stemming, tagging, parsing,
      and semantic reasoning.

    • At the time of this talk (2013), available only for Python 2.5 – 2.6;
      current NLTK 3.x releases run on Python 3.
      http://www.nltk.org/download
    • easy_install nltk (or, with a current Python, pip install nltk)
    • Prerequisites
       – NumPy
       – SciPy

Let’s dive into some code!




Part of Speech Tagging

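from nltk import pos_tag, word_tokenize

# Requires NLTK's tokenizer and tagger data; fetch them once with nltk.download()
sentence1 = ('this is a demo that will show you how '
             'to detects parts of speech with little effort '
             'using NLTK!')

tokenized_sent = word_tokenize(sentence1)   # split the sentence into tokens
print(pos_tag(tokenized_sent))              # prints the (word, tag) pairs shown below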


[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('demo', 'NN'), ('that', 'WDT'),
('will', 'MD'), ('show', 'VB'), ('you', 'PRP'), ('how', 'WRB'), ('to', 'TO'),
('detects', 'NNS'), ('parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('with',
'IN'), ('little', 'JJ'), ('effort', 'NN'), ('using', 'VBG'), ('NLTK', 'NNP'),('!',
'.')]
Fun things to Try




Feeling lonely?

  Eliza is there to talk to you all day! What human could ever do that
  for you?
    from nltk.chat import eliza
    eliza.eliza_chat()   # starts the chatbot
   Therapist
   ---------
   Talk to the program by typing in plain English, using normal upper-
   and lower-case letters and punctuation. Enter "quit" when done.
   ========================================================================
   Hello. How are you feeling today?


Let’s build something even cooler




Let’s write a spam filter!

   A program that analyzes legitimate emails (“ham”) as well as
   “spam” and learns the features that are associated with
   each.

   Once trained, we should be able to run this program on
   incoming mail and have it reliably label each one with the
   appropriate category.




“Spambot.py” (continued)



  1.   Extract one of the archives from the site into your working directory.

  2.   Create a Python script; let’s call it “spambot.py”.

  3.   Your working directory should contain the “spambot” script and the
       folders “spam” and “ham”.


from nltk import (word_tokenize, WordNetLemmatizer, NaiveBayesClassifier,
                  classify, MaxentClassifier)

from nltk.corpus import stopwords
import random
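The slides do not show how the emails are read from the “spam” and “ham” folders. A minimal sketch, assuming each folder holds one plain-text file per message; the helper name `load_texts` is hypothetical and not taken from the original code.

```python
import os


def load_texts(folder):
    """Return the text of every file in `folder`, one string per email."""
    texts = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            # latin-1 maps every byte to a character, so odd bytes in
            # real-world mail never raise a decoding error.
            with open(path, encoding="latin-1") as handle:
                texts.append(handle.read())
    return texts


# Example usage (uncomment once the folders exist):
# spamtexts = load_texts("spam")
# hamtexts = load_texts("ham")
```

Decoding as latin-1 is a pragmatic choice for messy mail archives; a corpus with a known encoding could be opened with that encoding instead.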
“Spambot.py” (continued)

Label each item with the appropriate label and store them as a list of tuples.
Here spamtexts and hamtexts are lists of email texts read from the “spam”
and “ham” folders (the original slides do not show the loading code).


mixedemails = [(email, 'spam') for email in spamtexts]
mixedemails += [(email, 'ham') for email in hamtexts]

# Let's give them a nice shuffle
random.shuffle(mixedemails)

From this list of shuffled but labeled emails, we will define a “feature
extractor” which outputs a feature set that our program can use to statistically
compare spam and ham.




“Spambot.py” (continued)


# Setup not shown on the slides; assumed here
wordlemmatizer = WordNetLemmatizer()
commonwords = set(stopwords.words('english'))

def email_features(sent):
    features = {}
    # Normalize words: lowercase and lemmatize each token
    wordtokens = [wordlemmatizer.lemmatize(word.lower())
                  for word in word_tokenize(sent)]
    for word in wordtokens:
        # If the word is not a stop word, let's consider it a "feature"
        if word not in commonwords:
            features[word] = True
    return features


featuresets = [(email_features(n), g) for (n, g) in mixedemails]
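To give a feel for what a classifier can learn from such feature sets, here is a simplified pure-Python naive Bayes sketch. It is not NLTK's `NaiveBayesClassifier`, whose estimation details differ; the names `train` and `classify_email` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train(featuresets):
    """featuresets: list of (feature_dict, label) pairs, as built above."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> emails containing it
    vocabulary = set()
    for features, label in featuresets:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify_email(model, features):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior: share of emails with this label
        for word in features:
            if word in vocabulary:
                # Add-one smoothing so unseen words never give log(0).
                p = (word_counts[label][word] + 1) / (count + 2)
                score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy feature sets, made up for illustration.
toy = [
    ({"free": True, "winner": True}, "spam"),
    ({"free": True, "cash": True}, "spam"),
    ({"meeting": True, "tomorrow": True}, "ham"),
    ({"lunch": True, "tomorrow": True}, "ham"),
]
model = train(toy)
print(classify_email(model, {"free": True, "cash": True}))
print(classify_email(model, {"meeting": True, "lunch": True}))
```

Words that appear mostly in one category pull the score toward that category, which is the sense in which the program “learns the features that are associated with each”.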

“Spambot.py” (continued)



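# Training step (not shown on the slides; assumed here)
classifier = NaiveBayesClassifier.train(featuresets)

while True:
    # On Python 2, use raw_input() instead of input()
    featset = email_features(input("Enter text to classify: "))
    print(classifier.classify(featset))



We can now directly input new email and have it classified as either spam or
ham.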




Applications :-



  • Conversion from natural language to computer language
    and vice versa.
  • Translation from one human language to another.
  • Automatic checking for grammar and writing techniques.
  • Spam filtering
  • Sentiment analysis




Conclusion:-



 NLP plays a very important role in new human-machine interfaces. When we look at
 some of the products built on NLP technologies, we can see that they are very
 advanced and very useful.

 But there are many limitations. For example, the language we speak is highly ambiguous,
 which makes it very difficult to understand and analyze. Also, with so many languages
 spoken all over the world, it is very difficult to design a system that is 100% accurate.

 These problems get more complicated when we think of different people speaking the
 same language with different styles.

 Intelligent systems are being experimented with right now.
 We will be able to see improved applications of NLP in the near future.


References :-


• http://en.wikipedia.org/wiki/Natural_language_processing
• An overview of Empirical Natural Language Processing
      by Eric Brill and Raymond J. Mooney
• Investigating classification for natural language processing tasks
      by Ben W. Medlock, University of Cambridge
• Natural Language Processing and Machine Learning using Python
      by Shankar Ambady
• http://www.slideshare.net
• http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/hks/index.html
• http://googlesystem.blogspot.in/2012/10/google-improves-results-for-natural/
• Code from: https://github.com/shanbady/NLTK-Boston-Python-Meetup




Any Questions ???




Thank You...

                Reach me @:
                facebook.com/sumit12dec

                sumit786raj@gmail.com

                9590 285 524
