https://blogs.microsoft.com/blog/2018/01/17/future-computed-artificial-intelligence-role-society/?MC=DevOps&MC=MachLearn&MC=OfficeO365&MC=MSAzure&MC=CloudPlat
BeautifulSoup: Web Scraping in Python
 Natural language processing (NLP) is a field of
computer science and artificial intelligence concerned
with the interactions between computers and human
(natural) languages and, in particular, with
programming computers to fruitfully process large
amounts of natural language data.
What???
 In 1950, Alan Turing published an article titled
"Computing Machinery and Intelligence", which
proposed what is now called the Turing test as a
criterion of intelligence.
 Eliza 1964
ELIZA might provide a generic response, for example
responding to "My head hurts" with "Why do you say
your head hurts?"
When???
How??
Where?
 Machine Translation
 Fighting Spam
 Mail Inbox or Spam
 Information Extraction
 Social media monitoring
 Summarization
 Question Answering
Tasks in OpenNLP
 The Apache OpenNLP library is a machine
learning based toolkit for the processing of natural
language text.
 It supports the most common NLP tasks, such
as language detection, tokenization, sentence
segmentation, part-of-speech tagging, named entity
extraction, chunking, parsing, and coreference
resolution.
Tasks in OpenNLP
 Text data in the form of un-structured text, in the form
of comments, reviews, or articles
 Extract meaningful information from it, with the
help of a set of tasks
 Tokenizing:
 Take a large piece of text and break it into smaller components
 Break it into sentences or individual words
Stop words removal
 Once Tokenized,
 next Stop words removal, i.e. differentiating words
that carry specific meaning from words that only
add structure to the sentence.
 Eg:
N-Grams
 Once stop words are removed,
 look for groups of words that commonly occur together, because these
are likely to be the most important phrases in the text.
https://en.wikipedia.org/wiki/N-gram#Examples
 An n-gram is a contiguous sequence of n words. A sequence of two
adjacent words is called a bigram.
 Eg:
 Code(s) Description
 M79.661 Pain in right lower leg
 M79.662 Pain in left lower leg
 M79.669 Pain in unspecified lower leg
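As a minimal sketch of the bigram idea, the standard-library Python below counts adjacent word pairs across the three code descriptions above. The helper name ngrams and the lowercase, whitespace-based tokenization are assumptions made for illustration; NLTK ships comparable helpers such as nltk.util.ngrams.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


descriptions = [
    "Pain in right lower leg",
    "Pain in left lower leg",
    "Pain in unspecified lower leg",
]

# Count each bigram over all descriptions
bigram_counts = Counter()
for text in descriptions:
    tokens = text.lower().split()  # naive tokenization (assumption)
    bigram_counts.update(ngrams(tokens, 2))

# Keep the bigrams that occur in every description
shared = [pair for pair, count in bigram_counts.items()
          if count == len(descriptions)]
print(shared)
```

The pairs that survive are the phrases common to all three descriptions, which is the kind of recurring phrase n-gram counting is meant to surface.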
Word Sense Disambiguation
 Eg:
 I am taking aspirin for my cold
 Let's go inside, I'm cold
 It's cold today, only 2 degrees
 It identifies the meaning of the word, based on the
context in which it is used.
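One very simplified way to picture this is to compare the words around "cold" with a set of clue words for each sense and pick the best overlap. The sketch below does only that; the sense labels and clue words in SENSE_CLUES are hypothetical and hand-written for these three sentences, and real systems rely on dictionaries or trained models rather than a list like this.

```python
import re

# Hypothetical, hand-written clue words for three senses of "cold"
SENSE_CLUES = {
    "illness": {"aspirin", "medicine", "sick", "taking"},
    "feeling chilly": {"inside", "i'm", "jacket", "shivering"},
    "low temperature": {"today", "degrees", "weather", "outside"},
}


def guess_sense(sentence):
    """Pick the sense whose clue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return max(SENSE_CLUES, key=lambda sense: len(SENSE_CLUES[sense] & words))


for sentence in [
    "I am taking aspirin for my cold",
    "Let's go inside, I'm cold",
    "It's cold today, only 2 degrees",
]:
    print(sentence, "->", guess_sense(sentence))
```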
Part-of-Speech Tagging
 It can occur either as part of WSD or as an independent
task.
 It identifies the part of speech of each word: noun,
verb, adjective, etc.
Stemming
 Eg: Close, Closed, Closely, Closer
 Converting the word to its base form
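To make the idea concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any NLTK stemmer (NLTK provides nltk.stem.PorterStemmer for real use); the function toy_stem and its three suffix rules are assumptions chosen only so that the four example words collapse to one stem.

```python
def toy_stem(word):
    """Strip one common suffix, then a trailing 'e' (toy rules only)."""
    stem = word.lower()
    for suffix in ("ly", "ed", "er"):
        # Only strip when at least three characters remain
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            stem = stem[: -len(suffix)]
            break
    if stem.endswith("e") and len(stem) > 3:
        stem = stem[:-1]
    return stem


for word in ["Close", "Closed", "Closely", "Closer"]:
    print(word, "->", toy_stem(word))
```

As with real stemmers, the resulting stem does not have to be a dictionary word; what matters is that related forms map to the same string.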
Python NLTK and OpenNLP
 NLTK is one of the leading platforms for working with
human language data in Python; the nltk module is
used for natural language processing.
 NLTK is literally an acronym for Natural Language Toolkit.
 The Apache OpenNLP library is a machine learning based
toolkit for the processing of natural language text.
 It supports the most common NLP tasks, such as
tokenization, sentence segmentation, part-of-speech
tagging, named entity extraction, chunking, parsing, and
coreference resolution.
Python NLTK
 Step 1: Collect all the individual sentences in an article into
a list.
 Tokenization with NLTK: from nltk.tokenize we can import
the functions sent_tokenize (splits text into sentences)
and word_tokenize (splits text into words).
 Import stopwords from the nltk.corpus module.
 Import punctuation from the string module.
 Note: a sentence ends with a period (.) followed by
a space.
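The steps above can be sketched with the standard library alone. In this sketch the regular expressions are rough stand-ins for sent_tokenize and word_tokenize, and STOP_WORDS is a tiny hand-picked list used in place of stopwords.words("english") from nltk.corpus; all three are assumptions for illustration and will behave differently from NLTK on harder text.

```python
import re
from string import punctuation

# Tiny hand-picked stop list (assumption); NLTK's English list is much longer
STOP_WORDS = {"a", "as", "had", "her", "the", "was"}


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def content_words(sentence):
    """Drop stop words and punctuation, keeping the meaningful words."""
    return [w for w in split_words(sentence.lower())
            if w not in STOP_WORDS and w not in punctuation]


text = "Mary had a little lamb. Her fleece was as white as snow."
for sentence in split_sentences(text):
    print(content_words(sentence))
```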
Frequency distribution
 Construct a frequency distribution: each word and the number of
times it occurs
 Functions defined for NLTK's frequency distributions
Example                        Description
fdist = FreqDist(samples)      create a frequency distribution containing the given samples
fdist[sample] += 1             increment the count for this sample
fdist['monstrous']             count of the number of times a given sample occurred
fdist.freq('monstrous')        frequency of a given sample
fdist.N()                      total number of samples
fdist.most_common(n)           the n most common samples and their frequencies
for sample in fdist:           iterate over the samples
fdist.max()                    sample with the greatest count
fdist.tabulate()               tabulate the frequency distribution
fdist.plot()                   graphical plot of the frequency distribution
fdist.plot(cumulative=True)    cumulative plot of the frequency distribution
fdist1 |= fdist2               update fdist1 with counts from fdist2
fdist1 < fdist2                test if samples in fdist1 occur less frequently than in fdist2
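NLTK's FreqDist builds on Python's collections.Counter, so several rows of the table can be tried without NLTK installed. The sketch below uses Counter directly; the sample sentence is made up, and the comments note which FreqDist call each line approximates.

```python
from collections import Counter

# Made-up sample text, already split into words
words = "the quick brown fox jumps over the lazy dog the end".split()
fdist = Counter(words)

total = sum(fdist.values())          # what fdist.N() reports
print(fdist["the"])                  # count of one sample
print(fdist["the"] / total)          # relative frequency, like fdist.freq("the")
print(fdist.most_common(2))          # the two most common samples
print(max(fdist, key=fdist.get))     # sample with the greatest count, like fdist.max()
```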
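Tokenizing: Sentence Detector
 Python Usage
 Step 1: Import NLTK
 Step 2: Split the text into sentences
    import nltk
    from nltk.tokenize import word_tokenize, sent_tokenize
    # The tokenizers need NLTK's Punkt data, e.g. nltk.download("punkt")
    # (the resource name can differ between NLTK versions)
    text = "Mary had a little lamb. Her fleece was as white as snow"
    sents = sent_tokenize(text)  # list of sentence strings
    print(sents)
 Java Usage
 Step 1:
 Step 2: Some Java code snippet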
OpenNLP syntax
 OpenNLP components have similar APIs. Normally, to
execute a task, one should provide a model and an
input.
 A model is usually loaded by providing a
FileInputStream with a model to a constructor of the
model class
    try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) {
        SomeModel model = new SomeModel(modelIn);
    }
Features
 Language detection
 https://www.apache.org/dist/opennlp/models/langdetect/1.8.3/README.txt
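Breaking into Words
 Python
    # Tokenize each sentence from sent_tokenize into a list of words
    words = [word_tokenize(sent) for sent in sents]
    print(words)
 Java
    import java.io.FileInputStream;
    import java.io.InputStream;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    // Load the English tokenizer model; try-with-resources closes the stream
    try (InputStream is = new FileInputStream("en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize("Hi. How are you? This is Mike.");
        for (String token : tokens) {
            System.out.println(token);  // one token per line
        }
    }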
POS - Tagging
https://www.winwaed.com/blog/2011/11/08/part-of-speech-tags/
I hope I made myself
understandable.
Thanks!

Natural Language Processing: Comparing NLTK and OpenNLP
