SlideShare une entreprise Scribd logo
Analyzing…
https://blogs.microsoft.com/blog/2018/01/17/future-
computed-artificial-intelligence-role-
society/?MC=DevOps&MC=MachLearn&MC=OfficeO
365&MC=MSAzure&MC=CloudPlat
BeautifulSoup: Web Scraping in Python
 Natural language processing (NLP) is a field of
computer science, artificial intelligence concerned
with the interactions between computers and human
(natural) languages, and, in particular, concerned with
programming computers to fruitfully process large
natural language data.
What???
 In 1950, Alan Turing published an article titled
"Computing Machinery and Intelligence“ which
proposed what is now called the Turing test as a
criterion of intelligence.
 Eliza 1964
ELIZA might provide a generic response, for example,
responding to "My head hurts" with "Why do you say
your head hurts?".
When???
How??
Where?
 Machine Translation
 Fighting Spam
 Mail Inbox or Spam
 Information Extraction
 Social media monitoring
 Summarization
 Question Answering
Tasks in OpenNLP
 The Apache OpenNLP library is a machine
learning based toolkit for the processing of natural
language text.
 It supports the most common NLP tasks, such
as language detection, tokenization, sentence
segmentation, part-of-speech tagging, named entity
extraction, chunking, parsing and co reference
resolution.
Tasks in OpenNLP
 Text data in the form of un-structured text, in the form
of comments, reviews or articles
 Extract meaning full information from them, done
with the help of set of tasks
 Tokenizing:
 Take a large piece of text and break it into smaller components
 Break it into sentences or individual words
Stop words removal
 Once Tokenized,
 next Stop words removal, i.e. differentiating words
which has specific meaning from the words which
adds to structure to the sentence.
 Eg:
N-Grams
 Once Stop words are removed,
 Commonly occurring words in a sentence, because these will be
most important words in the text.
https://en.wikipedia.org/wiki/N-gram#Examples
 If a word appears 2 times in a particular sentence, its called
bigrams.
 Eg:
 Code(s) Description
 M79.661 Pain in right lower leg
 M79.662 Pain in left lower leg
 M79.669 Pain in unspecified lower leg
Word Sense Disambiguation
 Eg:
 I am taking aspirin for my cold
 Let's go inside, I'm cold
 It's cold today, only 2 degrees
 It identifies the meaning of the word, based on the
context it is spoken.
Parts-of-Speech Tagging
 It can either occur as part of WSD, or as a independent
task.
 It helps in identifying parts of speech, whether Noun,
Verb, Adjective, etc.
Stemming
 Eg: Close, Closed, Closely, Closer
 Converting the word to its base form
Python NLTK and OpenNLP
 NLTK is one of the leading platforms for working with
human language data and Python, the module NLTK is
used for natural language processing.
 NLTK is literally an acronym for Natural Language Toolkit.
 The Apache OpenNLP library is a machine learning based
toolkit for the processing of natural language text.
 It supports the most common NLP tasks, such as
tokenization, sentence segmentation, part-of-speech
tagging, named entity extraction, chunking, parsing, and
coreference resolution.
Python NLTK
 Step 1: Collect all individual Sentences in an article, to
a list.
 Tokenization () from NLTK: ie from nltk.tokenize we
can import the functions sent_tokenize(breakdown into
sentences) and word_tokenize(breakdown into words)
 Import stopwords () from nltk.corpus module.
 punctuation from string module.
 Note : Sentence ends with a period symbol(.) and a space
after that.
Frequency distribution
 Construct a frequency distribution : words and no of
times each word occurs
 Functions Defined for NLTK's Frequency Distributions
Example Description
fdist = FreqDist(samples) create a frequency distribution containing the given samples
fdist[sample] += 1 increment the count for this sample
fdist['monstrous'] count of the number of times a given sample occurred
fdist.freq('monstrous') frequency of a given sample
fdist.N() total number of samples
fdist.most_common(n) the n most common samples and their frequencies
for sample in fdist: iterate over the samples
fdist.max() sample with the greatest count
fdist.tabulate() tabulate the frequency distribution
fdist.plot() graphical plot of the frequency distribution
fdist.plot(cumulative=True) cumulative plot of the frequency distribution
fdist1 |= fdist2 update fdist1 with counts from fdist2
fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2
Use Tokenizing: Sentence Detector
 Python Usage
 Step 1: Import NLTK
 Step2:
 text = "Mary had a little lamp. Her fleece was as white as snow"
 from nltk.tokenize import word_tokenize, sent_tokenize
 sents = sent_tokenize(text)
 print(sents)
 Java Usage
 Step 1:
 Step 2: Some Java code snippet
OpenNLP syntax
 OpenNLP components have similar APIs. Normally, to
execute a task, one should provide a model and an
input.
 A model is usually loaded by providing a
FileInputStream with a model to a constructor of the
model class
 try (InputStream modelIn = new
FileInputStream("lang-model-name.bin")) {
SomeModel model = new SomeModel(modelIn); }
Features
 Language detection
 https://www.apache.org/dist/opennlp/models/langdete
ct/1.8.3/README.txt
Breaking into Word
 Python
 words=[word_tokenize(sent) for sent in sents]
 print words
 Java
 InputStream is = new FileInputStream("en-token.bin");
 TokenizerModel model = new TokenizerModel(is);
 Tokenizer tokenizer = new TokenizerME(model);
 String tokens[] = tokenizer.tokenize("Hi. How are you? This
is Mike.");
 for (String a : tokens)
System.out.println(a);
 is.close();
POS - Tagging
https://www.winwaed.com/blog/2011/11/08/part-of-speech-tags/
I hope I made myself
understandable.
Thanks!

Contenu connexe

Tendances

Artificial Intelligence in cybersecurity
Artificial Intelligence in cybersecurityArtificial Intelligence in cybersecurity
Artificial Intelligence in cybersecurity
SmartlearningUK
 
Prompt Engineering.pptx
Prompt Engineering.pptxPrompt Engineering.pptx
Prompt Engineering.pptx
ahmedmishfaq
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translation
Marcis Pinnis
 
Artificial Intelligence (A I)
Artificial Intelligence (A I)Artificial Intelligence (A I)
Artificial Intelligence (A I)
NaveenXavier7
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Mariana Soffer
 
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
Edureka!
 
Chatbot_Presentation
Chatbot_PresentationChatbot_Presentation
Chatbot_Presentation
Rohan Chikorde
 
Mother of Language`s Langchain
Mother of Language`s LangchainMother of Language`s Langchain
Mother of Language`s Langchain
Jun-hang Lee
 
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdfSome Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
Kent Bye
 
Generative AI: Redefining Creativity and Transforming Corporate Landscape
Generative AI: Redefining Creativity and Transforming Corporate LandscapeGenerative AI: Redefining Creativity and Transforming Corporate Landscape
Generative AI: Redefining Creativity and Transforming Corporate Landscape
Osaka University
 
ARTIFICIAL INTELLIGENCE
ARTIFICIAL INTELLIGENCEARTIFICIAL INTELLIGENCE
ARTIFICIAL INTELLIGENCE
HANISHTHARWANI21BCE1
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
Raven Jiang
 
The-CxO-Guide-to.pdf
The-CxO-Guide-to.pdfThe-CxO-Guide-to.pdf
The-CxO-Guide-to.pdf
wsscbbhngychpsvlsd
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
VINCI Digital - Industrial IoT (IIoT) Strategic Advisory
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
Albert Orriols-Puig
 
AI in India: A Strategic Necessity
AI in India: A Strategic NecessityAI in India: A Strategic Necessity
AI in India: A Strategic Necessity
Social Samosa
 
ChatGPT - AI.pdf
ChatGPT - AI.pdfChatGPT - AI.pdf
ChatGPT - AI.pdf
Bannoon1
 
The future of machine learning platform
The future of machine learning platformThe future of machine learning platform
The future of machine learning platform
GAVS Technologies
 
A Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for EnterpriseA Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for Enterprise
RocketSource
 
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
OPcyberland
 

Tendances (20)

Artificial Intelligence in cybersecurity
Artificial Intelligence in cybersecurityArtificial Intelligence in cybersecurity
Artificial Intelligence in cybersecurity
 
Prompt Engineering.pptx
Prompt Engineering.pptxPrompt Engineering.pptx
Prompt Engineering.pptx
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translation
 
Artificial Intelligence (A I)
Artificial Intelligence (A I)Artificial Intelligence (A I)
Artificial Intelligence (A I)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
Creating Chatbots Using TensorFlow | Chatbot Tutorial | Deep Learning Trainin...
 
Chatbot_Presentation
Chatbot_PresentationChatbot_Presentation
Chatbot_Presentation
 
Mother of Language`s Langchain
Mother of Language`s LangchainMother of Language`s Langchain
Mother of Language`s Langchain
 
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdfSome Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
Some Preliminary Thoughts on Artificial Intelligence - April 20, 2023.pdf
 
Generative AI: Redefining Creativity and Transforming Corporate Landscape
Generative AI: Redefining Creativity and Transforming Corporate LandscapeGenerative AI: Redefining Creativity and Transforming Corporate Landscape
Generative AI: Redefining Creativity and Transforming Corporate Landscape
 
ARTIFICIAL INTELLIGENCE
ARTIFICIAL INTELLIGENCEARTIFICIAL INTELLIGENCE
ARTIFICIAL INTELLIGENCE
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
 
The-CxO-Guide-to.pdf
The-CxO-Guide-to.pdfThe-CxO-Guide-to.pdf
The-CxO-Guide-to.pdf
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
AI in India: A Strategic Necessity
AI in India: A Strategic NecessityAI in India: A Strategic Necessity
AI in India: A Strategic Necessity
 
ChatGPT - AI.pdf
ChatGPT - AI.pdfChatGPT - AI.pdf
ChatGPT - AI.pdf
 
The future of machine learning platform
The future of machine learning platformThe future of machine learning platform
The future of machine learning platform
 
A Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for EnterpriseA Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for Enterprise
 
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
Conférence Sécurité et Intelligence Artificielle - INHESJ 2018
 

Similaire à Natural Language Processing: Comparing NLTK and OpenNLP

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
CloudxLab
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
rudolf eremyan
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
shanbady
 
Nltk
NltkNltk
Nltk
Anirudh
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Python
shanbady
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
pavankalyanadroittec
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
Massimo Schenone
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Apache OpenNLP
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
Rahul Borate
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easy
outsider2
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
Gopi Krishnan Nambiar
 
Large Scale Text Processing
Large Scale Text ProcessingLarge Scale Text Processing
Large Scale Text Processing
Suneel Marthi
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured Text
DataWorks Summit
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
Gabriel Hamilton
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in R
Ashraf Uddin
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019
Alexis Agahi
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using python
Prakash Anand
 
PYTHON PPT.pptx
PYTHON PPT.pptxPYTHON PPT.pptx
PYTHON PPT.pptx
AbhishekMourya36
 
Presentation1
Presentation1Presentation1
Presentation1
Liba Cheema
 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...
John Tinsley
 

Similaire à Natural Language Processing: Comparing NLTK and OpenNLP (20)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
 
Nltk
NltkNltk
Nltk
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Python
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easy
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
 
Large Scale Text Processing
Large Scale Text ProcessingLarge Scale Text Processing
Large Scale Text Processing
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured Text
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in R
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using python
 
PYTHON PPT.pptx
PYTHON PPT.pptxPYTHON PPT.pptx
PYTHON PPT.pptx
 
Presentation1
Presentation1Presentation1
Presentation1
 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...
 

Plus de CodeOps Technologies LLP

AWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupAWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetup
CodeOps Technologies LLP
 
Understanding azure batch service
Understanding azure batch serviceUnderstanding azure batch service
Understanding azure batch service
CodeOps Technologies LLP
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
CodeOps Technologies LLP
 
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONSSERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
CodeOps Technologies LLP
 
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSBUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
CodeOps Technologies LLP
 
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESAPPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
CodeOps Technologies LLP
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
CodeOps Technologies LLP
 
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CodeOps Technologies LLP
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CodeOps Technologies LLP
 
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSWRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
CodeOps Technologies LLP
 
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaTraining And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
CodeOps Technologies LLP
 
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaDeploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
CodeOps Technologies LLP
 
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
CodeOps Technologies LLP
 
YAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareYAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra Khare
CodeOps Technologies LLP
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
CodeOps Technologies LLP
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
CodeOps Technologies LLP
 
Jet brains space intro presentation
Jet brains space intro presentationJet brains space intro presentation
Jet brains space intro presentation
CodeOps Technologies LLP
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and Streams
CodeOps Technologies LLP
 
Distributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationDistributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps Foundation
CodeOps Technologies LLP
 
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire  "Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
CodeOps Technologies LLP
 

Plus de CodeOps Technologies LLP (20)

AWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupAWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetup
 
Understanding azure batch service
Understanding azure batch serviceUnderstanding azure batch service
Understanding azure batch service
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
 
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONSSERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
 
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSBUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
 
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESAPPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
 
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
 
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSWRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
 
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaTraining And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
 
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaDeploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
 
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
 
YAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareYAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra Khare
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
 
Jet brains space intro presentation
Jet brains space intro presentationJet brains space intro presentation
Jet brains space intro presentation
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and Streams
 
Distributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationDistributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps Foundation
 
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire  "Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
 

Dernier

Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
TaghreedAltamimi
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
devvsandy
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 

Dernier (20)

Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 

Natural Language Processing: Comparing NLTK and OpenNLP

  • 1.
  • 4.  Natural language processing (NLP) is a field of computer science, artificial intelligence concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language data. What???
  • 5.  In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence“ which proposed what is now called the Turing test as a criterion of intelligence.  Eliza 1964 ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". When???
  • 7. Where?  Machine Translation  Fighting Spam  Mail Inbox or Spam  Information Extraction  Social media monitoring  Summarization  Question Answering
  • 8. Tasks in OpenNLP  The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.  It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and co reference resolution.
  • 9. Tasks in OpenNLP  Text data in the form of un-structured text, in the form of comments, reviews or articles  Extract meaning full information from them, done with the help of set of tasks  Tokenizing:  Take a large piece of text and break it into smaller components  Break it into sentences or individual words
  • 10. Stop words removal  Once Tokenized,  next Stop words removal, i.e. differentiating words which has specific meaning from the words which adds to structure to the sentence.  Eg:
  • 11. N-Grams  Once Stop words are removed,  Commonly occurring words in a sentence, because these will be most important words in the text. https://en.wikipedia.org/wiki/N-gram#Examples  If a word appears 2 times in a particular sentence, its called bigrams.  Eg:  Code(s) Description  M79.661 Pain in right lower leg  M79.662 Pain in left lower leg  M79.669 Pain in unspecified lower leg
  • 12. Word Sense Disambiguation  Eg:  I am taking aspirin for my cold  Let's go inside, I'm cold  It's cold today, only 2 degrees  It identifies the meaning of the word, based on the context it is spoken.
  • 13. Parts-of-Speech Tagging  It can either occur as part of WSD, or as a independent task.  It helps in identifying parts of speech, whether Noun, Verb, Adjective, etc. Stemming  Eg: Close, Closed, Closely, Closer  Converting the word to its base form
  • 14. Python NLTK and OpenNLP  NLTK is one of the leading platforms for working with human language data and Python, the module NLTK is used for natural language processing.  NLTK is literally an acronym for Natural Language Toolkit.  The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.  It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
  • 15. Python NLTK  Step 1: Collect all individual Sentences in an article, to a list.  Tokenization () from NLTK: ie from nltk.tokenize we can import the functions sent_tokenize(breakdown into sentences) and word_tokenize(breakdown into words)  Import stopwords () from nltk.corpus module.  punctuation from string module.  Note : Sentence ends with a period symbol(.) and a space after that.
  • 16. Frequency distribution  Construct a frequency distribution : words and no of times each word occurs  Functions Defined for NLTK's Frequency Distributions Example Description fdist = FreqDist(samples) create a frequency distribution containing the given samples fdist[sample] += 1 increment the count for this sample fdist['monstrous'] count of the number of times a given sample occurred fdist.freq('monstrous') frequency of a given sample fdist.N() total number of samples fdist.most_common(n) the n most common samples and their frequencies for sample in fdist: iterate over the samples fdist.max() sample with the greatest count fdist.tabulate() tabulate the frequency distribution fdist.plot() graphical plot of the frequency distribution fdist.plot(cumulative=True) cumulative plot of the frequency distribution fdist1 |= fdist2 update fdist1 with counts from fdist2 fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2
  • 17. Use Tokenizing: Sentence Detector  Python Usage  Step 1: Import NLTK  Step2:  text = "Mary had a little lamp. Her fleece was as white as snow"  from nltk.tokenize import word_tokenize, sent_tokenize  sents = sent_tokenize(text)  print(sents)  Java Usage  Step 1:  Step 2: Some Java code snippet
  • 18. OpenNLP syntax  OpenNLP components have similar APIs. Normally, to execute a task, one should provide a model and an input.  A model is usually loaded by providing a FileInputStream with a model to a constructor of the model class  try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) { SomeModel model = new SomeModel(modelIn); }
  • 19. Features  Language detection  https://www.apache.org/dist/opennlp/models/langdete ct/1.8.3/README.txt
  • 20. Breaking into Word  Python  words=[word_tokenize(sent) for sent in sents]  print words  Java  InputStream is = new FileInputStream("en-token.bin");  TokenizerModel model = new TokenizerModel(is);  Tokenizer tokenizer = new TokenizerME(model);  String tokens[] = tokenizer.tokenize("Hi. How are you? This is Mike.");  for (String a : tokens) System.out.println(a);  is.close();
  • 22. I hope I made myself understandable. Thanks!