NLTK
Alberts Pumpurs
90% of the world's data was generated over the last two years
The common Internet user creates:
Visual: Instagram, Flickr, Vscocam, Facebook, Tumblr
Textual: Blogger, Twitter, Facebook, Emails, Customer Reviews
Detecting hidden signals
The world is full of unstructured, text-rich data: everything from emails to customer tweets.
The information buried in all that text holds the potential to deliver valuable business insights.
Text analytics is the practice of using technology to gather, store, and mine textual information for hidden signals that can be used to inform smarter business decisions.
An explosion of unstructured data
Many types of organizations are experiencing explosive growth in their unstructured enterprise data.
At the same time, they have access to external sources of data such as social media, blogs, and mobile data.
Until now, much of this information passed through the organization virtually unanalyzed. Today, new tools for handling large amounts of complex data make it easier to squeeze value from such unlikely sources.
Text Processing use cases
sentiment analysis
spam filtering
text categorization
topic detection
keyword frequency
plagiarism detection
document similarity
phrase extraction
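As an illustration of one use case from the list above, keyword frequency can be sketched with NLTK's `FreqDist`. The sample sentence is made up for this example, and `wordpunct_tokenize` is chosen only because it needs no downloaded data.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text; any string works here.
text = "Great phone, great battery, poor camera. The battery lasts long."

# Lowercase, split on word/punctuation boundaries, keep alphabetic tokens only.
tokens = [t for t in wordpunct_tokenize(text.lower()) if t.isalpha()]

# FreqDist counts how often each token occurs.
freq = FreqDist(tokens)
for word, count in freq.most_common(3):
    print(word, count)
```

In practice, very common words such as "the" would usually be filtered out before counting.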
Natural Language Toolkit
A leading platform for building Python programs to work with human language data
NLTK Features
sentence and word tokenization
text classification
corpora
parsing
clustering
part of speech tagging
text stemming
and much more
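As one example from the feature list above, stemming reduces inflected words to a common stem. This is a minimal sketch using NLTK's `PorterStemmer`; the word list is arbitrary.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Arbitrary sample words; related forms should collapse to the same stem.
words = ["running", "connected", "connection", "cats"]
stems = [stemmer.stem(w) for w in words]

print(stems)  # ['run', 'connect', 'connect', 'cat']
```

Stems are not always dictionary words; the goal is only that related forms map to the same string.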
Sentence tokenization
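A minimal sketch of sentence tokenization follows. The usual entry point is `nltk.sent_tokenize`, which relies on a pretrained Punkt model that has to be downloaded first. To stay self-contained, this example assumes an untrained `PunktSentenceTokenizer`, which handles simple text but knows no abbreviations.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Made-up sample text with three plain sentences.
text = ("NLTK is a Python library. It works with human language data. "
        "Is it easy to use?")

# An untrained Punkt tokenizer splits at sentence-final punctuation;
# it has learned no abbreviations, so "Mr." or "e.g." would trip it up.
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```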
Word tokenization
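Word tokenization can be sketched as below. `nltk.word_tokenize` is the common convenience function, but by default it needs the Punkt data. This example instead uses `TreebankWordTokenizer` directly, which works without downloads; the sentence is made up.

```python
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

# Treebank-style tokenization splits contractions and sentence-final punctuation.
tokens = tokenizer.tokenize("NLTK doesn't need much setup.")

print(tokens)  # ['NLTK', 'does', "n't", 'need', 'much', 'setup', '.']
```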
Part of speech tagging
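The idea of tagging can be sketched with a toy rule-based tagger. In real use, `nltk.pos_tag` applies a pretrained model that has to be downloaded. The `RegexpTagger` below is only a stand-in, and its patterns are assumptions written for this example, not a usable English tagger.

```python
from nltk.tag import RegexpTagger

# Hand-written toy rules (assumptions for illustration); first match wins.
patterns = [
    (r"^(the|a|an)$", "DT"),         # determiners
    (r"^-?\d+(\.\d+)?$", "CD"),      # cardinal numbers
    (r"^(in|on|of|from)$", "IN"),    # a few prepositions
    (r".*s$", "NNS"),                # crude guess: plural nouns end in "s"
    (r".*", "NN"),                   # fallback: singular noun
]
tagger = RegexpTagger(patterns)

tokens = ["the", "reviews", "of", "3", "users"]
tagged = tagger.tag(tokens)

print(tagged)
```

Each token is paired with a Penn Treebank tag, which is the same output shape that `nltk.pos_tag` returns.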
Part of speech tagging explanation
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential “there”
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
nltk.help.upenn_tagset()  # prints descriptions of all Penn Treebank tags
Chunking and NER
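Chunking groups tagged tokens into phrases. A minimal sketch with NLTK's `RegexpParser` follows; the tokens are tagged by hand so no model is needed, and the noun-phrase grammar is an assumption made for this example. Named entity recognition with `nltk.ne_chunk` works on the same kind of tagged input but needs downloaded models, so it is not shown.

```python
from nltk import RegexpParser

# Pre-tagged tokens (tagged by hand for this sketch).
tagged = [("the", "DT"), ("social", "JJ"), ("network", "NN"),
          ("has", "VBZ"), ("many", "JJ"), ("users", "NNS")]

# NP chunk: optional determiner, any adjectives, then one or more nouns.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
parser = RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['the social network', 'many users']
```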
Text classification
Algorithms in NLTK
Naive Bayes
Maximum Entropy
Decision Tree
Text classification
Sentiment analysis
https://github.com/pumpurs/SentimentWordsLV/
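Sentiment analysis as text classification can be sketched with NLTK's `NaiveBayesClassifier`, one of the algorithms listed earlier. The training sentences and the bag-of-words `features` helper are hypothetical and far too small for real use; they are not taken from the repository linked above.

```python
from nltk import NaiveBayesClassifier

def features(sentence):
    """Bag-of-words features: each lowercase word maps to True."""
    return {word: True for word in sentence.lower().split()}

# Tiny hand-made training set (hypothetical; real work needs far more data).
train = [
    ("great product and fast delivery", "pos"),
    ("really great service", "pos"),
    ("love this phone", "pos"),
    ("terrible quality and slow delivery", "neg"),
    ("really terrible support", "neg"),
    ("hate this phone", "neg"),
]
train_set = [(features(text), label) for text, label in train]

classifier = NaiveBayesClassifier.train(train_set)

prediction = classifier.classify(features("great phone"))
print(prediction)  # pos
```

`classifier.show_most_informative_features()` prints which words separate the classes most strongly, which helps when checking a model trained on real data.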
Document similarity detection
Tf-idf stands for term frequency-inverse document frequency. The tf-idf weight is often used in information retrieval and text mining: it is a statistical measure of how important a word is to a document in a collection or corpus.
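The weighting described above can be sketched in plain Python. This assumes one common variant (term count divided by document length, times the natural log of corpus size over document frequency); libraries differ in smoothing and normalization. The three documents are made up, and cosine similarity over the tf-idf vectors is used here as one simple way to compare documents.

```python
import math
from collections import Counter

# Made-up mini corpus: document id -> list of tokens.
docs = {
    "d1": "nltk makes text processing in python easy".split(),
    "d2": "python is a popular programming language".split(),
    "d3": "text mining finds signals in text".split(),
}

def tf(term, tokens):
    """Term frequency: share of the document's tokens equal to term."""
    return Counter(tokens)[term] / len(tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / number of docs containing term)."""
    df = sum(1 for tokens in corpus.values() if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc_id, corpus):
    return tf(term, corpus[doc_id]) * idf(term, corpus)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# One tf-idf vector per document over the shared vocabulary.
vocab = sorted({t for tokens in docs.values() for t in tokens})
vectors = {d: [tf_idf(t, d, docs) for t in vocab] for d in docs}

sim_d1_d3 = cosine(vectors["d1"], vectors["d3"])
sim_d1_d2 = cosine(vectors["d1"], vectors["d2"])
print(round(sim_d1_d3, 3), round(sim_d1_d2, 3))
```

Here `d1` comes out closer to `d3` than to `d2`, because it shares two terms with `d3` ("text", "in") and only one with `d2` ("python").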
Similarity and concordance
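A concordance shows every occurrence of a word together with its surrounding context. A minimal sketch with `nltk.text.Text` follows; the sample text is made up, and `wordpunct_tokenize` is used only because it needs no downloaded data.

```python
from nltk.text import Text
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text.
raw = ("Unstructured data is growing fast. Text data hides signals. "
       "Good tools turn data into insight.")
text = Text(wordpunct_tokenize(raw))

# concordance_list returns the matches; text.concordance("data") prints them instead.
hits = text.concordance_list("data", width=40)
for hit in hits:
    print(hit.line)
```

On a larger text, `text.similar("data")` prints words that appear in similar contexts, which is the "similarity" half of this slide.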
Dispersion Plot
“Market and product research”
“Social CMS”
1.97 billion social network users
“Customer profiling / analytics”
70% of marketers used Facebook to gain
6.7 million people blog on blogging sites
pumpurs.alberts@gmail.com
Big Data, Startups, Text Analysis, Internet of Things, Web Development
